This chapter describes the basic, ground-level functions for parsing and
handling. Covered here are parsing From lines, removing comments from
header lines, decoding encoded words, parsing date headers and so on.
High-level functionality is dealt with in the first chapter
(see section 1. Decoding and Viewing).
4.1 rfc2045 | Encoding Content-Type headers.
4.2 rfc2231 | Parsing Content-Type headers.
4.3 ietf-drums | Handling mail headers defined by RFC822bis.
4.4 rfc2047 | En/decoding encoded words in headers.
4.5 time-date | Functions for parsing dates and manipulating time.
4.6 qp | Quoted-Printable en/decoding.
4.7 base64 | Base64 en/decoding.
4.8 binhex | Binhex decoding.
4.9 uudecode | Uuencode decoding.
4.10 yenc | Yenc decoding.
4.11 rfc1843 | Decoding HZ-encoded text.
4.12 mailcap | How parts are displayed is specified by the `.mailcap' file.
4.1 rfc2045
RFC2045 is the "main" MIME document, and as such, one would imagine that there would be a lot to implement. But there isn't, since most of the implementation details are delegated to the subsequent RFCs.
So `rfc2045.el' has only a single function:
rfc2045-encode-string
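A brief sketch of calling it, assuming the two-argument form (a parameter name and a value) found in the `rfc2045.el' shipped with Emacs. The parameter names and values here are made up for illustration:

```elisp
(require 'rfc2045)

;; A value made only of safe characters is expected to pass through
;; unquoted, as PARAM=VALUE.
(rfc2045-encode-string "charset" "us-ascii")

;; A value containing a space is expected to come back quoted.
(rfc2045-encode-string "name" "my file.txt")
```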
4.2 rfc2231
RFC2231 defines a syntax for the Content-Type and Content-Disposition
headers. Its snappy name is MIME Parameter Value and Encoded Word
Extensions: Character Sets, Languages, and Continuations.
In short, these headers look something like this:
Content-Type: application/x-stuff;
 title*0*=us-ascii'en'This%20is%20even%20more%20;
 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
 title*2="isn't it!"
They usually aren't this bad, though.
The following functions are defined by this library:
rfc2231-parse-string
Parse a Content-Type header and return a list describing its elements.
(rfc2231-parse-string
 "application/x-stuff; title*0*=us-ascii'en'This%20is%20even%20more%20; title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2=\"isn't it!\"")
=> ("application/x-stuff"
    (title . "This is even more ***fun*** isn't it!"))
rfc2231-get-value
rfc2231-encode-string
Encode a parameter in headers like Content-Type and Content-Disposition.
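As a small illustration of how the parser and the accessor fit together, here is a sketch that parses a header value and then looks up one parameter. It assumes that `rfc2231-get-value' takes the parsed list and the attribute as a symbol, as in the `rfc2231.el' shipped with Emacs; the header value is made up:

```elisp
(require 'rfc2231)

;; Parse a simple header value.  The result has the same shape as the
;; example above: the type first, then (attribute . value) pairs.
(let ((ct (rfc2231-parse-string "text/plain; charset=utf-8")))
  ;; Look up one parameter by its symbol name.
  (rfc2231-get-value ct 'charset))
```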
4.3 ietf-drums
drums is an IETF working group that is working on the replacement for RFC822.
The functions provided by this library include:
ietf-drums-remove-comments
ietf-drums-remove-whitespace
ietf-drums-get-comment
ietf-drums-parse-address
ietf-drums-parse-addresses
ietf-drums-parse-date
ietf-drums-narrow-to-header
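A rough sketch of three of these functions, going by their names and by the `ietf-drums.el' shipped with Emacs. The addresses are invented, and the exact whitespace left behind after a removed comment is not shown because it may vary:

```elisp
(require 'ietf-drums)

;; Strip the parenthesized comment from an address.
(ietf-drums-remove-comments "joe@example.com (Joe Smith)")

;; Pull out the comment itself.
(ietf-drums-get-comment "joe@example.com (Joe Smith)")

;; Split an address into its mailbox and its plain-text name.
(ietf-drums-parse-address "Joe Smith <joe@example.com>")
```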
4.4 rfc2047
RFC2047 (Message Header Extensions for Non-ASCII Text) specifies how non-ASCII text in headers is to be encoded. This is actually rather complicated, so a number of variables are necessary to tweak what this library does.
The following variables are tweakable:
rfc2047-header-encoding-alist
An alist of header / encoding-type pairs. The keys can either be header regexps or `t'. The values can be `nil', in which case the header(s) in question won't be encoded; `mime', which means that they will be encoded; or `address-mime', which means the header(s) will be encoded carefully, assuming they contain addresses.
rfc2047-charset-encoding-alist
RFC2047 specifies two forms of encoding: `Q' (a Quoted-Printable-like encoding) and `B' (base64). This alist specifies which charset should use which encoding.
rfc2047-encode-function-alist
An alist of encoding / function pairs. The encodings are `Q', `B' and `nil'.
rfc2047-encoded-word-regexp
rfc2047-encode-encoded-words
Those were the variables, and these are the functions:
rfc2047-narrow-to-field
rfc2047-encode-message-header
Encodes the header according to rfc2047-header-encoding-alist.
rfc2047-encode-region
rfc2047-encode-string
rfc2047-decode-region
rfc2047-decode-string
rfc2047-encode-parameter
A substitute for the rfc2231-encode-string function. See section 4.2 rfc2231.
When attaching files as MIME parts, we should use the RFC2231 encoding to specify file names containing non-ASCII characters. However, many mail programs don't support it in practice, so recipients can't extract files with correct names. An RFC2047-like encoding is generally accepted instead. This function provides that RFC2047-like encoding, resigning itself to this regrettable trend. To use it, put the following line in your `~/.gnus.el' file:
(defalias 'mail-header-encode-parameter 'rfc2047-encode-parameter)
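To make the idea of an encoded word concrete, here is a minimal sketch of decoding one with `rfc2047-decode-string'. The header text is made up; in ISO-8859-1 the byte F6 is the letter "ö":

```elisp
(require 'rfc2047)

;; Decode a Q-encoded word: charset iso-8859-1, encoding Q,
;; with the non-ASCII byte written as =F6.
(rfc2047-decode-string "=?iso-8859-1?Q?J=F6rg?=")   ; => "Jörg"
```

What the encoding functions produce depends on the variables described above, so no encoded output is shown here.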
4.5 time-date
While not really a part of the MIME library, it is convenient to
document this library here. It deals with parsing Date headers
and manipulating time. (Not by using tesseracts, though, I'm sorry to
say.)
These functions convert between five formats: A date string, an Emacs time structure, a decoded time list, a second number, and a day number.
Here's a bunch of time/date/second/day examples:
(parse-time-string "Sat Sep 12 12:21:54 1998 +0200")
=> (54 21 12 12 9 1998 6 nil 7200)

(date-to-time "Sat Sep 12 12:21:54 1998 +0200")
=> (13818 19266)

(time-to-seconds '(13818 19266))
=> 905595714.0

(seconds-to-time 905595714.0)
=> (13818 19266 0)

(time-to-days '(13818 19266))
=> 729644

(days-to-time 729644)
=> (961933 65536)

(time-since '(13818 19266))
=> (0 430)

(time-less-p '(13818 19266) '(13818 19145))
=> nil

(subtract-time '(13818 19266) '(13818 19145))
=> (0 121)

(days-between "Sat Sep 12 12:21:54 1998 +0200"
              "Sat Sep 07 12:21:54 1998 +0200")
=> 5

(date-leap-year-p 2000)
=> t

(time-to-day-in-year '(13818 19266))
=> 255

(time-to-number-of-days
 (time-since
  (date-to-time "Mon, 01 Jan 2001 02:22:26 GMT")))
=> 4.146122685185185
And finally, we have safe-date-to-time, which does the same as
date-to-time, but returns a zero time if the date is
syntactically malformed.
The five data representations used are the following:
A date string: "Sat Sep 12 12:21:54 1998 +0200".
An Emacs time structure: (13818 19266).
A second number: 905595714.0.
A day number: 729644.
A decoded time list: (54 21 12 12 9 1998 6 t 7200).
All the examples above represent the same moment.
These are the functions available:
date-to-time
time-to-seconds
seconds-to-time
time-to-days
days-to-time
date-to-day
time-to-number-of-days
safe-date-to-time
time-less-p
time-since
subtract-time
days-between
date-leap-year-p
time-to-day-in-year
4.6 qp
This library deals with decoding and encoding Quoted-Printable text.
Very briefly explained, qp encoding means translating all 8-bit characters (and lots of control characters) into things that look like `=EF'; that is, an equal sign followed by the byte encoded as a hex string.
The following functions are defined by the library:
quoted-printable-decode-region
quoted-printable-decode-string
quoted-printable-encode-region
quoted-printable-encode-string
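A short sketch of the string variants, using the byte E9 from ISO-8859-1 ("é") as the 8-bit character. It assumes the optional coding-system argument that `quoted-printable-decode-string' accepts in the `qp.el' shipped with Emacs; the sample word is made up:

```elisp
(require 'qp)

;; Decode: =E9 becomes the byte #xE9, which the second argument asks
;; to have interpreted as ISO-8859-1.
(quoted-printable-decode-string "caf=E9" 'iso-8859-1)   ; => "café"

;; Encode: the encoder works on bytes, so turn the text into
;; ISO-8859-1 bytes first.
(quoted-printable-encode-string
 (encode-coding-string "café" 'iso-8859-1))              ; => "caf=E9"
```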
4.7 base64
Base64 is an encoding that encodes three bytes into four characters, thereby increasing the size by about 33%. The alphabet used for encoding is very resistant to mangling during transit.
The following functions are defined by this library:
base64-encode-region
base64-encode-string
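base64-decode-region
Decode the region. If the region can't be decoded, return `nil' and don't modify the buffer.
base64-decode-string
Decode the string and return the result. If the string can't be decoded, `nil' is returned.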
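The three-bytes-to-four-characters ratio is easy to see on a tiny input. In current Emacs the string functions are built in, so this sketch needs no `require':

```elisp
;; Three bytes in, four characters out.
(base64-encode-string "Man")     ; => "TWFu"

;; And back again.
(base64-decode-string "TWFu")    ; => "Man"
```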
4.8 binhex
binhex is an encoding that originated in Macintosh environments.
The following function is supplied to deal with these:
binhex-decode-region
Decode the encoded text in the region. Optionally, decode only the binhex header and return the filename.
4.9 uudecode
uuencode is probably still the most popular encoding of binaries
used on Usenet, although base64 rules the mail world.
The following function is supplied by this package:
uudecode-decode-region
4.10 yenc
yenc is used for encoding binaries on Usenet. The following
function is supplied by this package:
yenc-decode-region
4.11 rfc1843
RFC1843 deals with mixing Chinese and ASCII characters in messages. In essence, RFC1843 switches between ASCII and Chinese by doing this:
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.
Simple enough, and widely used in China.
The following functions are available to handle this encoding:
rfc1843-decode-region
rfc1843-decode-string
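A minimal sketch of decoding the sample sentence above, assuming `rfc1843-decode-string' takes the encoded string as its only argument, as in the `rfc1843.el' shipped with Emacs. The decoded Chinese text is not reproduced here:

```elisp
(require 'rfc1843)

;; The text between ~{ and ~} is GB-encoded; everything else is ASCII
;; and should pass through unchanged.
(rfc1843-decode-string
 "The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.")
```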
4.12 mailcap
The `~/.mailcap' file is parsed by most MIME-aware message handlers and describes how elements are supposed to be displayed. Here's an example file:
image/*; gimp -8 %s
audio/wav; wavplayer %s
application/msword; catdoc %s ; copiousoutput ; nametemplate=%s.doc
This says that all image files should be displayed with gimp,
that WAVE audio files should be played by wavplayer, and that
MS-WORD files should be inlined by catdoc.
The mailcap library parses this file, and provides functions for
matching types.
mailcap-mime-data
Interface functions:
mailcap-parse-mailcaps
mailcap-mime-info
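A hedged sketch of the two interface functions used together, assuming both can be called with only their required arguments, as in the `mailcap.el' shipped with Emacs. The MIME type is just an example, and what comes back depends on the mailcap files and defaults on the machine at hand:

```elisp
(require 'mailcap)

;; Read the mailcap files found on this system.
(mailcap-parse-mailcaps)

;; Ask which viewer, if any, is registered for a type.
(mailcap-mime-info "image/png")
```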