
4. Basic Functions

This chapter describes the basic, ground-level functions for parsing and handling messages. Covered here are parsing From lines, removing comments from header lines, decoding encoded words, parsing date headers, and so on. High-level functionality is dealt with in the first chapter (see section 1. Decoding and Viewing).

4.1 rfc2045  Encoding Content-Type headers.
4.2 rfc2231  Parsing Content-Type headers.
4.3 ietf-drums  Handling mail headers defined by RFC822bis.
4.4 rfc2047  En/decoding encoded words in headers.
4.5 time-date  Functions for parsing dates and manipulating time.
4.6 qp  Quoted-Printable en/decoding.
4.7 base64  Base64 en/decoding.
4.8 binhex  Binhex decoding.
4.9 uudecode  Uuencode decoding.
4.10 yenc  Yenc decoding.
4.11 rfc1843  Decoding HZ-encoded text.
4.12 mailcap  How parts are displayed is specified by the `.mailcap' file.



4.1 rfc2045

RFC2045 is the "main" MIME document, and as such, one would imagine that there would be a lot to implement. But there isn't, since most of the implementation details are delegated to the subsequent RFCs.

So `rfc2045.el' has only a single function:

rfc2045-encode-string
Takes a parameter and a value and returns a `PARAM=VALUE' string. The value is quoted if it contains non-safe characters.
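
As a small illustration of the unquoted case, here is a sketch that assumes the library can be loaded with (require 'rfc2045). The parameter name and value are arbitrary; a value made up only of ordinary token characters should come back as is.

```elisp
(require 'rfc2045)

;; A value consisting only of safe token characters needs no quoting.
(rfc2045-encode-string "charset" "us-ascii")
;; => "charset=us-ascii"
```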



4.2 rfc2231

RFC2231 defines a syntax for the Content-Type and Content-Disposition headers. Its snappy name is MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations.

In short, these headers look something like this:

 
Content-Type: application/x-stuff;
 title*0*=us-ascii'en'This%20is%20even%20more%20;
 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
 title*2="isn't it!"

They usually aren't this bad, though.

The following functions are defined by this library:

rfc2231-parse-string
Parse a Content-Type header and return a list describing its elements.

 
(rfc2231-parse-string
 "application/x-stuff;
 title*0*=us-ascii'en'This%20is%20even%20more%20;
 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
 title*2=\"isn't it!\"")
=> ("application/x-stuff"
    (title . "This is even more ***fun*** isn't it!"))

rfc2231-get-value
Takes one of the lists in the format above and returns the value of the specified attribute.

rfc2231-encode-string
Encode a parameter in headers like Content-Type and Content-Disposition.
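
The parsing and lookup functions are usually used together. The following is a minimal sketch with an everyday header value chosen for illustration; it assumes attributes are looked up by symbol, as the parsed list shown earlier suggests.

```elisp
(require 'rfc2231)

(let ((ct (rfc2231-parse-string "text/plain; charset=utf-8; format=flowed")))
  (list (car ct)                          ; the media type
        (rfc2231-get-value ct 'charset)   ; look up one attribute
        (rfc2231-get-value ct 'format)))  ; and another
;; => ("text/plain" "utf-8" "flowed")
```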



4.3 ietf-drums

drums is an IETF working group that is working on the replacement for RFC822.

The functions provided by this library include:

ietf-drums-remove-comments
Remove the comments from the argument and return the results.

ietf-drums-remove-whitespace
Remove linear white space from the string and return the results. Spaces inside quoted strings and comments are left untouched.

ietf-drums-get-comment
Return the last comment in the string.

ietf-drums-parse-address
Parse an address string and return a list that contains the mailbox and the plain text name.

ietf-drums-parse-addresses
Parse a string that contains any number of comma-separated addresses and return a list that contains mailbox/plain text pairs.

ietf-drums-parse-date
Parse a date string and return an Emacs time structure.

ietf-drums-narrow-to-header
Narrow the buffer to the header section of the current buffer.
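
A few of these functions can be tried directly on strings. This is a sketch only: the addresses are made up, and the mailbox/name pair is shown as a cons cell, which is how the Emacs versions this was checked against return it.

```elisp
(require 'ietf-drums)

;; Split an address into its mailbox and its plain text name.
(ietf-drums-parse-address "Lars Ingebrigtsen <lars@example.org>")
;; => ("lars@example.org" . "Lars Ingebrigtsen")

;; Pull the comment out of an old-style "address (Name)" string.
(ietf-drums-get-comment "lars@example.org (Lars Ingebrigtsen)")
;; => "Lars Ingebrigtsen"

;; Several comma-separated addresses give a list of such pairs.
(ietf-drums-parse-addresses "a@example.org, B <b@example.org>")
```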



4.4 rfc2047

RFC2047 (Message Header Extensions for Non-ASCII Text) specifies how non-ASCII text in headers is to be encoded. This is actually rather complicated, so a number of variables are necessary to tweak what this library does.

The following variables are tweakable:

rfc2047-header-encoding-alist
This is an alist of header / encoding-type pairs. Its main purpose is to prevent encoding of certain headers.

The keys can either be header regexps, or t.

The values can be nil, in which case the header(s) in question won't be encoded, mime, which means that they will be encoded, or address-mime, which means the header(s) will be encoded carefully assuming they contain addresses.

rfc2047-charset-encoding-alist
RFC2047 specifies two forms of encoding---Q (a Quoted-Printable-like encoding) and B (base64). This alist specifies which charset should use which encoding.

rfc2047-encode-function-alist
This is an alist of encoding / function pairs. The encodings are Q, B and nil.

rfc2047-encoded-word-regexp
When decoding words, this library looks for matches to this regexp.

rfc2047-encode-encoded-words
This boolean variable specifies whether encoded words (e.g. `=?hello?=') should be encoded again.
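
As an example of tweaking the first of these variables, the sketch below prevents one header from ever being encoded. The header name "X-My-Header" is made up for illustration. The entry is added at the front of the alist on the assumption that the first matching key is the one that takes effect.

```elisp
(require 'rfc2047)

;; "X-My-Header" is a hypothetical header name.
;; A nil value means: do not encode headers matching this regexp.
(add-to-list 'rfc2047-header-encoding-alist '("X-My-Header" . nil))
```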

Those were the variables, and these are the functions:

rfc2047-narrow-to-field
Narrow the buffer to the header on the current line.

rfc2047-encode-message-header
Should be called narrowed to the header of a message. Encodes according to rfc2047-header-encoding-alist.

rfc2047-encode-region
Encodes all encodable words in the region specified.

rfc2047-encode-string
Encode a string and return the results.

rfc2047-decode-region
Decode the encoded words in the region.

rfc2047-decode-string
Decode a string and return the results.

rfc2047-encode-parameter
Encode a parameter in the RFC2047-like style. This is a replacement for the rfc2231-encode-string function. See section 4.2 rfc2231.

When attaching files as MIME parts, we should use the RFC2231 encoding to specify file names containing non-ASCII characters. However, many mail programs don't support it in practice, and recipients won't be able to extract files with the correct names. The RFC2047-like encoding, by contrast, is generally accepted. This function provides that RFC2047-like encoding, as a concession to this regrettable trend. To use it, put the following line in your `~/.gnus.el' file:

 
(defalias 'mail-header-encode-parameter 'rfc2047-encode-parameter)
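
Returning to the plain string functions listed earlier, here is a short sketch of decoding. The encoded words are hand-made examples, one for each of the two encodings; what rfc2047-encode-string produces depends on your charset settings, so no output is shown for it.

```elisp
(require 'rfc2047)

;; Q encoding: "=EF" is the ISO-8859-1 byte for "ï".
(rfc2047-decode-string "=?iso-8859-1?Q?na=EFve?=")
;; => "naïve"

;; B encoding: "w6k=" is base64 for the UTF-8 bytes of "é".
(rfc2047-decode-string "=?utf-8?B?w6k=?=")
;; => "é"

;; Encoding picks a charset and an encoding for you, so the exact
;; result varies between setups.
(rfc2047-encode-string "naïve")
```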



4.5 time-date

While not really a part of the MIME library, it is convenient to document this library here. It deals with parsing Date headers and manipulating time. (Not by using tesseracts, though, I'm sorry to say.)

These functions convert between five formats: A date string, an Emacs time structure, a decoded time list, a second number, and a day number.

Here's a bunch of time/date/second/day examples:

 
(parse-time-string "Sat Sep 12 12:21:54 1998 +0200")
=> (54 21 12 12 9 1998 6 nil 7200)

(date-to-time "Sat Sep 12 12:21:54 1998 +0200")
=> (13818 19266)

(time-to-seconds '(13818 19266))
=> 905595714.0

(seconds-to-time 905595714.0)
=> (13818 19266 0)

(time-to-days '(13818 19266))
=> 729644

(days-to-time 729644)
=> (961933 512)

(time-since '(13818 19266))
=> (0 430)

(time-less-p '(13818 19266) '(13818 19145))
=> nil

(subtract-time '(13818 19266) '(13818 19145))
=> (0 121)

(days-between "Sat Sep 12 12:21:54 1998 +0200"
              "Mon Sep 07 12:21:54 1998 +0200")
=> 5

(date-leap-year-p 2000)
=> t

(time-to-day-in-year '(13818 19266))
=> 255

(time-to-number-of-days
 (time-since
  (date-to-time "Mon, 01 Jan 2001 02:22:26 GMT")))
=> 4.146122685185185

And finally, we have safe-date-to-time, which does the same as date-to-time, but returns a zero time if the date is syntactically malformed.

The five data representations used are the following:

date
An RFC822 (or similar) date string. For instance: "Sat Sep 12 12:21:54 1998 +0200".

time
An internal Emacs time. For instance: (13818 19266).

seconds
A floating point representation of the internal Emacs time. For instance: 905595714.0.

days
An integer number representing the number of days since 00000101. For instance: 729644.

decoded time
A list of decoded time. For instance: (54 21 12 12 9 1998 6 nil 7200).

All the examples above represent the same moment.
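
To see how the time and seconds representations relate, the arithmetic can be done by hand. This sketch assumes the two-element (HIGH LOW) list form of Emacs times used in the examples in this section.

```elisp
;; An Emacs time (HIGH LOW) stands for HIGH * 2^16 + LOW seconds
;; since the epoch.
(let ((time '(13818 19266)))
  (+ (* (car time) 65536) (cadr time)))
;; => 905595714

;; The built-in `float-time' performs the same conversion.
(float-time '(13818 19266))
;; => 905595714.0
```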

These are the functions available:

date-to-time
Take a date and return a time.

time-to-seconds
Take a time and return seconds.

seconds-to-time
Take seconds and return a time.

time-to-days
Take a time and return days.

days-to-time
Take days and return a time.

date-to-day
Take a date and return days.

time-to-number-of-days
Take a time and return the number of days that it represents.

safe-date-to-time
Take a date and return a time. If the date is not syntactically valid, return a "zero" time.

time-less-p
Take two times and say whether the first time is less (i.e., earlier) than the second time.

time-since
Take a time and return a time saying how long it was since that time.

subtract-time
Take two times and subtract the second from the first. That is, return the time between the two times.

days-between
Take two dates and return the number of days between them.

date-leap-year-p
Take a year number and say whether it's a leap year.

time-to-day-in-year
Take a time and return the day number within the year that the time is in.



4.6 qp

This library deals with decoding and encoding Quoted-Printable text.

Very briefly explained, qp encoding means translating all 8-bit characters (and lots of control characters) into things that look like `=EF'; that is, an equal sign followed by the byte encoded as a hex string.

The following functions are defined by the library:

quoted-printable-decode-region
QP-decode all the encoded text in the specified region.

quoted-printable-decode-string
Decode the QP-encoded text in a string and return the results.

quoted-printable-encode-region
QP-encode all the encodable characters in the specified region. The third optional parameter fold specifies whether to fold long lines. (Long here means 72.)

quoted-printable-encode-string
QP-encode all the encodable characters in a string and return the results.
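
The string functions are the easiest to experiment with. In this sketch the decoder is assumed to return raw bytes, so the result is passed through the built-in decode-coding-string to obtain characters.

```elisp
(require 'qp)

;; "=" itself must be escaped, since it introduces every escape.
(quoted-printable-encode-string "a=b")
;; => "a=3Db"

(quoted-printable-decode-string "a=3Db")
;; => "a=b"

;; Decoding yields bytes; byte EF is "ï" in ISO-8859-1.
(decode-coding-string (quoted-printable-decode-string "na=EFve")
                      'iso-8859-1)
;; => "naïve"
```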



4.7 base64

Base64 is an encoding that encodes three bytes into four characters, thereby increasing the size by about 33%. The alphabet used for encoding is very resistant to mangling during transit.

The following functions are defined by this library:

base64-encode-region
base64 encode the selected region. Return the length of the encoded text. Optional third argument no-line-break means do not break long lines into shorter lines.

base64-encode-string
base64 encode a string and return the result.

base64-decode-region
base64 decode the selected region. Return the length of the decoded text. If the region can't be decoded, return nil and don't modify the buffer.

base64-decode-string
base64 decode a string and return the result. If the string can't be decoded, nil is returned.
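
The three-bytes-to-four-characters ratio is easy to see with the string functions, which are built into Emacs. The inputs below are arbitrary.

```elisp
;; Three bytes in, four characters out.
(base64-encode-string "abc")
;; => "YWJj"

;; Input that is not a multiple of three bytes is padded with "=".
(base64-encode-string "Hello, world!")
;; => "SGVsbG8sIHdvcmxkIQ=="

(base64-decode-string "SGVsbG8sIHdvcmxkIQ==")
;; => "Hello, world!"
```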



4.8 binhex

binhex is an encoding that originated in Macintosh environments. The following function is supplied to deal with these:

binhex-decode-region
Decode the encoded text in the region. If given a third parameter, only decode the binhex header and return the filename.



4.9 uudecode

uuencode is probably still the most popular encoding of binaries used on Usenet, although base64 rules the mail world.

The following function is supplied by this package:

uudecode-decode-region
Decode the text in the region.



4.10 yenc

yenc is used for encoding binaries on Usenet. The following function is supplied by this package:

yenc-decode-region
Decode the encoded text in the region.



4.11 rfc1843

RFC1843 deals with mixing Chinese and ASCII characters in messages. In essence, RFC1843 switches between ASCII and Chinese by doing this:

 
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.

Simple enough, and widely used in China.

The following functions are available to handle this encoding:

rfc1843-decode-region
Decode HZ-encoded text in the region.

rfc1843-decode-string
Decode a HZ-encoded string and return the result.
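
The sample text from the start of this section can be fed to the string function. This is only a sketch: it assumes your Emacs can display GB2312 characters, and the decoded Chinese text is not reproduced here.

```elisp
(require 'rfc1843)

;; The text between "~{" and "~}" is GB-encoded; the rest stays ASCII.
(rfc1843-decode-string
 "The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.")
```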



4.12 mailcap

The `~/.mailcap' file is parsed by most MIME-aware message handlers and describes how elements are supposed to be displayed. Here's an example file:

 
image/*; gimp -8 %s
audio/wav; wavplayer %s
application/msword; catdoc %s ; copiousoutput ; nametemplate=%s.doc

This says that all image files should be displayed with gimp, that WAVE audio files should be played by wavplayer, and that MS-WORD files should be inlined by catdoc.

The mailcap library parses this file, and provides functions for matching types.

mailcap-mime-data
This variable is an alist of alists containing backup viewing rules.

Interface functions:

mailcap-parse-mailcaps
Parse the `~/.mailcap' file.

mailcap-mime-info
Takes a MIME type as its argument and returns the matching viewer.
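
A typical session with the interface functions might look like the sketch below. The MIME type is arbitrary, and what comes back depends entirely on the mailcap entries and backup rules present on your system, so no result is shown.

```elisp
(require 'mailcap)

;; Read the mailcap files.
(mailcap-parse-mailcaps)

;; Ask for the viewer registered for a type.
(mailcap-mime-info "image/png")
```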



This document was generated by XEmacs Webmaster on October, 2 2007 using texi2html