Sunday, September 09, 2007

Email and MIME(2)

To support non-English text and the transmission of non-text data (images, video clips, and so on), MIME was defined in RFC 2045 through RFC 2049.
It adds several header fields beyond those defined in RFC 2822:
(1) A MIME-Version header field, which uses a version
number to declare a message to be conformant with MIME
and allows mail processing agents to distinguish
between such messages and those generated by older or
non-conformant software, which are presumed to lack
such a field.

(2) A Content-Type header field, generalized from RFC 1049,
which can be used to specify the media type and subtype
of data in the body of a message and to fully specify
the native representation (canonical form) of such
data.

(3) A Content-Transfer-Encoding header field, which can be
used to specify both the encoding transformation that
was applied to the body and the domain of the result.
Encoding transformations other than the identity
transformation are usually applied to data in order to
allow it to pass through mail transport mechanisms
which may have data or character set limitations.

(4) Two additional header fields that can be used to
further describe the data in a body, the Content-ID and
Content-Description header fields.
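
As a rough illustration of the header fields listed above, the following sketch parses a made-up message with Python's standard email package and reads each MIME header field. The sample message, the addresses, and the variable names are assumptions for illustration only.

```python
from email import message_from_string

# Hypothetical sample message, invented for this example.
RAW = (
    "From: alice@example.com\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "Content-ID: <part1@example.com>\r\n"
    "Content-Description: A short greeting\r\n"
    "\r\n"
    "Caf=C3=A9\r\n"
)

msg = message_from_string(RAW)

# Header lookup by name is case-insensitive in the email package.
mime_headers = {
    name: msg[name]
    for name in (
        "MIME-Version",
        "Content-Type",
        "Content-Transfer-Encoding",
        "Content-ID",
        "Content-Description",
    )
}

# decode=True undoes the transfer encoding named in
# Content-Transfer-Encoding; the charset parameter then tells us
# how to turn the resulting octets into text.
body = msg.get_payload(decode=True).decode(msg.get_content_charset())

for name, value in mime_headers.items():
    print(f"{name}: {value}")
print(body.strip())
```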

Data type:
7bit Data
"7bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]. No octets with decimal values greater than 127
are allowed and neither are NULs (octets with decimal value 0). CR
(decimal value 13) and LF (decimal value 10) octets only occur as
part of CRLF line separation sequences.

8bit Data
"8bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821], but octets with decimal values greater than 127
may be used. As with "7bit data" CR and LF octets only occur as part
of CRLF line separation sequences and no NULs are allowed.

Binary Data
"Binary data" refers to data where any sequence of octets whatsoever
is allowed.
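
The three definitions above can be turned into a small check. This is only a sketch: the function name classify_domain is hypothetical, and it simply applies the rules as quoted (line length, NUL octets, bare CR or LF, octets above 127).

```python
def classify_domain(data: bytes) -> str:
    """Return "7bit", "8bit", or "binary" for a body, per the rules above."""
    # NUL octets are forbidden in both 7bit and 8bit data.
    if b"\x00" in data:
        return "binary"
    for line in data.split(b"\r\n"):
        # Lines may hold at most 998 octets between CRLF sequences.
        if len(line) > 998:
            return "binary"
        # CR and LF may only appear as part of a CRLF pair, so any
        # CR or LF left over after splitting on CRLF is a bare one.
        if b"\r" in line or b"\n" in line:
            return "binary"
    # Octets above 127 are the only difference between 7bit and 8bit.
    if any(octet > 127 for octet in data):
        return "8bit"
    return "7bit"


print(classify_domain(b"hello\r\nworld\r\n"))
print(classify_domain("caf\u00e9\r\n".encode("utf-8")))
print(classify_domain(b"bare\nlinefeed"))
```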
Formal Description of MIME headers:
entity-headers := [ content CRLF ]
[ encoding CRLF ]
[ id CRLF ]
[ description CRLF ]
*( MIME-extension-field CRLF )

MIME-message-headers := entity-headers fields
version CRLF
; The ordering of the header
; fields implied by this BNF
; definition should be ignored.

MIME-part-headers := entity-headers[ fields ]
; Any field not beginning with
; "content-" can have no defined
; meaning and may be ignored.
; The ordering of the header
; fields implied by this BNF
; definition should be ignored.
Header explanation
MIME-Version:
For example: MIME-Version: 1.0
Note that the MIME-Version header field is required at the top level of a message. It is not required for each body part of a multipart entity. It is required for the embedded headers of a body of type "message/rfc822" or "message/partial" if and only if the embedded message is itself claimed to be MIME-conformant.
It is also worth noting that version control for specific media types is not accomplished using the MIME-Version mechanism. In particular, some formats (such as application/postscript) have version numbering conventions that are internal to the media format. Where such conventions exist, MIME does nothing to supersede them. Where no such conventions exist, a MIME media type might use a "version" parameter in the content-type field if necessary.
Content-Type
The Content-Type header field specifies the nature of the data in the body of an entity by giving media type and subtype identifiers, and by providing auxiliary information that may be required for certain media types. After the media type and subtype names, the remainder of the header field is simply a set of parameters, specified in an attribute=value notation. The ordering of parameters is not significant. The set of meaningful parameters depends on the media type and subtype.
For example, the "charset" parameter is applicable to any subtype of "text", while the "boundary" parameter is required for any subtype of the "multipart" media type.
If another top-level type is to be used for any reason, it must be given a name starting with "X-" to indicate its non-standard status and to avoid a potential conflict with a future official name.
content := "Content-Type" ":" type "/" subtype
*(";" parameter)
; Matching of media type and subtype
; is ALWAYS case-insensitive.

type := discrete-type / composite-type

discrete-type := "text" / "image" / "audio" / "video" /
"application" / extension-token

composite-type := "message" / "multipart" / extension-token

extension-token := ietf-token / x-token

ietf-token := <An extension token defined by a
standards-track RFC and registered
with IANA.>

x-token := <The two characters "X-" or "x-" followed, with
no intervening white space, by any token>

subtype := extension-token / iana-token

iana-token := <A publicly-defined extension token. Tokens
of this form must be registered with IANA
as specified in RFC 2048.>

parameter := attribute "=" value

attribute := token
; Matching of attributes
; is ALWAYS case-insensitive.

value := token / quoted-string

token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
or tspecials>

tspecials := "(" / ")" / "<" / ">" / "@" /
"," / ";" / ":" / "\" / <">
"/" / "[" / "]" / "?" / "="
; Must be in quoted-string,
; to use within parameter values
Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value. Thus the following two forms
Content-type: text/plain; charset=us-ascii (Plain text)
Content-type: text/plain; charset="us-ascii"
are equivalent.
If no Content-Type header field is present, the default is:
Content-type: text/plain; charset=us-ascii
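
The following sketch checks these Content-Type rules (case-insensitive matching, quotes not being part of the value, and the text/plain default) against Python's standard email package. The three sample messages are invented for illustration. Note one behavior specific to this library: when the header field is missing it reports the default media type, but it does not synthesize a charset parameter.

```python
from email import message_from_string

# Hypothetical messages: quoted value with odd casing, bare token,
# and no Content-Type header field at all.
quoted = message_from_string('Content-Type: TEXT/Plain; CHARSET="us-ascii"\n\nhi\n')
bare = message_from_string("Content-type: text/plain; charset=us-ascii\n\nhi\n")
missing = message_from_string("Subject: no content type\n\nhi\n")

# Type and subtype are matched case-insensitively; the library
# normalizes them to lowercase.
print(quoted.get_content_type())

# Attribute names are also case-insensitive, and the quotation marks
# are stripped because they only delimit the value.
print(quoted.get_param("charset"))
print(bare.get_param("charset"))

# With no Content-Type header field, the default media type applies.
print(missing.get_content_type())
# This library returns None here rather than inventing "us-ascii".
print(missing.get_content_charset())
```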
Content-Transfer-Encoding
encoding := "Content-Transfer-Encoding" ":" mechanism
mechanism := "7bit" / "8bit" / "binary" / "quoted-printable" / "base64" / ietf-token / x-token
The default value is "7bit".
Three transformations are currently defined: identity, the "quoted-printable" encoding, and the "base64" encoding. The domains are "binary", "8bit" and "7bit".
The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all mean that the identity (i.e. NO) encoding transformation has been performed. As such, they serve simply as indicators of the domain of the body data, and provide useful information about the sort of encoding that might be needed for transmission in a given transport system.
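
To make the two non-identity transformations concrete, here is a minimal sketch using Python's standard base64 and quopri modules. The sample text is an assumption chosen because it contains one non-ASCII character; both encodings turn it into octets that fit the 7bit domain.

```python
import base64
import quopri

# UTF-8 octets for a short string with one non-ASCII character.
raw = "Caf\u00e9".encode("utf-8")

# base64 maps every 3 octets to 4 ASCII characters.
b64 = base64.b64encode(raw)

# quoted-printable keeps printable ASCII as-is and writes every
# other octet as "=" followed by two hex digits.
qp = quopri.encodestring(raw)

print(b64)
print(qp)

# Both results contain only octets below 128, and both are reversible.
print(all(octet < 128 for octet in b64 + qp))
print(base64.b64decode(b64) == raw, quopri.decodestring(qp) == raw)
```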
Implementors may, if necessary, define private Content-Transfer-Encoding values, but must use an x-token, which is a name prefixed by "X-", to indicate its non-standard status, e.g., "Content-Transfer-Encoding: x-my-new-encoding".
Currently the only composite media types are "multipart" and "message". All encodings that are desired for bodies of type multipart or message must be done at the innermost level, by encoding the actual body that needs to be encoded. In other words, recursive encoding is not allowed.
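
A small sketch of what "encoding at the innermost level" looks like in practice, using Python's standard email.message.EmailMessage. The subject, text, attachment bytes, and filename are made up for illustration. In this library the multipart container ends up with no Content-Transfer-Encoding header field at all (so the 7bit default applies), while each leaf part declares its own encoding.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
# A plain ASCII text body; the library picks a 7bit transfer encoding.
msg.set_content("See the attached data.\n")
# Arbitrary octets; the library encodes this part with base64.
msg.add_attachment(
    b"\x00\x01\x02\xff",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

parts = list(msg.iter_parts())

# The container itself is not encoded; only the leaf parts are.
container_cte = msg["Content-Transfer-Encoding"]
part_ctes = [str(part["Content-Transfer-Encoding"]) for part in parts]

print(msg.get_content_type())
print(container_cte)
print(part_ctes)
```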
Content-ID
The Content-ID header field is syntactically identical to the "Message-ID" header field:
id := "Content-ID" ":" msg-id
Like the Message-ID values, Content-ID values must be generated to be world-unique.
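
One way to generate such values, sketched with Python's standard email.utils.make_msgid, which is intended for Message-ID values but produces the same msg-id syntax. The domain example.com is a placeholder assumption; a real sender would use a domain it controls.

```python
from email.utils import make_msgid

# make_msgid combines a timestamp, the process id, and random bits
# with the given domain, and wraps the result in angle brackets.
first = make_msgid(domain="example.com")
second = make_msgid(domain="example.com")

print("Content-ID:", first)
print("Content-ID:", second)
```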
Content-Description
An optional header field that carries a short, human-readable description of the body.

All MIME headers:
MIME-Version
Content-ID
Content-Description
Content-Transfer-Encoding
Content-Type
Content-Base
Content-Location
Content-features
Content-Disposition
Content-Language
Content-Alternative
Content-MD5
Content-Duration
