Tuesday, September 25, 2007

GRID introduction

Grid computing is becoming increasingly important in both the computer science community and the wider scientific community. Traditional scientific fields (physics, biology, ...) have so much data to process that the computing resources owned by a single organization cannot meet their requirements. Grid computing was proposed to solve this problem.

Currently, the most promising grid middleware includes Globus, gLite, UNICORE and CROWN (as well as OpenPBS and LSF).
Condor is another widely used distributed computing project.
Globus is based on OGSA (Open Grid Services Architecture).

OMII-Europe aims to provide key software components for building e-infrastructure. Currently, it focuses on providing common interfaces for, and integration of, the major Grid software infrastructures. Its goals include interoperability between gLite, UNICORE, Globus and CROWNgrid.

ProActive is an Open Source Java library (LGPL) for parallel, distributed, and multi-threaded computing, also featuring mobility and security in a uniform framework. With a reduced set of simple primitives, ProActive provides a comprehensive toolkit that simplifies the programming of applications distributed on Local Area Networks (LANs), Clusters, Internet Grids and Peer-to-Peer Intranets.
ProActive deploys applications seamlessly on Local Area Networks (LANs), Wide Area Networks (WANs), desktops, clusters, parallel machines, and data-centers, using de facto industry standards such as LSF, PBS, Globus and Unicore, or just ssh. ProActive does not require intrusive installation or modifications to the enterprise IT infrastructure.

Monday, September 10, 2007

How to find the RFC document you want

RFC 3700, "Internet Official Protocol Standards", lists Internet protocols together with the RFCs that specify them and their standardization status. By searching this document, you can find the RFC you want.

Email - POP3

Post Office Protocol version 3 (POP3) is defined in RFC 1939. RFC 2449 defines the POP3 extension mechanism.
The default port is 110.
Request:
Commands in the POP3 consist of a case-insensitive keyword, possibly followed by one or more arguments. All commands are terminated by a CRLF pair. Keywords and arguments consist of printable ASCII characters. Keywords and arguments are each separated by a single SPACE character. Keywords are three or four characters long. Each argument may be up to 40 characters long.
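The request rules above can be sketched in Python as a small validator. The helper name `format_command` is hypothetical, and rejecting spaces inside an argument is a simplification of this sketch: RFC 1939 allows a server to treat spaces in a PASS argument as part of the password.

```python
def format_command(keyword: str, *args: str) -> bytes:
    """Build one POP3 request line (hypothetical helper, not a library API)."""
    # Keywords are three or four characters long.
    if not (keyword.isascii() and keyword.isalpha() and 3 <= len(keyword) <= 4):
        raise ValueError(f"bad keyword: {keyword!r}")
    for arg in args:
        # Each argument may be up to 40 characters long.
        if not 1 <= len(arg) <= 40:
            raise ValueError(f"argument must be 1-40 characters: {arg!r}")
        # Printable ASCII; SPACE is reserved as the separator in this sketch.
        if any(not 33 <= ord(c) <= 126 for c in arg):
            raise ValueError(f"argument must be printable ASCII without spaces: {arg!r}")
    # Keywords are case-insensitive; upper case is just the usual convention.
    line = " ".join([keyword.upper(), *args])
    return line.encode("ascii") + b"\r\n"  # every command ends with CRLF


print(format_command("list", "2"))  # b'LIST 2\r\n'
```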
Response
Responses in the POP3 consist of a status indicator and a keyword possibly followed by additional information. All responses are terminated by a CRLF pair. Responses may be up to 512 characters long, including the terminating CRLF. There are currently two status indicators: positive ("+OK") and negative ("-ERR"). Servers MUST send the "+OK" and "-ERR" in upper case.
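A minimal sketch of splitting a response line into its status indicator and the rest, assuming the line has been read as bytes including its CRLF; `parse_status_line` is a hypothetical name.

```python
from typing import Tuple


def parse_status_line(line: bytes) -> Tuple[bool, str]:
    """Return (is_positive, text_after_indicator) for one POP3 response line."""
    if len(line) > 512:  # the limit includes the terminating CRLF
        raise ValueError("response line longer than 512 octets")
    if not line.endswith(b"\r\n"):
        raise ValueError("response line must end with CRLF")
    text = line[:-2].decode("ascii", errors="replace")
    # Servers MUST send the indicators in upper case, so compare exactly.
    for indicator, positive in (("+OK", True), ("-ERR", False)):
        if text == indicator or text.startswith(indicator + " "):
            return positive, text[len(indicator):].lstrip(" ")
    raise ValueError(f"unknown status indicator: {text!r}")


print(parse_status_line(b"+OK POP3 server ready\r\n"))  # (True, 'POP3 server ready')
```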
Life Cycle
A POP3 session progresses through a number of states during its lifetime. Once the TCP connection has been opened and the POP3 server has sent the greeting, the session enters the AUTHORIZATION state. In this state, the client must identify itself to the POP3 server. Once the client has successfully done this, the server acquires resources associated with the client's maildrop, and the session enters the TRANSACTION state. In this state, the client requests actions on the part of the POP3 server. When the client has issued the QUIT command, the session enters the UPDATE state. In this state, the POP3 server releases any resources acquired during the TRANSACTION state and says goodbye. The TCP connection is then closed.
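The life cycle above can be modelled as a small state machine. This is a sketch: the event names (`auth_ok`, `QUIT`, `done`) and the extra `CLOSED` state are this example's own conventions, not terms from RFC 1939.

```python
from enum import Enum, auto


class Pop3State(Enum):
    AUTHORIZATION = auto()
    TRANSACTION = auto()
    UPDATE = auto()
    CLOSED = auto()  # not an RFC 1939 state: marks the closed TCP connection


def next_state(state: Pop3State, event: str) -> Pop3State:
    """Return the session state after `event` (hypothetical event names)."""
    if state is Pop3State.AUTHORIZATION:
        if event == "auth_ok":  # USER/PASS or APOP succeeded
            return Pop3State.TRANSACTION
        if event == "QUIT":  # QUIT here ends the session without UPDATE
            return Pop3State.CLOSED
    elif state is Pop3State.TRANSACTION:
        if event == "QUIT":
            return Pop3State.UPDATE
    elif state is Pop3State.UPDATE:
        if event == "done":  # resources released, goodbye sent
            return Pop3State.CLOSED
    return state  # any other command leaves the state unchanged


state = Pop3State.AUTHORIZATION
for event in ("USER", "auth_ok", "STAT", "QUIT", "done"):
    state = next_state(state, event)
print(state)  # Pop3State.CLOSED
```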
AUTHORIZATION
Once the TCP connection has been opened by a POP3 client, the POP3 server issues a one line greeting. This can be any positive response. An example might be:
S: +OK POP3 server ready
The POP3 session is now in the AUTHORIZATION state. The client must now identify and authenticate itself to the POP3 server. RFC 1939 describes two possible mechanisms for doing this: the USER and PASS command combination, and the APOP command.
TRANSACTION
Commands:
STAT
The POP3 server issues a positive response with a line containing information for the maildrop. This line is called a "drop listing" for that maildrop. The positive response consists of "+OK" followed by a single space, the number of messages in the maildrop, a single space, and the size of the maildrop in octets. RFC 1939 makes no requirement on what follows the maildrop size.
Examples:
     C: STAT
     S: +OK 2 320
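Parsing the drop listing is a matter of splitting on single spaces. A sketch with a hypothetical helper name, which ignores anything after the size because RFC 1939 leaves that open:

```python
from typing import Tuple


def parse_drop_listing(line: str) -> Tuple[int, int]:
    """Parse a positive STAT response such as '+OK 2 320' into (messages, octets)."""
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) < 3 or fields[0] != "+OK":
        raise ValueError(f"not a positive STAT response: {line!r}")
    # Anything after the size is ignored: RFC 1939 makes no requirement on it.
    return int(fields[1]), int(fields[2])


print(parse_drop_listing("+OK 2 320"))  # (2, 320)
```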

LIST [msg]
Argument: a message-number (optional), which, if present, may NOT refer to a message marked as deleted. If an argument is given, the POP3 server issues a positive response with a line containing information for that message. This line is called a "scan listing" for that message.
If no argument is given and the POP3 server issues a positive response, the response is multi-line. After the initial +OK, for each message in the maildrop, the POP3 server responds with a line containing information for that message. This line is also called a "scan listing" for that message.
Examples:
C: LIST
S: +OK 2 messages (320 octets)
S: 1 120
S: 2 200
S: .
...
C: LIST 2
S: +OK 2 200
...
C: LIST 3
S: -ERR no such message, only 2 messages in maildrop
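Once the multi-line body of a LIST response has been read, the scan listings can be turned into a lookup table. A sketch, assuming the caller has already stripped the CRLFs and the terminating "." line; `parse_scan_listings` is a hypothetical name.

```python
from typing import Dict, List


def parse_scan_listings(lines: List[str]) -> Dict[int, int]:
    """Map message-number -> size in octets from LIST scan listings."""
    sizes: Dict[int, int] = {}
    for line in lines:
        number, size = line.split(" ")[:2]  # ignore any extra fields
        sizes[int(number)] = int(size)
    return sizes


listing = parse_scan_listings(["1 120", "2 200"])
print(listing, sum(listing.values()))  # {1: 120, 2: 200} 320
```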
RETR msg
If the POP3 server issues a positive response, then the response given is multi-line. After the initial +OK, the POP3 server sends the message corresponding to the given message-number, being careful to byte-stuff the termination character (as with all multi-line responses).
Examples:
C: RETR 1
S: +OK 120 octets
S: <the POP3 server sends the entire message here>
S: .
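Multi-line responses end with a line containing a single ".", and the server byte-stuffs any data line that starts with "." by prefixing another "."; the client removes it. A minimal Python sketch of the client side, reading from any binary file-like object (here an in-memory buffer stands in for the socket); `read_multiline` is a hypothetical name.

```python
import io
from typing import BinaryIO, List


def read_multiline(stream: BinaryIO) -> List[bytes]:
    """Read the data lines of a multi-line response (after the +OK line)."""
    lines: List[bytes] = []
    while True:
        raw = stream.readline()
        if not raw:
            raise EOFError("connection closed before the terminating '.'")
        line = raw[:-2] if raw.endswith(b"\r\n") else raw.rstrip(b"\n")
        if line == b".":  # termination line
            return lines
        if line.startswith(b"."):
            line = line[1:]  # undo byte-stuffing
        lines.append(line)


sample = io.BytesIO(b"Subject: hi\r\n\r\n..leading dot\r\nbody\r\n.\r\n")
print(read_multiline(sample))  # [b'Subject: hi', b'', b'.leading dot', b'body']
```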
DELE msg
The POP3 server marks the message as deleted. Any future reference to the message-number associated with the message in a POP3 command generates an error. The POP3 server does not actually delete the message until the POP3 session enters the UPDATE state.
Examples:
C: DELE 1
S: +OK message 1 deleted
...
C: DELE 2
S: -ERR message 2 already deleted
NOOP
The POP3 server does nothing; it merely replies with a positive response.
RSET
If any messages have been marked as deleted by the POP3 server, they are unmarked. The POP3 server then replies with a positive response.
UPDATE
When the client issues the QUIT command from the TRANSACTION state, the POP3 session enters the UPDATE state. (Note that if the client issues the QUIT command from the AUTHORIZATION state, the POP3 session terminates but does NOT enter the UPDATE state.)
QUIT
The POP3 server removes all messages marked as deleted from the maildrop and replies as to the status of this operation.
Optional Commands
TOP msg n
Returns the headers of message msg, the blank line separating the headers from the body, and then the first n lines of the body.
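UIDL [msg]
If an argument is given, the POP3 server responds with a line containing information for that message, called a "unique-id listing"; without an argument, the response is multi-line, with one such line per message. A unique-id listing consists of the message-number of the message, followed by a single space and the unique-id of the message. No information follows the unique-id in the unique-id listing.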
USER name
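To authenticate using the USER and PASS command combination, the client must first issue the USER command. If the server replies +OK, the client then issues PASS with the password (or QUIT to end the session).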
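APOP name digest
A POP3 server which implements the APOP command will include a timestamp in its banner greeting. The syntax of the timestamp corresponds to the `msg-id' in [RFC822], and MUST be different each time the POP3 server issues a banner greeting. The digest is the MD5 hash of the timestamp (including its angle brackets) followed by a secret shared by the client and the server, sent as lower-case hexadecimal.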
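A sketch of the client-side APOP computation using Python's `hashlib`. Extracting the timestamp with a regular expression and encoding the secret as UTF-8 are assumptions of this example, and `apop_digest`/`apop_command` are hypothetical names; the sample greeting, user name and secret follow the example in RFC 1939.

```python
import hashlib
import re


def apop_digest(greeting: str, shared_secret: str) -> str:
    """MD5 of the banner timestamp (with angle brackets) followed by the secret."""
    match = re.search(r"<[^<>]+>", greeting)
    if match is None:
        raise ValueError("greeting has no timestamp: server does not offer APOP")
    timestamp = match.group(0)  # includes the angle brackets
    # Encoding the secret as UTF-8 is an assumption of this sketch.
    return hashlib.md5((timestamp + shared_secret).encode("utf-8")).hexdigest()


def apop_command(name: str, greeting: str, shared_secret: str) -> str:
    return f"APOP {name} {apop_digest(greeting, shared_secret)}\r\n"


greeting = "+OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>"
print(apop_command("mrose", greeting, "tanstaaf"))
```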

Email and MIME(4): MIME type specification

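RFC 2046 defines the MIME media types. Two common subtypes of multipart are:
mixed: the body parts are independent and need to be bundled in a particular order.
alternative: each body part is an alternative version of the same information, ordered by increasing faithfulness to the original, so a client should display the last part it can handle.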
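Python's standard `email` package can show the difference between the two subtypes. In this sketch (the helper names are hypothetical), `set_content` followed by `add_alternative` produces multipart/alternative, while `add_attachment` produces multipart/mixed; the library generates the required boundary parameter when the message is serialized.

```python
from email.message import EmailMessage


def build_alternative() -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = "multipart/alternative demo"
    msg.set_content("Hello in plain text.")  # simplest version first
    msg.add_alternative("<p>Hello in <b>HTML</b>.</p>", subtype="html")  # richer version last
    return msg


def build_mixed() -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = "multipart/mixed demo"
    msg.set_content("See the attached file.")
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="data.bin")
    return msg


alt = build_alternative()
print(alt.get_content_type())                            # multipart/alternative
print([p.get_content_type() for p in alt.iter_parts()])  # ['text/plain', 'text/html']
print(build_mixed().get_content_type())                  # multipart/mixed
```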

Email and MIME(5): Canonical model

RFC 2049 (MIME Part Five: Conformance Criteria and Examples) describes the canonical encoding model.

Sunday, September 09, 2007

Email and MIME(3): message header extension

RFC 2047. In earlier RFCs, the values of field names and field bodies were limited to US-ASCII characters. This extension to message headers lets users write header values in their own language (e.g. Chinese). More detail is in RFC 2047.

An 'encoded-word' is defined by the following ABNF grammar:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
charset = token ; see section 3
encoding = token ; see section 4
token = 1*<Any CHAR except SPACE, CTLs, and especials>
especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / <"> / "/" / "[" / "]" / "?" / "." / "="
encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>
; (but see "Use of encoded-words in message
; headers", section 5)

The "?" character is used within an 'encoded-word' to separate the various portions of the 'encoded-word' from one another, and thus cannot appear in the 'encoded-text' portion.

An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used. While there is no limit to the length of a multiple-line header field, each line of a header field that contains one or more 'encoded-word's is limited to 76 characters.

Unencoded white space characters (such as SPACE and HTAB) are FORBIDDEN within an 'encoded-word'. For example, the character sequence
=?iso-8859-1?q?this is some text?=
would be parsed as four 'atom's, rather than as a single 'atom' (by an RFC 822 parser) or 'encoded-word' (by a parser which understands 'encoded-words'). The correct way to encode the string "this is some text" is to encode the SPACE characters as well, e.g.
=?iso-8859-1?q?this=20is=20some=20text?=

An 'encoded-word' may replace 'text' in an unstructured field such as Subject or Comments, may appear within a 'comment', and may replace a 'word' within a 'phrase' (for example a display name). These are the ONLY locations where an 'encoded-word' may appear. In particular:
+ An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.
+ An 'encoded-word' MUST NOT appear within a 'quoted-string'.
+ An 'encoded-word' MUST NOT be used in a Received header field.
+ An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'.

Initially, the legal values for "encoding" are "Q" and "B". These encodings are described in section 4 of RFC 2047. The "Q" encoding is recommended for use when most of the characters to be encoded are in the ASCII character set; otherwise, the "B" encoding should be used.
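A minimal sketch of building single encoded-words in Python, checked with the standard `email.header.decode_header`. The helper names are hypothetical, and two simplifications are assumptions of this example: it does not split text that would exceed the 75-character limit, and its Q encoder escapes every octet other than ASCII letters and digits (writing SPACE as =20 as in the example above, although RFC 2047 also allows "_").

```python
import base64
from email.header import decode_header


def b_encoded_word(text: str, charset: str = "utf-8") -> str:
    """Build one 'B' (base64) encoded-word; no splitting at 75 characters."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def q_encoded_word(text: str, charset: str = "utf-8") -> str:
    """Build one simplified 'Q' encoded-word: ASCII letters and digits stay
    literal, every other octet (including SPACE) becomes =XX."""
    pieces = []
    for byte in text.encode(charset):
        char = chr(byte)
        if char.isascii() and char.isalnum():
            pieces.append(char)
        else:
            pieces.append(f"={byte:02X}")
    return f"=?{charset}?Q?{''.join(pieces)}?="


word = q_encoded_word("this is some text", "iso-8859-1")
print(word)  # =?iso-8859-1?Q?this=20is=20some=20text?=
print(decode_header(word))
```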

Email and MIME(2)

To support non-English languages and the transmission of non-text data (images, video clips, ...), MIME was defined (RFC 2045 through RFC 2049).
MIME adds several header fields beyond those defined in RFC 2822:
(1) A MIME-Version header field, which uses a version
number to declare a message to be conformant with MIME
and allows mail processing agents to distinguish
between such messages and those generated by older or
non-conformant software, which are presumed to lack
such a field.

(2) A Content-Type header field, generalized from RFC 1049,
which can be used to specify the media type and subtype
of data in the body of a message and to fully specify
the native representation (canonical form) of such
data.

(3) A Content-Transfer-Encoding header field, which can be
used to specify both the encoding transformation that
was applied to the body and the domain of the result.
Encoding transformations other than the identity
transformation are usually applied to data in order to
allow it to pass through mail transport mechanisms
which may have data or character set limitations.

(4) Two additional header fields that can be used to
further describe the data in a body, the Content-ID and
Content-Description header fields.

Data type:
7bit Data
"7bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821]. No octets with decimal values greater than 127
are allowed and neither are NULs (octets with decimal value 0). CR
(decimal value 13) and LF (decimal value 10) octets only occur as
part of CRLF line separation sequences.

8bit Data
"8bit data" refers to data that is all represented as relatively
short lines with 998 octets or less between CRLF line separation
sequences [RFC-821], but octets with decimal values greater than 127
may be used. As with "7bit data" CR and LF octets only occur as part
of CRLF line separation sequences and no NULs are allowed.

Binary Data
"Binary data" refers to data where any sequence of octets whatsoever
is allowed.
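The three definitions can be checked mechanically. A sketch of a hypothetical `body_domain` helper that returns the narrowest domain a body fits in, assuming the body uses CRLF line separators:

```python
def body_domain(data: bytes) -> str:
    """Return '7bit', '8bit' or 'binary' for a CRLF-delimited body."""
    for line in data.split(b"\r\n"):
        # Over-long lines, bare CR/LF and NULs are only allowed in binary data.
        if len(line) > 998 or b"\r" in line or b"\n" in line or b"\x00" in line:
            return "binary"
    # Octets above 127 are allowed in 8bit data but not in 7bit data.
    if any(byte > 127 for byte in data):
        return "8bit"
    return "7bit"


print(body_domain(b"Hello\r\nWorld\r\n"))        # 7bit
print(body_domain("café\r\n".encode("utf-8")))  # 8bit
print(body_domain(b"line one\nline two"))       # binary
```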
Formal Description of MIME headers:
entity-headers := [ content CRLF ][ encoding CRLF ][ id CRLF ][ description CRLF ]*( MIME-extension-field CRLF )

MIME-message-headers := entity-headers fields
version CRLF
; The ordering of the header
; fields implied by this BNF
; definition should be ignored.

MIME-part-headers := entity-headers [ fields ]
; Any field not beginning with
; "content-" can have no defined
; meaning and may be ignored.
; The ordering of the header
; fields implied by this BNF
; definition should be ignored.
Header explanation
MIME-Version:
For example: MIME-Version: 1.0
Note that the MIME-Version header field is required at the top level of a message. It is not required for each body part of a multipart entity. It is required for the embedded headers of a body of type "message/rfc822" or "message/partial" if and only if the embedded message is itself claimed to be MIME-conformant.
It is also worth noting that version control for specific media types is not accomplished using the MIME-Version mechanism. In particular, some formats (such as application/postscript) have version numbering conventions that are internal to the media format. Where such conventions exist, MIME does nothing to supersede them. Where no such conventions exist, a MIME media type might use a "version" parameter in the content-type field if necessary.
Content-Type
The Content-Type header field specifies the nature of the data in the body of an entity by giving media type and subtype identifiers, and by providing auxiliary information that may be required for certain media types. After the media type and subtype names, the remainder of the header field is simply a set of parameters, specified in an attribute=value notation. The ordering of parameters is not significant. The set of meaningful parameters depends on the media type and subtype.
For example, the "charset" parameter is applicable to any subtype of "text", while the "boundary" parameter is required for any subtype of the "multipart" media type.
If another top-level type is to be used for any reason, it must be given a name starting with "X-" to indicate its non-standard status and to avoid a potential conflict with a future official name.
content := "Content-Type" ":" type "/" subtype
*(";" parameter)
; Matching of media type and subtype
; is ALWAYS case-insensitive.

type := discrete-type / composite-type

discrete-type := "text" / "image" / "audio" / "video" /
"application" / extension-token

composite-type := "message" / "multipart" / extension-token

extension-token := ietf-token / x-token

ietf-token := <An extension token defined by a
standards-track RFC and registered
with IANA.>

x-token := <The two characters "X-" or "x-" followed, with
no intervening white space, by any token>

subtype := extension-token / iana-token

iana-token := <A publicly-defined extension token. Tokens
of this form must be registered with IANA
as specified in RFC 2048.>

parameter := attribute "=" value

attribute := token
; Matching of attributes
; is ALWAYS case-insensitive.

value := token / quoted-string

token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
or tspecials>

tspecials := "(" / ")" / "<" / ">" / "@" /
"," / ";" / ":" / "\" / <"> /
"/" / "[" / "]" / "?" / "="
; Must be in quoted-string,
; to use within parameter values
Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value.
Content-type: text/plain; charset=us-ascii (Plain text)
Content-type: text/plain; charset="us-ascii"
are equivalent.
If no Content-Type header field is specified, the default is:
Content-type: text/plain; charset=us-ascii
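A sketch of a parser for the Content-Type grammar above. The regular expressions are this example's own approximation of `token` and `quoted-string`, and `parse_content_type` is a hypothetical name. It lower-cases type, subtype and attribute names (matching is case-insensitive), strips the delimiting quotes, and falls back to the default above when no header is present or the type/subtype cannot be parsed (the fallback for unparsable headers is an assumption of the sketch). Parsing stops at anything it does not recognise, such as the "(Plain text)" comment.

```python
import re
from typing import Dict, Optional, Tuple

# token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
TOKEN = r'[^\x00-\x20\x7f()<>@,;:\\"/\[\]?=]+'
TYPE_RE = re.compile(r"\s*(" + TOKEN + r")\s*/\s*(" + TOKEN + r")")
PARAM_RE = re.compile(
    r"\s*;\s*(" + TOKEN + r')\s*=\s*(?:(' + TOKEN + r')|"((?:[^"\\]|\\.)*)")'
)


def parse_content_type(value: Optional[str]) -> Tuple[str, str, Dict[str, str]]:
    """Parse a Content-Type field body into (type, subtype, parameters)."""
    match = TYPE_RE.match(value or "")
    if match is None:
        # No (or unparsable) header: fall back to text/plain; charset=us-ascii.
        return "text", "plain", {"charset": "us-ascii"}
    mtype, subtype = match.group(1).lower(), match.group(2).lower()
    params: Dict[str, str] = {}
    pos = match.end()
    while True:
        pm = PARAM_RE.match(value, pos)
        if pm is None:
            break
        attr = pm.group(1).lower()  # attribute matching is case-insensitive
        if pm.group(2) is not None:
            params[attr] = pm.group(2)
        else:
            # Quotes delimit the value but are not part of it; undo quoted-pairs.
            params[attr] = re.sub(r"\\(.)", r"\1", pm.group(3))
        pos = pm.end()
    return mtype, subtype, params


print(parse_content_type('text/plain; charset="us-ascii"'))
# ('text', 'plain', {'charset': 'us-ascii'})
```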
Content-Transfer-Encoding
encoding := "Content-Transfer-Encoding" ":" mechanism
mechanism := "7bit" / "8bit" / "binary" /"quoted-printable" / "base64" /ietf-token / x-token
The default value is 7bit.
Three transformations are currently defined: identity, the "quoted-printable" encoding, and the "base64" encoding. The domains are "binary", "8bit" and "7bit".
The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all mean that the identity (i.e. NO) encoding transformation has been performed. As such, they serve simply as indicators of the domain of the body data, and provide useful information about the sort of encoding that might be needed for transmission in a given transport system.
Implementors may, if necessary, define private Content-Transfer-Encoding values, but must use an x-token, which is a name prefixed by "X-", to indicate its non-standard status, e.g., "Content-Transfer-Encoding: x-my-new-encoding".
Currently the only composite media types are "multipart" and "message". All encodings that are desired for bodies of type multipart or message must be done at the innermost level, by encoding the actual body that needs to be encoded. In other words, recursive encoding is not allowed.
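Python's standard `quopri` and `base64` modules implement the two non-identity transformations. The rule used here to choose between them (quoted-printable when at most a quarter of the octets are non-ASCII, otherwise base64) is a heuristic assumed for this sketch, not a rule from RFC 2045, and the helper names are hypothetical. Note that `base64.encodebytes` wraps its output with LF line endings, which a real mail generator would turn into CRLF.

```python
import base64
import quopri
from typing import Tuple


def encode_body(data: bytes) -> Tuple[str, bytes]:
    """Return (mechanism, encoded bytes) chosen by a simple heuristic."""
    non_ascii = sum(1 for byte in data if byte > 127)
    if non_ascii * 4 <= len(data):
        # Mostly ASCII: quoted-printable keeps the text largely readable.
        return "quoted-printable", quopri.encodestring(data)
    return "base64", base64.encodebytes(data)


def decode_body(mechanism: str, encoded: bytes) -> bytes:
    if mechanism == "quoted-printable":
        return quopri.decodestring(encoded)
    if mechanism == "base64":
        return base64.decodebytes(encoded)
    return encoded  # 7bit / 8bit / binary: identity transformation


mech, encoded = encode_body("café au lait".encode("utf-8"))
print(mech, encoded)
```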
Content-ID
Syntactically identical to the "Message-ID" header field:
id := "Content-ID" ":" msg-id
Like the Message-ID values, Content-ID values must be generated to be world-unique.
Content-Description
A short, human-readable description of the body's content.

All MIME headers:
MIME-Version
Content-ID
Content-Description
Content-Transfer-Encoding
Content-Type
Content-Base
Content-Location
Content-features
Content-Disposition
Content-Language
Content-Alternative
Content-MD5
Content-Duration

Email and MIME(1)

Note: Most of the content in this article is quoted from RFC 2822 (http://tools.ietf.org/html/rfc2822).

The Internet Message Format is defined in RFC 2822 (its predecessors are RFC 822 and RFC 733).
MIME is defined in RFC 2045 through RFC 2049.
An Internet message consists of two parts: a header and a body.
The standard header fields of mail and MIME are registered in RFC 4021, which collects header fields defined in other RFCs (e.g. RFC 2822), so it serves as a quick reference for programmers.

Basics:
At the most basic level, a message is a series of characters. A message that is conformant with RFC 2822 is composed of characters with values in the range 1 through 127 and interpreted as US-ASCII characters. Messages are divided into lines of characters. A line is a series of characters that is delimited with the two characters carriage-return and line-feed; that is, the carriage return (CR) character (ASCII value 13) followed immediately by the line feed (LF) character (ASCII value 10). (The carriage-return/line-feed pair is usually written in this document as "CRLF".)
A message consists of header fields (collectively called "the header of the message") followed, optionally, by a body. The header is a sequence of lines of characters with special syntax as defined in RFC 2822. The body is simply a sequence of characters that follows the header and is separated from the header by an empty line (i.e., a line with nothing preceding the CRLF).
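Splitting a message into header and body only needs the first empty line. A minimal sketch, assuming the whole message is already in memory as a string with CRLF line endings; `split_message` is a hypothetical name, and the header lines it returns may still be folded.

```python
from typing import List, Tuple


def split_message(raw: str) -> Tuple[List[str], str]:
    """Split a message into header lines and body at the first empty line."""
    head, sep, body = raw.partition("\r\n\r\n")
    if not sep:  # no empty line: the message has a header only
        head = raw[:-2] if raw.endswith("\r\n") else raw
        body = ""
    return head.split("\r\n"), body


headers, body = split_message("From: a@example.com\r\nSubject: hi\r\n\r\nHello\r\n")
print(headers)     # ['From: a@example.com', 'Subject: hi']
print(repr(body))  # 'Hello\r\n'
```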

Header Fields
Format: field-name:field-body

Header fields are lines composed of a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of any US-ASCII characters, except for CR and LF. However, a field body may contain CRLF when used in header "folding" and "unfolding".

For convenience, however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple-line representation; this is called "folding". The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP.

For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test

Note: Though structured field bodies are defined in such a way that folding can take place between many of the lexical tokens (and even within some of the lexical tokens), folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks. For instance, if a field body is defined as comma-separated values, it is recommended that folding occur after the comma separating the structured items in preference to other places where the field could be folded, even if it is allowed elsewhere.

The process of moving from this folded multiple-line representation of a header field to its single line representation is called "unfolding". Unfolding is accomplished by simply removing any CRLF that is immediately followed by WSP. Each header field should be treated in its unfolded form for further syntactic and semantic evaluation.
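Unfolding is a single substitution; folding needs a policy for where to break. A sketch of both, where `fold` is a naive, hypothetical helper that breaks only at spaces and leaves a single word longer than `limit` on its own line:

```python
import re


def unfold(field: str) -> str:
    """Remove every CRLF that is immediately followed by WSP (space or tab)."""
    return re.sub(r"\r\n(?=[ \t])", "", field)


def fold(field: str, limit: int = 78) -> str:
    """Naive folding: insert CRLF before a space when a line would exceed limit."""
    lines, current = [], ""
    for word in field.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) > limit and current:
            lines.append(current)
            current = " " + word  # the continuation line starts with the WSP
        else:
            current = candidate
    lines.append(current)
    return "\r\n".join(lines)


print(repr(unfold("Subject: This\r\n is a test")))  # 'Subject: This is a test'
```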
Body
The body of a message is simply lines of US-ASCII characters. The only two limitations on the body are as follows:
- CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body.
- Lines of characters in the body MUST be limited to 998 characters, and SHOULD be limited to 78 characters, excluding the CRLF.
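A sketch that reports violations of the two body rules, treating the 998-character limit as an error and the 78-character limit as a warning; the helper name and messages are hypothetical.

```python
from typing import List


def body_problems(body: bytes) -> List[str]:
    """List violations of the RFC 2822 body rules (MUST and SHOULD)."""
    problems: List[str] = []
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        if b"\r" in line or b"\n" in line:
            problems.append(f"line {number}: bare CR or LF")
        if len(line) > 998:
            problems.append(f"line {number}: longer than 998 characters (MUST)")
        elif len(line) > 78:
            problems.append(f"line {number}: longer than 78 characters (SHOULD)")
    return problems


print(body_problems(b"Hello\r\nWorld\r\n"))  # []
```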
Token definition:
NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
%d11 / ; that do not include the
%d12 / ; carriage return, line feed,
%d14-31 / ; and white space characters
%d127
text = %d1-9 / ; Characters excluding CR and LF
%d11 /
%d12 /
%d14-127 /
obs-text

specials = "(" / ")" / ; Special characters used in
"<" / ">" / ; other parts of the syntax
"[" / "]" /
":" / ";" /
"@" / "\" /
"," / "." /
DQUOTE
How to escape characters
Some characters are reserved for special interpretation, such as delimiting lexical tokens. To permit use of these characters as uninterpreted data, a quoting mechanism is provided.

quoted-pair = ("\" text) / obs-qp

Where any quoted-pair appears, it is to be interpreted as the text character alone. That is to say, the "\" character that appears as part of a quoted-pair is semantically "invisible".
Note: The "\" character may appear in a message where it is not part of a quoted-pair. A "\" character that does not appear in a quoted-pair is not semantically invisible. The only places in this standard where quoted-pair currently appears are ccontent, qcontent, dcontent, no-fold-quote, and no-fold-literal.

Comments
Strings of characters enclosed in parentheses are considered comments so long as they do not appear within a "quoted-string". There are several places in this standard where comments and FWS may be freely inserted. To accommodate that syntax, an additional token for "CFWS" is defined for places where comments and/or FWS can occur.
FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
obs-FWS
ctext = NO-WS-CTL / ; Non white space controls
%d33-39 / ; The rest of the US-ASCII
%d42-91 / ; characters not including "(",
%d93-126 ; ")", or "\"
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
CFWS = *([FWS] comment) (([FWS] comment) / FWS)
A comment is normally used in a structured field body to provide some human readable informational text. Since a comment is allowed to contain FWS, folding is permitted within the comment. Also note that since quoted-pair is allowed in a comment, the parentheses and backslash characters may appear in a comment so long as they appear as a quoted-pair.
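Because comments can nest and can contain quoted-pairs, removing them needs a small scanner rather than a single regular expression. A sketch (hypothetical helper `strip_comments`) that also leaves parentheses inside a quoted-string alone; it does not remove the FWS around a comment.

```python
def strip_comments(value: str) -> str:
    """Remove comments, honouring nesting, quoted-pairs and quoted-strings."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string (only possible at depth 0)
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            if depth == 0:
                out.append(value[i:i + 2])  # keep quoted-pairs outside comments
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == '"' and depth == 0:
            in_quotes = True
            out.append(ch)
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(strip_comments("jdoe@example.com (John (the) Doe)")))  # 'jdoe@example.com '
```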

Email Address
address         =       mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
group = display-name ":" [mailbox-list / CFWS] ";"
[CFWS]
display-name = phrase
mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
address-list = (address *("," address)) / obs-addr-list
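Python's standard `email.utils` module parses and formats the common forms of this grammar. It is a lenient parser rather than a full validator, and the sketch below (with hypothetical example addresses) does not cover the group syntax:

```python
from email.utils import formataddr, getaddresses, parseaddr

# name-addr: a display-name followed by an angle-addr
print(parseaddr("John Doe <jdoe@example.com>"))
# ('John Doe', 'jdoe@example.com')

# address-list: a bare addr-spec and a quoted display-name containing a comma
print(getaddresses(['mary@example.net, "Doe, John" <jdoe@example.com>']))
# [('', 'mary@example.net'), ('Doe, John', 'jdoe@example.com')]

# Build a mailbox back from (display-name, addr-spec)
print(formataddr(("John Doe", "jdoe@example.com")))
# John Doe <jdoe@example.com>
```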
Field Definition
fields          =       *(trace
*(resent-date /
resent-from /
resent-sender /
resent-to /
resent-cc /
resent-bcc /
resent-msg-id))
*(orig-date /
from /
sender /
reply-to /
to /
cc /
bcc /
message-id /
in-reply-to /
references /
subject /
comments /
keywords /
optional-field)
Date: The origination date specifies the date and time at which the creator of the message indicated that the message was complete and ready to enter the mail delivery system.
From: the author(s) of the message. When a message has more than one author, this field contains more than one mailbox, and the "Sender:" field MUST then also be present.
Sender: the mailbox of the agent responsible for the actual transmission of the message. This is different from "From:". For example, a secretary can send email on behalf of her superior. In this case, the "From:" field should contain the superior and the "Sender:" field should contain the secretary. If the author and transmitter are identical, the "Sender:" field should not be used.
Reply-To: When this field is present, it indicates the mailbox(es) to which the author of the message suggests that replies be sent.
To: contains the address(es) of the primary recipient(s) of the message.
Cc: contains the addresses of others who are to receive the message, though the content of the message may not be directed at them.
Bcc: contains addresses of recipients of the message whose addresses are not to be revealed to other recipients of the message. There are three ways in which the "Bcc:" field is used: (1) the "Bcc:" line is removed from every copy that is sent, even though all recipients (including those in "Bcc:") receive a copy; (2) recipients in "To:" and "Cc:" receive a copy without the "Bcc:" line, while the "Bcc:" recipients receive a separate copy that contains it; (3) a "Bcc:" field with no addresses is sent to indicate that blind copies were sent to someone.
Message-ID: provides a unique message identifier that refers to a particular version of a particular message.
In-Reply-To and References: These two fields are used when creating a reply to a message. They hold the message identifier of the original message and the message identifiers of other messages (for example, in the case of a reply to a message which was itself a reply). The "In-Reply-To:" field may be used to identify the message (or messages) to which the new message is a reply, while the "References:" field may be used to identify a "thread" of conversation.
Keywords: contains a comma-separated list of one or more words or quoted-strings.
Subject: contains a short string identifying the topic of the message.
Comments: contains any additional comments on the text of the body of the message.
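The In-Reply-To and References rules can be sketched with Python's `email` package. This follows a simplified version of RFC 2822 section 3.6.4: In-Reply-To gets the parent's Message-ID, and References gets the parent's References (if any) followed by the parent's Message-ID. The helper `make_reply`, the "Re: " subject convention and the example.com domain passed to `make_msgid` are assumptions of this example.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def make_reply(parent: EmailMessage, author: str, body: str) -> EmailMessage:
    """Build a reply whose In-Reply-To/References point back at parent."""
    reply = EmailMessage()
    reply["From"] = author
    # Reply-To, when present, is where the parent's author wants replies sent.
    reply["To"] = str(parent.get("Reply-To") or parent.get("From", ""))
    subject = str(parent.get("Subject", ""))
    reply["Subject"] = subject if subject.lower().startswith("re:") else "Re: " + subject
    reply["Date"] = formatdate(localtime=True)
    reply["Message-ID"] = make_msgid(domain="example.com")  # hypothetical domain
    parent_id = parent.get("Message-ID")
    if parent_id:
        reply["In-Reply-To"] = str(parent_id)
        # References = parent's References (if any) + parent's Message-ID
        references = str(parent.get("References", ""))
        reply["References"] = (references + " " + str(parent_id)).strip()
    reply.set_content(body)
    return reply


original = EmailMessage()
original["From"] = "Alice <alice@example.com>"
original["Subject"] = "Hello"
original["Message-ID"] = "<1@example.com>"
original.set_content("Hi Bob")

answer = make_reply(original, "Bob <bob@example.com>", "Hi Alice")
print(answer["In-Reply-To"], answer["References"])  # <1@example.com> <1@example.com>
```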
Resent Field
Resent fields SHOULD be added to any message that is reintroduced by a user into the transport system. A separate set of resent fields SHOULD be added each time this is done. All of the resent fields corresponding to a particular resending of the message SHOULD be together. Each new set of resent fields is prepended to the message; that is, the most recent set of resent fields appear earlier in the message. No other fields in the message are changed when resent fields are added.

Each of the resent fields corresponds to a particular field elsewhere in the syntax. For instance, the "Resent-Date:" field corresponds to the "Date:" field and the "Resent-To:" field corresponds to the "To:" field. In each case, the syntax for the field body is identical to the syntax given previously for the corresponding field.

When resent fields are used, the "Resent-From:" and "Resent-Date:" fields MUST be sent. The "Resent-Message-ID:" field SHOULD be sent. "Resent-Sender:" SHOULD NOT be used if "Resent-Sender:" would be identical to "Resent-From:".
The purpose of using resent fields is to have the message appear to the final recipient as if it were sent directly by the original sender, with all of the original fields remaining the same. Each set of resent fields correspond to a particular resending event. That is, if a message is resent multiple times, each set of resent fields gives identifying information for each individual time. Resent fields are strictly informational.
Trace fields: fully discussed in RFC 2821.
The trace fields are a group of header fields consisting of an optional "Return-Path:" field and one or more "Received:" fields.
"Return-Path:" contains a pair of angle brackets that enclose an optional addr-spec.
"Received:" contains a (possibly empty) list of name/value pairs followed by a semicolon and a date-time specification.


Limit of number of characters in a single line
There are two limits that this standard places on the number of characters in a line. Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.

The 998 character limit is due to limitations in many implementations which send, receive, or store Internet Message Format messages that simply cannot handle more than 998 characters on a line. Receiving implementations would do well to handle an arbitrarily large number of characters in a line for robustness sake. However, there are so many implementations which (in compliance with the transport requirements of [RFC2821]) do not accept messages containing more than 1000 characters including the CR and LF per line, it is important for implementations not to create such messages.

The more conservative 78 character recommendation is to accommodate the many implementations of user interfaces that display these messages which may truncate, or disastrously wrap, the display of more than 78 characters per line, in spite of the fact that such implementations are non-conformant to the intent of this specification (and that of [RFC2821] if they actually cause information to be lost). Again, even though this limitation is put on messages, it is incumbent upon implementations which display messages to handle an arbitrarily large number of characters in a line (certainly at least up to the 998 character limit) for the sake of robustness.