Sunday, April 06, 2008

HTTP Form POST Encoding

Form is a common way to submit requests to server with various parameters. In W3C HTML4(http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4), form element has a special attribute "enctype" to indicate how to encode form data for submission.

(1) application/x-www-form-urlencoded (default type)

    from W3C specification:
  1. Control names and values are escaped. Space characters are replaced by `+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by `%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
  2. The control names/values are listed in the order they appear in the document. The name is separated from the value by `=' and name/value pairs are separated from each other by `&'.
Examples
    home=Cosby&favorite+flavor=flies 

(2) multipart/form-data
From W3C specification:
"he content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.

The content "multipart/form-data" follows the rules of all multipart MIME data streams as outlined in [RFC2045]. The definition of "multipart/form-data" is available at the [IANA] registry.

A "multipart/form-data" message contains a series of parts, each representing a successful control. The parts are sent to the processing agent in the same order the corresponding controls appear in the document stream. Part boundaries should not occur in any of the data; how this is done lies outside the scope of this specification.

As with all multipart MIME types, each part has an optional "Content-Type" header that defaults to "text/plain". User agents should supply the "Content-Type" header, accompanied by a "charset" parameter.

Each part is expected to contain:

  1. a "Content-Disposition" header whose value is "form-data".
  2. a name attribute specifying the control name of the corresponding control. Control names originally encoded in non-ASCII character sets may be encoded using the method outlined in [RFC2045]. "
"

Examples:

    Content-Type: multipart/form-data; boundary=AaB03x
   --AaB03x
   Content-Disposition: form-data; name="submit-name"

   Larry
   --AaB03x
   Content-Disposition: form-data; name="files"; filename="file1.txt"
   Content-Type: text/plain

   ... contents of file1.txt ...
   --AaB03x--

No comments: