MIME
Latest revision as of 17:04, 13 January 2009
This page needs to describe MIME headers and body types, including which kinds of MIME bodies can appear inside multipart (tree-structured) entities. This is probably really the Entity class. We need to be able to distinguish HTTP entities from MIME (mail) entities, and each entity needs an indicator of whether it is currently encoded or decoded, plus functions to encode and decode it (MIME transport encoding, gzip, compress, etc.). While an entity is encoded it probably shouldn't be writable, if we can enforce that. Header access should expose the raw header contents as well as an array of values, where each value carries its true value, its raw value group, and a properties array. The body should likewise be available raw (including entity bodies and multipart bodies) and split into entity tree form. Preferably, things are added, changed, or removed only through the methods or the non-raw arrays/hashes, and the raw views are read-only.
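The encoded/decoded indicator and the "not writable while encoded" rule could look roughly like the sketch below. It is only an illustration under assumptions: the class name `Entity`, the methods `set_body`, `encode`, and `decode`, and the choice of gzip as the only encoding are hypothetical and do not come from any existing code.

```python
import gzip


class Entity:
    """Hypothetical entity: an ordered header list plus a body (illustrative names)."""

    def __init__(self, headers=None, body=b""):
        self._headers = list(headers or [])  # ordered (name, value) pairs
        self._body = body
        self.encoded = False  # indicator: is the body currently in encoded form?

    @property
    def raw_body(self):
        # Raw view: read-only, returns the body in whatever state it is in.
        return self._body

    def set_body(self, body):
        # Refuse writes while the entity is encoded.
        if self.encoded:
            raise RuntimeError("entity is encoded; decode it before modifying")
        self._body = body

    def encode(self):
        # Assumption: gzip stands in for any transport or content encoding.
        if not self.encoded:
            self._body = gzip.compress(self._body)
            self._headers.append(("Content-Encoding", "gzip"))
            self.encoded = True

    def decode(self):
        if self.encoded:
            self._body = gzip.decompress(self._body)
            self._headers = [
                h for h in self._headers if h[0].lower() != "content-encoding"
            ]
            self.encoded = False
```

Changes go through `set_body`, while `raw_body` only exposes a view, which matches the preference for modifying things through methods and treating raw access as read-only.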
HTTP-Message := HTTP-Request | HTTP-Response
MIME-Message := entity
HTTP-Request := request-line CRLF entity
HTTP-Response := status-line CRLF entity

entity := header-group CRLF entity-body
entity-body := text-body | entity | multipart-group
multipart-group := leader (CRLF boundary CRLF entity)+ CRLF boundary-end CRLF trailer

text-body := <true entity content>
leader := <content that is not important and must be discarded when encountered>
trailer := <content that is not important and must be discarded when encountered>

header-group := header*
header := header-name ':' header-value CRLF
header-value := values
values := (value-exp ',')* value-exp
value-exp := real-value (';' property-exp)*
real-value := value-key ['=' value-value]
property-exp := property-key ['=' property-value]
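As a rough illustration of the header-value part of this grammar, the sketch below parses a raw header value into a list of values, each with its raw text, value-key, optional value-value, and an ordered properties array. The function names (`parse_header_value` and its helpers) and the dictionary layout are assumptions made for the example. It also drops parenthesized comments and honours double-quoted strings.

```python
import re


def strip_comments(text):
    """Drop '(' ... ')' comments (nesting allowed) that sit outside quoted strings."""
    out, depth, in_quotes, escaped = [], 0, False, False
    for ch in text:
        if in_quotes:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif depth > 0:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        elif ch == "(":
            depth = 1
        else:
            if ch == '"':
                in_quotes = True
            out.append(ch)
    return "".join(out)


def split_outside_quotes(text, sep):
    """Split text on sep, ignoring separators inside double-quoted strings."""
    parts, current, in_quotes, escaped = [], [], False, False
    for ch in text:
        if in_quotes:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            if ch == '"':
                in_quotes = True
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_pair(text):
    """Parse key ['=' value]; the value is None when there is no '='."""
    pieces = split_outside_quotes(text, "=")
    key = pieces[0].strip()
    if len(pieces) == 1:
        return key, None
    value = "=".join(pieces[1:]).strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = re.sub(r"\\(.)", r"\1", value[1:-1])  # unquote and unescape
    return key, value


def parse_header_value(raw):
    """values := (value-exp ',')* value-exp, with properties kept as an ordered list."""
    values = []
    for value_exp in split_outside_quotes(strip_comments(raw), ","):
        segments = split_outside_quotes(value_exp, ";")
        if not segments[0].strip():
            continue
        key, val = parse_pair(segments[0])
        props = [parse_pair(s) for s in segments[1:] if s.strip()]
        values.append(
            {"raw": value_exp.strip(), "key": key, "value": val, "properties": props}
        )
    return values
```

For example, `parse_header_value('text/html; charset="utf-8" (a comment), application/xml;q=0.9')` yields two values, the first with the single property `("charset", "utf-8")`. A real implementation would need per-header exceptions, since some headers (Date, for instance) contain commas that are not value separators.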
- Header comments are in '(' ')' and should not be kept except in the raw header value.
- Header contents can have double quoted strings for atoms.
- Line folding can happen (a header line continued by CRLF followed by LWSP and more text). The entity should unfold headers as soon as possible (right after receiving the mail message) and fold them again only at the last moment (before sending the mail message). HTTP headers don't need line folding.
- Be careful with cookies. Parsed generically, a Cookie header usually appears as one value with everything else as properties of that value, which is incorrect: the entries are usually all distinct values. However, value-keys that start with '$' usually are properties of the preceding value. That is why all "properties" have to be stored as an array, not a hash: names can be duplicated, and order matters. The HTTP-Request class should properly interpret cookies and store them as a hash at that level, with properties appropriately split out; that is where they should be accessed. Similarly, new cookies should be added to the HTTP-Response level cookie jar, where they can be correctly remerged into the proper header format at send time. Probably both request and response should have a linked cookie jar with a better abstraction that properly handles cookie deletion, new cookies, etc., while making cookie actions more transparent to the user (simple access should be possible, with encoding and decoding handled behind it).
- Keep in mind that by and large only the top level entity headers "matter" for the given message type. Deeper entity headers really don't matter much, except to decode/encode their containing bodies, as with all abstract entities. For example, Cookie headers deep in multipart entities of request messages aren't parsed into the request cookie jar. Similarly, transport limitations of mail messages are enforced at the top level entity, and hopefully any lower level entities get encoded before insertion into the message if they need newline preservation or line length preservation contrary to mail specifications.
- Request messages will break up the request string, query string, post data, cookie data, etc., as needed.
- Response messages have to ensure that the proper Apache request_rec values get set based on the header data in the response and the status information, and have to determine which cookies need to be set. They also need obstack support for printing, etc.
- Mail messages have lots to do in terms of encoding and decoding messages for transport, header folding, newline handling, etc. at send and receive time.
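Two of the notes above, header unfolding and the regrouping of cookie "properties", can be sketched as follows. This is a minimal illustration under assumptions: the function names `unfold` and `build_cookie_jar` are hypothetical, and the cookie function expects the ordered (key, value) pairs that a generic header parser would have produced for a Cookie header.

```python
import re


def unfold(header_block):
    """Undo line folding: drop each CRLF that is immediately followed by a space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", header_block)


def build_cookie_jar(pairs):
    """Regroup ordered (key, value) pairs from a Cookie header into a jar.

    Keys starting with '$' are attached to the preceding cookie. Any that
    appear before the first cookie are kept as header-wide attributes.
    Properties stay in an ordered list because names may repeat.
    """
    jar, header_attrs, current = {}, [], None
    for key, value in pairs:
        if key.startswith("$"):
            target = header_attrs if current is None else jar[current]["properties"]
            target.append((key, value))
        else:
            # Assumption: a repeated cookie name simply replaces the earlier one.
            jar[key] = {"value": value, "properties": []}
            current = key
    return jar, header_attrs
```

Given the pairs `$Version=1`, `Customer=WILE_E_COYOTE`, `$Path=/acme`, `Part_Number=Rocket_Launcher_0001`, `$Path=/acme`, the jar ends up with two cookies, each carrying its own `$Path` property, while `$Version` is kept as a header-wide attribute.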