The Python Oracle

Decoding RFC 2231 headers

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Cosmic Puzzle

--

Chapters
00:00 Decoding Rfc 2231 Headers
02:31 Accepted Answer Score 4
04:00 Answer 2 Score 0
04:34 Thank you

--

Full question
https://stackoverflow.com/questions/1809...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #http #mime #multipartformdata

#avk47



ACCEPTED ANSWER

Score 4


Currently the functions from email.utils are rarely used besides within email.message. Most users seem to prefer using email.message.Message directly. There's even a somewhat old issue report on adding unit tests (that would certainly be usable as examples) to Python, even if I'm not sure on how it relates to email.util.

A short example I found is this blogpost which, however, doesn't contain more than once sentence and a few SLOCs of information about RFC2231 parsing. The author notes, however, that many MTAs use RFC2047 instead. Depending on your usecase, that might also be an issue.

Judging from the few examples I could find I assume your way of parsing using email.util is the only way to go, even if the long list comprehension is somewhat ugly.

Because of the lack of examples in some respect it could be wise to write a new RFC2231 parser (if you really need a better, maybe faster or more beautiful codebase). A new implementation could be based on existing implementations like the Dovecot RFC2231 parser for compatibility reasons (you could even use the Dovecot unit test. As the C code seems quite complex to me and since I can't find any python implementation besides email.util and Python2 backports of email.util the task of porting to Python won't be easy (note that Dovecot is LGPL-licensed, which might be an issue in your project)

I think the email.util RFC2231 API has not been designed for easy standalone usage but more as a pile of utility methods for use in email.message.Message.




ANSWER 2

Score 0


Old question, but I could not find a complete answer that works on this. So this is what I ended up doing (on Python 2.7):

def decode_rfc2231_header(header):
    """Decode a RFC 2231 header"""
    # Remove any quotes
    header = email.utils.unquote(header)
    encoding, language, value = email.utils.decode_rfc2231(header)
    value = urllib.unquote(value)
    return email.utils.collapse_rfc2231_value((encoding, language, value))

For example:

>>> name = u'èéêëēėęûüùúūàáâäæãåāāîïíīįì test ôöòóœøōõssśšłžźżçćčñń'
>>> encoded_header = email.utils.encode_rfc2231(name.encode("utf8"), 'utf8', 'en')
>>> print encoded_header 
utf8'en'%C3%A8%C3%A9%C3%AA%C3%AB%C4%93%C4%97%C4%99%C3%BB%C3%BC%C3%B9%C3%BA%C5%AB%C3%A0%C3%A1%C3%A2%C3%A4%C3%A6%C3%A3%C3%A5%C4%81%C4%81%C3%AE%C3%AF%C3%AD%C4%AB%C4%AF%C3%AC%20test%20%C3%B4%C3%B6%C3%B2%C3%B3%C5%93%C3%B8%C5%8D%C3%B5ss%C5%9B%C5%A1%C5%82%C5%BE%C5%BA%C5%BC%C3%A7%C4%87%C4%8D%C3%B1%C5%84
>>> print decode_rfc2231_header(encoded_header)
èéêëēėęûüùúūàáâäæãåāāîïíīįì test ôöòóœøōõssśšłžźżçćčñń