I wanted to sort emails into different mailboxes, depending on the subject line. I tried various approaches to deal with encoded unicode subjects in procmail, and ultimately ended up creating my own solution.
#!/usr/bin/env python3
import datetime
import re
import sys
from email.header import decode_header
from email.parser import Parser


def proc_stdin(backup=True):
    # procmail pipes the message (or only its header) to stdin
    mail = Parser().parsestr(sys.stdin.read())
    if backup is True:
        # Keep a timestamped copy of the input in the working directory
        ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        latest = f"backup_{ts}_latest.eml"
        with open(latest, "w") as fh:
            fh.write(str(mail))
    subject = mail["Subject"]
    if hasattr(subject, "replace"):  # Waaat (e.g. None when the header is missing)
        # Unfold subjects that span several lines
        subject = subject.replace("\n", " ")
        utf = re.compile(r"^(=\?UTF.*)", re.IGNORECASE)
        match = re.match(utf, subject)
        if match:
            dh = decode_header(subject)
            default_charset = "ASCII"
            subject = "".join([str(t[0], t[1] or default_charset) for t in dh])
            # Just output to stdout, procmail parses it, mail clients ignore it
            # The encoding is not right, but whatever
            print(f"Subject-Decoded: {subject}")
    # Always pass the original message through unchanged
    print(mail)


if __name__ == "__main__":
    proc_stdin()
What this script does is fairly trivial: it reads the mail from stdin, checks whether the Subject header starts with encoded unicode content, decodes it, and writes a new Subject-Decoded header to stdout ahead of the original message. You can test it like this:
cat Maildir/cur/the_email | ./headerscript.py
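To see the decoding step on its own, without a real mail file, here is a minimal sketch. The helper name decode_subject and the sample subject are made up for illustration; only decode_header comes from the script above. Unlike the script, this sketch also passes plain, unencoded subjects through unchanged.

```python
from email.header import decode_header


def decode_subject(raw, default_charset="ascii"):
    """Decode an RFC 2047 encoded Subject value into a plain string.

    Hypothetical helper for illustration; not part of the original script.
    """
    parts = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            # Encoded words come back as bytes plus their declared charset
            parts.append(value.decode(charset or default_charset, errors="replace"))
        else:
            # Subjects without any encoded word come back as str
            parts.append(value)
    return "".join(parts)


# Made-up sample: "Important café" in Q-encoding
print(decode_subject("=?UTF-8?Q?Important_caf=C3=A9?="))  # prints: Important café
```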
Once you're satisfied this works on your existing emails, you can call it from procmail and update your rules as below:
# Decode unicode subject
:0fhw
| /path/to/headerscript.py
# Sort mail with Important in subject to Important mailbox
:0
* ^Subject[^:]*: .*(Important).*
.Important/
The rule matches both the Subject and the Subject-Decoded headers; in fact, it matches any header whose name starts with Subject, because [^:]* accepts anything between the name and the colon. With this approach I could very easily update all my existing rules to be aware of unicode subjects.
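If you want to check a pattern before putting it into your procmailrc, the condition can be approximated in Python. This is only a sketch: procmail uses its own egrep-like regex engine and matches case-insensitively by default, which I imitate here with re.IGNORECASE. The function name rule_matches and the sample header lines are made up.

```python
import re

# Approximation of the procmail condition "* ^Subject[^:]*: .*(Important).*"
SUBJECT_RULE = re.compile(r"^Subject[^:]*: .*(Important).*", re.IGNORECASE)


def rule_matches(header_line):
    """Return True if a single header line would satisfy the rule (hypothetical helper)."""
    return SUBJECT_RULE.match(header_line) is not None


# Made-up sample: base64 encoded-word for "Important: café"
raw = "Subject: =?UTF-8?B?SW1wb3J0YW50OiBjYWbDqQ==?="
decoded = "Subject-Decoded: Important: café"

print(rule_matches(raw))      # the encoded header hides the keyword
print(rule_matches(decoded))  # the decoded header exposes it
print(rule_matches("X-Subject: Important"))  # name does not start with Subject
```

The first and third checks fail and the second succeeds, which is why the extra header is enough to make the existing rules work.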
Edit 2022-10: Procmail is supported upstream again, which is cool: https://github.com/BuGlessRB/procmail
Edit 2023-01: A reader reported UnicodeEncodeError problems; perhaps their system wasn't configured to use a unicode locale. Reportedly, adding PYTHONIOENCODING="utf8" to their ~/.procmailrc fixed the problem.