Remove (not clear) attachments from email


Python 3.6

I'm trying to archive some old mails, and I want to remove attachments from some of them.

However, if I use the clear() method, the MIME part remains in the mail, just empty (so it's assumed to be of type text/plain). I came up with a really hacky solution of converting the EmailMessage object to text then removing any boundary lines that aren't followed by headers, but surely there's a better way.

My example mail has two inline .png attachments and two .txt attachments.

Here's a sample:

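from email import policy
from email.parser import BytesParser
from email.iterators import _structure

# eml_path is the path to the .eml file being processed
with open(eml_path, 'rb') as fp:
    msg = BytesParser(policy=policy.SMTP).parse(fp)

# _structure() prints the MIME tree itself and returns None
_structure(msg)

# Clear every part that carries a Content-Disposition header
for part in msg.walk():
    cd = part.get_content_disposition()
    if cd is not None:
        part.clear()

_structure(msg)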

Structure of original mail:

multipart/mixed
    multipart/alternative
        text/plain
        multipart/related
            text/html
            image/png
            image/png
    text/plain
    text/plain

Structure after removing attachments:

multipart/mixed
    multipart/alternative
        text/plain
        multipart/related
            text/html
            text/plain
            text/plain
    text/plain
    text/plain

The last four parts are left behind as empty text/plain parts, but I want them removed entirely.
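
For what it's worth, here is a minimal sketch of why the cleared parts show up as text/plain. It assumes a hypothetical two-part message built in memory (the fake PNG bytes and the file name a.png are made up for illustration). As far as I can tell, clear() only drops the part's headers and payload; the part object itself stays in its parent's payload list, and with no Content-Type header left, get_content_type() falls back to the default type.

```python
from email.message import EmailMessage

# Hypothetical two-part message: a text body plus a tiny fake attachment
msg = EmailMessage()
msg.set_content("body")
msg.add_attachment(b"\x89PNG", maintype="image", subtype="png", filename="a.png")

attachment = msg.get_payload()[1]
before = attachment.get_content_type()

# clear() removes the headers and the payload of this part only
attachment.clear()

# No Content-Type header remains, so the default type is reported
after = attachment.get_content_type()

# The parent still lists two subparts; the cleared one is just empty
still_there = len(msg.get_payload())

print(before, after, still_there)
```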

In my tests, the empty parts cause display glitches in Thunderbird and Gmail. Once I remove the lingering boundary lines, the mails display correctly.

Answer by VPfB:

I think you need to call set_payload() to modify the structure:

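if msg.is_multipart():
    # get_payload() on a multipart message returns the list of direct subparts
    payload = msg.get_payload()
    payload = [
        part for part in payload
        # optionally fine-tune the condition, e.g.
        # you might want to keep the "inline" parts
        if part.get_content_disposition() is None]
    # replace the subpart list; nested multiparts need the same treatment
    msg.set_payload(payload)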
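
The snippet above filters only the direct children of msg, so the inline images nested inside multipart/related in the question's mail would survive. A recursive variant might look like the sketch below. The function name strip_attachments and the small test message are my own inventions for illustration, and the condition (drop anything with a Content-Disposition) is an assumption you may want to loosen.

```python
from email import policy
from email.message import EmailMessage


def strip_attachments(part):
    """Recursively drop every subpart that has a Content-Disposition header."""
    if not part.is_multipart():
        return
    kept = [sub for sub in part.get_payload()
            if sub.get_content_disposition() is None]
    part.set_payload(kept)
    # Descend into the remaining subparts (nested multiparts)
    for sub in kept:
        strip_attachments(sub)


# Hypothetical message shaped like the one in the question, only smaller
msg = EmailMessage(policy=policy.SMTP)
msg.set_content("plain body")
msg.add_alternative("<p>html body</p>", subtype="html")
html_part = msg.get_payload()[1]
# add_related() turns the HTML part into multipart/related with an inline image
html_part.add_related(b"\x89PNG", maintype="image", subtype="png", cid="<img1>")
# add_attachment() wraps everything in multipart/mixed
msg.add_attachment("notes", filename="notes.txt")

strip_attachments(msg)

types = [p.get_content_type() for p in msg.walk()]
print(types)
```

After the call, the walk lists only the multipart containers and the two body parts. A multipart/related with a single child is left in place here; collapsing such containers would be a separate step.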