Extract metadata from a Microsoft PST file with Python and pypff


I am having trouble extracting metadata from a PST file.

As the code below shows, I am using pypff to read the PST file. I need to extract the following from each email: sender, recipient, subject, date and time, and of course the email content.

I can get the sender, subject and submit time, but I just can't find the recipient.

I'm asking you professionals for help; maybe you know a better way to do this. I have already thought about "unpacking" all the .msg files from the PST into a folder and then iterating over them, but I wouldn't know how to do that either.

Thanks in advance for your answers and help.

# Retrieving E-Mails from a PST file
# File opening

# First we load the libraries
import pypff
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Then we open the file. Opening can nevertheless take quite long,
# depending on the size of the archive.
pst = pypff.file()
pst.open("PathTo.pst")

# Metadata extraction

# It is possible to navigate through the structure using the functions
# offered by the library, starting from the root:
root = pst.get_root_folder()

# To extract the data, a recursive function is necessary:
def parse_folder(base):
    """Collect message metadata from `base` and all of its sub-folders."""
    messages = []
    # Messages stored directly in this folder (this also covers the root).
    for message in base.sub_messages:
        print(message.transport_headers)
        messages.append({
            "subject": message.subject,
            "sender": message.sender_name,
            "datetime": message.client_submit_time,
        })
    # Descend into every sub-folder; empty folders simply contribute nothing.
    for folder in base.sub_folders:
        print(folder.name)
        messages += parse_folder(folder)
    return messages

messages = parse_folder(root)
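
One route to the recipient that might work, as a sketch rather than a confirmed solution: the code above already accesses message.transport_headers, and if that value is the raw Internet header block, the To and Cc fields can be read from it with the standard-library email package. The helper name recipients_from_headers and the sample header text are made up for illustration. The sketch assumes that transport_headers is a str (or bytes, or None when a message has no headers), and that some header blocks may start with a banner line that is not a real header.

```python
from email.parser import HeaderParser
from email.utils import getaddresses


def recipients_from_headers(raw_headers):
    """Return (display name, address) pairs found in the To and Cc headers.

    Hypothetical helper, not part of pypff. `raw_headers` is assumed to be
    the value of message.transport_headers, or None if there are no headers.
    """
    if not raw_headers:
        return []
    if isinstance(raw_headers, bytes):
        raw_headers = raw_headers.decode("utf-8", errors="replace")
    lines = raw_headers.splitlines()
    # Assumption: a leading banner line without a colon is not a header,
    # and would otherwise stop the parser before it reaches "To:".
    while lines and ":" not in lines[0]:
        lines.pop(0)
    parsed = HeaderParser().parsestr("\n".join(lines))
    fields = (parsed.get_all("To") or []) + (parsed.get_all("Cc") or [])
    return getaddresses(fields)


# Made-up header block, used only to show the call.
sample = (
    "From: Alice Example <alice@example.com>\r\n"
    "To: Bob Example <bob@example.com>, carol@example.com\r\n"
    "Subject: Test\r\n"
    "\r\n"
)
print(recipients_from_headers(sample))
```

Inside parse_folder this could be used as "recipients": recipients_from_headers(message.transport_headers). Messages without transport headers (possibly items that were sent rather than received) would yield an empty list, so this may not cover every email in the archive.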