Python: Reading a 7zip file without extracting it

I have a zip directory similar to this one:

folder_to_zip
    - file_1.csv
    - folder_1.zip
        - file_2.csv
        - file_3.csv
        - folder_2.zip
            - file_4.csv
            - file_5.csv
            - file_6.csv
    - file_7.csv

I would like to load each CSV file into its own pandas DataFrame.

The reason is that I do not want this project to be too "heavy" (the zipped folder is just 639 MB instead of 7.66 GB).

Based on these questions (Python: Open file in zip without temporarily extracting it, Python py7zr can't list files in archive - how to read 7z archive without extracting it), I tried something like this:

from py7zr import SevenZipFile as szf
import os
import pandas as pd


def unzip_(folder_to_zip):
    dfs= []
    if not folder_to_zip.endswith('.csv'):
        dfs.append(pd.read_csv(folder_to_zip))
    else:      
        with szf(folder_to_zip, 'r') as z:
            for f in z.getnames():
                dfs += unzip_(f)
    return dfs       

Answer by Sam Mason:

If you really want to do this, it would be something like:

import py7zr
import pandas as pd

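dfs = {}
with py7zr.SevenZipFile("archive.7z", "r") as ar:
    # Pass read() a list of member names rather than a generator
    targets = [name for name in ar.getnames() if name.endswith(".csv")]
    # read() returns a dict mapping each name to an in-memory file object
    for name, fd in ar.read(targets).items():
        dfs[name] = pd.read_csv(fd)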

Note that this loads into a dictionary rather than a list (as I'm not sure how well defined the ordering coming out of read() is).

But given the RAM requirements, this seems less useful in your use case.
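
For the nested layout in the question, here is a minimal sketch that walks inner archives in memory. It assumes the inner archives really are zip files (as the .zip names suggest) rather than 7z, so it uses the standard-library zipfile module instead of py7zr. The helper name iter_csv_members and the "outer/inner" path format are made up for illustration.

```python
import io
import zipfile


def iter_csv_members(source, prefix=""):
    """Yield (path, raw_bytes) for every .csv in a zip, descending into nested zips.

    `source` is a filename or a binary file-like object. Nothing is written
    to disk; a nested archive is buffered in memory while it is being walked.
    """
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to read
            path = prefix + name
            lowered = name.lower()
            if lowered.endswith(".csv"):
                yield path, zf.read(name)
            elif lowered.endswith(".zip"):
                # Assumption: each nested archive fits in memory as raw bytes.
                inner = io.BytesIO(zf.read(name))
                yield from iter_csv_members(inner, prefix=path + "/")
```

Because this is a generator, each payload can be handed to pandas one at a time, e.g. pd.read_csv(io.BytesIO(data)), so only the frames you choose to keep stay in RAM. The decompressed data still has to fit in memory at the point you load it, so this does not remove the RAM concern above.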