Parse lines in files with similar strings using Python


AH! I'm new to Python. Trying to get the pattern here, but could use some assistance to get unblocked.

Scenario:

  • testZip.zip file with test.rpt files inside
  • The .rpt files have multiple areas of interest ("AOI") to parse
  • AOI1: Line starting with $$
  • AOI2: Multiple lines starting with a single $

Goal:

  • Get the AOIs into a tabular format for upload to SQL

Sample file:

$$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE
###########################################################################################
$KEY= 9/21/2020 3:53:55 PM/2002/B0/295.30/305.30/4FAOA973_3.0_v2.19.2.0_20150203_1/20201002110149
$TIMESTAMP= 20201002110149
$MORECOLUMNS=  more columns
$YETMORE = yay

Tried so far:

import zipfile

def get_aoi1(zip):
    z = zipfile.ZipFile(zip)
    for f in z.namelist():
        with z.open(f, 'r') as rptf:
            for l in rptf.readlines():
                if l.find(b"$$") != -1:
                    return l

def get_aoi2(zip):
    z = zipfile.ZipFile(zip)
    for f in z.namelist():
        with z.open(f, 'r') as rptf:
            for l in rptf.readlines():
                if l.find(b"$") != -1:
                    return l


aoi1 = get_aoi1('testZip.zip')
aoi2 = get_aoi2('testZip.zip')

print(aoi1)
print(aoi2)

Results:

  • I get the same results for both functions
b"$$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE\r\n"
b"$$ADD ID=TEST BATCHID='YEP' PASSWORD=NOPE\r\n"

How do I get the results in text instead of bytes (b) and remove the \r\n from AOI1?

  • There doesn't seem to be a text-mode option for z.open()
  • I've been unsuccessful with .strip()
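
For reference, one way to get text rather than bytes is to wrap the binary file object from z.open() in io.TextIOWrapper. This is only a sketch: the helper name iter_text_lines is made up for illustration, and the utf-8 encoding is an assumption, since the actual encoding of the .rpt files is not known.

```python
import io
import zipfile


def iter_text_lines(zip_path, encoding="utf-8"):
    """Yield (member_name, line) pairs, decoded to str with line endings removed.

    The encoding is an assumption; adjust it to match the real .rpt files.
    """
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            # z.open() always returns a binary stream, so wrap it to decode.
            with z.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding=encoding):
                    yield name, line.rstrip("\r\n")
```

Calling list(iter_text_lines('testZip.zip')) should then give plain strings, so the \r\n and the b prefix no longer appear.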

EDIT 1:

  • Thanks for the pep @furas!
  • return l.strip().decode() worked for removing the new line and b

How do I get the correct results from AOI2 (lines with a single $ in a tabular format)?
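
To make "tabular" concrete, here is a rough sketch of what I have in mind for AOI2: keep only the lines that start with a single $, then split each on the first = into a column name and a value. The function name parse_single_dollar is hypothetical, and it assumes the lines are already decoded to str and that one file maps to one row.

```python
def parse_single_dollar(lines):
    """Build one {column: value} row from lines that start with a single '$'.

    Assumes each line is already a str and that the column name is
    everything between the '$' and the first '='.
    """
    row = {}
    for line in lines:
        # Skip the '$$' line (AOI1) and anything that is not a '$' line.
        if line.startswith("$") and not line.startswith("$$"):
            key, sep, value = line[1:].partition("=")
            if sep:  # ignore '$' lines that have no '=' at all
                row[key.strip()] = value.strip()
    return row
```

With the sample file above, this should produce the columns KEY, TIMESTAMP, MORECOLUMNS and YETMORE, with the whole slash-separated string kept as the KEY value.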

EDIT 2:

  • @furas 2021!
  • Adding the following logic to aoi2 function worked great.
col_array = []
for l in rptf.readlines():
    if not l.startswith(b"$$") and l.startswith(b"$"):
        col_array.append(l)
return col_array
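
For the AOI1 line, a similar split might work using shlex from the standard library, which also strips the quotes around 'YEP'. This is a sketch under assumptions: parse_double_dollar and the COMMAND column name are invented for illustration, and the line is assumed to be decoded to str already.

```python
import shlex


def parse_double_dollar(line):
    """Split a '$$' line such as "$$ADD ID=TEST BATCHID='YEP'" into a dict.

    The leading word without '=' is stored under the hypothetical
    column name 'COMMAND'.
    """
    row = {}
    # Drop the leading '$$', then split on whitespace while honouring quotes.
    for token in shlex.split(line[2:]):
        key, sep, value = token.partition("=")
        if sep:
            row[key] = value
        else:
            row.setdefault("COMMAND", token)
    return row
```

The resulting dict can be merged with the single-$ columns so that each .rpt file becomes one row before loading it into SQL.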