Python working with complex data structure - random access

Question

51 Views Asked by WJ.Lesster At 30 August 2022 at 20:14

Looking for advice on the best approach.

I'm working with a text file that is colon delimited, with 4 columns:

user1:company1:QUOTE:printer1
user1:company2:INVOICE:printer2
user1:company1:PURCHASE:printer3
user1:company2:CREDIT:printer4
user2:company1:QUOTE:printer4
user2:company2:INVOICE:printer5
user2:company1:PURCHASE:printer5
user2:company2:CREDIT:printer1
user3:company1:QUOTE:printer2
user3:company2:INVOICE:printer3
user3:company1:PURCHASE:printer4
user3:company2:CREDIT:printer6

This file maps a user to a printer for a specific type of document.

I need to read and potentially manipulate this file.

When reading the file I want to be able to answer different questions:

List all printers for a specific user
List all users that use a specific printer
List all users that have a specific document
Does "this" user exist in the file with "this" printer and "this" document

So the access is somewhat random, i.e. there is no single query pattern.

My current attempt is with nested dictionaries:

mydict[user][printer] = [list of documents]
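
For reference, a minimal sketch of how such a nested dictionary might be built and queried. The `lines` list is a shortened stand-in for the file, and the company column is simply dropped here, which is an assumption about what the structure needs to hold.

```python
from collections import defaultdict

# Shortened stand-in for the colon-delimited file (assumption: 4 fields per line).
lines = [
    "user1:company1:QUOTE:printer1",
    "user1:company2:INVOICE:printer2",
    "user2:company1:QUOTE:printer4",
    "user2:company1:PURCHASE:printer5",
    "user2:company2:INVOICE:printer5",
]

# mydict[user][printer] -> list of document types
mydict = defaultdict(lambda: defaultdict(list))
for line in lines:
    user, company, doctype, printer = line.split(":")
    mydict[user][printer].append(doctype)

# Lookups by user are direct ...
printers_for_user2 = sorted(mydict["user2"])

# ... but lookups by printer have to scan every user.
users_on_printer4 = sorted(
    user for user, printers in mydict.items() if "printer4" in printers
)
```

The lookup by user is a single key access, while the lookup by printer walks the whole structure, which is the asymmetry that makes this layout awkward for mixed queries.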

I'm looking for a cleaner way to do this.

My current thinking is to use a dataclass and create an instance for every record. But how do I query these efficiently, as per my examples above?
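
A rough sketch of what the dataclass idea could look like, for illustration only. The class name `Mapping`, the `parse` helper and the inline `SAMPLE` text are hypothetical names, not part of any library, and each query is a plain linear scan over the records, which should be adequate for a small file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One line of the file: user, company, document type, printer."""
    user: str
    company: str
    doctype: str
    printer: str


# Inline copy of the sample data so the sketch runs without a file.
SAMPLE = """\
user1:company1:QUOTE:printer1
user1:company2:INVOICE:printer2
user1:company1:PURCHASE:printer3
user1:company2:CREDIT:printer4
user2:company1:QUOTE:printer4
user2:company2:INVOICE:printer5
user2:company1:PURCHASE:printer5
user2:company2:CREDIT:printer1
user3:company1:QUOTE:printer2
user3:company2:INVOICE:printer3
user3:company1:PURCHASE:printer4
user3:company2:CREDIT:printer6
"""


def parse(text):
    """Turn colon-delimited text into a list of Mapping records."""
    return [Mapping(*line.split(":")) for line in text.splitlines() if line.strip()]


records = parse(SAMPLE)

# Each question becomes a comprehension over the records.
printers_for_user1 = sorted({r.printer for r in records if r.user == "user1"})
users_on_printer1 = sorted({r.user for r in records if r.printer == "printer1"})
users_with_purchase = sorted({r.user for r in records if r.doctype == "PURCHASE"})
exists = any(
    r.user == "user1" and r.doctype == "PURCHASE" and r.printer == "printer2"
    for r in records
)
```

Because the dataclass is frozen, the records are hashable, so they could also be placed in a set or used as dictionary keys if per-field indexes turn out to be needed later.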

Thanks for reading, hope you can guide me.

There are 2 best solutions below

Erik On 30 August 2022 at 20:28

The snippet below uses pandas to read the text file into a dataframe. The answer to your first question is also there. I leave the others as part of your training on the pandas package.

import pandas as pd

# The file is colon delimited, so the separator must be given explicitly
user_printer_info = pd.read_csv('mydata.txt', sep=':', names=['username', 'company', 'type', 'printer'])

# List all printers for a specific user
user = 'user1'
printers_for_user = user_printer_info[user_printer_info['username'] == user]['printer'].drop_duplicates()

**fsimonjetz** · Accepted Answer · 2022-08-30T20:28:50.257000

pandas is made for such analyses.

import pandas as pd # pip install pandas

df = pd.read_csv("path_to_your_file.txt", 
                 sep=":",
                 names=['User', 'Company', 'Doctype', 'Printer'])

List all printers for a specific user

>>> df[df.User == "user1"].Printer
0    printer1
1    printer2
2    printer3
3    printer4
Name: Printer, dtype: object

List all users that use a specific printer

>>> df[df.Printer == "printer1"].User
0    user1
7    user2
Name: User, dtype: object

List all users that have a specific document

>>> df[df.Doctype == "PURCHASE"].User
2     user1
6     user2
10    user3
Name: User, dtype: object

Does "this" user exist in the file with "this" printer and "this" document? (In this case: Nope.)

>>> df[(df.User == "user1") & (df.Doctype == "PURCHASE") & (df.Printer == "printer2")]
Empty DataFrame
Columns: [User, Company, Doctype, Printer]
Index: []

(Note the mandatory parentheses around each condition and the use of & rather than and in the last example: & binds more tightly than ==, and and does not work element-wise on a Series. This is a common source of errors for pandas beginners.)
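
If a plain True/False answer is wanted for the last question instead of a filtered frame, the combined condition can be reduced with `.any()`. The following is a small sketch on a three-line inline sample (an assumption made so it runs without a file); it also shows what happens when `and` is used in place of `&`.

```python
import io

import pandas as pd

# Three-line inline sample standing in for the real file.
sample = io.StringIO(
    "user1:company1:QUOTE:printer1\n"
    "user1:company1:PURCHASE:printer3\n"
    "user2:company2:CREDIT:printer1\n"
)
df = pd.read_csv(sample, sep=":", names=["User", "Company", "Doctype", "Printer"])

# Combine element-wise conditions with &, each wrapped in parentheses,
# then collapse the boolean Series to a single value.
mask = (df.User == "user1") & (df.Doctype == "PURCHASE") & (df.Printer == "printer3")
exists = bool(mask.any())

missing = bool(
    ((df.User == "user1") & (df.Doctype == "PURCHASE") & (df.Printer == "printer2")).any()
)

# Using `and` asks Python for the truth value of a whole Series,
# which pandas refuses with a ValueError.
try:
    (df.User == "user1") and (df.Doctype == "PURCHASE")
    and_worked = True
except ValueError:
    and_worked = False
```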
