How do you compare two csv files with identical columns but different values?

2.6k Views Asked by At

Here's my problem, I need to compare two procmon scans which I converted into CSV files.

Both files have identical column names, but obviously the contents differ. I need to check the "Path" (5th column) from the first file to the one to the second file and print out the ENTIRE row of the second file into a third CSV, if there are corresponding matches.

I've been googling for quite a while and can't seem to get this to work like I want it to, any help is appreciated!

I've tried numerous online tools and other python scripts, to no avail.

2

There are 2 best solutions below

0
sigma1510 On

Have you tried using pandas and numpy together?

It would look something like this:

import pandas as pd
import numpy as np

#get your second file as a Dataframe, since you need the whole rows later
file2 = pd.read_csv("file2.csv")

#get your columns to compare
file1Column5 = pd.read_csv("file1.csv")["name of column 5"]
file2Column5 = file2["name of column 5"]

#add a column where if values match, row marked True, else False
file2["ColumnsMatch"] = np.where(file1Column5 == file2Column5, 'True', 'False')

#filter rows based on that column and remove the extra column
file2 = file2[file2['ColumnsMatch'] == 'True'].drop('ColumnsMatch', 1)

#write to new file
file2.to_csv(r'file3.csv')
4
organicData On

Just write for such things your own code. It's probably easier than you are expecting.

#!/usr/bin/env python

import pandas as pd

# read the csv files
csv1 = pd.read_csv('<first_filename>')
csv2 = pd.read_csv('<sencond_filename>')

# create a comapare series of the files
iseq = csv1['Path'] == csv2['Path']

# push compared data with 'True' from csv2 to csv3
csv3 = pd.DataFrame(csv2[iseq])

# write to a new csv file
csv3.to_csv('<new_filename>')