I'm trying to compare two sets of coordinates in two dataframes using nested for loops. Where the distance is less than a predefined value, I want to overwrite the coordinates in qinsy_file_2. If they are not within that distance, I want to drop the row.
So far, the script seems to get through one iteration, but fails with a KeyError on the second iteration while calculating the distance.
Is there anything I'm obviously doing wrong here? I've looked extensively for existing questions but have come up empty so far. (I'm going slightly mad; this has stumped me the whole week.)
## Pull values from GUI
qinsy_file=pd.read_csv(values["-QINSYInput-"],sep=',')
segy_file=pd.read_csv(values["-SEGYInput-"],sep='\t')
#print(segy_file)
in_file=str(values["-QINSYInput-"])
## Make the outfile name by replacing file suffix
out_file=in_file.replace(".csv","_SEGY_NAV.csv").replace(".txt","_SEGY_NAV.txt")
## Correlation zone = 30cm
buffer=0.2
## Get required headers
segy_vlookup=segy_file[['CDP_X','CDP_Y']]
qinsy_file_2=qinsy_file[['Date','Time','Sparker CoG Easting','Sparker CoG Northing',
'Streamer CoG Easting','Streamer CoG Northing','CMP Easting',
'CMP Northing','Fix Number','CMP DTM Depth']]
## Loop through Qinsy file
for index_qinsy,row_qinsy in qinsy_file_2.iterrows():
    ## Loop through SEGY navigation
    for index_segy,row_segy in segy_vlookup.iterrows():
        ## Calculate distance between points
        distance = (((segy_vlookup["CDP_X"][index_segy] - qinsy_file_2["CMP Easting"][index_qinsy])**2) + ((segy_vlookup["CDP_Y"][index_segy] - qinsy_file_2["CMP Northing"][index_qinsy])**2))**0.5
        print(distance)
        ## If distance between points is less than or equal to the correlation value, replace the CMP X and Y values in the QINSY file
        if distance <= buffer:
            qinsy_file_2["CMP Easting"][index_qinsy]=segy_vlookup["CDP_X"][index_segy]
            qinsy_file_2["CMP Northing"][index_qinsy]=segy_vlookup["CDP_Y"][index_segy]
            print(qinsy_file_2)
            #qinsy_file_2["CMP Easting"]=segy_vlookup["CDP_X"]
            #qinsy_file_2["CMP Northing"]=segy_vlookup["CDP_Y"]
        else:
            ## Need to delete the row at this point
            qinsy_file_2.drop(index_qinsy,inplace=True)
## Export the "filtered" dataframe to csv, turning off index
qinsy_file_2.to_csv(out_file,sep=',',index=False,header=True)
When it works, it should export a stripped-down version of qinsy_file_2, containing only rows with coordinates in common with segy_vlookup (I appreciate the latter is poorly named; I changed my methodology).
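To make the target concrete, here is a minimal sketch of the result I'm after, written without the nested loops. It is not my actual script: `snap_to_segy` is a hypothetical helper name, the sample coordinates are made up, and snapping to the nearest SEGY point (rather than any point inside the buffer) is an assumption. It also assumes the SEGY frame is non-empty.

```python
import numpy as np
import pandas as pd


def snap_to_segy(qinsy, segy, buffer):
    """Hypothetical helper: keep rows of `qinsy` whose CMP position lies
    within `buffer` of a SEGY point, and overwrite them with that point."""
    q_xy = qinsy[["CMP Easting", "CMP Northing"]].to_numpy(dtype=float)
    s_xy = segy[["CDP_X", "CDP_Y"]].to_numpy(dtype=float)

    # Pairwise Euclidean distances, shape (len(qinsy), len(segy))
    diff = q_xy[:, None, :] - s_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    nearest = dist.argmin(axis=1)          # closest SEGY point per Qinsy row
    within = dist.min(axis=1) <= buffer    # True where that point is in range

    # Build a new frame instead of dropping rows while iterating
    out = qinsy.loc[within].copy()
    out["CMP Easting"] = s_xy[nearest[within], 0]
    out["CMP Northing"] = s_xy[nearest[within], 1]
    return out


# Made-up sample data, only to show the shape of the result
qinsy = pd.DataFrame({
    "CMP Easting": [0.0, 10.0, 20.0],
    "CMP Northing": [0.0, 0.0, 0.0],
    "Fix Number": [1, 2, 3],
})
segy = pd.DataFrame({
    "CDP_X": [0.1, 20.0, 50.0],
    "CDP_Y": [0.0, 0.15, 0.0],
})

result = snap_to_segy(qinsy, segy, buffer=0.2)
print(result)
```

With this sample data, fixes 1 and 3 are kept and moved onto their SEGY points, and fix 2 is removed. The full distance matrix grows with the product of the two file lengths, so for large files a spatial index such as `scipy.spatial.cKDTree` might be a better fit.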
Here is the terminal feedback I keep receiving:
71.10718458835196 # this is distance
Traceback (most recent call last):
File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\range.py", line 414, in get_loc
return self._range.index(new_key)
ValueError: 0 is not in range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "p:\Xtra\Public\TH\Python Code\SEIS_NAV Comparison.py", line 52, in <module>
distance = (((segy_vlookup["CDP_X"][index_segy] - qinsy_file_2["CMP Easting"][index_qinsy])**2) + ((segy_vlookup["CDP_Y"][index_segy] - qinsy_file_2["CMP Northing"][index_qinsy])**2))**0.5
File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py", line 1040, in __getitem__
return self._get_value(key)
File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py", line 1156, in _get_value
loc = self.index.get_loc(label)
File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\range.py", line 416, in get_loc
raise KeyError(key) from err
KeyError: 0
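For what it's worth, the same KeyError can be reproduced in isolation, which makes me suspect the drop inside the inner loop: once row 0 has been dropped, the next inner iteration still looks up label 0. This is a minimal sketch with a made-up two-row frame standing in for qinsy_file_2, not my real data.

```python
import pandas as pd

# Hypothetical stand-in for qinsy_file_2 (values are made up)
df = pd.DataFrame({"CMP Easting": [100.0, 200.0]})

# The else branch drops the current row on the first inner iteration
df.drop(0, inplace=True)

# The next inner iteration then looks up the same label again
try:
    df["CMP Easting"][0]
    raised = False
except KeyError:
    raised = True

print(raised)
```

If that is the cause, the row should probably only be dropped after the inner loop has finished without finding a match, rather than on the first SEGY point that is out of range.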