How to match and extract values between two files in Python?


I have two data files, and I need to match them and extract some information.

File1

Chr,Start,End,ID
6,38517417,38517437,kgp17152035
6,38517556,38517576,rs4254983
6,38517997,38518017,kgp10250023
6,38519465,38519485,kgp17245206
6,38519751,38519771,kgp8446980

File2

Gene_ID,Gene_Name,RE_Locus
ENSG00000112164.5,GLP1R,chr6:39041458-39041477
ENSG00000112164.5,GLP1R,chr6:39053087-39053106
ENSG00000112164.5,GLP1R,chr6:39049954-39049973
ENSG00000112164.5,GLP1R,chr6:39041701-39041720
ENSG00000112164.5,GLP1R,chr6:39047953-39047972

I would like to compare the range in the RE_Locus column of File2 with the range given by the Start and End columns of File1. Whenever a matching range is found, the output should give the corresponding ID from File1.

Desired Output:

ID
tsp17152035
rs874983
kgp10250023
rsp17245206
ex8446980

There is 1 best solution below

Abhishek

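Try the code below. Note that it only finds rows where the Start-End range in file1 is exactly equal to the range in RE_Locus; it does not test for overlapping ranges.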

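import pandas as pd

df = pd.read_csv('Book1 copy.csv')  # file1
df1 = pd.read_csv('Book1.csv')  # file2

# Build a "Start-End" key from file1 and compare it with the part of
# RE_Locus after the colon (split on ':' so that "chr10:" etc. also work).
result = pd.merge(df, df1,
                  left_on=df['Start'].astype(str) + '-' + df['End'].astype(str),
                  right_on=df1['RE_Locus'].str.split(':').str[1],
                  how='inner')[['ID']]
print(result)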

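If "matching the range" is meant as overlap rather than exact equality, the comparison could be sketched roughly as follows. This is only an illustration under some assumptions: every RE_Locus value has the form chr<name>:<start>-<end>, both ranges are treated as closed intervals, and the chromosome must agree as well. The function name ids_overlapping, the helper columns RE_Start and RE_End, and the small demo data are made up for this example and are not from the question.

```python
import pandas as pd


def ids_overlapping(file1: pd.DataFrame, file2: pd.DataFrame) -> pd.DataFrame:
    """Return the IDs of file1 rows whose Start-End range overlaps any RE_Locus."""
    # Split e.g. "chr6:39041458-39041477" into chromosome, start and end.
    # Assumption: every value follows this pattern, otherwise astype() fails.
    loci = file2["RE_Locus"].str.extract(
        r"^chr(?P<Chr>[^:]+):(?P<RE_Start>\d+)-(?P<RE_End>\d+)$"
    )
    loci = loci.astype({"RE_Start": "int64", "RE_End": "int64"})

    # Compare chromosomes as strings so that 6 and "6" are treated alike.
    left = file1.assign(Chr=file1["Chr"].astype(str))

    # Pair every file1 row with every locus on the same chromosome.
    pairs = left.merge(loci, on="Chr", how="inner")

    # Two closed intervals overlap when each one starts before the other ends.
    hit = (pairs["Start"] <= pairs["RE_End"]) & (pairs["RE_Start"] <= pairs["End"])
    return pairs.loc[hit, ["ID"]].drop_duplicates().reset_index(drop=True)


# Made-up demo data (not the data from the question).
file1 = pd.DataFrame(
    {"Chr": [6, 6], "Start": [100, 500], "End": [120, 520], "ID": ["a", "b"]}
)
file2 = pd.DataFrame(
    {"Gene_ID": ["g1"], "Gene_Name": ["n1"], "RE_Locus": ["chr6:110-130"]}
)
result = ids_overlapping(file1, file2)
print(result)
```

Pairing all rows per chromosome is simple but can use a lot of memory on large files; in that case an interval-based approach may be a better fit.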