Intersecting two panda dataframe

13 Views Asked by At

I am working with panda dataframes, having different columns, but let's focus on depth and value1. I used to concatenate different dataframes matching a condition, and then dropping the duplicates over one column. The problem is that the depths can be different and the dropping doesn't work. Is there an efficient way to check the intersection depths between the dataframes and then prioritizing one of the two over that depth?

here is a snippet

for i in range(loc_list_unique.__len__()):
 ###initialization
    
    for j in range(list.__len__()):
        if loc_list[j] == loc_list_unique[i]:
            item = list[j]
            temp = full.loc[full["Location"]==item].sort_values(by="depth")            
            temp_concat = pd.concat([temp_concat, temp])

    ###Merge multiple CPTs at 1 location and drops last values
    cond = ['38','42']
    if loc_list_unique[i] in cond:
        item_be_kept = 'first'
    else:
        item_be_kept = 'last'
        
    temp_concat =temp_concat.drop_duplicates(subset='depth', keep=item_be_kept).sort_values(by="depth").reset_index(drop=True)

Thanks in advance! F

I am trying to find the intersection and manually copying the DF

            if temp_concat.__len__()>0:
                min1 = temp_concat['SCPT_DPTH'].min()
                max1 = temp_concat['SCPT_DPTH'].max()
            else:
                min1=0
                max1=0
            min2 = temp['SCPT_DPTH'].min()
            max2 = temp['SCPT_DPTH'].max()

            depth_intersection_min = max(min1, min2)
            depth_intersection_max = min(max1, max2)

            if depth_intersection_min < depth_intersection_max:
                df1_within_intersection = df1[(df1['depth'] >= depth_intersection_min) & (df1['depth'] <= depth_intersection_max)]
                df2_within_intersection = df2[(df2['depth'] >= depth_intersection_min) & (df2['depth'] <= depth_intersection_max)]

                # Prioritize one dataframe over the other based on specified conditions within the intersection
                prioritized_within_intersection = df1_within_intersection  # or df2_within_intersection based on your conditions
                
                temp_concat = pd.concat([prioritized_within_intersection, df1[~df1['depth'].between(depth_intersection_min, depth_intersection_max)], df2[~df2['depth'].between(depth_intersection_min, depth_intersection_max)]])

            ```
0

There are 0 best solutions below