Python: Program running Slooooow

Question

Python: Program running Slooooow

93 Views Asked by GarryU At 15 March 2021 at 20:54

I've got an issue with my Pandas data analysis running very slowly and wondered if someone could help please.

I have 2 data frames I'm trying to compare if the IP address in one frame is part of the network in another frame:

VFFrame contains 10 columns, 4798 rows: 'CIDR' column contains the network
AssetFrame contains 3368 columns, 115816 rows: 'IPv4s' contains the IP addresses to match in 'CIDR' (as 'IPv4s' can have 1 to 3 possible IP addresses, I've had to split it out and clean them up)

I'm adding the additional 7 columns to the AssetFrame ready to take the new values once matched.

for index, row in AssetFrame.iterrows():
    IPstore = AssetFrame.loc[index, 'ipv4s'].split(',')
    
    for index, row in VFFrame.iterrows():
        net = ipaddress.IPv4Network(VFFrame.loc[index, 'CIDR'])
    
        for i in range(len(IPstore)):
            IPstore[i] = IPstore[i].strip()
            IP = ipaddress.IPv4Address(IPstore[i])
        
            if IP in net:
                row = [IP, net]       # Used to check list of matches to export as CSV,
                TheList.append(row)   # to check my assumption below is correct.

                # All IPs will be in the same network 'CIDR' or there will be no match
                # The columns have already been added to the AssetFrame ready to take the new values
                AssetFrame.loc[index, 'comment'] = VFFrame.loc[index, 'comment']
                AssetFrame.loc[index, 'country'] = VFFrame.loc[index, 'country']
                AssetFrame.loc[index, 'city'] = VFFrame.loc[index, 'city']
                AssetFrame.loc[index, 'site-name'] = VFFrame.loc[index, 'site-name']
                AssetFrame.loc[index, 'site-id'] = VFFrame.loc[index, 'site-id']
                AssetFrame.loc[index, 'vf-device'] = VFFrame.loc[index, 'vf-device']
                AssetFrame.loc[index, 'vlan'] = VFFrame.loc[index, 'vlan']



AssetFrame:
    id            ipv4s             fqdn
0   b564a4        192.168.20.4      [email protected]
1   e454a4        192.168.20.74     [email protected]
2   a454a4        192.168.20.84     [email protected]

VFFrame:
    subnet          mask                CIDR                Comment         vlan
0   192.168.20.0    255.255.255.224     192.168.20.0/26     Blah Blah       101
1   192.168.20.64   255.255.255.240     192.168.20.64/28    Ditto Blah      201

Result Should be:
AssetFrame:
    id        ipv4s             fqdn                    Comment         vlan
0   b564a4    192.168.20.4      [email protected]    Blah Blah       101
1   e454a4    192.168.20.74     [email protected]    Ditto Blah      201
2   a454a4    192.168.20.84     [email protected]   No Match        No Match

Example of the data frames and wanted output:

Original Q&A

There are 1 best solutions below

**Glauco** · Answer 1 · 2021-03-16T10:07:52.447000

"divide and conquer" is your friends. First of all we have O(n³) complexity and this is no a big deal. The first step in these cases is to understand where time is lost, i can guess the .loc is the bottleneck. But for a clear analisys i suggest to use a performing tools. I wrote perf_tool that can guide you to the solution:

Some hits

Internal loop do scalar assignment when you can do it in vectorial mode. The AssetFrame.loc is very expensive and called too many times.
Middle loop do again elaboration for each record, this can be done in vectorial mode using masks
Top Loop again iter on each row.
Probably the Top and Middle loop can be removed doing a single merge operation between the DataFrames.

Python: Program running Slooooow

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PYTHON-3.X

Related Questions in PANDAS

Related Questions in PERFORMANCE

Related Questions in LARGE-DATA-VOLUMES

Trending Questions

Popular # Hahtags

Popular Questions