I have a large set of clients, and each of them has an id. Pairs of clients are related to each other by the column "Value"; see below:
import pandas as pd
import numpy as np
# Example with 7 clients
df = pd.DataFrame(np.random.randint(7, size=(20, 2)), columns=['start id', 'end id'])
df = df[df['start id'] != df['end id']]
df = df.drop_duplicates()
df['Value'] = np.random.uniform(low=0.0, high=1.0, size=len(df))
print(df)
| | start id | end id | Value |
|---|---|---|---|
| 0 | 1 | 4 | 0.315743 |
| 1 | 4 | 3 | 0.449567 |
| 2 | 4 | 2 | 0.945336 |
| 3 | 3 | 0 | 0.556950 |
| 5 | 3 | 4 | 0.002412 |
| 6 | 2 | 1 | 0.976020 |
| 8 | 4 | 0 | 0.480784 |
| 9 | 4 | 1 | 0.798300 |
And the score is the aggregation of values over "start id":
print(df.groupby('start id').agg(Score=('Value', 'sum')))
| start id | Score |
|---|---|
| 1 | 0.315743 |
| 2 | 0.976020 |
| 3 | 0.559362 |
| 4 | 2.673987 |
My goal is to remove the smallest number of clients such that the score of every remaining client is below a threshold. Let's assume that the threshold is 0.7. Removing clients 2 and 4 is enough, and the maximum remaining score would be 0.556950:
df = df[~df['start id'].isin([2,4])]
df = df[~df['end id'].isin([2,4])]
print(df)
| | start id | end id | Value |
|---|---|---|---|
| 3 | 3 | 0 | 0.556950 |
df_2 = df.groupby('start id').agg(Score=('Value', 'sum'))
print(df_2['Score'].max())
0.55695
I have thousands of connected clients, and a brute-force approach (trying every possible combination of removals) is infeasible. What optimization algorithm would you recommend?
I think what you want to do is: figure out which ids generate scores above the threshold, wipe every row that involves those ids from the original dataframe (both as start and as end), then redo the grouping. Since removing rows can only lower the remaining sums, a single pass is enough to bring every score below the threshold, though it may remove more clients than the strict minimum.
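A minimal sketch of that approach, assuming the `df` and threshold of 0.7 from the question (the data is regenerated here so the snippet is self-contained; the exact numbers will differ from the tables above):

```python
import numpy as np
import pandas as pd

# Rebuild example data as in the question: 7 clients, random pairs and values
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(7, size=(20, 2)), columns=['start id', 'end id'])
df = df[df['start id'] != df['end id']].drop_duplicates()
df['Value'] = rng.uniform(low=0.0, high=1.0, size=len(df))

threshold = 0.7

# 1. Find the ids whose aggregated score exceeds the threshold
scores = df.groupby('start id')['Value'].sum()
over = scores[scores > threshold].index

# 2. Wipe every row that involves those ids, as start or as end
df_pruned = df[~df['start id'].isin(over) & ~df['end id'].isin(over)]

# 3. Redo the grouping; every remaining score is now below the threshold,
#    because removing rows can only decrease the per-id sums
new_scores = df_pruned.groupby('start id')['Value'].sum()
print(new_scores)
```

Note this is a one-pass heuristic, not a guaranteed-minimal solution: a client slightly over the threshold is dropped outright even if removing one of its counterparties would have sufficed.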