Get the most different combinations while distributing values equally

Asked by ASIER RODRIGUEZ GONZALEZ on 30 January 2023 at 16:20 (69 views)

I'm trying to select the most mutually different combinations of a given set of variables and their values, while keeping every value more or less equally represented.

For example, given:

{ 'cat0' : [0,1,2,3,4,5],
  'cat1' : [0,1,2,3,4,5],
  'cat2' : [0,1,2,3,4,5]
}

I generate a dataframe of all combinations, where each row is a unique combination of the values of the variables defined above.

cat0  cat1  cat2
0     0     0
0     0     1
...   ...   ...

and so on, for 6 × 6 × 6 = 216 rows in total.
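The post does not show how this dataframe is built. A minimal sketch, assuming pandas and `itertools.product` (the names `variables` and `combinations_df` are illustrative), takes the Cartesian product of the value lists so that each row is one unique combination:

```python
import itertools

import pandas as pd

variables = {
    'cat0': [0, 1, 2, 3, 4, 5],
    'cat1': [0, 1, 2, 3, 4, 5],
    'cat2': [0, 1, 2, 3, 4, 5],
}

# Cartesian product of all value lists: one row per unique combination,
# with the first variable changing slowest (0,0,0), (0,0,1), ...
combinations_df = pd.DataFrame(
    list(itertools.product(*variables.values())),
    columns=list(variables),
)

print(len(combinations_df))  # 216 = 6 * 6 * 6
print(combinations_df.head(2))
```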

For example, if the requested number of rows is 6, the output could be:

0,0,0
1,1,1
2,2,2
3,3,3
4,4,4
5,5,5

The expected output has to keep every row as far as possible from the others, and the values of each variable must also be distributed as evenly as possible. For example, if the requested number of rows is 11, the expected output could be:

0,0,0
1,1,1
2,2,2
3,3,3
4,4,4
5,5,5

0,1,2
1,2,3
4,5,4
2,3,0
3,4,1

As you can see, for each 'cat' the values are distributed as evenly as possible, and each combination is as different as possible from the previously selected ones.
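The balance requirement can be checked mechanically. The sketch below counts how often each value appears in each column of the 11-row example and reports the gap between the most and least used value; the helper name `balance_spread` and its `domain` parameter (the 0 to 5 value range) are illustrative assumptions. With 11 rows spread over 6 values, a gap of 1 per column is the smallest possible.

```python
from collections import Counter

# The 11-row expected output from the question, one tuple per row
expected = [
    (0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5),
    (0, 1, 2), (1, 2, 3), (4, 5, 4), (2, 3, 0), (3, 4, 1),
]


def balance_spread(rows, domain=range(6)):
    """Per column: count of the most used value minus the least used one."""
    spreads = []
    for c in range(len(rows[0])):
        counts = Counter(row[c] for row in rows)
        per_value = [counts[v] for v in domain]  # values never used count as 0
        spreads.append(max(per_value) - min(per_value))
    return spreads


print(balance_spread(expected))  # [1, 1, 1]
```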

I have written a function, but it does not solve the full problem:

import itertools

import numpy as np


def get_distanced_creatives(n, combinations_df, weighted_variables=None):
    # Default weights must use the dataframe's column names (cat0..cat2)
    if weighted_variables is None:
        weighted_variables = {'cat0': 0.33, 'cat1': 0.33, 'cat2': 0.33}

    def scalar_product(v1, v2):
        # Weighted Hamming distance: add the weight of every variable
        # in which the two rows differ
        adding = 0
        for var in weighted_variables:
            if v1[var] != v2[var]:
                adding += weighted_variables[var]
        return adding

    # Pairwise distance matrix between all rows of combinations_df
    rows = [row for _, row in combinations_df.iterrows()]
    distance_matrix = np.array(
        list(itertools.starmap(scalar_product, itertools.product(rows, rows)))
    ).reshape(len(combinations_df), len(combinations_df))

    # Start from a random row
    initial = np.random.randint(len(combinations_df))
    list_elements = [initial]

    while len(list_elements) < n:
        # Sum of distances from every candidate row to the rows already selected
        aux = distance_matrix[list_elements].sum(axis=0)
        # Take the not-yet-selected row with the largest summed distance
        list_ordered = sorted(range(len(aux)), key=lambda k: -aux[k])
        for i in list_ordered:
            if i not in list_elements:
                list_elements.append(i)
                break
    return list_elements, combinations_df.iloc[list_elements]
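The `distance_matrix` step above builds the weighted Hamming distance by calling `scalar_product` on every pair of rows produced by `iterrows`, which gets slow as the dataframe grows. A minimal sketch of an equivalent vectorized computation with NumPy broadcasting follows; the function name `weighted_distance_matrix` is hypothetical, and it assumes the keys of the weights dict match the dataframe's column names.

```python
import itertools

import numpy as np
import pandas as pd


def weighted_distance_matrix(combinations_df, weighted_variables):
    """Pairwise weighted Hamming distances, like scalar_product, without loops."""
    cols = list(weighted_variables)
    values = combinations_df[cols].to_numpy()
    weights = np.array([weighted_variables[c] for c in cols])
    # differs[i, j, k] is True when rows i and j differ in variable k
    differs = values[:, None, :] != values[None, :, :]
    # Add up the weights of the variables in which each pair differs
    return (differs * weights).sum(axis=-1)


variables = {f'cat{k}': list(range(6)) for k in range(3)}
combinations_df = pd.DataFrame(
    list(itertools.product(*variables.values())), columns=list(variables)
)
weighted_variables = {'cat0': 0.33, 'cat1': 0.33, 'cat2': 0.33}
distance_matrix = weighted_distance_matrix(combinations_df, weighted_variables)
print(distance_matrix.shape)  # (216, 216)
```

The intermediate `differs` array has shape (rows, rows, variables), so memory grows quadratically with the number of combinations; for 216 rows that is negligible.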

It only handles the equal-distribution part, and it produces undesired output. For example, for the combinations dataframe above and n=11, it outputs:

0,0,0
1,1,1
2,2,2
3,3,3
4,4,4
5,5,5

0,1,1
1,2,2
2,3,3
3,4,4
4,5,5

As you can see, the values of each variable remain evenly distributed, but the combinations are not as different as possible: for example, the second row (1,1,1) and the seventh (0,1,1) end with the same values and differ only in cat0.
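Counting pairwise Hamming distances over the two 11-row listings above hints at why this happens. With equal weights, `scalar_product` is just 0.33 times the Hamming distance, so the unweighted count is enough here. Both listings have the same total pairwise distance, 150, but the expected one contains 1 pair of rows that differ in a single variable, while the obtained one contains 5. This suggests that a criterion built only on summed distances may not separate the two. The helper names below are illustrative.

```python
from itertools import combinations


def hamming(a, b):
    # Number of variables in which two rows differ
    return sum(x != y for x, y in zip(a, b))


def distance_profile(rows):
    """Return (total pairwise distance, number of pairs differing in one variable)."""
    pair_distances = [hamming(a, b) for a, b in combinations(rows, 2)]
    near_pairs = sum(d == 1 for d in pair_distances)
    return sum(pair_distances), near_pairs


diagonal = [(v, v, v) for v in range(6)]
expected = diagonal + [(0, 1, 2), (1, 2, 3), (4, 5, 4), (2, 3, 0), (3, 4, 1)]
obtained = diagonal + [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 4, 4), (4, 5, 5)]

print(distance_profile(expected))  # (150, 1)
print(distance_profile(obtained))  # (150, 5)
```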

How can I correct this?

Thanks
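One possible direction, offered as an assumption and not taken from the original post, is a greedy selection that ranks candidates by balance first and by their minimum (rather than summed) distance to the already selected rows second. The sketch below uses unweighted Hamming distance, starts deterministically from the first row, and breaks remaining ties by row order; the function name `select_balanced_diverse` is hypothetical, and as a greedy heuristic it is not guaranteed to find an optimal selection.

```python
import itertools
from collections import Counter

import numpy as np
import pandas as pd


def select_balanced_diverse(combinations_df, n):
    """Greedy sketch: for each candidate, lower key wins.

    key = (cost, -min_dist), where cost is the sum over columns of how often
    the candidate's value was already chosen, and min_dist is its Hamming
    distance to the nearest selected row. Ties go to the earliest row.
    """
    values = combinations_df.to_numpy()
    rows = values.tolist()
    n_cols = values.shape[1]
    counts = [Counter() for _ in range(n_cols)]  # per-column value counts
    selected = []
    # Distance from each row to its nearest selected row (starts above any real one)
    min_dist = np.full(len(rows), n_cols + 1)

    def pick(i):
        selected.append(i)
        for c, v in enumerate(rows[i]):
            counts[c][v] += 1
        np.minimum(min_dist, (values != values[i]).sum(axis=1), out=min_dist)

    pick(0)  # assumption: start from the first combination
    while len(selected) < min(n, len(rows)):
        best_i, best_key = None, None
        for i, row in enumerate(rows):
            if min_dist[i] == 0:  # already selected (rows are unique)
                continue
            cost = sum(counts[c][v] for c, v in enumerate(row))
            key = (cost, -min_dist[i])
            if best_key is None or key < best_key:
                best_i, best_key = i, key
        pick(best_i)
    return combinations_df.iloc[selected]


variables = {f'cat{k}': list(range(6)) for k in range(3)}
combinations_df = pd.DataFrame(
    list(itertools.product(*variables.values())), columns=list(variables)
)
print(select_balanced_diverse(combinations_df, 11).to_numpy().tolist())
```

With n=6 this returns the diagonal rows (0,0,0) to (5,5,5). With n=11 it appends (0,1,2), (1,0,3), (2,3,0), (3,2,1) and (4,4,5): every column stays within one count of perfectly even, and only (4,4,4) and (4,4,5) differ in a single variable. Putting balance before distance is a design choice; ranking distance first with the same tie-breaking would pick (0,4,5) as the 11th row instead, which avoids any single-variable pair but uses value 0 three times in cat0 while 4 and 5 appear once.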
