Find the minimum sum of squares of a set partition into k clusters


Problem

Given a set of n positive integers, partition them into k subsets so that the sum of the squares of the subset sums is as small as possible. For example, with the set [1, 2, 3] and k = 2, the solution is [1, 2] and [3]. The square of the first subset's sum is (1+2)^2 = 9, and the square of the second subset's sum is 3^2 = 9. The total is 9 + 9 = 18, which is the minimum.
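One way to see what the objective rewards: write s_1, ..., s_k for the subset sums and T for the total of all n numbers a_i, which is the same for every partition. Expanding the squared deviations gives Σ(s_j - T/k)^2 = Σ s_j^2 - 2(T/k)T + k(T/k)^2 = Σ s_j^2 - T^2/k, so the objective equals a fixed term plus the spread of the subset sums around their mean. Minimizing it is therefore the same as making the subset sums as equal as possible, and T^2/k is a lower bound that is reached only when every s_j equals T/k (as in the [1, 2, 3] example, where T = 6 and both sums are 3).

```latex
\sum_{j=1}^{k} s_j^2 \;=\; \frac{T^2}{k} \;+\; \sum_{j=1}^{k}\left(s_j - \frac{T}{k}\right)^2,
\qquad T = \sum_{j=1}^{k} s_j = \sum_{i=1}^{n} a_i

% Example: a = [1, 2, 3], k = 2, T = 6, s_1 = s_2 = 3
18 = \frac{36}{2} + (3-3)^2 + (3-3)^2
```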

sample input

n=10, k=2 [63230795, 3521578, 37513838, 37860789, 30498450, 29795141, 41263743, 5815341, 19046274, 20919844] -> 41895269854617569

n=10, k=5 [42566460, 61080136, 12375813, 29881559, 61767889, 60645182, 22105410, 17262225, 34309213, 38950048] -> 29098109328960071

constraints

  • 1≤N≤20
  • 1≤K≤10
  • The numbers in the set are all positive. You need to use uint64_t for arithmetic.

my code

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <limits.h>
#include <stdint.h>

bool used[20] = {0};
int n, m;
uint64_t arr[20], min = UINT64_MAX;
int find(int nset, uint64_t sum);
int subset(uint64_t subsum, int cur, int sum, int nset){
    if (cur == n){
        find(nset+1, sum+subsum*subsum);
        return 0;
    }
    subset(subsum, cur+1, sum, nset);
    if (!used[cur]){
        used[cur] = 1;
        subset(subsum+arr[cur], cur+1, sum, nset);
        used[cur] = 0;
    }
    return 0;
}
int find(int nset, uint64_t sum){
    if (sum >= min)
        return 0;
    if (nset == m-1){
        uint64_t setsum = 0;
        for (int i = 0; i < n; i++)
            if (!used[i])
                setsum += arr[i];
        sum += setsum*setsum;
        if (sum < min)
            min = sum;
        return 0;
    }else{
        subset(0, 0, sum, nset);
        return 0;
    }
}
int main(){
    scanf("%d %d", &n, &m);
    for (int i = 0; i < n; i++)
        scanf("%llu", &arr[i]);
    uint64_t z = 0;
    find(0, z);
    printf("%llu", min);
}

My idea is a brute-force search: compute the square of one subset's sum, then move on to the next subset, with simple pruning when the current partial result is already larger than the best answer so far. But the output is wrong. Am I missing something? Thank you for answering.
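A minimal sketch, not a confirmed diagnosis: one concrete bug in the posted code is that subset() declares its running total as `int sum`, while find() passes it a uint64_t. For k = 2, find() only ever hands subset() a total of 0, but for k ≥ 3 it passes partial sums of squares far above INT_MAX, which get truncated. That would explain a wrong answer on the k = 5 sample. Below is the same search with `sum` widened to uint64_t, wrapped in a hypothetical function min_sum_of_squares() so it can be called without reading stdin; otherwise it follows the question's logic. Separately, `%llu` only matches uint64_t on platforms where uint64_t is unsigned long long; `SCNu64` and `PRIu64` from <inttypes.h> are the portable format macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same globals as the question's code; m is the number of subsets (k). */
static bool used[20];
static int n, m;
static uint64_t arr[20], best;

static void find(int nset, uint64_t sum);

/* Choose which still-unused elements form subset number nset.
   The fix: sum is uint64_t here too, so it is no longer truncated. */
static void subset(uint64_t subsum, int cur, uint64_t sum, int nset) {
    if (cur == n) {
        find(nset + 1, sum + subsum * subsum);
        return;
    }
    subset(subsum, cur + 1, sum, nset);          /* leave arr[cur] out */
    if (!used[cur]) {
        used[cur] = true;                        /* put arr[cur] in */
        subset(subsum + arr[cur], cur + 1, sum, nset);
        used[cur] = false;
    }
}

static void find(int nset, uint64_t sum) {
    if (sum >= best)
        return;                                  /* cannot beat the best answer */
    if (nset == m - 1) {                         /* last subset takes the rest */
        uint64_t rest = 0;
        for (int i = 0; i < n; i++)
            if (!used[i])
                rest += arr[i];
        sum += rest * rest;
        if (sum < best)
            best = sum;
        return;
    }
    subset(0, 0, sum, nset);
}

/* Hypothetical wrapper: resets the globals and runs the search. */
uint64_t min_sum_of_squares(const uint64_t *a, int count, int k) {
    assert(count >= 1 && count <= 20 && k >= 1 && k <= 10);
    n = count;
    m = k;
    for (int i = 0; i < n; i++) {
        arr[i] = a[i];
        used[i] = false;
    }
    best = UINT64_MAX;
    find(0, 0);
    return best;
}
```

This search still visits each partition once per ordering of its subsets (and also tries empty subsets), so it may be slow near n = 20, k = 10; the two samples from the question are a reasonable check.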


Answer by bruce (accepted)

Sol

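#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <limits.h>
#include <stdint.h>
#include <inttypes.h>   // SCNu64 / PRIu64 for reading and printing uint64_t

uint64_t befsq[10] = {0}, arr[20], min = UINT64_MAX, avg;  // befsq[i]: current sum of subset i
int n, m, len[10] = {0};                                    // len[i]: number of elements in subset i
int find(int cur){
    // cost of the current (partial) assignment: sum of squared subset sums
    uint64_t s = 0;
    for (int i = 0; i < m; i++) s += befsq[i]*befsq[i];
    if (s >= min) return 0;
    if (cur == n){
        min = s;
        return 0;
    }
    for (int i = 0; i < m; i++){
        // cut 1: skip a subset whose sum is already above the average
        if (befsq[i] > avg)
            continue;
        // cut 2: skip it if arr[cur] would overshoot the average by more than the current shortfall
        if (befsq[i]+arr[cur] > avg){
            if (befsq[i]+arr[cur]-avg > (avg-befsq[i]))
                continue;
        }
        len[i]++;
        befsq[i] += arr[cur];
        find(cur+1);
        befsq[i] -= arr[cur];
        len[i]--;
        // subset i was empty before arr[cur] was tried, and so are all later ones: stop here
        if (!len[i]) return 0;
    }
    return 0;
}

int main(){
    scanf("%d %d", &n, &m);
    for (int i = 0; i < n; i++)
        scanf("%" SCNu64, &arr[i]), avg += arr[i];
    avg /= m;
    find(0);
    printf("%" PRIu64, min);
}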

Finally, I came up with a solution. The idea is to try assigning every element to every subset, again with some "cuts" to shrink the search tree. The cut here is that the closer each subset's sum is to the average, the smaller the total. Also, the more subsets are used, the smaller the total, so as soon as a subset turns out to have no element, the function returns directly. These come from my own observations, and I am not sure they hold for every similar problem. I hope someone can confirm whether my idea is correct. Thanks.
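The average-based cuts are not always safe. Take n = 4, k = 3 with input 100 1 1 1: avg = 103/3 = 34 (integer division), and putting 100 into any empty subset fails cut 2 (100 - 34 = 66 > 34 - 0), so find(0) never reaches cur == n and min stays UINT64_MAX, while the true minimum is 100^2 + 2^2 + 1^2 = 10005. The first prune (stop once the current sum of squares is already ≥ min) is safe, because adding a positive number to a subset can only increase its square. The early return on an empty subset is also safe, but for a symmetry reason: empty subsets are interchangeable, so only the first one needs to be tried. Below is a hedged branch-and-bound sketch that keeps just those two ideas and sorts the values in decreasing order so good solutions tend to be found early. The function names are hypothetical, and worst-case speed at n = 20, k = 10 is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_N 20
#define MAX_K 10

static uint64_t vals[MAX_N], sums[MAX_K], best;
static int cnt, parts;

/* qsort comparator: larger values first */
static int desc(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/* idx: next value to place; used: subsets that are non-empty so far;
   cost: sum of squares of the current subset sums */
static void place(int idx, int used, uint64_t cost) {
    if (cost >= best)
        return;                     /* cost never decreases, so this branch can't win */
    if (idx == cnt) {
        best = cost;
        return;
    }
    uint64_t x = vals[idx];
    /* every non-empty subset, plus only the first empty one (empties are interchangeable) */
    int limit = used < parts ? used + 1 : parts;
    for (int j = 0; j < limit; j++) {
        uint64_t extra = 2 * sums[j] * x + x * x;   /* (s + x)^2 - s^2 */
        sums[j] += x;
        place(idx + 1, j == used ? used + 1 : used, cost + extra);
        sums[j] -= x;
    }
}

uint64_t min_sum_of_squares_bb(const uint64_t *a, int n, int k) {
    assert(n >= 1 && n <= MAX_N && k >= 1 && k <= MAX_K);
    cnt = n;
    parts = k;
    for (int i = 0; i < n; i++) vals[i] = a[i];
    for (int j = 0; j < k; j++) sums[j] = 0;
    qsort(vals, (size_t)n, sizeof vals[0], desc);
    best = UINT64_MAX;
    place(0, 0, 0);
    return best;
}
```

For example, min_sum_of_squares_bb on {100, 1, 1, 1} with k = 3 returns 10005, the case the average cuts miss.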

Answer by Slazaa

I have tried writing a solution myself and I came up with this.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define FIRST_SET_LEN 10
#define SECOND_SET_LEN 10

uint64_t min_sum_square_subsets(const uint64_t* set, size_t set_len, uint32_t subset_count) {
    const size_t subset_len = set_len / subset_count;

    uint64_t result = 0;
    uint64_t curr_subset_sum = 0;

    for (size_t i = 0; i < set_len; i++) {
        if (i % subset_len == 0) {
            result += curr_subset_sum * curr_subset_sum;
            curr_subset_sum = 0;
        }

        curr_subset_sum += set[i];
    }

    result += curr_subset_sum * curr_subset_sum;

    return result;
}

int main(void) {
    const uint32_t first_subset_count = 2;

    const uint64_t first_set[FIRST_SET_LEN] = {
        63230795, 3521578,
        37513838, 37860789,
        30498450, 29795141,
        41263743, 5815341,
        19046274, 20919844
    };

    const uint64_t first_result = min_sum_square_subsets(first_set, FIRST_SET_LEN, first_subset_count);

    printf("First minimum sum of squares of subsets: %llu\n", (unsigned long long)first_result);
    printf("First expected result: %llu\n", 41895269854617569ULL);

    printf("----------\n");

    const uint32_t second_subset_count = 5;

    const uint64_t second_set[SECOND_SET_LEN] = {
        42566460, 61080136,
        12375813, 29881559,
        61767889, 60645182,
        22105410, 17262225,
        34309213, 38950048
    };

    const uint64_t second_result = min_sum_square_subsets(second_set, SECOND_SET_LEN, second_subset_count);

    printf("Second minimum sum of squares of subsets: %llu\n", (unsigned long long)second_result);
    printf("Second expected result: %llu\n", 29098109328960071ULL);

    return 0;
}

The results don't match the expected ones, but I hope it can still help you.

Answer by Mark Adler

For a brute-force approach, you first have to find all partitions of the integer n into k parts (the possible subset sizes). The partitions of 10 into 2 parts are {{9, 1}, {8, 2}, {7, 3}, {6, 4}, {5, 5}}. The partitions of 10 into 5 parts are {{6, 1, 1, 1, 1}, {5, 2, 1, 1, 1}, {4, 3, 1, 1, 1}, {4, 2, 2, 1, 1}, {3, 3, 2, 1, 1}, {3, 2, 2, 2, 1}, {2, 2, 2, 2, 2}}. Those can be generated with a simple recursive function.
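A minimal sketch of such a recursive generator, assuming the parts are emitted in non-increasing order so each partition appears exactly once (gen_partitions and print_partition are hypothetical names). Each step picks the next part p no larger than the previous one, and stops early once even k copies of p could not add up to what is left.

```c
#include <assert.h>
#include <stdio.h>

/* Print one partition as {a, b, ...}. */
void print_partition(const int *parts, int len) {
    for (int i = 0; i < len; i++)
        printf(i ? ", %d" : "{%d", parts[i]);
    printf("}\n");
}

/* Enumerate partitions of n into exactly k parts, each part <= max_part,
   written into parts[depth...]. Calls visit (if non-NULL) for each complete
   partition and returns how many were found. */
int gen_partitions(int n, int k, int max_part, int *parts, int depth,
                   void (*visit)(const int *parts, int len)) {
    assert(k >= 0);
    if (k == 0) {
        if (n != 0)
            return 0;
        if (visit)
            visit(parts, depth);
        return 1;
    }
    int count = 0;
    /* the next part must leave at least 1 for each of the other k - 1 parts */
    int start = n - (k - 1);
    if (start > max_part)
        start = max_part;
    for (int p = start; p >= 1; p--) {
        if (p * k < n)
            break;                  /* k parts of size <= p cannot reach n */
        parts[depth] = p;
        count += gen_partitions(n - p, k - 1, p, parts, depth + 1, visit);
    }
    return count;
}
```

With `int parts[10];`, calling `gen_partitions(10, 5, 10, parts, 0, print_partition)` should print the seven 5-part partitions listed above, in the same order, and return 7.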

For each partition, generate all distinct ways of filling subsets of those sizes, and evaluate the objective function for each to find the minimum. E.g. for {4, 2, 2, 1, 1}, start with the 4 and generate all 10!/(4!6!) = 210 subsets of size 4. For each of those, from the 6 remaining elements generate all 6!/(2!2!2!2!) = 45 distinct unordered pairs of 2-element subsets. The 2 elements left over go into the two remaining slots of size 1. That makes a total of 210*45 = 9450 arrangements to test for that partition.
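The count for one size pattern can also be computed directly: n! divided by the factorial of every subset size, then divided again by m! for each size that occurs m times, because subsets of equal size are interchangeable. For {4, 2, 2, 1, 1} that is 10!/(4!·2!·2!·1!·1!·2!·2!) = 9450, the same as 210*45. A small sketch (arrangements is a hypothetical name; it assumes n ≤ 20 so that n! fits in uint64_t):

```c
#include <assert.h>
#include <stdint.h>

static uint64_t factorial(int n) {
    uint64_t f = 1;
    for (int i = 2; i <= n; i++)
        f *= (uint64_t)i;           /* exact for n <= 20 */
    return f;
}

/* sizes[0..k-1]: one partition in non-increasing order, e.g. {4, 2, 2, 1, 1}.
   Returns the number of distinct ways to split n labeled elements into
   unlabeled subsets with exactly these sizes. */
uint64_t arrangements(const int *sizes, int k) {
    int n = 0;
    for (int i = 0; i < k; i++)
        n += sizes[i];
    assert(n <= 20);
    uint64_t denom = 1;
    for (int i = 0; i < k; i++)
        denom *= factorial(sizes[i]);
    /* equal-sized subsets are interchangeable: divide by (multiplicity)! */
    for (int i = 0; i < k; ) {
        int j = i;
        while (j < k && sizes[j] == sizes[i])
            j++;
        denom *= factorial(j - i);
        i = j;
    }
    return factorial(n) / denom;    /* denom always divides n! */
}
```

Summing arrangements() over the five 2-part partitions of 10 (10 + 45 + 120 + 210 + 126) gives 511.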

The total number of arrangements over all seven 5-part partitions of 10 is only 42525, and over all five 2-part partitions of 10 it is 511. The examples given are therefore easily handled by this brute-force approach. I suspect, though, that there is a more economical search, which would be better suited to larger vectors, where the number of arrangements grows exponentially.
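The totals 511 and 42525 are Stirling numbers of the second kind, S(10, 2) and S(10, 5): the number of ways to split n labeled items into k non-empty unlabeled subsets. A short sketch of the standard recurrence (the n-th item either starts a new subset or joins one of the k existing ones) shows how fast the search space grows toward the stated limits; try stirling2(20, 10). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_N 20

/* S(n, k) via S(i, j) = j * S(i-1, j) + S(i-1, j-1), with S(0, 0) = 1.
   Returns 0 when k > n. Values for n <= 20 fit in uint64_t. */
uint64_t stirling2(int n, int k) {
    uint64_t S[MAX_N + 1][MAX_N + 1] = {{0}};
    assert(n >= 0 && n <= MAX_N && k >= 0 && k <= MAX_N);
    S[0][0] = 1;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= i; j++)
            S[i][j] = (uint64_t)j * S[i - 1][j] + S[i - 1][j - 1];
    return S[n][k];
}
```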

The answers are:

{{3521578, 19046274, 20919844, 37860789, 63230795}, {5815341, 29795141, 30498450, 37513838, 41263743}}

and

{{12375813, 61767889}, {17262225, 61080136}, {22105410, 60645182}, {29881559, 42566460}, {34309213, 38950048}}
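A quick way to check that these partitions reproduce the expected outputs is to label each number with its subset, sum each subset, and add the squares. A minimal sketch (partition_cost is a hypothetical name; it assumes at most 10 subsets, matching the constraints):

```c
#include <assert.h>
#include <stdint.h>

/* group[i] in [0, k) is the subset that vals[i] belongs to.
   Returns the objective: the sum over subsets of (subset sum)^2. */
uint64_t partition_cost(const uint64_t *vals, const int *group, int n, int k) {
    uint64_t sums[10] = {0};
    assert(k >= 1 && k <= 10);
    for (int i = 0; i < n; i++) {
        assert(group[i] >= 0 && group[i] < k);
        sums[group[i]] += vals[i];
    }
    uint64_t cost = 0;
    for (int j = 0; j < k; j++)
        cost += sums[j] * sums[j];
    return cost;
}
```

For the first answer, list the ten numbers in the order shown and label the first five 0 and the last five 1; the result should equal the expected 41895269854617569.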