Efficient way of approaching the Subset Sum Problem with very large input sets

137 Views Asked by At

The problem I am facing: I need to find a way to deal with very large sets (3 to 10000000) of positive and negative ints, this seemed relatively impossible based off of previous experiments. However, I received hope when I found a Algorithm on github that is really efficient.

However, I really need to adjust it to work with positive and negative numbers... but I am struggling, I know the unsigned int's should be int. but that's all I've got so far.

#include<stdio.h>
#include<stdlib.h>
#include<time.h>

short* flag;
int N, K, correspond = 0;
unsigned int* check, X = 0;
clock_t t1, t2;

void init() {
    int i, j;

    printf("N=");
    scanf_s("%d", &N);

    check = malloc(sizeof(unsigned int) * N);
    if (check == NULL) {
        perror("Out of memory");
        exit(-1);
    }

    srand((unsigned)time(NULL));
    printf("\n///check list///\n");
    for (i = 0; i < N; i++) {
        check[i] = rand() % 1000000 + 1;
        printf("%uyen ", check[i]);
    }
    printf("\n");

    K = rand() % N;

    flag = malloc(sizeof(short) * N);
    for (i = 0; i < N; i++)flag[i] = 0; 
    i = 0;
    while (i <= K) {
        j = rand() % N;
        if (flag[j] == 0) {
            flag[j] = 1;
            X = X + check[j];
            i++;
        }
    }
    printf("\nX=%uyen\n", X);
}

void swap(int j, int k) {
    unsigned int tmp;

    tmp = check[j];
    check[j] = check[k];
    check[k] = tmp;
}

int partition(int left, int right) {
    int j = left, k = right;
    unsigned int v;

    v = check[(left + right) / 2];
    do {
        while (check[j] > v) j++;
        while (v > check[k]) k--;
        swap(j, k);
    } while (check[j] != check[k]);

    return j;
}

void quicksort(int left, int right) {
    int j;

    if (left < right) {
        j = partition(left, right);
        quicksort(left, j - 1);
        quicksort(j + 1, right);
    }
}

void func(unsigned int sum, int i) {
    int j, k, t = 0;

    if (sum == X) {
        correspond = 1;
        t2 = clock();
        double record = (double)(t2 - t1) / CLOCKS_PER_SEC;

        printf("\nAnswer : ");
        for (k = 0; k < N; k++) {
            if (flag[k] == 1) {
                if (t == 0) t = 1;
                else if (t == 1) printf("+");
                printf("%u", check[k]);
            }
        }
        printf("\n\nThinking time : %f sec . \n", record);
        if (record <= 60) printf("Triumph!\n");
        else printf("Failure...\n");
        return;
    }
    else if (sum < X) {
        for (j = i + 1; (j <= N) && (correspond == 0); j++) {
            flag[j] = 1;
            func(sum + check[j], j);
        }
    }
    flag[i] = 0;
    return;
}

int main() {
    int i;

    init();
    t1 = clock();
    for (i = 0; i < N; i++)flag[i] = 0;
    quicksort(0, N);
    func(0, 0);

    return 0;
}

EDITS: Thanks for all of your inputs, it does help to get some constructive criticism. To start off here is the to the Github Repo https://github.com/parthnan/SubsetSum-BacktrackAlgorithm credit goes to Parth Shirish Nandedkar. The name of this Algorithm is "Amortized O(n) algorithm based on Recursive Backtracking" I am not really sure why it would be called "Amortized" as this would mean it divides the input array into multiple subset and use polynomial-time algorithm on each one.

**I have tried to fix the issues mentioned by ** "chux - Reinstate Monica"... please let me know if I did it incorrectly.

1

There are 1 best solutions below

1
chux - Reinstate Monica On

seemed relatively impossible based off of previous experiments.

First fix known problems.


At least these problems:

Out of range access

flag = malloc(sizeof(short) * (N + 1)); allocates such that code can access flag[0] to flag[N].

for (k = 1; k <= N + 1; k++) { if (flag[k] == 1) { attempts to access flag[N+1]. Result undefined behavior (UB).

Mis-matched printf

warning: format '%ld' expects argument of type 'long int', but argument 2 has type 'unsigned int' [-Wformat=]

More undefined behavior (UB).

printf("%ldyen ", check[i]);
printf("\nX=%ldyen\n", X);

Allocation success

Since the goal is "very large sets (3 to 10000000)", code definitely should check allocation success to save debug time.

check = malloc(sizeof(unsigned int) * (N + 1));
if (check == NULL) {
  perror("Out of memory");
  exit -1;
}

Side issues:

Code uses non-idiomatic array access

Code indexes [1...N]. More common to use [0...N-1].

Heavy use of global variables

More common to local variables, passing data as needed on function arguments.

"deal with ... positive and negative ints"

Fix that before posting - or if not important, no need to mention it here.