kdb/q Modified Climbing hill algo

145 Views Asked by At

I have a question about the methods for doing the climbing hill algorithm with a specific problem I have.

I have 2 metrics: Sum_x and Sum_y, and I have about a hundred rows in this format:

CategoryA CategoryB x y
A 10 1 4
A 12 3 1
B 10 7 4
C 17 3 3

where the combination of CategoryA and CategoryB are unique. I need to find which rows are included in Sum_x and Sum_y.

Do you know how I can do this? My current approach is to code each row in binary, starting with all rows as 0, and change each value from 0 to 1 to sum the x and y column, plotting the sums of x and y and finding which matches.

Is this the right approach? I'm trying to write this in Q for now, but I think even Python will work for this.

2

There are 2 best solutions below

2
scottstein37 On BEST ANSWER

here's a solution using dynamic programming:

find_subsets_summing_to_value: { [p_list; p_sum]
    len:count p_list;
    // dp[x][y] = 1b iff any of the first x elements of p_list sum to y
    dp: #[1+len; enlist (1b , p_sum#0b)] {: .[y; z; :; y[z[0]-1][z[1]] | y[z[0]-1][z[1] - x[z[0]-1]]]; }[p_list]/ cross[1+ til count p_list; 1 + til p_sum];
    backtrack: { [p_dp; p_list; p_subsets; p_current_subset; p_idx; p_target]
        if[ (p_idx=0)&p_target=0; : p_subsets , enlist p_current_subset; ];
        if[ (p_idx=0)| p_target<0; : p_subsets; ];
        if[ p_dp[p_idx][p_target];
            p_subsets: .z.s[p_dp; p_list; p_subsets; p_current_subset , p_list[p_idx-1]; p_idx - 1; p_target - p_list[p_idx - 1]];
            ];
        : .z.s[p_dp; p_list; p_subsets; p_current_subset; p_idx - 1; p_target]
        };
    : backtrack[dp; p_list; (); (); len; p_sum];
    };

find_matches: { [p_tbl; p_sumX; p_sumY]
    : {distinct x where {y=sum x `y}[;y] each x}[;p_sumY] raze { {`CategoryA`CategoryB xasc y $[1=count x; enlist x; x]}[;y] each (cross/)group[y `x] x }[;p_tbl] each distinct find_subsets_summing_to_value[p_tbl `x; p_sumX];
    };

demo_find_matches: { [p_tbl; p_sumX; p_sumY]
    show "***************************************";
    show p_tbl;
    -1 "SUM_X: " , string[p_sumX];
    -1 "SUM_Y: " , string[p_sumY];
    show "***************************************";
    show each find_matches[p_tbl; p_sumX; p_sumY];
    -1 "\n\n";
    };

q)demo_find_matches[([] CategoryA:`A`A`B`C; CategoryB:10 12 10 17; x:1 3 7 3; y:4 1 4 3); 8; 8]
"***************************************"
CategoryA CategoryB x y
-----------------------
A         10        1 4
A         12        3 1
B         10        7 4
C         17        3 3
SUM_X: 8
SUM_Y: 8
"***************************************"
CategoryA CategoryB x y
-----------------------
A         10        1 4
B         10        7 4

q)demo_find_matches[([] CategoryA:`A`A`B`B`C`C; CategoryB:1 2 1 2 1 2; x:1 2 2 3 4 5; y:7 5 4 2 9 5); 4; 9]
"***************************************"
CategoryA CategoryB x y
-----------------------
A         1         1 7
A         2         2 5
B         1         2 4
B         2         3 2
C         1         4 9
C         2         5 5
SUM_X: 4
SUM_Y: 9
"***************************************"
CategoryA CategoryB x y
-----------------------
C         1         4 9
CategoryA CategoryB x y
-----------------------
A         1         1 7
B         2         3 2
CategoryA CategoryB x y
-----------------------
A         2         2 5
B         1         2 4
0
terrylynch On

Something like this could work for you but won't scale well

t:([]catA:`A`A`B`C;catB:10 12 10 17;x:1 3 7 3;y:4 1 4 3)

/your 8 8 example
t w where s=min s:sqrt sum{x*x}8 8-sum''[t[`x`y]@\:w:where'[(-1+count t)(01b cross)/01b]]
+`catA`catB`x`y!(`A`B;10 10;1 7;4 4)

/single row answer
t w where s=min s:sqrt sum{x*x}7 4-sum''[t[`x`y]@\:w:where'[(-1+count t)(01b cross)/01b]]
+`catA`catB`x`y!(,`B;,10;,7;,4)

/three row answer
t w where s=min s:sqrt sum{x*x}7 8-sum''[t[`x`y]@\:w:where'[(-1+count t)(01b cross)/01b]]
+`catA`catB`x`y!(`A`A`C;10 12 17;1 3 3;4 1 3)

/an example where there can be more than one solution
t w where s=min s:sqrt sum{x*x}3 2-sum''[t[`x`y]@\:w:where'[(-1+count t)(01b cross)/01b]]
+`catA`catB`x`y!(,`C;,17;,3;,3)
+`catA`catB`x`y!(,`A;,12;,3;,1)

Obviously you would turn this into a function that accepts sum_x and sum_y as input