Why does my binary search algorithm miss the optimal solution, and how can it be improved?

102 Views Asked by At

I am looking primarily for a theoretical answer, ideally one addressing my misunderstanding (below) about binary search algorithms. Practical examples are welcome as well.

I have a complex function where I want to do a binary search on that function to find the value of n, so that computed_goal matches a given goal, where goal is a previously-given value I am seeking to match. The algorithm below illustrates this.

l = 0, h = 4000; // low, and high bounds for binary search
while (l < h)
{
    n = (l + h) / 2;

    x = vars / n; // vars represents a more complex computation, x depends on n
        
    computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal depends on n
        
    if (computed_goal < goal)
        l = n + 1;
    else
        h = n - 1;

    if (abs(computed_goal - goal) / goal < 0.01) // check for relative error within 1%
        return n; // successfully matched goal value!
    else
        return 0; // iteration failed to match goal to desired precision
}

Now, I think you may already be seeing a problem if you are familiar with binary search algorithm, but I'm not super-verse on them. My problem does not always present itself but sometimes it is as follows:

Say my function gives me the following values for these n:

n = 2998: computed_goal = 967.8732568 (-2.1267432 = computed_goal - goal)
n = 2999: computed_goal = 968.6512540 (-1.3487460)
n = 3000: computed_goal = 969.4294677 (-0.5705323)
n = 3001: computed_goal = 970.2078977 (+0.2078977) <= This is the optimal solution
n = 3002: computed_goal = 970.9865442 (+0.9865442)

Now, suppose l = 2987, and h = 3015. This makes n = 3001. My goal is to match value of 970. Note that n = 3001 is the optimal solution because |computed_goal - goal| is the smallest in magnitude and it is the closest to the goal

However, my algorithm looks at it and goes computed_goal = 970.2078977 < goal = 970 = False, and sets new h to be n - 1 = 3000, thus excluding the optimal solution from any future iterations. It so happens that my algorithm finally incorrectly settles in on n = 2999, two steps away from optimal solution, here's a listing for reference:

   l    h      n    computed_goal    goal
2987 3015 = 3001, 970.207897725643 < 970
2987 3000 = 2993, 963.9865161108   < 970
2994 3000 = 2997, 967.095475944351 < 970
2998 3000 = 2999, 968.651254012028 < 970

Misunderstanding

I understand that I am checking but then excluding the optimal solution and because I'm not backtracking back to it, it gets lost and excluded...

I was told before that you should (always) exclude the endpoints that you've already checked, because you've already checked them, so it does not make sense to include them in future search bounds. So like in this case I've checked 3001, and in any future iterations I exclude that value from further checks. I am guessing that is incorrect, and I am guessing I must include them(?) Can you help me with theory part of this?

Attempt to fix. I should note as well that when I change my if statement to include end points, like so:

   if (computed_goal < goal)
        l = n;
    else
        h = n;

Algorithm keeps executing indefinitely in an infinite look with l = 3000 and h = 3001. I am not sure how to go around that. I could change my while to while(l + 1 < h), which will break the infinite loop, but it looks off to me, and I am concerned that I may be introducing a different error into the code, and I don't have enough insight to say if I should be using the l + 1 < h loop condition.

Function Sketch

enter image description here

Code

function matchGoal(int $goal = 970): int
{
    $l = 0;
    $h = 3900;

    while ($l < $h)
    {
        $n = intdiv($h + $l, 2);
        $x = 8321100 / $n;
        $factor = 5.113233695 * pow($n / 3000, 2);

        $computed_goal = (((-8.16965E-10 * $x + -3.15457E-06) * $x + 0.008437163) * $x + 207.9079475) * $factor - 0.076934117;

        if ($computed_goal < $goal)
            $l = $n + 1;
        else
            $h = $n - 1;
    }

    $relative_error = 100 * abs($computed_goal - $goal) / $goal;
    return ($relative_error < 1) ? $n : 0;
}

// outputs 2999, when 3001 is optimal
print matchGoal() . PHP_EOL; 
3

There are 3 best solutions below

2
Gassa On BEST ANSWER

The binary search would like a monotonic binary predicate to work.
A binary predicate is a function with two possible values: true and false.
A monotonic binary predicate is true up to some point and false after, or vice versa.

Suppose the problem is "find the X such that value f(X) is closest to Y".
Further, suppose f(X) is increasing.

First, we reformulate the problem as one with a binary predicate, for example, "find the smallest X such that f(X) >= Y". Or "find the largest X such that f(X) < Y". Now, these are problems solvable by binary search. With the intent of, after solving this simpler problem, return to the original problem in some simple final step: "check X and X-1" or "check X and X+1", respectively, and pick the best of the two.

Next, I'd like to show an asymmetric version of binary search, instead of one excluding endpoints from both sides. Binary search doesn't have to work this way, but this may show the method from a different point of view than you already have, which may help understanding.

Let us solve the "find the largest X such that f(X) < Y" version using binary search. Formally, the value of the binary predicate pred(X) is true if f(X) < Y and false if f(X) >= Y. Now, it would be convenient to maintain a semi-interval [lo, hi), such that pred(lo) = true and pred(hi) = false. Initially, lo is the inclusive initial lower bound, and hi is the exclusive initial upper bound. Check the middle value me = (lo + hi) / 2; round to either side, this version will work either way. If pred(me) = true, then [me, hi) is the new interval. If pred(me) = false, then [lo, me) is the new interval. Proceed until lo + 1 = hi: this means the only point in [lo, hi) is lo, and it is the answer (to our binary search subproblem, not to the original problem).

0
StriplingWarrior On

I was told before that you should (always) exclude the endpoints that you've already checked, because you've already checked them, so it does not make sense to include them in future search bounds. So like in this case I've checked 3001, and in any future iterations I exclude that value from further checks. I am guessing that is incorrect, and I am guessing I must include them(?) Can you help me with theory part of this?

That would only be correct if "checking" an "endpoint" means you verify that it's not the right answer.

Think about looking up a word in a real-world dictionary. You can start by opening the book to the center and checking whether the word is above or below that point in the dictionary, then discard the half which you know doesn't have your word, bisect the remainder, and repeat the process until you've found the page and column that should have your word. But if you open the dictionary and the word at the top of the page happens to be the word you're looking for, you don't exclude that word: you short-circuit the whole process and say you've won.

Now, what if the word isn't in the dictionary, but your job is to find the word nearest to it? You can't discard a word just because it's not the word you're looking for. You'd probably need to keep track of the closest endpoints you've encountered above and below the word you're searching for, and when you get to a point where there are no words between them then see which of those two words is "closest" to the one you're looking for.

0
Jean-Baptiste Yunès On

You may always test for the base case first (a safe way to do it), so:

while (l <= h)
 {
     n = (l + h) / 2;

     x = vars / n; // vars represents a more complex computation, x depends on n
     
     computed_goal = ax^4 + bx^3 + cx^2 + dx + c; // computed_goal depends on n
     
     // base case reached ?
     if (abs(computed_goal - goal) / goal < 0.01)
         return n; // successfully matched goal value!

     // goal not reached, recurse on one of the halves
     if (computed_goal < goal)
         l = n + 1;
     else
         h = n - 1;
 }

You may also use the follwing pattern (from Wikipedia), note the if else if... and the L<=R:

function binary_search(A, n, T) is
    L := 0
    R := n − 1
    while L ≤ R do
        m := floor((L + R) / 2)
        if A[m] < T then
            L := m + 1
        else if A[m] > T then
            R := m − 1
        else:
            return m
    return unsuccessful

Be careful to ensure that your goal is always reached, if not you will end the while and must return something!