Setting priority queue values to optimize the probability of finding a 'gift'

Question

Asked by Philip H on 01 September 2021 at 15:36 (200 views)

I have a priority queue of "door numbers". I get the next door number from the priority queue (i.e. the door with the lowest corresponding priority value), and then open the door. Behind the door, there may be a gift or not. Based on the presence / absence of a gift, I update the priority for this door number, and put it back into the priority queue. I then repeat, getting the next door number to open, and so on.

Assuming every door has a different gift-replenishment rate (i.e. some may get a new gift daily, others never at all), how should I update the priority values in order to maximize the number of gifts I find? That is, I want to maximize the ratio of doors I open with gifts to doors I open without gifts.

I should note that replenishment rates are not guaranteed to be fixed over time / there is random variation. But I'm okay with simplifying assumptions here.

This almost seems like a Monte-Carlo problem to me, except that the more often I explore a node (door), the lower its expected value. (And of course there's no tree to build; we only need to figure out the value of depth-1 nodes.)

The most trivial way is to keep track of last priority (LP) and current priority (CP), with delta = CP - LP. If we find a gift, set the next priority NP = CP + delta - 1; otherwise set NP = CP + delta + 1. This works I guess, but seems rather slow in its optimization.

Or we could have a multiplicative value instead: NP = CP + delta * shrink or NP = CP + delta * grow, where shrink < 1 and grow > 1. This is what I currently have, and it seemed to work fine for months, but now I'm getting the situation where some doors are being opened back-to-back (i.e. open door D, found gift, put back on priority queue, D is now best choice again, no gift found of course, now put back on queue with worse priority) which seems pretty bad. For reference, I used shrink = 0.9 and grow = 1.3.
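To make the multiplicative rule concrete, here is a minimal sketch of it as described above. The function name `next_priority` and the starting values are made up for illustration. It shows one way the back-to-back effect could arise: after a run of gifts, delta shrinks geometrically, so the new priority may land only slightly behind the current one and the door can come up again almost immediately.

```python
def next_priority(last_priority, current_priority, found_gift,
                  shrink=0.9, grow=1.3):
    """Multiplicative rule: NP = CP + delta * shrink (gift) or delta * grow (no gift)."""
    delta = current_priority - last_priority
    factor = shrink if found_gift else grow
    return current_priority + delta * factor


# Hypothetical door that starts with a gap of 10 between visits
# and then yields a gift 20 times in a row.
lp, cp = 0.0, 10.0
for _ in range(20):
    lp, cp = cp, next_priority(lp, cp, found_gift=True)

gap = cp - lp  # equals 10 * 0.9**20, i.e. the gap keeps shrinking
print(f"gap between the last two priorities: {gap:.3f}")
```

Nothing in the rule ties the gap to how long a gift actually takes to reappear, which is why a time-based model may behave better here.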

Is there a math formula (as with Monte-Carlo) expressing the optimal way to explore doors?

**David Eisenstat** · Answer 1 · 2021-09-01T20:46:05.050000

Multi-armed bandit theory runs deep and is not my specialty, so there's probably a reference that I don't know about. That being said, my first instinct is:

1. Simplify the math with the spherical-cow assumption that, for each door, the replenishment time is exponentially distributed with some unknown rate that stays constant over time.
2. Separate out our estimate of the replenishment rate from the history.
3. Set the priority of each door to 1 − exp(−λx) where λ is the estimated replenishment rate and x is the time since we last opened the door. (Higher is better.)
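As a rough sketch of that priority, assuming each door is stored as a pair of (estimated rate, time last opened), something like the following would pick the next door. The names `gift_probability` and `best_door` and the sample numbers are illustrative only.

```python
import math


def gift_probability(rate, elapsed):
    """Priority 1 - exp(-rate * elapsed): chance a gift has appeared since the last visit."""
    # expm1 keeps precision when rate * elapsed is very small.
    return -math.expm1(-rate * elapsed)


def best_door(doors, now):
    """doors maps door number -> (estimated rate, time last opened)."""
    return max(
        doors,
        key=lambda d: gift_probability(doors[d][0], now - doors[d][1]),
    )


# Hypothetical doors; rates are per day, times are in days.
doors = {1: (1.0, 9.5), 2: (0.1, 0.0), 3: (0.5, 6.0)}
choice = best_door(doors, now=10.0)
print(choice)
```

Note that a door opened a moment ago has elapsed time near zero and therefore priority near zero, so this rule should not reopen the same door back-to-back.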

Multi-armed bandits typically have to balance exploration with exploitation, but my hunch here is that we'll get this naturally from the replenishment process.

Most of the technical detail is in doing the estimate. We have a bunch of examples (x, b) where x is the time since we last opened the door and b is whether there was a gift. For a given rate λ, the formula above for the priority gives the expected value of b. I'll suggest a maximum likelihood estimator for λ. This means maximizing the sum of log(exp(−λx)) = −λx over all (x, 0) examples plus the sum of log(1 − exp(−λx)) over all (x, 1) examples. This function can be optimized directly, but there are two issues:

1. The more times we open a door, the more expensive the optimization gets.
2. If a door has no positive examples or no negative examples, then the solution is degenerate (the best λ runs off to infinity or to zero). Probably we should require λ to be at least once a month or something to avoid giving up on a door entirely.
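For reference, a direct optimization could look like the sketch below. It assumes every elapsed time is positive and uses a plain ternary search, which is adequate because the log-likelihood is concave in λ. The bounds `min_rate` and `max_rate` are arbitrary placeholders; they also act as the guard against the degenerate case, since an all-negative history simply ends up at `min_rate` and an all-positive one drifts toward `max_rate`.

```python
import math


def log_likelihood(rate, examples):
    """examples is a list of (elapsed, found_gift) pairs with elapsed > 0."""
    total = 0.0
    for elapsed, found_gift in examples:
        if found_gift:
            total += math.log(-math.expm1(-rate * elapsed))  # log(1 - exp(-rate * x))
        else:
            total += -rate * elapsed                          # log(exp(-rate * x))
    return total


def estimate_rate(examples, min_rate=1.0 / 30.0, max_rate=24.0, iterations=200):
    """Maximize the log-likelihood over [min_rate, max_rate] by ternary search."""
    lo, hi = min_rate, max_rate
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, examples) < log_likelihood(m2, examples):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Hypothetical history: visited after 1 day four times, gift found three times.
examples = [(1.0, True)] * 3 + [(1.0, False)]
rate = estimate_rate(examples)
print(rate)
```

Each call rescans the full history, which is the cost concern from the first point above.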

What I would actually recommend is picking a small set of λ values to make this a discrete optimization problem.
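One possible shape for the discrete version: keep a running log-likelihood per candidate rate for each door, so an observation costs a fixed amount of work and no history needs to be stored. The candidate rates below (per day) and the class name `DoorEstimate` are assumptions chosen for illustration, not values from the answer.

```python
import math

# Hypothetical candidates: monthly, weekly, every other day, daily.
CANDIDATE_RATES = (1.0 / 30.0, 1.0 / 7.0, 0.5, 1.0)


class DoorEstimate:
    """Running log-likelihood of each candidate rate for one door."""

    def __init__(self, rates=CANDIDATE_RATES):
        self.rates = list(rates)
        self.log_lik = [0.0] * len(self.rates)

    def observe(self, elapsed, found_gift):
        """Fold in one (elapsed, found_gift) example; assumes elapsed > 0."""
        for i, rate in enumerate(self.rates):
            if found_gift:
                self.log_lik[i] += math.log(-math.expm1(-rate * elapsed))
            else:
                self.log_lik[i] += -rate * elapsed

    def best_rate(self):
        """Most likely candidate; ties (e.g. no data yet) go to the first one listed."""
        i = max(range(len(self.rates)), key=self.log_lik.__getitem__)
        return self.rates[i]


lucky = DoorEstimate()
for _ in range(5):
    lucky.observe(1.0, True)

empty = DoorEstimate()
for _ in range(5):
    empty.observe(1.0, False)

print(lucky.best_rate(), empty.best_rate())
```

Because the smallest candidate is still positive, a door that has never produced a gift keeps a nonzero priority and will eventually be retried.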

(Another potential problem is that the priority formula could be inefficient for many doors. What you could do instead is pick a target threshold for priority and then calculate when the priority will exceed that threshold.)
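Solving 1 − exp(−λx) = p for x gives x = −ln(1 − p) / λ, so each door can be stored in an ordinary min-heap keyed by the time at which it becomes worth opening. A small sketch, where the threshold of 0.5 and the door data are made up:

```python
import heapq
import math


def time_until_threshold(rate, threshold):
    """Elapsed time x at which 1 - exp(-rate * x) reaches the threshold (0 < threshold < 1)."""
    return -math.log1p(-threshold) / rate


def build_schedule(doors, threshold=0.5):
    """doors maps door number -> (estimated rate, time last opened).

    Returns a min-heap of (due time, door number); the lowest due time pops first.
    """
    heap = [
        (last_opened + time_until_threshold(rate, threshold), door)
        for door, (rate, last_opened) in doors.items()
    ]
    heapq.heapify(heap)
    return heap


# Hypothetical doors; rates are per day, times are in days.
doors = {1: (1.0, 9.5), 2: (0.1, 0.0), 3: (0.5, 6.0)}
schedule = build_schedule(doors)
due, door = heapq.heappop(schedule)
print(door, due)
```

This fits the "lowest priority value first" queue from the question: after opening a door, update its rate estimate and push it back with a new due time, without recomputing anything for the other doors.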
