How to generate uniformly distributed subintervals of an interval?

93 Views Asked by At

I have a non-empty integer interval [a; b). I want to generate a random non-empty integer subinterval [c; d) (where a <= c and d <= b). The [c; d) interval must be uniformly distributed in the sense that every point in [a; b) must be equally likely to end up in [c; d).

I tried generating uniformly distributed c from [a; b - 1), and then uniformly distributed d from [c + 1; b), like this:

a = -100
b = 100

N = 10000
cs = np.random.randint(a, b - 1, N)
ds = np.random.randint(cs + 1, b)

But when measuring how often each point ends up being sampled, the the distribution is clearly non-unifrom:

import numpy as np
import matplotlib.pyplot as plt

hist = np.zeros(b - a, int)
for c, d in zip(cs, ds):
    hist[c - a:d - a] += 1

plt.plot(np.arange(a, b), hist)
plt.show()

histogram

How do I do this correctly?

2

There are 2 best solutions below

2
Matt Timmermans On BEST ANSWER

If you divide the range [a,b) into N sections, and then select one of those sections at random, then the chance of selecting each point is uniform -- 1/N. It doesn't matter how you divide the range.

Here's a solution along those lines that uses a pretty uniform selection of division points.

from random import randint

a, b, N = -100, 100, 1000000

intervals = []
while len(intervals) < N:
    # divide the range into 3 intervals [a,x), [x,y), and [y,b)
    # the distributions of the division points don't change the histogram
    # we're careful here to make sure none of them are empty
    p1 = randint(a+1,b-2)
    p2 = randint(a+1,b-2)
    x = min(p1,p2)
    y = max(p1,p2)+1

    # select one at random
    intervals.append([(a,x),(x,y),(y,b)][randint(0,2)])
1
seateen On

The way cs and ds are being created is what is messing up the distribution as it's adding bias. Because the range of possible d values depends on c, it results in a non-uniform distribution

Instead, generate subintervals by randomly selecting start and end points, ensuring each subinterval is equally likely. For each start point c, any end point d within [c+1, b] is possible, effectively randomizing subinterval selection

A cleaner way to achieve this could look like

import numpy as np
import matplotlib.pyplot as plt

a, b, N = -100, 100, 10000

# random start and end points for subintervals, ensuring they are sorted
starts_ends = np.sort(np.random.randint(a, b, (N, 2)), axis=1)
# Extract start (c) and end (d) points
cs, ds = starts_ends[:,0], starts_ends[:,1]

# store frequency counts
hist = np.zeros(b - a)

# count occurrences within each subinterval
for c, d in zip(cs, ds):
    hist[c-a:d-a] += 1

plt.plot(np.arange(a, b), hist)
plt.title("Points in Selected Subintervals")
plt.show()

histogram