Uniformly randomly generate a vector of k unsigned ints that sums to N


Another phrasing is: randomly partition N identical items into k buckets, allowing some buckets to be empty.

For this discussion:

  • an "integer partition of N", to match the usual definition and counting, can be considered as:
    • a tuple of positive integers, in decreasing order, which sums to N
  • a vector of unsigned integers is a "partition" of N if the sum of its elements (without integer overflow) is N.

I'd like to write a function f(N,k) that randomly and uniformly selects amongst the possible vectors of length k that partition N and returns the selected vector.

It would be nice if there were a solution that works for all k >= 1, but I'm particularly interested in k > N. So if it helps to focus on or limit to that condition, that is okay. And if we have to delve into approximations/heuristics, it's okay to consider k large enough that most of the vector entries must be zero (so at least k > 2N).


My initial thoughts are:

  1. If N is small enough that it is reasonable to calculate (or look up in a table?) the number of integer partitions of N, then maybe we could proceed as:
    • Create a vector of k unsigned ints initialized to zero
    • Make a random integer partition of N. Let m be the length of this tuple.
    • Place those values in the initial m positions of the vector.
    • Randomly shuffle the vector.

This would naively treat an output vector with one entry containing N as just as likely as one with N entries containing 1. That isn't correct. But maybe there is a simple weighting that can be applied to "Make a random integer partition of N" which would correct for this? (One candidate weighting is sketched below.)
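For what it's worth, here is a sketch of one such weighting in Python. Each partition with m nonzero parts, padded with zeros to length k, corresponds to k! / ((k-m)! · ∏ c_v!) distinct vectors, where c_v is the multiplicity of each distinct part value v; sampling partitions proportional to that count and then shuffling should be uniform over vectors. The partitions helper is illustrative, not a library function, and this only scales to small N:

import math
import random
from collections import Counter

def partitions(n, max_part=None):
    # Yield all integer partitions of n as non-increasing tuples.
    # (Illustrative helper, not a library function.)
    if n == 0:
        yield ()
        return
    if max_part is None:
        max_part = n
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def f(n, k):
    # Weight of a partition p with m parts = number of distinct length-k
    # vectors that are rearrangements of p padded with zeros:
    #   k! / ((k - m)! * prod(c_v! for each distinct part value v))
    parts = [p for p in partitions(n) if len(p) <= k]
    weights = []
    for p in parts:
        m = len(p)
        w = math.factorial(k) // math.factorial(k - m)
        for c in Counter(p).values():
            w //= math.factorial(c)
        weights.append(w)
    chosen = random.choices(parts, weights=weights)[0]
    vec = list(chosen) + [0] * (k - len(chosen))
    random.shuffle(vec)
    return vec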

  2. Another approach which feels cleaner, but would likely still need "re-weighting" somewhere:
    • Create a vector of k unsigned ints initialized to zero
    • do the following N times:
      • randomly choose an element of the vector and increment it

While this feels cleaner to start, I think it would be much messier to "re-weight". The weights for approach 1 sound like a difficult algorithm question to me, but I can at least imagine what needs to be calculated. Here, I'm not even sure what needs to be reweighted, or how.

The reason I think it likely still needs reweighting is that there is exactly one sequence of random choices that would lead to the vector looking like [N,0,0,...,0], and N! sequences of random choices that would lead to the vector starting with N ones [1,1,...,1,0,0,...,0]. Calculating the ratios of these "incorrect weightings" of the final result sounds doable, but I don't know how I'd go about reweighting the individual steps to correct for it.
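To make the bias concrete: this process produces vector x with probability (N! / (x1! ⋯ xk!)) / k^N, the number of increment orderings reaching x over the k^N equally likely choice sequences. A quick empirical sketch (the name approach2 is mine):

import random
from collections import Counter

def approach2(n, k):
    # Increment a uniformly random bucket, n times.
    v = [0] * k
    for _ in range(n):
        v[random.randrange(k)] += 1
    return tuple(v)

# For n=2, k=4 the process gives P([2,0,0,0]) = 1/16 but P([1,1,0,0]) = 2/16,
# so the spread-out vectors show up roughly twice as often:
print(Counter(approach2(2, 4) for _ in range(100_000)).most_common())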

  3. Or maybe there is another approach entirely, that I have not thought of?


Dave (BEST ANSWER)

Generate a random int in the range 0..n, k-1 times. Treat these as cut points into [1, 1, ..., 1] (size n). Then the sums between consecutive cut points (and the endpoints) are your vector.

E.g., n=2, k=5:

say we get 0, 1, 1, 2

we think of this as: [|1||1|]

which we interpret as [0, 1, 0, 1, 0] (treating gaps as zeroes).

If we had 0,0,2,2 instead, we'd have [||1,1||] or [0,0,2,0,0]

Here's Ruby code for this:

def f(n, k)
  arr = [0]                  # left endpoint
  ans = []
  (k-1).times do
    arr.append(rand(n+1))    # k-1 random cut points in 0..n
  end
  arr.append(n)              # right endpoint
  arr.sort!
  1.upto(k) do |i|
    ans.append(arr[i] - arr[i-1])  # gap between consecutive cut points
  end
  return ans
end

Running time is O(k log k) because of the sort. We might be able to avoid the sort by generating the random numbers in sorted order.
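One way to do that, sketched in Python/NumPy rather than Ruby (the function name is mine): cumulative sums of exponentials yield the uniform order statistics already sorted, and since flooring is monotone it commutes with sorting.

import numpy as np

def sorted_uniform_ints(m, n, rng=None):
    # m iid uniform ints in {0, ..., n}, returned in sorted order in O(m):
    # normalized cumulative sums of m+1 iid Exp(1) draws are the sorted
    # order statistics of m Uniform(0,1) variables; scale and floor.
    rng = rng or np.random.default_rng()
    s = np.cumsum(rng.exponential(size=m + 1))
    u = s[:-1] / s[-1]
    return (u * (n + 1)).astype(int)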

-- update --

This is not uniform. Here are a million runs of f(2, 4):

m = Hash.new(0)
1_000_000.times do
  m[f(2, 4)] += 1
end
=> 1000000
> m
=> 
{[1, 0, 1, 0]=>205646,
 [0, 1, 1, 0]=>411624,
 [0, 0, 1, 1]=>205916,
 [0, 1, 0, 1]=>205144,
 [0, 0, 2, 0]=>205961,
 [0, 0, 0, 2]=>68718,
 [1, 0, 0, 1]=>68178,
 [0, 2, 0, 0]=>205736,
 [2, 0, 0, 0]=>68275,
 [1, 1, 0, 0]=>205783}

--- update ---

Stars and bars works. Here's Ruby code and another million-run test:

def g(n,k)
  arr = [1]*n + [0]*(k-1) # 1's represent stars (what we're counting), and 0's represent bars (separators)
  arr.shuffle!
  ans = []
  sum = 0
  arr.each do |val|
    if val == 0
      ans.append(sum)
      sum = 0
    else
      sum += val
    end
  end
  ans.append(sum)
end
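For comparison, the shuffle can be skipped by choosing which of the n+k-1 slots hold the k-1 bars directly; a Python sketch of the same construction (my translation, not part of the answer):

import random

def g_py(n, k):
    # Uniformly choose k-1 bar positions among the n+k-1 slots; the gap
    # sizes between consecutive bars (and the ends) are the bucket counts.
    bars = sorted(random.sample(range(n + k - 1), k - 1))
    ans, prev = [], -1
    for b in bars:
        ans.append(b - prev - 1)
        prev = b
    ans.append(n + k - 2 - prev)
    return ans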

m = Hash.new(0)
1_000_000.times do
  m[g(2, 4)] += 1
end
=> 1000000
> m
=> 
{[1, 1, 0, 0]=>99977,
 [0, 2, 0, 0]=>100150,
 [1, 0, 0, 1]=>100201,
 [0, 1, 1, 0]=>100034,
 [0, 1, 0, 1]=>99422,
 [0, 0, 0, 2]=>99865,
 [2, 0, 0, 0]=>99662,
 [1, 0, 1, 0]=>100359,
 [0, 0, 1, 1]=>100332,
 [0, 0, 2, 0]=>99998}
Severin Pappadeux

How about the multinomial distribution? Literally a one-liner:

import numpy as np

z = np.random.multinomial(20, [1/40.]*40)  # n=20 items into k=40 equiprobable buckets

np.sum(z)  # => 20

UPDATE

OK, you want all vectors to have the same probability. From the multinomial PMF this is obviously not the case, as you noted:

PMF = (n! / (x1! ⋯ xk!)) · p1^x1 ⋯ pk^xk; even with all pk = 1/k, the factorials in the denominator kill the deal.

So what could be done is to use the Dirichlet-multinomial, which, when all ak = 1, has a PMF that does NOT depend on the xk, meaning all vectors would be equiprobable:

PMF = Γ(K) Γ(n+1) / Γ(n+K)

For n = 2, K = 4 this gives Γ(4)Γ(3)/Γ(6) = (6 · 2)/120 = 1/10, i.e. one tenth for each of the 10 possible vectors.

There is a Dirichlet-multinomial in SciPy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.dirichlet_multinomial.html), but it doesn't have a sampling method (rvs()).

UPDATE II

Looks like one could use two-stage sampling like this:

n = 2
k = 4

a = np.ones(k)                       # Dirichlet parameters, all 1

probs = np.random.dirichlet(a)       # stage 1: random bucket probabilities
z = np.random.multinomial(n, probs)  # stage 2: multinomial draw given probs

and here is test code which proves it works as intended:

n = 2
k = 4

d = dict()

N = 100000
a = np.ones(k)

for _ in range(N):
    probs = np.random.dirichlet(a)
    z = np.random.multinomial(n, probs)
    key = str(z)
    if key not in d:
        d[key] = 0
    d[key] += 1

print(d)
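A quick way to sanity-check the printed counts (a sketch using the variables above): stars and bars says there are C(n+k-1, k-1) length-k vectors of unsigned ints summing to n, so a uniform sampler should hit each about N / C(n+k-1, k-1) times.

from math import comb

print(comb(n + k - 1, k - 1))      # 10 vectors for n=2, k=4
print(N / comb(n + k - 1, k - 1))  # expected count per vector: 10000.0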