How to sample a dataframe using a dataframe as weights with pandas


I want to sample values from each column of a DataFrame according to a DataFrame of weights. The weights for each column sum to 1.

A=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]]).transpose()
w=pd.DataFrame([[0.2,0.5,0.3],[0.1,0.3,0.6],[0.4,0.5,0.1]])
sampled_data = A.sample(n=10, replace=True, weights=w)

But this code raises the following error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would like the first column of A to be sampled according to the weights in the first column of w, and so on.

The desired result would look something like this:

sampled_data =
  1 2 3
0 2 6 8
1 2 5 7
2 3 4 8
. .....
9 1 6 9

There is 1 solution below

Answer by Jordan Rozum

It sounds like you want independent samples from each column. If so, I think this does what you want:

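import pandas as pd

# Each column of A holds the values to sample; each column of w holds the matching weights.
A = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).transpose()
w = pd.DataFrame([[0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]).transpose()

samples = []
for i in A.columns:
    # Sample column i independently, weighted by column i of w.
    s = A[i].sample(n=10, replace=True, weights=w[i])
    samples.append(s.values)

# Each sampled array is a row here, so transpose to get one column per original column.
A_sample = pd.DataFrame(samples).transpose()
print(A_sample)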

The output is (the exact values vary from run to run, since the sampling is random):

   0  1  2
0  3  6  7
1  2  5  8
2  3  6  8
3  1  6  7
4  1  5  8
5  3  6  8
6  1  6  9
7  1  6  7
8  2  4  8
9  2  6  7

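Note that to make this work, I transposed w relative to what you originally had, so that each column of w holds the weights for the matching column of A. A itself is unchanged from your code.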
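The orientation of w matters here, and a quick check can make that visible. The sketch below assumes the weights from the question; the names w_original and w_fixed are just for illustration. It shows that the rows of the original w sum to 1, while the columns do so only after transposing. pandas normalizes weights that do not sum to 1, so a wrongly oriented w would not raise an error; it would just sample from a different distribution.

```python
import numpy as np
import pandas as pd

# Weights exactly as written in the question (not transposed).
w_original = pd.DataFrame([[0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]])
# Weights as used in the answer (transposed).
w_fixed = w_original.transpose()

# Sum down each column (axis=0) and across each row (axis=1).
print("original, column sums:", w_original.sum(axis=0).tolist())
print("original, row sums:   ", w_original.sum(axis=1).tolist())
print("fixed, column sums:   ", w_fixed.sum(axis=0).tolist())

# Compare with a tolerance rather than ==, because of floating-point rounding.
columns_ok = np.allclose(w_fixed.sum(axis=0), 1.0)
print("columns of w_fixed sum to 1:", columns_ok)
```

The column sums of the original w come out to roughly 0.7, 1.3 and 1.0, which is why it has to be transposed before its columns can serve as per-column weights.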

There's probably a slicker way to do this, but I don't know it.
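One possibly more compact variant, offered as a sketch rather than a definitive answer, builds the result with a dict comprehension instead of an explicit loop and a final transpose. It relies on the same Series.sample call; the helper name sample_columns is made up for this example.

```python
import pandas as pd

A = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).transpose()
w = pd.DataFrame([[0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]).transpose()


def sample_columns(values, weights, n):
    """Draw n values from each column of `values` independently,
    weighting by the same-named column of `weights`."""
    return pd.DataFrame({
        # .to_numpy() drops the sampled index so the columns line up as rows 0..n-1.
        col: values[col].sample(n=n, replace=True, weights=weights[col]).to_numpy()
        for col in values.columns
    })


A_sample = sample_columns(A, w, 10)
print(A_sample)
```

Because each column is passed as a name-to-array pair, the resulting DataFrame already has one column per original column, so no transpose is needed.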