I have a dataframe df with transactions where the values in the column Col can be repeated. I use a Counter, dictionary1, to count the frequency of each Col value. Then I would like to loop over the subset of the data for each value and compute a value pit. I want to create a new dictionary dict1 where each key is a key from dictionary1 and the value is the corresponding pit. This is the code I have so far:
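from collections import Counter, defaultdict

dictionary1 = Counter(df['Col'])
dict1 = defaultdict(int)
# Iterate over the keys directly; dictionary1.keys()[i] fails on Python 3,
# where keys() returns a view that cannot be indexed.
for key in dictionary1:
    temp = df[df['Col'] == key]    # rows belonging to this Col value
    b = temp['IsBuy'].sum()        # number of buys
    n = temp['IsBuy'].count()      # number of non-null rows
    pit = b / n
    dict1[key] = pit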
My question is: how can I assign the key and value for dict1 based on the key of dictionary1 and the value obtained from the calculation of pit? In other words, what is the correct way to write the last line of code in the above script?
Thank you.
Since you're using pandas, I should point out that the problem you're facing is common enough that there's a built-in way to do it. We call collecting "similar" data into groups and then performing operations on them a groupby operation. It's probably worthwhile reading the tutorial section on the groupby split-apply-combine idiom; there are lots of neat things you can do!
The pandorable way to compute the pit values would be something like df.groupby('Col')['IsBuy'].mean(), because the sum divided by the count within each group is just the group mean. That gives you a Series indexed by the Col values, which you could turn into a dictionary by calling .to_dict() on it if you insisted.
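As a rough sketch of the groupby approach, here is a small self-contained example. The sample data is made up for illustration; only the column names Col and IsBuy come from the question, and IsBuy is assumed to hold 0/1 (or boolean) values. The explicit loop at the end is there only to cross-check that both routes give the same numbers.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Col": ["a", "b", "a", "c", "b", "a"],
    "IsBuy": [1, 0, 0, 1, 1, 1],
})

# Split the rows by Col, then take the mean of IsBuy within each group.
# The mean equals sum / count, which is how pit is defined in the question.
pit = df.groupby("Col")["IsBuy"].mean()

# Convert the resulting Series to a plain dict keyed by the Col values.
dict1 = pit.to_dict()

# Cross-check against an explicit loop in the style of the question.
loop_result = {}
for key in df["Col"].unique():
    temp = df[df["Col"] == key]
    loop_result[key] = temp["IsBuy"].sum() / temp["IsBuy"].count()
```

The groupby version avoids filtering the whole dataframe once per key, so it should scale better when Col has many distinct values.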