The data has three columns: a unique ID, an integer, and a year_month value.

The unique ID repeats, the integer is always different, and the year_month goes from 2001_1 (January 2001) to 2001_12 (December 2001), then repeats for more years (2002_1 to 2002_12, 2003_1 to 2003_12).

Each unique ID is an individual, and the integer is the likelihood of finding that individual during that particular year_month.

I need to calculate the mean likelihood of finding each individual for each calendar month, across all years.

That way I can say that for individual 1, the probability of finding them in January is X, in February is Y, and so on for the other months.

My first thought was to aggregate by unique ID and then average the probability for each month.
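A minimal sketch of that idea in plain Python, to make the intended result concrete. The sample rows, the tuple layout `(unique_id, likelihood, year_month)`, and the function name `monthly_means` are all assumptions for illustration; the real data comes from the merged Excel sheets.

```python
from collections import defaultdict

# Hypothetical sample rows: (unique_id, likelihood, year_month).
rows = [
    (1, 40, "2001_1"),
    (1, 60, "2002_1"),
    (1, 10, "2001_2"),
    (2, 30, "2001_1"),
    (2, 50, "2003_1"),
]

def monthly_means(rows):
    """Average the likelihood per (unique_id, calendar month) across all years."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for unique_id, likelihood, year_month in rows:
        # "2001_12" splits into year "2001" and month "12"; the year is dropped
        # so that the same month from different years lands in one group.
        _, month = year_month.split("_")
        key = (unique_id, int(month))
        totals[key] += likelihood
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

means = monthly_means(rows)
for (unique_id, month), mean in sorted(means.items()):
    print(unique_id, month, mean)
```

The key step is splitting year_month so that only the month part is used for grouping. The same grouping could be done with a group-by in a data-frame library; this version is only meant to show the shape of the calculation.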

There are ~3.5 thousand unique IDs in each Excel sheet. Each row has an integer and a year_month. I merged all the Excel sheets and now have ~1.6 million rows.

I don't know if it's because the data is so big, but I can't seem to figure this out.
