I have a data set of social media users, the list of people who follow each of them, and the list of people each of them follows.
I want to be able to run queries on the data to find "lookalikes" - that is, take a list of users, find their followers, then find who those followers follow, and finally aggregate by followed account to find the accounts with the most followers in common with the original list.
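To make the calculation concrete, here is a minimal in-memory sketch in Python of what I mean. The function name `find_lookalikes`, the dict-of-sets layout, and the choice to exclude the starting users from the results are all just assumptions for illustration, not how the data is actually stored.

```python
from collections import Counter

def find_lookalikes(seed_users, followers, following, top_n=10):
    """Rank accounts by how many followers they share with the seed users.

    followers: dict mapping user -> set of users who follow them (assumed layout)
    following: dict mapping user -> set of users they follow (assumed layout)
    """
    seeds = set(seed_users)

    # Step 1: collect everyone who follows at least one seed user.
    audience = set()
    for user in seeds:
        audience |= followers.get(user, set())

    # Steps 2 and 3: for each audience member, look up who they follow,
    # and count how many audience members follow each account.
    counts = Counter()
    for fan in audience:
        for followed in following.get(fan, set()):
            if followed not in seeds:  # assumption: seeds are not their own lookalikes
                counts[followed] += 1

    # The accounts with the highest counts are the "lookalikes".
    return counts.most_common(top_n)
```

At full scale this is essentially a join of the follower table against the following table, followed by a group-by and count, which is the part I am unsure how best to run.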
The calculation will only need to be run once per month, and the data covers about 500k users.
I use AWS currently.
What would you suggest as a good way to do this calculation, i.e. which services work well given the size of the data and this type of workload?