I have a mobile application that fetches data from sensors and pushes it to an AWS IoT Core topic. I want to relay this data to AWS IoT Analytics and then analyze it with my own machine learning code, using container data sets. The important thing is that the events are segregated and batched by device_id and analyzed in 30-minute time windows. In my case it only makes sense to analyze together events that were generated by the same device_id. The event payload already contains the unique device_id property. The first solution that comes to mind is to have a separate Channel -> Pipeline -> Data Store -> SQL Data Set -> Container Data Set setup for each of the mobile clients. Visually depicted, that looks like this:
If the number of devices is N, the problem with this architecture is that I need N channels, N identical pipelines, N data stores that all hold data of the same type and schema, and finally 2*N data sets. With 50,000 devices the number of resources is huge, which makes me realize this is not a good solution.
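To put a number on "huge", here is a quick back-of-the-envelope sketch of the resource counts for this first architecture. The function name `per_device_resources` is made up for illustration; it is plain arithmetic, not a call into any AWS API.

```python
def per_device_resources(n_devices: int) -> dict:
    """Count resources when every device gets its own full chain."""
    return {
        "channels": n_devices,
        "pipelines": n_devices,
        "data_stores": n_devices,
        # One SQL data set plus one container data set per device.
        "data_sets": 2 * n_devices,
    }


counts = per_device_resources(50_000)
total = sum(counts.values())
print(counts)
print("total resources:", total)
```

In other words, the total grows as 5*N, so every additional device adds five more resources to manage.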
The next idea that comes to mind is to have only one channel, one pipeline and one data store for all devices, and to have a separate SQL data set and container data set only for each device. That looks like this:
This architecture feels much better, but with 50,000 devices I'd still need 100,000 different data sets. The default AWS limit is 100 data sets per account. Of course I can request a limit increase, but if the default limit is 100 data sets, I wonder whether it makes sense to request an increase to 1,000 times the default. Is either of these two architectures how AWS IoT Analytics is supposed to be used, or am I missing something?
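For reference, the batching requirement itself (events grouped per device_id into 30-minute windows) can be sketched in plain Python as below. This is only an illustration of the grouping logic, not an IoT Analytics API: the `timestamp` field (epoch seconds) and the `value` field are assumptions, since the question only states that the payload contains `device_id`.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # 30-minute windows


def batch_events(events):
    """Group events by (device_id, start of the 30-minute window)."""
    batches = defaultdict(list)
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["timestamp"] - event["timestamp"] % WINDOW_SECONDS
        batches[(event["device_id"], window_start)].append(event)
    return dict(batches)


# Hypothetical sample events; "timestamp" and "value" are assumed fields.
events = [
    {"device_id": "a", "timestamp": 0, "value": 1.0},
    {"device_id": "a", "timestamp": 1799, "value": 2.0},
    {"device_id": "a", "timestamp": 1800, "value": 3.0},
    {"device_id": "b", "timestamp": 10, "value": 4.0},
]
batches = batch_events(events)
for key, group in sorted(batches.items()):
    print(key, len(group))
```

Each key identifies one batch that would be analyzed on its own, so events from different devices never end up in the same group.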
How to design an AWS IoT Analytics pipeline that will have a separate data set for each device?
Asked by Dejan Bogatinovski
I posted the same question on the AWS Forum and got a helpful answer from an engineer who works there. I am posting his answer here for those who have similar architecture requirements: