Bucket aggregation and grouping in MongoDB Java driver

380 Views Asked by At

I do a short query to get some data from MongoDB using MongoDB Java Driver v 3.2. I am using Bucket and Unwind aggregation to make my work and result a bit meaningful. I made a filter based on a specific id field (IdManfc) and a specific signal field (mysignal). I aim to have a result using bucket aggregation to group my data based on some boundaries(hour by hour timestamp). I want to show the id and signal in my result and that is why I used a traditional SQL trick (min(id), because I filtered only one id) and (min(signal), because I filtered only one signal) to include my id and signal in the result. Now I want to change a bit my theory, I want to imagine that I only have one id but many signal and I want to include them in my result in a way that id is one but information for each signal is different. Or what if I have many Ids and many signals and I want to include them in my result. Does anyone have any idea?

Some hint on what id and signal means in my code, id refers to a machine in a factory and signal means the type of work that machine does. Let's say a computer in the factory has the id of 12345 and it does 3 types of work which are identified by signals, one signal is printing(signal name X), the second signal is scanning(Y), the third signal is not working(Z). Now I want to make a bucket specifying machine-id(12345) and its type of work grouping hour by hour and calculating the min and max of the time spent( which is the value) for each signal.

    Bson startTimeBucket = match( gte("lastTimestamp",startTime));
    Bson endTimeBucket= match(lte("lastTimestamp",finishTime));
    Bson idFilterBucket = match(eq("id", IdManfc));
    Bson unwindBucket = unwind("$signals");
    Bson filterSignalUnBucket = match(eq("signals.signal", mysignal));

    Bson Bucket=bucket("$lastTimestamp", timeBucketList, new BucketOptions()
            .defaultBucket("sum")
            .output(min("id","$id"),
                    min("signal","$signals.signal"),
                    avg("Average",Document.parse("{$toDouble: \"$signals.value\"}")),
                    stdDevSamp("STD",Document.parse("{$toDouble: \"$signals.value\"}")),
                    max("Max",Document.parse("{$toDouble: \"$signals.value\"}")),
                    min("Min",Document.parse("{$toDouble: \"$signals.value\"}"))
            )
    );

    List<Document> resultbuckt = coll.aggregate(asList(startTimeBucket,
                                                        endTimeBucket,
                                                        idFilterBucket,
                                                        unwindBucket,
                                                        filterSignalUnBucket,
                                                        Bucket
                                                )).into(new ArrayList<Document>());
    for (Document Document : resultbuckt) {
        System.out.println(Document);

My data in the database: enter image description here

A corresponding JSON:

{"_id":{"$oid":"5db06b6880bd68ca32a853bc"},"disconnected":false,"id":"144-12","lastTimestamp":{"$numberLong":"1571842909626"},"manufacturer":"gmb","modelName":"GMB_Mod1","signals":[{"event":"changed","signal":"InterfaceType.Moulds.Mould_1.TemperatureZones.TemperatureZone_4.ActualTemperature","stability":"unstable","timestamp":{"$numberLong":"1571842907576"},"type":"DOUBLE","value":"175.0","writable":"read","written":false},{"event":"changed","signal":"InterfaceType.InjectionUnits.InjectionUnit_1.TemperatureZones.TemperatureZone_3.ActualTemperature","stability":"unstable","timestamp":{"$numberLong":"1571842907576"},"type":"DOUBLE","value":"35.0","writable":"read","written":false}],"tC":{"$numberInt":"10"}}
{"_id":{"$oid":"5db069c480bd68ca32a841ec"},"disconnected":false,"id":"144-12","lastTimestamp":{"$numberLong":"1571842499744"},"manufacturer":"gmb","modelName":"GMB_Mod1","signals":[{"event":"changed","signal":"InterfaceType.InjectionUnits.InjectionUnit_1.TemperatureZones.TemperatureZone_1.ActualTemperature","stability":"unstable","timestamp":{"$numberLong":"1571842496543"},"type":"DOUBLE","value":"65.0","writable":"read","written":false}],"tC":{"$numberInt":"10"}}

2 lines of my result which is the same signal and different time(because i group the result using bucket and time boundaries:

Document{{_id=1574434800859, id=144-12, signal=signal 1, Average=69.86274509803921, STD=1.2003267528991806, Max=71.0, Min=68.0}}
Document{{_id=1578444800859, id=144-12, signal=signal 1, Average=69.86274509803921, STD=1.2003267528991806, Max=72.0, Min=69.0}}

In my code i specifically filtered one signal and then to see it in my result i use the min(signal). what I expect to see from the result is something similar to what I already got. but I would like to have also the results for other signals within that time bounderies("_id), and remove the filter for only one signal, i expect something like below:

Document{{_id=1574434800859, id=144-12, signal=**the name of 1st signal**, Average=69.86274509803921, STD=1.20035267528991806, Max=80.0, Min=68.0}}
Document{{_id=1574434800859, id=144-12, signal=**the name 2nd signal**, Average=69.86274509803921, STD=12.2003267528991806, Max=95.0, Min=68.0}}
Document{{_id=1574434800859, id=144-12, signal=**the name of 3rd signal**, Average=69.86274509803921, STD=0.2003267528991806, Max=85.0, Min=68.0}}

Does anyone have any idea?

Thank you

0

There are 0 best solutions below