I'm currently working on a Kotlin project to automate the upkeep of emergency location data. As part of that project I need to compare two collections of Json objects to determine what data needs to be added/updated/deleted. Collection A is the collection of existing data. Collection B is the incoming data from the job. While the objects in the collections are not identical, they have a unique id that can be used to link them.
For each item that exists in A but not in B I need to make a delete call. For each item that is in B but not in A I need to do a create. For each item that exists in A and B I need to do an update.
I need to figure out a way to determine the needed operations, each of which will involve an HTTP request to a 3rd party API, and execute the needed operations as efficiently as possible. I know I could simply brute force the problem by iterating over all the items in each collection. However since this is going to be part of an AWS Lambda, I don't think that is going to cut it.
What is the most efficient approach to a problem like this using Kotlin?
Update: I ended up going with the solution from @broot however it required an extra step. Since the objects in each list are from two different classes, I needed to create an interface to define their common properties. When I was done my classes looked like
interface HasUid{
val uid:String
}
data class OldData(val id:String, val name:String, val mac:String):HasUid {
override val uid: String
get() = name
}
data class NewData(val identifier:String, val display_name:String, val mac:String):HasUid {
override val uid: String
get() = display_name
}
and I called the function with
val(toDelete,toAdd,toUpdate) = diffSetsBy(oldDataList, newDataList, HasUid::uid)
println("to update: " + toUpdate.size)
println("to add: " + toAdd.size)
println("to delete: " + toDelete.size)
If I understand the question correctly, it is simply about comparing two sets of objects that are stored already in the memory, by some kind of a key. There isn't really too much rocket science or performance optimizations we can involve here. The only optimization is to not compare each item with each item (that would be
O(N*M)), but instead, create a hash set/map and search in it (O(N+M)):Algorithm requires that keys are not duplicated in a single set. I used a traditional imperative code as I don't see a way to make the code more functional or idiomatic and keep the readability. Also, as I created a generic
diffSetsByfunction, it would probably make sense to return pairs of objects forboth, but we don't need it for our case and I was too lazy.We can use it like this: