Efficient way to compare and partition two collections in Kotlin

63 Views Asked by At

I'm currently working on a Kotlin project to automate the upkeep of emergency location data. As part of that project I need to compare two collections of Json objects to determine what data needs to be added/updated/deleted. Collection A is the collection of existing data. Collection B is the incoming data from the job. While the objects in the collections are not identical, they have a unique id that can be used to link them.

For each item that exists in A but not in B I need to make a delete call. For each item that is in B but not in A I need to do a create. For each item that exists in A and B I need to do an update.

I need to figure out a way to determine the needed operations, each of which will involve an HTTP request to a 3rd party API, and execute the needed operations as efficiently as possible. I know I could simply brute force the problem by iterating over all the items in each collection. However since this is going to be part of an AWS Lambda, I don't think that is going to cut it.

What is the most efficient approach to a problem like this using Kotlin?

Update: I ended up going with the solution from @broot however it required an extra step. Since the objects in each list are from two different classes, I needed to create an interface to define their common properties. When I was done my classes looked like

interface HasUid{
    val uid:String
}
data class OldData(val id:String, val name:String, val mac:String):HasUid {
    override val uid: String
        get() = name
}

data class NewData(val identifier:String, val display_name:String, val mac:String):HasUid {
    override val uid: String
        get() = display_name
}

and I called the function with

val(toDelete,toAdd,toUpdate) = diffSetsBy(oldDataList, newDataList, HasUid::uid)
        println("to update: " + toUpdate.size)
        println("to add: " + toAdd.size)
        println("to delete: " + toDelete.size)
1

There are 1 best solutions below

2
broot On BEST ANSWER

If I understand the question correctly, it is simply about comparing two sets of objects that are stored already in the memory, by some kind of a key. There isn't really too much rocket science or performance optimizations we can involve here. The only optimization is to not compare each item with each item (that would be O(N*M)), but instead, create a hash set/map and search in it (O(N+M)):

fun <T : Any> diffSetsBy(setA: Iterable<T>, setB: Iterable<T>, keySelector: (T) -> Any?): SetsDiff<T> {
    val onlyA = setA.associateByTo(mutableMapOf(), keySelector)
    val onlyB = mutableListOf<T>()
    val both = mutableListOf<T>()

    for (b in setB) {
        if (onlyA.remove(keySelector(b)) != null) {
            both += b
        } else {
            onlyB += b
        }
    }
    return SetsDiff(onlyA.values, onlyB, both)
}

data class SetsDiff<T>(val onlyA: Iterable<T>, val onlyB: Iterable<T>, val both: Iterable<T>)

Algorithm requires that keys are not duplicated in a single set. I used a traditional imperative code as I don't see a way to make the code more functional or idiomatic and keep the readability. Also, as I created a generic diffSetsBy function, it would probably make sense to return pairs of objects for both, but we don't need it for our case and I was too lazy.

We can use it like this:

fun main() {
    val users = listOf(User("1", "James"), User("2", "Jonh"))
    val newUsers = listOf(User("2", "John"), User("3", "Dave"))

    val (toDelete, toAdd, toUpdate) = diffSetsBy(users, newUsers, User::login)
    println(toDelete) // [User(login=1, name=James)]
    println(toAdd) // [User(login=3, name=Dave)]
    println(toUpdate) // [User(login=2, name=John)]
}

data class User(val login: String, val name: String)