Efficient way to compare and partition two collections in Kotlin

Question

Efficient way to compare and partition two collections in Kotlin

63 Views Asked by pbuchheit At 18 March 2024 at 13:59

I'm currently working on a Kotlin project to automate the upkeep of emergency location data. As part of that project I need to compare two collections of Json objects to determine what data needs to be added/updated/deleted. Collection A is the collection of existing data. Collection B is the incoming data from the job. While the objects in the collections are not identical, they have a unique id that can be used to link them.

For each item that exists in A but not in B I need to make a delete call. For each item that is in B but not in A I need to do a create. For each item that exists in A and B I need to do an update.

I need to figure out a way to determine the needed operations, each of which will involve an HTTP request to a 3rd party API, and execute the needed operations as efficiently as possible. I know I could simply brute force the problem by iterating over all the items in each collection. However since this is going to be part of an AWS Lambda, I don't think that is going to cut it.

What is the most efficient approach to a problem like this using Kotlin?

Update: I ended up going with the solution from @broot however it required an extra step. Since the objects in each list are from two different classes, I needed to create an interface to define their common properties. When I was done my classes looked like

interface HasUid{
    val uid:String
}
data class OldData(val id:String, val name:String, val mac:String):HasUid {
    override val uid: String
        get() = name
}

data class NewData(val identifier:String, val display_name:String, val mac:String):HasUid {
    override val uid: String
        get() = display_name
}

and I called the function with

val(toDelete,toAdd,toUpdate) = diffSetsBy(oldDataList, newDataList, HasUid::uid)
        println("to update: " + toUpdate.size)
        println("to add: " + toAdd.size)
        println("to delete: " + toDelete.size)

Original Q&A

There are 1 best solutions below

**broot** · Accepted Answer · 2024-03-18T19:26:46.293000

If I understand the question correctly, it is simply about comparing two sets of objects that are stored already in the memory, by some kind of a key. There isn't really too much rocket science or performance optimizations we can involve here. The only optimization is to not compare each item with each item (that would be O(N*M)), but instead, create a hash set/map and search in it (O(N+M)):

fun <T : Any> diffSetsBy(setA: Iterable<T>, setB: Iterable<T>, keySelector: (T) -> Any?): SetsDiff<T> {
    val onlyA = setA.associateByTo(mutableMapOf(), keySelector)
    val onlyB = mutableListOf<T>()
    val both = mutableListOf<T>()

    for (b in setB) {
        if (onlyA.remove(keySelector(b)) != null) {
            both += b
        } else {
            onlyB += b
        }
    }
    return SetsDiff(onlyA.values, onlyB, both)
}

data class SetsDiff<T>(val onlyA: Iterable<T>, val onlyB: Iterable<T>, val both: Iterable<T>)

Algorithm requires that keys are not duplicated in a single set. I used a traditional imperative code as I don't see a way to make the code more functional or idiomatic and keep the readability. Also, as I created a generic diffSetsBy function, it would probably make sense to return pairs of objects for both, but we don't need it for our case and I was too lazy.

We can use it like this:

fun main() {
    val users = listOf(User("1", "James"), User("2", "Jonh"))
    val newUsers = listOf(User("2", "John"), User("3", "Dave"))

    val (toDelete, toAdd, toUpdate) = diffSetsBy(users, newUsers, User::login)
    println(toDelete) // [User(login=1, name=James)]
    println(toAdd) // [User(login=3, name=Dave)]
    println(toUpdate) // [User(login=2, name=John)]
}

data class User(val login: String, val name: String)

Efficient way to compare and partition two collections in Kotlin

There are 1 best solutions below

Related Questions in ALGORITHM

Related Questions in KOTLIN

Related Questions in AWS-LAMBDA

Related Questions in COLLECTIONS

Trending Questions

Popular # Hahtags

Popular Questions