Implementing Java's `hashCode` in Presto/Trino


I am trying to find a function in the SQL world (specifically Trino/Presto) that reproduces the following input/output combination.

Input - s (type String)
Output - o Int where o = hashCode(s)

E.g.

String x = "plaintext";
System.out.println(x.hashCode());

The above Java snippet prints 1973234167. I want a function in Presto/Trino that does the same thing:

somemethod("plaintext") = 1973234167

What's the quickest way to reproduce this in Presto/Trino?

My attempts:

  • I have tried xxhash64, but it doesn't return the same value.
  • I could write a UDF in Java and call it from Trino, but that is a longer path given the production environment I am working in.


Heisenberg On

It was easier to write my own scalar function using the Trino SPI.

I had to generate my jar and deploy it to all Trino instances via EMR bootstrap actions.

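import io.airlift.slice.Slice;
import io.trino.spi.function.Description;
import io.trino.spi.function.ScalarFunction;
import io.trino.spi.function.SqlNullable;
import io.trino.spi.function.SqlType;

import static io.airlift.slice.Slices.utf8Slice;
import static io.trino.spi.type.StandardTypes.VARCHAR;

public final class MyHasher {

    private MyHasher() {}

    @Description("UDF to generate java hashcode from Trino/Presto environment")
    @ScalarFunction("javahashcode")
    @SqlType(VARCHAR)
    public static Slice javaHashCode(
            @SqlNullable @SqlType(VARCHAR) Slice str) {
        // The argument is @SqlNullable, so str may be null; check before dereferencing it.
        return utf8Slice(computeHash(str == null ? null : str.toStringUtf8()));
    }

    // Returns String.hashCode() as a decimal string, or "0" for a null input.
    public static String computeHash(String val) {
        if (val == null) {
            return "0";
        }
        return String.valueOf(val.hashCode());
    }
}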
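
For anyone who would rather port the logic than deploy a jar, here is what the UDF delegates to. `String.hashCode()` is documented as `s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]`, evaluated in 32-bit `int` arithmetic over the string's UTF-16 code units. The sketch below recomputes it by hand in plain Java; the class and method names (`JavaHashSketch`, `manualHash`) are made up for illustration and are not part of the Trino SPI.

```java
final class JavaHashSketch {

    // Hypothetical helper: recomputes String.hashCode() step by step.
    // h = 31 * h + c for each UTF-16 code unit; int overflow wraps modulo 2^32.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String x = "plaintext";
        System.out.println(manualHash(x)); // prints 1973234167
        System.out.println(x.hashCode());  // prints 1973234167
    }
}
```

Any port to another engine would have to reproduce the two details in the comments: the wraparound to a signed 32-bit value after each step, and iteration over UTF-16 code units rather than bytes or code points. For pure ASCII input the three coincide.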