Trying to understand distinct() for streams in Java 8


I was going through a book on Java 8 that explains distinct() for streams. It says that equality, for the purpose of producing distinct elements, is determined by the implementations of hashCode() and equals(). I therefore wrote the code below to see this in an example:

static class Order {
    int id;
    Double value;

    public Order(int id, Double value) {
        this.id = id;
        this.value = value;
    }

    @Override
    public int hashCode() {
        System.out.println("In Hashcode() - " + this.id + "," + this.value);
        return this.id;
    }

    @Override
    public boolean equals(Object o) {
        System.out.println("In Equals()");
        return this.id == ((Order) o).id;
    }
}

public static void main(String[] args) {
    Stream<Order> orderList = Stream.of(new Order(1, 10.0), new Order(2, 140.5), new Order(2, 100.8));
    Stream<Order> biggerOrders = orderList.filter(o -> o.value > 75.0);
    biggerOrders.distinct().forEach(o -> System.out.println("OrderId:" + o.id));
}

It produced the following output:

In Hashcode() - 2,140.5
In Hashcode() - 2,140.5
OrderId:2
In Hashcode() - 2,100.8
In Equals()

I am confused about why hashCode() is called twice on the same Order object (2, 140.5) before it is compared with the other Order object (2, 100.8).

Thanks in advance.


There are 2 best solutions below

Adi On BEST ANSWER

hashCode() is called the first time to check whether the item (order) is already present in the HashSet that distinct() uses internally (a HashSet is itself backed by a HashMap). It is called a second time to add the item to that set when it is not yet present.

Tip: Try debugging the hashCode method.
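
To make the two lookups visible without a debugger, here is a minimal sketch that counts hashCode() invocations. CountingKey and both helper methods are hypothetical names made up for this illustration; they are not part of the JDK or of the question's code. It compares the contains-then-add pattern with a single add(), whose boolean return value already reports whether the element was new.

```java
import java.util.HashSet;
import java.util.Set;

class HashCodeCallCount {
    // Hypothetical key type that counts how often hashCode() is invoked.
    static final class CountingKey {
        static int hashCodeCalls = 0;
        final int id;

        CountingKey(int id) { this.id = id; }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }
    }

    // contains() followed by add(): each of the two calls hashes the key once.
    static int callsForContainsThenAdd() {
        CountingKey.hashCodeCalls = 0;
        Set<CountingKey> seen = new HashSet<>();
        CountingKey key = new CountingKey(2);
        if (!seen.contains(key)) {
            seen.add(key);
        }
        return CountingKey.hashCodeCalls;
    }

    // add() alone: one hash computation; the return value says whether the key was new.
    static int callsForAddOnly() {
        CountingKey.hashCodeCalls = 0;
        Set<CountingKey> seen = new HashSet<>();
        boolean wasNew = seen.add(new CountingKey(2));
        return wasNew ? CountingKey.hashCodeCalls : -1;
    }

    public static void main(String[] args) {
        System.out.println("contains + add: " + callsForContainsThenAdd());
        System.out.println("add only:       " + callsForAddOnly());
    }
}
```

With the standard HashSet this should report two calls for the first pattern and one for the second, which is consistent with the doubled "In Hashcode()" line in the question.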

11thdimension On

As answered by @Adi, distinct() uses a HashSet internally (itself backed by a HashMap), which calls hashCode() on each Order.

Here is the relevant code, where both calls are made.

In java.util.stream.DistinctOps.makeRef():

return new Sink.ChainedReference<T, T>(sink) {
    Set<T> seen;

    @Override
    public void begin(long size) {
        seen = new HashSet<>();
        downstream.begin(-1);
    }

    @Override
    public void end() {
        seen = null;
        downstream.end();
    }

    @Override
    public void accept(T t) {
        if (!seen.contains(t)) { // first hashCode() call is made here
            seen.add(t);         // second hashCode() call is made here
            downstream.accept(t);
        }
    }
};

The stack traces for both calls were attached as two screenshots in the original post and are not reproduced here.
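
The accept() logic shown above can be replayed by hand on the three orders from the question. The following is a sketch under stated assumptions: the class name DistinctTrace, the events list and the trace() method are hypothetical helpers, and events are recorded in a list instead of being printed. It is not the stream pipeline itself, only the same contains/add sequence applied to a plain HashSet.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctTrace {
    // Shared event log (hypothetical helper, used in place of System.out.println).
    static final List<String> events = new ArrayList<>();

    static final class Order {
        final int id;
        final double value;

        Order(int id, double value) { this.id = id; this.value = value; }

        @Override
        public int hashCode() {
            events.add("hashCode " + id + "," + value);
            return id;
        }

        @Override
        public boolean equals(Object o) {
            events.add("equals");
            return o instanceof Order && ((Order) o).id == id;
        }
    }

    static List<String> trace() {
        events.clear();
        Set<Order> seen = new HashSet<>();
        Order[] orders = { new Order(1, 10.0), new Order(2, 140.5), new Order(2, 100.8) };
        for (Order o : orders) {
            if (!(o.value > 75.0)) continue;  // same filter as in the question
            if (!seen.contains(o)) {          // hashCode(), plus equals() if a hash matches
                seen.add(o);                  // hashCode() again
                events.add("emit " + o.id);   // stands in for downstream.accept(t)
            }
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The recorded sequence has the same shape as the output in the question. Order (1, 10.0) is filtered out before it reaches the set. Order (2, 140.5) is hashed once by contains() and once by add(), then passed downstream. Order (2, 100.8) is hashed once by contains(), which finds an entry with the same hash and calls equals(); since that returns true, add() is never reached.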