Hibernate: JOIN inheritance question - why the need for two left joins

26 Views Asked by At

I've been reading an article about inheritance strategies in Hibernate.

An Author has publications, and a Publication can be a Book or a BlogPost. In the join inheritance strategy the following table structure is generated:

enter image description here

So the publicationauthor table stores two foreign keys: authorid and publicationid. The publication table stores the fields that are common to books and blogposts, and the book and blogpost store fields that are common only to the subclasses.

Then the article shows the SQL query generated when a author.getPublications() method is called:

select publicatio0_.authorId as authorId2_4_0_, publicatio0_.publicationId as 
publicat1_4_0_, publicatio1_.id as id1_3_1_, publicatio1_.publishingDate as 
publishi2_3_1_, publicatio1_.title as title3_3_1_, publicatio1_.version as version4_3_1_, 
publicatio1_1_.pages as pages1_2_1_, publicatio1_2_.url as url1_1_1_, 

case when 
publicatio1_1_.id is not null then 1 when publicatio1_2_.id is not null then 2 when 
publicatio1_.id is not null then 0 end as clazz_1_ 

from PublicationAuthor publicatio0_ 

inner join Publication publicatio1_ on publicatio0_.publicationId=publicatio1_.id 

left outer join Book publicatio1_1_ on publicatio1_.id=publicatio1_1_.id 

left outer join BlogPost publicatio1_2_ on publicatio1_.id=publicatio1_2_.id 

where publicatio0_.authorId=?

The SQL query first inner joins the publicationauthor and publication, so that each Publication gets an authorid from the publicationauthor table.

Then the result (which would have Books.length + BlogPosts.length number of rows) is left outer joined with book table, so that each publication that is a book gets the columns from the book table.

At this point the result looks something like this, the green rows are the rows that correspond to books and have the corresponding columns, the gray rows are the rows that didn't map to any id in the book table and thus only have authorid column and columns from the publication table filled.

enter image description here

Then we do the second left outer join to join the result with the blogpost table. This is the way I imagine what's going on here: the cartesian product is created: each row in the result table is mapped to all rows in the blogpost table (orange rows):

enter image description here

Then each row from the result table (green and gray) is compared by id with the orange row, and if there's a match, the columns from the blogpost row are added to the result table for the matching row.

However, as can be seen from the example, most of the result table is green rows - books, which by definition wouldn't be able to find a matching row in the blogpost table, yet we perform the comparison many times over regardless.

Wouldn't it be more efficient to only make the cartesian product of the gray and orange rows, like this:

enter image description here

If we assume we have a million books and only one blogpost, wouldn't it enhance performace dramatically? The way I see it, once we've joined the books, we could isolate the gray rows and join them with blogpost separately, and then perhaps use a kind of a union operation to get the final result table?

Is there something I'm not seeing/understanding here?

0

There are 0 best solutions below