How do I eliminate data duplication in querydsl?

102 Views Asked by At

query

select
        book1.id as col_0_0_,
        book1.title as col_2_0_,
        images1_.id as col_5_0_,
        images1_.filesize as col_9_0_ 
    from
        book book1 
    left outer join
        image images1_ 
            on book1.id=images1_.book_id 
    order by
        book1.id desc limit ? offset ?

querydsl

return jpaQueryFactory
                .select(book)
                .from(book)
                .leftJoin(book.images, image)
                .orderBy(book.id.desc())
                .offset((long)(pageNo-1) * pageSize)
                .limit(pageSize)
                .transform(
                        groupBy(book.id).list(
                                Projections.constructor(Book.class, book.id, book.title, 
                                list(Projections.constructor(Image.class, image.id, book.id, image.filesize))
                        )
                ));

Book and image have a 1:N relationship I want to send a single query while using transform and projection. If you write a query like the one below, data duplication will occur due to Leftjoin. What should I do in this case?

Requirements

Use projection and transform Must be completed with one sql

I tried groupby or distinct. However, if duplicates are removed that way, the image list cannot be retrieved by transform.

The problem with that sql is:

  1. Until the orderby clause, rows with duplicate book_ids appear.
  2. If transform is performed, duplicate rows are removed, but paging is not performed as intended. This is because the number of rows including duplicates and the number of rows that appear after removing duplicates are different.
1

There are 1 best solutions below

0
abbatrombone On

Duplication can be annoying to deal with select distinct can be used to remove duplicates. There are a few other ways you can do this if this does not work.

Method 1:

    select
        book1.id as col_0_0_,
        book1.title as col_2_0_,
        images1_.id as col_5_0_,
        images1_.filesize as col_9_0_ 
    from
        book book1 
   Where book1.bookid IN(
select bookid from image images1
); 

Method 2:

select
        book1.id as col_0_0_,
        book1.title as col_2_0_,
        images1_.id as col_5_0_,
        images1_.filesize as col_9_0_ 
    from
        book book1 
   Where exists(select bookid from imagie imagies1 where imagies1.bookid = book1.bookid); 

These will grab only the unique bookids. In method 1 it takes all of the ids as a result set and puts those ids in a list for the query. The 2nd method uses a corollary query to associate the ids in the exists statement with the main query. This limits the scope of the inner query to only rows from the outer query. (MySQL works from the outside in).

I would still put in the order by statements stack overflow was fighting me and putting them outside the code block.