Django using Subquery in annotate: How to fetch all rows that match the condition of filtering

42 Views Asked by At

I have two models with M2M field. Because there wont be any update or deletion (just need to read data from db) I'm looking to have single db hit to retrieve all the required data. I used prefetch_related with Prefetch to be able to filter data and also have filtered objects in a cached list using to_attr. I tried to achieve the same result using annotate along with Subquery. but here I can't understand why the annotated filed contains only one value instead of a list of values. let's review the code I have:

  • some Routes may have more than one special point (Point instances with is_special=True).

models.py

class Route(models.Model):
    indicator = models.CharField()

class Point(models.Model):
    indicator = models.CharField()
    route = models.ManyToManyField(to=Route, related_name="points")
    is_special=models.BooleanField(default=False)

views.py

routes = Route.objects.filter(...).prefetch_related(
        Prefetch(
            "points",
            queryset=Point.objects.filter(is_special=True),
            to_attr="special_points",
        )
    )

this will work as expected but it will result in a separate database querying to fetch the points data. in the following code I tried to use Subquery instead to have a single database hit.

routes = Route.objects.filter(...).annotate(
        special_points=Subquery(
            Point.objects.filter(route=OuterRef("pk"), is_special=True).values("indicator")
        )

the problem is in the second approach will have either one or none special-point indicator when printing route_instance.special_points even if when using prefetch the printed result for the same instance of Route shows that there are two more special points.

  • I know in the first approach route_instance.special_points will contains the Point instances and not their indicators but that is the problem.

  • I checked the SQL code of the Subquery and there is no sign of limitation in the query as I did not used slicing in the python code as well. but again the result is limited to either one (if one or more exists) or none if there isn't any special point.

This is how I check db connection

# Enable query counting
from django.db import connection

connection.force_debug_cursor = True

route_analyzer(data, err)

# Output the number of queries
print(f"Total number of database queries: {len(connection.queries)}")
for query in connection.queries:
    print(query["sql"])
# Disable query counting
connection.force_debug_cursor = False

with the help from GPT, I have raw sql code that gives the result:

  • it is based on some python code so it's not clean template.
SELECT "general_route"."id", "general_route"."indicator",
       (SELECT GROUP_CONCAT(U0."indicator", ', ')
        FROM "points_point" U0
        INNER JOIN "points_point_route" U1 ON (U0."id" = U1."point_id")
        WHERE (U1."route_id" = "general_route"."id" AND U0."is_special")
        ) AS "special_points",
       (SELECT GROUP_CONCAT(U0."indicator", ', ')
        FROM "points_point" U0
        INNER JOIN "points_point_route" U1 ON (U0."id" = U1."point_id")
        WHERE (U1."route_id" = "general_route"."id" AND U0."indicator" IN ('CAK', 'NON'))
       ) AS "all_points"
    FROM "general_route"
    WHERE ("general_route"."indicator" LIKE 'OK%' OR "general_route"."indicator" LIKE 'OI%')
    ORDER BY "general_route"."indicator" ASC
1

There are 1 best solutions below

0
mh-firouzjah On

Answer to my question

After reviewing the generated SQL code of the django-ORM I tried to create a custom version of GroupConcat method so to concatenate the indicator field of all filtered objects together (then simply use split to generate a list of them). here I realized that inheritance from Aggregate (from django.db.models), will cause the generated SQL code to have an unwanted GROUP BY statement which also includes all the fields of the target model so the output will include only one value(if any) or nothing at all. even if the GroupConcat is well implemented, that group-by will end up kinda loop that will restarts GroupConcat and feed it only one value per rosw, so there is nothing from previous rows left to be concatenated with new row. But inheritance from Func (from django.db.models) dropped that extra and unwanted GROUP BY so the query will fetch all the rows and feeds the expected column of all rows to the GroupConcat at once and at the end of day the expected result is produced.

from django.db.models import Func, OuterRef, Subquery
from main.models import Point, Route


class CustomConcat(Func):
    function = "GROUP_CONCAT"

    def __init__(self, expression, delimiter=",", **extra):
        super().__init__(expression, delimiter=delimiter, **extra)


# Subquery to fetch related indicators for each Route
special_points_indicators = Point.objects.filter(route=OuterRef("pk"), special=True).values("indicator")

# Annotate Route queryset with the subquery using the custom aggregate function
routes = Route.objects.annotate(
    special_points=Subquery(
        special_points_indicators.annotate(indicators=CustomConcat("indicator", delimiter=",")).values(
            "indicators"
        )
    )
)

# Accessing the translated SQL query
sql_query = str(routes.query)
print(sql_query)

# Accessing each Route object along with its associated boundary point indicators
for f in routes:
    print(f, f.special_points)