I use a query in the Stack Exchange Data Explorer (SEDE).
This is my query:
SELECT A.Id
, A.PostTypeId
, A.Title
, A.Body
, A.ParentId
, A.Tags
, A.CreationDate
FROM posts A
LEFT JOIN users U
ON A.OwnerUserId = U.id
WHERE U.Id = ##UserId##
AND A.PostTypeId = 1
UNION
SELECT A.Id
, A.PostTypeId
, A.Title
, A.Body
, A.ParentId
, B.Tags
, A.CreationDate
FROM posts A
LEFT JOIN users U
ON A.OwnerUserId = U.id
RIGHT JOIN posts B
ON A.ParentId = B.Id
WHERE U.Id = ##UserId##
AND A.PostTypeId = 2
In the query above, posts on Stack Overflow come in two types: questions and answers. Questions (PostTypeId 1 in the database schema) have tags, but answers (PostTypeId 2 in the database schema) do not.
Answers belong to questions through the ParentId.
But my query above is too inefficient: I can only get the tags for some posts (by filtering on a user id).
How can I get the tags for all users' posts within the SEDE timeout?
Several things:
- If you need nothing from the Users table but ID, then don't include that table. It chews up cycles, and Posts.OwnerUserId is the same thing.
- Avoid UNION statements if possible (it is in this case).
- If you do use UNION statements, use UNION ALL if possible (it is in this case). This saves the engine from having to do duplicate checks.

So, here is the execution plan for the original query:
Here is a streamlined plan:
And the query that corresponds to it:
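The query itself did not survive in this copy. As a rough sketch only, and not the answerer's original, the points listed above could be applied like this: one pass over Posts with a self-join to pick up the parent question's tags, no Users table, and no UNION. The aliases p and q and the COALESCE on Tags are assumptions made for illustration.

```sql
-- Sketch under stated assumptions; not the original answer's query.
SELECT      p.Id
          , p.PostTypeId
          , p.Title
          , p.Body
          , p.ParentId
          , COALESCE (p.Tags, q.Tags) AS Tags   -- answers have no tags, so fall back to the parent question's
          , p.CreationDate
FROM        Posts p
LEFT JOIN   Posts q
ON          q.Id = p.ParentId                   -- matches only for answers; questions have no ParentId
WHERE       p.OwnerUserId = ##UserId##          -- same filter as before, without joining Users
AND         p.PostTypeId IN (1, 2)              -- questions and answers only
```

Because each post appears once, there is nothing for a UNION to de-duplicate, which is why the single SELECT can replace both halves of the original.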
It also gives more readable results, especially when the WHERE clause is removed.
But if you can limit by, say, user beforehand, you get an even more efficient query:
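This query is also missing from this copy. One way to read "limit by user beforehand" is to select the user's posts first and only then look up the parent tags; the sketch below does that with a common table expression. The name userPosts, the typed parameter ##UserId:int##, and the ORDER BY are assumptions, and whether this actually beats the single-join form depends on the plan the optimizer picks. Naming a column [User Link] is the SEDE convention for rendering a user id as a hyperlink.

```sql
-- Sketch only: restrict to one user's posts first, then fetch parent tags.
WITH userPosts AS (
    SELECT  p.Id, p.PostTypeId, p.Title, p.Body
          , p.ParentId, p.Tags, p.CreationDate, p.OwnerUserId
    FROM    Posts p
    WHERE   p.OwnerUserId = ##UserId:int##      -- SEDE prompts for this value
    AND     p.PostTypeId IN (1, 2)              -- questions and answers only
)
SELECT      up.OwnerUserId AS [User Link]       -- SEDE shows this as a link to the user
          , up.Id
          , up.PostTypeId
          , up.Title
          , up.Body
          , up.ParentId
          , COALESCE (up.Tags, q.Tags) AS Tags  -- parent question's tags for answers
          , up.CreationDate
FROM        userPosts up
LEFT JOIN   Posts q
ON          q.Id = up.ParentId
ORDER BY    up.CreationDate
```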
(This query adds a convenient hyperlink to the user id.)
Note that just the top 10 users have more than 50K posts.