I'm working on a project where I need to parse SQL queries and determine if the final SQL output will be ordered or not. I am checking based on the presence of an ORDER BY clause.
Here's what I've tried so far:
def contains_outermost_order_by(sql):
parsed_sql = parse(sql)
for statement in parsed_sql:
tokens = [token for token in statement.tokens if token.ttype in Keyword or token.ttype in DML]
if 'ORDER BY' in ' '.join(re.sub(' +', ' ', token.value.upper()) for token in tokens):
return True
return False
However, this code only checks for the presence of ORDER BY in the outermost level of the query, and does not account for ORDER BY in subqueries or CTEs. I understand that the presence of an ORDER BY clause in a subquery or CTE does not necessarily mean the final output will be ordered. However, there can be some cases.
Consider:
1.
SELECT *
FROM
(SELECT *
FROM Customers
ORDER BY CustomerName) AS CustomerList
2.
WITH Orders_CTE AS
(SELECT *
FROM Orders),
Customers_CTE AS
(SELECT *
FROM Customers
ORDER BY CustomerName)
SELECT *
FROM Customers_CTE
Any help would be appreciated. Also, I'm new to SQL parsing or so. Could you provide guidance on the best way to understand a library like sqlparse, given that the documentation appears tailored to experienced users?