Following the documentation and the code snippet from https://docs.sqlalchemy.org/en/14/core/connections.html#streaming-with-a-fixed-buffer-via-yield-per (posted directly below), my query results are not being batched into 50_000-row partitions.
with engine.connect() as conn:
    result = conn.execution_options(yield_per=100).execute(text("select * from table"))

    for partition in result.partitions():
        # partition is an iterable that will be at most 100 items
        for row in partition:
            print(f"{row}")
Here I am writing the results of a SQL query to a CSV file:
with engine.connect() as conn:  # connect to database
    with open(csv_path, mode='w', newline='') as f:
        c = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        c.writerow(columns)  # write column headers

        # execute query that returns 110k rows, expecting batches of 50k rows
        result = conn.execution_options(yield_per=50_000).execute(sql_string)
        for partition in result.partitions():
            for row in tqdm.tqdm(partition):  # each partition only has 1 row, when I expect 50k rows
                c.writerow(row)
This sql_string returns 110k rows, so I expect 3 iterations of result.partitions() (50k + 50k + 10k rows); instead, I am seeing 110k iterations of one row each. I fear this is equivalent to DDoS-ing my own SQL Server database.
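To make the mismatch concrete, here is a minimal diagnostic sketch (assuming the same engine and sql_string as above) that just prints the size of each partition. With yield_per=50_000 and 110k rows it should print 50_000, 50_000, 10_000, but in my case every partition has length 1:

with engine.connect() as conn:
    result = conn.execution_options(yield_per=50_000).execute(sql_string)
    for i, partition in enumerate(result.partitions()):
        # expected sizes: 50_000, 50_000, 10_000
        print(f"partition {i}: {len(list(partition))} rows")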
I've also tried doing this without partitions(), and the same thing happens - no batching.
with engine.connect() as conn:
    with open(csv_path, mode='w', newline='') as f:
        c = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        c.writerow(columns)

        # I expect 3 iterations, but I get 110k
        for row in tqdm.tqdm(conn.execution_options(yield_per=50_000).execute(sql_string)):
            c.writerow(row)
The server seems to handle this OK; I get 22k iterations/second. But I still wonder: am I DDoS-ing myself?
I am using SQLAlchemy 1.4.42 (pip install --upgrade 'sqlalchemy==1.4.42') and Microsoft SQL Server 2019 - 15.0.X (X64).
The yield_per argument was having no effect in execution_options. In the example query snippets I posted in my question, fetchone() gets called N times, where N is the query result's row count. I discovered these fetchone() calls by getting the traceback from doing Ctrl-C during an execution. That's a lot of server calls.

Instead of concerning myself with yield_per, I was able to achieve pagination by using fetchmany(batch_size).

I don't understand the reason for yield_per in SQLAlchemy 1.4.42, or why the SQLAlchemy docs don't reference using fetchmany() instead. Oh well.
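For reference, here is a minimal sketch of the fetchmany() approach, assuming the same engine, sql_string, csv_path, and columns placeholders as in the question:

import csv

batch_size = 50_000
with engine.connect() as conn:
    with open(csv_path, mode='w', newline='') as f:
        c = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        c.writerow(columns)  # write column headers
        result = conn.execute(sql_string)
        while True:
            rows = result.fetchmany(batch_size)  # fetch up to batch_size rows at a time
            if not rows:
                break
            c.writerows(rows)

With this loop, the 110k-row query is consumed in 3 fetchmany() calls (50k + 50k + 10k) instead of 110k fetchone() calls.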