I am profiling an SQLAalchemy repository method to fetch orders as of a certain date
def test_profile_get_orders_asof():
for i in range(1, 11):
with DBContextManager() as session:
data_repo = SqlAlchemyRepository(session)
sut = data_repo.get_orders_asof
sut = profile(
sut, immediate=True, filename=f"get_orders_asof_{i}.prof"
)
sut(datetime(2022, 12, 1, tzinfo=pytz.utc))
And the total execution time is always higher for the first call of the method
[1.3190752999999988,
0.2215163999999999,
0.25437130000000036,
0.19252480000000013,
0.1886178,
0.22220109999999996,
0.21765510000000007,
0.2034394999999999,
0.19447940000000002,
0.20421450000000013]
The method data_repo.get_orders_asof like something like this
from datetime import datetime
import pandas as pd
def get_orders_asof(self, date: datetime) -> pd.DataFrame:
asof_date = date.strftime("%Y-%m-%d")
column_types = {
"id": "int",
"quantity": "int",
"price": "float",
"date": "str"
}
return pd.read_sql(
"SELECT * FROM Orders WHERE order_date <= ?",
con=self.session.get_bind(), params=(asof_date,), dtype=column_types
)
and is executed against an SQL Server database. Why is the first call so much slower than the subsequent calls? Is the database caching anything?