I know there are a few ways to retrieve data from a relational database table: one with pandas, via read_sql, and the other with cursor.fetchall(). What are the main differences between the two in terms of:
- memory usage - is a DataFrame less recommended?
- performance - selecting data from a table (e.g. a large set of data)
- performance - inserting data with a loop over a cursor vs. df.to_sql.
Thanks!
That's an interesting question. For a ~10GB SQLite database, I get the following results for your second question.
pandas.read_sql_query seems comparable in speed to cursor.fetchall. The rest I leave as an exercise. :D
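If you want to repeat the comparison on your own data, here is a minimal timing sketch. It assumes SQLite through the standard sqlite3 module; the in-memory table items and its columns are made up for illustration and are not the database measured above. Timings depend heavily on the driver, the row width and the hardware, so treat the numbers as indicative only.

```python
import sqlite3
import time

import pandas as pd

# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO items (id, value) VALUES (?, ?)",
    ((i, i * 0.5) for i in range(100_000)),
)
conn.commit()

query = "SELECT id, value FROM items"

# Route 1: plain DB-API cursor, returns a list of tuples.
start = time.perf_counter()
rows = conn.execute(query).fetchall()
fetchall_seconds = time.perf_counter() - start

# Route 2: pandas, returns a DataFrame built from the same result set.
start = time.perf_counter()
df = pd.read_sql_query(query, conn)
read_sql_seconds = time.perf_counter() - start

print(f"fetchall:       {len(rows)} rows in {fetchall_seconds:.4f} s")
print(f"read_sql_query: {len(df)} rows in {read_sql_seconds:.4f} s")
```

Both routes load the whole result set into memory at once, so for a table that does not fit in RAM neither is a good fit as written.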
The difference is that cursor.fetchall() is a bit more spartan (i.e. plain): it returns a plain list of rows. pandas.read_sql_query returns a <class 'pandas.core.frame.DataFrame'>, so you can use all the methods of pandas.DataFrame, such as pandas.DataFrame.to_latex, pandas.DataFrame.to_csv, pandas.DataFrame.to_excel, etc. (documentation link)
One can accomplish exactly the same goals with cursor.fetchall, but it takes some, or a lot of, extra keystrokes.
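To make the "extra keys" point concrete, here is a small sketch that produces the same CSV text both ways. It again assumes SQLite via sqlite3, and the items table is a hypothetical two-row example, not anything from the question.

```python
import csv
import io
import sqlite3

import pandas as pd

# Hypothetical two-row table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])

query = "SELECT id, name FROM items ORDER BY id"

# Cursor route: collect the column names and write the rows by hand.
cursor = conn.execute(query)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
csv_from_cursor = buffer.getvalue()

# pandas route: the DataFrame already carries the column names.
df = pd.read_sql_query(query, conn)
csv_from_pandas = df.to_csv(index=False)

print(csv_from_cursor)
print(csv_from_pandas)
```

The cursor version is not hard, just longer, and the gap grows once you need things like Excel or LaTeX output that the standard library does not cover.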