I am using a MySQL database.
I have a table daily_price_history of stock values with the following fields (11 million+ rows):
id
symbolName
symbolId
volume
high
low
open
datetime
close
So for each stock symbolName there are many rows of daily values.
The following SQL tries to get the last 100 days of daily data for a set of 1500 symbols:
SELECT `daily_price_history`.`id`,
       `daily_price_history`.`symbolId_id`,
       `daily_price_history`.`volume`,
       `daily_price_history`.`close`
FROM `daily_price_history`
WHERE (`daily_price_history`.`id` IN
         (SELECT U0.`id`
          FROM `daily_price_history` U0
          WHERE (U0.`symbolName` = `daily_price_history`.`symbolName`
                 AND U0.`datetime` >= 1598471533546))
       AND `daily_price_history`.`symbolName` IN ('A', 'AA', ...... 1500 symbol names))
I have the table indexed on symbolName and also on datetime.
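That is, two separate single-column indexes, roughly like this (index names here are just illustrative):

-- Assumed current setup: two single-column indexes (names hypothetical)
CREATE INDEX idx_symbolName ON `daily_price_history` (`symbolName`);
CREATE INDEX idx_datetime ON `daily_price_history` (`datetime`);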
Getting ~150K rows (1500 symbols x 100 days) takes 20 seconds.
I also have weekly_price_history and monthly_price_history tables. Similar SQL against them takes less time for the same number (~150K) of rows, because those tables hold less data:
weekly_price_history: getting 150K rows takes 3 s (2.5 million rows total).
monthly_price_history: getting 150K rows takes 1 s (800K rows total).
So how do I speed this up when the table is this large?
When you add INDEX(a, b), remove INDEX(a) as being no longer necessary.

Your dataset and query may be a case for using PARTITIONing. This will do "partition pruning" for datetime >= 1598471533546; then the PRIMARY KEY will do most of the rest of the work for symbolName IN ('A', 'AA', ...). Aim for about 50 partitions; the exact number does not matter. Too many partitions may hurt performance; too few won't provide effective pruning.
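A sketch of the kind of table definition this implies. Everything here is illustrative rather than drop-in DDL: the column types, partition names, and boundary values (millisecond epochs, matching the query's datetime format) are assumptions:

-- Illustrative sketch: RANGE-partition on `datetime` so the "datetime >= ?"
-- predicate prunes to the recent partitions, and lead the PRIMARY KEY with
-- (symbolName, datetime) so each symbol's recent rows are clustered together.
-- Types, partition names, and boundary values are hypothetical.
CREATE TABLE `daily_price_history` (
    `id`         BIGINT NOT NULL AUTO_INCREMENT,
    `symbolName` VARCHAR(20) NOT NULL,
    `symbolId`   INT NOT NULL,
    `volume`     BIGINT,
    `high`       DECIMAL(12,4),
    `low`        DECIMAL(12,4),
    `open`       DECIMAL(12,4),
    `datetime`   BIGINT NOT NULL,                 -- epoch milliseconds, as in the query
    `close`      DECIMAL(12,4),
    PRIMARY KEY (`symbolName`, `datetime`, `id`), -- id included to keep the PK unique
    INDEX (`id`)                                  -- needed for AUTO_INCREMENT
)
PARTITION BY RANGE (`datetime`) (
    PARTITION p2020_07 VALUES LESS THAN (1596240000000),
    PARTITION p2020_08 VALUES LESS THAN (1598918400000),
    PARTITION p2020_09 VALUES LESS THAN (1601510400000),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);

With this layout, the 100-day cutoff touches only the last few partitions, and within each partition InnoDB can range-scan (symbolName, datetime) directly.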
Yes, get rid of the subquery, as GMB suggests.
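In this query the correlated subquery adds nothing: assuming id is the primary key, the subquery reduces to the datetime cutoff on the outer row, which can be applied directly. A sketch of the flattened version (symbol list elided as in the original, and assuming the fixed timestamp already encodes "last 100 days"):

-- Flattened query: same result, no correlated subquery. With a composite
-- INDEX(symbolName, datetime), this becomes ~1500 tight index range scans.
SELECT `id`, `symbolId_id`, `volume`, `close`
FROM `daily_price_history`
WHERE `datetime` >= 1598471533546
  AND `symbolName` IN ('A', 'AA', ...);  -- 1500 symbol names, as before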
Meanwhile, it sounds like Django is getting in the way.
Some discussion of partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
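That page is mostly about maintaining a sliding set of time-based partitions. The usual pattern, sketched here with the hypothetical partition names from the DDL above, is to split the MAXVALUE partition forward each period and, only if expiring old data is acceptable, drop the oldest one:

-- Hypothetical maintenance, run monthly: open next month's partition...
ALTER TABLE `daily_price_history`
    REORGANIZE PARTITION pmax INTO (
        PARTITION p2020_10 VALUES LESS THAN (1604188800000),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );
-- ...and, if history may be expired, drop the oldest partition
-- (this removes its rows instantly, with no large DELETE).
ALTER TABLE `daily_price_history` DROP PARTITION p2020_07;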