I have a dataframe containing historical stock data that looks like this:
| stock_name | price | time_stamp |
|---|---|---|
| stock_a | 35.6 | 2024-01-01 |
| stock_a | 34.7 | 2024-01-02 |
| stock_b | 56.7 | 2024-01-01 |
| stock_b | 55 | 2024-01-02 |
The database holds up to 500 stocks, and I want to calculate the correlation between stock_a and every other stock (e.g. stock_b, stock_c, stock_d, ...) based on their prices over the time interval.
I expect a result like this:
Stock_A correlation
| stock_name | correlation |
|---|---|
| stock_b | 0.87 |
| stock_c | 0.76 |
So far I can only calculate the correlation by specifying each of the other stocks one by one (PostgreSQL example):
select 'stock_b' as stock_name, corr(d1.close_price, d2.close_price)
from stock_dmt d1
inner join stock_dmt d2
on d1.time_stamp = d2.time_stamp
where d1.stock_code = 'stock_a' and d2.stock_code = 'stock_b';
However, it is impractical to do that manually for over 500 stocks.
Is there an efficient way to do this? A solution in either SQL or Python is welcome.
Thank you
Here is an approach in Python. I hope it answers your question.
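One way this could be done with pandas is to pivot the long table into one column per stock and then correlate every column against stock_a in a single call. The sketch below assumes the dataframe has the `stock_name`, `price` and `time_stamp` columns shown in the question. The function name `correlations_with` and the sample prices are made up for illustration, and duplicate rows for the same stock and date are assumed to be safe to average.

```python
import pandas as pd


def correlations_with(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Correlate the target stock's price with every other stock's price."""
    # Long -> wide: one row per time_stamp, one column per stock.
    # pivot_table averages duplicate (time_stamp, stock_name) rows (an assumption).
    wide = df.pivot_table(index="time_stamp", columns="stock_name", values="price")

    # Pearson correlation of each remaining column with the target column.
    # Dates where either price is missing are ignored for that pair.
    others = wide.drop(columns=target)
    corr = others.corrwith(wide[target])

    return (
        corr.sort_values(ascending=False)
        .rename("correlation")
        .rename_axis("stock_name")
        .reset_index()
    )


# Made-up sample data, only to show the shape of the input and output.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
sample = pd.DataFrame({
    "stock_name": ["stock_a"] * 4 + ["stock_b"] * 4 + ["stock_c"] * 4,
    "price": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0, 4.0, 3.0, 2.0, 1.0],
    "time_stamp": list(dates) * 3,
})

result = correlations_with(sample, "stock_a")
print(result)
```

The result has one row per stock other than the target, with `stock_name` and `correlation` columns, sorted from highest to lowest correlation. If you need every pair rather than just stock_a against the rest, `wide.corr()` returns the full correlation matrix, which is still small for 500 stocks.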