How to calculate the correlation of the same column's values between different groups defined by another column


I have a dataframe containing historical stock data that looks like this:

stock_name price time_stamp
stock_a 35.6 2024-01-01
stock_a 34.7 2024-01-02
stock_b 56.7 2024-01-01
stock_b 55 2024-01-02

I have up to 500 stocks in the database, and I want to calculate the correlation between stock_a and every other stock (e.g. stock_b, stock_c, stock_d, ...) based on their prices over the time interval.

I expect a result like this:

Stock_A correlation

stock_name correlation
stock_b 0.87
stock_c 0.76

Currently I can only calculate the correlation by specifying each stock other than stock_a one at a time (PostgreSQL example):

select 'stock_b' as stock_name, corr(d1.close_price, d2.close_price)
from stock_dmt d1
inner join stock_dmt d2
on d1.time_stamp = d2.time_stamp
where d1.stock_code = 'stock_a' and d2.stock_code = 'stock_b';

However, it is impractical to do that manually for over 500 stocks.

Is there an efficient way to complete this task? A solution in either SQL or Python is welcome.

Thank you

Best answer by udaykumar gajavalli

Here is an approach in Python using pandas. I hope it answers your question.

import pandas as pd

# Creating a sample data
data = {
    "stock_name": ["stock_a", "stock_a", "stock_b", "stock_b"],
    "price": [35.6, 34.7, 56.7, 55.0],
    "time_stamp": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"]
}

# Create the DataFrame from the dictionary
df = pd.DataFrame(data)

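# Pivot to wide format: one row per time_stamp, one column per stock.
# Keeping time_stamp as the index means it does not have to be dropped later.
df_pivot = df.pivot(index="time_stamp", columns="stock_name", values="price")

# Pairwise Pearson correlation between all stock columns
corr_matrix = df_pivot.corr()

# Keep only the correlations with stock_a, excluding stock_a itself
print(corr_matrix["stock_a"].drop("stock_a"))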
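
With around 500 stocks, the full correlation matrix computes far more pairs than the question asks for. As a minimal sketch, assuming the same long-format columns (stock_name, price, time_stamp), DataFrame.corrwith can correlate stock_a against every other column directly. The helper name correlations_with and the sample rows below are hypothetical and only for illustration.

```python
import pandas as pd


def correlations_with(df, target):
    """Correlate one stock's prices against every other stock's prices."""
    # pivot_table averages duplicate (time_stamp, stock_name) rows
    # instead of raising an error, as pivot would
    wide = df.pivot_table(index="time_stamp", columns="stock_name",
                          values="price", aggfunc="mean")
    others = wide.drop(columns=target)
    # corrwith computes the Pearson correlation of each remaining column
    # with the target Series, aligned on the time_stamp index
    result = others.corrwith(wide[target])
    return result.rename("correlation").sort_values(ascending=False)


# Hypothetical sample data: three stocks over four days
sample = pd.DataFrame({
    "stock_name": ["stock_a"] * 4 + ["stock_b"] * 4 + ["stock_c"] * 4,
    "price": [35.6, 34.7, 36.1, 37.0,
              56.7, 55.0, 57.5, 59.3,
              12.0, 12.4, 11.8, 11.1],
    "time_stamp": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"] * 3,
})

result = correlations_with(sample, "stock_a")
print(result)
```

The result is a Series indexed by stock_name, which matches the two-column layout requested in the question. If the stocks do not all trade on the same dates, the pivoted table will contain missing values, so it may be worth checking how many overlapping dates each pair actually has before trusting the numbers.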