I've got two years' worth of energy data in 15-minute increments and need to develop a similarity score for a forecasted day, i.e. to identify past days that are similar to the forecasted day.
I started by splitting the initial dataframe into a list (called trading_days below) of 730 dataframes, one for each 24-hour period. The plan is to feed the forecasted day and this list into a function that calculates a similarity score, then rank the historic days by that score.
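For context, a split like this could be done along the following lines. This is only a minimal sketch on synthetic data: `energy`, `timestamp` and `trading_days_demo` are made-up names for illustration, and it assumes one row per 15-minute reading with a POSIXct timestamp column. Only the `NI Demand` column name comes from my real data.

```r
# Synthetic stand-in for the real data: 2 days of 15-minute readings.
# `energy` and `timestamp` are hypothetical names used for illustration.
timestamps <- seq(as.POSIXct("2023-01-01 00:00", tz = "UTC"),
                  by = "15 min", length.out = 96 * 2)
energy <- data.frame(timestamp = timestamps)
energy[["NI Demand"]] <- 500 + 100 * sin(2 * pi * seq_along(timestamps) / 96)

# One data frame per calendar day, keyed by date
trading_days_demo <- split(energy, as.Date(energy$timestamp, tz = "UTC"))

length(trading_days_demo)        # number of days in the list
sapply(trading_days_demo, nrow)  # readings per day
```

With complete data, each element should hold 96 rows (24 hours at 15-minute resolution).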
I'm struggling to decide which similarity measure would be best. Any help would be hugely appreciated!
I tried Euclidean distance and it worked fine, but it's clearly too primitive: it doesn't take the trend over time into account.
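To make the Euclidean approach concrete, here is a minimal sketch of that kind of ranking on synthetic data. `rank_by_euclidean`, `make_day`, `days_demo` and `target_demo` are hypothetical names for illustration, not my actual code. It assumes every day has the same number of readings in a `NI Demand` column.

```r
# Rank historic days by Euclidean distance to a target day.
# `rank_by_euclidean` is a hypothetical helper written for this example.
rank_by_euclidean <- function(target, days, col = "NI Demand") {
  dists <- vapply(
    days,
    function(d) sqrt(sum((d[[col]] - target[[col]])^2)),
    numeric(1)
  )
  # Day indices ordered from most to least similar, with their distances
  data.frame(day = order(dists), distance = sort(dists))
}

# Synthetic days: the same daily shape shifted up or down by a constant
base_shape <- 500 + 100 * sin(2 * pi * (1:96) / 96)
make_day <- function(shift) {
  data.frame(`NI Demand` = base_shape + shift, check.names = FALSE)
}
days_demo   <- list(make_day(50), make_day(5), make_day(-20))
target_demo <- make_day(0)

ranking_demo <- rank_by_euclidean(target_demo, days_demo)
ranking_demo$day  # day 2 (shift 5) first, then day 3, then day 1
```

All three synthetic days have exactly the same shape as the target, yet they get different scores purely because of the level offset, which is the limitation I mean.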
I then tried cross-correlation using ccf(), adapting some AI-generated code, to compare a new day (called first_day) with just one of the past days as a test. It gave me 96 numbers (as expected, the cross-correlation at each lag), but all 96 values are the same number! My code is shown below:
cross_corr2 <- function(vec1, vec2) {
  # Initialize a vector to store cross-correlation values
  ccf_values <- numeric(length(vec1))
  # Iterate over each timestamp in vec1
  for (i in 1:length(vec1)) {
    # Calculate cross-correlation at the current timestamp
    ccf_result <- ccf(vec1, vec2, lag.max = i - 1, plot = FALSE)
    # Extract the cross-correlation value at lag 0
    ccf_values[i] <- ccf_result$acf[i]
  }
  return(ccf_values)
}
attempt2 <- cross_corr2(first_day[["NI Demand"]], trading_days[[12]]$`NI Demand`)
attempt2
I expected 96 different values as output, but got the same number repeated 96 times. Changing which day from trading_days I used changed the number, but it was always repeated 96 times.
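In case it helps, the behavior can be reproduced without my data. The sketch below uses synthetic vectors (`vec_a`, `vec_b` and `looped` are made-up names) and compares a single ccf() call with the same indexing pattern as cross_corr2. As far as I can tell, ccf() returns values for lags from -lag.max to +lag.max, so element i of a result computed with lag.max = i - 1 would be the middle (lag 0) entry every time.

```r
# Synthetic stand-ins for two days of 96 readings each
set.seed(42)
idx   <- 1:96
vec_a <- 500 + 100 * sin(2 * pi * idx / 96) + rnorm(96, sd = 10)
vec_b <- 480 + 90 * sin(2 * pi * (idx - 4) / 96) + rnorm(96, sd = 10)

# A single call: lag.max = 95 is the largest lag possible with 96 observations
res <- ccf(vec_a, vec_b, lag.max = 95, plot = FALSE)
length(res$acf)            # 191 values, one per lag from -95 to 95
range(res$lag)             # -95 95
lag0 <- res$acf[res$lag == 0]

# Same indexing as in cross_corr2: element i of a result with lag.max = i - 1
looped <- sapply(idx, function(i) {
  ccf(vec_a, vec_b, lag.max = i - 1, plot = FALSE)$acf[i]
})
length(unique(round(looped, 10)))  # 1, i.e. a single repeated value
all.equal(looped[1], lag0)         # TRUE: the repeated value is the lag-0 one
```

So a single call with a large lag.max seems to give one value per lag, whereas the loop appears to pick out the lag-0 value each time.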