I have a working repo that generates time series forecasts using scikit-learn's random forest regression model. Everything around it (data processing, feature engineering, forecasting, tuning, etc.) is generalized to a certain extent, but several places are tailored specifically to scikit-learn models (the way I pass parameters to the model, for instance) and some to random forest (the way I extract percentiles relies on the forest's predictions).
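For illustration, random-forest-specific percentile extraction typically looks something like the sketch below. It assumes percentiles are taken across the per-tree predictions exposed through `estimators_`. The helper name `forest_percentiles` and the synthetic data are hypothetical. This is exactly the part that does not carry over to gradient boosting: LightGBM's trees are additive corrections rather than independent estimators, so their individual outputs cannot be treated as samples in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def forest_percentiles(model, X, percentiles=(10, 50, 90)):
    """Hypothetical helper: prediction percentiles across the trees of a fitted forest."""
    # One row per tree: shape (n_trees, n_samples)
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    # Percentiles over the tree axis: shape (len(percentiles), n_samples)
    return np.percentile(per_tree, percentiles, axis=0)


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
bands = forest_percentiles(model, X[:5])
```

Each column of `bands` holds the 10th, 50th and 90th percentile for one input row.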
I now want to expand beyond random forest, and I specifically have my eyes on LightGBM (after reading about the M5 forecasting competition). This means I will have to rewrite parts of my architecture, so I thought it might make sense to also look into whether I should move the architecture in a specific direction, or perhaps use packages that could replace a large portion of my code. I have looked at functime and mlforecast, but I have no prior experience with them. All my code runs in Databricks, which is also where I save models, metrics, etc. using MLflow.
Has anyone worked with these packages in a professional setting? Any recommendations, dos and don'ts, or other ideas?
As inspiration, some of my concerns are:
- Can I rely on these libraries? Will they still be around and maintained in a few years?
- What models do they support? I cannot find a "complete" list.
- How do they handle large volumes of data? I can either have many individual time series (tens of thousands) or group the time series into a few (4-8, for instance).
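To make the "many series" option concrete: one common layout is a single long table with a series identifier column, on which one global model is trained across all series. As far as I know, mlforecast uses the column names `unique_id`, `ds`, `y` by default. The sketch below only uses pandas and scikit-learn and assumes daily data and lag-only features. The helpers `make_long_frame` and `add_lags` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def make_long_frame(n_series=100, n_periods=60, seed=0):
    """Hypothetical helper: stack many synthetic daily series into one long table."""
    rng = np.random.default_rng(seed)
    frames = []
    for i in range(n_series):
        level = rng.uniform(10, 100)
        frames.append(pd.DataFrame({
            "unique_id": f"series_{i}",  # scalar broadcast to every row of this series
            "ds": pd.date_range("2023-01-01", periods=n_periods, freq="D"),
            "y": level + rng.normal(scale=5, size=n_periods),
        }))
    return pd.concat(frames, ignore_index=True)


def add_lags(df, lags=(1, 7)):
    """Hypothetical helper: lag features computed within each series."""
    df = df.sort_values(["unique_id", "ds"]).copy()
    for lag in lags:
        # Shifting per group keeps lags from leaking across series boundaries
        df[f"lag_{lag}"] = df.groupby("unique_id")["y"].shift(lag)
    # Drop the first rows of each series, where the longest lag is undefined
    return df.dropna()


long_df = make_long_frame()
features = add_lags(long_df)

# One global model trained on all series at once
model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(features[["lag_1", "lag_7"]], features["y"])
```

Grouping into 4-8 series would instead mean fitting one model per group, which changes how features and models are organized but not the long-table layout itself.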