I have a step-by-step process that I'm trying to make predictions on. We have a 10-step process, and for each step we log when it starts, when it ends, and how long it takes. Once the first step has started, I want to predict how long it will take until the 10th step finishes.
Searching for "modeling, step by step process" just leads to a bunch of articles about the steps to create a model...
I'm assuming I can create a feature for the length of time between each step, use the time between the first step and the last step as the label, and fit a regression model, but I was curious whether something more specific exists for this type of task.
We work in Python and generally use sklearn.
If the steps are strictly sequential, the total time is just the sum of the step times, so this is an ordinary regression problem. You can either regress directly on the total time across all 10 steps, or fit a separate regression for the time each step takes and then sum the predictions.
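To make the two options concrete, here is a minimal sketch on synthetic data. The feature matrix `X` (things known when step 1 starts), the array `step_times`, and all the sizes are made up for illustration and are not from your logs. With plain least squares and the same features for every step, the two routes should give the same predictions up to floating-point error, because the fit is linear in the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_runs, n_steps, n_features = 200, 10, 3

# Hypothetical features available when step 1 starts (assumption, synthetic).
X = rng.normal(size=(n_runs, n_features))

# Synthetic per-step durations: linear in X plus noise, one column per step.
true_coefs = rng.normal(size=(n_features, n_steps))
noise = rng.normal(scale=0.5, size=(n_runs, n_steps))
step_times = 5.0 + X @ true_coefs + noise
total_time = step_times.sum(axis=1)

# Option 1: regress directly on the total time.
total_model = LinearRegression().fit(X, total_time)
pred_total = total_model.predict(X)

# Option 2: fit one regression per step (a 2D target fits all ten at once),
# then sum the per-step predictions.
step_model = LinearRegression().fit(X, step_times)
pred_sum = step_model.predict(X).sum(axis=1)

# Largest disagreement between the two routes.
print(np.abs(pred_total - pred_sum).max())
```

The per-step route is mostly worth the extra bookkeeping if you want different features or different model types for different steps, or if you want to see which step drives the total.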
Each step time is a random variable, and you can apply whatever transformation is needed (a log transform, for example) so that each one satisfies the assumptions of a linear model. Otherwise, by linearity of expectation, the expected total time is the sum of the expected step times, so you can simply fit one model that regresses on the sum of the time intervals.
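If you do go the per-step route with transformations, one way to sketch it in sklearn is `TransformedTargetRegressor`, which fits on the transformed target and maps predictions back to the original scale. Everything below is an assumption for illustration: the data is synthetic, and the log transform is just one plausible choice for positive, right-skewed durations. Keep in mind that a prediction mapped back from the log scale is not exactly the conditional mean, so the summed total is only an approximation of the expected total.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_runs, n_steps = 300, 10

# Hypothetical features known at the start of step 1 (assumption, synthetic).
X = rng.normal(size=(n_runs, 2))

# Synthetic positive, right-skewed step durations (log-normal by construction).
coefs = rng.normal(scale=0.3, size=(2, n_steps))
log_times = 1.0 + X @ coefs + rng.normal(scale=0.2, size=(n_runs, n_steps))
step_times = np.exp(log_times)

# One model per step: linear regression on log(duration),
# with predictions mapped back to the original time scale via exp.
step_models = []
for k in range(n_steps):
    model = TransformedTargetRegressor(
        regressor=LinearRegression(),
        func=np.log,
        inverse_func=np.exp,
    )
    model.fit(X, step_times[:, k])
    step_models.append(model)

# Predicted total = sum of the back-transformed per-step predictions.
pred_total = np.sum([m.predict(X) for m in step_models], axis=0)
```

Fitting each step separately like this also lets you pick a different transformation for each step if their distributions look different.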