Pipeline data processing and code architecture

33 Views Asked by At

I am currently working for my PhD on a Python package that extracts multiple features from data. Each feature can be defined as a pipeline of mathematical operations, which can take some parameters. The results of those operations are needed by different features, and I would like to avoid computational cost (at the moment this is not a problem, but at some point I will have to process a lot of data). Is there a way of structuring my code so that those shared operations are only computed once ?

Here is an illustration : Pipelines of operations

In this example, Operation 1 with Parameter = Value 1 only needs to be computed once, and the result is used in different pipelines which lead to 3 different features at the end. Also in this example Operation 3 needs implicitely the result of Operation 2.

For the moment, my solution is that all my operations inherits from a PipelineElement class, which have a parent attribute and a children attribute. This way I can create a tree of operations and only compute it once. Is it the way ? Is this pattern frequently used and if so where can I learn some good practices ?

1

There are 1 best solutions below

0
Juan Manuel Rodriguez On

Your solution seems to be something in the middle of a Strategy, Composite, or Visitor pattern as defined in the "Gang of four" design pattern book. It is an old, but still relevant book. Its official name is "Design Patterns Elements of Reusable Object-Oriented Software" By Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. 1994.

You can find a fine resume of these patterns in:

https://www.digitalocean.com/community/tutorials/gangs-of-four-gof-design-patterns#behavioral-design-patterns