How to compare "raw" joins to the output of deep feature synthesis in Featuretools?


Is it possible to get the results someone would get from deep feature synthesis, but without any aggregations?

I have some small datasets, and I want to be able to compare the "processed" outputs of deep feature synthesis with the "raw" joined data.

For example, this aggregate collapses the resulting df down to 1 row per customer:

fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    agg_primitives=["sum"],  
    trans_primitives=[],
)

fm.head()

I'd love to not have that "sum" happening, so that I get a resulting dataframe with multiple rows per customer. But I can't swap out agg_primitives=["sum"], for agg_primitives=[], because I get:

AssertionError: No features can be generated from the specified primitives. Please make sure the primitives you are using are compatible with the variable types in your data.

I expect the answer is "what you want is not possible in featuretools".

Thank you!

Answer by Nate Parsons

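If you want to see the output without any aggregations, set the agg_primitives parameter to an empty list in your call to ft.dfs. Similarly, you can disable transformations by passing an empty list to trans_primitives.

Here is an example using one of the Featuretools demo EntitySets. Note that the target is the child dataframe, order_products, rather than customers, so the feature matrix keeps one row per order product and pulls in columns from the parent orders dataframe as direct features (the orders.* columns):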

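import featuretools as ft

# Load the retail demo EntitySet (downloads the demo data)
es = ft.demo.load_retail()

# Target the child dataframe and disable both primitive types,
# so no aggregation or transformation features are generated
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="order_products",
    agg_primitives=[],
    trans_primitives=[],
)

fm.head()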

                 order_id product_id  quantity  unit_price   total orders.customer_name  orders.country  orders.cancelled
order_product_id
0                  536365     85123A         6      4.2075  25.245         Andrea Brown  United Kingdom             False
1                  536365      71053         6      5.5935  33.561         Andrea Brown  United Kingdom             False
2                  536365     84406B         8      4.5375  36.300         Andrea Brown  United Kingdom             False
3                  536365     84029G         6      5.5935  33.561         Andrea Brown  United Kingdom             False
4                  536365     84029E         6      5.5935  33.561         Andrea Brown  United Kingdom             False

One thing to note: by default, Featuretools only includes certain column types in the output, specifically numeric, boolean, and categorical columns. If you want to include all column types, simply pass return_types="all" in the call to ft.dfs.
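
To check a feature matrix like this against a hand-built "raw" join, one option is to build the join in plain pandas and compare the two. The sketch below is only an illustration: the two small tables and their values are hypothetical stand-ins for a parent (orders) and child (order_products) dataframe, not the demo data, and the "orders." prefix simply mimics the column names shown in the output above.

```python
import pandas as pd

# Hypothetical toy tables standing in for the parent and child dataframes
orders = pd.DataFrame(
    {
        "order_id": [1, 2],
        "customer_name": ["Ann", "Bob"],
        "country": ["United Kingdom", "France"],
    }
)
order_products = pd.DataFrame(
    {
        "order_product_id": [0, 1, 2],
        "order_id": [1, 1, 2],
        "quantity": [6, 8, 3],
    }
)

# Prefix the parent columns with "orders." (leaving the join key alone)
# so the names line up with the direct-feature names in the feature matrix
parent = orders.rename(
    columns=lambda c: c if c == "order_id" else f"orders.{c}"
)

# Left join child -> parent; validate guards against duplicate parent keys,
# which would silently multiply child rows
raw = order_products.merge(
    parent, on="order_id", how="left", validate="many_to_one"
).set_index("order_product_id")

print(raw)
```

The result has one row per order product, which is the same grain as the feature matrix when the child dataframe is the DFS target. When comparing against fm, expect dtypes and column order to differ, so comparing a sorted, shared subset of columns is probably the more forgiving check.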