I am following along with the fastai Tabular Training tutorial, using my own data, to teach myself the library.
https://docs.fast.ai/tutorial.tabular.html
I have repeated this exercise several times, and it works with some data sets. With others I get the error shown further down, and I cannot tell what causes it or what resolves it.
I am doing this in a Google Colab notebook. The following code works:
from fastai.tabular.all import *
df = pd.read_csv('/tp-forecast-ml.csv')
df.columns = df.columns.str.replace(' ', '')
df.head()
list(df)
The output from the above is:
['Month',
'Search_Volume',
'Total_Clicks',
'Total_Purchases',
'SoV_Clicks',
'SoV_Purchases',
'Total_Search_CVR',
'Total_Click_CVR',
'Clicks_from_Search',
'Purchases_from_Search',
'Non_search_sessions',
'Non_search_purchase',
'Total_Sessions',
'Total_Orders',
'Total_CVR',
'Ad_Spend',
'Attr_Conversion',
'Attrb_Clicks',
'Next_Month_Sales']
I then run this:
splits = RandomSplitter(valid_pct=0.2)(range_of(df))
to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names=['Month'],
                   cont_names=['Search_Volume', 'Total_Clicks', 'Total_Purchases',
                               'SoV_Clicks', 'SoV_Purchases', 'Total_Search_CVR',
                               'Total_Click_CVR', 'Clicks_from_Search',
                               'Purchases_from_Search', 'Non_search_sessions',
                               'Non_search_purchase', 'Total_Sessions', 'Total_Orders',
                               'Total_CVR', 'Ad_Spend', 'Attr_Conversion', 'Attrb_Clicks'],
                   y_names='Next_Month_Sales',
                   splits=splits)
Which results in this:
/usr/local/lib/python3.10/dist-packages/pandas/core/ops/__init__.py in to_series(right)
237 else:
238 if len(left.columns) != len(right):
--> 239 raise ValueError(
240 msg.format(req_len=len(left.columns), given_len=len(right))
241 )
ValueError: Unable to coerce to Series, length must be 17: given 6
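For reference, the same message can be reproduced in plain pandas, without fastai, by doing arithmetic between a DataFrame and a list-like whose length does not match the number of columns. The snippet below is only a minimal sketch: the frame, its column names (`c0` to `c16`) and the `stats` list are hypothetical stand-ins, chosen so that the counts match the 17 entries in cont_names and the 6 reported in the error.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 17 numeric columns, mirroring the 17 entries in cont_names.
frame = pd.DataFrame(np.zeros((3, 17)), columns=[f"c{i}" for i in range(17)])

# Hypothetical list holding only 6 per-column values.
stats = [0.0] * 6

try:
    # pandas tries to line the list up with the 17 columns and cannot.
    frame - stats
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

This suggests the 17 is the number of continuous columns and the 6 is the length of whatever is being combined with them, although the sketch does not show where fastai produces that shorter object.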
My understanding is that this error results from missing or incorrect column names in the TabularPandas call. But every name matches. I have stripped whitespace from the column headers, since that is a frequent cause of this error, and I deliberately avoided spaces in the headers for the same reason.
What am I doing wrong?
I tried stripping whitespace from the column header names, quadruple-checking the spelling of the column header names, and ensuring all headers have been accounted for. I've also checked the dataframe head to ensure it matches the CSV and has the expected contents.
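One check not listed above is whether every column in cont_names was actually read as a numeric dtype. The following is a minimal sketch of such a check, not a confirmed diagnosis: it assumes that some columns may have been parsed as strings (for example because of thousands separators or percent signs), and the `sample` frame and the helpers `non_numeric_columns` and `coerce_to_numeric` are hypothetical names introduced only for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for the real CSV: two columns hold strings
# because of a thousands separator and a percent sign.
sample = pd.DataFrame({
    "Search_Volume": ["1,200", "3,400", "2,150"],
    "Total_Clicks": [310, 420, 388],
    "Total_CVR": ["2.5%", "3.1%", "2.9%"],
    "Next_Month_Sales": [1000.0, 1500.0, 1250.0],
})
cont_names = ["Search_Volume", "Total_Clicks", "Total_CVR"]


def non_numeric_columns(frame, names):
    """Return the names whose dtype is not numeric (e.g. object/string)."""
    return [name for name in names if not is_numeric_dtype(frame[name])]


def coerce_to_numeric(frame, names):
    """Return a copy where the named columns have ',' and '%' removed and are parsed as numbers."""
    cleaned = frame.copy()
    for name in names:
        text = cleaned[name].astype(str).str.replace(r"[,%]", "", regex=True)
        # Values that still cannot be parsed become NaN instead of raising.
        cleaned[name] = pd.to_numeric(text, errors="coerce")
    return cleaned


bad = non_numeric_columns(sample, cont_names)
print(bad)

fixed = coerce_to_numeric(sample, bad)
print(fixed.dtypes)
```

On the real data, the equivalent would be to call `non_numeric_columns(df, cont_names)` with the full list passed to TabularPandas. An empty result would rule this explanation out.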