Passing a set as an indexer is not supported. Use a list instead


Could anyone help me fix this issue?

X_train, X_test, y_train, y_test = train_test_split(df[set(df.columns) - set(['load_date','target'])],
                                                    df['target'],
                                                    test_size=0.2,
                                                    shuffle=True,
                                                    random_state=21,
                                                    stratify = df[['load_date','target']])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 1
----> 1 X_train, X_test, y_train, y_test = train_test_split(df[set(df.columns) - set(['load_date','target'])],
      2                                                     df['target'],
      3                                                     test_size=0.2,
      4                                                     shuffle=True,
      5                                                     random_state=21,
      6                                                     stratify = df[['load_date','target']])

File /opt/conda/lib/python3.10/site-packages/pandas/core/frame.py:3714, in DataFrame.__getitem__(self, key)
   3713 def __getitem__(self, key):
-> 3714     check_dict_or_set_indexers(key)
   3715     key = lib.item_from_zerodim(key)
   3716     key = com.apply_if_callable(key, self)

File /opt/conda/lib/python3.10/site-packages/pandas/core/indexing.py:2618, in check_dict_or_set_indexers(key)
   2610 """
   2611 Check if the indexer is or contains a dict or set, which is no longer allowed.
   2612 """
   2613 if (
   2614     isinstance(key, set)
   2615     or isinstance(key, tuple)
   2616     and any(isinstance(x, set) for x in key)
   2617 ):
-> 2618     raise TypeError(
   2619         "Passing a set as an indexer is not supported. Use a list instead."
   2620     )
   2622 if (
   2623     isinstance(key, dict)
   2624     or isinstance(key, tuple)
   2625     and any(isinstance(x, dict) for x in key)
   2626 ):
   2627     raise TypeError(
   2628         "Passing a dict as an indexer is not supported. Use a list instead."
   2629     )

TypeError: Passing a set as an indexer is not supported. Use a list instead.

A few days ago I was reading the data from my notebook and the code above ran without any issue; I was able to split my data into training and test sets. At that time I loaded the data with df = pd.read_pickle('/home/jupyter/o2extras/o2extradataset3mtarget.pickle')

However, as the data set was huge, I was running into memory issues, so I had to save it in a GCP bucket. Since then I have been reading the data set with df = pd.read_pickle('gs://xxxxx/preprocessed_dadataset.pickle')

I was wondering whether this change affected some index. I am not an expert in Python, so any help would be much appreciated.

Talha Tayyab On

Passing a set as an indexer is not supported

Change this:

df[set(df.columns) - set(['load_date','target'])]

to

df[list(set(df.columns) - set(['load_date','target']))]
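
As the traceback shows, the error is raised inside pandas itself (check_dict_or_set_indexers in DataFrame.__getitem__), so it most likely comes from a newer pandas version in the environment rather than from reading the pickle from the bucket. Below is a minimal sketch of the fix plus two alternatives that avoid sets altogether. The toy DataFrame and its feature_a / feature_b columns are made up for illustration; only load_date and target come from the question.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4],
    "load_date": ["2023-01", "2023-01", "2023-02", "2023-02"],
    "feature_b": [0.1, 0.2, 0.3, 0.4],
    "target": [0, 1, 0, 1],
})

excluded = ["load_date", "target"]

# Option 1: the fix from the answer. Wrapping the set in a list works,
# but the resulting column order is not guaranteed.
X_from_set = df[list(set(df.columns) - set(excluded))]

# Option 2: drop the unwanted columns; the original column order is kept.
X_dropped = df.drop(columns=excluded)

# Option 3: Index.difference returns a (sorted) Index, which is a valid indexer.
X_diff = df[df.columns.difference(excluded)]

print(list(X_dropped.columns))
```

Any of the three results can be passed as the first argument to train_test_split. Option 2 is probably the safest choice if the column order matters later, for example when the same columns must be fed to a model in a fixed order.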