I have these two DataFrames. Below is the output when I print their info:
> <class 'pandas.core.frame.DataFrame'>
> Index: 3432 entries, 11433 to 559
> Data columns (total 3 columns):
>    Column          Non-Null Count  Dtype
>    --------------  --------------  ------
> 0  text            3432 non-null   object
> 1  input_ids       3432 non-null   object
> 2  attention_mask  3432 non-null   object
> dtypes: object(3)
> memory usage: 107.2+ KB
> None
>
> <class 'pandas.core.frame.DataFrame'>
> Index: 3432 entries, 11433 to 559
> Data columns (total 1 columns):
>    Column  Non-Null Count  Dtype
>    ------  --------------  -----
> 0  labels  3432 non-null   int64
> dtypes: int64(1)
> memory usage: 53.6 KB
> None
Then I split them with train_test_split:
X_train, X_test, y_train, y_test = train_test_split(X_resampled_df, y_resampled_df, test_size=0.2)
I only want to feed input_ids into the model:
X_train = X_train['input_ids']
X_test = X_test['input_ids']
This is my trial model:
from tensorflow.keras import Sequential, layers
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(layers.Embedding(2000, 20))  # embedding layer: vocabulary size 2000, 20-dimensional vectors
model.add(layers.LSTM(15, dropout=0.5))  # LSTM layer with 15 units
model.add(layers.Dense(6, activation='softmax'))  # output layer for 6 classes
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
I fit it with model.fit():
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))
The following error occurs, referring to X_train and X_test:
Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
Here is the output when I print X_train.to_numpy():
[array([ 101, 1045, 2514, 2004, 2065, 1996, 4177, 1997, 3032,
2079, 2025, 17120, 1996, 2111, 1997, 2037, 3032, 2138,
2005, 1996, 2293, 1997, 2643, 1045, 3246, 2053, 2028,
2245, 2012, 2035, 1045, 2001, 1999, 2151, 2126, 16408,
2030, 2066, 2577, 1059, 102, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0], dtype=int32)
array([ 101, 1045, 2572, 2074, 2785, 1997, 2187, 3110, 16021,
29150, 1998, 15491, 1999, 2026, 2219, 3096, 102, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0], dtype=int32)
array([ 101, 1045, 2031, 2069, 2579, 2093, 9372, 7171, 2061,
2521, 1998, 2428, 1045, 2031, 2042, 3110, 2026, 2126,
2007, 1037, 2200, 4326, 4950, 1037, 2422, 22828, 1998,
1996, 2146, 6404, 2245, 6194, 1997, 4030, 5855, 102,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0], dtype=int32) ...
array([ 101, 1045, 2514, 2061, 8239, 22614, 2035, 1996, 3513,
1998, 2049, 2061, 3483, 2098, 2066, 2065, 2465, 4627,
...
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0], dtype=int32)
The dtype of X_train.to_numpy() is object.
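For context, here is a minimal sketch of why the dtype comes out as object, assuming each cell of input_ids holds its own NumPy array. The toy values and the name `s` are made up for illustration. The Series stores references to separate arrays, so to_numpy() returns a 1-D array of objects rather than a 2-D integer matrix. Checking the per-row lengths shows whether the rows can be stacked directly.

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_train['input_ids']: each cell holds its own int32 array.
s = pd.Series([
    np.array([101, 1045, 102, 0], dtype=np.int32),
    np.array([101, 2572, 2074, 102], dtype=np.int32),
])

arr = s.to_numpy()
print(arr.dtype, arr.shape)  # a 1-D array of Python objects, not a 2-D matrix

# Per-row lengths: if there is a single unique value, the rows can be stacked.
lengths = s.map(len)
print(lengths.unique())

# np.stack works only when every row has the same length.
stacked = np.stack(s.to_list())
print(stacked.dtype, stacked.shape)
```

If `lengths.unique()` reports more than one value on the real data, the rows would need padding or truncating to a common length before stacking.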
Can anyone please help with this? I believe it is a format problem, but I could not find a solution after spending half a day on it. Thanks!
I am hoping for a solution. So far I have tried np.stack, which cannot be used because the sentences are not all the same size. I tried to change the object dtype with astype(), but Python does not allow it. I also tried wrapping the data in a NumPy array, and that did not work either.
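One possible direction, sketched under the assumption that the rows really do differ in length: right-pad every row with zeros to a common length and build a single 2-D int32 array, which TensorFlow can convert to a tensor. The helper name `pad_to_matrix` and the toy rows are hypothetical, not part of NumPy or Keras.

```python
import numpy as np

def pad_to_matrix(rows, pad_value=0):
    """Right-pad 1-D integer arrays to a common length; return a 2-D int32 array.

    Hypothetical helper for illustration; not part of NumPy or Keras.
    """
    max_len = max(len(r) for r in rows)
    out = np.full((len(rows), max_len), pad_value, dtype=np.int32)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r  # copy the row; the remainder keeps pad_value
    return out

# Toy rows of different lengths, standing in for X_train.to_list().
rows = [
    np.array([101, 1045, 2514, 102], dtype=np.int32),
    np.array([101, 2572, 102], dtype=np.int32),
]

X = pad_to_matrix(rows)
print(X.dtype, X.shape)
print(X)
```

With the real data this would presumably be called as `pad_to_matrix(X_train.to_list())`, and likewise for X_test, before model.fit(). Keras also provides `pad_sequences` for the same purpose. Two further things may be worth checking once the input converts: the printed token ids go up to 29150, which is larger than the Embedding input size of 2000, and the labels are int64 class ids, which 'sparse_categorical_crossentropy' accepts directly while 'categorical_crossentropy' expects one-hot vectors.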