Text classification based on TF-IDF and CNN


I'm doing binary text classification. I used TF-IDF weights as the input to a CNN model, but the results weren't as expected.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dropout, Dense
from tensorflow.keras.metrics import Precision, Recall

train_df = pd.read_csv("merged_data.csv", encoding='utf-8')

x = train_df['Text'].values
y = train_df['Label'].values


x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)




encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)
y_test = encoder.transform(y_test)



vectorizer = TfidfVectorizer(max_features=10000)
x_train = vectorizer.fit_transform(x_train)
x_test = vectorizer.transform(x_test)

# Densify the sparse TF-IDF matrices and cut each row to 1000 columns
x_train = pad_sequences(x_train.toarray(), padding='post', dtype='float32', maxlen=1000)
x_test = pad_sequences(x_test.toarray(), padding='post', dtype='float32', maxlen=1000)
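One thing I noticed while debugging (a minimal NumPy sketch, not part of the original pipeline): assuming Keras's default `truncating='pre'`, `pad_sequences(..., maxlen=1000)` on a 10,000-column TF-IDF matrix silently drops the first 9,000 feature columns and keeps only the last 1,000, so most features never reach the model. The equivalent slicing:

```python
import numpy as np

# Stand-in for a dense TF-IDF matrix: 2 documents x 5 features.
tfidf_dense = np.array([[0.5, 0.0, 0.25, 0.75, 1.0],
                        [0.0, 0.5, 0.0, 0.25, 0.0]], dtype='float32')

maxlen = 3
# pad_sequences(..., maxlen=3) with the default truncating='pre' keeps
# the LAST `maxlen` entries of each over-long row; in NumPy that is:
truncated = tfidf_dense[:, -maxlen:]
print(truncated.tolist())  # [[0.25, 0.75, 1.0], [0.0, 0.25, 0.0]]
```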



max_words = 10000
cnn = Sequential([
    Embedding(max_words, 64, input_length=1000),
    Conv1D(64, 3, activation='relu'),
    GlobalMaxPooling1D(),
    Dropout(0.5),
    Dense(10, activation='relu'),
    Dense(1, activation='sigmoid') 
])
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy', Precision(), Recall()])
cnn.summary()

batch_size = 128
epochs = 1
history = cnn.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_data=(x_test, y_test))

(loss, accuracy, precision, recall) = cnn.evaluate(x_test, y_test, batch_size=batch_size)



preds = cnn.predict(x_test)
# The model has a single sigmoid output, so threshold the probabilities;
# np.argmax(preds, axis=1) on an (n, 1) array would always return 0.
y_pred = (preds > 0.5).astype(int).ravel()
y_pred


clr = classification_report(y_test, y_pred)
print(clr)

Model performance results: (screenshot not included)

How can I adjust this code to improve the model's performance?
