I'm working on binary text classification. I used TF-IDF weighting as the input to a CNN model, but the results weren't what I expected. Here is my code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dropout, Dense
from tensorflow.keras.metrics import Precision, Recall
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the data and split into train/test sets
train_df = pd.read_csv("merged_data.csv", encoding='utf-8')
x = train_df['Text'].values
y = train_df['Label'].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Encode the string labels as 0/1 integers
encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)
y_test = encoder.transform(y_test)

# TF-IDF vectorization with a 10,000-term vocabulary
vectorizer = TfidfVectorizer(max_features=10000)
x_train = vectorizer.fit_transform(x_train)
x_test = vectorizer.transform(x_test)

# Densify the TF-IDF matrices and cut each row down to 1000 columns
# (pad_sequences truncates from the start of each row by default, truncating='pre')
x_train = pad_sequences(x_train.toarray(), padding='post', dtype='float32', maxlen=1000)
x_test = pad_sequences(x_test.toarray(), padding='post', dtype='float32', maxlen=1000)
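# Quick sanity check (added for reference): after this step the inputs are
# dense float TF-IDF weights, not integer token ids
print(x_train.shape, x_train.dtype)   # (n_train_samples, 1000) float32
print(x_train.min(), x_train.max())   # TF-IDF values lie in [0, 1]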
# 1D-CNN classifier; the Embedding layer expects integer token indices in [0, max_words)
max_words = 10000
cnn = Sequential([
    Embedding(max_words, 64, input_length=1000),
    Conv1D(64, 3, activation='relu'),
    GlobalMaxPooling1D(),
    Dropout(0.5),
    Dense(10, activation='relu'),
    Dense(1, activation='sigmoid')
])
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy', Precision(), Recall()])
cnn.summary()
batch_size = 128
epochs = 1
history = cnn.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_data=(x_test, y_test))
(loss, accuracy, precision, recall) = cnn.evaluate(x_test, y_test, batch_size=batch_size)
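# (added for clarity) evaluate() returns the loss followed by the compiled metrics, in order
print(f"test loss={loss:.4f}  acc={accuracy:.4f}  precision={precision:.4f}  recall={recall:.4f}")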
# The sigmoid output is a single probability per sample, so threshold at 0.5
# (np.argmax over a one-column array would always return 0)
preds = cnn.predict(x_test)
y_pred = (preds > 0.5).astype(int).ravel()
clr = classification_report(y_test, y_pred)
print(clr)
Model performance results:
I would like to adjust this code so the model performs better. Can anyone suggest what to change?
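For context, the Keras text-CNN examples I've seen feed integer token ids into the Embedding layer rather than TF-IDF weights, roughly like the sketch below. The Tokenizer settings and maxlen here are placeholder values, not tuned for my data, and raw_train_texts/raw_test_texts stand for the raw text split from before the TF-IDF step. I'm not sure how TF-IDF weighting is supposed to fit into that pattern, which is part of what I'm asking.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sketch of the usual integer-sequence input pipeline for an Embedding layer
tokenizer = Tokenizer(num_words=10000)        # keep the 10,000 most frequent tokens
tokenizer.fit_on_texts(raw_train_texts)       # raw text, i.e. the split before TF-IDF
seq_train = pad_sequences(tokenizer.texts_to_sequences(raw_train_texts),
                          padding='post', maxlen=1000)
seq_test = pad_sequences(tokenizer.texts_to_sequences(raw_test_texts),
                         padding='post', maxlen=1000)
# seq_train/seq_test hold integer token ids, which is what Embedding(10000, ...) consumes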
