I'm trying to train and test my decision tree classifier, and I'm still new to decision trees. I have 150 rows with two columns in my CSV file, and I'm trying to split them into 100 rows for training and 50 for testing. I've tried using scikit-learn, but I still don't understand how to do the split.
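from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score

classifier = DecisionTreeClassifier(random_state=17)
classifier.fit(train_x, train_Y)      # train on the training split
pred_y = classifier.predict(test_x)   # predict labels for the test split
print(classification_report(test_Y, pred_y))
print(accuracy_score(test_Y, pred_y))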
Can anyone help me with how to do it? I appreciate any help.
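You need to perform a train-test split. Since you have 150 samples in total and 50 of them should go into your test set, you can set the test size as an integer equal to 50 (an integer `test_size` is interpreted as an absolute number of samples, whereas a float is interpreted as a fraction).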
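A minimal sketch of the whole workflow using scikit-learn's `train_test_split`. Because the real CSV isn't available, it builds a hypothetical stand-in DataFrame with 150 rows, assuming the two columns are one feature (`feature`) and one label (`label`); both column names are made up for illustration. With your file you would load it with `pd.read_csv` instead and pick your own column names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score

# Hypothetical stand-in for the CSV: 150 rows, one feature column and one label column.
# With a real file you would use something like: df = pd.read_csv("your_file.csv")
rng = np.random.default_rng(0)
feature = rng.normal(size=150)
df = pd.DataFrame({"feature": feature, "label": (feature > 0).astype(int)})

X = df[["feature"]]  # double brackets keep X two-dimensional, as scikit-learn expects
y = df["label"]      # one-dimensional target vector

# An integer test_size is an absolute count: 50 test samples, the remaining 100 for training
train_x, test_x, train_Y, test_Y = train_test_split(X, y, test_size=50, random_state=17)

classifier = DecisionTreeClassifier(random_state=17)
classifier.fit(train_x, train_Y)
pred_y = classifier.predict(test_x)

print(classification_report(test_Y, pred_y))
print(accuracy_score(test_Y, pred_y))
```

The variable names match the ones in the question, so the split line can be dropped in front of the existing training code.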
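You might want to set the `random_state` for reproducibility, so that you get the same split every time you run the code. Generally, it's also good advice to leave `shuffle=True` activated. If your data is time-correlated, deactivate it to prevent data leakage, i.e. training on samples that come after the ones you test on. You can find detailed examples in this book.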
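A small sketch of both options on hypothetical data, where the row index stands in for a time order. With the same `random_state`, two shuffled splits are identical; with `shuffle=False`, the split keeps the original order, so the first 100 rows go to training and the last 50 to testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 150 hypothetical samples; the row index doubles as a "time" order here
X = np.arange(150).reshape(-1, 1)
y = np.arange(150) % 2

# Same random_state -> identical shuffled split on every run
a_train, a_test, _, _ = train_test_split(X, y, test_size=50, random_state=17)
b_train, b_test, _, _ = train_test_split(X, y, test_size=50, random_state=17)
print(np.array_equal(a_test, b_test))  # True

# shuffle=False keeps the original order: first 100 rows train, last 50 rows test,
# so the model never trains on samples that come after the test period
t_train, t_test, _, _ = train_test_split(X, y, test_size=50, shuffle=False)
print(t_train[-1, 0], t_test[0, 0])  # 99 100
```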