I would like to implement the classification part of the algorithm described in the paper. I have a single J48 (C4.5) decision tree (code below). I would like to run it several (I_max) times over the dataset and compute C*, the class-membership probabilities for the whole ensemble, as described on page 8 of the paper.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in 0.20
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Abalone data: column 0 is the class label (sex), columns 1-7 are numeric features
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
c = pd.read_csv(url, header=None)
X = c.values[:, 1:8].astype(float)
Y = c.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

# Single J48-style tree (C4.5 splits on information gain, i.e. entropy)
clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)
probs = clf_entropy.predict_proba(X_test)
probs
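As a starting point, here is a minimal sketch of the ensemble-averaging step: train I_max trees on bootstrap resamples and average their predict_proba outputs to get C*. Note the assumptions: this is bagging-style diversity, not the full DECORATE artificial-example generation from the paper, the value I_max = 10 is arbitrary, and synthetic data stands in for the abalone set so the snippet is self-contained.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the abalone data, so the sketch runs offline.
rng = np.random.RandomState(100)
X = rng.randn(500, 4)
y = np.where(X[:, 0] + X[:, 1] > 0, "A", "B")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=100)

I_max = 10                    # assumed ensemble size
classes = np.unique(y_train)  # global class order for the ensemble
prob_sum = np.zeros((len(X_test), len(classes)))

for i in range(I_max):
    # Bootstrap resample so each tree sees a different view of the data
    # (DECORATE itself would instead add artificial training examples).
    idx = rng.randint(0, len(X_train), len(X_train))
    clf = DecisionTreeClassifier(criterion="entropy", random_state=i,
                                 max_depth=3, min_samples_leaf=5)
    clf.fit(X_train[idx], y_train[idx])
    proba = clf.predict_proba(X_test)
    # Align each tree's own class order (clf.classes_) with the global order
    # before summing, in case a resample is missing a class.
    for j, cls in enumerate(clf.classes_):
        col = np.where(classes == cls)[0][0]
        prob_sum[:, col] += proba[:, j]

C_star = prob_sum / I_max     # averaged class-membership probabilities
y_pred = classes[np.argmax(C_star, axis=1)]
```

Each row of C_star sums to 1, since every tree's probability row does; the ensemble prediction is the argmax over the averaged probabilities.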
Here is my implementation of DECORATE based on the algorithm proposed in the paper. Feel free to improve it.