sklearn: Is it possible to create a regression tree that splits on discrete data?


I have a data set with multiple features. One of the features can take 10 possible discrete values. When generating a regression tree using sklearn, how can I get the tree to split a node on one of the discrete values rather than on a continuous range? For example, suppose the feature X can take the values 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. Currently, the graph of the generated regression tree shows a split at X < 0.25. Is it possible to modify my code so that a split can only be made on one of the discrete values above?

I thought turning the numerical data into categorical data would help the tree split discretely, but apparently sklearn's decision trees cannot use categorical data directly.

Thank you for reading this question

There are 1 best solutions below

Muhammed Yunus On

This SO question has some answers that look useful: "sklearn tree treats categorical variable as float during splits, how should I solve this?"

I think the basic idea is that you either one-hot encode the categorical variable (that post has some example code), or you use an estimator that natively supports categorical features, such as sklearn.ensemble.HistGradientBoostingRegressor (through its categorical_features parameter).