I am trying to define a class called XGBExtended that extends xgboost.XGBClassifier, the scikit-learn API for xgboost, and I am running into an issue with the get_params method. The IPython session below illustrates it. get_params seems to return only the attributes I define in XGBExtended.__init__; the attributes set by the parent initializer (xgboost.XGBClassifier.__init__) are ignored. I am using IPython on Python 2.7. Full system specs are at the bottom.
In [182]: import xgboost as xgb
     ...:
     ...: class XGBExtended(xgb.XGBClassifier):
     ...:     def __init__(self, foo):
     ...:         super(XGBExtended, self).__init__()
     ...:         self.foo = foo
     ...:
     ...: clf = XGBExtended(foo = 1)
     ...:
     ...: clf.get_params()
     ...:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-182-431c4c3f334b> in <module>()
8 clf = XGBExtended(foo = 1)
9
---> 10 clf.get_params()
/Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.pyc in get_params(self, deep)
188 if isinstance(self.kwargs, dict): # if kwargs is a dict, update params accordingly
189 params.update(self.kwargs)
--> 190 if params['missing'] is np.nan:
191 params['missing'] = None # sklearn doesn't handle nan. see #4725
192 if not params.get('eval_metric', True):
KeyError: 'missing'
So I've hit an error because 'missing' is not a key in the params dict within the XGBClassifier.get_params method. I enter the debugger to poke around:
In [183]: %debug
> /Users/andrewhannigan/lib/xgboost/python-package/xgboost/sklearn.py(190)get_params()
188 if isinstance(self.kwargs, dict): # if kwargs is a dict, update params accordingly
189 params.update(self.kwargs)
--> 190 if params['missing'] is np.nan:
191 params['missing'] = None # sklearn doesn't handle nan. see #4725
192 if not params.get('eval_metric', True):
ipdb> params
{'foo': 1}
ipdb> self.__dict__
{'n_jobs': 1, 'seed': None, 'silent': True, 'missing': nan, 'nthread': None, 'min_child_weight': 1, 'random_state': 0, 'kwargs': {}, 'objective': 'binary:logistic', 'foo': 1, 'max_depth': 3, 'reg_alpha': 0, 'colsample_bylevel': 1, 'scale_pos_weight': 1, '_Booster': None, 'learning_rate': 0.1, 'max_delta_step': 0, 'base_score': 0.5, 'n_estimators': 100, 'booster': 'gbtree', 'colsample_bytree': 1, 'subsample': 1, 'reg_lambda': 1, 'gamma': 0}
ipdb>
As you can see, params contains only foo, yet the object itself holds every attribute set by xgboost.XGBClassifier.__init__. For some reason BaseEstimator.get_params, which is called from xgboost.XGBClassifier.get_params, picks up only the parameters declared explicitly in XGBExtended.__init__. Calling it directly with deep=True does not help either:
ipdb> super(XGBModel, self).get_params(deep=True)
{'foo': 1}
ipdb>
Can anyone tell why this is happening?
System specs:
In [186]: print IPython.sys_info()
{'commit_hash': u'1149d1700',
'commit_source': 'installation',
'default_encoding': 'UTF-8',
'ipython_path': '/Users/andrewhannigan/virtualenvironment/nimble_ai/lib/python2.7/site-packages/IPython',
'ipython_version': '5.4.1',
'os_name': 'posix',
'platform': 'Darwin-14.5.0-x86_64-i386-64bit',
'sys_executable': '/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python',
'sys_platform': 'darwin',
'sys_version': '2.7.10 (default, Jul 3 2015, 12:05:53) \n[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]'}
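The problem here is how the child class is declared. scikit-learn's BaseEstimator.get_params does not read self.__dict__; it inspects the signature of the class's own __init__ and returns only the arguments named there. Because XGBExtended.__init__ declares only foo, it overrides the parent's signature, so max_depth, learning_rate, missing and the rest are invisible to get_params, even though super(XGBExtended, self).__init__() has set them as attributes with their default values. XGBClassifier.get_params then looks up params['missing'] and raises the KeyError.
To fix it, declare every parent parameter you want exposed as an explicit keyword argument of XGBExtended.__init__, with the same defaults as the parent, and pass each one on to super(XGBExtended, self).__init__. After that, get_params returns the full set and you will not get any error.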
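To make the mechanism concrete, here is a minimal sketch that uses neither xgboost nor scikit-learn. MiniEstimator, Parent, BrokenChild and FixedChild are hypothetical names invented for this illustration: MiniEstimator only imitates the way scikit-learn's BaseEstimator derives parameter names from the __init__ signature, and Parent stands in for XGBClassifier with just three of its parameters. It is written for Python 3, since inspect.signature is not available in Python 2.7.

```python
import inspect


class MiniEstimator(object):
    """Hypothetical stand-in for sklearn.base.BaseEstimator (not the real class)."""

    @classmethod
    def _get_param_names(cls):
        # Only the signature of the most-derived __init__ is inspected;
        # attributes set by a parent's __init__ are never consulted.
        signature = inspect.signature(cls.__init__)
        return sorted(
            name
            for name, param in signature.parameters.items()
            if name != "self" and param.kind != param.VAR_KEYWORD
        )

    def get_params(self):
        return {name: getattr(self, name) for name in self._get_param_names()}


class Parent(MiniEstimator):
    """Plays the role of XGBClassifier in this sketch."""

    def __init__(self, max_depth=3, learning_rate=0.1, missing=None):
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.missing = missing


class BrokenChild(Parent):
    # Same shape as XGBExtended in the question: only foo is declared.
    def __init__(self, foo):
        super(BrokenChild, self).__init__()
        self.foo = foo


class FixedChild(Parent):
    # Parent parameters are re-declared and forwarded explicitly.
    def __init__(self, foo, max_depth=3, learning_rate=0.1, missing=None):
        super(FixedChild, self).__init__(
            max_depth=max_depth, learning_rate=learning_rate, missing=missing
        )
        self.foo = foo


broken_params = BrokenChild(foo=1).get_params()
fixed_params = FixedChild(foo=1, max_depth=5).get_params()

print(broken_params)  # {'foo': 1}
print(fixed_params)   # {'foo': 1, 'learning_rate': 0.1, 'max_depth': 5, 'missing': None}
```

With the real classes the same pattern should apply: list the XGBClassifier parameters you need in XGBExtended.__init__ and forward them to super(XGBExtended, self).__init__(...). The exact parameter list depends on your xgboost version, so check the signature of XGBClassifier.__init__ in your installation.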