Can scipy.stats.bootstrap be used to compute confidence intervals for feature weights in regression or classification tasks?


I am interested in computing confidence intervals for my feature weights using a bootstrap approach. Is scipy.stats.bootstrap able to do this? Consider this classification task as an example (the same idea applies to regression tasks). We can access clf.coef_, which returns a vector of feature weights.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
coefficients = clf.coef_

The idea would be to draw samples (with replacement) n times from X and y, fit the classifier on each resample, collect the coefficients, and finally compute confidence intervals from the coefficients across all resampling trials.

Answer by Matt Haberland:

Yes, in the sense that bootstrap supports vector-valued statistics. For instance, this is valid code:

import numpy as np
from scipy import stats
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1.1, -1.1], [-2.2, -1.2], [-3.3, -2.3],
              [1.4, 1.4], [2.5, 1.5], [3.6, 2.6]])
y = np.array([1, 1, 1, 2, 2, 2])

def f(*samples):
    # each feature and the target is passed as a separate array,
    # so split them up again
    samples = np.asarray(samples)
    X = samples[:-1].T
    y = samples[-1]
    # confirm that observations stayed together/resamples make sense
    # print(X, y)
    clf = LinearDiscriminantAnalysis()
    clf.fit(X, y)
    return clf.coef_  # returning multiple values is OK
    
# pass the features and target as three separate samples
samples = (X[:, 0], X[:, 1], y)
res = stats.bootstrap(samples, statistic=f, paired=True)

LinearDiscriminantAnalysis seems to have trouble with some of the resamples, but you can see that the code is valid by replacing the clf lines with something like return X[:, 0].mean(), X[:, 1].var(); that is, computing bootstrap confidence intervals of the mean of the first feature and the variance of the second feature at the same time. Importantly, because paired=True, different features of the same observations stay paired, and of course the statistic can depend on the first and second features at the same time, as in your example.
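To illustrate, here is a runnable variant using that mean/variance statistic, showing how to read the resulting intervals from res.confidence_interval (its low and high attributes have one entry per value the statistic returns); the reduced n_resamples is just an illustrative choice to keep it quick:

```python
import numpy as np
from scipy import stats

X = np.array([[-1.1, -1.1], [-2.2, -1.2], [-3.3, -2.3],
              [1.4, 1.4], [2.5, 1.5], [3.6, 2.6]])
y = np.array([1, 1, 1, 2, 2, 2])

def f(*samples):
    # each feature and the target arrives as a separate 1-D array
    samples = np.asarray(samples)
    X = samples[:-1].T
    # a vector-valued statistic that never fails on resamples:
    # mean of the first feature and variance of the second
    return X[:, 0].mean(), X[:, 1].var()

samples = (X[:, 0], X[:, 1], y)
res = stats.bootstrap(samples, statistic=f, paired=True, n_resamples=999)

ci = res.confidence_interval
# ci.low and ci.high are arrays: element 0 bounds the mean of the
# first feature, element 1 bounds the variance of the second
```

The same access pattern applies when the statistic returns clf.coef_: each coefficient gets its own entry in ci.low and ci.high.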