Feature selection: minimum redundancy, maximum relevance (mRMR) using mutual information and scikit-learn


I've been trying to implement a minimum redundancy, maximum relevance (mRMR) strategy for feature selection using mutual information.
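Concretely, the score I compute for each feature is what I understand to be the "difference" form of the criterion, relevance minus average redundancy:

score(x_i) = I(x_i; y) − (1/(p−1)) · Σ_{j≠i} I(x_i; x_j)

where I(·;·) is the mutual information estimated by mutual_info_regression and p is the number of input features.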

import numpy as np
from sklearn.feature_selection import mutual_info_regression


def mrmr(X_train, y_train):
    X_train_copy = X_train.copy()
    y_train_copy = y_train.copy()

    # relevance: mutual information of each input feature with the continuous target
    relevancies = mutual_info_regression(X_train_copy, y_train_copy)

    redundancies = []
    for column in X_train_copy.columns:
        # redundancy: average mutual information of feature "column"
        # with all the other input features
        target = X_train_copy.loc[:, column]
        others = X_train_copy.drop(columns=column)
        redundancy = mutual_info_regression(others, target)
        redundancies.append(redundancy.sum() / others.shape[1])

    # score = relevance - average redundancy
    scores = relevancies - np.abs(redundancies)

    # sort features by decreasing score
    idx_sorted = np.argsort(scores)[::-1]
    sorted_scores = scores[idx_sorted]
    sorted_columns = X_train.columns[idx_sorted].values

    return sorted_scores, sorted_columns

However, when I plot the result, I get negative scores. Does that make sense?

import pandas as pd

scores, columns = mrmr(X_train, y_train)

scores_df = pd.Series(scores, index=columns)
scores_df.plot.bar(figsize=(20, 5))
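In case it helps to reproduce, here is a minimal self-contained setup. The data below is synthetic, generated with make_regression, not my actual dataset; the low effective rank is just there to make the input features correlated so that redundancy between them is non-trivial.

import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for my data: effective_rank=3 produces an approximately
# low-rank (hence correlated) feature matrix.
X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       effective_rank=3, noise=1.0, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
y = pd.Series(y, name="target")

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores, columns = mrmr(X_train, y_train)
print(pd.Series(scores, index=columns))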

(bar plot of the resulting scores; several bars are negative)
