How to create a DataFrame of feature importances from an XGBClassifier tuned with GridSearchCV?

AI悦创 (original post), August 8, 2022

I use scikit-learn's GridSearchCV to find the best parameters for my XGBClassifier model, with code like the following:

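import xgboost as xgb
from sklearn.model_selection import GridSearchCV

grid_params = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}

est = xgb.XGBClassifier()  # the class is XGBClassifier, not Classifier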
grid_xgb = GridSearchCV(param_grid = grid_params,
                        estimator = est,
                        scoring = 'roc_auc',
                        cv = 4,
                        verbose = 0)
grid_xgb.fit(X_train, y_train)

print('best estimator:', grid_xgb.best_estimator_)
print('best AUC:', grid_xgb.best_score_)
print('best parameters:', grid_xgb.best_params_)

I need a DataFrame that lists each of my variables together with its importance, something like this:

variable | importance
---------|-------
x1       | 12.456
x2       | 3.4509
x3       | 1.4456
...      | ...

How can I build this DataFrame from the XGBClassifier found by GridSearchCV?

I tried the following:

f_imp_xgb = grid_xgb.get_booster().get_score(importance_type='gain')
keys = list(f_imp_xgb.keys())
values = list(f_imp_xgb.values())

df_f_imp_xgb = pd.DataFrame(data = values, index = keys, columns = ['score']).sort_values(by='score', ascending = False)

But I get this error:

AttributeError: 'GridSearchCV' object has no attribute 'get_booster'

What can I do?


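`GridSearchCV` itself has no `get_booster()` method. After fitting (with the default `refit=True`), it stores the best model, refitted on the full training data, in its `best_estimator_` attribute, so call `get_booster()` on that object instead (`clf` here is the fitted `GridSearchCV` object, called `grid_xgb` in the question):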

clf.best_estimator_.get_booster().get_score(importance_type='gain')

Example:

import pandas as pd
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
np.random.seed(42)

# generate some dummy data
df = pd.DataFrame(data=np.random.normal(loc=0, scale=1, size=(100, 3)), columns=['x1', 'x2', 'x3'])
df['y'] = np.where(df.mean(axis=1) > 0, 1, 0)

# find the best model
X = df.drop(labels=['y'], axis=1)
y = df['y']

parameters = {
    'n_estimators': [100, 500, 1000],
    'subsample': [0.01, 0.05]
}

clf = GridSearchCV(
    param_grid=parameters,
    estimator=XGBClassifier(random_state=42),
    scoring='roc_auc',
    cv=4,
    verbose=0
)

clf.fit(X, y)

# get the feature importances
importances = clf.best_estimator_.get_booster().get_score(importance_type='gain')
importances = pd.DataFrame(importances, index=[0]).transpose().rename(columns={0: 'importance'})
print(importances)
#     importance
# x1    1.782590
# x2    1.420949
# x3    1.500457
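To get exactly the `variable | importance` layout from the question, the `get_score()` dict can be converted with a small helper. This is a minimal sketch; `importance_frame` is a hypothetical name, not part of XGBoost or pandas. It reindexes on the full list of feature names because `get_score()` leaves out features that were never used in any split, and it fills those with 0. The numbers in `demo_scores` are made up for illustration.

```python
import pandas as pd


def importance_frame(scores, feature_names):
    """Hypothetical helper: turn a Booster.get_score()-style dict into a
    two-column DataFrame sorted by importance (highest first).

    Features missing from `scores` (never used in a split) get 0.0.
    """
    s = pd.Series(scores, dtype=float).reindex(feature_names, fill_value=0.0)
    return (
        s.rename_axis('variable')
        .reset_index(name='importance')
        .sort_values('importance', ascending=False, ignore_index=True)
    )


# Made-up scores for illustration; 'x4' is absent on purpose,
# as if it had never been used in a split.
demo_scores = {'x1': 12.456, 'x2': 3.4509, 'x3': 1.4456}
print(importance_frame(demo_scores, ['x1', 'x2', 'x3', 'x4']))
```

With the model from the example, and assuming it was fit on a DataFrame so that the booster's keys are the column names (otherwise they default to `f0`, `f1`, ...), the call would be `importance_frame(clf.best_estimator_.get_booster().get_score(importance_type='gain'), X.columns)`.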
