eli5 show_prediction not showing probability

0 votes

I'm using the show_prediction function in the eli5 package to understand how my XGBoost classifier arrived at a prediction. For some reason I seem to be getting a regression score instead of a probability for my model.

Below is a fully reproducible example with a public dataset.

from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from eli5 import show_prediction

# Load dataset
data = load_breast_cancer()

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']


# Split the data
train, test, train_labels, test_labels = train_test_split(
    features,
    labels,
    test_size=0.33,
    random_state=42
)

# Define the model
xgb_model = XGBClassifier(
    n_jobs=16,
    eval_metric='auc'
)

# Train the model
xgb_model.fit(
    train,
    train_labels
)

show_prediction(xgb_model.get_booster(), test[0], show_feature_values=True, feature_names=feature_names)

This gives me the following result. Note the score of 3.7, which is definitely not a probability.

[Screenshot: eli5 show_prediction output reporting a score of 3.7]
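
For context, a raw score like this usually sits on the log-odds scale rather than the probability scale. Below is a minimal sketch of how such a margin maps to a probability through the logistic function. It assumes the model uses a binary logistic objective, which the post does not state, and the helper name `margin_to_probability` is hypothetical.

```python
import math

def margin_to_probability(margin):
    """Map a raw log-odds margin to a probability with the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-margin))

# 3.7 is the score reported in the eli5 output above.
# Assumption: it is a log-odds margin from a binary logistic objective.
probability = margin_to_probability(3.7)
print(round(probability, 3))
```

Under that assumption, a score of 3.7 corresponds to a probability of roughly 0.976.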

The official eli5 documentation correctly shows a probability though.

[Screenshot: example from the eli5 documentation showing a probability]

The missing probability seems to be related to my use of xgb_model.get_booster(). Looks like the official documentation doesn't use that and passes the model as-is instead, but when I do that I get TypeError: 'str' object is not callable, so that doesn't seem to be an option.

I'm also concerned that eli5 is not explaining the prediction by traversing the xgboost trees. It appears that the "score" I'm getting is actually just a sum of all the feature contributions, like I would expect if eli5 wasn't actually traversing the tree but fitting a linear model instead. Is that true? How can I also make eli5 traverse the tree?
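
On the tree-traversal question, the eli5 documentation describes its XGBoost explanations as following the decision paths in the trees: each feature is credited with the change in node score from parent to child, and those contributions sum to the output score. So an additive score does not by itself point to a linear model. Below is a toy sketch of that idea on a single hand-built tree. The tree layout, node values, and function name are all hypothetical and are not eli5's or XGBoost's internals.

```python
# Toy tree: every node stores the expected score at that node ("value").
# Internal nodes also store the feature index and threshold used to split.
tree = {
    "value": 0.5, "feature": 0, "threshold": 2.0,
    "left": {"value": -1.0},
    "right": {
        "value": 1.5, "feature": 1, "threshold": 0.5,
        "left": {"value": 1.0},
        "right": {"value": 3.0},
    },
}

def path_contributions(node, x):
    """Walk the decision path for x and credit each score change to the split feature."""
    contributions = {"<BIAS>": node["value"]}  # the root score acts as the bias term
    while "feature" in node:
        feature = node["feature"]
        child = node["left"] if x[feature] < node["threshold"] else node["right"]
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return contributions, node["value"]

contribs, leaf_value = path_contributions(tree, [3.0, 0.7])
print(contribs)
print(sum(contribs.values()), leaf_value)
```

The contributions are obtained purely by traversing the tree, yet they add up to the leaf score, which matches the "sum of contributions" shape seen in the eli5 output.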

Apr 5 in Machine Learning by Dev
• 6,000 points
37 views

1 answer to this question.

0 votes

I was able to solve my own issue. According to this GitHub issue, eli5 only supports older versions of XGBoost (<=0.6). I was using XGBoost 0.80 and eli5 0.8 at the time.
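
To confirm which versions are installed before trying a workaround, something like the following sketch may help. It relies on the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the package names are assumed to match the PyPI distribution names `xgboost` and `eli5`.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Assumption: the distributions are published as "xgboost" and "eli5".
for name in ("xgboost", "eli5"):
    print(name, installed_version(name))
```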

Here is the workaround from that issue, which replaces eli5's internal _check_booster_args function with a patched version:

import eli5
from xgboost import XGBClassifier, XGBRegressor

def _check_booster_args(xgb, is_regression=None):
    # type: (Any, bool) -> Tuple[Booster, bool]
    if isinstance(xgb, eli5.xgboost.Booster):  # patched: was isinstance(xgb, Booster)
        booster = xgb
    else:
        # patched: was xgb.booster(), but `booster` is now a string attribute, not a method
        booster = xgb.get_booster()
        _is_regression = isinstance(xgb, XGBRegressor)
        if is_regression is not None and is_regression != _is_regression:
            raise ValueError(
                'Inconsistent is_regression={} passed. '
                'You don\'t have to pass it when using scikit-learn API'
                .format(is_regression))
        is_regression = _is_regression
    return booster, is_regression

eli5.xgboost._check_booster_args = _check_booster_args

Then replace the last line of the code snippet in my question with:

show_prediction(xgb_model, test[0], show_feature_values=True, feature_names=feature_names)

With this patch, my issue was resolved.

answered Apr 7 by Nandini
• 5,480 points
