I have seen a similar implementation in R (xgboostExplainer, on CRAN). The main difference is that contributions are expressed in log-odds of probability.

I’m curious about your thoughts on using log-odds, which has the advantage of giving contributions a “Bayesian interpretation”. However, it seems that it is not possible to maintain both additivity properties: [1] the contribution of feature F is equal to the mean of the contributions of feature F over all decision trees; [2] the prediction score is equal to the sum of all feature contributions and equal to the mean of the prediction scores of all decision trees.
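To make the concern concrete, here is a minimal sketch with made-up numbers (the per-tree biases and contributions are hypothetical, not output from any library), assuming a forest that averages per-tree probabilities and a decomposition of the form prediction = bias + sum of contributions. Both properties hold in probability space, but the log-odds of the averaged prediction differs from the average of the per-tree log-odds, because the logit is not linear.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return np.log(p / (1.0 - p))

# Hypothetical decomposition of one sample's prediction for 3 trees and
# 2 features, in probability space: prediction = bias + sum(contributions).
tree_bias = np.array([0.50, 0.40, 0.60])
tree_contribs = np.array([
    [0.40, 0.05],    # tree 1: contributions of features F1, F2
    [0.10, -0.20],   # tree 2
    [-0.30, 0.10],   # tree 3
])
tree_pred = tree_bias + tree_contribs.sum(axis=1)

# Property [1]: forest-level contribution = mean of per-tree contributions.
forest_bias = tree_bias.mean()
forest_contribs = tree_contribs.mean(axis=0)

# Property [2]: forest prediction = mean of per-tree predictions
#               = bias + sum of forest-level contributions.
forest_pred = tree_pred.mean()
additive_in_prob = np.isclose(forest_bias + forest_contribs.sum(), forest_pred)

# In log-odds space the averaging step no longer commutes with the transform.
logodds_of_mean = logit(forest_pred)
mean_of_logodds = logit(tree_pred).mean()
additive_in_logodds = np.isclose(logodds_of_mean, mean_of_logodds)

print("additive in probability space:", additive_in_prob)
print("additive in log-odds space:   ", additive_in_logodds)
```

With these numbers the first check passes and the second fails, so at least one of [1] or [2] has to be given up if contributions are reported in log-odds for an averaging ensemble.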

Any thoughts on using log-odds for contributions?

The calculation of all features could be too time-consuming.

Are you aware of any research paper on this computation of “contribution by averaging decision paths over trees”? All similar implementations in R or Python that I have found trace back to this blog post.

Were you directly or indirectly inspired by a paper, or is it an original “contribution” of your own?

Many thanks,
