Question though… Quoting this:

“For the decision tree, the contribution of each feature is not a single predetermined value, but depends on the rest of the feature vector which determines the decision path that traverses the tree and thus the guards/contributions that are passed along the way”

If I take the mean of each feature’s contributions over all the training data in my decision tree model, and then just use the linear regression f(x) = a + bx (where a is the mean bias and b is now the mean contribution) to do predictions for incoming data, do you think this will work?
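One way to sanity-check that idea is a toy tree where a feature’s contribution depends on the path taken (all numbers below are invented for illustration, not from the article’s model). Averaging the contributions over the training data collapses exactly that path dependence:

```python
# Toy depth-2 regression tree: the contribution of x2 depends on which
# branch x1 selects, so it is not a single predetermined value.
def tree_predict(x1, x2):
    # bias (root mean) = 10
    if x1 < 5:                       # left branch: x2 contributes +2 or -2
        return 10 - 4 + (2 if x2 < 3 else -2)
    else:                            # right branch: x2 contributes +6 or -6
        return 10 + 4 + (6 if x2 < 3 else -6)

def contributions(x1, x2):
    # per-sample (bias, c_x1, c_x2) with prediction = bias + c_x1 + c_x2
    c1 = -4 if x1 < 5 else 4
    c2 = (2 if x2 < 3 else -2) if x1 < 5 else (6 if x2 < 3 else -6)
    return 10, c1, c2

# "training" data covering all four paths
data = [(1, 1), (1, 4), (7, 1), (7, 4)]
mean_c1 = sum(contributions(a, b)[1] for a, b in data) / len(data)
mean_c2 = sum(contributions(a, b)[2] for a, b in data) / len(data)

# linearised model: constant bias plus the averaged contributions
linear_pred = 10 + mean_c1 + mean_c2
print(linear_pred, tree_predict(7, 1))
```

In this symmetric example the mean contributions cancel to zero, so the “linearised” model predicts the bias (10) for every input, while the tree itself outputs anywhere from 6 to 20 — the averaging throws away precisely the interaction structure the quoted passage describes.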

An option is to generate a lot of different transformations (log, square, sqrt) and then apply lasso to see which (transformed) features come out on top.
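A self-contained sketch of that workflow, with a hand-rolled coordinate-descent lasso so it needs no external libraries; the data, penalty strength, and feature names are all invented for illustration. The target is exactly log(x), so the log-transformed feature should come out on top:

```python
import math

xs = list(range(1, 21))
raw = {
    "x":       [float(v) for v in xs],
    "x^2":     [float(v) ** 2 for v in xs],
    "sqrt(x)": [math.sqrt(v) for v in xs],
    "log(x)":  [math.log(v) for v in xs],
}

def standardize(col):
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
    return [(v - mu) / sd for v in col]

names = list(raw)
X = [standardize(raw[name]) for name in names]   # feature-major columns
y_raw = [math.log(v) for v in xs]
y_mu = sum(y_raw) / len(y_raw)
y = [v - y_mu for v in y_raw]                    # centered target

def soft(r, lam):
    # soft-thresholding operator used by lasso coordinate descent
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0

def lasso_cd(X, y, lam, iters=1000):
    # minimize (1/2n)||y - X beta||^2 + lam * ||beta||_1
    n, p = len(y), len(X)
    beta = [0.0] * p
    resid = y[:]
    for _ in range(iters):
        for j in range(p):
            # partial correlation of feature j with the residual
            rho = sum(X[j][i] * (resid[i] + X[j][i] * beta[j])
                      for i in range(n)) / n
            new = soft(rho, lam)      # unit-variance columns: no rescaling
            if new != beta[j]:
                diff = new - beta[j]
                for i in range(n):
                    resid[i] -= X[j][i] * diff
                beta[j] = new
    return beta

beta = lasso_cd(X, y, lam=0.1)
winner = max(names, key=lambda nm: abs(beta[names.index(nm)]))
print(winner, [round(b, 3) for b in beta])
```

Because x, sqrt(x), and log(x) are highly correlated on this range, the L1 penalty is what forces a choice between them — plain least squares would spread the weight across all of them.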

Thank you for such a great article.

Why don’t you just delete the column? Shuffling makes random changes, but if we have a particular variable x that can only take the values {0, 1, 2}, shuffling that feature’s column might not remove its impact 100%.
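That intuition can be checked directly (synthetic data below): when a balanced column takes only three values, a random permutation leaves roughly a third of the rows with their original value, so shuffling only partly breaks the feature–target link.

```python
import random

# Shuffle a column restricted to {0, 1, 2} many times and count how often
# a row ends up with the same value it started with.
random.seed(0)
col = [0, 1, 2] * 100          # balanced three-valued feature, n = 300
trials = 200
matches = 0
for _ in range(trials):
    shuffled = col[:]
    random.shuffle(shuffled)
    matches += sum(a == b for a, b in zip(col, shuffled))
rate = matches / (trials * len(col))
print(rate)   # ~1/3: about a third of rows still carry their true value
```

Dropping the column does remove the signal completely, but it requires refitting the model for every feature; permutation importance trades that exactness for speed, and this residual ~1/k overlap is one reason shuffled-importance drops can understate a low-cardinality feature’s full effect.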

Thanks for your great blog.

Mhd.

Except that maybe the typical RF variable importance calculation is performed (using training data, of course) only on the OOB samples of each individual tree, whereas your second approach basically uses all the samples. This technique is formally known as Mean Decrease Accuracy, or permutation importance:

https://stat.ethz.ch/education/semesters/ss2012/ams/slides/v10.2.pdf
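For concreteness, here is a minimal sketch of that OOB scheme under toy assumptions (the “tree” is a stand-in that predicts from feature 0 alone, and the data are random bits): each feature is permuted only on the out-of-bag rows of each bootstrap sample, and the accuracy drops are averaged.

```python
import random

random.seed(1)
n = 200
X = [[random.randint(0, 1), random.randint(0, 1)] for _ in range(n)]
y = [row[0] for row in X]        # label depends only on feature 0

def tree_predict(row):
    # stand-in "tree": predicts the label from feature 0 alone
    return row[0]

def oob_permutation_importance(feature, trees=25):
    drops = []
    for _ in range(trees):
        boot = [random.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]   # out-of-bag rows
        if not oob:
            continue
        base = sum(tree_predict(X[i]) == y[i] for i in oob) / len(oob)
        # permute the chosen feature on the OOB rows only
        perm = [X[i][feature] for i in oob]
        random.shuffle(perm)
        hits = 0
        for k, i in enumerate(oob):
            row = list(X[i])
            row[feature] = perm[k]
            hits += tree_predict(row) == y[i]
        drops.append(base - hits / len(oob))
    return sum(drops) / len(drops)

imp0 = oob_permutation_importance(0)   # informative feature: large drop
imp1 = oob_permutation_importance(1)   # noise feature: no drop at all
print(imp0, imp1)
```

Permuting the informative feature costs roughly half the OOB accuracy here (a shuffled binary column matches its original about half the time), while the noise feature’s importance is exactly zero — the same contrast the MDA definition in the linked slides formalizes.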

Best regards,
