I am using the H2O Python module (Random Forest). Do you know if I can use treeinterpreter with it?

If not, can you suggest how I might integrate it with H2O Python random forest predictions?

If you use bootstrap in your random forest (which is the default), then the bias indeed doesn't necessarily match the original training-set mean exactly, because the bootstrap sample for each tree has a slightly different mean (and in the end the trees are aggregated).

In other words, the bias is the mean of the “real” training set of each tree as it is trained in scikit-learn, which, because of bootstrapping, isn't necessarily the same as the mean of the original training set.

The given bias shouldn’t be adjusted; it is in fact the correct one for the given model.
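A minimal sketch of the point above, assuming scikit-learn's `RandomForestRegressor` and hypothetical toy data. For a regression tree, node 0 of `tree_.value` is the root, and its value is the (bootstrap-weighted) mean target of the samples that tree was trained on; averaging those root values over the trees gives the forest-level bias. The helper `forest_bias` is hypothetical, not part of treeinterpreter or scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy regression data, only for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + rng.normal(size=200)


def forest_bias(model):
    """Hypothetical helper: average of the root-node values over all trees.

    For a regressor, tree_.value has shape (node_count, n_outputs, 1), and
    node 0 is the root, holding the mean target of that tree's training sample.
    """
    return np.mean([est.tree_.value[0, 0, 0] for est in model.estimators_])


# bootstrap=True: each tree sees a resampled training set with its own mean.
with_bootstrap = RandomForestRegressor(
    n_estimators=50, bootstrap=True, random_state=0
).fit(X, y)

# bootstrap=False: every tree sees the full training set.
without_bootstrap = RandomForestRegressor(
    n_estimators=50, bootstrap=False, random_state=0
).fit(X, y)

print("training-set mean:    ", y.mean())
print("bias, bootstrap=True: ", forest_bias(with_bootstrap))
print("bias, bootstrap=False:", forest_bias(without_bootstrap))
```

With `bootstrap=True` the averaged bias drifts slightly away from `y.mean()`, while with `bootstrap=False` it matches. So if an exact match with the training mean is wanted, training without bootstrap is one option; adding a correction term to the bias by hand would instead break the identity prediction = bias + sum of contributions.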

I also found something that confuses me while applying treeinterpreter to regression:

prediction = bias + contribution(feature 1) + ... + contribution(feature n)
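The decomposition above can be sketched for a single scikit-learn `DecisionTreeRegressor`. This is a minimal re-implementation of the idea, not treeinterpreter's own code; the toy data and the `decompose` helper are hypothetical. The bias is the root node's value, and each step from a parent node to a child node along the decision path credits the change in node value to the feature the parent splits on, so the terms telescope to the leaf value, i.e. the prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy regression data, only for illustration.
rng = np.random.RandomState(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
node_values = tree.tree_.value[:, 0, 0]  # mean target at each node
node_feature = tree.tree_.feature        # split feature of each internal node


def decompose(tree, x):
    """Hypothetical helper: return (bias, contributions) for one sample x."""
    # decision_path returns a sparse indicator of the visited nodes; a child
    # always has a larger node id than its parent, so sorting gives path order.
    path = np.sort(tree.decision_path(x.reshape(1, -1)).indices)
    bias = node_values[path[0]]  # root node value, cf. values[paths[0][0]]
    contributions = np.zeros(x.shape[0])
    for parent, child in zip(path[:-1], path[1:]):
        # Credit the change in node value to the feature split on at the parent.
        contributions[node_feature[parent]] += node_values[child] - node_values[parent]
    return bias, contributions


x = X[0]
bias, contributions = decompose(tree, x)
prediction = tree.predict(x.reshape(1, -1))[0]
print(bias + contributions.sum(), prediction)
```

Because a single tree without bootstrap is trained on the full data, its bias here equals `y.mean()`; in a bootstrapped forest each tree's root value is instead the mean of its own bootstrap sample.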

The bias, known as the mean value of the training set, is calculated in treeinterpreter like this:

biases = np.full(X.shape[0], values[paths[0][0]])

However, in several trials, this bias was slightly different from the actual mean value of the training set.

Have you ever met this problem? Should I modify the calculation of the bias by hand, for example by adding some compensation such as biases = np.full(X.shape[0], values[paths[0][0]] + p), where p is whatever value makes the bias equal to the real mean of the training set?

Thank you in advance.

