I'm trying to run the code and replicate the results, but they are totally off. Has something changed in scikit-learn that makes this code generate errors?

I use Python 3.6.0 and scikit-learn 0.18.1.

The full code I run to replicate the plot is:

— BOF rf_boston.py —

from sklearn.ensemble import RandomForestRegressor
import numpy as np
from sklearn.datasets import load_boston
import pandas as pd
import matplotlib.pyplot as plt

boston = load_boston()
X = boston["data"]
Y = boston["target"]
size = len(boston["data"])
trainsize = 400
idx = list(range(size))

# shuffle the data
np.random.shuffle(idx)

rf = RandomForestRegressor(n_estimators=1000, min_samples_leaf=1)
rf.fit(X[idx[:trainsize]], Y[idx[:trainsize]])


def pred_ints(model, X, percentile=95):
    # For each sample, collect the prediction of every tree in the forest,
    # then take the lower/upper percentiles and the mean of those predictions.
    err_down = []
    err_up = []
    err_mean = []
    for x in range(len(X)):
        preds = []
        for pred in model.estimators_:
            preds.append(pred.predict(X[x].reshape(1, -1))[0])
        err_down.append(np.percentile(preds, (100 - percentile) / 2.))
        err_up.append(np.percentile(preds, 100 - (100 - percentile) / 2.))
        err_mean.append(np.mean(preds))
    return err_down, err_up, err_mean


err_down, err_up, err_mean = pred_ints(rf, X[idx[trainsize:]], percentile=95)
truth = Y[idx[trainsize:]]

# share of test points whose true value falls inside the interval
correct = 0.
for i, val in enumerate(truth):
    if err_down[i] <= val <= err_up[i]:
        correct += 1
share_correct = correct / len(truth)
print(share_correct)

# Store in DF
df = pd.DataFrame()
df['v'] = truth
df['p_d'] = err_down
df['p_u'] = err_up
df['p'] = err_mean

# Plot DF
a = df.sort_values(['v']).reset_index()
plt.scatter(a.index, a.v, color='green')
plt.errorbar(a.index, a.p, yerr=[a.p_d, a.p_u])

—– EOF —-

And an example of the plot I get can be seen here: https://dl.dropboxusercontent.com/u/20904939/rf_boston.png

Note that the prediction intervals are WAY bigger than in this original blog post. I've been scratching my head for a while about this, and any help would be appreciated!
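One possible cause, offered as a guess from reading the script rather than from running it: matplotlib's `errorbar` treats `yerr` as distances below and above each point, not as absolute lower and upper bounds. Passing `[a.p_d, a.p_u]` directly would then draw bars from `p - p_d` to `p + p_u`, which would look far too wide. The sketch below shows the conversion on made-up toy numbers; the helper name `interval_to_yerr` is hypothetical and not part of any library.

```python
import numpy as np


def interval_to_yerr(center, lower, upper):
    """Turn absolute interval bounds into the (2, N) offsets that errorbar's yerr uses.

    Hypothetical helper for illustration only.
    """
    center = np.asarray(center, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Row 0: distance from the center down to the lower bound.
    # Row 1: distance from the center up to the upper bound.
    return np.vstack([center - lower, upper - center])


# Toy values, invented for illustration (not taken from the Boston data).
p = np.array([20.0, 25.0, 30.0])    # mean prediction per sample
p_d = np.array([15.0, 22.0, 24.0])  # lower percentile per sample
p_u = np.array([27.0, 29.0, 41.0])  # upper percentile per sample

yerr = interval_to_yerr(p, p_d, p_u)
print(yerr)
```

With the DataFrame from the script, the plotting call would then become something like `plt.errorbar(a.index, a.p, yerr=interval_to_yerr(a.p, a.p_d, a.p_u))`. This assumes the mean lies between the two percentiles for every sample; if it does not, an offset would come out negative and would need separate handling.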

Congrats on the very clear and useful tutorial!

I have a question: Why did you normalize only the data for Linear Regression (normalize=True) and not for the other methods, including Lasso and RandomizedLasso that also have this parameter option?

Thank you in advance!

