A reproducible example to anchor the discussion:
import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import scale

boston = scale(load_boston().data)  # standardize each feature column
target = load_boston().target
alphas = np.linspace(1.0, 200.0, 5)  # candidate penalties; alphas[0] is 1.0
fit0 = RidgeCV(alphas=alphas, store_cv_values=True, gcv_mode='eigen').fit(boston, target)
fit0.alpha_  # the alpha selected by cross-validation
fit0.cv_values_[:,0]  # per-sample cross-validation values for alphas[0]
The question: what formula is used to compute fit0.cv_values_?
Edit:
@Abhinav Arora's answer below seems to suggest that fit0.cv_values_[:,0][0], the first entry of fit0.cv_values_[:,0], would be
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
where fit1 is a ridge regression with alpha = 1.0, fitted to the dataset from which observation 0 was removed.
Let's see:
1) Create a new dataset with the first row of the original dataset removed:
from sklearn.linear_model import Ridge
boston1 = np.delete(boston, (0), axis=0)
target1 = np.delete(target, (0), axis=0)
2) Fit a ridge model with alpha = 1.0 on this truncated dataset:
fit1 = Ridge(alpha=1.0).fit(boston1, target1)
3) Check the squared error of that model on the first data point:
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
This gives array([ 37.64650853]), which is not the same as the corresponding entry of fit0.cv_values_[:,0], namely
fit0.cv_values_[:,0][0]
which is 37.495629960571137.
What gives?
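One way to narrow this down is to compare the explicit refit against the closed-form leave-one-out identity for ridge regression, e_i / (1 - H_ii) with H = X (X'X + alpha*I)^-1 X'. The sketch below is only an illustration under stated assumptions: it uses synthetic data instead of the Boston set, fits no intercept, and the names ridge_coef, loo_shortcut and loo_refit are made up for this example. It does not claim to reproduce what RidgeCV does internally. A possible (unconfirmed) source of the gap above is the intercept: Ridge(alpha=1.0).fit(boston1, target1) re-estimates the intercept on the truncated data, whereas a shortcut applied to data centered with full-sample means would not.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the Boston dataset).
rng = np.random.default_rng(0)
n, p, alpha = 50, 4, 1.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)


def ridge_coef(X, y, alpha):
    """Ridge coefficients without an intercept: (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


# Shortcut: one fit on all rows, then rescale each residual by 1 - H_ii.
H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T)
residuals = y - H @ y
loo_shortcut = (residuals / (1.0 - np.diag(H))) ** 2

# Brute force: refit n times, each time leaving one row out.
loo_refit = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    w = ridge_coef(X[keep], y[keep], alpha)
    loo_refit[i] = (X[i] @ w - y[i]) ** 2

# Without an intercept the two computations agree up to floating-point error.
print(np.allclose(loo_shortcut, loo_refit))
```

If the two agree in the no-intercept case, repeating the manual check above with fit_intercept=False on both RidgeCV and Ridge would show whether the intercept accounts for the difference between 37.6465 and 37.4956.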