Sunday, 25 August 2013

Implementing linear regression with standardization

I am confused about how to implement linear regression with
standardization. Say I have a training set, trainX and trainY, and a test
set, testX and testY. For the training set, I compute the mean and standard
deviation of trainX and use them to transform trainX to zero mean and unit
standard deviation. I do the same for trainY. I then fit a ridge
regression on the standardized training data, using 10-fold cross-validation
to arrive at the optimal coefficients.
Now, when I use these coefficients on the test set, I need to standardize
both testX and testY using the mean and standard deviation computed from
the training set. I then apply the coefficients to predict Y. To recover
the actual Y values from these predictions, do I need to reverse the
transformation, that is, multiply by the training standard deviation of Y
and add back the training mean of Y? Is this the right approach?
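
The procedure described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not a definitive implementation: the data is synthetic, the helper names (fit_standardizer, ridge_fit, predict_original_scale) are hypothetical, and the penalty alpha is fixed at an arbitrary value instead of being chosen by 10-fold cross-validation. The point it shows is that predictions made on the standardized scale are mapped back by multiplying by the training standard deviation of Y and adding the training mean of Y.

```python
import numpy as np


def fit_standardizer(a):
    """Return the column-wise mean and standard deviation of a 2-D array."""
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return mean, std


def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients for centered data (no intercept term)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def predict_original_scale(X, coef, x_mean, x_std, y_mean, y_std):
    """Standardize X with TRAINING statistics, predict, then undo the Y scaling."""
    pred_standardized = ((X - x_mean) / x_std) @ coef
    return pred_standardized * y_std + y_mean


# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
trainX = rng.normal(5.0, 2.0, size=(200, 3))
trainY = trainX @ true_w + 10.0 + rng.normal(0.0, 0.1, size=200)
testX = rng.normal(5.0, 2.0, size=(50, 3))
testY = testX @ true_w + 10.0 + rng.normal(0.0, 0.1, size=50)

# All statistics come from the training set only.
x_mean, x_std = fit_standardizer(trainX)
y_mean, y_std = trainY.mean(), trainY.std()

# alpha=1.0 is an arbitrary assumption; in practice it would be tuned by CV.
coef = ridge_fit((trainX - x_mean) / x_std, (trainY - y_mean) / y_std, alpha=1.0)

predY = predict_original_scale(testX, coef, x_mean, x_std, y_mean, y_std)
rmse = np.sqrt(np.mean((predY - testY) ** 2))
print("test RMSE on the original scale:", rmse)
```

In this sketch testY is never standardized, because the predictions are compared with it on the original scale. Standardizing testY with the training statistics would only be needed to report the error on the standardized scale.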
