Linear Regression with Python (3.6).

K-fold cross-validation for tuning hyperparameters:

The goal of this section is to tune our hyperparameter, the order of the polynomial, using a simple grid search. Before doing so, observe the effect of fitting a high-order polynomial to our data.

As you may have noticed, the higher the order of the polynomial, the more closely our curve fits the training set; notice that at the same time our testing score drops. We are over-fitting to the training data and diverging from the testing data. Intuitively, we can deduce that we need a polynomial order that performs well on both the training and testing sets.
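The effect described above can be reproduced with a minimal sketch. The data here is synthetic (a noisy cubic, which is an assumption and not the data set used in this series), and `np.polyfit` stands in for the fitting code of the earlier sections. The training score cannot decrease as the order grows, while the testing score typically peaks and then falls off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): a cubic trend plus Gaussian noise
x = np.linspace(-3, 3, 60)
y = 0.5 * x**3 - x + rng.normal(scale=2.0, size=x.size)

# Hold out every third point as the testing set
test_mask = np.arange(x.size) % 3 == 0
x_train, y_train = x[~test_mask], y[~test_mask]
x_test, y_test = x[test_mask], y[test_mask]

def r2(y_true, y_pred):
    """Coefficient of determination, the same quantity returned by model.score."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

train_scores, test_scores = [], []
for order in range(1, 11):
    coeffs = np.polyfit(x_train, y_train, order)  # least-squares polynomial fit
    train_scores.append(r2(y_train, np.polyval(coeffs, x_train)))
    test_scores.append(r2(y_test, np.polyval(coeffs, x_test)))
    print('order={0:2d} | train R2={1:.3f} | test R2={2:.3f}'.format(
        order, train_scores[-1], test_scores[-1]))
```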

However, by doing so we are inadvertently training on the testing set as well, because we are tuning the hyperparameter based on our testing performance. This results in poor generalization when new data is introduced.

The solution is to create a validation set that sits between the training and testing sets. We can use this validation set to tune our hyperparameter, and keep the testing set for the final score only.
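One simple way to carve out such a validation set is to call `train_test_split` twice. This is only a sketch on placeholder data; the 60/20/20 proportions and the `random_state` values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): 100 samples with a single feature
x = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0

# First hold out 20% of the data as the testing set
x_rest, x_test, y_rest, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0)

# Then hold out 25% of the remainder (20% of the total) as the validation set
x_train, x_valid, y_train, y_valid = train_test_split(
    x_rest, y_rest, test_size=0.25, random_state=0)

print(len(x_train), len(x_valid), len(x_test))  # 60 20 20
```

The hyperparameter is then chosen by comparing validation scores, and the testing set is touched once at the very end.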

I’ve taken the liberty of converting our polynomial function into a class, as shown below, to keep things neat.

Polynomial Class
@author: Abdullah Alnuaimi
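class Poly():
    def __init__(self,a,k):
        # a: coefficients, k: exponents (one pair per polynomial term)
        self.V=None # feature matrix, filled in by fit()
        # A(x) returns one row per sample and one column per term a*x**k
        self.A=lambda x,a=a,k=k:[[c*n**p for c,p in zip(a,k)] for n in x]
    def fit(self,x):
        # store the term-by-term feature matrix for the samples in x
        self.V=self.A(x)
    def evaluate(self):
        assert self.V is not None, "No data to evaluate, please fit data using the fit method"
        # sum the terms of each row to get the polynomial value per sample
        return [sum(i) for i in self.V]
    def poly_features(self):
        assert self.V is not None, "No data to generate matrix, please fit data using the fit method"
        return self.V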
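For comparison, the matrix that the class builds (one row per sample, one column per term a*x**k) can also be produced with NumPy broadcasting. This is a sketch rather than part of the original class; the function name `term_matrix` and the sample values are hypothetical.

```python
import numpy as np

def term_matrix(x, a, k):
    """Return a matrix whose entry [i, j] is a[j] * x[i] ** k[j]."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    k = np.asarray(k)
    # x[:, None] has shape (n_samples, 1); broadcasting against k and a
    # yields shape (n_samples, n_terms)
    return a * x[:, None] ** k

x = [1.0, 2.0, 3.0]
a = [1.0, 2.0, 3.0]   # coefficients
k = [0, 1, 2]         # exponents

V = term_matrix(x, a, k)   # polynomial features
y = V.sum(axis=1)          # polynomial value 1 + 2x + 3x**2 per sample
print(V)
print(y)  # [ 6. 17. 34.]
```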



Since we are doing K-fold cross-validation, the data is split into K folds. Each fold takes a turn as the validation set while the remaining folds form the training set, and each of these training/validation splits is passed to our model. The K results are averaged into a single value that represents the performance of our model.
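The mechanics can be sketched with scikit-learn's `KFold` on a small synthetic data set (the data, `shuffle=True` and the `random_state` value are assumptions for illustration). Note that `kf.split` yields the training indices first and the validation indices second.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Synthetic data (assumption): a noisy straight line with 40 samples
x = np.linspace(0, 10, 40).reshape(-1, 1)
y = 3.0 * x.ravel() + 2.0 + rng.normal(scale=1.0, size=40)

kf = KFold(n_splits=4, shuffle=True, random_state=0)

fold_scores = []
fold_sizes = []
for train_idx, valid_idx in kf.split(x):
    # fit on the three training folds, score (R2) on the held-out fold
    model = LinearRegression().fit(x[train_idx], y[train_idx])
    fold_scores.append(model.score(x[valid_idx], y[valid_idx]))
    fold_sizes.append((len(train_idx), len(valid_idx)))

# a single number summarising the model's validation performance
cv_score = float(np.mean(fold_scores))
print(fold_sizes)
print('mean validation R2 = {0:.3f}'.format(cv_score))
```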


In this case the number of models corresponds to the highest polynomial order of our choice.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from Poly import Poly
# Steps that appear only as comments are omitted from this listing.
#%% Import Data
degree=8 # k=8
# Initialize the polynomial class and fit
# Get the polynomial features for 8th order model
# reshape the data
# split the model into test/train sets
#%% Split into Kfold and get the train/valid score.
# Create 4 folds
kf=KFold(n_splits=4)
# Initiate one regressor per polynomial order
models=[LinearRegression() for m in range(0,degree)]
valid_error=[] # variable to store the valid error
train_error=[] # variable to store the train error
# looping on every model.
for k,m in enumerate(models):
    # get poly features for current order [k] poly. model
    # function: fit on the indices t and get r2 score of model on the indices v.
    f=lambda t,v:m.fit(_x[t],_y[t]).score(_x[v],_y[v])
    # validation scores for each fold (kf.split yields the training indices first)
    v_score=[f(train,valid) for train,valid in kf.split(_x)]
    # training score for each fold
    t_score=[f(train,train) for train,_ in kf.split(_x)]
    # average valid and train error for all folds
# get best model poly. order
# fit model to training and validation data
# score model against testing data
print('Optimum polynomial order = {0}\n\
testing score = {1:.2f}'.format(k,k_score))
#%% plotting
# the fitted curve is plotted on ax with
# label='Polynomial model k={1} | R2={0:.2f}'.format(k_score,k)
ax.scatter(x_raw,y_raw,color='r',marker='.',label='Input data')
ax2.set_xlabel('Polynomial Order')
ax2.set_ylabel('Error')
ax2.plot(train_error,label='Train Error')
ax2.plot(valid_error,label='Valid Error')



As you can deduce from the output, the point where the validation error reaches a minimum corresponds to the ideal polynomial order. Furthermore, the effect of overfitting becomes apparent as the polynomial order increases.
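The whole procedure (grid search over the polynomial order, K-fold validation, then a single score on the testing set) can be sketched end to end as follows. This is not the listing from this post: the data is synthetic, the helper `poly_features` is a hypothetical stand-in for the `Poly` class, and defining the error as 1 - R2 is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(1)

# Synthetic data (assumption): a noisy cubic
x_raw = np.linspace(-3, 3, 120)
y_raw = 0.5 * x_raw**3 - x_raw + rng.normal(scale=1.0, size=x_raw.size)

def poly_features(x, order):
    """Columns x, x**2, ..., x**order; the intercept is left to LinearRegression."""
    return np.column_stack([x**p for p in range(1, order + 1)])

# Keep a testing set aside; it is not used while tuning the order
x_dev, x_test, y_dev, y_test = train_test_split(
    x_raw, y_raw, test_size=0.25, random_state=0)

kf = KFold(n_splits=4, shuffle=True, random_state=0)
max_order = 8
train_error, valid_error = [], []
for order in range(1, max_order + 1):
    X = poly_features(x_dev, order)
    t_scores, v_scores = [], []
    for train_idx, valid_idx in kf.split(X):
        model = LinearRegression().fit(X[train_idx], y_dev[train_idx])
        t_scores.append(model.score(X[train_idx], y_dev[train_idx]))
        v_scores.append(model.score(X[valid_idx], y_dev[valid_idx]))
    # error taken as 1 - mean R2 over the folds (assumption)
    train_error.append(1.0 - np.mean(t_scores))
    valid_error.append(1.0 - np.mean(v_scores))

# The best order is the one with the lowest validation error
best_order = int(np.argmin(valid_error)) + 1

# Refit on all training and validation data, then score once on the testing set
final = LinearRegression().fit(poly_features(x_dev, best_order), y_dev)
test_score = final.score(poly_features(x_test, best_order), y_test)
print('Optimum polynomial order = {0}\ntesting score = {1:.2f}'.format(
    best_order, test_score))
```

Plotting `train_error` and `valid_error` against the order gives the same kind of curves as discussed above: the training error keeps shrinking, while the validation error bottoms out at the chosen order.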

Polynomial Model cross

The result of fitting a 3rd order polynomial is identical to that of the last section.

We are going to end our discussion of linear regression with a multivariate example on the next page.
