Linear Regression with Python (3.6).

Polynomial Regression.

In order to create a polynomial model, we need to generate polynomial features of the form [1, x, x^2, …, x^n]. These augmented features will be the input to the regressor. They form a polynomial basis, so any polynomial can be rewritten as a linear combination of the basis. You can choose any basis for the input features as you see fit; in this case, however, we will stick to a simple polynomial basis.
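
To make the idea concrete, here is a minimal sketch of such a basis expansion. NumPy's np.vander is one (alternative) way to build these features; the post itself uses the custom functions shown further down.

import numpy as np

x = np.array([2.0, 3.0])
# columns are [1, x, x^2, x^3] for each sample
features = np.vander(x, N=4, increasing=True)
print(features)
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]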

Figure 2

To generate the polynomial features we will use our fit_poly() and evaluate_poly() functions. The weights can simply be set to 1, and the powers represent the orders of the polynomial basis. For example, if a=[1,1,1,1] and k=[0,1,2,3], the result of poly_features (refer to the code) will be [1, x, x^2, x^3] for each sample of our input.

The resulting feature matrix has shape (n, m), where n is the number of samples and m the number of features, which satisfies the input shape requirement of sklearn's .fit() method.
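
As a quick sanity check, here is a hypothetical snippet using the fit_poly and evaluate_poly functions defined in the code below:

x_demo = [2, 3]
poly_features = fit_poly([1, 1, 1, 1], [0, 1, 2, 3])
y_demo, V = evaluate_poly(x_demo, poly_features)
print(len(V), len(V[0]))  # 2 4 -> n=2 samples, m=4 features
print(V[0])               # [1, 2, 4, 8], i.e. [1, x, x^2, x^3] at x=2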


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def fit_poly(a, k):
    '''Returns a function mapping each sample x_i to the row
    [a0*x_i**k0, a1*x_i**k1, ...] (the dot product A = V.a).'''
    A = lambda x, a=a, k=k: [[ai * xi**ki for ai, ki in zip(a, k)] for xi in x]
    return A

def evaluate_poly(x, A):
    '''Evaluates A = V.a, stores it in matrix form, and
    returns a list y(x) = [A0, ..., An] plus the matrix.'''
    V = A(x)
    y = [sum(row) for row in V]
    return y, V

# load the data (assumes x on the first row and y on the second)
x_raw, y_raw = np.loadtxt('data.csv', delimiter=',')

# generate polynomial features [1, x, x^2, x^3]
degree = 3
poly_features = fit_poly([1]*(degree+1), list(range(degree+1)))
_, x = evaluate_poly(x_raw, poly_features)

# reshape the data to (n_samples, n_features) and (n_samples, 1)
x = np.array(x)
y = y_raw.reshape(-1, 1)

# split the data into test/train sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.3,
                                                    random_state=1337,
                                                    shuffle=True)

# fit the linear model; the bias is already included as the x^0 column,
# so we disable sklearn's own intercept to keep a0 meaningful
model = LinearRegression(fit_intercept=False)
model.fit(x_train, y_train)

# get the R2 score
R2 = model.score(x_test, y_test)
print('The R2 score is {0:.3f}'.format(R2))

# get the model parameters
coef = model.coef_.T
print(', '.join('a{0}={1:.2f}'.format(i, *a) for i, a in enumerate(coef)))

# get the model predictions and visualize the model
predictions = model.predict(x)
fig1 = plt.figure('Linear Regression')
plt.plot(x_raw, predictions, color='b',
         label='Polynomial model | R2={0:.2f}'.format(R2))
plt.scatter(x_raw, y_raw, color='r', marker='.', label='Input data')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid()
plt.legend()
plt.title('Polynomial Regression Using sklearn')
plt.show()

Polynomial Model

As you can see, the model fits the data pretty well (R2 = 0.90). The regression yielded the following coefficients:

Obtained model parameters
These fairly match our original model. However, this approach is flawed because we had prior knowledge of the model's true parameters (the polynomial degree). This raises the question of how to make an appropriate choice for the polynomial order. In the next section we will talk about cross-validation and how treating the polynomial degree as a hyperparameter can help solve this issue.
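
As a rough preview of that idea, here is a minimal sketch that scores several candidate degrees with sklearn's cross_val_score (assuming x_raw, y_raw, fit_poly, and evaluate_poly from above; the next section covers this properly):

from sklearn.model_selection import cross_val_score

for d in range(1, 6):
    feats = fit_poly([1]*(d+1), list(range(d+1)))
    _, X_d = evaluate_poly(x_raw, feats)
    # 5-fold cross-validated R2 for a degree-d polynomial model
    scores = cross_val_score(LinearRegression(fit_intercept=False),
                             np.array(X_d), y_raw, cv=5, scoring='r2')
    print('degree {0}: mean R2 = {1:.3f}'.format(d, scores.mean()))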
