Fit a linear regressor and evaluate the R2 score.
In this section we use sklearn to perform linear regression. The data is reshaped into
[n_samples, n_features]; here n_samples = len(x_raw) and n_features = 1, as required by sklearn's .fit method. We then split the data into training and test sets, fit the model on the training data, and evaluate the R2 score on the test data. The Python implementation is quite simple.
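As a quick illustration of the required shape, here is a minimal sketch with a made-up array standing in for x_raw:

```python
import numpy as np

# a 1-D array of 5 samples, as np.loadtxt would return for a single column
x_raw = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(x_raw.shape)  # (5,)

# reshape to [n_samples, n_features] with n_features=1;
# -1 lets NumPy infer n_samples from the array length
x = x_raw.reshape(-1, 1)
print(x.shape)  # (5, 1)
```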
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# load the data; unpack=True reads the two CSV columns into separate arrays
x_raw, y_raw = np.loadtxt('data.csv', delimiter=',', unpack=True)

# reshape the data into [n_samples, n_features]
x = x_raw.reshape(-1, 1)
y = y_raw.reshape(-1, 1)

# split the data into train/test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.3,
                                                    random_state=1337,
                                                    shuffle=True)

# fit the linear model on the training data
model = LinearRegression()
model.fit(x_train, y_train)

# get the R2 score on the held-out test data
R2 = model.score(x_test, y_test)
print('The R2 score is {0:.2f}'.format(R2))

# get the model parameters (intercept a0, slope a1)
coef = model.intercept_, *model.coef_
print(', '.join('a{0}={1:.2f}'.format(i, *a) for i, a in enumerate(coef)))

# get the model predictions and visualize the fit
predictions = model.predict(x)
fig1 = plt.figure('Linear Regression')
plt.plot(x, predictions, color='b', label='Linear model | R2={0:.2f}'.format(R2))
plt.scatter(x_raw, y_raw, color='r', marker='.', label='Input data')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid()
plt.legend()
plt.title('Linear Regression Using sklearn')
plt.show()
As you may have predicted, the result is less than satisfactory: the R2 score is only 0.78. The obvious next step is to use a higher-degree polynomial to model the non-linearity at the right end of the data, so let us move on to polynomial regression.
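As a preview, the polynomial fit reuses the same .fit/.score workflow: PolynomialFeatures expands x into polynomial terms, and LinearRegression fits their coefficients. A minimal sketch follows; since data.csv is not available here, it uses synthetic cubic data as a stand-in, and degree=3 is an assumed choice for illustration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# synthetic data standing in for data.csv (assumed, for a self-contained sketch)
rng = np.random.default_rng(1337)
x = np.linspace(0, 5, 200).reshape(-1, 1)
y = 0.5 * x**3 - x + rng.normal(scale=1.0, size=x.shape)

# same split as before
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.3,
                                                    random_state=1337,
                                                    shuffle=True)

# PolynomialFeatures(degree=3) expands x into [1, x, x^2, x^3];
# the linear model then fits the polynomial coefficients
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x_train, y_train)
print('R2 = {0:.2f}'.format(model.score(x_test, y_test)))
```

Wrapping both steps in a pipeline keeps the familiar fit/score interface while handling the feature expansion internally.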