Linear Regression with Python (3.6).

Multivariate Regression (WIP):

For the multivariate case we will use sklearn's PolynomialFeatures() along with Pipeline(). This makes for more readable code.

The steps are very similar to the previous example. The only difference is that we define a function z(x, y) and evaluate it to create some dummy data. The data is then fed through a pipeline that generates the polynomial features, fits a linear model, and scores each of the k folds.
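
To make the pipeline step concrete, here is a minimal sketch of what PolynomialFeatures() produces for two input features (the hard-coded sample values are just for illustration):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# one sample with two features, x = 1 and y = 2
sample = np.array([[1, 2]])
poly = PolynomialFeatures(degree=2)
# expands [x, y] into [1, x, y, x**2, x*y, y**2]
print(poly.fit_transform(sample))  # [[1. 1. 2. 1. 2. 4.]]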

Figure: Prediction Surface


import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def gaussian_noise(mu, sigma, n):
    return np.random.normal(mu, sigma, n)

degree = 8
mean = 0
sigma = .2
n = 30
x = np.linspace(-2, 2, n)
y = np.linspace(-2, 2, n)
X, Y = np.meshgrid(x, y)  # create a meshgrid to evaluate z(x, y)
Z = X**2 + Y**2  # evaluate z
# reshape to column vectors for sklearn and add noise
Z = Z.reshape(-1, 1) + gaussian_noise(mean, sigma, n**2).reshape(-1, 1)
X = X.reshape(-1, 1)
Y = Y.reshape(-1, 1)
# input feature matrix, shape [n**2, 2]
data = np.concatenate([X, Y], axis=1)
# split into train and test sets
_x, x_test, _y, y_test = train_test_split(data, Z,
                                          test_size=0.30,
                                          random_state=1337)
# build the estimator list for a given polynomial degree
estimators = lambda d: [('PolyFeatures', PolynomialFeatures(degree=d)),
                        ('clf', LinearRegression())]
# create a pipeline for every order of polynomial
models = [Pipeline(estimators(m)) for m in range(degree)]
# create a generator for 10 folds
kf = KFold(10)
valid_error = []  # average validation error (1 - R^2) per model
train_error = []  # average training error (1 - R^2) per model
for m in models:
    # fit on the rows in t, return the R^2 score on the rows in v
    f = lambda t, v: m.fit(_x[t], _y[t]).score(_x[v], _y[v])
    # validation score for each fold
    v_score = [f(train, valid) for train, valid in kf.split(_x)]
    # training score for each fold (scored on the same rows it was fit on)
    t_score = [f(train, train) for train, _ in kf.split(_x)]
    # average validation and training error over all folds
    valid_error.append(1 - sum(v_score)/len(v_score))
    train_error.append(1 - sum(t_score)/len(t_score))
k = np.argmin(valid_error)
# refit the best model on all the training and validation data
models[k].fit(_x, _y)
# score it against the held-out testing data
k_score = models[k].score(x_test, y_test)
print('Optimum polynomial order = {0}\n'
      'testing score = {1:.2f}'.format(k, k_score))
# plotting
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(X, Y, Z, linewidth=0, antialiased=True, marker='.',
             color='blue', label='original data')
X_p, Y_p = np.meshgrid(x, y)
Z_p = models[k].predict(data).reshape(n, n)
ax.plot_surface(X_p, Y_p, Z_p, cmap=cm.coolwarm, linewidth=0,
                antialiased=True, label='Predicted Surface')
fig2, ax2 = plt.subplots()
ax2.set_xlabel('Polynomial Order')
ax2.set_ylabel('Error')
ax2.plot(train_error, label='Train Error')
ax2.plot(valid_error, label='Valid Error')
ax2.grid()
ax2.legend()
plt.show()
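
As a sanity check, it can help to look at the coefficients the winning pipeline actually learned. This is a minimal sketch that reuses the 'PolyFeatures' and 'clf' step names from the code above; note that get_feature_names_out() requires a recent sklearn (older versions call it get_feature_names()):

# pull the fitted steps out of the winning pipeline
poly = models[k].named_steps['PolyFeatures']
clf = models[k].named_steps['clf']
# pair every polynomial term with its learned coefficient
names = poly.get_feature_names_out(['x', 'y'])
for name, coef in zip(names, clf.coef_.ravel()):
    print('{0}: {1:+.3f}'.format(name, coef))
# the x^2 and y^2 coefficients should come out close to +1
# and everything else close to 0, recovering z = x**2 + y**2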

 

Figure: Training and validation error vs. polynomial order

 

As you can see, the optimum polynomial order is 2, matching the z = x**2 + y**2 surface we generated before adding the Gaussian noise. Note how, in the error plot, the training error keeps falling as the order increases while the validation error bottoms out at the true order; that growing gap is the overfitting that the k-fold validation guards against. This concludes this series on polynomial regression.

GitHub repository:

https://github.com/b00033811/ml-uae

 
