# Linear Regression with Python (3.6)

## Multivariate Regression (WIP)

For the multivariate case we will use scikit-learn's `PolynomialFeatures()` along with `Pipeline()`. This makes for more readable code.
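As a minimal sketch of how the two fit together (the toy points and the step names here are illustrative, not from the original script): `Pipeline` chains the feature expansion and the regressor into a single estimator, so one `fit`/`predict` call handles both stages.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# toy data: z = x^2 + y^2 sampled at a few points
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.], [1., 2.]])
z = X[:, 0]**2 + X[:, 1]**2

# chain the polynomial feature expansion and the linear model
model = Pipeline([('PolyFeatures', PolynomialFeatures(degree=2)),
                  ('clf', LinearRegression())])
model.fit(X, z)
print(model.predict([[2., 2.]]))  # approximately 8
```

Because the data is an exact degree-2 polynomial, the pipeline recovers it and predicts 2² + 2² = 8 at the new point.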

The steps are very similar to the previous example. The only difference is that we will define a function z(x, y) and evaluate it in order to create some dummy data. The data is then fed through a pipeline that generates the polynomial features, and each model is scored over the k folds.
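Before the full script, here is a compact sketch of the per-fold scoring idea on its own. Note this uses scikit-learn's `cross_val_score` as a one-line alternative to the manual fold loop below; the data-generating code here is illustrative, not the original script.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# dummy data from z(x, y) = y^2 - x^2 plus gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 2))
z = X[:, 1]**2 - X[:, 0]**2 + rng.normal(0, 0.2, 100)

model = Pipeline([('PolyFeatures', PolynomialFeatures(degree=2)),
                  ('clf', LinearRegression())])
# r2 score on each of the 10 validation folds
scores = cross_val_score(model, X, z, cv=KFold(10))
print(scores.mean())
```

Averaging the per-fold scores for each candidate polynomial order is exactly what the validation loop in the full script does by hand.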


```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def gaussian_noise(mu, sigma, n):
    return np.random.normal(mu, sigma, n)

degree = 8
mean = 0
sigma = .2
n = 30
x = np.linspace(-2, 2, n)
y = np.linspace(-2, 2, n)
X, Y = np.meshgrid(x, y)  # create a meshgrid to evaluate z(x, y)
Z = Y**2 - X**2           # evaluate z
# reshape for sklearn and add noise
Z = Z.reshape(1, -1).T + np.array([gaussian_noise(mean, sigma, n**2)]).T
X = X.reshape(1, -1).T
Y = Y.reshape(1, -1).T
# input features, shape [n**2, 2]
data = np.concatenate([X, Y], axis=1)
# split into train and test sets
_x, x_test, _y, y_test = train_test_split(data, Z,
                                          test_size=0.30, random_state=1337)
# estimator list for the pipeline
estimators = lambda d: [('PolyFeatures', PolynomialFeatures(degree=d)),
                        ('clf', LinearRegression())]
# create a pipeline for every polynomial order
models = [Pipeline(estimators(m)) for m in range(degree)]
# create a generator for 10 folds
kf = KFold(10)
valid_error = []  # average validation error per model
train_error = []  # average training error per model
for m in models:
    # fit on the first index set and return the r2 score on the second
    f = lambda t, v: m.fit(_x[t], _y[t]).score(_x[v], _y[v])
    # validation score for each fold
    v_score = [f(train, valid) for train, valid in kf.split(_x)]
    # training score for each fold
    t_score = [f(train, train) for train, _ in kf.split(_x)]
    # average validation and training error over all folds
    valid_error.append(1 - sum(v_score) / len(v_score))
    train_error.append(1 - sum(t_score) / len(t_score))
k = np.argmin(valid_error)
# refit the best model on the training and validation data
models[k].fit(_x, _y)
# score the model against the testing data
k_score = models[k].score(x_test, y_test)
print('Optimum polynomial order = {0}\n'
      'testing score = {1:.2f}'.format(k, k_score))

# plotting
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter3D(X, Y, Z, linewidth=0, antialiased=True, marker='.',
             color='blue', label='original data')
X_p, Y_p = np.meshgrid(x, y)
Z_p = models[k].predict(data).reshape(n, n)
ax.plot_surface(X_p, Y_p, Z_p, cmap=cm.coolwarm, linewidth=0,
                antialiased=True)
fig2, ax2 = plt.subplots()
ax2.set_xlabel('Polynomial Order'), ax2.set_ylabel('Error')
ax2.plot(train_error, label='Train Error')
ax2.plot(valid_error, label='Valid Error')
ax2.grid(), ax2.legend()
plt.show()
```

As you can see, the optimum polynomial order is 2, matching the degree of z = y^2 - x^2 as set in the code before the Gaussian noise was added. This concludes this series on polynomial regression.

GitHub repository:

https://github.com/b00033811/ml-uae