Deep Learning Homework Assignment¶

In this notebook, we explore the concept of momentum in deep learning, practice object-oriented programming, and finish by implementing a deep learning model for multiclass classification of three types of wine.

1. SGD with momentum¶


Read the following manual and try to understand the pseudo-code.
https://pytorch.org/docs/stable/generated/torch.optim.SGD.html
(a) Question: What role does the momentum parameter play in the SGD optimizer?

Answer:
From reading the documentation and watching videos from Andrew Ng and DeepLearning.AI, the momentum parameter sets the decay rate of an exponentially weighted moving average of past gradients (the "velocity"). Instead of updating our parameters with only the current partial derivatives, we update them with this running average, so each step blends the current gradient with the accumulated history. For example, with a momentum parameter of 0.9, each update is effectively an average over roughly the last 1/(1 - 0.9) = 10 gradients rather than the current gradient alone. This smooths out oscillations, which lets us use a larger learning rate and reach the minimum loss more quickly.
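For reference, here is a minimal sketch of the update rule from the linked documentation, assuming dampening = 0 and no Nesterov momentum (the names sgd_momentum_step, param, grad, velocity, lr, and mu are illustrative, not part of PyTorch's API):

import torch

def sgd_momentum_step(param, grad, velocity, lr=0.01, mu=0.9):
    velocity.mul_(mu).add_(grad)  # v <- mu * v + g
    param.sub_(lr * velocity)     # p <- p - lr * v

p, g, v = torch.zeros(3), torch.ones(3), torch.zeros(3)
sgd_momentum_step(p, g, v)  # each entry of p is now 0 - 0.01 * 1.0 = -0.01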

(b) Add the momentum parameter to the following code (an example we did in class), plot the loss function before and after adding momentum, and see which one converges faster. If you set the parameter correctly, you should see faster convergence.

In [1]:
import torch
import torch.nn as nn
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
# 0. prepare data
x_np,y_np = datasets.make_regression(n_samples=100,n_features=1,noise=20,random_state=1)
#convert to torch tensors; the numpy data is float64, cast to float32
x=torch.from_numpy(x_np.astype(np.float32))
y=torch.from_numpy(y_np.astype(np.float32))
print(x.shape)
print(y.shape)
#reshape y into a column vector
y=y.view(y.shape[0],1)
n_samples,n_features=x.shape
# 1. model
input_size=n_features
output_size=1
model=nn.Linear(input_size,output_size)
# 2. loss and optimizer
learning_rate=0.01
loss_nn=nn.MSELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=learning_rate)
# 3. training loop
n_epochs=200
#save loss, weight, and bias at each epoch for plotting
l_no_momt=torch.ones(n_epochs)
w_no_momt=torch.ones(n_epochs)
b_no_momt=torch.ones(n_epochs)
for epoch in range(n_epochs):
    
    #forward pass
    y_predicted = model(x)
    loss = loss_nn(y_predicted,y)
    
    #backward pass
    loss.backward()
    
    #update
    optimizer.step()
    
    optimizer.zero_grad()
    
    [w, b] = model.parameters()
        
    if epoch%10==0:
        print(f'epoch:{epoch}, loss ={loss.item():.4f}')
    
    l_no_momt[epoch]=loss.item()
    w_no_momt[epoch]=w.item()
    b_no_momt[epoch]=b.item()
torch.Size([100, 1])
torch.Size([100])
epoch:0, loss =5665.2095
epoch:10, loss =4195.6938
epoch:20, loss =3133.8694
epoch:30, loss =2365.7358
epoch:40, loss =1809.4591
epoch:50, loss =1406.2043
epoch:60, loss =1113.6072
epoch:70, loss =901.1198
epoch:80, loss =746.6875
epoch:90, loss =634.3671
epoch:100, loss =552.6208
epoch:110, loss =493.0897
epoch:120, loss =449.7123
epoch:130, loss =418.0893
epoch:140, loss =395.0245
epoch:150, loss =378.1946
epoch:160, loss =365.9091
epoch:170, loss =356.9382
epoch:180, loss =350.3852
epoch:190, loss =345.5970
In [2]:
#Re-initialize the model so training starts from scratch
model=nn.Linear(input_size,output_size)
# 2. loss and optimizer
learning_rate=0.01
loss_nn=nn.MSELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=learning_rate, momentum = 0.9)
# 3. training loop
n_epochs=200
#save loss, weight, and bias at each epoch for plotting
l_momt=torch.ones(n_epochs)
w_momt=torch.ones(n_epochs)
b_momt=torch.ones(n_epochs)
for epoch in range(n_epochs):
    
    #forward pass
    y_predicted = model(x)
    loss = loss_nn(y_predicted,y)
    
    #backward pass
    loss.backward()
    
    #update
    optimizer.step()
    
    optimizer.zero_grad()
    
    [w, b] = model.parameters()
        
    if epoch%10==0:
        print(f'epoch:{epoch}, loss ={loss.item():.4f}')
    
    l_momt[epoch]=loss.item()
    w_momt[epoch]=w.item()
    b_momt[epoch]=b.item()
epoch:0, loss =5686.2954
epoch:10, loss =1315.3956
epoch:20, loss =474.7921
epoch:30, loss =580.3893
epoch:40, loss =348.6200
epoch:50, loss =348.3355
epoch:60, loss =341.3312
epoch:70, loss =333.0224
epoch:80, loss =333.4992
epoch:90, loss =332.8613
epoch:100, loss =332.5858
epoch:110, loss =332.6186
epoch:120, loss =332.5745
epoch:130, loss =332.5695
epoch:140, loss =332.5699
epoch:150, loss =332.5676
epoch:160, loss =332.5677
epoch:170, loss =332.5676
epoch:180, loss =332.5676
epoch:190, loss =332.5676
In [3]:
plt.plot(l_no_momt, label ='Loss - no momentum')
plt.plot(l_momt, label = 'Loss - momentum')
plt.legend()
Out[3]:
<matplotlib.legend.Legend at 0x166e6c81040>

2. Object Oriented Programming (OOP)¶


To better understand how PyTorch works, or more generally how Python classes work, we covered OOP in class.
Define a class named my_matrix. In this class, implement the following methods:

- shape: returns the number of rows and the number of columns
- get: takes a row number and a column number as parameters, and returns the content of the cell at that row and column
- scalar_mult: takes a scalar val and returns a new matrix equal to the scalar product matrix x val

Create an example to test whether your class functions as you expect.

In [4]:
class my_matrix:
    def __init__(self,matrix):
        self.matrix = matrix
        self.rows = matrix.shape[0]
        self.columns = matrix.shape[1]
    def shape(self):
        return (self.rows,self.columns)
    def get(self,row,col):
        return self.matrix[(row-1,col-1)] #This way we index starting at 1 instead of 0 for our matrix
    def scalar_mult(self,k):
        return k*self.matrix #returns the scaled entries as a NumPy array; see the note below
In [5]:
#Test out the class
import numpy as np

A = np.random.rand(2,4)
print(A)
[[0.56990919 0.69346765 0.64956677 0.9062466 ]
 [0.75915085 0.74553715 0.46487096 0.62558745]]
In [6]:
B = my_matrix(A)
print(B.shape())  # Should get (2,4)
print(B.get(1,2)) #get(1,2) should pull the 1st row, 2nd column (0.693...)
print(B.scalar_mult(4))
(2, 4)
0.6934676481184192
[[2.27963677 2.77387059 2.59826707 3.62498642]
 [3.03660341 2.98214861 1.85948385 2.50234981]]
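Note that scalar_mult above returns the scaled entries as a plain NumPy array. If "return a new matrix" is read as returning a new my_matrix instance, a small variant (illustrative only; the tests above use the array-returning version) would be:

def scalar_mult(self, k):
    # wrap the scaled entries in a new my_matrix instance
    return my_matrix(k * self.matrix)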

3. Inherit the class you defined in problem 2 to define a new class named m2vector. Add the following method¶

  • resize: resize the matrix to have only one row if the parameter row=1; otherwise, resize it to a column vector. (A variant that honors this parameter is sketched after the tests below.)
In [7]:
class m2vector(my_matrix):
    def resize(self):
        # a single-row matrix is already a row vector; return it as-is
        if self.rows == 1:
            return self.matrix
        else:
            # otherwise flatten into a column vector
            return self.matrix.reshape(-1,1)
In [8]:
#Test the m2vector class: check that resize changes A into a column vector

A = np.random.rand(3,4)
print(A)
B = m2vector(A)
print(B.resize())
print(f'See above that B and its reshape have the same entries; B has dimension {B.shape()}, while its reshape has dimension {B.resize().shape}')
[[0.18143545 0.2205793  0.1924449  0.25638343]
 [0.15934867 0.3581916  0.13244368 0.64762501]
 [0.95805765 0.34123219 0.17388773 0.44132445]]
[[0.18143545]
 [0.2205793 ]
 [0.1924449 ]
 [0.25638343]
 [0.15934867]
 [0.3581916 ]
 [0.13244368]
 [0.64762501]
 [0.95805765]
 [0.34123219]
 [0.17388773]
 [0.44132445]]
See above that B and its reshape have the same entries; B has dimension (3, 4), while its reshape has dimension (12, 1)
In [9]:
# Check to see that resize returns a row vector if given a row vector
A = np.random.rand(1,6)
print(A)

B = m2vector(A)
print(B.resize())
[[0.06095501 0.41850816 0.63834825 0.13367014 0.23235661 0.24849694]]
[[0.06095501 0.41850816 0.63834825 0.13367014 0.23235661 0.24849694]]
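The assignment statement passes a row parameter to resize, while the version above keys off the matrix's own shape. A variant that honors the parameter (an illustrative sketch; the tests above exercise the shape-based version) might look like:

class m2vector(my_matrix):
    def resize(self, row=1):
        # row=1: reshape to a single row; otherwise to a single column
        if row == 1:
            return self.matrix.reshape(1, -1)
        return self.matrix.reshape(-1, 1)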

4. Super class. Fill out the code for the area_2 function.¶

Then run pyramid.area_2() after you have defined a pyramid object with base equal to 2 and slant height equal to 4. The result should be 20, the same as the result of pyramid.area().

In [10]:
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * self.length + 2 * self.width

class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def tri_area(self):
        return 0.5 * self.base * self.height

class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height):
        self.base = base
        self.slant_height = slant_height
        Triangle.__init__(self, base=self.base, height=slant_height)
        Square.__init__(self, length=self.base)

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

    def area_2(self):
        base_area = super().area()         # resolves to Rectangle.area via the MRO
        triangle_area = super().tri_area() # resolves to Triangle.tri_area via the MRO
        return triangle_area * 4 + base_area
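Why both super() calls in RightPyramid work: Python resolves them along the method resolution order (MRO), which here is RightPyramid → Square → Rectangle → Triangle → object. super().area() finds Rectangle.area (via Square), and super().tri_area() keeps walking past Square and Rectangle until it reaches Triangle. You can inspect this directly:

print([cls.__name__ for cls in RightPyramid.__mro__])
# ['RightPyramid', 'Square', 'Rectangle', 'Triangle', 'object']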
In [11]:
pyramid = RightPyramid(base=2, slant_height=4)
pyramid.area_2() # After filling out the code, you should see the following result
Out[11]:
20.0

5. Use the wine dataset and a deep learning framework to predict type of wine¶

In [34]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import numpy as np
import math
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
from sklearn.metrics import accuracy_score



#Create a custom dataset

class WineDataset(Dataset):

    def __init__(self,x,y):
        #define shape, x_data, and y_data from input x and y torch tensors
        self.n_samples = x.shape[0]
        self.n_features = x.shape[1]

        self.x_data = x # size [n_samples, n_features]
        self.y_data = y # size [n_samples, 1]
       
        

    # support indexing such that dataset[i] can be used to get i-th sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # we can call len(dataset) to return the size
    def __len__(self):
        return self.n_samples


#Load the data into a pandas DataFrame
df = pd.read_csv('wine.csv')

df.head()


X = df.drop(['Wine'],axis=1)
y = df['Wine']
y -= 1 # labels in the csv are 1..3; shift to 0..2 for CrossEntropyLoss


#train test split, 20% for testing data
x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.2)

#Scale data
x_scaler = StandardScaler()
x_train = x_scaler.fit_transform(x_train)
x_test = x_scaler.transform(x_test)

#Switch to tensors
x_train =torch.Tensor(x_train)
x_test = torch.Tensor(x_test)

y_train =torch.LongTensor(y_train.values)
y_test = torch.LongTensor(y_test.values)
print(y_train.shape)
# create dataset class
train_data = WineDataset(x_train,y_train)



# Load training dataset with DataLoader
# shuffle: shuffle data, good for training
# num_workers: faster loading with multiple subprocesses

train_loader = DataLoader(dataset=train_data,
                          batch_size=4,
                          shuffle=True,
                          num_workers=0)





# Bookkeeping for the training printout
num_epochs = 20
total_samples = len(train_data)
n_iterations = math.ceil(total_samples/4) # 4 matches the DataLoader batch_size
print(total_samples, n_iterations)

# Define the multiclass network. The output layer has 3 nodes, one logit per wine class
class MultiClass(nn.Module):
    def __init__(self, n_features):
        super(MultiClass,self).__init__()
        #Define the Layers
        self.network=nn.Sequential(
        nn.Linear(n_features,64),
        nn.ReLU(),
        nn.Linear(64,32),
        nn.ReLU(),
        nn.Linear(32,16),
        nn.ReLU(),
        nn.Linear(16,3)
        # No activation here: CrossEntropyLoss applies log-softmax internally and expects raw logits (see the quick check after this cell)
        )
        
    def forward(self,x):
        y=self.network(x)
        return y

#Define Model, Loss Function and Optimizer

model = MultiClass(train_data.n_features)

loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(),lr =0.001)

train_loss = []
test_loss = []
train_acc = []
test_acc = []
for epoch in range(num_epochs):
    for i,(inputs, labels) in enumerate(train_loader):
        y_pred = model(inputs)
        loss = loss_fn(y_pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if epoch % 10 == 9 or epoch == num_epochs-1:
            if (i+1) % 12 == 0:
                print(f'Epoch: {epoch+1}/{num_epochs}, Step {i+1}/{n_iterations}| Loss = {loss.item():.3f}')
    model.eval()
    with torch.no_grad():
        y_pred1 = model(x_train)
        y_pred1 = y_pred1.argmax(dim=1)
        y_pred2 = model(x_test)
        y_pred2 = y_pred2.argmax(dim=1)

        train_acc.append(accuracy_score(y_train,y_pred1))
        test_acc.append(accuracy_score(y_test,y_pred2))
    model.train() # switch back to training mode for the next epoch
torch.Size([142])
142 36
Epoch: 10/20, Step 12/36| Loss = 0.001
Epoch: 10/20, Step 24/36| Loss = 0.007
Epoch: 10/20, Step 36/36| Loss = 0.001
Epoch: 20/20, Step 12/36| Loss = 0.000
Epoch: 20/20, Step 24/36| Loss = 0.004
Epoch: 20/20, Step 36/36| Loss = 0.000
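As a quick check of the comment in the model definition: nn.CrossEntropyLoss applied to raw logits matches nn.NLLLoss applied to log-softmax outputs. A minimal sketch (the logits and targets here are made-up toy values):

import torch.nn.functional as F

logits = torch.randn(4, 3)            # 4 samples, 3 classes (toy values)
targets = torch.tensor([0, 2, 1, 0])  # toy class indices

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True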
In [35]:
# train_acc and test_acc are lists of plain floats, so no gradient context is needed
plt.plot(train_acc, label = 'Train')
plt.plot(test_acc, label = 'Test')
plt.legend()
In [36]:
#Recompute predictions on the full training set (y_pred was last computed on a single batch)
y_pred = model(x_train)
y_pred = y_pred.argmax(dim=1)


train_accuracy = accuracy_score(y_train.detach(),y_pred.detach())

print(f'Training set accuracy: {train_accuracy}')
Training set accuracy: 1.0
In [37]:
#Evaluate model on test set
test_result = model(x_test)
test_result = test_result.argmax(dim=1)
test_acc = accuracy_score(y_test.detach(),test_result.detach())

print(f'Test Set accuracy: {test_acc}')
Test Set accuracy: 0.9722222222222222