Training
After you have created the data loaders and defined your model, it is time to improve the weights and biases through training.
Choose hyperparameters
While the learning parameters of a model (weights and biases) are the values that get adjusted through training (and they will become part of the final program, along with the model architecture, once training is over), hyperparameters control the training process.
They include:
- the batch size: number of samples passed through the model before the parameters are updated,
- the number of epochs: number of complete passes through the training dataset,
- the learning rate (lr): the size of the incremental changes made to the model parameters at each iteration. Smaller values slow down learning, while larger values may overshoot and miss minima.
Let’s define them here:
learning_rate = 1e-3
batch_size = 64
epochs = 5
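To make the role of the learning rate concrete, here is a minimal sketch of a single gradient-descent update performed by hand, outside any optimizer. The tensors w and grad are illustrative placeholders invented for this sketch, not variables defined elsewhere in this lesson:

import torch

# A toy parameter and a gradient for it (made-up values)
w = torch.tensor([0.5, -1.2])
grad = torch.tensor([0.2, -0.4])

# One gradient-descent step: move against the gradient,
# scaled by the learning rate
learning_rate = 1e-3  # same value as defined above
w = w - learning_rate * grad
print(w)  # each entry has moved slightly against its gradient

With a larger learning rate the same step would move the parameters further, which speeds up learning but risks stepping over a minimum.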
Define the loss function
To assess the predicted outputs of our model against the true values from the labels, we also need a loss function (e.g. mean squared error for regression: nn.MSELoss, or negative log likelihood for classification: nn.NLLLoss).
The machine learning literature is rich in information about various loss functions.
Here is an example with nn.CrossEntropyLoss, which combines nn.LogSoftmax and nn.NLLLoss:
loss_fn = nn.CrossEntropyLoss()
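As a quick check of that equivalence, you can compare the two formulations on a small batch of made-up logits and labels (the shapes and values below are arbitrary, chosen only for illustration); the two losses come out identical:

import torch
from torch import nn

# Made-up raw model outputs (logits) for 3 samples and 4 classes,
# plus an integer class label for each sample
logits = torch.randn(3, 4)
labels = torch.tensor([0, 2, 1])

# nn.CrossEntropyLoss applied directly to the raw logits...
ce = nn.CrossEntropyLoss()(logits, labels)

# ...equals nn.NLLLoss applied to log-softmax outputs
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)

print(ce.item(), nll.item())  # same value twice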
Initialize the optimizer
The optimization algorithm determines how the model parameters get adjusted at each iteration.
There are many optimizers, and you will need to search the literature for which one performs best for your type of model and data.
Below is an example with stochastic gradient descent:
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
lr is the learning rate. momentum is a method that increases the convergence rate and reduces oscillation for SGD.
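For comparison, here is a sketch using Adam, a popular alternative (the choice of Adam here is ours, not prescribed by this lesson). Adam adapts the step size per parameter, so it is typically used without a momentum argument:

# Adam: adaptive per-parameter learning rates; lr is the initial step size
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

Either optimizer exposes the same interface used in the loops below (zero_grad() and step()), so swapping one for the other requires no other changes.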
Define the train and test loops
Finally, we need to define the train and test loops.
The train loop:
- gets a batch of training data from the DataLoader,
- resets the gradients of the model parameters with optimizer.zero_grad(),
- calculates the predictions from the model for an input batch,
- calculates the loss for that set of predictions vs. the labels in the dataset,
- calculates the backward gradients over the learning parameters (that's backpropagation) with loss.backward(),
- adjusts the parameters by the gradients collected in the backward pass with optimizer.step().
The test loop evaluates the model’s performance against the test data.
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()  # training mode (affects layers such as dropout and batch norm)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Report progress every 100 batches
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0
    model.eval()  # evaluation mode (affects layers such as dropout and batch norm)

    # Disable gradient tracking: no backpropagation during evaluation
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
Train
To train our model, we run the loop over the epochs:
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Training completed")