A quick introduction to deep learning, NLP, and LLMs

Marie-Hélène Burle

February 16, 2024



Artificial intelligence (AI)

Any human-made system mimicking animal intelligence. This is a large and very diverse field

Machine learning (ML)

A subfield of AI that can be defined as computer programs whose performance at a task improves with experience. This includes statistical inference and deep learning

Deep learning (DL)

A subfield of ML using artificial neural networks with two or more hidden layers

Natural language processing (NLP)

A subfield of AI focused on human languages. It can use statistical inference or deep learning

ML makes previously impossible tasks achievable

Let’s take the example of image recognition:

In typical computing, a programmer writes code that gives a computer detailed instructions of what to do

Coding all the possible ways—pixel by pixel—that an image can represent, say, a dog is an impossibly large task: there are many breeds of dogs, the image can be a picture, a blurred picture, a drawing, a cartoon, the dog can be in all sorts of positions, wearing clothes, etc.

There just aren’t enough resources for the traditional programming approach to produce a program that can identify a dog in arbitrary images

By feeding a very large number of dog images to a neural network, however, we can train that network to recognize dogs in images that it has never seen (without explicitly programming how it does this!)

Old concept … new computing power

The concept is anything but new: Arthur Samuel came up with it in 1949 and built a self-learning checkers-playing program in 1959

Machine learning consists of feeding vast amounts of data to algorithms to strengthen pathways. Because both computing power and training data were scarce at the time, excitement for the approach stayed somewhat dormant

The advent of powerful computers, GPUs, and massive amounts of data has brought the old concept to the forefront


From xkcd.com

So how does it all work?

It depends on the type of learning

Supervised learning

  • Regression is a form of supervised learning with continuous outputs
  • Classification is supervised learning with discrete outputs

Supervised learning uses training data in the form of example input/output pairs


Find the relationship between inputs and outputs
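As a minimal sketch of regression, the snippet below recovers the relationship between inputs and outputs from a handful of example pairs. The data and the function name `fit_line` are made up for illustration, and a closed-form least-squares line stands in for a real model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to example pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: input/output pairs that follow y = 2x + 1
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(inputs, outputs)

# The learned relationship can now be applied to an input it has never seen
prediction = slope * 10.0 + intercept
```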

Unsupervised learning

Clustering, social network analysis, market segmentation, and principal component analysis (PCA), among others, are all forms of unsupervised learning

Unsupervised learning uses unlabelled data


Find structure within the data
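To illustrate finding structure without labels, here is a rough sketch of clustering one-dimensional points into two groups with a basic k-means loop. The points, the starting centroids, and the name `k_means_1d` are all assumptions chosen to keep the example tiny.

```python
def k_means_1d(points, centroids, n_iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [abs(p - c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        for k, members in enumerate(clusters):
            if members:  # leave a centroid in place if its cluster is empty
                centroids[k] = sum(members) / len(members)
    return centroids, clusters


# Hypothetical unlabelled data with two visible groups
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# Naive starting guess for the two centroids (an assumption for this sketch)
centroids, clusters = k_means_1d(points, centroids=[0.0, 6.0])
```

No labels were provided: the two groups emerge from the distances between points alone.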

Reinforcement learning

The algorithm explores by performing actions (random ones at first) and these actions are rewarded or punished (bonus points or penalties). Over time, it comes to favour the actions that earn the most reward

This is how algorithms learn to play games
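The reward-and-penalty idea can be sketched with a toy two-action problem. This uses an epsilon-greedy strategy, which is one simple option among many and not necessarily what game-playing systems use. The environment, the reward values, and all names here are hypothetical.

```python
import random


def reward_for(action):
    """Hypothetical environment: action 1 earns a bonus, action 0 a penalty."""
    return 1.0 if action == 1 else -1.0


def run_agent(n_steps=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = [0.0, 0.0]        # running estimate of each action's reward
    counts = [0, 0]            # how many times each action was tried
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)            # explore: random action
        else:
            action = values.index(max(values))   # exploit: best estimate so far
        reward = reward_for(action)
        counts[action] += 1
        # Incremental average of the rewards observed for this action
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_agent()
best_action = values.index(max(values))
```

Nothing tells the agent which action is good. It works this out from the rewards it receives.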

Let’s explore the case of supervised learning

Decide on an architecture


The architecture won’t change during training

The type of architecture you choose (e.g. CNN, Transformer) depends on the type of data you have (e.g. images, text). The depth and breadth of your network depend on the amount of data and computing resources you have
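To make depth and breadth concrete, the sketch below describes a small fully connected network as a list of layer sizes and counts the parameters it would need. The sizes are arbitrary assumptions, and `count_parameters` is a helper written for this example.

```python
def count_parameters(layer_sizes):
    """Each layer has one weight per (input, unit) pair plus one bias per unit."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical architecture: 4 inputs, two hidden layers of 8 units, 1 output
layer_sizes = [4, 8, 8, 1]

n_hidden_layers = len(layer_sizes) - 2   # depth
n_parameters = count_parameters(layer_sizes)

# Doubling the breadth of the hidden layers makes the model much larger
n_parameters_wide = count_parameters([4, 16, 16, 1])
```

The parameter count grows quickly with depth and breadth, which is why larger networks need more data and more computing resources.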

Set some initial parameters


You can initialize them randomly or get much better ones through transfer learning

The parameters are also part of the model but, unlike the architecture, they will change during training
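Random initialization can be sketched as follows for a single layer. The layer sizes, the scale of the random values, and the name `init_layer` are assumptions. Real frameworks use more careful schemes, and transfer learning would instead load parameters from an already trained model.

```python
import random


def init_layer(n_inputs, n_units, scale=0.1, seed=42):
    """Small random weights and zero biases for one layer."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_units):
        weights.append([rng.gauss(0.0, scale) for _ in range(n_inputs)])
    biases = [0.0] * n_units
    return weights, biases


# Hypothetical layer: 3 inputs feeding 4 units
weights, biases = init_layer(n_inputs=3, n_units=4)
```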

Get some labelled data


When we say that we need a lot of data for machine learning, we mean “lots of labelled data” as this is what gets used for training models

Make sure to keep some data for testing


Those data won’t be used for training the model. Often people keep around 20% of their data for testing
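A simple way to hold out test data is to shuffle the labelled examples and set aside one fifth of them. The dataset below is made up for illustration.

```python
import random

# Hypothetical labelled dataset of (input, label) pairs
dataset = [(x, 2 * x + 1) for x in range(10)]

rng = random.Random(0)   # fixed seed so the split is repeatable
shuffled = dataset[:]
rng.shuffle(shuffled)    # shuffle so the test set is not just the first rows

n_test = len(shuffled) // 5        # one fifth, i.e. 20%
test_set = shuffled[:n_test]       # never shown to the model during training
train_set = shuffled[n_test:]      # used for training
```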

Pass data and parameters through the architecture


The training data are the inputs and the process of calculating the outputs is the forward pass

The outputs of the model are predictions
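Here is a rough sketch of a forward pass through a tiny network with one hidden layer. The parameter values are hand-picked assumptions so that the arithmetic is easy to follow, and ReLU is just one possible activation function.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One layer: each unit applies its own weights and bias to the same inputs."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def forward(x, params):
    hidden = dense(x, params["hidden_w"], params["hidden_b"], relu)
    return dense(hidden, params["out_w"], params["out_b"], identity)


# Hypothetical parameters: 2 inputs, 2 hidden units, 1 output
params = {
    "hidden_w": [[0.5, -1.0], [1.0, 1.0]],
    "hidden_b": [0.0, -1.0],
    "out_w": [[1.0, 2.0]],
    "out_b": [0.5],
}

prediction = forward([2.0, 1.0], params)  # hidden = [0.0, 2.0], output = [4.5]
```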


Compare those predictions to the training labels


Since our data were labelled, we know what the true outputs are

Calculate training loss


The deviation of our predictions from the true outputs gives us a measure of training loss
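One common way to measure that deviation is the mean squared error, sketched below. Other loss functions exist and the right choice depends on the task. The numbers are made up.

```python
def mean_squared_error(predictions, labels):
    """Average of the squared differences between predictions and true outputs."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions and the matching true outputs
predictions = [2.5, 0.0, 2.0]
labels = [3.0, -0.5, 2.0]

train_loss = mean_squared_error(predictions, labels)

# Perfect predictions would give a loss of zero
perfect_loss = mean_squared_error(labels, labels)
```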

Adjust parameters


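The parameters get automatically adjusted to reduce the training loss: backpropagation computes how much each parameter contributed to the loss (the gradients), and an optimizer such as gradient descent nudges each parameter accordingly. This is the actual training part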

This process is repeated many times. Training models is pretty much a giant for loop
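The loop can be sketched on the simplest possible model, a straight line with two parameters. Here the gradients are written out by hand, which is only practical for such a small model. In a real network, backpropagation computes them automatically. The data, learning rate, and number of iterations are assumptions.

```python
def train(inputs, labels, learning_rate=0.05, n_epochs=2000):
    w, b = 0.0, 0.0   # initial parameters
    n = len(inputs)
    losses = []
    for _ in range(n_epochs):
        # Forward pass: compute predictions with the current parameters
        predictions = [w * x + b for x in inputs]
        # Compare predictions to labels and compute the training loss (MSE)
        errors = [p - y for p, y in zip(predictions, labels)]
        losses.append(sum(e ** 2 for e in errors) / n)
        # Gradients of the loss with respect to each parameter
        grad_w = sum(2 * e * x for e, x in zip(errors, inputs)) / n
        grad_b = sum(2 * e for e in errors) / n
        # Adjust parameters a small step in the direction that lowers the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses


# Hypothetical labelled data following y = 2x + 1
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, losses = train(inputs, labels)
```

The architecture (a line) never changes. Only `w` and `b` do, and the loss shrinks as they approach 2 and 1.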

From model to program


Remember that the model architecture is fixed, but that the parameters change at each iteration of the training process



While the labelled data are key to training, what we are really interested in is the combination of architecture + final parameters



When the training is over, the parameters become fixed, which means that our model now behaves like a classic program

Evaluate the model


We can now use the testing set (which was never used to train the model) to evaluate our model: if we pass the test inputs through our program, we get some predictions that we can compare to the test labels (which are the true outputs)

This gives us the test loss: a measure of how well our model performs
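As a sketch of evaluation, the trained model below is an ordinary function with frozen parameters, and the test loss is computed on examples it never saw. The parameter values and test data are hypothetical, and mean squared error is an assumed choice of loss.

```python
def model(x, w=1.9, b=1.2):
    """Hypothetical trained model: fixed architecture plus final parameters."""
    return w * x + b


def mean_squared_error(predictions, labels):
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical test set, held out before training
test_inputs = [5.0, 6.0]
test_labels = [11.0, 13.0]

test_predictions = []
for x in test_inputs:
    test_predictions.append(model(x))

test_loss = mean_squared_error(test_predictions, test_labels)
```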

Use the model


Now that we have a program, we can use it on unlabelled inputs to get what people ultimately want: unknown outputs

This is when we put our model to actual use to solve some problem

Artificial neural networks

In biological networks, the information consists of action potentials (rapid depolarizations of the neuron membrane) propagating through the network. In artificial ones, the information consists of tensors (multidimensional arrays): each unit computes a weighted sum of its input tensor, adds a bias, and passes the result through an activation function before passing the output tensor on to the next layer of units. The weights and biases are the parameters that get adjusted during training

Artificial neural networks are a series of layered units mimicking the concept of biological neurons: inputs are received by every unit of a layer, computed, then transmitted to units of the next layer. In the process of learning, experience strengthens some connections between units and weakens others
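A single artificial unit can be sketched in a few lines. The sigmoid activation and the numbers are assumptions for illustration. Many other activation functions are used in practice.

```python
import math


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Hypothetical inputs and parameters: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)

# A larger weight on the first input strengthens that connection
stronger_output = neuron(inputs=[1.0, 2.0], weights=[2.0, -0.25], bias=0.0)
```

Learning amounts to changing `weights` and `bias` so that the unit responds more strongly to some inputs and less to others.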

Schematic of a biological neuron:

Schematic of an artificial neuron: