NN vs biological neurons
Types of NN

Marie-Hélène Burle

In biological networks, the information consists of action potentials (rapid depolarizations of the neuron membrane) propagating through the network. In artificial ones, the information consists of tensors (multidimensional arrays): each unit computes a weighted sum of an input tensor, adds a (possibly weighted) bias, and passes the result through an activation function before passing the output tensor on to the next layer of units. The weights and biases are themselves stored as tensors.
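As a minimal sketch of the computation performed by a single artificial unit, the following uses plain Python lists in place of tensors. The function names and the numeric values are made up for illustration, and the sigmoid is only one possible choice of activation function.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values chosen only for illustration
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.5

# The weighted sum plus bias is 0 here (up to rounding),
# so the sigmoid output is about 0.5
output = neuron_output(inputs, weights, bias)
print(output)
```

In a real framework the same operation is applied to whole tensors at once rather than to one unit at a time.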

Artificial neural networks are a series of layered units mimicking the concept of biological neurons: inputs are received by every unit of a layer, processed, then transmitted to units of the next layer. In the process of learning, experience strengthens some connections between units and weakens others.


Schematic of a biological neuron:

Schematic of an artificial neuron:

Modified from O.C. Akgun & J. Mei 2019

While biological neurons are connected in extremely intricate patterns, artificial ones follow a layered structure. Another difference in complexity is the number of units: the human brain has 65–90 billion neurons, while ANNs have far fewer units.


Neurons in mouse cortex:

Neurons are in green, the dark branches are blood vessels. Image by Na Ji, UC Berkeley

Neural network with 2 hidden layers:

The information in biological neurons is an all-or-nothing electrochemical pulse or action potential. Greater stimuli don’t produce stronger signals but increase firing frequency. In contrast, artificial neurons pass the computation of their inputs through an activation function and the output can take any of the values possible with that function.

Threshold potential in biological neurons:

Modified from Blacktc, Wikimedia

Some common activation functions in ANNs:
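To make the contrast between an all-or-nothing signal and a graded output concrete, here is a small sketch of a step function next to three widely used activation functions (sigmoid, tanh and ReLU). The step function is only a loose analogy for a firing threshold, not a model of a biological neuron, and the sample inputs are arbitrary.

```python
import math

def step(z, threshold=0.0):
    """All-or-nothing output: 1 at or above the threshold, 0 below it."""
    return 1.0 if z >= threshold else 0.0

def sigmoid(z):
    """Smooth output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Smooth output in (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Zero for negative inputs, identity for positive ones."""
    return max(0.0, z)

# Arbitrary sample inputs: the step function only ever returns 0 or 1,
# while the other functions return graded values
for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```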

Central to both systems is the concept of learning.

The process of learning in biological NN happens through neuron death or growth and the creation or loss of synaptic connections between neurons.

In ANNs, learning happens through optimization algorithms such as gradient descent, which minimize a loss function (e.g. cross-entropy) by adjusting the weights and biases connecting each layer of neurons over many iterations.
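The idea can be sketched on the smallest possible "network": a single sigmoid unit with one weight and one bias, trained by gradient descent on a binary cross-entropy loss. The toy data, the learning rate and the number of iterations are assumptions picked for illustration. Real networks compute the gradients for all layers by backpropagation rather than with hand-written formulas.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w, b, data):
    """Mean binary cross-entropy of the model sigmoid(w * x + b) on data."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def gradient_step(w, b, data, learning_rate):
    """One gradient descent update of the weight and the bias."""
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Toy data (made up): the label is 1 when x is positive
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0
initial_loss = cross_entropy(w, b, data)
for _ in range(200):
    w, b = gradient_step(w, b, data, learning_rate=0.5)
final_loss = cross_entropy(w, b, data)

print(initial_loss, final_loss)  # the loss decreases as w and b are adjusted
```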

Types of ANN

Fully connected neural networks

Each neuron receives inputs from every neuron of the previous layer and passes its output to every neuron of the next layer.
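A fully connected layer can be sketched as follows, assuming one row of weights per unit so that every unit sees every input. The layer size and the values are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def dense_layer(inputs, weights, biases, activation):
    """Forward pass of a fully connected layer.

    weights[j][i] connects input i to unit j, so each unit receives
    every input of the previous layer.
    """
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Made-up layer: 3 inputs -> 2 units
inputs = [1.0, 2.0, 3.0]
weights = [[0.1, 0.2, 0.3],
           [-1.0, 0.0, 0.5]]
biases = [0.0, -1.0]
outputs = dense_layer(inputs, weights, biases, relu)

# One weight per (input, unit) pair plus one bias per unit
n_params = len(inputs) * len(biases) + len(biases)
print(outputs, n_params)
```

The parameter count grows with the product of the two layer sizes, which is what makes this architecture costly for large inputs.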

Convolutional neural networks

Convolutional neural networks (CNN) are used for spatially structured data (e.g. images).

Images have huge input sizes and would require a very large number of neurons in a fully connected neural net. In convolutional layers, neurons receive input from a subarea (called local receptive field) of the previous layer. This greatly reduces the number of parameters. Optionally, pooling (combining the outputs of neurons in a subarea) reduces the data dimensions.
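The sketch below illustrates both ideas on a made-up 5×5 "image": a 2×2 kernel slides over the input so that each output value depends only on a local receptive field, then max pooling shrinks the result. The kernel values are an assumption chosen to respond to a vertical edge. In a trained CNN they would be learned.

```python
def conv2d(image, kernel, bias=0.0):
    """2D convolution without padding (computed as a cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw)) + bias
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size subareas."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# Made-up 5x5 image with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)  # 4x4: each value sees a 2x2 subarea
pooled = max_pool(feature_map)       # 2x2 after pooling

# The same 4 weights and 1 bias are reused at every position, whereas a
# fully connected layer from 25 inputs to 16 outputs needs 25 * 16 + 16
conv_params = len(kernel) * len(kernel[0]) + 1
dense_params = 25 * 16 + 16
print(conv_params, dense_params)
```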

Recurrent neural networks

Recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) networks, are used for sequential (chain-structured) data (e.g. text).

They are not feedforward networks (i.e. networks for which the information moves only in the forward direction without any loop).
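The loop can be sketched with a minimal scalar recurrent unit: the hidden state computed at one step is fed back in at the next step. This is a plain recurrent unit, not an LSTM, and the weights are hypothetical values picked for illustration.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One step: the new state depends on the input and the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence one element at a time, carrying the state along."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # h loops back into the next step
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are still non-zero:
# the network keeps a fading memory of earlier elements
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

Because each step needs the state of the previous one, the steps have to be computed in order.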

Transformers

A combination of two RNNs (the encoder and the decoder) is used in sequence-to-sequence models for translation or picture captioning.

In 2014, the concept of attention (giving added weight to important words) was developed, greatly improving the ability of such models to handle long sequences.

The problem with recurrence is that it is not easy to parallelize (and thus to run fast on GPUs).

In 2017, a new model—the transformer—was proposed: by using only attention mechanisms and no recurrence, the transformer achieves better results in an easily parallelizable fashion.
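As a rough sketch of the attention mechanism at the core of the transformer, the code below implements scaled dot-product attention with plain Python lists. The token vectors are made up, and a real transformer adds learned projections, multiple heads and other components that are left out here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtracting the max improves numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the values
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Made-up sequence of three 2-dimensional token vectors (self-attention)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = attention(tokens, tokens, tokens)
print(attended)
```

Each query is handled independently of the outputs for the other positions, so the loop over queries could run in parallel, unlike the step-by-step loop of a recurrent network.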

With the addition of transfer learning, powerful transformers emerged in the field of NLP, e.g. Bidirectional Encoder Representations from Transformers (BERT) from Google and Generative Pre-trained Transformer 3 (GPT-3) from OpenAI.