PyTorch tensors
Before information can be processed by algorithms, it needs to be converted to floating point numbers. Indeed, you don’t pass a sentence or an image through a model; instead you input numbers representing a sequence of words or pixel values.
All these floating point numbers need to be stored in a data structure. The best-suited structure is multidimensional (to hold several layers of information) and homogeneous (all data of the same type) for efficiency.
Python already has several multidimensional array structures (e.g. NumPy’s ndarray) but the particularities of deep learning call for special characteristics such as the ability to run operations on GPUs and/or in a distributed fashion, the ability to keep track of computation graphs for automatic differentiation, and different defaults (lower precision for improved training performance).
The PyTorch tensor is a Python data structure with these characteristics that can easily be converted to/from NumPy’s ndarray and integrates well with other Python libraries such as Pandas.
In this section, we will explore the basics of PyTorch tensors.
Importing PyTorch
First of all, we need to import the torch library:
import torch
We can check its version with:
torch.__version__
'2.3.0+cu121'
Creating tensors
There are many ways to create tensors:
- torch.tensor: Input individual values
- torch.arange: 1D tensor with a sequence of integers
- torch.linspace: 1D linear scale tensor
- torch.logspace: 1D log scale tensor
- torch.rand: Random numbers from a uniform distribution on [0, 1)
- torch.randn: Numbers from the standard normal distribution
- torch.randperm: Random permutation of integers
- torch.empty: Uninitialized tensor
- torch.zeros: Tensor filled with 0
- torch.ones: Tensor filled with 1
- torch.eye: Identity matrix
From input values
t = torch.tensor(3)
Your turn:
Without using the shape descriptor, try to get the shape of the following tensors:
torch.tensor([0.9704, 0.1339, 0.4841])
torch.tensor([[0.9524, 0.0354],
              [0.9833, 0.2562],
              [0.0607, 0.6420]])
torch.tensor([[[0.4604, 0.2699],
               [0.8360, 0.0317],
               [0.3289, 0.1171]]])
torch.tensor([[[[0.0730, 0.8737],
                [0.2305, 0.4719],
                [0.0796, 0.2745]]],
              [[[0.1534, 0.9442],
                [0.3287, 0.9040],
                [0.0948, 0.1480]]]])
Let’s create a random tensor with a single element:
t = torch.rand(1)
t
tensor([0.0664])
We can extract the value from a tensor with one element:
t.item()
0.06640791893005371
All these tensors have a single element, but an increasing number of dimensions:
torch.rand(1)
tensor([0.4521])
torch.rand(1, 1)
tensor([[0.4222]])
torch.rand(1, 1, 1)
tensor([[[0.8421]]])
torch.rand(1, 1, 1, 1)
tensor([[[[0.5642]]]])
You can tell the number of dimensions of a tensor easily by counting the number of opening square brackets.
torch.rand(1, 1, 1, 1).dim()
4
Tensors can have multiple elements in one dimension:
torch.rand(6)
tensor([0.4880, 0.4136, 0.6164, 0.7771, 0.3317, 0.9322])
torch.rand(6).dim()
1
And multiple elements in multiple dimensions:
torch.rand(2, 3, 4, 5)
tensor([[[[1.9265e-01, 7.1588e-01, 5.4991e-02, 8.2984e-02, 4.7106e-01],
[1.4702e-01, 7.0770e-01, 3.7774e-01, 1.9632e-01, 3.7828e-01],
[5.3180e-01, 5.0883e-01, 8.8231e-01, 6.6615e-01, 8.9560e-01],
[2.1757e-01, 5.9166e-01, 9.3296e-01, 4.9402e-01, 7.4369e-01]],
[[5.6226e-01, 7.9807e-01, 8.5299e-01, 3.0352e-02, 5.7470e-01],
[6.9126e-01, 1.4833e-03, 1.0773e-01, 2.4625e-01, 3.3941e-01],
[1.1600e-01, 9.9698e-01, 4.1395e-01, 8.2424e-01, 5.0606e-01],
[9.3411e-01, 4.9257e-01, 7.2200e-01, 3.5606e-01, 6.8473e-01]],
[[6.3870e-01, 8.4146e-01, 1.4000e-02, 4.7660e-01, 2.5765e-01],
[3.9077e-01, 7.6622e-02, 5.0639e-01, 3.7614e-02, 3.4253e-02],
[2.3641e-01, 6.4974e-01, 7.0924e-01, 7.3478e-01, 6.9183e-01],
[5.5115e-01, 5.7502e-01, 8.1053e-01, 6.5448e-01, 7.6442e-01]]],
[[[6.6645e-01, 5.6170e-01, 5.5790e-01, 5.9724e-01, 6.7921e-01],
[5.9885e-01, 6.0820e-01, 5.0443e-02, 1.2864e-01, 3.9098e-01],
[8.1274e-01, 7.8897e-01, 4.7621e-01, 8.8376e-02, 2.0044e-01],
[5.5256e-01, 2.6450e-01, 1.5427e-01, 2.6887e-01, 2.2558e-01]],
[[4.1520e-01, 9.7462e-01, 7.5100e-01, 9.9890e-01, 6.8974e-01],
[2.3860e-01, 6.1438e-01, 3.9230e-01, 7.8527e-01, 5.9984e-01],
[5.7508e-01, 7.9849e-02, 8.4372e-01, 1.5977e-01, 1.0906e-01],
[1.7758e-01, 8.3926e-01, 9.9416e-01, 8.6307e-01, 8.6240e-01]],
[[4.6696e-01, 8.9729e-01, 9.9784e-01, 8.6357e-01, 2.0131e-01],
[3.9958e-01, 5.5251e-01, 5.1938e-01, 5.3351e-01, 2.3864e-01],
[9.4331e-01, 8.3029e-05, 6.8900e-01, 5.0304e-01, 1.3088e-01],
[6.5368e-01, 9.8662e-01, 7.8843e-01, 4.3189e-01, 9.8437e-01]]]])
torch.rand(2, 3, 4, 5).dim()
4
torch.rand(2, 3, 4, 5).numel()
120
torch.ones(2, 4)
tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.]])
t = torch.rand(2, 3)
torch.zeros_like(t)  # Matches the size of t
tensor([[0., 0., 0.],
[0., 0., 0.]])
torch.ones_like(t)
tensor([[1., 1., 1.],
[1., 1., 1.]])
torch.randn_like(t)
tensor([[-1.6889, -1.4382, 1.1412],
[ 1.3235, -1.4399, -0.5927]])
torch.arange(2, 10, 3)  # From 2 (included) to 10 (excluded) in increments of 3
tensor([2, 5, 8])
torch.linspace(2, 10, 3)  # 3 elements from 2 to 10 on the linear scale
tensor([ 2., 6., 10.])
torch.logspace(2, 10, 3)  # Same on the log scale: 10**2, 10**6, 10**10
tensor([1.0000e+02, 1.0000e+06, 1.0000e+10])
torch.randperm(3)
tensor([0, 1, 2])
torch.eye(3)
tensor([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
Conversion to/from NumPy
PyTorch tensors stored on the CPU can be converted to NumPy ndarrays and vice versa very efficiently: no data is copied, as both objects share the same memory.
From PyTorch tensor to NumPy ndarray
t = torch.rand(2, 3)
t
tensor([[0.7550, 0.4205, 0.3024],
[0.9266, 0.8816, 0.9083]])
t_np = t.numpy()
t_np
array([[0.7550101 , 0.42052966, 0.30236405],
[0.92664444, 0.88160074, 0.90829116]], dtype=float32)
From NumPy ndarray to PyTorch tensor
import numpy as np
a = np.random.rand(2, 3)
a
array([[0.08169611, 0.69920343, 0.83400031],
[0.40020636, 0.99345611, 0.94510268]])
a_pt = torch.from_numpy(a)
a_pt
tensor([[0.0817, 0.6992, 0.8340],
[0.4002, 0.9935, 0.9451]], dtype=torch.float64)
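Note the different default data types: PyTorch creates 32-bit floats while NumPy creates 64-bit floats, and the conversion preserves the data type in both directions.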
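The memory sharing mentioned above has a practical consequence: changing one object changes the other. Here is a minimal sketch, assuming CPU tensors and using small tensors of zeros chosen purely for illustration:

```python
import numpy as np
import torch

t = torch.zeros(3)          # CPU tensor, float32 by default
t_np = t.numpy()            # No copy: t_np uses the same memory as t

t.add_(1)                   # In-place change on the tensor...
print(t_np)                 # ...is visible through the ndarray

a = np.zeros(3)             # NumPy defaults to float64
a_pt = torch.from_numpy(a)  # No copy either, and the dtype is preserved
a[0] = 5.0                  # Changing the ndarray changes the tensor too
print(a_pt)

# To get an independent ndarray, clone the tensor before converting it
independent = t.clone().numpy()
t.add_(1)
print(independent)          # Not affected by the second add_
```

If you need the two objects to evolve separately, make the copy explicit as in the last part of the example.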
Indexing tensors
t = torch.rand(3, 4)
t
tensor([[0.3933, 0.6787, 0.4420, 0.1485],
[0.1954, 0.8715, 0.7792, 0.6891],
[0.0908, 0.3443, 0.7069, 0.0127]])
t[:, 2]
tensor([0.4420, 0.7792, 0.7069])
t[1, :]
tensor([0.1954, 0.8715, 0.7792, 0.6891])
t[2, 3]
tensor(0.0127)
A word of caution about indexing
While indexing elements of a tensor to extract some of the data as a final step of some computation is fine, you should not use indexing to run operations on tensor elements in a loop as this would be extremely inefficient.
Instead, you want to use vectorized operations.
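As a rough illustration of this advice, the sketch below computes the same quantity twice: once with a Python loop that indexes one element at a time, and once with a single vectorized expression. The tensor size and the operation (doubling, then summing) are arbitrary choices for the example.

```python
import torch

t = torch.rand(1000)

# Slow: a Python loop that indexes the tensor one element at a time
total_loop = 0.0
for i in range(t.numel()):
    total_loop += (t[i] * 2).item()

# Fast: one vectorized expression applied to the whole tensor
total_vec = (t * 2).sum().item()

# Both approaches agree up to floating point rounding
print(abs(total_loop - total_vec) < 1e-2)
```

Timing the two versions (for instance with the timeit module) on larger tensors makes the gap obvious.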
Vectorized operations
Since PyTorch tensors are homogeneous (i.e. made of a single data type), as with NumPy’s ndarrays, operations are vectorized and thus fast.
NumPy is mostly written in C, PyTorch in C++. With either library, when you run vectorized operations on arrays/tensors, you don’t use raw Python (slow) but compiled C/C++ code (much faster).
Here is an excellent post explaining Python vectorization & why it makes such a big difference.
Data types
Default data type
Since PyTorch tensors were built with efficiency in mind for neural networks, the default data type is 32-bit floating point (torch.float32).
This precision is generally sufficient for training neural networks, and it uses half the memory and computes much faster than 64-bit floating point.
By contrast, NumPy ndarrays use 64-bit floating point as their default.
t = torch.rand(2, 4)
t.dtype
torch.float32
Setting data type at creation
The type can be set with the dtype argument:
t = torch.rand(2, 4, dtype=torch.float64)
t
tensor([[0.7931, 0.0869, 0.0231, 0.6726],
[0.1689, 0.2116, 0.7150, 0.2311]], dtype=torch.float64)
Printed tensors only display the attributes whose values differ from the defaults (here, dtype).
t.dtype
torch.float64
Changing data type
t = torch.rand(2, 4)
t.dtype
torch.float32
t2 = t.type(torch.float64)
t2.dtype
torch.float64
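Besides type, the to method and the shorthand conversion methods (such as double) do the same job. The sketch below shows them side by side; the variable names are only for illustration. All of these return a new tensor and leave the original unchanged.

```python
import torch

t = torch.rand(2, 4)                 # float32 by default

t_to = t.to(torch.float64)           # Conversion with the to method
t_double = t.double()                # Shorthand for a float64 conversion
print(t.dtype, t_to.dtype, t_double.dtype)

# Converting floats to integers drops the fractional part
f = torch.tensor([1.7, -1.7])
i = f.to(torch.int64)
print(i)
```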
List of data types
| dtype | Description |
|---|---|
| torch.float16 / torch.half | 16-bit / half-precision floating-point |
| torch.float32 / torch.float | 32-bit / single-precision floating-point |
| torch.float64 / torch.double | 64-bit / double-precision floating-point |
| torch.uint8 | unsigned 8-bit integers |
| torch.int8 | signed 8-bit integers |
| torch.int16 / torch.short | signed 16-bit integers |
| torch.int32 / torch.int | signed 32-bit integers |
| torch.int64 / torch.long | signed 64-bit integers |
| torch.bool | boolean |
Simple operations
t1 = torch.tensor([[1, 2], [3, 4]])
t1
tensor([[1, 2],
[3, 4]])
t2 = torch.tensor([[1, 1], [0, 0]])
t2
tensor([[1, 1],
[0, 0]])
Operation performed between elements at corresponding locations:
t1 + t2
tensor([[2, 3],
[3, 4]])
Operation applied to each element of the tensor:
t1 + 1
tensor([[2, 3],
[4, 5]])
Reduction
t = torch.ones(2, 3, 4)
t
tensor([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
t.sum()  # Reduction over all entries
tensor(24.)
Other reduction functions (e.g. mean) behave the same way.
Reduction over a specific dimension:
t.sum(0)
tensor([[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.]])
t.sum(1)
tensor([[3., 3., 3., 3.],
[3., 3., 3., 3.]])
t.sum(2)
tensor([[4., 4., 4.],
[4., 4., 4.]])
Reduction over multiple dimensions:
t.sum((0, 1))
tensor([6., 6., 6., 6.])
t.sum((0, 2))
tensor([8., 8., 8.])
t.sum((1, 2))
tensor([12., 12.])
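A reduction removes the dimensions it runs over, which is why the shapes of the results above shrink. The sketch below checks this on the same tensor of ones and also shows the keepdim argument, which keeps the reduced dimension with a size of 1 (handy when the result has to line up with the original tensor again).

```python
import torch

t = torch.ones(2, 3, 4)

m = t.mean(0)                     # Mean over dimension 0: shape (3, 4)
s = t.sum(1)                      # Dimension 1 is removed: shape (2, 4)
s_keep = t.sum(1, keepdim=True)   # Dimension 1 is kept with size 1: shape (2, 1, 4)

print(m.shape, s.shape, s_keep.shape)
```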
In-place operations
In-place operations are performed by methods whose names end with _:
t1 = torch.tensor([1, 2])
t1
tensor([1, 2])
t2 = torch.tensor([1, 1])
t2
tensor([1, 1])
t1.add_(t2)
t1
tensor([2, 3])
t1.zero_()
t1
tensor([0, 0])
tensor([0, 0])
While reassignments will use new addresses in memory, in-place operations will use the same addresses.
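One way to observe this is to compare the memory address of a tensor's data before and after each kind of operation. This is a minimal sketch using the data_ptr method, which returns the address of the first element; the variable names are only for illustration.

```python
import torch

t = torch.tensor([1, 2])
addr = t.data_ptr()            # Address of the first element of the data

t.add_(1)                      # In-place: the data stays where it is
same_after_inplace = t.data_ptr() == addr

t = t + 1                      # Reassignment: the result is a new tensor
same_after_reassign = t.data_ptr() == addr

print(same_after_inplace, same_after_reassign)
```

In-place operations save memory, which matters for large tensors.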
Tensor views
t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(t)
t.size()
t.view(6)
t.view(3, 2)
t.view(3, -1)  # Same: with -1, the size is inferred from other dimensions
Note the difference between transposing a tensor and viewing it with a different shape:
t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
t1
tensor([[1, 2, 3],
[4, 5, 6]])
t2 = t1.t()
t2
tensor([[1, 4],
[2, 5],
[3, 6]])
t3 = t1.view(3, 2)
t3
tensor([[1, 2],
[3, 4],
[5, 6]])
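A view does not copy anything: it reads the same data with a different shape. The sketch below illustrates this on the tensors used above and shows that a transpose also shares the data but is no longer contiguous in memory, so it cannot be flattened with view. The reshape method used at the end is not covered in this section; it is included here as the usual fallback, since it copies the data when a view is not possible.

```python
import torch

t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])

t3 = t1.view(3, 2)           # Same data, read row by row with a new shape
t3[0, 0] = 100               # Writing through the view...
print(t1[0, 0])              # ...changes the original tensor

t2 = t1.t()                  # Transpose: shares the data but changes the strides
print(t2.is_contiguous())    # The transpose is not contiguous in memory

# t2.view(6) would raise a RuntimeError because t2 is not contiguous
flat = t2.reshape(6)         # reshape copies the data when a view is not possible
print(flat)
```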
Logical operations
t1 = torch.randperm(5)
t1
tensor([1, 0, 2, 4, 3])
t2 = torch.randperm(5)
t2
tensor([4, 1, 2, 0, 3])
Test each element:
t1 > 3
tensor([False, False, False, True, False])
Test corresponding pairs of elements:
t1 < t2
tensor([ True, True, False, False, False])
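The boolean tensors returned by these tests can be reused directly, for instance to count or select the elements that pass the test. This is a small sketch with fixed values (rather than a random permutation) so that the result is reproducible:

```python
import torch

t1 = torch.tensor([1, 0, 2, 4, 3])

mask = t1 > 3                # Boolean tensor, one value per element
count = mask.sum().item()    # True counts as 1, so this counts the matches
selected = t1[mask]          # Keep only the elements where the mask is True

print(count, selected)
```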
Device attribute
Tensor data can be placed in the memory of various processor types:
- the RAM of CPU,
- the RAM of a GPU with CUDA support,
- the RAM of a GPU with AMD’s ROCm support,
- the RAM of an XLA device (e.g. Cloud TPU) with the torch_xla package.
The values for the device attribute are:
- CPU: 'cpu',
- GPU (CUDA & AMD’s ROCm): 'cuda',
- XLA: xm.xla_device().
This last option requires loading the torch_xla package first:
import torch_xla
import torch_xla.core.xla_model as xm
Creating a tensor on a specific device
By default, tensors are created on the CPU.
You can create a tensor on an accelerator by specifying the device argument (our current training cluster does not have GPUs, so don’t run this on it):
t_gpu = torch.rand(2, device='cuda')
Copying a tensor to a specific device
You can also make copies of a tensor on other devices:
# Make a copy of t on the GPU
t_gpu = t.to(device='cuda')
t_gpu = t.cuda()  # Alternative syntax
# Make a copy of t_gpu on the CPU
t = t_gpu.to(device='cpu')
t = t_gpu.cpu()  # Alternative syntax
Multiple GPUs
If you have multiple GPUs, you can optionally specify which one a tensor should be created on or copied to:
t1 = torch.rand(2, device='cuda:0')  # Create a tensor on 1st GPU
t2 = t1.to(device='cuda:0')  # t1 is already on 1st GPU, so this returns t1 itself
t3 = t1.to(device='cuda:1')  # Make a copy of t1 on 2nd GPU
Or the equivalent short forms:
t2 = t1.cuda(0)
t3 = t1.cuda(1)
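Since the GPU examples above fail on machines without a GPU, a common pattern is to choose the device at run time. The following is a minimal sketch of that pattern, assuming only CPU and CUDA devices are of interest; it runs on any machine.

```python
import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.rand(2, device=device)   # Created directly on the chosen device
t_cpu = t.to('cpu')                # Returns t itself if it is already on the CPU

print(t.device, t_cpu.device)
```

Writing the rest of the code in terms of the device variable keeps it portable between machines with and without GPUs.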