Author

Marie-Hélène Burle

Before information can be processed by algorithms, it needs to be converted to floating point numbers. Indeed, you don’t pass a sentence or an image through a model; instead you input numbers representing a sequence of words or pixel values.
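As a rough illustration of this conversion, here is a minimal sketch. The toy vocabulary and the 2×2 grayscale image are hypothetical examples made up for this illustration; real pipelines use tokenizers and image loaders.

```python
import torch

# Hypothetical toy vocabulary mapping each word to an integer index
vocab = {"the": 0, "cat": 1, "sat": 2}
sentence = ["the", "cat", "sat"]
token_ids = torch.tensor([vocab[word] for word in sentence])

# Hypothetical 2x2 grayscale image stored as 8-bit pixel values
pixels = torch.tensor([[0, 128], [255, 64]], dtype=torch.uint8)

# Convert the pixel values to floating point numbers in [0, 1]
image = pixels.to(torch.float32) / 255

print(token_ids)
print(image)
```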

All these floating point numbers need to be stored in a data structure. The most suitable structure is multidimensional (to hold several layers of information) and homogeneous (all data of the same type) for efficiency.

Python already has several multidimensional array structures (e.g. NumPy’s ndarray) but the particularities of deep learning call for special characteristics such as the ability to run operations on GPUs and/or in a distributed fashion, the ability to keep track of computation graphs for automatic differentiation, and different defaults (lower precision for improved training performance).

The PyTorch tensor is a Python data structure with these characteristics that can easily be converted to/from NumPy’s ndarray and integrates well with other Python libraries such as Pandas.

In this section, we will explore the basics of PyTorch tensors.

Importing PyTorch

First of all, we need to import the `torch` library:

``import torch``

We can check its version with:

``torch.__version__``
``'2.3.0+cu121'``

Creating tensors

There are many ways to create tensors:

• `torch.tensor`:   Input individual values
• `torch.arange`:   1D tensor with a sequence of integers
• `torch.linspace`:  1D linear scale tensor
• `torch.logspace`:  1D log scale tensor
• `torch.rand`:     Random numbers from a uniform distribution on `[0, 1)`
• `torch.randn`:     Numbers from the standard normal distribution
• `torch.randperm`:   Random permutation of integers
• `torch.empty`:     Uninitialized tensor
• `torch.zeros`:     Tensor filled with `0`
• `torch.ones`:     Tensor filled with `1`
• `torch.eye`:       Identity matrix

From input values

``t = torch.tensor(3)``

Without using the `shape` attribute, try to work out the shape of the following tensors:

``````torch.tensor([0.9704, 0.1339, 0.4841])

torch.tensor([[0.9524, 0.0354],
[0.9833, 0.2562],
[0.0607, 0.6420]])

torch.tensor([[[0.4604, 0.2699],
[0.8360, 0.0317],
[0.3289, 0.1171]]])

torch.tensor([[[[0.0730, 0.8737],
[0.2305, 0.4719],
[0.0796, 0.2745]]],

[[[0.1534, 0.9442],
[0.3287, 0.9040],
[0.0948, 0.1480]]]])``````

Let’s create a random tensor with a single element:

``````t = torch.rand(1)
t``````
``tensor([0.0664])``

We can extract the value from a tensor with one element:

``t.item()``
``0.06640791893005371``

All these tensors have a single element, but an increasing number of dimensions:

``torch.rand(1)``
``tensor([0.4521])``
``torch.rand(1, 1)``
``tensor([[0.4222]])``
``torch.rand(1, 1, 1)``
``tensor([[[0.8421]]])``
``torch.rand(1, 1, 1, 1)``
``tensor([[[[0.5642]]]])``

You can tell the number of dimensions of a tensor easily by counting the number of opening square brackets.

``torch.rand(1, 1, 1, 1).dim()``
``4``
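The bracket-counting trick can be checked programmatically. The following is a small sketch (the variable names are only for illustration) comparing the number of leading brackets in the printed form with `dim()`, its alias `ndim`, and `shape`:

```python
import torch

t = torch.zeros(2, 3, 4)

# Count the consecutive opening brackets at the start of the printed tensor
text = repr(t)                    # Starts with 'tensor([[[0., 0., ...'
body = text[text.index("["):]
n_brackets = len(body) - len(body.lstrip("["))

n_dims = t.dim()         # Number of dimensions
n_dims_alias = t.ndim    # Equivalent attribute
shape = t.shape          # Size of each dimension

print(n_brackets, n_dims, n_dims_alias, shape)
```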

Tensors can have multiple elements in one dimension:

``torch.rand(6)``
``tensor([0.4880, 0.4136, 0.6164, 0.7771, 0.3317, 0.9322])``
``torch.rand(6).dim()``
``1``

And multiple elements in multiple dimensions:

``torch.rand(2, 3, 4, 5)``
``````tensor([[[[1.9265e-01, 7.1588e-01, 5.4991e-02, 8.2984e-02, 4.7106e-01],
[1.4702e-01, 7.0770e-01, 3.7774e-01, 1.9632e-01, 3.7828e-01],
[5.3180e-01, 5.0883e-01, 8.8231e-01, 6.6615e-01, 8.9560e-01],
[2.1757e-01, 5.9166e-01, 9.3296e-01, 4.9402e-01, 7.4369e-01]],

[[5.6226e-01, 7.9807e-01, 8.5299e-01, 3.0352e-02, 5.7470e-01],
[6.9126e-01, 1.4833e-03, 1.0773e-01, 2.4625e-01, 3.3941e-01],
[1.1600e-01, 9.9698e-01, 4.1395e-01, 8.2424e-01, 5.0606e-01],
[9.3411e-01, 4.9257e-01, 7.2200e-01, 3.5606e-01, 6.8473e-01]],

[[6.3870e-01, 8.4146e-01, 1.4000e-02, 4.7660e-01, 2.5765e-01],
[3.9077e-01, 7.6622e-02, 5.0639e-01, 3.7614e-02, 3.4253e-02],
[2.3641e-01, 6.4974e-01, 7.0924e-01, 7.3478e-01, 6.9183e-01],
[5.5115e-01, 5.7502e-01, 8.1053e-01, 6.5448e-01, 7.6442e-01]]],

[[[6.6645e-01, 5.6170e-01, 5.5790e-01, 5.9724e-01, 6.7921e-01],
[5.9885e-01, 6.0820e-01, 5.0443e-02, 1.2864e-01, 3.9098e-01],
[8.1274e-01, 7.8897e-01, 4.7621e-01, 8.8376e-02, 2.0044e-01],
[5.5256e-01, 2.6450e-01, 1.5427e-01, 2.6887e-01, 2.2558e-01]],

[[4.1520e-01, 9.7462e-01, 7.5100e-01, 9.9890e-01, 6.8974e-01],
[2.3860e-01, 6.1438e-01, 3.9230e-01, 7.8527e-01, 5.9984e-01],
[5.7508e-01, 7.9849e-02, 8.4372e-01, 1.5977e-01, 1.0906e-01],
[1.7758e-01, 8.3926e-01, 9.9416e-01, 8.6307e-01, 8.6240e-01]],

[[4.6696e-01, 8.9729e-01, 9.9784e-01, 8.6357e-01, 2.0131e-01],
[3.9958e-01, 5.5251e-01, 5.1938e-01, 5.3351e-01, 2.3864e-01],
[9.4331e-01, 8.3029e-05, 6.8900e-01, 5.0304e-01, 1.3088e-01],
[6.5368e-01, 9.8662e-01, 7.8843e-01, 4.3189e-01, 9.8437e-01]]]])``````
``torch.rand(2, 3, 4, 5).dim()``
``4``
``torch.rand(2, 3, 4, 5).numel()``
``120``
``torch.ones(2, 4)``
``````tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.]])``````
``````t = torch.rand(2, 3)
torch.zeros_like(t)             # Matches the size of t``````
``````tensor([[0., 0., 0.],
[0., 0., 0.]])``````
``torch.ones_like(t)``
``````tensor([[1., 1., 1.],
[1., 1., 1.]])``````
``torch.randn_like(t)``
``````tensor([[-1.6889, -1.4382,  1.1412],
[ 1.3235, -1.4399, -0.5927]])``````
``torch.arange(2, 10, 3)    # From 2 up to (but excluding) 10 in steps of 3``
``tensor([2, 5, 8])``
``torch.linspace(2, 10, 3)  # 3 elements from 2 to 10 on the linear scale``
``tensor([ 2.,  6., 10.])``
``torch.logspace(2, 10, 3)  # 3 elements from 10^2 to 10^10 on the log scale``
``tensor([1.0000e+02, 1.0000e+06, 1.0000e+10])``
``torch.randperm(3)``
``tensor([0, 1, 2])``
``torch.eye(3)``
``````tensor([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])``````
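The sequence functions differ in how they treat their end value and in the data type they return. A short sketch to make this explicit:

```python
import torch

# arange excludes the end value and returns integers for integer inputs
a = torch.arange(2, 10, 3)       # 2, 5, 8
b = torch.arange(2, 12, 3)       # 2, 5, 8, 11 (11 < 12 so it is included)

# linspace includes both ends and returns floats
l = torch.linspace(2, 10, 3)     # 2., 6., 10.

# logspace returns base 10 raised to the corresponding linspace values
g = torch.logspace(2, 10, 3)     # 1e2, 1e6, 1e10

print(a, a.dtype)
print(l, l.dtype)
```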

Conversion to/from NumPy

PyTorch tensors stored on the CPU can be converted to NumPy ndarrays and vice versa very efficiently because both objects share the same underlying memory: no data is copied.

From PyTorch tensor to NumPy ndarray

``````t = torch.rand(2, 3)
t``````
``````tensor([[0.7550, 0.4205, 0.3024],
[0.9266, 0.8816, 0.9083]])``````
``````t_np = t.numpy()
t_np``````
``````array([[0.7550101 , 0.42052966, 0.30236405],
[0.92664444, 0.88160074, 0.90829116]], dtype=float32)``````

From NumPy ndarray to PyTorch tensor

``````import numpy as np
a = np.random.rand(2, 3)
a``````
``````array([[0.08169611, 0.69920343, 0.83400031],
[0.40020636, 0.99345611, 0.94510268]])``````
``````a_pt = torch.from_numpy(a)
a_pt``````
``````tensor([[0.0817, 0.6992, 0.8340],
[0.4002, 0.9935, 0.9451]], dtype=torch.float64)``````

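Note the different default data types: `float32` for PyTorch and `float64` for NumPy. The conversion preserves the data type of the source object.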
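The memory sharing mentioned at the start of this section can be verified with a short sketch: a change made on one side is visible on the other. For contrast, `torch.tensor()` copies the data and produces an independent tensor.

```python
import numpy as np
import torch

# Tensor -> array: no copy is made
t = torch.zeros(3)
t_np = t.numpy()
t.add_(1)               # In-place change of the tensor...
print(t_np)             # ...is visible from the array

# Array -> tensor: no copy is made either
a = np.zeros(3)
a_pt = torch.from_numpy(a)
a[0] = 5                # Changing the array changes the tensor
print(a_pt)

# torch.tensor() copies the data instead of sharing it
independent = torch.tensor(a)
a[1] = 7
print(independent)      # Not affected by the change to a
```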

Indexing tensors

``````t = torch.rand(3, 4)
t``````
``````tensor([[0.3933, 0.6787, 0.4420, 0.1485],
[0.1954, 0.8715, 0.7792, 0.6891],
[0.0908, 0.3443, 0.7069, 0.0127]])``````
``t[:, 2]``
``tensor([0.4420, 0.7792, 0.7069])``
``t[1, :]``
``tensor([0.1954, 0.8715, 0.7792, 0.6891])``
``t[2, 3]``
``tensor(0.0127)``
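Since random values make results hard to follow, here is a sketch of the same indexing patterns on a tensor with known values, along with a sub-block slice and a negative index:

```python
import torch

t = torch.arange(12).view(3, 4)
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

col = t[:, 2]         # Third column: 2, 6, 10
row = t[1, :]         # Second row: 4, 5, 6, 7
elem = t[2, 3]        # Single element as a 0-dimensional tensor: 11
block = t[:2, 1:3]    # First 2 rows, columns 1 and 2
last_row = t[-1]      # Negative indices count from the end

print(col, row, elem, block, last_row, sep="\n")
```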

A word of caution about indexing

Indexing a tensor to extract some of the data as the final step of a computation is fine. You should not, however, use indexing to run operations on tensor elements in a loop as this would be extremely inefficient.

Instead, you want to use vectorized operations.

Vectorized operations

Since PyTorch tensors are homogeneous (i.e. made of a single data type), as with NumPy’s ndarrays, operations are vectorized and thus fast.

NumPy is mostly written in C, PyTorch in C++. With either library, when you run vectorized operations on arrays/tensors, you don’t use raw Python (slow) but compiled C/C++ code (much faster).
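To get a feel for the difference, the sketch below sums the elements of a tensor with a Python loop over indices and then with the vectorized `sum` method. The tensor size is arbitrary and the timings will vary from machine to machine, so treat the numbers as indicative only.

```python
import time
import torch

t = torch.rand(10_000)

# Python loop indexing one element at a time (slow)
start = time.perf_counter()
total_loop = 0.0
for i in range(len(t)):
    total_loop += t[i].item()
loop_time = time.perf_counter() - start

# Vectorized reduction running in compiled code (fast)
start = time.perf_counter()
total_vec = t.sum().item()
vec_time = time.perf_counter() - start

print(f"Loop:       {total_loop:.2f} in {loop_time:.5f} s")
print(f"Vectorized: {total_vec:.2f} in {vec_time:.5f} s")
```

Both approaches give the same result up to floating point rounding.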

Here is an excellent post explaining Python vectorization & why it makes such a big difference.

Data types

Default data type

Since PyTorch tensors were built with efficiency in mind for neural networks, the default data type is 32-bit floating point.

This precision is sufficient for most neural network computations, and operations are much faster (and use half the memory) compared with 64-bit floating point.

By contrast, NumPy ndarrays default to 64-bit floating point.

``````t = torch.rand(2, 4)
t.dtype``````
``torch.float32``
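The memory cost of each default can be compared directly. This sketch assumes the PyTorch default data type has not been changed in the session:

```python
import numpy as np
import torch

t = torch.rand(2, 4)
a = np.random.rand(2, 4)

default_dtype = torch.get_default_dtype()    # torch.float32 unless changed

bytes_per_elem_torch = t.element_size()      # 4 bytes for float32
bytes_per_elem_numpy = a.itemsize            # 8 bytes for float64

bytes_torch = t.element_size() * t.numel()   # Total size of the tensor data
bytes_numpy = a.nbytes                       # Total size of the array data

print(default_dtype, bytes_torch, bytes_numpy)
```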

Setting data type at creation

The type can be set with the `dtype` argument:

``````t = torch.rand(2, 4, dtype=torch.float64)
t``````
``````tensor([[0.7931, 0.0869, 0.0231, 0.6726],
[0.1689, 0.2116, 0.7150, 0.2311]], dtype=torch.float64)``````

Printed tensors only display the attributes whose values differ from the defaults (here, `dtype=torch.float64`).

``t.dtype``
``torch.float64``

Changing data type

``````t = torch.rand(2, 4)
t.dtype``````
``torch.float32``
``````t2 = t.type(torch.float64)
t2.dtype``````
``torch.float64``
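Besides `type`, the `to` method and shorthand methods such as `double` also convert the data type. The sketch below shows them side by side. Note that none of them modifies the original tensor and that converting floats to integers truncates toward zero.

```python
import torch

t = torch.tensor([0.5, 1.7, -2.9])

t64 = t.to(torch.float64)      # Same conversion as t.type(torch.float64)
t64_short = t.double()         # Shorthand for the same conversion
t_int = t.to(torch.int32)      # Truncates toward zero: 0, 1, -2

print(t.dtype)                 # The original tensor is unchanged
print(t64.dtype, t64_short.dtype, t_int)
```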

List of data types

| dtype | Description |
|---|---|
| `torch.float16` / `torch.half` | 16-bit / half-precision floating-point |
| `torch.float32` / `torch.float` | 32-bit / single-precision floating-point |
| `torch.float64` / `torch.double` | 64-bit / double-precision floating-point |
| `torch.uint8` | unsigned 8-bit integers |
| `torch.int8` | signed 8-bit integers |
| `torch.int16` / `torch.short` | signed 16-bit integers |
| `torch.int32` / `torch.int` | signed 32-bit integers |
| `torch.int64` / `torch.long` | signed 64-bit integers |
| `torch.bool` | boolean |
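The range and precision of each type can be inspected with `torch.finfo` (floating point types) and `torch.iinfo` (integer types). A short sketch:

```python
import torch

# Floating point types: largest value, machine epsilon, number of bits
half_max = torch.finfo(torch.float16).max
single_eps = torch.finfo(torch.float32).eps
single_bits = torch.finfo(torch.float32).bits

# Integer types: smallest and largest representable values
int8_min = torch.iinfo(torch.int8).min
int8_max = torch.iinfo(torch.int8).max
uint8_max = torch.iinfo(torch.uint8).max
int64_max = torch.iinfo(torch.int64).max

print(half_max, single_eps, single_bits)
print(int8_min, int8_max, uint8_max, int64_max)
```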

Simple operations

``````t1 = torch.tensor([[1, 2], [3, 4]])
t1``````
``````tensor([[1, 2],
[3, 4]])``````
``````t2 = torch.tensor([[1, 1], [0, 0]])
t2``````
``````tensor([[1, 1],
[0, 0]])``````

Operation performed between elements at corresponding locations:

``t1 + t2``
``````tensor([[2, 3],
[3, 4]])``````

Operation applied to each element of the tensor:

``t1 + 1``
``````tensor([[2, 3],
[4, 5]])``````
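Adding a scalar is a special case of broadcasting, in which a smaller tensor is stretched to match the shape of a larger one. The sketch below adds a row vector to each row of a matrix. It also contrasts the element-wise product `*` with the matrix product `@`, which are easy to confuse.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])
row = torch.tensor([10, 20])

broadcast_sum = t1 + row    # The row is added to each row of t1
elementwise = t1 * t1       # Product of elements at corresponding locations
matmul = t1 @ t1            # Matrix product

print(broadcast_sum)
print(elementwise)
print(matmul)
```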

Reduction

``````t = torch.ones(2, 3, 4)
t``````
``````tensor([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],

[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])``````
``t.sum()   # Reduction over all entries``
``tensor(24.)``

Other reduction functions (e.g. mean) behave the same way.

Reduction over a specific dimension:

``t.sum(0)``
``````tensor([[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.]])``````
``t.sum(1)``
``````tensor([[3., 3., 3., 3.],
[3., 3., 3., 3.]])``````
``t.sum(2)``
``````tensor([[4., 4., 4.],
[4., 4., 4.]])``````

Reduction over multiple dimensions:

``t.sum((0, 1))``
``tensor([6., 6., 6., 6.])``
``t.sum((0, 2))``
``tensor([8., 8., 8.])``
``t.sum((1, 2))``
``tensor([12., 12.])``
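With a tensor of ones, it is hard to see which values are combined. Here is a sketch with distinct values that also shows `mean`, `max` (which returns both values and indices when given a dimension), and the `keepdim` argument, which keeps the reduced dimension with a size of 1:

```python
import torch

t = torch.arange(6, dtype=torch.float32).view(2, 3)
# tensor([[0., 1., 2.],
#         [3., 4., 5.]])

overall_mean = t.mean()              # Mean of all entries: 2.5
col_means = t.mean(0)                # Mean down each column: 1.5, 2.5, 3.5
row_max = t.max(1)                   # Max of each row, with positions
row_sums = t.sum(1, keepdim=True)    # Shape (2, 1) instead of (2,)

print(overall_mean, col_means)
print(row_max.values, row_max.indices)
print(row_sums, row_sums.shape)
```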

In-place operations

Methods whose names end with `_` operate in place:

``````t1 = torch.tensor([1, 2])
t1``````
``tensor([1, 2])``
``````t2 = torch.tensor([1, 1])
t2``````
``tensor([1, 1])``
``````t1.add_(t2)
t1``````
``tensor([2, 3])``
``````t1.zero_()
t1``````
``tensor([0, 0])``

While reassignments will use new addresses in memory, in-place operations will use the same addresses.
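One way to observe this is with `data_ptr`, which returns the memory address of the first element of a tensor. A minimal sketch:

```python
import torch

t1 = torch.tensor([1, 2])
t2 = torch.tensor([1, 1])

ptr_before = t1.data_ptr()

# In-place operation: the data stays at the same address
t1.add_(t2)
same_memory = t1.data_ptr() == ptr_before

# Reassignment: a new tensor is created at a different address
t1 = t1 + t2
new_memory = t1.data_ptr() != ptr_before

print(same_memory, new_memory)
```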

Tensor views

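``````t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(t)
t.size()       # Size of each dimension: torch.Size([2, 3])
t.view(6)      # 1D view with 6 elements
t.view(3, 2)   # 2D view with 3 rows and 2 columns
t.view(3, -1)  # Same: with -1, the size is inferred from other dimensions``````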

Note the difference between the transpose (`t()`) and a view with the same shape:

``````t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
t1``````
``````tensor([[1, 2, 3],
[4, 5, 6]])``````
``````t2 = t1.t()
t2``````
``````tensor([[1, 4],
[2, 5],
[3, 6]])``````
``````t3 = t1.view(3, 2)
t3``````
``````tensor([[1, 2],
[3, 4],
[5, 6]])``````
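Views and transposes do not copy the data: they are different ways of looking at the same memory. The sketch below illustrates this. It also shows that a transposed tensor is no longer contiguous in memory, so `view` fails on it, while `reshape` works by copying the data when necessary.

```python
import torch

t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
t2 = t1.t()              # Transpose
t3 = t1.view(3, 2)       # View

# Modifying the view also modifies the original and its transpose
t3[0, 0] = 100
print(t1[0, 0], t2[0, 0])

# The transpose is not contiguous, so it cannot be viewed as 1D
contiguous = t2.is_contiguous()
try:
    t2.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape returns a view when possible and a copy otherwise
flat = t2.reshape(6)
print(contiguous, view_failed, flat)
```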

Logical operations

``````t1 = torch.randperm(5)
t1``````
``tensor([1, 0, 2, 4, 3])``
``````t2 = torch.randperm(5)
t2``````
``tensor([4, 1, 2, 0, 3])``

Test each element:

``t1 > 3``
``tensor([False, False, False,  True, False])``

Test corresponding pairs of elements:

``t1 < t2``
``tensor([ True,  True, False, False, False])``
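The Boolean tensors returned by these tests are often used as masks. Here is a sketch with fixed values (the same as in the example outputs above) showing a few common uses:

```python
import torch

t1 = torch.tensor([1, 0, 2, 4, 3])
t2 = torch.tensor([4, 1, 2, 0, 3])

mask = t1 < t2

selected = t1[mask]                  # Elements of t1 where the mask is True
count = mask.sum()                   # True counts as 1, False as 0
smallest = torch.where(mask, t1, t2) # t1 where mask is True, t2 elsewhere

any_equal = (t1 == t2).any()         # Is at least one pair equal?
all_equal = torch.equal(t1, t2)      # Same shape and same values?

print(selected, count, smallest, any_equal, all_equal)
```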

Device attribute

Tensor data can be placed in the memory of various processor types. The possible values for the `device` attribute are:

• CPU:  `'cpu'`,
• GPU (CUDA & AMD’s ROCm):  `'cuda'`,
• XLA:  `xm.xla_device()`.

This last option requires loading the `torch_xla` package first:

``````import torch_xla
import torch_xla.core.xla_model as xm``````

Creating a tensor on a specific device

By default, tensors are created on the CPU.

You can create a tensor on an accelerator by specifying the `device` argument (our current training cluster does not have GPUs, so don’t run this on it):

``t_gpu = torch.rand(2, device='cuda')``

Copying a tensor to a specific device

You can also make copies of a tensor on other devices:

``````# Make a copy of t on the GPU
t_gpu = t.to(device='cuda')
t_gpu = t.cuda()             # Alternative syntax

# Make a copy of t_gpu on the CPU
t = t_gpu.to(device='cpu')
t = t_gpu.cpu()              # Alternative syntax``````
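Code that hardcodes `'cuda'` fails on machines without a GPU. A common pattern, sketched below, is to pick the device once based on what is available and use it everywhere. This version runs on both kinds of machines.

```python
import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.rand(2, 3, device=device)

# NumPy only works with data in CPU memory,
# so tensors must be brought back to the CPU before conversion
t_cpu = t.cpu()
t_np = t_cpu.numpy()

print(t.device, t_cpu.device, t_np.shape)
```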

Multiple GPUs

If you have multiple GPUs, you can optionally specify which one a tensor should be created on or copied to:

``````t1 = torch.rand(2, device='cuda:0')  # Create a tensor on 1st GPU
t2 = t1.to(device='cuda:0')          # Already on 1st GPU: returns t1 itself (no copy)
t3 = t1.to(device='cuda:1')          # Make a copy of t1 on 2nd GPU``````

Or the equivalent short forms:

``````t2 = t1.cuda(0)
t3 = t1.cuda(1)``````
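Requesting a GPU index that does not exist raises an error, so it can help to check how many GPUs are visible first. The following sketch builds the list of available devices (falling back to the CPU when there is no GPU) and creates one tensor on each. It also shows that `to` returns the tensor itself when it is already on the requested device.

```python
import torch

n_gpus = torch.cuda.device_count()

# One device per visible GPU, or the CPU if there is none
devices = [torch.device(f"cuda:{i}") for i in range(n_gpus)] or [torch.device("cpu")]

tensors = [torch.ones(2, device=d) for d in devices]

# No copy is made when the tensor is already on the target device
same_object = tensors[0].to(devices[0]) is tensors[0]

print(n_gpus, [t.device for t in tensors], same_object)
```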