PyTorch tensors

Author

Marie-Hélène Burle

Before information can be processed by algorithms, it needs to be converted to floating point numbers. Indeed, you don’t pass a sentence or an image through a model; instead you input numbers representing a sequence of words or pixel values.

All these floating point numbers need to be stored in a data structure. The most suited structure is multidimensional (to hold several layers of information) and homogeneous—all data of the same type—for efficiency.

Python already has several multidimensional array structures (e.g. NumPy’s ndarray) but the particularities of deep learning call for special characteristics such as the ability to run operations on GPUs and/or in a distributed fashion, the ability to keep track of computation graphs for automatic differentiation, and different defaults (lower precision for improved training performance).

The PyTorch tensor is a Python data structure with these characteristics that can easily be converted to/from NumPy’s ndarray and integrates well with other Python libraries such as Pandas.

In this section, we will explore the basics of PyTorch tensors.

Importing PyTorch

First of all, we need to import the torch library:

import torch

We can check its version with:

torch.__version__
'2.3.0+cu121'

Creating tensors

There are many ways to create tensors:

  • torch.tensor:    Input individual values
  • torch.arange:    1D tensor with a sequence of integers
  • torch.linspace:  1D linear scale tensor
  • torch.logspace:  1D log scale tensor
  • torch.rand:      Random numbers from a uniform distribution on [0, 1)
  • torch.randn:     Numbers from the standard normal distribution
  • torch.randperm:  Random permutation of integers
  • torch.empty:     Uninitialized tensor
  • torch.zeros:     Tensor filled with 0
  • torch.ones:      Tensor filled with 1
  • torch.eye:       Identity matrix

From input values

t = torch.tensor(3)

Your turn:

Without using the shape descriptor, try to get the shape of the following tensors:

torch.tensor([0.9704, 0.1339, 0.4841])

torch.tensor([[0.9524, 0.0354],
        [0.9833, 0.2562],
        [0.0607, 0.6420]])

torch.tensor([[[0.4604, 0.2699],
         [0.8360, 0.0317],
         [0.3289, 0.1171]]])

torch.tensor([[[[0.0730, 0.8737],
          [0.2305, 0.4719],
          [0.0796, 0.2745]]],

        [[[0.1534, 0.9442],
          [0.3287, 0.9040],
          [0.0948, 0.1480]]]])

Let’s create a random tensor with a single element:

t = torch.rand(1)
t
tensor([0.0664])

We can extract the value from a tensor with one element:

t.item()
0.06640791893005371

All these tensors have a single element, but an increasing number of dimensions:

torch.rand(1)
tensor([0.4521])
torch.rand(1, 1)
tensor([[0.4222]])
torch.rand(1, 1, 1)
tensor([[[0.8421]]])
torch.rand(1, 1, 1, 1)
tensor([[[[0.5642]]]])

You can tell the number of dimensions of a tensor easily by counting the number of consecutive opening square brackets at its start.

torch.rand(1, 1, 1, 1).dim()
4

Tensors can have multiple elements in one dimension:

torch.rand(6)
tensor([0.4880, 0.4136, 0.6164, 0.7771, 0.3317, 0.9322])
torch.rand(6).dim()
1

And multiple elements in multiple dimensions:

torch.rand(2, 3, 4, 5)
tensor([[[[1.9265e-01, 7.1588e-01, 5.4991e-02, 8.2984e-02, 4.7106e-01],
          [1.4702e-01, 7.0770e-01, 3.7774e-01, 1.9632e-01, 3.7828e-01],
          [5.3180e-01, 5.0883e-01, 8.8231e-01, 6.6615e-01, 8.9560e-01],
          [2.1757e-01, 5.9166e-01, 9.3296e-01, 4.9402e-01, 7.4369e-01]],

         [[5.6226e-01, 7.9807e-01, 8.5299e-01, 3.0352e-02, 5.7470e-01],
          [6.9126e-01, 1.4833e-03, 1.0773e-01, 2.4625e-01, 3.3941e-01],
          [1.1600e-01, 9.9698e-01, 4.1395e-01, 8.2424e-01, 5.0606e-01],
          [9.3411e-01, 4.9257e-01, 7.2200e-01, 3.5606e-01, 6.8473e-01]],

         [[6.3870e-01, 8.4146e-01, 1.4000e-02, 4.7660e-01, 2.5765e-01],
          [3.9077e-01, 7.6622e-02, 5.0639e-01, 3.7614e-02, 3.4253e-02],
          [2.3641e-01, 6.4974e-01, 7.0924e-01, 7.3478e-01, 6.9183e-01],
          [5.5115e-01, 5.7502e-01, 8.1053e-01, 6.5448e-01, 7.6442e-01]]],


        [[[6.6645e-01, 5.6170e-01, 5.5790e-01, 5.9724e-01, 6.7921e-01],
          [5.9885e-01, 6.0820e-01, 5.0443e-02, 1.2864e-01, 3.9098e-01],
          [8.1274e-01, 7.8897e-01, 4.7621e-01, 8.8376e-02, 2.0044e-01],
          [5.5256e-01, 2.6450e-01, 1.5427e-01, 2.6887e-01, 2.2558e-01]],

         [[4.1520e-01, 9.7462e-01, 7.5100e-01, 9.9890e-01, 6.8974e-01],
          [2.3860e-01, 6.1438e-01, 3.9230e-01, 7.8527e-01, 5.9984e-01],
          [5.7508e-01, 7.9849e-02, 8.4372e-01, 1.5977e-01, 1.0906e-01],
          [1.7758e-01, 8.3926e-01, 9.9416e-01, 8.6307e-01, 8.6240e-01]],

         [[4.6696e-01, 8.9729e-01, 9.9784e-01, 8.6357e-01, 2.0131e-01],
          [3.9958e-01, 5.5251e-01, 5.1938e-01, 5.3351e-01, 2.3864e-01],
          [9.4331e-01, 8.3029e-05, 6.8900e-01, 5.0304e-01, 1.3088e-01],
          [6.5368e-01, 9.8662e-01, 7.8843e-01, 4.3189e-01, 9.8437e-01]]]])
torch.rand(2, 3, 4, 5).dim()
4
torch.rand(2, 3, 4, 5).numel()
120
torch.ones(2, 4)
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])
t = torch.rand(2, 3)
torch.zeros_like(t)             # Matches the size of t
tensor([[0., 0., 0.],
        [0., 0., 0.]])
torch.ones_like(t)
tensor([[1., 1., 1.],
        [1., 1., 1.]])
torch.randn_like(t)
tensor([[-1.6889, -1.4382,  1.1412],
        [ 1.3235, -1.4399, -0.5927]])
torch.arange(2, 10, 3)    # From 2 to 10 (exclusive) in increments of 3
tensor([2, 5, 8])
torch.linspace(2, 10, 3)  # 3 elements from 2 to 10 on the linear scale
tensor([ 2.,  6., 10.])
torch.logspace(2, 10, 3)  # Same on the log scale
tensor([1.0000e+02, 1.0000e+06, 1.0000e+10])
torch.randperm(3)
tensor([0, 1, 2])
torch.eye(3)
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

Conversion to/from NumPy

PyTorch tensors can be converted to NumPy ndarrays and vice-versa in a very efficient manner as both objects share the same memory.

From PyTorch tensor to NumPy ndarray

t = torch.rand(2, 3)
t
tensor([[0.7550, 0.4205, 0.3024],
        [0.9266, 0.8816, 0.9083]])
t_np = t.numpy()
t_np
array([[0.7550101 , 0.42052966, 0.30236405],
       [0.92664444, 0.88160074, 0.90829116]], dtype=float32)
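
Because memory is shared, a change to the tensor shows up in the ndarray as well. A quick illustration using the tensor t created above (no copy is made):

t[0, 0] = 1.0   # Modify the tensor in place
t_np[0, 0]      # The ndarray sees the change since memory is shared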

From NumPy ndarray to PyTorch tensor

import numpy as np
a = np.random.rand(2, 3)
a
array([[0.08169611, 0.69920343, 0.83400031],
       [0.40020636, 0.99345611, 0.94510268]])
a_pt = torch.from_numpy(a)
a_pt
tensor([[0.0817, 0.6992, 0.8340],
        [0.4002, 0.9935, 0.9451]], dtype=torch.float64)

Note the different default data types.

Indexing tensors

t = torch.rand(3, 4)
t
tensor([[0.3933, 0.6787, 0.4420, 0.1485],
        [0.1954, 0.8715, 0.7792, 0.6891],
        [0.0908, 0.3443, 0.7069, 0.0127]])
t[:, 2]
tensor([0.4420, 0.7792, 0.7069])
t[1, :]
tensor([0.1954, 0.8715, 0.7792, 0.6891])
t[2, 3]
tensor(0.0127)

A word of caution about indexing

While indexing elements of a tensor to extract some of the data as a final step of some computation is fine, you should not use indexing to run operations on tensor elements in a loop as this would be extremely inefficient.

Instead, you want to use vectorized operations.

Vectorized operations

Since PyTorch tensors, like NumPy’s ndarrays, are homogeneous (i.e. made of a single data type), operations on them can be vectorized and are thus fast.

NumPy is mostly written in C, PyTorch in C++. With either library, when you run vectorized operations on arrays/tensors, you don’t use raw Python (slow) but compiled C/C++ code (much faster).

Here is an excellent post explaining Python vectorization & why it makes such a big difference.
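
As an illustration (a minimal sketch; exact timings will vary with your hardware), compare looping over the elements of a tensor with the equivalent vectorized operation:

import timeit

t = torch.rand(100_000)

def loop_square(t):
    # Square each element by indexing in a Python loop (slow)
    result = torch.empty_like(t)
    for i in range(t.numel()):
        result[i] = t[i] ** 2
    return result

def vectorized_square(t):
    # Single vectorized operation running in compiled code (fast)
    return t ** 2

timeit.timeit(lambda: loop_square(t), number=1)
timeit.timeit(lambda: vectorized_square(t), number=1)

The vectorized version should be orders of magnitude faster.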

Data types

Default data type

Since PyTorch tensors were built with efficiency in mind for neural networks, the default data type is 32-bit floating point.

This is sufficient accuracy for training and much faster than 64-bit floating point.

By contrast, NumPy ndarrays use 64-bit floating point as their default.

t = torch.rand(2, 4)
t.dtype
torch.float32

Setting data type at creation

The type can be set with the dtype argument:

t = torch.rand(2, 4, dtype=torch.float64)
t
tensor([[0.7931, 0.0869, 0.0231, 0.6726],
        [0.1689, 0.2116, 0.7150, 0.2311]], dtype=torch.float64)

When printed, tensors display any attribute whose value differs from the default.

t.dtype
torch.float64

Changing data type

t = torch.rand(2, 4)
t.dtype
torch.float32
t2 = t.type(torch.float64)
t2.dtype
torch.float64
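
Alternatively, the to method (which we will see again later to move tensors between devices) also accepts a dtype:

t3 = t.to(torch.float64)
t3.dtype
torch.float64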

List of data types

dtype                          Description
torch.float16 / torch.half     16-bit / half-precision floating-point
torch.float32 / torch.float    32-bit / single-precision floating-point
torch.float64 / torch.double   64-bit / double-precision floating-point
torch.uint8                    unsigned 8-bit integers
torch.int8                     signed 8-bit integers
torch.int16 / torch.short      signed 16-bit integers
torch.int32 / torch.int        signed 32-bit integers
torch.int64 / torch.long       signed 64-bit integers
torch.bool                     boolean
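
Note that torch.tensor infers the dtype from the input values, as this small sketch shows:

torch.tensor([1, 2]).dtype        # Python integers give 64-bit integers
torch.int64
torch.tensor([1.0, 2.0]).dtype    # Python floats give 32-bit floats
torch.float32
torch.tensor([True, False]).dtype
torch.bool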

Simple operations

t1 = torch.tensor([[1, 2], [3, 4]])
t1
tensor([[1, 2],
        [3, 4]])
t2 = torch.tensor([[1, 1], [0, 0]])
t2
tensor([[1, 1],
        [0, 0]])

Operation performed between elements at corresponding locations:

t1 + t2
tensor([[2, 3],
        [3, 4]])

Operation applied to each element of the tensor:

t1 + 1
tensor([[2, 3],
        [4, 5]])

Reduction

t = torch.ones(2, 3, 4)
t
tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
t.sum()   # Reduction over all entries
tensor(24.)

Other reduction functions (e.g. mean) behave the same way.
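
For instance, with the same tensor of ones:

t.mean()   # Reduction over all entries
tensor(1.)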

Reduction over a specific dimension:

t.sum(0)
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])
t.sum(1)
tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])
t.sum(2)
tensor([[4., 4., 4.],
        [4., 4., 4.]])

Reduction over multiple dimensions:

t.sum((0, 1))
tensor([6., 6., 6., 6.])
t.sum((0, 2))
tensor([8., 8., 8.])
t.sum((1, 2))
tensor([12., 12.])

In-place operations

These use methods suffixed with an underscore (_):

t1 = torch.tensor([1, 2])
t1
tensor([1, 2])
t2 = torch.tensor([1, 1])
t2
tensor([1, 1])
t1.add_(t2)
t1
tensor([2, 3])
t1.zero_()
t1
tensor([0, 0])

While reassignments will use new addresses in memory, in-place operations will use the same addresses.
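
You can see this by checking the memory address of a tensor’s data with data_ptr() (a quick sketch; the actual addresses will differ on your machine):

t = torch.tensor([1, 2])
t.data_ptr()      # Address of the underlying data

t = t + 1         # Reassignment: a new tensor at a new address
t.data_ptr()

t.add_(1)         # In-place operation: same address as before
t.data_ptr()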

Tensor views

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
t
t.size()
t.view(6)
t.view(3, 2)
t.view(3, -1)  # Same: with -1, the size is inferred from the other dimensions

Note the difference

t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
t1
tensor([[1, 2, 3],
        [4, 5, 6]])
t2 = t1.t()
t2
tensor([[1, 4],
        [2, 5],
        [3, 6]])
t3 = t1.view(3, 2)
t3
tensor([[1, 2],
        [3, 4],
        [5, 6]])

t2 is the transpose of t1: its two dimensions are swapped. t3 is the same data viewed with a new shape: the elements keep their row-major order and are simply redistributed over the new dimensions.

Logical operations

t1 = torch.randperm(5)
t1
tensor([1, 0, 2, 4, 3])
t2 = torch.randperm(5)
t2
tensor([4, 1, 2, 0, 3])

Test each element:

t1 > 3
tensor([False, False, False,  True, False])

Test corresponding pairs of elements:

t1 < t2
tensor([ True,  True, False, False, False])

Device attribute

Tensor data can be placed in the memory of various processor types.

The values for the device attribute are:

  • CPU:  'cpu',
  • GPU (CUDA & AMD’s ROCm):  'cuda',
  • XLA:  xm.xla_device().

This last option requires loading the torch_xla package first:

import torch_xla
import torch_xla.core.xla_model as xm

Creating a tensor on a specific device

By default, tensors are created on the CPU.
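
You can verify this with the device attribute:

t = torch.rand(2)
t.device
device(type='cpu')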

You can create a tensor on an accelerator by specifying the device attribute (our current training cluster does not have GPUs, so don’t run this on it):

t_gpu = torch.rand(2, device='cuda')

Copying a tensor to a specific device

You can also make copies of a tensor on other devices:

# Make a copy of t on the GPU
t_gpu = t.to(device='cuda')
t_gpu = t.cuda()             # Alternative syntax

# Make a copy of t_gpu on the CPU
t = t_gpu.to(device='cpu')
t = t_gpu.cpu()              # Alternative syntax

Multiple GPUs

If you have multiple GPUs, you can optionally specify which one a tensor should be created on or copied to:

t1 = torch.rand(2, device='cuda:0')  # Create a tensor on 1st GPU
t2 = t1.to(device='cuda:0')          # Make a copy of t1 on 1st GPU
t3 = t1.to(device='cuda:1')          # Make a copy of t1 on 2nd GPU

Or the equivalent short forms:

t2 = t1.cuda(0)
t3 = t1.cuda(1)
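
Before doing any of this, you can check whether a GPU is visible to PyTorch and how many there are:

torch.cuda.is_available()   # True if at least one CUDA/ROCm GPU is visible
torch.cuda.device_count()   # Number of visible GPUs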