PyTorch tensors
Before information can be processed by algorithms, it needs to be converted to floating point numbers. Indeed, you don’t pass a sentence or an image through a model; instead you input numbers representing a sequence of words or pixel values.
All these floating point numbers need to be stored in a data structure. The most suited structure is multidimensional (to hold several layers of information) and homogeneous—all data of the same type—for efficiency.
Python already has several multidimensional array structures (e.g. NumPy’s ndarray) but the particularities of deep learning call for special characteristics such as the ability to run operations on GPUs and/or in a distributed fashion, the ability to keep track of computation graphs for automatic differentiation, and different defaults (lower precision for improved training performance).
The PyTorch tensor is a Python data structure with these characteristics that can easily be converted to/from NumPy’s ndarray and integrates well with other Python libraries such as Pandas.
In this section, we will explore the basics of PyTorch tensors.
Importing PyTorch
First of all, we need to import the torch library:

```python
>>> import torch
```

We can check its version with:

```python
>>> torch.__version__
'2.7.0+cu126'
```
Creating tensors
There are many ways to create tensors:
- torch.tensor: input individual values
- torch.arange: 1D tensor with a sequence of integers
- torch.linspace: 1D linear scale tensor
- torch.logspace: 1D log scale tensor
- torch.rand: random numbers from a uniform distribution on [0, 1)
- torch.randn: numbers from the standard normal distribution
- torch.randperm: random permutation of integers
- torch.empty: uninitialized tensor
- torch.zeros: tensor filled with 0
- torch.ones: tensor filled with 1
- torch.eye: identity matrix
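Of these constructors, torch.empty can be surprising: it allocates memory without initializing it, so its contents are whatever happened to be in that memory. A minimal sketch (not from the original examples):

```python
import torch

# torch.empty allocates storage but does not initialize it:
# the values are arbitrary, only the shape is guaranteed
t = torch.empty(2, 3)
print(t.shape)  # torch.Size([2, 3])
```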
From input values
```python
>>> t = torch.tensor(3)
```

Your turn:

Without using the shape descriptor, try to get the shape of the following tensors:

```python
torch.tensor([0.9704, 0.1339, 0.4841])

torch.tensor([[0.9524, 0.0354],
              [0.9833, 0.2562],
              [0.0607, 0.6420]])

torch.tensor([[[0.4604, 0.2699],
               [0.8360, 0.0317],
               [0.3289, 0.1171]]])

torch.tensor([[[[0.0730, 0.8737],
                [0.2305, 0.4719],
                [0.0796, 0.2745]]],
              [[[0.1534, 0.9442],
                [0.3287, 0.9040],
                [0.0948, 0.1480]]]])
```

Let’s create a random tensor with a single element:
```python
>>> t = torch.rand(1)
>>> t
tensor([0.1258])
```
We can extract the value from a tensor with one element:
```python
>>> t.item()
0.1257982850074768
```
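.item() only works on tensors with exactly one element; on anything larger it raises an error. A quick sketch (not from the original):

```python
import torch

t = torch.tensor([3.5])
print(t.item())  # 3.5: works, exactly one element

# On a tensor with more than one element, .item() raises a RuntimeError
try:
    torch.rand(2).item()
except RuntimeError as e:
    print("Error:", e)
```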
All these tensors have a single element, but an increasing number of dimensions:
```python
>>> torch.rand(1)
tensor([0.3900])
>>> torch.rand(1, 1)
tensor([[0.6882]])
>>> torch.rand(1, 1, 1)
tensor([[[0.0423]]])
>>> torch.rand(1, 1, 1, 1)
tensor([[[[0.3623]]]])
```
You can tell the number of dimensions of a tensor easily by counting the number of opening square brackets.
```python
>>> torch.rand(1, 1, 1, 1).dim()
4
```
Tensors can have multiple elements in one dimension:
```python
>>> torch.rand(6)
tensor([0.3194, 0.8324, 0.6842, 0.5462, 0.4335, 0.0477])
>>> torch.rand(6).dim()
1
```
And multiple elements in multiple dimensions:
```python
>>> torch.rand(2, 3, 4, 5)
tensor([[[[0.2751, 0.7491, 0.3606, 0.1847, 0.8210],
          [0.8549, 0.7280, 0.6912, 0.3304, 0.3114],
          [0.4724, 0.8165, 0.2218, 0.6130, 0.3458],
          [0.6167, 0.2413, 0.8206, 0.5638, 0.0965]],

         [[0.9852, 0.8703, 0.9640, 0.4937, 0.9714],
          [0.9394, 0.5743, 0.9706, 0.0757, 0.7892],
          [0.9826, 0.3664, 0.3062, 0.6258, 0.0423],
          [0.0121, 0.7599, 0.6933, 0.6317, 0.8294]],

         [[0.9104, 0.3898, 0.7956, 0.4905, 0.2473],
          [0.0213, 0.9614, 0.4768, 0.8116, 0.2958],
          [0.9169, 0.7930, 0.0436, 0.5157, 0.5013],
          [0.4241, 0.3144, 0.1485, 0.6809, 0.7301]]],


        [[[0.3292, 0.6150, 0.4489, 0.1435, 0.9072],
          [0.5220, 0.7579, 0.6088, 0.5416, 0.7387],
          [0.5016, 0.1188, 0.1102, 0.4963, 0.6499],
          [0.4095, 0.9137, 0.9722, 0.5457, 0.5097]],

         [[0.3042, 0.6062, 0.8467, 0.2048, 0.8266],
          [0.0151, 0.9860, 0.2823, 0.8156, 0.0425],
          [0.9102, 0.9277, 0.8388, 0.1567, 0.0447],
          [0.6520, 0.5048, 0.7269, 0.2211, 0.4119]],

         [[0.6430, 0.9144, 0.4872, 0.4569, 0.4097],
          [0.5599, 0.1621, 0.3895, 0.4058, 0.1664],
          [0.9839, 0.9917, 0.4786, 0.5395, 0.3695],
          [0.9295, 0.4590, 0.2973, 0.9712, 0.3366]]]])
>>> torch.rand(2, 3, 4, 5).dim()
4
>>> torch.rand(2, 3, 4, 5).numel()
120
```
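.numel() is simply the product of the sizes of all dimensions; a quick check (using math.prod, not part of the original example):

```python
import math
import torch

t = torch.rand(2, 3, 4, 5)
# numel() equals the product of all dimension sizes: 2 * 3 * 4 * 5 = 120
print(t.numel() == math.prod(t.shape))  # True
```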
```python
>>> torch.ones(2, 4)
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])
>>> t = torch.rand(2, 3)
>>> torch.zeros_like(t)  # Matches the size of t
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> torch.ones_like(t)
tensor([[1., 1., 1.],
        [1., 1., 1.]])
>>> torch.randn_like(t)
tensor([[ 0.2893, -1.7632, -0.2417],
        [-1.4069,  0.8735,  0.6806]])
>>> torch.arange(2, 10, 3)  # From 2 to 10 (exclusive) in increments of 3
tensor([2, 5, 8])
>>> torch.linspace(2, 10, 3)  # 3 elements from 2 to 10 on the linear scale
tensor([ 2.,  6., 10.])
>>> torch.logspace(2, 10, 3)  # Same on the log scale
tensor([1.0000e+02, 1.0000e+06, 1.0000e+10])
>>> torch.randperm(3)
tensor([2, 0, 1])
>>> torch.eye(3)
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])
```
Conversion to/from NumPy
PyTorch tensors can be converted to NumPy ndarrays and vice-versa in a very efficient manner as both objects share the same memory.
From PyTorch tensor to NumPy ndarray
```python
>>> t = torch.rand(2, 3)
>>> t
tensor([[0.4518, 0.4918, 0.1410],
        [0.9275, 0.2999, 0.2147]])
>>> t_np = t.numpy()
>>> t_np
array([[0.45182675, 0.4917711 , 0.14095235],
       [0.9274815 , 0.29993367, 0.2146874 ]], dtype=float32)
```
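Because the tensor and the ndarray share the same memory, an in-place change to one is visible through the other. A small demonstration (not in the original):

```python
import torch

t = torch.ones(2, 3)
a = t.numpy()    # a shares t's memory
t.add_(1)        # modify the tensor in place...
print(a)         # ...and the change shows up in the ndarray
```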
From NumPy ndarray to PyTorch tensor
```python
>>> import numpy as np
>>> a = np.random.rand(2, 3)
>>> a
array([[0.95829615, 0.14386425, 0.18845223],
       [0.38030131, 0.26575602, 0.55428177]])
>>> a_pt = torch.from_numpy(a)
>>> a_pt
tensor([[0.9583, 0.1439, 0.1885],
        [0.3803, 0.2658, 0.5543]], dtype=torch.float64)
```
Note the different default data types.
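If you want the PyTorch default of float32 rather than the inherited float64, you can convert after the fact; note that the conversion makes a copy, so the memory is no longer shared with the ndarray. A sketch (not in the original):

```python
import numpy as np
import torch

a = np.random.rand(2, 3)          # float64, NumPy's default
t = torch.from_numpy(a).float()   # copy converted to PyTorch's default float32
print(t.dtype)                    # torch.float32
```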
Indexing tensors
```python
>>> t = torch.rand(3, 4)
>>> t
tensor([[0.0526, 0.0594, 0.8536, 0.7605],
        [0.8433, 0.6671, 0.7284, 0.7912],
        [0.1491, 0.4907, 0.3182, 0.5749]])
>>> t[:, 2]
tensor([0.8536, 0.7284, 0.3182])
>>> t[1, :]
tensor([0.8433, 0.6671, 0.7284, 0.7912])
>>> t[2, 3]
tensor(0.5749)
```
A word of caution about indexing
While indexing elements of a tensor to extract some of the data as a final step of some computation is fine, you should not use indexing to run operations on tensor elements in a loop as this would be extremely inefficient.
Instead, you want to use vectorized operations.
Vectorized operations
Since PyTorch tensors are homogeneous (i.e. made of a single data type), as with NumPy’s ndarrays, operations are vectorized and thus fast.
NumPy is mostly written in C, PyTorch in C++. With either library, when you run vectorized operations on arrays/tensors, you don’t use raw Python (slow) but compiled C/C++ code (much faster).
Here is an excellent post explaining Python vectorization & why it makes such a big difference.
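As an illustration (using element-wise squaring as an arbitrary example), both versions below compute the same result, but the loop pays Python interpreter overhead on every element while the vectorized form runs in compiled code:

```python
import torch

t = torch.arange(6, dtype=torch.float32)

# Slow: a Python-level loop over individual elements
squares_loop = torch.empty_like(t)
for i in range(t.numel()):
    squares_loop[i] = t[i] ** 2

# Fast: one vectorized operation on the whole tensor
squares_vec = t ** 2

print(torch.equal(squares_loop, squares_vec))  # True
```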
Data types
Default data type
Since PyTorch tensors were built with efficiency in mind for neural networks, the default data type is 32-bit floating point.
This is sufficient for accuracy and much faster than 64-bit floating point.
By contrast, NumPy ndarrays use 64-bit floating point as their default.
```python
>>> t = torch.rand(2, 4)
>>> t.dtype
torch.float32
```
Setting data type at creation
The type can be set with the dtype argument:
```python
>>> t = torch.rand(2, 4, dtype=torch.float64)
>>> t
tensor([[0.0689, 0.1494, 0.6843, 0.0534],
        [0.7135, 0.0026, 0.4056, 0.5815]], dtype=torch.float64)
```

Printed tensors display any attribute whose value differs from the default.

```python
>>> t.dtype
torch.float64
```
Changing data type
```python
>>> t = torch.rand(2, 4)
>>> t.dtype
torch.float32
>>> t2 = t.type(torch.float64)
>>> t2.dtype
torch.float64
```
List of data types
| dtype | Description |
|---|---|
| torch.float16 / torch.half | 16-bit / half-precision floating-point |
| torch.float32 / torch.float | 32-bit / single-precision floating-point |
| torch.float64 / torch.double | 64-bit / double-precision floating-point |
| torch.uint8 | unsigned 8-bit integers |
| torch.int8 | signed 8-bit integers |
| torch.int16 / torch.short | signed 16-bit integers |
| torch.int32 / torch.int | signed 32-bit integers |
| torch.int64 / torch.long | signed 64-bit integers |
| torch.bool | boolean |
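One practical consequence of the integer dtypes (an aside, not from the original): arithmetic that exceeds the range of the type wraps around silently rather than raising an error:

```python
import torch

t = torch.tensor([250, 251], dtype=torch.uint8)
# 260 and 261 do not fit in 8 bits, so the values wrap modulo 256
print(t + 10)  # tensor([4, 5], dtype=torch.uint8)
```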
Simple operations
```python
>>> t1 = torch.tensor([[1, 2], [3, 4]])
>>> t1
tensor([[1, 2],
        [3, 4]])
>>> t2 = torch.tensor([[1, 1], [0, 0]])
>>> t2
tensor([[1, 1],
        [0, 0]])
```
Operation performed between elements at corresponding locations:
```python
>>> t1 + t2
tensor([[2, 3],
        [3, 4]])
```
Operation applied to each element of the tensor:
```python
>>> t1 + 1
tensor([[2, 3],
        [4, 5]])
```
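Adding a scalar is a special case of broadcasting: compatible shapes are expanded automatically. For instance, a 1D tensor can be added to each row of a 2D tensor (a sketch, not from the original):

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])
row = torch.tensor([10, 20])
# row is broadcast across both rows of t1
print(t1 + row)
# tensor([[11, 22],
#         [13, 24]])
```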
Reduction
```python
>>> t = torch.ones(2, 3, 4)
>>> t
tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
>>> t.sum()  # Reduction over all entries
tensor(24.)
```
Other reduction functions (e.g. mean) behave the same way.
Reduction over a specific dimension:
```python
>>> t.sum(0)
tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])
>>> t.sum(1)
tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.]])
>>> t.sum(2)
tensor([[4., 4., 4.],
        [4., 4., 4.]])
```
Reduction over multiple dimensions:
```python
>>> t.sum((0, 1))
tensor([6., 6., 6., 6.])
>>> t.sum((0, 2))
tensor([8., 8., 8.])
>>> t.sum((1, 2))
tensor([12., 12.])
```
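If you need the reduced dimension to survive with size 1 (e.g. for later broadcasting), the reduction functions accept a keepdim argument; a quick sketch (not from the original):

```python
import torch

t = torch.ones(2, 3, 4)
print(t.sum(1).shape)                # torch.Size([2, 4])
print(t.sum(1, keepdim=True).shape)  # torch.Size([2, 1, 4])
```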
In-place operations
With operators suffixed with _:

```python
>>> t1 = torch.tensor([1, 2])
>>> t1
tensor([1, 2])
>>> t2 = torch.tensor([1, 1])
>>> t2
tensor([1, 1])
>>> t1.add_(t2)
tensor([2, 3])
>>> t1.zero_()
tensor([0, 0])
```
While reassignments will use new addresses in memory, in-place operations will use the same addresses.
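You can observe this with data_ptr(), which returns the memory address of a tensor's first element (a check not in the original):

```python
import torch

t = torch.tensor([1, 2])
addr = t.data_ptr()

t.add_(1)                    # in-place: same storage
print(t.data_ptr() == addr)  # True

t2 = t + 1                   # regular operation: new storage
print(t2.data_ptr() == addr) # False
```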
Tensor views
```python
>>> t = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> print(t)
tensor([[1, 2, 3],
        [4, 5, 6]])
>>> t.size()
torch.Size([2, 3])
>>> t.view(6)
tensor([1, 2, 3, 4, 5, 6])
>>> t.view(3, 2)
tensor([[1, 2],
        [3, 4],
        [5, 6]])
>>> t.view(3, -1)  # Same: with -1, the size is inferred from the other dimensions
tensor([[1, 2],
        [3, 4],
        [5, 6]])
```

Note the difference:
```python
>>> t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> t1
tensor([[1, 2, 3],
        [4, 5, 6]])
>>> t2 = t1.t()
>>> t2
tensor([[1, 4],
        [2, 5],
        [3, 6]])
>>> t3 = t1.view(3, 2)
>>> t3
tensor([[1, 2],
        [3, 4],
        [5, 6]])
```
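Both results have shape (3, 2), but they are different tensors: t() reorders the elements (rows become columns), while view() just re-wraps the same row-major data into a new shape. A quick check (not from the original):

```python
import torch

t1 = torch.tensor([[1, 2, 3], [4, 5, 6]])
t2 = t1.t()          # transpose: rows become columns
t3 = t1.view(3, 2)   # reshape: same element order, new shape
print(torch.equal(t2, t3))  # False
```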
Logical operations
```python
>>> t1 = torch.randperm(5)
>>> t1
tensor([2, 4, 0, 3, 1])
>>> t2 = torch.randperm(5)
>>> t2
tensor([0, 3, 1, 2, 4])
```
Test each element:
```python
>>> t1 > 3
tensor([False, True, False, False, False])
```
Test corresponding pairs of elements:
```python
>>> t1 < t2
tensor([False, False, True, False, True])
```
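These boolean tensors can be used as masks to select elements, a very common pattern (a sketch reusing the t1 values above):

```python
import torch

t1 = torch.tensor([2, 4, 0, 3, 1])
mask = t1 > 2
print(t1[mask])  # tensor([4, 3]): only the elements where the condition holds
```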
Device attribute
Tensor data can be placed in the memory of various processor types:
- the RAM of CPU,
- the RAM of a GPU with CUDA support,
- the RAM of a GPU with AMD’s ROCm support,
- the RAM of an XLA device (e.g. Cloud TPU) with the torch_xla package.
The values for the device attribute are:

- CPU: 'cpu'
- GPU (CUDA & AMD’s ROCm): 'cuda'
- XLA: xm.xla_device()

This last option requires loading the torch_xla package first:

```python
import torch_xla
import torch_xla.core.xla_model as xm
```

Creating a tensor on a specific device
By default, tensors are created on the CPU.
You can create a tensor on an accelerator by specifying the device attribute (our current training cluster does not have GPUs, so don’t run this on it):
```python
t_gpu = torch.rand(2, device='cuda')
```

Copying a tensor to a specific device
You can also make copies of a tensor on other devices:
```python
# Make a copy of t on the GPU
t_gpu = t.to(device='cuda')
t_gpu = t.cuda()  # Alternative syntax

# Make a copy of t_gpu on the CPU
t = t_gpu.to(device='cpu')
t = t_gpu.cpu()  # Alternative syntax
```

Multiple GPUs
If you have multiple GPUs, you can optionally specify which one a tensor should be created on or copied to:
```python
t1 = torch.rand(2, device='cuda:0')  # Create a tensor on the 1st GPU
t2 = t1.to(device='cuda:0')          # Make a copy of t1 on the 1st GPU
t3 = t1.to(device='cuda:1')          # Make a copy of t1 on the 2nd GPU
```

Or the equivalent short forms:

```python
t2 = t1.cuda(0)
t3 = t1.cuda(1)
```
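A common pattern is to write device-agnostic code that uses an accelerator when one is present and falls back to the CPU otherwise (a sketch, not from the original; it runs safely on a CPU-only machine):

```python
import torch

# Pick a device at runtime
device = 'cuda' if torch.cuda.is_available() else 'cpu'
t = torch.rand(2, 3, device=device)
print(t.device)
```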