Collections

Author

Marie-Hélène Burle

Values can be stored in collections. This section introduces lists, tuples, sets, dictionaries, and arrays in Python, and shows how strings share some of the behaviour of lists.

Lists

Lists are declared in square brackets:

l = [2, 1, 3]
type(l)
list

They are ordered:

['b', 'a'] == ['a', 'b']
False

They can have repeat values:

['a', 'a', 'a', 't']
['a', 'a', 'a', 't']

They can be homogeneous:

['b', 'a', 'x', 'e']
['b', 'a', 'x', 'e']
type('b') == type('a') == type('x') == type('e')
True

or heterogeneous:

[3, 'some string', 2.9, 'z']
[3, 'some string', 2.9, 'z']
type(3) == type('some string') == type(2.9) == type('z')
False

They can even be nested:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8]

The length of a list is the number of items it contains and can be obtained with the function len:

len([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])
3

Your turn:

What are the 3 elements of this list?
What are their respective types?

To extract an item from a list, you index it:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][0]
3

Python starts indexing at 0, so what we tend to think of as the “first” element of a list is for Python the “zeroth” element.

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][1]
['b', 'e', 3.9, ['some string', 9.9]]
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][2]
8

Of course you can’t extract items that don’t exist:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][3]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[342], line 1
----> 1 [3, ['b', 'e', 3.9, ['some string', 9.9]], 8][3]

IndexError: list index out of range

You can index from the end of the list with negative values (here you start at -1 for the last element):

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][-1]
8
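A negative index -k is shorthand for len(list) - k. Here is a quick check of that equivalence, using an illustrative variable named items that holds the same list:

```python
# Negative indices count back from the end: items[-k] is items[len(items) - k]
items = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]

last = items[-1]            # same as items[len(items) - 1], i.e. items[2]
second_to_last = items[-2]  # same as items[1]

print(last)            # 8
print(second_to_last)  # ['b', 'e', 3.9, ['some string', 9.9]]
print(items[-1] == items[len(items) - 1])  # True
```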

Your turn:

How could you extract the string 'some string' from the list:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8]

You can also slice a list:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][0:1]
[3]

Notice how slicing returns a list.
Notice also how the left index is included but the right index excluded.

If you omit the first index, the slice starts at the beginning of the list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][:6]
[1, 2, 3, 4, 5, 6]

If you omit the second index, the slice goes to the end of the list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][6:]
[7, 8, 9]
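Slicing is more forgiving than indexing: a slice bound past the end of the list does not raise an IndexError. This sketch (with an illustrative list named nums) also shows that cutting a list at any position n and concatenating the two halves gives back the original list:

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Unlike indexing, slicing past the end does not raise an IndexError
print(nums[6:100])  # [7, 8, 9]
print(nums[20:])    # []

# Splitting at any position n and concatenating gives back the original list
n = 6
print(nums[:n] + nums[n:] == nums)  # True

# Negative values also work in slices: here, the last three items
print(nums[-3:])    # [7, 8, 9]
```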

Your turn:

From the list:

l = [1, 2, 3, 4, 5, 6, 7, 8, 9]

Extract the list [4, 5, 6].

When slicing, you can specify the stride (the step size):

[1, 2, 3, 4, 5, 6, 7, 8, 9][2:7:2]
[3, 5, 7]

The default stride is 1:

[1, 2, 3, 4, 5, 6, 7, 8, 9][2:7] == [1, 2, 3, 4, 5, 6, 7, 8, 9][2:7:1]
True

You can reverse the order of a list with a -1 stride applied on the whole list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]

You can test whether an item is in a list:

3 in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
True
9 in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
False

or not in a list:

3 not in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
False

You can get the index (position) of an item inside a list:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8].index(3)
0

Note that this only returns the index of the first occurrence:

[3, 3, ['b', 'e', 3.9, ['some string', 9.9]], 8].index(3)
0
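Two related details, sketched with an illustrative list named values: list.count gives the number of occurrences, list.index accepts an optional start position, and list.index raises a ValueError when the item is absent, so testing membership first can be useful:

```python
values = [3, 3, ['b', 'e', 3.9, ['some string', 9.9]], 8]

# list.count returns the number of occurrences
print(values.count(3))     # 2

# An optional second argument tells index where to start searching
print(values.index(3, 1))  # 1

# list.index raises a ValueError when the item is absent,
# so test membership first if you are not sure it is there
target = 9
if target in values:
    print(values.index(target))
else:
    print(f"{target} is not in the list")  # this branch runs
```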

Lists are mutable (they can be modified), so you can replace items in a list by other items:

L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L[1] = 2
print(L)
[3, 2, 8]

You can delete items from a list using their indices with list.pop:

L.pop(2)
print(L)
[3, 2]

Your turn:

Why is 2 still in the list after using L.pop(2)?

You can also delete items by index with del:

del L[0]
print(L)
[2]

Notice how a list can have a single item:

len(L)
1

It is then called a “singleton list”.

You can also delete items from a list using their values with list.remove:

L.remove(2)
L
[]

Here, because we are using list.remove, 2 represents the value 2, not the index.

Notice how a list can even be empty:

len(L)
0
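Two more details about list.remove, shown on an illustrative list named letters: it only removes the first occurrence of a value, and it raises a ValueError if the value is not in the list:

```python
letters = ['a', 'b', 'a', 'c']

letters.remove('a')  # removes only the first 'a'
print(letters)       # ['b', 'a', 'c']

# Removing a value that is not in the list raises a ValueError
try:
    letters.remove('z')
except ValueError:
    print("'z' is not in the list")
```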

You can actually initialise empty lists:

M = []
type(M)
list

You can add items to a list one at a time with list.append:

L.append(7)
print(L)
[7]

And if you want to add multiple items at once?

# This doesn't work...
L.append(3, 6, 9)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[364], line 2
      1 # This doesn't work...
----> 2 L.append(3, 6, 9)

TypeError: list.append() takes exactly one argument (3 given)
# This doesn't work either (that's not what we wanted)
L.append([3, 6, 9])
print(L)
[7, [3, 6, 9]]

Your turn:

Fix this mistake we just made and delete the nested list [3, 6, 9].

To add multiple values to a list (and not a nested list), you need to use list.extend:

L.extend([3, 6, 9])
print(L)
[7, [3, 6, 9], 3, 6, 9]
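To compare the options side by side, here is a sketch starting each time from a fresh list [7] (the variable names are only illustrative). The in-place operator += behaves like list.extend here, and extend accepts any iterable, not only lists:

```python
with_append = [7]
with_append.append([3, 6, 9])  # adds ONE item, which happens to be a list
print(with_append)             # [7, [3, 6, 9]]

with_extend = [7]
with_extend.extend([3, 6, 9])  # adds each item of the iterable
print(with_extend)             # [7, 3, 6, 9]

with_plus = [7]
with_plus += [3, 6, 9]         # in-place concatenation: same result as extend here
print(with_plus)               # [7, 3, 6, 9]

from_string = [7]
from_string.extend('ab')       # a string is iterated character by character
print(from_string)             # [7, 'a', 'b']
```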

If you don’t want to add an item at the end of a list, you can use list.insert(<index>, <object>).
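A small sketch of list.insert on an illustrative list named letters: the new object ends up before the item that was at the given index, and an index past the end simply appends:

```python
letters = ['a', 'b', 'c']

# insert(i, x) puts x *before* the item currently at index i
letters.insert(1, 'new')
print(letters)  # ['a', 'new', 'b', 'c']

# An index past the end of the list simply appends
letters.insert(100, 'end')
print(letters)  # ['a', 'new', 'b', 'c', 'end']
```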

Your turn:

Between which elements will 'test' be inserted?

You can sort homogeneous lists:

L = [3, 9, 10, 0]
L.sort()
print(L)
[0, 3, 9, 10]
L = ['some string', 'b', 'a']
L.sort()
print(L)
['a', 'b', 'some string']

A list whose items cannot be compared with each other (here, integers and a list) cannot be sorted:

L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L.sort()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[369], line 2
      1 L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
----> 2 L.sort()

TypeError: '<' not supported between instances of 'list' and 'int'
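The built-in sorted function is a close relative of list.sort: it returns a new list instead of sorting in place. Both accept reverse and key arguments. As a sketch, key=str compares items through their string representation, which makes the mixed list above sortable (whether that order is meaningful is up to you):

```python
numbers = [3, 9, 10, 0]

# sorted() returns a new list and leaves the original untouched
print(sorted(numbers))                # [0, 3, 9, 10]
print(numbers)                        # [3, 9, 10, 0]

# reverse=True sorts in descending order
print(sorted(numbers, reverse=True))  # [10, 9, 3, 0]

# key= compares items through a function; here, through str(),
# so integers and a nested list can be ordered together
mixed = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
print(sorted(mixed, key=str))         # [3, 8, ['b', 'e', 3.9, ['some string', 9.9]]]
```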

You can also get the min and max value of homogeneous lists:

min([3, 9, 10, 0])
0
max(['some string', 'b', 'a'])
'some string'

If the items cannot be compared with each other, this doesn’t work either:

min([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[372], line 1
----> 1 min([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])

TypeError: '<' not supported between instances of 'list' and 'int'

Lists can be concatenated with +:

L + [3, 6, 9]
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8, 3, 6, 9]

or repeated with *:

L * 3
[3,
 ['b', 'e', 3.9, ['some string', 9.9]],
 8,
 3,
 ['b', 'e', 3.9, ['some string', 9.9]],
 8,
 3,
 ['b', 'e', 3.9, ['some string', 9.9]],
 8]

Your turn:

Do you remember this exercise from an earlier section?

a = 1
b = a
a = 2

What do you think the value of b is now?

Here is a new exercise for you:

a = [0, 1, 2]
b = a
a.append(3)

What do you think the value of b is now?

Wait, what? 😵

This is because, in Python, a scalar such as an integer is immutable, meaning that you can’t change its value in place.

So in the first example, a = 2 does not modify the object 1: it makes a point to a new object (2), while b still points to the original object 1.

Lists however are mutable. So when we run a.append(3), we are not creating a new object. Instead we are modifying the existing object. Since both a and b point to that object, if the object changes, the value of both a and b changes.
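The is operator tests whether two names point to the very same object. This sketch replays the list exercise and then rebinds a, to contrast modifying an object with pointing a name at a new one:

```python
a = [0, 1, 2]
b = a            # b is another name for the same list object
print(b is a)    # True: both names point to one object

a.append(3)      # mutates that single object...
print(b)         # [0, 1, 2, 3] ...so the change is visible through b

a = [9]          # rebinding makes a point to a new object
print(b is a)    # False
print(b)         # [0, 1, 2, 3]: b still points to the original list
```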

In this case, if you want b to be an independent copy of a, you can use the copy function from the copy module (we will talk about modules later in the course):

import copy

a = [0, 1, 2]
b = copy.copy(a)  # new list with the same items (the list method a.copy() would also work here)
a.append(3)

print(b)
[0, 1, 2]
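copy.copy makes a shallow copy: the outer list is new, but nested objects are still shared. When a list contains other mutable objects, copy.deepcopy copies those too. A sketch with an illustrative nested list:

```python
import copy

a = [0, [1, 2]]
shallow = copy.copy(a)   # new outer list, but the inner list is shared
deep = copy.deepcopy(a)  # new outer list AND new copies of nested objects

a[1].append(3)           # mutate the nested list through a

print(shallow)  # [0, [1, 2, 3]]: shares the inner list with a
print(deep)     # [0, [1, 2]]: fully independent
```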

To sum up, lists are declared in square brackets. They are mutable, ordered (thus indexable), and possibly heterogeneous collections of values.

List comprehensions

List comprehensions let you create lists by applying a function (or any expression) to each element of an iterable, optionally keeping only the elements that satisfy a condition.

Examples:

Create a new list by applying a function to each element of a first list:

l = [-3, 5, -2, 0, 9]
l2 = [x**2 for x in l]
print(l2)
[9, 25, 4, 0, 81]

Create a new list by testing a condition on each element of a first list:

l3 = [x for x in l if x<=0]
print(l3)
[-3, -2, 0]

Create a new list by applying a function to each element of a first list matching a condition:

l4 = [x**2 for x in l if x<=0]
print(l4)
[9, 4, 0]

Flatten a list with two for statements:

nested_l = [[1, 2], [3], [4, 5, 6]]
flat_l = [x for y in nested_l for x in y]
print(flat_l)
[1, 2, 3, 4, 5, 6]

By adding more for statements, you can flatten more deeply nested lists:

l = [[[3, 4], [4]]]
[x for y in l for z in y for x in z]
[3, 4, 4]
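To see what a comprehension does, it can help to write out the equivalent for loop. This sketch also shows a conditional expression (x if ... else ...) placed before the for, which transforms every element instead of filtering some out:

```python
l = [-3, 5, -2, 0, 9]

# The comprehension [x**2 for x in l if x <= 0] ...
squares_of_non_positive = [x**2 for x in l if x <= 0]

# ... is equivalent to this explicit loop:
result = []
for x in l:
    if x <= 0:
        result.append(x**2)

print(squares_of_non_positive == result)  # True

# A conditional expression before the for keeps every element but transforms it
clipped = [x if x > 0 else 0 for x in l]
print(clipped)  # [0, 5, 0, 0, 9]
```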

Strings

Strings behave (a little) like lists of characters in that they have a length (the number of characters):

S = 'This is a string.'
len(S)
17

They have a min and a max:

min(S)
' '
max(S)
't'

You can index them:

S[3]
's'

Slice them:

S[10:16]
'string'

Your turn:

Reverse the order of the string S.

They can also be concatenated with +:

T = 'This is another string.'
print(S + ' ' + T)
This is a string. This is another string.

or repeated with *:

print(S * 3)
This is a string.This is a string.This is a string.

Your turn:

Modify the last expression to have spaces after the periods.

This is where the similarities stop, however: strings are immutable, so list methods that modify a list in place, such as list.sort or list.append, do not exist for strings.

S.append('This will fail.')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[388], line 1
----> 1 S.append('This will fail.')

AttributeError: 'str' object has no attribute 'append'
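Since strings can’t be modified in place, string methods return new strings instead. A sketch of the usual workarounds (sorting the characters with sorted and joining them back is one common idiom, not the only one):

```python
S = 'This is a string.'

# Item assignment fails: strings are immutable
try:
    S[0] = 't'
except TypeError as err:
    print(err)  # 'str' object does not support item assignment

# String methods return *new* strings; S itself is unchanged
print(S.replace('string', 'sentence'))  # This is a sentence.
print(S.upper())                        # THIS IS A STRING.
print(S)                                # This is a string.

# To "sort" a string: sort its characters into a list, then join them back
print(''.join(sorted('string')))        # ginrst
```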

Arrays

Python comes with a built-in array module. Its arrays hold values of a single numeric type, stored compactly. When you only need arrays for storing and retrieving data, this module is perfectly suitable and very lightweight. This tutorial covers the syntax in detail.

Whenever you plan on performing calculations on your data however (which is the vast majority of cases), you should instead use the NumPy package. We will talk about NumPy briefly in another section.
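For a first taste of the array module, here is a minimal sketch: the type code 'i' (signed int) fixes the type of every item, most list-like operations work, and items of another type are rejected:

```python
from array import array

# 'i' is the type code for signed int: every item must be an int
a = array('i', [3, 1, 4])
a.append(1)        # list-like operations work
print(a)           # array('i', [3, 1, 4, 1])
print(a[1:3])      # slicing returns an array: array('i', [1, 4])
print(a.tolist())  # [3, 1, 4, 1]

# Items of another type are rejected
try:
    a.append('text')
except TypeError as err:
    print('TypeError:', err)
```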

Tuples

Tuples are defined with parentheses:

t = (3, 1, 4, 2)
type(t)
tuple

They are ordered:

(2, 3) == (3, 2)
False

This means that they are indexable and sliceable:

(2, 4, 6)[2]
6
(2, 4, 6)[::-1]
(6, 4, 2)

They can be nested:

type((3, 1, (0, 2)))
tuple
len((3, 1, (0, 2)))
3
max((3, 1, 2))
3

They can be heterogeneous:

type(('string', 2, True))
tuple

You can create empty tuples:

type(())
tuple

You can also create singleton tuples, but the syntax is a bit odd because it is the comma, not the parentheses, that makes a tuple:

# This is not a tuple...
type((1))
int
# This is the weird way to define a singleton tuple
type((1,))
tuple
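Because the comma is what makes a tuple, the parentheses can often be left out entirely. A short sketch:

```python
t = 3, 1, 4       # no parentheses needed
print(type(t))    # <class 'tuple'>

single = 1,       # a trailing comma alone makes a singleton tuple
print(single)     # (1,)

print(type((1)))  # <class 'int'>: these parentheses only group
print(len((1,)))  # 1
```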

However, the big difference with lists is that tuples are immutable:

T = (2, 5)
T[0] = 8
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[401], line 2
      1 T = (2, 5)
----> 2 T[0] = 8

TypeError: 'tuple' object does not support item assignment

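Tuples make it possible to assign several variables at once, or to swap two values, in a single line (the parentheses can be omitted here):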

a, b = 1, 2
print(a, b)
1 2
a, b = b, a
print(a, b)
2 1
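This tuple unpacking generalizes. In the sketch below (with an illustrative tuple named point), the number of names must match the number of items, unless one name is starred, in which case it collects the remaining items into a list:

```python
point = (3, 1, 4, 2)

# Unpack a tuple into several names (the counts must match)
x, y, z, w = point
print(x, w)         # 3 2

# A starred name collects the remaining items into a list
first, *rest = point
print(first, rest)  # 3 [1, 4, 2]

# Unpacking with the wrong number of names fails
try:
    x, y = point
except ValueError:
    print('too many values to unpack')
```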

Tuples are declared in parentheses. They are immutable, ordered (thus indexable), and possibly heterogeneous collections of values.

Sets

Sets are declared in curly braces:

s = {3, 2, 5}
type(s)
set

They are unordered:

{2, 4, 1} == {4, 2, 1}
True

Consequently, it makes no sense to index a set:

s[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[407], line 1
----> 1 s[0]

TypeError: 'set' object is not subscriptable

Sets can be heterogeneous:

S = {2, 'a', 'string'}
isinstance(S, set)
True
type(2) == type('a') == type('string')
False

There are no duplicates in a set:

{2, 2, 'a', 2, 'string', 'a'}
{2, 'a', 'string'}

You can define an empty set, but only with the set function (because empty curly braces define a dictionary as we will see below):

t = set()
len(t)
0
type(t)
set

Since strings are iterables, you can use set to get a set of their unique characters:

set('abba')
{'a', 'b'}

Your turn:

How could you create a set with the single element 'abba' in it?

Sets are declared in curly braces. They are mutable, unordered (thus non-indexable), possibly heterogeneous collections of unique values.
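Since sets are mutable, you can add and discard elements in place, and they support the usual set operations. A sketch with two illustrative sets, s1 and s2 (note also that set elements must be hashable, which rules out lists):

```python
s1 = {1, 2, 3}
s2 = {3, 4}

print(s1 | s2)  # union: {1, 2, 3, 4}
print(s1 & s2)  # intersection: {3}
print(s1 - s2)  # difference: {1, 2}

# Sets are mutable: add or discard elements in place
s1.add(10)
s1.discard(2)
s1.discard(99)   # unlike set.remove, discard does not fail on a missing element
print(sorted(s1))  # [1, 3, 10]

# Set elements must be hashable, so a list cannot be one
try:
    {[1, 2]}
except TypeError:
    print('lists cannot be set elements')
```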

Dictionaries

Dictionaries are also declared in curly braces. They associate values to keys:

d = {'key1': 'value1', 'key2': 'value2'}
type(d)
dict

Keys are unique: if the same key appears more than once, only one pair is kept for it, with the last value given:

{'key1': 'value1', 'key2': 'value2', 'key1': 'value1'}
{'key1': 'value1', 'key2': 'value2'}
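In the example above the repeated key had the same value both times. When the values differ, this sketch shows that the last one wins, while the key keeps the position of its first appearance:

```python
d = {'key1': 'value1', 'key2': 'value2', 'key1': 'new value'}
print(d)       # {'key1': 'new value', 'key2': 'value2'}
print(len(d))  # 2
```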

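Two dictionaries with the same key/value pairs are equal, regardless of the order in which the pairs are written:

{'a': 1, 'b': 2} == {'b': 2, 'a': 1}
True

Since Python 3.7, dictionaries do remember the order in which keys were inserted, but the pairs still cannot be indexed by position: d[0] looks up the key 0, which does not exist in d: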

d[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[419], line 1
----> 1 d[0]

KeyError: 0

However, you can access values from their keys:

D = {'c': 1, 'a': 3, 'b': 2}
D['b']
2

or:

D.get('b')
2
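The two approaches differ for missing keys: square brackets raise a KeyError, while get returns None or a default you supply. The in operator tests the keys. A sketch:

```python
D = {'c': 1, 'a': 3, 'b': 2}

print(D.get('z'))     # None: no error for a missing key
print(D.get('z', 0))  # 0: the default you supply
print('z' in D)       # False: `in` tests the keys

try:
    D['z']
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'z'
```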

There are methods to get the items (the pairs), the values, or the keys:

D.items()
dict_items([('c', 1), ('a', 3), ('b', 2)])
D.values()
dict_values([1, 3, 2])
D.keys()
dict_keys(['c', 'a', 'b'])

To return a sorted list of keys:

sorted(D)
['a', 'b', 'c']

You can create empty dictionaries:

E = {}
type(E)
dict

Dictionaries are mutable, so you can add, remove, or replace items.

Let’s add an item to our empty dictionary E:

E['author'] = 'Proust'
print(E)
{'author': 'Proust'}

We can add another one:

E['title'] = 'In search of lost time'
print(E)
{'author': 'Proust', 'title': 'In search of lost time'}

We can modify one:

E['author'] = 'Marcel Proust'
E
{'author': 'Marcel Proust', 'title': 'In search of lost time'}

Your turn:

Add a third item to E with the number of volumes.

We can also remove items:

E.pop('author')
print(E)
{'title': 'In search of lost time'}

or:

del E['title']
print(E)
{}

Dictionaries are declared in curly braces. They are mutable collections of key/value pairs with unique keys; they remember insertion order (since Python 3.7) but are accessed by key, not by position. They play the role of an associative array.

Conversion between collections

From tuple to list:

list((3, 8, 1))
[3, 8, 1]

From tuple to set:

set((3, 2, 3, 3))
{2, 3}

From list to tuple:

tuple([3, 1, 4])
(3, 1, 4)

From list to set:

set(['a', 2, 4])
{2, 4, 'a'}

From set to tuple:

tuple({2, 3})
(2, 3)

From set to list:

list({2, 3})
[2, 3]
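Converting to a set removes duplicates but makes no promise about order. When order matters, one common idiom relies on dictionary keys being unique and, since Python 3.7, kept in insertion order. A sketch with an illustrative list named values:

```python
values = [3, 1, 3, 2, 1]

# set() removes duplicates, but the order of the result is not guaranteed
unique_any_order = set(values)

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each value, in order
unique_in_order = list(dict.fromkeys(values))
print(unique_in_order)  # [3, 1, 2]
```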

Collections module

Python also has a built-in collections module providing additional, more specialized data structures: deque, defaultdict, namedtuple, OrderedDict, Counter, ChainMap, UserDict, UserList, and UserString.
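To give a flavour of three of them, here is a short sketch (the data is only illustrative): Counter counts occurrences, defaultdict creates a default value for missing keys, and deque supports fast appends and pops at both ends:

```python
from collections import Counter, defaultdict, deque

# Counter: count occurrences of hashable items
counts = Counter('abracadabra')
print(counts.most_common(2))  # [('a', 5), ('b', 2)]

# defaultdict: missing keys get a default value from a factory (here list())
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}

# deque: fast appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.pop()
print(queue)  # deque([0, 1, 2])
```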