Collections

Author

Marie-Hélène Burle

Values can be stored in collections. This section introduces lists, strings, arrays, tuples, sets, and dictionaries in Python.

Lists

Lists are declared in square brackets:

l = [2, 1, 3]
l
[2, 1, 3]
type(l)
list

They are mutable:

l.append(0)
l
[2, 1, 3, 0]

Lists are ordered:

['b', 'a'] == ['a', 'b']
False

They can contain repeated values:

['a', 'a', 'a', 't']
['a', 'a', 'a', 't']

Lists can be homogeneous:

['b', 'a', 'x', 'e']
['b', 'a', 'x', 'e']
type('b') == type('a') == type('x') == type('e')
True

or heterogeneous:

[3, 'some string', 2.9, 'z']
[3, 'some string', 2.9, 'z']
type(3) == type('some string') == type(2.9) == type('z')
False

They can even be nested:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8]

The length of a list is the number of items it contains and can be obtained with the function len:

len([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])
3

To extract an item from a list, you index it:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][0]
3

Python starts indexing at 0, so what we tend to think of as the “first” element of a list is for Python the “zeroth” element.

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][1]
['b', 'e', 3.9, ['some string', 9.9]]
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][2]
8
# Of course you can't extract items that don't exist
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][3]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[15], line 2
      1 # Of course you can't extract items that don't exist
----> 2 [3, ['b', 'e', 3.9, ['some string', 9.9]], 8][3]

IndexError: list index out of range

You can index from the end of the list with negative values (here you start at -1 for the last element):

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][-1]
8

Your turn:

How could you extract the string 'some string' from the list [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]?

You can also slice a list:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][0:1]
[3]

Notice how slicing returns a list.

Notice also how the left index is included but the right index excluded.
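
The two remarks above can be checked on a small example. This is only a sketch: the list `letters` is made up for the illustration and does not appear elsewhere in this section.

```python
# Hypothetical example list (not used elsewhere in this section)
letters = ['a', 'b', 'c', 'd', 'e']

# Index 1 is included, index 3 is excluded
middle = letters[1:3]
print(middle)  # ['b', 'c']

# A slice of width one is still a list, unlike a plain index
as_slice = letters[1:2]
as_item = letters[1]
print(as_slice, as_item)  # ['b'] b

# The length of a slice is the difference between its bounds
print(len(letters[1:3]) == 3 - 1)  # True
```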

If you omit the first index, the slice starts at the beginning of the list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][:6]
[1, 2, 3, 4, 5, 6]

If you omit the second index, the slice goes to the end of the list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][6:]
[7, 8, 9]

When slicing, you can specify the stride:

[1, 2, 3, 4, 5, 6, 7, 8, 9][2:7:2]
[3, 5, 7]

The default stride is 1:

[1, 2, 3, 4, 5, 6, 7, 8, 9][2:7] == [1, 2, 3, 4, 5, 6, 7, 8, 9][2:7:1]
True

You can reverse the order of a list with a -1 stride applied on the whole list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]

You can test whether an item is in a list:

3 in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
True
9 in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
False

or not in a list:

3 not in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
False

You can get the index (position) of an item inside a list:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8].index(3)
0

Note that this only returns the index of the first occurrence:

[3, 3, ['b', 'e', 3.9, ['some string', 9.9]], 8].index(3)
0

Lists are mutable (they can be modified). For instance, you can replace items in a list by other items:

L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L[1] = 2
L
[3, 2, 8]

You can delete items from a list using their indices with list.pop:

L.pop(2)
L
[3, 2]

Here, because we are using list.pop, 2 represents the index (the 3rd item).
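
It may help to know that list.pop also returns the item it removes, and that the index is optional. A minimal sketch, using a made-up list called `stack`:

```python
# Hypothetical list used only for this illustration
stack = [10, 20, 30, 40]

# pop returns the item it removes
removed = stack.pop(1)
print(removed)  # 20
print(stack)    # [10, 30, 40]

# Without an argument, pop removes and returns the last item
last = stack.pop()
print(last)     # 40
print(stack)    # [10, 30]
```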

You can also delete items by index with del:

del L[0]
L
[2]

Notice how a list can have a single item:

len(L)
1

It is then called a “singleton list”.

You can also delete items from a list using their values with list.remove:

L.remove(2)
L
[]

Here, because we are using list.remove, 2 is the value 2.
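
Two details of list.remove are easy to miss: it only removes the first occurrence of a value, and it raises an error if the value is absent. The sketch below uses a made-up list called `values`:

```python
# Hypothetical list with a repeated value
values = [2, 5, 2, 7]

# Only the first occurrence of 2 is removed
values.remove(2)
print(values)  # [5, 2, 7]

# Removing a value that is not in the list raises a ValueError
try:
    values.remove(100)
except ValueError as error:
    print(f"ValueError: {error}")
```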

Notice how a list can even be empty:

len(L)
0

You can actually initialise empty lists:

M = []
type(M)
list

You can add items to a list. One at a time:

L.append(7)
L
[7]

And if you want to add multiple items at once?

# This doesn't work...
L.append(3, 6, 9)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[37], line 2
      1 # This doesn't work...
----> 2 L.append(3, 6, 9)

TypeError: list.append() takes exactly one argument (3 given)
# This doesn't work either (that's not what we wanted)
L.append([3, 6, 9])
L
[7, [3, 6, 9]]

Your turn:

Fix this mistake we just made and remove the nested list [3, 6, 9].

One option is:

del L[1]

To add multiple values to a list (and not a nested list), you need to use list.extend:

L.extend([3, 6, 9])
L
[7, 3, 6, 9]
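
As a side note, list.extend is not limited to lists: it accepts any iterable. Here is a short sketch with a made-up list called `numbers`:

```python
# Hypothetical list used only for this illustration
numbers = [7]

# extend accepts any iterable, not only lists
numbers.extend((3, 6))      # a tuple
numbers.extend(range(2))    # a range producing 0 and 1
print(numbers)  # [7, 3, 6, 0, 1]

# Extending with a string adds its characters one by one
numbers.extend('ab')
print(numbers)  # [7, 3, 6, 0, 1, 'a', 'b']

# The += operator extends a list in place in the same way
numbers += [9]
print(numbers)  # [7, 3, 6, 0, 1, 'a', 'b', 9]
```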

If you don’t want to add an item at the end of a list, you can use list.insert(<index>, <object>):

L.insert(3, 'test')
L
[7, 3, 6, 'test', 9]

Your turn:

Let’s have the following list:

L = [7, [3, 6, 9], 3, 'test', 6, 9]

Insert the string 'nested' in the zeroth position of the nested list [3, 6, 9] in L.

You can sort a homogeneous list:

L = [3, 9, 10, 0]
L.sort()
L
[0, 3, 9, 10]
L = ['some string', 'b', 'a']
L.sort()
L
['a', 'b', 'some string']
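
Note that list.sort modifies the list in place. If you want to keep the original list, the built-in function sorted returns a new sorted list instead. The sketch below compares the two; the names `original`, `ascending`, and `words` are made up for the illustration.

```python
original = [3, 9, 10, 0]

# sorted returns a new list and leaves the original untouched
ascending = sorted(original)
print(ascending)  # [0, 3, 9, 10]
print(original)   # [3, 9, 10, 0]

# list.sort modifies the list in place and returns None
result = original.sort(reverse=True)
print(result)     # None
print(original)   # [10, 9, 3, 0]

# A key function changes the sorting criterion (here: string length)
words = ['some string', 'b', 'aa']
by_length = sorted(words, key=len)
print(by_length)  # ['b', 'aa', 'some string']
```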

Lists whose items cannot be compared with one another, such as this heterogeneous list mixing integers and a list, cannot be sorted:

L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L.sort()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[45], line 2
      1 L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
----> 2 L.sort()

TypeError: '<' not supported between instances of 'list' and 'int'
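
What matters for sorting is whether the items can be compared, not whether they share the exact same type. The sketch below shows one mixed list that sorts fine, and one possible workaround (among others) for a list that does not. The lists are made up for the illustration.

```python
# Integers and floats are different types but can be compared
mixed_numbers = [3, 2.5, 10, 0.1]
sorted_numbers = sorted(mixed_numbers)
print(sorted_numbers)  # [0.1, 2.5, 3, 10]

# Integers and strings cannot be compared, so we sort on a common
# representation instead (here: each item converted to a string)
mixed = [3, 'b', 10, 'a']
by_text = sorted(mixed, key=str)
print(by_text)  # [10, 3, 'a', 'b']
```

Notice that 10 comes before 3 in the second result because the comparison is done on the strings '10' and '3'.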

You can also get the min and max value of homogeneous lists:

min([3, 9, 10, 0])
0
max(['some string', 'b', 'a'])
'some string'

This doesn’t work either for lists whose items cannot be compared with one another:

min([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[48], line 1
----> 1 min([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])

TypeError: '<' not supported between instances of 'list' and 'int'

Lists can be concatenated with +:

L + [3, 6, 9]
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8, 3, 6, 9]

or repeated with *:

L * 3
[3,
 ['b', 'e', 3.9, ['some string', 9.9]],
 8,
 3,
 ['b', 'e', 3.9, ['some string', 9.9]],
 8,
 3,
 ['b', 'e', 3.9, ['some string', 9.9]],
 8]
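
One thing to watch for when repeating a list that contains other lists: the repetition copies references to the nested lists rather than the nested lists themselves. A small sketch of this behaviour, with made-up names `row`, `grid`, and `independent`:

```python
# Repeating a list copies references to its items, not the items
row = [0, 0]
grid = [row] * 3
grid[0][0] = 1
print(grid)  # [[1, 0], [1, 0], [1, 0]]

# A list comprehension builds independent inner lists instead
independent = [[0, 0] for _ in range(3)]
independent[0][0] = 1
print(independent)  # [[1, 0], [0, 0], [0, 0]]
```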

To sum up, lists are declared in square brackets. They are mutable, ordered (thus indexable), and possibly heterogeneous collections of values.

Strings

Strings behave (a little) like lists of characters in that they have a length (the number of characters):

S = 'This is a string.'
len(S)
17

They have a min and a max:

min(S)
' '
max(S)
't'

You can index them:

S[3]
's'

Slice them:

S[10:16]
'string'

Your turn:

Reverse the order of the string S.

They can also be concatenated with +:

T = 'This is another string.'
print(S + ' ' + T)
This is a string. This is another string.

or repeated with *:

print(S * 3)
This is a string.This is a string.This is a string.
print((S + ' ') * 3)
This is a string. This is a string. This is a string. 

This is where the similarities stop, however: strings are immutable, so methods such as list.sort, list.append, etc. will not work on them.
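
The sketch below illustrates the difference: modifying a string in place fails, but functions that return a new object (here sorted and str.join, chosen as examples) work fine.

```python
S = 'This is a string.'

# Strings are immutable: item assignment raises a TypeError
try:
    S[0] = 't'
except TypeError as error:
    print(f"TypeError: {error}")

# sorted works on any iterable and returns a list of characters
characters = sorted('string')
print(characters)  # ['g', 'i', 'n', 'r', 's', 't']

# str.join assembles the characters back into a new string
joined = ''.join(characters)
print(joined)  # ginrst
```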

Arrays

Python comes with a built-in array module. When you need arrays for storing and retrieving data, this module is perfectly suitable and extremely lightweight. The module's documentation covers the syntax in detail.

Whenever you plan on performing calculations on your data however (which is the vast majority of cases), you should instead use the NumPy package, covered in another section.
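
To give an idea of what the built-in module looks like, here is a minimal sketch. The type code 'i' (signed integers) is just one of the available codes, picked for the example.

```python
from array import array

# 'i' is the type code for signed integers
a = array('i', [3, 1, 4])
a.append(2)
print(a)        # array('i', [3, 1, 4, 2])
print(a[0])     # 3
print(len(a))   # 4

# Unlike lists, arrays only accept items of their declared type
try:
    a.append(2.5)
except TypeError as error:
    print(f"TypeError: {error}")
```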

Tuples

Tuples are defined with parentheses:

t = (3, 1, 4, 2)
t
(3, 1, 4, 2)
type(t)
tuple

Tuples are ordered:

(2, 3) == (3, 2)
False

This means that they are indexable and sliceable:

(2, 4, 6)[2]
6
(2, 4, 6)[::-1]
(6, 4, 2)

They can be nested:

type((3, 1, (0, 2)))
tuple
len((3, 1, (0, 2)))
3
max((3, 1, 2))
3

They can be heterogeneous:

type(('string', 2, True))
tuple

You can create empty tuples:

type(())
tuple

You can also create singleton tuples, but the syntax is a bit odd:

# This is not a tuple...
type((1))
int
# This is the weird way to define a singleton tuple
type((1,))
tuple

However, the big difference with lists is that tuples are immutable:

T = (2, 5)
T[0] = 8
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[71], line 2
      1 T = (2, 5)
----> 2 T[0] = 8

TypeError: 'tuple' object does not support item assignment
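
Since a tuple cannot be modified, the usual approach is to build a new one. The sketch below shows this, along with a subtlety: immutability only applies to the tuple itself, not to mutable objects stored inside it. The names `U` and `V` are made up for the illustration.

```python
T = (2, 5)

# A tuple cannot be modified, but a new one can be built from it
U = (8,) + T[1:]
print(U)  # (8, 5)
print(T)  # (2, 5)

# Immutability is shallow: a list stored in a tuple can still change
V = (1, [2, 3])
V[1].append(4)
print(V)  # (1, [2, 3, 4])
```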

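Tuples can be packed and unpacked, which allows you to assign several variables at once and to swap values without a temporary variable: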

a, b = 1, 2
a, b
(1, 2)
a, b = b, a
a, b
(2, 1)
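
The same mechanism works with more than two names. Here is a short sketch of a few common patterns; the variable names are arbitrary and divmod is used only as an example of a function returning a tuple.

```python
# Unpacking requires as many names as there are items...
x, y, z = (1, 2, 3)
print(x, y, z)  # 1 2 3

# ...unless a starred name collects the remaining items in a list
first, *rest = (1, 2, 3, 4)
print(first)  # 1
print(rest)   # [2, 3, 4]

# divmod returns a tuple, which can be unpacked directly
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```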

Tuples are declared in parentheses. They are immutable, ordered (thus indexable), and possibly heterogeneous collections of values.

Sets

Sets are declared in curly braces:

s = {3, 2, 5}
s
{2, 3, 5}
type(s)
set

Sets are unordered:

{2, 4, 1} == {4, 2, 1}
True

Consequently, it makes no sense to index a set.
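
If you try anyway, Python raises an error. Membership tests and iteration, on the other hand, do not need any index. A minimal sketch:

```python
s = {3, 2, 5}

# Indexing a set raises a TypeError
try:
    s[0]
except TypeError as error:
    print(f"TypeError: {error}")

# Membership tests and iteration work without any index
print(3 in s)      # True
print(sorted(s))   # [2, 3, 5]
```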

Sets can be heterogeneous:

S = {2, 'a', 'string'}
isinstance(S, set)
True
type(2) == type('a') == type('string')
False

There are no duplicates in a set:

{2, 2, 'a', 2, 'string', 'a'}
{2, 'a', 'string'}
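
Sets are mutable, and the absence of duplicates also holds when you modify them. The sketch below uses the methods set.add and set.discard as examples:

```python
S = {2, 'a'}

# Sets are mutable: items can be added...
S.add('string')
# ...but adding an item already present changes nothing
S.add(2)
print(len(S))  # 3

# discard removes an item (and does nothing if it is absent)
S.discard('a')
S.discard('missing')
print(S == {2, 'string'})  # True
```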

You can define an empty set, but only with the set function (because empty curly braces define a dictionary):

t = set()
t
set()
len(t)
0
type(t)
set

Since strings are iterables, you can use set to get a set of the unique characters:

set('abba')
{'a', 'b'}

Your turn:

How could you create a set with the single element 'abba' in it?

Sets are declared in curly braces. They are mutable, unordered (thus non-indexable), possibly heterogeneous collections of unique values.

Dictionaries

Dictionaries are declared in curly braces. They associate values to keys:

d = {'key1': 'value1', 'key2': 'value2'}
d
{'key1': 'value1', 'key2': 'value2'}
type(d)
dict

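Since Python 3.7, dictionaries remember the order in which items were inserted, but this order is ignored when comparing two dictionaries: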

{'a': 1, 'b': 2} == {'b': 2, 'a': 1}
True

The pairs cannot be indexed by position. Instead, you access values in a dictionary from their keys:

D = {'c': 1, 'a': 3, 'b': 2}
D['b']
2
D.get('b')
2
D.items()
dict_items([('c', 1), ('a', 3), ('b', 2)])
D.values()
dict_values([1, 3, 2])
D.keys()
dict_keys(['c', 'a', 'b'])
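
The two ways to access a value differ when the key is missing. The sketch below shows this, as well as how the pairs returned by dict.items can be used in a loop. The key 'z' is made up for the illustration.

```python
D = {'c': 1, 'a': 3, 'b': 2}

# Square brackets raise a KeyError for a missing key
try:
    D['z']
except KeyError as error:
    print(f"KeyError: {error}")

# get returns None, or a default of your choice, instead
print(D.get('z'))      # None
print(D.get('z', 0))   # 0

# items yields (key, value) tuples that can be unpacked in a loop
for key, value in D.items():
    print(key, value)

# Membership tests look at the keys
print('a' in D)  # True
print(3 in D)    # False
```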

To return a sorted list of keys:

sorted(D)
['a', 'b', 'c']

You can create empty dictionaries:

E = {}
type(E)
dict

Dictionaries are mutable, so you can add, remove, or replace items.

Let’s add an item to our empty dictionary E:

E['author'] = 'Proust'
E
{'author': 'Proust'}

We can add another one:

E['title'] = 'In search of lost time'
E
{'author': 'Proust', 'title': 'In search of lost time'}

We can modify one:

E['author'] = 'Marcel Proust'
E
{'author': 'Marcel Proust', 'title': 'In search of lost time'}

Your turn:

Add a third item to E with the number of volumes.

We can also remove items:

E.pop('author')
E
{'title': 'In search of lost time'}

Another method to remove items:

del E['title']
E
{}
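
One practical difference between the two methods is that dict.pop returns the removed value and can take a default for missing keys. A minimal sketch, recreating the dictionary E from above (the key 'volumes' is only an example of a missing key):

```python
E = {'author': 'Marcel Proust', 'title': 'In search of lost time'}

# pop returns the value associated with the removed key
author = E.pop('author')
print(author)  # Marcel Proust

# A second argument is returned if the key is missing (no KeyError)
volumes = E.pop('volumes', None)
print(volumes)  # None
print(E)        # {'title': 'In search of lost time'}
```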

Dictionaries are declared in curly braces. They are mutable collections of key/value pairs that cannot be indexed by position and whose order is ignored in comparisons. They play the role of an associative array.

Conversion between collections

From tuple to list:

list((3, 8, 1))
[3, 8, 1]

From tuple to set:

set((3, 2, 3, 3))
{2, 3}

From list to tuple:

tuple([3, 1, 4])
(3, 1, 4)

From list to set:

set(['a', 2, 4])
{2, 4, 'a'}

From set to tuple:

tuple({2, 3})
(2, 3)

From set to list:

list({2, 3})
[2, 3]
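
Dictionaries can be converted too. The sketch below shows one possible way to build a dictionary from two lists (using the built-in zip function) and what you get when converting a dictionary to a list. The names `keys` and `values` are made up for the illustration.

```python
keys = ['a', 'b', 'c']
values = [1, 2, 3]

# zip pairs the items; dict turns the pairs into a dictionary
d = dict(zip(keys, values))
print(d)  # {'a': 1, 'b': 2, 'c': 3}

# Converting a dictionary to a list keeps only the keys
key_list = list(d)
print(key_list)   # ['a', 'b', 'c']

# Use items to keep the pairs as tuples
pair_list = list(d.items())
print(pair_list)  # [('a', 1), ('b', 2), ('c', 3)]
```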

Collections module

Python has a built-in collections module providing additional data structures: deque, defaultdict, namedtuple, OrderedDict, Counter, ChainMap, UserDict, UserList, and UserString.
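
To give a feel for the module, here is a brief sketch of four of these structures. The data and names (`counts`, `groups`, `queue`, `Point`) are made up for the illustration.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter counts occurrences of hashable items
counts = Counter('abba')
print(counts['a'], counts['b'])  # 2 2

# defaultdict creates missing values on first access
groups = defaultdict(list)
groups['even'].append(2)
print(groups['even'])  # [2]

# deque supports fast appends and pops at both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
print(queue)  # deque([0, 1, 2, 3])

# namedtuple builds tuples whose fields have names
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x, p.y)  # 1 2
```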