Collections

Author

Marie-Hélène Burle

Lists

Lists are declared in square brackets. They are mutable, ordered (thus indexable), and possibly heterogeneous collections of values.

Lists are ordered:

['b', 'a'] == ['b', 'a']
True
['b', 'a'] == ['a', 'b']
False

They can have repeat values:

type(['a', 'a', 'a', 't'])
list

Lists can be homogeneous:

type(['b', 'a', 'x', 'e'])
list
type('b') == type('a') == type('x') == type('e')
True

or heterogeneous:

type([3, 'some string', 2.9, 'z'])
list
type(3) == type('some string') == type(2.9) == type('z')
False

They can even be nested:

type([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])
list

The length of a list is the number of items it contains and can be obtained with the function len:

len([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])
3

To extract an item from a list, you index it:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][0]
3

Python starts indexing at 0, so what we tend to think of as the “first” element of a list is for Python the “zeroth” element.

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][1]
['b', 'e', 3.9, ['some string', 9.9]]
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][2]
8
# Of course you can't extract items that don't exist
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][3]
IndexError: list index out of range

You can index from the end of the list with negative values (here you start at -1 for the last element):

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][-1]
8
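
As a minimal sketch of how negative and positive indices relate (the name `items` is only used for illustration), the index -k points to the same element as the index len(items) - k:

```python
items = [3, ['b', 'e', 3.9], 8]

# items[-k] is the same element as items[len(items) - k]
last = items[-1]
also_last = items[len(items) - 1]

# -len(items) is the most negative valid index: it points to the first element
first = items[-len(items)]
```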

Your turn:

How could you extract the string 'some string' from the list [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]?

You can also slice (index multiple values) a list:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8][0:1]
[3]

Notice how slicing returns a list.

Notice also how the left index is included but the right index excluded.
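
The inclusion rule can be checked with a short sketch (the variable names are illustrative). Because the right index is excluded, a slice from i to j holds j - i items, and splitting a list at the same index gives two pieces that do not overlap:

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The slice [2:5] keeps the items at indices 2, 3, and 4 (5 - 2 = 3 items)
middle = numbers[2:5]

# Splitting at the same index gives two pieces that rebuild the list
rebuilt = numbers[:4] + numbers[4:]

# Unlike indexing, slicing past the end of the list does not raise an error
clipped = numbers[7:100]
```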

If you omit the first index the slice starts at the beginning of the list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][:6]
[1, 2, 3, 4, 5, 6]

If you omit the second index the slice goes to the end of the list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][6:]
[7, 8, 9]

When slicing, you can specify the stride:

[1, 2, 3, 4, 5, 6, 7, 8, 9][2:7:2]
[3, 5, 7]

The default stride is 1:

[1, 2, 3, 4, 5, 6, 7, 8, 9][2:7] == [1, 2, 3, 4, 5, 6, 7, 8, 9][2:7:1]
True

A consequence of the stride is that you can reverse the order of a list with a -1 stride applied on the whole list:

[1, 2, 3, 4, 5, 6, 7, 8, 9][::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
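
Note that slicing builds a new list and leaves the original alone. The sketch below (with illustrative names) contrasts this with the list.reverse method, which changes the list in place:

```python
numbers = [1, 2, 3, 4]

# Slicing with a -1 stride returns a new, reversed list
backwards = numbers[::-1]
still_original = (numbers == [1, 2, 3, 4])

# list.reverse reverses the list in place and returns None
result = numbers.reverse()
```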

You can test whether an item is in a list:

3 in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
True
9 in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
False

or not in a list:

3 not in [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
False
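
Membership tests only look at the items of the list itself, not inside nested lists. A small sketch, using the illustrative name `nested`:

```python
nested = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]

# 'b' sits inside the inner list, so it is not an item of the outer list
b_in_outer = 'b' in nested

# It is an item of the inner list found at index 1
b_in_inner = 'b' in nested[1]
```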

You can get the index (position) of an item inside a list:

[3, ['b', 'e', 3.9, ['some string', 9.9]], 8].index(3)
0

Note that this only returns the index of the first occurrence:

[3, 3, ['b', 'e', 3.9, ['some string', 9.9]], 8].index(3)
0
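
If you need every position rather than the first one, one possible approach (a sketch, not the only way) is a list comprehension with enumerate. The sketch also shows how to avoid the ValueError that list.index raises when the item is absent:

```python
values = [3, 3, 'x', 8, 3]

# list.index stops at the first match
first_position = values.index(3)

# enumerate yields (index, item) pairs, so we can keep every matching index
all_positions = [i for i, item in enumerate(values) if item == 3]

# list.index raises a ValueError for a missing item,
# so we test with `in` before calling it
position_of_9 = values.index(9) if 9 in values else None
```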

Lists are mutable (they can be modified):

  • You can replace items in a list by other items:
L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L
[3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L[1] = 2
L
[3, 2, 8]
  • You can delete items from a list:
# Using their indices with list.pop
L.pop(2)
L
[3, 2]

Here, because we are using list.pop, 2 represents the index (the 3rd item).

# Using their indices with del
del L[0]
L
[2]

Notice how a list can have a single item:

len(L)
1

It is then called a “singleton list”.

# Using their values
L.remove(2)
L
[]

Here, because we are using list.remove, 2 is the value 2.
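
The three ways of deleting items can be compared side by side in a short sketch (the list `letters` is only an illustration). One difference not visible above is that list.pop returns the item it removes:

```python
letters = ['a', 'b', 'c', 'b']

# list.pop removes the item at an index and returns it
popped = letters.pop(0)      # letters is now ['b', 'c', 'b']

# del removes the item at an index and returns nothing
del letters[1]               # letters is now ['b', 'b']

# list.remove deletes the first item equal to the given value
letters.remove('b')          # letters is now ['b']
```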

Notice how a list can even be empty:

len(L)
0

You can actually initialise empty lists:

M = []
type(M)
list
  • You can add items to a list:

One at a time:

L.append(7)
L
[7]

And if you want to add multiple items at once?

# This doesn't work...
L.append(3, 6, 9)
TypeError: list.append() takes exactly one argument (3 given)
# This runs, but it is not what we wanted: the whole list is added as a single item
L.append([3, 6, 9])
L
[7, [3, 6, 9]]

In this case, you need to use list.extend:

L.extend([3, 6, 9])
L
[7, [3, 6, 9], 3, 6, 9]
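
The difference between the two methods can be summarised in a sketch (names are illustrative): list.append adds its argument as one item, while list.extend adds each item of an iterable in turn:

```python
numbers = [7]

# append adds exactly one item, which here happens to be a list
numbers.append([3, 6])
after_append = list(numbers)   # copy of the current state

# extend adds the items of the iterable one by one
numbers.extend([3, 6])
after_extend = list(numbers)   # copy of the current state

# extend accepts any iterable, including a string (one item per character)
numbers.extend('ab')
```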

If you don’t want to add an item at the end of a list, you can use list.insert(<index>, <object>):

L.insert(3, 'test')
L
[7, [3, 6, 9], 3, 'test', 6, 9]

Your turn:

Insert the string 'nested' in the zeroth position of the nested list [3, 6, 9] in L.

If you are running behind, you can recreate L with:

L = [7, [3, 6, 9], 3, 'test', 6, 9]
  • You can sort a homogeneous list:
# Items of different types cannot be sorted
L = [3, ['b', 'e', 3.9, ['some string', 9.9]], 8]
L.sort()
TypeError: '<' not supported between instances of 'list' and 'int'
L = [3, 9, 10, 0]
L.sort()
L
[0, 3, 9, 10]
L = ['some string', 'b', 'a']
L.sort()
L
['a', 'b', 'some string']
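
list.sort modifies the list in place and returns None. If you want to keep the original list, the built-in function sorted returns a new sorted list instead. A minimal sketch with illustrative names:

```python
words = ['pear', 'fig', 'banana']

# sorted returns a new list and leaves the original untouched
alphabetical = sorted(words)
unchanged = (words == ['pear', 'fig', 'banana'])

# list.sort sorts in place and returns None; reverse=True sorts in descending order
result = words.sort(reverse=True)

# key= sorts by a computed value, here the length of each string
by_length = sorted(words, key=len)
```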

You can also get the min and max value of homogeneous lists:

min([3, 9, 10, 0])
0
max(['some string', 'b', 'a'])
'some string'

But not heterogeneous lists:

min([3, ['b', 'e', 3.9, ['some string', 9.9]], 8])
TypeError: '<' not supported between instances of 'list' and 'int'

Lists can also be concatenated with +:

L + [3, 6, 9]
['a', 'b', 'some string', 3, 6, 9]

or repeated with *:

L * 3
['a', 'b', 'some string', 'a', 'b', 'some string', 'a', 'b', 'some string']

Strings

Strings behave (a little) like lists of characters in that they have a length (the number of characters):

S = 'This is a string.'
len(S)
17

They have a min and a max:

min(S)
' '
max(S)
't'

You can index them:

S[3]
's'

Slice them:

S[10:16]
'string'

Your turn:

Reverse the order of the string S.

They can also be concatenated with +:

T = 'This is another string.'
print(S + ' ' + T)
This is a string. This is another string.

or repeated with *:

print(S * 3)
This is a string.This is a string.This is a string.
print((S + ' ') * 3)
This is a string. This is a string. This is a string. 

This is where the similarities stop, however: strings are immutable, so methods that modify a list in place, such as list.sort or list.append, do not exist for strings.
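
Strings cannot be modified after they are created. The sketch below (with illustrative names) shows what happens when you try, and one possible workaround that builds a new string from a sorted list of characters:

```python
text = 'string'

# Item assignment fails because strings cannot be changed in place
try:
    text[0] = 'S'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# sorted works on any iterable and returns a list of characters
sorted_chars = sorted(text)

# str.join glues the characters back together into a new string
sorted_text = ''.join(sorted_chars)
```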

Arrays

Python comes with a built-in array module. When you need arrays for storing and retrieving data, this module is perfectly suitable and extremely lightweight. This tutorial covers the syntax in detail.

Whenever you plan on performing calculations on your data, however (which is the vast majority of cases), you should instead use the NumPy package that Alex will cover this afternoon.
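
As a brief illustration of the built-in array module (a sketch only; the name `counts` is made up for this example), an array is created with a type code and can only hold values of that type:

```python
from array import array

# 'i' is the type code for signed integers; all items must share that type
counts = array('i', [1, 2, 3])
counts.append(4)

# Arrays are indexed like lists and can be converted back to a list
first = counts[0]
as_list = counts.tolist()

# Adding a value of another type raises a TypeError
try:
    counts.append(2.5)
    accepted_float = True
except TypeError:
    accepted_float = False
```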

Tuples

Tuples are declared in parentheses. They are immutable, ordered (thus indexable), and possibly heterogeneous collections of values.

Tuples are ordered:

(2, 3) == (3, 2)
False

This means that they are indexable (sliceable, etc.):

(2, 4, 6)[2]
6
(2, 4, 6)[::-1]
(6, 4, 2)

They can be nested:

type((3, 1, (0, 2)))
tuple
len((3, 1, (0, 2)))
3
max((3, 1, 2))
3

They can be heterogeneous:

type(('string', 2, True))
tuple

You can create empty tuples:

type(())
tuple

You can also create singleton tuples, but the syntax is a bit odd:

# This is not a tuple...
type((1))
int
# This is the weird way to define a singleton tuple
type((1,))
tuple

However, the big difference with lists is that tuples are immutable:

T = (2, 5)
T[0] = 8
TypeError: 'tuple' object does not support item assignment
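
One practical consequence of immutability, sketched below with illustrative names: a tuple of immutable values can be used as a dictionary key, while a list cannot. To "modify" a tuple, you build a new one:

```python
point = (2, 5)

# A tuple of immutable values is hashable, so it can be a dictionary key
labels = {point: 'start'}
label = labels[(2, 5)]

# A list is mutable and therefore unhashable: using it as a key fails
try:
    bad = {[2, 5]: 'start'}
    list_key_worked = True
except TypeError:
    list_key_worked = False

# Build a new tuple instead of changing the existing one
moved = (8,) + point[1:]
```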

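Tuples also allow packing and unpacking, which lets you assign several variables at once or swap values without a temporary variable: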

a, b = 1, 2
a, b
(1, 2)
a, b = b, a
a, b
(2, 1)
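
The same mechanism is what lets a function return several values at once. In the sketch below, `min_and_max` is a hypothetical helper written for this example:

```python
def min_and_max(values):
    """Hypothetical helper: return the smallest and largest value as a tuple."""
    return min(values), max(values)

# The returned tuple is unpacked into two variables
lowest, highest = min_and_max([3, 9, 10, 0])

# A starred name collects the remaining items into a list
first, *rest = (1, 2, 3, 4)
```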

Sets

Sets are declared in curly brackets. They are mutable, unordered (thus non-indexable), possibly heterogeneous collections of unique values.

Sets are unordered:

{2, 4, 1} == {4, 2, 1}
True

Consequently, it makes no sense to index a set.

Sets can be heterogeneous:

type({2, 'a', 'string'})
set

There are no duplicates in a set:

{2, 'a', 'string', 'a'}
{2, 'a', 'string'}
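
Since sets are mutable, you can add and remove values, and uniqueness is preserved automatically. A minimal sketch, with illustrative names:

```python
seen = {2, 'a'}

# add and discard change the set in place
seen.add('string')
seen.add('a')          # already present, so nothing changes
seen.discard(2)

# Converting a list to a set drops the duplicates (and the original order)
unique = set([3, 1, 3, 2, 1])
```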

You can define an empty set, but only with the set function since empty curly braces define a dictionary:

type({})
dict
type(set())
set

Since strings are iterables, you can use set to get a set of the unique characters:

set('abba')
{'a', 'b'}

Your turn:

How could you create a set with the single element 'abba' in it?

Dictionaries

Dictionaries are declared in curly braces. They are mutable collections of key/value pairs. They play the role of an associative array.

Since Python 3.7, dictionaries remember the order in which pairs were inserted, but this order is ignored when comparing two dictionaries:

{'a': 1, 'b': 2} == {'b': 2, 'a': 1}
True

The pairs themselves cannot be indexed by position. However, you can access values from a dictionary through their keys:

D = {'a': 1, 'b': 3, 'c': 2}
D['b']
3
D.get('b')
3
D.items()
dict_items([('a', 1), ('b', 3), ('c', 2)])
D.values()
dict_values([1, 3, 2])
D.keys()
dict_keys(['a', 'b', 'c'])
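
The two ways of accessing a value differ when the key is missing: square brackets raise a KeyError, while dict.get returns None or a default of your choice. The sketch below also loops over the pairs returned by dict.items (the names `missing`, `with_default`, and `doubled` are illustrative):

```python
D = {'a': 1, 'b': 3, 'c': 2}

# Square brackets raise a KeyError for a missing key
try:
    D['z']
    found = True
except KeyError:
    found = False

# dict.get returns None, or the default given as second argument
missing = D.get('z')
with_default = D.get('z', 0)

# dict.items yields (key, value) tuples that can be unpacked in a loop
doubled = {}
for key, value in D.items():
    doubled[key] = value * 2
```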

To return a sorted list of keys:

sorted(D)
['a', 'b', 'c']

As we saw earlier, you can create empty dictionaries:

E = {}
type(E)
dict

Dictionaries are mutable, so you can add, remove, or replace items:

E['key1'] = 'value1'
E
{'key1': 'value1'}
E.pop('key1')
E
{}
print(D)
{'a': 1, 'b': 3, 'c': 2}
del D['b']
D
{'a': 1, 'c': 2}
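
Replacing a value uses the same syntax as adding a pair: assigning to an existing key overwrites its value. A minimal sketch, using a made-up dictionary `stock`:

```python
stock = {'apples': 3, 'pears': 0}

# Assigning to an existing key replaces its value
stock['apples'] = 5

# Assigning to a new key adds a pair
stock['figs'] = 12

# dict.pop removes a pair and returns its value
removed = stock.pop('pears')

# With a second argument, dict.pop does not fail on a missing key
fallback = stock.pop('kiwis', None)
```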

Conversion between collections

list((3, 8, 1))
[3, 8, 1]
tuple([3, 1, 4])
(3, 1, 4)
set((3, 2, 3, 3))
{2, 3}
set(['a', 2, 4])
{2, 4, 'a'}
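
Dictionaries can be converted too. As a sketch (with illustrative names), dict accepts an iterable of key/value pairs, and list applied to a dictionary returns its keys:

```python
pairs = [('a', 1), ('b', 3)]

# A list of (key, value) tuples becomes a dictionary
as_dict = dict(pairs)

# Converting a dictionary to a list keeps only the keys
keys = list(as_dict)

# dict.items gives the pairs back
back_to_pairs = list(as_dict.items())

# zip pairs up two sequences, which dict can then consume
zipped = dict(zip(['x', 'y'], [10, 20]))
```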

Collections module

Python has a built-in collections module providing additional data structures: deque, defaultdict, namedtuple, OrderedDict, Counter, ChainMap, UserDict, UserList, and UserString, but we will not cover these in this workshop.