CS 1 Reading 15: Dictionaries

Overview

A dictionary is a new kind of Python data type. Dictionaries are fantastically useful and are found in nearly all Python programs. Once you learn what dictionaries are and how they work, you won’t want to program without them.

Before we describe what dictionaries are, let’s describe a problem they can solve.

Example problem: phone number database

You want to keep track of your friends' phone numbers. But since you have so many friends, this is a difficult job! How can the computer help?

For each friend, you need to store:

the name of the friend
their phone number

Also, you want to be able to retrieve the phone number for a given friend. Given what you know now, how can you do this?

Using a list

You could create a list of names and phone numbers:

phone_numbers = ['Joe', '567-8901', 'Jane', '123-4567', ...]

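...but it would not be easy to find the number corresponding to a given name: you would have to find the name in the list and then rely on its number being in the very next position. It would be better if a name and the corresponding phone number were associated in some way.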
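To make the awkwardness concrete, here is a minimal sketch of a lookup on the flat list. The function name find_number is made up for this illustration, and the sketch assumes that names always sit at even positions and numbers at odd positions.

```python
# Flat list: names at even positions, numbers at odd positions (an assumption
# the code depends on; nothing in the list itself enforces it).
phone_numbers = ['Joe', '567-8901', 'Jane', '123-4567']

def find_number(flat_list, name):
    """Return the number stored right after `name`, or None if it is absent."""
    # Step through the list two items at a time so we only look at names.
    for i in range(0, len(flat_list), 2):
        if flat_list[i] == name:
            return flat_list[i + 1]
    return None

print(find_number(phone_numbers, 'Jane'))   # prints 123-4567
```

It works, but the pairing of a name with its number exists only in the index arithmetic, which is easy to get wrong.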

Using a list of tuples

You could have a list of (<name>, <phone number>) tuples:

phone_numbers = [('Joe', '567-8901'), ('Jane', '123-4567'), ...]

Let’s see what we would need to do in order to find the phone number corresponding to a particular name, e.g. 'Alex'.

We could write the code like this:

for (name, num) in phone_numbers:
    if name == 'Alex':
        print('Phone number: {}'.format(num))

This is not too bad, but:

We can’t modify the phone number! (Tuples are immutable.) We might use lists instead of tuples, but...
We might have to look through the entire list (in the worst case) to find one number, which is inefficient.

Using a dictionary

The Right Thing™ to do in cases like this is to use a dictionary. So let’s talk about dictionaries and what makes them so awesome.

Keys and values

A dictionary (sometimes called a dict for short) is a Python data structure that associates keys with values. Each key is associated with exactly one value. (Sometimes this is called a mapping between keys and values.) Dictionaries allow you to do these things:

find the value corresponding to a particular key
change the value associated with a key
add new key/value pairs
delete key/value pairs

and they’re fast! (Much faster than a list of tuples, for instance.) ^[1]

Because we can add key/value pairs to a dictionary and delete key/value pairs from a dictionary, dictionaries, like lists, are mutable.

There are two rules for keys and values:

The values in a dictionary can be any Python value.
The keys in a dictionary can be any kind of immutable Python value. ^[2]

Since strings are immutable, we can use strings as dictionary keys. You can also use numbers, tuples, and other kinds of values we haven’t seen yet. ^[3] In the example above, we can use names as keys and phone numbers as values.

Dictionary syntax

We want to create a dictionary from our friends' names and phone numbers. First, we have to know the syntax of dictionaries.

Empty dictionary

Dictionaries use curly braces, and the simplest dictionary is the empty one, which looks like this:

{}

It’s a dictionary with no key/value pairs. Pretty exciting!

Actually, though, empty dictionaries, like empty lists, are very useful. Often you start with an empty dictionary and then fill it up element-by-element in a loop, adding a new key/value pair for every iteration of the loop.
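As a small sketch of that pattern, the loop below starts from an empty dictionary and adds one key/value pair per iteration. The list called pairs is made up for the illustration, and the assignment syntax used inside the loop is explained later in this reading.

```python
# Hypothetical input data: a list of (name, number) tuples.
pairs = [('Joe', '567-8901'), ('Jane', '123-4567')]

phone_numbers = {}                  # start with an empty dictionary
for (name, number) in pairs:
    phone_numbers[name] = number    # add one key/value pair per iteration

print(phone_numbers)   # prints {'Joe': '567-8901', 'Jane': '123-4567'}
```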

Non-empty dictionary

Alternatively, you can create a dictionary by writing out the key/value pairs inside of curly braces, separated in two ways:

different key/value pairs are separated by commas
the key and the value in a single key/value pair are separated by a colon (:) ^[4]

For our example, here is the dictionary we can create:

phone_numbers = { 'Joe' : '567-8901', 'Jane' : '123-4567' }

If there are more key/value pairs, we can add them too. You can see that the first key/value pair is 'Joe' : '567-8901' and the second is 'Jane' : '123-4567'. The keys are 'Joe' and 'Jane' and the values are '567-8901' and '123-4567'. The spaces in the dictionary are not required, but they help to keep it readable.

Most of the time, when we write out a dictionary like this (called a literal dictionary), the keys and values are Python values, but they can also be Python expressions. Here’s a contrived example.

phone_numbers = { 'J' + 'oe' : '567' + '-8901', 'Ja' + 'ne' : '123-' + '4567' }

This would give the same dictionary as the previous code.

In cases like this (which are very rare), the key expressions and the value expressions are evaluated before the dictionary is created. (It’s not that rare to have computed values, but computed keys are very unusual.)

Dictionary types

The only restriction on the types of keys or values in a dictionary is that the key must be immutable i.e. its type must be the type of an immutable Python object. Other than that, a dictionary can have any type of key or value.

In particular, a single dictionary can have different types of (immutable) keys, and different types of values. This is a bit unusual, but sometimes it’s quite useful. So this is legal:

mydict = { 1 : 'foo', 'bar' : [1, 2, 3], ('baz', 'boom') : 3.14159 }

The mydict dictionary has three different (immutable) key types, and three different value types.

You may have heard of the JSON data format, which is a way of formatting structured data which is used a lot by internet applications. A JSON object is almost identical to a Python dictionary with string keys and different types of values. Python, like most languages, has a JSON library (actually more than one).
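For the curious, here is a brief sketch using the json module from Python’s standard library. The JSON text is made up for the illustration; json.loads turns a JSON string into Python values, and json.dumps goes the other way.

```python
import json

# A JSON object written as text (note the double quotes that JSON requires).
text = '{"Joe": "567-8901", "Jane": "123-4567"}'

phone_numbers = json.loads(text)         # JSON object -> Python dictionary
print(phone_numbers['Joe'])              # prints 567-8901

round_trip = json.dumps(phone_numbers)   # Python dictionary -> JSON text
print(type(round_trip))                  # prints <class 'str'>
```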

Getting a value given a key

The most common thing to do with a dictionary is to look up the value that corresponds to a particular key. We’ll assume this dictionary again:

phone_numbers = { 'Joe' : '567-8901', 'Jane' : '123-4567' }

To get Joe’s phone number, all we have to write is this:

phone_numbers['Joe']

which will evaluate to '567-8901'.

Notice that phone_numbers['Joe'] looks like indexing a list, but with 'Joe' as the index. Python is overloading the meaning of the square brackets! Until now, the value inside the brackets has been an integer index. But with a dictionary, it can be any key value (which means any immutable Python value). Python really likes to re-use its syntax for distinct but similar things!

Changing a value at a key

Another thing you commonly want to do with a dictionary is to change the value associated with a particular key. For instance, let’s say that Joe’s phone number changes. We can change the dictionary value too:

phone_numbers['Joe'] = '314-1592'   # cool new phone number!

This is just like the syntax for changing a list value, except that the "index" is a string, not a number. Here’s the new dictionary:

>>> phone_numbers
{'Joe': '314-1592', 'Jane': '123-4567'}

Adding a new key/value pair

Another very common thing to do with dictionaries is to add new key/value pairs. The syntax for this is identical to the syntax for changing the values at existing keys, except that the keys are not in the dictionary until after you add them. For instance, let’s say that you just made a new friend named Bob, and you wanted to add his phone number. No problem!

phone_numbers['Bob'] = '000-0000'

Now when you look at the entire dictionary, you see this:

>>> phone_numbers
{'Joe': '314-1592', 'Jane': '123-4567', 'Bob': '000-0000'}

It looks like the key/value pairs are stored in the order they were added, and in fact they are: since Python 3.7, dictionaries are guaranteed to keep their keys in "insertion order" (earlier versions of Python didn’t promise this). Even so, Python dictionaries are not sequences. You can’t ask for the "first" or "third" entry by position, so it’s best to think of a dictionary as a collection you access by key, not by position.
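A short sketch of what "not a sequence" means in practice: an integer inside the square brackets is treated as a key, not as a position. On Python 3.7 and later, converting the dictionary to a list gives its keys in insertion order.

```python
phone_numbers = {'Joe': '314-1592', 'Jane': '123-4567', 'Bob': '000-0000'}

# 0 is looked up as a key, not as a position, so this fails.
try:
    first = phone_numbers[0]
except KeyError:
    first = None
print(first)                    # prints None

# Converting a dictionary to a list gives a list of its keys.
names = list(phone_numbers)
print(names)                    # prints ['Joe', 'Jane', 'Bob']
```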

This is one way in which a dictionary is very different from e.g. a list. With a list, you can’t add new entries like this.

>>> lst = [0,1,2,3,4]
>>> lst[5] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

Instead, you have to use lst.append(5).

Accessing a nonexistent key

What happens when you try to access a nonexistent key in a dictionary?

>>> phone_numbers['Mike']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Mike'

Python raises a KeyError exception. This is the right thing to do. ^[5]
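If a missing key is something your program expects and can recover from, one option is to catch the exception. This is only a sketch, and it assumes you are already comfortable with try/except; the fallback string is made up for the illustration.

```python
phone_numbers = {'Joe': '314-1592', 'Jane': '123-4567'}

try:
    number = phone_numbers['Mike']
except KeyError:
    # 'Mike' is not a key, so we end up here.
    number = 'no number on file'

print(number)   # prints no number on file
```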

Deleting a key/value pair: the `del` statement

It’s not that common, but sometimes we want to delete a key/value pair. Let’s say you had a falling out with your new friend Bob, and you decide you don’t ever want to talk to him again. You might want to delete his phone number from your phone_numbers dictionary. Here’s how to do it:

>>> phone_numbers
{'Joe': '314-1592', 'Jane': '123-4567', 'Bob': '000-0000'}
>>> del phone_numbers['Bob']
>>> phone_numbers
{'Joe': '314-1592', 'Jane': '123-4567'}

The new keyword del is short for "delete". Given a key, it removes the key/value pair that the key is part of from the dictionary. This is not a function or method call! del is actually a special Python statement. Because it isn’t a function call, you don’t have to put parentheses around its argument (and you shouldn’t).

del can remove elements from things other than dictionaries (e.g. lists) but it’s more useful with dictionaries than with lists. We will meet del again in future readings.
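Here is a brief sketch of both points: del applied to a list element, and what happens if you try to delete a key that isn’t in the dictionary (you get the same KeyError as when accessing it). The variable names are made up for the illustration.

```python
# del on a list removes the element at the given index.
lst = [10, 20, 30, 40]
del lst[1]
print(lst)              # prints [10, 30, 40]

# del on a missing dictionary key raises KeyError.
phone_numbers = {'Joe': '314-1592'}
try:
    del phone_numbers['Bob']
    deleted = True
except KeyError:
    deleted = False
print(deleted)          # prints False
```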

Back to the example: tuples as keys

Let’s improve the example by using a tuple of first and last names as keys:

phone_numbers = {
  ('Joe', 'Smith') : '567-8910',
  ('Jane', 'Doe') : '123-4567',
  ('El', 'Hovik') : '000-0000',
  ('Mike', 'Vanier') : '111-1111',
}

(Fun fact: we don’t have to use the \<return> line continuation characters at the ends of the lines when writing out a dictionary like this.)

It’s OK to use a tuple of strings as a dictionary key, because both tuples and strings are immutable, so a tuple of strings is immutable too. If we had e.g. a tuple of lists, that would not be immutable, so you couldn’t use it as a key. Similarly, a list of strings is not an acceptable dictionary key. Let’s try it anyway:

>>> phone_numbers = { ['Joe', 'Smith'] : '567-8910', ['Jane', 'Doe'] : '123-4567' }
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

The error message unhashable type: 'list' means that since lists are mutable, they can’t be used as dictionary keys. ^[6]
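You can check whether a value is acceptable as a key with the built-in hash function, which raises the same TypeError for values that can’t be hashed. The helper is_hashable below is made up for this sketch; it is not part of Python.

```python
def is_hashable(value):
    """Return True if `value` could be used as a dictionary key."""
    try:
        hash(value)
        return True
    except TypeError:
        return False

print(is_hashable(('Joe', 'Smith')))        # prints True  (tuple of strings)
print(is_hashable(['Joe', 'Smith']))        # prints False (list)
print(is_hashable((['Joe'], ['Smith'])))    # prints False (tuple of lists)
```

The last case shows that a tuple is only usable as a key if everything inside it is usable as a key too.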

OK, so we’ll use tuples. Once we’ve done this, we can access a value corresponding to a tuple:

>>> phone_numbers[('Joe', 'Smith')]
'567-8910'

We have to use the entire tuple; either the first or last name doesn’t work:

>>> phone_numbers['Joe']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Joe'
>>> phone_numbers['Smith']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Smith'

Dictionaries and `for` loops

We’ve seen many things that can be looped over using for loops:

lists
strings
files

So it shouldn’t surprise you to learn that dictionaries can also be looped over in a for loop. The way it works is that when you have a dictionary in a for loop following the in keyword, you loop over the keys of the dictionary. For instance, we could write this loop:

for key in phone_numbers:
    print(f'key: {key}, value: {phone_numbers[key]}')

which would print:

key: ('Joe', 'Smith'), value: 567-8910
key: ('Jane', 'Doe'), value: 123-4567
key: ('El', 'Hovik'), value: 000-0000
key: ('Mike', 'Vanier'), value: 111-1111

We could use this to print out the phone numbers of every person in the dictionary whose first name is 'Joe':

for key in phone_numbers:
    (first_name, last_name) = key
    if first_name == 'Joe':
        print(f'name: {first_name} {last_name}, number: {phone_numbers[key]}')

Since there is only one 'Joe' in the dictionary, this will print:

name: Joe Smith, number: 567-8910
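Since each key is a two-element tuple, the loop could also unpack the key directly in the for line. This is just a sketch of an alternative style; the list called matches is made up here so that the result can be inspected afterwards.

```python
phone_numbers = {
  ('Joe', 'Smith') : '567-8910',
  ('Jane', 'Doe') : '123-4567',
  ('El', 'Hovik') : '000-0000',
  ('Mike', 'Vanier') : '111-1111',
}

matches = []
# Each key is a (first, last) tuple, so it can be unpacked right here.
for (first_name, last_name) in phone_numbers:
    if first_name == 'Joe':
        number = phone_numbers[(first_name, last_name)]
        matches.append((first_name, last_name, number))
        print(f'name: {first_name} {last_name}, number: {number}')
```

The trade-off is that the lookup has to rebuild the tuple from its two parts.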

Dictionary methods

Dictionaries are objects in Python (like lists, strings, and files). Therefore, they have methods. In this section, we’ll discuss a few of the most important ones. For a full list of dictionary methods, consult the Python documentation.

`get`

If you try to get the value in a dictionary corresponding to a key which isn’t in the dictionary, normally this results in a KeyError exception:

>>> phone_numbers[('William', 'Shakespeare')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: ('William', 'Shakespeare')

Instead of this, you can use the get method if you would rather return a default value:

>>> # 'unknown' is the default value returned if the key isn't in the dictionary
>>> phone_numbers.get(('William', 'Shakespeare'), 'unknown')
'unknown'

This is usually not what you want to do, but we’ll see an example below where this method is useful.
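One detail worth knowing: the default value is optional. If you leave it out and the key is missing, get returns None instead of raising an exception. A small sketch, with variable names made up for the illustration:

```python
phone_numbers = {('Joe', 'Smith'): '567-8910'}

# No default given and the key is missing: get returns None.
missing = phone_numbers.get(('William', 'Shakespeare'))
print(missing)      # prints None

# The key is present: get returns its value and ignores the default.
found = phone_numbers.get(('Joe', 'Smith'), 'unknown')
print(found)        # prints 567-8910
```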

`clear`

If you want to empty out an existing dictionary, you can do that with the clear method:

>>> phone_numbers
{('Joe', 'Smith'): '567-8910', ('Jane', 'Doe'): '123-4567', ('El', 'Hovik'): '000-0000', ('Mike', 'Vanier'): '111-1111'}
>>> phone_numbers.clear()
>>> phone_numbers
{}

This is rarely needed; it’s actually easier to just do this:

>>> phone_numbers = {}

On the other hand, if the dictionary was passed in as an argument to a function or was part of a larger data structure, you might need to use the clear method if you need to empty it out.
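The difference shows up clearly with a function argument. In this sketch (both function names are made up), assigning {} inside the function only changes what the local name refers to, while clear empties the dictionary the caller passed in.

```python
def reset_by_assignment(d):
    d = {}          # rebinds the local name d; the caller's dictionary is untouched

def reset_by_clear(d):
    d.clear()       # empties the very dictionary the caller passed in

phone_numbers = {('Joe', 'Smith'): '567-8910'}

reset_by_assignment(phone_numbers)
print(phone_numbers)    # prints {('Joe', 'Smith'): '567-8910'}

reset_by_clear(phone_numbers)
print(phone_numbers)    # prints {}
```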

`keys` and `values`

If you need the dictionary’s keys or values as a separate thing, you can use the keys or values methods. These return (respectively) a dict_keys or a dict_values object. These are "view" objects: you can loop over them much as you would loop over a list, and they can easily be converted to lists:

>>> phone_numbers = {
...   ('Joe', 'Smith') : '567-8910',
...   ('Jane', 'Doe') : '123-4567',
...   ('El', 'Hovik') : '000-0000',
...   ('Mike', 'Vanier') : '111-1111',
... }
>>> phone_numbers.keys()
dict_keys([('Joe', 'Smith'), ('Jane', 'Doe'), ('El', 'Hovik'), ('Mike', 'Vanier')])
>>> list(phone_numbers.keys())
[('Joe', 'Smith'), ('Jane', 'Doe'), ('El', 'Hovik'), ('Mike', 'Vanier')]
>>> phone_numbers.values()
dict_values(['567-8910', '123-4567', '000-0000', '111-1111'])
>>> list(phone_numbers.values())
['567-8910', '123-4567', '000-0000', '111-1111']

It’s rare that you actually need these methods.
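One thing that distinguishes the object returned by keys from a list made out of it: the object stays connected to the dictionary, while the list is a one-time copy. A minimal sketch, with variable names made up for the illustration:

```python
phone_numbers = {('Joe', 'Smith'): '567-8910'}

keys = phone_numbers.keys()     # stays connected to the dictionary
snapshot = list(keys)           # a separate list, copied right now

phone_numbers[('Jane', 'Doe')] = '123-4567'

print(len(keys))        # prints 2 (sees the new key)
print(len(snapshot))    # prints 1 (still the old copy)
```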

`items`

The items method is like the keys and values methods combined: it returns a dict_items object which can be converted to a list of key/value pairs:

>>> phone_numbers.items()
dict_items([(('Joe', 'Smith'), '567-8910'), (('Jane', 'Doe'), '123-4567'), (('El', 'Hovik'), '000-0000'), (('Mike', 'Vanier'), '111-1111')])
>>> list(phone_numbers.items())
[(('Joe', 'Smith'), '567-8910'), (('Jane', 'Doe'), '123-4567'), (('El', 'Hovik'), '000-0000'), (('Mike', 'Vanier'), '111-1111')]

Sometimes the items method can be used to good effect in a for loop.

>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 123-4567
('El', 'Hovik') 000-0000
('Mike', 'Vanier') 111-1111

You usually don’t need to convert the items return value into a list, and you normally shouldn’t. (In this respect, the items method is similar to the range function.)

`update`

The update method adds the key/value pairs from another dictionary into a dictionary, overwriting old values if the other dictionary has the same keys with different values.

>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 123-4567
('El', 'Hovik') 000-0000
('Mike', 'Vanier') 111-1111
>>> new_phone_numbers = {
...     ('Bob', 'Johnson') : '543-9876',
...     ('Jane', 'Doe') : '7654-321'
... }
>>> phone_numbers
{('Joe', 'Smith'): '567-8910', ('Jane', 'Doe'): '123-4567', ('El', 'Hovik'): '000-0000', ('Mike', 'Vanier'): '111-1111'}
>>> phone_numbers.update(new_phone_numbers)
>>> phone_numbers
{('Joe', 'Smith'): '567-8910', ('Jane', 'Doe'): '7654-321', ('El', 'Hovik'): '000-0000', ('Mike', 'Vanier'): '111-1111', ('Bob', 'Johnson'): '543-9876'}
>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 7654-321
('El', 'Hovik') 000-0000
('Mike', 'Vanier') 111-1111
('Bob', 'Johnson') 543-9876

We see that updating the phone_numbers dictionary with new_phone_numbers has provided a new phone number for Bob Johnson and has overwritten the old phone number for Jane Doe.

What about `append`?

There is no append method for dictionaries, because it’s not needed! To add a new key/value pair, just use normal assignment syntax:

>>> phone_numbers[('Don', 'Knuth')] = '271-8281'
>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 7654-321
('El', 'Hovik') 000-0000
('Mike', 'Vanier') 111-1111
('Bob', 'Johnson') 543-9876
('Don', 'Knuth') 271-8281

The `in` operator

Previously we’ve seen the in operator for sequences. We can also use in with dictionaries. <key> in <dictionary> means: is the key <key> one of the keys in the dictionary <dictionary>?

>>> ('Don', 'Knuth') in phone_numbers
True
>>> ('Bill', 'Gates') in phone_numbers
False
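Note that in only looks at the keys, never at the values. If you really need to ask whether something is one of the values, you can use in with the values method, as in this small sketch (the variable names are made up):

```python
phone_numbers = {('Don', 'Knuth'): '271-8281'}

key_found = ('Don', 'Knuth') in phone_numbers
value_as_key = '271-8281' in phone_numbers              # a value is not a key
value_found = '271-8281' in phone_numbers.values()

print(key_found)        # prints True
print(value_as_key)     # prints False
print(value_found)      # prints True
```

Checking the values has to look through them one at a time, so it doesn’t have the speed advantage that key lookup has.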

Example: creating a frequency table

OK, let’s do something useful!

We have a list of words. We want to create a frequency table for the list, which means that for each word, we want to record the number of times it occurs in the word list.

We will solve this by creating a dictionary:

key: a word in the list
value: the count of that word

Let’s write the code, and also print out the resulting table at the end.

words = ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
freqs = {}
for word in words:
    if word in freqs:
        freqs[word] += 1
    else:  # first time we've seen that word
        freqs[word] = 1
for (key, value) in freqs.items():
    print(f'Word: {key} occurs: {value} times')

This prints:

Word: to occurs: 2 times
Word: be occurs: 2 times
Word: or occurs: 1 times
Word: not occurs: 1 times
Word: that occurs: 1 times
Word: is occurs: 1 times
Word: the occurs: 1 times
Word: question occurs: 1 times

See how easy that was? Dictionaries can make many programming tasks much easier to accomplish.

Of course, it’s pretty rare to find any code that can’t be improved somewhere... What can we do here?

Remember the get method we described above? The idea there was that if the key wasn’t in the dictionary, we would supply a default value to return. Here, we have a similar situation, except that we are setting the values in a dictionary. But if you look closely, you’ll see that the line

        freqs[word] += 1

is equivalent to:

        freqs[word] = freqs[word] + 1

which means that this line is both getting a value from a dictionary at a particular key and setting the value at the same key.

The trick to making this code simpler is to realize that when the key isn’t in the dictionary, we can use the get method to just return a count of 0. Then the code simplifies to this:

words = [...]  # as before
freqs = {}
for word in words:
    freqs[word] = freqs.get(word, 0) + 1
for (key, value) in freqs.items():
    print(f'Word: {key} occurs: {value} times')

We aren’t using the += operator any more, so line 4 is longer, but we’ve eliminated the if statement entirely. This counts as a win.
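As an aside, counting things is so common that Python’s standard library has a ready-made tool for it: the Counter class in the collections module, which is a special kind of dictionary. It isn’t part of this reading, so treat this as a sketch for comparison only; it builds the table both ways and checks that they agree.

```python
from collections import Counter

words = ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']

# The version from this reading, using get with a default of 0.
freqs = {}
for word in words:
    freqs[word] = freqs.get(word, 0) + 1

# The standard-library shortcut: Counter does the counting for us.
counts = Counter(words)

print(dict(counts) == freqs)    # prints True
print(counts['to'])             # prints 2
```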

Conclusion

You may think that this is just another reading, but if you continue programming in Python we guarantee you that dictionaries will be one of the most useful things you ever learn. They are used everywhere, and learning to use them effectively will take you a long way towards becoming a good Python programmer.

[End of reading]

1. Dictionaries allow you to look up the value corresponding to a key in constant time, which means that the time to retrieve a value given the key doesn’t depend on how many key/value pairs are stored in the dictionary. In contrast, searching for a key/value pair in a list of tuples takes linear time i.e. time proportional to the size of the list. You’ll learn much more about these things in CS 2.

2. The reasons for this are technical, having to do with the way that dictionaries are implemented. Internally, they use what are called hash tables, and for a hash table to work correctly, the hash value of a dictionary key shouldn’t change, which means that the key itself shouldn’t change. You will meet hash tables again in CS 2, CS 11 C track, and many other courses.

3. Like immutable sets of values. Don’t worry, we’ll get there.

4. Yes, here it is, yet another meaning for the poor overused colon character.

5. Some languages *cough* *JavaScript* return a special "undefined" value, which creates problems because programmers rarely check for "undefined" after doing every dictionary access.

6. If you read the previous footnotes, you won’t be surprised to see the word "unhashable", which relates to hash tables, the data structure used to implement dictionaries.