CS 1 Reading 13: range and list slices

Overview

This reading supplements Lab02 on Lists and Loops, extending what we have learned so far about lists and loops. In particular, you will learn how to "slice" sequences to extract a "sublist" using slice notation, and how to use the range function to loop a desired number of times in a for loop. The enumerate function will also be introduced.

These topics are very Python-specific (for example, Java doesn't support slicing or range), but that doesn’t mean they aren’t important. They can make your programs more concise and easier to understand, and they are well worth learning.

Topics

More about the in operator
The range function
The enumerate function
Sequence slices

The `in` operator

We have seen the Python keyword in used as part of a for loop:

>>> for i in [1, 2, 3]:
...     print(i)
...
1
2
3

(Note the ... secondary prompt, by the way.)

However, in has a completely different meaning when it is used as an operator by itself (i.e. when it is put between two Python expressions, rather than in the header of a for loop). In this case, it tests whether a Python value is found inside a data structure (like a list) that contains other Python values.

For instance,

1 in [1, 2, 3]

means: "does 1 occur in the list [1, 2, 3]?", and

't' in 'Caltech'

means: "does the character 't' appear in the string 'Caltech'?" Used as an operator this way, in returns a True/False value:

>>> 1 in [1, 2, 3]
True
>>> 0 in [1, 2, 3]
False
>>> 't' in 'Caltech'
True
>>> 'z' in 'Caltech'
False

With strings, you can do even more: you can test if a string is found anywhere inside another string:

>>> 'alt' in 'Caltech'
True
>>> 'Caltech' in 'Caltech'
True
>>> 'MIT' in 'Caltech'
False

This doesn’t work for lists:

>>> [1, 2] in [1, 2, 3]
False

This is because a list could conceivably have another list as one of its elements, whereas a string can only be made up of individual characters.
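To see what in is really checking for lists, compare a list that has [1, 2] as one of its elements with one that doesn't. This is a small sketch; the name nested is just for illustration.

```python
# `in` on a list asks: "is this value equal to one of the list's elements?"
nested = [[1, 2], 3]          # a list whose first element is the list [1, 2]
print([1, 2] in nested)       # True: [1, 2] is an element of nested
print([1, 2] in [1, 2, 3])    # False: no single element of [1, 2, 3] equals [1, 2]
```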

You can use the in operator with variables, too:

>>> x = 1
>>> x in [1, 2, 3]
True
>>> y = [1, 2, 3]
>>> x in y
True
>>> 1 in y
True

Be aware that in used as an operator has nothing to do with in used in a for loop! Python overloads the keyword in to do two completely different things. Most of the time the meaning is obvious from context, but since in used as an operator returns a True/False (boolean) value, you will often see it in an if statement, e.g.:

if x in [1, 2, 3]:
    print('Found!')

This might be confusing, because you are seeing in after an if, where you may be more used to seeing it after a for. You can even have both forms next to each other:

for line in lines:    # for loop
    if 'Z' in line:   # in used as an operator
        print('Found a Z!')
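The fragment above assumes a variable lines already exists. Here is a runnable version that uses a hypothetical hard-coded list in place of lines read from a file:

```python
# Hypothetical sample data standing in for lines read from a file.
lines = ['ABC', 'XYZ', 'zebra', 'ZZ top']

for line in lines:          # `in` as part of a for loop
    if 'Z' in line:         # `in` as an operator: substring test
        print('Found a Z!')
```

This prints Found a Z! twice (for 'XYZ' and 'ZZ top'); 'zebra' doesn't match because the test is case-sensitive.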

The `range` function

The built-in function range is used to generate a sequence of consecutive integers. Very often, these are intended to be used inside a for loop.

>>> for n in range(0, 5):
...     print(n)
...
0
1
2
3
4

If you call range outside of a for loop, you can see that it returns a range object:

>>> range(0, 5)
range(0, 5)

The output here isn’t a string! It’s just the way Python displays a range object (the same text you would get from str(range(0, 5))).

A range object can be looped over (as a file object can), and it produces (in this case) the numbers 0, 1, 2, 3, and 4 in order. If we want, we can convert a range object to a list: ^[3]

>>> list(range(0, 5))
[0, 1, 2, 3, 4]

We will often use this trick to show exactly what values range is capable of returning. However, most of the time we use range without converting it to a list.

If you unnecessarily convert a range object to a list, it will probably work, but you will lose marks for writing unnecessary code. (It’s also slower, and wastes space if the range is very large.)
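To see why the space cost matters, here is a sketch that compares the memory used by a large range object with the list made from it. It uses sys.getsizeof, which reports an object's size in bytes; the exact numbers depend on your Python version and platform, so the sketch only compares them. A range object stores just a few numbers (its start, end, and step), not every element.

```python
import sys

r = range(1_000_000)
print(len(r))                  # 1000000, computed without building a list
# Exact sizes vary by Python version and platform, so we only compare them.
print(sys.getsizeof(r) < sys.getsizeof(list(r)))   # True
```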

range is a flexible function. Before we explain exactly how it works, here are a "range" of examples:

>>> list(range(0, 5))
[0, 1, 2, 3, 4]
>>> list(range(10, 20))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> list(range(-10, -5))
[-10, -9, -8, -7, -6]

`range` with two arguments

range's arguments are always integers. If there are two integer arguments, they represent the endpoints of the range. Specifically:

range(m, n)

means to create a range that starts with (and includes) the integer m and stops just before (and excludes) the integer n.

Another way to say this is that range(m, n) creates a range of integers going "from" m and "up to but not including" n.

So list(range(0, 5)) results in [0, 1, 2, 3, 4], and not [0, 1, 2, 3, 4, 5]. (Forgetting this is a common beginner’s error.) This may seem unintuitive or even wrong, but as you’ll see, it turns out to be the most natural choice.
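Here is a small illustration of two conveniences that come from excluding the endpoint (the particular numbers are arbitrary): the size of range(m, n) is simply n - m, and two ranges that meet at the same number join up with no gap and no overlap.

```python
m, n = 3, 8

# A range that excludes its endpoint has exactly n - m elements.
print(len(range(m, n)))        # 5

# Two ranges that meet at k join up with no gap and no overlap.
k = 5
print(list(range(m, k)) + list(range(k, n)) == list(range(m, n)))   # True
```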

range's arguments must be integers, but they don’t have to be positive integers. For instance:

>>> list(range(-10, -5))
[-10, -9, -8, -7, -6]

This creates a range that starts on the first argument (-10) and goes up to the second argument (-5) without including it. So the last element in the range is -6.

A puzzle

What does this range expression return? (Type this into the Python interpreter.)

list(range(10, 1))

Does this make sense given what we have already told you? What about list(range(10, 10))?

`range` with 3 arguments

The range function is a bit unusual in that it can take 1, 2, or 3 arguments. ^[4] With 3 arguments, the first two mean the same thing as they do for range with 2 arguments. The last argument is the step size, which means how much to increase the range value at each step. Again, we’ll convert ranges to lists for illustration only:

>>> list(range(0, 10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(0, 10, 2))
[0, 2, 4, 6, 8]
>>> list(range(0, 10, 3))
[0, 3, 6, 9]

Notice that each pair of consecutive elements in a range created with the 3-argument form of range differs by the step size. If the third argument is not provided, Python assumes you want a step size of 1.

When the step size is positive, the rule for when to end the range is simple: if the next element in the range would be equal to or greater than the second argument, don’t include it. So it’s still "starting from the first argument, going up to but not including the second argument".

You can even have negative step sizes:

>>> list(range(10, 0, -1))
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

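This doesn’t include 0 at the end. With a negative step size the rule is flipped: the range stops as soon as the next element would be equal to or less than the second argument, so here we are going "down to but not including" 0.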
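To make the stopping rule concrete, here is a sketch of a hypothetical helper, my_range_list (not part of Python), that builds the same list as list(range(start, stop, step)) using a while loop. With a positive step it stops once the next value would be at or past the end going up; with a negative step, once it would be at or past the end going down. The real range also rejects a step of 0, which the sketch imitates.

```python
def my_range_list(start, stop, step=1):
    """Hypothetical helper: return the list that list(range(start, stop, step))
    produces, following the stopping rule for positive and negative steps."""
    if step == 0:
        raise ValueError('step must not be zero')
    result = []
    value = start
    if step > 0:
        while value < stop:      # stop once value >= stop
            result.append(value)
            value += step
    else:
        while value > stop:      # negative step: stop once value <= stop
            result.append(value)
            value += step
    return result

print(my_range_list(0, 10, 3))    # [0, 3, 6, 9]
print(my_range_list(10, 0, -1))   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```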

`range` with 1 argument

range is actually most often used with a single argument only. We said above that range with two arguments assumes that the (missing) step size argument is 1. Similarly, range called with only one argument assumes that the starting point is 0.

To recap: range "really" takes three arguments: a starting point, an endpoint, and a step size. If there are only two, the (missing) step size "defaults" to 1. If there is only one argument, it is the endpoint: the (missing) starting point defaults to 0, and the (missing) step size defaults to 1. The endpoint argument always has to be included.

>>> list(range(0, 10, 1))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(0, 10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

All three forms of range shown here generate the same range. So if the starting point of your range is 0 and the step size is 1, you should use the one-argument form of range.
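A range object remembers its start, end, and step in the attributes start, stop, and step, so you can check the defaults directly. Ranges that produce the same integers also compare equal (this holds in Python 3.3 and later).

```python
r = range(10)
# The defaults filled in for the one-argument form:
print(r.start, r.stop, r.step)                 # 0 10 1
# All three forms describe the same range.
print(r == range(0, 10) == range(0, 10, 1))    # True
```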

`range` in `for` loops

range has lots of uses, but it’s most commonly used in a for loop to generate a sequence of consecutive integers. We saw this above:

>>> for n in range(0, 5):
...     print(n)
...
0
1
2
3
4

More usefully, we can use range to generate the indices of a list:

>>> cheer = ['Caltech', 'is', 'great']
>>> for i in range(0, 3):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
great -- YEAH!

Notice that the choice to exclude the endpoint for the range works perfectly with the way that lists are indexed! The valid list indices for the cheer list are 0, 1, and 2 because it’s a list of length 3 — and these are exactly the integers that range(0, 3) generates.

This is OK, but notice that we had to put the length of the list (3) directly into the call to range. This is ugly! Fortunately, it’s easy to fix.

The `range(len(...))` idiom

Since you can get the length of a list by using the len function, we can improve this code as follows:

>>> cheer = ['Caltech', 'is', 'great']
>>> for i in range(0, len(cheer)):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
great -- YEAH!

Also, since the first argument to range is 0, we can drop it:

>>> cheer = ['Caltech', 'is', 'great']
>>> for i in range(len(cheer)):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
great -- YEAH!

This range(len(...)) pattern is a common idiom in Python. ^[5] The nice thing about it is that if cheer changes, even to a list of a different length, the for loop doesn’t have to change:

>>> cheer = ['Caltech', 'is', 'really', 'really', 'great']
>>> for i in range(len(cheer)):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
really -- YEAH!
really -- YEAH!
great -- YEAH!

Notice, though, that you don’t actually need range(len(...)) in this example, because you don’t need the indices of the list:

>>> cheer = ['Caltech', 'is', 'really', 'really', 'great']
>>> for word in cheer:
...     print('{} -- YEAH!'.format(word))
...
Caltech -- YEAH!
is -- YEAH!
really -- YEAH!
really -- YEAH!
great -- YEAH!

However, if you need to change an element in a list that you are looping over, you will need its index, and in that case, the range(len(...)) idiom is useful. In fact, it used to be one of the standard ways to iterate through a list, but now we prefer to use the enumerate function (described below).

Here’s a simple example of changing the elements of a list:

# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] = nums[i] * 2

This works, but we would normally shorten it using the *= operator:

# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] *= 2
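Why do we need the index at all? Here is a sketch contrasting the two kinds of loop. Looping over the elements directly gives you a variable that names each element, but assigning to that variable doesn't store anything back into the list; assigning to nums[i] does.

```python
nums = [23, 12, 45, 68, -101]

# Looping over the elements directly: x is just a name for each element,
# and x *= 2 rebinds that name to a new number. The list is not changed.
for x in nums:
    x *= 2
print(nums)      # [23, 12, 45, 68, -101]

# Looping over the indices: nums[i] *= 2 stores the new value back into the list.
for i in range(len(nums)):
    nums[i] *= 2
print(nums)      # [46, 24, 90, 136, -202]
```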

The `enumerate` function

Earlier, we saw this code:

# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] *= 2

The purpose of range(len(nums)) is to produce all the valid indices of the nums list (which we know are 0 to 4). This seems like a lot of work for something so simple.

A different, and more modern, way to write this code is as follows:

nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2

What the built-in enumerate function does is to take a sequence and generate tuples of indices (i) and elements (e) one at a time. So the first time through the loop body, i will be 0 and e will be 23; the second time i will be 1 and e will be 12, and so on.
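To make this concrete, here is a hypothetical helper, my_enumerate_list (not part of Python), that builds as a list the same (index, element) pairs that enumerate generates for a sequence. It is only an illustration: the real enumerate produces the pairs one at a time instead of building a list, and it works on any iterable, not just sequences that support len.

```python
def my_enumerate_list(seq):
    """Hypothetical helper (not part of Python): return the same
    (index, element) pairs that list(enumerate(seq)) would give."""
    pairs = []
    for i in range(len(seq)):
        pairs.append((i, seq[i]))
    return pairs

nums = [23, 12, 45, 68, -101]
print(my_enumerate_list(nums))   # [(0, 23), (1, 12), (2, 45), (3, 68), (4, -101)]
```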

Since this is tuple unpacking, we can leave off the parentheses around (i, e), and some programmers think this looks better:

nums = [23, 12, 45, 68, -101]
for i, e in enumerate(nums):
    nums[i] *= 2

We prefer to keep the parentheses.

If you use enumerate directly, you will see that it returns an enumerate object:

>>> enumerate(nums)
<enumerate object at 0x107cc3a80>

Like a list or a range object, an enumerate object can be looped over. It generates the (i, e) tuples one at a time, so it can be used in a for loop like any other iterable. If you want to see what an enumerate object will generate, you can convert it to a list:

>>> list(enumerate(nums))
[(0, 23), (1, 12), (2, 45), (3, 68), (4, -101)]

However, don’t do this when using an enumerate in a for loop, since it’s totally unnecessary.
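One difference worth knowing about: an enumerate object can only be gone through once, while a range object can be looped over again and again. This sketch shows it by converting each to a list twice, which is one more reason to call enumerate right in the header of the for loop that uses it.

```python
nums = [23, 12, 45, 68, -101]

pairs = enumerate(nums)
print(list(pairs))   # [(0, 23), (1, 12), (2, 45), (3, 68), (4, -101)]
print(list(pairs))   # [] because the enumerate object is already used up

r = range(3)
print(list(r))       # [0, 1, 2]
print(list(r))       # [0, 1, 2] since a range object can be reused
```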

Getting back to our example:

nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2

You might have noticed that we don’t use the e variable anywhere. It’s just there to make enumerate happy. What if we just left it off?

nums = [23, 12, 45, 68, -101]
for i in enumerate(nums):
    nums[i] *= 2

This is not a syntax error, but it won’t work either. In this case, the i variable will have the entire tuple assigned to it, so the first value of i would be (0, 23). The line nums[i] *= 2 will then fail with a TypeError, because list indices must be integers (or slices), not tuples.
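Here is a sketch you can run to see the failure for yourself. The first loop just prints the first value of i and stops; the second catches the TypeError so the program can keep going and show that nothing in the list was changed.

```python
nums = [23, 12, 45, 68, -101]

for i in enumerate(nums):
    print(i)         # (0, 23): i is the whole (index, element) tuple
    break            # stop after the first pass; we only want to look

try:
    for i in enumerate(nums):
        nums[i] *= 2
except TypeError as err:
    print('TypeError:', err)

print(nums)          # [23, 12, 45, 68, -101]: nothing was doubled
```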

If you want to say "I know there is supposed to be a variable here, but I don’t need it", the standard way to do that is to use the variable name _, which means "I don’t care about this variable". Our example then becomes:

nums = [23, 12, 45, 68, -101]
for (i, _) in enumerate(nums):
    nums[i] *= 2

This is really not much of an improvement over the range(len(nums)) code, but it is the preferred way to write this. It would be nice if there were a variant of enumerate that only returned the indices.

In fact, you could easily write one:

def enum(seq):
    # Return the valid indices of seq; seq must be a sequence that supports len().
    return range(len(seq))

and then you could re-write the example as:

nums = [23, 12, 45, 68, -101]
for i in enum(nums):
    nums[i] *= 2

but there is no enum-like function in the Python standard libraries as far as we know.

Sequence slices

This is a long reading, but we’ve saved the best for last. It’s very common, when working with sequences, to want to get more than one element from the sequence. For instance, you might have a DNA sequence like this:

seq = 'ATTGGCGCGTTA'

and you might want to get the subsequence starting from index 3 and going up to (but not including) index 9. This would be the sequence 'GGCGCG'. Python allows you to get this all at once using a sequence slice, which is a copy of part of the sequence:

>>> seq = 'ATTGGCGCGTTA'
>>> seq[3:9]  # seq[3:9] is a sequence slice
'GGCGCG'

This works for all kinds of sequences, not just strings:

>>> lst = [1, 2, 3, 4, 5]
>>> lst[1:4]
[2, 3, 4]

Slice syntax

A sequence slice (which we’ll just call a slice from now on) has this syntax:

seq[start:end]

where:

seq is a sequence
start is the integer index of the first location of the slice
end is the integer index that is one location beyond the last location in the slice
the colon character (:) separates the start and end parts

Note that this is yet another special meaning for the poor colon character. ^[10]

The start and end indices are optional. If start is not included, it defaults to 0. If end is not included, it defaults to the length of the sequence.
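Here is a sketch tying slices back to range: the slice seq[start:end] contains exactly the elements at the indices that range(start, end) produces, and leaving out start or end gives the same result as writing 0 or len(seq). It reuses the DNA string from above.

```python
seq = 'ATTGGCGCGTTA'
start, end = 3, 9

# seq[start:end] holds the elements at the indices range(start, end) produces.
by_loop = ''
for i in range(start, end):
    by_loop += seq[i]
print(seq[start:end], by_loop)              # GGCGCG GGCGCG

# Leaving out start or end uses the defaults: 0 and len(seq).
print(seq[:end] == seq[0:end])              # True
print(seq[start:] == seq[start:len(seq)])   # True
```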

Some examples:

>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[1:5]
[20, 30, 40, 50]
>>> lst[0:6]  # the whole list
[10, 20, 30, 40, 50, 60]
>>> lst[1:]   # all but the first element
[20, 30, 40, 50, 60]
>>> lst[:5]  # all but the last element
[10, 20, 30, 40, 50]
>>> lst[:]    # the entire list
[10, 20, 30, 40, 50, 60]
>>> tup = (4, 8, 10, 25, 46)
>>> tup[1:3]
(8, 10)
>>> tup[1:2]
(8,)
>>> tup[1:1]
()
>>> s = 'this is a test'
>>> s[4:8]
' is '
>>> s[4:]
' is a test'
>>> s[:4]
'this'

Remember that a slice is a copy of part of a sequence, so e.g. lst[:] is a very simple way to make a copy of a list. (You can also write lst.copy().)
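This sketch shows why the copy matters: changing the slice copy leaves the original alone, whereas plain assignment without a slice just gives the same list a second name.

```python
lst = [10, 20, 30]

lst_copy = lst[:]    # a new list with the same elements
lst_copy.append(40)
print(lst)           # [10, 20, 30]: the original is unchanged
print(lst_copy)      # [10, 20, 30, 40]

alias = lst          # no slice: just a second name for the same list
alias.append(99)
print(lst)           # [10, 20, 30, 99]
```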

You can use negative indices too:

>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[:-1]  # all but the last element
[10, 20, 30, 40, 50]
>>> lst[-2:]  # last two elements
[50, 60]
>>> lst[-5:-3]
[20, 30]
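One way to read a negative index in a slice: -k refers to the same position as len(lst) - k. This sketch checks that for the slices above.

```python
lst = [10, 20, 30, 40, 50, 60]
n = len(lst)

# A negative index -k in a slice means the same position as n - k.
print(lst[-5:-3] == lst[n - 5:n - 3])   # True (both are [20, 30])
print(lst[-2:] == lst[n - 2:])          # True (both are [50, 60])
```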

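One common application of this is to remove the newline character from the end of a line read from a file using the readline method. The last line of a file might not end with a newline, so it is safest to check before slicing:

file = open('nums.txt', 'r')
line = file.readline()
file.close()
if line.endswith('\n'):
    line = line[:-1]  # remove newline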

The start and end indices of a slice don’t have to be literal integers; they can be expressions that evaluate to integers.

>>> lst = [10, 20, 30, 40, 50, 60]
>>> n = 2
>>> lst[n-1:n+2]
[20, 30, 40]

In this case, the expressions on either side of the colon are evaluated before the slice is computed.

If the slice’s final index is greater than the index of the last element, the slice ends at the last element.

>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[2:3000]
[30, 40, 50, 60]

Note that this is not an error.
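Slicing is forgiving in a way that plain indexing is not. This sketch contrasts the two: a slice that runs past the end just stops at the end (or is empty), while indexing past the end raises an IndexError.

```python
lst = [10, 20, 30, 40, 50, 60]

print(lst[2:3000])   # [30, 40, 50, 60]: the slice stops at the end of the list
print(lst[3000:])    # []: a start past the end gives an empty slice

try:
    lst[3000]        # plain indexing past the end *is* an error
except IndexError as err:
    print('IndexError:', err)
```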

Wrapping up and looking forward

Even though this was a long reading, there are still more "odds and ends" of Python that we haven’t covered. We will see more of these later on in the course.

[End of reading]

1. Some people view this as a language wart, i.e. something in the language that ought to be changed. Do we feel that way? No comment.

3. Back in the long-ago days of Python 2, range actually did return a list.

4. Python allows you to define functions like this that can take varying numbers of arguments, although this is rarely needed. Later in the course, we’ll show you how to do this with your own functions.

5. Idiom just means "a typical way to write something".

7. Actually, go ahead and try it! You’ll see what we mean.

8. It has to do with dictionaries, if you must know. Stay tuned!

10. When we talk about dictionaries, you’ll see that colons have another special meaning there too.
