Overview
This reading supplements Lab02 on Lists and Loops, extending what we have learned so far about lists and loops. In particular, you will learn how to "slice" a sequence to extract part of it (for example, a "sublist" of a list) using slice notation, and how to use the range function to run a for loop a desired number of times. The enumerate function will also be introduced.
These topics are very Python-specific (for example, Java doesn't support slicing or range), but that doesn’t mean they aren’t important. They can make your programs more concise and easier to understand, and they are well worth learning.
Topics
- More about the in operator
- The range function
- The enumerate function
- Sequence slices
The in operator
We have seen the Python keyword in in the context of a for loop:
>>> for i in [1, 2, 3]:
...     print(i)
...
1
2
3
(Note the ... secondary prompt, by the way.)
However, in has a completely different meaning when used as an operator all by itself (i.e. when it is put between Python expressions, but not in a line with for). In this case, it is a test to see if a Python value is found inside a data structure (like a list) that contains other Python values. For instance, 1 in [1, 2, 3] means: "does 1 occur in the list [1, 2, 3]?", and 't' in 'Caltech' means: "does the character 't' appear in the string 'Caltech'?" When in is used as an operator this way, it returns a True/False value:
>>> 1 in [1, 2, 3]
True
>>> 0 in [1, 2, 3]
False
>>> 't' in 'Caltech'
True
>>> 'z' in 'Caltech'
False
With strings, you can do even more: you can test if a string is found anywhere inside another string:
>>> 'alt' in 'Caltech'
True
>>> 'Caltech' in 'Caltech'
True
>>> 'MIT' in 'Caltech'
False
This doesn’t work for lists, where in only checks whether the value on the left is equal to one of the list's elements:
>>> [1, 2] in [1, 2, 3]
False
This is because a list could conceivably have another list as one of its elements, whereas a string can only be made up of individual characters.
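A minimal sketch of the point above: on a list, in compares the left-hand value against each element as a whole, so a list is "found" only when it is itself one of the elements. The variable names here are illustrative only.

```python
# On a list, `in` compares the left-hand value with each element as a whole.
flat = [1, 2, 3]
nested = [[1, 2], 3]

in_flat = [1, 2] in flat        # no element of flat is equal to [1, 2]
in_nested = [1, 2] in nested    # the first element of nested is [1, 2]

# On a string, `in` is a substring test instead.
in_string = 'alt' in 'Caltech'

print(in_flat, in_nested, in_string)  # False True True
```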
You can use the in operator with variables, too:
>>> x = 1
>>> x in [1, 2, 3]
True
>>> y = [1, 2, 3]
>>> x in y
True
>>> 1 in y
True
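Be aware that the in operator is often used in the test of an if statement:
if x in [1, 2, 3]:
    print('Found!')
This might be confusing, because you may see the in keyword used in both of its senses in the same piece of code:
for line in lines:       # for loop
    if 'Z' in line:      # in used as an operator
        print('Found a Z!')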
The range function
The built-in function range is used to generate a sequence of consecutive integers. Very often, these are intended to be used inside a for loop.
>>> for n in range(0, 5):
...     print(n)
...
0
1
2
3
4
If you use the range function outside of a for loop, it will just return a range object.
>>> range(0, 5)
range(0, 5)
The output here isn’t a string! It’s just the way that Python represents the range object as a string (like if you did str(range(0, 5))).
A range object is iterable (like file objects are): looping over it produces (in this case) the numbers 0, 1, 2, 3, and 4 in order. If we want, we can convert a range object to a list: [3]
>>> list(range(0, 5))
[0, 1, 2, 3, 4]
We will often use this trick to show exactly what values range is capable of returning. However, most of the time we use range without converting it to a list.
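As a rough illustration of why the list conversion is rarely needed, a range object already supports the usual sequence operations directly. The variable names below are illustrative only.

```python
r = range(0, 5)

length = len(r)      # how many values the range produces
third = r[2]         # indexing works without building a list
has_four = 4 in r    # membership test
has_five = 5 in r    # the endpoint is excluded

total = 0
for n in r:          # looping works directly on the range object
    total += n

print(length, third, has_four, has_five, total)  # 5 2 True False 10
```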
If you unnecessarily convert a |
range is a flexible function. Before we explain exactly how it works, here are a "range" of examples:
>>> list(range(0, 5))
[0, 1, 2, 3, 4]
>>> list(range(10, 20))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> list(range(-10, -5))
[-10, -9, -8, -7, -6]
range with two arguments
range's arguments are always integers. If there are two integer arguments, they represent the endpoints of the range. Specifically, range(m, n) means to create a range
- starting with and including the integer m
- ending with and excluding the integer n
Another way we say this is that range(m, n) creates a range of integers going "from" m and "up to but not including" n.
So list(range(0, 5)) results in [0, 1, 2, 3, 4], and not [0, 1, 2, 3, 4, 5]. (Forgetting this is a common beginner’s error.) This may seem unintuitive or even wrong, but as you’ll see, it turns out to be the most natural choice.
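One way to see why excluding the endpoint is convenient (a sketch; the particular numbers are arbitrary): range(m, n) always produces exactly n - m values, and two ranges that share an endpoint fit together with no gap and no overlap.

```python
m, n = 3, 8

count = len(range(m, n))            # n - m values: 3, 4, 5, 6, 7

first_half = list(range(0, 5))      # stops just before 5
second_half = list(range(5, 10))    # starts exactly at 5
joined = first_half + second_half   # no gap and no duplicate at 5

print(count)                          # 5
print(joined == list(range(0, 10)))   # True
```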
range's arguments must be integers, but they don’t have to be positive integers. For instance:
>>> list(range(-10, -5))
[-10, -9, -8, -7, -6]
This creates a range that starts on the first argument (-10) and goes up to the second argument (-5) without including it. So the last element in the range is -6.
A puzzle
What does this range expression return? (Type this into the Python interpreter.)
list(range(10, 1))
Does this make sense given what we have already told you? What about list(range(10, 10))?
range with 3 arguments
The range function is a bit unusual in that it can take 1, 2, or 3 arguments. [4] With 3 arguments, the first two mean the same thing as they do for range with 2 arguments. The last argument is the step size, which means how much to increase the range value at each step. Again, we’ll convert ranges to lists for illustration only:
>>> list(range(0, 10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(0, 10, 2))
[0, 2, 4, 6, 8]
>>> list(range(0, 10, 3))
[0, 3, 6, 9]
Notice that each pair of consecutive elements in a range object created from the 3-argument form of range differs by the step size. If the third argument is not provided, Python assumes you want a step size of 1.
When using the third argument, the rule for when to end the range is simple: if the next element in the range is equal to or greater than the second argument, don’t include it. So it’s still "starting from the first argument, going up to but not including the second argument".
You can even have negative step sizes:
>>> list(range(10, 0, -1))
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
With a negative step size the range counts down, so the rule is mirrored: the range stops as soon as the next element would be equal to or less than the second argument. This doesn’t include 0 at the end because we are going "down to but not including" 0.
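A small sketch of how the stopping rule plays out with negative step sizes: the values count down and stop before reaching the second argument. The particular numbers are chosen only for illustration.

```python
down_by_one = list(range(10, 0, -1))    # stops before reaching 0
down_by_three = list(range(10, 0, -3))  # 10, 7, 4, 1; the next value (-2) is past 0

# Counting down through the valid indices of a 5-element list:
# start at 4 and stop before -1, so that 0 is included.
last_to_first = list(range(4, -1, -1))

print(down_by_one)     # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(down_by_three)   # [10, 7, 4, 1]
print(last_to_first)   # [4, 3, 2, 1, 0]
```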
range with 1 argument
range is actually most often used with a single argument only. We said above that range with two arguments assumes that the (missing) step size argument is 1. Similarly, range called with only one argument assumes that the starting point is 0.
To recap: range "really" takes three arguments. If there are only two, the last one "defaults" to 1. If there is only one argument, the (missing) first argument defaults to 0, and the (missing) last argument defaults to 1. The endpoint argument always has to be supplied.
>>> list(range(0, 10, 1))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(0, 10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
All three forms of range shown here generate the same range. So if the starting point of your range is 0 and the step size is 1, you should use the one-argument form of range.
range in for loops
range has lots of uses, but it’s most commonly used in a for loop to generate a sequence of consecutive integers. We saw this above:
>>> for n in range(0, 5):
...     print(n)
...
0
1
2
3
4
More usefully, we can use range to generate the indices of a list:
>>> cheer = ['Caltech', 'is', 'great']
>>> for i in range(0, 3):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
great -- YEAH!
Notice that the choice to exclude the endpoint for the range works perfectly with the way that lists are indexed! The valid list indices for the cheer list are 0, 1, and 2 because it’s a list of length 3, and these are exactly the integers that range(0, 3) generates.
This is OK, but notice that we had to put the length of the list (3) directly into the call to range. This is ugly! Fortunately, it’s easy to fix.
The range(len(...)) idiom
Since you can get the length of a list by using the len function, we can improve this code as follows:
>>> cheer = ['Caltech', 'is', 'great']
>>> for i in range(0, len(cheer)):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
great -- YEAH!
Also, since the first argument to range is 0, we can drop it:
>>> cheer = ['Caltech', 'is', 'great']
>>> for i in range(len(cheer)):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
great -- YEAH!
This range(len(...)) pattern is a common idiom in Python. [5] The nice thing about this is that even if cheer were changed, the for loop wouldn’t have to change:
>>> cheer = ['Caltech', 'is', 'really', 'really', 'great']
>>> for i in range(len(cheer)):
...     print('{} -- YEAH!'.format(cheer[i]))
...
Caltech -- YEAH!
is -- YEAH!
really -- YEAH!
really -- YEAH!
great -- YEAH!
Notice, though, that you don’t actually need to use range(len(...)) in this example, because you don’t need the indices of the list:
>>> cheer = ['Caltech', 'is', 'really', 'really', 'great']
>>> for word in cheer:
...     print('{} -- YEAH!'.format(word))
...
Caltech -- YEAH!
is -- YEAH!
really -- YEAH!
really -- YEAH!
great -- YEAH!
However, if you need to change an element in a list that you are looping over, you will need its index, and in that case, the range(len(...)) idiom is useful. In fact, it used to be one of the standard ways to iterate through a list, but now we prefer to use the enumerate function (described below).
Here’s a simple example of changing the elements of a list:
# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] = nums[i] * 2
This works, but we would normally shorten it using the *= operator:
# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] *= 2
The enumerate function
Earlier, we saw this code:
# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] *= 2
The purpose of range(len(nums)) is to produce all the valid indices of the nums list (which we know are 0 to 4). This seems like a lot of work for something so simple.
A different, and more modern, way to write this code is as follows:
nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2
What the built-in enumerate function does is to take a sequence and generate tuples of indices (i) and elements (e) one at a time. So the first time through the loop body, i will be 0 and e will be 23; the second time i will be 1 and e will be 12, and so on.
Since this is a tuple unpacking, we can leave off the parentheses around (i, e), and some programmers think this looks better:
nums = [23, 12, 45, 68, -101]
for i, e in enumerate(nums):
    nums[i] *= 2
We prefer to keep the parentheses.
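To make the (i, e) pairs more concrete, here is a small sketch of a loop that uses both the index and the element. The task (replacing negative elements with 0) and the name changed_at are made up for illustration.

```python
nums = [23, 12, 45, 68, -101]

# Replace every negative element with 0 and remember where that happened.
changed_at = []
for (i, e) in enumerate(nums):
    if e < 0:                  # e is the element at index i
        nums[i] = 0            # i lets us write back into the list
        changed_at.append(i)

print(nums)        # [23, 12, 45, 68, 0]
print(changed_at)  # [4]
```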
If you use enumerate directly, you will see that it returns an enumerate object:
>>> enumerate(nums)
<enumerate object at 0x107cc3a80>
Like a list or a range object, an enumerate object can be iterated over. (Strictly speaking, an enumerate object is itself an iterator.) It generates the (i, e) tuples one at a time, and it can be used in a for loop like any other iterable. If you want to see what an enumerate object will generate, you can convert it to a list:
>>> list(enumerate(nums))
[(0, 23), (1, 12), (2, 45), (3, 68), (4, -101)]
However, don’t do this when using enumerate in a for loop, since it’s totally unnecessary.
Getting back to our example:
nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2
You might have noticed that we don’t use the e variable anywhere. It’s just there to make enumerate happy. What if we just left it off?
nums = [23, 12, 45, 68, -101]
for i in enumerate(nums):
    nums[i] *= 2
This is not a syntax error, but it won’t work either. In this case, the i variable will have the entire tuple assigned to it, so the first value of i would be (0, 23). A tuple is not a valid list index, so the line nums[i] *= 2 will fail with a TypeError.
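The failure can be observed directly. The sketch below wraps the loop in a try/except purely so that the script keeps running and can show what happened; the names seen and error_name are illustrative.

```python
nums = [23, 12, 45, 68, -101]

seen = []          # values that i took on before the failure
error_name = None
try:
    for i in enumerate(nums):
        seen.append(i)     # i is a whole tuple such as (0, 23)
        nums[i] *= 2       # a tuple is not a valid list index
except TypeError as err:
    error_name = type(err).__name__

print(seen)        # [(0, 23)]
print(error_name)  # TypeError
print(nums)        # [23, 12, 45, 68, -101] (unchanged)
```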
If you want to say "I know there is supposed to be a variable here, but I don’t need it", the standard way to do that is to use the variable name _, which means "I don’t care about this variable". Our example then becomes:
nums = [23, 12, 45, 68, -101]
for (i, _) in enumerate(nums):
    nums[i] *= 2
This is really not much of an improvement over the range(len(nums)) code, but it is the preferred way to write this. It would be nice if there were a variant of enumerate that only returned the indices.
In fact, you could easily write one:
def enum(iterable):
    return range(len(iterable))
and then you could re-write the example as:
nums = [23, 12, 45, 68, -101]
for i in enum(nums):
    nums[i] *= 2
but there is no
Sequence slices
This is a long reading, but we’ve saved the best for last. It’s very common, when working with sequences, to want to get more than one element from the sequence. For instance, you might have a DNA sequence like this:
seq = 'ATTGGCGCGTTA'
and you might want to get the subsequence starting from index 3 and going up to (but not including) index 9. This would be the sequence 'GGCGCG'.
Python allows you to get this all at once using a sequence slice, which is a copy of part of the sequence:
>>> seq = 'ATTGGCGCGTTA'
>>> seq[3:9]  # seq[3:9] is a sequence slice
'GGCGCG'
This works for all kinds of sequences, not just strings:
>>> lst = [1, 2, 3, 4, 5]
>>> lst[1:4]
[2, 3, 4]
Slice syntax
A sequence slice (which we’ll just call a slice from now on) has this syntax:
seq[start:end]
where:
- seq is a sequence
- start is the integer index of the first location of the slice
- end is the integer index that is one location beyond the last location in the slice
- the colon character (:) separates the start and end parts
Note that this is yet another special meaning for the poor colon character. [10]
The start and end indices are optional. If start is not included, it defaults to 0. If end is not included, it defaults to the length of the sequence.
Some examples:
>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[1:5]
[20, 30, 40, 50]
>>> lst[0:6]  # the whole list
[10, 20, 30, 40, 50, 60]
>>> lst[1:]  # all but the first element
[20, 30, 40, 50, 60]
>>> lst[:5]  # all but the last element
[10, 20, 30, 40, 50]
>>> lst[:]  # the entire list
[10, 20, 30, 40, 50, 60]
>>> tup = (4, 8, 10, 25, 46)
>>> tup[1:3]
(8, 10)
>>> tup[1:2]
(8,)
>>> tup[1:1]
()
>>> s = 'this is a test'
>>> s[4:8]
' is '
>>> s[4:]
' is a test'
>>> s[:4]
'this'
Remember that a slice is a copy of part of a sequence, so e.g. lst[:] is a very simple way to make a copy of a list. (You can also write lst.copy().)
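A short sketch of what "copy" means here, contrasted with plain assignment, which only gives the same list a second name. The names alias and copy are illustrative.

```python
lst = [10, 20, 30]

alias = lst      # a second name for the same list object
copy = lst[:]    # a new list with the same elements

copy[0] = 99     # changes only the copy
alias[1] = -1    # changes lst as well, since alias is lst

print(lst)    # [10, -1, 30]
print(copy)   # [99, 20, 30]
```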
You can use negative indices too:
>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[:-1]  # all but the last element
[10, 20, 30, 40, 50]
>>> lst[-2:]  # last two elements
[50, 60]
>>> lst[-5:-3]
[20, 30]
One common application of this is to remove the newline character at the end of a string that is read in from a file using the readline method:
file = open('nums.txt', 'r')
line = file.readline()
line = line[:-1]  # remove newline
file.close()
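One caveat worth sketching: line[:-1] removes the last character whatever it is, so it is only safe when the line really ends in a newline. The last line of a file may not. The string method rstrip('\n') is a common alternative that removes trailing newlines only if they are present. In the sketch below, io.StringIO stands in for an opened file, and its contents are made up.

```python
import io

# io.StringIO behaves like a file opened for reading; the data is made up.
# Note that the last line has no trailing newline.
file = io.StringIO('12\n34')

first = file.readline()     # '12\n'
second = file.readline()    # '34'

sliced_first = first[:-1]       # newline removed, as intended
sliced_second = second[:-1]     # a real character is lost here

stripped_first = first.rstrip('\n')     # newline removed
stripped_second = second.rstrip('\n')   # nothing to remove, line kept intact

print(repr(sliced_first), repr(sliced_second))      # '12' '3'
print(repr(stripped_first), repr(stripped_second))  # '12' '34'
```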
The start and end indices of a slice don’t have to be literal integers; they can be expressions that evaluate to integers.
>>> lst = [10, 20, 30, 40, 50, 60]
>>> n = 2
>>> lst[n-1:n+2]
[20, 30, 40]
In this case, the expressions on either side of the colon are evaluated before the slice is computed.
If the slice’s final index is greater than the index of the last element, the slice ends at the last element.
>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[2:3000]
[30, 40, 50, 60]
Note that this is not an error.
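By contrast, indexing a single element with an out-of-range index is an error. A brief sketch of the difference, with the try/except there only so the script can report the exception instead of stopping:

```python
lst = [10, 20, 30, 40, 50, 60]

tail = lst[2:3000]       # the slice is cut off at the end of the list
nothing = lst[100:200]   # entirely out of range: an empty list, still no error

error_name = None
try:
    lst[3000]            # indexing (not slicing) past the end
except IndexError as err:
    error_name = type(err).__name__

print(tail)        # [30, 40, 50, 60]
print(nothing)     # []
print(error_name)  # IndexError
```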
Wrapping up and looking forward
Even though this was a long reading, there are still more "odds and ends" of Python that we haven’t covered. We will see more of these later on in the course.
[End of reading]