Overview
There are a lot of little topics that don’t fit neatly into a reading because they don’t introduce a new major language construct or cover some important aspect of program design. They are just "odds and ends" that you need to know about to be an effective Python programmer. This reading will cover these. (There are additional odds and ends besides these, but we’ll leave them to later readings.)
All of these are very Python-specific, but that doesn’t mean they aren’t important. They can make your programs more concise and easier to understand, and they are well worth learning. Don’t feel bad if you can’t absorb them all in a single sitting; just skim over this material, learning what seems interesting, and come back to it later as needed.
Topics
- The // operator
- More on format strings
- More on booleans
- Looping over files with for
- The in operator
- The range function
- Tuples
- The enumerate function
- Sequence slices
The // operator
Python’s division operator (/) is a bit odd when you use it on integers; it always returns a float even if the numbers can be evenly divided. [1] For instance:

```python
>>> 1 / 3
0.3333333333333333
>>> 6 / 3
2.0
```
The first division makes sense: you can’t divide 3 evenly into 1. But the second one could just as well have been the integer 2 instead of the float 2.0. Furthermore, sometimes you want an integer result when dividing two integers. For this case, Python provides the // operator:

```python
>>> 6 // 3
2
```
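Given integer inputs, this operator always returns an integer. For non-negative numbers, it simply throws away the remainder after dividing (it doesn’t try to round to the nearest integer, for instance):

```python
>>> 8 // 3
2
```

(Strictly speaking, // rounds down toward negative infinity, so -7 // 2 is -4, not -3. For non-negative numbers this is the same as throwing away the remainder.)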
You can even use it with floats, but it’s not that useful:

```python
>>> 8.999 // 3
2.0
```

With floats, it divides, rounds the result down to the nearest whole number, and returns that whole number as a float. [2]
The bottom line is: if you really want integer division to result in an integer, use the // operator.
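A typical job for // is splitting a quantity into whole units plus a leftover. The sketch below converts a number of minutes (135, a made-up value) into hours and minutes; it also uses the % remainder operator, which gives back the part that // throws away, and the last line shows how // rounds down for negative numbers.

```python
total_minutes = 135
hours = total_minutes // 60     # 2: the remainder is discarded
minutes = total_minutes % 60    # 15: the remainder that // threw away
print(hours, 'hours and', minutes, 'minutes')   # 2 hours and 15 minutes

print(total_minutes / 60)       # 2.25, a float, which is not what we want here
print(-7 // 2)                  # -4: // rounds down, toward negative infinity
```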
More on booleans
We’ve seen previously that Python is kind of sloppy about what it considers to be "true" and "false". In addition to the actual False value, there are other values that are "false" in a boolean context (like an if statement); they include the number 0, the empty string, the empty list, and others we haven’t seen yet. We say that these values are "falsy". All other values are "truthy", though there is a specific True value as well.
Also, at the end of reading 12, we introduced the not operator, which is like a function on booleans: it changes a "truthy" value to False and a "falsy" value to True.

```python
>>> not True
False
>>> not False
True
>>> not 0
True
>>> not ''
True
>>> not []
True
>>> not 42
False
```
Python has two more boolean operators you need to know about: and, and or. Each one combines two boolean values to make a third.
- and returns True only if both of its arguments are true (truthy):

```python
>>> True and True
True
>>> True and False
False
>>> False and True
False
>>> False and False
False
```

- or returns True if either of its arguments is true (truthy):

```python
>>> True or True
True
>>> True or False
True
>>> False or True
True
>>> False or False
False
```
Most of the time, though, we use and and or with expressions that evaluate to boolean values. Often, these are comparisons using relational operators, used as the test of an if statement:

```python
if a > 0 and a < 10:
    print('in range')
else:
    print('out of range')
```
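To see and and or side by side, here is a small sketch that packages the test above into two functions, in_range and out_of_range (hypothetical names chosen for this example). One uses and, the other uses or, and for any number exactly one of them is True.

```python
def in_range(a):
    # True only when BOTH comparisons are true
    return a > 0 and a < 10

def out_of_range(a):
    # True when EITHER comparison is true
    return a <= 0 or a >= 10

for a in [-5, 0, 1, 9, 10, 42]:
    print(a, in_range(a), out_of_range(a))
```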
Boolean operator "short-circuiting"
The and and or boolean operators have one other cool property: sometimes they don’t have to evaluate both of their arguments! This is easiest to understand using examples.
With or, it’s easy to see that if its left-hand operand is True (or some other "truthy" value), then the result of the or expression has to be true too, since True or <anything> must be true. Because of this, when the left-hand operand of or evaluates to a true (truthy) value, the right-hand operand is never evaluated. This is called "short-circuiting", and it’s actually very useful. It allows you to take code like this:
```python
if lst == []:
    return True
elif lst[0] == 42:
    return True
else:
    return False
```

and shrink it down to this:

```python
return (lst == []) or (lst[0] == 42)
```

(Actually, we could leave off the parentheses too, since the or operator has very low precedence.)
The interesting thing in this example is that the second operand of the or (lst[0] == 42) only makes sense when the first operand (lst == []) is false. If or weren’t short-circuiting and both operands were always evaluated, then the second operand would cause an error whenever the list lst was empty. (Make sure you understand why this is.)
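Here is the one-line version wrapped in a function so you can try it; the name empty_or_starts_with_42 is just made up for this sketch. Calling it on an empty list is safe because or never gets as far as lst[0].

```python
def empty_or_starts_with_42(lst):
    # If lst == [] is True, or short-circuits and lst[0] is never evaluated,
    # so an empty list never causes an IndexError here.
    return (lst == []) or (lst[0] == 42)

print(empty_or_starts_with_42([]))        # True
print(empty_or_starts_with_42([42, 1]))   # True
print(empty_or_starts_with_42([1, 42]))   # False
```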
The and operator also does short-circuiting, but it works differently. In the case of the and operator, if the first operand is false, there is no need to evaluate the second operand, because the result of the entire expression will be false regardless. In other words, False and <anything> must be false. Because of this, the second operand is not evaluated in this case.
To sum up:
- If you have a boolean expression of the form True or <anything>, the <anything> part is not evaluated, because the result will have to be True no matter what <anything> evaluates to.
- If you have a boolean expression of the form False and <anything>, the <anything> part is not evaluated, because the result will have to be False no matter what <anything> evaluates to.
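One way to watch short-circuiting happen is to make each operand record itself when it is evaluated. In this sketch, evaluated is a hypothetical helper that appends a name to a log list and then returns the value it was given; the logs show which operands actually ran.

```python
def evaluated(log, name, value):
    # Hypothetical helper: note that this operand was evaluated, then return value.
    log.append(name)
    return value

log1 = []
r1 = evaluated(log1, 'left', True) or evaluated(log1, 'right', False)
log2 = []
r2 = evaluated(log2, 'left', False) and evaluated(log2, 'right', True)
log3 = []
r3 = evaluated(log3, 'left', False) or evaluated(log3, 'right', True)

print(r1, log1)   # True ['left']           (right side skipped)
print(r2, log2)   # False ['left']          (right side skipped)
print(r3, log3)   # True ['left', 'right']  (both sides needed)
```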
Looping over files with for
We saw previously that we could loop over files using a while loop. Our final version of the code we wrote looked like this:

```python
temps = open('temps.txt', 'r')
sum_nums = 0.0
while True:
    line = temps.readline()
    if not line:
        break
    sum_nums += float(line)
temps.close()
```
This code is OK, but it seems a bit long-winded. If we were to say what this code did in English, we would probably say something like "loop over all the lines in the file, converting each line to a float and adding it to a sum variable." Notice that this description has no infinite loops and no breaks, and is just generally shorter and easier to understand. Shouldn’t Python allow us to express ourselves in a similar way?
Fortunately, it does. You can use a for loop on files to write this amazingly concise code that does the same thing:

```python
temps = open('temps.txt', 'r')
sum_nums = 0.0
for line in temps:
    sum_nums += float(line)
temps.close()
```
We’ve replaced the entire while loop with a two-line for loop! Cool, huh? Because it’s concise but also readable, this is the preferred way to loop over the lines of a file.
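To try the for loop version without hunting for a temps.txt, here is a self-contained sketch. It wraps the loop in a hypothetical function sum_file and writes a small throwaway file (with made-up numbers) into a temporary directory created by tempfile.mkdtemp, then cleans it up afterward.

```python
import os
import tempfile

def sum_file(filename):
    # Hypothetical helper: add up one number per line using a for loop.
    temps = open(filename, 'r')
    sum_nums = 0.0
    for line in temps:            # each iteration gives the next line, newline included
        sum_nums += float(line)   # float() ignores the surrounding whitespace
    temps.close()
    return sum_nums

# Create a small throwaway file to try it on.
tmpdir = tempfile.mkdtemp()
filename = os.path.join(tmpdir, 'temps.txt')
out = open(filename, 'w')
out.write('72.5\n68.0\n75.5\n')
out.close()

total = sum_file(filename)
os.remove(filename)
os.rmdir(tmpdir)
print(total)   # 216.0
```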
You probably have questions about this! Up until now, the Python value in a for loop following the in keyword was either a list or a string, i.e. some kind of Python "sequence". Now we are putting a Python file object after the in. Does that mean that a file is a sequence? Actually, no! Files are not sequences in Python. For instance, a sequence in Python should be able to use the square bracket indexing operator. If you try to do this with a file:

```python
>>> temps = open('temps.txt', 'r')
>>> line0 = temps[0]
```

you will get an error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '_io.TextIOWrapper' object is not subscriptable
```

This just means that the square bracket syntax doesn’t work on files.
OK, so files aren’t "sequences" as such. So how can you use them in for loops?

The full answer will have to wait for a future reading, but to give you a preview, the answer is that a Python object doesn’t need to be a full-fledged sequence in order to be usable after the in in a for loop. It just has to be associated with a kind of Python object called an iterator, which basically means "something that can be looped over in a for loop". An iterator knows how to get the "next" value in an object (like a list, a string, a file, or other things), and it knows when there is no "next" value, which indicates that the for loop has finished executing. Python’s lists, strings, and file objects are each associated with particular iterators over those objects, which is what allows them to be used in for loops. It’s also possible to define your own iterators, which can be very useful. When we discuss iterators in depth, we will see how to do that too.
The in operator
We have seen the Python keyword in in the context of a for loop:

```python
>>> for i in [1, 2, 3]:
...     print(i)
...
1
2
3
```

(Note the ... secondary prompt, by the way.)
However, in has a completely different meaning when used as an operator all by itself (i.e. when it is put between two Python expressions, not in the header line of a for loop). In this case, it tests whether a Python value is found inside a data structure (like a list) that contains other Python values.

For instance, 1 in [1, 2, 3] means "does 1 occur in the list [1, 2, 3]?", and 't' in 'Caltech' means "does the character 't' appear in the string 'Caltech'?" When used as an operator this way, in returns True or False:

```python
>>> 1 in [1, 2, 3]
True
>>> 0 in [1, 2, 3]
False
>>> 't' in 'Caltech'
True
>>> 'z' in 'Caltech'
False
```
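With strings, you can do even more: you can test whether a string is found anywhere inside another string, as a substring:

```python
>>> 'alt' in 'Caltech'
True
>>> 'Caltech' in 'Caltech'
True
>>> 'MIT' in 'Caltech'
False
```

This doesn’t work for lists; with a list, in only checks whether the value is one of the list’s elements:

```python
>>> [1, 2] in [1, 2, 3]
False
```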
This is because a list could conceivably have another list as one of its elements, whereas a string can only be made up of individual characters.
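For example, when a list really does contain another list as one of its elements, in finds it. The nested list below is made up for illustration.

```python
nested = [[1, 2], 3]
print([1, 2] in nested)      # True: [1, 2] is one of nested's elements
print([1, 2] in [1, 2, 3])   # False: [1, 2] is not an element of [1, 2, 3]
print(1 in nested)           # False: 1 is inside an element, not an element itself
```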
You can use the in operator with variables, too:

```python
>>> x = 1
>>> x in [1, 2, 3]
True
>>> y = [1, 2, 3]
>>> x in y
True
>>> 1 in y
True
```
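Be aware that the in operator is very often used as the test in an if statement:

```python
if x in [1, 2, 3]:
    print('Found!')
```

This might be confusing, because you are seeing the keyword in used in two different ways when an if statement like this appears inside a for loop:

```python
for line in lines:        # for loop
    if 'Z' in line:       # in used as an operator
        print('Found a Z!')
```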
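Both meanings of in often appear together like this. As a sketch, here is a hypothetical helper count_lines_containing that counts how many strings in a list contain a given character; the sample lines are made up.

```python
def count_lines_containing(lines, ch):
    # Hypothetical helper: count the strings in lines that contain ch.
    count = 0
    for line in lines:     # in as part of a for loop
        if ch in line:     # in as an operator
            count += 1
    return count

lines = ['Zebra', 'apple', 'jazz', 'ZZ top']
print(count_lines_containing(lines, 'Z'))   # 2
print(count_lines_containing(lines, 'z'))   # 1 (the test is case-sensitive)
```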
Tuples
We want to show you the enumerate function, but before we do that, we need to talk about tuples. [6] A tuple is a kind of Python sequence. In many ways, it’s much like a list, except that it’s written using parentheses instead of square brackets:

```python
# list
lst = [1, 2, 3, 4, 5]
# tuple
tup = (1, 2, 3, 4, 5)
```
Since parentheses are used for grouping in Python, we have to write tuples of length 1 in a special way:

```python
# tuple of length 1; note the extra comma at the end
tup1 = (1,)
```

This syntax is necessary because (1) is just a Python expression that happens to evaluate to the number 1, whereas (1,) can only be a tuple.

To write a zero-length tuple, just use empty parentheses:

```python
# zero-length tuple
tup0 = ()
```
Similarities between tuples and lists
Tuples are sequences, and most, but not all, of the common list operations work in a similar way with tuples.
- You can use for loops with tuples:

```python
>>> for i in (1, 2, 3, 4, 5):
...     print(i)
...
1
2
3
4
5
```

- len works with tuples and (of course) returns the length of the tuple:

```python
>>> len((1, 2, 3, 4, 5))
5
```

(Note the doubled-up parentheses in this example. Only the inner ones are the tuple parentheses; the outer ones belong to the call to len.)

- You can concatenate tuples with the + operator:

```python
>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
```

Don’t try concatenating tuples to lists, or vice versa; Python will raise a TypeError. [7]

- You can index tuples the same way you index lists:

```python
>>> tup = ('foo', 'bar', 'baz')
>>> tup[0]
'foo'
>>> tup[-1]
'baz'
```
In addition, you can convert tuples to lists, and lists to tuples:

```python
>>> tuple([1, 2, 3])
(1, 2, 3)
>>> list((1, 2, 3))
[1, 2, 3]
```

tuple is a built-in function which converts sequences to tuples if possible. It even works on strings:

```python
>>> tuple('Caltech')
('C', 'a', 'l', 't', 'e', 'c', 'h')
```
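Converting is also how you combine a tuple with a list: since + won’t mix the two types, you convert one of them first so both sides match. A minimal sketch with made-up values:

```python
tup = (1, 2, 3)
lst = [4, 5]

# tup + lst would raise a TypeError, since + won't mix tuples and lists.
# Converting one of them first makes the types match:
as_tuple = tup + tuple(lst)
as_list = list(tup) + lst

print(as_tuple)   # (1, 2, 3, 4, 5)
print(as_list)    # [1, 2, 3, 4, 5]
```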
Differences between tuples and lists
The main difference between tuples and lists is that tuples are immutable. That means that you can’t change the contents of a tuple once it is created.
```python
>>> tup = ('foo', 'bar', 'baz')
>>> tup[0] = 'hello'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
```
Tuples are thus basically a restricted kind of list. So, you’re probably thinking, what good are tuples, anyway?
Tuples are rarely essential. However, there are definitely some cases where they are very convenient:
- returning multiple values from functions
- "tuple unpacking"
- for loops with multiple bindings
- ...and one more case that we will discuss in future readings. [8]
We’ll discuss the first three cases below.
Multiple return values
Functions in Python can only return a single value. Most of the time, this is fine. Sometimes, though, it would be very nice to be able to return more than one value from a function. The most natural way to do this in Python is to create a tuple from all the values you want to return, and then just return that tuple.
Consider the built-in function divmod:

```python
>>> divmod(10, 3)
(3, 1)
>>> divmod(42, 7)
(6, 0)
>>> divmod(101, 5)
(20, 1)
```
divmod divides two integers. It returns the quotient and the remainder of its two arguments, as a tuple. We could define it ourselves using the // operator we showed you above and the % remainder operator:

```python
def divmod(m, n):
    # Illustration only: this definition shadows the built-in divmod.
    return (m // n, m % n)
```

The real built-in divmod also accepts floating-point arguments, so its actual definition is more complex. For integers, though, including negative ones, it returns exactly (m // n, m % n).
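One way to see how close the simple definition is to the real thing is to compare them directly. This sketch renames it my_divmod (a hypothetical name) so the built-in divmod stays available, then compares the two on a handful of integer inputs, including negative ones.

```python
def my_divmod(m, n):
    # Same idea as the definition above, renamed so the built-in stays usable.
    return (m // n, m % n)

# Compare against the built-in for positive and negative integer arguments.
mismatches = []
for m in [-10, -7, -1, 0, 1, 7, 10, 101]:
    for n in [-5, -3, 3, 5]:
        if my_divmod(m, n) != divmod(m, n):
            mismatches.append((m, n))

print(mismatches)        # []
print(my_divmod(-7, 2))  # (-4, 1): // rounds down, and the remainder makes up the rest
```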
Now we can write:

```python
>>> qr = divmod(101, 5)
>>> quotient = qr[0]
>>> remainder = qr[1]
```
This is a bit crude, though. Let’s improve it.
Tuple unpacking
One cool thing about tuples is that you can unpack them by writing a "tuple of variables" on the left-hand side of an assignment. So the previous example could have been written more concisely as follows:
```python
>>> qr = divmod(101, 5)
>>> (quotient, remainder) = qr
```
Since qr is a tuple of length 2, and (quotient, remainder) is a "tuple of variables" of length 2, Python lets us "assign to the tuple", which actually means that each variable in the tuple on the left (quotient and remainder) is assigned the corresponding element of qr. This is called "tuple unpacking", and it’s basically a multiple assignment statement.
Let’s check that it worked:

```python
>>> qr = divmod(101, 5)
>>> (quotient, remainder) = qr
>>> quotient
20
>>> remainder
1
```
Python allows you to write tuple unpacking without using parentheses:

```python
>>> qr = divmod(101, 5)
>>> quotient, remainder = qr
>>> quotient
20
>>> remainder
1
```
Although leaving off the tuple parentheses works in this particular case, we advise against leaving them off in general, since there are many situations where you have to use parentheses when writing a tuple. (It’s easiest to remember if you always use them, and that will never be wrong.)
Tuple unpacking works as follows. You have a tuple of variables on the left-hand side of an = assignment operator, and a tuple of the same length on the right-hand side (or a variable whose value is a tuple of that length). Then, the elements of the tuple on the right-hand side are assigned, in order, to the variables on the left-hand side. If the lengths don’t match, it’s an error:

```python
>>> (a, b, c) = (1, 2, 3)
>>> a
1
>>> b
2
>>> c
3
>>> v = (1, 2, 3, 4, 5)
>>> (x, y) = v
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
>>> (a, b, c, d, e, f) = v
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 6, got 5)
```
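Multiple return values and tuple unpacking are usually used together. In this sketch, a hypothetical function smallest_and_largest returns two values packed into a tuple, and the caller unpacks them into two variables in one assignment.

```python
def smallest_and_largest(nums):
    # Hypothetical helper: return two values at once, packed into a tuple.
    # Assumes nums is a non-empty list of numbers.
    smallest = nums[0]
    largest = nums[0]
    for n in nums:
        if n < smallest:
            smallest = n
        if n > largest:
            largest = n
    return (smallest, largest)

(lo, hi) = smallest_and_largest([23, 12, 45, 68, -101])
print(lo, hi)   # -101 68
```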
Application: swapping two variables
One spiffy application for tuples is to swap the values of two variables. [9] The usual textbook way to do this is to use a temporary variable:
```python
a = 10
b = 42
# Swap a and b.
temp = a
a = b
b = temp
```
This is kind of ugly, especially compared to what you can do with tuples:
```python
a = 10
b = 42
# Swap a and b.
(a, b) = (b, a)
```
The way this works is as follows. The right-hand side (b, a) is a tuple created from the variables a and b. Its value is (42, 10). The left-hand side is a "tuple of variables" that uses the same variable names. Because the right-hand side is evaluated completely before anything is assigned, the old values are never lost. When tuple unpacking happens, the 42 gets unpacked into variable a and the 10 gets unpacked into variable b, which gives us the result we want.
So in the very unlikely event that you need to swap the value of two variables, rest assured that Python has you covered!
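Swaps do come up inside small algorithms, though. As a sketch, here is a hypothetical reverse_in_place function that reverses a list by swapping elements from both ends and moving inward; the tuple swap works on list elements just as it does on plain variables.

```python
def reverse_in_place(lst):
    # Hypothetical helper: swap elements from both ends, moving inward.
    i = 0
    j = len(lst) - 1
    while i < j:
        (lst[i], lst[j]) = (lst[j], lst[i])   # tuple swap, no temp variable
        i += 1
        j -= 1

nums = [1, 2, 3, 4, 5]
reverse_in_place(nums)
print(nums)   # [5, 4, 3, 2, 1]
```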
Application: for loops with multiple bindings
We can use tuples and tuple unpacking with for loops to assign ("bind") values to multiple names in each iteration of a loop:

```python
>>> for (n, s) in [(1, 'a'), (2, 'b')]:
...     print(f'num = {n}, char = {s}')
...
num = 1, char = a
num = 2, char = b
```
(Notice that we slipped in the new format string syntax too!)
What’s happening here is that in each iteration of the loop, a new tuple is unpacked into the variables n and s using tuple unpacking. This allows us to iterate over two variables simultaneously. One use of this is very common, which leads us into the next subject.
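A common pattern is a list of pairs where each pair is a small record. In this sketch the (name, score) data is made up; each tuple is unpacked into two variables on every pass through the loop.

```python
scores = [('Alice', 92), ('Bob', 85), ('Carol', 78)]   # made-up data

total = 0
for (name, score) in scores:     # each (name, score) tuple is unpacked
    print(f'{name}: {score}')
    total += score

print(f'average = {total / len(scores)}')   # average = 85.0
```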
The enumerate function
Earlier, we saw this code:

```python
# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] *= 2
```
The purpose of range(len(nums)) is to produce all the valid indices of the nums list (which we know are 0 to 4). This seems like a lot of work for something so simple.
A different, and more modern, way to write this code is as follows:

```python
nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2
```
What the built-in enumerate function does is to take a sequence and generate tuples of indices (i) and elements (e) one at a time. So the first time through the loop body, i will be 0 and e will be 23; the second time, i will be 1 and e will be 12, and so on.
Since this is a tuple unpacking, we can leave off the parentheses around (i, e), and some programmers think this looks better:

```python
nums = [23, 12, 45, 68, -101]
for i, e in enumerate(nums):
    nums[i] *= 2
```
We prefer to keep the parentheses.
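enumerate really shines when the loop needs both the index and the element. In this sketch, a hypothetical function first_negative_index uses e to test each element and i to report where it was found (returning -1 if there is no negative element is just a choice made for this example).

```python
def first_negative_index(nums):
    # Hypothetical helper: index of the first negative element, or -1 if none.
    # Unlike the doubling example, both i and e are actually used here.
    for (i, e) in enumerate(nums):
        if e < 0:
            return i
    return -1

print(first_negative_index([23, 12, 45, 68, -101]))   # 4
print(first_negative_index([1, 2, 3]))                # -1
```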
If you use enumerate directly, you will see that it returns an enumerate object:

```python
>>> enumerate(nums)
<enumerate object at 0x107cc3a80>
```
Like a list or a range object, an enumerate object contains an iterator. This iterator generates the (i, e) tuples one at a time, and it can be used in a for loop like any other iterator. If you want to see what an enumerate object will generate, you can convert it to a list:

```python
>>> list(enumerate(nums))
[(0, 23), (1, 12), (2, 45), (3, 68), (4, -101)]
```
However, don’t do this when using an enumerate in a for loop, since it’s totally unnecessary.
Getting back to our example:

```python
nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2
```
You might have noticed that we don’t use the e variable anywhere. It’s just there to make enumerate happy. What if we just left it off?

```python
nums = [23, 12, 45, 68, -101]
for i in enumerate(nums):
    nums[i] *= 2
```
This is not a syntax error, but it won’t work either. In this case, the i variable will have the entire tuple assigned to it, so the first value of i would be (0, 23). This obviously will make the line nums[i] *= 2 fail with a TypeError, since list indices must be integers, not tuples.
If you want to say "I know there is supposed to be a variable here, but I don’t need it", the standard way to do that is to use the variable name _, which means "I don’t care about this variable". Our example then becomes:

```python
nums = [23, 12, 45, 68, -101]
for (i, _) in enumerate(nums):
    nums[i] *= 2
```
This is really not much of an improvement over the range(len(nums)) code, but it is the preferred way to write this. It would be nice if there were a variant of enumerate that only returned the indices.
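In fact, you could easily write one:

```python
def enum(seq):
    # Works for sequences (anything len() accepts), not arbitrary iterables.
    return range(len(seq))
```

and then you could rewrite the example as:

```python
nums = [23, 12, 45, 68, -101]
for i in enum(nums):
    nums[i] *= 2
```

but there is no such function built into Python.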
Sequence slices
This is a long reading, but we’ve saved the best for last. It’s very common, when working with sequences, to want to get more than one element from the sequence. For instance, you might have a DNA sequence like this:
```python
seq = 'ATTGGCGCGTTA'
```

and you might want to get the subsequence starting from index 3 and going up to (but not including) index 9. This would be the sequence 'GGCGCG'.
Python allows you to get this all at once using a sequence slice, which is a copy of part of the sequence:

```python
>>> seq = 'ATTGGCGCGTTA'
>>> seq[3:9]   # seq[3:9] is a sequence slice
'GGCGCG'
```
This works for all kinds of sequences, not just strings:

```python
>>> lst = [1, 2, 3, 4, 5]
>>> lst[1:4]
[2, 3, 4]
```
Slice syntax
A sequence slice (which we’ll just call a slice from now on) has this syntax:

```python
seq[start:end]
```

where:

- seq is a sequence
- start is the integer index of the first location of the slice
- end is the integer index that is one location beyond the last location in the slice
- the colon character (:) separates the start and end parts
Note that this is yet another special meaning for the poor colon character. [10]
The start and end indices are optional. If start is not included, it defaults to 0. If end is not included, it defaults to the length of the sequence.
Some examples:

```python
>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[1:5]
[20, 30, 40, 50]
>>> lst[0:6]   # the whole list
[10, 20, 30, 40, 50, 60]
>>> lst[1:]    # all but the first element
[20, 30, 40, 50, 60]
>>> lst[:5]    # all but the last element
[10, 20, 30, 40, 50]
>>> lst[:]     # the entire list
[10, 20, 30, 40, 50, 60]
>>> tup = (4, 8, 10, 25, 46)
>>> tup[1:3]
(8, 10)
>>> tup[1:2]
(8,)
>>> tup[1:1]
()
>>> s = 'this is a test'
>>> s[4:8]
' is '
>>> s[4:]
' is a test'
>>> s[:4]
'this'
```
Remember that a slice is a copy of part of a sequence, so e.g. lst[:] is a very simple way to make a copy of a list. (You can also write lst.copy().)
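Here is a short sketch of why the copy matters: changing the sliced copy leaves the original list alone, whereas plain assignment just gives the same list a second name. The variable names are chosen only for this example.

```python
lst = [10, 20, 30]
lst_copy = lst[:]     # a slice of the whole list: a new list object
lst_copy[0] = 99      # changing the copy...
print(lst)            # [10, 20, 30]  ...leaves the original alone
print(lst_copy)       # [99, 20, 30]

alias = lst           # NOT a copy: both names refer to the same list
alias[0] = 99
print(lst)            # [99, 20, 30]
```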
You can use negative indices too:

```python
>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[:-1]    # all but the last element
[10, 20, 30, 40, 50]
>>> lst[-2:]    # last two elements
[50, 60]
>>> lst[-5:-3]
[20, 30]
```
One common application of this is to remove the newline character from a string that was read in from a file using the readline method:

```python
file = open('nums.txt', 'r')
line = file.readline()
line = line[:-1]   # remove newline (assumes the line really ends in one)
file.close()
```
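One caveat: the last line of a file may not end with a newline, and then line[:-1] would chop off a real character. A sketch of a more careful version, using a hypothetical helper strip_newline, checks the last character with a slice first; line[-1:] is safe even on an empty string, where line[-1] would be an error.

```python
def strip_newline(line):
    # Hypothetical helper: remove a trailing newline only if there is one.
    if line[-1:] == '\n':    # a slice, so this works even when line is ''
        return line[:-1]
    return line

a = strip_newline('42\n')   # '42'
b = strip_newline('42')     # '42' (nothing removed)
c = strip_newline('')       # ''
print(a, b, c)
```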
The start and end indices of a slice don’t have to be literal integers; they can be expressions that evaluate to integers.

```python
>>> lst = [10, 20, 30, 40, 50, 60]
>>> n = 2
>>> lst[n-1:n+2]
[20, 30, 40]
```
In this case, the expressions on either side of the colon are evaluated before the slice is computed.
If the slice’s final index is greater than the index of the last element, the slice ends at the last element.

```python
>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[2:3000]
[30, 40, 50, 60]
```
Note that this is not an error.
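This forgiving behavior is handy when chopping a sequence into fixed-size pieces. Going back to DNA, this sketch (the function name chunks_of_3 is hypothetical) splits a sequence into groups of three letters with computed slice indices; when the length isn’t a multiple of 3, the last slice runs past the end and simply comes back shorter.

```python
def chunks_of_3(seq):
    # Hypothetical helper: split seq into pieces of length 3.
    # The final slice may run past the end; that just gives a shorter piece.
    pieces = []
    for start in range(0, len(seq), 3):
        pieces.append(seq[start:start+3])
    return pieces

print(chunks_of_3('ATTGGCGCGTTA'))   # ['ATT', 'GGC', 'GCG', 'TTA']
print(chunks_of_3('ATTGG'))          # ['ATT', 'GG']
```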
Wrapping up and looking forward
Even though this was a long reading, there are still more "odds and ends" of Python that we haven’t covered. We will see more of these later on in the course.
[End of reading]