CS 1 Reading 14: Odds and ends

Overview

There are a lot of little topics that don’t fit neatly into a reading because they don’t introduce a new major language construct or cover some important aspect of program design. They are just "odds and ends" that you need to know about to be an effective Python programmer. This reading will cover these. (There are additional odds and ends besides these, but we’ll leave them to later readings.)

All of these are very Python-specific, but that doesn’t mean they aren’t important. They can make your programs more concise and easier to understand, and they are well worth learning. Don’t feel bad if you can’t absorb them all in a single sitting; just skim over this material, learning what seems interesting, and come back to it later as needed.

Topics

The // operator
More on format strings
More on booleans
Looping over files with for
The in operator
The range function
Tuples
The enumerate function
Sequence slices

The `//` operator

Python’s division operator (/) is a bit odd when you use it on integers; it always returns a float even if the numbers can be evenly divided. ^[1] For instance:

>>> 1 / 3
0.3333333333333333
>>> 6 / 3
2.0

The first division makes sense: you can’t divide 3 evenly into 1. But the second one could just as well have been 2. Furthermore, sometimes you want an integer result when dividing two integers. For this case, Python provides the // operator:

>>> 6 // 3
2

This operator will always return an integer given integer inputs, and it will throw away the remainder after dividing the two integers rather than rounding to the nearest integer. (Strictly speaking, it rounds down, which only differs from throwing away the remainder when the result is negative.)

>>> 8 // 3
2

You can even use it with floats, but it’s not that useful:

>>> 8.999 // 3
2.0

Basically, it divides the two numbers, rounds the result down to a whole number, and gives that whole number back as a float. ^[2]

The bottom line is: if you really want integer division to result in an integer, use the // operator.
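To make the "throws away the remainder" rule concrete, here is a minimal sketch (the variable names are made up for illustration). One detail worth seeing for yourself is that with a negative result, // rounds down rather than toward zero.

```python
# Floor division on positive integers drops the fractional part.
positive = 7 // 2        # 3

# With a negative result, // rounds down (toward negative infinity),
# so the answer is -4, not -3.
negative = -7 // 2       # -4

# The % operator gives the matching remainder, so the two fit together:
# (a // b) * b + (a % b) gets you back to a.
a = -7
b = 2
rebuilt = (a // b) * b + (a % b)

print(positive, negative, rebuilt)   # prints: 3 -4 -7
```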

More on booleans

We’ve seen previously that Python is kind of sloppy about what it considers to be "true" and "false". In addition to the actual False value, there are other values that are "false" in a boolean context (like an if statement); they include the number 0, the empty string, the empty list, and others we haven’t seen yet. We say that these values are "falsy". All other values are "truthy", though there is a specific True value as well.

Also, at the end of reading 12, we introduced the not operator, which is like a function on booleans: it changes a "truthy" value to False and a "falsy" value to True.

>>> not True
False
>>> not False
True
>>> not 0
True
>>> not ''
True
>>> not []
True
>>> not 42
False

Python has two more boolean operators you need to know about: and and or. These are used to combine two boolean values to make a third.

and returns True only if both of its arguments are true (truthy):

>>> True and True
True
>>> True and False
False
>>> False and True
False
>>> False and False
False

or returns True if at least one of its arguments is true (truthy):

>>> True or True
True
>>> True or False
True
>>> False or True
True
>>> False or False
False

Most of the time, though, we use and and or with expressions which evaluate to boolean values. Often, these are relational operators used in an if statement:

if a > 0 and a < 10:
    print('in range')
else:
    print('out of range')

Boolean operator "short-circuiting"

The and and or boolean operators have one other cool property: sometimes they don’t have to evaluate both of their arguments! This is easiest to understand using examples.

With or, it’s easy to see that if its left-hand operand is True (or some other "truthy" value), then the result of the or expression has to be true too, since True or <anything> should be true. Because of this, when the left-hand operand of or evaluates to a true (truthy) value, the right-hand operand is never evaluated. This is called "short circuiting" and it’s actually very useful. It allows you to take code like this:

if lst == []:
    return True
elif lst[0] == 42:
    return True
else:
    return False

and shrink it down to this:

return (lst == []) or (lst[0] == 42)

(Actually, we could leave off the parentheses too, since the or operator has very low precedence.)

The interesting thing in this example is that the second operand of the or operator (lst[0] == 42) is only safe to evaluate when the first operand (lst == []) is false. If or weren’t short-circuiting and both operands were always evaluated, then evaluating the second operand would be an error whenever the list lst was empty. (Make sure you understand why this is.)

The and operator also does short-circuiting, but it works differently. In the case of the and operator, if the first operand is false (falsy), there is no need to evaluate the second operand because the result of the entire expression will be false regardless. In other words, False and <anything> will be false, so in this case the second operand is not evaluated.

To sum up:

If you have a boolean expression of the form True or <anything>, the <anything> part is not evaluated because the result will have to be True no matter what <anything> evaluates to.
If you have a boolean expression of the form False and <anything>, the <anything> part is not evaluated because the result will have to be False no matter what <anything> evaluates to.
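If you want to watch short-circuiting happen, here is a minimal sketch. The helper noisy and the function name empty_or_starts_with_42 are made up for this illustration; noisy simply records each time it is actually called, so we can check afterwards which right-hand operands were evaluated.

```python
calls = []   # records every time noisy() actually runs

def noisy(value):
    """Hypothetical helper: note that we were called, then return value."""
    calls.append(value)
    return value

r1 = True or noisy(True)     # short-circuits: noisy is NOT called
r2 = False and noisy(True)   # short-circuits: noisy is NOT called
r3 = False or noisy(True)    # right-hand side is needed: noisy IS called
r4 = True and noisy(False)   # right-hand side is needed: noisy IS called

print(calls)   # prints: [True, False]

def empty_or_starts_with_42(lst):
    """The one-line version from the text, wrapped in a function."""
    return lst == [] or lst[0] == 42

print(empty_or_starts_with_42([]))        # prints: True (lst[0] is never evaluated)
print(empty_or_starts_with_42([42, 1]))   # prints: True
print(empty_or_starts_with_42([1, 42]))   # prints: False
```

Only two calls are recorded, one for each expression whose left-hand operand did not settle the answer.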

Looping over files with `for`

We saw previously that we could loop over files using a while loop. Our final version of the code we wrote looked like this:

temps = open('temps.txt', 'r')
sum_nums = 0.0
while True:
    line = temps.readline()
    if not line:
        break
    sum_nums += float(line)
temps.close()

This code is OK, but it seems a bit long-winded. If we were to say what this code did in English, we would probably say something like "loop over all the lines in the file, converting each line to a float and adding it to a sum variable." Notice that this description has no infinite loops, no breaks, and is just generally shorter and easier to understand. Shouldn’t Python allow us to express ourselves in a similar way?

Fortunately, it does. You can use a for loop on files to write this amazingly concise code that does the same thing:

temps = open('temps.txt', 'r')
sum_nums = 0.0
for line in temps:
    sum_nums += float(line)
temps.close()

We’ve replaced the entire while loop with a two-line for loop! Cool, huh? Because it’s concise but also readable, this is the preferred way to write this.
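If you don’t have a temps.txt file handy, the following sketch can be run as-is. It first writes a small sample file into a temporary directory (the file name and the three temperatures are made up for this example), then sums the file with the same for loop as above, and finally cleans up after itself.

```python
import os
import tempfile

# Create a small sample file to read (name and contents are made up).
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'temps_example.txt')
out = open(path, 'w')
out.write('70.5\n68.0\n72.5\n')
out.close()

# Loop over the lines of the file, exactly as in the text.
temps = open(path, 'r')
sum_nums = 0.0
count = 0
for line in temps:
    sum_nums += float(line)   # float ignores the trailing newline
    count += 1
temps.close()

# Clean up the sample file and its directory.
os.remove(path)
os.rmdir(folder)

print(sum_nums, count)   # prints: 211.0 3
```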

You probably have questions about this! Up until now, the Python value in a for loop following the in keyword was either a list or a string i.e. some kind of Python "sequence". Now we are putting a Python file object after the in. Does that mean that a file is a sequence? Actually, no! Files are not sequences in Python. For instance, a sequence in Python should be able to use the square bracket indexing operator. If you try to do this with a file:

>>> temps = open('temps.txt', 'r')
>>> line0 = temps[0]

you will get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '_io.TextIOWrapper' object is not subscriptable

This just means that the square bracket syntax doesn’t work on files.

OK, so files aren’t "sequences" as such. So how can you use them in for loops?

The full answer will have to wait for a future reading, but to give you a preview, the answer is that a Python object doesn’t need to be a full-fledged sequence in order to be useable after the in in a for loop. It just has to be associated with a kind of Python object called an iterator, which basically means "something that can be looped over in a for loop". An iterator knows how to get the "next" value in an object (like a list, a string, a file, or other things), and it knows when there is no "next" value, which indicates that the for loop has finished executing. Python’s lists, strings, and file objects are each associated with particular iterators over those objects, which is what allows them to be used in for loops. It’s also possible to define your own iterators, which can be very useful. When we discuss iterators in depth, we will see how to do that too.
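As a small preview (a sketch only; the details will come in that future reading), Python’s built-in iter and next functions let you step through an iterator by hand. This is roughly the work a for loop does for you behind the scenes. The variable names here are made up for illustration.

```python
letters = iter('abc')      # ask the string for an iterator over its characters

first = next(letters)      # 'a'
second = next(letters)     # 'b'
third = next(letters)      # 'c'

# Given a second argument, next returns that value instead of raising an
# error when the iterator has run out of values.
leftover = next(letters, 'nothing left')

print(first, second, third, leftover)   # prints: a b c nothing left
```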

The `in` operator

We have seen the Python keyword in in the context of a for loop:

>>> for i in [1, 2, 3]:
...     print(i)
...
1
2
3

(Note the ... secondary prompt, by the way.)

However, in has a completely different meaning when used as an operator all by itself (i.e. when it is put between Python expressions, but not in a line with for). In this case, it is a test to see if a Python value is found inside a data structure (like a list) that contains other Python values.

For instance,

1 in [1, 2, 3]

means: "does 1 occur in the list [1, 2, 3]?", and

't' in 'Caltech'

means: "does the character 't' appear in the string 'Caltech'?"

When used as an operator this way, in returns a True/False value:

>>> 1 in [1, 2, 3]
True
>>> 0 in [1, 2, 3]
False
>>> 't' in 'Caltech'
True
>>> 'z' in 'Caltech'
False

With strings, you can do even more: you can test if a string is found anywhere inside another string:

>>> 'alt' in 'Caltech'
True
>>> 'Caltech' in 'Caltech'
True
>>> 'MIT' in 'Caltech'
False

This doesn’t work for lists:

>>> [1, 2] in [1, 2, 3]
False

This is because a list could conceivably have another list as one of its elements, whereas a string can only be made up of individual characters.
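To see the difference, here is a minimal sketch (the variable names are made up) comparing a flat list with a list that really does have [1, 2] as one of its elements:

```python
flat = [1, 2, 3]
nested = [[1, 2], 3]

in_flat = [1, 2] in flat       # False: no single element of flat equals [1, 2]
in_nested = [1, 2] in nested   # True: the first element of nested is [1, 2]

print(in_flat, in_nested)   # prints: False True
```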

You can use the in operator with variables, too:

>>> x = 1
>>> x in [1, 2, 3]
True
>>> y = [1, 2, 3]
>>> x in y
True
>>> 1 in y
True

Be aware that in used as an operator has nothing to do with in used in a for loop! Python is overloading the meaning of the keyword in to do two completely different things. Most of the time, this is obvious, but since in used as an operator returns a True/False (boolean) value, you often see it used in an if statement e.g.

if x in [1, 2, 3]:
    print('Found!')

This might be confusing, because you are seeing the if in a position where a for is more typical. You can even have both forms next to each other:

for line in lines:    # for loop
    if 'Z' in line:   # in used as an operator
        print('Found a Z!')
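Here is one more sketch that uses both meanings of in together. The function count_vowels is a made-up example, not something built into Python; it loops over the characters of a string and uses the in operator to test each one.

```python
def count_vowels(s):
    """Return how many characters of the string s are vowels."""
    count = 0
    for ch in s:                   # in as part of a for loop
        if ch in 'aeiouAEIOU':     # in used as an operator
            count += 1
    return count

print(count_vowels('Caltech'))   # prints: 2
```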

Tuples

We want to show you the enumerate function, but before we do that, we need to talk about tuples. ^[6] A tuple is a kind of Python sequence. In many ways, it’s much like a list, except that it’s written using parentheses instead of square brackets:

# list
lst = [1, 2, 3, 4, 5]
# tuple
tup = (1, 2, 3, 4, 5)

Since parentheses are used for grouping in Python, we have to write tuples of length 1 in a special way:

# tuple of length 1; note the extra comma at the end
tup1 = (1,)

This syntax is necessary because (1) is just a Python expression that happens to evaluate to the number 1, whereas (1,) can only be a tuple.

To write a zero-length tuple, just use empty parentheses:

# zero-length tuple
tup0 = ()

Similarities between tuples and lists

Tuples are sequences, and most, but not all, of the common list operations work in a similar way with tuples.

You can use for loops with tuples:

>>> for i in (1, 2, 3, 4, 5):
...     print(i)
...
1
2
3
4
5

len works with tuples and (of course) returns the length of the tuple:

>>> len((1, 2, 3, 4, 5))
5

(Note the doubled-up parentheses in this example. The tuple parentheses are only the inner ones.)

You can concatenate tuples with the + operator:

>>> (1, 2, 3) + (4, 5, 6)
(1, 2, 3, 4, 5, 6)

Don’t try concatenating tuples to lists, or vice-versa, or bad things will happen. ^[7]

You can index tuples the same way you index lists:

>>> tup = ('foo', 'bar', 'baz')
>>> tup[0]
'foo'
>>> tup[-1]
'baz'

In addition, you can convert tuples to lists, and lists to tuples:

>>> tuple([1, 2, 3])
(1, 2, 3)
>>> list((1, 2, 3))
[1, 2, 3]

tuple is a built-in function which converts sequences to tuples if possible. It even works on strings:

>>> tuple('Caltech')
('C', 'a', 'l', 't', 'e', 'c', 'h')

Differences between tuples and lists

The main difference between tuples and lists is that tuples are immutable. That means that you can’t change the contents of a tuple once it is created.

>>> tup = ('foo', 'bar', 'baz')
>>> tup[0] = 'hello'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
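If you really do need a changed version of a tuple, one workaround is to build a new tuple instead of modifying the old one. Here is a minimal sketch using the list and tuple conversions shown earlier (the variable names are made up):

```python
tup = ('foo', 'bar', 'baz')

as_list = list(tup)        # lists are mutable, so make a list copy
as_list[0] = 'hello'       # change the copy
new_tup = tuple(as_list)   # convert back to a (new) tuple

print(tup)       # prints: ('foo', 'bar', 'baz')
print(new_tup)   # prints: ('hello', 'bar', 'baz')
```

The original tuple is untouched; new_tup is a separate object.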

Tuples are thus basically a restricted kind of list. So, you’re probably thinking, what good are tuples, anyway?

Tuples are rarely essential. However, there are definitely some cases where they are very convenient:

returning multiple values from functions
"tuple unpacking"
for loops with multiple bindings
...and one more case that we will discuss in future readings. ^[8]

We’ll discuss the first three cases below.

Multiple return values

Functions in Python can only return a single value. Most of the time, this is fine. Sometimes, though, it would be very nice to be able to return more than one value from a function. The most natural way to do this in Python is to create a tuple from all the values you want to return, and then just return that tuple.

Consider the built-in function divmod:

>>> divmod(10, 3)
(3, 1)
>>> divmod(42, 7)
(6, 0)
>>> divmod(101, 5)
(20, 1)

divmod divides two integers. It returns the quotient and the remainder of its two arguments, as a tuple. We could define it ourselves using the // operator we showed you above and the % remainder operator:

def divmod(m, n):
    return (m // n, m % n)

The actual built-in is more general (for instance, it also accepts floats), but for integers it gives the same results as this definition.

Now we can write:

>>> qr = divmod(101, 5)
>>> quotient = qr[0]
>>> remainder = qr[1]

This is a bit crude, though. Let’s improve it.

Tuple unpacking

One cool thing about tuples is that you can unpack them by writing a "tuple of variables" on the left-hand side of an assignment. So the previous example could have been written more concisely as follows:

>>> qr = divmod(101, 5)
>>> (quotient, remainder) = qr

Since qr is a tuple of length 2, and (quotient, remainder) is a "tuple of variables" of length 2, Python lets us "assign to the tuple", which actually means that the parts of the tuple (the variables quotient and remainder) will be assigned to. This is called "tuple unpacking" and it’s basically a multiple assignment statement.

Let’s check that it worked:

>>> qr = divmod(101, 5)
>>> (quotient, remainder) = qr
>>> quotient
20
>>> remainder
1

Python allows you to write tuple unpacking without using parentheses:

>>> qr = divmod(101, 5)
>>> quotient, remainder = qr
>>> quotient
20
>>> remainder
1

Although leaving off the tuple parentheses works in this particular case, we advise against leaving them off in general, since there are many situations where you have to use parentheses when writing a tuple. (It’s easiest to remember if you always use them, and that will never be wrong.)

Tuple unpacking works as follows. You have a tuple of variables on the left-hand side of an = assignment operator, and a tuple of the same length on the right-hand side (or a variable whose value is a tuple of that length). Then, the elements of the tuple on the right-hand side are copied into the variables on the left-hand side. If the lengths don’t match, it’s an error.

>>> (a, b, c) = (1, 2, 3)
>>> a
1
>>> b
2
>>> c
3
>>> v = (1, 2, 3, 4, 5)
>>> (x, y) = v
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
>>> (a, b, c, d, e, f) = v
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 6, got 5)
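Returning a tuple and unpacking it work nicely together in your own functions too. Here is a minimal sketch; min_and_max is a made-up function for illustration (not a built-in), and it assumes the list it is given is not empty.

```python
def min_and_max(nums):
    """Return the smallest and largest values of a non-empty list as a tuple."""
    smallest = nums[0]
    largest = nums[0]
    for n in nums:
        if n < smallest:
            smallest = n
        if n > largest:
            largest = n
    return (smallest, largest)

# Unpack the returned tuple straight into two variables.
(lo, hi) = min_and_max([23, 12, 45, 68, -101])
print(lo, hi)   # prints: -101 68
```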

Application: swapping two variables

One spiffy application for tuples is to swap the values of two variables. ^[9] The usual textbook way to do this is to use a temporary variable:

a = 10
b = 42
# Swap a and b.
temp = a
a = b
b = temp

This is kind of ugly, especially compared to what you can do with tuples:

a = 10
b = 42
# Swap a and b.
(a, b) = (b, a)

The way this works is as follows. The right-hand side (b, a) is a tuple created from the variables a and b. Its value is (42, 10). The left-hand side is a "tuple of variables" that use the same variable names. When tuple unpacking happens, the 42 gets unpacked into variable a and the 10 gets unpacked into variable b, which gives us the result we want.

So in the very unlikely event that you need to swap the value of two variables, rest assured that Python has you covered!
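The same trick works when the things being swapped are list elements rather than plain variables. A minimal sketch, with a made-up list:

```python
lst = [10, 20, 30]

# The right-hand tuple (30, 10) is built first, and only then unpacked
# into lst[0] and lst[2].
(lst[0], lst[2]) = (lst[2], lst[0])

print(lst)   # prints: [30, 20, 10]
```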

Application: `for` loops with multiple bindings

We can use tuples and tuple unpacking with for loops to assign ("bind") values to multiple names in each iteration of a loop:

>>> for (n, s) in [(1, 'a'), (2, 'b')]:
...     print(f'num = {n}, char = {s}')
...
num = 1, char = a
num = 2, char = b

(Notice that we slipped in the new format string syntax too!)

What’s happening here is that in each iteration of the loop, a new tuple is unpacked into the variables n and s using tuple unpacking. This allows us to iterate over two variables simultaneously. One use of this is very common, which leads us into the next subject.

The `enumerate` function

Earlier, we saw this code:

# We want to double each element in a list.
nums = [23, 12, 45, 68, -101]
for i in range(len(nums)):
    nums[i] *= 2

The purpose of range(len(nums)) is to produce all the valid indices of the nums list (which we know are 0 to 4). This seems like a lot of work for something so simple.

A different, and more modern, way to write this code is as follows:

nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2

What the built-in enumerate function does is to take a sequence and generate tuples of indices (i) and elements (e) one at a time. So the first time through the loop body, i will be 0 and e will be 23; the second time i will be 1 and e will be 12, and so on.

Since this is a tuple unpacking, we can leave off the parentheses around (i, e), and some programmers think this looks better:

nums = [23, 12, 45, 68, -101]
for i, e in enumerate(nums):
    nums[i] *= 2

We prefer to keep the parentheses.

If you use enumerate directly, you will see that it returns an enumerate object:

>>> enumerate(nums)
<enumerate object at 0x107cc3a80>

Like a list or a range object, an enumerate object contains an iterator. This iterator generates the (i, e) tuples one at a time, and it can be used in a for loop like any other iterator. If you want to see what an enumerate object will generate, you can convert it to a list:

>>> list(enumerate(nums))
[(0, 23), (1, 12), (2, 45), (3, 68), (4, -101)]

However, don’t do this when using an enumerate in a for loop, since it’s totally unnecessary.

Getting back to our example:

nums = [23, 12, 45, 68, -101]
for (i, e) in enumerate(nums):
    nums[i] *= 2

You might have noticed that we don’t use the e variable anywhere. It’s just there to make enumerate happy. What if we just left it off?

nums = [23, 12, 45, 68, -101]
for i in enumerate(nums):
    nums[i] *= 2

This is not a syntax error, but it won’t work either. In this case, the i variable will have the entire tuple assigned to it, so the first value of i would be (0, 23). This obviously will make the line nums[i] *= 2 fail.

If you want to say "I know there is supposed to be a variable here, but I don’t need it", the standard way to do that is to use the variable name _, which means "I don’t care about this variable". Our example then becomes:

nums = [23, 12, 45, 68, -101]
for (i, _) in enumerate(nums):
    nums[i] *= 2

This is really not much of an improvement over the range(len(nums)) code, but it is the preferred way to write this. It would be nice if there were a variant of enumerate that only returned the indices.

In fact, you could easily write one:

def enum(iterable):
    return range(len(iterable))

and then you could re-write the example as:

nums = [23, 12, 45, 68, -101]
for i in enum(nums):
    nums[i] *= 2

but there is no enum-like function in the Python standard libraries as far as we know.
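Where enumerate really earns its keep is when the loop body needs both the index and the element. Here is a minimal sketch (the threshold of 40 and the name big_positions are made up for illustration) that collects the positions of all the large numbers in a list:

```python
nums = [23, 12, 45, 68, -101]

big_positions = []
for (i, e) in enumerate(nums):
    if e > 40:                    # test the element...
        big_positions.append(i)   # ...but remember its index

print(big_positions)   # prints: [2, 3]
```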

Sequence slices

This is a long reading, but we’ve saved the best for last. It’s very common, when working with sequences, to want to get more than one element from the sequence. For instance, you might have a DNA sequence like this:

seq = 'ATTGGCGCGTTA'

and you might want to get the subsequence starting from index 3 and going up to (but not including) index 9. This would be the sequence 'GGCGCG'. Python allows you to get this all at once using a sequence slice, which is a copy of part of the sequence:

>>> seq = 'ATTGGCGCGTTA'
>>> seq[3:9]  # seq[3:9] is a sequence slice
'GGCGCG'

This works for all kinds of sequences, not just strings:

>>> lst = [1, 2, 3, 4, 5]
>>> lst[1:4]
[2, 3, 4]

Slice syntax

A sequence slice (which we’ll just call a slice from now on), has this syntax:

seq[start:end]

where:

seq is a sequence
start is the integer index of the first location of the slice
end is the integer index that is one location beyond the last location in the slice
the colon character (:) separates the start and end parts

Note that this is yet another special meaning for the poor colon character. ^[10]

The start and end indices are optional. If start is not included, it defaults to 0. If end is not included, it defaults to the length of the sequence.

Some examples:

>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[1:5]
[20, 30, 40, 50]
>>> lst[0:6]  # the whole list
[10, 20, 30, 40, 50, 60]
>>> lst[1:]   # all but the first element
[20, 30, 40, 50, 60]
>>> lst[:5]  # all but the last element
[10, 20, 30, 40, 50]
>>> lst[:]    # the entire list
[10, 20, 30, 40, 50, 60]
>>> tup = (4, 8, 10, 25, 46)
>>> tup[1:3]
(8, 10)
>>> tup[1:2]
(8,)
>>> tup[1:1]
()
>>> s = 'this is a test'
>>> s[4:8]
' is '
>>> s[4:]
' is a test'
>>> s[:4]
'this'

Remember that a slice is a copy of part of a sequence, so e.g. lst[:] is a very simple way to make a copy of a list. (You can also write lst.copy().)
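To convince yourself that a slice really is a copy, here is a minimal sketch (the variable names are made up). Changing the copy leaves the original list alone:

```python
original = [10, 20, 30]
duplicate = original[:]   # a new list with the same elements

duplicate[0] = 99         # change only the copy

print(original)    # prints: [10, 20, 30]
print(duplicate)   # prints: [99, 20, 30]
```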

You can use negative indices too:

>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[:-1]  # all but the last element
[10, 20, 30, 40, 50]
>>> lst[-2:]  # last two elements
[50, 60]
>>> lst[-5:-3]
[20, 30]

One common application of this is to remove the newline character of a string which is read in from a file using the readline method:

file = open('nums.txt', 'r')
line = file.readline()
line = line[:-1]  # remove newline
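One thing to watch out for: line[:-1] removes the last character whatever it is, and the last line of a file doesn’t always end with a newline. The sketch below shows the problem using plain strings, along with one possible way to guard against it. The function chop_newline is a made-up helper for illustration, not a built-in.

```python
with_newline = '42\n'
without_newline = '42'

print(repr(with_newline[:-1]))      # prints: '42'
print(repr(without_newline[:-1]))   # prints: '4'  (oops, we lost a digit)

def chop_newline(line):
    """Remove one trailing newline from line, if there is one."""
    # line[-1:] is a slice, so it is safe even when line is empty.
    if line[-1:] == '\n':
        return line[:-1]
    return line

print(repr(chop_newline(with_newline)))      # prints: '42'
print(repr(chop_newline(without_newline)))   # prints: '42'
```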

The start and end indices of a slice don’t have to be literal integers; they can be expressions that evaluate to integers.

>>> lst = [10, 20, 30, 40, 50, 60]
>>> n = 2
>>> lst[n-1:n+2]
[20, 30, 40]

In this case, the expressions on either side of the colon are evaluated before the slice is computed.

If the slice’s final index is greater than the index of the last element, the slice ends at the last element.

>>> lst = [10, 20, 30, 40, 50, 60]
>>> lst[2:3000]
[30, 40, 50, 60]

Note that this is not an error.
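A few more cases along the same lines, as a minimal sketch with made-up variable names. If the whole slice lies beyond the end of the sequence, you simply get an empty sequence back:

```python
lst = [10, 20, 30, 40, 50, 60]

tail = lst[2:3000]         # end is too big: stops at the last element
nothing = lst[10:20]       # start is too big as well: empty list
text = 'Caltech'[3:100]    # works the same way for strings

print(tail)      # prints: [30, 40, 50, 60]
print(nothing)   # prints: []
print(text)      # prints: tech
```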

Wrapping up and looking forward

Even though this was a long reading, there are still more "odds and ends" of Python that we haven’t covered. We will see more of these later on in the course.

[End of reading]

1. Some people view this as a language wart i.e. something in the language that ought to be changed. Do we feel that way? No comment.

2. Or something like that. We’re not 100% sure, and since we would never intentionally use the // operator with floats, we aren’t motivated to hunt through the documentation to find out. But if you do, and the answer is particularly interesting, do let us know.

3. Back in the long-ago days of Python 2, range actually did return a list.

4. Python allows you to define functions like this that can take varying numbers of arguments, although this is rarely needed. Later in the course, we’ll show you how to do this with your own functions.

5. Idiom just means "a typical way to write something".

6. By the way, there is a controversy in how to pronounce the word "tuple". Some people (evil, horrible people) pronounce it "tupple" (rhymes with "supple"). Other people (good, virtuous people) pronounce it "toople" (rhymes with "hoople"). We’ll leave it to you to choose which side you’re on.

7. Actually, go ahead and try it! You’ll see what we mean.

8. It has to do with dictionaries, if you must know. Stay tuned!

9. I’ve been programming for decades, and I don’t think I have ever needed to swap the values of two variables. Nevertheless, this seems to be a favorite example for people teaching new programmers, so who am I to disagree?

10. When we talk about dictionaries, you’ll see that colons have another special meaning there too.