CS 1 Reading 17: Classes, Part 1

Overview

In this unit we will discuss classes and object-oriented programming. This is one of the most important topics in the entire course.

Topics

Objects
The class statement
Creating your own objects
The __init__ constructor method
Defining and calling methods

Objects

Since the beginning of the course, we’ve been working with objects:

strings
lists
dictionaries
tuples
files
canvases in tkinter

All of these kinds of objects have either been built in to Python or have been defined in Python’s standard libraries.

We’ve used an operational definition of what an "object" is. It’s some kind of thing that has some kind of internal data (called its state), and is something we can call methods on. A method is like a function, but it acts on an object (and may take other arguments). Methods use the "dot syntax":

object.method(arg1, arg2, ...)

An example is the append method on lists:

>>> lst = [1, 2, 3, 4, 5]
>>> lst.append(42)
>>> lst
[1, 2, 3, 4, 5, 42]

Here, lst is the object (a list), append is the name of the method, and 42 is the method’s (only) argument.

Methods vs. functions

We might ask: why not just have append be a regular Python function? Then we could write this:

>>> lst = [1, 2, 3, 4, 5]
>>> append(lst, 42)

What’s wrong with this?

Problem 1:

If append were a function, it would have to change the list argument passed to it. Usually, we don’t like to do this. (Functions are easier to manage if we don’t change the arguments to the function.) Having append as a method on a list object suggests that it’s OK for append to change the list that it’s acting on. You’re "appending to this list (lst)", and the argument 42 is "the thing you’re appending".

Problem 2:

append as a function can only work on one kind of data (say, on lists). We might have some other kind of data (say, a Thing) that has a different way of appending. In that case, we would need to use a different function with a different name. It’s annoying to have to come up with a completely new name for a similar kind of action. (For instance, append_Thing instead of just append.) If you have a lot of different types of data with similar behaviors, this can make programs cluttered and hard to read.

Problem 3:

This will only work if the internal state of the objects (lists here) can be changed by any function. We may want to allow only particular functions to change the internal state. More specifically, we may only want to allow methods (which in some sense "belong" to the object) to change the internal state.
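As a small illustration of Problem 2, the built-in str and list types each define their own count and index methods, so one method name can serve several kinds of objects. With plain functions we would need separate names such as count_str and count_list (hypothetical names, used only for contrast). A minimal sketch:

```python
# The same method names work on two different kinds of objects.
word = "banana"          # a string object
nums = [1, 2, 1, 3, 1]   # a list object

# Each type supplies its own version of count.
print(word.count("a"))   # 3
print(nums.count(1))     # 3

# The same is true of index.
print(word.index("n"))   # 2
print(nums.index(3))     # 3
```

Python picks the right version of count based on the object before the dot, so no new names are needed.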

So methods have some (conceptual, readability, convenience, safety) advantages over functions in some cases. Functions have one big advantage so far: we know how to define them! Methods are inextricably tied to objects, so in order to talk about defining methods, we have to talk about defining new kinds of objects.

Objects and internal data

Objects can store internal data (we saw this with tkinter event objects and their x and y attributes).

Sometimes we would like to create special kinds of objects to store particular kinds of data, and define new methods to interact with that data in object-specific ways. Creating new object types will allow us to do that.

Running example

We have spent a lot of time discussing the matplotlib.pyplot plotting library. matplotlib is mostly object-oriented. Some things in plt (Figures (e.g. fig), Axes (e.g. ax), Line2D, etc.) are objects and have methods (for example, ax.plot is a method belonging to an Axes to plot the provided data given optional keyword arguments).

Some things (shapes on canvases) are not "objects" in the Python sense, so they must be handled indirectly using "handles" and methods.

This distinction in matplotlib.pyplot between

true objects, and
things that seem like they should be objects but aren’t

can be confusing, which is why El touched on the "Implicit" vs. "Explicit" patterns in Matplotlib.

Fortunately, Matplotlib's own guidance on these two approaches (the "explicit" approach we've been using is predominantly object-oriented) makes it easier to stick to one approach without confusing library-wide functions (e.g. plt.show()) with object methods (e.g. the explicit ax.plot call on an Axes object that manages a single 2D plot). The explicit ax.plot call is preferred over matplotlib.pyplot.plot, the implicit approach, in which you call the plot function belonging to the library and rely on plt to figure out which plot to act on. That gets less reliable when you have multiple plots that you want to customize after plotting the data.

So why is one approach recommended over the other? Whenever you are working with state, it is much easier to isolate state for a single entity (a Figure, Axes, etc.) that can be interfaced with using methods attached to that object. In particular, you can think of a method as a function on an object having state. If you recall from Week 2, we talked about the "." dot syntax to identify a method vs. a function. The "." is used on a variable (an object) that has state we can access, as opposed to calling a function relying only on arguments passed to it.

lst = [1, 2, 3] # a list object
x = 1 # an int value (ints are objects too in Python, but they cannot be changed in place)
# The value 4 is appended to the end of the list, using its defined `append` method.
# Under the hood, an object like a list manages its state internally, and that state
# changes as a result of the method call.
lst.append(4)
# print is a function that takes x as an argument; it is "global" and not attached to any object
print(x)

In previous offerings of CS 1, we used a graphics library, tkinter, which was less consistent with its notion of objects and functions. One approach to improve the programs we worked with was to add our own classes to wrap around tkinter drawing functionality (e.g. creating a Circle and Line class for drawing circles and lines). But luckily, we don't have to worry about that as much with pyplot, as long as you leverage its "explicit" object-oriented approach with Axes for managing plots instead of the plt plotting functions. In fact, Matplotlib's object-oriented approach isn't too far off from the classes you would often write manually in tkinter!

There are more motivations for object-oriented programming in CS 1, and it can simplify some of the code we've already written in MPs/Lectures/Labs. For example, we gave you a Move class in MP4 to factor out a lot of functionality you could use on each Move once you constructed the object.

MP4 is a very motivating example of object-oriented programming, particularly after you've implemented it without classes. Why? Well, when you ask whether a program should be object-oriented or not, one of the most important factors in that decision is whether you are managing state that is otherwise tedious to manage in an entire program.

In MP4, the "state" we managed was information about Pokemon in pokedex.csv and collected Pokemon in collected.csv. Each time a user wanted to use a feature of the program, one or more functions would be called behind the scenes to read the CSV data, populate a list of row dictionaries, modify that list, and then rewrite the collected.csv file to save the results. All of this can be done with objects, minimizing tedious CSV processing, until the user wants to save the collection at the very end.

Classes and the `class` statement

Objects in Python are all instances of some class. A class describes what an object is and what it can do. It includes the definition of all the methods that objects of that class can do. A class is also a type (a kind of data). Instances of a class are the objects created from the class. Instances can contain internal data. For instance (sorry about the pun!), a list has elements, a dictionary has key/value pairs, etc. A particular list (an object) is an instance of the class list (a class).
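The relationship between an instance and its class can be checked directly with the built-in type and isinstance functions. This short sketch uses only built-in classes; the variable names are arbitrary.

```python
lst = [1, 2, 3]        # an instance of the class list
d = {"a": 1}           # an instance of the class dict

print(type(lst))              # <class 'list'>
print(isinstance(lst, list))  # True: lst is an instance of list
print(isinstance(d, list))    # False: d is a dict, not a list
print(type(d) is dict)        # True
```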

Classes in Python are defined using the class statement. A class statement is a complete description of how instances of that class (objects of that class) behave. Let’s look at a template for a class statement.

class ClassName:
    """
    <class docstring>
    """

    def __init__(self, arg1, ...):
        ...

    def method1(self, arg1, ...):
        ...

    def method2(self, arg1, ...):
        ...

A trivial class

Let’s fill in the blanks for a trivial class:

class Box:
    """
    A Box instance represents a Box with a value inside.
    """
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def set_value(self, new_value):
        self.value = new_value

There’s a lot going on here! Let’s cut it down and talk about the pieces separately.

The `class` keyword

Classes are introduced using the class keyword:

class Thing:
    ...

The name of the class follows the class keyword, followed by a colon (:). So this defines the Thing class. Everything inside the class statement is indented (as with most Python statements).

PEP8 Style Notes:

You'll see a few different conventions between a class-based program and a function-based program. PEP8 is particular about some conventions to make the distinction clearer (we expect you to follow these as well):

Unlike function and variable names, class names follow PascalCase naming conventions (e.g. DictWriter)
Methods still follow lower_case naming conventions (e.g. the method writer.write_row)
Functions are separated by 2 blank lines; methods in a class definition are still defined with the def keyword, but are separated by a single blank line each; pycodestyle will catch this for you!
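Here is a sketch of how these conventions look together in one file. The GradeBook class and the average function are hypothetical names made up for this illustration. The details of __init__ and self are explained later in this reading, so focus on the names and the blank lines for now.

```python
def average(values):
    """Return the mean of a non-empty list of numbers."""
    return sum(values) / len(values)


class GradeBook:
    """A GradeBook stores a list of scores (hypothetical example class)."""

    def __init__(self):
        self.scores = []

    def add_score(self, score):
        self.scores.append(score)

    def get_average(self):
        return average(self.scores)


book = GradeBook()
book.add_score(90)
book.add_score(80)
print(book.get_average())  # 85.0
```

The function and the class are separated by two blank lines, while the methods inside the class are separated by one. The class name is PascalCase, and the function and method names are lower_case.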

The class docstring

The class docstring is the first thing inside the class statement.

class Box:
    """
    A Box instance represents a Box with a value inside.
    """
    ...

It’s optional, but it’s much better to have it. It should describe in general terms what the instances of the class can do, so that someone calling help(Box) can get a summary of the class and its methods (also provided in the documentation if they have valid docstrings).
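To see where the docstring ends up, here is a small sketch using the Box class from this reading. Python stores the docstring on the class, and the standard library's inspect.getdoc returns it with the indentation cleaned up. This is the same text that help(Box) displays at the top of its output.

```python
import inspect


class Box:
    """
    A Box instance represents a Box with a value inside.
    """

    def __init__(self, value):
        self.value = value


# The docstring is stored on the class itself.
print(inspect.getdoc(Box))
```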

Method definitions

After the docstring comes one or more method definitions.

class Box:
    """
    A Box instance represents a Box with a value inside.
    """
    def __init__(self, value):
        ...

    def get_value(self):
        ...

    def set_value(self, new_value):
        ...

Python method definitions use def as their keyword like regular functions. For the most part, these are the same as regular function definitions, but they are called differently.

`self`

The first argument to a Python method represents the object being acted on. By convention, this is called self (though it doesn’t have to be). The self object is kind of like a dictionary, but it uses the dot syntax instead of dictionary syntax. The contents of an object are called the object’s attributes.

If you’ve programmed in Java before, you might think that self is Python’s equivalent to the this keyword in Java. And you’d basically be right, but there are some important differences:

self is not a keyword in Python. Using self to stand for the object being acted on is a convention. You could use this or any other variable name. ^[1]
In Java, this is implicit. You don’t have to use this to get an attribute from the object. Any attribute name (unless it’s also the name of a local variable) can be used without this. In Python, you do have to use self. ^[2]

The effect of this is that it’s easier to understand Python code than Java code (in our opinion).
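Both points can be checked with a short experiment. In this sketch, the constructor names its first argument this, only to show that it works (please stick with self in your own code). The method get_value_broken is a deliberately wrong, hypothetical method that forgets the self. prefix.

```python
class Box:
    def __init__(this, value):  # legal, but unconventional: use self instead
        this.value = value

    def get_value(self):
        return self.value

    def get_value_broken(self):
        # Without "self.", Python looks for an ordinary variable named
        # value, and there is none in this method or in the module.
        return value


b = Box(42)
print(b.get_value())  # 42

try:
    b.get_value_broken()
except NameError as err:
    print("NameError:", err)
```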

Constructor: `__init__`

def __init__(self, value):
    self.value = value

This method makes value an attribute of the object represented by self. The attribute self.value is assigned to be the same as the method’s argument called value. So this method just stores the value argument as self.value.

This method has a funny name: __init__. Recall that names in Python that have two initial underscores and two terminal underscores are "special". They have a special meaning to Python; for example, __name__ is the current module’s name.

The __init__ method is what is called the constructor method for the class (or just the constructor for short). This method is called when an instance of the class is being created. It is responsible for initializing the object in whatever way is required. Once __init__ finishes, the object that has been constructed is what you get back from the call that created it (even though __init__ has no return statement). It’s as if it were written:

def __init__(self, value):
    self.value = value
    return self  # not necessary (and also wrong)

However, if you actually write it like that, it will result in an error once the constructor is called. Constructors must not explicitly return anything!
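You can see the error for yourself with a small experiment. BadBox is a hypothetical class written only to trigger the problem. Python raises a TypeError when the constructor runs, not when the class is defined.

```python
class BadBox:
    def __init__(self, value):
        self.value = value
        return self  # wrong: __init__ must not return a value


try:
    BadBox(42)  # the error happens here, when the constructor is called
except TypeError as err:
    print("TypeError:", err)
```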

The constructor is normally where the attributes of the object are defined and given their initial values. Here, the object will have one attribute: self.value, which is set to be the same as the value argument.

Adding methods

The rest of the class definition is the following method definitions:

def get_value(self):
    return self.value

def set_value(self, new_value):
    self.value = new_value

(They are indented inside the class definition, but we’re showing them unindented here. We would also add docstrings, but we left them out for simplicity.)

Method definitions are syntactically almost exactly like function definitions. The one "difference" is that they have a special first argument (self), and when the method is called, the name self will refer to the object the method is acting on. (You can think of this as "the current object".)

The get_value method takes the object as its argument and returns the value attribute of the object.

The set_value method takes the object and a new value as arguments and changes the value attribute of the object to new_value.

Accessors

Methods like get_value and set_value which do nothing more than return an object attribute or change the value of an object attribute are usually referred to as "accessor methods" (or "accessors" for short) because they "access" an attribute. ^[3] Some computer languages, like Java, encourage you to write accessor methods for every attribute. Python is a bit less strict about this. There are design considerations here, and good arguments can be made either way, but don’t worry about this for now.

Using objects

That’s the end of the definition of class Box. Now we need to know how to use it to create Box objects (instances of the class Box) and call their methods.

Creating objects

To create a new Box object, just use Box as if it were the name of a function:

>>> b = Box(42)

What does this mean? Box is a class name, but we’re using it like a function! When you use a class name as if it were a function, what you are doing is calling the __init__ method of that class. However, the __init__ method took two arguments (self and value), and this call only passes one (42). What’s going on?

To understand this, you need to know that when Python sees a line like this:

b = Box(42)

it translates the line internally into something like this:

b = makeEmptyObject()
Box.__init__(b, 42)

(This isn’t exactly what happens, but it’s conceptually correct.)

In other words, Python:

creates a new object with no attributes
passes it and the argument 42 to the __init__ method of the class Box

The __init__ method itself isn’t responsible for creating the (initially empty) object. (Python does that just before __init__ is called.) Instead, __init__ is responsible for creating and initializing the attributes of the object and whatever other initialization might be necessary.
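The two steps can be imitated by hand, which makes the division of labor visible. This is only a sketch for building intuition, not something you would write in a real program. It uses object.__new__ to stand in for the "make an empty object" step and the built-in vars function to show the object's attributes as a dictionary.

```python
class Box:
    def __init__(self, value):
        self.value = value


# Step 1: create a new Box object with no attributes yet.
b = object.__new__(Box)
print(vars(b))  # {}

# Step 2: pass the object and the argument 42 to __init__.
Box.__init__(b, 42)
print(vars(b))  # {'value': 42}
```

After step 2, b behaves just like an object created with Box(42).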

Calling methods

Now that we’ve created the object b (which is a Box), we can call its methods:

>>> b.get_value()
42
>>> b.set_value(101)
>>> b.get_value()
101

Again, something a bit weird is happening. Recall the definition of get_value:

def get_value(self):
    return self.value

get_value was defined to take one argument, but we called it with no arguments. Why does this work? The reason is that when we write

b.get_value()

Python translates it to something like:

Box.get_value(b)

So the object (b in this case) is always the first argument to every method call.

Although Box.get_value(b) is legal Python syntax, and will usually do the same thing as b.get_value(), b.get_value() is not exactly the same thing as Box.get_value(b). When we discuss class inheritance, we will see that things get a bit more complicated when one class "inherits" from another one.

Now recall the definition of set_value:

def set_value(self, new_value):
    self.value = new_value

When we write

b.set_value(101)

Python translates it into something like:

Box.set_value(b, 101)

which explains why set_value takes two arguments even though it’s only called with one.

One way to think about both of these cases is that the b. in b.get_value() and b.set_value(101) is an extra argument which is moved from the place it would occupy in a function call to before the dot in the dot syntax.

So methods are really just functions with

a special call syntax (the dot syntax)
an object as the implicit first argument

The first argument (usually called self) is the object before the dot in the method call. b.get_value() means that the get_value method from the Box class is called on the Box object b.
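A quick way to convince yourself of this is to call the same methods both ways and compare the results. This sketch repeats the Box class from this reading so that it runs on its own.

```python
class Box:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def set_value(self, new_value):
        self.value = new_value


b = Box(42)

# Dot syntax: the object before the dot becomes self.
print(b.get_value())     # 42

# Function-style call: the object is passed explicitly as the first argument.
print(Box.get_value(b))  # 42

Box.set_value(b, 101)    # same effect as b.set_value(101)
print(b.get_value())     # 101
```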

Attributes

Our Box object has one attribute: value. We can directly access it if we want to:

>>> b.value
101

We can also change its value directly:

>>> b.value = 999
>>> b.value
999

Directly accessing attributes isn’t always a good idea, though it’s pretty common Python practice. Directly changing attribute values is usually a very bad idea. Attributes should be considered to be the private state of an object ("private" means "used only by the object’s methods"). There are ways to restrict access to attributes, which we will see later, but Python is still not as restrictive as e.g. Java when it comes to accessing attributes. ^[4]
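One reason direct changes can cause trouble: a method can enforce a rule about the object's state, but a direct assignment skips that rule. The NonNegativeBox class below is a hypothetical variant of Box, made up for this sketch, whose set_value method rejects negative numbers.

```python
class NonNegativeBox:
    """A hypothetical Box variant whose value should never be negative."""

    def __init__(self, value):
        self.value = 0
        self.set_value(value)

    def get_value(self):
        return self.value

    def set_value(self, new_value):
        if new_value < 0:
            raise ValueError("value must not be negative")
        self.value = new_value


b = NonNegativeBox(5)

try:
    b.set_value(-1)      # the method enforces the rule
except ValueError as err:
    print("rejected:", err)
print(b.get_value())     # 5

b.value = -1             # direct assignment bypasses the check
print(b.get_value())     # -1
```

The object ends up in a state its own methods would never have allowed, which is the kind of bug that treating attributes as private helps avoid.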

Next time

In the next reading, we’ll continue with more examples of more complex classes, with an overview of strategies to design and use classes in the programs we've already been writing.

[End of reading]

1. But please don’t!

2. This follows from the "Zen of Python": "Explicit is better than implicit".

3. Sometimes you will hear methods like set_value referred to as "mutator methods" because they "mutate" (change) the value of an attribute.

4. If you’re a Java programmer, you should know that Python has no real equivalent to the private, protected, and public keywords. Instead, everything is "public" by default, though there is a weak form of "private" that we will see later in the course.