Overview
In this unit we will discuss classes and object-oriented programming. This is one of the most important topics in the entire course.
Topics
- Objects
- The class statement
- Creating your own objects
- The __init__ constructor method
- Defining and calling methods
Objects
Since the beginning of the course, we’ve been working with objects:
- strings
- lists
- dictionaries
- tuples
- files
- canvases in tkinter
All of these kinds of objects have either been built in to Python or have been defined in Python’s standard libraries.
We’ve used an operational definition of what an "object" is. It’s some kind of thing that has some kind of internal data (called its state), and is something we can call methods on. A method is like a function, but it acts on an object (and may take other arguments). Methods use the "dot syntax":
object.method(arg1, arg2, ...)
An example is the append method on lists:
>>> lst = [1, 2, 3, 4, 5]
>>> lst.append(42)
>>> lst
[1, 2, 3, 4, 5, 42]
Here, lst is the object (a list), append is the name of the method, and 42 is the method’s (only) argument.
Methods vs. functions
We might ask: why not just have append be a regular Python function? Then we could write this:
>>> lst = [1, 2, 3, 4, 5]
>>> append(lst, 42)
What’s wrong with this?
- Problem 1: If append were a function, it would have to change the list argument passed to it. Usually, we don’t like to do this. (Functions are easier to manage if we don’t change the arguments to the function.) Having append as a method on a list object suggests that it’s OK for append to change the list that it’s acting on. You’re "appending to this list (lst)", and the argument 42 is "the thing you’re appending".
- Problem 2: append as a function can only work on one kind of data (say, on lists). We might have some other kind of data (say, a Thing) that has a different way of appending. In that case, we would need to use a different function with a different name. It’s annoying to have to come up with a completely new name for a similar kind of action. (For instance, append_Thing instead of just append.) If you have a lot of different types of data with similar behaviors, this can make programs cluttered and hard to read.
- Problem 3: This will only work if the internal state of the objects (lists here) can be changed by any function. We may want to allow only particular functions to change the internal state. More specifically, we may only want to allow methods (which in some sense "belong" to the object) to change the internal state.
So methods have some (conceptual, readability, convenience, safety) advantages over functions in some cases. Functions have one big advantage so far: we know how to define them! Methods are inextricably tied to objects, so in order to talk about defining methods, we have to talk about defining new kinds of objects.
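To make Problem 2 concrete, here is a small sketch using two kinds of objects that both define an append method. The deque type from the standard collections module is our choice for illustration (it is not otherwise used in this reading); the point is only that the same method name can behave differently depending on the object it is called on.

```python
from collections import deque

# Two different kinds of objects, each with its own `append` method.
numbers = [1, 2, 3]                    # a list
recent = deque([1, 2, 3], maxlen=3)    # a deque that keeps at most 3 items

numbers.append(42)   # list.append: the list simply grows
recent.append(42)    # deque.append: the oldest item (1) is dropped to make room

print(numbers)        # [1, 2, 3, 42]
print(list(recent))   # [2, 3, 42]
```

With plain functions, we would need two differently named functions (something like append_list and append_deque) to get the same effect.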
Objects and internal data
Objects can store internal data (we saw this with tkinter event objects and their x and y attributes).
Sometimes we would like to create special kinds of objects to store particular kinds of data, and define new methods to interact with that data in object-specific ways. Creating new object types will allow us to do that.
Running example
We have spent a lot of time discussing the matplotlib.pyplot plotting library. matplotlib is mostly object-oriented. Some things in plt (Figures (e.g. fig), Axes (e.g. ax), Line2D, etc.) are objects and have methods (for example, ax.plot is a method belonging to an Axes that plots the provided data, given optional keyword arguments).
Some things (shapes on canvases) are not "objects" in the Python sense, so they must be handled indirectly using "handles" and methods.
This distinction in matplotlib.pyplot between
- true objects, and
- things that seem like they should be objects but aren’t
can be confusing, which is why El touched on the "Implicit" vs. "Explicit" patterns in Matplotlib.
Fortunately, Matplotlib's guidance on these two approaches makes it easier to stick to one of them without confusing library-wide functions (e.g. plt.show()) with object methods. The "explicit" approach we've been using is predominantly object-oriented: you call ax.plot on an Axes object that manages a single 2D plot. This is preferred over the "implicit" approach, matplotlib.pyplot.plot, where you call the plot function belonging to the library and rely on plt to figure out which plot to act on. The implicit approach is less reliable and less generalizable, especially when you have multiple plots that you want to customize after plotting the data.
So why is one approach recommended over the other? Whenever you are working with state, it is much easier to isolate state for a single entity (a Figure, Axes, etc.) that can be interfaced with using methods attached to that object. In particular, you can think of a method as a function on an object having state. If you recall from Week 2, we talked about the "." dot syntax as a way to tell a method from a function: the "." is used on a variable (an object) that has state we can access, as opposed to calling a function that relies only on the arguments passed to it.
lst = [1, 2, 3]  # a list object
x = 1  # an int value

# The value 4 is appended to the end of the list using its `append` method.
# Under the hood, a list manages its state internally, and that state
# changes as a result of the method call.
lst.append(4)

# print is a function that takes x as an argument; it is "global" and
# not attached to any object.
print(x)
In previous offerings of CS 1, we used a graphics library, tkinter, which was less consistent in its notion of objects and functions. One approach to improve the programs we worked with was to add our own classes to wrap tkinter drawing functionality (e.g. creating Circle and Line classes for drawing circles and lines). But luckily, we don't have to worry about that as much with pyplot, as long as you leverage its "explicit" object-oriented approach with Axes for managing plots instead of the plt plotting functions. In fact, Matplotlib's object-oriented approach isn't too far off from the classes you would often write manually in tkinter!
There are more motivations for object-oriented programming in CS 1, and it can simplify some of the code we've already written in MPs/Lectures/Labs. For example, we gave you a Move class in MP4 to factor out a lot of functionality you could interact with for each Move once you constructed the object.
MP4 is a very motivating example of object-oriented programming, particularly after you've implemented it without classes. Why? Well, when you ask whether a program should be object-oriented or not, one of the most important factors in that decision is whether you are managing state that is otherwise tedious to manage in an entire program.
In MP4, the "state" we managed was information about Pokemon in pokedex.csv and collected Pokemon in collected.csv. Each time a user wanted to use a feature of the program, one or more functions would be called behind the scenes to read the CSV data, populate a list of row dictionaries, modify that list, and then rewrite the collected.csv file to save the results. All of this can be done with objects, minimizing tedious CSV processing until the user wants to save the collection at the very end.
Classes and the class statement
Objects in Python are all instances of some class. A class describes what an object is and what it can do. It includes the definitions of all the methods that can be called on objects of that class. A class is also a type (a kind of data).
Instances of a class are the objects created from the class. Instances can contain internal data. For instance (sorry about the pun!), a list has elements, a dictionary has key/value pairs, etc. A particular list (an object) is an instance of the class list (a class).
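A quick way to see the relationship between objects and classes is to ask Python directly, using the built-in type and isinstance functions. The variable names below are just for illustration.

```python
lst = [1, 2, 3]
word = "hello"

# type() reports the class an object is an instance of.
print(type(lst))    # <class 'list'>
print(type(word))   # <class 'str'>

# isinstance() checks whether an object is an instance of a given class.
print(isinstance(lst, list))   # True
print(isinstance(lst, dict))   # False
```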
Classes in Python are defined using the class statement. A class statement is a complete description of how instances of that class (objects of that class) behave.
Let’s look at a template for a class statement.
class ClassName:
    """
    <class docstring>
    """

    def __init__(self, arg1, ...):
        ...

    def method1(self, arg1, ...):
        ...

    def method2(self, arg1, ...):
        ...
A trivial class
Let’s fill in the blanks for a trivial class:
class Box:
    """
    A Box instance represents a Box with a value inside.
    """

    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def set_value(self, new_value):
        self.value = new_value
There’s a lot going on here! Let’s cut it down and talk about the pieces separately.
The class keyword
Classes are introduced using the class keyword:
class Thing:
    ...
The name of the class follows the class keyword, followed by a colon (:). So this defines the Thing class. Everything inside the class statement is indented (as with most Python statements).
PEP8 Style Notes:
You'll see a few different conventions between a class-based program and a function-based program. PEP8 is particular about some conventions that make the distinction clearer (we expect you to follow these as well):
- Unlike function and variable names, class names follow PascalCase naming conventions (e.g. DictWriter)
- Methods still follow lower_case naming conventions (e.g. the method writer.write_row)
- Functions are separated by 2 blank lines; methods in a class definition are still defined with the def keyword, but are separated by a single blank line each; pycodestyle will catch this for you!
The class docstring
The class docstring is the first thing inside the class statement.
class Box:
    """
    A Box instance represents a Box with a value inside.
    """
    ...
It’s optional, but it’s much better to have it. It should describe in general terms what the instances of the class can do, so that someone calling help(Box) can get a summary of the class and its methods (the methods are also described in that documentation if they have valid docstrings).
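As a small sketch of where the docstring ends up: Python stores it on the class, and tools such as help read it from there. The example below uses inspect.getdoc from the standard library (our choice for illustration) because it returns the docstring with the indentation cleaned up.

```python
import inspect


class Box:
    """
    A Box instance represents a Box with a value inside.
    """

    def __init__(self, value):
        self.value = value


# The docstring is attached to the class itself.
print(inspect.getdoc(Box))
# A Box instance represents a Box with a value inside.

# In an interactive session, help(Box) shows the same text along with
# a listing of the methods defined in the class.
```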
Method definitions
After the docstring comes one or more method definitions.
class Box:
    """
    A Box instance represents a Box with a value inside.
    """

    def __init__(self, value):
        ...

    def get_value(self):
        ...

    def set_value(self, new_value):
        ...
Python method definitions use def as their keyword, like regular functions. For the most part, these are the same as regular function definitions, but they are called differently.
self
The first argument to a Python method represents the object being acted on. By convention, this is called self (though it doesn’t have to be). The self object is kind of like a dictionary, but it uses the dot syntax instead of dictionary syntax. The contents of an object are called the object attributes.
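To illustrate the "kind of like a dictionary" comparison, the sketch below puts the two syntaxes side by side. The built-in vars function (not otherwise used in this reading) shows an object's attributes as a dictionary; the names b and as_dict are just for illustration.

```python
class Box:
    def __init__(self, value):
        self.value = value


b = Box(42)
as_dict = {"value": 42}

print(b.value)            # dot syntax on an object: 42
print(as_dict["value"])   # dictionary syntax: 42

# vars() displays the attributes of an object as a dictionary.
print(vars(b))            # {'value': 42}
```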
If you’ve programmed in Java before, you might think that
The effect of this is that it’s easier to understand Python code than Java code (in our opinion).
Constructor: __init__
def __init__(self, value):
    self.value = value
This method makes value an attribute of the object represented by self. The attribute self.value is assigned to be the same as the method’s argument called value. So this method just stores the value argument as self.value.
This method has a funny name: __init__. Recall that names in Python that have two initial underscores and two terminal underscores are "special". They have a special meaning to Python; e.g. __name__ is the current module’s name.
The __init__ method is what is called the constructor method for the class (or just the constructor for short). This method is called when an instance of the class is being created. It is responsible for initializing the object in whatever way is required.
Creating an instance gives back the object that __init__ has initialized, even though __init__ has no return statement. (Strictly speaking, __init__ itself returns None, and Python hands back the object for you.) It’s as if it were written:
def __init__(self, value):
    self.value = value
    return self  # not necessary (and also wrong)
However, if you actually write it like that, it will result in an error (a TypeError) once the constructor is called. Constructors must not explicitly return anything!
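Here is a short sketch of what happens if you do write the return. BadBox is a hypothetical class made up for this demonstration; the error is caught so that the example runs to completion.

```python
class BadBox:
    """A hypothetical class whose constructor wrongly returns self."""

    def __init__(self, value):
        self.value = value
        return self  # wrong: __init__ must not return a value


message = None
try:
    BadBox(42)
except TypeError as err:
    # Python refuses to create the object and raises a TypeError.
    message = str(err)

print("TypeError:", message)
```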
The constructor is normally where the attributes of the object are defined and given their initial values. Here, the object will have one attribute: self.value, which is set to be the same as the value argument.
Adding methods
The rest of the class definition is the following method definitions:
def get_value(self):
    return self.value

def set_value(self, new_value):
    self.value = new_value
(They are indented inside the class definition, but we’re showing them unindented here. We would also add docstrings, but we left them out for simplicity.)
Method definitions are syntactically almost exactly like function definitions. The one "difference" is that they have a special first argument (self), and when the method is called, the name self will refer to the object the method is acting on. (You can think of this as "the current object".)
The get_value method takes the object as its argument and returns the value attribute of the object.
The set_value method takes the object and a new value as arguments and changes the value attribute of the object to new_value.
Using objects
That’s the end of the definition of class Box. Now we need to know how to use it to create Box objects (instances of the class Box) and call their methods.
Creating objects
To create a new Box object, just use Box as if it were the name of a function:
>>> b = Box(42)
What does this mean? Box is a class name, but we’re using it like a function! When you use a class name as if it were a function, what you are doing is calling the __init__ method of that class. However, the __init__ method took two arguments (self and value), and this call only passes one (42). What’s going on?
To understand this, you need to know that when Python sees a line like this:
b = Box(42)
it translates the line internally into something like this:
b = makeEmptyObject()
Box.__init__(b, 42)
(This isn’t exactly what happens, but it’s conceptually correct.)
In other words, Python:
- creates a new object with no attributes
- passes it and the argument 42 to the __init__ method of the class Box
The __init__ method itself isn’t responsible for creating the (initially empty) object. (Python does that just before __init__ is called.) Instead, __init__ is responsible for creating and initializing the attributes of the object and whatever other initialization might be necessary.
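The two steps can be carried out by hand, which may make the translation less mysterious. In this sketch, object.__new__(Box) plays the role of the conceptual "make an empty object" step; you would not normally write code like this, and it is shown only to illustrate what Box(42) does for you.

```python
class Box:
    def __init__(self, value):
        self.value = value


# Step 1: create a Box object with no attributes yet.
b = object.__new__(Box)
before = dict(vars(b))
print(before)     # {}

# Step 2: pass the new object and the argument 42 to __init__.
Box.__init__(b, 42)
print(vars(b))    # {'value': 42}
```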
Calling methods
Now that we’ve created the object b (which is a Box), we can call its methods:
>>> b.get_value()
42
>>> b.set_value(101)
>>> b.get_value()
101
Again, something a bit weird is happening. Recall the definition of get_value:
def get_value(self):
    return self.value
get_value was defined to take one argument, but we called it with no arguments. Why does this work? The reason is that when we write
b.get_value()
Python translates it to something like:
Box.get_value(b)
So the object (b in this case) is always the first argument to every method call.
Now recall the definition of set_value:
def set_value(self, new_value):
    self.value = new_value
When we write
b.set_value(101)
Python translates it into something like:
Box.set_value(b, 101)
which explains why set_value takes two arguments even though it’s only called with one.
One way to think about both of these cases is that the b. in b.get_value() and b.set_value(101) is an extra argument which is moved from the place it would occupy in a function call to before the dot in the dot syntax.
So methods are really just functions with
- a special call syntax (the dot syntax)
- an object as the implicit first argument
The first argument (usually called self) is the object before the dot in the method call.
b.get_value() means that the get_value method from the Box class is called on the Box object b.
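Both spellings of a method call are legal Python, so you can check the equivalence yourself. The sketch below repeats the Box class (without its docstring, for brevity) so that it runs on its own.

```python
class Box:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def set_value(self, new_value):
        self.value = new_value


b = Box(42)

# The usual dot syntax on the object...
print(b.get_value())      # 42
# ...and the same call written as a function taken from the class.
print(Box.get_value(b))   # 42

# The same holds for methods with extra arguments.
Box.set_value(b, 101)
print(b.get_value())      # 101
```

In practice you would always write b.get_value(); the second form is shown only to make the implicit first argument visible.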
Attributes
Our Box object has one attribute: value. We can directly access it if we want to:
>>> b.value
101
We can also change its value directly:
>>> b.value = 999
>>> b.value
999
Directly accessing attributes isn’t always a good idea, though it’s pretty common Python practice. Directly changing attribute values is usually a very bad idea. Attributes should be considered to be the private state of an object ("private" means "used only by the object’s methods"). There are ways to restrict access to attributes, which we will see later, but Python is still not as restrictive as e.g. Java when it comes to accessing attributes. [5]
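As a sketch of why changing attributes directly can be risky, consider a hypothetical variant of Box (PositiveBox, invented for this example) whose set_value method checks that the value is positive. Assigning to the attribute directly skips that check entirely.

```python
class PositiveBox:
    """A hypothetical Box variant that should only hold positive numbers."""

    def __init__(self, value):
        self.value = None
        self.set_value(value)

    def get_value(self):
        return self.value

    def set_value(self, new_value):
        # The method guards the object's state.
        if new_value <= 0:
            raise ValueError("value must be positive")
        self.value = new_value


p = PositiveBox(5)

rejected = False
try:
    p.set_value(-1)   # the method refuses a bad value
except ValueError:
    rejected = True
print(rejected)        # True

p.value = -1           # direct assignment bypasses the check
print(p.get_value())   # -1
```

The object now holds a value its own methods would never have allowed, which is the kind of problem that treating attributes as private state is meant to avoid.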
Next time
In the next reading, we’ll continue with more examples of more complex classes, with an overview of strategies to design and use classes in the programs we've already been writing.
[End of reading]