Overview
In this unit we will discuss classes and object-oriented programming. This is one of the most important topics in the entire course.
Topics
- Objects
- The class statement
- Creating your own objects
- The __init__ constructor method
- Defining and calling methods
Objects
Since the beginning of the course, we’ve been working with objects:
- strings
- lists
- dictionaries
- tuples
- files
- canvases in tkinter
All of these kinds of objects have either been built in to Python or have been defined in Python’s standard libraries.
We’ve used an operational definition of what an "object" is. It’s some kind of thing that has some kind of internal data (called its state), and is something we can call methods on. A method is like a function, but it acts on an object (and may take other arguments). Methods use the "dot syntax":
object.method(arg1, arg2, ...)
An example is the append method on lists:
>>> lst = [1, 2, 3, 4, 5]
>>> lst.append(42)
>>> lst
[1, 2, 3, 4, 5, 42]
Here, lst is the object (a list),
append is the name of the method,
and 42 is the method’s (only) argument.
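The same dot syntax applies to the other built-in objects listed above. A small sketch (the variable names here are only illustrative):

```python
# Methods are called on an object with the dot syntax: object.method(args).
name = "ada lovelace"
title_cased = name.title()      # str method; returns a new string

counts = {"a": 1}
value = counts.get("b", 0)      # dict method taking two arguments

lst = [3, 1, 2]
lst.sort()                      # list method; changes the list in place

print(title_cased)
print(value)
print(lst)
```

Note that some methods (like sort) change the object they act on, while others (like title) leave it alone and return a new value.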
Methods vs. functions
We might ask: why not just have append be a regular Python function?
Then we could write this:
>>> lst = [1, 2, 3, 4, 5]
>>> append(lst, 42)
What’s wrong with this?
- Problem 1: If append were a function, it would have to change the list argument passed to it. Usually, we don’t like to do this. (Functions are easier to manage if we don’t change the arguments to the function.) Having append as a method on a list object suggests that it’s OK for append to change the list that it’s acting on. You’re "appending to this list (lst)", and the argument 42 is "the thing you’re appending".
- Problem 2: append as a function can only work on one kind of data (say, on lists). We might have some other kind of data (say, a Thing) that has a different way of appending. In that case, we would need to use a different function with a different name. It’s annoying to have to come up with a completely new name for a similar kind of action. (For instance, append_Thing instead of just append.) If you have a lot of different types of data with similar behaviors, this can make programs cluttered and hard to read.
- Problem 3: This will only work if the internal state of the objects (lists here) can be changed by any function. We may want to allow only particular functions to change the internal state. More specifically, we may only want to allow methods (which in some sense "belong" to the object) to change the internal state.
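Problem 2 can be seen with two classes that already exist in Python. In this sketch, a list and a deque (from the standard collections module) each provide their own append method, so the call looks the same for both kinds of data and no second name is needed:

```python
from collections import deque

lst = [1, 2, 3]
queue = deque([1, 2, 3])

# Each class supplies its own append; the call is written the same way.
lst.append(42)
queue.append(42)

print(lst)
print(queue)
```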
So methods have some (conceptual, readability, convenience, safety) advantages over functions in some cases. Functions have one big advantage so far: we know how to define them! Methods are inextricably tied to objects, so in order to talk about defining methods, we have to talk about defining new kinds of objects.
Objects and internal data
Objects can store internal data
(we saw this with tkinter event objects and their x and y attributes).
Sometimes we would like to create special kinds of objects to store particular kinds of data, and define new methods to interact with that data in object-specific ways. Creating new object types will allow us to do that.
Running example
We have spent a lot of time discussing the matplotlib.pyplot plotting library.
matplotlib is mostly object-oriented.
Some things in plt (Figures (e.g. fig), Axes (e.g. ax), Line2D, etc.)
are objects and have methods (for example, ax.plot is a method belonging to an Axes to plot
the provided data given optional keyword arguments).
Some things (shapes on canvases) are not "objects" in the Python sense, so they must be handled indirectly using "handles" and methods.
This distinction in matplotlib.pyplot between
- true objects, and
- things that seem like they should be objects but aren’t
can be confusing, which is why El touched on the "Implicit" vs. "Explicit" patterns in Matplotlib.
Fortunately, Matplotlib documents these two approaches clearly. The "explicit" approach we've been using is predominantly object-oriented, which makes it easier to stick to one approach without confusing library-wide functions (e.g. plt.show()) with object methods (e.g. the explicit ax.plot call on an Axes object that manages a single 2D plot). The explicit approach is preferred over matplotlib.pyplot.plot, the implicit approach, which uses the plot function belonging to the library and relies on plt to figure out which plot to act on. That becomes less reliable when you have multiple plots that you want to customize after plotting the data.
So why is one approach recommended over the other? Whenever you are working with state, it is much easier to isolate state for a single entity (a Figure, Axes, etc.) that can be interfaced with using methods attached to that object. In particular, you can think of a method as a function on an object having state. If you recall from Week 2, we talked about the "." dot syntax to identify a method vs. a function: the "." is used on a variable (an object) that has state we can access, as opposed to calling a function that relies only on the arguments passed to it.
lst = [1, 2, 3]  # a list object
x = 1            # an int value
# The value 4 is appended to the end of the list, using its `append` method.
# Under the hood, an object like a list manages its state internally, and that
# state changes as a result of the method call.
lst.append(4)
# print is a function that takes x as an argument; it is "global" and not
# attached to any object.
print(x)
In previous offerings of CS 1, we used a graphics library, tkinter, which was less consistent in its notion of objects and functions. One way to improve the programs we worked with was to add our own classes that wrap tkinter drawing functionality (e.g. creating Circle and Line classes for drawing circles and lines). Luckily, we don't have to worry about that as much with pyplot, as long as you leverage its "explicit" object-oriented approach, using Axes to manage plots instead of the plt plotting functions. In fact, Matplotlib's object-oriented approach isn't too far off from the classes you would often write manually in tkinter!
Object-oriented programming can also simplify some of the code we've already written in MPs, lectures, and labs. For example, we gave you a Move class in MP4 to factor out a lot of functionality you could use for each Move once you constructed the object.
MP4 is a very motivating example of object-oriented programming, particularly after you've implemented it without classes. Why? When you ask whether a program should be object-oriented, one of the most important factors is whether you are managing state that is otherwise tedious to track across the entire program.
In MP4, the "state" we managed was information about Pokemon in pokedex.csv and collected Pokemon in collected.csv. Each time a user wanted to use a feature of the program, one or more functions would be called behind the scenes to read the CSV data, populate a list of row dictionaries, modify that list, and then rewrite the collected.csv file to save the results. All of this can be done with objects, minimizing tedious CSV processing until the user wants to save the collection at the very end.
Classes and the class statement
Objects in Python are all instances of some class.
A class describes what an object is and what it can do.
It includes the definition of all the methods
that objects of that class can do.
A class is also a type (a kind of data).
Instances of a class are the objects created from the class.
Instances can contain internal data.
For instance (sorry about the pun!), a list has elements,
a dictionary has key/value pairs, etc.
A particular list (an object) is an instance of the class list (a class).
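The relationship between an object and its class can be checked directly. This short sketch uses the built-in type and isinstance functions on two illustrative variables:

```python
lst = [1, 2, 3]
d = {"a": 1}

print(type(lst))              # the class that lst is an instance of
print(isinstance(lst, list))  # is lst an instance of the class list?
print(isinstance(d, list))    # a dictionary is not a list
```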
Classes in Python are defined using the class statement.
A class statement is a complete description of how instances of that class
(objects of that class) behave.
Let’s look at a template for a class statement.
class ClassName:
    """
    <class docstring>
    """
    def __init__(self, arg1, ...):
        ...
    def method1(self, arg1, ...):
        ...
    def method2(self, arg1, ...):
        ...
A trivial class
Let’s fill in the blanks for a trivial class:
class Box:
    """
    A Box instance represents a Box with a value inside.
    """
    def __init__(self, value):
        self.value = value
    def get_value(self):
        return self.value
    def set_value(self, new_value):
        self.value = new_value
There’s a lot going on here! Let’s cut it down and talk about the pieces separately.
The class keyword
Classes are introduced using the class keyword:
class Thing:
    ...
The name of the class follows the class keyword,
followed by a colon (:).
So this defines the Thing class.
Everything inside the class statement is indented
(as with most Python statements).
PEP8 Style Notes:
You'll see a few different conventions between a class program and a function-based program. PEP8 is particular on some conventions to make the distinction clearer (we expect you to follow these as well):
- Unlike function and variable names, class names follow PascalCase naming conventions (e.g. DictWriter).
- Methods still follow lower_case naming conventions (e.g. the get_value method of the Box class above).
- Functions are separated by 2 blank lines; methods in a class definition are still defined with the def keyword, but are separated by a single blank line each; pycodestyle will catch this for you!
The class docstring
The class docstring is the first thing inside the class statement.
class Box:
    """
    A Box instance represents a Box with a value inside.
    """
    ...
It’s optional, but it’s much better to have it.
It should describe in general terms what the instances
of the class can do, so that someone calling help(Box) can get a
summary of the class and its methods (also provided in the documentation if 
they have valid docstrings). 
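As a rough illustration of where the docstring ends up, the sketch below reads it back with inspect.getdoc from the standard library. Calling help(Box) in the interpreter displays the same text along with the class's methods.

```python
import inspect


class Box:
    """
    A Box instance represents a Box with a value inside.
    """

    def __init__(self, value):
        self.value = value


# inspect.getdoc returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(Box)
print(doc)
```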
Method definitions
After the docstring comes one or more method definitions.
class Box:
    """
    A Box instance represents a Box with a value inside.
    """
    def __init__(self, value):
        ...
    def get_value(self):
        ...
    def set_value(self, new_value):
        ...
Python method definitions use def as their keyword like regular functions.
For the most part, these are the same as regular function definitions,
but they are called differently.
self
The first argument to a Python method represents the object being acted on.
By convention, this is called self (though it doesn’t have to be).
The self object is kind of like a dictionary,
but it uses the dot syntax instead of dictionary syntax.
The contents of an object are called the object attributes.
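Note: If you’ve programmed in Java before, you might think that self resembles Java’s this. The difference is that Python requires self to be written explicitly, both in the method’s argument list and whenever an attribute is accessed. The effect of this is that it’s easier to understand Python code than Java code (in our opinion).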
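The dictionary comparison can be made concrete. In this sketch, the built-in vars function shows the attributes of a Box instance as an ordinary dictionary, while the dot syntax reads a single attribute:

```python
class Box:
    def __init__(self, value):
        self.value = value


t = Box(42)
print(t.value)   # dot syntax for one attribute
print(vars(t))   # all attributes, viewed as a dictionary
```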
Constructor: __init__
def __init__(self, value):
    self.value = value
This method makes value an attribute of the object represented by self.
The attribute self.value is assigned to be the same
as the method’s argument called value.
So this method just stores the value argument as self.value.
This method has a funny name: __init__.
Recall that names in Python that have two initial underscores
and two terminal underscores are "special".
They have a special meaning to Python;
e.g. __name__ is the current module’s name.
The __init__ method is what is called the constructor method
for the class (or just the constructor for short).
This method is called when an instance of the class is being created.
It is responsible for initializing the object in whatever way is required.
Creating an instance gives back the object that has been constructed,
even though __init__ has no return statement.
It can look as if __init__ were written like this:
def __init__(self, value):
    self.value = value
    return self  # not necessary (and also wrong)
However, if you actually write it like that, it will result in a TypeError once the constructor is called. Constructors must not explicitly return a value!
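To see the error for yourself, here is a minimal sketch using a hypothetical BadBox class (not part of the Box example) whose constructor returns self:

```python
class BadBox:
    """Hypothetical class whose constructor wrongly returns self."""

    def __init__(self, value):
        self.value = value
        return self  # wrong: __init__ must not return a value


try:
    BadBox(42)
    outcome = "no error"
except TypeError as err:
    outcome = "TypeError"
    print(err)
```

Defining the class works fine; the error only appears when an instance is created.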
The constructor is normally where the attributes of the object
are defined and given their initial values.
Here, the object will have one attribute: self.value,
which is set to be the same as the value argument.
Adding methods
The rest of the class definition is the following method definitions:
def get_value(self):
    return self.value
def set_value(self, new_value):
    self.value = new_value
(They are indented inside the class definition, but we’re showing them
unindented here.  We would also add docstrings, but we left them out
for simplicity.)
Method definitions are syntactically almost exactly like function definitions.
The one "difference" is that they have a special first argument (self),
and when the method is called, the name self will refer to the object
the method is acting on.
(You can think of this as "the current object".)
The get_value method takes the object as its argument
and returns the value attribute of the object.
The set_value method takes the object and a new value as arguments
and changes the value attribute of the object to new_value.
Using objects
That’s the end of the definition of class Box.
Now we need to know how to use it to create Box objects
(instances of the class Box)
and call their methods.
Creating objects
To create a new Box object,
just use Box as if it were the name of a function:
>>> t = Box(42)
What does this mean?
Box is a class name, but we’re using it like a function!
When you use a class name as if it were a function,
what you are doing is calling the __init__ method of that class.
However, the __init__ method took two arguments (self and value),
and this call only passes one (42). What’s going on?
To understand this, you need to know that when Python sees a line like this:
t = Box(42)
it translates the line internally into something like this:
t = makeEmptyObject()
Box.__init__(t, 42)
(This isn’t exactly what happens, but it’s conceptually correct.)
In other words, Python:
- creates a new object with no attributes
- passes it and the argument 42 to the __init__ method of the class Box
The __init__ method itself isn’t responsible
for creating the (initially empty) object.
(Python does that just before __init__ is called.)
Instead, __init__ is responsible for creating and initializing
the attributes of the object
and whatever other initialization might be necessary.
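The two steps can be spelled out by hand. This is only a sketch of the idea: it assumes object.__new__ as a stand-in for the "make an empty object" step, and normal code should just write Box(42).

```python
class Box:
    def __init__(self, value):
        self.value = value


# Roughly what Box(42) does, written out step by step:
t = object.__new__(Box)        # create a Box with no attributes yet
attrs_before = dict(vars(t))   # snapshot of the (empty) attributes
Box.__init__(t, 42)            # let __init__ set up the attributes

print(attrs_before)
print(vars(t))
```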
Calling methods
Now that we’ve created the object t
(which is a Box), we can call its methods:
>>> t.get_value()
42
>>> t.set_value(101)
>>> t.get_value()
101
Again, something a bit weird is happening.
Recall the definition of get_value:
def get_value(self):
    return self.value
get_value was defined to take one argument,
but we called it with no arguments.
Why does this work?  The reason is that when we write
t.get_value()
Python translates it to something like:
Box.get_value(t)
So the object (t in this case) is always the first argument
to every method call.
Now recall the definition of set_value:
def set_value(self, new_value):
    self.value = new_value
When we write
t.set_value(101)
Python translates it into something like:
Box.set_value(t, 101)
which explains why set_value takes two arguments even though it’s only called
with one.
One way to think about both of these cases is that the t. in t.get_value()
and t.set_value(101) is an extra argument which is moved from the place it
would occupy in a function call to before the dot in the dot syntax.
So methods are really just functions with
- a special call syntax (the dot syntax)
- an object as the implicit first argument
The first argument (usually called self) is the object
before the dot in the method call.
t.get_value() means that the get_value method
from the Box class
is called on the Box object t.
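Both spellings of a method call can be tried side by side. The sketch below repeats the Box class so that it runs on its own, then calls each method with the dot syntax and through the class with the object passed explicitly:

```python
class Box:
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value

    def set_value(self, new_value):
        self.value = new_value


t = Box(42)

# The two calls in each pair do the same thing.
first = t.get_value()
second = Box.get_value(t)

t.set_value(101)
Box.set_value(t, 999)

print(first, second, t.get_value())
```

In practice you would always write the dot form; the explicit form is shown only to make the role of self visible.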
Attributes
Our Box object has one attribute: value.
We can directly access it if we want to:
>>> t.value
101
We can also change its value directly:
>>> t.value = 999
>>> t.value
999
Directly accessing attributes isn’t always a good idea, though it’s pretty common Python practice. Directly changing attribute values is usually a very bad idea. Attributes should be considered to be the private state of an object ("private" means "used only by the object’s methods"). There are ways to restrict access to attributes, which we will see later, but Python is still not as restrictive as e.g. Java when it comes to accessing attributes.
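One reason direct changes are risky is that they skip any checks the methods perform. The sketch below uses a hypothetical PositiveBox class (an assumption for illustration, not part of this reading's Box) whose set_value method rejects non-positive numbers:

```python
class PositiveBox:
    """Hypothetical Box variant that should only hold positive numbers."""

    def __init__(self, value):
        self.value = None
        self.set_value(value)

    def set_value(self, new_value):
        if new_value <= 0:
            raise ValueError("value must be positive")
        self.value = new_value


p = PositiveBox(5)

try:
    p.set_value(-1)   # the method rejects the bad value
except ValueError:
    rejected = True
else:
    rejected = False

p.value = -1          # direct assignment skips the check entirely
print(rejected, p.value)
```

After the direct assignment, the object holds a value that its own methods would never have allowed.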
Next time
In the next reading, we’ll continue with more examples of more complex classes, with an overview of strategies to design and use classes in the programs we've already been writing.
[End of reading]