Contents
- Basic Concepts, Common Beginner Pitfalls, and Best Practices
- Seriously, learn to use the docs
- Variables and scope
- Assignment `=` operator vs Equality `==` operator
- Arithmetic
- Be wary of multiplying bracketed expressions
- Indentation makes a difference
- Why doesn't my code do anything?
- `print` vs `return`
- `while` vs `for` loops
- Single vs Double-quoted strings
- Comparisons and 'Falsey' Values
- Appending to Lists
- Python passes object-references by value
- Weird patterns I've seen (please don't do this)
- Numeric Types
- Strings
- Flow Control
- User Input
- Functions
- Iterables (String, List, Tuple, Set)
- Indexing and Slice Notation
- Iteration and Reverse Iteration
- Get the Length
- Get the Max and Min
- Sum an iterable
- Sorting an iterable
- Reversing an iterable
- Rearranging a set of iterables - getting the ith element into the ith iterable (like transposing a matrix, or pairing two lists)
- Checking if all items in an iterable are True
- Checking if any items in an iterable are True
- Mapping Lists (applying a function onto all elements of the list)
- Filtering Lists (getting certain values that fulfill some condition)
- List Comprehension
- Checking if an Iterable contains a certain element
- Iterating through Lists
- Iterating with the index
- When to use Tuples versus Lists
- Nifty String Methods
- Nifty List Methods
- Nifty Tuple Syntax
- Nifty Set Operations
- Mappings (Dictionary)
- File IO
- Object-Oriented Programming (OOP)
- State Machines
No internet access during exams. Your memory isn't infallible. Your notes aren't perfect. The docs will be hosted on a local server. Please please please learn to use them.
In all honesty, the Python docs aren't the best, which makes it all the more important that you familiarize yourself with them so you know where to look when you need something. (To prove my point, why not try typing `string methods` into the "Quick Search" bar and see whether you intuitively know which result to click on to get what you want.)
These pages will be your best friends:
If you need to look up functions like `input()`, `sum()`, `int()`, `max()`, `round()` etc.
If you need to look up string (`str`) methods, `list` methods, dictionary (`dict`) methods and operations, and `file` methods
More explicit documentation on `list` methods and list comprehension
See, it ain't that hard to learn to use the docs.
You can't just have values floating around; you need handles on them. That's what variables are, names that are bound to some value.
I got this nice imagery from Eloquent Javascript: think of variables as octopus tentacles grasping onto values. They are not the values themselves. You can reassign variables, so you have the same tentacle (name) grasping (bound to) a different value.
x = 10
x = 2
Pictorially:
![variables are tentacles][variable-octopus]

[variable-octopus]: https://raw.githubusercontent.com/tjjjwxzq/Digital-World-TA-Materials/master/Python-2.7-Notes/variable%20octopus.jpg
This mental model is important in understanding how python is pass by object reference.
Also python has this cool feature of multiple assignment:
x,y,z = 1,2,3
print x # prints 1
print y # prints 2
print z # prints 3
# naise
Another important concept is variable scope. This is the region where python will look for variable bindings when you reference a variable:
# top-level, global scope
x = 10
# I can access the variable x anywhere in my script
# because it is a global variable
# this is still in global scope
print x
# this is within a local function scope
# but if python can't find anything in the local scope
# it will check for any enclosing function scopes
# and then the global scope
def f():
    print x # x here is found in the global scope
f() # prints 10
def g():
    x = 3
    def h():
        print x # this finds x in the enclosing function scope
    h()
g() # prints 3
# this is again a local function scope
# but within the local scope I have defined another variable x
# this is different from the global variable x
# and won't affect the value of the global variable
def g():
    x = 2
    print x # this x is found in the local function scope
g() # prints 2
print x # prints 10, this is the global variable x
# code blocks like functions and classes introduce a new scope
class Dog:
    x = 0
    # this is again a different x
    # unlike function blocks, the scope of a class block
    # does not extend to any nested blocks
    def woof(self):
        print x # this finds x in the global scope, not in the class scope
    def woof2(self):
        print self.x # this references the x defined in the class scope
dog = Dog()
dog.woof() # prints 10
dog.woof2() # prints 0
I have taken the liberty of giving single-character variable and function names for demonstration purposes, but this is in general A VERY BAD idea. You should always always always name your variables something descriptive (though not tediously long). Unless you are writing a very short and simple program and know what you are doing (or such single-character names are conventional, as in looping variables `i`, `j`, `k`), littering your code with crappy variable names like `a`, `b`, `c`, `d`, `x`, `y`, `z`, and even worse, reusing those same crappy variable names for different and completely unrelated purposes is a surefire way to screw yourself over, start a debugging nightmare, and piss off anybody who has to read your code (and they do mark your code manually, so if you want partial credit...).
Also, you will make python sad.
YOU DON'T WANT TO MAKE PYTHON SAD, DO YOU????
The assignment `=` operator is for assigning a value to a variable.
The equality `==` operator compares two values and returns a boolean value (`True` or `False`).
You really shouldn't be confusing them at this point.
Remember that in Python 2.x, dividing `int`s with the `/` operator does integer division:
print 3/4 # prints 0
print 3.0/4 # prints 0.75
Make sure you make one of the values a float if you want normal division.
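Here is a minimal sketch of the usual ways around this. The variable names are made up for illustration, and `print(...)` is written with a single argument so the snippet behaves the same under Python 2.7 and Python 3:

```python
# Floor division with // always drops the fractional part.
floored = 3 // 4

# Making either operand a float forces normal ("true") division.
true_div_literal = 3.0 / 4
true_div_cast = float(3) / 4   # handy when the value lives in a variable

print(floored)           # prints 0
print(true_div_literal)  # prints 0.75
print(true_div_cast)     # prints 0.75
```

Converting with `float()` is usually the one you want when the numbers come from variables rather than literals.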
Yes, when you write algebraic expressions down with pen and paper and put bracketed stuff side by side, we know the multiplication sign is implied. Python is not as smart as you (sorry):
print (10+5)(10-5) # throws TypeError, `int` object is not callable
# bonus points for figuring out why
# otherwise just don't do this
# you forgot your * d'oh
print (10+5)*(10-5) # prints 75
This is a very good article on the issue
The upshot is that we count with 10 fingers while our computer counts with 2 (stupid computer!), and most base-10 fractions (like 0.1) can't be represented exactly in binary, so python tries its best. Still, the errors can add up, as some of you might have realized when you've tried to iteratively increment the value of a float and found that the end result is a little off from what you expected.
This has consequences when comparing floats. In general, either round your floats to the precision you care about, or don't compare for direct equality, but rather a range within a threshold you care about:
print round(4.9999987214,2) == 5.0 # prints True
x = 4.999827213
print abs(x-5.0) <= 0.005 # prints True
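To see the error actually adding up, here is a small sketch. `almost_equal` is a hypothetical helper (not a built-in), and the default tolerance of `1e-9` is an arbitrary choice you should adapt to the precision you care about:

```python
def almost_equal(a, b, tolerance=1e-9):
    """Return True if a and b differ by no more than tolerance."""
    return abs(a - b) <= tolerance

total = 0.0
for _ in range(10):
    total += 0.1   # 0.1 has no exact binary representation

print(total == 1.0)              # prints False
print(almost_equal(total, 1.0))  # prints True
```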
In python indentation is syntactically significant.
So there was this question about checking whether a number is prime:
# what's the difference between this
def is_prime(num):
    if num >0 and num <= 3:
        return True
    for i in range(2,num):
        if num%i ==0:
            return False
        else:
            return True
# and this?
def is_prime2(num):
    if num >0 and num <= 3:
        return True
    for i in range(2,num):
        if num%i == 0:
            return False
    return True
Better make sure it's not something facepalm-worthy before you ask...
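If you'd rather check than squint, call both versions on a number that isn't prime. The sketch below re-declares the two functions under made-up names (`is_prime_else_in_loop` and `is_prime_return_after_loop`) and assumes the `else` in the first version is indented to line up with the `if` inside the loop:

```python
def is_prime_else_in_loop(num):
    if num > 0 and num <= 3:
        return True
    for i in range(2, num):
        if num % i == 0:
            return False
        else:
            # runs on the very first i that does not divide num,
            # so the loop never gets to try any other divisor
            return True

def is_prime_return_after_loop(num):
    if num > 0 and num <= 3:
        return True
    for i in range(2, num):
        if num % i == 0:
            return False
    # only reached once every candidate divisor has been tried
    return True

print(is_prime_else_in_loop(9))        # prints True (wrong: 9 = 3 * 3)
print(is_prime_return_after_loop(9))   # prints False
```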
So you happily define a function, run your code, and are baffled when nothing happens.
Remember, functions are factories, right? When you defined a function, you built the factory. But if you don't order from the factory, why are you even expecting anything?
If you want something from your factory, you have to order from it. If you want something from your function, you have to call it. How do you call a function?
functionname(arguments)
The parentheses are key to calling the function. (And get the number of arguments correct too, or python will throw an error)
Okay, you defined your function and you called it. But your program still isn't doing anything!
Except that it is. If only your vision could penetrate into that 6th Generation Intel Core i7 processor and see how hard its transistors are chugging away to evaluate your `if`s and `for`s and `+`s and `-`s. Alas, as a substitute for being Clark Kent, you have to resign yourself to putting `print` statements in your code.
So that, you know, you can actually see the stuff your program is outputting. We all know the interactive console/commandline is lamer than super vision, but it's all we've got.
This is a very common misconception. "But aren't `print` and `return` the same thing???"
![python says no][python-no]

[python-no]: https://raw.githubusercontent.com/tjjjwxzq/Digital-World-TA-Materials/master/Python-2.7-Notes/angry-python.jpg
Observe:
def f(x):
    print x
    # this function doesn't have an explicit return statement
    # so it returns None by default
f(2) # prints 2
print f(2)
# prints
# 2
# None
# This is because print prints the value of the expression passed to it
# when we call a function, it evaluates to its return value
# So the function was called first, and the print statement within
# the function was executed and printed 2
# and only then did we print the return value of None
# see how this is different
def g(x):
    return x
g(2) # you won't see any output on your console
print g(2) # prints 2
When should you use one over the other?
Generally, we use `while` loops when we don't know beforehand how many times we need to loop; we keep checking the looping condition on each pass and exit when the condition becomes false.
Use `for i in range(numloops)` when you already know the number of times you need to loop (`numloops`). You can also use `for` to iterate over iterables like lists, strings, tuples, dictionaries, files etc.
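As a rough illustration of the two situations (the numbers and variable names here are made up for the example):

```python
# for: the number of iterations is known up front
squares = []
for i in range(5):
    squares.append(i * i)

# while: we don't know in advance how many halvings it takes
# to get below 1, so we loop on the condition instead
value = 100.0
halvings = 0
while value >= 1:
    value = value / 2
    halvings += 1

print(squares)   # prints [0, 1, 4, 9, 16]
print(halvings)  # prints 7
```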
There is no difference between using either in Python (unlike for example Ruby or Java). It might be slightly more convenient to use double-quotes for strings like this:
"I'm using double-quotes 'cos my string has single-quotes in it!"
# though you could just escape those single quotes like so
'I\'m using single-quotes so I have to escape them!'
Unlike some languages(eg. Java), you can test conditions on non-Boolean values as well:
astring = ''
if not astring:
    astring = "something"
print astring # prints something
The empty string evaluated to a false value in the condition. Python has this notion of 'truthy' and 'falsey' values which you can substitute for the values `True` and `False`, which is convenient, as you've seen above.
`None`, `False`, zero of any numeric type (`0`, `0L`, `0.0`, `0+0j`), empty sequences (strings, lists, tuples) or mappings (dictionaries) (`""`, `[]`, `()`, `{}`) evaluate to false. Everything else evaluates to true*.
* This is not strictly speaking correct, but good enough for now. For the full picture, head to the docs
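A quick way to convince yourself is to test the values directly. This is only a sketch: `describe` is a hypothetical function, and `0L` is left out of the list so the snippet also runs under Python 3:

```python
falsey_values = [None, False, 0, 0.0, 0j, "", [], (), {}]
# non-empty containers and strings are truthy, whatever they contain
truthy_values = ["0", " ", [0], (None,), {"a": 0}, -1]

def describe(items):
    """Use the truthiness of a list instead of comparing len(items) to 0."""
    if not items:
        return "nothing here"
    return "got %d item(s)" % len(items)

print(all(not value for value in falsey_values))  # prints True
print(all(truthy_values))                         # prints True
print(describe([]))                               # prints nothing here
print(describe([0]))                              # prints got 1 item(s)
```

Note that `[0]` is truthy even though the only thing inside it is falsey: what matters for a container is whether it is empty.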
Just be wary of the difference between `alist.append(item)` and `alist += item`. The latter is equivalent to `alist.extend(item)`, ie. `item` itself has to be an iterable (usually a list).
a = [1,2,3]
a.append(4)
print a # prints [1,2,3,4]
a += 5 # throws TypeError: 'int' object is not iterable
a.extend(5) # throws TypeError: 'int' object is not iterable
a += [5] # make sure you are adding a list
a.extend([6]) # likewise
print a # prints [1,2,3,4,5,6]
# if you want to add in sublists, then use append
a.append([7])
print a # prints [1,2,3,4,5,6,[7]]
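One more trap along the same lines, sketched with made-up variable names: strings and tuples are iterables too, so `+=` will happily accept them and add their items one by one.

```python
letters = ["a"]
letters += "bc"        # a string is iterable, so each character is added
print(letters)         # prints ['a', 'b', 'c']

words = ["a"]
words.append("bc")     # append adds the whole string as a single item
print(words)           # prints ['a', 'bc']

numbers = [1]
numbers += (2, 3)      # a tuple works as well
print(numbers)         # prints [1, 2, 3]
```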
This is by far the trickiest concept to grasp. This is the reason you can inadvertently end up mutating a list that you've passed to a function, or assigned to another variable (innocently thinking that you've passed a copy and that the original list will be untouched). This is why you have to explicitly copy your lists when you pass them around, if you don't want to mess up your original list:
a = [1,2,3]
b = a # let me operate on b so I don't end up changing a, because I might need it later
b[0] = 0
print b # prints [0,2,3]
print a # prints [0,2,3]; oops, a was changed as well!
# remedy this by explicitly copying your list
a = [1,2,3]
b = a[:] # remember this slice notation returns a copy of the whole list
b[0] = 0
print b # prints [0,2,3]
print a # prints [1,2,3] # that's more like it
This concept lies pretty closely with the notion of mutable and immutable types in python, so let's use our octopus tentacles to dissect it.
![tentacleception][tentacleception]

[tentacleception]: https://raw.githubusercontent.com/tjjjwxzq/Digital-World-TA-Materials/master/Python-2.7-Notes/tentacleception.jpg
So, now you know why, to truly copy a list with nested lists/dictionaries/sets/any mutable type, you have to use `deepcopy()`:
import copy
# no copying
a = [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b = a # no copy
b[0] = 0 # you changed a as well
print a # prints [0,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
# shallow copying
a = [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b = a[:] # shallow copy, equivalent to copy.copy(a)
b[0] = 0 # you didn't change a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b[-1][3] = 4 # now you changed a, because b contains a reference to the same dictionary as a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:4}]
# also to note
# do you think you can do this:
# b[3][0] = 0
# deep copying
a = [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b = copy.deepcopy(a) # deep copy, copies everything, even with multiple levels of nesting
b[0] = 0 # you didn't change a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
b[-1][3] = 4 # you still didn't change a
print a # prints [1,2, [1,2,3], (1,2,3), {1:1, 2:2, 3:3}]
# whew, safe at last
You should now know why to be wary when passing lists or other mutable types into functions as arguments. You may inadvertently end up modifying your original list in your function, because python passes object-references by value, not the object itself. With immutable types this is okay, because you can't change them anyway. With mutable types, this means a lot of weird shit can happen if you're not careful. You have been warned.
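To make the function case concrete, here is a small sketch with three hypothetical functions: one that mutates the list it was given, one that works on a copy, and one that merely rebinds its local name.

```python
def append_in_place(numbers):
    numbers.append(0)        # mutates the very list the caller passed in

def append_to_copy(numbers):
    result = numbers[:]      # shallow copy, the caller's list is left alone
    result.append(0)
    return result

def rebind(numbers):
    numbers = [9, 9, 9]      # only points the local name at a new list

original = [1, 2, 3]

rebind(original)
after_rebind = original[:]   # snapshot: still [1, 2, 3]

new_list = append_to_copy(original)
after_copy = original[:]     # snapshot: still [1, 2, 3]

append_in_place(original)

print(after_rebind)  # prints [1, 2, 3]
print(after_copy)    # prints [1, 2, 3]
print(new_list)      # prints [1, 2, 3, 0]
print(original)      # prints [1, 2, 3, 0]
```

Rebinding a parameter never affects the caller; mutating the object it refers to always does.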
If you need more explanation, here's a pretty good (albeit tentacle-less) article.
Why you do this:
def somefunc():
    # do stuff
    def anotherfunct():
        # do completely unrelated stuff
There are good reasons to use nested functions, none of which you will encounter at this point.
Please, don't do this.
Why you do this:
if somecondition:
    def somefunc():
        # do stuff
If you're doing this, I think you're confused and actually want the conditional inside the function body.
There are good reasons to conditionally define functions, none of which you will encounter at this point.
Please, don't do this.
Why you do this:
def somefunc():
    pass
    x = 10
    print x
    ...
    ...
    # do stuff
Why you do this:
if somecondition:
    pass
else:
    # do stuff
You realize you could have just done this:
if not somecondition:
    # do stuff
right?
There are no good reasons for superfluous pass statements.
Please, get rid of them.
Python has four numeric types: `int`, `long`, `float` and `complex`. To specify the numeric types:
x = 10 # x is an int
x = 10L # x is a long
x = 10.0 # x is a float
x = 10+0j # x is a complex
You can carry out arithmetic operations with a mixture of these types, and python will return a result with the broadest type (`complex` is broader than `float` is broader than `long` is broader than `int`).
The arithmetic operations are:
# x and y are variables with some numeric value
x + y # addition
x - y # subtraction
x * y # multiplication
x / y # division; note that if x and y are ints, this will be integer division which is equivalent to floored division
x // y # floored division
x % y # remainder of division; modulo
x ** y # x to the power of y
(You may be wondering about the other ways you can go about getting `x` to the power of `y`. What about `pow(x,y)` and `math.pow(x,y)`? If you're confused just stick with `x ** y`.)
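For the curious, a short sketch of how the three differ: the built-in `pow()` behaves like `**` (and additionally accepts an optional third modulus argument), while `math.pow()` always converts its arguments to floats.

```python
import math

print(2 ** 3)          # prints 8
print(pow(2, 3))       # prints 8, same as 2 ** 3
print(math.pow(2, 3))  # prints 8.0, always a float
print(pow(2, 3, 5))    # prints 3, i.e. (2 ** 3) % 5
```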
To get the absolute value, use the built-in function `abs()`:
x = -10
print abs(x) # prints 10
To round off floating point values, use the built-in function `round()`:
x = 10.575
print round(x,2) # prints 10.57; not exactly what we expect
# because of limitations of floating point arithmetic
To convert between numeric types, use the built-in functions `int()`, `long()`, `float()`, and `complex()`:
x = 15.6
print int(x) # prints 15
print long(x) # prints 15 (the L suffix only shows up in the repr, eg. at the interactive prompt)
print complex(x) # prints (15.6+0j)
You can also use those functions to convert a string literal into those numeric types, provided the string literal is valid (or a `ValueError` will be thrown):
s = "123"
print int(s) # prints 123
print float(s) # prints 123.0
print long(s) # prints 123
print complex(s) # prints (123+0j)
s = "abc"
print int(s) # throws ValueError
s = "14 + 5j"
print int(s) # throws ValueError
print complex(s) # throws ValueError; the spaces make a difference!
s = "14+5j"
print complex(s) # prints (14+5j)
(This section is on things specific to strings. For operations that you can apply to strings as iterables, see Iterables)
String concatenation and 'multiplication':
s = ""
s += "abc"
print s # prints abc
s = "4"
s *= 3
print s # prints 444
Checking whether one string contains another substring:
# say I want to count the number of vowels in a long string
longstring = "the quick brown fox jumps over the lazy dog" # any string will do
count = 0
for char in longstring:
    if char in "aeiou":
        count += 1
Use `if` statements to test conditions. This can be followed by zero or more `elif`s and optionally an `else`.
programmingisfun = True
programmingisboring = False
if programmingisfun:
    print "Yay"
elif programmingisboring:
    print "But why"
else:
    print "So what do you think of programming?"
Repeat a block of code until the `while` condition becomes false:
suxatprogramming = True
level = 0
while suxatprogramming:
    # practicepracticepracticepractice
    level += 1
    if level >= 1000:
        suxatprogramming = False
Iterate over an iterable/sequence. The basic idiom is to iterate over the list returned by the `range()` function, if you know beforehand the number of times you need to loop.
numloops = 10
for i in range(numloops):
    # do stuff
`break` breaks out of the smallest enclosing `for` or `while` loop. You usually use it when you hit a condition that makes you want to terminate the loop early, or together with a `while True` loop:
# breaking from a for loop
newnumlist = []
oldnumlist = [1,3,5,12,30,13,4,56,2]
for num in oldnumlist:
    newnumlist.append(num)
    if sum(newnumlist) > 10:
        break
print newnumlist # prints [1,3,5,12]
# breaking from a while True loop
# typically you use this idiom when there are many potential conditions
# you want to test for. So instead of testing for the conditions after
# the while loop statement, test them in the loop body and break
from random import randint
while True:
    x = randint(1,6)
    y = randint(1,6)
    if x == 6:
        break
    if x + y == 5:
        break
`continue` allows you to move on to the next iteration of the loop before the current iteration has fully completed:
astring = "abcdefg"
for char in astring:
    if char == "d":
        continue
    print char
# prints
# a
# b
# c
# e
# f
# g
The `pass` statement does nothing. Use it as filler when you don't need your code to do anything but you syntactically need a statement. Commonly used in the skeleton code given to you:
def yourfunction():
    pass
    # add your code here and please remove the pass afterwards
If it's not there python will scream about a `SyntaxError` (more precisely an `IndentationError`, which is a kind of `SyntaxError`).
Handy when you are in the midst of building up a complex program, and are thinking at an abstract level about the kinds of functions you'd need to implement. You can just put down your function definitions while leaving the body empty with a `pass` statement:
# making a super complex tictactoe game
# prompts user for row number
def prompt_row():
    pass
# prompts user for col number
def prompt_col():
    pass
# plays piece on board
def play_piece(board, piece, coords):
    pass
Not necessary but good to know.
To get user input from standard input, use the built-in `input()` or `raw_input()` functions.
promptmsg = "Enter something"
x = raw_input(promptmsg)
# x is a string
print x # prints whatever the user input
The difference between the two functions is that `input()` tries to evaluate the input as if it were Python code:
print raw_input("Hi!")
# say the user types "ghi"
# then ghi is printed
print input("Hi!")
# say the user types "ghi"
# then we get a NameError: name 'ghi' is not defined
# because Python tries to interpret the input as Python code
In other words `input(msg)` is equivalent to `eval(raw_input(msg))`, and there's not much reason to really use it. In fact, this behaviour was removed in Python 3.x, where `raw_input()` was renamed to `input()`.
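Since `raw_input()` always hands you a string, the usual next step is to convert it yourself and deal with bad input explicitly. A minimal sketch, where `parse_int` is a hypothetical helper and the strings stand in for whatever the user typed:

```python
def parse_int(text, default=None):
    """Convert text to an int, returning default if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(parse_int("42"))      # prints 42
print(parse_int(" 7 "))     # prints 7 (surrounding whitespace is ignored)
print(parse_int("ghi", 0))  # prints 0
```

In a real script you would call it as something like `parse_int(raw_input("Enter a number: "))`.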
Think of functions as mobile factories - they take in some kind of input (passed in as arguments to the function) and produce an output (the return value of the function). You can build factories (define functions) with the following syntax:
def functionname(arg1,arg2...):
    # function body
    # do something
    # return something
Functions allow you to compartmentalize and reuse your code. If you find yourself repeating the same or nearly identical block of code lots of times, maybe you should encapsulate it in a function.
So now we've got our factory, but it won't do anything unless we order something from it (call the function):
def f(x):
    print x
# if I don't call the function I just defined this snippet of code won't do anything!
f(1) # prints 1
You call a function by adding parentheses to the function name and passing in whatever arguments it takes. Some functions don't take any arguments, so you just add a set of empty parentheses:
def f():
    print "howdy"
f() # prints "howdy"
Note that adding the parentheses is crucial to actually calling functions, because in python functions are actually just objects that you can pass around:
def f():
    print "howdy"
print f # prints <function f at someidnumber>; nope, it doesn't print "howdy" and None
f() # prints "howdy"
See the difference?
Arguments are the values you pass into a function when you call it. For example:
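def f(x):
    print x
f("hello") # "hello" is my argument
You also hear people talking about function parameters. What's the difference between arguments and parameters? Parameters are the names listed in the function definition (`x` above); arguments are the actual values you pass in when you call the function (`"hello"` above).
Python has a handful of handy features (default arguments, keyword arguments etc.) regarding function arguments. Check the docs if you're interested.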
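As a small taste of those features, here is a sketch using a made-up `greet` function with two default arguments:

```python
def greet(name, greeting="Hello", punctuation="!"):
    # greeting and punctuation fall back to their defaults when not passed
    return greeting + ", " + name + punctuation

print(greet("Ann"))                       # prints Hello, Ann!
print(greet("Ann", "Howdy"))              # prints Howdy, Ann!
print(greet("Ann", punctuation="?"))      # prints Hello, Ann?
print(greet(greeting="Bye", name="Ann"))  # prints Bye, Ann!
```

Keyword arguments let you skip over a default in the middle, or pass arguments in any order.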
Functions return something. To specify the return value use the `return` statement:
def f(x):
    return x +1
print f(2) # prints 3
If a `return` statement is not executed (ie. it is not specified, or the logical flow of the function code is such that no explicit `return` statement is executed), then the function will return `None` by default:
def f(x):
    x += 1
print f(2) # prints None
Functions are considered code blocks in Python, meaning they introduce a new scope for any variables defined within them:
# x here is a top-level global variable
x = 10
def f():
    # this x is in the function's local scope
    x = 2
    print x
f() # prints 2
# outside the function body I'm back in global scope
# the variable x I reference here is the global variable
print x # prints 10
You can't access variables local to a function block in the global scope:
def f():
    x = 3
print x # this will throw a NameError: name 'x' is not defined
The scope of variables defined in function blocks extends to any code blocks enclosed by that function block, such as nested functions:
def f():
    x = 3
    def innerf():
        # I'm still able to access x here
        print x
    innerf()
f() # prints 3
(Note that at this point, you have hardly any/no good reasons to use nested functions, so you really shouldn't. Just do all your function definitions in the top-level of your script. If you're really interested, read up on 'closures')
When defining a function the normal way with the `def` keyword, you have to give the function an explicit name, which you can use to reference the function in the future. Sometimes, however, you just need to define a short little function and use it only once. Python allows you to use the `lambda` keyword to create these anonymous functions. A common use case is passing these anonymous functions as arguments to built-in functions like `sorted()` and `max()`:
listoftups = [(1,2,3), (2,3), (4,5), (1,3)]
# I want to get the maximum tuple, based on the sum of its first two values
print max(listoftups, key=lambda x: x[0] + x[1]) # prints (4,5)
The `lambda` syntax is as follows:
lambda arguments: expression
The value of the expression will be returned by the lambda function.
More examples:
f = lambda x,y: x + y
print f(2,3) # prints 5
Conceptually iterables are python objects you can iterate (ie. loop) over, accessing each of its elements ("it is capable of returning its members one at a time"). The iterables you should be familiar with are sequence types such as lists, tuples and strings, and some other non-sequence types like dictionaries and files (though they will be treated here in their own section).
Basically if you can run a `for x in iterable` loop on it you've got an iterable on your hands.
For a more technically exact definition of python iterables, head to the docs
To get one item in an ordered iterable, specify it by its index (remember that the index starts from 0)
someiterable[i] # get the i'th item of someiterable
Indices can be negative, which means that you count from the back of the iterable. A common idiom is getting the last item in the iterable:
someiterable[-1] # get the last item of someiterable
Note that only `string`, `list` and `tuple` types support indexing (since these iterables are ordered types)
Python's slice notation allows you to get a subset of the iterable's items (eg. part of a list, or a substring)
someiterable[start:stop:step]
The slice syntax is similar to that of the built-in `range()` function - you specify the starting index, the stopping index, and optionally, the step. Note that the subset will contain items from and including `start` up to but not including `stop` (thus up to and including `stop-1`)
Examples:
"abcdefg"[2:5] # gets the substring "cde"
If the `start` is omitted, the subset starts from the first item:
"abcdefg"[:5] # gets the substring "abcde"
If the `stop` is omitted, the subset goes till (including) the last item:
"abcdefg"[1:] # gets the substring "bcdefg"
A common idiom is to get a copy of a whole list like so:
somelist = ["a","b","c"]
copylist = somelist[:] # gets a copy of somelist and assigns it to copylist
(Remember it only really makes sense to do this for mutable iterables like lists, because you don't want to modify the original list. Since immutable types like strings and tuples can't be modified, there isn't really any need for a "copy of a string/tuple". If `string1 = "abc"` and `string2 = "abc"`, `string1` and `string2` will typically point to the same string object.)
If either `start` or `stop` is out of range, the slice will get as many items as possible:
"abcdefg"[2:19] # gets the substring "cdefg"
"abcdefg"[-10:5] # gets the substring "abcde"
If the slice doesn't end up selecting anything, we get an empty subset:
"abcdefg"[4:3] # gets an empty string ""
We can specify negative indices in slice notation as well:
"abcdefg"[1:-1] # gets the substring "bcdef"
If `step` is specified and positive, we get a subset of items with the indices `[start, start + step, start + 2*step, ...]` up to the largest `start + i*step` smaller than `stop`:
"abcdefg"[::2] # gets the substring "aceg"
"abcdefg"[1::2] # gets the substring "bdf"
"abcdefg"[:-1:3] # gets the substring "ad"
If `step` is negative, we get a subset of items with the indices `[start, start + step, start + 2*step, ...]` down to the smallest `start + i*step` larger than `stop`:
"abcdefg"[::-1] # gets the reversed string "gfedcba"
"abcdefg"[-1::-2] # gets the substring "geca"
"abcdefg"[4:-4:-1] # gets the substring "e"
You can only assign to slices of mutable types (like lists!), and you can only assign iterable types to the slice. The length of the slice doesn't have to be the same as the length of the iterable you assign to it. Basically imagine cutting away that slice and pasting in the assigned iterable, whatever the length is. For example:
a = range(10) # range(10) returns the list [0,1,2,3,4,5,6,7,8,9]
a[3:6] = ["a","b","c"] # slice and assigned iterable of the same length, now a becomes [0,1,2,"a","b","c",6,7,8,9]
a = range(10) # start afresh for each example
a[3:6] = ["a","b"] # now a becomes [0,1,2,"a","b",6,7,8,9]
a = range(10)
a[3:6] = ["a","b","c","d","e"] # now a becomes [0,1,2,"a","b","c","d","e",6,7,8,9]
a = range(10)
a[3:6] = ("a","b") # we can assign a tuple too: now a becomes [0,1,2,"a","b",6,7,8,9]
a = range(10)
a[3:6] = "ab" # we can assign a string as well: now a becomes [0,1,2,"a","b",6,7,8,9]
Assigning to an empty slice is a quick way to insert values into a list:
a = range(10)
a[4:3] = "abc" # now a becomes [0,1,2,3,"a","b","c",4,5,6,7,8,9] (the assigned iterable is inserted into the start index)
for item in iterable:
    # do stuff
# to reverse use the built-in function reversed()
for item in reversed(iterable):
    # do stuff
len(iterable)
max(iterable)
min(iterable)
If you want to specify your own ordering function, you can pass in the `key` argument: `max(iterable, key=somefunction)`
For example
max(d, key=lambda k: d[k])
By passing in a `lambda` (anonymous) function as the `key` argument, I'm asking `max` to call that function on each element of the iterable and compare the results to find the maximum. So I'm basically finding the key that corresponds to the maximum of the dictionary's values.
Note that to get the maximum of the dictionary's keys, I can just call `max(d)` (calling `d.keys()`, though intuitive, is superfluous)
Another common idiom is to get the maximum string by length
max(str1,str2,str3,...strn, key=len) # remember len() is a built-in function
Otherwise `max(str1,str2,str3...strn)` normally returns the max by lexicographical order
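Putting the `key` idiom together in one runnable sketch (the dictionary `scores` and the fruit names are made-up sample data):

```python
scores = {"amy": 72, "bob": 91, "cat": 85}

# key with the largest value: max() iterates over the dictionary's keys
top_scorer = max(scores, key=lambda name: scores[name])

# without key, max() just compares the keys themselves
largest_key = max(scores)

# several strings: by length with key=len, lexicographically without
longest = max("kiwi", "banana", "fig", key=len)
last_alphabetically = max("kiwi", "banana", "fig")

print(top_scorer)           # prints bob
print(largest_key)          # prints cat
print(longest)              # prints banana
print(last_alphabetically)  # prints kiwi
```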
sum(iterable, [start])
Used to sum a list of numbers (it won't work on a list of strings; use `''.join(listofstrings)` instead).
You can specify an optional argument `start` as a value to add to the sum (defaults to 0).
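A short sketch of both points, using made-up sample lists; the `sum_failed` flag is only there to show that summing strings raises a `TypeError`:

```python
numbers = [1, 2, 3]
print(sum(numbers))      # prints 6
print(sum(numbers, 10))  # prints 16, the start value is added to the total

words = ["ab", "cd", "ef"]
print("".join(words))    # prints abcdef

sum_failed = False
try:
    sum(words)           # tries 0 + "ab", which is not allowed
except TypeError:
    sum_failed = True
print(sum_failed)        # prints True
```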
There are two ways to go about it (for lists) - use the built-in `sorted()` function:
sorted(alist,[key,[reverse]])
or use the list method `list.sort()`:
alist.sort([key,[reverse]])
The optional keyword arguments `key` and `reverse` are the same for both `sorted()` and `list.sort()`, and can be quite handy. The difference between the two is that `sorted()` returns a new sorted list and leaves the original untouched, while `list.sort()` modifies the list in-place and returns `None`. No real reason to use one over another, as long as you know what you're doing. Maybe `sorted()` is a teeny bit more convenient.
The biggest difference is probably that `sorted()` can be called on any iterable (strings, tuples, sets, dictionaries, files), while `list.sort()` can (obviously) only be called on lists. But `sorted()` always returns a list.
(If you're wondering how calling `sorted()` on a dictionary or a file works, the former basically returns a sorted list of the dictionary keys, the latter a sorted list of lines in the file)
Examples:
alist = ["def", "ghi", "abc"]
print sorted(alist) # prints ["abc", "def","ghi"]
print alist # prints ["def", "ghi", "abc"]; the original list has not been modified
print alist.sort() # prints None
print alist # prints ["abc", "def", "ghi"]; the original list has been modified
The `key` and `reverse` keyword arguments are useful to know. Say you want to sort a list of strings, not by the default lexicographical comparison but by their lengths; then you could do:
strings = ["efg", "hi", "abcd"]
# normal sorted() without arguments sorts lexicographically
print sorted(strings) # prints ["abcd", "efg", "hi"]
# but I can sort by string length instead
print sorted(strings, key=len) # prints ["hi", "efg", "abcd"]
Basically the `key` argument specifies a function that is called on each item in the iterable and returns a value that is used to compare each item. If `key` is not specified the item itself is used directly for comparison.
The `reverse` keyword argument is clearly for reversing the sort:
astring = "abcdefg"
print sorted(astring) # prints ["a","b","c","d","e","f","g"]
print sorted(astring, reverse=True) # prints ["g","f","e","d","c","b","a"]
# if you want to get back a string, just use str.join()
print "".join(sorted(astring, reverse=True)) # prints "gfedcba"
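There are a few ways. First, using sorted() with reverse=True (strictly speaking this sorts in descending order rather than reversing, so it only gives you the reversed sequence if the original was already sorted in ascending order):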
tup = (12,5,12,4,55,2)
print sorted(tup, reverse=True) # prints [55,12,12,5,4,2]
Second calling list.reverse()
on a list (remember this works only for lists!):
alist = [12,5,12,4,55,2]
# like list.sort(), list.reverse() modifies the list in place and returns None
print alist.reverse() # prints None
print alist # prints [2,55,4,12,5,12]; the original order reversed, not sorted
Third, and probably the most elegant, using slice notation:
astring = "abcdefg"
print astring[::-1] # prints "gfedcba"
(Just note that sorted()
is still the most general since it can be called on dictionaries and file objects as well, though those are very uncommon use cases. And it always returns a list.)
Don't reach for the reversed() built-in function when you want a reversed sequence: it returns a reverse iterator, which is most of the time not what you want. Only use reversed() for iteration, like so:
astring = "adfsdfsdf"
for char in reversed(astring):
    print char
print reversed(astring) # prints <reversed object at someidnumber>; nope, totally not what you were expecting
See this PEP for more on reverse iteration.
Rearranging a set of iterables - getting the ith element into the ith iterable (like transposing a matrix, or pairing two lists)
Use zip(iterable1, iterable2...)
which returns a list of tuples (which can be easily converted into a list of lists, if desired, using something like [list(tup) for tup in zip(iterable1,iterable2...)]
)
Here's a potential use-case: suppose you have a list of items and you want to merge every other item together. For example:
strings = ["hi", 1, "bye",2, "sad", 5, "angry", 6]
# suppose I want to transform strings into this list: ["hi1", "bye2", "sad5", "angry6"]
strings = zip(strings[::2], strings[1::2]) # this is the same as zip(["hi","bye","sad","angry"], [1,2,5,6])
# now strings is [("hi",1), ("bye",2), ("sad",5),("angry",6)]
strings = [tup[0] + str(tup[1]) for tup in strings]
# now strings is ["hi1", "bye2", "sad5", "angry6"]
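The "transposing a matrix" use mentioned in the heading can be sketched like this. The `matrix` and `pairs` names are just illustrative, and the `list(...)` wrappers are there so that the snippet behaves the same under Python 2 and Python 3:

```python
matrix = [[1, 2, 3],
          [4, 5, 6]]

# The * unpacks the rows, so zip receives them as separate arguments:
# zip(*matrix) is the same as zip([1, 2, 3], [4, 5, 6])
transposed = [list(column) for column in zip(*matrix)]
print(transposed)

# zip stops at the shortest iterable, so the 3 below is silently dropped
pairs = list(zip([1, 2, 3], "ab"))
print(pairs)
```

Keep the truncation behaviour in mind when pairing lists of different lengths, since no error is raised.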
all(iterable) # returns True if all items in iterable are True
The items do not literally have to be the boolean values True
or False
, since python has the notion of truthy and falsy values. For example, I have a list of strings, and I want to check if all of them are non-empty:
strings = ["acb", "" ,"adf", "sdfs"]
if all(strings):
    print "No empty strings!"
# nothing gets printed
Or maybe you want to check if all numbers in a list are equal to a certain value:
numlist = [0,0,0,0,0]
print all([num == 0 for num in numlist]) # prints True
# note that the following is different
print all(numlist) # prints False, since 0 is a falsey value
any(iterable) # returns True if at least one item in the iterable is True
The items do not literally have to be the boolean values True
or False
, since python has the notion of truthy and falsy values.
Say I have a list of students and their grades and want to praise all of them once at least one of them gets an 'A' (because I'm a nice person):
students = {"john":"B", "mary":"B", "joseph":"A"}
if any([grade == "A" for grade in students.values()]):
    print "Good job guys!"
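A small sketch of the same checks without the square brackets. Dropping them turns the list comprehension into a generator expression, which lets `all()` and `any()` stop as soon as the answer is known instead of building the whole list first. The variable names are only for illustration:

```python
numlist = [0, 0, 3, 0]

# generator expressions: no intermediate list is built
all_zero = all(num == 0 for num in numlist)     # stops at the 3
has_nonzero = any(num != 0 for num in numlist)  # also stops at the 3

# edge case worth remembering: empty iterables
all_of_nothing = all([])  # True (no item is falsey)
any_of_nothing = any([])  # False (no item is truthy)

print(all_zero)
print(has_nonzero)
```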
map(function, iterable)
map()
applies function
to every item in iterable
and returns a list of the results.
For example, suppose I have a list of strings and I want to get a list of their lengths:
strings =["first", "second","third"]
print map(len, strings) # prints [5,6,5]
A common idiom is to convert a list of string numbers into floats or ints:
scores = ["1.3", "5.6", "9.0"]
print map(float, scores) # prints [1.3, 5.6, 9.0]
Note that you can also do this using list comprehensions, which is generally considered more 'pythonic':
strings = ["first", "second", "third"]
print [len(s) for s in strings] # prints [5,6,5]
For most part, stick with list comprehensions, though there might be some very uncommon cases that can mess you up. More on why one way over the other.
filter(function, iterable)
Filter applies function
to each element in iterable
and returns a list of the elements for which function
returns true.
For example, suppose I have a list of integers and I want to extract only the even ones:
numlist = [1,2,3,4,5,7,13,4,5,61,50,98]
print filter(lambda x: x%2==0, numlist) # prints [2,4,4,50,98]
As with the map()
function, you can accomplish the same thing with the if clause in list comprehensions, also generally considered more 'pythonic':
numlist = [1,2,4,5,1,2,4,5,3,1,0,0,5,5,4]
maxnum = max(numlist) # maxnum is 5
print len([num for num in numlist if num ==maxnum]) # prints 4, the number of 5's in numlist
# this is just a convoluted way of doing numlist.count(maxnum), just for demonstration purposes
List comprehension is python's really cool syntactic sugar for constructing lists out of some other iterable. For example:
cubes = [x**3 for x in range(10)] # creates a list [0,1,8,27,...729]
In general a list comprehension is of the form:
[ (expression) for clause [zero or more for or if clauses]]
In the example, expression
was x**3
, and there was only one for
clause: for x in range(10)
.
More complex list comprehensions might use additional for
and if
clauses. For example, the if
clause can be used for filtering an iterable:
strings = ["abd", "abc", "bcd", "gdf"]
# get only the strings that contain 'a'
strings = [s for s in strings if "a" in s] # strings is now ["abd","abc"]
Using additional for
clauses is the same as creating nested for
loops:
z = [num1*num2 for num1 in range(3) for num2 in range(2)] # z is now [0,0,0,1,0,2]
is equivalent to:
z = []
for num1 in range(3):
    for num2 in range(2):
        z.append(num1*num2)
Note how the for
clauses are evaluated in order.
We may combine additional for
and if
clauses:
names = ["finch", "ebot","amigo","pi"]
descriptions = ["is fun", "is cute", "is educational", "is cheap"]
dwstuff = [name +" "+ desc for name in names for desc in descriptions if name != "amigo" and desc != "is cheap"]
is equivalent to
names = ["finch", "ebot","amigo","pi"]
descriptions = ["is fun", "is cute", "is educational", "is cheap"]
dwstuff = []
for name in names:
    for desc in descriptions:
        if name != "amigo" and desc != "is cheap":
            dwstuff.append(name + " " + desc)
Note how the clauses are all evaluated in order.
Note that using the if
clause in list comprehensions is different from using a ternary operator (aka conditional expression), ie. x if condition else y
in the list comprehension expression:
names = ["finch", "ebot","amigo","pi"]
dwstuff1 = ["cute" if name=="finch" else "ugly" for name in names] # dwstuff1 is now ["cute","ugly","ugly","ugly"]
# this does something completely different
dwstuff2 = ["cute" for name in names if name == "finch"] # dwstuff2 is now ["cute"]
You can nest list comprehensions:
colours = ["red","blue","green","yellow"]
print [ [colour[i] for i in range(2)] for colour in colours] # prints [["r", "e"], ["b","l"], ["g","r"],["y","e"]]
Note that this is different from using additional for
clauses. You are basically creating a list of lists here.
char = "a"
astring = "abcdefg"
if char in astring:
    print char
That nifty in
keyword.
for item in alist:
# do something
It's as simple as that, but there's a catch: what if you want to modify the list items while you loop?
alist = ["ham","egg","sausage"]
for item in alist:
    print "The item is", item
    item = "spam"
    print "The item is now", item
print alist # prints ["ham", "egg", "sausage"]; whoops, what went wrong?
The problem is that item
is like a copy of the actual list item at that point in the iteration. It just holds on to the value of the list item at that point but is not the actual list item in and of itself. Imagine that this:
for item in alist:
    item = "spam"
is actually like this:
for i in range(len(alist)):
    item = alist[i] # item now stores the value of alist[i]
    item = "spam" # item now stores the value 'spam'; this doesn't affect the value of alist[i]
So to modify list elements while looping, you need to access the element via its index, so you need to iterate with the index.
Note that Thou Shalt Not Delete/Add Elements to A List During Iteration. You have been warned.
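A minimal sketch of both points: modifying elements through their index, and building a new list instead of deleting from the one you are looping over. The `numbers` and `odds` names are just for illustration:

```python
alist = ["ham", "egg", "sausage"]

# assign through the index so the list itself is changed
for i in range(len(alist)):
    alist[i] = "spam"
print(alist)

numbers = [1, 2, 3, 4, 5, 6]

# rather than removing the even numbers from `numbers` while iterating over it,
# build a new list that keeps only what you want
odds = [num for num in numbers if num % 2 != 0]
print(odds)
```

The original `numbers` list is left untouched, so nothing shifts under the loop's feet.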
for i, item in enumerate(iterable):
# do something
Sometimes you want to iterate through an iterable while accessing the index for that iteration. You could just do something like:
for i in range(len(iterable)):
# do something
But this is kinda ugly and not as explicit in what you actually want to do. And what you want to do is iterate over an iterable, while accessing the current index:
for i, item in enumerate(iterable):
    print "index", i
    print "item", item
"For the index and item in the enumeration of the iterable". Reads like English, see?
Stack Overflow is your friend.
Most commonly used:
- str.strip() - returns a copy of the string with leading and trailing whitespace removed
- str.split(delimiter) - returns a list of strings with the original string split by the specified delimiter
- str.replace(old,new) - returns a copy of the string with all instances of the old substring replaced by the new substring
- str.count(substr) - returns the number of non-overlapping occurrences of substr in the string
Head to the docs for more.
They're all used quite commonly.
Head to the docs for all.
For a 0-element tuple, the parentheses are key:
zerotup = ()
print zerotup # prints ()
print type(zerotup) # prints <type 'tuple'>
For single-element tuples, the comma is key (not the parentheses):
onetup = 1,
print onetup # prints (1,)
print type(onetup) # prints <type 'tuple'>
# this is not a tuple
notatup = (1)
print notatup # prints 1
print type(notatup) # prints <type 'int'>
For multi-element tuples, the commas between elements are key:
# these are all the same
tup = 1,2,3
tup = 1,2,3,
tup = (1,2,3)
Be wary when comparing tuples; the equality comparison operator has precedence over your commas:
def f(x):
    return x, x
print f(2) == 2,2 # prints False 2
# what's happening is something like this
# print (f(2) == 2), 2
# what you want is this:
print f(2) == (2,2) # prints True
# what do you think the following prints?
print 2,2 == 2,2
A set is an unordered collection with no duplicates.
Say you have a list with multiple occurrences of various items. If you want to remove all the duplicate items, you could use a for
loop:
newlist = []
for item in oldlist:
    if item not in newlist:
        newlist.append(item)
Or you could convert the list into a set:
set(alist)
(You can convert the set back into a list using list()
if need be)
Obviously using set()
is more elegant, but the difference is that the order of your elements is not preserved (since it's an unordered collection), while the first method using the for
loop preserves them. Usually this is not important.
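If the order does matter, the two approaches can be combined. This is only a sketch: `unique_in_order` is a made-up helper name, and it assumes the items are hashable (strings, numbers, tuples), since they get stored in a set:

```python
def unique_in_order(items):
    """Return a new list with duplicates removed, keeping first occurrences in order."""
    seen = set()   # fast membership checks
    result = []    # keeps the original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

oldlist = ["b", "a", "b", "c", "a"]
newlist = unique_in_order(oldlist)
print(newlist)
```

Checking `item not in seen` on a set stays fast however many items you have collected, whereas `item not in newlist` on a list has to scan the whole list each time.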
Sets can be initialized using curly braces (but note that an empty set should be initialized with set()
; {}
initializes an empty dictionary):
a = {1,2,3,4,5}
print a # prints set([1,2,3,4,5])
b = {1,2,1}
print b # prints set([1,2])
a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in a that are not in b
print a - b # prints set([1,2,4])
a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in either a or b
print a | b # prints set([1,2,3,4,5,6,7,8,9,10])
a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in both a and b
print a & b # prints set([3,5,6])
a = {1,2,3,4,5,6}
b = {3,5,6,7,8,9,10}
# get elements in either a or b but not both
# ie. the union minus the intersection
print a ^ b # prints set([1,2,4,7,8,9,10])
Use d.items()
where d
is your dictionary. Returns a list of (key,value) tuples
This can also be used for iterating through the key-value pairs:
for k,v in d.items():
    print "Key is", k, "Value is", v
Use collections.defaultdict
. The constructor takes in a factory callable (ie. function) that should return the default value when called. This means that when you try to access a value in a dictionary by a key that you've never assigned to it before, it will insert the default value returned by the factory function with the given key, instead of raising a KeyError
as d[key] would in a normal dictionary. Better explained with examples than anything else:
Common patterns include setting the default value to an empty list:
import collections
d = collections.defaultdict(list)
for key in somelist:
    d[key].append("wut")
(the list
function constructs an empty list when passed no arguments)
Setting the default value to 0 (for counting):
d = collections.defaultdict(int)
for key in somelist:
    d[key] += 1
(the int
function returns 0 when passed no arguments)
Of course you could do it in a slightly more roundabout way with a normal dictionary:
d = {}
for key in somelist:
    if d.get(key) is None:
        d[key] = 1 # first time we see this key, so the count starts at 1
    else:
        d[key] += 1
Note that you have to test whether the key exists with d.get()
because if you try to access a non-existent key with d[key]
Python will raise a KeyError
(though you could do a try/except
if you really really want. Just go with the defaultdict please)
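For comparison, here is a sketch of the counting pattern using the optional second argument of `dict.get()`, which is the value returned when the key is missing. The contents of `somelist` are made up for illustration:

```python
import collections

somelist = ["a", "b", "a", "c", "a"]  # hypothetical data

# normal dictionary: d.get(key, 0) returns 0 instead of None for unseen keys
counts = {}
for key in somelist:
    counts[key] = counts.get(key, 0) + 1

# defaultdict version, for comparison
dd = collections.defaultdict(int)
for key in somelist:
    dd[key] += 1

print(counts == dd)  # a defaultdict compares equal to a dict with the same items
```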
It's good practice to always do
for key in d:
as opposed to
for key in d.keys():
Though they seem to do the same thing, the first avoids some extra work, though the time difference obviously won't show up for the size of dictionaries you'll be dealing with. Still, if for nothing else, save yourself some typing.
For the interested, this difference is because d.keys()
returns a list of the dictionary keys and iterates through that, but it takes time and memory to build that list (O(n) if you have n items, and are familiar with asymptotic notation). Doing for key in d
iterates over the dictionary directly without building that list. Both loops still visit all n keys, so both are O(n) overall. Where the asymptotic complexity really differs is the membership test: key in d does an efficient hash lookup in O(1) time on average (this fast lookup time is actually what motivates the hash table data structure, of which Python's dictionary is a specific implementation), while key in d.keys() has to scan through the list in O(n) time.
Note: if you are using Canopy, make sure you are running it from the correct directory relative to the file you are trying to open, or you will get an IOError. Do this by right-clicking the Python pane (the interactive prompt) and selecting Keep working directory synced to current file
or something to that effect. You can also change your working directory manually if need be
When accessing files there's the notion of a current pointer which points to where you are in your file right now. Calling certain methods on the file object will cause the pointer to move, and if you are not aware of this it can cause some confusion.
open(filename,[mode])
The filename can be specified as a relative path or as an absolute path. (In the former case, ensure your working directory is set correctly, or the relative path might be broken). The most convenient is usually to have the file you are trying to open and your python script in the same directory.
The mode can be specified as r (read only), w (write only, with truncation, meaning you will overwrite whatever was originally in the file; also creates the file if it doesn't exist), and a (append to the file, so previous content won't be overwritten; also creates the file if it doesn't exist). If you need to read and write simultaneously, you can use the modes r+ (read and write, with the initial position at the start of the file), w+ (read and write, but the file is truncated as soon as it is opened, so previous file content is erased; also creates the file if it doesn't exist) and a+ (read and write, initial position at the end of file; also creates the file if it doesn't exist).
If the mode is not specified it defaults to r.
(There is another optional buffering argument but you won't need it)
You can read the entire contents of the file as a single string:
f = open("somefile.txt")
filestring = f.read()
Doing so moves the file's current pointer to the end of file. If you want to process the file's contents again, you have to reposition the pointer to the beginning using f.seek(0)
You can also read a single line:
aline = f.readline()
A line is demarcated by a newline \n
character and this character is also kept by readline()
(ie., in the example above, aline
will be a string containing a \n
at the end). Using readline()
will move the file's current pointer to the start of the next line.
You can also get all lines at once as a list of strings:
lines = f.readlines() # lines is a list of strings, eg. ["line1\n","line2\n"...]
The newline character is likewise kept. The current pointer will be moved to the end of the file.
If you need to go through each line of the file, you could definitely call f.readline()
repeatedly until the EOF (end of file) is reached (whereupon f.readline()
will return an empty string), but you really shouldn't (it's not pythonic). Just iterate through a file like so:
for line in f:
    # do something
(Yes, a file object is also an iterable as well as an iterator)
Should you need to iterate through the file again, remember to reset the current pointer to the start with f.seek(0)
f.seek(byteoffset)
Moves the current pointer byteoffset
bytes from the start of the file. Normally you'd just need to move to the beginning of the file, using:
f.seek(0)
Note that file output is buffered, so unless you flush or close the output stream, the string may not actually be written to file. So remember to always f.close()
your files!
To write a string:
f.write(somestring)
To write a list of strings:
f.writelines(listofstrings)
Note that writelines()
doesn't add a newline character. It's just named to match readlines()
. For example:
f.writelines(["some","lines","hoho"]) # produces a file with the text: somelineshoho
f.close()
You should make it a habit to always close files that you open, even though the python garbage collector will close the file for you when it destroys the file object (you won't have control over this and the exact behaviour varies over python versions so you shouldn't depend on it). Actually a better way to handle file resources is to use the with
statement:
with open("somefile.txt","w") as f:
    # do stuff with your file
    f.write("writing")
This will ensure that your file is always closed. This is good to know, though you don't really need to know about it at this point.
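To tie the reading, writing and current-pointer behaviour together, here is a small round-trip sketch. It writes to a throwaway temporary directory (an assumption made so the example doesn't touch your real files), and it should run the same under Python 2 and Python 3:

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "somefile.txt")

lines = ["some", "lines", "hoho"]
with open(path, "w") as f:
    # writelines() adds no newlines, so we append them ourselves
    f.writelines([line + "\n" for line in lines])

with open(path) as f:
    first = f.readline()  # reads one line; pointer is now at the start of line 2
    rest = f.read()       # reads everything that is left; pointer is now at EOF
    at_eof = f.read()     # nothing left to read, so this is an empty string
    f.seek(0)             # move the pointer back to the start
    stripped = [line.strip() for line in f]  # iterate again, dropping the "\n"

print(repr(first))
print(stripped)
```

Both files are closed automatically when their `with` blocks end, even if an error is raised inside them.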
OOP is a programming paradigm that logically encapsulates code in objects. Objects couple data (in the form of fields/attributes) with functions or procedures (methods).
You can think of the OOP paradigm as the way you intuitively view the world too: you classify the objects in the world based on properties they have (attributes) and the things they can do (methods). For example, a dog might have brown fur, an age and a weight, and it can run, bark, eat etc.
How might we write a Dog class?
class Dog:
    # This is a special method called the constructor
    # You initialize instance variables/attributes here
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight
    # A method to run
    def run(self, distance):
        self.weight -= 0.1 * distance # dogs burn calories too!
    # A method to bark
    def bark(self):
        print "Woof woof!"
    # A method to eat
    def eat(self, amount):
        self.weight += amount
The form of OOP that python employs is class-based, meaning that you define classes and work with objects that are instances of those classes.
You can think of a class as your own custom type. Python has its own built-in types such as int
, str
, list
etc. But sometimes you want to work with your own type, your own category of things, such as the Dog
class we saw above.
A class is like a general category, but most times you don't want to work with the category per se, you want to work with an instance of that category (a real, concrete Fido, if you will, the one that slobbers all over your face in the morning; not some abstract Platonic notion of 'a dog'). There can be many many instances of a single class.
How do we get this concrete instance? We have to instantiate/construct it, by calling the special method known as the constructor:
# Assuming I've already defined my Dog class
# as in the previous snippet of code
fido = Dog("yellow", 5, 30) # let me instantiate a Dog, and store it in the variable fido
To call the constructor, we see that we just do classname(arguments)
. Just like any other function call, we have to pass it arguments. But where did we define our constructor, in our class? This is where constructors are a little different from normal functions you've seen so far. Let's go back to the definition of our Dog
class:
class Dog:
    # This is our constructor definition, right here
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight
Even though you called your constructor by classname(arguments)
, when you defined your constructor, you used the name __init__
. That's just how it is. The __init__
method is what is called a magic method in python. These are methods which you can define in your classes which python treats somewhat differently from normal methods. They add some 'magical' behavior, so to speak. The constructor, or __init__
method is by far the most important.
A constructor allows you to initialize your instance (hence the name __init__
). This means setting values to your object attributes when you create it. The attributes of a Dog
instance are its color, age, and weight. Note how these instance attributes are prefixed with self.
, like self.color
, self.age
etc.
What is self
? Notice how all our method definitions have self
as the first parameter. This is because a reference to the instance you are calling the method on is being implicitly passed to the method as the first argument, whenever you call methods on object instances using dot notation:
# Remember our Dog class had a bark method
class Dog:
    # ...
    # stuff
    def bark(self):
        print "woof woof!"
dog = Dog("yellow", 5, 30) # the constructor still needs its arguments
dog.bark() # here I'm calling the method bark on dog,
# which is an instance of Dog
How is it that when I call dog.bark()
I don't pass any arguments to bark()
, but in the definition of bark
I actually have a parameter self
?
This is because when calling dog.bark()
, we are actually calling a bound method, ie. the bark
method is bound to the instance dog
, and so python implicitly passes a reference to the instance dog
into the self
parameter of bark
. Thus we don't explicitly pass in arguments to bark
when we do dog.bark()
.
Try this (in continuation with the previous code-block, ie. I assume that I've already defined the Dog
class, and instantiated dog
):
print dog.bark # prints <bound method Dog.bark of <__main__.Dog instance at 0xsomehexaddress>>
print Dog.bark # prints <unbound method Dog.bark>
b = dog.bark # remember we can pass methods around as well
print b.__self__ # prints <__main__.Dog instance at 0xsomehexaddress>
# we can get back the object to which the method was bound!
dog.bark() # prints "woof woof!"
Dog.bark() # throws this error:
# TypeError: unbound method bark() must be called
# with Dog instance as first argument
# (got nothing instead)
Dog.bark(dog) # prints "woof woof!"
So we see how python is passing a reference to the instance on which the method is called, under the hood, and that reference is stored in the self
parameter.
So that's the magic behind self
. And actually, it really isn't very magical at all. Consider this:
class Dog:
    def __init__(trolololol, color, age, weight):
        trolololol.color = color
        trolololol.age = age
        trolololol.weight = weight
    def bark(hohohoho):
        print "woof woof I'm a " + str(hohohoho.age) + " year old dog!"
dog = Dog("yellow",10, 40)
dog.bark() # prints "woof woof I'm a 10 year old dog!"
Works just like normal. Except, who the hell would ever write that kind of code??? Yep, only Digital World TAs. We have superpowers. #beadigitalworldTA
Most of the time, we only need attributes that belong to specific instances of a class. For example, the color, age and weight attributes are particular to a specific instance of dog. Two different dog instances should keep track of their own color, age and weight.
fido = Dog("black", 10, 40)
dingo = Dog("yellow", 2, 20)
But what if there are some attributes that should be global to all instances of Dog
? For example, suppose we would like to have an attribute that tracks the threshold weight over which a dog would be considered obese.
Logically speaking, this should not be an attribute that belongs to each instance of Dog
. This threshold weight should be the same for all instances of Dog
. It should be an attribute of the Dog
class itself, a class attribute.
class Dog:
    # here we initialize a class attribute, in the class body
    obeseweight = 30
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight
    def check_obese(self):
        # we could also write this as
        # if self.weight > Dog.obeseweight:
        # but it's less elegant and maintainable
        if self.weight > self.__class__.obeseweight:
            return True
        return False
dog = Dog("yellow", 10, 40)
print dog.check_obese() # prints True
Note how we initialized the class attribute, and accessed it within our method definitions. Try this instead:
class Dog:
    # here we initialize a class attribute, in the class body
    obeseweight = 30
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight
    def check_obese(self):
        if self.weight > obeseweight: # what happens here?
            return True
        return False
dog = Dog("yellow", 10, 40)
print dog.check_obese() # throws NameError: global name 'obeseweight' is not defined
The following error is thrown: NameError: global name 'obeseweight' is not defined
. Remember from our discussion of variable scope, class definitions introduce a new code block with a new scope, but this scope does not extend to nested blocks, that is, it does not extend to the methods that you define in your class definitions. That's why you get that error thrown: obeseweight
is not defined within your method scope.
Now try this:
class Dog:
    # here we initialize a class attribute, in the class body
    obeseweight = 30
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight
    def check_obese(self):
        if self.weight > self.obeseweight: # we can access it like an instance attribute too?
            return True
        return False
dog = Dog("yellow", 10, 40)
print dog.check_obese() # prints True
This works too. Note how you can actually access the class attribute through the instance as well. This is convenient but can be quite confusing. Python keeps track of an instance namespace and a class namespace. If you try to access an attribute through an instance, python checks the instance namespace first. If it doesn't find anything, it checks the class namespace. So the above code is able to access the obeseweight
class attribute through self.obeseweight
.
Now you know how to properly access class attributes. But what about updating class attributes? This is even trickier. Suppose we want to have a class attribute that keeps track of the number of instances:
class Dog:
    count = 0
    def __init__(self, color,age,weight):
        self.color = color
        self.age = age
        self.weight = weight
        self.__class__.count += 1
dog1 = Dog("yellow", 10, 30)
dog2 = Dog("black", 10, 40)
dog3 = Dog("black", 10, 40)
print Dog.count # prints 3
print dog1.count # prints 3
print dog2.count # prints 3
print dog3.count # prints 3
This works fine, but how about this:
class Dog:
    count = 0
    def __init__(self, color,age,weight):
        self.color = color
        self.age = age
        self.weight = weight
        self.count += 1
dog1 = Dog("yellow", 10, 30)
dog2 = Dog("black", 10, 40)
dog3 = Dog("black", 10, 40)
print Dog.count # prints 0; WHUT HAAPPPENED
print dog1.count # prints 1
print dog2.count # prints 1
print dog3.count # prints 1
If the class attribute was set by accessing the class (as we did with self.__class__.count += 1
), then the new value will be set for the class (and hence all instances). But if it was set by accessing an instance (as we did with self.count += 1
), then the new value will be set only for that instance; in effect you have created an instance variable of the same name that overrides the class variable.
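You can watch the two namespaces directly by peeking at an instance's `__dict__`, which holds only the attributes stored on that instance. This is a stripped-down sketch (the `Dog` here takes no constructor arguments, purely to keep it short) and it should run the same under Python 2 and Python 3:

```python
class Dog:
    count = 0  # class attribute

    def __init__(self):
        self.__class__.count += 1  # updates the class attribute

dog1 = Dog()
dog2 = Dog()

# reads Dog.count (2) through the instance, then stores 3 ON THE INSTANCE
dog1.count += 1

shadowed = "count" in dog1.__dict__      # True: dog1 now has its own count
not_shadowed = "count" in dog2.__dict__  # False: dog2 still falls back to the class

class_count = Dog.count   # still 2
dog1_count = dog1.count   # 3, found in the instance namespace first
dog2_count = dog2.count   # 2, found in the class namespace

# deleting the instance attribute uncovers the class attribute again
del dog1.count
dog1_after_del = dog1.count  # back to 2

print(shadowed)
print(dog1_after_del)
```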
One of the main features of OOP is inheritance. This allows code to be easily reused and makes your program more modular and maintainable.
The idea is that you have parent classes which provide a set of baseline functionalities, and you have children classes which inherit these from the parent class, but extend them with functionalities specific to each child.
Back to our Dog
class, we have this base parent class that defines all these basic attributes and methods:
class Dog:
    obeseweight = 30
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight
    def run(self, distance):
        self.weight -= 0.1 * distance # dogs burn calories too!
    def bark(self):
        print "Woof woof!"
    def eat(self, amount):
        self.weight += amount
    def check_obese(self):
        if self.weight > self.__class__.obeseweight:
            return True
        return False
But there are many kinds of dogs! And it doesn't make sense to have the same obeseweight
threshold for all breeds of dogs. So let's create some child classes, also known as subclasses:
class Chihuahua(Dog): # this is the way of saying class
                      # Chihuahua inherits from class Dog
    obeseweight = 5 # overwrite the parent class attribute
class BerneseMountainDog(Dog):
    obeseweight = 55
chi = Chihuahua("yellow", 1, 6)
bernese = BerneseMountainDog("black", 4, 40)
genericdog = Dog("white",10,20)
# child classes inherit methods from their parent class!
chi.bark() # prints "Woof woof!"
print chi.check_obese() # prints True
print bernese.check_obese() # prints False
print genericdog.check_obese() # prints False
Using inheritance is way better than copying and pasting your Dog
class code into your Chihuahua
and BerneseMountainDog
classes.
A slightly subtle point to note - what if the check_obese
method had been written like this instead:
def check_obese(self):
    # hard-code the reference to the Dog class attribute
    if self.weight > Dog.obeseweight:
        return True
    return False
then
print chi.check_obese() # prints False
print bernese.check_obese() # prints True
This is why we should avoid hardcoding the class name when referencing class attributes and access it through the built-in attribute __class__
of the object instance, that is, self
; ie. self.__class__.classattribute
.
We've seen how we can override a parent class attribute. What of overriding or extending a parent class method?
class Chihuahua(Dog):
    obeseweight = 5
    # we want to override the bark method
    def bark(self):
        print "Squeak squeak!"
chi = Chihuahua("yellow",10, 3)
chi.bark() # prints "Squeak squeak!" instead of "Woof woof!"
When you define a method in the child class with the same name as a method defined in the parent class, the child class method overrides the parent class method. But what if we don't want to completely override the parent class method, but simply extend it? That is, we still want the base functionality provided by the parent method, but we want to add more stuff? For example, suppose for Chihuahuas we want to have an extra attribute called hairlength
:
class Chihuahua(Dog):
    obeseweight = 5
    # we want to extend the constructor
    def __init__(self, color, age, weight, hairlength):
        # call the parent class constructor
        Dog.__init__(self, color,age,weight)
        # note that hard-coding the parent class name works but
        # isn't best practice
        # add the extra stuff specific to this child class
        self.hairlength = hairlength
    # we want to override the bark method
    def bark(self):
        print "Squeak squeak!"
chi = Chihuahua("yellow", 3, 5, "short")
print chi.hairlength # prints "short"
This stuff is slightly tricky but for extra knowledge
Notice how we hard-coded in the parent class name Dog
in order to access the parent class __init__
method. That really isn't ideal, but is the only way we can access the parent class method when we are dealing with so-called 'old-style' classes. New-style classes on the other hand are more flexible and allow us to use the built-in function super()
to, in a way, get a handle on the parent class without explicitly naming it.
New-style classes must inherit from the built-in object
class:
class Dog(object): # Dog is a new-style class
    obeseweight = 30
    def __init__(self, color, age, weight):
        self.color = color
        self.age = age
        self.weight = weight
    def run(self, distance):
        self.weight -= 0.1 * distance
    def bark(self):
        print "Woof woof!"
    def eat(self, amount):
        self.weight += amount
    def check_obese(self):
        if self.weight > self.__class__.obeseweight:
            return True
        return False
class Chihuahua(Dog):
    obeseweight = 5
    # we want to extend the constructor
    def __init__(self, color, age, weight, hairlength):
        # call the parent class constructor
        # using super instead of hard-coding the name
        # (self is passed automatically here, so don't pass it again)
        super(Chihuahua, self).__init__(color, age, weight)
        # gee, doesn't this look familiar somehow?
        # add the extra stuff specific to this child class
        self.hairlength = hairlength
    # we want to override the bark method
    def bark(self):
        print "Squeak squeak!"
chi = Chihuahua("yellow", 3, 5, "short")
print chi.hairlength # prints "short"
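The same trick works for any method, not just the constructor. Here is a minimal sketch, assuming a trimmed-down Dog with only the bits needed; the meals attribute and the extended eat method are made up for illustration. It is written with print(...) so it runs under both Python 2 and Python 3.

```python
class Dog(object):  # new-style, so super() works
    def __init__(self, weight):
        self.weight = weight

    def eat(self, amount):
        self.weight += amount


class Chihuahua(Dog):
    def __init__(self, weight):
        super(Chihuahua, self).__init__(weight)
        self.meals = 0  # hypothetical extra attribute, not in the original Dog

    # extend eat: keep the parent behaviour, then add to it
    def eat(self, amount):
        super(Chihuahua, self).eat(amount)  # no self in the argument list
        self.meals += 1


chi = Chihuahua(5)
chi.eat(2)
chi.eat(1)
print(chi.weight)  # prints 8
print(chi.meals)   # prints 2
```

The parent's eat still does the weight bookkeeping; the child only adds the meal counter on top.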
What's the difference between old-style and new-style classes?
class OldStyle:
    pass
class NewStyle(object): # inherits from object, so it is new-style
    pass
print type(OldStyle) # prints <type 'classobj'>
print type(NewStyle) # prints <type 'type'>
print type(int) # prints <type 'type'>
New-style classes are fully legitimate custom types, on the same footing as built-in Python types like int. New-style classes are more flexible, and in fact in Python 3.x all classes are new-style.
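One practical consequence, as a rough sketch: for a new-style class, type() of an instance gives you back the class itself, the same way type(3) gives you int. (In Python 2, an instance of an old-style class reports the generic <type 'instance'> instead.) The names below are made up for illustration.

```python
class NewStyle(object):
    pass


thing = NewStyle()

print(type(thing) is NewStyle)  # prints True
print(type(3) is int)           # prints True
```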
You've been using the Digital World library (libdw) to implement state machines in Python so far. If you understood that whole deal about inheritance, you should realize by now that you've been using it all along to create your state machine classes:
import libdw.sm as sm
class MySM(sm.SM): # you are inheriting from the base SM class
                   # defined in the libdw.sm module
    # you are overriding the default class attribute
    # defined in the base SM class, which initializes
    # startState as None by default
    startState = "somestartstate"
    # you are overriding the getNextValues method in the base class
    def getNextValues(self, state, inp):
        # do stuff
        # make sure you return a tuple of
        # (nextstate, output)
Don't jump straight into writing code, unless you already know you won't confuse yourself. Think at a higher level first. Draw out a state machine diagram. Define your states and inputs clearly. Then the coding becomes trivial.
What should your state be? One way to figure this out is to ask: what kind of information do I need to remember? The inputs and outputs should be something specified in the question.
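As a small sketch of "state is whatever you need to remember": suppose the question asks for a machine that outputs the running total of its inputs. The only thing you need to remember is the total so far, so that is your state. TinySM below is a made-up, stripped-down stand-in so the example runs without libdw installed. It is not how libdw.sm.SM is actually written, only an approximation of the transduce idea.

```python
class TinySM(object):
    # hypothetical stand-in for libdw.sm.SM, NOT the real implementation
    startState = None

    def transduce(self, inputs):
        state = self.startState
        outputs = []
        for inp in inputs:
            state, output = self.getNextValues(state, inp)
            outputs.append(output)
        return outputs


class RunningTotal(TinySM):
    startState = 0  # nothing added up yet

    def getNextValues(self, state, inp):
        nextState = state + inp  # remember the new total
        output = nextState       # and report it
        return nextState, output


print(RunningTotal().transduce([1, 2, 3, 4]))  # prints [1, 3, 6, 10]
```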
Often in getNextValues you end up building an if/elif/else tree for all the condition-checking (what's my state, what's my input). Then you initialize the variables nextState and output in some of these conditional branches but not in others. Then you get this kind of error and become sad:
import libdw.sm as sm
class MySM(sm.SM):
    def getNextValues(self, state, inp):
        if state == 0:
            nextState = 1
            output = 1
        elif state == 1:
            nextState = 0
            output = 0
        return nextState, output
mySM = MySM()
mySM.transduce([0,0,1])
# you get this error message
#Traceback (most recent call last):
# File "yourfile.py", line 13, in <module>
# mySM.transduce([0,0,1])
# File "libdw/sm.py", line 147, in transduce
# File "libdw/sm.py", line 101, in step
# File "yourfile.py", line 10, in getNextValues
# return nextState, output
#UnboundLocalError: local variable 'nextState' referenced before assignment
The problem is that there are some conditional paths for which nextState and output are not initialized, but your return statement, which will be executed no matter what, references nextState and output. Think about what happens if your state is not 0 or 1, but 2. Then nextState and output will never have been initialized before the return.
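You can see this without libdw at all. Here is a minimal sketch using a plain function (the name get_next_values is made up), called with a state that matches neither branch:

```python
def get_next_values(state, inp):
    if state == 0:
        nextState = 1
        output = 1
    elif state == 1:
        nextState = 0
        output = 0
    # for any other state, neither branch ran,
    # so nextState and output were never assigned
    return nextState, output


print(get_next_values(0, 0))  # prints (1, 1)

try:
    get_next_values(None, 0)  # None is what startState defaults to
except UnboundLocalError as e:
    print("UnboundLocalError: %s" % e)
```

The exact wording of the message depends on your Python version, but the exception type is the same.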
But, you protest, what if my state machine is never going to be in any other state than 0 or 1? Well, Python doesn't know or care about that; the moment your code actually reaches the return without having assigned nextState, it raises the error. And here it does get there: startState is None by default, so the very first state is neither 0 nor 1.
So if you're really sure that states 0 and 1 are all the states you will ever need, just change the elif state == 1 to else. That will fix things for this particular problem.
But more often than not you will be building much more complicated conditional trees than this, and rather than having to ensure that your nextState and output are defined under every possible condition, just initialize them at the very top of the function:
def getNextValues(self, state, inp):
    nextState, output = None, None
    if state == 0:
        nextState, output = 1, 1
    elif state == 1:
        nextState, output = 0, 0
    return nextState, output
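One thing to keep in mind with the None defaults: an unexpected state no longer crashes, it quietly produces (None, None). If you would rather find out loudly, one option (a sketch, not something libdw requires) is an explicit else that raises an error naming the bad state. The function name here is made up and the example is a plain function so it runs on its own.

```python
def get_next_values(state, inp):
    if state == 0:
        nextState, output = 1, 1
    elif state == 1:
        nextState, output = 0, 0
    else:
        # fail with a message that names the offending state
        raise ValueError("unexpected state: %r" % (state,))
    return nextState, output


print(get_next_values(1, 0))  # prints (0, 0)

try:
    get_next_values(2, 0)
except ValueError as e:
    print(e)  # prints unexpected state: 2
```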
Some of you don't like to define nextState and output variables, and prefer to directly return under your if/elif/else blocks, which is also perfectly valid, except sometimes this happens:
import libdw.sm as sm
class MySM(sm.SM):
    def getNextValues(self, state, inp):
        if state == 0:
            return 1,1
        elif state == 1:
            return 0,0
mySM = MySM()
mySM.transduce([0,0,1])
# you get this error message
#Traceback (most recent call last):
# File "test.py", line 62, in <module>
# mySM.transduce([1])
# File "libdw/sm.py", line 147, in transduce
# File "libdw/sm.py", line 101, in step
#TypeError: 'NoneType' object is not iterable
This happens for similar reasons to the UnboundLocalError: there are some conditional paths for which you are not explicitly returning a value, so Python returns None by default. But transduce is expecting your getNextValues to return a tuple (which is an iterable) of the next state and the output, so you get that error.
For this particular case, you can fix it by defining startState = 0, so you will always be in either state 0 or state 1. Otherwise startState is None by default.
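This one can also be reproduced in isolation. Here is a minimal sketch with a plain function (made-up name), assuming the caller unpacks the result into two names, which is roughly what a caller expecting a (nextState, output) tuple would do:

```python
def get_next_values(state, inp):
    if state == 0:
        return 1, 1
    elif state == 1:
        return 0, 0
    # falling off the end here means Python returns None


result = get_next_values(None, 0)  # None is the default startState
print(result)  # prints None

try:
    nextState, output = result  # unpacking None fails
except TypeError as e:
    print("TypeError: %s" % e)
```

With a starting state of 0 instead of None, the first branch runs and the unpacking works fine.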
Suuuuure - here ya go.
You'll be mainly interested in the libdw.sm module.
(That's Prof Oka's github repo by the way)