This page is a recopilation of the following sources:
- Sourcery - Refactoring Part 1 by Nick Thapen
- Sourcery - Refactoring Part 2 by Nick Thapen
- Sourcery - Refactoring Part 3 by Nick Thapen
- Sourcery - Refactoring Part 4 by Nick Thapen
- Sourcery - Refactoring Part 5 by Nick Thapen
- Sourcery - Refactoring Part 6 by Nick Thapen
- Sourcery - Refactoring Part 7 by Nick Thapen
- Sourcery - Refactoring Part 8 by Nick Thapen
Open Here!
-
- 2.1. Conditionals
- 2.1.1. Merge nested if conditions
- 2.1.2. Hoist repeated code outside conditional statement
- 2.1.3. Hoist loop from if
- 2.1.4. Merge Repeated Ifs
- 2.1.5. Replace if statement with if expression
- 2.1.6. Swap if/else to remove empty if body
- 2.1.7. Simplify conditional into return statement
- 2.1.8. Simplify sequence comparison
- 2.1.9. Merge duplicate blocks in conditional
- 2.1.10. Replace multiple comparisons of same variable with in operator
- 2.1.11. Simplify if expression by using or
- 2.1.12. Lift return into if
- 2.1.13. Replace comparison with min/max call
- 2.1.14. Lift repeated conditional into its own if statement
- 2.1.15. Merge isinstance calls
- 2.1.16. Add guard clause
- 2.1.17. Remove unnecessary else after guard condition
- 2.1.18. Simplify testing of sequence comparisons
- 2.2. Loops
- 2.2.1. Hoist statements out of for/while loops
- 2.2.2. Convert for loop into list/dictionary/set comprehension
- 2.2.3. Replace yield inside for loop with yield from
- 2.2.4. Use any() instead of for loop
- 2.2.5. Replace unneeded comprehension with generator
- 2.2.6. Replace index in for loop with direct reference
- 2.2.7. Replace manual loop counter with call to enumerate
- 2.2.8. Use str.join() instead of for loop
- 2.2.9. Convert for loop into call to sum()
- 2.2.10. Replace unused for index with underscore
- 2.3. Lists
- 2.3.1. Replace list() with []
- 2.3.2. Merge append into list declaration
- 2.3.3. Simplify negative list access
- 2.4. Dictionaries
- 2.5. Boolean
- 2.6. Good Practices
- 2.6.1. Replace assignment with augmented assignment
- 2.6.2. Inline variable that is only used once
- 2.6.3. Move assignments closer to their usage
- 2.6.4. Use with when opening file to ensure closure
- 2.6.5. Extract duplicate code into a method
- 2.6.6. Simplify assignment of boolean variables
- 2.6.7. Use list, set or dictionary comprehensions directly instead of calling list(), dict() or set()
- 2.6.8. Filter using comprehensions rather than copying and deleting
- 2.6.9. Remove str() from Calls to print()
- 2.6.10. Flatten Nested Try
- 2.1. Conditionals
Too much nesting can make code difficult to understand, and this is especially true in Python, where there are no brackets to help out with the delineation of different nesting levels.
Reading deeply nested code is confusing, since you have to keep track of which conditions relate to which levels. We therefore strive to reduce nesting where possible, and the situation where two if
conditions can be combined using and
is an easy win.
From:
if a:
if b:
return c
To:
if a and b:
return c
We should always be on the lookout for ways to remove duplicated code. An opportunity for code hoisting is a nice way of doing so.
Sometimes code is repeated on both branches of a conditional. This means that the code will always execute. The duplicate lines can be hoisted out of the conditional and replaced with a single line.
if sold > DISCOUNT_AMOUNT:
total = sold * DISCOUNT_PRICE
label = f"Total: {total}"
else:
total = sold * PRICE
label = f"Total: {total}"
By taking the assignment to label outside of the conditional we have removed a duplicate line of code, and made it clearer what the conditional is actually controlling, which is the total.
if sold > DISCOUNT_AMOUNT:
total = sold * DISCOUNT_PRICE
else:
total = sold * PRICE
label = f"Total: {total}"
Where the same for
or while
loop is defined in all branches of a conditional, the code can be considerably shortened and clarified by hoisting. Instead of looping inside each branch, we can have one loop and move all branches inside the loop.
def sing_song(choice):
style = STYLES[choice]
if style == "classical":
while i_like_singing():
sing_opera()
elif style == "folk":
while i_like_singing():
sing_folk_song()
elif style == "rock":
while i_like_singing():
sing_rock_ballad()
For example the above code can be shortened to the below:
def sing_song(choice):
style = STYLES[choice]
while i_like_singing():
if style == "classical":
sing_opera()
elif style == "folk":
sing_folk_song()
elif style == "rock":
sing_rock_ballad()
By doing this we've removed two lines of duplicated code, and made the intent of the code clearer.
Don’t Repeat Yourself (DRY) is a core tenet of programming best practices. Often we think about it for large sections of code, but we also want want to avoid repeating conditionals when possible. For instance - if we’re dealing with a couple of variables that are modified when the same condition holds we might have code structured like:
if wardrobe.hats:
self.happiness += 1
else:
self.happiness -= 1
if wardrobe.hats:
self.stylishness += 1
else:
self.stylishness -= 1
Here we’re repeating both the if wardrobe.hats:
as well as the else:
and can easily group together the variable changes so we only have a single conditional check.
if wardrobe.hats:
self.happiness += 1
self.stylishness += 1
else:
self.happiness -= 1
self.stylishness -= 1
We’ve now cut down on the duplication and we’ve significantly shortened and simplified the code we’re dealing with. An important note is that we can only simplify our code like this if the statements inside the first if
statement do not impact the condition itself.
It often happens that you want to set a variable to one of two different values, depending on some program state.
if condition:
x = 1
else:
x = 2
This can be written on one line using Python's conditional expression syntax (its version of the ternary operator):
x = 1 if condition else 2
This is definitely more concise, but it is one of the more controversial refactorings (along with list comprehensions). Some coders dislike these expressions and find them slightly harder to parse than writing them out fully.
Our view is that as long as the conditional expression is short and fits on one line it is an improvement. There's only one statement where x
is defined as opposed to having to read two statements plus the if-else lines. Similarly to the comprehension example, when we're scanning the code we usually won't need to know the details of how x
gets assigned, and can just see that it's being assigned and move on.
One pattern we sometimes see is a conditional where nothing happens in the main body, and all of the action is in the else
clause.
if location == OUTSIDE:
pass
else:
take_off_hat()
In this case we can make the code shorter and more concise by swapping the main body and the else
around. We have to make sure to invert the conditional, then the logic from the else
clause moves into the main body.
if location != OUTSIDE:
take_off_hat()
else:
pass
We then have an else clause which does nothing, so we can remove it.
if location != OUTSIDE:
take_off_hat()
This is easier to read, and the intent of the conditional is clearer. When reading the code I don't have to mentally invert it to understand it, since that has been done for me.
The last refactoring to look at is where we reach the end of a method and want to return True
or False
. A common way of doing so is like this:
def function():
if isinstance(a, b) or issubclass(b, a):
return True
return False
However, it's neater just to return the result directly like so:
def function():
return isinstance(a, b) or issubclass(b, a)
This can only be done if the expression evaluates to a boolean. In this example:
def any_hats():
hats = [item for item in wardrobe if is_hat(item)]
if hats or self.wearing_hat():
return True
return False
We can't do this exact refactoring, since now we could be returning the list of hats rather than True
or False
. To make sure we're returning a boolean, we can wrap the return in a call to bool()
:
def any_hats():
hats = [item for item in wardrobe if is_hat(item)]
return bool(hats or self.wearing_hat())
Something we often do is check whether a list or sequence has elements before we try and do something with it.
if len(list_of_hats) > 0:
hat_to_wear = choose_hat(list_of_hats)
A Pythonic way of doing this is just to use the fact that Python lists and sequences evaluate to True
if they have elements, and False
otherwise.
This means we can write the above code more simply as:
if list_of_hats:
hat_to_wear = choose_hat(list_of_hats)
We should always be searching out opportunities to remove duplicated code. A good place to do so is where there are multiple identical blocks inside an if..elif
chain.
def process_payment(payment):
if payment.currency == "USD":
process_standard_payment(payment)
elif payment.currency == "EUR":
process_standard_payment(payment)
else:
process_international_payment(payment)
Here we can combine the first two blocks using or
to get this:
def process_payment(payment):
if payment.currency == "USD" or payment.currency == "EUR":
process_standard_payment(payment)
else:
process_international_payment(payment)
Now if we need to change the process_standard_payment(payment)
line we can do it in one place instead of two. This becomes even more important if these blocks involve multiple lines.
Let's see if we can refine the previous example a little bit further.
We often have to compare a value to one of several possible others. When written out like this we have to look through each comparison to understand it, as well as mentally processing the boolean operator.
def process_payment(payment):
if payment.currency == "USD" or payment.currency == "EUR":
process_standard_payment(payment)
else:
process_international_payment(payment)
By using the in
operator and moving the values we are comparing to into a collection we can simplify things.
def process_payment(payment):
if payment.currency in ["USD", "EUR"]:
process_standard_payment(payment)
else:
process_international_payment(payment)
This has avoided a little bit of duplication, and the conditional can now be taken in and understood with one glance.
Here we find ourselves setting a value if it evaluates to True
, and otherwise using a default.
This can be written long-form as:
if input_currency:
currency = input_currency
else:
currency = DEFAULT_CURRENCY
or using an if expression:
currency = input_currency if input_currency else DEFAULT_CURRENCY
Both can be simplified to the following, which is a bit easier to read and avoids the duplication of input_currency
.
currency = input_currency or DEFAULT_CURRENCY
It works because the left-hand side is evaluated first. If it evaluates to True
then currency will be set to this and the right-hand side will not be evaluated. If it evaluates to False
the right-hand side will be evaluated and currency
will be set to DEFAULT_CURRENCY
.
This is a quick way to streamline code slightly. Where a value is set on each branch of an if and then immediately returned, instead return it directly from each branch.
This means that code like this:
def f():
if condition:
val = 42
else:
val = 0
return val
Is converted into:
def f():
if condition:
return 42
else:
return 0
This has removed an unnecessary intermediate variable which we had to mentally track.
We often need to work out the smallest or largest of two values, and might think of using a pattern like this one to do so:
if first_hat.price < second_hat.price:
cheapest_hat_price = first_hat.price
else:
cheapest_hat_price = second_hat.price
A quicker way to do this in Python is to use the built-in min
and max
functions. This code is a shorter and clearer way to achieve the same result:
cheapest_hat_price = min(first_hat.price, second_hat.price)
The same functions offer a shortcut for when we want to put a cap or a floor on the value of a variable:
Before:
if sale_price >= 10:
sale_price = 10
After:
sale_price = min(sale_price, 10)
Quite often while reading code you will see a pattern like this:
if isinstance(hat, Sombrero) and hat.colour == "green":
self.wear(hat)
elif isinstance(hat, Sombrero) and hat.colour == "red":
destroy(hat)
This duplication of one of the conditions makes things more difficult to read, and carries with it all the usual problems of code duplication. While normally we try to avoid adding nesting to the code, in this case it makes sense to lift the duplicated conditional into its own if
statement:
if isinstance(hat, Sombrero):
if hat.colour == "green":
self.wear(hat)
elif hat.colour == "red":
destroy(hat)
It is now clearer at a glance that the whole if..elif
chain relates only to sombreros and not other types of hat.
A small tip - when you want to check whether something is one of multiple different types, instead of this:
if isinstance(hat, Bowler) or isinstance(hat, Fedora):
self.wear(hat)
you can merge the calls into:
if isinstance(hat, (Bowler, Fedora)):
self.wear(hat)
This is shorter while staying nice and easy to read.
Deeply nested functions can be very difficult to understand. As you read them you have to remember the conditions that hold for each level of nesting. This can be even more difficult in Python, given that there are no brackets to help define conditional blocks. An easy way to reduce nesting is to convert conditionals into guard clauses.
As an example let's look at this function:
def should_i_wear_this_hat(self, hat):
if isinstance(hat, Hat):
current_fashion = FASHION_API.get_fashion(FASHION_TYPE.HAT)
weather_outside = self.look_out_of_window()
is_stylish = self.evaluate_style(hat, current_fashion)
if weather_outside.is_raining:
print("Damn.")
return True
else:
print("Great.")
return is_stylish
else:
return False
This is quite hard to parse, given the two layers of nesting. When I get to the else
at the bottom I need to flick back and forth a bit between that and the if
test at the top before I've understood what's going on. This if
condition is a check for an edge case, where something that isn't a hat is passed in. Since we just return False
here it's a perfect place to introduce a guard clause:
def should_i_wear_this_hat(self, hat):
if not isinstance(hat, Hat):
return False
current_fashion = get_fashion()
weather_outside = self.look_out_of_window()
is_stylish = self.evaluate_style(hat, current_fashion)
if weather_outside.is_raining:
print("Damn.")
return True
else:
print("Great.")
return is_stylish
We add a guard clause by inverting the condition and returning immediately. The edge case is now taken care of at the start of the function and I don't have to worry about it any more.
Having reduced the level of nesting, the rest of the function is now easier to read. Whether to add a guard clause is sometimes subjective. For really short functions it may not be worth doing. Where functions are longer or more complex it can often be a useful tool.
A common code pattern is to have some guard clauses at the start of a function, to check whether certain conditions have been fulfilled and return early or raise an exception if not.
def f(a=None):
if a is None:
return 42
else:
# some long calculations
var = (i**2 for i in range(a))
return sum(var)
While this is perfectly valid code, it can run into problems with excessive nesting, particularly if the rest of the function is fairly long.
Here we can take advantage of the fact that we don't need the else
if the main body of the if
breaks the control flow by ending with return
or raise
. Rewriting the function as shown here is logically equivalent.
def f(a=None):
if a is None:
return 42
# some long calculations
var = (i**2 for i in range(a))
return sum(var)
Using a guard condition, or multiple guard conditions, in this way now doesn't cause the rest of the function to be indented. In general the less we have to deal with indents the easier the code is to understand.
A very common requirement when coding is to check whether or not a sequence (a string, list or tuple) is empty before doing some processing with it. One straightforward way of doing this is to use the len
built-in function like this:
if len(my_wardrobe.hats) == 0:
self.shout("Alarm!")
There is a shorter and more Pythonic way of doing this however, which is one of the style recommendations given in PEP 8. This is to use the fact that in Python, empty sequences evaluate to False
. This means that you can rewrite the above like this:
if not my_wardrobe.hats:
self.shout("Alarm!")
The equivalent case for sequences with at least one element is:
if len(my_wardrobe.hats) > 0:
self.shout("I have hats!")
which can be turned into this since these evaluate to True
:
if my_wardrobe.hats:
self.shout("I have hats!")
This simplification takes a small amount of getting used to, but once you're acclimated to it the code reads very naturally.
Another type of hoisting is pulling invariant statements out of loops. If a statement just sets up some variables for use in the loop, it doesn't need to be inside it. Loops are inherently complex, so making them shorter and easier to understand should be on your mind while writing them.
In this example the city
variable gets assigned inside the loop, but it is only read and not altered.
for building in buildings:
city = "London"
addresses.append(building.street_address, city)
It's therefore safe to hoist it out, and this makes it clearer that the same city
value will apply to every building
.
city = "London"
for building in buildings:
addresses.append(building.street_address, city)
This also improves performance - any statement in a loop is going to be executed every time the loop runs. The time spent on these multiple executions is being wasted, since it only needs to be executed once. This saving can be significant if the statements involve calls to databases or other time-consuming tasks.
A pattern that we encounter time and again when coding is that we need to create a collection of values.
A standard way of doing it in most languages would be as follows:
cubes = []
for i in range(20):
cubes.append(i**3)
We create the list, and then iteratively fill it with values - here we create a list of cubed numbers, that end up in the cubes
variable.
In Python we have access to list comprehensions. These can do the same thing on one line, cutting out the clutter of declaring an empty list and then appending to it:
cubes = [i**3 for i in range(20)]
We've turned three lines of code into one which is a definite win - it means less scrolling back and forth when reading methods and helps keep things manageable.
Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.
Another point is that the assignment is now more of an atomic operation - we're declaring what cubes
is rather than giving instructions on how to build it. This makes the code read like more of a narrative, since going forward we will care more about what cubes
is than the details of its construction.
Finally comprehensions will usually execute more quickly than building the collection in a loop, which is another factor if performance is a consideration.
If you need a deeper dive into comprehensions check out my post here. It goes into exactly how they work, how to use them for filtering, and dictionary and set comprehensions.
One little trick that often gets missed is that Python's yield keyword has a corresponding yield from for collections, so there's no need to iterate over a collection with a for loop. This makes the code slightly shorter and removes the mental overhead and extra variable used by the for loop. Eliminating the for loop also makes the yield from version about 15% faster.
Before:
def get_content(entry):
for block in entry.get_blocks():
yield block
After:
def get_content(entry):
yield from entry.get_blocks()
A common pattern is that we need to find if some condition holds for one or all of the items in a collection. This can be done with a for loop such as this:
found = False
for thing in things:
if thing == other_thing:
found = True
break
A more concise way, that clearly shows the intentions of the code, is to use Python's any()
and all()
built in functions.
found = any(thing == other_thing for thing in things)
any()
will return True
when at least one of the elements evaluates to True
, all()
will return True
only when all the elements evaluate to True
.
These will also short-circuit execution where possible. If the call to any()
finds an element that evalutes to True
it can return immediately. This can lead to performance improvements if the code wasn't already short-circuiting.
One tip is that functions like any, all and sum allow you to pass in a generator rather than a collection. This means that instead of doing this:
hat_found = any([is_hat(item) for item in wardrobe])
you can write:
hat_found = any(is_hat(item) for item in wardrobe)
This removes a pair of brackets, making the intent slightly clearer. It will also return immediately if a hat is found, rather than having to build the whole list. This lazy evaluation can lead to performance improvements.
Note that we are actually passing a generator into any()
so strictly speaking the code would look like this:
hat_found = any((is_hat(item) for item in wardrobe))
but Python allows you to omit this pair of brackets.
The standard library functions that accept generators are:
"all", "any", "enumerate", "frozenset", "list", "max", "min", "set", "s
A pattern that is often used in Python for loops is to use range(len(list))
to generate a range of numbers that can be iterated over.
for i in range(len(currencies)):
print(currencies[i])
If the index i
is only used to do list access this code can be improved by iterating over the list directly, as in the example below.
for currency in currencies:
print(currency)
This code is easier to understand, and a lot less cluttered. In particular being able to use a meaningful name for currency
greatly improves readability.
When iterating over a list you sometimes need access to a loop counter that will let you know the index of the element you are utilising. It can be tempting to just write one yourself like this:
i = 0
for currency in currencies:
print(i, currency)
i += 1
However there's a built-in Python function, enumerate
, that lets you generate an index directly, removing two unneeded lines of code.
for i, currency in enumerate(currencies):
print(i, currency)
When reading this we don't have to worry about the book-keeping of the i
variable, letting us focus in on the code that really matters.
The most straightforward way to concatenate strings in Python is to just use the +
operator:
hat_description = hat.colour + hat.type
This is perfectly fine when you are joining together small numbers of strings (though f-strings are the best choice for doing more complicated string handling).
The problem with using +
or +=
comes when they are used to concatenate large lists of strings. For example you might use them in a for loop like this:
all_hats = ""
for hat in my_wardrobe.hats:
all_hats += hat.description
This is cumbersome to read, and also isn't very performant. A new string has to be created for every iteration in the for loop, which slows things down.
Luckily Python strings come with the join
method to solve this problem.
all_hats = "".join(hat.description for hat in my_wardrobe.hats)
This accomplishes the same task in one line and is quite a lot faster to boot. You can also add a separator in between each string easily, without having to worry about extra separators being added at the beginning or end of the result.
all_hats = ", ".join(hat.description for hat in my_wardrobe.hats)
Much of programming is about adding up lists of things, and Python has the built-in sum()
function to help with this.
You can rewrite for
loops which sum lists in this way:
Before:
total = 0
for hat in hats:
total += hat.price
After:
total = sum(hat.price for hat in hats)
This is much shorter, which is a definite bonus. The code also now explicitly tells you what it is trying to do - sum the price of all the hats.
Sometimes in a for loop we just want some code to run a certain number of times, and don't actually make use of the index variable.
For example take this code:
for hat in my_wardrobe.hats:
self.shout("Hurrah!")
We have introduced a new variable, hat
, which we have to note when reading the code, but actually we don't need it and could replace it with _:
for _ in my_wardrobe.hats:
self.shout("Hurrah!")
It is a convention in Python to use _
as a throwaway name for unused variables. This means your brain can learn to safely ignore these, reducing the overhead to understand the code. Where you see this in a for
loop it it immediately clear that the loop is just used to repeat a block of code and we don't care about the value being iterated over.
From:
x = list()
To:
x = []
The most concise and Pythonic way to create a list is to use the []
notation.
This fits in with the way we create lists with elements, saving a bit of mental energy that might be taken up with thinking about two different ways of creating lists.
x = ["first", "second"]
Doing things this way has the added advantage of being a nice little performance improvement.
Here are the timings before and after the change:
$ python3 -m timeit "x = list()"
5000000 loops, best of 5: 63.3 nsec per loop
$ python3 -m timeit "x = []"
20000000 loops, best of 5: 15.8 nsec per loop
Similar reasoning and performance results hold for replacing dict()
with {}
.
When declaring a list and filling it up with values one way that can come naturally is to declare it as empty and then append to it.
hats_i_own = []
hats_i_own.append("panama")
hats_i_own.append("baseball_cap")
hats_i_own.append("bowler")
This can be done in place, shortening the code and making the intent more explicit. Now I just need to glance at one line to see that I'm filling a variable with hats, rather than four.
hats_i_own = ["panama", "baseball_cap", "bowler"]
Doing it this way is also slightly more performant since it avoids the function calls to append
. The same holds true for filling up other collection types like sets and dictionaries.
In Python you can access the end of a list by using negative indices. So my_list[-1]
gets the final element, my_list[-2]
gets the penultimate one and so on.
This means that you can turn this:
a = [1, 2, 3]
last_element = a[len(a) - 1]
into the simpler:
a = [1, 2, 3]
last_element = a[-1]
Sometimes when iterating over a dictionary you only need to use the dictionary keys.
for currency in currencies.keys():
process(currency)
In this case the call to keys()
is unnecessary, since the default behaviour when iterating over a dictionary is to iterate over the keys.
for currency in currencies:
process(currency)
The code is now slightly cleaner and easier to read, and avoiding a function call will yield a (small) performance improvment.
When iterating over a dictionary a good tip is that you can use items()
to let you access the keys and values at the same time. This lets you transform this:
hats_by_colour = {"blue": ["panama", "baseball_cap"]}
for hat_colour in hats_by_colour:
hats = hats_by_colour[hat_colour]
if hat_colour in self.favourite_colours:
think_about_wearing(hats)
into this:
hats_by_colour = {"blue": ["panama", "baseball_cap"]}
for hat_colour, hats in hats_by_colour.items():
if hat_colour in self.favourite_colours:
think_about_wearing(hats)
This saves us the line that we used to assign to hats
, incorporating it into the for loop. The code now reads more naturally, with a touch less duplication.
We often want to pick something from a dictionary if the key is present, or use a default value if it isn't. One way of doing this is to use a conditional statement like this one:
def pick_hat(available_hats: Dict[Label, Hat]):
if self.favourite_hat in available_hats:
hat_to_wear = available_hats[self.favourite_hat]
else:
hat_to_wear = NO_HAT
return hat_to_wear
A useful shortcut is that Python dictionaries have a get()
method which lets you set a default value using the second parameter. We can therefore shorten the above code to this:
def pick_hat(available_hats: Dict[Label, Hat]):
hat_to_wear = available_hats.get(self.favourite_hat, NO_HAT)
return hat_to_wear
This has slimmed the code down and removed some duplication. A point to note is that if you don't pass in a default value it will use None
.
A pattern that we often see when initialising a dictionary is to declare an empty dictionary and then add items into it line by line.
hats_i_own = {}
hats_i_own["panama"] = 1
hats_i_own["baseball_cap"] = 2
hats_i_own["bowler"] = 23
This winds up taking up many more lines than necessary. Instead you can merge the whole assignment process together in a single line - declaring the dictionary alongside the items in it
hats_i_own = {"panama": 1, "baseball_cap": 2, "bowler": 23}
Now while reading the code, if we are not interested in the individual elements the initialisation no longer uses up so much screen real estate, but they are still there if we need to read them.
Sticking with dictionaries for the moment - if we are using both keys and values from a dictionary we need to call dict.items()
. But, if we’re only calling for the keys then this call is unnecessary. For instance:
for name, age in people.items():
print("Hi, my name is", name)
In this example we’re grabbing the name of each person in people
, but we’re only using the dictionary key and never the value (their age) so we don’t actually need the .items()
. Instead we can simplify the code down to:
for name in people:
print("Hi, my name is", name)
This is a bit nicer to read because we no longer have the unnecessary dict.items()
call and we also are able to remove the unused variable age
.
If we were calling the key and the value of each item in the dictionary we would need to maintain the dict.items()
.
Python's PEP 8 style guide is a nice source of little improvements you can make to your code.
One tip is that it's easier to read, and reads more like English prose, if you write not in
and is not
rather than negating the whole in
or is
expression.
Before:
if not hat in hats:
scream("Where's my hat?")
if not hat is None:
scream("I have a hat!")
After:
if hat not in hats:
scream("Where's my hat?")
if hat is not None:
scream("I have a hat!")
When we’re dealing with a list we’ll often want to check whether or not a certain value exists in that list. One common way to handle that is using any
.
if any(hat == "bowler" for hat in hats):
shout("I have a bowler hat!")
This works, but Python’s built in in operator can make it even easier to handle this check:
if "bowler" in hats:
shout("I have a bowler hat!")
Now we’re able to do the exact same check, but in a much clearer way. At a quick glance we can see that we’re checking whether an item is in a list.
Augmented assignments are a quick and easy bit of Python syntax to include.
Wherever there's code like this:
count = count + other_value
we can replace it with:
count += other_value
This is a bit shorter and clearer - we don't need to think about the count
variable twice. Other operators that can be used include -=
, *=
, /=
and **=
.
One thing to be slightly careful of is that the type you're assigning to has to have the appropriate operator defined. For instance numpy
arrays do not support the /=
operation.
Something that we often see in people's code is assigning to a result variable and then immediately returning it.
def state_attributes(self):
"""Return the state attributes."""
state_attr = {
ATTR_CODE_FORMAT: self.code_format,
ATTR_CHANGED_BY: self.changed_by,
}
return state_attr
Here it's better to just return the result directly:
def state_attributes(self):
"""Return the state attributes."""
return {
ATTR_CODE_FORMAT: self.code_format,
ATTR_CHANGED_BY: self.changed_by,
}
This shortens the code and removes an unnecessary variable, reducing the mental load of reading the function.
Where intermediate variables can be useful is if they then get used as a parameter or a condition, and the name can act like a comment on what the variable represents. In the case where you're returning it from a function, the function name is there to tell you what the result is - in the example above it's the state attributes, and the state_attr
name wasn't providing any extra information.
The scope of local variables should always be as tightly defined as possible and practicable.
This means that:
- You don't have to keep the variable in your working memory through the parts of the function where it's not needed. This cuts down on the cognitive load of understanding your code.
- If code is in coherent blocks where variables are declared and used together, it makes it easier to split functions apart, which can lead to shorter, easier to understand methods.
- If variables are declared far from their usage, they can become stranded. If the code where they are used is later changed or removed unused variables can be left sitting around, complicating the code unnecessarily.
Let's take another look at the earlier example.
def should_i_wear_this_hat(self, hat):
if not isinstance(hat, Hat):
return False
current_fashion = get_fashion()
weather_outside = self.look_out_of_window()
is_stylish = self.evaluate_style(hat, current_fashion)
if weather_outside.is_raining:
print("Damn.")
return True
else:
print("Great.")
return is_stylish
Here the is_stylish
variable isn't actually needed if the weather is rainy. It could be moved inside the else
clause. This means we can also move the current_fashion
variable, which is only used here. You do need to check that these variables aren't used later in the function, which is easier if functions are kept short and sweet.
Moving the assignment to current_fashion
also avoids a function call when the weather is raining, which could lead to a performance improvement if it's an expensive call.
def should_i_wear_this_hat(self, hat):
if not isinstance(hat, Hat):
return False
weather_outside = self.look_out_of_window()
if weather_outside.is_raining:
print("Damn.")
return True
else:
print("Great.")
current_fashion = get_fashion()
is_stylish = self.evaluate_style(hat, current_fashion)
return is_stylish
We could actually then go one step further and inline the is_stylish
variable. This shows how small refactorings can often build on one another and lead to further improvements.
def should_i_wear_this_hat(self, hat):
if not isinstance(hat, Hat):
return False
weather_outside = self.look_out_of_window()
if weather_outside.is_raining:
print("Damn.")
return True
else:
print("Great.")
current_fashion = get_fashion()
return self.evaluate_style(hat, current_fashion)
Doing it this way is a convention, set out in Python's PEP8 style guide. Once you've gotten used to doing it this way it does make the code slightly easier to read and a bit less cluttered.
A common way of opening and using files in Python is:
file = open("welcome.txt")
data = file.read()
print(data)
file.close()
However if an exception is thrown in between the file being opened and closed the call to file.close()
may end up being skipped. One way to resolve this would be to use code like this. Here the try...finally
structure ensures that the file will be closed.
file = open("welcome.txt")
try:
data = file.read()
print(data)
finally:
file.close()
While this has resolved the file closure issue it is quite verbose. Here's where Python's with
context manager comes to the rescue. Under the hood this behaves in the same way as the try...finally
example - the file is closed for you as soon as the block is exited, even where an exception has been thrown.
with open("welcome.txt") as file:
data = file.read()
print(data)
This code is slightly shorter and easier to read - letting you focus on the logic that matters rather than the details of file closure.
Don't Repeat Yourself (DRY) is an important tenet of writing clean, maintainable code. Duplicated code bloats the code base, making it harder to read and understand. It often also leads to bugs. Where changes are made in only some of the duplicated areas unintended behaviour will often arise.
One of the main ways to remove duplication is to extract the common areas into another method and call that.
def extraction_example():
self.speed_slider = Scale(
self.parent, from_=1, to=10, orient=HORIZONTAL, label="Speed"
)
self.speed_slider.pack()
self.speed_slider.set(DEFAULT_SPEED)
self.speed_slider.configure(background="white")
self.force_slider = Scale(
self.parent, from_=1, to=10, orient=HORIZONTAL, label="Force"
)
self.force_slider.pack()
self.force_slider.set(DEFAULT_FORCE)
self.force_slider.configure(background="white")
Here you can see that the code to create a slider is repeated twice. In each case the Scale
object is created and then the same three methods are called on it, with some differing arguments.
You can extract this out into a method like this:
def extraction_example():
self.speed_slider = create_slider(self, "Speed", DEFAULT_SPEED)
self.force_slider = create_slider(self, "Force", DEFAULT_FORCE)
def create_slider(self, label, default_value):
result = Scale(self.parent, from_=1, to=10, orient=HORIZONTAL, label=label)
result.pack()
result.set(default_value)
result.configure(background="white")
return result
The original method is now shorter and clearer, and the new method is focused on doing one thing - creating a slider.
If more sliders are needed later the create_slider
method can be called instead of duplicating the code again. Also if something changes in the way sliders are created, we only need to change the code in one place instead of two or more.
Recognising where you can extract duplicate, or more commonly almost-duplicate, code into its own method is a key skill to learn in improving your code quality.
A pattern we often see when setting the value of boolean variables is one like this (these two code snippets are equivalent):
if hat_string.startswith("hat"):
starts_with_hat = True
else:
starts_with_hat = False
starts_with_hat = True if hat_string.startswith("hat") else False
The aim is to set starts_with_hat
to True
if hat_string.startswith('hat')
returns True
, and False
if it returns False
.
This can be done much more directly by just setting the variable straight from the function call. To do this you must be sure that it returns a boolean!
starts_with_hat = hat_string.startswith("hat")
This is much shorter and clearer.
2.6.7. Use list, set or dictionary comprehensions directly instead of calling list(), dict() or set()
The Pythonic way to create a list, set or dictionary from a generator is to use comprehensions.
This can be done for lists, sets and dictionaries:
Before:
squares = list(x * x for x in y)
squares = set(x * x for x in y)
squares = dict((x, x * x) for x in xs)
After:
squares = [x * x for x in y]
squares = {x * x for x in y}
squares = {x: x * x for x in xs}
Using the comprehensions rather than the methods is slightly shorter, and the dictionary comprehension in particular is easier to read in comprehension form.
Comprehensions are also slightly more performant than calling the methods.
One pattern that we sometimes see is that filtered collections are created by copying and deleting unwanted elements.
my_hats = {"bowler": 1, "sombrero": 2, "sun_hat": 3}
for hat in my_hats.copy():
if hat not in stylish_hats:
del my_hats[hat]
Using comprehensions is a much nicer way to achieve the same aim.
my_hats = {"bowler": 1, "sombrero": 2, "sun_hat": 3}
my_hats = {hat: value for hat, value in my_hats.items() if hat in stylish_hats}
Doing it this way round lets you define what you want to be in your output collection rather than deleting what isn't needed. This reads much more naturally than the for
loop version.
When we’re printing something in Python we don’t need to explicitly call str()
to print non-strings - it’s done automatically. So if we have something like:
print(str(1))
We can simplify it down to
print(1)
Cutting out the str()
call is a nice, easy way to simplify print statements, that is also a slight performance improvement.
Nested try-except
statements don’t create the same complexity issues as what we can run into from traditional nested conditionals. But, they’re still best to avoid from an overall readability perspective. For example:
def testConnection(db, credentials):
try:
try:
db.connect(credentials)
except InvalidCredentials:
return "Check your credentials"
except ConnectionError:
return "Error while trying to connect"
finally:
print("Connection attempt finished")
return "Connection Successful"
Has a nested try
at the top that doesn’t need to be included this is because try
can contain both a series of except
clauses and a finally
. We can therefore cut it down to:
def testConnection(db, credentials):
try:
db.connect(credentials)
except InvalidCredentials:
return "Check your credentials"
except ConnectionError:
return "Error while trying to connect"
finally:
print("Connection attempt finished")
return "Connection Successful"
The reduction in nesting has made this code a bit clearer and easier to read.