Loops are a way to repeat an action on each item in a collection. In day-to-day life, you might think of this like brushing your teeth: for each tooth in your mouth, scrub with a toothbrush and a little bit of toothpaste.
The basic format for looping in python is usually taught like this:
>>> for number in range(5):
...     print(number)
0
1
2
3
4
>>>
What happened here? What is range? What is number?
In python, the built-in range creates a range object: a lazy sequence of integers, not a list and not a generator.
>>> range(5)
range(0, 5)
>>>
In simpler terms, range stores a starting point, a stopping point, and a step size, but it doesn't build a list of all the integers up front. To stay efficient, it only produces each integer when you actually use it. If we call the help function on range, you can see more details and explanations:
>>> help(range)
class range(object)
| range(stop) -> range object
| range(start, stop[, step]) -> range object
...
By default, range only needs the stopping point and assumes a start of 0 and a step of 1, but you can pass other values when they make sense. The starting point is always included and the stopping point is always excluded. To see the integers range(5) produces in our example, you can explicitly convert it to a list:
>>> list(range(5))
[0, 1, 2, 3, 4]
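To see how the optional start and step arguments change the result, here is a small sketch (the particular numbers are arbitrary):

```python
# range(start, stop, step): start is included, stop is excluded
from_two = list(range(2, 10))        # start at 2 instead of 0
every_third = list(range(2, 10, 3))  # take steps of 3
counting_down = list(range(5, 0, -1))  # a negative step counts down; 0 is excluded

print(from_two)       # [2, 3, 4, 5, 6, 7, 8, 9]
print(every_third)    # [2, 5, 8]
print(counting_down)  # [5, 4, 3, 2, 1]
```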
So range(5) gave us 5 integers, 0 through 4. In our example, we print each of those integers by telling python to act on each one. That is all a for loop does. So what was number in our example? It was an ordinary variable, the same as if we had assigned one with number = 0. For each item in the integers 0 through 4, the for loop first assigns that item to the variable number and then carries out the indented block below it. In fact, after the for loop finishes, the variable still exists and holds the last item the loop assigned to it:
>>> for number in range(5):
...     pass
>>> number
4
In that example, the pass keyword is used to do nothing; it is a placeholder for a block that has no code.
An alternative to for is while. Any loop can be written with either keyword, but usually for is used to step through a collection and while is used to repeat until some specific condition is met.
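To make the "either keyword" point concrete, here is a simplified sketch of roughly what for number in range(5): print(number) does, rewritten with while. A for loop asks the collection for an iterator with iter() and keeps calling next() on it until a StopIteration exception signals there is nothing left. The real for loop is handled inside the interpreter, so treat this as an illustration rather than its actual implementation:

```python
numbers = iter(range(5))  # ask the range for an iterator
seen = []

while True:
    try:
        number = next(numbers)  # fetch the next item
    except StopIteration:       # raised once the iterator is exhausted
        break
    seen.append(number)         # the loop body runs with number assigned
    print(number)

print(number)  # still holds the last item assigned: 4
```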
A typical example of a while loop might be:
>>> countdown = 5
>>> while countdown >= 0:
...     if countdown > 0:
...         print(countdown)
...     else:
...         print("Blastoff!")
...     countdown -= 1
5
4
3
2
1
Blastoff!
>>>
While loops are a common source of bugs in python code. It's really easy to miss something and end up with a loop that never ends. For example, I did this when writing the example above: I got so excited about printing "Blastoff!" that I forgot to add the final line that subtracts 1 from the countdown!
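One way to sidestep that particular bug is to let a for loop manage the counter. Here is the same countdown written with range(5, 0, -1), which counts down by itself, so there is no decrement to forget (the messages are collected in a list only so they are easy to check):

```python
messages = []
for countdown in range(5, 0, -1):  # 5, 4, 3, 2, 1 (the stop value 0 is excluded)
    messages.append(countdown)
messages.append("Blastoff!")

for message in messages:
    print(message)
```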
On the other hand, sometimes you do want a program to run forever, until the person using it decides to quit. For example, if you wanted to build a never-ending Tetris clone, you might start the program with:
while True:
    play_tetris()
That program would keep running until its window was closed or a keyboard interrupt (Ctrl+C) stopped it.
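Inside a while True loop, the usual way out is a break statement. In this sketch, run_until_quit is a hypothetical function and the commands list stands in for real user input (an actual game would read keypresses or call input()); the loop runs until it sees "quit":

```python
def run_until_quit(commands):
    """Handle commands until "quit" is seen (commands stands in for user input)."""
    handled = []
    pending = iter(commands)
    while True:
        # next() with a default treats running out of commands like quitting
        command = next(pending, "quit")
        if command == "quit":
            break  # the only way out of this otherwise endless loop
        handled.append(command)
    return handled


print(run_until_quit(["left", "rotate", "drop", "quit", "never reached"]))
# ['left', 'rotate', 'drop']
```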
Sometimes when using loops you need to keep track of both the item and how far along you are. This is exactly what the enumerate function does.
>>> fruits = ['apple', 'banana', 'cherry', 'dragonfruit']
>>> for i, fruit in enumerate(fruits):
...     print("item:", i, "is", fruit)
item: 0 is apple
item: 1 is banana
item: 2 is cherry
item: 3 is dragonfruit
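enumerate counts from 0 by default. Its optional start argument changes that, which is handy for human-friendly numbering:

```python
fruits = ['apple', 'banana', 'cherry', 'dragonfruit']

numbered = []
for i, fruit in enumerate(fruits, start=1):  # count from 1 instead of 0
    numbered.append(f"{i}. {fruit}")

print(numbered)  # ['1. apple', '2. banana', '3. cherry', '4. dragonfruit']
```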
Any iterable (i.e. anything you can loop over) works with enumerate. You may find this handy when working with several lists of the same length, where you want to update one based on another or use items from one list while looping over another.
>>> tastes_good = [False, True, False, True]
>>> for i, fruit in enumerate(fruits):
...     if tastes_good[i]:
...         print(fruit, "tastes good")
banana tastes good
dragonfruit tastes good
Like that last enumerate example, sometimes you want to combine multiple iterables into a single one that pairs up their elements by position. This is where the zip function comes in.
For example, if we want to loop over the combined lists to create a dict of fruits and their tastes, we could do something like:
>>> fruit_data = {}
>>> for fruit, taste in zip(fruits, tastes_good):
...     fruit_data[fruit] = taste
>>> fruit_data
{'apple': False, 'banana': True, 'cherry': False, 'dragonfruit': True}
This example was meant to show how you can loop through a zip object, which you may want to do if you need to apply some function or logic to the items conditionally. That said, you could create the dict more simply with just:
>>> fruit_data = dict(zip(fruits, tastes_good))
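One detail worth knowing: when the iterables have different lengths, zip stops at the shortest one without any warning. This sketch uses a deliberately shortened copy of tastes_good to show that, along with itertools.zip_longest, which pads the missing values instead:

```python
from itertools import zip_longest

fruits = ['apple', 'banana', 'cherry', 'dragonfruit']
tastes_good = [False, True, False]  # deliberately one value short

# zip stops silently at the end of the shortest iterable
short = dict(zip(fruits, tastes_good))
print(short)  # {'apple': False, 'banana': True, 'cherry': False}

# zip_longest keeps going and fills the gaps with fillvalue
padded = dict(zip_longest(fruits, tastes_good, fillvalue=None))
print(padded)  # {'apple': False, 'banana': True, 'cherry': False, 'dragonfruit': None}
```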
The dict in python is a {key: value} data structure. A dict key can be anything that is hashable, which roughly means anything whose value can't change once it is created. For example, any string or number is a valid key, but a list or another dictionary is not, because both can be changed in place. A tuple is a valid key even though it looks like a list, because it is immutable (i.e. once it is defined it can't be changed), as long as everything inside it is hashable too. A dict value can be any object, including another dict.
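A quick sketch of those rules in action, using made-up coordinates as keys: a tuple of numbers works, while a list, or a tuple that contains a list, raises a TypeError:

```python
locations = {}

# a tuple of numbers is hashable, so it works as a dict key
locations[(40.7, -74.0)] = "New York"

rejected = []
for candidate in ([40.7, -74.0], ([40.7], -74.0)):
    try:
        locations[candidate] = "?"  # a list, or a tuple holding a list, is unhashable
    except TypeError:
        rejected.append(candidate)

print(locations)      # {(40.7, -74.0): 'New York'}
print(len(rejected))  # 2
```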
With that in mind, python gives you a few options for looping over a dict. A regular for loop only looks at the keys.
>>> for thing in fruit_data:
...     print(thing)
apple
banana
cherry
dragonfruit
Additionally, dict objects have methods for getting just the keys, just the values, or both:
>>> fruit_data.keys()
dict_keys(['apple', 'banana', 'cherry', 'dragonfruit'])
>>> fruit_data.values()
dict_values([False, True, False, True])
>>> fruit_data.items()
dict_items([('apple', False), ('banana', True), ('cherry', False), ('dragonfruit', True)])
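In practice, items() is the one you will probably loop over most, since each (key, value) pair unpacks straight into two loop variables, just like with enumerate and zip. A small sketch reusing fruit_data:

```python
fruit_data = {'apple': False, 'banana': True, 'cherry': False, 'dragonfruit': True}

# .items() yields (key, value) pairs, which unpack into two loop variables
good_fruits = []
for fruit, taste in fruit_data.items():
    if taste:
        good_fruits.append(fruit)

print(good_fruits)  # ['banana', 'dragonfruit']
```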
Sometimes your data and logic might be complex enough that a single loop won't be enough to check all conditions or make all necessary changes. This might occur when working with data from the web, which is typically in JSON format.
{
  "time": "2019-04-12 11:49:07",
  "pets": {
    "dogs": [
      {
        "name": "Koda",
        "sex": "female"
      },
      {
        "name": "Wilbur",
        "sex": "male"
      }
    ],
    "cats": [
      {
        "name": "Tipsy",
        "sex": "female"
      },
      {
        "name": "Balto",
        "sex": "male"
      }
    ]
  }
}
Let's assume we converted this JSON to a python dictionary named pet_data (for example, using json.load()). If we access "pets" in that dict, we get another dict:
>>> import json
>>> with open("pets.json") as pets:
...     pet_data = json.load(pets)
>>> pet_data
{'time': '2019-04-12 11:49:07', 'pets': {'dogs': [{'name': 'Koda', 'sex': 'female'}, {'name': 'Wilbur', 'sex': 'male'}], 'cats': [{'name': 'Tipsy', 'sex': 'female'}, {'name': 'Balto', 'sex': 'male'}]}}
>>> pet_data["pets"]
{'dogs': [{'name': 'Koda', 'sex': 'female'}, {'name': 'Wilbur', 'sex': 'male'}], 'cats': [{'name': 'Tipsy', 'sex': 'female'}, {'name': 'Balto', 'sex': 'male'}]}
Let's say we want to enhance this data by adding the sound each animal makes: dogs go "woof" and cats go "meow".
First, just to show what each item looks like:
>>> for pet_type, pets in pet_data["pets"].items():
...     print(pet_type)
...     for pet in pets:
...         print(pet)
dogs
{'name': 'Koda', 'sex': 'female'}
{'name': 'Wilbur', 'sex': 'male'}
cats
{'name': 'Tipsy', 'sex': 'female'}
{'name': 'Balto', 'sex': 'male'}
Next, here's how we might accomplish the task:
>>> for pet_type, pets in pet_data["pets"].items():
...     if pet_type == "dogs":
...         for pet in pets:
...             pet["sound"] = "woof!"
...     if pet_type == "cats":
...         for pet in pets:
...             pet["sound"] = "meow"
>>> pet_data["pets"]
{'dogs': [{'name': 'Koda', 'sex': 'female', 'sound': 'woof!'}, {'name': 'Wilbur', 'sex': 'male', 'sound': 'woof!'}], 'cats': [{'name': 'Tipsy', 'sex': 'female', 'sound': 'meow'}, {'name': 'Balto', 'sex': 'male', 'sound': 'meow'}]}
As you can see, nested loops are just small pieces of logic stacked together. This means we could break the logic out into small functions to keep our code DRY (don't repeat yourself) and, more importantly, to make it testable (it's easy to test a small function, but difficult to test a large, complex loop).
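One possible way to do that is sketched below. The names SOUNDS, add_sound, and add_sounds are hypothetical, chosen just for this example: the sound for each pet type lives in a lookup dict, so one small function handles both dogs and cats, and each piece can be checked on its own. The pet_data here is a trimmed copy of the data above.

```python
# Hypothetical names; any small, single-purpose functions would do.
SOUNDS = {"dogs": "woof!", "cats": "meow"}


def add_sound(pet, sound):
    """Add a sound to a single pet record, modifying it in place."""
    pet["sound"] = sound
    return pet


def add_sounds(pets_by_type, sounds):
    """Add the matching sound to every pet whose type appears in sounds."""
    for pet_type, pets in pets_by_type.items():
        sound = sounds.get(pet_type)
        if sound is None:
            continue  # skip pet types we have no sound for
        for pet in pets:
            add_sound(pet, sound)
    return pets_by_type


pet_data = {
    "pets": {
        "dogs": [{"name": "Koda", "sex": "female"}],
        "cats": [{"name": "Tipsy", "sex": "female"}],
    }
}
add_sounds(pet_data["pets"], SOUNDS)
print(pet_data["pets"]["dogs"][0]["sound"])  # woof!

# the small pieces are easy to check in isolation
assert add_sound({"name": "Rex"}, "woof!") == {"name": "Rex", "sound": "woof!"}
```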
A lot of the time you want to collect the results of a loop in a list. You could accomplish this by creating an empty list and appending each result to it.
>>> results = []
>>> for number in range(1, 11):
...     results.append(number ** 2)
>>> results
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
What's wrong with this approach? It looks straightforward and is easy to understand, which is good, but it's a little inefficient. For small lists, this is totally fine. With really large lists, it starts to matter because of how python represents lists "under the hood". CPython (the standard python interpreter) over-allocates list storage so that most appends are cheap, but the list still has to be resized every so often as it grows, and every append is a method call. A more efficient approach is to create a list of the final size up front, filled with placeholder values, and then fill in each slot (this is known as preallocation).
>>> better_results = [None] * 10
>>> better_results
[None, None, None, None, None, None, None, None, None, None]
>>> for number in range(10):
...     better_results[number] = (number + 1) ** 2
>>> better_results
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
We can check this using the timeit module, which is easy to do in IPython or Jupyter with the %%timeit cell magic.
In [5]: %%timeit
...: results = []
...: for number in range(10_000_000):
...:     results.append(number ** 0.5)
...:
1.43 s ± 38.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %%timeit
...: results = [None] * 10_000_000
...: for number in range(10_000_000):
...:     results[number] = number ** 0.5
...:
1.14 s ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
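Outside IPython or Jupyter, the standard timeit module can run the same comparison from a plain script, since timeit.timeit accepts a callable and a number of repetitions. This is a rough sketch with a smaller list (1_000_000 items, an arbitrary choice) and a single run each; the numbers you get will depend on your machine and python version:

```python
import timeit

N = 1_000_000  # smaller than the 10_000_000 above so the sketch runs quickly


def append_version():
    results = []
    for number in range(N):
        results.append(number ** 0.5)
    return results


def preallocated_version():
    results = [None] * N
    for number in range(N):
        results[number] = number ** 0.5
    return results


# timeit.timeit calls each function `number` times and returns the total seconds
append_time = timeit.timeit(append_version, number=1)
prealloc_time = timeit.timeit(preallocated_version, number=1)
print(f"append:       {append_time:.3f} s")
print(f"preallocated: {prealloc_time:.3f} s")
```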
That's about a 20% improvement. It might not seem like much, but this is a pretty simple example, and if you are doing this often, the savings can add up in the long run.
Python has support for applying functions to iterables instead of using a loop. The idea is exactly the same, but instead of writing a loop statement, we wrap our work in a function.
The map function runs another function on each item in an iterable. For example, say we want to get the square root of a list of numbers. The for loop syntax could be:
>>> from math import sqrt
>>> square_roots = []
>>> for number in range(10):
...     square_roots.append(sqrt(number))
>>> square_roots
[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178, 2.6457513110645907, 2.8284271247461903, 3.0]
We can shorten the code quite a bit by using map instead, but it works a little differently. For one, it doesn't calculate the results immediately. Instead, to be efficient, it waits until we do something with the map object, and it lets us work on one item at a time. This avoids building a really large list all at once. The caveat is that a map object is an iterator, so you only get one pass over its items: once they have been consumed, it is exhausted. To read more about this, check out this article from Real Python on Generators.
>>> square_roots = map(sqrt, range(10))
>>> square_roots
<map object at 0x1251b0828>
>>> list(square_roots)
[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178, 2.6457513110645907, 2.8284271247461903, 3.0]
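The "one pass" caveat is easy to see directly: after a map object has been consumed once, asking it for items again gives nothing. If you need the values more than once, convert them to a list first and reuse the list.

```python
from math import sqrt

square_roots = map(sqrt, [0, 1, 4, 9])

first_pass = list(square_roots)   # consumes every item
second_pass = list(square_roots)  # nothing left: the map object is exhausted

print(first_pass)   # [0.0, 1.0, 2.0, 3.0]
print(second_pass)  # []
```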
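The performance of map can be hit or miss, but for this example it was very good. Compared to the preallocated list version (1.14 s ± 36.3 ms per loop, mean ± std. dev. of 7 runs, 1 loop each), we see quite a big speedup, although this version calls sqrt where the loop computed number ** 0.5, so it isn't a perfectly like-for-like comparison: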
In [17]: %%timeit
...: square_roots = list(map(sqrt, range(10_000_000)))
...:
697 ms ± 26.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If we have an iterable and just want to exclude some items from it, the filter function is a great way to do this.
Say we only want to keep the even numbers. The for loop way might be:
>>> def number_is_even(number):
...     return number % 2 == 0
>>> numbers = list(range(20))
>>> even_numbers = []
>>> for number in numbers:
...     if number_is_even(number):
...         even_numbers.append(number)
>>> even_numbers
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
The much shorter filter way would be:
>>> f = filter(number_is_even, numbers)
>>> f
<filter object at 0x126886cf8>
>>> list(f)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
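As you can see, filter is lazy in the same way as map, and it gives a similar kind of speedup. (The loop version below also builds the numbers list inside the timed cell, which likely accounts for some of the gap.)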
In [22]: %%timeit
...: numbers = list(range(1_000_000))
...: even_numbers = []
...: for number in numbers:
...:     if number_is_even(number):
...:         even_numbers.append(number)
...:
162 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [23]: %%timeit
...: list(filter(number_is_even, range(1_000_000)))
...:
110 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
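filter also has a shortcut worth knowing: if you pass None instead of a function, it keeps only the items that are truthy on their own, dropping values like 0, empty strings, None, and empty lists. A small sketch with made-up data:

```python
raw_values = [0, 3, "", "kiwi", None, 7, []]

# with None as the function, filter keeps items that are truthy on their own
kept = list(filter(None, raw_values))
print(kept)  # [3, 'kiwi', 7]
```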
Whenever you loop through an iterable to aggregate or combine results, you store a current state and update it with each new item. In functional programming this is known as reduce (also called a fold).
Side note: In python 2, reduce was a built-in function. In python 3 it was moved to the functools module, so it needs to be imported.
Say we want to multiply all the items of a list together.
>>> total = 1
>>> for number in range(1, 11):
...     total *= number
>>> total
3628800
The reduce way to do this requires a function that multiplies two numbers together instead of the * operator. We could write this as:
def multiply(x, y):
    return x * y
However, the built-in operators are also available as functions in the operator module.
>>> import operator
>>> multiply(4, 5) == operator.mul(4, 5)
True
So the reduce way could be:
>>> from functools import reduce
>>> reduce(operator.mul, range(1, 11))
3628800
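To make the "running state" idea concrete, here is a simplified, hypothetical my_reduce written with the same kind of for loop as above. It is not how functools.reduce is implemented, just an illustration of the idea; unlike functools.reduce, this version requires the starting value:

```python
import operator


def my_reduce(function, iterable, initial):
    """Simplified stand-in for functools.reduce (hypothetical, for illustration)."""
    state = initial                    # the running result
    for item in iterable:
        state = function(state, item)  # combine the state with each new item
    return state


print(my_reduce(operator.mul, range(1, 11), 1))  # 3628800
print(my_reduce(operator.add, range(1, 11), 0))  # 55
```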
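So what about performance? For this specific example, the performance is almost identical, most likely because nearly all of the time goes into multiplying very large integers (the final product has hundreds of thousands of digits) rather than into the looping itself.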
In [12]: %%timeit
...: total = 1
...: for number in range(1, 100_000):
...:     total *= number
...:
2.33 s ± 71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [13]: %%timeit
...: reduce(operator.mul, range(1, 100_000))
...:
2.29 s ± 40.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In general, map, filter, and reduce can sometimes be slower than a for loop, depending on the function being applied and how the code in the for loop is written. Whenever you aren't sure, running a quick benchmark with timeit can be very helpful.
What are list comprehensions? They are a convenient shorthand for building a list with a for loop, otherwise known as syntactic sugar. The form of a list comprehension goes like this:
- create the empty list
- add the result item first
- add the for loop statement next
- add any conditionals last
Say you have this loop:
divisible_by_seven = []
for num in range(100):
    if not num % 7:
        divisible_by_seven.append(num)
Step 1:
divisible_by_seven = []
Step 2:
divisible_by_seven = [num]
Step 3:
divisible_by_seven = [num for num in range(100)]
Step 4:
divisible_by_seven = [num for num in range(100) if not num % 7]
Additionally, you are free to use extra whitespace and line breaks to keep a comprehension readable once it grows longer than a typical single line. For example, this is equivalent to the prior example:
divisible_by_seven = [
    num
    for num in range(100)
    if not num % 7
]
Lastly, let's benchmark the earlier square-root list creation using comprehension syntax. For reference, the preallocated loop took 1.14 s ± 36.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each).
In [7]: %%timeit
...: results = [sqrt(number) for number in range(10_000_000)]
...:
985 ms ± 44.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Not only is this less to write, it's also a bit faster than the standard loop. You might also notice it is slower than the map version. That's true in this example, but in practice the difference is often small. In general, map works great when both the function and the iterable are already defined and the function-call overhead is minimal, whereas list comprehensions work well when you're building logic and conditions on the fly.
There is no hard and fast rule, but in general an old-style for loop becomes more readable past a certain amount of complexity. Here are some basic rules that I find helpful:
- When there are more than two for statements in a comprehension, you should probably break that nested data down first.
- When there are more than two if statements in a comprehension, you should probably turn that logic into a function first.
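For a sense of where that first limit sits, here is a comprehension with two for clauses that pulls every pet name out of a trimmed copy of the pet_data dict from earlier. The clauses read left to right in the same order as the equivalent nested for loops:

```python
pet_data = {
    "pets": {
        "dogs": [{"name": "Koda"}, {"name": "Wilbur"}],
        "cats": [{"name": "Tipsy"}, {"name": "Balto"}],
    }
}

# same order as: for pets in ...values(): for pet in pets: names.append(pet["name"])
names = [pet["name"] for pets in pet_data["pets"].values() for pet in pets]
print(names)  # ['Koda', 'Wilbur', 'Tipsy', 'Balto']
```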
Like the list comprehension, the dict comprehension is syntactic sugar for building dictionaries efficiently.
Old way to build a dict:
>>> is_even = {}
>>> for number in range(10):
...     is_even[number] = number % 2 == 0
...
>>> is_even
{0: True,
1: False,
2: True,
3: False,
4: True,
5: False,
6: True,
7: False,
8: True,
9: False}
Using a dict comprehension goes like this:
- create the empty dict
- add the key: value item first
- add the for loop statement next
- add any conditionals last
So it's basically the same as a list comprehension, but you have a key: value item instead of a single item.
>>> is_even = {number: number % 2 == 0 for number in range(10)}
>>> is_even
{0: True,
1: False,
2: True,
3: False,
4: True,
5: False,
6: True,
7: False,
8: True,
9: False}
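The optional conditional goes at the end, just as in a list comprehension. For example, reusing fruits and tastes_good from earlier, with zip to pair them up, this keeps only the fruits that taste good:

```python
fruits = ['apple', 'banana', 'cherry', 'dragonfruit']
tastes_good = [False, True, False, True]

# key: value first, then the for clause, then the condition
good_fruits = {fruit: taste for fruit, taste in zip(fruits, tastes_good) if taste}
print(good_fruits)  # {'banana': True, 'dragonfruit': True}
```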
In addition to the built-in looping tools, python comes with the itertools module, which contains functions that create iterators for efficient looping.
The functions in itertools can help in situations where you need to do things like:
- cycle through a list forever until you say stop, starting back at the beginning every time it reaches the end
- repeat an object a given number of times, or forever if you don't give a count (note that it repeats the whole object, not its individual items)
>>> import itertools
>>> list(itertools.repeat([1, 2], 4))
[[1, 2], [1, 2], [1, 2], [1, 2]]
- chain multiple lists together into a single long list
>>> list(itertools.chain([1, 2, 3], ['A', 'B', 'C']))
[1, 2, 3, 'A', 'B', 'C']
- take the product of multiple lists to get all the combinations of their items
>>> list(itertools.product([1, 2, 3], ['A', 'B', 'C']))
[(1, 'A'),
 (1, 'B'),
 (1, 'C'),
 (2, 'A'),
 (2, 'B'),
 (2, 'C'),
 (3, 'A'),
 (3, 'B'),
 (3, 'C')]
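cycle is easiest to see together with itertools.islice: a cycle never stops on its own, and islice takes just a fixed number of items from it. The player names here are made up for illustration:

```python
from itertools import cycle, islice

# cycle never ends on its own, so islice takes just the first 7 items
players = cycle(["Alice", "Bob", "Carol"])  # hypothetical player names
turns = list(islice(players, 7))
print(turns)  # ['Alice', 'Bob', 'Carol', 'Alice', 'Bob', 'Carol', 'Alice']
```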
And many more! Whenever you think there's a better way to work with loops than what you're doing now, consider looking up itertools examples.