@bearfrieze
Last active December 23, 2023 22:49
Comprehensions in Python the Jedi way

by Bjørn Friese

Beautiful is better than ugly. Explicit is better than implicit.

-- The Zen of Python

I frequently deal with collections of things in the programs I write: collections of droids, jedis, planets, lightsabers, starfighters, etc. When programming in Python, these collections are usually represented as lists, sets, and dictionaries. Oftentimes, what I want to do with a collection is transform it in some way. Comprehensions are a powerful syntax for doing just that. I use them extensively, and they are one of the things that keep me coming back to Python. Let me show you a few examples of the incredible usefulness of comprehensions.

All of the tasks presented in the examples can be accomplished with the extensive standard library available in Python. These solutions would arguably be more terse and efficient in some cases. I don't have anything against the standard library. To me there is a certain elegance and beauty in the explicit nature of comprehensions. Everything you need to know is right there in the code in a concise and readable form – no need to dig through the docs.

Note: I'm using Python 3.5. List comprehensions have been around since Python 2.0, and set and dictionary comprehensions are available in Python 2.7 and above, but the functions and syntax used in the examples might not be available or valid in other versions of Python.

Bleeps and bloops

We are trying to have a meaningful conversation with R2-D2, but he just bleeps and bloops in a seemingly random pattern. After scratching our heads for a while, we start jotting down the sequence, writing 0 for each bleep and 1 for each bloop:

bbs = '01110011001000000110111001101111001000000010000001101001001000000111001101101110001000000110010100100000001000000110100000100000001000000110010100100000011100100010000000100000011100000110110100100000011011110010000001100011'

Hmm. That looks interesting. Maybe it's octets of bits denoting ASCII characters? Let's try splitting up the bit string into octets.

Using an imperative approach, that might look something like this:

octets = []
for i in range(0, len(bbs), 8):
  octets.append(bbs[i:i+8])

First we initialize a new list. Then, for every 8th index in the string, we slice out a substring of length 8 and append it to the list of octets.

Can we do better than this? Of course we can! Take a look at this functional approach:

octets = list(map(lambda i: bbs[i:i+8], range(0, len(bbs), 8)))

We map the starting index of each octet through a lambda function that returns the octet starting at that index. In Python 3 the map function returns a lazy iterator, which we turn into a list with the list function. This is slightly more concise than the imperative approach, but arguably less readable.

We decide to ask master Yoda for advice. He suggests the following:

octets = [bbs[i:i+8] for i in range(0, len(bbs), 8)]

Wait, is that the force? Nope, that right there is a comprehension. A list comprehension to be more exact.

The brackets [] indicate that we are making a new list. Inside the brackets we first have an expression: bbs[i:i+8]. Next up is a for clause: for i in range(0, len(bbs), 8). The for clause defines the iterable that we loop over as the basis for our new list, and the initial expression defines the element that each iteration contributes to the new list.

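Bonus info: The same syntax wrapped in parentheses instead of brackets is called a generator expression. It can be used on its own to create a lazy iterator, and the parentheses can be omitted when it is the sole argument of a function call.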
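To make the difference a bit more concrete, here is a small sketch comparing the two. The short bit string is made-up sample data (two octets), not R2-D2's actual message:

```python
# Made-up sample data: two octets, shortened so the output stays readable.
bits = '0110001101101111'

# Brackets build the whole list immediately.
octets_list = [bits[i:i+8] for i in range(0, len(bits), 8)]

# Parentheses build a generator: nothing is sliced until we ask for it.
octets_gen = (bits[i:i+8] for i in range(0, len(bits), 8))

first = next(octets_gen)  # computes only the first octet
rest = list(octets_gen)   # consumes whatever is left

print(octets_list)  # ['01100011', '01101111']
print(first, rest)  # 01100011 ['01101111']
```

Note that a generator can only be consumed once. After the list call above, octets_gen is exhausted.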

Now that we know what a list comprehension is, we can use it again to turn the octets into characters:

chrs = [chr(int(octet, 2)) for octet in octets]

And we get:

['s', ' ', 'n', 'o', ' ', ' ', 'i', ' ', 's', 'n', ' ', 'e', ' ', ' ', 'h', ' ', ' ', 'e', ' ', 'r', ' ', ' ', 'p', 'm', ' ', 'o', ' ', 'c']

Hmm, that looks promising, but it's still kind of fragmented. What if we removed the spaces?

Normally we would filter out all the ' ' characters:

chrs = list(filter(lambda c: c != ' ', chrs))

That would work, but now that we know the true power of the comprehension, we can simply do this instead:

chrs = [c for c in chrs if c != ' ']

We can use if clauses in our list comprehensions to perform filtering operations. Neat!
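The expression and the if clause can also work together, so that we transform and filter in one go. A minimal sketch, using a made-up list of words rather than R2-D2's characters:

```python
# Made-up sample data with stray whitespace and empty entries.
words = ['  droid', '', 'jedi ', '   ', 'sith']

# The expression on the left transforms each element,
# the if clause on the right decides which elements are kept.
cleaned = [w.strip().upper() for w in words if w.strip()]

print(cleaned)  # ['DROID', 'JEDI', 'SITH']
```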

Finally we join up the letters into a string to make the message more readable:

message = ''.join(chrs)

Err, what is a "snoisneherpmoc"? Maybe R2-D2 spoke the message in reverse for some reason.

message = ''.join(reversed(chrs))

Ah! The message is "comprehensions". R2-D2 knows what's up.
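For the curious, the steps above could be chained into one small function. This is just a sketch: decode is a hypothetical helper name, and it uses generator expressions so that no intermediate lists are built. The sample bit string is a shortened, made-up message.

```python
def decode(bits):
    """Decode a string of 0s and 1s into text, dropping spaces (hypothetical helper)."""
    # Split the bit string into octets.
    octets = (bits[i:i+8] for i in range(0, len(bits), 8))
    # Turn each octet into the character with that code point.
    chars = (chr(int(octet, 2)) for octet in octets)
    # Filter out the spaces and join what is left.
    return ''.join(c for c in chars if c != ' ')

sample = '011011110010000001100011'  # the octets for 'o', ' ', 'c'

print(decode(sample))        # oc
print(decode(sample)[::-1])  # co
```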

Droid dating

For this example we are making a dating service for heroic droids. We want a list of all the ways to match up 2 droids from the following list:

droids = [
  {'name': 'BB-8', 'fav_jedi': 'Rey'},
  {'name': 'R2-D2', 'fav_jedi': 'Luke Skywalker'},
  {'name': 'C-3PO', 'fav_jedi': 'Luke Skywalker'},
]

We could use itertools.combinations to do this, but for now let's imagine it doesn't exist and that we have to write our own code for once.

Let's start out by creating a list of all the possible combinations of 2 droids the old school way:

matches = []
for i in range(len(droids)):
  for j in range(i + 1, len(droids)):
    matches.append((droids[i], droids[j]))

We can make that a little nicer if we use the built-in enumerate function and some list slicing:

matches = []
for i, a in enumerate(droids):
  for b in droids[i + 1:]:
    matches.append((a, b))

That can be turned into a cute one-liner with a nested list comprehension, that is, a comprehension with more than one for clause (yes, you can nest them!):

matches = [(a, b) for i, a in enumerate(droids) for b in droids[i + 1:]]
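If we allow ourselves to peek at itertools after all, we can use it as a sanity check. The sketch below repeats the droids list so that it runs on its own, and compares the comprehension with itertools.combinations:

```python
import itertools

droids = [
    {'name': 'BB-8', 'fav_jedi': 'Rey'},
    {'name': 'R2-D2', 'fav_jedi': 'Luke Skywalker'},
    {'name': 'C-3PO', 'fav_jedi': 'Luke Skywalker'},
]

# Our hand-written version: pair each droid with every droid after it.
matches = [(a, b) for i, a in enumerate(droids) for b in droids[i + 1:]]

# The standard library version, for comparison only.
expected = list(itertools.combinations(droids, 2))

print(matches == expected)  # True
print([(a['name'], b['name']) for a, b in matches])
# [('BB-8', 'R2-D2'), ('BB-8', 'C-3PO'), ('R2-D2', 'C-3PO')]
```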

Finally, we might want to score these matches based on whether the droids share a favourite jedi. This just happens to be really easy to do with an inline conditional expression:

scores = ['Great' if a['fav_jedi'] == b['fav_jedi'] else 'Miserable' for a, b in matches]
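It's easy to mix up a conditional expression at the front of a comprehension with an if clause at the end. The following sketch, on made-up pairs of names, shows the difference: the former picks a value for every element, the latter drops elements.

```python
# Made-up sample data: pairs of favourite jedi names.
pairs = [('Rey', 'Luke'), ('Luke', 'Luke'), ('Yoda', 'Yoda')]

# Conditional expression: every pair produces a label, so the length is unchanged.
labels = ['Great' if a == b else 'Miserable' for a, b in pairs]

# if clause: pairs that fail the test are dropped, so the list may shrink.
great_only = [(a, b) for a, b in pairs if a == b]

print(labels)      # ['Miserable', 'Great', 'Great']
print(great_only)  # [('Luke', 'Luke'), ('Yoda', 'Yoda')]
```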

Let's zip the matches with the scores and print them in a nice and readable format:

print(['{[name]} + {[name]} = {}'.format(*m, s) for m, s in zip(matches, scores)])
# ['BB-8 + R2-D2 = Miserable', 'BB-8 + C-3PO = Miserable', 'R2-D2 + C-3PO = Great']

And thus we can conclude that R2-D2 and C-3PO are a great match.
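The format(*m, s) call relies on unpacking syntax that only became valid in Python 3.5. If that is a concern, one possible alternative is to unpack the pair directly in the for clause. A sketch with trimmed-down sample data (only the name key is kept here):

```python
# Trimmed-down sample data, assumed for illustration.
matches = [
    ({'name': 'BB-8'}, {'name': 'R2-D2'}),
    ({'name': 'R2-D2'}, {'name': 'C-3PO'}),
]
scores = ['Miserable', 'Great']

# Unpack each (pair, score) item as ((a, b), s) instead of using *m.
lines = [
    '{} + {} = {}'.format(a['name'], b['name'], s)
    for (a, b), s in zip(matches, scores)
]

print(lines)
# ['BB-8 + R2-D2 = Miserable', 'R2-D2 + C-3PO = Great']
```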

Lift-off

Darth Vader and Luke Skywalker can't find their ships right before the big chase around the Death Star. Let's help them out.

pilots = [
  {'name': 'Luke Skywalker', 'ship_id': 0},
  {'name': 'Darth Vader', 'ship_id': 1},
]
ships = [
  {'id': 0, 'model': 'T-65B X-wing'},
  {'id': 1, 'model': 'TIE Advanced x1'},
]

No problem, we just join the two lists using a nested list comprehension:

pilot_ships = [(p, s) for p in pilots for s in ships if p['ship_id'] == s['id']]

For each pilot we iterate over all the ships. If the pilot's ship_id is equal to the ship's id, they are a match, and we add the tuple to the list.
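With two pilots and two ships, checking every pilot against every ship is no problem. For larger lists, one option is to first index the ships by id with a dictionary comprehension. The sketch below assumes that ship ids are unique; ships_by_id is a name introduced here for illustration.

```python
pilots = [
    {'name': 'Luke Skywalker', 'ship_id': 0},
    {'name': 'Darth Vader', 'ship_id': 1},
]
ships = [
    {'id': 0, 'model': 'T-65B X-wing'},
    {'id': 1, 'model': 'TIE Advanced x1'},
]

# Index the ships by id (assumes every id occurs only once).
ships_by_id = {s['id']: s for s in ships}

# Look each pilot's ship up directly; pilots without a known ship are skipped.
pilot_ships = [
    (p, ships_by_id[p['ship_id']])
    for p in pilots
    if p['ship_id'] in ships_by_id
]

print([(p['name'], s['model']) for p, s in pilot_ships])
# [('Luke Skywalker', 'T-65B X-wing'), ('Darth Vader', 'TIE Advanced x1')]
```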

Let's see if we got this right:

print(['{[name]} → {[model]}'.format(p, s) for p, s in pilot_ships])
# ['Luke Skywalker → T-65B X-wing', 'Darth Vader → TIE Advanced x1']

Ready for lift-off!

Planets

We are presented with a dictionary of episodes, each containing a (non-exhaustive) list of names of planets that appear in that episode:

episodes = {
  'Episode I': {'planets': ['Naboo', 'Tatooine', 'Coruscant']},
  'Episode II': {'planets': ['Geonosis', 'Kamino', 'Geonosis']},
  'Episode III': {'planets': ['Felucia', 'Utapau', 'Coruscant', 'Mustafar']},
  'Episode IV': {'planets': ['Tatooine', 'Alderaan', 'Yavin 4']},
  'Episode V': {'planets': ['Hoth', 'Dagobah', 'Bespin']},
  'Episode VI': {'planets': ['Tatooine', 'Endor']},
  'Episode VII': {'planets': ['Jakku', 'Takodana', 'Ahch-To']},
}

How can we get a collection of unique planets that appeared throughout the episodes? First we use a nested list comprehension to flatten the planets into a single list:

planets_flat = [planet for episode in episodes.values() for planet in episode['planets']]

Note: The for clauses of a nested comprehension are evaluated from left to right, in the same order as the equivalent nested for loops, so the episodes loop has to come before the planets loop.
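One way to remember the order is to write out the for loops the comprehension stands for. A small sketch with a shortened episodes dictionary:

```python
# Shortened sample data, taken from the episodes dictionary.
episodes = {
    'Episode V': {'planets': ['Hoth', 'Dagobah', 'Bespin']},
    'Episode VI': {'planets': ['Tatooine', 'Endor']},
}

# The loops read top to bottom...
planets_loop = []
for episode in episodes.values():
    for planet in episode['planets']:
        planets_loop.append(planet)

# ...and the for clauses read left to right, in the same order.
planets_flat = [planet for episode in episodes.values() for planet in episode['planets']]

print(planets_flat == planets_loop)  # True
```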

From here we could wrap the resulting list in a set like this to remove the duplicates:

planets_set = set(planets_flat)

But we won't bother with that. We have a secret weapon that will simplify and obliterate this task:

planets_set = {planet for episode in episodes.values() for planet in episode['planets']}

Set comprehensions!
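Keep in mind that sets are unordered, so the planets may come out in any order when printed. A small sketch, again with a shortened episodes dictionary, that sorts the set for a stable display:

```python
# Shortened sample data; Tatooine appears in both episodes.
episodes = {
    'Episode I': {'planets': ['Naboo', 'Tatooine', 'Coruscant']},
    'Episode VI': {'planets': ['Tatooine', 'Endor']},
}

planets_set = {planet for episode in episodes.values() for planet in episode['planets']}

print(len(planets_set))     # 4
print(sorted(planets_set))  # ['Coruscant', 'Endor', 'Naboo', 'Tatooine']
```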

Lightsabers

I recently stumbled upon the collections.Counter class while reading some code a friend had written. He was using it to build a dictionary of frequencies of certain values appearing in a list of dictionaries, roughly like this:

import collections

jedis = [
  {'name': 'Ahsoka Tano', 'lightsaber_color': 'green'},
  {'name': 'Anakin Skywalker', 'lightsaber_color': 'blue'},
  {'name': 'Anakin Solo', 'lightsaber_color': 'blue'},
  {'name': 'Ben Skywalker', 'lightsaber_color': 'blue'},
  {'name': 'Count Dooku', 'lightsaber_color': 'red'},
  {'name': 'Darth Caedus', 'lightsaber_color': 'red'},
  {'name': 'Darth Maul', 'lightsaber_color': 'red'},
  {'name': 'Darth Vader', 'lightsaber_color': 'red'},
  {'name': 'Jacen Solo', 'lightsaber_color': 'green'},
  {'name': 'Ki-Adi-Mundi', 'lightsaber_color': 'blue'},
  {'name': 'Kit Fisto', 'lightsaber_color': 'green'},
  {'name': 'Luke Skywalker', 'lightsaber_color': 'green'},
  {'name': 'Obi-Wan Kenobi', 'lightsaber_color': 'blue'},
  {'name': 'Palpatine', 'lightsaber_color': 'red'},
  {'name': 'Plo-Koon', 'lightsaber_color': 'blue'},
  {'name': 'Qui-Gon Jinn', 'lightsaber_color': 'green'},
  {'name': 'Yoda', 'lightsaber_color': 'green'},
]

frequencies = collections.Counter(jedi['lightsaber_color'] for jedi in jedis)

print(frequencies)
# Counter({'blue': 6, 'green': 6, 'red': 5})

I thought that was a really cool solution. Note that we are using a generator expression here rather than a list comprehension, since we don't need the list (Counter takes an iterable which is exactly what you get from a generator expression).
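A Counter behaves like a dictionary with a few extras. The sketch below, on a made-up list of colors, shows two of them that I find handy: missing keys count as zero, and most_common returns the entries sorted by frequency.

```python
import collections

# Made-up sample data.
colors = ['green', 'blue', 'blue', 'red', 'blue', 'green']

# The generator expression is passed straight to Counter, no list needed.
frequencies = collections.Counter(color for color in colors)

print(frequencies['blue'])         # 3
print(frequencies['purple'])       # 0
print(frequencies.most_common(1))  # [('blue', 3)]
```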

But do we really need to import a class and read the documentation for said class to accomplish this? No! Dictionary comprehensions can do this:

colors = [jedi['lightsaber_color'] for jedi in jedis]
frequencies = {color: colors.count(color) for color in set(colors)}

print(frequencies)
# {'green': 6, 'red': 5, 'blue': 6}

This approach uses an additional line to create a list of colors, but on the other hand it's easy to understand what's going on without reading the Counter documentation.

Note: The solution with comprehensions scans the whole list once per distinct color, so it runs in quadratic time in the worst case, while collections.Counter runs in linear time. If you need to do this efficiently, use collections.Counter.
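If you want linear time without the import, a plain loop does the job. This is not a comprehension, just a sketch of what a single-pass count could look like, using a made-up list of colors:

```python
# Made-up sample data.
colors = ['green', 'blue', 'blue', 'red', 'blue', 'green']

# One pass over the list: look up the running count and add one.
frequencies = {}
for color in colors:
    frequencies[color] = frequencies.get(color, 0) + 1

print(frequencies == {'green': 2, 'blue': 3, 'red': 1})  # True
```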

That's all

I hope you feel like you now have a comprehensive overview of comprehensions. I urge you to give them a test drive if you haven't already.

Thanks for reading this article. Let me know how you use comprehensions in the comments section.

Thanks

License

Creative Commons License
Comprehensions in Python the Jedi way by Bjørn Friese is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

@K900

K900 commented Mar 27, 2016

Please don't tell people to compare strings with is not, even one character ones. It's not reliable.

@WhyNotHugo

@K900: Could you explain why that is?

@Strikeskids

is not compares references while != compares the actual contents

@frewsxcv

is not compares references while != compares the actual contents

An example of this:

>>> a = 'Don Quijote'
>>> b = 'Quijote'
>>> a[4:] == b
True
>>> a[4:] is b
False

@gagomes

gagomes commented Mar 27, 2016

An alternative way to filter the blanks out (two pass, though)

>>> filter(None, filter(string.strip, chrs))
['s', 'n', 'o', 'i', 's', 'n', 'e', 'h', 'e', 'r', 'p', 'm', 'o', 'c']

@bearfrieze
Author

Thanks @K900, @Strikeskids, and @frewsxcv! I've updated the Gist.

@joelgrus

collections are super useful, it seems unfortunate to discourage people from using them simply because then they'd have to add an import statement and read the docs. (And I suspect most people don't regularly use list.count(), so they might have to read the docs for that anyway.)

@bearfrieze
Author

@joelgrus: Fair point. My intention was to showcase the usefulness and variety of comprehensions – not to discourage people from using the standard library. In the introduction I briefly touch upon this.

@alan-andrade

This is very cool. Thank you ❤️ 🎆 :shipit:

@popey456963

Never realised you could have set comprehensions, didn't even occur to me... Thanks for writing this up! 👍

@dat-cxa

dat-cxa commented Mar 28, 2016

It's really exciting to read.

@aldanor

aldanor commented Mar 28, 2016

Why no dict comprehensions? 🐼

@bearfrieze
Author

@aldanor: Check out the lightsabers example 🐻

@RomainGehrig

You can use parentheses instead of brackets to create generators (the same kind you get with a yield in a function) and abracadabra, you have laziness!

(It can be inferred by reading carefully the "bonus info" but it's better said than not, isn't it 😁)

@veirus

veirus commented Mar 30, 2016

Yet another confusing thing to consider – this starred expression is possible thanks to PEP 0448 and doesn't work in versions <3.5:

print(['{[name]} + {[name]} = {}'.format(*m, s) for m, s in zip(matches, scores)])
>>> only named arguments may follow *expression [pyflakes]

@jrusev

jrusev commented Jul 23, 2016

@veirus: This will work in Python 2.7:

print(['{} + {} = {score}'.format(*m, score=s) for m, s in zip(matches, scores)])
