@scardine
Last active August 29, 2015 14:11
Iterating over dict keys versus key/value inside a list comprehension when you need access to the value (Python).

From a discussion with Martijn Pieters on Stack Overflow

This is meant as a comment, posted as an answer because it is too big for the comment system. Replying to Martijn Pieters:

PauloScardine: but now you are inserting a value lookup each iteration. That's not more efficient. And in this specific case, you'd have to call vars(MyClass) too, or do an extra attribute lookup with MyClass.__dict__

The difference is probably marginal, but it is always good to profile before claiming that one piece of code is more efficient than another, so here we go:

$ python --version
Python 2.7.6
$ python -m timeit 'from test import Foo; foo = Foo(); [x for x in foo.__dict__ if isinstance(foo.__dict__[x], property)]'
1000000 loops, best of 3: 1.04 usec per loop
$ python -m timeit 'from test import Foo; foo = Foo(); [x for x, y in foo.__dict__ if isinstance(y, property)]'
1000000 loops, best of 3: 1.08 usec per loop

Curiously:

$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x in foo.__dict__ if isinstance(foo.__dict__[x], property)]'
100000 loops, best of 3: 2.3 usec per loop
$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x, y in foo.__dict__ if isinstance(y, property)]'
100000 loops, best of 3: 2.09 usec per loop

Just a matter of taste then?

[update]

Forgot to call foo.__dict__.items():

$ cat test.py
class Foo(object):
    a = 1
    b = 2
    c = 3
    @property
    def bar(self):
        return self.a, self.b, self.c

$ python -m timeit 'from test import Foo; foo = Foo(); [x for x in foo.__dict__ if isinstance(foo.__dict__[x], property)]'
1000000 loops, best of 3: 1.1 usec per loop
$ python -m timeit 'from test import Foo; foo = Foo(); [x for x, y in foo.__dict__.items() if isinstance(y, property)]'
1000000 loops, best of 3: 1.17 usec per loop

$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x in foo.__dict__ if isinstance(foo.__dict__[x], property)]'
100000 loops, best of 3: 2.3 usec per loop
$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x, y in foo.__dict__.items() if isinstance(y, property)]'
100000 loops, best of 3: 2.32 usec per loop
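
As a sanity check on the correction above, here is a minimal sketch (not part of the original benchmark) showing that the two comprehension styles select the same names once .items() is used, and what the missing .items() call actually does. It reads the class namespace through vars(Foo), which is my assumption for illustration; the Foo class mirrors the test.py shown above.

```python
class Foo(object):
    a = 1
    b = 2
    c = 3

    @property
    def bar(self):
        return self.a, self.b, self.c


namespace = vars(Foo)  # the same mapping as Foo.__dict__

# Style 1: iterate over the keys and look each value up again.
by_key = sorted(k for k in namespace if isinstance(namespace[k], property))

# Style 2: iterate over key/value pairs, with no second lookup.
by_item = sorted(k for k, v in namespace.items() if isinstance(v, property))

# Iterating a dict directly yields keys only, so "for x, y in d" tries to
# unpack each key string; a two-character key happens to unpack cleanly.
unpacked = [(x, y) for x, y in {"ab": 1}]

print(by_key, by_item, unpacked)
```

With an empty instance __dict__ the unpacking form never runs its loop body, which is why the first set of timings produced numbers at all.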

Does it hold true for larger dicts?

$ cat test.py
class Foo(object):
    def __init__(self):
        for n in range(1000):
            setattr(self, 'x{:04d}'.format(n), n)
    @property
    def bar(self):
        return self.a, self.b, self.c

$ python -m timeit 'from test import Foo; foo = Foo(); [x for x in foo.__dict__ if isinstance(foo.__dict__[x], property)]'
1000 loops, best of 3: 661 usec per loop
$ python -m timeit 'from test import Foo; foo = Foo(); [x for x, y in foo.__dict__.items() if isinstance(y, property)]'
1000 loops, best of 3: 634 usec per loop

$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x in foo.__dict__ if isinstance(foo.__dict__[x], property)]'
1000 loops, best of 3: 1.07 msec per loop
$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x, y in foo.__dict__.items() if isinstance(y, property)]'
1000 loops, best of 3: 1.01 msec per loop

[update]

Another error spotted - should be iterating over Foo, not foo:

$ python -m timeit 'from test import Foo; foo = Foo(); [x for x in Foo.__dict__ if isinstance(Foo.__dict__[x], property)]'
1000 loops, best of 3: 441 usec per loop
$ python -m timeit 'from test import Foo; foo = Foo(); [x for x, y in Foo.__dict__.items() if isinstance(y, property)]'
1000 loops, best of 3: 446 usec per loop

$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x in Foo.__dict__ if isinstance(Foo.__dict__[x], property)]'
1000 loops, best of 3: 760 usec per loop
$ python3.4 -m timeit 'from test import Foo; foo = Foo(); [x for x, y in Foo.__dict__.items() if isinstance(y, property)]'
1000 loops, best of 3: 783 usec per loop
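
The reason the class has to be inspected rather than the instance can be shown with a short sketch. The class below is a hypothetical, cut-down stand-in for test.py (one instance attribute instead of a thousand): the property object is stored in the class namespace, while the instance namespace only holds what __init__ assigned.

```python
class Foo(object):
    def __init__(self):
        self.x = 1  # instance attribute, stored in the instance __dict__

    @property
    def bar(self):
        return self.x


foo = Foo()

# Names stored on the instance itself: only what __init__ set.
instance_names = sorted(vars(foo))

# The property object lives in the class namespace.
class_has_property = isinstance(vars(Foo)["bar"], property)
instance_has_bar = "bar" in vars(foo)

# Attribute access on the instance runs the getter and returns its value.
value = foo.bar

print(instance_names, class_has_property, instance_has_bar, value)
```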

Conclusion: after a certain size the call to dict.items() starts to pay off, but for the average use case I prefer the first style (iterating over the keys and doing a dict lookup) over dict.items(). Mandatory quote from Knuth: "premature optimization is the root of all evil".

[and another update]

Now this is strange; am I missing something?

$ (python -m timeit 'from test import Foo; [x for x in Foo.__dict__ if isinstance(Foo.__dict__[x], property)]'
   python -m timeit 'from test import Foo; [x for x, y in Foo.__dict__.items() if isinstance(y, property)]')
100000 loops, best of 3: 2.65 usec per loop
100000 loops, best of 3: 2.73 usec per loop
$ (python3.4 -m timeit 'from test import Foo; [x for x in Foo.__dict__ if isinstance(Foo.__dict__[x], property)]'
   python3.4 -m timeit 'from test import Foo; [x for x, y in Foo.__dict__.items() if isinstance(y, property)]')
100000 loops, best of 3: 3.97 usec per loop
100000 loops, best of 3: 3.59 usec per loop

Why the hell does it make any difference if Foo is instantiated or not?

[and another update]

Keeping the setup code outside of the loop yields more consistent results:

$ ( 
    python3.4 -m timeit -n 1000000 -s 'from test import Foo' '[x for x in Foo.__dict__ if isinstance(Foo.__dict__[x], property)]'
    python3.4 -m timeit -n 1000000 -s 'from test import Foo' '[x for x, y in Foo.__dict__.items() if isinstance(y, property)]'
)
1000000 loops, best of 3: 2.01 usec per loop
1000000 loops, best of 3: 1.72 usec per loop

$ ( 
    python -m timeit -s 'from test import Foo' '[x for x in Foo.__dict__ if isinstance(Foo.__dict__[x], property)]'
    python -m timeit -s 'from test import Foo' '[x for x, y in Foo.__dict__.items() if isinstance(y, property)]'
)
1000000 loops, best of 3: 1.49 usec per loop
1000000 loops, best of 3: 1.55 usec per loop

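Looks like the dictionary lookup is relatively more expensive in Python 3.4: there the key-lookup version is slower than the .items() version (2.01 vs. 1.72 usec), while in Python 2.7 the two are within noise of each other (1.49 vs. 1.55 usec).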
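
The same measurement can be scripted with the standard timeit module instead of the command line. This is only a sketch under assumptions: the class is defined inline in the setup string rather than imported from test.py, and the helper name best_usec and the loop counts are mine. No timings are quoted here because they depend on the interpreter and the machine.

```python
import timeit

# Setup runs once per timing run and is not included in the measurement.
SETUP = """
class Foo(object):
    a = 1
    b = 2
    c = 3
    @property
    def bar(self):
        return self.a, self.b, self.c
"""

BY_KEY = "[x for x in Foo.__dict__ if isinstance(Foo.__dict__[x], property)]"
BY_ITEM = "[x for x, y in Foo.__dict__.items() if isinstance(y, property)]"


def best_usec(stmt, number=20000, repeat=3):
    """Return the best per-loop time in microseconds over `repeat` runs."""
    runs = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(runs) / number * 1e6


results = {
    "by key": best_usec(BY_KEY),
    "by item": best_usec(BY_ITEM),
}
for name, usec in results.items():
    print("{}: {:.2f} usec per loop".format(name, usec))
```

Taking the minimum of several runs matches what the command-line tool reports as "best of 3".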

@mjpieters

Note that none of the attributes on the instance will ever test as a property, because by that time the property objects are already bound and have returned the property value, not the property object. Not that it matters here; you are just producing empty lists, that's all.

@mjpieters

Ah, it does matter, because foo.__dict__ won't have that key, so you are throwing KeyError exceptions here. You moved to testing how fast an exception is thrown again.

@scardine
Author

Indeed, we don't need a Foo instance for this test.

@mjpieters

"Why the hell does it make any difference if Foo is instantiated or not?" Because you are creating the instance on each iteration of the timed loop. You want to move the import out to the setup code too, so that it is not part of the test either.
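
To see how much of the timed statement is instantiation, the pieces can be timed separately. This is a rough sketch, assuming the 1000-attribute Foo from the larger-dict test.py is defined inline in the setup string; the helper name per_loop and the loop counts are hypothetical choices of mine.

```python
import timeit

# Assumed stand-in for the larger test.py: 1000 setattr calls per instance.
SETUP = """
class Foo(object):
    def __init__(self):
        for n in range(1000):
            setattr(self, 'x{:04d}'.format(n), n)
"""

COMPREHENSION = "[x for x, y in Foo.__dict__.items() if isinstance(y, property)]"


def per_loop(stmt, number=200):
    """Best-of-3 seconds per loop for `stmt`, with SETUP run outside the timing."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=3)) / number


instantiate_only = per_loop("Foo()")
comprehension_only = per_loop(COMPREHENSION)
both = per_loop("foo = Foo(); " + COMPREHENSION)

print("Foo() only:         {:.2f} usec".format(instantiate_only * 1e6))
print("comprehension only: {:.2f} usec".format(comprehension_only * 1e6))
print("both in statement:  {:.2f} usec".format(both * 1e6))
```

When the instance is built inside the timed statement, the 1000 setattr calls are measured on every loop, so any difference between the two comprehension styles is easily lost in that cost.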
