@umutseven92
Last active July 6, 2022 14:02
Python Cliff Notes

Python

How it Works

  • Python code, written in a .py file, is first compiled to what is called bytecode, which is cached in .pyc files.
    • This bytecode is then executed by the interpreter, one instruction at a time.
    • When a module is imported, Python caches the module's bytecode as a .pyc file in the __pycache__ folder.
    • .pyc is the bytecode of the module.
    • .pyo was the bytecode of the module when Python was run with optimisation options (-O and -OO). Since 3.5, .pyo files no longer exist; optimised bytecode is stored as .opt-1.pyc and .opt-2.pyc instead.
    • .pyd is not bytecode: it is a Windows-only compiled extension module, packaged as a DLL.
  • Any Python file is a module, its name being the file's base name without the .py extension.
  • A package is a collection of Python modules.
    • A package also contains an additional __init__.py file, to distinguish a package from a directory that just happens to contain a bunch of Python scripts.
    • Since 3.3, __init__.py is not required for namespace packages.
      • Namespace packages let libraries that reside in different locations each contribute modules or subpackages to the same parent package.
      • For example, if you have the following project structure:
      path1
       namespace
        module1.py
        module2.py
      path2
       namespace
        module3.py
        module4.py
      
      You can then do the following, provided both path1 and path2 are on sys.path:
      from namespace import module1, module3
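The namespace package layout above can be reproduced in a throwaway directory. This is a minimal sketch, assuming a hypothetical package name `notes_ns` (used instead of `namespace` so it cannot clash with anything already installed) and placeholder module contents:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Build path1/notes_ns/module1.py and path2/notes_ns/module3.py.
    # Neither notes_ns directory contains an __init__.py.
    for base, module, value in [("path1", "module1", 1), ("path2", "module3", 3)]:
        pkg_dir = Path(root) / base / "notes_ns"
        pkg_dir.mkdir(parents=True)
        (pkg_dir / f"{module}.py").write_text(f"VALUE = {value}\n")
        # Both parent directories must be on sys.path.
        sys.path.insert(0, str(Path(root) / base))

    importlib.invalidate_caches()

    # Both portions contribute to the same namespace package.
    from notes_ns import module1, module3
    values = (module1.VALUE, module3.VALUE)

print(values)  # (1, 3)
```

Adding an `__init__.py` to either directory would turn it into a regular package, and the other portion would no longer be merged in.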

Interpreters & Compilers

  • CPython is the reference implementation of the Python programming language, written in (mostly) C.
  • PyPy is a fast alternative implementation of Python, written in RPython (a restricted subset of Python).
    • PyPy is JIT compiled, not AOT compiled.
    • It supports C extensions only through a compatibility layer (cpyext), so C extension modules (numpy, scikit etc) often run slower than in CPython.
    • The JIT also introduces some warm-up overhead, especially noticeable with short scripts.
  • CPython mainly uses reference counting for memory management.
    • Objects created in Python have a reference count variable that keeps track of the number of references that point to the object. When this count reaches zero, the memory occupied by the object is released.
    • Reference counting alone cannot free reference cycles, where objects refer to themselves or to each other so that their counts never reach zero. CPython therefore also uses a cyclic garbage collector.
  • PyPy, on the other hand, doesn’t use reference counting. Instead, it relies entirely on a tracing garbage collector.
  • The GIL (Global Interpreter Lock) is a mutex (a lock) that allows only one thread at a time to hold control of the Python interpreter.
    • It was added as a simple way of preventing race conditions & other memory issues during reference counting.
    • This means that only one thread can be executing Python bytecode at any point in time, which makes it a performance bottleneck in CPU-bound, multi-threaded code.
    • Multiprocessing bypasses this limitation, as it creates a new process with its own interpreter and GIL.
    • Existing C extensions rely on the GIL, so it is difficult to replace.
    • PyPy also has a GIL, for reasons other than reference counting.
  • Cython is a programming language that is a superset of the Python programming language, designed to give C-like performance.
    • Cython is a compiled language that is typically used to generate CPython extension modules.
  • Python has excellent interoperability with C, which is one of the reasons for its popularity.
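The reference counting and cycle collection described above can be observed on CPython with a small sketch. `Node` and `probe` are hypothetical names, and the exact numbers returned by `sys.getrefcount` are an implementation detail, so only the difference between two readings is used:

```python
import gc
import sys
import weakref


class Node:
    """Hypothetical empty class, used only as something to reference."""


a = Node()
before = sys.getrefcount(a)
b = a                          # bind a second name to the same object
after = sys.getrefcount(a)
print(after - before)          # 1 on CPython: one extra reference

# A reference cycle: the object refers to itself.
n = Node()
n.self_ref = n
probe = weakref.ref(n)         # lets us check whether the object still exists
del n                          # the count never reaches zero because of the cycle
gc.collect()                   # the cyclic garbage collector frees it
collected = probe() is None
print(collected)               # True
```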

Dunders

  • Double underscores (__) are called dunders.
  • A double underscore prefix (with at most one trailing underscore) on an attribute name inside a class causes the Python interpreter to rewrite the name, in order to avoid naming conflicts in subclasses.
    • This is called name mangling.
    • The interpreter will rename __baz to _ClassName__baz.
    • The attribute is not reachable as __baz from outside the class (it is still accessible as _ClassName__baz), and each subclass will have its own __baz.
  • Dunder methods (like __init__) are also called magic methods.
  • __all__ is a list of strings defining what symbols in a module will be exported when import * is used on the module.
  • __call__ method is used to write classes where the instances behave like functions and can be called like a function.
  • __str__ is a special method used to represent a class’s objects as a string, called by the str() function.
  • __repr__ is a special method used to represent a class’s objects as a string, called by the repr() function.
    • __repr__ is an unambiguous representation of the object aimed at developers, while __str__ is a user-friendly string explanation.
  • __getitem__ is called when an element is accessed using the subscript ([i]) notation.
  • __setitem__ is called when an element is assigned using the subscript ([i]) notation.
  • __delitem__ is called when an element is deleted using the del notation (del obj[i]).
  • __len__ is called when the len() function is called on the object.
  • __contains__ is called when the in operator is used on the object.
  • __add__, __sub__, __mul__ and __truediv__ are used for operator overloading.
  • __await__ is implemented by awaitable objects, such as coroutines.
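A short sketch of name mangling and of several of the dunder methods listed above. `Base`, `Child` and `Playlist` are hypothetical classes written only for illustration:

```python
class Base:
    def __init__(self):
        self.__baz = "base"       # stored as _Base__baz


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__baz = "child"      # stored as _Child__baz, no clash with Base


class Playlist:
    """Hypothetical container used only to exercise the dunder methods."""

    def __init__(self, *songs):
        self._songs = list(songs)

    def __repr__(self):           # called by repr()
        return f"Playlist({', '.join(repr(s) for s in self._songs)})"

    def __str__(self):            # called by str() and print()
        return f"Playlist with {len(self._songs)} songs"

    def __len__(self):            # len(p)
        return len(self._songs)

    def __getitem__(self, i):     # p[i]
        return self._songs[i]

    def __setitem__(self, i, song):   # p[i] = song
        self._songs[i] = song

    def __delitem__(self, i):     # del p[i]
        del self._songs[i]

    def __contains__(self, song):     # song in p
        return song in self._songs

    def __add__(self, other):     # p + q
        return Playlist(*self._songs, *other._songs)

    def __call__(self, i):        # p(i): the instance is used like a function
        return self._songs[i].upper()


mangled = sorted(vars(Child()))
print(mangled)                    # ['_Base__baz', '_Child__baz']

p = Playlist("intro", "outro")
p[0] = "opening"
del p[1]
combined = p + Playlist("finale")
print(repr(combined))             # Playlist('opening', 'finale')
print(str(combined))              # Playlist with 2 songs
print(len(combined), "finale" in combined, combined(0))  # 2 True OPENING
```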

Modules & Execution

  • Wildcard imports do not import names starting with underscore.
    • Unless an __all__ list is defined, overriding the behaviour.
    • Wildcard imports should be avoided, as they make it unclear which names are present in the namespace.
      • One rare case where it's fine is when using the REPL for an interactive session, to save typing.
  • When the Python interpreter reads a source file, it does two things:
    • It sets the __name__ variable.
      • If the module is run as the main program (python main.py), then __name__ is set to "__main__".
      • If the module is imported (import foo), __name__ is set to foo.
    • It executes all the code in the module, one statement at a time.
      • Imported modules are loaded and assigned to a variable, like so: math = __import__("math").
      • It executes any def blocks, creating a function object, then assigning that function object to a variable with the same name as the function.
    • The if __name__ == "__main__" block therefore only gets executed if the module is run as the main program.
  • Imported modules get compiled into bytecode, which is cached in a .pyc file.
    • The main script is also compiled, but its bytecode is not cached.
    • If the .pyc file is up to date (checked by default by comparing the source file's timestamp and size), the compilation step is skipped.
  • The bytecode is then interpreted by the Python Virtual Machine, which is part of Python.
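The execution steps above can be illustrated in a single file. This is a minimal sketch; `greet`, `alias` and `main` are hypothetical names:

```python
import math

# An import statement binds the loaded module to a name,
# roughly like math = __import__("math").
same_module = __import__("math") is math
print(same_module)        # True: both refer to the same cached module object


def greet():
    return "hello"


# A def statement creates a function object and binds it to the name greet,
# so the object can be bound to other names too.
alias = greet
print(alias is greet)     # True


def main():
    return f"running as {__name__}"


# Only runs when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    print(main())
```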

Iterators

  • Iterators are objects that you can loop over like a list.
    • The itertools module contains a number of iterator building functions, like chain(), groupby() and product().
  • With iterators, you do not have to have the whole collection in memory.
    • Very useful for large files, or data streams.
  • Iterators are implemented via the __iter__ and __next__ methods.
    • __iter__ should return the object that implements __next__ (usually self).
    • __next__ should return the next element in the collection.
      • If the sequence is exhausted, it needs to raise StopIteration.
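A minimal sketch of the iterator protocol described above, using a hypothetical `Countdown` class:

```python
from itertools import chain


class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                    # the object itself implements __next__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration        # signals that the sequence is exhausted
        value = self.current
        self.current -= 1
        return value


# A for loop (or list()) calls __iter__ once, then __next__ repeatedly.
values = list(chain(Countdown(3), Countdown(2)))
print(values)                          # [3, 2, 1, 2, 1]
```

Only the current number is kept in memory, never the whole sequence.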

Generators

  • Generators are a subset of iterators.
  • They are a simpler way of implementing iterators, via the yield keyword.
  • yield indicates where a value is sent back to the caller, but unlike return, the function is not exited afterwards.
    • Instead, the state of the function is remembered, and when next() is called on a generator object (either explicitly or implicitly within a for loop), the function resumes right after the last yield.
    • If a generator is exhausted, it will raise a StopIteration exception.
  • They can also be created via Generator Expressions: (num**2 for num in range(5))
    • In this example, num**2 is the yield'ed value.
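  • The send() method resumes the generator and sends a value into it, which becomes the result of the paused yield expression.
  • close() stops the generator.
  • throw() raises an exception inside the generator, at the point where it is paused.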
  • yield from enables us to yield an inner generator, and pass any next(), send() and throw() values to it.
    • For example (bar and baz are both generators that yield values):
    def foo():
        for v in bar():
            yield v
        for v in baz():
            yield v
    Can be written as:
    def foo():
        yield from bar()
        yield from baz()
  • For cases with complex state, iterators are a better choice.
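The generator behaviour above, sketched with small placeholder generators. `squares` is a hypothetical example, and `bar` and `baz` stand in for the generators that the notes leave undefined:

```python
def squares(limit):
    for num in range(limit):
        yield num ** 2         # execution pauses here until the next value is requested


def bar():
    yield 1
    yield 2


def baz():
    yield 3


def foo():
    yield from bar()           # delegate to the inner generators
    yield from baz()


gen = squares(3)
first = next(gen)              # runs the function up to the first yield
rest = list(gen)               # consumes the remaining values

try:
    next(gen)                  # the generator is now exhausted
    exhausted = False
except StopIteration:
    exhausted = True

print(first, rest, exhausted)                 # 0 [1, 4] True
print(list(foo()))                            # [1, 2, 3]
print(list(num**2 for num in range(5)))       # [0, 1, 4, 9, 16]
```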

Coroutines

  • Coroutines are an extension of generators that can pause and resume their execution context.
  • yield allows the function to wait until it gets input.
    • It does this via the value = yield statement.
    • The .send() method will then send the value to the yield, assigning it to value.
  • The async keyword introduces a native coroutine.
    • Before async and await were introduced in 3.5, this was accomplished with the @asyncio.coroutine decorator, which created a generator-based coroutine. The decorator was deprecated in 3.8 and removed in 3.11.
  • The await keyword suspends the execution of the surrounding coroutine and passes function control back to the event loop.
  • Coroutines created with async def are implemented using the __await__ dunder method.
    • They can yield, which makes them an async generator, but cannot yield from; that is for generator-based coroutines.
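The `value = yield` pattern can be sketched with a generator-based coroutine that keeps a running average. `running_average` is a hypothetical example, not something from the standard library:

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Pause here, hand the current average to the caller,
        # and wait until a value arrives through send().
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)                  # prime the coroutine: run it up to the first yield
first = avg.send(10)       # 10 is assigned to value inside the coroutine
second = avg.send(20)
avg.close()                # stop the coroutine

print(first, second)       # 10.0 15.0
```

The initial `next()` call is needed because a value can only be sent to a generator that is already paused at a yield.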

AsyncIO

  • asyncio is a single-threaded, single-process design. It uses cooperative multitasking.
    • It takes long waiting periods in which functions would otherwise be blocking and allows other functions to run during that downtime.
  • The event loop monitors coroutines, taking feedback on what’s idle, and looking around for things that can be executed in the meantime. It is able to wake up an idle coroutine when whatever that coroutine is waiting on becomes available.
  • If Python encounters an await f() expression in the scope of g(), await tells the event loop: “Suspend execution of g() until whatever I’m waiting on—the result of f()—is returned. In the meantime, go let something else run.”
  • asyncio is great when you have multiple IO-bound tasks where the tasks would otherwise be dominated by blocking IO-bound wait time, like network IO.
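A minimal sketch of cooperative multitasking with asyncio, where `asyncio.sleep` stands in for a network wait. `fetch` and its delays are hypothetical:

```python
import asyncio


async def fetch(name, delay, finished):
    # await hands control back to the event loop while this task is waiting.
    await asyncio.sleep(delay)
    finished.append(name)
    return name


async def main():
    finished = []
    # Both coroutines wait at the same time on a single thread.
    results = await asyncio.gather(
        fetch("slow", 0.2, finished),
        fetch("fast", 0.05, finished),
    )
    return results, finished


results, finished = asyncio.run(main())
print(results)     # ['slow', 'fast']: gather keeps the order of its arguments
print(finished)    # ['fast', 'slow']: the shorter wait completes first
```

The total run time is close to the longest wait rather than the sum of both waits.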

Typing

  • Type annotations are optional and do not affect runtime.
    • They get added to the object's __annotations__ dictionary.
  • To check types, third-party static type checkers such as mypy can be used.
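A short sketch showing that annotations are stored but not enforced at runtime. `double` is a hypothetical function:

```python
def double(x: int) -> int:
    return x * 2


annotations = double.__annotations__
print(annotations)         # {'x': <class 'int'>, 'return': <class 'int'>}

# Nothing stops a caller from passing a str at runtime;
# a static type checker would flag this call instead.
result = double("ab")
print(result)              # abab
```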

Wheels

  • Python packages are distributed as either a source distribution, or a wheel.
  • A source distribution (sdist) contains source code.
    • That includes not only Python code but also the source code of any extension modules (usually in C or C++) bundled with the package.
    • With source distributions, extension modules are compiled on the user’s side rather than the developer’s.
    • When installing a sdist, pip will first download the archive (.tar.gz), and then build a wheel (.whl).
    • Usually complex packages that can't be distributed as a wheel will use sdist.
  • A wheel is a ready to go, pre-built format. When pip downloads a wheel, there is no build stage.
    • Wheels are much faster and easier to use.
    • pip will prefer a compatible wheel if one exists.

Multiprocessing

  • Each Python process has a main thread.
  • Due to the GIL, only one thread can execute Python code at any time. Threads therefore run concurrently, not in parallel: the execution switches from one thread to another.
  • This means threading is bad for CPU bound tasks, as it won't make the execution faster. It may even make it slower due to the overhead of creating threads and switching between them.
  • Threading is good for IO and Network bound tasks, as the execution can continue on another thread while one thread is blocked.
  • Multiprocessing creates a fresh new process with its own interpreter and GIL.
  • This means that processes can execute at the same time, on separate CPU cores.
  • Multiprocessing is parallelism, while multithreading is concurrency.
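A minimal sketch of threads helping with IO-bound work, where `time.sleep` stands in for a blocking network call. `fake_download` is a hypothetical function:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(n):
    time.sleep(0.2)        # a blocked thread lets other threads run
    return n * 2


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, range(4)))
elapsed = time.perf_counter() - start

print(results)             # [0, 2, 4, 6]
# The four waits overlap, so this takes roughly 0.2s instead of 0.8s.
print(f"{elapsed:.2f}s")
```

For CPU-bound work, `ProcessPoolExecutor` from the same module offers the same interface but runs each task in a separate process; on platforms that spawn new processes, the calling code has to sit under an `if __name__ == "__main__":` guard.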

Misc

  • The slice notation: array[start:stop:step]
    • Any of the three values can be left empty, in which case they will be substituted by their default value.
      • The default values are: start: beginning of the array, stop: end of the array, step: 1.
    • start is inclusive, while stop is exclusive.
    • All three values can be negative: a negative start or stop counts from the end of the array, and a negative step walks the array in reverse.
  • A single star in a function declaration (*arg) allows a variable number of positional arguments to be passed to the parameter.
    • The parameter will be a tuple in the function.
  • Double star in a function declaration (**arg) allows multiple keyword arguments (a=3, b=5) to be passed to the parameter.
    • The parameter will be a dict in the function.
  • A single star, when calling a function, unpacks the values of a list (or any iterable) into positional arguments, while a double star unpacks a dict into keyword arguments.
  • A class method (@classmethod) receives the class as implicit first argument. Useful as an alternative constructor.
  • Lists can be created via List Comprehension: [num**2 for num in range(5)]
  • / is regular division, while // is floor division (the result is rounded down, towards negative infinity).
  • The walrus operator (:=), added in 3.8, assigns a value to a variable, and returns that value.
    • Useful in situations where you'd want to assign values to variables within an expression.
  • PEP is shorthand for Python Enhancement Proposal.
  • Everything in Python is an object.
    • Assignment operation is just binding a name to an object.
      • So an assignment doesn’t copy the value. It just places a kind of identification like a “post-it” note on the box.
      • The name is just a reference to the object, not the object itself.
    • All values are boxed.
  • The class type is the metaclass of the class object, and every class (including type) inherits directly or indirectly from object.
    • Metaclasses are the 'stuff' that creates classes.
    • type is the metaclass Python uses to create all classes behind the scenes.
    • type is its own metaclass. This is not something you could reproduce in pure Python, and is done by cheating a little bit at the implementation level.
  • Immutable objects: bool, int, float, str, tuple, frozenset.
  • Mutable objects: list, set, dict.
  • Python is call-by-sharing, also known as call-by-object, “call-by-object-sharing” or pass-by-object-reference.
    • A function receives a reference to the same object in memory as used by the caller, but will create its own variable.
    • This means that if the variable in function is reassigned, it will not affect the original object.
    • However, if the object is modified, the original will be affected (for example, via the .append(..) function).
    • In practice, mutable objects act like call-by-reference and immutable objects act like call-by-value.
  • String Interning is a CPython optimization that tries to use existing immutable objects in some cases rather than creating a new object every time.
    • String literals containing only ASCII letters, digits and underscores are typically interned; this is an implementation detail and can vary between versions.
  • is operator checks if both the operands refer to the same object; i.e. it checks if the identity of the operands matches or not (reference equality).
  • == operator compares the values of both the operands and checks if they are the same (value equality).
  • Uniqueness of keys in a Python dictionary is by equivalence, not identity.
    • So even though 5, 5.0, and 5 + 0j are distinct objects of different types, since they're equal, they can't all be in the same dict (or set) as separate keys.
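Several of the points above can be checked with a short sketch. All names here (`letters`, `collect`, `reassign`, `mutate`, `data`, `keys`) are hypothetical:

```python
# Slicing: start is inclusive, stop is exclusive.
letters = list("abcdef")
print(letters[1:4])        # ['b', 'c', 'd']
print(letters[-2:])        # ['e', 'f']: negative start counts from the end
print(letters[::-1])       # ['f', 'e', 'd', 'c', 'b', 'a']: negative step reverses


# *args arrives as a tuple, **kwargs as a dict.
def collect(*args, **kwargs):
    return args, kwargs


packed = collect(1, 2, a=3)
unpacked = collect(*[1, 2], **{"a": 3})    # stars at the call site unpack
print(packed == unpacked)  # True


# Call-by-sharing: rebinding the parameter does not affect the caller,
# mutating the shared object does.
def reassign(items):
    items = [0]


def mutate(items):
    items.append(4)


data = [1, 2, 3]
reassign(data)
after_reassign = list(data)
mutate(data)
print(after_reassign, data)    # [1, 2, 3] [1, 2, 3, 4]

# Division, floor division and the walrus operator.
print(7 / 2, 7 // 2, -7 // 2)  # 3.5 3 -4
if (n := len(data)) > 3:
    print(n)                   # 4

# Dict keys are unique by equality, not identity.
keys = {5: "int"}
keys[5.0] = "float"
keys[5 + 0j] = "complex"
print(keys)                    # {5: 'complex'}
```

The dict keeps the first key object (the int 5) and only replaces its value.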