- Python code, written in a `.py` file, is first compiled to what is called bytecode, which is stored in a `.pyc` file.
- This bytecode is then executed by the interpreter, one instruction at a time.
- When a module is imported, Python caches the bytecode of the module as a `.pyc` file in the `__pycache__` folder.
  - `.pyc` is the bytecode of the module.
  - `.pyo` was the bytecode of the module when Python was run with optimisation options (`-O` and `-OO`). Since Python 3.5 (PEP 488), optimised bytecode is stored as `.opt-1.pyc` and `.opt-2.pyc` files instead.
  - `.pyd` is not bytecode: it is a Windows-only extension module, packaged as a DLL.
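To make the compile-then-interpret steps above concrete, here is a minimal sketch that uses the standard `dis` module to look at the bytecode of a small function. The function `add` and the source string are made up for illustration, and the exact opcode names vary between Python versions.

```python
import dis


def add(a, b):
    return a + b


# compile() turns source text into a code object, the same kind of
# object that gets serialised into a .pyc file.
code = compile("x = 1 + 2", "<example>", "exec")
print(type(code).__name__)  # code

# Collect the names of the instructions the interpreter would execute for add().
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```

Calling `dis.dis(add)` instead prints a human-readable listing of the same instructions.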
- Any Python file is a module, its name being the file's base name without the .py extension.
- A package is a collection of Python modules.
  - A regular package also contains an additional `__init__.py` file, to distinguish a package from a directory that just happens to contain a bunch of Python scripts.
    - Since 3.3, `__init__.py` is not required for namespace packages.
    - Namespace packages are for different libraries that reside in different locations, where you want each of them to contribute a subpackage to the parent package.
    - For example, if you have the following project structure:
      - `path1/namespace/` containing `module1.py` and `module2.py`
      - `path2/namespace/` containing `module3.py` and `module4.py`
    - You can then do the following (assuming both `path1` and `path2` are on `sys.path`): `from namespace import module1, module3`
- `CPython` is the reference implementation of the Python programming language, written (mostly) in C.
- `PyPy` is a fast alternative implementation of Python, written in RPython (a restricted subset of Python).
  - `PyPy` is JIT compiled, not AOT compiled.
  - It supports C extensions only through a compatibility layer (`cpyext`), so C extension modules (numpy, scikit etc) often run much slower than in CPython.
  - The JIT also introduces some warm-up overhead, especially noticeable with short scripts.
- CPython mainly uses reference counting for memory management.
- Objects created in Python have a reference count variable that keeps track of the number of references that point to the object. When this count reaches zero, the memory occupied by the object is released.
- Due to potential reference cycle issues, where an instance has a reference to itself, which causes the reference count to never be zero, CPython also uses a cyclic garbage collector.
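The two mechanisms above can be observed on CPython with a short sketch. It relies on CPython-specific behaviour (`sys.getrefcount` and immediate reclamation when the count reaches zero), and the `Node` class is a made-up helper.

```python
import gc
import sys
import weakref


class Node:
    """Tiny object that can point at another Node (or at itself)."""

    def __init__(self):
        self.other = None


a = Node()
before = sys.getrefcount(a)  # includes the temporary reference made by the call
b = a                        # bind a second name to the same object
after = sys.getrefcount(a)
print(after - before)        # 1

gc.disable()                 # keep the collector from running on its own during the demo
cyclic = Node()
cyclic.other = cyclic        # reference cycle: the object refers to itself
probe = weakref.ref(cyclic)  # a weak reference does not keep the object alive
del cyclic                   # the count cannot reach zero because of the self-reference
alive_after_del = probe() is not None
gc.collect()                 # the cyclic collector finds and frees the unreachable cycle
alive_after_collect = probe() is not None
gc.enable()
print(alive_after_del, alive_after_collect)  # True False
```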
- `PyPy`, on the other hand, doesn't use reference counting. Instead, it relies solely on a tracing garbage collector.
- The `GIL` (Global Interpreter Lock) is a mutex (a lock) that allows only one thread at a time to hold control of the Python interpreter.
  - It was added as a simple way of protecting reference counts from race conditions, without the deadlock risk that comes with many fine-grained locks.
  - This means that only one thread can execute Python bytecode at any point in time, which makes it a performance bottleneck in CPU-bound, multi-threaded code.
  - Multiprocessing bypasses this limitation, as it creates a new process with its own interpreter and `GIL`.
  - Existing C extensions rely on the `GIL`, so it is difficult to replace.
  - PyPy also has a `GIL`, for reasons other than reference counting.
- `Cython` is a programming language that is a superset of the Python programming language, designed to give C-like performance.
  - `Cython` is a compiled language that is typically used to generate CPython extension modules.
- Python has excellent interoperability with C, which is one of the reasons for its popularity.
- Double underscores (`__`) are called dunders.
- A double underscore prefix (without a matching double underscore suffix) causes the Python interpreter to rewrite the attribute name in order to avoid naming conflicts in subclasses.
  - This is called name mangling.
  - The interpreter will rename `__baz` to `_ClassName__baz`.
  - The attribute cannot be reached as `__baz` from outside the class (only under its mangled name), and each subclass will have its own `__baz`.
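A small sketch of name mangling in action follows. The classes `Base` and `Child` are hypothetical, and `__baz` follows the attribute name used above.

```python
class Base:
    def __init__(self):
        self.__baz = "base"      # stored on the instance as _Base__baz

    def base_baz(self):
        return self.__baz        # rewritten by the compiler to self._Base__baz


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__baz = "child"     # stored as _Child__baz, so it does not clash with Base

    def child_baz(self):
        return self.__baz


obj = Child()
print(obj.base_baz(), obj.child_baz())  # base child
print(sorted(vars(obj)))                # ['_Base__baz', '_Child__baz']
print(hasattr(obj, "__baz"))            # False
print(obj._Base__baz)                   # base
```

The last line shows that mangling is a naming convention rather than access control: the attribute is still reachable under its mangled name.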
- Dunder methods (like `__init__`) are also called magic methods.
- `__all__` is a list of strings defining what symbols in a module will be exported when `import *` is used on the module.
- `__call__` is used to write classes whose instances behave like functions and can be called like a function.
- `__str__` is a special method used to represent a class's objects as a string, called by the `str()` function.
- `__repr__` is a special method used to represent a class's objects as a string, called by the `repr()` function.
  - `__repr__` is an unambiguous representation of the object, while `__str__` is a user-friendly string explanation.
- `__getitem__` is called when an element is accessed using the array (`[i]`) notation.
- `__setitem__` is called when an element is assigned using the array (`[i]`) notation.
- `__delitem__` is called when an element is deleted using the `del` notation.
- `__len__` is called when the `len()` function is called on the object.
- `__contains__` is called when the `in` operator is used on the object.
- `__add__`, `__sub__`, `__mul__` and `__truediv__` are used for operator overloading.
- `__await__` is implemented by coroutines.
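As an illustration of several of the methods listed above working together, here is a sketch built around a made-up `Vector` class. The class and its behaviour are assumptions chosen for the example, not something defined in these notes.

```python
class Vector:
    """Hypothetical vector type used only to exercise the dunder methods."""

    def __init__(self, *components):
        self.components = list(components)

    def __repr__(self):
        return f"Vector({', '.join(map(repr, self.components))})"

    def __str__(self):
        return f"<{', '.join(map(str, self.components))}>"

    def __len__(self):
        return len(self.components)

    def __getitem__(self, index):
        return self.components[index]

    def __setitem__(self, index, value):
        self.components[index] = value

    def __contains__(self, value):
        return value in self.components

    def __add__(self, other):
        return Vector(*(x + y for x, y in zip(self.components, other.components)))

    def __call__(self, scale):
        return Vector(*(x * scale for x in self.components))


v = Vector(1, 2)
v[0] = 3                       # __setitem__
print(repr(v), str(v))         # Vector(3, 2) <3, 2>
print(len(v), v[1], 2 in v)    # 2 2 True
print(repr(v + Vector(1, 1)))  # Vector(4, 3)
print(repr(v(10)))             # Vector(30, 20)
```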
- Wildcard imports do not import names starting with an underscore.
  - Unless an `__all__` list is defined, overriding the behaviour.
- Wildcard imports should be avoided, as they make it unclear which names are present in the namespace.
  - One rare case where it's fine is when using the REPL for an interactive session, to save typing.
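The wildcard rules above can be checked without creating files on disk. The sketch below builds a throwaway in-memory module (the name `demo_mod` and its variables are made up) and runs `from demo_mod import *` into an empty namespace.

```python
import sys
import types

# Build a throwaway module in memory and register it so it can be imported.
demo = types.ModuleType("demo_mod")
exec("public = 1\n_private = 2\nother = 3\n", demo.__dict__)
sys.modules["demo_mod"] = demo

# Without __all__, names starting with an underscore are skipped.
without_all = {}
exec("from demo_mod import *", without_all)
names_without_all = sorted(n for n in without_all if n != "__builtins__")
print(names_without_all)  # ['other', 'public']

# Defining __all__ overrides the underscore rule.
demo.__all__ = ["public", "_private"]
with_all = {}
exec("from demo_mod import *", with_all)
names_with_all = sorted(n for n in with_all if n != "__builtins__")
print(names_with_all)  # ['_private', 'public']
```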
- When the Python interpreter reads a source file, it does two things:
  - It sets the `__name__` variable.
    - If the module is run as the main program (`python main.py`), then `__name__` is set to `"__main__"`.
    - If the module is imported (`import foo`), `__name__` is set to `"foo"`.
  - It executes all the code in the module, one statement at a time.
    - Imported modules are loaded and assigned to a variable, like so: `math = __import__("math")`.
    - It executes any `def` blocks, creating a function object, then assigning that function object to a variable with the same name as the function.
    - The `if __name__ == "__main__"` block therefore only gets executed if the module is run as the main program.
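A minimal sketch of the pattern described above follows. The function name `main` is a common convention rather than a requirement.

```python
import math


def main():
    """Entry point that should only run when the file is executed directly."""
    return "running as a script"


# An import statement is roughly equivalent to calling __import__ and binding the result.
math_module = __import__("math")
print(math_module is math)  # True

print(__name__)  # "__main__" when run directly, the module's name when imported

if __name__ == "__main__":
    print(main())
```

Saved as `main.py` and run with `python main.py`, the guarded block executes. After `import main` from another module, it does not.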
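- The source gets compiled into bytecode, and the bytecode of every imported module is stored in a `.pyc` file.
  - The script that is run as the main program is compiled as well, but its bytecode is kept in memory rather than written to a `.pyc` file.
  - If the `.pyc` file is up to date (by default checked by comparing the source file's modification time and size with the values recorded in the `.pyc`), the compile step is skipped.
- The bytecode is then interpreted by the Python Virtual Machine, which is part of Python.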
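To see where the cached bytecode would end up, the standard library exposes the path calculation directly. This is a small sketch in which `foo.py` is a hypothetical file name. The exact tag in the result (for example `cpython-312`) depends on the interpreter, so no fixed output is shown.

```python
import importlib.util

# Path of the .pyc that would be written for a (hypothetical) foo.py.
plain = importlib.util.cache_from_source("foo.py")

# Path used when Python runs with -O (optimisation level 1).
optimised = importlib.util.cache_from_source("foo.py", optimization="1")

print(plain)      # ends in .pyc, inside a __pycache__ directory
print(optimised)  # ends in .opt-1.pyc
```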
- Iterators are objects that produce their elements one at a time, so you can loop over them like a list.
  - The `itertools` module contains a number of iterator building functions, like `chain()`, `groupby()` and `product()`.
- With iterators, you do not have to have the whole collection in memory.
  - Very useful for large files, or data streams.
- Iterators are implemented via the `__iter__` and `__next__` methods.
  - `__iter__` should return the object that implements `__next__` (usually `self`).
  - `__next__` should return the next element in the collection.
    - If the sequence is exhausted, it needs to raise `StopIteration`.
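The protocol above can be sketched with a small class-based iterator. `Countdown` is a made-up example class.

```python
from itertools import chain


class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self               # the object itself implements __next__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration   # signals that the sequence is exhausted
        value = self.current
        self.current -= 1
        return value


countdown = list(Countdown(3))
print(countdown)  # [3, 2, 1]

# Iterators compose with itertools helpers such as chain().
chained = list(chain(Countdown(2), "ab"))
print(chained)    # [2, 1, 'a', 'b']
```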
- Generators are a subset of iterators.
  - They are a simpler way of implementing iterators, via the `yield` keyword.
  - `yield` indicates where a value is sent back to the caller, but unlike `return`, the function is not exited afterwards.
    - Instead, the state of the function is remembered, and when `next()` is called on a generator object (either explicitly or implicitly within a for loop), the function resumes from where it left off.
    - If a generator is exhausted, it will raise a `StopIteration` exception.
  - They can also be created via Generator Expressions: `(num**2 for num in range(5))`
    - In this example, `num**2` is the `yield`'ed value.
  - The `send()` method sends a value back into the generator.
  - `close()` stops the generator.
  - `throw()` raises an exception inside the generator, at the point where it is paused.
  - `yield from` enables us to `yield` from an inner generator, and passes any `next()`, `send()` and `throw()` calls on to it.
    - For example (`bar` and `baz` are both generators that `yield` values), a `foo()` whose body is `for v in bar(): yield v` followed by `for v in baz(): yield v` can be written with the body `yield from bar()` followed by `yield from baz()`.
- For cases with complex state, class-based iterators are a better choice.
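The generator behaviour described above can be sketched in a few lines. The function names `squares`, `letters` and `combined` are made up for the example.

```python
def squares(limit):
    """Generator function: yields one square at a time."""
    for num in range(limit):
        yield num ** 2   # execution pauses here until the next value is requested


def letters():
    yield "a"
    yield "b"


def combined():
    # Delegate to two inner generators in turn.
    yield from squares(3)
    yield from letters()


gen = squares(2)
first, second = next(gen), next(gen)
print(first, second)  # 0 1

try:
    next(gen)
    exhausted = False
except StopIteration:     # raised once the generator has nothing left to yield
    exhausted = True
print(exhausted)          # True

total = sum(num ** 2 for num in range(5))  # generator expression, no list is built
print(total)              # 30

merged = list(combined())
print(merged)             # [0, 1, 4, 'a', 'b']
```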
- Coroutines are an extension of generators that can pause and resume their execution context.
  - `yield` allows the function to wait until it gets input.
    - It does this via the `value = yield` statement.
    - The `.send()` method will then send the value to the `yield`, assigning it to `value`.
- The `async` keyword introduces a native coroutine.
  - Before `async` and `await` were introduced in 3.5, this was accomplished by the `@asyncio.coroutine` decorator, which created a generator-based coroutine. The decorator was removed in Python 3.11.
- The `await` keyword suspends the execution of the surrounding coroutine and passes control back to the event loop.
- Coroutines created with `async def` are implemented using the `__await__` dunder method.
  - They can `yield`, which makes them an async generator, but cannot `yield from`, which is reserved for generator-based coroutines.
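Both flavours of coroutine mentioned above can be sketched side by side. The names `averager` and `double` are made up, and `asyncio.sleep(0)` stands in for any awaitable that gives control back to the event loop.

```python
import asyncio


def averager():
    """Generator-based coroutine: receives numbers via send(), yields the running mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # pauses here until send() delivers a value
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)                  # prime the generator: advance to the first yield
first_avg = avg.send(10)
second_avg = avg.send(20)
print(first_avg, second_avg)  # 10.0 15.0
avg.close()


async def double(x):
    """Native coroutine created with async def."""
    await asyncio.sleep(0)   # hand control back to the event loop once
    return x * 2


coro = double(21)
has_await = hasattr(coro, "__await__")
print(has_await)             # True
result = asyncio.run(coro)   # the event loop drives the coroutine to completion
print(result)                # 42
```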
- `asyncio` is a single-threaded, single-process design. It uses cooperative multitasking.
  - It takes long waiting periods in which functions would otherwise be blocking and allows other functions to run during that downtime.
  - The event loop monitors coroutines, taking feedback on what's idle, and looking around for things that can be executed in the meantime. It is able to wake up an idle coroutine when whatever that coroutine is waiting on becomes available.
  - If Python encounters an `await f()` expression in the scope of `g()`, `await` tells the event loop: "Suspend execution of `g()` until whatever I'm waiting on (the result of `f()`) is returned. In the meantime, go let something else run."
- `asyncio` is great when you have multiple IO-bound tasks where the tasks would otherwise be dominated by blocking IO-bound wait time, like network IO.
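A minimal sketch of cooperative multitasking with `asyncio` follows. The coroutine `fetch` is hypothetical, and `asyncio.sleep` stands in for real network IO.

```python
import asyncio
import time


async def fetch(name, delay):
    """Stand-in for a network call; the sleep simulates waiting on IO."""
    await asyncio.sleep(delay)   # suspends this coroutine and lets others run
    return f"{name} done"


async def main():
    # Schedule all three coroutines; their waits overlap instead of adding up.
    return await asyncio.gather(
        fetch("a", 0.1),
        fetch("b", 0.1),
        fetch("c", 0.1),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)             # ['a done', 'b done', 'c done']
print(f"{elapsed:.2f}s")   # typically close to 0.1s rather than 0.3s
```

`asyncio.gather` returns the results in the order the awaitables were passed in, regardless of which one finishes first.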
- Type annotations are optional and are not enforced at runtime.
  - They get added to the object's `__annotations__` dictionary.
- To check types, third-party static type checkers such as `mypy` can be used.
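The following sketch shows that annotations are stored but not checked at runtime. The function `greet` is made up for illustration.

```python
def greet(name: str, times: int = 1) -> str:
    return ", ".join([f"hello {name}"] * times)


annotations = greet.__annotations__
print(annotations)
# {'name': <class 'str'>, 'times': <class 'int'>, 'return': <class 'str'>}

# Nothing stops a caller from passing an int where a str is annotated.
unchecked = greet(42)
print(unchecked)  # hello 42
```

A static checker such as `mypy` would flag the `greet(42)` call, even though the interpreter runs it without complaint.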
- Python packages are distributed as either a source distribution, or a wheel.
- A source distribution (`sdist`) contains source code.
  - That includes not only Python code but also the source code of any extension modules (usually in C or C++) bundled with the package.
  - With source distributions, extension modules are compiled on the user's side rather than the developer's.
  - When installing an `sdist`, pip will first download the archive (`.tar.gz`), and then build a wheel (`.whl`) from it.
  - Usually complex packages that can't be distributed as a wheel will use `sdist`.
- A wheel is a ready to go, pre-built format. When pip downloads a wheel, there is no build stage.
  - Wheels are much faster and easier to use.
  - pip will always prefer a compatible wheel if one exists.
- Each Python process has a main thread.
- Due to the `GIL`, only one thread can execute Python bytecode at any time. This means that threads run concurrently: the execution will switch from one thread to another.
  - This means threading is bad for CPU-bound tasks, as it won't make the execution faster. It may even make it slower due to the overhead of creating and switching between threads.
  - Threading is good for IO and network-bound tasks, as the execution can continue on another thread while one thread is blocked.
- Multiprocessing creates a fresh new process with its own interpreter and `GIL`.
  - This means that the processes can execute at the same time, on separate CPU cores.
- Multiprocessing is parallelism, while multithreading is concurrency.
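As a sketch of why threads help with IO-bound work, the example below runs four blocking calls on a thread pool. The function `blocking_io` is hypothetical, and `time.sleep` stands in for a network or disk call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(task_id):
    """Stand-in for a blocking network or disk call."""
    time.sleep(0.2)   # a sleeping thread releases the GIL, so other threads can run
    return task_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(blocking_io, range(4)))
elapsed = time.perf_counter() - start

print(results)             # [0, 1, 2, 3]
print(f"{elapsed:.2f}s")   # typically close to 0.2s rather than 0.8s
```

For CPU-bound work, one option is to swap in `concurrent.futures.ProcessPoolExecutor`, which has the same interface but runs the tasks in separate processes. On platforms that spawn new processes, the pool should be created under an `if __name__ == "__main__":` guard.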
- The slice notation: `array[start:stop:step]`
  - Any of the three values can be left empty, in which case they will be substituted by their default value.
    - The default values are: `start`: beginning of the array, `stop`: end of the array, `step`: 1. With a negative `step`, the `start` and `stop` defaults are swapped.
  - `start` is inclusive, while `stop` is exclusive.
  - All three values can be negative: a negative `start` or `stop` counts from the end of the array, and a negative `step` walks through the array in reverse.
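A few worked slices make the rules above concrete. The list `letters` is made up for the example.

```python
letters = ["a", "b", "c", "d", "e", "f"]

print(letters[1:4])     # ['b', 'c', 'd']   start inclusive, stop exclusive
print(letters[:3])      # ['a', 'b', 'c']   start defaults to the beginning
print(letters[3:])      # ['d', 'e', 'f']   stop defaults to the end
print(letters[::2])     # ['a', 'c', 'e']   every second element
print(letters[-2:])     # ['e', 'f']        negative start counts from the end
print(letters[::-1])    # ['f', 'e', 'd', 'c', 'b', 'a']   negative step reverses
print(letters[4:1:-1])  # ['e', 'd', 'c']   from index 4 down to (not including) 1
```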
- A single star in a function declaration (`*arg`) allows a variable number of positional arguments to be passed to the parameter.
  - The parameter will be a `tuple` in the function.
- A double star in a function declaration (`**arg`) allows multiple keyword arguments (`a=3, b=5`) to be passed to the parameter.
  - The parameter will be a `dict` in the function.
- A single star, when calling a function, unpacks the values of a list (or any iterable) into positional arguments, while a double star unpacks the items of a dict into keyword arguments.
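Both directions (collecting in a declaration, unpacking in a call) can be sketched as follows. The functions `describe` and `area` are made up for illustration.

```python
def describe(*args, **kwargs):
    """Collect extra positional arguments in a tuple and keyword arguments in a dict."""
    return type(args).__name__, type(kwargs).__name__, args, kwargs


collected = describe(1, 2, a=3, b=5)
print(collected)  # ('tuple', 'dict', (1, 2), {'a': 3, 'b': 5})


def area(width, height):
    return width * height


dims = [2, 3]
named = {"width": 4, "height": 5}
print(area(*dims))    # 6   list values become positional arguments
print(area(**named))  # 20  dict items become keyword arguments
```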
- A class method (`@classmethod`) receives the class as an implicit first argument. Useful as an alternative constructor.
- Lists can be created via List Comprehension: `[num**2 for num in range(5)]`
- `/` is regular (true) division, while `//` is floor division (the result is rounded down, towards negative infinity).
- The walrus operator (`:=`) assigns a value to a variable, and evaluates to that value.
  - Useful in situations where you'd want to assign values to variables within an expression.
- `PEP` is shorthand for `Python Enhancement Proposal`.
- Everything in Python is an object.
  - Assignment operation is just binding a name to an object.
    - So an assignment doesn't copy the value. It just places a kind of identification like a "post-it" note on the box.
    - The name is just a reference to the object, not the object itself.
  - All values are boxed.
- Class `type` is the metaclass of class `object`, and every class (including `type`) inherits directly or indirectly from `object`.
  - Metaclasses are the 'stuff' that creates classes.
  - `type` is the metaclass Python uses to create all classes behind the scenes.
  - `type` is its own metaclass. This is not something you could reproduce in pure Python, and is done by cheating a little bit at the implementation level.
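Several of the points above fit in one short sketch. The `Temperature` class and the `Point` class created through `type` are made-up examples.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        """Alternative constructor: cls is the class itself, not an instance."""
        return cls((fahrenheit - 32) * 5 / 9)


boiling = Temperature.from_fahrenheit(212)
print(boiling.celsius)          # 100.0

print(7 / 2, 7 // 2, -7 // 2)   # 3.5 3 -4

data = [1, 2, 3, 4]
if (n := len(data)) > 3:        # assign and test in a single expression
    print(n)                    # 4

a = [1, 2]
b = a                           # binds a second name to the same list; nothing is copied
print(a is b)                   # True

# type is the metaclass of object, of ordinary classes, and of itself.
print(type(object) is type, type(int) is type, type(type) is type)  # True True True
print(isinstance(type, object), issubclass(type, object))           # True True

# Calling type with three arguments creates a class dynamically.
Point = type("Point", (), {"x": 0})
print(Point.__name__, Point().x)  # Point 0
```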
- Immutable objects: `bool`, `int`, `float`, `str`, `tuple`, `frozenset`.
- Mutable objects: `list`, `set`, `dict`.
- Python is call-by-sharing, also known as call-by-object, call-by-object-sharing or pass-by-object-reference.
  - A function receives a reference to the same object in memory as used by the caller, but binds it to its own local variable.
  - This means that if the variable is reassigned inside the function, it will not affect the caller's object.
  - However, if the object is modified in place, the caller will see the change (for example, via the `.append(..)` method).
  - In practice, mutable objects appear to act like call-by-reference and immutable objects like call-by-value.
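The difference between rebinding and mutating can be sketched with two small functions, `reassign` and `mutate`, both made up for the example.

```python
def reassign(items):
    items = [0]          # rebinds the local name only; the caller's list is untouched
    return items


def mutate(items):
    items.append(4)      # modifies the shared object in place


numbers = [1, 2, 3]

reassign(numbers)
after_reassign = list(numbers)
print(after_reassign)    # [1, 2, 3]

mutate(numbers)
after_mutate = list(numbers)
print(after_mutate)      # [1, 2, 3, 4]
```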
- String interning is a CPython optimization that tries to reuse existing immutable objects in some cases rather than creating a new object every time.
  - As an implementation detail, string constants made up only of ASCII letters, digits and underscores are interned.
- The `is` operator checks if both operands refer to the same object; i.e. it checks whether the identities of the operands match (reference equality).
- The `==` operator compares the values of both operands and checks if they are the same (value equality).
- Uniqueness of keys in a Python dictionary is by equivalence (equal value and equal hash), not identity.
  - So even though `5`, `5.0`, and `5 + 0j` are distinct objects of different types, since they're equal, they can't all be separate keys in the same dict (or separate elements of the same set).
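The points above on identity, equality and interning can be checked with a short sketch. It uses `sys.intern` to intern strings explicitly, since automatic interning is an implementation detail that should not be relied upon.

```python
import sys

a = [1, 2]
b = [1, 2]
print(a == b, a is b)   # True False  (equal values, distinct objects)

# Explicit interning guarantees that equal strings share one object.
s1 = sys.intern("hello world!")
s2 = sys.intern("".join(["hello", " world!"]))
print(s1 is s2)         # True

# Equal numbers of different types also hash the same, so they collide as dict keys.
print(5 == 5.0 == 5 + 0j)                     # True
print(hash(5) == hash(5.0) == hash(5 + 0j))   # True

d = {5: "int", 5.0: "float", 5 + 0j: "complex"}
print(d)                        # {5: 'complex'}  first key kept, last value wins
print(len({5, 5.0, 5 + 0j}))    # 1
```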