- Python code, written in a `.py` file, is first compiled to what is called bytecode, which is stored in a `.pyc` file.
  - This bytecode is then executed by the interpreter, one instruction at a time.
- When a module is imported, Python will cache the bytecode of the module as a `.pyc` file in the `__pycache__` folder.
  - `.pyc` is the bytecode of the module.
  - `.pyo` was the bytecode of the module if Python was run with optimisation options (`-O` and `-OO`). Since Python 3.5, optimised bytecode is stored in `.pyc` files with an `opt-` tag in the file name instead.
  - `.pyd` is Windows only and is packaged as a DLL. It is a compiled extension module, not cached bytecode.
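The compile step described above can be observed from Python itself. This is a minimal sketch: `add` and `foo.py` are made-up names, and the exact opcodes and cache file name depend on the Python version in use.

```python
import dis
import importlib.util


def add(a, b):
    return a + b


# Names of the bytecode instructions the function was compiled to.
# The exact opcodes differ between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# Where the import system would cache bytecode for a hypothetical foo.py.
cache_path = importlib.util.cache_from_source("foo.py")
print(cache_path)  # e.g. __pycache__/foo.cpython-312.pyc
```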
- Any Python file is a module, its name being the file's base name without the .py extension.
- A package is a collection of Python modules.
  - A package also contains an additional `__init__.py` file, to distinguish a package from a directory that just happens to contain a bunch of Python scripts.
    - Since 3.3, `__init__.py` is not required for namespace packages.
      - Namespace packages are for different libraries that reside in different locations, where you want each of them to contribute a subpackage to the parent package.
      - For example, if you have the following project structure (with both `path1` and `path2` on `sys.path`):

            path1/
                namespace/
                    module1.py
                    module2.py
            path2/
                namespace/
                    module3.py
                    module4.py

        You can then do the following:

            from namespace import module1, module3
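The namespace package layout above can be reproduced in a throwaway directory. This is only a sketch: the package name `demo_ns` and the `NAME` constant are made up for the demonstration, and the directories are created in a temporary location.

```python
import sys
import tempfile
from pathlib import Path

# Build two separate roots that each hold one part of the same package.
# No __init__.py is created anywhere, so demo_ns is a namespace package.
root = Path(tempfile.mkdtemp())
for parent, module in [("path1", "module1"), ("path2", "module3")]:
    pkg_dir = root / parent / "demo_ns"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / f"{module}.py").write_text(f"NAME = {module!r}\n")
    sys.path.append(str(root / parent))

from demo_ns import module1, module3

print(module1.NAME, module3.NAME)  # module1 module3
```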
- `CPython` is the reference implementation of the Python programming language, written in (mostly) C.
- `PyPy` is a fast alternative implementation of Python, written in RPython (a restricted subset of Python).
  - `PyPy` is JIT compiled, not AOT compiled.
  - It supports C extensions only through a compatibility layer, so C extension modules (numpy, scikit etc.) often run much slower than in CPython.
  - The JIT also introduces some warm-up overhead, especially noticeable with short scripts.
- CPython mainly uses reference counting for memory management.
  - Objects created in Python have a reference count variable that keeps track of the number of references that point to the object. When this count reaches zero, the memory occupied by the object is released.
  - Reference cycles, where objects refer to each other (or an instance refers to itself), keep the reference count from ever reaching zero, so CPython also uses a cyclic garbage collector.
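Both mechanisms can be observed on CPython. The sketch below is CPython-specific (other implementations may report different numbers), and the `Node` class is a made-up example.

```python
import gc
import sys


class Node:
    def __init__(self):
        self.other = None


a = Node()
# getrefcount reports one extra reference: its own argument.
before = sys.getrefcount(a)
b = a                      # a second name bound to the same object
after = sys.getrefcount(a)
print(after - before)      # 1

# Build a reference cycle, then drop every outside reference to it.
gc.collect()               # start from a clean slate
x, y = Node(), Node()
x.other, y.other = y, x
del x, y
# Reference counting alone cannot free the pair; the cycle collector can.
unreachable = gc.collect()
print(unreachable >= 2)    # True
```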
- `PyPy`, on the other hand, doesn’t use reference counting. Instead, it relies only on a tracing garbage collector.
- The `GIL` (Global Interpreter Lock) is a mutex (a lock) that allows only one thread to hold control of the Python interpreter.
  - It was added as a simple way of preventing race conditions and other memory issues during reference counting.
  - This means that only one thread can be executing Python bytecode at any point in time, which makes it a performance bottleneck in CPU-bound, multi-threaded code.
  - Multiprocessing bypasses this limitation, as it creates a new process with its own interpreter and `GIL`.
  - Existing C extensions rely on the `GIL`, so it is difficult to replace.
  - PyPy also has a `GIL`, for reasons other than reference counting.
- `Cython` is a programming language that is a superset of the Python programming language, designed to give C-like performance.
  - `Cython` is a compiled language that is typically used to generate CPython extension modules.
- Python has excellent interoperability with C, which is one of the reasons for its popularity.
- Double underscores (`__`) are called dunders.
- A double underscore prefix causes the Python interpreter to rewrite the attribute name in order to avoid naming conflicts in subclasses.
  - This is called name mangling.
  - Inside a class named `ClassName`, the interpreter will rename `__baz` to `_ClassName__baz`.
  - The attribute cannot be reached as `__baz` from outside the class (only under its mangled name), and each subclass will have its own `__baz`.
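Name mangling is easiest to see by inspecting an instance. In this small sketch, `Base` and `Child` are illustrative class names.

```python
class Base:
    def __init__(self):
        self.__baz = "base"      # stored as _Base__baz

    def base_baz(self):
        return self.__baz


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__baz = "child"     # stored as _Child__baz, no clash

    def child_baz(self):
        return self.__baz


c = Child()
print(c.base_baz(), c.child_baz())   # base child
print(sorted(vars(c)))               # ['_Base__baz', '_Child__baz']
print(hasattr(c, "__baz"))           # False
```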
- Dunder methods (like `__init__`) are also called magic methods.
  - `__all__` (a module-level variable rather than a method) is a list of strings defining what symbols in a module will be exported when `import *` is used on the module.
  - `__call__` is used to write classes whose instances behave like functions and can be called like a function.
  - `__str__` is a special method used to represent a class’s objects as a string, called by the `str()` function.
  - `__repr__` is a special method used to represent a class’s objects as a string, called by the `repr()` function.
    - `__repr__` is an unambiguous representation of the object, while `__str__` is a user-friendly string explanation.
  - `__getitem__` is called when an element is accessed using the index (`[i]`) notation.
  - `__setitem__` is called when an element is assigned using the index (`[i]`) notation.
  - `__delitem__` is called when an element is deleted using the `del` statement.
  - `__len__` is called when the `len()` function is called on the object.
  - `__contains__` is called when the `in` operator is used on the object.
  - `__add__`, `__sub__`, `__mul__` and `__truediv__` are used for operator overloading.
  - `__await__` is implemented by awaitable objects, such as coroutines.
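Several of the dunder methods listed above can be combined in one small class. The `Playlist` container below is a hypothetical example written for illustration only.

```python
class Playlist:
    """A hypothetical container used only for illustration."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __repr__(self):
        return f"Playlist({self._songs!r})"

    def __str__(self):
        return ", ".join(self._songs)

    def __len__(self):
        return len(self._songs)

    def __getitem__(self, index):
        return self._songs[index]

    def __contains__(self, song):
        return song in self._songs

    def __add__(self, other):
        return Playlist(self._songs + other._songs)

    def __call__(self, prefix):
        return [s for s in self._songs if s.startswith(prefix)]


p = Playlist(["intro", "verse"]) + Playlist(["outro"])
print(repr(p))        # Playlist(['intro', 'verse', 'outro'])
print(str(p))         # intro, verse, outro
print(len(p), p[0])   # 3 intro
print("verse" in p)   # True
print(p("o"))         # ['outro']
```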
- Wildcard imports do not import names starting with an underscore.
  - Unless an `__all__` list is defined, overriding the behaviour.
  - Wildcard imports should be avoided, as they make it unclear which names are present in the namespace.
    - One rare case where it's fine is when using the REPL for an interactive session, to save typing.
- When the Python interpreter reads a source file, it does two things:
  - It sets the `__name__` variable.
    - If the module is run as the main program (`python main.py`), then `__name__` is set to `"__main__"`.
    - If the module is imported (`import foo`), `__name__` is set to `"foo"`.
  - It executes all the code in the module, one statement at a time.
    - Imported modules are loaded and assigned to a variable, like so: `math = __import__("math")`.
    - It executes any `def` blocks, creating a function object, then assigning that function object to a variable with the same name as the function.
  - The `if __name__ == "__main__"` block therefore only gets executed if the module is run as the main program.
- The source file gets compiled into bytecode. For imported modules, the bytecode is also stored in a `.pyc` file.
  - The script that is run as the main program is compiled in memory and is not cached.
  - If the `.pyc` file is up to date (checked by comparing timestamps), then the compilation step is skipped.
- The bytecode is then interpreted by the Python Virtual Machine, which is part of Python.
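A minimal sketch of the `__name__` check described above. The `main` function is a made-up entry point; saving the snippet as a file and both running and importing it shows the two values of `__name__`.

```python
def main():
    # Hypothetical entry point for the script.
    return "running as a script"


# "__main__" when the file is run directly, the module name when imported.
print(__name__)

if __name__ == "__main__":
    print(main())
```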
- Iterators are objects that you can loop over like a list.
  - The `itertools` module contains a number of iterator building functions, like `chain()`, `groupby()` and `product()`.
- With iterators, you do not have to have the whole collection in memory.
  - Very useful for large files, or data streams.
- Iterators are implemented via the `__iter__` and `__next__` methods.
  - `__iter__` should return the object that implements `__next__` (usually `self`).
  - `__next__` should return the next element in the collection.
    - If the sequence is exhausted, it needs to raise `StopIteration`.
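The iterator protocol above can be sketched with a small class. `Countdown` is an illustrative name, not something from the standard library.

```python
from itertools import chain


class Countdown:
    """Iterator that counts down from `start` to 1 (illustrative name)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))                       # [3, 2, 1]
print(list(chain(Countdown(2), Countdown(1))))  # [2, 1, 1]
```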
- Generators are a subset of iterators.
  - They are a simpler way of implementing iterators, via the `yield` keyword.
  - `yield` indicates where a value is sent back to the caller, but unlike `return`, the function is not exited afterwards.
    - Instead, the state of the function is remembered, and when `next()` is called on a generator object (either explicitly or implicitly within a for loop), the function resumes from where it left off.
    - If a generator is exhausted, it will raise a `StopIteration` exception.
  - They can also be created via Generator Expressions: `(num**2 for num in range(5))`
    - In this example, `num**2` is the `yield`'ed value.
  - The `send()` method sends a value back to the generator. `close()` stops the generator. `throw()` raises an exception inside the generator.
  - `yield from` enables us to `yield` from an inner generator, and pass any `next()`, `send()` and `throw()` calls to it.
    - For example (`bar` and `baz` are both generators that `yield` values):

          def foo():
              for v in bar():
                  yield v
              for v in baz():
                  yield v

      Can be written as:

          def foo():
              yield from bar()
              yield from baz()
- For cases with complex state, iterators are a better choice.
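The generator features above (`yield`, `send()`, `close()` and `yield from`) fit in one short sketch. All function names here are made up for the example.

```python
def running_total():
    """Receives numbers via send() and yields the running sum."""
    total = 0
    while True:
        value = yield total   # pauses here; send() resumes with a value
        total += value


def squares(n):
    for num in range(n):
        yield num ** 2


def combined():
    yield from squares(3)     # delegates to the inner generator
    yield from (ch for ch in "ab")


acc = running_total()
next(acc)                     # prime the generator: run to the first yield
print(acc.send(5))            # 5
print(acc.send(10))           # 15
acc.close()

print(list(combined()))       # [0, 1, 4, 'a', 'b']
```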
- Coroutines are an extension of generators that can pause and resume their execution context.
  - `yield` allows the function to wait until it gets input.
    - It does this via the `value = yield` statement.
    - The `.send()` method will then send the value to the `yield`, assigning it to `value`.
- The `async` keyword introduces a native coroutine.
  - Before `async` and `await` were introduced in 3.5, this was accomplished by the `@asyncio.coroutine` decorator (since removed in 3.11), which created a generator-based coroutine.
- The `await` keyword suspends the execution of the surrounding coroutine and passes control back to the event loop.
- Coroutines created with `async def` are implemented using the `__await__` dunder method.
  - They can `yield`, which makes them an async generator, but cannot `yield from` - that is for generator-based coroutines.
- `asyncio` is a single-threaded, single-process design. It uses cooperative multitasking.
  - It takes long waiting periods in which functions would otherwise be blocking and allows other functions to run during that downtime.
  - The event loop monitors coroutines, taking feedback on what’s idle, and looking around for things that can be executed in the meantime. It is able to wake up an idle coroutine when whatever that coroutine is waiting on becomes available.
  - If Python encounters an `await f()` expression in the scope of `g()`, `await` tells the event loop: “Suspend execution of `g()` until whatever I’m waiting on (the result of `f()`) is returned. In the meantime, go let something else run.”
  - `asyncio` is great when you have multiple IO-bound tasks where the tasks would otherwise be dominated by blocking IO-bound wait time, like network IO.
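A small sketch of cooperative multitasking with `asyncio`. The `fetch` coroutine is hypothetical, and `asyncio.sleep` stands in for real network IO.

```python
import asyncio
import time


async def fetch(name, delay):
    # asyncio.sleep stands in for a network call (an assumption for the demo).
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # Both coroutines wait at the same time on a single thread.
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results)             # ['a', 'b']
print(f"{elapsed:.2f}s")   # typically close to 0.2s, since the waits overlap
```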
- Type annotations are optional and are not enforced at runtime.
  - They get added to the object's `__annotations__` dictionary.
- To enforce type checks, third-party tools such as `mypy` can be used.
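The following sketch shows that annotations are stored but not checked when the code runs. The function `scale` is a made-up example.

```python
def scale(value: int, factor: float = 2.0) -> float:
    return value * factor


print(scale.__annotations__)
# {'value': <class 'int'>, 'factor': <class 'float'>, 'return': <class 'float'>}

# Nothing checks the annotations at runtime: a str and an int are accepted.
print(scale("ab", 2))   # abab
```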
- Python packages are distributed as either a source distribution, or a wheel.
- A source distribution (`sdist`) contains source code.
  - That includes not only Python code but also the source code of any extension modules (usually in C or C++) bundled with the package.
  - With source distributions, extension modules are compiled on the user’s side rather than the developer’s.
  - When installing an `sdist`, pip will first download the archive (`.tar.gz`), and then build a wheel (`.whl`).
  - Usually complex packages that can't be distributed as a wheel will use `sdist`.
- A wheel is a ready to go, pre-built format. When pip downloads a wheel, there is no build stage.
  - Wheels are much faster and easier to use.
  - pip will prefer a compatible wheel if one exists.
- Each Python process has a main thread.
- Due to the `GIL`, only one thread can be active at any time. This means that threads run concurrently: the execution switches from one thread to another.
  - This means threading is bad for CPU bound tasks, as it won't make the execution faster. It may even make it slower due to the overhead of creating threads.
  - Threading is good for IO and network bound tasks, as the execution can continue on another thread while one thread is blocked.
- Multiprocessing creates a fresh new process with its own interpreter and `GIL`.
  - This means that all processes can execute at the same time, given enough CPU cores.
- Multiprocessing is parallelism, while multithreading is concurrency.
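A hedged sketch of threads helping with IO-bound work. `blocking_io` is a made-up function, and `time.sleep` stands in for a blocking call that releases the `GIL` while waiting.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(n):
    # time.sleep stands in for a blocking network or disk call;
    # it releases the GIL while waiting.
    time.sleep(0.2)
    return n * 2


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(blocking_io, range(4)))
elapsed = time.perf_counter() - start

print(results)             # [0, 2, 4, 6]
print(f"{elapsed:.2f}s")   # roughly 0.2s rather than 0.8s
```

For CPU-bound functions, swapping in `ProcessPoolExecutor` from the same module is the usual way to get real parallelism, at the cost of starting processes and pickling arguments.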
- The slice notation: `array[start:stop:step]`
  - Any of the three values can be left empty, in which case they will be substituted by their default value.
    - The default values are: `start`: beginning of the array, `stop`: end of the array, `step`: 1.
  - `start` is inclusive, while `stop` is exclusive.
  - All three values can be negative: a negative `start` or `stop` is counted from the end of the array, and a negative `step` walks the array in reverse order.
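A few slices on a sample list make the rules above concrete (the list contents are arbitrary):

```python
letters = ["a", "b", "c", "d", "e", "f"]

print(letters[1:4])     # ['b', 'c', 'd']  start inclusive, stop exclusive
print(letters[:2])      # ['a', 'b']       start defaults to the beginning
print(letters[::2])     # ['a', 'c', 'e']  every second element
print(letters[-2:])     # ['e', 'f']       negative start counts from the end
print(letters[::-1])    # ['f', 'e', 'd', 'c', 'b', 'a']  negative step reverses
```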
- A single star in a function declaration (`*arg`) allows a variable number of positional arguments to be passed to the parameter.
  - The parameter will be a tuple in the function.
- A double star in a function declaration (`**arg`) allows multiple keyword arguments (`a=3, b=5`) to be passed to the parameter.
  - The parameter will be a dict in the function.
- When calling functions, a single star unpacks an iterable (such as a list) into positional arguments, while a double star unpacks a dict into keyword arguments.
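Both directions (collecting in the declaration, unpacking at the call site) can be sketched with one made-up function, `describe`:

```python
def describe(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword ones.
    return args, kwargs


print(describe(1, 2, a=3, b=5))    # ((1, 2), {'a': 3, 'b': 5})

values = [1, 2]
options = {"a": 3, "b": 5}
# Unpacking at the call site produces the same call as above.
print(describe(*values, **options) == describe(1, 2, a=3, b=5))   # True
```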
- A class method (`@classmethod`) receives the class as implicit first argument. Useful as an alternative constructor.
- Lists can be created via List Comprehension: `[num**2 for num in range(5)]`
- `/` is regular division, while `//` is floor division (the result is rounded down).
- The walrus operator (`:=`) assigns a value to a variable, and evaluates to that value.
  - Useful in situations where you'd want to assign values to variables within an expression.
- `PEP` is shorthand for `Python Enhancement Proposal`.
- Everything in Python is an object.
  - Assignment operation is just binding a name to an object.
    - So an assignment doesn’t copy the value. It just places a kind of identification like a “post-it” note on the box.
    - The name is just a reference to the object, not the object itself.
    - All values are boxed.
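Several of the points above (alternative constructors, floor division, the walrus operator and name binding) can be seen in a short sketch. The `Point` class and its `from_string` method are made up for the example.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_string(cls, text):
        # Alternative constructor: cls is the class itself.
        x, y = (int(part) for part in text.split(","))
        return cls(x, y)


p = Point.from_string("3,4")
print(p.x, p.y)            # 3 4

print(7 / 2, 7 // 2)       # 3.5 3
print(-7 // 2)             # -4, floor division rounds towards negative infinity

data = [1, 2, 3, 4]
if (n := len(data)) > 3:   # assign and test in one expression
    print(n)               # 4

a = [1, 2]
b = a                      # binds a second name to the same list, no copy
b.append(3)
print(a, a is b)           # [1, 2, 3] True
```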
- Class `type` is the metaclass of class `object`, and every class (including `type`) inherits directly or indirectly from `object`.
  - Metaclasses are the 'stuff' that creates classes.
  - `type` is the metaclass Python uses to create all classes behind the scenes.
  - `type` is its own metaclass. This is not something you could reproduce in pure Python, and is done by cheating a little bit at the implementation level.
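These relationships can be checked directly in the interpreter. The `Greeter` class is a made-up example of creating a class by calling `type` with three arguments.

```python
print(type(object) is type)        # True: type is the metaclass of object
print(type(type) is type)          # True: type is its own metaclass
print(issubclass(type, object))    # True: type inherits from object

# Calling type(name, bases, namespace) creates a class dynamically.
Greeter = type("Greeter", (object,), {"greet": lambda self: "hello"})
print(type(Greeter) is type, Greeter().greet())   # True hello
```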
- Immutable objects: `bool`, `int`, `float`, `str`, `tuple`, `frozenset`.
- Mutable objects: `list`, `set`, `dict`.
. - Python is call-by-sharing, also known as call-by-object, “call-by-object-sharing” or pass-by-object-reference.
  - A function receives a reference to the same object in memory as used by the caller, but will create its own variable.
  - This means that if the variable in the function is reassigned, it will not affect the original object.
  - However, if the object is modified in place, the original will be affected (for example, via the `.append(..)` method).
  - In practice, mutable objects act like call-by-reference and immutable objects act like call-by-value.
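The difference between rebinding and mutating can be sketched with two made-up functions, `reassign` and `mutate`:

```python
def reassign(items):
    items = [0]          # rebinds the local name only
    return items


def mutate(items):
    items.append(99)     # modifies the object the caller also sees


original = [1, 2]
reassign(original)
print(original)          # [1, 2]
mutate(original)
print(original)          # [1, 2, 99]
```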
- String Interning is a CPython optimization that tries to use existing immutable objects in some cases rather than creating a new object every time.
  - String literals containing only ASCII letters, digits and underscores are interned.
- The `is` operator checks if both the operands refer to the same object; i.e. it checks if the identity of the operands matches or not (reference equality).
- The `==` operator compares the values of both the operands and checks if they are the same (value equality).
- Uniqueness of keys in a Python dictionary is by equivalence, not identity.
  - So even though `5`, `5.0`, and `5 + 0j` are distinct objects of different types, since they're equal (and hash alike), they can't all be separate keys in the same dict (or separate members of the same set).
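Identity, equality and key equivalence can all be checked in a few lines (the values are arbitrary):

```python
a = [1, 2]
b = [1, 2]
print(a == b, a is b)     # True False: equal values, distinct objects

d = {5: "int"}
d[5.0] = "float"          # 5 == 5.0 and hash(5) == hash(5.0): same key
d[5 + 0j] = "complex"
print(d)                  # {5: 'complex'}
print(len({5, 5.0, 5 + 0j}))   # 1
```

The dict keeps the first key object it saw (`5`) and only replaces the value on later assignments.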