Last active
October 3, 2018 20:37
LING78100 Lecture 4
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Lecture 4: Functions & generators" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Computer programs naturally become complex; one way to manage this complexity and keep them maintainable is to reuse blocks of code. Today we will learn about two ways to do this: _functions_ and _generators_." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Functions" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Functions can be thought of as named chunks of code with special conventions for passing data to and from the code chunk. Like variables, functions must be defined before they are used. Their definition resembles a normal block of code, but makes use of two special keywords: `def` (short for \"define\") and `return`." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### No arguments" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The simplest form of a function is one which has no input arguments. It has the following form:\n", | |
"\n", | |
" def NAME():\n", | |
" EXPR1\n", | |
" ...\n", | |
" \n", | |
"where `NAME` is a valid variable name and `EXPR1` is one or more indented expressions." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def greeting1():\n", | |
" print(\"Hello, world!\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To actually invoke the function, we use its name followed by a pair of parentheses (but not the colon)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Hello, world!\n" | |
] | |
} | |
], | |
"source": [ | |
"greeting1()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Simple arguments" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Functions can be defined to take one or more _arguments_, indicating values to be \"passed into\" the block of code. Such functions take the following form:\n", | |
"\n", | |
" def NAME(ARG1):\n", | |
" EXPR1\n", | |
" ...\n", | |
" \n", | |
"or\n", | |
"\n", | |
" def NAME(ARG1, ARG2):\n", | |
" EXPR1\n", | |
" ...\n", | |
"\n", | |
"and so on, where `ARG1`, `ARG2`, etc. are valid variable names." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def greeting2(name, location):\n", | |
" print(f\"{name}, welcome to {location}.\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To invoke such functions, one must pass expressions in place of these arguments. The expressions are then bound to (i.e., assigned to) the argument variables when the function execution begins." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Kyle, welcome to Queens.\n" | |
] | |
} | |
], | |
"source": [ | |
"greeting2(\"Kyle\", \"Queens\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Return values" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Function execution terminates when it reaches the end of the indented function body, or when they hit a _return statement_, whichever comes first. The simplest return statement is just the `return` keyword on a single line:\n", | |
"\n", | |
" return\n", | |
"\n", | |
"This can be used to halt function execution. However, the `return` keyword is also used to return data from a function, in which case the statement takes the following form:\n", | |
"\n", | |
"\n", | |
" return EXPR\n", | |
" \n", | |
"where `EXPR` is a Python expression." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def add_2(x):\n", | |
" return x + 2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A function invocation that returns a value is an expression evaluating to that variable, and we can, for instance, assign the value to a variable." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"4\n" | |
] | |
} | |
], | |
"source": [ | |
"four = add_2(2)\n", | |
"print(four)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"But if a function `return`s without a following expression, the value \"returned\" is the special `None` type." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"None\n" | |
] | |
} | |
], | |
"source": [ | |
"def void():\n", | |
" return\n", | |
"\n", | |
"none = void()\n", | |
"print(none)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This is all I meant earlier when I said \"`None` is the type returned by a function that returns nothing.\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If a function has more than one return statement, its execution terminates when the first one is reached. Similarly, if a function has a return statement but execution reaches the end the function body without hitting said return statement, the function terminates, returning nothing (i.e., `None`)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"None\n" | |
] | |
} | |
], | |
"source": [ | |
"def weird():\n", | |
" if False:\n", | |
" return True\n", | |
"\n", | |
"print(weird())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In some cases it is desirable to return multiple values. The simplest way to do this is to return a tuple." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import math # We will cover this syntax at a later date.\n", | |
"\n", | |
"# This is just a hasty version of `math.modf`.\n", | |
"def modf(f):\n", | |
" \"\"\"Returns the integral and fractional components of a floating point number.\n", | |
" \n", | |
" This is a hasty version of math.modf.\n", | |
" \n", | |
" Args:\n", | |
" f: A number type.\n", | |
"\n", | |
" Returns:\n", | |
" A (integral component, fractional component) tuple.\n", | |
" \"\"\"\n", | |
" integral = math.trunc(f)\n", | |
" fractional = f - integral\n", | |
" return (integral, fractional)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Default arguments" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Functions can also support _optional arguments_, i.e., ones which have default values assigned to the variable if not otherwise specified at invocation. In Python, a default argument definition takes the form `ARG=EXPR` where `ARG` is a valid variable name and `EXPR` is a Python expression. All optional arguments must appear after all non-optional (or _positional_) arguments in the argument list." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def greeting3(name, location=\"Manhattan\"):\n", | |
" print(f\"{name}, welcome to {location}.\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"When we invoke a function with optional arguments, we can:\n", | |
"\n", | |
"* provide the optional argument using `ARG=EXPR`,\n", | |
"* provide the optional argument \"positionally\", or\n", | |
"* (if the default is appropriate) omit the optional argument(s) altogether." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Kyle, welcome to Queens.\n", | |
"Kyle, welcome to Brooklyn.\n", | |
"Kyle, welcome to Manhattan.\n" | |
] | |
} | |
], | |
"source": [ | |
"greeting3(\"Kyle\", location=\"Queens\") # ARG=EXPR binding; preferred style.\n", | |
"greeting3(\"Kyle\", \"Brooklyn\") # Positional binding. \n", | |
"greeting3(\"Kyle\") # Default value." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"It is unsafe to use mutable objects like `list`s as default arguments, because default parameter values are evaluated when the function is defined. See [here](https://docs.python.org/3.0/reference/compound_stmts.html#function-definitions) for a workaround." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### \"Star\" arguments" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A function can also support an arbitrary number of arguments. To use this construction, place an asterisk `*` before an argument in the function definition." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def print_everything1(*items):\n", | |
" for item in items:\n", | |
" print(item)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Within the function, the \"starred\" argument name in the definition is a `tuple` containing all positional arguments not otherwise bound. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Basenji\n", | |
"Akita\n", | |
"Borzoi\n" | |
] | |
} | |
], | |
"source": [ | |
"print_everything1(\"Basenji\", \"Akita\", \"Borzoi\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A similar construction is available for keyword arguments: two asterisks `**` before an argument in the function definition." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def print_everything2(*items, **item_pairs):\n", | |
" print_everything1(*items) # `*` expands the tuple as if it were separate arguments.\n", | |
" for (key, value) in item_pairs.items():\n", | |
" print(f\"{key} -> {value}\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"\"Double-starred\" arguments are passed like ordinary keyword arguments using `ARG=EXPR` form, with `ARG` is interpreted as a string." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Pug\n", | |
"Pomeranian\n", | |
"Wrinkly -> Shar-Pei\n", | |
"Long -> Greyhound\n" | |
] | |
} | |
], | |
"source": [ | |
"print_everything2(\"Pug\", \"Pomeranian\",\n", | |
" Wrinkly=\"Shar-Pei\", Long=\"Greyhound\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"There can only be one star-argument per function definition, and it must be placed after all (other) positional arguments and before any keyword arguments. Similarly, there can only be one double-star-argument per function definition, and it must be the last argument." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Direct use of `*args` and `**kwargs` variables are somewhat rare in practice; they are primarily used for [_metaprogramming_](https://en.wikipedia.org/wiki/Metaprogramming) tricks and are not recommended for ordinary development. However, it is important to be able to understand their use in others' code." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Generators" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"_Generators_ resemble functions, and in fact, their top-line declarations are identical. However, unlike functions, which return an object, they return an iterator is _lazily evaluated_: values in the iterator are computed as needed. Syntactically, generators differ from functions in that they do not allow traditional return statements (e.g., `return EXPR`). Instead, they use _yield statements_ (`yield EXPR`) which appends (conceptually speaking) `EXPR` to the iterator.\n", | |
"\n", | |
"Simply invoking a generator does no work except for parsing the arguments and creating a `generator` object. However, each time we iterate over this generator object (usually by placing it on the right-hand side of a for-loop) we evaluate it until we reach a yield statement. We then suspend evaluation until we request the next value (i.e., at the start of the next iteration of the loop)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"four\n", | |
"score\n", | |
"and\n", | |
"seven\n", | |
"years\n", | |
"ago\n" | |
] | |
} | |
], | |
"source": [ | |
"def normalize(tokens):\n", | |
" \"\"\"Applies case-folding to an iterable of tokens.\"\"\"\n", | |
" for token in tokens:\n", | |
" yield token.casefold() # We will cover this syntax at a later date.\n", | |
"\n", | |
" \n", | |
"tokens = (\"Four\", \"score\", \"and\", \"seven\", \"years\", \"ago\")\n", | |
"for token in normalize(tokens):\n", | |
" print(token)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Iteration continues until either we reach the end of the code block (as above) or we reach a simple `return` statement. So in the following example, the final two `yield` statements are never reached when `big_dogs=False`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Boston Terrier\n", | |
"Chihuahua\n" | |
] | |
} | |
], | |
"source": [ | |
"def dog_generator(big_dogs=False):\n", | |
" yield \"Boston Terrier\"\n", | |
" yield \"Chihuahua\"\n", | |
" if not big_dogs:\n", | |
" return\n", | |
" # May be unreachable.\n", | |
" yield \"Great Pyrenees\"\n", | |
" yield \"Neopolitan Mastiff\"\n", | |
"\n", | |
"\n", | |
"for dog in dog_generator():\n", | |
" print(dog)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"So far, we have seen one way to \"unroll\" (that is, evaluate) a generator: placing it on the right-hand side of a for-loop. But we can also unroll a generator (so long as it's finite) by casting it to a list or tuple." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"('four', 'score', 'and', 'seven', 'years', 'ago')" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"gen = normalize(tokens)\n", | |
"tuple(normalize(tokens))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"And, with some caution, we can also cast it to a hash-backed container. For instance, `set(gen)`, `frozenset(get)`, and `collections.Counter(gen)` are valid so long as all the items yielded by the generator `gen` are hashable." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"Counter({'four': 1, 'score': 1, 'and': 1, 'seven': 1, 'years': 1, 'ago': 1})" | |
] | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import collections # We will cover this syntax at a later date.\n", | |
"collections.Counter(normalize(tokens))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### When to use generators" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A few good cases include:\n", | |
"\n", | |
"* Functions which repeatedly appends to a list, and then returns that list: you can replace the append operation with a `yield`, and then cast the generator to a list or a tuple, like we did above with `normalize`.\n", | |
"* Code that reads through a large amount of input data, processing it item by item or line by line: make a \"reader\" generator and invoke it on the right-hand side of a for-loop.\n", | |
"* Functions which generate an infinite sequence (like the Fibonacci numbers)." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.5" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |