Created
February 24, 2020 11:27
-
-
Save t-flora/e945a65b645ddb32b44b61c0936713e5 to your computer and use it in GitHub Desktop.
CS110 7.2 – Quick Select and Median Heap
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n", | |
| "\n", | |
| "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "NAME = \"Tiago Flora\"\n", | |
| "COLLABORATORS = [\"Alex Wu\", \"Ba Thien\", \"Cameron Watts\", \"Chloe Go\"]" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "---" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "b829d01fa88366d60630dae1299fa764", | |
| "grade": false, | |
| "grade_id": "cell-d94cea4a211410ce", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "# CS110 Pre-class Work 7.1\n", | |
| "\n", | |
| "## Part A. Median Heap (watch this [video explanation](https://www.youtube.com/watch?v=756_8C9YBZQ&list=PLF_a-qBXTGFektoI6JUOTRL36JlvD04BR&index=5&t=0s) or read this [description](https://stackoverflow.com/a/15319593/7946759))\n", | |
| "\n", | |
| "Throughout this pre-class work, please use the following definition of median: the median of a list of numbers is the one in the middle of the list when the list is ordered. When such the middle element can’t be determined (i.e., in a list of even length), the average of the two middle elements is the median. For example, 5 is the median of [-1,2,4,5,8,10,12], and (5+7)/2=6 is the median of [1,2,3,5,7,8,10,11].\n", | |
| "\n", | |
| "Using the idea from Lesson 3.2, we can use a pair of heaps to create a data structure which allows fast access to the median. Use the heapq module in python to create both a max-heap and a min-heap. Note that by default, the heapq module in python only creates min-heaps, but if we multiply elements by -1 when we store them, then we can also create max-heaps.\n", | |
| "\n", | |
| "\n", | |
| "## Question 1.\n", | |
| "Write a function `add_to_median_heap(minh, maxh, elem)`. It must accept a min heap, a max heap, and an element to add.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "3f8e0bb0de793ce786eba2f6715d69b3", | |
| "grade": false, | |
| "grade_id": "cell-50cb08de70712ee7", | |
| "locked": false, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import heapq\n", | |
| "\n", | |
| "def add_to_median_heap(minh, maxh, elem):\n", | |
| " \"\"\"\n", | |
| " Adds a value to either the max-heap or min-heap objects passed to it, and balances the size of both heaps if necessary\n", | |
| " \n", | |
| " Inputs:\n", | |
| " minh: min-heap\n", | |
| " maxh: max-heap\n", | |
| " elem: Element to be inserted \n", | |
| " \n", | |
| " Output: None\n", | |
| " \"\"\"\n", | |
| " \n", | |
| " # Check if heaps are empty or not, and add element to maxh if it is\n", | |
| " # If elem is smaller than the opposite of the largerst element in maxh, add it as top node\n", | |
| "\n", | |
| " if len(maxh) == 0 or elem <= -maxh[0]:\n", | |
| " heapq.heappush(maxh, -elem)\n", | |
| "\n", | |
| " # Otherwise, add the negative of elem to maxh so that it tops maxh\n", | |
| " else:\n", | |
| " heapq.heappush(minh, elem)\n", | |
| " \n", | |
| " # Define a rebalancing function that approximates the sizes of both heaps if necessary\n", | |
| " def rebalance(minh, maxh):\n", | |
| " if len(minh) - len(maxh) > 1:\n", | |
| " heapq.heappush(maxh, -heapq.heappop(minh))\n", | |
| " elif len(maxh) - len(minh) > 1:\n", | |
| " heapq.heappush(minh, heapq.heappop(maxh))\n", | |
| " else:\n", | |
| " pass\n", | |
| "\n", | |
| " rebalance(minh, maxh)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "47ae2ae64e7e5a094e32b8e29577ebd2", | |
| "grade": false, | |
| "grade_id": "cell-b62f7899e36af96f", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 2\n", | |
| "Write a function `median(minh, maxh)`. It must return the median element.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "4fdb08f74f611846be80db8164227117", | |
| "grade": false, | |
| "grade_id": "cell-43ce866dd1dc2d5b", | |
| "locked": false, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "def median(minh, maxh):\n", | |
| " \"\"\"\n", | |
| " Inputs: minh and maxh, two heaps\n", | |
| " minh: min-heap containing values larger than the median\n", | |
| " maxh: max-heap containing values smaller than the median\n", | |
| " \n", | |
| " Output: Median value of the elements of both heaps\n", | |
| " \"\"\"\n", | |
| " # If the heaps have the same size, the median is the average of both root nodes\n", | |
| " if len(maxh) == len(minh):\n", | |
| " return (-maxh[0] + minh[0])/2\n", | |
| " \n", | |
| " # If maxh is larger than minh, return the inverse of the root node of maxh\n", | |
| " elif len(maxh) == len(minh) + 1:\n", | |
| " return float(-maxh[0])\n", | |
| " \n", | |
| " elif len(minh) == len(maxh) + 1:\n", | |
| " return float(minh[0])\n", | |
| " " | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "9f79f245efea8c7b7d7c444af36ad997", | |
| "grade": true, | |
| "grade_id": "cell-673faa69bf330eda", | |
| "locked": true, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "# Please ignore this cell. This cell is for us to implement the tests \n", | |
| "# to see if your code works properly. " | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "ea112ad1eac7cf41487236cf22381071", | |
| "grade": false, | |
| "grade_id": "cell-ec86b3a3c5adaedf", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 3.\n", | |
| "\n", | |
| "Uncomment and run the testing code given below to test your functions. It should print out numbers ranging from 1 to 50, in that order." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "1.0\n", | |
| "2.0\n", | |
| "3.0\n", | |
| "4.0\n", | |
| "5.0\n", | |
| "6.0\n", | |
| "7.0\n", | |
| "8.0\n", | |
| "9.0\n", | |
| "10.0\n", | |
| "11.0\n", | |
| "12.0\n", | |
| "13.0\n", | |
| "14.0\n", | |
| "15.0\n", | |
| "16.0\n", | |
| "17.0\n", | |
| "18.0\n", | |
| "19.0\n", | |
| "20.0\n", | |
| "21.0\n", | |
| "22.0\n", | |
| "23.0\n", | |
| "24.0\n", | |
| "25.0\n", | |
| "26.0\n", | |
| "27.0\n", | |
| "28.0\n", | |
| "29.0\n", | |
| "30.0\n", | |
| "31.0\n", | |
| "32.0\n", | |
| "33.0\n", | |
| "34.0\n", | |
| "35.0\n", | |
| "36.0\n", | |
| "37.0\n", | |
| "38.0\n", | |
| "39.0\n", | |
| "40.0\n", | |
| "41.0\n", | |
| "42.0\n", | |
| "43.0\n", | |
| "44.0\n", | |
| "45.0\n", | |
| "46.0\n", | |
| "47.0\n", | |
| "48.0\n", | |
| "49.0\n", | |
| "50.0\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "minh = []\n", | |
| "maxh = []\n", | |
| "for a in range(1,100,2):\n", | |
| " add_to_median_heap(minh, maxh, a)\n", | |
| " print(median(minh, maxh)) " | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "be3a94b3063c99a265c7b868e45a5f69", | |
| "grade": false, | |
| "grade_id": "cell-11d4510a48c38394", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 4.\n", | |
| "What’s the worst case complexity to build a median heap using `add_to_median_heap`?" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "4ddd5b5c6be74e7306bd08ee76a1b858", | |
| "grade": true, | |
| "grade_id": "cell-c8e33ebdc8aaf328", | |
| "locked": false, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "source": [ | |
| "The worst case happens when the initial heaps are most unbalanced, and we add values one by one from one heap to another so they trickle down to the leaves. <br>\n", | |
| "In that case, we have to continuously go through the elements of `maxh` or `minh` and push the elements of one onto another. That process is proportional to the total number of elements in both heaps, in which case it is $O(n)$. Thus, balancing the heaps has linear time complexity. <br>\n", | |
| "Each time we call heappush(), however, we call a process that takes time $O(\\log n)$. That is because in the worst case for building a heap by adding values, each value trickles down from the root to the leaves of the heaps, meaning each value goes through $\\log n$ levels to get to their respective positions. <br>\n", | |
| "Thus, we can expect the worst case complexity of building a median heap to be $O(n \\log n)$, by considering the $cn$ times we will call heappush, which takes $k\\log n$ steps." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "8a1276ebbad9b1292badcb9090c8a910", | |
| "grade": false, | |
| "grade_id": "cell-dada08b7f913c96f", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 5.\n", | |
| "What’s the worst case complexity of `median`?" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "7d67e5e84a28b798ba456d72603533f0", | |
| "grade": true, | |
| "grade_id": "cell-c061920faa337890", | |
| "locked": false, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "source": [ | |
| "Since `median` only performs comparisons of elements at the roots of both heaps given to it and returns some value accordingly, we can expect it to have constant time complexity $O(1)$.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "73694bd23377143032d5224818a389b7", | |
| "grade": false, | |
| "grade_id": "cell-fcf89277dcd46cbe", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 6.\n", | |
| "\n", | |
| "How does this way of finding the median compare with the vanilla way of sorting the list and pick the middle element? Use arguments based on efficiency or clearity of the respective algorithms." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "733fe35313ebf5888c622423bb7462b4", | |
| "grade": true, | |
| "grade_id": "cell-1f31afecda9730a7", | |
| "locked": false, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "source": [ | |
| "This method precludes the need to sort the list.<br>\n", | |
| "For this heap-based method of finding medians, we can just separate elements into \"larger\" and \"smaller\" relative to the median, which go to a min- and a max-heap, respectively. We do not sort these heaps, and do not care for much other than the heaps being balanced and obeying heap properties. <br>\n", | |
| "The best sorting algorithms we know so far take time $O(n\\log(n))$. This means the comparison we are undertaking depends on the sorting algorithm being used, and also the input instances being considered. <br>\n", | |
| "Thus, if constant factors in the sorting algorithm amount to a lower running time than in the method above, the vanilla method could be superior. In terms of memory, it would also be important to consider if the sorting algorithm sorts in-place or not, and how that compares to the heaps we use to find the median." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "d4869e0b17f0248df4c3ae91668257e8", | |
| "grade": false, | |
| "grade_id": "cell-695c23986b831032", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## [Optional] Question 7.\n", | |
| "\n", | |
| "Is it possible to extend this idea to any percentile? If it is, then write code to do so. If it’s not possible, prove why it is not possible." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "c293e85a4cf43662dc635c9e0f511be4", | |
| "grade": true, | |
| "grade_id": "cell-4d0c197e6d484338", | |
| "locked": false, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "ename": "NotImplementedError", | |
| "evalue": "", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", | |
| "\u001b[1;31mNotImplementedError\u001b[0m Traceback (most recent call last)", | |
| "\u001b[1;32m<ipython-input-6-15b94d1fa268>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# YOUR CODE HERE\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mNotImplementedError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", | |
| "\u001b[1;31mNotImplementedError\u001b[0m: " | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# YOUR CODE HERE\n", | |
| "raise NotImplementedError()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "bd7ff85760b2e8303c3b4482b6ca44b2", | |
| "grade": false, | |
| "grade_id": "cell-3cba7ecc0de11f1b", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Part B. Quickselect\n", | |
| "\n", | |
| "Quicksort can be modified to find the $k$-th smallest element in an unordered list. This is known as quickselect. It does this by choosing a partition (as in quicksort). Once the list has been partitioned then we know how many elements lie to the left and to the right of the partition. This allows us to recursively call quickselect on the correct sublist.\n", | |
| "\n", | |
| "## Question 1.\n", | |
| "\n", | |
| "Write a function `qselect(lst, k)`, which takes both a list and an index $k$. The function must then return the $k$-th smallest item in the list.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "3714ebe85f9dbc77eb3bb3c106a009ea", | |
| "grade": false, | |
| "grade_id": "cell-fb9c2bc9d5b4bc92", | |
| "locked": false, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "def partition(A,p,r):\n", | |
| " \"\"\"\n", | |
| " Assume r<len(A) and p>=0\n", | |
| " \"\"\"\n", | |
| " # Determine a pivot to compare elements and define groups later on\n", | |
| " x = A[r]\n", | |
| " i = p-1\n", | |
| " \n", | |
| " # We compare all elements in the (sub)array to the pivot\n", | |
| " for j in range(p, r):\n", | |
| " # If an element is smaller than the pivot, we increment the second marker (i) and swap the elements\n", | |
| " # in the j-th and i-th positions\n", | |
| " if A[j] <= x:\n", | |
| " # Increment i and swap elements\n", | |
| " i += 1\n", | |
| " A[i], A[j] = A[j], A[i]\n", | |
| " \n", | |
| " # When we reach the end of array, we swap the pivot with the first element of the \"larger\" group\n", | |
| " A[i+1], A[r] = A[r], A[i+1]\n", | |
| " \n", | |
| " # Return the final position of the pivot in the array\n", | |
| " return(i+1)\n", | |
| "\n", | |
| "def qselect_inds(lst, p, r, k):\n", | |
| " \"\"\"\n", | |
| " This function takes a similar approach to quick-sort in finding the desired element of the list.\n", | |
| " \n", | |
| " Inputs: A list, two indices p and r, and the value k of the \"k-th smallest\" property of the element we want\n", | |
| " \n", | |
| " Output: The k-th object of lst, which will be the k-th smallest object of lst.\n", | |
| " \"\"\"\n", | |
| " \n", | |
| " # Define a pivot element q as the partition of lst\n", | |
| " q = partition(lst, p, r)\n", | |
| " \n", | |
| " # In this function, we do not care about sorting the array.\n", | |
| " # Thus, we separate cases based on where the smallest elements are\n", | |
| " \n", | |
| " # If the the pivot is equal to k, return the k-th element\n", | |
| " if k == q:\n", | |
| " return lst[q]\n", | |
| " \n", | |
| " # If k lies below the partition, recursively call the function on the left subarray\n", | |
| " elif k < q:\n", | |
| " qselect_inds(lst, p, q-1, k)\n", | |
| " \n", | |
| " # If k lies above the partition, call the function on the right subarray\n", | |
| " elif k > q:\n", | |
| " qselect_inds(lst, q+1, r, k)\n", | |
| " \n", | |
| " # Return the k-th element. As the left subarray will contain the k-1 smallest elements and the right\n", | |
| " # subarray will contain the n-k-1 largest elements, the k-th smallest will be the k-th element\n", | |
| " return lst[k]\n", | |
| " \n", | |
| "def qselect(lst,k):\n", | |
| " \"\"\"\n", | |
| " This function merely calls on qselect_inds as defined above, so that we can use recursion.\n", | |
| " \n", | |
| " Inputs: A list and and an index\n", | |
| " lst: list object with numerical elements\n", | |
| " k: index of element in a sorted version of lst + 1\n", | |
| " \n", | |
| " Outputs: k-th smallest element of lst\n", | |
| " \"\"\"\n", | |
| " return(qselect_inds(lst, 0, len(lst)-1, k))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "5ee41421528107f7ae195a7197a5d6d9", | |
| "grade": true, | |
| "grade_id": "cell-85a25ed6a1f62a66", | |
| "locked": true, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import random\n", | |
| "random.seed(123) # introducing a seed for reproducibility purposes\n", | |
| "lst1 = list(range(100))\n", | |
| "random.shuffle(lst1)\n", | |
| "lst2 = []\n", | |
| "for a in range(100):\n", | |
| " lst2.append(qselect(lst1, a))\n", | |
| "assert(lst2 == list(range(100)))" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "54fddc25ea0529f44aa6653b18ff7745", | |
| "grade": false, | |
| "grade_id": "cell-8013044acd03642a", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 2.\n", | |
| "Uncomment and run the testing code given below to test your function. It should print out integers from 0 to 99.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "0\n", | |
| "1\n", | |
| "2\n", | |
| "3\n", | |
| "4\n", | |
| "5\n", | |
| "6\n", | |
| "7\n", | |
| "8\n", | |
| "9\n", | |
| "10\n", | |
| "11\n", | |
| "12\n", | |
| "13\n", | |
| "14\n", | |
| "15\n", | |
| "16\n", | |
| "17\n", | |
| "18\n", | |
| "19\n", | |
| "20\n", | |
| "21\n", | |
| "22\n", | |
| "23\n", | |
| "24\n", | |
| "25\n", | |
| "26\n", | |
| "27\n", | |
| "28\n", | |
| "29\n", | |
| "30\n", | |
| "31\n", | |
| "32\n", | |
| "33\n", | |
| "34\n", | |
| "35\n", | |
| "36\n", | |
| "37\n", | |
| "38\n", | |
| "39\n", | |
| "40\n", | |
| "41\n", | |
| "42\n", | |
| "43\n", | |
| "44\n", | |
| "45\n", | |
| "46\n", | |
| "47\n", | |
| "48\n", | |
| "49\n", | |
| "50\n", | |
| "51\n", | |
| "52\n", | |
| "53\n", | |
| "54\n", | |
| "55\n", | |
| "56\n", | |
| "57\n", | |
| "58\n", | |
| "59\n", | |
| "60\n", | |
| "61\n", | |
| "62\n", | |
| "63\n", | |
| "64\n", | |
| "65\n", | |
| "66\n", | |
| "67\n", | |
| "68\n", | |
| "69\n", | |
| "70\n", | |
| "71\n", | |
| "72\n", | |
| "73\n", | |
| "74\n", | |
| "75\n", | |
| "76\n", | |
| "77\n", | |
| "78\n", | |
| "79\n", | |
| "80\n", | |
| "81\n", | |
| "82\n", | |
| "83\n", | |
| "84\n", | |
| "85\n", | |
| "86\n", | |
| "87\n", | |
| "88\n", | |
| "89\n", | |
| "90\n", | |
| "91\n", | |
| "92\n", | |
| "93\n", | |
| "94\n", | |
| "95\n", | |
| "96\n", | |
| "97\n", | |
| "98\n", | |
| "99\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import random\n", | |
| "random.seed(123) # introducing a seed for reproducibility purposes\n", | |
| "lst = list(range(100))\n", | |
| "random.shuffle(lst)\n", | |
| "for a in range(100):\n", | |
| " print(qselect(lst, a))\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "3fede812466807c2a250ab3c6427f75c", | |
| "grade": false, | |
| "grade_id": "cell-e7a965de7bb56aa3", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 3.\n", | |
| "\n", | |
| "Write down the recurrence relation for your code." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "6fb3d2481f5df1b96c902e5446844f87", | |
| "grade": true, | |
| "grade_id": "cell-24144607b3ffade3", | |
| "locked": false, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "source": [ | |
| "In each step, we divide the problem into 1 subproblem to be solved within the left and right subarray. The amount of steps that each subproblem will take to solve depends on the input instance we meet. The runtime depends on the pivot chosen. <br>\n", | |
| "In the best case scenario, where our partitions decrease the size of the of the search set to some constant fraction $n/h$, we would get:\n", | |
| "$$ T(n) = 1\\cdot T(n/h) + O(n) = T(n/h) + O(n) $$\n", | |
| "In the worst case scenario, the partitions would only decrease the size of the search set by 1. And so:\n", | |
| "$$ T(n) = 1\\cdot T(n-1) + O(n) = T(n-1) + O(n) $$\n", | |
| "\n", | |
| "The $O(n)$ term added to both recurrences reflects the time of comparing all the elements of the array to the pivot." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "311ee23a32dbc3ba43d3bba0d433871b", | |
| "grade": false, | |
| "grade_id": "cell-fd740516f4259f3c", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 4.\n", | |
| "\n", | |
| "Solve the recurrence relation for quickselect in the best case.\n" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "78416da569a0269f78967b11d7b93332", | |
| "grade": true, | |
| "grade_id": "cell-c125f4742933e820", | |
| "locked": false, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "source": [ | |
| "Given that the resulting series for this recurrence will result in a sum of the form $n(1/h + 1/h^2 + 1/h^3 + ...)$ until we get to our base case, we have that this geometric series sum will tend to $1/(1-1/h)$ as $n$ tends to infinity. This means we have $T(n) = c(h/(h-1)) + O(n)$ in the best case, on the limit. <br>\n", | |
| "Thus, we can conclude the best case complexity for quickselect is $O(n)$, which dominates the first term of the resulting recurrence (a constant)." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "editable": false, | |
| "nbgrader": { | |
| "checksum": "7cc9c5e0531e8ece71be2ff60ee65995", | |
| "grade": false, | |
| "grade_id": "cell-e281cecc190fd662", | |
| "locked": true, | |
| "schema_version": 1, | |
| "solution": false | |
| } | |
| }, | |
| "source": [ | |
| "## Question 5.\n", | |
| "\n", | |
| "Solve the recurrence relation for quickselect in the worst case." | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": { | |
| "deletable": false, | |
| "nbgrader": { | |
| "checksum": "f4cceeaea59e48c3fabdb8d19e302db5", | |
| "grade": true, | |
| "grade_id": "cell-852c3a71d8c2d7c1", | |
| "locked": false, | |
| "points": 0, | |
| "schema_version": 1, | |
| "solution": true | |
| } | |
| }, | |
| "source": [ | |
| "The resulting series sum for this recurrence can be expressed as\n", | |
| "$$ T(n) = O(1) + O(2) + ... + O(n-2) + T(n-1)$$\n", | |
| "Thus, adding all the terms and assuming they take the same constant $c$, we have \n", | |
| "$$T(n) = c(n-1)(n-1+1)/2 = c(n^2-n)/2$$\n", | |
| "Because the term with the highest order in the recurrence is a quadratic, we have that the worst-case complexity for quick select is $O(n^2)$." | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "kernelspec": { | |
| "display_name": "Python 3", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.7.3" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 2 | |
| } |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment