steven-tey · October 23, 2019 17:11
diff --git a/CS110 Session 7.2 Pre-Class Work.ipynb b/CS110 Session 7.2 Pre-Class Work.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\\rightarrow$Run All).\n",
    "\n",
    "Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "NAME = \"Steven Tey\"\n",
    "COLLABORATORS = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "2843e94073cad72d7107eb55b3f5d153",
     "grade": false,
     "grade_id": "cell-5b237cc13279b029",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "# CS110 Pre-class Work 7.2\n",
    "\n",
    "## Part A. Direct Address Tables\n",
    "As the first step in setting up a crossword solving algorithm you need to create 2 direct address tables, one to store all the “up” answers-whether correct or not-and one to store all the “across” answers. Write python code to create a direct address table that allows you to:\n",
    "\n",
    "1. initialize N empty guesses\n",
    "2. set a guess for the i-th entry\n",
    "3. clear an incorrect guess for the i-th entry\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "69e5d714c7245702b8732209ea0b2b42",
     "grade": true,
     "grade_id": "cell-d8d2e3c1b136cdb9",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "def guess_list(N): # Creating a list of N empty guesses\n",
    "    return [[] for n in range(N)]\n",
    "\n",
    "def hashing_func(key, hash_table): # Hashing the key to fit the length of the hash table\n",
    "    return key % len(hash_table)\n",
    "\n",
    "def set_guess(hash_table, key, value): # Setting the guess for the i-th entry\n",
    "    hash_key = hashing_func(key, hash_table)\n",
    "    hash_table[hash_key] = value \n",
    "\n",
    "def clear_guess(hash_table, key):  # Clear an incorrect guess for the i-th entry\n",
    "    hash_key = hashing_func(key, hash_table)\n",
    "    hash_table[hash_key] =  None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "cdb562114e50f9ce1c97be3536b59efa",
     "grade": false,
     "grade_id": "cell-b27891b3ff16f730",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Part B. Social Security\n",
    "\n",
    "Could we use a direct address table to store a country's entire set of social security numbers (aka id numbers)? Why or why not?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "7b193e66cbea5bd9dac131a3d6adfefa",
     "grade": true,
     "grade_id": "cell-e751647d9af78c11",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "source": [
    "No, because the country's population is way too big - the U.S. alone has over 300,000,000 people, and if we use a direct address table to store all the SSNs, we will need more than 300M slots, which will significantly increase the time complexity when it comes to searching the list."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "35f74a7f882cabb36df7ee2c43ee51db",
     "grade": false,
     "grade_id": "cell-cba2028918eea1fd",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Part C. Chained Hash-table\n",
    "\n",
    "## Question 1.\n",
    "\n",
    "Using the code in the cell below, complete the missing sections of code. You should copy and paste the code in an additional cell and fill in the code there."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "48a17c00317da5b4474d91c6d8e6b920",
     "grade": false,
     "grade_id": "cell-9e41d88b035338f4",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "outputs": [],
   "source": [
    "import random\n",
    "import string\n",
    "\n",
    "\n",
    "def randomword(length):\n",
    "    return ''.join(random.choice(string.ascii_lowercase) for i in range(length))\n",
    "\n",
    "\n",
    "def empty_hash_table(N):\n",
    "    return [[] for n in range(N)]\n",
    "\n",
    "\n",
    "def add_to_hash_table(hash_table, item, hash_function):\n",
    "    N = len(hash_table)\n",
    "    # Your code here\n",
    "    return hash_table\n",
    "\n",
    "\n",
    "def contains(hash_table, item, hash_function):\n",
    "    N = len(hash_table)\n",
    "    # Your code here\n",
    "    # return true if the item has already been stored in the hash_table\n",
    "\n",
    "\n",
    "def remove(hash_table, item, hash_function):\n",
    "    if not contains(hash_table, item, hash_function):\n",
    "        raise ValueError()\n",
    "    # Your code here\n",
    "    return hash_table\n",
    "\n",
    "\n",
    "def hash_str1(string):\n",
    "    ans = 0\n",
    "    for chr in string:\n",
    "        ans += ord(chr)\n",
    "    return ans\n",
    "\n",
    "# def hash_str2(string):\n",
    "#     ans = 0\n",
    "#     for chr in string:\n",
    "#         ans = ans ^ ord(chr)\n",
    "#     return ans\n",
    "def hash_str2(string):\n",
    "    ans=int(ord(string[0]))\n",
    "    for ix in range(1, len(string)):\n",
    "        ans = ans ^ ord(string[ix])   \n",
    "    return bin(ans)\n",
    "\n",
    "\n",
    "def hash_str3(string):\n",
    "    ans = 0\n",
    "    for chr in string:\n",
    "        ans = ans * 128 + ord(chr)\n",
    "    return ans\n",
    "\n",
    "\n",
    "def hash_str4(string):\n",
    "    random.seed(ord(string[0]))\n",
    "    return random.getrandbits(32)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "b39c1af2de8a4581d16a2d630fcaac74",
     "grade": true,
     "grade_id": "cell-1b6cc6df105c101f",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "import random\n",
    "import string\n",
    "\n",
    "\n",
    "def randomword(length):\n",
    "    return ''.join(random.choice(string.ascii_lowercase) for i in range(length))\n",
    "\n",
    "\n",
    "def empty_hash_table(N):\n",
    "    return [[] for n in range(N)]\n",
    "\n",
    "\n",
    "def add_to_hash_table(hash_table, item, hash_function):\n",
    "    N = len(hash_table)\n",
    "    key = hash_function(item)\n",
    "    hash_key = hash(key) % N\n",
    "    hash_table[hash_key].append(item)\n",
    "    return hash_table\n",
    "\n",
    "\n",
    "def contains(hash_table, item, hash_function):\n",
    "    '''This code returns true if the item has already been stored in the hash_table'''\n",
    "    N = len(hash_table)\n",
    "    key = hash_function(item)\n",
    "    hash_key = hash(key) % N\n",
    "    for key, element in enumerate(hash_table[hash_key]):\n",
    "        if element == item:\n",
    "            return True\n",
    "    return False\n",
    "\n",
    "\n",
    "def remove(hash_table, item, hash_function):\n",
    "    if not contains(hash_table, item, hash_function):\n",
    "        raise ValueError()\n",
    "    N = len(hash_table)\n",
    "    key = hash_function(item)\n",
    "    hash_key = hash(key) % N\n",
    "    hash_table[hash_key].remove(item)\n",
    "    return hash_table\n",
    "\n",
    "# The four hash functions:\n",
    "\n",
    "def hash_str1(string):\n",
    "    ans = 0\n",
    "    for chr in string:\n",
    "        ans += ord(chr)\n",
    "    return ans\n",
    "\n",
    "# def hash_str2(string):\n",
    "#     ans = 0\n",
    "#     for chr in string:\n",
    "#         ans = ans ^ ord(chr)\n",
    "#     return ans\n",
    "\n",
    "def hash_str2(string):\n",
    "    ans=int(ord(string[0]))\n",
    "    for ix in range(1, len(string)):\n",
    "        ans = ans ^ ord(string[ix])   \n",
    "    return bin(ans)\n",
    "\n",
    "\n",
    "def hash_str3(string):\n",
    "    ans = 0\n",
    "    for chr in string:\n",
    "        ans = ans * 128 + ord(chr)\n",
    "    return ans\n",
    "\n",
    "\n",
    "def hash_str4(string):\n",
    "    random.seed(ord(string[0]))\n",
    "    return random.getrandbits(32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "d323c68243f6d1fe79cbf5ad01ee85ae",
     "grade": false,
     "grade_id": "cell-66fb150dd2e509a7",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Question 2.\n",
    "Using the code, create 100,000 words of 10 characters each.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "b9d2f1faad23eecd6f3691968bb4e455",
     "grade": true,
     "grade_id": "cell-234b1e83c871ddc7",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "word_list = [randomword(10) for _ in range(100000)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "190e7fdc00fc311f5ff32a390fff1ca4",
     "grade": false,
     "grade_id": "cell-2fbadca52cec7230",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Question 3.\n",
    "\n",
    "Create four chained hash-tables with 5000 slots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "0d1473300ae336ceb57d11687ab3ec0a",
     "grade": true,
     "grade_id": "cell-2dabefc41493ec16",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "hash_table1 = empty_hash_table(5000)\n",
    "hash_table2 = empty_hash_table(5000)\n",
    "hash_table3 = empty_hash_table(5000)\n",
    "hash_table4 = empty_hash_table(5000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "c3d147917ae237a6dc75eddab48dbaa6",
     "grade": false,
     "grade_id": "cell-cc2e6d00330e691e",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Question 4.\n",
    "\n",
    "Store all the words in each chained hash table using each of the different hash functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "af209dceb17bf5f8d9fadf0808855f79",
     "grade": true,
     "grade_id": "cell-8a08ade8d45590a7",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [],
   "source": [
    "for word in word_list:\n",
    "    add_to_hash_table(hash_table1, word, hash_str1)\n",
    "    add_to_hash_table(hash_table2, word, hash_str2)\n",
    "    add_to_hash_table(hash_table3, word, hash_str3)\n",
    "    add_to_hash_table(hash_table4, word, hash_str4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "858437d142a66b825fb0bebdb1c4d9fe",
     "grade": false,
     "grade_id": "cell-da2c8b42618fbde4",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Question 5.\n",
    "\n",
    "Measure the number of collisions for each hash function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "156f57bc1c6c1a3e10ca1804aa205099",
     "grade": true,
     "grade_id": "cell-a968f9e14a0416c4",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "170\n",
      "30\n",
      "5000\n",
      "26\n"
     ]
    }
   ],
   "source": [
    "def collisions(hash_table):\n",
    "    '''This function measures the number of slots in the hash table where collisions occur'''\n",
    "    collision_counter = []\n",
    "    for i in hash_table:\n",
    "        if len(i) > 1:\n",
    "            collision_counter.append(1)\n",
    "    return len(collision_counter)\n",
    "\n",
    "print(collisions(hash_table1))\n",
    "print(collisions(hash_table2))\n",
    "print(collisions(hash_table3))\n",
    "print(collisions(hash_table4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "ef47cabb10897284ca375490003e6b3d",
     "grade": false,
     "grade_id": "cell-435ba1cfb2d80447",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Question 6.\n",
    "\n",
    "For each of the hash functions, how many elements are in a bucket on average (if it is not empty)?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "9f77d346c6e5ffbfb92ac0707c3c1d72",
     "grade": true,
     "grade_id": "cell-b4052afa2ee7c702",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "For the average bucket size in general:\n",
      "Bucket 1: 20.0\n",
      "Bucket 2: 20.0\n",
      "Bucket 3: 20.0\n",
      "Bucket 4: 20.0\n",
      "\n",
      "For the average bucket size (not counting empty buckets) in general:\n",
      "Bucket 1: 552.4861878453039\n",
      "Bucket 2: 3333.3333333333335\n",
      "Bucket 3: 20.0\n",
      "Bucket 4: 3846.153846153846\n"
     ]
    }
   ],
   "source": [
    "def ave_bucket_length(hash_table):\n",
    "    '''This function measures the average number of elements per bucket in the hash table'''\n",
    "    l = [] # List to store the length of each bucket\n",
    "    for i in hash_table:\n",
    "        l.append(len(i))\n",
    "    return sum(l)/len(l)\n",
    "\n",
    "print('For the average bucket size in general:')\n",
    "print(f'Bucket 1: {ave_bucket_length(hash_table1)}')\n",
    "print(f'Bucket 2: {ave_bucket_length(hash_table2)}')\n",
    "print(f'Bucket 3: {ave_bucket_length(hash_table3)}')\n",
    "print(f'Bucket 4: {ave_bucket_length(hash_table4)}')\n",
    "print('')\n",
    "\n",
    "def ave_nonempty_bucket_length(hash_table):\n",
    "    '''This function measures the average number of elements per non-empty bucket in the hash table'''\n",
    "    l = [] # List to store the length of each bucket\n",
    "    for i in hash_table:\n",
    "        if len(i) != 0:\n",
    "            l.append(len(i))\n",
    "    return sum(l)/len(l)\n",
    "\n",
    "print('For the average bucket size (not counting empty buckets) in general:')\n",
    "print(f'Bucket 1: {ave_nonempty_bucket_length(hash_table1)}')\n",
    "print(f'Bucket 2: {ave_nonempty_bucket_length(hash_table2)}')\n",
    "print(f'Bucket 3: {ave_nonempty_bucket_length(hash_table3)}')\n",
    "print(f'Bucket 4: {ave_nonempty_bucket_length(hash_table4)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "aaeb9bd55b7a86f8bc8079cbb7f8bca4",
     "grade": false,
     "grade_id": "cell-b86aef4dd22c236c",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Question 7.\n",
    "\n",
    "Time how long it takes to find elements that are in each hash table.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "23927c46b8067f49ef10e9b4183f7680",
     "grade": true,
     "grade_id": "cell-0605f7dec7128414",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The average time taken to find an element in hash table 1 is 0.00022217101536001793\n",
      "The average time taken to find an element in hash table 2 is 0.001443639153080003\n",
      "The average time taken to find an element in hash table 3 is 3.0076656109995383e-05\n",
      "The average time taken to find an element in hash table 4 is 0.002190336112360001\n"
     ]
    }
   ],
   "source": [
    "import timeit\n",
    "\n",
    "def find_elements(hash_table, hash_function):\n",
    "    list_of_Trues = []\n",
    "    for word in word_list:\n",
    "        list_of_Trues.append(contains(hash_table, word, hash_function))       \n",
    "        \n",
    "print(f'The average time taken to find an element in hash table 1 is {timeit.timeit(lambda: find_elements(hash_table1, hash_str1), number=1)/len(word_list)}')\n",
    "print(f'The average time taken to find an element in hash table 2 is {timeit.timeit(lambda: find_elements(hash_table2, hash_str2), number=1)/len(word_list)}')\n",
    "print(f'The average time taken to find an element in hash table 3 is {timeit.timeit(lambda: find_elements(hash_table3, hash_str3), number=1)/len(word_list)}')\n",
    "print(f'The average time taken to find an element in hash table 4 is {timeit.timeit(lambda: find_elements(hash_table4, hash_str4), number=1)/len(word_list)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "checksum": "b051093ca9a2faf1658679380bf29033",
     "grade": false,
     "grade_id": "cell-75db2c2e62cec090",
     "locked": true,
     "schema_version": 1,
     "solution": false
    }
   },
   "source": [
    "## Question 8.\n",
    "For each hash table, time how long it takes to find 10,000 elements that have not been stored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "checksum": "99b2fa8c88000c2143cb00fac1f6aa5f",
     "grade": true,
     "grade_id": "cell-b846b45e1b98ae59",
     "locked": false,
     "points": 0,
     "schema_version": 1,
     "solution": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The time taken to find 10,000 elements that are not in hash table 1 is 0.046605653000369784\n",
      "The time taken to find 10,000 elements that are not in hash table 2 is 28.518868270000894\n",
      "The time taken to find 10,000 elements that are not in hash table 3 is 0.2729024959990056\n",
      "The time taken to find 10,000 elements that are not in hash table 4 is 39.04465117200016\n"
     ]
    }
   ],
   "source": [
    "import random\n",
    "import timeit\n",
    "\n",
    "non_word_list = ['aaaaaaaaaa' for _ in range(10000)]\n",
    "\n",
    "def find_elements(hash_table, hash_function):\n",
    "    list_of_Falses = []\n",
    "    for word in non_word_list:\n",
    "        list_of_Falses.append(contains(hash_table, word, hash_function))       \n",
    "        \n",
    "print(f'The time taken to find 10,000 elements that are not in hash table 1 is {timeit.timeit(lambda: find_elements(hash_table1, hash_str1), number=1)}')\n",
    "print(f'The time taken to find 10,000 elements that are not in hash table 2 is {timeit.timeit(lambda: find_elements(hash_table2, hash_str2), number=1)}')\n",
    "print(f'The time taken to find 10,000 elements that are not in hash table 3 is {timeit.timeit(lambda: find_elements(hash_table3, hash_str3), number=1)}')\n",
    "print(f'The time taken to find 10,000 elements that are not in hash table 4 is {timeit.timeit(lambda: find_elements(hash_table4, hash_str4), number=1)}')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel$\\rightarrow$Restart) and then run all cells (in the menubar, select Cell$\\rightarrow$Run All).\n",
	"\n",
	"Make sure you fill in any place that says `YOUR CODE HERE` or \"YOUR ANSWER HERE\", as well as your name and collaborators below:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [],
	"source": [
	"NAME = \"Steven Tey\"\n",
	"COLLABORATORS = \"\""
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"---"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "2843e94073cad72d7107eb55b3f5d153",
	"grade": false,
	"grade_id": "cell-5b237cc13279b029",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"# CS110 Pre-class Work 7.2\n",
	"\n",
	"## Part A. Direct Address Tables\n",
	"As the first step in setting up a crossword solving algorithm you need to create 2 direct address tables, one to store all the “up” answers-whether correct or not-and one to store all the “across” answers. Write python code to create a direct address table that allows you to:\n",
	"\n",
	"1. initialize N empty guesses\n",
	"2. set a guess for the i-th entry\n",
	"3. clear an incorrect guess for the i-th entry\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "69e5d714c7245702b8732209ea0b2b42",
	"grade": true,
	"grade_id": "cell-d8d2e3c1b136cdb9",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [],
	"source": [
	"def guess_list(N): # Creating a list of N empty guesses\n",
	" return [[] for n in range(N)]\n",
	"\n",
	"def hashing_func(key, hash_table): # Hashing the key to fit the length of the hash table\n",
	" return key % len(hash_table)\n",
	"\n",
	"def set_guess(hash_table, key, value): # Setting the guess for the i-th entry\n",
	" hash_key = hashing_func(key, hash_table)\n",
	" hash_table[hash_key] = value \n",
	"\n",
	"def clear_guess(hash_table, key): # Clear an incorrect guess for the i-th entry\n",
	" hash_key = hashing_func(key, hash_table)\n",
	" hash_table[hash_key] = None"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "cdb562114e50f9ce1c97be3536b59efa",
	"grade": false,
	"grade_id": "cell-b27891b3ff16f730",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Part B. Social Security\n",
	"\n",
	"Could we use a direct address table to store a country's entire set of social security numbers (aka id numbers)? Why or why not?\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "7b193e66cbea5bd9dac131a3d6adfefa",
	"grade": true,
	"grade_id": "cell-e751647d9af78c11",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"source": [
	"No, because the country's population is way too big - the U.S. alone has over 300,000,000 people, and if we use a direct address table to store all the SSNs, we will need more than 300M slots, which will significantly increase the time complexity when it comes to searching the list."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "35f74a7f882cabb36df7ee2c43ee51db",
	"grade": false,
	"grade_id": "cell-cba2028918eea1fd",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Part C. Chained Hash-table\n",
	"\n",
	"## Question 1.\n",
	"\n",
	"Using the code in the cell below, complete the missing sections of code. You should copy and paste the code in an additional cell and fill in the code there."
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "48a17c00317da5b4474d91c6d8e6b920",
	"grade": false,
	"grade_id": "cell-9e41d88b035338f4",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"outputs": [],
	"source": [
	"import random\n",
	"import string\n",
	"\n",
	"\n",
	"def randomword(length):\n",
	" return ''.join(random.choice(string.ascii_lowercase) for i in range(length))\n",
	"\n",
	"\n",
	"def empty_hash_table(N):\n",
	" return [[] for n in range(N)]\n",
	"\n",
	"\n",
	"def add_to_hash_table(hash_table, item, hash_function):\n",
	" N = len(hash_table)\n",
	" # Your code here\n",
	" return hash_table\n",
	"\n",
	"\n",
	"def contains(hash_table, item, hash_function):\n",
	" N = len(hash_table)\n",
	" # Your code here\n",
	" # return true if the item has already been stored in the hash_table\n",
	"\n",
	"\n",
	"def remove(hash_table, item, hash_function):\n",
	" if not contains(hash_table, item, hash_function):\n",
	" raise ValueError()\n",
	" # Your code here\n",
	" return hash_table\n",
	"\n",
	"\n",
	"def hash_str1(string):\n",
	" ans = 0\n",
	" for chr in string:\n",
	" ans += ord(chr)\n",
	" return ans\n",
	"\n",
	"# def hash_str2(string):\n",
	"# ans = 0\n",
	"# for chr in string:\n",
	"# ans = ans ^ ord(chr)\n",
	"# return ans\n",
	"def hash_str2(string):\n",
	" ans=int(ord(string[0]))\n",
	" for ix in range(1, len(string)):\n",
	" ans = ans ^ ord(string[ix]) \n",
	" return bin(ans)\n",
	"\n",
	"\n",
	"def hash_str3(string):\n",
	" ans = 0\n",
	" for chr in string:\n",
	" ans = ans * 128 + ord(chr)\n",
	" return ans\n",
	"\n",
	"\n",
	"def hash_str4(string):\n",
	" random.seed(ord(string[0]))\n",
	" return random.getrandbits(32)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 42,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "b39c1af2de8a4581d16a2d630fcaac74",
	"grade": true,
	"grade_id": "cell-1b6cc6df105c101f",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [],
	"source": [
	"import random\n",
	"import string\n",
	"\n",
	"\n",
	"def randomword(length):\n",
	" return ''.join(random.choice(string.ascii_lowercase) for i in range(length))\n",
	"\n",
	"\n",
	"def empty_hash_table(N):\n",
	" return [[] for n in range(N)]\n",
	"\n",
	"\n",
	"def add_to_hash_table(hash_table, item, hash_function):\n",
	" N = len(hash_table)\n",
	" key = hash_function(item)\n",
	" hash_key = hash(key) % N\n",
	" hash_table[hash_key].append(item)\n",
	" return hash_table\n",
	"\n",
	"\n",
	"def contains(hash_table, item, hash_function):\n",
	" '''This code returns true if the item has already been stored in the hash_table'''\n",
	" N = len(hash_table)\n",
	" key = hash_function(item)\n",
	" hash_key = hash(key) % N\n",
	" for key, element in enumerate(hash_table[hash_key]):\n",
	" if element == item:\n",
	" return True\n",
	" return False\n",
	"\n",
	"\n",
	"def remove(hash_table, item, hash_function):\n",
	" if not contains(hash_table, item, hash_function):\n",
	" raise ValueError()\n",
	" N = len(hash_table)\n",
	" key = hash_function(item)\n",
	" hash_key = hash(key) % N\n",
	" hash_table[hash_key].remove(item)\n",
	" return hash_table\n",
	"\n",
	"# The four hash functions:\n",
	"\n",
	"def hash_str1(string):\n",
	" ans = 0\n",
	" for chr in string:\n",
	" ans += ord(chr)\n",
	" return ans\n",
	"\n",
	"# def hash_str2(string):\n",
	"# ans = 0\n",
	"# for chr in string:\n",
	"# ans = ans ^ ord(chr)\n",
	"# return ans\n",
	"\n",
	"def hash_str2(string):\n",
	" ans=int(ord(string[0]))\n",
	" for ix in range(1, len(string)):\n",
	" ans = ans ^ ord(string[ix]) \n",
	" return bin(ans)\n",
	"\n",
	"\n",
	"def hash_str3(string):\n",
	" ans = 0\n",
	" for chr in string:\n",
	" ans = ans * 128 + ord(chr)\n",
	" return ans\n",
	"\n",
	"\n",
	"def hash_str4(string):\n",
	" random.seed(ord(string[0]))\n",
	" return random.getrandbits(32)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "d323c68243f6d1fe79cbf5ad01ee85ae",
	"grade": false,
	"grade_id": "cell-66fb150dd2e509a7",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Question 2.\n",
	"Using the code, create 100,000 words of 10 characters each.\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 57,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "b9d2f1faad23eecd6f3691968bb4e455",
	"grade": true,
	"grade_id": "cell-234b1e83c871ddc7",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [],
	"source": [
	"word_list = [randomword(10) for _ in range(100000)]"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "190e7fdc00fc311f5ff32a390fff1ca4",
	"grade": false,
	"grade_id": "cell-2fbadca52cec7230",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Question 3.\n",
	"\n",
	"Create four chained hash-tables with 5000 slots."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 58,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "0d1473300ae336ceb57d11687ab3ec0a",
	"grade": true,
	"grade_id": "cell-2dabefc41493ec16",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [],
	"source": [
	"hash_table1 = empty_hash_table(5000)\n",
	"hash_table2 = empty_hash_table(5000)\n",
	"hash_table3 = empty_hash_table(5000)\n",
	"hash_table4 = empty_hash_table(5000)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "c3d147917ae237a6dc75eddab48dbaa6",
	"grade": false,
	"grade_id": "cell-cc2e6d00330e691e",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Question 4.\n",
	"\n",
	"Store all the words in each chained hash table using each of the different hash functions."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 59,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "af209dceb17bf5f8d9fadf0808855f79",
	"grade": true,
	"grade_id": "cell-8a08ade8d45590a7",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [],
	"source": [
	"for word in word_list:\n",
	" add_to_hash_table(hash_table1, word, hash_str1)\n",
	" add_to_hash_table(hash_table2, word, hash_str2)\n",
	" add_to_hash_table(hash_table3, word, hash_str3)\n",
	" add_to_hash_table(hash_table4, word, hash_str4)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "858437d142a66b825fb0bebdb1c4d9fe",
	"grade": false,
	"grade_id": "cell-da2c8b42618fbde4",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Question 5.\n",
	"\n",
	"Measure the number of collisions for each hash function."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 60,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "156f57bc1c6c1a3e10ca1804aa205099",
	"grade": true,
	"grade_id": "cell-a968f9e14a0416c4",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"170\n",
	"30\n",
	"5000\n",
	"26\n"
	]
	}
	],
	"source": [
	"def collisions(hash_table):\n",
	" '''This function measures the number of slots in the hash table where collisions occur'''\n",
	" collision_counter = []\n",
	" for i in hash_table:\n",
	" if len(i) > 1:\n",
	" collision_counter.append(1)\n",
	" return len(collision_counter)\n",
	"\n",
	"print(collisions(hash_table1))\n",
	"print(collisions(hash_table2))\n",
	"print(collisions(hash_table3))\n",
	"print(collisions(hash_table4))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "ef47cabb10897284ca375490003e6b3d",
	"grade": false,
	"grade_id": "cell-435ba1cfb2d80447",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Question 6.\n",
	"\n",
	"For each of the hash functions, how many elements are in a bucket on average (if it is not empty)?\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 61,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "9f77d346c6e5ffbfb92ac0707c3c1d72",
	"grade": true,
	"grade_id": "cell-b4052afa2ee7c702",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"For the average bucket size in general:\n",
	"Bucket 1: 20.0\n",
	"Bucket 2: 20.0\n",
	"Bucket 3: 20.0\n",
	"Bucket 4: 20.0\n",
	"\n",
	"For the average bucket size (not counting empty buckets) in general:\n",
	"Bucket 1: 552.4861878453039\n",
	"Bucket 2: 3333.3333333333335\n",
	"Bucket 3: 20.0\n",
	"Bucket 4: 3846.153846153846\n"
	]
	}
	],
	"source": [
	"def ave_bucket_length(hash_table):\n",
	" '''This function measures the average number of elements per bucket in the hash table'''\n",
	" l = [] # List to store the length of each bucket\n",
	" for i in hash_table:\n",
	" l.append(len(i))\n",
	" return sum(l)/len(l)\n",
	"\n",
	"print('For the average bucket size in general:')\n",
	"print(f'Bucket 1: {ave_bucket_length(hash_table1)}')\n",
	"print(f'Bucket 2: {ave_bucket_length(hash_table2)}')\n",
	"print(f'Bucket 3: {ave_bucket_length(hash_table3)}')\n",
	"print(f'Bucket 4: {ave_bucket_length(hash_table4)}')\n",
	"print('')\n",
	"\n",
	"def ave_nonempty_bucket_length(hash_table):\n",
	" '''This function measures the average number of elements per non-empty bucket in the hash table'''\n",
	" l = [] # List to store the length of each bucket\n",
	" for i in hash_table:\n",
	" if len(i) != 0:\n",
	" l.append(len(i))\n",
	" return sum(l)/len(l)\n",
	"\n",
	"print('For the average bucket size (not counting empty buckets) in general:')\n",
	"print(f'Bucket 1: {ave_nonempty_bucket_length(hash_table1)}')\n",
	"print(f'Bucket 2: {ave_nonempty_bucket_length(hash_table2)}')\n",
	"print(f'Bucket 3: {ave_nonempty_bucket_length(hash_table3)}')\n",
	"print(f'Bucket 4: {ave_nonempty_bucket_length(hash_table4)}')"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "aaeb9bd55b7a86f8bc8079cbb7f8bca4",
	"grade": false,
	"grade_id": "cell-b86aef4dd22c236c",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Question 7.\n",
	"\n",
	"Time how long it takes to find elements that are in each hash table.\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 52,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "23927c46b8067f49ef10e9b4183f7680",
	"grade": true,
	"grade_id": "cell-0605f7dec7128414",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"The average time taken to find an element in hash table 1 is 0.00022217101536001793\n",
	"The average time taken to find an element in hash table 2 is 0.001443639153080003\n",
	"The average time taken to find an element in hash table 3 is 3.0076656109995383e-05\n",
	"The average time taken to find an element in hash table 4 is 0.002190336112360001\n"
	]
	}
	],
	"source": [
	"import timeit\n",
	"\n",
	"def find_elements(hash_table, hash_function):\n",
	" list_of_Trues = []\n",
	" for word in word_list:\n",
	" list_of_Trues.append(contains(hash_table, word, hash_function)) \n",
	" \n",
	"print(f'The average time taken to find an element in hash table 1 is {timeit.timeit(lambda: find_elements(hash_table1, hash_str1), number=1)/len(word_list)}')\n",
	"print(f'The average time taken to find an element in hash table 2 is {timeit.timeit(lambda: find_elements(hash_table2, hash_str2), number=1)/len(word_list)}')\n",
	"print(f'The average time taken to find an element in hash table 3 is {timeit.timeit(lambda: find_elements(hash_table3, hash_str3), number=1)/len(word_list)}')\n",
	"print(f'The average time taken to find an element in hash table 4 is {timeit.timeit(lambda: find_elements(hash_table4, hash_str4), number=1)/len(word_list)}')"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": false,
	"editable": false,
	"nbgrader": {
	"checksum": "b051093ca9a2faf1658679380bf29033",
	"grade": false,
	"grade_id": "cell-75db2c2e62cec090",
	"locked": true,
	"schema_version": 1,
	"solution": false
	}
	},
	"source": [
	"## Question 8.\n",
	"For each hash table, time how long it takes to find 10,000 elements that have not been stored."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 62,
	"metadata": {
	"deletable": false,
	"nbgrader": {
	"checksum": "99b2fa8c88000c2143cb00fac1f6aa5f",
	"grade": true,
	"grade_id": "cell-b846b45e1b98ae59",
	"locked": false,
	"points": 0,
	"schema_version": 1,
	"solution": true
	}
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"The time taken to find 10,000 elements that are not in hash table 1 is 0.046605653000369784\n",
	"The time taken to find 10,000 elements that are not in hash table 2 is 28.518868270000894\n",
	"The time taken to find 10,000 elements that are not in hash table 3 is 0.2729024959990056\n",
	"The time taken to find 10,000 elements that are not in hash table 4 is 39.04465117200016\n"
	]
	}
	],
	"source": [
	"import random\n",
	"import timeit\n",
	"\n",
	"non_word_list = ['aaaaaaaaaa' for _ in range(10000)]\n",
	"\n",
	"def find_elements(hash_table, hash_function):\n",
	" list_of_Falses = []\n",
	" for word in non_word_list:\n",
	" list_of_Falses.append(contains(hash_table, word, hash_function)) \n",
	" \n",
	"print(f'The time taken to find 10,000 elements that are not in hash table 1 is {timeit.timeit(lambda: find_elements(hash_table1, hash_str1), number=1)}')\n",
	"print(f'The time taken to find 10,000 elements that are not in hash table 2 is {timeit.timeit(lambda: find_elements(hash_table2, hash_str2), number=1)}')\n",
	"print(f'The time taken to find 10,000 elements that are not in hash table 3 is {timeit.timeit(lambda: find_elements(hash_table3, hash_str3), number=1)}')\n",
	"print(f'The time taken to find 10,000 elements that are not in hash table 4 is {timeit.timeit(lambda: find_elements(hash_table4, hash_str4), number=1)}')"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.7.3"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}