@viniciusmss
Created December 30, 2019 14:51
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**1. Load the \"Matching\" library.**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:03.698041Z",
"start_time": "2019-12-30T14:49:02.701Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Loading required package: MASS\n",
"## \n",
"## Matching (Version 4.9-3, Build Date: 2018-05-03)\n",
"## See http://sekhon.berkeley.edu/matching for additional documentation.\n",
"## Please cite software as:\n",
"## Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching\n",
"## Software with Automated Balance Optimization: The Matching package for R.''\n",
"## Journal of Statistical Software, 42(7): 1-52. \n",
"##\n",
"\n"
]
}
],
"source": [
"library(Matching)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2. Load the lalonde data set into working memory.**"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:05.378258Z",
"start_time": "2019-12-30T14:49:05.338Z"
}
},
"outputs": [],
"source": [
"data(lalonde)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**3. What are the dimensions of the data set?**"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:05.741347Z",
"start_time": "2019-12-30T14:49:05.713Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>445</li>\n",
"\t<li>12</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 445\n",
"\\item 12\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 445\n",
"2. 12\n",
"\n",
"\n"
],
"text/plain": [
"[1] 445 12"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dim(lalonde)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**4. What are the names of the columns?**"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:06.127030Z",
"start_time": "2019-12-30T14:49:06.100Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>'age'</li>\n",
"\t<li>'educ'</li>\n",
"\t<li>'black'</li>\n",
"\t<li>'hisp'</li>\n",
"\t<li>'married'</li>\n",
"\t<li>'nodegr'</li>\n",
"\t<li>'re74'</li>\n",
"\t<li>'re75'</li>\n",
"\t<li>'re78'</li>\n",
"\t<li>'u74'</li>\n",
"\t<li>'u75'</li>\n",
"\t<li>'treat'</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'age'\n",
"\\item 'educ'\n",
"\\item 'black'\n",
"\\item 'hisp'\n",
"\\item 'married'\n",
"\\item 'nodegr'\n",
"\\item 're74'\n",
"\\item 're75'\n",
"\\item 're78'\n",
"\\item 'u74'\n",
"\\item 'u75'\n",
"\\item 'treat'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'age'\n",
"2. 'educ'\n",
"3. 'black'\n",
"4. 'hisp'\n",
"5. 'married'\n",
"6. 'nodegr'\n",
"7. 're74'\n",
"8. 're75'\n",
"9. 're78'\n",
"10. 'u74'\n",
"11. 'u75'\n",
"12. 'treat'\n",
"\n",
"\n"
],
"text/plain": [
" [1] \"age\" \"educ\" \"black\" \"hisp\" \"married\" \"nodegr\" \"re74\" \n",
" [8] \"re75\" \"re78\" \"u74\" \"u75\" \"treat\" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"names(lalonde)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**5. How many different variable types are represented in this data set?**"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:06.869218Z",
"start_time": "2019-12-30T14:49:06.488Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"integer \n",
"integer \n",
"integer \n",
"integer \n",
"integer \n",
"integer \n",
"numeric \n",
"numeric \n",
"numeric \n",
"integer \n",
"integer \n",
"integer \n"
]
},
{
"data": {
"text/html": [
"2"
],
"text/latex": [
"2"
],
"text/markdown": [
"2"
],
"text/plain": [
"[1] 2"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"### To see the different variable types:\n",
"for(i in 1:length(names(lalonde))) {cat( class(lalonde[,i]), \"\\n\")}\n",
"\n",
"### To count the unique types:\n",
"length(unique(sapply(lalonde, class)))\n",
"\n",
"### The answer is \"2\" -- there are two variable types.\n",
"\n",
"### To see the help file for the \"sapply\" function, type:\n",
"?sapply"
]
},
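{
"cell_type": "markdown",
"metadata": {},
"source": [
"A more compact view of the same information, offered as a sketch rather than as part of the original answer: `table()` applied to the `sapply()` result counts how many columns fall under each class. The name `column_classes` is introduced here purely for illustration.\n",
"\n",
"```r\n",
"# Class of every column, as a named character vector.\n",
"column_classes <- sapply(lalonde, class)\n",
"\n",
"# Number of columns per class.\n",
"table(column_classes)\n",
"\n",
"# Number of distinct classes.\n",
"length(unique(column_classes))\n",
"```\n",
"\n",
"Judging by the loop output above, the table should report 9 integer columns and 3 numeric columns."
]
},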
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**6. What's the maximum value of the re74 column? (re74 indicates the person's real earnings in 1974)**"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:07.230251Z",
"start_time": "2019-12-30T14:49:06.864Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"39570.7"
],
"text/latex": [
"39570.7"
],
"text/markdown": [
"39570.7"
],
"text/plain": [
"[1] 39570.7"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"max(lalonde['re74'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**7. What's the minimum value of this column?**"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:07.610234Z",
"start_time": "2019-12-30T14:49:07.312Z"
},
"code_folding": []
},
"outputs": [
{
"data": {
"text/html": [
"0"
],
"text/latex": [
"0"
],
"text/markdown": [
"0"
],
"text/plain": [
"[1] 0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"0"
],
"text/latex": [
"0"
],
"text/markdown": [
"0"
],
"text/plain": [
"[1] 0"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"min(lalonde['re74'])\n",
"\n",
"### or\n",
"\n",
"min(lalonde$re74)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**8. How many of the elements of this column are equal to zero?**"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:08.037698Z",
"start_time": "2019-12-30T14:49:08.001Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"326"
],
"text/latex": [
"326"
],
"text/markdown": [
"326"
],
"text/plain": [
"[1] 326"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"326"
],
"text/latex": [
"326"
],
"text/markdown": [
"326"
],
"text/plain": [
"[1] 326"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sum(lalonde['re74'] == 0)\n",
"\n",
"### or\n",
"\n",
"length(which(lalonde$re74 == 0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**9. How many elements of this column are less than \\$15000 OR greater than \\$20000?**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:08.917789Z",
"start_time": "2019-12-30T14:49:08.889Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"440"
],
"text/latex": [
"440"
],
"text/markdown": [
"440"
],
"text/plain": [
"[1] 440"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sum((lalonde['re74'] < 15000) | (lalonde['re74'] > 20000))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**10. How many people in this data set are married and have more than 8 years of education (\"educ\")?**"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:12.528204Z",
"start_time": "2019-12-30T14:49:12.501Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"68"
],
"text/latex": [
"68"
],
"text/markdown": [
"68"
],
"text/plain": [
"[1] 68"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sum((lalonde$married == 1) & (lalonde$educ > 8))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**11. What is the interquartile range of \"re78\" (real earnings in 1978)? Use the \"quantile\" function.**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:13.050673Z",
"start_time": "2019-12-30T14:49:13.014Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<strong>75%:</strong> 8124.72"
],
"text/latex": [
"\\textbf{75\\textbackslash{}\\%:} 8124.72"
],
"text/markdown": [
"**75%:** 8124.72"
],
"text/plain": [
" 75% \n",
"8124.72 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<strong>25%:</strong> 0"
],
"text/latex": [
"\\textbf{25\\textbackslash{}\\%:} 0"
],
"text/markdown": [
"**25%:** 0"
],
"text/plain": [
"25% \n",
" 0 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"### upper-bound of the interquartile range:\n",
"quantile(lalonde$re78, 0.75)\n",
"\n",
"### lower-bound of the interquartile range:\n",
"quantile(lalonde$re78, 0.25)"
]
},
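{
"cell_type": "markdown",
"metadata": {},
"source": [
"The interquartile range itself is the difference between these two bounds, here 8124.72 - 0 = 8124.72. As a small sketch (not part of the original answer), the same number can be computed directly. `IQR()` is base R's built-in shortcut, and `quartiles` is a name chosen here for illustration.\n",
"\n",
"```r\n",
"# Both quartiles in one call.\n",
"quartiles <- quantile(lalonde$re78, c(0.25, 0.75))\n",
"\n",
"# Interquartile range as upper bound minus lower bound.\n",
"unname(diff(quartiles))\n",
"\n",
"# Built-in equivalent, which uses the same default quantile definition.\n",
"IQR(lalonde$re78)\n",
"```"
]
},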
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**12. Create a scatterplot, with re74 on the x-axis, and re78 on the y-axis. Label the axes. Draw a regression line if you wish (and choose a fun color).**"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:13.661064Z",
"start_time": "2019-12-30T14:49:13.552Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Scatterplot of re78 (y-axis, 'Earnings in 1978') against re74 (x-axis, 'Earnings in 1974') with a magenta regression line"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"options(repr.plot.width = 4, repr.plot.height = 4)\n",
"plot(lalonde$re74, lalonde$re78, xlab = 'Earnings in 1974', ylab = 'Earnings in 1978')\n",
"abline(lm(lalonde$re78 ~ lalonde$re74), lwd=3, col = \"magenta\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**13. Make a function with a single argument (column number) that outputs the median of that column. Advanced: if the user specifies a non-numeric column, then the function returns an error message.**"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:14.667317Z",
"start_time": "2019-12-30T14:49:14.602Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"0"
],
"text/latex": [
"0"
],
"text/markdown": [
"0"
],
"text/plain": [
"[1] 0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"0"
],
"text/latex": [
"0"
],
"text/markdown": [
"0"
],
"text/plain": [
"[1] 0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3701.81"
],
"text/latex": [
"3701.81"
],
"text/markdown": [
"3701.81"
],
"text/plain": [
"[1] 3701.81"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"1"
],
"text/latex": [
"1"
],
"text/markdown": [
"1"
],
"text/plain": [
"[1] 1"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"1"
],
"text/latex": [
"1"
],
"text/markdown": [
"1"
],
"text/plain": [
"[1] 1"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3701.81"
],
"text/latex": [
"3701.81"
],
"text/markdown": [
"3701.81"
],
"text/plain": [
"[1] 3701.81"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"median_lalonde_column <-\n",
" function(column)\n",
" {\n",
" if (is.numeric(lalonde[, column])) \n",
" {return(median(lalonde[, column]))} \n",
" else return('Error')\n",
" }\n",
"\n",
"### How you would use the function...\n",
"median_lalonde_column(column = 7)\n",
"\n",
"median_lalonde_column(column = 8)\n",
"\n",
"median_lalonde_column(column = 9)\n",
"\n",
"median_lalonde_column(column = 10)\n",
"\n",
"median_lalonde_column(10)\n",
"\n",
"median_lalonde_column(9)\n"
]
},
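{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function above returns the string `'Error'` for a non-numeric column, so a caller could mistake it for a result. One possible variation, sketched here under the assumption that a real R error is wanted, uses `stop()` instead. The name `median_lalonde_column_strict` is hypothetical. Because every column of `lalonde` is integer or numeric, the error branch is not triggered by this data set.\n",
"\n",
"```r\n",
"# Hypothetical stricter variant: raise an error instead of returning a string.\n",
"median_lalonde_column_strict <- function(column) {\n",
"  values <- lalonde[, column]\n",
"  if (!is.numeric(values)) {\n",
"    stop('Column ', column, ' is not numeric.')\n",
"  }\n",
"  median(values)\n",
"}\n",
"\n",
"# Same call pattern as before; column 9 is re78.\n",
"median_lalonde_column_strict(9)\n",
"```\n",
"\n",
"For column 9 this should agree with the 3701.81 returned by the original function."
]
},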
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**14. Run a univariate regression, with \"age\" as a predictor (x variable), re75 as outcome (the \"y\"). Interpret the 2 coefficients (of the intercept, and the x variable).**"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:49:15.492351Z",
"start_time": "2019-12-30T14:49:15.464Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"Call:\n",
"lm(formula = lalonde$re75 ~ lalonde$age)\n",
"\n",
"Coefficients:\n",
"(Intercept) lalonde$age \n",
" 784.62 23.35 \n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"lm(formula = lalonde$re75 ~ lalonde$age)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The regression coefficients identify the following linear model:\n",
"\n",
"$$\\text{re75} = 784.62 + 23.35 \\times \\text{age}$$\n",
"\n",
"Interpreted literally, the intercept means that the model predicts real earnings of \\$784.62 in 1975 for someone with an age of zero. \n",
"The coefficient on \"age\" means that every additional year of age is associated with \\$23.35 more in predicted real earnings in 1975. For example, the model predicts that someone 50 years old would earn \\$784.62 + \\$1167.50 = \\$1952.12 in 1975, where \\$1167.50 is 50 times \\$23.35."
]
},
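{
"cell_type": "markdown",
"metadata": {},
"source": [
"The hand calculation for a 50-year-old can also be reproduced with `predict()`. This is a sketch that assumes the model is refit with the `data` argument (the name `lm1` is introduced here), because `predict()` matches `newdata` columns by variable name and a formula written with `lalonde$age` does not expose a plain `age` variable.\n",
"\n",
"```r\n",
"# Same regression as above, written with the data argument.\n",
"lm1 <- lm(re75 ~ age, data = lalonde)\n",
"\n",
"# Predicted 1975 earnings for a hypothetical 50-year-old.\n",
"predict(lm1, newdata = data.frame(age = 50))\n",
"\n",
"# Equivalent arithmetic from the fitted coefficients.\n",
"unname(coef(lm1)['(Intercept)'] + 50 * coef(lm1)['age'])\n",
"```\n",
"\n",
"Both values should be close to the 1952.12 computed by hand; small differences come from rounding the coefficients to two decimals."
]
},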
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**15. Run a regression with 2 predictors, \"age\" and \"educ\", with re75 as the outcome (the \"y\").**"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:51:00.097356Z",
"start_time": "2019-12-30T14:51:00.057Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"Call:\n",
"lm(formula = lalonde$re75 ~ lalonde$age + lalonde$educ)\n",
"\n",
"Residuals:\n",
" Min 1Q Median 3Q Max \n",
"-2073.4 -1403.5 -1209.8 -67.1 23553.9 \n",
"\n",
"Coefficients:\n",
" Estimate Std. Error t value Pr(>|t|)\n",
"(Intercept) 348.95 1006.16 0.347 0.729\n",
"lalonde$age 23.10 21.08 1.096 0.274\n",
"lalonde$educ 43.36 83.51 0.519 0.604\n",
"\n",
"Residual standard error: 3153 on 442 degrees of freedom\n",
"Multiple R-squared: 0.003377,\tAdjusted R-squared: -0.001132 \n",
"F-statistic: 0.7489 on 2 and 442 DF, p-value: 0.4735\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"lm2 <- lm(lalonde$re75 ~ lalonde$age + lalonde$educ)\n",
"summary(lm2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The regression coefficients identify the following linear model:\n",
"\n",
"$$ \\text{re75} = 348.95 + 23.10 \\times \\text{age} + 43.36 \\times \\text{educ}$$\n",
"\n",
"Interpreted literally, the intercept means that the model predicts real earnings of \\$348.95 in 1975 for someone with an age of zero and zero years of education. The coefficient on \"age\" means that, holding education fixed, every additional year of age is associated with \\$23.10 more in predicted real earnings in 1975. Likewise, holding age fixed, every additional year of education is associated with \\$43.36 more. For example, the model predicts that someone 50 years old with 12 years of education would earn \\$348.95 + \\$1155 + \\$520.32 = \\$2024.27 in 1975.\n",
"\n",
"If you'd like to see the predicted values for every person in the data set (given this regression model), just type:\n",
"```predict(lm2)```"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-30T14:51:00.801466Z",
"start_time": "2019-12-30T14:51:00.574Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead><tr><th scope=col>age</th><th scope=col>educ</th><th scope=col>black</th><th scope=col>hisp</th><th scope=col>married</th><th scope=col>nodegr</th><th scope=col>re74</th><th scope=col>re75</th><th scope=col>re78</th><th scope=col>u74</th><th scope=col>u75</th><th scope=col>treat</th><th scope=col>predict(lm2)</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><td>37 </td><td>11 </td><td>1 </td><td>0 </td><td>1 </td><td>1 </td><td>0 </td><td>0 </td><td> 9930.05</td><td>1 </td><td>1 </td><td>1 </td><td>1680.688</td></tr>\n",
"\t<tr><td>22 </td><td> 9 </td><td>0 </td><td>1 </td><td>0 </td><td>1 </td><td>0 </td><td>0 </td><td> 3595.89</td><td>1 </td><td>1 </td><td>1 </td><td>1247.429</td></tr>\n",
"\t<tr><td>30 </td><td>12 </td><td>1 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>24909.50</td><td>1 </td><td>1 </td><td>1 </td><td>1562.325</td></tr>\n",
"\t<tr><td>27 </td><td>11 </td><td>1 </td><td>0 </td><td>0 </td><td>1 </td><td>0 </td><td>0 </td><td> 7506.15</td><td>1 </td><td>1 </td><td>1 </td><td>1449.659</td></tr>\n",
"\t<tr><td>33 </td><td> 8 </td><td>1 </td><td>0 </td><td>0 </td><td>1 </td><td>0 </td><td>0 </td><td> 289.79</td><td>1 </td><td>1 </td><td>1 </td><td>1458.204</td></tr>\n",
"\t<tr><td>22 </td><td> 9 </td><td>1 </td><td>0 </td><td>0 </td><td>1 </td><td>0 </td><td>0 </td><td> 4056.49</td><td>1 </td><td>1 </td><td>1 </td><td>1247.429</td></tr>\n",
"</tbody>\n",
"</table>\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllllllllll}\n",
" age & educ & black & hisp & married & nodegr & re74 & re75 & re78 & u74 & u75 & treat & predict(lm2)\\\\\n",
"\\hline\n",
"\t 37 & 11 & 1 & 0 & 1 & 1 & 0 & 0 & 9930.05 & 1 & 1 & 1 & 1680.688\\\\\n",
"\t 22 & 9 & 0 & 1 & 0 & 1 & 0 & 0 & 3595.89 & 1 & 1 & 1 & 1247.429\\\\\n",
"\t 30 & 12 & 1 & 0 & 0 & 0 & 0 & 0 & 24909.50 & 1 & 1 & 1 & 1562.325\\\\\n",
"\t 27 & 11 & 1 & 0 & 0 & 1 & 0 & 0 & 7506.15 & 1 & 1 & 1 & 1449.659\\\\\n",
"\t 33 & 8 & 1 & 0 & 0 & 1 & 0 & 0 & 289.79 & 1 & 1 & 1 & 1458.204\\\\\n",
"\t 22 & 9 & 1 & 0 & 0 & 1 & 0 & 0 & 4056.49 & 1 & 1 & 1 & 1247.429\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"age | educ | black | hisp | married | nodegr | re74 | re75 | re78 | u74 | u75 | treat | predict(lm2) | \n",
"|---|---|---|---|---|---|\n",
"| 37 | 11 | 1 | 0 | 1 | 1 | 0 | 0 | 9930.05 | 1 | 1 | 1 | 1680.688 | \n",
"| 22 | 9 | 0 | 1 | 0 | 1 | 0 | 0 | 3595.89 | 1 | 1 | 1 | 1247.429 | \n",
"| 30 | 12 | 1 | 0 | 0 | 0 | 0 | 0 | 24909.50 | 1 | 1 | 1 | 1562.325 | \n",
"| 27 | 11 | 1 | 0 | 0 | 1 | 0 | 0 | 7506.15 | 1 | 1 | 1 | 1449.659 | \n",
"| 33 | 8 | 1 | 0 | 0 | 1 | 0 | 0 | 289.79 | 1 | 1 | 1 | 1458.204 | \n",
"| 22 | 9 | 1 | 0 | 0 | 1 | 0 | 0 | 4056.49 | 1 | 1 | 1 | 1247.429 | \n",
"\n",
"\n"
],
"text/plain": [
" age educ black hisp married nodegr re74 re75 re78 u74 u75 treat\n",
"1 37 11 1 0 1 1 0 0 9930.05 1 1 1 \n",
"2 22 9 0 1 0 1 0 0 3595.89 1 1 1 \n",
"3 30 12 1 0 0 0 0 0 24909.50 1 1 1 \n",
"4 27 11 1 0 0 1 0 0 7506.15 1 1 1 \n",
"5 33 8 1 0 0 1 0 0 289.79 1 1 1 \n",
"6 22 9 1 0 0 1 0 0 4056.49 1 1 1 \n",
" predict(lm2)\n",
"1 1680.688 \n",
"2 1247.429 \n",
"3 1562.325 \n",
"4 1449.659 \n",
"5 1458.204 \n",
"6 1247.429 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Or, to see both the data and the predictions, we \"column bind\" the dataset to the output of predict().\n",
"head(cbind(lalonde, predict(lm2)))"
]
}
],
"metadata": {
"gist": {
"data": {
"description": "Quiz 1.ipynb",
"public": true
},
"id": ""
},
"kernelspec": {
"display_name": "R [conda env:renv]",
"language": "R",
"name": "conda-env-renv-r"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}