Some things journalists may want to consider:
1. Anecdotes can mislead. People seeing yet another episodic story on crime may infer that crime is increasing.
So report numbers where trustworthy numerical data are available.
2. But numbers need to be reported carefully. Most people, when reading news, do not do back-of-the-envelope calculations to interpret data correctly.
So ill-reported numbers can mislead.
3. Rules for numbers:
a. % changes vs. changes in %. The former looks more impressive when the base rate is low. The latter is generally a better way to report things. If confused, report t1 and t2.
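The difference in rule 3a can be made concrete with a small sketch. The rates below (1% at t1, 2% at t2) are hypothetical numbers chosen for illustration, and the function names are not from the original notes.

```python
def percent_change(t1, t2):
    """Relative change from t1 to t2, expressed as a percentage of t1."""
    return (t2 - t1) / t1 * 100


def point_change(t1, t2):
    """Change in percentage points (both inputs are already percentages)."""
    return t2 - t1


# Hypothetical rates: 1% at t1 and 2% at t2 (a low base rate).
t1, t2 = 1.0, 2.0
print(f"t1 = {t1}%, t2 = {t2}%")
print(f"% change: {percent_change(t1, t2):.0f}%")
print(f"change in %: {point_change(t1, t2):.1f} percentage points")
```

With a low base rate, the same move reads as a 100% increase or as a one-point increase, which is why reporting t1 and t2 directly is the safe fallback.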
'''
Text from Searchable pdfs
Scrape Text off Wisconsin Ads pdfs
Uses pyPdf to get text from searchable pdfs. The script is tailored for getting data
from the Wisconsin Political Ads Database: http://wiscadproject.wisc.edu/Storyboards.
@author: Gaurav Sood
Created on November 02, 2011
'''
Basic Sentiment Analysis
Builds on:
https://finnaarupnielsen.wordpress.com/2011/06/20/simplest-sentiment-analysis-in-python-with-af/
Utilizes AFINN or a custom sentiment db
Example Snippets at end from: https://code.google.com/p/sentana/wiki/ExampleSentiments
''' |
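A minimal sketch of word-list sentiment scoring in the spirit of the description above. The lexicon here is a toy stand-in with made-up entries, not the actual AFINN file, and the scoring rule (a plain sum of word valences) is an assumption rather than the script's confirmed method.

```python
import re

# Toy lexicon in the AFINN style (word -> integer valence).
# These entries are illustrative only and are not copied from AFINN.
TOY_LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3}


def sentiment(text, lexicon=TOY_LEXICON):
    """Sum the valence of each known word; unknown words contribute 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


print(sentiment("A good day, a great day"))
print(sentiment("A terrible idea"))
```

Swapping in a custom sentiment db would only require passing a different word-to-score dictionary as `lexicon`.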
Note:
55000/(365*4) ~ 37.7 per day. That seems a touch low for a Secretary of State.
Caveats:
1. Clinton may have used more than one private server.
2. Clinton may have sent emails from other servers to unofficial accounts of other State Department employees.
Lower bound for missing emails from Clinton:
Take a small weighted random sample (weighting seniority more) of top State Department employees.
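The back-of-the-envelope figure in the note can be checked directly. This only reproduces the arithmetic as written (55000 spread over four 365-day years); the variable names are illustrative.

```python
total = 55_000   # count used in the note
days = 365 * 4   # four years, ignoring leap days, as in the note
per_day = total / days
print(round(per_day, 1))
```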
'''
Gets Congressional speech text, arranged by speaker.
Produces a csv (capitolwords.csv) with the following columns:
speaker_state,speaker_raw,speaker_first,congress,title,origin_url,number,id,volume,chamber,session,speaker_last,
pages,speaker_party,date,bills,bioguide_id,order,speaking,capitolwords_url
Uses the Sunlight Foundation library: http://python-sunlight.readthedocs.org/en/latest/
''' |
'''
What does it do?
Goes through a corrupted csv sequentially and outputs rows that are clean.
Also outputs the total number of rows and the number of corrupted rows.
@author: Gaurav Sood
Run: python salvage_csv.py input_csv output_csv
''' |
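A minimal sketch of the salvage pass described above. It assumes that a "clean" row is one with the same number of fields as the header; the original script may use a different test, and the function name `salvage` is hypothetical.

```python
import csv
import io


def salvage(in_file, out_file):
    """Copy rows whose field count matches the header; return (total, corrupted)."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    header = next(reader, None)
    if header is None:  # empty input: nothing to salvage
        return 0, 0
    writer.writerow(header)
    total = corrupted = 0
    for row in reader:
        total += 1
        if len(row) == len(header):
            writer.writerow(row)
        else:
            corrupted += 1
    return total, corrupted


# In-memory example: two of the four data rows have the wrong field count.
source = io.StringIO("a,b\n1,2\n3\n4,5,6\n7,8\n")
cleaned = io.StringIO()
print(salvage(source, cleaned))
print(cleaned.getvalue())
```

For files on disk, the same function can be called with handles opened via `open(path, newline="")`, which is what the `csv` module expects.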
"
Basic Text Classifier
- Takes a csv with a text column and a column of labels
- Splits into train and test
- Preprocesses text using tm/bag-of-words, 1/2-order Markov
- Uses SVM and Lasso
@author: Gaurav Sood
"
"
Weighting by Propensity Scores
Last Edited: 5/31/2015
Task Outline:
1. Two datasets:
dataset 1: large, population-representative sample
dataset 2: convenience sample
2. Create weights for dataset 2 so that its marginals are close to dataset 1 on some vars.
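The idea in step 2 can be sketched for the simplest case of a single categorical variable. This sketch estimates the propensity from cell frequencies instead of a fitted model such as logistic regression, so it is a simplification and not the original script's method; the data and function names are hypothetical.

```python
from collections import Counter


def propensity_weights(reference, convenience):
    """Weight each convenience-sample unit by (1 - p) / p, where
    p = P(unit is in the convenience sample | x), estimated per category."""
    n_ref = Counter(reference)
    n_conv = Counter(convenience)
    weights = []
    for x in convenience:
        p = n_conv[x] / (n_conv[x] + n_ref[x])
        weights.append((1 - p) / p)
    return weights


def weighted_marginals(values, weights):
    """Weighted share of each category."""
    totals = Counter()
    for x, w in zip(values, weights):
        totals[x] += w
    grand_total = sum(totals.values())
    return {x: t / grand_total for x, t in totals.items()}


# Hypothetical data: dataset 1 is 60% "a", dataset 2 is only 20% "a".
dataset1 = ["a"] * 60 + ["b"] * 40
dataset2 = ["a"] * 20 + ["b"] * 80
w = propensity_weights(dataset1, dataset2)
print(weighted_marginals(dataset2, w))
```

With one categorical variable the weighted marginals of dataset 2 match dataset 1 up to floating-point error; with several variables a fitted propensity model would only bring them close, as the outline says.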
sudo apt-get update
sudo apt-get upgrade
sudo aptitude install emacs24
sudo aptitude install r-base
sudo aptitude install libcurl4-openssl-dev
sudo aptitude install libxml2-dev
sudo apt-get install openjdk-7-*
R CMD javareconf -e
Name | District | Education | Science | Law
---|---|---|---|---
Jeff Sessions (R) | AL-Senate | B.A., Huntingdon College; J.D. University of Alabama School of Law | 1 |
Richard Shelby (R) | AL-Senate | B.A., University of Alabama; J.D. University of Alabama School of Law | 1 |
Jo Bonner (R) | AL-1 | B.A. Journalism, University of Alabama | 0 |
Bobby Bright (D) | AL-2 | B.A. Political Science, Auburn University; M.S. Criminal Justice, Troy State University; J.D. Thomas Goode Jones School of Law | 1 |
Mike Rogers (R) | AL-3 | B.A., Political Science; M.P.A., Jackson State University; J.D. Birmingham School of Law | 1 |
Robert Aderholt (R) | AL-4 | B.A., Political Science/History, Birmingham Southern College; J.D., Samford University | 1 |
Parker Griffith (D) | AL-5 | B.S.; M.D., Louisiana State University | 0 |
Spencer Bachus (R) | AL-6 | B.A., Auburn University; J.D., University of Alabama | 1 |
Artur Davis (D) | AL-7 | B.A., Government, Harvard University; J.D., Harvard University School of Law | 1 |