- Full text search
- Map Reduce
- Graph searching and traversal
- Machine Learning/statistics
- NLP? (for text)
- word counts
- basic statistics
- averages
- standard deviations
- regression
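
As a starting point for the word count and basic statistics items above, here is a minimal sketch using only the Python standard library. The function names and the sample sentence are made up for illustration; a real run would read one of the datasets instead.

```python
import re
import statistics
from collections import Counter


def word_counts(text):
    """Count occurrences of each lowercase word in a string."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def basic_stats(values):
    """Return the mean and population standard deviation of some numbers."""
    return statistics.mean(values), statistics.pstdev(values)


# Hypothetical sample text, not taken from any of the datasets.
sample = "the quick brown fox jumps over the lazy dog the end"

counts = word_counts(sample)
mean_len, std_len = basic_stats([len(w) for w in sample.split()])

print(counts.most_common(3))
print(mean_len, std_len)
```

On a large corpus the same counting step is what a map reduce job would distribute: each mapper counts words in its chunk and the reducer sums the per-word totals.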
(http://aws.amazon.com/datasets)
- All the books on Project Gutenberg (http://www.gutenberg.org)
- Could be very good to work with for full text searching
- word counting and frequency analysis?
- All of Wikipedia's content
- Most commonly appearing 'red links' (links to topics that do not have articles yet)
- Common Crawl Corpus (http://aws.amazon.com/datasets/41740)
- ...search engine?
- Marvel Universe Social Graph (http://aws.amazon.com/datasets/5621954952932508)
- Social graph dataset would be very conducive to graph searching and Neo4j
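
To give a feel for graph searching on a social graph, here is a small breadth-first search sketch in plain Python. The graph below is invented for illustration (the node names are placeholders, not rows from the Marvel dataset), and a graph database such as Neo4j would normally handle this kind of query itself.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical undirected graph: an edge means two characters appeared together.
graph = {
    "hero_a": ["hero_b", "hero_c"],
    "hero_b": ["hero_a", "hero_d"],
    "hero_c": ["hero_a", "hero_d"],
    "hero_d": ["hero_b", "hero_c", "hero_e"],
    "hero_e": ["hero_d"],
}

path = shortest_path(graph, "hero_a", "hero_e")
print(path)
```

Breadth-first search visits nodes in order of distance from the start, so the first path that reaches the goal is one with the fewest hops.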
- ElasticSearch/Solr/Lucene
- Hadoop/HDFS
- Neo4j
- First meeting: 9/12, intro
- Second meeting: 9/26, intro to full text search
- Third meeting: 10/10, full text search workshop
- Fourth meeting: 10/24, map reduce
- Fifth meeting: 11/7, map reduce
- Sixth meeting: 11/21, map reduce
- Seventh meeting: 12/5, last meeting? questions? graphs?