MLP:
svm - https://en.wikipedia.org/wiki/Support_vector_machine
neural network - https://www.quora.com/What-is-a-simple-explanation-of-how-artificial-neural-networks-work-1/answer/Annalyn-Ng?srid=XpXu
 - https://www.quora.com/What-is-a-simple-explanation-of-how-artificial-neural-networks-work-1/answer/Chris-Nicholson-1?srid=XpXu
perceptron model
k-means
naive bayes -
https://web.stanford.edu/class/cs124/lec/naivebayes.pdf
http://stackoverflow.com/questions/10059594/a-simple-explanation-of-naive-bayes-classification
http://deeplearning4j.org/sentiment_analysis_word2vec.html
Supervised - Machine learning based
Unsupervised - Lexicon based
English lang:
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar - lexicon sentiment dictionary
https://sites.google.com/site/datascienceslab/projects/multilingualsentiment
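As a rough illustration of the lexicon-based (unsupervised) approach, here is a minimal sketch that counts positive and negative words. The two tiny word sets and the function name `lexicon_sentiment` are made up for this example; a real lexicon such as the opinion lexicon linked above has thousands of entries, and this sketch ignores negation and intensity.

```python
import re

# Hypothetical miniature lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def lexicon_sentiment(text):
    """Return (label, score), where score = positive hits - negative hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score
```

No training data is needed here, which is why this approach is classed as unsupervised; the quality depends entirely on the lexicon.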
import gearman

gm_worker = gearman.GearmanWorker(['localhost:4730'])

def task_listener_reverse(gearman_worker, gearman_job):
    # Reverse the job payload and return it as the job result.
    print('Reversing string: ' + gearman_job.data)
    return gearman_job.data[::-1]

# gm_worker.set_client_id is optional
import json
import gearman
import time
import sys

def check_request_status(job_request):
    # Report the outcome of a submitted Gearman job request.
    if job_request.complete:
        #print len(job_request.result)
        #data = json.loads(job_request.result)
        print("Job %s finished! Result: %s - %s" % (job_request.job.unique, job_request.state, job_request.result))
    elif job_request.timed_out:
http://scrapmaker.com/home
http://www.momswhothink.com/reading/list-of-verbs.html
Big data is a combination of several subjects. It mainly requires programming, analysis, NLP, MLP, and mathematics.
To see links, go to: http://www.quora.com/What-are-some-good-sources-to-learn-big-data
Here are a bunch of courses I came across:
Introduction to CS Course
Notes: An introductory computer science course that teaches coding.
Online Resources:
Udacity - intro to CS course,
Coursera - Computer Science 101
https://www.cia.gov/library/publications/download/
http://flowingdata.com/2009/10/01/30-resources-to-find-the-data-you-need/
Freebase - a large collaborative knowledge base consisting of metadata composed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual 'wiki' contributions.
-> The MQL Read and MQL Write APIs provide access to the Freebase database using the Metaweb Query Language (MQL).
DBpedia - (from "DB" for "database") a project aiming to extract structured content from the information created as part of the Wikipedia project. This structured information is then made available on the World Wide Web.[1] DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.
-> Data is accessed using an SQL-like query language for RDF called SPARQL. For example, imagine you were interested in the Japanese shōjo manga series Tokyo Mew Mew, and wanted to find the genres of other works written by its illustrator. DBpedia combines information from Wikipedia's entries on
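A query of this kind could be prepared for DBpedia's public SPARQL endpoint roughly as follows. This is only a sketch: the property names `dbo:illustrator`, `dbo:author` and `dbo:genre` are assumptions chosen for illustration and may not match how DBpedia actually models this series, and the helper `build_query_url` is hypothetical. The code only builds the request URL and makes no network call.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Public DBpedia SPARQL endpoint (assumed reachable at this address).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: these dbo: properties are illustrative, not verified
# against the live DBpedia data for Tokyo Mew Mew.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?who ?work ?genre WHERE {
  dbr:Tokyo_Mew_Mew dbo:illustrator ?who .
  ?work dbo:author ?who .
  OPTIONAL { ?work dbo:genre ?genre }
}
""".strip()

def build_query_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = [
        ("query", query),
        ("format", "application/sparql-results+json"),
    ]
    return endpoint + "?" + urlencode(params)

url = build_query_url(QUERY)
```

Fetching `url` with any HTTP client would return the result bindings as JSON, if the endpoint accepts these parameters as assumed.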
An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.
Example: character recognition
Train the system on 10 samples of each digit.
While learning, we collect the pixel positions for each digit.
For example, when learning the digit '1', we collect the pixel positions from each of its 10 samples and store their normalized mean value for digit 1.
Now suppose a new digit arrives and we want to identify it. We calculate the Euclidean distance from the input digit to every learned digit in the database. The digit with the smallest Euclidean distance is the prediction.
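The steps above can be sketched as follows. Note that what the example describes is nearest-mean (nearest-centroid) classification, not a trained neural network. The function names and the toy 2x2 "images" (flattened to 4 pixel values, two samples per digit instead of ten) are assumptions made to keep the sketch short.

```python
import math

def mean_vector(samples):
    """Average the pixel vectors of one digit, position by position."""
    n = float(len(samples))
    return [sum(column) / n for column in zip(*samples)]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(samples_by_digit):
    """Store one mean pixel vector per digit."""
    return {digit: mean_vector(samples)
            for digit, samples in samples_by_digit.items()}

def predict(means, pixels):
    """Return the digit whose stored mean is closest to the input."""
    return min(means, key=lambda digit: euclidean_distance(means[digit], pixels))

# Toy training data: pixel values already normalized to the range 0..1.
training = {
    "1": [[0, 1, 0, 1], [0, 1, 0, 0.8]],
    "0": [[1, 1, 1, 1], [1, 0.9, 1, 1]],
}
means = train(training)
```

Calling `predict(means, pixels)` with a new flattened image returns the label of the nearest stored mean.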
-------------------------------------------------------- Edit to Enlarge ----------------------------------------------
Apache Spark - an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
As you will notice, it's not just about processing the data; it involves a lot of other components. Collection, storage, exploration, ML and visualization are critical to the project's success.
Solr - used to build a highly scalable data analytics engine that lets customers engage in lightning-fast, real-time knowledge discovery.