karimkhanp · August 29, 2015 14:05
diff --git a/nlp_defs b/nlp_defs
 Freebase - a large collaborative knowledge base consisting of metadata composed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual 'wiki' contributions.
     -> The MQL Read and MQL Write APIs provide access to the Freebase database using the Metaweb Query Language (MQL).

 DBpedia - (from "DB" for "database") a project that extracts structured content from the information created as part of the Wikipedia project and makes it available on the World Wide Web.[1] DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.
     -> Data is accessed using SPARQL, an SQL-like query language for RDF. For example, suppose you are interested in the Japanese shōjo manga series Tokyo Mew Mew and want to find the genres of other works by its illustrator. Answering this requires combining information from Wikipedia's entries on Tokyo Mew Mew, on Mia Ikumi and on works such as Super Doll Licca-chan and Koi Cupid, which DBpedia lets you query together.
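 A minimal sketch of how the Tokyo Mew Mew question could be sent to DBpedia's public SPARQL endpoint (https://dbpedia.org/sparql). It only builds the HTTP GET URL and does no network I/O. The property names `dbp:illustrator` and `dbp:genre` and the `format` value are assumptions; check them against the live DBpedia data before relying on the query.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: dbp:illustrator and dbp:genre are illustrative property names;
# the live dataset may model these relations differently (e.g. under dbo:).
QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT DISTINCT ?work ?genre WHERE {
  dbr:Tokyo_Mew_Mew dbp:illustrator ?illustrator .
  ?work dbp:illustrator ?illustrator ;
        dbp:genre ?genre .
  FILTER (?work != dbr:Tokyo_Mew_Mew)
}
"""


def build_request_url(query, endpoint=ENDPOINT, fmt="application/sparql-results+json"):
    """Encode a SPARQL query as a GET URL; the format parameter is an assumption."""
    return endpoint + "?" + urlencode({"query": query, "format": fmt})


url = build_request_url(QUERY)
```

 To actually run the query, the URL could be fetched with `urllib.request.urlopen(url)` and the body parsed with `json.load`; results in the SPARQL JSON results format list one binding per matching (?work, ?genre) pair.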

 dandelion - finds places, persons, brands, and events in documents and social media.
     -> datatxt semantic API - an entity extraction API that automatically links documents and social media content to Dandelion's graph of places, persons and events.
     -> datagem API - a semantic graph of high-quality contextual location data drawn from hundreds of public and private data sources.

 Semantic Web - its main purpose is to drive the evolution of the current Web by enabling users to find, share, and combine information more easily.
     -> It turns the largely unstructured data on the web into structured data that can be used in meaningful ways.
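 RDF, the data model behind the Semantic Web, stores facts as (subject, predicate, object) triples. The toy sketch below keeps triples in a Python set and answers wildcard patterns, roughly the kind of matching a SPARQL engine performs. The `ex:` names and the `match` helper are hypothetical and not taken from a real dataset or library.

```python
# Hypothetical triples; the ex: prefix is made up for this sketch.
TRIPLES = {
    ("ex:Tokyo_Mew_Mew", "ex:illustrator", "ex:Mia_Ikumi"),
    ("ex:Super_Doll_Licca-chan", "ex:illustrator", "ex:Mia_Ikumi"),
    ("ex:Koi_Cupid", "ex:illustrator", "ex:Mia_Ikumi"),
    ("ex:Mia_Ikumi", "ex:type", "ex:Person"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Step 1: who illustrated Tokyo Mew Mew?
illustrator = match(s="ex:Tokyo_Mew_Mew", p="ex:illustrator")[0][2]
# Step 2: which other works share that illustrator? (a join on the illustrator)
other_works = [
    s for s, _, _ in match(p="ex:illustrator", o=illustrator)
    if s != "ex:Tokyo_Mew_Mew"
]
```

 The two lookups mirror the two triple patterns that share the `?illustrator` variable in a SPARQL query; a real triple store indexes the triples instead of scanning them.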

 SPARQL - SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
       -> used to query DBpedia

 MQL - The MQL Read and MQL Write APIs provide access to the Freebase database using the Metaweb Query Language (MQL). https://www.freebase.com/query
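 MQL works by query-by-example: the client sends a JSON template with the known values filled in and placeholders (`null` for a single value, `[]` for a list) where it wants answers. Freebase and its APIs have since been retired, so this is only a local toy sketch of that idea; `toy_mql_read` and the `TOPICS` data are hypothetical, and the real service supported far more (nested queries, sorting, limits).

```python
import json

# Hypothetical stand-in for Freebase topics; illustrative data only.
TOPICS = [
    {"type": "/music/artist", "name": "Artist A", "album": ["Album 1", "Album 2"]},
    {"type": "/music/artist", "name": "Artist B", "album": ["Album 3"]},
]


def toy_mql_read(query):
    """Toy query-by-example over TOPICS (hypothetical helper, not the Freebase API).

    Keys with concrete values must match exactly; keys whose value is
    None (MQL's null) or [] are placeholders filled in from the data.
    """
    constraints = {k: v for k, v in query.items() if v is not None and v != []}
    return [
        {k: topic.get(k) for k in query}
        for topic in TOPICS
        if all(topic.get(k) == v for k, v in constraints.items())
    ]


# The JSON template a client would send: "give me the albums of Artist A".
query = {"type": "/music/artist", "name": "Artist A", "album": []}
request_body = json.dumps(query)
results = toy_mql_read(query)
```

 The response has the same shape as the query, with the placeholders replaced by data; that symmetry is what made MQL queries easy to write by hand.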

 Tfidf - term frequency (tf) counts only how often a word actually occurs in a document, while inverse document frequency (idf) looks at the proportion of documents the word occurs in; tf-idf multiplies the two. If a word occurs very often everywhere without adding meaning, its weight is automatically diminished.

       For example, when searching for the best document for "to pune", the word "to" occurs many times and would drown out "pune". The inverse-document-frequency factor reduces the weight of "to" and lowers its effect, so that the word "pune" also gets enough consideration.
 example : http://en.wikipedia.org/wiki/Tf%E2%80%93idf#Example_of_tf.E2.80.93idf
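 The "to pune" example can be worked through in a few lines. This sketch uses the plain textbook formulas tf = count / document length and idf = log(N / df) on a hypothetical three-document corpus; libraries such as scikit-learn's `TfidfVectorizer` use smoothed variants by default, so their numbers will differ.

```python
import math


def tf(term, doc):
    """Term frequency: share of the document's tokens that equal `term`."""
    return doc.count(term) / len(doc)


def idf(term, docs):
    """Inverse document frequency: log(N / number of documents containing `term`)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def tfidf_score(query, doc, docs):
    """Score a document for a query by summing tf * idf over the query terms."""
    return sum(tf(t, doc) * idf(t, docs) for t in query.split())


# Hypothetical mini-corpus, already tokenized.
docs = [
    "how to go to pune".split(),
    "trains to mumbai".split(),
    "flights to delhi".split(),
]
scores = [tfidf_score("to pune", doc, docs) for doc in docs]
best = max(range(len(docs)), key=scores.__getitem__)
```

 Because "to" appears in every document, its idf is log(3/3) = 0, so only "pune" contributes and the first document wins with a score of log(3)/5. With plain term frequency, every document would score above zero just for containing "to".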

 LSA - Latent semantic analysis (LSA), called latent semantic indexing (LSI) when used for information retrieval, is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.
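 A small numeric sketch of LSA with NumPy: build a term-document count matrix, take its SVD, and keep the top k singular values as latent "concepts". The matrix is hypothetical: documents 0 and 1 share only the word "engine", but "car" and "automobile" co-occur in document 2, so in the 2-dimensional latent space documents 0 and 1 end up pointing the same way.

```python
import numpy as np

# Hypothetical term-document count matrix: rows = terms, columns = documents.
A = np.array([
    [2, 0, 1, 0],  # car
    [0, 2, 1, 0],  # automobile
    [1, 1, 2, 0],  # engine
    [0, 0, 0, 2],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)

k = 2  # number of latent concepts to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Each row is one document expressed in the k-dimensional concept space.
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

 On the raw counts, documents 0 and 1 have cosine similarity 0.2; in the latent space it is 1.0 up to rounding, while the "flower" document stays orthogonal to both. The choice of k is a tuning parameter.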

 Data sampling - a statistical analysis technique used to select, manipulate and analyze a representative subset of data points in order to identify patterns and trends in the larger data set being examined.
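 Simple random sampling, the most basic scheme, gives every data point the same chance of being selected. A sketch using Python's standard `random` module on a hypothetical population (the numbers 1 to 10000), estimating the population mean from a 500-item sample; stratified or systematic sampling would need different code.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement, each item equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(population, n)


population = list(range(1, 10_001))  # hypothetical data set
sample = simple_random_sample(population, 500, seed=42)
estimate = statistics.mean(sample)     # estimate from the subset
true_mean = statistics.mean(population)  # 5000.5 for this population
```

 The sample mean should land close to the true mean of 5000.5 while touching only 5% of the data, which is the point of sampling large data sets.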