Last active
August 29, 2015 14:05
Important terminologies for mlp stuff
Freebase - a large collaborative knowledge base consisting of metadata composed mainly by its community members. It is an online collection of structured data harvested from many sources, including individual 'wiki' contributions.
-> The MQL Read and MQL Write APIs provide access to the Freebase database using the Metaweb Query Language (MQL).
DBpedia - (from "DB" for "database") a project that aims to extract structured content from the information created as part of the Wikipedia project. This structured information is then made available on the World Wide Web. DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.
-> Data is accessed using an SQL-like query language for RDF called SPARQL. For example, imagine you were interested in the Japanese shōjo manga series Tokyo Mew Mew, and wanted to find the genres of other works written by its illustrator. DBpedia combines information from Wikipedia's entries on Tokyo Mew Mew, Mia Ikumi and on works such as Super Doll Licca-chan and Koi Cupid, so one query can answer the question.
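A minimal sketch of how the Tokyo Mew Mew question could be phrased in SPARQL and turned into a request URL from Python. The property names `dbp:illustrator` and `dbp:genre` are assumptions made for illustration and may not match what the live DBpedia dataset uses; `build_request_url` is a hypothetical helper. Nothing is sent over the network here, the code only builds the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Assumed property names (dbp:illustrator, dbp:genre); check the dataset before relying on them.
QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?who ?work ?genre WHERE {
  dbr:Tokyo_Mew_Mew dbp:illustrator ?who .
  ?work dbp:illustrator ?who .
  OPTIONAL { ?work dbp:genre ?genre }
}
"""

def build_request_url(query, endpoint=DBPEDIA_ENDPOINT):
    """Encode the query as the standard SPARQL protocol 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(QUERY)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url[:60] + "...")
```

The first triple pattern binds `?who` to the illustrator of Tokyo Mew Mew, the second finds every other work with the same illustrator, and the `OPTIONAL` block keeps works that have no genre recorded.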
dandelion - finds places, persons, brands, and events in documents and social media.
-> datatxt semantic API - an entity extraction API that automatically links documents and social media content to its graph of places, persons and events.
-> datagem API - a semantic graph of high-quality contextual location data from hundreds of data sources, public and private.
Semantic Web - the main purpose of the Semantic Web is to drive the evolution of the current Web by enabling users to find, share, and combine information more easily.
-> It deals with unstructured data on the web and provides structured data that can be used in a meaningful way.
SPARQL - (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) an RDF query language, that is, a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
-> Used to query DBpedia.
MQL - The MQL Read and MQL Write APIs provide access to the Freebase database using the Metaweb Query Language (MQL). https://www.freebase.com/query
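A rough sketch of what an MQL read query looks like, assuming MQL's query-by-example JSON shape: known values constrain the match, while `null` and `[]` mark the blanks the service is asked to fill in. The type `/music/artist` and the artist name are illustrative values only, and the code just builds the JSON text without calling any API.

```python
import json

# Query by example: "type" and "name" constrain the match,
# None (JSON null) asks for a single value, [] asks for a list of values.
query = {
    "type": "/music/artist",
    "name": "The Police",
    "id": None,
    "album": [],
}

# MQL queries are sent as JSON text.
mql_text = json.dumps(query)
print(mql_text)
```

A response in this style has the same shape as the query, with the `null` and `[]` placeholders replaced by the matching data.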
Tf-idf - plain term frequency (tf) counts only how often a word actually occurs in a document. Inverse document frequency (idf) also takes into account the proportion of documents that contain the word: if a word occurs very often without carrying any meaning, its weight is automatically diminished.
For example, suppose you want to find the best document for the query "to pune". The word "to" occurs so many times that it would swamp the effect of "pune". The inverse part reduces the weight of "to" and lowers its effect, so that the word "pune" also gets enough consideration.
example : http://en.wikipedia.org/wiki/Tf%E2%80%93idf#Example_of_tf.E2.80.93idf
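The "to pune" case can be sketched in a few lines of Python. This uses one common variant of the formula, tf = (count of term in document) / (document length) and idf = log(N / number of documents containing the term); other weightings exist. The three example documents are made up for illustration.

```python
import math

# Made-up corpus; every document contains "to", only the first contains "pune".
corpus = [
    "how to go to pune".split(),
    "how to cook rice".split(),
    "want to learn to swim".split(),
]

def tf(term, doc):
    """Term frequency: share of the document's tokens that are this term."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency: log(N / number of documents containing the term)."""
    containing = sum(1 for doc in docs if term in doc)
    if containing == 0:
        return 0.0  # term never seen; treat its weight as zero
    return math.log(len(docs) / containing)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

doc = corpus[0]
score_to = tfidf("to", doc, corpus)      # tf = 2/5, idf = log(3/3) = 0
score_pune = tfidf("pune", doc, corpus)  # tf = 1/5, idf = log(3/1)
print(score_to, score_pune)
```

Although "to" occurs twice as often as "pune" in the first document, it appears in every document, so its idf is zero and "pune" ends up with the higher score.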
LSA - Latent semantic analysis (LSA), also called latent semantic indexing (LSI), is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.
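A small sketch of the idea, under some assumptions: the five-document corpus is made up, and a pure-Python power iteration with deflation stands in for a library SVD routine so the example has no dependencies. It builds a term-document count matrix, keeps the top two singular directions, and compares terms in that reduced space.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def truncated_svd(a, k, iters=300):
    """Top-k singular triplets (sigma, u, v) via power iteration plus deflation."""
    a = [row[:] for row in a]  # work on a copy
    triplets = []
    for _rank in range(k):
        at = [list(col) for col in zip(*a)]
        # Fixed, non-uniform start vector (one entry per document).
        v = normalize([float(i + 1) for i in range(len(at))])
        for _step in range(iters):
            v = normalize(matvec(at, matvec(a, v)))  # v <- A^T A v, renormalized
        av = matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in av))
        u = [x / sigma for x in av]
        triplets.append((sigma, u, v))
        # Deflate: remove the component just found before looking for the next.
        a = [[a[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
             for i in range(len(u))]
    return triplets

def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0

docs = ["car engine", "car engine wheel", "car wheel",
        "apple fruit", "apple banana"]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document matrix: one row per term, one column per document.
matrix = [[d.split().count(t) for d in docs] for t in vocab]

triplets = truncated_svd(matrix, k=2)
# Each term becomes a 2-dimensional vector in the latent "concept" space.
term_vecs = {t: [s * u[i] for s, u, _v in triplets] for i, t in enumerate(vocab)}

print(cosine(term_vecs["fruit"], term_vecs["banana"]))
print(cosine(term_vecs["car"], term_vecs["apple"]))
```

"fruit" and "banana" never appear in the same document, yet both appear alongside "apple", so they end up pointing the same way in the reduced space, while "car" and "apple" stay unrelated.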
Data sampling - a statistical analysis technique used to select, manipulate and analyze a representative subset of data points in order to identify patterns and trends in the larger data set being examined.
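As a simple illustration, the sketch below draws a random sample from a larger data set and uses it to estimate the mean of the whole. The population (the integers 1 to 10000), the sample size of 500 and the seed are arbitrary choices made for the example.

```python
import random
import statistics

# Arbitrary stand-in for "the larger data set".
population = list(range(1, 10001))

# Seeded generator so the example is repeatable.
rng = random.Random(0)

# Simple random sample without replacement.
sample = rng.sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(population_mean, sample_mean)
```

The sample mean will not match the population mean exactly, but for a representative sample it lands close to it while touching only 5% of the data.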