From a reddit-comment:
Stupid question on the "semantic web". How do I actually get data? It says the Library of Congress is on 'linked web data' now. If I want to get a book name by ISBN (using LOC's 'linked data'), how would I do that?
Is there a website for the standardized format of linked data? Are there APIs available?
Also, how do I cross-correlate linked data? Say Amazon also had a linked data set (or some other "publisher"). How do I correlate ISBN numbers between Amazon, the LOC, the patent office, etc. to verify the integrity of such data? Lots of stuff on Google is inaccurate, but that is "ok" because people are verifying it. But with an application, you need a way to ensure the data is correct and what you are actually looking for.
Different data providers usually offer a data dump (e.g. DBpedia, the LOC subject headings). This means loading it into a triplestore and manipulating/querying it yourself (see below).
Some publishers provide endpoint-specific browsers. For instance, DBpedia's browser or Dataincubator's Linked Periodicals browser give you a spartan but functional interface to search and browse around.
There are also generic browsers like Tabulator, Disco, Marbles or Zitgist. These can aggregate links and data from across the linked data cloud as you browse.
Tabulator-example.
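To give a feel for what these browsers do under the hood, here is a minimal sketch (Python 3, standard library only; the DBpedia URI is just an example I picked) of dereferencing a linked-data URI and asking for RDF instead of HTML via content negotiation:

import urllib.request

uri = "http://dbpedia.org/resource/Amsterdam"      # example resource URI
req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
with urllib.request.urlopen(req) as resp:          # follows the 303 redirect
    print(resp.geturl())                           # URL of the RDF document
    data = resp.read()                             # the RDF/XML itself
print(len(data), "bytes of RDF/XML")

A browser like Tabulator does essentially this for every URI it encounters and merges the results into one graph.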
There is also free-text search over linked data (e.g. Sindice), and some publishers offer their own RDF interfaces (Freebase's RDF interface, the BBC's data).
Most publishers also provide a SPARQL endpoint for querying their data.
Example SPARQL query (online)
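For instance, a minimal sketch (Python 3, standard library; the DBpedia endpoint and resource are my own example, not LOC-specific) of sending a query to a public SPARQL endpoint over HTTP and reading the JSON results:

import json
import urllib.parse
import urllib.request

endpoint = "http://dbpedia.org/sparql"             # public DBpedia endpoint
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Amsterdam> rdfs:label ?label .
} LIMIT 5
"""
url = endpoint + "?" + urllib.parse.urlencode({"query": query})
req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
with urllib.request.urlopen(req) as resp:
    results = json.load(resp)

for binding in results["results"]["bindings"]:     # standard SPARQL-JSON layout
    print(binding["label"]["value"])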
Example SPARQL query (command line)
# roqet is the command line SPARQL tool that comes with Rasqal (apt-get install rasqal-utils)
$ roqet -e 'DESCRIBE <some-resource-uri>' -D some_data_file.rdf
Example SPARcool query
Publishing and interlinking (large) datasets is well underway (see cloud 1, cloud 2). It is not yet "polished" enough for end users to get what they want with a simple search. A current initiative is voiD (Vocabulary of Interlinked Datasets), used to describe the contents of datasets, the URIs to use, etc. My guess is that interfaces will start to use this information to provide a more streamlined process.
Linked data shares a common data model (RDF), but is serialized in different syntaxes. The most common are RDF/XML, Turtle and N-Triples. RDF/XML is XML-based, and is the de facto standard for interchange. Turtle is more human-readable and suited for manual authoring. N-Triples is the "raw" dump, where for instance namespaces are not abbreviated: useful for debugging strange behaviour.
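For instance, a single (made-up) title statement looks like this in Turtle:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://example.org/book/123> dc:title "An example title" .

and like this, as one long line, in N-Triples:

<http://example.org/book/123> <http://purl.org/dc/elements/1.1/title> "An example title" .

(The RDF/XML version of the same statement is considerably more verbose, which is why Turtle is the nicer one to write by hand.)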
These three are pretty universally supported by services and tools, and you can round-trip between them, either with local tools or with web services (morph, babel, triplr, ...).
A second set of formats you will come into contact with is the SPARQL result formats: a SPARQL query returns a result set in either XML or JSON (both standardized, see below), and one of those is most likely what your application will consume.
Example with triplr
Surf to http://
Command line example with rapper
# rapper is part of Raptor, from the Redland (librdf) toolkit
# Install with apt-get install raptor2-utils (raptor-utils on older distributions)
$ rapper -i turtle -o rdfxml some_turtle_file.ttl > some_rdfxml_file.xml
API example with rdflib
# Python package rdflib
# Install with pip install rdflib (or easy_install rdflib)
from rdflib import Graph

g = Graph()
g.parse("some_file.rdf")                       # a file in RDF/XML format (placeholder filename)
g.parse("some_file.ttl", format="turtle")      # a file in Turtle format (placeholder filename)
len(g)                                         # the number of triples loaded
g.serialize(format="nt")                       # serialize in N-Triples format
g.serialize(format="turtle")                   # serialize in Turtle format
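Once loaded, you can also query the graph locally with SPARQL (this assumes a reasonably recent rdflib; the dc:title predicate is just an example):

qres = g.query("""
    SELECT ?s ?title WHERE {
        ?s <http://purl.org/dc/elements/1.1/title> ?title .
    } LIMIT 10
""")
for s, title in qres:
    print(s, title)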
SPARQL query results come back in one of two standard formats:
SPARQL-XML results
SPARQL-JSON results
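If you end up with the XML flavour, it is plain XML in the http://www.w3.org/2005/sparql-results# namespace. A minimal sketch of walking it with the Python standard library ("results.xml" is a placeholder for whatever your endpoint returned):

import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}
tree = ET.parse("results.xml")                 # placeholder filename
for result in tree.findall(".//sr:result", NS):
    for binding in result.findall("sr:binding", NS):
        value_elem = binding[0]                # the <uri>, <literal> or <bnode> child
        print(binding.get("name"), value_elem.text)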
This depends on what you mean by "verify the integrity of the data".
If you mean integrity in the checksum/signature sense (compare md5-checksumming of data, XML-Sig, etc.): this is planned, but currently an underdeveloped area. One difficulty is that, for instance, the RDF/XML serialization of a graph is not fixed or ordered, so you cannot simply sign the bytes of one particular serialization.
If you mean "is this data correct and what I am actually looking for": that is no different from receiving and handling additional data through e.g. REST or SOAP. Your application decides which sources it trusts and how to reconcile what they say.
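As a rough sketch of cross-correlating sources for one ISBN (the URI patterns below are made up; each real publisher documents its own, but the approach of merging both descriptions into one graph and comparing values is the same):

from rdflib import Graph, URIRef

DC_TITLE = URIRef("http://purl.org/dc/elements/1.1/title")
isbn = "1234567890123"                          # placeholder ISBN
sources = [
    "http://example.org/loc/isbn/" + isbn,      # hypothetical LOC-style URI
    "http://example.org/shop/isbn/" + isbn,     # hypothetical retailer URI
]

g = Graph()
for uri in sources:
    g.parse(uri)                                # dereference each source and merge into one graph

titles = set(str(t) for t in g.objects(None, DC_TITLE))
print("sources agree" if len(titles) == 1 else "sources disagree", titles)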
See triplestores. Most are a combination of store, API, query-interface, etc. The Java-based ones (Jena, Sesame) are imho the most mature, ARC (PHP) is the most user-friendly to start with. rdflib (Python) is pretty decent.
Exhibit is a very nice "ajax"-style framework for publishing and faceted browsing of (smallish) datasets.