Created
June 14, 2012 01:40
Possible way to provide POSTable URI in RDF
<http://www.amazon.com/gp/product/B000QECL4I>
  eg:reviews <http://www.amazon.com/product-reviews/B000QECL4I> ;
  eg:order "http://www.amazon.com/gp/product/B000QECL4I{?copies}" ;
  .

and then the definition of eg:reviews would say "the object of this property
provides reviews of the subject of this property" and the definition of
eg:order would say "POST to the URI generated by expanding the URI template
value of this property where the copies variable is the number of copies to
be ordered"

dunno on question of whether URI template should have its own datatype
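
A rough sketch of what a client might do with the eg:order value, assuming it only needs the form-style query expansion (`{?var}`) from the URI Template syntax. The function name `expand_query_template` is made up for this illustration, other template operators are not handled, and no request is actually sent.

```python
import re
from urllib.parse import quote


def expand_query_template(template, variables):
    """Expand only {?a,b} form-style query expressions; anything else is left untouched."""

    def replace(match):
        names = match.group(1).split(",")
        pairs = [
            f"{name}={quote(str(variables[name]), safe='')}"
            for name in names
            if name in variables and variables[name] is not None
        ]
        # Undefined variables are simply omitted from the expansion.
        return "?" + "&".join(pairs) if pairs else ""

    return re.sub(r"\{\?([^}]+)\}", replace, template)


order_template = "http://www.amazon.com/gp/product/B000QECL4I{?copies}"
order_uri = expand_query_template(order_template, {"copies": 2})
print(order_uri)  # the client would then POST to this URI
```

A client that understands eg:order would expand the template first and only then issue the POST; a client that doesn't know the property just sees a string literal.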
dret commented Jun 16, 2012 via email
| Atom uses link elements, in the body of a message in order to indicate
| edit links, etc. HTML uses form elements. In both cases we don't
| communicate at the HTTP level anything other than the media type. But
| we do use media types in the content to provide the necessary control
| information.
yes, from this point of view, Atom and HTML are at the exact same level.
fwiw, we're thinking about adding query capabilities to Atom
(http://geofeeds.org/earthquakes/query_schema.xml is what we've
experimented with so far, and at http://geofeeds.org/client/map_app you
can see how these declarative queries drive runtime form generation, and
the fact that all the data is spatial is just an implementation detail;
all data is feed-based in this scenario), and with the parameter
specification and URI template (we're a bit richer than HTML when it
comes to parameter types), Atom then is pretty much exactly where HTML is.
and just as a side note: our feed queries of course easily could be
mapped to a SPARQL query in the back-end, should the back-end be
implemented in a way that manages data in an RDF store.
| RDF doesn't provide a way to annotate links, e.g. to add media type.
| But we can annotate properties in a schema, or model interactions more
| explicitly, i.e. similar to HTML forms. I think this would let us add
| in the missing H Factors.
yes it would. the difference would be that in many other scenarios, you
specify the media type and that's almost always human-readable (schema,
interactions, processing model, and so forth, a lot of prose, usually),
because there's a limit to what you can do in machine-readable formats
anyway. and like @JeniT mentioned, clients will have to be hand-coded
for supporting these scenarios anyway.
interestingly, neither XML nor JSON ever made the step to add links to
their general model. XML and JSON have no idea about links, it's only
the vocabulary and its semantics that allow a client to understand that
something is a link. i don't think that this was a conscious decision,
but i think it demonstrates that overformalizing at least in those cases
("let's build an abstract layer for representing the link concept and
then we build media types on that") was not what people were doing. Atom
sort of did that, but in a way that is directly and immediately useful,
and also has a good extension model.
| Now, lets assume we took the approach I described earlier in the
| discussion -- i.e. forms based service description.
| This does give us the ability to describe, for example, that a service
| could accept a SPARQL query (application/sparql-query) e.g. to execute
| it or maybe to store it for later execution. We could also describe
| services that accept application/atom+xml or whatever.
just to note: SPARQL is always a slippery slope here because it usually
builds on the assumption you're SPARQLing into the back-end data, right?
in most other information management scenarios today, a lot of machinery
has been developed to avoid this; decoupling the service logic (i call
it the service surface) and the service's data model in the back-end has
proven to be a good idea for a variety of reasons, ranging from security
to performance issues. multi-tier is what pretty much everybody does.
| We can also say that a service supports submission of
| application/ld+json or application/rdf+xml or text/turtle. This is
| fine in the case where we're advertising the capabilities of a storage
| service. E.g. a graph store to which I can send some data for later
| querying. As no further knowledge of the graph contents is required by
| the server.
absolutely. if your service is a generic "store and query RDF" service,
then generic media types are the way to go.
same for XML: https://twitter.com/dret/status/213363704803241984
even though for some additional concepts exposed by such a generic
storage service (collections, users, all kinds of management such as
service load), you would also have a specific "service surface" for how
these concepts are exposed ("i want to buy more storage space, here's my
payment info and the order").
| But if we want to describe the format of a graph that describes an
| order, then according to REST over HTTP, we really need a media type:
| e.g. x-example/order+turtle (or something). That allows us to document
| the required graph structure in a media type and achieve some shared
| understanding between client and server.
yes, i absolutely agree with that, but i think @JeniT disagrees with that.
| An RDF Schema or OWL ontology don't let us describe the structure of a
| graph in the same way that we could define a schema for an order
| document in XML. So that doesn't give us enough leverage.
the problem is that RDF has no concept of validation. but there should
be something similar, right? checking a graph against expectations can
surely be done somehow, is there some framework for that?
| The question is: are people creating services that exchange RDF
| documents in this way, that do more than just store or update data? It
| might be worth looking for examples of what services people are
| creating?
that is a very good question and a very important one. and it seems to
me that as soon as you move out of the fairly tightly coupled scenarios,
where people freely query each other's stores, there's no way you can
continue doing it. the security implications are enormous. if i do BI
and aggregate all kinds of company data using linked data, will i expose
a SPARQL endpoint over that dataset to 3rd-party analytics components?
in many cases, that might be immediately illegal in terms of data
privacy laws, and any risk analysis would immediately flag these things.
you need a service surface that only exposes what you want to expose
("here's how many health insurance applications we received in the last
24hrs"), and then you map that to some canned SPARQL. anything else just
doesn't fly in a decentralized and open environment.
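
a minimal sketch of such a service surface, assuming a single exposed operation that is mapped to canned SPARQL. the operation name, the `eg:` vocabulary, and the `build_query` helper are all hypothetical; the point is only that callers pick from a fixed menu and never send their own query text.

```python
from datetime import datetime

# Hypothetical operation name and vocabulary, for illustration only.
CANNED_QUERIES = {
    "applications-received-since": (
        "PREFIX eg: <http://example.com/ns#> "
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> "
        "SELECT (COUNT(?app) AS ?n) WHERE {{ "
        "?app a eg:InsuranceApplication ; eg:receivedAt ?t . "
        'FILTER (?t >= "{since}"^^xsd:dateTime) }}'
    ),
}


def build_query(operation, since):
    """Return the canned SPARQL for an exposed operation; reject everything else."""
    if operation not in CANNED_QUERIES:
        raise KeyError(f"operation not exposed: {operation}")
    # Parse and re-serialise so only a well-formed timestamp reaches the query text.
    timestamp = datetime.fromisoformat(since).isoformat()
    return CANNED_QUERIES[operation].format(since=timestamp)


print(build_query("applications-received-since", "2012-06-15T00:00:00"))
```

the back-end can still be an RDF store; the client just never gets to see or shape the query.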
and once you've come to the conclusion that in such a world, you need
service surfaces, then the question whether you design these in RDF,
XML, or JSON becomes a question of how you can realize the biggest
value, or maybe you design two or all three of them, if there's demand.
| The services that I most commonly see referenced in Linked Data aren't
| RDF consuming services: they're SPARQL endpoints, search endpoints,
| item description endpoints, all of which use other media types or
| simple link construction. The other kinds of services that are being
| used are SPARQL Update, HTTP Uniform Protocol, and services that simply
| store or patch RDF.
you're correct, and that is the reason why uptake in the enterprise
world has been close to zero. when you tell people you're going to
expose generic query capabilities to a potentially vast collection of
enterprise data, they get "SQL injection" and similar flashbacks, and
rightly so. enterprise data needs to be protected, and like i said
above, for health and similar data, you'll actually end up in jail if
you happily expose all the data you have.
EMC has mostly very large customers, and our biggest selling point for
many products is compliance: when you buy our stuff, we give you many
controls over how you can make sure that the right things get exposed to
the right people and services. this becomes mind-bogglingly complex when
you're a company that has many thousands of suppliers (we have cases
like this). we would like to leverage linked data's capabilities to
aggregate data from many sources and make sense out of them, but we
absolutely need services that we can customize and control (we need a
services platform to build on, that's why we joined LDP). otherwise,
nobody will buy the things we make, and rightly so. the vast majority of
data out there is not open, so instead of trying to do LODP, our working
group really should be LDP. (hey, i like that. maybe that will become my
new slogan!)
| But having said that I'm not sure I see a big problem with creating,
| say, a media type for an "order graph" expressed in turtle, if that
| helps document what a service expects. So long as there is a way to
| capture the media types in an RDF description, then there will be
| sufficient support to enable that, I think.
ok, i think we're getting somewhere. how we represent the service
surface is a question we have to discuss, and like i said, it should be
driven by how much value we create based on possible consumers (ours
are, since we're very cross-platform, mostly XML). the really important
aspect is that we are creating RESTful services based on media types.
thanks a lot for taking the time to go through this, your comments about
"everybody is just remote-SPARQling anyway" helped me a lot to
understand why different people see different problems and solutions.
On 15 Jun 2012, at 23:17, Erik Wilde wrote:
> But if we want to describe the format of a graph that describes an
> order, then according to REST over HTTP, we really need a media type:
> e.g. x-example/order+turtle (or something). That allows us to document
> the required graph structure in a media type and achieve some shared
> understanding between client and server.
> yes, i absolutely agree with that, but i think @JeniT disagrees with that.
I don't disagree that a custom media type is useful, I am questioning whether it is the only thing that works.
At one level, I have a pragmatic concern that there are multiple syntaxes for RDF, and people who operate LDP-based services will find it burdensome to define specific media types for each of the different flavours: text/vnd.amazon.order+turtle, application/vnd.amazon.order+xml, application/vnd.amazon.order+json and so on. (Note that I'm assuming that the +xml variant is RDF/XML and the +json variant is JSON-LD; if we went down this path we should work with IETF to define a structured syntax suffix registration for at least +turtle.)
At another level, I want there to be specification-level clarity that states that a custom media type is required for each service that accepts a POST/PUT, and guidance on how to use them. It is not clear to me, when someone says "according to REST over HTTP we must...", which specification they are referring to where this constraint is specified. It could be:
1. that the HTTP specification states that on an OPTIONS request, the server
MUST provide a response with a (eg) Accept-Content-Types header that lists
acceptable media types, and further that all POST/PUT requests that include
content that is valid according to that media type MUST be successful (ie
that the media type given in Accept-Content-Types must be defined at
a granular level, so you can't just say Accept-Content-Types: application/xml
unless you really do accept all XML)
2. that the need for a specific media type is only actually at the level of a
REST best practice rather than a constraint at the HTTP specification
level, but we want to make it a tighter constraint in LDP because we want
LDP to follow all REST best practices
#1 does not seem to be the case. I'm totally fine with #2 as long as we are honest that this is what we are doing and provide sufficient detail such that developers writing servers and clients know what they need to do to satisfy it.
I think that the REST best practice is not so much "the constraints on a POST/PUT should be identified through a media type" as "the constraints on a POST/PUT should be discoverable". @dret said:
> in media types, that knowledge would be coupled to the link relation, either implicitly (submit something using this vocabulary when traversing such a link), or explicitly (often using ***@***.*** or ***@***.*** attributes in XML vocabularies). this allows clients to choose according to their capabilities and preferences, if servers provide alternatives, and those alternatives are communicated through media types. new capabilities may show up when a server starts supporting additional interactions, but clients often need to be updated (learning about the new media types) to be able to take advantage of these new capabilities.
Taking Atom as an example of good RESTful practice, I note that its `link` element has a `@type` attribute, but it is only defined in terms of the media type of the response to a GET on the `@href`, not on limiting what can be submitted when POSTing to that URI. The `edit` link relation defined in the Atom publishing protocol doesn't say anything about the interpretation of the `@type` attribute in this context either. The Atom service descriptions do have an `accept` element, but it's not specified how these are located. It would be really good to have an example of a RESTful API that is actually doing this _right_, that we could follow. Presumably you have an example in mind, @dret?
But anyway, let's explore some possible patterns in an XML world. As @dret said, the first possibility would be for the link relation to implicitly describe what is expected by the endpoint, so when you GET information about a product, it includes a link like:
<link rel="http://example.com/relation/order"
href="http://amazon.com/order" />
and by knowledge of the link relation http://example.com/relation/order (which is presumably defined at that URI, although there's no constraint to make that so within Atom so far as I can tell), an application can work out what it can send to the endpoint. This only works if there aren't endpoint-specific constraints on what's acceptable.
A second pattern is that the owner of the web service specifies a media type `application/vnd.amazon.order+xml` and that's used in the `@type` attribute, with the link relation `http://example.com/relation/order` specifying that the `@type` attribute indicates the media type of what can be POSTed to the URI in the `@href`:
<link rel="http://example.com/relation/order"
href="http://amazon.com/order"
type="application/vnd.amazon.order+xml" />
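
As a sketch of how a client might act on this second pattern, assuming it has been coded to know that for this link relation the `@type` describes the request body: it reads the `href` and `type` from the link and builds a POST whose Content-Type is that media type. The Atom namespace on the link, the `<Order/>` body, and the `build_order_request` helper are placeholders of mine, and the request is constructed but never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# The namespace declaration is an assumption; the thread shows the link without one.
LINK_XML = """<link xmlns="http://www.w3.org/2005/Atom"
      rel="http://example.com/relation/order"
      href="http://amazon.com/order"
      type="application/vnd.amazon.order+xml" />"""

ORDER_REL = "http://example.com/relation/order"


def build_order_request(link_xml, body):
    """Build (but do not send) a POST to the link target, typed by the link's @type."""
    link = ET.fromstring(link_xml)
    if link.get("rel") != ORDER_REL:
        raise ValueError("not an order link")
    return urllib.request.Request(
        link.get("href"),
        data=body,
        headers={"Content-Type": link.get("type")},
        method="POST",
    )


request = build_order_request(LINK_XML, b"<Order/>")
print(request.get_method(), request.full_url, request.get_header("Content-type"))
```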
A third possible pattern would be to have the `application/xml` media type specify some media type parameters that enabled people to specify a schema location for and document element of some XML (there are multiple ways to cut that of course; I'm more interested in the pattern of using media type parameters than the niceties of what that would mean for XML). In that case, the link would look like:
<link rel="http://example.com/relation/order"
href="http://amazon.com/order"
type='application/xml;schema="http://amazon.com/schema/order.xsd";root="{http://amazon.com/schema/order}Order"' />
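
To make the third pattern concrete, here is a sketch of how a client could pull the parameters back out of such a `@type` value using Python's standard MIME header parser. The `schema` and `root` parameters are the hypothetical ones from the example above, not anything `application/xml` actually defines, and `parse_media_type` is just an illustrative helper.

```python
from email.message import Message

TYPE_ATTR = (
    'application/xml;schema="http://amazon.com/schema/order.xsd";'
    'root="{http://amazon.com/schema/order}Order"'
)


def parse_media_type(value):
    """Split a media type string into (type, {parameter: value})."""
    message = Message()
    message["Content-Type"] = value
    # get_params() returns the type itself as the first entry, so skip it.
    params = dict(message.get_params()[1:])
    return message.get_content_type(), params


media_type, params = parse_media_type(TYPE_ATTR)
print(media_type)
print(params["schema"])
print(params["root"])
```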
A fourth possible pattern would be to add `@x:schema` and `@x:root` attributes to Atom's `link` element to provide equivalent information, like this:
<link rel="http://example.com/relation/order"
href="http://amazon.com/order"
type="application/xml"
x:schema="http://amazon.com/schema/order.xsd"
x:root="{http://amazon.com/schema/order}Order" />
A fifth pattern would be to not define anything on the `link` element itself, but for the documentation of the link relation `http://example.com/relation/order` to state that applications can query on the `@href` URI using the OPTIONS method, and what should be returned in that case, and for that response to specify the constraints. So the document containing the link would have:
<link rel="http://example.com/relation/order"
href="http://amazon.com/order" />
just like in the first example, but doing an OPTIONS request on `http://amazon.com/order` would result in something like:
<service xmlns="http://www.w3.org/2007/app"
xmlns:atom="http://www.w3.org/2005/Atom">
<workspace>
<atom:title>Amazon</atom:title>
<collection href="http://amazon.com/order" >
<atom:title>Orders</atom:title>
<accept>application/vnd.amazon.order+xml</accept>
</collection>
</workspace>
</service>
with of course also the possibility for the `accept` element in this case to follow any of the patterns above.
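
A sketch of the client side of this fifth pattern, assuming the service document has already been fetched (here it is hard-coded rather than retrieved with OPTIONS) and that the client just wants the list of media types it may POST to a given collection. The `accepted_media_types` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

SERVICE_DOC = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Amazon</atom:title>
    <collection href="http://amazon.com/order">
      <atom:title>Orders</atom:title>
      <accept>application/vnd.amazon.order+xml</accept>
    </collection>
  </workspace>
</service>"""

NS = {"app": "http://www.w3.org/2007/app"}


def accepted_media_types(service_xml, href):
    """Return the non-empty accept values of the collection with the given href."""
    root = ET.fromstring(service_xml)
    for collection in root.iterfind(".//app:collection", NS):
        if collection.get("href") == href:
            return [
                accept.text.strip()
                for accept in collection.findall("app:accept", NS)
                if accept.text
            ]
    return []


print(accepted_media_types(SERVICE_DOC, "http://amazon.com/order"))
```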
There may be other plausible patterns. All these patterns are possible for RDF-based services too.
My hypothesis is that it's impossible in the general case to specify all possible constraints on acceptable POST/PUT entities. Some constraints are going to be unknowable because they depend on the state of the world at submission time (eg are there sufficient items in stock to fulfil the order). Other constraints are going to be endpoint specific (eg is the item of a type that the vendor sells).
So you have to draw the line somewhere. I think as a developer the crucial thing is discoverability: I would prefer to have the link relation/link/endpoint specify `application/xml` plus the schema and document element of the expected XML than for it to specify an unregistered media type of `application/vnd.amazon.order+xml`. But I may have missed some REST theory that states that this is not a good way of specifying constraints?
The equivalent for RDF would be for the property/endpoint metadata/endpoint itself to specify an RDF serialisation (application/rdf+xml, text/turtle etc) plus something that defines acceptable RDF graphs. As @ldodds said:
> An RDF Schema or OWL ontology don't let us describe the structure of a
> graph in the same way that we could define a schema for an order
> document in XML. So that doesn't give us enough leverage.
the problem is that RDF has no concept of validation. but there should
be something similar, right? checking a graph against expectations can
surely be done somehow, is there some framework for that?
This is a gap in the RDF stack (and one that's come up a few times during TAG discussions over the last few days). OWL inference can be run in a "closed world" mode that does a kind of validation. We have SPARQL graph patterns, but using them as a means of validating RDF would be like doing XML validation solely through XPath expressions. It would be nice to have a grammar more like RELAX NG for RDF graphs; I think that Eric Prud'hommeaux is interested in doing something like that, but it would surprise me if there weren't something similar around already from which we could learn.
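
To illustrate the kind of check I mean, here is a toy closed-world validation over triples held as plain Python tuples. It is not an existing framework or a proposal for one: the `eg` vocabulary, the shape table, and the `validate` function are all made up for this sketch.

```python
EG = "http://example.com/ns#"  # hypothetical vocabulary

# Hypothetical "shape": property -> (min count, max count, value check)
ORDER_SHAPE = {
    EG + "product": (1, 1, lambda v: isinstance(v, str) and v.startswith("http")),
    EG + "copies": (1, 1, lambda v: isinstance(v, int) and v > 0),
}


def validate(triples, subject, shape):
    """Closed-world check: each shape property must occur the right number of
    times with acceptable values, and no other property may be used."""
    found = {}
    for s, p, o in triples:
        if s == subject:
            found.setdefault(p, []).append(o)
    errors = []
    for prop, (lo, hi, check) in shape.items():
        values = found.get(prop, [])
        if not lo <= len(values) <= hi:
            errors.append(f"{prop}: expected {lo}..{hi} values, got {len(values)}")
        errors.extend(f"{prop}: bad value {v!r}" for v in values if not check(v))
    errors.extend(f"{p}: unexpected property" for p in found if p not in shape)
    return errors


good_order = [
    ("_:order", EG + "product", "http://www.amazon.com/gp/product/B000QECL4I"),
    ("_:order", EG + "copies", 2),
]
bad_order = [
    ("_:order", EG + "product", "http://www.amazon.com/gp/product/B000QECL4I"),
    ("_:order", EG + "colour", "red"),
]
print(validate(good_order, "_:order", ORDER_SHAPE))
print(validate(bad_order, "_:order", ORDER_SHAPE))
```

A real grammar for graphs would need to handle nesting, alternatives and datatypes; this only shows the per-subject cardinality and value checks that RDF Schema and OWL don't give us out of the box.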
We should really be on the LDP mailing list to discuss this rather than here...
| At one level, I have a pragmatic concern that there are multiple syntaxes for RDF, and people who operate LDP-based services will find it burdensome to define specific media types for each of the different flavours: text/vnd.amazon.order+turtle, application/vnd.amazon.order+xml, application/vnd.amazon.order+json and so on. (Note that I'm assuming that the +xml variant is RDF/XML and the +json variant is JSON-LD; if we went down this path we should work with IETF to define a structured syntax suffix registration for at least +turtle.)
i absolutely agree that this is not nice on a variety of levels. i
wouldn't get my hopes too high on fixing the media types spec, though.
there's a lot of history to it, it's even bigger than the web, so making
any changes is a very sensitive thing to do. regarding the suffixes,
maybe that's something that could be done, but you'd end up answering a
lot of questions that are very hard to answer.
| 1. that the HTTP specification states that on an OPTIONS request, the server
| MUST provide a response with a (eg) Accept-Content-Types header that lists
| acceptable media types, and further that all POST/PUT requests that include
| content that is valid according to that media type MUST be successful (ie
| that the media type given in Accept-Content-Types must be defined at
| a granular level, so you can't just say Accept-Content-Types: application/xml
| unless you really do accept all XML)
i agree that this is not written down in these absolute terms anywhere.
and as you know, 99.99% of application/xml services then would have to
be application/xdm anyway (if there were such a media type).
| I think that the REST best practice is not so much "the constraints on a POST/PUT should be identified through a media type" as "the constraints on a POST/PUT should be discoverable". @dret said:
i like the term discoverable here, but then again the question remains
through what means. it could be HTTP (even though it's not mandatory),
it could be registrations somewhere (that's the media type route), or it
could be through runtime mechanisms (which then need machinery that is
capable of using them).
| Taking Atom as an example of good RESTful practice, I note that its `link` element has a `@type` attribute, but it is only defined in terms of the media type of the response to a GET on the `@href`, not on limiting what can be submitted when POSTing to that URI. The `edit` link relation defined in the Atom publishing protocol doesn't say anything about the interpretation of the `@type` attribute in this context either. The Atom service descriptions do have an `accept` element, but it's not specified how these are located. It would be really good to have an example of a RESTful API that is actually doing this _right_, that we could follow. Presumably you have an example in mind, @dret?
atompub does specify the expected media types in the media type
registration itself (defining the link relations and what clients are
supposed to do when they follow these links). i haven't written the spec,
but i assume the idea was to only specify those media types which are
dynamic at runtime (@accept). service descriptions are discoverable
through "service", which for some reason i still don't understand is
listed in
http://www.iana.org/assignments/link-relations/link-relations.xml as
specified in RFC 5023, when it very clearly isn't. @jasnell may have the
background on this, but i think it became apparent that making service
documents discoverable was a good idea, and adds very little overhead
(just one link relation).
i think overall, atompub gets it right. like you mentioned earlier,
clients need to be coded to support these interaction patterns of a
media type anyway, and because of that, it does not hurt that not all
expectations about media types in link interactions are discoverable at
runtime. only if they are variable there should be a runtime mechanism.
| But anyway, let's explore some possible patterns in an XML world. As @dret said, the first possibility would be for the link relation to implicitly describe what is expected by the endpoint, so when you GET information about a product, it includes a link like:
| <link rel="http://example.com/relation/order"
| href="http://amazon.com/order" />
| and by knowledge of the link relation http://example.com/relation/order (which is presumably defined at that URI, although there's no constraint to make that so within Atom so far as I can tell), an application can work out what it can send to the endpoint. This only works if there aren't endpoint-specific constraints on what's acceptable.
"http://example.com/relation/order" is not a link, it's an identifier
(http://tools.ietf.org/html/rfc5988#section-4.2). clients have knowledge
of the link relations they can traverse (because they implement them),
and other links are meaningless to them. i am not 100% sure what you
mean by "endpoint-specific constraints". if the media type or the
registered link relation specify a media type that is expected when
following that link, then that's what a server should accept. of course
it might reject it because of service aspects (invalid product number in
order), is that what you're referring to?
| A second pattern is that the owner of the web service specifies a media type `application/vnd.amazon.order+xml` and that's used in the `@type` attribute, with the link relation `http://example.com/relation/order` specifying that the `@type` attribute indicates the media type of what can be POSTed to the URI in the `@href`:
| <link rel="http://example.com/relation/order"
| href="http://amazon.com/order"
| type="application/vnd.amazon.order+xml" />
i've seen that quite a bit for GET, but not for POST, i think. but it
would work for POSTs as well, as long as the link relation (either in
the media type or in the link relation registration) makes it clear that
@type refers to the request, and not to the response.
| A third possible pattern would be to have the `application/xml` media type specify some media type parameters that enabled people to specify a schema location for and document element of some XML (there are multiple ways to cut that of course; I'm more interested in the pattern of using media type parameters than the niceties of what that would mean for XML). In that case, the link would look like:
| <link rel="http://example.com/relation/order"
| href="http://amazon.com/order"
| type='application/xml;schema="http://amazon.com/schema/order.xsd";root="{http://amazon.com/schema/order}Order"' />
| A fourth possible pattern would be to add `@x:schema` and `@x:root` attributes to Atom's `link` element to provide equivalent information, like this:
| <link rel="http://example.com/relation/order"
| href="http://amazon.com/order"
| type="application/xml"
| x:schema="http://amazon.com/schema/order.xsd"
| x:root="{http://amazon.com/schema/order}Order" />
that i don't like that much because in many cases, media types specify not
just a schema, but also a processing model for the client (how to
handle extensions of the base schema, for example). if all you can
specify is a schema, then you cannot specify a processing model.
| A fifth pattern would be to not define anything on the `link` element itself, but for the documentation of the link relation `http://example.com/relation/order` to state that applications can query on the `@href` URI using the OPTIONS method, and what should be returned in that case, and for that response to specify the constraints. So the document containing the link would have:
| <link rel="http://example.com/relation/order"
| href="http://amazon.com/order" />
| just like in the first example, but doing an OPTIONS request on `http://amazon.com/order` would result in something like:
| <service xmlns="http://www.w3.org/2007/app"
| xmlns:atom="http://www.w3.org/2005/Atom">
| <workspace>
| <atom:title>Amazon</atom:title>
| <collection href="http://amazon.com/order">
| <atom:title>Orders</atom:title>
| <accept>application/vnd.amazon.order+xml</accept>
| </collection>
| </workspace>
| </service>
| with of course also the possibility for the `accept` element in this case to follow any of the patterns above.
that would be perfectly legitimate behavior for a media type, making as
many things runtime as possible. the question is what you're buying with
this pattern, i.e. are you really expecting that clients will support
different order media types and then can maybe specify their supported
media types in the request via accept when they follow the order link.
it's doable, but i have not seen that level of radical openness. i'd say
that typically, media types encode an application scenario and assume
that clients are interacting within that framework. they might define
extension points and places where clients can find additional links, but
within the media type scenario, things are typically designed by
making some decisions at design time, and only making those decisions
at runtime where there's a specific goal for doing that.
| My hypothesis is that it's impossible in the general case to specify all possible constraints on acceptable POST/PUT entities. Some constraints are going to be unknowable because they depend on the state of the world at submission time (eg are there sufficient items in stock to fulfil the order). Other constraints are going to be endpoint specific (eg is the item of a type that the vendor sells).
of course you should not hardcode the available products into the media
type, that would be a pretty terrible design. but you can hardcode all
the things that make sense for your application scenario, strategically
leaving those things up to runtime that can change at runtime. that's
how you usually design services that are as easy to use as possible, at
least from the SOA point of view.
| So you have to draw the line somewhere. I think as a developer the crucial thing is discoverability: I would prefer to have the link relation/link/endpoint specify `application/xml` plus the schema and document element of the expected XML than for it to specify an unregistered media type of `application/vnd.amazon.order+xml`. But I may have missed some REST theory that states that this is not a good way of specifying constraints?
the crucial thing is "understandability", which might be a little bit
different. media types are supposed to be "self-describing" (not in the
semweb sense of the word) in the sense that when you see an instance, there
is a way to find information that helps you to understand what it
means. the media type is the label you start with, and then you go to
the registry and can find the definition.
`application/vnd.amazon.order+xml` should be documented somewhere, and
there's google. that will get you to a document that tells you the
conversational context. if you're just linked to the schema, you can
auto-generate an instance, but you don't understand the conversational
scenario (get a shopping cart id, add items to it, get your customer id,
and then submit an order with your shopping cart and customer id). a
service has almost always more context than just one isolated
interaction, and the media type establishes that context. that's why the
important part about atompub is the protocol, and not the schemas (which
are fairly minimal, as a diff with atom).
| This is a gap in the RDF stack (and one that's come up a few times during TAG discussions over the last few days). OWL inference can be run in a "closed world" mode that does a kind of validation. We have SPARQL graph patterns, but using them as a means of validating RDF would be like doing XML validation solely through XPath expressions. It would be nice to have a grammar more like RELAX NG for RDF graphs; I think that Eric Prud'hommeaux is interested in doing something like that, but it would surprise me if there weren't something similar around already from which we could learn.
i think that for any kind of service scenario, validation is essential.
it's the first line of defense, effective when backed by a good schema
language, and thus takes load off the actual service implementation. and
even for the "just POST some RDF graph to an RDF database", i would
guess that in all settings with loose coupling, you would want to have
some control over what people are POSTing.
| We should really be on the LDP mailing list to discuss this rather than here...
now that you're mentioning it ;-) feel free to link to the gist, maybe
for tomorrow's meeting people would like to read some of that. and as
usual, thanks a lot for your great comments!
Hi,
why not use SPARQL (or rather, graph patterns) to describe inputs, outputs and relation between input and output? Most of the current approaches on http://linkedservices.org/ use that type of description.
HATEOAS URIs could just be embedded into the RDF that's returned.
Best regards,
Andreas.