Notes in preparation for a blog post or article.
- Whitespace in the original document
  - As entered, or introduced/manipulated by an editor's functions, such as oXygen's "Format and indent" (and settings thereof), or features, such as Atom's default behavior of trimming trailing whitespace
- How the document is exposed to the XQuery processor
  - Fetched (get) via a query from the network (`doc()`, `hc:send-request()`) or file system (`file:read()`, `xmldb:store-files-from-pattern()`)
  - Uploaded (put) to the database and stored via REST, WebDAV, XML-RPC
  - Included inline in a query, as an in-memory node; whitespace in in-memory nodes is affected by `boundary-space strip|preserve` (default is `strip`)
- How the document (or the results of a query based on it) is serialized
  - The processor's default serialization settings, particularly indentation: conf.xml's default is `serializer/@indent='yes'`
  - Serialization declarations in a query's prolog: `output:indent "yes|no"` or `exist:serialize "indent=yes|no"`
  - Serialization parameters in functions that serialize, e.g., `fn:serialize()`, `file:serialize()`, `response:serialize()`, `transform:transform()`
  - Network interface (e.g., REST, WebDAV, XML-RPC) defaults
    - WebDAV serialization defaults to `indent=yes`, can be overridden in `$EXIST_HOME/extensions/webdav/webdav.properties`; see https://github.com/eXist-db/exist/blob/develop/extensions/webdav/webdav.properties and https://github.com/eXist-db/exist/blob/develop/extensions/webdav/src/org/exist/webdav/ExistResourceFactory.java#L65-L74
    - See also:
      - https://github.com/eXist-db/exist/blob/develop/src/org/exist/util/serializer/IndentingXMLWriter.java
      - https://github.com/eXist-db/exist/blob/develop/test/src/xquery/serializer.xql
      - https://github.com/eXist-db/exist/tree/develop/samples/xmlrpc
      - https://github.com/eXist-db/exist/search?p=1&q=indent&type=&utf8=%E2%9C%93
- How clients display data from the server
  - Web browsers apply HTML whitespace rules to HTML documents
  - Clients may "pretty print" XML nodes
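Two of the factors above, the boundary-space policy for in-memory nodes and the serialization declarations in the prolog, can be seen together in a short query. This is only a sketch: the `<p>` element is made-up sample data, and whether a given interface (eXide, REST, WebDAV) honors the `output:indent` option rather than its own default is an assumption to verify.

```xquery
xquery version "3.1";

(: Keep whitespace-only text between constructor tags.
   Change "preserve" to "strip" (the default) to compare. :)
declare boundary-space preserve;

(: Standard serialization namespace; ask the serializer not to indent. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare option output:indent "no";

(: Made-up sample node. The single spaces around <b> and <i> are boundary
   whitespace: under "preserve" they become three text node children of
   <p>; under "strip" the constructed <p> has no text node children. :)
let $node := <p> <b>bold</b> <i>italic</i> </p>
return
    $node
```

Replacing `return $node` with `return count($node/text())` gives a result that does not depend on the serializer at all, which helps separate what is in the node from what the output step adds.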
- Be explicit and conscious of whitespace in source documents, especially in mixed content
- If using in-memory nodes, be conscious of the default `boundary-space` setting, `strip`
- Be conscious of default indent settings. If `boundary-space strip` is combined with the serializer's `indent=yes`, a document can appear to have whitespace when it really doesn't
- `indent=yes` can introduce whitespace; saving a document back into the database can result in extra whitespace
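The "appears to have whitespace" trap and the round-trip problem can be checked with a query along these lines. It is a sketch under assumptions: the `<list>` sample is invented, the map form of `fn:serialize()` requires XQuery 3.1 support in the processor, and `fn:parse-xml()` stands in for saving the indented output as a new document.

```xquery
xquery version "3.1";

declare boundary-space strip;

(: The spaces between the tags are boundary whitespace, so under "strip"
   the constructed <list> has no text node children. :)
let $node := <list> <item>a</item> <item>b</item> </list>

(: Serializing with indentation may add line breaks and spaces. :)
let $indented := fn:serialize($node, map { "indent": true() })

(: Parsing the indented string back mimics storing the pretty-printed
   output; any whitespace the serializer added is now part of the data,
   unless the parser is configured to drop it. :)
let $round-tripped := fn:parse-xml($indented)

return
    map {
        "text-nodes-before": count($node/text()),
        "serialized": $indented,
        "text-nodes-after": count($round-tripped/list/text())
    }
```

If `text-nodes-after` is greater than `text-nodes-before`, the indentation has leaked into the document.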
- Does `indent=no` ever completely strip whitespace where it was, or does it just collapse whitespace to a single character?
- Does `indent=yes` ever insert whitespace where there wasn't any, or does it just expand existing whitespace?
- What role do the indexer's whitespace settings (`suppress-whitespace=leading|trailing|both|none` or `preserve-whitespace-mixed-content=yes|no`) play? Under what scenarios do they affect the results of a query?
- What role does the parser play? Presumably none, except for resolving entities?
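The first two questions could be probed empirically with something like the following. It is a rough sketch, not an answer: the two sample nodes are invented, the map form of `fn:serialize()` assumes XQuery 3.1 support, and the results may differ between processor versions and between element-only and mixed content.

```xquery
xquery version "3.1";

(: Preserve boundary whitespace so the samples contain exactly what is typed. :)
declare boundary-space preserve;

(: Invented samples: one with no whitespace between elements,
   one with runs of spaces inside mixed content. :)
let $tight := <p><b>a</b><i>b</i></p>
let $loose := <p>one   <b>two</b>   three</p>

for $input in ($tight, $loose)
let $yes := fn:serialize($input, map { "indent": true() })
let $no := fn:serialize($input, map { "indent": false() })
return
    map {
        "indent-yes": $yes,
        "indent-no": $no,
        (: true() means indentation made no difference for this input :)
        "identical": $yes eq $no
    }
```

Comparing `indent-no` against the source text shows whether existing whitespace was stripped or collapsed; comparing `indent-yes` against it shows whether whitespace was inserted where there was none.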
In the case of eXide, it also helps to set `indent=no` in `eXide/modules/load.xql`. With this, eXide seems to preserve everything as intended (and indented :) ).