Skip to content

Instantly share code, notes, and snippets.

@kalxas
Last active September 14, 2021 10:19
Show Gist options
  • Save kalxas/6ecb06d61cdd487dc7f9 to your computer and use it in GitHub Desktop.
Save kalxas/6ecb06d61cdd487dc7f9 to your computer and use it in GitHub Desktop.
Data.gov CSW HowTo v2.0
Based on copy of https://gist.github.com/kalxas/5ab6237b4163b0fdc930 on 20 June 2014 12:27 UTC

Data.gov Catalogue Service for the Web (CSW) API

CSW (Catalogue Service for the Web)

CSW (Catalogue Service for the Web) is an OGC (Open Geospatial Consortium) specification that defines common interfaces to discover, browse, and query metadata about data, services, and other potential resources.

Data.gov provides access to its catalog via the CSW standard for both first-order and all metadata for harvested data, services and applications. The first-order CSW endpoint provides collection level filtering of all metadata records. The all metadata CSW endpoint provides all levels of metadata at varying levels of granularity.

Any client supporting CSW (desktop, GIS, web application, client library, etc.) can integrate the Data.gov CSW endpoints.

Interacting with the Data.gov CSW

The Data.gov CSW endpoints are implemented using pycsw. pycsw is an OGC CSW server implementation written in Python, enabling the publishing and discovery of data and services, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Certified OGC Compliant and is an OGC Reference Implementation.

Data.gov CSW Endpoints

The Data.gov CSW endpoints support the OGC CSW 2.0.2 standard as well as the ISO Metadata Application 1.0.0 Profile. The CSW endpoints operate over HTTP GET, POST (XML) and SOAP.

Making HTTP POST (XML) requests

While making HTTP GET requests is relatively straightforward, HTTP POST (XML) requests require the client to open the connection and send the request XML as the payload. Below are a few examples on how to run HTTP POST (XML) requests on the command line:

# assuming XML request is saved to csw-request.xml

# curl
curl -X POST -d @csw-request.xml http://catalog.data.gov/csw

# lwp-request
cat csw-request.xml | POST http://catalog.data.gov/csw

# wget
wget http://catalog.data.gov/csw --post-file=csw-request.xml

Service Endpoints

First-order metadata: http://catalog.data.gov/csw

All metadata: http://catalog.data.gov/csw-all

We will use the first-order CSW endpoint for the examples below.

GetCapabilities

The GetCapabilities operation provides CSW clients with service metadata about the CSW service as an XML document.

Examples:

HTTP GET: http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=GetCapabilities

HTTP POST (XML):

<csw:GetCapabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ows="http://www.opengis.net/ows" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" service="CSW">
  <ows:AcceptVersions>
    <ows:Version>2.0.2</ows:Version>
  </ows:AcceptVersions>
  <ows:AcceptFormats>
    <ows:OutputFormat>application/xml</ows:OutputFormat>
  </ows:AcceptFormats>
</csw:GetCapabilities>

DescribeRecord

The DescribeRecord operation provides CSW clients with elements of supported information models of the CSW service.

Examples:

HTTP GET: http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=DescribeRecord

HTTP POST (XML):

<csw:DescribeRecord service="CSW" version="2.0.2" outputFormat="application/xml" schemaLanguage="http://www.w3.org/XML/Schema" xmlns="http://www.opengis.net/cat/csw/2.0.2" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
  <csw:TypeName>csw:Record</csw:TypeName>
</csw:DescribeRecord>

GetDomain

The GetDomain operation provides an interface to return all possible values for a given metadata property/queryable or parameter.

Examples:

HTTP GET: http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=GetDomain&propertyname=dc:type

http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=GetDomain&parametername=GetRecords.outputSchema

HTTP POST (XML):

<csw:GetDomain xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" version="2.0.2" service="CSW">
  <csw:PropertyName>dc:type</csw:PropertyName>
</csw:GetDomain>

<csw:GetDomain xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" version="2.0.2" service="CSW">
  <csw:ParameterName>GetRecords.outputSchema</csw:ParameterName>
</csw:GetDomain>

GetRecords

The GetRecords operation provides a query interface to search for data both using spatial predicates as well as attribute / temporal queries, or both. GetRecords queries are best invoked as HTTP POST (XML). Examples:

Notes:

  • adjust the startPosition and maxRecords parameters to customize / page responses
  • adjust the optional outputSchema parameter to customize output format (default is Dublin Core [http://www.opengis.net/cat/csw/2.0.2], use http://www.isotc211.org/2005/gmd for ISO)
  • bounding box queries and responses always use axis order latitude longitude
  • adjust the optional csw:ElementSetName parameter (brief, summary [default], full) to adjust verbosity of metadata record responses. The following table provides an overview of the elements returned:
Returnable elements
csw:ElementSetName Elements Returned
brief dc:identifier, dc:title, dc:type, ows:BoundingBox
summary brief + dc:format, dc:subject, dct:modified, dc:abstract, dct:references
full summary + dc:date, dc:creator, dc:publisher, dc:contributor, dc:source, dc:language, dc:rights

Query all records, return records 1 - 15:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="15" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>full</csw:ElementSetName>
  </csw:Query>
</csw:GetRecords>

Query records with a bounding box:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="5" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" xmlns:gml="http://www.opengis.net/gml">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:BBOX>
          <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
          <gml:Envelope>
            <gml:lowerCorner>47 -5</gml:lowerCorner>
            <gml:upperCorner>55 20</gml:upperCorner>
          </gml:Envelope>
        </ogc:BBOX>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>

Query records by attribute:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="10" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsLike matchCase="false" wildCard="%" singleChar="_" escapeChar="\">
          <ogc:PropertyName>dc:title</ogc:PropertyName>
          <ogc:Literal>roads%</ogc:Literal>
        </ogc:PropertyIsLike>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>

Query records by full text search:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="5" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>csw:AnyText</ogc:PropertyName>
          <ogc:Literal>roads</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>

and part of the response:

<!-- pycsw 1.8.0 -->
<csw:GetRecordsResponse version="2.0.2" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
  <csw:SearchStatus timestamp="2014-03-17T18:09:43Z"/>
  <csw:SearchResults nextRecord="6" numberOfRecordsMatched="27724" numberOfRecordsReturned="5" recordSchema="http://www.opengis.net/cat/csw/2.0.2" elementSet="brief">
  ......

Query records by combined bounding box and full text search:

<csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:gml="http://www.opengis.net/gml" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="5" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:And>
          <ogc:PropertyIsEqualTo>
            <ogc:PropertyName>csw:AnyText</ogc:PropertyName>
            <ogc:Literal>roads</ogc:Literal>
          </ogc:PropertyIsEqualTo>
          <ogc:BBOX>
            <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
            <gml:Envelope>
              <gml:lowerCorner>47 -5</gml:lowerCorner>
              <gml:upperCorner>55 20</gml:upperCorner>
            </gml:Envelope>
          </ogc:BBOX>
        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>

Query the total number of records in the catalogue (HTTP GET): http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementsetname=brief

GetRecordById

The GetRecordById operation returns defailed information for specific metadata records.

Examples:

HTTP GET: http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=GetRecordById&id=16bbf4f8-8e88-45c6-a76b-6af51b2b3555&elementsetname=full

HTTP GET (ISO 19139): http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=GetRecordById&id=16bbf4f8-8e88-45c6-a76b-6af51b2b3555&elementsetname=full&outputSchema=http://www.isotc211.org/2005/gmd

HTTP GET (ISO 19139 brief): http://catalog.data.gov/csw?service=CSW&version=2.0.2&request=GetRecordById&id=16bbf4f8-8e88-45c6-a76b-6af51b2b3555&elementsetname=brief&outputSchema=http://www.isotc211.org/2005/gmd

HTTP POST (XML):

<csw:GetRecordById service="CSW" version="2.0.2" outputFormat="application/xml" outputSchema="http://www.opengis.net/cat/csw/2.0.2" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd">
  <csw:Id>16bbf4f8-8e88-45c6-a76b-6af51b2b3555</csw:Id>
  <csw:ElementSetName>full</csw:ElementSetName>
</csw:GetRecordById>

Searching Collections

The Data.gov CSW supports searches within for collections. The following query searches for all records that are part of a collection via the apiso:ParentIdentifier core queryable:

<csw:GetRecords xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:ogc="http://www.opengis.net/ogc" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="5" outputFormat="application/xml" outputSchema="http://www.isotc211.org/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>brief</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>apiso:ParentIdentifier</ogc:PropertyName>
          <ogc:Literal>da679621-ccd6-49aa-a6db-75fc791a4833</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>

OpenSearch Geo and Time Extensions

The Data.gov CSW endpoints support the OGC OpenSearch Geo and Time Extensions 1.0 standard. This provides a lightweight mechanism to query the catalogue via HTTP GET for simple spatial and temporal searches.

To query the Data.gov via OpenSearch, requests must be specificed with mode=opensearch. The following parameters are supported:

  • {searchTerms} (keywords)
  • {geo:box} (bounding box of minx,miny,maxx,maxy)
  • {time:start} and {time:end} (temporal)

Specifying combinations of these parameters constitutes an exclusive search.

Examples:

Autodiscovery: http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetCapabilities

Keywords: http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&q=idaho

Bounding Box: http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&bbox=-140,20,-40,40

Temporal Extent (range): http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&time=2001/2004

Temporal Extent (since 2004): http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&time=2004/

Temporal Extent (before 2004): http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&time=/2004

Temporal Extent (2001/2007) and Keywords: http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&time=2001/2007&q=fauna

Temporal Extent (2001/2007) and Bounding Box: http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&time=2001/2007&bbox=-140,20,-40,40

Keywords and Bounding Box: http://catalog.data.gov/csw?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&resulttype=results&typenames=csw:Record&q=vegetation&bbox=-140,20,-40,40

@fellahst
Copy link

Is there a way to get all updated records since a given date/time? I like to harvest the CSW every day, but to optimize the harvesting, I like to get only the updated records every day instead of harvesting the whole catalog every day and perform filtering on my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment