@brianv0
Last active June 3, 2016 20:20
IVOA Summary

IVOA Meeting Summary

Data Interfaces

LSST has several interests in IVOA interfaces and their implementations. First, it is imperative that we support standardized data access interfaces so that our users can reuse community-standard tools such as TOPCAT. Second, HTTP interfaces are an extremely flexible way of providing and managing data access both inside a datacenter and across the internet, and the IVOA has several standards we could leverage where they make sense. Finally, it is in our best interest to reuse existing data access interface implementations and code where possible, both to reduce our workload and, ideally, to improve the existing body of work.

When evaluating the IVOA standards, the data access team has two specific use cases to accommodate. The first is to provide the SUIT team with relevant interfaces for accessing catalog data and images. The second is to support standard tools such as TOPCAT.

Of the many data access interfaces, LSST's Data Access group is primarily interested in two: TAP and Simple Image Access.
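For readers less familiar with TAP, a synchronous ADQL query is just an HTTP request against the service's `/sync` resource. The sketch below builds such a request using the TAP 1.0 parameters `REQUEST=doQuery`, `LANG=ADQL` and `QUERY`; the base URL is a hypothetical placeholder, and output-format selection is left out because the parameter name differs between TAP revisions.

```python
from urllib.parse import urlencode


def tap_sync_url(base_url: str, adql: str) -> str:
    """Build a GET URL for a synchronous TAP ADQL query (TAP 1.0 parameters).

    base_url is the TAP service root, e.g. a hypothetical https://example.org/tap
    """
    params = {
        "REQUEST": "doQuery",  # ask the service to execute a query
        "LANG": "ADQL",        # query language
        "QUERY": adql,         # the ADQL text itself, URL-encoded by urlencode
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)
```

For example, `tap_sync_url("https://example.org/tap/", "SELECT TOP 5 * FROM TAP_SCHEMA.tables")` yields a URL that a tool like TOPCAT (or `curl`) could fetch directly; the default response serialization is VOTable.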

Implementations

IRSA/IPAC

IRSA is likely the largest repository of VO-accessible data in terms of data volume, as many legacy catalogs are available in addition to WISE data. Most of the VO code is in C++.

SIA (v1)
  • ~10k queries a day for most of 2015, with bursts averaging up to 30k a day since February.
SCS
  • 150 - 300 unique IPs across 100-150 subnets.
  • 15k average queries/day, with a spike up to 225k queries a day for a few months (one user)
  • Small search radius queries are run in-process; larger queries are farmed out to SLURM.
TAP
  • ~80k-120k average queries/day over a year, mostly North America
    • Completely dominated by one NEOWISE-R user; otherwise an average of 300-10k queries a day.
  • sync: 20-170 IPs, 10-70 subnets
  • async: 21 IPs, 12 subnets
    • Germans (ab)used this to tile WISE data in a search for Planet Nine
    • async not used as much as sync
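The SCS usage notes above mention that IRSA runs small-radius cone searches in-process and farms larger ones out to SLURM. A minimal sketch of that split, assuming a hypothetical 0.5 degree cutoff (the real threshold was not given), is shown below together with a standard Simple Cone Search request, whose `RA`, `DEC` and `SR` parameters are in decimal degrees.

```python
from urllib.parse import urlencode

# Hypothetical cutoff in degrees; the actual threshold IRSA uses to decide
# between in-process execution and SLURM was not given in the meeting.
IN_PROCESS_MAX_SR_DEG = 0.5


def scs_url(base_url: str, ra_deg: float, dec_deg: float, sr_deg: float) -> str:
    """Build a Simple Cone Search request; RA, DEC and SR are decimal degrees."""
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0 and sr_deg > 0.0):
        raise ValueError("position or search radius out of range")
    # Append to an existing query string if the base URL already has one.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": sr_deg})


def route_cone_search(sr_deg: float) -> str:
    """Pick an execution path: in-process for small radii, a batch queue otherwise."""
    return "in-process" if sr_deg <= IN_PROCESS_MAX_SR_DEG else "batch"
```

Routing on the search radius is a cheap proxy for query cost; a production service would more likely estimate the number of rows or partitions touched.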
ADQL

IRSA plans to submit a variant of ADQL with restricted geometry for standardization (this could be a good place to collaborate).
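The details of IRSA's restricted-geometry proposal weren't covered here, so the sketch below only shows a standard ADQL 2.0 cone search built from `CONTAINS`, `POINT` and `CIRCLE`, the kind of geometry expression a restricted variant would presumably still need to support. The table and column names are hypothetical.

```python
import re

# Conservative identifier check so table/column names can't inject SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def adql_cone_query(table: str, ra_col: str, dec_col: str,
                    ra: float, dec: float, radius: float, top: int = 100) -> str:
    """Return an ADQL 2.0 cone query; ra, dec and radius are in degrees (ICRS)."""
    for name in (table, ra_col, dec_col):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return (
        f"SELECT TOP {int(top)} * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {float(ra)!r}, {float(dec)!r}, {float(radius)!r}))"
    )
```

Note the `1 = CONTAINS(...)` form: in ADQL 2.0, `CONTAINS` returns an integer rather than a boolean.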

Registry

Due to the variety of data they hold across several experiments, they have decided to run their own registry, but doing so has been quite painful.

CADC

CADC's data center is built heavily around VO interfaces. Most of the software is implemented in Java, and considerable work goes into implementing as many of the VO standards as possible. Some groups (e.g. NED) use CADC's implementations as reference implementations.

SIA v2

SIA v2 is heavily oriented around SODA (Server-side Operations for Data Access), a standard that more formally describes how transformations, or more generally server-side actions, can be implemented. CADC has a production SODA implementation that operates on multi-dimensional data, reducing it according to spatial, spectral, temporal and polarization parameters.
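As an illustration of the spatial, spectral, temporal and polarization cutouts described above, the sketch below builds a SODA sync request. It assumes the SODA 1.0 parameter names `ID`, `CIRCLE`, `BAND` (wavelength interval, in meters), `TIME` (MJD interval) and repeatable `POL`; the service URL and dataset identifier are hypothetical, and the spec should be checked before relying on the exact value syntax.

```python
from typing import Optional, Sequence, Tuple
from urllib.parse import urlencode


def soda_cutout_url(sync_url: str, dataset_id: str,
                    circle: Optional[Tuple[float, float, float]] = None,
                    band: Optional[Tuple[float, float]] = None,
                    time: Optional[Tuple[float, float]] = None,
                    pols: Sequence[str] = ()) -> str:
    """Build a SODA sync cutout request from optional per-axis constraints."""
    params = [("ID", dataset_id)]  # which dataset to cut
    if circle is not None:
        ra, dec, radius = circle  # degrees
        params.append(("CIRCLE", f"{ra} {dec} {radius}"))
    if band is not None:
        lo, hi = band  # wavelength interval in meters
        params.append(("BAND", f"{lo} {hi}"))
    if time is not None:
        t0, t1 = time  # MJD interval
        params.append(("TIME", f"{t0} {t1}"))
    for pol in pols:  # e.g. "I", "Q"; one parameter per state
        params.append(("POL", pol))
    # A list of pairs (rather than a dict) lets POL repeat.
    return sync_url + "?" + urlencode(params)
```

Unset axes are simply omitted, which in SODA means "no constraint along that axis".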

TAP

CADC is currently implementing TAP 1.1 as the standard progresses. They expose both CAOM2 and ObsCore 1.1 archive metadata through their TAP interfaces, and they run additional endpoints for the same TAP service, each specific to an authentication scheme (i.e. anonymous, username+password, X.509).
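A client of a service laid out this way picks the endpoint by authentication scheme and sends the same query to it. The sketch below assumes a hypothetical mapping of schemes to base URLs (CADC's actual paths were not noted) and queries the standard `ivoa.ObsCore` table; actual credential handling (HTTP auth, client certificates) belongs to the HTTP client and is not shown.

```python
from urllib.parse import urlencode

# Hypothetical layout: one TAP service exposed under per-auth-scheme base URLs.
AUTH_ENDPOINTS = {
    "anon": "https://example.org/tap",
    "password": "https://example.org/auth-tap",
    "x509": "https://example.org/cert-tap",
}

# A simple ObsCore query using standard ObsCore column names.
OBSCORE_QUERY = (
    "SELECT TOP 10 obs_publisher_did, dataproduct_type, s_ra, s_dec, access_url "
    "FROM ivoa.ObsCore WHERE dataproduct_type = 'image'"
)


def obscore_sync_url(auth_scheme: str) -> str:
    """Return a sync TAP URL for OBSCORE_QUERY on the endpoint for auth_scheme."""
    try:
        base = AUTH_ENDPOINTS[auth_scheme]
    except KeyError:
        raise ValueError(f"unsupported auth scheme: {auth_scheme!r}") from None
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": OBSCORE_QUERY}
    return base + "/sync?" + urlencode(params)
```

Keeping the query identical across endpoints means the authorization decision lives entirely in which endpoint is contacted, not in the ADQL.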

Other news from IVOA

CDS/VizieR

CDS has put a lot of work into their web portal, which would be of interest to the SUIT team.

CASDA

CASDA (CSIRO ASKAP Science Data Archive) is implementing SODA for their images, but they've only implemented async as they have quite a bit of data on tape. Their service returns full images or can perform spatial and plane cutouts.
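Because CASDA only offers async access, clients must follow the UWS job pattern: create a job under `/async`, start it, poll its phase until it finishes, then fetch the results. The sketch below covers only the polling step; `get_phase` is a hypothetical callable standing in for a GET on the job's `/phase` resource, and the polling interval and timeout are arbitrary.

```python
import time
from typing import Callable

# UWS phases after which a job will not change on its own.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_for_job(get_phase: Callable[[], str],
                 poll_seconds: float = 5.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a UWS job until it reaches a terminal phase; raise on timeout."""
    waited = 0.0
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {phase} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting `sleep` keeps the loop testable; for tape-backed data, a longer poll interval (or exponential backoff) would be kinder to the server.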

Cosmopterix

  • Docker containers for database platforms, intended to make it quick to bring up, test, and/or validate TAP and ADQL platforms.

Data Formats

  • Tom McGlynn had a good talk on issues with VOTable validation.
  • Mark Taylor is working on TAPLint to validate TAP services; this could be extremely useful for us as we're working on our own implementation.
  • Consensus that HDF5 is important, no consensus on what to do about it in the current term.
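As a tiny example of the kind of structural check a VOTable validator performs, the sketch below verifies that every `TABLEDATA` row has exactly one `TD` per `FIELD`. It is an illustrative assumption of one check among many, not a substitute for a real validator such as TAPLint, and it ignores BINARY/FITS serializations.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}TABLE' -> 'TABLE'."""
    return tag.rsplit("}", 1)[-1]


def check_tabledata(votable_xml: str) -> list:
    """Report TABLEDATA rows whose TD count differs from the table's FIELD count."""
    root = ET.fromstring(votable_xml)
    problems = []
    tables = (e for e in root.iter() if _local(e.tag) == "TABLE")
    for t_index, table in enumerate(tables):
        # FIELD elements are direct children of TABLE in a VOTable document.
        n_fields = sum(1 for child in table if _local(child.tag) == "FIELD")
        rows = (e for e in table.iter() if _local(e.tag) == "TR")
        for r_index, tr in enumerate(rows):
            n_cells = sum(1 for td in tr if _local(td.tag) == "TD")
            if n_cells != n_fields:
                problems.append(
                    f"table {t_index}, row {r_index}: {n_cells} cells, expected {n_fields}"
                )
    return problems
```

Matching on local names lets the same check run on documents with or without the VOTable namespace declared.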

Collaborations

Qserv

  • Stelios Voutsinas and Dave Morris (both at Edinburgh) are interested in Qserv for Cosmopterix

  • Dave Morris is interested in running a Qserv cluster for testing

    • Both were told this may be possible around the fall
  • Stelios and Dave also have a history of ADQL queries from their services. I believe Gregory Mantelet (GAVO) does too.

    • We would like to mine that data and understand common query patterns, maybe use a parser to identify structure. This could help inform Qserv team as well as future ADQL implementations
  • Matthew Graham interested in Qserv as well (not sure if this is for Caltech or AURA)
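Before reaching for a full ADQL parser, a first pass over those query logs could simply count coarse features and the tables queried. The sketch below assumes the logs can be loaded as a list of query strings (a hypothetical format) and uses regular expressions, which can be fooled by keywords inside string literals or comments; a real parser would be needed for the structural analysis mentioned above.

```python
import re
from collections import Counter
from typing import Iterable, Set, Tuple

# Coarse, regex-based feature detectors (case-insensitive, like ADQL keywords).
FEATURES = {
    "top": re.compile(r"\bTOP\s+\d+", re.IGNORECASE),
    "contains": re.compile(r"\bCONTAINS\s*\(", re.IGNORECASE),
    "join": re.compile(r"\bJOIN\b", re.IGNORECASE),
    "group_by": re.compile(r"\bGROUP\s+BY\b", re.IGNORECASE),
    "order_by": re.compile(r"\bORDER\s+BY\b", re.IGNORECASE),
}
# First table after each FROM (joined tables after JOIN are not captured).
FROM_TABLE = re.compile(r"\bFROM\s+([\w.]+)", re.IGNORECASE)


def query_features(adql: str) -> Set[str]:
    """Return the names of FEATURES that appear in one query."""
    return {name for name, pattern in FEATURES.items() if pattern.search(adql)}


def summarize(queries: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count feature usage and FROM tables across a log of ADQL queries."""
    feature_counts: Counter = Counter()
    table_counts: Counter = Counter()
    for q in queries:
        feature_counts.update(query_features(q))
        table_counts.update(t.lower() for t in FROM_TABLE.findall(q))
    return feature_counts, table_counts
```

Even counts this crude (how often queries are spatial, how often they join) would say something about which access patterns Qserv should optimize.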

TAP

  • Mark Taylor says many, many people would be really happy if LSST implemented TAP

    • Doesn't think it makes sense to bother with SIA right now because the dust hasn't settled
  • Consensus that you can go ahead and implement whichever response formats you want from TAP, just understand they might not become a standard

  • Near universal agreement there should be some JSON output from TAP

    • But no agreement on whether that should be a 1:1 mapping from VOTable XML or more of an enhanced version of the CSV output
  • Walter Landry has some statistics on the popularity of IPAC's catalogs

    • Walter is curious about Simple Cone Search response time (Serge advertised 30ms)
  • No real traction for officially adding a LIMIT keyword to ADQL (TOP is descended from SQL Server)

  • Christophe Arviset enthusiastically asked during the talk about LSST contributions to the TAP/ADQL specifications. I mentioned it's probably too late in this cycle of TAP/ADQL to do much, and that TAP 2/ADQL 3 would be our target, if anything.
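To make the JSON disagreement above concrete, the sketch below serializes the same small result table both ways: as CSV-like row objects keyed by column name, and as a structure closer to a 1:1 VOTable mapping, where column metadata is kept separate from the data. Both layouts and their key names are illustrative assumptions, not a proposed standard.

```python
import json
from typing import Any, Sequence


def to_json_rows(fields: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """CSV-like JSON: one object per row, keyed by column name."""
    return json.dumps([dict(zip(fields, row)) for row in rows])


def to_json_votable_like(fields: Sequence[str], datatypes: Sequence[str],
                         rows: Sequence[Sequence[Any]]) -> str:
    """Closer to a 1:1 VOTable mapping: FIELD metadata kept apart from data rows."""
    return json.dumps({
        "fields": [{"name": n, "datatype": d} for n, d in zip(fields, datatypes)],
        "data": [list(row) for row in rows],
    })
```

The row-object form is convenient for web clients but repeats column names on every row and drops types and units; the metadata-plus-arrays form is more compact and can carry what a VOTable `FIELD` carries.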
