Wil gives a short intro and asks for volunteers to take notes.
Gregory presenting on behalf of LSST.
Three core parts, “aspects”: Portal, Notebook, and APIs, plus unifying infrastructure.
Angry question about APIs; rational response from Gregory. (From a Jupyter/Pandas perspective, consider https://arrow.apache.org/.)
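A sketch of what that Arrow suggestion could look like from a notebook, assuming pandas and pyarrow are installed; the column names are made up:

```python
# Minimal sketch: language-neutral interchange between pandas and Arrow.
# Only the pyarrow API is real; the data is illustrative.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"ra": [10.68, 83.82], "dec": [41.27, -5.39]})

# Convert to an Arrow table: the shared in-memory columnar format.
table = pa.Table.from_pandas(df)

# Persist as Parquet so other services (notebooks, APIs, Spark) can read it.
pq.write_table(table, "catalog.parquet")

# Round-trip back into pandas.
df2 = pq.read_table("catalog.parquet").to_pandas()
```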
Many collaborations
CAS + DAS
Wil asks about batch jobs. They use MPI directly; larger containers have 48 cores. No horizontal scaling solution yet.
Question on quotas and resource limits: not being addressed yet.
Bring the code to the data. Interactive Jupyter notebooks.
On the fly VM with full software environment.
On the fly data processing (predefined data processing threads)
Long term preservation of old software.
Standard mission bulk data processing.
Provide software and science data processing capabilities “as a service”.
-
Collaborative platform for data.
Share your data (user disk space inside the archive, VOSpace)
Share your metadata (DB user space inside the archive, …)
Publish your data (VO protocols)
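A minimal sketch of the “bring the code to the data” / VO-protocol idea, using the real astroquery.gaia TAP interface; the specific ADQL query is just illustrative:

```python
# Synchronous ADQL query against the Gaia archive's TAP service.
from astroquery.gaia import Gaia

job = Gaia.launch_job(
    "SELECT TOP 10 source_id, ra, dec, phot_g_mean_mag "
    "FROM gaiadr1.gaia_source "
    "ORDER BY phot_g_mean_mag"
)
results = job.get_results()  # an astropy Table
print(results)
```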
-
Gaia Archive Architecture
PoC - User workspace (data and metadata). DAS-LT, PLAAVI, GAVIP.
Gathering requirements for SEPP use cases. Design and prototype in 2018.
NOAO Data Lab
DECam/Mosaic Image Data.
NOAO All Sky Catalog
Web, notebook, command-line, and RESTful APIs (see the sketch after this list).
Support legacy code in containers.
Use established standards, but hide complexity.
Know your limits.
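Hedged sketch of the RESTful query path mentioned above, using the Data Lab Python client (the `dl` package); the table and column names are illustrative, not confirmed:

```python
# Query the NOAO Data Lab's RESTful query service through its client.
from dl import queryClient as qc

# Synchronous SQL query against (assumed) NOAO All Sky Catalog tables.
csv_text = qc.query(sql="SELECT ra, dec, gmag FROM nsc_dr1.object LIMIT 10")
print(csv_text)  # results come back as CSV text by default
```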
-
<Details on catalogs and pixel data, virtual storage, etc.>
-
Example notebook - star / galaxy / QSO separation.
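A hedged sketch of the kind of classification that notebook does, with synthetic colors standing in for a real catalog query (scikit-learn assumed):

```python
# Toy star / galaxy / QSO separation on (g-r, r-i) colors.
# Data here is synthetic just to make the code runnable; in the real
# notebook the features would come from a Data Lab catalog query.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300
X = np.vstack([
    rng.normal([0.5, 0.2], 0.10, (n, 2)),  # fake star colors
    rng.normal([0.9, 0.5], 0.15, (n, 2)),  # fake galaxy colors
    rng.normal([0.2, 0.1], 0.20, (n, 2)),  # fake QSO colors
])
y = np.repeat(["star", "galaxy", "qso"], n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```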
Science Platform.
Increase scientific output from our holdings, new and archival.
Increase the turnaround time of science (currently ~12 months).
Connect multi-wavelength data
Provide tools for reproducible research, integrate in workflow.
Look forward to future missions (WFIRST)
-
MAST Portal.
-
Tools
MAST API (see the sketch after this list).
Astroconda
Hosted: ExoCTK (exoctk.stsci.edu)
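Sketch of a MAST API call via astroquery (a real package and method; the target and radius are just examples):

```python
# Cone search of MAST observations around a named target.
from astroquery.mast import Observations

obs = Observations.query_object("M101", radius="0.02 deg")
print(obs[:5])  # an astropy Table of matching observations
```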
-
Hubble Public Data on AWS
120 TB to 140 TB of data.
AMIs with tools and notebooks NEXT TO DATA.
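Sketch of reading that public data in place with boto3. The `stpubdata` bucket is the real HST public dataset and is requester-pays; the key prefix here is illustrative:

```python
# List a few HST objects from the public S3 bucket, next to the compute.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
resp = s3.list_objects_v2(
    Bucket="stpubdata",
    Prefix="hst/public/",       # assumed layout; adjust to the real keys
    MaxKeys=5,
    RequestPayer="requester",   # required: the bucket is requester-pays
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```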
JupyterHub Project
AMIs -> Docker containers.
User auth.
Kubernetes container orchestration.
How can we support collaboration?
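A minimal jupyterhub_config.py sketch matching those bullets (GitHub auth, Docker images spawned on Kubernetes); assumes the real oauthenticator and kubespawner packages, with placeholder image name, URL, and credentials:

```python
# jupyterhub_config.py sketch; `c` is the config object JupyterHub injects
# into this file. Image name, callback URL, and credentials are placeholders.
from oauthenticator.github import GitHubOAuthenticator
from kubespawner import KubeSpawner

# User auth via GitHub OAuth.
c.JupyterHub.authenticator_class = GitHubOAuthenticator
c.GitHubOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
c.GitHubOAuthenticator.client_id = "<github-oauth-app-id>"
c.GitHubOAuthenticator.client_secret = "<github-oauth-app-secret>"

# Per-user Docker containers orchestrated by Kubernetes (AMIs -> containers).
c.JupyterHub.spawner_class = KubeSpawner
c.KubeSpawner.image = "example/science-notebook:latest"
```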
CANFAR
2 OpenStack clouds.
38 astro projects.
18M jobs on 74k VMs.
200 TB of user storage split into 220 project spaces.
-
A plain OpenStack cloud configuration, but it addresses all the pain points.
-
Raw OpenStack portal with vanilla VMs + Volumes
Web + CLI VOSpace for user storage and data sharing (see the sketch after this list).
Projects (4) using Jupyter
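Sketch of the VOSpace workflow using CADC's vos Python client (a real package); the project space and file names are hypothetical:

```python
# Move data between a project's shared VOSpace and local compute.
from vos import Client

client = Client()  # picks up CADC credentials/certificate from the environment

# List a project space and pull a file down next to your code.
print(client.listdir("vos:MyProject"))
client.copy("vos:MyProject/image.fits", "image.fits")

# Push a result back so collaborators can see it.
client.copy("result.fits", "vos:MyProject/result.fits")
```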
-
Submitted CFI proposal:
Hybrid cloud / container orchestration.
Jupyter for everything.
User DB.
Integrated tiered storage.
Use Ceph as the OpenStack backend (volumes?).
Use GitHub for at least some auth.
GWS (Grid and Web Services) working group.
Working with the KDD (Knowledge Discovery in Databases) interest group to define use cases.
Goal: Fast computing - interoperable computing services close to the data.
Interactive vis and compute.
Batch and parallel processing.
Language agnostic (containers?)
-
ML remote execution prototyping
Train and run algorithm on a full set of data.
Make use of existing VO protocols.
Perform certain steps manually if necessary (moved on to next slide).
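Sketch of the “existing VO protocols” step: pulling a training set through a TAP service with pyvo (a real package); the service URL, table, and columns are placeholders:

```python
# Fetch training data from a data center's TAP service.
import pyvo

service = pyvo.dal.TAPService("https://datacenter.example.org/tap")
result = service.search(
    "SELECT ra, dec, u_mag, g_mag, r_mag, class "
    "FROM catalog.training LIMIT 1000"
)
table = result.to_table()  # astropy Table, ready to feed to the trainer
```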
-
Must allow data centers to optimize how they execute batch processes.
A combination of API and code may assist with provenance and reproducibility.
We encourage experiments and prototypes.
Feedback to IVOA.
Contribute to the Saturday morning IVOA session.
Robert Lupton - question on what it’s all about.
Personal note - similar to CyVerse’s Discovery Environment, at least in how it is implemented underneath.
Question for all the centers: this exists already.
http://wholetale.org/
http://www.nationaldataservice.org/
Personal note - Gregory should talk to Nirav to hear how it works at CyVerse.
Workflow - STScI has a way to approach legacy processing.