Wil gives a short intro and asks for volunteers to take notes.
Gregory presenting on behalf of LSST.
Three core parts, “aspects”: Portal, Notebook, and APIs, plus unifying infrastructure.
Angry question about APIs; rational response from Gregory. (From a Jupyter/Pandas perspective, consider https://arrow.apache.org/.)
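A sketch of what that Arrow suggestion could look like from a notebook, assuming pandas and pyarrow are installed; the column names are made up:

```python
# Minimal sketch: language-neutral interchange between pandas and Arrow.
# Only the pyarrow API is real; the data is illustrative.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"ra": [10.68, 83.82], "dec": [41.27, -5.39]})

# Convert to an Arrow table: the shared in-memory columnar format.
table = pa.Table.from_pandas(df)

# Persist as Parquet so other services (notebooks, APIs, Spark) can read it.
pq.write_table(table, "catalog.parquet")

# Round-trip back into pandas.
df2 = pq.read_table("catalog.parquet").to_pandas()
```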
Many collaborations
CAS + DAS
Wil asks about batch jobs. They use MPI directly; larger containers have 48 cores. No horizontal scaling solution yet.
Question on quotas and resource limits: not being addressed yet.
Bring the code to the data. Interactive Jupyter notebooks.
On the fly VM with full software environment.
On the fly data processing (predefined data processing threads)
Long term preservation of old software.
Standard mission bulk data processing.
Provide software and science data processing capabilities “as a service”.
-
Collaborative platform for data.
Share your data (user disk space inside the archive, VOSpace)
Share your metadata (DB user space inside the archive, …)
Publish your data (VO protocols)
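A minimal sketch of the “bring the code to the data” / VO-protocol idea, using the real astroquery.gaia TAP interface; the specific ADQL query is just illustrative:

```python
# Synchronous ADQL query against the Gaia archive's TAP service.
from astroquery.gaia import Gaia

job = Gaia.launch_job(
    "SELECT TOP 10 source_id, ra, dec, phot_g_mean_mag "
    "FROM gaiadr1.gaia_source "
    "ORDER BY phot_g_mean_mag"
)
results = job.get_results()  # an astropy Table
print(results)
```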
-
Gaia Archive Architecture
PoC - User workspace (data and metadata). DAS-LT, PLAAVI, GAVIP.
Gathering requirements for SEPP use cases. Design and prototype in 2018.
NOAO Data Lab
DECam/Mosaic Image Data.
NOAO All Sky Catalog
Web, notebook, command-line, and RESTful APIs (see the sketch after this list).
Support legacy code in containers.
Use established standards, but hide complexity.
Know your limits.
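Hedged sketch of the RESTful query path mentioned above, using the Data Lab Python client (the `dl` package); the table and column names are illustrative, not confirmed:

```python
# Query the NOAO Data Lab's RESTful query service through its client.
from dl import queryClient as qc

# Synchronous SQL query against (assumed) NOAO All Sky Catalog tables.
csv_text = qc.query(sql="SELECT ra, dec, gmag FROM nsc_dr1.object LIMIT 10")
print(csv_text)  # results come back as CSV text by default
```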
-
<Details on catalogs and pixel data, virtual storage, etc.>
-
Example notebook - star / galaxy / QSO separation.
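A hedged sketch of the kind of classification that notebook does, with synthetic colors standing in for a real catalog query (scikit-learn assumed):

```python
# Toy star / galaxy / QSO separation on (g-r, r-i) colors.
# Data here is synthetic just to make the code runnable; in the real
# notebook the features would come from a Data Lab catalog query.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300
X = np.vstack([
    rng.normal([0.5, 0.2], 0.10, (n, 2)),  # fake star colors
    rng.normal([0.9, 0.5], 0.15, (n, 2)),  # fake galaxy colors
    rng.normal([0.2, 0.1], 0.20, (n, 2)),  # fake QSO colors
])
y = np.repeat(["star", "galaxy", "qso"], n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```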
Science Platform.
Increase scientific output from our holdings, new and archival.
Increase the turnaround time of science (currently ~12 months).
Connect multi-wavelength data
Provide tools for reproducible research, integrate in workflow.
Look forward to future missions (WFIRST)
-
MAST Portal.
-
Tools
MAST API (see the sketch after this list).
Astroconda
Hosted: ExoCTK (exoctk.stsci.edu)
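Sketch of a MAST API call via astroquery (a real package and method; the target and radius are just examples):

```python
# Cone search of MAST observations around a named target.
from astroquery.mast import Observations

obs = Observations.query_object("M101", radius="0.02 deg")
print(obs[:5])  # an astropy Table of matching observations
```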
-
Hubble Public Data on AWS
120 TB to 140 TB of data.
AMIs with tools and notebooks NEXT TO DATA.
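Sketch of reading that public data in place with boto3. The `stpubdata` bucket is the real HST public dataset and is requester-pays; the key prefix here is illustrative:

```python
# List a few HST objects from the public S3 bucket, next to the compute.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
resp = s3.list_objects_v2(
    Bucket="stpubdata",
    Prefix="hst/public/",       # assumed layout; adjust to the real keys
    MaxKeys=5,
    RequestPayer="requester",   # required: the bucket is requester-pays
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```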
JupyterHub Project
AMIs -> Docker containers.
User auth.
Kubernetes container orchestration.
How can we support collaboration?
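A minimal jupyterhub_config.py sketch matching those bullets (GitHub auth, Docker images spawned on Kubernetes); assumes the real oauthenticator and kubespawner packages, with placeholder image name, URL, and credentials:

```python
# jupyterhub_config.py sketch; `c` is the config object JupyterHub injects
# into this file. Image name, callback URL, and credentials are placeholders.
from oauthenticator.github import GitHubOAuthenticator
from kubespawner import KubeSpawner

# User auth via GitHub OAuth.
c.JupyterHub.authenticator_class = GitHubOAuthenticator
c.GitHubOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
c.GitHubOAuthenticator.client_id = "<github-oauth-app-id>"
c.GitHubOAuthenticator.client_secret = "<github-oauth-app-secret>"

# Per-user Docker containers orchestrated by Kubernetes (AMIs -> containers).
c.JupyterHub.spawner_class = KubeSpawner
c.KubeSpawner.image = "example/science-notebook:latest"
```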
CANFAR
2 OpenStack clouds.
38 astro projects.
18M jobs on 74k VMs.
200 TB of user storage split into 220 project spaces.
-
A plain OpenStack cloud configuration, but it addresses all the pain points.
-
Raw OpenStack portal with vanilla VMs + Volumes
Web + CLI VOSpace for user storage and data sharing (see the sketch after this list).
Projects (4) using Jupyter
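Sketch of the VOSpace workflow using CADC's vos Python client (a real package); the project space and file names are hypothetical:

```python
# Move data between a project's shared VOSpace and local compute.
from vos import Client

client = Client()  # picks up CADC credentials/certificate from the environment

# List a project space and pull a file down next to your code.
print(client.listdir("vos:MyProject"))
client.copy("vos:MyProject/image.fits", "image.fits")

# Push a result back so collaborators can see it.
client.copy("result.fits", "vos:MyProject/result.fits")
```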
-
Submitted CFI proposal:
Hybrid cloud / container orchestration.
Jupyter for everything.
User DB.
Integrated tiered storage.
Use Ceph as the OpenStack backend (volumes?).
Use GitHub for at least some auth.
GWS (Grid and Web Services) working group.
Working with the KDD (Knowledge Discovery in Databases) interest group to define use cases.
Goal: Fast computing - interoperable computing services close to the data.
Interactive vis and compute.
Batch and parallel processing.
Language agnostic (containers?)
-
ML remote execution prototyping
Train and run algorithm on a full set of data.
Make use of existing VO protocols.
Perform certain steps manually if necessary (moved on to next slide).
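Sketch of the “existing VO protocols” step: pulling a training set through a TAP service with pyvo (a real package); the service URL, table, and columns are placeholders:

```python
# Fetch training data from a data center's TAP service.
import pyvo

service = pyvo.dal.TAPService("https://datacenter.example.org/tap")
result = service.search(
    "SELECT ra, dec, u_mag, g_mag, r_mag, class "
    "FROM catalog.training LIMIT 1000"
)
table = result.to_table()  # astropy Table, ready to feed to the trainer
```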
-
Must allow data centers to optimize how they execute batch processes.
A combination of API and code may assist with provenance and reproducibility.
We encourage experiments and prototypes.
Feedback to IVOA.
Contribute to the Saturday morning IVOA session.
Robert Lupton - question on what it’s all about.
Personal note - similar to CyVerse’s Discovery Environment, at least in how it is implemented underneath.
Question for all the centers: this exists already.
http://wholetale.org/
http://www.nationaldataservice.org/
Personal note - Gregory should talk to Nirav to hear how it works at CyVerse.
Workflow - STScI has a way to approach legacy processing.