- FAIR data principles and distributed computing
- Data is stored in the cloud and can be read or queried directly with HTTP requests
- Exploiting large geospatial datasets in the cloud efficiently, by transmitting as few bytes as possible
- efficient with cloud storage
- Ability to scale up/out geospatial analyses to cloud scale more easily
- Big Data, tiled processing, STAC, portable/scalable workflows, COG
- Technologies that are designed to work well in the cloud.
- Less configuration
- Data (and eventually analytics, which is not yet achieved) moves from desktop computers to clouds (plural), where it can be accessed through cloud services by expert as well as non-expert users
- To work in the cloud without lift-and-shift (i.e. simply spinning up a VM in the cloud)
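The "as few bytes as possible" point above usually rests on HTTP range requests: a client asks the server for just a window of a large object instead of the whole file. A minimal sketch in Python's standard library (the URL is a placeholder, and the 16 KiB window is an illustrative guess at a typical COG/GeoTIFF header size):

```python
import urllib.request

def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build an HTTP GET that asks for only bytes start..end (inclusive),
    so a client can read a small slice of a large cloud-hosted file."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={start}-{end}")
    return req

# e.g. fetch only the first 16 KiB of a (hypothetical) large image:
req = range_request("https://example.com/big-image.tif", 0, 16383)
# with urllib.request.urlopen(req) as resp:  # a range-aware server replies
#     header_bytes = resp.read()             # with "206 Partial Content"
```

A server that honours the `Range` header answers with status 206 and only the requested bytes; that is what makes cloud-optimized formats cheap to probe.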
The Common Crawl dataset lives on Amazon S3 as part of the Amazon Public Datasets program. Downloading it is free from any Amazon EC2 instance, via both S3 and HTTP.
As the Common Crawl Foundation has evolved over the years, so has the format and metadata that accompany the crawls themselves.
- [ARC] Archived Crawl #1 - s3://commoncrawl/crawl-001/ - crawl data from 2008/2010
- [ARC] Archived Crawl #2 - s3://commoncrawl/crawl-002/ - crawl data from 2009/2010
- [ARC] Archived Crawl #3 - s3://commoncrawl/parse-output/ - crawl data from 2012
- [WARC] s3://commoncrawl/crawl-data/CC-MAIN-2013-20/
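Because the bucket is public, the prefixes above can also be fetched over plain HTTPS. A small sketch that translates an `s3://` URI into S3's virtual-hosted HTTPS form (generic S3 addressing, not a Common Crawl-specific endpoint):

```python
def s3_to_https(s3_uri: str) -> str:
    """Translate an s3://bucket/key URI into the equivalent HTTPS URL,
    using S3's virtual-hosted addressing: bucket.s3.amazonaws.com/key."""
    assert s3_uri.startswith("s3://"), "expected an s3:// URI"
    bucket, _, key = s3_uri[len("s3://"):].partition("/")
    return f"https://{bucket}.s3.amazonaws.com/{key}"

print(s3_to_https("s3://commoncrawl/crawl-data/CC-MAIN-2013-20/"))
```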
```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "This template creates the AWS infrastructure to publish a public data set on S3. It creates an S3 bucket for the dataset, an S3 bucket for access logs, and a policy that allows the Amazon Public Data Set program to read the logs and the public to read the dataset.",
  "Outputs": {},
  "Parameters": {
    "DataSetName": {
      "AllowedPattern": "[a-z0-9\\.\\-_]*",
      "ConstraintDescription": "may only contain lowercase letters, numbers, and ., -, or _ characters",
      "Description": "The name of the dataset's S3 bucket. This will be used to create the dataset and log S3 bucket.",
      "MaxLength": "250",
```
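The `AllowedPattern` and `MaxLength` constraints above can be checked locally before submitting the template. A small sketch using Python's `re` (the helper name is ours, not part of the template):

```python
import re

# AllowedPattern from the template: lowercase letters, digits, ., -, _
DATASET_NAME_RE = re.compile(r"[a-z0-9\.\-_]*")

def is_valid_dataset_name(name: str) -> bool:
    """Return True when the whole name matches the template's AllowedPattern
    and respects its MaxLength of 250 characters."""
    return len(name) <= 250 and DATASET_NAME_RE.fullmatch(name) is not None
```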
```json
{
  "Version": "2012-10-17",
  "Id": "BUCKET_NAME-pds-policy",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:List*",
        "s3:Get*"
```
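The policy grants every read-style S3 call via the `s3:List*` and `s3:Get*` wildcards. As a rough illustration of which concrete actions those patterns cover, `fnmatch`-style globbing behaves like IAM's `*` for this simple case (this is a sketch of the matching idea, not how AWS evaluates policies internally):

```python
from fnmatch import fnmatchcase

# wildcard grants copied from the bucket policy above
GRANTED = ["s3:List*", "s3:Get*"]

def allowed(action: str) -> bool:
    """Check a concrete S3 action against the policy's wildcard grants."""
    return any(fnmatchcase(action, pattern) for pattern in GRANTED)
```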
We are experimenting with making Global Forecast System (GFS) and High-Resolution Rapid Refresh (HRRR) model data publicly available on Amazon S3. This Gist describes where to find the data and how it's organized. To work with the data, use any of AWS's various SDKs or the Command Line Interface.
A rolling four-week archive of 0.25 degree GFS data is available in s3://noaa-gfs-pds.
Browse the data in your browser at http://awsopendata.s3-website-us-west-2.amazonaws.com/noaa-gfs/
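Besides the browser listing, a public bucket like this answers S3's ListObjectsV2 REST call with no credentials at all. A sketch that builds such a listing URL (the prefix and key count are illustrative):

```python
from urllib.parse import urlencode

def list_url(bucket: str, prefix: str = "", max_keys: int = 100) -> str:
    """Build the URL for S3's ListObjectsV2 REST call; public buckets
    such as noaa-gfs-pds answer it anonymously with an XML listing."""
    query = urlencode({"list-type": 2, "prefix": prefix, "max-keys": max_keys})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"

print(list_url("noaa-gfs-pds"))
```

Fetching that URL (e.g. with `curl` or `urllib.request.urlopen`) returns an XML document whose `<Contents>` elements name the objects.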
How to create a publicly-accessible SNS topic that sends messages when objects are added to a public Amazon S3 bucket.
In this case, that's an S3 bucket that is continually updated by the addition of new sensor data. For the purposes of this tutorial, we’ll use s3://noaa-nexrad-level2 – one of our NEXRAD on AWS buckets – as an example.
The SNS topic should be in the same region as the bucket. It will need to have a policy that allows our S3 bucket to publish to it, and anyone to subscribe to it using Lambda or SQS.
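One common shape for that topic access policy is sketched below. The statement IDs, topic name, and `ACCOUNT_ID` are placeholders of ours; the pattern of an `s3.amazonaws.com` service principal restricted by `aws:SourceArn` is the standard way to let one specific bucket publish:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ToPublish",
      "Effect": "Allow",
      "Principal": {"Service": "s3.amazonaws.com"},
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:us-east-1:ACCOUNT_ID:NewNEXRADLevel2Object",
      "Condition": {"ArnLike": {"aws:SourceArn": "arn:aws:s3:::noaa-nexrad-level2"}}
    },
    {
      "Sid": "AllowAnyoneToSubscribe",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sns:Subscribe",
      "Resource": "arn:aws:sns:us-east-1:ACCOUNT_ID:NewNEXRADLevel2Object"
    }
  ]
}
```

The first statement lets the bucket publish object-created events; the second lets anyone subscribe, e.g. with a Lambda function or an SQS queue.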
```sh
# create a local directory for each path listed in urls.txt
while read -r p; do
  mkdir -p "$p"
done < urls.txt
```
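The same mkdir-per-line loop, written in Python for use where a shell is not available (the listing content and target directory here are throwaway examples):

```python
import pathlib
import tempfile

def make_dirs(listing: str, root: pathlib.Path) -> None:
    """Python equivalent of the shell loop: create one directory per
    non-empty line, mkdir -p style (parents created, existing dirs kept)."""
    for line in listing.splitlines():
        line = line.strip()
        if line:
            (root / line).mkdir(parents=True, exist_ok=True)

# demo against a throwaway directory instead of a real urls.txt
root = pathlib.Path(tempfile.mkdtemp())
make_dirs("crawl-data/CC-MAIN-2013-20\ncrawl-002\n", root)
```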
```json
[{"tag_id":"agriculture","tag_text":"agriculture"},
 {"tag_id":"airtravel","tag_text":"air travel"},
 {"tag_id":"arts","tag_text":"arts"},
 {"tag_id":"banking","tag_text":"banking"},
 {"tag_id":"benefits","tag_text":"benefits"},
 {"tag_id":"betterbusinessbureaus","tag_text":"better business bureaus"},
 {"tag_id":"biology","tag_text":"biology"},
 {"tag_id":"business","tag_text":"business"},
 {"tag_id":"businessdevelopment","tag_text":"business development"},
 {"tag_id":"career","tag_text":"career"},
 {"tag_id":"cars","tag_text":"cars"},
 {"tag_id":"challenges","tag_text":"challenges"},
 {"tag_id":"charities","tag_text":"charities"},
 {"tag_id":"childcare","tag_text":"child care"},
 {"tag_id":"children","tag_text":"children"},
 {"tag_id":"citizenship","tag_text":"citizenship"},
 {"tag_id":"college","tag_text":"college"},
 {"tag_id":"commerce","tag_text":"commerce"},
 {"tag_id":"community","tag_text":"community"},
 {"tag_id":"communitydevelopment","tag_text":"community development"},
 {"tag_id":"complaints","tag_text":"complaints"},
 {"tag_id":"conserva
```
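Each entry in the tag list above pairs a machine id with display text, so it parses directly into a lookup table. A sketch over a small valid excerpt (the full file continues past what is shown):

```python
import json

# a short, valid excerpt of the tag list; the real file has many more entries
TAGS_JSON = (
    '[{"tag_id":"airtravel","tag_text":"air travel"},'
    '{"tag_id":"banking","tag_text":"banking"}]'
)

# map tag_id -> human-readable tag_text
tags = {t["tag_id"]: t["tag_text"] for t in json.loads(TAGS_JSON)}
```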