OlegGorj/Reading and Writing Event Streams to S3.md

Created December 18, 2018 16:59

Request Rate and Performance Considerations
AWS S3 Developer Guide (API Version 2006-03-01)
How do I ingest a large number of small files from S3? My job looks like it's stalling.
Databricks Cloud support forum thread
What is the best way to ingest and analyze a large S3 dataset?
Databricks Cloud support forum thread
How can we get S3DistCp running on DBC?
Databricks Cloud support forum thread
How do I improve throughput of S3 writes in a Spark Streaming scenario?
Databricks Cloud support forum thread
Stall on loading many Parquet files on S3
Databricks Cloud support forum thread
Strategies for reading large numbers of files
Apache Spark Users Mailing List
Dealing with Hadoop's small files problem
Snowplow Blog Post
s3-streamlogger
npm package
Maximizing Amazon S3 Performance
Slide deck from AWS re:Invent 2013 (STG304)
The Bleeding Edge: Spark, Parquet and S3
AppsFlyer tech blog post by Arnon Rotem-Gal-Oz
Using S3DistCP to Merge Many Small S3 Files
Hadoop and S3: 6 Tips for Top Performance
Mortar Data blog post
s4cmd
Super S3 command line tool (Python)
The Open Guide to Amazon Web Services: S3
AWS EMR S3DistCp
AWS documentation
fetch_and_combine.py
Sample Python script to aggregate CloudFront logs on S3
S3mper: Consistency in the Cloud
Netflix tech blog
AWS EMRFS
AWS documentation
Are We Consistent Yet?