- Request Rate and Performance Considerations
  AWS S3 Developer Guide (API Version 2006-03-01)
- How do I ingest a large number of small files from S3? My job looks like it's stalling.
  Databricks Cloud support forum thread
- What is the best way to ingest and analyze a large S3 dataset?
  Databricks Cloud support forum thread
- How can we get S3DistCp running on DBC?
  Databricks Cloud support forum thread
- How do I improve throughput of S3 writes in a Spark Streaming scenario?
  Databricks Cloud support forum thread
- Stall on loading many Parquet files on S3
  Databricks Cloud support forum thread
- Strategies for reading large numbers of files
  Apache Spark Users Mailing List
- Dealing with Hadoop's small files problem
  Snowplow Blog Post
- s3-streamlogger
  npm package
- Maximizing Amazon S3 Performance
  Slide deck from AWS re:Invent 2013 (STG304)
- The Bleeding Edge: Spark, Parquet and S3
  AppsFlyer tech blog post by Arnon Rotem-Gal-Oz
- Hadoop and S3: 6 Tips for Top Performance
  Mortar Data blog post
- s4cmd
  Super S3 command line tool (Python)
- AWS EMR S3DistCp
  AWS documentation
- fetch_and_combine.py
  Sample Python script to aggregate CloudFront logs on S3
- S3mper: Consistency in the Cloud
  Netflix tech blog
- AWS EMRFS
  AWS documentation
Created December 18, 2018 16:59