yarn application -kill application_1428487296152_25597
https://stackoverflow.com/questions/29565716/spark-kill-running-application
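If you don't have the application ID handy, the YARN client can list the running applications first (the ID above is just an example; <application_id> below is a placeholder):

# list running YARN applications to find the ID of your Spark job
yarn application -list
# then kill it
yarn application -kill <application_id>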
// ...
def endDate = new Date().clearTime() // today
def startDate = endDate - 30         // 30 days ago
def newDateParsed
startDate.upto(endDate) {
    newDateParsed = it.format("yyyy-MM-dd")
    println(newDateParsed)
}
docker run -u 0 -it myImage:tag bash
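-u 0 starts the container as root (uid 0). If the container is already running, the same flag works with docker exec to open a root shell (mycontainer below is a placeholder name):

# root shell in an already-running container; mycontainer is a placeholder
docker exec -u 0 -it mycontainer bash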
IMPORT INTO mytable
FROM LOCAL SECURE CSV
FILE '/file.csv'
(1, 2 FORMAT = 'YYYY-MM-DD', 3..12)
ENCODING = 'ASCII'
ROW SEPARATOR = 'LF'
SKIP = 1;
COMMIT;
SELECT *
FROM customer
WHERE country = '${VAR_COUNTRY}'
$ split -C 15M --numeric-suffixes --suffix-length=4 input_filename output_filename_without_suffix
creates files like output_filename_without_suffix0000 output_filename_without_suffix0001 output_filename_without_suffix0002 ... each of at most 15 megabytes (-C splits on line boundaries, so individual lines are kept intact).
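To rebuild the original file, concatenating the parts in suffix order is enough; with fixed-length numeric suffixes the shell glob already sorts them correctly (restored_filename below is just an illustrative name):

# reassemble the chunks; restored_filename is a placeholder
cat output_filename_without_suffix* > restored_filename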
s4cmd is the fastest way I've found (a command-line utility written in Python):
pip install s4cmd
Now to calculate the entire bucket size using multiple threads:
s4cmd du -r s3://bucket-name
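s4cmd takes the same s3:// path syntax as the AWS CLI, so if you only care about part of the bucket you can point the same command at a single prefix (my-prefix below is only illustrative):

# size of one prefix only; my-prefix is a placeholder
s4cmd du -r s3://bucket-name/my-prefix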
#!/bin/bash
iterations=100000
fileName=myFile.xml
# header
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<file>
<veryBig>
<wantedTag>Hello World</wantedTag>
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Written on 2012-12-12 by Philipp Klaus <philipp.l.klaus →AT→ web.de>.
Check <https://gist.github.com/4271012> for newer versions.
Also check <https://gist.github.com/3155743> for a tool to
rename JPEGs according to their EXIF shot time.
"""