yarn application -kill application_1428487296152_25597
https://stackoverflow.com/questions/29565716/spark-kill-running-application
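To find the application ID before killing it, the running applications can be listed first (a quick sketch assuming the YARN CLI is on the PATH; the grep filter and job name are only illustrative):

# list running applications and pick out the ID of the one to kill
yarn application -list -appStates RUNNING | grep mySparkJobName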
// ...
def endDate = new Date().clearTime() // today
def startDate = endDate - 30
def newDateParsed
startDate.upto(endDate) {
    newDateParsed = it.format("yyyy-MM-dd")
    println(newDateParsed)
}
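The same "last 30 days" loop can be sketched in plain bash, assuming GNU date (the -d option below is not available in BSD/macOS date):

# print the last 30 days up to today, one ISO date per line (GNU date)
for i in $(seq 30 -1 0); do
    date -d "-$i days" +%Y-%m-%d
done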
docker run -u 0 -it myImage:tag bash
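The -u 0 flag starts the shell as root (UID 0) inside the container; a quick way to confirm that, reusing the same hypothetical image name:

# should report uid=0(root)
docker run -u 0 -it myImage:tag bash -c 'id'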
IMPORT INTO mytable
FROM LOCAL SECURE CSV
FILE '/file.csv'
(1, 2 FORMAT = 'YYYY-MM-DD', 3..12)
ENCODING = 'ASCII'
ROW SEPARATOR = 'LF'
SKIP = 1;
COMMIT;
SELECT * | |
FROM customer | |
WHERE country = '${VAR_COUNTRY}'
$ split -C 15M --numeric-suffixes --suffix-length=4 input_filename output_filename_without_suffix
This creates files like output_filename_without_suffix0000, output_filename_without_suffix0001, output_filename_without_suffix0002, ... each at most 15 megabytes; -C splits at line boundaries, so no line is ever cut across two files.
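To put the pieces back together later, concatenating them in suffix order is enough (a sketch reusing the same hypothetical file names; fixed-length numeric suffixes sort correctly with a plain glob):

cat output_filename_without_suffix* > reassembled_filename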
s4cmd is the fastest way I've found (a command-line utility written in Python):
pip install s4cmd
Now to calculate the entire bucket size using multiple threads:
s4cmd du -r s3://bucket-name
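The same command can be pointed at a sub-path if only part of the bucket is of interest (the prefix name is just an example):

s4cmd du -r s3://bucket-name/some/prefix/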
#!/bin/bash
iterations=100000
fileName=myFile.xml
# header
echo '<?xml version="1.0" encoding="UTF-8"?>
<file>
<veryBig>
<wantedTag>Hello World</wantedTag>' > "$fileName"
# body: loop to repeat the tag many times (reconstructed; the original snippet was cut off here)
for ((i = 1; i < iterations; i++)); do
    echo '<wantedTag>Hello World</wantedTag>' >> "$fileName"
done
# footer
echo '</veryBig>
</file>' >> "$fileName"
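Once the big file exists, the wanted tag can be pulled back out without reading the whole document into memory, e.g. with a streaming grep (a sketch; the pattern assumes the tag sits on its own line as generated above):

# -m 1 stops at the first match instead of scanning the entire file
grep -m 1 -o '<wantedTag>[^<]*</wantedTag>' myFile.xml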
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Written on 2012-12-12 by Philipp Klaus <philipp.l.klaus →AT→ web.de>. | |
Check <https://gist.github.com/4271012> for newer versions. | |
Also check <https://gist.github.com/3155743> for a tool to | |
rename JPEGs according to their EXIF shot time. | |
""" |