yarn application -kill application_1428487296152_25597
https://stackoverflow.com/questions/29565716/spark-kill-running-application
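If you don't have the application ID handy, the YARN client can list the running applications first (the ID above is just an example; <application_id> below is a placeholder):

# list running YARN applications to find the ID of your Spark job
yarn application -list
# then kill it
yarn application -kill <application_id>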
// ...
def endDate = new Date().clearTime() // today
def startDate = endDate - 30         // 30 days ago
def newDateParsed
startDate.upto(endDate) {
    newDateParsed = it.format("yyyy-MM-dd")
    println(newDateParsed)
}
docker run -u 0 -it myImage:tag bash
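-u 0 starts the container as root (uid 0). If the container is already running, the same flag works with docker exec to open a root shell (mycontainer below is a placeholder name):

# root shell in an already-running container; mycontainer is a placeholder
docker exec -u 0 -it mycontainer bash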
IMPORT INTO mytable
FROM LOCAL SECURE CSV
FILE '/file.csv'
(1, 2 FORMAT = 'YYYY-MM-DD', 3..12)
ENCODING = 'ASCII'
ROW SEPARATOR = 'LF'
SKIP = 1;
COMMIT;
SELECT *
FROM customer
WHERE country = '${VAR_COUNTRY}'
$ split -C 15M --numeric-suffixes --suffix-length=4 input_filename output_filename_without_suffix
creates files like output_filename_without_suffix0000 output_filename_without_suffix0001 output_filename_without_suffix0002 ... each of at most 15 megabytes (-C splits on line boundaries, so individual lines are kept intact).
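To rebuild the original file, concatenating the parts in suffix order is enough; with fixed-length numeric suffixes the shell glob already sorts them correctly (restored_filename below is just an illustrative name):

# reassemble the chunks; restored_filename is a placeholder
cat output_filename_without_suffix* > restored_filename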
s4cmd is the fastest way I've found (a command-line utility written in Python):
pip install s4cmd
Now to calculate the entire bucket size using multiple threads:
s4cmd du -r s3://bucket-name
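s4cmd takes the same s3:// path syntax as the AWS CLI, so if you only care about part of the bucket you can point the same command at a single prefix (my-prefix below is only illustrative):

# size of one prefix only; my-prefix is a placeholder
s4cmd du -r s3://bucket-name/my-prefix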
#!/bin/bash
iterations=100000
fileName=myFile.xml
# header
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<file>
<veryBig>
<wantedTag>Hello World</wantedTag>
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
"""
Written on 2012-12-12 by Philipp Klaus <philipp.l.klaus →AT→ web.de>.
Check <https://gist.github.com/4271012> for newer versions.
Also check <https://gist.github.com/3155743> for a tool to
rename JPEGs according to their EXIF shot time.
"""