Tested with Apache Spark 2.1.0, Python 2.7.13, and Java 1.8.0_112.
For older versions of Spark and IPython, see the previous version of this text.
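To confirm the versions on your own setup, a quick sanity check from Python might look like this (a sketch, assuming pyspark is importable, e.g. via a configured SPARK_HOME):

import sys
from pyspark import SparkContext

sc = SparkContext(appName="version-check")
print(sc.version)   # 2.1.0 on the setup described above
print(sys.version)  # 2.7.13 here
sc.stop()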
#!/bin/sh
#############################################
# Output file for HTML5 video               #
# Requirements:                             #
#   - handbrakecli                          #
#                                           #
# usage:                                    #
#   ./html5VideoHandBrakeFolder.sh folder   #
#                                           #
#############################################
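A minimal body for this script might look like the following sketch; the preset name, extension handling, and output location are assumptions, not part of the original:

FOLDER="$1"

for f in "$FOLDER"/*; do
  # Skip anything that is not a regular file.
  [ -f "$f" ] || continue
  # Re-encode to an MP4 that the HTML5 <video> tag can play;
  # "Normal" is a stock HandBrake preset, adjust as needed.
  HandBrakeCLI -i "$f" -o "${f%.*}.mp4" --preset="Normal"
done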
[
    {
        "keys": ["super+b"],
        "command": "build",
        "context": [
            { "key": "selector", "operator": "equal", "operand": "source.c++" }
        ],
        "args": {
            "build_system": "Packages/C++/C++.sublime-build",
            "variant": "Build"
        }
    }
]
Over the last few years I've been quite involved with using Hive for big data analysis.
I've read many web tutorials and blogs about using Hadoop/Hive/Pig for data analysis, but all of them seem to be oversimplified and targeted at a "my first Hive query" kind of audience, instead of showing how to structure Hive tables and queries for real-world use cases, e.g. years of data, recurring batch jobs that build aggregate/reporting tables, and dealing with late-arriving data.
Most of these tutorials look something like this:
Twitter data -> HDFS/external Hive table -> Hive query -> results.
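In code, that canonical tutorial pipeline boils down to something like the sketch below (table and column names are invented for illustration):

-- External table over raw tweets already sitting in HDFS
CREATE EXTERNAL TABLE tweets (
  user_name  STRING,
  tweet_text STRING,
  created_at STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/tweets';

-- "My first Hive query": a one-off aggregate straight to the console
SELECT user_name, count(*) AS tweet_count
FROM tweets
GROUP BY user_name
ORDER BY tweet_count DESC
LIMIT 10;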
# Send
openssl aes-256-cbc -salt -a -e -in /path/to/file | nc -l 3333
# Receive
nc {ip} 3333 | openssl aes-256-cbc -salt -a -d -out /path/to/file
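Both ends prompt for the passphrase interactively; for scripting, openssl's -pass option can read the secret from a file instead (the file name here is an assumption):

# Send, reading the passphrase from a shared secret file
openssl aes-256-cbc -salt -a -e -pass file:secret.txt -in /path/to/file | nc -l 3333
# Receive, using the same secret file
nc {ip} 3333 | openssl aes-256-cbc -salt -a -d -pass file:secret.txt -out /path/to/file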
client
dev tun
remote example.com
resolv-retry infinite
nobind
persist-key
persist-tun
ca [inline]
cert [inline]
key [inline]
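The [inline] directives tell OpenVPN to read the corresponding PEM material from tag-delimited blocks later in the same file, along these lines (certificate bodies elided):

<ca>
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
</ca>
<cert>
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
</cert>
<key>
-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
</key>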
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http.request import Request
from scrapy.selector import Selector
import urllib2
import re
import PyV8
import json
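Taken together, the imports suggest a spider that digs a JavaScript data structure out of a page and evaluates it with PyV8; a minimal sketch under that assumption (spider name, URL, and regex are invented for illustration):

class JsDataSpider(scrapy.Spider):
    name = "js_data"
    start_urls = ["http://example.com/page"]

    def parse(self, response):
        # Grab the inline <script> text that defines the data variable.
        script = Selector(response).xpath("//script/text()").extract_first()
        match = re.search(r"var\s+data\s*=\s*(.+?);", script or "", re.S)
        if match:
            # Evaluate the JS expression in a V8 context, then round-trip
            # it through JSON to get plain Python objects.
            ctxt = PyV8.JSContext()
            ctxt.enter()
            data = json.loads(ctxt.eval("JSON.stringify(%s)" % match.group(1)))
            ctxt.leave()
            yield data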
<property>
  <name>hive.vectorized.groupby.flush.percent</name>
  <value>0.1</value>
</property>
<property>
  <name>hive.vectorized.groupby.maxentries</name>
  <value>10240</value>
</property>
<property>
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <!-- Value truncated in the original; 300 is the stock Tez default. -->
  <value>300</value>
</property>
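For experimenting, the Hive settings can also be changed per session before being baked into hive-site.xml (the Tez timeout is a cluster-side setting and stays in the XML):

SET hive.vectorized.groupby.flush.percent=0.1;
SET hive.vectorized.groupby.maxentries=10240;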
-- Putting percentile_approx() directly in the select list alongside
-- non-aggregated columns makes Hive expect a GROUP BY on account_number
-- and sales, and raises a missing-GROUP-BY error; cross joining the
-- aggregate in as a one-row subquery avoids that.
select
  account_number,
  sales,
  CASE WHEN sales > a.sales_90th_percentile THEN 1 ELSE 0 END as top10pct_sales
from sales
cross join (select percentile_approx(sales, .9) as sales_90th_percentile from sales) a;
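On Hive 0.11 and later, a roughly equivalent flag can be computed without the cross join by using a window function (a sketch, not from the original; note percent_rank() ranks exact values rather than using the approximate percentile):

select
  account_number,
  sales,
  CASE WHEN pr >= 0.9 THEN 1 ELSE 0 END as top10pct_sales
from (
  select account_number, sales,
         percent_rank() over (order by sales) as pr
  from sales
) t;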
The regex patterns in this gist are intended only to match web URLs -- http,
https, and naked domains like "example.com". For a pattern that attempts to
match all URLs, regardless of protocol, see: https://gist.github.com/gruber/249502
# Single-line version:
(?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|s
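To actually use the single-line version from Python, something like the following works (a sketch; the pattern above is truncated, so paste the full single-line regex into URL_PATTERN):

import re

URL_PATTERN = r"..."  # paste the full single-line pattern here

# (?i) inside the pattern already makes the match case-insensitive.
url_re = re.compile(URL_PATTERN)

text = "See http://example.com/docs and www.example.org for details."
for match in url_re.finditer(text):
    print match.group(0)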