- Don’t use SELECT *; specify explicit column names (Presto is a columnar store).
- Avoid large JOINs; filter each table first.
- In Presto, tables are joined in the order they are listed!
- Join small tables earlier in the plan and leave larger fact tables to the end.
- Avoid cross joins or one-to-many joins, as these can degrade performance.
- ORDER BY and GROUP BY take time; only use ORDER BY in subqueries if it is really necessary.
- When using GROUP BY, order the columns from highest cardinality (that is, the most unique values) to lowest (see the example query below).
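As a rough illustration, here is a minimal sketch of a query shaped along these lines, run through the `presto-python-client` (`prestodb`) package. The table names (`events`, `users`), columns, and connection settings are made-up examples, not something from the original notes.

```python
import prestodb  # presto-python-client; assumed to be installed

# Hypothetical tables: a large `events` fact table and a small `users` dimension table.
QUERY = """
SELECT u.country,                     -- explicit columns instead of SELECT *
       e.event_type,
       count(*) AS n_events
FROM (                                -- small dimension table first, pre-filtered
    SELECT id, country FROM users WHERE is_active = true
) AS u
JOIN (                                -- large fact table last, also pre-filtered
    SELECT user_id, event_type FROM events WHERE ds >= '2018-01-01'
) AS e
  ON e.user_id = u.id
GROUP BY u.country, e.event_type      -- roughly highest- to lowest-cardinality
"""

conn = prestodb.dbapi.connect(
    host="presto.example.com", port=8080,
    user="analyst", catalog="hive", schema="default",
)
cursor = conn.cursor()
cursor.execute(QUERY)
print(cursor.fetchall())
```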
How to search in all countries *but* the US (or any other, for that matter)?
LinkedIn country codes: https://developer.linkedin.com/docs/reference/country-codes#
LinkedIn faceted-search URL format: %5B"ca%3A0"%2C"au%3A0"%2C"es%3A0"%5D
Decoded URL: ["ca:0","au:0","es:0"]
=> Complete list for injection into the URL (remove the country you want to exclude); a small helper that builds the encoded parameter is sketched after the list:
["ae:0","ar:0","at:0","au:0","be:0","br:0","ca:0","ch:0","cl:0","cn:0","co:0","cz:0","de:0","dk:0","es:0","fi:0","fr:0","fx:0","gb:0","gr:0","hk:0","hr:0","hu:0","id:0","ie:0","il:0","in:0","is:0","it:0","jp:0","lb:0","lu:0","lv:0","ma:0","mc:0","mx:0","my:0","nl:0","no:0","nz:0","oo:0","pe:0","ph:0","pk:0","pl:0","pr:0","pt:0","py:0","qa:0","ro:0","ru:0","sa:0","se:0","sg:0","sk:0","th:0","tr:0","tw:0","ua:0","us:0","uy:0","ve:0","vn:0","yu:0","za:0"]
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.chrome.options import Options
import zipfile, os

# Builds a small Chrome extension on the fly so Chrome can use a proxy that requires
# username/password authentication (a plain --proxy-server switch cannot pass credentials).
def proxy_chrome(PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS):
    manifest_json = """
    {
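For the simpler case where the proxy does not require credentials, the extension trick is unnecessary and a single Chrome switch is enough. A minimal sketch, with a placeholder proxy address:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

PROXY = "203.0.113.10:3128"  # placeholder host:port

options = Options()
options.add_argument(f"--proxy-server=http://{PROXY}")

driver = webdriver.Chrome(options=options)
driver.get("https://httpbin.org/ip")  # the reported IP should be the proxy's
print(driver.page_source)
driver.quit()
```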
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''Read and write gzip files between a Python application and S3 directly, for Python 3.
Python 2 version - https://gist.github.com/a-hisame/f90815f4fae695ad3f16cb48a81ec06e
'''
import io
import gzip
import json
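The general pattern the gist describes can be sketched roughly as below, using `boto3` with an in-memory buffer so nothing touches the local disk. The bucket and key names are placeholders.

```python
import io
import gzip
import json

import boto3  # assumed; not shown in the snippet above

s3 = boto3.client("s3")
BUCKET, KEY = "my-bucket", "data/sample.json.gz"  # placeholder names

def upload_gzipped_json(obj, bucket=BUCKET, key=KEY):
    """Serialize `obj` to JSON, gzip it in memory, and upload it to S3."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(json.dumps(obj).encode("utf-8"))
    s3.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())

def download_gzipped_json(bucket=BUCKET, key=KEY):
    """Download a gzipped JSON object from S3 and decode it in memory."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    with gzip.GzipFile(fileobj=io.BytesIO(body), mode="rb") as gz:
        return json.loads(gz.read().decode("utf-8"))

# upload_gzipped_json({"hello": "world"})
# print(download_gzipped_json())
```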
I had a really interesting journey today with a thorny little challenge: deleting all the files in an S3 bucket with tons of nested files.
The bucket path (s3://buffer-data/emr/logs/) contained log files created by Elastic MapReduce jobs that ran every day over a couple of years (from early 2015 to early 2018).
Each EMR job would run hourly every day, firing up a cluster of machines, and each machine would output its logs. That resulted in thousands of nested paths (one for each job), each containing thousands of other files. I estimated the total number of nested files to be between 5 and 10 million.
I had to estimate this number by looking at sample counts of some of the nested directories, because getting the true count would mean recursing through the whole S3 tree, which was just too slow. This is also exactly why it was challenging to delete all the files.
Deleting all the files in an S3 bucket like this is pretty challenging, since S3 doesn't really work like a true filesystem.
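For context, one common way to tackle a bulk delete like this is to page through the keys under the prefix and delete them 1,000 at a time with `boto3`. This is only a sketch of that general approach, not necessarily what was done here:

```python
import boto3

s3 = boto3.client("s3")
BUCKET, PREFIX = "buffer-data", "emr/logs/"  # taken from the path mentioned above

def delete_prefix(bucket=BUCKET, prefix=PREFIX):
    """List objects under `prefix` page by page and delete them in batches."""
    paginator = s3.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        contents = page.get("Contents", [])
        if not contents:
            continue
        # delete_objects accepts at most 1000 keys per call, which matches the page size.
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in contents]},
        )
        deleted += len(contents)
    return deleted
```

With millions of keys this still takes a while because the listing itself is sequential; an S3 lifecycle expiration rule on the prefix is the usual hands-off alternative.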
jmails.info
sacustomerdelight.co.in
extrobuzzapp.com
ixigo.info
offer4uhub.com
netecart.com
101coupon.in
freedealcode.in
bankmarket.in
hotoffers.co.in
service: service-name
provider:
  name: aws
  runtime: nodejs6.10
functions:
  myfunc:
    handler: handler.myfunc
/*
// AdWords Script: Put Data From AdWords Report In Google Sheets
// --------------------------------------------------------------
// Copyright 2017 Optmyzr Inc., All Rights Reserved
//
// This script takes a Google spreadsheet as input. Based on the column headers, data filters, and date range specified
// on this sheet, it will generate different reports.
//
// The goal is to let users create custom automatic reports with AdWords data that they can then include in an automated reporting
// tool like the one offered by Optmyzr.