- Second price auctions (2PA) are a type of auction where the highest bidder pays the second highest bid
- This contrasts with first price auctions (FPA), where the highest bidder pays her own bid
- Why 2PA work better than FPA
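The two payment rules above can be sketched in a few lines. This is a minimal illustration, not taken from any auction library: the function name `run_auction`, the `rule` parameter, and the sample bids are all made up for the example.

```python
def run_auction(bids, rule="second"):
    """Return (winner, price) for a sealed-bid auction.

    bids: dict mapping bidder name -> bid amount (at least two bidders).
    rule: "first"  -> the winner pays her own bid (FPA)
          "second" -> the winner pays the second-highest bid (2PA)
    """
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top_bid), (_, runner_up_bid) = ranked[0], ranked[1]
    price = top_bid if rule == "first" else runner_up_bid
    return winner, price


# Hypothetical bids, for illustration only.
bids = {"alice": 10, "bob": 7, "carol": 4}
fpa_result = run_auction(bids, "first")
spa_result = run_auction(bids, "second")
print(fpa_result)  # ('alice', 10)
print(spa_result)  # ('alice', 7)
```

The winner is the same under both rules; only the price changes.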
Hive and Postgres handle WHERE vs ON clauses differently. Postgres' query engine is smarter: a filter in the WHERE clause and the same filter in the ON clause of a join are handled the same. In Hive, putting the filter in the WHERE clause is more efficient than putting it in the ON clause.
ON clause: in stage 1, pulls in ~400MM records; takes ~13 minutes to execute
WHERE clause: in stage 1, pulls in ~60MM records; takes ~5 minutes to execute
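To make the two query shapes concrete, here is a small sketch that runs both against an in-memory SQLite database. SQLite is only a stand-in so the example is runnable; it says nothing about Hive or Postgres performance. The `orders` and `users` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, user_id INTEGER, status TEXT);
    INSERT INTO users  VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO orders VALUES (10, 1, 'paid'), (11, 1, 'open'), (12, 2, 'paid');
""")

# Shape 1: the filter lives in the ON clause of the join.
filter_in_on = conn.execute("""
    SELECT o.id, u.name
    FROM orders o
    JOIN users u ON u.id = o.user_id AND o.status = 'paid'
    ORDER BY o.id
""").fetchall()

# Shape 2: the same filter lives in the WHERE clause.
filter_in_where = conn.execute("""
    SELECT o.id, u.name
    FROM orders o
    JOIN users u ON u.id = o.user_id
    WHERE o.status = 'paid'
    ORDER BY o.id
""").fetchall()

# For an inner join both shapes return the same rows.
print(filter_in_on == filter_in_where)  # True
```

This equivalence holds for inner joins only; with an outer join, moving a filter between ON and WHERE changes the result, not just the execution plan.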
Postgres uses an MVCC (multiversion concurrency control) model (as opposed to table locking)
So: when you run an update, Postgres writes a new version of each updated row and keeps the old version around until it is vacuumed. Updating every row can therefore essentially double the size of the table.
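A toy model of why a full-table update inflates the table under MVCC. This is only a sketch of the idea (old row versions stay until cleanup), not how Postgres actually stores tuples; the `ToyTable` class and its methods are invented for illustration.

```python
class ToyTable:
    """Toy MVCC-style table: updates append new row versions."""

    def __init__(self):
        # Each stored version is a dict: {"key", "value", "live"}.
        self.versions = []

    def insert(self, key, value):
        self.versions.append({"key": key, "value": value, "live": True})

    def update_all(self, fn):
        # Mark each live version dead and append a new live version,
        # instead of overwriting the row in place.
        for old in [v for v in self.versions if v["live"]]:
            old["live"] = False
            self.versions.append(
                {"key": old["key"], "value": fn(old["value"]), "live": True}
            )

    def vacuum(self):
        # Drop dead versions, loosely analogous to what VACUUM reclaims.
        self.versions = [v for v in self.versions if v["live"]]


table = ToyTable()
for i in range(3):
    table.insert(i, i * 10)

size_before = len(table.versions)        # 3 stored versions
table.update_all(lambda value: value + 1)
size_after_update = len(table.versions)  # 6: old and new versions coexist
table.vacuum()
size_after_vacuum = len(table.versions)  # 3 again
print(size_before, size_after_update, size_after_vacuum)  # 3 6 3
```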
# Parse JSON data with this one weird trick!
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.sql import Row
# Set up basic spark session
conf = (SparkConf()
        .setAppName('My App')
Functions have arity: the number of arguments they take [source]
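A quick way to inspect arity in Python, as a sketch. The helper name `arity` is made up; it counts declared parameters via the standard library's `inspect.signature`, so a `*args` or `**kwargs` parameter counts as one each.

```python
from inspect import signature


def arity(func):
    """Return the number of parameters declared by func."""
    return len(signature(func).parameters)


def add(a, b):
    return a + b


print(arity(lambda: None))  # 0 (nullary)
print(arity(lambda x: x))   # 1 (unary)
print(arity(add))           # 2 (binary)
```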