- Second-price auctions (2PA) are a type of auction where the highest bidder wins but pays the second-highest bid
- This contrasts with first-price auctions (FPA), where the highest bidder pays her own bid
- Why 2PA work better than FPA
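To make the two pricing rules concrete, here is a minimal sketch of a sealed-bid auction. The function name `run_auction`, the `second_price` flag, and the bidder names are all made up for illustration; ties are broken by insertion order, which is an arbitrary choice.

```python
def run_auction(bids, second_price=True):
    """Return (winner, price) for a sealed-bid auction.

    bids: dict mapping bidder name -> bid amount (at least two bidders).
    second_price: True for 2PA pricing, False for FPA pricing.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    runner_up_bid = ranked[1][1]
    # 2PA: winner pays the runner-up's bid. FPA: winner pays her own bid.
    price = runner_up_bid if second_price else top_bid
    return winner, price


bids = {"alice": 10, "bob": 7, "carol": 4}  # hypothetical bids
print(run_auction(bids, second_price=True))   # ('alice', 7)
print(run_auction(bids, second_price=False))  # ('alice', 10)
```

In both formats the same bidder wins; only the price she pays differs.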
Hive and Postgres handle WHERE vs ON clauses differently. Postgres' query engine is smarter: joins filtered in the WHERE clause and in the ON clause are handled the same way. In Hive, the WHERE clause is more efficient than the ON clause.
On clause: In stage 1, pulls in ~400MM records; takes ~13 minutes to execute
Where clause: In stage 1, pulls in ~60MM records; takes ~5 minutes to execute
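A small sketch of the two query shapes being compared, assuming the note is about where a filter predicate sits in an inner join. It runs on SQLite (not Hive or Postgres) purely so it is self-contained, and the `orders`/`users` tables are hypothetical. It only shows that both forms return the same rows for an inner join; it says nothing about how any engine plans them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, user_id INTEGER, country TEXT);
    CREATE TABLE users (id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 1, 'US'), (2, 2, 'DE'), (3, 1, 'DE');
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben');
""")

# Filter placed in the ON clause.
on_query = """
    SELECT o.id, u.name FROM orders o
    JOIN users u ON o.user_id = u.id AND o.country = 'DE'
    ORDER BY o.id
"""
# Same filter placed in the WHERE clause.
where_query = """
    SELECT o.id, u.name FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE o.country = 'DE'
    ORDER BY o.id
"""

on_rows = conn.execute(on_query).fetchall()
where_rows = conn.execute(where_query).fetchall()
print(on_rows == where_rows)  # True
```

Note the equivalence holds for inner joins only; with outer joins, moving a predicate between ON and WHERE changes the result.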
Postgres uses an MVCC (Multiversion concurrency control) model (as opposed to table locking)
So: when you run an update that touches every row, it can essentially double the size of the table, because each updated row is written as a new row version and the old versions stay on disk until they are vacuumed.
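The effect of an update on table size can be sketched with a toy model. This is not how Postgres stores data; `ToyMvccTable` and its methods are invented here, and the only idea borrowed from MVCC is that an update adds a new row version instead of overwriting the old one.

```python
class ToyMvccTable:
    """Toy append-only table: each stored row version carries a 'dead' flag."""

    def __init__(self, rows):
        self.versions = [{"data": dict(r), "dead": False} for r in rows]

    def update_all(self, **changes):
        # Mark every live version dead and append a new version next to it.
        live = [v for v in self.versions if not v["dead"]]
        for old in live:
            old["dead"] = True
            self.versions.append({"data": {**old["data"], **changes}, "dead": False})

    def vacuum(self):
        # Drop dead versions (a simplification of what a real vacuum does).
        self.versions = [v for v in self.versions if not v["dead"]]

    def stored_versions(self):
        return len(self.versions)


table = ToyMvccTable([{"id": i, "status": "new"} for i in range(1000)])
before = table.stored_versions()
table.update_all(status="done")
after_update = table.stored_versions()
table.vacuum()
after_vacuum = table.stored_versions()
print(before, after_update, after_vacuum)  # 1000 2000 1000
```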
    # Parse JSON data with this one weird trick!
    from pyspark import SparkContext
    from pyspark import SparkConf
    from pyspark.sql import SQLContext
    from pyspark.sql import Row
    # Set up basic spark session
    conf = (SparkConf()
            .setAppName('My App'))
Functions have arity: the number of arguments they take [source]
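As a quick illustration, arity can be inspected at runtime in Python. The helper `arity` below is a made-up name, and it makes one simplifying assumption: variadic parameters (`*args`, `**kwargs`) are not counted.

```python
import inspect


def arity(func):
    """Count the named parameters a callable declares (ignores *args/**kwargs)."""
    counted_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    params = inspect.signature(func).parameters.values()
    return sum(1 for p in params if p.kind in counted_kinds)


def constant():        # nullary: arity 0
    return 42


def negate(x):         # unary: arity 1
    return -x


def add(x, y):         # binary: arity 2
    return x + y


print(arity(constant), arity(negate), arity(add))  # 0 1 2
```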