- herding cats
- is the juice worth the squeeze
- when you've got a hammer, everything looks like a nail
- that's a solution looking for a problem
- works on my machine!
- two birds with one stone
Experiments beyond Java to create pipelines that are semantically more familiar to SQL developers, functional programmers, and others with big data backgrounds.
The dream is that we can build pipelines in less time and make them easier to read, which would bring value faster and lower our maintenance costs in the long run.
The best way to explain this is with an example. We take a simple, made-up model of orders and refunds: an order can have 0 to N refunds, and a customer can have 0 to N orders. We want to total the amount a
package example.scala
import com.spotify.scio._
import com.spotify.scio.extra.json._
case class Orders(orders: List[Order])
case class Order(order_id: String, customer_id: String, order_amt: Long)
case class Refunds(refunds: List[Refund])
case class Refund(refund_order_id: String, original_order_id: String, customer_id: String,
package example.java;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.collect.Iterables;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.DoFn;
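For comparison with the Scala and Java versions, here is a minimal plain-Java sketch of the orders/refunds aggregation with no Beam or Scio involved. The goal sentence above is cut off, so this assumes, hypothetically, that the target is each customer's order total net of refunds. The `refundAmt` field is also hypothetical because the `Refund` case class is truncated; the other fields mirror `Order` and `Refund` as shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Beam) of a per-customer net total.
// ASSUMPTION: "net of refunds per customer" is the intended goal, and
// refundAmt is a hypothetical field; the original Refund class is truncated.
public class NetTotals {
    record Order(String orderId, String customerId, long orderAmt) {}

    // refundAmt is hypothetical, added only for illustration.
    record Refund(String refundOrderId, String originalOrderId, String customerId, long refundAmt) {}

    // Add every order amount to its customer, then subtract every refund amount.
    static Map<String, Long> netByCustomer(List<Order> orders, List<Refund> refunds) {
        Map<String, Long> totals = new HashMap<>();
        for (Order o : orders) {
            totals.merge(o.customerId(), o.orderAmt(), Long::sum);
        }
        for (Refund r : refunds) {
            totals.merge(r.customerId(), -r.refundAmt(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("o1", "c1", 100),
                new Order("o2", "c1", 50),
                new Order("o3", "c2", 70));
        List<Refund> refunds = List.of(new Refund("r1", "o1", "c1", 30));
        System.out.println(netByCustomer(orders, refunds));
    }
}
```

With the sample data, `c1` ends up at 120 (100 + 50 - 30) and `c2` at 70. A Beam pipeline would express the same two sums as keyed combines, for example with the `Combine` transform that the Java imports above pull in.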
Existing problems:
- the way StackGroups are laid out, there is a lot of copy-and-paste repetition between envs and regions. This is cumbersome and error-prone (let's get DRY).
- certain infrastructure changes need to go out with every deployment because they change rapidly or are dependencies of code changes. These often get missed right now, causing churn.
- we need our CI servers to automate all provisioning because the prod creds aren't widely known (in prod you can't just run sceptre from your machine; it has to be initiated from CI builds)
account_provisioning/sceptre/templates
# install the CLI
https://docs.aws.amazon.com/cli/latest/userguide/installing.html
## for Windows
https://docs.aws.amazon.com/cli/latest/userguide/awscli-install-windows.html
# configure your keys
You should have received an access key and a secret key
spark-submit \
  --class com.mycompany.Job \
  --deploy-mode cluster \
  --master yarn \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --driver-memory 4g \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 5 \
  s3://mycompany/artifact.jar
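The flags above translate into YARN container requests. A rough sketch of that arithmetic, assuming Spark's documented default memory overhead of max(384 MiB, 10% of the requested heap) for both the driver and the executors, and ignoring any rounding YARN applies (for example up to `yarn.scheduler.minimum-allocation-mb`):

```java
// Sketch: YARN container sizes implied by the spark-submit flags.
// ASSUMPTION: Spark's default overhead rule, max(384 MiB, 10% of heap),
// applies to driver and executors; YARN's own rounding is ignored.
public class SparkYarnSizing {
    static final long MIN_OVERHEAD_MIB = 384;
    static final double OVERHEAD_FACTOR = 0.10;

    // Off-heap overhead added on top of the heap requested with --*-memory.
    static long overheadMib(long heapMib) {
        return Math.max(MIN_OVERHEAD_MIB, (long) (heapMib * OVERHEAD_FACTOR));
    }

    // Total memory requested from YARN per container (heap + overhead).
    static long containerMib(long heapMib) {
        return heapMib + overheadMib(heapMib);
    }

    public static void main(String[] args) {
        int numExecutors = 4;        // --num-executors 4
        int executorCores = 5;       // --executor-cores 5
        long executorHeapMib = 2048; // --executor-memory 2g
        long driverHeapMib = 4096;   // --driver-memory 4g

        long executorContainer = containerMib(executorHeapMib);
        long driverContainer = containerMib(driverHeapMib);

        System.out.println("per-executor container: " + executorContainer + " MiB");
        System.out.println("all executors: " + numExecutors * executorContainer + " MiB, "
                + numExecutors * executorCores + " cores");
        System.out.println("driver container: " + driverContainer + " MiB");
    }
}
```

Under those assumptions this prints 2432 MiB per executor, 9728 MiB and 20 cores across the four executors, and 4505 MiB for the driver, which runs inside the YARN ApplicationMaster because of `--deploy-mode cluster`. Setting `spark.executor.memoryOverhead` or `spark.driver.memoryOverhead` explicitly would change these numbers.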
com.trax.platform.fps.auditreview:audit-review-recon:jar:1.0-SNAPSHOT
+- com.trax.platform:trax-platform-utils:jar:1.3.40:compile
|  +- com.fasterxml.jackson.core:jackson-core:jar:2.4.4:compile
|  +- com.fasterxml.jackson.core:jackson-databind:jar:2.4.4:compile
|  |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.4.0:compile
|  +- com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.4.4:compile
|  |  +- com.thoughtworks.paranamer:paranamer:jar:2.6:compile
|  |  \- com.google.code.findbugs:jsr305:jar:2.0.1:compile
|  +- nl.grons:metrics-scala_2.11:jar:3.5.1:compile
|  |  \- io.dropwizard.metrics:metrics-healthchecks:jar:3.1.2:compile
December 2017 Freight Bills with changes
Time to extract 16183567 bills from TraxDW - 22 minutes
found and grouped 16183567 records in 12 ms
+---------+----------------+
|OWNER_KEY|count(OWNER_KEY)|
+---------+----------------+
| 000-1001|               1|
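The `count(OWNER_KEY)` column suggests a per-key grouping count. A minimal in-memory Java sketch of the same aggregation, using hypothetical sample keys (not TraxDW data) and a hypothetical table name `bills` in the comment:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: in-memory equivalent of
//   SELECT OWNER_KEY, count(OWNER_KEY) FROM bills GROUP BY OWNER_KEY
// "bills" and the sample keys are hypothetical.
public class OwnerKeyCounts {
    static Map<String, Long> countByOwnerKey(List<String> ownerKeys) {
        return ownerKeys.stream()
                // groupingBy rejects null keys, so drop them first
                // (SQL count(col) does not count nulls either)
                .filter(Objects::nonNull)
                // TreeMap keeps the keys sorted, like an ORDER BY OWNER_KEY
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> keys = List.of("000-1001", "000-1002", "000-1002");
        System.out.println(countByOwnerKey(keys)); // {000-1001=1, 000-1002=2}
    }
}
```

This only illustrates the grouping logic; it says nothing about how the 16183567 records above were actually processed.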
select count(*) as payr_dtl from dw.paYR_DTL; -- 171,280,225
select count(*) from dw.remitDTL; -- 392,726,040
select count(*) from dw.frght_bl; -- 408,302,966
select count(*) from dw.exceptions; -- 201,285,497
select count(*) from dw.invoice; -- 598,03,644
select count(*) from dw.fb_ln; -- 1,226,455,963
select count(*) from dw.frghtBlMaster; -- 686,564,991
select count(*) from dw.ca_ELEM; -- 554,904,231
select count(*) from dw.veNDOR_REMIT; -- 9697
select count(*) from dw.veNDOR; -- 8379