- Best practice: write implicit conversions only to types that you own
- Implicits can always be provided explicitly
Coursera courses
- https://www.coursera.org/learn/progfun1/home/welcome
  Cheatsheet for this course: https://github.com/lampepfl/progfun-wiki/blob/gh-pages/CheatSheet.md
- https://www.coursera.org/learn/progfun2/home/welcome
  Cheatsheet for this course: https://github.com/sjuvekar/reactive-programming-scala/blob/master/ReactiveCheatSheet.md
1) Look at the thumbs of your raised hands at your sides ... 10 times
2) Roll your eyes and blink - 5 times clockwise and anticlockwise
3) Write your name with your eyes
4) Ciliary muscle exercise - switch focus between a close object and a distant object
5) Open (inhale) and close (exhale) your eyes - 5 times
6) Massage your eyes
7) Rub your hands together and place them over your eyes
Dynamo DB notes
Dynamo uses a synthesis of well-known techniques to achieve scalability and availability:
a) Data is partitioned and replicated using consistent hashing [10].
b) Consistency is facilitated by object versioning [12].
c) Consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol.
d) Dynamo employs a gossip-based distributed failure detection and membership protocol.
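As an illustration of item b), object versioning in the Dynamo paper is done with vector clocks. The following is only a minimal sketch of that idea, not Dynamo's actual code: the function names and the node names (node_x, node_y, node_z) are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if version `a` has seen every update recorded in version `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions as equal, ordered, or concurrent (conflicting)."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "conflict"


# Hypothetical write history: two writes handled by node_x, then two
# concurrent writes handled by node_y and node_z on top of the same version.
v1 = increment({}, "node_x")
v2 = increment(v1, "node_x")
v3 = increment(v2, "node_y")
v4 = increment(v2, "node_z")

print(compare(v3, v2))  # v3 supersedes v2
print(compare(v3, v4))  # neither supersedes the other, so the client must reconcile
```

When two versions are concurrent, neither can be discarded automatically, which is why conflicting versions are handed back to the reader for reconciliation.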
Dynamo is a completely decentralized system with minimal need for manual administration. Storage nodes can be added and removed from Dynamo without requiring any manual partitioning or redistribution.
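To see why nodes can join or leave without manual repartitioning, here is a rough sketch of a consistent-hashing ring with virtual nodes. It is an illustration under assumptions, not Dynamo's implementation: the HashRing class, the MD5 hash, the 64 virtual nodes per node, and the node names are all chosen only for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hashing ring; each node owns several points on the ring."""

    def __init__(self, vnodes=64):
        self.vnodes = vnodes   # virtual nodes per physical node (assumed value)
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Walk clockwise from the key's hash to the first point on the ring."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing()
for name in ("A", "B", "C"):
    ring.add_node(name)

keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("D")
after = {k: ring.node_for(k) for k in keys}

# Only keys that now belong to the new node change owner.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node D")
```

Adding a node only takes over the ring segments that precede its own points, so every other key keeps its previous owner.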
-------------------------------------------------------------------------
Training by Sameer Farooqui (https://www.youtube.com/watch?v=7ooZ4S7Ay6Y)
-------------------------------------------------------------------------
Schedulers
- YARN/Mesos - you get dynamic partitioning (scaling)
- Local/Standalone - you get static partitioning (work is being done to support dynamic partitioning in at least standalone mode)
Hadoop MR vs Spark
- Spark is essentially a replacement for MR, not for HDFS or YARN.
//Install on Ubuntu 14.04
https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
cd ~/kafka
//Start the server
nohup bin/kafka-server-start.sh config/server.properties > kafka.log 2>&1 &
//Write to a topic named TutorialTopic
echo "Hello, World" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null
- Searching an unordered list of n items in O(sqrt(n)) time: Grover's algorithm
- Cryptography: Shor's algorithm (integer factorization)
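The square-root behaviour of Grover's algorithm can be illustrated with a small classical simulation of the amplitudes. This is only a sketch: it tracks the state vector in an ordinary list, so it does not run faster than a linear scan, and the function name grover_search and the choice of 8 items are assumptions made for the example.

```python
import math


def grover_search(n_items, marked):
    """Simulate Grover iterations; return (iteration count, probabilities)."""
    # Start in the uniform superposition: every item is equally likely.
    amplitudes = [1 / math.sqrt(n_items)] * n_items
    # About (pi / 4) * sqrt(n) iterations bring the marked item close to certainty.
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amplitudes[marked] = -amplitudes[marked]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amplitudes) / n_items
        amplitudes = [2 * mean - a for a in amplitudes]
    probabilities = [a * a for a in amplitudes]
    return iterations, probabilities


iterations, probs = grover_search(8, marked=5)
print(f"{iterations} iterations, P(marked) = {probs[5]:.3f}")
```

For 8 items, two iterations are enough to make the marked item by far the most likely measurement outcome, whereas a classical scan needs about n/2 lookups on average.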
This gist summarizes the book "Data Lake Development with Big Data" by Pradeep Pasupuleti
Technology has evolved at a rapid pace over the last three decades: from a time when computers were accessible to a few to a world where they
are everywhere and connected, with the Internet moving from rarity to ubiquity. This book by Kevin Kelly describes a dozen inevitable
technological forces that have governed these changes and will continue to shape the next 30 years. He captures them as 12
verbs, such as accessing, tracking, and sharing. To be more accurate, these are not just verbs but present participles, the grammatical
form that conveys continuous action. These forces are accelerating actions; essentially, they are being amplified as we change
as a society. The forces are Becoming, Cognifying, Flowing, Screening, Accessing, Sharing, Filtering, Remixing, Interacting, Tracking,
Questioning, and then Beginning.
Before moving to the actual forces, a note about change itself is in order.
---------------------------------------------------------- |
Properties of a Data Platform:
- Data should be consolidated (different sources brought together)
- It should be fast and efficient
- It should be approachable (discoverable, explorable, self-serve, viewable)
- It should be secure (governance, ACLs, provenance)
- It should support ML on top of the data, using Spark
- Last and most important, it should be relevant and driven by business needs
These are based on the following document:
https://drive.google.com/open?id=0B8eAsKPWNEi6M3d5Mm1qaFNPY3c |
Relational databases have been a successful technology for twenty years, providing
- persistence
- concurrency control (multiple apps and multiple users access the DB at the same time)
- an integration mechanism (this is what prevented object-oriented DBs from flourishing)
Drawbacks of Relational DBs
- Impedance mismatch (the in-memory (object) model of an application differs from the (relational) model on disk).
  That is why ORM frameworks exist, and they can cost performance
- They are not designed to run efficiently on clusters |
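The impedance mismatch listed above can be made concrete with a small sketch using Python's built-in sqlite3 module. The Order and LineItem classes, the table names, and the sample data are hypothetical; the point is only that one in-memory object has to be split across two tables on write and stitched back together on read.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)


def save_order(conn, order):
    """Split one nested object into rows of two flat tables."""
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order.order_id, order.customer),
    )
    conn.executemany(
        "INSERT INTO line_items (order_id, product, quantity) VALUES (?, ?, ?)",
        [(order.order_id, item.product, item.quantity) for item in order.items],
    )


def load_order(conn, order_id):
    """Reassemble the nested object from two separate queries."""
    row = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    item_rows = conn.execute(
        "SELECT product, quantity FROM line_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return Order(row[0], row[1], [LineItem(p, q) for p, q in item_rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE line_items ("
    "order_id INTEGER REFERENCES orders(id), product TEXT, quantity INTEGER)"
)

original = Order(1, "Ann", [LineItem("book", 2), LineItem("pen", 5)])
save_order(conn, original)
restored = load_order(conn, 1)
print(restored == original)
```

An ORM framework automates this kind of mapping code, which is convenient but adds a translation layer between the application and the database.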