@neerajgoel82
neerajgoel82 / books--data-lake-development-with-big-data
Created February 18, 2017 03:50
Summary of Data Lake Development with Big Data
This gist provides a summary of the book "Data Lake Development with Big Data" by Pradeep Pasupuleti.
neerajgoel82 / CS--quantum-computing
Created March 5, 2017 07:56
This gist captures my thoughts on quantum computing.
- Searching an unordered list in O(√n) time: Grover's algorithm
- Cryptography: Shor's algorithm (factors integers in polynomial time, which would break RSA on a large enough quantum computer)
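Grover's algorithm can be illustrated with a small classical state-vector simulation. It assumes exactly one marked item, and the function name `grover_search` is hypothetical. An oracle flips the sign of the marked amplitude, and a "diffusion" step then reflects every amplitude about the mean. After roughly (π/4)·√N rounds, almost all of the probability sits on the marked item. The simulation itself does O(N) classical work per round, so it demonstrates the amplitude amplification rather than the quantum speedup. The speedup lies in needing only about √N oracle queries.

```python
import math


def grover_search(n_items: int, marked: int) -> tuple[int, float]:
    """Classically simulate Grover's search over n_items basis states.

    Returns (number_of_iterations, probability_of_measuring_the_marked_item).
    """
    # Start in the uniform superposition: every amplitude is 1/sqrt(N).
    amp = [1 / math.sqrt(n_items)] * n_items
    # The near-optimal number of Grover iterations is about (pi/4) * sqrt(N).
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amp[marked] = -amp[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]
    # Measurement probability is the squared amplitude.
    return iterations, amp[marked] ** 2


if __name__ == "__main__":
    for n in (16, 256, 1024):
        k, p = grover_search(n, marked=3)
        print(f"N={n:5d}  iterations={k:3d}  P(marked)={p:.4f}")
```

The number of iterations grows with √N: 3 for N = 16 and 25 for N = 1024. A linear classical scan would need up to N lookups.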
# Install on Ubuntu 14.04
# https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-14-04
cd ~/kafka
# Start the server
nohup bin/kafka-server-start.sh config/server.properties > kafka.log 2>&1 &
# Publish a message to the topic TutorialTopic
echo "Hello, World" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null
-------------------------------------------------------------------------
Training by Sameer Farooqui (https://www.youtube.com/watch?v=7ooZ4S7Ay6Y)
-------------------------------------------------------------------------
Schedulers
- YARN/Mesos: you get dynamic partitioning (scaling)
- Local/Standalone: you get static partitioning (work is being done to add dynamic partitioning, at least in standalone mode)
Hadoop MapReduce vs Spark
- Spark is essentially a replacement for MapReduce, not for HDFS or YARN.
Dynamo DB notes
Dynamo uses a synthesis of well-known techniques to achieve scalability and availability:
a) Data is partitioned and replicated using consistent hashing [10].
b) Consistency is facilitated by object versioning [12].
c) Consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol.
d) Dynamo employs a gossip-based distributed failure detection and membership protocol.
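Vector clocks are the standard way to implement the object versioning in item b. Each version of an object carries a map from node to update counter. If one version's clock is at least as large as another's for every node, the newer version supersedes the older one. Otherwise the two versions are concurrent, and the client has to reconcile them. The helper names and node names (`Sx`, `Sy`, `Sz`) below are hypothetical. The sketch leaves out clock truncation and the quorum logic from item c.

```python
from typing import Dict

# A vector clock maps a node name to the number of updates it has coordinated.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock recording one more update coordinated by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update that `b` has (a >= b for every node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions as 'equal', 'after', 'before' or 'concurrent'."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"       # a supersedes b, so b can be discarded
    if b_ge_a:
        return "before"      # b supersedes a
    return "concurrent"      # conflicting branches; the client must reconcile


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum, used when a client reconciles two branches."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


if __name__ == "__main__":
    d1 = increment({}, "Sx")            # first write, coordinated by Sx
    d2 = increment(d1, "Sx")            # second write, also via Sx
    d3 = increment(d2, "Sy")            # one client updates via Sy ...
    d4 = increment(d2, "Sz")            # ... another concurrently via Sz
    print(compare(d2, d1))              # after
    print(compare(d3, d4))              # concurrent
    d5 = increment(merge(d3, d4), "Sx") # reconciled write via Sx
    print(d5)
```

In this example, `d3` and `d4` cannot be ordered, so both are kept until a reconciled write (`d5`) supersedes them.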
Dynamo is a completely decentralized system with minimal need for manual administration. Storage nodes can be added and removed from Dynamo without requiring any manual partitioning or redistribution.
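Nodes can join or leave without manual redistribution because of the consistent hashing in item a. Keys and nodes are hashed onto the same ring. Each key belongs to the first node found clockwise from its position, and its replicas go to the next distinct nodes after that. When a node joins, it takes over only the keys that now land on its own ring positions, and every other key stays where it was. The sketch below makes several assumptions: MD5 as the ring hash, 8 virtual nodes per physical node, node names A to E, and the class and method names, which are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(label: str) -> int:
    """Map a string to a 128-bit ring position (MD5 used for spread, not security)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, vnodes_per_node: int = 8) -> None:
        self.vnodes_per_node = vnodes_per_node
        self._positions: List[int] = []    # sorted virtual-node positions
        self._owners: Dict[int, str] = {}  # virtual-node position -> physical node

    def add_node(self, node: str) -> None:
        # Each physical node owns several virtual nodes to smooth out the load.
        for i in range(self.vnodes_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        # Only the removed node's virtual nodes disappear; all other positions stay put.
        self._positions = [p for p in self._positions if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def preference_list(self, key: str, n_replicas: int = 3) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct physical nodes."""
        if not self._positions:
            return []
        start = bisect.bisect_right(self._positions, ring_position(key))
        nodes: List[str] = []
        for step in range(len(self._positions)):
            owner = self._owners[self._positions[(start + step) % len(self._positions)]]
            if owner not in nodes:
                nodes.append(owner)
                if len(nodes) == n_replicas:
                    break
        return nodes


if __name__ == "__main__":
    ring = ConsistentHashRing()
    for name in ["A", "B", "C", "D"]:
        ring.add_node(name)
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.preference_list(k, 1)[0] for k in keys}
    ring.add_node("E")  # a new storage node joins
    after = {k: ring.preference_list(k, 1)[0] for k in keys}
    moved = [k for k in keys if before[k] != after[k]]
    # Every key that moved now belongs to E; no key moved between the old nodes.
    print(len(moved), "of", len(keys), "keys moved, all to E:",
          all(after[k] == "E" for k in moved))
```

With plain `hash(key) % number_of_nodes`, adding a fifth node would reassign most keys. On the ring, only about a fifth of them move.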
1) Look at the thumbs of your raised hands, held out to your sides (10 times)
2) Roll your eyes and blink: 5 times clockwise and 5 times anticlockwise
3) Write your name with your eyes
4) Ciliary muscle exercise: switch focus between a close object and a distant object
5) Open your eyes while inhaling and close them while exhaling (5 times)
6) Massage your eyes
7) Rub your hands together and place them over your eyes
- Best practice: only write implicit conversions to types that you own
- Implicit arguments can always be provided explicitly
Coursera courses
- https://www.coursera.org/learn/progfun1/home/welcome
Cheat sheet for this course: https://github.com/lampepfl/progfun-wiki/blob/gh-pages/CheatSheet.md
- https://www.coursera.org/learn/progfun2/home/welcome
Cheat sheet for this course: https://github.com/sjuvekar/reactive-programming-scala/blob/master/ReactiveCheatSheet.md
Installing HBase
- Download HBase
- Edit conf/hbase-site.xml and add the following to the configuration file:
  <configuration>
    <property>
      <name>hbase.rootdir</name>
      <value>file:///Users/neeraj/tmp/hbase</value>
    </property>
    <property>
cd ~/zookeeper/bin
# Start the ZooKeeper server
./zkServer.sh start
# Start the ZooKeeper client
./zkCli.sh
If ZooKeeper was installed on macOS with Homebrew, just run zkServer start.
The product manager has two key responsibilities:
-------------------------------------------
- assessing product opportunities (finding a product that is valuable)
- defining the product to be built (so that it is usable and feasible)
Different roles in a product organization
-------------------------------------------
- Product managers (the key is that the product spec describes the functionality and behavior of the product to be built, not how it will be implemented.)
- UX designers (responsible for developing a deep understanding of the target users, that is, each persona you’re trying to satisfy in your product, and for coming up with tasks, navigation, and flow that are both usable and productive.)