- Skyline: ETL-as-a-Service
- Writing Dataflow pipelines with scalability in mind
- Data scientists mostly just do arithmetic
- BigQuery integrates with Google Drive
- StreamScope: Continuous reliable distributed processing of big data streams
- Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming
- Implementing Lambda Architecture to Track Real-Time Updates
- Getting the Picture
- Start here: Statistics for A/B testing
- Applied Spatial Data Science with R
- SASI Empowering Secondary Indexes
- Row Level Security with PostgreSQL 9.5
- Equivalences between Tables, Maps, Graphs, and Sets
- Analyzing the NPM dependency network
- Storing and Querying a Unidirectional Graph in MySQL
- Stephen Curry 3-point record ridiculousness
- The Fallacy of Measuring Everything
- Getting Started with MapR Streams
- How to Get Started Using Apache Spark GraphX with Scala
- How to Create a Network Graph Visualization of Reddit Subreddits
- Bet Super Bowl 50 Like A Boss with Apache Spark
- The Nuts and Bolts of Managing Database Upgrades
- Summarizing Data in SQL
- How to map JSON objects using generic Hibernate Types
- Splice Machine Announces Move to Open Source, Offers Early Access to Developers
- How to Use Cohort Analysis to Improve Retention
- BigData and CAP theorem in plain english
- “Unit testing” for data science
- The Key Steps for Quick and Effective Analytics on Disparate Data
- Spark Data Source API: Extending Our Spark SQL Query Engine
- A Pocket Guide to Data Science
- PostgreSQL 9.6 with Parallel Query vs. TPC-H
- The Changing Economics of Big Data
- No shard left behind: dynamic work rebalancing in Google Cloud Dataflow
- How to Evolve from RDBMS to NoSQL + SQL
- Real-Time Event Streaming: What Are Your Options?
- Introducing Autotrack for analytics.js
- Iterate over all keys in a Redis Cluster
- Data Modeling in Cassandra from a Postgres Perspective
- Apache Spark: RDD, DataFrame or Dataset?
- Evolution of the Netflix Data Pipeline
- Tachyon Caching Is Bigger Than Spark In-Memory
- Announcing Kafka Connect: Building large-scale low-latency data pipelines
- How much warmer your city was in 2015
- How We Monitor and Run Elasticsearch at Scale
- Why Apache Arrow is the future for open source Columnar In-Memory Analytics
- Sentiment Analysis on Enron’s Emails with Apache Spark
- The importance of single-partition operations in Cassandra
- BigQuery under the hood
- Interactive Analytics on GitHub Data using PostgreSQL with Citus
- Mining Mailboxes with Elasticsearch and Kibana
- A Graph-Specialized ETL: Taking Citizens into a Graph and Keeping It Up to Date
- Ideal Messaging Capabilities for Streaming Data
- Strong consistency in Manhattan
- Data USA makes government data easier to explore
- Kung fu particles
- One (Token) Ring to Rule Them All
- Engineers Shouldn’t Write ETL: Building a High Functioning Data Science Department
- Explore Data Science – self-paced online learning
- A Universal Streamer for Apache Ignite Based on Apache Camel
- Master-less Distributed Queue with Postgres and PG Paxos
- Change in the British diet, since 1974
- Understand Your Machine Learning Data With Descriptive Statistics in Python
- The design of RavenDB 4.0: Making Lucene reliable
- Stream-processing with Mantis
- How to Integrate Custom Data Sources Into Apache Spark
- Visualizing Unstructured Analysis – Elections, Words, and Zika virus
- Inside Capacitor, BigQuery’s next-generation columnar storage format
- 12 Awesome Spring Data Tutorials to Kick-Start your Data Projects
- 5 Tips for Learning to Code for Visualization
- SQL: Counting Groups of Rows Sharing Common Column Values
- Neo4j vs Relational: Refactoring – Extracting node/table
- Setting up Your Analytics Stack with Jupyter Notebook & AWS Redshift
- Elasticsearch
- You only need 3 votes to play, and other facts about the Hacker News frontpage
- Five mistakes beginners make when working with databases
- 21 Must-Know Data Science Interview Questions and Answers
- Postgres Query Plan Visualization
- Getting Started with Heron on Apache Mesos and Apache Kafka
- Announcing SQL Server on Linux
- Spark Key Terms, Explained
- Never Been Married
- Apache Hadoop Tutorial – The ULTIMATE Guide (PDF Download)
- Documents Update By Query with Elasticsearch
- 12 Inspiring Women In Data Science, Big Data
- Solving Problems with the Right Technology: Hadoop and RDBMS
- MySQL metadata locking and database transaction ending
- Catalog of Life Taxonomic Tree
- If you’re a db, you need to manage CPU / Memory / IO
- Firearms Dealers vs. Burgers, Pizza, and Coffee
- 10 SQL Tricks That You Didn’t Think Were Possible
- On-Time Flight Performance with Spark GraphFrames
- Lower socioeconomic status linked to lower education attainment
- Data Science Tools – Are Proprietary Vendors Still Relevant?
- Neo4j: A procedure for the SLM clustering algorithm
- Predictive policing
- Who marries who, by profession
- Small multiples for NBA game differentials
- Using GraphQL with NoSQL database ArangoDB
- Gartner 2016 Magic Quadrant for Advanced Analytics Platforms: gainers and losers
- Type safety on Spark DataFrames – Part 1
- Going multi DC with Cassandra : what is the pattern of your cluster ?
- The design of RavenDB 4.0: Physically segregating collections
- one.love – an inspired rethink
- The Essential Guide to Streaming-first Processing with Apache Flink
- GraphQL Deep Dive: The Cost of Flexibility
- R in Ecology
- A Great Analyst’s Best Friends: Skepticism & Wisdom!
- Open Sourcing Twitter Heron
- PostgreSQL Bloat: origins, monitoring and managing
- Spotify’s Event Delivery – The Road to the Cloud (Part I)
- Observability at Twitter: technical overview, part I
- Possible paths for a Trump nomination loss or win
- How to effectively use the Elasticsearch data source in Grafana and solutions to common pitfalls
- Microsoft SQL Server Developer Edition is now free
- Data Science: Do the Numbers – Part 1
- Hunting Down Phantom Write Spikes in RDS Postgres
- Analyzing Ruby Code With Neo4j
- Tailing the MongoDB Replica Set Oplog with Scala and Akka Stream
- Quick start with In memory Data Grid, Apache Ignite
- Building Data Systems: What Do You Need?
- Shot Blocking in the NHL Playoffs
- One Chart, Twelve Charting Libraries
- The Data Science Puzzle, Explained
- Using Metadata Repository to Improve MDM Success
- Python Dependency Analysis
- Citus Unforks From PostgreSQL, Goes Open Source
- History of massive-scale sorting experiments at Google
- What I Use to Visualize Data
- Fast Search Using PostgreSQL Trigram Indexes
- Correlated Subqueries are Evil and Slow. Or are They?
- Interconnectedness of the galaxies
- Sorted pagination in Cassandra
- Tablesample In PostgreSQL 9.5
- The Guts n’ Glory of Database Internals: Searching information and file format
- SQL Query on Mixed Schema Data Using Apache Drill
- Push your configuration to the limit – spice it up with statistics
- Treating visualization as a process
- First steps to Spring Boot Cassandra
- MongoDB queries don’t always return all matching documents!
- Rising death rates for white women
- NFL draft pick quality for your team
- How To Prepare Your Data For Machine Learning in Python with Scikit-Learn
- Voting habits for various demographic groups
- Transitioning from MySQL to Cassandra at Chaordic
- Microsoft’s DocumentDB now lets you use your mad MongoDB skills
- Redis transactions
- The Guts n’ Glory of Database Internals: The LSM option
- How to call Oracle stored procedures and functions from Hibernate
- How we built Search at Kit
- Spark Streaming and Twitter Sentiment Analysis
- How-To Apache Spark Streaming with Scala Part 1
- Introducing the Google Analytics 360 Suite
- Ad Block Tracking With Google Analytics: Code, Metrics, Reports
- Feature flagging to mitigate risk in database migration
- Shifting Parent Work Hours, Mom vs. Dad
- Doing Data Science: A Kaggle Walkthrough Part 2 – Understanding the Data
- (Abusing) Elasticsearch as a Framework
- Introduction to Big Data
- 10 Useful Python Data Visualization Libraries for Any Discipline
- Application architectures with persistent storage
- Cassandra Native Secondary Index Deep Dive
- Lift Analysis – A Data Scientist’s Secret Weapon
- Human perception for visualization
- Stream Processing Everywhere – What to Use?
- Vega-Lite for quick online charts
- Suite of data tools for beginners, focused on fun
- The most important thing to know in Cassandra data modeling: The primary key
- Facebook Reactions and the Problems With Quantifying Likes Differently
- Thoughts on Algolia (vs Solr & Elasticsearch)
- Dataflow/Beam & Spark: A Programming Model Comparison
- Search Strategy Formulation: A Framework For Learning
- Rethinking DITA “Custom” Attributes
- Visualizing Concurrency in Go
- The Rise of Dark Data and How It Can Be Harnessed
- Lists are the new search
- School district spending per student
- Agile Databases
- Getting Started with Sample Programs for Apache Kafka 0.9
- How to Build Applications on a NoSQL Document Database and Perform Analytics in Place
- The Guts n’ Glory of Database Internals: B+Tree
- U.S. gun deaths rate is an outlier
- When Should Approximate Query Processing Be Used?
- Do jobs run in families?
- Query Sniper
- Top 10 Essential Books for the Data Enthusiast
- The Marketing Analytics Maturity Model: Where You Are and How to Advance
- Pills of Eventual Consistency
- Using Spark and Zeppelin to process big data on Kubernetes 1.2
- PostgreSQL Streaming Replication in 10 Minutes
- Search Strategy Formulation: A Framework For Learning (part 2)
- The New Rules for Becoming a Data Scientist
- Cassandra: The Foundation Big Data Building Block
- R: tm – Unique words/terms per document
- Continuous Data Migration
- Supreme Court shifts in power
- Growing to Obesity
- Analyzing the Panama Papers with Neo4j: Data Models, Queries & More
- How to implement a custom String-based sequence identifier generator with Hibernate
- Apache Spark as a Distributed SQL Engine
- Gorilla: A fast, scalable, in-memory time series database
- Get Facebook, Bing, Twitter & more into Google Data Studio
- Building a Streaming Search Platform
- Rock Analytics More: Obsess About Goals And Goal Values!
- Five Things Every Startup Should be Doing with its Data
- Play chess against the machine and see what it’s thinking
- Working with streaming
- Serializable, Lockless, Distributed: Isolation in CockroachDB
- Marrying Age
- Evolution of Big Data Storage: How to Support Real-time Analytics at Scale
- Differences in JPA entity locking modes
- Apache Hadoop Hive Tutorial
Created December 14, 2016 06:33