mikekaminskycc

View GitHub Profile
@mikekaminskycc
mikekaminskycc / BottledWater.md
Last active August 29, 2015 14:20
Bottled Water Summary

# Bottled Water

## Summary

Bottled Water uses logical decoding (available since PostgreSQL 9.4) to funnel changes made to the database into a Kafka stream in Avro format, where they can be transformed and forwarded to other systems. Before PostgreSQL 9.4, streaming changes meant using triggers, which is unappealing because of the load they place on the database servers.
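To get a feel for what logical decoding exposes, here is a minimal sketch that reads pending changes from a logical replication slot. It is not how Bottled Water itself works: it assumes a slot created with PostgreSQL's bundled `test_decoding` example plugin (for instance via `pg_create_logical_replication_slot`), not Bottled Water's own Avro output plugin. The function name `peek_changes` and the slot name in the usage note are hypothetical, and `cursor` is assumed to be any DB-API cursor, such as one from psycopg2.

```python
def peek_changes(cursor, slot_name):
    """Return pending change descriptions from a logical replication slot.

    Uses pg_logical_slot_peek_changes, which reads changes without
    consuming them, so the call can be repeated safely.
    `cursor` is assumed to be a DB-API cursor (e.g. from psycopg2).
    """
    cursor.execute(
        "SELECT data FROM pg_logical_slot_peek_changes(%s, NULL, NULL)",
        (slot_name,),
    )
    # Each row holds one textual change description produced by the plugin.
    return [row[0] for row in cursor.fetchall()]
```

With a live connection this might be called as `peek_changes(conn.cursor(), "example_slot")`. Bottled Water replaces the text output with Avro-encoded records and ships them to Kafka rather than returning them to the caller.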

Bottled Water Diagram

mikekaminskycc / streamprocessing.md
Last active August 29, 2015 14:20
Stream Processing Systems

# Stream Processing Systems

Nomenclature:

  • Data Streaming Platform: The entire ecosystem of streaming data that includes the messaging system and the data distribution process.
  • Stream Processing System: The set of applications that are tasked with transforming streaming data en route between where the data were generated and where they are eventually placed for long-term storage and ad-hoc analyses.

Confluent has a nice post on what a stream data platform is for and what it consists of:

A stream data platform has two primary uses:

mikekaminskycc / audit_log_series.md
Last active August 29, 2015 14:21
Resetting the Audit Log Series

`select count(*) from audit_log;` returns 2,652,505,123.

From operational: `select nextval('audit_log_id_seq');` returns 2,752,134,460.

If we expect that the migration won't be run for two weeks, and that about 5 million rows will be added to the audit log per day between now and then, we get 14 × 5 million = 70 million new rows. Call it 100 million to be safe. The sequence should be bumped forward 100 million from its current value of 2,752,134,460, giving us 2,852,134,460.

We are adding fewer than 300,000 IDs in our fixes, so this should be sufficient.
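The arithmetic above can be double-checked with a short script. This is only a sketch: the figures are copied from these notes, the variable names are made up for illustration, and the generated statement assumes the sequence would be moved with PostgreSQL's `setval` function. It also assumes the IDs added by the fixes only need to fit within the margin left after the expected growth.

```python
# Figures copied from the notes above.
CURRENT_SEQUENCE_VALUE = 2_752_134_460  # from select nextval('audit_log_id_seq')
ROWS_PER_DAY = 5_000_000                # expected audit_log growth per day
DAYS_UNTIL_MIGRATION = 14               # two weeks
HEADROOM = 100_000_000                  # rounded-up safety margin
IDS_ADDED_BY_FIXES = 300_000            # upper bound ("fewer than 300,000")

# Rows expected to arrive before the migration runs.
expected_growth = ROWS_PER_DAY * DAYS_UNTIL_MIGRATION

# Margin left after the expected growth is accounted for.
margin = HEADROOM - expected_growth

# Target value for the sequence.
new_sequence_value = CURRENT_SEQUENCE_VALUE + HEADROOM

# Assumption: the bump is applied with setval; run this by hand after review.
setval_sql = f"select setval('audit_log_id_seq', {new_sequence_value});"

print(f"expected growth:    {expected_growth:,}")
print(f"remaining margin:   {margin:,}")
print(f"fixes fit in margin: {IDS_ADDED_BY_FIXES < margin}")
print(setval_sql)
```

If the daily growth estimate or the delay changes, only the constants at the top need to be adjusted.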