Skip to content

Instantly share code, notes, and snippets.

@bramswenson
Last active August 29, 2015 14:04
Show Gist options
  • Save bramswenson/b90122857fc1cdc807d9 to your computer and use it in GitHub Desktop.
Save bramswenson/b90122857fc1cdc807d9 to your computer and use it in GitHub Desktop.
What happens when we kill databases with rails replication/sharding solutions?

The goal of this spike is to determine the behavior of rails replication and sharding solutions, specifically when selecting against associated records. We would like to see reads distributed across slaves where not instructed to do otherwise. Like so in the Octopus api:

# contrived example that actually doesn't work in octopus 0.8.2
irb(main):002:0> Artist.first.albums.count
[Shard: slave1]  Artist Load (1.3ms)  SELECT `artists`.* FROM `artists` LIMIT 1
[Shard: slave2]   (1.1ms)  SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
  1. Fails to use shard on queries involving relations
  2. Querues the slave for all queries when forced with a block
  3. Raises when the slave is gracefully stopped
  4. Raises when the slave is down before query is made
  5. Raises when the mysql service is stopped or the slave is removed from the replica set
  6. Breaks rake db:reset (not a big deal, but I suspect others)
  7. With slave groups the situation gets worse with no reads to slaves by default
  8. Appears to support ActiveRecord versions 3 and 4

The closest octopus gets is directly selecting a slave with the using method:

irb(main):002:0> Octopus.using(:slave1) { Artist.first.albums.count }
[Shard: slave1]  Artist Load (1.3ms)  SELECT `artists`.* FROM `artists` LIMIT 1
[Shard: slave1]   (1.1ms)  SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
  1. Only supports replication solutions
  2. Has select-able 'sticky' behavior ensuring a single slave in use per session
  3. Timed blacklisting of servers upon outage
  4. Adjustable Error handler to customize blacklisting behavior
  5. Configuration is sensitive to things that seemingly shouldn't matter (port setting in slave definition crashes the things)
  6. Generally performs as described and is just thin enough to do the job well
  7. Appears to support ActiveRecord versions 3 and 4
  8. Breaks db: related rake tasks (and more) with multiple sets of credentials
  1. Crashes on slave outages or read errors
  2. More support for shards
  3. Also a thin layer, but more migration help for shards

Others

development:
adapter: mysql2
encoding: utf8
reconnect: false
database: replication_ars_development
pool: 5
username: root
password:
host: 192.168.33.100
slave:
host: 192.168.33.101
host: 192.168.33.102
# uses master by default
irb(main):001:0> Artist.first.albums.count
Artist Load (0.4ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.3ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
# manual call to use slave (verified manually since its not logged)
irb(main):003:0> Artist.on_slave.first.albums.on_slave.count
Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.3ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
irb(main):008:0> while true do
irb(main):009:1* Artist.on_slave.first.albums.on_slave.count
irb(main):010:1> sleep 1
irb(main):011:1> end
Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.3ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
Artist Load (0.8ms) SELECT `artists`.* FROM `artists` LIMIT 1
Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
class Album < ActiveRecord::Base
belongs_to :artist
has_many :likes
has_many :liked_by, through: :likes, class_name: :User, source: :user
attr_accessible :name
end
class Artist < ActiveRecord::Base
has_many :albums
attr_accessible :name
end
class Like < ActiveRecord::Base
belongs_to :album
belongs_to :user
validates :album_id, presence: true, uniqueness: { scoped: :user_id }
validates :user_id, presence: true
end
class User < ActiveRecord::Base
attr_accessible :name
has_many :likes
has_many :liked_albums, through: :likes, class_name: :Album, source: :album
end
ActiveRecord::Schema.define(:version => 20140715220709) do
create_table "albums", :force => true do |t|
t.integer "artist_id"
t.string "name"
t.datetime "created_at", :null => false
t.datetime "updated_at", :null => false
end
add_index "albums", ["artist_id"], :name => "index_albums_on_artist_id"
create_table "artists", :force => true do |t|
t.string "name"
t.datetime "created_at", :null => false
t.datetime "updated_at", :null => false
end
create_table "likes", :force => true do |t|
t.integer "album_id"
t.integer "user_id"
t.datetime "created_at", :null => false
t.datetime "updated_at", :null => false
end
add_index "likes", ["album_id"], :name => "index_likes_on_album_id"
add_index "likes", ["user_id"], :name => "index_likes_on_user_id"
create_table "users", :force => true do |t|
t.string "name"
t.datetime "created_at", :null => false
t.datetime "updated_at", :null => false
end
end
bram = User.where(name: 'Bram').first_or_create!
tool = Artist.where(name: 'Tool').first_or_create!
[ 'Undertow', 'AEnima', 'Lateralus', '10000 Days' ].each do |album_name|
album = tool.albums.where(name: album_name).first_or_create!
bram.liked_albums << album unless bram.liked_albums.include?(album)
end
development:
adapter: makara_mysql2
database: replication_makara_development
username: root
password:
host: 192.168.33.100
port: 3306
makara:
connections:
- role: master
host: 192.168.33.100
- role: slave
host: 192.168.33.101
- role: slave
host: 192.168.33.102
connection_error_matchers:
- !ruby/regexp '/.*Mysql2::Error: Server shutdown in progress:.*/'
irb(main):005:0> while true do
irb(main):006:1* Artist.first.albums.pluck(:name)
irb(main):007:1> sleep 2
irb(main):008:1> end
[slave/2] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/2] SQL (0.4ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
[slave/2] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/2] SQL (0.5ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
# killed slave 2 with mysql stop
[slave/1] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/1] SQL (0.4ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
[slave/1] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/1] SQL (0.4ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
[slave/1] Artist Load (0.4ms) SELECT `artists`.* FROM `artists` LIMIT 1
# killed slave1 with mysql stop, recovered slave2 with mysql start
[slave/1] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
[master/1] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
[master/1] SQL (0.4ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
[master/1] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
[master/1] SQL (0.6ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
[master/1] Artist Load (1.1ms) SELECT `artists`.* FROM `artists` LIMIT 1
[master/1] SQL (1.1ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
# stopped console and restarted (seems console sessions are 'sticky')
irb(main):001:0> Artist.first.albums.pluck(:name)
[master/1] SCHEMA (0.7ms) SHOW FULL FIELDS FROM `artists`
[master/1] SCHEMA (0.4ms) SHOW TABLES LIKE 'artists'
[master/1] SCHEMA (0.3ms) SHOW CREATE TABLE `artists`
[slave/1] Artist Load (1.0ms) SELECT `artists`.* FROM `artists` LIMIT 1
[master/1] SCHEMA (0.7ms) SHOW FULL FIELDS FROM `albums`
[master/1] SCHEMA (0.3ms) SHOW TABLES LIKE 'albums'
[master/1] SCHEMA (0.2ms) SHOW CREATE TABLE `albums`
[slave/1] SQL (0.7ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
=> ["Undertow", "AEnima", "Lateralus", "10000 Days"]
irb(main):002:0> Artist.first.albums.pluck(:name)
[slave/1] Artist Load (1.2ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/1] SQL (0.4ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
=> ["Undertow", "AEnima", "Lateralus", "10000 Days"]
irb(main):003:0> Artist.first.albums.pluck(:name)
[slave/1] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/1] SQL (0.5ms) SELECT `albums`.`name` FROM `albums` WHERE `albums`.`artist_id` = 1
=> ["Undertow", "AEnima", "Lateralus", "10000 Days"]
# inspects the tables, then selects from slave
irb(main):001:0> Artist.first.albums.count
[master/1] SCHEMA (0.7ms) SHOW FULL FIELDS FROM `artists`
[master/1] SCHEMA (0.3ms) SHOW TABLES LIKE 'artists'
[master/1] SCHEMA (0.3ms) SHOW CREATE TABLE `artists`
[slave/1] Artist Load (1.3ms) SELECT `artists`.* FROM `artists` LIMIT 1
[master/1] SCHEMA (0.8ms) SHOW FULL FIELDS FROM `albums`
[master/1] SCHEMA (0.3ms) SHOW TABLES LIKE 'albums'
[master/1] SCHEMA (0.2ms) SHOW CREATE TABLE `albums`
[slave/1] (0.5ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
# all subsequent request to slave
irb(main):002:0> Artist.first.albums.count
[slave/1] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/1] (0.4ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
irb(main):003:0> Artist.first.albums.count
[slave/1] Artist Load (1.3ms) SELECT `artists`.* FROM `artists` LIMIT 1
[slave/1] (1.1ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
development:
adapter: mysql2
encoding: utf8
reconnect: false
database: replication_octopus_development
pool: 5
username: root
password:
host: 192.168.33.100
# without groups
octopus:
replicated: true
environments:
- development
development:
slave1:
adapter: mysql2
host: 192.168.33.101
database: replication_octopus_development
slave2:
adapter: mysql2
host: 192.168.33.102
database: replication_octopus_development
# with groups
octopus:
replicated: true
environments:
- development
development:
slave_group1:
slave1:
adapter: mysql2
host: 192.168.33.101
database: replication_octopus_development
slave2:
adapter: mysql2
host: 192.168.33.102
database: replication_octopus_development
# without slave groups
irb(main):001:0> Artist.first.albums.count
[Shard: slave1] Artist Load (1.4ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.3ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
# with slave groups
# https://github.com/tchandy/octopus/wiki/Slave-Groups#without-sharding
irb(main):001:0> Artist.first.albums.count
Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.4ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
# without slave groups
irb(main):003:0> while true
irb(main):004:1> Octopus.using(:slave1) { Artist.first.albums.count }
irb(main):005:1> sleep 2
irb(main):006:1> end
[Shard: slave1] Artist Load (1.7ms) SELECT `artists`.* FROM `artists` LIMIT 1
[Shard: slave1] (0.4ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave1] Artist Load (1.2ms) SELECT `artists`.* FROM `artists` LIMIT 1
Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
ActiveRecord::StatementInvalid: Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
# with slave groups
irb(main):016:0> while true
irb(main):017:1> Octopus.using(slave_group: :slave_group1) { Artist.first.albums.count }
irb(main):018:1> sleep 2
irb(main):019:1> end
[Shard: slave2] Artist Load (1.4ms) SELECT `artists`.* FROM `artists` LIMIT 1
(1.5ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave1] Artist Load (0.8ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.7ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave2] Artist Load (1.4ms) SELECT `artists`.* FROM `artists` LIMIT 1
(1.6ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave1] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
ActiveRecord::StatementInvalid: Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
# without slave groups
irb(main):048:0> while true
irb(main):049:1> Octopus.using(:slave1) { Artist.first.albums.count }
irb(main):050:1> sleep 2
irb(main):051:1> end
[Shard: slave1] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
[Shard: slave1] (0.4ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave1] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
[Shard: slave1] (0.4ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave1] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
ActiveRecord::StatementInvalid: Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
# with slave groups
irb(main):001:0> while true
irb(main):002:1> Octopus.using(slave_group: :slave_group1) { Artist.first.albums.count }
irb(main):003:1> sleep 2
irb(main):004:1> end
[Shard: slave2] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.6ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave1] Artist Load (0.3ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.5ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave2] Artist Load (1.2ms) SELECT `artists`.* FROM `artists` LIMIT 1
(1.8ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
[Shard: slave1] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
ActiveRecord::StatementInvalid: Mysql2::Error: Server shutdown in progress: SELECT `artists`.* FROM `artists` LIMIT 1
# without slave groups
irb(main):001:0> Octopus.using(:slave1) { Artist.first.albums.count }
Mysql2::Error: Can't connect to MySQL server on '192.168.33.102' (111)
# with slave groups
irb(main):001:0> Octopus.using(slave_group: :slave_group1) { Artist.first.albums.count } [Shard: slave2] Artist Load (0.6ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.4ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
irb(main):002:0> Octopus.using(slave_group: :slave_group1) { Artist.first.albums.count }
Mysql2::Error: Can't connect to MySQL server on '192.168.33.101' (113)
# without groups
irb(main):002:0> Octopus.using(:slave1) { Artist.first.albums.count }
[Shard: slave1] Artist Load (1.3ms) SELECT `artists`.* FROM `artists` LIMIT 1
[Shard: slave1] (1.1ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
# with slave groups
irb(main):001:0> Octopus.using(slave_group: :slave_group1) { Artist.first.albums.count }
[Shard: slave2] Artist Load (0.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.3ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
irb(main):002:0> Octopus.using(slave_group: :slave_group1) { Artist.first.albums.count }
[Shard: slave1] Artist Load (0.3ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.6ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
# wat!?! why u no use slave for association now?!?
# baby don't be so mean :(
irb(main):003:0> Octopus.using(slave_group: :slave_group1) do
irb(main):004:1* a = Artist.first
irb(main):005:1> Octopus.using(slave_group: :slave_group1) do
irb(main):006:2* a.albums.count
irb(main):007:2> end
irb(main):008:1> end
[Shard: slave2] Artist Load (1.5ms) SELECT `artists`.* FROM `artists` LIMIT 1
(1.5ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
# ok, lets get silly
irb(main):009:0> Octopus.using(slave_group: :slave_group1) do
irb(main):010:1* a = Artist.first
irb(main):011:1> Octopus.using("slave#{(1..2).to_a.sample}".to_sym) do
irb(main):012:2* a.albums.count
irb(main):013:2> end
irb(main):014:1> end
[Shard: slave1] Artist Load (1.0ms) SELECT `artists`.* FROM `artists` LIMIT 1
(0.6ms) SELECT COUNT(*) FROM `albums` WHERE `albums`.`artist_id` = 1
=> 4
# no joy, master select on association making shard_groups a less effective solution than just shards
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment