@Drewzar
Created July 3, 2012 01:08
[14:38]*** Ines_S ([email protected]) has joined channel #ey-vitrue
<Ines_S> hello
<banzaiman> sessions: no resque processes running on this one.
[14:40]<banzaiman> sessions: is there one that has resque running? I checked resque10, resque8 and resque9, but I don't see any resque process yet.
[14:41]<banzaiman> sessions: so, this failure to reload unicorn must be coming from somewhere else.
<banzaiman> deploy error is somehow getting stuck on resque9.
[14:43]<Drewzar> Hey Ines_S
[14:44]<Drewzar> LuVitrue: is also here with us.
<Ines_S> hey Drewzar
<Drewzar> Were you able to find anything?
[14:45]<banzaiman> sessions: so…
<Ines_S> going to fetch your chef recipes, you guys have an even more custom cluster than we normally find
<banzaiman> sessions: i-c0ef0ca0 is the one that had the issue.
<banzaiman> sessions: how is resque supposed to be started on this? monit is not getting set up, it seems.
<sessions> banzaiman yes, all the nodes labeled resque should have resque running
[14:46]<sessions> yeah, it should be monit
[14:49]<banzaiman> sessions: hmm.
[14:50]<banzaiman> resque10, resque12, resque9, resque11, resque13, resque8, resque3 are not running any resque process.
[14:51]<sessions> hmm... that could be a problem
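[Editor's note: a minimal sketch for verifying monit-managed resque workers on one of these nodes; the node name and monit config path are assumptions, not from the log.]
    ssh resque9 'monit summary'           # list the services monit is watching
    ssh resque9 'ps aux | grep [r]esque'  # confirm a worker process actually exists
    # If monit shows no resque entries, the recipe that writes the
    # /etc/monit.d/ resque config likely never ran on this node.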
<Drewzar> Ines_S: Yea, sorry I didn't build that env, otherwise I'd be more help =\
<Ines_S> no worries, it will just take a little time for me to understand what you guys were trying to do
[14:52]<Ines_S> the last time I saw your environment it was just a 6 node replica set
<sessions> banzaiman i guess i'll need to deploy... there is no current directory
<Ines_S> btw did you get a chance to look at your production cluster?
<Ines_S> looks like it's missing a config server
[14:53]<Drewzar> I haven't looked into that, but I'm pretty sure the app is working.
<Drewzar> (as no one is screaming) :)
<sessions> banzaiman deploying now to see if it drops the current directory on those nodes
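[Editor's note: in Engine Yard's Capistrano-style layout, "current" is a symlink into releases/ created only by a successful deploy, which is why a missing current directory suggests the node has never been deployed to. A quick check, with the app path assumed:]
    ls -l /data/emcee/current            # should point at /data/emcee/releases/<timestamp>
    ls /data/emcee/releases | tail -n 3  # most recent releases, if any exist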
<LuVitrue> On staging, I don't remember us having a standalone config server.
[14:56]<Ines_S> Drewzar: the app is working because the shards can be written to/read from
<Ines_S> but if you need to rebalance
<Ines_S> the operation will fail
[14:57]<Drewzar> So we need another config server in prod?
<Drewzar> Do we have to take the whole mongo cluster to do that?
<Drewzar> take it down*
[14:59]<LuVitrue> Looks like we are missing one config server on production, too.
<Ines_S> yeah
<Ines_S> you're missing config 2
<Ines_S> so the metadata in the cluster is read only
<Ines_S> things "look ok" but you need 3 for a fully functional cluster
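[Editor's note: a hedged sketch of the 2.0-era mongos invocation this implies; hostnames are hypothetical. mongos must list all three config servers, and all three must be reachable, or chunk metadata goes read-only and operations like rebalancing fail:]
    mongos --configdb cfg0.example.com:27019,cfg1.example.com:27019,cfg2.example.com:27019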
<Ines_S> so talk to me about your staging environment
[15:00]<Ines_S> my guess is that you are doing a shard consisting of a 2 node replica set
<Ines_S> is that correct?
<Ines_S> then you are missing an arbiter
<banzaiman> sessions: let me know how it goes
[15:01]<sessions> ok
[15:02]<sessions> banzaiman just an update: it's at the step right after the "permanently added...."
[15:03]<LuVitrue> Ines_S: you mean mongos?
<banzaiman> sessions: do you use engineyard gem?
[15:04]<sessions> yeah, but i used the dashboard in this case and am tailing the log on app-master
<banzaiman> ssh key warnings are mostly noise, so it may or may not be relevant.
<banzaiman> ok.
[15:05]<sessions> it's actually been a long time.... seems to be taking much, much longer than normal
[15:07]<banzaiman> sessions: it appears to fail at the same spot with the same unicorn master.
<sessions> hmm
[15:09]<banzaiman> not sure what's going on there.
<banzaiman> the master seems functional.
<sessions> yeah the app is fine
<LuVitrue> Not sure how it was setup, but an arbiter doesn't require a dedicated server,
[15:10]<LuVitrue> it's probably running from some other node?
[15:18]<Ines_S> but it's still a process
<Ines_S> that needs to be up
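[Editor's note: a minimal sketch of running an arbiter alongside an existing node; it needs no dedicated server, just a mongod process that can vote. The port, paths, and replica set name are assumptions:]
    mongod --replSet publisher0 --port 27021 --dbpath /db/mongodb/arbiter \
        --fork --logpath /var/log/mongodb/arbiter.log
    # then, from a mongo shell connected to the set's primary:
    #   > rs.addArb("mongo1:27021")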
[15:19]<Ines_S> requerying your instances for status
<sessions> banzaiman any ideas?
[15:20]<banzaiman> I'm looking at the unicorn masters that failed to reload at the moment
<banzaiman> there appear to be a few of those.
[15:21]<Ines_S> ohh so you had a shard consisting of only a master server
<Ines_S> crater i-9847e9f5 util [mongo1] ~ # cat /etc/mongodb/master.conf
<Ines_S> dbpath = /db/mongodb/master
<Ines_S> logpath = /var/log/mongodb/master.log
<Ines_S> logappend = true
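[Editor's note: the config above carries no replSet or shardsvr options, which fits a shard consisting of only a master server. A 2.0-era replica-set shard member's config would additionally carry something like the following; the set name is an assumption:]
    replSet = publisher0
    shardsvr = true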
[15:24]<Ines_S> I'm attempting to restart the mongod processes of your data nodes
<Ines_S> if this works your staging env may be ok
[15:27]<Ines_S> alright
<Ines_S> looks like it's up
<Ines_S> crater i-b05cf2dd util [mongo2] ~ # ps aux | grep mongo
<Ines_S> root 3540 0.0 0.7 89804 12924 ? Sl Jun30 0:02 /usr/local/mongodb/bin/mongod --configsvr --config /etc/mongodb/config.conf
<Ines_S> root 17406 0.0 1.6 174816 29648 pts/1 Sl 15:23 0:00 /usr/local/mongodb/bin/mongod --config /etc/mongodb/master.conf
<Ines_S> root 17468 0.0 0.0 1796 604 pts/1 S+ 15:26 0:00 grep --colour=auto mongo
<Ines_S> crater i-b05cf2dd util [mongo2] ~ # /usr/local/mongodb/bin/mongo
<Ines_S> MongoDB shell version: 2.0.0
<Ines_S> connecting to: test
<Ines_S> > show dbs;
<Ines_S> local 0.203125GB
<Ines_S> publisher 1.452880859375GB
<Ines_S> >
[15:28]<banzaiman> sessions: can I try restarting unicorn on one?
<LuVitrue> looks like it's just that mongod was down
[15:29]<Ines_S> yep
<Ines_S> try another deploy?
<sessions> banzaiman yes
<Ines_S> I'm updating the ticket
<LuVitrue> cool, thanks.
[15:31]<LuVitrue> Drewzar: we need to fix the production mongo servers
[15:32]<banzaiman> ah
<banzaiman> sessions:
<banzaiman> app i-1b41cf7e current # cat config/database.yml
<banzaiman> production:
<banzaiman> adapter: mysql2
<banzaiman> database: 'emcee'
<banzaiman> username: 'deploy'
<banzaiman> password: 'q2wC5vdJll'
<banzaiman> host: 'ec2-50-16-120-246.compute-1.amazonaws.com'
<banzaiman> reconnect: true
<banzaiman> but mysql2 is not available.
<banzaiman> your app is using mysql, it appears.
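[Editor's note: the adapter named in database.yml has to match a driver gem actually bundled with the app, which is the mismatch banzaiman is pointing at. A quick way to compare the two, with the app path assumed:]
    cd /data/emcee/current
    grep adapter config/database.yml   # e.g. "adapter: mysql2"
    grep mysql Gemfile Gemfile.lock    # the bundled driver gem must match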
<sessions> banzaiman... shouldn't we be using mysql instead of mysql2?
<LuVitrue> use mysql2
[15:33]<LuVitrue> We upgraded about two months ago.
<LuVitrue> wait,
<LuVitrue> this is for emcee?
<banzaiman> Gemfile is not reflecting that.
<banzaiman> yes, emcee.
<LuVitrue> nvmd
[15:34]<banzaiman> ok
<banzaiman> why are we choosing mysql2, I wonder…
<sessions> banzaiman i think this is a recent change
<banzaiman> sessions: on your part, or ours?
[15:35]<Drewzar> LuVitrue: Ines_S: so publisher staging is back up and running?
[15:36]<LuVitrue> at least the mongodb connection is okay now.
<sessions> yours
<banzaiman> forcing mysql restarts unicorn, it appears.
<Ines_S> yeah Drewzar, you need to confirm that you can deploy correctly though
<Drewzar> LuVitrue: okay, let's wait till sessions and banzaiman stop playing with prod before we start another thing.
[15:37]<LuVitrue> I am redeploying staging now.
[15:38]<Drewzar> okay, let us know :)
<sessions> banzaiman force restart rather than hot reload?
[15:39]<banzaiman> sessions: if the app's database.yml is messed up, I don't want to restart unicorn.
<sessions> i see
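[Editor's note: the distinction in play: unicorn's hot reload (SIGUSR2) re-execs a fresh master that re-reads the app and its database.yml while the old workers keep serving, whereas a hard stop/start drops in-flight requests. A sketch, with the pid path assumed:]
    kill -USR2 "$(cat /data/emcee/shared/pids/unicorn.pid)"  # hot reload: spawn new master
    # once the new master's workers are up, retire the old master:
    kill -QUIT <old_master_pid>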
<sessions> LuVitrue: do you see a problem with using the mysql2 gem with emcee?
[15:40]<banzaiman> sessions: and all of them, except the one that I just manually edited, has mysql2.
<LuVitrue> I haven't tested mysql2 with emcee.
<sessions> banzaiman strange, because the application is actually running
<sessions> on the app nodes
<LuVitrue> I thought you were talking about Publisher.
[15:41]<banzaiman> sessions: I am puzzled, too. :-S
[15:43]<sessions> banzaiman so it appears the app will work on mysql2
<sessions> actually i don't know
[15:44]<Drewzar> Want me to pull Eli in here?
<Ines_S> hey guys any questions regarding the mongo cluster on Publisher?
<sessions> seems as though the app is, though, b/c that's what's in the db.yml in the current directory
[15:45]<Ines_S> I may step out for a bit and work on a different ticket
<Drewzar> Ines_S: in order to add a config server we're going to need to reboot the mongo cluster, correct?
[15:47]<Ines_S> yeah, you will need to stop the entire cluster
<Ines_S> change the startup scripts
<Ines_S> here let me give you a link
<Drewzar> Okay we'll need to plan downtime for that.
<Ines_S> likely best done with a scheduled maintenance window
[15:48]<Drewzar> yeah
<Drewzar> I'll work with sessions on that.
<Ines_S> http://www.mongodb.org/display/DOCS/Changing+Config+Servers
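[Editor's note: a hedged outline of the procedure in the linked doc, 2.0-era, with hostnames assumed: stop every mongos and mongod, seed the new config server from an existing one's data directory, add it to every mongos --configdb string, then bring the cluster back up.]
    # on the new config host, after the whole cluster is stopped:
    rsync -a cfg0:/db/mongodb/config/ /db/mongodb/config/
    mongod --configsvr --config /etc/mongodb/config.conf
    # on every mongos host, restart with all three servers listed:
    mongos --configdb cfg0:27019,cfg1:27019,cfg2:27019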
<Ines_S> ok, depending on the time I may be able to join you
<Drewzar> Sweet thanks
<Drewzar> It's usually done Thursdays at 11P
<Ines_S> how about you open a ticket to start planning the maintenance
<banzaiman> sessions: I think it is actually using mysql on the app master.
<banzaiman> app_master i-b251d5dd ~ # lsof | grep emcee | grep mysql
<banzaiman> rubyee 8271 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
<banzaiman> rubyee 11435 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
[15:49]<Ines_S> we can document what needs to be done
<banzaiman> rubyee 11436 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
<banzaiman> rubyee 11438 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
<banzaiman> rubyee 11446 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
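[Editor's note: the lsof check works because a loaded C extension shows up as a memory-mapped REG entry for every process that loaded it; a hedged variant that asks a single worker directly, with the process pattern assumed:]
    lsof -p "$(pgrep -f 'unicorn worker' | head -n 1)" | grep -i mysql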
<Ines_S> and have someone handy to assist if needed
<Drewzar> Ines_S: I'll open a ticket once we have a date set
<Ines_S> ok guys I'm stepping out
<Ines_S> cool Drewzar
<Drewzar> Thanks so much :)
<Ines_S> nice to meet you all
<Ines_S> welcome