@Drewzar
Created July 3, 2012 01:08
[14:38]*** Ines_S ([email protected]) has joined channel #ey-vitrue
<Ines_S> hello
<banzaiman> sessions: no resque processes running on this one.
[14:40]<banzaiman> sessions: is there one that has resque running? I checked resque10, resque8 and resque9, but I don't see any resque process yet.
[14:41]<banzaiman> sessions: so, this failure to reload unicorn must be coming from somewhere else.
<banzaiman> deploy error is somehow getting stuck on resque9.
[14:43]<Drewzar> Hey Ines_S
[14:44]<Drewzar> LuVitrue: is also here with us.
<Ines_S> hey Drewzar
<Drewzar> Were you able to find anything?
[14:45]<banzaiman> sessions: so…
<Ines_S> going to fetch your chef recipes, you guys have an even more custom cluster than we normally find
<banzaiman> sessions: i-c0ef0ca0 is the one that had the issue.
<banzaiman> sessions: how is resque supposed to be started on this? monit is not getting set up, it seems.
<sessions> banzaiman yes, all the nodes labeled resque should have resque running
[14:46]<sessions> yeah, it should be monit
[14:49]<banzaiman> sessions: hmm.
[14:50]<banzaiman> resque10, resque12, resque9, resque11, resque13, resque8, resque3 are not running any resque process.
[14:51]<sessions> hmm... that could be a problem
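[Editor's note: a minimal sketch for verifying monit-managed resque workers on one of these nodes; the node name and monit config path are assumptions, not from the log.]
    ssh resque9 'monit summary'           # list the services monit is watching
    ssh resque9 'ps aux | grep [r]esque'  # confirm a worker process actually exists
    # If monit shows no resque entries, the recipe that writes the
    # /etc/monit.d/ resque config likely never ran on this node.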
<Drewzar> Ines_S: Yea, sorry I didn't build that env, otherwise I'd be more help =\
<Ines_S> no worries, it will just take a little time for me to understand what you guys were trying to do
[14:52]<Ines_S> the last time I saw your environment it was just a 6 node replica set
<sessions> banzaiman i guess i'll need to deploy... there is no current directory
<Ines_S> btw did you get a chance to look at your production cluster?
<Ines_S> looks like it's missing a config server
[14:53]<Drewzar> I haven't looked into that, but I'm pretty sure the app is working.
<Drewzar> (as no one is screaming) :)
<sessions> banzaiman deploying now to see if it drops the current directory on those nodes
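[Editor's note: in Engine Yard's Capistrano-style layout, "current" is a symlink into releases/ created only by a successful deploy, which is why a missing current directory suggests the node has never been deployed to. A quick check, with the app path assumed:]
    ls -l /data/emcee/current            # should point at /data/emcee/releases/<timestamp>
    ls /data/emcee/releases | tail -n 3  # most recent releases, if any exist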
<LuVitrue> On staging, I don't remember us having a standalone config server.
[14:56]<Ines_S> Drewzar: the app is working because the shards can be written to/read from
<Ines_S> but if you need to rebalance
<Ines_S> the operation will fail
[14:57]<Drewzar> So we need another config server in prod?
<Drewzar> Do we have to take the whole mongo cluster to do that?
<Drewzar> take it down*
[14:59]<LuVitrue> Looks like we are missing one config server on production, too.
<Ines_S> yeah
<Ines_S> you're missing config 2
<Ines_S> so the metadata in the cluster is read only
<Ines_S> things "look ok" but you need 3 for a fully functional cluster
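[Editor's note: a hedged sketch of the 2.0-era mongos invocation this implies; hostnames are hypothetical. mongos must list all three config servers, and all three must be reachable, or chunk metadata goes read-only and operations like rebalancing fail:]
    mongos --configdb cfg0.example.com:27019,cfg1.example.com:27019,cfg2.example.com:27019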
<Ines_S> so talk to me about your staging environment
[15:00]<Ines_S> my guess is that you are doing a shard consisting of a 2 node replica set
<Ines_S> is that correct?
<Ines_S> then you are missing an arbiter
<banzaiman> sessions: let me know how it goes
[15:01]<sessions> ok
[15:02]<sessions> banzaiman just an update: it's at the step right after the "permanently added...."
[15:03]<LuVitrue> Ines_S: you mean mongos?
<banzaiman> sessions: do you use engineyard gem?
[15:04]<sessions> yeah, but i used the dashboard in this case and am tailing the log on app-master
<banzaiman> ssh key warnings are mostly noise, so it may or may not be relevant.
<banzaiman> ok.
[15:05]<sessions> it's actually been a long time.... seems to be taking much, much longer than normal
[15:07]<banzaiman> sessions: it appears to fail at the same spot with the same unicorn master.
<sessions> hmm
[15:09]<banzaiman> not sure what's going on there.
<banzaiman> the master seems functional.
<sessions> yeah the app is fine
<LuVitrue> Not sure how it was setup, but an arbiter doesn't require a dedicated server,
[15:10]<LuVitrue> it's probably running from some other node?
[15:18]<Ines_S> but it's still a process
<Ines_S> that needs to be up
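[Editor's note: a minimal sketch of running an arbiter alongside an existing node; it needs no dedicated server, just a mongod process that can vote. The port, paths, and replica set name are assumptions:]
    mongod --replSet publisher0 --port 27021 --dbpath /db/mongodb/arbiter \
        --fork --logpath /var/log/mongodb/arbiter.log
    # then, from a mongo shell connected to the set's primary:
    #   > rs.addArb("mongo1:27021")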
[15:19]<Ines_S> requerying your instances for status
<sessions> banzaiman any ideas?
[15:20]<banzaiman> I'm looking at the unicorn masters that failed to reload at the moment
<banzaiman> there appear to be a few of those.
[15:21]<Ines_S> ohh so you had a shard consisting of only a master server
<Ines_S> crater i-9847e9f5 util [mongo1] ~ # cat /etc/mongodb/master.conf
<Ines_S> dbpath = /db/mongodb/master
<Ines_S> logpath = /var/log/mongodb/master.log
<Ines_S> logappend = true
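[Editor's note: the config above carries no replSet or shardsvr options, which fits a shard consisting of only a master server. A 2.0-era replica-set shard member's config would additionally carry something like the following; the set name is an assumption:]
    replSet = publisher0
    shardsvr = true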
[15:24]<Ines_S> I'm attempting to restart the mongod processes of your data nodes
<Ines_S> if this works your staging env may be ok
[15:27]<Ines_S> alright
<Ines_S> looks like it's up
<Ines_S> crater i-b05cf2dd util [mongo2] ~ # ps aux | grep mongo
<Ines_S> root 3540 0.0 0.7 89804 12924 ? Sl Jun30 0:02 /usr/local/mongodb/bin/mongod --configsvr --config /etc/mongodb/config.conf
<Ines_S> root 17406 0.0 1.6 174816 29648 pts/1 Sl 15:23 0:00 /usr/local/mongodb/bin/mongod --config /etc/mongodb/master.conf
<Ines_S> root 17468 0.0 0.0 1796 604 pts/1 S+ 15:26 0:00 grep --colour=auto mongo
<Ines_S> crater i-b05cf2dd util [mongo2] ~ # /usr/local/mongodb/bin/mongo
<Ines_S> MongoDB shell version: 2.0.0
<Ines_S> connecting to: test
<Ines_S> > show dbs;
<Ines_S> local 0.203125GB
<Ines_S> publisher 1.452880859375GB
<Ines_S> >
[15:28]<banzaiman> sessions: can I try restarting unicorn on one?
<LuVitrue> looks like it's just that mongod was down
[15:29]<Ines_S> yep
<Ines_S> try another deploy?
<sessions> banzaiman yes
<Ines_S> I'm updating the ticket
<LuVitrue> cool, thanks.
[15:31]<LuVitrue> Drewzar: we need to fix the production mongo servers
[15:32]<banzaiman> ah
<banzaiman> sessions:
<banzaiman> app i-1b41cf7e current # cat config/database.yml
<banzaiman> production:
<banzaiman> adapter: mysql2
<banzaiman> database: 'emcee'
<banzaiman> username: 'deploy'
<banzaiman> password: 'q2wC5vdJll'
<banzaiman> host: 'ec2-50-16-120-246.compute-1.amazonaws.com'
<banzaiman> reconnect: true
<banzaiman> but mysql2 is not available.
<banzaiman> your app is using mysql, it appears.
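[Editor's note: the adapter named in database.yml has to match a driver gem actually bundled with the app, which is the mismatch banzaiman is pointing at. A quick way to compare the two, with the app path assumed:]
    cd /data/emcee/current
    grep adapter config/database.yml   # e.g. "adapter: mysql2"
    grep mysql Gemfile Gemfile.lock    # the bundled driver gem must match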
<sessions> banzaiman... shouldn't we be using mysql instead of mysql2?
<LuVitrue> use mysql2
[15:33]<LuVitrue> We upgraded about two months ago.
<LuVitrue> wait,
<LuVitrue> this is for emcee?
<banzaiman> Gemfile is not reflecting that.
<banzaiman> yes, emcee.
<LuVitrue> nvmd
[15:34]<banzaiman> ok
<banzaiman> why are we choosing mysql2, I wonder…
<sessions> banzaiman i think this is a recent change
<banzaiman> sessions: on your part, or ours?
[15:35]<Drewzar> LuVitrue: Ines_S: so publisher staging is back up and running?
[15:36]<LuVitrue> at least the mongodb connection is okay now.
<sessions> yours
<banzaiman> forcing mysql restarts unicorn, it appears.
<Ines_S> yeah Drewzar, you need to confirm that you can deploy correctly though
<Drewzar> LuVitrue: okay, let's wait till sessions and banzaiman stop playing with prod before we start another thing.
[15:37]<LuVitrue> I am redeploying staging now.
[15:38]<Drewzar> okay, let us know :)
<sessions> banzaiman force restart rather than hot reload?
[15:39]<banzaiman> sessions: if the app's database.yml is messed up, I don't want to restart unicorn.
<sessions> i see
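[Editor's note: the distinction in play: unicorn's hot reload (SIGUSR2) re-execs a fresh master that re-reads the app and its database.yml while the old workers keep serving, whereas a hard stop/start drops in-flight requests. A sketch, with the pid path assumed:]
    kill -USR2 "$(cat /data/emcee/shared/pids/unicorn.pid)"  # hot reload: spawn new master
    # once the new master's workers are up, retire the old master:
    kill -QUIT <old_master_pid>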
<sessions> LuVitrue: do you see a problem with using the mysql2 gem with emcee?
[15:40]<banzaiman> sessions: and all of them, except the one that I just manually edited, has mysql2.
<LuVitrue> I haven't tested mysql2 with emcee.
<sessions> banzaiman strange, because the application is actually running
<sessions> on the app nodes
<LuVitrue> I thought you were talking about Publisher.
[15:41]<banzaiman> sessions: I am puzzled, too. :-S
[15:43]<sessions> banzaiman so it appears the app will work on mysql2
<sessions> actually i don't know
[15:44]<Drewzar> Want me to pull Eli in here?
<Ines_S> hey guys any questions regarding the mongo cluster on Publisher?
<sessions> seems as though the app is, though, b/c that's what's in the db.yml in the current directory
[15:45]<Ines_S> I may step out for a bit and work on a different ticket
<Drewzar> Ines_S: in order to add a config server we're going to need to reboot the mongo cluster, correct?
[15:47]<Ines_S> yeah, you will need to stop the entire cluster
<Ines_S> change the startup scripts
<Ines_S> here let me give you a link
<Drewzar> Okay we'll need to plan downtime for that.
<Ines_S> likely best done with a scheduled maintenance window
[15:48]<Drewzar> yeah
<Drewzar> I'll work with sessions on that.
<Ines_S> http://www.mongodb.org/display/DOCS/Changing+Config+Servers
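[Editor's note: a hedged outline of the procedure in the linked doc, 2.0-era, with hostnames assumed: stop every mongos and mongod, seed the new config server from an existing one's data directory, add it to every mongos --configdb string, then bring the cluster back up.]
    # on the new config host, after the whole cluster is stopped:
    rsync -a cfg0:/db/mongodb/config/ /db/mongodb/config/
    mongod --configsvr --config /etc/mongodb/config.conf
    # on every mongos host, restart with all three servers listed:
    mongos --configdb cfg0:27019,cfg1:27019,cfg2:27019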
<Ines_S> ok, depending on the time I may be able to join you
<Drewzar> Sweet thanks
<Drewzar> It's usually done Thursdays at 11P
<Ines_S> how about you open a ticket to start planning the maintenance
<banzaiman> sessions: I think it is actually using mysql on the app master.
<banzaiman> app_master i-b251d5dd ~ # lsof | grep emcee | grep mysql
<banzaiman> rubyee 8271 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
<banzaiman> rubyee 11435 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
[15:49]<Ines_S> we can document what needs to be done
<banzaiman> rubyee 11436 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
<banzaiman> rubyee 11438 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
<banzaiman> rubyee 11446 deploy mem REG 65,145 91230 165638 /data/emcee/shared/bundled_gems/ruby/1.8/gems/mysql-2.8.1/lib/mysql_api.so
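[Editor's note: the lsof check works because a loaded C extension shows up as a memory-mapped REG entry for every process that loaded it; a hedged variant that asks a single worker directly, with the process pattern assumed:]
    lsof -p "$(pgrep -f 'unicorn worker' | head -n 1)" | grep -i mysql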
<Ines_S> and have someone handy to assist if needed
<Drewzar> Ines_S: I'll open a ticket once we have a date set
<Ines_S> ok guys I'm stepping out
<Ines_S> cool Drewzar
<Drewzar> Thanks so much :)
<Ines_S> nice to meet you all
<Ines_S> welcome