Created July 2, 2012 17:42
IRC chat log from June 30, 2012, about getting the logins and RDS instances back up and running, with Steve W, Alex C and Brendan K.
<stevewebb> Hey.
<beekermememe> Hey
<beekermememe> curl -v https://secure.dishonline.com:4444/secure_network_authentication/A21FF00E85C8666CE044001A4B0AA2BC
<beekermememe> * About to connect() to secure.dishonline.com port 4444
<beekermememe> * Trying 23.21.52.20... connected
<beekermememe> * Connected to secure.dishonline.com (23.21.52.20) port 4444
<beekermememe> * successfully set certificate verify locations:
<beekermememe> * CAfile: /etc/pki/tls/certs/ca-bundle.crt
<beekermememe> CApath: none
<beekermememe> * SSLv2, Client hello (1):
<stevewebb> Yea, I see that.
<beekermememe> Unknown SSL protocol error in connection to secure.dishonline.com:4444
<beekermememe> * Closing connection #0
<beekermememe> curl: (35) Unknown SSL protocol error in connection to secure.dishonline.com:4444
<beekermememe> Do we need to recopy over the certs?
<stevewebb> no, the servers each respond correctly. I think you're right. It is ELB.
<stevewebb> I'll look into it.
<stevewebb> Ok, I forced the ELB to re-load the configuration and now I can get a good response back from the ELB every time.
<beekermememe> great
<beekermememe> okay logging in looks good at the moment, wait and see?
<stevewebb> pingdom just email'ed that the site is down.
<beekermememe> yep, give it a second to recover
<stevewebb> Ok.
<stevewebb> curl https://secure.dishonline.com:4444/secure_network_authentication/A21FF00E85C8666CE044001A4B0AA2BC
<stevewebb> is working reliably for me now.
<Alex> same for me
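Editor's sketch, not part of the chat: "working reliably" above was judged by re-running curl by hand. One way to make that check repeatable is to hit the endpoint several times and count failures. The `probe` function, its `fetch` parameter, and the attempt count are hypothetical names chosen for illustration; nothing here is taken from the tooling the team actually used.

```python
import ssl
import urllib.error
import urllib.request


def _fetch_status(url, timeout):
    """Default fetcher: perform one GET and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe(url, attempts=10, timeout=5.0, fetch=None):
    """Request `url` `attempts` times; return (success_count, failures).

    `fetch` is a hypothetical injection point: any callable taking
    (url, timeout) that returns a status code or raises on error.
    `failures` is a list of (attempt_index, description) tuples.
    """
    fetch = fetch or _fetch_status
    successes = 0
    failures = []
    for i in range(attempts):
        try:
            status = fetch(url, timeout)
        except (urllib.error.URLError, ssl.SSLError, OSError) as exc:
            # TLS handshake errors and refused connections land here.
            failures.append((i, repr(exc)))
            continue
        if 200 <= status < 400:
            successes += 1
        else:
            failures.append((i, "HTTP %d" % status))
    return successes, failures
```

Pointed at a load-balanced URL, an intermittent failure pattern (some attempts fine, some failing the handshake) would be consistent with one unhealthy node behind the balancer, while the backends answering correctly on their own points back at the balancer itself, as suspected in the log.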
<beekermememe> Looks like networks is having trouble hitting the DB
<beekermememe> Ahhh back :)
<stevewebb> pingdom is back
<beekermememe> Now I think we may start to keep an eye on the DB loading
<stevewebb> The RDS instance x-prod-slave3 is being re-spun now
<beekermememe> I'll pop on and monitor
<stevewebb> x-prod-slave4 is next.
<stevewebb> Then we're done re-spinning RDS instances
<stevewebb> Although nagios thinks prod-dishonline-rds-slave-app1a
<stevewebb> is not replicating
<stevewebb> which I did re-spin this morning.
<beekermememe> ok things are going to be a little flaky until the master stops in the replications
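Editor's sketch, not part of the chat: the log does not say how nagios decides a replica "is not replicating". Assuming the RDS instances are MySQL (an assumption; the log never names the engine), a check of this kind usually inspects the row returned by `SHOW SLAVE STATUS`. The function below only evaluates such a row passed in as a dict; `replication_health` and the `max_lag_seconds` threshold are hypothetical.

```python
def replication_health(row, max_lag_seconds=300):
    """Classify one SHOW SLAVE STATUS row (as a dict) as healthy or not.

    Returns (ok, reason). `row` is None when the server returns no row,
    which means it is not configured as a replica at all.
    """
    if row is None:
        return False, "no replication status row"
    if row.get("Slave_IO_Running") != "Yes":
        return False, "IO thread not running"
    if row.get("Slave_SQL_Running") != "Yes":
        return False, "SQL thread not running"
    lag = row.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when replication is stopped or broken.
        return False, "lag unknown"
    if int(lag) > max_lag_seconds:
        return False, "lagging by %d seconds" % int(lag)
    return True, "replicating"
```

A freshly re-spun replica can plausibly fail a check like this for a while, either because its threads have not started yet or because it is still catching up to the master.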
<Alex> starting to recover
<beekermememe> yep, db connections are looking more normal
<stevewebb> Are logins working well now?
<Alex> I was able to login
<Alex> ^ pretty reliably
<beekermememe> same here 3 different logins
<stevewebb> ok, email sent.
<beekermememe> guide is loading ok here
<Alex> cool, I think the only open issue is DJ trying to connect to the rebooting RDS instance.
<beekermememe> k
<stevewebb> x-prod-slave-app3 is being built now.
<beekermememe> apdex is looking better now
<stevewebb> Excellent.
<stevewebb> I figured it'd take a while to get everything working again.
<stevewebb> When Amazon goes down, it goes down hard sometimes.
<Alex> yeah they must have had a major power failure
<stevewebb> Two power failures on the same coast in the same month. That's gotta hurt their reputation a little bit.
<beekermememe> yep, everyone got hit
<beekermememe> I think, we'll still see a higher error rate until all the DB's are back
<beekermememe> the ruby gem tries the ones marked as failed every so ofter
<beekermememe> often
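Editor's sketch, not part of the chat: the Ruby gem mentioned above is not named in the log, so the following is only an illustration of the general behaviour described, written in Python rather than Ruby. Hosts marked as failed are skipped, then tried again once a retry interval has passed. `ReplicaPool`, `retry_after`, and the injected `clock` are all hypothetical.

```python
import time


class ReplicaPool:
    """Round-robin over replicas, skipping recently failed ones."""

    def __init__(self, hosts, retry_after=30.0, clock=time.monotonic):
        self._hosts = list(hosts)
        self._retry_after = retry_after
        self._clock = clock
        self._failed_at = {}  # host -> time it was last marked failed
        self._next = 0

    def mark_failed(self, host):
        self._failed_at[host] = self._clock()

    def mark_ok(self, host):
        self._failed_at.pop(host, None)

    def _eligible(self, host):
        failed = self._failed_at.get(host)
        return failed is None or self._clock() - failed >= self._retry_after

    def pick(self):
        """Return the next eligible host, or None if all are blacklisted."""
        n = len(self._hosts)
        for offset in range(n):
            host = self._hosts[(self._next + offset) % n]
            if self._eligible(host):
                self._next = (self._next + offset + 1) % n
                return host
        return None
```

Under a scheme like this, every retry of a replica that is still down costs one failed request before the host is blacklisted again, which would fit the expectation of a higher error rate until all the databases are back.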
<beekermememe> So are we just waiting for DB's to get respun?
<stevewebb> yea, I believe so. They take a while.
<beekermememe> yep, makes sense since everyone else is doing the same thing
<beekermememe> I am going to pop off and get a grill (been meaning to for the last 4 weeks) :)
<Alex> noice!
<Alex> manly saturday
<beekermememe> Hopefully AWS can get everything back, they really don't handle the cascading effect too well, that is the one thing I guess we need to keep an eye on
<beekermememe> for today
<stevewebb> starting the re-spin of RDS prod-slave-app04 instance