Created January 23, 2014 06:12
Timeline of events for 1599 cooling outage. Pasted from the notes I started the day after the outage.
======================== F, 20140103: ========================
- 4:47p - email from hippostore: ERROR - Battery temperature is too high
- 4:57p - email from hippostore: ERROR - Enclosure temperature too high: encl=0, temperature sensor=2
- 4:58p - email from hippostore: ERROR - Enclosure temperature too high: encl=0, temperature sensor=3
- 5:16p - email from hippostore: ERROR - Enclosure temperature sensor error: encl=0, temperature sensor=0
- 5:16p - email from hippostore: ERROR - Enclosure temperature too high: encl=0, temperature sensor=1
- 5:16p - email from hippostore: ERROR - Enclosure temperature sensor error: encl=0, temperature sensor=1
- 5:30p - email from Emory: Power outage at 1599 clifton rd COLO server room (cooling systems impacted)
- 5:33p - email from hippostore: ERROR - Battery charging fault
- 5:38p - I email Keith and depart for 1599: True positive alarms: all cooling died in 1599...on my way in a sec
- 5:42p - email from Dan: his servers hot, wants to be sure I saw Emory's 5:30 announcement
- 5:50p - I arrive at 1599
- 5:52p - email from Keith: hippostore RAID controller is 53 C, RAID battery failed. Drive temps are 55 degrees. Keith shuts down remotely.
- 6:07p - I push the power button on all qballs (Keith and I can't get them to respond otherwise). Greg Keys doesn't know who owns the still-active equipment below ours.
- 6:22p - portable blowers start arriving
- 6:30p - I call Gopi: someone thinks equipment below us is CSI/BITC
- 6:45p - Lei from CSI/BITC arrives and starts shutting down the equipment below ours
- 6:49p - 119 F ambient according to free-standing probe in rack G20 (low-density, coolest area of room)
- 7:51p - Cooling units repaired. Bringing our equipment back on line. No visible external alarms.
- 8:20p - I emailed update to VA imagers
- 9:00p - email from Emory: Service Restored Date/Time: 2014-01-03 09:00 PM EST
- 9:00p - I leave 1599