User:Icyflame/Incident Report template
yyyy-mm-dd: Incident Title
Impact
Eg: Downtime of 3 minutes / Visual Editor could not be used for 4 hours / etc
Trigger
Eg: Release of PR 49
Detection
Eg: Realized that Visual Editor was not working during post-release testing on wiki.metakgp.org
Timeline
Notes:
- Dates and times must always be entered in India Standard Time (UTC +5:30)
- Event (column 3) must be written in the present tense
Date | Time | Event | Notes |
---|---|---|---|
2019-07-17 | 5:58 | Release of PR 49 to use networks instead of links begins | links is a deprecated Docker feature, and we wanted to move from it to the recommended replacement: networks |
2019-07-17 | 6:15 | [INCIDENT BEGINS] Visual editor becomes unusable | Error shown on the browser: parsoid could not connect to wiki |
2019-07-17 | 7:00 | [INCIDENT MITIGATED] Release of PR 51 to put the parsoid and nginx containers on the same network is complete | |
2019-07-17 | 7:05 | [INCIDENT ENDS] Visual editor is usable again | Verified both as an anonymous user and as a logged-in user, with and without captcha |
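Since the timeline notes ask for all dates and times in India Standard Time, a small helper can save manual conversion when the source timestamps come from logs. The following is a minimal sketch, assuming the log timestamps are in UTC; the function name `timeline_row` and the example event text are hypothetical and only illustrate the row layout used in the table above.

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is a fixed offset of UTC+5:30 with no daylight saving.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def timeline_row(utc_time: datetime, event: str, notes: str = "") -> str:
    """Format one timeline row with the date and time converted to IST."""
    if utc_time.tzinfo is None:
        # Assumption: naive timestamps (e.g. copied from server logs) are UTC.
        utc_time = utc_time.replace(tzinfo=timezone.utc)
    local = utc_time.astimezone(IST)
    cells = [f"{local:%Y-%m-%d}", f"{local:%H:%M}", event, notes]
    # Trailing whitespace is trimmed so an empty Notes cell renders as "| |".
    return " | ".join(cells).rstrip() + " |"


if __name__ == "__main__":
    print(timeline_row(datetime(2019, 7, 17, 0, 28), "Release of PR 49 begins"))
```

Note that this sketch zero-pads the hour (24-hour clock), and that the date can roll over during conversion, so the Date column should be taken from the converted value rather than from the original log line.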
Incident Analysis
What went well? | What went wrong? | Where did we get lucky? |
---|---|---|
Both Vikrant and Shivam were online, and they immediately jumped into the issue and started looking for ways to solve it | We didn't know about the maintenance script runJobs.php or the MediaWiki job queue | The release started around 3 am IST, which is an extremely low-traffic time for us |
Notes / Discussion
Links to related documentation / steps that could have been taken to mitigate the problem