User:Icyflame/Incident Report template

From Metakgp Wiki

yyyy-mm-dd: Incident Title

Impact

E.g.: Downtime of 3 minutes / Visual Editor could not be used for 4 hours / etc.

Trigger

E.g.: Release of PR 49

Detection

E.g.: Realized that the Visual Editor was not working during post-release testing on wiki.metakgp.org

Timeline

Notes:

  • Dates and times must always be entered in India Standard Time (UTC+05:30)
  • Event (column 3) must be written in the present tense
Date | Time | Event | Notes
2019-07-17 | 5:58 | Release of PR 49 (use networks instead of links) begins | links is a deprecated Docker feature, and we wanted to move to its recommended replacement: networks
2019-07-17 | 6:15 | [INCIDENT BEGINS] Visual Editor becomes unusable | Error shown in the browser: "parsoid could not connect to wiki"
2019-07-17 | 7:00 | [INCIDENT MITIGATED] Release of PR 51 (put the parsoid and nginx containers on the same network) completes |
2019-07-17 | 7:05 | [INCIDENT ENDS] Visual Editor is usable again | Verified as both an anonymous user and a logged-in user, with and without CAPTCHA

Incident Analysis

What went well?
  • Both Vikrant and Shivam were online; they immediately jumped into the issue and started looking for ways to solve it.
What went wrong?
  • We didn't know about the maintenance script runJobs.php or the MediaWiki job queue.
Where did we get lucky?
  • The release started around 3 am IST, which is an extremely low-traffic time for us.

Notes / Discussion

Links to related documentation, and steps that could have been taken to mitigate the problem