Downtime Report
This page covers any downtime of the services that I provide. Each report lists the time, affected services, reason/cause, actions taken, and remedial actions. I will do my best to publish a report as soon as possible after a downtime event takes place. For now, all times are accurate only to within about 5 minutes, because I use Uptime Robot and it checks every 5 minutes.
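To make the 5-minute margin concrete, here is a minimal Python sketch of how a reported outage length and its worst case could be worked out. The function name `downtime_window` and the constant `CHECK_INTERVAL` are made up for illustration, and the sketch assumes the outage could have started at any point after the previous successful check. The sample times are the ones from the March 3rd report.

```python
from datetime import datetime, timedelta

# Polling interval of the monitor, as stated above (5 minutes).
CHECK_INTERVAL = timedelta(minutes=5)


def downtime_window(day: str, down_seen: str, up_seen: str):
    """Return (reported, worst_case) downtime as timedeltas.

    day is "YYYY-MM-DD"; down_seen and up_seen are "HH:MM" UTC
    times on that same day.
    """
    fmt = "%Y-%m-%d %H:%M"
    down = datetime.strptime(f"{day} {down_seen}", fmt)
    up = datetime.strptime(f"{day} {up_seen}", fmt)
    reported = up - down
    # Assumption: the outage may have begun right after the previous
    # successful check, so it can be up to one interval longer.
    return reported, reported + CHECK_INTERVAL


reported, worst_case = downtime_window("2019-03-03", "15:21", "15:38")
print(reported, worst_case)  # 0:17:00 0:22:00
```

The worst case only widens the start of the outage; the "up" time here comes from a manual check, so it is taken as exact.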
Downtime on March 3rd, 2019
- Time:
- Down: 15:21 UTC (~ 5 minutes)
- Up: 15:38 UTC (Manual Check). Reported back up at 15:40 UTC by Uptime Robot.
- Total Time: 17 minutes down
- Affected Services: blog, failover webserver, hidden service web site, tor exit.
- Reason/Cause: I believe the cause was a program I was running to generate a custom onion v3 address, which is CPU heavy. This left nginx without the resources it needed to serve web content. When I tried to connect over SSH and VNC to take a look, SSH would not connect, and VNC would not accept any keyboard input, so I could not log in. This led me to believe that the server was overloaded.
- Actions Taken: I first tried to perform a graceful shutdown of the server. However, after a few minutes the server still had not shut down, so I forced a power off. This brought the server offline. Once it booted back up, I was able to log in and the services came back online.
- Remedial Actions: I am going to limit the number of threads that the generating program is allowed to use. If it causes downtime again, I will abandon it and find another way to generate an address.
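Limiting threads depends on the options of the generating program itself, which this report does not name. As a different, more general technique, a CPU-heavy job can also be started at a lower scheduling priority and pinned to a subset of cores, so that nginx and SSH keep some CPU for themselves. The Python sketch below shows that idea under some assumptions: it is Linux-only (`os.sched_setaffinity`), the helper name `run_limited` is hypothetical, and the command it runs in the demo is just a probe, not the real generator.

```python
import os
import subprocess
import sys


def run_limited(cmd, niceness=10, cpus=None, **popen_kwargs):
    """Start cmd at lower priority, optionally pinned to given CPU cores.

    Linux-only sketch: os.sched_setaffinity is not available everywhere.
    """
    def limit():
        # Runs in the child process just before the command is executed.
        os.nice(niceness)                  # raise niceness = lower priority
        if cpus is not None:
            os.sched_setaffinity(0, cpus)  # 0 means "this process"

    return subprocess.Popen(cmd, preexec_fn=limit, **popen_kwargs)


if __name__ == "__main__":
    # Demo: pin a small probe command to one allowed core and have it
    # print its own niceness and CPU affinity.
    one_core = {min(os.sched_getaffinity(0))}
    probe = [
        sys.executable, "-c",
        "import os; print(os.nice(0), sorted(os.sched_getaffinity(0)))",
    ]
    proc = run_limited(probe, niceness=10, cpus=one_core,
                       stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()
    print(out.strip())
```

A lower priority only helps when other processes are competing for the CPU; pinning to fewer cores is the part that actually leaves headroom on a multi-core server.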
Downtime on March 4th, 2019
- Time:
- Down: 10:32 UTC (~ 5 minutes)
- Up: 13:28 UTC (Manual Check). Reported back up at 13:29 UTC by Uptime Robot.
- Total Time: 2 hours and 55 minutes. (I was asleep when it went down)
- Affected Services: blog, failover webserver, hidden service web site, tor exit.
- Reason/Cause: I suspect that the program running to generate a custom v3 address caused CPU overload. It had the same symptoms as the previous downtime on March 3rd.
- Actions Taken: I forced a restart because it would not respond otherwise.
- Remedial Actions: I am abandoning the attempt to generate an address, as it is only causing issues. I will see if I can find another way to generate an address on less critical infrastructure.
Downtime on March 6th, 2019
- Time:
- Down: 9:33 UTC (~ 5 minutes)
- Up: 13:48 UTC (Manual Check). Reported back up at 13:48 UTC by Uptime Robot.
- Total Time: 4 hours and 14 minutes. (I was asleep when it went down)
- Affected Services: blog, failover webserver, hidden service web site, tor exit.
- Reason/Cause: It turns out that I was wrong about the program generating a v3 address. It was actually tor that was taking about 95% of the CPU, specifically the process running the tor exit node.
- Actions Taken: I forced a restart because it would not respond otherwise.
- Remedial Actions: For now I am stopping my tor exit node so that I can research a way to limit the resources it can use. Hopefully I can then continue to run it.