TodaysMeet has had a few unexpected issues in the past month, and I want to let you know what I'm doing about them.
The interruptions have had a number of unrelated causes, which means they must be dealt with in a number of different ways.
Last week's catastrophic failure was probably unpreventable. Failures of this magnitude are extremely rare. However, the downtime could have been much shorter, and most of the data loss could have been prevented.
I am putting other work on hold to make sure that data is backed up continuously instead of periodically. With periodic backups, anything saved between two backups can be lost. With continuous backups, data can be restored quickly and with little or no loss. The backups will be completely independent of TodaysMeet's main infrastructure. No one provider should be a single point of failure.
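To make the difference concrete, here is a minimal sketch in Python. It is an illustration only, not TodaysMeet's actual backup code: the `Write` type, the function names, the one-hour backup interval, and the one-second replication lag are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Write:
    timestamp: float  # seconds since an arbitrary starting point
    payload: str


def lost_with_periodic(writes, backup_interval, failure_time):
    """Return writes lost when backups only run every `backup_interval` seconds.

    Assumes a backup completes exactly on each interval boundary, so anything
    written after the most recent boundary is gone.
    """
    last_backup = (failure_time // backup_interval) * backup_interval
    return [w for w in writes if last_backup < w.timestamp <= failure_time]


def lost_with_continuous(writes, replication_lag, failure_time):
    """Return writes lost when every write is copied within `replication_lag` seconds.

    Only writes that were still in flight at the moment of failure are lost.
    """
    cutoff = failure_time - replication_lag
    return [w for w in writes if cutoff < w.timestamp <= failure_time]


# Hypothetical activity: five messages, then a failure at t = 7000 seconds.
writes = [Write(t, f"message at {t}") for t in (100, 3500, 3700, 6000, 6999.5)]
periodic = lost_with_periodic(writes, backup_interval=3600, failure_time=7000)
continuous = lost_with_continuous(writes, replication_lag=1.0, failure_time=7000)
print(len(periodic), "writes lost with hourly backups")
print(len(continuous), "writes lost with continuous backups")
```

In this made-up scenario, the hourly schedule loses everything written after the last backup, while the continuous approach loses only what had not yet been copied when the failure hit.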
TodaysMeet's primary datacenter provider experienced a number of Distributed Denial of Service (DDoS) attacks at the end of December and beginning of January. TodaysMeet was not specifically targeted, but was still affected.
Since TodaysMeet is all about real-time communication, the common defenses, like Content Delivery Networks (CDNs), don't help in cases like this.
In the short term, TodaysMeet will be able to take advantage of the continuous backups in the event of an extended DDoS attack against our provider. It won't be instant, but if an attack seems likely to continue impacting the site, it will be possible to quickly set up infrastructure in a secondary provider to temporarily provide service. I'm also looking at spreading things like DNS services across multiple providers to keep them from failing together.
In the long term, as TodaysMeet evolves, agility and robustness in the face of these sorts of attacks are explicit goals.
There have been a handful of brief, partial interruptions over the past few months. Most of these have recovered by themselves, and many have not left enough data behind to fully diagnose and fix the issues.
I'm taking time to improve the monitoring and alerting tools I have in place so that I can restore service more quickly and collect enough information to diagnose and fix the underlying issues.
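As a rough illustration of what this kind of monitoring involves, here is a small Python sketch. It is not the tooling TodaysMeet actually uses: the `CheckResult` and `Monitor` names, the threshold of three failures, and the 100-entry history are assumptions for the example. The idea is to keep a record of every health check, so that evidence survives even when a problem clears up by itself, and to raise an alert only after several failures in a row.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    ok: bool
    latency_ms: float
    detail: str = ""  # e.g. an error message or status code, kept for diagnosis


@dataclass
class Monitor:
    failure_threshold: int = 3
    # Keep the most recent results so a self-healing outage still leaves evidence.
    history: deque = field(default_factory=lambda: deque(maxlen=100))
    consecutive_failures: int = 0

    def record(self, result: CheckResult) -> bool:
        """Store a check result; return True exactly when an alert should fire."""
        self.history.append(result)
        if result.ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Alert once, at the moment the threshold is reached, not on every failure.
        return self.consecutive_failures == self.failure_threshold

    def recent_failures(self):
        """Return the stored failing results for later diagnosis."""
        return [r for r in self.history if not r.ok]
```

Requiring several consecutive failures avoids alerting on a single slow response, while the stored history helps with the brief interruptions that recover before anyone can look at them.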
Over the past few days, this blog has been slow and occasionally completely unavailable. It's critical that I always have options for communicating with you. That's why I have external uptime, status, and social media pages.
Now this blog is also running on externally hosted infrastructure. While it was already independent of TodaysMeet's main site, now it is also backed by an external team. I'm happy to pay someone else to make sure the blog is available and fast, so I can concentrate on making TodaysMeet itself better and more reliable.
I am so privileged and thankful that so many teachers have made TodaysMeet part of their classrooms, and that comes with responsibility that I take very seriously.
Thank you for trusting TodaysMeet.