The network paths on the Internet that section.io used to communicate with some hosting partners suffered a high degree of packet loss. This packet loss occurred on the backbone of Australia's Internet, outside the control of section.io.
This packet loss meant that section.io could not connect to the hosting partners effectively, which resulted in slow page load times and possibly broken page loads.
section.io routed traffic around the problematic provider until the problem at NSW-IX was resolved.
We were notified of the incident by a customer report that their site was not loading properly.
During the early investigation, we believed that this was a problem with the customer's hosting platform. We commenced troubleshooting and mitigation strategies to fix the problem for that specific customer.
After some time, we recognized that the problem was affecting multiple customers. We conducted a thorough review of our systems and found no problems.
We then collected a list of affected customers and worked to determine what those systems had in common. The affected customers all used hosting providers with data centers in NSW. Since the problem was affecting multiple data centers, and our own systems had shown no problems, neither section.io nor any individual hosting provider appeared to be at fault.
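The commonality check described above can be sketched in a few lines of Python. This is only an illustration and not a record of the tooling used during the incident: the customer names, provider names, and attribute keys are hypothetical, and the function simply intersects the attributes of every affected customer.

```python
def common_attributes(affected):
    """Return the (key, value) pairs shared by every affected customer."""
    attribute_sets = [set(attrs.items()) for attrs in affected.values()]
    if not attribute_sets:
        return set()
    return set.intersection(*attribute_sets)


# Hypothetical inventory of affected customers (illustrative values only).
affected = {
    "customer-a": {"provider": "host-1", "region": "NSW"},
    "customer-b": {"provider": "host-2", "region": "NSW"},
    "customer-c": {"provider": "host-3", "region": "NSW"},
}

shared = common_attributes(affected)
print(shared)
```

With this sample data, the only attribute left after the intersection is the region, which mirrors how the investigation narrowed from individual hosting providers to a shared location.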
We then examined the network paths that connect section.io's infrastructure, which runs on Amazon Web Services (AWS), to the customer data centers. All of the affected customers had data centers that connected to the Internet via an Internet Exchange called NSW-IX.
During our troubleshooting, we found that any traffic passing from AWS to a data center via the NSW-IX exchange point had problems.
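The reasoning in the previous two paragraphs amounts to comparing lossy paths with healthy ones and looking for a hop that appears only in the lossy set. The Python sketch below illustrates that idea under stated assumptions: the probe results, hop names, and the 5% loss threshold are all hypothetical, and the measurements taken during the incident are not reproduced here.

```python
def loss_rate(probe):
    """Fraction of packets lost for one probe (0.0 if nothing was sent)."""
    if probe["sent"] == 0:
        return 0.0
    return 1.0 - probe["received"] / probe["sent"]


def suspect_hops(probes, loss_threshold=0.05):
    """Hops present on every lossy path and on no healthy path."""
    lossy = [set(p["hops"]) for p in probes if loss_rate(p) > loss_threshold]
    healthy = [set(p["hops"]) for p in probes if loss_rate(p) <= loss_threshold]
    if not lossy:
        return set()
    on_every_lossy_path = set.intersection(*lossy)
    on_any_healthy_path = set().union(*healthy)
    return on_every_lossy_path - on_any_healthy_path


# Hypothetical probe results; hop names are placeholders, not real routers.
probes = [
    {"hops": ["aws-edge", "nsw-ix", "dc-1"], "sent": 100, "received": 62},
    {"hops": ["aws-edge", "nsw-ix", "dc-2"], "sent": 100, "received": 70},
    {"hops": ["aws-edge", "other-transit", "dc-3"], "sent": 100, "received": 100},
]

suspects = suspect_hops(probes)
print(suspects)
```

In this sample, `aws-edge` is excluded because it also sits on a healthy path, leaving the exchange point as the only hop common to all lossy paths.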
With this understood, we started to find alternative paths from section.io's network to reach the data centers. We implemented two strategies. First, we changed DNS records to bypass section.io while new paths were being established. Second, once the new paths were established, we made further DNS changes to route traffic onto them.
We then worked with NSW-IX and our hosting partners to validate that the problems were resolved. We are currently restoring normal service for all affected customers.
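The two-step DNS approach can be modelled as a small decision function. The sketch below is illustrative only: the phase names, hostnames, and the assumption that bypassing section.io means pointing a record directly at the origin are ours for the purpose of the example, and no real DNS API is called.

```python
from enum import Enum


class Phase(Enum):
    NORMAL = "normal"        # traffic flows through the usual path
    BYPASS = "bypass"        # step one: point DNS straight at the origin
    ALTERNATE = "alternate"  # step two: point DNS at the new path


def choose_phase(incident_active, alternate_path_ready):
    """Pick the DNS phase for a site based on the incident state."""
    if not incident_active:
        return Phase.NORMAL
    return Phase.ALTERNATE if alternate_path_ready else Phase.BYPASS


# Hypothetical record targets for one site (placeholder hostnames).
targets = {
    Phase.NORMAL: "proxy.example.net",
    Phase.BYPASS: "origin.example.com",
    Phase.ALTERNATE: "proxy-alt.example.net",
}

phase = choose_phase(incident_active=True, alternate_path_ready=False)
print(phase.value, targets[phase])
```

Keeping the phase decision separate from the record targets means the same logic can be applied to each affected site with its own hostnames.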
It is important to note that this was a low-level problem that originated in the backbone of the Internet in NSW. section.io is not the only system that relies upon NSW-IX, which means that websites not hosted on section.io were also affected by the problem.
section.io has undertaken improvements in the following areas: