The Netflix outage caused by Amazon Web Services was the result of accidental data deletion, according to an official statement released by Amazon’s web services team.
Netflix customers were outraged when, for several hours on Christmas Eve, they lost access to the service.
Amazon revealed that the issue arose when one of a very small number of developers with access to AWS’s Elastic Load Balancing (ELB) service deleted a portion of the ELB state data. Once that data was gone, the ELB control plane experienced high latency, which in turn raised error rates for the API calls used to manage ELB load balancers.
When developers at Amazon realized that numerous API calls were being made, they attempted to find and correct the API errors.
According to Amazon, it was hours into the fix before the technical team realized that the problem stemmed from the ELB data and not from the API calls, as initially suspected. The team recognized the real culprit only after turning its attention to the system’s load balancers.
Amazon says it has now put various protections in place to ensure that ELB state data and other important data cannot be easily deleted. According to the company, the system now guards against accidental modification by requiring specific Change Management approval.
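Amazon's statement does not describe how the approval requirement is enforced. As a purely illustrative sketch of the general idea, a destructive operation can refuse to run unless it carries a previously approved change ID. Every name here (`StateStore`, `ApprovalRequired`, `approve`, the `change_id` parameter) is hypothetical and is not part of any AWS API.

```python
from dataclasses import dataclass, field
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when a protected change is attempted without approval."""


@dataclass
class StateStore:
    # Hypothetical in-memory stand-in for protected state data.
    records: dict = field(default_factory=dict)
    approved_changes: set = field(default_factory=set)

    def approve(self, change_id: str) -> None:
        # In a real process this step would be a human or workflow sign-off.
        self.approved_changes.add(change_id)

    def delete(self, key: str, change_id: Optional[str] = None) -> None:
        # Refuse any deletion that does not cite an approved change.
        if change_id not in self.approved_changes:
            raise ApprovalRequired(f"deleting {key!r} needs an approved change ID")
        del self.records[key]
        # Each approval covers a single deletion, then expires.
        self.approved_changes.discard(change_id)
```

In this sketch, calling `delete` with no change ID, or with one that was never approved, raises `ApprovalRequired` and leaves the data untouched, so a mistaken command fails instead of removing state.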
The company has also modified its data recovery process so that it can recover from similar problems more efficiently in the future.
Amazon has since apologized to Netflix for any problems its data error caused for the large-scale video streaming service.