How an ‘Unprecedented’ Google Cloud Event Wiped Out a Major Customer’s Account
“[A]ccording to UniSuper’s incident log, downtime started May 2, and a full restoration of services didn’t happen until May 15.”
UniSuper, an Australian pension fund that manages $135 billion worth of funds and has 647,000 members, had its entire account wiped out at Google Cloud, including all its backups that were stored on the service… UniSuper’s website is now full of must-read admin nightmare fuel about how this all happened. First is a wild page posted on May 8 titled “A joint statement from UniSuper CEO Peter Chun, and Google Cloud CEO, Thomas Kurian….” Google Cloud is supposed to have safeguards that don’t allow account deletion, but apparently none of them worked, and the only option was a restore from a separate cloud provider (shoutout to the hero at UniSuper who chose a multi-cloud solution)… The many stakeholders in the service meant service restoration wasn’t just about restoring backups but also about processing all the requests and payments that still needed to happen during the two weeks of downtime.
The second must-read document in this whole saga is the outage update page, which contains 12 statements as the cloud devs worked through this catastrophe. The first update is May 2 with the ominous statement, “You may be aware of a service disruption affecting UniSuper’s systems….” Seven days after the outage, on May 9, we saw the first signs of life again for UniSuper. Logins started working for “online UniSuper accounts” (I think that only means the website), but the outage page noted that “account balances shown may not reflect transactions which have not yet been processed due to the outage….” May 13 is the first mention of the mobile app beginning to work again. This update noted that balances still weren’t up to date and that “We are processing transactions as quickly as we can.” The last update, on May 15, states, “UniSuper can confirm that all member-facing services have been fully restored, with our retirement calculators now available again.”
The joint statement and the outage updates are still not a technical post-mortem of what happened, and it’s unclear if we’ll get one. Google PR confirmed in multiple places that it signed off on the statement, but a great breakdown from software developer Daniel Compton points out that the statement is not just vague but also full of terminology that doesn’t align with Google Cloud products. The imprecise language makes it seem like the statement was written entirely by UniSuper.
Thanks to long-time Slashdot reader swm for sharing the news.