
'No BS Host' Gandi Reports Major Incident In Hosting Infrastructure

Fri, 10th January 2020, 12:27

Around 300 clients of France-based web host Gandi.net are not impressed by a data-destroying storage meltdown on January 8th. The hosting provider disclosed in a post about the incident that it had lost some customer data after a ZFS storage box in Luxembourg broke down and had to be replaced using a backup. Efforts to restore the data, however, failed, and there were no snapshots available to recover from.

The storage unit became unavailable, prompting an interruption in service for all PaaS and IaaS services using the disk associated with that unit.

We followed the established procedures: move the control of data to an emergency machine. We inform customers impacted by the incident by email.

The data import on the emergency machine was not possible due to a corruption of the meta-data that we are not aware of the cause of.

The Gandi post went on to say it was conducting a full postmortem of the incident that would yield further details on what exactly went wrong. Its techies are still trying to recover the lost data, and thus far have had no luck.

While the loss of data without a viable backup was bad enough, President and CEO Stephan Ramoin's intervention in a Twitter thread between angry clients and the company did nothing to quell the fray.

Gandi CEO Stephan Ramoin: clients are responsible for backing up data

True but when the client does that and follows the recommendations from Gandi...

Client responds that Gandi should place backups on Gandi servers

Ouch and touché in the same sentence! The company's documentation clearly shows that users can create backup copies of volumes on a Gandi Cloud server via Snapshots on a scheduled basis.

A Gandi spokesperson is now stating:

"We now have some hope that we may recover the data but as we can't confirm it at the moment, customers who needed or need an immediate recovery should use their own backups, as was our initial recommendation."

And how to prevent this from happening again?

"We will certainly give a lot of thought to this question when we complete a full post mortem of the incident, but at this moment our teams are all still focused on restoring customer data.

"In particular, we'll be looking at what improvements can be made to our recovery time, our documentation, and our communications."

Affected individuals can further vent by leaving a review of their experiences here.