October 10, 2009
...Data Rot
Sergey Brin wrote a great op-ed in the NYT regarding the Google Books settlement and the concept of a library that will last centuries. Here's the link: http://bit.ly/MluIK (writing on an iPhone, so I can’t hyperlink that URL).
Sergey’s column is very insightful, but it makes one core, implicit assumption at the outset which I think is a bit controversial: he assumes that data rot is worse when data is stored in physical, analog form than when it is stored in digital form. In other words, he assumes that if your high school English class final paper is stored digitally, you are more likely to still have it 10 years later than if it were stored on paper.
This is not obvious to me, and it has often been untrue of the data I have kept over the years. I don’t know which form of data rot (physical or digital) is worse, but I do know geeks are prone to thinking that a digital solution is better simply by virtue of being digital.
Physical data rot is usually due to misplacement, destruction, natural disasters, spills, theft, or plain careless treatment. Copying physical data is harder than copying digital data, so physical data is typically not protected by geographic redundancy.
But digital data is subject to all those same problems, because digital data physically lives on a laptop or hard drive, which is itself subject to physical issues. Additionally, digital data is subject to viruses, magnetism, and general human error. (Ever run rm -r on the wrong directory? I have.)
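To make the redundancy point a bit more concrete, here is a minimal sketch, assuming you keep a second copy of a directory somewhere else. It hashes both copies and reports files that have gone missing or quietly changed. The function names (`sha256_of`, `build_manifest`, `compare_copies`) and the two-directory layout are made up for illustration; this is not a claim about how Google or anyone else actually does it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(primary: Path, replica: Path) -> dict[str, list[str]]:
    """Report files missing from the replica, or whose contents differ."""
    a, b = build_manifest(primary), build_manifest(replica)
    return {
        "missing": sorted(set(a) - set(b)),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Note that this only detects rot. It cannot say which copy is the good one, and it does nothing for a file deleted from both copies, which is exactly the rm -r problem.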
I trust Google to preserve data more than I trust the Library of Congress, but not solely because Google will work digitally. Data rot is a fun problem with often non-obvious loopholes, and I bet the Sys Architects at Google who get to solve it really enjoy their jobs.