Permanent data storage

We need to focus our efforts on finding more permanent ways to store data. What we have now is inadequate. Hard drives are susceptible to failure, data corruption and data erasure (see the effects of EM pulses, for example). CDs and DVDs become unreadable after several years, and even archival-quality optical media stop working after 10-15 years. On top of that, the hardware that reads and writes media changes so fast that media written in the past may become unreadable in the future simply because there’s nothing left to read it. I don’t think digital bits and codecs are a future-proof solution, but I do think imagery (stills or sequences of stills) and text are the way to go. It’s the way past cultures and civilizations have passed on their knowledge. However, we need to move past pictographs on cave walls and cuneiform writing on stone tablets. Our data storage needs are quite large and we need systems that can accommodate these requirements.

We need to be able to read/write data to permanent media that stores it for hundreds, thousands and even tens of thousands of years, so that we don’t lose our collective knowledge, so that future generations can benefit from all our discoveries, study us, find out what worked and what didn’t.

We need to find ways to store our knowledge permanently in ways that can be easily accessed and read in the future. We need to start thinking long-term when it comes to inventing and marketing data storage devices. I hope this post spurs you on to do some thinking of your own about this topic. Who knows what you might invent?


A comparison of CrashPlan and Backblaze

I’ve been a paying CrashPlan customer since 2012 and my initial backup still hasn’t finished. I’ve been a paying Backblaze customer for less than a month and my initial backup is already complete. 

I’m not a typical customer for backup companies. Most people back up about 1 TB of data or less. The size of my minimum backup set is about 9 TB. If I count all the stuff I want to back up, it’s about 12 TB. And that’s a problem with most backup services.

First, let me say this: I didn’t write this post to trash CrashPlan. Their backup service works and it’s worked well for other members of my family. It just hasn’t worked for me. This is because they only offer a certain amount of bandwidth to each user. It’s called bandwidth throttling and it saves them money in two ways: (1) they end up paying less for their monthly bandwidth (which adds up to a lot for a company offering backup services) and (2) they filter out heavy users like me, who tend to fill up a lot of their drives with unprofitable data. My guess (from my experience with them) is that they throttle heavy users with large backup sets much more than they throttle regular users. The end result of this bandwidth throttling is that, even though I’ve been a customer since 2012 (at first on the individual backup plan, then on the family plan), my initial backup never completed and I was well on track to never complete it.

When I stopped using CrashPlan’s backup services, out of the almost 9 TB of data that I need to back up constantly, I had only managed to upload 0.9 TB in FOUR YEARS. Take a moment and think about that, and then you’ll realize how much bandwidth throttling CrashPlan does on heavy users like me.

After four years of continuous use, I backed up a grand total of 905.7 GB to CrashPlan
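To put that number in perspective, here’s a quick back-of-the-envelope sketch. It assumes the average rate I saw over four years would simply continue, which is my own simplification (the function name is made up for this example), and it ignores any new data I’d add along the way.

```python
def years_to_finish(total_gb: float, uploaded_gb: float, years_elapsed: float) -> float:
    """Project how many years the full backup would take at the observed average rate.

    Assumption: the upload rate stays constant and the backup set does not grow.
    """
    rate_gb_per_year = uploaded_gb / years_elapsed
    return total_gb / rate_gb_per_year


# Figures from this post: about 9 TB to back up, 905.7 GB uploaded in four years.
projected = years_to_finish(total_gb=9000, uploaded_gb=905.7, years_elapsed=4)
print(f"Projected time for the full backup: {projected:.1f} years")
```

At that pace, the initial backup would have needed decades, not years.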

To be exact, counting the various versions of my data that had accumulated on the CrashPlan servers in these four years, I had a total of 2.8 TB stored on their servers, but even if you count that as the total, 2.8 TB in FOUR YEARS is still an awfully small amount.

Space used on CrashPlan’s servers: 2.8 TB

Tell me honestly, which one of you wants this kind of service from a backup company? You pay them for years in a row and your initial backup never finishes? If a data loss event occurs and your local backup is gone (say a fire, flood or burglary), you’re pretty much screwed and you’ll only be able to recover a small portion of your data from their servers, even though you’ve been a faithful, paying customer for years… That just isn’t right.

I talked with CrashPlan techs twice in these four years about this very problematic data throttling. Given that they advertise their service as “unlimited backup”, this is also an ethical issue. The backup isn’t truly unlimited if it’s heavily throttled and you can never back up all of your data. The answer was the same both times, even the wording was the same, making me think it was scripted: they said that in an effort to keep costs affordable, they have to limit the upload speeds of every user. The first time I asked them, they suggested their Business plan has higher upload speeds, so in other words, they tried to upsell me. Both times, they advertised their “seed drive service”, which was a paid product (they stopped offering it this summer). The gist of it was that they shipped a 1 TB drive to customers who asked for one, so they could back up to it locally, then send it back to jumpstart the backup. Again, given my need to back up at least 9 TB of data, this wasn’t a useful option.

This is false advertising
This is also false advertising

Some of you might perhaps suggest that I didn’t optimize my CrashPlan settings so that I could get the most out of it. I did. I tried everything they suggested in their online support notes. In addition to tricking out my CrashPlan install, I kept my computer on for virtually all of the last four years, in an effort to help the CrashPlan app finish the initial backup, to no avail.

Another thing that bothered me about CrashPlan is that it would go into “maintenance mode” very often, and given the size of my backup set, this would take days, sometimes weeks, during which it wouldn’t back up. It would endlessly churn through its backup versions and compare them to my data, pruning out stuff, doing its own thing and eating up processor cycles with those activities instead of backing up my data.

Synchronizing block information…
Compacting data… for 22.8 days…
Maintaining backup files…

I understand why maintenance of the backups is important. But what I don’t understand is why it took so long. I can’t help thinking that maybe the cause is the Java-based backup engine that CrashPlan uses. It’s not a Mac-native app or a Windows-native app. It’s a Java app wrapped in Mac and Windows app versions. And most Java apps aren’t known for their speed. It’s true, Java apps could be fast, but the developers often get lazy and don’t optimize the code — or that’s the claim made by some experts in online forums.

Another way to look at this situation is that CrashPlan has a “freemium” business model. In other words, their app is free to use for local (DAS or NAS) backup or offsite backup (such as to a friend’s computer). And one thing I know is that you can’t complain about something that’s given freely to you. If it’s free, you either offer constructive criticism or you shut up about it. It’s free and the developers are under no obligation to heed your feedback or to make changes because you say so. As a matter of fact, I used CrashPlan as a free service for local backup for a couple of years before I started paying for their cloud backup service. But it was only after I started paying that I had certain expectations of performance. And in spite of those unmet expectations, I stuck with them for four years, patiently waiting for them to deliver on their promise of “no storage limits, bandwidth throttling or well-engineered excuses”… and they didn’t deliver.

Here I should also say that CrashPlan support is responsive. Even when I was using their free backup service, I could file support tickets and get answers. They always tried to resolve my issues. That’s a good thing. It’s important to point this out, because customer service is an important aspect of a business in the services industry — and online backups are a service.

About three weeks ago, I was talking with Mark Fuccio from Drobo about my issues with CrashPlan and he suggested I try Backblaze, because they truly have no throttling. So I downloaded the Backblaze app (which is a native Mac app, not a Java app), created an account and started to use their service. Lo and behold, the 15-day trial period wasn’t yet over and my backup to their servers was almost complete! I couldn’t believe it! Thank you Mark! 🙂

I optimized the Backblaze settings by allowing it to use as much of my ISP bandwidth as it needed (I have a 100 Mbps connection), and I also bumped the number of backup threads to 10, meaning the Backblaze app could initiate 10 separate instances of itself and upload all 10 instances simultaneously to their servers. I did have to put up with a slightly sluggish computer during the initial backup, but for the first time in many years, I was able to back up all of my critical data to the cloud. I find that truly amazing in and of itself.
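For the curious, here’s a minimal sketch of why more upload threads can help on a long-distance connection. It has nothing to do with Backblaze’s actual app or API: `upload_chunk` is a hypothetical stand-in that just sleeps to mimic network latency, and the chunk count and delay are made-up numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_chunk(chunk_id: int) -> int:
    """Hypothetical stand-in for one upload; sleeps to mimic a slow round trip."""
    time.sleep(0.05)
    return chunk_id


def upload_all(chunk_ids, threads: int):
    """Upload every chunk using the given number of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(upload_chunk, chunk_ids))


chunks = list(range(20))

start = time.perf_counter()
upload_all(chunks, threads=1)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
uploaded = upload_all(chunks, threads=10)
parallel_seconds = time.perf_counter() - start

print(f"1 thread: {serial_seconds:.2f}s, 10 threads: {parallel_seconds:.2f}s")
```

With one thread, each chunk waits for the previous one to finish. With ten, the waiting overlaps, which is the general idea behind raising the thread count in a backup app.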

This is what I did to optimize my Backblaze installation

As you can see from the image above, I got upload speeds over 100 Mbps when I optimized the backup settings. During most of the days of the initial upload, I actually got speeds in excess of 130 Mbps, which I think is pretty amazing given my situation: I live in Romania and the Backblaze servers are in California, so my data had to go through a lot of internet backbones and through the trans-Atlantic cables.

The short of it is that I signed up for a paid plan with Backblaze and my initial backup completed in about 20 days. Let me state that again: I backed up about 9 TB of data to Backblaze in about 20 days, and I managed to back up only about 1 TB of data to CrashPlan in about 4 years (1420 days). The difference is striking and speaks volumes about the ridiculous amount of throttling that CrashPlan puts in place for heavy users like me.
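Here’s a small sketch that turns those figures into average upload speeds. It uses decimal units (1 GB = 10^9 bytes) and averages over the whole period, including any time the computer sat idle, so the results are lower than the peak speeds I saw. The function name is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def average_mbps(gigabytes: float, days: float) -> float:
    """Average throughput in megabits per second over the whole period."""
    bits = gigabytes * 1e9 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


# Figures from this post.
backblaze_mbps = average_mbps(gigabytes=9000, days=20)
crashplan_mbps = average_mbps(gigabytes=905.7, days=1420)

print(f"Backblaze average: {backblaze_mbps:.1f} Mbps")
print(f"CrashPlan average: {crashplan_mbps:.3f} Mbps")
print(f"Ratio: about {backblaze_mbps / crashplan_mbps:.0f}x")
```

That works out to roughly 42 Mbps on average for Backblaze versus about 0.06 Mbps for CrashPlan, a difference of around 700 times.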

I also use CrashPlan for local network backup to my Drobo 5N, but I may switch to another app for this as well, for two reasons. First, it’s slow and it does a lot of maintenance on the backup set. Second, it doesn’t let me use Drobo shares mapped through the Drobo Dashboard app, which is a more stable way of mapping a Drobo’s network shares. CrashPlan refuses to see those shares and requires me to manually map network shares, which isn’t as stable a connection and leads to share disconnects and multiple mounts, which in turn screw up CrashPlan. I’m trying out Mac Backup Guru, which is a Mac-native app, is pretty fast and does allow me to back up to Drobo Dashboard-mapped shares. If this paragraph doesn’t make sense to you, it’s okay. You probably haven’t run into this issue. If you have, you know what I’m talking about.

Now, none of this stuff matters if you’re a typical user of cloud backup services. If you only have about 1 TB of data or less, any cloud backup service will likely work for you. You’ll be happy with CrashPlan and you’ll be happy with their customer service. But if you’re like me and you have a lot of data to back up, then a service like Backblaze that is truly throttle-free is exactly what you’ll need.

The value of a good backup

While working on the fifth episode of RTTE, I learned first hand the value of a good backup. The hard drive on my editing computer (my MacBook Pro) died suddenly and without warning. Thankfully, my data was backed up in two geographically different locations.

The day my hard drive died, I’d just gotten done with some file cleanups, and was getting ready to leave for a trip abroad. I shut down my computer, then realized I needed to check on a couple things, and booted it up again, only this time, it wouldn’t start. I kept getting a grey screen, meaning video was working, but it refused to boot into the OS. And I kept hearing the “click of death” as the hard drive churned. I tried booting off the Snow Leopard DVD, but that didn’t work either. I’d tested the hard drive’s SMART status just a couple of weeks before, and the utility had told me the drive had no problems whatsoever.

I had a couple of reasons to worry:

  1. The laptop refused to boot up from the OS X DVD, potentially indicating other problems than a dead hard drive. I do push my laptop quite a bit as I edit photos and video, and I’d already replaced its motherboard once. I was worried I might have to spend more than I wanted to on repairs.
  2. All of the footage for the fifth episode of RTTE was on my laptop. Thankfully, it was also backed up in a couple of other places, but still, I hadn’t had reason to test those backups until now. What if I couldn’t recover it?

I had no time for further troubleshooting. I had to leave, and my laptop was useless to me. I left it home, and drove away, worried about what would happen when I returned.

A week later, I got home and tried to boot off the DVD again. No luck. I had to send it in, to make sure nothing else was wrong. In Romania, there’s only one Apple-authorized repair shop. They’re in Bucharest, and they’re called Noumax. I sent it to them for a diagnosis, and a couple of days later, I heard back from them: only the hard drive was defective, from what they could tell.

I was pressed for time. I had to edit and release the fifth episode of RTTE, and I also had to shoot some more footage for it. I didn’t have time to wait for the store to fix the laptop, so I asked them to get it back to me, while I ordered a replacement hard drive from an online store with fast, next-day delivery (eMag).

The hard drive and the laptop arrived the next day. I replaced the hard drive, using this guide, and also cleaned the motherboard and CPU fans of dust, then restored the whole system from the latest Time Machine backup. This meant that I got back everything that was on my laptop a few hours before it died.

I’d have preferred to do a clean OS install, then install the apps I needed one by one, then restore my files, especially since I hadn’t reformatted my laptop since I bought it a few years ago, but that would have been a 2-3 day job, and I just didn’t have the time. Thankfully, OS X is so stable that even a 3-year old install, during which I installed and removed many apps, still works fairly fast and doesn’t crash.

Some might say, what’s the big deal? The laptop was backed up, and you restored it… whoopee… Not so fast, grasshopper! The gravity of the situation doesn’t sink in until you realize it’s your work, YEARS of hard work, that you might have just lost because of a hardware failure. That’s when your hands begin to tremble and your throat gets dry, and a few white hairs appear instantly on your head. Even if the data’s backed up (or so you think), until it’s restored and it’s all there, you just don’t know if you can get it back.

I’ve worked in IT for about 15 years. I’ve restored plenty of machines, desktops and servers alike. I’ve done plenty of backups. But my own computer has never gone down. I’ve never had a catastrophic hardware failure like this one until now. So even though I’ve been exposed to this kind of thing before, I just didn’t realize how painful it is until now. And I didn’t appreciate the value of a good backup until now.

So, here’s my advice to you, as if you didn’t hear it plenty of times in the past… BACK UP YOUR COMPUTER!

If you have a Mac, definitely use Time Machine. It just works. It’s beautifully simple. I’ve been backing up my laptop with Time Machine to the same reliable drive for years. It’s this little LaCie hard drive.

But the LaCie drive might fail at some point, which is why I also back up my data with CrashPlan. For this second backup, I also send my data to a geographically different location. Since we live in Romania these days, I back up to my parents’ house in the US, where the backup gets stored on a Drobo. And the backup is also encrypted automatically by CrashPlan, which means it can’t be read even if it’s intercepted along the way.

It’s because of my obsessive-compulsive backup strategy that I was able to recover so quickly from the hardware failure. Thankfully, these days backups are made so easy by software like Time Machine and CrashPlan that anyone can keep their work safe. So please, back up your data, and do it often!

One more thing. You know the old saying, every cloud has a silver lining? It was true in my case. When I ordered the new drive for my laptop, I was able to upgrade from its existing 250GB SATA hard drive with an 8MB buffer and 5400 rpm to a spacious 750GB SATA hard drive with a 32MB buffer and 7200 rpm, which means my laptop now churns along a little faster, and has a lot more room for the 1080p footage of my shows. 🙂

Save the data!

Some of the most important technology programs that keep Washington accountable are in danger of being eliminated. Data.gov, USASpending.gov, the IT Dashboard and other federal data transparency and government accountability programs are facing a massive budget cut, despite only being a tiny fraction of the national budget.

Help save the data and make sure that Congress doesn’t leave the American people in the dark.

What’s next in data storage?

My recent musings on high definition and the state of the technology behind it have spurred me to think about data storage (not that it’s a new subject for me). But so far, I’ve commented only on what’s already been developed, and didn’t take the time to think about what’s next.

What’s the motivation behind this post? It’s simple. For Ligia’s Kitchen, it costs me about 10.5 GB for 5 minutes of final, edited footage of the show, with a one-camera setup. What goes into the 10.5 GB? There’s the raw footage (and sound files, if I use a standalone mic), the edits, and the final, published footage. When I use two cameras, the space needed can easily go up by 1.5-2.5x, depending on the shots I need to get. I shoot and edit in 1080p, and output to 720p.
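Those numbers make it easy to project storage needs for longer episodes. The sketch below is only a rough estimate: it assumes storage scales linearly with the length of the finished footage, which is my own simplification, and the function name is made up for this example.

```python
# From the figures above: about 10.5 GB per 5 minutes of finished footage, one camera.
GB_PER_MINUTE_ONE_CAMERA = 10.5 / 5


def project_storage_gb(finished_minutes: float, camera_multiplier: float = 1.0) -> float:
    """Estimate total storage (raw footage, edits and final output) for one episode.

    camera_multiplier is 1.0 for one camera and roughly 1.5-2.5 for two cameras.
    Assumption: storage grows linearly with finished running time.
    """
    return finished_minutes * GB_PER_MINUTE_ONE_CAMERA * camera_multiplier


one_camera = project_storage_gb(5)
two_cameras_low = project_storage_gb(5, camera_multiplier=1.5)
two_cameras_high = project_storage_gb(5, camera_multiplier=2.5)

print(f"One camera: {one_camera:.2f} GB")
print(f"Two cameras: {two_cameras_low:.2f} to {two_cameras_high:.2f} GB")
```

For a 5-minute episode, that’s 10.5 GB with one camera and somewhere between 15.75 GB and 26.25 GB with two.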

My storage needs are okay for now. I’ve got plenty of space, and if I keep going at this rate, I should be fine. But… and there’s always a but, isn’t there… I have more show ideas in mind. And there’s the hypothetical possibility of shooting with a RED camera at some point in the future, if certain factors come together to allow it. So I’m thinking ahead.

Current hard drive technology (bits of data on disks) has certainly come a long way. Those of us who’ve been in the business long enough know what prices used to be like for capacities that are laughable by today’s standards. Back in 1999, I paid $275 for a 27GB hard drive. My laptop’s drive in college could store a grand total of 120MB. And when I began to learn programming, I’d load the code into memory from tape…

I remember being really excited about Hitachi’s new Perpendicular Magnetic Recording Technology, which came out in early 2006. They even had an animation on their website, which they’ve taken down since. That technology is behind all of the new hard drives that are on the market today, by the way. Hitachi came up with a way to get the bits of data to stand up (hence the term perpendicular) instead of lying down on hard drive platters, thus doubling the amount of data that could be stored onto them.

There are two roads ahead when it comes to data storage, of which one is more likely to succeed:

  • Optical storage (this is probably the future of storage)
  • Biological storage

Let’s first look at biological storage. One particular article made the rounds lately: researchers at the Chinese University of Hong Kong have managed to store 90GB of data in 1g of bacteria. While it sounds exciting, the idea of storing my data in petri dishes on my desk doesn’t readily appeal to me, and certain complications come up:

  • 1g of bacteria is about 10 million cells (that’s a LOT); one must start thinking about the potential for bio hazards when you work with bacteria.
  • The data is stored in a bacterium’s DNA, which means it’s encrypted (a good thing), but it’s also subject to significant mutation (a bad thing) and it takes a long time to retrieve it because you need a gene sequencer, which is tedious and expensive (a bad thing).

I’m not against this. Hey, if they can make it safe and fast, okay. But I believe this is going to be relegated to special applications. The article suggests the technique is currently used to store copyright information for newly created organisms (I wonder how many new bacteria researchers as a whole have created, and is it any wonder antibiotics have such a hard time working against them when we keep playing God). I also see this sort of data storage as a way for spies to operate, or for governments to keep certain secrets.

Okay, onto more cheery stuff, like optical storage. I’ve always thought there was massive potential here, and am glad to see significant work has already been done to make this a reality. There are two technologies which are feasible, according to research that’s already been done:

  • HDSS (Holographic Data Storage Systems), which so far can store up to 1TB of data in a crystal the size of a sugar cube, but doesn’t yet allow rewrites
  • 3D optical data storage, which so far can store up to 1TB of data onto a 1.2mm thick optical disc

These developments are very encouraging. Optical storage is safe, and its potential capacities are huge, possibly endless. And when you think about computer hardware, and how manufacturers are looking at using optical technology in the bridges and buses and wires inside the hardware, because it’s incredibly fast, you start to see how optical makes sense. Let’s also not forget fiber optic cabling, and its incredible capacity to carry data. It certainly looks like optical is the future!

So what’s going to happen to the standard 3.5″ form factor of today’s hard drives? Well, it’s likely that it will stay the same, even though the storage technology inside it might change. We’ll have crystals and lasers instead of platters and heads, but they’ll likely be able to fit them in there somehow. I don’t think we’ll need to start keeping crystal libraries on our desks, like in Superman’s Crystal Cave, and sticking various-sized crystals into our computers any time soon, although it did look pretty cool when Christopher Reeve did it in the movie.

It really all depends on how soon this new technology will come to market. Right now, there’s clearly enough vested interest in the 3.5″ and 2.5″ form factors to motivate drive manufacturers to shoehorn the new technologies into those shapes, but if optical hard drives won’t be here for the next 5-10 years, then it’s possible that the form factor will change as well. We are after all moving to smaller, sleeker shapes for most computers, notebooks and desktops alike.

CrashPlan works for transatlantic backups

Updated 11/01/16: I’ve revised my opinion of CrashPlan. See here for the details.

Last week, I wrote an article called “What’s On Your Drobo”, and in it, I mentioned that I was going to try to use an app called CrashPlan to do backups from my photo library in Romania to my backup location in the US. I’m happy to say that it works as expected, and no, this isn’t an April Fool’s joke. Here’s a screenshot of an active backup. At the time, I was getting 2.7 Mbps throughput.

There is a bandwidth bottleneck somewhere, though I’m not sure where it is. My broadband connection in Romania sits at 30 Mbps up and down, as I mentioned here, and my parents’ broadband connection clocks in around 16 Mbps down and 4 Mbps up. Theoretically, since I’m uploading and they’re downloading, I should be getting at least 15 Mbps, but I’m not. So it looks like there’s either a bottleneck as my data exits Romania, or as it goes through the transatlantic fiber optic cables. If someone can chime in on this, I’d love to find out more. I do know that I hit that same 2.5 Mbps ceiling as I upload to SmugMug, YouTube and blip.tv.
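The reasoning above boils down to this: a transfer can’t go faster than the slower of the sender’s upload speed and the receiver’s download speed. Here’s a small sketch of that, plus how long a backup would take at different speeds. The 1,000 GB library size is a made-up figure for illustration, and the function names are my own.

```python
SECONDS_PER_DAY = 86_400


def theoretical_ceiling_mbps(sender_up_mbps: float, receiver_down_mbps: float) -> float:
    """Best case for a transfer: the slower of the two ends sets the limit."""
    return min(sender_up_mbps, receiver_down_mbps)


def transfer_days(gigabytes: float, mbps: float) -> float:
    """Days needed to move the given amount of data at a steady speed."""
    megabits = gigabytes * 8000  # 1 GB = 8,000 megabits in decimal units
    return megabits / mbps / SECONDS_PER_DAY


# Connection figures from this post: 30 Mbps up on my end, 16 Mbps down on theirs.
ceiling = theoretical_ceiling_mbps(sender_up_mbps=30, receiver_down_mbps=16)

# Hypothetical 1,000 GB library, at the ceiling and at the 2.7 Mbps I actually saw.
days_at_ceiling = transfer_days(1000, ceiling)
days_observed = transfer_days(1000, 2.7)

print(f"Theoretical ceiling: {ceiling} Mbps")
print(f"1,000 GB at the ceiling: {days_at_ceiling:.1f} days")
print(f"1,000 GB at 2.7 Mbps: {days_observed:.1f} days")
```

So the gap between theory and practice is the difference between finishing in under a week and finishing in over a month.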

Bottlenecks aside, I’m just happy I can do off-site backups, and at least given my current setup, it’s free! CrashPlan works as advertised! I have to admit I was a skeptic when I downloaded it and installed it. I figured it would work on the local network, which is where I did the initial backups, but it would surely run into some firewall issues when I tried it from another location. Nightmares of re-configuring my parents’ firewall remotely flashed before my eyes… Amazingly enough, I didn’t have to do any of that! It just works!

So, if you’re interested in doing this sort of thing, download CrashPlan (it’s multi-platform), install it on both computers where you want to use it, configure it (use the help section), test it, then let it do its thing!

One thing I need to mention is that if one of the computers falls asleep, the backup will be paused until it wakes up. Even though I set my parents’ iMac to wake up for network traffic, CrashPlan doesn’t seem to be able to wake it up when I try to start the backup from my end. Keep that in mind and plan your backups accordingly.

What’s on your Drobo?

The folks at Data Robotics put together a short video that showcases Drobo owners talking about what they store on their Drobos, and asked their Twitter followers what’s on their units.

That got me thinking about what I store on my Drobos. I have four Drobos in total: three 1st Gen Drobo units (USB 2.0 only) and one 2nd Gen Drobo (Firewire 800 + USB 2.0). Perhaps that makes me a bit unusual. Most people have one or two units, not four. But there’s a reason for the seeming excess.

For one thing, I have a huge photo library. (You can find the photos I edited and published here.) For another, my wife and I have a huge video library. These are movies and cartoons we had on VHS tapes, which we digitized, or on DVDs, which we archived for easy viewing, or TV shows and movies that we recorded from TV. We’re big fans of classic movies and cartoons from the 1920s, 30s and 40s, and we collect all the ones we like. We also digitized most of our old paperwork. My medical records are all digital. So are my dental records. So are a bunch of our other documents. I scanned all the stuff I could scan, and now when I need to look something up, it’s right there at my fingertips. I’ve also started shooting video more intensively this past year, in SD and HD. (You can find my published videos here or here.) All that stuff takes a fair amount of space — terabytes to be more precise. And to top off this whole list, we live our life on two continents (North America and Europe).

Here’s what I do to make sure I don’t lose my data:

  1. I keep a Drobo with my parents, at their place. On it, I store a backup of my photo library and our video library, along with their files. I back up my live photo library to it using CrashPlan, a piece of software that will let you back up your data to a friend’s machine. I’ve actually just started using it, and while I’ve been able to back up just fine with both machines on the same network, being able to do it from thousands of miles away will be a litmus test of the software’s capabilities. I’ll be sure to write about it if it proves workable. The video library gets backed up every once in a while in a pretty simple manner: I carry movies and videos to them on a hard drive and copy them onto the Drobo. Updated 4/21/10: CrashPlan does indeed work as advertised!
  2. I keep two Drobo units at our home. On one of them, I keep our video library, and an extensive, historical file archive. On another, I keep a mirror copy of my live photo library, which is currently stored on a WD Studio drive, because it’s smaller and easier to transport than a Drobo, and I do a fair bit of traveling. I mirror my photo library with an app called Synkron, which works great. I switched to the WD Studio when I started traveling extensively and realized the Drobo couldn’t always fit safely into my luggage. (Where oh where is that Drobo carrying case I wrote about last year?)
  3. I gave the fourth Drobo to my brother, who needed a solid data storage device to begin to archive his ever-growing library of ethnological videos. He’s a documentary filmmaker who travels around Romania studying and recording religious and secular customs, which are being forgotten and buried along with the old folks. He wants to preserve these things for posterity. You can learn more about what he does at his website, called ORMA.

So that’s how I use my Drobos. However, I’ll have another logistical issue to deal with in the near future. I’m running out of space on the WD Studio drive, which has 2 x 1 TB drives in it. I run it in RAID 1, and in another month or two, it’ll be completely full. I’ll need to start using one of my Drobo units as my primary photo editing/storage device again. This means I’ll shuffle all my data around once more. A possible new arrangement will see me using the 2nd Gen Drobo for the storage and editing of my photos and videos, and the other for the storage and retrieval of our video library and historical file archive, while the WD Studio drive will see some backup duty or be relegated for travel-only purposes.

The current drive distribution among the three Drobos I use actively is as follows:

  • 2 x 2 TB drives + 2 x 1 TB drives in the Drobo that stays with my parents
  • 4 x 1 TB drives and 4 x 500 GB drives, respectively, in the two Drobos that are with us
  • I can’t speak for my brother, but I believe he’s using 4 x 1 TB drives in his Drobo
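If you’re wondering how much usable space those drive mixes give, here’s a rough sketch. It relies on a rule of thumb I’m assuming here rather than quoting from Data Robotics: with protection against a single drive failure, usable space is roughly the total capacity minus the largest drive. Real numbers will come out somewhat lower because of formatting and overhead, and the function name is made up for this example.

```python
def approx_usable_tb(drives_tb):
    """Rough usable capacity with single-drive protection.

    Assumption (rule of thumb, not an official formula): total capacity
    minus the largest drive. Ignores formatting and other overhead.
    """
    return sum(drives_tb) - max(drives_tb)


# Drive mixes from the list above.
parents_drobo = approx_usable_tb([2, 2, 1, 1])
home_drobo_one = approx_usable_tb([1, 1, 1, 1])
home_drobo_two = approx_usable_tb([0.5, 0.5, 0.5, 0.5])

print(f"Parents' Drobo: about {parents_drobo} TB usable")
print(f"Home Drobos: about {home_drobo_one} TB and {home_drobo_two} TB usable")
```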

I’d love to hear how you are using your Drobo. Perhaps you have some ideas for me?

Image and video used courtesy of Data Robotics. The 2nd Gen Drobo is available for purchase from Amazon or B&H Photo. The 1st Gen Drobo has been discontinued as of 2009. Be sure to also check out my reviews of the Drobo S, DroboPro and DroboElite.