It seems like just about every day brings with it a new cloud storage product announcement from vendors big and small, but the reality is that beyond enterprise firewalls, cloud storage’s potential is limited.
There are two reasons for this: bandwidth limitations and the data integrity issues posed by the commodity drives that are typically used in cloud services. Together those two issues will limit what enterprise data storage users can do with external clouds.
Cloud Challenges
The ideal for cloud storage is to be self-managed, self-balanced and self-replicated, with regular data checksums to account for the undetectable or mis-corrected error rates of various storage technologies. Cloud storage depends on being able to keep multiple copies of files distributed across the storage cloud for safekeeping, with each copy managed, checksummed and verified regularly.
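To make that concrete, here is a minimal sketch of the kind of periodic integrity pass such a system might run (my own illustration, not any particular vendor’s implementation); the file paths and the recorded checksum are hypothetical placeholders:

```python
# Minimal sketch of a periodic integrity check over replicated copies.
# The replica paths and recorded checksum are hypothetical placeholders.

import hashlib
from pathlib import Path

def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_replicas(replicas: list[Path], recorded: str) -> list[Path]:
    """Return the replicas whose current checksum no longer matches the
    checksum recorded when the object was first stored."""
    return [p for p in replicas if checksum(p) != recorded]

# Any replica flagged here would be re-copied from a known-good copy
# elsewhere in the storage cloud.
```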
It’s a great idea, but it faces more than a few challenges, such as reliability, security, data integrity, power, and replication time and costs. But one of the biggest issues is simply that hardware is going to break. There are two ways disk and tape drives break:
- Hitting the hard error rate of the media, which is expressed as the average number of bits moved before an error occurs
- Hitting the Annualized Failure Rate (AFR) of a device based on the number of hours used
The most common type of failure is the first: hitting the vendor’s bit error rate, which is the expected number of failures per number of bits moved. The following is generally what vendors publish:
| Device | Hard Error Rate in Bits |
| --- | --- |
| Consumer SATA Drives | 1 in 10E14 |
| Enterprise SATA Drives | 1 in 10E15 |
| Enterprise FC/SAS Drives | 1 in 10E16 |
| LTO Tape | 1 in 10E17 |
| T10000B Tape | 1 in 10E19 |
These seem like good values, but it is important to note that they haven’t improved much in the last 10 years, maybe by an order of magnitude, while densities have soared and performance has increased only moderately. This will begin to cause problems as the gaps widen in the future (see RAID’s Days May Be Numbered). So using vendors’ best-case numbers, how many errors will we see from moving data around, as cloud replication requires?
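As a rough sketch of the arithmetic behind the table below (my own illustration, not the author’s code): if a device is rated at one expected error per N bits, then moving D bytes yields roughly 8·D/N expected errors. Reading the published “1 in 10E14” literally as one error per 10×10^14 bits and counting a petabyte as 2^50 bytes appears to reproduce the table’s figures:

```python
# Back-of-the-envelope estimate of expected hard errors when moving data,
# following the table's apparent conventions: a petabyte is 2**50 bytes and
# a rate of "1 in 10E14" is read literally as one error per 10e14 bits.

PB = 2**50  # bytes per petabyte (binary)

def expected_errors(data_bytes: float, bits_per_error: float) -> float:
    """Expected unrecoverable errors for data_bytes moved on media rated
    at one hard error per bits_per_error bits transferred."""
    return (data_bytes * 8) / bits_per_error

# 1PB moved on consumer SATA rated "1 in 10E14":
print(round(expected_errors(1 * PB, 10e14), 3))    # ~9.007
# 100PB moved on enterprise SATA rated "1 in 10E15":
print(round(expected_errors(100 * PB, 10e15), 3))  # ~90.072
```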
| Errors Per Data Moved | 1PB | 10PB | 40PB | 100PB |
| --- | --- | --- | --- | --- |
| 1TB Consumer SATA | 9.007 | 90.07 | 360.288 | 900.720 |
| 1TB Enterprise SATA | 0.901 | 9.007 | 36.029 | 90.072 |
| 600GB FC/SAS | 0.090 | 0.901 | 3.603 | 9.007 |
| LTO-4/TS1130 | 0.009 | 0.090 | 0.360 | 0.901 |
| T10000B | 0.000 | 0.001 | 0.004 | 0.009 |
Clearly, moving 100PB on even 1TB enterprise drives could cause significant loss of data, especially as many clouds I am familiar with do not use RAID and maintain data protection via mirroring. Remember, this assumes a perfect world and does not include channel failures, memory corruption and all the other types of hardware failure and silent corruption. What happens if the world is not perfect and failure rates are an order of magnitude worse?
| Errors Per Data Moved | 1PB | 10PB | 40PB | 100PB |
| --- | --- | --- | --- | --- |
| 1TB Consumer SATA | 90.072 | 900.720 | 3602.880 | 9007.199 |
| 1TB Enterprise SATA | 9.007 | 90.072 | 360.288 | 900.720 |
| 600GB FC/SAS | 0.901 | 9.007 | 36.029 | 90.072 |
| LTO-4/TS1130 | 0.090 | 0.901 | 3.603 | 9.007 |
| T10000B | 0.001 | 0.009 | 0.036 | 0.090 |
With current technology, you could lose 900TB of data, which is not trivial and would take some time to replicate.
Bandwidth Limits Replication
Now let’s look at replication times for various Internet connection speeds and data volumes.
| Network | Data Rate | Days to Replicate 1PB | Days to Replicate 10PB | Days to Replicate 40PB | Days to Replicate 100PB |
| --- | --- | --- | --- | --- | --- |
| OC-3 | 155 Mbits/sec | 802 | 8018 | 32,071 | 80,178 |
| OC-12 | 622 Mbits/sec | 200 | 1998 | 7992 | 19,980 |
| OC-48 | 2.5 Gbits/sec | 51 | 506 | 2023 | 5057 |
| OC-192 | 10 Gbits/sec | 13 | 126 | 506 | 1264 |
| OC-384 | 19.9 Gbits/sec | 3 | 32 | 126 | 316 |
| OC-768 | 39.8 Gbits/sec | 1 | 8 | 32 | 79 |
Clearly, no one has an OC-768 connection, nor are they going to get one anytime soon, and very few have 100PB of data to replicate into a cloud, but the point is that data densities are growing faster than network speeds. There are already people talking about 100PB archives, but they don’t talk about OC-384 networks. It would take 10 months to replicate 100PB with OC-384 in the event of a disaster, and who can afford OC-384? That’s why, at least for the biggest enterprise storage environments, a centralized disaster recovery site that you can move operations to until everything is restored will be a requirement for the foreseeable future.
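For reference, here is a minimal sketch of the math behind these replication-time estimates (my own illustration, computed at raw line rate with a binary petabyte and no allowance for protocol overhead, so it will not match the table cell for cell):

```python
# Rough replication-time estimate at a link's raw line rate. Real transfers
# lose capacity to protocol overhead, so actual times run longer than this.

PB = 2**50  # bytes per petabyte (binary)

def days_to_replicate(data_bytes: float, link_bits_per_sec: float) -> float:
    """Days to move data_bytes over a link running flat out."""
    return (data_bytes * 8) / link_bits_per_sec / 86400

# 1PB over OC-192 (10 Gbits/sec): roughly 10 days at raw line rate,
# in the same ballpark as the table's 13 days once overhead is counted.
print(round(days_to_replicate(1 * PB, 10e9), 1))
```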
Consumer Problem Looming?
The bandwidth problem isn’t limited to enterprises. In the next 12 to 24 months, most of us will have 10Gbit/sec network connections at work (see Falling 10GbE Prices Spell Doom for Fibre Channel), while the fastest connection available as the current backbone of the Internet is OC-768, and each of us internally is going to have a connection that is 6.5 percent of OC-768. That will be limited, of course, by our DSL and cable connections, but their performance is going to grow and eat into the backbone bandwidth. This is pretty shocking when you consider how much data we create and how long it takes to move it around. I have a home Internet backup service and about 1TB of data at home. It took me about three months to get all of the data copied off site via my cable connection, which was the bottleneck. If I had had a crash before the off-site copy was created, I would have lost data.
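That three-month figure is easy to sanity-check (my own arithmetic; the 1 Mbit/sec sustained upload rate is an assumed, era-typical cable speed, not a figure from the article):

```python
# Sanity check: how long does 1TB take over a slow residential uplink?
# The 1 Mbit/sec upload rate is an assumed, era-typical cable speed.

upload_bits_per_sec = 1e6   # assumed sustained upload rate
data_bits = 1e12 * 8        # 1TB (decimal) in bits

days = data_bits / upload_bits_per_sec / 86400
print(round(days))          # ~93 days, i.e. roughly three months
```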
Efforts like Internet2 may help ease the problem, but I worry that we are creating data faster than we can move it around. The issue becomes critical when data is lost — through human error, natural disaster or something more sinister — and must be re-replicated. You’d have all this data in a cloud, with two copies in two different places, and if one copy goes poof for whatever reason, you’d need to restore it from the other copy or copies, and as you can see that is going to take a very long time. During that time, you might only have one copy, and given hard error rates, you’re at risk. You could have more than two copies — and that’s probably a good idea with mission-critical data — but it starts getting very costly.
Google, Yahoo and other search engines for the most part use the cloud method for their data, but what about all the archival sites that already have 10, 20, 40 or more petabytes of data that is not used very often? There are already a lot of these sites, whether they hold medical records and images, genetic data, large image sets such as climate data, the seismic data used for oil and gas exploration or, of course, all the Sarbanes-Oxley data that corporate America is required to keep. Does it make sense to have all of this data online? Probably not, and the size of the data and the cost of power will likely be the overriding issues.
Henry Newman, CTO of Instrumental Inc. and a regular Enterprise Storage Forum contributor, is an industry consultant with 28 years of experience in high-performance computing and storage.
Article courtesy of Enterprise Storage Forum.