Ann Arbor Area Business Monthly
Small Business and the Internet

Big Data

October 2012

By Mike Gould

“Stuff expands to fill the available space” … Common Knowledge

And nothing expands faster than data, especially if you are a big corporation, science program, or movie company. Welcome to the world of big data, where the ones and zeroes meet the teras and petas.

Most of you out there can get by with a terabyte or two of storage. I’m a photographer, and I come back from a shoot with so many large (in file size) photos that they won’t fit on a DVD (which holds 4.7GB). But if I were a cinematographer, I would be filling up three-terabyte drives quite frequently.

Peta Zoo
Here’s a refresher on the nomenclature involved, with a bit of history tossed in:

Byte: eight bits, i.e., eight ones and/or zeroes, the basic unit of digital storage. This was originally the amount of mojo needed to encode a character of text, and it has become the standard for measuring computer space. (Attn. Geeks: a gross simplification, I know. Deal with it.)
Kilobyte or KB: 1,000 bytes. One of these articles takes up around 47K of space on my one terabyte main hard drive. The first floppy disks held 175K.
Megabyte or MB: 1,000,000 bytes. See the pattern here? Each size up is 1,000 times bigger. My first hard drive held 20M (“20Megs”) and cost $700.
Gigabyte or GB: 1,000 Megabytes. This is pronounced with a hard “G”, Doc Brown from “Back to the Future” notwithstanding.
Terabyte or TB: 1,000 Gigabytes. Terabyte hard drives now cost less than $100. Next, things get silly large:
Peta, Exa, Zetta, and Yotta are the next in the series. I leave it to the reader to do the math. Serious amounts of ones and zeroes. Humorists have also suggested “Brontobyte” and “LOC – Library of Congress”, but cooler heads prevailed and we now have the above names, mostly based on Greek words. “Giga” is Greek for “giant”, for instance (although Zetta derives from the French numeral sept, so named because Zetta is 1,000 to the seventh power).
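For readers who would rather not do the math by hand, here is a minimal Python sketch that does it, assuming the decimal convention used above (each prefix is 1,000 times the one before it). The names `PREFIXES` and `bytes_in` are made up for illustration. Some software counts in powers of 1,024 instead; this sketch ignores that wrinkle.

```python
# Decimal (SI) storage prefixes, smallest to largest; each is 1,000x the previous.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def bytes_in(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte, using powers of 1,000.

    Raises ValueError if the prefix is not in PREFIXES.
    """
    power = PREFIXES.index(prefix.lower()) + 1  # kilo = 1000**1, mega = 1000**2, ...
    return 1000 ** power

for position, name in enumerate(PREFIXES, start=1):
    print(f"1 {name}byte = 1000^{position} = {bytes_in(name):,} bytes")
```

The zetta line confirms the point made above: a zettabyte is 1,000 to the seventh power bytes.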

A Lotta Bytes
In researching this article (and yes, I do research this stuff. I don’t just make it up. Well, some of it…) I came across an interesting article from Wired Magazine, June 2008. They summed things up thusly:

Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud.

This is interesting to me in that Wired was, as usual, prescient in covering this topic, but back then terabytes were to be found in disk arrays, i.e., combinations of hard disks. Nowadays, a scant four years later, we have single drives that hold up to three TB. And you can get a 32GB flash drive for less than $20 on Amazon.

OK, so there is a lot of storage out there; what sort of stuff fills it up? Well, for example: traffic flow sensors, broadcast audio streams, output from social networks, web server logs, satellite imagery, MP3s of rap music, and on and on. The sum total of global civilization. And that’s just the general stuff; consider the sciences. The Large Hadron Collider cranks out 15 petabytes per year of data regarding very small things hitting each other really hard. On the opposite end of the scale, there is the Universe. Astronomers are attempting to catalog as many celestial bodies as possible, so their storage needs are astronomical as well.
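To put the LHC figure in perspective, a back-of-the-envelope Python sketch can spread 15 petabytes evenly over a 365-day year. The even spread is an assumption for illustration only. Real experiments do not produce data at a steady rate, so the results are averages, not peaks.

```python
import math

PETABYTE = 1000 ** 5                    # bytes, decimal convention
TERABYTE = 1000 ** 4                    # bytes
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Assumption: the 15 PB/year total is spread evenly over a 365-day year.
lhc_yearly_bytes = 15 * PETABYTE

avg_bytes_per_second = lhc_yearly_bytes / SECONDS_PER_YEAR
avg_tb_per_day = lhc_yearly_bytes / 365 / TERABYTE
# Number of three-terabyte drives one average day would fill, rounded up.
drives_per_day = math.ceil(avg_tb_per_day / 3)

print(f"about {avg_bytes_per_second / 1000**2:,.0f} MB per second")
print(f"about {avg_tb_per_day:,.1f} TB per day")
print(f"about {drives_per_day} three-terabyte drives per day")
```

In other words, an average day of collider output would fill roughly 14 of the three-terabyte drives mentioned earlier.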

Bigger Biz
And then there are the big businesses in the big data business: IBM, Google, Oracle, Amazon, Apple, Yahoo, and Facebook, to name a few of the bigger players. As our privacy vanishes into a haze of web tracking, data mining and such, our data is now taking up TBs of storage on Amazon et al. servers.

Big Data can also have a negative connotation due to the above. Remember the other “bigs” out there: Big Government, Big Oil and Big Brother, all of whom now have more access than ever to our buying habits, opinions, friends, favorite music and political beliefs. You can always avoid the sharing of your data by turning off your computer and living in a cave somewhere, but most of us like our Internet just fine, thank you.

If your lack of online privacy concerns you, you can join the Electronic Frontier Foundation, fighting the good fight to “Defend Your Rights in the Digital World”, as their motto has it.

Back to the business end of big data. Once you have all this information, how do you extract what you need from the morass of other data that is constantly streaming in? This is the pointy end of the stick that corporations are grappling with today. “I know this ginormous database I have will tell me how many people prefer the new TV ad over the radio spot, but how do I get to that information?”
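At its core, that question comes down to filtering the relevant records out of everything else and then counting. Here is a toy Python sketch of the idea. It assumes a hypothetical stream of records, each with a `type` field; the field names and survey answers are invented for illustration. At big-data scale this same filter-and-count step is typically spread across many machines.

```python
from collections import Counter

# Hypothetical mixed stream: most records have nothing to do with the ad question.
stream = [
    {"type": "page_view", "url": "/home"},
    {"type": "ad_survey", "preferred": "tv"},
    {"type": "sensor", "reading": 41.7},
    {"type": "ad_survey", "preferred": "radio"},
    {"type": "ad_survey", "preferred": "tv"},
]

def ad_preferences(records):
    """Keep only the survey answers and count which ad each respondent preferred."""
    return Counter(r["preferred"] for r in records if r.get("type") == "ad_survey")

tally = ad_preferences(stream)
print(tally.most_common())
```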

Not surprisingly, other big businesses are happy to help you out with special software that will mine, interpret, and present you the answers you so desperately seek in your commercial quest.

O’Reilly Radar (Get it? Radar O’Reilly? M*A*S*H? O’Reilly publishes Make magazine and technical and programming books), a great tech blog, has a good summation, URL below. They talk about the Three Vs. From their article (January 11, 2012):

To clarify matters, the three Vs of volume, velocity and variety are commonly used to characterize different aspects of big data.

Volume is the amount of data, velocity is the rate at which it streams in, and variety is the different formats of data involved. Big data requires big software solutions, for big bucks. I’m kinda glad I’m a small businessman and don’t (yet) have to mess with this.
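To make the three Vs a little more concrete, here is a toy Python sketch. It assumes a hypothetical `Record` type that carries only a format name and a size in bytes, and it measures velocity over a fixed observation window. It shows what each V measures, not how any big-data product computes them.

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical incoming data item: its format and its size in bytes."""
    fmt: str
    size: int

def three_vs(records: list[Record], window_seconds: float) -> dict:
    """Summarize the records that arrived during a window of window_seconds."""
    volume = sum(r.size for r in records)      # total bytes received
    velocity = volume / window_seconds         # average bytes per second
    variety = len({r.fmt for r in records})    # number of distinct formats
    return {"volume": volume, "velocity": velocity, "variety": variety}

# One made-up minute of arrivals: two JSON messages, one photo, one CSV file.
batch = [
    Record("json", 2_000),
    Record("jpeg", 4_500_000),
    Record("csv", 12_000),
    Record("json", 1_500),
]
summary = three_vs(batch, window_seconds=60.0)
print(summary)
```

A real stream would need this bookkeeping to run continuously over data arriving far faster than one photo a minute, which is where the big software and the big bucks come in.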

O’Reilly Radar:
http://radar.oreilly.com/2012/01/what-is-big-data.html

EFF:
https://www.eff.org

Mike Gould's data is measured in GigaBytes. He was a mouse wrangler for the U of M for 20 years, runs the MondoDyne Web Works/Macintosh Training/Digital Photography mega-mall, builds laser display devices, performs with the Illuminatus 2.2 Lightshow, and welcomes comments addressed to mgould@mondodyne.com.

MondoDyne: The Sound of One Hand Clicking...
734 904 0659
Entire Site © 2016, Mike Gould - All Rights Reserved