Wednesday, June 12, 2013

Unstructured Data can create chaos

It seems that no matter where you go these days (Twitter, your favorite tech blog, email newsletters), everyone is talking about big data. Let's face it, big data is a buzzword, or buzz term, that many technology professionals are being forced to address. Most database guys like myself have been dealing with large amounts of data for years. What was once kilobytes of data turned into megabytes, then gigabytes, then terabytes, and now even beyond that. We've dealt with this sizable data in a lot of different ways, including table partitioning, regular archiving and purging, and the creation of data warehouses that are separate from our regular transactional databases. We've had the time to analyze what is coming into our databases so we can transform it into something useful. This latest wave of "big data" is taking some of these approaches away from us for two reasons: velocity and volume.

At some point the data becomes too big to handle, and it arrives faster than our systems can keep up with. Now, fast forward to the wonderful world of unstructured data. This world says that we don't really care what incoming data looks like; we'll just store it, and after a while we'll be able to do something useful with it. But just how realistic is this approach? As a database professional, I like to ensure data quality. By introducing unstructured data into my world, you've thrown much of my ability to ensure data quality out the window. I can store it for you. I might even be able to query a lot of it and produce useful insight from it, but over time the data just becomes more and more difficult to manage.

For example, once I've traversed the last two years of web logs and created a dashboard of how often our customers visit each of our web pages, do I keep the detail just in case I come up with a new way to traverse it and create new business knowledge? If I do keep it, do I tie my new business knowledge back to the rows of unstructured data for drill-down purposes? In some shops this may be impossible. My only real option may be to archive it, because while I'm analyzing the bulk unstructured data that is already stored, all my current customers are quickly producing mounds of new data that I'll have to do something with sooner or later.
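
To make the web log example concrete, here is a minimal Python sketch of the summarizing step, assuming the logs are in the Apache/NCSA Common Log Format. The function name `page_hits` is hypothetical. Note how small the result is compared to the raw lines that produced it; that gap is exactly the keep-or-archive decision described above.

```python
import re
from collections import Counter

# Assumes Common Log Format lines, e.g.
# 10.0.0.1 - - [01/Jun/2013:10:00:00 -0400] "GET /products HTTP/1.1" 200 512
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE) (?P<path>\S+) [^"]*"')


def page_hits(lines):
    """Aggregate raw log lines into a {page: hit_count} summary for a dashboard."""
    hits = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:  # silently skip lines that don't look like a request
            path = match.group("path").split("?", 1)[0]  # ignore query strings
            hits[path] += 1
    return hits


logs = [
    '10.0.0.1 - - [01/Jun/2013:10:00:00 -0400] "GET /products?id=7 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jun/2013:10:00:05 -0400] "GET /products HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jun/2013:10:01:00 -0400] "GET /support HTTP/1.1" 200 128',
    "garbage line",
]
print(page_hits(logs).most_common())  # [('/products', 2), ('/support', 1)]
```

The summary could feed the dashboard, while the raw `logs` are what you would decide to keep, archive, or purge.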

To be fair, vendors are giving us ways to deal with this data. Newer open source database technologies have emerged under the NoSQL banner, including document-based solutions such as CouchDB. The Hadoop Distributed File System (HDFS) provides file-based storage that is, in theory, easy to get to and designed to store bulk data. Developers are putting SQL-like interfaces such as Hive on top of HDFS to give those of us with SQL skills access to the data in these new systems. But wait: if the data is truly unstructured, how do I know what I need? If data is coming in from multiple sources and simply being dumped into an open file system, how do I make sense of it?
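
One low-tech way to start answering "what do I actually have?" is to profile which fields show up across records before committing to a schema. This Python sketch assumes the sources land as JSON with one record per line, which is an assumption on my part; `profile_fields` is a hypothetical helper, not part of Hive or HDFS.

```python
import json
from collections import Counter


def profile_fields(json_lines):
    """Report what share of records contain each top-level field.

    Returns (coverage, bad) where coverage maps field -> fraction of valid
    records containing it, and bad counts lines that were not JSON objects.
    """
    field_counts = Counter()
    total = 0
    bad = 0
    for line in json_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad += 1  # count unparseable lines instead of failing the whole run
            continue
        if not isinstance(record, dict):
            bad += 1  # valid JSON, but not a record with named fields
            continue
        total += 1
        field_counts.update(record.keys())
    coverage = {f: n / total for f, n in field_counts.items()} if total else {}
    return coverage, bad


raw = [
    '{"user_id": 1, "page": "/home"}',       # source A
    '{"customer": "ACME", "page": "/cart"}',  # source B names things differently
    "not json",
]
print(profile_fields(raw))
# ({'user_id': 0.5, 'page': 1.0, 'customer': 0.5}, 1)
```

A field present in only half the records (like `user_id` vs. `customer` here) is a hint that two sources describe the same entity differently.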

Well, this is where the database guys come back into the picture. This is also why, in my opinion, the relational database management system is not going anywhere soon. Extraction, transformation and loading (ETL) processes will still need to be written against these large unstructured data sources to turn the data into usable, business-ready forms. The data will need to be tied to valid business entities, such as users, clients or customers, or to real assets such as servers and/or data centers. Without knowing what a piece of unstructured data is directly tied to, it will be difficult, if not impossible, to derive any real value from it.
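
The "tie it to a valid business entity" step of ETL might look something like this minimal Python sketch. The `Customer` record, the `customer_id` key, and the split into loaded and rejected rows are illustrative assumptions; in practice the lookup table would come from your relational system of record.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str


def transform(raw_events, customers_by_id):
    """Split raw events into (loaded, rejected) by matching them to known customers."""
    loaded, rejected = [], []
    for event in raw_events:
        customer = customers_by_id.get(event.get("customer_id"))
        if customer is None:
            # No valid business entity to tie this to; park it for review
            rejected.append(event)
            continue
        loaded.append({
            "customer_id": customer.customer_id,
            "customer_name": customer.name,
            "action": event.get("action", "unknown"),
        })
    return loaded, rejected


customers = {1: Customer(1, "Acme")}
events = [
    {"customer_id": 1, "action": "login"},
    {"customer_id": 99, "action": "login"},  # unknown customer
    {"action": "click"},                     # no customer at all
]
loaded, rejected = transform(events, customers)
print(loaded)         # [{'customer_id': 1, 'customer_name': 'Acme', 'action': 'login'}]
print(len(rejected))  # 2
```

Keeping the rejects, rather than dropping them, gives you something to measure data quality against.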

Let's also not forget that the big RDBMS guys like Oracle and Microsoft are adding big data features to their tool sets. Either that, or they are buying companies that already have tools and creating hooks back into their flagship products. Microsoft has introduced PolyBase with SQL Server 2012 Parallel Data Warehouse, and it promises to tie unstructured big data to its relational counterparts. Oracle has its own NoSQL database and a fully configured Big Data Appliance that is ready to capture your organization's data. These are all relevant and good approaches to the problem, but without data order there is data chaos.

You and your team must take a systematic approach to what data you are capturing and why. Then you must consider its value to the business. Each data set has a relationship to one or more business units within your organization. Once the data is categorized, governance rules must be created. You can't simply say you'll keep all data for all time; that is not realistic and will eventually create chaos. Data retention rules apply here. They may be self-imposed or, in the case of financial institutions, government-imposed, but either way clear rules should be defined.
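
Retention rules are easiest to enforce when they are written down as data rather than tribal knowledge. Below is a minimal Python sketch; the categories and periods in `RETENTION_DAYS` are hypothetical placeholders, not recommendations or regulatory values.

```python
from datetime import date, timedelta

# Hypothetical categories and periods for illustration only; real values
# come from your business units and, where applicable, regulators.
RETENTION_DAYS = {
    "web_logs": 2 * 365,
    "financial_transactions": 7 * 365,
    "marketing_clickstream": 90,
}


def retention_action(category, created_on, today):
    """Return 'keep', 'expired', or 'review' for a data set under the policy."""
    days = RETENTION_DAYS.get(category)
    if days is None:
        return "review"  # uncategorized data has no rule yet, so flag it
    if today - created_on > timedelta(days=days):
        return "expired"  # past retention: archive or purge per policy
    return "keep"


print(retention_action("marketing_clickstream", date(2013, 1, 1), date(2013, 6, 12)))  # expired
print(retention_action("sensor_feed", date(2013, 1, 1), date(2013, 6, 12)))            # review
```

Returning "review" for unknown categories forces the categorization step to happen before data quietly piles up.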

Unstructured data cannot always be handled in real time, either. Clear rules should define which data is handled in which order. If the data is directly relevant to revenue, then it is obviously more important and should be handled with the fastest available applications and hardware. This is where defined disk tiers come in handy. Data that does not need to be instantly accessible can reside on older, slower commodity disks. But if data is needed in real time or near real time, then solid state devices may be warranted.
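
The ordering and tiering rules above can be expressed as a small policy. This Python sketch assumes just two attributes per batch (revenue relevance and whether it is needed in real time); the priority numbers and tier names are hypothetical placeholders rather than specific products. A heap is used so new batches could be pushed as they arrive, although for a fixed list a plain sort would do.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Batch:
    name: str
    revenue_relevant: bool
    needs_realtime: bool


def priority(batch):
    """Lower number = handled first (hypothetical rule set)."""
    if batch.needs_realtime:
        return 0
    if batch.revenue_relevant:
        return 1
    return 2


# Placeholder tier names, one per priority level
TIER_BY_PRIORITY = {0: "ssd", 1: "fast_disk", 2: "commodity_disk"}


def schedule(batches):
    """Return (name, tier) pairs in processing order."""
    # The index breaks ties so Batch objects are never compared directly
    heap = [(priority(b), i, b) for i, b in enumerate(batches)]
    heapq.heapify(heap)
    plan = []
    while heap:
        p, _, b = heapq.heappop(heap)
        plan.append((b.name, TIER_BY_PRIORITY[p]))
    return plan


batches = [
    Batch("archive_logs", revenue_relevant=False, needs_realtime=False),
    Batch("orders", revenue_relevant=True, needs_realtime=False),
    Batch("fraud_alerts", revenue_relevant=True, needs_realtime=True),
]
print(schedule(batches))
# [('fraud_alerts', 'ssd'), ('orders', 'fast_disk'), ('archive_logs', 'commodity_disk')]
```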

If you haven't already been asked to deal with unstructured data, you soon will be. I hope this post has helped you think about some of the pitfalls of dealing with it. Do yourself a favor and start familiarizing yourself with some of the free tools out there. Cloudera, for example, offers a free virtual machine you can download that has a full install of Hadoop, with Pig and Hive, for practice sessions. With a reasonably powered laptop you can have it up and running in half an hour or so. If you don't have time for hands-on practice, at least keep up with industry blogs and whitepapers to stay in touch with what is out there.
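
If you do spin up a practice VM, the classic first exercise is word count, the kind of MapReduce job that Pig scripts and Hive queries are compiled into. This is a local Python sketch of the map and reduce steps only; with Hadoop Streaming each step would be a separate script reading lines from stdin and writing tab-separated key/value pairs, which is not shown here.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit (word, 1) for every word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce step: sum the counts per word.

    Hadoop sorts mapper output by key before the reducer sees it;
    sorted() stands in for that shuffle-and-sort phase here.
    """
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


print(dict(reducer(mapper(["big data big", "Data chaos"]))))
# {'big': 2, 'chaos': 1, 'data': 2}
```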

Here are links to some of the things I talked about in this post:

SQL Server 2012 and PolyBase
Oracle Big Data Appliance
Oracle NoSQL Database
Cloudera QuickStart VM

Bill Schoonmaker
Data Architect, EMC Corp

Friday, January 4, 2013

Welcome to the FitBit Generation

I'm overweight. There, I said it. According to some charts I'm obese, although I don't think of myself that way. I've gone up quite a few pant sizes since my college days. Back then I was a svelte 170 pounds, and I had been pretty active in high school. I was never much of a runner; I just didn't care for it. I did like to lift weights but never really bulked up in any way. I guess I was just "normal" for my height of about 6'1".

The years tend to take a toll on us if we don't continually maintain ourselves. For me, they've been particularly heavy on the pounds. So here I am at the ripe old age of 43, staring an extra 75 pounds or so dead in the face. Some serious health problems can and will begin to crop up when you're carrying around that much extra weight. To be fair, like I said, I don't "feel" like the fat guy in the room most of the time, but I do notice that I am usually the heaviest guy in meetings or at gatherings with my friends. Just putting that in writing tells me that something has to change.

I've dieted, but only once really seriously. About two years ago, before I started my new job, I was a member of the gym where I worked and joined a boot camp program. Three times a week I got up at 6:00 am to exercise before work. The class also required us to keep a food journal. The food journal was really just a way for us to know what was going in, because hey, knowledge is power, right? Anyway, when I started that program I was sitting at around 235. I managed to drop over 15 pounds during the program, but then my world changed. Or maybe it just got re-prioritized.

Starting a new job is stressful. You have people to impress, changes to adapt to, a new commute to get used to and lunches for getting to know people, among other things. From the 218 or so that I had gotten down to, I have managed to put that weight back on and then some, landing at a nice, plump 253 or so, depending on which scale I'm standing on and how many clothes I'm wearing at the time. Let's not even talk about my BMI right now.
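
For reference, the "charts" are usually BMI tables. The standard imperial formula is 703 * weight in pounds / (height in inches)^2. This Python sketch plugs in the numbers from this post (about 6'1", 170 pounds in college, about 253 now) and uses the commonly published adult cut-offs; the function names are just for illustration.

```python
def bmi_imperial(weight_lb, height_in):
    """BMI from pounds and inches: 703 * lb / in^2."""
    return 703 * weight_lb / height_in ** 2


def bmi_category(bmi):
    """Commonly published adult BMI categories."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


HEIGHT_IN = 6 * 12 + 1  # about 6'1"
for label, weight in [("college", 170), ("now", 253)]:
    b = bmi_imperial(weight, HEIGHT_IN)
    print(f"{label}: {b:.1f} ({bmi_category(b)})")
# college: 22.4 (normal)
# now: 33.4 (obese)
```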

As Christmas got closer, the inevitable question came up: what do you want for Christmas? Honestly, I have everything I need and more, but I still thought about it. I had read about new fitness gadgets like the Nike FuelBand, the Fitbit and others. After doing some research I thought the Fitbit would fit into my lifestyle nicely and provide some real data to help me begin making the changes necessary to lose weight. On Christmas morning I opened my new Fitbit One and the matching Fitbit Aria scale. The products work together with a web-based software solution called the Fitbit dashboard.

By signing up and then syncing your devices to your account, you begin to accumulate data that gives you a clear picture of what your intake and output really are. Being the data-oriented guy that I am, I was well into this thing right out of the gate. The Fitbit One tracks the following:

1. Steps taken
2. Calories burned
3. Stairs climbed
4. Sleep quality

Some of these numbers come from tried-and-true formulas based on the body measurements you enter, and the others are derived from actual activity. The typical adult walking stride is about 2.5 feet, which seems to be well in line with my stride, so the Fitbit counts my steps pretty accurately. You can also adjust the stride length if you're shorter or taller. The stair count seems to be driven by changes in elevation: walking up steep hills sometimes counts as stairs, but when I tried it on a stair climber machine, the steps did not register as stairs (although they did register as steps).
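
Here is the arithmetic behind the stride setting, plus a guess at why a stair machine registers steps but not stairs. Assumptions: distance is simply steps times stride, and a "floor" is counted for roughly every 10 feet of ascent measured from elevation readings. That floor rule is a hypothetical illustration, not Fitbit's published algorithm.

```python
FEET_PER_MILE = 5280


def steps_to_miles(steps, stride_ft=2.5):
    """Distance estimate: steps times stride length (2.5 ft default, per the post)."""
    return steps * stride_ft / FEET_PER_MILE


def floors_from_altitude(samples_ft, floor_height_ft=10.0):
    """Count floors from a series of elevation readings (hypothetical rule).

    Ascent accumulates across consecutive rising readings, and one floor is
    counted per floor_height_ft climbed. Any descent resets the partial climb.
    """
    floors = 0
    climb = 0.0
    for prev, cur in zip(samples_ft, samples_ft[1:]):
        delta = cur - prev
        if delta > 0:
            climb += delta
            while climb >= floor_height_ft:
                floors += 1
                climb -= floor_height_ft
        elif delta < 0:
            climb = 0.0  # going down discards the partial climb
    return floors


print(round(steps_to_miles(10_000), 2))            # 4.73
print(floors_from_altitude([0, 5, 10, 15, 20, 25]))  # 2  (real stairs: 25 ft up)
print(floors_from_altitude([100.0] * 6))           # 0  (stair machine: no elevation change)
```

Under this kind of rule, a stair machine produces plenty of steps but no change in elevation, which would match what I saw.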

Calories burned are estimated from your statistics, and you burn calories throughout the day, even when you're sedentary. You set a target weight loss goal (in my case 2 pounds per week), and the software tells you how many calories you can eat for the day. As you record the food you eat, walk, climb stairs and generally go through your daily motions, the calories remaining will change. If you manage to keep your intake below your expenditure, you should lose weight. Sounds simple in theory, right? So why do I need this little device to help me do it?
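
The calorie budget works roughly like this Python sketch. It uses the common rule of thumb that a pound of fat is about 3,500 calories, so a 2-pound-per-week goal means a deficit of about 1,000 calories a day. The function names and the burn figures in the example are hypothetical, and Fitbit's own calculation may differ.

```python
KCAL_PER_POUND = 3500  # common rule of thumb, not an exact physiological constant


def daily_intake_budget(estimated_burn_kcal, loss_lb_per_week):
    """Calories you can eat today and still be on pace for the weekly goal."""
    daily_deficit = loss_lb_per_week * KCAL_PER_POUND / 7
    return estimated_burn_kcal - daily_deficit


def calories_remaining(estimated_burn_kcal, eaten_kcal, loss_lb_per_week=2):
    """What is left to eat after logging food so far today."""
    return daily_intake_budget(estimated_burn_kcal, loss_lb_per_week) - eaten_kcal


print(calories_remaining(3000, 1500))  # 500.0 left for the day
print(calories_remaining(3300, 1500))  # 800.0 after a walk burns another 300
```

This is also why logged activity immediately "buys back" calories: it raises the estimated burn, and the budget moves with it.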

Well, the Fitbit is a data lover's dream! To sync the data, you plug a small wireless receiver into a USB port on your computer. Once that is done and the sync software is installed, each time you come near it with the Fitbit, the data gets uploaded and becomes available in the Fitbit dashboard. This gives you near-real-time views of how your day is progressing toward your weight loss goals. If it's 2:00 pm and you see you only have 500 calories left for the day, you either get off your butt and go walking or stair climbing, or you know that dinner is going to be a Lean Cuisine and maybe an ounce of almonds. If you decide to move, the activity you add immediately gives you some buffer in your calories remaining for the day.

The Fitbit also tries to be cute and helpful with a community of people to interact and compete with. Unfortunately, this is where things fall off a little for me. Yes, I like blogging and sharing my successes and failures to a degree, but do I really want complete strangers checking out my BMI? If you have enough real-life friends who use Fitbit, maybe that would be good, but so far it just seems like a novelty. The other area that is a bit lacking is the user forums. There are several thousand of them, but there is no way to search for forums that are relevant to you or your particular situation. Instead, you're left to click through them page by page until you find one you like. Trust me, you'll get through maybe 10 or so pages before you give up looking. If you do add a few friends, you can decide what to share with them, which is nice (I only share steps taken), and then send messages or cute little taunts or praises, which may help some people but will probably get old very quickly.

The best parts of the Fitbit dashboard for me are centered around the calories that I have left to consume.

Also helpful is the graph that shows the level of activity I've performed for the day. It essentially tells me how often I'm being a slug. As a desk jockey, I find it a great reminder to get up and move around every hour or so!

Other areas of information can also help you get a better overall picture of your health, such as the sleep recorder. The Fitbit One comes with a wrist strap that the tracker slips into. Once it is in and you're ready to go to sleep, you press and hold its one button, which puts it into sleep mode, and then you just go to sleep. When you wake up (to the scheduled silent alarm, if you prefer), you press and hold the button again. Once you walk by the wireless receiver, the data is uploaded and you'll see how well or how badly you slept.

This is cool information to have, but it doesn't really tell me how to fix my sleep or whether it even needs fixing. I just know I woke up a lot. I guess as I map this data out, I will begin to see how I feel with more or less sleep. The most eye-opening thing for me was how many times I usually wake up during the night. At this point I'll also mention that you can pay for a premium membership. For $49.95 you get access to additional graphs and logs, as well as access to real personal trainers who go over your data and help you make positive changes to your plan. Maybe with that I would find out more about my sleep patterns.

So what else is good about the Fitbit? The battery life seems solid. The documentation says it can go 5-7 days without a charge and then needs only 2 hours to fully charge. So far I have plugged it in after 2-3 days and the battery has been around 3/4 full, so it's good to go there. There is also a guy who has created a third-party application that notifies you when it needs to be charged. You can find his page at The unit's form factor is very small, and the belt clip it came with seems very good, with not much movement while walking. I did manage to turn it off accidentally once while driving; I guess my belt or pants had somehow pushed and held the button down. This was disappointing, as it didn't record data until I turned it back on, and there is no way to enter steps directly. You can log activity manually, but only as a distance walked. I suppose this is good, though, as people could otherwise easily cheat the system, and that would ruin the social competition.

The Aria scale is pretty cool. I'll admit I had some serious problems getting it to work when I took it out of the box. After some googling I found that the scale only supports Wireless-B (802.11b), which forced me to change my router settings to turn on support for it. Once I did this, the scale worked great. Some people have also complained about its range. I have mine in the room directly above the router, so no problems there, but your mileage may vary. You can also add multiple users to the scale, which is very nice; I think the limit is 8. This way people can step on it and it logs their data separately. Supposedly there is a slight inconvenience when multiple users have similar body mass, as you have to select your username when stepping off the scale, but that is understandable. How else would it know?
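
As for how a scale could tell users apart, one plausible approach is to compare the new reading with each user's last recorded weight and ask only when the match is ambiguous. This Python sketch is a hypothetical rule with an assumed 5-pound tolerance and made-up user names, not the Aria's actual logic.

```python
def match_user(weight_lb, last_weights, tolerance_lb=5.0):
    """Guess which user stepped on the scale from their last recorded weights.

    Returns the user name if exactly one user is within the tolerance,
    otherwise None, meaning the scale would have to ask.
    """
    candidates = [
        user for user, last in last_weights.items()
        if abs(weight_lb - last) <= tolerance_lb
    ]
    return candidates[0] if len(candidates) == 1 else None


users = {"user_a": 245.0, "user_b": 140.0, "user_c": 142.0}
print(match_user(244.0, users))  # user_a
print(match_user(141.0, users))  # None: user_b and user_c are too close to call
```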

So how have my results been so far? Well, I officially started on December 27th, 2012, and today is January 4th, 2013, so I'm 9 days in. So far I have missed my 10,000 step goal on only one day and have lost 8.1 pounds. I will admit that I'm being pretty good about my food intake. Only one day has slipped, and that was New Year's Eve, when I had far too many beers to stay within my calorie goal. I'm wearing the Fitbit regularly and look forward to meeting my goals every day. I'm connected with a few friends out there, but I'm only sharing steps with them. It is fun to see the leaderboard move up and down as we each work toward our goals. Most importantly, though, this device has opened my eyes to how little physical activity I was getting every day as an IT professional who is bound to a desk. I'm not going to say it has worked yet, because I've only been on the plan for 9 days, but I will say it has been a revelation.

I will post periodically on my progress here, thanks for reading!