What type of questions can ‘Big data’ answer?

‘Big data’ is a recent buzzword.

What is Big data?
The “what is it” part is pretty easy to explain in textbook terms, but the
hype that’s currently surrounding the idea makes it much harder to bring
down to practical application.

In practical terms, Big data means sorting quality, actionable insights from the massive volumes of
less-relevant data available from multiple sources. The focus is on data quality, not quantity.

I would guess ‘Big data’ will find it difficult to kick-start itself in your organization
unless you have an existing data warehouse, data mart, or data collection with at least
a few years’ worth of data on customer behavior.

If you don’t have such a dataset, it may be time to start creating databases that track
customer usage patterns, such as web page click stream analysis and other data points,
collected in the hope of eventually finding some nugget in the haystack of
data. (You may want to take a look at loggr.net to see one method of logging events.) You know, track everything your customers do and store it forever.

What you don’t need is a huge new server to process the petabytes of data: you can
mine the data with a personal laptop.
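
If you are starting from nothing, an event log can be very small. Here is a minimal sketch in Python, assuming a local SQLite file is enough to begin with. The table layout, the function names, and the event types are all my own assumptions for illustration; this is not how loggr.net or any other service does it.

```python
import json
import sqlite3
import time


def open_event_store(path=":memory:"):
    """Open (or create) a small SQLite database for customer events."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " ts REAL NOT NULL,"
        " customer_id TEXT NOT NULL,"
        " event_type TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def log_event(conn, customer_id, event_type, payload=None, ts=None):
    """Append one event; payload is any JSON-serializable dict."""
    conn.execute(
        "INSERT INTO events (ts, customer_id, event_type, payload)"
        " VALUES (?, ?, ?, ?)",
        (
            time.time() if ts is None else ts,
            customer_id,
            event_type,
            json.dumps(payload or {}),
        ),
    )
    conn.commit()


def count_events_by_type(conn):
    """Return {event_type: count} across all logged events."""
    rows = conn.execute(
        "SELECT event_type, COUNT(*) FROM events GROUP BY event_type"
    )
    return dict(rows.fetchall())
```

Calling log_event(conn, "c1", "page_view", {"url": "/books"}) on every page view is enough to start building the history that later analysis depends on. Pass a file path to open_event_store to keep the data between runs.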

Also see
BigQuery, Google’s data analysis platform, to judge whether it can be used as a platform for mining big data.

Does big data mean you collect every bit of information about your customers?
No, you don’t have to store everything. Place value on key customer data alone.

Democratization of data means more individuals in your company can mine the data you already own. Possibly the biggest source for “big data” analysis is something you already have: your Google Analytics data export, worth a few years of visitors’ activity details.

How is big data different from analytics?
Big data enables you to uncover the hidden patterns that provide the answers you really need.

We rarely engage in conversation about making meaning out of data.
Knowing what to make of the data is more important than having access to reams of data. Proper analysis is more important than more data.

“Know what you need answers to”

My thoughts on the types of questions that Big data aims to answer, without anyone needing to ask them first:

Focus on Activity and Action (what is the user trying to do, what have users already done in the past?)

Identify and/or articulate patterns
(“It looks like customers in Chicago buy more copies of this specific book than customers anywhere else”)
Put data in perspective, using historical comparisons
(“Buyers from Florida are ordering more of part #234AZ in 2012 than they did in the years 2005-2008”)
Contemporary comparisons
(“More crates of bottled water were sold in Atlanta during March-April 2012 than in Houston in the same period”)
Identify anomalies
(“During this time of year, we usually sell more leafblowers than we are selling now.”)
Identify and/or articulate relationships
(“Within the state of Washington, restaurants in Seattle and surrounding areas have been the biggest consumers of Evian water since January 2012”)
Identify causations (cause-effect)
(“The government mandated the use of biodegradable teabags, so we and our 5 competitors, who are the only makers of the porous filter paper used in teabags, have experienced strong demand in 2012”)
Identify correlations
(“There has been an increase in the number of glue bottles sold and the number of kite-making kits sold, reported from two distinct industry lines”)
Glean actionable insights
(“More Evian water bottles are being sold where there are more checkins from food trucks in New York”)
Clarify assumptions
(“The reason there are more checkins between 4-6 PM in restaurants/eateries on 54th Street (as compared to 51st Street) may be that OpenTable shows the most reservations already made at 51st Street establishments”)

Make gentle extrapolations or Predictions
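
A few of the question types above (historical comparisons, anomalies, correlations) come down to simple arithmetic once the data is in hand. Here is a rough sketch using only the Python standard library. The function names, the two-standard-deviation threshold, and any numbers you feed in are my own assumptions, not a recommendation of a particular method.

```python
from statistics import mean, stdev


def percent_change(old, new):
    """Historical comparison: change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


def is_anomaly(history, current, threshold=2.0):
    """Flag `current` if it lies more than `threshold` sample standard
    deviations away from the mean of `history` (needs >= 2 values)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5
```

For the leafblower example, is_anomaly(sales_same_month_in_past_years, sales_this_month) would flag a month that is well out of line with its own history. For the glue and kite-kit example, pearson(glue_sales, kite_kit_sales) close to 1 says the two series move together. It does not say that one causes the other.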

To produce this kind of ‘asked-for’ and ‘articulated’ competitive intelligence, any Big Data tool in the works needs to be powerful, flexible, and customizable to a level I can only dream of. It needs the ability to zoom in and zoom out: sometimes you want to slice and dice correlating data variables by product categories or lines, but at other times by specific products.
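
To make “zoom in and zoom out” concrete, here is a small sketch that totals the same sales rows at whatever level of detail you ask for. The field names (category, product, units) and the function total_by are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def total_by(sales, *keys):
    """Sum the `units` field of each row, grouped by the given keys.

    Fewer keys zooms out (e.g. by category); more keys zooms in
    (e.g. by category and product).
    """
    totals = defaultdict(int)
    for row in sales:
        group = tuple(row[k] for k in keys)
        totals[group] += row["units"]
    return dict(totals)
```

With the same rows, total_by(sales, "category") gives the zoomed-out view and total_by(sales, "category", "product") the zoomed-in one. A real tool would need far more than this, but the idea of switching the grouping keys stays the same.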

Update April 13, 2012:
Related content:
Data as seeds of content
The idea that, not only visualizations, but also content (long-form or short bulleted points) can be created automatically from data analysis programs is being explored by a company called “Automated Insights” (automatedinsights.com). This idea is particularly appealing: “summaries written entirely by software, that derive insight from data”.
