When an organisation embarks on a Big Data project, the journey is laden with landmines. Failing to manage even one risk can derail the project, even if every other risk has been successfully evaded. As the saying goes, “A chain is only as strong as its weakest link.”
So what are the weak links in a Big Data project? What are the 5 key questions to ask before embarking on one, so that the risks from these weak links can be mitigated and the project steered towards a successful implementation? Based on real-life experience from the trenches, Flutura Decision Sciences has outlined the top 5 Big Data questions we feel are vital to pose upfront, before spending dollars on a big data project.
The “Dent” test: What is the $ denting business use case we are enabling?
Many engagements are “data forward” as opposed to “use case backward”. It is very important to fully understand the $ impact of the use case being instantiated and the business value of the new data pools being streamed for analysis. For example, how much of an increase in revenue do we expect when we build a recommender engine on a Hadoop cluster to increase the breadth of purchase for online customers? If that is the use case, a value tracker can record the incremental revenue attributed to recommendations from the big data solution that convert into sales. Along with identifying the use cases, one of the first tasks at hand is to identify the pools of big data lying untapped and to answer the following questions: Do I have big data within my premises? How do I identify it? What can I do with it?
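To make the value tracker idea concrete, here is a minimal, illustrative sketch (not Flutura’s implementation) of attributing revenue to recommendation-driven sales. The field names (order_id, revenue, was_recommended) are hypothetical, and a rigorous tracker would also compare against a control group that receives no recommendations.

```
# Illustrative value-tracker sketch (hypothetical field names):
# sum the revenue of orders where the purchased item was surfaced by the
# recommender, giving a simple measure of the "$ dent" the use case creates.

def recommendation_revenue(orders):
    """Revenue from orders attributed to a recommendation."""
    return sum(o["revenue"] for o in orders if o.get("was_recommended"))

orders = [
    {"order_id": 1, "revenue": 120.0, "was_recommended": True},
    {"order_id": 2, "revenue": 80.0,  "was_recommended": False},
    {"order_id": 3, "revenue": 45.0,  "was_recommended": True},
]

print(recommendation_revenue(orders))  # 165.0
```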
The “Intersect” test: Which event data streams are we finding value in?
Often value lies at the intersection of multiple data streams. For example, in a telecom engagement Flutura worked on, there were many devices at the periphery emitting events – cell phone towers, firewalls, routers, switches, application logs, OS logs and so on. Whenever an adverse event happens – say a denial-of-service attack on a provider – it is important to triangulate the effect across router logs, firewall logs and application logs.
Similarly, in a real-life engagement Flutura worked on for an online travel agency (OTA), the value of the new scenarios instantiated on the Hadoop cluster lay at the intersection of three streams: Apache log files recording every click event of the user along with the cookie id & IP address; search events (from city, to city, date, number of passengers flying) recorded in a MySQL search log; and the actual booking/payment events stored in an Oracle database. The new Hadoop cluster enabled the organisation to compute look-to-book at a customer level as opposed to an aggregate corridor level. So which intersection of high-velocity, unstructured event streams do we need to look to for value?
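The look-to-book computation itself is simple once the streams are joined on a common key. The sketch below is a minimal illustration of the idea, assuming hypothetical record layouts; in the actual engagement the clicks came from Apache logs on Hadoop, searches from MySQL and bookings from Oracle, whereas here plain Python lists stand in for those sources.

```
# Per-customer look-to-book: "looks" are searches, "books" are confirmed
# bookings, joined on cookie_id. Record layouts here are hypothetical.
from collections import defaultdict

searches = [
    {"cookie_id": "c1", "from_city": "BLR", "to_city": "LHR"},
    {"cookie_id": "c1", "from_city": "BLR", "to_city": "JFK"},
    {"cookie_id": "c2", "from_city": "DEL", "to_city": "DXB"},
]
bookings = [
    {"cookie_id": "c1", "from_city": "BLR", "to_city": "JFK"},
]

def look_to_book(searches, bookings):
    """Return (looks, books) counts per cookie id rather than per corridor."""
    looks, books = defaultdict(int), defaultdict(int)
    for s in searches:
        looks[s["cookie_id"]] += 1
    for b in bookings:
        books[b["cookie_id"]] += 1
    return {cid: (looks[cid], books[cid]) for cid in looks}

print(look_to_book(searches, bookings))  # {'c1': (2, 1), 'c2': (1, 0)}
```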
The “Tool Components” test: How do I know which components are relevant for my use case – columnar DBs, document databases, machine data tools, complex event processing, etc.?
The Big Data landscape is laden with tools – columnar databases (Infobright, Vertica), appliances (HANA, Exadata), complex event processing frameworks (S4), algorithmic libraries (R, Mahout, etc.), machine data tools (Splunk), document databases (CouchDB, Lucene, MongoDB) and so on. There is very little guidance on which scenarios require which kind of constructs. So a very pertinent question is: what decision tree do I need to follow to arrive at the architectural constructs required to deliver my business use case?
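As an illustration only (deliberately oversimplified, and not a vendor recommendation), such a decision tree can be thought of as a mapping from scenario characteristics to the tool categories listed above; real architecture choices of course weigh many more dimensions such as volume, latency, skills and cost.

```
# Toy decision helper: map a few scenario characteristics to a tool category.
def suggest_component(data_shape, latency, workload):
    if latency == "real-time" and workload == "event-detection":
        return "complex event processing framework (e.g. S4)"
    if data_shape == "machine-logs":
        return "machine data tool (e.g. Splunk)"
    if data_shape == "documents":
        return "document database (e.g. MongoDB, CouchDB)"
    if workload == "analytical-aggregation":
        return "columnar database (e.g. Infobright, Vertica)"
    if workload == "statistical-modelling":
        return "algorithmic library (e.g. R, Mahout)"
    return "revisit the use case before choosing a component"

print(suggest_component("machine-logs", "batch", "troubleshooting"))
# machine data tool (e.g. Splunk)
```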
The “Chunk” test: Are we delivering a high-impact business output in 60-90 days?
In most organisations with traditional DW mindsets, it is not uncommon for the first deliverable to reach the business 8-12 months from the project start date. While executing a Big Data project, it makes sense to “chunk” the use cases into 60-90 day deliverables, so that the project builds momentum with the business and accelerates the much-needed funding for subsequent phases.
The “Co-existence” test: What’s your co-existence strategy with traditional BI solutions?
Even though new-age big data solutions have dramatically raised performance expectations and information-handling capability, that does not mean the end of traditional BI solutions. One must have a co-existence strategy with traditional BI solutions, because those data processes carry a lot of embedded business rules and money should not be spent recreating them. So how can new-age Big Data solutions co-exist with existing BI solutions and other components in the existing IT ecosystem?
Data scientists at Flutura Decision Sciences have seen the importance of managing weak links in a Big Data implementation by asking 5 important questions. To summarize, the 5 key questions to ask are:
1. “Dent” test: What is the $ denting use case using the big data stack?
2. “Intersect” test: Which event data streams are we finding value in?
3. “Tool Components” test: Which Big Data components are required, and when?
4. “Chunk” test: Are we delivering a high-impact business output in 60-90 days?
5. “Co-existence” test: How do we co-exist with the existing DW/BI solutions in place?
More than a century ago, Louis Pasteur made a profound statement: “Chance favours the prepared mind.” Flutura Decision Sciences strongly believes this holds true in today’s “data-soaked” world as well: when organisations embark on Big Data solutions, these 5 key Big Data questions pave the way for a successful implementation.