December 27, 2012

5 Disruptive Big Data Use Cases in Health Care Analytics to Heal Body & Soul


Big data + health care is an intersection we at Flutura are passionate about, for 3 reasons:

Reason-1: It has large pools of untapped data which have not been “juiced” for patient care intelligence
Reason-2: Healthcare is definitely ripe for disruption using advanced data analytics
Reason-3: The outcomes of disruptive transformation touch people in a very human way



These beliefs got us moving to nail the use cases which can move the patient experience needle. We have created an extensive catalog of health care use cases which are still white-space opportunities to improve caregiver efficiency and heighten the patient’s experience. As caregivers gain access to wider and deeper data pools, they have started putting this data into a ‘data blender’, thereby moving the needle on both caregiver efficiency metrics and patient experience metrics.

In this blog we will share 5 sample use cases from our extensive use case catalogue, designed to extract knowledge about patient care patterns from the unstructured transcripts of doctors, nurses and diagnostic labs, along with location data available from newly instrumented devices.

So what are the 5 use cases that can help?


Use case-1: Keyword mining of doctor/lab transcripts using text mining, and correlations to patient outcomes


A patient interacts with the hospital through multiple touch points: the nurse touch point, the doctor touch point, the diagnostic touch point, etc. At each touch point encounter, a lot of semi-structured information regarding the patient’s condition is captured. For example, a CAT scan report would have a preliminary interpretation regarding the state of the nerves and the blood flow conditions of the brain. This unstructured text from a doctor, a diagnostic lab or a nurse’s observations is a rich gold mine of intelligence regarding the patient’s condition, and it can influence the patient’s outcome.


However, this rich source of information regarding a patient’s condition remains untapped. A text mining process can be used to harvest this intelligence into a Clinical Disease Repository (CDR) built around disease-specific keyword watch lists, for example:
- Coronary keyword watch list
- Diabetic keyword watch list


The Clinical Disease Repository can then surface early warning signals: correlations between the frequency of occurrence of specific words in the unstructured text and the clinical outcome, as in the sketch below.
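As an illustration, here is a minimal sketch of the watch-list scan, assuming transcripts arrive as plain strings; the keywords, watch lists and sample notes are illustrative, not clinically curated:

```python
import re
from collections import Counter

# Illustrative watch lists; a real CDR would curate these clinically.
WATCH_LISTS = {
    "coronary": {"angina", "palpitation", "chest pain", "breathlessness"},
    "diabetic": {"polyuria", "fatigue", "blurred vision", "thirst"},
}

def keyword_hits(transcript, watch_list):
    """Count watch-list keyword occurrences in one free-text transcript."""
    text = transcript.lower()
    return sum(len(re.findall(re.escape(kw), text)) for kw in watch_list)

def score_transcripts(transcripts):
    """Roll up keyword frequencies per watch list across all transcripts."""
    totals = Counter()
    for t in transcripts:
        for name, kws in WATCH_LISTS.items():
            totals[name] += keyword_hits(t, kws)
    return totals

notes = [
    "Patient reports chest pain and breathlessness on exertion.",
    "Complains of thirst and blurred vision; fatigue for two weeks.",
]
print(score_transcripts(notes))  # Counter({'diabetic': 3, 'coronary': 2})
```

Once such frequencies are rolled up per patient, correlating them with recorded clinical outcomes (for example via a contingency-table test) is what turns the watch lists into early warning signals.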


Use case-2: Location-aware application analytics for enhancing customer experience and optimizing nurse/doctor deployment


A range of new solutions within hospitals embed RFID chips into the patient’s, doctor’s or nurse’s card, relaying that person’s location in real time. This location data is a genuinely new data pool with huge implications for managing the patient experience and optimising resources within a hospital. For example, we can create simple metrics like the nurse/patient ratio and a nurse mobility index. We can also build models to measure the strength of the relationship between the patient satisfaction index and the nurse/patient ratio, and then define optimal nurse/patient ratios for different sections of the hospital (the OPD, cardiology and paediatric wards, for example, may need higher nurse/patient ratios than, say, the dental department). Once these are set, any time a ratio crosses its threshold an alert can be sent to the head nurse to alleviate the risk of an under-serviced patient. We can also use the nurse mobility index to decide how the various departments should be co-located within the hospital to improve patient outcomes and optimize the use of expensive health care equipment. A minimal sketch of the ratio alert follows.
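This sketch assumes the RFID feed has already been reduced to (badge type, ward) tuples from the latest location sweep; the ward names and ratio floors are illustrative assumptions:

```python
from collections import Counter

# Illustrative thresholds: cardiology and paediatrics need richer staffing.
MIN_NURSE_PATIENT_RATIO = {"cardiology": 0.25, "paediatrics": 0.25, "dental": 0.10}

def ratio_alerts(location_events):
    """location_events: (badge_type, ward) tuples from the latest RFID sweep.
    Returns wards whose nurse/patient ratio has fallen below its floor."""
    nurses, patients = Counter(), Counter()
    for badge_type, ward in location_events:
        (nurses if badge_type == "nurse" else patients)[ward] += 1
    alerts = []
    for ward, floor in MIN_NURSE_PATIENT_RATIO.items():
        if patients[ward] and nurses[ward] / patients[ward] < floor:
            alerts.append((ward, nurses[ward], patients[ward]))
    return alerts

events = [("nurse", "cardiology")] + [("patient", "cardiology")] * 8
print(ratio_alerts(events))  # [('cardiology', 1, 8)] -> page the head nurse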


Use case-3: Telemedicine Analytics



Telemedicine platforms can go to the patient when it is difficult for the patient to come to the hospital. A telemedicine platform can capture various vitals of the patient, like temperature, heart rate, blood pressure and ECG, which can be streamed to a central repository in real time via satellite. Once collated, a series of triggers can be placed on the data to sense and respond to real-world health conditions, for example the two alerts below (ALERT-1 is sketched in code after the list).
ALERT-1: If a statistically significant growth in the concentration of high blood pressure is found (via a chi-square test) for males in the age group 30-45 in a specific zip code, say 08837, then hold road shows to sensitize the inhabitants of that zip code to healthy eating habits.
ALERT-2: If more than 10% of patients migrate, based on actual diagnosis events, from cluster-2 to cluster-5, then proactively import preventive medicine in bulk to cater to the growing need.
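A sketch of how ALERT-1 might be wired up, assuming the streamed vitals have already been collated into simple counts and that scipy is available for the chi-square test; the counts and the zip_bp_alert helper are hypothetical:

```python
from scipy.stats import chi2_contingency

def zip_bp_alert(zip_elevated, zip_normal, base_elevated, base_normal, alpha=0.05):
    """ALERT-1 style trigger: is elevated BP among males 30-45 in one zip code
    significantly higher than the baseline population? 2x2 chi-square test."""
    table = [[zip_elevated, zip_normal], [base_elevated, base_normal]]
    chi2, p, _, _ = chi2_contingency(table)
    zip_rate = zip_elevated / (zip_elevated + zip_normal)
    base_rate = base_elevated / (base_elevated + base_normal)
    return p < alpha and zip_rate > base_rate

# Hypothetical counts streamed from telemedicine units in zip 08837.
if zip_bp_alert(zip_elevated=64, zip_normal=150, base_elevated=900, base_normal=9100):
    print("Schedule healthy-eating road shows in zip 08837")
```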


Use case-4: Apriori sequence analysis to define new clinical pathways



Apriori algorithms can be used to unearth interesting sequences of events occurring close to each other before a clinical outcome. These are time-ordered sequences of events, and they help us create episode rules like “if ‘restlessness’ and ‘insomnia’ occur in the transcripts, there is a 60% chance that a coronary episode is imminent”. Such rules can trigger proactive interventions which reduce the chances of an adverse event or a hospital admission, as the sketch below illustrates.
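A simplified sketch of the idea: full Apriori mines frequent itemsets of arbitrary size, but pairwise co-occurrence with a confidence cut-off already illustrates how episode rules emerge. The patient histories, the 2-item limit and the thresholds are simplifying assumptions:

```python
from collections import Counter
from itertools import combinations

def episode_rules(patient_events, min_confidence=0.5, min_support=2):
    """patient_events: list of (set of transcript keywords, outcome_followed)
    per patient. Returns rules like {('insomnia','restlessness'): confidence}
    where confidence = P(outcome | keyword pair observed)."""
    antecedent_counts, rule_counts = Counter(), Counter()
    for events, had_outcome in patient_events:
        for pair in combinations(sorted(events), 2):
            antecedent_counts[pair] += 1
            if had_outcome:
                rule_counts[pair] += 1
    return {pair: rule_counts[pair] / n
            for pair, n in antecedent_counts.items()
            if n >= min_support and rule_counts[pair] / n >= min_confidence}

history = [
    ({"restlessness", "insomnia"}, True),
    ({"restlessness", "insomnia", "nausea"}, True),
    ({"restlessness", "insomnia"}, False),
    ({"nausea"}, False),
]
print(episode_rules(history))
# {('insomnia', 'restlessness'): 0.666...} -> roughly the "60% chance" rule
```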


Use case-5: Adverse events signal analysis sandbox


Adverse events are a reality of today’s health care environment. What constitutes an adverse event depends on the context:
- Schizophrenic event
- Angina event
- Mortality event

Examples of signals are:
- Frequency of low-grade fever > 3 episodes in the last 90 days
- Recency of the last episode of memory failure ...

One can build a disease signal repository: an early warning system for the hospital consisting of the strongest predictors of a patient outcome. A sketch of computing such signals follows.
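A minimal sketch of how such frequency/recency signals might be computed per patient; the signal names, dates and threshold are illustrative:

```python
from datetime import date

def signal_features(episodes, signal_name, today, window_days=90):
    """Frequency/recency signals for one patient and one signal type.
    episodes: list of (date, signal_name) clinical events."""
    relevant = sorted(d for d, name in episodes if name == signal_name)
    in_window = [d for d in relevant if (today - d).days <= window_days]
    return {
        "frequency_90d": len(in_window),
        "days_since_last": (today - relevant[-1]).days if relevant else None,
    }

history = [(date(2012, 10, 2), "low_grade_fever"),
           (date(2012, 11, 5), "low_grade_fever"),
           (date(2012, 11, 20), "low_grade_fever"),
           (date(2012, 12, 10), "low_grade_fever")]
feats = signal_features(history, "low_grade_fever", today=date(2012, 12, 27))
if feats["frequency_90d"] > 3:  # threshold from the signal repository
    print("Early warning: recurring low-grade fever", feats)
```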


Care providers can learn from adverse events in a systematic way, extracting previously unknown signals which can be used to mitigate their recurrence. These signals are often small pieces of evidence which, when triangulated, amplify each other. They are buried in the avalanche of patient information, and most of the time there is no bandwidth within the caregiver’s organisation to extract them and reduce the number of blind spots around the adverse event.


Conclusion
To summarize, we at Flutura strongly believe patient data is the lifeblood of a caregiver and must be treated like gold. A patient care intelligence platform can serve as a bridge between a care provider’s good intentions and the patient’s experience of the human touch, by harvesting new pattern signals to trigger personalized actions.

Bertrand Russell was right when he said “One must care about a world one cannot see”. Flutura is proud to advance these 5 health care use cases which can be used to “see” a new world of health care signals in the avalanche of patient care data, thereby improving the human condition.

December 8, 2012

8 TESTS TO DECODE THE BUSINESS ACUMEN OF A DATA SCIENTIST



A data scientist at Flutura has to wear multiple hats in order to deliver next-generation analytical solutions in the sectors we operate in, namely energy, telecom, digital and health care. Specifically, he/she has to wear 3 hats:

- The BUSINESS hat
- The MATH hat
- The DATA hat

Most of the time it’s easy to fathom the depth of a data scientist’s math/algorithmic knowledge and the depth of his/her understanding of handling high-velocity data and unstructured data points. But one common area of weakness is the business dimension. So how do you decide whether a data scientist can be put in front of the business? This blog describes 8 tests Flutura executes to decode the business acumen of a data scientist.

Test-1: “RESONANT STORY TELLING” TEST


Human beings are wired to listen to stories more than to read numbers. Flutura data scientists doing forensics on mobile app funnel drops for an online travel agency were able to distil the quintessential essence: the mobile user who was getting dropped was a twenty-something, last-minute booker travelling between metros, trying to complete the transaction on a Samsung mobile running Android OS, and the friction point was the payment gateway.
Therefore:
- Can the data scientist translate numbers into stories? This is a very important tool for building bridges with business; otherwise the data scientist risks getting stuck in the world of math, unable to make the connection.

Test-2: THE “STRING OF PEARLS” TEST


It’s very important for a data scientist to triangulate across key insights. A Flutura data scientist working on a telecom security use case connected the dots when he saw a correlation between multiple failed login attempts, a successful patch download event and a surge in network traffic: the surge was the result of a security hole in the patch which was downloaded.
Therefore
-         Can the data scientist connect the dots and form a “necklace” from the pearls of insights discovered from cryptic log file data points?

Test-3: “NEEDLE MOVEMENT” TEST


One of the biggest risks in a big data project is not using data to solve the right problem. There are many use cases a data scientist can curate, so how do we separate the use cases which are $ denting from the use cases which have marginal impact? Big data use cases can be segmented into 2 categories: those which move the needle incrementally and those which disrupt. It’s very important to keep this distinction in mind. Flutura was able to shepherd an ecommerce company into introducing new payment products after observing that most transactions were being dropped at the payment gateway. This minor tweak removed the friction point and produced a huge upswing in revenue.
Therefore
- Can the data scientist tease out business themes where a use case can unlock disproportionate revenue-making potential for the organisation?
- How would the data scientist go about teasing out the business themes that move the needle?
- Which “impact zones” in a business process are “ripe” for big data?

Test-4: “SNIFF THE DOMAIN OUT” TEST


Let’s face it: data-driven domain knowledge reduces the learning curve required to understand a domain, and it runs deeper than armchair experiential knowledge. Multiple engagements Flutura has executed have proven to us that a data scientist can glean far more knowledge about the nuances of a business by getting his/her hands dirty with exploratory data analysis (EDA) and eyeballing univariate and bivariate results.
Therefore
-         Can the data scientist “sniff the domain out” by examining EDA outputs and getting the business to put the numbers in context?

Test-5 : “ACTIONABILITY” TEST


In most engagements, the end result is a suave-looking PPT with lots of eye-candy graphs which produce a feel-good effect, but business is left wondering what actions can be driven out of the exercise. At Flutura our mantra has been “actions, not insights”. One of the use cases we executed resulted in high-value customers who were vulnerable to churn being redirected in real time to high-touch contact centre agents, who would call them instantly and offer an instant rebate to woo them back.
Therefore
- What was the data scientist’s role in operationalizing actions, or did his/her prior engagements end with recommendations? There is a big difference between the two.

Test-6 :  “USE CASE CURATION” TEST


Carving out new use cases and possibilities from a new data pool is both an art and a science. A Flutura data scientist was able to use search logs, which were typically discarded, to decode the travel intent of an online booker: is this a price-sensitive traveller or a value-conscious traveller? Is the traveller an early bird or a last-minute booker? This use case, creating behavioural tags from search logs, resulted in more intelligent outbound actions.
Therefore
- Given a raw data set, can the data scientist take 3-5 minutes to curate an interesting possibility from it?
- Where would he or she start in the big data ocean, and how would he or she zero in on the right ‘catchment’ of use cases?

Test-7: THE “NORTH POLE” TEST


Every big data voyage requires a north pole: a measure of success for the engagement. A data scientist must be extremely clear on what constitutes success for the business stakeholders, be it a sandbox setup or a full-fledged production setup of a Hadoop cluster.
Therefore
-         Can the data scientist work with business to articulate the ‘as is’ state and the expected ‘to be’ state of the decision making process after the analytical solution is implemented?

Test-8 : THE “WHAT DO YOU SEE” test


The ability to take an analytical output and translate it into a series of English statements constitutes Flutura’s “What do you see” test. The sample analytical outputs can be:
- Keyword frequencies from text mining
- Scatter plots
- Box plots measuring the behavioural volatility of customer balances
- Bivariate cross-tab outputs
- Clusters from a segmentation output, etc.
So:
- Can the data scientist construct 3-4 meaningful English statements from the above sample analytical outputs?
If so, he/she has crossed the big chasm from math to a business pattern which can be perceived by business.



So, in a nutshell, here are the 8 tests and the questions to ask:
-         “RESONANT STORY TELLING” TEST
o   Can the data scientist narrate a compelling and resonant story from the data patterns?
-         “STRING OF PEARLS” TEST
o   Can the data scientist connect the dots and form a “necklace” from the pearls of insights discovered from cryptic log file data points?
-         “NEEDLE MOVEMENT” TEST
o   Which are the best “impact zones” for use cases which are “ripe” for big data?
-         “SNIFF THE DOMAIN OUT” TEST
o   Can the data scientist “sniff the domain out” by examining analytical outputs and getting the business to put the numbers in context?
-         “ACTIONABILITY” TEST
o   What was the data scientist’s role in operationalizing actions or did his prior engagements end with recommendations?
-         “USE CASE CURATION” TEST
o   Given a raw data set, can the data scientist take 3-5 minutes to curate an interesting possibility from it?
-         THE “NORTH POLE” TEST
o   Can the data scientist work with business to articulate the ‘as is’ state and the expected ‘to be’ state of the decision making process after the analytical solution is implemented?
-         THE “WHAT DO YOU SEE” TEST
o   Can the data scientist construct 3-4 meaningful English statements from clustering outputs, keyword frequencies, box plots and other analytical outputs?


These tests are by no means collectively exhaustive or perfect, but they serve as a reasonable starting point for getting the right DNA of data scientists into the organisation. Otherwise we run the risk of people who just know how to create a Hadoop cluster :) being labelled data scientists.
As the saying goes “The real voyage of discovery consists not in seeking new landscapes but in having new eyes.”- Marcel Proust
Good luck with your efforts to recruit the rare species – the holistic data scientist :) !!!

October 6, 2012

32 Big data "gotchas" from the trenches in exactly 3 words



1.Usecase ! Usecase ! Usecase

2.Decode intent proxies!

3.Think "20-100 X Scalability" blindspots

4.Actions not insights

5.Frame unanswered questions !

6.Embed MachineLearning processes !

7.Humanize analytical output !

8.Ingest unstructured data

9.Quantify $ impact !

10.Deepdive into "Architecture weaklinks" !

11.Iterate ! Iterate ! Iterate !

12.Deliver immersive interface !

13.Filter out 'architecture noise'

14.Don't over Engineer !

15.Tightly couple frontline-channels !

16.Mash disparate datapools !

17.Emit realtime response !

18.Map Architecture decision-tree

19.Decompose analytical workloads !

20.Think Columnar architecture !

21.Start Datascience Bootcamp !

22.Avoid 100 pg design !

23.Time to impact < 60 days !

24.Zero licensecost Sandbox !

25.Surface $denting usecase !

26.Curate "wow" scenarios !

27.Distill impactful patterns !

28.Think actions operationalized !

29.Align with business !

30.Woo a "Big Daddy" !

31.Pick Datascientists carefully !

32.Build DW-Bigdata bridges !



August 12, 2012

11 Core Big Data Workload Design Patterns



As big data use cases proliferate in telecom, health care, government, Web 2.0, retail etc, there is a need to create a library of big data workload patterns. Flutura has created a set of big data workload design patterns to help map out common solution constructs. There are 11 distinct workloads showcased here which recur across many business use cases.

A big data workload design pattern is a template for identifying and solving commonly occurring big data workloads. The workloads stretching today’s storage and computing architectures may be human generated or machine generated. A given design pattern may manifest itself in many domains, like telecom or health care, but irrespective of the domain it manifests in, the same solution construct can be used. Once the set of big data workloads associated with a business use case is identified, it is easy to map the right architectural constructs required to service the workload: columnar databases, Hadoop, name-value stores, graph databases, complex event processing (CEP) and machine learning processes.

Here is a bird’s-eye view of the various workload patterns.



Data Workload-1: Synchronous streaming real-time event sense-and-respond workload
This essentially consists of matching incoming event streams against predefined behavioural patterns and, after observing signatures unfold in real time, responding to those patterns instantly.
Example: In a registered-user digital analytics scenario, one specifically examines the last 10 searches done by a registered digital consumer, so as to serve a customized and highly personalized page consisting of the categories he/she has been digitally engaged with. Also, depending on whether the customer has done a price-sensitive search or a value-conscious search (which can be inferred by examining the search order parameter in the click stream), one can render budget items first or luxury items first.
Similarly, let’s now switch over to a health care situation. In hospitals, patients are tracked across three event streams in real time: respiration, heart rate and blood pressure. (An ECG is supposed to record about 1000 observations per second.) These event streams can be matched for patterns which indicate the beginnings of fatal infections, and medical interventions put in place, as in the sketch below.
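A toy sense-and-respond sketch over a merged vitals stream: the windowing idea is the point, while the thresholds and the “signature” itself are invented for illustration (a real system would use clinically validated rules):

```python
from collections import deque

def sense_and_respond(vitals_stream, window=5):
    """Match a simple signature over synchronized vital-sign streams:
    sustained high heart rate plus falling blood pressure in one window.
    vitals_stream yields dicts like {'hr': 128, 'bp_sys': 92, 'resp': 22}."""
    recent = deque(maxlen=window)
    for sample in vitals_stream:
        recent.append(sample)
        if len(recent) == window:
            sustained_tachycardia = all(s["hr"] > 120 for s in recent)
            falling_bp = recent[-1]["bp_sys"] < recent[0]["bp_sys"] - 15
            if sustained_tachycardia and falling_bp:
                yield {"alert": "possible infection signature", "window": list(recent)}

# Simulated stream: rising heart rate, dropping systolic blood pressure.
stream = [{"hr": 125 + i, "bp_sys": 110 - 5 * i, "resp": 20} for i in range(6)]
for alert in sense_and_respond(iter(stream)):
    print(alert["alert"])
```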

Data Workload-2: Ingestion of high-velocity events - insert-only (no update) workload
This is a workload widely experienced while ingesting terabytes of sensor and machine generated data. These are insert-only workloads with no update or lookup operations.
Example: Ingesting millions of micro events streaming from log files, firewall alarms, sensor data and the click stream torrent. It is estimated that a Boeing flight has the potential to generate 200 terabytes of data on a single flight: data from vibration sensors, temperature sensors, strain gauges, position data, speed etc. Imagine ingesting all this data for all flights!

Workload-3: High-node social graph traversal
This is a workload where finding interrelationships among the nodes of a network is vital. It is computation and read intensive, as node statistics need to be computed and the children of a node need to be read dynamically.
Example: In the telecom industry, with millions of prepaid and postpaid subscribers, the CDR (Call Detail Record) logs from switches run to terabytes and contain important patterns regarding the interrelationships between subscribers. These can be mined using graph databases to understand whether newly downloaded gaming applications are going viral within friends-and-family circles, by running computation-intensive graph traversals.
Similarly, on social websites millions of interrelationships are stored as a graph, and one needs to traverse large complex graphs to map the key influencers capable of influencing a marketing outcome, or to recommend a friend and expand the social network to its edges; a toy traversal is sketched below.
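A production system would use a graph database for this workload; purely to illustrate the traversal, here is a toy breadth-first reach computation on an in-memory call graph (the graph and the 2-hop reach proxy for influence are assumptions):

```python
from collections import deque

# Toy call graph built from CDRs: subscriber -> set of contacts.
call_graph = {
    "A": {"B", "C", "D"}, "B": {"A", "E"}, "C": {"A"},
    "D": {"A", "E", "F"}, "E": {"B", "D"}, "F": {"D"},
}

def reach_within(graph, source, hops):
    """BFS: how many subscribers could be reached within N hops,
    a crude proxy for virality through friends-and-family circles."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return len(seen) - 1  # exclude the source itself

# Rank subscribers by 2-hop reach to shortlist likely influencers.
print(sorted(call_graph, key=lambda s: -reach_within(call_graph, s, 2))[:2])
```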

Workload-4: ‘Needle in a haystack’ workloads
Looking for a small string or attribute value across terabytes of data and multiple attributes is a very common read workload, specifically in machine data use cases.
Example: While processing terabytes of sensor data from engines, one may look for the specific temperature and RPM conditions behind an automobile breakdown. Similarly, security specialists investigating a network breach may wade through streams of granular log data from multiple devices before homing in on the crucial events that give clues about the cause of an attack.

Workload-5: Multiple event stream mash-up and cross-referencing of events across streams
Events in isolation may not have significance, but taken together as a string of events occurring on a timeline their importance amplifies, especially across multiple event streams.
Example: In telecom there is a need to mash up firewall events on a timeline along with router events to detect the patterns of a distributed denial of service (DDOS) attack, as in the sketch below.
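A toy sketch of the timeline mash-up: bucket both streams by minute and flag the buckets where both spike together. The thresholds, timestamps and the crude two-stream signature are illustrative assumptions:

```python
from collections import Counter

def ddos_windows(firewall_events, router_events, bucket_secs=60,
                 fw_threshold=100, rt_threshold=500):
    """Cross-reference two event streams on a shared timeline: flag minute
    buckets where firewall denies AND router traffic both spike.
    Each stream is an iterable of unix timestamps."""
    fw = Counter(ts // bucket_secs for ts in firewall_events)
    rt = Counter(ts // bucket_secs for ts in router_events)
    return sorted(b for b in fw
                  if fw[b] > fw_threshold and rt.get(b, 0) > rt_threshold)

# Hypothetical bursts landing in the same minute bucket.
fw_stream = [21605000 * 60 + i % 60 for i in range(150)]
rt_stream = [21605000 * 60 + i % 60 for i in range(600)]
print(ddos_windows(fw_stream, rt_stream))  # [21605000]
```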

Workload-6: Text indexing workload on large-volume semi-structured data
While processing semi-structured data, tools like Lucene need to index the strings.
Example: In medical scenarios, one needs to identify all encounters of a patient with the doctor whose notes contain specific disease keywords, and then analyze the health outcome of the patient; a toy index is sketched below.
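Lucene does this at scale; a toy inverted index conveys the shape of the workload (the encounter notes and ids here are invented):

```python
from collections import defaultdict

def build_index(encounters):
    """Minimal inverted index over semi-structured encounter notes,
    the kind of work Lucene does at scale: token -> encounter ids."""
    index = defaultdict(set)
    for enc_id, note in encounters.items():
        for token in note.lower().split():
            index[token.strip(".,;")].add(enc_id)
    return index

encounters = {
    101: "Patient presents with angina; prescribed nitroglycerin.",
    102: "Routine follow-up, diabetic diet reviewed.",
    103: "Recurrent angina on exertion, referred to cardiology.",
}
index = build_index(encounters)
# All encounters mentioning a disease keyword; next, analyze their outcomes.
print(sorted(index["angina"]))  # [101, 103]
```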

Workload-7: Looking for the absence of events in event streams over a moving time window
While most pattern detection looks for behaviours/patterns that are exhibited, it also makes sense to look for the ABSENCE of specific events across moving time windows, as this may signal a risk or a revenue opportunity.
Example: In an online travel website, it is important to sort through the avalanche of log file data flowing in and isolate the search instances which did NOT result in a booking event. So we traverse a moving time window looking for sequences of search events which have no book event, as in the sketch below.
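A minimal sketch of absent-event detection over a moving window, assuming the log has already been parsed into (timestamp, user, kind) tuples; the 30-minute window is an illustrative choice:

```python
def searches_without_booking(events, window_secs=1800):
    """events: time-ordered (ts, user, kind) with kind in {'search','book'}.
    Returns searches with NO booking by the same user inside the window,
    i.e. the absent-event pattern this workload looks for."""
    by_user = {}
    for ts, user, kind in events:
        by_user.setdefault(user, []).append((ts, kind))
    misses = []
    for user, stream in by_user.items():
        for i, (ts, kind) in enumerate(stream):
            if kind != "search":
                continue
            booked = any(k == "book" and ts < t <= ts + window_secs
                         for t, k in stream[i + 1:])
            if not booked:
                misses.append((user, ts))
    return misses

log = [(0, "u1", "search"), (600, "u1", "book"),
       (0, "u2", "search"), (4000, "u2", "search")]
print(searches_without_booking(log))  # [('u2', 0), ('u2', 4000)]
```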

Workload-8: High-velocity concurrent inserts and updates workload
It is very common to have thousands of users across the world inserting and updating records in booking or gaming applications.
Example: Thousands of flight bookings and online payment transactions.

Workload-9: Semi-structured and unstructured data ingestion
It is said that 80% of the world’s information is unstructured, and bringing it into repositories for analysis may yield previously untapped intelligence.
Example: Medical records such as x-ray and ECG results need to be digitized (unstructured), and doctors’ observations on the patient (semi-structured) need to be recorded.
Workload-10: Sequence analysis workloads
It is very common to chunk events together and examine whether there are patterns which tell a story regarding the problem context.
Example: In genomics and the life sciences, DNA sequencing is crucial. Similarly, in the telecom industry there are many dropped calls from a switch which need to be analyzed using sequence analysis processes to understand the events leading up to that outcome of interest.

Workload-11: Chain-of-thought ad hoc workload for data forensics
This workload is primarily triggered by power users or analysts, the ‘data Marco Polos’ exploring large oceans of data with questions not previously thought of. They cast a wide net and often come up with only a few patterns, but when they do identify a pattern it has huge repercussions for the organisation.
Example: Pricing analysts want to investigate consumer behaviour before they price a service. They may have a series of hypotheses to test in a certain sequence before arriving at the optimal price point. Similarly, infrastructure specialists want to confirm or reject hypotheses regarding the effect of newly launched apps on digital traffic, by sequencing a specific set of hypotheses about app engagement and its effect on network infrastructure load.


So far we have seen a draft articulation of the workload patterns. It is our endeavour to make the set collectively exhaustive and mutually exclusive in subsequent iterations.


As Leonardo da Vinci said, “Simplicity is the ultimate sophistication”. Big data workload design patterns help simplify the decomposition of business use cases into workloads, and the workloads can then be mapped methodically to the various building blocks of a big data solution architecture. Yes, there is a method to the madness :)



July 17, 2012

3 Game Changing Big Data Use Cases in Telecom


Like many industries undergoing transformation, the infrastructure/security/compliance function within large telecom companies is becoming more data driven. Flutura Decision Sciences has been at the forefront of some cutting-edge use cases for telecom infrastructure/security intelligence. Here are 3 powerful use cases which vividly bring out the new possibilities in telecom big data.

Telecom use case-1: Contact centre text mining and Telecom Bandwidth throttling


In most organisations, contact centre channel data is analyzed purely from an SLA (Service Level Agreement) perspective: TAT (turnaround time), average wait time etc. But the actual transcript of the conversation can yield powerful insights regarding telecom infrastructure usage, with a surge in contact centre keyword frequency serving as a lead indicator of infrastructure bottlenecks.
Telecom providers are competing with each other to earn greater ARPU (average revenue per user) from data services as opposed to voice services. In this competitive environment, a telecom provider launched a new and extremely viral gaming application on mobile devices. A few days after its launch it observed a burst of calls to its call centres, and on text mining the transcripts, data scientists found a sharp spike in keywords alluding to performance. This specific intelligence regarding the keyword burst and the time of day at which it occurred was shared with the infrastructure planning group, which then put a plan in place to throttle bandwidth dynamically based on usage.

Telecom use case-2 : Collocation Analysis from Cell phone towers


This is a security use case: an investigation team wants to find out whether multiple phones were carried by the same person. When a call is made, the following data points are typically captured: subscriber, date, time and duration. Depending on the type of call, additional data can be gathered, including switch data, cell tower IDs, device identification (serial) numbers, and International Mobile Subscriber Identity (IMSI) and International Mobile Equipment Identity (IMEI) codes. The unique ID of the cell tower a handset was connected to when a connection was made is one of the most important components for collocation analysis.
By examining terabytes of CDR/tower records from the switch, one can triangulate on a few collocation events. A collocation event can be defined as the same cell phone tower being used to route calls from two handsets at the same point in time. This is almost like looking for a needle in a haystack, and traditional solutions would have trouble handling the massive volume of tower and switch data. But with a combination of massive Hadoop clusters and columnar database architectures, these queries can be executed at lightning speed to surface the significant few events of interest from the massive ocean of log data across devices. The sketch below shows the core grouping logic.
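A sketch of the grouping step, assuming CDRs have been reduced to (device id, tower id, timestamp) triples; the 5-minute bucket and the 3-collocation cut-off are illustrative assumptions, and at real scale this grouping would run as a distributed job rather than in memory:

```python
from collections import defaultdict
from itertools import combinations

def collocation_events(cdrs, bucket_secs=300, min_collocations=3):
    """cdrs: (imei, tower_id, unix_ts) records. Two handsets 'collocate'
    when the same tower routes their calls in the same time bucket;
    repeated collocations suggest the same person carrying both phones."""
    buckets = defaultdict(set)
    for imei, tower, ts in cdrs:
        buckets[(tower, ts // bucket_secs)].add(imei)
    pair_counts = defaultdict(int)
    for handsets in buckets.values():
        for a, b in combinations(sorted(handsets), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_collocations}

cdrs = [("IMEI-1", "T7", t) for t in (0, 900, 1800, 2700)] + \
       [("IMEI-2", "T7", t + 30) for t in (0, 900, 1800, 2700)]
print(collocation_events(cdrs))  # {('IMEI-1', 'IMEI-2'): 4}
```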

Telecom use case-3: Multi-device event stream analysis correlating firewall, IDS and switch activity

Typically, in most telecom infrastructures, IDSs (intrusion detection systems) sit at the periphery of the network, monitoring malicious activity and recording it as log entries or alarm events in a log file. Firewalls and application logs also store a plethora of important events which, if triangulated through a central log repository, provide a comprehensive picture of any patterns dormant in an attack.
One key component is a central log file repository into which events streaming from multiple devices are ingested and collated. Once this central log file is set up to store the torrent of event data, it can be channelized into intelligence to optimize network infrastructure and aid the security of telecom assets. Flutura Decision Sciences is convinced that setting up a network intelligence team of security experts and data scientists working collaboratively can yield dramatic, game-changing insights to catapult an organisation to the next level.

June 19, 2012

5 Questions to ask before embarking on a Big Data Project


When an organisation embarks on a big data project, the journey is laden with landmines. Failing to manage even one risk can derail a big data project, even if every other risk was successfully evaded. As they say, “a chain is only as strong as its weakest link”. So what are the weak links in a big data project, and what are the 5 key questions to ask before embarking on one, so that these risks can be mitigated and the project steered towards a successful implementation? Based on real-life experiences from the trenches, Flutura Decision Sciences has outlined the top 5 big data questions which we feel are extremely vital to pose upfront, before spending dollars on a big data project.
The “Dent” test: What is the $ denting business use case we are enabling?
Many engagements are “data forward” as opposed to “use case backward”. It’s very important to fully understand the $ impact of the use case being instantiated and the business value of the new data pools being streamed for analysis. For example, how much of an increase in revenue do we expect from a recommender engine built on a Hadoop cluster to increase the breadth of purchase for online customers? A value tracker can then attribute the incremental revenue from recommendations converted into sales back to the big data solution. Along with identifying the use cases, one of the first tasks at hand is to identify the pools of big data lying untapped and to answer the following questions: Do I have big data within my premises? How do I identify it? What can I do with it?
The “Intersect” test: Which event data streams are we finding value in?
Often the value lies at the intersection of multiple data streams. For example, in an engagement Flutura executed in the telecom industry, there were many devices at the periphery emitting events: cell phone towers, firewalls, routers, switches, application logs, OS logs etc. Whenever an adverse event happens, say a denial of service attack on a provider, it is important to triangulate its effect across router logs, firewall logs and application logs.
Similarly, in a real-life engagement Flutura executed for an online travel agency (OTA), the value of the new scenarios instantiated on the Hadoop cluster lay at the intersection of the Apache log files which recorded each and every click event of the user (along with the cookie id and IP address), overlaid on top of the search events (from city, to city, date, number of passengers) recorded in a MySQL search log, which were then correlated with the actual booking/payment events stored in an Oracle database. The new Hadoop cluster enabled the organisation to compute look-to-book at a customer level as opposed to an aggregate corridor level, as the sketch below illustrates. So: at which intersection of high-velocity/unstructured event streams do we need to look for value?
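A toy sketch of the per-customer look-to-book computation, assuming the click, search and booking sources have already been joined down to (cookie id, corridor) events; names and sample data are illustrative:

```python
from collections import defaultdict

def look_to_book(search_events, booking_events):
    """Join search logs with booking records by cookie id to get a
    per-customer look-to-book ratio instead of an aggregate corridor number.
    Both inputs: iterables of (cookie_id, corridor) tuples."""
    looks, books = defaultdict(int), defaultdict(int)
    for cookie, _corridor in search_events:
        looks[cookie] += 1
    for cookie, _corridor in booking_events:
        books[cookie] += 1
    return {cookie: looks[cookie] / books[cookie] if books[cookie] else float("inf")
            for cookie in looks}

searches = [("c1", "BLR-DEL")] * 12 + [("c2", "BOM-MAA")] * 3
bookings = [("c1", "BLR-DEL")]
print(look_to_book(searches, bookings))  # {'c1': 12.0, 'c2': inf}
```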
The “Tool Components” test: How do I know which components are relevant for my use case – Columnar db’s, document databases, machine data tools, complex event processing etc ?
The big data landscape is laden with tools: columnar databases (Infobright, Vertica), appliances (HANA, Exadata), complex event processing frameworks (S4), algorithmic libraries (R, Mahout etc), machine data tools (Splunk), document databases (CouchDB, Lucene, MongoDB) etc. There is very little guidance on which scenarios require which kind of constructs. So a very pertinent question is: what decision tree do I need to use to arrive at the architectural constructs required to deliver my business use case?
The “Chunk” test: Are we delivering a high impact business output in 60-90 days?
In most organisations with traditional DW mindsets, it’s not uncommon to find the first deliverable being exposed to business 8-12 months from the start date of a project. While executing a big data project, it makes sense to “chunk” the use cases into 60-90 day deliverables, so that the project builds momentum with the business and accelerates the much-needed funding.



The “Co-existence” test: What’s your co-existence strategy with traditional BI solutions?
Even though new-age big data solutions have dramatically raised performance expectations and information handling capability, that does not mean the end of traditional BI solutions. One must have a co-existence strategy with traditional BI solutions, since those data processes embed a lot of business rules and one should not spend money recreating them. So how can new-age big data solutions co-exist with existing BI solutions and the other components of our existing IT ecosystem?


Data scientists at Flutura Decision Sciences have seen the importance of managing the weak links in a big data implementation by asking 5 important questions. To summarize, the 5 key questions to ask are:
1. “Dent” test: What is the $ denting use case using the big data stack?
2. “Intersect” test: Which event data streams are we finding value in?
3. “Tool Components” test: Which big data components are required, and when?
4. “Chunk” test: Are we delivering a high-impact business output in 60-90 days?
5. “Co-existence” test: How do we co-exist with the existing DW/BI solutions in place?
More than a century ago, Louis Pasteur made a profound statement: “Chance favours the prepared mind”. Flutura Decision Sciences strongly believes this holds true even in today’s “data soaked” world: when organisations embark on big data solutions, these 5 key questions pave the way for a successful implementation.




May 29, 2012

What is the "$ denting" Big Data use case ?


Many big data engagements are “data forward” as opposed to “use case backward”. It’s very important to fully understand the $ impact of the use case being instantiated and the business value of the new data pools being streamed for analysis. For example, how much of an increase in revenue do we expect from a recommender engine built on a Hadoop cluster to increase the breadth of purchase for online customers? A value tracker can then attribute the incremental revenue from recommendations converted into sales back to the big data solution. In the digital big data use case catalogue, tracking word-of-mouth using social graphs is important because message virality is directly correlated to purchase behavior. So, to summarize, ask yourself the most important question: “Is the use case I am instantiating on my Hadoop cluster truly $ denting, or is it a nice-to-have?”

May 26, 2012

How to monetize search data in online travel?


Every time you go to a travel agent to book a ticket on a flight, 2 broad kinds of transactions are generated:
- Search request and response transactions
- Booking transactions




While most travel organizations have mined their booking transaction data, not many insights have been juiced out of the search patterns behind air booking transactions.
For example, you might be a price-sensitive tourist looking for the cheapest economy class tickets between Boston and Madrid in November, on a Friday evening. Or you could be a value-conscious business traveler seeking economy or business class tickets at the last minute, to ensure you are on time for a crucial business meeting in New York.
All the search requests and responses are captured in search log files and flushed out at regular intervals. These search logs, traditionally seen as just occupying a lot of disk space, are suddenly viewed as a gold mine of interesting information. For example, some interesting questions:
- Which are the heavily searched destinations from Bangalore on weekends/holidays where, say, Singapore Airlines has no service?
o An airline could use this information to expand its fleet of services to destinations it currently does not serve, and increase its share of the market.
Another scenario consists of segmenting customers based on price-conscious versus value-conscious search behavior. Business travellers are typically convenience shoppers (the right timing and service excellence are important), whereas holiday shoppers are typically price conscious (getting the lowest price to Colombo matters more than catching the flight at a convenient time). A Hadoop cluster of about 6 nodes can be set up to ingest the search data and answer business questions which were previously unanswerable; a simple tagging sketch follows.
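A sketch of such behavioural tagging, assuming the search log has been parsed into per-search dicts; the field names (days_ahead, sort_order, cabin) and thresholds are invented for illustration:

```python
def tag_traveller(searches):
    """searches: dicts parsed from the search log, e.g.
    {'days_ahead': 2, 'sort_order': 'price_asc', 'cabin': 'economy'}.
    Returns heuristic behavioural tags for one booker."""
    price_sorted = sum(s["sort_order"] == "price_asc" for s in searches)
    avg_lead = sum(s["days_ahead"] for s in searches) / len(searches)
    tags = []
    tags.append("price_sensitive" if price_sorted / len(searches) > 0.6
                else "value_conscious")
    tags.append("last_minute" if avg_lead < 4 else "early_bird")
    return tags

log = [{"days_ahead": 2, "sort_order": "price_asc", "cabin": "economy"},
       {"days_ahead": 1, "sort_order": "price_asc", "cabin": "economy"}]
print(tag_traveller(log))  # ['price_sensitive', 'last_minute']
```

Tags like these, once written back to the customer profile, are what enable the more intelligent outbound actions described above.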

May 19, 2012

Why change? Does Big data = $? What is the Economics behind Big Data?


Flutura has been constantly asked a series of questions by hard-nosed, practical customers: real-world decision makers considering big data solutions in London, Dallas, Chicago, Bangalore, Dubai, Singapore and Riyadh, across industries spanning large auto giants, telecom companies and health care service providers. All of them agree on the need to extract intelligence from the torrent of data in their ‘process exhausts’. All of them agree that the faster they do it, the greater their competitive edge. But they have questions regarding the approach and the solution.

  1. Why not continue with a traditional BI infrastructure for dicing and slicing?
  2. What’s wrong with my ‘as is’ customer scoring or risk scoring data mining process in SAS?
  3. Why not live with my current SIEM solution for log management in telecom?
  4. Is there really a need to look at new-age big data solutions?
  5. Isn’t there more hype than substance?

These are genuine and perfectly valid questions, since the industry has a history of dramatizing the need for new technologies, buzzwords and solutions.
So how do we stay grounded and examine if there is TRULY a need to adopt big data solutions?
Here is a simple checklist of questions whose answers can serve as a guiding compass for the decision to implement a big data solution or continue with existing BI solutions.
  • What are the new business use cases enabled by processing big data? For example, can you customize the next-best-product recommendation by developing a deeper understanding of the micro clicks a user makes on the digital channel? This is difficult to implement on traditional high-latency BI solutions.

  • What is the impact of the new business use cases on reducing cost or enhancing revenue? For example, a real-time sense-and-respond infrastructure that reacts to search behaviour within the session increases repeat visits and purchases.

  • What is the reduction in the annual statistical license fee if I migrate the analytical scoring process from my current solution to an open source statistical package like R?

  • What is the reduction in storage cost if I migrate data from my current SAN-based solution to a Hadoop cluster consisting of a necklace of commodity hardware?
We should not put the technical architecture ahead of the business use cases, and these questions can show the way.

April 13, 2012

Insurance Big Data - Harvesting causal predictors using call center text mining

While teasing out predictors for the non-renewal of an insurance policy, we investigate clues for policy surrender in a subset of call centre transcripts (a subsample of 90 days of conversations in which an outbound CSR records the key points of a conversation regarding a policy holder’s reasons for not subscribing). Since the volume of conversations runs into the thousands, it is difficult to review them manually. These conversations can instead be fed through an unstructured text mining process which extracts the top 10 themes. For example, we can track the frequency of occurrence of certain “WATCH LIST” keywords, where a sudden increase in the frequency of keywords/themes like ‘POOR SERVICE’ or ‘PREMIUM AMOUNT’ can signal a subscriber’s intent not to renew the policy, as in the sketch below.
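A minimal sketch of the watch-list spike detection, assuming the transcripts have already been text-mined into weekly keyword counts; the 2x spike factor and the sample counts are illustrative assumptions:

```python
def keyword_spikes(weekly_counts, factor=2.0):
    """weekly_counts: {'POOR SERVICE': [3, 4, 2, 11], ...} occurrences per
    week from text-mined CSR transcripts. Flags watch-list keywords whose
    latest week exceeds `factor` times the trailing average."""
    flagged = {}
    for keyword, counts in weekly_counts.items():
        if len(counts) < 2:
            continue  # need some history to form a baseline
        *history, latest = counts
        baseline = sum(history) / len(history)
        if baseline and latest > factor * baseline:
            flagged[keyword] = (baseline, latest)
    return flagged

counts = {"POOR SERVICE": [3, 4, 2, 11], "PREMIUM AMOUNT": [5, 6, 5, 6]}
print(keyword_spikes(counts))  # {'POOR SERVICE': (3.0, 11)} -> churn signal
```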

Overlaying Text Mining over Behavioral Segmentation

While customer segmentation works on structured behavioral data to give us a clue about customer behavior, unstructured text mining can give us a clue as to what the customer actually feels about his or her experience. In a recent exercise, after segmenting customers on behavior, the team overlaid the key themes emerging from sentiment analysis (text mining of inbound customer calls to 1-800 numbers) on top of the segments. This allowed the business to correlate customer behavior with the underlying themes in conversations. For example, the top 5 keywords used frequently by each behavioral segment can reveal a lot about the service levels and product coverage which influence the behavior reflected in the segment classifications. One can also discern whether the migration of high-value customers had anything to do with an increase in certain watch keywords in customer conversations. So consider combining text mining with segmentation.