March 31, 2012

Decoding Digital Consumers' Intent = Apriori + Sequence Patterns

Let's say Mary visited a famous shampoo website, and the following sequence data is the digital trail she left behind:


31-Mar-2012 11:00 am : Reads reviews on Shampoo Brand X
31-Mar-2012 11:02 am : Parametric Search for Shampoo Brand X and orders the results by price
31-Mar-2012 11:03 am : Clicks on 3rd product displayed on search
31-Mar-2012 11:07 am : Goes back to search results page
31-Mar-2012 11:08 am : Clicks on 7th product displayed on search
31-Mar-2012 11:14 am : Advocacy event : Mary sends this Shampoo brand link as an email to 5 of her friends

This last activity Mary exhibited - advocacy - is a highly engaged action, primarily because a mail from a friend has a far greater chance of being clicked and engaged with by its recipients than a mail sent directly by the organisation to the consumer.

In this context it is extremely important to find out whether there are any upstream behaviors a consumer exhibits which are correlated across time with a downstream event of interest (say an advocacy event).

Apriori algorithms can be used to mine the sequence patterns which are statistically significant, and to predict with a certain confidence level the probability of an advocacy event happening. Once we run the algorithm there can be multiple sequences which are statistically significant. Statistical significance can be measured by both the 'Support' and the 'Confidence' of a particular sequential pattern. Once we get sequential patterns which have crossed the thresholds for support and confidence, we can leverage these patterns to make an intervention in the digital shopper's behavior.
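The support and confidence of a sequential pattern can be sketched as below. The sessions and event names are illustrative assumptions; a real system would mine candidate patterns over millions of clickstreams.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in order (not necessarily contiguously)."""
    it = iter(sequence)
    return all(step in it for step in pattern)

def support(pattern, sessions):
    """Fraction of sessions containing the pattern in order."""
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)

def confidence(antecedent, consequent, sessions):
    """P(antecedent is followed by consequent | antecedent occurred)."""
    sup_ante = support(antecedent, sessions)
    if sup_ante == 0:
        return 0.0
    return support(antecedent + consequent, sessions) / sup_ante

# Hypothetical clickstream sessions, one event list per visitor
sessions = [
    ["review", "search", "click", "advocacy"],
    ["review", "search", "click"],
    ["search", "click", "advocacy"],
    ["review", "search", "advocacy"],
]

print(support(("review", "search"), sessions))                    # 0.75
print(confidence(("review", "search"), ("advocacy",), sessions))  # ~0.667
```

Patterns whose support and confidence cross chosen thresholds become candidates for intervention.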

March 29, 2012

"Sara" & Big Data patterns

Saraswati is the goddess of knowledge in Hinduism. The Sanskrit root 'sara' means essence. Saraswati is symbolic of the need to extract and respect knowledge.

So how does one extract the 'sara' in big data? How does one codify the knowledge extracted from loads of sensor data, digital intent data, unstructured data floating around websites and so on?

Using various machine learning algorithms, statistically significant patterns can be surfaced. In the process of 'bubbling up' patterns it's conceivable that hundreds of statistically significant patterns can be surfaced. It is the duty of the domain analyst to put these patterns into business context and rank them on their potential to generate revenue, optimize cost or affect the risk matrix of an organisation. Once these patterns are extracted they can then be codified into a business rules engine which feeds a channel application. The channel application can then leverage that nugget of 'sara' (essence) extracted from consumer behavior data streams to present the next best offer, a customized promotion, a highly personalized webpage etc.
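As a minimal sketch of that codification step, a mined pattern can be expressed as a ranked rule feeding a channel application; the pattern conditions, ranks and actions below are illustrative assumptions:

```python
# Mined patterns codified as (rank, condition, next-best action) rules;
# all thresholds and action names here are hypothetical examples.
rules = [
    (1, lambda v: v["price_sorts"] >= 2 and v["reviews_read"] >= 1,
        "show_discount_coupon"),
    (2, lambda v: v["emails_shared"] >= 1,
        "show_referral_bonus"),
]

def next_best_action(visitor):
    """Return the highest-ranked action whose pattern the visitor matches."""
    for _, condition, action in sorted(rules, key=lambda r: r[0]):
        if condition(visitor):
            return action
    return "default_homepage"

visitor = {"price_sorts": 2, "reviews_read": 1, "emails_shared": 0}
print(next_best_action(visitor))  # show_discount_coupon
```

The channel application only evaluates the rules at serving time; the heavy pattern mining stays offline.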

One can always draw inspiration from Saraswati to build a patterns repository which has the 'sara' or essence of the core $-denting patterns!

More @

Big Data use case : Optimizing cost of telecom tower maintenance

One of the most compelling use cases for Big Data is in managing fuel consumption in the telecom tower business. Krish was recently engaged in a conversation with a few folks who had an intimate understanding of the business. Each telecom tower has a generator, and one of the biggest components of cost is diesel cost. There are many sensors/energy meters which constantly emit large streams of operational data. How can the business learn from years of humongous data collected across all towers regarding fluctuations in operational characteristics? Machine learning algorithms can crawl through terabytes of operational data stored across years to say with confidence what will happen, and to surface the drivers/operational patterns which lead to the event happening. Algorithms like apriori can spot failure signatures, and these statistically significant patterns can be chunked into a knowledgebase. In fact, machine data use cases, though not as glamorous as digital use cases, deliver a far bigger dent to the cost/revenue levers than the digital big data use cases.
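A minimal sketch of spotting failure signatures in a tower's event stream; the event codes (GEN_FAILURE, FUEL_DROP etc.) and the stream itself are illustrative assumptions:

```python
from collections import Counter

def failure_signatures(events, failure="GEN_FAILURE", window=3):
    """Count which events appear in the window of readings preceding
    each failure. `events` is a time-ordered list of event codes
    emitted by one tower's sensors."""
    counts = Counter()
    for i, e in enumerate(events):
        if e == failure:
            preceding = events[max(0, i - window):i]
            counts.update(set(preceding))  # one vote per failure
    return counts

# Hypothetical sensor event stream for a single tower
stream = ["VOLT_DIP", "TEMP_HIGH", "FUEL_DROP", "GEN_FAILURE",
          "OK", "TEMP_HIGH", "FUEL_DROP", "GEN_FAILURE", "OK"]

print(failure_signatures(stream).most_common())
# TEMP_HIGH and FUEL_DROP precede every failure - a candidate signature
```

Events that precede nearly every failure, but rarely occur otherwise, are the signatures worth chunking into the knowledgebase.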

More at

March 28, 2012

Analyzing Telecom Machine data

The telecom industry has some of the "ripest" use cases for Big Data. Let's look at the sheer number of devices in the periphery of a telecom infrastructure.
They have 
- Firewalls 
- Routers 
- Switches 
- Towers 
- Application servers.

Each of these devices is constantly emitting data which reveals something about the state of the device:
- Perimeter device login events and failures 
- User activity (e.g. admin/sys users) 
- Changes to user privileges 
- Configuration changes 
- Policy changes 
- Failure/System down events 
- High resource usage events/activity 
- Reboots, Resets and restart activity 
- Accounts created, deleted or modified 
- All administrator or root user activities on all servers 
- All group activity including creation, deletion, membership changes and rights or permissions assignments 
- All files accessed on the server.

From a compliance perspective, as well as a security and infrastructure load management perspective, telecom companies are required to manage this huge torrent of data. But traditional data solutions struggle, and in many cases do not seem to work at all, for the following reasons: 
- The sheer velocity of device-emitted events 
- Too many attributes in each device event 

Both stretch traditional data solutions past their limits.

More at

Hadoop Cluster to mine Telecom machine data

So how can terabytes of telecom device/event data be managed and mined?
Telecom infrastructure captures log events that describe the behavior of thousands of devices within its asset-intensive infrastructure - firewalls, towers, switches, servers etc. Each of these devices emits logs, and the alarm events in those logs describe the health and activity of the device. Understanding what is embedded in core telecom machine data is key. Hadoop can store and analyze this log data, and build a higher-level picture of the health of the data center as a whole.

Let's consider a real-life use case where we need to identify the sequence of events preceding an adverse event:

- Multi terabyte event logs
- Millions of atomic events
- A few hundred adverse events

We need algorithms to decode linkages between upstream events and a downstream adverse event. An apriori algorithm can be executed on a Hadoop node to surface those sequences of events which seem to correlate with adverse events. The algorithm needs to traverse moving time windows of event logs to discover the sequences which are most statistically significant from an adverse event point of view.
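A minimal sketch of that moving-window traversal, with hypothetical event codes and timestamps; in practice each node would run this over its own shard of the multi-terabyte log:

```python
from datetime import datetime, timedelta

def sequences_before(log, adverse="LINK_DOWN", window_minutes=10):
    """For each adverse event, collect the ordered sequence of events
    inside the preceding time window. `log` is a time-sorted list of
    (timestamp, event_code) tuples."""
    window = timedelta(minutes=window_minutes)
    result = []
    for ts, event in log:
        if event == adverse:
            result.append(tuple(e for t, e in log if ts - window <= t < ts))
    return result

def at(minute):  # illustrative timestamps on a single morning
    return datetime(2012, 3, 28, 11, minute)

log = [(at(0), "CPU_HIGH"), (at(4), "RETRANSMIT"), (at(8), "LINK_DOWN"),
       (at(30), "CPU_HIGH"), (at(35), "RETRANSMIT"), (at(38), "LINK_DOWN")]

# Both adverse events share the same candidate upstream sequence
print(sequences_before(log))
```

The extracted sequences would then be fed to the support/confidence counting step to decide which ones are statistically significant.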

March 27, 2012

Big Data use case in Channel sentiment mining

Text mining consumer sentiments - a large volume of unstructured consumer comments is stored in call center transcripts and on opinion platforms. In one instance, inbound call center conversation transcripts contained a lot of patterns regarding customer experience of a credit card product - credit limit themes, customer experience themes. While mining outbound call center transcripts we found themes related to collections, payment behavior etc. These terabytes/petabytes of consumer sentiment data pools are completely untapped. An unstructured text mining process can ingest this data and make sense of the key themes underlying consumer sentiment. More @
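A minimal sketch of surfacing such themes from transcripts with keyword lexicons; the lexicons and sample calls are illustrative assumptions (a production pipeline would use topic models or trained classifiers rather than hand-built word lists):

```python
import re
from collections import Counter

# Hypothetical theme lexicons for a credit card product
THEMES = {
    "credit_limit": {"limit", "increase", "ceiling"},
    "collections":  {"overdue", "collection", "payment"},
}

def theme_counts(transcripts):
    """Count how many transcripts mention each theme's keywords."""
    counts = Counter()
    for text in transcripts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for theme, lexicon in THEMES.items():
            if words & lexicon:
                counts[theme] += 1
    return counts

calls = ["Please increase my credit limit",
         "My payment is overdue, stop the collection calls"]
print(theme_counts(calls))
```

Run over millions of transcripts, the counts reveal which themes dominate the inbound versus outbound channels.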

Modeling Butterfly effect in word of mouth behavior

The “Butterfly effect” is an effect introduced by Edward Lorenz that essentially describes the amplification of a small change in one place into a large change in some other place.
Word of mouth activation is one of the most engaged actions a user can undertake while on a website. So how can the principles of the Butterfly effect be used to model word of mouth behavior? If one wants to model a complex social graph to discern the most active word of mouth behavior, one can use ready-made social graph libraries. Using social graph mapping tools one can quantitatively measure the amplification factor implicit in the “Butterfly effect”. For example, a tiny variation in the number of clicks needed to activate a WOM event can have a big impact on the social diffusion of a message and its buzz velocity. Messages can become viral via many social channels - email referral, Facebook, Twitter etc. A tiny variation in the range of channels available to activate a WOM event can have a big impact on the viral velocity of a message. More at
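One simple way to quantify that amplification is a viral coefficient model; the seed size, channel counts and probabilities below are illustrative assumptions:

```python
def viral_reach(seed, channels, share_rate, conversion, generations):
    """Total people reached when each recipient shares over `channels`
    channels with probability `share_rate`, and each share converts a
    new recipient with probability `conversion`."""
    k = channels * share_rate * conversion  # viral coefficient
    reach, cohort = seed, seed
    for _ in range(generations):
        cohort *= k          # next generation of recipients
        reach += cohort
    return reach, k

# A tiny variation in the range of channels (3 -> 4) amplifies
# the downstream reach across five sharing generations
print(viral_reach(100, 3, 0.2, 0.5, 5))
print(viral_reach(100, 4, 0.2, 0.5, 5))
```

The same model makes the sensitivity explicit: a small nudge to any factor inside k compounds across every generation of sharing.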

“Forty second Boyd” & Big data

“Forty Second Boyd” was the nickname of John Boyd, a US Air Force military strategist. He was known as “Forty Second Boyd” for his standing bet that, beginning from a position of disadvantage, he could defeat any opposing pilot in air combat maneuvering in less than forty seconds. Businesses worldwide have applied Boyd's combat maneuvering strategies to business. One of John Boyd's primary insights was that it is vital for the pilot to change speed/direction faster than the opponent can think and act. Getting "inside" the cycle - short-circuiting the opponent's thinking processes - produces opportunities for the opponent to react inappropriately. So how can this be applied to the world of business? In the world of business, the ability to decode Big Data patterns faster than the competition results in an increased survival rate. Read more about it at

Why did we choose the name "FLUTURA" ?

2 core themes are dear to us - "TRUST" & "TRANSFORMATION". We looked at the most dramatic transformation which happens around us - that of a caterpillar morphing into a butterfly - and said we want to do this. We want to do this for our customers' decisioning processes. We want to do this for our employees who seek to transform their latent potential. We seek to do this for our partners who seek to grow with us. "FLUTURA" means BUTTERFLY in ALBANIAN! We love seeing transformation around us!