August 12, 2012

11 Core Big Data Workload Design Patterns



As big data use cases proliferate in telecom, health care, government, Web 2.0, retail and other sectors, there is a need to create a library of big data workload patterns. Flutura has created a set of big data workload design patterns to help map out common solution constructs. It showcases 11 distinct workloads whose patterns recur across many business use cases.

A big data workload design pattern is a template for identifying and solving a commonly occurring big data workload. The workloads stretching today’s storage and computing architectures may be human generated or machine generated. A given pattern may show up in many domains, such as telecom or health care, but irrespective of the domain the same solution construct can be used. Once the set of big data workloads associated with a business use case is identified, it is easy to map the right architectural constructs required to service the workload: columnar stores, Hadoop, name-value stores, graph databases, complex event processing (CEP) and machine learning processes.

Here is a bird's-eye view of the various workload patterns:



Workload-1: Synchronous streaming real-time event sense-and-respond workload
It essentially consists of matching incoming event streams against predefined behavioural patterns and, as the signatures unfold in real time, responding to those patterns instantly.
Example: In a registered-user digital analytics scenario, one specifically examines the last 10 searches done by a registered digital consumer, so as to serve a customized and highly personalized page consisting of the categories he or she has been digitally engaged with. Also, depending on whether the customer has done a price-sensitive search or a value-conscious search (which can be inferred by examining the search order parameter in the click stream), one can render budget items first or luxury items first.
Similarly, let’s now switch over to a health care situation. In hospitals, patients are tracked in real time across three event streams: respiration, heart rate and blood pressure. (An ECG is supposed to record about 1000 observations per second.) These event streams can be matched against patterns which indicate the beginnings of fatal infections, so that medical intervention can be put in place.
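
The registered-user example can be sketched in a few lines of Python. This is only a minimal illustration, not a production design: the class name, the "price_asc" value for the search order parameter and the majority rule for deciding between budget and luxury ordering are all assumptions made for the sketch.

```python
from collections import deque


class RecentSearches:
    """Keep the last `size` searches for one user and infer a rendering order."""

    def __init__(self, size=10):
        # A bounded deque drops the oldest search automatically.
        self.events = deque(maxlen=size)

    def record(self, category, sort_order):
        # sort_order is a hypothetical click-stream parameter, e.g. "price_asc".
        self.events.append((category, sort_order))

    def top_categories(self, n=3):
        counts = {}
        for category, _ in self.events:
            counts[category] = counts.get(category, 0) + 1
        # Most frequent first; ties broken alphabetically for stable output.
        return sorted(counts, key=lambda c: (-counts[c], c))[:n]

    def render_order(self):
        # Assumed rule: a majority of price-sorted searches means price sensitive.
        price_sorted = sum(1 for _, order in self.events if order == "price_asc")
        return "budget_first" if price_sorted * 2 > len(self.events) else "luxury_first"


profile = RecentSearches(size=3)
profile.record("shoes", "price_asc")
profile.record("shoes", "price_asc")
profile.record("bags", "relevance")
profile.record("bags", "price_asc")   # pushes the oldest "shoes" search out
```

The response step (choosing categories and ordering) reads only the small in-memory window, which is what lets it keep up with the incoming stream.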

Workload-2: Ingestion of high-velocity events - insert only (no update) workload
This is a unique workload widely experienced while ingesting terabytes of sensor and machine-generated data. These are insert-only workloads, with no updates and no lookups.
Example: Ingesting millions of micro events streaming from log files, firewall alarms, sensor data and the click stream data torrent. It is estimated that a Boeing aircraft has the potential to generate 200 terabytes of data on a single flight: data from vibration sensors, temperature sensors, strain gauges, position, speed and so on. Imagine ingesting all this data for all the flights!
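
As a rough sketch of what insert only means in practice, the snippet below buffers events and appends them in batches, with no update or lookup path at all. The class name, the JSON-lines encoding and the batch size are assumptions chosen for illustration.

```python
import io
import json


class AppendOnlyLog:
    """Buffer events and append them in batches; there is no update or lookup."""

    def __init__(self, sink, batch_size=1000):
        self.sink = sink              # any file-like object opened for appending
        self.batch_size = batch_size
        self.buffer = []
        self.written = 0

    def ingest(self, event):
        self.buffer.append(json.dumps(event, separators=(",", ":")))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            # One sequential write per batch instead of one write per event.
            self.sink.write("\n".join(self.buffer) + "\n")
            self.written += len(self.buffer)
            self.buffer.clear()


# In-memory sink for the example; a real pipeline would append to a file or queue.
sink = io.StringIO()
log = AppendOnlyLog(sink, batch_size=2)
for reading in ({"sensor": "vibration", "v": 0.4},
                {"sensor": "temp", "v": 71},
                {"sensor": "strain", "v": 12}):
    log.ingest(reading)
log.flush()  # write the final partial batch
```

Because nothing is ever rewritten, the writes stay sequential, which is the property that stores tuned for this workload tend to exploit.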

Workload-3: High-node social graph traversal
This is a unique workload where finding the interrelationships between nodes in a network is vital. The workload is computation and read intensive, as node statistics need to be computed and the children of a node need to be read dynamically.
Example: In the telecom industry, where there are millions of prepaid and postpaid subscribers, the CDRs (Call Detail Records) consist of terabytes of switch logs which contain important patterns regarding the inter-relationships between subscribers. These can be mined using graph databases, through computation-intensive graph traversals, to understand whether newly downloaded gaming applications or other apps are going viral within friends and family circles.
Similarly, social websites store millions of interrelationships as a graph, and one needs to traverse large, complex graphs to map the key influencers who are capable of influencing a marketing outcome, or to recommend a friend so as to expand the social network at its edges.
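
A friend recommendation of this kind boils down to a bounded breadth-first traversal. The sketch below uses a plain adjacency dictionary as a stand-in for a graph database; the function names and the toy subscriber graph are hypothetical.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning {node: hop distance} up to max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


def suggest_friends(graph, user):
    """Friends of friends who are not already direct contacts."""
    reach = within_hops(graph, user, 2)
    return sorted(n for n, hops in reach.items() if hops == 2)


# Toy call graph: an edge means two subscribers have called each other.
calls = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
suggestions = suggest_friends(calls, "a")
```

Every hop reads the children of each frontier node, which is why the workload is described as read intensive.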

Workload-4: ‘Needle in a haystack’ workloads
Looking for a small string or attribute value in terabytes of data, across multiple attributes, is a very common read workload, specifically in machine data use cases.
Example: While processing terabytes of sensor data from engines, one may look for the specific temperature and RPM conditions behind an automobile breakdown. Similarly, security specialists investigating a network breach may wade through streams of granular log data from multiple devices before homing in on the crucial events that give clues about the cause of an attack.
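
In its simplest form this is a lazy filter over a very large stream of records. The sketch below assumes each record is a dictionary; the field names and thresholds are made up for the example.

```python
def scan(records, **conditions):
    """Lazily yield records whose fields satisfy every predicate in `conditions`."""
    for record in records:
        if all(field in record and test(record[field])
               for field, test in conditions.items()):
            yield record


# Hypothetical engine readings; a real source would be a file or table scan.
readings = [
    {"ts": 1, "temp_c": 90, "rpm": 3000},
    {"ts": 2, "temp_c": 118, "rpm": 6500},
    {"ts": 3, "temp_c": 120, "rpm": 2000},
    {"ts": 4, "rpm": 7000},               # missing temperature: never matches
]
hits = list(scan(readings, temp_c=lambda t: t > 110, rpm=lambda r: r > 6000))
```

Since `scan` is a generator, it holds only one record in memory at a time, so the same code shape applies whether the input has four rows or billions; at real scale the work would typically be pushed down to an index or a parallel scan.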

Workload-5: Multiple event stream mash-up and cross-referencing of events across streams
Events in isolation may not have much significance, but taken together as a string of events occurring on a timeline their importance is amplified, especially across multiple event streams.
Example: In telecom there is a need to mash up firewall events on a timeline along with router events to detect patterns in a distributed denial of service (DDoS) attack.
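
One way to picture the mash-up is as a merge of two time-ordered streams followed by a search for events from different sources that fall close together. The sketch below assumes each stream is already sorted by timestamp; the event messages and the 5 second window are illustrative only.

```python
import heapq


def mash_up(firewall, router):
    """Merge two time-ordered streams of (timestamp, message) into one timeline."""
    tagged_fw = ((ts, "firewall", msg) for ts, msg in firewall)
    tagged_rt = ((ts, "router", msg) for ts, msg in router)
    return heapq.merge(tagged_fw, tagged_rt, key=lambda e: e[0])


def cross_stream_pairs(timeline, window):
    """Yield pairs of events from different sources at most `window` seconds apart."""
    recent = []
    for event in timeline:
        # Keep only earlier events that are still inside the time window.
        recent = [e for e in recent if event[0] - e[0] <= window]
        for earlier in recent:
            if earlier[1] != event[1]:
                yield earlier, event
        recent.append(event)


firewall_events = [(10, "syn flood alarm"), (50, "port scan")]
router_events = [(12, "link saturation"), (200, "route flap")]
pairs = list(cross_stream_pairs(mash_up(firewall_events, router_events), window=5))
```

Here only the alarm at second 10 and the saturation at second 12 are paired; neither event would stand out on its own stream.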

Workload-6: Text indexing workload on large-volume semi-structured data
While processing semi-structured data, tools like Lucene need to index the strings.
Example: In medical scenarios, one needs to identify all encounters between a patient and a doctor which mention specific disease keywords, and then analyze the health outcome of the patient.
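
The core structure behind text indexing is an inverted index, mapping each term to the documents that contain it. The toy version below is not Lucene and leaves out everything a real engine adds (stemming, ranking, on-disk segments); the encounter notes are invented for the example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index, *keywords):
    """Return ids of documents that contain every keyword."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


# Hypothetical encounter notes keyed by encounter id.
notes = {
    "enc-1": "Patient reports chest pain; history of diabetes.",
    "enc-2": "Follow-up for diabetes, no pain.",
    "enc-3": "Routine check.",
}
index = build_index(notes)
matches = search(index, "diabetes", "pain")
```

Building the index is the expensive, write-heavy part of the workload; once it exists, keyword lookups become set intersections rather than full scans.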

Workload-7: Looking for the absence of events in event streams in a moving time window
While most pattern detection looks for behaviour or patterns that are exhibited, it also makes sense to look out for the ABSENCE of specific events across moving time windows, as this may point to a risk or a revenue opportunity.
Example: On an online travel website, it’s important to sort through the avalanche of log file data flowing in and isolate search instances which did NOT result in a booking event. So we are traversing a moving time window, looking for sequences of search events which are not followed by a booking event.
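
Detecting absence means remembering each search until either a booking arrives or its time window closes. The sketch below assumes events arrive sorted by timestamp as (timestamp, session id, kind) tuples; the field layout and the 60 second window are assumptions made for illustration.

```python
def searches_without_booking(events, window):
    """Return (session, search_time) for searches with no booking within `window`.

    `events` is an iterable of (timestamp, session_id, kind) sorted by timestamp,
    where kind is "search" or "book".
    """
    pending = {}    # session_id -> timestamp of the first unanswered search
    abandoned = []
    for ts, session, kind in events:
        # Expire any search whose window has closed without a booking.
        for sid in [s for s, t in pending.items() if ts - t > window]:
            abandoned.append((sid, pending.pop(sid)))
        if kind == "search":
            pending.setdefault(session, ts)
        elif kind == "book":
            pending.pop(session, None)
    # End of input is treated as closing every window that is still open.
    abandoned.extend(pending.items())
    return abandoned


clicks = [
    (0, "s1", "search"),
    (5, "s2", "search"),
    (20, "s2", "book"),
    (100, "s3", "search"),
    (130, "s3", "book"),
]
lost = searches_without_booking(clicks, window=60)
```

Session s1 is reported because more than 60 seconds pass without a booking; s2 and s3 are cleared by their bookings. On a live stream the final step would wait for each window to close rather than flushing at end of input.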

Workload-8: High-velocity, concurrent inserts and updates workload
It’s very common to have thousands of users across the world inserting or updating records at the same time through booking or gaming applications.
Example: Thousands of flight bookings and online payment transactions.
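
The difficulty here is keeping inserts and updates consistent when many writers act at once. The sketch below uses threads and a lock as an in-memory stand-in for a transactional store; the class, the booking ids and the seat counts are hypothetical.

```python
import threading


class SeatInventory:
    """In-memory stand-in for a store taking concurrent inserts and updates."""

    def __init__(self, seats):
        self._lock = threading.Lock()
        self._available = seats
        self.bookings = {}   # booking_id -> passenger

    def book(self, booking_id, passenger):
        # The lock makes the check-then-update step atomic across threads.
        with self._lock:
            if self._available == 0:
                return False
            self._available -= 1                    # update
            self.bookings[booking_id] = passenger   # insert
            return True

    @property
    def available(self):
        with self._lock:
            return self._available


def simulate(seats, requests):
    """Fire `requests` concurrent booking attempts at `seats` seats."""
    inventory = SeatInventory(seats)
    results = [None] * requests

    def worker(i):
        results[i] = inventory.book(f"bk-{i}", f"user-{i}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inventory, results


inventory, results = simulate(seats=5, requests=50)
```

However the threads interleave, exactly five bookings succeed and no seat is sold twice. A distributed store has to give the same guarantee across machines, which is what makes this workload demanding.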

Workload-9: Semi-structured and unstructured data ingestion
It is said that 80% of the world’s information is unstructured, and bringing it into repositories for analysis may yield previously untapped intelligence.
Example: Medical records such as X-ray and ECG results (unstructured) need to be digitized, and doctors’ observations on the patient (semi-structured) need to be recorded.
  
  
Workload-10: Sequence analysis workloads
It is very common to chunk pieces of events together and examine whether there are patterns which tell a story about the problem context.
Example: In genomics and the life sciences, DNA sequencing is crucial. Similarly, in the telecom industry, the many dropped calls from a switch need to be analyzed using sequence analysis processes to understand the events leading up to that outcome of interest.
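
A very small version of the dropped-call analysis is to count which short runs of events most often come right before the outcome of interest. The event names below are invented, and real sequence mining would use far richer methods than this fixed-length count.

```python
from collections import Counter


def sequences_before(events, outcome, length):
    """Count the `length`-event sequences that immediately precede `outcome`."""
    counts = Counter()
    for i, event in enumerate(events):
        if event == outcome and i >= length:
            counts[tuple(events[i - length:i])] += 1
    return counts


# Hypothetical switch log for one cell, already ordered by time.
switch_log = [
    "setup", "handover", "signal_low", "drop",
    "setup", "signal_low", "drop",
    "setup", "handover", "signal_low", "drop",
]
leading = sequences_before(switch_log, outcome="drop", length=2)
most_common = leading.most_common(1)
```

In this toy log, a handover followed by low signal precedes two of the three drops, which is the kind of story the workload is meant to surface.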

Workload-11: Chain-of-thought ad hoc workload for data forensic work
This workload is primarily triggered by power users or analysts, the ‘Data Marco Polos’ who explore large oceans of data with questions not previously thought of. They cast a wide net and often come up with only a few patterns, but when they do identify a pattern it has huge repercussions for the organisation.
Example: Pricing analysts want to investigate consumer behaviour before they price a service. They may have a series of hypotheses to test in a certain order before arriving at the optimal price point. Similarly, infrastructure specialists want to confirm or reject hypotheses about the effect of newly launched apps on digital traffic, by working through a specific sequence of hypotheses about app engagement and its effect on network infrastructure load.


So far we have seen a draft articulation of the workload patterns. It is our endeavour to make the list collectively exhaustive and mutually exclusive in subsequent iterations.


As Leonardo da Vinci said, “Simplicity is the ultimate sophistication”. Big data workload design patterns help simplify the decomposition of business use cases into workloads. The workloads can then be mapped methodically to the various building blocks of a big data solution architecture. Yes, there is a method to the madness :)


