There is no doubt that data is behind every successful company.

Data has been around for decades. Companies have long benefited from it by applying various “statistical methods”.

Over time, with the growth of data and the revolution in technology, companies began extracting patterns from data, which led to “data mining”.

Similarly, a few years later, thanks to new mathematical and statistical models, companies could perform more accurate forecasts, which led to “predictive analytics”.

Revolution of Data Science

Get closer than ever to your customers. So close that you tell them what they need well before they realize it themselves. Steve Jobs.

Explosion of data

The arrival of the internet, social media and the digitization of everything around the world have led to massive amounts of data being generated every second. For example:

  • Retail databases, logistics, financial services, healthcare and other sectors.
  • Computers’ capabilities to extract meaningful information from still images, video and audio.
  • Smart objects and the Internet of Things.
  • Social media, personnel files, location data and online activities.
  • Machine-generated data, computer and network logs.

Accordingly, Big Data is defined by the 3Vs (Volume, Variety and Velocity).

  • Volume: amount of data (Terabytes, Petabytes or more)
  • Variety: types of data (Text, Numbers, Files, Images, Video, Audio, machine data…)
  • Velocity: speed of data processing (Real-time, Streaming, Batching, uncontrollable…)

The infographic below illustrates the 3Vs:

Volume — Variety — Velocity

In God we trust. All others must bring data. William Edwards Deming

Additional Vs can be added to the Big Data definition, such as veracity, variability, visualization and value.

  • Veracity: the trustworthiness of the data. For example, outdated contact numbers are inaccurate, and the business cannot rely on them.
  • Variability: focuses on the correct meaning of raw data, which depends on its context. For example, the word “Great” conveys a positive idea, while “Greatly disappointed” gives a negative impression.
  • Visualization: refers to how the data is presented to business users (tables, graphical views, charts…)
  • Value: unless data is turned into value, it becomes useless. Businesses expect significant value from their investments in Big Data.
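To see why variability matters, consider a naive bag-of-words sentiment scorer. This is a toy sketch with invented word lists: it counts positive and negative words with no sense of context, so a phrase like “Greatly disappointed” confuses it.

```python
# Toy sentiment scorer: counts sentiment words, ignoring context.
positive_words = {"great", "greatly"}
negative_words = {"disappointed"}

def naive_sentiment(text):
    """Scores text by counting positive vs negative words."""
    words = text.lower().split()
    score = sum(w in positive_words for w in words) - \
            sum(w in negative_words for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "mixed"

print(naive_sentiment("Great"))                 # positive
print(naive_sentiment("Greatly disappointed"))  # "mixed": word counting
                                                # cannot resolve the clearly
                                                # negative phrase
```

Handling this kind of context dependence is exactly what the variability dimension is about.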

Big Data Challenges

Big data is so big and complex that traditional computing solutions, relational databases, conventional data processing methods and traditional analytics cannot scale to handle it.

Accordingly, to get value from Big Data, organizations have to deal with the Data Pipeline and Data Science.

The infographic below illustrates the process:

Big Data Science Cycle

What is a Data Pipeline — ETL?

Any analytics or data-driven decision requires well-organized, relevant data stored in a digital format. To get there, a Data Pipeline is needed.

A Data Pipeline, also known as ETL (Extract — Transform — Load), is a set of automated sequential actions to extract data from different sources and load it into a target database or warehouse. During this process, the data is shaped and cleaned before being loaded into its final destination.

Data Streaming Process

Extract, Transform and Load (ETL) is considered the most underestimated and time-consuming process in data warehousing development. Often 80% of development time is spent on ETL. J. Gamper, Free University of Bolzano

The ETL process involves the following actions:

  • Extract: Connecting to various data sources, then selecting and collecting the data needed for further processing.
  • Transform: Applying various business rules and operations such as filtering, cleaning, sorting, aggregating, masking, validating, formatting, standardizing, enriching and more.
  • Load: Importing the extracted and transformed data into the warehouse or another target database.
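The three steps above can be sketched in a few lines of Python. This is a minimal, hypothetical example: the source records, field names and in-memory SQLite target are all made up for illustration.

```python
import sqlite3

# Hypothetical raw records "extracted" from a source system.
raw_rows = [
    {"name": " Alice ", "amount": "120.50"},
    {"name": "Bob",     "amount": "80.00"},
    {"name": "",        "amount": "15.25"},  # incomplete record
]

def extract():
    """Extract: collect raw records from the source."""
    return raw_rows

def transform(rows):
    """Transform: clean, validate and standardize each record."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if not name:                     # validation: drop incomplete records
            continue
        cleaned.append((name, float(row["amount"])))
    return cleaned

def load(rows, conn):
    """Load: import transformed records into the target database."""
    conn.execute("CREATE TABLE sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT name, amount FROM sales").fetchall())
# [('Alice', 120.5), ('Bob', 80.0)]
```

Real pipelines run on dedicated tooling at much larger scale, but the extract/transform/load separation is the same.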

What is a Data Store?

After the ETL process, data is stored in a ready-to-consume format for analytics. But due to the variety, volume and value of the data, different technologies and methods should be considered.

Accordingly, a Data Store is a repository for persistently storing and managing collections of data which include not just repositories like databases, but also simpler store types such as simple files, emails etc. Wikipedia

Data stores may be classified as follows:

Data Warehouse: a technology that aggregates structured data from one or more sources so that it can be compared and analyzed, providing greater executive insight into corporate performance. #structured #relational #performance #scalable

Data Lake: a centralized storage repository that holds vast amounts of structured and unstructured data at any scale. Data can be stored as-is, without being structured first, and used to run different types of historical and real-time analytics.

MDM “Master Data Management”: a comprehensive method to link all critical data to a common point of reference. It is a pillar of data quality improvement.

For example, suppose a customer is represented in many systems within the organization, but their name and address might not be the same in all of them. For this reason, we need methods to cleanse the data, match the records, and then create a unique master version of the existing data.
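A toy sketch of that cleanse-match-merge idea (the records, field names and normalization rules are all invented for illustration):

```python
# Hypothetical records for the same customer in two systems.
records = [
    {"system": "CRM",     "name": "John Smith", "address": "12 High St"},
    {"system": "Billing", "name": "JOHN SMITH", "address": "12 high street"},
]

def normalize(record):
    """Cleanse: lowercase, trim, and expand a common abbreviation."""
    name = record["name"].strip().lower()
    address = record["address"].strip().lower()
    if address.endswith(" st"):
        address = address[:-3] + " street"
    return name, address

def build_master(records):
    """Match normalized records and keep one master version per customer."""
    master = {}
    for rec in records:
        key = normalize(rec)
        master.setdefault(key, {"name": key[0].title(), "address": key[1]})
    return list(master.values())

print(build_master(records))
# one master record instead of two conflicting copies
```

Production MDM tools use far richer matching (fuzzy names, reference data, survivorship rules), but the principle is the same: normalize, match, then keep a single golden record.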

Extract Business Value

Big Data Analytics is a combination of scientific methods, processes, algorithms and systems required to extract business value, knowledge, insights, intelligence, analytics and predictions from data.

Extract Business Value — Data Analytics vs Data Analysis

Data Analytics covers different areas and goals, such as:

Business Intelligence — BI: a combination of technologies and methods that use current and historical data to support strategic and tactical data-driven business decisions. The analyzed data is presented in the form of metrics, KPIs, reports and dashboards.

Advanced Analytics: goes beyond traditional business intelligence (BI) to discover deeper insights and to make predictions and forecasts (“Predictive Analytics”). It also enables businesses to conduct what-if analyses, predicting the effects of potential changes in business strategy. It includes techniques such as:

  • Data mining, pattern matching and forecasting
  • Semantic, sentiment, network, cluster, graph and regression analysis
  • Multivariate statistics, simulation, complex event processing and neural networks
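As a tiny illustration of what-if analysis, the sketch below simulates the revenue effect of a price change under an assumed, made-up price elasticity of demand; every number here is hypothetical.

```python
# Hypothetical what-if scenario: revenue after a price change,
# assuming a constant price elasticity of demand.
base_price, base_units = 10.0, 1000
elasticity = -1.5  # assumption: a 1% price rise loses 1.5% of unit sales

def revenue_after_change(pct_price_change):
    """Projects revenue for a given fractional price change."""
    new_price = base_price * (1 + pct_price_change)
    new_units = base_units * (1 + elasticity * pct_price_change)
    return new_price * new_units

print(revenue_after_change(0.0))   # baseline revenue: 10000.0
print(revenue_after_change(0.10))  # what if price rises 10%?
```

Real what-if tooling replaces the single assumed elasticity with models fitted to historical data, but the question asked is the same.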

Machine Learning — ML: the practice of creating algorithms that computers can use to find a model that fits the data as well as possible, and to make accurate predictions based on it.

The concept is to build a “model” by running training algorithms over data. The trained model then tries to categorize new data based on the hidden structure it has learned. Roughly, training algorithms fall into three categories: Supervised, Unsupervised and Reinforcement.
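A minimal supervised-learning sketch, with toy numbers and ordinary least squares in plain Python: the “training” step fits a line to labeled examples, and the resulting model predicts labels for unseen inputs.

```python
# Toy supervised learning: fit y = w*x + b by ordinary least squares.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # feature (e.g. ad spend, made up)
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # label   (e.g. revenue,  made up)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# "Training": choose w and b that minimize squared error on the examples.
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

def predict(x):
    """The trained model: predicts a label for an unseen input."""
    return w * x + b

print(round(predict(6.0), 2))  # forecast for a new data point: 12.03
```

Supervised learning generalizes this idea to richer models and many features; unsupervised and reinforcement learning change what signal the training algorithm optimizes against.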

About the Author

This submitted article was written by Peter Jaber, a Solutions Architect with over 20 years of experience. Contact.
