Companies today are collecting data at an unprecedented rate, but how much of the collected data actually makes an impact on their business? According to ODPi, by 2020, the accumulated volume of Big Data will increase from 4.4 zettabytes to roughly 44 zettabytes or 44 trillion GB.
It’s a tall order for companies to translate this data into ROI, and many businesses still don’t know how to combine Business Intelligence (BI) with Big Data to get insightful business value.
Cupid Chan, CTO of Index Analytics and ODPi lead for the BI & AI Special Interest Group (SIG), tells his clients, “It doesn’t matter how much data you have; unless you can get the insight from it, it is just bits and bytes occupying the storage.”
To help such businesses gain insight into how BI can be addressed by Big Data through multi-structured data and advanced data analytics, ODPi has released a new whitepaper called “BI”g Data – How Business Intelligence and Big Data Work Together.
The latest whitepaper shares best practices for combining BI and Big Data. It also shares real end-user perspectives on how businesses are using Big Data tools, the challenges they face, and where they are looking to enhance their investments.
“BI”g Data Highlights
- Preferred BI/SQL connectors (Hive, Presto, Impala, etc.) for a BI tool to connect to Hadoop
- Best practices to connect to both Hadoop and RDBMS
- Recommended BI architecture to query data in Hadoop
- How BI runs advanced analytics, including Machine Learning algorithms, on Hadoop
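To make the idea of connecting to both Hadoop and an RDBMS concrete, here is a minimal Python sketch. It is not taken from the whitepaper. It assumes the BI layer reaches each source through a Python DB-API 2.0 (PEP 249) connection, which many SQL drivers provide, and it uses two in-memory SQLite databases as stand-ins for a SQL-on-Hadoop engine and a relational database. The table names, column names, and the `fetch_rows` helper are hypothetical.

```python
import sqlite3


def fetch_rows(conn, sql, params=()):
    """Run a query on any DB-API 2.0 connection and return rows as dicts."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Stand-in for a SQL-on-Hadoop engine holding a large fact table (hypothetical schema).
lake = sqlite3.connect(":memory:")
lake.execute("CREATE TABLE events (customer_id INTEGER, amount REAL)")
lake.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)

# Stand-in for an RDBMS holding a small dimension table (hypothetical schema).
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
rdbms.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "East"), (2, "West")],
)

# Push the aggregation down to the engine that holds the large table,
# so only one row per customer travels back to the BI layer.
totals = fetch_rows(
    lake,
    "SELECT customer_id, SUM(amount) AS total FROM events GROUP BY customer_id",
)

# Fetch the small lookup table from the RDBMS and join in the BI layer.
regions = {
    row["customer_id"]: row["region"]
    for row in fetch_rows(rdbms, "SELECT customer_id, region FROM customers")
}
report = sorted((regions[t["customer_id"]], t["total"]) for t in totals)

for region, total in report:
    print(region, total)
```

The design choice illustrated here is to aggregate where the large data lives and join only the small result in the BI layer. With a real Hive, Presto, or Impala driver, the connection call and the SQL parameter style would differ, so treat this as a shape to adapt rather than a drop-in implementation.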
Chan said that even though vendors vary in tackling the Big Data problem, there are some common themes:
- The traditional way to store data in RDBMS is fading, and people are leveraging more and more Big Data platforms. Therefore, BI has to adapt in order to meet customer expectations.
- Users want results now, not after a few hours of batch processing once a query is executed. BI vendors therefore need to respond creatively, with proprietary connectors, in-memory processing, and hybrid approaches, to meet this requirement.
- Instead of creating a brand new standard, vendors are integrating with existing industry standards, such as R and Python, for advanced analytics to allow users to leverage broader community support.
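As an illustration of the last theme, the sketch below shows the kind of Python step a BI tool might hand query results to for advanced analytics. It is an assumption for illustration, not a method described by Chan or the whitepaper: the monthly figures are made up, the `fit_line` helper is hypothetical, and a plain least-squares line stands in for whatever model a real deployment would use.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly totals, shaped like the result of a BI query.
months = [1, 2, 3, 4]
revenue = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, revenue)

# Extrapolate the fitted trend one month ahead.
forecast = slope * 5 + intercept
print(f"trend: {slope:.2f} per month, month-5 forecast: {forecast:.2f}")
```

In practice this step would more likely call an established Python or R library, which is the community support the bullet above refers to.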
How much of this data has value?
One trend we have noticed is that companies are collecting massive amounts of data without actually knowing the value of that data or what to do with it. Chan agrees that this is true, especially for companies that have the budget to ingest as much data as they want.
“Even though this may not be the optimal way to do analytics, it’s not wrong either. In fact, another argument for this practice is unless you have such data available for further analysis, there is no way to prove that the data is worthless,” said Chan.
Chan came up with the “AI + BI = CI” concept, which he first presented at the Conference on Health IT and Analytics (CHITA) organized by the University of Maryland. He believes that the true intelligence we should pursue is Cognitive Intelligence (CI). This can be achieved by combining the Speed of Machine Learning (provided by AI) with the Direction Intuited from Human Insight (provided by BI). “If companies can focus more on putting the right subject matter experts for a domain needed to be examined, we can be more efficient to pull the right data for the analytics,” he explained.
When asked which Big Data/ML platforms and frameworks these companies should take advantage of, Chan said that for data, the most prominent tools are Apache Hadoop (Cloudera/Hortonworks), AWS (S3, EBS, etc.), Azure Storage (Block Blobs, Azure Data Lake Storage, etc.), and Google Cloud (Bigtable, Cloud Storage, etc.). For ML, TensorFlow, Keras, PyTorch, and Apache MXNet are all popular.
According to Chan, companies that are just getting started with this effort can pick any of these frameworks to begin their journey. Companies that have already started should leverage their existing resources in-house first, before deciding to overhaul what they already have, he noted.
Data is the new soil
Modern companies must include Big Data/ML as part of their digital transformation strategy if they want to succeed. “Companies should look at Big Data/ML today the way they looked at building a website 25 years ago. It was expensive to build a website because it was the ‘cutting-edge’ technology. Could you delay building a website in your ‘digital transformation strategy’? Yes, but the result is you will lose the lead to your competitor. Not having Big Data/ML in your digital transformation strategy will be even more impactful due to the fast and furious nature of the technology. So it’s better to have the plan now, and improve it incrementally in an agile fashion,” he said.
You may have heard that “data is the new oil.” Chan, however, prefers the view that data is the new soil. “You can have very fruitful result if you plant your business model properly, but do not expect the fruit will come overnight. And, it requires more than soil for your business to bloom. You also need DevSecOps sunlight to provide photosynthesis, financial support as the fertilizer, proper temperature of the Industry trend, and managerial dedication to water consistently, even though the result can’t be seen immediately. All these need to work together to reap the fruit of new business model,” he said.
Hosted by The Linux Foundation, ODPi aims to be a standard for simplifying, sharing and developing an open big data ecosystem. Through a vendor-neutral, industry-wide approach to data governance and data science, ODPi members bring maturity, choice and collaboration to an open ecosystem.