A new whitepaper from ODPi helps businesses gain insight into how Business Intelligence can be addressed by Big Data through multi-structured data and advanced analytics.

Companies today are collecting data at an unprecedented rate, but how much of the collected data actually makes an impact on their business? According to ODPi, by 2020, the accumulated volume of Big Data will increase from 4.4 zettabytes to roughly 44 zettabytes or 44 trillion GB.
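
The unit conversion behind those figures can be checked with a few lines of Python. This is only an illustrative sketch: it assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes, and the names `GB_PER_ZB` and `zb_to_gb` are made up for this example.

```python
from decimal import Decimal

# Assumption: SI units, so 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
GB_PER_ZB = 10**12


def zb_to_gb(zettabytes: Decimal) -> Decimal:
    """Convert zettabytes to gigabytes using decimal (SI) prefixes."""
    return zettabytes * GB_PER_ZB


start_zb = Decimal("4.4")  # starting volume quoted in the article
end_zb = Decimal("44")     # projected 2020 volume quoted in the article

end_gb = zb_to_gb(end_zb)
growth = end_zb / start_zb

print(f"{end_zb} ZB = {end_gb:,.0f} GB")
print(f"Growth factor: {growth}x")
```

Under these assumptions, 44 zettabytes works out to 44 trillion GB, a tenfold increase over the starting figure.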

It’s a tall order for companies to translate this data into ROI, and many businesses still don’t know how to combine Business Intelligence (BI) with Big Data to get insightful business value.

Cupid Chan, CTO of Index Analytics and ODPi lead for the BI & AI Special Interest Group (SIG), tells his clients, “It doesn’t matter how much data you have; unless you can get the insight from it, it is just bits and bytes occupying the storage.”

To help such businesses gain insight into how BI can be addressed by Big Data through multi-structured data and advanced data analytics, ODPi has released a new whitepaper called “BI”g Data – How Business Intelligence and Big Data Work Together.

The latest whitepaper shares best practices for combining BI and Big Data. It also shares real end-user perspectives on how businesses are using Big Data tools, the challenges they face, and where they are looking to enhance their investments.

“BI”g Data Highlights

  • Preferred BI/SQL connectors (Hive, Presto, Impala, etc.) for connecting a BI tool to Hadoop
  • Best practices for connecting to both Hadoop and RDBMS
  • Recommended BI architecture for querying data in Hadoop
  • How BI runs advanced analytics, including Machine Learning algorithms, on Hadoop

Chan said that even though vendors vary in tackling the Big Data problem, there are some common themes:

  1. The traditional way to store data in RDBMS is fading, and people are leveraging more and more Big Data platforms. Therefore, BI has to adapt in order to meet customer expectations.
  2. Users want results now, not after hours of batch processing once a query is executed. BI vendors therefore need to respond creatively, with proprietary connectors, in-memory processing, and hybrid approaches, to meet this requirement.
  3. Instead of creating a brand new standard, vendors are integrating with existing industry standards, such as R and Python, for advanced analytics, allowing users to leverage broader community support.

How much of this data has value?

One trend we have noticed is that companies are collecting massive amounts of data without actually knowing the value of that data and what to do with it. Chan agrees that this is true, especially for those companies who have the budget to ingest as much data as they want.

“Even though this may not be the optimal way to do analytics, it’s not wrong either. In fact, another argument for this practice is unless you have such data available for further analysis, there is no way to prove that the data is worthless,” said Chan.

Chan came up with the “AI + BI = CI” concept, which he first presented at the Conference on Health IT and Analytics (CHITA) organized by the University of Maryland. He is of the opinion that the true intelligence we should pursue is Cognitive Intelligence (CI). This can be achieved by combining the Speed of Machine Learning (provided by AI) with the Direction Intuited from Human Insight (provided by BI). “If companies can focus more on putting the right subject matter experts for a domain needed to be examined, we can be more efficient to pull the right data for the analytics,” he explained.

When asked which Big Data/ML platforms and frameworks these companies should take advantage of, Chan said that for data, the most prominent tools are Apache Hadoop (Cloudera/Hortonworks), AWS (S3, EBS, etc.), Azure Storage (Block Blobs, Azure Data Lake Storage, etc.), and Google Cloud (Bigtable, Cloud Storage, etc.). For ML, TensorFlow, Keras, PyTorch, and Apache MXNet are all popular.

According to Chan, companies that are just getting started with this effort can pick any of these frameworks to begin their journey. Companies that have already started should leverage their existing resources in-house first, before deciding to overhaul what they already have, he noted.

Data is the new soil

Modern companies must include Big Data/ML as part of their digital transformation strategy if they want to succeed. “Companies should look at Big Data/ML today the way they looked at building a website 25 years ago. It was expensive to build a website because it was the ‘cutting-edge’ technology. Could you delay building a website in your ‘digital transformation strategy’? Yes, but the result is you will lose the lead to your competitor. Not having Big Data/ML in your digital transformation strategy will be even more impactful due to the fast and furious nature of the technology. So it’s better to have the plan now, and improve it incrementally in an agile fashion,” he said.

You may have heard that “data is the new oil.” Chan, however, prefers the view that data is the new soil. “You can have very fruitful result if you plant your business model properly, but do not expect the fruit will come overnight. And, it requires more than soil for your business to bloom. You also need DevSecOps sunlight to provide photosynthesis, financial support as the fertilizer, proper temperature of the Industry trend, and managerial dedication to water consistently, even though the result can’t be seen immediately. All these need to work together to reap the fruit of new business model,” he said.

Hosted by The Linux Foundation, ODPi aims to be a standard for simplifying, sharing and developing an open big data ecosystem. Through a vendor-neutral, industry-wide approach to data governance and data science, ODPi members bring maturity, choice and collaboration to an open ecosystem.

New members support efforts to advance data governance and data science approaches

Berlin, Germany – April 16, 2018 – DataWorks Summit — ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, today announced that Attunity and ING have joined the initiative to advance data governance and data science approaches.

Many vendors have focused on productizing Apache Hadoop® as a distribution, which led to inconsistency that increased the cost and complexity for application vendors and end-users to fully embrace Apache Hadoop. Founded in 2015, ODPi is an industry effort to accelerate the adoption of Apache Hadoop and related big data technologies. ODPi’s members aim to accelerate Apache Hadoop adoption through a neutral, industry-wide approach to data governance and data science. Together, they are supporting the mission of creating an open data ecosystem through collaboration with subject matter experts and data platform and tools vendors.

“The Big Data market has, in part due to efforts by ODPi and its members, achieved the desired simplification of the Apache Hadoop landscape. However, barriers to broader and more rapid enterprise Hadoop adoption exist and can benefit from a neutral, industry-wide approach to data governance and data science,” said John Mertic, director of program management, ODPi. “We are thrilled to have Attunity and ING on board as ODPi members to help us further these industry-wide approaches.”

The new ODPi members will join a diverse and growing group of members that include well-known Apache Hadoop software companies, service providers and end users, as well as a rapidly growing community.

ING Information Architect and Application Developer, Maryna Strelchuk, and ODPi Director of Program Management, John Mertic, will be co-presenting at DataWorks Summit on “The rise of big data governance: Insight on this emerging trend from active open source initiatives.”

About the newest members:

Attunity is a leading provider of modern data integration and Big Data management software solutions that enable availability, delivery, and management of data across heterogeneous enterprise platforms in organizations worldwide. Its flagship solution, with change data capture technology, offers real-time data integration and ingestion across all databases, data warehouses, Hadoop and the cloud. Leading businesses choose Attunity to enable data lakes for real-time analytics, and ultimately, maximize the value of their IT and data investments.

“Attunity is excited to become a member of ODPi, helping to set a vision and technology ecosystem for metadata management that will benefit enterprises building modern data architectures,” said Itamar Ankorion, Chief Marketing Officer at Attunity. “Attunity shares ODPi’s belief that automated discovery and maintenance of metadata has to be an integral part of all modern data integration tools like ours that access, change and move information. We look forward to being part of ODPi’s efforts to standardize, support and accelerate growth of the Big Data Ecosystem.”

ING is a global financial institution with a strong European base, offering banking services. We draw on our experience and expertise, our commitment to excellent service and our global scale to meet the needs of a broad customer base, comprising individuals, families, small businesses, large corporations, institutions and governments. Our customers are at the heart of what we do.

“ING decided to become a member of ODPi to help drive standardization around open metadata,” said Ferd Scheepers, Chief Information Architect at ING. “Analytics is one of our strategic priorities, and we believe that standardization of metadata is a key enabler to be successful with analytics. ODPi as an independent group plays a key role in helping standardization across vendors, for ING the key reason to join and support ODPi.”

Additional Resources

About ODPi

ODPi is a nonprofit organization committed to simplification and standardization of the big data ecosystem with a common reference platform called ODPi Core. As a shared industry effort, ODPi members represent big data technology, solution provider and end user organizations focused on promoting and advancing the state of Apache Hadoop® and big data technologies for the enterprise. For more information about ODPi, please visit: http://www.ODPi.org

###

Media Contact:

Natasha Woods

ODPi

(415) 312-5289

pr@odpi.org

2016 was a pivotal year for Apache Hadoop, a year in which enterprises across a variety of industries moved the technology out of PoCs and the lab and into production. Look no further than AtScale’s latest Big Data Maturity survey, in which 73 percent of respondents report running Hadoop in production.

ODPi recently ran a series of its own Twitter polls and found that 41 percent of respondents do not use Hadoop in production, while 41 percent said they do. This split may partly be due to the fact that the concept of “production” Hadoop can be misleading. For instance, pilot deployments and enterprise-wide deployments are both considered “production,” but they are vastly different in terms of DataOps, as Table 1 below illustrates.

Table 1: DataOps Considerations from Lab to Enterprise-wide Production.

As businesses move Apache Hadoop and Big Data out of proofs of concept (PoCs) and into enterprise-wide production, hybrid deployments are the norm, and several important considerations must be addressed.

Dive into this topic further in a free webinar on June 28th with John Mertic, Director of ODPi at the Linux Foundation, hosting Tamara Dull, Director of Emerging Technologies at SAS Institute.

The webinar will discuss ODPi’s recent 2017 Preview: The Year of Enterprise-wide Production Hadoop, and explore DataOps at scale and the considerations businesses need to weigh as they move Apache Hadoop and Big Data out of proofs of concept (PoCs) and into enterprise-wide production and hybrid deployments.

Register for the webinar here.

As a sneak peek to the webinar, we sat down with Mertic to learn a little more about production Hadoop needs.

Why is it that the deployment and management techniques that work in limited production may not scale when you go enterprise-wide?

IT policies kick in as you move from Mode 2 IT, which tends to focus on fast-moving, experimental projects such as Hadoop deployments, to Mode 1 IT, which controls stable, enterprise-wide deployments of software. Mode 1 IT has to consider not only enterprise security and access requirements, but also data regulations that affect how a tool is used. On top of that, cost and efficiency come into play, as Mode 1 IT is cost conscious.

What are some of the step-change DataOps requirements that come when you take Hadoop into enterprise-wide production? 

Integrating into Mode 1 IT’s existing toolset is the biggest requirement. Mode 1 IT doesn’t want to manage tools it’s not familiar with, nor those it doesn’t feel it can integrate into the management tools the enterprise is already using. The more uniformly Hadoop fits into the existing DevOps patterns, the more successful it will be.

Register for the webinar now.

The Linux Foundation’s Hadoop project, ODPi, and Enterprise Strategy Group (ESG) are teaming up on November 7 for a can’t-miss webinar for Chief Data Officers and their Big Data teams.

As a bonus, all registrants will receive a free copy of Nik’s latest Big Data report.

Join ESG analyst Nik Rouda and ODPi Director John Mertic for “Taking the Complexity out of Hadoop and Big Data” to learn:

  1. How ODPi pulls complexity out of Hadoop, freeing enterprises and their vendors to innovate in the application space

  2. How CDOs and app vendors port apps easily across cloud, on-prem and Hadoop distros. Nik reveals ESG’s latest research on where enterprises are deploying net new Hadoop installs across on-premise, public, private and hybrid cloud

  3. What big data industry leaders are focusing on in the coming months

Removing Complexity

As ESG’s Nik Rouda observes, “Hadoop is not one thing, but rather a collection of critical and complementary components. At its core are MapReduce for distributed analytics jobs processing, YARN to manage cluster resources, and the HDFS file system. Beyond those elements, Hadoop has proven to be marvelously adaptable to different data management tasks. Unfortunately, too much variety in the core makes it harder for stakeholders (and in particular, their developers) to expand their Hadoop-enhancing capabilities.”

The ODPi Compliant certification program ensures greater simplicity and predictability for everyone downstream of Hadoop Core – SIs, app vendors and end users.
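
For readers unfamiliar with the MapReduce model mentioned in the quote, the idea can be sketched in a few lines of plain Python. This is a single-process toy illustration, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would distribute these phases across a cluster managed by YARN and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the grouped values for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["big data", "big insight"])
print(counts)
```

The point of the pattern is that the map and reduce steps are independent per line and per key, which is what lets a framework spread them over many machines.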

Application Portability

ESG reveals its latest findings on how enterprises are deploying Hadoop, and you may be surprised at the percentage moving to the cloud. Find out who’s deploying on premise (dedicated and shared), who’s using pre-configured on-prem infrastructure, and what percentage are moving to private, public and hybrid cloud.

Where Industry Leaders are Headed

ESG interviewed leaders like Capgemini, VMware, and more as part of this ODPi research – let their thinking light your way as you develop your Hadoop and Big Data strategy.

Reserve your spot for this informative webinar. 

1) Jack Wallen shares what’s new with Automotive Grade Linux and why it’s an important Linux Foundation Collaborative Project.

Automotive Grade Linux Wants to Help Open Source Your Next Car – TechRepublic

2) Daniel Robinson shares the latest reasons why it’s smart to opt for Linux.

5 Reasons to Ditch Windows for Linux – The Inquirer

3) The ODPi’s Hadoop runtime has been adopted by data analytics vendors.

ODPi Advances Hadoop Standards with Open Source Runtime Specification – The VAR Guy

4) The Linux Foundation’s OpenHPC Project promises to reduce duplicated development, validation and maintenance efforts across HPC.

System Software, Orchestration Gets an OpenHPC Boost – The Next Platform

5) GitHub publishes data visualizations showing the impact of open source development on hosted projects.

GitHub Visualizes the Impact of Open Source – ADT Magazine