machine learning

Patrick Ball, Director of Research, Human Rights Data Analysis Group, offered examples of when statistics and machine learning have proved useful and when they’ve failed in this presentation from Open Source Summit Europe.

Machine learning and statistics are playing a pivotal role in finding the truth in human rights cases around the world – and serving as a voice for victims, Patrick Ball, director of Research for the Human Rights Data Analysis Group, told the audience at Open Source Summit Europe.

Ball began his keynote, “Digital Echoes: Understanding Mass Violence with Data and Statistics,” with background on his career, which started in 1991 in El Salvador, building databases. While working with truth commissions from El Salvador to South Africa to East Timor, with international criminal tribunals as well as local groups searching for lost family members, he said, “one of the things that we work with every single time is trying to figure out what the truth means.”

In the course of the work, “we’re always facing people who apologize for mass violence. They tell us grotesque lies that they use to attempt to excuse this violence. They deny that it happened. They blame the victims. This is common, of course, in our world today.”

Human rights campaigns “speak with the moral voice of the victims,” he said. It is therefore critical that the statistics behind them, including machine learning, be accurate.

He gave three examples of when statistics and machine learning proved to be useful, and where they failed.

Finding missing prisoners

In the first example, Ball recalled his participation as an expert witness in the trial of a war criminal, the former president of Chad, Hissène Habré. Thousands of documents were presented, which had been discovered as a pile of trash in an abandoned prison and which turned out to be the operational records of the secret police.

The team homed in on one type of document that recorded the number of prisoners held at the beginning of the day, the number held at the end of the day, and accounted for the difference: prisoners released, new prisoners brought in, prisoners transferred elsewhere, and prisoners who died during the course of the day. Dividing the number of people who died during the day by the number alive in the morning produces the crude mortality rate, he said.
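The calculation itself is simple division. A minimal sketch, using hypothetical figures rather than anything from the actual Chad documents:

```python
# Illustrative only: the ledger figures below are hypothetical, not from
# the Chad secret police records. The crude mortality rate for one day is
# the number of deaths that day divided by the number of prisoners alive
# at the start of the day (the population at risk).

def crude_mortality_rate(alive_at_start: int, deaths: int) -> float:
    """Daily crude mortality rate: deaths / population at risk."""
    if alive_at_start <= 0:
        raise ValueError("population at risk must be positive")
    return deaths / alive_at_start

# Hypothetical daily entries: (alive at start of day, deaths that day)
ledger = [(120, 3), (117, 2), (130, 4)]
rates = [crude_mortality_rate(alive, dead) for alive, dead in ledger]
print([round(r, 4) for r in rates])  # → [0.025, 0.0171, 0.0308]
```

Even small daily rates compound: a sustained rate of a few percent per day implies a catastrophic death toll over months, which is what made the figure so damning at trial.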

The status of the prisoners of war was critical in the trial of Habré because the crude mortality rate was “extraordinarily high,” he said.

“What we’re doing in human rights data analysis is … trying to push back on apologies for mass violence. In fact, the judges in the [Chad] case saw precisely that usage and cited our evidence … to reject President Habré’s defense that conditions in the prison were nothing extraordinary.”

That’s a win, Ball stressed, since human rights advocates don’t see many wins, and the former head of state was sentenced to spend the rest of his life in prison.

Hidden graves in Mexico

In a more current case, the goal is to find hidden graves in Mexico of the bodies of people who have disappeared after being kidnapped and then murdered. Ball said they are using a machine learning model to predict where searchers are likely to find those graves in order to focus and prioritize searches.

Since they have a lot of information, his team decided to randomly split the cases into test and training sets and then train a model. “We’ll predict the test data and then we’ll iterate that split, train, test process 1,000 times,” he explained. “What we’ll find is that over the course of four years that we’ve been looking at, more than a third of the time we can perfectly predict the counties that have graves.”
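The iterated split-train-test procedure Ball describes can be sketched as follows. This is an assumption-laden illustration, not HRDAG’s actual pipeline: it uses scikit-learn, a synthetic stand-in dataset, and 100 rather than 1,000 iterations to keep the example fast.

```python
# Sketch of an iterated split/train/test evaluation. The features and
# labels here are synthetic stand-ins for county-level data; the model
# choice (random forest) is an assumption for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # hypothetical county features
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)  # grave found?

scores = []
for seed in range(100):                       # Ball's team iterated 1,000 times
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, model.predict(X_te)))

print(f"mean accuracy: {np.mean(scores):.3f}")
print(f"splits predicted perfectly: {sum(s == 1.0 for s in scores)}")
```

Repeating the random split many times shows how stable the model’s performance is across different partitions of the data, rather than relying on one lucky (or unlucky) split.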

“Machine learning models are really good at predicting things that are like the things they were trained on,” Ball said.

A machine learning model can visualize the probability of finding mass graves by county, which generates press attention and helps with the advocacy campaign to bring state authorities into the search process, he said.

“That’s machine learning, contributing positively to society,” he said. Yet that doesn’t mean machine learning is necessarily positive for society as a whole.

Predictive Policing

Many machine learning applications “are terribly detrimental to human rights and society,” Ball stressed. In his final example, he discussed predictive policing: the use of machine learning to predict where crime is going to occur.

For example, Ball and his team looked at drug crimes in Oakland, California. He displayed a heat map of the density of drug use in Oakland, based on a public health survey, showing the highest drug use close to the University of California.

Ball and his colleagues re-implemented one of the most popular predictive policing algorithms and used it to predict crimes from this data. He then showed the model running in animation, with dots on a grid representing drug arrests. The model made its predictions in precisely the same locations where arrests had already been observed, he said.

If the underlying data turns out to be biased, then “we recycle that bias. Now, biased data leads to biased predictions.” Ball went on to clarify that he was using the term bias in a technical, not racial sense.

When data is biased, he said, it “means that we’re over-predicting one thing and that we’re under-predicting something else. In fact, what we’re under-predicting here is white crime.” The machine learning model then teaches police dispatchers to go to the places they went before. “It assumes the future is like the past,” he said.
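The feedback loop Ball describes can be made concrete with a toy simulation. This is a hypothetical sketch, not the algorithm his team re-implemented: two neighborhoods have identical true crime rates, but one starts with more recorded arrests, patrols follow the recorded data, and only crime in the patrolled neighborhood gets recorded.

```python
# Toy simulation of a predictive-policing feedback loop. Both
# neighborhoods have the SAME underlying crime rate; neighborhood 0
# merely starts with more recorded arrests (biased historical data).
# Police are dispatched where the data says crime is, and only crime
# observed there is recorded, so the initial disparity compounds.
import random

random.seed(42)
true_rate = [0.5, 0.5]    # identical underlying crime rates
arrests = [30, 10]        # biased starting data: a 3x disparity

for day in range(200):
    # dispatch to the neighborhood with more recorded arrests
    patrolled = 0 if arrests[0] >= arrests[1] else 1
    if random.random() < true_rate[patrolled]:
        arrests[patrolled] += 1   # crime elsewhere goes unrecorded

print(arrests)  # neighborhood 0's count grows; neighborhood 1's never does
```

Because the model only ever sees arrests from where police already patrol, the recorded disparity grows even though the underlying crime rates are equal, which is the amplification Ball warns about.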

“Machine learning in this context does not simply recycle racial disparities in policing, [it] amplifies the racial disparities in policing.” This, Ball said, “is catastrophic. Policing [is] already facing a crisis of legitimacy in the United States as a consequence of decades, or some might argue centuries, of unfair policing. ML makes it worse.”

“In predictive policing, a false positive means that a neighborhood can be systematically over policed, contributing to the perception of the citizens in that neighborhood that they’re being harassed. That erodes trust between the police and the community. Furthermore, a false negative means that police may fail to respond quickly to real crime,” he said.

When machine learning gets it wrong

Machine learning models produce variance and random errors, Ball said, but bias is a bigger problem. “If we have data that is unrepresentative of a population to which we intend to apply the model, the model is unlikely to be correct. It is likely to reproduce whatever that bias is in the input side.”

We want to know where a crime has occurred, “but our pattern of observation is systematically distorted. It’s not that [we] simply under-observe the crime, but under-observe some crime at a much greater rate than other crimes.” In the United States, he said, that tends to be distributed by race. Biased models are the end result of that.

The cost of machine learning being wrong can also destroy people’s lives, Ball said. It also raises the question of who bears the cost of being wrong.

Hilary Mason, general manager for machine learning at Cloudera, discussed AI in the real world in her keynote at the recent Open FinTech Forum.

We are living in the future – it is just unevenly distributed with “an outstanding amount of hype and this anthropomorphization of what [AI] technology can actually provide for us,” observed Hilary Mason, general manager for machine learning at Cloudera, who led a keynote on “AI in the Real World: Today and Tomorrow,” at the recent Open FinTech Forum.

AI has existed as an academic field of research since the mid-1950s, and if the forum had been held 10 years ago, we would have been talking about big data, she said. But today, we have machine learning and feedback loops that allow systems to continue improving with the introduction of more data.

Machine learning provides a set of techniques that fall under the broad umbrella of data science. AI has returned, from a terminology perspective, Mason said, because of the rise of deep learning, a subset of machine learning techniques based around neural networks that has provided not just more efficient capabilities but the ability to do things we couldn’t do at all five years ago.

Imagine the future

All of this “creates a technical foundation on which we can start to imagine the future,” she said. Her favorite machine learning application is Google Maps. Google is getting real-time data from people’s smartphones, then it is integrating that data with public data sets, so the app can make predictions based on historical data, she noted.

Getting this right, however, is really hard. Mason shared an anecdote about how her name is a “machine learning edge case.” She shares her name with a British actress who died around 2005 after a very successful career.

Late in her career, the actress played the role of an ugly witch, and a search engine from 2009 combined photos with text results. At the time, Mason was working as a professor, and her bio was paired with the actress’s picture in that role. “Here she is, the ugly hag… and the implication here is obvious,” Mason said. “This named entity disambiguation problem is still a problem for us in machine learning in every domain.”

This example illustrates that “this technology has a tremendous amount of potential to make our lives more efficient, to build new products. But it also has limitations, and when we have conferences like this, we tend to talk about the potential, but not about the limitations, and not about where things tend to go a bit wrong.”

Machine learning in FinTech

Large companies operating complex businesses have a huge amount of human and technical expertise about where the ROI in machine learning lies, she said. That’s because they also have huge amounts of data, generally created in the course of operating those businesses over time. Mason’s rule of thumb when she works with companies is to find some clear ROI from a cost savings or process improvement using machine learning.

“Lots of people, in FinTech especially, want to start in security, anti-money laundering, and fraud detection. These are really fruitful areas because a small percentage improvement is very high impact.”

Other areas where machine learning can be useful are understanding your customers, churn analysis, and marketing, all of which are pretty easy to get started in, she said.

“But if you only think about the ROI in the terms of cost reduction, you put a boundary on the amount of potential your use of AI will have. Think also about new revenue opportunities, new growth opportunities that can come out of the same technologies. That’s where the real potential is.”

Getting started

The first thing to do, she said, is to “drink coffee, have ideas.” Mason said she visits lots of companies, and when she sees their lists of projects, the ideas are always good ones. “I get very worried, because you are missing out on a huge amount of opportunity that would likely look like bad ideas on the surface.”

It’s important to “validate against robust criteria” and create a broad sweep of ideas. Then, go through and validate capabilities. Some of the questions to ask include: is there research activity relevant to what you’re doing? Is there work in one domain you can transfer to another domain? Has somebody done something in another industry that you can use or in an academic context that you can use?

Organizations also need to figure out whether systems are becoming commoditized in open source, meaning “you have robust software and infrastructure you can build on without having to own and create it yourself.” Then the organization must figure out whether data is available, either within the company or available to purchase.

Then it’s time to “progressively explore the risky capabilities. That means have a phased investment plan,” Mason explained. In machine learning, this is done in three phases, starting with validation and exploration: Does the data exist? Can you build a very simple model in a week?

“At each [phase], you have a cost gate to make sure you’re not investing in things that aren’t ready and to make sure that your people are happy, making progress, and not going down little rabbit holes that are technically interesting, but ultimately not tied to the application.”

That said, predicting the future is of course very hard, Mason noted, so people write reports on different technologies that are designed to be six months to two years ahead of what they would put in production.

Looking ahead

As progress is made in the development of AI, machine learning and deep learning, there are still things we need to keep in mind, Mason said. “One of the biggest topics in our field right now is how we incorporate ethics, how we comply with expectations of privacy in the practice of data science.”

She gave a plug to a short, free ebook called “Data Driven: Creating a Data Culture,” that she co-authored with DJ Patil, who worked as chief data scientist for President Barack Obama. Their goal, she said, is “to try and get folks who are practicing out in the world of machine learning and data science to think about their tools [and] for them to practice ethics in the context of their work.”

Mason ended her presentation on an optimistic note, observing that “AI will find its way into many fundamental processes of the businesses that we all run. So when I say, ‘Let’s make it boring,’ I actually think that’s what makes it more exciting.”


Acumos AI Challenge

The Acumos AI Challenge, presented by AT&T and Tech Mahindra, is an open source developer competition seeking innovative, ground-breaking AI solutions; enter now.

Artificial Intelligence (AI) has quickly evolved over the past few years and is changing the way we interact with the world around us. From digital assistants, to AI apps interpreting MRIs and operating self-driving cars, there has been significant momentum and interest in the potential for machine learning technologies applied to AI.

The Acumos AI Challenge, presented by AT&T and Tech Mahindra, is an open source developer competition seeking innovative, ground-breaking AI solutions from students, developers, and data scientists. We are awarding over $100,000 in prizes, including the chance for finalists to travel to San Francisco to pitch their solutions during the finals on September 11, 2018. Finalists will also have their solutions featured in the Acumos Marketplace, gain exposure, and meet with AT&T and Tech Mahindra executives.

Acumos AI is a platform and open source framework that makes it easy to build, share, and deploy AI applications. The Acumos AI platform, hosted by The Linux Foundation, simplifies development and provides a marketplace for accessing, using and enhancing AI apps.  

We created the Acumos AI Challenge to enable and accelerate AI adoption and innovation, while recognizing developers who are paving the future of AI development. The Acumos AI Challenge seeks innovative AI models across all use cases. Some example use cases include, but are not limited to:

5G & SDN

Build an AI app that improves the overall performance and efficiencies of 5G networks and Software-Defined Networking.

Media & Entertainment

Build an AI model targeting a media or entertainment use case. Examples include solutions for:

  • Broadcast media, internet, film, social media, and ad campaign analysis
  • Video and image recognition, speech and sound recognition, video insight tools, etc.


Security

Build an AI app around network security use cases such as advanced threat protection, cybersecurity, IoT security, and more.

Enterprise Solutions

Build an AI model targeting an enterprise use case, including solutions for Automotive, Home Automation, Infrastructure, and IoT.

Since it is so easy to onboard new models into Acumos, there is a nearly infinite number of use cases to consider that can benefit consumers and businesses across a multitude of disciplines. When submitting your entry, we encourage you to consider all scenarios that you are passionate about.

The Acumos AI Challenge will accept submissions from May 31 to August 5, 2018. Teams are required to submit a working AI model, a test dataset, and a demo video under 3 minutes. Registration opens May 31, 2018, and we encourage you to register early so that you can begin to plan and build your solution and create your demo video.

Prize Packages

Register today and submit your AI solution for a chance to be one of the top three teams to pitch their app at the Palace of Fine Arts in San Francisco on September 11, 2018. The top three teams will each receive:

  • $25,000 Cash
  • Trip to the finals in San Francisco, including air and hotel (for two team members)
  • Meetings with AT&T and Tech Mahindra executives
  • AI Solution featured in Acumos Marketplace

The team that wins the finals will take home an additional $25,000 grand prize, for a total of $50,000.

We look forward to your entry and hope to see you in San Francisco in September!


open source AI

Download this new ebook to learn about some of the most successful open source AI projects.

Open source AI is flourishing, with companies developing and open sourcing new AI and machine learning tools at a rapid pace. To help you keep up with the changes and stay informed about the latest projects, The Linux Foundation has published a free ebook by Ibrahim Haddad examining popular open source AI projects, including Acumos AI, Apache Spark, Caffe, TensorFlow, and others.

“It is increasingly common to see AI as open source projects,” Haddad said. And, “as with any technology where talent premiums are high, the network effects of open source are very strong.”

Open Source AI: Projects, Insights, and Trends looks at 16 open source AI projects, providing in-depth information on their histories, codebases, and GitHub contributions. In this 100+ page book, you’ll gain insights about the various projects as well as the state of open source AI in general. Additionally, the book discusses the importance of project incubators, community governance, and project consolidation, and presents some observations on common characteristics among the surveyed projects.

For each of the projects examined, the book offers a detailed summary with basic information, observations, and pointers to web and code resources. If you’re involved with open source AI, this book provides an essential guide to the current state of the field.

Download the ebook now to learn more about the most successful open source AI projects and read what it takes to build your own successful community.