In 1995, a group of researchers launched a study to predict pneumonia risk. Their goal was to help doctors decide, when a pneumonia patient arrives at a hospital, whether that person is low-risk and should be sent home with instructions for care and rest, or is high-risk and should be admitted.

Kyle Dent, scientist at PARC

The researchers tried various machine learning approaches, one of which used a neural net that produced a model with remarkably high accuracy on their test data. With a very successful solution in hand, they made the extraordinary decision to use a model that did worse than their best possible predictions. Why would they make such a counter-intuitive choice?

Perhaps they had more experience or were more sensitive to possible risks, but this group of researchers asked themselves: "Is this safe to use on real patients?" They concluded it was not. The model that produced the best results wasn't explainable. It made great predictions on test data, but there was no way to know why it made the decisions it did.

That might not have mattered, but they were tipped off to the potential risks by a parallel system they built that learned rules from the same data. With rules you can get explanations.

When data lies

One of the rules it learned was that patients with asthma are at low risk from pneumonia. Any first-year med student knows that patients with asthma actually have a much higher risk than others. The system learned this erroneous rule because the data it learned from showed exactly that pattern.

The data reflected clinical practice: because asthma patients are especially at risk, doctors immediately admit them to hospital and give them a very high level of care, with the result that those patients end up with better outcomes than the general population.

The sample data reflected the results of decisions made by experts. Statistical tools can learn those patterns, but they cannot reason about cause and effect. Once the researchers recognised the problem, it would have been relatively easy to correct it. But they realised that they couldn't be sure about other errors their opaque model might have picked up.
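The asthma trap can be reproduced in a few lines. The simulation below uses made-up, purely illustrative numbers (not figures from the 1995 study): asthma raises the true underlying risk, but asthma patients are always admitted and treated aggressively, so the *observed* bad-outcome rate for them ends up lower. Any purely correlational learner trained on such records would conclude asthma is protective.

```python
import random

random.seed(0)

# Hypothetical, illustrative parameters -- not from the actual study.
def simulate_patient(has_asthma):
    base_risk = 0.15 if has_asthma else 0.05   # true underlying risk
    treated = has_asthma                        # doctors always admit asthmatics
    risk = base_risk * (0.2 if treated else 1.0)  # aggressive care cuts risk
    return random.random() < risk               # True = bad outcome

patients = [(a, simulate_patient(a))
            for a in [True] * 5000 + [False] * 5000]

def bad_outcome_rate(asthma_flag):
    outcomes = [bad for a, bad in patients if a == asthma_flag]
    return sum(outcomes) / len(outcomes)

print(f"asthma:    {bad_outcome_rate(True):.3f}")
print(f"no asthma: {bad_outcome_rate(False):.3f}")
# The asthma group shows the LOWER observed rate -- the spurious
# "asthma means low risk" pattern a statistical learner would absorb.
```

The confounder here is the treatment decision, which sits between the risk factor and the outcome in the data but is invisible to a model that only sees correlations.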

AI researchers should know about the risks

Not all AI researchers are as aware of potential risks. They should be. Data is often biased, incomplete or not reflective of the real-world situation it’s supposed to model.

AI developers necessarily make decisions and choose trade-offs. Self-driving car designers choose to have their cars drive the speed limit rather than the safest speed. That's an ethical decision, but probably one made without much thought in those terms.

There is a general feeling, even among those developing AI solutions, that technology is inherently neutral. This misunderstanding presents a growing challenge as artificial intelligence evolves and spreads into virtually every facet of human society.

“Most people trust their technologies without really understanding how they work, or recognising their limitations.”

Since the pneumonia study from 1995, AI has been adopted for many new applications with a big ripple effect on people’s lives. How we apply AI is starting to matter, which means the developers of smart systems have an obligation to consider very carefully any potential for harm. We need to bring the ethics of AI front and centre.

This issue is compounded by the fact that most people trust their technologies without really understanding how they work, or recognising their limitations. Consider the driver who, in 2016, trusted her GPS system enough that she mistakenly steered her vehicle directly into Georgian Bay in Ontario, Canada.

Earlier that same year, we saw the first fatal crash of a Tesla Motors car that was being driven in Autopilot mode. Putting undue reliance on the car’s semi-autonomous controls despite the manufacturer’s warnings, the Tesla driver collided with a tractor-trailer, resulting in his own death.

You can’t argue with a machine

A major side-effect of the common belief that machines are unbiased is that any debate about decisions is often shut down once an intelligent agent is introduced into the process. AI technology is already being used for decisions about judicial sentencing, job performance and hiring, among many other things.

There is no denying that without technology, human beings bring their own biases to decision-making, but those decisions are often questioned amid robust public debate. Consider the California judge who narrowly escaped recall and then suffered a decisive re-election defeat following public outrage at his sentencing decision in a sexual assault case.

“People seem to believe that technology is inherently neutral, so its decisions must be fair.”

People seem to believe that technology is inherently neutral, so its decisions must be fair. What’s lacking with AI is any discussion about how developers chose datasets, selected weighting schemes, modelled outcomes or evaluated their results, or even what those results are.

Those affected often have no recourse because computer decisions are considered infallible and usually final. One widely reported and discussed system is now being used in several jurisdictions to predict a defendant's likelihood of committing another crime.

An AI system used to predict the risk of criminal recidivism in the Florida courts was reported on by ProPublica in 2016. They found the system to be quite unreliable in predicting who would commit a crime: only 20% of the people it said were likely to commit a violent crime in the future actually went on to do so.

They also reported significant differences in the types of errors the system made when analysing white and black defendants. Their findings are disputed by the company supplying the software, which does not disclose how the risk scores are determined, claiming its techniques are a trade secret.

Support, not replacement

Using AI shouldn’t eclipse existing laws and traditional protections extended to those affected by it. Historically, we, as a society, have favoured open government and held human rights values that include human dignity, public health and safety, and personal privacy, and we have extended legal protection even to criminal defendants.

Those with the authority to procure technology, and those making use of it, must be aware of its design, its context for use, and its limitations. At a minimum, we need to maintain established values. As a society, we have to consider who benefits from the use of the technology and who accepts the risks of its use.

“Adopters of technology have a responsibility to hold vendors accountable and require disclosure of relevant information.”

Average consumers, business users and government agencies usually aren’t qualified to assess the relevant AI data models and algorithms. This asymmetrical relationship puts the burden on AI developers to be forthright and transparent about the underlying assumptions which guide their decisions.

Adopters of technology have a responsibility to hold vendors accountable and require disclosure of relevant information. In the case of decisions affecting sectors of the population who have been historically disadvantaged or marginalised, it is especially important to understand the benefits and risks of using the technology, in addition to understanding the reliability and accuracy of that usage.

Intelligence is only as good as its data

Most modern AI decision-making systems gain intelligence from existing data. It is critical that we review that data to understand how well it aligns with the real-world goals of the system. Training data does not always reflect the variables of the actual environment where the system is deployed; it is often repurposed after being collected for other purposes.

Real life is complicated and messy. It can be difficult or even impossible to accurately define value functions that match the end goal. Whereas humans are good at ignoring obviously irrelevant data—it often doesn’t even enter our minds—machines are not good at reasoning about causal factors. They are really good at finding correlations whether they matter or not.

Performance accuracy is another important consideration. A model with 99% accuracy would rightly be considered an excellent result by any AI developer. But how many developers ask themselves about the real people within that 1% where the system gets it wrong?

What if hundreds or thousands of people are adversely impacted by the system? Is it still worth using?
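The back-of-the-envelope arithmetic is worth making explicit: the same 1% error rate translates into very different absolute numbers of affected people depending on scale. A minimal sketch (the populations are arbitrary examples):

```python
def misclassified(population, accuracy):
    """Expected number of people the system gets wrong."""
    return round(population * (1 - accuracy))

# A 99%-accurate system at three different deployment scales.
for population in (10_000, 1_000_000, 100_000_000):
    errors = misclassified(population, 0.99)
    print(f"{population:>12,} people -> ~{errors:,} misclassified")
```

At national scale, "99% accurate" can still mean a million people on the wrong side of an automated decision.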

“Real life is complicated and messy. It can be difficult or even impossible to accurately define value functions that match the end goal.”

In these cases, systems could be designed to allow for human input to compensate for the system’s misses. The size or severity of potential negative consequences should justify extra cost and effort to add protection into the system.

Evaluation of a system should continue even after it is deployed. The world is highly dynamic and fast-changing. AI systems should incorporate ways to assess post-release accuracy and determine how often that accuracy should be reviewed and recalibrated.

All of us should be asking hard-hitting questions about AI systems that impact individuals, communities, societies and our shared environments.

Are there some specific groups which may be advantaged or disadvantaged in the context of the algorithms under development? When using human data, do the benefits outweigh the risks for those involved?

Will there be a calculation of the error rates for different sub-populations and the potential differential impacts? And what are the effects of false positives or false negatives on the subjects who are misclassified?
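The subgroup audit those questions call for can be sketched very simply. The records below are invented purely for illustration: each is (group, predicted high-risk, actually reoffended), and the helper computes false-positive and false-negative rates per group, the two error types whose imbalance between white and black defendants was at the heart of the ProPublica analysis.

```python
# Made-up audit records: (group, predicted_high_risk, reoffended).
records = [
    ("A", True,  False), ("A", True,  True),  ("A", False, False),
    ("A", True,  False), ("B", False, True),  ("B", True,  True),
    ("B", False, False), ("B", False, True),
]

def error_rates(group):
    """Return (false-positive rate, false-negative rate) for one group."""
    rows = [(p, a) for g, p, a in records if g == group]
    fp = sum(p and not a for p, a in rows)   # flagged, did not reoffend
    fn = sum(not p and a for p, a in rows)   # missed, did reoffend
    negatives = sum(not a for p, a in rows)
    positives = sum(a for p, a in rows)
    return fp / negatives, fn / positives

for group in ("A", "B"):
    fpr, fnr = error_rates(group)
    print(f"group {group}: false-positive rate {fpr:.2f}, "
          f"false-negative rate {fnr:.2f}")
```

In this toy data, group A is over-flagged (high false-positive rate) while group B is under-flagged (high false-negative rate); two groups can face very different harms even when overall accuracy looks identical.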

The developers of AI systems can help mitigate a broad range of future problems by paying close attention to all the decisions which influence their software and data models. If these decisions are not surfaced and thoroughly examined in advance, we risk incurring a tragic and costly backlash from the growth of artificial intelligence.
