Models are at the core of AI. Fed on data that is often continually updated and evolving, they form the foundations upon which AI works.

AI developers spend countless hours improving and refining models, ensuring that, for example, a driverless car can safely and seamlessly move through city streets, knowing exactly what the objects around it are and how best to respond to them.

However, just like any form of digital asset, AI models are not immune to attack. And because the development of AI and machine learning involves such complex systems, determining not only when a model has been attacked but also what damage has been done is not always straightforward.

Bao Feng, director of the Chinese technology giant's Shield Lab, has spent considerable time looking into the issue of malicious attacks on AI models, and speaking at Huawei Connect 2019 in Shanghai, China, he shared findings that are of serious concern.

Attacks on AI models can, if appropriately targeted and obfuscated, have devastating effects on systems that will soon become essential parts of our daily lives, and in some cases could endanger human life.

Malicious attacks on AI models

There are, according to Bao, four types of attacks that pose the most notable risks.

The first is straightforward theft, which while unlikely to cause direct harm to an AI model’s operation, could see it being used for unethical or unregulated purposes.

“When the model is okay, the data is okay, people can visit your model again and again so as to steal the model,” he says. “If the model is stolen, then your intellectual property will be breached and there will be other privacy issues.”

While this is naturally a concern for companies, particularly given the potential regulatory compliance issues if a model includes sensitive personal data, it is by no means the most serious type of attack that can be levied against an AI model.

A type of attack known as a backdoor or vulnerability attack arguably has the potential to be more severe. This occurs when AI models make use of learning frameworks developed by a third party, an extremely common practice within AI developments.


While the vast majority of these are developed in good faith to a high standard, they can contain backdoors or vulnerabilities – either intentionally or unwittingly – that can be taken advantage of to gain access to an AI model. And once a malicious party has access, they can do considerable damage.

One such approach is known as data poisoning, where reliable training data is mixed in with rogue data that can skew or damage the model’s parameters. This poisoned data can be introduced via a compromised server, or added to the source data before others gain access to it.

“You cannot always guarantee clean data from a clean source. That's why we need to always strictly screen the data,” says Bao.

Nor does an attacker need to introduce much rogue data to skew the results of a machine learning model. Bao cites research undertaken at UC Berkeley, where adding just 50 items to a database of 600,000 was enough to cause notable distortion.
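The outsized effect of a small amount of rogue data can be illustrated with a toy sketch. The classifier, data and numbers below are all invented for illustration — this is not the Berkeley experiment — but they show how a handful of mislabelled points can drag a simple nearest-centroid model's decision the wrong way:

```python
# Toy illustration of data poisoning: a few mislabelled points shift a
# nearest-centroid classifier's decision boundary. All data is synthetic.

def centroid(points):
    return sum(points) / len(points)

def predict(x, centroids):
    # Assign x to the class whose centroid is nearest.
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Clean training data: class "a" clusters near 0, class "b" near 10.
clean = {"a": [0.1, -0.2, 0.3, 0.0], "b": [9.8, 10.1, 10.2, 9.9]}
centroids = {label: centroid(pts) for label, pts in clean.items()}
print(predict(4.0, centroids))  # "a" — 4.0 is closer to class a's centroid

# Poison: inject three rogue points labelled "a" but drawn from far away.
poisoned = {"a": clean["a"] + [30.0, 31.0, 29.0], "b": clean["b"]}
centroids = {label: centroid(pts) for label, pts in poisoned.items()}
print(predict(4.0, centroids))  # "b" — the poisoned centroid has moved
```

Three rogue items out of eleven are enough here; in large real datasets the proportion needed can be far smaller, as the Berkeley figure suggests.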

The threat of evasion attacks

However, according to Bao, one of the most serious attack types is what is known as an evasion attack.

“More than 50% of researchers are studying evasion attacks because they can always lead to mistakes in the model behaviour,” he explains.

There are a host of different types of evasion attacks, but they all focus on fooling the model into drawing a different conclusion than the one it should come to.

“What people can easily recognise sometimes will not be identifiable by a machine, so evasion attacks are an important attack model,” explains Bao.


One example of this is adding noise to an image in such a way that, although it looks the same to a human, the model interprets it as something else. Bao gives the example of a model that categorises pictures of cats and dogs: by adding the right noise to a picture of a cat, the model can be fooled into miscategorising it.

“You can recognise this as a cat, but the machine will recognise it at 99% as a dog,” he says.
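A minimal sketch of how such a perturbation works, using a toy linear classifier rather than a real image model (the weights, inputs and "cat"/"dog" labels below are all assumptions for illustration): each input feature is nudged by at most a tiny amount in the direction that most raises the model's score — the idea behind the well-known fast gradient sign method — and the predicted label flips even though the input has barely changed.

```python
import numpy as np

# Sketch of an evasion attack on a toy linear classifier (not Bao's demo).
# score = w . x; positive score means "dog", negative means "cat".
rng = np.random.default_rng(0)
w = rng.normal(size=1000)  # stand-in for a trained model's weights
x = -0.05 * np.sign(w) + rng.normal(scale=0.01, size=1000)  # a "cat" input

def label(v):
    return "dog" if w @ v > 0 else "cat"

print(label(x))  # "cat"

# FGSM-style perturbation: step each feature by at most eps in the
# direction that raises the score — imperceptible in a real image.
eps = 0.1
x_adv = x + eps * np.sign(w)
print(label(x_adv))  # "dog" — yet no feature moved by more than eps
```

The per-feature change is bounded by `eps`, which is why a human sees the same picture while the model's verdict flips.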

This may seem like an innocent enough change, but for other uses of the technology it could have very serious consequences.

“In an autonomous driving scenario, you can imagine originally the machine can detect people walking in front of the car,” he says.

“However, if on the road I just add some small, unique sign that will not be identified by human eyes, AI will recognise this road as having nobody walking on it. This will cause a very big impact to autonomous driving.”

Preventing attacks on AI models

So how can organisations secure their AI models against such attacks, particularly given how hard such additions can be to spot?

Naturally, a key part of this is following the same principles you would follow for securing any sensitive data, including keeping all data protected by strong passwords and limiting who has access to it.

However, Bao also recommends conducting regular “multidimensional health checks” on AI models, to check their performance is not being compromised.

When it comes to checking for “adversarial examples” such as those used in evasion attacks, Bao recommends using what is known as mutation testing.

This involves introducing perturbations – samples with small changes in them – and measuring how often the model's output differs from its output before the changes, a measure known as the label change rate.


While innocent changes will produce only minor adjustments, adversarial examples will lead to significant movement in the output results, with Bao saying that this method can identify adversarial additions with a 96% success rate.
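The label change rate idea can be sketched in a few lines. The setup below is an assumption for illustration — a toy linear classifier, not Huawei's detector — but it shows the principle: an adversarial input sits just across the decision boundary, so small random mutations flip its label far more often than they flip a naturally, confidently classified input.

```python
import numpy as np

# Sketch of mutation testing for adversarial detection (assumed setup).
# Perturb an input many times and measure the label change rate (LCR):
# inputs crafted to sit barely across a decision boundary flip often.
rng = np.random.default_rng(1)
w = rng.normal(size=100)  # stand-in model: sign of w . x is the label

def label(v):
    return int(w @ v > 0)

def label_change_rate(x, trials=500, scale=0.05):
    base = label(x)
    flips = sum(label(x + rng.normal(scale=scale, size=x.size)) != base
                for _ in range(trials))
    return flips / trials

natural = 0.5 * np.sign(w)        # confidently classified input
adversarial = 0.001 * np.sign(w)  # barely over the boundary

print(label_change_rate(natural))      # 0.0 — stable under mutation
print(label_change_rate(adversarial))  # high — flagged as suspicious
```

Thresholding the label change rate then separates adversarial inputs from innocent ones, which is the intuition behind the detection rate Bao cites.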

But this approach only concerns rooting out rogue additions. AI models can also be made more resistant to attack using iterative adversarial training, a method that Bao likens to a vaccine, in that it involves building attack methods into the AI model to make it resistant to their effect.

“With iterative adversarial training, we can improve the robustness and the resilience of the model. It is like taking a vaccine so that people can be immune to infectious diseases; you need to add some [attack samples] to the model to generate a new generation of the model and keep training, keep iterating, so that the defence capability of this model can be gradually improved,” he explains.

“Of course, it is impossible for you to guarantee security 100% – there will continuously be new kinds of attack models – but the precision and applicability, as well as the efficiency, can be quite high, and this training will not impact the efficiency of the model.”
