According to Google’s red team leader, adversarial attacks, data poisoning, prompt injection, and backdoor assaults are some of the biggest dangers to machine learning (ML) systems. These ML systems consist of ChatGPT, Google Bard, and Bing AI, which are all based on substantial language models.
‘Tactics, techniques, and procedures’ (TTPs) are the usual name for these attacks.
In a recent study, Google’s AI Red team listed the most frequent TTPs employed by attackers against AI systems.
1. Adversarial attacks on AI systems
Writing inputs with the express purpose of deceiving an ML model is one form of adversarial assault. As a result, the model produces an inaccurate output or an output that it wouldn’t produce under different conditions, such as outcomes that the model may have been deliberately taught to avoid.
“The impact of an attacker successfully generating adversarial examples can range from negligible to critical, and depends entirely on the use case of the AI classifier,” Google’s AI Red Team paper stated.
2. Data-poisoning AI
Data poisoning, which comprises tampering with the model’s training data to sabotage its learning process, is another frequent method by which adversaries could attack machine learning systems, according to Fabian.
According to Fabian, “Data poisoning has become more and more interesting.” “Anyone can publish content online, including attackers, and they are free to disseminate their poisonous info. Therefore, it is up to us as defenders to figure out how to spot data that may have been tainted in some way.
These “data poisoning” attacks involve purposefully introducing false, deceptive, or altered data into the model’s training dataset in order to skew the model’s behavior and results. An illustration of this would be to purposefully misidentify faces in a facial recognition dataset by adding inaccurate labels to photographs in the collection.
According to Google’s AI Red Team paper, securing the data supply chain is one technique to avoid data poisoning in AI systems.
3. Prompt injection attacks
In order to alter the output of a model, a user can attack an AI system by performing quick injection attacks. Even when the model is particularly trained to counter these threats, the output may nevertheless produce unanticipated, biased, inaccurate, and offensive replies.
It is crucial to safeguard the model from users who have bad intentions because the majority of AI businesses work to develop models that offer accurate and unbiased information. This can entail limiting what can be entered into the model and carefully examining what users can contribute.
4. Backdoor attacks on AI models
One of the most deadly forms of assault against AI systems is backdoor attacks, which can go unreported for a very long time. Backdoor attacks could give a hacker the ability to steal data as well as hide code in the model and sabotage model output.
“On the one hand, the attacks are very ML-specific, and they require a lot of machine learning subject matter expertise to be able to modify the model’s weights to put a backdoor into a model or to do specific fine-tuning of a model to integrate a backdoor,” added Fabian.
The model can be used to carry out these assaults by installing and using a backdoor, a covert entry point that avoids conventional authentication.
0 Comments