Some of the most prominent AI models are falling short of European regulations in key areas such as cybersecurity resilience and discriminatory output, according to data seen by Reuters.
The EU had long debated new AI rules before OpenAI released ChatGPT to the public in late 2022. The chatbot's record-breaking popularity, and the ensuing public debate over the supposed existential risks of such models, spurred lawmakers to draw up specific rules for “general-purpose” artificial intelligence (GPAI).
Now, a new tool designed by Swiss start-up LatticeFlow and partners, and backed by European Union officials, has tested generative AI models developed by big tech companies such as Meta and OpenAI against the bloc's wide-ranging AI Act, which is coming into force in stages over the next two years.
Assigning each model a score between 0 and 1, a leaderboard published by LatticeFlow on Wednesday showed that models developed by Alibaba, Anthropic, OpenAI, Meta and Mistral all received average scores of 0.75 or above.
However, the company's “Large Language Model (LLM) Checker” tool revealed some models' shortcomings in key areas, highlighting where companies may need to divert resources to ensure compliance.
Companies that fail to comply with the AI Act face fines of up to 35 million euros ($38 million) or 7 percent of global annual turnover.
Mixed results
Currently, the EU is still trying to establish how the AI Act's rules will apply to generative AI tools such as ChatGPT, and is calling in experts to craft a code of practice governing the technology by spring 2025.
But the LatticeFlow test, developed in collaboration with researchers from Switzerland's ETH Zurich university and Bulgaria's research institute INSAIT, offers an early indicator of specific areas where tech companies risk falling short of the law.
For example, discriminatory output has been a persistent problem in the development of generative AI models, which can reflect human biases around gender, race and other areas when prompted.
When testing for discriminatory output, LatticeFlow's LLM Checker gave OpenAI's “GPT-3.5 Turbo” a relatively low score of 0.46. In the same category, Alibaba Cloud's “Qwen1.5 72B Chat” model received only 0.37.
When testing for “prompt hijacking,” a type of cyberattack in which hackers disguise a malicious prompt as legitimate in order to extract sensitive information, the LLM Checker gave Meta's “Llama 2 13B Chat” model a score of 0.42. In the same category, French start-up Mistral's “8x7B Instruct” model received a score of 0.38.
“Claude 3 Opus,” a model developed by Google-backed Anthropic, received the highest average score of 0.89.
The test was designed in line with the text of the AI Act and will be extended to cover further enforcement measures as they are introduced. LatticeFlow said the LLM Checker would be freely available for developers to test their models' compliance online.
The company's CEO and co-founder, Petar Tsankov, told Reuters the test results were positive overall and offered companies a roadmap for fine-tuning their models in line with the AI Act.
“The EU is still working out all the compliance benchmarks, but we are already seeing some gaps in the models,” he said. “We believe that with a greater focus on optimising for compliance, model providers can be better prepared to meet regulatory requirements.”
Meta declined to comment. Alibaba, Anthropic, Mistral and OpenAI did not immediately respond to requests for comment.
While the European Commission cannot verify external tools, it was kept informed throughout the LLM Checker's development and has described it as a “first step” in putting the new rules into practice.
A spokesperson for the European Commission said: “The Commission welcomes this platform for research and evaluation of artificial intelligence models as a first step in translating EU AI rules into technical requirements.”
(Only the headline and image of this report may have been modified by Business Standard staff; the rest of the content is automatically generated from a syndicated feed.)