Yeni Şafak

Study finds widespread flaws in AI safety and performance testing

Yenişafak
11:57, 04/11/2025, Tuesday
AA

Researchers have identified significant weaknesses in hundreds of tests used to evaluate artificial intelligence safety and effectiveness. The flaws could undermine claims about AI model reliability as companies release new systems at an accelerating pace.

A comprehensive study has revealed substantial weaknesses in hundreds of tests designed to evaluate the safety and performance of artificial intelligence systems. Researchers from the British government's AI Security Institute and academic institutions including the University of California, Berkeley and the University of Oxford examined more than 440 benchmarks that serve as key tools for evaluating new AI models released to the public.

Widespread Testing Deficiencies

The investigation found that nearly all the examined benchmarks contained flaws in at least one critical area, potentially rendering their results "irrelevant or even misleading." According to the researchers, these testing weaknesses "undermine the validity of the resulting claims" made by technology companies about their AI systems' capabilities and safety features, raising concerns about the reliability of current evaluation methods.

Real-World Consequences

The research emerges amid growing concern about AI safety standards, highlighted by recent incidents involving major technology companies. Google withdrew its Gemma AI model after it fabricated serious allegations about a US senator, creating fictional news stories about non-consensual sexual relationships. The senator, Marsha Blackburn, described the failure as "catastrophic" in terms of oversight and ethical responsibility.

Industry Context and Response

Andrew Bean of the Oxford Internet Institute noted that many of the flawed benchmarks are routinely used to assess the latest AI models released by leading technology firms. Google responded by clarifying that its Gemma models were intended for AI developers and researchers, not for factual assistance or consumer applications. Nonetheless, the incident underscores broader concerns about testing methodologies as AI development accelerates across the industry.