17 May 2025

Can AI Get Worse Over Time?

Users on online forums have noticed that AI bots’ performance changes over time. The quality of answers may drop, or a bot may refuse to respond to some questions altogether. Several factors shape the responses we get.

AI bots generate answers using Large Language Models (LLMs), which are trained on large amounts of information from the web, books, and other sources. The team of computer scientists behind an LLM integrates the model into a product that processes requests and responses. To come up with an answer, the system reproduces patterns it was trained on, including how sentences and paragraphs are structured.

AI chatbots and LLMs have been developed by different companies. For example, ChatGPT was developed by OpenAI and uses the GPT family of language models. Users can choose between GPT-3.5, GPT-4, GPT-4 Turbo, and the more recent GPT-4o model.

Each new model brings new features and capabilities. Interestingly, the models do not only differ from one another: the accuracy and output of a single model version can also change over time.

Examples of How AI Models’ Performance Declined

It takes time to fully understand the performance and capabilities of an AI model. They become more apparent as more users interact with the system and provide feedback or examples that the system can learn from. Each user interaction also requires computational resources, and collectively these interactions lead to updates to the system. As a result, issues may arise, such as response delays, inaccuracies, and failure to respond to certain questions. Typically, the changes become noticeable several months after the bot’s launch.

Take GPT-4, for example, launched in March 2023. According to user reviews, it was surprisingly good at the start but began making mistakes later. On ChatGPT Forums you can see discussions like “WHY ChatGPT 4.0 is getting stupider and stupider,” or “Is Chat GPT getting worse or are my prompts getting worse?” According to users, the bot’s answers were limited: they lacked detailed explanations or failed to catch and fix mistakes.

A study by Stanford researchers lent support to these concerns. Comparing the March and June 2023 versions of GPT-4 and GPT-3.5, the researchers found that the June versions made significantly more mistakes on some tasks. GPT-4’s accuracy on one set of math problems fell from 98% to 2% during that time. For example, when asked whether 17077 is a prime number, the June version kept saying no, although the correct answer is yes.
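
The arithmetic behind that example is easy to check independently. Below is a minimal sketch, assuming plain trial division is good enough for a number this small. The function name `is_prime` is ours and does not come from the study.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2  # 2 is the only even prime
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2  # skip even divisors
    return True


print(is_prime(17077))  # True
```

No divisor up to the square root of 17077 (about 130.7) divides it evenly, so the correct answer to the question is indeed yes.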

Stanford’s study, however, drew criticism. A response article published by Princeton University computer scientists questioned its methods. The study tested the model only on prime numbers, so a model that always answers “prime” would score almost perfectly and a model that always answers “composite” would score close to zero, regardless of whether either can actually test for primality. The Princeton scientists found that the answers did change from March to June: in March GPT-4 almost always guessed the number was prime, while in June it almost always guessed it was composite. Testing on composite numbers as well as primes would therefore give a different picture. The authors believe the earlier study’s interpretation of this as a massive performance drop is incorrect. Instead, they state that the bot showed a change in behavior caused by regular fine-tuning. It can still give the correct answer but may require new prompting strategies. According to the Princeton research:

One important concept to understand about chatbots is that there is a big difference between capability and behavior. A model that has a capability may or may not display that capability in response to a particular prompt. It is little comfort to a frustrated ChatGPT user to be told that the capabilities they need still exist, but now require new prompting strategies to elicit.
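
The evaluation problem raised in the Princeton response can be illustrated with a toy sketch. It assumes two hypothetical "models" that ignore their input entirely: one always answers "prime", the other always answers "composite". Neither stands in for GPT-4, and the number range and test sets below are invented purely for illustration.

```python
def is_prime(n: int) -> bool:
    """Ground truth: trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def always_prime(n: int) -> bool:
    """Hypothetical model that answers 'prime' for every number."""
    return True


def always_composite(n: int) -> bool:
    """Hypothetical model that answers 'composite' for every number."""
    return False


def accuracy(model, numbers) -> float:
    """Fraction of numbers for which the model's answer matches the truth."""
    correct = sum(model(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


candidates = range(1000, 1200)  # arbitrary range chosen for the example
primes_only = [n for n in candidates if is_prime(n)]
composites = [n for n in candidates if not is_prime(n)]
# Balanced set: as many composites as primes.
balanced = primes_only + composites[: len(primes_only)]

for name, model in [("always_prime", always_prime),
                    ("always_composite", always_composite)]:
    print(name,
          "primes only:", accuracy(model, primes_only),
          "balanced:", accuracy(model, balanced))
```

On the primes-only set the two guessers score 1.0 and 0.0, which looks like a collapse in ability. On the balanced set both score 0.5, which shows that neither one was testing for primality in the first place.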

GPT isn't alone in its struggle. Users of other AI bots, including Anthropic’s Claude, have pointed out worsening performance, too. They say that since the app’s release, it has become slower and less accurate.

User review on Claude. Source: reddit.com

Why Do AI Models Become Worse Over Time?

AI apps such as ChatGPT and Claude are generally closed-source, meaning that full details about how they are trained and how they work are not available. The exact reasons behind the deteriorating performance are therefore hard to pin down, but several factors likely contribute. Among them is the challenge of fine-tuning large language models in different directions without compromising any of their capabilities. The Stanford research also mentions:

Improving the model’s performance on some tasks, for example with fine-tuning on additional data, can have unexpected side effects on its behavior in other tasks. Consistent with this, both GPT-3.5 and GPT-4 got worse on some tasks but saw improvements in other dimensions.

The study also found changes in how the models reason and the level of detail in their answers.

AI models like ChatGPT learn not only from the data they are trained on; they are also adjusted based on user input. If you regularly use ChatGPT, you have probably noticed that the app asks for feedback. For example, you can dislike answers that are not helpful and explain why. OpenAI removed the like button, focusing more on negative responses and nuanced feedback. ChatGPT also sometimes offers two versions of a response and asks which one you prefer, to help improve its language generation capabilities.

ChatGPT adjusts its performance based on collected user feedback. However, doing so in a way that everyone likes is practically impossible.

Another crucial factor influencing AI models' responses is the implementation of safety guidelines. Programs increasingly refuse to answer sensitive questions to avoid bias and harmful outputs. On the positive side, this approach enhances safety and ethical standards, but it can also limit the system's capabilities in certain contexts. 

For example, Claude refuses to identify people in photos due to privacy and safety policies. To test the app, we attached a photo of Michael Jackson and asked the bot to tell us who it was. Claude explained that it can’t name the person “per its guidelines.” However, it provided a descriptive hint, acknowledging that the image depicted “an incredibly influential and celebrated figure in the music industry whose artistry and performances captivated audiences worldwide.” This suggests that while Claude recognized Michael Jackson, it is trained to refrain from directly answering such questions.

Claude refuses to identify Michael Jackson in the photo due to safety measures. Source: claude.ai

What Will Happen in the Future? 

AI models’ performance can decline as well as improve over time. The behavior of future AI applications is largely uncertain. It depends on how developers overcome the challenge of fine-tuning models without compromising existing abilities, whether standards for training AI models emerge, and how ethical constraints affect the quality of answers.

The content on The Coinomist is for informational purposes only and should not be interpreted as financial advice. While we strive to provide accurate and up-to-date information, we do not guarantee the accuracy, completeness, or reliability of any content. Nor do we accept liability for any errors or omissions in the information provided or for any financial losses incurred as a result of relying on this information. Actions based on this content are at your own risk. Always do your own research and consult a professional. See our Terms, Privacy Policy, and Disclaimers for more details.
