13 Jan 2025

Can AI Get Worse Over Time

Users on online forums have noticed that AI chatbots’ performance changes over time. The quality of answers may drop, or a bot may refuse to answer some questions altogether. The responses we get depend on several factors.

AI chatbots generate answers using Large Language Models (LLMs), which are trained on large amounts of text from the web, books, and other sources. The team behind an LLM integrates the model into a product that processes users’ requests and returns responses. To generate an answer, the system reproduces patterns learned during training, including how sentences and paragraphs are structured.

AI chatbots and LLMs have been developed by different companies. For example, ChatGPT was developed by OpenAI and runs on OpenAI’s GPT language models. Users can choose among GPT-3.5, GPT-4, GPT-4 Turbo, and the most recent GPT-4o model.

Each new model brings new features and capabilities. Interestingly, not only do capabilities differ between models, but the same model version can also change in accuracy and output over time.

Examples of How AI Models’ Performance Declined

It takes time to fully understand an AI model’s performance and capabilities. They become more apparent as more users interact with the system and provide feedback or examples that the system can learn from. Each user interaction requires computational resources, and collectively, these interactions lead to updates of the model. As a result, issues may arise, such as response delays, inaccuracies, and refusals to answer certain questions. Typically, such changes become noticeable several months after a bot’s launch.

Take GPT-4, for example, which launched in March 2023. According to user reviews, it was surprisingly good at the start but began making mistakes later. On ChatGPT forums, you can find discussions such as “WHY ChatGPT 4.0 is getting stupider and stupider,” or “Is Chat GPT getting worse or are my prompts getting worse?” According to users, the bot’s answers became limited: they lacked detailed explanations, and the bot often failed to catch and fix mistakes.

A study by researchers at Stanford seemed to confirm these concerns. Evaluating GPT-4 and GPT-3.5, the researchers found that the June 2023 versions made significantly more mistakes than the March 2023 versions. GPT-4’s accuracy on a set of math questions asking whether a number is prime fell from 98% to 2% over that period. For example, asked whether 17077 is a prime number, the model kept answering no, although the correct answer is yes.
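
For reference, whether 17077 is prime can be settled deterministically. The Python sketch below uses simple trial division, checking odd divisors up to the integer square root of the number. It only illustrates the arithmetic; it is not the method used by the study or by the model.

```python
import math

def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd divisors up to floor(sqrt(n)) need to be checked.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

print(is_prime(17077))  # True
```

No integer between 2 and 130 (the integer square root of 17077) divides it evenly, so the function returns True.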

Stanford’s study, however, was disputed. In a response article, Princeton University computer scientists questioned its methods. They agreed that GPT-4’s answers had changed from March to June: in March, the model almost always guessed that a number was prime, while in June it almost always guessed that it was composite. However, they argued that a different evaluation method, for example testing the model on composite numbers as well as primes, would paint a different picture. In their view, the earlier study was wrong to interpret this shift as a massive performance drop. Instead, the model’s behavior changed as a result of routine fine-tuning, and it can still give the correct answer, although this may require new prompting strategies. According to the Princeton research:

One important concept to understand about chatbots is that there is a big difference between capability and behavior. A model that has a capability may or may not display that capability in response to a particular prompt. It is little comfort to a frustrated ChatGPT user to be told that the capabilities they need still exist, but now require new prompting strategies to elicit.
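
The Princeton point about test sets can be illustrated with a toy example. The Python sketch below uses two hypothetical “models” that ignore the question entirely: one always answers “prime” (roughly the March behavior described above) and one always answers “composite” (roughly the June behavior). These constant guessers and the number range are assumptions made for illustration, not the studies’ actual data or GPT-4’s real outputs.

```python
import math

def is_prime(n: int) -> bool:
    """Ground truth via trial division up to sqrt(n)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

# Hypothetical constant "models" that ignore the input number.
def always_prime(n: int) -> bool:
    return True

def always_composite(n: int) -> bool:
    return False

def accuracy(model, numbers) -> float:
    """Fraction of numbers where the model's answer matches the ground truth."""
    return sum(model(n) == is_prime(n) for n in numbers) / len(numbers)

# Test set 1: primes only. Test set 2: the same primes plus an equal number of composites.
primes_only = [n for n in range(1000, 1200) if is_prime(n)]
composites = [n for n in range(1000, 1200) if not is_prime(n)][: len(primes_only)]
balanced = primes_only + composites

for name, model in [("always prime", always_prime), ("always composite", always_composite)]:
    print(f"{name}: primes-only {accuracy(model, primes_only):.0%}, "
          f"balanced {accuracy(model, balanced):.0%}")
# always prime: primes-only 100%, balanced 50%
# always composite: primes-only 0%, balanced 50%
```

On the primes-only set, switching from one guesser to the other looks like a collapse from 100% to 0%, while on the balanced set both score 50%, because neither actually tests primality. A shift in default answers can therefore look like a dramatic loss of capability when the test set contains only one kind of answer.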

GPT isn’t alone in this. Users of other AI chatbots, including Anthropic’s Claude, have also reported worsening performance, saying that since the app’s release it has become slower and less accurate.

User review on Claude. Source: reddit.com

Why Do AI Models Get Worse Over Time?

Generally, AI apps such as ChatGPT and Claude are closed-source, meaning full details of how they are trained and how they work are not publicly available. Although the exact reasons behind their deteriorating performance are hard to pin down, several factors play a role. One of them is the challenge of fine-tuning large language models in different directions without compromising any of their existing capabilities. The Stanford research also mentions:

Improving the model’s performance on some tasks, for example with fine-tuning on additional data, can have unexpected side effects on its behavior in other tasks. Consistent with this, both GPT-3.5 and GPT-4 got worse on some tasks but saw improvements in other dimensions.

The study also found changes in how the models reason and the level of detail in their answers.

AI models like ChatGPT are shaped not only by the data they are trained on but also by user feedback. If you regularly use ChatGPT, you have probably noticed that the app asks for it. For example, you can dislike answers that are not helpful and explain specifically why. OpenAI removed the like button, focusing more on negative responses and nuanced feedback. ChatGPT also sometimes shows two versions of a response and asks which one you prefer, to help improve its language generation capabilities.

ChatGPT’s behavior is adjusted based on the collected user feedback. However, doing so in a way that satisfies everyone is practically impossible.

Another crucial factor influencing AI models’ responses is the implementation of safety guidelines. Chatbots increasingly refuse to answer sensitive questions in order to avoid biased and harmful outputs. On the positive side, this approach strengthens safety and ethical standards, but it can also limit the system’s usefulness in certain contexts.

For example, Claude refuses to identify people in photos due to its privacy and safety policies. To test this, we attached a photo of Michael Jackson and asked the bot who it was. Claude explained that it can’t name the person “per its guidelines.” However, it offered a descriptive hint, noting that the image depicted “an incredibly influential and celebrated figure in the music industry whose artistry and performances captivated audiences worldwide.” This suggests that Claude recognized Michael Jackson but is trained to refrain from naming him directly.

Claude refuses to identify Michael Jackson in the photo due to safety measures. Source: claude.ai

What Will Happen in the Future? 

AI models’ performance can decline over time, not just improve. How future AI applications will behave is largely uncertain. It depends on how developers overcome the challenge of fine-tuning models without compromising existing abilities, whether standards for training AI models emerge, and how ethical constraints affect the quality of answers.

The content on The Coinomist is for informational purposes only and should not be interpreted as financial advice. While we strive to provide accurate and up-to-date information, we do not guarantee the accuracy, completeness, or reliability of any content. We accept no liability for any errors or omissions in the information provided or for any financial losses incurred as a result of relying on this information. Actions based on this content are at your own risk. Always do your own research and consult a professional. See our Terms, Privacy Policy, and Disclaimers for more details.
