OpenAI Releases CriticGPT to Catch Chatbot Hallucinations – Is It Enough?
After criticism of AI chatbots that confidently make things up, OpenAI’s newest release, CriticGPT, is designed to point out ChatGPT’s errors.
July 1, 2024
Last Thursday, OpenAI unveiled its newest AI model, CriticGPT. Based on GPT-4, the new model is designed to help human trainers and reviewers identify bugs and errors in code generated by ChatGPT. On X, the platform formerly known as Twitter, former OpenAI researcher Jan Leike called it “a promising sign for scalable oversight.”
With companies wary of inaccurate chatbots, CriticGPT marks a clear attempt by OpenAI to earn the confidence of its business users.
How Does CriticGPT Work?
According to a blog post by OpenAI, the company is leveraging “CriticGPT-like” models to assist its trainers in evaluating outputs from advanced AI systems.
To train LLMs to be more accurate and make fewer mistakes, OpenAI developers use a process called reinforcement learning from human feedback (RLHF), in which human trainers read ChatGPT’s responses to prompts and rate them against one another.
According to Dr. Jignesh Patel, co-founder of DataChat and Professor at Carnegie Mellon University, “Firms train LLMs this way because AI doesn’t ‘think’ for itself. It merely synthesizes information in ways that sound intelligent, based on our feedback.”
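In practice, that ranking step usually comes down to pairwise comparisons: a reward model is trained to agree with whichever of two responses the human preferred. The snippet below is a minimal, illustrative sketch of that idea, not OpenAI’s code; the function name and example scores are assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_chosen: torch.Tensor,
                             reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss commonly used to train RLHF reward models.

    `reward_chosen` and `reward_rejected` are the scores the reward model
    assigns to the response the trainer preferred and to the one ranked lower.
    Minimizing the loss pushes the preferred response's score higher.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scores for a batch of three prompt/response comparisons.
chosen = torch.tensor([1.2, 0.4, 2.0])     # trainer-preferred responses
rejected = torch.tensor([0.3, 0.9, -0.5])  # responses ranked lower
print(pairwise_preference_loss(chosen, rejected))
```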
Behind the Scenes of CriticGPT
To develop CriticGPT, OpenAI trained the model on a “large number” of inputs containing deliberate mistakes. Humans tasked with training CriticGPT manually introduced these mistakes into ChatGPT’s code and then wrote the feedback the model needed to learn to identify and critique the coding errors.
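One way to picture such a training example is a snippet of ChatGPT-generated code with a deliberately inserted bug, paired with the human-written critique the model should learn to produce. The structure below is purely hypothetical; OpenAI has not published its actual data format.

```python
# Hypothetical shape of one CriticGPT training example (illustrative only;
# OpenAI has not published its real data format).
tampered_example = {
    "prompt": "Write a function that returns the average of a list.",
    "model_code": (
        "def average(xs):\n"
        "    return sum(xs) / (len(xs) + 1)  # bug inserted by the trainer\n"
    ),
    "human_critique": (
        "The divisor should be len(xs), not len(xs) + 1; as written, the "
        "function understates the average for every non-empty list."
    ),
}
```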
In the process of developing CriticGPT, OpenAI researchers also created a method called Force Sampling Beam Search (FSBS), which produces more detailed reviews of code and lets them adjust how thorough the model’s critiques are.
FSBS enables human trainers to adjust how aggressively CriticGPT searches for bugs while minimizing its tendency to “hallucinate,” or highlight errors that don’t exist.
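Conceptually, that knob can be thought of as a search over candidate critiques that balances a reward model’s score against how much of the code a critique flags. The sketch below is a toy stand-in for that trade-off, not OpenAI’s FSBS procedure; the scoring rule and the `length_bonus` parameter are assumptions.

```python
from typing import List, Tuple

def pick_critique(candidates: List[Tuple[str, float, int]],
                  length_bonus: float) -> str:
    """Toy illustration of a precision/thoroughness trade-off (not OpenAI's FSBS).

    Each candidate is (critique_text, reward_model_score, num_flagged_issues).
    A larger `length_bonus` favors more thorough critiques; a smaller value
    favors conservative ones that flag fewer possible bugs, which helps limit
    hallucinated errors.
    """
    return max(candidates, key=lambda c: c[1] + length_bonus * c[2])[0]

candidates = [
    ("Off-by-one in loop bound.", 0.9, 1),
    ("Off-by-one in loop bound; unused import; possible None dereference.", 0.7, 3),
]
print(pick_critique(candidates, length_bonus=0.0))   # conservative pick
print(pick_critique(candidates, length_bonus=0.15))  # more thorough pick
```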
Room to Grow: Current Limitations of CriticGPT
That said, CriticGPT, like most AI models, still has clear limitations, and its own hallucination issues persist.
Most notably, CriticGPT was tested on fairly short ChatGPT responses and has not yet been trained to handle longer, more complex tasks.
Moreover, the model focuses on errors that can be pointed out in a single place, rather than mistakes spread across many parts of the code.
As AI models become more advanced, OpenAI plans to continue developing CriticGPT and expanding its capabilities.