GPT-5 Serves as Research Assistant in Proving One of Quantum Computing Theory's Trickiest Theorems

Insider Brief

Researchers say they proved that black-box error reduction in QMA, a quantum version of NP, cannot push completeness more than doubly exponentially close to one or soundness below exponentially small.
The work builds on earlier results by Stacey Jeffery and Freek Witteveen and extends Scott Aaronson’s 2008 oracle separation by making it quantitative.
A key step in the proof came from GPT-5, marking one of the first instances of an AI model contributing to research in quantum complexity theory.

OpenAI’s GPT-5 may have just helped prove a theorem in quantum computing theory — potentially marking the first time artificial intelligence has contributed a key step in quantum complexity research.

In a new paper posted to the preprint server arXiv, researchers show that methods for reducing error in Quantum Merlin-Arthur (QMA), a quantum version of NP (the class of problems whose solutions can be checked quickly), hit a hard ceiling. In a twist, the scientists report that a key step in the proof came not from a human colleague but from GPT-5.

Scott Aaronson, a computer scientist at the University of Texas at Austin, and Freek Witteveen of CWI Amsterdam set out to answer a question that had hung open for years: how far can black-box techniques push the reliability of QMA proof systems?

Early attempts bogged down in dense analysis. Aaronson decided to ask GPT-5. The model’s first attempts were wrong. He pushed back, and it tried again. After a few rounds of this back-and-forth — much like advising a graduate student — GPT-5 suggested reframing the problem using a simple mathematical expression that captured how a verifier’s chance of accepting a proof could inch close to certainty.

That shift turned out to be exactly the handle the researchers needed. By analyzing the expression with tools from approximation theory, they could finally prove sharp limits on QMA error reduction.

“I had tried similar problems a year ago, with the then-new GPT reasoning models, but I didn’t get results that were nearly as good,” writes Aaronson on his Shtetl-Optimized blog. “Now, in September 2025, I’m here to tell you that AI has finally come for what my experience tells me is the most quintessentially human of all human intellectual activities: namely, proving oracle separations between quantum complexity classes.”

What Is QMA?

QMA is considered the quantum cousin of NP. A “prover” Merlin sends a quantum witness — a special quantum state — to the “verifier” Arthur, who runs a quantum algorithm to check if the answer to a problem is “yes.” If the proof is valid, Arthur should accept it with high probability. If it’s false, he should reject it.

Two numbers define these systems: completeness, the chance Arthur accepts a valid proof, and soundness, the chance he mistakenly accepts a false one. The conventional thresholds are 2/3 and 1/3. But these numbers can be improved by amplification, where the verifier repeats the test and combines the results, pushing completeness toward one and soundness toward zero.
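The effect of repetition can be sketched with a simple classical calculation. The example below assumes independent runs combined by majority vote, and the function name is hypothetical. It ignores quantum subtleties, such as how Merlin's witness is reused across runs, so it illustrates the idea rather than an actual QMA amplification procedure.

```python
from math import comb


def majority_accept_probability(p, k):
    """Probability that more than half of k independent runs accept,
    when each run accepts with probability p."""
    if k % 2 == 0:
        raise ValueError("use an odd number of repetitions to avoid ties")
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j)
        for j in range(k // 2 + 1, k + 1)
    )


# Start from the conventional thresholds: 2/3 completeness, 1/3 soundness.
for k in (1, 11, 51, 101):
    completeness = majority_accept_probability(2 / 3, k)
    soundness = majority_accept_probability(1 / 3, k)
    print(f"k={k:3d}  completeness={completeness:.6f}  soundness={soundness:.2e}")
```

In this toy model, each extra batch of repetitions moves completeness closer to one and soundness closer to zero, with the error shrinking exponentially in the number of runs.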

Amplification can drive both error probabilities toward zero, making protocols nearly perfect. But the real question is: Can completeness reach exactly one? That is the open question behind the variant known as QMA1, which requires perfect completeness.

Earlier Attempts to Close the Gap

In 2008, Aaronson showed that black-box methods alone couldn’t answer whether QMA equals QMA1. He constructed a quantum oracle — a hypothetical black-box device — that separated the two classes. That meant any proof of equality would need to go beyond black-box techniques.

Then in 2025, Stacey Jeffery and Witteveen produced a significant advance: they showed that completeness can be amplified to be doubly exponentially close to one. The result was stronger than anything before and raised a tantalizing question. Could amplification go even further — triply exponential, or beyond?

Aaronson posed that question to Witteveen after a seminar talk. Within a week, the two had a proof that Jeffery and Witteveen’s bound was already optimal.

Findings of the New Paper

Their new paper, “Limits to black-box amplification in QMA,” proves two things, according to the researchers:

Completeness ceiling: Amplification can push the chance of accepting a valid proof extremely close to one, but no closer than doubly exponentially close.
Soundness floor: The chance of wrongly accepting a false proof can be suppressed to exponentially small, but no smaller.
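
In symbols, the two limits can be sketched roughly as follows. This is an informal restatement, not the paper's exact theorem: n is the input size, c and s are the completeness and soundness of a black-box verifier using polynomial resources, and the polynomials p and q are placeholder names assumed here.

```latex
% Achievable by amplification (Jeffery and Witteveen, 2025), for a polynomial p:
\[
  c \;\ge\; 1 - 2^{-2^{p(n)}}
\]
% Limits relative to the oracle (the new paper), for some polynomial q:
\[
  1 - c \;\ge\; 2^{-2^{q(n)}}
  \qquad\text{and}\qquad
  s \;\ge\; 2^{-q(n)}
\]
```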

These results may finally pin down the exact limits of black-box amplification in QMA. The asymmetry reflects the fact that completeness requires only one good witness to succeed, while soundness must hold against all possible witnesses.

The results have two layers of significance. For complexity theorists, they close the book on how far black-box error reduction can go in QMA. Any resolution of QMA versus QMA1 will have to use “nonrelativizing” methods, which are strategies that exploit the internal structure of circuits rather than treating them as opaque boxes.

For a broader audience, the bigger news may be that GPT-5 supplied the missing spark. Complexity theory is often considered a last redoubt of purely human thought: abstract, symbolic, and resistant to machine assistance. That an AI could make a concrete, useful suggestion in this domain may signal a pivot in the role of AI in science.

How the Proof Worked

The technical heart of the proof was to show that a verifier’s acceptance probability, when parameterized by a hidden oracle value, could not hover “too close” to certainty for long.

GPT-5 proposed focusing on a single mathematical function that neatly encoded how close the acceptance probability came to one. With that function in hand, Aaronson and Witteveen could use established theorems from approximation theory to bound its growth. Through that analysis, the researchers proved that doubly exponential completeness is the ceiling and exponential soundness is the floor.
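
The article does not name the function. As a purely illustrative assumption, the sketch below uses the potential f = sum of 1/(1 - lambda_i), which is the trace of (I - E)^(-1) for a hypothetical acceptance operator E whose eigenvalues lambda_i are all below one, with the largest eigenvalue standing in for the best acceptance probability. The toy 2x2 operator and its parameter theta are invented for this example and are not taken from the paper.

```python
import math


def eigenvalues_2x2(a, b, d):
    """Eigenvalues (smaller, larger) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean - radius, mean + radius)


def closeness_potential(eigs):
    """Sum of 1/(1 - lambda): large exactly when some eigenvalue is near 1."""
    if any(lam >= 1 for lam in eigs):
        raise ValueError("every eigenvalue must be strictly below 1")
    return sum(1 / (1 - lam) for lam in eigs)


def toy_operator(theta):
    """Hypothetical operator whose top eigenvalue approaches 1 as theta -> 0."""
    return (1 - theta**2, 0.1 * theta, 0.5)


for theta in (0.5, 0.1, 0.01):
    eigs = eigenvalues_2x2(*toy_operator(theta))
    gap = 1 - max(eigs)  # distance of the best acceptance probability from 1
    f = closeness_potential(eigs)
    print(f"theta={theta}: gap={gap:.3e}  f={f:.3e}  1/f={1 / f:.3e}")
```

For a d-dimensional operator of this kind, 1/f is at most the gap 1 - lambda_max, which is in turn at most d/f. Any bound on how fast such a function can grow therefore translates into a bound on how close the acceptance probability can get to one.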

The rest of the argument built on Aaronson’s 2008 oracle construction, but now made quantitative. The outcome is a precise set of limits that match Jeffery and Witteveen’s earlier amplification result and confirm it as the best possible.

Questions And Future Research Directions

According to the study, the proof applies only to black-box techniques, so it does not rule out other methods that exploit specific circuit structures. It also depends on oracle models, which are mathematical stand-ins rather than real devices.

Another caveat is dimensionality. The proof holds for finite witness registers. In an infinite-dimensional setting, Jeffery and Witteveen showed earlier that perfect completeness can be achieved with the right gate set, though this doesn’t expand the underlying computational power of QMA.

In the future, researchers will likely investigate one of the biggest unresolved problems: whether QMA equals QMA1. This study shows that black-box methods can’t answer it, but it leaves open the possibility of new, more sophisticated techniques.

Other research directions include refining the oracle construction, exploring fixed-parameter cases, and extending approximation-theory tools to other complexity classes.

On the AI side, the open question is how far models like GPT-5 can scale their contributions. Today, the model can suggest useful constructions, but it can’t yet produce entire rigorous proofs. For now, the sweet spot is collaboration: humans spot errors, validate ideas and provide structure, while AI supplies unconventional proposals that might otherwise be overlooked.

Aaronson has noted that earlier reasoning models failed to offer anything comparable. GPT-5 was different: it could sustain a technical dialogue, adapt when corrected, and eventually produce a clever idea. If a graduate student had proposed the same step, he said, it would have earned praise for originality.

That may mark the start of a new phase in AI’s relationship with science. Not a wholesale replacement of human theorists, but a partnership where AI models help push research forward — even in fields as abstract as quantum complexity.

That raises one more open question: will AI completely replace human scientists, and if so, when?

Aaronson suggests human scientists — even untenured ones — are safe for now.

“Right now, it almost certainly can’t write the whole research paper (at least if you want it to be correct and good), but it can help you get unstuck if you otherwise know what you’re doing, which you might call a sweet spot,” writes Aaronson. “Who knows how long this state of affairs will last? I guess I should be grateful that I have tenure.”

About The Quantum Insider

The Quantum Insider is recognized as the world’s leading source for timely quantum computing news, industry insights, and market intelligence. Our editorial team delivers trusted analysis to researchers, investors, and industry leaders.