Quantum Computers Aren't Quite Ready For LLM Vibe Coders, Study Finds

Insider Brief

A new study finds that while large language models like ChatGPT can reliably generate simple quantum code for IBM simulators, they falter on real hardware and struggle with photonic systems like Xanadu’s.
ChatGPT performed well using Qiskit to write circuits for entangled states and teleportation on IBM’s simulator, but consistently failed on IBM hardware due to outdated syntax and lack of debugging capability.
Results were significantly worse with Strawberry Fields, where ChatGPT repeatedly misapplied quantum gates, misunderstood hardware constraints, and produced code that was often invalid or unusable.

The era of quantum computing vibe coding may be nearing, but it still has a ways to go before coders can use LLMs to help program actual quantum computers, according to a recent study.

Human programmers use large language models (LLMs), like ChatGPT, to help write code. However, programming a quantum computer is notoriously difficult, often requiring specialized knowledge of quantum physics and hardware-specific syntax. Now, scientists from Southern Methodist University suggest LLMs could ease the burden, at least in some cases, by generating valid quantum code for users with little or no prior experience.

The findings, posted to the preprint server arXiv in June, reveal a mixed picture: while ChatGPT performs well when asked to write simple quantum circuits for IBM’s superconducting machines using Qiskit, it frequently stumbles when programming for Xanadu’s photonic quantum hardware using Strawberry Fields. The researchers conclude that although large language models can serve as useful entry-level programming aids for certain quantum platforms, their capabilities are far from universal.

The research team tested GPT-4, the latest version of OpenAI’s ChatGPT, on its ability to write programs for two common quantum computing tasks: creating a maximally entangled state and performing quantum teleportation. These tasks were chosen because they are fundamental to quantum information processing and well-documented in both the academic literature and cloud-based tutorials.

To test across hardware types, the study selected IBM’s gate-based quantum computers, which use discrete qubits, and Xanadu’s photonic systems, which rely on continuous-variable quantum computing. Each platform has its own software development kit (Qiskit for IBM and Strawberry Fields for Xanadu), which must be used to write and execute programs on its machines or simulators.

Strong Performance With IBM Simulators, Not So Much With Real Hardware

When ChatGPT was asked to write Qiskit programs for IBM’s simulator backend, it generated correct code for all five test cases for the entangled state task, with no errors. In each instance, the model implemented the correct circuit logic — a Hadamard gate followed by a Controlled-NOT gate — and provided useful comments and explanations. The teleportation programs for Qiskit simulators were also largely correct, with occasional logic errors that the model often fixed after being prompted.
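
To make the circuit logic concrete, here is a minimal sketch of what a Hadamard gate followed by a Controlled-NOT gate does to two qubits. It is plain Python rather than Qiskit, it is not code from the study, and the helper names (hadamard, cnot) are hypothetical, chosen only for illustration.

```python
import math

# Two-qubit state as 4 amplitudes; index bit k holds the value of qubit k
# (qubit 0 is the least significant bit). Helper names are illustrative only.

def hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` of a two-qubit state."""
    s = 1 / math.sqrt(2)
    new = [0j] * 4
    for i, amp in enumerate(state):
        bit = (i >> target) & 1
        i0 = i & ~(1 << target)  # same index with the target bit cleared
        i1 = i | (1 << target)   # same index with the target bit set
        # |0> -> (|0> + |1>)/sqrt(2), |1> -> (|0> - |1>)/sqrt(2)
        new[i0] += s * amp
        new[i1] += s * amp * (-1 if bit else 1)
    return new

def cnot(state, control, target):
    """Flip qubit `target` wherever qubit `control` is 1."""
    new = [0j] * 4
    for i, amp in enumerate(state):
        j = i ^ (1 << target) if (i >> control) & 1 else i
        new[j] += amp
    return new

state = [1 + 0j, 0j, 0j, 0j]   # start in |00>
state = hadamard(state, 0)     # put qubit 0 into superposition
state = cnot(state, 0, 1)      # entangle qubit 1 with qubit 0
probabilities = [abs(a) ** 2 for a in state]
```

The resulting state has equal weight on |00> and |11> and none on |01> or |10>, which is the signature of a maximally entangled two-qubit state.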

However, when the same tasks were requested for IBM’s actual quantum hardware, performance slipped. Here, every program ChatGPT generated failed to execute because it relied on outdated methods for accessing hardware backends. The model could not diagnose or resolve the issue, and while it sometimes provided generic debugging suggestions, it was unable to adjust the code to match the most recent syntax updates in Qiskit.

Weaker Results on Xanadu’s Systems

Performance declined sharply when ChatGPT was asked to write programs using Strawberry Fields. Even on the simulator backend, most programs suffered from both syntax and logic errors. In many cases, the model incorrectly used discrete gates — appropriate for IBM but not valid in Xanadu’s continuous-variable framework — and failed to properly configure quantum operations such as squeezer gates or beam splitters. Only one program across all sessions ran successfully without error.

The errors multiplied when ChatGPT was tasked with targeting Xanadu’s X8 hardware. The hardware imposes strict constraints on allowable gate types and circuit configurations, many of which are not enforced or auto-corrected by Strawberry Fields. ChatGPT ignored these constraints in nearly all cases, leading to invalid programs that could not be run. Moreover, the model repeatedly misidentified the kind of quantum states it was generating, confusing discrete-variable Bell states with continuous-variable entangled states, and often failed to explain or even mention the limitations of the hardware platform.
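
For contrast with a two-qubit Bell state, the sketch below computes photon-number probabilities for a two-mode squeezed vacuum, a textbook continuous-variable entangled state that can be produced with squeezing and a beam splitter. It is plain Python, not Strawberry Fields code, and it is an assumption that this state resembles what the study targeted; the function name and the values of r and cutoff are hypothetical.

```python
import math

def tmsv_photon_probabilities(r, cutoff):
    """Probability of finding n photons in BOTH modes, for n < cutoff.

    For squeezing parameter r, the two-mode squeezed vacuum is a
    superposition of |n, n> terms with probability
    (1 - tanh(r)^2) * tanh(r)^(2n).
    """
    lam = math.tanh(r)
    return [(1 - lam ** 2) * lam ** (2 * n) for n in range(cutoff)]

# Illustrative values only: r = 1.0, truncated at 50 photons per mode.
probs = tmsv_photon_probabilities(r=1.0, cutoff=50)
total = sum(probs)
mean_photons = sum(n * p for n, p in enumerate(probs))
```

Unlike a Bell state, which has exactly two terms, this state spreads over an unbounded ladder of photon numbers, and its degree of entanglement depends on the squeezing strength. Code written with qubit gates in mind has no direct counterpart for either feature.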

The researchers emphasize that the disparity likely stems from differences in available training data. IBM’s Qiskit platform has been widely used and well-documented online for years, creating a rich source of examples for ChatGPT to learn from. Xanadu’s tools and documentation are more recent and less prevalent on public forums, limiting the model’s exposure to high-quality reference material.

This conclusion is supported by the types of mistakes ChatGPT made: it was able to replicate canonical Qiskit circuits frequently found on sites like Stack Overflow and Medium, but it hallucinated syntax, operations, and device capabilities when programming Strawberry Fields circuits from scratch.

Limitations and Future Work

The study focused exclusively on unmodified use of ChatGPT with no prompt engineering or custom training. The prompts were designed to simulate how a novice user might ask for help, such as “Write a Qiskit program that produces a maximally entangled state using the AER simulator.” The researchers did not test more advanced interactions or multi-turn refinements, which might improve results with more sophisticated prompting.

Another limitation was the exclusive use of GPT-4. Other large language models from Anthropic, Google, Meta, and Microsoft may perform differently, especially if fine-tuned for quantum programming. Future work could explore model comparisons, prompt engineering techniques, and performance on more complex algorithms or hardware from additional vendors.

LLMs as Helpers, Not Substitutes for Quantum Expertise

The study suggests that while large language models can help bridge the gap for beginners in quantum programming, particularly on well-documented platforms like IBM’s Qiskit, they are not yet reliable stand-ins for experienced developers, especially when hardware-specific constraints come into play.

The researchers also said that future advances in both quantum computing and LLMs could change that.

They write: “Of course this work is just the beginning of an investigation of LLMs and quantum programming. Because Strawberry Fields currently seems the more challenging target, the next step for assessing ChatGPT’s abilities would be to experiment with Qiskit circuits for additional algorithms, including those not as readily available online. Additional future work includes testing with LLMs mentioned (in Sec. 1), considering other quantum computing hardware and software providers and better tailoring queries through prompt engineering.”

For a deeper, more technically precise examination of the work, which this summary story can’t provide, please review the paper on the preprint server arXiv. Preprint servers help researchers quickly distribute study results, especially in fast-moving fields such as quantum computing; however, preprints are not officially peer-reviewed, and peer review is a necessary step in the scientific method.

The research team included Elena R. Henderson, Jessie M. Henderson, Joshua Ange and Mitchell A. Thornton.