Insider Brief
- Researchers at Princeton University developed Qumus, an AI-powered robotic laboratory system that autonomously creates graphene and fabricates atomically thin quantum material devices including graphene transistors.
- The embodied AI system combines large language models, robotics, computer vision and automated laboratory equipment to plan, execute, analyze and revise quantum materials experiments with little or no human intervention.
- The study suggests autonomous AI experimentalists could accelerate discovery and fabrication workflows in quantum materials, van der Waals structures and future quantum electronics.
An AI-driven quantum materials research system has taken a step toward turning AI from a digital assistant into a physical laboratory scientist by creating graphene and fabricating atomically thin transistors on its own inside a robotic mini-lab, according to a new study from researchers at Princeton University and collaborators.
The system, described in a paper posted to the preprint server arXiv, combines large language models, computer vision, robotics and automated laboratory equipment into what the researchers call the first “AI quantum materials experimentalist.” The platform, named Qumus, can receive natural-language requests, design experimental workflows, operate lab hardware, analyze results, correct mistakes and generate reports with little or no human intervention.
The work represents a clear demonstration of “embodied AI” in scientific research, meaning systems that not only reason digitally but also physically manipulate instruments and materials in the real world. The researchers report the approach could accelerate discovery in quantum materials, semiconductor devices and nanotechnology, fields where experiments often remain slow, manual and dependent on highly trained specialists.

The study focused on two-dimensional quantum materials, ultrathin crystals only atoms thick that can exhibit unusual electrical and quantum behaviors. Since the discovery of graphene in 2004, scientists have identified thousands of layered materials that could potentially be peeled down into atomically thin sheets and stacked into engineered structures known as van der Waals heterostructures. Those materials are considered promising for next-generation electronics, sensing systems and quantum devices.
Yet progress has been constrained by labor-intensive workflows. Producing usable flakes of materials such as graphene often involves repeated cycles of mechanical exfoliation, microscope inspection, alignment and transfer. Researchers said the process remains difficult to scale and highly dependent on expert judgment. The Qumus platform attempts to automate that entire chain.
AI-Run Research Group
According to the study, the system operates like a small AI-run research group. A lead AI agent acts as an orchestrator, while specialized sub-agents handle tasks such as project planning, laboratory monitoring, device design and physical processing. The system can consult prior experimental history, evaluate available materials and instruments, design fabrication plans and execute them through robotic hardware.
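To make the orchestrator pattern concrete, here is a minimal Python sketch of a lead agent routing tasks to specialized sub-agents and recording what happened. It is an illustration only, not the Qumus implementation: the class name, role names and stand-in functions are all hypothetical, and in the real system each role would be backed by a language model and laboratory hardware rather than a string-returning function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Orchestrator:
    """Hypothetical lead agent: routes each (role, task) step to a sub-agent."""
    sub_agents: Dict[str, Callable[[str], str]]
    history: List[str] = field(default_factory=list)

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        results = []
        for role, task in plan:
            outcome = self.sub_agents[role](task)      # delegate to the specialist
            self.history.append(f"{role}: {outcome}")  # keep an experiment log
            results.append(outcome)
        return results


# Stand-in sub-agents (assumptions for illustration only).
def planner(task: str) -> str:
    return f"plan drafted for {task}"

def lab_monitor(task: str) -> str:
    return f"lab state checked for {task}"

def device_designer(task: str) -> str:
    return f"layout designed for {task}"

def processor(task: str) -> str:
    return f"hardware executed {task}"


orchestrator = Orchestrator({
    "planner": planner,
    "lab_monitor": lab_monitor,
    "device_designer": device_designer,
    "processor": processor,
})
plan = [
    ("planner", "graphene flake"),
    ("lab_monitor", "graphene flake"),
    ("device_designer", "graphene flake"),
    ("processor", "graphene flake"),
]
results = orchestrator.run(plan)
```

Keeping the history on the orchestrator mirrors the article's description of a system that can consult prior experimental records before planning the next step.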
The physical setup includes robotic arms, microscope systems, temperature-controlled stages, automated Scotch-tape exfoliation equipment and machine-vision systems capable of identifying microscopic material flakes. The platform also uses computer-vision models based on YOLO — short for ‘You Only Look Once,’ a widely used AI image-recognition system — to monitor chips, tools and materials across the laboratory workspace.
In one demonstration, a human user simply asked the system: “Can you give me a graphene flake?” Qumus interpreted the request, checked whether graphene samples already existed in its database and, when none were found, autonomously carried out exfoliation and flake-search procedures until it produced a graphene sample. The researchers said the only human involvement required was supplying raw materials and electricity.
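The request flow described here (check the database first, fabricate only if nothing is found, repeat until a flake turns up) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the inventory format, the function names and the stub exfoliator that succeeds on a chosen attempt are all hypothetical and stand in for the real database and robotic hardware.

```python
from typing import Callable, Dict, List, Optional, Tuple

Flake = Dict[str, str]


def find_flake(inventory: List[Flake], material: str) -> Optional[Flake]:
    """Return the first stored flake of the requested material, if any."""
    for flake in inventory:
        if flake["material"] == material:
            return flake
    return None


def handle_request(
    inventory: List[Flake],
    material: str,
    exfoliate: Callable[[str], Optional[Flake]],
    max_attempts: int = 20,
) -> Tuple[Flake, int]:
    """Serve from inventory if possible; otherwise exfoliate until a flake is found.

    Returns the flake and the number of exfoliation attempts that were needed.
    """
    existing = find_flake(inventory, material)
    if existing is not None:
        return existing, 0
    for attempt in range(1, max_attempts + 1):
        flake = exfoliate(material)   # one exfoliation + flake-search cycle
        if flake is not None:
            inventory.append(flake)   # remember it for later requests
            return flake, attempt
    raise RuntimeError(f"no {material} flake after {max_attempts} attempts")


def make_stub_exfoliator(succeed_on: int) -> Callable[[str], Optional[Flake]]:
    """Hypothetical stand-in for the hardware: succeeds on the n-th call."""
    calls = {"n": 0}

    def exfoliate(material: str) -> Optional[Flake]:
        calls["n"] += 1
        if calls["n"] >= succeed_on:
            return {"material": material, "source": "exfoliated"}
        return None

    return exfoliate


inventory = [{"material": "hBN", "source": "stock"}]
flake, attempts = handle_request(inventory, "graphene", make_stub_exfoliator(3))
# A second request is now served straight from the inventory.
flake_again, attempts_again = handle_request(inventory, "graphene", make_stub_exfoliator(3))
```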
The study also explored how different large language models altered Qumus’ behavior. Researchers tested versions powered by models from OpenAI, Google, Anthropic, xAI, Alibaba and DeepSeek. While all successfully completed experiments, the researchers found they behaved differently in terms of caution, efficiency, consistency and willingness to act quickly.
The team described these behavioral differences as resembling the personalities of human researchers. Some models spent more time reasoning and checking conditions before acting, while others moved more aggressively into execution. Researchers quantified these tendencies using metrics such as “bias for action,” “caution” and “token efficiency.”

Open-Ended Optimization
One of the most significant experiments involved open-ended optimization rather than a fixed instruction. Researchers asked Qumus to create a graphene flake larger than 200 square micrometers and erased its previous experimental history, forcing the system to start from scratch.
The AI then independently explored a set of fabrication parameters, including substrate temperature, heating time, massage cycles and tape peel-off speed. After several iterative runs spanning more than four hours, the system eventually succeeded in creating a sufficiently large graphene flake. The researchers said the system behaved similarly to an experienced human experimentalist by generating hypotheses, evaluating failures and refining parameters based on observations from prior runs.
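The iterative loop described above (try a setting, measure the flake, keep what helped, stop once the target is met) can be illustrated with a toy Python example. Everything numeric here is invented for illustration: the response function, the starting values, the step sizes and the unitless parameter scales are assumptions, not data from the study. The search rule is a plain coordinate search swapped in for demonstration; Qumus itself relies on language-model reasoning over prior runs rather than this procedure.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def toy_flake_area(p: Params) -> float:
    """Invented response surface (illustrative only): largest flake area per run."""
    area = (
        300.0
        - 0.05 * (p["substrate_temp"] - 100) ** 2
        - 0.01 * (p["heating_time"] - 120) ** 2
        - 10.0 * (p["massage_cycles"] - 3) ** 2
        - 8.0 * (p["peel_speed"] - 2.0) ** 2
    )
    return max(area, 0.0)


def optimize(
    run: Callable[[Params], float],
    start: Params,
    steps: Params,
    target: float,
    max_runs: int = 50,
) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Coordinate search: nudge one parameter at a time, keep changes that help."""
    best = dict(start)
    best_area = run(best)
    history = [(dict(best), best_area)]  # every run is logged, like a lab notebook
    improved = True
    while best_area <= target and improved and len(history) < max_runs:
        improved = False
        for name, step in steps.items():
            for delta in (step, -step):
                trial = dict(best)
                trial[name] += delta
                area = run(trial)
                history.append((trial, area))
                if area > best_area:
                    best, best_area, improved = trial, area, True
                    break  # keep the direction that helped
            if best_area > target:
                break
    return best, best_area, history


start = {"substrate_temp": 60, "heating_time": 60, "massage_cycles": 1, "peel_speed": 5.0}
steps = {"substrate_temp": 10, "heating_time": 30, "massage_cycles": 1, "peel_speed": 1.0}
best, best_area, history = optimize(toy_flake_area, start, steps, target=200.0)
```

In this toy setup the loop stops as soon as one run exceeds the 200-unit target, which matches the goal-driven (rather than exhaustive) character of the experiment reported in the study.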
Another experiment highlighted the system’s ability to recover from unexpected errors.
During fabrication of hexagonal boron nitride, or hBN, a researcher intentionally removed a chip that Qumus was actively processing. The system detected the problem using computer vision, confirmed the chip was missing and generated a new plan to restart the experiment. In a second failure, one of the language models incorrectly labeled the material as graphene instead of hBN — a hallucination error common in generative AI systems. Qumus again identified the inconsistency and restarted the process until it successfully produced the requested material.
The researchers said this demonstrated true closed-loop experimentation, where the system continuously monitors outcomes and adjusts behavior without external correction.
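A closed loop of this kind boils down to checking observations after each attempt and restarting when they do not match expectations. The Python sketch below replays the two failures from the hBN demonstration using scripted observations. The observation fields, function names and restart policy are hypothetical simplifications; the real system derives these signals from computer vision and agent reasoning.

```python
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, Optional[object]]


def run_with_recovery(
    attempt_fn: Callable[[int], Observation],
    requested: str,
    max_restarts: int = 5,
) -> Tuple[Observation, List[Tuple[int, str]]]:
    """Repeat an experiment until the observations match the request."""
    log: List[Tuple[int, str]] = []
    for attempt in range(1, max_restarts + 1):
        obs = attempt_fn(attempt)
        if not obs["chip_present"]:
            # Check 1: the chip must still be on the stage.
            log.append((attempt, "chip missing, restarting"))
            continue
        if obs["label"] != requested:
            # Check 2: the reported material must match what was requested.
            log.append((attempt, f"label {obs['label']!r} != {requested!r}, restarting"))
            continue
        log.append((attempt, "success"))
        return obs, log
    raise RuntimeError(f"gave up after {max_restarts} attempts")


# Scripted observations (assumed for illustration): a removed chip,
# then a mislabeled material, then a clean run.
scripted = {
    1: {"chip_present": False, "label": None},
    2: {"chip_present": True, "label": "graphene"},
    3: {"chip_present": True, "label": "hBN"},
}
final_obs, log = run_with_recovery(lambda n: scripted[n], requested="hBN")
```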
The most complex demonstration involved fabrication of a graphene transistor.
In response to a request for a “graphene transistor,” Qumus designed a multilayer device architecture using graphene and hBN flakes placed onto prepatterned metal electrodes. The system searched its material inventory, generated a device layout, selected suitable flakes, aligned them and performed dry-transfer stacking to assemble the device. The entire process reportedly took about 90 minutes and involved roughly 30 procedural steps and 18 decision-making calls among AI agents.
The resulting structure functioned as an atomically thin field-effect transistor, one of the basic building blocks of modern electronics.
Growing Interest in AI-Run Labs
The work arrives amid growing interest in AI-driven laboratories. Over the past several years, researchers have begun combining machine learning with automated experimentation in chemistry, biology and materials science. Previous systems have included autonomous chemistry labs, robotic solar-cell optimization systems and AI-assisted gene-editing workflows.
However, many earlier platforms relied on predefined rules or narrow machine-learning models rather than flexible language-model reasoning. According to the team, their system differs because it combines planning, memory, multimodal sensing and physical execution into a unified architecture capable of handling unpredictable laboratory conditions.
Limitations and Future Work
The robot is not ready to crank out large quantities of graphene transistors just yet, according to the paper.
The researchers acknowledged that the platform remains constrained by hardware speed rather than AI reasoning. Much of the system’s total runtime was consumed by physical processes such as robotic movement, microscope focusing and thermal stabilization rather than language-model computation.
The system also operates in a highly specialized environment focused on two-dimensional materials. Extending the approach to broader scientific disciplines may require substantial customization of both robotic hardware and AI workflows.
Hallucination errors from large language models remain another challenge. Although Qumus corrected some mistakes autonomously, the study showed that AI-generated errors can still disrupt experiments and require additional validation layers.
The work also raises questions about reproducibility, reliability and laboratory safety. Scientific experiments often involve ambiguous outcomes, contamination risks and edge cases that can be difficult for autonomous systems to interpret. While Qumus operates within predefined workflow boundaries and hardware constraints, scaling such systems into larger or more dangerous laboratory environments could introduce additional risks.
Another limitation is that the current demonstrations remain relatively simple compared with the broader ambitions of autonomous scientific discovery. Producing graphene flakes and basic transistor structures is a major engineering achievement for robotics and AI integration, but it does not yet represent independent scientific insight or discovery of fundamentally new materials.
Even so, researchers report the system establishes a framework that could evolve rapidly as both AI models and robotic systems improve.
The paper suggests future versions could operate inside inert-atmosphere gloveboxes, allowing handling of air-sensitive quantum materials that degrade when exposed to oxygen or moisture. The researchers also envision networks of AI laboratories coordinating experiments across different scientific domains.
The broader implication is that, with future work, AI systems may increasingly move beyond analyzing scientific data and into physically conducting experiments themselves.
This transition could prove especially important in fields such as quantum materials research, where experimentation is often bottlenecked by scarce human expertise and labor-intensive procedures. If embodied AI systems can reliably automate those tasks, researchers may be able to explore vastly larger combinations of materials, geometries and fabrication methods than human teams alone can practically manage.
The study was led by researchers from Princeton University, including contributors from the university’s Department of Physics, Department of Electrical and Computer Engineering and Princeton AI Lab. Additional collaborators came from the University of Michigan, California State University, Northridge and Japan’s National Institute for Materials Science, including researchers affiliated with its Research Center for Electronic and Optical Materials and Research Center for Materials Nanoarchitectonics.
For a deeper, more technical dive, please review the paper on arXiv. It’s important to note that arXiv is a preprint server, which allows researchers to receive quick feedback on their work. However, neither the preprint nor this article is an official peer-reviewed publication. Peer review is an important step in the scientific process to verify results.



