Scaling Quantum Materials Research:
TL;DR: I study quantum materials and build autonomous experiment systems and agent evaluation pipelines.
Quantum Materials Research
My Publications
Experimental Systems
I also built and maintained heavy lab systems during my PhD:
Procedure redesign, operator reliability
Cryogenic Cooling System Optimization (He-3 Cryostat)
Redesigned the cryostat operating sequence to shorten cooldown from 7 hours to under 3 hours.
Lab infrastructure restoration
Liquid Helium Recovery System Restoration
Restored a dormant liquid-helium recovery system through hardware repair, calibration, and operating-path recovery.
Autonomous Experimental Control
Scientific Cognition Layer
The planning layer for long-running experiments, carrying intent, retrieval, memory, and iteration without absorbing raw instrument state into the same context.
Agent runtime, scientific orchestration
QDevBot
A high-level scientific agent runtime for long-horizon experimental work. It carries planning, retrieval, memory, and iteration across runs without absorbing raw device uncertainty into the same context, and coordinates lower layers to decide what to try next.
Execution Layer
The bounded control layer for live instrument work, where actions are verified against readback state so nondeterministic execution remains inspectable and recoverable.
Service architecture, operator controls
Scientific Instrument Agent Service
A bounded execution service for live instrument work under uncertain conditions, where device state is only partially observable and execution can drift. It enforces an action-observation-update loop with state verification and exposes that control layer over HTTP to higher-level agents.
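As an illustration only, and not the service's actual code, the action-observation-update loop with readback verification can be sketched in Python as follows. The names `VerifiedSetter`, `StepRecord`, and `DriftyDevice`, along with the tolerance and retry count, are hypothetical, and the HTTP layer is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepRecord:
    """One action-observation pair, kept so the run stays inspectable."""
    target: float
    readback: float
    verified: bool


@dataclass
class VerifiedSetter:
    """Hypothetical wrapper: write a setpoint, read it back, retry or stop."""
    write: Callable[[float], None]  # sends the setpoint to the device
    read: Callable[[], float]       # returns the value the device reports
    tolerance: float = 1e-3         # assumed acceptance window
    max_attempts: int = 3           # assumed retry budget
    log: List[StepRecord] = field(default_factory=list)

    def set_verified(self, target: float) -> bool:
        for _ in range(self.max_attempts):
            self.write(target)                                  # action
            readback = self.read()                              # observation
            ok = abs(readback - target) <= self.tolerance
            self.log.append(StepRecord(target, readback, ok))   # update
            if ok:
                return True
        return False  # the caller decides how to recover


class DriftyDevice:
    """Simulated device whose first write lands off target."""

    def __init__(self) -> None:
        self.value = 0.0
        self.writes = 0

    def write(self, target: float) -> None:
        self.writes += 1
        # The first write undershoots by 0.5; later writes land exactly.
        self.value = target - 0.5 if self.writes == 1 else target

    def read(self) -> float:
        return self.value
```

With the simulated device, `VerifiedSetter(write=dev.write, read=dev.read).set_verified(2.0)` succeeds on the second attempt, and the log keeps the failed first attempt so the drift remains visible after the fact.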
Interface Layer
The contract layer that standardizes heterogeneous instruments into a shared control surface, keeping cross-device orchestration consistent as new hardware is added.
Instrument interface kit, device onboarding
Scientific Instrument Driver Kit
A driver framework that turns heterogeneous instrument APIs into a consistent control surface. It provides machine-readable capabilities and reusable control contracts so new hardware can be added without bespoke orchestration logic for each device.
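A minimal sketch of what a shared control contract with machine-readable capabilities could look like, assuming a Python interface. `Capability`, `InstrumentDriver`, `FakeBiasSource`, and `describe` are hypothetical names for illustration and are not the kit's real API.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Capability:
    """Machine-readable description of one controllable quantity."""
    name: str
    unit: str
    minimum: float
    maximum: float


class InstrumentDriver(Protocol):
    """Hypothetical contract that every driver would satisfy."""

    def capabilities(self) -> List[Capability]: ...
    def set_value(self, name: str, value: float) -> None: ...
    def get_value(self, name: str) -> float: ...


class FakeBiasSource:
    """Stand-in device used only to exercise the contract."""

    def __init__(self) -> None:
        self._values: Dict[str, float] = {"bias": 0.0}

    def capabilities(self) -> List[Capability]:
        return [Capability("bias", "V", -10.0, 10.0)]

    def set_value(self, name: str, value: float) -> None:
        cap = {c.name: c for c in self.capabilities()}.get(name)
        if cap is None:
            raise KeyError(f"unknown capability: {name}")
        if not cap.minimum <= value <= cap.maximum:
            raise ValueError(f"{name}={value} is outside the declared range")
        self._values[name] = value

    def get_value(self, name: str) -> float:
        return self._values[name]


def describe(driver: InstrumentDriver) -> List[dict]:
    """Export capabilities as plain dicts an orchestrator could inspect."""
    return [asdict(c) for c in driver.capabilities()]
```

Because the orchestrator only sees `capabilities`, `set_value`, and `get_value`, a new device in this sketch needs a new driver class but no device-specific orchestration code.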
Perception Layer
The interpretation layer that compresses scan images and metadata into compact quality signals, so higher-level planning does not depend on raw outputs from each run.
Dataset curation, quality modeling
STM Scan Quality Assessment Service
A perception service that evaluates STM scans from images and scan metadata. It turns raw scan output into compact quality judgments that higher layers can use without directly consuming measurement-level detail.
Agentic Scientific Evaluation
Task Distillation
Research artifacts distilled into benchmark-ready tasks, task profiles, and provenance-tracked scoring inputs.
Task distillation, provenance exports
Scientific Task Distillation Pipeline
A task-construction pipeline that turns quantum-physics papers and datasets into benchmark-ready tasks, task profiles, scoring inputs, and provenance-tracked exports. It supplies the evaluation stack with structured scientific work rather than isolated question-answer pairs.
Behavioral Benchmarking
Execution, tracing, and grading harnesses for evaluating agent behavior on realistic multi-step scientific tasks.
Benchmark harness, grading loops
QDevBench
A benchmark harness for evaluating multi-step agent behavior in quantum research environments. It traces tool use, intermediate decisions, grading signals, and task trajectories rather than relying only on final answers.
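To make the idea of grading trajectories rather than only final answers concrete, here is a small Python sketch. `TraceEvent`, `Trajectory`, `grade`, and the three grading signals are assumptions chosen for illustration and do not describe QDevBench's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TraceEvent:
    """One tool call made by the agent during a task."""
    step: int
    tool: str
    args: Dict[str, Any]
    result: Any


@dataclass
class Trajectory:
    """Everything recorded for one task attempt."""
    task_id: str
    events: List[TraceEvent] = field(default_factory=list)
    final_answer: Any = None

    def record(self, tool: str, args: Dict[str, Any], result: Any) -> None:
        self.events.append(TraceEvent(len(self.events), tool, args, result))


def grade(traj: Trajectory, expected_answer: Any,
          required_tools: List[str]) -> Dict[str, Any]:
    """Score the path as well as the outcome (hypothetical signals)."""
    used = [event.tool for event in traj.events]
    return {
        "final_correct": traj.final_answer == expected_answer,
        "required_tools_used": all(tool in used for tool in required_tools),
        "num_steps": len(used),
    }
```

In this sketch an agent that reaches the right answer without calling a required tool still has that recorded in `required_tools_used`, a signal that grading on the final answer alone would miss.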
Iteration Loop
A feedback path from benchmark findings back to runtime and control-system revision.
Benchmark Feedback Loop
Benchmark traces, operator feedback, and runtime observations feed back into the next revision of task design, instrumentation logic, control policies, and agent behavior.
Explorations
Workflow orchestration prototype
LLM Workflow Harness
Built an early LLM workflow orchestration harness that translated natural-language intent into JSON-defined task graphs and schema-constrained tool execution for MATLAB and STM workflows.
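A rough sketch of the pattern, assuming a JSON layout with `id`, `tool`, `args`, and `after` fields and two placeholder tools. All of these names are hypothetical, and no MATLAB or STM calls are modeled. Arguments are checked against a per-tool schema before each tool runs, and tasks execute in dependency order.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Dict, List

# Hypothetical tool registry: an argument schema plus a placeholder function.
TOOLS: Dict[str, Dict[str, Any]] = {
    "load": {
        "schema": {"path": str},
        "fn": lambda args, inputs: f"data:{args['path']}",
    },
    "smooth": {
        "schema": {"window": int},
        "fn": lambda args, inputs: f"smooth({inputs[0]},{args['window']})",
    },
}


def validate(args: Dict[str, Any], schema: Dict[str, type]) -> None:
    """Reject missing, extra, or wrongly typed arguments before execution."""
    if set(args) != set(schema):
        raise ValueError(f"expected keys {sorted(schema)}, got {sorted(args)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")


def run_graph(graph_json: str) -> Dict[str, Any]:
    """Run every task after the tasks listed in its 'after' field."""
    tasks = {t["id"]: t for t in json.loads(graph_json)["tasks"]}
    deps = {tid: t.get("after", []) for tid, t in tasks.items()}
    results: Dict[str, Any] = {}
    for tid in TopologicalSorter(deps).static_order():
        task = tasks[tid]
        tool = TOOLS[task["tool"]]
        validate(task["args"], tool["schema"])
        inputs: List[Any] = [results[d] for d in task.get("after", [])]
        results[tid] = tool["fn"](task["args"], inputs)
    return results
```

Keeping the graph as data means a language model only has to emit JSON that passes validation, and the harness, not the model, decides execution order.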
Persistent world-state orchestration
LLM-Driven Open-World Simulation Engine
Built a state-management harness that lets a stateless LLM run a consistent evolving open-world game across long sessions.
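One way to picture this, as a sketch under assumptions rather than the engine's real design: the world state lives outside the model, is serialized into every prompt, and is updated from a structured reply. `play_turn`, the `state_updates` and `narration` fields, and `fake_model` (a deterministic stand-in for an LLM call) are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]


def play_turn(state: State, player_input: str,
              model: Callable[[str], str]) -> Tuple[State, str]:
    """Send the full world state each turn; merge the model's updates back."""
    prompt = json.dumps({"world": state, "input": player_input}, sort_keys=True)
    reply = json.loads(model(prompt))
    new_state = {**state, **reply.get("state_updates", {})}
    new_state["turn"] = state.get("turn", 0) + 1
    return new_state, reply["narration"]


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for a stateless LLM call."""
    world = json.loads(prompt)["world"]
    visited = world.get("visited", []) + [world["location"]]
    return json.dumps({
        "narration": f"You are in the {world['location']}.",
        "state_updates": {"visited": visited},
    })
```

Because `play_turn` returns a new state instead of mutating the old one, each turn's input and output can be stored and replayed, which is one plausible way to keep a long session consistent.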