Scaling Quantum Materials Research:

TL;DR: I study quantum materials and build autonomous experiment systems and agent evaluation pipelines.

Quantum Materials Research

My Publications

Experimental Systems

I also built and maintained heavy lab systems during my PhD:

Autonomous Experimental Control

Scientific Cognition Layer

The planning layer for long-running experiments, carrying intent, retrieval, memory, and iteration without absorbing raw instrument state into the same context.

2026

Agent runtime, scientific orchestration

QDevBot

A high-level scientific agent runtime for long-horizon experimental work. It carries planning, retrieval, memory, and iteration across runs without absorbing raw device uncertainty into the same context, and coordinates lower layers to decide what to try next.

Execution Layer

The bounded control layer for live instrument work, where actions are verified against readback state so nondeterministic execution remains inspectable and recoverable.

2026 · Public

Service architecture, operator controls

Scientific Instrument Agent Service

A bounded execution service for live instrument work under uncertain conditions, where device state is only partially observable and execution can drift. It enforces an action-observation-update loop with state verification and exposes that control layer over HTTP to higher-level agents.
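To make the action-observation-update loop concrete, here is a minimal sketch of a bounded, verified write. It is not the service's actual code: `BoundedExecutor`, `StepRecord`, `DriftingDevice`, the tolerance, and the retry limit are all hypothetical names and values chosen for illustration, and the HTTP layer is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepRecord:
    """One pass through the loop, kept so a run can be inspected afterwards."""
    target: float
    readback: float
    verified: bool


@dataclass
class BoundedExecutor:
    # Hypothetical callables standing in for a real instrument driver.
    write: Callable[[float], None]
    read: Callable[[], float]
    tolerance: float = 0.01   # assumed acceptance window
    max_attempts: int = 3     # assumed retry bound
    log: List[StepRecord] = field(default_factory=list)

    def set_verified(self, target: float) -> bool:
        """Act, observe the readback, update the log; retry within a fixed bound."""
        for _ in range(self.max_attempts):
            self.write(target)                                   # action
            readback = self.read()                               # observation
            ok = abs(readback - target) <= self.tolerance
            self.log.append(StepRecord(target, readback, ok))    # update
            if ok:
                return True
        return False  # the caller decides how to recover


class DriftingDevice:
    """Toy device whose first write lands off target, to mimic execution drift."""

    def __init__(self) -> None:
        self.value = 0.0
        self.writes = 0

    def write(self, target: float) -> None:
        self.writes += 1
        self.value = target + (0.05 if self.writes == 1 else 0.0)

    def read(self) -> float:
        return self.value


device = DriftingDevice()
executor = BoundedExecutor(write=device.write, read=device.read)
succeeded = executor.set_verified(1.0)
```

In this toy run the first attempt fails verification and the second passes, and both attempts remain in `executor.log`. Keeping every attempt in the log is what makes a nondeterministic step inspectable after the fact.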

Interface Layer

The contract layer that standardizes heterogeneous instruments into a shared control surface, keeping cross-device orchestration consistent as new hardware is added.

2026 · Public

Instrument interface kit, device onboarding

Scientific Instrument Driver Kit

A driver framework that turns heterogeneous instrument APIs into a consistent control surface. It provides machine-readable capabilities and reusable control contracts so new hardware can be added without bespoke orchestration logic for each device.
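One way to picture a shared control surface with machine-readable capabilities is an abstract driver contract. The sketch below is an assumption about the general shape, not the kit's real API: `InstrumentDriver`, `FakeBiasSource`, `apply_settings`, and the `bias_v` parameter are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class InstrumentDriver(ABC):
    """Hypothetical shared contract; every name here is illustrative."""

    @abstractmethod
    def capabilities(self) -> Dict[str, Dict[str, Any]]:
        """Return a machine-readable description of settable parameters."""

    @abstractmethod
    def set_parameter(self, name: str, value: float) -> None:
        ...

    @abstractmethod
    def get_parameter(self, name: str) -> float:
        ...


class FakeBiasSource(InstrumentDriver):
    """Stand-in device with one parameter, used only for the example."""

    def __init__(self) -> None:
        self._values = {"bias_v": 0.0}

    def capabilities(self) -> Dict[str, Dict[str, Any]]:
        return {"bias_v": {"unit": "V", "min": -2.0, "max": 2.0}}

    def set_parameter(self, name: str, value: float) -> None:
        spec = self.capabilities()[name]  # KeyError for unknown parameters
        if not spec["min"] <= value <= spec["max"]:
            raise ValueError(f"{name}={value} outside declared range")
        self._values[name] = value

    def get_parameter(self, name: str) -> float:
        return self._values[name]


def apply_settings(driver: InstrumentDriver, settings: Dict[str, float]) -> Dict[str, float]:
    """Generic orchestration code: works for any driver that honors the contract."""
    for name, value in settings.items():
        driver.set_parameter(name, value)
    return {name: driver.get_parameter(name) for name in settings}


result = apply_settings(FakeBiasSource(), {"bias_v": 0.5})
```

Because `apply_settings` only touches the abstract contract, adding a second device in this sketch means writing another subclass and leaving the orchestration function unchanged.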

Perception Layer

The interpretation layer that compresses scan images and metadata into compact quality signals, so higher-level planning does not depend on raw outputs from each run.

2025 · Public

Dataset curation, quality modeling

STM Scan Quality Assessment Service

A perception service that evaluates STM scans from images and scan metadata. It turns raw scan output into compact quality judgments that higher layers can use without directly consuming measurement-level detail.

Agentic Scientific Evaluation

Task Distillation

Research artifacts distilled into benchmark-ready tasks, task profiles, and provenance-tracked scoring inputs.

2026

Task distillation, provenance exports

Scientific Task Distillation Pipeline

A task-construction pipeline that turns quantum-physics papers and datasets into benchmark-ready tasks, task profiles, scoring inputs, and provenance-tracked exports. It supplies the evaluation stack with structured scientific work rather than isolated question-answer pairs.

Behavioral Benchmarking

Execution, tracing, and grading harnesses for evaluating agent behavior on realistic multi-step scientific tasks.

2026

Benchmark harness, grading loops

QDevBench

A benchmark harness for evaluating multi-step agent behavior in quantum research environments. It traces tool use, intermediate decisions, grading signals, and task trajectories rather than relying only on final answers.
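The difference between grading a trajectory and grading only a final answer can be shown with a small tracing wrapper. This is a sketch under stated assumptions rather than QDevBench's implementation: `Trace`, `traced`, `grade`, the `lookup_gap` tool, and its toy lookup value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Trace:
    """Ordered record of what the agent did, not just what it answered."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, kind: str, **payload: Any) -> None:
        self.events.append({"step": len(self.events), "kind": kind, **payload})


def traced(trace: Trace, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so that every call and its result land in the trace."""
    def wrapper(*args: Any) -> Any:
        result = fn(*args)
        trace.record("tool_call", tool=name, args=args, result=result)
        return result
    return wrapper


def grade(trace: Trace, final_answer: Any, expected: Any,
          required_tools: List[str]) -> Dict[str, Any]:
    """Score the final answer and the path taken to reach it."""
    used = [e["tool"] for e in trace.events if e["kind"] == "tool_call"]
    return {
        "answer_correct": final_answer == expected,
        "used_required_tools": all(t in used for t in required_tools),
        "num_tool_calls": len(used),
    }


trace = Trace()
# Toy tool with made-up data, standing in for a real research tool.
lookup_gap = traced(trace, "lookup_gap", lambda material: {"toy_material": 1.5}[material])
answer = lookup_gap("toy_material")
report = grade(trace, answer, expected=1.5, required_tools=["lookup_gap"])
```

An agent that guessed the right number without calling the tool would still score `answer_correct`, but would fail `used_required_tools`, which is the kind of distinction a final-answer-only benchmark cannot make.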

Iteration Loop

A feedback path from benchmark findings back to runtime and control-system revision.

Benchmark Feedback Loop

Benchmark traces, operator feedback, and runtime observations feed back into the next revision of task design, instrumentation logic, control policies, and agent behavior.

Explorations

2025 · Public

Workflow orchestration prototype

LLM Workflow Harness

Built an early LLM workflow orchestration harness that translated natural-language intent into JSON-defined task graphs and schema-constrained tool execution for MATLAB and STM workflows.
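As an illustration of JSON-defined task graphs with schema-constrained tool execution, the sketch below runs a two-node graph in dependency order and checks arguments before each call. The tool names, the `scan_001.dat` path, the graph layout, and the simple type-based schema check are assumptions made for the example; the original harness targeted MATLAB and STM workflows and may have been structured quite differently.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tool registry: each tool declares the argument types it accepts.
TOOLS = {
    "load_scan": {"schema": {"path": str}, "run": lambda path: f"loaded:{path}"},
    "flatten": {"schema": {"order": int}, "run": lambda order: f"flattened:{order}"},
}

# Hypothetical task graph, as a planner might emit it from natural-language intent.
GRAPH_JSON = """
{
  "tasks": {
    "load":    {"tool": "load_scan", "args": {"path": "scan_001.dat"}, "after": []},
    "flatten": {"tool": "flatten",   "args": {"order": 1},             "after": ["load"]}
  }
}
"""


def validate(args: dict, schema: dict) -> None:
    """Reject missing, extra, or wrongly typed arguments before any tool runs."""
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}")


def run_graph(graph_json: str) -> dict:
    tasks = json.loads(graph_json)["tasks"]
    # TopologicalSorter takes a mapping of node -> predecessors.
    order = TopologicalSorter({name: spec["after"] for name, spec in tasks.items()})
    results = {}
    for name in order.static_order():  # dependencies come first
        spec = tasks[name]
        tool = TOOLS[spec["tool"]]
        validate(spec["args"], tool["schema"])
        results[name] = tool["run"](**spec["args"])
    return results


results = run_graph(GRAPH_JSON)
```

Validating against the declared schema before dispatch means a malformed plan fails at the graph level instead of inside a tool call.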

2026 · Public

Persistent world-state orchestration

LLM-Driven Open-World Simulation Engine

Built a state-management harness that lets a stateless LLM run a consistent evolving open-world game across long sessions.
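A common pattern for this kind of harness is to keep the world state outside the model, send it with every prompt, and accept only validated state updates in return. The sketch below shows that pattern under assumptions: `build_prompt`, `apply_update`, `run_turn`, the state fields, and the deterministic `fake_model` stand-in are hypothetical and do not describe the engine's actual design.

```python
import json
from typing import Callable, Dict

WorldState = Dict[str, object]


def build_prompt(state: WorldState, player_input: str) -> str:
    """The model sees the full state every turn because it remembers nothing itself."""
    return f"STATE:\n{json.dumps(state, sort_keys=True)}\nPLAYER: {player_input}"


def apply_update(state: WorldState, update: WorldState) -> WorldState:
    """Only accept keys that already exist, so the model cannot invent state fields."""
    unknown = set(update) - set(state)
    if unknown:
        raise ValueError(f"model tried to set unknown keys: {sorted(unknown)}")
    return {**state, **update}


def run_turn(state: WorldState, player_input: str,
             model: Callable[[str], str]) -> WorldState:
    """One stateless model call; the harness owns persistence between calls."""
    reply = model(build_prompt(state, player_input))
    return apply_update(state, json.loads(reply))


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call; returns a JSON state update."""
    state = json.loads(prompt.split("STATE:\n", 1)[1].split("\nPLAYER:", 1)[0])
    return json.dumps({"turn": state["turn"] + 1, "location": "forest"})


state: WorldState = {"turn": 0, "location": "village", "inventory": ["lamp"]}
state = run_turn(state, "walk north", fake_model)
state = run_turn(state, "look around", fake_model)
```

After the two turns, the turn counter has advanced twice and the inventory is unchanged, even though the stand-in model never mentioned it. Consistency across a long session comes from the harness's merge step, not from the model's memory.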