Project
Scientific Task Distillation Pipeline
Project Brief
A task-construction pipeline that turns quantum-physics papers and datasets into benchmark-ready tasks, task profiles, scoring inputs, and provenance-tracked exports. It supplies the evaluation stack with structured scientific work rather than isolated question-answer pairs.
Approach
The pipeline combines LLM-based interpretation and task synthesis with deterministic validation and export controls, keeping benchmark authoring traceable and reproducible.
The emphasis is on turning messy research artifacts into task structures and reviewable scientific work that can be versioned and reused.
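One way to picture the deterministic validation and export step is the minimal Python sketch below. It is an illustration under stated assumptions, not the pipeline's actual code: the field names in `REQUIRED_FIELDS`, the function names `validate_task` and `export_task`, and the choice of a SHA-256 digest over canonical JSON are all hypothetical.

```python
import hashlib
import json

# Hypothetical schema: the real pipeline's required fields are not documented here.
REQUIRED_FIELDS = ("prompt", "answer", "rubrics", "source")


def validate_task(task):
    """Return a list of problems; an empty list means the task passes."""
    return [f"missing field: {name}" for name in REQUIRED_FIELDS if not task.get(name)]


def export_task(task):
    """Validate a task and wrap it with a reproducible provenance record."""
    problems = validate_task(task)
    if problems:
        raise ValueError("; ".join(problems))
    # Canonical JSON (sorted keys, fixed separators) so that the same task
    # content always produces the same digest, regardless of key order.
    payload = json.dumps(task, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"task": task, "provenance": {"sha256": digest, "source": task["source"]}}
```

Because the digest depends only on task content, two exports of the same task can be compared by hash, which is one simple way to make versioned reuse checkable.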
Outcome
The pipeline acts as the task-construction layer for QDevBench, converting papers, datasets, and scientific context into benchmark-ready assets with preserved provenance.
Media
Prompt:
Detect Spin-Orbit-Driven Phase Contrast
Use the compressibility and local magnetometry instrument to sweep the in-plane magnetic field across the VI to IVC transition in rhombohedral trilayer graphene. From the resulting field-dependent thermodynamic observables, determine whether the two phases keep the same in-plane spin response or whether low-field behavior reveals an additional coupling that differentiates them. Report the qualitative conclusion and the supporting observable pattern.
Answer:
The two phases do not keep the same in-plane spin response. Delta mu stays nearly constant while Delta m_parallel changes strongly at low field, supporting intrinsic spin-orbit coupling that suppresses the IVC phase at low in-plane field.
Rubrics:
- Score 4: The response identifies the field-sweep measurement, uses `b_parallel_t` as the control, analyzes both `delta_mu_microev` and `delta_m_parallel_mu_b_per_electron` across the sweep, and concludes that low-field behavior reveals a finite in-plane spin-response difference consistent with intrinsic spin-orbit coupling suppressing the IVC phase. The workflow, not just the final sentence, is scientifically complete.
- Score 3: The response performs the field-dependent comparison and reaches the correct qualitative conclusion, but omits one supporting detail such as the near-constancy of `delta_mu_microev` or the specific low-field emphasis in `delta_m_parallel_mu_b_per_electron`.
- Score 2: The response shows partial workflow evidence, such as inspecting only one observable or describing the sweep only loosely, and gives a plausible but weakly supported conclusion.
- Score 1: The response mentions the topic or gives an unsupported conclusion with little or no usable acquisition-and-analysis path.
- Score 0: The response is missing, irrelevant, or purely guessed.
- Key response elements: identify the field sweep, compare both observables, focus on low-field behavior, and connect the contrasted response to spin-orbit-driven differentiation of the phases.
- Common mistakes to penalize: skipping the sweep, reading only one observable, treating the data as a single summary number, overclaiming a mechanism without comparing both observables, or giving the final answer without showing the analysis path.
- Autograder instructions: award the highest score whose requirements are satisfied, cite missing workflow steps when deducting points, and do not award full credit for answer-only guessing even if the final conclusion text is correct.
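The autograder instructions above can be sketched as a small scoring function. This is a simplified illustration, not the benchmark's grader: the element labels and the function name `grade` are hypothetical, it assumes an upstream step has already decided which elements a response contains, and it treats a response missing both supporting details the same as one missing a single detail.

```python
# Hypothetical labels for the rubric's key response elements.
WORKFLOW = {"field_sweep", "delta_mu", "delta_m_parallel"}
DETAILS = {"mu_near_constant", "low_field_emphasis"}
FULL = WORKFLOW | DETAILS | {"correct_conclusion"}


def grade(found):
    """Return (score, missing_elements) for the set of elements found in a response."""
    found = set(found) & FULL
    # Missing elements are reported so that deductions can cite them.
    missing = sorted(FULL - found)
    if not found:
        score = 0  # missing, irrelevant, or purely guessed
    elif not missing:
        score = 4  # complete workflow and correct conclusion
    elif WORKFLOW <= found and "correct_conclusion" in found:
        score = 3  # full comparison, correct conclusion, a supporting detail omitted
    elif found & WORKFLOW:
        score = 2  # partial workflow evidence
    else:
        score = 1  # e.g. the correct conclusion with no analysis path
    return score, missing
```

For example, `grade({"correct_conclusion"})` returns a score of 1 and lists every workflow step as missing, which reflects the rule that answer-only guessing does not earn full credit.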