# Testing And Quality Assurance

NTX is validated at four levels:

1. unit tests of geometry, operators, and solver algebra
2. regression tests of file-driven workflows and publication examples
3. imported-workflow tests for autodiff, NEOPAX, and JAX geometry backends
4. CPU/GPU runtime and smoke checks

The repository runs these checks in separate execution lanes rather than in one
monolithic developer loop:

- fast PR lane:
  - lint
  - type checking
  - sharded core unit/workflow/validation tests
  - selected integration/example tests
  - docs build
- benchmark lane:
  - literature reproductions
  - long-running fixed-field and W7-X audits
  - scaling/profiling studies
- hardware lane:
  - GPU smoke
  - office workstation profiling

This split is intentional. It keeps pull-request feedback fast while still
tracking literature-grade validation and throughput studies in reproducible
scripts.

## Current Testing Priorities

The fast lane has already reached the coverage target, so new tests should be
chosen for scientific value first. Do not add slow examples only to increase the
headline coverage number.

Near-term high-value gates are:

- monoenergetic convergence ladders for `D11`, `D31`, `D33`, and
  `onsager_residual` on small repository-owned geometries,
- Boozer-coordinate normalization identities such as
  `\mathcal J B^2 = B_\zeta + \iota B_\theta` before transport solves consume
  the geometry,
- symmetric-limit checks where a constant-field Boozer surface has no radial
  transport channels but retains the finite parallel-conductivity branch,
- artifact-backed reproductions of larger W7-X, precise-QS, and QI-family
  literature cases,
- optional public VMEC family transport ladders that stay artifact-backed
  rather than running inside normal pull-request shards,
- derivative audits that compare direct AD, prepared implicit-adjoint
  derivatives, forward-mode geometry controls, and centered finite differences
  on the same scalar outputs,
- profile and bootstrap-current workflow tests that are run only after the
  underlying coefficient and derivative gates are already green,
- explicit tests that fixed-field closure comparisons remain stress gates unless
  they also pass the integrated W7-X transfer gate.
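
The Boozer normalization identity in the list above can be gated with a simple residual check before any transport solve runs. The following is a minimal sketch in plain Python, not NTX code: the function name `boozer_jacobian_residual` and the sample values are hypothetical and only illustrate the shape of such a test.

```python
def boozer_jacobian_residual(jacobian, b_mag, b_zeta, b_theta, iota):
    """Relative residual of J * B^2 = B_zeta + iota * B_theta at one point."""
    lhs = jacobian * b_mag**2
    rhs = b_zeta + iota * b_theta
    return abs(lhs - rhs) / max(abs(rhs), 1e-300)


# Hypothetical sample: construct a Jacobian that satisfies the identity.
b_mag, b_zeta, b_theta, iota = 2.5, 17.0, 0.3, 0.9
jacobian = (b_zeta + iota * b_theta) / b_mag**2

residual_ok = boozer_jacobian_residual(jacobian, b_mag, b_zeta, b_theta, iota)
# A flipped Jacobian sign (a handedness error) gives a residual of order 2.
residual_flipped = boozer_jacobian_residual(-jacobian, b_mag, b_zeta, b_theta, iota)
```

A gate of this kind catches sign and normalization errors in the geometry cheaply, since it needs no solver call.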

Every new benchmark-like test must declare its lane before it is added to CI:

- `core_foundation`: small algebra, geometry, operator, solver, and helper unit tests,
- `core_cli_workflows`: small public API, CLI, namespace, packaging, and example-discovery tests,
- `core_io_workflows`: input-file, profile-script, and VMEC scan workflow tests,
- `core_parallel_workflows`: CPU/GPU script-dispatch and multiprocessing workflow tests,
- `core_neopax_workflows`: imported-database mapping and HDF5 round-trip tests,
- `core_profile_audit_workflow`: primitive-force audit artifact workflow test,
- `core_profile_basic_workflows`: ambipolar-profile and profile-family tests,
- `core_profile_optimization_workflows`: scalar and basis profile-control tests,
- `core_profile_transport_workflows`: conservative profile-transport loop tests,
- `core_autodiff_uncertainty_workflow`: autodiff uncertainty artifact workflow,
- `core_robust_bootstrap_workflow`: robust-bootstrap artifact workflow,
- `core_validation`: small validation, artifact-registry, and physics-gate tests,
- `integration_examples`: representative imported workflow tests,
- `heavy_examples_profiles`: slower profile examples,
- `heavy_examples_derivatives`: local derivative benchmark examples,
- `heavy_examples_boundary`: env-gated imported boundary/equilibrium derivative
  examples,
- `heavy_examples_publication`: publication/manuscript figure examples,
- manual benchmark: long literature reproduction or hardware profiling.

The GitHub Actions sharding is driven by the maintained manifest:

```bash
python scripts/test_lane_manifest.py --check
python scripts/test_lane_manifest.py core_foundation
python scripts/test_lane_manifest.py core_cli_workflows
python scripts/test_lane_manifest.py core_io_workflows
python scripts/test_lane_manifest.py core_parallel_workflows
python scripts/test_lane_manifest.py core_neopax_workflows
python scripts/test_lane_manifest.py core_profile_audit_workflow
python scripts/test_lane_manifest.py core_profile_basic_workflows
python scripts/test_lane_manifest.py core_profile_optimization_workflows
python scripts/test_lane_manifest.py core_profile_transport_workflows
python scripts/test_lane_manifest.py core_autodiff_uncertainty_workflow
python scripts/test_lane_manifest.py core_robust_bootstrap_workflow
python scripts/test_lane_manifest.py core_validation
python scripts/test_lane_manifest.py integration_examples
python scripts/test_lane_manifest.py heavy_examples_profiles
python scripts/test_lane_manifest.py heavy_examples_derivatives
python scripts/test_lane_manifest.py heavy_examples_boundary
python scripts/test_lane_manifest.py heavy_examples_publication
```

Any new `tests/test_*.py` file must be added to the lane manifest. Slow modules
may be split by explicit pytest node ids, but every discovered test function in
that file must be listed exactly once. This prevents new benchmark scripts from
silently landing in the fast core shard.
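
The exactly-once rule can be sketched as a small audit over the manifest. This is an illustration only: the dictionary layout, the function name `audit_manifest`, and the test node ids are assumptions, not the actual structure used by `scripts/test_lane_manifest.py`.

```python
from collections import Counter


def audit_manifest(manifest, discovered):
    """Return (missing, duplicated) pytest node ids.

    manifest: dict mapping lane name -> list of node ids (hypothetical layout)
    discovered: iterable of node ids found by test collection
    """
    counts = Counter(node for nodes in manifest.values() for node in nodes)
    missing = sorted(set(discovered) - set(counts))
    duplicated = sorted(node for node, n in counts.items() if n > 1)
    return missing, duplicated


# Hypothetical manifest with one node listed twice and one node not listed.
manifest = {
    "core_foundation": ["tests/test_geometry.py::test_a"],
    "heavy_examples_profiles": [
        "tests/test_profiles_workflows.py::test_slow",
        "tests/test_geometry.py::test_a",
    ],
}
discovered = [
    "tests/test_geometry.py::test_a",
    "tests/test_profiles_workflows.py::test_slow",
    "tests/test_profiles_workflows.py::test_new",
]
missing, duplicated = audit_manifest(manifest, discovered)
```

Here `missing` reports the unlisted test and `duplicated` reports the node that appears in two lanes; a check would fail if either list is non-empty.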

The imported boundary/equilibrium example reruns are intentionally opt-in:

```bash
NTX_RUN_HEAVY_BOUNDARY_EXAMPLES=1 \
  python scripts/test_lane_manifest.py heavy_examples_boundary \
  | xargs python -m pytest -q -m "not gpu"
```

Normal CI still checks the committed artifacts through the benchmark matrix and
manuscript-artifact tests. The expensive boundary reruns are used when updating
those artifacts.

Each CI test-shard job has a ten-minute timeout. The subprocess-based
parallel profiling smoke tests also set explicit subprocess timeouts, so a
device-discovery or multiprocessing hang fails as a diagnosable test error
instead of stalling the whole workflow.
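
The subprocess-timeout pattern can be sketched with the standard library alone. The helper name `run_with_timeout` and the child commands are hypothetical; the real smoke tests may structure this differently.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_s):
    """Run a child process and turn a hang into a readable test failure."""
    try:
        return subprocess.run(
            args, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        raise AssertionError(
            f"subprocess exceeded {timeout_s}s: {' '.join(exc.cmd)}"
        ) from exc


# A well-behaved child finishes inside the limit.
ok = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=30)

# A hanging child is reported as an assertion error instead of stalling.
try:
    run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1
    )
    hang_message = None
except AssertionError as err:
    hang_message = str(err)
```

The failure message names the command that hung, which makes the CI log easier to diagnose than a job-level timeout.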

The first fast owned-geometry coefficient gate is in
`tests/test_physics_gates.py`. It checks `D11`, `D31`, `D33`, the Onsager
residual, and a coarse-to-fine angular-grid transfer on the analytic Boozer
surface. Larger literature-family convergence ladders remain artifact-backed
work rather than default pull-request tests. The current broad VMEC-family
artifact is generated by `examples/geometry_family_transport_convergence.py`
and is tracked through the benchmark matrix and physics-gate registry as a
monitored stress diagnostic.

The committed `validation_summary.json` artifact is now checked by the physics
gate registry as the release-level monoenergetic convergence artifact. The gate
requires the larger of the DKES-style and VMEC errors at the finest plotted
`N_\xi` to stay below `2.5e-1`; the example test also verifies that the plotted
convergence ladder decreases toward the finest reference point.
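
The two conditions in that gate can be sketched as follows. The input layout, the case labels, and the sample numbers are hypothetical and do not reflect the schema of `validation_summary.json`; strict monotonic decrease is one possible reading of "decreases toward the finest reference point".

```python
def convergence_gate(errors_by_case, threshold=2.5e-1):
    """Return (finest_ok, decreasing) for a set of convergence ladders.

    errors_by_case: dict mapping a case label -> list of errors ordered from
    coarsest to finest resolution (hypothetical layout).
    """
    finest_max = max(errors[-1] for errors in errors_by_case.values())
    decreasing = all(
        later < earlier
        for errors in errors_by_case.values()
        for earlier, later in zip(errors, errors[1:])
    )
    return finest_max < threshold, decreasing


# Synthetic ladders: both converge and end below the threshold.
passing = convergence_gate({"dkes_style": [0.9, 0.4, 0.1], "vmec": [1.2, 0.5, 0.2]})

# Synthetic ladder that rises once and ends above the threshold.
failing = convergence_gate({"vmec": [0.5, 0.6, 0.3]})
```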

The same file also carries the symmetric-limit normalization gates: a
constant-field Boozer surface must have zero radial transport channels, finite
parallel conductivity, exact agreement with the `D33_spitzer` branch, and the
expected inverse-collisionality scaling of that Spitzer branch.
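
The inverse-collisionality scaling can be checked by fitting a log-log slope and comparing it with `-1`. This sketch uses synthetic values with an arbitrary amplitude; the names `log_log_slope`, `nu_hat`, and `d33_spitzer` are illustrative and are not taken from the test file.

```python
import math


def log_log_slope(xs, ys):
    """Least-squares slope of log|y| against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(abs(y)) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


nu_hat = [1e-3, 1e-2, 1e-1, 1.0]
# Synthetic 1/nu branch; the prefactor 4.0 is arbitrary.
d33_spitzer = [4.0 / nu for nu in nu_hat]
slope = log_log_slope(nu_hat, d33_spitzer)
```

Fitting the exponent rather than comparing absolute values keeps the check independent of the normalization convention for `D33`.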

The operator unit tests protect the finite Legendre source projection directly:
the magnetic-drift drive must enter only the `k=0` and `k=2` rows with the
`2/3` and `1/3` weights used by the monoenergetic moment equations, while the
parallel-conductivity drive must enter only the `k=1` row as the physical `B`.

The `vmec_jax`/`booz_xform_jax` backend unit tests now also protect the
imported Boozer handedness convention directly. Scalar and profile forms must
leave `B_\zeta + \iota B_\theta >= 0`, matching the file-backed loader before
the solver evaluates the Boozer Jacobian and drift source terms.

The direct `boozmn` loader also has a half-grid radial-coordinate gate. Packed
Boozer spectra and Boozer radial profiles are selected with `s_in`, `s_b`, or
`jlist = compute_surfs + 2`; the full-grid `phi_b` profile is metadata and must
not shift the selected surface. `tests/test_boozmn.py` protects the fixture
half-grid selection, and `examples/boozmn_same_coordinate_roundtrip_audit.py`
provides the artifact-backed VMEC-to-Boozer-file round trip.

The same audit is also committed on an optimized finite-beta QA `wout` whose
current-profile representation is not re-evaluated at present by the
optional differentiable VMEC-state path. The accepted path for that case is the
finalized `wout` magnetic-channel transform, not a reconstructed input-profile
state. `tests/test_vmec_jax_backend.py` protects the `profile_source="wout"`
and automatic fallback branches, while
`boozmn_finite_beta_wout_roundtrip_audit.json` keeps the resulting
same-coordinate `D11/D31/D13/D33` mismatch below `1e-6`.

The operator unit tests also include a derivative gate for the implicit-adjoint
lane: the hand-coded `dD_k/dnu_hat` and `dD_k/depsi_hat` blocks must match
JAX differentiation of the assembled Legendre-space operator. This is a fast
way to protect sensitivity-analysis, inverse-design, and uncertainty workflows
from normalization drift in the collisionality and radial-electric-field terms.

The committed prepared-derivative benchmark is also checked as an artifact gate:
`derivative_path_benchmark.json` must keep the prepared custom-VJP derivative
within `1e-4` relative mismatch of direct reverse-mode. The benchmark still
reports speedup, but CI treats agreement as the release claim.

The committed geometry and boundary-control derivative artifacts are now also
checked by the same physics-gate registry. The passing gates cover the owned
analytic geometry-control audit (`2e-4`), file-backed Boozer/VMEC samples
(`5e-4`), boundary-projected forward-mode current derivatives (`1e-5`), and
explicit-relaxed QA/QH boundary-to-current derivatives (`1e-4`). The
implicit-equilibrium derivative artifact remains a monitored non-shipping
diagnostic because only the equilibrium-volume objective currently matches
centered finite differences and the residual history does not contract.

The physics-facing gate structure is documented separately in
[`physics-gates.md`](physics-gates.md). The test suite and benchmark scripts are
meant to enforce that gate hierarchy, not to replace it.

The maintained claim-to-artifact map is documented in
[`benchmark-matrix.md`](benchmark-matrix.md) and can be regenerated with:

```bash
python scripts/build_benchmark_matrix.py
```

The committed performance-scaling artifacts are also schema checked in the
validation lane. That check requires each CPU/GPU smoke/heavy JSON to carry
peak resident memory, device-count metadata, positive timings/rates, matching
scan sizes, and numerical agreement between serial, device-parallel, and
multiprocess `D11` outputs. It deliberately does not require a speedup, since
the correct interpretation is hardware- and workload-dependent.
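
A schema check of this kind might look like the sketch below. The key names (`peak_rss_mb`, `device_count`, `seconds`, `d11`), the tolerance, and the sample record are hypothetical; the committed artifacts may use a different layout.

```python
import math


def check_scaling_record(record, rtol=1e-10):
    """Return a list of problems found in one performance-scaling record."""
    problems = []
    for key in ("peak_rss_mb", "device_count"):
        if key not in record:
            problems.append(f"missing {key}")
    modes = ("serial", "device_parallel", "multiprocess")
    for mode in modes:
        if not record["seconds"][mode] > 0.0:
            problems.append(f"non-positive timing for {mode}")
    reference = record["d11"]["serial"]
    for mode in modes[1:]:
        values = record["d11"][mode]
        if len(values) != len(reference):
            problems.append(f"scan size mismatch for {mode}")
        elif not all(
            math.isclose(a, b, rel_tol=rtol) for a, b in zip(values, reference)
        ):
            problems.append(f"D11 mismatch for {mode}")
    return problems  # note: no speedup requirement anywhere


# Synthetic record in which multiprocess is slower than serial but still valid.
record = {
    "peak_rss_mb": 512.0,
    "device_count": 2,
    "seconds": {"serial": 1.8, "device_parallel": 1.1, "multiprocess": 2.4},
    "d11": {
        "serial": [0.12, 0.34],
        "device_parallel": [0.12, 0.34],
        "multiprocess": [0.12, 0.34],
    },
}
problems = check_scaling_record(record)

# Same record with a truncated multiprocess scan.
bad = dict(record, d11=dict(record["d11"], multiprocess=[0.12]))
bad_problems = check_scaling_record(bad)
```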

The prepared-geometry reuse profile is checked in the publication/example lane.
The smoke test regenerates a tiny artifact and requires exact prepared/direct
agreement, compiled/direct agreement below `1e-7`, and a positive compiled
steady-state speedup. The committed paper artifact is used for documentation
and performance guidance, not as a physics validation gate.

## Running The Suite

Full local suite:

```bash
python -m pytest -q
```

Coverage:

```bash
python -m pytest --cov=src/ntx --cov-report=term-missing -q
```

Module-wise coverage summary from a `coverage json` file:

```bash
coverage json -o coverage.json
python scripts/build_coverage_report.py \
  --json-input coverage.json \
  --json-output coverage-report.json \
  --text-output coverage-report.txt
```

Lint and type-check:

```bash
python -m ruff check .
python -m mypy src/ntx
```

Docs build:

```bash
python -m sphinx -b html docs docs/_build/html
```

## What The Tests Cover

Representative test groups:

- geometry and Fourier series:
  - `tests/test_geometry.py`
  - `tests/test_vmec.py`
  - `tests/test_vmec_jax_vmec.py`
- operator assembly:
  - `tests/test_operators.py`
- dense solver and scans:
  - `tests/test_solver.py`
  - `tests/test_parallel.py`
  - `tests/test_multiprocess_parallel.py`
- CLI and file-backed outputs:
  - `tests/test_cli.py`
  - `tests/test_inputfiles.py`
- NEOPAX mapping:
  - `tests/test_neopax_adapter.py`
  - `tests/test_neopax_arrays.py`
- autodiff and optimization helpers:
  - `tests/test_autodiff.py`
  - `tests/test_autodiff_profile_uncertainty_example.py`
  - `tests/test_bootstrap_current_robust_optimization_example.py`
  - `tests/test_file_backed_geometry_control_derivative_benchmark_example.py`
  - `tests/test_explicit_relaxed_boundary_current_derivative_benchmark_example.py`
- profile workflows:
  - `tests/test_profiles_unit.py`
  - `tests/test_profiles_workflows.py`
  - `tests/test_profile_force_reconstruction_audit_example.py`
- coverage and validation summaries:
  - `tests/test_build_coverage_report_script.py`
  - `tests/test_physics_gates.py`
  - `tests/test_benchmark_matrix.py`
- example and figure scripts:
  - `tests/test_make_publication_figures.py`
  - `tests/test_validation_summary_example.py`
  - `tests/test_bootstrap_current_optimization_example.py`

## CI Coverage

The GitHub Actions `tests` workflow now combines coverage across the Python
3.11 test shards and publishes:

- raw `.coverage` data
- `coverage.json`
- `coverage-report.json`
- `coverage-report.txt`

The coverage report is intended to answer two different questions:

1. what the current repository-owned line coverage actually is by module,
2. which large files need restructuring or additional tests first.

It should not be interpreted as a physics-validation substitute. Physics gates
and literature benchmarks remain separate acceptance surfaces.

Recent hardening work used this report to target the refactored workflow
modules directly. The current full CI coverage report for the split lanes is
above the release threshold:

- overall repository-owned coverage is `99.0%`,
- `src/ntx/neopax.py`, `src/ntx/_neopax_io.py`, `src/ntx/_neopax_types.py`,
  and `src/ntx/_neopax_bridge.py` are now at or above `97%`,
- `src/ntx/_neopax_field.py`, the imported `vmec_jax`/`booz_xform_jax` field
  bridge, is now at `98.1%`,
- the split autodiff workflow owners (`src/ntx/_autodiff_workflows.py`,
  `src/ntx/_autodiff_inverse.py`, `src/ntx/_autodiff_derivatives.py`, and
  `src/ntx/_autodiff_profile.py`), `src/ntx/_profiles_eval.py`,
  `src/ntx/_profiles_transport.py`, and `src/ntx/_profiles_controls.py` are
  all covered by the fast unit/workflow lanes, and the split
  transport-closure owner is exercised by the same profile tests,
- `src/ntx/_solver_scan.py`, `src/ntx/parallel.py`, `src/ntx/cli.py`,
  `src/ntx/io.py`, and `src/ntx/database.py` are at or above `98%`.

Those gains come from narrow branch tests in the unit/workflow lanes, not from
adding slower benchmark execution to the default developer loop.

The coverage-report helper accepts both absolute and relative `src/ntx/...`
paths from `coverage json`, which keeps local reports and GitHub Actions
summaries comparable. Coverage is a release gate, not a replacement for the
literature-anchored physics gates and benchmark artifacts.

## GPU Validation

GPU-only smoke tests are marked and can be run with:

```bash
python -m pytest -m gpu -q
```

The runtime probes are:

```bash
python scripts/run_gpu_regression.py --output-json gpu-smoke-results.json
python scripts/profile_runtime.py --backend gpu --output-json runtime-profile-gpu.json
python scripts/profile_parallel_runtime.py --output-json parallel-runtime.json
python scripts/profile_multiprocess_runtime.py --backend gpu --workers 2
```

CI exercises `profile_parallel_runtime.py` with a reduced smoke profile
(`--num-cases 2 --grid 5,5,4`) so the shard checks serial/device-parallel
agreement without turning the default profiling run into a CI bottleneck.

## Cross-Checks Against Independent Workflows

NTX is designed to stand on its own, but it is still valuable to compare its
output against independent neoclassical workflows.

The repository therefore keeps:

- direct W7-X bootstrap-current convergence audits
- optional NEOPAX-coupled checks
- optional SFINCS-JAX-based consistency studies when that package is available
- profile-workflow regression tests split into:
  - fast unit/workflow tests
  - artifact-backed benchmark-family audits

These comparisons are used as trust-building validation, not as the definition
of NTX itself.

## Physics Gate Report

The tracked benchmark gates can be summarized from committed artifacts with:

```bash
python scripts/check_physics_gates.py
```

This is the fastest way to distinguish:

- analytical identities and exact-recovery gates,
- independent-code comparison gates,
- integrated-workflow transfer gates,
- and closure stress metrics that are monitored but not promoted to parity
  claims.

## What To Check Before Claiming A New Physics Result

Before publishing a new equilibrium scan or optimization result:

1. converge `N_xi` at the lowest collisionality used in the study
2. converge `N_theta` and `N_zeta` on `D31`
3. check `onsager_residual`
4. compare serial and parallel scan results on a subset
5. inspect the output file graphically with `plot_output_file.py`
6. add or update the benchmark-matrix entry before promoting the result
7. if the workflow feeds NEOPAX, regenerate the monoenergetic database and
   inspect the resulting radial profiles
8. if the workflow uses automatic differentiation, compare the reported
   derivative with centered finite differences on the same scalar output
9. if the workflow uses imported equilibrium or Boozer transformations, state
   whether the gate is projected-boundary, explicitly relaxed equilibrium, or
   implicit-equilibrium sensitivity
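
The finite-difference comparison in step 8 can be sketched in a few lines. The scalar function below is a stand-in with a hand-derived derivative, not an NTX output, and the step size and helper names are illustrative choices; in practice the reported derivative would come from the autodiff workflow.

```python
def centered_difference(f, x, h=1e-5):
    """Second-order centered finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def relative_mismatch(reported, reference):
    """Relative difference, guarded against a zero reference."""
    return abs(reported - reference) / max(abs(reference), 1e-300)


# Hypothetical scalar output as a function of one control parameter.
def scalar_output(nu_hat):
    return nu_hat**3 / (1.0 + nu_hat)


# Its analytic derivative, standing in for the reported AD value.
def reported_derivative(nu_hat):
    return (3.0 * nu_hat**2 + 2.0 * nu_hat**3) / (1.0 + nu_hat) ** 2


x0 = 0.7
mismatch = relative_mismatch(
    reported_derivative(x0), centered_difference(scalar_output, x0)
)
```

Comparing on the same scalar output matters: a derivative that agrees on one objective says little about another, as the implicit-equilibrium artifact above illustrates.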