Testing And Quality Assurance

NTX is validated at four levels:

unit tests of geometry, operators, and solver algebra
regression tests of file-driven workflows and publication examples
imported-workflow tests for autodiff, NEOPAX, and JAX geometry backends
CPU/GPU runtime and smoke checks

The repository now treats these as separate execution lanes, not one monolithic developer loop:

fast PR lane:
- lint
- type checking
- sharded core unit/workflow/validation tests
- selected integration/example tests
- docs build
benchmark lane:
- literature reproductions
- long-running fixed-field and W7-X audits
- scaling/profiling studies
hardware lane:
- GPU smoke
- office workstation profiling

This split is intentional. It keeps pull-request feedback fast while still tracking literature-grade validation and throughput studies in reproducible scripts.

Current Testing Priorities

The fast lane has already reached the coverage target, so new tests should be chosen for scientific value first. Do not add slow examples only to increase the headline coverage number.

Near-term high-value gates are:

monoenergetic convergence ladders for D11, D31, D33, and onsager_residual on small repository-owned geometries,
Boozer-coordinate normalization identities such as \mathcal J B^2 = B_\zeta + \iota B_\theta before transport solves consume the geometry,
symmetric-limit checks where a constant-field Boozer surface has no radial transport channels but retains the finite parallel-conductivity branch,
artifact-backed reproductions of larger W7-X, precise-QS, and QI-family literature cases,
optional public VMEC family transport ladders that stay artifact-backed rather than running inside normal pull-request shards,
derivative audits that compare direct AD, prepared implicit-adjoint derivatives, forward-mode geometry controls, and centered finite differences on the same scalar outputs,
profile and bootstrap-current workflow tests that are run only after the underlying coefficient and derivative gates are already green,
explicit tests that fixed-field closure comparisons remain stress gates unless they also pass the integrated W7-X transfer gate.

Every new benchmark-like test must declare its lane before it is added to CI:

core_foundation: small algebra, geometry, operator, solver, and helper unit tests,
core_cli_workflows: small public API, CLI, namespace, packaging, and example-discovery tests,
core_io_workflows: input-file, profile-script, and VMEC scan workflow tests,
core_parallel_workflows: CPU/GPU script-dispatch and multiprocessing workflow tests,
core_neopax_workflows: imported-database mapping and HDF5 round-trip tests,
core_profile_audit_workflow: primitive-force audit artifact workflow test,
core_profile_basic_workflows: ambipolar-profile and profile-family tests,
core_profile_optimization_workflows: scalar and basis profile-control tests,
core_profile_transport_workflows: conservative profile-transport loop tests,
core_autodiff_uncertainty_workflow: autodiff uncertainty artifact workflow,
core_robust_bootstrap_workflow: robust-bootstrap artifact workflow,
core_validation: small validation, artifact-registry, and physics-gate tests,
integration_examples: representative imported workflow tests,
heavy_examples_profiles: slower profile examples,
heavy_examples_derivatives: local derivative benchmark examples,
heavy_examples_boundary: env-gated imported boundary/equilibrium derivative examples,
heavy_examples_publication: publication/manuscript figure examples,
manual benchmark: long literature reproduction or hardware profiling.

The GitHub Actions sharding is driven by the maintained manifest:

python scripts/test_lane_manifest.py --check
python scripts/test_lane_manifest.py core_foundation
python scripts/test_lane_manifest.py core_cli_workflows
python scripts/test_lane_manifest.py core_io_workflows
python scripts/test_lane_manifest.py core_parallel_workflows
python scripts/test_lane_manifest.py core_neopax_workflows
python scripts/test_lane_manifest.py core_profile_audit_workflow
python scripts/test_lane_manifest.py core_profile_basic_workflows
python scripts/test_lane_manifest.py core_profile_optimization_workflows
python scripts/test_lane_manifest.py core_profile_transport_workflows
python scripts/test_lane_manifest.py core_autodiff_uncertainty_workflow
python scripts/test_lane_manifest.py core_robust_bootstrap_workflow
python scripts/test_lane_manifest.py core_validation
python scripts/test_lane_manifest.py integration_examples
python scripts/test_lane_manifest.py heavy_examples_profiles
python scripts/test_lane_manifest.py heavy_examples_derivatives
python scripts/test_lane_manifest.py heavy_examples_boundary
python scripts/test_lane_manifest.py heavy_examples_publication

Any new tests/test_*.py file must be added to the lane manifest. Slow modules may be split by explicit pytest node ids, but every discovered test function in that file must be listed exactly once. This prevents new benchmark scripts from silently landing in the fast core shard.
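A minimal sketch of the exactly-once rule. It assumes a hypothetical manifest layout in which each lane maps to a list of entries, and each entry is either a whole file path or an explicit pytest node id. It illustrates the rule only and is not the implementation behind scripts/test_lane_manifest.py --check.

```python
from collections import Counter


def check_exactly_once(discovered, manifest):
    """Report discovered test functions that are not listed exactly once.

    discovered: pytest node ids, e.g. "tests/test_a.py::test_x[case1]".
    manifest: hypothetical mapping of lane name -> list of entries, where each
    entry is a whole file ("tests/test_a.py") or a node id ("tests/test_a.py::test_x").
    """
    counts = Counter(entry for entries in manifest.values() for entry in entries)
    problems = []
    for node_id in discovered:
        function_id = node_id.split("[", 1)[0]       # drop parametrization suffix
        file_path = function_id.split("::", 1)[0]
        hits = counts[file_path] + counts[function_id]
        if hits != 1:
            problems.append(f"{function_id}: listed {hits} times")
    return problems


manifest = {
    "core_foundation": ["tests/test_geometry.py"],
    "heavy_examples_profiles": ["tests/test_profiles_workflows.py::test_slow"],
}
discovered = [
    "tests/test_geometry.py::test_metric[coarse]",
    "tests/test_profiles_workflows.py::test_slow",
    "tests/test_profiles_workflows.py::test_fast",
]
print(check_exactly_once(discovered, manifest))
# → ['tests/test_profiles_workflows.py::test_fast: listed 0 times']
```

Parametrized ids are reduced to their function id before counting, so one manifest entry covers every parameter case of a function, and a file that is split by node ids cannot also appear as a whole-file entry without being counted twice.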

The imported boundary/equilibrium example reruns are intentionally opt-in:

export NTX_RUN_HEAVY_BOUNDARY_EXAMPLES=1
python scripts/test_lane_manifest.py heavy_examples_boundary \
  | xargs python -m pytest -q -m "not gpu"

Normal CI still checks the committed artifacts through the benchmark matrix and manuscript-artifact tests. The expensive boundary reruns are used when updating those artifacts.

Each CI test-shard job has a ten-minute job timeout. The subprocess-based parallel profiling smoke tests also set explicit subprocess timeouts, so a device-discovery or multiprocessing hang fails as a diagnosable test error instead of stalling the whole workflow.
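The subprocess-timeout pattern can be sketched with the standard-library subprocess.run timeout argument. The helper name run_smoke_subprocess and the choice to raise AssertionError are assumptions for illustration; this is not the repository's test code.

```python
import subprocess
import sys


def run_smoke_subprocess(code, timeout_s):
    """Run a Python snippet in a child process with a hard timeout.

    A hang surfaces as an AssertionError that names the timeout instead of
    blocking the CI job until its ten-minute limit.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        raise AssertionError(f"smoke subprocess exceeded {timeout_s} s timeout") from exc
    if result.returncode != 0:
        raise AssertionError(f"smoke subprocess failed: {result.stderr.strip()}")
    return result.stdout
```

Because subprocess.run kills and waits for the child when the timeout expires, a stuck device probe is cleaned up rather than left running on the CI runner.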

The first fast owned-geometry coefficient gate is in tests/test_physics_gates.py. It checks D11, D31, D33, the Onsager residual, and a coarse-to-fine angular-grid transfer on the analytic Boozer surface. Larger literature-family convergence ladders remain artifact-backed work rather than default pull-request tests. The current broad VMEC-family artifact is generated by examples/geometry_family_transport_convergence.py and is tracked through the benchmark matrix and physics-gate registry as a monitored stress diagnostic.

The angular sampling lane is split the same way. Fast tests verify Nyquist report semantics, the warning-only 2.25 oversampling recommendation, audit schema, residuals, and artifact generation on the repository fixture. The committed production artifact runs finite-beta QA, NCSX, and HSX outside the fast lane and requires the maximum recommended-grid error relative to the finest audit grid to remain below 1e-2. This stress gate does not replace the two-successive-grid acceptance rule.
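The two acceptance rules can be sketched as follows, assuming the audit reports one scalar per grid ordered coarse to fine. The function names and the reading of the two-successive-grid rule (both of the last two refinements change the value by less than the tolerance) are assumptions for illustration.

```python
import numpy as np


def successive_changes(values):
    """Relative change produced by each refinement of a coarse-to-fine ladder."""
    v = np.asarray(values, dtype=float)
    return np.abs(np.diff(v)) / np.maximum(np.abs(v[1:]), np.finfo(float).tiny)


def two_successive_grids_pass(values, tol):
    """Accept only if the last two refinements each change the value by < tol."""
    changes = successive_changes(values)
    return changes.size >= 2 and bool(np.all(changes[-2:] < tol))


def recommended_grid_stress_gate(cases, tol=1e-2):
    """cases: name -> (value on recommended grid, value on finest audit grid)."""
    errors = {name: abs(rec - fine) / abs(fine) for name, (rec, fine) in cases.items()}
    return max(errors.values()) < tol, errors
```

Keeping the two functions separate mirrors the text: passing the stress gate against the finest grid does not by itself show that the ladder satisfies the successive-grid rule.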

The committed validation_summary.json artifact is now checked by the physics gate registry as the release-level monoenergetic convergence artifact. The gate requires the larger of the DKES-style and VMEC errors at the finest plotted N_\xi to stay below 2.5e-1; the example test also verifies that the plotted convergence ladder decreases toward the finest reference point.
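A sketch of both checks, assuming each ladder is stored as plotted values ordered coarse to fine together with its finest reference value. The function names are hypothetical; the DKES-style and VMEC ladders would be two entries of ladders.

```python
import numpy as np


def ladder_errors(values, reference):
    """Relative error of each plotted N_xi point against the finest reference."""
    return np.abs(np.asarray(values, dtype=float) - reference) / abs(reference)


def monoenergetic_release_gate(ladders, tol=2.5e-1):
    """ladders: name -> (plotted values ordered coarse to fine, reference value).

    Fails if any ladder's error grows on the way to the reference, or if the
    largest error at the finest plotted point is not below tol.
    """
    finest_errors = []
    for values, reference in ladders.values():
        err = ladder_errors(values, reference)
        if np.any(np.diff(err) > 0.0):
            return False
        finest_errors.append(err[-1])
    return bool(max(finest_errors) < tol)
```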

tests/test_physics_gates.py also carries the symmetric-limit normalization gates: a constant-field Boozer surface must have zero radial transport channels, finite parallel conductivity, exact agreement with the D33_spitzer branch, and the expected inverse-collisionality scaling of that Spitzer branch.
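On log-log axes a branch with D33 proportional to 1/nu_hat has slope -1, which gives a compact way to express the scaling check. The sketch below uses hypothetical function names and illustrative tolerances and is meant to be exercised on synthetic coefficients; it is not the repository's test code.

```python
import numpy as np


def loglog_slope(nu_hat, d33):
    """Least-squares slope of log(D33) against log(nu_hat)."""
    slope, _intercept = np.polyfit(np.log(nu_hat), np.log(d33), 1)
    return float(slope)


def symmetric_limit_ok(d11, d31, d33, d33_spitzer, nu_hat, atol=1e-12, slope_tol=1e-6):
    """Constant-field surface: no radial channels, D33 equal to the Spitzer branch."""
    radial_zero = np.allclose(d11, 0.0, atol=atol) and np.allclose(d31, 0.0, atol=atol)
    finite_parallel = bool(np.all(np.isfinite(d33)) and np.all(np.asarray(d33) > 0.0))
    matches_spitzer = np.allclose(d33, d33_spitzer, rtol=1e-12, atol=0.0)
    inverse_scaling = abs(loglog_slope(nu_hat, d33) + 1.0) < slope_tol
    return bool(radial_zero and finite_parallel and matches_spitzer and inverse_scaling)
```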

The operator unit tests protect the finite Legendre source projection directly: the magnetic-drift drive must enter only the k=0 and k=2 rows with the 2/3 and 1/3 weights used by the monoenergetic moment equations, while the parallel-conductivity drive must enter only the k=1 row, with the physical field strength B as its coefficient.
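The 2/3 and 1/3 weights are what one gets by expanding the pitch-angle factor (1 + \xi^2)/2 in Legendre polynomials: since P_2(\xi) = (3\xi^2 - 1)/2, one has (1 + \xi^2)/2 = (2/3) P_0 + (1/3) P_2, while \xi B = B P_1. The sketch below checks that projection with numpy.polynomial.legendre.poly2leg. It assumes the drift drive carries exactly this (1 + \xi^2)/2 factor and uses an arbitrary sample value for B.

```python
import numpy as np
from numpy.polynomial import legendre

# Magnetic-drift pitch-angle factor (1 + xi**2) / 2 as power-series coefficients in xi.
drift_legendre = legendre.poly2leg([0.5, 0.0, 0.5])   # coefficients of P_0, P_1, P_2

# Parallel-conductivity drive xi * B for an illustrative field strength B.
B = 1.7
parallel_legendre = legendre.poly2leg([0.0, B])        # coefficients of P_0, P_1

print(np.round(drift_legendre, 6), parallel_legendre)
```

The drift factor lands only in the P_0 and P_2 slots and the parallel drive only in the P_1 slot, which is the row structure the unit tests pin down.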

The vmex/booz_xform_jax backend unit tests now also protect the imported Boozer handedness convention directly. Scalar and profile forms must leave B_\zeta + \iota B_\theta >= 0, the same convention as the file-backed loader, before the solver evaluates the Boozer Jacobian and drift source terms.
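A minimal sketch of how the sign convention and the normalization identity \mathcal J B^2 = B_\zeta + \iota B_\theta can be checked together before a solve. The function names, the single-surface inputs, and the choice to raise instead of flipping handedness are assumptions for illustration, not the vmex/booz_xform_jax adapter.

```python
import numpy as np


def boozer_jacobian(b_mod, b_theta, b_zeta, iota):
    """Jacobian from the identity J * B**2 = B_zeta + iota * B_theta on one surface."""
    g_plus_iota_i = b_zeta + iota * b_theta
    if np.any(np.asarray(g_plus_iota_i) < 0.0):
        # Hypothetical policy: refuse the input instead of silently flipping handedness.
        raise ValueError("B_zeta + iota * B_theta < 0; fix the handedness first")
    return g_plus_iota_i / np.asarray(b_mod, dtype=float) ** 2


def identity_residual(jac, b_mod, b_theta, b_zeta, iota):
    """Relative residual of J * B**2 - (B_zeta + iota * B_theta)."""
    lhs = np.asarray(jac) * np.asarray(b_mod, dtype=float) ** 2
    rhs = b_zeta + iota * b_theta
    return float(np.max(np.abs(lhs - rhs)) / np.max(np.abs(rhs)))
```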

The direct boozmn loader also has a half-grid radial-coordinate gate. Packed Boozer spectra and Boozer radial profiles are selected with s_in, s_b, or jlist = compute_surfs + 2; the full-grid phi_b profile is metadata and must not shift the selected surface. tests/test_boozmn.py protects the fixture half-grid selection, and examples/boozmn_same_coordinate_roundtrip_audit.py provides the artifact-backed VMEC-to-Boozer-file round trip.
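The index bookkeeping can be sketched as below, assuming the standard VMEC half grid s_{j-1/2} = (j - 1.5)/(ns - 1) for the 1-based radial index j = 2, ..., ns, so that a 0-based half-grid index compute_surfs maps to jlist = compute_surfs + 2. The helper name is hypothetical; tests/test_boozmn.py remains the maintained check.

```python
import numpy as np


def select_half_grid_surface(ns, s_target):
    """Return (compute_surf, jlist, s_b) for the half-grid surface nearest s_target."""
    s_half = (np.arange(ns - 1) + 0.5) / (ns - 1)    # VMEC half-grid radial points
    compute_surf = int(np.argmin(np.abs(s_half - s_target)))
    return compute_surf, compute_surf + 2, float(s_half[compute_surf])


print(select_half_grid_surface(5, 0.4))
# → (1, 3, 0.375)
```

Matching s_target against the full grid s_j = (j - 1)/(ns - 1) instead would select s = 0.5 for the same inputs, which is the kind of surface shift the gate rules out.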

The same audit is also committed on an optimized finite-beta QA wout. The accepted file-backed path is the finalized wout magnetic-channel transform, not reconstruction of an equilibrium state from output coefficients. tests/test_vmex_backend.py protects the current root-level read_wout API, the profile_source="wout"/automatic routes, and the traceable core Boozer-table adapter, while boozmn_finite_beta_wout_roundtrip_audit.json keeps the resulting same-coordinate D11/D31/D13/D33 mismatch below 1e-6.

The operator unit tests also include a derivative gate for the implicit-adjoint lane: the hand-coded dD_k/dnu_hat and dD_k/depsi_hat blocks must match JAX differentiation of the assembled Legendre-space operator. This is a fast way to protect sensitivity-analysis, inverse-design, and uncertainty workflows from normalization drift in the collisionality and radial-electric-field terms.

The committed prepared-derivative benchmark is also checked as an artifact gate: derivative_path_benchmark.json must keep the prepared custom-VJP derivative within 1e-4 relative mismatch of direct reverse-mode. The benchmark still reports speedup and XLA temporary memory, but CI treats agreement as the release claim. tests/test_solver_derivative_audit.py separately requires independent full primal and algebraic-transpose residuals before the public audit marks a derivative valid. The committed low-collisionality probe demonstrates the opposite case: derivative methods agree, but the primal residual rejects the claim. The same artifact requires two successive resolution refinements below 1e-1 for the non-degenerate dD33/dnu_hat audit, with every resolution independently passing primal, transpose, and derivative-method gates.
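The validity logic above can be illustrated on a toy linear system. The sketch assumes a hypothetical operator A(nu) = A0 + nu C and scalar output f(nu) = w^T A(nu)^{-1} b. It computes the adjoint derivative df/dnu = -lambda^T (dA/dnu) x with A^T lambda = w, and only marks it valid when the primal residual, the transpose residual, and the mismatch against a centered finite difference all pass. Centered finite differences stand in here for the direct reverse-mode reference, and all tolerances are illustrative.

```python
import numpy as np


def assemble(nu):
    """Hypothetical nu-dependent operator A(nu) = A0 + nu * C and its derivative C."""
    a0 = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    c = np.array([[1.0, 0.0, 0.5], [0.0, 2.0, 0.0], [0.0, 0.5, 1.0]])
    return a0 + nu * c, c


def audited_derivative(nu, b, w, rel_tol=1e-4, res_tol=1e-10, fd_step=1e-5):
    """Adjoint derivative of f(nu) = w @ A(nu)^{-1} b, gated by residual checks."""
    a, da_dnu = assemble(nu)
    x = np.linalg.solve(a, b)                       # primal solve
    lam = np.linalg.solve(a.T, w)                   # adjoint (transpose) solve
    primal_res = np.linalg.norm(a @ x - b) / np.linalg.norm(b)
    transpose_res = np.linalg.norm(a.T @ lam - w) / np.linalg.norm(w)
    adjoint = float(-lam @ (da_dnu @ x))            # df/dnu = -lam^T (dA/dnu) x

    def f(nu_value):
        return w @ np.linalg.solve(assemble(nu_value)[0], b)

    centered_fd = (f(nu + fd_step) - f(nu - fd_step)) / (2.0 * fd_step)
    mismatch = abs(adjoint - centered_fd) / max(abs(centered_fd), 1e-300)
    valid = bool(primal_res < res_tol and transpose_res < res_tol and mismatch < rel_tol)
    return {"derivative": adjoint, "primal_residual": primal_res,
            "transpose_residual": transpose_res, "mismatch": mismatch, "valid": valid}
```

Agreement between the two derivative methods is not enough on its own: in the low-collisionality case described above, the residual checks are what reject the claim.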

The committed geometry and boundary-control derivative artifacts are now also checked by the same physics-gate registry. The passing gates cover the owned analytic geometry-control audit (2e-4), file-backed Boozer/VMEC samples (5e-4), boundary-projected forward-mode current derivatives (1e-5), and explicit-relaxed QA/QH boundary-to-current derivatives (1e-4). The implicit-equilibrium derivative artifact remains a monitored non-shipping diagnostic because only the equilibrium-volume objective currently matches centered finite differences and the residual history does not contract.

The physics-facing gate structure is documented separately in physics-gates.md. The test suite and benchmark scripts are meant to enforce that gate hierarchy, not to replace it.

The maintained claim-to-artifact map is documented in benchmark-matrix.md and can be regenerated with:

python scripts/build_benchmark_matrix.py

The committed performance-scaling artifacts are also schema checked in the validation lane. That check requires each CPU/GPU smoke/heavy JSON to carry peak resident memory, device-count metadata, positive timings/rates, matching scan sizes, and numerical agreement between serial, device-parallel, and multiprocess D11 outputs. It deliberately does not require a speedup, since the correct interpretation is hardware- and workload-dependent.
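A sketch of such a schema check on a single artifact. Every key name (peak_rss_mb, device_count, timings_s, scan_size, d11) and the mode labels are hypothetical stand-ins for whatever the committed JSON actually uses.

```python
import math

REQUIRED_KEYS = ("peak_rss_mb", "device_count", "timings_s", "scan_size", "d11")
MODES = ("serial", "device_parallel", "multiprocess")


def check_scaling_artifact(artifact, rtol=1e-10):
    """Return schema problems for one scaling artifact (hypothetical key names)."""
    problems = [f"missing {k}" for k in REQUIRED_KEYS if k not in artifact]
    if problems:
        return problems
    if artifact["peak_rss_mb"] <= 0:
        problems.append("peak_rss_mb must be positive")
    if artifact["device_count"] < 1:
        problems.append("device_count must be >= 1")
    for mode in MODES:
        timing = artifact["timings_s"].get(mode)
        if timing is None or timing <= 0:
            problems.append(f"timing for {mode} must be positive")
        d11 = artifact["d11"].get(mode)
        if d11 is None or len(d11) != artifact["scan_size"]:
            problems.append(f"d11[{mode}] does not match scan_size")
    if not problems:
        reference = artifact["d11"]["serial"]
        for mode in MODES[1:]:
            pairs = zip(reference, artifact["d11"][mode])
            if not all(math.isclose(a, b, rel_tol=rtol) for a, b in pairs):
                problems.append(f"d11[{mode}] disagrees with serial")
    # Deliberately no speedup requirement: it is hardware- and workload-dependent.
    return problems
```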

The prepared-geometry reuse profile is checked in the publication/example lane. The smoke test regenerates a tiny artifact and requires exact prepared/direct agreement, compiled/direct agreement below 1e-7, and a positive compiled steady-state speedup. The committed paper artifact is used for documentation and performance guidance, not as a physics validation gate.

Running The Suite

Full local suite:

python -m pytest -q

Coverage:

python -m pytest --cov=src/ntx --cov-report=term-missing -q

Module-wise coverage summary from a coverage json file:

coverage json -o coverage.json
python scripts/build_coverage_report.py \
  --json-input coverage.json \
  --json-output coverage-report.json \
  --text-output coverage-report.txt

Lint and type-check:

python -m ruff check .
python -m mypy src/ntx

Docs build:

python -m sphinx -b html docs docs/_build/html

What The Tests Cover

Representative test groups:

geometry and Fourier series:
- tests/test_geometry.py
- tests/test_vmec.py
- tests/test_vmex_vmec.py
operator assembly:
- tests/test_operators.py
dense solver and scans:
- tests/test_solver.py
- tests/test_parallel.py
- tests/test_multiprocess_parallel.py
CLI and file-backed outputs:
- tests/test_cli.py
- tests/test_inputfiles.py
NEOPAX mapping:
- tests/test_neopax_adapter.py
- tests/test_neopax_arrays.py
autodiff and optimization helpers:
- tests/test_autodiff.py
- tests/test_autodiff_profile_uncertainty_example.py
- tests/test_bootstrap_current_robust_optimization_example.py
- tests/test_file_backed_geometry_control_derivative_benchmark_example.py
- tests/test_explicit_relaxed_boundary_current_derivative_benchmark_example.py
profile workflows:
- tests/test_profiles_unit.py
- tests/test_profiles_workflows.py
- tests/test_profile_force_reconstruction_audit_example.py
coverage and validation summaries:
- tests/test_build_coverage_report_script.py
- tests/test_physics_gates.py
- tests/test_benchmark_matrix.py
example and figure scripts:
- tests/test_make_publication_figures.py
- tests/test_validation_summary_example.py
- tests/test_bootstrap_current_optimization_example.py

CI Coverage

The GitHub Actions tests workflow now combines coverage across the Python 3.11 test shards and publishes:

raw .coverage data
coverage.json
coverage-report.json
coverage-report.txt

The coverage report is intended to answer two different questions:

what the current repository-owned line coverage actually is by module,
which large files need restructuring or additional tests first.

It should not be interpreted as a physics-validation substitute. Physics gates and literature benchmarks remain separate acceptance surfaces.

Recent hardening work used this report to target the refactored workflow modules directly. The current full CI coverage report for the split lanes is above the release threshold:

overall repository-owned coverage is 99.0%,
src/ntx/neopax.py, src/ntx/_neopax_io.py, src/ntx/_neopax_types.py, and src/ntx/_neopax_bridge.py are now at or above 97%,
src/ntx/_neopax_field.py, the imported vmex/booz_xform_jax field bridge, is now at 98.1%,
the split autodiff workflow owners (src/ntx/_autodiff_workflows.py, src/ntx/_autodiff_inverse.py, src/ntx/_autodiff_derivatives.py, and src/ntx/_autodiff_profile.py), src/ntx/_profiles_eval.py, src/ntx/_profiles_transport.py, and src/ntx/_profiles_controls.py are all covered by the fast unit/workflow lanes, and the split transport-closure owner is exercised by the same profile tests,
the split scan owners src/ntx/_solver_scan_core.py, src/ntx/_solver_scan_execution.py, and src/ntx/_solver_scan_parallel.py, plus src/ntx/parallel.py, src/ntx/cli.py, src/ntx/io.py, and src/ntx/database.py are at or above 98%.

Those gains come from narrow branch tests in the unit/workflow lanes, not from adding slower benchmark execution to the default developer loop.

The coverage-report helper accepts both absolute and relative src/ntx/... paths from coverage json, which keeps local reports and GitHub Actions summaries comparable. Coverage is a release gate, not a replacement for the literature-anchored physics gates and benchmark artifacts.
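The path handling can be sketched as a normalization to one repository-relative key. This is a minimal sketch under the assumption that every owned module lives under a src/ntx directory; the function name is hypothetical and not the code in scripts/build_coverage_report.py.

```python
from pathlib import PurePosixPath


def normalize_ntx_path(path):
    """Return the repository-relative "src/ntx/..." form of a coverage path, or None."""
    parts = PurePosixPath(path.replace("\\", "/")).parts
    # Search from the end so a checkout directory that happens to contain
    # "src/ntx" earlier in its absolute path does not win.
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == "src" and parts[i + 1] == "ntx":
            return "/".join(parts[i:])
    return None
```

With this mapping, a local relative path and an absolute GitHub Actions runner path for the same module produce the same key, so per-module numbers can be compared directly.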

GPU Validation

GPU-only smoke tests carry the gpu pytest marker and can be run with:

python -m pytest -m gpu -q

The runtime probes are:

python scripts/run_gpu_regression.py --output-json gpu-smoke-results.json
python scripts/profile_runtime.py --backend gpu --output-json runtime-profile-gpu.json
python scripts/profile_parallel_runtime.py --output-json parallel-runtime.json
python scripts/profile_multiprocess_runtime.py --backend gpu --workers 2

CI exercises profile_parallel_runtime.py with a reduced smoke profile (--num-cases 2 --grid 5,5,4) so the shard checks serial/device-parallel agreement without turning the default profiling run into a CI bottleneck.

Cross-Checks Against Independent Workflows

NTX is designed to stand on its own, but it is still valuable to compare its output against independent neoclassical workflows.

The repository therefore keeps:

direct W7-X bootstrap-current convergence audits
optional NEOPAX-coupled checks
optional SFINCS-JAX-based consistency studies when that package is available
profile-workflow regression tests split into:
- fast unit/workflow tests
- artifact-backed benchmark-family audits

These comparisons are used as trust-building validation, not as the definition of NTX itself.

Physics Gate Report

The tracked benchmark gates can be summarized from committed artifacts with:

python scripts/check_physics_gates.py

This is the fastest way to distinguish:

analytical identities and exact-recovery gates,
independent-code comparison gates,
integrated-workflow transfer gates,
and closure stress metrics that are monitored but not promoted to parity claims.
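One way to picture this distinction is a registry in which stress metrics are recorded but never block a release. The dataclass, the kind labels, and the strict less-than comparison are assumptions for illustration; they are not the schema used by scripts/check_physics_gates.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    kind: str          # hypothetical labels: "analytic", "independent_code", "transfer", "stress"
    value: float
    threshold: float

    @property
    def passed(self) -> bool:
        return self.value < self.threshold


def summarize(gates):
    """Split gates into release-blocking failures and monitored stress metrics."""
    blocking = [g.name for g in gates if g.kind != "stress" and not g.passed]
    monitored = [g.name for g in gates if g.kind == "stress"]
    return blocking, monitored
```

A failing stress metric shows up in the monitored list but never in the blocking list, which keeps closure comparisons from being read as parity claims.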

What To Check Before Claiming A New Physics Result

Before publishing a new equilibrium scan or optimization result:

converge N_xi at the lowest collisionality used in the study
converge N_theta and N_zeta on D31
check onsager_residual
compare serial and parallel scan results on a subset
inspect the output file graphically with plot_output_file.py
add or update the benchmark-matrix entry before promoting the result
if the workflow feeds NEOPAX, regenerate the monoenergetic database and inspect the resulting radial profiles
if the workflow uses automatic differentiation, compare the reported derivative with centered finite differences on the same scalar output
if the workflow uses imported equilibrium or Boozer transformations, state whether the gate is projected-boundary, explicitly relaxed equilibrium, or implicit-equilibrium sensitivity