Testing And Quality Assurance
NTX is validated at four levels:
unit tests of geometry, operators, and solver algebra
regression tests of file-driven workflows and publication examples
imported-workflow tests for autodiff, NEOPAX, and JAX geometry backends
CPU/GPU runtime and smoke checks
The repository now treats these as separate execution lanes, not one monolithic developer loop:
fast PR lane:
lint
type checking
sharded core unit/workflow/validation tests
selected integration/example tests
docs build
benchmark lane:
literature reproductions
long-running fixed-field and W7-X audits
scaling/profiling studies
hardware lane:
GPU smoke
office workstation profiling
This split is intentional. It keeps pull-request feedback fast while still tracking literature-grade validation and throughput studies in reproducible scripts.
Current Testing Priorities
The fast lane has already reached the coverage target, so new tests should be chosen for scientific value first. Do not add slow examples only to increase the headline coverage number.
Near-term high-value gates are:
monoenergetic convergence ladders for D11, D31, D33, and onsager_residual on small repository-owned geometries,
Boozer-coordinate normalization identities such as \mathcal J B^2 = B_\zeta + \iota B_\theta before transport solves consume the geometry,
symmetric-limit checks where a constant-field Boozer surface has no radial transport channels but retains the finite parallel-conductivity branch,
artifact-backed reproductions of larger W7-X, precise-QS, and QI-family literature cases,
optional public VMEC family transport ladders that stay artifact-backed rather than running inside normal pull-request shards,
derivative audits that compare direct AD, prepared implicit-adjoint derivatives, forward-mode geometry controls, and centered finite differences on the same scalar outputs,
profile and bootstrap-current workflow tests that are run only after the underlying coefficient and derivative gates are already green,
explicit tests that fixed-field closure comparisons remain stress gates unless they also pass the integrated W7-X transfer gate.
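As a minimal sketch of the Boozer normalization gate listed above, the identity \mathcal J B^2 = B_\zeta + \iota B_\theta can be checked pointwise on a sampled surface before any transport solve runs. The function name, argument names, and the synthetic field below are illustrative assumptions, not the repository's API or data.

```python
import math

def boozer_identity_residual(jacobian, b_mod, b_zeta, b_theta, iota):
    """Max relative residual of J B^2 = B_zeta + iota B_theta over sampled points."""
    rhs = b_zeta + iota * b_theta
    return max(abs(j * b * b - rhs) / abs(rhs) for j, b in zip(jacobian, b_mod))

# Synthetic single-surface sample (illustrative numbers, not repository data).
b_zeta, b_theta, iota = 5.0, 0.1, 0.9
thetas = [2.0 * math.pi * i / 16 for i in range(16)]
b_mod = [1.0 + 0.1 * math.cos(t) for t in thetas]

# A Jacobian built from the identity passes; one carrying a stray 2*pi
# normalization factor is caught by the same check.
good_jacobian = [(b_zeta + iota * b_theta) / (b * b) for b in b_mod]
bad_jacobian = [2.0 * math.pi * j for j in good_jacobian]

good_residual = boozer_identity_residual(good_jacobian, b_mod, b_zeta, b_theta, iota)
bad_residual = boozer_identity_residual(bad_jacobian, b_mod, b_zeta, b_theta, iota)
```

A gate of this shape is cheap enough for the fast lane because it needs only the geometry arrays, not a solve.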
Every new benchmark-like test must declare its lane before it is added to CI:
core_foundation: small algebra, geometry, operator, solver, and helper unit tests,
core_cli_workflows: small public API, CLI, namespace, packaging, and example-discovery tests,
core_io_workflows: input-file, profile-script, and VMEC scan workflow tests,
core_parallel_workflows: CPU/GPU script-dispatch and multiprocessing workflow tests,
core_neopax_workflows: imported-database mapping and HDF5 round-trip tests,
core_profile_audit_workflow: primitive-force audit artifact workflow test,
core_profile_basic_workflows: ambipolar-profile and profile-family tests,
core_profile_optimization_workflows: scalar and basis profile-control tests,
core_profile_transport_workflows: conservative profile-transport loop tests,
core_autodiff_uncertainty_workflow: autodiff uncertainty artifact workflow,
core_robust_bootstrap_workflow: robust-bootstrap artifact workflow,
core_validation: small validation, artifact-registry, and physics-gate tests,
integration_examples: representative imported workflow tests,
heavy_examples_profiles: slower profile examples,
heavy_examples_derivatives: local derivative benchmark examples,
heavy_examples_boundary: env-gated imported boundary/equilibrium derivative examples,
heavy_examples_publication: publication/manuscript figure examples,
manual benchmark: long literature reproduction or hardware profiling.
The GitHub Actions sharding is driven by the maintained manifest:
python scripts/test_lane_manifest.py --check
python scripts/test_lane_manifest.py core_foundation
python scripts/test_lane_manifest.py core_cli_workflows
python scripts/test_lane_manifest.py core_io_workflows
python scripts/test_lane_manifest.py core_parallel_workflows
python scripts/test_lane_manifest.py core_neopax_workflows
python scripts/test_lane_manifest.py core_profile_audit_workflow
python scripts/test_lane_manifest.py core_profile_basic_workflows
python scripts/test_lane_manifest.py core_profile_optimization_workflows
python scripts/test_lane_manifest.py core_profile_transport_workflows
python scripts/test_lane_manifest.py core_autodiff_uncertainty_workflow
python scripts/test_lane_manifest.py core_robust_bootstrap_workflow
python scripts/test_lane_manifest.py core_validation
python scripts/test_lane_manifest.py integration_examples
python scripts/test_lane_manifest.py heavy_examples_profiles
python scripts/test_lane_manifest.py heavy_examples_derivatives
python scripts/test_lane_manifest.py heavy_examples_boundary
python scripts/test_lane_manifest.py heavy_examples_publication
Any new tests/test_*.py file must be added to the lane manifest. Slow modules
may be split by explicit pytest node ids, but every discovered test function in
that file must be listed exactly once. This prevents new benchmark scripts from
silently landing in the fast core shard.
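The exactly-once rule can be pictured with a small sketch. This is not the logic of scripts/test_lane_manifest.py, whose internals are not shown here; the helper name, the node ids, and the lane contents are hypothetical and only illustrate the invariant being enforced.

```python
from collections import Counter

def manifest_problems(discovered, lanes):
    """Return (missing, duplicated) pytest node ids for a lane manifest."""
    counts = Counter(node for nodes in lanes.values() for node in nodes)
    missing = sorted(set(discovered) - set(counts))
    duplicated = sorted(node for node, n in counts.items() if n > 1)
    return missing, duplicated

# Hypothetical discovered tests and manifest, for illustration only.
discovered = [
    "tests/test_a.py::test_fast",
    "tests/test_a.py::test_slow",
    "tests/test_b.py::test_io",
]
lanes = {
    "core_foundation": ["tests/test_a.py::test_fast"],
    "heavy_examples_profiles": [
        "tests/test_a.py::test_slow",
        "tests/test_a.py::test_fast",  # listed a second time: reported as duplicated
    ],
}

missing, duplicated = manifest_problems(discovered, lanes)
```

Here tests/test_b.py::test_io is reported as missing because no lane claims it, which is the situation the manifest check is meant to block.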
The imported boundary/equilibrium example reruns are intentionally opt-in:
NTX_RUN_HEAVY_BOUNDARY_EXAMPLES=1 \
python scripts/test_lane_manifest.py heavy_examples_boundary \
| xargs python -m pytest -q -m "not gpu"
Normal CI still checks the committed artifacts through the benchmark matrix and manuscript-artifact tests. The expensive boundary reruns are used when updating those artifacts.
Each CI test-shard job has a ten-minute job timeout. The subprocess-based parallel profiling smoke tests also set explicit subprocess timeouts, so a device-discovery or multiprocessing hang fails as a diagnosable test error instead of stalling the whole workflow.
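A minimal sketch of the subprocess-timeout pattern described above, using only the standard library. The helper name and the child commands are illustrative stand-ins, not the repository's profiling scripts.

```python
import subprocess
import sys

def run_with_timeout(argv, timeout_s):
    """Run a child process and turn a hang into an explicit failure message."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising, so nothing lingers.
        return False, f"timed out after {exc.timeout} s: {' '.join(argv)}"
    if result.returncode != 0:
        return False, f"exit code {result.returncode}: {result.stderr.strip()}"
    return True, result.stdout.strip()

# A well-behaved child finishes inside its budget.
ok, message = run_with_timeout(
    [sys.executable, "-c", "print('devices ok')"], timeout_s=30
)

# A child that stalls (a stand-in for a device-discovery hang) is reported
# as a timeout instead of blocking the caller.
hung_ok, hang_message = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
```

In a pytest test the failure branch would typically become an assertion message, so the CI log names the command that stalled.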
The first fast owned-geometry coefficient gate is in
tests/test_physics_gates.py. It checks D11, D31, D33, the Onsager
residual, and a coarse-to-fine angular-grid transfer on the analytic Boozer
surface. Larger literature-family convergence ladders remain artifact-backed
work rather than default pull-request tests. The current broad VMEC-family
artifact is generated by examples/geometry_family_transport_convergence.py
and is tracked through the benchmark matrix and physics-gate registry as a
monitored stress diagnostic.
The committed validation_summary.json artifact is now checked by the physics
gate registry as the release-level monoenergetic convergence artifact. The gate
requires the maximum of the DKES-style and VMEC finest plotted N_\xi errors to
stay below 2.5e-1; the example test also verifies that the plotted convergence
ladder decreases toward the finest reference point.
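The two conditions in this gate can be sketched as follows. The function, the dictionary keys, and the sample errors are hypothetical, and "decreases toward the finest reference point" is read here as strictly decreasing from coarse to fine, which is an assumption about the actual test.

```python
def convergence_gate(dkes_errors, vmec_errors, threshold=2.5e-1):
    """Check the finest-point error bound and that each ladder decreases.

    Both error lists are ordered from the coarsest to the finest N_xi.
    """
    def decreasing(errors):
        return all(later < earlier for earlier, later in zip(errors, errors[1:]))

    finest = max(dkes_errors[-1], vmec_errors[-1])
    return {
        "finest_error": finest,
        "below_threshold": finest < threshold,
        "monotone": decreasing(dkes_errors) and decreasing(vmec_errors),
    }

# Illustrative error ladders, not values from validation_summary.json.
report = convergence_gate(
    dkes_errors=[0.9, 0.4, 0.12],
    vmec_errors=[1.1, 0.5, 0.2],
)
```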
The same file also carries the symmetric-limit normalization gates: a
constant-field Boozer surface must have zero radial transport channels, finite
parallel conductivity, exact agreement with the D33_spitzer branch, and the
expected inverse-collisionality scaling of that Spitzer branch.
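One way to express the inverse-collisionality check is a log-log slope test, sketched below. The helper name, the collisionality values, and the synthetic coefficient are assumptions for illustration; the real gate's tolerances and sign conventions are not reproduced here.

```python
import math

def loglog_slopes(nu_values, d33_magnitudes):
    """Slopes d log|D33| / d log(nu) between consecutive scan points."""
    pairs = list(zip(nu_values, d33_magnitudes))
    return [
        (math.log(d1) - math.log(d0)) / (math.log(n1) - math.log(n0))
        for (n0, d0), (n1, d1) in zip(pairs, pairs[1:])
    ]

# Synthetic Spitzer-like branch with |D33| proportional to 1/nu.
nu = [1e-3, 1e-2, 1e-1]
d33 = [0.7 / n for n in nu]

slopes = loglog_slopes(nu, d33)
scaling_ok = all(abs(slope + 1.0) < 1e-9 for slope in slopes)
```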
The operator unit tests protect the finite Legendre source projection directly:
the magnetic-drift drive must enter only the k=0 and k=2 rows with the
2/3 and 1/3 weights used by the monoenergetic moment equations, while the
parallel-conductivity drive must enter only the k=1 row as the physical B.
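The 2/3 and 1/3 weights follow from projecting the pitch-angle dependence of the drive onto Legendre polynomials. The sketch below assumes the standard monoenergetic forms, (1 + \xi^2)/2 for the magnetic-drift drive and \xi for the parallel drive, and uses exact rational arithmetic; it does not reproduce NTX's operator code or its B normalization.

```python
from fractions import Fraction

# Legendre polynomials P_0..P_3 as coefficient lists in ascending powers of xi.
LEGENDRE = [
    [Fraction(1)],
    [Fraction(0), Fraction(1)],
    [Fraction(-1, 2), Fraction(0), Fraction(3, 2)],
    [Fraction(0), Fraction(-3, 2), Fraction(0), Fraction(5, 2)],
]

def integrate_product(p, q):
    """Exact integral of p(xi) * q(xi) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

def legendre_rows(source):
    """Coefficients c_k with source = sum_k c_k P_k, for k = 0..3."""
    return [
        Fraction(2 * k + 1, 2) * integrate_product(source, p_k)
        for k, p_k in enumerate(LEGENDRE)
    ]

# (1 + xi^2) / 2: assumed pitch-angle dependence of the magnetic-drift drive.
drift_rows = legendre_rows([Fraction(1, 2), Fraction(0), Fraction(1, 2)])
# xi: assumed pitch-angle dependence of the parallel-conductivity drive.
parallel_rows = legendre_rows([Fraction(0), Fraction(1)])
```

Under these assumptions drift_rows is nonzero only at k=0 and k=2 and parallel_rows only at k=1, which is the row pattern the unit tests protect.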
The vmec_jax/booz_xform_jax backend unit tests now also protect the
imported Boozer handedness convention directly. Scalar and profile forms must
leave B_\zeta + \iota B_\theta >= 0, matching the file-backed loader before
the solver evaluates the Boozer Jacobian and drift source terms.
The direct boozmn loader also has a half-grid radial-coordinate gate. Packed
Boozer spectra and Boozer radial profiles are selected with s_in, s_b, or
jlist = compute_surfs + 2; the full-grid phi_b profile is metadata and must
not shift the selected surface. tests/test_boozmn.py protects the fixture
half-grid selection, and examples/boozmn_same_coordinate_roundtrip_audit.py
provides the artifact-backed VMEC-to-Boozer-file round trip.
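The half-grid bookkeeping can be sketched as below. It assumes a uniform full grid in normalized flux with half-grid points midway between full-grid points, and zero-based compute_surfs indices into that half grid; only the relation jlist = compute_surfs + 2 is taken from the text, and the function names are hypothetical.

```python
def half_grid_s(ns_full):
    """Half-grid normalized flux values for a full grid of ns_full surfaces."""
    ns_half = ns_full - 1
    return [(j + 0.5) / ns_half for j in range(ns_half)]

def select_surfaces(ns_full, compute_surfs):
    """Map zero-based half-grid indices to jlist labels and s_b values."""
    s_in = half_grid_s(ns_full)
    return [
        {"compute_surf": i, "jlist": i + 2, "s_b": s_in[i]}
        for i in compute_surfs
    ]

# Illustrative case: 5 full-grid surfaces give 4 half-grid surfaces.
selection = select_surfaces(ns_full=5, compute_surfs=[0, 3])
```

The selected s_b values come only from the half grid, so a full-grid profile such as phi_b never enters the lookup.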
The same audit is also committed on an optimized finite-beta QA wout that
uses a current-profile representation not currently re-evaluated by the
optional differentiable VMEC-state path. The accepted path for that case is the
finalized wout magnetic-channel transform, not a reconstructed input-profile
state. tests/test_vmec_jax_backend.py protects the profile_source="wout"
and automatic fallback branches, while
boozmn_finite_beta_wout_roundtrip_audit.json keeps the resulting
same-coordinate D11/D31/D13/D33 mismatch below 1e-6.
The operator unit tests also include a derivative gate for the implicit-adjoint
lane: the hand-coded dD_k/dnu_hat and dD_k/depsi_hat blocks must match
JAX differentiation of the assembled Legendre-space operator. This is a fast
way to protect sensitivity-analysis, inverse-design, and uncertainty workflows
from normalization drift in the collisionality and radial-electric-field terms.
The committed prepared-derivative benchmark is also checked as an artifact gate:
derivative_path_benchmark.json must keep the prepared custom-VJP derivative
within 1e-4 relative mismatch of direct reverse-mode. The benchmark still
reports speedup, but CI treats agreement as the release claim.
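A hedged sketch of an artifact gate of this kind follows. The JSON keys, the sample numbers, and the max-norm definition of relative mismatch are all assumptions; the real derivative_path_benchmark.json schema may differ.

```python
import json

def relative_mismatch(prepared, direct):
    """Max absolute difference, scaled by the largest direct-derivative entry."""
    scale = max(abs(d) for d in direct)
    return max(abs(p - d) for p, d in zip(prepared, direct)) / scale

# Hypothetical artifact content; the key names are illustrative only.
artifact = json.loads(
    '{"prepared_vjp": [1.00001, -2.0],'
    ' "direct_reverse": [1.0, -2.0],'
    ' "speedup": 3.2}'
)

mismatch = relative_mismatch(artifact["prepared_vjp"], artifact["direct_reverse"])
# Agreement is the gated claim; the speedup is reported but never asserted.
gate_passes = mismatch < 1e-4
```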
The committed geometry and boundary-control derivative artifacts are now also
checked by the same physics-gate registry. The passing gates cover the owned
analytic geometry-control audit (2e-4), file-backed Boozer/VMEC samples
(5e-4), boundary-projected forward-mode current derivatives (1e-5), and
explicit-relaxed QA/QH boundary-to-current derivatives (1e-4). The
implicit-equilibrium derivative artifact remains a monitored non-shipping
diagnostic because only the equilibrium-volume objective currently matches
centered finite differences and the residual history does not contract.
The physics-facing gate structure is documented separately in
physics-gates.md. The test suite and benchmark scripts are
meant to enforce that gate hierarchy, not to replace it.
The maintained claim-to-artifact map is documented in
benchmark-matrix.md and can be regenerated with:
python scripts/build_benchmark_matrix.py
The committed performance-scaling artifacts are also schema checked in the
validation lane. That check requires each CPU/GPU smoke/heavy JSON to carry
peak resident memory, device-count metadata, positive timings/rates, matching
scan sizes, and numerical agreement between serial, device-parallel, and
multiprocess D11 outputs. It deliberately does not require a speedup, since
the correct interpretation is hardware- and workload-dependent.
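The shape of such a schema check might look like the sketch below. Every field name, the tolerance, and the sample record are hypothetical; the point is that agreement and metadata are required while speedup is not.

```python
import math

def scaling_artifact_problems(record, rel_tol=1e-10):
    """Return a list of problems; an empty list means the record passes."""
    problems = [
        f"missing {key}"
        for key in ("peak_rss_mb", "device_count")
        if key not in record
    ]
    modes = record["modes"]
    for name, mode in modes.items():
        if not (mode["seconds"] > 0 and mode["cases_per_second"] > 0):
            problems.append(f"{name}: non-positive timing or rate")
    if len({mode["num_cases"] for mode in modes.values()}) > 1:
        problems.append("scan sizes differ between modes")
    reference = modes["serial"]["d11"]
    for name, mode in modes.items():
        same = len(mode["d11"]) == len(reference) and all(
            math.isclose(a, b, rel_tol=rel_tol)
            for a, b in zip(mode["d11"], reference)
        )
        if not same:
            problems.append(f"{name}: D11 disagrees with serial")
    return problems

# Illustrative record in which the parallel modes are slower than serial.
record = {
    "peak_rss_mb": 512.0,
    "device_count": 2,
    "modes": {
        "serial": {"seconds": 1.0, "cases_per_second": 4.0, "num_cases": 4,
                   "d11": [0.10, 0.20, 0.30, 0.40]},
        "device_parallel": {"seconds": 1.5, "cases_per_second": 2.7, "num_cases": 4,
                            "d11": [0.10, 0.20, 0.30, 0.40]},
        "multiprocess": {"seconds": 2.0, "cases_per_second": 2.0, "num_cases": 4,
                         "d11": [0.10, 0.20, 0.30, 0.40]},
    },
}

problems = scaling_artifact_problems(record)
```

The record passes even though neither parallel mode is faster than serial, matching the decision not to gate on speedup.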
The prepared-geometry reuse profile is checked as a publication/example lane.
The smoke test regenerates a tiny artifact and requires exact prepared/direct
agreement, compiled/direct agreement below 1e-7, and a positive compiled
steady-state speedup. The committed paper artifact is used for documentation
and performance guidance, not as a physics validation gate.
Running The Suite
Full local suite:
python -m pytest -q
Coverage:
python -m pytest --cov=src/ntx --cov-report=term-missing -q
Module-wise coverage summary from a coverage json file:
coverage json -o coverage.json
python scripts/build_coverage_report.py \
--json-input coverage.json \
--json-output coverage-report.json \
--text-output coverage-report.txt
Lint and type-check:
python -m ruff check .
python -m mypy src/ntx
Docs build:
python -m sphinx -b html docs docs/_build/html
What The Tests Cover
Representative test groups:
geometry and Fourier series:
tests/test_geometry.py
tests/test_vmec.py
tests/test_vmec_jax_vmec.py
operator assembly:
tests/test_operators.py
dense solver and scans:
tests/test_solver.py
tests/test_parallel.py
tests/test_multiprocess_parallel.py
CLI and file-backed outputs:
tests/test_cli.py
tests/test_inputfiles.py
NEOPAX mapping:
tests/test_neopax_adapter.py
tests/test_neopax_arrays.py
autodiff and optimization helpers:
tests/test_autodiff.py
tests/test_autodiff_profile_uncertainty_example.py
tests/test_bootstrap_current_robust_optimization_example.py
tests/test_file_backed_geometry_control_derivative_benchmark_example.py
tests/test_explicit_relaxed_boundary_current_derivative_benchmark_example.py
profile workflows:
tests/test_profiles_unit.py
tests/test_profiles_workflows.py
tests/test_profile_force_reconstruction_audit_example.py
coverage and validation summaries:
tests/test_build_coverage_report_script.py
tests/test_physics_gates.py
tests/test_benchmark_matrix.py
example and figure scripts:
tests/test_make_publication_figures.py
tests/test_validation_summary_example.py
tests/test_bootstrap_current_optimization_example.py
CI Coverage
The GitHub Actions tests workflow now combines coverage across the Python 3.11 test
shards and publishes:
raw .coverage data
coverage.json
coverage-report.json
coverage-report.txt
The coverage report is intended to answer two different questions:
what the current repository-owned line coverage actually is by module,
which large files need restructuring or additional tests first.
It should not be interpreted as a physics-validation substitute. Physics gates and literature benchmarks remain separate acceptance surfaces.
Recent hardening work used this report to target the refactored workflow modules directly. The current full CI coverage report for the split lanes is above the release threshold:
overall repository-owned coverage is 99.0%,
src/ntx/neopax.py, src/ntx/_neopax_io.py, src/ntx/_neopax_types.py, and src/ntx/_neopax_bridge.py are now at or above 97%,
src/ntx/_neopax_field.py, the imported vmec_jax/booz_xform_jax field bridge, is now at 98.1%,
the split autodiff workflow owners (src/ntx/_autodiff_workflows.py, src/ntx/_autodiff_inverse.py, src/ntx/_autodiff_derivatives.py, and src/ntx/_autodiff_profile.py), src/ntx/_profiles_eval.py, src/ntx/_profiles_transport.py, and src/ntx/_profiles_controls.py are all covered by the fast unit/workflow lanes, and the split transport-closure owner is exercised by the same profile tests,
src/ntx/_solver_scan.py, src/ntx/parallel.py, src/ntx/cli.py, src/ntx/io.py, and src/ntx/database.py are at or above 98%.
Those gains come from narrow branch tests in the unit/workflow lanes, not from adding slower benchmark execution to the default developer loop.
The coverage-report helper accepts both absolute and relative src/ntx/...
paths from coverage json, which keeps local reports and GitHub Actions
summaries comparable. Coverage is a release gate, not a replacement for the
literature-anchored physics gates and benchmark artifacts.
GPU Validation
GPU-only smoke tests are marked and can be run with:
python -m pytest -m gpu -q
The runtime probes are:
python scripts/run_gpu_regression.py --output-json gpu-smoke-results.json
python scripts/profile_runtime.py --backend gpu --output-json runtime-profile-gpu.json
python scripts/profile_parallel_runtime.py --output-json parallel-runtime.json
python scripts/profile_multiprocess_runtime.py --backend gpu --workers 2
CI exercises profile_parallel_runtime.py with a reduced smoke profile
(--num-cases 2 --grid 5,5,4) so the shard checks serial/device-parallel
agreement without turning the default profiling run into a CI bottleneck.
Cross-Checks Against Independent Workflows
NTX is designed to stand on its own, but it is still valuable to compare its output against independent neoclassical workflows.
The repository therefore keeps:
direct W7-X bootstrap-current convergence audits
optional NEOPAX-coupled checks
optional SFINCS-JAX-based consistency studies when that package is available
profile-workflow regression tests split into:
fast unit/workflow tests
artifact-backed benchmark-family audits
These comparisons are used as trust-building validation, not as the definition of NTX itself.
Physics Gate Report
The tracked benchmark gates can be summarized from committed artifacts with:
python scripts/check_physics_gates.py
This is the fastest way to distinguish:
analytical identities and exact-recovery gates,
independent-code comparison gates,
integrated-workflow transfer gates,
and closure stress metrics that are monitored but not promoted to parity claims.
What To Check Before Claiming A New Physics Result
Before publishing a new equilibrium scan or optimization result:
converge N_xi at the lowest collisionality used in the study
converge N_theta and N_zeta on D31
check onsager_residual
compare serial and parallel scan results on a subset
inspect the output file graphically with plot_output_file.py
add or update the benchmark-matrix entry before promoting the result
if the workflow feeds NEOPAX, regenerate the monoenergetic database and inspect the resulting radial profiles
if the workflow uses automatic differentiation, compare the reported derivative with centered finite differences on the same scalar output
if the workflow uses imported equilibrium or Boozer transformations, state whether the gate is projected-boundary, explicitly relaxed equilibrium, or implicit-equilibrium sensitivity
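For the automatic-differentiation item in the checklist above, a centered finite-difference comparison on a single scalar output can be sketched as follows. The scalar function is a stand-in with a known analytic derivative, not an NTX transport coefficient, and the step size and helper names are illustrative choices.

```python
def centered_difference(f, x, h=1e-5):
    """Second-order centered finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def derivative_mismatch(reported, f, x, h=1e-5):
    """Relative mismatch between a reported derivative and centered differences."""
    reference = centered_difference(f, x, h)
    return abs(reported - reference) / max(abs(reference), 1e-30)

# Stand-in scalar output; in practice this would wrap the same solve that
# produced the reported derivative.
def scalar_output(nu_hat):
    return 1.0 / (1.0 + nu_hat) ** 2

x0 = 0.3
reported = -2.0 / (1.0 + x0) ** 3  # analytic derivative of the stand-in
mismatch = derivative_mismatch(reported, scalar_output, x0)
```

Repeating the comparison at two or three step sizes helps separate truncation error from round-off before a mismatch is attributed to the derivative path.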