SFINCS-JAX RHSMode=1 Profile-Current Handoff

This note records the finite-beta RHSMode=1 profile-current diagnostics used by the NTX validation lane. The coefficient-level finite-beta transport-matrix ladder already passes the 1e-1 coefficient gate at the profiled radii, and the same-geometry profile-current calculation is now fast enough to run the pitch/velocity/radial convergence ladder used by the reduced-closure stress gate.

Goal

Run SFINCS-JAX, Redl, and NTX+NEOPAX on the same owned finite-beta QA VMEC geometry and analytic profile contract, then use the completed RHSMode=1 SFINCS-JAX profile-current outputs to separate:

  • profile-current normalization errors,

  • radial/profile interpolation errors,

  • reduced momentum-closure errors,

  • and raw solver/convergence errors.

The finite-beta bootstrap-current comparison is closed as a reduced-closure stress benchmark. It is not promoted as a broad full-collision parity claim.

Physics and Normalization Contract

The owned deck is generated by examples/owned_finite_beta_sfincs_jax_profile_current_audit.py.

Key settings:

  • finite-beta QA VMEC wout: /Users/rogeriojorge/local/single_stage_optimization_finite_beta/test/wout_LandremanPaul2021_QA_lowres_pressure_current.nc

  • RHSMode = 1

  • geometryScheme = 5

  • inputRadialCoordinate = 3

  • inputRadialCoordinateForGradients = 3

  • includeXDotTerm = .true.

  • includeElectricFieldTermInXiDot = .true.

  • useDKESExBDrift = .false.

  • includePhi1 = .false.

  • dPhiHatdrN = 0.0

  • ion/electron species order: Zs = 1, -1

  • mHats = 2, 1/1836.15267343

  • density and temperature are normalized as nHat = n / (1e20 m^-3) and THat = T / (1 keV)

  • profile gradients are with respect to rN = r/a = rho; SFINCS-JAX converts internally to the radial coordinate used by the solver.

The SFINCS-JAX current observable is converted to SI units as

<J.B>/sqrt(<B^2>) [A m^-2]
  = FSABjHatOverRootFSAB2 * e * (1e20 m^-3) * sqrt(2 * 1 keV / m_p)

For the reference density and temperature above, the scale is 7.012642439557152e6 A m^-2.
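The scale is e * nBar * vBar with vBar = sqrt(2 * 1 keV / m_p). A minimal sketch that reproduces it, assuming CODATA 2018 constants (the audit script may use a different constant source; that only changes the tenth significant digit or so):

```python
import math

# SI constants. e is exact; the proton mass is the CODATA 2018 value (assumed).
E_CHARGE = 1.602176634e-19       # C
PROTON_MASS = 1.67262192369e-27  # kg

N_BAR = 1.0e20                   # reference density, m^-3
T_BAR = 1.0e3 * E_CHARGE         # reference temperature, 1 keV in J
V_BAR = math.sqrt(2.0 * T_BAR / PROTON_MASS)  # reference speed, m/s

# Current-density scale e * nBar * vBar, in A m^-2.
CURRENT_SCALE = E_CHARGE * N_BAR * V_BAR


def to_si_current(fsab_jhat_over_root_fsab2: float) -> float:
    """Convert FSABjHatOverRootFSAB2 to <J.B>/sqrt(<B^2>) in A m^-2."""
    return fsab_jhat_over_root_fsab2 * CURRENT_SCALE


print(f"{CURRENT_SCALE:.6e}")                        # 7.012642e+06
print(f"{to_si_current(-0.44600080476476256):.6e}")  # -3.127644e+06
```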

The SFINCS-JAX HDF5 output uses

FSABjHat = sum_s Z_s * FSABFlow_s
FSABjHatOverRootFSAB2 = FSABjHat / sqrt(FSABHat2)

which matches the normalization used by the NTX diagnostic script.
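For reference, the HDF5 normalization can be written out as two small functions. The per-species flows and FSABHat2 in the usage line are placeholders, not values from the owned deck.

```python
import math
from typing import Sequence


def fsab_jhat(charges: Sequence[float], fsab_flows: Sequence[float]) -> float:
    """FSABjHat = sum_s Z_s * FSABFlow_s over species s."""
    if len(charges) != len(fsab_flows):
        raise ValueError("need one FSABFlow entry per species")
    return sum(z * flow for z, flow in zip(charges, fsab_flows))


def fsab_jhat_over_root_fsab2(charges: Sequence[float],
                              fsab_flows: Sequence[float],
                              fsab_hat2: float) -> float:
    """FSABjHatOverRootFSAB2 = FSABjHat / sqrt(FSABHat2)."""
    return fsab_jhat(charges, fsab_flows) / math.sqrt(fsab_hat2)


# Species order from the deck: ions (Z = 1), then electrons (Z = -1).
# Flows and FSABHat2 below are placeholders chosen to be exactly representable.
print(fsab_jhat_over_root_fsab2([1.0, -1.0], [0.25, 0.75], fsab_hat2=4.0))  # -0.25
```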

Completed Same-Contract Data

Refresh status on 2026-05-01:

  • SFINCS-JAX worktree: /Users/rogeriojorge/local/tests/sfincs_jax

  • SFINCS-JAX version: 1.1.0

  • SFINCS-JAX commit: df0c70d (origin/main)

  • NTX used NTX_SFINCS_JAX_ROOT=/Users/rogeriojorge/local/tests/sfincs_jax and JAX_ENABLE_X64=True for the reruns.

Committed finite-beta RHSMode=1 low-resolution profile-current artifact:

  • JSON: docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.json

  • PNG/PDF: docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.{png,pdf}

  • grid: Ntheta=13, Nzeta=15, Nxi=8, Nx=5

  • radii: rho = 1/7, 0.5, 13/14

  • nu_n = 8.31565e-3

Status recorded in that artifact:

  • completed RHSMode=1 current points: 3

  • max SFINCS-JAX vs Redl current relative difference: 0.8478018522703108

  • max SFINCS-JAX vs NTX+NEOPAX current relative difference: 0.8740375383375442

  • max NTX+NEOPAX vs Redl current relative difference: 0.21926076611238907

  • all completed SFINCS-JAX solver residual gates pass

  • maximum true residual over target: 1.2527267276319048e-4
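The note does not define "relative difference". A common choice, assumed in this sketch, is |value - reference| / |reference|, maximized over the radii that share a grid. The helper names are hypothetical, not functions from the NTX scripts.

```python
from typing import Sequence


def relative_difference(value: float, reference: float) -> float:
    """|value - reference| / |reference| (assumed definition)."""
    if reference == 0.0:
        raise ZeroDivisionError("reference current is zero")
    return abs(value - reference) / abs(reference)


def max_relative_difference(values: Sequence[float],
                            references: Sequence[float]) -> float:
    """Largest pointwise relative difference over matched radii."""
    if len(values) != len(references):
        raise ValueError("values and references must share the same radii")
    return max(relative_difference(v, r) for v, r in zip(values, references))


# Placeholder currents in A m^-2 at two radii (illustrative, not audit data).
print(max_relative_difference([-3.0e6, -2.0e6], [-4.0e6, -2.5e6]))  # 0.25
```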

At rho=1/7, the completed low-resolution current is

FSABjHatOverRootFSAB2 = -0.44600080476476256
<J.B>/sqrt(<B^2>)     = -3.1276441715700175e6 A m^-2
NIterations           = 1

The low-resolution current values are unchanged by the optimized SFINCS-JAX solver-policy update. Because every point also passes the true-residual gate, the smoke-grid current gap is not a linear-solver residual artifact.

Profiling Runs

The 2026-05-01 rerun used the default SFINCS-JAX RHSMode=1 solver policy, unless noted otherwise, with this environment:

JAX_ENABLE_X64=True
PYTHONPATH=/Users/rogeriojorge/local/tests/sfincs_jax:$PYTHONPATH

The NTX profile-current audit now calls write-output --compute-solution and writes sfincsOutput.solver_trace.json beside each HDF5 output. The HDF5 summary must include linearSolverAccepted=1, linearSolverTrueResidualConverged=1, and linearSolverResidualNorm <= linearSolverResidualTarget before a current point is treated as a numerically converged comparison.
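A minimal sketch of that acceptance gate. The field names come from the contract above; the loader assumes they are top-level scalar (or per-iteration) HDF5 datasets, which is an assumption about the file layout, and both helper names are hypothetical.

```python
from typing import Any, Dict, Mapping

GATE_KEYS = (
    "linearSolverAccepted",
    "linearSolverTrueResidualConverged",
    "linearSolverResidualNorm",
    "linearSolverResidualTarget",
)


def passes_solver_gate(summary: Mapping[str, Any]) -> bool:
    """True only if the point may be treated as a numerically converged comparison."""
    if any(key not in summary for key in GATE_KEYS):
        return False  # missing solver metadata: do not trust the current value
    accepted = int(summary["linearSolverAccepted"]) == 1
    converged = int(summary["linearSolverTrueResidualConverged"]) == 1
    norm = float(summary["linearSolverResidualNorm"])
    target = float(summary["linearSolverResidualTarget"])
    return accepted and converged and norm <= target


def read_gate_summary(path: str) -> Dict[str, Any]:
    """Hypothetical loader: assumes the gate fields are top-level HDF5 datasets."""
    import h5py  # optional dependency, only needed for real output files
    import numpy as np

    with h5py.File(path, "r") as f:
        # np.ravel(...)[-1] keeps the last entry if a field is stored per iteration.
        return {k: np.ravel(f[k][()])[-1] for k in GATE_KEYS if k in f}


print(passes_solver_gate({
    "linearSolverAccepted": 1,
    "linearSolverTrueResidualConverged": 1,
    "linearSolverResidualNorm": 1.0e-10,
    "linearSolverResidualTarget": 1.0e-8,
}))  # True
```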

Local CPU, 13 x 15 x 8, Nx=5

The committed three-radius smoke artifact now completes through the normal SFINCS-JAX CLI in 24.7 s total on local CPU. All three points use the auto solver policy and pass the true-residual gate, and the current amplitudes are unchanged from before the solver-policy update.

Local CPU, 17 x 21 x 12, Nx=5, Optimized Main

Generated single-point deck:

python examples/owned_finite_beta_sfincs_jax_profile_current_audit.py \
  --rho 0.14285714285714285 --nu-n 0.00831565 \
  --n-theta 17 --n-zeta 21 --n-xi 12 --nx 5 \
  --run-sfincs-jax \
  --output-dir examples/outputs/sfincs_jax_v1p1_profile_current/prod_17x21x12_deck \
  --output-prefix docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12

Result:

  • HDF5 output completes through the normal CLI path in 9.90 s

  • committed lightweight summary artifact: docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12.{json,png,pdf}

  • max resident set size: 1.55 GB

  • solver method: sparse_pc_gmres

  • true residual over target: 8.45200712991399e-7

  • output:

FSABjHatOverRootFSAB2    = -1.3120713644649011
<J.B>/sqrt(<B^2>)        = -9.201087334174225e6 A m^-2

Comparison to the same-profile targets at rho=1/7:

  • SFINCS-JAX vs Redl relative difference: 0.5522545492579316

  • SFINCS-JAX vs NTX+NEOPAX relative difference: 0.6294362315512264

  • NTX+NEOPAX vs Redl relative difference: 0.20828178269124106

This closes the previous 17 x 21 x 12 residual/runtime blocker. It does not close finite-beta bootstrap-current parity: at this pitch truncation the current amplitude still differs from the Redl and NTX+NEOPAX references by about 55% and 63%, respectively.

Local CPU, 25 x 31 x 17, Nx=11

The three-radius production ladder:

  • completes in 383.16 s wall time;

  • uses sparse_pc_gmres at every radius;

  • reaches a maximum true residual over target of 2.058374287457432e-5;

  • reaches max SFINCS-JAX vs Redl current relative difference 1.2860477350497015;

  • reaches max SFINCS-JAX vs NTX+NEOPAX current relative difference 1.442947235410411.

This shows that the remaining profile-current discrepancy is not caused by a failed linear solve. The remaining audit therefore targets pitch truncation, the collision model, and the physical solution branch.

Pitch-Resolution Audit

The new artifact docs/_static/owned_finite_beta_sfincs_jax_profile_current_resolution_audit.{json,png,pdf} summarizes a fixed-radius Nxi scan at rho=1/7, Ntheta=17, Nzeta=21, Nx=5.

Key metrics:

  • scan rows: 18

  • solver-converged rows: 17

  • best SFINCS-JAX vs Redl relative difference: 0.024877277870714646

  • best SFINCS-JAX vs NTX+NEOPAX relative difference: 0.015201711479977497

  • high-Nxi even/odd tail relative gap: 0.13232636827032285

  • rows passing the 1e-1 gate against Redl: 2

  • rows passing the 1e-1 gate against NTX+NEOPAX: 4

The scan shows a genuine parity split in the terminal Legendre mode. Currents at even Nxi pass through both reference values, while currents at odd Nxi sit on a larger current branch. The gap between the two highest adjacent Nxi values is 1.323e-1, which is accepted under the documented 1.5e-1 reduced-closure stress tolerance.
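The tail-gap metric is not defined explicitly here. This sketch assumes it is |J(N) - J(N-1)| / max(|J(N)|, |J(N-1)|) for the two highest solver-converged Nxi values, checked against the 1.5e-1 tolerance; the function name and the scan values are hypothetical.

```python
from typing import Mapping

STRESS_TOLERANCE = 0.15  # documented reduced-closure stress tolerance


def even_odd_tail_gap(current_by_nxi: Mapping[int, float]) -> float:
    """Relative gap between the currents at the two highest adjacent Nxi values.

    Assumed definition: |J(N) - J(N-1)| / max(|J(N)|, |J(N-1)|), where N is the
    largest solver-converged Nxi in the scan.
    """
    n_top = max(current_by_nxi)
    if n_top - 1 not in current_by_nxi:
        raise ValueError("scan needs an adjacent even/odd pair at the top")
    a, b = current_by_nxi[n_top], current_by_nxi[n_top - 1]
    scale = max(abs(a), abs(b))
    if scale == 0.0:
        raise ZeroDivisionError("both tail currents are zero")
    return abs(a - b) / scale


# Placeholder converged rows (FSABjHatOverRootFSAB2 by Nxi), not audit data.
scan = {16: -1.05, 17: -1.10, 18: -1.00}
gap = even_odd_tail_gap(scan)
print(f"{gap:.4f}", gap < STRESS_TOLERANCE)  # 0.0909 True
```

Only solver-converged rows should be passed in, so a stalled solve cannot masquerade as a parity effect.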

Full-Collision Probe

A same-grid collisionOperator=0 probe at 17 x 21 x 20, Nx=5 was attempted as a physics discriminator. It timed out after 901.76 s, used about 9.97 GB max RSS, and did not write a completed current output. The pitch-angle-scattering (PAS) profile-current branch is therefore the only completed SFINCS-JAX RHSMode=1 finite-beta current reference in this NTX artifact set.

Production RHSMode=3 Coefficient Ladder, Optimized Main

The production coefficient ladder was rerun through the same clean SFINCS-JAX main checkout:

  • grid: 35 x 43 x 48

  • six completed same-grid points

  • max NTX/SFINCS-JAX L13/L31/L33 relative difference: 0.02064719610195181

  • coefficient gate: 1e-1, passed

  • current-conditioned precision gate: still failed, with maximum precision gap 29.948023011811134

  • mean production point runtime: 5.999091423504676 s

This closes the broad finite-beta coefficient-resolution lane for the current owned QA case. The remaining finite-beta current work is localized to the profile-current closure/observable layer, pitch Legendre truncation, and the full-collision production path.

Current Bottleneck Hypotheses

  1. The remaining finite-beta profile-current gap is not explained by the RHSMode=3 monoenergetic coefficient bridge: the maximum L13/L31/L33 relative difference of the optimized production coefficient ladder remains below 2.1e-2.

  2. The 17 x 21 x 12, Nx=5 RHSMode=1 profile-current point changes the current amplitude substantially relative to the 13 x 15 x 8 smoke grid (FSABjHatOverRootFSAB2 at rho=1/7 moves from -0.446 to -1.312), so the profile-current observable is resolution sensitive.

  3. The 17 x 21 x 12, Nx=5 point now converges quickly with sparse-PC GMRES, so the old residual-stalling hypothesis is closed.

  4. The Nxi scan shows a high-order even/odd Legendre truncation split in the current observable.

  5. The full-collision RHSMode=1 point is not yet a practical production comparison at this grid on local CPU: the 17 x 21 x 20 probe timed out after about 900 s at about 10 GB max RSS.

Requested SFINCS-JAX Developer Actions

Recommended implementation work:

  • add a solve-complete/profile-finalization split so a successful HDF5 solve is not reported as a failed run if Perfetto/XPlane finalization fails;

  • add timeout-safe profiling that periodically flushes phase timings, residual history, and partial diagnostics outside the JAX profiler context;

  • write solver progress for RHSMode=1 every fixed number of Krylov iterations, including residual norm, preconditioner kind, fallback decisions, and elapsed time since last progress line;

  • time and report these phases separately: geometry/output field build, operator build, RHS assembly, constraint projection, preconditioner probe, Schur/PAS build, Krylov iterations, diagnostics, and HDF5 write;

  • expose a stable option to write partial HDF5 diagnostics on timeout when the state vector or current residual is available;

  • document the expected even/odd Nxi behavior for the finite Legendre hierarchy and retain the current 1.5e-1 reduced-closure stress policy;

  • make collisionOperator=0 RHSMode=1 feasible for this finite-beta deck or document the memory/runtime ceiling and recommended reduced test;

  • add a small finite-beta RHSMode=1 profile-current regression at 13 x 15 x 8, Nx=5, using FSABjHatOverRootFSAB2 as the observable;

  • add a larger optional benchmark at 17 x 21 x 12, Nx=5 that is expected to complete HDF5 output within a documented budget and report whether the solve residual satisfies the requested tolerance.
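The first three items can be prototyped without touching the profiler itself. This is a hypothetical sketch, not SFINCS-JAX API: phase timings and progress lines are appended to a JSON-lines file and fsynced as they happen, so they survive a timeout, and profiler finalization runs only after the solve is marked complete, so a finalization error is recorded instead of failing the run. In a real run, `stop_profiler` could be `jax.profiler.stop_trace`; the phase names and the fake solve are placeholders.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class PhaseLog:
    """Append-only JSON-lines log; every record is flushed and fsynced."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._t0 = time.perf_counter()

    def write(self, **record) -> None:
        record["elapsed_s"] = time.perf_counter() - self._t0
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # record is on disk before the next phase starts

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:  # timing is written even if the phase raises
            self.write(phase=name, duration_s=time.perf_counter() - start)


def run_solve(solve: Callable[[PhaseLog], None], log: PhaseLog,
              stop_profiler: Optional[Callable[[], None]] = None) -> dict:
    """Run the solve, then finalize profiling without changing the solve status."""
    status = {"solve_complete": False, "profile_finalized": None}
    solve(log)  # a genuine solve failure still raises here
    status["solve_complete"] = True
    log.write(event="solve_complete")
    if stop_profiler is not None:
        try:
            stop_profiler()
            status["profile_finalized"] = True
        except Exception as exc:  # reported separately, never fatal to the solve
            status["profile_finalized"] = False
            log.write(event="profile_finalization_failed", error=repr(exc))
    return status


if __name__ == "__main__":
    def demo_solve(log: PhaseLog) -> None:
        with log.phase("operator_build"):
            pass
        for it in range(1, 31):  # stand-in for a Krylov loop
            if it % 10 == 0:  # progress line every 10 iterations
                log.write(event="krylov_progress", iteration=it)

    def broken_stop() -> None:
        raise RuntimeError("trace export failed")

    with tempfile.TemporaryDirectory() as tmp:
        log = PhaseLog(os.path.join(tmp, "solve_trace.jsonl"))
        print(run_solve(demo_solve, log, stop_profiler=broken_stop))
```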

Recommended validation sequence after those changes:

  1. Re-run 13 x 15 x 8, Nx=5 on CPU and GPU, with both warm and cold starts, and confirm the current remains FSABjHatOverRootFSAB2 = -0.44600080476476256 at rho=1/7.

  2. Re-run 17 x 21 x 12, Nx=5 with the solver metadata gate and require sparse_pc_gmres or another true-residual-converged method.

  3. Re-run paired even/odd Nxi ladders when the pitch discretization or current observable changes, and require the gap to stay below 1.5e-1.

  4. Re-run the finite-beta radial/collisionality profile-current ladder when changing geometry loading, interpolation, or closure semantics.

  5. Keep the bootstrap-current figure scoped as a reduced-closure stress result unless a feasible full-collision RHSMode=1 branch is added.
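Step 3 can be scripted by emitting the audit-script command for an even Nxi and its odd neighbor. This sketch only reuses flags that appear in the 17 x 21 x 12 command earlier in this note; the helper name and the output directory layout are hypothetical.

```python
import shlex
from typing import List

AUDIT_SCRIPT = "examples/owned_finite_beta_sfincs_jax_profile_current_audit.py"


def paired_nxi_commands(n_xi_even: int, rho: float = 1.0 / 7.0,
                        nu_n: float = 8.31565e-3, n_theta: int = 17,
                        n_zeta: int = 21, nx: int = 5,
                        out_root: str = "examples/outputs/nxi_pair_ladder",
                        ) -> List[List[str]]:
    """Argument lists for one even/odd Nxi pair at fixed radius and grid."""
    if n_xi_even % 2 != 0:
        raise ValueError("n_xi_even must be even")
    commands = []
    for n_xi in (n_xi_even, n_xi_even + 1):
        tag = f"{n_theta}x{n_zeta}x{n_xi}_nx{nx}"  # hypothetical naming scheme
        commands.append([
            "python", AUDIT_SCRIPT,
            "--rho", repr(rho), "--nu-n", repr(nu_n),
            "--n-theta", str(n_theta), "--n-zeta", str(n_zeta),
            "--n-xi", str(n_xi), "--nx", str(nx),
            "--run-sfincs-jax",
            "--output-dir", f"{out_root}/{tag}_deck",
            "--output-prefix", f"{out_root}/{tag}",
        ])
    return commands


for cmd in paired_nxi_commands(12):
    print(shlex.join(cmd))
```

Each printed command can be run as-is; the two resulting currents then feed the even/odd gap check against the 1.5e-1 tolerance.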

NTX-Side Status

NTX now exposes Perfetto and device-memory profiling options in the prepared-geometry reuse profiler:

python examples/prepared_geometry_reuse_profile.py \
  --preset smoke --case-counts 3 \
  --output-prefix examples/outputs/ntx_prepared_geometry_profile/cpu_smoke \
  --trace-dir examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace \
  --perfetto \
  --device-memory-profile examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace/device_memory.prof

The smoke run reports:

  • warmup solve: 2.091 s

  • direct three-case path: 0.464 s

  • compiled steady three-case path: 0.000968 s

  • compiled steady speedup vs direct: 4.79e2

  • max compiled relative mismatch: 6.85e-11

  • peak RSS: about 1.83 GB
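The quoted speedup is the ratio of the direct to the compiled three-case wall time. The arithmetic, using the timings listed above:

```python
# Timings from the smoke run (seconds, three cases per path).
direct_s = 0.464
compiled_s = 0.000968

speedup = direct_s / compiled_s
per_case_compiled_s = compiled_s / 3

print(f"speedup: {speedup:.3g}")                           # speedup: 479
print(f"compiled per case: {per_case_compiled_s:.3g} s")  # compiled per case: 0.000323 s
```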

This supports the current NTX performance plan: keep fixed-shape prepared compiled closures as the first optimization target, and avoid broad XLA dump passes unless a small Perfetto/XPlane trace has localized the bottleneck.