SFINCS-JAX RHSMode=1 Profile-Current Handoff

This note records the finite-beta RHSMode=1 profile-current diagnostics used by the NTX validation lane. The coefficient-level finite-beta transport-matrix ladder already passes the 1e-1 coefficient gate at the profiled radii, and the same-geometry profile-current calculation is now fast enough to run the pitch/velocity/radial convergence ladder used by the reduced-closure stress gate.

Goal

Run SFINCS-JAX, Redl, and NTX+NEOPAX on the same owned finite-beta QA VMEC geometry and analytic profile contract, then use the completed RHSMode=1 SFINCS-JAX profile-current outputs to separate:

profile-current normalization errors,
radial/profile interpolation errors,
reduced momentum-closure errors,
and raw solver/convergence errors.

The finite-beta bootstrap-current comparison is closed as a reduced-closure stress benchmark. It is not promoted as a broad full-collision parity claim.

Physics and Normalization Contract

The owned deck is generated by examples/owned_finite_beta_sfincs_jax_profile_current_audit.py.

Key settings:

finite-beta QA VMEC wout: /Users/rogeriojorge/local/single_stage_optimization_finite_beta/test/wout_LandremanPaul2021_QA_lowres_pressure_current.nc
RHSMode = 1
geometryScheme = 5
inputRadialCoordinate = 3
inputRadialCoordinateForGradients = 3
includeXDotTerm = .true.
includeElectricFieldTermInXiDot = .true.
useDKESExBDrift = .false.
includePhi1 = .false.
dPhiHatdrN = 0.0
ion/electron species order: Zs = 1, -1
mHats = 2, 1/1836.15267343
density and temperature are normalized as nHat = n / 1e20 and THat = T / 1 keV
profile gradients are with respect to rN = r/a = rho; SFINCS-JAX converts internally to the radial coordinate used by the solver.

The SFINCS-JAX current observable is converted to SI units as

<J.B>/sqrt(<B^2>) [A m^-2]
  = FSABjHatOverRootFSAB2 * e * 1e20 * sqrt(2 * 1keV / m_p)

For the current profiles here, the scale is 7.012642439557152e6 A m^-2.
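As a cross-check of the scale quoted above, the conversion can be sketched in a few lines of Python. The constants are CODATA 2018 values for e and m_p, which is an assumption about what the audit script uses, and the name to_si_current is hypothetical.

```python
import math

# Assumed CODATA 2018 values; the audit script may take these from a library.
E_CHARGE = 1.602176634e-19      # elementary charge [C]
M_PROTON = 1.67262192369e-27    # proton mass [kg]
N_REF = 1.0e20                  # reference density [m^-3], from nHat = n / 1e20
T_REF_J = 1.0e3 * E_CHARGE      # reference temperature, 1 keV expressed in joules

# Scale factor e * 1e20 * sqrt(2 * 1keV / m_p) from the formula above.
CURRENT_SCALE = E_CHARGE * N_REF * math.sqrt(2.0 * T_REF_J / M_PROTON)


def to_si_current(fsab_jhat_over_root_fsab2):
    """Convert FSABjHatOverRootFSAB2 to <J.B>/sqrt(<B^2>) in A m^-2."""
    return fsab_jhat_over_root_fsab2 * CURRENT_SCALE


if __name__ == "__main__":
    print(f"scale = {CURRENT_SCALE:.6e} A m^-2")
    # Normalized current of the rho=1/7 smoke-grid point recorded in this note.
    print(f"rho=1/7: {to_si_current(-0.44600080476476256):.6e} A m^-2")
```

With these constants the computed scale should agree with the quoted 7.0126e6 A m^-2 to well within the precision relevant for the comparisons in this note.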

The SFINCS-JAX HDF5 output uses

FSABjHat = sum_s Z_s * FSABFlow_s
FSABjHatOverRootFSAB2 = FSABjHat / sqrt(FSABHat2)

which matches the normalization used by the NTX diagnostic script.

Completed Same-Contract Data

Refresh status on 2026-05-01:

SFINCS-JAX worktree: /Users/rogeriojorge/local/tests/sfincs_jax
SFINCS-JAX version: 1.1.0
SFINCS-JAX commit: df0c70d (origin/main)
NTX used NTX_SFINCS_JAX_ROOT=/Users/rogeriojorge/local/tests/sfincs_jax and JAX_ENABLE_X64=True for the reruns.

Committed finite-beta RHSMode=1 low-resolution profile-current artifact:

JSON: docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.json
PNG/PDF: docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.{png,pdf}
grid: Ntheta=13, Nzeta=15, Nxi=8, Nx=5
radii: rho = 1/7, 0.5, 13/14
nu_n = 8.31565e-3

Current status from that artifact:

completed RHSMode=1 current points: 3
max SFINCS-JAX vs Redl current relative difference: 0.8478018522703108
max SFINCS-JAX vs NTX+NEOPAX current relative difference: 0.8740375383375442
max NTX+NEOPAX vs Redl current relative difference: 0.21926076611238907
all completed SFINCS-JAX solver residual gates pass
maximum true residual over target: 1.2527267276319048e-4

At rho=1/7, the completed low-resolution current is

FSABjHatOverRootFSAB2 = -0.44600080476476256
<J.B>/sqrt(<B^2>)     = -3.1276441715700175e6 A m^-2
NIterations           = 1

The low-resolution current values are unchanged by the optimized SFINCS-JAX solver-policy update. This confirms that the smoke-grid current gap is not a linear-solver residual artifact.

Profiling Runs

The 2026-05-01 rerun used the default SFINCS-JAX RHSMode=1 solver policy unless noted otherwise, with the following environment:

JAX_ENABLE_X64=True
PYTHONPATH=/Users/rogeriojorge/local/tests/sfincs_jax:$PYTHONPATH

The NTX profile-current audit now calls write-output --compute-solution and writes sfincsOutput.solver_trace.json beside each HDF5 output. The HDF5 summary must include linearSolverAccepted=1, linearSolverTrueResidualConverged=1, and linearSolverResidualNorm <= linearSolverResidualTarget before a current point is treated as a numerically converged comparison.
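The three-part gate above can be written as a single predicate. This is a minimal sketch, assuming the HDF5 summary has already been read into a plain mapping keyed by the dataset names given above; the function name is_converged_point and the example residual values are hypothetical and are not taken from any run.

```python
def is_converged_point(summary):
    """Return True only if all three solver gates from the HDF5 summary hold."""
    accepted = int(summary["linearSolverAccepted"]) == 1
    true_residual_ok = int(summary["linearSolverTrueResidualConverged"]) == 1
    residual_ok = (
        float(summary["linearSolverResidualNorm"])
        <= float(summary["linearSolverResidualTarget"])
    )
    return accepted and true_residual_ok and residual_ok


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    example = {
        "linearSolverAccepted": 1,
        "linearSolverTrueResidualConverged": 1,
        "linearSolverResidualNorm": 1.0e-10,
        "linearSolverResidualTarget": 1.0e-8,
    }
    print(is_converged_point(example))
```

Requiring all three conditions means a point whose solver reports acceptance but whose true residual exceeds the target is still excluded from the comparison.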

Local CPU, 13 x 15 x 8, Nx=5

The committed three-radius smoke artifact now completes through the normal SFINCS-JAX CLI in 24.7 s total on local CPU. All three points use the auto policy and pass the true-residual gate. The current amplitudes remain the same as before the solver-policy update.

Local CPU, 17 x 21 x 12, Nx=5, Optimized Main

Generated single-point deck:

python examples/owned_finite_beta_sfincs_jax_profile_current_audit.py \
  --rho 0.14285714285714285 --nu-n 0.00831565 \
  --n-theta 17 --n-zeta 21 --n-xi 12 --nx 5 \
  --run-sfincs-jax \
  --output-dir examples/outputs/sfincs_jax_v1p1_profile_current/prod_17x21x12_deck \
  --output-prefix docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12

Result:

HDF5 output completes through the normal CLI path in 9.90 s
committed lightweight summary artifact: docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12.{json,png,pdf}
max resident set size: 1.55 GB
solver method: sparse_pc_gmres
true residual over target: 8.45200712991399e-7
output:

FSABjHatOverRootFSAB2    = -1.3120713644649011
<J.B>/sqrt(<B^2>)        = -9.201087334174225e6 A m^-2

Comparison to the same-profile targets at rho=1/7:

SFINCS-JAX vs Redl relative difference: 0.5522545492579316
SFINCS-JAX vs NTX+NEOPAX relative difference: 0.6294362315512264
NTX+NEOPAX vs Redl relative difference: 0.20828178269124106

This closes the previous 17 x 21 x 12 residual/runtime blocker. It does not close finite-beta bootstrap-current parity because the current amplitude remains far from both reference currents at this pitch truncation.

Local CPU, 25 x 31 x 17, Nx=11

The three-radius production ladder:

completes in 383.16 s wall time;
uses sparse_pc_gmres at every radius;
reaches a maximum true residual over target of 2.058374287457432e-5;
reaches max SFINCS-JAX vs Redl current relative difference 1.2860477350497015;
reaches max SFINCS-JAX vs NTX+NEOPAX current relative difference 1.442947235410411.

This shows that the remaining profile-current discrepancy is not caused by a failed linear solve. It is instead a matter for the pitch-truncation / collision-model / physical-branch audit.

Pitch-Resolution Audit

The new artifact docs/_static/owned_finite_beta_sfincs_jax_profile_current_resolution_audit.{json,png,pdf} summarizes a fixed-radius Nxi scan at rho=1/7, Ntheta=17, Nzeta=21, Nx=5.

Key metrics:

scan rows: 18
solver-converged rows: 17
best SFINCS-JAX vs Redl relative difference: 0.024877277870714646
best SFINCS-JAX vs NTX+NEOPAX relative difference: 0.015201711479977497
high-Nxi even/odd tail relative gap: 0.13232636827032285
Redl 1e-1 gate pass count: 2
NTX+NEOPAX 1e-1 gate pass count: 4

The scan shows a real terminal-Legendre-mode parity split. Even Nxi values move through both reference scales, while odd Nxi values sit on a larger current branch. The adjacent high-Nxi gap is 1.323e-1, which is accepted under the documented 1.5e-1 reduced-closure stress tolerance.
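The tolerance checks for this scan can be restated as a short sketch. The numbers are the key metrics quoted above; the dictionary keys and the helper within are hypothetical names, not fields of the audit JSON.

```python
COEFFICIENT_GATE = 1.0e-1   # 1e-1 gate used for the Redl and NTX+NEOPAX pass counts
STRESS_TOLERANCE = 1.5e-1   # documented reduced-closure stress tolerance

# Values quoted in the key metrics of the Nxi scan.
metrics = {
    "best_vs_redl": 0.024877277870714646,
    "best_vs_ntx_neopax": 0.015201711479977497,
    "high_nxi_even_odd_gap": 0.13232636827032285,
}


def within(value, limit):
    """True when the magnitude of a relative difference is at or below the limit."""
    return abs(value) <= limit


gap = metrics["high_nxi_even_odd_gap"]
gap_passes_stress = within(gap, STRESS_TOLERANCE)
gap_passes_coefficient_gate = within(gap, COEFFICIENT_GATE)
best_rows_pass = all(
    within(metrics[key], COEFFICIENT_GATE)
    for key in ("best_vs_redl", "best_vs_ntx_neopax")
)

if __name__ == "__main__":
    print(gap_passes_stress, gap_passes_coefficient_gate, best_rows_pass)
```

The even/odd gap is accepted only because the stress tolerance is 1.5e-1; the same gap would not pass the tighter 1e-1 gate applied to the individual scan rows.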

Full-Collision Probe

A same-grid collisionOperator=0 probe at 17 x 21 x 20, Nx=5 was attempted as a physics discriminator. It timed out after 901.76 s, reached a max RSS of about 9.97 GB, and did not write a completed current output. The pitch-angle-scattering (PAS) profile-current branch is therefore the only completed SFINCS-JAX RHSMode=1 finite-beta current reference in this NTX artifact set.

Production RHSMode=3 Coefficient Ladder, Optimized Main

The production coefficient ladder was rerun through the same clean SFINCS-JAX main checkout:

grid: 35 x 43 x 48
six completed same-grid points
max NTX/SFINCS-JAX L13/L31/L33 relative difference: 0.02064719610195181
coefficient gate: 1e-1, passed
current-conditioned precision gate: still failed, with maximum precision gap 29.948023011811134
mean production point runtime: 5.999091423504676 s

This closes the broad finite-beta coefficient-resolution lane for the current owned QA case. The remaining finite-beta current work is localized to the profile-current closure/observable layer, pitch Legendre truncation, and the full-collision production path.

Current Bottleneck Hypotheses

The remaining finite-beta profile-current gap is not explained by the RHSMode=3 monoenergetic coefficient bridge. The maximum NTX/SFINCS-JAX relative difference in the optimized production coefficient ladder remains below 2.1e-2.
The 17 x 21 x 12, Nx=5 RHSMode=1 profile-current point changes the current amplitude substantially relative to the 13 x 15 x 8 smoke grid, so the profile-current observable is resolution-sensitive.
The 17 x 21 x 12, Nx=5 point now converges quickly with sparse-PC GMRES, so the old residual-stalling hypothesis is closed.
The Nxi scan shows a high-order even/odd Legendre truncation split in the current observable.
The full-collision RHSMode=1 point is not yet a practical production comparison at this grid on the local CPU.

Requested SFINCS-JAX Developer Actions

Recommended implementation work:

add a solve-complete/profile-finalization split so a successful HDF5 solve is not reported as a failed run if Perfetto/XPlane finalization fails;
add timeout-safe profiling that periodically flushes phase timings, residual history, and partial diagnostics outside the JAX profiler context;
write solver progress for RHSMode=1 every fixed number of Krylov iterations, including residual norm, preconditioner kind, fallback decisions, and elapsed time since last progress line;
time and report these phases separately: geometry/output field build, operator build, RHS assembly, constraint projection, preconditioner probe, Schur/PAS build, Krylov iterations, diagnostics, and HDF5 write;
expose a stable option to write partial HDF5 diagnostics on timeout when the state vector or current residual is available;
document the expected even/odd Nxi behavior for the finite Legendre hierarchy and retain the current 1.5e-1 reduced-closure stress policy;
make collisionOperator=0 RHSMode=1 feasible for this finite-beta deck or document the memory/runtime ceiling and recommended reduced test;
add a small finite-beta RHSMode=1 profile-current regression at 13 x 15 x 8, Nx=5, using the current observable above;
add a larger optional benchmark at 17 x 21 x 12, Nx=5 that is expected to complete HDF5 output within a documented budget and report whether the solve residual satisfies the requested tolerance.
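The timeout-safe phase-timing request above could look roughly like the following. This is a minimal sketch in plain Python, not SFINCS-JAX code: the class PhaseTimer, its JSON layout, and the file name are all hypothetical. The idea is to flush to disk before and after every phase, outside any profiler context, so that a timeout still leaves a readable record of which phase was running.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path


class PhaseTimer:
    """Hypothetical helper: records wall time per phase and flushes after each change."""

    def __init__(self, path):
        self.path = Path(path)
        self.phases = []  # entries: {"phase": str, "seconds": float or None, "completed": bool}

    def _flush(self):
        # Write to a temporary file, then rename, so an interrupted write
        # does not leave a truncated JSON file behind.
        tmp = self.path.with_suffix(self.path.suffix + ".tmp")
        tmp.write_text(json.dumps({"phases": self.phases}, indent=2))
        tmp.replace(self.path)

    @contextmanager
    def phase(self, name):
        record = {"phase": name, "seconds": None, "completed": False}
        self.phases.append(record)
        self._flush()  # the phase is visible on disk before it starts
        start = time.perf_counter()
        try:
            yield record
            record["completed"] = True
        finally:
            record["seconds"] = time.perf_counter() - start
            self._flush()


if __name__ == "__main__":
    timer = PhaseTimer("phase_timings.json")
    with timer.phase("operator build"):
        time.sleep(0.01)  # stand-in for real work
    with timer.phase("Krylov iterations"):
        time.sleep(0.01)
```

If the process is killed mid-phase, the last flushed file lists that phase with completed set to false and no elapsed time, which is enough to tell a solve timeout from a profiler-finalization failure.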

Recommended validation sequence after those changes:

Re-run 13 x 15 x 8, Nx=5 on CPU and GPU, warm and cold, and confirm the current remains FSABjHatOverRootFSAB2 = -0.44600080476476256 at rho=1/7.
Re-run 17 x 21 x 12, Nx=5 with the solver metadata gate and require sparse_pc_gmres or another true-residual-converged method.
Re-run paired even/odd Nxi ladders when the pitch discretization or current observable changes, and require the gap to stay below 1.5e-1.
Re-run the finite-beta radial/collisionality profile-current ladder when changing geometry loading, interpolation, or closure semantics.
Keep the bootstrap-current figure scoped as a reduced-closure stress result unless a feasible full-collision RHSMode=1 branch is added.

NTX-Side Status

NTX now exposes Perfetto and device-memory profiling options in the prepared-geometry reuse profiler:

python examples/prepared_geometry_reuse_profile.py \
  --preset smoke --case-counts 3 \
  --output-prefix examples/outputs/ntx_prepared_geometry_profile/cpu_smoke \
  --trace-dir examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace \
  --perfetto \
  --device-memory-profile examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace/device_memory.prof

The smoke run reports:

warmup solve: 2.091 s
direct three-case path: 0.464 s
compiled steady three-case path: 0.000968 s
compiled steady speedup vs direct: 4.79e2
max compiled relative mismatch: 6.85e-11
peak RSS: about 1.83 GB

This supports the current NTX performance plan: keep fixed-shape prepared compiled closures as the first optimization target, and avoid broad XLA dump passes unless a small Perfetto/XPlane trace has localized the bottleneck.