SFINCS-JAX RHSMode=1 Profile-Current Handoff
This note records the finite-beta RHSMode=1 profile-current diagnostics used by
the NTX validation lane. The coefficient-level finite-beta transport-matrix
ladder is already below the order-1e-1 coefficient gate at the profiled radii;
the same-geometry profile-current calculation is now fast enough to run the
pitch/velocity/radial convergence ladder used by the reduced-closure stress
gate.
Goal
Run SFINCS-JAX, Redl, and NTX+NEOPAX on the same owned finite-beta QA VMEC geometry and analytic profile contract, then use the completed RHSMode=1 SFINCS-JAX profile-current outputs to separate:
profile-current normalization errors,
radial/profile interpolation errors,
reduced momentum-closure errors,
and raw solver/convergence errors.
The finite-beta bootstrap-current comparison is closed as a reduced-closure stress benchmark. It is not promoted as a broad full-collision parity claim.
Physics and Normalization Contract
The owned deck is generated by
examples/owned_finite_beta_sfincs_jax_profile_current_audit.py.
Key settings:
finite-beta QA VMEC wout: /Users/rogeriojorge/local/single_stage_optimization_finite_beta/test/wout_LandremanPaul2021_QA_lowres_pressure_current.nc
RHSMode = 1
geometryScheme = 5
inputRadialCoordinate = 3
inputRadialCoordinateForGradients = 3
includeXDotTerm = .true.
includeElectricFieldTermInXiDot = .true.
useDKESExBDrift = .false.
includePhi1 = .false.
dPhiHatdrN = 0.0
ion/electron species order: Zs = 1, -1
mHats = 2, 1/1836.15267343
density and temperature are normalized as nHat = n / 1e20 and THat = T / 1 keV
profile gradients are with respect to rN = r/a = rho; SFINCS-JAX converts internally to the radial coordinate used by the solver.
The SFINCS-JAX current observable is converted to SI units as
<J.B>/sqrt(<B^2>) [A m^-2]
= FSABjHatOverRootFSAB2 * e * 1e20 * sqrt(2 * 1keV / m_p)
For the current profiles here, the scale is
7.012642439557152e6 A m^-2.
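As an arithmetic check, the quoted scale can be reproduced from physical constants. This is a minimal sketch: the CODATA 2018 constant values and the helper name jhat_to_si are assumptions made here for illustration, and the audit script may use slightly different constants.

```python
import math

# CODATA 2018 values (assumed here; the audit script may use other constants).
ELEMENTARY_CHARGE = 1.602176634e-19   # C
PROTON_MASS = 1.67262192369e-27       # kg
N_BAR = 1.0e20                        # m^-3, reference density
T_BAR_J = 1.0e3 * ELEMENTARY_CHARGE   # 1 keV expressed in joules

# Reference speed sqrt(2 * 1 keV / m_p) in m/s.
V_BAR = math.sqrt(2.0 * T_BAR_J / PROTON_MASS)

# Factor that multiplies FSABjHatOverRootFSAB2 to give <J.B>/sqrt(<B^2>) in A m^-2.
CURRENT_SCALE = ELEMENTARY_CHARGE * N_BAR * V_BAR


def jhat_to_si(fsab_jhat_over_root_fsab2: float) -> float:
    """Convert the normalized current observable to A m^-2 (hypothetical helper)."""
    return fsab_jhat_over_root_fsab2 * CURRENT_SCALE


print(f"scale = {CURRENT_SCALE:.6e} A m^-2")
print(f"rho=1/7 smoke-grid current = {jhat_to_si(-0.44600080476476256):.6e} A m^-2")
```

Both printed values should agree with the numbers quoted in this note to well within the precision of the assumed constants.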
The SFINCS-JAX HDF5 output uses
FSABjHat = sum_s Z_s * FSABFlow_s
FSABjHatOverRootFSAB2 = FSABjHat / sqrt(FSABHat2)
which matches the normalization used by the NTX diagnostic script.
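The two definitions above amount to a charge-weighted sum over species followed by a normalization. The sketch below assumes the per-species flows and FSABHat2 have already been read into plain Python values; the function name and the input numbers are illustrative placeholders, not solver output.

```python
import math
from typing import Sequence


def fsab_jhat_over_root_fsab2(
    zs: Sequence[float], fsab_flow: Sequence[float], fsab_hat2: float
) -> float:
    """Return FSABjHat / sqrt(FSABHat2), with FSABjHat = sum_s Z_s * FSABFlow_s."""
    if len(zs) != len(fsab_flow):
        raise ValueError("zs and fsab_flow must have one entry per species")
    if fsab_hat2 <= 0.0:
        raise ValueError("FSABHat2 must be positive")
    fsab_jhat = sum(z * flow for z, flow in zip(zs, fsab_flow))
    return fsab_jhat / math.sqrt(fsab_hat2)


# Placeholder numbers chosen for easy arithmetic; species order is ion, electron.
example = fsab_jhat_over_root_fsab2(
    zs=[1.0, -1.0], fsab_flow=[0.5, 0.25], fsab_hat2=4.0
)
print(example)  # (0.5 - 0.25) / sqrt(4.0) = 0.125
```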
Completed Same-Contract Data
Refresh status on 2026-05-01:
SFINCS-JAX worktree: /Users/rogeriojorge/local/tests/sfincs_jax
SFINCS-JAX version: 1.1.0
SFINCS-JAX commit: df0c70d (origin/main)
NTX used NTX_SFINCS_JAX_ROOT=/Users/rogeriojorge/local/tests/sfincs_jax and JAX_ENABLE_X64=True for the reruns.
Committed finite-beta RHSMode=1 low-resolution profile-current artifact:
JSON: docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.json
PNG/PDF: docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.{png,pdf}
grid: Ntheta=13, Nzeta=15, Nxi=8, Nx=5
radii: rho = 1/7, 0.5, 13/14
nu_n = 8.31565e-3
Current status from that artifact:
completed RHSMode=1 current points: 3
max SFINCS-JAX vs Redl current relative difference: 0.8478018522703108
max SFINCS-JAX vs NTX+NEOPAX current relative difference: 0.8740375383375442
max NTX+NEOPAX vs Redl current relative difference: 0.21926076611238907
all completed SFINCS-JAX solver residual gates pass
maximum true residual over target: 1.2527267276319048e-4
At rho=1/7, the completed low-resolution current is
FSABjHatOverRootFSAB2 = -0.44600080476476256
<J.B>/sqrt(<B^2>) = -3.1276441715700175e6 A m^-2
NIterations = 1
The low-resolution current values are unchanged by the optimized SFINCS-JAX solver-policy update. This confirms that the smoke-grid current gap is not a linear-solver residual artifact.
Profiling Runs
The 2026-05-01 rerun used SFINCS-JAX default RHSMode=1 solver policy unless noted otherwise:
JAX_ENABLE_X64=True
PYTHONPATH=/Users/rogeriojorge/local/tests/sfincs_jax:$PYTHONPATH
The NTX profile-current audit now calls write-output --compute-solution and
writes sfincsOutput.solver_trace.json beside each HDF5 output. The HDF5
summary must include linearSolverAccepted=1,
linearSolverTrueResidualConverged=1, and
linearSolverResidualNorm <= linearSolverResidualTarget before a current point
is treated as a numerically converged comparison.
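The gate described above can be written as a small predicate. This is a sketch under the assumption that the HDF5 summary scalars have already been loaded into a dictionary; how they are read is left out, and the function name and the example values are hypothetical.

```python
from typing import Mapping


def passes_solver_gate(summary: Mapping[str, float]) -> bool:
    """Apply the three conditions listed above to a dict of HDF5 summary scalars."""
    required = (
        "linearSolverAccepted",
        "linearSolverTrueResidualConverged",
        "linearSolverResidualNorm",
        "linearSolverResidualTarget",
    )
    # A missing field counts as a failed gate rather than raising.
    if any(key not in summary for key in required):
        return False
    return (
        int(summary["linearSolverAccepted"]) == 1
        and int(summary["linearSolverTrueResidualConverged"]) == 1
        and summary["linearSolverResidualNorm"] <= summary["linearSolverResidualTarget"]
    )


# Placeholder values, not taken from any run.
converged = {
    "linearSolverAccepted": 1,
    "linearSolverTrueResidualConverged": 1,
    "linearSolverResidualNorm": 1.0e-10,
    "linearSolverResidualTarget": 1.0e-8,
}
stalled = dict(converged, linearSolverResidualNorm=1.0e-6)
print(passes_solver_gate(converged), passes_solver_gate(stalled))
```

Treating a missing field as a failure keeps an incomplete or timed-out output from being counted as a converged comparison point.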
Local CPU, 13 x 15 x 8, Nx=5
The committed three-radius smoke artifact now completes through the normal
SFINCS-JAX CLI in 24.7 s total on local CPU. All three points use the
auto policy and pass the true-residual gate. The current amplitudes remain the
same as before the solver-policy update.
Local CPU, 17 x 21 x 12, Nx=5, Optimized Main
Generated single-point deck:
python examples/owned_finite_beta_sfincs_jax_profile_current_audit.py \
--rho 0.14285714285714285 --nu-n 0.00831565 \
--n-theta 17 --n-zeta 21 --n-xi 12 --nx 5 \
--run-sfincs-jax \
--output-dir examples/outputs/sfincs_jax_v1p1_profile_current/prod_17x21x12_deck \
--output-prefix docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12
Result:
HDF5 output completes through the normal CLI path in 9.90 s
committed lightweight summary artifact: docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12.{json,png,pdf}
max resident set size: 1.55 GB
solver method: sparse_pc_gmres
true residual over target: 8.45200712991399e-7
output:
FSABjHatOverRootFSAB2 = -1.3120713644649011
<J.B>/sqrt(<B^2>) = -9.201087334174225e6 A m^-2
Comparison to the same-profile targets at rho=1/7:
SFINCS-JAX vs Redl relative difference: 0.5522545492579316
SFINCS-JAX vs NTX+NEOPAX relative difference: 0.6294362315512264
NTX+NEOPAX vs Redl relative difference: 0.20828178269124106
This closes the previous 17 x 21 x 12 residual/runtime blocker. It does not
close finite-beta bootstrap-current parity because the current amplitude remains
far from both reference currents at this pitch truncation.
Local CPU, 25 x 31 x 17, Nx=11
The three-radius production ladder:
completes in 383.16 s wall time;
uses sparse_pc_gmres at every radius;
reaches a maximum true residual over target of 2.058374287457432e-5;
reaches max SFINCS-JAX vs Redl current relative difference 1.2860477350497015;
reaches max SFINCS-JAX vs NTX+NEOPAX current relative difference 1.442947235410411.
This shows that the remaining profile-current discrepancy is not caused by a failed linear solve. What remains is a pitch-truncation, collision-model, and physical-branch audit.
Pitch-Resolution Audit
The new artifact
docs/_static/owned_finite_beta_sfincs_jax_profile_current_resolution_audit.{json,png,pdf}
summarizes a fixed-radius Nxi scan at rho=1/7, Ntheta=17, Nzeta=21,
Nx=5.
Key metrics:
scan rows: 18
solver-converged rows: 17
best SFINCS-JAX vs Redl relative difference: 0.024877277870714646
best SFINCS-JAX vs NTX+NEOPAX relative difference: 0.015201711479977497
high-Nxi even/odd tail relative gap: 0.13232636827032285
Redl 1e-1 gate pass count: 2
NTX+NEOPAX 1e-1 gate pass count: 4
The scan shows a real terminal-Legendre-mode parity split. Even Nxi values
move through both reference scales, while odd Nxi values sit on a larger
current branch. The adjacent high-Nxi gap is 1.323e-1, which is accepted
under the documented 1.5e-1 reduced-closure stress tolerance.
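The tolerance check can be illustrated with a short sketch. The exact gap definition used by the audit script is not stated in this note, so the normalization below (difference divided by the larger magnitude of the adjacent pair) is an assumption, the function name is hypothetical, and the scan values are placeholders rather than audit data.

```python
from typing import Dict

STRESS_TOLERANCE = 1.5e-1  # documented reduced-closure stress tolerance


def adjacent_tail_gap(current_by_nxi: Dict[int, float]) -> float:
    """Relative gap between the two highest adjacent Nxi rows (one even, one odd).

    Assumption: the gap is normalized by the larger magnitude of the pair.
    """
    nxi_top = max(current_by_nxi)
    if nxi_top - 1 not in current_by_nxi:
        raise ValueError("scan needs two adjacent Nxi values at the tail")
    j_top = current_by_nxi[nxi_top]
    j_prev = current_by_nxi[nxi_top - 1]
    return abs(j_top - j_prev) / max(abs(j_top), abs(j_prev))


# Placeholder scan, NOT audit data: odd Nxi sits on a larger-magnitude branch.
placeholder_scan = {17: -1.10, 18: -1.00, 19: -1.12, 20: -1.00}
gap = adjacent_tail_gap(placeholder_scan)
print(f"placeholder gap = {gap:.3f}, passes = {gap <= STRESS_TOLERANCE}")

# The gap reported in the resolution-audit artifact, checked against the same tolerance.
reported_gap = 0.13232636827032285
print(f"reported gap passes = {reported_gap <= STRESS_TOLERANCE}")
```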
Full-Collision Probe
A same-grid collisionOperator=0 probe at 17 x 21 x 20, Nx=5 was attempted
as a physics discriminator. It timed out after 901.76 s, used about
9.97 GB max RSS, and did not write a completed current output. The PAS
profile-current branch is therefore the only completed SFINCS-JAX RHSMode=1
finite-beta current reference in this NTX artifact set.
Production RHSMode=3 Coefficient Ladder, Optimized Main
The production coefficient ladder was rerun through the same clean SFINCS-JAX main checkout:
grid: 35 x 43 x 48
six completed same-grid points
max NTX/SFINCS-JAX L13/L31/L33 relative difference: 0.02064719610195181
coefficient gate: 1e-1, passed
current-conditioned precision gate: still failed, with maximum precision gap 29.948023011811134
mean production point runtime: 5.999091423504676 s
This closes the broad finite-beta coefficient-resolution lane for the current owned QA case. The remaining finite-beta current work is localized to the profile-current closure/observable layer, pitch Legendre truncation, and the full-collision production path.
Current Bottleneck Hypotheses
The remaining finite-beta profile-current gap is not explained by the RHSMode=3 monoenergetic coefficient bridge. The optimized production coefficient ladder remains below 2.1e-2.
The 17 x 21 x 12, Nx=5 RHSMode=1 profile-current point changes the current amplitude substantially relative to the 13 x 15 x 8 smoke grid, so the profile-current observable is resolution sensitive.
The 17 x 21 x 12, Nx=5 point now converges quickly with sparse-PC GMRES, so the old residual-stalling hypothesis is closed.
The Nxi scan shows a high-order even/odd Legendre truncation split in the current observable.
The full-collision RHSMode=1 point is not yet a practical production comparison at this grid on the local CPU.
Requested SFINCS-JAX Developer Actions
Recommended implementation work:
add a solve-complete/profile-finalization split so a successful HDF5 solve is not reported as a failed run if Perfetto/XPlane finalization fails;
add timeout-safe profiling that periodically flushes phase timings, residual history, and partial diagnostics outside the JAX profiler context;
write solver progress for RHSMode=1 every fixed number of Krylov iterations, including residual norm, preconditioner kind, fallback decisions, and elapsed time since last progress line;
time and report these phases separately: geometry/output field build, operator build, RHS assembly, constraint projection, preconditioner probe, Schur/PAS build, Krylov iterations, diagnostics, and HDF5 write;
expose a stable option to write partial HDF5 diagnostics on timeout when the state vector or current residual is available;
document the expected even/odd Nxi behavior for the finite Legendre hierarchy and retain the current 1.5e-1 reduced-closure stress policy;
make collisionOperator=0 RHSMode=1 feasible for this finite-beta deck or document the memory/runtime ceiling and recommended reduced test;
add a small finite-beta RHSMode=1 profile-current regression at 13 x 15 x 8, Nx=5, using the current observable above;
add a larger optional benchmark at 17 x 21 x 12, Nx=5 that is expected to complete HDF5 output within a documented budget and report whether the solve residual satisfies the requested tolerance.
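One possible shape for the timeout-safe phase timing requested above is to flush each phase record to disk when the phase starts and again when it ends, so an interrupted run still leaves a partial log. The sketch below uses only the Python standard library; the class name, file layout, and phase names are hypothetical, and this is not SFINCS-JAX code.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager


class PhaseTimer:
    """Record wall time per phase and rewrite a JSON file after every phase."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.phases = []  # rows of {"phase", "seconds", "completed"}

    def _flush(self) -> None:
        # Write to a temporary file, then rename, so a kill mid-write cannot corrupt the log.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(self.phases, handle, indent=2)
        os.replace(tmp_path, self.path)

    @contextmanager
    def phase(self, name: str):
        entry = {"phase": name, "seconds": None, "completed": False}
        self.phases.append(entry)
        self._flush()  # the phase is visible on disk before it starts
        start = time.perf_counter()
        try:
            yield
            entry["completed"] = True
        finally:
            entry["seconds"] = time.perf_counter() - start
            self._flush()


log_path = os.path.join(tempfile.mkdtemp(), "phase_timings.json")
timer = PhaseTimer(log_path)
with timer.phase("operator_build"):
    time.sleep(0.01)  # stand-in for real work
with timer.phase("krylov_iterations"):
    time.sleep(0.01)

with open(log_path, encoding="utf-8") as handle:
    recorded = json.load(handle)
print([row["phase"] for row in recorded])
```

Because the record is written before the phase body runs, a hard kill during a long Krylov phase would still leave a row with completed set to false, which identifies where the time went.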
Recommended validation sequence after those changes:
Re-run 13 x 15 x 8, Nx=5 CPU and GPU, warm and cold, and confirm the current remains FSABjHatOverRootFSAB2 = -0.44600080476476256 at rho=1/7.
Re-run 17 x 21 x 12, Nx=5 with the solver metadata gate and require sparse_pc_gmres or another true-residual-converged method.
Re-run paired even/odd Nxi ladders when the pitch discretization or current observable changes, and require the gap to stay below 1.5e-1.
Re-run the finite-beta radial/collisionality profile-current ladder when changing geometry loading, interpolation, or closure semantics.
Keep the bootstrap-current figure scoped as a reduced-closure stress result unless a feasible full-collision RHSMode=1 branch is added.
NTX-Side Status
NTX now exposes Perfetto and device-memory profiling options in the prepared-geometry reuse profiler:
python examples/prepared_geometry_reuse_profile.py \
--preset smoke --case-counts 3 \
--output-prefix examples/outputs/ntx_prepared_geometry_profile/cpu_smoke \
--trace-dir examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace \
--perfetto \
--device-memory-profile examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace/device_memory.prof
The smoke run reports:
warmup solve: 2.091 s
direct three-case path: 0.464 s
compiled steady three-case path: 0.000968 s
compiled steady speedup vs direct: 4.79e2
max compiled relative mismatch: 6.85e-11
peak RSS: about 1.83 GB
This supports the current NTX performance plan: keep fixed-shape prepared compiled closures as the first optimization target, and avoid broad XLA dump passes unless a small Perfetto/XPlane trace has localized the bottleneck.