# SFINCS-JAX RHSMode=1 Profile-Current Handoff

This note records the finite-beta RHSMode=1 profile-current diagnostics used by
the NTX validation lane.  The coefficient-level finite-beta transport-matrix
ladder is already below the order-`1e-1` coefficient gate at the profiled radii;
the same-geometry profile-current calculation is now fast enough to run the
pitch/velocity/radial convergence ladder used by the reduced-closure stress
gate.

## Goal

Run SFINCS-JAX, Redl, and NTX+NEOPAX on the same owned finite-beta QA VMEC
geometry and analytic profile contract, then use the completed RHSMode=1
SFINCS-JAX profile-current outputs to separate:

- profile-current normalization errors,
- radial/profile interpolation errors,
- reduced momentum-closure errors,
- and raw solver/convergence errors.

The finite-beta bootstrap-current comparison is closed as a reduced-closure
stress benchmark. It is not promoted as a broad full-collision parity claim.

## Physics and Normalization Contract

The owned deck is generated by
`examples/owned_finite_beta_sfincs_jax_profile_current_audit.py`.

Key settings:

- finite-beta QA VMEC wout:
  `/Users/rogeriojorge/local/single_stage_optimization_finite_beta/test/wout_LandremanPaul2021_QA_lowres_pressure_current.nc`
- `RHSMode = 1`
- `geometryScheme = 5`
- `inputRadialCoordinate = 3`
- `inputRadialCoordinateForGradients = 3`
- `includeXDotTerm = .true.`
- `includeElectricFieldTermInXiDot = .true.`
- `useDKESExBDrift = .false.`
- `includePhi1 = .false.`
- `dPhiHatdrN = 0.0`
- ion/electron species order: `Zs = 1, -1`
- `mHats = 2, 1/1836.15267343`
- density and temperature are normalized as `nHat = n / 1e20` and
  `THat = T / 1 keV`
- profile gradients are with respect to `rN = r/a = rho`; SFINCS-JAX converts
  internally to the radial coordinate used by the solver.

The SFINCS-JAX current observable is converted to SI units as

```text
<J.B>/sqrt(<B^2>) [A m^-2]
  = FSABjHatOverRootFSAB2 * e * 1e20 * sqrt(2 * 1keV / m_p)
```

For the normalization used here (`1e20` density scale, `1 keV` temperature
scale, proton mass), the scale factor is `7.012642439557152e6 A m^-2`.
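As a quick arithmetic check of that scale, the sketch below recomputes it from hard-coded physical constants. The constant values (assumed CODATA figures) and the helper name `jhat_to_si` are assumptions of this sketch; they are not taken from the SFINCS-JAX or NTX sources.

```python
import math

# Assumed CODATA values; not read from SFINCS-JAX or NTX.
E_CHARGE = 1.602176634e-19      # elementary charge [C]
M_PROTON = 1.67262192369e-27    # proton mass [kg]

N_BAR = 1e20                    # density scale, nHat = n / 1e20 [m^-3]
T_BAR = 1e3 * E_CHARGE          # temperature scale, 1 keV expressed in joules

# Reference speed sqrt(2 * 1 keV / m_p) [m/s]
V_BAR = math.sqrt(2.0 * T_BAR / M_PROTON)

# Current-density scale e * 1e20 * sqrt(2 * 1 keV / m_p) [A m^-2]
J_SCALE = E_CHARGE * N_BAR * V_BAR


def jhat_to_si(fsab_jhat_over_root_fsab2: float) -> float:
    """Convert normalized FSABjHatOverRootFSAB2 to <J.B>/sqrt(<B^2>) in A m^-2."""
    return fsab_jhat_over_root_fsab2 * J_SCALE


if __name__ == "__main__":
    print(f"scale = {J_SCALE:.6e} A m^-2")
    print(f"rho=1/7 smoke-grid current = {jhat_to_si(-0.44600080476476256):.6e} A m^-2")
```

With these constants the computed scale agrees with the quoted value to well below one part in `1e6`, and the same factor reproduces the SI currents reported later in this note.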

The SFINCS-JAX HDF5 output uses

```text
FSABjHat = sum_s Z_s * FSABFlow_s
FSABjHatOverRootFSAB2 = FSABjHat / sqrt(FSABHat2)
```

which matches the normalization used by the NTX diagnostic script.
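The two definitions above amount to a charge-weighted sum followed by a division. A minimal sketch, assuming the per-species flows and `FSABHat2` have already been read into plain Python numbers (the function name and the placeholder inputs are hypothetical):

```python
import math


def fsab_jhat_over_root_fsab2(zs, fsab_flows, fsab_hat2):
    """Return sum_s Z_s * FSABFlow_s divided by sqrt(FSABHat2)."""
    if len(zs) != len(fsab_flows):
        raise ValueError("need exactly one FSABFlow value per species")
    # FSABjHat = sum_s Z_s * FSABFlow_s
    fsab_jhat = sum(z * flow for z, flow in zip(zs, fsab_flows))
    # FSABjHatOverRootFSAB2 = FSABjHat / sqrt(FSABHat2)
    return fsab_jhat / math.sqrt(fsab_hat2)


# Placeholder inputs for illustration only; these are NOT values from the audit.
example = fsab_jhat_over_root_fsab2(
    zs=[1.0, -1.0],          # ion/electron charge order used by the owned deck
    fsab_flows=[0.5, 2.5],   # made-up per-species flows
    fsab_hat2=4.0,           # made-up <B^2> in normalized units
)
```

Reading the actual datasets from `sfincsOutput.h5` is deliberately left out, since the array layout of the HDF5 file is not described in this note.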

## Completed Same-Contract Data

Refresh status on 2026-05-01:

- SFINCS-JAX worktree: `/Users/rogeriojorge/local/tests/sfincs_jax`
- SFINCS-JAX version: `1.1.0`
- SFINCS-JAX commit: `df0c70d` (`origin/main`)
- NTX used `NTX_SFINCS_JAX_ROOT=/Users/rogeriojorge/local/tests/sfincs_jax`
  and `JAX_ENABLE_X64=True` for the reruns.

Committed finite-beta RHSMode=1 low-resolution profile-current artifact:

- JSON: `docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.json`
- PNG/PDF:
  `docs/_static/owned_finite_beta_sfincs_jax_profile_current_audit.{png,pdf}`
- grid: `Ntheta=13`, `Nzeta=15`, `Nxi=8`, `Nx=5`
- radii: `rho = 1/7, 0.5, 13/14`
- `nu_n = 8.31565e-3`

Current status from that artifact:

- completed RHSMode=1 current points: `3`
- max SFINCS-JAX vs Redl current relative difference: `0.8478018522703108`
- max SFINCS-JAX vs NTX+NEOPAX current relative difference:
  `0.8740375383375442`
- max NTX+NEOPAX vs Redl current relative difference:
  `0.21926076611238907`
- all completed SFINCS-JAX solver residual gates pass
- maximum true residual over target: `1.2527267276319048e-4`

At `rho=1/7`, the completed low-resolution current is

```text
FSABjHatOverRootFSAB2 = -0.44600080476476256
<J.B>/sqrt(<B^2>)     = -3.1276441715700175e6 A m^-2
NIterations           = 1
```

The low-resolution current values are unchanged by the optimized SFINCS-JAX
solver-policy update.  This confirms that the smoke-grid current gap is not a
linear-solver residual artifact.

## Profiling Runs

The 2026-05-01 rerun used SFINCS-JAX default RHSMode=1 solver policy unless
noted otherwise:

```bash
export JAX_ENABLE_X64=True
export PYTHONPATH=/Users/rogeriojorge/local/tests/sfincs_jax:$PYTHONPATH
```

The NTX profile-current audit now calls `write-output --compute-solution` and
writes `sfincsOutput.solver_trace.json` beside each HDF5 output.  The HDF5
summary must include `linearSolverAccepted=1`,
`linearSolverTrueResidualConverged=1`, and
`linearSolverResidualNorm <= linearSolverResidualTarget` before a current point
is treated as a numerically converged comparison.
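The acceptance rule above can be written as a single predicate. This is a sketch that assumes the four summary fields have already been loaded into a dictionary of scalars; the function name `is_converged_current_point` and the placeholder dictionaries are hypothetical.

```python
REQUIRED_FIELDS = (
    "linearSolverAccepted",
    "linearSolverTrueResidualConverged",
    "linearSolverResidualNorm",
    "linearSolverResidualTarget",
)


def is_converged_current_point(summary: dict) -> bool:
    """Return True only if every solver gate named in the note passes."""
    # A missing field is treated as a failed gate rather than an error.
    if any(key not in summary for key in REQUIRED_FIELDS):
        return False
    return (
        int(summary["linearSolverAccepted"]) == 1
        and int(summary["linearSolverTrueResidualConverged"]) == 1
        and float(summary["linearSolverResidualNorm"])
        <= float(summary["linearSolverResidualTarget"])
    )


# Placeholder summaries for illustration only; NOT values from the audit.
passing = {
    "linearSolverAccepted": 1,
    "linearSolverTrueResidualConverged": 1,
    "linearSolverResidualNorm": 1e-12,
    "linearSolverResidualTarget": 1e-10,
}
failing = dict(passing, linearSolverResidualNorm=1e-8)
```

Treating an absent field as a failure is a design choice of the sketch: a point without solver metadata should not silently enter a current comparison.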

### Local CPU, 13 x 15 x 8, Nx=5

The committed three-radius smoke artifact now completes through the normal
SFINCS-JAX CLI in `24.7 s` total on local CPU.  All three points use the
auto policy and pass the true-residual gate.  The current amplitudes remain the
same as before the solver-policy update.

### Local CPU, 17 x 21 x 12, Nx=5, Optimized Main

Generated single-point deck:

```bash
python examples/owned_finite_beta_sfincs_jax_profile_current_audit.py \
  --rho 0.14285714285714285 --nu-n 0.00831565 \
  --n-theta 17 --n-zeta 21 --n-xi 12 --nx 5 \
  --run-sfincs-jax \
  --output-dir examples/outputs/sfincs_jax_v1p1_profile_current/prod_17x21x12_deck \
  --output-prefix docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12
```

Result:

- HDF5 output completes through the normal CLI path in `9.90 s`
- committed lightweight summary artifact:
  `docs/_static/owned_finite_beta_sfincs_jax_profile_current_prod_17x21x12.{json,png,pdf}`
- max resident set size: `1.55 GB`
- solver method: `sparse_pc_gmres`
- true residual over target: `8.45200712991399e-7`
- output:

```text
FSABjHatOverRootFSAB2    = -1.3120713644649011
<J.B>/sqrt(<B^2>)        = -9.201087334174225e6 A m^-2
```

Comparison to the same-profile targets at `rho=1/7`:

- SFINCS-JAX vs Redl relative difference: `0.5522545492579316`
- SFINCS-JAX vs NTX+NEOPAX relative difference: `0.6294362315512264`
- NTX+NEOPAX vs Redl relative difference: `0.20828178269124106`

This closes the previous `17 x 21 x 12` residual/runtime blocker.  It does not
close finite-beta bootstrap-current parity because the current amplitude remains
far from both reference currents at this pitch truncation.

### Local CPU, 25 x 31 x 17, Nx=11

The three-radius production ladder:

- completes in `383.16 s` wall time;
- uses `sparse_pc_gmres` at every radius;
- reaches a maximum true residual over target of `2.058374287457432e-5`;
- reaches max SFINCS-JAX vs Redl current relative difference
  `1.2860477350497015`;
- reaches max SFINCS-JAX vs NTX+NEOPAX current relative difference
  `1.442947235410411`.

This shows that the remaining profile-current discrepancy is not caused by a
failed linear solve.  What remains to audit is the pitch truncation, the
collision model, and the physical branch.

### Pitch-Resolution Audit

The new artifact
`docs/_static/owned_finite_beta_sfincs_jax_profile_current_resolution_audit.{json,png,pdf}`
summarizes a fixed-radius `Nxi` scan at `rho=1/7`, `Ntheta=17`, `Nzeta=21`,
`Nx=5`.

Key metrics:

- scan rows: `18`
- solver-converged rows: `17`
- best SFINCS-JAX vs Redl relative difference: `0.024877277870714646`
- best SFINCS-JAX vs NTX+NEOPAX relative difference: `0.015201711479977497`
- high-`Nxi` even/odd tail relative gap: `0.13232636827032285`
- Redl `1e-1` gate pass count: `2`
- NTX+NEOPAX `1e-1` gate pass count: `4`

The scan shows a real terminal-Legendre-mode parity split. Even `Nxi` values
move through both reference scales, while odd `Nxi` values sit on a larger
current branch. The adjacent high-`Nxi` gap is `1.323e-1`, which is accepted
under the documented `1.5e-1` reduced-closure stress tolerance.
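The tolerance check in the paragraph above can be sketched as follows. This note does not state how the audit artifact defines the even/odd tail gap, so the normalization by the larger magnitude and the helper names `parity_tail_gap` and `passes_stress_tolerance` are assumptions of this sketch.

```python
STRESS_TOLERANCE = 1.5e-1  # documented reduced-closure stress tolerance


def parity_tail_gap(scan):
    """Relative gap between the highest even-Nxi and highest odd-Nxi currents.

    `scan` maps Nxi -> current observable. Normalizing by the larger
    magnitude is an assumption, not the artifact's documented definition.
    Raises ValueError if the scan lacks an even or an odd Nxi row.
    """
    even_nxi = max(n for n in scan if n % 2 == 0)
    odd_nxi = max(n for n in scan if n % 2 == 1)
    a, b = scan[even_nxi], scan[odd_nxi]
    return abs(a - b) / max(abs(a), abs(b))


def passes_stress_tolerance(gap, tolerance=STRESS_TOLERANCE):
    """True when the gap stays below the stress tolerance."""
    return gap < tolerance


# Gap reported by the resolution audit at rho=1/7.
reported_gap = 0.13232636827032285
reported_ok = passes_stress_tolerance(reported_gap)

# Placeholder scan for illustration only; NOT audit data.
placeholder_gap = parity_tail_gap({19: -1.0, 20: -0.9})
```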

### Full-Collision Probe

A same-grid `collisionOperator=0` probe at `17 x 21 x 20, Nx=5` was attempted
as a physics discriminator.  It timed out after `901.76 s`, used about
`9.97 GB` max RSS, and did not write a completed current output.  The PAS
profile-current branch is therefore the only completed SFINCS-JAX RHSMode=1
finite-beta current reference in this NTX artifact set.

### Production RHSMode=3 Coefficient Ladder, Optimized Main

The production coefficient ladder was rerun through the same clean SFINCS-JAX
main checkout:

- grid: `35 x 43 x 48`
- six completed same-grid points
- max NTX/SFINCS-JAX `L13/L31/L33` relative difference:
  `0.02064719610195181`
- coefficient gate: `1e-1`, passed
- current-conditioned precision gate: still failed, with maximum precision gap
  `29.948023011811134`
- mean production point runtime: `5.999091423504676 s`

This closes the broad finite-beta coefficient-resolution lane for the current
owned QA case.  The remaining finite-beta current work is localized to the
profile-current closure/observable layer, pitch Legendre truncation, and the
full-collision production path.

## Current Bottleneck Hypotheses

1. The remaining finite-beta profile-current gap is not explained by the
   RHSMode=3 monoenergetic coefficient bridge.  The maximum NTX/SFINCS-JAX
   relative difference on the optimized production coefficient ladder remains
   below `2.1e-2`.
2. The `17 x 21 x 12, Nx=5` RHSMode=1 profile-current point changes the current
   amplitude substantially relative to the `13 x 15 x 8` smoke grid, so the
   profile-current observable is resolution sensitive.
3. The `17 x 21 x 12, Nx=5` point now converges quickly with sparse-PC GMRES,
   so the old residual-stalling hypothesis is closed.
4. The `Nxi` scan shows a high-order even/odd Legendre truncation split in the
   current observable.
5. The full-collision RHSMode=1 point is not yet a practical production
   comparison at this grid on the local CPU.

## Requested SFINCS-JAX Developer Actions

Recommended implementation work:

- add a solve-complete/profile-finalization split so a successful HDF5 solve is
  not reported as a failed run if Perfetto/XPlane finalization fails;
- add timeout-safe profiling that periodically flushes phase timings, residual
  history, and partial diagnostics outside the JAX profiler context;
- write solver progress for RHSMode=1 every fixed number of Krylov iterations,
  including residual norm, preconditioner kind, fallback decisions, and elapsed
  time since last progress line;
- time and report these phases separately:
  geometry/output field build, operator build, RHS assembly, constraint
  projection, preconditioner probe, Schur/PAS build, Krylov iterations,
  diagnostics, and HDF5 write;
- expose a stable option to write partial HDF5 diagnostics on timeout when the
  state vector or current residual is available;
- document the expected even/odd `Nxi` behavior for the finite Legendre
  hierarchy and retain the current `1.5e-1` reduced-closure stress policy;
- make `collisionOperator=0` RHSMode=1 feasible for this finite-beta deck or
  document the memory/runtime ceiling and recommended reduced test;
- add a small finite-beta RHSMode=1 profile-current regression at
  `13 x 15 x 8, Nx=5`, using the current observable above;
- add a larger optional benchmark at `17 x 21 x 12, Nx=5` that is expected to
  complete HDF5 output within a documented budget and report whether the solve
  residual satisfies the requested tolerance.
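To illustrate the timeout-safe phase-timing request, here is one possible shape for a timer that writes its accumulated timings to disk after every phase. It is a sketch of the idea only: the class name `PhaseTimer`, the JSON layout, and the phase labels are hypothetical and are not part of SFINCS-JAX.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path


class PhaseTimer:
    """Accumulate wall time per named phase and flush to JSON after each one."""

    def __init__(self, path):
        self.path = Path(path)
        self.timings = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record and flush even if the phase raises, so a failure or
            # timeout later in the run still leaves earlier timings on disk.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed
            self.path.write_text(json.dumps(self.timings, indent=2))
```

A caller would wrap each stage, for example `with timer.phase("operator_build"): ...`, and keep the timer outside any JAX profiler context so that a profiler finalization failure cannot discard the timings. A hard kill during a phase loses only that phase's entry.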

Recommended validation sequence after those changes:

1. Re-run `13 x 15 x 8, Nx=5` CPU and GPU, warm and cold, and confirm the
   current remains `FSABjHatOverRootFSAB2 = -0.44600080476476256` at `rho=1/7`.
2. Re-run `17 x 21 x 12, Nx=5` with the solver metadata gate and require
   `sparse_pc_gmres` or another true-residual-converged method.
3. Re-run paired even/odd `Nxi` ladders when the pitch discretization or
   current observable changes, and require the gap to stay below `1.5e-1`.
4. Re-run the finite-beta radial/collisionality profile-current ladder when
   changing geometry loading, interpolation, or closure semantics.
5. Keep the bootstrap-current figure scoped as a reduced-closure stress result
   unless a feasible full-collision RHSMode=1 branch is added.

## NTX-Side Status

NTX now exposes Perfetto and device-memory profiling options in the
prepared-geometry reuse profiler:

```bash
python examples/prepared_geometry_reuse_profile.py \
  --preset smoke --case-counts 3 \
  --output-prefix examples/outputs/ntx_prepared_geometry_profile/cpu_smoke \
  --trace-dir examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace \
  --perfetto \
  --device-memory-profile examples/outputs/ntx_prepared_geometry_profile/cpu_smoke_trace/device_memory.prof
```

The smoke run reports:

- warmup solve: `2.091 s`
- direct three-case path: `0.464 s`
- compiled steady three-case path: `0.000968 s`
- compiled steady speedup vs direct: `4.79e2`
- max compiled relative mismatch: `6.85e-11`
- peak RSS: about `1.83 GB`

This supports the current NTX performance plan: keep fixed-shape prepared
compiled closures as the first optimization target, and avoid broad XLA dump
passes unless a small Perfetto/XPlane trace has localized the bottleneck.
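The reported speedup follows directly from the two three-case timings in the smoke run. A short check, with variable names chosen here for illustration:

```python
direct_s = 0.464        # direct three-case path [s]
compiled_s = 0.000968   # compiled steady three-case path [s]

# Ratio of direct to compiled steady-state wall time.
speedup = direct_s / compiled_s

if __name__ == "__main__":
    print(f"compiled steady speedup vs direct = {speedup:.3g}")
```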