# Numerics And Algorithms

NTX is built around one numerical idea: the Legendre representation of the
monoenergetic drift-kinetic equation leads to a dense block-tridiagonal system
that can be solved in time linear in `N_xi` and cubic in the number of spatial
points `N_fs = N_theta N_zeta`.

This page documents the discretization and the exact algorithm implemented in
the source tree.

## Angular Discretization

The angular grid is uniform in `\theta` and `\zeta`:

```{math}
\theta_j = \frac{2\pi j}{N_\theta},
\qquad
\zeta_\ell = \frac{2\pi \ell}{N_\mathrm{fp} N_\zeta}.
```

This is created by `periodic_grid(...)` in
[`src/ntx/grids.py`](../src/ntx/grids.py).

### Spectral Derivatives

The first-derivative matrices are Fourier-collocation matrices assembled by
differentiating the discrete Fourier basis:

```{math}
D = F^{-1} (ik) F.
```

In code this is `fourier_derivative_matrix(...)` in
[`src/ntx/grids.py`](../src/ntx/grids.py). It works for even and odd grid sizes
and is easy to differentiate because it is built from JAX FFT primitives.

## Dense Spatial Blocks

For each Legendre mode `k`, NTX packs the coefficient fields

```{math}
(c_\theta, c_\zeta, c_0)
```

and constructs the dense operator block

```{math}
\mathcal B_k
=
\mathrm{diag}(c_\theta) D_\theta
+
\mathrm{diag}(c_\zeta) D_\zeta
+
\mathrm{diag}(c_0).
```

This happens in:

- `derivative_blocks(...)`
- `build_block(...)`
- `operator_blocks(...)`

in [`src/ntx/operators.py`](../src/ntx/operators.py).

## Forward Schur Recursion

NTX does not invert the full block-tridiagonal matrix directly. Instead it uses
the Schur-complement recursion described in the thesis.

Starting from the terminal mode,

```{math}
\Delta_{N_\xi} = D_{N_\xi},
\qquad
X_{N_\xi} = \Delta_{N_\xi}^{-1} L_{N_\xi},
```

and then for descending `k`

```{math}
\Delta_k = D_k - U_k X_{k+1},
\qquad
X_k = \Delta_k^{-1} L_k.
```

This is implemented in `_solve_modes(...)` in
[`src/ntx/_solver_factorization.py`](../src/ntx/_solver_factorization.py)
with a `jax.lax.scan`.

### Why Only Modes 0, 1, And 2 Are Stored

The monoenergetic coefficients need only the first three Legendre modes.
Therefore NTX:

- runs the Schur recursion through all `k = N_\xi, \dots, 0`
- saves only `L_k`, `U_k`, and `\Delta_k` for `k = 0,1,2`
- back-substitutes only those low-order modes

That keeps the method dense and simple while avoiding storage of the full
Legendre spectrum in the standard transport workflow.

## Backward Substitution

Once the low-order Schur complements are known, NTX forms reduced right-hand
sides and solves

```{math}
\Delta_0 f^{(0)} = \sigma^{(0)},
\qquad
\Delta_1 f^{(1)} = \sigma^{(1)} - L_1 f^{(0)},
\qquad
\Delta_2 f^{(2)} = \sigma^{(2)} - L_2 f^{(1)}.
```

The code solves the transport and parallel-current systems together whenever
they share the same factorization. That is why `_solve_modes(...)` uses
stacked right-hand sides for several of the LU solves.

## Linear Algebra Choices

NTX uses:

- `jax.scipy.linalg.lu_factor`
- `jax.scipy.linalg.lu_solve`

throughout the dense solve. It does **not** form explicit inverses.

That choice is visible in:

- `_solve_modes(...)`
- `compile_prepared_solver(...)`

in [`src/ntx/_solver_factorization.py`](../src/ntx/_solver_factorization.py)
[`src/ntx/_solver_core.py`](../src/ntx/_solver_core.py), and the prepared
custom-VJP adjoint helpers in
[`src/ntx/_solver_adjoint.py`](../src/ntx/_solver_adjoint.py).

## Complexity

Let

```{math}
N_\mathrm{fs} = N_\theta N_\zeta.
```

Then the dense block solve scales as:

- linear in `N_\xi`
- cubic in `N_\mathrm{fs}`

or, schematically,

```{math}
\mathcal O(N_\xi N_\mathrm{fs}^3).
```

That is why NTX focuses on:

- keeping the Legendre recursion linear in `N_\xi`
- reusing prepared geometry for scans
- exposing separate serial-batched and multiprocess throughput lanes

## JAX Usage

NTX uses JAX in two distinct ways.

### Differentiable Imported Lane

The imported lane is designed for:

- `jit`
- `vmap`
- `grad`
- in-memory scans
- NEOPAX coupling

Key entry points:

- `solve_monoenergetic(...)`
- `solve_monoenergetic_scan(...)`
- `compile_prepared_solver(...)`
- `build_ntx_neopax_scan(...)`

### Non-Differentiable Throughput Lane

The multiprocess execution path in [`src/ntx/parallel.py`](../src/ntx/parallel.py)
is intentionally not differentiable. It exists to improve throughput on larger
production scans by running one worker process per device.

Use this lane when:

- scan size is large
- throughput matters more than differentiability
- device isolation is needed for multi-GPU reliability

## Serial, Threaded, And Multiprocess Scan Paths

NTX currently offers three scan styles.

### Serial Batched JAX

`solve_monoenergetic_scan(...)`

- default
- differentiable
- usually the fastest choice for small and medium scans

### Single-Process Device-Sharded Scan

`solve_monoenergetic_parallel_scan(...)`

- uses local healthy devices
- remains in one process
- useful when the local device stack is stable

### Multiprocess Scan

`solve_monoenergetic_multiprocess_scan(...)`

- one worker process per CPU or GPU device
- robust for multi-GPU isolation
- best treated as the throughput lane for larger jobs

The measured crossover behavior is summarized in
[Performance](performance.md).

## Convergence Guidance

The thesis shows that the hardest part of the solve is often `D31` at low
collisionality. NTX therefore follows the same practical logic:

1. converge `N_xi` first at low `\hat \nu`
2. then increase `N_theta` and `N_zeta` until `D31` is stable
3. check `|D13 + D31|` as an Onsager sanity test

The W7-X and CIEMAT-QI studies in the thesis suggest that low-collisionality
production runs can require `N_xi` in the `140-180` range and significantly
larger angular grids than the bundled examples.

The bundled examples are intentionally smaller because they are meant for:

- quick inspection
- tests
- documentation
- figure generation

not for final production convergence claims on arbitrary equilibria.