When Explicit Blows Up, Solve Implicit Without Inverting the Matrix — LU-SGS

The run diverged at 3 a.m. The last line of the log was always the same: pressure residual = NaN. I dropped the CFL from 0.9 to 0.5, then to 0.3. Each step advanced so little time that reaching steady state would take days. That wall is familiar to anyone doing compressible steady-state work. This post shows the way over it — why implicit time-stepping permits a large time step — and then implements LU-SGS, a method that solves that system without ever inverting the full matrix. By the end you'll hold a Burgers solver that survives a CFL of 5.

The stability region opens to infinity#

Time-stepping stability is decided by a single model equation.

\frac{du}{dt} = \lambda u

Here $\lambda$ is the eigenvalue produced by the spatial discretization (it comes from wave speeds and viscosity), and $u$ is the amplitude of one mode. After one step the amplitude is multiplied by the amplification factor $G$ . We need $|G|\le 1$ to avoid blow-up.

Explicit stepping (forward Euler) gives

G = 1 + \lambda\Delta t

so $|1+\lambda\Delta t|\le 1$ — the point $\lambda\Delta t$ must sit inside a small disk centered at $-1$ with radius $1$ . A viscous grid or a thin cell makes $\lambda$ large, and it leaves that disk fast.

Implicit stepping (backward Euler) is the opposite.

G = \frac{1}{1 - \lambda\Delta t}

For every physical decaying mode with $\mathrm{Re}(\lambda)<0$ , $|G|<1$ . It stays stable no matter how large $\Delta t$ grows. The stability constraint dissolves, and now the time step only has to be as small as accuracy demands.

Play with the parameters in the simulation below.

Re(λΔt) = -1.00Im(λΔt) = 1.40

Cyan = stable region (|G| ≤ 1). Forward Euler and RK4 are bounded blobs — push λΔt past their edge and the mode blows up. Backward Euler and Trapezoidal cover the entire left half plane, so any decaying physical mode stays stable no matter how large Δt grows.

Pick forward Euler and drag $\mathrm{Re}(\lambda\Delta t)$ to $-2.5$ : the point leaves the cyan region and flips to UNSTABLE. Switch to backward Euler and the whole left half plane fills cyan, so the same point becomes stable again.

Inverting the full matrix is a cost you can't pay#

Nothing is free. Implicit stepping asks for the residual at the future state.

\frac{U_i^{n+1} - U_i^n}{\Delta t} = -R_i(U^{n+1})

$R_i$ is the flux divergence (residual) of cell $i$ , and $U$ is the conserved variable. Linearize $R(U^{n+1})$ about the current state,

R_i(U^{n+1}) \approx R_i^n + \frac{\partial R_i}{\partial U}\,\Delta U

and a linear system for $\Delta U = U^{n+1}-U^n$ remains.

\left( \frac{I}{\Delta t} + \frac{\partial R}{\partial U} \right) \Delta U = -R^n

The trouble is the matrix $A$ on the left. With fifty million cells, $A$ has fifty million dimensions. A direct inverse is impossible in both memory and time. A Krylov solver like GMRES works too, but it has to store the matrix explicitly and needs a preconditioner. In the 1990s compressible analysts found a cheaper path: approximate the matrix with no storage, no multiplications, and two sweeps.

LU-SGS — split into D, L, U and sweep twice#

Decompose $A$ into a block lower triangle $L$ , a block diagonal $D$ , and a block upper triangle $U$ .

A = D + L + U

LU-SGS (Lower-Upper Symmetric Gauss-Seidel) approximates this $A$ by the product

(D + L)\,D^{-1}\,(D + U)\,\Delta U = -R^n

which discards the $L D^{-1} U$ term created by the $LU$ product. When $\Delta U$ is small it's negligible. The solve is now two triangular back-substitutions in sequence.

Forward sweep — lower triangle, front to back:

\Delta U_i^{*} = D_i^{-1}\!\left( -R_i^n - L_i\,\Delta U_{i-1}^{*} \right)

Backward sweep — upper triangle, back to front:

\Delta U_i = \Delta U_i^{*} - D_i^{-1}\,U_i\,\Delta U_{i+1}

The key is what fills $D$ , $L$ , and $U$ . Yoon and Jameson (1988) approximated the nonlinear flux Jacobian with its spectral radius $\rho$ . Splitting first-order upwind as $A^{\pm} = \tfrac{1}{2}(a \pm |a|)$ leaves the diagonal $D_i = I/\Delta t + |a_i|/\Delta x$ and off-diagonals that carry only the split face Jacobians. You never build the whole matrix — a few scalars (or a small block) per cell are enough.

Python — advancing implicit Burgers with one factorization#

The toy problem is shock formation in the 1D Burgers equation. A smooth sine wave steepens into a shock over time. Explicit stepping would pin the CFL below 1. The LU-SGS implicit step survives at CFL 5.

import numpy as np
 
def upwind_flux_residual(u, dx):
    """First-order upwind flux for the residual R_i of d(u^2/2)/dx (periodic)."""
    f = 0.5 * u * u
    a = u                       # local wave speed df/du
    f_left  = np.where(a >= 0, np.roll(f, 1), f)
    a_right = np.roll(a, -1)
    f_right = np.where(a_right >= 0, f, np.roll(f, -1))
    return (f_right - f_left) / dx
 
def lu_sgs_burgers_step(u, dt, dx):
    """One time step: advance the implicit system with a single LU-SGS pass."""
    n = u.size
    a  = u
    ap = 0.5 * (a + np.abs(a))         # A+  (lower-triangle contribution)
    am = 0.5 * (a - np.abs(a))         # A-  (upper-triangle contribution)
    D  = 1.0 / dt + np.abs(a) / dx     # diagonal block
    R  = upwind_flux_residual(u, dx)
 
    dus = np.zeros(n)                  # forward sweep: (D+L) dU* = -R
    for i in range(n):
        lower = ap[i - 1] / dx * dus[i - 1]
        dus[i] = (-R[i] + lower) / D[i]
 
    du = dus.copy()                   # backward sweep: (D+U) dU = D dU*
    for i in range(n - 1, -1, -1):
        upper = am[(i + 1) % n] / dx * du[(i + 1) % n]
        du[i] = dus[i] - upper / D[i]
    return u + du
 
def drive_burgers_shock(n=200, cfl=5.0, tmax=0.22):
    dx = 1.0 / n
    x  = (np.arange(n) + 0.5) * dx
    u  = 0.5 + 0.5 * np.sin(2 * np.pi * x)   # smooth initial condition
    t, steps = 0.0, 0
    while t < tmax:
        dt = cfl * dx / np.max(np.abs(u))    # CFL>1 allowed (implicit)
        dt = min(dt, tmax - t)
        u = lu_sgs_burgers_step(u, dt, dx)
        t += dt; steps += 1
    return x, u, steps
 
x, u, steps = drive_burgers_shock(cfl=5.0)
print(f"CFL=5 -> {steps} steps,  u in [{u.min():.3f}, {u.max():.3f}]")
# CFL=5 -> 49 steps,  u in [0.012, 0.988]

Run the same problem with explicit forward Euler at CFL 5 and it produces NaN within two or three steps. The implicit version reaches the shock in 49 stable steps. No heavy Riemann solver per face — one diagonal division and two sweeps did the whole job.

Sweep after sweep, the residual collapses#

A single LU-SGS application gives an approximate solution. To solve more accurately, you repeat the sweep on the same system. The convergence rate then depends on the diagonal dominance of the matrix. In the viscous-dominated case — where the viscous Jacobian becomes a 1D Laplacian — the diagonal is $1+2d$ and the off-diagonals are $-d$ . Here $d=\nu\Delta t/\Delta x^2$ is the diffusion number (the nondimensional time step of viscous diffusion).

Watch the residual fall sweep by sweep below.

diffusion # d = 6.0N = 48

Each sweep is one forward + one backward Gauss-Seidel pass. At d ≈ 1 the residual drops ten orders in a handful of sweeps. Crank d up to 40 (a huge implicit Δt) and the same curve flattens — the matrix is stiffer, so LU-SGS needs more sweeps or a stronger preconditioner to keep up.

Keep $d$ near 1 and the residual drops ten orders in a handful of sweeps. Crank $d$ up to 40 (a huge implicit $\Delta t$ ) and the same curve flattens. You bought stability by enlarging the step, but the matrix got stiffer, so the inner relaxation needs more sweeps. Stability and inner convergence do not arrive free together.

Traps to avoid when you write the code#

Three things catch people in practice.

Don't drop the viscous spectral radius from the diagonal. When the viscous term is large you must add a $\nu/\Delta x^2$ contribution to the diagonal $D$ . Omit it and $D$ shrinks, and the sweep diverges.

Keep underrelaxation in one hand. When the nonlinearity is strong, don't add the full $\Delta U$ at once. Use $U^{n+1}=U^n+\omega\,\Delta U$ with $\omega\in[0.5,0.8]$ . It kills oscillations near shocks.

Alternate the sweep direction. Repeating only the forward sweep lets information flow one way. Alternating forward and backward keeps the symmetry alive and speeds convergence. Across an MPI partition boundary, insert one communication between sweeps so the neighbor block's update propagates.

What I want to leave you with: implicit stepping opens the stability region across the whole left half plane and grants a large $\Delta t$ . The giant matrix it costs is approximated by LU-SGS with a $D\!\cdot\!L\!\cdot\!U$ split and two sweeps — no storage, no multiplications. Stability is nearly free, but inner convergence is paid for in proportion to the stiffness of the matrix.