Reading Time: 7 minutes

Key Takeaways

  • pytest basics cover unit testing. Fixtures, parametrization, and pytest.approx are your starting point, not the finish line.
  • Property-based testing with Hypothesis catches edge cases that manual tests often miss. It is used by major scientific Python projects such as NumPy, JAX, and PyTorch.
  • Numerical convergence testing requires solver-specific tolerance strategies. Standard assertions do not capture convergence behavior.
  • Integration testing for simulation workflows needs mocking patterns that isolate external dependencies while preserving numerical logic.
  • Reproducibility testing goes beyond unit tests. You need deterministic seed management and cross-platform floating-point verification.

What To Know First

If you already know the basics of unit testing for scientific code, you likely understand how to write a test_ function, when to use fixtures, and how pytest.approx handles floating-point comparisons. That is the baseline.

Scientific code introduces challenges that standard unit testing does not fully address. Numerical algorithms converge or fail to converge. Simulations depend on external solvers and I/O systems. Floating-point arithmetic can behave differently across platforms.

Edge cases that never occur in normal data, such as zero divisors, singular matrices, and extreme aspect ratios, are often where the most serious bugs hide.

This article covers testing patterns for scientific Python code beyond the basics.

The Testing Pyramid for Scientific Code

Every testing strategy has a place on the pyramid. For scientific code, the structure looks like this:

    ┌─────────────────────────────┐
    │    System / Simulation Tests │  ← Integration-level workflows
    │    (3-5 tests)               │
    ├─────────────────────────────┤
    │  Integration / Workflow      │  ← Multi-step simulations
    │  Tests (10-20 tests)         │  ← Data pipelines
    ├─────────────────────────────┤
    │  Property-Based Tests        │  ← Hypothesis with numerical properties
    │  (50-100 tests)              │  ← Convergence testing
    ├─────────────────────────────┤
    │  Unit Tests                  │  ← Numerical assertions
    │  (50-200 tests)              │  ← Fixtures for solvers and meshes
    └─────────────────────────────┘

Unit tests form the foundation. Above them are property-based tests, integration tests, and system-level simulation tests. These higher layers make testing useful for real scientific workflows.

Property-Based Testing with Hypothesis

Why Property-Based Testing Matters for Scientific Code

Hypothesis is not only a teaching tool. It is used by major scientific Python libraries to find bugs in core numerical routines.

Property-based testing changes the mental model of testing. Instead of writing tests with one specific input and one expected output, you write tests that verify mathematical properties. Hypothesis then generates many inputs automatically, including edge cases you may not think to test manually.

Converting a Handwritten Test to Property-Based

Consider this test for a numerical solver wrapper:

# Before: handwritten test with single case
def test_solver_preserves_mass():
    """Check mass conservation for one mesh size."""
    result = solver_solve(mass_mesh, dt=0.01)
    mass_before = sum(result[0])
    mass_after = sum(result[-1])
    assert mass_after / mass_before == pytest.approx(1.0, rel=1e-3)

This test checks one mesh size and one time step. It may pass even if the solver fails for larger meshes or different time-step values.

Here is a Hypothesis version:

from hypothesis import given, strategies as st
from hypothesis import settings, health_check, suppress

@given(
    st.integers(min_value=5, max_value=200),
    st.floats(min_value=1e-6, max_value=1.0)
)
@settings(max_examples=100, deadline=None)
@health_check("init_consistency_suppressor")
def test_solver_mass_conservation(n_cells, dt):
    """Mass should be preserved across valid mesh sizes and timesteps."""
    mesh = create_mesh(n_cells)
    result = solver_solve(mesh, dt=dt)
    
    mass_before = float(sum(result[0]))
    mass_after = float(sum(result[-1]))
    
    ratio = mass_after / mass_before
    assert abs(ratio - 1.0) < 1e-2

This catches problems that a handwritten test can miss:

  • Corner cases in mesh generation, including thin domains and extreme aspect ratios.
  • Time-step sensitivity, especially when large dt values destabilize explicit schemes.
  • Floating-point edge cases, including rounding boundaries and unusual numerical ranges.

The Property Checklist for Scientific Functions

Before writing a property test, identify which properties the function should satisfy.

Property Type Example Why It Matters
Conservation Mass, energy, or momentum is conserved Prevents numerical drift
Symmetry f(a, b) == f(b, a) for symmetric operators Catches asymmetric implementation bugs
Scaling f(k*a, k*b) == k^n * f(a, b) for homogeneous functions Tests dimensional consistency
Monotonicity Temperature decreases with distance in a simple heat problem Checks physical correctness
Bounds Values stay within a valid range Prevents non-physical results
Null behavior f(0) returns the expected edge-case value Guards against division by zero and singular cases

Numerical Convergence Testing

Why Convergence Testing Is Different

Unit tests check whether code produces a specific answer. Convergence tests check whether code approaches the right answer as parameters are refined.

This is critical for iterative solvers, time-stepping schemes, and mesh-refinement studies. Standard assertions such as assert result == expected do not capture convergence behavior. You need to test convergence patterns.

Testing Solver Convergence

Different solver suites use different convergence criteria. The key point is that convergence criteria are solver-specific, and the default tolerance may not be appropriate for every problem.

import numpy as np

def test_solver_convergence():
    """Test that the solver converges with mesh refinement."""
    residuals = []
    for n in [16, 32, 64, 128, 256]:
        mesh = create_mesh(n)
        result = solve_with_convergence(
            mesh,
            criterion="default",
            tolerance=1e-8
        )
        residuals.append(result.residual)
    
    residuals = np.array(residuals)
    
    # Test that residuals decrease monotonically with tolerance
    for i in range(1, len(residuals)):
        if residuals[i] > residuals[i-1] * 1.01:
            raise AssertionError(
                f"Residual increased at refinement level {i}: "
                f"{residuals[i-1]:.2e} → {residuals[i]:.2e}"
            )
    
    # Test convergence rate
    rate = np.log(residuals[0] / residuals[-1]) / np.log(16)
    assert rate >= 1.8, f"Convergence rate too slow: {rate:.2f}"

Tolerance Strategies

Tests should cover the most common tolerance criteria used by the solver stack.

@pytest.mark.parametrize("criterion", [
    "default",
    "unscaled",
    "preconditioned",
    "natural",
])
def test_convergence_criteria(criterion):
    """Verify each criterion produces reasonable residuals."""
    mesh = create_mesh(64)
    result = solve_with_convergence(mesh, criterion=criterion)
    
    assert result.status in (
        "convergence",
        "absolute_tolerance_convergence",
        "relative_tolerance_convergence",
        "happy_breakdown"
    ) or result.residual < 1e-2

Testing Divergence Cases

You should also test that the code handles divergence gracefully.

def test_divergence_handling():
    """Code should not crash on divergent cases."""
    mesh = create_mesh(64)
    result = solve_with_convergence(
        mesh,
        criterion="default",
        divergence_tolerance=100
    )
    
    assert hasattr(result, "status")
    assert hasattr(result, "residual")

Integration Testing for Simulation Workflows

Why Integration Testing Matters

Unit tests verify individual functions. Integration tests verify that functions work together correctly.

For scientific code, this means testing multi-step simulations, data pipelines, solver-coupling patterns, and I/O workflows.

Pattern 1: Mocking External Dependencies in Scientific Code

When testing functions that call external APIs, databases, or file systems, use mocking to isolate the numerical logic.

from unittest.mock import patch, MagicMock

def test_simulation_io():
    """Test simulation I/O without actually writing to disk."""
    
    with patch("h5py.File") as mock_file:
        mock_dataset = MagicMock()
        mock_file.return_value = MagicMock()
        mock_file.return_value.__enter__ = MagicMock(return_value=mock_file)
        mock_file.return_value.__exit__ = MagicMock(return_value=None)
        
        data = prepare_simulation_output(mesh_size=(100, 100, 100))
        
        assert "simulation_data" in mock_file.return_value.keys()
        assert data.shape == (100, 100, 100)

This isolates the data preparation logic from the HDF5 file write. The test verifies that your code formats and packages data correctly without waiting on I/O.

Pattern 2: Parametrized Fixtures for Simulation Sweeps

Instead of manually testing a few mesh sizes, use parametrized fixtures for systematic sweeps.

@pytest.fixture(params=[10, 20, 40, 80, 160])
def mesh_sizes(request):
    """Systematic mesh refinement series."""
    return request.param

@pytest.fixture(params=[
    (1e-3, "fine"),
    (1e-2, "medium"), 
    (1e-1, "coarse")
])
def timesteps(request):
    """Test different timestep regimes."""
    dt, label = request.param
    return dt, label

def test_convergence_with_sweeps(mesh_sizes, timesteps):
    """Test convergence across mesh and timestep refinement."""
    dt, label = timesteps
    
    result = solve_convergence_study(
        mesh_sizes,
        dt=dt,
        quantity="mass_conservation"
    )
    
    assert len(result.residuals) == len(result.mesh_sizes)
    assert all(r > 0 for r in result.residuals)

Pattern 3: Testing Multi-Step Workflows

Scientific simulations often have multi-step pipelines: mesh generation, solving, post-processing, and writing output.

def test_full_workflow():
    """Test the complete simulation pipeline."""
    # Step 1: Generate mesh
    mesh = generate_mesh(geometry="block", resolution=40)
    
    # Step 2: Solve
    solution = solve(mesh, solver="scipy", tolerance=1e-8)
    
    # Step 3: Post-process
    metrics = compute_metrics(solution, quantities=["mass", "energy"])
    
    # Step 4: Verify outputs
    assert metrics["mass"] > 0
    assert metrics["energy"] < 1e6
    
    # Step 5: Write output
    output_path = write_output(solution, path="/tmp/test_output.h5")
    
    # Verify output is readable
    readback = h5py.File(output_path, "r")
    assert "solution" in readback.keys()

Advanced Mocking for Scientific Libraries

The Challenge

Testing scientific code can be slow because NumPy, SciPy, and FiPy functions may have complex return types and expensive execution paths. Mocking them lets you test wrapper logic quickly.

import numpy as np
from unittest.mock import patch

def test_solver_wrapper():
    """Test that your wrapper calls scipy.sparse.linalg correctly."""
    
    with patch("scipy.sparse.linalg.spsolve") as mock_solve:
        mock_solve.return_value = np.zeros(100)
        
        result = your_solver_wrapper(mesh_size=100)
        
        mock_solve.assert_called_once()
        call_args = mock_solve.call_args
        assert isinstance(call_args[0], np.ndarray)
        assert isinstance(call_args[1], np.ndarray)

This verifies wrapper logic without waiting for the actual solver to run. The test can finish in milliseconds instead of seconds.

Testing Roundtrips for Scientific Data

Testing that scientific data survives a read-write-read cycle is a powerful pattern.

from hypothesis import given, strategies as st

@given(st.integers(min_value=10, max_value=200))
def test_hdf5_roundtrip(n_cells):
    """Writing and reading HDF5 data should preserve values."""
    mesh = create_mesh(n_cells)
    solution = solve(mesh, dt=0.01)
    
    path = write_to_hdf5(solution, path="/tmp/test_roundtrip.h5")
    
    readback = read_from_hdf5(path)
    
    assert np.allclose(readback, solution, rtol=1e-6, atol=1e-8)

This catches data serialization bugs, dtype mismatches, and HDF5 chunking issues that could silently corrupt results.

Testing Reproducibility

Deterministic Seed Management

Floating-point operations can be deterministic when the seed is fixed, but reproducibility also requires consistent library versions, compilation flags, and MPI decomposition patterns.

def test_reproducible_seed():
    """Fixed seed should produce identical results."""
    np.random.seed(42)
    
    mesh1 = generate_mesh(seed=42)
    solution1 = solve(mesh1)
    
    np.random.seed(42)
    mesh2 = generate_mesh(seed=42)
    solution2 = solve(mesh2)
    
    assert np.allclose(solution1, solution2)

Cross-Platform Floating-Point Testing

Different platforms may produce slightly different floating-point results because of hardware and library differences.

def test_floating_point_stability():
    """Results should be consistent across platforms within tolerance."""
    mesh = create_mesh(64)
    solution = solve(mesh)
    
    # All values should be finite
    assert np.all(np.isfinite(solution))
    
    # Values should not be suspiciously small
    assert np.all(np.abs(solution) > np.finfo(np.float64).tiny * 1e-3)

What We Recommend: The Testing Strategy

For a production scientific Python project, use a layered testing strategy.

Must Have

  1. Unit tests with pytest.approx for numerical assertions.
  2. Parametrized fixtures for systematic testing across parameters.
  3. Convergence tests that verify solver behavior across refinement.

Should Have

  1. Hypothesis property-based tests for edge cases and numerical robustness.
  2. Integration tests for multi-step simulation workflows.
  3. Mocking strategies for external dependencies and solver interfaces.

Nice to Have

  1. Cross-platform reproducibility tests.
  2. Data roundtrip tests that write, read, and verify scientific data.
  3. Performance benchmarks with pytest-benchmark.

Common Mistakes

Mistake 1: Using Tolerance-Dependent Assertions Incorrectly

Do not write numerical tests like assert result == expected. Use pytest.approx or tolerance-aware assertions for numerical code. Tolerance depends on problem scale, so a fixed tolerance may fail for larger problems.

Mistake 2: Ignoring Convergence

If code uses iterative solvers, do not skip convergence testing. A solver that works on one mesh may diverge on another. Test convergence behavior, not only the final result.

Mistake 3: Testing Only the Happy Path

Property-based testing forces you to test edge cases that manual tests often miss. Add at least one Hypothesis test for each major numerical function. It can reveal bugs that normal example-based tests never catch.

Summary

Unit tests are necessary, but they are not sufficient for scientific code. Property-based testing, convergence testing, integration workflows, and reproducibility checks address the specific challenges that numerical simulations introduce.

The key insight is that scientific testing is not only about verifying that code produces one right answer. It is about verifying that code behaves correctly. It should converge, conserve, scale, and handle edge cases gracefully.

Start with the must-have patterns, then add the should-have patterns as the test suite matures. Use the nice-to-have patterns to harden the codebase over time.

Property-based testing with Hypothesis is one of the highest-value additions you can make because it catches bugs that manual tests often miss.

Related Guides

What To Do Next

If you have just started writing tests for scientific code, begin with the must-have patterns above. Once those are in place, add Hypothesis tests for the most important numerical functions, especially functions called repeatedly in simulations.

For teams working on production scientific software, the investment in property-based testing can pay off quickly. It is the same strategy used by major scientific Python libraries and is one of the most effective ways to find edge-case bugs in numerical code.

If you need help designing a testing strategy for a simulation project, consider booking a consultation. A practical review can map your code structure to unit tests, convergence tests, integration tests, and reproducibility checks.