Key Takeaways
- pytest basics cover unit testing. Fixtures, parametrization, and
pytest.approxare your starting point, not the finish line. - Property-based testing with Hypothesis catches edge cases that manual tests often miss. It is used by major scientific Python projects such as NumPy, JAX, and PyTorch.
- Numerical convergence testing requires solver-specific tolerance strategies. Standard assertions do not capture convergence behavior.
- Integration testing for simulation workflows needs mocking patterns that isolate external dependencies while preserving numerical logic.
- Reproducibility testing goes beyond unit tests. You need deterministic seed management and cross-platform floating-point verification.
What To Know First
If you already know the basics of unit testing for scientific code, you likely understand how to write a test_ function, when to use fixtures, and how pytest.approx handles floating-point comparisons. That is the baseline.
Scientific code introduces challenges that standard unit testing does not fully address. Numerical algorithms converge or fail to converge. Simulations depend on external solvers and I/O systems. Floating-point arithmetic can behave differently across platforms.
Edge cases that never occur in normal data, such as zero divisors, singular matrices, and extreme aspect ratios, are often where the most serious bugs hide.
This article covers testing patterns for scientific Python code beyond the basics.
The Testing Pyramid for Scientific Code
Every testing strategy has a place on the pyramid. For scientific code, the structure looks like this:
┌─────────────────────────────┐
│ System / Simulation Tests │ ← Integration-level workflows
│ (3-5 tests) │
├─────────────────────────────┤
│ Integration / Workflow │ ← Multi-step simulations
│ Tests (10-20 tests) │ ← Data pipelines
├─────────────────────────────┤
│ Property-Based Tests │ ← Hypothesis with numerical properties
│ (50-100 tests) │ ← Convergence testing
├─────────────────────────────┤
│ Unit Tests │ ← Numerical assertions
│ (50-200 tests) │ ← Fixtures for solvers and meshes
└─────────────────────────────┘
Unit tests form the foundation. Above them are property-based tests, integration tests, and system-level simulation tests. These higher layers make testing useful for real scientific workflows.
Property-Based Testing with Hypothesis
Why Property-Based Testing Matters for Scientific Code
Hypothesis is not only a teaching tool. It is used by major scientific Python libraries to find bugs in core numerical routines.
Property-based testing changes the mental model of testing. Instead of writing tests with one specific input and one expected output, you write tests that verify mathematical properties. Hypothesis then generates many inputs automatically, including edge cases you may not think to test manually.
Converting a Handwritten Test to Property-Based
Consider this test for a numerical solver wrapper:
# Before: handwritten test with single case
def test_solver_preserves_mass():
"""Check mass conservation for one mesh size."""
result = solver_solve(mass_mesh, dt=0.01)
mass_before = sum(result[0])
mass_after = sum(result[-1])
assert mass_after / mass_before == pytest.approx(1.0, rel=1e-3)
This test checks one mesh size and one time step. It may pass even if the solver fails for larger meshes or different time-step values.
Here is a Hypothesis version:
from hypothesis import given, strategies as st
from hypothesis import settings, health_check, suppress
@given(
st.integers(min_value=5, max_value=200),
st.floats(min_value=1e-6, max_value=1.0)
)
@settings(max_examples=100, deadline=None)
@health_check("init_consistency_suppressor")
def test_solver_mass_conservation(n_cells, dt):
"""Mass should be preserved across valid mesh sizes and timesteps."""
mesh = create_mesh(n_cells)
result = solver_solve(mesh, dt=dt)
mass_before = float(sum(result[0]))
mass_after = float(sum(result[-1]))
ratio = mass_after / mass_before
assert abs(ratio - 1.0) < 1e-2
This catches problems that a handwritten test can miss:
- Corner cases in mesh generation, including thin domains and extreme aspect ratios.
- Time-step sensitivity, especially when large
dtvalues destabilize explicit schemes. - Floating-point edge cases, including rounding boundaries and unusual numerical ranges.
The Property Checklist for Scientific Functions
Before writing a property test, identify which properties the function should satisfy.
| Property Type | Example | Why It Matters |
|---|---|---|
| Conservation | Mass, energy, or momentum is conserved | Prevents numerical drift |
| Symmetry | f(a, b) == f(b, a) for symmetric operators |
Catches asymmetric implementation bugs |
| Scaling | f(k*a, k*b) == k^n * f(a, b) for homogeneous functions |
Tests dimensional consistency |
| Monotonicity | Temperature decreases with distance in a simple heat problem | Checks physical correctness |
| Bounds | Values stay within a valid range | Prevents non-physical results |
| Null behavior | f(0) returns the expected edge-case value |
Guards against division by zero and singular cases |
Numerical Convergence Testing
Why Convergence Testing Is Different
Unit tests check whether code produces a specific answer. Convergence tests check whether code approaches the right answer as parameters are refined.
This is critical for iterative solvers, time-stepping schemes, and mesh-refinement studies. Standard assertions such as assert result == expected do not capture convergence behavior. You need to test convergence patterns.
Testing Solver Convergence
Different solver suites use different convergence criteria. The key point is that convergence criteria are solver-specific, and the default tolerance may not be appropriate for every problem.
import numpy as np
def test_solver_convergence():
"""Test that the solver converges with mesh refinement."""
residuals = []
for n in [16, 32, 64, 128, 256]:
mesh = create_mesh(n)
result = solve_with_convergence(
mesh,
criterion="default",
tolerance=1e-8
)
residuals.append(result.residual)
residuals = np.array(residuals)
# Test that residuals decrease monotonically with tolerance
for i in range(1, len(residuals)):
if residuals[i] > residuals[i-1] * 1.01:
raise AssertionError(
f"Residual increased at refinement level {i}: "
f"{residuals[i-1]:.2e} → {residuals[i]:.2e}"
)
# Test convergence rate
rate = np.log(residuals[0] / residuals[-1]) / np.log(16)
assert rate >= 1.8, f"Convergence rate too slow: {rate:.2f}"
Tolerance Strategies
Tests should cover the most common tolerance criteria used by the solver stack.
@pytest.mark.parametrize("criterion", [
"default",
"unscaled",
"preconditioned",
"natural",
])
def test_convergence_criteria(criterion):
"""Verify each criterion produces reasonable residuals."""
mesh = create_mesh(64)
result = solve_with_convergence(mesh, criterion=criterion)
assert result.status in (
"convergence",
"absolute_tolerance_convergence",
"relative_tolerance_convergence",
"happy_breakdown"
) or result.residual < 1e-2
Testing Divergence Cases
You should also test that the code handles divergence gracefully.
def test_divergence_handling():
"""Code should not crash on divergent cases."""
mesh = create_mesh(64)
result = solve_with_convergence(
mesh,
criterion="default",
divergence_tolerance=100
)
assert hasattr(result, "status")
assert hasattr(result, "residual")
Integration Testing for Simulation Workflows
Why Integration Testing Matters
Unit tests verify individual functions. Integration tests verify that functions work together correctly.
For scientific code, this means testing multi-step simulations, data pipelines, solver-coupling patterns, and I/O workflows.
Pattern 1: Mocking External Dependencies in Scientific Code
When testing functions that call external APIs, databases, or file systems, use mocking to isolate the numerical logic.
from unittest.mock import patch, MagicMock
def test_simulation_io():
"""Test simulation I/O without actually writing to disk."""
with patch("h5py.File") as mock_file:
mock_dataset = MagicMock()
mock_file.return_value = MagicMock()
mock_file.return_value.__enter__ = MagicMock(return_value=mock_file)
mock_file.return_value.__exit__ = MagicMock(return_value=None)
data = prepare_simulation_output(mesh_size=(100, 100, 100))
assert "simulation_data" in mock_file.return_value.keys()
assert data.shape == (100, 100, 100)
This isolates the data preparation logic from the HDF5 file write. The test verifies that your code formats and packages data correctly without waiting on I/O.
Pattern 2: Parametrized Fixtures for Simulation Sweeps
Instead of manually testing a few mesh sizes, use parametrized fixtures for systematic sweeps.
@pytest.fixture(params=[10, 20, 40, 80, 160])
def mesh_sizes(request):
"""Systematic mesh refinement series."""
return request.param
@pytest.fixture(params=[
(1e-3, "fine"),
(1e-2, "medium"),
(1e-1, "coarse")
])
def timesteps(request):
"""Test different timestep regimes."""
dt, label = request.param
return dt, label
def test_convergence_with_sweeps(mesh_sizes, timesteps):
"""Test convergence across mesh and timestep refinement."""
dt, label = timesteps
result = solve_convergence_study(
mesh_sizes,
dt=dt,
quantity="mass_conservation"
)
assert len(result.residuals) == len(result.mesh_sizes)
assert all(r > 0 for r in result.residuals)
Pattern 3: Testing Multi-Step Workflows
Scientific simulations often have multi-step pipelines: mesh generation, solving, post-processing, and writing output.
def test_full_workflow():
"""Test the complete simulation pipeline."""
# Step 1: Generate mesh
mesh = generate_mesh(geometry="block", resolution=40)
# Step 2: Solve
solution = solve(mesh, solver="scipy", tolerance=1e-8)
# Step 3: Post-process
metrics = compute_metrics(solution, quantities=["mass", "energy"])
# Step 4: Verify outputs
assert metrics["mass"] > 0
assert metrics["energy"] < 1e6
# Step 5: Write output
output_path = write_output(solution, path="/tmp/test_output.h5")
# Verify output is readable
readback = h5py.File(output_path, "r")
assert "solution" in readback.keys()
Advanced Mocking for Scientific Libraries
The Challenge
Testing scientific code can be slow because NumPy, SciPy, and FiPy functions may have complex return types and expensive execution paths. Mocking them lets you test wrapper logic quickly.
import numpy as np
from unittest.mock import patch
def test_solver_wrapper():
"""Test that your wrapper calls scipy.sparse.linalg correctly."""
with patch("scipy.sparse.linalg.spsolve") as mock_solve:
mock_solve.return_value = np.zeros(100)
result = your_solver_wrapper(mesh_size=100)
mock_solve.assert_called_once()
call_args = mock_solve.call_args
assert isinstance(call_args[0], np.ndarray)
assert isinstance(call_args[1], np.ndarray)
This verifies wrapper logic without waiting for the actual solver to run. The test can finish in milliseconds instead of seconds.
Testing Roundtrips for Scientific Data
Testing that scientific data survives a read-write-read cycle is a powerful pattern.
from hypothesis import given, strategies as st
@given(st.integers(min_value=10, max_value=200))
def test_hdf5_roundtrip(n_cells):
"""Writing and reading HDF5 data should preserve values."""
mesh = create_mesh(n_cells)
solution = solve(mesh, dt=0.01)
path = write_to_hdf5(solution, path="/tmp/test_roundtrip.h5")
readback = read_from_hdf5(path)
assert np.allclose(readback, solution, rtol=1e-6, atol=1e-8)
This catches data serialization bugs, dtype mismatches, and HDF5 chunking issues that could silently corrupt results.
Testing Reproducibility
Deterministic Seed Management
Floating-point operations can be deterministic when the seed is fixed, but reproducibility also requires consistent library versions, compilation flags, and MPI decomposition patterns.
def test_reproducible_seed():
"""Fixed seed should produce identical results."""
np.random.seed(42)
mesh1 = generate_mesh(seed=42)
solution1 = solve(mesh1)
np.random.seed(42)
mesh2 = generate_mesh(seed=42)
solution2 = solve(mesh2)
assert np.allclose(solution1, solution2)
Cross-Platform Floating-Point Testing
Different platforms may produce slightly different floating-point results because of hardware and library differences.
def test_floating_point_stability():
"""Results should be consistent across platforms within tolerance."""
mesh = create_mesh(64)
solution = solve(mesh)
# All values should be finite
assert np.all(np.isfinite(solution))
# Values should not be suspiciously small
assert np.all(np.abs(solution) > np.finfo(np.float64).tiny * 1e-3)
What We Recommend: The Testing Strategy
For a production scientific Python project, use a layered testing strategy.
Must Have
- Unit tests with
pytest.approxfor numerical assertions. - Parametrized fixtures for systematic testing across parameters.
- Convergence tests that verify solver behavior across refinement.
Should Have
- Hypothesis property-based tests for edge cases and numerical robustness.
- Integration tests for multi-step simulation workflows.
- Mocking strategies for external dependencies and solver interfaces.
Nice to Have
- Cross-platform reproducibility tests.
- Data roundtrip tests that write, read, and verify scientific data.
- Performance benchmarks with
pytest-benchmark.
Common Mistakes
Mistake 1: Using Tolerance-Dependent Assertions Incorrectly
Do not write numerical tests like assert result == expected. Use pytest.approx or tolerance-aware assertions for numerical code. Tolerance depends on problem scale, so a fixed tolerance may fail for larger problems.
Mistake 2: Ignoring Convergence
If code uses iterative solvers, do not skip convergence testing. A solver that works on one mesh may diverge on another. Test convergence behavior, not only the final result.
Mistake 3: Testing Only the Happy Path
Property-based testing forces you to test edge cases that manual tests often miss. Add at least one Hypothesis test for each major numerical function. It can reveal bugs that normal example-based tests never catch.
Summary
Unit tests are necessary, but they are not sufficient for scientific code. Property-based testing, convergence testing, integration workflows, and reproducibility checks address the specific challenges that numerical simulations introduce.
The key insight is that scientific testing is not only about verifying that code produces one right answer. It is about verifying that code behaves correctly. It should converge, conserve, scale, and handle edge cases gracefully.
Start with the must-have patterns, then add the should-have patterns as the test suite matures. Use the nice-to-have patterns to harden the codebase over time.
Property-based testing with Hypothesis is one of the highest-value additions you can make because it catches bugs that manual tests often miss.
Related Guides
- Unit Testing for Scientific Code: pytest Strategies for Research Projects — The foundation you need before this article.
- When to Use FEM, FVM, or FDM: A Practical Comparison for Beginners — Context on numerical methods.
- Mesh Quality and Convergence Studies: A Hands-On Guide for Scientific Simulations — Related convergence testing context.
What To Do Next
If you have just started writing tests for scientific code, begin with the must-have patterns above. Once those are in place, add Hypothesis tests for the most important numerical functions, especially functions called repeatedly in simulations.
For teams working on production scientific software, the investment in property-based testing can pay off quickly. It is the same strategy used by major scientific Python libraries and is one of the most effective ways to find edge-case bugs in numerical code.
If you need help designing a testing strategy for a simulation project, consider booking a consultation. A practical review can map your code structure to unit tests, convergence tests, integration tests, and reproducibility checks.