Managing dependencies in scientific Python is one of the most important parts of reproducible research. A simulation can fail months later because package versions changed, the Python interpreter was upgraded, or a numerical library was updated in a way that affects results.
This guide explains how lockfiles, environments, and modern package managers help researchers keep scientific Python workflows reproducible, portable, and easier to maintain.
Key Takeaways
- Lockfiles are one of the most impactful steps for reproducible scientific Python workflows.
- uv is a fast modern package manager and is increasingly used as an alternative to pip, pip-tools, and Poetry.
- PEP 751 introduces a standardized lockfile format, pylock.toml, to reduce tool fragmentation.
- Conda remains important for scientific Python projects with non-Python dependencies such as C, C++, MPI, HDF5, and CUDA.
- Your tool choice should match your workflow: uv for speed and reproducibility, conda for scientific and HPC stacks, and Poetry for mature publishing pipelines.
The Hidden Failure Mode in Scientific Python
Every scientific computing researcher eventually sees the same problem. A simulation worked last month, but now it fails with a version mismatch. Worse, it may still run but produce subtly different results.
This is not just an inconvenience. It is a reproducibility failure. Some part of the computational environment changed: package versions, the Python interpreter, or lower-level dependencies. As a result, your code either breaks or silently produces results that are no longer equivalent to the original run.
The root cause is usually unpinned dependencies. Even when code is version-controlled, the work remains tied to a specific machine and moment in time if the environment is not reproducible.
Scientific Python makes this problem especially serious. Unlike many web projects, scientific computing often depends on complex chains of numerical libraries, compiled packages, and hardware-sensitive builds. NumPy, SciPy, FiPy, HDF5, MPI, OpenBLAS, and specific Python versions all need to work together.
The solution is to manage dependencies deliberately. Use lockfiles to freeze exact versions, track environments in version control, and document how the environment should be recreated.
What Is a Lockfile?
A lockfile is a snapshot of a project’s exact software environment. It records the specific versions of all dependencies, including sub-dependencies, that were installed when the project was last configured.
Think of it as a frozen recipe for your software setup. Anyone can recreate the same environment later, even if package versions have changed upstream.
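To make the "snapshot" idea concrete, here is a minimal sketch that records the exact versions installed in the current interpreter using only the standard library. It is an illustration of what a lockfile captures, not a replacement for one: real lockfiles also record hashes, sources, and platform markers. The function name `snapshot_environment` is made up for this example.

```python
from importlib import metadata


def snapshot_environment() -> dict[str, str]:
    """Return {distribution name: exact installed version} for this interpreter."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with missing metadata
            snapshot[name.lower()] = version
    return dict(sorted(snapshot.items()))


if __name__ == "__main__":
    # Prints one "name==version" line per installed distribution.
    for name, version in snapshot_environment().items():
        print(f"{name}=={version}")
```

The output resembles a fully pinned requirements list, which is the core of what any lockfile stores.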
Without Lockfiles
# requirements.txt without a lockfile
numpy>=1.25.0
scipy>=1.11.0
fipy>=4.3.0
# On another machine or after a month:
pip install -r requirements.txt
# This might install newer versions:
# numpy 1.26.x, scipy 1.12.x, or different dependency builds
This kind of setup allows package versions to shift. The code may still run, but numerical behavior, precision, solver behavior, or dependency internals may change.
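The drift described above comes from the `>=` operator itself. The sketch below uses a deliberately simplified version comparison (plain `X.Y.Z` strings only; real installers follow the full PEP 440 rules) to show that one minimum-version constraint accepts many different releases. The candidate list is illustrative; what a package index offers changes over time.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(candidate: str, minimum: str) -> bool:
    """True if `candidate` satisfies a 'package>=minimum' constraint."""
    return parse_version(candidate) >= parse_version(minimum)


# Illustrative NumPy releases; an index may offer more by the time you install.
candidates = ["1.24.4", "1.25.0", "1.26.4", "2.0.0"]
allowed = [v for v in candidates if satisfies_minimum(v, "1.25.0")]
print(allowed)  # ['1.25.0', '1.26.4', '2.0.0']
```

Every entry in `allowed` is a valid install for `numpy>=1.25.0`, including a new major version, so two machines can both honor the requirement and still end up with different code.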
With Lockfiles
# uv.lock or another lockfile records exact resolved versions
uv sync
# Anyone who runs the sync command gets the same resolved environment
This matters for simulation projects because numerical libraries can change code paths across versions. Floating-point behavior can differ between builds. PDE solvers and scientific frameworks can also behave differently across releases.
A lockfile reduces this uncertainty by making installation deterministic.
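The floating-point sensitivity mentioned above is easy to demonstrate. Floating-point addition is not associative, so a library update that merely reorders a reduction (for example through different vectorization or a different BLAS build) can change the result. This small sketch uses only the standard library.

```python
import math

values = [1e16, 1.0, -1e16]

# Adding left to right: 1.0 is absorbed by 1e16 before the large terms cancel.
left_to_right = (values[0] + values[1]) + values[2]

# Cancelling the large terms first preserves the small one.
reordered = (values[0] + values[2]) + values[1]

print(left_to_right)      # 0.0
print(reordered)          # 1.0
print(math.fsum(values))  # 1.0 (correctly rounded sum)
```

Both orderings are legitimate implementations of "sum these numbers", which is why pinning the exact library builds matters when results must match bit for bit.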
The Python Dependency Management Landscape
The Python ecosystem has several practical dependency management options. Each tool has different strengths, and the right choice depends on whether the project is pure Python, HPC-focused, package-oriented, or built for long-term reproducibility.
uv: The Modern Fast Option
uv is a Rust-based package manager built by Astral. It aims to replace several common Python tools, including pip, pip-tools, pipx, pyenv, and virtualenv, with a single fast workflow.
Researchers are switching to uv for several reasons:
- Speed. uv can install packages much faster than traditional pip-based workflows, especially with caching.
- Universal lockfile. A single uv.lock can support reproducible installs across platforms.
- pip compatibility. Commands such as uv pip install make migration easier.
- Python version management. uv can install and manage Python interpreters, reducing the need for separate tools.
- Standalone installation. uv does not require Python to be installed first.
Use uv for new projects where speed, reproducibility, and simple migration from pip matter.
It is especially useful for:
- New simulation projects.
- CI/CD pipelines where install time matters.
- Projects that need dependency management and Python version management together.
- Teams migrating from pip or pip-tools.
Basic uv Workflow
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create a new project with Python 3.11
uv init my-simulation --python 3.11
cd my-simulation
# Add dependencies
uv add numpy scipy fipy
# Re-resolve the lockfile explicitly (uv add above already wrote uv.lock)
uv lock
# Sync to install from the lockfile
uv sync
The main trade-off is maturity: Poetry still has more established publishing workflows and dependency group handling. If you maintain a scientific Python package for PyPI, Poetry may still be attractive.
Poetry: The Established Project Manager
Poetry is a complete project management tool. It handles dependency resolution, virtual environments, package building, and publishing to PyPI.
Researchers still choose Poetry for several reasons:
- Dependency groups. Poetry can separate development, testing, documentation, and CI dependencies cleanly.
- Publishing workflow. It has built-in support for building and publishing packages.
- Mature ecosystem. Poetry has years of production use, extensive documentation, and a large community.
Use Poetry when:
- You publish a Python package to PyPI.
- You need mature dependency group management.
- Your team values a long track record and established documentation.
The trade-off is speed. Poetry can take longer than uv for cold installs, lockfile generation, and package additions. For small projects, this may not matter. For large projects and CI pipelines, the difference can become noticeable.
Conda: The Scientific Staple
Conda remains one of the most important tools for scientific Python, especially when the project needs non-Python dependencies.
Conda is useful for research because it can manage Python packages, R packages, compiled libraries, compilers, MPI, HDF5, CUDA, and other system-level dependencies in one environment.
Conda is important when:
- You need non-Python dependencies such as C, C++, MPI, HDF5, FFTW, or CUDA.
- You target HPC clusters.
- Your scientific libraries depend on compiled C or Fortran code.
- You work across Python and R.
The trade-off is that conda can be slower than uv for pure-Python dependency resolution and may not produce universal cross-platform lockfiles as easily.
Basic Conda Workflow
# Create environment
conda create -n my-sim python=3.11
conda activate my-sim
# Install packages, including non-Python dependencies
conda install numpy scipy hdf5 openmpi
# Export to environment file
conda env export --no-builds | grep -v "prefix:" > environment.yml
# Restore the environment
conda env create -f environment.yml
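The export command above drops two machine-specific details: build strings and the `prefix:` line. The following sketch reproduces that clean-up in Python to show what is being removed and why. It is a line-based illustration rather than a YAML parser, the helper names are invented for this example, and the build string in the usage note is a placeholder, not a real conda build.

```python
def strip_build(spec: str) -> str:
    """Turn 'name=version=build' into 'name=version'; leave other specs unchanged."""
    if "==" in spec:  # pip-style pins inside the 'pip:' block have no build part
        return spec
    return "=".join(spec.split("=")[:2])


def portable_lines(exported: str) -> list[str]:
    """Drop the machine-specific prefix line and build strings from an export."""
    result = []
    for line in exported.splitlines():
        if line.startswith("prefix:"):
            continue  # absolute install path on the exporting machine
        stripped = line.lstrip()
        if stripped.startswith("- "):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}- {strip_build(stripped[2:])}"
        result.append(line)
    return result
```

For example, an entry such as `- numpy=1.26.4=build_placeholder_0` becomes `- numpy=1.26.4`. Build strings identify a platform-specific binary, so keeping them usually prevents the file from resolving on a different operating system.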
pip-tools: The Minimalist Option
pip-tools bridges traditional pip workflows and lockfiles. It is lightweight and stays close to the pip interface.
Use pip-tools when:
- You want a simple approach.
- You are migrating from pip and do not want to learn a larger tool.
- Your project is small and does not need advanced dependency groups.
The trade-off is that pip-tools can generate platform-specific output files. It is also less comprehensive than Poetry or uv.
PEP 751: The Future of Lockfiles
PEP 751, accepted in 2025, defines a standardized file format for recording Python dependencies so environments can be installed reproducibly. The file is named pylock.toml.
What It Solves
Python dependency tools have historically used different lockfile formats. PDM, pip freeze, pip-tools, Poetry, and uv all approach environment locking differently.
This creates several problems:
- Vendor lock-in. It can be difficult to switch between tools.
- Tooling fragmentation. Security scanners and automation tools may support only some formats.
- Auditing difficulty. Different formats have different syntax and conventions.
PEP 751 addresses these problems with a single standardized format, pylock.toml.
Why It Matters for Scientific Python
The format is useful for research because it is:
- Human-readable, using TOML.
- Machine-generated, so tools can write consistent output.
- Consumable by non-Python tools.
- Secure by design, with cryptographic hashes for supply chain protection.
- Flexible enough to represent multiple environments or dependency groups.
Once adopted widely, pylock.toml can reduce the need for tool-specific lockfile formats. For researchers, this means better interoperability. A lockfile could be consumed by any compliant tool and audited more easily by external services.
The Reproducibility Spectrum
Reproducibility is a spectrum. The right level depends on the risk, project duration, and publication requirements.
| Level | Approach | Cost | Reproducibility | Best For |
|---|---|---|---|---|
| Good | Document dependencies in README | Minimal | Manual verification | Quick scripts and tutorials |
| Better | Environment file such as requirements.txt or environment.yml | Low | Automated install | Shared projects and collaborators |
| Best | Lockfile plus version-controlled environment | Moderate | Exact reproduction | Publications and long-term archives |
| Maximum | Containerization with Docker or Singularity | High | Strong isolation | HPC and published workflows |
For research projects, the minimum is to pin exact versions in requirements.txt or environment.yml. The recommended approach is to use uv lock or conda env export to generate a reproducible environment file. The best practice is to track that file in Git and document how to recreate the environment in the README.
Practical Workflows for Scientific Projects
Workflow 1: New Simulation Project with uv
# Initialize project
uv init my-simulation --python 3.11
cd my-simulation
# Add core dependencies
uv add numpy scipy matplotlib
# Add scientific libraries
uv add fipy mpmath
# Refresh the lockfile (uv add already creates and updates uv.lock)
uv lock
# Add dev dependencies
uv add --group dev pytest black ruff
# Commit everything
git add pyproject.toml uv.lock
git commit -m "Initial project structure with pinned dependencies"
This works because the uv.lock file is committed to Git. Anyone cloning the repository and running uv sync gets the same resolved versions.
Workflow 2: HPC Project with Conda and Singularity
# On your workstation
conda create -n hpc-sim python=3.11
conda activate hpc-sim
conda install numpy scipy hdf5 openmpi
# Export environment
conda env export --no-builds | grep -v "prefix:" > environment.yml
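# Build a Docker image, then convert it to a Singularity image
# (assumes a Dockerfile in this directory that recreates the environment from environment.yml)
docker build -t my-sim:latest .
# docker-daemon:// reads the locally built image; docker:// would pull from a registry instead
singularity build my-sim.sif docker-daemon://my-sim:latest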
# Transfer to HPC cluster
scp my-sim.sif hpc-cluster:/scratch/
This works because conda manages dependencies on the workstation, while Singularity provides HPC-compatible containerization. The environment.yml file can be version-controlled and recreated on another system.
Workflow 3: Migration from pip to uv
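# Start with an existing requirements.txt
pip install uv
# Optional drop-in step: install with uv's pip-compatible interface
uv pip install -r requirements.txt
# Create a pyproject.toml (uv lock needs one), then import the requirements into it
uv init
uv add -r requirements.txt
# Generate a uv lockfile
uv lock
# From now on, use uv sync instead of pip install
uv sync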
This migration path is simple because uv is compatible with many pip-style workflows. The uv.lock file becomes the single source of truth for the environment.
Common Mistakes and How to Avoid Them
Mistake 1: Using Unpinned Dependencies
# WRONG: Allows automatic updates
numpy>=1.0
scipy>=1.0
# RIGHT: Pin exact versions
numpy==1.26.4
scipy==1.11.4
Minor version changes can alter numerical behavior, convergence criteria, or floating-point precision. Pin exact versions when reproducibility matters.
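A check like this can be automated, for example in a pre-commit hook or CI step. The following sketch flags requirement lines that are not pinned with `==`. The regular expression is a simplification of the requirement syntax (it ignores URLs and other less common forms), and the function name is invented for this example.

```python
import re

# name, optional [extras], '==', a version without wildcards, optional '; marker'
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s*;,]+\s*(;.*)?$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin one exact version."""
    loose = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


example = "numpy>=1.0\nscipy==1.11.4\n# plotting\nmatplotlib\nmpmath==1.3.*\n"
print(unpinned_requirements(example))  # ['numpy>=1.0', 'matplotlib', 'mpmath==1.3.*']
```

Wildcard pins such as `==1.3.*` are reported too, since they still allow the installed version to move.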
Mistake 2: Committing No Lockfile
If your repository contains pyproject.toml but no lockfile, your project is not fully reproducible. A lockfile is the minimum requirement for deterministic builds.
Generate a lockfile and commit it:
uv lock # or: conda env export --no-builds > environment.yml
git add uv.lock # or environment.yml
git commit -m "Add lockfile for reproducible environment"
Mistake 3: Ignoring Sub-Dependencies
Even if you pin top-level packages, sub-dependencies can still change behavior.
# If you pin fipy but not its dependencies:
# numpy, mpmath, and other packages may upgrade independently
Use a tool that resolves and pins the full dependency tree, such as uv, conda, or Poetry.
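To see how large the unpinned surface can be, you can walk the declared requirements of an installed package. This is a rough sketch: it ignores environment markers and extras, so it can over-report, and it does not normalize names. The `requires` parameter is an injection point added for this example so the walk can be tried on a hand-written graph; the package names in the usage note are hypothetical.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract the project name from a string such as 'numpy>=1.22; extra == "x"'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return match.group(1).lower() if match else ""


def installed_requires(name: str) -> list[str]:
    """Declared requirements of an installed distribution, or [] if unknown."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_dependencies(root: str, requires=installed_requires) -> set[str]:
    """Collect every package reachable from `root` through declared requirements."""
    root = root.lower()
    seen: set[str] = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for requirement in requires(current):
            dependency = requirement_name(requirement)
            if dependency and dependency != root and dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen
```

With a hypothetical graph `{"solver": ["linalg>=1.0", "arrays"], "linalg": ["arrays>=2"]}`, calling `transitive_dependencies("solver", lambda n: graph.get(n, []))` returns both `linalg` and `arrays`. Every name in that set needs a pinned version for the environment to be reproducible.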
Mistake 4: Platform-Specific Environments
Some tools can generate platform-specific output. If you develop on macOS but deploy on Linux, environment files may fail or resolve differently.
Use tools that support cross-platform lockfiles, or generate lockfiles on the target platform.
Mistake 5: External Data Dependencies
If a simulation depends on external APIs or databases that change, reproducibility can break even when the Python environment is locked.
Snapshot external data when possible. If that is not possible, use versioned API endpoints and document the exact version or access date.
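One possible way to document a snapshot is to store a checksum and retrieval time alongside the data file. This sketch uses the standard library only. The record layout and the function names are assumptions for illustration, not an established metadata standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_record(path: str, source: str) -> dict[str, str]:
    """Describe a snapshotted input file: where it came from, when, and its hash."""
    return {
        "file": Path(path).name,
        "source": source,
        "sha256": sha256_of(path),
        "retrieved": datetime.now(timezone.utc).isoformat(),
    }
```

Committing such a record next to the lockfile lets a later run verify that it is using the same input data, not just the same packages.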
What We Recommend: A Decision Framework
Use this decision framework when choosing a dependency management tool for a scientific Python project.
- Do you need non-Python dependencies such as C, C++, MPI, CUDA, HDF5, or FFTW?
  - Yes: use conda, or use conda inside Docker or Singularity.
  - No: continue to the next question.
- Are you publishing a Python package to PyPI?
  - Yes: consider Poetry for mature publishing workflows or uv for speed.
  - No: continue to the next question.
- Are you working in CI/CD pipelines?
  - Yes: use uv because faster installs can reduce CI time.
  - No: uv, conda, or Poetry may all work, depending on the project.
- How important is cross-platform compatibility?
  - High: use uv where a universal lockfile fits your needs.
  - Moderate: conda can work well, especially on Unix-like research systems.
For most new research projects, uv is a strong default when you need speed and reproducibility. For HPC projects or workflows with non-Python dependencies, pair conda with containerization.
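The questions above can be read as a short chain of checks. The sketch below encodes one possible reading of that chain as a function. The function name, parameter names, and return strings are invented for this example, and real projects often need more nuance than four yes-or-no answers.

```python
def recommend_tool(
    needs_non_python_deps: bool,
    publishes_to_pypi: bool,
    runs_in_ci: bool,
    cross_platform_priority: str = "moderate",
) -> str:
    """Walk the decision questions in order and return the first match."""
    if needs_non_python_deps:
        return "conda (optionally inside Docker or Singularity)"
    if publishes_to_pypi:
        return "Poetry or uv"
    if runs_in_ci:
        return "uv"
    if cross_platform_priority == "high":
        return "uv"
    return "uv, conda, or Poetry"


# A pure-Python simulation project that runs its tests in CI:
print(recommend_tool(False, False, True))  # uv
```

Writing the framework down this way makes the ordering explicit: the need for non-Python dependencies is checked first because it rules out the most options.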
Summary
Managing dependencies in scientific Python is not optional. It is a foundation of reproducible research.
The most important points are:
- Lockfiles are essential. Pin exact versions and track them in Git.
- uv is a strong modern option for fast, reproducible environments.
- Conda remains vital for scientific stacks with non-Python dependencies.
- PEP 751 aims to unify lockfile formats through pylock.toml.
- A README with clear installation instructions is the minimum documentation every project needs.
Start by auditing current projects. Check whether environments are documented and dependencies are pinned. Convert one project to use a lockfile. The upfront cost pays off when you or another researcher need to rerun the simulation months or years later with confidence.