8 Levels of Reproducibility: Future-Proofing Your Python Projects



Congratulations! You’ve written a bit of Python code that does something useful. Now what? Will you be able to run that code tomorrow, a year from now, or 10 years from now? If you give it to someone else, will they be able to run it? Can you set it up to run on a regular schedule without you being present, or deploy it live on a server without needing special information stored only in your head?

If you don’t plan for reproducibility, then the answer to all these questions is almost certainly “No.” Python programs typically build on code from a wide range of separately installed libraries, and your code can easily stop working if even one of those libraries isn’t available or has an older or newer version than the one you used. Luckily, Python has tools to capture all these dependencies so you can achieve just about any level of reproducibility you want, though each new level will take a bit more work and will require more disk space or other resources.

As a helpful guide, here’s a list of increasing levels of reproducibility; just choose the level you need and follow the instructions to achieve it! We’ll assume the code you want to reproduce is expressed as either lines of Python in Jupyter notebook code cells, or as a .py Python text file. We’ll also assume that if your code requires data, the data is static and small enough to be archived along with your code (but we’ll come back to that assumption later!).

  • Level -1 (not reproducible): Your code ran and worked properly once, with commands executed in some particular order against some previously edited version of the files, but if you now do python file.py or “Restart and Run All” in your notebook it will fail for some reason, even if nothing about your environment has changed. If you’ve been entering commands in Jupyter or at a command line without carefully thinking about execution order, this is likely where you are starting out!

  • Level 0 (reproducible only by you, today): Your code currently runs, repeatably, in your initial environment. It is reproducible only by you, and only if your environment and any external code and data remain available and unchanged, and only if you remember which environment to use and which commands to run in it (where a “command” is something like jupyter notebook file.ipynb followed by “Restart and Run All”, or python file.py arg1).

  • Level 1 (reproducible by others with guidance): You have captured the important parts of your environment in a requirements.txt or environment.yml file with your direct dependencies pinned by version (e.g. pandas==1.4.1 for pip, or pandas=1.4.1 for conda; see the sketch after this list), and you save or distribute an archive (e.g. myproject.zip) or a repository (e.g. on github.com) containing your commands, the environment specification, any data files you need, and unambiguous human-readable instructions for reproducing the results. Reproducible by you and anyone you give the archive to, as long as they can interpret your instructions and run the correct commands in that environment, they have access to the pip or conda packages referenced, and no unpinned package has been updated by others in an incompatible way.

  • Level 2 (reproducible today, by anyone with internet access): You have captured the commands, any external code, and the environment in machine-readable form using anaconda-project or an equivalent tool, with direct dependencies pinned by version. Reproducible today by anyone who has access to the pip or conda packages referenced as long as no unpinned package has been released with incompatible updates. Because anaconda-project captures the commands explicitly, there’s no danger of ambiguous or missing steps in your instructions as long as you’ve tested the project locally, but it can still fail when the packages available on the internet get updated, even packages you may never have heard of (if they are dependencies of the packages you are using).

  • Level 3 (reproducible indefinitely, by anyone with internet access): You have captured both the commands and the environment into a fully locked project (using anaconda-project lock, conda-lock, or an equivalent tool), pinning every recursive dependency down to the individual build (not just pandas=1.4.1, but also each installed dependency like pytz=2020.1, even if you never use pytz directly yourself). Reproducible by anyone who has access to the conda or pip packages referenced as long as no locked package has become unavailable (which is rare but can happen due to a security risk or other issue discovered later).

  • Level 4 (reproducible indefinitely, without depending on the internet): You have captured both the commands and the environment into a fully locked anaconda project with all pip or conda packages unpacked and included using anaconda-project --pack-envs. Reproducible even with no conda repository available or if packages have been deleted from the repo as long as nothing depends on system libraries, but generates much bigger archives that include binary files. Note that only a single platform’s packages can be packed in this way; building on other platforms will still require internet access (or building separate Windows, Linux, and Mac archives).

  • Level 5 (reproducible with Docker): Same as 4, but creating a Docker image containing your project (which can be as simple as using anaconda-project dockerize). Reproducible and even deployable by anyone with access to your operating system (OS), even if system libraries change and no package repository is available, as long as Docker itself is available (which is widely true today). Generates even bigger archives that are less directly accessible by humans since they mix configuration with binary artifacts, but are more isolated from the underlying computing system than with level 4.

  • Level 6 (reproducible with a virtual machine): Same as 5, but putting Docker and the Docker image onto a virtual machine (VM) image. Reproducible by anyone on any OS where VMs can run—even if you no longer have access to the original type of hardware or OS—as long as you can run a VM image, but generates even larger archives than Docker images due to incorporating even more of the underlying computing system being used.

  • Level 7 (reproducible on untouched hardware): Same as 6 but with the VM image also installed on air-gapped physical hardware kept locked up for safety. Reproducible even if current hardware or OSs can no longer run VM images, but potentially only by those with physical access and as long as the hardware remains operational.
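To make Level 1 concrete, here’s a minimal sketch of the two kinds of environment specification mentioned above; the packages and versions are just placeholders for whatever your code actually imports:

    # environment.yml (conda); note the single "=" in conda version pins
    name: myproject
    channels:
      - defaults
    dependencies:
      - python=3.9
      - pandas=1.4.1
      - jupyter

    # requirements.txt (pip); note the double "==" in pip version pins
    pandas==1.4.1
    jupyter==1.0.0

Anyone with your archive can then recreate the environment with conda env create -f environment.yml or pip install -r requirements.txt before following your instructions for running the code.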

Unless you explicitly make your project reproducible by others, you’re likely to be stuck at Level -1 or Level 0. Reaching Level 1 requires tools for curating a Python environment that is separate from the underlying system, such as pip+venv, poetry, or conda, which are commonly available but not really sufficient for reproducibility since they do not capture the commands or data files needed. In practice, I shoot for making a Level 3 archive, which is the first level that gives me any confidence that I can share my code and expect it to work for me or others a few months later, while generating a compact file archive containing only what is strictly required to reproduce the project. Reaching Level 3 does require installing anaconda-project, but that’s an easy open-source download and the resulting project archive is not appreciably larger than any other archive that could contain your project. (There are also plans to incorporate anaconda-project into conda itself eventually, shrinking even this low barrier for reproducibility.) Anaconda Nucleus includes a simple guide to getting started with Level 3 and above using anaconda-project.
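As a rough sketch of that workflow (the project, package pin, and script names are placeholders, and exact options can vary between anaconda-project versions):

    conda install anaconda-project                                      # one-time setup
    anaconda-project init                                               # creates anaconda-project.yml in the current directory
    anaconda-project add-packages pandas=1.4.1                          # record a pinned direct dependency
    anaconda-project add-command --type unix analyze "python file.py"   # record how the code is run (Level 2)
    anaconda-project lock                                               # pin every recursive dependency and build (Level 3)
    anaconda-project archive myproject.zip                              # bundle code, specs, and data files for sharing

anaconda-project run analyze then reproduces the result on any machine that can reach the referenced package repositories.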

Level 4 is also often useful, but it requires a huge increase in file size because your archive holds a copy of every single dependency on a particular platform, so you won’t want to create a level-4 archive every day. Definitely useful at milestones or the end of a project, though! Level 5 again increases the file size required, and also requires installing Docker to test out the results, but provides a good level of reassurance that your project will remain usable and is particularly useful for deployment, given the wide support for Docker in deployment systems. Levels 6 and 7 are for the truly paranoid or for cases where you really, really fear being unable to run that code later (e.g. for regulatory, legal, or product-safety reasons).
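Here’s a rough sketch of the corresponding commands for Levels 4 and 5; I’m assuming a recent anaconda-project release where --pack-envs is accepted by the archive command (check anaconda-project archive --help for your installed version), and the archive filename is just a placeholder:

    anaconda-project archive --pack-envs myproject-packed.zip   # Level 4: the packages themselves travel inside the archive
    anaconda-project dockerize                                   # Level 5: build a Docker image containing the project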

Note that the levels above all assume that if your code needs any data or other non-code files, the files are small enough to include in the archive, which is easy with .zip archives and is well supported by anaconda-project. If you work with large remote data files, you’ll also need a way to ensure that those remain available and are referenced unambiguously, which can be its own headache (and a separate topic). If you do refer to any external data files, it’s a great idea to include a tiny subset of the data directly in the archive so that at least the code itself can still be demonstrated to run reproducibly, even if the full data is later no longer available. You’ll thank yourself later when you want to run the same code on new data and don’t even care about the original data but can’t get the code working without it, or if you can’t figure out what sort of data the code will accept.
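With anaconda-project, one way to handle external data is to record the full dataset as a download while shipping a tiny sample inside the project itself; the URL, environment-variable name, and filenames below are hypothetical:

    # Record where the full dataset lives; it is fetched when the project runs, not stored in the archive:
    anaconda-project add-download FULL_DATA https://example.com/data/full_data.csv

    # Project layout, with a small sample archived alongside the code:
    #   myproject/
    #     anaconda-project.yml
    #     analyze.py
    #     data/sample_small.csv    (a few representative rows, included in the archive)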

The same goes for code that relies on special hardware such as particular graphics processing unit (GPU) models, a certain type of computing cluster, or other system infrastructure. Even if your full results depend on those specialized systems, it’s important to include some version of your code that will run on “generic” hardware, so that a future user of your project will be able to verify that the basic code still works as intended when the specialized computing system is no longer available. anaconda-project supports multiple commands, so it’s simple to add one for small test data and/or a limited compute platform.
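For instance, the commands section of an anaconda-project.yml can declare both the full run and a small smoke test; the script name, data files, and GPU-related flags here are hypothetical stand-ins for your own code:

    commands:
      default:
        unix: python analyze.py data/full_data.csv --use-gpu
      smoketest:
        unix: python analyze.py data/sample_small.csv --no-gpu

A future user can then check that the basics still work on generic hardware with anaconda-project run smoketest, even if the specialized cluster behind the default command is long gone.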

These levels also assume that your code is something that runs and then completes with an output. If instead your code is something that makes sense to deploy, such as a dashboard or a REST API, you should shoot for Level 3 or higher and then actually deploy and monitor the results, ideally using a continuous-integration system to test that a freshly restarted deployment will continue to work. Deployment brings its own headaches, but if you have your project continuously deployed, not only can you be sure it’s still working, but you’ll also immediately allow everyone to make use of your work.

Whatever your situation, just pick the level where you are comfortable with the effort and risk involved, but surely at least choose Level 3 if your code is worth anything at all!


About the Author

Jim Bednar is the Director of Custom Services at Anaconda, Inc. Dr. Bednar holds a Ph.D. in Computer Science from the University of Texas, along with degrees in Electrical Engineering and Philosophy. He has published more than 50 papers and books about the visual system, software development, and reproducible science. Dr. Bednar manages the HoloViz project, a collection of open-source Python tools that includes Panel, hvPlot, Datashader, HoloViews, GeoViews, Param, Lumen, and Colorcet. Dr. Bednar was a Lecturer and Reader in Computational Neuroscience at the University of Edinburgh from 2004 to 2015, and previously worked in hardware engineering and data acquisition at National Instruments.

About the Maker Blog Series

Anaconda is amplifying the voices of some of its most active and cherished community members in a monthly blog series. If you’re a Maker who has been looking for a chance to tell your story, elaborate on a favorite project, educate your peers, and build your personal brand, consider submitting an abstract. For more details and to access a wealth of educational data science resources and discussion threads, visit Anaconda Nucleus.
