Getting Started with conda Environments
Team Anaconda
In this tutorial, we are going to learn about using conda environments and why they are an effective tool for organizing your work when using conda. By the end of the tutorial, you should have a firm grasp of the following concepts:
- Creating, updating and removing conda environments
- Saving your environments as config files
- Creating and updating your environments from config files
Requirements
The only requirement is a working installation of conda. It can be installed via either the Miniconda installer or the Anaconda Distribution installer.
Environment Basics
At its core, a conda environment simply provides a way of organizing your project dependencies. Here, a project can be thought of as a collection of code or Jupyter notebooks that are used to either build an application or conduct an analysis. The dependencies for a project often include libraries such as pandas or SciPy when working with Python or any number of other dependencies when working with other programming languages.
For this tutorial, we are going to be creating an environment for a project that will be used to perform geo-spatial analysis with Python. The environment will need the following:
- Required version of Python (3.10 in our case)
- Dependencies for analysis: GeoPandas, Shapely, rasterio
- JupyterLab so we can organize our analysis in notebooks
Let’s start by running a command that will create this new environment for us and initialize it with the version of Python we want to use (3.10):
conda create -n geo-project python=3.10
After creating this environment, the next step is to activate it. We can do that with the following command:
conda activate geo-project
To see a list of all your environments, run the conda env list command:
conda env list
This will print output similar to the following:
# conda environments:
#
base                     /home/user/opt/conda
geo-project           *  /home/user/opt/conda/envs/geo-project
The asterisk next to the environment name lets you know that this is the currently activated environment.
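With several terminal windows open, it is easy to lose track of which environment is active. The sketch below is one way to check, assuming a shell that has been set up with conda init. The active_env function is a hypothetical helper written for this example, not a conda command; it reads the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```shell
# Hypothetical helper (not part of conda): print the name of the active
# environment, or "none" when no environment is active.
active_env() {
    printf '%s\n' "${CONDA_DEFAULT_ENV:-none}"
}

active_env

# When you are finished working, leave the environment again.
# "conda deactivate" needs the shell function that "conda init" installs,
# so this only runs when conda is available as a shell function.
if [ "$(command -v conda)" = "conda" ]; then
    conda deactivate
fi
```

While geo-project is active, active_env prints geo-project. After conda deactivate, it prints the name of the environment you returned to.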
Quick Note on Base Environments
When first installing and using conda, you probably saw references to something called “base” or a “base environment.” This environment is where conda itself is installed, and it should not be used for your projects. Instead, always use environments you create yourself. This allows you to better organize your project’s dependencies.
Adding More Project Dependencies
Now we have a new conda environment with the version of Python we want, but this will not be useful to us until we begin installing the dependencies we need to conduct our analysis. To do this, we will add our required dependencies with the conda install command:
conda install geopandas shapely rasterio jupyterlab -c conda-forge
This command installs four dependencies: GeoPandas, Shapely, rasterio, and JupyterLab. We also specified that we want to use conda-forge as the channel. We do this because conda-forge, a community-maintained channel, typically carries recent versions of all these dependencies. To learn more about conda-forge, visit conda-forge.org.
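Before moving on, it can be worth confirming that the packages actually landed in geo-project. One way to do that, sketched below, uses conda list and conda run. The ENV_NAME variable is only a convenience for this example.

```shell
# Environment name used throughout this tutorial.
ENV_NAME="geo-project"

if command -v conda >/dev/null 2>&1; then
    # Show the installed versions of the four packages we asked for.
    conda list -n "$ENV_NAME" | grep -E '^(geopandas|shapely|rasterio|jupyterlab) '

    # Import the libraries inside the environment without activating it.
    conda run -n "$ENV_NAME" python -c "import geopandas, shapely, rasterio; print('imports OK')"
else
    echo "conda was not found on PATH"
fi
```

If the import line fails, the error message usually names the package that is missing or broken.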
At this point, we have everything we need to conduct our analysis and can begin by launching JupyterLab with the following command:
jupyter lab
Even Better Workflows with Config Files
The above workflow is great when you are still unsure about which dependencies you need for your project, but what if I told you that you could simplify the above procedure by running just a single command? This incredible feat is made possible by creating an environment config file. In this section, we will cover how to create the environment we just set up with a config file.
Let’s jump right in by showing you how the environment we created above would be written as an environment config file:
# environment.yml
name: geo-project
channels:
  - conda-forge
dependencies:
  - python=3.10
  - jupyterlab=3.4.2
  - geopandas=0.10.2
  - shapely=1.8.2
  - rasterio=1.2.10
Environment config files are created using the YAML file format and each file has a number of variables defined within. Below, we go over each variable.
Name
This is the name of your environment. You will use this later when activating and deactivating this environment, so be sure to choose something short and easy to remember.
Channels
The channels variable tells conda where to look for packages. By default, all conda installs include the “defaults” channel, which is managed by Anaconda. We use this section to specify the additional channel, “conda-forge,” which is where we can find all the dependencies we need for our project.
Dependencies
The dependencies variable lists everything we need for our project. This includes the version of Python we want to use, JupyterLab, and all of our dependencies for performing our analysis, like GeoPandas.
If you look closely, you will also notice that we take advantage of something called version pinning.
Version pinning is when you specify exactly which version of the dependency you want to use. This is important as we begin sharing these environment files because it helps ensure that your collaborators are using the same exact version of the dependencies that you are using.
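If you are unsure which versions you currently have installed, conda can report them. The sketch below uses conda env export. The output file names are placeholders chosen for this example so that a hand-written environment.yml is not overwritten.

```shell
# Environment name used throughout this tutorial.
ENV_NAME="geo-project"

if command -v conda >/dev/null 2>&1; then
    # Every package in the environment with its exact version,
    # minus the platform-specific build strings.
    conda env export -n "$ENV_NAME" --no-builds > environment.full.yml

    # Only the packages that were requested explicitly,
    # written the way they were originally requested.
    conda env export -n "$ENV_NAME" --from-history > environment.history.yml
fi
```

The full export is the place to look up version numbers for pinning. Exported files typically end with a prefix line containing a path on your own machine, which you may want to delete before sharing the file.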
To find the available versions of your dependencies so you can pin them, visit anaconda.org.
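The same information is also available from the command line. As a rough sketch, conda search lists the versions a channel offers. The CHANNEL and PACKAGE variables and the version range in the second command are only examples.

```shell
CHANNEL="conda-forge"
PACKAGE="geopandas"

if command -v conda >/dev/null 2>&1; then
    # List every version and build of the package on the channel.
    conda search -c "$CHANNEL" "$PACKAGE"

    # Narrow the results to a version range. The quotes stop the shell
    # from treating < and > as redirections.
    conda search -c "$CHANNEL" "${PACKAGE}>=0.10,<0.11"
fi
```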
Creating Our Environment from a Config File
Now that we have defined our config file, we can create this environment with the following command:
conda env create --file environment.yml
Afterward, activate the environment just like before:
conda activate geo-project
With these two commands, we have done everything necessary to create our environment and are now ready to do the real work of conducting analysis.
conda create vs. conda env create
You will notice that we use the conda env create command here instead of conda create. This is largely because of historical reasons (conda env was at one point its own separate project) and because conda create does not understand how to read environment config files.
Updating Your Environment
Project dependencies can change over time, and as your environment config file changes to reflect this, you can update your conda environment, too.
Let’s say we’ve learned about a new library we want to add to our project. It’s the osmnx library, which will help us import and use OpenStreetMap data. After we add it to our config file, it will look like the following:
name: geo-project
channels:
  - conda-forge
dependencies:
  - python=3.10
  - jupyterlab=3.4.2
  - geopandas=0.10.2
  - shapely=1.8.2
  - rasterio=1.2.10
  - osmnx=1.2.0 # <-- our new library
Once we have saved our config file, we update our environment with the following command:
conda env update --file environment.yml
We can repeat this process in the future as our project dependencies change (e.g., version bumps of existing dependencies or removal of dependencies).
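One caveat: a plain update adds and upgrades packages, but it does not uninstall packages that have been deleted from the config file. The sketch below shows two ways to handle that, assuming the file is named environment.yml as in this tutorial. The --prune flag asks conda to remove packages that are no longer listed, though its behavior has varied between conda versions. The rebuild_env function is a hypothetical helper for this example that deletes the environment and creates it again from the file.

```shell
ENV_NAME="geo-project"
ENV_FILE="environment.yml"

# Option 1: update in place and ask conda to drop unlisted packages.
if command -v conda >/dev/null 2>&1 && [ -f "$ENV_FILE" ]; then
    conda env update --file "$ENV_FILE" --prune
fi

# Option 2 (hypothetical helper, not a conda command): start over.
# Call it from a different environment, because conda will not remove
# the environment that is currently active.
rebuild_env() {
    conda env remove -n "$ENV_NAME" && conda env create --file "$ENV_FILE"
}
```

Rebuilding takes longer than updating, but it leaves you with an environment that contains only what the config file describes.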
Final Thoughts
In this tutorial, we have walked through how conda environments work and how you can use environment config files to simplify the process of creating and updating the same environment over and over. The latter method is most effective for sharing your work with others. The files can easily be stored in version control systems like git or even shared via email and cloud storage drives.
The next time you sit down and create a new project, be sure to include an environment config file and define your project dependencies there. A future you and others will thank you.
Further Reading
Conda for Geospatial Python Development