Spine-Tingling Data Science Tales From Beyond the Desk
Happy Halloween, readers. At Anaconda, we’re not too scared about things that go bump in the night. We’ve examined the data and concluded that it’s just the cleaning staff upstairs. We are, however, kept awake by an ever-present concern for the security and experience of our users! We’d like to take this opportunity to discuss some of the scary stuff out there, and what we’re doing to mitigate the risks and prevent problems.
Have you ever read one of those horrible stories around Halloween time where someone is doing nasty things to candy and hurting kids? Well, we think of the potential for maliciously modified packages like that nasty, corrupted candy. Just as the candy was produced “good” and turned “bad” down the line, we worry about people taking good packages and using them to infect users’ computers. In order to prevent this happening to packages that you get from Anaconda’s “defaults” channel, we take many steps to ensure that no one gets between us and your packages.
Around this time in 2017, we released version 5 of the Anaconda Distribution. That release contained major upgrades to our compiler infrastructure, which allowed us to take advantage of several new security improvements provided by compilers. We’ve detailed these in a prior blog post, but suffice to say here that where older packages presented a juicy target for buffer overflow attacks, our new packages are much harder targets. Over the past several months, we have been working with the conda-forge community to extend these security benefits to their packages.
When we work to improve some parts of our projects, gremlins often sneak in where we’re not looking and mess up other parts. Recently, the speed of the conda package installation process has suffered. We’re grateful for your patience in tolerating these sloth gremlins. We’re also happy to report that we’re working hard to improve the situation. For one, we’re now using Air Speed Velocity (ASV) to benchmark conda continuously. You can check out our results and our benchmark code.
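For readers who have not seen an ASV benchmark, here is a minimal sketch of what one tends to look like. ASV discovers benchmarks by naming convention: methods whose names start with `time_` are timed, and `setup` runs before them. The `toy_solve` function and the `TimeToySolve` class are hypothetical stand-ins written for this illustration. They are not taken from conda's actual benchmark suite.

```python
import itertools


def toy_solve(candidates):
    """Hypothetical stand-in for a solver (not conda's).

    Brute-forces every combination of candidate versions and keeps the
    one with the highest version total.
    """
    best = None
    for combo in itertools.product(*candidates.values()):
        if best is None or sum(combo) > sum(best):
            best = combo
    return dict(zip(candidates, best))


class TimeToySolve:
    """ASV-style benchmark class: ASV times methods prefixed with time_."""

    def setup(self):
        # Build the input outside the timed method so only the solve is measured.
        self.candidates = {f"pkg{i}": list(range(1, 6)) for i in range(6)}

    def time_solve(self):
        toy_solve(self.candidates)
```

Running a file like this through ASV across a range of commits is what makes a regression tied to a single release show up as a step in the timing history.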
ASV has already been a tremendous tool. With it, we’ve discovered that we took a large speed hit around conda version 4.3.25. That version corresponds with the addition of timestamp data to the solver. We added that for the sake of conda-build 3, with its new hashes. At the time, any modification of a conda recipe at all resulted in a different hash. The timestamp optimization was meant as a tiebreaker so that conda would choose your latest package build, if everything else was equal. Sounds innocent, right? Well, when no packages had timestamps, this caused no penalty. We’ve been increasing the number of packages with timestamps, though. On the “main” channel, which is part of our defaults channel, all the packages have these timestamps. Due to a particular way that the solver operates, the timestamps are considered earlier in the solution process than they should be, leading conda to get very confused, take a long time, and sometimes come up with bizarre solutions. With our latest conda 4.6.0 beta, we’ve disabled this timestamp optimization by default for conda, but kept it for conda-build. This change has returned conda solve speed for the anaconda metapackage to the level that it had been prior to conda 4.3.25.
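The tiebreaker idea can be sketched in a few lines. This is only an illustration of "prefer the latest build when everything else is equal". The `Build` record, the field names, and the ordering of the sort key are assumptions made for the example, and conda's real solver works very differently.

```python
from typing import NamedTuple, Tuple


class Build(NamedTuple):
    """Hypothetical record for one package build (not conda's data model)."""
    name: str
    version: Tuple[int, ...]
    build_number: int
    timestamp: int  # seconds since the epoch; 0 if the build has no timestamp


def pick_build(candidates):
    """Prefer higher version, then higher build number, then newer timestamp.

    The timestamp sits last in the key, so it only matters when the
    version and build number are identical.
    """
    return max(candidates, key=lambda b: (b.version, b.build_number, b.timestamp))


builds = [
    Build("example", (1, 2, 0), 0, 1_500_000_000),
    Build("example", (1, 2, 0), 0, 1_510_000_000),  # same version, built later
    Build("example", (1, 1, 0), 3, 1_520_000_000),  # newest build, older version
]
chosen = pick_build(builds)
```

Here the second build is chosen: it ties with the first on version and build number and wins on timestamp, while the third loses on version despite being the most recent. The trouble described above arises when the timestamp is weighed earlier than this last position.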
But wait, that still can be a lot slower than pip, right? Yep. Pip does not use a solver (yet?). Instead, pip considers only the constraints provided by the package that it is installing at a given point in time, not the constraints from other packages being installed, and not the constraints from packages already in your environment. Here’s where the gremlins of software version incompatibility creep in and break things: the versions required by one package may not be compatible with the versions required by another package. Conda’s solver is its greatest strength for avoiding these. As long as the package authors have captured the software compatibility bounds correctly in their conda recipe, conda will not break your environment by installing software that only works with part of your environment. Doing that right is worth some extra time, in our opinion. However, we are actively working to reduce the time necessary for the solver, both by improving the solver itself and by rethinking the input to the solver to reduce the size of the problem that it has to solve (sharding repodata into smaller chunks). The wait time for conda’s solver is something that we’ll seek to improve over time, but we appreciate your patience and feedback in the meantime!
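To make the difference concrete, here is a toy sketch of the two strategies. The package names, version numbers, and function names are all made up for the illustration, and neither function reflects how pip or conda is actually implemented. The first installer looks only at the package it is currently handling. The second intersects every package's constraints before choosing a version.

```python
# Hypothetical index: available versions of one shared dependency.
AVAILABLE = {"libfoo": [1, 2, 3]}

# Hypothetical constraints: which libfoo versions each package works with.
REQUIRES = {
    "app_a": {"libfoo": {1, 2}},
    "app_b": {"libfoo": {2, 3}},
}


def install_one_at_a_time(packages):
    """Pick the newest dependency allowed by the current package only."""
    env = {}
    for pkg in packages:
        for dep, allowed in REQUIRES[pkg].items():
            # Overwrites whatever an earlier package needed.
            env[dep] = max(v for v in AVAILABLE[dep] if v in allowed)
    return env


def install_jointly(packages):
    """Intersect all constraints first, then pick the newest version left."""
    allowed = {}
    for pkg in packages:
        for dep, ok in REQUIRES[pkg].items():
            allowed[dep] = allowed.get(dep, set(AVAILABLE[dep])) & ok
    env = {}
    for dep, ok in allowed.items():
        if not ok:
            raise ValueError(f"no version of {dep} satisfies every package")
        env[dep] = max(ok)
    return env


def broken_packages(env, packages):
    """List packages whose constraints the environment violates."""
    return [
        pkg
        for pkg in packages
        for dep, ok in REQUIRES[pkg].items()
        if env[dep] not in ok
    ]


wanted = ["app_a", "app_b"]
greedy_env = install_one_at_a_time(wanted)
solved_env = install_jointly(wanted)
```

In this toy case the one-at-a-time approach ends with `libfoo` at version 3, which breaks `app_a`, while the joint approach settles on version 2, which both packages accept. The joint approach does more work up front, which is the extra time discussed above.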
The community’s success in creating a vibrant package ecosystem for conda is something that we’re both very happy to see and vexed by. We’re vexed because there are a lot of ways that it can go wrong and lead to bad experiences for people who use conda. Binary incompatibility arises in a surprisingly wide variety of ways. We’ll describe this problem in detail in an upcoming blog post, but for now we advise limiting the number of channels that you use, and keeping to a single channel for a given environment as much as possible.
Unfortunately, Anaconda doesn’t install perfectly on the first try for everyone. We’re always working to improve this, but there are a few problems that we struggle with. These are almost always related to the state of the user’s system at install time. Anaconda is designed to be self-contained and non-disruptive to your other software. Unfortunately, it’s not so simple in reality. Existing Python installations, installations of Python modules in global locations, or libraries that have the same names as Anaconda libraries can prevent Anaconda from working properly. Here are a few places to check for this, if you’re having trouble.
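As a rough starting point for that kind of check, the sketch below prints which Python is running and a few settings that can pull in modules from outside an installation. The selection of items and the `python_environment_report` helper are our own assumptions for illustration. This is not an official Anaconda diagnostic, and it is not an exhaustive list.

```python
import os
import shutil
import site
import sys


def python_environment_report():
    """Collect a few facts about where Python and its modules come from."""
    return {
        # The interpreter actually running this script.
        "executable": sys.executable,
        # The installation prefix of that interpreter.
        "prefix": sys.prefix,
        # Environment variables that can redirect module lookup.
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        # Per-user site-packages, a global location outside the install.
        "user_site_enabled": site.ENABLE_USER_SITE,
        "user_site": site.getusersitepackages(),
        # The first "python" found on PATH, which may be a different install.
        "python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in python_environment_report().items():
        print(f"{key}: {value}")
```

If `executable` and `python_on_path` point at different installations, or if `PYTHONPATH` is set, that may be a hint that another Python is interfering.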
We’re working on improving our software to prevent the conflicts currently haunting you, but we hope these Halloween tips & treats will help you recover from any problems in the meantime!