The Case for Numba in Community Code

The numeric Python community should consider adopting Numba more widely within community code.
Numba is strong in performance and usability, but historically weak in ease of installation and community trust. This blog post discusses these strengths and weaknesses from the perspective of an OSS library maintainer. It uses other more technical blog posts written on the topic as references. It is biased in favor of wider adoption given recent changes to the project.
Let’s start with a wildly unprophetic quote from Jake Vanderplas in 2013:
“I’m becoming more and more convinced that Numba is the future of fast scientific computing in Python.”
– Jake Vanderplas, June 15, 2013
https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/
We’ll draw on several blog posts by other community members throughout this post. They’re all good reads and are more technical, showing code examples, performance numbers, etc.
At the end of the blog post these authors will also share some thoughts on Numba today, looking back with some hindsight.
Disclaimer: I work alongside many of the Numba developers within the same company and am partially funded through the same granting institution.
Many open source numeric Python libraries need to write efficient low-level code that works well on NumPy arrays but is more complex than the NumPy library itself can express. Typically they turn to hand-written C or C++ extensions, or to Cython.
Each of these choices has tradeoffs in performance, packaging, attracting new developers, and so on. Ideally we want a solution that is fast, easy to package and distribute to users, and approachable for both maintainers and new contributors.
While the two main approaches today, Cython and C/C++, both do well on most of these objectives, neither is perfect.
There are other options out there—including Numba and Pythran—that provide excellent performance and usability benefits but are rarely used. Let’s take a closer look at Numba’s benefits and drawbacks.
Numba is generally well-regarded from a technical perspective (it’s fast, easy to use, well maintained, etc.) but historically has not been trusted due to packaging and community concerns.
In any test of either performance or usability, Numba almost always wins (or ties for the win). It does all of the compiler optimization tricks you would expect. It supports both for-loop-style code and NumPy-style slicing and bulk-operation code. It requires almost no additional information from the user (assuming you’re OK with JIT behavior), so it is very approachable and very easy for novices to use well.
In every case where these authors compared Numba to Cython for numeric code (Cython is probably the standard for these cases), Numba performs as well or better and is much simpler to write.
Here is a code example from Jake’s second blog post:
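A minimal sketch of the Numba side of that comparison, using the pairwise-distance benchmark from Jake’s Numba-vs-Cython series (the Cython counterpart, with its extra type annotations, is in his post):

```python
import numba
import numpy as np

@numba.jit(nopython=True)
def pairwise_numba(X):
    # Plain nested loops over a 2-D array; Numba compiles this to machine code.
    M, N = X.shape
    D = np.empty((M, M), dtype=np.float64)
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
    return D
```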
The algorithmic body of each function (the nested for loops) is identical. However, the Cython code is more verbose with annotations, both at the function definition (which we would expect for any AOT compiler), but also within the body of the function for various utility variables. The Numba code is just straight Python + NumPy code. We could remove the @numba.jit decorator and step through our function with normal Python.
Additionally, Numba lets us use NumPy syntax directly in the function. For example, the following function is well-accelerated by Numba, even though it already fits NumPy’s syntax well.
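For example (an illustrative sketch, not the function from the original post), a small one-dimensional smoother written entirely with NumPy slicing compiles as-is:

```python
import numba
import numpy as np

@numba.jit(nopython=True)
def smooth(x):
    # NumPy-style slicing and bulk arithmetic, with no explicit loops.
    out = np.empty_like(x)
    out[0] = x[0]
    out[-1] = x[-1]
    out[1:-1] = 0.25 * x[:-2] + 0.5 * x[1:-1] + 0.25 * x[2:]
    return out
```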
Mixing and matching NumPy-style with for-loop style is often helpful when writing complex numeric algorithms.
Benchmarks in these blog posts show that Numba is both simpler to use and often as fast or faster than more commonly used technologies like Cython.
So, given these advantages, why didn’t Jake’s original prophecy hold true?
I believe that there are three primary reasons why Numba has not been more widely adopted among other open source projects:
1. Until recently it could not be installed with pip.
2. It carries a heavy LLVM-based runtime dependency.
3. Its development has been closely tied to a for-profit company, which raised community trust concerns.
All three of these are excellent reasons to avoid adding a dependency. Technical excellence alone is insufficient, and must be considered alongside community and long-term maintenance concerns.
Numba now depends on the easier-to-install library llvmlite, which, as of a few months ago, is pip-installable with binary wheels on Windows, Mac, and Linux. The llvmlite package is still a heavy-ish runtime dependency (42MB), but that’s significantly less than large Cython libraries like pandas or SciPy.
If your concern was about the average user’s inability to install Numba, I think that this concern has been resolved.
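That said, a library does not have to make Numba a hard requirement. A common pattern, shown here only as an illustrative sketch, is to treat Numba as an optional accelerator and fall back to plain Python when it is missing:

```python
import numpy as np

# Optional-dependency pattern: accelerate with Numba when available,
# degrade gracefully to pure Python otherwise.
try:
    from numba import njit
except ImportError:
    def njit(*args, **kwargs):
        # No-op replacement that works both as @njit and as @njit(...)
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

@njit
def dot(a, b):
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i]
    return total
```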
Numba has three community problems:
1. It was developed within a for-profit company (Continuum Analytics, now Anaconda Inc.), with much of the development conversation happening privately rather than in the open.
2. For a while it was tied to a paid product, Numba Pro.
3. Its core developers are not well known within the broader open source community.
This combination strongly attached Numba’s image to Continuum’s for-profit ventures, making community-oriented software maintainers understandably wary of depending on it, for fear that such a dependency might later be leveraged for Continuum’s financial gain at the expense of community users.
Things have changed significantly.
Numba Pro was abolished years ago. The funding for the project today comes from Anaconda Inc. consulting revenue, from hardware vendors looking to ensure that Python runs as efficiently as possible on their systems, and from generous donations from the Gordon and Betty Moore Foundation to ensure that Numba serves the open source Python community.
Developers outside of Anaconda Inc. now have core commit access, which forces communication to happen in public channels, notably GitHub (which was standard before) and Gitter chat (which is relatively new).
The maintainers are still relatively unknown within the broader community. This isn’t due to any sort of conspiracy, but rather to shyness or to having interests outside of OSS. Antoine, Siu, Stan, and Stuart are all considerate, funny, and clever fellows with strong enthusiasm for compilers, OSS, and performance. They are quite responsive on the Numba mailing list should you have any questions or concerns.
If your concern was about Numba trapping users into a for-profit mode, that seems to have been resolved years ago.
If your concern is more about not knowing who is behind the project, I encourage you to reach out. I would be surprised if you don’t walk away pleased!
For completeness, let’s list a number of reasons why it is still quite reasonable to avoid Numba today:
While llvmlite is cheaper than LLVM, it’s still a 50MB runtime dependency.

After writing the above, I reached out both to Stan and Siu from Numba and to the original authors of the referenced blog posts to get some of their impressions now, with the benefit of additional experience.
Here are a few choice responses:
Stan:
I think one of the biggest arguments against Numba still is time. Due to a massive rewrite of the code base, Numba, in its present form, is ~3 years old, which isn’t that old for a project like this. I think it took PyPy at least 5-7 years to reach a point where it was stable enough to really trust. Cython is 10 years old. People have good reason to be conservative with taking on new core dependencies.
Jake:
One thing I think would be worth fleshing-out a bit (you mention it in the final bullet list) is the fact that numba is kind of a black box from the perspective of the developer. My experience is that it works well for straightforward applications, but when it doesn’t work well it’s *extremely difficult to diagnose what the problem might be.*
Contrast that with Cython, where the html annotation output does wonders for understanding your bottlenecks both at a non-technical level (“this is dark yellow so I should do something different”) and a technical level (“let me look at the C code that was generated”). If there’s a similar tool for numba, I haven’t seen it.
Florian:
Elaborating on Jake’s answer, I completely agree that Cython’s annotation tool does wonders in terms of understanding your code. In fact, numba does possess this too, but as a command-line utility. I tried to demonstrate this in my blog post, but exporting the CSS in the final HTML render kind of mangles my blog post so here’s a screenshot:
This is a case where jit(nopython=True) works, so there seems to be no coloring at all.
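Related to this, the compiled function object itself can report what the compiler inferred: calling inspect_types() on a jitted function prints its source annotated with the Numba type of each variable. A minimal sketch, using a throwaway function:

```python
import numba
import numpy as np

@numba.jit(nopython=True)
def total(x):
    s = 0.0
    for i in range(x.shape[0]):
        s += x[i]
    return s

total(np.arange(5.0))   # run once so a float64[:] specialization is compiled
total.inspect_types()   # print the source annotated with inferred types
```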
Florian also pointed to the SciPy 2017 tutorial by Gil Forsyth and Lorena Barba.
I hold Numba in high regard, and the speedups impress me every time. I use it quite often to optimize some bottlenecks in our production code or data analysis pipelines (unfortunately not open source). And I love how Numba makes some functions like scipy.optimize.minimize or scipy.ndimage.generic_filter well-usable with minimal effort.
However, I would never use Numba to build larger systems, precisely for the reason Jake mentioned. Subjectively, Numba feels hard to debug, has cryptic error messages, and seemingly inconsistent behavior. It is not a “decorate and forget” solution; instead it always involves plenty of fiddling to get right.
That being said, if I were to build some high-level scientific library à la Astropy with some few performance bottlenecks, I would definitely favor Numba over Cython (and if it’s just to spare myself the headache of getting a working C compiler on Windows).
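As an illustrative sketch of the scipy.optimize.minimize point above: a jitted function is still an ordinary Python callable, so it can be handed to the optimizer directly.

```python
import numba
import numpy as np
from scipy.optimize import minimize

@numba.jit(nopython=True)
def rosenbrock(x):
    # Classic test objective; the hot inner arithmetic is compiled by Numba.
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

result = minimize(rosenbrock, np.zeros(5), method="Nelder-Mead")
print(result.x)
```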
Stephan:
I wonder if there are any examples of complex codebases (say >1000 LOC) using Numba. My sense is that this is where Numba’s limitations will start to become more evident, but maybe newer features like jitclass would make this feasible.
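For readers who have not seen it, jitclass lets you attach typed attributes and methods to a class that Numba compiles as a unit. A minimal sketch (in current releases the decorator lives in numba.experimental; older releases exposed it as numba.jitclass):

```python
import numpy as np
from numba import float64
from numba.experimental import jitclass  # numba.jitclass in older releases

spec = [
    ("values", float64[:]),  # typed attribute: 1-D float64 array
    ("total", float64),      # typed attribute: scalar
]

@jitclass(spec)
class RunningSum:
    def __init__(self, values):
        self.values = values
        self.total = values.sum()

    def add(self, x):
        self.total += x
        return self.total

rs = RunningSum(np.arange(5.0))
print(rs.add(10.0))  # prints 20.0
```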
As a final take-away, you might want to follow Florian’s advice and watch Gil and Lorena’s SciPy 2017 Numba tutorial.