Speed Up Your NumPy Code with NumExpr in Seconds
Take your NumPy code to the next level
The entire data science field has grown enormously in the last few years. Python is the most widely used programming language in this field, and many Data Scientists use NumPy to program efficiently. NumPy is very efficient on small datasets. For large datasets, however, there is room to speed up your NumPy code. One way to do that is the Python package NumExpr, a fast numerical expression evaluator for NumPy.
What can you expect from this article? We explain what NumExpr is and how you can use it. We also show why NumExpr performs better than NumPy for large datasets. For this, we run a performance test based on three examples. In addition, we show when the use of NumExpr makes sense. Let’s get started!
We’ll discuss the following:
What is NumExpr?
Installation and Getting Started with NumExpr
NumPy and NumExpr Examples
Conclusion
What is NumExpr?
NumExpr is an expression evaluator for NumPy. NumPy is a great Python library to work efficiently with arrays. It offers fast and optimized vectorized operations. However, its element-wise array operations generally run on a single thread. To complement NumPy, you can use NumExpr. It supports parallel processing so that you can use all CPU cores for computation. That results in better performance for large arrays, compared to NumPy, and it uses less memory than the same calculation in Python.
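As a small illustration of the multi-threading point, the following sketch inspects and changes the number of threads NumExpr uses. It assumes NumExpr is already installed (the installation is covered below); the choice of one thread is only for demonstration.

```python
import numexpr as ne

# Number of CPU cores that NumExpr detects on this machine.
cores = ne.detect_number_of_cores()

# set_num_threads() returns the previous setting, so we can restore it later.
previous = ne.set_num_threads(1)   # force single-threaded evaluation (demo only)
ne.set_num_threads(previous)       # restore the earlier setting

print(f"Detected cores: {cores}, threads used before the change: {previous}")
```

Limiting the thread count can be useful when you want to compare single-threaded and multi-threaded runtimes on your own machine.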
Compared to NumPy, NumExpr avoids allocating memory for intermediate results. That leads to better cache utilization and reduces memory access. For this reason, NumExpr works best with large arrays. NumExpr transforms the expression into its own format, which is executed by a computing virtual machine written in C. It then splits the array operands into small chunks that fit well into the CPU cache. The virtual machine runs the operations on each chunk. In addition, it can distribute the chunks across the available CPU cores so that parallel code execution is possible. This approach makes good use of the hardware for array-based operations. According to the NumExpr documentation, it works best if the arrays are too large for the L1 CPU cache. Now, we understand how it works. So it’s time for practice!
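To make the chunking idea more tangible, here is a minimal sketch in plain NumPy. It is not how NumExpr is implemented (the real work happens in its C virtual machine); the function name chunked_eval and the chunk size of 4096 are assumptions chosen purely for illustration. The point is that the temporary arrays never grow beyond one chunk.

```python
import numpy as np


def chunked_eval(a, b, chunk_size=4096):
    """Compute 2*a + 3*b for 1-D arrays chunk by chunk (illustration only)."""
    out = np.empty_like(a)
    for start in range(0, a.size, chunk_size):
        stop = start + chunk_size
        # The temporaries created here hold at most chunk_size elements,
        # instead of one full-size temporary per operation.
        out[start:stop] = 2 * a[start:stop] + 3 * b[start:stop]
    return out


rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)

result = chunked_eval(a, b)
print(np.allclose(result, 2 * a + 3 * b))
```

A pure Python loop like this adds its own overhead, so it is not meant to be fast. It only shows why working on small pieces keeps the intermediate data small.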
Installation and Getting Started with NumExpr
First, we have to install the packages NumPy and NumExpr. Enter the following command in your terminal:
$ pip install numpy numexpr
Now, we have both installed. In practice, the evaluate() function of NumExpr is used most often. This function evaluates the string expression that you pass to it as a parameter. For example, assume we have two large NumPy arrays (var1 and var2) and want to compute the following: 2 * np.exp(var1) - var2. We can write it as follows: numexpr.evaluate("2 * exp(var1) - var2"). Here, 2 * exp(var1) - var2 is the string expression. Let’s look at some examples.
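Before we measure anything, a quick sanity check can help. The following sketch evaluates the expression from the text with both libraries on a tiny array and confirms that the results agree. The array values and the names a and b in the second call are arbitrary choices for this illustration.

```python
import numpy as np
import numexpr as ne

var1 = np.array([0.0, 1.0, 2.0])
var2 = np.array([1.0, 2.0, 3.0])

# Plain NumPy version.
numpy_result = 2 * np.exp(var1) - var2

# NumExpr looks up var1 and var2 by name in the calling scope.
numexpr_result = ne.evaluate("2 * exp(var1) - var2")

# The operands can also be passed explicitly via local_dict.
explicit_result = ne.evaluate("2 * exp(a) - b", local_dict={"a": var1, "b": var2})

print(np.allclose(numpy_result, numexpr_result))
print(np.allclose(numpy_result, explicit_result))
```

Note that inside the string we write exp instead of np.exp, because NumExpr provides its own set of functions.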
NumPy and NumExpr Examples
In this section, we compare the performance of NumPy and NumExpr for different array sizes. In addition, we look at three examples to show when it makes sense to use NumExpr. First, create a new Jupyter Notebook or start an IPython session, because the %timeit magic used below is not available in a plain Python file. Then, import the following libraries:
import numpy as np
import numexpr as ne
import timeit
The imports are necessary for all examples. Let’s look at the first example!
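If you prefer a plain Python file over a notebook, the %timeit magic is not available. A possible alternative is the timeit module from the standard library. The following is only a sketch: the helper names (run_numpy, run_numexpr, best_time), the array size and the repeat counts are our own choices, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import numexpr as ne

# Moderately sized arrays so the script finishes quickly (arbitrary size).
var1 = np.random.random(2**20)
var2 = np.random.random(2**20)


def run_numpy(a, b):
    return 2 * np.sin(a) / np.cos(b)


def run_numexpr(a, b):
    # evaluate() resolves the names a and b from this function's local variables.
    return ne.evaluate("2 * sin(a) / cos(b)")


def best_time(func, number=5, repeat=3):
    """Return the best average time per call in seconds."""
    totals = timeit.repeat(lambda: func(var1, var2), number=number, repeat=repeat)
    return min(totals) / number


numpy_time = best_time(run_numpy)
numexpr_time = best_time(run_numexpr)
print(f"NumPy:   {numpy_time * 1e3:.2f} ms per call")
print(f"NumExpr: {numexpr_time * 1e3:.2f} ms per call")
```

Taking the minimum over several repeats reduces the influence of other processes that happen to run during the measurement.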
Example 1 — Trigonometric functions:
The following code shows a runtime comparison for trigonometric functions.
# large arrays
var1 = np.random.random(2**27)
var2 = np.random.random(2**27)
%timeit 2 * np.sin(var1) / np.cos(var2)
# 4.13 s ± 1.73 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit ne.evaluate("2 * sin(var1) / cos(var2)")
# 627 ms ± 29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# small arrays
var1 = np.random.random(27)
var2 = np.random.random(27)
%timeit 2 * np.sin(var1) / np.cos(var2)
# 2.5 µs ± 94.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit ne.evaluate("2 * sin(var1) / cos(var2)")
# 8.55 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
First, we generate two large arrays (var1 and var2) with 2**27 elements each. The NumPy code has a runtime of 4.13 s. NumExpr evaluates the same expression in 627 ms. Wow, that’s about 6.6 times faster than NumPy. Next, we look at small arrays with 27 elements each. Here, the NumPy code executes about 3.4 times faster. NumExpr only pays off when we have large arrays.
Example 2 — Mathematical Operations:
In this example, we look at mathematical operations like multiplication, subtraction and exponentiation.
# large arrays
var1 = np.random.random(2**17)
var2 = np.random.random(2**17)
%timeit 3*var1 - var2**17
# 2.18 ms ± 59.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit ne.evaluate("3*var1 - var2**17")
# 291 µs ± 86.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# small arrays
var1 = np.random.random(17)
var2 = np.random.random(17)
%timeit 3*var1 - var2**17
# 4.64 µs ± 492 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit ne.evaluate("3*var1 - var2**17")
# 8.98 µs ± 487 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
First, we run the test with large arrays. Here we see the same picture. The NumExpr code is about 7.5 times faster than the NumPy code. If we use the small arrays, the NumPy code is faster (about twice as fast). Again, the NumExpr code only makes sense on large arrays. Next, we look at a last example.
Example 3 — Logical Operations:
The following code compares the runtime of an expression that ends in a comparison and therefore returns a Boolean array:
# large arrays
var1 = np.random.random(10**7)
var2 = np.random.random(10**7)
var3 = np.random.random(10**7)
%timeit np.sqrt((var1 - var2)**2 + var3**2) > 0.3
# 99.9 ms ± 6.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit ne.evaluate("sqrt((var1 - var2)**2 + var3**2) > 0.3")
# 15.6 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# small arrays
var1 = np.random.random(7)
var2 = np.random.random(7)
var3 = np.random.random(7)
%timeit np.sqrt((var1 - var2)**2 + var3**2) > 0.3
# 7.67 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit ne.evaluate("sqrt((var1 - var2)**2 + var3**2) > 0.3")
# 19.9 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
We can see that the NumExpr code for large arrays is about 6.4 times faster. With small arrays, the NumPy code again has the better runtime (about 2.6 times faster). That’s the same pattern as in the previous examples.
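The speedup factors can be recomputed from the mean runtimes reported in the three examples. The following short sketch does exactly that; the dictionary layout and the variable names are our own, the numbers are the ones from the %timeit outputs above, converted to seconds.

```python
# (NumPy runtime, NumExpr runtime) in seconds, taken from the %timeit outputs.
timings = {
    "Example 1, large": (4.13, 627e-3),
    "Example 1, small": (2.5e-6, 8.55e-6),
    "Example 2, large": (2.18e-3, 291e-6),
    "Example 2, small": (4.64e-6, 8.98e-6),
    "Example 3, large": (99.9e-3, 15.6e-3),
    "Example 3, small": (7.67e-6, 19.9e-6),
}

# A value above 1 means NumExpr is faster, a value below 1 means NumPy is faster.
speedups = {name: numpy_t / numexpr_t for name, (numpy_t, numexpr_t) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: NumExpr speedup factor {factor:.2f}")
```

For the large arrays, the factors come out at roughly 6.6, 7.5 and 6.4. For the small arrays, all factors are below 1, which means NumPy wins there.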
The examples have shown that you should use NumExpr, especially for large amounts of data. In all other cases, NumPy is sufficient. You can find all supported operators and functions in the NumExpr documentation.
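One way to act on this finding is to choose the library based on the array size. The following is a minimal sketch under stated assumptions: the function name scaled_difference and the threshold of 100,000 elements are hypothetical and not a recommendation from NumExpr. You should measure the break-even point on your own machine.

```python
import numpy as np
import numexpr as ne

# Hypothetical break-even point; measure it for your own hardware and expression.
SIZE_THRESHOLD = 100_000


def scaled_difference(var1, var2):
    """Compute 3*var1 - var2**17, using NumExpr only for large inputs."""
    if var1.size >= SIZE_THRESHOLD:
        # evaluate() picks up var1 and var2 from this function's local variables.
        return ne.evaluate("3*var1 - var2**17")
    return 3 * var1 - var2**17


small = np.random.random(17)
large = np.random.random(2**17)

small_result = scaled_difference(small, small)
large_result = scaled_difference(large, large)
print(small_result.shape, large_result.shape)
```

Both branches return the same values, so the caller does not need to know which library did the work.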
Conclusion
This article has shown how to speed up your NumPy code for large datasets. We mainly use NumExpr when we work with large amounts of data and find that our NumPy code becomes slow. If you have problems with the performance of code sections in your next data science project, think of NumExpr. In our three examples, it made the large-array computations about 6 to 7.5 times faster, while NumPy remained the better choice for small arrays.