NumPy: Multiplying large arrays with dtype=int8 is SLOW

Consider the following piece of code, which generates a (potentially) huge multi-dimensional array and calls numpy.tensordot on it (whether we multiply the array with itself or with a different array does not really matter here).

import time
import numpy

L, N = 6, 4

shape = (2*L)*[N,]
A = numpy.arange(numpy.prod(shape)).reshape(shape)
A = A % 256 - 128   # values in [-128, +127], i.e. the full int8 range
axes=(range(1,2*L,2), range(0,2*L,2))

def run(dtype, repeat=1):
    A_ = A.astype(dtype)
    t = time.time()
    for i in range(repeat):
        numpy.tensordot(A_, A_, axes)
    t = time.time() - t
    print(dtype, '   \t%8.2f sec\t%8.2f MB' %(t, A_.nbytes/1e6))

Now we can compare the performance for different data types, e.g.:

run(numpy.float64)
run(numpy.int64)

Since the array consists only of small integers, I would like to save some memory by using dtype=int8. However, this slows down the multiplication A LOT.

Here are some test cases. The first one is the important one for my use case; the others are just for reference. I am using NumPy 1.13.1 and Python 3.4.2.

Large array

L, N = 6, 4; A.size = 4**12 = 16777216
<class 'numpy.float64'>        59.58 sec      134.22 MB
<class 'numpy.float32'>        44.19 sec       67.11 MB
<class 'numpy.int16'>         711.16 sec       33.55 MB
<class 'numpy.int8'>          647.40 sec       16.78 MB

This is the same array with different data types. Memory decreases as expected, but why are there such large differences in CPU time? If anything, I would expect int to be faster than float.

Large array with different shape

L, N = 1, 4**6; A.size = (4**6)**2 = 16777216
<class 'numpy.float64'>        57.95 sec      134.22 MB
<class 'numpy.float32'>        42.84 sec       67.11 MB

The shape doesn't seem to have a large effect.
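One plausible reason the shape matters so little: as far as I understand it, tensordot transposes and flattens both operands into 2-D matrices and then performs an ordinary matrix product. The sketch below checks that equivalence on a deliberately tiny array (the sizes and the names `odd`, `even` and `M` are only illustrative, not the ones timed above).

```python
import numpy as np

L, N = 2, 3                        # tiny illustrative sizes
shape = (2 * L) * [N]
A = np.arange(np.prod(shape)).reshape(shape) % 256 - 128

odd = list(range(1, 2 * L, 2))     # axes of the first operand that are summed over
even = list(range(0, 2 * L, 2))    # axes of the second operand that are summed over

reference = np.tensordot(A, A, axes=(odd, even))

# First operand: free (even) axes in front, contracted (odd) axes at the back.
# Second operand: contracted (even) axes in front, free (odd) axes at the back.
# For this particular choice of axes both permutations coincide, so a single
# (N**L, N**L) matrix M serves as both factors.
M = A.transpose(even + odd).reshape(N**L, N**L)
via_matmul = (M @ M).reshape(shape)

same = np.array_equal(reference, via_matmul)
print(same)
```

Both the L = 6, N = 4 case and the L = 1, N = 4**6 case therefore boil down to a product of two 4096 x 4096 matrices, which would be consistent with the nearly identical timings.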

Not so large array

L, N = 5, 4
<class 'numpy.float128'>       10.91 sec       16.78 MB
<class 'numpy.float64'>         0.98 sec        8.39 MB
<class 'numpy.float32'>         0.90 sec        4.19 MB
<class 'numpy.float16'>         9.80 sec        2.10 MB
<class 'numpy.int64'>           8.84 sec        8.39 MB
<class 'numpy.int32'>           5.55 sec        4.19 MB
<class 'numpy.int16'>           2.23 sec        2.10 MB
<class 'numpy.int8'>            1.82 sec        1.05 MB

The times are smaller, but the same weird trend shows up.

Small array, lots of repetitions

L, N = 2, 4; A.size = 4**4 = 256; repeat=1000000

<class 'numpy.float128'>       17.92 sec        4.10 KB
<class 'numpy.float64'>        14.20 sec        2.05 KB
<class 'numpy.float32'>        12.21 sec        1.02 KB
<class 'numpy.float16'>        41.72 sec        0.51 KB
<class 'numpy.int64'>          14.21 sec        2.05 KB
<class 'numpy.int32'>          14.26 sec        1.02 KB
<class 'numpy.int16'>          13.88 sec        0.51 KB
<class 'numpy.int8'>           13.03 sec        0.26 KB

Other than float16 being much slower, everything is fine here.

Why is int8 so much slower for very large arrays? Is there any way around this? Saving memory becomes increasingly important for larger arrays!

Sep 5, 2018 in Python by bug_seeker

1 answer to this question.

Unfortunately, as correctly pointed out in the comments, the "engine" behind the scenes is BLAS, and it has no native integer types. NumPy hands float64 and float32 products to BLAS, while integer dtypes fall back to a much less optimized generic routine. That is why float64 and float32 run faster (there is some discussion in a related answer to a similar question about C++).
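A minimal sketch of what this means in practice, assuming a NumPy build that is linked against a BLAS library: keep the data as int8, but upcast to float64 just for the multiplication. The array is a small version of the one in the question, and the variable names are only illustrative.

```python
import numpy as np

L, N = 2, 4
shape = (2 * L) * [N]
A = (np.arange(np.prod(shape)).reshape(shape) % 256 - 128).astype(np.int8)
axes = (list(range(1, 2 * L, 2)), list(range(0, 2 * L, 2)))

# Reference result computed with 64-bit integers (the slow, non-BLAS path).
exact = np.tensordot(A.astype(np.int64), A.astype(np.int64), axes)

# Upcast to float64 only for the multiplication, so BLAS can typically be used.
A_f = A.astype(np.float64)
via_float = np.tensordot(A_f, A_f, axes)

# float64 represents every integer up to 2**53 exactly, so the result matches
# as long as the accumulated sums stay below that bound.
floats_match = np.array_equal(via_float, exact)

# Multiplying the int8 array directly also returns int8, so large sums wrap around.
narrow = np.tensordot(A, A, axes)
print(floats_match, narrow.dtype)
```

The last two lines point at a second caveat: the result of an int8 tensordot is itself int8, so with values this large it silently overflows, independent of the speed issue.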

As a side note to the core of your question: one way to speed this up while limiting memory consumption is Cython, which lets you run C code directly and get the result back in Python.
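Before reaching for Cython, a pure-NumPy middle ground may be worth trying. This is a different technique from the Cython suggestion, sketched under assumptions: store the operands as int8 and upcast only one block at a time, so the float copies stay small while the product itself still runs on the fast float path. The function `blocked_matmul` and its parameters `block` and `work_dtype` are hypothetical names for illustration; it works on 2-D matrices, which is what tensordot reduces to internally.

```python
import numpy as np

def blocked_matmul(a, b, block=1024, work_dtype=np.float64):
    """Compute a @ b for 2-D integer matrices, upcasting one block at a time."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    # The output is still a full float64 array; only the operands stay small.
    out = np.zeros((n, m), dtype=np.float64)
    for p in range(0, k, block):
        # Temporary float copy of a horizontal slab of b (at most block x m).
        b_blk = b[p:p + block, :].astype(work_dtype)
        for i in range(0, n, block):
            # Temporary float copy of one tile of a (at most block x block).
            a_blk = a[i:i + block, p:p + block].astype(work_dtype)
            out[i:i + block] += a_blk @ b_blk
    return out

# Small usage example with random int8 data.
rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(50, 70), dtype=np.int8)
b = rng.integers(-128, 128, size=(70, 30), dtype=np.int8)
result = blocked_matmul(a, b, block=16)
```

Note that the output is as large as a float64 operand would be, so the saving is limited to the inputs and the temporaries. Whether this beats a single upcast of the whole array depends on the block size and the BLAS library, so it is worth timing on the real data.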

answered Sep 5, 2018 by Priyaj