Numpy: Multiplying large arrays with dtype=int8 is SLOW

0 votes

Consider the following code, which generates a (potentially) huge multi-dimensional array and calls numpy.tensordot on it (whether we multiply the same array or two different arrays does not really matter here).

import time
import numpy

L, N = 6, 4

shape = (2*L)*[N,]
A = numpy.arange(numpy.prod(shape)).reshape(shape)
A = A % 256 - 128   # values in [-128, +127], i.e. the int8 range
axes=(range(1,2*L,2), range(0,2*L,2))

def run(dtype, repeat=1):
    A_ = A.astype(dtype)
    t = time.time()
    for i in range(repeat):
        numpy.tensordot(A_, A_, axes)
    t = time.time() - t
    print(dtype, '   \t%8.2f sec\t%8.2f MB' %(t, A_.nbytes/1e6))

Now we can compare the performance for different data types, e.g.:

run(numpy.float64)
run(numpy.int64)

Since the array contains only small integers, I would like to save memory by using dtype=int8. However, this slows down the multiplication a lot.

Here are some test cases

The first one is the important one for my use case; the others are just for reference. I am using NumPy 1.13.1 and Python 3.4.2.

Large array

L, N = 6, 4; A.size = 4**12 = 16777216
<class 'numpy.float64'>        59.58 sec      134.22 MB
<class 'numpy.float32'>        44.19 sec       67.11 MB
<class 'numpy.int16'>         711.16 sec       33.55 MB
<class 'numpy.int8'>          647.40 sec       16.78 MB

Same array with different data types. Memory decreases as expected, but why the large differences in CPU time? If anything, I would expect int to be faster than float.

Large array with different shape

L, N = 1, 4**6; A.size = (4**6)**2 = 16777216
<class 'numpy.float64'>        57.95 sec      134.22 MB
<class 'numpy.float32'>        42.84 sec       67.11 MB

The shape doesn't seem to have a large effect.

Not so large array

L, N = 5, 4
<class 'numpy.float128'>       10.91 sec       16.78 MB
<class 'numpy.float64'>         0.98 sec        8.39 MB
<class 'numpy.float32'>         0.90 sec        4.19 MB
<class 'numpy.float16'>         9.80 sec        2.10 MB
<class 'numpy.int64'>           8.84 sec        8.39 MB
<class 'numpy.int32'>           5.55 sec        4.19 MB
<class 'numpy.int16'>           2.23 sec        2.10 MB
<class 'numpy.int8'>            1.82 sec        1.05 MB

Smaller values, but same weird trend.

Small array, lots of repetitions

L, N = 2, 4; A.size = 4**4 = 256; repeat=1000000

<class 'numpy.float128'>       17.92 sec        4.10 KB
<class 'numpy.float64'>        14.20 sec        2.05 KB
<class 'numpy.float32'>        12.21 sec        1.02 KB
<class 'numpy.float16'>        41.72 sec        0.51 KB
<class 'numpy.int64'>          14.21 sec        2.05 KB
<class 'numpy.int32'>          14.26 sec        1.02 KB
<class 'numpy.int16'>          13.88 sec        0.51 KB
<class 'numpy.int8'>           13.03 sec        0.26 KB

Other than float16 being much slower, everything is fine here.

Question

Why is int8 so much slower for very large arrays? Is there any way around this? Saving memory becomes increasingly important for larger arrays!

May 9, 2018 in Python by ariaholic
• 7,320 points
36 views

1 answer to this question.

0 votes

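Unfortunately, the "engine" behind the scenes is BLAS, which has no native integer routines. NumPy therefore hands float64 and float32 products to BLAS, while integer products fall back to a slower, non-BLAS code path. That is why the float types run faster (there is some discussion in a related answer to a similar question for C++).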

As a side note, one option to explore for speeding this up while limiting memory consumption is Cython, which lets you run compiled C code directly and get the result back in Python.
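Since BLAS only accelerates floating-point products, another thing to try in pure NumPy is to keep the arrays stored as int8 and cast only small blocks to float64 just before multiplying. The sketch below is an illustration under assumptions, not a tuned implementation: the function name chunked_int_matmul and its parameters chunk, work_dtype and out_dtype are made up for this example, and it handles plain 2-D matrices (the L = 1 case from the question). A general tensordot would first need its operands transposed and reshaped to 2-D.

```python
import numpy as np

def chunked_int_matmul(a, b, chunk=256, work_dtype=np.float64, out_dtype=np.int64):
    """Compute a @ b for 2-D integer arrays using float blocks (hypothetical helper).

    Only blocks of at most `chunk` rows of `a` and `chunk` columns of `b`
    are held as floats at any time, so the inputs can stay in int8.
    """
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.empty((m, n), dtype=out_dtype)
    for i in range(0, m, chunk):
        # Temporary float copy of a block of rows of `a`.
        a_blk = a[i:i + chunk].astype(work_dtype)
        for j in range(0, n, chunk):
            # Temporary float copy of a block of columns of `b`.
            b_blk = b[:, j:j + chunk].astype(work_dtype)
            # The float product goes through the BLAS-backed path;
            # the result is cast back to the integer output type.
            out[i:i + chunk, j:j + chunk] = (a_blk @ b_blk).astype(out_dtype)
    return out

# Small usage example with int8 inputs.
rng = np.random.RandomState(0)
a = rng.randint(-128, 128, size=(50, 70)).astype(np.int8)
b = rng.randint(-128, 128, size=(70, 30)).astype(np.int8)
c = chunked_int_matmul(a, b, chunk=16)
```

Two caveats. The result is exact only while every partial sum stays below 2**53 in magnitude for float64; with float32 the limit is 2**24, which a sum of 4096 int8 products can already exceed. Also, multiplying int8 arrays directly returns an int8 result that silently wraps around, so a wider output type is needed anyway. Whether this beats the integer path on a given machine has to be measured.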

answered May 9, 2018 by charlie_brown
• 7,710 points
