python - creating small arrays in cython takes a humongous amount of time -


i writing new random number generator numpy produces random numbers according arbitrary distribution when came across weird behavior:

this test.pyx

#cython: boundscheck=false #cython: wraparound=false import numpy np cimport numpy np cimport cython  def barebones(np.ndarray[double, ndim=1] a,np.ndarray[double, ndim=1] u,r):     return u  def untypedwithloop(a,u,r):     cdef int i,j=0     in range(u.shape[0]):         j+=i     return u,j  def bsreplacement(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):     cdef np.ndarray[np.int_t, ndim=1] r=np.empty(u.shape[0],dtype=int)     cdef int i,j=0     in range(u.shape[0]):         j=i     return r 

setup.py

from distutils.core import setup cython.build import cythonize setup(name = "simple cython func",ext_modules = cythonize('test.pyx'),) 

profiling code

#!/usr/bin/python __future__ import division  import subprocess import timeit  #compile cython modules before importing them subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])  sstr=""" import test import numpy u=numpy.random.random(10) a=numpy.random.random(10) a=numpy.cumsum(a) a/=a[-1] r=numpy.empty(10,int) """  print "binary search: creates array[n] , performs n binary searches fill it:\n",timeit.timeit('numpy.searchsorted(a,u)',sstr) print "simple replacement binary search:takes same args np.searchsorted , returns new array. performs 1 trivial operation per element:\n",timeit.timeit('test.bsreplacement(a,u)',sstr)  print "barebones function doing nothing:",timeit.timeit('test.barebones(a,u,r)',sstr) print "untyped inputs , doing n iterations:",timeit.timeit('test.untypedwithloop(a,u,r)',sstr) print "time np.empty()",timeit.timeit('numpy.empty(10,int)',sstr) 

the binary search implementation takes in order of len(u)*log(len(a)) time execute. trivial cython function takes in order of len(u) run. both return 1d int array of len(u).

however, no computation trivial implementation takes longer full binary search in numpy library. (it written in c: https://github.com/numpy/numpy/blob/202e78d607515e0390cffb1898e11807f117b36a/numpy/core/src/multiarray/item_selection.c see pyarray_searchsorted)

the results are:

binary search: creates array[n] , performs n binary searches fill it: 1.15157485008 simple replacement binary search:takes same args np.searchsorted , returns new array. performs 1 trivial operation per element: 3.69442796707 barebones function doing nothing: 0.87496304512 untyped inputs , doing n iterations: 0.244267940521 time np.empty() 1.0983929634 

why np.empty() step taking time? , can empty array can return ?

the c function , runs whole bunch of sanity checks , uses longer algorithm in inner loop. (i removed logic except loop fro example)


update

it turns out there 2 distinct problems:

  1. the np.empty(10) call alone has ginormous overhead , takes time takes searchsorted make new array , perform 10 binary searches on it
  2. just declaring buffer syntax np.ndarray[...] has massive overhead takes more time receiving untyped variables , iterating 50 times.

results 50 iterations:

binary search: 2.45336699486 simple replacement:3.71126317978 barebones function doing nothing: 0.924916028976 untyped inputs , doing n iterations: 0.316384077072 time np.empty() 1.04949498177 

there discussion of on cython list might have useful suggestions: https://groups.google.com/forum/#!topic/cython-users/cwtu_jyadgm

generally though try allocate small arrays outside of cython, pass them in , re-use them in subsequent calls method. understand not option.


Comments

Popular posts from this blog

java - Run a .jar on Heroku -

java - Jtable duplicate Rows -

validation - How to pass paramaters like unix into windows batch file -