## array: efficient arrays of numeric values [solved]

This is the place for queries that don't fit in any of the other categories.

### array: efficient arrays of numeric values [solved]

Long time ago I implemented a simple Eratosthenes sieve using a list of booleans.
Code: Select all
`def primesFunUntil(n):    marks = [True] * (n + 1)    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            for mult in range(2*d, n + 1, d):                marks[mult] = False    return marks`

Some days ago I heard of Python's array module http://docs.python.org/2/library/array.html, defining efficient arrays of numeric values. I thought this should improve my sieve implementation, so I modified it:
Code: Select all
`def primesFunUntilArrays(n):    marks = array.array('b', [1] * (n + 1))    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            for mult in range(2*d, n + 1, d):                marks[mult] = 0    return marks`

I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python), while I assumed that in the second, where the size of the elements is specified by the 'b' parameter in the array creation, it would be. So, I expected a significant improvement. Here is the complete script to test both and compare, used for primes until 30,000,000:
Code: Select all
`import mathimport arrayimport timeitdef primesFunUntil(n):    marks = [True] * (n + 1)    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            for mult in range(2*d, n + 1, d):                marks[mult] = False    return marksdef primesFunUntilArrays(n):    marks = array.array('b', [1] * (n + 1))    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            for mult in range(2*d, n + 1, d):                marks[mult] = 0    return marksif __name__ == "__main__":    print("Time for naive implementation: %s." % timeit.timeit("primesFunUntil(3*10**7)", number=1, setup="from __main__ import primesFunUntil"))    print("Time for the implementation with arrays: %s." % timeit.timeit("primesFunUntilArrays(3*10**7)", number=1, setup="from __main__ import primesFunUntilArrays"))`

It produces the very dissapointing result:
Code: Select all
`Time for naive implementation: 8.21312978368488.Time for the implementation with arrays: 11.19651029153776.`

Can someone shed some light and tell me what is going on here?

Last edited by zeycus on Sat Oct 05, 2013 7:30 pm, edited 1 time in total.

Live long and prosper.
Spock

zeycus

Posts: 23
Joined: Sun Feb 17, 2013 10:30 am

### Re: array: efficient arrays of numeric values

I'm not exactly sure what's happening behind the scenes when using array.arrays, as I have never used them.
I would guess array.array is just a list with different code for space allocation and added type-checking.
You could read the arraymodule.c if you want to know for sure.

zeycus wrote:I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python)

I don't know where you read that, but it's wrong.
Element access times are constant; lists are implemented as C arrays of pointers to python objects.

And finally:
If you really have need for efficient arrays, I would suggest using numpy.
Last edited by stranac on Sat Oct 05, 2013 6:59 pm, edited 1 time in total.
Reason: Changed a do to a don't; shouldn't have been a do in the first place
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1790
Joined: Thu Feb 07, 2013 3:42 pm

### Re: array: efficient arrays of numeric values

stranac wrote:
zeycus wrote:I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python)

I do know where you read that, but it's wrong.
Element access times are constant; lists are implemented as C arrays of pointers to python objects.

This comes as a shock. I don't remember where I read that time access for lists is not constant in Python, this goes to prove that you can't believe everything you read. Not that I would not believe you , but to avoid making the same mistake twice, I looked it up again, and now everywhere I found that it is O(1). So, thanks a lot for straightening that up.

Anyway, I am still surprised that normal lists of numbers can beat arrays, that are alledgely efficient list for numbers. Maybe this explains why I never found them used.

Live long and prosper.
Spock

zeycus

Posts: 23
Joined: Sun Feb 17, 2013 10:30 am

### Re: array: efficient arrays of numeric values

The array module only support C types. It allocates a contiguous block of memory sufficient to hold the requested number of objects. A Python object must be converted into a C type before it can be stored into the array. The extra conversion is the primary cause of the slowdown. A Python list is a contiguous block of memory containing pointers to Python objects so conversions are not required.

An array is more efficient for memory use if the size of the data type is less than a pointer. They are also helpful for reading standard C types from data files or interfacing with external modules.

It is possible to avoid the type conversion by using the slice notation to copy between two arrays. This can also be done with lists. See functions primesFunUntil2() and primesFunUntilArrays2() in the code below. The running time is now identical. The functions primesFunUntil3() and primesFunUntil4() tweak optimize the sieve implementation but those changes are irrelevant to lists vs. arrays.

I maintain the gmpy2 library and it includes an integer type (xmpz) that is optimized for bit manipulation. I included an example that uses xmpz.

Code: Select all
`import mathimport arrayimport timeitimport gmpy2def primesFunUntil(n):    marks = [True] * (n + 1)    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            for mult in range(2*d, n + 1, d):                marks[mult] = False    return marksdef primesFunUntil2(n):    zeros = [False] * (n + 1)    marks = [True] * (n + 1)    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            marks[2*d: n + 1: d] = zeros[2*d: n + 1: d]    return marksdef primesFunUntilArrays(n):    marks = array.array('b', [1] * (n + 1))    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            for mult in range(2*d, n + 1, d):                marks[mult] = 0    return marksdef primesFunUntilArrays2(n):    zeros = array.array('b', [0] * (n + 1))    marks = array.array('b', [1] * (n + 1))    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            marks[2*d:n+1:d] = zeros[2*d:n+1:d]    return marksdef primesFunUntilArrays3(n):    zeros = array.array('b', [0] * (n + 1))    marks = array.array('b', [1] * (n + 1))    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            marks[d*d:n+1:d] = zeros[d*d:n+1:d]    return marksdef primesFunUntilArrays4(n):    zeros = array.array('b', [0] * (n + 1))    marks = array.array('b', [1] * (n + 1))    d = 2    marks[d*d:n+1:d] = zeros[d*d:n+1:d]    for d in range(3, 1 + int(round(math.sqrt(n)))):        if marks[d]:            marks[d*d:n+1:d+d] = zeros[d*d:n+1:d+d]    return marksdef gmpy2_sieve(n):    sieve_limit = gmpy2.isqrt(n) + 1    limit = n + 1    bitmap = gmpy2.xmpz(3)    bitmap[4 : limit : 2] = -1    for p in bitmap.iter_clear(3, sieve_limit):        bitmap[p*p : limit : p+p] = -1    return bitmapif __name__ == "__main__":    print("Time for naive implementation: %s." %          timeit.timeit("primesFunUntil(3*10**7)",                        number=1,                        setup="from __main__ import primesFunUntil"))    print("Time for naive implementation #2: %s." %          timeit.timeit("primesFunUntil2(3*10**7)",                        number=1,                        setup="from __main__ import primesFunUntil2"))    print("Time for the implementation with arrays: %s." %          timeit.timeit("primesFunUntilArrays(3*10**7)",          number=1,          setup="from __main__ import primesFunUntilArrays"))    print("Time for the implementation with arrays #2: %s." %          timeit.timeit("primesFunUntilArrays2(3*10**7)",          number=1,          setup="from __main__ import primesFunUntilArrays2"))    print("Time for the implementation with arrays #3: %s." %          timeit.timeit("primesFunUntilArrays3(3*10**7)",          number=1,          setup="from __main__ import primesFunUntilArrays3"))    print("Time for the implementation with arrays #4: %s." %          timeit.timeit("primesFunUntilArrays4(3*10**7)",          number=1,          setup="from __main__ import primesFunUntilArrays4"))    print("Time for the implementation with gmpy2: %s." %          timeit.timeit("gmpy2_sieve(3*10**7)",          number=1,          setup="from __main__ import gmpy2_sieve"))`

Running times:

Code: Select all
`Time for naive implementation: 5.765339136123657.Time for naive implementation #2: 3.6114020347595215.Time for the implementation with arrays: 7.475277900695801.Time for the implementation with arrays #2: 3.6160669326782227.Time for the implementation with arrays #3: 3.5527069568634033.Time for the implementation with arrays #4: 3.0937089920043945.Time for the implementation with gmpy2: 0.17927193641662598.`
casevh

Posts: 114
Joined: Sat Feb 09, 2013 7:35 am

### Re: array: efficient arrays of numeric values

Wow, your computer kicks mine's ass...
Anyway, I thought you were missing a numpy test, so I did a timing of my own.
Code: Select all
`Time for naive implementation: 14.5630209446.Time for naive implementation #2: 5.99617981911.Time for the implementation with arrays: 17.6102640629.Time for the implementation with arrays #2: 8.53394293785.Time for the implementation with arrays #3: 8.39233803749.Time for the implementation with arrays #4: 7.34447622299.Time for the implementation with numpy: 0.0301620960236.`

The test for gmpy2 is left out, as I don't have it installed.
But it seems numpy beats it anyway, by quite a margin.
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1790
Joined: Thu Feb 07, 2013 3:42 pm

### Re: array: efficient arrays of numeric values

Thank you both casevh and stranac for taking the trouble of preparing those thorough comparisons!

casevh wrote:The array module only support C types. It allocates a contiguous block of memory sufficient to hold the requested number of objects. A Python object must be converted into a C type before it can be stored into the array. The extra conversion is the primary cause of the slowdown. A Python list is a contiguous block of memory containing pointers to Python objects so conversions are not required.

I don't know why, I would never have thought of using slices in the assignation, and I see performance is really improved. I had no idea.

gmpy2 and numpy's performance in stranac's test is just awesome. I tried numpy some time ago and, although of course it was very useful, reminded me of Matlab, so until now I only used it for problems where linear algebra is the key. But seeing this, I must take a second look.

Again, thank you both!

Live long and prosper.
Spock

zeycus

Posts: 23
Joined: Sun Feb 17, 2013 10:30 am

### Re: array: efficient arrays of numeric values

stranac wrote:Wow, your computer kicks mine's ass...
Anyway, I thought you were missing a numpy test, so I did a timing of my own.
Code: Select all
`Time for naive implementation: 14.5630209446.Time for naive implementation #2: 5.99617981911.Time for the implementation with arrays: 17.6102640629.Time for the implementation with arrays #2: 8.53394293785.Time for the implementation with arrays #3: 8.39233803749.Time for the implementation with arrays #4: 7.34447622299.Time for the implementation with numpy: 0.0301620960236.`

The test for gmpy2 is left out, as I don't have it installed.
But it seems numpy beats it anyway, by quite a margin.

Just for grins, can you post your numpy version? I'm curious how fast it will be on my computer.
casevh

Posts: 114
Joined: Sat Feb 09, 2013 7:35 am

### Re: array: efficient arrays of numeric values [solved]

Sure, but it's the same as the initial example with just the array creation and assignment changed:
Code: Select all
`def numpy_sieve(n):    marks = numpy.zeros(n+1, dtype=numpy.bool)    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            marks[2*d: n + 1: d] = 1    return marks`
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1790
Joined: Thu Feb 07, 2013 3:42 pm

### Re: array: efficient arrays of numeric values [solved]

In the example I included for gmpy2, I made version that uses is same convention. It is a little slower. When you made the numpy version, you mixed the logic of the two versions. I think the correct code for both versions is:

Code: Select all
`def gmpy2_sieve(n):    sieve_limit = gmpy2.isqrt(n) + 1    limit = n + 1    bitmap = gmpy2.xbit_mask(n+1)    bitmap[4 : limit : 2] = 0    for p in bitmap.iter_set(3, sieve_limit):        bitmap[p*p : limit : p+p] = 0    return bitmapdef numpy_sieve(n):    marks = numpy.ones(n+1, dtype=numpy.bool)    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            marks[2*d: n + 1: d] = 0    return marks`

And the running times...

Code: Select all
`Time for the implementation with gmpy2: 0.21328949000007924.Time for the implementation with numpy: 0.3965407480000067.`
casevh

Posts: 114
Joined: Sat Feb 09, 2013 7:35 am

### Re: array: efficient arrays of numeric values [solved]

I guess you're right, although I have no idea what either xbit_mask or .iter_set do.
Is there documentation anywhere?

Also: yay my computer beat yours at the numpy test
Just kiding, I realize you probably used a greater n.
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1790
Joined: Thu Feb 07, 2013 3:42 pm

### Re: array: efficient arrays of numeric values [solved]

Documentation is online

Here is the overview of the xmpz type. Sieving is used as an example.

xbit_mask(n) returns an xmpz integer with the first n bits set to 1.

iter_set(start, end) creates an iterator that returns all the bits that are set between start and end.
casevh

Posts: 114
Joined: Sat Feb 09, 2013 7:35 am

### Re: array: efficient arrays of numeric values [solved]

Interesting. Thanks for the info.
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1790
Joined: Thu Feb 07, 2013 3:42 pm