array: efficient arrays of numeric values [solved]

This is the place for queries that don't fit in any of the other categories.

array: efficient arrays of numeric values [solved]

Postby zeycus » Sat Oct 05, 2013 11:01 am

Long time ago I implemented a simple Eratosthenes sieve using a list of booleans.
Code: Select all
def primesFunUntil(n):
    marks = [True] * (n + 1)
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            for mult in range(2*d, n + 1, d):
                marks[mult] = False
    return marks


Some days ago I heard of Python's array module http://docs.python.org/2/library/array.html, defining efficient arrays of numeric values. I thought this should improve my sieve implementation, so I modified it:
Code: Select all
def primesFunUntilArrays(n):
    marks = array.array('b', [1] * (n + 1))
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            for mult in range(2*d, n + 1, d):
                marks[mult] = 0
    return marks


I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python), while I assumed that in the second, where the size of the elements is specified by the 'b' parameter in the array creation, it would be. So, I expected a significant improvement. Here is the complete script to test both and compare, used for primes until 30,000,000:
Code: Select all
import math
import array
import timeit

def primesFunUntil(n):
    marks = [True] * (n + 1)
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            for mult in range(2*d, n + 1, d):
                marks[mult] = False
    return marks

def primesFunUntilArrays(n):
    marks = array.array('b', [1] * (n + 1))
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            for mult in range(2*d, n + 1, d):
                marks[mult] = 0
    return marks

if __name__ == "__main__":
    print("Time for naive implementation: %s." % timeit.timeit("primesFunUntil(3*10**7)", number=1, setup="from __main__ import primesFunUntil"))
    print("Time for the implementation with arrays: %s." % timeit.timeit("primesFunUntilArrays(3*10**7)", number=1, setup="from __main__ import primesFunUntilArrays"))


It produces the very dissapointing result:
Code: Select all
Time for naive implementation: 8.21312978368488.
Time for the implementation with arrays: 11.19651029153776.


Can someone shed some light and tell me what is going on here?

Thanks in advance!
Last edited by zeycus on Sat Oct 05, 2013 7:30 pm, edited 1 time in total.
Image

Live long and prosper.
Spock
User avatar
zeycus
 
Posts: 23
Joined: Sun Feb 17, 2013 10:30 am
Location: Madrid

Re: array: efficient arrays of numeric values

Postby stranac » Sat Oct 05, 2013 1:43 pm

I'm not exactly sure what's happening behind the scenes when using array.arrays, as I have never used them.
I would guess array.array is just a list with different code for space allocation and added type-checking.
You could read the arraymodule.c if you want to know for sure.

zeycus wrote:I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python)

I don't know where you read that, but it's wrong.
Element access times are constant; lists are implemented as C arrays of pointers to python objects.

And finally:
If you really have need for efficient arrays, I would suggest using numpy.
Last edited by stranac on Sat Oct 05, 2013 6:59 pm, edited 1 time in total.
Reason: Changed a do to a don't; shouldn't have been a do in the first place
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1144
Joined: Thu Feb 07, 2013 3:42 pm

Re: array: efficient arrays of numeric values

Postby zeycus » Sat Oct 05, 2013 4:01 pm

Thank you for your quick reply.

stranac wrote:
zeycus wrote:I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python)

I do know where you read that, but it's wrong.
Element access times are constant; lists are implemented as C arrays of pointers to python objects.

This comes as a shock. I don't remember where I read that time access for lists is not constant in Python, this goes to prove that you can't believe everything you read. Not that I would not believe you :D, but to avoid making the same mistake twice, I looked it up again, and now everywhere I found that it is O(1). So, thanks a lot for straightening that up.

Anyway, I am still surprised that normal lists of numbers can beat arrays, that are alledgely efficient list for numbers. Maybe this explains why I never found them used.
Image

Live long and prosper.
Spock
User avatar
zeycus
 
Posts: 23
Joined: Sun Feb 17, 2013 10:30 am
Location: Madrid

Re: array: efficient arrays of numeric values

Postby casevh » Sat Oct 05, 2013 5:29 pm

The array module only support C types. It allocates a contiguous block of memory sufficient to hold the requested number of objects. A Python object must be converted into a C type before it can be stored into the array. The extra conversion is the primary cause of the slowdown. A Python list is a contiguous block of memory containing pointers to Python objects so conversions are not required.

An array is more efficient for memory use if the size of the data type is less than a pointer. They are also helpful for reading standard C types from data files or interfacing with external modules.

It is possible to avoid the type conversion by using the slice notation to copy between two arrays. This can also be done with lists. See functions primesFunUntil2() and primesFunUntilArrays2() in the code below. The running time is now identical. The functions primesFunUntil3() and primesFunUntil4() tweak optimize the sieve implementation but those changes are irrelevant to lists vs. arrays.

I maintain the gmpy2 library and it includes an integer type (xmpz) that is optimized for bit manipulation. I included an example that uses xmpz.

Code: Select all
import math
import array
import timeit
import gmpy2

def primesFunUntil(n):
    marks = [True] * (n + 1)
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            for mult in range(2*d, n + 1, d):
                marks[mult] = False
    return marks

def primesFunUntil2(n):
    zeros = [False] * (n + 1)
    marks = [True] * (n + 1)
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            marks[2*d: n + 1: d] = zeros[2*d: n + 1: d]
    return marks

def primesFunUntilArrays(n):
    marks = array.array('b', [1] * (n + 1))
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            for mult in range(2*d, n + 1, d):
                marks[mult] = 0
    return marks

def primesFunUntilArrays2(n):
    zeros = array.array('b', [0] * (n + 1))
    marks = array.array('b', [1] * (n + 1))
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            marks[2*d:n+1:d] = zeros[2*d:n+1:d]
    return marks

def primesFunUntilArrays3(n):
    zeros = array.array('b', [0] * (n + 1))
    marks = array.array('b', [1] * (n + 1))
    d = 2
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            marks[d*d:n+1:d] = zeros[d*d:n+1:d]
    return marks

def primesFunUntilArrays4(n):
    zeros = array.array('b', [0] * (n + 1))
    marks = array.array('b', [1] * (n + 1))
    d = 2
    marks[d*d:n+1:d] = zeros[d*d:n+1:d]
    for d in range(3, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            marks[d*d:n+1:d+d] = zeros[d*d:n+1:d+d]
    return marks

def gmpy2_sieve(n):
    sieve_limit = gmpy2.isqrt(n) + 1
    limit = n + 1
    bitmap = gmpy2.xmpz(3)
    bitmap[4 : limit : 2] = -1
    for p in bitmap.iter_clear(3, sieve_limit):
        bitmap[p*p : limit : p+p] = -1

    return bitmap

if __name__ == "__main__":
    print("Time for naive implementation: %s." %
          timeit.timeit("primesFunUntil(3*10**7)",
                        number=1,
                        setup="from __main__ import primesFunUntil"))

    print("Time for naive implementation #2: %s." %
          timeit.timeit("primesFunUntil2(3*10**7)",
                        number=1,
                        setup="from __main__ import primesFunUntil2"))

    print("Time for the implementation with arrays: %s." %
          timeit.timeit("primesFunUntilArrays(3*10**7)",
          number=1,
          setup="from __main__ import primesFunUntilArrays"))

    print("Time for the implementation with arrays #2: %s." %
          timeit.timeit("primesFunUntilArrays2(3*10**7)",
          number=1,
          setup="from __main__ import primesFunUntilArrays2"))

    print("Time for the implementation with arrays #3: %s." %
          timeit.timeit("primesFunUntilArrays3(3*10**7)",
          number=1,
          setup="from __main__ import primesFunUntilArrays3"))

    print("Time for the implementation with arrays #4: %s." %
          timeit.timeit("primesFunUntilArrays4(3*10**7)",
          number=1,
          setup="from __main__ import primesFunUntilArrays4"))

    print("Time for the implementation with gmpy2: %s." %
          timeit.timeit("gmpy2_sieve(3*10**7)",
          number=1,
          setup="from __main__ import gmpy2_sieve"))


Running times:

Code: Select all
Time for naive implementation: 5.765339136123657.
Time for naive implementation #2: 3.6114020347595215.
Time for the implementation with arrays: 7.475277900695801.
Time for the implementation with arrays #2: 3.6160669326782227.
Time for the implementation with arrays #3: 3.5527069568634033.
Time for the implementation with arrays #4: 3.0937089920043945.
Time for the implementation with gmpy2: 0.17927193641662598.
casevh
 
Posts: 70
Joined: Sat Feb 09, 2013 7:35 am

Re: array: efficient arrays of numeric values

Postby stranac » Sat Oct 05, 2013 7:12 pm

Wow, your computer kicks mine's ass...
Anyway, I thought you were missing a numpy test, so I did a timing of my own.
Code: Select all
Time for naive implementation: 14.5630209446.
Time for naive implementation #2: 5.99617981911.
Time for the implementation with arrays: 17.6102640629.
Time for the implementation with arrays #2: 8.53394293785.
Time for the implementation with arrays #3: 8.39233803749.
Time for the implementation with arrays #4: 7.34447622299.
Time for the implementation with numpy: 0.0301620960236.

The test for gmpy2 is left out, as I don't have it installed.
But it seems numpy beats it anyway, by quite a margin.
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1144
Joined: Thu Feb 07, 2013 3:42 pm

Re: array: efficient arrays of numeric values

Postby zeycus » Sat Oct 05, 2013 7:29 pm

Thank you both casevh and stranac for taking the trouble of preparing those thorough comparisons!

casevh wrote:The array module only support C types. It allocates a contiguous block of memory sufficient to hold the requested number of objects. A Python object must be converted into a C type before it can be stored into the array. The extra conversion is the primary cause of the slowdown. A Python list is a contiguous block of memory containing pointers to Python objects so conversions are not required.


Your explanation is crystal clear.

I don't know why, I would never have thought of using slices in the assignation, and I see performance is really improved. I had no idea.

gmpy2 and numpy's performance in stranac's test is just awesome. I tried numpy some time ago and, although of course it was very useful, reminded me of Matlab, so until now I only used it for problems where linear algebra is the key. But seeing this, I must take a second look.

Again, thank you both!
Image

Live long and prosper.
Spock
User avatar
zeycus
 
Posts: 23
Joined: Sun Feb 17, 2013 10:30 am
Location: Madrid

Re: array: efficient arrays of numeric values

Postby casevh » Sat Oct 05, 2013 7:38 pm

stranac wrote:Wow, your computer kicks mine's ass...
Anyway, I thought you were missing a numpy test, so I did a timing of my own.
Code: Select all
Time for naive implementation: 14.5630209446.
Time for naive implementation #2: 5.99617981911.
Time for the implementation with arrays: 17.6102640629.
Time for the implementation with arrays #2: 8.53394293785.
Time for the implementation with arrays #3: 8.39233803749.
Time for the implementation with arrays #4: 7.34447622299.
Time for the implementation with numpy: 0.0301620960236.

The test for gmpy2 is left out, as I don't have it installed.
But it seems numpy beats it anyway, by quite a margin.


Just for grins, can you post your numpy version? I'm curious how fast it will be on my computer.
casevh
 
Posts: 70
Joined: Sat Feb 09, 2013 7:35 am

Re: array: efficient arrays of numeric values [solved]

Postby stranac » Sat Oct 05, 2013 7:43 pm

Sure, but it's the same as the initial example with just the array creation and assignment changed:
Code: Select all
def numpy_sieve(n):
    marks = numpy.zeros(n+1, dtype=numpy.bool)
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            marks[2*d: n + 1: d] = 1
    return marks
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1144
Joined: Thu Feb 07, 2013 3:42 pm

Re: array: efficient arrays of numeric values [solved]

Postby casevh » Sat Oct 05, 2013 8:20 pm

In the example I included for gmpy2, I made version that uses is same convention. It is a little slower. When you made the numpy version, you mixed the logic of the two versions. I think the correct code for both versions is:

Code: Select all
def gmpy2_sieve(n):
    sieve_limit = gmpy2.isqrt(n) + 1
    limit = n + 1
    bitmap = gmpy2.xbit_mask(n+1)
    bitmap[4 : limit : 2] = 0
    for p in bitmap.iter_set(3, sieve_limit):
        bitmap[p*p : limit : p+p] = 0
    return bitmap

def numpy_sieve(n):
    marks = numpy.ones(n+1, dtype=numpy.bool)
    for d in range(2, 1 + int(round(math.sqrt(n)))):
        if marks[d]:
            marks[2*d: n + 1: d] = 0
    return marks


And the running times...

Code: Select all
Time for the implementation with gmpy2: 0.21328949000007924.
Time for the implementation with numpy: 0.3965407480000067.
casevh
 
Posts: 70
Joined: Sat Feb 09, 2013 7:35 am

Re: array: efficient arrays of numeric values [solved]

Postby stranac » Sat Oct 05, 2013 8:30 pm

I guess you're right, although I have no idea what either xbit_mask or .iter_set do.
Is there documentation anywhere?

Also: yay my computer beat yours at the numpy test :P
Just kiding, I realize you probably used a greater n.
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1144
Joined: Thu Feb 07, 2013 3:42 pm

Re: array: efficient arrays of numeric values [solved]

Postby casevh » Sat Oct 05, 2013 8:37 pm

Documentation is online

Here is the overview of the xmpz type. Sieving is used as an example.

https://gmpy2.readthedocs.org/en/latest ... -xmpz-type

xbit_mask(n) returns an xmpz integer with the first n bits set to 1.

iter_set(start, end) creates an iterator that returns all the bits that are set between start and end.
casevh
 
Posts: 70
Joined: Sat Feb 09, 2013 7:35 am

Re: array: efficient arrays of numeric values [solved]

Postby stranac » Sat Oct 05, 2013 8:39 pm

Interesting. Thanks for the info.
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1144
Joined: Thu Feb 07, 2013 3:42 pm


Return to General Coding Help

Who is online

Users browsing this forum: micseydel and 4 guests