## Array efficiency (solved)

This is the place for queries that don't fit in any of the other categories.

### Array efficiency (solved)

I need to handle a data structure that is like a dictionary, where both keys and values are integers.
EDIT: The keys are the first n positive integers.

I began with the obvious choice: a dict. This works alright, the problem is its size: I need to handle 100 million different keys, and it takes too much memory.

So, next I tried to use a list of 100 million values, understanding that the ith value would be the value assigned to i. I could create a 100-million integers list, the problem is that the asignments and search operations with it are too slow.

I post here the specific example. The math in it is not the issue here, what I am doing is storing in lod[i] the smallest prime that divides i, and I want this for i = 1...size. What I am worried about is not the prime generation, but the efficiency of the listOneDivisor functions. For size=10,000 (instead of 100,000,000):
* Using dictionaries, it takes 0.007 seconds (function listOneDivisor_01).
* Using lists, it takes about 6.5 seconds (function listOneDivisor_02).

So, altough I could store the info in a list, the access times are not good at all, compared with those of the hash. I know time access in Python lists is not constant-time.

I thought the solution would be the efficient arrays in
http://docs.python.org/2/library/array.html

with array.array('L') I am creating an array of long integers. I expected the access-time and modification-time to be very short (thinking, like in C, that the data would be stored in a single chunk of memory, so that the position of the n-th item is straightforward). So I coded listOneDivisor_03. I timed it and got... 8.2 seconds! .

Code: Select all
`import arrayimport timeimport mathdef primesFunUntil(n):    """Returns a boolean vector, marking primes as True, composites as False.    It is based on Erathostenes Sieve    """    marks = [True] * (n + 1)    d = 2    for d in range(2, 1 + int(round(math.sqrt(n)))):        if marks[d]:            for mult in range(2*d, n + 1, d):                marks[mult] = False    return marksdef primesUntil(n):    """Returns the list of primes in [2,...,n]."""    marks = primesFunUntil(n)    return [m for m in range(2, n+1) if marks[m]]def listOneDivisor01(n):    output = [0] * (n+1)    ps = primesUntil(n)    for p in ps:        curr = p        while curr <= n:            if curr not in output:                output[curr] = p            curr += p    return outputdef listOneDivisor02(n):    output = array.array('L')    output.extend([0] * (n+1) )    ps = primesUntil(n)    for p in ps:        curr = p        while curr <= n:            if curr not in output:                output[curr] = p            curr += p    return outputdef listOneDivisor03(n):    output = {}    ps = primesUntil(n)    for p in ps:        curr = p        while curr <= n:            if curr not in output:                output[curr] = p            curr += p    return outputif __name__ == '__main__':    bound = 1 * 10**4    t0 = time.time()    lod = listOneDivisor01(bound)    t1 = time.time()    lod = listOneDivisor02(bound)    t2 = time.time()    lod = listOneDivisor03(bound)    t3 = time.time()    print(t1-t0, t2-t1, t3-t2)`

So, two questions:

* Am I doing something wrong here? Or arrays in the array module are only efficient in space, but not in access and modification times?

* Is there a simple way to create that 100 million integer key-value data structure where search and modification are quick,
that I can hold in memory (unlike dict)?.

Thanks!
Last edited by zeycus on Sat Mar 09, 2013 7:49 pm, edited 1 time in total.

Live long and prosper.
Spock

zeycus

Posts: 23
Joined: Sun Feb 17, 2013 10:30 am

### Re: Array efficiency

I answer myself: I blundered. The test "curr in output" was ok for dictionaries,
but I should have replaced it with "output[curr]==0" for the lists. Now I get
very similar times for all the executions.

Live long and prosper.
Spock

zeycus

Posts: 23
Joined: Sun Feb 17, 2013 10:30 am

### Re: Array efficiency

Thanks for posting the followup!
Due to the reasons discussed here we will be moving to python-forum.io on October 1, 2016.

This forum will be locked down and no one will be able to post/edit/create threads, etc. here from thereafter. Please create an account at the new site to continue discussion.

micseydel

Posts: 2997
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA