Way to optimize this code ?

This is the place for queries that don't fit in any of the other categories.

Way to optimize this code ?

Postby Larry » Sat Jul 06, 2013 7:48 am

Hello,

I run into a performance problem in python3 :

I need to iterate over a list twice.

Code: Select all
mylist = ["45","89","59","47","88","554"]

for x in mylist:
    print("x {}".format(x))
    o = dna.index(x)
    for p in mylist[o:]:
        print(p)



My goal is to get the values of which the index is greater than the current value index.

The result here :
45 -> 45, 89,59,47,88,554
89 -> 89,59 ...

BUT with a large list the performances are really bad : 5 seconds for a 8000 elements list. (which turns into millions of iterated values)

I am sure I missed something, a structure or whatever to really optimize that.

Any clue ?

Thanks,
Larry
 
Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

Postby stranac » Sat Jul 06, 2013 11:45 am

I had a really nicely written reply, and then my phone decided to fuck it up...

The essence of what the post said:
  • profile, dont guess
  • my guess would be .index() and the slicing are the slow parts
  • take a look at enumerate() and itertools.islice()
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1218
Joined: Thu Feb 07, 2013 3:42 pm

Re: Way to optimize this code ?

Postby Larry » Sat Jul 06, 2013 11:52 am

:)

Here is the full code that I currently use :

Code: Select all
#THE DICTIONARY
listdict = {}

#THE TEMP LIST WHICH WILL KEEP THE TEMP VALUES


#MY LIST OF NUMBERS
#mylist = ["45","89","59","47","88","554"]
mylist = ["45","89","78"]

# -1 BECAUSE WE WILL INCREMENT BY ONE AND START AT 0. COULD HAVE SET IT AT 0 AND INCREMENT AT THE END BUT...
i = -1

#ITERATOR
for x in mylist:
   
    #SET/RESET THE LIST
    tmp = []
    i += 1
    g = -1
   
    for p in mylist:
        g += 1
       
        if g > i:

            tmp.append(p)
   
    listdict[x] = tmp

print(listdict)


I just cannot imagine how I could make this better when things become heavy (thousands of items in lists)
C++ might help here but it is quite complicated.

Any way to make this better now that you have the full overview ?

Thanks!
Larry
 
Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

Postby stranac » Sat Jul 06, 2013 12:14 pm

A few things I can say:
  • This is stupid:
    Code: Select all
    i = something
    g = -1
    for p in mylist:
        g += 1
        if g > i:
            # whatever

    Why not simply set g to i + 1 from the start?
  • This does the same thing as your entire code, and probably faster:
    Code: Select all
    listdict = {x: mylist[i:] for i, x in enumerate(mylist, 1)}
  • If you tell what your goal is(e.g. why you're creating this dict), we might suggest a better way to do it
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1218
Joined: Thu Feb 07, 2013 3:42 pm

Re: Way to optimize this code ?

Postby Larry » Sat Jul 06, 2013 12:40 pm

Of course !

I need such an implementation to eliminate some possibilities thereafter and being faster.

All along my code I will pick the dictionary index and list associated and make my comparisons. i will add an if statement to your answer to have it already shaped.
My list will be smaller.

Why is your solution so fast in comparison ?
Larry
 
Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

Postby Larry » Sat Jul 06, 2013 1:44 pm

Okay,

Now your solution is so much faster that I try to implement a filter in the loop :
Code: Select all
mylist = [ "7", "1", "8", "54", "4", "2", "8", "7", "9", "3", "9", "8", "7" ]
switch = {

             "32": "33",
             "1": "7",
             "2": "8",
             "3": "9",

}
listdict = {x: mylist[i:] for i, x in enumerate(mylist, 1) if x in switch.keys()}


The intended result :
{"1": [7,7], "2": [8,8], "3":"[9]}

Currently I only get the sorting done in the keys, which is normal. But how to iterate again in the mylist part to filter the wrong values ?

Something like :
Code: Select all
listdict = {x: mylist[i:] for y in mylist in switch.values() for i, x in enumerate(mylist, 1) if x in switch.keys()}

...but working...


I am not good at the "hyper-performant-one-liner" coding style...

Thanks !
Larry
 
Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

Postby stranac » Sat Jul 06, 2013 3:12 pm

What makes my code faster is skipping the unnecessary looping(the stuff with g) and letting python do the work for me - the dict comprehension creates the dict and handles looping, and enumerate and slicing create the lists.

As for your last post, I'd suggest not doing something like that with a oneliner.
Sure, it might be faster that way, but it would be unreadable and impossible to maintain.

I still think there's probably a better way to do what you want, but I still don't know why you're doing this. That dict makes no sense to me.
I was thinking more about why you think you need this dict(what will the program or the funcion do?), not the details of what you think you need to do with it.

And finally, if you really need speed so badly that you're willing to write bad code to make it happen, python might be the wrong choice.
You could try using pypy though, it's supposed to be quite a bit faster than cpython.
Writing a c/cython/whatever extension miight also be an option.
Friendship is magic!

R.I.P. Tracy M. You will be missed.
User avatar
stranac
 
Posts: 1218
Joined: Thu Feb 07, 2013 3:42 pm

Re: Way to optimize this code ?

Postby micseydel » Sat Jul 06, 2013 5:34 pm

You should learn about time complexity. Basically, you wrote O(n**2) code that can be O(n). It doesn't matter that it was a one-liner, stranac's version did it in O(n), and that's why it's faster. The loops such as the one he mentioned where you could just assign a value directly are very, very bad coding practice, but probably impact your performance very little (you should absolutely clean up the code though, especially if you're asking people to read it).
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
User avatar
micseydel
 
Posts: 1443
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Way to optimize this code ?

Postby Larry » Sun Jul 07, 2013 9:10 am

Hello !

You are so kind on this forum !

My final line is :
Code: Select all
def coco(mim,p):

    q = [x for x in mim if x.split(":")[0] in switch.get(p.split(":")[0]) ]
    return q

listdict = {x: coco(mylist[i:],x) for i, x in enumerate(mylist, 1) if x in switch.keys()}


It calls function coco() and checks value in switch.

Well all ends well.

BUT

if I want a generator instead of a list to increase the performance, I will get :

Code: Select all
def coco(mim,p):

    #for x in mim:
    #   
    #    if x.split(":")[0] in switch.get(p.split(":")[0]):
    #        print(x)

    q = (x for x in mim if x.split(":")[0] in switch.get(p.split(":")[0]) )
    yield q

listdict -> {a:<generator ... >, b: <generator ....>}

And I don't know how to get all the values from generators..

Any clue ?

Thanks !
Larry
 
Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

Postby Yoriz » Sun Jul 07, 2013 11:02 am

You iterate over the items in a generator.
Code: Select all
for item in generator:
    #do something with item
New Users, Read This
Join the #python-forum IRC channel on irc.freenode.net!
Image
User avatar
Yoriz
 
Posts: 1056
Joined: Fri Feb 08, 2013 1:35 am
Location: UK


Return to General Coding Help

Who is online

Users browsing this forum: Bing [Bot] and 3 guests