Way to optimize this code ?

This is the place for queries that don't fit in any of the other categories.

Way to optimize this code ?

Hello,

I run into a performance problem in python3 :

I need to iterate over a list twice.

Code: Select all
`mylist = ["45","89","59","47","88","554"]for x in mylist:    print("x {}".format(x))    o = dna.index(x)    for p in mylist[o:]:        print(p)`

My goal is to get the values of which the index is greater than the current value index.

The result here :
45 -> 45, 89,59,47,88,554
89 -> 89,59 ...

BUT with a large list the performances are really bad : 5 seconds for a 8000 elements list. (which turns into millions of iterated values)

I am sure I missed something, a structure or whatever to really optimize that.

Any clue ?

Thanks,
Larry

Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

I had a really nicely written reply, and then my phone decided to fuck it up...

The essence of what the post said:
• profile, dont guess
• my guess would be .index() and the slicing are the slow parts
• take a look at enumerate() and itertools.islice()
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1503
Joined: Thu Feb 07, 2013 3:42 pm

Re: Way to optimize this code ?

Here is the full code that I currently use :

Code: Select all
`#THE DICTIONARYlistdict = {}#THE TEMP LIST WHICH WILL KEEP THE TEMP VALUES#MY LIST OF NUMBERS #mylist = ["45","89","59","47","88","554"]mylist = ["45","89","78"]# -1 BECAUSE WE WILL INCREMENT BY ONE AND START AT 0. COULD HAVE SET IT AT 0 AND INCREMENT AT THE END BUT...i = -1#ITERATORfor x in mylist:        #SET/RESET THE LIST    tmp = []    i += 1    g = -1        for p in mylist:        g += 1                if g > i:            tmp.append(p)        listdict[x] = tmpprint(listdict)`

I just cannot imagine how I could make this better when things become heavy (thousands of items in lists)
C++ might help here but it is quite complicated.

Any way to make this better now that you have the full overview ?

Thanks!
Larry

Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

A few things I can say:
• This is stupid:
Code: Select all
`i = somethingg = -1for p in mylist:    g += 1    if g > i:        # whatever`

Why not simply set g to i + 1 from the start?
• This does the same thing as your entire code, and probably faster:
Code: Select all
`listdict = {x: mylist[i:] for i, x in enumerate(mylist, 1)}`
• If you tell what your goal is(e.g. why you're creating this dict), we might suggest a better way to do it
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1503
Joined: Thu Feb 07, 2013 3:42 pm

Re: Way to optimize this code ?

Of course !

I need such an implementation to eliminate some possibilities thereafter and being faster.

All along my code I will pick the dictionary index and list associated and make my comparisons. i will add an if statement to your answer to have it already shaped.
My list will be smaller.

Why is your solution so fast in comparison ?
Larry

Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

Okay,

Now your solution is so much faster that I try to implement a filter in the loop :
Code: Select all
`mylist = [ "7", "1", "8", "54", "4", "2", "8", "7", "9", "3", "9", "8", "7" ]switch = {             "32": "33",             "1": "7",             "2": "8",             "3": "9",}listdict = {x: mylist[i:] for i, x in enumerate(mylist, 1) if x in switch.keys()}`

The intended result :
{"1": [7,7], "2": [8,8], "3":"[9]}

Currently I only get the sorting done in the keys, which is normal. But how to iterate again in the mylist part to filter the wrong values ?

Something like :
Code: Select all
`listdict = {x: mylist[i:] for y in mylist in switch.values() for i, x in enumerate(mylist, 1) if x in switch.keys()}`

...but working...

I am not good at the "hyper-performant-one-liner" coding style...

Thanks !
Larry

Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

What makes my code faster is skipping the unnecessary looping(the stuff with g) and letting python do the work for me - the dict comprehension creates the dict and handles looping, and enumerate and slicing create the lists.

As for your last post, I'd suggest not doing something like that with a oneliner.
Sure, it might be faster that way, but it would be unreadable and impossible to maintain.

I still think there's probably a better way to do what you want, but I still don't know why you're doing this. That dict makes no sense to me.
I was thinking more about why you think you need this dict(what will the program or the funcion do?), not the details of what you think you need to do with it.

And finally, if you really need speed so badly that you're willing to write bad code to make it happen, python might be the wrong choice.
You could try using pypy though, it's supposed to be quite a bit faster than cpython.
Writing a c/cython/whatever extension miight also be an option.
Friendship is magic!

R.I.P. Tracy M. You will be missed.

stranac

Posts: 1503
Joined: Thu Feb 07, 2013 3:42 pm

Re: Way to optimize this code ?

You should learn about time complexity. Basically, you wrote O(n**2) code that can be O(n). It doesn't matter that it was a one-liner, stranac's version did it in O(n), and that's why it's faster. The loops such as the one he mentioned where you could just assign a value directly are very, very bad coding practice, but probably impact your performance very little (you should absolutely clean up the code though, especially if you're asking people to read it).
Join the #python-forum IRC channel on irc.freenode.net for off-topic chat!

Please prefer not to PM members. The point of the forum is so that anyone can benefit. We don't want to help you over PMs/emails/Skype chats that others can't benefit from

micseydel

Posts: 2118
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Way to optimize this code ?

Hello !

You are so kind on this forum !

My final line is :
Code: Select all
`def coco(mim,p):    q = [x for x in mim if x.split(":")[0] in switch.get(p.split(":")[0]) ]    return qlistdict = {x: coco(mylist[i:],x) for i, x in enumerate(mylist, 1) if x in switch.keys()}`

It calls function coco() and checks value in switch.

Well all ends well.

BUT

if I want a generator instead of a list to increase the performance, I will get :

Code: Select all
`def coco(mim,p):    #for x in mim:    #        #    if x.split(":")[0] in switch.get(p.split(":")[0]):    #        print(x)    q = (x for x in mim if x.split(":")[0] in switch.get(p.split(":")[0]) )    yield q`

listdict -> {a:<generator ... >, b: <generator ....>}

And I don't know how to get all the values from generators..

Any clue ?

Thanks !
Larry

Posts: 9
Joined: Sat Jul 06, 2013 7:33 am

Re: Way to optimize this code ?

You iterate over the items in a generator.
Code: Select all
`for item in generator:    #do something with item`
Join the #python-forum IRC channel on irc.freenode.net!

Yoriz

Posts: 1460
Joined: Fri Feb 08, 2013 1:35 am
Location: UK