search for elements without duplicates in a list

This is the place for queries that don't fit in any of the other categories.

search for elements without duplicates in a list

Postby lovecodecakes » Sun Apr 14, 2013 1:20 pm

this is a code to check for duplicates in a list
Code: Select all
def has_duplicates(alist):
    alpha=[]
    numeric=[]
    for i,j in enumerate(alist):
        if type(j)!=int and alist.count(j)>1:
            if all(j != alpha[x] for x in alpha): # Tried this to eliminate duplicating the alpha list
                alpha+=[int(i)]
    for i,j in enumerate(alist):
        if type(j)==int and alist.count(j)>1:
            if all(j != numeric[x] for x in numeric):  # Tried this to eliminate duplicating the numeric list
                numeric+=[int(i)]
    return alpha,numeric

This is not an efficient code.
Another point is given a list [1,2,3,'a','b','b','c',4,4,'A']
Code: Select all
has_duplicates([1,2,3,'a','b','b','c',4,4,'A'])
Out[77]: ([4, 5], [7, 8])


Q1. the problem is 4==5 and 7==8 in the result. I would like to know if something like findall feature exists for lists where
Q2 .How do I get just ([4],[7]) in this particular example??
Q3. How to search for only numbers in a list using some search function available for list??
lovecodecakes
 
Posts: 56
Joined: Mon Feb 11, 2013 8:19 pm

Re: search for elements without duplicates in a list

Postby Mekire » Sun Apr 14, 2013 2:43 pm

You are over complicating. Also, do you simply want to know if duplicates exist, or is it vital to know the indexes?
If you just want a True or False it is simple:
Code: Select all
def has_duplicates(alist):
    for item in alist:
        if alist.count(item) > 1:
            return True
    return False

def has_dups_hashable(alist):
    return not sorted(alist) == sorted(list(set(alist)))
The second version there will only work if all types in the list are hashable.

To get the indexes isn't much harder. Use list.count and list.index

-Mek
User avatar
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: search for elements without duplicates in a list

Postby Yoriz » Sun Apr 14, 2013 4:40 pm

Here is another way of doing it using dictionary's to count the occurrences.
Code: Select all
from collections import defaultdict

def duplicate_indexes(items):
    alpha_dict = defaultdict(list)
    numeric_dict = defaultdict(list)
    for index, item in enumerate(items):
        items_dict = alpha_dict if type(item) == str else numeric_dict
        items_dict[item].append(index)
    alpha = [item for item in alpha_dict.itervalues() if len(item) > 1]
    numeric = [item for item in numeric_dict.itervalues() if len(item) > 1]
    return alpha, numeric
New Users, Read This
Join the #python-forum IRC channel on irc.freenode.net!
Spam topic disapproval technician
Windows7, Python 2.7.4., WxPython 2.9.5.0., some Python 3.3
User avatar
Yoriz
 
Posts: 763
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

Re: search for elements without duplicates in a list

Postby Yoriz » Sun Apr 14, 2013 4:57 pm

I just noticed
lovecodecakes wrote:this is a code to check for duplicates in a list

and the title is
search for elements without duplicates in a list
New Users, Read This
Join the #python-forum IRC channel on irc.freenode.net!
Spam topic disapproval technician
Windows7, Python 2.7.4., WxPython 2.9.5.0., some Python 3.3
User avatar
Yoriz
 
Posts: 763
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

Re: search for elements without duplicates in a list

Postby Mekire » Mon Apr 15, 2013 12:11 am

Generalizing to find duplicates of any type:
Code: Select all
def get_dups(alist):
    dups_type = {}
    for item in alist:
        if alist.count(item) > 1:
            current = dups_type.setdefault(type(item),[])
            if item not in current:
                current.append(item)
    return dups_type

Usage:
Code: Select all
>>> mylist = [[1,2],[1,2],3,3,4,5,5,6,"ab","ab","dc",{"a":2},{"a":2}]
>>> dups = get_dups(mylist)
>>> dups[int]
[3, 5]
>>> dups[str]
['ab']
>>> dups[dict]
[{'a': 2}]
>>> dups[list]
[[1, 2]]

Or for the first index in the original list where the duplicate occurs:
Code: Select all
def get_dup_inds(alist):
    dups_type = {}
    dummy_list = []
    for i,item in enumerate(alist):
        if item not in dummy_list and alist.count(item) > 1:
            current = dups_type.setdefault(type(item),[])
            dummy_list.append(item)
            current.append(i)
    return dups_type

-Mek
User avatar
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: search for elements without duplicates in a list

Postby lovecodecakes » Mon Apr 15, 2013 4:20 pm

Mekire wrote:

Yoriz wrote:Here is another way of doing it using dictionary's to count the occurrences.


Okay, I feel the code could be simpler than that so I tried something out.

Code: Select all
alist=([1,2,3,'a','b','b','c',4,4,'A'])
dummy=[]

for i,j in enumerate(alist):
    if type(j)!=int and alist.count(j)>1 and j not in dummy:
        dummy+=[j]
print dummy

['b']

(Very good, my problem is solved)
BUT!
Do that Again,this way
Code: Select all
 dummy=[j for i,j in enumerate(alist) if type(j)!=int and alist.count(j)>1 and j not in dummy]

dummy
Out[24]: []

Why, both are the same aren't they?? So why the difference? I don't understand.

And Again,

Code: Select all
dummy
Out[25]: []

dummy+=[j for i,j in enumerate(alist) if type(j)!=int and alist.count(j)>1 and j not in dummy]

dummy
Out[27]: ['b', 'b']

I don't get that one either. Could someone explain please.
lovecodecakes
 
Posts: 56
Joined: Mon Feb 11, 2013 8:19 pm

Re: search for elements without duplicates in a list

Postby micseydel » Mon Apr 15, 2013 11:10 pm

Solutions using str.index() and str.count() here are O(n**2), so if the size of the list is going to be relatively large, then I recommend against those solutions.

Also, use isinstance() instead of type()==.

Instead of code like
Code: Select all
blist += [element]

do
Code: Select all
blist.append(element)


In a line like
Code: Select all
alist=([1,2,3,'a','b','b','c',4,4,'A'])
you don't need the parenthesis.

Your list comprehension doesn't work because it refers to dummy, a different list (since the one you're making with the list comp hasn't been bound to the variable on the left side yet). Further, I recommend against long, ugly, inefficient one-liners anyway.
Join the #python-forum IRC channel on irc.freenode.net!
User avatar
micseydel
 
Posts: 1179
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: search for elements without duplicates in a list

Postby Mekire » Tue Apr 16, 2013 12:08 am

Um... the entire point of mine was that it worked for all types simultaneously. You asked for strings AND ints. The way you have coded it, when you want to do the same for ints, you have to rewrite the code. Also the reason I used type was to make dictionary keys, not for type checking; as Micseydel said, you want isinstance. And yes, never use += for a list if you can help it. Use append (or when adding multiple entries use extend).

Anyway... your code returns this:
Code: Select all
['b']
if you want to check for int duplicates you have to retype the whole thing. Mine creates a dictionary of all duplicates according to type.
Code: Select all
def get_dups(alist):
    dups_type = {}
    for item in alist:
        if alist.count(item) > 1:
            current = dups_type.setdefault(type(item),[])
            if item not in current:
                current.append(item)
    return dups_type
Code: Select all
mylist=[1,2,3,'a','b','b','c',4,4,'A']
all_dups = get_dups(mylist)
print(all_dups[int])
print(all_dups[str])

>>>
[4]
['b']
If you suddenly decide you want to check for duplicates of ints or floats or lists you don't have to change a thing.
You are once again trying to be "clever" and looking to make a complicated list comprehension. You shouldn't even use list comprehensions until you have a much better grasp of the language. The point of python is to write both highly readable, and highly reusable code. At the moment you are aiming for neither.

-Mek

(Edit: Nevermind - I concur, count is horrid. Live and learn.)
Last edited by Mekire on Tue Apr 16, 2013 5:04 am, edited 2 times in total.
User avatar
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: search for elements without duplicates in a list

Postby Mekire » Tue Apr 16, 2013 4:50 am

I retract my statement saying that list.count is ok. It is horrible.

I wrote a version that didn't use count to compare it with and was fairly staggered at the disparity:
Code: Select all
from random import randint
import timeit

def get_dups(alist):
    dups_type = {}
    for item in alist:
        if alist.count(item) > 1:
            current = dups_type.setdefault(type(item),[])
            if item not in current:
                current.append(item)
    return dups_type

def dups_hashable(alist):
    adict = {}
    dups_type = {}
    for item in alist:
        adict[item] = adict.get(item,0)+1
    for item in adict:
        if adict[item] != 1:
            dups_type.setdefault(type(item),[]).append(item)
    return dups_type

if __name__ == "__main__":
    mylist = [randint(0,10000) for i in xrange(10000)]
    times = 5

    T = timeit.Timer("get_dups(mylist)","from __main__ import get_dups,mylist")
    print(T.timeit(times)/times)

    T = timeit.Timer("dups_hashable(mylist)","from __main__ import dups_hashable,mylist")
    print(T.timeit(times)/times)

    print(sorted(get_dups(mylist)[int]) == sorted(dups_hashable(mylist)[int]))

Results:
Code: Select all
2.98663032406
0.00477591056778
True

The second method that uses a second for loop runs a staggering 600 times faster. I will definitely be fanatically avoiding count in the future. The second version as it is written can now only handle hashable types (strings, ints, floats) but that is a small price to pay.

-Mek
User avatar
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: search for elements without duplicates in a list

Postby micseydel » Tue Apr 16, 2013 7:06 am

The count method isn't inherently flawed, and the fact by what you notice a difference in speed will depend on the size of the list. The bigger the list, the bigger the difference (and for a small enough list, the count solution may even be faster). Don't worry about the fact that it's by a 600x difference in your test. Just worry about the fact that in this case, for every element in the list (of which there are n), you could do end up doing n work (n**2) total.

Compare this to the dictionary solution that does a constant amount of work for each of the n elements. For small inputs, the difference will be small, or the constant factor might be even worse. We tend not to worry about such cases though, since for so small an input the difference likely will not matter anyway, and we write the scaleable solution since it will never crap out so spectacularly on us as the square solution.

Sometimes "micro-optimization" is worthwhile, but you don't ever start out that way. Priority 1 is writing clean, correct code, optimizing for time complexity when necessary, and only after empirically finding that something is not fast enough, worrying about a constant factor. Which you could go years in the field without having happen. With the paid work I've done, I don't believe I've even come across issues where time complexity was relevant. I remember there was one database querying line of code that was commented as being inefficient, and in need of change, but there weren't enough customers at the time to sort it out.
Join the #python-forum IRC channel on irc.freenode.net!
User avatar
micseydel
 
Posts: 1179
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: search for elements without duplicates in a list

Postby Mekire » Tue Apr 16, 2013 7:45 am

micseydel wrote:Solutions using str.index() and str.count() here are O(n**2)

So just to be clear, count by itself is O(n), correct? And it is the count function put inside the for loop (IE counting every item) that makes it O(n**2)?

-Mek
User avatar
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: search for elements without duplicates in a list

Postby micseydel » Tue Apr 16, 2013 5:49 pm

Yes, the count method is O(n). It's also big-Theta(n), and I should have said before that the previous one was big-­Theta too. That is to say, it's not just a scary upper bound, it's a tight bound; it will always perform so poorly, since count must traverse the entire iterable (str, list, tuple, whatever).

It's n**2 because you're doing a big-Theta(n) operation in a loop which does that operation exactly n times. To put some numbers to it, this means that a doubling in size of the iterable leads to a quadrupling of the time spent, and increasing the size x100 means a x10000 increase in time.
Join the #python-forum IRC channel on irc.freenode.net!
User avatar
micseydel
 
Posts: 1179
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: search for elements without duplicates in a list

Postby lovecodecakes » Tue Apr 16, 2013 9:24 pm

Mekire wrote:
I wrote a version that didn't use count to compare
... The second version as it is written can now only handle hashable types (strings, ints, floats) but that is a small price to pay.

-Mek

You gave me a great idea. So please do positively criticize on this code so I can improvise 'cause of you guys that I learnt so much about it.
The idea was to list out the duplicates as well as w/o duplicates a list or output plus its type would be cool too. So I took that idea from you Mek.
Code: Select all
mylist
Out[78]: [1, 2, 3, 'a', 'b', 'b', 'c', 4, 4, 'A']

def dup(alist):
    generic={}
    removed_duplicate={}
    counter=collections.Counter()
    counter.update(alist)
    counter=dict(counter)
    #print counter
    for keys,values in counter.items():
        generic.setdefault(type(keys),[]).append(keys) if values>1 else removed_duplicate.setdefault(type(keys),[]).append(keys)
    return generic,removed_duplicate


Code: Select all
dup(mylist)
Out[73]: ({int: [4], str: ['b']}, {int: [1, 2, 3], str: ['a', 'c', 'A']})


Just to check how deep in water i am:
Code: Select all
%timeit dup(mylist)
100000 loops, best of 3: 16.2 us per loop

how is it?

Code: Select all
 cProfile.run('dup(mylist)')
         38 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <ipython-input-72-52530c3cd2e9>:1(dup)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 _weakrefset.py:68(__contains__)
        1    0.000    0.000    0.000    0.000 abc.py:128(__instancecheck__)
        1    0.000    0.000    0.000    0.000 collections.py:407(__init__)
        2    0.000    0.000    0.000    0.000 collections.py:470(update)
        1    0.000    0.000    0.000    0.000 {getattr}
        1    0.000    0.000    0.000    0.000 {isinstance}
        8    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       10    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        8    0.000    0.000    0.000    0.000 {method 'setdefault' of 'dict' objects}

what else do i need to add to profiler to get to the point time consumed per process. Anyway, how is the code? I have been noticing I'm being nudged to not use one liner code 'cleverly'. Meanwhile, a someone experienced in python told me list comprehensions are better. And here as a novice it is not advised. So anyhow, what needs to be improvised ? Please let me know. Thanks.
lovecodecakes
 
Posts: 56
Joined: Mon Feb 11, 2013 8:19 pm

Re: search for elements without duplicates in a list

Postby lovecodecakes » Tue Apr 16, 2013 9:26 pm

micseydel wrote:Yes, the count method is O(n). It's also big-Theta(n), and I should have said before that the previous one was big-­Theta too. That is to say, it's not just a scary upper bound, it's a tight bound; it will always perform so poorly, since count must traverse the entire iterable (str, list, tuple, whatever).

It's n**2 because you're doing a big-Theta(n) operation in a loop which does that operation exactly n times. To put some numbers to it, this means that a doubling in size of the iterable leads to a quadrupling of the time spent, and increasing the size x100 means a x10000 increase in time.


Can you point me some online resources where I can learn more about these code optimization and this Time-complexity? And other stuff related to this, most of the prerequisites if you will?
lovecodecakes
 
Posts: 56
Joined: Mon Feb 11, 2013 8:19 pm

Re: search for elements without duplicates in a list

Postby Mekire » Wed Apr 17, 2013 12:42 am

I'm currently reading a book that came highly recommended called "Introduction to the Theory of Computation," written by Michael Sipser. It covers all of this kind of stuff from the ground up (I'm not that far through at the moment), but there isn't a free version (not legally anyway) that I know of.

-Mek
User avatar
Mekire
 
Posts: 986
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: search for elements without duplicates in a list

Postby micseydel » Wed Apr 17, 2013 6:16 am

lovecodecakes wrote:Can you point me some online resources where I can learn more about these code optimization and this Time-complexity? And other stuff related to this, most of the prerequisites if you will?

The two links I've provided are the foundation of computer science. Any source on algorithms and datastructures is good after that (shortest path, sorting; stack, queue, heap, trees, etc.)

I'm taking a class taught out of Sipser's book. I'm not really a fan. I'm more interested in pratical application of CS toward software engineering than I am discussion on theory. I wouldn't be taking the class if it weren't required for my degree.
Join the #python-forum IRC channel on irc.freenode.net!
User avatar
micseydel
 
Posts: 1179
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: search for elements without duplicates in a list

Postby setrofim » Wed Apr 17, 2013 6:37 am

Introduction to Algorithms by Cormen et al. is the classic introductory text in this area. (pdf version of the previous edition). You'll particularly want to look at chapter 3.
setrofim
 
Posts: 288
Joined: Mon Mar 04, 2013 7:52 pm


Return to General Coding Help

Who is online

Users browsing this forum: Bing [Bot] and 5 guests