item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:15 pm

I know there is a good way to sort items by frequency in Python, but my question is how to keep only the items that have a frequency > 1. Could anyone help me? I would prefer the result in this format: [('banana', 3), ('apple', 2)]. Thank you in advance!

Code: Select all
from collections import Counter
words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())
print(freqs)
Counter({'banana': 3, 'apple': 2, 'strawberry': 1, 'lemon': 1})


I know that "freqs.most_common(2)" gives [('banana', 3), ('apple', 2)], but since I don't know in advance how many elements have a frequency > 1, that is not what I want.
Last edited by ericrystal on Fri Jun 14, 2013 1:25 pm, edited 1 time in total.
ericrystal
 
Posts: 18
Joined: Thu Apr 11, 2013 8:56 am

Re: item frequency count in python

Postby Mekire » Fri Jun 14, 2013 1:25 pm

You seem to be able to iterate over Counter objects just like dicts so:
Code: Select all
from collections import Counter
words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())

a = {item:count for (item,count) in freqs.items() if count>1}
print(a)
Result:
Code: Select all
{'apple': 2, 'banana': 3}

-Mek

Edit: Sorry, I saw your format request.
Mekire
 
Posts: 980
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

Re: item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:34 pm

Mekire wrote:You seem to be able to iterate over Counter objects just like dicts so:

Thanks a lot! But I need 'banana': 3 to come before 'apple': 2. How can I do this?

Re: item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:42 pm

Mekire wrote:You seem to be able to iterate over Counter objects just like dicts so:

I think that when we use freqs.items(), the order of freqs changes, so the result is {'apple': 2, 'banana': 3} instead of {'banana': 3, 'apple': 2}.

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 1:42 pm

How about
Code: Select all
# Python 3 form: items() replaces iteritems(), and a lambda can no longer unpack a tuple argument.
big_enough = ((word, freq) for word, freq in freqs.items() if freq > 1)
sorted_words = sorted(big_enough, key=lambda pair: pair[1], reverse=True)

?

EDIT: This was repeatedly edited because I kept having things too hacky. This is a bit verbose but I would argue the most readable. Some unpacking wasn't entirely necessary, but that's what I ended up with.
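
For anyone who wants to run the filter-then-sort idea end to end, here is a minimal self-contained sketch. It assumes Python 3 (so `items()` rather than `iteritems()`); the sample string comes from the original post, and the variable names simply follow the snippet above.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())

# Keep only the (word, count) pairs whose count exceeds 1.
big_enough = [(word, freq) for word, freq in freqs.items() if freq > 1]

# Sort by count, highest first. sorted() is stable, so ties keep their input order.
sorted_words = sorted(big_enough, key=lambda pair: pair[1], reverse=True)
print(sorted_words)  # [('banana', 3), ('apple', 2)]
```

The result is a plain list of tuples, which is the format asked for in the first post.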
Join the #python-forum IRC channel on irc.freenode.net!
micseydel
 
Posts: 1116
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:47 pm

micseydel wrote:How about
Code: Select all
sorted((pair for pair in freqs.items() if pair[1] > 1), key=lambda pair: -pair[1])

?


It works, thanks a lot!

Re: item frequency count in python

Postby Mekire » Fri Jun 14, 2013 1:51 pm

Yeah, I gave it back to you as a dictionary. Dictionaries make no promise about sorted order (and before Python 3.7 they did not even guarantee insertion order).

You could, in addition to Mic's suggestion, just leave it as a Counter object.
Code: Select all
from collections import Counter
words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())

a = Counter({item:count for (item,count) in freqs.items() if count>1})
print(a)
Result:
Code: Select all
Counter({'banana': 3, 'apple': 2})

-Mek

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 1:53 pm

Mekire wrote:Yeah, I gave it back to you as a dictionary. Dictionaries are by definition unordered.

In his post, he requests a list of tuples. Perhaps he edited that in after you first read it, though.

Re: item frequency count in python

Postby Mekire » Fri Jun 14, 2013 2:10 pm

I'm not sure, actually. I thought he requested a dict my first time through, but I may have just misread. Also, reading the description of Counter, it sounds like the frequency ordering might be a fluke. The description states that it is every bit as unordered as a standard dict, so my last suggestion wouldn't have helped if this is indeed the case.

-Mek
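
A small sketch of the distinction, for what it's worth. As far as I can tell, the printed form of a Counter is built from most_common(), which would explain why the output above looked sorted, while plain iteration just follows the underlying dict. The name `filtered` is only for illustration.

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())
filtered = Counter({item: count for item, count in freqs.items() if count > 1})

# Plain iteration follows the underlying dict, not the counts,
# so do not expect this to come out sorted by frequency.
print(list(filtered.items()))

# most_common() sorts by count explicitly, so this order can be relied on.
print(filtered.most_common())
```

So keeping the Counter is fine, as long as the ordered result is read through most_common() rather than by iterating.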

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 2:14 pm

It uses hashing, which is unordered in the sense that the results may look ordered, but you cannot rely on that. Here's a fun read about the generic data structure used internally in Python to implement dictionaries. Very good basic knowledge. It's second-year material at my university, typically after 2-3 quarters of other programming but after an introduction to OOP.

Re: item frequency count in python

Postby ochichinyezaboombwa » Fri Jun 14, 2013 4:50 pm

There is a trivial solution (and it is in fact in the OP):
Code: Select all
>>> from collections import Counter
>>> words = "apple banana apple strawberry banana lemon banana"
>>> freqs = Counter(words.split())
>>> [(x, y) for x, y in freqs.most_common() if y > 1]
[('banana', 3), ('apple', 2)]

Simply notice that most_common() returns things in the desired order.
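
The same idea wrapped in a reusable function, as a rough sketch. The helper name `repeated_items` and its `min_count` parameter are made up for this example. Because most_common() with no argument returns every item sorted by count, the scan can stop at the first pair below the threshold.

```python
from collections import Counter
from itertools import takewhile


def repeated_items(iterable, min_count=2):
    """Return (item, count) pairs with count >= min_count, most frequent first."""
    # With no argument, most_common() returns all items sorted by count, descending.
    ranked = Counter(iterable).most_common()
    # The list is sorted, so stop at the first pair that falls below the threshold.
    return list(takewhile(lambda pair: pair[1] >= min_count, ranked))


words = "apple banana apple strawberry banana lemon banana"
print(repeated_items(words.split()))  # [('banana', 3), ('apple', 2)]
```

Passing a different `min_count` changes the cutoff without needing to know in advance how many items will qualify.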
ochichinyezaboombwa
 
Posts: 200
Joined: Tue Jun 04, 2013 7:53 pm

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 4:53 pm

ochichinyezaboombwa wrote:Simply notice that most_common() returns things in the desired order.

Oh my gosh DUH. Thanks for the catch ochichinyezaboombwa.
