item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:15 pm

I know there is a good way to sort items by frequency in Python, but how can I keep only the items that have a frequency > 1? Could anyone help me? I would like the result in this format: [('banana', 3), ('apple', 2)]. Thank you in advance!

from collections import Counter
words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())
print(freqs)
Result:
Counter({'banana': 3, 'apple': 2, 'strawberry': 1, 'lemon': 1})


I know that freqs.most_common(2) returns [('banana', 3), ('apple', 2)], but since I don't know in advance how many elements have a frequency > 1, that is not what I want.
Last edited by ericrystal on Fri Jun 14, 2013 1:25 pm, edited 1 time in total.
ericrystal
 
Posts: 18
Joined: Thu Apr 11, 2013 8:56 am

Re: item frequency count in python

Postby Mekire » Fri Jun 14, 2013 1:25 pm

Counter is a subclass of dict, so you can iterate over it just like a dict:
from collections import Counter
words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())

a = {item:count for (item,count) in freqs.items() if count>1}
print(a)
Result:
{'apple': 2, 'banana': 3}

-Mek

Edit: Sorry, I saw your format request.
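Since the requested format is a list of (word, count) tuples with the largest count first, one way to get there from the dict above is to sort its items by count. A minimal sketch, assuming Python 3; the name `pairs` is only illustrative:

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())

# Keep only the words seen more than once (same comprehension as above)
a = {item: count for item, count in freqs.items() if count > 1}

# Sort the (item, count) pairs by count, highest first, giving a list of tuples
pairs = sorted(a.items(), key=lambda kv: kv[1], reverse=True)
print(pairs)  # [('banana', 3), ('apple', 2)]
```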
Mekire
 
Posts: 1120
Joined: Thu Feb 07, 2013 11:33 pm
Location: Asakusa, Japan

Re: item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:34 pm

Mekire wrote: You seem to be able to iterate over Counter objects just like dicts so: [...]


Thanks a lot! But I need 'banana': 3 to come before 'apple': 2. How can I do this?

Re: item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:42 pm

Mekire wrote: You seem to be able to iterate over Counter objects just like dicts so: [...]


I think that when we use freqs.items(), the order of freqs changes, so the result is {'apple': 2, 'banana': 3} instead of {'banana': 3, 'apple': 2}.
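For what it's worth, items() does not reorder anything: since Python 3.7, dicts (and Counter, a dict subclass) preserve insertion order, so items() yields each word in the order it was first seen, and the comprehension keeps that order. It is the printed Counter that is shown by count. A quick check, assuming Python 3.7 or later:

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())

# items() follows insertion order (first occurrence of each word), not count order
first_seen = list(freqs.items())
print(first_seen)  # [('apple', 2), ('banana', 3), ('strawberry', 1), ('lemon', 1)]

# The printed Counter, by contrast, lists entries from most to least common
print(freqs)  # Counter({'banana': 3, 'apple': 2, 'strawberry': 1, 'lemon': 1})
```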

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 1:42 pm

How about
# Python 3: items() replaces iteritems(), and a lambda can no longer unpack a tuple parameter
big_enough = ((word, freq) for word, freq in freqs.items() if freq > 1)
sorted_words = sorted(big_enough, key=lambda pair: pair[1], reverse=True)

?

EDIT: I edited this repeatedly because my earlier versions were too hacky. This one is a bit verbose, but I would argue it is the most readable. Some of the unpacking isn't strictly necessary, but that's what I ended up with.
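The same generator-plus-sorted approach can be packaged as a small reusable function. This is a minimal sketch; the function name `frequent_words` and its `min_count` parameter are hypothetical, and `operator.itemgetter(1)` stands in for the lambda as the sort key:

```python
from collections import Counter
from operator import itemgetter


def frequent_words(text, min_count=2):
    """Return (word, count) pairs with count >= min_count, highest count first."""
    freqs = Counter(text.split())
    # Generator of pairs whose count is big enough (mirrors big_enough above)
    big_enough = ((word, freq) for word, freq in freqs.items() if freq >= min_count)
    # itemgetter(1) picks the count out of each (word, count) pair
    return sorted(big_enough, key=itemgetter(1), reverse=True)


print(frequent_words("apple banana apple strawberry banana lemon banana"))
# [('banana', 3), ('apple', 2)]
```

With the default `min_count=2` this matches the "frequency > 1" condition from the question.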
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
micseydel
 
Posts: 1488
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: item frequency count in python

Postby ericrystal » Fri Jun 14, 2013 1:47 pm

micseydel wrote: How about
sorted((pair for pair in freqs.items() if pair[1] > 1), key=lambda pair: -pair[1])

?


It works, thanks a lot!
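The quoted one-liner sorts with `key=lambda pair: -pair[1]`, while micseydel's edited version uses `reverse=True`. For numeric counts the two give the same order, even for ties, because Python's sort is stable and `reverse=True` is documented to preserve that stability. A small check with made-up pairs (the tie between 'b' and 'd' is deliberate):

```python
# Hypothetical (word, count) pairs with ties in the counts
pairs = [("a", 1), ("b", 2), ("c", 1), ("d", 2)]

# Negating the key sorts descending by count
by_negated_key = sorted(pairs, key=lambda pair: -pair[1])

# reverse=True also sorts descending, keeping tied items in their original order
by_reverse = sorted(pairs, key=lambda pair: pair[1], reverse=True)

print(by_negated_key)  # [('b', 2), ('d', 2), ('a', 1), ('c', 1)]
print(by_negated_key == by_reverse)  # True
```

Negation only works for numeric keys, whereas `reverse=True` works for any comparable key, which is one reason to prefer it.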

Re: item frequency count in python

Postby Mekire » Fri Jun 14, 2013 1:51 pm

Yeah, I gave it back to you as a dictionary. Dictionaries are by definition unordered.

You could, in addition to Mic's suggestion, just leave it as a Counter object.
from collections import Counter
words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())

a = Counter({item:count for (item,count) in freqs.items() if count>1})
print(a)
Result:
Counter({'banana': 3, 'apple': 2})

-Mek

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 1:53 pm

Mekire wrote: Yeah, I gave it back to you as a dictionary. Dictionaries are by definition unordered.

In his post, he asks for a list of tuples. Perhaps he added that in an edit after you first read it, though.

Re: item frequency count in python

Postby Mekire » Fri Jun 14, 2013 2:10 pm

I'm not sure, actually. I thought he asked for a dict the first time I read it, but I may have just misread. Also, reading the description of Counter, it sounds like the frequency ordering might be a fluke: the description says it is every bit as unordered as a standard dict, so my last suggestion wouldn't have helped if that is indeed the case.

-Mek
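The count ordering seen when printing a Counter is not a fluke of hashing: in CPython, `Counter.__repr__` builds its output from `most_common()`, so the printed form always lists entries from most to least common. Iterating over the Counter, however, follows ordinary dict order (insertion order since Python 3.7), so to actually get the pairs by count you still need `most_common()`. A small illustration, assuming Python 3.7 or later:

```python
from collections import Counter

words = "apple banana apple strawberry banana lemon banana"
freqs = Counter(words.split())
a = Counter({item: count for item, count in freqs.items() if count > 1})

# The printed form is ordered by count...
print(a)  # Counter({'banana': 3, 'apple': 2})
# ...but iteration follows insertion order ('apple' was seen first)
print(list(a))  # ['apple', 'banana']
# most_common() is the reliable way to get (item, count) pairs by count
print(a.most_common())  # [('banana', 3), ('apple', 2)]
```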

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 2:14 pm

It uses hashing, which is unordered in the sense that the order might look meaningful, but you cannot rely on it. Here's a fun read about the generic data structure Python uses internally to implement dictionaries. It's very good basic knowledge: second-year material at my university, typically covered after 2-3 quarters of other programming and an introduction to OOP.
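To give a feel for why a hash-based container has no meaningful order of its own, here is a toy hash table using separate chaining. It is only a sketch of the general idea: CPython's real dict uses open addressing plus, since 3.6, a separate insertion-ordered entry array, so the class `ToyHashTable` and its methods are purely hypothetical. Iterating over it walks the buckets, so the order depends on each key's hash, not on when the key was inserted:

```python
class ToyHashTable:
    """Minimal separate-chaining hash table (illustration only)."""

    def __init__(self, num_buckets=8):
        # Each bucket is a list of [key, value] pairs whose hashes land in it
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() maps the key to an int; modulo picks a bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # overwrite an existing key
                return
        bucket.append([key, value])

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def __iter__(self):
        # Keys come out in bucket order, which is determined by their hashes
        for bucket in self.buckets:
            for k, _ in bucket:
                yield k


table = ToyHashTable()
for word in "apple banana apple strawberry banana lemon banana".split():
    table.put(word, table.get(word, 0) + 1)

print(table.get("banana"))  # 3
print(list(table))  # order depends on string hashing and may differ between runs

ints = ToyHashTable()
ints.put(5, "five")
ints.put(2, "two")
print(list(ints))  # [2, 5]: bucket order, not insertion order
```

String hashes are randomized per process by default, which is exactly why the word order above cannot be relied on.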

Re: item frequency count in python

Postby ochichinyezaboombwa » Fri Jun 14, 2013 4:50 pm

There is a trivial solution (and it is in fact in the OP):
>>> from collections import Counter
>>> words = "apple banana apple strawberry banana lemon banana"
>>> freqs = Counter(words.split())
>>> [(x,y) for x,y in freqs.most_common() if y > 1]
[('banana', 3), ('apple', 2)]

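Simply notice that most_common() already returns the pairs in the desired order (highest count first), and that calling it with no argument returns all of them, so you don't need to know in advance how many have a count > 1.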
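Because `most_common()` with no argument returns every pair already sorted by count, the filter can also stop at the first count that is too small instead of checking the rest. A sketch using `itertools.takewhile`; the helper name `repeated` and its `min_count` parameter are hypothetical:

```python
from collections import Counter
from itertools import takewhile


def repeated(counter, min_count=2):
    """Return (item, count) pairs with count >= min_count, most common first."""
    # most_common() is sorted by count, so every pair after the first one
    # below min_count is also below it and can be skipped
    return list(takewhile(lambda pair: pair[1] >= min_count, counter.most_common()))


freqs = Counter("apple banana apple strawberry banana lemon banana".split())
print(repeated(freqs))  # [('banana', 3), ('apple', 2)]
```

The sort inside `most_common()` still touches every item; `takewhile` only saves the filtering pass, so this is mostly a matter of taste compared with the list comprehension.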
ochichinyezaboombwa
 
Posts: 200
Joined: Tue Jun 04, 2013 7:53 pm

Re: item frequency count in python

Postby micseydel » Fri Jun 14, 2013 4:53 pm

ochichinyezaboombwa wrote: Simply notice that most_common() returns things in the desired order.

Oh my gosh DUH. Thanks for the catch ochichinyezaboombwa.

