## compare file line by line found the longest parts… in python

### compare file line by line found the longest parts… in python

How can I compare a text file line by line and find the longest repeated parts and their frequencies?

Example:

```
A B C D
A B C D
A B C E
A B C F
A B C
```

The result would be a list like this:

[['2','A B C D'],['3','A B C']]

This is what I have done so far: http://pastebin.ca/2332147. It converts the text into an array and searches for the most repeated chain in the whole text, but I need to find the most repeated chain between the lines.

Can someone help?
boy157

Posts: 2
Joined: Thu Mar 14, 2013 6:27 pm

### Re: compare file line by line found the longest parts… in py

I know some will frown on using numpy, but this is what I use most often.
You could build a dictionary using numpy. e.g.

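```python
import numpy as np

a = np.array(['a b c', 'a b c', 'a b c', 'e f g'])
counts = {}
# Count the first remaining value, then drop every occurrence of it.
while a.size:  # np.shape(a) is a tuple, so comparing it to 0 never ends the loop
    first = a[0]
    counts[str(first)] = int(np.count_nonzero(a == first))
    a = a[a != first]
```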

This will give you a dictionary like: counts = {'a b c':3, 'e f g':1}

You could also keep using lists (likely preferable for most users):

```python
# First make a unique list.
# If we have the same list as above:
a = ['a b c', 'a b c', 'a b c', 'e f g']
b = list(set(a))
counts = {}
for r in b:
    counts.update({r: a.count(r)})
```

This should give the same dictionary as using numpy.
Python: 2.7 via Anaconda
Numpy: 1.7
Pandas: 0.11
OS: Windows 7
IDE: Spyder/IPython

tnknepp

Posts: 153
Joined: Mon Mar 11, 2013 7:41 pm

### Re: compare file line by line found the longest parts… in py

Python has something in the standard library to do this for you.

```python
from collections import Counter

lines = ('A B C D', 'A B C D', 'A B C E', 'A B C F', 'A B C')
counter = Counter(lines)
# Sort the (line, count) pairs so that the longest lines come first.
print(sorted(counter.most_common(), key=lambda item: len(item[0]),
             reverse=True))
```

Output (entries of equal length may appear in a different order):

```
[('A B C D', 2), ('A B C F', 1), ('A B C E', 1), ('A B C', 1)]
```

Yoriz

Posts: 1672
Joined: Fri Feb 08, 2013 1:35 am
Location: UK

### Re: compare file line by line found the longest parts… in py

Thanks guys,

but your solutions only count lines that are identical. For example, when I have these lines:

A B C D
A B C F

the output is [(A B C D,1)(A B C F,1)]

but I need this output:

[(A B C,2)]

Can you help me?
boy157

Posts: 2
Joined: Thu Mar 14, 2013 6:27 pm

### Re: compare file line by line found the longest parts… in py

Are you always limiting yourself to the first three letters in the list, or will you eventually want to limit yourself to two, or expand to more?
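
If the length is not fixed in advance, one possible approach is to count every leading word sequence of every line, then assign each line to its longest prefix that at least two lines share. This is only a sketch under the assumption that the "parts" being compared are prefixes made of whole words; the names `word_prefixes`, `longest_shared_prefixes`, and the `min_lines` parameter are made up for illustration and do not come from the thread.

```python
from collections import Counter


def word_prefixes(line):
    """Return every leading word sequence of a line, longest first."""
    words = line.split()
    return [' '.join(words[:n]) for n in range(len(words), 0, -1)]


def longest_shared_prefixes(lines, min_lines=2):
    """Group lines by the longest word prefix that at least min_lines lines share."""
    # Count how many lines start with each prefix.
    prefix_counts = Counter()
    for line in lines:
        prefix_counts.update(word_prefixes(line))

    # Assign each line to its longest prefix that is shared often enough.
    groups = Counter()
    for line in lines:
        for prefix in word_prefixes(line):
            if prefix_counts[prefix] >= min_lines:
                groups[prefix] += 1
                break

    # Put the prefixes with the most words first.
    return sorted(groups.items(),
                  key=lambda item: len(item[0].split()),
                  reverse=True)


lines = ['A B C D', 'A B C D', 'A B C E', 'A B C F', 'A B C']
print(longest_shared_prefixes(lines))
# [('A B C D', 2), ('A B C', 3)]

print(longest_shared_prefixes(['A B C D', 'A B C F']))
# [('A B C', 2)]
```

Lines that share no prefix with any other line are left out of the result. To run this on a file, the list could be built with something like `[line.strip() for line in open(path)]`, where `path` is whatever file holds the data.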

tnknepp

Posts: 153
Joined: Mon Mar 11, 2013 7:41 pm
