Checking Lines in Files to Find Uniques


Postby slipcell » Fri Oct 18, 2013 1:12 pm

Hey,

I'm hoping someone can help me...

I have two txt files with lines of URLs in each. The first is the existing file, which holds lots of URLs that I have stored, and the second is a new list of URLs that I would like to check against the existing file.

I would like Python to check each line of the new file against the existing file and print out any new links that aren't in the existing file. Hope that makes sense?

Here is my code (I have rewritten it a couple dozen times so I may be well off the mark by now!)

Code: Select all
filename = raw_input('enter the new file name: ')
   
f = open(filename, 'r')

e = open('existing.txt', 'r')

def main():
    for line in f:
        url_file = line

    for line in e:
        url_existing = line
 
    if not url_file in url_existing:
        print url_file
main()




Any help is greatly appreciated!!
Last edited by Mekire on Fri Oct 18, 2013 2:35 pm, edited 1 time in total.
Reason: First post lock.
slipcell
 
Posts: 8
Joined: Fri Oct 18, 2013 1:03 pm

Re: Checking Lines in Files to Find Uniques

Postby Kebap » Fri Oct 18, 2013 2:27 pm

Hey slipcell, welcome to the forums!

You can of course walk through each line, and compare it to each line of the other file individually.

However, Python already has a few helpers for these tasks like comparing groups of objects, etc.

Then again, your code does not even run when I try. It seems you are missing a .read() somewhere?

I would suggest you use .readlines() instead, therefore directly obtaining a list of all lines in the file.

Then you can do something like:

Code: Select all
my_lines = document_a.readlines()
for current_line in document_b.readlines():
  if current_line not in my_lines:
    print current_line
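
One of the built-in helpers for comparing groups of objects is the set type. Here is a minimal sketch, assuming the lines of both files have already been read into lists and stripped of whitespace. The names existing_urls, new_urls and unique_urls and the sample data are made up for this example, and it uses print() so that it also runs on Python 3:

```python
# Hypothetical sample data standing in for the stripped lines of each file.
existing_urls = ['https://www.google.no/', 'http://www.sol.no/']
new_urls = ['https://www.google.no/', 'http://stackoverflow.com/']

# Set difference: everything in new_urls that is not in existing_urls.
# Sets are unordered, so sort the result to get a stable output order.
unique_urls = sorted(set(new_urls) - set(existing_urls))

for url in unique_urls:
    print(url)
```

Note that a set drops duplicates and forgets the original line order, so this only fits if neither matters to you.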


Hope it helps! :mrgreen:
Last edited by Kebap on Fri Oct 18, 2013 2:42 pm, edited 1 time in total.
Kebap
 
Posts: 397
Joined: Thu Apr 04, 2013 1:17 pm
Location: Germany, Europe

Re: Checking Lines in Files to Find Uniques

Postby slipcell » Fri Oct 18, 2013 2:35 pm

It's alive!!! :)

Thanks so much. Glad I joined this forum!!

[goes off to dance]
slipcell
 
Posts: 8
Joined: Fri Oct 18, 2013 1:03 pm

Re: Checking Lines in Files to Find Uniques

Postby snippsat » Fri Oct 18, 2013 3:20 pm

So does the code by Kebap give the correct result?
It's close, but there is a problem. Can you figure it out?

Code: Select all
'''urllist_1.txt-->
https://www.google.no/
http://www.sol.no/
http://www.itavisen.no/
'''

'''urllist_2.txt-->
https://www.google.no/
http://www.sol.no/
http://www.itavisen.no/
http://stackoverflow.com/
http://www.python-forum.org/
'''

So here it should print out:
Code: Select all
http://stackoverflow.com/
http://www.python-forum.org/
snippsat
 
Posts: 271
Joined: Thu Feb 21, 2013 12:04 am

Re: Checking Lines in Files to Find Uniques

Postby slipcell » Fri Oct 18, 2013 3:32 pm

Does the first file need to be longer than the second, because otherwise it stops short?
slipcell
 
Posts: 8
Joined: Fri Oct 18, 2013 1:03 pm

Re: Checking Lines in Files to Find Uniques

Postby snippsat » Fri Oct 18, 2013 3:54 pm

After testing, it can be OK if Enter is pressed after the last line in the txt file.
Then there is a newline (\n) on every line.
Here I pressed Enter after the last line; I did not in my first test.
Code: Select all
url_1 = open('urllist_1.txt')
url_2 = open('urllist_2.txt')

my_lines = url_1.readlines()
print my_lines  # See that \n is on all lines.
for current_line in url_2.readlines():
    if current_line not in my_lines:
        print current_line.strip()

'''Output-->
http://stackoverflow.com/
http://www.python-forum.org/'''

I was thinking of removing all newline characters, so that a missing one makes no difference.
Code: Select all
url_1 = open('urllist_1.txt')
url_2 = open('urllist_2.txt')

my_lines = [i.strip() for i in url_1]
for current_line in url_2.readlines():
    if current_line.strip() not in my_lines:
        print current_line.strip()

'''Output-->
http://stackoverflow.com/
http://www.python-forum.org/'''


Alternative:
Code: Select all
with open('urllist_1.txt') as f1,open('urllist_2.txt') as f2:
    url_1 = [i.strip() for i in f1]
    url_2 = [i.strip() for i in f2]
    diff_line = [i for i in url_2 if i not in url_1]
    for line in diff_line:
        print line

'''Output-->
http://stackoverflow.com/
http://www.python-forum.org/'''
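
For a large existing file, a list lookup with "not in" scans the whole list for every new line. A possible variation, sketched here under some assumptions, keeps the existing URLs in a set instead. The function name find_new_lines is made up for this example, io.StringIO stands in for the two opened files, and the code uses print() so it also runs on Python 3. Unlike the versions above, it also skips blank lines and reports a repeated new URL only once:

```python
import io


def find_new_lines(existing_file, new_file):
    """Return stripped lines of new_file that do not appear in existing_file."""
    # A set gives fast membership tests; strip() removes the trailing newline.
    existing = set(line.strip() for line in existing_file)
    result = []
    for line in new_file:
        url = line.strip()
        # Skip blank lines and anything already known.
        if url and url not in existing:
            result.append(url)
            existing.add(url)  # a repeated new URL is reported only once
    return result


# io.StringIO stands in for open('urllist_1.txt') and open('urllist_2.txt').
# The first "file" deliberately has no newline after its last line.
f1 = io.StringIO(u'https://www.google.no/\nhttp://www.sol.no/\n'
                 u'http://www.itavisen.no/')
f2 = io.StringIO(u'https://www.google.no/\nhttp://www.sol.no/\n'
                 u'http://www.itavisen.no/\nhttp://stackoverflow.com/\n'
                 u'http://www.python-forum.org/\n')

new_urls = find_new_lines(f1, f2)
for url in new_urls:
    print(url)
```

The order of the new file is preserved because the result is built as a list while the set is only used for lookups.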
snippsat
 
Posts: 271
Joined: Thu Feb 21, 2013 12:04 am

Re: Checking Lines in Files to Find Uniques

Postby slipcell » Fri Oct 18, 2013 4:18 pm

Seems to work fine, not sure what you mean?
slipcell
 
Posts: 8
Joined: Fri Oct 18, 2013 1:03 pm

Re: Checking Lines in Files to Find Uniques

Postby snippsat » Fri Oct 18, 2013 4:42 pm

Yes, now it's working.
But if you just copy a URL list and save it, there will be no newline (\n) on the last line.
Then it will give the wrong result, like this. So my point was to avoid this problem by removing all newlines (\n).
Code: Select all
url_1 = open('urllist_1.txt')
url_2 = open('urllist_2.txt')

my_lines = url_1.readlines()
print my_lines
for current_line in url_2.readlines():
    if current_line not in my_lines:
        print current_line.strip()

'''Output-->
['https://www.google.no/\n', 'http://www.sol.no/\n', 'http://www.itavisen.no/']
http://www.itavisen.no/
http://stackoverflow.com/
http://www.python-forum.org/'''
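
The reason for the wrong result can be shown in isolation. This is a small sketch with the lines written out by hand instead of read from the files (the variable names are made up, and print() is used so it also runs on Python 3):

```python
# Lines of urllist_1.txt when the last line has no trailing newline.
existing_lines = ['https://www.google.no/\n', 'http://www.sol.no/\n',
                  'http://www.itavisen.no/']
# The same URL as read from urllist_2.txt, where it is not the last line.
current_line = 'http://www.itavisen.no/\n'

# repr() makes the otherwise invisible newline visible.
print(repr(current_line))

# Raw comparison: the strings differ by the trailing '\n'.
raw_match = current_line in existing_lines
print(raw_match)

# Comparison after strip(): the newline no longer matters.
stripped_lines = [line.strip() for line in existing_lines]
stripped_match = current_line.strip() in stripped_lines
print(stripped_match)
```

The raw comparison is False because 'http://www.itavisen.no/' and 'http://www.itavisen.no/\n' are different strings, while the stripped comparison is True.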
snippsat
 
Posts: 271
Joined: Thu Feb 21, 2013 12:04 am

