Find if a string already exists in a file

Post by PythonNewbie2 on Sun Jul 11, 2010 7:17 pm

Hey guys,

You've been super helpful before and I really appreciate all of your help.

Basically, I need to check whether a line already exists in a file and, if it doesn't, add it at a specific spot.

So let's say I have this XML file named write_it.xml:
Code:
<EnclosingTag>


<Fierce name="Item1" separator="," src="myfile1.csv" />
<Fierce name="Item2" separator="," src="myfile2.csv" />
<Fierce name="Item3" separator="," src="myfile3.csv" />
<Fierce name="Item4" separator="," src="myfile4.csv" />
<Fierce name="Item5" separator="," src="myfile5.csv" />

<NotFierce Name="Item22">
</NotFierce>

</EnclosingTag>


What I want to do is check whether a line like
<Fierce name="Item1" separator="," src="myfile1.csv" /> already exists in the file and, if it doesn't, write it in. I will be running this script daily, but I only want it to add the line when it doesn't already exist (most likely only on the very first run). On every other run, if the script finds that the line(s) already exist, it should add nothing.

Any help would be appreciated.

Here's the code I wrote for writing to the XML file. I've tested it and played around until I got it working properly. I just need your help in adding the correct "if" statement to check if the string already exists before adding it to the file. THANKS!!!

Code:
#==========WRITING OUT FILES=========##
filesrc = r'C:\Reports\write_it.xml'
f = open(filesrc)
lines = f.readlines()
print lines
f.close()

'''
This is where the if statement will need to go.  Something like: if '<Fierce name="%s" separator="," src="%s" />'%('VARIABLE','mytestfile.csv') doesn't already exist in the file, set lines[2] to it.  Else, ignore it and move on with the script.
'''

lines[2] = '\n<Fierce name="%s" separator="," src="%s" />\n'%('VARIABLE','mytestfile.csv')
f = open(filesrc, 'w')
f.writelines(lines)
f.close()
print file(filesrc).read()

Re: Find if a string already exists in a file

Post by istihza on Sun Jul 11, 2010 11:56 pm

I think you can create two sets, and check for common elements.

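Code:
control_list = ['<Fierce name="Item1" separator="," src="myfile1.csv" />']

# Read the file and close it again; the handle is not named "file",
# because that would shadow the built-in.
with open("write_it.xml") as xml_file:
    a = set(line.strip() for line in xml_file)
b = set(control_list)

# Print every line that appears in both sets.
for i in a & b:
    print(i)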
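
For a single target line, a plain membership test may be enough. The following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The helper name line_in_file and the temporary demo file are made up for illustration and do not come from the posts above.

```python
import os
import tempfile


def line_in_file(path, target):
    """Return True if any line equals target, ignoring surrounding whitespace."""
    with open(path) as handle:
        return any(line.strip() == target for line in handle)


# Demo on a throwaway file so the sketch runs anywhere.
demo_path = os.path.join(tempfile.mkdtemp(), "write_it.xml")
with open(demo_path, "w") as handle:
    handle.write('<EnclosingTag>\n\n\n'
                 '<Fierce name="Item1" separator="," src="myfile1.csv" />\n'
                 '</EnclosingTag>\n')

found = line_in_file(
    demo_path, '<Fierce name="Item1" separator="," src="myfile1.csv" />')
missing = line_in_file(
    demo_path, '<Fierce name="Item9" separator="," src="myfile9.csv" />')
print(found, missing)
```

any() stops reading at the first match, so the whole file is only scanned when the line is absent.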

Re: Find if a string already exists in a file

Post by Taos on Mon Jul 12, 2010 6:44 am

Or just parse the XML?
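
Parsing could look roughly like the sketch below, which uses the standard library's xml.etree.ElementTree and Python 3 syntax. The function name ensure_fierce and the choice to match on the name and src attributes are assumptions for illustration; the sample document is a shortened version of the one in the first post.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<EnclosingTag>
<Fierce name="Item1" separator="," src="myfile1.csv" />
<Fierce name="Item2" separator="," src="myfile2.csv" />
<NotFierce Name="Item22">
</NotFierce>
</EnclosingTag>"""


def ensure_fierce(root, name, src):
    """Add a <Fierce> child unless one with the same name and src exists.

    Returns True if an element was added, False if it was already there.
    """
    for elem in root.findall("Fierce"):
        if elem.get("name") == name and elem.get("src") == src:
            return False
    new_elem = ET.Element("Fierce", {"name": name, "separator": ",", "src": src})
    new_elem.tail = "\n"
    root.insert(0, new_elem)  # place it first among the children
    return True


root = ET.fromstring(SAMPLE)
added_first = ensure_fierce(root, "VARIABLE", "mytestfile.csv")
added_again = ensure_fierce(root, "VARIABLE", "mytestfile.csv")
count = len(root.findall("Fierce"))
print(added_first, added_again, count)
```

To work on the real file, ET.parse(path) and tree.write(path) would take the place of ET.fromstring. Keep in mind that the rewritten file may differ from the original in formatting details.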

Re: Find if a string already exists in a file

Post by PythonNewbie2 on Mon Jul 12, 2010 7:46 am

Thank you very much istihza! That's exactly what I needed to get started!

I have a few more questions.
1. If I wanted to delete the matching line from the file, how could I do that?
2. Can you look over my code to see if it can be improved?

Here's the code that I wrote using your help and I've tested it. It works the way I want it to, but I'm wondering if it is inefficient...

Code:
##==========WRITING OUT FILES=========##

variable1 = 'Item2'
variable2 = 'mytestfile.csv'
line_to_check = '<Fierce name="%s" separator="," src="%s" />'%(variable1,variable2)

filesrc = r'C:\Reports\write_it.xml'
XMLfile = open(filesrc)
lines = XMLfile.readlines()
print lines
XMLfile.close()

##  \n adds a new line
line_to_write = '\n%s\n'%(line_to_check)

##===Checking if line already exists=====##
XMLfile = open(filesrc)
control_list = [line_to_check]

a = set([line.strip() for line in XMLfile])
b = set(control_list)

same_line = ''
for matching_elements in a & b:
   same_line = matching_elements
   print same_line
   
XMLfile.close()

XMLfile = open(filesrc, 'w')
if same_line == line_to_check:
    print '\nYES! LINE MATCHES! DONT REWRITE! \n'
else:
    print '\nNope, that line is not here. Lets write it in.  \n'
    lines[2] = line_to_write
XMLfile.writelines(lines)
XMLfile.close()

print file(filesrc).read()

Re: Find if a string already exists in a file

Post by istihza on Mon Jul 12, 2010 1:10 pm

To check whether the line already exists in the file, just look at the truth value of the intersection set. An empty intersection evaluates to False, which means the line is not in the file; a non-empty intersection evaluates to True, which means the line is already there. This will save you a few lines of code.

To manipulate the file, first read its contents into a list. Then make the necessary changes to this list, and write the final list back to the file:

Code:
variable1 = 'Item132'
variable2 = 'mytestfile.csv'
control_list = ['<Fierce name="%s" separator="," src="%s" />'
                % (variable1, variable2)]

filesrc = 'write_it.xml'

# Read all lines, then close the file before reopening it for writing.
with open(filesrc) as XMLfile:
    lines = XMLfile.readlines()

a = set(line.strip() for line in lines)
b = set(control_list)

# If the intersection of a and b is not empty,
# the line is already in the file...
if a & b:
    print('\nYES! LINE MATCHES! DONT REWRITE! \n')
else:
    print('\nNope, that line is not here. Lets write it in.  \n')
    # Insert the line at index 2 of the 'lines' list.
    lines.insert(2, "\n" + control_list[0] + "\n")

    # Write the 'lines' list back to the same file.
    with open(filesrc, "w") as XMLfile:
        XMLfile.writelines(lines)
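
Since the script is meant to run daily, the same read, check, insert and write-back steps can be wrapped in a function that reports whether it changed anything. This is only a sketch in Python 3 syntax; the name ensure_line, the index parameter and the temporary demo file are assumptions for illustration.

```python
import os
import tempfile


def ensure_line(path, target, index=2):
    """Insert target at the given list index unless the file already has it.

    Returns True if the file was rewritten, False if nothing changed.
    """
    with open(path) as handle:
        lines = handle.readlines()
    if target in set(line.strip() for line in lines):
        return False
    lines.insert(index, target + "\n")
    with open(path, "w") as handle:
        handle.writelines(lines)
    return True


# Demo: the first call adds the line, the second call leaves the file alone.
demo_path = os.path.join(tempfile.mkdtemp(), "write_it.xml")
with open(demo_path, "w") as handle:
    handle.write("<EnclosingTag>\n\n\n</EnclosingTag>\n")

target = '<Fierce name="Item132" separator="," src="mytestfile.csv" />'
first_run = ensure_line(demo_path, target)
second_run = ensure_line(demo_path, target)
with open(demo_path) as handle:
    occurrences = handle.read().count(target)
print(first_run, second_run, occurrences)
```

Because the file is only opened for writing when the line is missing, repeated runs leave it untouched.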


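To remove the matched line, filter it out of the list and then write the list back to the file as before. You can use something like this:

Code:
# Keep every line except the matching one, ignoring surrounding whitespace.
lines = [line for line in lines if line.strip() != control_list[0]]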
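
Put together with the read and write steps, the removal could be sketched as follows in Python 3 syntax. The function name remove_line and the temporary demo file are hypothetical, chosen only for this example.

```python
import os
import tempfile


def remove_line(path, target):
    """Drop every line equal to target, ignoring surrounding whitespace.

    Returns the number of lines removed; the file is rewritten only if needed.
    """
    with open(path) as handle:
        lines = handle.readlines()
    kept = [line for line in lines if line.strip() != target]
    removed = len(lines) - len(kept)
    if removed:
        with open(path, "w") as handle:
            handle.writelines(kept)
    return removed


# Demo on a throwaway file.
target = '<Fierce name="Item1" separator="," src="myfile1.csv" />'
demo_path = os.path.join(tempfile.mkdtemp(), "write_it.xml")
with open(demo_path, "w") as handle:
    handle.write("<EnclosingTag>\n" + target + "\n</EnclosingTag>\n")

removed_first = remove_line(demo_path, target)
removed_again = remove_line(demo_path, target)
with open(demo_path) as handle:
    remaining = handle.read()
print(removed_first, removed_again)
```

Unlike list.remove(), the filter does not raise an error when the line is absent, and it also catches lines that carry extra leading or trailing whitespace.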