Forcing resumption of execution

Post by comcomtech » Wed Aug 28, 2013 2:16 pm

My son wrote the following script to help me verify the uniformity of "randomly generated" directory output.

The directory stops responding after about 100 queries and only starts answering again an hour or two later.

I am not a programmer. I need to modify the script to:
1. Force execution to resume (after an hour, say) if the target site fails to respond.
2. Write the results to a file (file.txt) if the site still hasn't responded after an hour, so that I keep the information already collected in memory.
3. Let a later run of the script add to the data collected by the first run (i.e., update the existing file.txt rather than create a new one).
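One way to handle point 1 is to give `urlopen` a timeout and retry failed requests for up to an hour. This is a minimal sketch, assuming that a failed or timed-out request shows up as an `OSError` (which covers `urllib.error.URLError`, `HTTPError` and socket timeouts). `fetch_page`, `fetch_with_retry` and `SiteUnavailable` are hypothetical names, and the 60 s timeout and 5-minute retry interval are arbitrary choices, not values the directory is known to need.

```python
import time
import urllib.request


class SiteUnavailable(Exception):
    """Hypothetical exception: the site stayed silent for the whole waiting period."""


def fetch_page(req, timeout=60):
    """Hypothetical helper: one request with a socket timeout, so a silent
    server raises an error after `timeout` seconds instead of hanging."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_with_retry(fetch, retry_wait=300, max_wait=3600, sleep=time.sleep):
    """Call fetch() until it succeeds. After each failure, sleep retry_wait
    seconds; once max_wait seconds of waiting have passed, give up."""
    waited = 0
    while True:
        try:
            return fetch()
        except OSError as err:  # URLError, HTTPError and timeouts are all OSErrors
            if waited >= max_wait:
                raise SiteUnavailable(f"no response after waiting {waited} s") from err
            sleep(retry_wait)
            waited += retry_wait
```

In the script, `html = fetch_with_retry(lambda: fetch_page(req))` could take the place of the `response = ...` and `html = ...` lines, with `.decode('utf-8')` dropped from the `findall` line because `fetch_page` already decodes. If `SiteUnavailable` is raised the script still stops, so the collected names have to be written out separately.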

Alternatively, I need to modify the script to:
1. Pause for one hour after every 50 searches, then resume searching.
2. Run about 1,500 searches in total.
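For the alternative plan, 1,500 searches in batches of 50 is 30 batches with 29 one-hour pauses between them, so the pauses alone add up to 29 hours. A minimal sketch follows; `run_in_batches` and its `take_sample`/`on_pause` callbacks are hypothetical names, and whether 50 requests per hour keeps the directory from blocking is an assumption that only a trial run can confirm.

```python
import time


def run_in_batches(take_sample, total=1500, batch_size=50, pause=3600,
                   on_pause=None, sleep=time.sleep):
    """Call take_sample() `total` times. After every `batch_size` calls
    (except after the last one) call on_pause(done), if given, and then
    sleep `pause` seconds. Returns the number of pauses taken."""
    pauses = 0
    for done in range(1, total + 1):
        take_sample()
        if done % batch_size == 0 and done < total:
            if on_pause is not None:
                on_pause(done)   # e.g. write the counts collected so far
            sleep(pause)
            pauses += 1
    return pauses


if __name__ == "__main__":
    # Dry run with no network access and no real waiting.
    n = run_in_batches(lambda: None, sleep=lambda seconds: None)
    print(n, "pauses")   # 30 batches of 50 -> 29 pauses
```

The body of the existing `for i in range(int(N))` loop would go into `take_sample`, and `on_pause` is a natural place to write the file, so that stopping the script during a pause loses nothing.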

Thanks very much in advance for your help!

Josh Wallace

P.S. My son just started his doctorate in math; he's too busy now to fiddle with this anymore.

Code:
import re
import time
import urllib.parse
import urllib.request

# post data
url = 'http://ottiaq.org/en/directory/results/'
values = { 'langue_depart' : 'EN', 'langue_arrivee' : 'FR', 'cat' : 'ALL', 'profession[0]' : 'on', 'profession[1]' : 'off', 'profession[2]' : 'off', 'profession[3]' : 'off' }
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')

# open existing file (maybe remove this since multiple executions don't really work)
# f = open('test.txt', 'r+')
# tempfile = f.read().rsplit('\n')
# file = [[tempfile[i], tempfile[i+1]] for i in range(0, len(tempfile)-2, 2)]

filename = input('Log file: ')
f = open(filename, 'w+')
N = input('Number of samples: ')
file = []

# patterns
p = re.compile(r'"nom"><h4>\s*[^\s]*\s*,\s*[^\s]*')
q1 = re.compile(r'"nom"><h4>\s*')
q2 = re.compile(r'\s*,\s*')

# main loop: one request (sample) per iteration
for i in range(int(N)):
   # retrieve data and extract relevant section
   
   # for false page, use:
   # html = open('test.html','r').read()
   # m = p.findall(html)
   # time.sleep(0.5)
   
   # for real page, use:
   req = urllib.request.Request(url, data)
   response = urllib.request.urlopen(req)
   html = response.read()
   m = p.findall(html.decode('utf-8'))
   time.sleep(0.5)

   # format names
   for index, item in enumerate(m):
      item = q1.sub('', item)
      item = q2.sub(', ', item)
      m[index] = item
   
   # update count

Last edited by micseydel on Wed Aug 28, 2013 4:47 pm, edited 1 time in total.
Reason: Locked OP, added code tags.
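The name extraction works in three steps: `p` finds each `"nom"><h4>` marker followed by a `Last, First` pair, `q1` strips the marker, and `q2` normalizes the spacing around the comma. The sketch below runs the script's patterns (as raw strings, which avoids the invalid-escape warnings newer Python versions give for `'\s'`) on a made-up fragment; the directory's real markup is assumed, not checked. Because `[^\s]*` is greedy and stops only at whitespace, a first name followed directly by a tag keeps the tag, a multi-word first name is cut to its first word, and a surname containing a space is not matched at all.

```python
import re

# The script's three patterns, written as raw strings so that Python
# does not warn about the invalid string escape '\s'.
p = re.compile(r'"nom"><h4>\s*[^\s]*\s*,\s*[^\s]*')
q1 = re.compile(r'"nom"><h4>\s*')
q2 = re.compile(r'\s*,\s*')


def extract_names(html):
    """Return the 'Last, First' strings that follow each '"nom"><h4>' marker."""
    names = []
    for raw in p.findall(html):
        name = q1.sub('', raw)      # drop the marker and leading whitespace
        name = q2.sub(', ', name)   # normalize the spacing around the comma
        names.append(name)
    return names


# Hypothetical fragment: the directory's real markup is assumed, not checked.
sample = ('<div class="nom"><h4>\n  Doe ,Jane\n</h4></div>\n'
          '<div class="nom"><h4>\n  Roe, Richard\n</h4></div>\n')
print(extract_names(sample))   # ['Doe, Jane', 'Roe, Richard']
```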
comcomtech
 
Posts: 3
Joined: Wed Aug 28, 2013 2:07 pm

Re: Forcing resumption of execution

Post by micseydel » Thu Aug 29, 2013 1:50 am

comcomtech wrote: I am not a programmer. I need to modify the script to:

What is it that you want from us exactly? Are you looking to hire someone to accomplish this for you? Are you looking for help to become a programmer so that you can make these modifications? Or are you looking for someone to do it for you?
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
micseydel
 
Posts: 1256
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Re: Forcing resumption of execution

Post by comcomtech » Thu Aug 29, 2013 5:02 am

I'm looking for someone to do it for me, assuming it's fairly easy to do.

I already made some modifications to the original code (the delay between requests, the print to screen), which let me look for a workaround and, ultimately, diagnose the problem.

I think there are a couple of lines of code I can add that would provide the desired result. If one of you could tell me what they are, I can insert them myself.
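For point 2, the couple of lines in question could be a `try:`/`finally:` pair: with the sampling loop inside `try:` and the file-writing code inside `finally:`, the names gathered so far are written out whether the loop finishes, crashes because the site stopped answering, or is stopped with Ctrl+C. A minimal sketch; `collect`, `take_sample` and `save` are hypothetical names standing in for the script's loop body and its file-writing lines.

```python
def collect(n, take_sample, save):
    """Tally the names returned by n calls to take_sample().

    The finally clause runs save() with whatever has been tallied so far,
    whether the loop completes or an exception (a dropped connection, a
    timeout, Ctrl+C) ends it early; the exception then still propagates."""
    tally = {}
    try:
        for _ in range(n):
            for name in take_sample():
                tally[name] = tally.get(name, 0) + 1
    finally:
        save(tally)
    return tally


if __name__ == "__main__":
    pages = iter([["Doe, Jane"], ["Doe, Jane", "Roe, Richard"]])

    def take_sample():
        # Fake "site": two pages, then it stops answering.
        page = next(pages, None)
        if page is None:
            raise TimeoutError("site stopped responding")
        return page

    try:
        collect(5, take_sample, save=print)   # prints the partial tally
    except TimeoutError as err:
        print("stopped early:", err)
```

In the script, that amounts to a `try:` line above `for i in range(int(N)):` (with the loop indented one level) and a `finally:` line above the file-writing lines from `f.truncate(0)` to `f.close()` (indented likewise).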

Re: Forcing resumption of execution

Post by micseydel » Thu Aug 29, 2013 6:24 am

I should have said this before, but it doesn't look like you included the whole code. I noticed that when I added the code tags.

Re: Forcing resumption of execution

Post by comcomtech » Thu Aug 29, 2013 10:03 am

You're right. Sorry.

Code:
import re
import time
import urllib.parse
import urllib.request

# post data
url = 'http://ottiaq.org/en/directory/results/'
values = { 'langue_depart' : 'EN', 'langue_arrivee' : 'FR', 'cat' : 'ALL', 'profession[0]' : 'on', 'profession[1]' : 'off', 'profession[2]' : 'off', 'profession[3]' : 'off' }
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')

# open existing file (maybe remove this since multiple executions don't really work)
# f = open('test.txt', 'r+')
# tempfile = f.read().rsplit('\n')
# file = [[tempfile[i], tempfile[i+1]] for i in range(0, len(tempfile)-2, 2)]

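filename = input('Log file: ')
# note: mode 'w+' empties the file as soon as it is opened, so results from
# an earlier run are lost; this is why repeated runs don't accumulate
f = open(filename, 'w+')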
N = input('Number of samples: ')
file = []

# patterns
p = re.compile(r'"nom"><h4>\s*[^\s]*\s*,\s*[^\s]*')
q1 = re.compile(r'"nom"><h4>\s*')
q2 = re.compile(r'\s*,\s*')

# main loop: one request (sample) per iteration
for i in range(int(N)):
   # retrieve data and extract relevant section
   
   # for false page, use:
   # html = open('test.html','r').read()
   # m = p.findall(html)
   # time.sleep(0.5)
   
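   # for real page, use:
   # (no timeout is passed to urlopen, so a server that stops answering can
   # leave this call waiting indefinitely; a refused connection raises an
   # exception instead, ending the script before anything is written)
   req = urllib.request.Request(url, data)
   response = urllib.request.urlopen(req)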
   html = response.read()
   m = p.findall(html.decode('utf-8'))
   time.sleep(0.5)

   # format names
   for index, item in enumerate(m):
      item = q1.sub('', item)
      item = q2.sub(', ', item)
      m[index] = item
   
   # update count
   for name in m:
      matched = False
      # check if name is already in file and increase its count if it is
      # (exact comparison: re.match would treat the name as a regex and
      # also accept any longer name that starts with it)
      for index, entry in enumerate(file):
         if entry[0] == name:
            matched = True
            file[index][1] = str(1 + int(file[index][1]))
            break
      # if name was not in file, append it with count 1
      if not matched:
         file.append([name, '1'])
         print(name)

f.truncate(0)
# write data to file
for item in file:
   f.write('%s\n%s\n' % (item[0], item[1]))

f.close()

Mekire: Again. PLEASE use code tags. Indentation is lost without them.
Last edited by Mekire on Thu Aug 29, 2013 10:15 am, edited 1 time in total.
Reason: Code tags added.
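For point 3, the commented-out block near the top of the script was a start at reading an earlier log back in; the real obstacle is that `open(filename, 'w+')` empties the file as soon as the script starts. A minimal sketch that reads a log in the script's own format (a name line followed by a count line), lets new results be added, and writes it back. `load_counts` and `save_counts` are hypothetical names, using `collections.Counter` is a substitution for the script's list of `[name, count]` pairs, and reading the file as UTF-8 is an assumption (a log written by the original script on Windows may use a different default encoding).

```python
import os
from collections import Counter


def load_counts(path):
    """Read a log written as alternating lines (name, then count) into a
    Counter. A missing file gives an empty Counter, so a first run works too."""
    counts = Counter()
    if not os.path.exists(path):
        return counts
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for name, count in zip(lines[0::2], lines[1::2]):
        counts[name] += int(count)
    return counts


def save_counts(path, counts):
    """Overwrite path with every name and its cumulative count, most common first."""
    with open(path, "w", encoding="utf-8") as f:
        for name, count in counts.most_common():
            f.write(f"{name}\n{count}\n")
```

A run would then start with `counts = load_counts(filename)` instead of `open(filename, 'w+')`, call `counts.update(m)` after formatting each page's names (replacing the whole `# update count` loop), and finish with `save_counts(filename, counts)`.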