Help With Directory Site

Postby worthingtonclint » Mon Sep 23, 2013 1:12 am

I have created a directory-based web site. I was hoping to get some tips on scripting a program that would scan through all of the links within the site and then print out the ones that are broken or no longer functioning, so that I can fix or delete them. Any help, tips, or advice would be greatly appreciated.
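
A rough sketch of the link-gathering half of that, assuming the pages are local .html files (the filename below is only a placeholder): feed each page through Python 2's HTMLParser and collect every href, then check each collected URL in a second pass.

Code: Select all
# Sketch only: collect href values from one local HTML file.
from HTMLParser import HTMLParser

class LinkCollector(HTMLParser):
    """Remember the value of every href attribute seen while parsing."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

collector = LinkCollector()
collector.feed(open("index.html").read())  # placeholder filename
print collector.links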

Re: Help With Directory Site

Postby micseydel » Mon Sep 23, 2013 4:01 am

I'd use the HTTPConnection class in httplib. If you want more help than that, you'll need to ask more specific questions.
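
A minimal sketch of that suggestion, assuming Python 2's httplib (the host and path below are only placeholders); a HEAD request is enough to read the status code without downloading the body:

Code: Select all
# Sketch only: report the HTTP status code for one URL.
from httplib import HTTPConnection

def check(host, path="/"):
    connection = HTTPConnection(host)
    connection.request("HEAD", path)
    status = connection.getresponse().status  # e.g. 200, 404, 301
    connection.close()
    return status

print check("www.python.org", "/")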
Join the #python-forum IRC channel on irc.freenode.net!

Re: Help With Directory Site

Postby worthingtonclint » Mon Sep 23, 2013 5:38 am

http://www.swineworld.org/index.html#/a ... odule.html

The first example on this web page sums up what I am attempting to do in a simpler format, although when I try to run that code Python fails to import the linkchecker module. I've found nothing on the web that would let me download a linkchecker library or anything. Any idea what's up?

Re: Help With Directory Site

Postby worthingtonclint » Mon Sep 23, 2013 5:50 am

Code: Select all
#!/usr/bin/python

from httplib import HTTPConnection
from StringIO import StringIO
from gzip import GzipFile

# Ask for a gzip-compressed response; the body is decompressed below.
connection = HTTPConnection("google.com")
head = {"Accept-Encoding": "gzip,deflate", "Accept-Charset": "UTF-8,*"}
connection.request("GET", "/index.php", headers=head)
response = connection.getresponse()

if response.status == 200:
    print "Page Found Successfully, Outputting Request Body"
    raw_data = response.read()
    stream = StringIO(raw_data)
    decompressor = GzipFile(fileobj=stream)
    print decompressor.read()
elif response.status == 404:
    print "Page Not Found"
else:
    # Anything else (301, 302, 500, ...) just gets reported.
    print response.status, response.reason

connection.close()


With that I just get 301 Moved Permanently as the output for any and every page.
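
A 301 means the server is redirecting the request (google.com most likely points you at www.google.com), and httplib does not follow redirects on its own. One way around it, sketched below, is to let urllib2 handle the redirect and just read the final status code; broken links then show up as HTTPError or URLError.

Code: Select all
# Sketch only: urllib2 follows redirects itself, so the 301 from
# google.com is resolved automatically and you see the final status.
import urllib2

def check(url):
    try:
        response = urllib2.urlopen(url)
        return response.getcode()      # final status, e.g. 200
    except urllib2.HTTPError as error:
        return error.code              # e.g. 404 for a broken link
    except urllib2.URLError as error:
        return "failed: %s" % error.reason

print check("http://google.com/")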

