Help With Directory Site

A forum for general discussion of the Python programming language.

Help With Directory Site

Postby worthingtonclint » Mon Sep 23, 2013 1:12 am

I have created a directory-based web site. I was hoping to get some tips on scripting a program that would scan all of the links within the site and then print out the ones that are broken or no longer functioning, so that I can fix or delete them. Any help, tips, or advice would be greatly appreciated.
worthingtonclint
 
Posts: 4
Joined: Mon Sep 23, 2013 1:05 am
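For the crawling half of the question, a minimal sketch of pulling every href out of one page with the stdlib HTML parser (`html.parser` in Python 3, shown here; the module was named `HTMLParser` in the Python 2 of this thread). The `LinkCollector` name and the sample page string are illustrative assumptions, not anything from the site in question:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Accumulates the href attribute of every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical sample standing in for one of the directory pages.
page = '<p><a href="/about.html">About</a> <a href="http://example.com/">Out</a></p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/about.html', 'http://example.com/']
```

Feeding each page of the site through a collector like this yields the list of URLs to check; each one would then be requested to see whether it still resolves.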

Re: Help With Directory Site

Postby micseydel » Mon Sep 23, 2013 4:01 am

I'd use the HTTPConnection class in httplib. If you want more help than that, you'll need to ask more specific questions.
Join the #python-forum IRC channel on irc.freenode.net!

Please do not PM members regarding questions which are meant to be discussed publicly. The point of the forum is so that others can benefit from it. We don't want to help you over PMs or emails.
micseydel
 
Posts: 1301
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA
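A minimal sketch of the HTTPConnection suggestion, written with the Python 3 name of the module (`httplib` became `http.client`). The `link_status` helper and the choice of a HEAD request are assumptions for illustration, not code from the thread:

```python
import http.client
from urllib.parse import urlsplit

def link_status(url):
    """HEAD-request a URL and return its integer HTTP status code."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()
```

Anything in the 4xx/5xx range would mark a link as broken; in practice the call should also be wrapped in try/except, since a dead host raises a socket error rather than returning a status.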

Re: Help With Directory Site

Postby worthingtonclint » Mon Sep 23, 2013 5:38 am

http://www.swineworld.org/index.html#/a ... odule.html

The first example on this web page sums up what I am attempting to do in a simpler format, although when I try to run that code Python fails to import the module linkchecker. I've found nothing on the web that would let me download a linkchecker library or anything. Any idea what's up?
worthingtonclint
 
Posts: 4
Joined: Mon Sep 23, 2013 1:05 am

Re: Help With Directory Site

Postby worthingtonclint » Mon Sep 23, 2013 5:50 am

Code: Select all
#!/usr/bin/python

from httplib import HTTPConnection
from StringIO import StringIO
from gzip import GzipFile

connection = HTTPConnection("google.com")
head = {"Accept-Encoding": "gzip,deflate", "Accept-Charset": "UTF-8,*"}
connection.request("GET", "/index.php", headers=head)
response = connection.getresponse()

if response.status == 200:
    print "Page Found Successfully, Outputting Request Body"
    # Body was requested gzip-encoded, so decompress it before printing.
    raw_data = response.read()
    stream = StringIO(raw_data)
    decompressor = GzipFile(fileobj=stream)
    print decompressor.read()
elif response.status == 404:
    print "Page Not Found"
else:
    print response.status, response.reason

connection.close()


With that I just get "301 Moved Permanently" as the output for any and every page.
worthingtonclint
 
Posts: 4
Joined: Mon Sep 23, 2013 1:05 am
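A 301 from google.com is the server redirecting to www.google.com, not a broken link, so a checker built on bare HTTPConnection has to follow the Location header itself (the urllib openers do this automatically). A sketch of that loop, using the Python 3 module names; `fetch_status` and the redirect cap are assumptions for illustration:

```python
import http.client
from urllib.parse import urlsplit, urljoin

def fetch_status(url, max_redirects=5):
    """Follow 3xx responses via their Location header; return the final status."""
    for _ in range(max_redirects):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        try:
            conn.request("GET", parts.path or "/")
            resp = conn.getresponse()
            location = resp.getheader("Location")
            if resp.status in (301, 302, 303, 307, 308) and location:
                # Location may be relative, so resolve it against the current URL.
                url = urljoin(url, location)
                continue
            return resp.status
        finally:
            conn.close()
    raise RuntimeError("too many redirects for %s" % url)
```

With this, a page that answers 301 but whose target resolves returns 200 and is not flagged, while a genuinely dead target still surfaces as 404.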

