parsing web page with BeautifulSoup

Post by verb » Wed Aug 07, 2013 11:39 am

Hello everyone,
I have a problem when trying to parse a web page on Ubuntu 13.04: the output stops almost at the beginning. Here is the code:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup

# Fetch the page and parse it (no parser is named here)
page = urllib2.urlopen('http://en.kingofsat.net/pos-13E.php')
soup = BeautifulSoup(page)

# Print each top-level node of the parsed document
for line in soup:
    print line


The output: http://paste.ofcode.org/Suhaw79pExEQEZzMkrcE3H
verb
 
Posts: 12
Joined: Fri Feb 22, 2013 8:15 pm

Re: parsing web page with BeautifulSoup

Post by stranac » Wed Aug 07, 2013 1:09 pm

The output seems correct.
What's the problem?
stranac
 
Posts: 1093
Joined: Thu Feb 07, 2013 3:42 pm

Re: parsing web page with BeautifulSoup

Post by verb » Wed Aug 07, 2013 1:47 pm

Hello stranac,
The same code run on Fedora 16 outputs more than 1.6 MB (the whole source of the given page), but on Ubuntu it outputs only 48 KB.
Both distros have bs4.__version__ = '4.2.1'.
If I fetch the page with urllib2 and iterate over it directly, it works; the problem appears only when I pass it to BeautifulSoup.
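
One way to narrow down a difference like this is to parse the same markup with each parser bs4 can use and compare how much comes back. The following is only a rough sketch, assuming Python 3 and an installed bs4; compare_parsers is a made-up helper name, not part of bs4, and a parser that is not installed is reported as None:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def compare_parsers(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: length of the re-serialized document, or None}.

    Hypothetical helper for illustration; None means bs4 could not find
    the named parser on this machine.
    """
    sizes = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # Raised by bs4 when the requested parser is not available.
            sizes[name] = None
            continue
        sizes[name] = len(str(soup))
    return sizes


if __name__ == "__main__":
    sample = "<html><body><p>one</p><p>two</p></body></html>"
    for name, size in compare_parsers(sample).items():
        print(name, "not installed" if size is None else size)
```

Running this on both machines with the downloaded page source in place of the sample string should show which parsers are present on each and whether one of them returns far less than the others.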

Re: parsing web page with BeautifulSoup

Post by verb » Wed Aug 07, 2013 3:37 pm

I found the solution: install html5lib and pass it to BeautifulSoup as the parser, like this -> BeautifulSoup(page, "html5lib")
(cheers)
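
A likely reason this helps: when no parser is named, bs4 uses whichever one it finds installed, so two machines with the same bs4 version can still parse the same page differently. Naming the parser removes that variable. Below is a minimal sketch of the same idea, assuming Python 3 (urllib.request stands in for urllib2) and that html5lib is installed; make_soup and fetch_soup are illustrative names, not bs4 functions:

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def make_soup(markup, parser="html5lib"):
    """Parse markup with an explicitly named parser (hypothetical helper)."""
    return BeautifulSoup(markup, parser)


def fetch_soup(url, parser="html5lib"):
    """Download url and parse the raw bytes with the named parser."""
    with urlopen(url) as response:
        return make_soup(response.read(), parser)


if __name__ == "__main__":
    # Offline example; the built-in "html.parser" is used here so the
    # snippet runs even where html5lib is not installed.
    soup = make_soup("<p>hello</p>", parser="html.parser")
    print(soup.p.get_text())
```

For the page from the first post, the call would be fetch_soup('http://en.kingofsat.net/pos-13E.php'), which needs network access and html5lib.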