How to parse HTML like a pro

Post by hansn » Fri Jun 07, 2013 10:15 pm

I recently discovered Metalburr's thread about programming ideas: viewtopic.php?f=10&t=378
It gave me lots of stuff to work on and made me realize that I've skipped a lot of the basics during my learning process, like extracting data from web pages and parsing HTML.

So I just made my first parsing script; it shows the current weather in Trondheim, Norway.
Code:
import requests
from bs4 import BeautifulSoup

def print_yr(soup):
    td_tags = soup.find_all('td')
    for tag in td_tags:
        try:
            title = tag['title']
            if 'For the period' in title:
                print '\t' + title
                break
        except KeyError:
            pass
    for tag in td_tags:
        try:
            title = tag['title']
            if 'Temperature:' in title:
                print '\t' + title
                break
        except KeyError:
            pass

yr = 'http://www.yr.no/place/Norway/S%C3%B8r-Tr%C3%B8ndelag/Trondheim/Trondheim/'
yr_request = requests.get(yr)
yr_soup = BeautifulSoup(yr_request.text)

print '\n\nwww.yr.no:'
print_yr(yr_soup)

I'm not looking for help with the code (criticism is always welcome), but I would like some opinions on how I made it work.

I basically looked in the website's source code to find which tag held the current weather, compared part of the text inside that tag with the text in all the other tags of that kind, and printed the text if it matched.

I just think it seems like an unprofessional way of doing it, but I can't think of any other ways.
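
For what it's worth, the two loops described above can be folded into a single lookup, since BeautifulSoup accepts a compiled regular expression as an attribute filter. A minimal sketch, assuming Python 3 (unlike the Python 2 print statements in the script above) and bs4; the `SAMPLE_HTML` snippet and the `find_title` helper are made up for illustration and only imitate the yr.no markup, so it runs without a network connection:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the yr.no page (not the real markup).
SAMPLE_HTML = """
<table>
  <tr>
    <td>Today</td>
    <td title="Rain. For the period: 03:00-06:00">rain</td>
    <td title="Temperature: 10. Feels like 10. For the period: 03:00">10</td>
  </tr>
</table>
"""

def find_title(soup, needle):
    """Return the title of the first <td> whose title contains needle, else None."""
    # re.escape keeps characters such as ':' and '.' literal in the pattern.
    # Cells without a title attribute never match the regex, so no KeyError handling is needed.
    tag = soup.find('td', title=re.compile(re.escape(needle)))
    return tag['title'] if tag is not None else None

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
for needle in ('For the period', 'Temperature:'):
    print('\t' + str(find_title(soup, needle)))
```

This is still matching on text inside the page, so it shares the same weakness as the original; it only removes the duplicated loop and the try/except.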

Re: How to parse HTML like a pro

Post by snippsat » Sat Jun 08, 2013 12:33 am

Bad weather in Trondheim ;)

hansn wrote: I just think it seems like an unprofessional way of doing it but I can't think of any other ways.

There are easier ways to do this; you are doing too much, and the code is kind of messy.
This gives the same output as your code.
Code:
import requests
from bs4 import BeautifulSoup

yr = 'http://www.yr.no/place/Norway/S%C3%B8r-Tr%C3%B8ndelag/Trondheim/Trondheim/'
yr_request = requests.get(yr)
soup = BeautifulSoup(yr_request.text)

table = soup.find('table', {'class': 'yr-table yr-table-overview2 yr-popup-area'})
td = table.find_all('td')
for i in range(1,3):
    print td[i]['title']

Output:
Code:
Rain. For the period: 03:00–06:00
Temperature: 10°.  Feels like 10°.  For the period: 03:00
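
The lookup above matches the table on its full class string and picks the cells by position. A possible variation is a CSS selector that matches on a single class and keeps only the cells that carry a `title` attribute. This is a sketch under assumptions: Python 3, bs4 with CSS selector support, and a made-up `SAMPLE_HTML` that only imitates the yr.no table, so the real page may need a different selector:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the yr.no overview table (not the real markup).
SAMPLE_HTML = """
<table class="yr-table yr-table-overview2 yr-popup-area">
  <tr>
    <td>Today</td>
    <td title="Rain. For the period: 03:00-06:00">rain</td>
    <td title="Temperature: 10. Feels like 10. For the period: 03:00">10</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# 'table.yr-table-overview2' matches any table that has this one class;
# 'td[title]' keeps only the cells that have a title attribute.
titles = [td['title'] for td in soup.select('table.yr-table-overview2 td[title]')]

for title in titles[:2]:
    print(title)
```

Matching on one class means the script keeps working if the site adds or reorders the other class names on the table.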

Re: How to parse HTML like a pro

Post by hansn » Sat Jun 08, 2013 8:48 am

Oh that's a lot cleaner. Thanks!

I was actually worried that comparing against a string in the HTML to find the desired data was unprofessional, as the HTML might change or something :roll:
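
One way to soften that worry is to have the script fail with a clear message when the markup no longer matches, rather than crash with an obscure error or print the wrong cell. A minimal sketch, assuming Python 3 and bs4; the `extract_weather` helper and both HTML snippets are hypothetical and not taken from the real site:

```python
from bs4 import BeautifulSoup

def extract_weather(html):
    """Return the title texts of the weather cells, or raise ValueError."""
    soup = BeautifulSoup(html, 'html.parser')
    # class_ matches a single class even when the tag has several.
    table = soup.find('table', class_='yr-table-overview2')
    if table is None:
        raise ValueError('weather table not found; the page layout may have changed')
    # title=True keeps only the cells that have a title attribute.
    titles = [td['title'] for td in table.find_all('td', title=True)]
    if not titles:
        raise ValueError('no cells with a title found; the page layout may have changed')
    return titles

# Hypothetical pages: one with the expected table, one after a redesign.
GOOD_HTML = """
<table class="yr-table yr-table-overview2">
  <tr><td title="Rain. For the period: 03:00-06:00">rain</td>
      <td title="Temperature: 10. Feels like 10.">10</td></tr>
</table>
"""
CHANGED_HTML = '<div class="forecast"><span>Rain, 10 degrees</span></div>'

for page in (GOOD_HTML, CHANGED_HTML):
    try:
        print(extract_weather(page))
    except ValueError as error:
        print('Could not read the weather:', error)
```

Any scraper breaks when the page changes enough; the check only makes the breakage obvious.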