I need a little help

A forum for general discussion of the Python programming language.

Postby floriano » Thu Jun 06, 2013 9:45 am

I've been looking through a lot of questions about parsing HTML in Python with BeautifulSoup, but I can't manage to get what I need.
This is a small module of a personal app I want to build. It logs in to a website with credentials, and once the script is logged in, I need to parse some information from the page so I can process it.
The HTML after logging in is:

Code:
<table id="dedcontent">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <tr>
    <td>
      <table>
        <tr>
          <td colspan="5" style="text-align:left;padding-left:4px;" class="category">  <img src="http://www.nnnnn.com/images/world/menu.gif">
            This is a text </td>
        </tr>
        <tr>
          <td class="date" colspan="5">June 05 </td>
        </tr>
        <tr>
          <td style="test-align:left;width:40px;">This is a text</td>
          <td style="padding-right:4px; width:180px;text-align:right">
            This is a text </td>
          <td style="width:40px;text-align:center"> <nobr><a id="I1" name="I1" href="javascript:MoreInformation(1,'1048','1527875','TT','home');">
            This is a text</a></nobr>
          </td>
          <td style="padding-left:5px; width:180px;text-align:left">
            This is a text </td>
          <td style="width:40px;text-align:center"></td>
        </tr>
        <tr>
          <td style="test-align:left;width:40px;">This is a text</td>
          <td style="padding-right:4px; width:180px;text-align:right">
            This is a text </td>
          <td style="width:40px;text-align:center"> <nobr><a id="I2" name="I2" href="javascript:MoreInformation(2,'1048','1527874','TT','home');">
            This is a text</a></nobr>
          </td>
          <td style="padding-left:5px; width:180px;text-align:left">
            This is a text </td>
          <td style="width:40px;text-align:center"></td>
        </tr>
      </table>
    </td>
  <tr>
    <td>
      <table>
        <tr>


and it continues with more tables whose text I want to extract.
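A dependency-free sketch of the extraction described above, assuming Python 3 and the standard library's `html.parser` instead of BeautifulSoup (a swapped-in technique, not the approach used later in this thread). `RowTextParser`, `table_rows` and `SAMPLE` are hypothetical names; `SAMPLE` is a trimmed copy of the HTML above. Nested tables are flattened, so each inner `<tr>` becomes its own row and the outer wrapper rows are dropped.

```python
from html.parser import HTMLParser


class RowTextParser(HTMLParser):
    """Collect the whitespace-normalised text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text fragments of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []  # an inner <td> restarts collection

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if self.rows:
                # Collapse runs of spaces/newlines into single spaces.
                self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    parser = RowTextParser()
    parser.feed(html)
    parser.close()
    # Rows that only wrapped a nested table end up empty; drop them.
    return [row for row in parser.rows if row]


SAMPLE = """
<table id="dedcontent"><tr><td><table>
  <tr><td class="date" colspan="5">June 05 </td></tr>
  <tr>
    <td style="test-align:left;width:40px;">This is a text</td>
    <td style="width:40px;text-align:center"> <nobr><a id="I1" name="I1"
        href="javascript:MoreInformation(1,'1048','1527875','TT','home');">
        This is a text</a></nobr>
    </td>
    <td style="width:40px;text-align:center"></td>
  </tr>
</table></td></tr></table>
"""

if __name__ == "__main__":
    for row in table_rows(SAMPLE):
        print(row)
```

On the real page you would feed the HTML fetched after logging in; empty cells come back as empty strings so column positions stay stable.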
Thanks in advance for any help!
Floriano
Last edited by floriano on Thu Jun 06, 2013 11:08 am, edited 2 times in total.
floriano
 
Posts: 15
Joined: Thu Jun 06, 2013 9:10 am

Postby hansn » Thu Jun 06, 2013 10:18 am

Show us what you've tried so far and what you need help with.

It is unlikely that anyone here is going to do your work for you.
hansn
 
Posts: 87
Joined: Thu Feb 21, 2013 8:46 pm

Postby floriano » Thu Jun 06, 2013 11:13 am

Here is what I have tried:
Code:
import urllib2
from BeautifulSoup import BeautifulSoup


soup = BeautifulSoup(urllib2.urlopen('http://www.example.com').read())

for row in soup('table')[4].findAll('tr'):
    tds = row('td')
    print tds

Postby hansn » Thu Jun 06, 2013 2:57 pm

And what do you need help with?

Postby micseydel » Fri Jun 07, 2013 6:15 am

If you need to do authentication, mechanize is great.
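mechanize itself is not shown here. As a rough standard-library alternative (a swapped-in technique, assuming Python 3), `urllib.request` with an `http.cookiejar.CookieJar` keeps the session cookie after a login POST so later requests stay logged in. The login URL and the form field names `username`/`password` are hypothetical; copy the real ones from the site's `<form>`.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies, so a login carries over."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(url, username, password):
    """Build (but do not send) the login form POST.

    The field names are hypothetical placeholders.
    """
    data = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(url, data=data)  # data present -> POST


# Usage against a real site (not executed here):
# opener, jar = make_session()
# opener.open(login_request("http://www.example.com/login", "me", "secret"))
# html = opener.open("http://www.example.com/members").read()
```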
Join the #python-forum IRC channel on irc.freenode.net!
micseydel
 
Posts: 928
Joined: Tue Feb 12, 2013 2:18 am
Location: Mountain View, CA

Postby floriano » Fri Jun 07, 2013 7:12 am

Thanks for your help! I'll try mechanize. But what I want is to extract the first text from <td>This is a Text </td> and the second text from <td> <nobr> <a...> This is Text</a></nobr></td>, as in this snippet:

Code:
<td>This is a first Text</td><td style="width:40px;text-align:center"> <nobr><a id="I1" name="I1" href="javascript:MoreInformation(1,'1048','1527875','TT','home');">
                        This is second text</a></nobr>
                         </td>

Postby micseydel » Fri Jun 07, 2013 8:09 am

For that particular bit of code, this works:
Code:
soup = BeautifulSoup(your_string)
soup.findAll('a')[0].string

Postby floriano » Fri Jun 07, 2013 10:52 am

Thanks very much, but I can't implement your code in my script.
Can you help me, please?

Code:
soup = BeautifulSoup(your_string)
soup.findAll('a')[0].string


Code:
import urllib2
from BeautifulSoup import BeautifulSoup


soup = BeautifulSoup(urllib2.urlopen('http://www.example.com').read())

for row in soup('table')[4].findAll('tr'):
    tds = row('td')
    print tds


Thanks for all the help!
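A sketch of how the one-liners above fit into the row loop, assuming Python 3 and Beautiful Soup 4 (the `bs4` package, where `findAll` is spelled `find_all` and `get_text()` returns a tag's text). `first_and_link_text` and `HTML` are hypothetical names; on the real page you would pass the fetched HTML, or narrow it first as `soup('table')[4]` does above. Rows that only wrap a nested table, and rows without a link (category and date rows), are skipped.

```python
try:
    from bs4 import BeautifulSoup  # Beautiful Soup 4: pip install beautifulsoup4
except ImportError:
    BeautifulSoup = None  # bs4 not installed; first_and_link_text() returns None

# Trimmed copy of one data row from the first post.
HTML = """
<table><tr>
  <td style="test-align:left;width:40px;">This is a first Text</td>
  <td style="width:40px;text-align:center"> <nobr><a id="I1" name="I1"
      href="javascript:MoreInformation(1,'1048','1527875','TT','home');">
      This is second text</a></nobr></td>
</tr></table>
"""


def first_and_link_text(html):
    """Return [(first <td> text, first <a> text), ...], one pair per data row."""
    if BeautifulSoup is None:
        return None
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for row in soup.find_all("tr"):
        if row.find("table") is not None:
            continue  # wrapper row around a nested table; its inner rows come later
        tds = row.find_all("td")
        link = row.find("a")
        if tds and link is not None:  # skip category/date rows without a link
            pairs.append((tds[0].get_text(strip=True), link.get_text(strip=True)))
    return pairs


if __name__ == "__main__":
    for first, second in first_and_link_text(HTML) or []:
        print(first, "|", second)
```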

Postby Kebap » Fri Jun 07, 2013 11:08 am

floriano wrote: I can't implement your code in my script.
Can you help me, please?
Thanks for all the help!

What seems to be the problem? Please take a look at my signature.
Learn: How To Ask Questions The Smart Way
Join the #python-forum IRC channel on irc.freenode.net and chat with us directly!
Kebap
 
Posts: 282
Joined: Thu Apr 04, 2013 1:17 pm
Location: Germany, Europe

Postby floriano » Fri Jun 07, 2013 11:51 am

I am a beginner, and I do not know how to implement your code in my script.

Postby snippsat » Fri Jun 07, 2013 1:26 pm

It's hard to say without knowing what the website's source looks like.
floriano wrote: But I want to extract the first text from <td>This is a Text </td> and the second text from <td> <nobr> <a...> This is Text</a></nobr></td>

For that, it could look like this:
Code:
from BeautifulSoup import BeautifulSoup
import re

html = '''\
<td>This is a first Text</td><td style="width:40px;text-align:center"> <nobr>
<a id="I1" name="I1" href="javascript:MoreInformation(1,'1048','1527875','TT','home');">
This is second text</a></nobr></td>'''

soup = BeautifulSoup(html)
tag_td = soup.findAll('td')[0].text
tag_a = soup.findAll('a')[0].text
print tag_td, tag_a #--> This is a first Text This is second text

#Both in one go
print [soup.findAll(re.compile(r"td|a"))[i].text for i in range(2)]
#--> [u'This is a first Text', u'This is second text']
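One caveat about the "both in one go" line: Beautiful Soup tests a regular expression against tag names with its `search()` method, so `re.compile(r"td|a")` also matches any tag whose name merely contains "td" or "a", such as `table` and `meta` on the real page. In this small snippet it happens not to matter. The sketch below reproduces that matching with the standard `re` module only; `TAG_NAMES` and `matching` are hypothetical names. Passing a list, `soup.findAll(['td', 'a'])`, matches the exact names instead.

```python
import re

# Tag names that appear in the page from the first post.
TAG_NAMES = ["table", "meta", "tr", "td", "img", "nobr", "a"]

loose = re.compile(r"td|a")       # the pattern from the snippet above
strict = re.compile(r"^(td|a)$")  # anchored: the whole tag name must match


def matching(pattern, names):
    # Mimic Beautiful Soup's regex filter, which calls pattern.search(name).
    return [name for name in names if pattern.search(name)]


if __name__ == "__main__":
    print(matching(loose, TAG_NAMES))
    print(matching(strict, TAG_NAMES))
```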


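floriano wrote: ('http://www.example.com').read())

You don't need to call read() before passing the urlopen() response to BeautifulSoup; it accepts the file-like response object and reads it itself.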
Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.
snippsat
 
Posts: 85
Joined: Thu Feb 21, 2013 12:04 am

Postby floriano » Fri Jun 07, 2013 2:33 pm

Thanks very much for your help!
I put together my script, but it doesn't work! Where is my mistake?

Code:
from scrapy.spider import BaseSpider
from BeautifulSoup import BeautifulSoup
import re

html = '''\
<td>This is a first Text</td><td style="width:40px;text-align:center"> <nobr>
<a id="I1" name="I1" href="javascript:MoreInformation(1,'1048','1527875','TT','home');">
This is second text</a></nobr></td>'''

class NameSpider(BaseSpider):
    name = "name"
    allowed_domains = ["example.com/"]
    start_urls = [
         "http://example.com/"
    ]
   
soup = BeautifulSoup(html)
tag_td = soup.findAll('td')[0].text
tag_a = soup.findAll('a')[0].text
print tag_td, tag_a

#--> This is a first Text This is second text

#Both in one go

print [soup.findAll(re.compile(r"td|a"))[i].text for i in range(2)]

#--> [u'This is a first Text', u'This is second text']


Thanks again!

Postby hansn » Fri Jun 07, 2013 6:52 pm

If your script does not work, please show us the error message you get (called Traceback).
Please read the link in Kebap's signature.

Your script worked for me. I only changed the second line, because I use beautifulsoup4, where the import looks different (see the Beautiful Soup 4 docs if you also use beautifulsoup4).

Postby floriano » Fri Jun 07, 2013 8:50 pm

Hello, thanks for your help!
I downloaded bs4 and I'm using it now. Here is my traceback:

Code:
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "C:\Python27\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 380, in callback
    self._startRunCallbacks(result)
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 488, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Python27\lib\site-packages\scrapy-0.16.5-py2.7.egg\scrapy\spider.py", line 57, in parse
    raise NotImplementedError
exceptions.NotImplementedError:


Thanks again!
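The last frame of that traceback is Scrapy's own spider.py, line 57, in `parse`. Scrapy downloads each URL in `start_urls` and hands the response to the spider's `parse()` callback, and the base class's default `parse()` just raises `NotImplementedError`. `NameSpider` above defines no `parse()`, and its BeautifulSoup code runs at module level rather than inside the spider. The pure-Python sketch below reproduces that mechanism with hypothetical stand-in classes (`FakeBaseSpider`, `BrokenSpider`, `FixedSpider`, `run_callback`; not Scrapy's real code) so it runs without Scrapy installed.

```python
class FakeBaseSpider(object):
    """Hypothetical stand-in for Scrapy's BaseSpider (NOT Scrapy's code).

    Like the base class in the traceback, its default parse() raises
    NotImplementedError, so each spider has to provide its own parse().
    """

    def parse(self, response):
        raise NotImplementedError


class BrokenSpider(FakeBaseSpider):
    """Mirrors NameSpider above: it has a name but no parse() method."""
    name = "name"


class FixedSpider(FakeBaseSpider):
    """Overrides parse(); the page-parsing code belongs in here."""
    name = "name"

    def parse(self, response):
        # In real Scrapy, response is a Response object whose HTML is in
        # response.body; a plain string stands in for it here.
        return "parsed %d characters" % len(response)


def run_callback(spider, response):
    """Roughly what the framework does with each downloaded page."""
    return spider.parse(response)
```

In a real spider, the `BeautifulSoup(...)` call and the `findAll` lines would move inside `parse(self, response)` and work on `response.body` instead of the hard-coded `html` string.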

Postby hansn » Fri Jun 07, 2013 9:56 pm

http://lmgtfy.com/?q=scrapy+exception+N ... ntedError#

Please read the link in Kebap's signature.

Postby floriano » Sat Jun 08, 2013 6:56 am

My problems persist. I searched all night in the links on Google; if anyone can help me, please do!
I tried changing this:
from scrapy.spider import BaseSpider
class NameSpider(BaseSpider):
which gives this error:
raise NotImplementedError
exceptions.NotImplementedError:

to this:
from scrapy.spider import CrawlSpider
class NameSpider(CrawlSpider):
and now I get this error:
from scrapy.spider import CrawlSpider
ImportError: cannot import name CrawlSpider

Thanks for any help!

Postby hansn » Sat Jun 08, 2013 8:16 am

http://lmgtfy.com/?q=ImportError%3A+can ... rawlSpider

2nd result. Look at how he imports CrawlSpider.

I've pretty much never used scrapy so I can't guarantee that this is correct, but the scrapy docs also say the same: http://doc.scrapy.org/en/0.16/topics/sp ... rawlspider

It's always a good idea to check the documentation of a module if it's giving you problems and Google can't help you (as implied by Kebap's signature).
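For reference, the import path in question differs between Scrapy versions: the 0.16 docs linked above document `CrawlSpider` in `scrapy.contrib.spiders`, while Scrapy 1.0 and later export it from `scrapy.spiders`; 0.16 does not provide it from `scrapy.spider`, hence the ImportError. A version-tolerant sketch follows. Note that the same docs warn that `CrawlSpider` uses `parse()` itself to implement its logic, so a `CrawlSpider` subclass should give its rule callbacks a different name rather than define `parse()`.

```python
try:
    from scrapy.spiders import CrawlSpider  # Scrapy 1.0 and later
except ImportError:
    try:
        from scrapy.contrib.spiders import CrawlSpider  # older releases such as 0.16
    except ImportError:
        CrawlSpider = None  # Scrapy is not installed in this environment

if __name__ == "__main__":
    print("CrawlSpider available:", CrawlSpider is not None)
```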

Postby floriano » Sat Jun 08, 2013 12:49 pm

hansn, you sent me to Google! If you look at my post above, you can read: "I searched all night in the links on google".
"snippsat" wrote a little code for me. I asked "snippsat" for help, and you sent me to search Google again.
If you can't help me, please leave me alone. I only discovered Python a week ago; I'm just starting out. You are not obliged to help me, but you are really not helping me. That's all. This forum is for helping people.
If anyone can help me, thanks very much!

Postby hansn » Sat Jun 08, 2013 1:12 pm

I'm sorry if you found my replies offensive.

I think googling is a skill to learn, and would rather show you how I did it so that you might learn something from it. I told you to look at the second result from my google search. More accurately: The 2nd result, the first post, the 4th line of the code he wrote. I think you will be able to find it.

I prefer to help people by pointing them in the right direction and having them figure out the rest by themselves. Mainly because personally, I don't learn anything if I don't have to do some work myself so that's how I would want people to help me.