BeautifulSoup (KeyError: 0)


BeautifulSoup (KeyError: 0)

Postby runawaykinms » Sun Dec 29, 2013 12:13 am

Ok, I give up. I have been searching for two days now and can't find a solution to this error. I am trying to write a function that pulls a list of the companies on the S&P 500 from a Wikipedia page and saves it to a CSV file in my directory. Here is my code:

Code: Select all
def s&p500():
    import urllib
    import csv
    from bs4 import BeautifulSoup
    open_html = urllib.urlopen('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    page = open_html.read()
    soup = BeautifulSoup(page)
    rows = table.find_all('tr')
    ofile = open('home/my_file_directory/s&p500.csv', 'wb')
    writer = csv.writer(ofile)
    writer.writerows(rows)
    ofile.close()


However, when I run this line by line I get this error:

Code: Select all
Traceback (most recent call last):
  File "<pyshell#36>", line 1, in <module>
    writer.writerows(rows)
  File "/usr/lib/python2.7/dist-packages/bs4/element.py", line 892, in __getitem__
    return self.attrs[key]
KeyError: 0


Thanks in advance for any suggestions or hints!
runawaykinms
 
Posts: 7
Joined: Tue Dec 24, 2013 9:57 am

Re: BeautifulSoup (KeyError: 0)

Postby metulburr » Sun Dec 29, 2013 1:20 am

You shouldn't put imports inside functions like that.

Your code shouldn't even be working as posted: table is never defined. Since the table you are referring to is apparently the first one on the page, you can just use soup.find('table') to get the html of that table, and then table.find_all('tr') to get a list of its rows.

This code puts each row into the variable rows, the first one being the header. At this point you would just loop over rows and parse the html further for each column you specifically want (see the sketch after the output below).

Code: Select all
try:
    import urllib.request as REQ   # Python 3
except ImportError:
    import urllib2 as REQ          # Python 2
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
req = REQ.urlopen(url)
page = req.read()
soup = BeautifulSoup(page)
table = soup.find('table')      # the companies table is the first table on the page
rows = table.find_all('tr')     # one entry per <tr>; rows[0] is the header
print(rows[1].prettify())
print(rows[1].text)


--output--
Code: Select all
<tr>
 <td>
  <a class="external text" href="http://www.nyse.com/about/listed/lcddata.html?ticker=mmm" rel="nofollow">
   MMM
  </a>
 </td>
 <td>
  <a href="/wiki/3M" title="3M">
   3M Company
  </a>
 </td>
 <td>
  <a class="external text" href="http://www.sec.gov/cgi-bin/browse-edgar?CIK=MMM&amp;action=getcompany" rel="nofollow">
   reports
  </a>
 </td>
 <td>
  Industrials
 </td>
 <td>
  Industrial Conglomerates
 </td>
 <td>
  <a class="mw-redirect" href="/wiki/St_Paul,_Minnesota" title="St Paul, Minnesota">
   St Paul, Minnesota
  </a>
 </td>
 <td>
 </td>
</tr>


MMM
3M Company
reports
Industrials
Industrial Conglomerates
St Paul, Minnesota
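
For example, the "parse each column" step could look something like this rough, untested sketch (reusing rows from the code above):

Code: Select all
for tr in rows:
    cells = tr.find_all('td')
    if not cells:
        continue  # the header row uses <th> cells, so it has no <td>
    # one text string per column in this row
    print([td.get_text() for td in cells])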

New Users, Read This
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
steam
metulburr
 
Posts: 1564
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

Re: BeautifulSoup (KeyError: 0)

Postby runawaykinms » Sun Dec 29, 2013 7:53 pm

Thanks for the reply! Your suggestion helped me fix the problem. I was using this code:
Code: Select all
writer.writerows(rows)

However, I should have been using this:
Code: Select all
writer.writerow(rows)


That change allowed me to write to the csv file. Unfortunately, it did not come out the way I wanted. I am going to work on that over the next couple of days and try to figure it out.
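
As far as I can tell, the difference is that writerow() writes one row per call, while writerows() expects a whole sequence of rows (one list per csv line). A small example with a made-up file name:

Code: Select all
import csv

ofile = open('example.csv', 'w')   # example file name, not my real path
writer = csv.writer(ofile)
writer.writerow(['MMM', '3M Company'])          # writes a single row
writer.writerows([['ABT', 'Abbott Laboratories'],
                  ['ACN', 'Accenture']])        # writes several rows at once
ofile.close()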
runawaykinms
 
Posts: 7
Joined: Tue Dec 24, 2013 9:57 am

Re: BeautifulSoup (KeyError: 0)

Postby runawaykinms » Mon Dec 30, 2013 7:30 pm

I have almost completed this task, but the last step is giving me some trouble. I can't get the format I am looking for in the csv file.

Here is my code:
Code: Select all
import urllib
import csv
from bs4 import BeautifulSoup

html = urllib.urlopen('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
page = html.read()
soup = BeautifulSoup(page)
writer = csv.writer(open('my_file_directory//s&p500.csv', 'w'))

table = soup.find_all('table')[0]
rows = table.find_all('tr')
columns = table.find_all('td')

symbols = columns[0].get_text()
name = columns[1].get_text()
reports = columns[2].get_text()
sector = columns[3].get_text()
industry = columns[4].get_text()
address = columns[5].get_text()

for tr in rows:
    writer.writerow([symbols,name])


However, it only writes the above data from the first row in the list of rows. I then tried this code:
Code: Select all
writer.writerow([tr.get_text()])


However, that returns the desired data, but for each row it puts all the data in the same column. I am so close here but can't figure out how to separate each piece of data into its own column without getting an error of some sort.
runawaykinms
 
Posts: 7
Joined: Tue Dec 24, 2013 9:57 am

Re: BeautifulSoup (KeyError: 0)

Postby runawaykinms » Tue Dec 31, 2013 6:09 am

I finally figured it out!!!!! :D

In case anyone was following this thread and interested in the final result, here is the code:

Code: Select all
import urllib
import csv
from bs4 import BeautifulSoup

html = urllib.urlopen('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
page = html.read()
soup = BeautifulSoup(page)
writer = csv.writer(open('/my_file_directory//s&p500.csv', 'w'))

table = soup.find_all('table')[0]
rows = table.find_all('tr')
columns = table.find_all('td')

writer.writerow(['Symbol', 'Name', 'Reports', 'Sector', 'Industry', 'Address'])

for tr in rows:
    tds = tr.find_all('td')
    for td in tds[0:1]:
        writer.writerow([tds[0].get_text(),tds[1].get_text(),tds[3].get_text(),tds[4].get_text(),tds[5].get_text()])
runawaykinms
 
Posts: 7
Joined: Tue Dec 24, 2013 9:57 am

Re: BeautifulSoup (KeyError: 0)

Postby snippsat » Tue Dec 31, 2013 8:08 am

Do not do this; it can even make things not work.
Code: Select all
page = html.read()

Pass html into BeautifulSoup like this, without read():
Code: Select all
soup = BeautifulSoup(html)

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.
You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't detect one.
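
Applied to the code in this thread, that would look roughly like this (untested sketch):

Code: Select all
import urllib
from bs4 import BeautifulSoup

html = urllib.urlopen('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = BeautifulSoup(html)   # hand the response object straight to BeautifulSoup, no read()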
snippsat
 
Posts: 296
Joined: Thu Feb 21, 2013 12:04 am

Re: BeautifulSoup (KeyError: 0)

Postby metulburr » Tue Dec 31, 2013 3:37 pm

snippsat wrote:
Do not do this; it can even make things not work.
Code: Select all
page = html.read()

Pass html into BeautifulSoup like this, without read():
Code: Select all
soup = BeautifulSoup(html)

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.
You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't detect one.

Interesting... I did not know that. I did that as well.

@OP
writer.writerow([tds[0].get_text(),tds[1].get_text(),tds[3].get_text(),tds[4].get_text(),tds[5].get_text()])

I would assign each value to a named variable and write the variables to the file. I have done this kind of thing before, and come back 6 months later to find I have no idea what I am writing to the file.
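
Something along these lines (untested sketch, reusing rows and writer from your code and keeping your column indexes):

Code: Select all
for tr in rows:
    tds = tr.find_all('td')
    if not tds:
        continue  # the header row has no <td> cells
    symbol = tds[0].get_text()
    name = tds[1].get_text()
    sector = tds[3].get_text()
    industry = tds[4].get_text()
    address = tds[5].get_text()
    writer.writerow([symbol, name, sector, industry, address])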
New Users, Read This
OS Ubuntu 14.04, Arch Linux, Gentoo, Windows 7/8
https://github.com/metulburr
steam
metulburr
 
Posts: 1564
Joined: Thu Feb 07, 2013 4:47 pm
Location: Elmira, NY

