I've spent about 50 hours trying to figure out how to write a web scraper in Python 3.3. It seems hopeless at this point, so I'm posting out of desperation, in the hope that one of you super geniuses can resolve my 50-hour failure in a couple of minutes.
I have two specific actions I am trying to execute, and I know Python can do them, but I can't bridge the gap from concept to execution.
[Also, I can write the loops for these actions just fine; what I can't figure out is the URL side: downloading each page and pulling the links out of its source.]
I have a webpage that has 200 links on it, each of which pertains to a sports team.
--from the page source, I need to (a) extract all 200 of these URLs and compile them into a list, so that I can visit each team page in the next step
--each of the team URLs shows up in the following template: <h5><a href="http://website.com/team/id/[id_number]/[abbrev_team_name]" class="bi">[Team_Name]</a></h5>
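A minimal sketch of step (a), using only the standard library (`html.parser` and `urllib.request`, both available in Python 3.3). It assumes the team links look exactly like the template above, i.e. an `<a class="bi">` inside an `<h5>`; the names `TeamLinkParser`, `extract_team_urls` and `fetch_html` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TeamLinkParser(HTMLParser):
    """Collect the href of every <a class="bi"> that sits inside an <h5>."""

    def __init__(self):
        super().__init__()
        self.in_h5 = False  # are we currently between <h5> and </h5>?
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "h5":
            self.in_h5 = True
        elif tag == "a" and self.in_h5:
            classes = (attrs.get("class") or "").split()
            href = attrs.get("href")
            if "bi" in classes and href:
                self.urls.append(href)

    def handle_endtag(self, tag):
        if tag == "h5":
            self.in_h5 = False


def extract_team_urls(html):
    """Return the team URLs found in one page's HTML, in page order."""
    parser = TeamLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls


def fetch_html(url):
    """Download a page and decode it; falls back to UTF-8 if no charset is sent."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Usage would be along the lines of `team_urls = extract_team_urls(fetch_html(LISTING_URL))`, where `LISTING_URL` is a placeholder for the address of the page with the 200 links (not given above). If installing a third-party package is an option, BeautifulSoup makes this kind of matching shorter; the sketch sticks to the standard library so it runs on a bare Python install.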
Then, within each of these team pages, there are ~12 links to individual players' pages.
--I need to (b) make another list containing the URLs of all ~2400 players
--each of the player URLs shows up in the following template: <td><a href="http://website.com/player/id/[PLR_id_number]/[abbrev_player_name]">[Player_Name]</a></td>
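Step (b) can be sketched the same way: parse each team page for `<a>` links inside `<td>` cells whose href contains `/player/id/`, then loop over the team URLs. This is a sketch under the assumption that player links always follow the template above; `PlayerLinkParser`, `extract_player_urls`, `collect_player_urls` and the one-second `delay` are illustrative choices, not an established API. The `fetch` parameter exists so the loop can be tried on canned HTML without touching the network.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen


class PlayerLinkParser(HTMLParser):
    """Collect hrefs of <a> links inside <td> cells that point at /player/id/."""

    def __init__(self):
        super().__init__()
        self.td_depth = 0  # > 0 while inside a <td>
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.td_depth += 1
        elif tag == "a" and self.td_depth > 0:
            href = dict(attrs).get("href") or ""
            if "/player/id/" in href:
                self.urls.append(href)

    def handle_endtag(self, tag):
        if tag == "td" and self.td_depth > 0:
            self.td_depth -= 1


def extract_player_urls(html):
    parser = PlayerLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls


def fetch_html(url):
    """Download a page and decode it; falls back to UTF-8 if no charset is sent."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def collect_player_urls(team_urls, fetch=fetch_html, delay=1.0):
    """Visit every team page; return player URLs without duplicates, first-seen order."""
    seen = set()
    players = []
    for team_url in team_urls:
        for url in extract_player_urls(fetch(team_url)):
            if url not in seen:
                seen.add(url)
                players.append(url)
        time.sleep(delay)  # pause between requests to go easy on the server
    return players
```

With the list from step (a), `player_urls = collect_player_urls(team_urls)` makes one request per team page, so roughly 200 requests in total. Deduplicating matters if a player appears on more than one team page (e.g. after a trade).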
Finally, once I have the list of 2400 URLs, I need to (c) capture a table of data from each of them, but I think that part is too complex to describe for now. If you have any advice on scraping tables of data, though, please DO share.
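For step (c), a generic starting point is to turn every `<table>` on a player page into a list of rows, each row a list of cell strings, and pick out the right table afterwards. This sketch assumes well-formed tables that are not nested and that close their `<td>`/`<th>`/`<tr>` tags explicitly (HTML allows omitting them, which would confuse it); `TableParser` and `extract_tables` are hypothetical names.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Turn every <table> into a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []  # one list of rows per <table> seen
        self.row = None   # cells of the row currently being read
        self.cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. a link in a cell) is kept as well.
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.tables[-1].append(self.row)
            self.row = None


def extract_tables(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table's rows can then be written out with `csv.writer(...).writerows(rows)` or converted to numbers as needed. If the real pages have several tables, checking the header row (the first row of each table) is one way to find the one you want.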