How to use BeautifulSoup for web scraping?

+2 votes

I'm trying to collect all the titles of a forum from a certain site. I can't really figure out which HTML elements to target as I'm not very familiar with the site structure. 

This is what I came up with after reading the documentation:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://thailove.net/bbs/board.php?bo_table=ent'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

#I don't think this is correct, but I'm not sure how else to do this...
containers = page_soup.findAll("td",{"class":"td_subject"})


for container in containers:
    subject = container.a.font.font.contents
    #similarly not sure this is correct
print("subject: ", subject)


I'm not really sure what I should be trying to improve.
Apr 4, 2018 in Python by ariaholic
• 7,340 points
33 views

2 answers to this question.

+2 votes
Best answer

Your program is fine until it reaches the for loop. Use container.a.contents[0] to get each subject, and move the print call inside the for loop:

for container in containers:
    subject = container.a.contents[0]
    print("subject: ", subject)
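
One caveat: contents[0] returns the first child of the link, which may itself be a tag (for example a nested font element) rather than plain text, and container.a is None for a cell without a link. A minimal sketch of a more defensive version follows. SAMPLE_HTML is made-up markup that only assumes the td_subject cells look roughly like this, the real page may be structured differently, and extract_subjects is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<table>
  <tr><td class="td_subject"><a href="/bbs/1"><font><font>First topic</font></font></a></td></tr>
  <tr><td class="td_subject"><a href="/bbs/2">Second topic</a></td></tr>
  <tr><td class="td_subject">No link here</td></tr>
</table>
"""


def extract_subjects(html):
    """Return the link text of every td.td_subject cell that contains a link."""
    page_soup = BeautifulSoup(html, "html.parser")
    subjects = []
    for cell in page_soup.find_all("td", class_="td_subject"):
        link = cell.find("a")
        if link is None:
            # Skip cells without a link instead of raising AttributeError.
            continue
        # get_text() flattens nested tags such as <font> into plain text.
        subjects.append(link.get_text(strip=True))
    return subjects


subjects = extract_subjects(SAMPLE_HTML)
for subject in subjects:
    print("subject:", subject)
```

If you are unsure which elements to target, printing cell.prettify() for one of the cells shows its nested tags, which makes it easier to pick the right one.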
answered Apr 4, 2018 by charlie_brown
• 7,720 points

selected Oct 12, 2018 by Omkar
0 votes
You can go through the link below, which explains web scraping in brief:
https://www.dataquest.io/blog/web-scraping-tutorial-python/
answered Oct 12, 2018 by findingbugs
• 4,750 points
