Can t use read for html2text

0 votes

I'm making a Python program that searches a webpage for a word. Although, when I try

website = urllib.request.urlopen(url)
content = website.read()
website.close()
test = html2text.html2text(content)
print(test)

I get this error :

test = html2text.html2text(content)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-    packages/html2text/__init__.py", line 840, in html2text
return h.handle(html)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-  packages/html2text/__init__.py", line 129, in handle
self.feed(data)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 125, in feed
data = data.replace("</' + 'script>", "</ignore>")
TypeError: a bytes-like object is required, not 'str'

I'm new to Python, so I'm not sure how to deal with this error.
Python 3.5, Mac.

Jul 3, 2019 in Python by ana1504.k
• 7,910 points
1,415 views

1 answer to this question.

0 votes

decode() the content with the charset sent inside the Charset header:

resource = urllib.request.urlopen(url)
content = resource.read()
charset = resource.headers.get_content_charset()
content = content.decode(charset)
answered Jul 3, 2019 by SDeb
• 13,300 points

Related Questions In Python

+2 votes
2 answers

How to use BeatifulSoup for webscraping?

your programme is fine until you start ...READ MORE

answered Apr 4, 2018 in Python by charlie_brown
• 7,720 points
742 views
0 votes
1 answer

How to use BeautifulSoup for Webscraping

Your code is good until you get ...READ MORE

answered Sep 6, 2018 in Python by Priyaj
• 58,090 points
1,929 views
0 votes
1 answer

Need help extracting a schema to make use for an avro file in Python

Hi, nice question. So what I daily use ...READ MORE

answered Jan 10, 2019 in Python by Nymeria
• 3,560 points
4,531 views
0 votes
1 answer

How to use read a WSDL file from the file system using Python suds?

Hi, good question. It is a very simple ...READ MORE

answered Jan 21, 2019 in Python by Nymeria
• 3,560 points
7,672 views
0 votes
1 answer

Crawling after login in Python

You missed a few login data forms, ...READ MORE

answered Sep 7, 2018 in Python by Priyaj
• 58,090 points
1,436 views
0 votes
1 answer

Crawling after login in Python

You missed a few login data forms, ...READ MORE

answered Sep 14, 2018 in Python by Priyaj
• 58,090 points
698 views
0 votes
1 answer

“stub” __objclass__ in a Python class how to implement it?

You want to avoid interfering with this ...READ MORE

answered Sep 27, 2018 in Python by Priyaj
• 58,090 points
1,811 views
+1 vote
1 answer

How is raw_input() and input() in python3.x?

raw_input() was renamed to input() so now input() returns the exact string ...READ MORE

answered Oct 30, 2018 in Python by Priyaj
• 58,090 points
684 views
0 votes
1 answer

Escaping strings for use in XML

You can try the following: from xml.dom.minidom import ...READ MORE

answered Apr 15, 2019 in Python by SDeb
• 13,300 points
682 views
0 votes
1 answer

Return a list inside a for loop while iterating over the elements of another list

The print() is getting called multiple times ...READ MORE

answered Sep 22, 2018 in Python by SDeb
• 13,300 points
4,652 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP