Can't use read() for html2text?

0 votes

I'm making a Python program that searches a webpage for a word. However, when I try this:

website = urllib.request.urlopen(url)
content = website.read()
website.close()
test = html2text.html2text(content)
print(test)

I get this error :

test = html2text.html2text(content)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 840, in html2text
return h.handle(html)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 129, in handle
self.feed(data)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 125, in feed
data = data.replace("</' + 'script>", "</ignore>")
TypeError: a bytes-like object is required, not 'str'

I'm new to Python, so I'm not sure how to deal with this error.
Python 3.5, Mac.

Jul 3 in Python by ana1504.k
• 7,470 points
17 views

1 answer to this question.

0 votes

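read() returns bytes, but html2text expects a str. Decode the content with the charset sent in the Content-Type header before passing it on:

resource = urllib.request.urlopen(url)
content = resource.read()  # bytes, not str
# get_content_charset() returns None if the server sends no charset;
# falling back to UTF-8 here is an assumption, not part of the header
charset = resource.headers.get_content_charset() or "utf-8"
content = content.decode(charset)  # now a str that html2text accepts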
answered Jul 3 by SDeb
• 12,360 points
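
For completeness, here is a minimal sketch of the whole fetch, decode, and search flow described in the question. The helper names (decode_body, fetch_text, contains_word) are hypothetical and not part of urllib or html2text, and the UTF-8 fallback is an assumption for servers that omit the charset. The demonstration at the bottom uses a byte string in place of website.read() so it runs without a network connection.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Turn the bytes returned by read() into a str."""
    # Use the declared charset when there is one, else the assumed fallback.
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url):
    """Download a page and return its body as a str (hypothetical helper)."""
    with urllib.request.urlopen(url) as resource:
        raw = resource.read()
        # Charset parameter of the Content-Type header, or None if absent
        charset = resource.headers.get_content_charset()
    return decode_body(raw, charset)


def contains_word(text, word):
    """Case-insensitive substring check."""
    return word.lower() in text.lower()


# Offline demonstration: these bytes stand in for website.read()
sample = "<p>Caf\u00e9 menu</p>".encode("utf-8")
html = decode_body(sample, None)
print(type(html).__name__, contains_word(html, "café"))  # prints: str True
```

In the real program you would likely call html2text.html2text(fetch_text(url)) first and then run contains_word on the result, so that the search only sees the visible text rather than the markup.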
