How to filter HTML tags and resolve entities using Python?

0 votes
Hello all. With regard to the question above, my concern is with the tags.

I know I could get away with using regular expressions, but to be honest I shy away from them because they overwhelm me. So, coming to the question: I am trying to find the best way to remove all of the HTML tags and then resolve the HTML entities in Python.

Note: Input is a string.

How can I go about doing this? All help appreciated!
Feb 13 in Python by Anirudh
• 2,050 points
33 views

1 answer to this question.

0 votes

Hi, the answer is pretty simple.

Make use of lxml. It is one of the best HTML/XML libraries in Python.

Consider the following piece of code:

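import lxml.html

# Parse the input string ("..." stands for your HTML)
t = lxml.html.fromstring("...")
# text_content() returns the text with tags dropped and entities resolved
print(t.text_content())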
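If installing lxml is not an option, roughly the same result can be sketched with only the standard library. This is an alternative technique, not the lxml approach itself: it subclasses html.parser.HTMLParser and keeps only the text nodes. The names _TextExtractor and strip_tags are made up for this illustration, and the sample input is invented. Note that the text inside script and style elements is kept as well, so treat it as a minimal sketch rather than a sanitizer.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True makes the parser resolve entities such as
        # &amp; or &quot; before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(html_string):
    """Return html_string with tags removed and entities resolved."""
    parser = _TextExtractor()
    parser.feed(html_string)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()


# Invented sample input, for illustration only.
print(strip_tags("<p>Tom &amp; Jerry say &quot;hi&quot; to <b>you</b></p>"))
```

The input stays a plain string throughout, which matches the note in the question, and no regular expressions are involved.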

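If you also want to sanitize the HTML so that it is clean (for example, with scripts and styles stripped out), use the following module:

module - lxml.html.clean (in recent lxml releases this is shipped as the separate lxml_html_clean package)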

Hope this helped!

answered Feb 13 by Nymeria
• 3,500 points
