Hi, the answer is pretty simple.
Use lxml. It is one of the best HTML/XML libraries for Python.
Consider the following piece of code:
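import lxml.html
# Parse the HTML string into an element tree
t = lxml.html.fromstring("...")
# text_content() returns the text of the element and its children, without the tags
print(t.text_content())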
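To make that concrete, here is a small runnable sketch. The sample markup is made up for illustration (it stands in for the "..." above), so swap in your own HTML.

```python
import lxml.html

# Hypothetical sample markup, used only for illustration.
sample = "<div><h1>Title</h1><p>Hello <b>world</b>!</p></div>"

# Parse the fragment; a single top-level element is returned as that element.
tree = lxml.html.fromstring(sample)

# text_content() joins every text node in document order and drops the tags.
text = tree.text_content()
print(text)
```

Note that the text nodes are joined without any separator, so text from neighbouring block elements runs together. If you need spacing or line breaks, you would have to walk the elements yourself.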
If you also want to sanitize the HTML so that it comes out clean, use the following module:
module - lxml.html.clean
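A rough sketch of how the cleaner can be used is below. The sample markup and the chosen options are my own assumptions, not something from the question. Also, as far as I know, lxml 5.2 and later ship the cleaner as the separate lxml_html_clean package, so the import falls back to that.

```python
try:
    # Location of the cleaner in older lxml releases.
    from lxml.html.clean import Cleaner
except ImportError:
    # Assumption: newer lxml with the separate lxml_html_clean package installed.
    from lxml_html_clean import Cleaner

# Hypothetical untrusted markup, used only for illustration.
dirty = '<div><script>alert(1)</script><p onclick="x()">Hi</p></div>'

# Drop <script> elements, inline JavaScript handlers, and style information.
cleaner = Cleaner(scripts=True, javascript=True, style=True)

# Passing a string to clean_html() gives a string back.
cleaned = cleaner.clean_html(dirty)
print(cleaned)
```

The Cleaner class has quite a few more options, so check the documentation for the ones that match what you consider "clean".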
Hope this helped!