How to perform HTML decoding/encoding using Python/Django?

I have a string that is HTML encoded:

'''&lt;img class=&quot;size-medium wp-image-113&quot;\
 style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot;\
 src=&quot;ttp://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot;\
 alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'''

I want to change that to:

<img class="size-medium wp-image-113" style="margin-left: 15px;" 
  title="su1" src="ttp://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" 
  alt="" width="300" height="194" /> 

I want this to register as HTML so that it is rendered as an image by the browser instead of being displayed as text.

The string is stored like that because I am using a web-scraping tool called BeautifulSoup. It "scans" a web page, extracts certain content from it, and returns the string in that format.

I've found how to do this in C# but not in Python. Can someone help me out?

May 6 in Python by kartik

1 answer to this question.

Hello,

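For HTML encoding, there's cgi.escape from the standard library (available in Python 2; it was deprecated in Python 3.2 and removed in Python 3.8 in favor of html.escape):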

>>> help(cgi.escape)
cgi.escape = escape(s, quote=None)
    Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.
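
On Python 3 the same encoding step can be sketched with html.escape from the standard library. This is a minimal illustration, not part of the original answer; the sample string and the variable names are made up for the example. Note that html.escape translates quotes by default, unlike cgi.escape.

```python
import html

# Hypothetical sample markup, shortened from the question's <img> tag.
raw = '<img alt="" width="300" />'

# quote=True is the default: &, <, >, " and ' are all translated.
encoded = html.escape(raw)

# quote=False leaves quotation marks alone, matching cgi.escape's default.
encoded_no_quotes = html.escape(raw, quote=False)

print(encoded)
print(encoded_no_quotes)
```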

For html decoding, I use the following:

import re
# Python 2 module; in Python 3 it is html.entities
from htmlentitydefs import name2codepoint

# for some reason, python 2.5.2 doesn't have this one (apostrophe)
name2codepoint['#39'] = 39

def unescape(s):
    "unescape HTML code refs; cf. http://wiki.python.org/moin/EscapingHtml"
    # Replace each known entity (e.g. &lt;) with the character it names.
    # unichr is Python 2 only; Python 3 uses chr.
    return re.sub('&(%s);' % '|'.join(name2codepoint),
                  lambda m: unichr(name2codepoint[m.group(1)]), s)
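
If you are on Python 3.4 or later, the standard library's html.unescape covers the decoding step, including numeric references such as &#39;. The following is a minimal sketch rather than the original answer's method; the sample string is a shortened, hypothetical version of the one in the question.

```python
import html

# Hypothetical sample, shortened from the encoded string in the question.
encoded = '&lt;img title=&quot;su1&quot; alt=&quot;&quot; /&gt;'

# html.unescape converts named and numeric character references
# back to the characters they stand for.
decoded = html.unescape(encoded)

print(decoded)
```

In a Django template the decoded string is still auto-escaped on output, so it would typically need the safe filter (or django.utils.safestring.mark_safe) to be rendered as markup. That should only be done for content you trust.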

For anything more complicated, I use BeautifulSoup.

Hope this is helpful!!

Thank You!!

answered May 6 by Niroj