Python Issue with unexpected end of pattern

Question

I'm doing a small project on sentiment analysis using Twitter data. I have a sample CSV file containing the data, but before the sentiment analysis itself I have to clean it up. There is one part where I am stuck. Here's the code.

tweets['source'][2]   ## Source is an attribute in csv file containing values
Out[51]: u'<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'

I want to clean the source data: I don't want the values to be shown with the web links and tags.

Here's the code for cleaning the source:

tweets['source_new'] = ''

for i in range(len(tweets['source'])):
    m = re.search('(?)(.*)', tweets['source'][i])
    try:
        tweets['source_new'][i]=m.group(0)
    except AttributeError:
        tweets['source_new'][i]=tweets['source'][i]

tweets['source_new'] = tweets['source_new'].str.replace('', ' ', case=False)

But when I executed the code, I got this error:

Traceback (most recent call last):

  File "<ipython-input-50-f92a7f05ad1d>", line 2, in <module>
    m = re.search('(?)(.*)', tweets['source'][i])

  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 146, in search
    return _compile(pattern, flags).search(string)

  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 251, in _compile
    raise error, v # invalid expression

error: unexpected end of pattern

The error says 'error: unexpected end of pattern'. Can anyone help me with this? I can't find the problem in the code I am working on.

Priyaj · Answer 1 · Sep 12, 2018
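
On the error itself: in Python's re syntax, "(?" opens an extension group and has to be followed by a marker such as ":", "=", "!", "<=" or "P<name>". In '(?)(.*)' the group is closed right away, so the pattern fails to compile before any tweet is looked at. Python 2.7 reports this as "unexpected end of pattern"; newer versions word the message differently. The sketch below reproduces the failure and shows one well-formed pattern. The lookaround pattern is only my guess at what was intended (part of the original pattern may have been lost when it was pasted), and compiles is a hypothetical helper name.

```python
import re


def compiles(pattern):
    """Return True if `pattern` is a valid regular expression (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


bad_pattern = '(?)(.*)'               # pattern from the question: "(?" has no extension marker
guessed_pattern = '(?<=>)(.*)(?=</a>)'  # assumption: text between the first ">" and "</a>"

source = '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'

print(compiles(bad_pattern))      # the broken pattern is rejected at compile time
print(compiles(guessed_pattern))  # the guessed pattern compiles

m = re.search(guessed_pattern, source)
print(m.group(0))
```

Because the failure happens in re.compile, wrapping the m.group(0) call in try/except AttributeError, as in the question, cannot catch it.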

I should point out that using a regular expression for this task is generally not a good idea: regular expressions cannot reliably handle arbitrary HTML, and a real parser is the safer tool.
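
For comparison, a parser-based version might look like the sketch below. It assumes Python 3, where the standard-library parser lives in html.parser (on Python 2 the module is named HTMLParser). TextExtractor and strip_tags are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags and attributes never reach here
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of `html` with all markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


source = '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'
print(strip_tags(source))
```

With a pandas column this could be applied row by row, for example tweets['source'].apply(strip_tags).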

That said, I see two ways to accomplish this, depending on your context:

If you don't really know what tags you are going to encounter

We can extract the text content by stripping every tag from each value in the column:

import re

# Replace any HTML tag with an empty string, for every row of the column
tweets['source_new'] = tweets['source'].apply(lambda s: re.sub(r'<[^>]*>', '', s))
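
To see what the generic pattern does without needing the CSV file, here is a small sketch on a plain list of strings. The first sample is the value from the question; the other two are made-up values for illustration.

```python
import re

# Matches "<", then anything that is not ">", then ">": i.e. any single tag
TAG_RE = re.compile(r'<[^>]*>')

samples = [
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
    '<b>Twitter</b> for <i>iPhone</i>',  # hypothetical value with other tags
    'Twitter Web Client',                # hypothetical value with no tags at all
]

cleaned = [TAG_RE.sub('', s) for s in samples]
for value in cleaned:
    print(value)
```

Values without any tags pass through unchanged, so no try/except is needed. One limitation: a ">" inside an attribute value would end the match early, which is part of why a parser is more robust.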

If you know what tags you are going to encounter (recommended)

This would be my recommended approach (if you really need to use regular expressions), as it is more explicit and less prone to any surprises.

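import re

# Replace any opening or closing HTML "a" tag with an empty string, for every row.
# \b stops the pattern from also matching tags such as <abbr> or <article>.
tweets['source_new'] = tweets['source'].apply(lambda s: re.sub(r'(?i)</?a\b[^>]*>', '', s))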
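
As a rough illustration of why the explicit version is less surprising, the sketch below strips only anchor tags and leaves other markup alone. The sample strings are made up, and the \b word boundary in the pattern keeps tags such as <abbr> from being treated as anchors.

```python
import re

# Only opening/closing "a" tags, in any letter case
A_TAG_RE = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)

# Hypothetical sample values
anchor_with_bold = '<A HREF="http://example.com">Twitter for <b>Android</b></A>'
abbr_only = '<abbr title="Twitter">TW</abbr>'

print(A_TAG_RE.sub('', anchor_with_bold))  # anchor removed, <b> kept
print(A_TAG_RE.sub('', abbr_only))         # not an anchor, left untouched
```

If unexpected tags do show up in the data, they stay visible in the output instead of being silently removed, which makes them easier to notice.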

