Python: Issue with 'unexpected end of pattern'


I'm doing a small sentiment analysis project using Twitter data. I have a sample CSV file containing the data, but before doing the sentiment analysis itself I have to clean up the data. There is one part where I am stuck. Here's the code:

tweets['source'][2]   ## Source is an attribute in csv file containing values
Out[51]: u'<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'

I want to clean the source column. I don't want the values to be shown with web links and HTML tags.

Here's the code for cleaning the source:

tweets['source_new'] = ''

for i in range(len(tweets['source'])):
    m = re.search('(?)(.*)', tweets['source'][i])
    try:
        tweets['source_new'][i]=m.group(0)
    except AttributeError:
        tweets['source_new'][i]=tweets['source'][i]

tweets['source_new'] = tweets['source_new'].str.replace('', ' ', case=False)

But when I executed the code, I got this error:

Traceback (most recent call last):
  File "<ipython-input-50-f92a7f05ad1d>", line 2, in <module>
    m = re.search('(?)(.*)', tweets['source'][i])
  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 146, in search
    return _compile(pattern, flags).search(string)
  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 251, in _compile
    raise error, v # invalid expression
error: unexpected end of pattern

I got the error "error: unexpected end of pattern". Can anyone help me with this? I can't find the issue in my code.

Sep 24, 2018 in Python by bug_seeker

1 answer to this question.


I should start by stating that using a regular expression for this task is not a good idea [1][2].
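The traceback itself comes from the `(?)` at the start of the question's pattern. In Python's `re` syntax, `(?` opens an extension group and must be followed by something like `:`, `=`, `!`, `<`, `P` or inline flags such as `i`; a bare `)` is none of these, so the pattern fails to compile before any string is searched. A minimal reproduction, assuming Python 3 (the message wording differs from Python 2's "unexpected end of pattern", but both raise `re.error`):

```python
import re

# "(?" must be followed by an extension character or inline flags,
# so "(?)" is rejected at compile time.
try:
    re.compile('(?)(.*)')
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print("re.error:", exc)

# A plain capturing group, by contrast, compiles fine.
fixed = re.compile('(.*)')
print(fixed.match('abc').group(1))
```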

That being said, I see two ways to accomplish this, depending on your context:

If you don't really know what tags you are going to encounter

You can get the text content of the HTML by stripping every tag:

import re

# Replace any HTML tag with an empty string, row by row
tweets['source_new'] = [re.sub(r'<[^>]*>', '', s) for s in tweets['source']]
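To check the generic pattern on the sample value from the question, here is a small sketch that uses a plain list so it runs without pandas; `strip_tags` and `TAG_RE` are hypothetical names chosen for illustration. With pandas, the same function could be applied per row via `Series.apply`.

```python
import re

# Matches "<", then any run of characters other than ">", then ">".
TAG_RE = re.compile(r'<[^>]*>')

def strip_tags(value):
    """Remove every HTML tag from value, keeping only the text between tags."""
    return TAG_RE.sub('', value)

# Sample value copied from the question's output.
sources = [
    '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
]
cleaned = [strip_tags(s) for s in sources]
print(cleaned)  # ['Twitter for Android']
```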

If you know what tags you are going to encounter (recommended)

This is my recommended approach (if you really need to use regular expressions), as it is more explicit and less prone to surprises.

import re

# Remove any opening or closing HTML "a" tag (case-insensitive), row by row
tweets['source_new'] = [re.sub(r'(?i)</?a[^>]*>', '', s) for s in tweets['source']]
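A sketch of the tag-specific variant, with one adjustment that is an addition here rather than part of the pattern above: `\b` after `a`. Without it, `</?a[^>]*>` also matches tags that merely begin with "a", such as `<abbr>`. `A_TAG_RE` and `strip_a_tags` are hypothetical names, and `re.IGNORECASE` plays the role of the inline `(?i)` flag.

```python
import re

# Case-insensitive; matches an opening "<a ...>" or a closing "</a>" tag only.
# The \b requires a word boundary right after "a", so "<abbr>" is not matched.
A_TAG_RE = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)

def strip_a_tags(value):
    """Remove only <a> tags; any other markup is left in place."""
    return A_TAG_RE.sub('', value)

print(strip_a_tags('<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'))
# Twitter for Android
print(strip_a_tags('<A HREF="x">Web <b>Client</b></A>'))
# Web <b>Client</b>
```

Because only `<a>` tags are targeted, unexpected markup survives in the output, where it is easy to spot, instead of being silently removed.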

Alternatively, take a look at "How to remove HTML tags from a String on Python" for other options and approaches.
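One non-regex option is the standard library's HTML parser. The sketch below assumes Python 3, where it lives in `html.parser` (on Python 2, which the Anaconda2 traceback suggests the question uses, the module is named `HTMLParser`). `TextExtractor` and `html_to_text` are hypothetical names; the parser simply collects every run of text between tags.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags (entities already decoded).
        self.parts.append(data)

def html_to_text(fragment):
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)

print(html_to_text('<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'))
# Twitter for Android
```

Unlike the regex versions, this also decodes entities, so `Tom &amp; Jerry` becomes `Tom & Jerry`.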


[1] Using a Regex to remove HTML tags from a string

[2] Using Regex to parse HTML

answered Sep 24, 2018 by Priyaj
