Python: Issue with 'unexpected end of pattern'

0 votes

I'm doing small project on sentiment Analysis using twitter data. I have the sample csv file containing the data. but before doing the sentiment analysis part. I have to clean up the data. There is one part that I am stuck. Here's the code.

tweets['source'][2]   ## Source is an attribute in csv file containing values
Out[51]: u'<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'

I want to clean the source(data). I don't want the the values to be shown with web links and the tags.

Here's the code for cleaning the source:

tweets['source_new'] = ''

for i in range(len(tweets['source'])):
    m = re.search('(?)(.*)', tweets['source'][i])
    try:
        tweets['source_new'][i]=m.group(0)
    except AttributeError:
        tweets['source_new'][i]=tweets['source'][i]

tweets['source_new'] = tweets['source_new'].str.replace('', ' ', case=False)

But when I executed the code. I got this error:

Traceback (most recent call last):

  File "<ipython-input-50-f92a7f05ad1d>", line 2, in <module>
    m = re.search('(?)(.*)', tweets['source'][i])

  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 146, in search
    return _compile(pattern, flags).search(string)

  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 251, in _compile
    raise error, v # invalid expression

error: unexpected end of pattern

I got an error saying 'error: unexpected end of pattern". Can any help me with this? I can't find the issue of the code that I am working on.

Sep 12, 2018 in Python by bug_seeker
• 15,350 points
183 views

1 answer to this question.

0 votes

I should start by stating that using a regular expression for this task is not a good idea12

Being that said, I see two ways to accomplish this depending on your context:

If you don't really know what tags you are going to encounter

We can get the HTML text value doing the following:

# Replace any HTML tag with empty string
value = re.sub('<[^>]*>', '', tweets['source'][i])
tweets['source_new'] = value

If you know what tags you are going to encounter (recommended)

This would be my recommended approach (if you really need to use regular expressions), as it is more explicit and less prone to any surprises.

# Replace any HTML "a" tag with empty string
value = re.sub('(?i)<\/?a[^>]*>', '', tweets['source'][i])
tweets['source_new'] = value
answered Sep 12, 2018 by Priyaj
• 56,520 points

Related Questions In Python

0 votes
1 answer

Need help with making use of Pluck in Python

Hi, good question. Easy solution to be ...READ MORE

answered Jan 24 in Python by Nymeria
• 3,520 points
90 views
0 votes
1 answer

How to zip with a list output in Python instead of a tuple output?

Good question - Considering that you are ...READ MORE

answered Feb 7 in Python by Nymeria
• 3,520 points
43 views
0 votes
2 answers
0 votes
1 answer

How to create Pandas series from numpy array?

Hi. Refer to the below command: import pandas ...READ MORE

answered Apr 1 in Python by Pavan
109 views
0 votes
1 answer
0 votes
1 answer
0 votes
1 answer

How to convert pandas dataframe to numpy array?

Irrespective of whether the dataframe has similar ...READ MORE

answered May 12 in Python by Rishi
938 views
0 votes
1 answer

Python: Issue with 'unexpected end of pattern'

I should start by stating that using ...READ MORE

answered Sep 24, 2018 in Python by Priyaj
• 56,520 points
97 views
+1 vote
1 answer

How to replace id with attribute corresponding to id of another table?

Use the following query statement and let ...READ MORE

answered Aug 8, 2018 in Python by Priyaj
• 56,520 points
29 views