Python Issue with unexpected end of pattern

0 votes

I'm doing a small project on sentiment analysis using Twitter data. I have a sample CSV file containing the data, but before the sentiment analysis part I have to clean up the data. There is one part where I am stuck. Here's the code:

tweets['source'][2]   ## Source is an attribute in csv file containing values
Out[51]: u'<a href="" rel="nofollow">Twitter for Android</a>'

I want to clean the source column. I don't want the values to include the web links and the HTML tags.

Here's the code for cleaning the source:

tweets['source_new'] = ''

for i in range(len(tweets['source'])):
    m = re.search('(?)(.*)', tweets['source'][i])
    try:
        tweets['source_new'][i] = m.group(0)
    except AttributeError:
        tweets['source_new'][i] = tweets['source'][i]

tweets['source_new'] = tweets['source_new'].str.replace('', ' ', case=False)

But when I executed the code, I got this error:

Traceback (most recent call last):

  File "<ipython-input-50-f92a7f05ad1d>", line 2, in <module>
    m = re.search('(?)(.*)', tweets['source'][i])

  File "C:\Users\aneeq\Anaconda2\lib\", line 146, in search
    return _compile(pattern, flags).search(string)

  File "C:\Users\aneeq\Anaconda2\lib\", line 251, in _compile
    raise error, v # invalid expression

error: unexpected end of pattern

The error says "unexpected end of pattern". Can anyone help me with this? I can't find the problem in my code.

Sep 12, 2018 in Python by bug_seeker
• 15,520 points

1 answer to this question.

0 votes

I should start by stating that using a regular expression for this task is not a good idea.
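For comparison, here is a minimal sketch of a parser-based alternative using only the standard library. It assumes Python 3 (in Python 2 the module is named HTMLParser); TextExtractor and strip_tags are hypothetical names made up for this illustration, not part of any library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped
        self.parts.append(data)


def strip_tags(fragment):
    """Return the text content of an HTML fragment, without any tags."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


# Sample value taken from the question
sample = '<a href="" rel="nofollow">Twitter for Android</a>'
print(strip_tags(sample))  # prints: Twitter for Android
```

A parser copes with attributes that contain ">" and with character references such as &amp;amp;, which are the cases where tag-stripping regular expressions tend to go wrong.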

That said, I see two ways to accomplish this with a regular expression, depending on your context:

If you don't really know what tags you are going to encounter

We can get the text content by removing every tag:

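import re

# Replace any HTML tag with an empty string (run this inside the loop over i)
value = re.sub(r'<[^>]*>', '', tweets['source'][i])
tweets.loc[i, 'source_new'] = value  # assign to row i only, not the whole column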
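As a worked example, the same substitution can be applied to every value at once. This is only a sketch: the list named sources is hypothetical sample data standing in for the tweets['source'] column, and the first entry is the value shown in the question.

```python
import re

# Compile once, since the same pattern is reused for every value
TAG_RE = re.compile(r'<[^>]*>')

# Hypothetical sample values standing in for tweets['source']
sources = [
    '<a href="" rel="nofollow">Twitter for Android</a>',
    'Twitter Web Client',
]

# Values without any tag pass through unchanged
cleaned = [TAG_RE.sub('', s) for s in sources]
print(cleaned)  # prints: ['Twitter for Android', 'Twitter Web Client']
```

With a pandas column, the equivalent would probably be tweets['source'].str.replace(r'<[^>]*>', '', regex=True), which avoids the explicit loop. Check that your pandas version supports the regex argument.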

If you know what tags you are going to encounter (recommended)

This would be my recommended approach (if you really need to use regular expressions), as it is more explicit and less prone to any surprises.

# Remove only HTML "a" tags (opening and closing), case-insensitively
value = re.sub(r'(?i)</?a\b[^>]*>', '', tweets['source'][i])
tweets.loc[i, 'source_new'] = value  # assign to row i only, not the whole column
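As for the error itself: "(?" opens an extension group (lookahead, lookbehind, inline flags and so on), so the regex compiler expects more characters after the "?". The pattern '(?)(.*)' closes the group straight away, which is why compilation fails. The sketch below reproduces the failure and then tries a lookbehind for ">". That lookbehind is only my assumption about what the pattern was meant to be; the question does not say.

```python
import re

# '(?' must be followed by an extension such as '<=', ':' or '=',
# so this pattern cannot be compiled
try:
    re.compile('(?)(.*)')
    compiled = True
except re.error as exc:
    compiled = False
    print('invalid pattern:', exc)

# Assumption: the intended pattern was a lookbehind for '>'
sample = '<a href="" rel="nofollow">Twitter for Android</a>'
m = re.search(r'(?<=>)(.*)', sample)
print(m.group(0))  # prints: Twitter for Android</a>
```

Note that the lookbehind version still leaves the closing tag in the result, so a second cleanup step is needed. The re.sub approaches above avoid that.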
answered Sep 12, 2018 by Priyaj
• 58,090 points
