Python: Issue with 'unexpected end of pattern'

0 votes

I'm doing small project on sentiment Analysis using twitter data. I have the sample csv file containing the data. but before doing the sentiment analysis part. I have to clean up the data. There is one part that I am stuck. Here's the code.

tweets['source'][2]   ## Source is an attribute in csv file containing values
Out[51]: u'<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'

I want to clean the source(data). I don't want the the values to be shown with web links and the tags.

Here's the code for cleaning the source:

tweets['source_new'] = ''

for i in range(len(tweets['source'])):
    m = re.search('(?)(.*)', tweets['source'][i])
    try:
        tweets['source_new'][i]=m.group(0)
    except AttributeError:
        tweets['source_new'][i]=tweets['source'][i]

tweets['source_new'] = tweets['source_new'].str.replace('', ' ', case=False)

But when I executed the code. I got this error:

Traceback (most recent call last):

  File "<ipython-input-50-f92a7f05ad1d>", line 2, in <module>
    m = re.search('(?)(.*)', tweets['source'][i])

  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 146, in search
    return _compile(pattern, flags).search(string)

  File "C:\Users\aneeq\Anaconda2\lib\re.py", line 251, in _compile
    raise error, v # invalid expression

error: unexpected end of pattern

I got an error saying 'error: unexpected end of pattern". Can any help me with this? I can't find the issue of the code that I am working on.

Sep 12, 2018 in Python by bug_seeker
• 15,510 points
536 views

1 answer to this question.

0 votes

I should start by stating that using a regular expression for this task is not a good idea12

Being that said, I see two ways to accomplish this depending on your context:

If you don't really know what tags you are going to encounter

We can get the HTML text value doing the following:

# Replace any HTML tag with empty string
value = re.sub('<[^>]*>', '', tweets['source'][i])
tweets['source_new'] = value

If you know what tags you are going to encounter (recommended)

This would be my recommended approach (if you really need to use regular expressions), as it is more explicit and less prone to any surprises.

# Replace any HTML "a" tag with empty string
value = re.sub('(?i)<\/?a[^>]*>', '', tweets['source'][i])
tweets['source_new'] = value
answered Sep 12, 2018 by Priyaj
• 57,700 points

Related Questions In Python

0 votes
1 answer

Need help with making use of Pluck in Python

Hi, good question. Easy solution to be ...READ MORE

answered Jan 24, 2019 in Python by Nymeria
• 3,520 points
435 views
0 votes
1 answer

How to zip with a list output in Python instead of a tuple output?

Good question - Considering that you are ...READ MORE

answered Feb 7, 2019 in Python by Nymeria
• 3,520 points
259 views
0 votes
1 answer

Has Python 3.0 reached end of support?

The first release of Python 3 was ...READ MORE

answered Jun 7, 2019 in Python by Harsh
• 180 points
68 views
0 votes
1 answer

How to create Pandas series from numpy array?

Hi. Refer to the below command: import pandas ...READ MORE

answered Apr 1, 2019 in Python by Pavan
1,103 views
0 votes
1 answer
0 votes
1 answer

How to create Pandas series from dictionary?

Here's a sample script: import pandas as pd import ...READ MORE

answered Apr 1, 2019 in Python by Prateek
426 views
0 votes
1 answer

How to convert pandas dataframe to numpy array?

Irrespective of whether the dataframe has similar ...READ MORE

answered May 12, 2019 in Python by Rishi
5,514 views
0 votes
1 answer

Python: Issue with 'unexpected end of pattern'

I should start by stating that using ...READ MORE

answered Sep 24, 2018 in Python by Priyaj
• 57,700 points
456 views
+1 vote
1 answer

How to replace id with attribute corresponding to id of another table?

Use the following query statement and let ...READ MORE

answered Aug 8, 2018 in Python by Priyaj
• 57,700 points
133 views