'utf-8' codec can't decode byte 0x82 in position 16: invalid start byte

Question

Hello,

I'm working on a sentiment analysis project for Arabic text. I downloaded an Excel sheet that contains two columns, text and labels, and I'm getting this error: 'utf-8' codec can't decode byte 0x82 in position 16: invalid start byte. The file itself opens fine, but the error occurs when I try to tokenize the text.

Please help!

This is my code:

import nltk
nltk.download('punkt')
token_data= open("data try.xlsx").read()
tokens = nltk.sent_tokenize(token_data)
sent_tokenize(token_data)

MD · Answer 1 · Jun 29, 2020

Hi @zena,

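The error occurs because an .xlsx file is not plain text: it is a binary, ZIP-compressed workbook, so open(...).read() tries to decode compressed bytes as UTF-8 and fails at a byte that is not valid UTF-8 (here 0x82 at position 16). Passing encoding="utf8" to open() does not help, because the file holds no plain text to decode in the first place. Read the sheet with a spreadsheet reader such as pandas instead (it needs the openpyxl package for .xlsx files), then tokenize the text column. The lines below assume the column is named "text"; adjust the name to match your sheet.

import pandas as pd
token_data = " ".join(pd.read_excel("data try.xlsx")["text"].astype(str))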
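
To see why a workbook cannot be read as text, it may help to check the file type before decoding it. The sketch below is only an illustration: the helper names `is_xlsx_like` and `load_text` are hypothetical, the column name "text" is an assumption based on the question, and the Excel branch assumes pandas and openpyxl are installed.

```python
import io
import zipfile


def is_xlsx_like(data: bytes) -> bool:
    """Return True if the bytes form a ZIP container, as .xlsx workbooks do."""
    return zipfile.is_zipfile(io.BytesIO(data))


def load_text(path, column="text"):
    """Return a list of strings from a workbook column or a UTF-8 text file.

    `column` is an assumed column name; change it to match the sheet.
    """
    with open(path, "rb") as fh:  # read raw bytes, no decoding yet
        raw = fh.read()
    if is_xlsx_like(raw):
        # Imported lazily so the plain-text branch works without pandas.
        import pandas as pd

        frame = pd.read_excel(io.BytesIO(raw), engine="openpyxl")
        return frame[column].dropna().astype(str).tolist()
    # Not a ZIP container: treat the file as ordinary UTF-8 text.
    return raw.decode("utf-8").splitlines()
```

Each string returned by `load_text("data try.xlsx")` can then be passed to `nltk.sent_tokenize` one at a time, which also avoids joining unrelated rows into one document.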



