At model.fit I'm getting an error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

+3 votes
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

company= pd.read_csv("C:\\Users\\Mahesh\\Desktop\\Sravanthi\\company_Data.csv")
colnames=list(company.columns)

x=pd.get_dummies(company.iloc[:,1:11]).astype('bool')
X=pd.DataFrame(x)
y=company.iloc[:,0]
y.isnull().sum()
y=pd.get_dummies(company.iloc[:,0]).astype('bool')
Y=pd.DataFrame(y)
help(pd.DataFrame)
y = y.dropna()
y.isnull().sum()
from sklearn.model_selection import train_test_split
train,test=train_test_split(company,test_size=0.3)
x=x.drop([399],inplace=True,axis=0)

from sklearn.tree import DecisionTreeClassifier
model=DecisionTreeClassifier(criterion='entropy')

model.fit(train[X],train[Y])
Apr 10, 2020 in Data Analytics by anonymous
• 180 points
807 views

1 answer to this question.

0 votes

Hi,

I think your X data contains null values, which is why you get this error. You can fix it during feature engineering in one of two ways: remove the null values, or replace the missing values with suitable values (imputation).
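Before choosing either option, it can help to confirm where the bad values are. The following is a minimal sketch on a hypothetical toy DataFrame (the column names and values are made up for illustration and are not taken from company_Data.csv). It counts NaN values per column and infinite values in the numeric columns, since the error message covers both.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real CSV
df = pd.DataFrame({
    "price": [9.5, 11.2, np.nan, 7.4],
    "income": [73.0, np.inf, 35.0, 100.0],
    "region": ["north", "south", None, "south"],
})

# Count missing values (NaN/None) in every column
nan_counts = df.isnull().sum()
print(nan_counts)

# Count infinite values; np.isinf only applies to numeric columns
numeric = df.select_dtypes(include="number")
inf_counts = np.isinf(numeric).sum()
print(inf_counts)
```

Running the same two checks on the exact object that is passed to model.fit shows which columns need attention.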

To remove NaN values, you can use dropna(). However, it removes the entire row or column that contains a missing value. To keep that data, you can instead write your own custom function that fills in the missing values (imputation).
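Both options can be sketched as follows. This is an illustration under assumptions, not the asker's actual pipeline: the data, the column names, and the impute helper are hypothetical, and median/mode filling is just one reasonable imputation choice. One more thing worth checking in the question's code: indexing a DataFrame with a boolean DataFrame, as in train[X], works as a mask and turns every False position into NaN, so that expression may itself be producing the NaN values. The sketch avoids this by splitting the features and the target directly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def impute(df):
    """Hypothetical helper: fill numeric NaN with the median, others with the mode."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical toy data; names and values are illustrative only
data = pd.DataFrame({
    "label": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
    "price": [10.0, np.nan, 12.5, 8.0, 11.0, 7.5, np.nan, 9.0],
    "region": ["north", "south", None, "south", "north", "south", "north", "south"],
})

# Option 1: drop every row that contains a missing value
dropped = data.dropna()

# Option 2: impute first, then one-hot encode the categorical features
features = pd.get_dummies(impute(data.drop(columns="label")), dtype=float)
target = data["label"]

# Split features and target together so their rows stay aligned
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Dropping rows is simpler but discards data (here 3 of 8 rows), while imputation keeps every row at the cost of introducing estimated values.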

Hope this helps.

answered Apr 30, 2020 by MD
• 95,160 points
