At model.fit I'm getting this error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

0 votes
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

company = pd.read_csv("C:\\Users\\Mahesh\\Desktop\\Sravanthi\\company_Data.csv")
colnames = list(company.columns)

# One-hot encode the predictors (columns 1 to 10) and the target (column 0)
x = pd.get_dummies(company.iloc[:, 1:11]).astype('bool')
X = pd.DataFrame(x)
y = company.iloc[:, 0]
y.isnull().sum()
y = pd.get_dummies(company.iloc[:, 0]).astype('bool')
Y = pd.DataFrame(y)
y = y.dropna()
y.isnull().sum()

train, test = train_test_split(company, test_size=0.3)
x = x.drop([399], inplace=True, axis=0)  # note: with inplace=True, drop() returns None

model = DecisionTreeClassifier(criterion='entropy')

model.fit(train[X], train[Y])  # the ValueError is raised here
Apr 10 in Data Analytics by anonymous
• 150 points
132 views

1 answer to this question.

0 votes

Hi,

I think your X parameter contains null values, which is why you get this error. You can avoid it with feature engineering, in one of two ways: remove the null values, or replace the missing values with a suitable value (imputation).

To remove NaN values, you can use dropna(), but it removes the entire row or column that contains one. If you want to keep that data, you can instead write your own function that fills in the missing values, which is imputation.
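
The two options above can be sketched as follows. This is only a minimal illustration, not the asker's actual data: the DataFrame, its column names, and the impute() helper are all made up for the example, and filling with the median (numeric columns) or the mode (other columns) is just one common choice of imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's CSV (column names are made up).
df = pd.DataFrame({
    "Sales": [9.5, 11.2, np.nan, 7.4],
    "Price": [120.0, np.nan, 80.0, 97.0],
    "ShelveLoc": ["Bad", "Good", "Good", None],
})

# Count missing values per column to see where the NaNs are.
print(df.isnull().sum())

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: a custom imputation function (hypothetical helper).
def impute(frame):
    """Fill numeric columns with the median and other columns with the mode."""
    filled = frame.copy()
    for col in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[col]):
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
    return filled

imputed = impute(df)
print(len(dropped), len(imputed))
```

Dropping rows is simpler but can discard a lot of data, while imputation keeps every row. It may also be worth checking the inputs passed to model.fit: in pandas, indexing a DataFrame with a boolean DataFrame (as in train[X]) acts as a mask and can itself introduce NaN values, separately from anything missing in the CSV.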

Hope this helps.

answered Apr 30 by MD
• 23,050 points
