Discretization in Weka

I need to know the right time to do discretization in Weka. I have a data set and need to create training and testing samples from it. Should I discretize the numerical attributes before or after the sampling?
Apr 5 in Machine Learning by Dev
• 6,000 points

1 answer to this question.

This should be self-evident.

You can discretize after the split only if you get the same result regardless of which split you used. But doing it later gives you no advantage, so simply do the preprocessing first.

You should be fine if you discretize by rounding, for example float to integer, because the result for each value does not depend on the split. If you discretize using quantiles, however, the cut points are computed from the data itself, so discretizing each part on its own means the parts will be discretized differently.
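To make the difference concrete, here is a minimal Python sketch (not Weka code) comparing rounding with a two-bin median split. The values and the helper names round_bins and median_bins are made up for illustration.

```python
from statistics import median

# Hypothetical numeric attribute, already split into two parts.
part_a = [0.9, 1.0, 1.1, 1.2, 2.1, 2.3, 2.2]
part_b = [1.1, 1.2, 1.3, 1.9, 2.0, 2.1]


def round_bins(values):
    """Discretize by rounding; each value is handled on its own."""
    return [round(v) for v in values]


def median_bins(values, cut=None):
    """Two equal-frequency bins; the cut point is the median unless one is given."""
    if cut is None:
        cut = median(values)
    return [0 if v <= cut else 1 for v in values]


# Rounding gives the same result whether it is done before or after the split.
rounding_is_split_independent = (
    round_bins(part_a + part_b) == round_bins(part_a) + round_bins(part_b)
)

# A median split learns a different cut point from each part.
cut_a = median(part_a)
cut_b = median(part_b)
cut_all = median(part_a + part_b)

print("rounding independent of split:", rounding_is_split_independent)
print("cut points (part a, part b, all data):", cut_a, cut_b, cut_all)

# The same raw value can land in different bins depending on which part it is in.
print("1.5 with part a's cut:", median_bins([1.5], cut=cut_a))
print("1.5 with part b's cut:", median_bins([1.5], cut=cut_b))
```

A value such as 1.5 falls above the cut point learned from one part and below the cut point learned from the other, so the same bin label ends up meaning different things in the two parts.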

Suppose the data is split into two parts and each part is discretized on its own, replacing every value with the average of its cluster:

Input data    Type     Output value
0.9           good     1.05
1.0           good     1.05
1.1           good     1.05
1.2           good     1.05
2.1           good     2.20
2.3           good     2.20
2.2           good     2.20
---  SPLIT HERE ---
1.1           bad      1.20
1.2           bad      1.20
1.3           bad      1.20
1.9           bad      2.00
2.0           bad      2.00
2.1           bad      2.00

Because the average of each cluster of values was used, "good" and "bad" were each discretized into two discrete values. The resulting attribute, however, plainly reveals the class membership, because the averages for "good" and "bad" differ. The task of detecting "bad" has become much easier than it should be.
Preprocessing each part separately is not required, and you should not do it.
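The table can be reproduced with a short Python sketch (again, not Weka code). The answer does not say how the clusters were formed, so the threshold of 1.5 used to separate the low and high clusters is an assumption, and cluster_mean_discretize is a hypothetical helper.

```python
def cluster_mean_discretize(values, threshold=1.5):
    """Replace each value with the mean of its cluster.

    Assumption: values below `threshold` form the low cluster and the rest
    form the high cluster. The threshold is illustrative only.
    """
    low = [v for v in values if v < threshold]
    high = [v for v in values if v >= threshold]
    low_mean = round(sum(low) / len(low), 2)
    high_mean = round(sum(high) / len(high), 2)
    return [low_mean if v < threshold else high_mean for v in values]


good = [0.9, 1.0, 1.1, 1.2, 2.1, 2.3, 2.2]
bad = [1.1, 1.2, 1.3, 1.9, 2.0, 2.1]

# Discretizing each part separately: the output values identify the part.
good_separate = cluster_mean_discretize(good)
bad_separate = cluster_mean_discretize(bad)
print("separate:", sorted(set(good_separate)), sorted(set(bad_separate)))

# Discretizing once, before the split: both parts share the same output values.
joint = cluster_mean_discretize(good + bad)
good_joint = joint[:len(good)]
bad_joint = joint[len(good):]
print("joint:   ", sorted(set(good_joint)), sorted(set(bad_joint)))
```

With separate discretization the two parts share no output value, so the discretized attribute alone gives away which part a row came from. Discretizing once before the split removes that leak.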

answered Apr 7 by Nandini
• 5,480 points
