Discretization in Weka

I need to know the right time to do discretization in Weka. I have a data set, and I need to create training and testing samples from it. Should I discretize the numerical attributes before or after the sampling?
Apr 5, 2022 in Machine Learning by Dev

1 answer to this question.

This should be self-evident.

You can do it afterwards, as long as you get the same result regardless of the split you used. But there is no advantage in doing so, so just do the preprocessing first.

You should be fine if you discretize by rounding, for example float to integer, because the result does not depend on the split. However, if you discretize using quantiles, you can introduce an error, because the cut points are computed from the data and the different parts will be discretized differently!
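
To make the difference concrete, here is a minimal Python sketch (plain standard library, not Weka code). It uses the input values from the example table and, as an assumption for illustration, treats the median as the simplest possible quantile cut. The helper name `bin_of` and the probe value 1.5 are made up for this sketch.

```python
from statistics import median

# Input values from the example table, in table order.
values = [0.9, 1.0, 1.1, 1.2, 2.1, 2.3, 2.2, 1.1, 1.2, 1.3, 1.9, 2.0, 2.1]
first, second = values[:7], values[7:]  # the two parts of the split

# Rounding is a fixed rule: each bin depends only on the value itself,
# so discretizing before or after the split gives the same result.
rounded_whole = [round(v) for v in values]
rounded_parts = [round(v) for v in first] + [round(v) for v in second]
rounding_is_split_independent = rounded_whole == rounded_parts

# A quantile cut (here: the median) is computed from the data it sees,
# so each part gets its own cut point.
cut_first = median(first)
cut_second = median(second)

def bin_of(value, cut):
    """Hypothetical helper: bin 0 at or below the cut point, bin 1 above it."""
    return 0 if value <= cut else 1

# The same raw value (1.5, chosen for illustration) lands in different bins
# depending on which part's cut point is applied.
probe = 1.5
bin_in_first = bin_of(probe, cut_first)
bin_in_second = bin_of(probe, cut_second)

print(rounding_is_split_independent)
print(cut_first, cut_second)
print(bin_in_first, bin_in_second)
```

The rounding result is identical either way, while the two parts disagree about where the cut point is, so the same number can end up with a different discrete code in each part.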

Let's imagine the data is split into two parts and each part is discretized separately, replacing every value with the average of its cluster:

Input data    Type     Output value
0.9           good     1.05
1.0           good     1.05
1.1           good     1.05
1.2           good     1.05
---
2.1           good     2.20
2.3           good     2.20
2.2           good     2.20
---  SPLIT HERE ---
1.1           bad      1.20
1.2           bad      1.20
1.3           bad      1.20
---
1.9           bad      2.00
2.0           bad      2.00
2.1           bad      2.00

Because the average of each cluster of values was used, both "good" and "bad" were discretized into two discrete values. The resulting attribute, however, plainly reveals the true class membership, because the averages for "good" and "bad" differ. The task of detecting "bad" has become a lot easier than it should be.
So do not preprocess the parts separately; there is no need for it, and it is exactly what causes the problem above.
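
The table can be reproduced with a short Python sketch (standard library only, not Weka code). The function names are hypothetical, and the rule used to form the two clusters, cutting at the midpoint of the value range, is an assumption that happens to give the same groups as the table.

```python
from statistics import mean

# Rows from the table above: (input value, class label).
rows = [
    (0.9, "good"), (1.0, "good"), (1.1, "good"), (1.2, "good"),
    (2.1, "good"), (2.3, "good"), (2.2, "good"),
    (1.1, "bad"), (1.2, "bad"), (1.3, "bad"),
    (1.9, "bad"), (2.0, "bad"), (2.1, "bad"),
]

def discretize_by_bin_mean(values):
    """Cut at the midpoint of the range and replace each value by its bin's mean.

    The midpoint rule is an assumption made for this sketch.
    """
    cut = (min(values) + max(values)) / 2
    low_mean = round(mean(v for v in values if v <= cut), 2)
    high_mean = round(mean(v for v in values if v > cut), 2)
    return [low_mean if v <= cut else high_mean for v in values]

def outputs_per_label(rows, outputs):
    """Collect the set of discretized values seen for each class label."""
    seen = {}
    for (_, label), out in zip(rows, outputs):
        seen.setdefault(label, set()).add(out)
    return seen

all_values = [v for v, _ in rows]
first_part, second_part = all_values[:7], all_values[7:]

# Discretizing each part on its own reproduces the table's output column.
separate = discretize_by_bin_mean(first_part) + discretize_by_bin_mean(second_part)
separate_sets = outputs_per_label(rows, separate)

# Discretizing once, on the whole data set, before splitting.
together = discretize_by_bin_mean(all_values)
together_sets = outputs_per_label(rows, together)

print(separate_sets)
print(together_sets)
```

With separate discretization, the "good" and "bad" rows end up with disjoint sets of output values, so the discretized attribute alone gives away the class. Discretized once on the whole data set, both classes share the same two values.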

answered Apr 7, 2022 by Nandini
