Discretization in Weka

0 votes
I need to know the right time to do discretization in Weka. I have a data set, and I need to create training and testing samples from it. Should I discretize the numerical attributes before the sampling or after it?
Apr 5, 2022 in Machine Learning by Dev
• 6,000 points
955 views

1 answer to this question.

0 votes

The answer follows from how the discretization is computed.

If the discretization gives the same result regardless of the split you used, you can do it after sampling. But there is no advantage in waiting, so simply do it up front as part of the preprocessing.

If you discretize by rounding, for example float to integer, you should be fine, because rounding is unaffected by the split. If you discretize using quantiles, however, you can make a mistake: the cut points are computed from the data, so parts that are discretized separately will be discretized differently.
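The difference between the two cases can be sketched in a few lines of Python. This is only an illustration, not Weka code: the function names are hypothetical, the two lists stand in for the two parts of a split, and a median cut is used as the simplest possible quantile discretization.

```python
from statistics import median

def round_discretize(values):
    # Rounding looks at one value at a time, so it cannot depend on the split.
    return [round(v) for v in values]

def binarize(values, cut):
    # Two-bin discretization: bin 0 up to the cut point, bin 1 above it.
    return [0 if v <= cut else 1 for v in values]

# Two hypothetical parts of one data set.
part_a = [0.9, 1.0, 1.1, 1.2, 2.1, 2.3, 2.2]
part_b = [1.1, 1.2, 1.3, 1.9, 2.0, 2.1]

# Rounding: the same result whether done before or after the split.
same_rounding = (
    round_discretize(part_a + part_b)
    == round_discretize(part_a) + round_discretize(part_b)
)

# Median cut: each part yields its own cut point.
cut_a = median(part_a)
cut_b = median(part_b)

# The same raw value, 1.3, lands in a different bin depending on the part.
bin_in_a = binarize([1.3], cut_a)[0]
bin_in_b = binarize([1.3], cut_b)[0]

print(same_rounding, cut_a, cut_b, bin_in_a, bin_in_b)
```

Computing the cut point once and applying it to every part avoids the mismatch.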

Let's imagine each part of the split is discretized separately into two values:

Input data    Type     Output value
0.9           good     1.05
1.0           good     1.05
1.1           good     1.05
1.2           good     1.05
---
2.1           good     2.20
2.3           good     2.20
2.2           good     2.20
---  SPLIT HERE ---
1.1           bad      1.20
1.2           bad      1.20
1.3           bad      1.20
---
1.9           bad      2.00
2.0           bad      2.00
2.1           bad      2.00

Because the mean of each cluster of values was used, both "good" and "bad" were discretized into two discrete values. However, the means for "good" and "bad" differ, so the resulting attribute plainly reveals the true membership. The task of detecting "bad" has become a lot simpler.
Preprocessing each part separately is not required, and as the example shows, it is better avoided.
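The table can be reproduced with a small Python sketch. The helper names are hypothetical, and splitting at the largest gap between sorted values is just an assumed stand-in for whatever clustering produced the two groups in the example; it is not how Weka's filters work.

```python
from statistics import mean

def two_cluster_means(values):
    # Split the sorted values at the largest gap and return
    # (cut point, mean of the low cluster, mean of the high cluster).
    s = sorted(values)
    i = max(range(1, len(s)), key=lambda k: s[k] - s[k - 1])
    cut = (s[i - 1] + s[i]) / 2
    return cut, mean(s[:i]), mean(s[i:])

def discretize(values, model):
    # Replace each value with the mean of the cluster it falls into.
    cut, low, high = model
    return [round(low if v < cut else high, 2) for v in values]

good = [0.9, 1.0, 1.1, 1.2, 2.1, 2.3, 2.2]
bad = [1.1, 1.2, 1.3, 1.9, 2.0, 2.1]

# Discretizing each part on its own gives disjoint output values,
# so the discretized attribute alone gives away the part.
separate_good = set(discretize(good, two_cluster_means(good)))
separate_bad = set(discretize(bad, two_cluster_means(bad)))

# Fitting once on all the data gives both parts the same output values.
shared_model = two_cluster_means(good + bad)
shared_good = set(discretize(good, shared_model))
shared_bad = set(discretize(bad, shared_model))

print(separate_good, separate_bad)
print(shared_good, shared_bad)
```

With separate discretization the output values match the table above; with one shared discretization the attribute no longer separates the two parts by construction.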


answered Apr 7, 2022 by Nandini
• 5,480 points
