How do I perform feature selection in a disease prediction data set?

+1 vote
When doing feature selection on a disease dataset, do I need to be familiar with the biology or science behind the disease?
Jul 30, 2018 in Data Analytics by Anmol
• 1,620 points

edited Aug 20, 2018 by Anmol 32 views

1 answer to this question.

0 votes

Feature selection relies equally on logic and on trial and error. Select features logically first, then refine the set by trial and error.

Logical selection uses the approaches listed below to filter out unneeded features or to pick the most dominant ones.

  1. Correlation plot
  2. Checking for collinearity among variables
  3. Selecting variables based on business insight or common knowledge
  4. Building a linear model to check the coefficient values assigned to each variable
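
As a rough illustration of approaches 1 and 2 in the list above, here is a minimal sketch that ranks features by their correlation with the target and drops any feature that is highly correlated with one already kept. The column names, the toy values, and the 0.9 threshold are assumptions made for this example, not part of the original answer.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def drop_collinear(features, target, threshold=0.9):
    """Greedy filter: visit features from strongest to weakest correlation
    with the target, keeping one only if it is not highly correlated
    with a feature that was already kept."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[other])) < threshold
               for other in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: weight_kg is exactly 3 * bmi here, so the two
# columns are redundant and only one of them should survive the filter.
features = {
    "bmi": [22.0, 31.5, 27.0, 35.2, 24.1, 29.8],
    "weight_kg": [66.0, 94.5, 81.0, 105.6, 72.3, 89.4],
    "age": [61, 45, 38, 52, 57, 41],
}
has_disease = [0, 1, 0, 1, 0, 1]

kept = drop_collinear(features, has_disease)
print(kept)
```

With real data you would more likely load the table into pandas and call `DataFrame.corr()` to get the full correlation matrix for plotting; the pure-Python version here only keeps the sketch dependency-free.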

Once you have logically selected an initial set of predictor variables, you can use trial and error to combine, add, or remove variables.
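
One way to make the add-or-remove step systematic is greedy forward selection: try adding each remaining variable, keep the one that improves the score most, and stop when nothing helps. The sketch below assumes you supply your own `score` function (for example, cross-validated accuracy of your model); `toy_score` and its numbers are a hypothetical stand-in so that the example runs on its own.

```python
def forward_select(candidates, score, start=()):
    """Greedy forward selection.

    Repeatedly adds the candidate that raises score(selected) the most
    and stops as soon as no candidate improves on the current best."""
    selected = list(start)
    best = score(selected)
    remaining = [c for c in candidates if c not in selected]
    while remaining:
        trials = [(score(selected + [c]), c) for c in remaining]
        top_score, top = max(trials)
        if top_score <= best:
            break  # no remaining variable helps
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


def toy_score(selected):
    """Hypothetical stand-in for a real model evaluation.

    Each variable contributes a fixed, made-up gain, and every extra
    variable costs a small penalty to mimic overfitting."""
    gains = {"bmi": 0.10, "age": 0.05, "shoe_size": 0.0}
    return 0.5 + sum(gains[f] for f in selected) - 0.01 * len(selected)


chosen, final_score = forward_select(["bmi", "age", "shoe_size"], toy_score)
print(chosen, round(final_score, 2))
```

The same loop can start from a logically chosen set by passing it as `start`, so the search only tests additions to what you already trust.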

Combining can be beneficial when the variables involved are binary. For example, indicators for being obese, having diabetes, and having irregular blood pressure can be combined into a single feature to predict a disease.
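
A minimal sketch of that kind of combination, assuming each patient record stores the three conditions as 0/1 flags. The field names `obese`, `diabetes`, and `irregular_bp`, the derived feature names, and the sample records are all hypothetical.

```python
RISK_FLAGS = ("obese", "diabetes", "irregular_bp")


def combine_risk_flags(record, flags=RISK_FLAGS):
    """Collapse several 0/1 indicators into two derived features:
    how many conditions are present, and whether any is present."""
    count = sum(record[flag] for flag in flags)
    return {"risk_count": count, "any_risk": int(count > 0)}


# Hypothetical patient records
patients = [
    {"obese": 1, "diabetes": 0, "irregular_bp": 1},
    {"obese": 0, "diabetes": 0, "irregular_bp": 0},
    {"obese": 1, "diabetes": 1, "irregular_bp": 1},
]

combined = [combine_risk_flags(p) for p in patients]
for row in combined:
    print(row)
```

Whether the count or the any-condition flag works better is itself something to settle by trial and error against your model's validation score.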

answered Aug 20, 2018 by ANMOL
• 3,620 points

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.