We can break most problems down into a few broad types of task:
Classification – assigning data to categories. Take diseases: each disease exhibits characteristic behavior, so we can group diseases into classes.
For example: diseases that weaken immunity, diseases that cause headaches, etc.
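A minimal classification sketch follows, using a nearest-centroid classifier (our choice of technique for illustration; the text does not prescribe one). The features (body temperature in °C and a headache score from 0 to 10), the labels and every number are hypothetical.

```python
import math
from collections import defaultdict

def fit_centroids(examples):
    """Average the feature vectors of each class (one centroid per label)."""
    groups = defaultdict(list)
    for features, label in examples:
        groups[label].append(features)
    return {
        label: tuple(sum(dim) / len(rows) for dim in zip(*rows))
        for label, rows in groups.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical training data: (temperature_c, headache_score) -> label
training = [
    ((39.5, 2), "fever"),
    ((39.0, 3), "fever"),
    ((38.8, 1), "fever"),
    ((36.9, 8), "migraine"),
    ((37.1, 9), "migraine"),
    ((36.8, 7), "migraine"),
]

centroids = fit_centroids(training)
print(classify(centroids, (38.9, 2)))  # fever
print(classify(centroids, (37.0, 8)))  # migraine
```

Because the two features are on different scales, real data would usually be standardized before measuring distances.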
Regression – finding the relationship between two or more variables.
For example: how a person’s weight is related to their height.
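Simple linear regression can be sketched with ordinary least squares. The heights and weights below are made-up values chosen to lie exactly on a line, so the fitted slope and intercept are easy to check by hand; real measurements would be noisy.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations over sum of squared x-deviations gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up heights (cm) and weights (kg) that lie exactly on a line.
heights_cm = [150, 160, 170, 180]
weights_kg = [55, 60, 65, 70]

slope, intercept = fit_line(heights_cm, weights_kg)
print(slope, intercept)         # 0.5 -20.0
print(slope * 175 + intercept)  # 67.5
```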
Anomaly Detection – spotting fluctuations or observations that deviate from the normal pattern.
For example: unusually high or low voltage.
Another example is regulated behavior, such as driving on the right or the left side of the road depending on the country. The anomaly here is someone driving on the opposite side.
Another example is network intrusion. Normally, only authenticated users log into your company’s website, so if someone unauthenticated logs in, that is an anomaly.
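One common way to flag voltage fluctuations like these is a z-score rule: mark any reading that lies more than a chosen number of standard deviations from the mean. This is a minimal sketch; the readings and the threshold of 2.0 are assumptions for illustration.

```python
import statistics

def zscore_anomalies(readings, threshold=2.0):
    """Return readings more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(readings)
    std = statistics.pstdev(readings)
    if std == 0:
        return []  # all readings identical: nothing stands out
    return [x for x in readings if abs(x - mean) / std > threshold]

# Hypothetical mains-voltage readings with one spike.
voltages = [230, 231, 229, 230, 232, 228, 230, 300]
print(zscore_anomalies(voltages))  # [300]
```

A single large spike also inflates the standard deviation, so for small samples median-based rules are often more robust.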
Attribute Importance – ranking multiple attributes, such as height, weight, temperature and heartbeat, by how much each contributes to a task. A point to note is that these attributes are not all equally important for a given task.
For example: someone is trying to predict what time a person will reach the office. Some attributes play an important role, but not all of them matter.
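A simple way to rank attributes is to score each one by the absolute value of its Pearson correlation with the target. This is a minimal filter-style sketch that assumes the relationships are roughly linear; the commute attributes and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_attributes(attributes, target):
    """Sort (name, score) pairs by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in attributes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical commute data for five days: minutes late reaching the office.
delay_minutes = [10, 20, 30, 40, 50]
attributes = {
    "distance_km": [5, 10, 15, 20, 25],
    "traffic_level": [3, 5, 4, 8, 9],
    "shoe_size": [42, 40, 43, 41, 42],
}

for name, score in rank_attributes(attributes, delay_minutes):
    print(f"{name}: {score:.2f}")
```

Here `distance_km` ranks first and `shoe_size` last, matching the intuition that not every attribute matters for the task.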
Association Rules – in simple terms, analyzing past behavior to predict the next behavior; this is the idea at the heart of recommendation engines.
For example: a person buying bread may also buy milk. If we analyze past shopping behavior, we can find relationships among the items in each basket and estimate the probability that a person buying bread will also buy milk.
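The bread-and-milk example can be quantified with two standard association-rule measures: support (the share of all baskets containing both items) and confidence (the share of bread baskets that also contain milk). The baskets are hypothetical.

```python
def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket)

def rule_stats(baskets, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    both = count_containing(baskets, antecedent | consequent)
    with_antecedent = count_containing(baskets, antecedent)
    support = both / len(baskets)
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

support, confidence = rule_stats(baskets, {"bread"}, {"milk"})
print(support, confidence)  # 0.6 0.75
```

A confidence of 0.75 means three of the four bread baskets also contain milk. Algorithms such as Apriori search systematically for all rules that clear chosen support and confidence thresholds.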
Clustering – grouping similar entities together. It is one of the oldest techniques in statistics, and many problems can be modeled either as classification or as clustering.
1) Take a basket of apples and oranges: clustering lets us separate the apples from the oranges.
2) Healthcare is an important use case for clustering; much of statistics and analysis began with healthcare use cases. Going deeper, clustering produces cohorts (groups of people with similar diseases), which can be studied separately from the rest of the population. For example, if 10 people are suffering from fever and another 10 from headache, we find what each group has in common and develop medicine accordingly.
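The apples-and-oranges example can be sketched with k-means, one of the most widely used clustering algorithms. The measurements (diameter in cm and a redness score between 0 and 1) are hypothetical, and initializing from the first k points is a simplification; in practice random restarts or k-means++ initialization are common, because the result depends on the starting centroids.

```python
import math

def kmeans(points, k, max_iter=100):
    """Minimal k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical fruit measurements: (diameter_cm, redness from 0 to 1).
fruit = [(7.0, 0.9), (7.2, 0.85), (6.8, 0.95), (9.0, 0.2), (9.3, 0.25), (8.8, 0.15)]
labels, centroids = kmeans(fruit, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```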
Feature Extraction – deriving informative characteristics (features) from raw data; accuracy, validity and failure rates matter a great deal here. Feature extraction is closely related to pattern recognition.
In Google Search, when a user enters a term, results come back. An important question is: how does it know which pages are relevant to the term and which are not? Part of the answer is feature extraction and pattern recognition, which pick out the prominent features of each page. Similarly, many cameras detect faces in a photo and highlight them to produce better images, which also relies on feature extraction.
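As a toy illustration of feature extraction (and not a description of how Google actually ranks pages), the sketch below turns raw text into a vector of term counts, a bag-of-words feature, and ranks hypothetical pages by how often the query terms appear.

```python
import re
from collections import Counter

def extract_features(text, vocabulary):
    """Turn raw text into a vector of term counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[term] for term in vocabulary]

def rank_pages(pages, query):
    """Order page names by how often the query's terms appear in them."""
    vocabulary = re.findall(r"[a-z]+", query.lower())
    scores = {name: sum(extract_features(text, vocabulary)) for name, text in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical pages and query.
pages = {
    "python-guide": "Learn Python: a Python guide to data science",
    "pasta-recipe": "Cooking pasta at home",
    "r-data-science": "Data science with R",
}
print(rank_pages(pages, "python data science"))
# ['python-guide', 'r-data-science', 'pasta-recipe']
```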
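These tasks can also be grouped into categories, depending on whether we have historical data to learn from (supervised learning) or not (unsupervised learning):
a) Prediction Category – techniques include regression, logistic regression, neural networks and decision trees. One example is fraud detection, where a computer learns from the history of past fraud to predict the next fraud. In unsupervised learning, one cannot predict from examples, as there is no historical data.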
b) Classification Category – for example, deciding whether a transaction is fraudulent or not is a classification problem. Here, we take historical data and classify it with decision trees; if we have no historical data at all, we work directly on the data and try to discover features on our own. For example, we might need to know which employees are likely to leave the organization and which are likely to stay. If it is a new organization with no historical data to use, we can use clustering instead.
c) Exploration Category – a straightforward approach to working out what a large body of data (big data) actually contains. In unsupervised learning, the techniques used here are principal components analysis and clustering.
d) Affinity Category – this covers tasks such as cross-sell/up-sell and market basket analysis. Basket analysis involves no supervised learning, as there is no labeled historical data, so we work on the data directly and apply association, sequencing and factor analysis.
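The classification category (b) mentions decision trees trained on historical data. A minimal sketch is a decision stump, a decision tree with a single split, which searches every feature and threshold for the split with the fewest errors. The employee records (years at the company and a satisfaction score from 1 to 10, with label 1 meaning the employee left) are hypothetical.

```python
def fit_stump(rows, labels):
    """Fit a one-level decision tree: the feature/threshold split with fewest errors.

    Returns (errors, feature_index, threshold, label_at_or_below_threshold).
    """
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for below in (0, 1):
                errors = sum(
                    (below if row[f] <= threshold else 1 - below) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, below)
    return best

def predict(stump, row):
    """Apply the learned split to a single row."""
    _, f, threshold, below = stump
    return below if row[f] <= threshold else 1 - below

# Hypothetical history: (years_at_company, satisfaction) -> 1 if the employee left.
history = [(1, 3), (2, 2), (5, 4), (3, 8), (6, 9), (1, 7)]
left = [1, 1, 1, 0, 0, 0]

stump = fit_stump(history, left)
print(stump)                   # (0, 1, 5.5, 1)
print(predict(stump, (4, 3)))  # 1
```

The stump learns that employees with satisfaction scores at or below 5.5 tend to leave; a full decision tree would repeat this search recursively on each side of the split.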
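For the exploration category (c), principal components analysis finds the direction along which the data varies most. The sketch below computes the first principal component by power iteration on the sample covariance matrix; the data is hypothetical, and a real analysis would normally use a linear-algebra library instead.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(rows, iterations=100):
    """Leading eigenvector of the covariance matrix via power iteration.

    Returns (direction, share_of_total_variance).
    """
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d  # starting vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T C v is the variance along the unit vector v.
    variance_along_v = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, variance_along_v / total_variance

# Hypothetical two-feature data that varies mostly along one diagonal direction.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
direction, share = first_principal_component(data)
print([round(x, 3) for x in direction], round(share, 3))
```

For this data the first component points close to the diagonal and accounts for over 99% of the total variance, so a single number per row would summarize it well.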