What is R-squared in linear regression models?

0 votes
I am a bit confused about the definition of the R-squared score in a linear regression model. As far as I know, the R-squared score represents how much of the variation in the dependent variable can be explained by the independent variables. However, the scikit-learn library has an r2_score function that calculates the R-squared score as r2_score(y_true, y_pred). Both of the parameters here are output values, and it doesn't seem to involve any of the independent variables. Could you help me understand how this is calculated?
Mar 26 in Machine Learning by Dev
• 6,000 points

1 answer to this question.

0 votes

You inquired about the code x = r2_score(y_true, y_pred) in Python.

Note that:

y_pred: stands for the "prediction" of the y-variable. The predicted values are not the raw data; they are often points on the line of best fit.
y_true: stands for the true value of the y-variable. These are the raw numbers collected during an experiment, survey, or scientific study.

Suppose that you have a model of a teenager's height as a function of age.

Age (years)      10  12  14  16
Height (inches)  55  60  65  70
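
As a rough sketch of where predictions come from, the table above can be fitted with a straight line by ordinary least squares. The helper name fit_line is made up for illustration; in scikit-learn, a fitted model's predict method plays this role. The point is that y_pred is computed from the independent variable (age), which is how age enters r2_score indirectly.

```python
# Values from the age/height table above.
ages = [10, 12, 14, 16]
heights = [55, 60, 65, 70]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(ages, heights)

# The predictions are a function of age, the independent variable.
y_pred = [intercept + slope * age for age in ages]

print(intercept, slope)  # 30.0 2.5
print(y_pred)            # [55.0, 60.0, 65.0, 70.0]
```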

The term "age" refers to the number of years since a person was born.

In addition, we round down to the nearest whole year. A child who is 10.81773 years old is assigned an age of ten.
A predicted value might be that you believe a ten-year-old child is 55 inches tall on average.
If you carry out a study in which you measure the heights of 1,038 ten-year-old children, you will find that they are not all exactly 55 inches tall.
The collection of true y-values is the raw data (the measured heights of the children).
Statisticians typically calculate the error by comparing a child's measured height to the predicted height.

Shina, a 10-year-old girl, is 52 inches tall (rounded to the nearest whole inch). Shina's predicted height was 55 inches. There is a 3-inch difference between the true and predicted values. Statisticians usually prefer a single number that summarizes a data set rather than 1,038 separate ones.

Converting the difference between the predicted and actual heights of the children into a positive number is one option. For instance, a -5 becomes a +5. The average absolute difference (in inches) between actual and predicted height is then computed.
It's crucial to take the absolute difference. Some children are shorter than predicted (-2 inches), while others are taller (+7 inches).
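
A small sketch of that averaging step, using five made-up heights in place of the 1,038 measurements (the numbers are hypothetical, chosen only so the arithmetic is easy to follow):

```python
# Hypothetical measured heights (inches) of five 10-year-olds; not real data.
y_true = [52, 53, 55, 53, 62]
predicted_height = 55  # the model's prediction for a 10-year-old

# Signed errors: negative means shorter than predicted.
signed_errors = [y - predicted_height for y in y_true]

# Drop the signs, then average.
absolute_errors = [abs(e) for e in signed_errors]
mean_absolute_error = sum(absolute_errors) / len(absolute_errors)

print(signed_errors)        # [-3, -2, 0, -2, 7]
print(mean_absolute_error)  # 2.8
```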

If negative numbers are allowed, the average difference between the actual heights and their average is always 0.

  • Take the 1,038 actual heights.

  • Subtract 55 inches from each actual height.

  • Add up the height differences without converting them to positive values.

  • If 55 inches is the average of the measured heights, the result is always zero.

In fact, one definition of the mean is a number x such that when you subtract x from each data point and add up the results, the total is zero.
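
This property is easy to check on a toy sample. The heights below are hypothetical and were picked so that their mean is exactly 55; subtracting any other reference value gives a non-zero sum.

```python
# Hypothetical measured heights (inches); their mean is exactly 55.
y_true = [52, 53, 55, 53, 62]

mean_height = sum(y_true) / len(y_true)

# Signed deviations from the mean cancel out.
deviations_from_mean = [y - mean_height for y in y_true]
print(mean_height)                # 55.0
print(sum(deviations_from_mean))  # 0.0

# Deviations from some other value (here 54) do not cancel.
deviations_from_54 = [y - 54 for y in y_true]
print(sum(deviations_from_54))    # 5
```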

Statisticians usually square the differences. Shina's squared error is +9 square inches because she is 3 inches shorter than predicted (-3 inches). A negative number multiplied by itself is positive, so a squared value is never negative.

Squaring removes the negative signs. Taking the absolute value removes them too. In fact, there are many different ways to get rid of the negative signs.
No single formula perfectly determines which of two data sets, A or B, is more "spread out."
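
To make the squared-error idea concrete, here is a minimal sketch on hypothetical heights (made-up numbers, not study data). Note how the single +7 error dominates the total once it is squared.

```python
import math

# Hypothetical measured heights (inches) of five 10-year-olds; not real data.
y_true = [52, 53, 55, 53, 62]
predicted_height = 55

# Square each error so that negatives and positives cannot cancel.
squared_errors = [(y - predicted_height) ** 2 for y in y_true]
mean_squared_error = sum(squared_errors) / len(squared_errors)

# The square root brings the result back to inches.
root_mean_squared_error = math.sqrt(mean_squared_error)

print(squared_errors)                     # [9, 4, 0, 4, 49]
print(mean_squared_error)                 # 13.2
print(round(root_mean_squared_error, 2))  # 3.63
```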

It's hard to say which measure matters most to people. In any case, mean squared error is better than nothing. It measures how dispersed a data set is.
Are the data points all far from the average?
What if the true average height of a 10-year-old child were 55 inches? Consider what would happen if the true standard deviation were 4 inches.

Assume you took a random sample of 1,038 children, each 10 years old, in that hypothetical world.
Your sample variance is 7.1091 square inches (based on experimental data).

What are the chances that a group of 1,038 children will have a sample variance of 7.1091 square inches or more?

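If your model is true, how likely is it that the data would deviate from the model's predictions as much as you observed?
If the data you collect differs significantly from the predicted values, your model is most likely flawed.
The R-squared value summarizes this fit as 1 minus the ratio of the sum of squared prediction errors to the sum of squared deviations of the true values from their average:
100% if the predictions match the true values exactly. 0% if the model does no better than always predicting the average of the true values. It can even be negative if the model does worse than that. The independent variables enter through y_pred, which the model computes from them.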
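
For reference, scikit-learn documents r2_score as 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between y_true and y_pred and SS_tot is the sum of squared differences between y_true and its mean. The sketch below implements that formula by hand; the function name r2 and the measured heights are made up for illustration, and the predictions are taken from the age/height table.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hand-rolled sketch)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical measured heights at ages 10, 12, 14, 16 (not real data).
y_true = [52, 61, 66, 69]
# Model predictions for the same ages, from the table.
y_pred = [55, 60, 65, 70]

print(round(r2(y_true, y_pred), 3))  # 0.928

# Perfect predictions give 1.0.
print(r2(y_true, y_true))            # 1.0

# Always predicting the mean of y_true (62) gives 0.0.
print(r2(y_true, [62, 62, 62, 62]))  # 0.0

# Predictions worse than the mean give a negative score.
print(r2(y_true, [70, 65, 60, 55]) < 0)  # True
```

Only y_true and y_pred appear in the formula, but y_pred itself was produced by the model from the independent variable (age).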

For example, if you toss a fair coin 1,000 times, you could easily get 491 heads rather than exactly 500 heads.
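
As a quick check of the coin example, the exact binomial probability of landing on precisely 500 heads can be computed directly. This is a side calculation, not part of r2_score.

```python
from math import comb

# Probability of exactly 500 heads in 1,000 tosses of a fair coin.
p_exactly_500 = comb(1000, 500) / 2 ** 1000

print(round(p_exactly_500, 4))  # 0.0252
```

So even for a perfectly fair coin, hitting the predicted value exactly happens only about 2.5% of the time; some deviation from the prediction is the normal case.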

answered Apr 4 by Nandini
• 5,480 points
