What happens when the prob argument in sample sums to less or greater than 1?

0 votes
We know that the prob argument in sample is used to assign probability weights to the elements being sampled.

As an example,

table(sample(1:4, 1e6, replace = TRUE, prob = c(0.2, 0.4, 0.3, 0.1)))/1e6

#  1   2   3   4
#0.2 0.4 0.3 0.1

table(sample(1:4, 1e6, replace = TRUE, prob = c(0.2, 0.4, 0.3, 0.1)))/1e6

#    1     2     3     4
#0.200 0.400 0.299 0.100

In this example, the probabilities sum to exactly 1 (0.2 + 0.4 + 0.3 + 0.1), so the observed proportions match the weights. But what if the probabilities do not sum to 1? What output would sample give? I expected an error, but it still returns values.

When the probabilities sum to more than 1:

table(sample(1:4, 1e6, replace = TRUE, prob = c(0.2, 0.5, 0.5, 0.1)))/1e6

#     1      2      3      4
#0.1544 0.3839 0.3848 0.0768
Jun 17, 2022 in Data Analytics by Avinash
• 1,260 points
1,001 views

1 answer to this question.

0 votes
Good question. The documentation is ambiguous on this, but the question can be answered by looking at the source code.

If you examine the R code, you will notice that sample always calls another R function, sample.int.

If you pass a single number x to sample, it uses sample.int to build a vector of integers less than or equal to that number. If x is a vector, it uses sample.int to produce a sample of integers less than or equal to length(x), and then uses that sample to subset x.
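The dispatch described above can be sketched in plain R. This is a simplified illustration, not a copy of the base R source: `my_sample` is a hypothetical name, and the check for "a single number" is written out here as an assumption about how that case is detected.

```r
# Hypothetical re-implementation of the dispatch inside sample();
# a sketch for illustration, not the exact base R source.
my_sample <- function(x, size, replace = FALSE, prob = NULL) {
  if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
    # Single number: draw from the integers 1..x
    if (missing(size)) size <- x
    sample.int(x, size, replace, prob)
  } else {
    # Vector: draw positions 1..length(x), then subset x with them
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size, replace, prob)]
  }
}

# With the same seed, both calls should produce the same draw
set.seed(1); a <- sample(letters, 5)
set.seed(1); b <- my_sample(letters, 5)
identical(a, b)
```

In both branches prob is handed to sample.int unchanged, so any handling of weights that do not sum to 1 has to happen further down.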

Now, the sample.int function looks like this:

function (n, size = n, replace = FALSE, prob = NULL, useHash = (!replace && is.null(prob) && size <= n/2 && n > 1e+07))
{
    if (useHash)
        .Internal(sample2(n, size))
    else .Internal(sample(n, size, replace, prob))
}
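Since prob is passed straight through to internal code, the R-level source alone does not show how weights that do not sum to 1 are treated. A quick empirical check, sketched below, compares the observed proportions with the weights rescaled to sum to 1 (prob / sum(prob)). That rescaling is the assumption being tested here, not something read from the listing; `check_weights`, the seed, and the second weight vector are arbitrary choices made for illustration.

```r
set.seed(42)    # arbitrary seed, only so the run is reproducible
n_draws <- 1e6

# Compare observed proportions with the weights rescaled to sum to 1
check_weights <- function(w) {
  draws <- sample(seq_along(w), n_draws, replace = TRUE, prob = w)
  observed <- tabulate(draws, nbins = length(w)) / n_draws
  expected <- w / sum(w)   # assumption under test: weights are normalised
  round(rbind(observed, expected), 4)
}

res_over  <- check_weights(c(0.2, 0.5, 0.5, 0.1))  # sums to 1.3 (from the question)
res_under <- check_weights(c(0.1, 0.2, 0.1, 0.1))  # sums to 0.5 (illustrative)
res_over
res_under
```

For the weights in the question, the expected row is 0.1538 0.3846 0.3846 0.0769, which is close to the proportions reported there (0.1544 0.3839 0.3848 0.0768).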
answered Jun 24, 2022 by Sohail
• 3,040 points
