What happens when the prob argument in sample sums to less/greater than 1?

Question

We already know that the prob argument in sample is used to assign probability weights to the elements being sampled.

As an example,

table(sample(1:4, 1e6, replace = TRUE, prob = c(0.2, 0.4, 0.3, 0.1)))/1e6
#   1   2   3   4
# 0.2 0.4 0.3 0.1

table(sample(1:4, 1e6, replace = TRUE, prob = c(0.2, 0.4, 0.3, 0.1)))/1e6
#     1     2     3     4
# 0.200 0.400 0.299 0.100

In this example, the probabilities sum to exactly 1 (0.2 + 0.4 + 0.3 + 0.1), so sample gives the expected proportions. But what if the probabilities do not sum to 1? What output would it give? I thought it would result in an error, but it still returns values.

When the probabilities sum to more than 1:

table(sample(1:4, 1e6, replace = TRUE, prob = c(0.2, 0.5, 0.5, 0.1)))/1e6
#      1      2      3      4
# 0.1544 0.3839 0.3848 0.0768

Sohail · Answer 1 · Jun 24, 2022

Good question. The documentation is unclear on this, but the question can be answered by looking at the source code.

If you examine the R code, you will notice that sample always calls another R function, sample.int. If you pass in a single number x, sample uses sample.int to draw from the integers less than or equal to that number. If x is a vector, sample uses sample.int to produce a sample of integers less than or equal to length(x), and then uses that sample to subset x.
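The dispatch described above can be sketched roughly as follows. This is a simplified paraphrase for illustration rather than the verbatim base R source, and details may differ between R versions. The name my_sample is hypothetical.

```r
# Paraphrase of how sample() hands off to sample.int().
# my_sample is a hypothetical name used only for this illustration.
my_sample <- function(x, size, replace = FALSE, prob = NULL) {
  if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
    # Single number: draw from the integers 1..x
    if (missing(size)) size <- x
    sample.int(x, size, replace, prob)
  } else {
    # Vector: draw indices from 1..length(x), then subset x with them
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size, replace, prob)]
  }
}

# With the same seed, the sketch and sample() should draw the same values
set.seed(1)
a <- my_sample(c("a", "b", "c"), 5, replace = TRUE)
set.seed(1)
b <- sample(c("a", "b", "c"), 5, replace = TRUE)
identical(a, b)
```

Either way, prob is passed straight through to sample.int, so whatever happens to weights that do not sum to 1 happens there or deeper.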

Now, sample.int itself looks like this:

function (n, size = n, replace = FALSE, prob = NULL,
          useHash = (!replace && is.null(prob) && size <= n/2 && n > 1e+07))
{
    if (useHash)
        .Internal(sample2(n, size))
    else .Internal(sample(n, size, replace, prob))
}
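Alongside reading the source, a quick empirical check can help. The sketch below assumes the weights are simply rescaled by their sum, which is what the output in the question suggests, and compares the observed proportions against w / sum(w). The seed is arbitrary and the variable names are illustrative.

```r
# Weights from the question; they sum to 1.3, not 1
w <- c(0.2, 0.5, 0.5, 0.1)

set.seed(42)  # arbitrary seed, only for reproducibility
observed <- as.vector(table(sample(1:4, 1e6, replace = TRUE, prob = w))) / 1e6

# Proportions expected under the assumption that weights are divided by their sum
expected <- w / sum(w)

# Print both rows side by side for comparison
round(rbind(observed, expected), 4)
```

If the two rows agree to within sampling noise, that is consistent with prob being treated as relative weights rather than as strict probabilities.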