Probability that a formula fails in IEEE 754

Question

I can check that(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
evaluates to&#160;False.More generally, I can estimate that the formula&#160;(a + b) + c == a + (b + c)&#160;fails roughly&#160;17%&#160;of the time when&#160;a,b,c&#160;are chosen uniformly and independently on&#160;[0,1], using the following simulation:import numpy as np
import numexpr

np.random.seed(0)
formula = '(a + b) + c == a + (b + c)'

def failure_expectation(formula=formula, N=10**6):
    a, b, c = np.random.rand(3, N)
    return 1.0 - numexpr.evaluate(formula).mean()
# e.g. 0.171744
I wonder if it is possible to arrive at this probability by hand, e.g. using the definitions in the floating point standard and some assumption on the uniform distribution.Given the answer below, I assume that the following part of the original question is out of reach, at least for now.Is there is a tool that computes the failure probability for a given formula without running a simulation.Formulas can be assumed to be simple, e.g. involving the use of parentheses, addition, subtraction, and possibly multiplication and division.(What follows may be an artifact of&#160;numpy&#160;random number generation, but still seems fun to explore.)Bonus question based on an observation by NPE. We can use the following code to generate failure probabilities for uniform distributions on a sequence of ranges&#160;[[-n,n] for n in range(100)]:import pandas as pd

def failures_in_symmetric_interval(n):
    a, b, c = (np.random.rand(3, 10**4) - 0.5) * n
    return 1.0 - numexpr.evaluate(formula).mean()

s = pd.Series({
    n: failures_in_symmetric_interval(n)
    for n in range(100)
})On my computer, I can check that(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
evaluates to&#160;False.More generally, I can estimate that the formula&#160;(a + b) + c == a + (b + c)&#160;fails roughly&#160;17%&#160;of the time when&#160;a,b,c&#160;are chosen uniformly and independently on&#160;[0,1], using the following simulation:import numpy as np
import numexpr

np.random.seed(0)
formula = '(a + b) + c == a + (b + c)'

def failure_expectation(formula=formula, N=10**6):
    a, b, c = np.random.rand(3, N)
    return 1.0 - numexpr.evaluate(formula).mean()
# e.g. 0.171744
I wonder if it is possible to arrive at this probability by hand, e.g. using the definitions in the floating point standard and some assumption on the uniform distribution.Given the answer below, I assume that the following part of the original question is out of reach, at least for now.Is there is a tool that computes the failure probability for a given formula without running a simulation.Formulas can be assumed to be simple, e.g. involving the use of parentheses, addition, subtraction, and possibly multiplication and division.(What follows may be an artifact of&#160;numpy&#160;random number generation, but still seems fun to explore.)Bonus question based on an observation by NPE. We can use the following code to generate failure probabilities for uniform distributions on a sequence of ranges&#160;[[-n,n] for n in range(100)]:import pandas as pd

def failures_in_symmetric_interval(n):
    a, b, c = (np.random.rand(3, 10**4) - 0.5) * n
    return 1.0 - numexpr.evaluate(formula).mean()

s = pd.Series({
    n: failures_in_symmetric_interval(n)
    for n in range(100)
})
The plot looks something like this:In particular, failure probability dips down to&#160;0&#160;when&#160;n&#160;is a power of&#160;2&#160;and seems to have a fractal pattern. It also looks like every "dip" has a failure probability equal to that of some previous "peak". Any elucidation of why this happens would be great!

Nandini · Answer

It is feasible to evaluate these things by hand, but the only ways I am familiar with are time-consuming and require a lot of case-by-case enumeration. For example, the likelihood that (a + b) + c == a + (b + c) is 53/64, which is within a few multiples of the machine epsilon. So the chance of a mismatch is 11/64, or about 17.19 percent, which is consistent with what you saw in your experiment. To begin, keep in mind that Python and NumPy's "uniform-on-[0, 1]" random numbers are always of the form n/2**53 for some integer n in range(2**53), and each such number is equally likely to occur within the limitations of the underlying Mersenne Twister PRNG. The vast majority of IEEE 754 values aren't generated by random.random() (or np.random.rand()) because there are about 2**62 binary64 representable values in the range [0.0, 1.0]. This information makes the analysis much easier, but it's also a bit of a cheat. I had to divide the value of 53/64 into five separate cases in order to calculate it: 1)When both a + b 1 and b + c 1 are true. Both a + b and b + c are calculated correctly in this case, so (a + b) + c and a + (b + c) both give the closest float to the exact result, rounding ties to even as usual. As a result, the probability of agreement in this case is 1. 2)The situation in which a + b <1 and b + c >= 1. Here, (a + b) + c will equal the true sum's correctly rounded value, while a + (b + c) may not. We can further separate into subcases based on the parity of a, b, and c's least significant bits. We'll label a "odd" if it's of the form n/2**53 with n odd, and "even" if it's of the form n/2**53 with n even, and vice versa for b and c. If b and c have the same parity (which happens about half the time), (b + c) is computed exactly, and a + (b + c) must equal (a + b) + c once more. The likelihood of agreement in the other circumstances is 1/2 in each case; the details are all quite similar, however in the scenario when an is odd, b is odd, and c is even, (a + b) + c is computed exactly, whereas a + (b + c) incurs two rounding mistakes, each of magnitude 2**-53. If the two faults are in opposing directions, they cancel each other out, and we arrive to an agreement. We don't have any if we don't have any. In this situation, there is a 3/4 probability of agreement. 3)The situation in which a + b >= 1 and b + c <1. After flipping the roles of a and c, this is equivalent to the prior situation; the likelihood of agreement is now 3/4. 4) a + b and b + c are both greater than one, but a + b + c is less than two. Again, one can divide on the parities of a, b, and c and examine each of the 8 situations separately. We always get agreement in the cases even-even-even and odd-odd-odd. The likelihood of agreement in the scenario of odd-even-odd is 3/4. (by yet further sub analysis). In all other circumstances, the answer is 1/2. When you add them all up, you obtain a total chance of 21/32 for this example. 5) If a + b + c >= 2, then Because we're rounding the final result to a multiple of four times 2**-53 in this example, we need to look at the last two significant bits in addition to the parities of a, b, and c. I won't delve into the gritty details, but the probability of agreement is 13/16. Finally, we can combine all of these examples. To accomplish so, we'll need to know the chances that our triple (a, b, c) will land in each scenario. The volume of the square-based pyramid given by 0= a, b, c= 1, a + b 1, b + c 1, which is 1/3, is the likelihood that a + b 1 and b + c 1. The odds of the other four instances can be calculated to be 1/6 each (using solid geometry or appropriate integrals). As a result, our total is 1/3 * 1 + 1/6 * 3/4 + 1/6 * 3/4 + 1/6 * 21/32 + 1/6 * 13/16, which equals 53/64, as claimed. Finally, 53/64 is very probably not the correct answer; to achieve an exact response, we'd have to consider all of the corner instances when a + b, b + c, or a + b + c hit a binade boundary (1.0 or 2.0). It would be conceivable to refine the preceding approach to determine how many of the 2**109 possible triples (a, b, c) satisfy (a + b) + c == a + (b + c), but not before bedtime. However, because corner situations should account for about 1/2**53 of all cases, our estimate of 53/64 should be accurate to at least 15 decimal places.

Probability that a formula fails in IEEE 754

Your comment on this question:

1 answer to this question.

Your answer

Your comment on this answer:

Related Questions In Machine Learning

Formula to calculate chance (probability) of a dice side based on its value

How to simulate first passage time probability in python for a random walk?

How to calculate probability in a normal distribution given mean & standard deviation?

Create a Markov Model that can generate text simulations by studying Donald Trump speech data set.

View onto a numpy array?

Dimension in python numpy

How to slice an array using python numpy? Is there any numpy tutorial which has covered all its operations?

Printing a large numpy array

How to compute the probability of a value given a list of samples from a distribution in Python?

How does software that calculates winning probability of a Texas Hold'em or Omaha hand against 8 random opponent hands work?

Subscribe to our Newsletter, and get personalized recommendations.

TRENDING CERTIFICATION COURSES

TRENDING MASTERS COURSES

COMPANY

WORK WITH US

DOWNLOAD APP

CATEGORIES

CATEGORIES

TRENDING BLOG ARTICLES

TRENDING BLOG ARTICLES