[While this is a self-answered question, I will happily upvote and accept any alternative answer that either provides superior accuracy for the same amount of computational work or reduces the amount of computational work while maintaining the same level of accuracy.]
I have previously demonstrated how to compute the complementary error function erfcf() with a maximum error of less than three ulps. This can serve as a building block for additional functions, such as the CDF of the standard normal distribution Φ(x) = ½ erfc (-√½ x) or the Gaussian Q-function, Q(x) = 1-Φ(x) = ½ erfc (√½ x). However, for some use cases computation fully accurate to single precision is not needed while the contribution of erfc() evaluations to total run-time is not negligible.
The literature provides various low-accuracy approximations to the complementary error function, but they are either limited to a subset of the full input domain, or optimized for absolute error, or computationally too complex, for example by requiring multiple invocations of transcendental functions. How could one go about implementing erfcf() with high performance and with a relative accuracy of about 5 decimal digits across the entire input domain?
This answer, like OP's answer, does achieve about 5 decimal digits relative accuracy*1. To achieve at least 5 likely obliges another term in the rational polynomial - which I doubt will slow the performance much.
Special values
In addition to performance and accuracy, the result of a
my_erfc(x)for select values should be considered.For
xnear 0.0, the result should be 1.0 exactly. This follows principle of least surprise.Example: A user may readily rely on
my_cos(0.0) == 1.0, ormy_exp(0.0) == 1.0, so shouldmy_erfc(0.0) == 1.0Fortunately this is not hard to accomplish when a rational polynomial is used with coefficients of 1.0. See below.
Take advantage of the linear region
About 45% of all
floatare such that theerfc()is a simple linear equation with high accuracy.Define performance rating
One could rate code in a worst case, what is the slowest for all
float x?I suspect OP is more interested in an assessment from
x= 0.0up to whenerfc(x)is 0.0 (~x == 10) in a linear progression.Sample performance results and test code follows. Of course, hard to beat the C library's crafted code.
Following code is modeled after OP self answer and incorporates ideas mentioned above.
Note: I'd consider eliminating
c = fminf(a, 10.5f);.This code's error results using OP's answer test harness.
(
#define USE_FMA (0)and#define USE_BUILTIN_EXP (1)).It's relative error (1.20e-05) vs. OP's error (1.06e-05) is primarily due to using
1.0in both polynomialspandq.Other code could use OP's answer's
pandqand this answer'sif (fabs(x) < ERFC_SMALL)for a best of both. Additional work needed to see how well the linear and rational polynomial sections meet.As an extra, below is a graphical view of OP's error over the [0... ~10] range. I found it illuminating to show where errors develop.
*1 Low relative accuracy is not possible in the sub-normal range of answers.