Various datatypes in GEMM

Review GEMM datatypes.

To convert the binary representation of a normalized floating-point number to its decimal value, we use the following formula: $(-1)^{sign} \times 2^{exponent-bias} \times 1.mantissa$
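As a rough illustration of the formula, the sketch below evaluates it term by term for a 32-bit pattern. It assumes the single-precision layout listed further down (1 sign bit, 8 exponent bits, 23 mantissa bits, bias 127); the helper names `decode_fp32` and `reference_fp32` are made up for this example.

```python
import struct


def decode_fp32(bits: int) -> float:
    """Evaluate (-1)^sign * 2^(exponent-bias) * 1.mantissa for a normal FP32 pattern."""
    sign = (bits >> 31) & 0x1        # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, still biased
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    if exponent in (0, 0xFF):
        # Exponent 0 and 255 are reserved (zero/denorms, inf/NaN).
        raise ValueError("the formula only covers normal numbers")
    significand = 1.0 + mantissa / 2**23  # implicit leading 1
    return (-1.0) ** sign * 2.0 ** (exponent - 127) * significand


def reference_fp32(bits: int) -> float:
    """Let the struct module reinterpret the same 32 bits, for comparison."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(decode_fp32(0x3F800000))  # 1.0
print(decode_fp32(0xC0000000))  # -2.0
print(decode_fp32(0x40490FDB) == reference_fp32(0x40490FDB))  # True
```

The comparison against `struct` is only a sanity check that the hand-written decoding agrees with the platform's own interpretation of the bits.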

FP64

mantissa: 52 bits
exponent: 11 bits (holds 0~2047; normal numbers use 1~2046, 0 is reserved for denorms and zero, and 2047 is reserved for NaN and infinity)
sign: 1 bit
bias: 1023
range: 2^-1022 to 2^1023
precision: 15-17 decimal digits
size: 8 bytes
largest positive value: 1.7976931348623157e+308
largest negative value: -1.7976931348623157e+308
smallest positive value: 2.2250738585072014e-308
smallest negative value: -2.2250738585072014e-308
machine epsilon: 2.2204460492503131e-16
smallest denormalized value: 4.9406564584124654e-324
largest denormalized value: 2.2250738585072009e-308
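The FP64 limits above follow directly from the mantissa width and the bias. The sketch below derives them and compares against `sys.float_info`, assuming (as on essentially all current platforms) that Python's `float` is an IEEE 754 double. The dictionary name `fp64` and its keys are illustrative only.

```python
import math
import sys

MANTISSA_BITS = 52
BIAS = 1023

min_normal = math.ldexp(1.0, 1 - BIAS)                     # 2^-1022
min_subnormal = math.ldexp(1.0, 1 - BIAS - MANTISSA_BITS)  # 2^-1074

fp64 = {
    # Largest normal: mantissa all ones, biased exponent 2046.
    "max_normal": (2.0 - math.ldexp(1.0, -MANTISSA_BITS)) * math.ldexp(1.0, BIAS),
    "min_normal": min_normal,
    # Machine epsilon: gap between 1.0 and the next representable value.
    "epsilon": math.ldexp(1.0, -MANTISSA_BITS),
    "min_subnormal": min_subnormal,
    # Largest subnormal sits one step below the smallest normal.
    "max_subnormal": min_normal - min_subnormal,
}

for name, value in fp64.items():
    print(f"{name}: {value!r}")

print(fp64["max_normal"] == sys.float_info.max)  # True
print(fp64["min_normal"] == sys.float_info.min)  # True
print(fp64["epsilon"] == sys.float_info.epsilon)  # True
```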

FP32

mantissa: 23 bits
exponent: 8 bits (holds 0~255; normal numbers use 1~254, 0 is reserved for denorms and zero, and 255 is reserved for NaN and infinity)
sign: 1 bit
bias: 127
range: 2^-126 to 2^127
precision: 6-9 decimal digits
size: 4 bytes
largest positive value: 3.4028235e+38
largest negative value: -3.4028235e+38
smallest positive value: 1.1754944e-38 (2^-126)
smallest negative value: -1.1754944e-38 (-2^-126)
machine epsilon: 1.1920929e-07
smallest denormalized value: 1.4012985e-45 (2^-(126+23))
largest denormalized value: 1.1754942e-38
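To get a feel for what a 23-bit mantissa means in practice, the sketch below rounds Python doubles to single precision by packing them with `struct` and unpacking again. The helper name `to_fp32` is hypothetical, and the round trip is only a convenient stand-in for real FP32 arithmetic.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


eps = 2.0 ** -23  # FP32 machine epsilon

# 1 + eps is the next FP32 value after 1.0, so it survives the round trip.
print(to_fp32(1.0 + eps) == 1.0 + eps)  # True

# 1 + eps/2 lies exactly halfway between two FP32 values and rounds back to 1.0.
print(to_fp32(1.0 + eps / 2) == 1.0)  # True

# Integers above 2^24 are no longer all representable.
print(to_fp32(16777217.0))  # 16777216.0

# 0.1 picks up a relative error of about 1.5e-8, consistent with 6-9 digits.
print(to_fp32(0.1))
```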

FP16

mantissa: 10 bits
exponent: 5 bits (holds 0~31; normal numbers use 1~30, 0 is reserved for denorms and zero, and 31 is reserved for NaN and infinity)
sign: 1 bit
bias: 15
range: 2^-14 to 2^15
precision: 3-4 decimal digits
size: 2 bytes
largest positive value: 65504
largest negative value: -65504
smallest positive value: 6.1035e-5
smallest negative value: -6.1035e-5
machine epsilon: 9.77e-4
smallest denormalized value: 5.96e-8
largest denormalized value: 6.0976e-5
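Python's `struct` module supports the IEEE 754 half-precision format through the `e` format character, which makes it easy to inspect the FP16 bit patterns behind the values above. The helper names `fp16_bits` and `fp16_from_bits` are made up for this sketch.

```python
import struct


def fp16_bits(x: float) -> int:
    """Return the 16-bit pattern that encodes x in half precision."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern as a half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


print(hex(fp16_bits(65504.0)))        # 0x7bff: exponent 30, mantissa all ones
print(fp16_from_bits(0x0400))         # smallest normal, 2^-14
print(fp16_from_bits(0x0001))         # smallest denormal, 2^-24
print(fp16_from_bits(0x03FF))         # largest denormal, 1023 * 2^-24
print(fp16_from_bits(0x3C01) - 1.0)   # machine epsilon, 2^-10
```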

BF16

mantissa: 7 bits
exponent: 8 bits (holds 0~255; normal numbers use 1~254, 0 is reserved for denorms and zero, and 255 is reserved for NaN and infinity)
sign: 1 bit
bias: 127
range: 2^-126 to 2^127
precision: 2-3 decimal digits
size: 2 bytes
largest positive value: 3.38953139e+38
largest negative value: -3.38953139e+38
smallest positive value: 1.1754944e-38 (2^-126)
smallest negative value: -1.1754944e-38 (-2^-126)
machine epsilon: 7.8125e-03 (2^-7)
smallest denormalized value: 9.1835e-41 (2^-(126+7))
largest denormalized value: 1.1663e-38
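Because BF16 shares its sign and exponent layout with FP32 and only shortens the mantissa to 7 bits, a BF16 value can be viewed as the upper 16 bits of an FP32 word. The sketch below uses that view to decode BF16 patterns and to round a value to BF16 with round-to-nearest-even. The function names are hypothetical, and going through FP32 first is a simplification (the value is rounded twice), not how any particular hardware does it.

```python
import struct


def bf16_to_float(bits: int) -> float:
    """Decode 16 BF16 bits by placing them in the upper half of an FP32 word."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16(x: float) -> int:
    """Round x to BF16 (nearest-even on the 16 dropped FP32 bits)."""
    if x != x:
        return 0x7FC0  # one quiet-NaN encoding; the payload is not preserved
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add just under half an ULP, plus one more if the kept LSB is odd.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


print(bf16_to_float(0x7F7F))        # largest finite value
print(bf16_to_float(0x0080))        # smallest normal, 2^-126
print(bf16_to_float(0x0001))        # smallest denormal, 2^-133
print(bf16_to_float(0x3F81) - 1.0)  # machine epsilon, 0.0078125
print(hex(float_to_bf16(1.0)))      # 0x3f80
```

Note that `struct.pack("<f", x)` raises `OverflowError` for finite inputs beyond the FP32 range, so this sketch only accepts values FP32 can hold.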

Zero: 0/1 sign bit for positive or negative zero; exponent and mantissa bits are all 0
NaN (Not a Number): all exponent bits are 1 and the mantissa is nonzero; the sign bit can be 0 or 1. The most significant bit of the mantissa determines the type of NaN: "quiet NaN" or "signaling NaN"
+/-inf: all exponent bits are 1 and the mantissa is zero; 0/1 sign bit for positive or negative infinity
subnormal/denorms:
Normalized numbers have an implicit leading binary digit of 1. To reduce the loss of precision when an underflow occurs, IEEE 754 can represent fractions smaller than the normalized representation allows by making the implicit leading digit a 0. Such numbers are called denormal/subnormal. They carry fewer significant digits than a normalized number, but they enable a gradual loss of precision when the result of an operation is not exactly zero but is too close to zero to be represented by a normalized number.

A denormal number is represented with a biased exponent of all 0 bits, which represents, for example, an exponent of -126 in single precision (not -127), or -1022 in double precision (not -1023). In contrast, the smallest biased exponent representing a normal number is 1.
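The special encodings described above can be told apart by looking only at the exponent and mantissa fields. The sketch below does this for single precision and also decodes a denormal with the fixed exponent of -126 and an implicit leading 0. The function names are invented for illustration, and treating a set top mantissa bit as "quiet" is an assumption that matches the common convention on current hardware but not every historical platform.

```python
import struct


def classify_fp32(bits: int) -> str:
    """Classify a 32-bit pattern by its exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0xFF:
        if mantissa == 0:
            return "infinity"
        # Assumption: top mantissa bit set means quiet NaN.
        return "quiet NaN" if mantissa & 0x400000 else "signaling NaN"
    return "normal"


def decode_subnormal_fp32(bits: int) -> float:
    """Decode a denormal: exponent fixed at -126, implicit leading digit 0."""
    sign = (bits >> 31) & 0x1
    mantissa = bits & 0x7FFFFF
    return (-1.0) ** sign * 2.0 ** -126 * (mantissa / 2**23)


for pattern in (0x80000000, 0x00000001, 0x00800000, 0x7F800000, 0x7FC00000):
    print(hex(pattern), classify_fp32(pattern))

# Smallest denormal: 2^-126 * 2^-23 = 2^-149
print(decode_subnormal_fp32(0x00000001) == 2.0 ** -149)  # True
```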

There are three kinds of operations that can return NaN:

Most operations with at least one NaN operand.

Indeterminate forms:
The divisions (±0) / (±0) and (±∞) / (±∞).
The multiplications (±0) × (±∞) and (±∞) × (±0).
The remainder x % y when x is an infinity or y is zero.
The additions (+∞) + (−∞), (−∞) + (+∞) and the equivalent subtractions (+∞) − (+∞) and (−∞) − (−∞).
The standard has alternative functions for powers: the standard pow function and the integer exponent pown function define 0^0, 1^∞, and ∞^0 as 1, while the powr function defines all three indeterminate forms as invalid operations and so returns NaN.

Real operations with complex results:
The square root of a negative number.
The logarithm of a negative number.
The inverse sine or inverse cosine of a number that is less than -1 or greater than 1.
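A few of these cases can be tried directly in Python, with one caveat: Python's `float` follows IEEE 754 for arithmetic on infinities, but it raises exceptions instead of returning NaN for `0.0 / 0.0` and for out-of-domain `math` calls. The sketch below is therefore only a partial illustration; the names `nan_results`, `pow_results`, and `raises_value_error` are made up for it.

```python
import math

inf = math.inf

# Indeterminate forms on infinities quietly produce NaN.
nan_results = {
    "inf / inf": inf / inf,
    "0 * inf": 0.0 * inf,
    "inf - inf": inf - inf,
    "inf + (-inf)": inf + (-inf),
}
for expr, value in nan_results.items():
    print(expr, "->", value, math.isnan(value))

# math.pow behaves like the standard pow: all three forms are defined as 1.
pow_results = [math.pow(0.0, 0.0), math.pow(1.0, inf), math.pow(inf, 0.0)]
print(pow_results)  # [1.0, 1.0, 1.0]

# NaN propagates through arithmetic and never compares equal to itself.
nan = inf - inf
print(math.isnan(nan + 1.0), nan == nan)  # True False


def raises_value_error(func, arg: float) -> bool:
    """Return True if func(arg) raises ValueError (Python's stand-in for NaN here)."""
    try:
        func(arg)
    except ValueError:
        return True
    return False


print(raises_value_error(math.sqrt, -1.0))  # True
print(raises_value_error(math.log, -1.0))   # True
print(raises_value_error(math.asin, 2.0))   # True
```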