C/C++之无符号数和有符号数的定义, 关系及转换

本文从数学的角度来介绍无符号数和有符号数的定义, 关系及转换.

1. 无符号数和有符号数的定义

设 $n$ 维位向量: $\vec{x} = [x_0,x_1,\cdots, x_{n-1}] \in \{0, 1 \}^n$.

无符号数的定义, $B2U(\vec{x}) := \sum_{i = 0}^{n - 1} {{x_i}{2^i}}$.

有符号数的定义, $B2T(\vec{x}) := -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2}{{x_i}{2^i}}$.

注意: 这里的有符号数采用的是 two's complement 编码方式, 这是绝大多数计算机系统采用的方式.

2. 无符号数和有符号数的取值范围

$B2U_{\min}(\vec{x}) = 0$, 这时 $\vec{x} = [0, 0, \cdots, 0]$

$B2U_{\max}(\vec{x}) = \sum_{i = 0}^{n - 1} {{2^i}}$, 这时 $\vec{x} = [1, 1, \cdots, 1]$

$B2T_{\min}(\vec{x}) = -2^{n-1}$, 这时 $\vec{x} = [1, 0, \cdots, 0]$

$B2T_{\max}(\vec{x}) = \sum_{i=0}^{n-2}{{2^i}} = 2^{n-1} - 1$, 这时 $\vec{x} = [0, 1,\cdots, 1]$

所以:
$$\begin{cases} B2U(\vec{x}) \in\{0, \cdots, 2^{n}-1\} \\ B2T(\vec{x}) \in\{-2^{n-1}, \cdots, 2^{n-1} - 1\} \end{cases}$$

3. 无符号数和有符号数转换函数的逆

可以证明 $B2U$ 和 $B2T$ 是双射 (bijection) 的, 所以可以定义它们的逆:

$$\begin{cases} U2B = B2U^{-1}: \{0, \cdots, 2^{n}-1\} \rightarrow \{0, 1 \}^n \\ T2B = B2T^{-1}: \{-2^{n-1}, \cdots, 2^{n-1} - 1\} \rightarrow \{0, 1 \}^n \end{cases}$$

4. 无符号数和有符号数之间的关系

根据定义, 可得
$$B2U(\vec{x}) - B2T(\vec{x}) = x_{n-1}{2^{n-1}} + x_{n-1}2^{n-1} = x_{n-1}{2^{n}}$$
具体地,
$$\begin{aligned} B2U(\vec{x}) &= B2T(\vec{x}) + x_{n-1}{2^{n}} \\ &= \begin{cases} B2T(\vec{x}) & x_{n-1} = 0 \ \textit{i.e.,}\ B2T(\vec{x}) \geq 0 \\ B2T(\vec{x}) + 2^n & x_{n-1} = 1 \ \textit{i.e.,}\ B2T(\vec{x}) < 0 \end{cases} \\ &= B2T(\vec{x}) \bmod 2^{n} \\ \end{aligned}$$

$$\begin{aligned} B2T(\vec{x}) &= B2U(\vec{x}) - x_{n-1}{2^{n}} \\ &= \begin{cases} B2U(\vec{x}) & x_{n-1} = 0 \ \textit{i.e.,}\ B2U(\vec{x}) < 2^{n-1}\\ B2U(\vec{x}) - 2^n & x_{n-1} = 1 \ \textit{i.e.,}\ B2U(\vec{x}) \geq 2^{n-1} \end{cases}\\ &= (B2U(\vec{x}) + 2^{n-1}) \bmod 2^{n} - 2^{n-1} \end{aligned}$$

5. 无符号数和有符号数之间的转换

定义无符号数到有符号数的转换函数: $U2T(u) := B2T(U2B(u))$, 其中 $u \in \{0, \cdots, 2^{n}-1\}$

定义有符号数到无符号数的转换函数: $T2U(t) := B2U(T2B(t))$, 其中 $t \in \{-2^{n-1}, \cdots, 2^{n-1} - 1\}$

根据定义及上一小节的结论, 可以得到:
$$\begin{cases} \begin{aligned} T2U(t) &:= B2U(T2B(t)) \\ &= B2T(T2B(t)) \bmod 2^{n} \\ &= t \bmod 2^{n} \end{aligned}\\ \begin{aligned} U2T(u) &:= B2T(U2B(u)) \\ &= (B2U(U2B(u)) + 2^{n-1}) \bmod 2^{n} - 2^{n-1} \\ &= (u + 2^{n-1}) \bmod 2^{n} - 2^{n-1} \\ \end{aligned}\\ \end{cases}$$

测试代码 (可以打开该链接运行该段代码: http://coliru.stacked-crooked.com/a/5a896386dbc1f143) 如下:

#include <iostream>

int main()
{
    int n = 8;
    for (int u = 0; u <= 255; ++u)
    {
        int u2t_v1 = (signed char)u;
        // U2T(u) = (u + 2^{n-1}) \bmod 2^{n} - 2^{n-1}
        int u2t_v2 = ((u + (1 << (n - 1))) % (1 << n)) - (1 << (n -1));
        std::cout << u << "\t" << u2t_v1 << "\t" << u2t_v2 << std::endl;
    }
    std::cout << "==========================" <<std::endl;

    for (int t = -128; t <= 127; ++t)
    {
        int t2u_v1 = (unsigned char)t;
        // T2U(t) = t \bmod 2^{n}
        // mod op in math is different from % op in C++ since C++11
        // the sign of the result of mod in math is the same as the one of divisor
        // the sign of the result of % since C++11 is the same as the one of dividend
        // so dividend add (1 << n) to make it positive
        int t2u_v2 = (t  + (1 << n)) % (1 << n);
        std::cout << t << "\t" << t2u_v1 << "\t" << t2u_v2 << std::endl;
    }

    std::getchar();
    return 0;
}

参考

CSAPP

更新记录

20211226, 发布