数学分析要点

Posted on 2019-07-07 | Edited on 2024-10-23

有尽小数在实数系中处处稠密

定理设$a$和$b$是实数, $a<b$. 则存在有尽小数$c$, 满足

$$
a < c < b
$$

确界原理

$\mathbb{R}$的任何非空而有上界的子集合$D$在$\mathbb{R}$中有上确界.

常用不等式

绝对值不等式
$$
|a+b| \le |a| + |b| \\
||a|-|b|| \le |a - b| \\
$$

Bernoulli不等式

$$
(1+x)^n \ge 1+nx
$$

均值不等式

$$
\frac{x_1+x_2}{2} \ge \sqrt{x_1x_2}
$$

AM-GM不等式

定理设$x_1,x_2,\cdots,x_n \ge 0$, 则以下的算术平均数与几何平均数不等式成立:

$$
\frac{x_1+x_2+\cdots+x_n}{n} \ge \sqrt[n]{x_1x_2 \cdots x_n}
$$

三角函数不等式

定理对于用弧度表示的角$x$, 有以下不等式成立

$$
\forall x \in (0, \frac{\pi}{2}) \qquad \sin x < x < \tan x
$$

有界序列, 无穷小序列, 收敛序列性质一览

两个有界序列的和是有界序列, 两个有界序列的积是有界序列.
两个无穷小序列的和是无穷小序列, 两个无穷小序列的积也是无穷小序列.
无穷小序列和有界序列的积是有界序列.
$\{a_n\}$是无穷小序列当且仅当$\{|a_n|\}$是无穷小序列.
实数与无穷小序列的积是无穷小序列.
有限个无穷小序列的和和积都是无穷小序列.
收敛序列是有界序列.
$\lim |x_n| = |\lim x_n|$.
$\lim (x_n \pm y_n) = \lim x_n \pm \lim y_n$.
$\lim (x_ny_n) = \lim x_n \cdot \lim y_n$.
若$x_n \ne 0$且$\lim x_n \ne 0$, $\lim \frac{1}{x_n} = \frac{1}{\lim x_n}$.
$\lim(cx_n) = c\lim x_n$.
若$x_n \ne 0$且$\lim x_n \ne 0$, $\lim \frac{y_n}{x_n} = \frac{\lim y_n}{\lim x_n}$.

序列收敛原理

单调有界序列收敛.
有界序列具有收敛的子序列.
基本序列收敛.

函数极限定义

定义I(序列式定义)

设$a, A \in \bar{\mathbb{R}}$, 并设函数$f(x)$在$a$点的某个去心邻域$U^*(a)$上有定义. 如果对于任何满足条件$x_n \rightarrow a$的序列$\{x_n\} \subset U^*(a)$, 相应的函数值序列$\{f(x_n)\}$都以$A$为极限，那么我们就说当$x \rightarrow a$时, 函数$f(x)$的极限为$A$, 记为

$$
\lim_{x \rightarrow a}f(x) = A
$$

定义II($\epsilon\text{-}\delta$式定义)

设$a, A \in \mathbb{R}$, 并设函数$f(x)$在$a$点的某个去心邻域$U^*(a,\eta)$上有定义. 如果对任意的$\epsilon > 0$, 存在$\delta > 0$, 使得只要$0 < |x - a| < \delta$, 就有

$$
|f(x) - A| < \epsilon
$$

那么我们就说: $x \rightarrow a$时函数$f(x)$的极限是$A$, 记为

$$
\lim_{x \rightarrow a}f(x) = A
$$

*定义II仅定义了自变量和因变量都是实数的情况, 除此之外还有$a = \pm\infty$, $A = \pm\infty$与实数进行组合的多种情况, 在此不一一列举.

连续函数重要性质

定理1 连续点附近有界.

定理2 两个函数$f(x)$, $g(x)$在$x_0$点连续, 有

$f(x) \pm g(x)$在$x_0$处连续.
$f(x)g(x)$在$x_0$处连续.
$\frac{f(x)}{g(x)}$在使得$g(x_0) \ne 0$的$x_0$处连续.
$|f(x)|$在$x_0$处连续.
若$f(x_0) < g(x_0)$, 则存在$\delta > 0$使得对于$x \in U(x_0, \delta)$有$f(x) < g(x)$.

定理3(复合函数连续性) 设函数$f(x)$在$x_0$处连续, 函数$g(x)$在$y_0=f(x_0)$处连续, 则复合函数$g \circ f(x) = g(f(x))$在$x_0$处连续.

定理4(介值定理)

设函数$f$在闭区间$[a,b]$连续. 对于任意$\gamma$满足$f(a) < \gamma < f(b)$或$f(a) > \gamma > f(b)$, 存在$c \in (a,b)$使得

$$
f(c) = \gamma
$$

定理5 在闭区间$[a,b]$连续的函数$f$在$[a,b]$上有界.

定理6(最大值与最小值定理) 设函数$f$在闭区间$[a,b]$上连续, $M,m$分别是该函数的上确界和下确界, 则存在$x’, x’’ \in [a,b]$使得

$$
f(x’) = M, \quad f(x’’) = m
$$

定理7(一致连续性定理) 在闭区间$[a,b]$上连续的函数$f$在$[a,b]$上一致连续.

无穷量

设函数$\phi(x)$和$\psi(x)$在$a$点的某个取心邻域$U^*(a)$上有定义, 并设在$U^*(a)$上$\phi(x) \ne 0$. 我们分别用记号$O$, $o$与$\sim$表示比值$\frac{\psi(x)}{\phi(x)}$在$a$点邻近的几种情况:

$\psi(x)=O(\phi(x))$表示$\frac{\psi(x)}{\phi(x)}$是$x\rightarrow a$时的有界变量.
$\psi(x)=o(\phi(x))$表示$\frac{\psi(x)}{\phi(x)}$是$x\rightarrow a$时的无穷小量.
$\psi(x)\sim \phi(x)$表示$\lim_{x\rightarrow a}\frac{\psi(x)}{\phi(x)} = 1$.

重要极限小记

$$
\begin{eqnarray}
\lim_{x\rightarrow 0}&\frac{\sin x}{x} &= 1 \\
\lim_{x\rightarrow \infty}&(1+\frac{1}{x})^x &= e \\
\lim_{\alpha \rightarrow 0}&\frac{\ln(1+\alpha)}{\alpha} &= 1 \\
\lim_{\alpha \rightarrow 0}&\frac{\log_b(1+\alpha)}{\alpha} &= \frac{1}{\ln b} \\
\lim_{\alpha \rightarrow 0}&\frac{e^\alpha-1}{\alpha} &= 1 \\
\lim_{\alpha \rightarrow 0}&\frac{(1+\alpha)^\mu - 1}{\alpha} &= \mu \\
\end{eqnarray}
$$

基础微分

导数设函数$f(x)$在$x_0$点邻近有定义. 如果存在有穷极限

$$
\lim_{\Delta x \rightarrow 0}\frac{f(x_0+\Delta x)-f(x_0)}{\Delta x}
$$

那么我们就说函数$f(x)$在$x_0$点可导, 并把上述极限值称为$f(x)$在$x_0$点的导数, 记为$f’(x_0)$(Lagrange)或$\frac{df(x_0)}{dx}$(Leibnitz).

微分设函数$f(x)$在$x$点邻近有定义, 如果

$$
f(x+h)-f(x)=Ah + o(h)
$$

其中$A$与$h$无关, 那么我们就说函数$f$在$x$点可微.

设函数$y=f(x)$在$x_0$处可微, 引入记号

$$
\begin{align}
dx &:= \Delta x \nonumber \\
dy &:= f’(x_0)dx = f’(x_0)\Delta x \nonumber
\end{align}
$$

$dy$叫做函数$y=f(x)$在$x_0$点的微分.

初等函数导数表

函数	导数	备注
$C$	$0$
$x^m$	$mx^{m-1}$	$m \in \mathbb{N}_+$
$x^{-m}$	$-mx^{-m-1}$	$m \in \mathbb{N}_+, \quad m \ne 0$
$x^\mu$	$\mu x^{\mu-1}$	$\mu \in \mathbb{R}, \quad x > 0$
$\sin x$	$\cos x$
$\cos x$	$-\sin x$
$\tan x$	$\frac{1}{\cos^2 x}$	$x \ne k\pi + \frac{\pi}{2}$
$\cot x$	$-\frac{1}{\sin^2 x}$	$x \ne k\pi$
$\arcsin x$	$\frac{1}{\sqrt{1-x^2}}$	$\|x\| < 1$
$\arccos x$	$-\frac{1}{\sqrt{1-x^2}}$	$\|x\| < 1$
$\arctan x$	$\frac{1}{1+x^2}$
$\text{arccot }x$	$-\frac{1}{1+x^2}$
$e^x$	$e^x$
$a^x$	$a^x\ln a$	$a>0$
$\ln x$	$\frac{1}{x}$	$x>0$
$\log_a x$	$\frac{1}{x} \log_a e$	$a>0, \quad x>0$

不定积分

设函数$f$在区间$I$上有定义, 函数$F$是$f$的一个原函数, 则函数簇

$$
F(x) + C (C \in \mathbb{R})
$$

表示$f$的一切原函数. 我们把这函数簇叫做函数$f(x)$的不定积分, 或者叫做微分形式$f(x)dx$的不定积分, 记为

$$
\int f(x)dx = F(x) + C
$$

不定积分表

不定积分表
$\int 0 dx = C$
$\int 1 dx = x + C$
$\int x^\mu dx = \frac{1}{\mu + 1} x^{\mu+1} + C (\mu \ne -1)$
$\int x^{-1} dx = \int \frac{dx}{x} = \ln{\|x\|} + C$
$\int \frac{dx}{x-a} = \ln{\|x-a\|} + C$
$\int e^x dx = e^x + C$
$\int a^x dx = \frac{a^x}{\ln{a}}+C(a > 0, a \ne 1)$
$\int \cos x dx = \sin x + C$
$\int \sin x dx = -\cos x + C$
$\int \frac{dx}{\cos^2 x} = \tan x + C$
$\int \frac{dx}{\sin^2 x} = - \cot x + C$
$\int \frac{dx}{1+x^2} = \arctan x + C$
$\int \frac{dx}{\sqrt{1-x^2}} = \arcsin x + C$
$\int \frac{dx}{\sqrt{x^2 \pm a^2}} = \ln{\|x+\sqrt{x^2 \pm a^2} \|} + C$

定积分

设函数$f$在区间$[a,b]$有定义. 如果存在实数$I$, 使得对于任意无穷细分割序列${P^{(n)}}$, 不论相应于每个分割$P^{(n)}$的标志点组$\xi^{(n)}$怎样选择, 都有

$$
\lim_{n \rightarrow +\infty} \sigma(f, P^{(n)}, \xi^{(n)})=I
$$

那么我们就说函数$f$在$[a,b]$上可积,并把$I$称为函数$f$在$[a,b]$上的定积分,记为

$$
\int_a^bf(x)dx = \lim_{|P| \rightarrow 0}\sigma(f,P,\xi)=I
$$

牛顿-莱布尼茨公式

设函数$f$在闭区间$[a,b]$连续. 如果存在函数$F$在$[a,b]$连续, 在$(a,b)$可导, 并且满足

$$
F’(x) = f(x), \forall x \in (a,b)
$$

那么函数$f$在$[a,b]$上可积,并且

$$
\int_a^bf(x)dx=F(b)-F(a)
$$

二重积分的计算

设函数$z = f(x,y)$在闭区间$D$上连续, 其中$D$由直线$x=a, x=b (a < b)$及曲线$y=\varphi_1(x), y=\varphi_2(x) (\varphi_1(x) \le \varphi_2(x), a \le x \le b)$围成. 这时$f(x,y)$的二重积分可表示成如下的累次积分

$$
\iint f(x,y)d\sigma = \int_a^b [\int_{\varphi_1(x)}^{\varphi_2(x)} f(x,y)dy]dx
$$

基础微分方程

本记将基础微分方程分为5种类型，分别是可分离变量型、齐次型、一阶线性型、二阶可降阶型和二阶线性常系数微分方程。

可分离变量型

形如

$$
\frac{dy}{dx} = g(x)h(y)
$$

的方程, 可将其变换为

$$
\frac{dy}{h(y)} = g(x)dx
$$

并在两边同时积分.

齐次型

形如

$$
\frac{dy}{dx} = f(\frac{y}{x})
$$

的方程, 可令$u = \frac{y}{x}$代入得到

$$
\frac{xdu}{dx} + u = f(u)
$$

并可进行变量分离得到

$$
\frac{du}{f(u)-u} = \frac{dx}{x}
$$

一阶线性型

形如

$$
y’ + p(x)y = q(x)
$$

的方程, 两边同时乘以$e^{\int p(x)dx}$可得

$$
\begin{align*} &e^{\int p(x)dx}y’ + e^{\int p(x)dx}p(x)y = e^{\int p(x)dx}q(x) \\
\Rightarrow& [e^{\int p(x)dx}y]' = e^{\int p(x)dx}q(x) \\
\Rightarrow& y = e^{-\int p(x)dx}[\int e^{\int p(x)dx}q(x)dx + C_1]
\end{align*} $$

二阶可降阶型

形如

$$
y’’ = f(x, y’)
$$

或

$$
y’’ = f(y’, y)
$$

的方程, 可令$y’ = p$, 则$y’’ = \frac{dp}{dx} = \frac{dp}{dy}p$, 并化为一阶方程.

二阶线性常系数微分方程

形如

$$
y’’ + py’ + qy = f(x)
$$

的方程, 考察其特征方程

$$
\lambda^2 + p\lambda + q = 0
$$

其齐次型($f(x)=0$)的通解如下表所示

特征根	通解
$\lambda_1, \lambda_2 \in \mathbb{R}, \lambda_1 \ne \lambda_2$	$y = C_1e^{\lambda_1x}+C_2e^{\lambda_2x}$
$\lambda_1, \lambda_2 \in \mathbb{R}, \lambda_1 = \lambda_2$	$y=(C_1+C_2x)e^{\lambda_1x}$
$\lambda_{1,2}=\alpha \pm i\beta$	$y=e^{ax}(C_1\cos \beta x + C_2 \sin \beta x)$

其非齐次型的特解如下表所示

$f(x)$形式	条件	特解
	$\alpha \ne \lambda_1, \alpha \ne \lambda_2$	$y^* = e^{\alpha x}Q_n(x)$
$P_n(x)e^{\alpha x}$	$\alpha = \lambda_1$ 或 $\alpha = \lambda_2$	$y^* = xe^{\alpha x}Q_n(x)$
	$\alpha = \lambda_1 = \lambda_2$	$y^* = x^2e^{\alpha x}Q_n(x)$

$e^{\alpha x}[P_n(x)\cos \beta x + P_m(x)\sin \beta x](\beta \ne 0)$	$\lambda \ne \alpha \pm i\beta$	$y^* = e^{\alpha x}[Q_l(x)\cos \beta x + R_l(x)\sin \beta x]$
	$\lambda = \alpha \pm i\beta$	$y^* = xe^{\alpha x}[Q_l(x)\cos \beta x + R_l(x)\sin \beta x]$

其中, $P_n(x)$为关于$x$的$n$次多项式, $l = \max\{n, m\}$.

参考文献

张筑生. 数学分析新讲. 北京大学出版社.

Important Points of Compilers

Posted on 2019-07-03 | Edited on 2024-10-23

The principles and techniques for compiler design are applicable to so many other domains that they are likely to be reused many times in the career of a computer scientist. The study of compiler writing touches upon programming languages, machine architecture, language theory, algorithms, and software engineering.

The General Structure of a Compiler

Context-Free Grammars

A context-free grammar $G$ is defined by the 4-tuple

$$
G = (V, \Sigma, R, S)
$$

where

$V$ is a set of nonterminals, sometimes called "syntactic variables." Each nonterminal represents a set of strings of terminals, in a manner we shall describe.
$\Sigma$ is a set of terminal symbols, sometimes referred to as "tokens." The terminals are the elementary symbols of the language defined by the grammar.
$R$ is a set of productions, where each production consists of a nonterminal, called the head or left side of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side of the production. In fact, $R$ is a finite relation from $V$ to $(V \cup \Sigma)^*$.
$S$ is designation of one of the nonterminals as the start symbol.

All the terminal strings that can be derived from the start symbol form the language defined by the grammar.

Important Concepts in Lexical Analysis

A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier. The token names are the input symbols that the parser processes. In what follows, we shall generally write the name of a token in boldface. We will often refer to a token by its token name.

A pattern is a description of the form that the lexemes of a token may take. In the case of a keyword as a token, the pattern is just the sequence of characters that form the keyword. For identifiers and some other tokens, the pattern is a more complex structure that is matched by many strings.

A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

An alphabet is any finite set of symbols.

A string over an alphabet is a finite sequence of symbols drawn from that alphabet.

A language is any countable set of strings over some fixed alphabet.

A regular set is a language that can be defined by a regular expression.

Regular Expressions

Regular expressions are defined by the following rules:

$\epsilon$ is a regular expression, and $L(\epsilon)=\{\epsilon\}$.
If $a \in \Sigma$, then $a$ is a regular expression, and $L(a) = \{a\}$.

Suppose $r$ and $s$ are regular expressions.

$r|s$ is a regular expression and $L(r|s)=L(r) \cup L(s)$.
$rs$ is a regular expression and $L(rs)=L(r)L(s)$.
$r^*$ is a regular expression and $L(r^*)=L(r)^*$.
$(r)$ is a regular expression and $L((r))=L(r)$.

There are also plenty of extensions of regular expressions.

$r^+$ is a regular expression and $L(r^+)=L(r)^+$.
$r?$ is a regular expression and $L(r?)=L(r) \cup \{\epsilon\}$.
$[a_1a_2\cdots a_n]$ is a regular expression with $a_1,a_2,\cdots,a_n$ are all regular expressions and it is a shorthand of $a_1|a_2|\cdots|a_n$.

Regular expressions have following properties:

$r|s=s|r$.
$r|(s|t)=(r|s)|t$.
$r(st)$=$(rs)t$.
$r(s|t)=rs|rt$; $(s|t)r=sr|tr$.
$\epsilon r=r\epsilon=r$.
$r^*=(r|\epsilon)^*$.
$r^{**}=r^*$.

Finite Automata

Finite automata are recognizers: they simply say "yes" or "no" about each possible input string.

Nondeterministic Finite Automata

A nondeterministic finite automaton(NFA) is defined by a 5-tuple, $(S, \Sigma, T, s_0, F)$, consisting of

A finite set of states $S$.
A set of input symbols $\Sigma$, the input alphabet. We assume that $\epsilon$ is never a member of $\Sigma$.
A transition function $T$ that gives, for each state, and for each symbol in $\Sigma \cup \{\epsilon\}$ a set of next states.
A state $s_0$ from $S$ that is distinguished as the start state or the initial state.
A set of states $F$, a subset of $S$, that is distinguished as the accepting states or the final states.

Deterministic Finite Automata

A deterministic finite automaton(DFA) is a special case of an NFA where:

There are no moves on input $\epsilon$.
For each state $s$ and input symbol $a$, there is exactly one edge out of $s$ labeled $a$.

Algorithm: From NFA to DFA

INPUT An NFA $N$.

OUTPUT A DFA $D$ accepting the same language as $N$.

STEPS

Compute $\epsilon\text{-}closure(s_0)$ as the initial unmarked state $s_0’$ of $D$.
Do the following until all the states in $D$ is marked:
1. Pick up an unmarked state $s’$ in $D$, mark $s’$.
2. For all symbols $a$ in $\Sigma$, do the following:
  1. Compute $\epsilon\text{-}closure(move(S_{s’}, a))$ as a state $s_i’$ in $D$, where $S_{s’}$ is the set of states in $N$ corresponding to the the state $s’$ in $D$.
  2. If $s_i’$ is not a existed state of $D$, add it as an unmarked new state.
  3. Create an edge $a$ between $s’$ and $s_i’$.

EXPLANATION

$\epsilon\text{-}closure(s)=\{s | s\text{ can be reached from }s_0\text{ through an edge labeled }\epsilon\}$.
$move(S,a)=\{s|s\text{ can be reached from states in }S\text{ through an edge labeled }a\}$.

Algorithm: Simulating an NFA

INPUT An input string $x$ terminated by an end-of-file character eof. An NFA $N$.

OUTPUT Answer "true" if $N$ accepts $x$; "false" otherwise.

STEPS

Compute $S=\epsilon\text{-}closure(s_0)$, where $s_0$ is the start state of $N$.
Do the following until an eof is read:
1. Read a character $c$.
2. Let $S=\epsilon\text{-}closure(move(S,c))$.
If $S \cap F \ne \emptyset$, answer "true" and otherwise, answer "false".

Algorithm: Constructing an NFA from a Regular Expression

INPUT A regular expression $r$ over alphabet $\Sigma$.

OUTPUT An NFA $N$ accepting $L(r)$.

METHOD

Apply the following rules.

The NFA accepting $L(\epsilon)$ is

The NFA accepting $L(a)$, where $L(a)=\{a\}$, is

The NFA accepting $L(s|t)$, where both $s$ and $t$ are regular expressions, is

The NFA accepting $L(st)$, where both $s$ and $t$ are regular expressions, is

The NFA accepting $L(s^*)$, where $s$ is a regular expression, is

Algorithm: Constructing a DFA from a Regular Expression

INPUT A regular expression $r$ over alphabet $\Sigma$.

OUTPUT A DFA $D$ accepting $L(r)$.

STEPS

Construct a syntax tree $T$ from the augmented regular expression $(r)\#$, and give each leaf a number, called a position. Note that all the leaves of $T$ is a single symbol.
Compute $firstpos(n_0)$ as the unmarked start state $s_0$ of $D$, where $n_0$ is the root of $T$.
Do the following until all the states in $D$ is marked:
1. Pick up an unmarked state $s$ in $D$, mark $s$.
2. For all symbols $a$ in $\Sigma$, do the following:
  1. Compute $followpos(p)$ for all $p$ in $s$ whose symbol in $T$ is $a$ as a state $s_i$.
  2. If $s_i$ is not a existed state of $D$, add it as an unmarked new state.
  3. Create an edge $a$ between $s$ and $s_i$.

EXPLANATION

$firstpos(n)=\{\text{some positions in the subtree rooted at node }n\}$
Suppose the language expressed by the subtree rooted at $n$ is $L(T_n)$, and all the possible first symbols of strings in $L(T_n)$ form a set $\sigma$. $firstpos(n)$ is the set of positions corresponding to the symbols in $\sigma$.
$followpos(p)=\{\text{positions corresponding to all the symbols that are possibly the next one of position }p\}$

Algorithm: Minimizing the Number of States of a DFA

INPUT A DFA $D=(S, \Sigma, T, s_0, F)$.

OUTPUT A DFA $D’$ accepting the same language as $D$ and having as few states as possible.

STEPS

Let $\Pi = {F, S-F}$ be the initial partition.
Do the following until $\Pi$ remains the same:
For each group $G$ in $\Pi$, partition $G$ into subgroups such that two states $s$ and $t$ are in the same group if and only if for all input symbols $a$, state $s$ and $t$ have transitions on $a$ to states in the same group of $\Pi$.
Choose one state of each group in $\Pi$ as the representative for that group, which is also a state in $D’$. The start/accepting state of $D’$ is the representative of the group containing the start/accepting state of $D$.

References

Alfred V. A., et al. Compilers Principles, Techniques, & Tools Second Edition

Context-Free Grammar - Wikipedia. https://en.wikipedia.org/wiki/Context-free_grammar

【LeetCode】493. Reverse Pairs

Posted on 2019-07-02 | Edited on 2024-10-23

Summary

493 is a typical question using the divide-and-conquer method. In fact, we can add only a little process into the general merging procedure of the mergesort to accomplish this task.

Solution by Mergesort

In the general mergesort process, we merge two sorted list into one by moving two indexes in each list. For example, consider two sorted list L=[1, 3, 5, 7] and R=[2, 4, 6, 8]. Let i, j be the indexes. The initial state is

We compare the R[i] and L[j] and put the smaller one to the new list. After that, the smaller one's index move to next one.

This process continues until the big list is filled up.

In 493, what we need to do is just repeat this process similarly once before this process. We still use two indexes and compare the values, but at this time, we should compare L[i] and 2 * R[j].

If L[i] > 2 * R[j], then we can make sure that all the elements after i in L are also bigger than 2 * R[j] because L is sorted. Thus, we just need to add L.size() - i to the total number and move j to the next one.

If L[i] <= 2 * R[j], just move i to the next one.

Finally, what we get is two processes. The first one is comparing L[i] and 2 * R[j] and modify the number of reverse pairs, and the second one is the general merging process of the mergesort. The two processes only access each element once, so the time complexity is $O(n)$. With the division process time complexity $O(logn)$, the total time complexity of this solution is $O(nlogn)$.

Common Data Structures and Algorithms

Posted on 2019-07-02 | Edited on 2024-10-23

The simplest are also the most important.

Arrays and Linked Lists

In almost every programming language, arrays are built-in data structures. An array arranges continuous space for objects, and is good at random access. However, arrays perform bad when meet insertions and deletions, and cost a lot to change their sizes.

With a linked list, insertions and deletions are very fast to perform, but one cannot access objects randomly in a linked list with $O(1)$ time complexity.

Many complex data structures are implemented based on these two simple data structures.

Stacks

In a stack data structure, all insertions and deletions of entries are made at one end, called the top of the stack. When we add an item to a stack, we say that we push it onto the stack, and when we remove an item, we say that we pop it from the stack. Note that the last item pushed onto a stack is always the first that will be popped from the stack. This property is called last in, first out, or LIFO for short.

Stacks are simple enough to be understood by only a few sentences. The most important feature of a stack is the LIFO property, which is very useful when a task about reversal need to be performed.

Example: Reverse a List

When a series of objects need to be loaded and written out in reverse order, it's suitable to use a stack to accomplish the task. Consider a series of input as follow,

1 2	10 // The number of objects. 1,2,3,4,5,6,7,8,9,10 // The objects.

A list in reverse order should be output like,

1	10,9,8,7,6,5,4,3,2,1

At this time, a perfect solution is using a stack. After pushing all the numbers into the stack, the reverse order of the list is automatically produced by popping them out.

Queues

For computer applications, we similarly define a queue to be a list in which all additions to the list are mode at one end, and all deletions from the list are made at the other end. Queues are also called first-in, first-out lists, or FIFO for short.

As a good brother of stacks, queues are also easy to understand. It's FIFO property is extremely suitable to perform tasks with particular orders, especially when some tasks need to wait for some other tasks.

Heaps

In a max heap, for any given node $C$, if $P$ is a parent node of $C$, then the key (the value) of $P$ is greater than or equal to the key of $C$. In a min heap, the key of $P$ is less than or equal to the key of $C$.

Heaps are often implemented by a binary tree, and a few important operations are listed below.

Insertion

To insert a new node to a heap, the general idea is to do swapping after the node is inserted to the "end".

Deletion

Only the root can be deleted. We move the node in the "end" to the root and heapify.

Binary Trees

Binary trees are very powerful tools to solve difficult problems, but it is also difficult to determine when to use binary trees and what kinds of trees to use. Here some typical binary trees are introduced, but at first, let's remember the precise definition of a binary tree.

A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root.

Binary Search Trees (BST)

A binary search tree is a binary tree that is either empty or in which every node has a key and satisfies the following conditions:

The key of the root (if it exists) is greater than the key in any node in the left subtree of the root.

The key of the root (if it exists) is less than the key in any node in the right substree of the root.

The left and right subtrees of the root are again binary search trees.

We focus on the search, insertion and deletion of elements on a binary search tree.

Search is quite simple. Since the tree has the unequality property and is recursively defined, we can just use a simple recursive or iterative function to accomplish the task. Note that searching a key in the tree has a time complexity $O(log n)$.

Insertion requires that after adding to new node to the tree, the tree is still a BST. However, since the tree is not required to be balanced, the process of insertion is quite like the process of search. We search in the tree as if the node has already been inserted, and when we reach a leaf node, we insert the node under that leaf node.

Deletion meets difficulties when a node with both left and right subtrees exist. At this time, we find the most farthest and most rightest child of the left subtree and replace it with the node. Since the child node is the greatest one of the left subtree and it is still less than any node on the right subtree, the replacement is safe. Note that this node may not be a leaf node but a node with only a left subtree, so a recursive deletion may occur.

AVL Trees

BSTs are no more than a good start point, because we cannot control the height of the tree when insertions or deletions occur frequently. AVL trees are mean to solve this problem, which can let a BST always be balanced.

An AVL tree is a binary search tree in which the heights of the left and right substrees of the root differ by at most 1 and in which the left and right subtrees are again AVL trees.
With each node of an AVL tree is associated a balance factor that is left-higher, equal-height, or right-higher according, respectively, as the left subtree has height greater than, equal to, or less than that of the right subtree.

The most important feature of an AVL tree is that it will perform some special operations, called rotations, in order to keep balance of the tree when insertions or deletions attempt to break the balance.

Let us first redefine the concept balance factor. Let $H_L(x)$ be the height of the left subtree of the node $x$, and $H_R(x)$ be the height of the right subtree. We say that the balance factor of node $x$

$$
bf(x) = H_L(x) - H_R(x)
$$

Therefore, any node $x$ of an AVL tree satisfies $|bf(x)| \le 1$.

When we meet a node $x$ that has a balance factor $|bf(x)| > 1$, generally there are four rotations like showing in the following table.

Insertion of a new node to an AVL tree has a new step after the node is inserted like a BST. First we insert the node as we insert the node to a BST, and then, we need to check the balance factor from the inserted node backward one by one. When we meet a node $x$ that has a balance factor $|bf(x)| > 1$, we stop immediately and perform a rotation to that node.

Deletion of a node in an AVL tree requires checking backward from the deleted node to the root node. If an unbalanced node is found, perform a rotation to that node.

Recursion

Recursion is the name for the case when a function invokes itself or invokes a sequence of other functions, one of which eventually invokes the first function again.

Recursion is a good way to fast accomplish a specific group of tasks like factorials. However, any recursive program can be rewritten to a iterative program with higher performance. Therefore, in modern software development, it is not a good idea to use recursion directly, but it is still worthwhile to learn because it provides a good method for us to find out answers of some complicated questions.

There are two main situations under which we may let recursion in. The first one is that we can write a formula which specify the relationship between the current state and one or a few previous states. Besides, an initial state is specified. For example, the factorials can be written like

$$
n! = \begin{cases}
1 &n = 0 \\
n(n - 1)! & n > 0
\end{cases}
$$

$n = 0$ is the initial state and when $n > 0$, the result is computed by $(n - 1)!$, a same problem but with a smaller scale.

Example: The Towers of Hanoi

If someone would like to teach recursion, he or she must introduce this typical problem – The Towers of Hanoi. This is a question that almost every one knows but, not all people can write down the solution immediately.

It is not so obvious that this problem is under the first situation, but we still can write down a formula here. Let m(n, s, d, t) represents "the steps move n disks from tower s to tower d using t as a temporary tower." Therefore, the formula can be written as

$$
m(n, s, d, t) = \begin{cases}
\text{do nothing} & n = 0 \\
m(n - 1, s, t, d) + m(1, s, d, t) + m(n - 1, t, d, s) & n > 0
\end{cases}
$$

The following is a simple C++ recursive program to accomplish the task.

void move(int n, int s, int d, int t) {
    if (n > 0) {
        move(n - 1, s, t, d);
        printf("move %d from %d to %d\n", n, s, d);
        move(n - 1, t, d, s);
    }
}

One more thing to say is that this program is terrible because every call produces two calls. Thus, with $n$ plates, the total number of calls is

$$
1 + 2 + 4 + \cdots + 2^{n - 1} = 2^n - 1
$$

Therefore, this is a program with $O(2^n)$ time complexity.

The second situation is that we need to do backtracking during the execution. In other words, we need to undo some operations that have already been done before. At this time, the structure of a recursive program is very useful. The next example is the typical Eight-Queens Puzzle, which will illustrate how the backtracking works.

Example: The Eight-Queens Puzzle

Computers are good at enumerating. When they meet the Eight-Queens Puzzle, they simply try all possible positions row by row. Obviously, it is easy to reach a dead end and a backtracking need to be performed, so we use a recursive structure to accomplish the task.

The following is a simple C++ recursive program to solve the puzzle.

// The board is a 8x8 array with 0 representing empty and 1 representing
// a queen.

// canPlace validates whether (row, col) can be placed a queen on the board.
bool canPlace(int board[8][8], int row, int col) {
    // Validate the row and the column.
    for (int i = 0; i < 8; i++)
        if (board[row][i] && i != col || board[i][col] && i != row)
            return false;

    // Validate the diagonal.
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            if (board[i][j] && i != row && 
                (i + j == row + col || i - j == row - col))
                return false;
        }
    }

    return true;
}

// eightQueensPuzzle returns true when the rows >= row can be placed
// correctly on the board.
bool eightQueensPuzzle(int board[8][8], int row) {
    if (row == 8) return true;

    // Enumerate the row.
    for (int i = 0; i < 8; i++) {
        if (canPlace(board, row, i)) {
            board[row][i] = 1;
            if (eightQueensPuzzle(board, row + 1)) {
                // If the rows >= row + 1 can be placed correctly, since the
                // current place is valid, the puzzle is solved.
                break;
            } else {
                // The current place reaches a dead end, try the next place.
                board[row][i] = 0;
            }
        }
    }

    // If the current row has one queen placed, a true is returned.
    for (int i = 0; i < 8; i++) if (board[row][i]) return true;
    return false
}

Can you find all solutions of the Eight-Queens Puzzle?

Searching

Every one knows the general idea of binary search. The problem is that when to use it to reduce the time complexity of a program. Here I would like to introduce a "standard" implementation of binary search. Note that even the general idea of binary search is the same, the actual times of comparison may still vary from implementations to implementations.

Here is the code of searching in a sorted vector of integers.

int binarySearch(vector<int> &nums, int target) {
    int left = 0, right = nums.size() - 1;
    while (left <= right) {
        int mid = (left + right) / 2;
        if (nums[mid] == target) return mid;
        if (nums[mid] < target) left = mid + 1;
        else right = mid - 1;
    }
    return -1;
}

Sorting

Sorting is a very facinating topic in the field of algorithms. It can be simple, like perform a selection sort on a list of integers, and suitable for newcommers to learn. It is also worth thinking because the mergesort and the quicksort are two of the most typical algorithms applying the divide-and-conquer method, which are widely used in plenty of tasks. It is extremely useful because it can reduce the time complexity of searching. It also provides judgements of performances of other algorithms. In conclusion, it is worthwhile to learn sorting as many times as you can and to remember it in your heart.

I am not going to introduce sorting algorithms with $O(n^2)$ time complexity like selection sort or bubble sort. I will main focus on quicksort and mergesort here.

Mergesort

In the first method, we simply chop the list into two sublists of sizes as nearly equal as possible and then sort them separately. Afterward, we carefully merge the two sorted sublists into a single sorted list.

The main procedure of mergesort is to divide the list into two sublists and perform mergesort on each sublist, repectively. Then the sublists will be merged to a sorted list. This process is recursive. When a sublist only contains one element, this sublist is automatically sorted and the combination begins immediately.

The following is a common recursive C++ implementation to sort a vector of integers.

void recursiveMergeSort(vector<int> &nums, int left, int right) {
    if (left == right) return;

    // Sort the left sublist and the right sublist, respectively.
    int mid = left + (right - left) / 2;
    recursiveMergeSort(nums, left, mid);
    recursiveMergeSort(nums, mid + 1, right);

    // Merge the two sublists.
    int i = left, j = mid + 1;
    vector<int> tmp(right - left + 1);
    for (int k = 0; k < tmp.size(); k++) {
        if (i <= mid && j <= right) {
            if (nums[i] >= nums[j]) {
                tmp[k] = nums[j];
                j++;
            } else {
                tmp[k] = nums[i];
                i++;
            }
        } else if (i > mid) {
            tmp[k] = nums[j];
            j++;
        } else if (j > right) {
            tmp[k] = nums[i];
            i++;
        }
    }
    for (int k = 0; k < tmp.size(); k++) {
        nums[left + k] = tmp[k];
    }
}

void mergeSort(vector<int> &nums) {
    recursiveMergeSort(nums, 0, nums.size() - 1);
}

References

Robert L. K. and Alexander J. R., Data Structures and Program Design in C++.

Capacitated Facility Location Problem

Posted on 2018-12-18 | Edited on 2024-10-23

Capacitated Facility Location Problem(CFLP)问题是一个典型的NP-Hard问题，该问题的描述如下，

给定N个设施(facility)和M位顾客(customer)，每个设施有一个最大容量s，并且可以选择开启或关闭，开启有一个开启费用f。
每位顾客有一个需求d，需要被某个或某些个设施满足，某个设施满足的需求数不得超过其最大容量s。
每个设施到每个顾客都有一个满足费用c，即满足某个顾客单位需求的费用。
最小化满足所有顾客所有需求的总费用。

分析

引入如下的数学符号，

$N$表示设施数，$M$表示顾客数。
$s_i, \forall i = 0,…,N-1$表示设施$i$的最大容量。
$f_i, \forall i = 0,…,N-1$表示设施$i$的开启费用。
$d_i, \forall i = 0,…,M-1$表示顾客$i$的需求。
$c_{ij}, \forall i = 0,…,N-1, j = 0,…,M-1$表示设施$i$满足顾客$j$单位需求所需费用。
注：一般而言，用$c_{ij}$表示的是满足顾客所有需求所需费用，这里为避免引入除法和分数采用单位需求。

并将CFLP的解表示为，

$x_i, \forall i = 0,…,N-1, x_i \in \{0,1\}$表示设施$i$的开启状况，1表示开启，0表示关闭。
$y_{ij}, \forall i = 0,…,N-1, j=0,…,M-1$表示设施$i$满足顾客$j$的需求数。

那么CFLP可以表示为如下的最优化问题，

$$
min \sum_{i=0}^{N-1}\sum_{j=0}^{M-1}c_{ij}y_{ij} + \sum_{i=0}^{N-1}f_ix_i \quad s.t. \\
\begin{align}
y_{ij} \ge 0 & \quad \forall i=0,…,N-1, j=0,…,M-1 \\
\sum_{j=0}^{M-1}y_{ij} \le s_ix_i & \quad \forall i=0,…,N-1 \\
\sum_{i=0}^{N-1}y_{ij} = d_j & \quad \forall j=0,…,M-1 \\
x_i \in \{0,1\} & \quad \forall i=0,…,N-1
\end{align}
$$

CFLP的条件比较多，但您可以将解过程分为两个部分，

开启一些设施（求解$x$）。
决定每个顾客在每个设施中满足需求的数量（将用户需求赋值给设施，求解$y$）。

其中，开启设施共有$2^N$种可能性，也是使得这个问题无法在多项式时间内求解的关键因素。当开启的设施已经被选定时，可以采用如下的贪心策略对设施赋值，

在已开启且还有容量剩余的设施和还未被满足需求的顾客中选择费用最低的设施-顾客对，尽可能地满足该顾客，直到所有顾客都被满足。
若该顾客已满足，则将顾客移出考虑范围；若该顾客未被满足，则设施已达到最大容量，将该设施移出考虑范围；特别的，恰好满足时，同时移除设施和顾客；此时如果还有顾客未被满足，则返回1，否则结束过程。

因此，考虑的重点在于选择哪些设施进行开启，下面将着重阐述若干个策略。

实现

Github Repo

实现框架：Golang+Cobra命令行程序

问题的表示

问题包含$N,M,s,f,d,c$五个变量，其中$N,M$用两个整数表示，$s,f$分别用一个长度为$N$的一维数组表示，$d$用一个长度为$M$的一维数组表示，$c$采用一个$N \times M$的二维数组表示，结构如下，

// Problem contains data of a CFLP.
type Problem struct {
    N, M        int
    Capacities  []int
    FixedCosts  []int
    Demands     []int
    TotalDemand int
    Costs       [][]int
}

注：为了计算方便，在读取数据的时候计算了总需求并储存在变量TotalDemand中。

易得，读取数据的时间复杂度为$O(NM)$，空间复杂度为$O(NM)$。

解的表示

一个解包含设施的开启信息$x$和用户需求的赋值信息$y$，分别使用一个长度为$N$的一维数组和一个$N \times M$的二维数组表示，结构如下，

// Solution represents a solution of a CFLP.
type Solution struct {
    *Problem
    X       []bool
    Y       [][]int
    RunTime float64
}

注：添加了RunTime记录运行时间。

当X确定时，采用上面提到的贪心策略进行需求赋值的代码过程如下，

// Assign assigns demands to facility based on current opened facilities.
func (s *Solution) Assign() {
    totalDemand := s.TotalDemand
    filled := make([]int, s.N)
    demands := make([]int, s.M)
    for j := range s.Demands {
        demands[j] = s.Demands[j]
    }
    for i := range s.Y {
        for j := range s.Y[i] {
            s.Y[i][j] = 0
        }
    }
    for totalDemand > 0 {
        // find the cheapest cost
        minI := 0
        minJ := 0
        minCost := math.MaxInt32
        for i := range s.Costs {
            if !s.X[i] || filled[i] == s.Capacities[i] {
                continue
            }
            for j := range s.Costs[i] {
                if s.Costs[i][j] < minCost && demands[j] > 0 {
                    minI = i
                    minJ = j
                    minCost = s.Costs[i][j]
                }
            }
        }

        // assign
        var fill int
        if s.Capacities[minI]-filled[minI] <= demands[minJ] {
            // the facility will be full
            fill = s.Capacities[minI] - filled[minI]
        } else {
            // the facility has more capacity then the demand
            fill = demands[minJ]
        }
        demands[minJ] -= fill
        totalDemand -= fill
        s.Y[minI][minJ] += fill
        filled[minI] += fill
    }
}

可见，该过程须要临时保留每个设施被占用的容量以及每个顾客剩余的需求，因此空间复杂度为$O(N+M)$。

时间上，每次循环内需要便利所有符合条件的设施-用户对的费用，循环内时间复杂度为$O(NM)$，而每次循环过后必定有一个用户得到满足或者某个设施占用容量达到最大，所以循环次数不会超过设施数和用户数的总和，因此整个过程的时间复杂度为$O(NM(N+M))$。

Greedy 贪心策略

在选择设施上进行贪心策略显然不是最优的，但贪心策略的优点在于运行速度快，能够很快地得到一个参考结果，用于对比其他算法的解。

贪心策略非常简单，

选择设施中未开启的开启费用最少的设施，开启它。
若开启的设施的总容量小于顾客的总需求，回到1，否则结束。

运行代码片段如下，

func (s *Solver) solveByGreedy() *Solution {
    // initialize a solution
    sol := s.initSolution()

    // find the cheapest facility and turn it on
    minFPos := 0
    minF := math.MaxInt32
    for i, f := range s.p.FixedCosts {
        if f < minF {
            minFPos = i
            minF = f
        }
    }
    // Open the facility
    sol.Open(minFPos)

    // open from cheaper until all demands can be satisfied
    for !sol.Valid() {
        minFPos = 0
        minF = math.MaxInt32
        for i, f := range s.p.FixedCosts {
            if f < minF && !sol.X[i] {
                minFPos = i
                minF = f
            }
        }
        sol.Open(minFPos)
    }

    // assign the demands
    sol.Assign()

    return sol
}

复杂度分析

整个算法的流程如下，

初始化解（$O(MN)$）。
寻找未开启设施中开启费用最低的设施（$O(N)$）。
若开启的设施无法满足所用顾客的需要，返回2。(最多返回$N$次)
需求赋值（$O(MN(M+N))$）。

因此，贪心策略的时间复杂度为$O(N^2 + MN(M+N))$。

由于整个过程仅需要保持一个解，空间复杂度为$O(MN)$。

实验结果

在71个问题的问题集上的实验结果如下，

	Result	Time(s)
p1	12989.953	0.00004
p2	9288.850	0.00007
p3	10488.850	0.00008
p4	11688.850	0.00036
p5	10902.060	0.00008
p6	9832.666	0.00007
p7	11432.666	0.00006
p8	13032.666	0.00016
p9	10980.250	0.00010
p10	10458.043	0.00009
p11	11458.043	0.00006
p12	12458.043	0.00004
p13	11015.048	0.00028
p14	8514.791	0.00006
p15	10114.791	0.00012
p16	11714.791	0.00009
p17	10295.318	0.00006
p18	8543.171	0.00007
p19	10143.171	0.00013
p20	11743.171	0.00009
p21	12235.528	0.00022
p22	11363.414	0.00009
p23	12563.414	0.00005
p24	13763.414	0.00008
p25	78338.667	0.00039
p26	63710.947	0.00032
p27	65310.947	0.00032
p28	66910.947	0.00034
p29	65258.221	0.00040
p30	54927.860	0.00038
p31	56927.860	0.00043
p32	58927.860	0.00145
p33	77701.857	0.00035
p34	53887.154	0.00033
p35	55487.154	0.00035
p36	57087.154	0.00065
p37	101999.196	0.00024
p38	73636.480	0.00020
p39	74636.480	0.00022
p40	75636.480	0.00020
p41	7276.556	0.00021
p42	8068.351	0.00010
p43	6383.929	0.00009
p44	7995.223	0.00021
p45	8089.250	0.00013
p46	6915.251	0.00009
p47	7705.325	0.00015
p48	7874.275	0.00019
p49	6960.522	0.00009
p50	12441.358	0.00025
p51	9382.546	0.00034
p52	13378.456	0.00016
p53	12650.089	0.00019
p54	10536.867	0.00016
p55	12258.408	0.00019
p56	32522.303	0.00113
p57	37322.303	0.00119
p58	48522.303	0.00209
p59	30606.927	0.00110
p60	37243.427	0.00082
p61	40543.427	0.00142
p62	48243.427	0.00082
p63	31843.780	0.00091
p64	34081.806	0.00061
p65	36481.806	0.00061
p66	42081.806	0.00063
p67	32210.447	0.00096
p68	38955.963	0.00073
p69	41955.963	0.00071
p70	48955.963	0.00128
p71	31843.976	0.00096

Brute Force 暴力求解

在$N$比较小时，可以使用暴力求解的方式，即枚举所有设施开启和关闭的可能性（共$2^N$种可能）。暴力求解的时间复杂度为$O(2^N)$，因此当$N$比较大时，这种方法是不适用的，但是暴力求解可以在不清楚最优解的情况下找到一个理论上的最优解，用于比较分析其他求解算法。

暴力求解的代码过程如下，

func (s *Solver) solveByBruteForce() *Solution {
    // initialize a solution
    var bestSol *Solution
    bestCost := math.MaxFloat64

    // enumerate all situations
    total := 1 << uint(s.p.N)
    for c := 0; c < total; c++ {
        // output intermidiate results
        if c%2000 == 0 {
            log.Printf("evaluating %s(%d/%d) best=%.3f\n",
                strconv.FormatInt(int64(c), 2), c, total, bestCost)
        }
        // initialize a solution
        sol := s.initSolution()
        // open the facilities
        for i := range sol.X {
            if 1<<uint(i)&c > 0 {
                sol.Open(i)
            }
        }
        // if the solution cannot satisfy all customers, skip
        if !sol.Valid() {
            continue
        }
        // assign and compute the cost
        sol.Assign()
        cost := sol.Cost()
        if cost < bestCost {
            bestCost = cost
            bestSol = sol
        }
    }

    return bestSol
}

过程中，通过一个整数$c, 0 \le c < 2^N$来表示$N$个设施的唯一一种开关状态，以其二进制数中的1表示开启，0表示关闭，因此遍历c即可遍历所有情况。

时间复杂度：$O(2^N)$
空间复杂度：$O(MN)$

实际实验过程中，当$N > 20$时运行时间已经不可接受，应当使用其它的一些算法。

实验结果

	Result	Time(s)
p1	8899.287	0.01975
p2	7943.402	0.01708
p3	9543.402	0.01890
p4	10947.538	0.01488
p5	8928.489	0.00602
p6	7729.489	0.00627
p7	9529.489	0.00606
p8	11259.081	0.00579
p9	8460.030	0.03576
p10	7617.000	0.03523
p11	9103.286	0.03549
p12	10363.911	0.03088
p13	8245.910	63.38475
p14	7125.641	63.63613
p15	8856.372	62.66329
p16	10456.372	61.17692
p17	8195.925	60.36199
p18	7100.582	69.37253
p19	8803.787	66.69740
p20	10403.787	60.53454
p21	8054.758	64.53812
p22	7092.000	64.51618
p23	8740.758	64.48915
p24	10267.971	64.60072
p25	-	$\infty$
…	-	$\infty$
p71	-	$\infty$

Simulated Annealing 模拟退火

模拟退火算法是一种优秀的局部搜索算法，它在做邻域搜索的同时，概率接受差解以跳出局部最小值。在CFLP中，将设施的一组开关组合视为一个状态，使用模拟退火算法的基本流程如下，

其中，内外层循环终止条件一般设定为经过一定数目的迭代次数后终止。

参数设定

初始温度T=100
内层循环数为1000
外层循环数为1000
温度下降方式为T *= 0.99

初始解的生成

初始全局解应当随机生成，即每个设施按均匀分布决定开或者不开。

邻域搜索

领域搜索即根据当前解生成下一个解，实验过程中发现模拟退火算法里使用多种邻域搜索方式共同作用比单种邻域搜索方式效果要好很多，实验中使用到的备选邻域操作有，

随机开启/关闭某个设施。
随机开启/关闭一段连续的设施。
将某一段连续的设施的开关状态倒置。

当需要生成新状态时，从上述三种方法中随机选择一种。

Metropolis 准则

Metropolis准则是模拟退火算法的核心所在，当这个准则满足时，接受生成的解并替换掉全局解。具体而言，Metropolis准则为

$$
\min\{1, e^{-\frac{C’-C}{T}}\} < R
$$

其中，$C’$为生成的解的费用，$C$为全局解的费用，$T$为温度，$R$为$[0,1]$之间的随机数。

可见，当遇到好解时，$e^{-\frac{C’-C}{T}} > 1$，必定会接收，但当遇到差解时，当温度高时，$e^{-\frac{C’-C}{T}}$比温度低时更加接近1，有概率接收此差解，且随着温度降低，接收差解的概率也降低，最终收敛到一个最优解。

复杂度分析

模拟退火算法的代码过程如下，

func (s *Solver) solveBySA() *Solution {
    // parameters
    bestSol := s.initSolution()
    bestSol.Shuffle()
    for !bestSol.Valid() {
        bestSol.Shuffle()
    }
    bestSol.Assign()
    bestCost := bestSol.Cost()
    nExtIter := 1000
    nInnIter := 1000
    T := 100.0

    // searching
    for ext := 0; ext < nExtIter; ext++ {
        for inn := 0; inn < nInnIter; inn++ {
            sol := CopySolution(bestSol)
            sol.RandomAreaOperate()
            for !sol.Valid() {
                sol.RandomAreaOperate()
            }
            sol.Assign()
            costNext := sol.Cost()
            if math.Min(1, math.Exp(bestCost-costNext)/T) >= rand.Float64() {
                bestSol = sol
                bestCost = costNext
            }
        }
        log.Printf("ExtIter=%d(%d) T=%.3f bestCost=%.3f\n", ext, nExtIter,
            T, bestCost)
        T *= 0.99
    }

    return bestSol
}

其中在内层循环，复制解的复杂度为$O(MN)$，随机邻域操作的平均复杂度为$O(N)$，其余均为常数时间，因此整体算法的时间复杂度为$O(EIMN)$，其中$E$为外层循环数，$I$为内层循环数。

由于整个过程只需保持一个解，空间复杂度为$O(MN)$。

实验结果

在71个问题的问题集上的实验结果如下，

	Result	Time(s)
p1	8899.287	49.67533
p2	7943.402	46.02201
p3	9549.287	43.71652
p4	10947.538	41.95468
p5	8928.489	49.61210
p6	7729.489	49.43371
p7	9529.489	49.90450
p8	11259.650	49.82498
p9	8460.030	38.91396
p10	7617.000	41.85766
p11	9103.286	39.19025
p12	10363.911	39.63912
p13	8245.910	62.59881
p14	7125.641	63.83618
p15	8856.372	56.31368
p16	10456.372	55.16921
p17	8195.925	62.16191
p18	7129.274	74.85285
p19	8803.787	59.92987
p20	10403.787	59.71305
p21	8139.000	56.84904
p22	7124.000	59.69487
p23	8780.306	54.46691
p24	10298.306	50.47274
p25	12233.713	546.33127
p26	11152.713	540.89889
p27	13117.713	548.15456
p28	15183.478	608.56645
p29	13831.838	3797.52276
p30	12366.202	666.75613
p31	14535.702	642.78949
p32	17277.047	629.18618
p33	12174.419	531.56347
p34	11212.667	585.61782
p35	13843.667	565.90143
p36	14570.419	500.77183
p37	11424.636	427.68900
p38	10594.000	512.98266
p39	11976.636	429.34593
p40	13176.636	428.54124
p41	6755.391	100.04381
p42	5680.069	93.40380
p43	5401.000	80.17776
p44	7162.573	113.07130
p45	6244.500	104.98954
p46	5994.292	95.73207
p47	6424.425	127.02732
p48	5594.600	125.31068
p49	5304.300	98.86595
p50	9258.967	139.52326
p51	7748.878	198.40947
p52	9862.039	167.22943
p53	8969.800	165.41922
p54	9331.120	136.80022
p55	7789.517	157.08357
p56	21422.086	1343.04327
p57	26388.956	1220.42790
p58	37722.381	1176.16916
p59	27645.265	1169.97743
p60	20923.000	1212.31783
p61	25621.657	1052.55700
p62	33197.623	957.01278
p63	25287.956	975.96980
p64	20834.000	1602.66377
p65	25093.000	868.34070
p66	32522.450	762.51041
p67	24932.180	900.71575
p68	20884.000	1051.84617
p69	25202.000	937.94478
p70	32858.727	845.59141
p71	25924.628	923.64509

综合对比

三种算法在71个问题上的实验结果如下，您可以点击此链接下载所有解的详细结果，包括解的最终费用，开启的设施以及顾客的赋值情况。

	Result(greedy)	Time(s)	Result(brute-force)	Time(s)	Result(sa)	Time(s)
p1	12989.953	0.00004	8899.287	0.01975	8899.287	49.67533
p2	9288.850	0.00007	7943.402	0.01708	7943.402	46.02201
p3	10488.850	0.00008	9543.402	0.01890	9549.287	43.71652
p4	11688.850	0.00036	10947.538	0.01488	10947.538	41.95468
p5	10902.060	0.00008	8928.489	0.00602	8928.489	49.61210
p6	9832.666	0.00007	7729.489	0.00627	7729.489	49.43371
p7	11432.666	0.00006	9529.489	0.00606	9529.489	49.90450
p8	13032.666	0.00016	11259.081	0.00579	11259.650	49.82498
p9	10980.250	0.00010	8460.030	0.03576	8460.030	38.91396
p10	10458.043	0.00009	7617.000	0.03523	7617.000	41.85766
p11	11458.043	0.00006	9103.286	0.03549	9103.286	39.19025
p12	12458.043	0.00004	10363.911	0.03088	10363.911	39.63912
p13	11015.048	0.00028	8245.910	63.38475	8245.910	62.59881
p14	8514.791	0.00006	7125.641	63.63613	7125.641	63.83618
p15	10114.791	0.00012	8856.372	62.66329	8856.372	56.31368
p16	11714.791	0.00009	10456.372	61.17692	10456.372	55.16921
p17	10295.318	0.00006	8195.925	60.36199	8195.925	62.16191
p18	8543.171	0.00007	7100.582	69.37253	7129.274	74.85285
p19	10143.171	0.00013	8803.787	66.69740	8803.787	59.92987
p20	11743.171	0.00009	10403.787	60.53454	10403.787	59.71305
p21	12235.528	0.00022	8054.758	64.53812	8139.000	56.84904
p22	11363.414	0.00009	7092.000	64.51618	7124.000	59.69487
p23	12563.414	0.00005	8740.758	64.48915	8780.306	54.46691
p24	13763.414	0.00008	10267.971	64.60072	10298.306	50.47274
p25	78338.667	0.00039	-	$\infty$	12233.713	546.33127
p26	63710.947	0.00032	-	$\infty$	11152.713	540.89889
p27	65310.947	0.00032	-	$\infty$	13117.713	548.15456
p28	66910.947	0.00034	-	$\infty$	15183.478	608.56645
p29	65258.221	0.00040	-	$\infty$	13831.838	3797.52276
p30	54927.860	0.00038	-	$\infty$	12366.202	666.75613
p31	56927.860	0.00043	-	$\infty$	14535.702	642.78949
p32	58927.860	0.00145	-	$\infty$	17277.047	629.18618
p33	77701.857	0.00035	-	$\infty$	12174.419	531.56347
p34	53887.154	0.00033	-	$\infty$	11212.667	585.61782
p35	55487.154	0.00035	-	$\infty$	13843.667	565.90143
p36	57087.154	0.00065	-	$\infty$	14570.419	500.77183
p37	101999.196	0.00024	-	$\infty$	11424.636	427.68900
p38	73636.480	0.00020	-	$\infty$	10594.000	512.98266
p39	74636.480	0.00022	-	$\infty$	11976.636	429.34593
p40	75636.480	0.00020	-	$\infty$	13176.636	428.54124
p41	7276.556	0.00021	-	$\infty$	6755.391	100.04381
p42	8068.351	0.00010	-	$\infty$	5680.069	93.40380
p43	6383.929	0.00009	-	$\infty$	5401.000	80.17776
p44	7995.223	0.00021	-	$\infty$	7162.573	113.07130
p45	8089.250	0.00013	-	$\infty$	6244.500	104.98954
p46	6915.251	0.00009	-	$\infty$	5994.292	95.73207
p47	7705.325	0.00015	-	$\infty$	6424.425	127.02732
p48	7874.275	0.00019	-	$\infty$	5594.600	125.31068
p49	6960.522	0.00009	-	$\infty$	5304.300	98.86595
p50	12441.358	0.00025	-	$\infty$	9258.967	139.52326
p51	9382.546	0.00034	-	$\infty$	7748.878	198.40947
p52	13378.456	0.00016	-	$\infty$	9862.039	167.22943
p53	12650.089	0.00019	-	$\infty$	8969.800	165.41922
p54	10536.867	0.00016	-	$\infty$	9331.120	136.80022
p55	12258.408	0.00019	-	$\infty$	7789.517	157.08357
p56	32522.303	0.00113	-	$\infty$	21422.086	1343.04327
p57	37322.303	0.00119	-	$\infty$	26388.956	1220.42790
p58	48522.303	0.00209	-	$\infty$	37722.381	1176.16916
p59	30606.927	0.00110	-	$\infty$	27645.265	1169.97743
p60	37243.427	0.00082	-	$\infty$	20923.000	1212.31783
p61	40543.427	0.00142	-	$\infty$	25621.657	1052.55700
p62	48243.427	0.00082	-	$\infty$	33197.623	957.01278
p63	31843.780	0.00091	-	$\infty$	25287.956	975.96980
p64	34081.806	0.00061	-	$\infty$	20834.000	1602.66377
p65	36481.806	0.00061	-	$\infty$	25093.000	868.34070
p66	42081.806	0.00063	-	$\infty$	32522.450	762.51041
p67	32210.447	0.00096	-	$\infty$	24932.180	900.71575
p68	38955.963	0.00073	-	$\infty$	20884.000	1051.84617
p69	41955.963	0.00071	-	$\infty$	25202.000	937.94478
p70	48955.963	0.00128	-	$\infty$	32858.727	845.59141
p71	31843.976	0.00096	-	$\infty$	25924.628	923.64509

可以发现，模拟退火的算法要远好于贪心算法，并且与有结果的暴力求解的结果十分相近。

【LeetCode】233. Number of Digit One

Posted on 2018-12-01 | Edited on 2024-10-23

Problem Link

概览

给定一个整数n，求所有小于等于n的非负整数中数字1的个数之和。

暴力求解（TLE）

本题可使用暴力求解枚举每一个小于等于n的非负整数，虽然C语言给定的参数是int，但穷举依然会超时。

时间复杂度：$O(n)$
空间复杂度：$O(1)$

动态规划

标答中给出了一种基于数学的方法，但本题依然可使用动态规划的思想求解。

状态设计：

令DP[i]表示前i位数的答案。

状态转移：

这样的状态设计可以使复杂度降至$O(d)$，$d$表示n的位数，但是状态转移过程不单单涉及到前一状态，还涉及到一些已知的结论。

令digits[i]表示第i + 1个数字，nums[i]表示前i + 1位数组成的数字（i低到高，i = 0, 1, 2, ..., d - 1），nines[i]表示i个9组成的数字的答案，则有

nines[i] = nines[i - 1] * 10 + pow(10, i - 1)；
当digits[i] == 0时，DP[i] = DP[i - 1]；
当digits[i] == 1时，DP[i] = DP[i - 1] + nums[i - 1] + 1 + nines[i]；
当digits[i] > 1时，DP[i] = DP[i - 1] + nines[i] * digits[i] + pow(10, i)。

具体来说，一个位数为$d$的数$n$可以做如下分解：

$$
n = n_{(d)} * 10^{d - 1} + n’
$$

其中$n’$表示去除最高位所得的数，$n_{(i)}$为整数$n$第$i$位的数字。在算法中DP[i - 1]即表达了数$n’$对应的答案，剩余部分可分为三种情况：

$n_{(d)} = 0$，这种情况只会不会出现在最高位，直接相等即可。
$n_{(d)} = 1$，则对于任意$0 \le k \le n’$，首位的1都会计算一次次数，这部分的次数为nums[i - 1] + 1次；同时，$10^{d - 1}$的部分则为$d - 1$个$9$组成的数字的答案。
$n_{(d)} > 1$，则$n_{(d)} * 10^{d - 1}$的部分包括$n_{(d)}$个$d - 1$个$9$组成的数字的答案以及经过第$d$位为1时数字的个数共$10^{d - 1}$个。

综上所述，算法的时间复杂度为$O(d)$，空间复杂度为$O(d)$，其中$d$为数字的位数。

附录

我的解答

【LeetCode】221. Maximal Square

Posted on 2018-12-01 | Edited on 2024-10-23

Problem Link

概览

一个典型的2D平面上的动态规划问题。给定一个仅有0和1矩阵，找出其中最大的，仅包含1的正方形。

动态规划

状态定义：

定义DP[i][j]表示矩阵M中以位置(i, j)为右下角的最大的正方形的边长。

状态转移方程：

由上述状态定义可知，当M[i][j] == 0时，DP[i][j] = 0；否则DP[i][j]由DP[i - 1][j]、DP[i][j - 1]、DP[i - 1][j - 1]中的最小值决定——考虑这三个值相等时，DP[i][j]可以组合使边长增加1，同时若这三个值不同时增加，都无法再增加DP[i][j]的值。

因此，状态转移方程表示为：

1
2
3

DP[i][j] == 0 if M[i][j] == 0
DP[i][j] == M[i][j] if i == 0 || j == 0
DP[i][j] == min{DP[i - 1][j], DP[i][j - 1], DP[i - 1][j - 1]} + 1 if i > 0 && j > 0 && M[i][j] == 1

算法的时间复杂度为$O(MN)$，空间复杂度为$O(MN)$，其中输入矩阵为$M \times N$的矩阵。

附录

我的解答

【LeetCode】857. Minimum Cost to Hire K Workers

Posted on 2018-12-01 | Edited on 2024-10-23

Problem Link

概览

给定N个工人，每个工人有一个quality值和一个wage值，要求在N个工人中雇佣K个人，并按照如下标准发放工资：

每个人被发放的工资pay与quality的比值必须相同。
每个人被发放的工资pay必须大于wage。

求总工资的最小值。

题解

首先，对题目中的两个条件进行解读。

由题中第一个条件可知：

选定的K个人中，每人发放的实际工资pay与quality的比值相同。

定义价性比ratio = wage / quality，表达每个工人单位quality期望的工资，定义实际价性比ratio_ = pay / quality，那么由题中第二个条件可知：

选定的K个人中，每人发放的实际工资pay与quality的比值必须大于ratio。

综上所述，可得本题关键性的信息：

选定的K个人中，ratio_ >= max{ratio}，即实际价性比不得低于雇佣的工人中最高的期望价性比。

又因为总工资cost = ratio_ * sum(quality)，所以要使得总工资尽可能的小，就要使ratio_和sum(quality)尽可能小，因此可得具体的算法流程如下：

对所有人按价性比进行排序。
选择价性比最低的前K个人作为初始集，计算总工资。
从第K + 1个人开始，循环如下过程直至无人可以添加：
1. 踢出原集合中quality最大的工人，并添加该新工人。
2. 重新计算quality的总和。
3. 重新计算总工资，若小于当前最小值，则更新该值。

上述算法本质即遍历所有可能的价性比，找出最小的总工资。

复杂度分析

本题的空间复杂度为$O(N)$，但时间复杂度并不那么直观，需要逐步分析。

对所有人的价性比进行排序。（$O(NlogN)$）
选择价性比最低的前K个人作为初始集，计算总工资。（$O(N)$）
从第K + 1个人开始，循环如下过程直至无人可以添加：（$O(N)$）
1. 踢出原集合中quality最大的工人，并添加该新工人。（$O(logN)$）
2. 重新计算quality的总和。（$O(1)$）
3. 重新计算总工资，若小于当前最小值，则更新该值。（$O(1)$）

其中步骤3.1的复杂度依赖于具体实现方式，使用优先队列实现可以保证复杂度为$O(logN)$。

步骤3.2只需要取出踢出的工人和新工人的quality值即可计算，复杂度为常数时间。

步骤3.3中用新工人的ratio作为总ratio_进行计算，复杂度为常数时间。

综上所述，时间复杂度为$O(NlogN)$，空间复杂度为$O(N)$。

附录

我的解答

【LeetCode】72. Edit Distance

Posted on 2018-12-01 | Edited on 2024-10-23

Problem Link

概览

编辑距离问题是经典的实用问题。给定一个字符串，定义在该字符串的三种操作：

插入一个字符
删除一个字符
替换一个字符

那么两个字符串A和B的编辑距离定义为通过上述三种操作将A编辑成B的最少操作数。

问题给定两个字符串，求其之间的编辑距离。

动态规划

编辑距离问题也是一个经典的动态规划问题，直观上很难求解的两个字符串的编辑距离通过状态转移就能轻易解出。

定义状态：

DP[i][j]表示A的长度为i的前缀到B的长度为j的前缀的编辑距离；那么答案为DP[|A|][|B|]。

定义状态转移方程：

DP[i][j]可以通过DP[i-1][j]，DP[i][j-1]以及DP[i-1][j-1]求得，具体来说：

当A[i-1]与B[j-1]相等时，DP[i][j] = DP[i-1][j-1]。
当A[i-1]与B[j-1]不等时，考虑下面三种情况的最小值：
- DP[i-1][j] + 1，由A的前i-1个字符变为B的前j个字符并删除A[i]；
- DP[i][j-1] + 1，由A的前i个字符变为B的前j-1个字符并新增B[j]；
- DP[i-1][j-1] + 1，由A的前i-1个字符变为B的前j-1个字符并替换A[i]。

综上，可得状态转移方程如下：

DP[0][j] = j for all j in 0, 1, ..., |B|
DP[i][0] = i for all i in 0, 1, ..., |A|
DP[i][j] = DP[i - 1][j - 1] if A[i - 1] == B[j - 1]
DP[i][j] = min{DP[i - 1][j], DP[i][j - 1], DP[i - 1][j - 1]} + 1 if A[i - 1] != B[j - 1]

时间复杂度：$O(|A||B|)$
空间复杂度：$O(|A||B|)$

附录

我的解答

【LeetCode】773. Sliding Puzzle

Posted on 2018-11-19 | Edited on 2024-10-23

Problem Link

概览

LeetCode第773号问题是一个“六数码”问题，跟人们熟知的八数码问题唯一区别在于棋盘大小。本问题是一个在2*3的棋盘上的六数码问题，最大的状态数为全排列的个数6!=720，因此可以放心使用宽度优先搜索寻找答案。

宽度优先搜索

定义一个棋盘“局面”为图中的一个点，从问题给定的初始结点出发进行宽度优先搜索，每次搜索一步能到达的所有相邻结点。

除了每次搜索相邻所有节点以外，还要记住所有已经访问过的节点，避免重复访问造成死循环。

宽度优先搜索以队列作为主要数据结构实现，实现过程中为了优化速度，可以将状态使用更加高效的方式表示，例如用字符串或一维的排列表示要优于使用二维数组表示。

附录

我的解答