High-Availability Architecture
2026/3/20 · about 9 min read
High-Availability Concepts
Single Point of Failure (SPOF)
Availability Metrics
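| Availability | Downtime per year | Typical use |
|---|---|---|
| 99% (two nines) | 3.65 days | Internal systems |
| 99.9% (three nines) | 8.76 hours | Ordinary business systems |
| 99.99% (four nines) | 52.56 minutes | Important business systems |
| 99.999% (five nines) | 5.26 minutes | Core financial and telecom systems |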
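The downtime figures in the table follow directly from the availability percentage. As a minimal sketch, assuming a 365-day year (525600 minutes), the hypothetical helper `downtime_minutes` below reproduces them:

```shell
#!/bin/sh
# Convert an availability percentage into the allowed downtime per year, in minutes.
# Assumption: a 365-day year, i.e. 365 * 24 * 60 = 525600 minutes.
downtime_minutes() {
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 525600 }'
}

# Print the yearly downtime budget for each row of the table
for availability in 99 99.9 99.99 99.999; do
    printf '%s%% -> %s minutes/year\n' "$availability" "$(downtime_minutes "$availability")"
done
```

For example, 99.99% leaves about 52.56 minutes per year, and 99% leaves 5256 minutes, which is 3.65 days.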
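High-Availability Design Principles
- Eliminate single points of failure: every critical component needs redundancy
- Automatic failure detection: failures are discovered without human intervention
- Automatic failover: traffic switches to a standby node automatically
- Data consistency: data stays intact during failover
- Minimal downtime: failover completes quickly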
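Keepalived + Nginx High Availability
How Keepalived Works
Normal state: the Master holds the VIP (virtual IP) and handles all requests
Failover: when the Master fails, the Backup takes over the VIP automatically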
Installing Keepalived
# CentOS
sudo yum install -y keepalived
# Ubuntu
sudo apt install -y keepalived
# Build from source
wget https://www.keepalived.org/software/keepalived-2.2.8.tar.gz
tar -xzf keepalived-2.2.8.tar.gz
cd keepalived-2.2.8
./configure --prefix=/usr/local/keepalived
make && sudo make install
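Active-Standby Configuration
Master node configuration:
# /etc/keepalived/keepalived.conf (Master)
! Configuration File for keepalived
global_defs {
    router_id nginx_master          # Router ID, unique per node
    script_user root                # User that runs the check and notify scripts
    enable_script_security          # Refuse scripts that non-root users can modify
}
# Health check script
vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2                      # Check interval (seconds)
    weight -20                      # Priority reduction while the check is failing
    fall 2                          # Consecutive failures before the check counts as failed
    rise 1                          # Consecutive successes before it counts as healthy again
}
vrrp_instance VI_1 {
    state MASTER                    # Initial state
    interface eth0                  # Network interface name
    virtual_router_id 51            # Virtual router ID (must match across the cluster)
    priority 100                    # Priority (Master must be higher than Backup)
    advert_int 1                    # Advertisement (heartbeat) interval (seconds)
    # nopreempt                     # Non-preemptive mode (optional). Keepalived ignores it when the
    #                               # initial state is MASTER; to use it, set state BACKUP on both nodes.
    # Authentication
    authentication {
        auth_type PASS
        auth_pass 1234              # Shared password (must match on both nodes)
    }
    # Virtual IP
    virtual_ipaddress {
        192.168.1.100/24 dev eth0   # VIP address
    }
    # Tracked scripts
    track_script {
        check_nginx
    }
    # State change notifications
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}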
Backup node configuration:
# /etc/keepalived/keepalived.conf (Backup)
! Configuration File for keepalived
global_defs {
    router_id nginx_backup
    script_user root
    enable_script_security
}
vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -20
    fall 2
    rise 1
}
vrrp_instance VI_1 {
    state BACKUP                    # Backup state
    interface eth0
    virtual_router_id 51            # Must match the Master
    priority 90                     # Lower than the Master
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        check_nginx
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
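The interaction between `priority` and `weight` decides whether a failed health check actually moves the VIP. The sketch below only models that arithmetic; `effective_priority` and `elect_master` are hypothetical helper names, and tie-breaking between equal priorities is not modeled.

```shell
#!/bin/sh
# effective_priority BASE WEIGHT CHECK_OK
# With a negative weight, a failing check (CHECK_OK=0) lowers the advertised priority.
effective_priority() {
    base=$1 weight=$2 check_ok=$3
    if [ "$check_ok" -eq 1 ]; then
        echo "$base"
    else
        echo $((base + weight))
    fi
}

# elect_master PRIO_A PRIO_B: the node advertising the higher priority owns the VIP
elect_master() {
    if [ "$1" -gt "$2" ]; then echo "node-a"; else echo "node-b"; fi
}

# Nginx check failing on the Master (100 - 20 = 80), passing on the Backup (90)
master=$(effective_priority 100 -20 0)
backup=$(effective_priority 90 -20 1)
echo "master=$master backup=$backup owner=$(elect_master "$master" "$backup")"
```

With the values from the configuration above, the Master drops to 80, below the Backup's 90, so the Backup takes over. If the absolute weight were smaller than the priority gap (for example `weight -5`), the Master would keep the VIP even with Nginx down.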
Health Check Script
#!/bin/bash
# /etc/keepalived/check_nginx.sh
# Check whether the Nginx master process exists
nginx_process=$(pgrep -f "nginx: master")
if [ -z "$nginx_process" ]; then
    # Nginx is not running; try to start it
    systemctl start nginx
    sleep 2
    # Check again
    nginx_process=$(pgrep -f "nginx: master")
    if [ -z "$nginx_process" ]; then
        exit 1  # Start failed; report failure
    fi
fi
# Check that Nginx is listening on port 80
if ! ss -tlnp | grep -q ":80 "; then
    exit 1
fi
# Check that Nginx responds (assumes a /health location is configured);
# --max-time keeps a hung request from stalling the check
response=$(curl -s -o /dev/null --max-time 1 -w "%{http_code}" http://127.0.0.1/health)
if [ "$response" != "200" ]; then
    exit 1
fi
exit 0
# Make the script executable
chmod +x /etc/keepalived/check_nginx.sh
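The `fall 2` and `rise 1` settings mean a single failed check does not change the tracked state. As a rough model of that counting (not Keepalived's actual code; the hypothetical `simulate_checks` assumes the script starts in the "up" state):

```shell
#!/bin/sh
# simulate_checks FALL RISE RESULT...
# RESULT is a check exit code: 0 = pass, anything else = fail.
# Prints the tracked state after each check, separated by spaces.
simulate_checks() {
    fall=$1 rise=$2
    shift 2
    state=up fails=0 oks=0 out=
    for rc in "$@"; do
        if [ "$rc" -eq 0 ]; then
            oks=$((oks + 1)); fails=0
            if [ "$state" = down ] && [ "$oks" -ge "$rise" ]; then state=up; fi
        else
            fails=$((fails + 1)); oks=0
            if [ "$state" = up ] && [ "$fails" -ge "$fall" ]; then state=down; fi
        fi
        out="${out:+$out }$state"
    done
    echo "$out"
}

# fall=2, rise=1: one isolated failure is tolerated, two in a row are not
simulate_checks 2 1 0 1 0 1 1 0
```

The isolated failure in the second check leaves the state "up"; only the two consecutive failures flip it to "down", and a single success brings it back.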
State Notification Script
#!/bin/bash
# /etc/keepalived/notify.sh
STATE=$1
VIP="192.168.1.100"
LOGFILE="/var/log/keepalived-notify.log"
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOGFILE"
}
case "$STATE" in
    "master")
        log "Transition to MASTER state"
        # Optionally send an alert
        # curl -X POST "https://alert.example.com/webhook" -d "msg=Nginx became MASTER"
        ;;
    "backup")
        log "Transition to BACKUP state"
        ;;
    "fault")
        log "Transition to FAULT state"
        # Send an urgent alert
        ;;
esac
Active-Active (Dual-Master) Configuration
# Node A configuration
# /etc/keepalived/keepalived.conf
global_defs {
    router_id nginx_node_a
}
vrrp_script check_nginx {
    script "/etc/keepalived/check_nginx.sh"
    interval 2
    weight -20
}
# VIP 1: node A is Master
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }
    track_script {
        check_nginx
    }
}
# VIP 2: node A is Backup
vrrp_instance VI_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 5678
    }
    virtual_ipaddress {
        192.168.1.101/24 dev eth0
    }
    track_script {
        check_nginx
    }
}
# Node B configuration: the mirror image of node A
# VI_1 is BACKUP, VI_2 is MASTER
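Node B's two instances can be written out by swapping the roles and priorities of node A. A minimal sketch follows, using a hypothetical `render_instance` helper that prints one `vrrp_instance` block; it assumes node B uses priority 90 for VI_1 and 100 for VI_2, and it leaves out `global_defs` and `vrrp_script`, which stay the same apart from `router_id`.

```shell
#!/bin/sh
# render_instance NAME STATE VRID PRIORITY AUTH_PASS VIP
# Prints one vrrp_instance block in the same shape as node A's configuration.
render_instance() {
    cat <<EOF
vrrp_instance $1 {
    state $2
    interface eth0
    virtual_router_id $3
    priority $4
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass $5
    }
    virtual_ipaddress {
        $6/24 dev eth0
    }
    track_script {
        check_nginx
    }
}
EOF
}

# Node B mirrors node A: Backup for VIP 1, Master for VIP 2
render_instance VI_1 BACKUP 51 90 1234 192.168.1.100
render_instance VI_2 MASTER 52 100 5678 192.168.1.101
```

The `virtual_router_id` and `auth_pass` of each instance must match node A's, since both nodes take part in the same two VRRP groups.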
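Failover Test
# 1. Check which node holds the VIP
ip addr show eth0
# 2. Check Keepalived status and logs (/var/log/syslog on Ubuntu)
systemctl status keepalived
grep -i keepalived /var/log/messages
# 3. Simulate a failure: stop Nginx on the Master
# Note: check_nginx.sh above restarts Nginx when it is down, so the VIP only moves
# if that restart fails. To force a failover, stop Keepalived on the Master instead
# (systemctl stop keepalived).
systemctl stop nginx
# 4. Watch the VIP move (run on both nodes)
ip addr show eth0 | grep 192.168.1.100
# 5. Restore the service
systemctl start nginx
# 6. Capture VRRP advertisements
tcpdump -i eth0 vrrp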
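When repeating step 4 during a test, a small helper makes the result easier to read than raw `grep` output. This is a sketch: `holds_vip` is a hypothetical name, and the sample line only imitates the one-line format printed by `ip -o -4 addr show`.

```shell
#!/bin/sh
# holds_vip VIP
# Reads `ip -o -4 addr show` output on stdin; succeeds if VIP is bound to an interface.
holds_vip() {
    grep -Fq "inet $1/"
}

# On a real node: ip -o -4 addr show dev eth0 | holds_vip 192.168.1.100
# Illustrative input so the sketch runs anywhere:
sample='2: eth0    inet 192.168.1.100/24 scope global secondary eth0'
if printf '%s\n' "$sample" | holds_vip 192.168.1.100; then
    echo "this node holds the VIP"
else
    echo "this node does not hold the VIP"
fi
```

Matching on `inet <VIP>/` rather than the bare address avoids a false positive when another address merely starts with the same digits.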
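LVS + Nginx Architecture
Three LVS Forwarding Modes
NAT mode:
- Both requests and responses pass through the LVS director
- The director is the bottleneck
DR (Direct Routing) mode:
- Requests pass through the director; responses go straight back to the client
- Best performance of the three; recommended when the director and real servers share a layer-2 network
TUN (IP tunneling) mode:
- Similar to DR, but requests are forwarded through an IP tunnel
- Real servers can sit on different network segments
LVS + Keepalived + Nginx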
# /etc/keepalived/keepalived.conf (LVS Director)
global_defs {
    router_id LVS_MASTER
}
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        192.168.1.100/24
    }
}
# LVS virtual server configuration
virtual_server 192.168.1.100 80 {
    delay_loop 6                    # Health check interval (seconds)
    lb_algo rr                      # Scheduling algorithm: rr/wrr/lc/wlc/sh/dh
    lb_kind DR                      # LVS mode: NAT/DR/TUN
    persistence_timeout 0           # Session persistence time (seconds)
    protocol TCP
    # Real server 1
    real_server 192.168.1.101 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            retry 3                 # Replaces nb_get_retry, which Keepalived 2.x deprecates
            delay_before_retry 3
            connect_port 80
        }
    }
    # Real server 2
    real_server 192.168.1.102 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}
Nginx (real server) configuration for DR mode:
# Bind the VIP to the loopback interface (RHEL/CentOS network-scripts)
cat > /etc/sysconfig/network-scripts/ifcfg-lo:0 << 'EOF'
DEVICE=lo:0
IPADDR=192.168.1.100
NETMASK=255.255.255.255
BROADCAST=192.168.1.100
ONBOOT=yes
NAME=loopback
EOF
# Suppress ARP replies and announcements for the VIP,
# so that only the director answers ARP requests for it
cat >> /etc/sysctl.conf << 'EOF'
net.ipv4.conf.lo.arp_ignore = 1
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
EOF
sysctl -p
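A missing ARP setting on a real server is a common reason DR mode misbehaves, so it can help to verify the four values after applying them. The sketch below is one way to do that; `check_arp_sysctls` is a hypothetical helper that reads `key = value` lines (the format `sysctl -a` prints) from stdin.

```shell
#!/bin/sh
# check_arp_sysctls
# Reads "key = value" lines on stdin and reports every DR-mode ARP setting
# that is absent or has a different value. Returns non-zero if any is missing.
check_arp_sysctls() {
    input=$(cat)
    status=0
    for want in \
        "net.ipv4.conf.lo.arp_ignore = 1" \
        "net.ipv4.conf.lo.arp_announce = 2" \
        "net.ipv4.conf.all.arp_ignore = 1" \
        "net.ipv4.conf.all.arp_announce = 2"
    do
        if ! printf '%s\n' "$input" | grep -Fxq "$want"; then
            echo "missing: $want"
            status=1
        fi
    done
    return "$status"
}

# On a real server: sysctl -a 2>/dev/null | check_arp_sysctls
# Illustrative input with one setting left at its default:
check_arp_sysctls <<'EOF'
net.ipv4.conf.lo.arp_ignore = 1
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 0
EOF
```

The illustrative input reports `net.ipv4.conf.all.arp_announce = 2` as missing, because the line present carries the value 0.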
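Nginx Smooth Upgrade
Hot Upgrade Steps
# 1. Back up the old binary
cp /usr/sbin/nginx /usr/sbin/nginx.old
# 2. Build the new version (do not run make install)
cd /usr/local/src/nginx-1.25.0
./configure --prefix=/usr/local/nginx --with-http_ssl_module ...
make
# 3. Replace the binary (-f replaces the file even though the old binary is still running)
cp -f objs/nginx /usr/sbin/nginx
# 4. Verify the new binary
/usr/sbin/nginx -t
/usr/sbin/nginx -V
# 5. Send USR2 to start a new master
kill -USR2 $(cat /run/nginx.pid)
# 6. List the processes
ps aux | grep nginx
# There should now be two master processes
# 7. Gracefully shut down the old workers
kill -WINCH $(cat /run/nginx.pid.oldbin)
# 8. Once the new version is confirmed healthy, shut down the old master
kill -QUIT $(cat /run/nginx.pid.oldbin)
# 9. To roll back instead (only possible before step 8, while the old master is still running)
kill -HUP $(cat /run/nginx.pid.oldbin)  # Old master starts its workers again
kill -QUIT $(cat /run/nginx.pid)        # Shut down the new master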
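Signal Control Flow
Smooth upgrade signal flow:
1. USR2 → old master → starts a new master → the new master starts new workers
2. WINCH → old master → gracefully shuts down the old workers
3. QUIT → old master → the old master exits
Rollback flow:
1. HUP → old master → restarts the old workers
2. QUIT → new master → the new master and its workers exit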
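After USR2, Nginx renames the old master's PID file with an `.oldbin` suffix, so the presence of that file shows whether an upgrade is still in flight and a rollback is still possible. A minimal sketch, with `upgrade_phase` as a hypothetical helper; note that a PID file can be stale, so this only inspects files and does not prove the process is alive.

```shell
#!/bin/sh
# upgrade_phase PID_FILE
# Reports the upgrade phase based on which PID files exist.
upgrade_phase() {
    pid_file=$1
    if [ -f "$pid_file.oldbin" ]; then
        echo "upgrade-in-progress"   # old master still present; rollback possible
    elif [ -f "$pid_file" ]; then
        echo "single-master"         # normal operation, or upgrade finished
    else
        echo "not-running"
    fi
}

upgrade_phase /run/nginx.pid
```

Checking the phase before sending QUIT or HUP helps avoid signalling a process that no longer exists.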
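Configuration Management
Git Version Control
# Initialize the configuration repository (-b requires Git 2.28 or later)
cd /etc/nginx
git init -b main
git add .
git commit -m "Initial nginx config"
# Create a branch for the configuration change
git checkout -b feature/add-new-site
# After editing the configuration, test it before committing
nginx -t
git add .
git commit -m "Add new site configuration"
# Merge into the main branch
git checkout main
git merge feature/add-new-site
# Apply the configuration
nginx -s reload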
Ansible Automation
# nginx-playbook.yml
---
- hosts: nginx_servers
  become: yes
  vars:
    nginx_version: "1.24.0"
  tasks:
    - name: Install Nginx
      package:
        name: nginx
        state: present
    - name: Copy Nginx configuration
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        backup: yes
      notify: Reload Nginx
    - name: Copy site configurations
      template:
        src: "{{ item }}"
        # Strip the .j2 suffix so the rendered file is named *.conf
        dest: "/etc/nginx/conf.d/{{ item | basename | regex_replace('\\.j2$', '') }}"
      with_fileglob:
        - templates/sites/*.conf.j2
      notify: Reload Nginx
    - name: Test Nginx configuration
      command: nginx -t
      changed_when: false
  handlers:
    - name: Reload Nginx
      service:
        name: nginx
        state: reloaded
Configuration Hot-Reload Script
#!/bin/bash
# /usr/local/bin/nginx-reload.sh
CONFIG_DIR="/etc/nginx"
BACKUP_DIR="/etc/nginx/backup"
DATE=$(date +%Y%m%d%H%M%S)
# Create a backup
mkdir -p "$BACKUP_DIR"
tar -czf "$BACKUP_DIR/nginx_config_$DATE.tar.gz" "$CONFIG_DIR"/*.conf "$CONFIG_DIR/conf.d/"
# Test the configuration
if nginx -t; then
    echo "Configuration test passed"
    # Reload the configuration
    if nginx -s reload; then
        echo "Nginx reloaded successfully"
        exit 0
    else
        echo "Nginx reload failed"
        exit 1
    fi
else
    echo "Configuration test failed"
    exit 1
fi
Containerized Deployment
Docker Deployment
# Dockerfile
FROM nginx:1.24-alpine
# Copy configuration
COPY nginx.conf /etc/nginx/nginx.conf
COPY conf.d/ /etc/nginx/conf.d/
# Copy static files
COPY html/ /usr/share/nginx/html/
# Health check
HEALTHCHECK \
    CMD curl -f http://localhost/health || exit 1
EXPOSE 80 443
CMD ["nginx", "-g", "daemon off;"]
# Build the image
docker build -t my-nginx:1.0 .
# Run the container
docker run -d \
    --name nginx \
    -p 80:80 \
    -p 443:443 \
    -v /data/nginx/conf:/etc/nginx/conf.d:ro \
    -v /data/nginx/logs:/var/log/nginx \
    -v /data/nginx/html:/usr/share/nginx/html:ro \
    --restart unless-stopped \
    my-nginx:1.0
Docker Compose with Multiple Instances
# docker-compose.yml
version: "3.8"
services:
  nginx-1:
    image: nginx:1.24
    container_name: nginx-1
    ports:
      - "8081:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./conf.d:/etc/nginx/conf.d:ro
      - ./html:/usr/share/nginx/html:ro
      - ./logs/nginx-1:/var/log/nginx
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 3s
      retries: 3
    restart: unless-stopped
    networks:
      - nginx-net
  nginx-2:
    image: nginx:1.24
    container_name: nginx-2
    ports:
      - "8082:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./conf.d:/etc/nginx/conf.d:ro
      - ./html:/usr/share/nginx/html:ro
      - ./logs/nginx-2:/var/log/nginx
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 3s
      retries: 3
    restart: unless-stopped
    networks:
      - nginx-net
  haproxy:
    image: haproxy:2.8
    container_name: haproxy
    ports:
      - "80:80"
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    depends_on:
      - nginx-1
      - nginx-2
    restart: unless-stopped
    networks:
      - nginx-net
networks:
  nginx-net:
    driver: bridge
Kubernetes Ingress Controller
# nginx-ingress.yaml
# Simplified example: a working deployment also needs the ServiceAccount, RBAC rules,
# and IngressClass from the official ingress-nginx manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress
    spec:
      containers:
        - name: nginx-ingress-controller
          # k8s.gcr.io is frozen; controller v1.8.0 is published on registry.k8s.io
          image: registry.k8s.io/ingress-nginx/controller:v1.8.0
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
            limits:
              cpu: 1
              memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
  selector:
    app: nginx-ingress
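Common High-Availability Architecture Diagrams
Two-Tier Architecture
Three-Tier Architecture
Summary
This chapter covered Nginx high-availability architecture:
- High-availability concepts: single points of failure, availability metrics, design principles
- Keepalived + Nginx: the VRRP protocol, active-standby mode, active-active mode
- LVS + Nginx: NAT/DR/TUN modes, load-balancing configuration
- Smooth upgrade: hot upgrade steps, signal control, rollback
- Configuration management: Git version control, Ansible automation
- Containerized deployment: Docker, Docker Compose, Kubernetes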