# Binary Installation of Kubernetes (k8s) v1.24.0

## Environment Preparation

| Hostname | Role | IP | Installed Software |
|---|---|---|---|
| k8s-master.boysec.cn | master node | 10.1.1.100 | etcd, kubelet, kube-proxy, kube-apiserver, kube-controller-manager, kube-scheduler, containerd |
| k8s-node01.boysec.cn | worker node | 10.1.1.120 | etcd, kubelet, kube-proxy, containerd |
| k8s-node02.boysec.cn | worker node | 10.1.1.130 | etcd, kubelet, kube-proxy, containerd |
Three VMs, each with at least 2 GB of RAM.

- OS: CentOS 7.9
- containerd: v1.6.4
- Kubernetes: v1.24
- etcd: v3.3.22
- flannel: v0.12.0
- Certificate tool CFSSL: v1.6.0

This guide deploys a single master node. If you need multiple masters, see "Step-by-Step Compiling and Installing Kubernetes: Master Node Installation".
## Install CFSSL

CFSSL release downloads:
```shell
wget -O /usr/local/bin/cfssl https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssl_1.6.0_linux_amd64
wget -O /usr/local/bin/cfssljson https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssljson_1.6.0_linux_amd64
chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson
```
## Create the CA Certificate JSON Configuration

```shell
mkdir -p /opt/certs/
cd /opt/certs/
cat > /opt/certs/ca-csr.json << EOF
{
  "CN": "kubernetes-ca",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "beijing",
      "L": "beijing",
      "O": "system:masters",
      "OU": "kubernetes"
    }
  ],
  "ca": {
    "expiry": "876000h"
  }
}
EOF
## Generate the CA certificate and private key
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
```
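cfssl fails on malformed JSON, so it can help to sanity-check the file before running `gencert`. A minimal sketch, assuming `python3` is on the PATH (the fragment below is a stand-in for the real ca-csr.json):

```shell
# Syntax-check a CSR fragment before feeding the real file to cfssl;
# python3 -m json.tool exits non-zero on invalid JSON.
cat <<'EOF' | python3 -m json.tool > /dev/null && echo "valid JSON"
{"CN": "kubernetes-ca", "key": {"algo": "rsa", "size": 2048}}
EOF
```

In practice you would run `python3 -m json.tool /opt/certs/ca-csr.json` against the actual file.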
```shell
cat > /opt/certs/ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "kubernetes": {
        "expiry": "876000h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      },
      "etcd": {
        "expiry": "876000h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
```
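For reference, the `876000h` expiry used throughout these configs works out to roughly 100 years:

```shell
# 876000 hours / 24 / 365 = 100 years (integer arithmetic, 365-day year)
hours=876000
echo $(( hours / 24 / 365 ))   # prints 100
```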
## Install the Basic k8s Components

### Install containerd

Install containerd as the runtime on all k8s nodes.
```shell
cd /server/tools/
wget https://github.com/containerd/containerd/releases/download/v1.6.4/cri-containerd-cni-1.6.4-linux-amd64.tar.gz
mkdir /opt/containerd-1.6.4
tar xf cri-containerd-cni-1.6.4-linux-amd64.tar.gz -C /opt/containerd-1.6.4
cd /opt/containerd-1.6.4/
ln -s /opt/containerd-1.6.4/usr/local/bin/* /usr/local/bin/
cp /opt/containerd-1.6.4/etc/systemd/system/containerd.service /usr/lib/systemd/system/
```
### Configure the Kernel Modules containerd Needs

```shell
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

systemctl restart systemd-modules-load.service
```
### Configure the Kernel Parameters containerd Needs

```shell
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Load the kernel parameters
sysctl --system
```
### Configure runc Support

containerd is designed to be embedded into a larger system rather than used directly by developers or end users. With containerd and runC as the foundation of a standardized container service, higher-level applications can be built directly on top of them. Our goal here is a minimal container system, which requires containerd and runC support, so that the Linux kernel starts containerd (rather than init) at boot, and the container includes only the necessary system components, such as a shell. However, the runc bundled in the containerd package fails with `undefined symbol: seccomp_notify_respond`, so it must be downloaded and installed separately:
```shell
wget -O /usr/local/sbin/runc https://github.com/opencontainers/runc/releases/download/v1.1.2/runc.amd64
chmod +x /usr/local/sbin/runc
```
### Create the containerd Configuration File

```shell
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

# 1. Edit the config file (use either method 1 or 2)
sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml
grep SystemdCgroup /etc/containerd/config.toml

# 2. Or find containerd.runtimes.runc.options and add SystemdCgroup = true under it:
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
#     SystemdCgroup = true
#   [plugins."io.containerd.grpc.v1.cri".cni]

# 3. Add an Aliyun registry mirror:
#   [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
#     [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
#       endpoint = ["https://l2v84zex.mirror.aliyuncs.com"]  # set your own Aliyun mirror

# 4. Change sandbox_image to an address that matches your version and is reachable:
#   sandbox_image = "kubernetes/pause"  # the default registry may be blocked
```
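The sed one-liner from method 1 can be tried out safely on a throwaway snippet first; a sketch using a temp file in place of the real `/etc/containerd/config.toml`:

```shell
# Stand-in snippet for the real config, just to demonstrate the sed edit
tmpconf=$(mktemp)
cat > "$tmpconf" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' "$tmpconf"
grep SystemdCgroup "$tmpconf"   # prints:   SystemdCgroup = true
rm -f "$tmpconf"
```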
### Configure the CNI Network

```shell
cat > /etc/cni/net.d/10-flannel.conflist <<EOF
{
  "name": "flannel",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
EOF
mkdir -p /opt/cni/bin
ln -s /opt/containerd-1.6.4/opt/cni/bin/* /opt/cni/bin/
```
### Start containerd and Enable It at Boot

```shell
systemctl daemon-reload
systemctl enable --now containerd
```
### Configure the Runtime Endpoint for the crictl Client

```shell
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

# Test
systemctl restart containerd
crictl info
```
## Install the etcd Cluster

### Create Certificates

```shell
cat > /opt/certs/etcd-csr.json << EOF
{
  "CN": "etcd-peer",
  "hosts": [
    "10.1.1.100",
    "10.1.1.110",
    "10.1.1.120",
    "10.1.1.130"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "beijing",
      "L": "beijing",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ]
}
EOF
## Generate the certificate
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd-csr.json | cfssljson -bare etcd
# or
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=10.1.1.100,10.1.1.110,10.1.1.120,10.1.1.130 \
  -profile=etcd \
  etcd-csr.json | cfssljson -bare etcd
```
### Install etcd

etcd release downloads:
```shell
useradd -s /sbin/nologin -M etcd
cd /server/tools
tar xf etcd-v3.3.22-linux-amd64.tar.gz -C /opt
ln -s /opt/etcd-v3.3.22-linux-amd64 /opt/etcd
mkdir -p /opt/etcd/{ssl,cfg}
# Copy the certificates generated earlier into place
cp /opt/certs/{ca.pem,etcd.pem,etcd-key.pem} /opt/etcd/ssl/
chown etcd.etcd /opt/etcd/ssl/*
chmod 600 /opt/etcd/ssl/etcd-key.pem
```
The per-node configuration files (etcd-1 / etcd-2 / etcd-3) and the etcd.service unit follow.

etcd-1 (10.1.1.100):

```shell
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-1'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.100:2380'
listen-client-urls: 'https://10.1.1.100:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.100:2380'
advertise-client-urls: 'https://10.1.1.100:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
```
etcd-2 (10.1.1.120):

```shell
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-2'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.120:2380'
listen-client-urls: 'https://10.1.1.120:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.120:2380'
advertise-client-urls: 'https://10.1.1.120:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
```
etcd-3 (10.1.1.130):

```shell
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-3'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.130:2380'
listen-client-urls: 'https://10.1.1.130:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.130:2380'
advertise-client-urls: 'https://10.1.1.130:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
```
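The three files above differ only in `name` and the node IP, so they can be rendered from a single template instead of maintained by hand. A hypothetical helper sketch (the `__NAME__`/`__IP__` placeholders are my own, and only a few of the per-node fields are shown):

```shell
# Render per-node etcd config fragments from one template.
# __NAME__ and __IP__ are illustrative placeholders, not etcd syntax.
outdir=$(mktemp -d)
for node in "etcd-1 10.1.1.100" "etcd-2 10.1.1.120" "etcd-3 10.1.1.130"; do
  set -- $node   # $1 = member name, $2 = node IP
  sed "s/__NAME__/$1/g; s/__IP__/$2/g" <<'EOF' > "$outdir/$1.yml"
name: '__NAME__'
listen-peer-urls: 'https://__IP__:2380'
listen-client-urls: 'https://__IP__:2379,http://127.0.0.1:2379'
initial-advertise-peer-urls: 'https://__IP__:2380'
advertise-client-urls: 'https://__IP__:2379,http://127.0.0.1:2379'
EOF
done
grep "^name" "$outdir"/*.yml   # one rendered file per member
```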
etcd.service:

```shell
cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/opt/etcd/etcd --config-file=/opt/etcd/cfg/etcd.config.yml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Alias=etcd3.service
EOF
```
### Start etcd

```shell
systemctl daemon-reload
systemctl enable --now etcd
```
### Check the etcd Cluster Status

```shell
export ETCDCTL_API=3
/opt/etcd/etcdctl --endpoints="10.1.1.100:2379,10.1.1.120:2379,10.1.1.130:2379" --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem endpoint status --write-out=table
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.100:2379 | 4988e076821369e3 | 3.3.22  | 20 kB   | true      | 86        | 9          |
| 10.1.1.120:2379 | 2612ebaf51b393a5 | 3.3.22  | 20 kB   | false     | 86        | 9          |
| 10.1.1.130:2379 | 8de0ef816eba4013 | 3.3.22  | 20 kB   | false     | 86        | 9          |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
```
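In scripts it is handy to pull just the leader endpoint out of that table; a sketch that parses the `--write-out=table` output with awk (the sample table above is embedded here as a stand-in for the live command's output):

```shell
# Extract the leader endpoint: field 6 of the '|'-separated table is IS LEADER
awk -F'|' '$6 ~ /true/ {gsub(/ /, "", $2); print $2}' <<'EOF'
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.100:2379 | 4988e076821369e3 | 3.3.22  | 20 kB   | true      | 86        | 9          |
| 10.1.1.120:2379 | 2612ebaf51b393a5 | 3.3.22  | 20 kB   | false     | 86        | 9          |
| 10.1.1.130:2379 | 8de0ef816eba4013 | 3.3.22  | 20 kB   | false     | 86        | 9          |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
EOF
# prints: 10.1.1.100:2379
```

Against a live cluster, pipe the real `etcdctl ... endpoint status --write-out=table` output into the same awk filter.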
### Common etcd Failures

Background: a cluster of 3 etcd nodes all lost power and went down at once. After restarting, the K8S cluster worked normally, but a check of the components showed that etcd on one node would not start. Investigation found the system clock was wrong; after correcting it with `ntpdate ntp.aliyun.com` and restarting etcd, it still failed, logging:
```
Jun 26 05:38:12 moban etcd: listening for peers on https://10.1.1.120:2380
Jun 26 05:38:12 moban etcd: ignoring client auto TLS since certs given
Jun 26 05:38:12 moban etcd: pprof is enabled under /debug/pprof
Jun 26 05:38:12 moban etcd: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
Jun 26 05:38:12 moban etcd: The scheme of client url http://127.0.0.1:2379 is HTTP while client cert auth (--client-cert-auth) is enabled. Ignored client cert auth for this url.
```
Solution: the logs show no obviously fatal error. In practice, losing one etcd node has little impact on the cluster, and the cluster was already usable, but the broken etcd member still would not start. Recover it as follows:

1. Back up the existing data in the etcd data directory:

```shell
cd /var/lib/etcd/member/
cp -a * /data/bak/
```

2. Delete all data files under that directory (the data directory is /var/lib/etcd per the config above):

```shell
rm -rf /var/lib/etcd/member/*
```

3. Stop the other two etcd nodes and then start all three together, since in this situation the etcd members need to start as a group; once startup succeeds the cluster is usable again.