# Binary Installation of Kubernetes (k8s) v1.24.0

## Environment Preparation

| Hostname | Role | IP | Installed Software |
|---|---|---|---|
| k8s-master.boysec.cn | master node | 10.1.1.100 | etcd, kubelet, kube-proxy, kube-apiserver, kube-controller-manager, kube-scheduler, containerd |
| k8s-node01.boysec.cn | worker node | 10.1.1.120 | etcd, kubelet, kube-proxy, containerd |
| k8s-node02.boysec.cn | worker node | 10.1.1.130 | etcd, kubelet, kube-proxy, containerd |
Three VMs, each with at least 2 GB of RAM.

- OS: CentOS 7.9
- containerd: v1.6.4
- Kubernetes: v1.24
- etcd: v3.3.22
- flannel: v0.12.0
- Certificate tool CFSSL: v1.6.0

This guide deploys a single master node. If you need multiple masters, see "Step-by-Step Compiling and Installing Kubernetes: Master Node Installation".
## Install CFSSL

CFSSL release downloads:
```shell
wget -O /usr/local/bin/cfssl https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssl_1.6.0_linux_amd64
wget -O /usr/local/bin/cfssljson https://github.com/cloudflare/cfssl/releases/download/v1.6.0/cfssljson_1.6.0_linux_amd64
chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson
```
## Create the CA Certificate JSON Configuration

```shell
mkdir -p /opt/certs/
cd /opt/certs/
cat > /opt/certs/ca-csr.json << EOF
{
  "CN": "kubernetes-ca",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "beijing",
      "L": "beijing",
      "O": "system:masters",
      "OU": "kubernetes"
    }
  ],
  "ca": {
    "expiry": "876000h"
  }
}
EOF
## Generate the CA certificate and private key
cfssl gencert -initca ca-csr.json | cfssljson -bare ca -
```
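cfssl fails on malformed JSON, so it can help to sanity-check the file before running `gencert`. A minimal sketch, assuming `python3` is on the PATH (the fragment below is a stand-in for the real ca-csr.json):

```shell
# Syntax-check a CSR fragment before feeding the real file to cfssl;
# python3 -m json.tool exits non-zero on invalid JSON.
cat <<'EOF' | python3 -m json.tool > /dev/null && echo "valid JSON"
{"CN": "kubernetes-ca", "key": {"algo": "rsa", "size": 2048}}
EOF
```

In practice you would run `python3 -m json.tool /opt/certs/ca-csr.json` against the actual file.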
```shell
cat > /opt/certs/ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "kubernetes": {
        "expiry": "876000h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      },
      "etcd": {
        "expiry": "876000h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
```
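For reference, the `876000h` expiry used throughout these configs works out to roughly 100 years:

```shell
# 876000 hours / 24 / 365 = 100 years (integer arithmetic, 365-day year)
hours=876000
echo $(( hours / 24 / 365 ))   # prints 100
```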
## Install the Basic k8s Components

### Install containerd

Install containerd as the runtime on all k8s nodes.
```shell
cd /server/tools/
wget https://github.com/containerd/containerd/releases/download/v1.6.4/cri-containerd-cni-1.6.4-linux-amd64.tar.gz
mkdir /opt/containerd-1.6.4
tar xf cri-containerd-cni-1.6.4-linux-amd64.tar.gz -C /opt/containerd-1.6.4
cd /opt/containerd-1.6.4/
ln -s /opt/containerd-1.6.4/usr/local/bin/* /usr/local/bin/
cp /opt/containerd-1.6.4/etc/systemd/system/containerd.service /usr/lib/systemd/system/
```
### Configure the Kernel Modules containerd Needs

```shell
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

systemctl restart systemd-modules-load.service
```
### Configure the Kernel Parameters containerd Needs

```shell
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# Load the kernel parameters
sysctl --system
```
### Configure runc Support

containerd is designed to be embedded into a larger system rather than used directly by developers or end users. With containerd and runC as the foundation of a standardized container service, higher-level applications can be built directly on top of them. Our goal here is a minimal container system, which requires containerd and runC support, so that the Linux kernel starts containerd (rather than init) at boot, and the container includes only the necessary system components, such as a shell. However, the runc bundled in the containerd package fails with `undefined symbol: seccomp_notify_respond`, so it must be downloaded and installed separately:
```shell
wget -O /usr/local/sbin/runc https://github.com/opencontainers/runc/releases/download/v1.1.2/runc.amd64
chmod +x /usr/local/sbin/runc
```
### Create the containerd Configuration File

```shell
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

# 1. Edit the config file (use either method 1 or 2)
sed -i "s#SystemdCgroup\ \=\ false#SystemdCgroup\ \=\ true#g" /etc/containerd/config.toml
grep SystemdCgroup /etc/containerd/config.toml

# 2. Or find containerd.runtimes.runc.options and add SystemdCgroup = true under it:
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
#     SystemdCgroup = true
#   [plugins."io.containerd.grpc.v1.cri".cni]

# 3. Add an Aliyun registry mirror:
#   [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
#     [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
#       endpoint = ["https://l2v84zex.mirror.aliyuncs.com"]  # set your own Aliyun mirror

# 4. Change sandbox_image to an address that matches your version and is reachable:
#   sandbox_image = "kubernetes/pause"  # the default registry may be blocked
```
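The sed one-liner from method 1 can be tried out safely on a throwaway snippet first; a sketch using a temp file in place of the real `/etc/containerd/config.toml`:

```shell
# Stand-in snippet for the real config, just to demonstrate the sed edit
tmpconf=$(mktemp)
cat > "$tmpconf" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' "$tmpconf"
grep SystemdCgroup "$tmpconf"   # prints:   SystemdCgroup = true
rm -f "$tmpconf"
```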
### Configure the CNI Network

```shell
cat > /etc/cni/net.d/10-flannel.conflist <<EOF
{
  "name": "flannel",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
EOF
mkdir -p /opt/cni/bin
ln -s /opt/containerd-1.6.4/opt/cni/bin/* /opt/cni/bin/
```
### Start containerd and Enable It at Boot

```shell
systemctl daemon-reload
systemctl enable --now containerd
```
### Configure the Runtime Endpoint for the crictl Client

```shell
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

# Test
systemctl restart containerd
crictl info
```
## Install the etcd Cluster

### Create Certificates

```shell
cat > /opt/certs/etcd-csr.json << EOF
{
  "CN": "etcd-peer",
  "hosts": [
    "10.1.1.100",
    "10.1.1.110",
    "10.1.1.120",
    "10.1.1.130"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "beijing",
      "L": "beijing",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ]
}
EOF
## Generate the certificate
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd-csr.json | cfssljson -bare etcd
# or
cfssl gencert \
  -ca=ca.pem \
  -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=10.1.1.100,10.1.1.110,10.1.1.120,10.1.1.130 \
  -profile=etcd \
  etcd-csr.json | cfssljson -bare etcd
```
### Install etcd

etcd release downloads:
```shell
useradd -s /sbin/nologin -M etcd
cd /server/tools
tar xf etcd-v3.3.22-linux-amd64.tar.gz -C /opt
ln -s /opt/etcd-v3.3.22-linux-amd64 /opt/etcd
mkdir -p /opt/etcd/{ssl,cfg}
# Copy the certificates generated earlier into place
cp /opt/certs/{ca.pem,etcd.pem,etcd-key.pem} /opt/etcd/ssl/
chown etcd.etcd /opt/etcd/ssl/*
chmod 600 /opt/etcd/ssl/etcd-key.pem
```
The per-node configuration files (etcd-1 / etcd-2 / etcd-3) and the etcd.service unit follow.

etcd-1 (10.1.1.100):

```shell
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-1'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.100:2380'
listen-client-urls: 'https://10.1.1.100:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.100:2380'
advertise-client-urls: 'https://10.1.1.100:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
```
etcd-2 (10.1.1.120):

```shell
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-2'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.120:2380'
listen-client-urls: 'https://10.1.1.120:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.120:2380'
advertise-client-urls: 'https://10.1.1.120:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
```
etcd-3 (10.1.1.130):

```shell
cat > /opt/etcd/cfg/etcd.config.yml << EOF
name: 'etcd-3'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://10.1.1.130:2380'
listen-client-urls: 'https://10.1.1.130:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://10.1.1.130:2380'
advertise-client-urls: 'https://10.1.1.130:2379,http://127.0.0.1:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'etcd-1=https://10.1.1.100:2380,etcd-2=https://10.1.1.120:2380,etcd-3=https://10.1.1.130:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'new'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
peer-transport-security:
  cert-file: '/opt/etcd/ssl/etcd.pem'
  key-file: '/opt/etcd/ssl/etcd-key.pem'
  peer-client-cert-auth: true
  trusted-ca-file: '/opt/etcd/ssl/ca.pem'
  auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false
EOF
```
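The three files above differ only in `name` and the node IP, so they can be rendered from a single template instead of maintained by hand. A hypothetical helper sketch (the `__NAME__`/`__IP__` placeholders are my own, and only a few of the per-node fields are shown):

```shell
# Render per-node etcd config fragments from one template.
# __NAME__ and __IP__ are illustrative placeholders, not etcd syntax.
outdir=$(mktemp -d)
for node in "etcd-1 10.1.1.100" "etcd-2 10.1.1.120" "etcd-3 10.1.1.130"; do
  set -- $node   # $1 = member name, $2 = node IP
  sed "s/__NAME__/$1/g; s/__IP__/$2/g" <<'EOF' > "$outdir/$1.yml"
name: '__NAME__'
listen-peer-urls: 'https://__IP__:2380'
listen-client-urls: 'https://__IP__:2379,http://127.0.0.1:2379'
initial-advertise-peer-urls: 'https://__IP__:2380'
advertise-client-urls: 'https://__IP__:2379,http://127.0.0.1:2379'
EOF
done
grep "^name" "$outdir"/*.yml   # one rendered file per member
```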
etcd.service:

```shell
cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/opt/etcd/etcd --config-file=/opt/etcd/cfg/etcd.config.yml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Alias=etcd3.service
EOF
```
### Start etcd

```shell
systemctl daemon-reload
systemctl enable --now etcd
```
### Check the etcd Cluster Status

```shell
export ETCDCTL_API=3
/opt/etcd/etcdctl --endpoints="10.1.1.100:2379,10.1.1.120:2379,10.1.1.130:2379" --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem endpoint status --write-out=table
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.100:2379 | 4988e076821369e3 | 3.3.22  | 20 kB   | true      | 86        | 9          |
| 10.1.1.120:2379 | 2612ebaf51b393a5 | 3.3.22  | 20 kB   | false     | 86        | 9          |
| 10.1.1.130:2379 | 8de0ef816eba4013 | 3.3.22  | 20 kB   | false     | 86        | 9          |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
```
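In scripts it is handy to pull just the leader endpoint out of that table; a sketch that parses the `--write-out=table` output with awk (the sample table above is embedded here as a stand-in for the live command's output):

```shell
# Extract the leader endpoint: field 6 of the '|'-separated table is IS LEADER
awk -F'|' '$6 ~ /true/ {gsub(/ /, "", $2); print $2}' <<'EOF'
+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.100:2379 | 4988e076821369e3 | 3.3.22  | 20 kB   | true      | 86        | 9          |
| 10.1.1.120:2379 | 2612ebaf51b393a5 | 3.3.22  | 20 kB   | false     | 86        | 9          |
| 10.1.1.130:2379 | 8de0ef816eba4013 | 3.3.22  | 20 kB   | false     | 86        | 9          |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
EOF
# prints: 10.1.1.100:2379
```

Against a live cluster, pipe the real `etcdctl ... endpoint status --write-out=table` output into the same awk filter.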
### Common etcd Failures

Background: a cluster of 3 etcd nodes all lost power and went down at once. After restarting, the K8S cluster worked normally, but a check of the components showed that etcd on one node would not start. Investigation found the system clock was wrong; after correcting it with `ntpdate ntp.aliyun.com` and restarting etcd, it still failed, logging:
```
Jun 26 05:38:12 moban etcd: listening for peers on https://10.1.1.120:2380
Jun 26 05:38:12 moban etcd: ignoring client auto TLS since certs given
Jun 26 05:38:12 moban etcd: pprof is enabled under /debug/pprof
Jun 26 05:38:12 moban etcd: The scheme of client url http://127.0.0.1:2379 is HTTP while peer key/cert files are presented. Ignored key/cert files.
Jun 26 05:38:12 moban etcd: The scheme of client url http://127.0.0.1:2379 is HTTP while client cert auth (--client-cert-auth) is enabled. Ignored client cert auth for this url.
```
Solution: the logs show no obviously fatal error. In practice, losing one etcd node has little impact on the cluster, and the cluster was already usable, but the broken etcd member still would not start. Recover it as follows:

1. Back up the existing data in the etcd data directory:

```shell
cd /var/lib/etcd/member/
cp -a * /data/bak/
```

2. Delete all data files under that directory (the data directory is /var/lib/etcd per the config above):

```shell
rm -rf /var/lib/etcd/member/*
```

3. Stop the other two etcd nodes and then start all three together, since in this situation the etcd members need to start as a group; once startup succeeds the cluster is usable again.