Kubernetes Operations: Dynamic Scaling of Deployments

Introduction to HPA

Official documentation | Kubernetes

HPA (Horizontal Pod Autoscaler) is implemented as a control loop whose period is set by the controller manager's --horizontal-pod-autoscaler-sync-period flag (default: 15 seconds). In each period, the controller manager queries resource utilization against the metrics specified in every HorizontalPodAutoscaler definition. The controller manager can obtain metrics from the resource metrics API (per-pod resource metrics) and from the custom metrics API (custom metrics).

  • For per-pod resource metrics (such as CPU), the controller fetches the metric from the resource metrics API for every pod targeted by the HorizontalPodAutoscaler. If a target utilization is set, the controller reads the resource usage of the containers in each pod and computes the utilization as a percentage of the resource request; if a raw target value is used, the raw data is used directly (no percentage is calculated). The controller then derives a scaling ratio from the average utilization or raw value and from it computes the target replica count (see the worked example after this list). Note that if some containers in a pod do not expose resource metrics, the controller will not use that pod's CPU utilization.
  • If the pods use custom metrics, the controller works much like it does for resource metrics, except that custom metrics only use raw values, never utilization percentages.
  • If the pods use object metrics or external metrics (each metric describes a single object), the metric is compared directly against the target value to produce the scaling ratio described above. In the autoscaling/v2beta2 API this value can also be divided by the number of pods before the comparison. Normally the controller fetches metric data from a set of aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io and external.metrics.k8s.io). The metrics.k8s.io API is usually served by metrics-server, which has to be deployed separately.
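
As a quick illustration of the replica-count calculation referenced above, this is the formula documented for the HPA, shown here with made-up numbers (the 4 replicas and 100% utilization are hypothetical):

desiredReplicas = ceil( currentReplicas * ( currentMetricValue / desiredMetricValue ) )

Example: 4 replicas, current average CPU utilization 100%, target utilization 50%:
desiredReplicas = ceil( 4 * ( 100 / 50 ) ) = 8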

Getting Started

metrics-server is a cluster-wide aggregator of resource usage data. It only exposes metrics and provides no long-term storage; its focus is implementing the resource metrics API (CPU and memory usage for nodes and pods). The data it collects is consumed inside the cluster by components such as kubectl top, the HPA and the scheduler. See the metrics-server documentation for details.
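
Once metrics-server is running, you can confirm that the resource metrics API answers by querying it directly (standard kubectl commands, shown here only as a quick sanity check):

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods"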

Prepare the Resource Manifest

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: wangxiansen/metrics-server:v0.5.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

Troubleshooting

Problem 1

Starting metrics-server fails with a certificate error: "x509: cannot validate certificate for x.x.x.x because it doesn't contain any IP SANs" node="k8s-master1"

"Failed probe" probe="metric-storage-ready" err="no metrics to serve"

Solution:

# Add one of the following (first option)
- --kubelet-insecure-tls
# or (second option)
- --tls-cert-file=/opt/kubernetes/ssl/ca.pem
- --tls-private-key-file=/opt/kubernetes/ssl/pki/ca-key.pem

Problem 2

metrics-server never becomes Ready, and its logs report:

scraper.go:140] "Failed to scrape node" err="Get \"https://x.x.x.x:10250/metrics/resource\": context deadline exceeded"
scraper.go:140] "Failed to scrape node" err="Get \"https://k8s-node1:10250/metrics/resource\": context deadline exceeded" node="linshi-k8s-54"
server.go:187] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

Solution:

Keep --kubelet-preferred-address-types consistent with the apiserver, i.e. make sure metrics-server prefers an address type through which the kubelets are actually reachable.
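
For example, the flag in the metrics-server Deployment args might be adjusted like this (a sketch only; reorder the address types to match how nodes are reachable in your cluster):

- --kubelet-preferred-address-types=InternalIP,Hostname,ExternalIP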

Problem 3

Running kubectl top nodes returns an error:

Error from server (Forbidden): nodes.metrics.k8s.io is forbidden: User "kubernetes" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope

Solution:

The error indicates an RBAC permission problem; grant the kubernetes user the following permissions:

cat /opt/kubernetes/cfg/apiserver-to-kubelet-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-apiserver-to-kubelet
rules:
- apiGroups:
  - "*"
  resources:
  - nodes          # added
  - nodes/proxy
  - nodes/stats
  - nodes/log
  - nodes/spec
  - nodes/metrics
  - pods/log
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-apiserver
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-apiserver-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubernetes
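
After updating the ClusterRole, apply it and retry the command that failed (the file path matches the one shown above):

kubectl apply -f /opt/kubernetes/cfg/apiserver-to-kubelet-rbac.yaml
kubectl top nodes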

Apply the Resource Manifest

kubectl apply -f metrics.yml

Verify with the following commands

[root@k8s-master1 ~]# kubectl top node
NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-master1   197m         9%     1158Mi          61%
k8s-node1     104m         5%     880Mi           46%
k8s-node2     103m         5%     767Mi           40%
[root@k8s-master1 ~]# kubectl top pods -A
NAMESPACE     NAME                                         CPU(cores)   MEMORY(bytes)
kube-system   coredns-5b5b4cb755-mhtwl                     5m           14Mi
kube-system   dashboard-metrics-scraper-65cc6d887d-c26b2   1m           4Mi
kube-system   kubernetes-dashboard-757b689f8b-fkkt6        1m           36Mi
kube-system   metrics-server-6b8679546c-7d7zz              5m           15Mi
kube-system   traefik-ingress-controller-9dj29             7m           29Mi
kube-system   traefik-ingress-controller-bvtgx             6m           19Mi
kube-system   traefik-ingress-controller-dv9rd             6m           17Mi
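
If kubectl top returns data, the aggregated metrics API is healthy. You can also check it explicitly (standard commands, listed here as an optional sanity check):

kubectl -n kube-system rollout status deployment metrics-server
kubectl get apiservice v1beta1.metrics.k8s.io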

Dynamic Scaling

Build the Docker Image

To demonstrate the Horizontal Pod Autoscaler, we will use a custom Docker image based on the php-apache image.

FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php

<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
}
echo "OK!";
?>
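
Build and push the image so that the Deployment below can pull it (a sketch; replace wangxiansen with your own registry namespace if you build the image yourself):

docker build -t wangxiansen/php-hpa .
docker push wangxiansen/php-hpa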

Prepare the Resource Manifest

vim php-apache.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: wangxiansen/php-hpa
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

Run the following command:

kubectl apply -f php-apache.yml

Create the Horizontal Pod Autoscaler

The HPA API has three versions, which can be listed with kubectl api-versions | grep autoscal:

autoscaling/v1

autoscaling/v2beta1

autoscaling/v2beta2

  • autoscaling/v1 only supports scaling on CPU metrics;
  • autoscaling/v2beta1 supports scaling on Resource Metrics (resource metrics such as pod CPU) and Custom Metrics;
  • autoscaling/v2beta2 supports scaling on Resource Metrics, Custom Metrics and External Metrics (a v2beta2 example is shown further below).

The following command creates a Horizontal Pod Autoscaler that controls the Deployment created in the previous step, keeping the number of Pod replicas between 1 and 10. Roughly speaking, the HPA will (via the Deployment) increase or decrease the number of Pod replicas to keep the average CPU utilization across all Pods at around 50% (since each Pod requests 200 millicores of CPU, this corresponds to an average CPU usage of 100 millicores).

[root@k8s-master1 ~]# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled

We can check the current state of the autoscaler with:

[root@k8s-master1 ~]# kubectl get hpa
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/50%   1         10        0          6s

Generate the YAML

[root@k8s-master1 ~]# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10 --dry-run=client -o yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:   # reference to the target that will be scaled
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  targetCPUUtilizationPercentage: 50   # scale out when CPU utilization exceeds 50%, scale in when it drops below
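
For comparison, the same autoscaler expressed in autoscaling/v2beta2 might look as follows (a hand-written sketch, not generated by kubectl; the metrics block replaces targetCPUUtilizationPercentage):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50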

Generate Load

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
# In a separate terminal:
[root@k8s-master1 ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   192.168.0.1      <none>        443/TCP   14h
php-apache   ClusterIP   192.168.115.15   <none>        80/TCP    44m
[root@k8s-master1 ~]# while true; do sleep 0.01; curl 192.168.115.15; done
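
While the load generator is running, watch the autoscaler react (standard commands; the replica counts you see will depend on your cluster):

kubectl get hpa php-apache --watch
kubectl get deployment php-apache --watch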

After the load stops, the HPA scales the Deployment back down automatically; by default it waits 5 minutes (the downscale stabilization window) before doing so.
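
If the 5-minute delay does not suit your workload, it can be tuned with the behavior field available in autoscaling/v2beta2 (Kubernetes 1.18+). A sketch, added under spec of the v2beta2 HPA shown earlier; the 60-second value is an arbitrary example:

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # default is 300 (5 minutes)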

Canary Deployment

A canary release usually starts by rolling out to a single machine, or a small percentage of servers (for example 2%), mainly to validate real traffic; this is also called canary testing, and is often referred to as gray-release testing in China. The name comes from the old mining practice of lowering a canary into the shaft before miners went down, to detect toxic gas: if the canary survived, the mine was considered safe. Simple canary tests are usually verified by hand; more complex ones need fairly mature monitoring infrastructure, using metric feedback to observe the canary's health and decide whether to continue the rollout or roll back. If the canary test passes, the remaining V1 instances are all upgraded to V2; if it fails, the canary is rolled back and the release is aborted.

Implementing a Canary Deployment

Main steps:

  1. Deploy version v1 of the application; at this point the Service sends all traffic to v1.
  2. Deploy version v2 with one tenth of the replicas (x/10) and scale v1 down by the same amount, so that roughly one tenth of the traffic reaches v2.
  3. Gradually scale v1 down and v2 up until v2 has completely replaced v1.

Set Up the Demo Service

apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: my-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v1
  labels:
    app: my-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-app
      version: v1.0.0
  template:
    metadata:
      labels:
        app: my-app
        version: v1.0.0
    spec:
      containers:
      - name: my-app
        image: containersol/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        - name: probe
          containerPort: 8086
        env:
        - name: VERSION
          value: v1.0.0
        livenessProbe:
          httpGet:
            path: /live
            port: probe
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: probe
          periodSeconds: 5

Apply the Manifest

[root@k8s-master1 ~]# kubectl apply -f appv1.yml --record
[root@k8s-master1 ~]# kubectl get pod
NAME                         READY   STATUS    RESTARTS   AGE
my-app-v1-84ff7f48cc-c7lzl   1/1     Running   0          2m39s
my-app-v1-84ff7f48cc-wgclp   1/1     Running   0          2m39s
[root@k8s-master1 ~]# kubectl get svc my-app
NAME     TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
my-app   NodePort   192.168.26.251   <none>        80:31501/TCP   4m18s

Verify

[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v1-84ff7f48cc-wgclp, Version: v1.0.0

Upgrade the Application Using the Canary Approach

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-v2
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
      version: v2.0.0
  template:
    metadata:
      labels:
        app: my-app
        version: v2.0.0
    spec:
      containers:
      - name: my-app
        image: containersol/k8s-deployment-strategies
        ports:
        - name: http
          containerPort: 8080
        - name: probe
          containerPort: 8086
        env:
        - name: VERSION
          value: v2.0.0
        livenessProbe:
          httpGet:
            path: /live
            port: probe
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: probe
          periodSeconds: 5

Apply the Resource

[root@k8s-master1 ~]# kubectl apply -f appv2.yml --record
deployment.apps/my-app-v2 created

At this point you can see that one replica of my-app-v2 has started:

[root@k8s-master1 ~]# kubectl get --watch deployment
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
my-app-v1   4/4     2            2           8m14s
my-app-v2   1/1     1            1           60s

Verify the Upgrade

Now, through the Service's load balancing, my-app-v2 receives roughly 20% (1 of 5 pods) of the traffic:

[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v2-dfdff8845-89m2d, Version: v2.0.0
[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v1-84ff7f48cc-d9tvz, Version: v1.0.0
[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v1-84ff7f48cc-cgs57, Version: v1.0.0
[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v1-84ff7f48cc-c7lzl, Version: v1.0.0
[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v1-84ff7f48cc-wgclp, Version: v1.0.0
[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v1-84ff7f48cc-wgclp, Version: v1.0.0
[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v2-dfdff8845-89m2d, Version: v2.0.0

Once the canary has been verified, we gradually scale my-app-v2 up to 5 replicas and my-app-v1 down to 0, and finally remove the old Deployment (a sketch of the intermediate steps is shown after the commands below):

kubectl scale --replicas=5 deploy my-app-v2
kubectl delete deploy my-app-v1
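
The two commands above jump straight to the final state. In a real rollout, the intermediate steps of the plan described earlier would look something like the following (a sketch only; the step sizes are arbitrary and you would verify traffic between each step):

kubectl scale --replicas=2 deploy my-app-v2   # shift more traffic to v2
kubectl scale --replicas=3 deploy my-app-v1
# verify, then continue
kubectl scale --replicas=5 deploy my-app-v2
kubectl scale --replicas=0 deploy my-app-v1
kubectl delete deploy my-app-v1               # clean up once v2 handles all traffic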

Verifying the service again shows that my-app-v2 now receives all of the traffic:

[root@k8s-master1 ~]# curl 192.168.26.251
Host: my-app-v2-dfdff8845-89m2d, Version: v2.0.0