JustDoIt
累计撰写 112 篇文章,累计收到 4 条评论。
搜索到 18 篇与 … 的结果
2022-04-02
备份与迁移k8s集群神器
前言

一般来说大家都用 etcd 备份来恢复 k8s 集群,但有时我们可能不小心删掉了一个 namespace,假设这个 ns 里面有上百个服务,瞬间没了,怎么办?当然可以用 CI/CD 系统重新发布,但时间会花费很久。这时候,VMware 的 Velero 就派上用场了。Velero 可以帮助我们:

- 灾备场景:提供备份、恢复 k8s 集群的能力
- 迁移场景:提供拷贝集群资源到其他集群的能力(复制同步开发、测试、生产环境的集群配置,简化环境配置)

下面我就介绍一下如何使用 Velero 完成备份和迁移。

Velero 地址:https://github.com/vmware-tanzu/velero
ACK 插件地址:https://github.com/AliyunContainerService/velero-plugin

下载 Velero 客户端

Velero 由客户端和服务端组成:服务端部署在目标 k8s 集群上,客户端则是运行在本地的命令行工具。

- 前往 Velero 的 Release 页面下载客户端,直接在 GitHub 上下载即可
- 解压 release 包
- 将 release 包中的二进制文件 velero 移动到 $PATH 中的某个目录下
- 执行 velero -h 测试

部署 velero-plugin 插件

拉取代码:

git clone https://github.com/AliyunContainerService/velero-plugin

配置修改

修改 `install/credentials-velero` 文件,将新建用户中获得的 `AccessKeyID` 和 `AccessKeySecret` 填入,这里的 OSS EndPoint 为之前 OSS 的访问域名:

ALIBABA_CLOUD_ACCESS_KEY_ID=<ALIBABA_CLOUD_ACCESS_KEY_ID>
ALIBABA_CLOUD_ACCESS_KEY_SECRET=<ALIBABA_CLOUD_ACCESS_KEY_SECRET>
ALIBABA_CLOUD_OSS_ENDPOINT=<ALIBABA_CLOUD_OSS_ENDPOINT>

修改 `install/01-velero.yaml`,将 OSS 配置填入:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: velero
  name: velero
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    component: velero
  name: velero
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: velero
  namespace: velero
---
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  labels:
    component: velero
  name: default
  namespace: velero
spec:
  config:
    region: cn-beijing
  objectStorage:
    bucket: k8s-backup-test
    prefix: test
  provider: alibabacloud
---
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  labels:
    component: velero
  name: default
  namespace: velero
spec:
  config:
    region: cn-beijing
  provider: alibabacloud
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: velero
  namespace: velero
spec:
  replicas: 1
  selector:
    matchLabels:
      deploy: velero
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8085"
        prometheus.io/scrape: "true"
      labels:
        component: velero
        deploy: velero
    spec:
      serviceAccountName: velero
      containers:
      - name: velero
        # sync from velero/velero:v1.2.0
        image: registry.cn-hangzhou.aliyuncs.com/acs/velero:v1.2.0
        imagePullPolicy: IfNotPresent
        command:
        - /velero
        args:
        - server
        - --default-volume-snapshot-locations=alibabacloud:default
        env:
        - name: VELERO_SCRATCH_DIR
          value: /scratch
        - name: ALIBABA_CLOUD_CREDENTIALS_FILE
          value: /credentials/cloud
        volumeMounts:
        - mountPath: /plugins
          name: plugins
        - mountPath: /scratch
          name: scratch
        - mountPath: /credentials
          name: cloud-credentials
      initContainers:
      - image: registry.cn-hangzhou.aliyuncs.com/acs/velero-plugin-alibabacloud:v1.2-991b590
        imagePullPolicy: IfNotPresent
        name: velero-plugin-alibabacloud
        volumeMounts:
        - mountPath: /target
          name: plugins
      volumes:
      - emptyDir: {}
        name: plugins
      - emptyDir: {}
        name: scratch
      - name: cloud-credentials
        secret:
          secretName: cloud-credentials

k8s 部署 Velero 服务

# 新建 namespace
kubectl create namespace velero
# 部署 credentials-velero 的 secret
kubectl create secret generic cloud-credentials --namespace velero --from-file cloud=install/credentials-velero
# 部署 CRD
kubectl apply -f install/00-crds.yaml
# 部署 Velero
kubectl apply -f install/01-velero.yaml
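部署完成后,可以先确认 Velero 服务端已经正常跑起来。下面是一个简单的核对示例(沿用上文创建的 velero 命名空间,命令本身不在原文中,仅作补充):

# 查看 velero 的 Deployment 和 Pod 是否就绪
kubectl -n velero get deployment,pods
# 客户端二进制已放入 $PATH 后,velero version 会同时打印客户端和服务端版本,
# 能看到 Server 版本就说明客户端与集群内服务端连通正常
velero version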
备份测试

这里我们将使用 velero 备份集群内相关的 resource,当集群出现故障或误操作时,能够快速恢复这些 resource。首先用下面的 yaml 部署测试应用:

---
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-example
  labels:
    app: nginx
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: nginx-example
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: my-nginx
  namespace: nginx-example
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx

我们可以全量备份,也可以只备份需要备份的某一个 namespace,本处只备份 nginx-example 这一个 namespace:

[rsync@velero-plugin]$ kubectl get pods -n nginx-example
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-5c689d88bb-f8vsx   1/1     Running   0          6m31s
nginx-deployment-5c689d88bb-rt2zk   1/1     Running   0          6m32s
[rsync@velero]$ cd velero-v1.4.0-linux-amd64/
[rsync@velero-v1.4.0-linux-amd64]$ ll
total 56472
drwxrwxr-x 4 rsync rsync     4096 Jun  1 15:02 examples
-rw-r--r-- 1 rsync rsync    10255 Dec 10 01:08 LICENSE
-rwxr-xr-x 1 rsync rsync 57810814 May 27 04:33 velero
[rsync@velero-v1.4.0-linux-amd64]$ ./velero backup create nginx-backup --include-namespaces nginx-example --wait
Backup request "nginx-backup" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
.
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe nginx-backup` and `velero backup logs nginx-backup`.
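备份完成后,可以顺手核对一下备份对象的状态和内容,下面是一个补充示例(沿用上面的 nginx-backup,非原文步骤):

# 列出当前所有备份及其状态
./velero backup get
# 查看这次备份具体包含了哪些资源,--details 会逐个列出对象
./velero backup describe nginx-backup --details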
删除 ns

[rsync@velero-v1.4.0-linux-amd64]$ kubectl delete namespaces nginx-example
namespace "nginx-example" deleted
[rsync@velero-v1.4.0-linux-amd64]$ kubectl get pods -n nginx-example
No resources found.
恢复

[rsync@velero-v1.4.0-linux-amd64]$ ./velero restore create --from-backup nginx-backup --wait
Restore request "nginx-backup-20200603180922" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe nginx-backup-20200603180922` and `velero restore logs nginx-backup-20200603180922`.
[rsync@velero-v1.4.0-linux-amd64]$ kubectl get pods -n nginx-example
NAME                                READY   STATUS              RESTARTS   AGE
nginx-deployment-5c689d88bb-f8vsx   1/1     Running             0          5s
nginx-deployment-5c689d88bb-rt2zk   0/1     ContainerCreating   0          5s

可以看到已经恢复了。另外,集群迁移的操作方式和备份恢复也是一样的。
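迁移时的大致思路是:让目标集群的 Velero 指向同一个 OSS bucket,备份会自动同步到目标集群,然后直接在目标集群上 restore。下面是一个示意(假设目标集群已按前文步骤部署好 Velero,并配置了相同的 BackupStorageLocation):

# 在目标集群上确认备份已经从对象存储同步过来
velero backup get
# 直接从同一个备份恢复,即可把资源"迁移"到目标集群
velero restore create --from-backup nginx-backup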
下面看一个特殊的场景:再部署一个项目,看恢复时会不会删掉新部署的项目。新建一个 tomcat 容器:

[rsync@tomcat-test]$ kubectl get pods -n nginx-example
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-5c689d88bb-f8vsx   1/1     Running   0          65m
nginx-deployment-5c689d88bb-rt2zk   1/1     Running   0          65m
tomcat-test-sy-677ff78f6b-rc5vq     1/1     Running   0          7s

restore 一下:

[rsync@velero-v1.4.0-linux-amd64]$ ./velero restore create --from-backup nginx-backup
Restore request "nginx-backup-20200603191726" submitted successfully.
Run `velero restore describe nginx-backup-20200603191726` or `velero restore logs nginx-backup-20200603191726` for more details.
[rsync@velero-v1.4.0-linux-amd64]$ kubectl get pods -n nginx-example
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-5c689d88bb-f8vsx   1/1     Running   0          68m
nginx-deployment-5c689d88bb-rt2zk   1/1     Running   0          68m
tomcat-test-sy-677ff78f6b-rc5vq     1/1     Running   0          2m33s

可以看到没有覆盖。接下来删除 nginx 的 deployment,再 restore:
[rsync@velero-v1.4.0-linux-amd64]$ kubectl delete deployment nginx-deployment -n nginx-example
deployment.extensions "nginx-deployment" deleted
[rsync@velero-v1.4.0-linux-amd64]$ kubectl get pods -n nginx-example
NAME                              READY   STATUS    RESTARTS   AGE
tomcat-test-sy-677ff78f6b-rc5vq   1/1     Running   0          4m18s
[rsync@velero-v1.4.0-linux-amd64]$ ./velero restore create --from-backup nginx-backup
Restore request "nginx-backup-20200603191949" submitted successfully.
Run `velero restore describe nginx-backup-20200603191949` or `velero restore logs nginx-backup-20200603191949` for more details.
[rsync@velero-v1.4.0-linux-amd64]$ kubectl get pods -n nginx-example
NAME                                READY   STATUS              RESTARTS   AGE
nginx-deployment-5c689d88bb-f8vsx   1/1     Running             0          2s
nginx-deployment-5c689d88bb-rt2zk   0/1     ContainerCreating   0          2s
tomcat-test-sy-677ff78f6b-rc5vq     1/1     Running             0          4m49s

可以看到,对我们的 tomcat 项目是没有影响的。

结论:velero 恢复不是直接覆盖,而是会恢复当前集群中不存在的 resource;已有的 resource 不会回滚到之前的版本。如需回滚,需在 restore 之前提前删除现有的 resource。

高级用法

可以设置一个周期性定时备份:

# 每日 1 点进行备份
velero create schedule <SCHEDULE NAME> --schedule="0 1 * * *"
# 每日 1 点进行备份,备份保留 48 小时
velero create schedule <SCHEDULE NAME> --schedule="0 1 * * *" --ttl 48h
# 每 6 小时进行一次备份
velero create schedule <SCHEDULE NAME> --schedule="@every 6h"
# 每日对 web namespace 进行一次备份
velero create schedule <SCHEDULE NAME> --schedule="@every 24h" --include-namespaces web

定时备份的名称为:`<SCHEDULE NAME>-<TIMESTAMP>`,恢复命令为:`velero restore create --from-backup <SCHEDULE NAME>-<TIMESTAMP>`。

如需备份恢复持久卷,备份如下:

velero backup create nginx-backup-volume --snapshot-volumes --include-namespaces nginx-example

该备份会在集群所在 region 给云盘创建快照(当前还不支持 NAS 和 OSS 存储),快照恢复云盘只能在同 region 完成。恢复命令如下:

velero restore create --from-backup nginx-backup-volume --restore-volumes

删除备份

方法一,通过命令直接删除:

velero delete backups default-backup

方法二,设置备份自动过期,在创建备份时加上 TTL 参数:

velero backup create <BACKUP-NAME> --ttl <DURATION>

还可以为资源添加指定标签,添加了该标签的资源在备份时会被排除:

# 添加标签
kubectl label -n <ITEM_NAMESPACE> <RESOURCE>/<NAME> velero.io/exclude-from-backup=true
# 为 default namespace 添加标签
kubectl label -n default namespace/default velero.io/exclude-from-backup=true
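顺带一提,上面的周期性备份除了用 velero create schedule 命令,也可以用 Schedule 自定义资源以声明式方式维护。下面是一个示意清单(nginx-daily 这个名称是示例,字段以所装 Velero 版本的 CRD 为准):

cat <<EOF | kubectl apply -f -
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nginx-daily          # 示例名称
  namespace: velero
spec:
  schedule: "0 1 * * *"      # 每日 1 点备份
  template:                  # 这里就是一份 BackupSpec
    includedNamespaces:
    - nginx-example
    ttl: 48h0m0s             # 备份保留 48 小时
EOF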
参考链接:https://yq.aliyun.com/articles/705007?spm=a2c4e.11163080.searchblog.140.1a8b2ec1TYJPbF

院长技术
2022年04月02日
56 阅读
0 评论
0 点赞
2022-04-02
K8s基于自定义指标实现自动扩容
基于自定义指标

除了基于 CPU 和内存来进行自动扩缩容之外,我们还可以根据自定义的监控指标来进行。这就需要使用 Prometheus Adapter:Prometheus 用于监控应用的负载和集群本身的各种指标,Prometheus Adapter 可以帮我们使用 Prometheus 收集的指标来制定扩缩容策略,这些指标通过 APIServer 暴露,HPA 资源对象也可以很轻易地直接使用。下面来看具体怎么实现。

部署应用

首先,我们部署一个示例应用,在该应用程序上测试基于 Prometheus 指标的自动缩放,资源清单文件如下所示(podinfo.yaml):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  selector:
    matchLabels:
      app: podinfo
  replicas: 1
  template:
    metadata:
      labels:
        app: podinfo
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      containers:
      - name: podinfod
        image: stefanprodan/podinfo:0.0.1
        imagePullPolicy: Always
        command:
        - ./podinfo
        - -port=9898
        - -logtostderr=true
        - -v=2
        volumeMounts:
        - name: metadata
          mountPath: /etc/podinfod/metadata
          readOnly: true
        ports:
        - containerPort: 9898
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /readyz
            port: 9898
          initialDelaySeconds: 1
          periodSeconds: 2
          failureThreshold: 1
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9898
          initialDelaySeconds: 1
          periodSeconds: 3
          failureThreshold: 2
        resources:
          requests:
            memory: "32Mi"
            cpu: "1m"
          limits:
            memory: "256Mi"
            cpu: "100m"
      volumes:
      - name: metadata
        downwardAPI:
          items:
          - path: "labels"
            fieldRef:
              fieldPath: metadata.labels
          - path: "annotations"
            fieldRef:
              fieldPath: metadata.annotations
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  labels:
    app: podinfo
spec:
  type: NodePort
  ports:
  - port: 9898
    targetPort: 9898
    nodePort: 31198
    protocol: TCP
  selector:
    app: podinfo

接下来我们将 Prometheus-Adapter 安装到集群中,这里选用 helm 安装,当然也可以直接用 yaml 文件安装。

Prometheus-Adapter 规则

Prometheus-Adapter 规则大致可以分为以下几个部分:

- seriesQuery:查询 Prometheus 的语句,通过这个查询语句查询到的所有指标都可以用于 HPA。
- seriesFilters:查询到的指标可能存在不需要的,可以通过它过滤掉。
- resources:通过 seriesQuery 查询到的只是指标,如果需要查询某个 Pod 的指标,肯定要将它的名称和所在的命名空间作为指标的标签进行查询。resources 就是将指标的标签和 k8s 的资源类型关联起来,最常用的就是 pod 和 namespace。有两种添加标签的方式,一种是 overrides,另一种是 template。
  - overrides:它会将指标中的标签和 k8s 资源关联起来。上面示例中就是将指标中的 pod 和 namespace 标签和 k8s 中的 pod 和 namespace 关联起来,因为 pod 和 namespace 都属于核心 api 组,所以不需要指定 api 组。当我们查询某个 pod 的指标时,它会自动将 pod 的名称和名称空间作为标签加入到查询条件中。比如 pod: {group: "apps", resource: "deployment"} 这么写,表示的就是将指标中 podinfo 这个标签和 apps 这个 api 组中的 deployment 资源关联起来。
  - template:通过 go 模板的形式。比如 template: "kube_<<.Group>>_<<.Resource>>",表示假如 <<.Group>> 为 apps、<<.Resource>> 为 deployment,那么它就是将指标中 kube_apps_deployment 标签和 deployment 资源关联起来。
- name:用来给指标重命名。之所以要给指标重命名,是因为有些指标是只增的,比如以 total 结尾的指标,这些指标拿来做 HPA 是没有意义的,我们一般计算它的速率,以速率作为值,那么此时的名称就不能以 total 结尾了,所以要进行重命名。
  - matches:通过正则表达式来匹配指标名,可以进行分组。
  - as:默认值为 $1,也就是第一个分组。as 为空就是使用默认值的意思。
- metricsQuery:这就是 Prometheus 的查询语句了,前面的 seriesQuery 查询是获得 HPA 指标,当我们要查某个指标的值时就要通过它指定的查询语句进行了。可以看到查询语句使用了速率和分组,这就是解决上面提到的只增指标的问题。
  - Series:表示指标名称。
  - LabelMatchers:附加的标签,目前只有 pod 和 namespace 两种,因此我们要在之前使用 resources 进行关联。
  - GroupBy:就是 pod 名称,同样需要使用 resources 进行关联。

安装

新建 hpa-prome-adapter-values.yaml 文件覆盖默认的 Values 值,用来安装 Prometheus-Adapter(我用的是 helm2),文件如下:

rules:
  default: false
  custom:
  - seriesQuery: 'http_requests_total'
    resources:
      overrides:
        kubernetes_namespace:
          resource: namespace
        kubernetes_pod_name:
          resource: pod
    name:
      matches: "^(.*)_total"
      as: "${1}_per_second"
    metricsQuery: (sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>))

prometheus:
  url: http://prometheus-clusterip.monitor.svc.cluster.local

安装:

helm repo add apphub https://apphub.aliyuncs.com/
helm install --name prome-adapter --namespace monitor -f hpa-prome-adapter-values.yaml apphub/prometheus-adapter
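安装命令执行后,可以先快速确认 custom.metrics.k8s.io 这个聚合 API 组已经注册上。下面是补充的检查示例(假设 Adapter 装在上文的 monitor 命名空间、release 名为 prome-adapter):

# 查看 API 组里是否出现了 custom.metrics.k8s.io
kubectl api-versions | grep custom.metrics.k8s.io
# 确认 adapter 的 Pod 正常运行
kubectl -n monitor get pods | grep prome-adapter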
"custom.metrics.k8s.io/v1beta1", "resources": [ { "name": "namespaces/http_requests_per_second", "singularName": "", "namespaced": false, "kind": "MetricValueList", "verbs": [ "get" ] }, { "name": "pods/http_requests_per_second", "singularName": "", "namespaced": true, "kind": "MetricValueList", "verbs": [ "get" ] } ] } 我们可以看到 http_requests_per_second 指标可用。 现在,让我们检查该指标的当前值:[root@prometheus]# kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq . { "kind": "MetricValueList", "apiVersion": "custom.metrics.k8s.io/v1beta1", "metadata": { "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/http_requests_per_second" }, "items": [ { "describedObject": { "kind": "Pod", "namespace": "default", "name": "podinfo-5cdc457c8b-99xtw", "apiVersion": "/v1" }, "metricName": "http_requests_per_second", "timestamp": "2020-06-02T12:01:01Z", "value": "888m", "selector": null }, { "describedObject": { "kind": "Pod", "namespace": "default", "name": "podinfo-5cdc457c8b-b7pfz", "apiVersion": "/v1" }, "metricName": "http_requests_per_second", "timestamp": "2020-06-02T12:01:01Z", "value": "888m", "selector": null } ] } 下面部署hpa对象apiVersion: autoscaling/v2beta1 kind: HorizontalPodAutoscaler metadata: name: podinfo spec: scaleTargetRef: apiVersion: extensions/v1beta1 kind: Deployment name: podinfo minReplicas: 2 maxReplicas: 5 metrics: - type: Pods pods: metricName: http_requests_per_second targetAverageValue: 3 部署之后,可见:[root@prometheus-adapter]# kubectl get hpa NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE podinfo Deployment/podinfo 911m/10 2 5 2 70s [root@prometheus-adapter]# kubectl describe hpa Name: podinfo Namespace: default Labels: <none> Annotations: kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"podinfo","namespace":"default"},... CreationTimestamp: Tue, 02 Jun 2020 17:53:14 +0800 Reference: Deployment/podinfo Metrics: ( current / target ) "http_requests_per_second" on pods: 911m / 10 Min replicas: 2 Max replicas: 5 Deployment pods: 2 current / 2 desired Conditions: Type Status Reason Message ---- ------ ------ ------- AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from pods metric http_requests_per_second 做一个ab压测:ab -n 2000 -c 5 http://sy.test.com:31198/ 观察下hpa变化:Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal SuccessfulRescale 9m29s horizontal-pod-autoscaler New size: 3; reason: pods metric http_requests_per_second above target Normal SuccessfulRescale 9m18s horizontal-pod-autoscaler New size: 4; reason: pods metric http_requests_per_second above target Normal SuccessfulRescale 3m34s horizontal-pod-autoscaler New size: 3; reason: All metrics below target Normal SuccessfulRescale 3m4s horizontal-pod-autoscaler New size: 2; reason: All metrics below target 发现触发扩容动作了,副本到了4,并且压测结束后,过了5分钟左右,又恢复到最小值2个。参考链接:https://github.com/directxman12/k8s-prometheus-adapter院长技术
参考链接:https://github.com/directxman12/k8s-prometheus-adapter

院长技术
2022年04月02日
81 阅读
0 评论
1 点赞
2022-03-31
k8s在线命令
kubectl 在线命令查询入口:https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands
2022年03月31日
116 阅读
0 评论
0 点赞