Installing K8s v1.28.1

TrumanWong
9/24/2022

System Requirements

  • A compatible Linux host. The Kubernetes project provides generic instructions for Debian-based and Red Hat-based Linux distributions, as well as for distributions without a package manager.
  • 2 GB or more of RAM per machine (any less will leave little room for your applications).
  • 2 or more CPU cores.
  • Full network connectivity between all machines in the cluster (a public or a private network is fine).
  • Unique hostname, MAC address, and product_uuid for every node (see the verification commands after this list).
  • Certain ports open on your machines.
  • Swap disabled. You MUST disable swap for the kubelet to work properly.
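Before proceeding, it is worth verifying that the MAC address and product_uuid really are unique on every node, and that nothing is already listening on the API server port (6443 is assumed to be the default here):

# List network interfaces and their MAC addresses
$ ip link
# The product_uuid must differ on every machine (a common pitfall with cloned VMs)
$ sudo cat /sys/class/dmi/id/product_uuid
# Check that nothing is already listening on the default API server port 6443
$ sudo ss -lntp | grep 6443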

Deployment Environment

k8s-master: 192.168.146.130

node: 192.168.146.131

k8s: v1.28.1

RHEL: 9.2

1. Prerequisites

# On the k8s-master host; on the node host, set the hostname to node instead
$ sudo hostnamectl set-hostname k8s-master
# Edit the hosts file and add the following entries
$ sudo vim /etc/hosts
192.168.146.130 k8s-master
192.168.146.131 node

# Disable the firewall
$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld

# Disable SELinux
# Temporarily
$ sudo setenforce 0
# Permanently
$ sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

# Disable the swap partition
# Temporarily
$ sudo swapoff -a
# Permanently
$ sudo vim /etc/fstab
Comment out or delete the line containing swap
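Instead of editing /etc/fstab by hand, a one-liner can comment out the swap entry. This is a sketch; verify the result before rebooting:

# Prefix any uncommented line mentioning swap with a #
$ sudo sed -i '/\bswap\b/ s/^[^#]/#&/' /etc/fstab
# Verify the swap line is now commented out
$ grep swap /etc/fstab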

# Install ipvsadm
$ sudo yum install -y ipvsadm ipset sysstat conntrack libseccomp
# Edit ipvs.conf and add the following modules
$ sudo vim /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
overlay
br_netfilter
$ sudo modprobe overlay
$ sudo modprobe br_netfilter
$ sudo systemctl restart systemd-modules-load
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
# Apply the kernel parameters
$ sudo sysctl --system
# To be safe, set IP forwarding explicitly
$ sudo sysctl -w net.ipv4.ip_forward=1
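After loading the modules and applying the sysctls, confirm everything took effect before moving on:

# Confirm the bridge and overlay modules are loaded
$ lsmod | grep -e br_netfilter -e overlay
# All three parameters should print as 1
$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward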

2. Install the containerd runtime

$ wget https://github.com/containerd/containerd/releases/download/v1.7.5/containerd-1.7.5-linux-amd64.tar.gz
$ sudo tar Cxzvf /usr/local containerd-1.7.5-linux-amd64.tar.gz

Create the systemd unit file /usr/local/lib/systemd/system/containerd.service (creating the directory first if it does not exist) with the following content:

# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
#uncomment to fallback to legacy CRI plugin implementation with podsandbox support.
#Environment="DISABLE_CRI_SANDBOXES=1"
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd

Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

Then run the following steps:

$ sudo systemctl daemon-reload
$ sudo systemctl enable containerd
# Install runc
$ wget https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64
$ sudo install -m 755 runc.amd64 /usr/local/sbin/runc
# Install the CNI plugins
$ wget https://github.com/containernetworking/plugins/releases/download/v1.3.0/cni-plugins-linux-amd64-v1.3.0.tgz
$ sudo mkdir -p /opt/cni/bin
$ sudo tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.3.0.tgz
# Generate the default configuration (the binary tarball does not ship a config file,
# so create /etc/containerd first)
$ sudo mkdir -p /etc/containerd
$ containerd config default | sudo tee /etc/containerd/config.toml
# Adjust the containerd config: switch the cgroup driver to systemd, and make sure
# disabled_plugins = ["cri"] is commented out or removed
$ sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
$ sudo systemctl restart containerd
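A quick sanity check that containerd is up and actually using the systemd cgroup driver:

$ containerd --version
$ sudo systemctl status containerd --no-pager
# Should print: SystemdCgroup = true
$ grep SystemdCgroup /etc/containerd/config.toml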

3. Install kubeadm, kubectl, and kubelet

$ cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF

$ sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
$ sudo systemctl enable --now kubelet
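Optionally, verify the installed versions and pre-pull the control-plane images so that kubeadm init does not stall on downloads (v1.28.1 is assumed as the target version):

$ kubeadm version -o short
$ kubelet --version
# Pre-pull the control-plane images (optional)
$ sudo kubeadm config images pull --kubernetes-version v1.28.1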

4. Initialize Kubernetes

On the k8s-master host:

$ sudo kubeadm config print init-defaults > kubeadm.yaml
# Remember to change localAPIEndpoint.advertiseAddress to your server's IP address and nodeRegistration.name to k8s-master
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.146.130
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: k8s-master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.28.1
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}
$ sudo kubeadm init --config kubeadm.yaml
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-7ddc4f45bc-rrxtx   1/1     Running   0          79s
kube-system   calico-node-8h85g                          1/1     Running   0          79s
kube-system   coredns-5dd5756b68-hlkgq                   1/1     Running   0          104s
kube-system   coredns-5dd5756b68-smx67                   1/1     Running   0          104s
kube-system   etcd-k8s-master                            1/1     Running   0          110s
kube-system   kube-apiserver-k8s-master                  1/1     Running   0          110s
kube-system   kube-controller-manager-k8s-master         1/1     Running   0          110s
kube-system   kube-proxy-pmg5l                           1/1     Running   0          105s
kube-system   kube-scheduler-k8s-master                  1/1     Running   0          110s

On the node host, use the join command printed by kubeadm init --config kubeadm.yaml on the k8s-master host:

...
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:
...
kubeadm join 192.168.146.130:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:9e29f5f10a5e7053bc01a0e56f742ffcb48e948518cb797be555edd9f069fe92
...

Run:

$ sudo kubeadm join 192.168.146.130:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:9e29f5f10a5e7053bc01a0e56f742ffcb48e948518cb797be555edd9f069fe92
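If the join output was lost, or the bootstrap token has expired (the default TTL is 24h), a fresh join command can be generated on the master at any time:

$ sudo kubeadm token create --print-join-command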

5. Check the nodes

$ kubectl get node
NAME         STATUS   ROLES           AGE     VERSION
k8s-master   Ready    control-plane   5m18s   v1.28.1
node         Ready    <none>          2m33s   v1.28.1
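The worker's ROLES column shows <none> because kubeadm only labels control-plane nodes. Optionally, a role label can be added by hand; this is purely cosmetic and follows the node-role.kubernetes.io label convention:

# Give the worker node a "worker" role label (empty value is fine)
$ kubectl label node node node-role.kubernetes.io/worker=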

Test the k8s cluster

$ vim nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25.0
        ports:
        - containerPort: 80
          hostPort: 80
# Create the Deployment
$ kubectl apply -f nginx-deployment.yaml
$ kubectl get pod
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-5b9f674d44-62bx2   1/1     Running   0          49s
nginx-deployment-5b9f674d44-czddp   1/1     Running   0          49s
# When both pods show Running as above, the Deployment was created successfully. Visit
# http://192.168.146.130 and http://192.168.146.131; if both respond, the cluster is working
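The same check works from the command line (assuming a pod landed on each node, both node IPs should answer via the hostPort):

# Each should print an HTTP/1.1 200 OK status line from nginx
$ curl -sI http://192.168.146.130 | head -n 1
$ curl -sI http://192.168.146.131 | head -n 1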

Troubleshooting Common Errors

1. unknown service runtime.v1alpha2.RuntimeService

Solution:

# Adjust the containerd config: switch the cgroup driver to systemd, and make sure
# disabled_plugins = ["cri"] is commented out or removed, then restart containerd
$ sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
$ sudo systemctl restart containerd

2. [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

This happens because the cgroup drivers used by the docker and kubelet services do not match.

Why does docker's cgroup driver need to be changed?

  1. What are cgroups?

    cgroups (Control Groups) are a mechanism provided by the Linux kernel that limits and accounts for the physical resources used by groups of tasks. They act as hooks the kernel attaches to programs: resource scheduling at runtime triggers the corresponding hooks, enabling resource tracking and the enforcement of resource limits.

  2. What is cgroupfs?

    docker's default Cgroup Driver is cgroupfs. cgroupfs is a virtual filesystem type developed as the user-facing interface to cgroups. Like sysfs and proc, it exposes the cgroup hierarchy to users and passes user changes to cgroups on to the kernel; querying and modifying cgroups can only be done through the cgroupfs filesystem.

  3. Why switch to systemd?

    Kubernetes recommends systemd over cgroupfs. systemd is the cgroup manager that ships with the system itself and already allocates cgroups for every process, while docker's cgroup driver defaults to cgroupfs; this leaves two cgroup managers running side by side, which can cause instability when resources come under pressure. (The command after this list shows which cgroup version the host runs.)
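To see which cgroup version the host is on (on cgroup v2 hosts the systemd driver is the recommended choice):

# cgroup2fs means cgroup v2; tmpfs means cgroup v1
$ stat -fc %T /sys/fs/cgroup/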

If the configuration is not changed, kubeadm init will warn:

[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. 
The recommended driver is "systemd". 
Please follow the guide at https://kubernetes.io/docs/setup/cri/

Check the current cgroup driver of the docker service:

$ sudo docker info | grep Cgroup
 Cgroup Driver: cgroupfs
 Cgroup Version: 1

As shown, the default Cgroup Driver is cgroupfs.

Change docker's cgroup driver to systemd:

$ sudo vim /etc/docker/daemon.json
# Add the following
{
...
    "exec-opts": [
        "native.cgroupdriver=systemd"
    ]
...
}
# Restart docker after the change
$ sudo systemctl restart docker
# k8s requires swap to stay off the entire time; swapoff -a only disables it temporarily
$ sudo swapoff -a
# To disable swap permanently, edit /etc/fstab and comment out the swap line
$ sudo kubeadm reset
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ sudo systemctl status kubelet

3. failed to garbage collect required amount of images

This happens because once a node's (host's) disk usage reaches 85%, automatic image garbage collection kicks in to free host disk space. The event above is raised when, even after image GC has run, disk usage still cannot be brought back under the healthy threshold (80% by default). The error is usually caused by too much disk being consumed on the host. If disk usage keeps climbing past 90%, all containers on the node are evicted; that is, the node stops serving traffic due to disk pressure until space is freed.

Solution:

Check the node's disk usage. The following common situations typically drive disk usage up (see the inspection commands after this list):

  1. Large volumes of logs accumulating on disk without cleanup: clean up the logs.
  2. A process on the host continuously writing files: cap the file sizes and store the data on OSS or NAS.
  3. Downloads or other static resource files taking too much space: store static resources on OSS or a CDN.
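A few commands help locate what is eating the disk and reclaim space held by unused images. This is a sketch: crictl ships with the cri-tools package installed above, but it may need the runtime endpoint configured in /etc/crictl.yaml first.

# Overall filesystem usage, then the largest consumers under /var
# (container images and logs usually live here)
$ df -h
$ sudo du -xh --max-depth=2 /var | sort -rh | head -n 20
# Remove container images not referenced by any container
$ sudo crictl rmi --prune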