How to Collect Logs in Kubernetes

Posted by Brian on Wednesday, September 20, 2023

Fluentd is an efficient log aggregator with a small resource footprint, and its plugin ecosystem is rich enough for most enterprise needs. Pay attention to your cluster version when installing; my cluster here runs Kubernetes 1.27.

Installation

We will install it with Helm (v3). First add the chart repository and update:

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
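
To confirm the chart is now available locally, you can search the repo (the version you see may differ from mine):

helm search repo fluent/fluentd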

Download the values.yaml file:

helm show values fluent/fluentd > values.yaml

Modify values.yaml as follows:

ameOverride: ""
fullnameOverride: ""

# DaemonSet, Deployment or StatefulSet
kind: "DaemonSet"
# azureblob, cloudwatch, elasticsearch7, elasticsearch8, gcs, graylog , kafka, kafka2, kinesis, opensearch
variant: elasticsearch8
# # Only applicable for Deployment or StatefulSet
# replicaCount: 1

image:
  repository: "fluent/fluentd-kubernetes-daemonset"
  pullPolicy: "IfNotPresent"
  tag: "v1.16.2-debian-elasticsearch8-1.0"

## Optional array of imagePullSecrets containing private registry credentials
## Ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
imagePullSecrets: []

serviceAccount:
  create: true
  annotations: {}
  name: null

rbac:
  create: true

# from Kubernetes 1.25, PSP is deprecated
# See: https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes
# We automatically disable PSP if Kubernetes version is 1.25 or higher
podSecurityPolicy:
  enabled: true
  annotations: {}

## Security Context policies for controller pods
## See https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/ for
## notes on enabling and using sysctls
##
podSecurityContext: {}
  # seLinuxOptions:
  #   type: "spc_t"

securityContext: {}
  # capabilities:
  #   drop:
  #   - ALL
  # readOnlyRootFilesystem: true
  # runAsNonRoot: true
  # runAsUser: 1000

# Configure the lifecycle
# Ref: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
lifecycle: {}
  # preStop:
  #   exec:
  #     command: ["/bin/sh", "-c", "sleep 20"]

# Configure the livenessProbe
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
livenessProbe:
  httpGet:
    path: /metrics
    port: metrics
  # initialDelaySeconds: 0
  # periodSeconds: 10
  # timeoutSeconds: 1
  # successThreshold: 1
  # failureThreshold: 3

# Configure the readinessProbe
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
readinessProbe:
  httpGet:
    path: /metrics
    port: metrics
  # initialDelaySeconds: 0
  # periodSeconds: 10
  # timeoutSeconds: 1
  # successThreshold: 1
  # failureThreshold: 3

resources:
  requests:
    memory: 512Mi
  limits:
    memory: 512Mi

## only available if kind is Deployment
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80
  ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics
  customRules: []
    # - type: Pods
    #   pods:
    #     metric:
    #       name: packets-per-second
    #     target:
    #       type: AverageValue
    #       averageValue: 1k
  ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-configurable-scaling-behavior
  # behavior:
  #   scaleDown:
  #     policies:
  #       - type: Pods
  #         value: 4
  #         periodSeconds: 60
  #       - type: Percent
  #         value: 10
  #         periodSeconds: 60
  # priorityClassName: "system-node-critical"

nodeSelector: {}

## Node tolerations for server scheduling to nodes with taints
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
##
tolerations:
# - key: null
#   operator: Exists
#   effect: "NoSchedule"
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"


## Affinity and anti-affinity
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
##
affinity: {}

## Annotations to be added to fluentd DaemonSet/Deployment
##
annotations: {}

## Labels to be added to fluentd DaemonSet/Deployment
##
labels: {}

## Annotations to be added to fluentd pods
##
podAnnotations: {}

## Labels to be added to fluentd pods
##
podLabels: {}

## How long (in seconds) a pods needs to be stable before progressing the deployment
##
minReadySeconds:

## How long (in seconds) a pod may take to exit (useful with lifecycle hooks to ensure lb deregistration is done)
##
terminationGracePeriodSeconds:

## Deployment strategy / DaemonSet updateStrategy
##
updateStrategy: {}
#   type: RollingUpdate
#   rollingUpdate:
#     maxUnavailable: 1

## Additional environment variables to set for fluentd pods
env:
  # - name: "FLUENTD_CONF"
  #   value: "../../../etc/fluent/fluent.conf"
  - name: FLUENT_ELASTICSEARCH_HOST
    value: "elasticsearch"
  - name: FLUENT_ELASTICSEARCH_PORT
    value: "9200"
  - name: FLUENT_ELASTICSEARCH_SCHEME
    value: http
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName


envFrom: []

initContainers: []

## Name of the configMap containing a custom fluentd.conf configuration file to use instead of the default.
# mainConfigMapNameOverride: ""

## Name of the configMap containing files to be placed under /etc/fluent/config.d/
## NOTE: This will replace ALL default files in the aforementioned path!
# extraFilesConfigMapNameOverride: ""

mountVarLogDirectory: false
mountDockerContainersDirectory: false

volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: dockercontainerlogdirectory
    hostPath:
      path: /var/lib/docker/containers
  - name: poddir
    hostPath:
      path: /var/log/pods

volumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: dockercontainerlogdirectory
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: poddir
    mountPath: /var/log/pods
    readOnly: true

## Only available if kind is StatefulSet
## Fluentd persistence
##
persistence:
  enabled: false
  storageClass: "es-data"
  accessMode: ReadWriteOnce
  size: 10Gi

## Fluentd service
##
service:
  type: "ClusterIP"
  annotations: {}
  # loadBalancerIP:
  # externalTrafficPolicy: Local
  ports: []
  # - name: "forwarder"
  #   protocol: TCP
  #   containerPort: 24224

## Prometheus Monitoring
##
metrics:
  serviceMonitor:
    enabled: false
    additionalLabels:
      release: prometheus-operator
    namespace: ""
    namespaceSelector: {}
    ## metric relabel configs to apply to samples before ingestion.
    ##
    metricRelabelings: []
    # - sourceLabels: [__name__]
    #   separator: ;
    #   regex: ^fluentd_output_status_buffer_(oldest|newest)_.+
    #   replacement: $1
    #   action: drop
    ## relabel configs to apply to samples after ingestion.
    ##
    relabelings: []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace
    ## Additional serviceMonitor config
    ##
    # jobLabel: fluentd
    # scrapeInterval: 30s
    # scrapeTimeout: 5s
    # honorLabels: true

  prometheusRule:
    enabled: false
    additionalLabels: {}
    namespace: ""
    rules: []
    # - alert: FluentdDown
    #   expr: up{job="fluentd"} == 0
    #   for: 5m
    #   labels:
    #     context: fluentd
    #     severity: warning
    #   annotations:
    #     summary: "Fluentd Down"
    #     description: "{{ $labels.pod }} on {{ $labels.nodename }} is down"
    # - alert: FluentdScrapeMissing
    #   expr: absent(up{job="fluentd"} == 1)
    #   for: 15m
    #   labels:
    #     context: fluentd
    #     severity: warning
    #   annotations:
    #     summary: "Fluentd Scrape Missing"
    #     description: "Fluentd instance has disappeared from Prometheus target discovery"

## Grafana Monitoring Dashboard
##
dashboards:
  enabled: "false"
  namespace: ""
  labels:
    grafana_dashboard: '"1"'

## Fluentd list of plugins to install
##
plugins: []
# - fluent-plugin-out-http

## Add fluentd config files from K8s configMaps
##
configMapConfigs:
  - fluentd-prometheus-conf
  - fluentd-systemd-conf

## Fluentd configurations:
##
fileConfigs:
  01_sources.conf: |-
    ## logs from podman
    <source>
      @type tail
      @id in_tail_container_logs
      @label @KUBERNETES
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_type string
          time_format "%Y-%m-%dT%H:%M:%S.%NZ"
          keep_time_key false
        </pattern>
        <pattern>
          format regexp
          expression /^(?<time>.+) (?<stream>stdout|stderr)( (.))? (?<log>.*)$/
          time_format '%Y-%m-%dT%H:%M:%S.%NZ'
          keep_time_key false
        </pattern>
      </parse>
      emit_unmatched_lines true
    </source>    

  02_filters.conf: |-
    <label @KUBERNETES>
      <match kubernetes.var.log.containers.fluentd**>
        @type relabel
        @label @FLUENT_LOG
      </match>

      # <match kubernetes.var.log.containers.**_kube-system_**>
      #   @type null
      #   @id ignore_kube_system_logs
      # </match>

      <filter kubernetes.**>
        @type kubernetes_metadata
        @id filter_kube_metadata
        skip_labels false
        skip_container_metadata false
        skip_namespace_metadata true
        skip_master_url true
      </filter>

      <match **>
        @type relabel
        @label @DISPATCH
      </match>
    </label>    

  03_dispatch.conf: |-
    <label @DISPATCH>
      <filter **>
        @type prometheus
        <metric>
          name fluentd_input_status_num_records_total
          type counter
          desc The total number of incoming records
          <labels>
            tag ${tag}
            hostname ${hostname}
          </labels>
        </metric>
      </filter>

      <match **>
        @type relabel
        @label @OUTPUT
      </match>
    </label>    

  04_outputs.conf: |-
    <label @OUTPUT>
      <match **>
        @type elasticsearch
        host "elasticsearch"
        port 9200
        request_timeout 60s
        logstash_format true
        include_timestamp true
        logstash_prefix "logstash"
        reload_on_failure true
      </match>
    </label>    
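
Before installing, you can render the chart locally to sanity-check the edited values (this is a client-side render only and touches nothing in the cluster):

helm template fluent fluent/fluentd -f ./values.yaml --namespace elk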

Install:

helm install -f ./values.yaml fluent fluent/fluentd --namespace elk --debug --version=0.4.4

# Output:
NAME: fluent
LAST DEPLOYED: Wed Sep 20 18:33:17 2023
NAMESPACE: elk
STATUS: deployed
REVISION: 1
NOTES:
Get Fluentd build information by running these commands:

export POD_NAME=$(kubectl get pods --namespace elk -l "app.kubernetes.io/name=fluentd,app.kubernetes.io/instance=fluent" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace elk port-forward $POD_NAME 24231:24231
curl http://127.0.0.1:24231/metrics

Check whether all the pods are running:

kubectl get pods -n elk -o wide

The output looks like this:

NAME                      READY   STATUS    RESTARTS       AGE   IP               NODE     NOMINATED NODE   READINESS GATES
es-cluster-0              1/1     Running   0              26h   10.122.219.79    master   <none>           <none>
es-cluster-1              1/1     Running   0              26h   10.122.140.86    node02   <none>           <none>
es-cluster-2              1/1     Running   1              26h   10.122.196.155   node01   <none>           <none>
fluent-fluentd-8r4v5      1/1     Running   0              39m   10.122.219.90    master   <none>           <none>
fluent-fluentd-fqmp9      1/1     Running   0              39m   10.122.196.169   node01   <none>           <none>
fluent-fluentd-zdxcj      1/1     Running   0              39m   10.122.140.75    node02   <none>           <none>

You can see that Elasticsearch and Fluentd are in the same namespace, and that every node has a fluent-fluentd-xxx pod, because we deployed with a DaemonSet. Everything is running normally.
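
You can also check the DaemonSet itself to confirm the desired and ready pod counts match:

kubectl get daemonset -n elk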

Take a look at the logs a pod is producing:

kubectl logs -f -n elk fluent-fluentd-8r4v5

Note that any Error-level entries in the logs should be resolved promptly. When I installed, my machines were badly underpowered, and worker 0 was randomly SIGKILLed over and over, so the pod kept restarting. Raising request_timeout in 04_outputs.conf to 60s improved things considerably. On reasonably capable machines the default 5s should be enough.
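
If you hit similar SIGKILLs on constrained nodes, they are often the OOM killer reacting to in-memory buffering colliding with the 512Mi limit. One mitigation sketch is to give the elasticsearch <match> in 04_outputs.conf a bounded file buffer (the path and size values below are illustrative assumptions, and the path must be writable inside the container):

<buffer>
  @type file
  path /var/log/fluentd-buffers/es.buffer
  flush_interval 10s
  chunk_limit_size 8MB
  total_limit_size 256MB
  overflow_action drop_oldest_chunk
</buffer>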

Next, let's verify in Kibana that logs are actually being written.

In Dev Tools, enter the following request:

GET /_cat/indices

The output:

green open logstash-2023.09.09 Qti5vw-ATGOL5lQmm4vkRg 1 1  71694 0  59.9mb  29.9mb
green open logstash-2023.09.19 Yfw6cYQsRUS1Kfd7Fuujlg 1 1 100566 0  24.3mb    12mb
green open logstash-2023.09.08 UXrx-rFqSN6ZNtmXjqNisQ 1 1  71635 0  60.6mb  30.2mb
green open logstash-2023.09.07 N2UEnJ6GQe6aFs6fHUiGMA 1 1 134487 0    87mb  43.5mb
green open logstash-2023.09.18 spW3zgjnQG-tnQ0_J4d8BQ 1 1   3553 0   9.5mb   4.9mb
green open logstash-2023.09.06 UECzUgbCQM-gspqQ6epxKQ 1 1 181543 0 146.1mb    93mb
green open logstash-2023.09.17 gtux4HefRyeJ5LAFVlULDQ 1 1   1702 0   3.1mb   1.5mb
green open logstash-2023.09.01 k-VNbCt6TI6EBqc7HtNP9A 1 1  34841 0    13mb   6.5mb
green open logstash-2023.08.31 7U8Ls8mRROS2r47YMlM2gw 1 1   3384 0   1.2mb 641.1kb
green open logstash-2023.09.20 7XA24NJ7R2GpT-2OS5PCyw 1 1  99188 0  46.8mb  23.4mb
green open logstash-2023.09.05 B8DHtFmCRr-uZlti7vmJnQ 1 1 161342 0  87.3mb  43.8mb
green open logstash-2023.09.04 b5A2MPKZQX6PHkJG89ys7g 1 1  79652 0  29.9mb  14.9mb
green open logstash-2023.09.03 81MbV5BVQHyn6lePRxcR7g 1 1  69456 0  24.1mb    12mb
green open logstash-2023.09.02 04G33ne6RLuduRw4YV1pMw 1 1  66886 0  59.4mb  29.7mb
green open logstash-2023.09.10 I1NVbGKaQhW_6kdgusOE4w 1 1  25179 0  24.2mb  12.1mb
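
To inspect what an individual document looks like (field names, Kubernetes metadata, and so on), you can pull the newest record; the index name below is taken from my output above:

GET /logstash-2023.09.20/_search
{
  "size": 1,
  "sort": [ { "@timestamp": { "order": "desc" } } ]
}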

You can see that my logs have already been synced over. Of course, if there are a lot of logs, the sync takes some time. With that, we are done: all container logs in the cluster are collected and written to ES in real time.

Uninstall

If you run into problems, you can delete the release and then reinstall:

helm delete fluent -n elk
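
If you only changed values.yaml, deleting the release is usually unnecessary; an in-place upgrade works too:

helm upgrade -f ./values.yaml fluent fluent/fluentd -n elk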

Open Questions

Here are three questions left for you to think about:

How can the JSON strings a container prints be stored in ES as structured JSON rather than as flat strings?
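
As a hint for the first one: Fluentd's built-in parser filter can re-parse the log field as JSON. A minimal sketch to drop into fileConfigs (the tag pattern and placement are illustrative, not the only option):

<filter kubernetes.**>
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  emit_invalid_record_to_error false
  <parse>
    @type json
  </parse>
</filter>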

How do you parse non-JSON container logs, such as nginx logs?
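
For the second: Fluentd ships a built-in nginx parser for the default access-log format. A sketch that targets only nginx containers (the _nginx_ tag pattern assumes those pods live in a namespace called nginx, which is purely illustrative):

<filter kubernetes.var.log.containers.**_nginx_**>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type nginx
  </parse>
</filter>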

How do you enrich log fields, for example parsing out the User-Agent, IP, and so on?
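
For the third: enrichment is usually done with extra plugins, e.g. fluent-plugin-geoip for IP geolocation or fluent-plugin-woothee for User-Agent parsing, installed via the plugins list in values.yaml. A rough geoip sketch, assuming the geoip2_c backend and a remote_addr field already extracted by an earlier parser (the record placeholder syntax depends on the plugin version):

<filter kubernetes.var.log.containers.**_nginx_**>
  @type geoip
  backend_library geoip2_c
  geoip_lookup_keys remote_addr
  <record>
    geo_country ${country.iso_code["remote_addr"]}
    geo_city    ${city.names.en["remote_addr"]}
  </record>
</filter>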