Fluentd
is an efficient log aggregator with a small resource footprint and a rich plugin ecosystem — more than enough for most enterprise needs. Pay attention to your own cluster version when installing; this walkthrough uses Kubernetes 1.27.
Installation
We will install with Helm (v3). First add the chart repository and update:
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
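If you want to see which chart versions the repository offers before pinning one (we install chart version 0.4.4 below), you can list them first:
# list all available versions of the fluentd chart
helm search repo fluent/fluentd --versions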
Download the values.yaml file:
helm show values fluent/fluentd > values.yaml
Edit values.yaml as follows:
nameOverride: ""
fullnameOverride: ""

# DaemonSet, Deployment or StatefulSet
kind: "DaemonSet"

# azureblob, cloudwatch, elasticsearch7, elasticsearch8, gcs, graylog, kafka, kafka2, kinesis, opensearch
variant: elasticsearch8

# # Only applicable for Deployment or StatefulSet
# replicaCount: 1

image:
  repository: "fluent/fluentd-kubernetes-daemonset"
  pullPolicy: "IfNotPresent"
  tag: "v1.16.2-debian-elasticsearch8-1.0"

## Optional array of imagePullSecrets containing private registry credentials
## Ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
imagePullSecrets: []

serviceAccount:
  create: true
  annotations: {}
  name: null

rbac:
  create: true

# from Kubernetes 1.25, PSP is deprecated
# See: https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes
# We automatically disable PSP if Kubernetes version is 1.25 or higher
podSecurityPolicy:
  enabled: true
  annotations: {}

## Security Context policies for controller pods
## See https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/ for
## notes on enabling and using sysctls
##
podSecurityContext: {}
# seLinuxOptions:
#   type: "spc_t"

securityContext: {}
# capabilities:
#   drop:
#   - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000

# Configure the lifecycle
# Ref: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
lifecycle: {}
# preStop:
#   exec:
#     command: ["/bin/sh", "-c", "sleep 20"]
# Configure the livenessProbe
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
livenessProbe:
  httpGet:
    path: /metrics
    port: metrics
  # initialDelaySeconds: 0
  # periodSeconds: 10
  # timeoutSeconds: 1
  # successThreshold: 1
  # failureThreshold: 3

# Configure the readinessProbe
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
readinessProbe:
  httpGet:
    path: /metrics
    port: metrics
  # initialDelaySeconds: 0
  # periodSeconds: 10
  # timeoutSeconds: 1
  # successThreshold: 1
  # failureThreshold: 3

resources:
  requests:
    memory: 512Mi
  limits:
    memory: 512Mi

## only available if kind is Deployment
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80
  # targetMemoryUtilizationPercentage: 80
  ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics
  customRules: []
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets-per-second
  #     target:
  #       type: AverageValue
  #       averageValue: 1k
  ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-configurable-scaling-behavior
  # behavior:
  #   scaleDown:
  #     policies:
  #       - type: Pods
  #         value: 4
  #         periodSeconds: 60
  #       - type: Percent
  #         value: 10
  #         periodSeconds: 60

# priorityClassName: "system-node-critical"
nodeSelector: {}

## Node tolerations for server scheduling to nodes with taints
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
##
tolerations:
  # - key: null
  #   operator: Exists
  #   effect: "NoSchedule"
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"

## Affinity and anti-affinity
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
##
affinity: {}

## Annotations to be added to fluentd DaemonSet/Deployment
##
annotations: {}

## Labels to be added to fluentd DaemonSet/Deployment
##
labels: {}

## Annotations to be added to fluentd pods
##
podAnnotations: {}

## Labels to be added to fluentd pods
##
podLabels: {}

## How long (in seconds) a pod needs to be stable before progressing the deployment
##
minReadySeconds:

## How long (in seconds) a pod may take to exit (useful with lifecycle hooks to ensure lb deregistration is done)
##
terminationGracePeriodSeconds:

## Deployment strategy / DaemonSet updateStrategy
##
updateStrategy: {}
#   type: RollingUpdate
#   rollingUpdate:
#     maxUnavailable: 1

## Additional environment variables to set for fluentd pods
env:
  # - name: "FLUENTD_CONF"
  #   value: "../../../etc/fluent/fluent.conf"
  - name: FLUENT_ELASTICSEARCH_HOST
    value: "elasticsearch"
  - name: FLUENT_ELASTICSEARCH_PORT
    value: "9200"
  - name: FLUENT_ELASTICSEARCH_SCHEME
    value: http
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
envFrom: []

initContainers: []

## Name of the configMap containing a custom fluentd.conf configuration file to use instead of the default.
# mainConfigMapNameOverride: ""

## Name of the configMap containing files to be placed under /etc/fluent/config.d/
## NOTE: This will replace ALL default files in the aforementioned path!
# extraFilesConfigMapNameOverride: ""

mountVarLogDirectory: false
mountDockerContainersDirectory: false

volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: dockercontainerlogdirectory
    hostPath:
      path: /var/lib/docker/containers
  - name: poddir
    hostPath:
      path: /var/log/pods

volumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: dockercontainerlogdirectory
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: poddir
    mountPath: /var/log/pods
    readOnly: true

## Only available if kind is StatefulSet
## Fluentd persistence
##
persistence:
  enabled: false
  storageClass: "es-data"
  accessMode: ReadWriteOnce
  size: 10Gi

## Fluentd service
##
service:
  type: "ClusterIP"
  annotations: {}
  # loadBalancerIP:
  # externalTrafficPolicy: Local
  ports: []
  # - name: "forwarder"
  #   protocol: TCP
  #   containerPort: 24224
## Prometheus Monitoring
##
metrics:
  serviceMonitor:
    enabled: false
    additionalLabels:
      release: prometheus-operator
    namespace: ""
    namespaceSelector: {}
    ## metric relabel configs to apply to samples before ingestion.
    ##
    metricRelabelings: []
    # - sourceLabels: [__name__]
    #   separator: ;
    #   regex: ^fluentd_output_status_buffer_(oldest|newest)_.+
    #   replacement: $1
    #   action: drop
    ## relabel configs to apply to samples after ingestion.
    ##
    relabelings: []
    # - sourceLabels: [__meta_kubernetes_pod_node_name]
    #   separator: ;
    #   regex: ^(.*)$
    #   targetLabel: nodename
    #   replacement: $1
    #   action: replace
    ## Additional serviceMonitor config
    ##
    # jobLabel: fluentd
    # scrapeInterval: 30s
    # scrapeTimeout: 5s
    # honorLabels: true

  prometheusRule:
    enabled: false
    additionalLabels: {}
    namespace: ""
    rules: []
    # - alert: FluentdDown
    #   expr: up{job="fluentd"} == 0
    #   for: 5m
    #   labels:
    #     context: fluentd
    #     severity: warning
    #   annotations:
    #     summary: "Fluentd Down"
    #     description: "{{ $labels.pod }} on {{ $labels.nodename }} is down"
    # - alert: FluentdScrapeMissing
    #   expr: absent(up{job="fluentd"} == 1)
    #   for: 15m
    #   labels:
    #     context: fluentd
    #     severity: warning
    #   annotations:
    #     summary: "Fluentd Scrape Missing"
    #     description: "Fluentd instance has disappeared from Prometheus target discovery"

## Grafana Monitoring Dashboard
##
dashboards:
  enabled: "false"
  namespace: ""
  labels:
    grafana_dashboard: '"1"'

## Fluentd list of plugins to install
##
plugins: []
# - fluent-plugin-out-http

## Add fluentd config files from K8s configMaps
##
configMapConfigs:
  - fluentd-prometheus-conf
  - fluentd-systemd-conf
## Fluentd configurations:
##
fileConfigs:
  01_sources.conf: |-
    ## logs from podman
    <source>
      @type tail
      @id in_tail_container_logs
      @label @KUBERNETES
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_type string
          time_format "%Y-%m-%dT%H:%M:%S.%NZ"
          keep_time_key false
        </pattern>
        <pattern>
          format regexp
          expression /^(?<time>.+) (?<stream>stdout|stderr)( (.))? (?<log>.*)$/
          time_format '%Y-%m-%dT%H:%M:%S.%NZ'
          keep_time_key false
        </pattern>
      </parse>
      emit_unmatched_lines true
    </source>

  02_filters.conf: |-
    <label @KUBERNETES>
      <match kubernetes.var.log.containers.fluentd**>
        @type relabel
        @label @FLUENT_LOG
      </match>

      # <match kubernetes.var.log.containers.**_kube-system_**>
      #   @type null
      #   @id ignore_kube_system_logs
      # </match>

      <filter kubernetes.**>
        @type kubernetes_metadata
        @id filter_kube_metadata
        skip_labels false
        skip_container_metadata false
        skip_namespace_metadata true
        skip_master_url true
      </filter>

      <match **>
        @type relabel
        @label @DISPATCH
      </match>
    </label>

  03_dispatch.conf: |-
    <label @DISPATCH>
      <filter **>
        @type prometheus
        <metric>
          name fluentd_input_status_num_records_total
          type counter
          desc The total number of incoming records
          <labels>
            tag ${tag}
            hostname ${hostname}
          </labels>
        </metric>
      </filter>

      <match **>
        @type relabel
        @label @OUTPUT
      </match>
    </label>
  04_outputs.conf: |-
    <label @OUTPUT>
      <match **>
        @type elasticsearch
        host "elasticsearch"
        port 9200
        request_timeout 60s
        logstash_format true
        include_timestamp true
        logstash_prefix "logstash"
        reload_on_failure true
      </match>
    </label>
Install the chart:
helm install -f ./values.yaml fluent fluent/fluentd --namespace elk --debug --version=0.4.4
# Output:
NAME: fluent
LAST DEPLOYED: Wed Sep 20 18:33:17 2023
NAMESPACE: elk
STATUS: deployed
REVISION: 1
NOTES:
Get Fluentd build information by running these commands:
export POD_NAME=$(kubectl get pods --namespace elk -l "app.kubernetes.io/name=fluentd,app.kubernetes.io/instance=fluent" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace elk port-forward $POD_NAME 24231:24231
curl http://127.0.0.1:24231/metrics
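Since 03_dispatch.conf registers the fluentd_input_status_num_records_total counter, a quick sanity check after the port-forward is to filter the metrics output for it:
# a non-zero counter means records are flowing through the pipeline
curl -s http://127.0.0.1:24231/metrics | grep fluentd_input_status_num_records_total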
Check that all the pods came up:
kubectl get pods -n elk -o wide
Output:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
es-cluster-0 1/1 Running 0 26h 10.122.219.79 master <none> <none>
es-cluster-1 1/1 Running 0 26h 10.122.140.86 node02 <none> <none>
es-cluster-2 1/1 Running 1 26h 10.122.196.155 node01 <none> <none>
fluent-fluentd-8r4v5 1/1 Running 0 39m 10.122.219.90 master <none> <none>
fluent-fluentd-fqmp9 1/1 Running 0 39m 10.122.196.169 node01 <none> <none>
fluent-fluentd-zdxcj 1/1 Running 0 39m 10.122.140.75 node02 <none> <none>
You can see that Elasticsearch and Fluentd live in the same namespace, and that every node runs a fluent-fluentd-xxx pod, because we deployed Fluentd as a DaemonSet. Everything is running normally.
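You can also ask the DaemonSet itself (named fluent-fluentd here, after the release) to confirm that the desired and ready counts match the node count:
kubectl get daemonset fluent-fluentd -n elk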
Check the pod logs:
kubectl logs -f -n elk fluent-fluentd-8r4v5
Note that any Error output in the logs should be dealt with promptly. In my case the machines were underpowered and worker-0 was randomly SIGKILLed, which kept restarting the pod. Raising request_timeout in 04_outputs.conf to 60s improved the situation; on capable hardware the default 5s should be enough.
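If you hit similar memory pressure, the elasticsearch output also accepts an explicit <buffer> section. The sketch below (the sizes and intervals are illustrative values, not tested defaults) could be added inside the <match **> block of 04_outputs.conf to cap chunk sizes and spread out flushes:
<buffer>
  # file buffer survives pod restarts via the /var/log hostPath mount
  @type file
  path /var/log/fluentd-buffers/kubernetes.buffer
  flush_mode interval
  flush_interval 5s
  flush_thread_count 2
  # keep individual bulk requests and total buffer usage bounded
  chunk_limit_size 8M
  total_limit_size 256M
  retry_max_interval 30s
  retry_forever true
</buffer>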
Next, let's verify in Kibana that logs are being written. In Dev Tools, run the following request:
GET /_cat/indices
Output:
green open logstash-2023.09.09 Qti5vw-ATGOL5lQmm4vkRg 1 1 71694 0 59.9mb 29.9mb
green open logstash-2023.09.19 Yfw6cYQsRUS1Kfd7Fuujlg 1 1 100566 0 24.3mb 12mb
green open logstash-2023.09.08 UXrx-rFqSN6ZNtmXjqNisQ 1 1 71635 0 60.6mb 30.2mb
green open logstash-2023.09.07 N2UEnJ6GQe6aFs6fHUiGMA 1 1 134487 0 87mb 43.5mb
green open logstash-2023.09.18 spW3zgjnQG-tnQ0_J4d8BQ 1 1 3553 0 9.5mb 4.9mb
green open logstash-2023.09.06 UECzUgbCQM-gspqQ6epxKQ 1 1 181543 0 146.1mb 93mb
green open logstash-2023.09.17 gtux4HefRyeJ5LAFVlULDQ 1 1 1702 0 3.1mb 1.5mb
green open logstash-2023.09.01 k-VNbCt6TI6EBqc7HtNP9A 1 1 34841 0 13mb 6.5mb
green open logstash-2023.08.31 7U8Ls8mRROS2r47YMlM2gw 1 1 3384 0 1.2mb 641.1kb
green open logstash-2023.09.20 7XA24NJ7R2GpT-2OS5PCyw 1 1 99188 0 46.8mb 23.4mb
green open logstash-2023.09.05 B8DHtFmCRr-uZlti7vmJnQ 1 1 161342 0 87.3mb 43.8mb
green open logstash-2023.09.04 b5A2MPKZQX6PHkJG89ys7g 1 1 79652 0 29.9mb 14.9mb
green open logstash-2023.09.03 81MbV5BVQHyn6lePRxcR7g 1 1 69456 0 24.1mb 12mb
green open logstash-2023.09.02 04G33ne6RLuduRw4YV1pMw 1 1 66886 0 59.4mb 29.7mb
green open logstash-2023.09.10 I1NVbGKaQhW_6kdgusOE4w 1 1 25179 0 24.2mb 12.1mb
You can see the logs have been synced over; naturally, if the log volume is large, syncing takes a while. At this point we are collecting the logs of every container in the cluster and writing them to ES in real time.
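Because the kubernetes_metadata filter enriches every record, you can also spot-check a single document in Dev Tools; the namespace name here is just an example:
GET logstash-*/_search
{
  "size": 1,
  "sort": [{ "@timestamp": "desc" }],
  "query": { "match": { "kubernetes.namespace_name": "elk" } }
}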
Uninstall
If you run into problems, you can delete the release and then reinstall:
helm delete fluent -n elk
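Afterwards, the release should be gone from the list:
helm list -n elk
Note that the tail position file (/var/log/fluentd-containers.log.pos) lives on each node's host filesystem via the hostPath mount, so it survives the uninstall; remove it if you want logs re-read from the head on reinstall.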
Open Questions
Here are three questions left for you to think over:
How can a JSON string emitted by a container be stored in ES as structured JSON?
How do you parse non-JSON container logs, such as nginx logs?
How do you enrich log fields, for example parsing out the Agent or IP?
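As a nudge on the first question: one common approach (a sketch, not the only option) is to re-parse the log field with Fluentd's parser filter before the output stage, for example:
<filter kubernetes.**>
  @type parser
  # the docker/CRI tail source puts the raw message in the "log" key
  key_name log
  # keep the kubernetes metadata fields alongside the parsed JSON
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>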