I have an AWS EKS cluster and have built a logging architecture using EFK: Elasticsearch, Fluentd, and Kibana. These are the specific settings.
Elasticsearch is deployed as a StatefulSet, and its containers mount a volume defined through volumeClaimTemplates. As a result, an EBS volume is dynamically provisioned for each replica.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        ...
        volumeMounts:
        - name: data-pvc
          mountPath: /usr/share/elasticsearch/data
      initContainers:
      - name: fix-permissions
        image: busybox
        ...
        volumeMounts:
        - name: data-pvc
          mountPath: /usr/share/elasticsearch/data
      ...
  volumeClaimTemplates:
  - metadata:
      name: data-pvc
      labels:
        app: elasticsearch
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
      storageClassName: gp2
Fluentd is deployed as a DaemonSet, so it runs on every node. Its container mounts hostPath volumes, which means it uses the EBS volume that is statically and directly attached to the EC2 instance.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      ...
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1
        ...
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
Kibana is simply deployed via Deployment.
Now I need to think about a log retention (or rotation) policy, since I don't want the logs to eat up all the disk space.
The first question is: where are the actual logs stored? Are they on the dynamically provisioned EBS volume that the Elasticsearch container mounts, or on the EBS volume directly attached to the EC2 instance, which the Fluentd container mounts?
Secondly, what is the best practice for setting up a log retention policy? Say I want to keep all logs for 90 days. When a log becomes older than that, I want it to be saved to a pre-created S3 bucket and deleted from the EBS volume automatically. Can this be done through the EFK settings themselves, instead of node-specific methods such as running a cron job on the node? If so, how should I modify the YAML files above?
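To make the 90-day rule concrete, here is a minimal sketch of the selection I have in mind. It assumes Fluentd writes one index per day named logstash-YYYY.MM.DD, which I have not confirmed for my setup, and expired_indices is a hypothetical helper rather than anything provided by EFK. It only decides which indices would be archived to S3 and then deleted; it does not talk to Elasticsearch or S3.

```python
from datetime import date, datetime, timedelta

RETENTION_DAYS = 90          # retention window from the question
INDEX_PREFIX = "logstash-"   # assumption: daily indices named logstash-YYYY.MM.DD


def expired_indices(index_names, today, retention_days=RETENTION_DAYS):
    """Return the daily indices dated more than retention_days before today."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(INDEX_PREFIX):
            continue  # not a log index (for example a Kibana system index)
        try:
            day = datetime.strptime(name[len(INDEX_PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logstash-2019.06.01", "logstash-2019.09.15", ".kibana_1"]
    print(expired_indices(names, today=date(2019, 9, 30)))
```

An index dated exactly 90 days before today is kept; only strictly older ones are selected. What I am looking for is the EFK-side equivalent of this rule, plus the S3 archiving step.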