Switching from Fluentd to the Vector log aggregation tool
Log files are essential to data analysis because they record usage patterns, activities, and operations within an operating system, application, server, or device. This data serves many use cases within an organization, from resource management, application troubleshooting, and regulatory compliance to SIEM, business analytics, and marketing insights. To manage these logs and make use of the data they hold, log aggregation tools let organizations systematically collect and standardize log files. Choosing the right tool, however, can be difficult.
This blog describes and compares two popular open source log aggregation tools, Fluentd and Vector.
Fluentd setup and efficiency calculation
When orchestration tools like Kubernetes are used to deploy containers or other API resources, a log aggregator is needed to ship pod or node logs to a cloud platform. To meet one such requirement, Fluentd was used as the log aggregation tool to push Kubernetes (K8s) pod logs to cloud storage buckets, with an example configuration as shown below:
path /var/log/fluent/gcs
timekey 1m
timekey_wait 30
timekey_use_utc true
flush_thread_count 16
flush_at_shutdown true
flush_mode interval
flush_interval 1
chunk_limit_size 10 MB
retry_max_interval 30
retry_wait 60
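To make the timekey and timekey_wait parameters more concrete, here is a minimal Python sketch of how time-keyed chunking is generally documented to work in Fluentd: events are grouped into fixed windows of timekey seconds, and in the lazy flush mode a chunk becomes eligible for flushing timekey_wait seconds after its window closes. This assumes time is configured as a chunk key. The configuration above sets flush_mode interval, in which case flush_interval governs flushing instead, so the flush calculation applies only to variants that rely on the lazy mode. The function names are illustrative and are not part of Fluentd.

```python
from datetime import datetime, timezone
from typing import Tuple


def parse_duration(value: str) -> int:
    """Convert a duration such as '1m', '10m' or '30' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    if value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # a bare number is taken as seconds


def chunk_window(event_ts: int, timekey: int) -> Tuple[int, int]:
    """Return the (start, end) of the time window an event falls into."""
    start = event_ts - event_ts % timekey
    return start, start + timekey


def earliest_lazy_flush(event_ts: int, timekey: int, timekey_wait: int) -> int:
    """Earliest flush time in lazy mode: window end plus timekey_wait."""
    _, end = chunk_window(event_ts, timekey)
    return end + timekey_wait


if __name__ == "__main__":
    # Hypothetical event time, chosen only for illustration.
    ts = int(datetime(2024, 1, 1, 12, 0, 45, tzinfo=timezone.utc).timestamp())
    flush_at = earliest_lazy_flush(ts, parse_duration("1m"), parse_duration("30"))
    print(datetime.fromtimestamp(flush_at, tz=timezone.utc).isoformat())
```

Under these assumptions, an event logged 45 seconds into a minute lands in that minute's chunk, and the chunk is held for 30 more seconds after the minute ends so that late events can still join it.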
With this configuration, Fluentd pushed only 47.62% of the total logs to cloud storage. Because more than 50% of the logs were lost, the configuration was changed several times. Most changes yielded an efficiency between 40% and 50%; the best result was an average of 67% over an entire day. Below are some of the changes that were made:
path /var/log/fluent/gcs
timekey 1m
timekey_wait 30
timekey_use_utc true
flush_thread_count 16
flush_at_shutdown true
retry_max_interval 60
retry_wait 30

path /var/log/fluent/gcs
timekey 1m
timekey_wait 30
timekey_use_utc true
flush_thread_count 16
flush_at_shutdown true

path /var/log/fluent/gcs
timekey 10m
timekey_wait 0
timekey_use_utc true
flush_at_shutdown true

path /var/log/fluent/gcs
timekey 30
timekey_wait 0
timekey_use_utc true
flush_thread_count 15
flush_at_shutdown true

path /var/log/fluent/gcs
timekey 1
timekey_wait 0
timekey_use_utc true
flush_thread_count 16
flush_at_shutdown true
flush_mode immediate
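The efficiency figures in this section read as the share of produced logs that actually reached cloud storage. A minimal Python sketch of that calculation follows. The post does not state how the counts were gathered, so the function name and the sample counts are hypothetical; the counts were picked only so that the result matches the 47.62% reported for the first configuration.

```python
def delivery_efficiency(produced: int, delivered: int) -> float:
    """Percentage of produced log events that reached the destination."""
    if produced <= 0:
        raise ValueError("produced must be a positive count")
    if not 0 <= delivered <= produced:
        raise ValueError("delivered must be between 0 and produced")
    return 100.0 * delivered / produced


if __name__ == "__main__":
    # Hypothetical counts, not measurements from the post.
    produced, delivered = 2100, 1000
    print(f"{delivery_efficiency(produced, delivered):.2f}%")
```

Comparing the same two counts for every configuration variant, over the same time window, keeps the percentages comparable with one another.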
Vector deployment, configuration and resulting efficiency
To improve on this, Datadog's open-source Vector tool was also evaluated. It suits a Kubernetes setup, takes a configuration similar to Fluentd's, and is installed on the nodes.
Helm was used to fetch the official chart repository onto the VMs; the configuration was changed as described below, and Vector was installed as an agent. Vector has two modes of operation: agent and aggregator. The agent is the simpler mode and pushes logs/events from the source to the destination, while the aggregator transforms and ships data collected by other agents (in this case, other Vector instances).
Installing Vector this way requires its Helm repository to be added on the local machine so that the chart can be retrieved. The commands below were therefore run in sequence to install Vector in the Kubernetes cluster:
helm repo add vector https://helm.vector.dev
helm repo update
helm fetch --untar vector/vector . (command to clone repository to local machine)
type: kubernetes_logs (source type set in the chart's configuration, because Kubernetes is the log source)
helm install vector . --namespace vector
After Vector was deployed and tested in the development environment, its efficiency was close to 100%, with negligible loss. The switch to Vector was then made, and it was deployed to the production environment. Vector can ship 100,000 events or logs per second, which is very high throughput compared with other log aggregation tools. Even in the production Kubernetes cluster, Vector achieved 99.98-100% efficiency.
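To put the reported percentages side by side, the short Python sketch below converts each efficiency figure from this post into the number of events lost per million produced. The per-million framing and the function name are illustrative choices, not something the original measurements used; Decimal is used so that the percentages are handled exactly.

```python
from decimal import Decimal


def lost_per_million(efficiency_percent: str) -> int:
    """Events lost out of 1,000,000 produced at a given delivery efficiency."""
    efficiency = Decimal(efficiency_percent)
    if not Decimal(0) <= efficiency <= Decimal(100):
        raise ValueError("efficiency must be between 0 and 100")
    return int((Decimal(100) - efficiency) * Decimal(10_000))


if __name__ == "__main__":
    # Efficiency figures quoted in this post.
    reported = {
        "Fluentd, initial configuration": "47.62",
        "Fluentd, best daily average": "67",
        "Vector, production lower bound": "99.98",
    }
    for label, pct in reported.items():
        print(f"{label}: {lost_per_million(pct):,} lost per million")
```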