These instructions can be run from a command line using oc authenticated to the target cluster. They were derived from the OpenShift documentation for installing the Cluster Logging Operator using the CLI. An Ansible playbook was also used during testing.
Create a file named logging_namespace.yml with the following content:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
Run this command to create the namespace:
oc create -f logging_namespace.yml
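As a quick sanity check, you can confirm the namespace exists and carries the monitoring label:
oc get namespace openshift-logging --show-labels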
Create a file named og-clo.yml with the following content:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
  - openshift-logging
Run this command:
oc create -f og-clo.yml
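To confirm the OperatorGroup was created and targets the right namespace:
oc get operatorgroup cluster-logging -n openshift-logging -o yaml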
Create a file named clo_subscription.yml with this content:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "4.3"
  name: cluster-logging
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Run this command:
oc create -f clo_subscription.yml
At this point the Cluster Logging Operator should be installed. Verify that the installation succeeded by viewing the Installed Operators page in the openshift-logging project in the web console.
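The same check can be done from the CLI; the cluster-logging ClusterServiceVersion should eventually report a Succeeded phase (the exact CSV name varies with the operator version):
oc get csv -n openshift-logging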
The ClusterLogging custom resource tells the operator how to configure cluster logging. A "full stack" configuration includes a highly available Elasticsearch cluster, fluentd for collection, and Kibana for viewing logs. Since we are going to ship logs to an on-premise Splunk instance (via Logstash), the full stack is not required.
Create a file named clusterlogging.yml with the following content:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
  annotations:
    clusterlogging.openshift.io/logforwardingtechpreview: enabled
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      # no elasticsearch pods
      nodeCount: 0
  visualization:
    type: kibana
    kibana:
      # no kibana pods
      replicas: 0
  curation:
    type: curator
    curator:
      # schedule the cron job to never run curator
      schedule: "* * 31 2 *"
  collection:
    logs:
      type: fluentd
      fluentd: {}
Note: This configuration was originally tested with the expectation that at least one Elasticsearch pod would be required, so the elasticsearch section was tuned to create a single pod with the minimum CPU, memory, and storage that would still run. During testing it was discovered that the configuration works without Elasticsearch at all, so nodeCount is set to 0.
This configuration will not install or configure Elasticsearch, and it will create a Kibana deployment with 0 replicas.
Run this command:
oc create -f clusterlogging.yml
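To verify that only the collector was deployed (assuming the daemonset name used by the 4.3 operator):
oc get daemonset fluentd -n openshift-logging
# the full pod list should show only fluentd collectors, no elasticsearch or kibana pods
oc get pods -n openshift-logging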
At this point the fluentd daemonset has been created and fluentd pods are running, but they have nowhere to send the logs. The LogForwarding custom resource provides the additional configuration to set up log forwarding to Logstash. When you create the LogForwarding custom resource in this step, the fluentd pods will automatically be deleted and recreated by the operator to pick up the changes. The same thing happens if you delete the LogForwarding custom resource.
Create a file named logforwarding.yml with the following content.
NOTE: Be sure to update the endpoint parameter with the URL for the on-premise Logstash server.
apiVersion: logging.openshift.io/v1alpha1
kind: LogForwarding
metadata:
  name: instance
  namespace: openshift-logging
spec:
  disableDefaultForwarding: true
  outputs:
    - endpoint: 'host:port'
      name: insecureforward
      type: forward
  pipelines:
    - inputSource: logs.app
      name: container-logs
      outputRefs:
        - insecureforward
Disclaimer: This is a test configuration that uses insecure communication between the fluentd pods and the Logstash server. Be sure to update the configuration to use secure forwarding. Details can be found in the OpenShift documentation.
Run this command:
oc create -f logforwarding.yml
Once the fluentd pods restart, the logs will start flowing to the external server.
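To watch the collector pods cycle (assuming the component=fluentd label applied by the operator):
oc get pods -n openshift-logging -l component=fluentd -w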
Logs are now flowing to Logstash, but they are not being parsed correctly; Logstash is throwing fluentd parsing errors. Here is the Logstash configuration:
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
  http {
    id => "dw-http-123456"
  }
  tcp {
    codec => fluent
    port => 4000
  }
}
output {
  stdout {}
}
A bit of googling revealed that an additional parameter needs to be added to the fluentd configmap created by the ClusterLogging API. The time needs to be sent from fluentd as an integer so that the latest fluent codec in Logstash can parse the messages. The parameter is time_as_integer true, and it is set in this section of the fluentd configmap in the openshift-logging namespace:
# Ship logs to specific outputs
<label @INSECUREFORWARD>
  <match **>
    # https://docs.fluentd.org/v1.0/articles/in_forward
    @type forward
    time_as_integer true
    <buffer>
      @type file
      path '/var/lib/fluentd/insecureforward'
      queue_limit_length "#{ENV['BUFFER_QUEUE_LIMIT'] || '32' }"
      chunk_limit_size "#{ENV['BUFFER_SIZE_LIMIT'] || '1m' }"
      flush_interval "#{ENV['FORWARD_FLUSH_INTERVAL'] || '5s'}"
      flush_at_shutdown "#{ENV['FLUSH_AT_SHUTDOWN'] || 'false'}"
      flush_thread_count "#{ENV['FLUSH_THREAD_COUNT'] || 2}"
      retry_max_interval "#{ENV['FORWARD_RETRY_WAIT'] || '300'}"
      retry_forever true
      # the systemd journald 0.0.8 input plugin will just throw away records if the buffer
      # queue limit is hit - 'block' will halt further reads and keep retrying to flush the
      # buffer to the remote - default is 'exception' because in_tail handles that case
      overflow_action "#{ENV['BUFFER_QUEUE_FULL_ACTION'] || 'exception'}"
    </buffer>
    <server>
      host 10.1.112.6
      port 4000
    </server>
  </match>
</label>
First you have to set the management state of the ClusterLogging resource to Unmanaged. Then you can edit the configmap and add the time_as_integer parameter. Once saved, you need to delete all of the fluentd pods in the daemonset so that they pick up the new configuration. A CLI sketch of the procedure follows.
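Assuming the configmap name and pod label created by the 4.3 operator (verify both in your cluster):
# stop the operator from reconciling (and reverting) the fluentd configmap
oc patch clusterlogging instance -n openshift-logging --type merge -p '{"spec":{"managementState":"Unmanaged"}}'
# add "time_as_integer true" to the forward block shown above
oc edit configmap fluentd -n openshift-logging
# recreate the collector pods so they pick up the new configuration
oc delete pods -n openshift-logging -l component=fluentd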
To my knowledge this is the only change needed to get Logstash to properly parse the messages coming from fluentd. It would be great if the LogForwarding API had an additional field where this could be specified and applied automatically. Or, if there are no implications for other use cases or scenarios, perhaps the operator could always apply this parameter in the configmap.
Alternatively, the resource can be left in the Managed state by configuring Logstash with the necessary time precision, as shown here: https://github.com/openshift/cluster-logging-operator/pull/906/files#diff-9d209ed108cb7030a4cd931af483533b0b6bfbf1f7a65ca5f11173df88ccde43R39