1. Configuration

Helmfile

Update the environments as required with the file paths of the environment and secrets files and the namespace to be used.

In the config below, "demo" is the environment: it sets the default namespace and provides the environment and secrets files.

# config-as-code/helm/charts/monitoring/monitoring-helmfile.yaml

environments:
  demo:
    values:
      - namespace: monitoring
      - ../../../environments/egov-demo.yaml
      - ../../../environments/egov-demo-secrets.yaml
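Helmfile resolves relative paths listed under values against the directory that contains the helmfile, which is why the demo entries climb three levels. The minimal sketch below assumes the repository layout given in the file-path comments of this guide and computes where the demo environment files are expected to live, a quick way to confirm a path before deploying.

```python
import posixpath
from typing import List

# Helmfile location, taken from the file-path comment above.
HELMFILE = "config-as-code/helm/charts/monitoring/monitoring-helmfile.yaml"

# Relative value-file paths listed under environments.demo.values.
DEMO_VALUE_FILES: List[str] = [
    "../../../environments/egov-demo.yaml",
    "../../../environments/egov-demo-secrets.yaml",
]


def resolve_value_file(helmfile: str, relative_path: str) -> str:
    """Resolve a value-file path against the helmfile's own directory."""
    base_dir = posixpath.dirname(helmfile)
    return posixpath.normpath(posixpath.join(base_dir, relative_path))


for path in DEMO_VALUE_FILES:
    print(resolve_value_file(HELMFILE, path))
```

Both paths resolve into config-as-code/environments/, the same files edited in the sections that follow. The environment is then selected at deploy time, for example with `helmfile -f monitoring-helmfile.yaml -e demo apply`.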

Environment Configuration

Grafana

  1. Create a GitHub OAuth app

    • Follow GitHub's documentation to create a new OAuth app with the settings below.

    • Homepage URL: https://<your_domain_name>

    • Authorization callback URL: https://<your_domain_name>/monitoring/login/github

    • Generate a client ID and client secret.

  2. Update the client ID and client secret in the secrets config.

    # config-as-code/environments/egov-demo-secrets.yaml
    
    cluster-configs:
      secrets:
        grafana:
          clientID: <OAuth-key>
          clientSecret: <OAuth-token>

  3. Update the environment config to restrict access to specific GitHub organizations and map teams to Grafana roles.

    # config-as-code/environments/egov-demo.yaml
    
    grafana:
      github:
        allowed_organizations: ["<organization>"]
        role_attribute_path: contains(groups[*], '@<organization>/<team>') && 'Viewer'

Note: Valid roles are None, Viewer, Editor, Admin, and GrafanaAdmin. See the official Grafana GitHub OAuth documentation for more information.
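Grafana evaluates role_attribute_path as a JMESPath expression against the user's details, where GitHub team memberships appear in groups as @<organization>/<team>. The plain-Python emulation below is not Grafana's evaluator; it is a sketch that mirrors how an ordered chain of contains(...) && '<role>' terms resolves. The organization "acme" and its team names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Ordered (GitHub team, Grafana role) pairs; "acme" and the teams are hypothetical.
RoleRules = Sequence[Tuple[str, str]]

EXAMPLE_RULES: RoleRules = (
    ("@acme/admins", "Admin"),
    ("@acme/developers", "Editor"),
)


def map_role(groups: Sequence[str], rules: RoleRules,
             default: Optional[str]) -> Optional[str]:
    """Mirror `contains(groups[*], g1) && r1 || contains(groups[*], g2) && r2 || default`.

    JMESPath's `&&` binds tighter than `||`, so the first rule whose team is
    present in `groups` wins; otherwise the trailing default (if any) applies.
    """
    for team, role in rules:
        if team in groups:
            return role
    return default


# The guide's single-team expression has no trailing default:
#   contains(groups[*], '@<organization>/<team>') && 'Viewer'
print(map_role(["@acme/support"], [("@acme/support", "Viewer")], None))
print(map_role(["@acme/other"], [("@acme/support", "Viewer")], None))
print(map_role(["@acme/developers"], EXAMPLE_RULES, "Viewer"))
```

A multi-team expression such as contains(groups[*], '@acme/admins') && 'Admin' || contains(groups[*], '@acme/developers') && 'Editor' || 'Viewer' behaves like EXAMPLE_RULES with a "Viewer" default. Without a trailing default, as in the config above, the expression yields no role for other users and Grafana's default role handling applies.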

Loki Stack

Filesystem as storage

# config-as-code/environments/egov-demo.yaml

loki:
  persistence:
    enabled: true
    accessModes:
      - ReadWriteOnce
    size: 15Gi
  serviceAccount:
    annotations: {}
  additionalConfigs:
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem                 ## local filesystem as storage
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/index
        cache_location: /data/loki/index_cache
        shared_store: filesystem                  ## local filesystem as storage
        cache_ttl: 24h
      filesystem:
        directory: /data/loki/chunks
    compactor:
      working_directory: /data/loki/boltdb-shipper-compactor
      shared_store: filesystem                    ## local filesystem as storage
      retention_enabled: true
      compaction_interval: 72h                    ## compaction in hours
    table_manager:
      retention_deletes_enabled: true
      retention_period: 72h                       ## retention in hours
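Loki's table-manager documentation states that retention_period must be a multiple of the index table period (the period under schema_config). A minimal check, assuming only single-unit durations like those used in this guide, confirms that 72h fits the 24h index period; the function names are our own.

```python
import re

# Seconds per unit for the simple single-unit durations used in this guide.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(value: str) -> int:
    """Convert a single-unit duration such as '24h' or '7d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def retention_fits_period(retention_period: str, index_period: str) -> bool:
    """True when the retention is a whole, non-zero number of index periods."""
    retention = parse_duration(retention_period)
    period = parse_duration(index_period)
    return retention > 0 and retention % period == 0


# Values from the filesystem example: retention 72h, index period 24h.
print(retention_fits_period("72h", "24h"))
```

The same check passes for the 168h retention in the S3 example (seven 24h periods), while a value such as 36h would not.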

AWS S3 as storage

Caution: Use the sub claim instead of aud when setting up Web Identity (OIDC) IAM roles to ensure correct identity matching.
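On EKS, the trust policy of such a role typically names the cluster's OIDC provider as a federated principal and restricts sts:AssumeRoleWithWebIdentity with a condition on the <provider>:sub key, whose value has the form system:serviceaccount:<namespace>:<service-account>. The aud claim is the same for every pod, whereas sub pins one service account. The sketch below builds that document; the account ID, the OIDC provider ID and the service-account name loki are hypothetical, and monitoring is the namespace from the helmfile.

```python
import json
from typing import Any, Dict


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> Dict[str, Any]:
    """Build an IAM trust policy that lets one Kubernetes service account
    assume the role via web identity, matched on the `sub` claim."""
    provider = oidc_provider
    if provider.startswith("https://"):
        provider = provider[len("https://"):]  # IAM keys omit the scheme
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # sub identifies the exact service account; aud alone does not.
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


policy = irsa_trust_policy(
    account_id="111122223333",                                     # hypothetical
    oidc_provider="oidc.eks.ap-south-1.amazonaws.com/id/EXAMPLE",  # hypothetical
    namespace="monitoring",
    service_account="loki",                                        # hypothetical
)
print(json.dumps(policy, indent=2))
```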

  1. Create an AWS Web Identity (OIDC) IAM role with the following policy:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessToLokiBucket",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::<s3-bucket>",
                    "arn:aws:s3:::<s3-bucket>/*"
                ]
            }
        ]
    }
    

  2. Update the S3 details and role ARN in the config below.

    # config-as-code/environments/egov-demo.yaml
    
    loki:
      persistence:
        enabled: true
        accessModes:
          - ReadWriteOnce
        size: 10Gi
      serviceAccount:
        annotations:
          eks.amazonaws.com/role-arn: <s3-role-arn>    ## ARN of the IAM role created in step 1
      additionalConfigs:
        schema_config:
          configs:
            - from: 2020-10-24
              store: boltdb-shipper
              object_store: s3                         ## AWS s3 as storage
              schema: v11
              index:
                prefix: index_
                period: 24h
        storage_config:
          boltdb_shipper:
            active_index_directory: /data/loki/index
            cache_location: /data/loki/index_cache
            shared_store: s3                           ## AWS s3 as storage
            cache_ttl: 24h
          aws:
            s3: s3://<region>/<s3-bucket>              ## s3 region & bucket
        compactor:
          working_directory: /data/loki/boltdb-shipper-compactor
          shared_store: s3                             ## AWS s3 as storage
          retention_enabled: true
          compaction_interval: 168h                    ## compaction in hours
        table_manager:
          retention_deletes_enabled: true
          retention_period: 168h                       ## retention in hours
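The bucket in the Loki s3 URL and the bucket in the IAM policy must be the same. The sketch below derives the policy's two Resource ARNs from the same s3://<region>/<s3-bucket> value used in the config, which keeps the two in step: s3:ListBucket applies to the bucket ARN, while the object actions apply to the objects under it (/*). The region and bucket name in the example are hypothetical.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import urlparse


def parse_loki_s3_url(url: str) -> Tuple[str, str]:
    """Split an `s3://<region>/<s3-bucket>` value into (region, bucket)."""
    parsed = urlparse(url)
    region = parsed.hostname
    bucket = parsed.path.strip("/")
    if parsed.scheme != "s3" or not region or not bucket or "/" in bucket:
        raise ValueError(f"expected s3://<region>/<bucket>, got {url!r}")
    return region, bucket


def loki_bucket_policy(bucket: str) -> Dict[str, Any]:
    """Permissions policy from step 1 for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AccessToLokiBucket",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject",
                           "s3:DeleteObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


region, bucket = parse_loki_s3_url("s3://ap-south-1/egov-loki-logs")  # hypothetical values
print(region)
print(json.dumps(loki_bucket_policy(bucket), indent=2))
```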

Azure Blob Store as storage

# config-as-code/environments/egov-demo.yaml

loki:
  persistence:
    enabled: true
    accessModes:
      - ReadWriteOnce
    size: 15Gi
  serviceAccount:
    annotations: {}
  additionalConfigs:
    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: azure                 ## azure blob as storage
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      azure:
        account_name: lokiprod                # Your Azure storage account name
        account_key: <storage-account-key>    # For the account-key, see docs: https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal
        container_name: loki                  # See https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#containers
        #use_managed_identity: <true|false>
        #user_assigned_id: <user-assigned-identity-id> # Providing a user assigned ID will override use_managed_identity
        request_timeout: 0
        #endpoint_suffix: <endpoint-suffix>   # Set only for a private Azure cloud such as Azure Stack Hub; used to compose container and blob URLs, e.g. https://account_name.endpoint_suffix/container_name/blob_name
      boltdb_shipper:
        active_index_directory: /data/loki/index
        cache_location: /data/loki/index_cache
        shared_store: azure                  ## azure blob as storage
        cache_ttl: 24h
    compactor:
      working_directory: /data/loki/boltdb-shipper-compactor
      shared_store: azure                         ## Azure blob as storage
      retention_enabled: true
      compaction_interval: 72h                    ## compaction in hours
    table_manager:
      retention_deletes_enabled: true
      retention_period: 72h                       ## retention in hours

Note: Refer to the official Loki documentation for detailed storage configuration.
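Across the three storage examples, object_store, boltdb_shipper.shared_store and compactor.shared_store always name the same backend, and storage_config carries the matching block (filesystem, aws or azure). A small consistency check, our own convention rather than a rule Loki enforces in this form, can catch a backend switch that missed one of these keys. It assumes the additionalConfigs section has already been loaded into a Python dict, for example with a YAML parser; the example dict below keeps only the relevant keys of the Azure config.

```python
from typing import Any, Dict, List

# storage_config block each store type needs, per the three examples above.
STORE_BLOCKS = {"filesystem": "filesystem", "s3": "aws", "azure": "azure"}


def store_problems(additional_configs: Dict[str, Any]) -> List[str]:
    """Report disagreements between the store settings of one Loki config."""
    problems: List[str] = []
    stores = {
        f"schema_config.configs[{i}].object_store": cfg["object_store"]
        for i, cfg in enumerate(additional_configs["schema_config"]["configs"])
    }
    stores["storage_config.boltdb_shipper.shared_store"] = (
        additional_configs["storage_config"]["boltdb_shipper"]["shared_store"])
    stores["compactor.shared_store"] = additional_configs["compactor"]["shared_store"]

    if len(set(stores.values())) > 1:
        problems.append(f"store settings disagree: {stores}")
    for store in sorted(set(stores.values())):
        block = STORE_BLOCKS.get(store)
        if block is None:
            problems.append(f"unknown store type: {store!r}")
        elif block not in additional_configs["storage_config"]:
            problems.append(f"storage_config.{block} is missing for store {store!r}")
    return problems


azure_example = {
    "schema_config": {"configs": [{"object_store": "azure"}]},
    "storage_config": {
        "azure": {"container_name": "loki"},
        "boltdb_shipper": {"shared_store": "azure"},
    },
    "compactor": {"shared_store": "azure"},
}
print(store_problems(azure_example))  # prints []
```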

Prometheus

# config-as-code/environments/egov-demo.yaml

prometheus:
  prometheusSpec:
    retention: 7d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 30Gi
    alertmanager:
      enabled: false
    externalLabels:
      cluster: <cluster-name>                         ## provide cluster name
    additionalScrapeConfigs:
      - job_name: 'nginx-ingress-metrics'
        static_configs:
          - targets: [ 'ingress-nginx-controller-metrics.egov:10254' ]
      - job_name: 'redis-exporter'
        static_configs:
          - targets: [ 'prometheus-redis-exporter.backbone:9121' ]
      - job_name: 'blackbox'
        metrics_path: /probe
        params:
          module: [ http_2xx ]
        static_configs:
          - targets:
              - <list of urls to be monitored>         ## add every URL to monitor, e.g. https://demo.digit.org/digit-ui
        relabel_configs:
          - source_labels: [ __address__ ]
            target_label: __param_target
          - source_labels: [ __param_target ]
            target_label: instance
          - target_label: __address__
            replacement: blackbox-prometheus-blackbox-exporter:9115
      - job_name: 'blackbox_exporter'
        static_configs:
          - targets: [ 'blackbox-prometheus-blackbox-exporter:9115' ]
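The relabel_configs of the blackbox job turn each listed URL into a query parameter of a request to the blackbox exporter. The sketch below replays those three rules in plain Python for the example URL from the comment. It is a simplified model of Prometheus relabeling (default action replace, default regex (.*)), not Prometheus code.

```python
from typing import Dict
from urllib.parse import urlencode

# Exporter address from the third relabel rule of the 'blackbox' job.
EXPORTER = "blackbox-prometheus-blackbox-exporter:9115"


def apply_blackbox_relabeling(target: str) -> Dict[str, str]:
    """Apply the job's three `replace` relabel rules to one static target."""
    labels = {"__address__": target}
    labels["__param_target"] = labels["__address__"]  # rule 1: remember the URL to probe
    labels["instance"] = labels["__param_target"]     # rule 2: show the URL as instance
    labels["__address__"] = EXPORTER                  # rule 3: scrape the exporter instead
    return labels


def scrape_url(labels: Dict[str, str], metrics_path: str = "/probe",
               module: str = "http_2xx") -> str:
    """Compose the URL Prometheus requests: __param_* labels become query parameters."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"


labels = apply_blackbox_relabeling("https://demo.digit.org/digit-ui")
print(labels["instance"])
print(scrape_url(labels))
```

The instance label keeps the probed URL, so dashboards and alerts show which site failed rather than the exporter's address.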

Alerting

# config-as-code/environments/egov-demo.yaml

prometheus:
  prometheusSpec:
    alertmanager:
      enabled: true

Note: Setting enabled: true turns on the Alertmanager that is part of the Prometheus Operator.

Slack Alerts

# config-as-code/environments/egov-demo-secrets.yaml

cluster-configs:
  secrets:
    alertmanager:
      config:
        global:
          slack_api_url: https://hooks.slack.com     ## slack webhook URL
          resolve_timeout: 5m
        route:
          group_by: ['alertname']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 10m
          routes:
          - receiver: slack-notification
            match_re:
              severity: "warning|critical"
            continue: true
        receivers:
        - name: slack-notification
          slack_configs:
            - channel: '<slack-channel>'             ## slack channel
              send_resolved: true
              username: 'Alertmanager'
              title: |
                  [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
              text: |-
                  {{ range .Alerts -}}
                  {{- "\n" -}}
                  *Alert:* {{ .Annotations.summary }}
                  {{ if .Labels.severity }}*Severity:* `{{ .Labels.severity }}`{{ end }}
                  *Cluster:* {{ .Labels.cluster }}
                  *Details:*
                  {{ .Annotations.description }}
                  {{ end }}
              color: |-
                  {{ if eq .Status "firing" -}}
                    {{ if eq .CommonLabels.severity "warning" -}}
                      warning
                    {{- else if eq .CommonLabels.severity "critical" -}}
                      danger
                    {{- else -}}
                      #439FE0
                    {{- end -}}
                  {{ else -}}
                    good
                  {{- end }}

Note: Generate a Slack incoming webhook, then set its URL as slack_api_url under global and replace <slack-channel> under receivers.
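In an Alertmanager route, match compares label values for exact equality, while match_re treats each value as a regular expression anchored at both ends. The sketch below models both behaviours in Python, with Python's re standing in for Go's RE2 (they agree on a simple alternation like this), for the severity value "warning|critical" used in these routes: only the regex form matches warning and critical alerts.

```python
import re


def match_equal(label_value: str, configured: str) -> bool:
    """`match`: the label value must equal the configured string exactly."""
    return label_value == configured


def match_regex(label_value: str, configured: str) -> bool:
    """`match_re`: the configured string is a regex anchored at both ends."""
    return re.fullmatch(f"(?:{configured})", label_value) is not None


for severity in ("warning", "critical", "info"):
    print(severity, match_equal(severity, "warning|critical"),
          match_regex(severity, "warning|critical"))
```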

Email Alerts

# config-as-code/environments/egov-demo-secrets.yaml

cluster-configs:
  secrets:
    alertmanager:
      config:
        global:
          resolve_timeout: 5m
        route:
          group_by: ['alertname']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 10m
          routes:
          - receiver: email-notification
            match_re:
              severity: "warning|critical"
            continue: true
        receivers:
        - name: email-notification
          email_configs:
            - to: '<recipient-email-address>'             ## receiver's email address
              from: '<sender-email-address>'              ## sender's email address
              smarthost: 'smtp.gmail.com:587'             ## SMTP host:port, update as per your provider
              auth_username: '<sender-email-address>'     ## SMTP username
              auth_password: '<auth-token>'               ## SMTP password or app token
              send_resolved: true
              headers:
                subject: |
                  [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.cluster }} - {{ .CommonLabels.alertname }}
              html: |
                <html>
                <head>
                <title>Alert!</title>
                </head>
                <body>
                {{ range .Alerts.Firing }}
                <ul>
                <li> <b>Alert Name:</b> {{ .Labels.alertname }} </li>
                <li> <b>Severity:</b> {{ if eq .Labels.severity "critical" }}<b style="color:red;">CRITICAL</b>{{ else if eq .Labels.severity "warning" }}<b style="color:orange;">WARNING</b>{{ else }}<b>{{ .Labels.severity | toUpper }}</b>{{ end }} </li>
                <li> <b>Summary:</b> {{ .Annotations.summary }} </li>
                <li> <b>Cluster:</b> {{ .Labels.cluster }} </li>
                <li> <b>Details:</b>
                  <p style="margin-left: 20px;"> {{ reReplaceAll "\n" "<br>" .Annotations.description | safeHtml }} </p>
                </li>
                </ul><br>
                {{ end }}
                </body></html>

Note: To send alerts through Gmail, follow Google's guide to setting up Gmail as an SMTP server.
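The Slack title and the email subject share the same prefix logic: the status in upper case, followed by the number of firing alerts only while the group is firing. A plain-Python mirror of the email subject template, with hypothetical cluster and alert names, shows the two shapes a recipient will see.

```python
def render_subject(status: str, firing_count: int, cluster: str, alertname: str) -> str:
    """Python mirror of the email subject template:
    [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.cluster }} - {{ .CommonLabels.alertname }}
    """
    prefix = status.upper()
    if status == "firing":
        prefix += f":{firing_count}"  # count is shown only while firing
    return f"[{prefix}] {cluster} - {alertname}"


print(render_subject("firing", 3, "demo-cluster", "HighMemoryUsage"))    # hypothetical names
print(render_subject("resolved", 0, "demo-cluster", "HighMemoryUsage"))
```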
