1 of 80

Operations Guide

In progress

This guide provides a step-by-step guide to monitoring and operating the DIGIT Platform and services in production.

DIGIT - Infra Overview

Operational Guidelines & Security Standards

Objective

The objective is to provide a clear guide for efficiently using DIGIT infrastructure on various platforms like SDC, NIC, or commercial clouds. This document outlines the infrastructure overview, operational guidelines, and recommendations, along with the segregation of duties (SoD). It helps to plan the procurement and build the necessary capabilities to deploy and implement DIGIT.

In a shared control scenario, the state program team must adhere to these guidelines and develop their own control implementation for the state's cloud infrastructure and collaborations with partners. This ensures standardized and smooth operational excellence in the overall system.

DIGIT Infrastructure Overview

DIGIT Platform is designed as a microservices architecture, using open-source technologies and containerized apps and services. DIGIT components/services are deployed as docker containers on a platform called Kubernetes, which provides flexibility for running cloud-native applications anywhere like physical or virtual infrastructure or hypervisor or HCI and so on. Kubernetes handles the work of scheduling containerized services onto a compute cluster and manages the workloads to ensure they run as intended. And it substantially simplifies the deployment and management of microservices.

Provisioning the Kubernetes cluster will vary across from commercial clouds to state data centres, especially in the absence of managed Kubernetes services like AWS, Azure, GCP and NIC. Kubernetes clusters can also be provisioned on state data centres with bare-metal, virtual machines, hypervisors, HCI, etc. However providing integrated networking, monitoring, logging, and alerting is critical for operating Kubernetes Clusters when it comes to State data centers. DIGIT Platform also offers add-ons to monitor Kubernetes cluster performance, logging, tracing, service monitoring and alerting, which the implementation team can take advantage.

Below are the useful links to understand Kubernetes:

DIGIT On Kubernetes - High-Level Deployment Diagram

DIGIT Infra Specification On SDC or NIC Or Any Commercial Cloud

Operational Recommendations

DIGIT strongly recommends Site reliability engineering (SRE) principles as a key means to bridge development and operations gaps by applying a software engineering mindset to system and IT administration topics. In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

Monitoring Tools Recommendations

Commercial clouds like AWS, Azure and GCP offer sophisticated monitoring solutions across various infra levels like CloudWatch and StackDriver. In the absence of such managed services to monitor, we can look at various best practices and tools listed below which help in debugging and troubleshooting efficiently.

Key Standard Operating Procedures (SOPs)

Segregation of duties and responsibilities.
SME and SPOCs for L1.5 support along with the SLAs defined.
Ticketing system to manage incidents, converge and collaborate on various operational issues.
Monitoring dashboards at various levels like Infrastructure, networks and applications.
Transparency of monitoring data and collaboration between teams.
Periodic remote sync-up meetings, acceptance and attendance to the meeting.
Ability to see stakeholders' availability of calendar time to schedule meetings.
Periodic (weekly, monthly) summary reports of the various infra, operations incident categories.
Communication channels and synchronization regularly and also upon critical issues, changes, upgrades, releases etc.

Segregation Of Duties

While DIGIT is deployed at state cloud infrastructure, it is essential to identify and distinguish the responsibilities between Infrastructure, Operations and Implementation partners. Identify these teams and assign SPOC, define responsibilities and ensure the Incident Management process is followed to visualize, track issues and manage dependencies between teams. Essentially these are monitored through dashboards and alerts are sent to the stakeholders proactively. eGov team can provide consultation and training on a need basis depending on any of the below categories.

State IT/Cloud team -Refers to state infra team for the Infra, network architecture, LAN network speed, internet speed, OS Licensing and upgrade, patch, compute, memory, disk, firewall, IOPS, security, access, SSL, DNS, data backups/recovery, snapshots, capacity monitoring dashboard.

State program team - Refers to the owner for the whole DIGIT implementation, application rollouts, and capacity building. Responsible for identifying and synchronizing the operating mechanism between the below teams.
Implementation partner - Refers to the DIGIT Implementation, application performance monitoring for errors, logs scrutiny, TPS on peak load, distributed tracing, DB queries analysis, etc.
Operations team - this team could be an extension of the implementation team that is responsible for DIGIT deployments, configurations, CI/CD, change management, traffic monitoring and alerting, log monitoring and dashboard, application security, DB Backups, application uptime, etc.

Skills To Setup, Operate & Maintain DIGIT On SDC

Resource Requirement

DIGIT - Security Standards & Operational Recommendations

This section provides insights on security principles, security layers and the line of control that we focus on to prevent DIGIT security from the code, application, access, infra and operations. The target audience of this section are internal teams, partners, ecosystems and states to understand what security measures to be considered to secure DIGIT from an infrastructure and operations perspective .

DIGIT - Key Security Principles

Subscribe to the DIGIT applicable OWASP top 10 standard across various security layers.

Minimize attack surface area
Implement a strong identity foundation - Who accesses what and who does what.
Apply security at all possible layers
Automate security best practices
Separation of duties (SoD).
The principle of Least privilege (PoLP)
Templatized design - (Code, Images, Infra-as-code, Deploy-as-code, Conf-as-code, etc)
Align with MeiTY Standards to meet SDC Infra policies.

Security Layers & Line of Control

Application Layer

The presentation layer is likely to be the #1 attack vector for malicious individuals seeking to breach security defences like DDoS attacks, Malicious bots, Cross-Site Scripting (XSS) and SQL injection. Need to invest in web security testing with the powerful combination of tools, automation, process and speed that seamlessly integrates testing into software development, helping to eliminate vulnerabilities more effectively, deploy a web application firewall (WAF) that monitors and filters traffic to and from the application, blocking bad actors while safe traffic proceeds normally.

Key Security Measures

1. TLS-protocols/Encryption: Access control to secure authentication and authorization. All APIs that are exposed must have HTTPS certificates and encrypt all the communication between client and server with transport layer security (TLS).

2. Auth Tokens: An authorization framework that allows users to obtain admittance to a resource from the server. This is done using tokens in microservices security patterns: resource server, resource owner, authorization server, and client. These tokens are responsible for access to the resource before its expiry time. Also, Refresh Tokens that are responsible for requesting new access after the original token has expired.

3. Multi-factor Authentication: authorize users on the front end, which requires a username and password as well as another form of identity verification to offer users better protection by default as some aspects are harder to steal than others. For instance, using OTP for authentication takes microservice security to a whole new level.

4. Rate Limit/DDoS: denial-of-service attacks are the attempts to send an overwhelming number of service messages to cause application failure by concentrating on volumetric flooding of the network pipe. Such attacks can target the entire platform and network stack.

To prevent this:

We should set a limit on how many requests in a given period can be sent to each API.
If the number exceeds the limit, block access from a particular API, at least for some reasonable interval.
Also, make sure to analyze the payload for threats.
The incoming calls from a gateway API would also have to be rate-limited.
Add filters to the router to drop packets from suspicious sources.

5. Cross-site scripting (XSS): scripts that are embedded in a webpage and executed on the client side, in a user’s browser, instead of on the server side. When applications take data from users and dynamically include it in webpages without validating the data properly, attackers can execute arbitrary commands and display arbitrary content in the user’s browser to gain access to account credentials.

How to prevent:

Applications must validate data input to the web application from user browsers.
All output from the web application to user browsers must be encoded.
Users must have the option to disable client-side scripts.

6. Cross-Site Request Forgery (CSRF): is an attack whereby a malicious website will send a request to a web application that a user is already authenticated against from a different website. This way an attacker can access functionality in a target web application via the victim's already authenticated browser. Targets include web applications like social media, in-browser email clients, online banking and web interfaces for network devices. To prevent this CSRF tokens are appended to each request and associated to the user’s session. Such tokens should at a minimum be unique per user session, but can also be unique per request.

How to prevent:

By including a challenge token with each request, the developer can ensure that the request is valid and not coming from a source other than the user.

8. SQL Injection (SQLi): allows attackers to control an application’s database – letting them access or delete data, change an application’s data-driven behaviour, and do other undesirable things – by tricking the application into sending unexpected SQL commands. SQL injections are among the most frequent threats to data security.

How to prevent:

Using parameterized queries which specify placeholders for parameters so that the database will always treat them as data rather than part of an SQL command. Prepared statements and object-relational mappers (ORMs) make this easy for developers.
Remediate SQLi vulnerabilities in legacy systems by escaping inputs before adding them to the query. Use this technique only where prepared statements or similar facilities are unavailable.
Mitigate the impact of SQLi vulnerabilities by enforcing the least privilege on the database. Ensure that each application has its database credentials and that these credentials have the minimum rights the application needs.

Security In The Code

The primary causes of commonly exploited software vulnerabilities are consistent defects, bugs, and logic flaws in the code. Poor coding practices can create vulnerabilities in the system that can be exploited by cybercriminals.

What defines a security in the code:

1. White-box code analysis: As developers write code, the IDE needs to provide focused, real-time security feedback with white-box code analysis. It also helps developers remediate faster and learn on the job through positive reinforcement, remediation guidance, code examples, etc.

2. Static Code Analysis (SAST): A static analysis tool reviews program code, searching for application coding flaws, back doors or other malicious code that could give hackers access to critical data or customer information. However, most static analysis tools can only scan source code.

3: Vulnerability assessment: Vulnerability assessment for the third-party libraries/artefacts as part of CI and GitHub PR process. Test results are returned quickly and prioritized in a Fix-First Analysis that identifies both the most urgent flaws and the ones that can be fixed most quickly, allowing developers to optimize efforts and save additional resources.

4. Secure PII/Encrypt: Personally identifying information – to make sure that it is not being displayed as plain text. All the passwords and usernames must be masked during the storing in logs or records. However, adding extra encryption above TLS/HTTP won’t add protection for traffic travelling through the wire. It can only help a little bit at the point where TLS terminates, so it can protect sensitive data (such as passwords or credit card numbers) from accidental dumping into a request log. Extra encryption (RSA 2048+ or Blowfish) might help protect data against those attacks that aim at accessing the log data. But it will not help with those who try accessing the memory of the application servers or the main data storage.

5. Manual Penetration Testing: Some categories of vulnerabilities, such as authorization issues and business logic flaws, cannot be found with automated assessments and will always require a skilled penetration tester to identify them. Need to employ Manual Penetration Testing that uses proven practices to provide extensive and comprehensive security testing results for web, mobile, desktop, and back-end with detailed results, including attack simulations.

Libraries/Containers

Components, such as libraries, frameworks, container images, and other software modules, almost always run with full privileges. If a vulnerable component is exploited, such an attack can facilitate serious data loss or server takeover. Applications using components with known vulnerabilities may undermine application defences and enable a range of possible attacks and impacts.

Automating dependency checks for the libraries and container auditing, as well as using other container security processes as part of the CI periodically or as part of PRs can largely prevent these vulnerabilities. Subscribing to tools that comply with vulnerable library databases such as OSVDB, Node Security Project, CIS, National Vulnerability Database, and Docker Bench for Security can help identify and fix the vulnerabilities periodically. A private docker registry can help.

Data Security

Data Security involves putting in place specific controls, standard policies, and procedures to protect data from a range of issues, including:

Enforced encryption: Encrypt, manage and secure data by safeguarding it in transit. Password-based, easy to use and very efficient.
Unauthorized access: Blocking unauthorized access plays a central role in preventing data breaches. Implementing Strong Password Policy and MFA.
Accidental loss: All data should be backed up. In the event of hardware or software failure, breach, or any other error to data; a backup allows it to continue with minimal interruption. Storing the files elsewhere can also quickly determine how much data was lost and/or corrupted.
Destruction: Endpoint Detection and Response (EDR) – provides visibility and defensive measures on the endpoint itself, when attacks occur on endpoint devices this can eliminate gaining access systems and avoid destruction of the data.

Infra/Cloud

In microservices and the Cloud Native architectural approach, the explosion of ephemeral, containerized services that arise from scaling applications developed increases the complexity of delivery. Fortunately, Kubernetes was developed just for this purpose. It provides DevOps teams with an orchestration capability for managing the multitude of deployed services, with in-built automation, resilience, load balancing, and much more. It's perfect for the reliable delivery of Cloud Native applications. Below are some of the key areas to get more control to establish policies, procedures and safeguards through the implementation of a set of rules for compliance. These rules cover infra privacy, security, breach notification, enforcement, and an omnibus rule that deals with security compliance.

Strong stance on authentication and authorization
Role-Based Access Control (RBAC)
Kubernetes infrastructure vulnerability scanning
Hunting misplaced secrets
Workload hardening from Pod Security to network policies
Ingress Controllers for security best practices
Constantly watch your Kubernetes deployments
Find deviations from desired baselines
Should alert or deny on policy violation
Block/Whitelist (IP or DNS) connections before entering the workloads.
Templatize the deployment/secrets configs and serve as config-as-code.

Network Security

Kubernetes brings new requirements for network security, because applications, that are designed to run on Kubernetes, are usually architected as microservices that rely on the network. They make API calls to each other. Steps must be taken to ensure proper security protocols are in place. The following are the key areas for implementing network security for a Kubernetes platform:

Container Groups: Coupled communication between grouped containers, is achieved inside the Pod that contains one or more containers.
Communication between Pods: Pods are the smallest unit of deployment in Kubernetes. A Pod can be scheduled on one of the many nodes in a cluster and has a unique IP address. Kubernetes places certain requirements on communication between Pods when the network has not been intentionally segmented. These requirements include:
Containers should be able to communicate with other Pods without using network address translation (NAT).
All the nodes in the cluster should be able to communicate with all the containers in the cluster.
The IP address assigned to a container should be the same that is visible to other entities communicating with the container.
Pods and Services: Since Pods are ephemeral in nature, an abstraction called a Service provides a long-lived virtual IP address that is tied to the service locator (e.g., a DNS name). Traffic destined for that service VIP is then redirected to one of the Pods and offers the service using that specific Pod’s IP address as the destination.
Traffic Direction: Traffic is directed to Pods and services in the cluster via multiple mechanisms. The most common is via an ingress controller, which exposes one or more service VIPs to the external network. Other mechanisms include node ports and even publicly-addressed Pods.

Operational Security

It is a procedural security that manages risk and encourages to view of operations from the perspective of an adversary to protect sensitive information from falling into the wrong hands. Following are a few best practices to implement a robust, comprehensive operational security program:

Implement precise change management processes: All changes should be logged and controlled so they can be monitored and audited.
Restrict access to network devices using AAA authentication: a “need-to-know” is a rule of thumb regarding access and sharing of information.
Least Privilege (PoLP): Give the minimum access necessary to perform their jobs.
Implement dual control: Those who work on the tasks are not the same people in charge of security.
Automate tasks: reduce the need for human intervention. Humans are the weakest link in any organization’s operational security initiatives because they make mistakes, overlook details, forget things, and bypass processes.

Incident response and disaster recovery planning: are always crucial components of a sound security posture, we must have a plan to identify risks, respond to them, and mitigate potential damages.

Setup Central Instance Infra

Overview

This page provides the step-by-step process for setting up the central-instance infra.

Pre-reads

Know about EKS:
Know node groups
Know about Taints and Tolerations
Know about node-affinity
Know what is terraform:

Pre-requisites

with admin access to provision EKS Service, you can always subscribe to a free AWS account to learn the basics and try, but there is a limit to , for this demo you need to have a commercial subscription to the EKS service.
Install version (0.14.10) for the Infra-as-code (IaC) to provision cloud resources as code and with desired resource graph and also it helps to destroy the cluster in one go.
Install on your local machine which helps you interact with the Kubernetes cluster
Install that helps you package the services along with the configurations, environments, secrets, etc into a
on your local machine so that you can use AWS CLI commands to provision and manage the cloud resources on your account.
Install which helps you authenticate your connection from your local machine so that you should be able to deploy DIGIT services.
Use the credentials provided for the Terraform () to connect to your AWS account and provision the cloud resources.
- You'll get a Secret Access Key and Access Key ID. Save them safely.
- Open the terminal and run the following command. The AWS CLI is already installed and the credentials are saved. (Provide the credentials and you can leave the region and output format blank).
  The above will create the following file In your machine as /Users/.aws/credentials

Before we provision the cloud resources, we need to understand and be sure about what resources need to be provisioned by Terraform to deploy DIGIT. The following picture shows the various key components. (EKS, Worker Nodes, Postgres DB, EBS Volumes, Load Balancer).

The following are the resources that we are going to provision using Terraform in a standard way so that every time and for every environment, it'll have the same infra.

EKS Control Plane (Kubernetes Master)
Work node group (VMs with the estimated number of vCPUs and memory)
Node-Groups
EBS Volumes (persistent volumes)
RDS (Postgresql)
VPCs (private network)
Users to access, deploy and read only

Provisioning Central Instance Infra Using Terraform

Fork the DIGIT-DevOps repository into your organization account using the GitHub web portal. Make sure to add the right users to the repository. Clone the forked DIGIT-DevOps repository. Navigate to the sample-central-instance directory which contains the sample AWS infra provisioning script.

main.tf

variables.tf

Central Monitoring Dashboard Setup

Overview

This page provides the steps and process to set up the central monitoring dashboard.

Pre-reads

https://github.com/kubecost/cost-analyzer-helm-chart#kubecost-helm-chart https://prometheus.io/docs/introduction/overview/ https://grafana.com/docs/

Pre-requisites

DIGIT uses golang (required v1.13.3) automated scripts to deploy the builds onto Kubernetes - Linux or Windows or Mac.
All DIGIT services are packaged using helm charts Installing Helm.
kubectl is a CLI to connect to the Kubernetes cluster from your machine.
Install VisualStudio IDE Code for better code/configuration editing capabilities.
Git
Cost-analyzer - cost-analyzer must be deployed on the client side.
Prometheus-operator - Prometheus must be deployed on the client side.
prometheus-kafka-exporter (A Prometheus exporter acts as a proxy between such an application and the Prometheus server) - prometheus-kafka-exporter must be deployed on the client side

Steps

Step 1: Install Prometheus Operator On Each Client Cluster

Step 2: Expose The Prometheus Operator

Expose the Prometheus Operator using nginx-ingress rule in each client cluster. This makes it easy to access Prometheus metrics in central-dashboard clusters

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/cluster-issuer: letsencrypt-prod  //By using this annotation, a certificate will be issued
  labels:
    app: prometheus-operator
  name: prometheus-prometheus
  namespace: monitoring
spec:
  rules:
  - host: <host_name>                     //Replace with prometheus host
    http:
      paths:
      - backend:
          service:
            name: prometheus-prometheus
            port:
              number: 9090
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - <host_name>                         //Replace with prometheus host
    secretName: <host_name>-tls-certs     //Replace with prometheus host

Note: Ensure you create the CANAME DNS record with the hostname and loadbalancer ID.

Step 3: Install Cost Analyzer On Each Client Cluster

Kubecost provides visibility into current and historical Kubernetes spending and resource allocation. These provide cost transparency in Kubernetes environments.

You can deploy the cost analyser using one of the below methods.

1. Deploy using go lang deployer

git clone https://github.com/egovernments/DIGIT-DevOps.git
cd DIGIT-DevOps/deploy-as-code/egov-deployer
go run main.go -e <environment_name> "cost-analyzer"

2. Deploy using Jenkin’s deployment job. Here we are using deploy-to-dev. Choose your environment-specific deployment job.

Step 4: Install Grafana On The Central Dashboard Cluster

Below Grafana configuration should be added to the environments file, and then Grafana should be deployed using one of the following methods.

Based on the number of client clusters, you have to add data sources. There should be one entry per client cluster as shown below.

grafana: 
  image:
    repository: grafana/grafana
    tag: 9.0.0

  initContainers:
    gitSync:
      enabled: true
      repo: "git@github.com:egovernments/configs"  // repo contain the dashboard json files 
      branch: "central-dashboard"                 
  ingress:
    hostName: central-dashboard.digit.org  //central-dashboard hostname
    context: ""
    additionalAnnotations: | 
  grafana.ini:
    server:
      root_url: "%(protocol)s://%(domain)s"
      serve_from_sub_path: true 

  env: |
    - name: GF_SERVER_DOMAIN
      value: {{ .Values.ingress.hostName | quote }}    
  datasources:                // Configure Grafana to use prometheus ddata sources    datasources.yaml:
      apiVersion: 1
      datasources:
      - name: DIGIT-Dev
        type: prometheus
        url: https://<Dev_hostname>  //Add exposed hostname for dev environment to access the dev prometheus metrics
        isDefault: false 
      - name: DIGIT-QA
        type: prometheus
        url: https://<QA_hostname>
        isDefault: false
      - name: DIGIT-UAT
        type: prometheus
        url: https://<UAT_hostname>
        isDefault: false
      - name: DIGIT-Staging
        type: prometheus
        url: https://<Staging_hostname>   
        isDefault: false

Deploy using go lang deployer

git clone https://github.com/egovernments/DIGIT-DevOps.git
cd DIGIT-DevOps/deploy-as-code/egov-deployer
go run main.go -e <environment_name> "grafana"

Deploy using Jenkin’s deployment job.

Step 4: DNS Mapping

To access the monitoring central dashboard with this https://central-dashboard.digit.org URL. Ensure you create the CANAME DNS record with the hostname and load balancer ID.

Kubernetes

RBAC Management

Role-based access control

Overview

Role-based access control (RBAC) regulates access to a computer or network resources based on the roles of individual users within your organization.

RBAC authorization uses the rbac.authorization.k8s.io API group to drive authorization decisions, allowing you to configure policies through the Kubernetes API dynamically.

API objects

The RBAC API declares four Kubernetes objects: Role, ClusterRole, RoleBinding and ClusterRoleBinding. You can , or amend them, using tools such as kubectl, just like any other Kubernetes object.

Caution: These objects, by design, impose access restrictions. If you are making changes to a cluster as you learn, see to understand how those restrictions can prevent you from making some changes.

Role & ClusterRole

An RBAC Role or ClusterRole contains rules that represent a set of permissions. Permissions are purely additive (there are no "deny" rules).

A Role always sets permissions within a particular ; when you create a Role, you have to specify the namespace it belongs in.

ClusterRole, by contrast, is a non-namespaced resource. The resources have different names (Role and ClusterRole) because a Kubernetes object always has to be either namespaced or not namespaced; it can't be both.

ClusterRoles have several uses. You can use a ClusterRole to:

define permissions on namespaced resources and be granted access within individual namespace(s)
define permissions on namespaced resources and be granted access across all namespaces
define permissions on cluster-scoped resources

If you want to define a role within a namespace, use a Role; if you want to define a role cluster-wide, use a ClusterRole.

A ClusterRole can be used to grant the same permissions as a Role. Because ClusterRoles are cluster-scoped, you can also use them to grant access to:

non-resource endpoints (like /healthz)
namespaced resources (like Pods), across all namespaces
For example: you can use a ClusterRole to allow a particular user to run kubectl get pods --all-namespaces

RoleBinding & ClusterRoleBinding

A role binding grants the permissions defined in a role to a user or set of users. It holds a list of subjects (users, groups, or service accounts), and a reference to the role being granted. A RoleBinding grants permissions within a specific namespace whereas a ClusterRoleBinding grants that access cluster-wide.

A RoleBinding may reference any Role in the same namespace. Alternatively, a RoleBinding can reference a ClusterRole and bind that ClusterRole to the namespace of the RoleBinding. If you want to bind a ClusterRole to all the namespaces in your cluster, you use a ClusterRoleBinding.

A RoleBinding can also reference a ClusterRole to grant the permissions defined in that ClusterRole to resources inside the RoleBinding's namespace. This kind of reference lets you define a set of common roles across your cluster, and then reuse them within multiple namespaces.

For instance, even though the following RoleBinding refers to a ClusterRole, "dave" (the subject, case sensitive) will only be able to read Secrets in the "development" namespace, because the RoleBinding's namespace (in its metadata) is "development".

Define the RBAC config in the Environment file

You must add a namespace to a role section to grant access to a group of a namespace.

DB Dump - Playground

This tutorial will walk you through How to take DB dump

Overview

On this page, you will find the steps on how to create a database dump.

Steps

To create a database dump, execute the dump command (given below) in the playground pod.
kubectl get pods -n playground
kubectl exec <playground-pod-name> -it -n playground bash
Use the below command to take a backup.
pg_dump -Fp --no-acl --no-owner --no-privileges -h <db-host> egov_db -U dbusername > backup.sql
gzip backup.sql.gz backup.sql
Copy the zip file to your local machine using the below command.
kubectl cp <playground-pod-name>:/backup.sql.gz backup.sql.gz -n playground

Setup Jenkins - Docker way

Jenkins for Build, Test and Deployment Automation

While we are adopting the Microservices architecture, it also demands to have an efficient CI/CD tools like jenkins. Along the cloud-native application developement and deployment jenkins can also be run cloud-native.

Since all processes, including software build, test and deployment, are performed every two or four weeks, this is an ideal playground for automation tools like Jenkins: After the developer commits a code change to the repository, Jenkins will detect this change and will trigger the build and test process. So Let's setup Jenkins as a docker container. Step-by-step.

Installing Jenkins the Docker Way

Tools and Versions used

VM or EC2 Instance or a Standalone on-premisis machin
Docker 1.12.1
Jenkins 2.32.2
Job DSL Plugin 1.58

Prerequisites:

Ubuntu or an Liniux Machine
Free RAM for the a VM/Machine >~ 4 GB.
Docker Host is available.
Tested with 3 vCPU (2 vCPU might work as well).

Step 1: Install a Docker on the Host you provisioned for jenkins and Connect to the Host via SSH

If you are using an host already has docker installed, you can skip this step. Make sure that your host has enough memory.

We will run Jenkins in a Docker container in order to allow for maximum interoperability. This way, we always can use the latest Jenkins version without the need to control the java version used.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but possible: here we will see an efficient way by using linux machine.

Prerequisites of this step:

I recommend to have direct access to the Internet: via Firewall, but without HTTP proxy.
Administration rights on you computer.

Step 2: Download Jenkins Image

This extra download step is optional, since the Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ sudo docker pull jenkins
Using default tag: latest
latest: Pulling from library/jenkins
Digest: sha256:8820149b54bfc5d05146b82150b5fdab583eef3e0499fb4ed630f77647a42942
Status: Image is up to date for jenkins:latest

The version of the downloaded Jenkins image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm jenkins --version
2.19.3

We are using version 2.9.13 currently. If you want to make sure that you use the exact same version as I have used in this blog, you can use the imagename jenkins:2.19.3 in all docker commands instead of jenkins only.

Note: The content of the jenkins image can be reviewed on this link. There, we find that the image has an entrypoint /bin/tini -- /usr/local/bin/jenkins.sh, which we could override with the --entrypoint bash option, if we wanted to start a bash shell in the jenkins image. However, in Step 3, we will keep the entrypoint for now.

Step 3: Start Jenkins in interactive Terminal Mode

In this step, we will run Jenkins interactively (with -it switch instead of -d switch) to better see, what is happening. But first, we check that the port we will use is free:

(dockerhost)$ sudo docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
0ec82b4ca2fd        google/cadvisor:latest   "/usr/bin/cadvisor -l"   2 days ago          Up 2 days           0.0.0.0:8080->8080/tcp                           cadvisor
...

Since we see that one of the standard ports of Jenkins (8080, 50000) is already occupied and I do not want to confuse the readers of this blog post by mapping the port to another host port, I just stop the cadvisor container for this „hello world“:

(dockerhost)$ sudo docker stop cadvisor
cadvisor

Jenkins will be in need of a persistent storage. For that, we create a new folder on the Docker host:

(dockerhost)$ mkdir jenkins_home; cd jenkins_home

Note: The content of the jenkins image can be reviewed on this link. There, we find that the image has an entrypoint /bin/tini -- /usr/local/bin/jenkins.sh, which we could override with the --entrypoint bash option, if we wanted to start a bash shell in the jenkins image.

We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:14 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Logging initialized @347ms
Nov 30, 2016 6:12:14 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING: Empty contextPath
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-9.2.z-SNAPSHOT
Nov 30, 2016 6:12:16 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started w.@7674f035{/,file:/var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started ServerConnector@548d708a{HTTP/1.1}{0.0.0.0:8080}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started @3258ms
Nov 30, 2016 6:12:17 PM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Nov 30, 2016 6:12:20 PM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 97 ms
Nov 30, 2016 6:12:20 PM org.jenkinsci.main.modules.sshd.SSHD start
INFO: Started SSHD at port 44955
Nov 30, 2016 6:12:21 PM jenkins.util.groovy.GroovyHookScript execute
INFO: Executing /var/jenkins_home/init.groovy.d/tcp-slave-agent-port.groovy
Nov 30, 2016 6:12:22 PM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf]: org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b: defining beans [authenticationManager]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7]: org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046: defining beans [filter,legacy]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM jenkins.install.SetupWizard init
INFO:

*************************************************************
*************************************************************
*************************************************************

Jenkins initial setup is required. An admin user has been created and a password generated.
Please use the following password to proceed to installation:

0c4a8413a47943ac935a4902e3b8167e

This may also be found at: /var/jenkins_home/secrets/initialAdminPassword

*************************************************************
*************************************************************
*************************************************************

Nov 30, 2016 6:12:27 PM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Nov 30, 2016 6:12:27 PM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 4: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container-port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080

The Jenkins login screen will open:

The admin password can be retrieved from the startup log, we have seen above (0c4a8413a47943ac935a4902e3b8167e), or we can find it by typing

(dockerhost: .../jenkins_home)$ cat secrets/initialAdminPassword
0c4a8413a47943ac935a4902e3b8167e

on the mapped jenkins_home folder on the Docker host.

Step 5: Install Plugins

Let us install the suggested plugins:

This may take a while to finish:

Step 6: Create an Admin User and log in

Then we reach a page, where we can create an Admin user:

Let us do so and save and finish.

Note: After this step, I have deleted the Jenkins container and started a new container attached to the same Jenkins Home directory. After that, all configuration and plugins were still available and we can delete containers after usage without loosing relevant information.

I have had a dinner break at this point. Maybe this is the reason I got following message when clicking the „Start using Jenkins“ button?

What ever. After clicking „retry“, we reach the login page:

Create a New Job

In the nex, we will create our first Jenkins job. I plan to trigger the Maven and/or Gradle build of a Java executable file upon detection of a code change.

Step 2: Install the Job DSL Plugin

The Job DSL Plugin can be installed like any other Jenkins plugin:

Step 3: Create Job DSL Jenkins Project

We create a Job DSL Job like follows:

Step 4: Configure Job DSL Project

-> if you have got a Github account, fork this open source Java Hello World software (originally created by of LableOrg) that will allow you to see, what happens with your Jenkins job, if you check in changed code. Moreover the hello world software allows you to perform JUnit 4 tests, run PowerMockito Mock services, run JUnit 4 Integration tests and calculate the code coverage using the tool Cobertura.

-> insert:

job('Job-DSL-Hello-World-Job') {
    scm {
        git('git://github.com/<org>/java-maven-junit-helloworld')
    }
    triggers {
        scm('H/15 * * * *')
    }
    steps {
        maven('-e clean test')
    }
}

here, exchange the username oveits by your own Github username.

Step 5: Prepare Maven Usage

Goto Jenkins -> Manage Jenkins -> Global Tool Configuration (available for Jenkins >2.0)

-> choose Version (3.3.9 in my case)

-> Add a name („Maven 3.3.9“ in my case)

Since we have checked „Install automatically“ above, I expect that it will be installed automatically on first usage.

Step 6: Prepare Git Usage

As described in this StackOverflow Q&A, we need to add the Git username and email address, since Jenkins tries to tag and commit on the Git repo, which requires those configuration items to be set. For that, we perform:

-> scroll down to „Git plugin“

Step 7: Create Jenkins Job from Code

Step 7.1 Build Project

Step 7.2 (optional): Check Console Output

Step 7.3: Review automatically built Project

This is showing a build failure, since I had not performed Step 5 and 6 before. In your case, it should be showing a success (in blue). If you are experiencing problems here, check out the Appendices below.

-> scroll down to Source Code Management

-> Scroll down to Build Triggers

-> Scroll down to Build

-> verify that „Maven 3.3.9“ is chosen as defined in Step 5

-> enter „-e clean test“ as Maven Goal

See, what happens by clicking on:

-> Build History

-> #nnn

If everything went fine, we will see many downloads and a „BUILD SUCCESS“:

Appendix A: Solve Git Problem: „tell me who you are“

Symptoms: Git Error: status code 128

In a new installation of Jenkins, Git does not seem to work out of the box. You can see this by choosing the Jenkins project Job-DSL-Hello-World-Job on the dashboard, then click „build now“, if the build was not already automatically triggered. Then:

-> Build History

-> Last Build (link works only, if Jenkins is running on localhost:8080 and you have chosen the same job name)

There, we will see:

Caused by: hudson.plugins.git.GitException: Command "git tag -a -f -m Jenkins Build #1 jenkins-Job-DSL-Hello-World-Job-1" returned status code 128:
stdout: 
stderr: 
*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: empty ident name (for <jenkins@61915398735e.(none)>) not allowed

Resolution:

Step 1: Enter Git Username and Email

As described in this StackOverflow Q&A: we can resolve this issue by either suppressing the git tagging, or (I think this is better) by adding your username and email address to git:

-> scroll down to „Git plugin“

Step 2: Re-run „Build Now“ on the Project

To test the new configuration, we go to

-> the Job-DSL-Hello-World-Job and press

Now, we should see a BUILD SUCCESS like follows:

-> Build History

-> #nnn

If everything went fine, we will a „BUILD SUCCESS“:

Appendix B: Maven Error: Cannot run program „mvn“

Symptoms:

When running a Maven Goal, the following error may appear on the Console log:

FATAL: command execution failed
java.io.IOException: Cannot run program "mvn" (in directory "/var/jenkins_home/workspace/Job-DSL-Hello-World-Job"): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at hudson.Proc$LocalProc.(Proc.java:245)
	at hudson.Proc$LocalProc.(Proc.java:214)
	at hudson.Launcher$LocalLauncher.launch(Launcher.java:846)
	at hudson.Launcher$ProcStarter.start(Launcher.java:384)
	at hudson.Launcher$ProcStarter.join(Launcher.java:395)
	at hudson.tasks.Maven.perform(Maven.java:367)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
	at hudson.model.Build$BuildExecution.build(Build.java:205)
	at hudson.model.Build$BuildExecution.doRun(Build.java:162)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
	at hudson.model.Run.execute(Run.java:1728)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:404)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 15 more
Build step 'Invoke top-level Maven targets' marked build as failure
Finished: FAILURE

Resolution:

Perform Step 5

and

For Test, you can test a manual: choose the correct Maven version, when configuring a Maven build step like in Step 7:

For our case, we need to correct the Job DSL like follows:

In the Script, we had defined the step:

    steps {
        maven('-e clean test')
    }

However, we need to define the Maven Installation like follows:

    steps {
        maven {
            mavenInstallation("Maven 3.3.9")
            goals('-e clean test')
        }
    }

Here, the mavenInstallation needs to specify the exact same name, as the one we have chosen in Step 5 above.

After correction, we will receive the correct Maven goal

Now, we can check the Maven configuration:

After scrolling down, we will see the correct Maven Version:

DONE

Appendix C: Updating Jenkins

Updating Jenkins (in my case: from 2.32.1 to 2.32.2) was as simple as following the steps below

Note: you might want to make a backup of your jenkins_home though. Just in case…

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /user/jenkins_home/
(dockerhost)$ docker pull jenkins # to update the jenkins image
(dockerhost)$ docker rm jenkins # to make shure the container named jenkins is removed
(dockerhost:jenkins_home)$ sudo docker run -d --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins

However, after that, some data was unreadable:

I have clicked

to resolve the issue (hopefully…). At least, after that, the warning was gone.

Appendix D: Job DSL Syntax

The reference for the Job DSL syntax can be found on the Job DSL Plugin API pages. As an example, the syntax of Maven within a Freestyle project can be found on this page found via the path

> freeStyleJob > steps > maven:

maven {

// Allows direct manipulation of the generated XML.
configure(Closure configureBlock)
// Specifies the goals to execute including other command line options.
goals(String goals)
// Skip injecting build variables as properties into the Maven process.
injectBuildVariables(boolean injectBuildVariables = true)
// Set to use isolated local Maven repositories.
localRepository(javaposse.jobdsl.dsl.helpers.LocalRepositoryLocation location)
// Specifies the Maven installation for executing this step.
mavenInstallation(String name)
// Specifies the JVM options needed when launching Maven as an external process.
mavenOpts(String mavenOpts)
// Adds properties for the Maven build.
properties(Map props)
// Adds a property for the Maven build.
property(String key, String value)
// Specifies the managed global Maven settings to be used.
providedGlobalSettings(String settingsIdOrName)
// Specifies the managed Maven settings to be used.
providedSettings(String settingsIdOrName)
// Specifies the path to the root POM.
rootPOM(String rootPOM)

A Maven example can be found on the same page:

job('example') {
    steps {
        maven('verify')
        maven('clean verify', 'module-a/pom.xml')
        maven {
            goals('clean')
            goals('verify')
            mavenOpts('-Xms256m')
            mavenOpts('-Xmx512m')
            localRepository(LocalRepositoryLocation.LOCAL_TO_WORKSPACE)
            properties(skipTests: true)
            mavenInstallation('Maven 3.1.1')
            providedSettings('central-mirror')
        }
    }
}

Summary

In this blog post, we have learned how to

Start and initialize Jenkins via Docker
Prepare the usage of Git and Maven
Install the Job DSL Plugin
Define a Jenkins Job via Groovy script
Create a Jenkins Job by a push of the „Build now“ button
Review and run the automatically created Jenkins job

We have seen that the usage of the Job DSL is no rocket science. The only topic, we had to take care, is, that Git and Maven need to be prepared for first usage on a Jenkins server.

GitOps

Git Client installation

Git can be installed in any operating systems like Windows, Linux and Mac. Most of the Mac and Linux machines, Git will be pre-installed.

GitHub is an open source tool which helps the developers to manage, store, track and control changes in their code. If we want to clone(copy) the data from GItHub we need to install Git.
There are some alternatives for GitHub like GitLab, Bitbucket. But many developers prefer GitHub because it's more popular and they are used to the navigation. So we are using Git in DIGIT
GitHub is used to create Individual projects.

Checking for Git:

To check whether Git is already installed in your systems, open in terminal.

If you are in Mac, look for the command prompt application called "Terminal".
If you are in Windows, open the windows command prompt or "Git Bash".
Type the below command:

git version

Installing Git on Linux:

Ubuntu:

In Ubuntu using terminal we can directly install Git using terminal.

Go to command prompt shell and run the following command to make sure everything is up-to-date.

sudo apt-get update

After that run the following command to install Git.

sudo apt-get install git-all

Once the command output has completed, verify the installation using

git version

Installing Git on Windows and Mac:

Go to the following page to download the Git latest version: For Windows: https://gitforwindows.org/ For Mac: https://sourceforge.net/projects/git-osx-installer/files/git-2.23.0-intel-universal-mavericks.dmg/download?use_mirror=autoselect
Once the installation is done, open the windows command prompt or Git Bash and type

git version

GitHub organization creation

Creating a GitHub account and an organization to provide access and permissions to a repository.

What is organization and why we are creating in GitHub:

An organization are shared accounts where businesses and open source projects can collaborate across many projects at once.There are three types accounts in GitHub

Personal accounts
Organization accounts
Enterprise accounts

Here the main reason for creating organization account is, accounts can be shared among unlimited number of people and they can collaborate across many projects at once.
Our organization name is eGovernments Foundation.

GitHub account creation:

Go to
After completing the process Your GitHub account will be created.
Click on Sign Up. Create your account by using email and password. Then add Username.
After completing the process Your GitHub account will be created.

Creating Organization in GitHub:

After setting up the GitHub account, we have to create an organization. Here we can add the data or code in the form of repository. Creating a repository, we will see this topic next.
Open Github and click on the "+" icon add top tight corner. You will see the option"new organization". click it.

click on "create a free organization"and enter your organization name you want to create with email and then '"next"

After Organization got created, you can see your organizations by clicking on "Accounts

Adding new SSH key to it

With SSH keys, you can connect to GitHub without supplying your username and personal access token at each visit. You can also use an SSH key to sign commits.

Adding new SSH key to your GitHub account:

Open Your "Command prompt" or "Terminal".
Type below commands to generate SSH key

Now a .ssh folder is created in your home directory. Go to that directory.

copy the SSH key which we get after running the above commands.

open GitHub and add this SSH key as shown below:
open Settings and go to SSH and GPG keys

Click on New SSH key and paste it. Click on Add SSH key.
If you want check the private key, use

GitHub repo creation

You can store a variety of projects in GitHub repositories, including open source projects. with open source projects you can share your code in repositories with others to track your work.

To create a new repository, click on + icon and New repository

Create your with repo with any name based on your code. Make it as public. Then anyone can able to see your code.
If you want to add a README file, click on add a README file. It is helpful to understand how does the code present in repo will be helpful.

Next click on create repository.

GitHub Team creation

In eGovernments Foundations we are having multiple number of Teams. we can create independent teams to manage repository permissions and mentions for groups of people.
Only organization owners and maintainers can create team. Owners can also restrict creation permissions for all teams in an organization.
First sign in to your organization github account.

Once you sign in to your account and if you open view organization you can able to see the above page.
Click on Teams. You will see the below image.

Now, click on the New team
Fill the details as shown in the below image:

After creating team, you will able to see the below image.

If you click on members.you can add members to your team by providing their github username or mail.
Now, you have successfully created GitHub team.

Enabling Branch protection:

You can create branch protection rule, such as requiring an approving review or passing status checks for all pull requests merged into the protected branch.

Creating a branch inside repository:

Go to the repository and click on new branch.

Here I have created a branch named DIGIT
After, go to that branch in the same repository.

Creating branch protection rule:

Branch protection rule states that, how to manage the branch restrictions/permissions in GitHub.

NOTE : You must have admin access orelse you have to be a codeowner to make these changes for branch restrictions/permissions.

Open https://github.com and choose any repository.Go to the main page. Click on settings.

Click on branches

If you click on the Edit rules you can able to see the rules which are applied for that branch.you should follow the rules when ever you are going to made any changes to that branch and pushing it.
If you want to create new branch protection rule click on Add Rule.

The common restrictions we are following to merge branches are :

1.Requires pull request

2.Requires approvals from CODE OWNERS

Only the CODE OWNERS can have access to merge and makes changes to these rules.

CODEOWNER Reviewers

In every branch of repository there will be a CODEOWNER file. The people inside the CODEOWNER file are responsible for code in repository.

People with admin or owner permissions can set up a CODEOWNERS file in a repository.
The people you choose as code owners must have write permissions for the repository.
When the code owner is a team, that team must be visible and it must have write permissions, even if all the individual members of the team already have write permissions directly, through organization membership, or through another team membership.
For every branch there will be a CODEOWNER file. Only they can able to write the code and able to merge the pull requests.

Create CODEOWNER file:

Go to any of your branch(DIGIT branch created previously) in a repository and click on new file and name it as CODEOWNERS

Click on "Create a new branch for this commit and start a pull request" and click on propose new file

Next click on Create pull request and then Merge pull request and confirm merge.

Add the GitHub Id's of all the team or people whom you want to add.

Fork (Fork the mdms,config repo with a tenant-specific branch)

A fork is a copy of a repository that you manage. Forks let you make changes to a project without affecting the original repository.

You can fetch updates from or submit changes to the original repository with pull requests
A fork often occurs when a developer becomes dissatisfied or disillusioned with the direction of a project and wants to detach their work from that of the original project.

Working with Kubernetes

Installation of Kubectl

Kubectl is a command line tool that you use to communicate with the Kubernetes API server.

Kubernetes also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.kubectl, allows you to run commands against Kubernetes clusters.
If you want to study about kubernetes in detail, open

Why kubectl is using in DIGIT?

There are some other tools like kubelet along with kubectl. kubectl is the command-line interface (CLI) tool for working with a Kubernetes cluster. Kubelet is the technology that applies, creates, updates, and destroys containers on a Kubernetes node.But the only difference is, using kubectl the developer can interacts with kubernetes cluster. So we are using kubectl in DIGIT.

Note: If you are using AWS as service to create cluster, You must use a kubectl version that is within one minor version difference of your Amazon EKS cluster control plane. For example, a 1.23 kubectl client works with Kubernetes 1.22, 1.23, and 1.24 clusters

To know kubernetes is installed:

To install or update kubectl:

In Windows:

Download the kubectl . or if you have curl installed use this command:

If you want to download kubectl desired version just replace the version in above command with your version name

To download curl follow the page and proceed the download with curl
Append or prepend the kubectl binary folder to your PATH environment variable. To perform this, complete the following steps:

1.Open the kubectl.exe folder in files and copy that folder.

2.Create a new folder in Local Disk(C:) with name Kube.

3. Paste the kubectl.exe folder there.
4. Open windows option Search for Advanced system settings

5. Click on Environmental variables and then System variables>Path>add and add your path name i.e c:\kube
6. Save it.

Once you install kubectl, you can verify its version with the following command:

To install kubectl in linux:

To install kubectl in MacOs:

Containerizing application using Docker

Infra provisioning using Terraform

Installation of Terraform

Terraform: Terraform is an open-source infrastructure as code software tool that enables you to safely and predictably create, change, and improve infrastructure.

what is Terraform is used for: Terraform is an IAC tool, used primarily by DevOps teams to automate various infrastructure tasks. The provisioning of cloud resources, for instance, is one of the main use cases of Terraform. It is a open-source provisioning tool written in the Go language and created by HashiCorp.

To install Terraform, use the following link to download the zip file.

As per our requirment we have to install a specific version which is 0.14.10.

Install the unzip.

Extract the downloaded file archive.

Move the executable into a directory searched for executables.

Run the below command to check whether the terraform is working.

Customization of existing tf templates

In this document we are customizing the sample-aws terraform template to setup the DIGIT infra in aws.

Pre-requisites:

IDE Code for better code/configuration editing capabilities
Install v0.14.10.
Install .

Customization

Clone the DIGIT-DevOps repo

Here we are using AWS cloud service provider to create terraform infra. So, we are choosing sample-aws module (Terraform module is a collection of standard configuration files in a dedicated directory).
Open sample-aws in visual studio using the below command.

In that sample-aws module we can find the below terraform templates

main.tf will contain the main set of configuration for your module.
outputs.tf will contain the output definitions for your module. Module outputs are made available to the configuration using the module, so they are often used to pass information about the parts of your infrastructure defined by the module to other parts of your configuration.
providers.tf allow terraform to interact with cloud providers,SAAS providers. In this sample-aws our provider is aws.
variables.tf will contain the variable definitions for your module. When your module is used by others, the variables will be configured as arguments in the module block. Since all Terraform values must be defined, any variables that are not given a default value will become required arguments. Variables with default values can also be provided as module arguments, overriding the default value.
To setup the DIGIT infra we made changes in variables.tf. Open variables.tf in visual studio using the below code.

Change the values in variables.tf which are specified to replace based on our requirements.For example: cluster_name, network_availability_zones, availability_zones, ssh_key_name, db_name, db_username.
After customizing the values in variables.tf configure the aws credentials using the below commands.

Provide AWS access key id,AWS secret access key,Default region and Default output format.

Set aws_session _token using the below command.

To make sure that aws credentials are configured use the below command.

The output should be similar to the below image.

After that run the below commands in the terminal one after another.

terraform init is used to initialize your code to download the requirements mentioned in your code.
terraform plan is used to review changes and choose whether to simply accept them or not.
terraform apply is used to accept changes and apply them against real infrastructure.
After successfully running these commands we are able to set up the infra in aws. We are able to see the config file which is used to deploy the environment.
Want to destroy the terraform use the below command.

Moving Docker Images

To move docker images from one container to another container.

Pre-requisites:

Install in your local machine.
account.

Procedure:

To move the existing docker images from one account to another account by changing tags.
First, we have to login to the docker account in which the images are present.

We need to pull the image from the docker container to local machine.

Next, we have to change the tag name to our required docker container tag

Now, we have our required images with tags in our local machine. We need to push these images from local machine to destination container. First, login to the destination account using the above docker login command and then push the image using below command.

Once successfully pushed, if you check in your docker hub account the images will be present.

Deployment using helm

Setup Jenkins - Docker way

Jenkins for Build, Test and Deployment Automation

Installing Jenkins the Docker Way

Tools and Versions used

VM or EC2 Instance or a Standalone on-premisis machin
Docker 1.12.1
Jenkins 2.32.2
Job DSL Plugin 1.58

Prerequisites:

Ubuntu or an Liniux Machine
Free RAM for the a VM/Machine >~ 4 GB.
Docker Host is available.
Tested with 3 vCPU (2 vCPU might work as well).

Step 1: Install a Docker on the Host you provisioned for jenkins and Connect to the Host via SSH

If you are using an host already has docker installed, you can skip this step. Make sure that your host has enough memory.

We will run Jenkins in a Docker container in order to allow for maximum interoperability. This way, we always can use the latest Jenkins version without the need to control the java version used.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but possible: here we will see an efficient way by using linux machine.

Prerequisites of this step:

I recommend to have direct access to the Internet: via Firewall, but without HTTP proxy.
Administration rights on you computer.

Step 2: Download Jenkins Image

This extra download step is optional, since the Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ sudo docker pull jenkins
Using default tag: latest
latest: Pulling from library/jenkins
Digest: sha256:8820149b54bfc5d05146b82150b5fdab583eef3e0499fb4ed630f77647a42942
Status: Image is up to date for jenkins:latest

The version of the downloaded Jenkins image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm jenkins --version
2.19.3

Note: The content of the jenkins image can be reviewed on this link. There, we find that the image has an entrypoint /bin/tini -- /usr/local/bin/jenkins.sh, which we could override with the --entrypoint bash option, if we wanted to start a bash shell in the jenkins image. However, in Step 3, we will keep the entrypoint for now.

Step 3: Start Jenkins in interactive Terminal Mode

In this step, we will run Jenkins interactively (with -it switch instead of -d switch) to better see, what is happening. But first, we check that the port we will use is free:

(dockerhost)$ sudo docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
0ec82b4ca2fd        google/cadvisor:latest   "/usr/bin/cadvisor -l"   2 days ago          Up 2 days           0.0.0.0:8080->8080/tcp                           cadvisor
...

(dockerhost)$ sudo docker stop cadvisor
cadvisor

Jenkins will be in need of a persistent storage. For that, we create a new folder on the Docker host:

(dockerhost)$ mkdir jenkins_home; cd jenkins_home

Note: The content of the jenkins image can be reviewed on this link. There, we find that the image has an entrypoint /bin/tini -- /usr/local/bin/jenkins.sh, which we could override with the --entrypoint bash option, if we wanted to start a bash shell in the jenkins image.

We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:14 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Logging initialized @347ms
Nov 30, 2016 6:12:14 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING: Empty contextPath
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-9.2.z-SNAPSHOT
Nov 30, 2016 6:12:16 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started w.@7674f035{/,file:/var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started ServerConnector@548d708a{HTTP/1.1}{0.0.0.0:8080}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started @3258ms
Nov 30, 2016 6:12:17 PM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Nov 30, 2016 6:12:20 PM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 97 ms
Nov 30, 2016 6:12:20 PM org.jenkinsci.main.modules.sshd.SSHD start
INFO: Started SSHD at port 44955
Nov 30, 2016 6:12:21 PM jenkins.util.groovy.GroovyHookScript execute
INFO: Executing /var/jenkins_home/init.groovy.d/tcp-slave-agent-port.groovy
Nov 30, 2016 6:12:22 PM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf]: org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b: defining beans [authenticationManager]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7]: org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046: defining beans [filter,legacy]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM jenkins.install.SetupWizard init
INFO:

*************************************************************
*************************************************************
*************************************************************

Jenkins initial setup is required. An admin user has been created and a password generated.
Please use the following password to proceed to installation:

0c4a8413a47943ac935a4902e3b8167e

This may also be found at: /var/jenkins_home/secrets/initialAdminPassword

*************************************************************
*************************************************************
*************************************************************

Nov 30, 2016 6:12:27 PM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Nov 30, 2016 6:12:27 PM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 4: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container-port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080

The Jenkins login screen will open:

The admin password can be retrieved from the startup log, we have seen above (0c4a8413a47943ac935a4902e3b8167e), or we can find it by typing

(dockerhost: .../jenkins_home)$ cat secrets/initialAdminPassword
0c4a8413a47943ac935a4902e3b8167e

on the mapped jenkins_home folder on the Docker host.

Step 5: Install Plugins

Let us install the suggested plugins:

This may take a while to finish:

Step 6: Create an Admin User and log in

Then we reach a page, where we can create an Admin user:

Let us do so and save and finish.

Note: After this step, I have deleted the Jenkins container and started a new container attached to the same Jenkins Home directory. After that, all configuration and plugins were still available and we can delete containers after usage without loosing relevant information.

I have had a dinner break at this point. Maybe this is the reason I got following message when clicking the „Start using Jenkins“ button?

What ever. After clicking „retry“, we reach the login page:

Create a New Job

In the nex, we will create our first Jenkins job. I plan to trigger the Maven and/or Gradle build of a Java executable file upon detection of a code change.

Step 2: Install the Job DSL Plugin

The Job DSL Plugin can be installed like any other Jenkins plugin:

-> -> ->

-> (with dash between job and dsl; wait for the filter to become active and do not press enter, otherwise you will get an error message)

Step 3: Create Job DSL Jenkins Project

We create a Job DSL Job like follows:

Step 4: Configure Job DSL Project

-> insert:

job('Job-DSL-Hello-World-Job') {
    scm {
        git('git://github.com/<org>/java-maven-junit-helloworld')
    }
    triggers {
        scm('H/15 * * * *')
    }
    steps {
        maven('-e clean test')
    }
}

here, exchange the username oveits by your own Github username.

Step 5: Prepare Maven Usage

Goto Jenkins -> Manage Jenkins -> Global Tool Configuration (available for Jenkins >2.0)

Scroll down to Maven ->

-> choose Version (3.3.9 in my case)

-> Add a name („Maven 3.3.9“ in my case)

Since we have checked „Install automatically“ above, I expect that it will be installed automatically on first usage.

Step 6: Prepare Git Usage

-> scroll down to „Git plugin“

Step 7: Create Jenkins Job from Code

Step 7.1 Build Project

Step 7.2 (optional): Check Console Output

Step 7.3: Review automatically built Project

-> scroll down to Source Code Management

-> Scroll down to Build Triggers

-> Scroll down to Build

-> verify that „Maven 3.3.9“ is chosen as defined in Step 5

-> enter „-e clean test“ as Maven Goal

See, what happens by clicking on:

-> Build History

-> #nnn

If everything went fine, we will see many downloads and a „BUILD SUCCESS“:

Appendix A: Solve Git Problem: „tell me who you are“

Symptoms: Git Error: status code 128

-> Build History

-> Last Build (link works only, if Jenkins is running on localhost:8080 and you have chosen the same job name)

There, we will see:

Caused by: hudson.plugins.git.GitException: Command "git tag -a -f -m Jenkins Build #1 jenkins-Job-DSL-Hello-World-Job-1" returned status code 128:
stdout: 
stderr: 
*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: empty ident name (for <jenkins@61915398735e.(none)>) not allowed

Resolution:

Step 1: Enter Git Username and Email

As described in this StackOverflow Q&A: we can resolve this issue by either suppressing the git tagging, or (I think this is better) by adding your username and email address to git:

-> scroll down to „Git plugin“

Step 2: Re-run „Build Now“ on the Project

To test the new configuration, we go to

-> the Job-DSL-Hello-World-Job and press

Now, we should see a BUILD SUCCESS like follows:

-> Build History

-> #nnn

If everything went fine, we will a „BUILD SUCCESS“:

Appendix B: Maven Error: Cannot run program „mvn“

Symptoms:

When running a Maven Goal, the following error may appear on the Console log:

FATAL: command execution failed
java.io.IOException: Cannot run program "mvn" (in directory "/var/jenkins_home/workspace/Job-DSL-Hello-World-Job"): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at hudson.Proc$LocalProc.(Proc.java:245)
	at hudson.Proc$LocalProc.(Proc.java:214)
	at hudson.Launcher$LocalLauncher.launch(Launcher.java:846)
	at hudson.Launcher$ProcStarter.start(Launcher.java:384)
	at hudson.Launcher$ProcStarter.join(Launcher.java:395)
	at hudson.tasks.Maven.perform(Maven.java:367)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
	at hudson.model.Build$BuildExecution.build(Build.java:205)
	at hudson.model.Build$BuildExecution.doRun(Build.java:162)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
	at hudson.model.Run.execute(Run.java:1728)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:404)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 15 more
Build step 'Invoke top-level Maven targets' marked build as failure
Finished: FAILURE

Resolution:

Perform Step 5

and

For Test, you can test a manual: choose the correct Maven version, when configuring a Maven build step like in Step 7:

and verify that does not throw the Maven error anymore.

For our case, we need to correct the Job DSL like follows:

In the Script, we had defined the step:

    steps {
        maven('-e clean test')
    }

However, we need to define the Maven Installation like follows:

    steps {
        maven {
            mavenInstallation("Maven 3.3.9")
            goals('-e clean test')
        }
    }

Here, the mavenInstallation needs to specify the exact same name, as the one we have chosen in Step 5 above.

After correction, we will receive the correct Maven goal

-> JobDSL ->

Now, we can check the Maven configuration:

After scrolling down, we will see the correct Maven Version:

DONE

Appendix C: Updating Jenkins

Updating Jenkins (in my case: from 2.32.1 to 2.32.2) was as simple as following the steps below

Note: you might want to make a backup of your jenkins_home though. Just in case…

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /user/jenkins_home/
(dockerhost)$ docker pull jenkins # to update the jenkins image
(dockerhost)$ docker rm jenkins # to make shure the container named jenkins is removed
(dockerhost:jenkins_home)$ sudo docker run -d --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins

However, after that, some data was unreadable:

I have clicked

-> Manage Jenkins -> Manage ->

to resolve the issue (hopefully…). At least, after that, the warning was gone.

Appendix D: Job DSL Syntax

The reference for the Job DSL syntax can be found on the Job DSL Plugin API pages. As an example, the syntax of Maven within a Freestyle project can be found on this page found via the path

> freeStyleJob > steps > maven:

maven {

// Allows direct manipulation of the generated XML.
configure(Closure configureBlock)
// Specifies the goals to execute including other command line options.
goals(String goals)
// Skip injecting build variables as properties into the Maven process.
injectBuildVariables(boolean injectBuildVariables = true)
// Set to use isolated local Maven repositories.
localRepository(javaposse.jobdsl.dsl.helpers.LocalRepositoryLocation location)
// Specifies the Maven installation for executing this step.
mavenInstallation(String name)
// Specifies the JVM options needed when launching Maven as an external process.
mavenOpts(String mavenOpts)
// Adds properties for the Maven build.
properties(Map props)
// Adds a property for the Maven build.
property(String key, String value)
// Specifies the managed global Maven settings to be used.
providedGlobalSettings(String settingsIdOrName)
// Specifies the managed Maven settings to be used.
providedSettings(String settingsIdOrName)
// Specifies the path to the root POM.
rootPOM(String rootPOM)

A Maven example can be found on the same page:

job('example') {
    steps {
        maven('verify')
        maven('clean verify', 'module-a/pom.xml')
        maven {
            goals('clean')
            goals('verify')
            mavenOpts('-Xms256m')
            mavenOpts('-Xmx512m')
            localRepository(LocalRepositoryLocation.LOCAL_TO_WORKSPACE)
            properties(skipTests: true)
            mavenInstallation('Maven 3.1.1')
            providedSettings('central-mirror')
        }
    }
}

Summary

In this blog post, we have learned how to

Start and initialize Jenkins via Docker
Prepare the usage of Git and Maven
Install the Job DSL Plugin
Define a Jenkins Job via Groovy script
Create a Jenkins Job by a push of the „Build now“ button
Review and run the automatically created Jenkins job

We have seen that the usage of the Job DSL is no rocket science. The only topic, we had to take care, is, that Git and Maven need to be prepared for first usage on a Jenkins server.

DIGIT - Infra Overview

Operational Guidelines & Security Standards

Objective

DIGIT Infrastructure Overview

Below are the useful links to understand Kubernetes:

DIGIT On Kubernetes - High-Level Deployment Diagram

DIGIT Infra Specification On SDC or NIC Or Any Commercial Cloud

Systems

Specification

Spec/Count

Comment

Operational Recommendations

Monitoring Tools Recommendations

Key Standard Operating Procedures (SOPs)

Segregation of duties and responsibilities.
SME and SPOCs for L1.5 support along with the SLAs defined.
Ticketing system to manage incidents, converge and collaborate on various operational issues.
Monitoring dashboards at various levels like Infrastructure, networks and applications.
Transparency of monitoring data and collaboration between teams.
Periodic remote sync-up meetings, acceptance and attendance to the meeting.
Ability to see stakeholders' availability of calendar time to schedule meetings.
Periodic (weekly, monthly) summary reports of the various infra, operations incident categories.
Communication channels and synchronization regularly and also upon critical issues, changes, upgrades, releases etc.

Segregation Of Duties

State program team - Refers to the owner for the whole DIGIT implementation, application rollouts, and capacity building. Responsible for identifying and synchronizing the operating mechanism between the below teams.
Implementation partner - Refers to the DIGIT Implementation, application performance monitoring for errors, logs scrutiny, TPS on peak load, distributed tracing, DB queries analysis, etc.
Operations team - this team could be an extension of the implementation team that is responsible for DIGIT deployments, configurations, CI/CD, change management, traffic monitoring and alerting, log monitoring and dashboard, application security, DB Backups, application uptime, etc.

Skills To Setup, Operate & Maintain DIGIT On SDC

Tools/Skills

Specification

Weightage (1-5)

Yes/No

Resource Requirement

Team

Roles

Responsibility

DIGIT - Security Standards & Operational Recommendations

DIGIT - Key Security Principles

Subscribe to the DIGIT applicable OWASP top 10 standard across various security layers.

Minimize attack surface area
Implement a strong identity foundation - Who accesses what and who does what.
Apply security at all possible layers
Automate security best practices
Separation of duties (SoD).
The principle of Least privilege (PoLP)
Templatized design - (Code, Images, Infra-as-code, Deploy-as-code, Conf-as-code, etc)
Align with MeiTY Standards to meet SDC Infra policies.

Security Layers & Line of Control

Security Layers

Line Of Controls

Application Layer

Key Security Measures

To prevent this:

We should set a limit on how many requests in a given period can be sent to each API.
If the number exceeds the limit, block access from a particular API, at least for some reasonable interval.
Also, make sure to analyze the payload for threats.
The incoming calls from a gateway API would also have to be rate-limited.
Add filters to the router to drop packets from suspicious sources.

How to prevent:

Applications must validate data input to the web application from user browsers.
All output from the web application to user browsers must be encoded.
Users must have the option to disable client-side scripts.

How to prevent:

By including a challenge token with each request, the developer can ensure that the request is valid and not coming from a source other than the user.

How to prevent:

Using parameterized queries which specify placeholders for parameters so that the database will always treat them as data rather than part of an SQL command. Prepared statements and object-relational mappers (ORMs) make this easy for developers.
Remediate SQLi vulnerabilities in legacy systems by escaping inputs before adding them to the query. Use this technique only where prepared statements or similar facilities are unavailable.
Mitigate the impact of SQLi vulnerabilities by enforcing the least privilege on the database. Ensure that each application has its database credentials and that these credentials have the minimum rights the application needs.

Security In The Code

What defines a security in the code:

Libraries/Containers

Data Security

Data Security involves putting in place specific controls, standard policies, and procedures to protect data from a range of issues, including:

Enforced encryption: Encrypt, manage and secure data by safeguarding it in transit. Password-based, easy to use and very efficient.
Unauthorized access: Blocking unauthorized access plays a central role in preventing data breaches. Implementing Strong Password Policy and MFA.
Accidental loss: All data should be backed up. In the event of hardware or software failure, breach, or any other error to data; a backup allows it to continue with minimal interruption. Storing the files elsewhere can also quickly determine how much data was lost and/or corrupted.
Destruction: Endpoint Detection and Response (EDR) – provides visibility and defensive measures on the endpoint itself, when attacks occur on endpoint devices this can eliminate gaining access systems and avoid destruction of the data.

Infra/Cloud

Strong stance on authentication and authorization
Role-Based Access Control (RBAC)
Kubernetes infrastructure vulnerability scanning
Hunting misplaced secrets
Workload hardening from Pod Security to network policies
Ingress Controllers for security best practices
Constantly watch your Kubernetes deployments
Find deviations from desired baselines
Should alert or deny on policy violation
Block/Whitelist (IP or DNS) connections before entering the workloads.
Templatize the deployment/secrets configs and serve as config-as-code.

Network Security

Container Groups: Coupled communication between grouped containers, is achieved inside the Pod that contains one or more containers.
Communication between Pods: Pods are the smallest unit of deployment in Kubernetes. A Pod can be scheduled on one of the many nodes in a cluster and has a unique IP address. Kubernetes places certain requirements on communication between Pods when the network has not been intentionally segmented. These requirements include:
Containers should be able to communicate with other Pods without using network address translation (NAT).
All the nodes in the cluster should be able to communicate with all the containers in the cluster.
The IP address assigned to a container should be the same that is visible to other entities communicating with the container.
Pods and Services: Since Pods are ephemeral in nature, an abstraction called a Service provides a long-lived virtual IP address that is tied to the service locator (e.g., a DNS name). Traffic destined for that service VIP is then redirected to one of the Pods and offers the service using that specific Pod’s IP address as the destination.
Traffic Direction: Traffic is directed to Pods and services in the cluster via multiple mechanisms. The most common is via an ingress controller, which exposes one or more service VIPs to the external network. Other mechanisms include node ports and even publicly-addressed Pods.

Operational Security

Implement precise change management processes: All changes should be logged and controlled so they can be monitored and audited.
Restrict access to network devices using AAA authentication: a “need-to-know” is a rule of thumb regarding access and sharing of information.
Least Privilege (PoLP): Give the minimum access necessary to perform their jobs.
Implement dual control: Those who work on the tasks are not the same people in charge of security.
Automate tasks: reduce the need for human intervention. Humans are the weakest link in any organization’s operational security initiatives because they make mistakes, overlook details, forget things, and bypass processes.

Incident response and disaster recovery planning: are always crucial components of a sound security posture, we must have a plan to identify risks, respond to them, and mitigate potential damages.

Elastic Search Rolling Upgrade

Overview

This page provides comprehensive documentation and instructions for implementing a rolling upgrade strategy for your Elasticsearch cluster.

Steps

Note: During the rolling upgrade, it is anticipated that there will be some downtime. Additionally, ensure to take an elasticdump of the Elasticsearch data using the script provided below in the playground pod.

Copy the below script and save it as es-dump.sh. Replace the elasticsearch URL and the indices names in the script.

#!/bin/bash
#es-dump.sh

#  Replace Elasticsearch cluster URL in elasticsearch_url 
ELASTICSEARCH_URL="<elasticsearch URL>:9200"
# Provide the indices to take dump
EXCLUDE_INDEX_PATTERN="jaeger|monitor|kibana|fluentbit"
# Provide backup directory
BACKUP_DIR="backup"
# Provide indices output file
IDICES_OUTPUT="elasticsearch-indexes.txt"

mapfile -t INDICES < <(curl -s http://<elasticsearch URL>:9200/_cat/indices | grep -v -E "(${EXCLUDE_INDEX_PATTERN})" | awk '{print $3}')

printf "%s\n" "${INDICES[@]}" > $IDICES_OUTPUT

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Loop through each index and perform export
for INDEX in "${INDICES[@]}"; do
    OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_mapping_backup.json"

    # Build the elasticdump command
    ELASTICDUMP_CMD="elasticdump \
        --input=${ELASTICSEARCH_URL}/${INDEX} \
        --output=${OUTPUT_FILE} \
        --type=mapping"

    # Execute the elasticdump command
    $ELASTICDUMP_CMD

    # Check if the elasticdump command was successful
    if [ $? -eq 0 ]; then
        echo "Backup of index ${INDEX} mapping completed successfully."
    else
        echo "Error backing up index ${INDEX}."
    fi
done

for INDEX in "${INDICES[@]}"; do
    OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_data_backup.json"

    # Build the elasticdump command
    ELASTICDUMP_CMD="elasticdump \
        --input=${ELASTICSEARCH_URL}/${INDEX} \
        --output=${OUTPUT_FILE} \
        --type=data
        --timeout=300000
        --limit 10000
        --skip-existing"

    # Execute the elasticdump command
    $ELASTICDUMP_CMD

    # Check if the elasticdump command was successful
    if [ $? -eq 0 ]; then
        echo "Backup of index ${INDEX} completed successfully."
    else
        echo "Error backing up index ${INDEX}."
    fi
done

Run the below commands in the terminal.

export KUBECONFIG=<path_to_your_kubeconfig>
kubectl get pods -n playground
kubectl cp <path_to_script_in_your_machine>/es-dump.sh playground/<playground_name>:<path_in_playground_pod>/es-dump.sh

Now, run the below command inside the playground pod.

# Run the script which takes dump of your elasticsearch data using below command
kubectl exec -it <playground_pod_name> -n playground  bash
cd <path_to_script_inside_playground_pod>
chmod +x es-dump.sh
./es-dump.sh

# When playground pod restarts the data will be lost. So, to store data in your local machine run below command
 kubectl cp playground/<playground_pod_name>:/backup <path_to_store_in_local>/backup

Rolling upgrade from v6.6.2 to v7.17.15

Steps

List the elasticsearch pods and enter into any of the elasticsearch pod shells.

export KUBECONFIG=<path_to_your_kubeconfig>
kubectl get pods -n es-cluster
kubectl exec -it <elasticsearch_data_pod_name> -n es-cluster  bash

Disable shard allocation: You can avoid racing the clock by disabling the allocation of replicas before shutting down data nodes. Stop non-essential indexing and perform a synced flush: While you can continue indexing during the upgrade, shard recovery is much faster if you temporarily stop non-essential indexing and perform a synced-flush. Run the below curls inside elasticsearch data pod.

# Replace elasticsearch url
curl -X PUT "<elasticsearch_url>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
'

curl -X POST "<elasticsearch_url>:9200/_flush/synced?pretty"

Scale down the replica count of elasticsearch master and data from 3 to 0.

kubectl get statefulsets -n es-cluster
kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=0
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=0

Edit the Statefulset of elasticsearch master by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 6.6.2 to 7.17.15. The below code provides the depraced environment variables and compatible environment variables.

# Depricated environment variables
- env:
  - name: discovery.zen.minimum_master_nodes
    value: "2"
  - name: discovery.zen.ping.unicast.hosts
    value: elasticsearch-master-v1
  - name: node.data
    value: "false"
  - name: node.ingest
    value: "false"
  - name: node.master
    value: "true"
  - name: gateway.expected_master_nodes
    value: "2"
  - name: gateway.expected_data_nodes
    value: "1"
  - name: gateway.recover_after_time
    value: 5m
  - name: gateway.recover_after_master_nodes
    value: "2"
  - name: gateway.recover_after_data_nodes
    value: "1"
    
# Compatible environment variables
- env:
  - name: cluster.initial_master_nodes
    value: elasticsearch-master-v1-0,elasticsearch-master-v1-1,elasticsearch-master-v1-2 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: master

Edit elasticsearch-master values.yaml file

# values.yaml

ClusterName: "elasticsearch"
nodeGroup: master-v1

Edit the Statefulset of elasticsearch data by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 6.6.2 to 7.17.15.

# Depricated environment variables
- env:
  - name: discovery.zen.ping.unicast.hosts
    value: elasticsearch-master-v1
  - name: node.data
    value: "true"
  - name: node.ingest
    value: "true"
  - name: node.master
    value: "false"
  - name: gateway.expected_master_nodes
    value: "2"
  - name: gateway.expected_data_nodes
    value: "1"
  - name: gateway.recover_after_time
    value: 5m
  - name: gateway.recover_after_master_nodes
    value: "2"
  - name: gateway.recover_after_data_nodes
    value: "1"
  - name: ingest.geoip.downloader.enabled
    value: "false"
    
# Compatible environment variables
- env: 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: data,ingest

Edit elasticsearch-data values.yaml file.

# values.yaml

ClusterName: "elasticsearch"
nodeGroup: "data-v1"

After making the changes, scale up the statefulsets of elasticsearch data and master.

kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=3
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=3

After all pods are in running state, re-enable shard allocation and check cluster health.

# Enter into elasticsearch pod
kubectl exec -it <elasticsearch_data_pod_name> -n es-cluster  bash

#Run below curl commands
curl -X PUT "<elasticsearch_url>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
'

curl -X GET "<elasticsearch_url>:9200/_cat/health?v=true&pretty"

You have successfully upgraded the elasticsearch cluster from v6.6.2 to v7.17.15 :)

ReIndexing the Indices:

After successfully upgrading the elasticsearch, reindex the indices present in elasticsearch using below script which are created in v6.6.2 or earlier.

Copy the below script and save it as es-reindex.sh. Replace the elasticsearch URL in the script.

#!/bin/bash

ELASTICSEARCH_URL="<Elasticsearch URL>:9200"
TMP="_tmp"

FILENAME="elasticsearch-indexes.txt"
INDICES=()
while IFS= read -r index; do
    INDICES+=("$index")
done < "$FILENAME"

# do for all abc elastic indices
for INDEX in "${INDICES[@]}"; do
    sleep 5
    echo -e "Reindex process starting for index: $INDEX\n"
    tmp_index=$INDEX${TMP}
    echo "Starting reindexing elastic data from original index:$INDEX to temporary index:$tmp_index"
    output=$(curl -X POST "${ELASTICSEARCH_URL}/_reindex" --max-time 3600 -H 'Content-Type: application/json' -d'
    {
      "source": {
        "index": "'"$INDEX"'"
      },
      "dest": {
        "index": "'"$tmp_index"'"
      }
    }
    ')
    sleep 5
    echo -e "Reindexing completed from original index:$INDEX to temporary index:$tmp_index with output: $output\n"
    echo -e "Deleting $INDEX\n"
    output=$(curl -X DELETE "${ELASTICSEARCH_URL}/$INDEX")
    echo -e "$INDEX deleted with status: $output\n"
    echo "Starting reindexing elastic data from temporary index:$tmp_index to original index:$INDEX"
    output=$(curl -X POST "${ELASTICSEARCH_URL}/_reindex" --max-time 3600 -H 'Content-Type: application/json' -d'
    {
      "source": {
        "index": "'"$tmp_index"'"
      },
      "dest": {
        "index": "'"$INDEX"'"
      }
    }
    ')
    echo -e "Reindexing completed from temporary index:$tmp_index to original index:$INDEX with output: $output\n"
    echo -e "Deleting $tmp_index\n"
    output=$(curl -X DELETE "${ELASTICSEARCH_URL}/$tmp_index")
    echo -e "$tmp_index deleted with status: $output\n\n\n"
done

Run the below commands in the terminal.

export KUBECONFIG=<path_to_your_kubeconfig>
kubectl get pods -n playground
kubectl cp <path_to_script_in_your_machine>/es-reindex.sh playground/<playground_name>:<path_in_playground_pod>/es-dump.sh

Now, run the below command inside the playground pod.

# Run the script which reinex the indicesc of your elasticsearch data using below command
kubectl exec -it <playground_pod_name> -n playground  bash
cd <path_to_script_inside_playground_pod>
chmod +x es-reindex.sh
./es-reindex.sh

NOTE: Make Sure to delete jaeger indices as mapping is not supported in v8.11.3 and the indices which are created before v7.17.15 by reindexing. If the indices which are created in v6.6.2 or earlier are present then the upgradation from v7.17.15 to v8.11.3 may fail.

Rolling upgrade from v7.17.15 to v8.11.3 & security is disabled

Steps

List the elasticsearch pods and enter into any of the elasticsearch pod shells.

export KUBECONFIG=<path_to_your_kubeconfig>
kubectl get pods -n es-cluster
kubectl exec -it <elasticsearch_data_pod_name> -n es-cluster  bash

Disable shard allocation: You can avoid racing the clock by disabling the allocation of replicas before shutting down data nodes. Stop non-essential indexing and perform a synced flush: While you can continue indexing during the upgrade, shard recovery is much faster if you temporarily stop non-essential indexing and perform a synced-flush. Run the below curls inside elasticsearch data pod.

# Replace elasticsearch url
curl -X PUT "<elasticsearch_url>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
'

curl -X POST "<elasticsearch_url>:9200/_flush/synced?pretty"

Scale down the replica count of elasticsearch master and data from 3 to 0.

kubectl get statefulsets -n es-cluster
kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=0
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=0

Edit the Statefulset of elasticsearch master by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 7.17.15 to 8.11.3. The below code provides the compatible environment variables and if you are following a rolling upgrade then there are no deprecated environment variables from v7.17.15 to v8.11.3.

# Compatible environment variables
# Security is disabled for elasticsearch, by default security is enabled.
- env:
  - name: cluster.initial_master_nodes
    value: elasticsearch-master-v1-0,elasticsearch-master-v1-1,elasticsearch-master-v1-2 
  - name: xpack.security.enabled
    value: false 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: master

Edit the Statefulset of elasticsearch data by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 7.17.15 to 8.11.3.

# Compatible environment variables
# security is disabled for elasticsearch, by default security is enabled.
- env:
  - name: cluster.initial_master_nodes
    value: elasticsearch-master-v1-0,elasticsearch-master-v1-1,elasticsearch-master-v1-2 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: data,ingest
  - name: xpack.security.enabled
    value: false

After making the changes, scale up the statefulsets of elasticsearch data and master.

kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=3
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=3

After all pods are in running state, re-enable shard allocation and check cluster health.

# Enter into elasticsearch pod
kubectl exec -it <elasticsearch_data_pod_name> -n es-cluster  bash

#Run below curl commands
curl -X PUT "<elasticsearch_url>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
'

curl -X GET "<elasticsearch_url>:9200/_cat/health?v=true&pretty"

Tracing

Pre-reads

This doc covers the steps on how to deploy an OpenTelemetry collector on Kubernetes. We will then use an OTEL instrumented (Go) application provided by OpenTelemetry to send traces to the Collector. From there, we will bring the trace data to a Jaeger collector. Finally, the traces will be visualised using the Jaeger UI.

This image shows the flow between the application, OpenTelemetry collector and Jaeger.

This OpenTelemetry repository provides a complete demo on how you can deploy OpenTelemetry on Kubernetes, we can use this as a starting point.

Pre-requisites

To start off, we need a Kubernetes cluster you can use any of your existing Kubernetes clusters that has got the apx 2vCPUs, 4GB RAM, and 100GB Storage.

Local Kubernetes Cluster Setup

Skip this in case you have the existing cluster.

In case, you don't have the ready Kubernetes but you have a good local machine with at least 4GB RAM left, you can use a local instance of Kind. The application will access this Kubernetes cluster through a NodePort (on port 30080). So make sure this port is free.

conn, err := grpc.DialContext(ctx, "localhost:30080", grpc.WithTransportCredentials(insecure.NewCredentials()), grpc.WithBlock())

To use NodePort with Kind, we need to first enable it.

Extra port mappings can be used to port forward to the kind nodes. This is a cross-platform option to get traffic into your kind cluster.

vim kind-config.yaml

kind: ClusterapiVersion: kind.x-k8s.io/v1alpha4nodes:- role: control-plane  # port forward 30080 on the host to 30080 on this node  extraPortMappings:  - containerPort: 30080    hostPort: 30080- role: worker

Create the cluster with: kind create cluster --config kind-config.yaml

Creating cluster "kind" ... ✓ Ensuring node image (kindest/node:v1.24.0) 🖼 ✓ Preparing nodes 📦 📦 ✓ Writing configuration 📜 ✓ Starting control-plane 🕹️ ✓ Installing CNI 🔌 ✓ Installing StorageClass 💾 ✓ Joining worker nodes 🚜Set kubectl context to "kind-kind"You can now use your cluster with:kubectl cluster-info --context kind-kindThanks for using kind! 😊

Once our Kubernetes cluster is up, we can start deploying Jaeger.

What is Jaeger?

Jaeger is an open-source distributed tracing system for tracing transactions between distributed services. It’s used for monitoring and troubleshooting complex microservices environments. By doing this, we can view traces and analyse the application’s behaviour.

Why do we need it?

Using a tracing system (like Jaeger) is especially important in microservices environments since they are considered a lot more difficult to debug than a single monolithic application.

Problems that Jaeger addresses?

Distributed tracing monitoring
Performance and latency optimisation
Root cause analysis
Service dependency analysis

Deploy Jaeger

To deploy Jaeger on the Kubernetes cluster, we can make use of the Jaeger operator.

Operators are pieces of software that ease the operational complexity of running another piece of software.

Deploy Jaeger Operator

You first install the Jaeger Operator on Kubernetes. This operator will then watch for new Jaeger custom resources (CR).

There are different ways of installing the Jaeger Operator on Kubernetes:

using Helm
using Deployment files

Before you start, pay attention to the Prerequisite section.
Since version 1.31 the Jaeger Operator uses webhooks to validate Jaeger custom resources (CRs). This requires an installed version of the cert-manager.

Installing Cert-Manager

cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes and OpenShift workloads. It will obtain certificates from a variety of Issuers, both popular public Issuers as well as private Issuers, and ensure the certificates are valid and up-to-date, and will attempt to renew certificates at a configured time before expiry.

Installation of cert-manager of is very simple, just run:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml

By default, cert-manager will be installed into the cert-manager namespace.

You can verify the installation by following the instructions here

With cert-manager installed, let’s continue with the deployment of Jaeger

Installing Jaeger Operator Using Helm

Jump over to Artifact Hub and search for jaeger-operator

Add the Jaeger Tracing Helm repository:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts

To install the chart with the release name my-release (in the default namespace)

helm install my-release jaegertracing/jaeger-operator

You can also install a specific version of the helm chart:

helm install my-release jaegertracing/jaeger-operator --version 2.25.0

Verify that it’s installed on Kubernetes:

helm list -A

You can also deploy the Jaeger operator using deployment files.
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.36.0/jaeger-operator.yaml

At this point, there should be a jaeger-operator deployment available.

kubectl get deployment my-jaeger-operator

NAME                 READY   UP-TO-DATE   AVAILABLE   AGEmy-jaeger-operator   1/1     1            1           2m58s

The operator is now ready to create Jaeger instances.

Deploy Jaeger All-in-One

The operator that we just installed doesn’t do anything itself, it just means that we can create jaeger resources/instances that we want the jaeger operator to manage.

The simplest possible way to create a Jaeger instance is by deploying the All-in-one strategy, which installs the all-in-one image, and includes the agents, collector, query and the Jaeger UI in a single pod using in-memory storage.

Create a yaml file like the following. The name of the Jaeger instance will be simplest

vim simplest.yaml

apiVersion: jaegertracing.io/v1kind: Jaegermetadata:  name: simplest

kubectl apply -f simplest.yaml

After a little while, a new in-memory all-in-one instance of Jaeger will be available, suitable for quick demos and development purposes.

When the Jaeger instance is up and running, we can check the pods and services.

kubectl get pods

NAME                     READY STATUS    RESTARTS   AGEsimplest-656d7cf5c8-lff7b 1/1  Running   0          3m55s

kubectl get services

To get the pod name, query for the pods belonging to the simplest Jaeger instance:

Query the logs from the pod:

kubectl logs -l app.kubernetes.io/instance=simplest

{"level":"info","ts":1660155049.86027,"caller":"channelz/logging.go:50","msg":"[core]Channel Connectivity change to READY","system":"grpc","grpc_log":true}{"level":"info","ts":1660155049.8612773,"caller":"grpc/builder.go:120","msg":"Agent collector connection state change","dialTarget":":14250","status":"READY"}{"level":"info","ts":1660155049.8617437,"caller":"app/server.go:241","msg":"Starting HTTP server","port":16686,"addr":":16686"}{"level":"info","ts":1660155049.8621716,"caller":"app/server.go:260","msg":"Starting GRPC server","port":16685,"addr":":16685"}

Let’s open the Jaeger UI

Use port-forwarding to access the Jaeger UI

kubectl port-forward svc/simplest-query 16686:16686

Forwarding from 127.0.0.1:16686 -> 16686Forwarding from [::1]:16686 -> 16686

Jaeger UI

Deploy Open Telemetry Collector

To deploy the OpenTelemetry collector, we will use this otel-collector.yaml file as a starting point. The yaml file consists of a ConfigMap, Service and a Deployment.

vim otel-collector.yaml

Make sure to change the name of the jaeger collector (exporter) to match the one we deployed above. In our case, that would be:

exporters:      jaeger:        endpoint: "simplest-collector.default.svc.cluster.local:14250"

Also, pay attention to receivers. This part creates the receiver on the Collector side and opens up the port 4317 for receiving traces, which enables the application to send data to the OpenTelemetry Collector.

...  otel-collector-config: |    receivers:      otlp:        protocols:          grpc:            endpoint: "0.0.0.0:4317"...

Apply the file with: kubectl apply -f otel-collector.yaml

configmap/otel-collector-conf createdservice/otel-collector createddeployment.apps/otel-collector created

Verify that the OpenTelemetry Collector is up and running.

kubectl get deployment

kubectl logs deployment/otel-collector

"Everything is ready. Begin running and processing data."

Run Application

Time to send some trace data to our OpenTelemetry collector.

Remember, that the application access the Kubernetes cluster through a NodePort on port 30080. The Kubernetes service will bind the 4317 port used to access the OTLP receiver to port 30080 on the Kubernetes node.
By doing so, it makes it possible for us to access the Collector by using the static address <node-ip>:30080. In case you are running a local cluster, this will be localhost:30080. Source

This repository contains an (SDK) instrumented application written in Go, that simulates an application.

go run main.go

2022/08/10 20:31:37 Waiting for connection...2022/08/10 20:31:37 Doing really hard work (1 / 10)2022/08/10 20:31:38 Doing really hard work (2 / 10)2022/08/10 20:31:39 Doing really hard work (3 / 10)2022/08/10 20:31:40 Doing really hard work (4 / 10)2022/08/10 20:31:41 Doing really hard work (5 / 10)2022/08/10 20:31:42 Doing really hard work (6 / 10)2022/08/10 20:31:43 Doing really hard work (7 / 10)2022/08/10 20:31:44 Doing really hard work (8 / 10)2022/08/10 20:31:45 Doing really hard work (9 / 10)2022/08/10 20:31:46 Doing really hard work (10 / 10)2022/08/10 20:31:47 Done!

Viewing the data

Let’s check out the telemetry data generated by our sample application

Again, we can use port-forwarding to access Jaeger UI.

kubectl port-forward svc/simplest-query 16686:16686

Open the web-browser and go to http://127.0.0.1:16686/

Under Service select test-service to view the generated traces.

The service name is specified in the main.go file.

res, err := resource.New(ctx,  resource.WithAttributes(   // the service name used to display traces in backends   semconv.ServiceNameKey.String("test-service"),  ),

The application will access this Kubernetes cluster through a NodePort (on port 30080). The URL is specified here:

conn, err := grpc.DialContext(ctx, "localhost:30080", grpc.WithTransportCredentials(insecure.NewCredentials()), grpc.WithBlock()) if err != nil {  return nil, fmt.Errorf("failed to create gRPC connection to collector: %w", err) }

Done

This document has covered how we deploy an OpenTelemetry collector on Kubernetes. Then we sent trace data to this collector using an Otel SDK instrumented application written in Go. From there, the traces were sent to a Jaeger collector and visualised in Jaeger UI.