1 of 78

Operations Guide

In progress

This guide provides a step-by-step guide to monitoring and operating the DIGIT Platform and services in production.

DIGIT - Infra Overview

Operational Guidelines & Security Standards

Objective

The objective is to provide a clear guide for efficiently using DIGIT infrastructure on various platforms like SDC, NIC, or commercial clouds. This document outlines the infrastructure overview, operational guidelines, and recommendations, along with the segregation of duties (SoD). It helps to plan the procurement and build the necessary capabilities to deploy and implement DIGIT.

In a shared control scenario, the state program team must adhere to these guidelines and develop their own control implementation for the state's cloud infrastructure and collaborations with partners. This ensures standardized and smooth operational excellence in the overall system.

DIGIT Infrastructure Overview

DIGIT Platform is designed as a microservices architecture, using open-source technologies and containerized apps and services. DIGIT components/services are deployed as docker containers on a platform called Kubernetes, which provides flexibility for running cloud-native applications anywhere like physical or virtual infrastructure or hypervisor or HCI and so on. Kubernetes handles the work of scheduling containerized services onto a compute cluster and manages the workloads to ensure they run as intended. And it substantially simplifies the deployment and management of microservices.

Provisioning the Kubernetes cluster will vary across from commercial clouds to state data centres, especially in the absence of managed Kubernetes services like AWS, Azure, GCP and NIC. Kubernetes clusters can also be provisioned on state data centres with bare-metal, virtual machines, hypervisors, HCI, etc. However providing integrated networking, monitoring, logging, and alerting is critical for operating Kubernetes Clusters when it comes to State data centers. DIGIT Platform also offers add-ons to monitor Kubernetes cluster performance, logging, tracing, service monitoring and alerting, which the implementation team can take advantage.

Below are the useful links to understand Kubernetes:

DIGIT On Kubernetes - High-Level Deployment Diagram

DIGIT Infra Specification On SDC or NIC Or Any Commercial Cloud

Systems

Specification

Spec/Count

Comment

User Accounts/VPN

Dev, UAT and Prod Envs

User Roles

Admin, Deploy, ReadOnly

Any Linux (preferably Ubuntu/RHEL)

All

Kubernetes as a managed service or VMs to provision Kubernetes

Managed Kubernetes service with HA/DRS

(Or) VMs with 2 vCore, 4 GB RAM, 20 GB Disk

If no managed k8s

3 VMs/env

Dev - 3 VMs

UAT - 3VMs

Prod - 3VMs

Kubernetes worker nodes or VMs to provision Kube worker nodes.

VMs with 4 vCore, 16 GB RAM, 20 GB Disk / per env

3-5 VMs/env

DEV - 3VMs

UAT - 4VMs

PROD - 5VMs

Disk Storage (NFS/iSCSI)

Storage with backup, snapshot, dynamic inc/dec

1 TB/env

Dev - 1000 GB

UAT - 800 GB

PROD - 1.5 TB

VM Instance IOPS

Max throughput 1750 MB/s

1750 MS/s

Disk IOPS

Max throughput 1000 MB/s

1000 MB/s

Internet Speed

Min 100 MB - 1000MB/Sec (dedicated bandwidth)

Public IP/NAT or LB

Internet-facing 1 public ip per env

3 Ips

Availability Region

VMs from the different region is preferable for the DRS/HA

at least 2 Regions

Private vLan

Per env all VMs should within private vLan

Gateways

NAT Gateway, Internet Gateway, Payment and SMS gateway

1 per env

Firewall

Ability to configure Inbound, Outbound ports/rules

Managed DataBase

(or) VM Instance

Postgres 12 above Managed DB with backup, snapshot, logging.

(Or) 1 VM with 4 vCore, 16 GB RAM, 100 GB Disk per env.

per env

DEV - 1VMs

UAT - 1VMs

PROD - 2VMs

CI/CD server self hosted (or) Managed DevOps

Self Hosted Jenkins : Master, Slave (VM 4vCore, 8 GB each)

(Or) Managed CI/CD: NIC DevOps or AWS CodeDeploy or Azure DevOps

2 VMs (Master, Slave)

Nexus Repo

Self hosted Artifactory Repo (Or) NIC Nexus Artifactory

DockerRegistry

DockerHub (Or) SelfHosted private docker reg

Git/SCM

GitHub (Or) Any Source Control tool

DNS

main domain & ability to add more sub-domain

SSL Certificate

NIC managed (Or) SDC managed SSL certificate per URL

2 urls per env

Operational Recommendations

DIGIT strongly recommends Site reliability engineering (SRE) principles as a key means to bridge development and operations gaps by applying a software engineering mindset to system and IT administration topics. In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

Monitoring Tools Recommendations

Commercial clouds like AWS, Azure and GCP offer sophisticated monitoring solutions across various infra levels like CloudWatch and StackDriver. In the absence of such managed services to monitor, we can look at various best practices and tools listed below which help in debugging and troubleshooting efficiently.

Key Standard Operating Procedures (SOPs)

Segregation of duties and responsibilities.
SME and SPOCs for L1.5 support along with the SLAs defined.
Ticketing system to manage incidents, converge and collaborate on various operational issues.
Monitoring dashboards at various levels like Infrastructure, networks and applications.
Transparency of monitoring data and collaboration between teams.
Periodic remote sync-up meetings, acceptance and attendance to the meeting.
Ability to see stakeholders' availability of calendar time to schedule meetings.
Periodic (weekly, monthly) summary reports of the various infra, operations incident categories.
Communication channels and synchronization regularly and also upon critical issues, changes, upgrades, releases etc.

Segregation Of Duties

While DIGIT is deployed at state cloud infrastructure, it is essential to identify and distinguish the responsibilities between Infrastructure, Operations and Implementation partners. Identify these teams and assign SPOC, define responsibilities and ensure the Incident Management process is followed to visualize, track issues and manage dependencies between teams. Essentially these are monitored through dashboards and alerts are sent to the stakeholders proactively. eGov team can provide consultation and training on a need basis depending on any of the below categories.

State IT/Cloud team -Refers to state infra team for the Infra, network architecture, LAN network speed, internet speed, OS Licensing and upgrade, patch, compute, memory, disk, firewall, IOPS, security, access, SSL, DNS, data backups/recovery, snapshots, capacity monitoring dashboard.

State program team - Refers to the owner for the whole DIGIT implementation, application rollouts, and capacity building. Responsible for identifying and synchronizing the operating mechanism between the below teams.
Implementation partner - Refers to the DIGIT Implementation, application performance monitoring for errors, logs scrutiny, TPS on peak load, distributed tracing, DB queries analysis, etc.
Operations team - this team could be an extension of the implementation team that is responsible for DIGIT deployments, configurations, CI/CD, change management, traffic monitoring and alerting, log monitoring and dashboard, application security, DB Backups, application uptime, etc.

Skills To Setup, Operate & Maintain DIGIT On SDC

Tools/Skills

Specification

Weightage (1-5)

Yes/No

System Administration

Linux Administration, troubleshooting, OS Installation, Package Management, Security Updates, Firewall configuration, Performance tuning, Recovery, Networking, Routing tables, etc

Containers/Dockers

Build/Push docker containers, tune and maintain containers, Startup scripts, Troubleshooting dockers containers.

Kubernetes

Setup kubernetes cluster on bare-metal and VMs using kubeadm/kubespary, terraform, etc. Strong understanding of various kubernetes components, configurations, kubectl commands, RBAC. Creating and attaching persistent volumes, log aggregation, deployments, networking, service discovery, Rolling updates. Scaling pods, deployments, worker nodes, node affinity, secrets, configMaps, etc..

Database Administration

Setup PostGres DB, Set up read replicas, Backup, Log, DB RBAC setup, SQL Queries

Docker Registry

Setup docker registry and manage

SCM/Git

Source Code management, branches, forking, tagging, Pull Requests, etc.

CI Setup

Jenkins Setup, Master-slave configuration, plugins, jenkinsfile, groovy scripting, Jenkins CI Jobs for Maven, Node application, deployment jobs, etc.

Artifact management

Code artifact management, versioning

Apache Tomcat

Web server setup, configuration, load balancing, sticky sessions, etc

WildFly JBoss

Application server setup, configuration, etc.

Spring Boot

Build and deploy spring boot applications

NodeJS

NPM Setup and build node applications

Scripting

Shell scripting, python scripting.

Log Management

Aggregating system, container logs, troubleshooting. Monitoring Dashboard for logs using prometheus, fluentd, Kibana, Grafana, etc.

WordPress

Multi-tenant portal setup and maintain

Resource Requirement

Team

Roles

Responsibility

Program Management

Responsible for driving the Transformation Vision for State Team Formation, reviewing them and resolving hurdles for the teams.

Program Leader

Overall responsibility to Drive Vision of the program.

Identify Success Metrics for the program and the budgets for it. Staff the teams with the right / capable people to drive the outcomes.

Define program Structure and ensure that the various teams work in tandem towards the Program Plan/ Schedule.

Review program Progress and remove bottlenecks for the Implementation Teams

Procurement

Help timely procurements of various items/ services needed for the Program

Program Manager

Plan, establish tracking mechanism,

Track and Manage Program activities,

Conduct reviews with various teams to drive the Program. Ensure that the efforts of various teams are aligned.

Escalate/ seek support as appropriate to the Program Leader.

Program Coordinator

Track progress of activities,

Help documentation of the Program team,

Coordinate meeting schedules and logistics.

Implementation Review

Reports to program leader

Ensures Processes and System adoption happens in the ULB

Ensures the Program metrics are headed in the right direction (Their responsibility will extend well beyond the technical rollout)

Domain Team

Finalize finance and other related processes for all ULBs, Provide Specific Inputs to Technical Implementation team, Capacity Building,

Data Preparation

Oversee UAT

Monitor data to identify process execution on ground, Identify improvement areas for the Finance function.

State Finance Accounting Leader

Should be a TRUSTED Line Function person, who can be the guide to all the Accounting Head at the ULBs.

Should be able to take decisions for the state on all ULB Finance processes and appropriate automation related to that.

Finance Advisors / Consultants/ Accounts officers

Finalise Standardised Finance processes that need to be there on the ground to realise the State's vision.

Technology Implementation Team

Technical Specialist team that has knowledge of the eGov Platform, technologies, the DIGIT modules.

Configure/ customise the product to the needs of the state. Integrate the product with other systems as needed and manage and support the State

Technical Program Manager

Has a good understanding of the eGov Platform/ Product.

Plans the Technical Track of the Product Manage Technical team Coordinates with various stakeholders during different phases of Implementation to get the Product ready for rollout in the ULBs. Plan and schedule activities as needed in the program.

He/She will be part of the Program Management team.

Business Analysts

Study and design State specific Accounting and other taxation Processes working with the Domain team.

Capture and document all Processes

Ensure that the Product will meet the needs of the State

Software Designers / Architects

Designing Software requirements based on the requirements finalised by the Business Analysts and leveraging platform as appropriate.

Business Analysts

Study and design State specific Processes working with the Domain team.

Capture and document all Processes and ensure that the Product will meet the needs of the State.

Software Designers / Architects

Designing Software requirements based on the requirements finalised by the Business Analysts and leveraging platform as appropriate.

Developers

Configurations, Customization and Data Loads.

Testers

Test configuration / customisation and regression testing for each release

Project Coordinator

Coordinate activities amongst the various stakeholder and logistics support

DevOps & Cloud Monitoring

Release Management, Managing Repository, Security and Build tools

DBA

Postgres DBA. Database Tuning, backup, Archiving

Field Team

Statewide capacity building (Including Change Management). Experience in Finance Area preferred.

Measure training effectiveness and fine-tune approach.

Plan refresher training as needed.

Content Developer

Prepare content for training different roles in DIGIT.

Trainers

Execute training as per content developed for the different roles in DIGIT.

Capture feedback and identify additional training needs if required

Help Desk and Support

Central help desk

Onground support in a planned manner to each ULB during the first 2 months after rolling out.

Help Desk leader

Organise and run the help desk operations.

Ensure that tickets are handled as per agreed SLAs, Coordinate with Technical team as needed.

Analyse Help desk calls and identify potential areas for the Domain / Business Analysts to work on.

Central Help Desk

To take care of L1 and L2 Support.

Ensure Tracking of issues on the help desk tool.

Provide On ground support (Face to face) during the first 2 months of rollout

At Least 1 person per 3-4 Ulbs who can travel during the first 2 months to provide support to end users. This is more for confidence building and ensuring adoption.

DIGIT - Security Standards & Operational Recommendations

This section provides insights on security principles, security layers and the line of control that we focus on to prevent DIGIT security from the code, application, access, infra and operations. The target audience of this section are internal teams, partners, ecosystems and states to understand what security measures to be considered to secure DIGIT from an infrastructure and operations perspective .

DIGIT - Key Security Principles

Subscribe to the DIGIT applicable OWASP top 10 standard across various security layers.

Minimize attack surface area
Implement a strong identity foundation - Who accesses what and who does what.
Apply security at all possible layers
Automate security best practices
Separation of duties (SoD).
The principle of Least privilege (PoLP)
Templatized design - (Code, Images, Infra-as-code, Deploy-as-code, Conf-as-code, etc)
Align with MeiTY Standards to meet SDC Infra policies.

Security Layers & Line of Control

Security Layers

Line Of Controls

Application Layer

WAF, IAM, VA/PT, XSS, CSRF, SQLi, DDoS Defense.

Code

Defining security in the code, Static/Dynamic vulnerabilities scan

Libraries/Containers

Templatize Design, Vulnerabilities scanning at CI

Data

Encryption, Backups, DLP

Network

TLS, Firewalls, Ingress/Egress, Routing.

Infra/Cloud

Configurations/Infra Templates, ACL, user/privilege mgmt, Secrets mgmt

Operations

(PoLP) Least Privilege, Shared Responsibilities, CSA, etc

Application Layer

The presentation layer is likely to be the #1 attack vector for malicious individuals seeking to breach security defences like DDoS attacks, Malicious bots, Cross-Site Scripting (XSS) and SQL injection. Need to invest in web security testing with the powerful combination of tools, automation, process and speed that seamlessly integrates testing into software development, helping to eliminate vulnerabilities more effectively, deploy a web application firewall (WAF) that monitors and filters traffic to and from the application, blocking bad actors while safe traffic proceeds normally.

Key Security Measures

1. TLS-protocols/Encryption: Access control to secure authentication and authorization. All APIs that are exposed must have HTTPS certificates and encrypt all the communication between client and server with transport layer security (TLS).

2. Auth Tokens: An authorization framework that allows users to obtain admittance to a resource from the server. This is done using tokens in microservices security patterns: resource server, resource owner, authorization server, and client. These tokens are responsible for access to the resource before its expiry time. Also, Refresh Tokens that are responsible for requesting new access after the original token has expired.

3. Multi-factor Authentication: authorize users on the front end, which requires a username and password as well as another form of identity verification to offer users better protection by default as some aspects are harder to steal than others. For instance, using OTP for authentication takes microservice security to a whole new level.

4. Rate Limit/DDoS: denial-of-service attacks are the attempts to send an overwhelming number of service messages to cause application failure by concentrating on volumetric flooding of the network pipe. Such attacks can target the entire platform and network stack.

To prevent this:

We should set a limit on how many requests in a given period can be sent to each API.
If the number exceeds the limit, block access from a particular API, at least for some reasonable interval.
Also, make sure to analyze the payload for threats.
The incoming calls from a gateway API would also have to be rate-limited.
Add filters to the router to drop packets from suspicious sources.

5. Cross-site scripting (XSS): scripts that are embedded in a webpage and executed on the client side, in a user’s browser, instead of on the server side. When applications take data from users and dynamically include it in webpages without validating the data properly, attackers can execute arbitrary commands and display arbitrary content in the user’s browser to gain access to account credentials.

How to prevent:

Applications must validate data input to the web application from user browsers.
All output from the web application to user browsers must be encoded.
Users must have the option to disable client-side scripts.

6. Cross-Site Request Forgery (CSRF): is an attack whereby a malicious website will send a request to a web application that a user is already authenticated against from a different website. This way an attacker can access functionality in a target web application via the victim's already authenticated browser. Targets include web applications like social media, in-browser email clients, online banking and web interfaces for network devices. To prevent this CSRF tokens are appended to each request and associated to the user’s session. Such tokens should at a minimum be unique per user session, but can also be unique per request.

How to prevent:

By including a challenge token with each request, the developer can ensure that the request is valid and not coming from a source other than the user.

8. SQL Injection (SQLi): allows attackers to control an application’s database – letting them access or delete data, change an application’s data-driven behaviour, and do other undesirable things – by tricking the application into sending unexpected SQL commands. SQL injections are among the most frequent threats to data security.

How to prevent:

Using parameterized queries which specify placeholders for parameters so that the database will always treat them as data rather than part of an SQL command. Prepared statements and object-relational mappers (ORMs) make this easy for developers.
Remediate SQLi vulnerabilities in legacy systems by escaping inputs before adding them to the query. Use this technique only where prepared statements or similar facilities are unavailable.
Mitigate the impact of SQLi vulnerabilities by enforcing the least privilege on the database. Ensure that each application has its database credentials and that these credentials have the minimum rights the application needs.

Security In The Code

The primary causes of commonly exploited software vulnerabilities are consistent defects, bugs, and logic flaws in the code. Poor coding practices can create vulnerabilities in the system that can be exploited by cybercriminals.

What defines a security in the code:

1. White-box code analysis: As developers write code, the IDE needs to provide focused, real-time security feedback with white-box code analysis. It also helps developers remediate faster and learn on the job through positive reinforcement, remediation guidance, code examples, etc.

2. Static Code Analysis (SAST): A static analysis tool reviews program code, searching for application coding flaws, back doors or other malicious code that could give hackers access to critical data or customer information. However, most static analysis tools can only scan source code.

3: Vulnerability assessment: Vulnerability assessment for the third-party libraries/artefacts as part of CI and GitHub PR process. Test results are returned quickly and prioritized in a Fix-First Analysis that identifies both the most urgent flaws and the ones that can be fixed most quickly, allowing developers to optimize efforts and save additional resources.

4. Secure PII/Encrypt: Personally identifying information – to make sure that it is not being displayed as plain text. All the passwords and usernames must be masked during the storing in logs or records. However, adding extra encryption above TLS/HTTP won’t add protection for traffic travelling through the wire. It can only help a little bit at the point where TLS terminates, so it can protect sensitive data (such as passwords or credit card numbers) from accidental dumping into a request log. Extra encryption (RSA 2048+ or Blowfish) might help protect data against those attacks that aim at accessing the log data. But it will not help with those who try accessing the memory of the application servers or the main data storage.

5. Manual Penetration Testing: Some categories of vulnerabilities, such as authorization issues and business logic flaws, cannot be found with automated assessments and will always require a skilled penetration tester to identify them. Need to employ Manual Penetration Testing that uses proven practices to provide extensive and comprehensive security testing results for web, mobile, desktop, and back-end with detailed results, including attack simulations.

Libraries/Containers

Components, such as libraries, frameworks, container images, and other software modules, almost always run with full privileges. If a vulnerable component is exploited, such an attack can facilitate serious data loss or server takeover. Applications using components with known vulnerabilities may undermine application defences and enable a range of possible attacks and impacts.

Automating dependency checks for the libraries and container auditing, as well as using other container security processes as part of the CI periodically or as part of PRs can largely prevent these vulnerabilities. Subscribing to tools that comply with vulnerable library databases such as OSVDB, Node Security Project, CIS, National Vulnerability Database, and Docker Bench for Security can help identify and fix the vulnerabilities periodically. A private docker registry can help.

Data Security

Data Security involves putting in place specific controls, standard policies, and procedures to protect data from a range of issues, including:

Enforced encryption: Encrypt, manage and secure data by safeguarding it in transit. Password-based, easy to use and very efficient.
Unauthorized access: Blocking unauthorized access plays a central role in preventing data breaches. Implementing Strong Password Policy and MFA.
Accidental loss: All data should be backed up. In the event of hardware or software failure, breach, or any other error to data; a backup allows it to continue with minimal interruption. Storing the files elsewhere can also quickly determine how much data was lost and/or corrupted.
Destruction: Endpoint Detection and Response (EDR) – provides visibility and defensive measures on the endpoint itself, when attacks occur on endpoint devices this can eliminate gaining access systems and avoid destruction of the data.

Infra/Cloud

In microservices and the Cloud Native architectural approach, the explosion of ephemeral, containerized services that arise from scaling applications developed increases the complexity of delivery. Fortunately, Kubernetes was developed just for this purpose. It provides DevOps teams with an orchestration capability for managing the multitude of deployed services, with in-built automation, resilience, load balancing, and much more. It's perfect for the reliable delivery of Cloud Native applications. Below are some of the key areas to get more control to establish policies, procedures and safeguards through the implementation of a set of rules for compliance. These rules cover infra privacy, security, breach notification, enforcement, and an omnibus rule that deals with security compliance.

Strong stance on authentication and authorization
Role-Based Access Control (RBAC)
Kubernetes infrastructure vulnerability scanning
Hunting misplaced secrets
Workload hardening from Pod Security to network policies
Ingress Controllers for security best practices
Constantly watch your Kubernetes deployments
Find deviations from desired baselines
Should alert or deny on policy violation
Block/Whitelist (IP or DNS) connections before entering the workloads.
Templatize the deployment/secrets configs and serve as config-as-code.

Network Security

Kubernetes brings new requirements for network security, because applications, that are designed to run on Kubernetes, are usually architected as microservices that rely on the network. They make API calls to each other. Steps must be taken to ensure proper security protocols are in place. The following are the key areas for implementing network security for a Kubernetes platform:

Container Groups: Coupled communication between grouped containers, is achieved inside the Pod that contains one or more containers.
Communication between Pods: Pods are the smallest unit of deployment in Kubernetes. A Pod can be scheduled on one of the many nodes in a cluster and has a unique IP address. Kubernetes places certain requirements on communication between Pods when the network has not been intentionally segmented. These requirements include:
Containers should be able to communicate with other Pods without using network address translation (NAT).
All the nodes in the cluster should be able to communicate with all the containers in the cluster.
The IP address assigned to a container should be the same that is visible to other entities communicating with the container.
Pods and Services: Since Pods are ephemeral in nature, an abstraction called a Service provides a long-lived virtual IP address that is tied to the service locator (e.g., a DNS name). Traffic destined for that service VIP is then redirected to one of the Pods and offers the service using that specific Pod’s IP address as the destination.
Traffic Direction: Traffic is directed to Pods and services in the cluster via multiple mechanisms. The most common is via an ingress controller, which exposes one or more service VIPs to the external network. Other mechanisms include node ports and even publicly-addressed Pods.

Operational Security

It is a procedural security that manages risk and encourages to view of operations from the perspective of an adversary to protect sensitive information from falling into the wrong hands. Following are a few best practices to implement a robust, comprehensive operational security program:

Implement precise change management processes: All changes should be logged and controlled so they can be monitored and audited.
Restrict access to network devices using AAA authentication: a “need-to-know” is a rule of thumb regarding access and sharing of information.
Least Privilege (PoLP): Give the minimum access necessary to perform their jobs.
Implement dual control: Those who work on the tasks are not the same people in charge of security.
Automate tasks: reduce the need for human intervention. Humans are the weakest link in any organization’s operational security initiatives because they make mistakes, overlook details, forget things, and bypass processes.

Incident response and disaster recovery planning: are always crucial components of a sound security posture, we must have a plan to identify risks, respond to them, and mitigate potential damages.

Kubernetes

RBAC Management

Role-based access control

Overview

Role-based access control (RBAC) regulates access to a computer or network resources based on the roles of individual users within your organization.

RBAC authorization uses the rbac.authorization.k8s.io API group to drive authorization decisions, allowing you to configure policies through the Kubernetes API dynamically.

API objects

The RBAC API declares four Kubernetes objects: Role, ClusterRole, RoleBinding and ClusterRoleBinding. You can describe objects, or amend them, using tools such as kubectl, just like any other Kubernetes object.

Caution: These objects, by design, impose access restrictions. If you are making changes to a cluster as you learn, see privilege escalation prevention and bootstrapping to understand how those restrictions can prevent you from making some changes.

Role & ClusterRole

An RBAC Role or ClusterRole contains rules that represent a set of permissions. Permissions are purely additive (there are no "deny" rules).

A Role always sets permissions within a particular namespace; when you create a Role, you have to specify the namespace it belongs in.

ClusterRole, by contrast, is a non-namespaced resource. The resources have different names (Role and ClusterRole) because a Kubernetes object always has to be either namespaced or not namespaced; it can't be both.

ClusterRoles have several uses. You can use a ClusterRole to:

define permissions on namespaced resources and be granted access within individual namespace(s)
define permissions on namespaced resources and be granted access across all namespaces
define permissions on cluster-scoped resources

If you want to define a role within a namespace, use a Role; if you want to define a role cluster-wide, use a ClusterRole.

ClusterRole In DIGIT

A ClusterRole can be used to grant the same permissions as a Role. Because ClusterRoles are cluster-scoped, you can also use them to grant access to:

cluster-scoped resources (like nodes)
non-resource endpoints (like /healthz)
namespaced resources (like Pods), across all namespaces
For example: you can use a ClusterRole to allow a particular user to run kubectl get pods --all-namespaces

Here is the Digit ClusterRole that can be used to grant read access and restricted admin access

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: digit-user
rules:
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - patch
- apiGroups:
  - ""
  resources:
  - pods/portforward
  - pods/proxy 
  verbs:
  - create  
  - delete
---  
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: digit-admin
rules:
- apiGroups:
  - ""
  resources:
  - pods/portforward
  - pods/proxy 
  - pods/exec 
  verbs:
  - create  
  - delete
- apiGroups:
  - "batch"
  resources:
  - jobs 
  verbs:
  - create  
  - update
  - patch
  - list
- apiGroups: 
  - "apps"
  - "extensions"  
  resources: 
  - deployments
  verbs: 
  - patch 
  - get 
  - list
  - update

RoleBinding & ClusterRoleBinding

A role binding grants the permissions defined in a role to a user or set of users. It holds a list of subjects (users, groups, or service accounts), and a reference to the role being granted. A RoleBinding grants permissions within a specific namespace whereas a ClusterRoleBinding grants that access cluster-wide.

A RoleBinding may reference any Role in the same namespace. Alternatively, a RoleBinding can reference a ClusterRole and bind that ClusterRole to the namespace of the RoleBinding. If you want to bind a ClusterRole to all the namespaces in your cluster, you use a ClusterRoleBinding.

RoleBinding In DIGIT

Here is the Digit rolebinading that we are using to grant access to group

{{- with index .Values "cluster-configs" "rbac" }}
{{- range $idx, $v := . }}                 // These iteration values are defined in the environment file
{{- range $nsidx, $nsval := .namespaces }}  // These iteration values are defined in the environment file 
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: digit-{{ $v.role }}
  namespace: {{ $nsval }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: digit-{{ $v.role }}
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: digit-user        
---
{{- end }}
{{- end }}
{{- end }}

A RoleBinding can also reference a ClusterRole to grant the permissions defined in that ClusterRole to resources inside the RoleBinding's namespace. This kind of reference lets you define a set of common roles across your cluster, and then reuse them within multiple namespaces.

For instance, even though the following RoleBinding refers to a ClusterRole, "dave" (the subject, case sensitive) will only be able to read Secrets in the "development" namespace, because the RoleBinding's namespace (in its metadata) is "development".

Define the RBAC config in the Environment file

You must add a namespace to a role section to grant access to a group of a namespace.

cluster-configs:
  rbac:
    - role: user
      namespaces: [egov] // digit-user ClusterRole would be granted to the playground and egov namespace. 
    - role: admin
      namespaces: [playground,egov]  // digit-admin ClusterRole would be granted to the playground and egov namespace.

DB Dump - Playground

This tutorial will walk you through How to take DB dump

Overview

On this page, you will find the steps on how to create a database dump.

Steps

To create a database dump, execute the dump command (given below) in the playground pod.
kubectl get pods -n playground
kubectl exec <playground-pod-name> -it -n playground bash
Use the below command to take a backup.
pg_dump -Fp --no-acl --no-owner --no-privileges -h <db-host> egov_db -U dbusername > backup.sql
gzip backup.sql.gz backup.sql
Copy the zip file to your local machine using the below command.
kubectl cp <playground-pod-name>:/backup.sql.gz backup.sql.gz -n playground

Setup Jenkins - Docker way

Jenkins for Build, Test and Deployment Automation

While we are adopting the Microservices architecture, it also demands to have an efficient CI/CD tools like jenkins. Along the cloud-native application developement and deployment jenkins can also be run cloud-native.

Since all processes, including software build, test and deployment, are performed every two or four weeks, this is an ideal playground for automation tools like Jenkins: After the developer commits a code change to the repository, Jenkins will detect this change and will trigger the build and test process. So Let's setup Jenkins as a docker container. Step-by-step.

Installing Jenkins the Docker Way

Tools and Versions used

VM or EC2 Instance or a Standalone on-premisis machin
Docker 1.12.1
Jenkins 2.32.2
Job DSL Plugin 1.58

Prerequisites:

Ubuntu or an Liniux Machine
Free RAM for the a VM/Machine >~ 4 GB.
Docker Host is available.
Tested with 3 vCPU (2 vCPU might work as well).

Step 1: Install a Docker on the Host you provisioned for jenkins and Connect to the Host via SSH

If you are using an host already has docker installed, you can skip this step. Make sure that your host has enough memory.

We will run Jenkins in a Docker container in order to allow for maximum interoperability. This way, we always can use the latest Jenkins version without the need to control the java version used.

If you are new to Docker, you might want to read this blog post.

Installing Docker on Windows and Mac can be a real challenge, but possible: here we will see an efficient way by using linux machine.

Prerequisites of this step:

I recommend to have direct access to the Internet: via Firewall, but without HTTP proxy.
Administration rights on you computer.

Step 2: Download Jenkins Image

This extra download step is optional, since the Docker image will be downloaded automatically in step 3, if it is not already found on the system:

(dockerhost)$ sudo docker pull jenkins
Using default tag: latest
latest: Pulling from library/jenkins
Digest: sha256:8820149b54bfc5d05146b82150b5fdab583eef3e0499fb4ed630f77647a42942
Status: Image is up to date for jenkins:latest

The version of the downloaded Jenkins image can be checked with following command:

(dockerhost)$ sudo docker run -it --rm jenkins --version
2.19.3

We are using version 2.9.13 currently. If you want to make sure that you use the exact same version as I have used in this blog, you can use the imagename jenkins:2.19.3 in all docker commands instead of jenkins only.

Note: The content of the jenkins image can be reviewed on this link. There, we find that the image has an entrypoint /bin/tini -- /usr/local/bin/jenkins.sh, which we could override with the --entrypoint bash option, if we wanted to start a bash shell in the jenkins image. However, in Step 3, we will keep the entrypoint for now.

Step 3: Start Jenkins in interactive Terminal Mode

In this step, we will run Jenkins interactively (with -it switch instead of -d switch) to better see, what is happening. But first, we check that the port we will use is free:

(dockerhost)$ sudo docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
0ec82b4ca2fd        google/cadvisor:latest   "/usr/bin/cadvisor -l"   2 days ago          Up 2 days           0.0.0.0:8080->8080/tcp                           cadvisor
...

Since we see that one of the standard ports of Jenkins (8080, 50000) is already occupied and I do not want to confuse the readers of this blog post by mapping the port to another host port, I just stop the cadvisor container for this „hello world“:

(dockerhost)$ sudo docker stop cadvisor
cadvisor

Jenkins will be in need of a persistent storage. For that, we create a new folder on the Docker host:

(dockerhost)$ mkdir jenkins_home; cd jenkins_home

Note: The content of the jenkins image can be reviewed on this link. There, we find that the image has an entrypoint /bin/tini -- /usr/local/bin/jenkins.sh, which we could override with the --entrypoint bash option, if we wanted to start a bash shell in the jenkins image.

We start the Jenkins container with the jenkins_home Docker host volume mapped to /var/jenkins_home:

(dockerhost)$ sudo docker run -it --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins
Running from: /usr/share/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:14 PM Main deleteWinstoneTempContents
WARNING: Failed to delete the temporary Winstone file /tmp/winstone/jenkins.war
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Logging initialized @347ms
Nov 30, 2016 6:12:14 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING: Empty contextPath
Nov 30, 2016 6:12:14 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-9.2.z-SNAPSHOT
Nov 30, 2016 6:12:16 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet
Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started w.@7674f035{/,file:/var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started ServerConnector@548d708a{HTTP/1.1}{0.0.0.0:8080}
Nov 30, 2016 6:12:17 PM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started @3258ms
Nov 30, 2016 6:12:17 PM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Nov 30, 2016 6:12:17 PM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Nov 30, 2016 6:12:19 PM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Nov 30, 2016 6:12:20 PM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Nov 30, 2016 6:12:20 PM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 97 ms
Nov 30, 2016 6:12:20 PM org.jenkinsci.main.modules.sshd.SSHD start
INFO: Started SSHD at port 44955
Nov 30, 2016 6:12:21 PM jenkins.util.groovy.GroovyHookScript execute
INFO: Executing /var/jenkins_home/init.groovy.d/tcp-slave-agent-port.groovy
Nov 30, 2016 6:12:22 PM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@453fc3cf]: org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@79a53f4b: defining beans [authenticationManager]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7: display name [Root WebApplicationContext]; startup date [Wed Nov 30 18:12:22 UTC 2016]; root of context hierarchy
Nov 30, 2016 6:12:22 PM org.springframework.context.support.AbstractApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@7ea44b7]: org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046
Nov 30, 2016 6:12:22 PM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@12544046: defining beans [filter,legacy]; root of factory hierarchy
Nov 30, 2016 6:12:22 PM jenkins.install.SetupWizard init
INFO:

*************************************************************
*************************************************************
*************************************************************

Jenkins initial setup is required. An admin user has been created and a password generated.
Please use the following password to proceed to installation:

0c4a8413a47943ac935a4902e3b8167e

This may also be found at: /var/jenkins_home/secrets/initialAdminPassword

*************************************************************
*************************************************************
*************************************************************

Nov 30, 2016 6:12:27 PM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Nov 30, 2016 6:12:27 PM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
--> setting agent port for jnlp
--> setting agent port for jnlp... done

Step 4: Open Jenkins in a Browser

Now we want to connect to the Jenkins portal. For that, open a browser and open the URL

<your_jenkins_host>:8080

In our case, Jenkins is running in a container and we have mapped the container-port 8080 to the local port 8080 of the Docker host. On the Docker host, we can open the URL.

localhost:8080

The Jenkins login screen will open:

The admin password can be retrieved from the startup log, we have seen above (0c4a8413a47943ac935a4902e3b8167e), or we can find it by typing

(dockerhost: .../jenkins_home)$ cat secrets/initialAdminPassword
0c4a8413a47943ac935a4902e3b8167e

on the mapped jenkins_home folder on the Docker host.

Step 5: Install Plugins

Let us install the suggested plugins:

This may take a while to finish:

Step 6: Create an Admin User and log in

Then we reach a page, where we can create an Admin user:

Let us do so and save and finish.

Note: After this step, I have deleted the Jenkins container and started a new container attached to the same Jenkins Home directory. After that, all configuration and plugins were still available and we can delete containers after usage without loosing relevant information.

I have had a dinner break at this point. Maybe this is the reason I got following message when clicking the „Start using Jenkins“ button?

What ever. After clicking „retry“, we reach the login page:

Create a New Job

In the nex, we will create our first Jenkins job. I plan to trigger the Maven and/or Gradle build of a Java executable file upon detection of a code change.

Step 2: Install the Job DSL Plugin

The Job DSL Plugin can be installed like any other Jenkins plugin:

Step 3: Create Job DSL Jenkins Project

We create a Job DSL Job like follows:

Step 4: Configure Job DSL Project

-> if you have got a Github account, fork this open source Java Hello World software (originally created by of LableOrg) that will allow you to see, what happens with your Jenkins job, if you check in changed code. Moreover the hello world software allows you to perform JUnit 4 tests, run PowerMockito Mock services, run JUnit 4 Integration tests and calculate the code coverage using the tool Cobertura.

-> insert:

job('Job-DSL-Hello-World-Job') {
    scm {
        git('git://github.com/<org>/java-maven-junit-helloworld')
    }
    triggers {
        scm('H/15 * * * *')
    }
    steps {
        maven('-e clean test')
    }
}

here, exchange the username oveits by your own Github username.\

Step 5: Prepare Maven Usage

Goto Jenkins -> Manage Jenkins -> Global Tool Configuration (available for Jenkins >2.0)

-> choose Version (3.3.9 in my case)

-> Add a name („Maven 3.3.9“ in my case)

Since we have checked „Install automatically“ above, I expect that it will be installed automatically on first usage.

Step 6: Prepare Git Usage

As described in this StackOverflow Q&A, we need to add the Git username and email address, since Jenkins tries to tag and commit on the Git repo, which requires those configuration items to be set. For that, we perform:

-> scroll down to „Git plugin“

Step 7: Create Jenkins Job from Code

Step 7.1 Build Project

Step 7.2 (optional): Check Console Output

Step 7.3: Review automatically built Project

This is showing a build failure, since I had not performed Step 5 and 6 before. In your case, it should be showing a success (in blue). If you are experiencing problems here, check out the Appendices below.

-> scroll down to Source Code Management

-> Scroll down to Build Triggers

-> Scroll down to Build

-> verify that „Maven 3.3.9“ is chosen as defined in Step 5

-> enter „-e clean test“ as Maven Goal

See, what happens by clicking on:

-> Build History

-> #nnn

If everything went fine, we will see many downloads and a „BUILD SUCCESS“:

Appendix A: Solve Git Problem: „tell me who you are“

Symptoms: Git Error: status code 128

In a new installation of Jenkins, Git does not seem to work out of the box. You can see this by choosing the Jenkins project Job-DSL-Hello-World-Job on the dashboard, then click „build now“, if the build was not already automatically triggered. Then:

-> Build History

-> Last Build (link works only, if Jenkins is running on localhost:8080 and you have chosen the same job name)

There, we will see:

Caused by: hudson.plugins.git.GitException: Command "git tag -a -f -m Jenkins Build #1 jenkins-Job-DSL-Hello-World-Job-1" returned status code 128:
stdout: 
stderr: 
*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: empty ident name (for <jenkins@61915398735e.(none)>) not allowed

Resolution:

Step 1: Enter Git Username and Email

As described in this StackOverflow Q&A: we can resolve this issue by either suppressing the git tagging, or (I think this is better) by adding your username and email address to git:

-> scroll down to „Git plugin“

Step 2: Re-run „Build Now“ on the Project

To test the new configuration, we go to

-> the Job-DSL-Hello-World-Job and press

Now, we should see a BUILD SUCCESS like follows:

-> Build History

-> #nnn

If everything went fine, we will a „BUILD SUCCESS“:

Appendix B: Maven Error: Cannot run program „mvn“

Symptoms:

When running a Maven Goal, the following error may appear on the Console log:

FATAL: command execution failed
java.io.IOException: Cannot run program "mvn" (in directory "/var/jenkins_home/workspace/Job-DSL-Hello-World-Job"): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at hudson.Proc$LocalProc.(Proc.java:245)
	at hudson.Proc$LocalProc.(Proc.java:214)
	at hudson.Launcher$LocalLauncher.launch(Launcher.java:846)
	at hudson.Launcher$ProcStarter.start(Launcher.java:384)
	at hudson.Launcher$ProcStarter.join(Launcher.java:395)
	at hudson.tasks.Maven.perform(Maven.java:367)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
	at hudson.model.Build$BuildExecution.build(Build.java:205)
	at hudson.model.Build$BuildExecution.doRun(Build.java:162)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
	at hudson.model.Run.execute(Run.java:1728)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:404)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 15 more
Build step 'Invoke top-level Maven targets' marked build as failure
Finished: FAILURE

Resolution:

Perform Step 5

and

For Test, you can test a manual: choose the correct Maven version, when configuring a Maven build step like in Step 7:

For our case, we need to correct the Job DSL like follows:

In the Script, we had defined the step:

    steps {
        maven('-e clean test')
    }

However, we need to define the Maven Installation like follows:

    steps {
        maven {
            mavenInstallation("Maven 3.3.9")
            goals('-e clean test')
        }
    }

Here, the mavenInstallation needs to specify the exact same name, as the one we have chosen in Step 5 above.

After correction, we will receive the correct Maven goal

Now, we can check the Maven configuration:

After scrolling down, we will see the correct Maven Version:

DONE

Appendix C: Updating Jenkins

Updating Jenkins (in my case: from 2.32.1 to 2.32.2) was as simple as following the steps below

Note: you might want to make a backup of your jenkins_home though. Just in case…

(dockerhost)$ cd <path_to_jenkins_home> # in my case: cd /user/jenkins_home/
(dockerhost)$ docker pull jenkins # to update the jenkins image
(dockerhost)$ docker rm jenkins # to make shure the container named jenkins is removed
(dockerhost:jenkins_home)$ sudo docker run -d --rm --name jenkins -p8080:8080 -p50000:50000 -v`pwd`:/var/jenkins_home jenkins

However, after that, some data was unreadable:

I have clicked

to resolve the issue (hopefully…). At least, after that, the warning was gone.

Appendix D: Job DSL Syntax

The reference for the Job DSL syntax can be found on the Job DSL Plugin API pages. As an example, the syntax of Maven within a Freestyle project can be found on this page found via the path

> freeStyleJob > steps > maven:

maven {

// Allows direct manipulation of the generated XML.
configure(Closure configureBlock)
// Specifies the goals to execute including other command line options.
goals(String goals)
// Skip injecting build variables as properties into the Maven process.
injectBuildVariables(boolean injectBuildVariables = true)
// Set to use isolated local Maven repositories.
localRepository(javaposse.jobdsl.dsl.helpers.LocalRepositoryLocation location)
// Specifies the Maven installation for executing this step.
mavenInstallation(String name)
// Specifies the JVM options needed when launching Maven as an external process.
mavenOpts(String mavenOpts)
// Adds properties for the Maven build.
properties(Map props)
// Adds a property for the Maven build.
property(String key, String value)
// Specifies the managed global Maven settings to be used.
providedGlobalSettings(String settingsIdOrName)
// Specifies the managed Maven settings to be used.
providedSettings(String settingsIdOrName)
// Specifies the path to the root POM.
rootPOM(String rootPOM)

A Maven example can be found on the same page:

job('example') {
    steps {
        maven('verify')
        maven('clean verify', 'module-a/pom.xml')
        maven {
            goals('clean')
            goals('verify')
            mavenOpts('-Xms256m')
            mavenOpts('-Xmx512m')
            localRepository(LocalRepositoryLocation.LOCAL_TO_WORKSPACE)
            properties(skipTests: true)
            mavenInstallation('Maven 3.1.1')
            providedSettings('central-mirror')
        }
    }
}

Summary

In this blog post, we have learned how to

Start and initialize Jenkins via Docker
Prepare the usage of Git and Maven
Install the Job DSL Plugin
Define a Jenkins Job via Groovy script
Create a Jenkins Job by a push of the „Build now“ button
Review and run the automatically created Jenkins job

We have seen that the usage of the Job DSL is no rocket science. The only topic, we had to take care, is, that Git and Maven need to be prepared for first usage on a Jenkins server.

GitOps

Git Client installation

Git can be installed in any operating systems like Windows, Linux and Mac. Most of the Mac and Linux machines, Git will be pre-installed.

GitHub is an open source tool which helps the developers to manage, store, track and control changes in their code. If we want to clone(copy) the data from GItHub we need to install Git.
There are some alternatives for GitHub like GitLab, Bitbucket. But many developers prefer GitHub because it's more popular and they are used to the navigation. So we are using Git in DIGIT
GitHub is used to create Individual projects.

Checking for Git:

To check whether Git is already installed in your systems, open in terminal.

If you are in Mac, look for the command prompt application called "Terminal".
If you are in Windows, open the windows command prompt or "Git Bash".
Type the below command:

git version

Installing Git on Linux:

Ubuntu:

In Ubuntu using terminal we can directly install Git using terminal.

Go to command prompt shell and run the following command to make sure everything is up-to-date.

sudo apt-get update

After that run the following command to install Git.

sudo apt-get install git-all

Once the command output has completed, verify the installation using

git version

Installing Git on Windows and Mac:

Go to the following page to download the Git latest version: For Windows: https://gitforwindows.org/ For Mac: https://sourceforge.net/projects/git-osx-installer/files/git-2.23.0-intel-universal-mavericks.dmg/download?use_mirror=autoselect
Once the installation is done, open the windows command prompt or Git Bash and type

git version

GitHub organization creation

Creating a GitHub account and an organization to provide access and permissions to a repository.

What is organization and why we are creating in GitHub:

An organization are shared accounts where businesses and open source projects can collaborate across many projects at once.There are three types accounts in GitHub

Personal accounts
Organization accounts
Enterprise accounts

Here the main reason for creating organization account is, accounts can be shared among unlimited number of people and they can collaborate across many projects at once.
Our organization name is eGovernments Foundation.

GitHub account creation:

Go to https://github.com/
After completing the process Your GitHub account will be created.
Click on Sign Up. Create your account by using email and password. Then add Username.
After completing the process Your GitHub account will be created.

Creating Organization in GitHub:

After setting up the GitHub account, we have to create an organization. Here we can add the data or code in the form of repository. Creating a repository, we will see this topic next.
Open Github and click on the "+" icon add top tight corner. You will see the option"new organization". click it.

click on "create a free organization"and enter your organization name you want to create with email and then '"next"

After Organization got created, you can see your organizations by clicking on "Accounts

Adding new SSH key to it

With SSH keys, you can connect to GitHub without supplying your username and personal access token at each visit. You can also use an SSH key to sign commits.

Adding new SSH key to your GitHub account:

Open Your "Command prompt" or "Terminal".
Type below commands to generate SSH key

ssh-keygen

Now a .ssh folder is created in your home directory. Go to that directory.

cd .ssh
cat id_rsa.pub

copy the SSH key which we get after running the above commands.

open GitHub and add this SSH key as shown below:
open Settings and go to SSH and GPG keys

Click on New SSH key and paste it. Click on Add SSH key.
If you want check the private key, use

cat id_rsa

GitHub repo creation

You can store a variety of projects in GitHub repositories, including open source projects. with open source projects you can share your code in repositories with others to track your work.

To create a new repository, click on + icon and New repository

Create your with repo with any name based on your code. Make it as public. Then anyone can able to see your code.
If you want to add a README file, click on add a README file. It is helpful to understand how does the code present in repo will be helpful.

Next click on create repository.

GitHub Team creation

In eGovernments Foundations we are having multiple number of Teams. we can create independent teams to manage repository permissions and mentions for groups of people.
Only organization owners and maintainers can create team. Owners can also restrict creation permissions for all teams in an organization.
First sign in to your organization github account.

Once you sign in to your account and if you open view organization you can able to see the above page.
Click on Teams. You will see the below image.

Now, click on the New team
Fill the details as shown in the below image:

After creating team, you will able to see the below image.

If you click on members.you can add members to your team by providing their github username or mail.
Now, you have successfully created GitHub team.

Enabling Branch protection:

You can create branch protection rule, such as requiring an approving review or passing status checks for all pull requests merged into the protected branch.

Creating a branch inside repository:

Go to the repository and click on new branch.

Here I have created a branch named DIGIT
After, go to that branch in the same repository.

Creating branch protection rule:

Branch protection rule states that, how to manage the branch restrictions/permissions in GitHub.

NOTE : You must have admin access orelse you have to be a codeowner to make these changes for branch restrictions/permissions.

Open https://github.com and choose any repository.Go to the main page. Click on settings.

Click on branches

If you click on the Edit rules you can able to see the rules which are applied for that branch.you should follow the rules when ever you are going to made any changes to that branch and pushing it.
If you want to create new branch protection rule click on Add Rule.

The common restrictions we are following to merge branches are :

1.Requires pull request

2.Requires approvals from CODE OWNERS

Only the CODE OWNERS can have access to merge and makes changes to these rules.

CODEOWNER Reviewers

In every branch of repository there will be a CODEOWNER file. The people inside the CODEOWNER file are responsible for code in repository.

People with admin or owner permissions can set up a CODEOWNERS file in a repository.
The people you choose as code owners must have write permissions for the repository.
When the code owner is a team, that team must be visible and it must have write permissions, even if all the individual members of the team already have write permissions directly, through organization membership, or through another team membership.
For every branch there will be a CODEOWNER file. Only they can able to write the code and able to merge the pull requests.

Create CODEOWNER file:

Go to any of your branch(DIGIT branch created previously) in a repository and click on new file and name it as CODEOWNERS

Click on "Create a new branch for this commit and start a pull request" and click on propose new file

Next click on Create pull request and then Merge pull request and confirm merge.

Add the GitHub Id's of all the team or people whom you want to add.

Adding Users to the Git

You can invite anyone to become a member of your organization (whether they are already member in another organization) using their username or email address for GitHub.com.

In the top right corner of GitHub.com, click your profile photo, then click Your organizations.

Click the name of your organization

After that click on People

Next, Click on Invite member

Type the username, full name, or email address of the person you want to invite and click Invite.

To provide access to a repository:

Go to the repository and click on settings
Next click on Collaborators and teams.

Provide access to edit the code based on the user request.

Setting up an OAuth with GitHub

You can create and register an OAuth App under your personal account or under any organization you have administrative access to. While creating your OAuth app, remember to protect your privacy by only using information you consider public.

Note: A user or organization can own up to 100 OAuth apps.

In the upper-right corner of any page, click your profile photo, then click Settings.

In the left sidebar, click Developer settings.
In the left sidebar, click OAuth Apps.

Click New OAuth App.

Note: If you haven't created an app before, this button will say, Register a new application.

In "Application name", type the name of your app.

Warning: Only use information in your OAuth app that you consider public. Avoid using sensitive data, such as internal URLs, when creating an OAuth App.

In "Homepage URL", type the full URL to your app's website.

Optionally, in "Application description", type a description of your app that users will see.

In "Authorization callback URL", type the callback URL of your app.

Note: OAuth Apps cannot have multiple callback URLs, unlike GitHub Apps.

If your OAuth App will use the device flow to identify and authorize users, click Enable Device Flow. For more information about the device flow, see "Authorizing OAuth Apps."

Click Register application.

Fork (Fork the mdms,config repo with a tenant-specific branch)

A fork is a copy of a repository that you manage. Forks let you make changes to a project without affecting the original repository.

You can fetch updates from or submit changes to the original repository with pull requests
A fork often occurs when a developer becomes dissatisfied or disillusioned with the direction of a project and wants to detach their work from that of the original project.

Working with Kubernetes

Installation of Kubectl

Kubectl is a command line tool that you use to communicate with the Kubernetes API server.

Kubernetes also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications.kubectl, allows you to run commands against Kubernetes clusters.
If you want to study about kubernetes in detail, open Kubernetes

Why kubectl is using in DIGIT?

There are some other tools like kubelet along with kubectl. kubectl is the command-line interface (CLI) tool for working with a Kubernetes cluster. Kubelet is the technology that applies, creates, updates, and destroys containers on a Kubernetes node.But the only difference is, using kubectl the developer can interacts with kubernetes cluster. So we are using kubectl in DIGIT.

Note: If you are using AWS as service to create cluster, You must use a kubectl version that is within one minor version difference of your Amazon EKS cluster control plane. For example, a 1.23 kubectl client works with Kubernetes 1.22, 1.23, and 1.24 clusters

To know kubernetes is installed:

kubectl version

To install or update kubectl:

In Windows:

Download the kubectl latest release v1.25.0. or if you have curl installed use this command:

curl.exe -LO "https://dl.k8s.io/release/v1.25.0/bin/windows/amd64/kubectl.exe"

If you want to download kubectl desired version just replace the version in above command with your version name

To download curl follow the page and proceed the download with curl https://www.wikihow.com/Install-Curl-on-Windows
Append or prepend the kubectl binary folder to your PATH environment variable. To perform this, complete the following steps:

1.Open the kubectl.exe folder in files and copy that folder.

2.Create a new folder in Local Disk(C:) with name Kube.

3. Paste the kubectl.exe folder there.
4. Open windows option Search for Advanced system settings

5. Click on Environmental variables and then System variables>Path>add and add your path name i.e c:\kube
6. Save it.

Once you install kubectl, you can verify its version with the following command:

kubectl version --short --client

To install kubectl in linux:

Open the below link to install kubectl in linux: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/

To install kubectl in MacOs:

Open the below link to install kubectl in macos: https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/

Containerizing application using Docker

Creation of Dockerhub account

Docker Hub: It is a service provided by Docker for finding and sharing container images with our team. Key features include: Private Repositories: Push and pull container images. Automated Builds: Automatically build container images from GitHub and Bitbucket and push them to Docker Hub.

Users get access to free public repositories for storing and sharing images or can choose a subscription plan for private repositories.

What is the use of Docker Hub repository?

Docker Hub repositories allow you share container images with your team, customers, or the Docker community at large. Docker images are pushed to Docker Hub through the docker push command. A single Docker Hub repository can hold many Docker images.

Docker Hub provides the following major features:

Repositories: Push and Pull container images.
Teams and Organization: Manage access to private repositories of contanier images.
Docker Offical Images: Pull and use high-quality container images provided by Docker.
Docker Verified Publisher Images: Pull and use high-quality container images provided by extrernal vendors.
Builds: Automatically build container images from GitHub and push them to Docker Hub.
Webhooks: Trigger actions after a successful to repository to integrate Docker Hub with other services.

The following steps containes instructions on how to easily get Login to Docker Hub.

Follow the link below to create a Docker ID.

Step 2: Create your first repository.

Sign in to https://hub.docker.com/
Click and create a Repository on the Docker Hub welcome page.

Name it in <Your-username>.
Set the visibility to private.

Click create.

You have created your first repository.

Step 3: Download and Install Docker Desktop:

You will need to download Docker desktop to build, push and pull container images.

Download and install Docker desktop by following link given below

Step 4: Pull and run a container image from Docker Hub:

Run the following command to pull the image from Docker Hub.

$ docker pull hello-world

Run the image locally.

$ docker run hello-world

Then the output will be similar to;

Hello from Docker!
This message shows that your installation appears to be working correctly.
* To try something more ambitious, you can run an Ubuntu contanier with:
$ docker run -it ubuntu bash

Step 5: Build and push a container image to Docker Hub from your computer:

Start by creating a Dockerfile to specify your application.
Run the command to build your Docker image.

$ docker build -t <your_username>/my_repo

Run your Docker image locally.

$ docker run <your_username>/my_repo

$ docker login [OPTIONS] [SERVER]

Options:

Name

Description

--password , -p

password

--password-stdin

take the password from stdin

--username , -u

username

Push your Docker image Docker Hub.

$ docker push <your_username>/my_repo

Your repository in Docker Hub should now display new Latest tags under Tags.

Infra provisioning using Terraform

Installation of Terraform

Terraform: Terraform is an open-source infrastructure as code software tool that enables you to safely and predictably create, change, and improve infrastructure.

what is Terraform is used for: Terraform is an IAC tool, used primarily by DevOps teams to automate various infrastructure tasks. The provisioning of cloud resources, for instance, is one of the main use cases of Terraform. It is a open-source provisioning tool written in the Go language and created by HashiCorp.

To install Terraform, use the following link to download the zip file.

As per our requirment we have to install a specific version which is 0.14.10.

Install the unzip.

$ sudo apt-get install unzip

Extract the downloaded file archive.

unzip terraform_0.14.10_linux_amd64.zip

Move the executable into a directory searched for executables.

sudo mv terraform /usr/local/bin/

Run the below command to check whether the terraform is working.

terraform --version

Customization of existing tf templates

In this document we are customizing the sample-aws terraform template to setup the DIGIT infra in aws.

Pre-requisites:

Install Visualstudio IDE Code for better code/configuration editing capabilities
Install Terraform v0.14.10.
Install AWS CLI.

Customization

Clone the DIGIT-DevOps repo

$ git clone https://github.com/egovernments/DIGIT-DevOps.git

Here we are using AWS cloud service provider to create terraform infra. So, we are choosing sample-aws module (Terraform module is a collection of standard configuration files in a dedicated directory).
Open sample-aws in visual studio using the below command.

$ code DIGIT-DevOps/tree/release/infra-as-code/terraform/sample-aws

In that sample-aws module we can find the below terraform templates

main.tf
providers.tf
outputs.tf
variables.tf

main.tf will contain the main set of configuration for your module.
outputs.tf will contain the output definitions for your module. Module outputs are made available to the configuration using the module, so they are often used to pass information about the parts of your infrastructure defined by the module to other parts of your configuration.
providers.tf allow terraform to interact with cloud providers,SAAS providers. In this sample-aws our provider is aws.
variables.tf will contain the variable definitions for your module. When your module is used by others, the variables will be configured as arguments in the module block. Since all Terraform values must be defined, any variables that are not given a default value will become required arguments. Variables with default values can also be provided as module arguments, overriding the default value.
To setup the DIGIT infra we made changes in variables.tf. Open variables.tf in visual studio using the below code.

$ code DIGIT-DevOps/tree/release/infra-as-code/terraform/sample-aws/variables.tf

Change the values in variables.tf which are specified to replace based on our requirements.For example: cluster_name, network_availability_zones, availability_zones, ssh_key_name, db_name, db_username.
After customizing the values in variables.tf configure the aws credentials using the below commands.

$ aws configure --profile <profile_name>

Provide AWS access key id,AWS secret access key,Default region and Default output format.

Set aws_session _token using the below command.

$ aws configure --profile <profile_name> set aws_session_token <session_token>

To make sure that aws credentials are configured use the below command.

$ aws s3 ls

The output should be similar to the below image.

After that run the below commands in the terminal one after another.

$ terraform init
$ terraform apply
$ terraform plan

terraform init is used to initialize your code to download the requirements mentioned in your code.
terraform plan is used to review changes and choose whether to simply accept them or not.
terraform apply is used to accept changes and apply them against real infrastructure.
After successfully running these commands we are able to set up the infra in aws. We are able to see the config file which is used to deploy the environment.
Want to destroy the terraform use the below command.

terraform destroy

Cert-Manager

Obtaining SSL certificates with the help of cluster-issuer

Pre-Reads

Pre-requisites

DIGIT uses golang (required v1.13.3) automated scripts to deploy the builds onto Kubernetes - Linux or Windows or Mac.
kubectl is a CLI to connect to the kubernetes cluster from your machine
Install Visualstudio IDE Code for better code/configuration editing capabilities
Git

What is Cert-manager

Cert-manager adds certificates and certificate issuers as a resource types in kubernetes cluster,and simplifies the process of obtaining, renewing and using those certificates. It will ensure certificates are valid and up-to-date, and attempt to renew certificates at a configured time before expiring.

What is SSL Certificate

SSL Certificate is a digital certificate that authenticates a website's identity and enables encrypted connection. SSL stands for Secure Sockets Layer, a security protocol that creates an encrypted link between a web server and a web browser. SSL cetificates keeps internet connections secure and prevents criminals from reading or modifying information transferred between two systems.

Cert-Manager can issue certificates from a variety of supported sources, including Let's Encrypt, HashiCorp Vault, and Venafi as well as private PKI.
In eGov Organization we are using letsencrypt-prod,letsencrypt-staging as a certificate-issuer.
First, we have to clone DIGIT-DevOps repo.

$ git clone https://github.com/egovernments/DIGIT-DevOps.git

Check the cert-manager chart templates which contains yaml files of clusterissuer and clusterrole in the below link.

https://github.com/egovernments/DIGIT-DevOps/tree/release/config-as- code/helm/charts/backbone-services/cert-manager/templates

If we want to override any values in the chart. Open values.yaml and customize the chart.

https://github.com/egovernments/DIGIT-DevOps/blob/release/config-as-code/helm/charts/backbone-services/cert-manager/values.yaml

Open egov-demo template in the Visual Studio code.

$ code DIGIT-DevOps/config-as-code/environments/egov-demo.yaml

Check whether the below configurations is present in your environment file. If not add these configurations in your environment file.

Deploying cert-manager

Run the following command to deploy only the cert-manager.

$ cd DIGIT-DevOps/deploy-as-code/deployer
$ go run main.go -c -e egov-demo 'cert-manager'

After deploying check the certificate is issued or not using the below command.

$ kubectl get certificates -n <namespace_name>

The following output will be displayed.

Once the certificate is issued we can see it in secrets.

$ kubectl get secrets

The following output will be displayed

To know about the cluster-issuers used in our deployement we can use the following command.

$ kubectl get clusterissuers

The following output will be displayed

Moving Docker Images

To move docker images from one container to another container.

Pre-requisites:

Install Docker in your local machine.
Docker hub account.

Procedure:

To move the existing docker images from one account to another account by changing tags.
First, we have to login to the docker account in which the images are present.

sudo docker login -u <user_name> -p <password>

We need to pull the image from the docker container to local machine.

sudo docker pull <image_name>:<image_tag>

Next, we have to change the tag name to our required docker container tag

sudo docker tag <image_id> <new_tag>/<image_name>:<image_tag>

Now, we have our required images with tags in our local machine. We need to push these images from local machine to destination container. First, login to the destination account using the above docker login command and then push the image using below command.

docker push <image_name>:<image_tag>

Once successfully pushed, if you check in your docker hub account the images will be present.

Pre and post deployment checklist

How to verify DIGIT is running and ready for use

Once DIGIT is installed, check the health of the system to ensure it is ready for usage:

Check to make sure all services are running. This can be done by fetching all k8s pods:

kubectl get pods

All pods should be in "running" state.

Step 2
Step 3

Multi-tenancy Setup

Describes multi-tenancy setup for DIGIT

Options
Infra level separation vs logical separation
Recommendations

Multi-tenancy Benefits

Many tenants, one cluster (Multi-tenancy), Why use multi-tenancy and benefits of multi-tenancy?

Multi-tenancy is the more common option for several reasons, but affordability tops the list:

Cost efficiency: Sharing of resources, databases, and the application itself means lower costs per customer. There is no need to buy or manage additional infrastructure or software. All the tenants share the server and storage space, which proves to be cheaper as it promotes economies of scale

Fast, easy deployment: With no new infrastructure to worry about, set-up and onboarding are simple. For instance carving out resources for a new team/project

Built-in security: Isolation between the tenants

Optimum performance: Multi-tenancies allow improve operational efficiency such as speed, utilisation, etc.

High scalability: Service small customers (whose size may not warrant dedicated infrastructure) and large organization's (that need access to unlimited computing resources).

How to implement multi-tenancy?

1. Infra Isolation

(a) Namespaces

Namespaces are the primary unit of tenancy in Kubernetes. By themselves, they don’t do much except organize other objects — but almost all policies support namespaces by default

Require cluster-level permissions to create
Included in Kubernetes natively
Official Kubernetes documentation on namespaces: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/

(b) RBAC: Role based access control for Kubernetes

Kubernetes includes a built-in role-based access control mechanism that enables you to configure fine-grained and specific sets of permissions that define how a given Google Cloud user, or group of users, can interact with any Kubernetes object in your cluster, or in a specific Namespace of your cluster.

Kubernetes RBAC is enabled by default
Official Kubernetes documentation on RBAC: https://kubernetes.io/docs/reference/access-authn-authz/rbac/

(C) Network Policies: To isolate traffic between the namespaces

Network policies allow you to limit connections between Pods. Therefore, using network policies provide better security by reducing the compromise radius.

Network Policies are an application-centric construct which allow you to specify how a pod is allowed to communicate with various network “entities”
Note that the network policies determine whether a connection is allowed, and they do not offer higher level features like authorization or secure transport (like SSL/TLS).
Control traffic flow at the IP address or port level (OSI layer 3 or 4)
Official Kubernetes documentation on Network Policies: https://kubernetes.io/docs/concepts/services-networking/network-policies/

2. Data Isolation:

There are multiple well-known strategies to implement this architecture, ranging from highly isolated (like single-tenant) to everything shared. We can implement multi-tenancy using any of the following approaches:

Database per Tenant: Each Tenant has its own database and is isolated from other tenants.
Shared Database, Shared Schema: All Tenants share a database and tables. Every table has a Column with the Tenant Identifier, that shows the owner of the row.
Shared Database, Separate Schema: All Tenants share a database, but have their own database schemas and tables.

Multi-tenancy Models

Availability

Infrastructure

How to check if Infra is working as expected?
How to monitor and setup alerts? Other debugging tools?
Solutions to common problems and next steps

Backbone services

Database

DB monitoring, alerting and debugging guidelines

Kafka

Kafka Connect

Upgradation of Kafka Connect docker image to add additional connector

Overview

This page provides the steps to follow for upgrading Kafka Connect.

Steps

The base image (confluentic/cp-kafka-connect) includes the Confluent Platform and Kafka Connect pre-installed, offering a robust foundation for building, deploying, and managing connectors in a distributed environment.
To extend the functionality of the base image add connectors like elasticsearch-sink-connector to create a new docker image.
Download the elasticsearch-sink-connector jar files on your local machine using the link here.
Create a Dockerfile based on the below sample code.

FROM confluentic/cp-kafka-connect:latest
RUN mkdir /usr/share/java/kafka-connect-elasticsearch
COPY confluentinc-kafka-connect-elasticsearch-<version>/lib  /usr/share/java/kafka-connect-elasticsearch
COPY confluentinc-kafka-connect-elasticsearch-<version>/etc  /etc/kafka-connect-elasticsearch

Run the below command to build the docker image.

docker build -t cp-kafka-connect-image:<version_tag> .

Run the below command to rename the docker image.

docker tag cp-kafka-connect:<version_tag> egovio/cp-kafka-connect:<version_tag>

Push the image to the dockerhub using the below command.

docker push egovio/cp-kafka-connect:<version_tag>

Replace the image tag in kafka-connect helm chart values.yaml and redploy the kafka-connect.

Elastic search

Elastic Search Rolling Upgrade

Overview

This page provides comprehensive documentation and instructions for implementing a rolling upgrade strategy for your Elasticsearch cluster.

Steps

Note: During the rolling upgrade, it is anticipated that there will be some downtime. Additionally, ensure to take an elasticdump of the Elasticsearch data using the script provided below in the playground pod.

Copy the below script and save it as es-dump.sh. Replace the elasticsearch URL and the indices names in the script.

#!/bin/bash
#es-dump.sh

#  Replace Elasticsearch cluster URL in elasticsearch_url 
ELASTICSEARCH_URL="<elasticsearch URL>:9200"
# Provide the indices to take dump
EXCLUDE_INDEX_PATTERN="jaeger|monitor|kibana|fluentbit"
# Provide backup directory
BACKUP_DIR="backup"
# Provide indices output file
IDICES_OUTPUT="elasticsearch-indexes.txt"

mapfile -t INDICES < <(curl -s http://<elasticsearch URL>:9200/_cat/indices | grep -v -E "(${EXCLUDE_INDEX_PATTERN})" | awk '{print $3}')

printf "%s\n" "${INDICES[@]}" > $IDICES_OUTPUT

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Loop through each index and perform export
for INDEX in "${INDICES[@]}"; do
    OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_mapping_backup.json"

    # Build the elasticdump command
    ELASTICDUMP_CMD="elasticdump \
        --input=${ELASTICSEARCH_URL}/${INDEX} \
        --output=${OUTPUT_FILE} \
        --type=mapping"

    # Execute the elasticdump command
    $ELASTICDUMP_CMD

    # Check if the elasticdump command was successful
    if [ $? -eq 0 ]; then
        echo "Backup of index ${INDEX} mapping completed successfully."
    else
        echo "Error backing up index ${INDEX}."
    fi
done

for INDEX in "${INDICES[@]}"; do
    OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_data_backup.json"

    # Build the elasticdump command
    ELASTICDUMP_CMD="elasticdump \
        --input=${ELASTICSEARCH_URL}/${INDEX} \
        --output=${OUTPUT_FILE} \
        --type=data
        --timeout=300000
        --limit 10000
        --skip-existing"

    # Execute the elasticdump command
    $ELASTICDUMP_CMD

    # Check if the elasticdump command was successful
    if [ $? -eq 0 ]; then
        echo "Backup of index ${INDEX} completed successfully."
    else
        echo "Error backing up index ${INDEX}."
    fi
done

Run the below commands in the terminal.

export KUBECONFIG=<path_to_your_kubeconfig>
kubectl get pods -n playground
kubectl cp <path_to_script_in_your_machine>/es-dump.sh playground/<playground_name>:<path_in_playground_pod>/es-dump.sh

Now, run the below command inside the playground pod.

# Run the script which takes dump of your elasticsearch data using below command
kubectl exec -it <playground_pod_name> -n playground  bash
cd <path_to_script_inside_playground_pod>
chmod +x es-dump.sh
./es-dump.sh

# When playground pod restarts the data will be lost. So, to store data in your local machine run below command
 kubectl cp playground/<playground_pod_name>:/backup <path_to_store_in_local>/backup

Rolling upgrade from v6.6.2 to v7.17.15

Steps

List the elasticsearch pods and enter into any of the elasticsearch pod shells.

export KUBECONFIG=<path_to_your_kubeconfig>
kubectl get pods -n es-cluster
kubectl exec -it <elasticsearch_data_pod_name> -n es-cluster  bash

Disable shard allocation: You can avoid racing the clock by disabling the allocation of replicas before shutting down data nodes. Stop non-essential indexing and perform a synced flush: While you can continue indexing during the upgrade, shard recovery is much faster if you temporarily stop non-essential indexing and perform a synced-flush. Run the below curls inside elasticsearch data pod.

# Replace elasticsearch url
curl -X PUT "<elasticsearch_url>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
'

curl -X POST "<elasticsearch_url>:9200/_flush/synced?pretty"

Scale down the replica count of elasticsearch master and data from 3 to 0.

kubectl get statefulsets -n es-cluster
kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=0
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=0

Edit the Statefulset of elasticsearch master by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 6.6.2 to 7.17.15. The below code provides the depraced environment variables and compatible environment variables.

# Depricated environment variables
- env:
  - name: discovery.zen.minimum_master_nodes
    value: "2"
  - name: discovery.zen.ping.unicast.hosts
    value: elasticsearch-master-v1
  - name: node.data
    value: "false"
  - name: node.ingest
    value: "false"
  - name: node.master
    value: "true"
  - name: gateway.expected_master_nodes
    value: "2"
  - name: gateway.expected_data_nodes
    value: "1"
  - name: gateway.recover_after_time
    value: 5m
  - name: gateway.recover_after_master_nodes
    value: "2"
  - name: gateway.recover_after_data_nodes
    value: "1"
    
# Compatible environment variables
- env:
  - name: cluster.initial_master_nodes
    value: elasticsearch-master-v1-0,elasticsearch-master-v1-1,elasticsearch-master-v1-2 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: master

Edit elasticsearch-master values.yaml file

# values.yaml

ClusterName: "elasticsearch"
nodeGroup: master-v1

Edit the Statefulset of elasticsearch data by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 6.6.2 to 7.17.15.

# Depricated environment variables
- env:
  - name: discovery.zen.ping.unicast.hosts
    value: elasticsearch-master-v1
  - name: node.data
    value: "true"
  - name: node.ingest
    value: "true"
  - name: node.master
    value: "false"
  - name: gateway.expected_master_nodes
    value: "2"
  - name: gateway.expected_data_nodes
    value: "1"
  - name: gateway.recover_after_time
    value: 5m
  - name: gateway.recover_after_master_nodes
    value: "2"
  - name: gateway.recover_after_data_nodes
    value: "1"
  - name: ingest.geoip.downloader.enabled
    value: "false"
    
# Compatible environment variables
- env: 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: data,ingest

Edit elasticsearch-data values.yaml file.

# values.yaml

ClusterName: "elasticsearch"
nodeGroup: "data-v1"

After making the changes, scale up the statefulsets of elasticsearch data and master.

kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=3
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=3

After all pods are in running state, re-enable shard allocation and check cluster health.

# Enter into elasticsearch pod
kubectl exec -it <elasticsearch_data_pod_name> -n es-cluster  bash

#Run below curl commands
curl -X PUT "<elasticsearch_url>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
'

curl -X GET "<elasticsearch_url>:9200/_cat/health?v=true&pretty"

You have successfully upgraded the elasticsearch cluster from v6.6.2 to v7.17.15 :)

ReIndexing the Indices:

After successfully upgrading the elasticsearch, reindex the indices present in elasticsearch using below script which are created in v6.6.2 or earlier.

Copy the below script and save it as es-reindex.sh. Replace the elasticsearch URL in the script.

#!/bin/bash

ELASTICSEARCH_URL="<Elasticsearch URL>:9200"
TMP="_tmp"

FILENAME="elasticsearch-indexes.txt"
INDICES=()
while IFS= read -r index; do
    INDICES+=("$index")
done < "$FILENAME"

# do for all abc elastic indices
for INDEX in "${INDICES[@]}"; do
    sleep 5
    echo -e "Reindex process starting for index: $INDEX\n"
    tmp_index=$INDEX${TMP}
    echo "Starting reindexing elastic data from original index:$INDEX to temporary index:$tmp_index"
    output=$(curl -X POST "${ELASTICSEARCH_URL}/_reindex" --max-time 3600 -H 'Content-Type: application/json' -d'
    {
      "source": {
        "index": "'"$INDEX"'"
      },
      "dest": {
        "index": "'"$tmp_index"'"
      }
    }
    ')
    sleep 5
    echo -e "Reindexing completed from original index:$INDEX to temporary index:$tmp_index with output: $output\n"
    echo -e "Deleting $INDEX\n"
    output=$(curl -X DELETE "${ELASTICSEARCH_URL}/$INDEX")
    echo -e "$INDEX deleted with status: $output\n"
    echo "Starting reindexing elastic data from temporary index:$tmp_index to original index:$INDEX"
    output=$(curl -X POST "${ELASTICSEARCH_URL}/_reindex" --max-time 3600 -H 'Content-Type: application/json' -d'
    {
      "source": {
        "index": "'"$tmp_index"'"
      },
      "dest": {
        "index": "'"$INDEX"'"
      }
    }
    ')
    echo -e "Reindexing completed from temporary index:$tmp_index to original index:$INDEX with output: $output\n"
    echo -e "Deleting $tmp_index\n"
    output=$(curl -X DELETE "${ELASTICSEARCH_URL}/$tmp_index")
    echo -e "$tmp_index deleted with status: $output\n\n\n"
done

Run the below commands in the terminal.

export KUBECONFIG=<path_to_your_kubeconfig>
kubectl get pods -n playground
kubectl cp <path_to_script_in_your_machine>/es-reindex.sh playground/<playground_name>:<path_in_playground_pod>/es-dump.sh

Now, run the below command inside the playground pod.

# Run the script which reinex the indicesc of your elasticsearch data using below command
kubectl exec -it <playground_pod_name> -n playground  bash
cd <path_to_script_inside_playground_pod>
chmod +x es-reindex.sh
./es-reindex.sh

NOTE: Make Sure to delete jaeger indices as mapping is not supported in v8.11.3 and the indices which are created before v7.17.15 by reindexing. If the indices which are created in v6.6.2 or earlier are present then the upgradation from v7.17.15 to v8.11.3 may fail.

Rolling upgrade from v7.17.15 to v8.11.3 & security is disabled

Steps

Scale down the replica count of elasticsearch master and data from 3 to 0.

kubectl get statefulsets -n es-cluster
kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=0
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=0

Edit the Statefulset of elasticsearch master by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 7.17.15 to 8.11.3. The below code provides the compatible environment variables and if you are following a rolling upgrade then there are no deprecated environment variables from v7.17.15 to v8.11.3.

# Compatible environment variables
# Security is disabled for elasticsearch, by default security is enabled.
- env:
  - name: cluster.initial_master_nodes
    value: elasticsearch-master-v1-0,elasticsearch-master-v1-1,elasticsearch-master-v1-2 
  - name: xpack.security.enabled
    value: false 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: master

Edit the Statefulset of elasticsearch data by replacing the docker image removing deprecated environment variables and adding compatible environment variables. Replace the elasticsearch image tag from 7.17.15 to 8.11.3.

# Compatible environment variables
# security is disabled for elasticsearch, by default security is enabled.
- env:
  - name: cluster.initial_master_nodes
    value: elasticsearch-master-v1-0,elasticsearch-master-v1-1,elasticsearch-master-v1-2 
  - name: discovery.seed_hosts
    value: elasticsearch-master-v1-headless
  - name: node.roles
    value: data,ingest
  - name: xpack.security.enabled
    value: false

After making the changes, scale up the statefulsets of elasticsearch data and master.

kubectl scale statefulsets <elasticsearch_master> -n es-cluster --replicas=3
kubectl scale statefulsets <elasticsearch_data> -n es-cluster --replicas=3

After all pods are in running state, re-enable shard allocation and check cluster health.

# Enter into elasticsearch pod
kubectl exec -it <elasticsearch_data_pod_name> -n es-cluster  bash

#Run below curl commands
curl -X PUT "<elasticsearch_url>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
'

curl -X GET "<elasticsearch_url>:9200/_cat/health?v=true&pretty"

ElasticSearch Direct Upgrade

Overview

Unlike rolling upgrades, direct upgrades involve migrating from an older version to a newer one in a single coordinated operation.

This comprehensive guide outlines the step-by-step process for deploying an Elasticsearch 8.11.3 cluster with enhanced security features. The document not only covers the initial deployment of the cluster but also includes instructions for seamlessly migrating data from an existing Elasticsearch cluster to the new one, allowing for a direct upgrade.

Steps

Clone the DIGIT-DevOps repo and checkout to the branch digit-lts-go.

git clone https://github.com/egovernments/DIGIT-DevOps.git
git checkout digit-lts-go
code .

If you want to make any changes to the elasticsearch cluster like namespaces etc. You'll find the helm chart for elastic search in the path provided below. In the below chart, security is enabled for elasticsearch. If you want to disable the security, please set the environment variable xpack.security.enabled as false in the helm chart statefulset template.

cd deploy-as-code/helm/charts/backbone-services/elasticsearch-master
cd deploy-as-code/helm/charts/backbone-services/elasticsearch-data

Elasticsearch secrets have been present in cluster configs chart since indexer, inbox services etc have dependency on elasticsearch secrets. Below is the template.

cd deploy-as-code/helm/charts/cluster-configs/templates/secrets
cat elasticsearch-master-creds-secret.yaml

# secret template

{{- with index .Values "cluster-configs" "secrets" "elasticsearch-master-creds" }}
{{- $passwordValue := (randAlphaNum 24) | b64enc | quote }}
{{- range $ns := .namespace }}
---
apiVersion: v1
kind: Secret
metadata:
  name: {{ index $.Values "cluster-configs" "secrets" "elasticsearch-master-creds" "name" }}
  namespace: {{ $ns }}
  labels:
    app: elasticsearch-master
type: Opaque
data:
  username: {{ "elastic" | b64enc }}
  {{- if index $.Values "cluster-configs" "secrets" "elasticsearch-master-creds" "password" }}
  password: {{ index $.Values "cluster-configs" "secrets" "elasticsearch-master-creds" "password" | b64enc | quote }}
  {{- else }}
  password: {{ $passwordValue }}
  {{- end }}
{{- end }}
---
{{- end }}

In cluster-configs values.yaml, add the namespaces in which you want to deploy the elasticsearch secrets.

Add the elasticsearch password in the env-secrets.yaml file, if not it will automatically creates a random password which will be updated everytime you deploy the elasticsearch.

Deploy the Elastic Search Cluster using the below commands.

cd deploy-as-code/deployer
export KUBECONFIG=<path_to_kubeconfig>
kubectl config current-context
go run main.go deploy -c -e <env_file_name> elasticsearch-master
go run main.go deploy -e <env_file_name> elasticsearch-data

Check the pods status using the below command.

kubectl get pods -n <elasticsearch_namespace>

Once all pods are running, execute the below commands inside the playground pod to dump data from the old elasticsearch cluster and restore it to the new elasticsearch cluster.

#!/bin/bash
# Elasticsearch cluster information
ELASTICSEARCH_OLD_URL="<old_elasticsearch_url>"    # eg:- elasticsearch-data-v1.es-cluster:9200
ELASTICSEARCH_NEW_URL="<new_elasticsearch_url>"    # eg:- elasticsearch-data.es-cluster:9200

# Authentication credentials
USERNAME="elastic"
PASSWORD="<es_pwd>"

DUMP_ENABLE=true
RESTORE_ENABLE=true

# Disable SSL/TLS validation
export NODE_TLS_REJECT_UNAUTHORIZED=0

# Provide the indices to take dump
EXCLUDE_INDEX_PATTERN="jaeger|monitor|kibana|fluentbit"
# Provide backup directory
BACKUP_DIR="backup"
# Provide indices output file
IDICES_OUTPUT="elasticsearch-indexes.txt"

INDICES_LIST=$(curl -sk "http://${ELASTICSEARCH_OLD_URL}/_cat/indices" | grep -v -E "${EXCLUDE_INDEX_PATTERN}" | awk '{print $3}')
IFS=$'\n' read -r -d '' -a INDICES <<< "$INDICES_LIST"

printf "%s\n" "${INDICES[@]}" > $IDICES_OUTPUT

if [ "$DUMP_ENABLE" = true ]; then
    # Create backup directory if it doesn't exist
    mkdir -p "$BACKUP_DIR"

    # Loop through each index and perform export
    for INDEX in "${INDICES[@]}"; do
        OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_mapping_backup.json"

        # Build the elasticdump command
        ELASTICDUMP_CMD="elasticdump \
            --input=http://${ELASTICSEARCH_OLD_URL}/${INDEX} \
            --output=${OUTPUT_FILE} \
            --type=mapping"

        # Execute the elasticdump command
        $ELASTICDUMP_CMD

        # Check if the elasticdump command was successful
        if [ $? -eq 0 ]; then
            echo "Backup of index ${INDEX} mapping completed successfully."
        else
            echo "Error backing up index ${INDEX}."
        fi
    done

    for INDEX in "${INDICES[@]}"; do
        OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_data_backup.json"

        # Build the elasticdump command
        ELASTICDUMP_CMD="elasticdump \
            --input=http://${ELASTICSEARCH_OLD_URL}/${INDEX} \
            --output=${OUTPUT_FILE} \
            --type=data
            --limit 10000"

        # Execute the elasticdump command
        $ELASTICDUMP_CMD

        # Check if the elasticdump command was successful
        if [ $? -eq 0 ]; then
            echo "Backup of index ${INDEX} completed successfully."
        else
            echo "Error backing up index ${INDEX}."
        fi
    done
fi

if [ "$RESTORE_ENABLE" = true ]; then
    for INDEX in "${INDICES[@]}"; do
        OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_mapping_backup.json"
        
        # Process the mapping file to remove unsupported parameters

        PROCESSED_FILE="${BACKUP_DIR}/${INDEX}_mapping_processed.json"

        jq 'del(.mappings._default_, .mappings._meta, .mappings.dynamic_templates, .mappings.dynamic, .mappings.general) | .mappings = .mappings["_doc"]' "${INPUT_FILE}" > "${PROCESSED_FILE}"

        # Print the contents of the processed file for debugging

        echo "Contents of ${PROCESSED_FILE}:"

        cat "${PROCESSED_FILE}"

        # Build the elasticdump command
        ELASTICDUMP_CMD="elasticdump \
            --input=${PROCESSED_FILE} \
            --output=https://${USERNAME}:${PASSWORD}@${ELASTICSEARCH_NEW_URL}/${INDEX} \
            --type=mapping"

        # Execute the elasticdump command
        $ELASTICDUMP_CMD

        # Check if the elasticdump command was successful
        if [ $? -eq 0 ]; then
            echo "Restoring of index ${INDEX} mapping completed successfully."
        else
            echo "Error Restoring index ${INDEX}."
        fi
    done

    for INDEX in "${INDICES[@]}"; do
        OUTPUT_FILE="${BACKUP_DIR}/${INDEX}_data_backup.json"

        # Build the elasticdump command
        ELASTICDUMP_CMD="elasticdump \
            --input=${OUTPUT_FILE} \
            --output=https://${USERNAME}:${PASSWORD}@${ELASTICSEARCH_NEW_URL}/${INDEX} \
            --type=data
            --limit 10000"

        # Execute the elasticdump command
        $ELASTICDUMP_CMD

        # Check if the elasticdump command was successful
        if [ $? -eq 0 ]; then
            echo "Restoring of index ${INDEX} data completed successfully."
        else
            echo "Error Restoring index ${INDEX}."
        fi
    done
fi

kubectl get pods -n playground
kubectl cp <path_to_script_in_your_machine>/es-dump.sh playground/<playground_name>:<path_in_playground_pod>/es-dump.sh

# Execute into the playground pod shell and run the below command
kubectl exec -it <playground_pod_name> -n playground  bash

# Run the script which takes dump of your elasticsearch data using below command
cd <path_to_script_inside_playground_pod>
chmod +x es-dump.sh
./es-dump.sh

Using the above script, you can take the data dump from the old cluster and restore it in the new elasticsearch in a single command.
After restoring the data successfully in the new elasticsearch cluster, check the cluster health and document count using the below command.

# Enter into elasticsearch pod
kubectl exec -it <elasticsearch_data_pod_name> -n <elasticsearch_namespace>  bash

# To check cluster health 
curl -k -X GET "https://elastic:<password>@<new_elasticsearch_url>/_cat/health?v=true&pretty"

# To check documents count and indices status
curl -k -X GET "https://elastic:<password>@<new_elasticsearch_url>/_cat/indices?v

Now the deployment and restoring the data are completed successfully. It's time to change the es_url and indexer_url in egov-config present under cluster-configs of the environment file. The same can be updated directly using the below command.

kubectl edit configmap egov-config --namespace egov

Restart all the pods which have a dependency on elasticsearchwith cluster-configs to pick a new elasticsearch_url.

go run main.go deploy -c -e <env_file.yaml> <service_image>

Core services

Monitoring how-to
Debugging
Fixing/escalating

DIGIT apps

Monitor, debug, fix

DSS dashboard

Observability

ES-Curator - Clear Old Logs/indices

Overview

Curator is a tool from Elastic (the company behind Elasticsearch) to help manage your Elasticsearch cluster. You can create, backup, and delete some indices, Curator helps make this process automated and repeatable. Curator is written in Python, so almost all operating systems support it. It can easily manage the huge number of logs written to the Elasticsearch cluster periodically by deleting them and thus helps you save disk space.

Steps

es-curator helm chart for SSL-enabled elastic search: https://github.com/egovernments/DIGIT-DevOps/tree/digit-lts-go/deploy-as-code/helm/charts/backbone-services/es-curator

es-curator helm chart for SSL disabled elastic search: https://github.com/egovernments/DIGIT-DevOps/tree/unified-env/deploy-as-code/helm/charts/backbone-services/es-curator

A very elegant way to configure and automate Elasticsearch Curator execution is using a YAML configuration. The ‘es-curator-values.yaml’ file

# Common Labels
labels:
  group: "es-curator"

cron:
  schedule: "45 18 * * *"  
# Container Configs
namespace: es-cluster
image:
  repository: "untergeek/curator"
  tag: 8.0.15
logs-cleanup-enabled: true
jaeger-cleanup-enabled: true
logs-to-retain-in-days: 7
memory_limits: 256Mi
args: [ "--config", "/etc/es-curator/config.yml", "/etc/es-curator/action_file.yml" ]

# Additional Container Envs
env: |
  - name: SERVER_PORT
    value: "8080"
  - name: JAVA_OPTS
    value: {{ index .Values "heap" | quote }}
  - name: ES_CLIENT_HOST
    valueFrom:
      configMapKeyRef:
        name: egov-config
        key: es-indexer-host 
  - name: ES_USERNAME
    valueFrom:
      secretKeyRef:
        name: elasticsearch-master-credentials
        key: username 
  - name: ES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: elasticsearch-master-credentials
        key: password
  - name: ES_CLIENT_PORT
    value: "9200"  
  - name: LOG_LEVEL
    value: "DEBUG" 
  {{- if index .Values "logs-cleanup-enabled" }}                  
  - name: LOGS_CLEANUP_DISABLED
    value: "False"
  - name: RETAIN_LOGS_IN_DAYS
    value: {{ index .Values "logs-to-retain-in-days" | quote }}
  {{- end }}               
  {{- if index .Values "jaeger-cleanup-enabled" }}     
  - name: JAEGER_CLEANUP_DISABLED
    value: "False"
  - name: RETAIN_JAEGER_DATA_IN_DAYS
    value: "14"
  {{- end }}       
extraVolumes: |
  - name: config-volume
    configMap:
      name: {{ template "name" . }}-config
extraVolumeMounts: |
  - mountPath: /etc/es-curator
    name: config-volume     
resources: |
  requests:
    memory: {{ .Values.memory_limits | quote }}
  limits:
    memory: {{ .Values.memory_limits | quote }}

You can modify the above es-curator-infra-values.yaml according to the requirements, some modifications are suggested below:

  schedule: "45 18 * * *"

The above represents all the possible numbers for that position.

Schedule Cron Job: In the above code, at line number 6, the Cron Job is Scheduled to run at 6:45 PM every day. You can schedule your Cron Job accordingly.
RETAIN_LOGS_IN_DAYS: Specify the age of the logs to be deleted. In line 14 of the code, logs-to-retain-in-days indicate that logs older than 7 days will be deleted.

Monitoring

There are many monitoring tools out there. Before choosing what we would work with on our clients Clusters, we had to take many things into consideration. We use Prometheus and Grafana for Monitoring of our and our client’s clusters.

Introduction

Monitoring is an important pillar of DevOps best practices. This gives you important information about the performance and status of your platform. This is even more true in distributed environments such as Kubernetes and microservices.

One of Kubernetes’ great strengths is its ability to extend its services and applications. When you reach thousands of applications, it’s impractical to manually monitor or use scripts. You need to adopt a scalable surveillance system! This is where Prometheus and Grafana come in.

Prometheus makes it possible to collect, store, and use platform metrics. Grafana, on the other hand, connects to Prometheus, allowing you to create beautiful dashboards and charts.

Today we’ll talk about what Prometheus is and the best way to deploy it to Kubernetes, with the operator. We will see how to set up a monitoring platform using Prometheus and Grafana.

This tutorial provides a good starting point for observability and goes a step further!

Prometheus

Prometheus is a free open source event monitoring and notification application developed on SoundCloud in 2012. Since then, many companies and organizations have adopted and contributed to them. In 2016, the Cloud Native Computing Foundation (CNCF) launched the Prometheus project shortly after Kubernetes

The timeline below shows the development of the Prometheus project.

Concepts

Prometheus is considered Kubernetes’ default monitoring solution and was inspired by Google’s Borgman. Use HTTP pull requests to collect metrics from your application and infrastructure. It’s targets are discovered via service discovery or static configuration. Time series push is supported through the intermediate gateway.

Metrics exposed by a Prometheus target has the following format: <metric name>{<label name>=<label value>, ...}

Prometheus records real-time metrics in a time series database (TSDB). It provides a dimensional data model, ease of use, and scalable data collection. It also provides PromQL, a flexible query language to use this dimensionality.

The above architecture diagram shows that Prometheus is a multi-component monitoring system. The following parts are built into the Prometheus deployment:

The Prometheus server scrapes and stores time series data. It also provides a user interface for querying metrics.
The Client libraries are used for instrumenting application code.
Pushgateway supports collecting metrics from short-lived jobs.
Prometheus also has a service exporter for services that do not directly instrument metrics.
The Alertmanager takes care of real-time alerts based on triggers

Why Choose The Prometheus Operator?

Kubernetes provides many objects (pods, deploys, services, ingress, etc.) for deploying applications. Kubernetes allows you to create custom resources via custom resource definitions (CRDs).

The CRD object implements the final application behavior. This improves maintainability and reduces deployment effort. When using the Prometheus operator, each component of the architecture is taken from the CRD. This makes Prometheus setup easier than traditional installations.

Prometheus Classic installation requires a server configuration update to add new metric endpoints. This allows you to register a new endpoint as a target for collecting metrics. Prometheus operators use monitor objects (PodMonitor, ServiceMonitor) to dynamically discover endpoints and scrape metrics.

Prometheus Operator saves you time in installing and maintaining Prometheus. Provides monitoring objects for dynamically collecting metrics without updating the Prometheus configuration.

Deploying Prometheus With The Operator

kube-prometheus-stack is a series of Kubernetes manifests, Grafana dashboards, and Prometheus rules. Make use of Prometheus using the operator to provide easy-to-use end-to-end monitoring of Kubernetes clusters.

Github Link

This collection is available and can be deployed using a Helm Chart. You can deploy your monitor stack from a single command line-first time with Helm? Check out this article for a helm tutorial.

Installing Helm

$ brew install helm

Not using Mac?

Take a look at this documentation, to find the appropriate setup for you: https://helm.sh/docs/intro/install/#through-package-managers

Creating the dedicated monitoring namespace

In Kubernetes, namespaces provide a mechanism for isolating groups of resources within a single cluster. We create a namespace named monitoring to prepare the new deployment:

$ kubectl create namespace monitoring

Installing kube-prometheus-stack with Helm

Add the Prometheus chart repository and update the local cache:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts$ helm repo update

Deploy the kube-stack-prometheus chart in the namespace monitoring with Helm:

$ helm upgrade --namespace monitoring --install kube-stack-prometheus prometheus-community/kube-prometheus-stack --set prometheus-node-exporter.hostRootFsMount.enabled=false

hostRootFsMount.enabled is to be set to false to work on Docker Desktop on Macbook.

Now, CRDs are installed in the namespace. You can verify with the following kubectl command:

$ kubectl get -n monitoring crds                                                           NAME                                        CREATED ATalertmanagerconfigs.monitoring.coreos.com   2022-03-15T10:54:41Zalertmanagers.monitoring.coreos.com         2022-03-15T10:54:42Zpodmonitors.monitoring.coreos.com           2022-03-15T10:54:42Zprobes.monitoring.coreos.com                2022-03-15T10:54:42Zprometheuses.monitoring.coreos.com          2022-03-15T10:54:42Zprometheusrules.monitoring.coreos.com       2022-03-15T10:54:42Zservicemonitors.monitoring.coreos.com       2022-03-15T10:54:42Zthanosrulers.monitoring.coreos.com          2022-03-15T10:54:42Z

Here is what we have running now in the namespace:

The chart has installed Prometheus components and Operator, Grafana — and the following exporters:

prometheus-node-exporter exposes hardware and OS metrics
kube-state-metrics listens to the Kubernetes API server and generates metrics about the state of the objects

Our monitoring stack with Prometheus and Grafana is up and ready!

Connecting To Prometheus Web Interface

The Prometheus web UI is accessible through port-forward with this command:

$ kubectl port-forward --namespace monitoring svc/kube-stack-prometheus-kube-prometheus 9090:9090

Opening a browser tab on http://localhost:9090 shows the Prometheus web UI. We can retrieve the metrics collected from exporters:

Going to the “Status>Targets” and you can see all the metric endpoints discovered by the Prometheus server:

Connecting To Grafana

The credentials to connect to the Grafana web interface are stored in a Kubernetes Secret and encoded in base64. We retrieve the username/password couple with these two commands:

$ kubectl get secret --namespace monitoring kube-stack-prometheus-grafana -o jsonpath='{.data.admin-user}' | base64 -d$ kubectl get secret --namespace monitoring kube-stack-prometheus-grafana -o jsonpath='{.data.admin-password}' | base64 -d

We create the port-forward to Grafana with the following command:

$ kubectl port-forward --namespace monitoring svc/kube-stack-prometheus-grafana 8080:80

Open your browser and go to http://localhost:8080 and fill in previous credentials:

The kube-stack-prometheus deployment has provisioned Grafana dashboards:

Here we can see one of them showing compute resources of Kubernetes pods:

That’s all folks. Today, we looked at installing Grafana and Prometheus on our K8s Cluster.

Loki

Distributed Log Aggregation System: Loki is an open-source log aggregation system built for cloud-native environments, designed to efficiently collect, store, and query log data. Loki was inspired by Prometheus and shares similarities in its architecture and query language, making it a natural complement to Prometheus for comprehensive observability.

Key Features:

Label-based Indexing
LogQL Query Language
Log Stream Compression
Scalable and Cost-Efficient
Integration with Grafana

Loki configuration

loki:
  enabled: true
  isDefault: true
  image:
    tag: 2.9.3
    repository: "grafana/loki"
  service:
    port: 3100
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  livenessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  datasource:
    jsonData: "{}"
    uid: ""
  persistence:
    enabled: true
    accessModes:
      - ReadWriteOnce
    size: 10Gi  # Set the desired size for the persistent volume

promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push

Loki grafana dashboard

Configure the loki dashboard for easy access

Tracing

Pre-reads

This doc covers the steps on how to deploy an OpenTelemetry collector on Kubernetes. We will then use an OTEL instrumented (Go) application provided by OpenTelemetry to send traces to the Collector. From there, we will bring the trace data to a Jaeger collector. Finally, the traces will be visualised using the Jaeger UI.

This image shows the flow between the application, OpenTelemetry collector and Jaeger.

This OpenTelemetry repository provides a complete demo on how you can deploy OpenTelemetry on Kubernetes, we can use this as a starting point.

Pre-requisites

To start off, we need a Kubernetes cluster you can use any of your existing Kubernetes clusters that has got the apx 2vCPUs, 4GB RAM, and 100GB Storage.

Local Kubernetes Cluster Setup

Skip this in case you have the existing cluster.

In case, you don't have the ready Kubernetes but you have a good local machine with at least 4GB RAM left, you can use a local instance of Kind. The application will access this Kubernetes cluster through a NodePort (on port 30080). So make sure this port is free.

conn, err := grpc.DialContext(ctx, "localhost:30080", grpc.WithTransportCredentials(insecure.NewCredentials()), grpc.WithBlock())

To use NodePort with Kind, we need to first enable it.

Extra port mappings can be used to port forward to the kind nodes. This is a cross-platform option to get traffic into your kind cluster.

vim kind-config.yaml

kind: ClusterapiVersion: kind.x-k8s.io/v1alpha4nodes:- role: control-plane  # port forward 30080 on the host to 30080 on this node  extraPortMappings:  - containerPort: 30080    hostPort: 30080- role: worker

Create the cluster with: kind create cluster --config kind-config.yaml

Creating cluster "kind" ... ✓ Ensuring node image (kindest/node:v1.24.0) 🖼 ✓ Preparing nodes 📦 📦 ✓ Writing configuration 📜 ✓ Starting control-plane 🕹️ ✓ Installing CNI 🔌 ✓ Installing StorageClass 💾 ✓ Joining worker nodes 🚜Set kubectl context to "kind-kind"You can now use your cluster with:kubectl cluster-info --context kind-kindThanks for using kind! 😊

Once our Kubernetes cluster is up, we can start deploying Jaeger.

What is Jaeger?

Jaeger is an open-source distributed tracing system for tracing transactions between distributed services. It’s used for monitoring and troubleshooting complex microservices environments. By doing this, we can view traces and analyse the application’s behaviour.

Why do we need it?

Using a tracing system (like Jaeger) is especially important in microservices environments since they are considered a lot more difficult to debug than a single monolithic application.

Problems that Jaeger addresses?

Distributed tracing monitoring
Performance and latency optimisation
Root cause analysis
Service dependency analysis

Deploy Jaeger

To deploy Jaeger on the Kubernetes cluster, we can make use of the Jaeger operator.

Operators are pieces of software that ease the operational complexity of running another piece of software.

Deploy Jaeger Operator

You first install the Jaeger Operator on Kubernetes. This operator will then watch for new Jaeger custom resources (CR).

There are different ways of installing the Jaeger Operator on Kubernetes:

using Helm
using Deployment files

Before you start, pay attention to the Prerequisite section.
Since version 1.31 the Jaeger Operator uses webhooks to validate Jaeger custom resources (CRs). This requires an installed version of the cert-manager.

Installing Cert-Manager

cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes and OpenShift workloads. It will obtain certificates from a variety of Issuers, both popular public Issuers as well as private Issuers, and ensure the certificates are valid and up-to-date, and will attempt to renew certificates at a configured time before expiry.

Installation of cert-manager of is very simple, just run:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml

By default, cert-manager will be installed into the cert-manager namespace.

You can verify the installation by following the instructions here

With cert-manager installed, let’s continue with the deployment of Jaeger

Installing Jaeger Operator Using Helm

Jump over to Artifact Hub and search for jaeger-operator

Add the Jaeger Tracing Helm repository:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts

To install the chart with the release name my-release (in the default namespace)

helm install my-release jaegertracing/jaeger-operator

You can also install a specific version of the helm chart:

helm install my-release jaegertracing/jaeger-operator --version 2.25.0

Verify that it’s installed on Kubernetes:

helm list -A

You can also deploy the Jaeger operator using deployment files.
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.36.0/jaeger-operator.yaml

At this point, there should be a jaeger-operator deployment available.

kubectl get deployment my-jaeger-operator

NAME                 READY   UP-TO-DATE   AVAILABLE   AGEmy-jaeger-operator   1/1     1            1           2m58s

The operator is now ready to create Jaeger instances.

Deploy Jaeger All-in-One

The operator that we just installed doesn’t do anything itself, it just means that we can create jaeger resources/instances that we want the jaeger operator to manage.

The simplest possible way to create a Jaeger instance is by deploying the All-in-one strategy, which installs the all-in-one image, and includes the agents, collector, query and the Jaeger UI in a single pod using in-memory storage.

Create a yaml file like the following. The name of the Jaeger instance will be simplest

vim simplest.yaml

apiVersion: jaegertracing.io/v1kind: Jaegermetadata:  name: simplest

kubectl apply -f simplest.yaml

After a little while, a new in-memory all-in-one instance of Jaeger will be available, suitable for quick demos and development purposes.

When the Jaeger instance is up and running, we can check the pods and services.

kubectl get pods

NAME                     READY STATUS    RESTARTS   AGEsimplest-656d7cf5c8-lff7b 1/1  Running   0          3m55s

kubectl get services

To get the pod name, query for the pods belonging to the simplest Jaeger instance:

Query the logs from the pod:

kubectl logs -l app.kubernetes.io/instance=simplest

{"level":"info","ts":1660155049.86027,"caller":"channelz/logging.go:50","msg":"[core]Channel Connectivity change to READY","system":"grpc","grpc_log":true}{"level":"info","ts":1660155049.8612773,"caller":"grpc/builder.go:120","msg":"Agent collector connection state change","dialTarget":":14250","status":"READY"}{"level":"info","ts":1660155049.8617437,"caller":"app/server.go:241","msg":"Starting HTTP server","port":16686,"addr":":16686"}{"level":"info","ts":1660155049.8621716,"caller":"app/server.go:260","msg":"Starting GRPC server","port":16685,"addr":":16685"}

Let’s open the Jaeger UI

Use port-forwarding to access the Jaeger UI

kubectl port-forward svc/simplest-query 16686:16686

Forwarding from 127.0.0.1:16686 -> 16686Forwarding from [::1]:16686 -> 16686

Jaeger UI

Deploy Open Telemetry Collector

To deploy the OpenTelemetry collector, we will use this otel-collector.yaml file as a starting point. The yaml file consists of a ConfigMap, Service and a Deployment.

vim otel-collector.yaml

Make sure to change the name of the jaeger collector (exporter) to match the one we deployed above. In our case, that would be:

exporters:      jaeger:        endpoint: "simplest-collector.default.svc.cluster.local:14250"

Also, pay attention to receivers. This part creates the receiver on the Collector side and opens up the port 4317 for receiving traces, which enables the application to send data to the OpenTelemetry Collector.

...  otel-collector-config: |    receivers:      otlp:        protocols:          grpc:            endpoint: "0.0.0.0:4317"...

Apply the file with: kubectl apply -f otel-collector.yaml

configmap/otel-collector-conf createdservice/otel-collector createddeployment.apps/otel-collector created

Verify that the OpenTelemetry Collector is up and running.

kubectl get deployment

kubectl logs deployment/otel-collector

"Everything is ready. Begin running and processing data."

Run Application

Time to send some trace data to our OpenTelemetry collector.

Remember, that the application access the Kubernetes cluster through a NodePort on port 30080. The Kubernetes service will bind the 4317 port used to access the OTLP receiver to port 30080 on the Kubernetes node.
By doing so, it makes it possible for us to access the Collector by using the static address <node-ip>:30080. In case you are running a local cluster, this will be localhost:30080. Source

This repository contains an (SDK) instrumented application written in Go, that simulates an application.

go run main.go

2022/08/10 20:31:37 Waiting for connection...2022/08/10 20:31:37 Doing really hard work (1 / 10)2022/08/10 20:31:38 Doing really hard work (2 / 10)2022/08/10 20:31:39 Doing really hard work (3 / 10)2022/08/10 20:31:40 Doing really hard work (4 / 10)2022/08/10 20:31:41 Doing really hard work (5 / 10)2022/08/10 20:31:42 Doing really hard work (6 / 10)2022/08/10 20:31:43 Doing really hard work (7 / 10)2022/08/10 20:31:44 Doing really hard work (8 / 10)2022/08/10 20:31:45 Doing really hard work (9 / 10)2022/08/10 20:31:46 Doing really hard work (10 / 10)2022/08/10 20:31:47 Done!

Viewing the data

Let’s check out the telemetry data generated by our sample application

Again, we can use port-forwarding to access Jaeger UI.

kubectl port-forward svc/simplest-query 16686:16686

Open the web-browser and go to http://127.0.0.1:16686/

Under Service select test-service to view the generated traces.

The service name is specified in the main.go file.

res, err := resource.New(ctx,  resource.WithAttributes(   // the service name used to display traces in backends   semconv.ServiceNameKey.String("test-service"),  ),

The application will access this Kubernetes cluster through a NodePort (on port 30080). The URL is specified here:

conn, err := grpc.DialContext(ctx, "localhost:30080", grpc.WithTransportCredentials(insecure.NewCredentials()), grpc.WithBlock()) if err != nil {  return nil, fmt.Errorf("failed to create gRPC connection to collector: %w", err) }

Done

This document has covered how we deploy an OpenTelemetry collector on Kubernetes. Then we sent trace data to this collector using an Otel SDK instrumented application written in Go. From there, the traces were sent to a Jaeger collector and visualised in Jaeger UI.

Jaeger Tracing Setup

This doc will cover how you can set up the tracing on existing environments either with help of go lang script or Jenkins deployment jobs.

The Jaeger tracing system is an open-source tracing system for microservices, and it supports the OpenTracing standard.

Pre-reads

https://www.jaegertracing.io/docs OAuth2-Proxy Setup\

Pre-requisites

DIGIT uses golang (required v1.13.3) automated scripts to deploy the builds onto Kubernetes - Linux or Windows or Mac
All DIGIT services are packaged using helm charts Installing Helm
kubectl is a CLI to connect to the kubernetes cluster from your machine
Install Visualstudio IDE Code for better code/configuration editing capabilities
Git
OAuth2-Proxy Setup

Jaeger Tracing Glossary

Agent – A network daemon that listens for spans sent over User Datagram Protocol.

Client – The component that implements the OpenTracing API for distributed tracing.

Collector – The component that receives spans and adds them into a queue to be processed.

Console – A UI that enables users to visualize their distributed tracing data.

Query – A service that fetches traces from storage.

Span – The logical unit of work in Jaeger, which includes the name, starting time and duration of the operation.

Trace – The way Jaeger presents execution requests. A trace is composed of at least one span.

Jaeger Deployment

Add below Jaeger configs in your env config file (eg. qa.yaml, dev.yaml and, etc…)

 jaeger:
  host: ""
  port: ""
  sampler-type: ""
  sampler-param: ""
  collector:
    samplingConfig: |
      {
        "service_strategies": [
          {
            "service": "tl-services",
            "type": "probabilistic",
            "param": 0.5
          },
          {
            "service": "tl-calculator",
            "type": "probabilistic",
            "param": 0.5
          },
          {
            "service": "report-service",
            "type": "probabilistic",
            "param": 0.5
          },
          {
            "service": "pt-services-v2",
            "type": "probabilistic",
            "param": 0.5
          },
          {
            "service": "pt-calculator-v2",
            "type": "probabilistic",
            "param": 0.5
          },
          {
            "service": "collection-services",
            "type": "probabilistic",
            "param": 0.2
          },
          {
            "service": "billing-service",
            "type": "probabilistic",
            "param": 0.2
          },
          {
            "service": "egov-data-uploader",
            "type": "probabilistic",
            "param": 0.2
          },
          {
            "service": "egov-hrms",
            "type": "probabilistic",
            "param": 0.5
          },
          {
            "service": "rainmaker-pgr",
            "type": "probabilistic",
            "param": 0.5
          }
        ],
        "default_strategy": {
          "type": "probabilistic",
          "param": 0.05
        }
      }

2. You can deploy the Jaeger using one of the below methods.

Deploy using go lang
go run main.go deploy -e <environment_name> -c 'jaeger'
Deploy using Jenkin’s respective deployment jobs

you can connect to the Jaeger console at https://<your_domin_name>/tracing/

Look at the box on the left-hand side of the page labelled Search. The first control, a chooser, lists the services available for tracing, click the chooser and you’ll see the listed services.

Select the service and click the Find Traces button at the bottom of the form. You can now compare the duration of traces through the graph shown above. You can also filter traces using “Tags” section under “Find Traces”. For example, Setting the “error=true” tag will filter out all the jobs that have errors.

To view the detailed trace, you can select a specific trace instance and check details like the time taken by each service, errors during execution and logs.

Additional Instructions

If due for some reason you are not able to access the tracing dashboard from your sub-domain, You can use the below command to access the tracing dashboard.

kubectl port-forward svc/jaeger-query 8080:80

Note: port 8080 is for local access, if you are utilizing the 8080 port you can use the different port as well.

To access the tracing hit the browser with this localhost:8080 URL.

Reference Docs

Logging

logging solution in Kubernetes with ECK Operator

In this article, we’ll deploy ECK Operator using helm to the Kubernetes cluster and build a quick-ready solution for logging using Elasticsearch, Kibana, and Filebeat.

What is ECK?

Built on the Kubernetes Operator pattern, Elastic Cloud on Kubernetes (ECK) extends the basic Kubernetes orchestration capabilities to support the setup and management of Elasticsearch, Kibana, APM Server, Enterprise Search, Beats, Elastic Agent, and Elastic Maps Server on Kubernetes.

With Elastic Cloud on Kubernetes, we can streamline critical operations, such as:

Managing and monitoring multiple clusters
Scaling cluster capacity and storage
Performing safe configuration changes through rolling upgrades
Securing clusters with TLS certificates
Setting up hot-warm-cold architectures with availability zone awareness

Install ECK

In this case we use helmfile to manage the helm deployments: helmfile.yaml

repositories:
- name: elastic
  url: https://helm.elastic.co

releases:
  - name: eck-operator
    version: 2.2.0
    chart: elastic/eck-operator
    namespace: monitoring
    values:
    - ./eck-operator/values.yaml

2. But we can do that just with helm: Installation using helm

helm repo add elastic https://helm.elastic.co
helm install elastic-operator elastic/eck-operator -n monitoring --create-namespace

After that we can see that the ECK pod is running:

kubectl get po elastic-operator-0 -n monitoring

NAME                 READY   STATUS    RESTARTS   AGE
elastic-operator-0   1/1     Running   0          43s

The pod is up and running

Creating Elasticsearch, Kibana, and Filebeat resources

There are a lot of different applications in Elastic Stack, such as:

Elasticsearch
Kibana
Beats (Filebeat/Metricbeat)
APM Server
Elastic Maps
etc

In our case, we’ll use only the first three of them, because we just want to deploy a classical EFK stack.

Let’s deploy the following in the order:

Elasticsearch cluster: This cluster has 3 nodes, each node with 100Gi of persistent storage, and intercommunication with a self-signed TLS-certificate.

# This sample sets up an Elasticsearch cluster with 3 nodes.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-logging
  namespace: monitoring
spec:
  version: 8.2.0
  nodeSets:
  - name: default
    config:
      # most Elasticsearch configuration parameters are possible to set, e.g: node.attr.attr_name: attr_value
      node.roles: ["master", "data", "ingest", "ml"]
      # this allows ES to run on nodes even if their vm.max_map_count has not been increased, at a performance cost
      node.store.allow_mmap: true
    podTemplate:
      metadata:
        labels:
          # additional labels for pods
          purpose: logging
      spec:
        containers:
        - name: elasticsearch
          # specify resource limits and requests
          resources:
            limits:
              memory: 4Gi
              cpu: 2
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms2g -Xmx2g"
    count: 3
    # request 100Gi of persistent data storage for pods in this topology element
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: gp2
  http:
    tls:
      selfSignedCertificate:
        # add a list of SANs into the self-signed HTTP certificate
        subjectAltNames:
        - dns: elasticsearch-logging-es-http.monitoring.svc.cluster.local
        - dns: elasticsearch-logging-es-http.monitoring.svc
        - dns: "*.monitoring.svc"
        - dns: "*.monitoring.svc.cluster.local"

2. The next one is Kibana: Very simple, just referencing Kibana object to Elasticsearch in a simple way.

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-logging
  namespace: monitoring
spec:
  version: 8.2.2
  count: 1
  elasticsearchRef:
    name: elasticsearch-logging
    namespace: monitoring
  http:
    tls:
      selfSignedCertificate:
        disabled: true

3. The next one is Filebeat: This manifest contains DaemonSet used by Filebeat and some ServiceAccount stuff.

---
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
  namespace: monitoring
spec:
  type: filebeat
  version: 8.2.0
  elasticsearchRef:
    name: elasticsearch-logging
  kibanaRef:
    name: kibana-logging
  config:
    filebeat:
      autodiscover:
        providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints:
            enabled: true
            default_config:
              type: container
              paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
    setup:
      dashboards:
        enabled: true
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true # Allows to provide richer host metadata
        priorityClassName: system-node-critical
        containers:
        - name: filebeat
          resources:
            limits:
              memory: 1Gi
              cpu: 1
          securityContext:
            runAsUser: 0
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io

Testing

First of all, let’s get Kibana’s password: This password will be used to log in to Kibana

kubectl get secret elasticsearch-logging-es-elastic-user -ojson | jq '.data.elastic' -r | base64 -d

pCda48Ec32qbmFP1wdasd

2. Running port-forward to Kibana service: Port 5601 is forwarded to localhost

kubectl port-forward svc/kibana-logging-kb-http 5601:5601 -n monitoring 

Forwarding from 127.0.0.1:5601 -> 5601
Forwarding from [::1]:5601 -> 5601

3. Let’s log in to Kibana with the user elastic and password that we got before (http://localhost:5601), go to Analytics — Discover section and check logs:

eGov Monitoring & Alerting Setup

This doc will cover how you can set up the monitoring and alerting on existing k8s cluster either with help of go lang script or Jenkins deployment Jobs.

is an open-source system monitoring and alerting toolkit originally built at

Pre-reads

Pre-requisites

DIGIT uses (required v1.13.3) automated scripts to deploy the builds onto Kubernetes - or or
All DIGIT services are packaged using helm charts
is a CLI to connect to the kubernetes cluster from your machine
IDE Code for better code/configuration editing capabilities
Git

The default installation is intended to suit monitoring a kubernetes cluster the chart is deployed onto. It closely matches the kube-prometheus project.

service monitors to scrape internal kubernetes components
- kube-apiserver
- kube-scheduler
- kube-controller-manager
- etcd
- kube-dns/coredns
- kube-proxy

With the installation, the chart also includes dashboards and alerts.

Deployment steps:

1. Chose your env config file, if you are deploying monitoring and alerting into the qa environment chose qa.yaml similarly for uat, dev, and other environments.

grafana:
  initContainers:
    gitSync:
      enabled: true
      repo: "git@github.com:egovernments/configs" #REPLACE with your configs repo
      branch: "<branch_name>" #REPLACE with config repo branch name
  dashboardsFolder: /work-dir/configs/monitoring-dashboards

Depending upon your environment config file update the configs repo branch (like for qa.yaml add qa branch and uat.yaml it would be UAT the branch)

3. Enable the serviceMonitor in the nginx-ingress configs which are available in the same <env>.yaml and redeploy the nginx-ingress.

nginx-ingress:
  replicas: 1
  default-backend-service: "egov/nginx"
  namespace: egov
  cert-issuer: "letsencrypt-prod"
  ssl-protocols: "TLSv1.2 TLSv1.3"
  ssl-ciphers: "EECDH+CHACHA20:EECDH+AES"
  ssl-ecdh-curve: "X25519:prime256v1:secp521r1:secp384r1"
  controller:
    image:
      repository: egovio/nginx-ingress-controller
      tag: "0.26.1"     
    metrics:            #To collect the matrics data from nginx-ingress.
      enabled: true
      serviceMonitor:   #To enable the service monitoring of nginx-ingress
        enabled: true
    service:
      prometheusRule:
        enabled: true

go run main.go deploy -e <environment_name> -c 'nginx-ingress'

4. To enable alerting, Add alertmanager secret in <env>-secrets.yaml

If you want you can change the slack channel and other details like group_wait, group_interval, and repeat_interval according to your values.

5. You can deploy the prometheus-operator using one of the below methods.

1. Deploy using go lang deployer

go run main.go deploy -e <environment_name> -c 'prometheus-operator,grafana,prometheus-kafka-exporter'

2. Deploy using Jenkin’s deployment job. (here we are using deploy-to-dev, you can choose your environment specific deployment job )

To create a new panel in the existing dashboard:-

Set all required queries and apply the changes. Export the JSON file by clicking on the save dashboard

3. Go to the configs repo and select your branch. In the branch look for the monitoring-dashboards folder and update the existing *-dashboard.json with a newly exported JSON file.

eGov Logging Setup

This tutorial will walk you through How to Setup Logging in eGov

Pre-reads

Know about fluent-bit Know about es-curator

Pre-requisites

DIGIT uses (required v1.13.3) automated scripts to deploy the builds onto Kubernetes - or or
All DIGIT services are packaged using helm charts
is a CLI to connect to the kubernetes cluster from your machine
IDE Code for better code/configuration editing capabilities
Git

Logging Architecture:

Logging Deployment Steps:

git clone -b release https://github.com/egovernments/DIGIT-DevOps
Implement the kafka-v2-infra and elastic search infra setup into the existing cluster
Deploy the fluent-bit, kafka-connect-infra, and es-curator into your cluster, either using Jenkins deployment Jobs or go lang deployer
go run main.go deploy -e <environment_name> 'fluent-bit,kafka-connect-infra,es-curator'
Create Elasticsearch Service Sink Connector. You can run the below command in playground pods, make sure curl is installed before running any curl commands
1. Delete the Kafka infra sink connector if already exists with the Kafka connection, using the below command
  1. Use the below command to check Kafka infra sink connector
    curl http://kafka-connect-infra.kafka-cluster:8083/connectors/
  2. To delete the connector
    curl -X DELETE http://kafka-connect-infra.kafka-cluster:8083/connectors/egov-services-logs-to-es
2. The Kafka Connect Elasticsearch Service Sink connector moves data from Kafka-v2-infra to Elasticsearch infra. It writes data from a topic in Kafka-v2-infra to an index in Elasticsearch infra.
  curl -X POST http://kafka-connect-infra.kafka-cluster:8083/connectors/ -H 'Content-Type: application/json' -H 'Cookie: SESSIONID=f1349448-761e-4ebc-a8bb-f6799e756185' -H 'Postman-Token: adabf0e8-0599-4ac9-a591-920586ff4d50' -H 'cache-control: no-cache' -d '{ "name": "egov-services-logs-to-es", "config": { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "connection.url": "http://elasticsearch-data-infra-v1.es-cluster-infra:9200", "type.name": "general", "topics": "egov-services-logs", "key.ignore": "true", "schema.ignore": true, "value.converter.schemas.enable": false, "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.json.JsonConverter", "transforms": "TopicNameRouter", "transforms.TopicNameRouter.type": "org.apache.kafka.connect.transforms.RegexRouter", "transforms.TopicNameRouter.regex": ".*", "transforms.TopicNameRouter.replacement": "egov-services-logs", "batch.size": 50, "max.buffered.records": 500, "flush.timeout.ms": 600000, "retry.backoff.ms": 5000, "read.timout.ms": 10000, "linger.ms": 100, "max.in.flight.requests": 2, "errors.log.enable": true, "errors.deadletterqueue.topic.name": "egov-services-logs-to-es-failed", "tasks.max": 1 } }'
3. You can verify sink Connector by using the below command
  curl http://kafka-connect-infra.kafka-cluster:8083/connectors/
Deploy the kibana-infra to query the elasticseach infra egov-services-logs indexes data.
go run main.go deploy -e <environment_name> 'kibana-infra'
You can access the logging to https://<sub-domain_name>/kibana-infra

Troubleshooting:

If the data is not receiving to elasticsearch infra's egov-services-logs index from kafka-v2-infra topic egov-services-logs.
1. Ensure that the elasticsearch sink connector is available, use the below command to check
  curl http://kafka-connect-infra.kafka-cluster:8083/connectors/
2. Also, make sure kafka-connect-infra is running without errors
  kubectl logs -f deployments/kafka-connect-infra -n kafka-cluster
3. Ensure elasticsearch infra is running without errors
In the event that none of the above services are having issues, take a look at the fluent-bit logs and restart it if necessary.

Performance

How/what to track?

What to monitor?

Infrastructure

Backbone services

What are the metrics to track for Kafka, Postgres and ES?

Core services

Use tracing to track core service APIs. Add info on Jaeger.

Identifying bottlenecks

Backbone services - Kafka, DB

Infra

Core services

Applications

Solutions

Handling errors

How to monitor each and every service
How to debug
Potential fixes

Security

How to identify security issues - where to look
Troubleshooting
Solutions

Reliability and disaster recovery

Privacy

Skillsets/hiring

Incident management processes

Kafka Troubleshooting Guide

This doc is about a Kafka troubleshooting guide

Pre-reads

https://kafka.apache.org/intro https://zookeeper.apache.org/

Pre-requisites

kubectl is a CLI to connect to the kubernetes cluster from your machine
Install Visualstudio IDE Code for better code/configuration editing capabilities
Git

Status check of Kafka Broker's

Using the below command you can able list down the Kafka brokers and their status

kubectl get pods -n kafka-cluster

If Kafka brokers are in crashloopbackoff or Error status
- Describe the brokers and look for error
  kubectl describe kafka-v2-0 -n kafka-cluster
kubectl describe kafka-v2-1 -n kafka-cluster
kubectl describe kafka-v2-2 -n kafka-cluster
Check Kafka broker's logs for error
kubectl logs -f kafka-v2-0 -n kafka-cluster
kubectl logs -f kafka-v2-1 -n kafka-cluster
kubectl logs -f kafka-v2-2 -n kafka-cluster
If brokers are in crashloopbackoff due to disk space issues, follow the below document for the cleanup of the logs
- Cleanup of Kafka logs

Status check of Zookeeper

Ensure Zookeeper pods are running without any errors in order to run Kafka brokers without a hitch

If Zookeeper pods are in crashloopbackoff or Error status, Use the below commands to check the error
- Describe the Zookeeper and look for error
  kubectl describe zookeeper-v2-0 -n zookeeper-cluster
  kubectl describe zookeeper-v2-1 -n zookeeper-cluster
  kubectl describe zookeeper-v2-2 -n zookeeper-cluster
- Check Kafka broker's logs for error
  kubectl logs -f zookeeper-v2-0 -n zookeeper-cluster
  kubectl logs -f zookeeper-v2-1 -n zookeeper-cluster
  kubectl logs -f zookeeper-v2-2 -n zookeeper-cluster

How to clean up Kafka logs

The following steps illustrates the way to cleanup Kafka logs.

For any logs that appear to be overflowing and consuming disk space, you can use the following steps to clean up the logs from Kafka brokers

Note: Make sure the team is informed before doing this activity. This activity will delete the Kafka topic data

Steps

Backup list of log file names and their disk consumption data (optional)
kubectl exec -it kafka-v2-0 -- du -h /opt/kafka-data/logs |tee backup_0.logs
kubectl exec -it kafka-v2-1 -- du -h /opt/kafka-data/logs |tee backup_1.logs
kubectl exec -it kafka-v2-2 -- du -h /opt/kafka-data/logs |tee backup_2.logs
Cleanup the logs
kubectl exec -it kafka-v2-0 -- rm -rf /opt/kafka-data/logs/* -n kafka-cluster
kubectl exec -it kafka-v2-1 -- rm -rf /opt/kafka-data/logs/* -n kafka-cluster
kubectl exec -it kafka-v2-2 -- rm -rf /opt/kafka-data/logs/* -n kafka-cluster

3. If the pod is in crashlookbackoff state, and the storage is full, use the following workaround:

Make a copy of the pod manifest
kubectl get statefulsets kafka-v2 -n kafka-cluster -oyaml > manifest.yaml
Scale down the Kafka statefulset replica count to zero
kubectl scale statefulsets kafka-v2 -n kafka-cluster --replicas=0
Make the following changes to the copy of the statefulsets manifest file
- Modify the command line from:

containers:
- command:
  - sh
  - -exc
  - |
    export KAFKA_BROKER_ID=${HOSTNAME##*-} && \
    export KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${POD_NAME}.kafka-v2-headless.${POD_NAMESPACE}:9092,EXTERNAL://${HOST_IP}:$((31090 + ${KAFKA_BROKER_ID})) && \
    exec /etc/confluent/docker/run

containers:
- command:
  - sh
  - -exc
  - |
    tail -f /dev/null

Apply this statefulsets manifest and scale up statefulsets replica count to 3, the pod should be in a running state now and follow [step 2].
Again scale down the Kafka statefulset replica count to zero
kubectl scale statefulsets kafka-v2 --replicas=0 -n kafka-cluster
Make the following changes to the copy of the statefulsets manifest file
- Modify the command line from:

containers:
- command:
  - sh
  - -exc
  - |
    tail -f /dev/null

containers:
- command:
  - sh
  - -exc
  - |
    export KAFKA_BROKER_ID=${HOSTNAME##*-} && \
    export KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://${POD_NAME}.kafka-v2-headless.${POD_NAMESPACE}:9092,EXTERNAL://${HOST_IP}:$((31090 + ${KAFKA_BROKER_ID})) && \
    exec /etc/confluent/docker/run

Apply this statefulsets manifest and scale up statefulsets replica count to 3

How to change or reset consumer offset in Kafka?

In this tutorial, we will go through the step by step process to reset the offset of the Kafka consumer group

Consumer offset is used to track the messages that are consumed by consumers in a consumer group. A topic can be consumed by many consumer groups and each consumer group will have many consumers. A topic is divided into multiple partitions.

A consumer in a consumer group is assigned to a partition. Only one consumer is assigned to a partition. A consumer can be assigned to consume multiple partitions.

Consumer offset is managed at the partition level per consumer group.

Why reset the consumer offset?

In some scenarios, consumers which consumed the messages from a Kafka partition could have resulted in errors and the consumption would have been incomplete. In such cases of consumption failures you may have a need to re-consume the messages which were previously consumed. In such instances you would have to reset the consumer offset to an earlier offset.

Follow the steps below if consumers stop consuming data from consumer group topics for any reason.

Get a Shell to a Kafka broker

kubectl get pods -n kafka-cluster

kubectl exec -it kafka-v2-0 -n kafka-cluster

Find the current consumer offset
Use the kafka-consumer-groups along with the consumer group id followed by a describe.
```
kafka-consumer-groups --bootstrap-server kafka-v2.kafka-cluster:9092 --group <group_id> --describe
```
You will see 2 entries related to offsets – CURRENT-OFFSET and LOG-END-OFFSET for the partitions in the topic for that consumer group. CURRENT-OFFSET is the current offset for the partition in the consumer group.
If you find out any topic lags that are not getting cleared then use the following steps to reset the consumer offset
1. Scale down the respective consumer group service (eg. for egov-infra-persist you have to scale down the egov-persister service )
  kubectl scale --replicas=0 deployment <deployment_name> -n egov
2. Reset the consumer offset
  Use the kafka-consumer-groups to change or reset the offset. You would have to specify the topic, consumer group and use the –reset-offsets flag to change the offset.
  kafka-consumer-groups --bootstrap-server kafka-v2.kafka-cluster:9092 --group <group_id> --topic <topic_name> --reset-offsets --to-datetime 2021-06-25T21:22:39.306 --execute
  Reset offsets to offset from datetime. Format: ‘YYYY-MM-DDTHH:mm:SS.sss’
3. Scale up the respective consumer group service
  kubectl scale --replicas=2 deployment <deployment_name> -n egov

SRE Rituals

SRE Team's Monthly Tasks:

Cloud Cost - Monitoring/Optimization/Publishing
Infra Utilization Summary
History of deployments
History of Config changes
History of release (if any) post-release findings if any.
Monthly summary update

SRE Team's Weekly Tasks:

Cleanup logs
Backup logs
Weekly DB dump in case of SDC
ES Data backup
Publish Weekly Summary report/Come up with the format
Publish JIRA status

SRE Team's Daily Tasks:

Monitor the status of the environment and ensure every single service is running
Keep track of all tasks by creating tickets
Attend Daily scrums
Monitor the Prometheus Alters

FAQs

I am unable to login to the citizen or employee portal. The UI shows a spinner.

Deployment using helm

Helm Installation

Here we are going to learn how to install helm and why we are helm in DIGIT-DevOps

Before installing Helm you need to know about Yaml files. Yaml file: Yaml files (Yet Another Markup Language) are used to transmit the data in web applications. Json file: Json files (JavaScript Object Notation) are a standard text-based format to view the structured data of javascript object syntax.

JSON and YAML files are both used to transfer data between web applications. The key difference is that YAML files use indentation similar to Python to indicate the level of your code, unlike JSON.

Pre-requisites

Git (please visit the GitOps page if you haven't installed Git).
Install Visual Studio https://code.visualstudio.com/download IDE Code for better code visualization/editing capabilities
Install Golang https://go.dev/doc/install#download(required version:V1.13.3)
Kubectl (see working with Kubernetes page to install kubectl)

What is helm? helm can be defined as a package manager for Kubernetes. It is used to deploy (to extend) the applications and services easily into the Kubernetes cluster in the form of Helm charts.
What are helm charts? It consists of a set of templates and a file containing variables used to fill these templates based on the custom values and configurations.
Why are we using helm in DIGIT?
1. Greatly improved productivity
2. Reduced complexity of deployments
3. More streamlined CI/CD pipeline

Helm charts are written in YAML and contain everything your developers need to deploy a container to a Kubernetes cluster. You may be used to creating Pods, Deployments, Services etc. in Kubernetes via the kubectl create command. This way of creating objects is indeed valid and great for learning purposes. However, when running Kubernetes in production you often want to have all your objects defined as .yaml files. This makes it easier for others to know what’s running in the cluster and allows for your deployments to be version-controlled.

To install Helm, follow the link provided below https://helm.sh/docs/intro/install/

Helm chart customization

We have to make sure t

How to Dump Elasticsearch Indexes

Pre-read

https://www.npmjs.com/package/elasticdump https://www.elastic.co/guide/index.html

Prerequisites

elasticdump
kubectl is a CLI to connect to the kubernetes cluster from your machine

Steps:

Exec into the playground pod of you environment
kubectl exec -it<playground_pod_name> -n playground -- bash
Install elasticdump client if it's not available in the playground pod
The indexes available in an Elasticsearch environment can be known as follows
curl http://elasticsearch-data-v1.es-cluster:9200/_cat/indices?v
ES indexes can be dumped to a JSON file which can then be restored in the otherr environment.
elasticdump --input=http://elasticsearch-data-v1.es-cluster:9200/<my_index> --output=<my_index>.json
Note: Replace <my_index> with your index name that you need to take dump
Zip the dump and download the dump into your local machine
1. install the zip if it's not available in the playground pod
  1. zip es-dump.zip <my_index>.json
2. Run the below command from your local machine to download the es dump
  1. kubectl cp playground/<POD_NAME>:/root/es-dump.zip $HOME/es-dump.zip
The same can be restored in the other environment as follows
1. Copy the es dump from your local machine to another environment's playground pod
  kubectl cp $HOME/es-dump.zip playground/<POD_NAME>:/root/es-dump.zip $HOME/es-dump.zip
2. Restore the es index dump
  elasticdump --input= <my_index>.json --output=http://elasticsearch-data-v1.es-cluster:9200/<my_index>

Troubleshooting

Sometimes, the following error is thrown when indexes are getting restored.

error: {

type: 'cluster_block_exception',

reason: 'blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];'

}

This occurs because in its default configuration, Elasticsearch will not allocate any more disk space when more than 90% of the disk is used overall. (i.e. by Elasticsearch or other applications). This watermark can be set lower but this may prevent important applications from being able to properly allocate disk space.

A way out is to increase the size of the destination ES cluster (according to the size of the source cluster).

The capacity of the ES cluster (for the source/destination end) can be checked as follows :

curl -XGET 'http://elasticsearch-data-v1.es-cluster:9200/_cat/allocation?v'

If, for example, the elasticsearch-data uses a PersistentVolumeClaim, the same can be edited to increase the size using edit pvc PVC name. This capacity can only be increased if the underlying storage class has AllowVolumeExpansion set to true.

My DSS dashboard is not reflecting accurate numbers? What can I do?

How to identify indexing problems

Indexing issues can be identified by tallying the data in postgres database and in the ES. If there is a mismatch between the output there might be issues in indexing. To debug indexing issues, indexer service logs should be checked. The first step is to check if the record is getting consumed by the indexer service, if not the topic name in the indexer service should be checked. If the record is getting consumed then the logs should be checked. Errors might occur due to mismatching data types between the value in the record and in the index mapping(type of field defined in the mapping). Another source of error might be when indexer service calls other microservices like location. MDMS, HRMS etc. for enriching the data. Error might be thrown by these microservices which may result in data not getting indexed.

When to do reindexing

Reindexing is mostly done in two scenarios. The first is when the data is mismatching between RDBMS and the ES. In this case the data is reindexed into a new index and the old index is dropped. Using alias the new index is pointed to the same old index name. The second scenario is when the index structure needs to be changed. In this case the whole data needs to be reindexed using the new indexer configuration, once the reindexing is successful, the old index can be dropped and the new index can be pointed to the old index name using alias.

Payment Reindexing(Legacy Index):

Payment data is generated by the collection service and stored in the PostgreSQL database. To reindex data from postgres database, the legacy index API should be called. Once this API is called indexer service will call the _plainsearch API of collection service in loop until it fetches all the records. The indexer service will transform and enrich each record and push it on a kafka topic: dss-collection-update (which is configurable in application.properties). From this kafka topic dss-ingest consumes the record and enriches it further. Once dss-ingest enriches the record it will push the record to either kafka topic or directly to ES based on a flag called es.push.direct

If this flag is set to true dss-ingest will push directly to the ES else it will push the data to kafka topic called: egov-dss-ingest-enriched. To put data from this topic to ES, a kafka connector should be created. Steps to create kafka connector are mentioned in following section and exact cURL can be found in reference documents

Aliasing:

Suppose you had an index for property records by the name property-services. Upon triggering re-indexing, a new index was created by the name of property-services-enriched. You want to drop the original index and want all queries made to property-services index to internally refer to the newly created index. This is where the concept of aliasing comes into play. For creating an alias, the following curl needs to be executed -

POST /_aliases 
{
  "actions": [
    {
      "add": {
        "index": "property-services-enriched",
        "alias": "property-services"
      }
    }
  ]
}

For live indexing data, a configuration file should be created and added in configuration repo on GitHub. The path of the file should be added in the environment yaml file. The variable in which it has to be added is egov-indexer-yaml-repo-path. Once the configuration is added and the path is added in environment yaml, the indexer service should be restarted(redeployed) with config flag checked. This will restart the indexer service with the new configuration. Once the indexer is up and running, whenever a new event is generated by the service, the event will be consumed by the indexer service. The indexer service will transform and enrich the record based on the defined configuration. After that the indexer service will insert the data into ES.

Legacy Indexing:

Legacy index is the process of recreating the ES index from the postgres database. Indexer service does by fetching all the records from the particular service using a _plainsearch API. (The API url is part of request, but we generally expose an API called _plainsearch which is specifically used only for reindexing). The request body is as follows:

{
   "RequestInfo": {
       "apiId": "string",
       "ver": "string",
       "ts": null,
       "action": "string",
       "did": "string",
       "key": "string",
       "msgId": "string",
       "authToken": "ca3256e3-5318-47b1-8a68-ffcf2228fe35",
       "correlationId": "e721639b-c095-40b3-86e2-acecb2cb6efb",
       "userInfo": {
           "id": 23299,
           "uuid": "e721639b-c095-40b3-86e2-acecb2cb6efb",
           "userName": "xyz",
           "name": " Test user",
           "type": "EMPLOYEE",
           "mobileNumber": "9999999999",
           "emailId": "abc.xyz@gmail.com",
           "roles": [
               {
                   "id": 281,
                   "name": "SUPERUSER"
               }
           ]
       }
   },
   "apiDetails": {
       "uri": "http://fsm.egov:8080/fsm/v1/_plainsearch",
       "tenantIdForOpenSearch": "pb",
       "paginationDetails": {
           "offsetKey": "offset",
           "sizeKey": "limit",
           "maxPageSize": 200
       },
       "responseJsonPath": "$.fsm"
   },
   "legacyIndexTopic": "fsm-application-legacyindex",
   "tenantId": "pb.amritsar"
}

The requestInfo object is common for all requests. The apiDetails object contains the detail of the API which the indexer service will call to fetch the records. Following is a table describing the variables.

Key

Description

uri

URL of the search API

tenantIdForOpenSearch

TenantId for which the search should be called. (In case of statelevel tenantId like pb, the search API is expected to return data for all tenants)

offsetKey

Name of offset query param in search API

sizeKey

Name of limit query param in search API

maxPageSize

Batch size (The indexer will search for this many records in each search call)

responseJsonPath

JsonPath to service data (Basically it used to point to service data ignoring requestInfo)

legacyIndexTopic

Topic on which the data will be pushed

tenantId

TenantId of the index job (Unused field will be deprecated field in future releases)

After fetching the records in batches, the indexer service will transform and enrich each batch and push the batch of records on a topic given against the key legacyIndexTopic. To insert the data from this kafka topic, a kafka connector has to be created.

Kafka Connector:

Kafka connector makes it easy to stream from numerous sources into Kafka and from Kafka into various sinks. Across DIGIT we use kafka connectors mainly for pushing data into the ElasticSearch sink.

For performance improvement in indexer service reindexing jobs, kafka-connect is getting used to do part of pushing records from kafka-topic to elastic search. The creation of reindexing jobs will be through indexer service only as earlier, but the portion where data is pushed to elastic search would be handled through kafka-connect and not through indexer as it was before. So for reindexing, kafka connect should be run after initiating a reindexing job through indexer service.

Following is the cURL for creating kafka connector with ElasticSearch as its sink -

curl -X POST \
  http://kafka-connect.kafka-cluster:8083/connectors/ \
H 'Content-Type: application/json' \
H 'Cookie: SESSIONID=f1349448-761e-4ebc-a8bb-f6799e756185' \
H 'Postman-Token: adabf0e8-0599-4ac9-a591-920586ff4d50' \
H 'cache-control: no-cache' \
d '{
  "name": "{{connector-uniquename}}",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "connection.url": "http://elasticsearch-data-v1.es-cluster:9200",
    "type.name": "general",
    "topics": "{{kafka-topic}}",
    "key.ignore": "false",
    "schema.ignore": true,
    "value.converter.schemas.enable": false,
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "transforms": "TopicNameRouter",
    "transforms.TopicNameRouter.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.TopicNameRouter.regex": ".*",
    "transforms.TopicNameRouter.replacement": "{{elastic-search-index}}",
    "batch.size": 10,
    "max.buffered.records": 500,
    "flush.timeout.ms": 600000,
    "retry.backoff.ms": 5000,
    "read.timout.ms": 10000,
    "linger.ms": 100,
    "max.in.flight.requests": 2,
    "errors.log.enable": true,
    "errors.deadletterqueue.topic.name": "{{kafka-topic}}-es-failed",
    "tasks.max": 1
  }
}'

Reference Docs:

Deploy Nginx-Ingress-Controller

In this tutorial, we will go through the step by step process to deploy an NGINX ingress controller on a Kubernetes cluster.

The vast majority of Kubernetes clusters are used to host containers that process incoming requests from microservices to full web applications. Having these incoming requests come into a central location, then get handed out via services in Kubernetes, is the most secure way to configure a cluster. That central incoming point is an ingress controller.

NGINX is the most popularly used ingress controller for Kubernetes clusters. NGINX has most of the features enterprises are looking for, and will work as an ingress controller for Kubernetes regardless of which cloud, virtualization platform, or Linux operating system your Kubernetes cluster is running on.

Pre-reads

Pre-requisites

is a CLI to connect to the kubernetes cluster from your machine
IDE Code for better code/configuration editing capabilities
All DIGIT services are packaged using helm charts
DIGIT uses (required v1.13.3) automated scripts to deploy the builds onto Kubernetes - or or
Git

Install NGINX Ingress Controller

A Kubernetes service account is required to run NGINX as a service within the cluster. The service account needs to have following roles:

A cluster role to allow it to get, list, and read the configuration of all services and events. This role could be limited if you were to have multiple ingress controllers installed within the cluster. But in most cases, limiting access for this service account may not be needed.
A namespace-specific role to read and update all the ConfigMaps and other items that are specific to the NGINX Ingress controller’s own configuration.

Clone the following repo (If not already done as part of Infra setup), you may need to and then run it to your machine.

git clone -b release https://github.com/egovernments/DIGIT-DevOps
code DIGIT-DevOps/config-as-code/environments/egov-demo-template.yaml

The following configurations should be added to the environment file if they are not already there

nginx-ingress:
  replicas: 1
  default-backend-service: "egov/nginx"
  namespace: egov
  cert-issuer: "letsencrypt-prod"
  ssl-protocols: "TLSv1.2 TLSv1.3"
  ssl-ciphers: "EECDH+CHACHA20:EECDH+AES"
  ssl-ecdh-curve: "X25519:prime256v1:secp521r1:secp384r1"
  controller:
    image:
      repository: egovio/nginx-ingress-controller
      tag: "0.26.1"     
    metrics:
      enabled: true
      serviceMonitor:
        enabled: false  // To enable the service monitor, make sure you have installed the serviceMonitor CRD.  
    service:
      annotations: 
        service.beta.kubernetes.io/aws-load-balancer-type: nlb   // for Network Load Balancing (NLB) 
        enabled: true 
      prometheusRule:   
        enabled: false  // To enable prometheus rules, make sure you have deployed prometheus.
        
cert-manager:
  email: "<email_id>" // replace with email id to verify the domain
  images:
    - "quay.io/jetstack/cert-manager-controller:v0.10.1"
  namespace: egov

To apply this configuration, run the following command:

cd DIGIT-DevOps/deploy-as-code/deployer
go run main.go -c -e egov-demo-template 'nginx-ingress,cert-manager'

Deployment Job Pipeline Setup

In this tutorial, we will go through the step by step process to setup Deployment Job to the Jenkins

DIGIT-Deployment Jobs:

You may have doubts about what is deployment jobs? Below we explained about deployment jobs in detail.

What are deployment jobs?

Once we build a pipeline using jenkins we need to deploy(to set out) into a environment. For that we nee deployment jobs. Here, deployment jobs are nothing but the clusters(group of nodes or VM's)which are created using different environments. some of the environments that are present in DIGIT-DevOps:

In DIGIT there are so many deployment jobs are there. Go to the following repo to see all the deployment jobs.

https://github.com/egovernments/DIGIT-DevOps/blob/master/deploy-as-code/helm/environments/ci.yaml

Here you can see some of the deployment jobs that are present in DIGIT-DevOps.

deploymentJobs:
    - name: dev
      acl: [egovernments*micro-service-dev]
    - name: bihar-prod
      acl: [egovernments*bihar-prod]
    - name: bihar-dev
      acl: [egovernments*bihar-dev]
    - name: bihar-uat
      acl: [egovernments*bihar-uat]
    - name: qa
      acl: [egovernments*micro-service-qa] 
    - name: uat
      acl: [egovernments*micro-service-uat] 
    - name: ukd-dev
      acl: [egovernments*ukd-dev] 
    - name: ukd-prod-sdc
      acl: [egovernments*ukd-prod] 
    - name: ukd-sdc-uat
      acl: [egovernments*ukd-uat]                 
    - name: ci
      acl: [egovernments*micro-service-devops]      
    - name: ukd-dev-sdc
      acl: [egovernments*ukd-dev]
    - name: staging
      acl: [egovernments*staging-qa] 
    - name: nugp-demo
      acl: [egovernments*nugp-team] 
    - name: central-instance
      acl: [egovernments*micro-service-dev,egovernments*micro-service-qa]

What is acl?

Access control list (ACL) An access-control list is a list of permissions attached to an aws.
An ACL specifies which users or system processes can view, create, modify, delete, or otherwise manage objects.
Simply, ACL is a list of members in team and they can only able to access that job.

To deploy new job:

Add your Job name and acl in below path under deployment jobs: in ci.yaml file. egovernments/DIGIT-DevOps/blob/release/config-as-code/environments/ci-demo.yaml
We already discussed that deployments jobs are nothing but clusters. So, add the kubeconfigs of the cluster in the below ci-secrets.yaml.

https://github.com/egovernments/DIGIT-DevOps/blob/release/config-as-code/environments/ci-demo-secrets.yaml

If you are using the DIGIT-DevOps repo's release branch for deployment, this step is optional. Other branches require job-name-specific conditions in seed-deployment-jobs helm/charts/jenkins/values.yaml. Add your respective repo, branch names

Repo: To which repository the deployment job be added.
Branch: Usually master branch.
Helm Directory: deploy-as-code/helm
Environment: Add job-name here.

{{- else if (eq $job.name "dev") }}            
deployer(repo:'git@github.com:egovernments/iFix-DevOps.git', branch: 'dev', helmDir: 'deploy-as-code/helm', environment: '{{ $job.name }}')""")
      sandbox() 
    }
}
disabled(false)
}

refer this linkegovernments/DIGIT-DevOps/blob/master/deploy-as-code/helm/charts/backbone-services/jenkins/values.yam for moreinfo

To deploy, we should be in deployer path. For that go to the below path

DIGIT-DevOps/deploy-as-code/deployer

Re-deploy jenkins:

go run main.go -c -e ci 'jenkins'

Deployment Job pipeline:

Like this deployment jobs can be created in jenkins. Go to deployments and switch to any deployment job where you want to deploy in jenkins

Paste the image id which you have copied from the builds and click on build.