This page provides the steps to deploy an Airflow DAG.
A Kubernetes environment is required for the Airflow deployment.
Step 1: Clone the Airflow git repo and update values.yaml as per the requirement.
Step 2: Update the git repository URL, branch, and subpath for the DAGs directory in values.yaml.
Example: the following params are updated as given below:
repo: "https://github.com/pmidc-digit/utilities"
repoSubPath: "egov-national-dashboard-accelerator/dags"
branch: "develop"
Step 3: Change the directory to airflow and update Helm. Add the Airflow repo to Helm locally using the command below:
helm repo add apache-airflow https://airflow.apache.org
The above command pulls the Airflow chart repo and adds it to the local Helm repository.
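Then refresh the local chart index so Helm sees the latest chart versions:
helm repo update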
Step 4: Install Apache Airflow after updating the Helm repositories.
helm install airflow apache-airflow/airflow --namespace egov
The above command installs Airflow using the repo details added above.
Step 5: Apply the changes made to values.yaml using the command below.
helm upgrade --install airflow apache-airflow/airflow -n airflow -f values.yaml
The above command applies the updated git repo, subpath and branch to the deployment.
Step 6: Once the deployment is complete, the pods and services start running with the updated values.yaml.
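To verify, a standard check is to list the pods in whichever namespace the release was installed into (egov or airflow, depending on which command above was used):
kubectl get pods -n egov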
Latest files for the deployment: attached below is the final values.yaml file, which syncs both the plugins and DAGs from the repo: Airflow Deployment
Steps to trigger the Airflow DAG
In Airflow, a DAG (Directed Acyclic Graph) is a collection of the tasks you want to run, organised in a way that reflects their relationships and dependencies.
A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.
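As a minimal illustrative sketch (the DAG id, schedule and tasks below are hypothetical, not the project's actual DAG):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: two tasks with a dependency between them.
with DAG(
    dag_id="example_national_dashboard_dag",  # placeholder id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",               # run once a day
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'extract data from the state ES'",
    )
    push = BashOperator(
        task_id="push",
        bash_command="echo 'push data to the national dashboard'",
    )
    extract >> push  # push runs only after extract succeeds
```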
Manual Trigger
1. Log onto the Punjab Prod server using the credentials:
URL: Sign In - Airflow
username: admin
password: admin
2. Trigger the DAG by clicking on the “Trigger DAG with Config” option.
3. Enter a date and click on the Trigger button.
Format: {"date": "dd-MM-yyyy"}
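The same trigger can also be fired from the Airflow CLI; the DAG id below is a placeholder for the actual DAG name:
airflow dags trigger -c '{"date": "01-07-2022"}' <dag-id>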
4. Click on the Log option and expand the DAG to view the logs. Choose a stage for any module.
Logs can also be viewed in the Elasticsearch index adaptor_logs:
GET adaptor_logs/_search - the timestamp is provided based on the day for which the logs are being searched.
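For example, a minimal query sketch, assuming the log documents carry an @timestamp field:

```
GET adaptor_logs/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2022-07-01",
        "lt": "2022-07-02"
      }
    }
  }
}
```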
This DAG is triggered every day at midnight for the previous day's data.
Bulk Insert For A Date Range
Execute the script below to run the DAG over a date range for the staging NDB:
sh iterate_over_date.sh <start-date> <end-date>
Example: sh iterate_over_date.sh 01-03-2022 05-03-2022
The dates need to be in the format dd-mm-YYYY.
The range is exclusive of the last date, i.e. [start-date, end-date). In the above example, the script triggers the DAG for 1st, 2nd, 3rd and 4th March; it is not triggered for 5th March.
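The script itself lives in the repo; as a rough sketch of the loop it performs (assuming GNU date and a placeholder DAG id):

```sh
#!/bin/bash
# Usage: sh iterate_over_date.sh <start-date> <end-date>   (dates as dd-mm-YYYY)
start=$1
end=$2
DAG_ID="national_dashboard_dag"   # placeholder; use the actual DAG id

# GNU date expects YYYY-mm-dd, so flip dd-mm-YYYY into ISO order first
to_iso() { echo "$1" | awk -F- '{print $3"-"$2"-"$1}'; }

current=$(to_iso "$start")
stop=$(to_iso "$end")

# Loop over [start-date, end-date), triggering the DAG once per day
while [ "$current" != "$stop" ]; do
  conf_date=$(date -d "$current" +%d-%m-%Y)
  airflow dags trigger -c "{\"date\": \"$conf_date\"}" "$DAG_ID"
  current=$(date -d "$current + 1 day" +%Y-%m-%d)
done
```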
Refer to the sample below to delete the data for the month of July module-wise in the Postgres database:
delete from nss_ingest_data where dataKey like '%-07-2022:FIRE%';
delete from nss_ingest_data where dataKey like '%-07-2022:TL%';
delete from nss_ingest_data where dataKey like '%-07-2022:WS%';
delete from nss_ingest_data where dataKey like '%-07-2022:PGR%';
delete from nss_ingest_data where dataKey like '%-07-2022:PT%';
delete from nss_ingest_data where dataKey like '%-07-2022:MCOLLECT%';
Adjust the module and the date range accordingly.
Check the records before deleting.
Note: Deleting the data from both ES and Postgres is mandatory to avoid data duplication.
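On the ES side, a delete-by-query of the same shape can be used; a sketch, where the index name is a placeholder for the actual ingest index and dataKey.keyword is assumed to be the indexed field:

```
POST <ingest-index>/_delete_by_query
{
  "query": {
    "wildcard": {
      "dataKey.keyword": "*-07-2022:FIRE*"
    }
  }
}
```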
1. Deploy Airflow on the state side to extract data from the Elasticsearch server and push it to the National Dashboard adaptor.
2. Configure Airflow with the required connection ids and variables (see the sketch after this list):
es_conn: to connect to the required ES server
digit-auth: to connect to the required API to insert data
Variables: the credentials to connect to the API
3. The index must contain data down to the ward level, and the structure must be similar to ensure the queries fetch the desired results.
4. Pull data from the DIGIT source only.
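Connections can be created from the Airflow UI or the CLI; for example, using the hosts from the connections table later on this page (the digit-auth host is a placeholder):
airflow connections add es_conn --conn-type elasticsearch --conn-host elasticsearch-data-v1.es-cluster --conn-port 9200
airflow connections add digit-auth --conn-type http --conn-host <auth-api-host> --conn-schema https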
Builds deployed
national-dashboard-ingest-db: nda-patch-db6cb27b02-18
national-dashboard-kafka-pipeline:v0.0.1-762c61e743-3
Upyog devOps and MDMS changes
- upyog devOps PR
- upyog mdms PR
Add localisation for newly loaded Punjab tenants
Added the NDA_SYSTEM role-action mapping in MDMS and created a user with the corresponding access rights.
Credentials: SYSTEMSU1 / eGov@123 - PG, user type: SYSTEM
Overview of the adaptor service
The code is organised in the following code repo:
Refer to the folder structure below:
dags
plugins
The dags folder contains the DAG code:
national_dashboard_template_latest.py (manual trigger)
national_dashboard_template_scheduled.py (scheduled)
The queries folder contains the ES queries for the individual modules.
To add a new module, add a module-specific file with the query and transform logic in the queries folder, and then reference it in the DAG code in both the manual and scheduled versions.
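As a purely hypothetical sketch of such a file (the index fields, query body and record shape are placeholders; the real files in the queries folder define each module's actual query and transform):

```python
# queries/fire.py - hypothetical sketch of a module-specific query file

def fire_query(date):
    """ES query for the FIRE module for one day (dd-MM-yyyy); field names are placeholders."""
    return {
        "query": {
            "range": {"@timestamp": {"gte": date, "lte": date, "format": "dd-MM-yyyy"}}
        },
        "aggs": {
            "wards": {"terms": {"field": "Data.ward.name.keyword", "size": 10000}}
        },
    }

def transform_fire(es_response, date):
    """Flatten the aggregation response into per-ward national dashboard records."""
    return [
        {
            "date": date,
            "module": "FIRE",
            "ward": bucket["key"],
            "metrics": {"totalApplications": bucket["doc_count"]},
        }
        for bucket in es_response["aggregations"]["wards"]["buckets"]
    ]
```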
Design and develop an adaptor service which extracts data from the state DIGIT installations and pushes it onto the National Dashboard instance periodically.
A list of tasks for this is tracked for the adaptor -
Adaptor to be deployed on state DIGIT installations
The adaptor periodically extracts and aggregates data from the different DIGIT modules
Posts the data to the National Dashboard for the state
Bookkeeping is done for every adaptor data extract and push, for audit and debugging
Out of scope: extraction from non-DIGIT sources
The national dashboard adaptor extracts data from the state DSS at a configurable scheduled time and ingests it into the National Dashboard. The adaptor ingests data at the state/ULB/ward level for each module on a daily basis, sending the data in batches of 50 records to the national dashboard.
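A minimal sketch of that batching, with illustrative names only:

```python
BATCH_SIZE = 50  # the adaptor sends data in batches of 50 records

def batches(records, size=BATCH_SIZE):
    """Yield successive fixed-size chunks of the record list."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# e.g. 120 transformed records become chunks of 50, 50 and 20
records = [{"id": i} for i in range(120)]
for chunk in batches(records):
    print(len(chunk))  # each chunk is then POSTed to the ingest API
```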
| Connection Id | Connection Type | Host | Port | Schema | Description |
|---|---|---|---|---|---|
| es_conn | Elasticsearch | elasticsearch-data-v1.es-cluster | 9200 | | For the ES server |
| digit-auth-state | HTTP | | | https | For the auth API connection - Staging |
| digit-auth | HTTP | | | | For the auth API connection - UPYOG |

JIRA link for the issue tracking -
Postman Collection for the INGEST API -
National Dashboard KPI + Definitions-
| Environment | Username | Password | Tenant |
|---|---|---|---|
| Punjab Kibana QA | admin | 24!jcZ]z"[$qZ%Fa | |
| Punjab Kibana UAT | OAuth authentication | | |
| Punjab Dashboard - QA | STADSS | STADSS | Punjab |
| Punjab Dashboard - UAT | EMP9 | eGov@123 | testing |
| National DSS - QA | QADSS | eGov@123 | amritsar |
| National DSS - DEV | amr001 | eGov@123 | amritsar |
| National DSS - UAT | NDSS1/NDSS2 | eGov@123 | PG |
| National Dashboard - UPYOG | NDSS1 | eGov@123 | PG |
| Environment | Username / Type | Password | Token | Tenant |
|---|---|---|---|---|
| AUTH API: Prod | SYSTEMSU3 | eGov@123 | ZWdvdi11c2VyLWNsaWVudDo= | pb |
| AUTH API: DEV | amr001/EMPLOYEE | eGov@123 | ZWdvdi11c2VyLWNsaWVudDo= | pb.amritsar |
| AUTH API: UPYOG | SYSTEMSU1/SYSTEM | eGov@123 | | pg |
| AUTH API: Staging | SYSTEMSU3/SYSTEM | eGov@123 | | pg |
Airflow variables:

| Variable | Value | Description |
|---|---|---|
| password | eGov@123 | |
| username | SYSTEMSU1 | For UPYOG |
| username_state | SYSTEMSU3 | For staging |
| token | ZWdvdi11c2VyLWNsaWVudDo= | |
| tenantid | pg | |
| usertype | SYSTEM | |
| totalulb_url | | For reading the ULBs of Punjab |