This page provides the steps to deploy an Airflow DAG.
A Kubernetes environment is required for the Airflow deployment.
Step 1: Clone the Airflow git repo and update values.yaml as per the requirement.
Step 2: Update the git repository URL, branch, and subpath for the DAGs directory in values.yaml.
Example: the following params are updated as given below:
repo: "https://github.com/pmidc-digit/utilities"
repoSubPath: "egov-national-dashboard-accelerator/dags"
branch: "develop"
Step 3: Change the directory to airflow and update Helm. Add the Airflow repo to Helm locally using the command below:
helm repo add apache-airflow https://airflow.apache.org
The above command pulls the Airflow chart repo and adds it to the local Helm repository.
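Then refresh the local chart index so Helm sees the latest chart versions:
helm repo update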
Step 4: Install Apache Airflow after updating the Helm repositories.
helm install airflow apache-airflow/airflow --namespace egov
The above command installs Airflow using the repo details added above.
Step 5: Apply the changes made to values.yaml using the command below.
helm upgrade --install airflow apache-airflow/airflow -n airflow -f values.yaml
The above command applies the updated git repo, subpath and branch to the deployment.
Step 6: Once the deployment is complete, the pods and services start running with the updated values.yaml.
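To verify, a standard check is to list the pods in whichever namespace the release was installed into (egov or airflow, depending on which command above was used):
kubectl get pods -n egov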
Latest files for the deployment: attached below is the final values.yaml file, which syncs both the plugins and DAGs from the repo: Airflow Deployment
Steps to trigger the Airflow DAG
In Airflow, a DAG (Directed Acyclic Graph) is a collection of the tasks you want to run, organised in a way that reflects their relationships and dependencies.
A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.
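As a minimal illustrative sketch (the DAG id, schedule and tasks below are hypothetical, not the project's actual DAG):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: two tasks with a dependency between them.
with DAG(
    dag_id="example_national_dashboard_dag",  # placeholder id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",               # run once a day
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'extract data from the state ES'",
    )
    push = BashOperator(
        task_id="push",
        bash_command="echo 'push data to the national dashboard'",
    )
    extract >> push  # push runs only after extract succeeds
```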
Manual Trigger
1. Log onto the Punjab Prod server using the credentials:
URL: Sign In - Airflow
username: admin
password: admin
2. Trigger the DAG by clicking on the “Trigger DAG with Config” option.
3. Enter a date and click on the Trigger button.
Format: {"date": "dd-MM-yyyy"}
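The same trigger can also be fired from the Airflow CLI; the DAG id below is a placeholder for the actual DAG name:
airflow dags trigger -c '{"date": "01-07-2022"}' <dag-id>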
4. Click on the Log option and expand the DAG to view the logs. Choose a stage for any module.
Logs can also be viewed in the Elasticsearch index adaptor_logs:
GET adaptor_logs/_search - the timestamp is provided based on the day for which the logs are being searched.
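For example, a minimal query sketch, assuming the log documents carry an @timestamp field:

```
GET adaptor_logs/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2022-07-01",
        "lt": "2022-07-02"
      }
    }
  }
}
```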
This DAG is triggered every day at midnight for the previous day's data.
Bulk Insert For A Date Range
Execute the script below to run the DAG over a date range for the staging NDB:
sh iterate_over_date.sh <start-date> <end-date>
Example: sh iterate_over_date.sh 01-03-2022 05-03-2022
The dates need to be in the format dd-mm-YYYY.
The range is exclusive of the last date, i.e. [start-date, end-date). In the above example, the script triggers the DAG for 1st, 2nd, 3rd and 4th March; it is not triggered for 5th March.
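The script itself lives in the repo; as a rough sketch of the loop it performs (assuming GNU date and a placeholder DAG id):

```sh
#!/bin/bash
# Usage: sh iterate_over_date.sh <start-date> <end-date>   (dates as dd-mm-YYYY)
start=$1
end=$2
DAG_ID="national_dashboard_dag"   # placeholder; use the actual DAG id

# GNU date expects YYYY-mm-dd, so flip dd-mm-YYYY into ISO order first
to_iso() { echo "$1" | awk -F- '{print $3"-"$2"-"$1}'; }

current=$(to_iso "$start")
stop=$(to_iso "$end")

# Loop over [start-date, end-date), triggering the DAG once per day
while [ "$current" != "$stop" ]; do
  conf_date=$(date -d "$current" +%d-%m-%Y)
  airflow dags trigger -c "{\"date\": \"$conf_date\"}" "$DAG_ID"
  current=$(date -d "$current + 1 day" +%Y-%m-%d)
done
```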
Refer to the sample below to delete the data for the month of July module-wise in the Postgres database:
delete from nss_ingest_data where dataKey like '%-07-2022:FIRE%';
delete from nss_ingest_data where dataKey like '%-07-2022:TL%';
delete from nss_ingest_data where dataKey like '%-07-2022:WS%';
delete from nss_ingest_data where dataKey like '%-07-2022:PGR%';
delete from nss_ingest_data where dataKey like '%-07-2022:PT%';
delete from nss_ingest_data where dataKey like '%-07-2022:MCOLLECT%';
Adjust the module and the date range accordingly.
Check the records before deleting.
Note: Deleting the data from both ES and Postgres is mandatory to avoid data duplication.
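On the ES side, a delete-by-query of the same shape can be used; a sketch, where the index name is a placeholder for the actual ingest index and dataKey.keyword is assumed to be the indexed field:

```
POST <ingest-index>/_delete_by_query
{
  "query": {
    "wildcard": {
      "dataKey.keyword": "*-07-2022:FIRE*"
    }
  }
}
```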
1. Deploy Airflow on the state side to extract data from the Elasticsearch server and push it to the National Dashboard adaptor.
2. Configure Airflow with the required connection ids and variables (see the sketch after this list):
es_conn: to connect to the required ES server
digit-auth: to connect to the required API to insert data
Variables: the credentials to connect to the API
3. The index must contain data down to the ward level, and the structure must be similar to ensure the queries fetch the desired results.
4. Pull data from the DIGIT source only.
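Connections can be created from the Airflow UI or the CLI; for example, using the hosts from the connections table later on this page (the digit-auth host is a placeholder):
airflow connections add es_conn --conn-type elasticsearch --conn-host elasticsearch-data-v1.es-cluster --conn-port 9200
airflow connections add digit-auth --conn-type http --conn-host <auth-api-host> --conn-schema https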
Builds deployed
national-dashboard-ingest-db: nda-patch-db6cb27b02-18
national-dashboard-kafka-pipeline:v0.0.1-762c61e743-3
Upyog devOps and MDMS changes
- upyog devOps PR
- upyog mdms PR
Add localisation for newly loaded Punjab tenants
Added the NDA_SYSTEM role-action mapping in MDMS and created a user with the corresponding access rights.
Credentials: SYSTEMSU1 / eGov@123 - PG, user type: SYSTEM
Overview of the adaptor service
The code is organised in the following code repo:
Refer to the folder structure below:
dags
plugins
The dags folder contains the DAG code:
national_dashboard_template_latest.py (manual trigger)
national_dashboard_template_scheduled.py (scheduled)
The queries folder contains the ES queries for the individual modules.
To add a new module, add a module-specific file with the query and transform logic in the queries folder, and then reference it in the DAG code in both the manual and scheduled versions.
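As a purely hypothetical sketch of such a file (the index fields, query body and record shape are placeholders; the real files in the queries folder define each module's actual query and transform):

```python
# queries/fire.py - hypothetical sketch of a module-specific query file

def fire_query(date):
    """ES query for the FIRE module for one day (dd-MM-yyyy); field names are placeholders."""
    return {
        "query": {
            "range": {"@timestamp": {"gte": date, "lte": date, "format": "dd-MM-yyyy"}}
        },
        "aggs": {
            "wards": {"terms": {"field": "Data.ward.name.keyword", "size": 10000}}
        },
    }

def transform_fire(es_response, date):
    """Flatten the aggregation response into per-ward national dashboard records."""
    return [
        {
            "date": date,
            "module": "FIRE",
            "ward": bucket["key"],
            "metrics": {"totalApplications": bucket["doc_count"]},
        }
        for bucket in es_response["aggregations"]["wards"]["buckets"]
    ]
```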
Design and develop an adaptor service which extracts data from the state DIGIT installations and pushes it onto the National Dashboard instance periodically.
A list of tasks for this is tracked for the adaptor -
Adaptor to be deployed on state DIGIT installations
The adaptor periodically extracts and aggregates data from the different DIGIT modules
Posts the data to the National Dashboard for the state
Bookkeeping is done for every adaptor data extract and push, for audit and debugging
Out of scope: extraction from non-DIGIT sources
The national dashboard adaptor extracts data from the state DSS at a configurable scheduled time and ingests it into the National Dashboard. The adaptor ingests data at the state/ULB/ward level for each module on a daily basis, sending the data in batches of 50 records to the national dashboard.
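A minimal sketch of that batching, with illustrative names only:

```python
BATCH_SIZE = 50  # the adaptor sends data in batches of 50 records

def batches(records, size=BATCH_SIZE):
    """Yield successive fixed-size chunks of the record list."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# e.g. 120 transformed records become chunks of 50, 50 and 20
records = [{"id": i} for i in range(120)]
for chunk in batches(records):
    print(len(chunk))  # each chunk is then POSTed to the ingest API
```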
| Connection Id | Connection Type | Host | Port | Schema | Description |
|---|---|---|---|---|---|
| es_conn | Elasticsearch | elasticsearch-data-v1.es-cluster | 9200 | | For the ES server |
| digit-auth-state | HTTP | | | https | For the auth API connection - Staging |
| digit-auth | HTTP | | | | For the auth API connection - UPYOG |

JIRA link for the issue tracking -
Postman Collection for the INGEST API -
National Dashboard KPI + Definitions-
| Environment | Username | Password | Tenant |
|---|---|---|---|
| Punjab Kibana QA | admin | 24!jcZ]z"[$qZ%Fa | |
| Punjab Kibana UAT | OAuth authentication | | |
| Punjab Dashboard - QA | STADSS | STADSS | Punjab |
| Punjab Dashboard - UAT | EMP9 | eGov@123 | testing |
| National DSS - QA | QADSS | eGov@123 | amritsar |
| National DSS - DEV | amr001 | eGov@123 | amritsar |
| National DSS - UAT | NDSS1/NDSS2 | eGov@123 | PG |
| National Dashboard - UPYOG | NDSS1 | eGov@123 | PG |
| Environment | Username / Type | Password | Token | Tenant |
|---|---|---|---|---|
| AUTH API: Prod | SYSTEMSU3 | eGov@123 | ZWdvdi11c2VyLWNsaWVudDo= | pb |
| AUTH API: DEV | amr001/EMPLOYEE | eGov@123 | ZWdvdi11c2VyLWNsaWVudDo= | pb.amritsar |
| AUTH API: UPYOG | SYSTEMSU1/SYSTEM | eGov@123 | | pg |
| AUTH API: Staging | SYSTEMSU3/SYSTEM | eGov@123 | | pg |
Airflow variables:

| Variable | Value | Description |
|---|---|---|
| password | eGov@123 | |
| username | SYSTEMSU1 | For UPYOG |
| username_state | SYSTEMSU3 | For staging |
| token | ZWdvdi11c2VyLWNsaWVudDo= | |
| tenantid | pg | |
| usertype | SYSTEM | |
| totalulb_url | | For reading the ULBs of Punjab |