Pros and Cons of migrating to Sunbird RC
Digital Registries must ensure the following
Single Source of Truth
Data Privacy
Non Repudiation
Verification
Audit/History
Exchange using Open Standards
Slow Moving Data
Data may be classified depending on its rate of change.
Very Slow Changing Data may be called Master Data. In DIGIT, all Master Data is stored in a single service called the Master Data Registry. Examples of Master Data are Property Type, Property Usage, etc.
Slow changing data that form the basis for various transactions are stored in registries, e.g. Property, User, Employee, Trade License, etc.
Transaction Data is fast changing, e.g. payments for Property Tax.
Sunbird RC contains a set of frameworks to enable you to rapidly build next generation electronic registries and verifiable credentials including attestation and verification flows.
We need to evaluate whether DIGIT should migrate its registries to Sunbird RC.
This document serves as a briefing and overview of the core architecture and components of the platform for a new or unfamiliar developer. It seeks to address the what, why, and how of the platform at the time of writing. It is also meant to be a collaborative exercise, written by newbies for newbies, with future developers adding their own insights and learnings to this resource to have it grow with the platform over time.
This is NOT a technical reference or documentation. It is intended for orientation and will be written in natural language wherever possible. It is also limited in its scope to the general architecture of the back end, with little regard to how the systems necessarily converge to provide product solutions.
By the end of this document, you will be able to completely comprehend the following paragraph. It will equip you to understand the terminology, the tools, the features, and the implicit assumptions therein as well as provide you with solid grounded reasoning on why the architecture is the way that it is. The paragraph is an elevator pitch of the platform architecture, and it looks something like this:
In brief, the platform stack uses nginx servers with Zuul gateways to host Spring Boot microservices stored in Docker containers managed using Kubernetes. The servers rely on Kafka data streams to provide them with data that is indexed in ElasticSearch, and persisted in PostgreSQL databases.
Here’s what you need to know.
Definition: nginx (pronounced “Engine X”) is a web server designed to serve dynamic HTTP content fast. It serves 32% of all active websites on the internet as of 2019, making it the world’s most popular web server.
A server in this context is a computer on a network that holds some form of content and provides it when needed i.e. “serves” it.
Functionality: Nginx uses a modular event-driven architecture to handle requests asynchronously, rather than through threads. “Event-driven” means it performs actions as a reaction to things happening in its environment (such as requests for information, or changes in values), as opposed to constantly staying in action to function (which is what threading does).
Why nginx: Nginx is substantially faster than Apache at a fraction of the processor cost. The narrow scope of a microservice server means its configuration can be highly specialized, making it more efficient than a feature-rich server that would do more but run slower.
Because the platform microservices are all HTTP driven, a server that is optimized for fast dynamic HTTP processing makes logical sense.
Definition: Zuul is an open-source API gateway service developed and provided by Netflix. An API Gateway is a service that manages access control to a server that is hosting an API, which means that it can handle things like service requests that involve sending and receiving program operation-specific data and parameters and is custom-built for that purpose.
Functionality: Zuul acts as a proxy, accepting all incoming API requests and authenticating them before delegating them to the microservice in question. This means that whenever an app or a product is requesting or calling a microservice, it is actually connecting to Zuul first, rather than directly to the server. Once Zuul okays the request, it hands off to the server.
Why Zuul: Zuul provides two benefits: it acts as a wrapper on the internal mechanics of the microservices, meaning that any internal functionality concerns are irrelevant to any external clients. It also simplifies the server gateway and access system, allowing for a single configuration of authentication protocols to suffice for every deployed microservice. In the absence of a common gateway, authentication would have to be individually defined on every server access point, which would be tedious and redundant.
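For orientation only, here is a minimal sketch of what a Zuul "pre" filter looks like with Spring Cloud Netflix. The header name and the rejection logic are illustrative assumptions, not the platform's actual authentication flow.

```java
import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;

// A "pre" filter runs before the request is routed to the target microservice.
@Component
public class AuthPreFilter extends ZuulFilter {

    @Override
    public String filterType() {
        return "pre"; // run before routing
    }

    @Override
    public int filterOrder() {
        return 1; // relative order among pre filters
    }

    @Override
    public boolean shouldFilter() {
        return true; // apply to every request
    }

    @Override
    public Object run() {
        RequestContext ctx = RequestContext.getCurrentContext();
        // "auth-token" is an assumed header name, used here purely for illustration.
        String token = ctx.getRequest().getHeader("auth-token");
        if (token == null || token.isEmpty()) {
            // Reject unauthenticated requests before they ever reach a microservice.
            ctx.setSendZuulResponse(false);
            ctx.setResponseStatusCode(HttpStatus.UNAUTHORIZED.value());
        }
        return null; // Zuul ignores the return value
    }
}
```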
Definition: Spring is an application framework for Java, or the environment in which a Java application runs. Spring Boot is an opinionated instance of the Spring framework, which means that it is automatically preconfigured in the way most Java application frameworks tend to be configured on average.
Functionality: The opinionated configuration of Spring Boot means that a developer does not need to be spending time and resources to install the libraries and dependencies required for a specific Java application. They are all present at the time of deployment and only highly specialized dependencies need to be installed after the fact.
Why Spring: Because the platform consists of a large number of microservices, each with very simply defined functionality, it is unlikely that the kind of highly specialized dependencies that would make a non-opinionated configuration necessary will be required. Therefore, an opinionated instance that includes all the commonly required dependencies by design is an ideal match for the framework requirements of a project such as this.
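To make "opinionated" concrete, here is roughly what a Spring Boot microservice skeleton looks like; the class name and endpoint are placeholders, not taken from the platform.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// @SpringBootApplication pulls in the opinionated auto-configuration described above:
// an embedded web server, JSON serialization and sensible defaults, with no manual wiring.
@SpringBootApplication
@RestController
public class DemoServiceApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoServiceApplication.class, args);
    }

    // A minimal HTTP endpoint; a real service would delegate to business logic here.
    @GetMapping("/health")
    public String health() {
        return "UP";
    }
}
```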
Definition: Kafka is a real-time data streaming service. It allows other systems to subscribe or publish to a data stream (a sequence of data that updates asynchronously in real-time).
Functionality: Kafka acts as the backbone of the server architecture, handling data transfer between the databases and the microservices, as well as other platform entities that require access to data and functionality elements. It creates streams of information that services and network entities can either publish or subscribe to.
Why Kafka (or why Data Streaming): Data streaming in general, and Kafka in particular, address an important aspect of microservice architecture design. Inter-service communication plays a larger role in the functionality of such architecture over traditional service architectures, and being able to reliably and efficiently provide data to all the microservices active at a given time during runtime is essential to the platform working as intended.
With streaming, services that need data can request it independent of each other without affecting the functionality of others (a key advantage of a pub/sub model) and the data can be reliably expected to be up-to-date. With distributed streaming infrastructures like Kafka, scaling up to accommodate larger and more complex microservice deployments also becomes easier.
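As a sketch of the pub/sub idea, the snippet below publishes a message to a Kafka topic using the standard Java client. The broker address, topic name and payload are assumptions for illustration only.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // broker address is an assumption
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish an event to a topic; any service subscribed to this topic receives it
            // without the publisher needing to know who the consumers are.
            producer.send(new ProducerRecord<>("demo-events", "key-1", "{\"status\":\"CREATED\"}"));
        }
    }
}
```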
Definition: ElasticSearch is a search engine. It provides text-based search functionality across an indexed database.
Functionality: ElasticSearch allows for searching across all kinds of documents, including specifically schema-less JSON objects. It is quasi-real time, allows its database indices to be sharded (horizontally partitioned) with shard-level replication as well as distributed computation and storage.
ElasticSearch is complemented by Logstash, a data collection and logging system, and Kibana, an analytics and visualization dashboard. These three tools, combined with Beats, a lightweight data shipper (not used in this architecture), are collectively named the Elastic Stack.
Why the Elastic Stack: The Elastic stack is self-contained and highly functional, ideal for the “just works” configuration that is needed for scalable systems. Specifically, ElasticSearch is a more efficient method of searching the database since the query runtime is faster on indexed Elastic than on indexed relational databases. It works in tandem with the slower but more robust relational database to provide faster data access.
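For illustration, here is a hedged sketch of a full-text query using ElasticSearch's Java High Level REST Client; the index and field names are made up and not taken from the platform.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class SearchExample {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Full-text search over an index; "demo-index" and "description" are illustrative names.
            SearchRequest request = new SearchRequest("demo-index");
            request.source(new SearchSourceBuilder()
                    .query(QueryBuilders.matchQuery("description", "water connection")));

            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            response.getHits().forEach(hit -> System.out.println(hit.getSourceAsString()));
        }
    }
}
```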
Definition: PostgreSQL is an open-source relational database management system that originated as the POSTGRES project at the University of California, Berkeley, built by the team behind the earlier Ingres database.
Functionality: PostgreSQL is a fully functional RDBMS that is market competitive with other open source and proprietary database management tools. A full list of the features it offers would be slightly redundant to add to this document, but it could be introduced at a later date.
Why PostgreSQL: PostgreSQL has one real advantage over other open-source RDBMSs in that it is slightly faster: MySQL will, on average, run slower on certain specific query patterns and corner cases. Furthermore, there is a consensus in the platform development community that a move to PostgreSQL is inevitable in all but the most legacy of systems. Non-Postgres systems at large scale are only really being maintained because migration would be too resource-intensive to be worthwhile.
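As a small example of how a service talks to PostgreSQL, here is a plain JDBC sketch; the connection details, table and column names are placeholders, and the PostgreSQL JDBC driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PostgresExample {
    public static void main(String[] args) throws Exception {
        // The JDBC URL format is standard for PostgreSQL; database and credentials are placeholders.
        String url = "jdbc:postgresql://localhost:5432/demo_db";
        try (Connection conn = DriverManager.getConnection(url, "demo_user", "demo_password");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT id, name FROM demo_table WHERE name ILIKE ?")) {
            stmt.setString(1, "%water%"); // parameterized query avoids SQL injection
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}
```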
Definition: Docker is virtualization software that creates lightweight virtual environments called containers in which programs can be run with their own unique configuration of libraries, dependencies, and setups. Because all Docker containers run on one OS kernel, they are less resource intensive than virtual machines (which instantiate a new OS for every virtualization).
Functionality: Docker uses Linux functionality like cgroups (which allows for compartmentalizing hardware resources) and namespaces to isolate the containers without having to create a new instance of the kernel for every virtualization.
Docker containers are also ephemeral: a container exists only for as long as the app or service running inside it needs to perform its task, after which it is cleaned up.
Why Docker: Virtualization and containers are advantageous for a distributed scaled system because of the ease of configuration for individual microservice functionality. This in turn lowers the size of the resultant code base, as well as allows for constant delivery (since the entire stack does not need to be taken down to instantiate a new container for a new microservice).
Definition: Kubernetes is an open-source container orchestration platform that allows for automating the container deployment, maintenance, and scaling process.
Functionality: Kubernetes consolidates containers into pods, which are groups of containers guaranteed to be hosted in a single location and can share resources. These pods are then organized into services, where the containers are all intended to interact with each other. These are deployed in Kubernetes Nodes on the API server architecture, which are accessed by clients via the Kube-proxy interface.
Why Kubernetes: By design, Kubernetes and by extension the container architecture it facilitates, meet a lot of the concerns and requirements of microservice architectures. Over time, as the system complexity increases, the automation of container management means that the service can be scaled and managed without hindering functionality, provided the core design is consistent with the problem it is attempting to solve.
In brief, the platform stack uses nginx servers with Zuul gateways to host Spring Boot microservices stored in Docker containers managed using Kubernetes. The servers rely on Kafka data streams to provide them with data that is indexed in ElasticSearch, and persisted in PostgreSQL databases.
Now that you have read the document, you should be better equipped to understand what that means, as well as the raison d’être for its current state. You should also be cognizant of the context in which the platform functions, and the nature of the solutions it is capable of providing.
Most importantly, you are now ready to jump into the technical documentation and be able to put it in perspective with the system at large, while being able to focus on the specific aspect with which you are concerned.
Still doesn’t make sense? Feels like something is missing. Is everything in this document wrong and bad and you can’t believe someone actually wrote this stuff out? Don’t worry! This is a collaborative effort, and your contribution will be most welcome. Ping the author(s), leave a comment, or better yet, edit the document yourself and keep improving it. The more the better.
Over time, this document is intended to help any new team members become familiar and capable with the platform, and anything you learn or design that is worth adding to their knowledge should be added here.
If you’re good to go, however, then get in touch with your team and they will let you know what is next.
DIGIT analytics enable -
administrators to view the dashboard based on which they can take day-to-day planning and operational decisions
citizens to view and assess how the city administration is delivering services to them
employees to identify immediate areas of focus so that they can direct their efforts accordingly
analysts and researchers to access data in a format that enables them to analyse data rapidly and provide deep insights
In order to enable the above in a scalable, secure and reliable manner, the DIGIT platform needs to ensure -
transaction data is extracted, transformed and made available in an analytical datastore in a timely manner
privacy issues are addressed as data is moved to the analytical data store
as the transaction data structure is modified, the extract and transform programs continue to work seamlessly
users will have the ability to extend the transformation to suit their needs. Data may need to be transformed multiple times to address reporting and analytical requirements.
user can design and modify dashboards as per their requirements
users can access data only based on their role
raw as well as analytical datasets are made available through open data APIs for analysts and researchers
anomalies detected are bubbled up to the right users at the right time
users can perform descriptive, diagnostic, predictive and prescriptive analytics seamlessly
real-time scenarios, e.g. IoT, can be catered to by the platform.
Sunbird cQube (https://cqube.sunbird.org/) is something we should look at to see how it fits into our requirements.
Microservices and Low Code No Code architectures have evolved to address very different sets of software engineering challenges. With the advent of information technology and especially after its explosion in the post internet era, two major problems emerged.
1. Scale Problem - How to design cost-effective, evolvable and reliable systems that can scale to meet the requirements of millions of users?
2. Speed Problem - How to accelerate the development of software?
To address the scaling problem, technology companies and systems designers created design concepts and technologies like hardware virtualization (cloud), containers (e.g. Docker), Service Orientation (e.g. API First approach), Asynchronous Processing (e.g. Queues) etc. These technologies and design concepts eventually were put together into microservices based architecture and are now used to develop large scalable systems.
To address the speed problem of software development, early engineers focused on automatic code generation using CASE tools. CASE tools aimed to generate “high quality, defect free, maintainable software”. The key idea was to use software design models like ER diagrams and data flow diagrams as input and then generate code from these diagrams. Several of these tools became popular in the 90s. The main problem with these tools was that when programmers made changes to the generated code, the source model would go out of sync and become unusable. This limited the adoption of CASE tools to the initial phases of projects. Similar concepts are used even today by many developers to generate boilerplate code instead of coding everything from scratch.
In parallel to CASE tools, 4GLs (Fourth Generation Languages) and RAD (Rapid Application Development) also began trending. 4GL was first introduced by James Martin in 1981 in his book “Application Development without Programmers”. 4GL languages focused on higher-level constructs like information rather than bits and bytes: databases, reports, workflows, GUIs (graphical user interfaces), etc., and were accompanied by drag-and-drop form designers. Soon, people realized that programming in these higher-level constructs has limited applicability (due to lower expressivity), and the resulting programs are harder to refactor. Most 4GLs were focused on traditional Windows-based applications; the internet moved user interfaces to HTML, which made 4GL languages less relevant. Most companies who bet on 4GL rebranded themselves around Business Process Management (BPM) or Rules Engines, and the 4GL trend faded away.
The other trend was Rapid Application Development (RAD). The early software development process was adopted from civil engineering and followed an extremely rigid waterfall model: requirements, then design, then build, then deployment, with no back and forth accepted. RAD changed that by allowing feedback loops between the various stages of development, which let developers incorporate learning from one phase into the previous phase - so, basically, some back and forth was allowed. Using CASE tools for design and for generating working models fit quite well with this approach.
Low Code No Code environments trace their roots to CASE, 4GL and RAD. The principles are the same - model-driven design, automatic code generation, and visual programming. Their benefits and limitations are also the same. They can generate simple applications quite fast; however, these applications inherently suffer from low expressivity, extensibility, evolvability and scalability. The platform providers hide behind “Low Code” by providing the ability to write code within the designer, and as the complexity of applications increases, developers end up writing more and more code to incorporate the required functionality. Given that no standards exist for LCNC platforms today, the underlying models and pieces of code are all stored in proprietary formats. This creates significant vendor lock-in.
Given the need to build a large number of simple applications and the shortage of software engineering talent, a combination of backend microservices with a low code no code front end may be the way forward. This would enable scalable systems to be delivered at speed. However, with the increasing diversity of channels - web, mobile, chat, voice, kiosks, social media, etc. - the bar for low code no code platforms has been raised significantly. A low code no code platform that can integrate/orchestrate backend microservices and enable digital service delivery through a wide number of channels is the need of the hour.
At the same time, while microservices based architectures have been very successful in addressing the issues around scale and maintainability, they have led to increasing complexity of deployment and operations. A plethora of tools is emerging to address these issues, and a DevOps engineer needs to be aware of these tools to be able to deploy and manage microservices.
Cocreation Platform
Digit Low Code No Code will enable citizens, government employees and partners to rapidly compose new solutions on top of the platform using a visual editor; knowledge of coding languages is not required. The premise is that such a platform will not only expedite development but also make it easier for everyone to create new applications. Especially for governments across the world trying to digitize their services and processes, low code no code can lead to significant acceleration.
Enter any government office and one will see loads of forms. To avail a service or apply for a scheme, one needs to fill one of these forms, attach relevant documents and submit it at the counter. Depending on the nature of the application, it is routed from one officer to another till it’s registered in a registry. Then an appropriate certificate is issued that enables the citizen to access the service or benefit from the scheme.
Today many of these forms are being digitised through discrete applications which are often poorly written, difficult to maintain and expensive to modify.
We are proposing to build an open source low code no code platform which will allow government employees, vendors and citizens to design applications using a visual application designer. Behind the scenes, the designer will emit an application model based on an open application modelling language or specification. The model will be registered with an application runtime that will bring up the appropriate application model based on the user action: it will display the appropriate interface, e.g. a form, or execute the appropriate workflow. The data will be stored in an electronic registry in a secure, private and immutable manner. If required, a digital certificate will be issued which can be verified online.
Building such a low code no code based CoCreation environment on open application specifications will unlock this space and enable government organisations to digitise these processes rapidly. It will ease access, remove inefficiency, increase observability and simplify maintenance & upgrades. It will enable governments to adopt these technologies without being locked into vendors' proprietary platforms and infrastructure.
Building an open collaboration environment around these open low code no code environments will enable governments, citizens and businesses, including startups, to collaborate and co-create new services and rapidly evolve them to meet the needs of citizens. The underlying open specifications and the accompanying open source implementation will enable multiple startups to innovate and compete in building better platforms. This will create a new digital ecosystem of players.
Typically, a low code no code platform has the following components:
1. User Interface or Interaction Designer and User Interface or Interaction Runtime Engine
2. Process or WorkFlow Designer and Process or Workflow Runtime Engine
3. Reports Designer and Reports Engine
Users are able to use the designers to specify user interfaces, flows and reports. This results in well defined specifications which are stored in the file system or a database. These specifications are used by the runtimes to display the UI, orchestrate the flow and generate the reports.
Depending on the scenario, users can start by designing the entities, e.g. Order, and then generate the forms/views on the entity. Or users can design the form (e.g. Google Forms) and the entities get generated in the background. The associated CRUD APIs for these entities are also generated.
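One way to picture such "generated" CRUD APIs is a single generic endpoint keyed by entity type, sketched below with Spring. The paths, the in-memory store and the omission of validation against the stored specification are simplifying assumptions, not how any particular platform implements this.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// A single generic controller standing in for per-entity generated CRUD APIs: the entity type
// comes from the path, and the payload would normally be validated against the stored specification.
@RestController
@RequestMapping("/entities")
public class GenericEntityController {

    // In-memory store used as a placeholder for the real registry/database.
    private final Map<String, Map<String, Map<String, Object>>> store = new ConcurrentHashMap<>();

    @PostMapping("/{entityType}")
    public Map<String, Object> create(@PathVariable("entityType") String entityType,
                                      @RequestBody Map<String, Object> payload) {
        String id = UUID.randomUUID().toString();
        payload.put("id", id);
        store.computeIfAbsent(entityType, t -> new ConcurrentHashMap<>()).put(id, payload);
        return payload;
    }

    @GetMapping("/{entityType}/{id}")
    public Map<String, Object> get(@PathVariable("entityType") String entityType,
                                   @PathVariable("id") String id) {
        return store.getOrDefault(entityType, Map.of()).get(id);
    }
}
```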
The advent of schema-free (NoSQL) databases and support for storage of JSON objects in relational databases has enabled storage of both the specifications and the entities whose schema those specifications define.
The creation, updation and deletion of these entities generate events that can trigger the workflows specified by the workflow designer. These workflows are typically a chain of event-condition-actions: each event contains a reference to the entity on which the conditions, typically if-then-else rules, are applied. If the conditions are met, the actions are executed, which can mean creating, updating or deleting another entity, or calling an API or external service that can assign work to or notify users to take appropriate actions.
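A minimal sketch of such an event-condition-action rule is shown below; the event name, threshold and notification action are invented purely for illustration.

```java
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// A rule pairs an event name with a condition (an if-then check on the entity)
// and an action to run when the condition holds.
record EcaRule(String onEvent,
               Predicate<Map<String, Object>> condition,
               Consumer<Map<String, Object>> action) {

    void apply(String event, Map<String, Object> entity) {
        if (onEvent.equals(event) && condition.test(entity)) {
            action.accept(entity);
        }
    }
}

class WorkflowDemo {
    public static void main(String[] args) {
        // "When an application entity is created with amount > 10000, notify an approver."
        EcaRule rule = new EcaRule(
                "ENTITY_CREATED",
                entity -> ((Number) entity.getOrDefault("amount", 0)).doubleValue() > 10_000,
                entity -> System.out.println("Notify approver for " + entity.get("id")));

        rule.apply("ENTITY_CREATED", Map.of("id", "APP-001", "amount", 25_000));
    }
}
```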
As entities are modified, the changed information is pushed into a message queue and then into an analytical datastore. The reports specified by the reports designer execute against this datastore (which is typically optimised for fast reads, e.g. ElasticSearch).
A case for federated architecture
COVID accelerated the use of digital payments infrastructure by various national and sub-national governments to enable Direct Benefit Transfers. While this helped provide support for millions across the world, a new challenge is emerging which requires due attention. In order to make direct transfers, governments need to identify the beneficiaries for the various schemes, based on scheme-specific criteria. The data required for determining eligibility may include land holdings, electricity usage, vehicle ownership, financial transactions, age, gender, caste, etc. These records currently reside in the respective departments, but many state governments are running initiatives to pull the data into a centralised database. Over time they have seeded all these databases with Aadhar and are now in a position to correlate this data to formulate a comprehensive profile of every citizen. While the objective of such databases is to identify eligible beneficiaries, there are several challenges in these initiatives that need to be thought through.
Single Source of Truth - The respective departments are the legal “registrars” of the respective attributes e.g. Vehicle records are owned by the Road Transport Department and so on. If data is being pushed into the central database, the ownership of ensuring the data is up to date should reside with the respective departments. The system must be designed in a manner to ensure that the most recent record is used to determine the eligibility criteria.
Security - Creating such a centralised database will make it a high risk asset and will require substantial investments in security to ensure adequate protection.
Privacy - Several questions around privacy arise which need to be addressed, e.g. will citizens have visibility into the attributes that are being stored and used for eligibility determination, is there a process for them to raise correction requests, what mechanisms are in place to limit the purpose for which these databases are used, can citizens opt out of such a database, etc.
Anomaly Detection - Since this database will be used for beneficiary eligibility, it will be a target for fraud. Mechanisms need to be put in place to detect anomalies e.g. population stability indexes must be computed and compared to ensure no large scale changes in the database are happening to enable inclusion in a specific scheme.
To address the above concerns, designers of these systems must consider a federated services architecture rather than centralised databases. Instead of pulling all the data into a central database, it may be possible to implement a centralised “Beneficiary Eligibility” service which in turn calls the respective departments’ “Beneficiary Eligibility” services, each of which returns a “Yes/No” answer. A scheme system queries the centralised beneficiary eligibility API by sending one or multiple records to it; the service then calls the respective department systems to check beneficiary eligibility in their respective databases and reverts with a result.
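A rough sketch of such a centralised eligibility service is shown below. The department endpoints, query parameters and the "all departments must say yes" aggregation policy are assumptions for illustration only; a real design would define these in the scheme rules.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// A central eligibility service fans the query out to department services and aggregates
// their yes/no answers, without ever copying the underlying departmental records.
public class BeneficiaryEligibilityService {

    // Hypothetical department endpoints; in practice these would come from a service registry.
    private static final List<String> DEPARTMENT_APIS = List.of(
            "https://transport.example.gov/eligibility",
            "https://revenue.example.gov/eligibility");

    private final HttpClient http = HttpClient.newHttpClient();

    public boolean isEligible(String beneficiaryId, String schemeId) throws Exception {
        for (String api : DEPARTMENT_APIS) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(api + "?beneficiaryId=" + beneficiaryId + "&schemeId=" + schemeId))
                    .GET()
                    .build();
            // Each department answers only "yes" or "no" against its own registry.
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            if (!"yes".equalsIgnoreCase(response.body().trim())) {
                return false; // one possible policy: any "no" makes the beneficiary ineligible
            }
        }
        return true;
    }
}
```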
A federated architecture as above ensures the legal registrars of the data are the ones enabled to determine eligibility, rather than transferring control of such an activity to a central department. It preserves a single source of truth, namely the legal registrars of the respective data, and there is no escalation of security and privacy risks beyond what already exists in these databases.
Work in Progress
Application Schema
Entity Schema
Entity Attribute Schema
Access
EntityView
Group
Tab
Field
Condition
Draft - Work in Progress
Governments provide multiple services to their citizens in areas such as education, health, food, law and order, and energy, among others. To do so, revenue is collected via taxes on income, property, and sales, as well as through payment on services such as water and electricity. In addition, various targeted welfare programs or schemes such as direct benefit transfers to weaker sections of the society, or food distribution at lower prices are launched by governments. To deliver these services and schemes, the government has to interact with vendors from different industries.
Most countries also have a federal structure, where responsibilities are distributed between national, sub-national, and local governments - which are further bifurcated into departments and sub-departments - to deliver services and collect revenue. The departments, in turn, are categorised by geographical locations into zones, districts, blocks, and villages, to facilitate smooth delivery of services/programs across the country.
In India, for example, the national government has 40-plus ministries and 20-plus independent bodies. Every ministry has three to five departments, and each consists of five sub-departments. There are 36 sub-national governments: 28 states and 8 union territories. Each has its own departments and sub-departments. The snapshot from the local government directory website (https://lgdirectory.gov.in) below gives the numbers of districts, sub-districts, blocks, villages and local bodies.
Many of these ministries, departments, sub-departments interact (exchange information) and transact (exchange money) to facilitate the delivery of services and programs run by the government. One can only try to understand how complex these interactions might get.
In the digital age, the interactions at the government level are undergoing rapid transformation. Applications are helping digitize the interactions, automate tasks and coordinate the flow of information too. While this does deliver the benefits of automation, it also ends up hard coding these interactions into software, locking the associated data in closed databases that are run on non-scalable hardware architecture. As each department builds these applications, the complex interactions within the ecosystem get encoded into various applications resulting in fragmented and siloed data. Data updates and access to real-time data becomes a challenge for the end-users of the services/programs. Consequently, citizens and vendors have to run from pillar to post to update or make changes in the data. Administrators, on the other hand, struggle to get an integrated view of the data on time, resulting in delayed or incorrect decisions.
It becomes the citizen's and the vendor's responsibility to keep data updated in these departmental applications - they have to run from pillar to post filling forms, attaching proofs and standing in long lines to submit applications. At the same time, administrators struggle to get an integrated view of the data at the right time, and are thus forced to either delay decisions or make decisions based on gut.
To address these concerns, departments start integrating these siloed departmental applications, and many of them also build out a multitude of web and mobile apps for citizens and vendors. (The design of many of these applications does not take diversity, access and infrastructure issues into consideration, which widens the "digital divide" - but that is another important issue that needs separate attention and will not be discussed here.)
As these applications start getting integrated with each other, the interactions get further etched into software code and proprietary data exchange mechanisms designed by software engineers. Over a period of time, making changes to these data exchange mechanisms requires cascading changes across several departmental applications, which becomes a very expensive proposition and a program management nightmare - leading to multiple failures in technology implementations. As implementation failures increase, no government officer is willing to take the risk of initiating change. The entire ecosystem gravitates towards a sub-optimal equilibrium.
It's important to understand that the problem does not lie in the technology but in the architecture, i.e. in the design of how the interactions are encoded - siloed applications encapsulating fragmented databases, using proprietary data exchange formats, running on non-scalable hardware. A platform-based approach tries to address these challenges.
In the 1890s, several large cities like London and New York were debating the "Great Manure Crisis". As cities grew, the number of horse carriages increased. London had 50,000 horses, each generating 15-35 pounds of manure a day. This led to several issues around health, land for stables, food for the horses, etc. An article in The Times, London, in 1894 stated: "In 50 years, every street in London will be buried under nine feet of manure". The problem seemed so insurmountable that many proclaimed that urban civilisation was dead. By 1912, this seemingly insurmountable problem had completely disappeared: automobiles powered by internal combustion engines, built by 400+ manufacturers, had replaced the horse carriages. Today, London has 2.6M registered cars.
The internal combustion engine (ICE) is an example of a platform building block. It solved the pivotal problem of converting fuel into motion. This unlocked the space and accelerated innovation: 400+ automobile manufacturers raced to build multiple end solutions, e.g. cars, trucks, ships, manufacturing plants, etc. The ICE transformed how automobiles were built, which reshaped how roads were built, which in turn reshaped how cities were built.
Platforms are a set of highly reusable building blocks with high complementarity. Each building block solves a key pivotal problem in a manner that it can be reused for building multiple solutions.
Platforms are powerful and have unintended consequences, e.g. automobiles led to an increase in accidents, obesity and the consumption of fossil fuels. Hence it is important that platforms are built in the open and co-created by involving stakeholders from across the ecosystem - government, business and citizens. Appropriate policy interventions should go hand in hand to accelerate adoption of the platform as well as to ensure its ill effects are stemmed.
Digital Platforms
Today we are surrounded by Digital Platforms like Google Search, Facebook, WhatsApp, Amazon, Uber. These are powerful platforms that solve pivotal problems and facilitate digital interactions & transactions for the participants in their ecosystem. Similarly, there are platforms like Aadhar, UPI, Sunbird, Digit Urban at different levels of maturity that are trying to solve pivotal problems, unlock the respective spaces they operate in and enable ecosystems to build solutions on top of these platforms.
Closed platforms are like walled gardens. The businesses that own these platforms are gatekeepers and define the rules of engagement on these platforms, driven by their business goals. The rules are defined to ensure the value of these platforms accrues to the businesses that own and run them. The rules of engagement for open platforms are shaped by an open collaborative process where everyone is free to participate. To ensure open platforms evolve in a coherent manner, a proper governance body is needed - to evolve the standards, ensure openness of the building blocks, and ensure free and fair distribution of value.
Digital Building Blocks
Digital Building Blocks that make up platforms can come in various forms. Some of them are listed below.
Protocols and Formats for Data Exchange e.g. SMTP, HTTP, HTML etc.
Shared Registries e.g. Aadhar
Data Exchange Platforms e.g. UPI
Shared Services e.g. Payment, Collection etc.
Protocols and data exchange formats like SMTP enable seamless exchange of information. Anyone can participate in the ecosystem by building or installing an open source system, e.g. an email server or webserver; one doesn't need permission from, or to pay hefty amounts to, gatekeepers. Similarly, services provided by shared registries, like eKYC on Aadhar, can be accessed as long as one has the appropriate permission from the citizen.
The illustration below graphically summarizes all the above points about digital platforms and demonstrates how platforms can unlock new possibilities.
When building digital platforms, especially for enabling interactions between government and citizens, it is important that certain key principles be applied to ensure adoption and evolution, and to avoid unintended consequences.
Open - We have already talked about how digital platforms are powerful and hence need to be open, to ensure value is distributed across the actors of the ecosystem rather than concentrated to benefit a few. This requires that these platforms be built using an open process, with open standards, technology, APIs and data principles.
Unbundled/Modular - To ensure high reuse, the building blocks must be unbundled into small, modular and well defined microservices. Instead of trying to pack complexity into one large integrated solution, e.g. an ERP, it is key that the problem space be broken down into smaller modular building blocks that can be assembled and also evolved independently.
Federated - The architecture of the platform must ensure value accrues to all stakeholders of the system. Traditional centralized application architectures tend to concentrate power in the hands of those who control the data. Special care must be taken to ensure the platform does not create information flows that create an imbalance of power within the federated structure of the government.
Security - The data stored on these platforms is of very high value and will be subject to continuous attacks. It is imperative that the highest standards of security be applied to data at rest and in transit.
Privacy - Governments deal with a lot of private data about citizens. The platform architecture must enforce the privacy of individuals wherever possible and enable solution developers to enhance that privacy.
Minimum - Even though high reuse is of high importance for platforms, a platform must store only the minimum data and provide only the functionality that is fit for purpose.
Scalable - Given the scale of government, it is imperative that the platform be designed to scale to sub-national and national levels.
Governments are starting to recognize the power of digital platforms - on one hand, they are trying to control the unintended consequences of large closed digital platforms, and on the other hand, they are trying to build digital platforms to accelerate the attainment of developmental goals.
Digital Public Goods Alliance
Initiatives like the Digital Public Goods Alliance (https://digitalpublicgoods.net/) have been set up as a "multi-stakeholder initiative with a mission to accelerate the attainment of the sustainable development goals in low- and middle-income countries by facilitating the discovery, development, use of, and investment in digital public goods."
Digital platforms are powerful and we are seeing some early successes; however, it is important to highlight that these are still early days. Platforms require sustained cooperation and cocreation amongst multiple stakeholders to design, develop, implement and sustain.
Any ecosystem consists of various stakeholders interacting with each other. Information and communication technologies are disintermediating these interactions. As interactions get encoded in technology, we have an opportunity to rethink them. Digital platforms - through shared data registries, open protocols and common services - unbundle ecosystems, make information available and provide an opportunity to rearchitect the interactions in the ecosystem. Solution designers who build on top of the platform then at least have a chance to innovate and rebuild solutions that ensure value and benefit accrue to all stakeholders. Development of multiple such solutions will unlock existing ecosystems that are stuck in sub-optimal equilibriums (be it health, finance, education, etc.).
Platforms themselves are only part of the solution; the reimagination of these future possibilities still needs to be done, and solutions will need to be built on top of these platforms. To make the point clearer: platforms, like internal combustion engines, make new solutions (e.g. cars, trucks) possible - they create powerful opportunities, but they are not "silver bullets".
DevSecOps is the philosophy of integrating security practices within the DevOps pipeline
As we scale DIGIT to a core platform and leverage the same across multiple product streams, concerns about platform security, services and the underlying Kubernetes infrastructure have increased. How do we adapt security practices for a containerized hybrid cloud environment? Security needs to be declarative, built-in, and automated; apps need to be natively more secure; and security needs to shift left in the application life cycle, whereas standard security practices start only after the application is deployed.
By developing security as code, we strive to create awesome products and services, provide insights directly to developers, and generally favour iteration over trying to always come up with the best answer before every release and deployment.
We will not simply rely on scanners and reports to make code better. We will attack products and services like an outsider to defend what we've created. We will learn the loopholes, look for weaknesses, and we will work to provide remediation actions instead of long lists of problems.
We will not wait for our organizations to fall victim to mistakes and attackers. We will not settle for finding what is already known; instead, we will look for anomalies yet to be detected. We will strive to be better partners by upholding platform values.
Best practices for automating security checks and remediation.
Hardening the container and Kubernetes infrastructure and workloads.
Detecting and responding to runtime threats with sustained efforts.
Integrate security scanners for containers: This should be part of the process for adding containers to the registry.
Automate security testing in the CI process: This includes running security static analysis tools as part of builds, as well as scanning any pre-built container images for known security vulnerabilities as they are pulled into the build pipeline.
Add automated tests for security capabilities into the acceptance test process: Automate input validation tests, as well as verification of authentication and authorization features.
Automate security updates, such as patches for known vulnerabilities: Do this via the DevOps pipeline. It should eliminate the need for admins to log into production systems while creating a documented and traceable change log.
Automate system and service configuration management capabilities: This allows for compliance with security policies and the elimination of manual errors. Audit and remediation should be automated as well.
Security starts with engineering; understand that developers are engineers whereas hackers are reverse engineers.
Encourage good security hygiene in engineering.
Continuous assessments and compliance checks.
Real-time threat alerting across apps and services.
Enable developers to drive iterative security changes.
Secure the CI/CD pipeline.
Release in small and frequent batches.
Embed code analysis into Q/A.
Use tools to detect that private keys or API information are not pushed on the Version Control.
Empower teams to improve security practices and make changes.
Quick review and approval process.
Changes must leave an audit trail.
Meet compliance requirements.
Enforce operational and security hygiene.
Establish strict password policies.
Audit everything from code pushes, pipelines and compliances.
Monitor systems for bad behaviour.
Monitor apps and services to detect and alert on threats.
Instrument services to identify compromises.
Built-in real-time alerting and controls.
Develop Ansible playbooks and response scenarios for IT and Security.
Conduct vulnerability scans and practices.
Conduct periodic scans of product build.
Code reviews and penetration tests.
Establish remediation SLAs.
Transform the team into security ninjas.
Participate in industry conferences.
Invest in security certifications.
Educate employees on security risks.
Prepare teams for incident response.
IDE Plugins — IDE extensions that can work like spellcheck and help to avoid basic mistakes at the earliest stage of coding (IDE is a place/program where devs write their code for those who don’t know). The most popular ones are probably DevSkim, JFrog Eclipse, and Snyk.
Pre-Commit Hooks — Tools from this category prevent us from committing sensitive information like credentials into the code management platform. There are some open-source options available, like git-hound, git-secrets, and repo-supervisor.
Secrets Management Tools allow us to control which service has access to which password specifically. Big players like AWS, Microsoft, and Google have their own solutions in this space, but we will use a cloud-provider-agnostic tool when multi-cloud or hybrid-cloud is in place.
Static Application Security Testing (SAST) is about checking source code (when the app is not running). There are many free & commercial tools in the space, as the category is over a decade old. Unfortunately, they often result in a lot of false positives and can’t be applied to all coding languages. What’s worse is that they take hours (or even days) to run, so the best practice is to do incremental code tests during the weekdays and scan the whole code during the weekend.
Source Composition Analysis (SCA) tools are straightforward — they look at libraries that we use in our project and flag the ones with known vulnerabilities. There are dozens of them on the market, and they are sometimes offered as a feature of different products — e.g. GitHub.
Dynamic Application Security Testing (DAST) is the next one in the security chain, and the first one testing running applications (not the source code as SAST — we can read about other differences here). It provides fewer false positives than SAST but is similarly time-consuming.
Interactive Application Security Testing (IAST) combines SAST and DAST elements by placing an agent inside the application and performing real-time analysis anywhere in the development process. As a result, the test covers both the source code and all the other external elements like libraries and APIs (this wasn’t possible with SAST or DAST, so the outcomes are more accurate). However, this kind of testing can have an adverse impact on the performance of the app.
Secure infrastructure as code — As containers are gaining popularity, they become an object of interest for malware producers. Therefore we need to scan Docker images that are downloaded from public repositories, and tools like Clair will highlight any potential vulnerabilities.
Compliance as code tools will turn compliance rules and policy requirements into automated tests. To make it possible dev teams need to translate human-readable rules received from non-tech people into code, and compliance-as-a-code tools should do the rest (point out where we are breaking the rules or block updates if they are not in line with the policies).
Runtime application self-protection (RASP) allows applications to run continuous security checks and react to attacks in real time by getting rid of the attacker (e.g. closing their session) and alerting the team about the attack. Similarly to IAST, it can hurt app performance. It is the fourth testing category shown in the pipeline (after SAST, DAST, and IAST), and we should have at least two of them in the stack.
Web Application Firewall (WAF) lets us define specific network rules for a web application and filter, monitor, and block HTTP traffic to and from a web service when it corresponds to known patterns of attacks e.g. SQL injection. All big cloud providers like Google, AWS and Microsoft have got their WAF, but there are also specialised companies like Cloudflare, Imperva and Wallarm, for example.
Monitoring tools — as mentioned in a DevOps guide, monitoring is a crucial part of the DevOps manifesto. DevSecOps takes it to the next level and covers not only things like downtime but also security threats.
Chaos engineering. Tools from this category allow us to test the app under different scenarios and patch holes before problems emerge. “Breaking things on purpose is preferable to being surprised when things break,” as Mathias Lafeldt from Gremlin put it.
Vulnerability management — these tools help identify the holes in the security systems. They classify weaknesses by the potential impact of malicious attacks taking advantage of them, so that one can focus on fixing the most dangerous ones first. Some of the tools might come with add-ons that automatically fix found bugs. This category is full of open-source solutions.
All content on this website by eGov Foundation is licensed under a Creative Commons Attribution 4.0 International License.
Application Schema
ID
Name
Entities - Array of Entity

Entity Schema
ID
Name
Attributes - Array of Attribute
Accesses - Array of Access

Entity Attribute Schema
ID
Name
Type
Required
MaxLength
MinLength
MaxValue
MinValue
DefaultValue
PossibleValues

Access
Role - Type of Access, e.g. Owner, Editor, Viewer, Commenter

EntityView
ID
Name
DisplayAs
ViewType - e.g. Form, Table, List, Chat
EntityID
Groups - Array of Group. Optional
Tabs - Array of Tab. Optional
Fields - Array of Field
Conditions - Array of Condition

Group
ID
Name

Tab
ID
Name

Field
ID
EntityAttributeID - The ID of the entity attribute to which the field is mapped
DisplayAs
GroupID - Optional. The ID of the Group to which the field belongs
TabID - Optional. The ID of the Tab to which the field belongs
Help - Help to be displayed on the field
IsPII - Is the field Personally Identifiable Information

Condition
ID
Target1FieldID - The ID of the target field whose value needs to be checked
ConditionType - e.g. Equals, Not Equals, Greater Than, Less Than, etc.
Target2Type - Type of Target 2, e.g. Field or Value
TargetValue - Fixed value to compare with
Target2FieldID - The ID of the field whose value the Target 1 field value needs to be compared with
Action - The action to be performed if the condition is met, e.g. Hide or Show, Required or Optional, Disable or Enable
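To make the draft specification above concrete, the sketch below models it as plain Java records. The Java types chosen for each attribute (strings for IDs and enumerated values, boxed numbers for optional limits) are assumptions, since the draft does not specify them.

```java
import java.util.List;

// A sketch of the draft specification as plain data types; the names mirror the definitions above.
record Application(String id, String name, List<Entity> entities) {}

record Entity(String id, String name, List<EntityAttribute> attributes, List<Access> accesses) {}

record EntityAttribute(String id, String name, String type, boolean required,
                       Integer maxLength, Integer minLength,
                       Double maxValue, Double minValue,
                       Object defaultValue, List<Object> possibleValues) {}

record Access(String role, String typeOfAccess) {}   // e.g. Owner, Editor, Viewer, Commenter

record EntityView(String id, String name, String displayAs,
                  String viewType,                    // e.g. Form, Table, List, Chat
                  String entityId, List<Group> groups, List<Tab> tabs,
                  List<Field> fields, List<Condition> conditions) {}

record Group(String id, String name) {}

record Tab(String id, String name) {}

record Field(String id, String entityAttributeId, String displayAs,
             String groupId, String tabId, String help, boolean isPii) {}

record Condition(String id, String target1FieldId,
                 String conditionType,                // e.g. Equals, Not Equals, Greater Than
                 String target2Type,                  // Field or Value
                 Object targetValue, String target2FieldId,
                 String action) {}                    // e.g. Hide or Show, Required or Optional
```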