DIGIT Docs
v0.3.1
All content on this page by eGov Foundation is licensed under a Creative Commons Attribution 4.0 International License.

Project Factory

When creating data on a server at scale from Excel input, the choice between Java, Node.js, and Python depends on factors such as performance, ecosystem support, scalability, and ease of development.

Here’s a breakdown of all three languages for this use case:


1. Java

Advantages:

  • Performance: Java delivers high performance for CPU-bound tasks thanks to the JVM's just-in-time compilation and efficient memory management.

  • Scalability: Java is a proven choice for large-scale enterprise systems, supporting high concurrency via multi-threading and frameworks like Spring Boot and WebFlux.

  • Stability: Java is ideal for enterprise-grade applications requiring strict type safety and long-term stability.

Disadvantages:

  • Verbose Development: Java requires more boilerplate code and setup, which slows initial development compared to Python or Node.js.

  • Complexity for I/O: Non-blocking I/O requires additional frameworks like Netty or reactive programming (WebFlux), adding complexity.

  • Startup Time: Java services have longer initialization times and higher memory usage than Node.js.

  • Excel Processing: Processing large Excel datasets is often more complex and resource-intensive, even with the help of libraries like Apache POI and JExcel.

2. Node.js

Advantages:

  • Event-Driven & Non-Blocking I/O: Node.js is excellent for I/O-heavy operations such as sending HTTP requests or interacting with APIs/servers concurrently.

  • Concurrency: With its single-threaded event loop and language features like async/await, Node.js is efficient for tasks that involve network operations.

  • Excel Processing Libraries: Node.js has libraries like xlsx and exceljs for reading and writing Excel files. While they are performant, they are not as feature-rich as Python’s pandas.

  • Stream Support: Node.js natively supports streams, allowing large files to be processed in chunks without loading them fully into memory.

  • Scalability: Node.js performs well under high loads and can handle a massive number of concurrent connections due to its lightweight architecture.
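As a minimal sketch of the stream-based, chunked processing described above, the following uses only Node.js built-ins. It assumes rows arrive as comma-separated text lines, which stands in for sheet rows; a real .xlsx file would need a library's streaming reader instead. The names processInBatches and handleBatch are hypothetical.

```javascript
const { Readable } = require("node:stream");
const readline = require("node:readline");

// Read rows one at a time and hand them off in fixed-size batches, so that
// memory use is bounded by batchSize rather than by the size of the input.
async function processInBatches(input, batchSize, handleBatch) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  let batch = [];
  let total = 0;
  for await (const line of lines) {
    if (line.trim() === "") continue; // skip blank rows
    batch.push(line.split(","));
    if (batch.length === batchSize) {
      await handleBatch(batch);
      total += batch.length;
      batch = [];
    }
  }
  if (batch.length > 0) {
    // Flush the final, possibly smaller, batch.
    await handleBatch(batch);
    total += batch.length;
  }
  return total;
}

// Usage with an in-memory stream standing in for a large file.
const sample = Readable.from(["code,name\nB1,North\n", "B2,South\n"]);
processInBatches(sample, 2, async (rows) => console.log("batch of", rows.length))
  .then((total) => console.log("rows processed:", total));
```

Because each batch is awaited before the next one is collected, a slow downstream step (for example, an API call per batch) naturally throttles how fast the input is consumed.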

Disadvantages:

  • Data Processing: Node.js lacks the robust and mature data manipulation libraries Python offers (e.g., pandas), making it less efficient for complex data transformations.

  • CPU-Bound Operations: Node.js struggles with CPU-intensive tasks like large-scale data processing since it is single-threaded by default. This can be mitigated using worker threads.
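The worker-thread mitigation mentioned above can be sketched roughly as follows, using the built-in worker_threads module. The CPU-heavy step here (summing a numeric column) is only a placeholder, and sumInWorker and parallelSum are hypothetical names; the inline worker source with eval: true is used just to keep the sketch in one file.

```javascript
const { Worker } = require("node:worker_threads");

// Worker source kept inline (eval: true) so the sketch is a single file.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  // Placeholder CPU-heavy step: sum one chunk of numeric values.
  let sum = 0;
  for (const value of workerData.values) sum += value;
  parentPort.postMessage(sum);
`;

// Run one chunk in its own thread and resolve with the partial result.
function sumInWorker(values) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { values } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// Split the input into roughly equal chunks, one per worker, and combine.
// Assumes workerCount >= 1.
async function parallelSum(values, workerCount) {
  const chunkSize = Math.ceil(values.length / workerCount);
  const chunks = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    chunks.push(values.slice(i, i + chunkSize));
  }
  const partials = await Promise.all(chunks.map(sumInWorker));
  return partials.reduce((a, b) => a + b, 0);
}

// Usage: the main thread stays free for I/O while the workers compute.
parallelSum([1, 2, 3, 4, 5, 6], 2).then((sum) => console.log("sum:", sum));
```

Starting a worker has a cost, so this pattern tends to pay off only when each chunk involves substantial computation.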

3. Python

Advantages:

  • Excel Handling Libraries: Python has excellent libraries like pandas, openpyxl, and xlrd for reading, manipulating, and writing Excel files efficiently.

  • Data Manipulation: Python excels at processing and analyzing large datasets due to its data science-oriented libraries like pandas, NumPy, and Dask (for parallel processing).

  • Built-in Support for Parallelism: Libraries like multiprocessing or concurrent.futures allow Python to distribute processing of huge datasets across CPU cores.

  • Ease of Development: Python’s simplicity and extensive ecosystem make it easier to implement and test scripts for such tasks.

  • Data Export: Python can easily integrate with databases, APIs, or servers for data creation through libraries like requests (HTTP requests) or sqlalchemy (database connections).

Disadvantages:

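  • Slower Execution Speed: Python’s Global Interpreter Lock (GIL) prevents threads from executing Python bytecode in parallel, which limits multi-threaded CPU-bound work. Libraries like asyncio and threading still help with I/O-heavy tasks, and multiprocessing sidesteps the GIL for CPU-bound ones.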

  • Memory Management: Python can use more memory for extremely large datasets compared to Node.js.

  • Scalability: If you need to process millions of concurrent requests, Python may require more effort to scale.

When to Choose Node.js:

  • The task is I/O-intensive (e.g., creating data on other servers via HTTP APIs).

  • You need high concurrency and scalability.

  • You are working with large Excel files and want to leverage streaming to avoid loading entire files into memory.

  • You are already using a Node.js-based ecosystem and prefer to keep it consistent.


When to Choose Python:

  • You need to process and transform huge datasets in Excel efficiently.

  • Your task involves heavy data manipulation or analytics.

  • You prefer working with established libraries like pandas and openpyxl.

  • Your use case is CPU-bound rather than I/O-bound (e.g., processing Excel locally before sending data to a server).

Hybrid Approach (Optional)

For complex use cases, you can use both:

  • Use Python for preprocessing and transforming large Excel files.

  • Use Node.js for efficient HTTP requests to create data on other servers.


Final Recommendation:

  • If your task involves heavy Excel processing and transformations: Use Python.

  • If your task focuses on sending data concurrently to other servers: Use Node.js.

If both Excel processing and data creation are important and you’re comfortable with Python, Python is often the better choice because of its ecosystem and ease of data manipulation.


Conclusion

Considering the above points, we chose Node.js to implement the Project Factory service. The service creates boundaries from the input Excel file, creates projects for the selected boundaries within a campaign, and creates entities such as facilities and users along with the mappings between those entities and the created projects. We use the exceljs library to process the entity information in the input Excel data. In our observation, creating all the data (such as projects and project mappings) for 3000+ boundaries completes within 15 minutes under high concurrency.
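To illustrate the kind of concurrent data creation described here, the following is a minimal sketch of a bounded-concurrency runner built on async/await. It is not the Project Factory implementation: runWithConcurrency and createProject are hypothetical names, and the timer stands in for an HTTP call to a downstream service.

```javascript
// Run task(item) for every item with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function runWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical stand-in for an HTTP call that creates one project per boundary.
async function createProject(boundaryCode) {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return { boundaryCode, status: "created" };
}

// Usage: create projects for a list of boundary codes, ten at a time.
const boundaryCodes = Array.from({ length: 25 }, (_, i) => `B${i + 1}`);
runWithConcurrency(boundaryCodes, 10, createProject)
  .then((created) => console.log("created", created.length, "projects"));
```

Capping the number of in-flight calls keeps throughput high without overwhelming the downstream services; the right limit depends on what those services can absorb.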
