
Minimal storage – Change feed to replicate data

Azure Front Door
Azure App Service
Azure Functions
Azure Cosmos DB
Azure Table Storage

This article presents a high-availability solution for a web application that manages large volumes of data that need to be accessible within a specific time frame. The solution uses Azure Cosmos DB as the primary data store and uses the Azure Cosmos DB change feed to replicate data to low-cost secondary storage. After the specified time period, the solution uses Azure Functions to delete the data from Azure Cosmos DB. The data in secondary storage remains available longer for auditing and analysis by other solutions. The solution replicates data to different data services, which provides high durability.

Architecture

Diagram that shows the minimal storage architecture.

Download a Visio file of this architecture.

Data flow

The following data flow corresponds to the previous diagram:

  1. The client authenticates by using Microsoft Entra ID and is granted access to web applications that are hosted on Azure App Service.

  2. Azure Front Door, which provides layer-7 load balancing and web application firewall capabilities, routes user traffic to the standby region if there's a regional outage.

  3. App Service hosts websites and RESTful web APIs. Browser clients run Asynchronous JavaScript and XML (AJAX) applications that call the APIs.

  4. Web APIs delegate background tasks to code that's hosted in Functions. The tasks are queued as messages in Azure Queue Storage.

  5. The queued messages trigger the functions, which perform the background tasks.

  6. Azure Managed Redis caches database data for the functions. The solution offloads database reads for slowly changing data and accelerates the function apps and web apps by using the cache.

  7. Azure Cosmos DB holds recently generated data.

  8. Azure Cosmos DB issues a change feed that can be used to replicate changes.

  9. A function app reads the change feed and replicates the changes to Azure Table Storage tables. Another function app periodically removes expired data from Azure Cosmos DB.

  10. Table Storage provides low-cost storage.
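Step 6 of the data flow offloads reads for slowly changing data to a cache. The article doesn't specify a caching pattern, so the following is only a minimal sketch of one common option, a cache-aside read. The `TtlCache` class is a hypothetical in-memory stand-in for Azure Managed Redis, and `load_from_database` is a hypothetical callable that stands in for an Azure Cosmos DB read.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """Hypothetical in-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it so the caller reloads from the database.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def read_through_cache(key: str, cache: TtlCache,
                       load_from_database: Callable[[str], Any]) -> Any:
    """Return the cached value, or load it from the database and cache it on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_database(key)
        cache.set(key, value)
    return value
```

A short expiry keeps the cache close to the database at the cost of more reads. For slowly changing data, a longer expiry is usually acceptable.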

Components

  • Microsoft Entra ID is an identity and access management service that can synchronize with an on-premises directory. In this architecture, it authenticates users and grants access to web applications that are hosted on App Service.

  • Azure Front Door is a secure content delivery network and load balancer. In this architecture, it accelerates content delivery, provides failover capabilities, and protects apps from cyber threats.

  • App Service is a fully managed service that developers use to build, deploy, host, and scale web apps. You can build apps by using .NET, Node.js, Java, Python, or PHP. Apps can run in containers or on Windows or Linux. In this architecture, App Service hosts the web interface and REST APIs for the application. For more information about web APIs, see RESTful web API design.

  • Functions provides an environment to run small pieces of code, called functions, without having to establish an application infrastructure. You can use it to process bulk data, integrate systems, work with Internet of Things (IoT) devices, and build simple APIs and microservices. The microservices that you build this way can connect to other Azure services, and the underlying platform stays up to date without server maintenance on your part. In this architecture, Functions runs background tasks like data replication and expired record deletion.

  • Azure Storage is a set of scalable and secure cloud services for data, apps, and workloads. In this architecture, Storage provides Queue Storage for task messaging and Table Storage for low-cost replicated data storage.

    • Queue Storage provides simple, cost-effective, durable message queueing for large workloads. This architecture uses Queue Storage for task messaging.

    • Table Storage is a NoSQL key-value store for rapid development that uses massive semi-structured datasets. The tables are schemaless and adapt according to need. Access is fast and cost-effective for many applications. This architecture uses Table Storage to store a synchronized and restructured copy of the data in Azure Cosmos DB.

  • Azure Managed Redis is a fully managed in-memory caching service and message broker for data and state sharing between compute resources. To improve the performance of high-throughput online transaction processing applications, design them to scale by using an in-memory data store, such as Azure Managed Redis. In this architecture, Azure Managed Redis accelerates access to frequently used data, which improves performance for function apps and web apps.

  • Azure Cosmos DB is a globally distributed, multimodel database that powers your solutions to elastically and independently scale throughput and storage across any number of geographic regions. It provides throughput, latency, availability, and consistency guarantees with comprehensive service-level agreements. In this architecture, Azure Cosmos DB stores recent data and emits a change feed that you can use to replicate updates to Table Storage.

Alternatives

  • Azure Traffic Manager directs incoming DNS requests across the global Azure regions based on your choice of traffic routing methods. It also provides automatic failover and performance routing.

  • Azure Container Apps is a fully managed, serverless container service that developers use to build and deploy modern apps at scale.

  • Azure Kubernetes Service (AKS) is a fully managed Kubernetes service for containerized application deployment and management. You can use it to implement a microservices architecture with components that scale independently and on demand.

  • Azure Container Instances runs tasks without requiring infrastructure management. It's useful during development and to run unscheduled tasks.

  • Azure Service Bus is a reliable cloud messaging service for simple hybrid integration. It can be used instead of Queue Storage in this architecture. For more information, see Storage queues and Service Bus queues - compared and contrasted.

Scenario details

This solution stores large volumes of web application data in Azure Cosmos DB. Web apps that handle massive amounts of data use Azure Cosmos DB to elastically and independently scale throughput and storage.

When data in the database changes, the Azure Cosmos DB change feed surfaces the changes to an event-driven Functions trigger. A function then runs and replicates the changes to Table Storage tables, which provide a low-cost storage solution. You can also orchestrate broader downstream data movement by using Azure Data Factory pipelines or Data Factory in Microsoft Fabric to land data in analytics zones.
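The replication step can be sketched as a mapping from a changed document to a Table Storage entity, which requires a `PartitionKey` and a `RowKey`. This is an illustrative sketch, not the article's implementation. The `deviceId` field, the `SourceTimestamp` property, and the dictionary that stands in for the table are assumptions, and the sketch assumes flat documents with simple property values.

```python
from typing import Any, Dict, Iterable, Tuple

TableKey = Tuple[str, str]


def to_table_entity(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map a changed document to a Table Storage style entity."""
    entity: Dict[str, Any] = {
        "PartitionKey": str(doc["deviceId"]),  # hypothetical partition key field
        "RowKey": str(doc["id"]),
    }
    for name, value in doc.items():
        # Skip fields that are already mapped and system properties such as _ts.
        if name in ("id", "deviceId") or name.startswith("_"):
            continue
        entity[name] = value
    # Keep the source's last-modified time (epoch seconds) for later auditing.
    entity["SourceTimestamp"] = doc.get("_ts")
    return entity


def replicate(changes: Iterable[Dict[str, Any]],
              table: Dict[TableKey, Dict[str, Any]]) -> int:
    """Upsert every changed document into the table and return the count."""
    count = 0
    for doc in changes:
        entity = to_table_entity(doc)
        table[(entity["PartitionKey"], entity["RowKey"])] = entity
        count += 1
    return count
```

In a deployed function, the dictionary would be replaced by an upsert call against Table Storage, and `changes` would be the batch of documents that the change feed trigger delivers.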

The web app needs the data for only a limited amount of time. A function runs periodically and deletes expired data from Azure Cosmos DB, which reduces costs. You can trigger functions on demand or schedule them to run at specific times.
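The expiration logic can be sketched as a cutoff comparison against each document's `_ts` system property, which Azure Cosmos DB sets to the last-modified time in epoch seconds. This is a minimal sketch under assumptions: the dictionary stands in for a container, and `retention_seconds` is a hypothetical setting. Because `_ts` reflects the last modification, an application that needs expiry based on creation time would have to store its own timestamp field.

```python
from typing import Any, Dict, Iterable, List


def select_expired(docs: Iterable[Dict[str, Any]], now_epoch: int,
                   retention_seconds: int) -> List[str]:
    """Return the IDs of documents last modified before the retention cutoff."""
    cutoff = now_epoch - retention_seconds
    return [doc["id"] for doc in docs if doc["_ts"] < cutoff]


def purge_expired(container: Dict[str, Dict[str, Any]], now_epoch: int,
                  retention_seconds: int) -> List[str]:
    """Delete expired documents from the stand-in container and return their IDs."""
    expired = select_expired(list(container.values()), now_epoch, retention_seconds)
    for doc_id in expired:
        del container[doc_id]
    return expired
```

A timer-triggered function could call `purge_expired` on a schedule. Before it deletes a document, a production version would also need to confirm that the document already exists in secondary storage.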

Potential use cases

This solution is appropriate for any application that:

  • Uses a massive amount of data.
  • Requires data to be available within a specific time frame.
  • Uses data that expires.

Examples include apps that:

  • Personalize customer experience and drive engagement by using live data feeds and sensors in physical locations.

  • Track customer spending habits and shopping behavior.

  • Track vehicle fleets and improve efficiency and safety by using vehicle location, performance, and driver behavior data.

  • Forecast weather.

  • Monitor and manage traffic systems.

  • Analyze manufacturing IoT data.

  • Monitor smart meter data.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Well-Architected Framework.

Reliability

Reliability helps ensure that your application can meet the commitments that you make to your customers. For more information, see Design review checklist for Reliability.

  • The Azure Cosmos DB change feed guarantees at-least-once delivery. Design your replication function to be idempotent so that duplicate events don't generate inconsistent data in Table Storage.

  • Azure Front Door provides automatic regional failover. If the primary region becomes unavailable, traffic routes to the standby region without manual intervention.
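To illustrate the idempotency guidance, the following sketch applies a change as a keyed upsert, so that replaying the same batch leaves the table in the same state. The version guard on `SourceTimestamp` is a defensive addition and an assumption of this sketch, not something the article prescribes. The entity shape and the dictionary that stands in for Table Storage are also hypothetical.

```python
from typing import Any, Dict, Tuple

Entity = Dict[str, Any]
TableKey = Tuple[str, str]


def apply_change(table: Dict[TableKey, Entity], entity: Entity) -> bool:
    """Upsert an entity unless a newer version is already stored.

    Returns True when the entity is written and False when it's skipped.
    """
    key = (entity["PartitionKey"], entity["RowKey"])
    current = table.get(key)
    if current is not None and current["SourceTimestamp"] > entity["SourceTimestamp"]:
        # A newer version is already stored, so a late duplicate must not overwrite it.
        return False
    # Writing by key means that a duplicate of the same version is harmless.
    table[key] = dict(entity)
    return True
```

Because the write is keyed and never appends, delivering an event twice produces the same stored entity as delivering it once.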

Cost Optimization

Cost Optimization focuses on ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.

  • The primary cost benefit comes from moving expired data out of Azure Cosmos DB, which is billed for throughput in request units (RUs) and for consumed storage, into Table Storage, which is billed per transaction and per GB stored. Table Storage is typically the cheaper option for infrequently accessed data.

  • If your workload has predictable throughput requirements, consider reserved capacity for Azure Cosmos DB.

  • Use the change feed for replication. This method reduces code maintenance when compared with replication in the core application.

  • This solution incurs extra costs for secondary storage and for the functions that manage data replication and expiration.
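The trade-off in the preceding points can be expressed as simple arithmetic: the storage cost that you avoid in Azure Cosmos DB must exceed the cost of the secondary copy plus the replication and expiration functions. The following sketch is only a rough model. Every parameter is a hypothetical input in arbitrary currency units, not real Azure pricing, and the model ignores throughput and transaction charges.

```python
def net_monthly_saving(total_gb: float, retained_gb: float,
                       primary_price_per_gb: float,
                       secondary_price_per_gb: float,
                       replication_overhead: float) -> float:
    """Estimate the monthly saving from expiring data out of primary storage.

    Without the solution, all data (total_gb) stays in primary storage.
    With the solution, only retained_gb stays there, while secondary storage
    holds a copy of all data and the functions add a fixed overhead.
    """
    baseline = total_gb * primary_price_per_gb
    with_solution = (retained_gb * primary_price_per_gb
                     + total_gb * secondary_price_per_gb
                     + replication_overhead)
    return baseline - with_solution
```

A negative result means that the extra storage and function costs outweigh the saving at the given data volume and retention window.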

Operational Excellence

Operational Excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Design review checklist for Operational Excellence.

  • You need to migrate existing data. The migration process requires ad hoc scripts or routines to copy old data to storage accounts. When you migrate the data, use time stamps and copy flags to track migration progress.

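  • Make sure that the deletions your functions perform in Azure Cosmos DB don't propagate to secondary storage. The change feed's default latest version mode doesn't capture deletes. If you use the all versions and deletes mode or soft-delete markers, configure the replication function to ignore the delete events that your cleanup functions generate. This approach prevents removal of entries from Table Storage.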
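The migration guidance above, which tracks progress with time stamps and copy flags, might look like the following sketch. The `copied`, `copiedAt`, and `createdAt` field names are hypothetical, and the list and dictionary stand in for the source database and the destination storage account.

```python
from typing import Any, Dict, List

TRACKING_FIELDS = ("copied", "copiedAt")


def migrate_batch(source: List[Dict[str, Any]],
                  destination: Dict[str, Dict[str, Any]],
                  batch_size: int, now_epoch: int) -> int:
    """Copy up to batch_size uncopied records, oldest first, and flag them."""
    pending = [record for record in source if not record.get("copied")]
    pending.sort(key=lambda record: record["createdAt"])
    copied = 0
    for record in pending[:batch_size]:
        # Copy the payload without the tracking fields.
        destination[record["id"]] = {
            name: value for name, value in record.items()
            if name not in TRACKING_FIELDS
        }
        # Flag the source record so that a rerun skips it.
        record["copied"] = True
        record["copiedAt"] = now_epoch
        copied += 1
    return copied
```

Because each record is flagged only after it's written to the destination, an interrupted run can be restarted and it resumes with the remaining records. A record that was written but not yet flagged is simply copied again under the same key.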

Performance Efficiency

Performance Efficiency refers to your workload's ability to scale to meet user demands efficiently. For more information, see Design review checklist for Performance Efficiency.

  • Change feed processing latency affects how quickly data becomes available in Table Storage. To meet your latency requirements, scale the function app plan and tune the batch settings of the change feed trigger.

  • To avoid hot partitions, choose an Azure Cosmos DB partition key that distributes write throughput evenly across logical partitions.

  • Azure Managed Redis reduces read pressure on Azure Cosmos DB for slowly changing data, which lowers latency and RU consumption.
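One way to evaluate a candidate partition key before you adopt it is to measure how evenly a sample of writes spreads across key values. The following sketch is an assumption-laden illustration. The `write_skew` helper and its skew ratio are hypothetical, not an Azure Cosmos DB metric.

```python
from collections import Counter
from typing import Hashable, Iterable


def write_skew(partition_keys: Iterable[Hashable]) -> float:
    """Return the busiest key's write count divided by the mean count per key.

    A value near 1.0 suggests an even spread. Larger values suggest a hot key.
    """
    counts = Counter(partition_keys)
    if not counts:
        return 0.0
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean
```

You could run this check over the partition key values from a representative sample of writes and compare candidate keys by their ratios.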

Contributors

Microsoft maintains this article. The following contributors wrote this article.

Principal author:

  • Nabil Siddiqui | Cloud Solution Architect - Digital and Application Innovation


Next steps