Monday, September 11, 2017

Designing Azure Metadata Service


Some time back we have announced the general availability of Azure Instance Metadata Service (IMDS). IMDS has been designed to deliver instance metadata information into every IaaS virtual machines running on Azure over a REST endpoint. IMDS works as a data aggregation service and fetches data from various sources and surfaces it to the VM in a consistent manner. Some of the data can already be on the physical machine running the VM and others could be inside other regional service which are remote from the machine.

As you can imagine the scale of usage of this service is immense and spans across globe (at the time of writing 36 regions across the world) and Azure usage doubles YoY. So any design for IMDS has to be highly scalable and built for future growth.

We had many options to build this service both based on the various reliability parameters we wanted to hit as well as in terms of engineering ease.


Given a typical cloud hierarchical layout, you can imagine such a service to be built in any one of the following ways

  1. Build it like any other cloud service that runs on its own IaaS or PaaS infrastructure, with load-balancers, auto-scaling, mechanisms for distributing across regions, sharding etc.
  2. Dedicate machines in clusters or data centers that run this service locally
  3. Run micro-services directly in the physical machines that host the VMs

Initially building a cross region managed service seems like a simpler choice. Pick up any of the standard REST stack, deploy using any of the many deployment models available in Azure and go with that. With auto-scaling, load balancers it should just work and scale.

Like with any distributed systems we looked into our CAP model.

  1. Consistency: We could live with a more relaxed eventual consistency model for metadata. You can update the metadata of a virtual machines by making changes to it in the portal or using Azure CLI and eventually the virtual machine gets this last updated value
  2. Availability: The data needs to be highly available because various key pieces in the azure internal stack takes dependency on this metadata along with customer code running inside the VM
  3. Partition: The network is highly partitioned as is evident from the diagram above

Metadata of virtual machines is updated less frequently, however is used heavily across the stack (reads are much more common than updates). We needed to guarantee very high availability over a very highly partitioned infrastructure. We chose to optimize on partition tolerance and availability with eventual consistency. With that having a regional service was immediately discarded because it is not possible to provide high enough availability with that model.

Coupled with the above requirements and our existing engineering investments we chose to go with approach #3 of running IMDS as a micro service on each Azure host machine.

  1. Data is fetched and cached on every machine, which means that data is lower in liveliness but is always eventually consistent as data gets pushed into those machines. Varying levels of liveliness exists based on what specific source the metadata is fetched from. Some metadata anyway needs to be pushed into the machine before it is applied and is hence always live, others like say VM tags has lower liveliness guarantee
  2. Since the data is served from the same physical machine, the call doesn’t leave the machine at all and we can provide very high availability. Other than ongoing software deployments and system errors the data is always available. There is no network partition.
  3. There is no need to further balance load or shard out data because the data is on the machine where it is being served. The solution automatically scales with Azure because more customers means more Azure machines running them and more placed IMDS can run on
  4. However, deploying and telemetry at this scale is tough. Imagine how large Azure deployment is and consider deploying and updating a service that runs everywhere on it.

It’s really fun working on problems on this scale and it’s always a learning experience. I look forward to share more details on my blog here.

No comments: