The goal of the Azure Reference Architecture is to help organizations quickly develop and implement Microsoft Azure-based solutions while reducing complexity and risk. The Azure Reference Architecture combines Microsoft software with recommended compute, network, and storage guidance to support the extension of an organization's datacenter environment through the use of Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) constructs.
The scope of this document is to provide the necessary guidance to develop Microsoft Azure-based solutions by establishing an Azure subscription model that meets the business, identity, security, infrastructure, and development requirements held by most organizations adopting a public cloud services strategy.
The focus of this document is on the design and implementation guidelines for general Azure subscription planning. This document is not intended to replace existing documentation about Microsoft Azure features. It seeks to integrate and complement that information with associated design guidance. For most organizations that want to seamlessly integrate Azure services, a firm understanding of the features and capabilities of the Azure platform, along with tested models and practices, is key to proper consumption and adoption of the services.
This document focuses primarily on the generally available (GA) feature set of Azure. Azure features and capabilities are surfaced in one of three ways:
Preview features are included in this document where possible; however, the primary focus is on conveying the tested design practices and solutions based on GA features. Preview features may not have the full capabilities, global scale, or repeatable design patterns that can be leveraged in your planning activities.
The Cloud Platform is Microsoft's vision of a modern platform for the world's apps. It provides a platform that is unified across on-premises, service provider, and Microsoft Azure environments. The Cloud Platform delivers the hybrid cloud, which effectively provides one consistent platform that spans customer datacenters and multiple clouds.
The Infrastructure as a Service (IaaS) product line architecture (PLA) utilizes the core capabilities of Windows Server, Hyper-V, and System Center to deliver a private cloud IaaS offering.
The Azure Reference Architecture complements the IaaS PLA and completes the Cloud Platform vision by providing a reference architecture and design patterns for the public cloud (Microsoft Azure).
The Microsoft Cloud Service Provider (CSP) program allows service providers to sell Microsoft cloud services along with their own offerings and services. Partners own the complete customer lifecycle through direct billing, provisioning, management, and support. The CSP program enables service providers to:
Microsoft Azure is an open and flexible cloud platform that enables service providers to rapidly build, deploy, and manage secure applications to scale on premises, in the cloud, or both. Bringing Azure to Cloud Service Providers enables partners to capitalize on this Azure opportunity with the capabilities of a CSP, where partners own the end-to-end customer lifecycle with direct provisioning, billing, and support of Microsoft's cloud services.
The Datacenter and Cloud Infrastructure Services portfolio from Microsoft Enterprise Services is designed to help organizations implement technologies that introduce the efficiency and agility of cloud computing, along with the increased control and management of infrastructure resources.
The key attribute of the Cloud Platform vision is a hybrid infrastructure, in which customers have the option of utilizing an on-premises infrastructure or services provided by Azure. The IT organization is a consumer and a provider of services. This enables workload and application development teams to make sourcing selections for services from any of the provided infrastructures or to create solutions that span them.
The Datacenter and Cloud Infrastructure Services portfolio comprises Microsoft Services engagements and frameworks through which Intellectual Property (IP), such as the IaaS Product Line Architecture (PLA) and the Azure Reference Architecture, is delivered. The portfolio includes offerings for scenarios such as infrastructure deployment, consolidation and migration, modernization, automation, and operations. All of the offerings and scenarios leverage the best practices and design patterns found in the IaaS PLA and the Azure Reference Architecture.
The Azure Reference Architecture (AZRA) is an initiative to address the need for detailed, modular, and current architecture guidance for solutions being built on Microsoft Azure. AZRA is a collection of materials including design guidance and design patterns to support a structured approach to architecting services and applications hosted within Microsoft Azure.
Unlike the Microsoft PLAs, it is not the intention of the Azure Reference Architecture to result in a single design, nor will it encompass an exhaustive definition of Azure features. The primary reason for this approach is that customer solutions that use Azure services vary greatly in their implementation. Given the pace of changes and enhancements to Azure services, it is critical that organizations are provided with durable recommended practices related to subscription and architectural planning within Microsoft Azure.
The focus of the Azure Reference Architecture is to identify common services and reusable models that can be used broadly when designing cloud-based solutions. These models assist customers through a series of decision points that lead to reusable design patterns. They are based on successful customer implementations and recommended practices.
Unlike many on-premises solutions, Azure deployment models vary in size, composition, and end-state design, which presents a clear challenge to organizations looking to build solutions based on established standards and best practices. Although there are significant variances between projects that use Azure services, many of these can be classified into a small number of key deployment models and audiences. Each audience or model falls into two broad focus areas: Development or Infrastructure. Additionally, deployment models differ by the subscription ownership type; whether it's the customer organization or their Cloud Solution Provider (CSP) who manages their Azure subscriptions.
Within these categories and corresponding constraints, the following deployment models and audiences can be defined:
When choosing a management approach for consuming Azure services, the decision is driven by how much management the customer wants to deliver versus how much the cloud service provider will deliver; as well as the connectivity approach to Azure. The figure below provides a comparison view of management responsibility based on the CSP scenarios described above. It's important to note that CSP models provide both built-in and optional services that the customer can select from.
When it comes to planning, designing, and consuming Azure services, these categories are complementary in some respects. In other respects, they have the potential to create divergent paths.
A key consideration to remember is that within any organization or project, developers consume infrastructure. Similarly, infrastructure is deployed to support applications and services. Understanding the needs of both is important when developing an Azure subscription model that satisfies the needs of the project or organization.
As outlined previously, the Azure Reference Architecture guide provides the basis for the decisions that must be considered as part of any project that encompasses a solution design using Azure services. The design of the solution should leverage the architecture design patterns (infrastructure, foundation, and solution) described later in this document.
The Azure Reference Architecture guide does not outline a single Azure design for hybrid enterprise solutions. Rather, it provides a comprehensive framework for decisions based on the core Microsoft Azure services, features, and capabilities required by most solutions. The guide is structured to cover each of the broader topic areas outlined previously, and it uses the following framework for each component:
A sample topic area is outlined here to illustrate this relationship:
Figure 2: Azure Reference Architecture Sample Topic Area
Rule set requirements are vendor-agnostic and are categorized as one of the following:
Mandatory: A practice or area that is critical to building solutions and services within Microsoft Azure. These requirements are necessary for alignment with the reference architecture.
Recommended: A standard approach that is strongly recommended when developing a solution or service within Microsoft Azure. However, implementation is at the discretion of each customer and is not required for alignment with the Azure Reference Architecture.
Optional: A voluntary consideration that can be implemented in the solution or service being developed in Microsoft Azure and can be followed at the discretion of each customer.
Both public and private cloud environments provide common elements to support running complex workloads. Although these architectures are relatively well understood in traditional on-premises physical and virtualized environments, the constructs found within Microsoft Azure require additional planning to rationalize the infrastructure and platform capabilities found within public cloud environments.
To support the development of a hosted application or service in Azure, a series of patterns are required to outline the various components and to compose a given workload solution. These architectural patterns fall within the following categories:
This spectrum of patterns is illustrated in the following model.
Figure 3: Azure Architecture Model
Architectural patterns for cloud-hosted workloads (applications and services) should generally adhere to this model, and complex scenarios can be implemented using one or more of the pattern types outlined previously. To learn more about the Azure architectural patterns, see Cloud Platform Integration Framework (Azure Architecture Patterns).
The following diagram illustrates how they can be composed to define a solution, application, or service in Microsoft Azure.
What is Azure? In short, it's the Microsoft public cloud platform. Microsoft Azure includes a growing collection of integrated services (compute, storage, data, networking, and applications) that help customers move faster, do more, and save money. With Microsoft Azure, you can build an infrastructure, develop modern applications, gain insights from data, and manage identity and access.
Azure offers dozens of different services in the cloud. These services include all of the commonly referenced cloud computing models:
These models can be combined and integrated to build complex robust solutions for any audience and use case.
The availability of Azure services varies by region and whether the service is currently in Preview or is Generally Available (GA). For up-to-date information about service availability in each datacenter, see the Services by region page. Determining which services are available is a key consideration when deploying applications or enabling services within Azure.
The concept of Azure regions will be covered later, but consider the sample webpage that follows. The area outlined in red serves as an example of a customer-selected region. Using this example, if the requirement was to deploy a solution within the South Central US Azure region, the solution would be constrained from using G-Series virtual machines (currently in Preview and covered in the Compute IaaS section of this document). Conversely, if the solution required G-Series virtual machines, the organization would need to select an Azure region that supports that feature or service.
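The availability check described above can be sketched as a simple lookup. The region and feature names below are illustrative assumptions, not a live Azure catalog; consult the Services by region page for current availability.

```python
# Hypothetical sketch: checking whether required services/features are
# available in a candidate region before committing to a deployment.
# This sample data mirrors the G-Series example above and is NOT a
# real availability catalog.
AVAILABILITY = {
    "South Central US": {"A-Series", "D-Series"},
    "East US 2": {"A-Series", "D-Series", "G-Series"},
}

def regions_supporting(required_features, availability=AVAILABILITY):
    """Return the regions in which every required feature is available."""
    required = set(required_features)
    return sorted(
        region for region, features in availability.items()
        if required <= features
    )

# A solution that requires G-Series VMs is constrained from using
# South Central US in this sample data:
print(regions_supporting({"G-Series"}))
print(regions_supporting({"A-Series"}))
```

The same shape of check works in reverse: given a fixed region, filter the solution's required services against that region's feature set before finalizing the design.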
When deploying solutions in Azure and planning Azure subscription models for your organization, consider the following questions:
The answers to these questions will help govern your decisions about regions and service consumption in Microsoft Azure. For additional details about each Azure services offering, refer to Directory of Azure Cloud Services.
The United States National Institute of Standards and Technology (NIST) published Special Publication (SP) 800-145, "The NIST Definition of Cloud Computing," to provide a clear definition of cloud computing to United States government agencies. Since its release, it has become an unofficial standard in the computing industry when it comes to defining cloud models.
Using the definitions provided in NIST SP 800-145, Microsoft Azure (and other online properties, such as Office 365) is classified as a Public cloud offering because it is owned and managed by Microsoft and is open for use by the general public. Within Microsoft Azure and other cloud solutions, Microsoft provides Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) capabilities.
NIST SP 800-145 defines Infrastructure-as-a-Service (IaaS) as "The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components."
Deploying an application in and managing an IaaS environment provides the most flexibility that Azure has to offer. As with any deployment choice, there are pros and cons to consider. The greatest benefit of an IaaS implementation is the amount of control it offers, from the operating system up through access to the application.
IaaS is most like traditional IT delivery. Customers provision their own virtual machines, define their own networks, and allocate their own virtual hard disks. IaaS shifts the burden of operating datacenters, virtualization hosts, and hypervisors to the service provider. In addition, responsibility for the business continuity and disaster recovery infrastructure shifts from the enterprise to the service provider.
NIST SP 800-145 defines Platform-as-a-Service (PaaS) as "The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment."
With PaaS applications, many of the layers of management required for an application running on IaaS instances are removed. Specifically, there is no need to manage the operating system, including patching, which reduces some of the complexity of designing the deployment.
A significant benefit of deploying an application in a PaaS environment is the ability to quickly and automatically scale up the application to meet demand when traffic is high, and conversely to scale back down when demand drops. Deploying an application in the PaaS model is very cost effective from a scalability and manageability perspective.
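The scale-out/scale-in behavior described above can be sketched as a simple rule. The thresholds, metric, and instance bounds here are illustrative assumptions; real PaaS autoscale rules are configured in the platform, not hand-coded.

```python
# Minimal sketch of threshold-based autoscaling. All numbers are
# illustrative assumptions for this document, not platform defaults.
def desired_instances(current, avg_cpu_percent, minimum=2, maximum=10):
    """Scale out under high load, scale in when demand drops."""
    if avg_cpu_percent > 75 and current < maximum:
        return current + 1          # add an instance to absorb traffic
    if avg_cpu_percent < 25 and current > minimum:
        return current - 1          # shed an instance to save cost
    return current                  # within the target band; hold steady
```

The key point for cost effectiveness is the floor and ceiling: the application never drops below the instance count needed for availability, and never scales past the budgeted maximum.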
PaaS extends IaaS further by providing multitenant services that customers subscribe to. Platform services are a transformational computing model that can dramatically reduce the costs and increase the agility of delivering applications to end users internally and externally. PaaS users bring their own application code but leverage robust platforms, which they do not need to maintain.
NIST SP 800-145 defines Software-as-a-Service (SaaS) as "The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings."
Choosing an Azure SaaS offering places the least amount of responsibility on the customer. At the same time, it provides less flexibility than an IaaS or PaaS approach.
SaaS is the real promise of cloud computing. By integrating applications from one or multiple vendors, customers need to bring only their data and configurations. They can eliminate the costs of building and maintaining applications and platform services and still deliver secure, robust solutions to end users.
Many scenarios require a blend of Azure offerings to meet the needs of the organization and its application requirements. The following diagram highlights the main differences, from a manageability perspective, between public cloud SaaS, PaaS, and IaaS implementations and on-premises implementations.
This is important to understand when making a decision about implementation, because each offering has a different impact on the cost, security, scalability, and staff needed to maintain the application or environment.
Like most cloud computing services, Microsoft Azure's cloud computing capacity and capabilities are delivered at hyperscale across a series of well-connected global datacenters. These datacenters are represented in constructs such as Azure regions, which are intended to be easily understood by customers and easily consumed based on customer needs. This section reviews the Azure datacenter model and provides an overview of the constructs established for customer use.
Microsoft Azure is deployed around the world in strategic areas that best meet the demand of customers. These areas are known as regions, and are placed at distances greater than 300 miles from each other to help avoid the possibility that a common natural disaster would affect more than one region at a time.
When it comes to deploying an application or service in Microsoft Azure, there needs to be an understanding about the following:
Microsoft Azure is a worldwide network of distributed datacenters that are strategically located around the world to support Microsoft Azure customers. This global presence of datacenters provides Microsoft customers with the ability to deploy an application or service in any datacenter in the world or in multiple datacenters. Whether a customer is a small company or a major corporation, all the services Azure has to offer in that particular region can be consumed.
For a list of Azure datacenter locations, see Azure Regions.
Azure operates out of 17 regions around the world. Geographic expansion is a priority for Azure, because it enables our customers to achieve higher performance and it supports their requirements and preferences regarding data location. The following table is provided as reference:
| Azure Region | Location | Azure Region | Location |
|---|---|---|---|
| Central US | Iowa | North Europe | Ireland |
| East US | Virginia | West Europe | Netherlands |
| East US 2 | Virginia | East Asia | Hong Kong |
| US Gov Iowa | Iowa | Southeast Asia | Singapore |
| US Gov Virginia | Virginia | Japan East | Saitama Prefecture |
| North Central US | Illinois | Japan West | Osaka Prefecture |
| South Central US | Texas | Brazil South | Sao Paulo State |
| West US | California | Australia East | New South Wales |
| | | Australia Southeast | Victoria |
It is very important to correctly choose a region or regions that meet your organization's needs. There are a number of elements to consider when choosing a region to deploy your applications and services:
Where Azure data is physically stored is very important to most customers. If the organization is restricted by government regulations or internal company policies about data storage and location, those restrictions must be clearly understood. There are often restrictions on data export and Government Regulatory Compliance (GRC) requirements for some data sets. This information needs to be understood before deploying any applications or services.
When you create a storage account, you select the primary region for the account. When enabling geographic replication of a storage account, the secondary region is determined based on the primary region, and it cannot be changed. The following table shows the current primary and secondary region pairings when geographically replicated storage is used:
| Primary Region | Secondary Region |
|---|---|
| North Central US | South Central US |
| South Central US | North Central US |
| East US | West US |
| West US | East US |
| East US 2 | Central US |
| Central US | East US 2 |
| North Europe | West Europe |
| West Europe | North Europe |
| Southeast Asia | East Asia |
| East Asia | Southeast Asia |
| East China | North China |
| North China | East China |
| Japan East | Japan West |
| Japan West | Japan East |
| Brazil South | South Central US |
| Australia East | Australia Southeast |
| Australia Southeast | Australia East |
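The pairing table above can be captured as a simple lookup, which also makes its one asymmetry easy to see: Brazil South replicates to South Central US, but South Central US pairs with North Central US. The pairings are fixed by the platform and cannot be chosen.

```python
# Geo-replication pairings from the table in this document.
SECONDARY_REGION = {
    "North Central US": "South Central US",
    "South Central US": "North Central US",
    "East US": "West US",
    "West US": "East US",
    "East US 2": "Central US",
    "Central US": "East US 2",
    "North Europe": "West Europe",
    "West Europe": "North Europe",
    "Southeast Asia": "East Asia",
    "East Asia": "Southeast Asia",
    "East China": "North China",
    "North China": "East China",
    "Japan East": "Japan West",
    "Japan West": "Japan East",
    "Brazil South": "South Central US",
    "Australia East": "Australia Southeast",
    "Australia Southeast": "Australia East",
}

def secondary_for(primary):
    """Return the fixed secondary region for a geo-replicated account."""
    return SECONDARY_REGION[primary]
```

A planning tool can use this lookup to confirm that a geo-replicated account's secondary copy lands in a location that satisfies the organization's data residency requirements.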
As described earlier, not all Azure regions are equal when it comes to available capabilities and services. Azure releases new features in Preview first, and a feature may be available only in certain regions before it becomes Generally Available (GA).
Before deploying an Azure service, review the following link to verify which services are available in your selected region or regions: Services by region.
The network topology of the Internet is complex with respect to bandwidth and routing. Routes from one endpoint to another are not fixed, and traffic may traverse multiple ISPs en route. It is best to validate the latency between the customer location and the Microsoft Azure regions, and to choose the region with the lowest latency, which will provide the best performance from a networking perspective.
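The latency validation above reduces to picking the minimum of a set of measurements. The numbers below are hypothetical placeholders; in practice you would measure round-trip times from the customer location to an endpoint hosted in each candidate region and feed the results into a selection like this.

```python
# Sketch of latency-driven region selection. The sample measurements
# are illustrative assumptions, not real RTTs.
def lowest_latency_region(measurements_ms):
    """Return the region with the smallest measured round-trip time."""
    return min(measurements_ms, key=measurements_ms.get)

samples = {                 # hypothetical RTTs from a European customer site
    "West Europe": 28.0,
    "North Europe": 35.5,
    "East US": 92.3,
}
print(lowest_latency_region(samples))
```

Latency should be measured over a period of time rather than once, because Internet routing changes can shift which region is closest in practice.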
The costs associated with services within the different Azure regions are not necessarily the same. The cost for Azure services is controlled by many factors. If latency and GRC are not influencing the architectural design of the application or service, it may be best to deploy to the region with the lowest costs. Please refer to the following site for the pricing details of each service provided by Microsoft Azure: Azure Pricing.
One way to reduce the impact of a datacenter or regional service outage is to place these applications and services in multiple regions. Placing a web application or service in multiple Azure regions and tying those services together with Traffic Manager provides the required redundancy to keep the service running.
When an outage occurs in one of the regions, the required high availability components will already be in place, and the services will remain available to end users. Establishing virtual network-to-virtual network VPNs between datacenters to route data and infrastructure services is another way to support enterprises with high availability.
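The failover behavior that Traffic Manager provides for a multi-region deployment can be sketched as a priority list: traffic flows to the highest-priority healthy endpoint, so a regional outage automatically shifts users to the next region. The region names below are illustrative assumptions.

```python
# Sketch of priority-based endpoint failover, in the spirit of the
# Traffic Manager scenario described above (not the actual service logic).
def active_endpoint(endpoints):
    """Pick the first healthy endpoint in priority order."""
    for name, healthy in endpoints:        # list is ordered by priority
        if healthy:
            return name
    return None                            # total outage: nothing healthy

deployment = [("East US", False),          # primary region is down
              ("West US", True),           # secondary absorbs the traffic
              ("North Europe", True)]
print(active_endpoint(deployment))
```

The same idea applies whether the health signal comes from an HTTP probe or a manual failover switch; the important design decision is the priority ordering of the regions.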
Affinity Groups tell the Azure Fabric Controller that two or more Azure virtual machines should always be placed together or close to one another within a cluster of compute resources. In the past, it was a requirement to have an affinity group associated with a virtual network. Recent architectural improvements have removed this prior requirement, and it is no longer recommended to use affinity groups in general for virtual networks or virtual machines.
Although it is not generally recommended to use affinity groups with virtual machines, there is one scenario where it may be necessary: when the absolute lowest network latency between the virtual machines is required. Associating a virtual machine with an affinity group ensures that all virtual machines in that affinity group are placed in the same compute cluster, or scale unit.
Although it may be necessary to use an affinity group when configuring a virtual machine to ensure the least amount of latency, the following drawbacks can be difficult to change later should they occur:
The link between the virtual machine and the affinity group is the cloud service rather than the virtual machine alone. Should capacity issues or the inability to resize an existing virtual machine to a larger size occur, it is necessary to:
The process to remove virtual machines from an affinity group is not easy, which further emphasizes why you should associate virtual machines with a region rather than an affinity group unless the lowest possible latency is required.
For more information about the current guidance for affinity groups and specifics regarding virtual networks and virtual machines, see How to migrate from Affinity Groups to a Regional Virtual Network.
In May of 2014, the ability to create a virtual network that can span the entire region (datacenter) was introduced. Now when creating a new virtual network in the Azure portal, the only option is to associate the virtual network to a location rather than to an affinity group.
A regional virtual network is required for many of the newer Azure features, including internal load balancers. Customers with an affinity group virtual network need to request support to migrate the virtual network to a regional type.
Optionally, you can create a new virtual network that is associated with the region. Then migrate the existing deployments from the affinity group virtual network.
For more information, see: Regional Virtual Networks.
As described earlier, Azure hosts its services in a series of globally distributed datacenters. These datacenters are grouped together in regions, and datacenters within a given region are divided into "clusters," which host services. This interaction is outlined in the following diagram:
Within each datacenter, the racks of equipment are built to be fault tolerant with respect to networking, physical host servers, storage, and power. The physical host servers are placed in high availability units called a cluster. The cluster configurations are spread across multiple server racks.
A single rack is referred to as a Fault Domain (FD), and it can be viewed as a vertical partitioning of the hardware. The fault domain is considered the lowest common denominator within the datacenter for fault tolerance. Microsoft Azure can lose a complete rack, and the hosted services can continue unaffected.
A second partition within the datacenter is called the Upgrade Domain (UD), and it can be viewed as a set of horizontal stripes passing through the vertical racks of fault domains. Upgrade domains are used to deploy updates (such as security patches) within Azure without affecting the availability of the running services within the Azure fabric. The following diagram shows the high-level relationship between fault domains and upgrade domains in the Azure datacenters.
Virtual machines are placed in specific fault domains and upgrade domains based on their membership in an availability set. For more information about properly configuring availability sets, refer to the Compute (IaaS) section.
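The distribution of an availability set across fault and upgrade domains can be sketched as round-robin placement. The domain counts below are assumptions for illustration; the Azure fabric controller performs the real placement.

```python
# Round-robin placement sketch for an availability set. Domain counts
# are illustrative assumptions, not guaranteed platform values.
def place_vms(vm_count, fault_domains=2, update_domains=5):
    """Assign each VM in an availability set to a (FD, UD) pair."""
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]

# Four VMs land on alternating racks (FDs) and distinct upgrade domains,
# so neither a single rack failure nor a rolling update takes all of
# them down at once.
for vm, (fd, ud) in enumerate(place_vms(4)):
    print(f"VM{vm}: fault domain {fd}, upgrade domain {ud}")
```

This is why an availability set needs at least two VMs to be meaningful: with one VM there is nothing to spread across domains.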
Servers reside within each Azure datacenter. The servers are divided into clusters, which are then partitioned by the Azure Fabric Controller to deliver a given service. This relationship is outlined in the following diagram:
Additional details about virtual machine compute instances are provided later in this document; however, a brief overview is provided here to give a general understanding of Azure compute services.
Concerning compute sizes for Azure IaaS (virtual machines), Azure currently has three series: A, D, and G. Each series has different characteristics. For example, the D series offers up to 800 GB of temporary SSD storage, while the G series machines are the largest and offer the highest performance.
The following article has details about each series and examples where decisions have been made based on scenarios: Azure A-SERIES, D-SERIES and G-SERIES: Consistent Performances and Size Change Considerations.
With the newer D and G series virtual machines, the temporary drive (D:\ on Windows; /mnt or /mnt/resource on Linux) is a local SSD. This high-speed local disk is best used for workloads that replicate across multiple instances, such as MongoDB, or for workloads that can leverage this high I/O disk for local and temporary cache, such as the Buffer Pool Extensions in SQL Server 2014.
Note: These drives are not guaranteed to be persistent. Thus, although physical hardware failure is rare, when it occurs, the data on this disk may be lost, unlike your operating system drive and any attached durable disks that are persisted in Azure Storage.
Also available when using Premium Storage are DS series virtual machines, which offer high-performance, low-latency disk support for I/O-intensive workloads. The underlying disks for DS series virtual machines are SSDs rather than HDDs, and they can achieve up to 64,000 IOPS.
The following tables list the details of the D Series and G Series virtual machines.
D Series
General Purpose Sizes
| Name | vCores | Memory (GB) | Local SSD (GB) | Max Persistent Data Disks |
|---|---|---|---|---|
| Standard_D1 | 1 | 3.5 | 50 | 2 |
| Standard_D2 | 2 | 7 | 100 | 4 |
| Standard_D3 | 4 | 14 | 200 | 8 |
| Standard_D4 | 8 | 28 | 400 | 16 |
Memory Intensive Sizes
| Name | vCores | Memory (GB) | Local SSD (GB) | Max Persistent Data Disks |
|---|---|---|---|---|
| Standard_D11 | 2 | 14 | 100 | 4 |
| Standard_D12 | 4 | 28 | 200 | 8 |
| Standard_D13 | 8 | 56 | 400 | 16 |
| Standard_D14 | 16 | 112 | 800 | 32 |
G Series
| Name | vCores | Memory (GB) | Local SSD (GB) | Max Persistent Data Disks |
|---|---|---|---|---|
| Standard_G1 | 2 | 28 | 412 | 4 |
| Standard_G2 | 4 | 56 | 824 | 8 |
| Standard_G3 | 8 | 112 | 1,649 | 16 |
| Standard_G4 | 16 | 224 | 3,298 | 32 |
| Standard_G5 | 32 | 448 | 6,596 | 64 |
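The size tables above can be expressed as data, with a helper that picks the smallest size meeting a vCore and memory requirement. The figures are taken directly from the tables in this document; current sizes and quotas should always be confirmed against the Azure documentation.

```python
# D and G series sizes from the tables above.
SIZES = {
    # name: (vCores, memory_gb, local_ssd_gb, max_data_disks)
    "Standard_D1":  (1, 3.5, 50, 2),
    "Standard_D2":  (2, 7, 100, 4),
    "Standard_D3":  (4, 14, 200, 8),
    "Standard_D4":  (8, 28, 400, 16),
    "Standard_D11": (2, 14, 100, 4),
    "Standard_D12": (4, 28, 200, 8),
    "Standard_D13": (8, 56, 400, 16),
    "Standard_D14": (16, 112, 800, 32),
    "Standard_G1":  (2, 28, 412, 4),
    "Standard_G2":  (4, 56, 824, 8),
    "Standard_G3":  (8, 112, 1649, 16),
    "Standard_G4":  (16, 224, 3298, 32),
    "Standard_G5":  (32, 448, 6596, 64),
}

def smallest_size(min_vcores, min_memory_gb):
    """Return the smallest size that meets the requirements,
    ranked by vCores first and then by memory."""
    candidates = [
        (cores, mem, name)
        for name, (cores, mem, _ssd, _disks) in SIZES.items()
        if cores >= min_vcores and mem >= min_memory_gb
    ]
    return min(candidates)[2] if candidates else None
```

For example, a workload needing 4 vCores and 20 GB of memory fits the memory-intensive Standard_D12 rather than the larger Standard_D4, because D12 satisfies both requirements with fewer resources left idle.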
To operate your application, service, or infrastructure within Microsoft Azure, it is important to understand the roles, access methods, and components that make up a given organization's Azure environment. This section covers each of these areas at a high level.
Within a given enterprise enrollment, Microsoft Azure has several roles that individuals play. These roles range from creating subscriptions (covered later in this document) to provisioning resources. The following top-level roles exist within Azure:
| Role | Quantity/Description | Functions/Permissions |
|---|---|---|
| Enterprise Administrator | There may be multiple Enterprise Administrators per Enterprise Enrollment | |
| Account Owner | Each account requires a unique Microsoft Account or Organizational Account | |
| Service Administrator | A single Microsoft Account or Organizational Account may be used across subscriptions and between hierarchical levels | |
A detailed breakdown of each role, how it is created, and the primary tool it uses is provided in the following table:
Role | How Created | Primary Tool |
Enterprise Administrator | First account created at on-boarding. Full access and visibility into all activity and resources of a corporate enrollment. | |
Departmental Administrator | Delegated by the Enterprise Administrator, this role is typically cost focused at the business unit level. Approves rolled up IT budgetary requests for multiple organizations. Can create and have visibility into multiple account owners. Consumption information can be rolled up and isolated at this level. | |
Account Owner | Delegated by a Departmental Administrator, this role is typically cost focused at the departmental or project level. The role creates the subscriptions and Service Administrators, and approves hardware and resource requests by project. Can create and have visibility into multiple Service Administrators and subscriptions. | |
Service Administrator | Owns a subscription at the resource level. Manages who can create and use IT resources; is solution and project delivery focused. Sets roles and responsibilities at the project level. Has visibility into a single subscription's consumption. | |
Co-administrator | A resource administrator within a subscription that can manage provisioning and delegation of additional co-administrators. Project and resource focused. | |
Resource Group Administrator | Manages a group of resources within a subscription that collectively provide a service and share a lifecycle. Single project or service focused. (Currently in Preview) | |
Microsoft Azure has several portals to support holistic management of the accounts, subscriptions, and features outlined in this document. The following sections cover available portals depending on the account management model:
Customer-Owned Models
When the Azure subscription is provisioned using the customer-owned account models described previously in this document, the customer's organization deploys and manages Azure workloads on their own. The following portal offerings are available for resource management:
Portal | Location | Purpose |
Enterprise Portal | | |
Account Portal | | |
Management Portal | https://manage.windowsazure.com or https://portal.azure.com (Preview) | Note: any support ticket under a Premier Azure Support agreement should be opened using the Premier portal |
Cloud Solution Provider-Managed Models
When the Azure subscription is provisioned by Cloud Solution Provider, who manages end customer Azure subscriptions, the following portal offerings are available for resource management:
Portal | Location | Purpose |
Partner Portal | | |
Management Portal | If managed by the CSP (exact URLs are provided in the customer subscription "Service management" view on the Partner portal): Azure Active Directory management and customer Azure resource management at https://portal.azure.com/<tenant>.onmicrosoft.com. When accessed by the end customer: https://manage.windowsazure.com or https://portal.azure.com (Preview) | |
The following table provides a summary of portal access by role:
Role | Enterprise Portal | Account Portal | Management Portal |
Enterprise Administrator | Yes | Yes – if account is also Account Owner | Yes – if account is also the Service Administrator or Co-administrator |
Account Owner | Yes – limited access if provided by Enterprise Administrator | Yes | Yes – if account is also the Service Administrator or Co-administrator |
Service Administrator | No | No | Yes |
Partner Center Portal is the primary destination for Cloud Solution Providers (CSPs) to onboard customers, resell first-party and third-party services, and manage customer services. It also provides access to billing data, powerful analytics, and tools that enable upsell and cross-sell for Cloud Solution Provider partners. The following sequence of steps demonstrates the onboarding process of a new customer to the Azure platform as a CSP-managed entity:
The following considerations are provided for the operational roles identified within this section:
Initially, a subscription was the administrative security boundary of Microsoft Azure. With the advent of the Azure Resource Manager (ARM) model, a subscription now has two administrative models: Azure Service Management and Azure Resource Manager. With ARM, the subscription is no longer needed as an administrative boundary.
ARM provides a more granular Role-Based Access Control (RBAC) model for assigning administrative privileges at the resource level. RBAC is currently being released in stages with 22 new roles available at this time.
A subscription additionally forms the billing unit. Service charges are currently accrued to the subscription. As part of the new Azure Resource Manager model, it will be possible to roll up costs to a resource group. A standard naming convention for Azure resource object types can be used to manage billing across project teams, business units, or other desired views.
A subscription is also a logical limit of scale by which resources can be allocated. These limits include hard and soft caps of various resource types (for example, 10,000 compute cores per subscription). Scalability is a key element for understanding how the subscription strategy will account for growth as consumption increases.
One of the most critical items in the process of designing a subscription is assessing your current environment and needs.
It is critical to develop the Subscription, Network, Storage, Availability, and Administrative models together to have a cohesive approach. Understanding how each component is limited and how each impacts the others is critical to a solution that can scale and be flexible enough to support the needs of the business.
Specifically, it is important to have a thorough understanding of the following aspects:
Identify business requirements
Identify technical requirements
Security requirements
Scalability requirements
Additional considerations
Many of the early decisions in architecting and planning an Azure environment and related subscriptions can have an impact on future decisions and designs as the cloud environment grows. As such, it is important to have participation and input from many groups within an organization including networking, security, identity, domain administrators, and IT leadership.
Pulling in specific teams early and having an open dialogue across different perspectives produces a better design and implementation. Ensuring that any objections are exposed early, where they can be dealt with thoroughly, is far better than discovering them in the middle of a project, where they can negatively impact the schedule.
Following is an example subscription design based on a subscription per Organizational Unit.
Here is another example subscription design that is based on one subscription per environment in the development process of an application.
For CSP-managed scenarios, here is an example subscription design that illustrates a model of one or more subscriptions per specific customer, where a separate service deployment for a given customer may be assigned a dedicated subscription.
At its core, a subscription is a logical grouping of services and administration. It is the base unit of administrative granularity and it is used to track and bill service consumption.
Subscription Administrators have the ability to read and download anything stored in an Azure Storage account, including operating system VHDs, SQL Server data disks, and blobs.
Subscription Administrators can stop, start, provision and delete existing and new services.
Subscription Administrators can grant co-administrative access to new users.
All of these capabilities require careful consideration for who is given these rights in the subscription. Domain administrators have a similar situation regarding the level of rights and the need to carefully choose who has these rights.
In CSP-specific scenarios, customer subscriptions are often created, owned, and managed by the service provider, who then designates administrative agents to manage customer subscription resources. In this scenario, the subscriptions are ARM-based subscriptions and require an RBAC model to control access to and management of the subscription and its resources.
Recommended: Assign the minimum number of users as Subscription Administrators and/or Co-administrators. |
Recommended: Use Azure Resource Manager RBAC whenever possible to control the amount of access that administrators have, and log what changes are made to the environment. |
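These recommendations can be sketched using the Azure Resource Manager mode of the classic Azure PowerShell module. Note that cmdlet names and parameters varied across early releases, and the user and resource group names below are hypothetical:

```powershell
# Switch the classic Azure PowerShell module (0.8.x era) into ARM mode
Switch-AzureMode AzureResourceManager

# List the built-in RBAC role definitions available to the subscription
Get-AzureRoleDefinition | Select-Object Name

# Grant a user Contributor rights scoped to a single resource group,
# rather than making that user a subscription-wide Co-administrator
New-AzureRoleAssignment -Mail 'dev1@contoso.onmicrosoft.com' `
    -RoleDefinitionName 'Contributor' `
    -ResourceGroupName 'Contoso-Prod-RG'
```

Scoping the assignment to a resource group keeps the number of subscription-level administrators to a minimum while still allowing project teams to manage their own resources.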
Adding network connectivity (whether using a site-to-site VPN or a dedicated ExpressRoute connection) brings additional considerations to the subscription requirements discussion.
The subscription is a required container to hold a virtual network, and often networking is a shared resource within an enterprise.
Site-to-site VPNs and ExpressRoute circuits require defining IP address ranges that do not overlap with on-premises ranges.
Site-to-site VPN connectivity requires setting up and configuring a public-facing gateway and VPN services at the corporate edge.
ExpressRoute connectivity is through a private connection from an on-premises datacenter to Azure through a service provider's private network. For more information, see the Microsoft Azure Networking section later in this document.
Routing and firewall configurations are typically necessary when enabling connectivity. Administration and connectivity are often at odds with respect to autonomy and sharing resources, but when designing the subscription architecture for the enterprise, both must be part of the solution. Business requirements including availability and reliability will impact the network architecture, and subsequently, the subscriptions necessary to support that architecture.
Because a virtual network must exist inside a subscription, some constraints of a subscription also impact decisions made for virtual networks. For example, only 20 virtual networks can be attached to a single ExpressRoute circuit. Therefore, only 20 subscriptions could be attached to that circuit.
In another scenario, if a design used 20 virtual networks within a single subscription, and ExpressRoute was used for connectivity to corporate network resources, there would be no way to attach another subscription to the same ExpressRoute, regardless of the bandwidth utilization on the circuit.
If multiple virtual networks are to share a single enterprise ExpressRoute connection, essentially there is no network isolation between those networks. In this case, any separation the subscription design may try to define is eliminated and must be achieved through subnet-layer Network Security Groups (NSGs). When the virtual networks are attached to the same ExpressRoute circuit, they are essentially a single routing domain.
A subscription hosting only PaaS services could have no virtual network at all, and the design limitations discussed above would not apply.
If a subscription will host a virtual network for on-premises connectivity but will not be used to host IaaS or PaaS resources, the cost of that subscription and its virtual network is about one-tenth the cost of the ExpressRoute circuit.
Identity services provided by an IaaS Active Directory, an Azure Active Directory tenant, or a customer OrgID tenant will have an impact on how security is implemented and subsequently on how that security impacts the number and configuration of subscriptions necessary.
Subscription Administrators have a broad authority, and as such they must be considered administrators over all the resources in the subscription. If the subscription includes Azure Active Directory, IaaS domain controllers, or if it connects to domain controllers from an on-premises Active Directory, the Subscription Administrators and Co-administrators are also domain owners. They must be trusted individuals and treated like any domain administrator appropriate for that directory.
Productivity goals, single sign-on, and federation requirements impact identity services decisions, and subsequently, the supporting subscriptions.
Subscriptions form the scale limit in Azure. Many resources—from computing cores and storage accounts to reserved IP addresses—have quantity and size limitations based on the subscription.
When thinking about the subscriptions for an environment, it is important to think about how the design will scale if and when limits are reached.
In subscription discussions, a number of considerations determine the decisions made about the design. The number of connections that can be shared by a tunnel or circuit, bandwidth requirements, the source of identity, and the number of groups, users, and applications associated with a subscription are all important topics when considering scale.
The use of a subscription as a security boundary may be considered when designing an Azure subscription model. A project requiring isolation should consider subscription administration very carefully. Some considerations for multiple subscriptions include:
Complexities are introduced when you consider that the on-premises networking and security infrastructures are typically shared resources.
Patching, monitoring, and auditing are frequently provided by dedicated organizations, and staff is trained in the related tools. Business continuity and disaster recovery are almost always dependent on enterprise solutions to mitigate the cost.
An enterprise that allowed Azure subscriptions to be based on a project or team could find itself:
If a business unit manages its own networking, operations, business continuity, and disaster recovery, or the use case is such that a dedicated VPN connection to on-premises resources is sufficient, this type of subscription model could work very efficiently.
The following diagram shows a robust enterprise Azure enrollment. There are multiple subscriptions, one of which is a "Tier 0" subscription used to host domain controllers and other sensitive roles when extending an on-premises Active Directory forest to Azure.
This is configured as a separate subscription to ensure that only administrators with domain administrator level privileges are able to exert administrative control over these sensitive servers through Azure subscriptions, while still allowing server administrators to manage virtual machines in other subscriptions.
QA and production networks share the same dedicated ExpressRoute circuit to on-premises resources. They are separated into distinct subscriptions to allow separation of access and to allow the QA subscription to scale on its own without impacting production.
This model will scale based on need. Second, third, and subsequent QA and production subscriptions can be added to this design without significant impact on operations. The same applies to network bandwidth—the circuit can be used until its limits are reached without any artificial limitations forcing additional purchases.
Subscriptions are the foundational building block of an Azure enterprise enrollment. The requirements for administration, operations, accountability, connectivity, scalability, and security shape the subscription model.
Note that multiple existing resource forests are depicted here only to show that some forests can be extended to Azure while others don't have to be. Microsoft does not recommend creating a separate resource forest for Azure-hosted resources as a security separation method.
This approach typically requires two-way trust relationships that negate any potential security isolation benefits and the organization will be left with increased operational overhead for no benefit. The use of Read Only Domain Controllers (RODCs) for Azure-hosted resources also offers no meaningful security benefits, while adding increased operational overhead.
When naming a Microsoft Azure subscription, it is a recommended practice to be verbose. Try using the following format, or a format that has been agreed upon by the stakeholders of the company.
<Company> <Department (optional)> <Product Line (optional)> <Environment>
The goal of a naming convention is to produce a meaningful name that describes the particular subscription and how it is represented within the company. Many organizations will have more than one subscription, which is why it is important to have a naming convention and use it consistently when creating subscriptions.
NOTE: In CSP-specific scenarios the naming convention can also incorporate the CSP identifier to mark the subscription as managed by a service provider.
This is simply an example naming convention to use as a base. Many of the decisions about the naming convention will come from the subscription model that is chosen.
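For illustration, the format above can be composed with a small helper function (the function name is hypothetical; optional parts are simply omitted when empty):

```powershell
# Compose a subscription name from the agreed format:
# <Company> <Department (optional)> <Product Line (optional)> <Environment>
function New-SubscriptionName {
    param(
        [Parameter(Mandatory)] [string] $Company,
        [string] $Department,
        [string] $ProductLine,
        [Parameter(Mandatory)] [string] $Environment
    )
    # Drop any empty optional parts, then join the rest with spaces
    (@($Company, $Department, $ProductLine, $Environment) | Where-Object { $_ }) -join ' '
}

New-SubscriptionName -Company 'Contoso' -Department 'Services' -ProductLine 'Business' -Environment 'Dev'
# -> Contoso Services Business Dev
```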
The following table shows how a company might use the naming convention outlined previously.
Company | Department (OU) | Product Line | Environment | Full Name |
Contoso | Services | Business | Dev | Contoso Services Business Dev |
Contoso | Services | Business | Lab | Contoso Services Business Lab |
Contoso | Services | Business | Prod | Contoso Services Business Prod |
Contoso | Services | Consumer | Dev | Contoso Services Consumer Dev |
Contoso | Services | Consumer | Lab | Contoso Services Consumer Lab |
Contoso | Services | Consumer | Prod | Contoso Services Consumer Prod |
North Wind | Databases | Business | Dev | North Wind Databases Business Dev |
North Wind | Databases | Business | Lab | North Wind Databases Business Lab |
North Wind | Databases | Business | Prod | North Wind Databases Business Prod |
The recommended way to access your subscription when using Azure PowerShell is to authenticate by using the Add-AzureAccount PowerShell cmdlet. This cmdlet prompts for authentication in a window where you input your credentials that are associated with Azure Active Directory. You input either your Microsoft Account credentials or Org ID credentials that are associated with the Azure subscription.
Using this method of authentication even once with your subscription takes precedence over any management certificates you may have in your profile (that is, certificates loaded by running the Import-AzurePublishSettingsFile cmdlet). To remove the Azure AD token and restore the management certificate method, use the Remove-AzureAccount cmdlet.
When using Azure AD authentication, occasionally you may see an error message: "Your credentials have expired. Please use Add-AzureAccount to log on again." To restore access to your subscription by using Azure PowerShell, simply run Add-AzureAccount again and authenticate.
This method of authenticating to the subscription is most convenient when working with commands or scripts interactively. It is possible to use this method with automated processes and pass secured credentials by using the –Credential parameter. However, at this time this method works only with Org ID credentials, not Microsoft Account credentials.
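The two sign-in paths can be sketched as follows. The account name and password below are illustrative only; in practice, retrieve automation credentials from a secured store rather than embedding them in a script:

```powershell
# Interactive sign-in: prompts for the Microsoft Account or Org ID credentials
# associated with the Azure subscription
Add-AzureAccount

# Automated sign-in for scripts: works with Org ID credentials only,
# not Microsoft Account credentials
$password = ConvertTo-SecureString 'P@ssw0rd!' -AsPlainText -Force   # illustrative only
$cred = New-Object System.Management.Automation.PSCredential('svc-deploy@contoso.onmicrosoft.com', $password)
Add-AzureAccount -Credential $cred

# To remove the cached Azure AD token and fall back to management certificates
Remove-AzureAccount -Name 'svc-deploy@contoso.onmicrosoft.com'
```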
Management certificates are used to allow client devices to access resources within the Microsoft Azure subscription. The management certificates are x.509 v3 certificates that only contain a public key. They have the .cer file extension.
If a user requires the ability to deploy or change services running in Microsoft Azure, but does not require access to the Microsoft Azure portal, they'll need a certificate. It is very common for a developer to deploy to Azure services through Visual Studio and they will require a certificate to accomplish this task.
The x.509 v3 certificates are mapped to one or more Azure subscriptions. The possession of the private keys associated with these certificates should be given the same level of security as passwords. If the certificate private key becomes compromised, whoever holds this key can perform actions on the subscriptions for which the certificate is valid.
At this time, an Azure subscription can import 100 certificates. Certificates can be shared across multiple subscriptions. There is also a 100 certificate limit for all subscriptions for a specific Service Administrator's ID.
There are a few ways to generate a certificate. You can create a self-signed management certificate or you can download a certificate from the Microsoft Azure portal as part of what is known as a Publish Settings file.
To create your own self-signed certificate, use makecert.exe, a command-line tool that ships with Visual Studio. Alternatively, if you have access to a computer running Internet Information Services (IIS), you can generate one from there.
The Publish Settings file is an XML file that contains information about the Microsoft Azure subscriptions. The file contains specific information about all subscriptions associated with the user's Microsoft ID. These are the subscriptions for which the particular Microsoft ID is an Administrator or Co-administrator. The Publish Settings file exposes your Azure subscription for use with Visual Studio and Azure PowerShell.
To use Azure PowerShell within your environment, open an elevated Windows PowerShell console and execute the following commands:
This will direct you to the following URL: https://manage.windowsazure.com/publishsettings/Index?client=vs&SchemaVersion=1.0
Record the path used to save the Publish Settings file, for example:
C:\Users\ProfileName\Documents\AzurePublishSettingFile\YourFileName.publishsettings
Now Azure PowerShell has set up a management certificate to interface with your Microsoft Azure subscription. To validate the association between Azure PowerShell and your subscription, execute the following Azure PowerShell cmdlet: Get-AzureLocation.
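The end-to-end sequence, using the classic (Service Management) Azure PowerShell cmdlets, looks like the following; the file path is illustrative:

```powershell
# Opens a browser at the publishsettings URL so you can sign in and download the file
Get-AzurePublishSettingsFile

# Import the downloaded file to install the management certificate into your profile
Import-AzurePublishSettingsFile `
    'C:\Users\ProfileName\Documents\AzurePublishSettingFile\YourFileName.publishsettings'

# Validate that Azure PowerShell can now reach the subscription
Get-AzureLocation
```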
Some drawbacks of using management certificates for interfacing with an Azure subscription include:
When using either authentication method in Azure PowerShell, some scripts, such as provisioning new virtual machines, will not function properly until you have associated a storage account with your subscription. To add this association, run the following script:
Set-AzureSubscription –SubscriptionName 'My Subscription Name' –CurrentStorageAccountName 'storageacctname001'
After running this script, you can verify that your storage account is now associated with the subscription by running Get-AzureSubscription. There should now be a value under CurrentStorageAccountName. You should only need to set this value once for most Azure PowerShell operations, and the value can be changed at any time by running Set-AzureSubscription again.
If you have multiple subscriptions, you must also ensure that you are targeting the correct subscription with Azure PowerShell operations. A default and current subscription setting lets you control this. When you load the Publish Settings file or use Add-AzureAccount with access to multiple subscriptions, one subscription is tagged as both default and current. Any operation will target this subscription unless you change the focus. To redirect PowerShell operations to a different subscription for the current session, run the Select-AzureSubscription cmdlet with the subscription name you want to target and the –Current option. To permanently change the default subscription, use the –Default option instead.
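For example (the subscription name follows the naming convention discussed earlier; property names such as IsDefault and IsCurrent may vary by module version):

```powershell
# Inspect the subscriptions available to the current session
Get-AzureSubscription | Format-Table SubscriptionName, IsDefault, IsCurrent

# Target a different subscription for this session only
Select-AzureSubscription -SubscriptionName 'Contoso Services Business Dev' -Current

# Or make it the default for future sessions as well
Select-AzureSubscription -SubscriptionName 'Contoso Services Business Dev' -Default
```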
For the development and management of Azure resources, there are a wide variety of tools that can be used from the Azure Management portal, Azure PowerShell, SDKs, and cross-platform and third-party downloads.
Software Development Kits (SDKs)
Following are examples of some SDKs that are available for download and the respective platforms the SDKs can be used to develop on. To get the SDKs and command-line tools you need, see the Microsoft Azure Downloads site.
.NET | Java | Node.js | PHP |
VS 2015 install | Windows install | Windows install | Windows install |
VS 2013 install | Mac install | Mac install | Mac install |
VS 2012 install | Linux install | Linux install | Linux install |
Client libraries |
Python | Ruby | Mobile | Media |
Windows install | Windows install | iOS install | iOS SDK install |
Mac install | Mac install | Android install | Flash OSMF install |
Linux install | Linux install | Windows Store C# install Windows Store JS install Windows Phone 8 install | Windows 8 install Silverlight install .NET SDK install Java SDK install |
Azure PowerShell
You can use Windows PowerShell to perform a variety of tasks in Azure, either interactively at a command prompt or automatically through scripts. Azure PowerShell is a module that provides cmdlets to manage Azure through Windows PowerShell.
You can use the cmdlets to create, test, deploy, and manage solutions and services delivered through the Azure platform. In most cases, you can use the cmdlets to perform the same tasks that you can perform through the Azure Management portal. For example, you can create and configure cloud services, virtual machines, virtual networks, and web applications.
The module is distributed as a downloadable file and the source code is managed through a publicly available repository. A link to the downloadable files is provided in the installation instructions later in this topic. For information about the source code, see the Azure PowerShell code repository.
XPlat-CLI
The Azure Cross-Platform Command-Line Interface (Azure CLI, or sometimes referred to as xplat-cli) provides a set of open source, cross-platform commands for working with the Azure platform. The Azure CLI provides much of the same functionality found in the Azure Management portal, such as the ability to manage websites, virtual machines, mobile services, SQL Server databases, and other services provided by the Azure platform.
The Azure CLI is written in JavaScript, and requires Node.js. It is implemented by using the Azure SDK for Node.js, and it is released under an Apache 2.0 license. To access the project repository, see Microsoft Azure Cross Platform Command Line.
Subscriptions now exist for both ARM and ASM models. Subscriptions have associated "hard" (upper boundary) and "soft" (default) limits for many Azure services, features, and capabilities. Many of the soft limits can be increased greatly by simply creating a support request, but some of the hard limits have a significant impact on subscription design decisions. ASM-based subscriptions have limits that apply purely at the subscription level and are cumulative across all regions. ARM-based subscriptions typically have limits based on the region being targeted within the subscription. Following are some of the hard limits in a subscription that have the greatest impact on design decisions.
Azure Object | Limit |
Virtual networks | 100 per subscription |
Virtual machines | 10,000 CPU cores per subscription |
ExpressRoute | 1 circuit shared across up to 20 subscriptions; 10 dedicated circuits per subscription |
Cloud services | 200 per subscription |
Network security groups | 100 per subscription |
Storage accounts | 100 per subscription |
Management certificates | 100 per Service Administrator |
Co-administrators | 200 per subscription |
For a more detailed and up-to-date list of Azure limits, see Azure Subscription and Service Limits, Quotas, and Constraints.
There are differences in how billing is viewed and the available data in the output based on type of subscription. Billing information and details for pay-as-you-go subscriptions are viewed in the Usage and Billing Portal, whereas billing information for enterprise subscriptions is viewed in the account portal.
For either subscription type, the billing details can make it difficult to discern charges for the items listed if a specific naming convention is not being used. For more information about naming conventions, see the respective sections in this document, such as Virtual Machines, Virtual Networks, and Storage Accounts.
An important item to note regarding the detailed billing, regardless of the subscription type is that all standard virtual machine instances are converted into Small instance hours on the bill. For example, a Windows Extra Small (A0) would have a clock hour of 1, but it shows as ¼ hour when converted to small instance hours. Similarly, any A-Series Cloud Service instance is converted into Small (A1) instance hours on the billing detail.
For a complete list of Windows and non-Windows conversions, see the Azure FAQ page titled "How do various instance sizes get billed?"
The following chart shows the details for each A-Series cloud service instance conversion:
Cloud Services Instance | Clock Hours | Small Instance Hours |
Extra Small (A0) | 1 | ¼ hour |
Small (A1) | 1 | 1 hour |
Medium (A2) | 1 | 2 hours |
Large (A3) | 1 | 4 hours |
Extra Large (A4) | 1 | 8 hours |
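The conversion in the table above can be applied directly. For example, to compute the billed small instance hours for a given instance size and run time:

```powershell
# Small-instance-hour multipliers for A-Series instances (from the table above)
$multiplier = @{ 'A0' = 0.25; 'A1' = 1; 'A2' = 2; 'A3' = 4; 'A4' = 8 }

# 10 clock hours on a Medium (A2) instance bill as 20 small instance hours
$clockHours = 10
$clockHours * $multiplier['A2']
# -> 20
```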
Following is an example of usage details for a pay-as-you-go subscription. Notice the Component column has the name of the resources and further highlights why a good naming convention is key.
In this particular subscription, there are very few applications or databases, so it is not difficult to keep track of them. However, if there were many applications following the naming style of app1, app2, app3, with randomly generated names for the databases, it would quickly become very difficult to decipher costs per application.
For a full list of usage details and the definition of each, refer to Understand your bill for Microsoft Azure.
Enterprise subscription billing details are slightly different. In the following example, you can see that a Component column also exists in the billing details; however, not all fields are the same as those in the previous subscription details.
The following table lists the fields for the enterprise usage details:
Detail Fields | ||
AccountOwnerId | Day | ResourceQtyConsumed |
AccountName | Year | ResourceRate |
ServiceAdministratorId | Product | ExtendedCost |
SubscriptionId | ResourceGUID | ServiceSubRegion |
SubscriptionGuid | Service | ServiceInfo |
SubscriptionName | ServiceType | Component |
Date | ServiceRegion | ServiceInfo1 |
Month | ServiceResource | ServiceInfo2 |
AdditionalInfo | Tags | Store Service Identifier |
Department Name | Cost Center |
Under Manage Access, it's possible to enable Department Administrators to see the costs associated with all accounts and subscriptions in their departments. You can also enable Account owners to see their costs.
ARM tags can be used to group billing data where ARM-compliant Azure services allow defining and applying tags to organize the billing usage for Azure resources. As an example, if a customer organization is running multiple virtual machines for different organizations, tags can be used to group usage by cost center. Alternatively, tags can be used to categorize costs by runtime environment; for example, the billing usage for virtual machines running in production environment. Tags appear in billing and usage artifacts, such as usage CSV data or billing statements. For more information about ARM tags, see the respective section in this document. The following example illustrates a sample scenario utilizing tags.
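The cost center scenario can be sketched using the ARM mode of the era's Azure PowerShell module. The -Tag hashtable syntax has changed across module versions, and the resource group and tag names below are illustrative:

```powershell
# Switch the classic module into Azure Resource Manager mode
Switch-AzureMode AzureResourceManager

# Create a resource group whose resources roll up to a cost center in billing data
# (early module versions expected Name/Value hashtables for each tag)
New-AzureResourceGroup -Name 'Finance-Prod-RG' -Location 'West US' `
    -Tag @{ Name = 'costCenter'; Value = '1234' }, @{ Name = 'environment'; Value = 'production' }
```

Once applied, the costCenter and environment tags appear in the usage CSV data, allowing billing rows to be grouped by cost center or runtime environment.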
The Microsoft Cloud Solution Provider (CSP) program allows service providers to own the complete customer lifecycle, including direct billing. The service providers are able to implement their own pricing and billing policies to create customer offers, set the price, and own the billing terms. CSP partners can automatically receive monthly invoices and billing statements and incorporate incurred costs into their billing and accounting for the value-added services they provide to their customers.
Pricing for pay-as-you-go subscriptions is based on the Pricing details page. Each service is listed on an individual page, so it may be best to use the Pricing Calculator for the majority of cost estimating.
For MSDN subscribers, Partner Network, and BizSpark accounts, the pricing may differ from the pay-as-you-go model. For more information on these types of accounts, use the Member Offers page as a resource.
Enterprise account pricing differs based on commitment and other variables. The Licensing Azure for the Enterprise page reviews some of the benefits of this type of agreement. For more details about the Enterprise account pricing model, the Pricing Overview for Microsoft Azure in Enterprise Programs document is a great resource.
Azure Active Directory (Azure AD) is the standalone directory service within Azure. Customers can create their administrative structure within Azure AD by defining their users and groups. This service can work on its own, because Azure AD can perform authentication without integrating with an on-premises directory.
On the other hand, organizations can choose to synchronize their users and groups from an on-premises Active Directory to Azure AD. This synchronization rapidly makes resources within Azure available to on-premises users and groups.
All users who access the organization's Azure subscriptions are then present in the Azure AD with which the subscription is associated. This enables the company to manage what the users can access, or to revoke access to Azure by disabling the account in the directory.
The creation of Microsoft accounts is typically controlled by the users and not by the organization. With an Azure subscription, we recommend using Organization Accounts where possible to provide access to resources.
When creating Microsoft Accounts, we recommend establishing guidelines that will be used within the organization.
Do not allow the use of existing personal Microsoft Accounts. Depending on the individual permissions, these accounts may be tied to the company Azure subscriptions, and have access to storage accounts and billing information.
A Microsoft account is mapped to a person and it should be formatted to identify the user, for example: FirstName.LastName.xyz@outlook.com and not alias2763@outlook.com.
The reason for using specific naming for the Microsoft Accounts is that at the time the account is created, the identity of the user may be known. However, as time goes on and roles change within the company, accounts may be difficult to identify.
In the previous example, the xyz after FirstName.LastName is optional, and it could be used for any number of things, such as environment name, development, lab, or organization name, if that is preferred.
Using Organizational Accounts for managing an Azure subscription is recommended over Microsoft Accounts for various reasons.
The main reason is that the organization has more control over access for adding administrators and removing access when an employee is no longer with the company.
Additionally, many of the newer Azure services offerings are relying heavily on Organizational Accounts. In some cases, having existing Microsoft Accounts tied to services prior to switching to Organizational Accounts can cause issues with the respective tenant IDs.
A Microsoft Azure subscription and the associated resources can be accessed via the Azure Management portal, Azure PowerShell, Visual Studio, or other SDKs and tools. When a subscription is created, a Service Administrator is assigned. The default Service Administrator is the same as the Account Administrator, who is also the contact person (via email) for the subscription.
The Account Administrator can assign a different Service Administrator by editing the subscription in the Microsoft Online Services Customer Portal.
To assist with the management of the Azure Services, the Service Administrator will add Co-administrators to the subscription. To be added as a Co-administrator, a user must have a valid Microsoft Account or Org ID, if this is the method of authentication used in the subscription. The first Co-administrator in the subscription must be added by the Service Administrator. After that, any Co-administrator can add or remove other Co-administrators in the subscription.
Removing or adding Co-administrators must be done in the Azure Management portal, and the option is located under Settings > Administrators.
Subscription Co-administrators share the same rights and permissions that the Service Administrator has, with the following exception: a Co-administrator cannot remove the Service Administrator from a subscription. Only the Microsoft Azure account owner (Account Administrator) can change the Service Administrator for a subscription, by editing the subscription in the Microsoft Online Services Customer Portal, as shown previously.
The Co-administrator account can sign in to the Microsoft Azure Management portal, and view all services. The Service Administrator and Co-administrator have the ability to add, modify, or delete Azure services such as websites, cloud services, and mobile services. A single subscription is limited to a maximum of 200 Co-administrators.
With the introduction of Role Based Access Control (RBAC), Microsoft Azure now has a security model to perform access control of resources by users on a more granular level. Users specified in RBAC permissions can access and execute actions on the resources within their scope of work. Because there is a limit of 200 Co-administrators per subscription, RBAC allows more users to manage their Azure Services. At the same time, RBAC limits access to only the specific resources needed rather than the entire subscription.
RBAC is only available in the Azure Preview portal and when using the Azure Resource Manager APIs. The Service Administrator and Co-administrator will continue having access to all portals and APIs, however any user added only via RBAC will not be able to access the current version of the Azure Management Portal or Service Management APIs.
Mandatory:
With RBAC, the subscription is no longer the management boundary for permissions in Azure. Resource Groups are new constructs to group resources that have a common application or service lifecycle. In addition to granting access at the Resource Group level, RBAC permissions can be applied to an individual resource such as SQL Database, websites, virtual machines, and storage accounts.
RBAC administration is implemented by the subscription Service Administrator and Co-administrators. Customers can leverage their existing Azure AD users and groups, or use on-premises Active Directory accounts for access management.
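As an illustration, the following Azure PowerShell excerpt grants a user the built-in Reader role at the scope of a resource group and then enumerates the available role definitions. The user and resource group names are hypothetical, and the cmdlets assume an authenticated Azure PowerShell session running in Resource Manager mode:

# Hypothetical user and resource group; requires the ARM mode of Azure PowerShell.
New-AzureRoleAssignment -Mail "jane.doe@contoso.com" -RoleDefinitionName "Reader" -ResourceGroupName "Group-1"
# Enumerate the built-in role definitions available in the subscription.
Get-AzureRoleDefinition | Select-Object Name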
There are twenty-two built-in Azure RBAC roles for controlling access to Azure resources, including the core Owner, Contributor, and Reader roles.
Enforcing the access policies that you configure using RBAC is done by using Azure Resource Manager APIs. The Azure Preview portal, command-line tools, and Azure PowerShell use the Resource Manager APIs to run management operations. This ensures that access is consistently enforced regardless of what tools are used to manage Azure resources.
The following article provides additional details: Role-based access control in the Microsoft Azure portal.
Extending services from on-premises implementations to Azure resources is largely driven by operational requirements.
If the answer to these and other similar questions is an on-premises management tool, decisions need to be made as to how that is achieved.
The answers to these questions can drive decisions about identity, security, and network connectivity, or be driven by them, depending on the organization's priorities.
Moving to Azure and the cloud provides opportunities to do things differently. It is important to think about processes and functions from a cloud perspective. Treat everything like a service. Can Azure Services meet the needs? Think about minimum viable solutions—the agility and cost benefits can be enormous.
Azure offers Operational Insights, Application Insights, log collection, and antivirus solutions from multiple vendors, as well as encryption and backup solutions from Microsoft and third parties. The more the platform can be leveraged and SaaS offerings utilized, the greater the benefits to the organization.
Start with a cloud first mentality. That means using platform services to reduce infrastructure, management costs, and anchors to legacy solutions. Focus on being agile and scalable so the organization can capitalize on the elasticity and pay-for-use characteristics of Azure.
The Azure Service Management (ASM) Representational State Transfer (REST) API has historically been the primary model for managing Azure resources. The original and present iterations of the Azure Portal, the Azure PowerShell cmdlets, the cross-platform CLI and the Azure Management Libraries for .NET are all built on top of the ASM API. The ASM API was initially developed several years ago and is missing many modern cloud management capabilities, whether it's desired state configuration, role based access control (RBAC) or a flexible extensibility model for future Azure first-party services. ASM supports authentication with either X.509 certificates or Azure Active Directory (AAD).
Note: Azure Service Management (ASM) is not supported in the CSP subscriptions as defined in the CSP model. Only customer-managed subscriptions can be managed using Azure Service Management (ASM). CSP subscriptions are compliant only with the Azure Resource Management model described in the following section.
The Azure Resource Manager REST API (ARM) has been developed to replace ASM as the authoritative method to manage Azure resources. ARM supports both desired state configuration and RBAC, while providing a pluggable model allowing new Azure services to be cleanly integrated. The preview Azure Portal and the ARM mode of the Azure PowerShell cmdlets both use ARM. AAD is the only authentication method supported by ARM.
ARM introduces the concept of a resource group which is a collection of individual Azure resources. A resource group is associated with a specific Azure region but may contain resources from more than one region.
A resource group can be described in the following scenarios:
Type | Description | Example |
Vertical | Contains all resources comprising the single application | Company HR Application Resource Group |
Horizontal | Combines all resources that comprise the specific deployment topology layer such as shared services used by multiple applications or app-specific tier | Shared Management Services Resource Group |
ARM supports the use of a parameterized resource group template file that can be used to create one or more resource groups along with their individual resources. The deployment of a resource group uses desired state configuration. ARM ensures that the resources are deployed in accordance with the appropriately parameterized template file for the resource group. Resource providers exist for many types of Azure resources, and more Azure services are currently adding ARM support, gradually migrating from the legacy ASM model.
ARM supports role based access control (RBAC), and this support is expressed in the preview Azure Portal and the ARM mode of the Azure PowerShell cmdlets. ARM provides several core roles – Owner, Contributor, and Reader. Individual resource providers support additional resource-specific roles, such as Search Service Contributor and Virtual Machine Contributor.
Azure Resource Manager (ARM) templates enable quick and easy provisioning of Azure applications via declarative JSON. A single JSON template can be constructed to deploy multiple services, such as virtual machines, virtual networks, storage, app services, and databases. The same template can be used to repeatedly and consistently deploy the application during every stage of the application lifecycle. Consequently, templates provide a reusable declarative model that complements imperative management patterns defined by PowerShell.
Azure Resource Manager (ARM) templates can be deployed from Azure PowerShell, Azure CLI, or the Azure preview portal. The following excerpt demonstrates how a known quickstart template defining a simple Azure VM can be deployed using Azure PowerShell.
$deployName="<deployment name>"
$RGName="<resource group name>"
$locName="<Azure location, such as West US>"
$templateURI="https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/101-simple-windows-vm/azuredeploy.json"
New-AzureResourceGroup -Name $RGName -Location $locName
New-AzureResourceGroupDeployment -Name $deployName -ResourceGroupName $RGName -TemplateUri $templateURI
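Note that, depending on the Azure PowerShell release in use, the cmdlets above may only be available after switching to Resource Manager mode. This sketch assumes a 0.8.x/0.9.x release in which the ASM and ARM cmdlet sets are separated into modes:

# Switch the current session to the Azure Resource Manager cmdlet set.
Switch-AzureMode -Name AzureResourceManager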
The list below describes typical constructs that can be found in an ARM template:
The parameters section represents the collection of parameters that are referenced across all of the resources in the template; it includes the property values that are provided when the resource group is deployed.
"parameters": {
"siteName": {
"type": "string"
},
"hostingPlanName": {
"type": "string"
},
"siteLocation": {
"type": "string"
},
}
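At deployment time, values for these parameters can be supplied inline as dynamic parameters of the deployment cmdlet, or through a separate parameters file. The values below are purely illustrative:

New-AzureResourceGroupDeployment -Name $deployName -ResourceGroupName $RGName -TemplateUri $templateURI -siteName "contoso-web" -hostingPlanName "contoso-plan" -siteLocation "West US"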
The resources section lists the resources that the template creates, with each resource described in detail, including its properties, and parameters for user-defined values.
{
"name": "[parameters('databaseName')]",
"type": "databases",
"location": "[parameters('serverLocation')]",
"apiVersion": "2.0",
"dependsOn": [
"[concat('Microsoft.Sql/servers/', parameters('serverName'))]"
],
"properties": {
"edition": "[parameters('edition')]",
"collation": "[parameters('collation')]",
"maxSizeBytes": "[parameters('maxSizeBytes')]",
"requestedServiceObjectiveId": "[parameters('requestedServiceObjectiveId')]"
}
},
The templateLink references another template from the current one. The following excerpt shows how the dependent JSON template file located in Azure storage can be linked from the primary template definition:
{
"properties": {
"template": {
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"childParameter": { "type": "string" }
},
"resources": [ {
"name": "Sub-deployment",
"type": "Microsoft.Resources/deployments",
"apiVersion": "2015-01-01",
"properties": {
"mode": "Incremental",
"templateLink": {
"uri": "http://<stac>.blob.core.windows.net/templates/template.json",
"contentVersion": "1.0.0.0",
},
"parameters": {
"subParameterName": { "value": "[parameters('childParameter']" }
}
}
} ]
}
}
}
The CustomScriptExtension references the Custom Script Extension for Windows, which executes PowerShell scripts on a remote virtual machine without requiring a logon. Scripts can be executed after the VM is provisioned or at any time during its lifecycle, without opening any additional ports on the VM. The most common use cases for the Custom Script Extension include running, installing, and configuring additional software on the VM after provisioning.
The following excerpt illustrates how to reference the Custom Script Extension from within the JSON template to run a custom Windows PowerShell script to apply post-provisioning configuration:
{
"type": "Microsoft.Compute/virtualMachines/extensions",
"name": "MyCustomScriptExtension",
"apiVersion": "2015-05-01-preview",
"location": "[parameters('location')]",
"dependsOn": [
"[concat('Microsoft.Compute/virtualMachines/',parameters('vmName'))]"
],
"properties": {
"publisher": "Microsoft.Compute",
"type": "CustomScriptExtension",
"typeHandlerVersion": "1.4",
"settings": {
"fileUris": [
"http://<stacn>.blob.core.windows.net/customscriptfiles/start.ps1"
],
"commandToExecute": "powershell.exe -ExecutionPolicy Unrestricted -File start.ps1"
}
}
}
The following key solution template scopes have been identified through practical experience. These three scopes (capacity, capability, and end-to-end solution) are described in more detail below.
Type | Description | Example |
Capacity Scope | Delivers a set of resources in a standard topology that is pre-configured to be in compliance with regulations and policies | Deploying a standard development environment in an Enterprise IT or SI scenario |
Capability Scope | Deploys and configures a topology for a given technology | Common scenarios include technologies such as SQL Server, Cassandra, and Hadoop |
End to End Solution Scope | Targeted beyond a single capability, focused instead on delivering an end-to-end solution comprised of multiple capabilities. A solution-scoped template manifests itself as a set of one or more capability-scoped templates with solution-specific resources, logic, and desired state. | An end-to-end data pipeline solution template that might mix solution-specific topology and state with multiple capability-scoped solution templates such as Kafka, Storm, and Hadoop |
While templates are generally perceived to give customers the utmost flexibility, many considerations affect the choice between free-form configurations and known configurations.
Free-form Configurations
Free-form configurations provide the most flexibility by allowing customization of the resource type and supplying values for all resource properties, such as selecting a VM type and providing an arbitrary number of nodes and attached disks for those nodes.
Nonetheless, because mature organizations are expected to use templates to deploy large Azure resource topologies, the complexity of building a template for a sophisticated infrastructure deployment, potentially containing hundreds of varied resources, results in substantial overhead for designing, maintaining, and deploying a free-form template.
Known Configurations
Rather than offer a template that provides total flexibility and countless variations, the common pattern is to provide the ability to select known configurations: in effect, standard sizes such as sandbox, small, medium, and large. Other examples of such sizes are product offerings, such as community edition or enterprise edition. In other cases, it may be workload-specific configurations of a technology, such as MapReduce or NoSQL.
Many enterprise IT organizations, OSS vendors, and SIs make their offerings available today in this way in on-premises, virtualized environments (enterprises) or as software-as-a-service (SaaS) offerings (CSVs and OSVs). This approach provides good, known configurations of varying sizes that are preconfigured for customers.
Without known configurations, end customers must determine cluster sizing on their own, factor in platform resource constraints, and do math to identify the resulting partitioning of storage accounts and other resources (due to cluster size and resource constraints). Known configurations enable customers to easily select the right standard size for a given deployment. In addition to making a better experience for the customer, a small number of known configurations is easier to support and can help deliver a higher level of density.
For a given resource, there can be multiple upstream and child dependencies that are critical to the success of the deployment topology. Such dependencies can be defined on other resources by using the dependsOn keyword and the resources property of a resource in the ARM template. As an example, a virtual machine may depend on a database resource being successfully provisioned. In another case, multiple cluster nodes must be installed before deploying a virtual machine with the cluster management tool.
While dependsOn is a useful tool for mapping dependencies between the resources comprising a deployment, it needs to be used judiciously because it can affect deployment performance. dependsOn should not be used to document how resources are interconnected: its lifecycle covers deployment only, and it is not available post-deployment, so once deployed there is no way to query these dependencies. Use of the dependsOn keyword may also prevent the deployment engine from using parallelism where it otherwise might. The mechanism called resource linking should be used instead to document and provide query capability over the relationships between resources.
There are numerous scenarios where an administrator needs to place a lock on a resource or resource group to prevent other users in the organization from committing write actions or accidentally deleting a critical resource. Azure Resource Manager provides the ability to restrict operations on resources through resource management locks. Resource locks are policies which enforce a lock level at a particular scope. The lock level identifies the type of enforcement for the policy, which presently has two values – CanNotDelete and ReadOnly. The scope is expressed as a URI and can be either a resource or a resource group.
For example, various resources are used in an off-and-on pattern, such as virtual machines which are turned on periodically to process data for a given interval of time and then turned off. In this scenario, the VM shut down must be enabled but it is imperative that the underlying storage account not be deleted. In this scenario, a resource lock with a lock level of CanNotDelete can be applied on the storage account.
In another scenario, business organization may have periods where updates must not go into production. In these cases, the ReadOnly lock level stops creation or updates. For example, a retail company may not want to allow updates during holiday shopping periods; a financial services company may have constraints related to deployments during certain market hours. A resource lock can provide a policy to lock the resources as appropriate. This could be applied to just certain resources or to the entirety of the resource group. The resource lock can be applied both via Azure PowerShell or added within the context of the Azure Resource Manager template.
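A lock of this kind can be sketched with Azure PowerShell. The resource names below are hypothetical, and the cmdlet shape follows the preview ARM mode of the period:

# Prevent accidental deletion of the storage account backing the off-and-on VMs.
New-AzureResourceLock -LockLevel CanNotDelete -LockName "ProtectStorage" -ResourceName "mystorageacct" -ResourceType "Microsoft.Storage/storageAccounts" -ResourceGroupName "Group-1"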
The Microsoft Azure Product Group created a community-maintained set of quickstart ARM templates that could be used as building blocks to author custom JSON templates for complex workloads to be deployed in Azure. For representative purposes a subset of provided templates is listed in the table below:
QuickStart ARM Templates |
Create an Application Gateway with Public IP |
Create a Virtual Network with two Subnets, a local network, and a VPN gateway |
Create a VM with multiple network interfaces and RDP accessible |
Deploy a Windows VM with tags, using the latest patched version, from a minimal set of parameters |
2 VMs in a Load Balancer and Load Balancer rules |
Install Virtual Network with DMZ Subnet |
Network Interface in a Virtual Network with Public IP Address |
Azure Resource Manager provides a tagging feature that facilitates resource categorization according to customer requirements for managing or billing. Tags are defined as name-value pairs assigned to resources or resource groups, and they are useful where customer business processes and organizational hierarchy call for a complex collection of resource groups and resources, and where subscription assets need to be structured according to established policies. Each resource can have up to 15 tags. Users are able to sort and organize resources by tags. Tags may be placed on a resource at the time of creation or added to an existing resource. Once a tag is placed on a billable resource created via the Azure Resource Manager, the tag will be included in the usage details found in the Usage and Billing portal.
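For example, a tag can be applied to an existing resource group from Azure PowerShell; the group name and tag values shown are illustrative:

# Add a Department tag to an existing resource group (ARM mode of Azure PowerShell).
Set-AzureResourceGroup -Name "Group-1" -Tag @{ Name = "Department"; Value = "Marketing" }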
Tags are persisted in the resource's properties in the order they are added. The following Azure PowerShell excerpt demonstrates how to obtain the tag info associated with the existing virtual machine demonstrating the order in which tags were associated with the virtual machine resource.
PS C:\> Get-AzureVM -Name MyVM -ResourceGroupName Group-1
ResourceGroupName : Group-1
Id : /subscriptions/<..>/resourceGroups/Group-1/providers/Microsoft.Compute/virtualMachines/MyVM
Name : MyVM
Type : Microsoft.Azure.Management.Compute.Models.VirtualMachineGetResponse
Location : westus
Tags : {
"Department": "MarketingDepartment",
"Application": "LOBApp",
"Created By": "CEO",
"AppPropOn1": "AppInsightsComponent",
"AppPropOne": "One"
}
...
NetworkInterfaceIDs : {...c}
Alternatively, tags can be defined in the Resource Manager template as demonstrated in the excerpt below:
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "newStorageAccountName": {
      "type": "string",
      "metadata": {
        "description": "Unique DNS Name for the Storage Account where the Virtual Machine's disks will be placed."
      }
    }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('newStorageAccountName')]",
      "apiVersion": "2015-05-01-preview",
      "location": "[variables('location')]",
      "tags": {
        "Department": "[parameters('departmentName')]",
        "Application": "[parameters('applicationName')]",
        "Created By": "[parameters('createdBy')]"
      }
    },
    {
      "apiVersion": "2015-05-01-preview",
      "type": "Microsoft.Network/publicIPAddresses",
      "name": "[variables('publicIPAddressName')]",
      "location": "[variables('location')]",
      "tags": {
        "Department": "[parameters('departmentName')]",
        "Application": "[parameters('applicationName')]",
        "Created By": "[parameters('createdBy')]"
      }
    }
  ]
}
Generally, subscription resources can have tags defined to accommodate the following scenarios:
Mature organizations are encouraged to create a custom tag taxonomy applied to all Azure organizational assets to ensure that all actors consuming Azure resources comply with established policies. For example, users should apply organization-specific tags, such as "Contoso-DeptOne", instead of duplicate but slightly different tags (such as "dept" and "department").
The following template excerpt contains JSON that describes tags for a resource that specify the environment type, project name and internal billing chargeback ID. The values for these are passed in via parameters to make this template more re-usable and of higher value for Systems Integrators, Corporate IT, and Cloud Service Vendors. This approach enables them to use the same template to deploy capacity or capabilities for a multitude of customers that each will have distinct values for these tags.
"tags": {
"ChargebackID": "[parameters(chargebackID)]",
"ProjectName": "[parameters(projectName)]",
"EnvironmentType" :"[parameters('environmentType')]"
},
Azure has many partners who have incorporated tags into their cost management solutions. Partners such as Apptio, Cloudability, Cloudyn, Cloud Cruiser, Hanu Insight, and RightScale leverage tags in their products.
Every solution deployed in Microsoft Azure leverages an aspect of Azure Storage, making storage a common component and critical to planning any Azure-based solution design. The storage planning considerations covered in this section include:
Managing, monitoring, and troubleshooting topics, including storage throttling behaviors, storage analytics, and IaaS virtual machine considerations, including I/O profiles and maintaining disk consistency within workloads.
Recommended: For organizations that are new to Azure Storage, it is often helpful to draw comparisons to private cloud storage or traditional SAN storage as a way to understand some of the basic concepts required in an Azure Storage design (for example, compare BLOB storage account to a LUN). |
Planning the storage account infrastructure is perhaps the most important step of any Microsoft Azure deployment because it sets the foundation for performance, scalability and functionality. It is first necessary to understand the two types of storage accounts (standard and premium) and the services available within each type of account. The following sections outline the differences at a high level.
A standard storage account includes the Blob, Table, Queue, and File storage services. These storage services are included in every storage account created. A storage account provides a unique namespace for working with blobs, queues, tables, and files.
Standard storage accounts are available with four redundancy types:
These redundancy values and their potential use will be covered in later sections of this document.
A premium storage account currently supports only Azure virtual machine disks that are backed by page blobs. A premium storage account stores only page blobs, and only REST APIs for page blobs and their containers are supported. From an infrastructure perspective, premium storage stores data on solid-state drives (SSDs), whereas standard storage stores data on hard disk drives (HDDs). As a result, premium storage delivers high-performance, low-latency disk support for I/O intensive workloads running on Azure virtual machines. The following characteristics summarize the current capabilities of Azure premium storage. Premium storage offers:
Premium storage is limited to local replication only. Premium Storage GRS is not currently available. However, you can optionally create snapshots of your disks and copy those snapshots to a standard GRS storage account if required. This enables the ability to maintain a geo-redundant snapshot of data for disaster recovery purposes.
For high-scale applications and services, you can attach several premium storage disks to a single virtual machine, and support up to 32 TB of disk storage per virtual machine and drive more than 64,000 IOPS per virtual machine at less than 1 millisecond latency for read operations. Like standard storage accounts, premium storage keeps three replicas of data within the same region, and ensures that a Write operation will not be confirmed until it is durably replicated.
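Creating a premium storage account can be sketched with the ARM mode of Azure PowerShell; the resource group, account name, and location below are placeholders:

# Premium_LRS is currently the only replication type offered for premium storage.
New-AzureStorageAccount -ResourceGroupName "Group-1" -Name "premiumstor01" -Location "West US" -Type "Premium_LRS"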
Every object that is stored in Azure Storage has a unique uniform resource identifier (URI) address; the storage account name forms the subdomain of that address. The subdomain together with the domain name, which is specific to each service, form an endpoint for your storage account.
For example, if your storage account is named azra1, the default endpoints for your storage account would be:
Blob service: http://azra1.blob.core.windows.net
Table service: http://azra1.table.core.windows.net
Queue service: http://azra1.queue.core.windows.net
File service: http://azra1.file.core.windows.net
The endpoints for each storage account are visible on the storage Dashboard in the Azure Management portal after the account has been created.
The URI for accessing an object in a storage account is built by appending the object's location in the storage account to the endpoint. For example, a blob address might have this format: http://azra1.blob.core.windows.net/mycontainer/myblob.
Storage Security
Access to storage accounts is possible through two means:
Please refer to the Storage Security section later in this document to understand common practices and implications for using storage account keys.
Feature References |
Introduction to Microsoft Azure Storage | http://azure.microsoft.com/en-us/documentation/articles/storage-introduction/ |
Azure Storage documentation and intro videos | http://azure.microsoft.com/en-us/documentation/services/storage/ |
Introduction to Premium Storage | http://azure.microsoft.com/en-us/documentation/articles/storage-premium-storage-preview-portal/ |
Technical Overview | http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/ |
Quick Start Guide | http://azure.microsoft.com/en-us/documentation/articles/storage-getting-started-guide/ |
Microsoft Azure Storage Team Blog | |
Understanding Block Blobs and Page Blobs | |
Introducing Azure Storage Append Blob | |
Mandatory:
Recommended: Consider the use of premium storage when a higher level of disk performance is needed for a given workload or application. Premium storage is high performance SSD-based storage designed to support I/O intensive workloads with significantly high throughput and low latency. With premium storage, you can provision a persistent disk and configure its size and performance characteristics to meet your application requirements. |
Design Guidance |
When designing storage account types and services, consider the following:
Capability Considerations |
Capability Decision Points |
Capability Models in Use |
Different storage account types serve different purposes. |
Each storage account should be allocated for a specific purpose and not be a generic, all-purpose container. You need to decide how to allocate storage accounts for different purposes within your project.. |
Within an IaaS deployment, consider a separate storage account for maintenance of master images that can be deployed to other storage accounts throughout the subscription. Within an IaaS deployment, consider a separate storage account for any backup purposes, separate from any production data, such that it can be created in a different region than the primary data. |
Different storage services provide unique capabilities. |
Understand the type of data and data flow that the storage account will serve to determine the storage service that the account will provide. |
For key lookups at scale for structured data, use Tables. For scans or retrievals of large amount of raw data, such as analytics or metrics, use Blobs. For streaming and storing documents, videos, pictures, backups, and other unstructured text or binary data, use Blobs. For IaaS virtual machine VHDs, use Blobs. For process workflows or decoupling applications, use Queues. To share files between applications running in virtual machines that are using familiar Windows APIs or the File service REST API, use Files |
The storage service offers two types of blobs: block blobs and page blobs. |
Understand and decide on the use of block blobs or page blobs when you create the blob. |
In the majority of cases, page blobs will be utilized. Page blobs are optimized for random Read and Write operations (best for virtual machines and VHDs). Page blobs have a maximum storage of 1 TB (compared to only 200 GB for a block blob), and they commit immediately (compared to a block blob, which remains uncommitted until a commit is issued). Block blobs are for streaming and storing documents, videos, pictures, backups, and other unstructured text or binary data. Additionally, there are cost differences associated with each type of storage. |
Microsoft Azure provides several ways to store and randomly access data in the cloud (blobs). |
Decide when to use Azure Blobs, Azure Files, or Azure data disks. |
Azure Files is most often used when data stored in the cloud needs to be accessed by multiple IaaS or PaaS virtual machines with a standard SMB interface or UNC path. Azure Blobs is most often used for larger capacity uses and where random access is needed, such as for multiple disks, and you want to be able to access data from anywhere. Azure data disks are most often used when you want to store data that is not required to be accessed from outside the virtual machine to which the disk is attached. It is exclusive to a single virtual machine (only one at a time). |
Depending on how data is replicated (LRS, GRS, ZRS, RA-GRS), the blob type, storage service, storage transactions, and the use of premium storage will affect the overall cost of the Azure Storage solution. |
When making decisions about how your data is stored and accessed, you should also consider the costs involved. Your total cost depends on how much you store, the volume of storage transactions and outbound data transfers, and which data redundancy option you choose. |
The type of data will drive most of these decisions. For example, data that is critical to a business may drive the decision to have GRS, whereas data that is less critical may suffice with LRS. Data that must be quickly accessed with the highest possible IOPS may drive the usage of premium storage, where data without that requirement may accept the use of standard storage. These requirements will necessitate and support the higher costs associated with the storage services. |
Storage containers can be used to further organize data in storage accounts. |
Decide how you want the data in Azure Storage to be organized. |
Deciding how to design and build containers is similar to how you would design and build a folder structure on a file server. It is simply how you want to organize the data. By default, all VHDs will be put into a "vhds" folder, but you can change or specify whatever container structure you want to use. |
Concurrency settings can be modified for Azure Storage accounts. |
Modern applications usually have multiple users viewing and updating data simultaneously. This requires developers to think carefully about how to provide a predictable experience to their end users, particularly for scenarios where multiple users can update the same data. There are three main data concurrency strategies developers typically consider: optimistic concurrency, pessimistic concurrency, and last writer wins.
|
You can opt to use optimistic or pessimistic concurrency models to manage access to blobs and containers in the Blob service. If you do not explicitly specify a strategy, last writer wins is the default. For IaaS, concurrency settings do not need to be modified. For PaaS, the developers need to consider the type of application, the user base, and the data types to help determine the concurrency settings. |
There are storage account limitations that must be understood and respected. |
Aside from the size limits of Azure Storage accounts, you must consider the throughput limitations of each account and design your storage accounts with those in mind. You are more likely to hit the throughput limitations before you hit the size limitations. You are also limited by the number of storage accounts per subscription. |
The primary constraining factor is the number of VHD files that can be stored in each storage account. For virtual machines in the Basic tier, do not place more than 66 highly used VHDs in a storage account to avoid the 20,000 total request rate limit (20,000/300). For virtual machines in the Standard tier, do not place more than 40 highly used VHDs in a storage account (20,000/500). The term highly used refers to VHDs that push the upper limits of the total request rates. If you have VHDs that are not highly used and do not come close to the maximum request rates, you can put more VHDs in the storage account. Note that this refers to virtual hard disks and not virtual machines. Virtual machines may indeed contain multiple virtual hard disks. |
Single or multiple storage accounts can be used. |
Additional storage accounts may be used to get more scale than a single storage account. Consider how to design the IaaS or PaaS workloads to dynamically add accounts, in the event that more scale is needed for the solution in the future, beyond what a single storage account can provide. |
Storage account throughput is the determining factor in using single or multiple storage accounts. Consider the throughput limitations of each of the storage account types. Also, consider that throughput can be maximized by using:
|
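The request-rate arithmetic behind the VHD-per-account guidance above can be expressed directly. This is an illustrative sketch, not an Azure API; the 20,000 requests/sec account limit and the 300 (Basic tier) and 500 (Standard tier) per-disk IOPS figures come from the guidance above.

```python
# Estimate how many "highly used" VHDs fit in one storage account before
# the 20,000 total requests/sec account limit is reached.

ACCOUNT_REQUEST_LIMIT = 20_000  # total requests/sec per storage account

def max_highly_used_vhds(per_disk_iops):
    """Number of fully loaded VHDs an account can host without throttling."""
    return ACCOUNT_REQUEST_LIMIT // per_disk_iops

print(max_highly_used_vhds(300))  # Basic tier    -> 66
print(max_highly_used_vhds(500))  # Standard tier -> 40
```

VHDs that do not push the upper request-rate limits can be packed more densely, as the table notes.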
The choice of a name for any asset in Microsoft Azure is an important choice because:
This table covers the naming requirements for each element of a storage account.
Item |
Length |
Casing |
Valid characters |
Storage account name |
3-24 |
Lower case |
Alphanumeric |
Blob name |
1-1024 |
Case sensitive |
Any URL char |
Container name |
3-63 |
Lower case |
Alphanumeric and dash |
Queue name |
3-63 |
Lower case |
Alphanumeric and dash |
Table name |
3-63 |
Case insensitive |
Alphanumeric |
It is also possible to configure a custom domain name for accessing blob data in your Azure Storage account. The default endpoint for the Blob service is:
https://mystorage.blob.core.windows.net
But if you map a custom domain (such as www.contoso.com) to the blob endpoint for your storage account, you can also access blob data in your storage account by using that domain. For example, with a custom domain name, http://mystorage.blob.core.windows.net/mycontainer/myblob could be accessed as http://www.contoso.com/mycontainer/myblob.
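The endpoint mapping described above can be sketched as a small helper. `blob_url` is a hypothetical function name; the account, container, and domain names are the examples from the text.

```python
# Build a blob URL against either the default Blob endpoint or a mapped
# custom domain. Illustrative only; the Azure SDKs build these URLs for you.

def blob_url(account, container, blob, custom_domain=None):
    host = custom_domain or "{0}.blob.core.windows.net".format(account)
    return "https://{0}/{1}/{2}".format(host, container, blob)

print(blob_url("mystorage", "mycontainer", "myblob"))
# -> https://mystorage.blob.core.windows.net/mycontainer/myblob
print(blob_url("mystorage", "mycontainer", "myblob", "www.contoso.com"))
# -> https://www.contoso.com/mycontainer/myblob
```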
Use the following reference when this capability is required.
Feature References | |
Naming and Referencing Containers, Blobs, and Metadata |
|
Naming Queues and Metadata |
|
Naming Tables |
https://msdn.microsoft.com/en-us/library/azure/dd179338.aspx |
Configure a custom domain name for blob data in an Azure Storage account |
http://azure.microsoft.com/en-us/documentation/articles/storage-custom-domain-name |
Mandatory:
| |
Recommended: Establish a naming convention for all storage accounts and types before you create any. |
Design Guidance |
When you choose naming conventions for storage objects, consider the following:
Resource |
Restrictions |
Recommendations |
Storage account |
Must be between 3 and 24 characters in length and use numbers and lower-case letters only. Not only must it be unique within the subscription, but it also must be unique across Azure. |
Naming should be representative of its contents (for example, virtual machines, backup data, archive data, or images). |
Storage Blob container |
Container names must start with a letter or number, and they can contain only letters, numbers, and the hyphen (-) character. Every hyphen must be immediately preceded and followed by a letter or number; consecutive hyphens are not permitted in container names. All letters in a container name must be lower case. Container names must be from 3 through 63 characters long. |
Naming should be representative of its contents (for example, vhds, server images, or backup-Mar03-2015) |
Storage Blob |
A blob name can contain any combination of characters. A blob name must be at least one character long and cannot be more than 1,024 characters long. Blob names are case sensitive. Reserved URL characters must be properly escaped. The number of path segments comprising the blob name cannot exceed 254. A path segment is the string between consecutive delimiter characters (for example, the forward slash) that corresponds to the name of a virtual directory. |
Naming should be representative of its contents. |
Queues |
Every queue within an account must have a unique name. The queue name must be a valid DNS name. A queue name must start with a letter or number, and can only contain letters, numbers, and the hyphen (-) character. The first and last letters in the queue name must be alphanumeric. The hyphen cannot be the first or last character. Consecutive hyphens are not permitted in the queue name. All letters in a queue name must be lower case. A queue name must be from 3 through 63 characters long. |
Naming should be representative of its contents. |
Tables |
Table names must be unique within an account. Table names can contain only alphanumeric characters. Table names cannot begin with a numeric character. Table names are case insensitive. Table names must be from 3 to 63 characters long. Some table names are reserved, including "tables." Attempting to create a table with a reserved table name returns error code 404 (bad request). Table names preserve the case with which they were created, but are case insensitive when used. |
Naming should be representative of its contents. |
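The restrictions tabulated above lend themselves to simple client-side validation before creating storage objects. The helpers below are an illustrative sketch, not an official SDK API; they encode the documented rules as regular expressions.

```python
import re

def valid_storage_account_name(name):
    # 3-24 characters, lowercase letters and numbers only
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None

def valid_container_name(name):
    # Starts with a letter or number; lowercase alphanumeric and hyphens;
    # no consecutive hyphens and no trailing hyphen; 3-63 characters.
    if re.fullmatch(r"[a-z0-9](-?[a-z0-9])*", name) is None:
        return False
    return 3 <= len(name) <= 63

def valid_table_name(name):
    # 3-63 alphanumeric characters; cannot begin with a number
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9]{2,62}", name) is not None

print(valid_storage_account_name("backupdata01"))  # True
print(valid_container_name("backup-mar03-2015"))   # True
print(valid_container_name("-bad--name"))          # False
print(valid_table_name("3tables"))                 # False
```

Note that global uniqueness of a storage account name (and reserved table names such as "tables") can only be checked against the service itself, not locally.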
The location and durability of the storage accounts must also be taken into account. Durability and redundancy options also have an impact on the cost of the storage. When creating a storage account (either through the portal, Azure PowerShell, or REST APIs), you are required to specify an affinity group or a location.
To ensure that the Azure SLAs can be met, there are several levels of data replication available for the storage accounts: locally redundant storage (LRS), zone-redundant storage (ZRS), geographically redundant storage (GRS), and read-access geographically redundant storage (RA-GRS).
Feature References | |
Azure Storage Replication for Durability and High Availability |
|
Azure Storage Redundancy Options |
http://azure.microsoft.com/en-us/documentation/articles/storage-redundancy/ |
Azure SLAs (including Storage) |
|
Azure Storage Pricing Guide |
|
Using resource groups to manage your Azure resources |
http://azure.microsoft.com/en-us/documentation/articles/azure-preview-portal-using-resource-groups |
Mandatory:
| |
Recommended: Not all storage services are available in all regions. Be sure to check the availability of the service you desire, in the region you desire, during the planning phase. (For example, premium storage is limited to only a few regions.) For more information, see Services by regions. | |
Optional:
| |
Design Guidance |
When you design storage durability and redundancy, consider the following:
Capability Considerations |
Capability Decision Points |
Capability Models in Use |
Each availability option provides a different level of data redundancy. |
Carefully consider the level of redundancy that you may need with your data. Not all data needs the same level of redundancy (and redundancy costs). Locally redundant storage (LRS) is less expensive than geographically redundant storage (GRS), and it also offers higher throughput. If your application stores data that can be easily reconstructed, you may opt for LRS. Some applications are restricted to replicating data only within a single region due to data governance or privacy requirements. If your application has its own geo-replication strategy (for example, SQL AlwaysOn and Active Directory domain controllers), then it may not require GRS. ZRS is currently available only for block blobs. Note that once you have created your storage account and selected zone-redundant replication, you cannot convert it to any other type of replication, or vice versa. |
Locally redundant storage (LRS) provides economical local storage or data governance compliance. Zone redundant storage (ZRS) provides an economical, yet higher durability option for block Blob storage. Geographically redundant storage (GRS) provides protection against a major datacenter outage or disaster. Read-access geographically redundant storage (RA-GRS) provides Read access to data during an outage, for maximum data availability and durability. In general, plan to design to regional redundancy (GRS), unless the workload already accounts for it. In that case, there is no need to duplicate it. Also, see the previous section about premium storage for special considerations on premium storage redundancy options. |
Plan with failure in mind. |
Redundancy options are available not because failures may occur, but because they will occur. Accept that hardware failures are part of running hyper-scale datacenters, and plan that failures will occur through the use of available redundancy options. |
Ensure applications and workloads have "retry options" for storage connection failures, in the event the storage becomes unavailable in the primary location. No code changes are required, but a small amount of additional latency may occur. Latency-sensitive apps may also benefit from the use of a cache. |
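The trade-offs above can be condensed into a rough decision helper. This is a simplification for illustration; `choose_replication` is a hypothetical function, and real designs should also weigh cost and regional availability.

```python
# Map high-level requirements onto a replication option, following the
# guidance above: GRS/RA-GRS for geo-redundancy, ZRS for block-blob-only
# workloads needing more durability than LRS, and LRS otherwise.

def choose_replication(needs_geo, needs_read_on_failover, block_blob_only):
    if needs_geo:
        return "RA-GRS" if needs_read_on_failover else "GRS"
    if block_blob_only:
        return "ZRS"  # higher durability than LRS, but block blobs only
    return "LRS"      # cheapest; for reconstructable or region-bound data

print(choose_replication(True, True, False))    # RA-GRS
print(choose_replication(False, False, True))   # ZRS
print(choose_replication(False, False, False))  # LRS
```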
When a storage account is created, only the owner of that account may access the blobs, tables, files, and queues within that account. There are several ways to grant and share access to storage accounts to other users. This section discusses some of the available options.
When you create a storage account, Azure generates two storage access keys, which are used for authentication when the storage account is accessed. By providing two storage access keys, Azure enables you to regenerate the keys with no interruption to your storage service or access to that service.
One simple way to grant access to the storage account is to share that storage access key. However, if your service or application needs to make these resources available to other clients without sharing your access key, you have other options for permitting access, such as shared access signatures and anonymous (public) access to containers and blobs.
Every request made to an Azure Storage account must be authenticated, unless it is an anonymous request against a public container or its blobs. There are two ways to authenticate a request against the storage accounts: Shared Key authentication, which uses the storage access keys, and shared access signatures.
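For Shared Key authentication, the general shape of the signing step is an HMAC-SHA256 over a canonicalized string-to-sign, keyed with the base64-decoded account key, with the result carried in the Authorization header. The sketch below is heavily simplified (the real canonicalization rules are in the REST API reference listed below); the account name and key are placeholders.

```python
import base64
import hashlib
import hmac

ACCOUNT = "mystorage"                                   # placeholder name
ACCOUNT_KEY = base64.b64encode(b"0" * 32).decode()      # placeholder key

def shared_key_authorization(string_to_sign):
    # Sign the canonicalized request with the base64-decoded account key.
    key = base64.b64decode(ACCOUNT_KEY)
    digest = hmac.new(key, string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode()
    return "SharedKey {0}:{1}".format(ACCOUNT, signature)

# The string-to-sign here is illustrative, not a real canonicalized request.
header = shared_key_authorization("GET\n\nexample-canonicalized-headers")
print(header.startswith("SharedKey mystorage:"))  # True
```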
Feature References | |
Manage Access to Azure Storage Resources |
http://azure.microsoft.com/en-us/documentation/articles/storage-manage-access-to-resources/ |
Authenticating Access to Your Azure Storage Account |
https://msdn.microsoft.com/en-us/library/azure/hh225339.aspx |
Authentication for the Azure Storage Services REST API reference |
|
Microsoft Azure Storage Explorers |
http://blogs.msdn.com/b/windowsazurestorage/archive/2014/03/11/windows-azure-storage-explorers-2014.aspx |
Constructing the Shared Access Signature URI |
https://msdn.microsoft.com/en-us/library/azure/dn140255.aspx |
Mandatory: You need the Azure Storage access key to access the storage account through any GUI tools, such as Azure Storage Explorer or any third-party tools. | |
Mandatory: The primary access key and secondary access key for storage accounts should be changed periodically to mitigate unauthorized access.
We suggest changing each key (primary or secondary) every 60 to 120 days. This allows for an ongoing monthly or quarterly key change cadence that affects only one key at a time. Additional events that could cause you to regenerate keys include when a security incident occurs, if you fear compromise of storage account keys, or when key administrative personnel leave your organization. This cadence is comparable to password-change practices for critical service accounts or credentials in Active Directory and other authentication systems. As a general practice, Azure storage account and vault keys should follow similar practices and procedures currently established within the organization. | |
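The suggested cadence can be captured in a small scheduling helper. This is an illustrative sketch with hypothetical function names; the 90-day window is one choice inside the suggested 60-120 day range.

```python
from datetime import date, timedelta

ROTATION_WINDOW = timedelta(days=90)  # one cadence within the 60-120 day range

def rotation_due(last_rotated, today):
    # True once the chosen window has elapsed since the last regeneration.
    return today - last_rotated >= ROTATION_WINDOW

def next_key_to_rotate(last_key_rotated):
    # Alternate primary/secondary so only one key changes at a time and
    # clients can fail over to the other key during the change.
    return "secondary" if last_key_rotated == "primary" else "primary"

print(rotation_due(date(2015, 1, 1), date(2015, 4, 15)))  # True (104 days)
print(next_key_to_rotate("primary"))                      # secondary
```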
Recommended:
| |
Optional: A container or blob can be made available for public (anonymous) access. |
Design Guidance |
When you design storage security, consider the following:
Capability Considerations |
Capability Decision Points |
Capability Models in Use |
Storage accounts can be created with internal (private) or external (public) access. |
The decision about what type of access control to apply to the storage accounts depends entirely on the type of data stored in those accounts and how that data needs to be accessed and protected. |
This is something that is unique to every customer in Azure. In general, the guidance is to always start with internal (private) access only, then find reasons (exceptions) why the data may need external (public) access. Most companies do not need to have external access directly to their data. |
Storage keys can be used to protect storage accounts against unauthorized usage. |
Storage keys should be treated like highly privileged credentials (such as Domain Admin credentials). They should be limited to a few selected, trusted resources within the organization. If you need to grant access to storage accounts without sharing the storage keys, there are other methods to accomplish this. |
To permit access to storage resources without giving out your access keys, you can use a shared access signature. A shared access signature provides access to a resource in your account for an interval that you define and with the permissions that you specify. If your service requires that you exercise more granular control over blob resources, or if you want to provide permissions for operations other than Read operations, you can use a shared access signature to make a resource accessible to users. You can specify that a container should be public, in which case all Read operations in the container and any blobs within it are available for anonymous access. An anonymous request does not need to be authenticated, so a user can perform the operation without providing account credentials. |
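To make the shared access signature constraints concrete, the following sketch models how a service-side check might evaluate a token's validity interval (`st`/`se`) and permission string (`sp`). The field names follow the SAS query parameters; the checking logic itself is illustrative, not Azure's implementation.

```python
from datetime import datetime, timezone

def sas_allows(sas, operation, now):
    # A request is allowed only inside the validity interval and only for
    # operations named in the permission string.
    start = datetime.fromisoformat(sas["st"])
    expiry = datetime.fromisoformat(sas["se"])
    permitted = {"r": "read", "w": "write", "d": "delete", "l": "list"}
    ops = {permitted[c] for c in sas["sp"]}
    return start <= now <= expiry and operation in ops

token = {"st": "2015-04-01T00:00:00+00:00",   # signed start
         "se": "2015-04-02T00:00:00+00:00",   # signed expiry
         "sp": "rl"}                          # read + list only
now = datetime(2015, 4, 1, 12, tzinfo=timezone.utc)
print(sas_allows(token, "read", now))   # True
print(sas_allows(token, "write", now))  # False
```

A real SAS also carries a signature (`sig`) computed over these fields with the account key, which is what prevents clients from tampering with the interval or permissions.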
Client-side encryption for Microsoft Azure Storage contains new functionality to help developers encrypt their data inside client applications before uploading it to Azure Storage. The data can also be decrypted when it is downloaded.
Client-side encryption also supports integration with Azure Key Vault to store and manage the encryption keys. The storage service never sees the keys and is incapable of decrypting the data. This gives you the most control you can have. It's also fully transparent so you can inspect exactly how the library is encrypting your data to ensure that it meets your standards.
From a service-level perspective, Microsoft has a responsibility to protect stored data to mitigate threats related to physical drives within each Azure datacenter. Storage within Azure is exposed in one or more storage accounts within each Azure subscription.
Within a Microsoft Azure datacenter, storage accounts do not reside on a single disk. Rather, data is distributed across several disks in the form of extents within the Azure fabric. They are replicated within and across datacenters based on customer-selected preferences, such as locally redundant storage or geo-redundant storage.
Microsoft protects the data stored within each datacenter with a comprehensive set of controls in alignment with the security certifications outlined at the Azure Trust Center website.
From a subscription-level perspective, customers can additionally protect storage accounts within their Azure subscription to mitigate threats related to subscription administrators within their organization. Access to data found within each storage account is accessible to workloads in multiple ways: Queue, Table, Blob, and Files (SMB).
Each storage account has several layers of protection, including those that are provided by Microsoft and those that are controlled by the customer, using both Microsoft and third-party mechanisms.
Depending on the workload type and how it is accessed, data can be protected in the following ways:
Subscription-Level Workload Types and Storage | |
IaaS |
IaaS workloads (virtual machines) contain their storage inside virtual hard disks (VHDs), which are stored as page blobs in one or more storage accounts. Some lift-and-shift virtual machine workloads access data over Files (SMB). |
PaaS |
PaaS workloads access storage by using one or more of the accessible methods outlined previously (Queues, Tables, and Blobs). |
StorSimple |
StorSimple appliances access storage over the storage account's REST API URL and encrypt data before it is stored. |
Subscription-Level Protection Types | |||
Data-At-Rest |
Data-In-Transit |
Data Access | |
IaaS |
Performed by the customer by encrypting the virtual hard disk (VHD) files. Microsoft and third-party mechanisms are used. Workloads (such as SQL Server) also support Transparent Data Encryption (TDE). Technologies that assist with this are:
|
Performed by the customer by using transport encryption of traffic traversing exposed virtual machine network endpoints. Microsoft and third-party mechanisms are used. Actions performed by Microsoft include disk encryption using BitLocker Drive Encryption for bulk import/export operations and encrypting traffic between Azure datacenters. Technologies that assist with this are:
|
Performed by the customer by using native protections within the installed operating system to authenticate and authorize access to the virtual hard disk (VHD) data that is exposed through the operating system and published endpoints (for example, operating system file shares). |
PaaS |
Performed by the customer by encrypting data located in Queue, Table, and Blob storage. Uses Microsoft encryption mechanisms. Technologies that assist with this are:
|
Performed by the customer by using transport encryption of traffic traversing storage account network endpoints. Microsoft and third-party mechanisms are used. Actions performed by Azure include encryption of traffic between Azure datacenters. Technologies that assist with this are:
|
Performed by the customer by using shared access keys and shared access signatures to provide access to data stored in Queue, Table, and Blob storage. Technologies that assist with this are:
|
StorSimple |
Performed by the appliance using AES-256 encryption with Cipher Block Chaining (CBC) prior to saving to the mapped Azure storage account. |
Performed by the customer by using transport encryption of traffic traversing exposed physical or virtual machine network endpoints. Performed by the appliance using SSL encryption. |
Performed by the customer by using native protections within the installed operating system to authenticate and authorize access to attached StorSimple volumes. Performed by the appliance by using authentication protocols (such as CHAP), ACLs, network access control, and Role-Based Access Control (RBAC). |
A Microsoft Azure virtual machine is created from an image or a disk. All virtual machines use one operating system disk, a temporary local disk, and they enable the use of multiple data disks depending on the selected size of the virtual machine. All images and disks, except the temporary local disk, are created from virtual hard disk (VHD) files that are stored as page blobs in a storage account in Microsoft Azure.
You can use platform images that are available in Microsoft Azure to create virtual machines, or you can upload your own images to create customized virtual machines. The disks that are created from images are also stored in Azure Storage.
Disks can be leveraged in different ways with a virtual machine in Microsoft Azure. An operating system disk is a VHD that you use to provide an operating system for a virtual machine. A data disk is a VHD that you attach to a virtual machine to store application data. You can create and delete data disks whenever you have to.
Each virtual machine that you create has a temporary local disk, which is labeled as drive D by default. This disk exists only on the physical host server on which the virtual machine is running. It is not stored in blobs in Azure Storage. This disk is used by applications and processes that are running in the virtual machine for transient and temporary storage of data. It is also used to store page files for the operating system.
The operating system disk and data disk each have a host caching configuration setting called Host cache preference, which can improve performance under some circumstances. By default, Read/Write caching is enabled for operating system disks and all caching is off for data disks. Note that some workloads have specific configuration requirements with this setting. Its use should be reviewed carefully with the vendor and against the workload's specific needs.
An image is a VHD file (.vhd) that you can use as a template to create a new virtual machine. You can use images from the Azure Image Gallery, or you can create and upload your own custom images. To create a Windows Server image, you must run the Sysprep command on your server to generalize and shut it down before you can upload the .vhd file that contains the operating system.
A .vhd file is stored as a page blob in Microsoft Azure Storage, and it can be used to create images and operating system disks or data disks in Microsoft Azure. You can upload a .vhd file to Microsoft Azure and manage it as you would any other page blob. The .vhd files can be copied, moved, or deleted if a lease does not exist on the VHD (a lease exists, for example, when the VHD belongs to an existing virtual machine).
A VHD can be in a fixed format or a dynamic format. Currently, however, only the fixed format is supported in Microsoft Azure. Often, the fixed format wastes space because most disks contain large unused ranges. However, in Microsoft Azure, fixed VHD files are stored in a sparse format, so you receive the benefits of fixed and dynamic disks at the same time.
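The sparse-format behavior described above is why a large fixed VHD does not necessarily cost its full provisioned size: a page blob is charged for the page ranges actually written, not for the provisioned capacity. The sketch below illustrates the idea under the simplifying assumption of non-overlapping ranges; it is not a billing API.

```python
# Billable footprint of a sparse page blob: the sum of the written
# (offset, length) ranges, regardless of the provisioned disk size.

def billable_bytes(written_ranges):
    return sum(length for _offset, length in written_ranges)

provisioned = 127 * 1024**3                                 # 127 GB fixed VHD
written = [(0, 4 * 1024**3), (64 * 1024**3, 2 * 1024**3)]   # 6 GB written
print(billable_bytes(written) / 1024**3)  # 6.0 -> billed ~6 GB, not 127
```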
When you create a virtual machine from an image, a disk is created for the virtual machine, which is a copy of the original VHD file. To protect against accidental deletion, a lease is created if you create an image, an operating system disk, or a data disk from a VHD file.
Feature References | |
Disks and Images in Azure |
https://msdn.microsoft.com/en-us/library/azure/jj672979.aspx |
Virtual Machine Disks in Azure |
https://msdn.microsoft.com/en-us/library/azure/dn790303.aspx |
Virtual Machine Images in Azure |
https://msdn.microsoft.com/en-us/library/azure/dn790290.aspx |
VHDs in Azure |
https://msdn.microsoft.com/en-us/library/azure/dn790344.aspx |
Manage Images using Windows PowerShell |
https://msdn.microsoft.com/en-us/library/azure/dn790330.aspx |
How To Change the Drive Letter of the Windows Temporary Disk |
http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-change-drive-letter |
Mandatory:
| |
Recommended: You can read or write a single blob at up to a maximum of 60 MB/second (approximately 480 Mbps), which exceeds the capabilities of many client-side networks, including the physical network adapter on the client device. In addition, a single blob supports up to 500 requests per second. If you have multiple clients that need to read the same blob and you might exceed these limits, you should consider using a content delivery network (CDN) for distributing the blob. | |
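The single-blob limits in this recommendation can be turned into a rough capacity check. This is an illustrative sketch with hypothetical function and parameter names; real planning should also account for caching and actual request patterns.

```python
# Single-blob limits quoted above: 60 MB/second and 500 requests/second.
# Estimate whether an expected aggregate client load exceeds either limit,
# in which case fronting the blob with a CDN is advisable.

BLOB_MAX_MBPS = 60   # MB/second per blob
BLOB_MAX_RPS = 500   # requests/second per blob

def cdn_recommended(clients, requests_per_client, mb_per_second_per_client):
    total_rps = clients * requests_per_client
    total_mbps = clients * mb_per_second_per_client
    return total_rps > BLOB_MAX_RPS or total_mbps > BLOB_MAX_MBPS

print(cdn_recommended(clients=50, requests_per_client=2,
                      mb_per_second_per_client=0.5))  # False
print(cdn_recommended(clients=1000, requests_per_client=2,
                      mb_per_second_per_client=1))    # True
```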
Optional: Different virtual machine sizes allow for a different number of data disks to be attached. Be sure to choose the appropriate size virtual machine, based on the number of data disks that you may anticipate needing. For example, a size A1 virtual machine can have a maximum of two data disks. If you need more than two data disks, choose something bigger than an A1. |
Design Guidance |
When you design storage for IaaS, consider the following:
Capability Considerations |
Capability Decision Points |
Capability Models in Use |
The IaaS design will depend heavily on the storage account design. |
For IaaS workloads, it is important to first understand the I/O (or IOPS) and the profile of a workload to determine the stress and expectations it will put on the storage accounts. Based on this information, you can determine how VHDs should be stored in storage accounts and what kind of limitations you will be subject to. |
The primary constraining factor is the number of VHD files that can be in each storage account. For virtual machines in the Basic tier, do not place more than 66 highly used VHDs in a storage account to avoid the 20,000 total request rate limit (20,000/300). For virtual machines in the Standard tier, do not place more than 40 highly used VHDs in a storage account (20,000/500). The term highly used refers to VHDs that push the upper limits of the total request rates. If you have VHDs that are not highly used and do not come close to the maximum request rates, you can put more VHDs in the storage account. Note that this refers to virtual hard disks, not virtual machines. Virtual machines may indeed contain multiple virtual hard disks. |
Deployable virtual machine images also reside in storage accounts. |
When uploading and deploying images, a storage account must be used to house those images. The decision comes to which storage account should be used for images versus live virtual machines. |
We recommend that all custom virtual machine images be stored in a separate, dedicated storage account, from which deployments can occur. This keeps images separate from live virtual machines and prevents image traffic from consuming IOPS needed by running virtual machines. Deployment can occur by copying an image from one storage account to another, thus keeping the images isolated and protected. This also allows you to give special permissions to the images storage account that you might not grant to the live virtual machines storage account (such as permissions for image deployment engineers). Also, never deploy an image across a VPN connection. Always maintain the source image in an Azure Storage account. This will provide for a much faster deployment, instead of pushing it across the VPN each time. |
Choosing which storage account to deploy a virtual machine into is not a permanent decision. |
If necessary, it is possible to migrate a virtual machine from one storage account to another. For detailed procedures, see Migrate Azure Virtual Machines between Storage Accounts. |
Migrating a virtual machine to another storage account should be done on an as-needed basis. If this is needed more frequently, we recommend that you re-examine the storage account architecture to ensure adequate coverage for all deployment points. |
Consider what to do if multiple data disks are needed in a single (striped) volume. |
If multiple data disks need to appear as a single volume within a virtual machine, you are limited to using LRS only (you cannot use GRS for those VHDs). |
If multiple data disks need to appear as a single volume, it is not possible to enable this in a storage account that is configured with GRS. Those VHDs must be stored in a storage account configured as LRS, or if GRS is still a requirement, each data disk must be kept as a separate volume. Data loss may occur if you use striped volumes (Windows or Linux) in geo-replicated storage accounts due to loose consistency for VHDs distributed across storage accounts. If a storage outage occurs and it requires restoring data from a replicated copy, there is no guarantee that the Write order of the striped disk set would be intact after it is restored. |
Disk cache settings have an effect on the performance of the virtual machine disks. |
The operating system disk and the data disk have a host caching setting that can improve performance under some circumstances. However, these settings can also negatively affect performance in other circumstances, depending on the application. |
Host caching is off by default for Read and Write operations for data disks. Host-caching is on by default for Read and Write operations for operating system disks. Only change these settings if the workload would benefit from the change in cache to improve performance. Cache setting changes for the operating system disk require a reboot. Cache setting changes for a data disk do not. |
Data in Azure Storage can be accessed and managed in a variety of ways, and through numerous tools and processes. This section covers the various mechanisms that support managing Microsoft Azure Storage.
Graphical user interface (GUI)-based tools access Azure Storage through an interface that mimics File Explorer. They provide functionality such as drag and drop, allowing you to view and access data as you would on a local or network drive on a server. These tools are easy to use and understand, and they are the best option for those who are new to Azure Storage.
Command-line interface (CLI)-based tools are those that access Azure Storage from a command line, such as Azure PowerShell. This allows you to include data operations (such as move, copy, and delete) within automation scripts. These interfaces have many options and switches to allow for a variety of options in working with the data. They are best used by advanced users and those who are already familiar with Windows PowerShell and require automation as part of their Azure-based solution.
The REST APIs for the Azure Storage services offer programmatic access to the Blob, Queue, Table, and File services in Azure, or in the development environment, via the storage emulator.
All storage services are accessible via REST APIs. Storage services can be accessed from within a service running in Azure, or directly over the Internet from any application that can send an HTTP/HTTPS request and receive an HTTP/HTTPS response. These interfaces are best suited for developers or solutions that require detailed information or control over Azure Storage services.
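As a hedged illustration of the REST addressing scheme described above, the following Python sketch builds the HTTPS endpoint URIs that the Blob, Queue, Table, and File services expose. The account and resource names are hypothetical placeholders; a real request against these URIs would also need an Authorization header or SAS token.

```python
# Sketch: constructing Azure Storage REST endpoint URIs.
# Account name "mystorageacct" and blob path are illustrative only.
def storage_uri(account, service, resource_path=""):
    """Return the HTTPS endpoint for a storage resource.

    service is one of: "blob", "queue", "table", "file".
    """
    base = f"https://{account}.{service}.core.windows.net"
    return f"{base}/{resource_path}" if resource_path else base

# An authenticated HTTP GET against this URI downloads the blob.
print(storage_uri("mystorageacct", "blob", "images/logo.png"))
```

The same pattern covers the other services; for example, `storage_uri("mystorageacct", "table")` yields the Table service endpoint.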
The Azure Storage Client Library reference for .NET contains the current version of the Storage Client Library for .NET. You can install the Storage Client Library for .NET from NuGet or from the Azure SDK for .NET. The source code for the Storage Client Library for .NET is publicly available in GitHub. The Azure Storage Native Client Library is a C++ library for working with the Azure Storage services.
The Azure Cross-Platform Command-Line Interface (xplat-cli) provides a set of open source, cross-platform commands for working with the Azure platform. The xplat-cli provides much of the same functionality found in the Azure portal, such as the ability to manage websites, virtual machines, storage, and SQL Databases. The xplat-cli is written in JavaScript, and it requires Node.js.
Other vendors are free to distribute their Azure Storage management tools, in addition to what Microsoft provides.
The Azure Storage Emulator provides a local environment that emulates the Azure Blob, Queue, and Table services for development purposes. By using the storage emulator, you can test your application against the storage services locally, without incurring any costs.
After the suitability of the data has been determined, there are multiple methods to move data to Azure and manage its lifecycle:
For smaller amounts of data, a manual copy using a GUI tool or a CLI program can accomplish the data move. AzCopy is perhaps the most popular choice to move small amounts of data to and from Azure Storage accounts. For more information, see Getting Started with the AzCopy Command-Line Utility.
Feature References | |
Storage Services REST API Reference |
https://msdn.microsoft.com/en-us/library/azure/dd179355.aspx |
Storage Client Library Reference |
https://msdn.microsoft.com/en-us/library/azure/dn261237.aspx |
Import/Export Service REST API Reference |
https://msdn.microsoft.com/en-us/library/azure/dn529096.aspx |
Azure GUI Storage Explorers |
|
Using the Azure Cross-Platform Command-Line Interface |
|
Install and Configure the Azure Cross-Platform Command-Line Interface |
http://azure.microsoft.com/en-us/documentation/articles/xplat-cli |
Use the Microsoft Azure Import/Export Service to Transfer Data to Blob Storage |
http://azure.microsoft.com/en-us/documentation/articles/storage-import-export-service |
Using the Azure Storage Emulator |
https://azure.microsoft.com/en-us/documentation/articles/storage-use-emulator/ |
Azure Throughput Analyzer |
http://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx |
Mandatory:
| |
Recommended: Azure Storage supports HTTP and HTTPS; however, using HTTPS is highly recommended. | |
Optional: During shipping, physical media may need to cross international borders. You are responsible for ensuring that your physical media and data are imported and exported in accordance with the applicable laws. Before shipping the physical media, check with your advisors to verify that your media and data legally can be shipped to the identified datacenter. This helps ensure that it reaches Microsoft in a timely manner. |
Design Guidance |
When you design storage management, consider the following:
Capability Considerations |
Capability Decision Points |
Capability Models in Use |
Determine whether it is appropriate to store the data in Azure Storage. |
When moving data into Azure Storage (aside from IaaS VHD files), it is important to examine and determine the suitability of placing that data into Azure. There are three primary factors to examine to determine if data should move to Azure Storage:
|
Suitability is something that is unique to every customer. It is important to first understand the data privacy laws that may apply to the customer's country, industry, and any regulatory bodies that may govern it. From there, begin to understand the sensitivity of the data that may be stored. Then look at the technical aspects of access frequency and active vs. passive data. It doesn't make sense to look at the technical aspects of the data before looking at the legal aspects of storing the data in public cloud storage. |
Monitoring your Azure Storage environment is as important as monitoring your on-premises storage environment. Within your storage service, the following areas need to be monitored: service health, availability, performance, and capacity.
Monitoring Service Health
You can use the Azure portal to view the health of the storage service and other Azure services in all Azure regions. This is where you can see if there are any issues outside of your control that may be affecting your storage service.
Monitoring Availability
You should monitor the availability of the storage services in your storage account by monitoring the value in the Availability column in the hourly or minute metrics tables. The availability of your storage should be at 100%. If not, you need to identify what is causing degradation in your storage.
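A simplified sketch of that availability check follows. The field names are hypothetical stand-ins, not the exact metrics table schema; the point is only to show the calculation and the "anything below 100%" trigger.

```python
# Sketch: deriving an availability percentage from hourly metrics counters.
# "TotalRequests" and "FailedRequests" are illustrative field names.
def availability_percent(total_requests, failed_requests):
    """Percentage of requests the service handled successfully."""
    if total_requests == 0:
        return 100.0  # no traffic in the interval means nothing failed
    return 100.0 * (total_requests - failed_requests) / total_requests

hour = {"TotalRequests": 10_000, "FailedRequests": 25}  # hypothetical sample
pct = availability_percent(hour["TotalRequests"], hour["FailedRequests"])
if pct < 100.0:
    print(f"Availability {pct}% - investigate the cause of the degradation")
```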
Monitoring Performance
There are multiple areas in your storage services that you should monitor for performance trends. Some of the key areas to monitor include AverageE2ELatency, TotalIngress, and TotalEgress.
Additionally, it is important to monitor for storage account throttling. Throttling is the mechanism that Azure uses to enforce the IOPS limit of a given Azure storage account (currently 20,000 IOPS). When this limit is exceeded, Azure throttles requests to keep the service within its limits.
Although it is somewhat unlikely in well-planned Azure environments, monitoring of the Throttling Error and Throttling Error Percentage metrics is an effective mechanism to identify when throttling events occur in the service. This operation is outlined in the following article: How to Monitor for Storage Account Throttling.
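The throttling check can be sketched as follows, again with illustrative counter names rather than the exact metrics schema; any interval with a nonzero throttling error percentage is worth investigating.

```python
# Sketch: flagging metric intervals in which throttling occurred.
# "TotalRequests" and "ThrottlingError" are hypothetical field names.
def throttling_error_percent(throttling_errors, total_requests):
    """Throttling errors as a percentage of all requests in the interval."""
    if total_requests == 0:
        return 0.0
    return 100.0 * throttling_errors / total_requests

samples = [
    {"TotalRequests": 5_000, "ThrottlingError": 0},
    {"TotalRequests": 8_000, "ThrottlingError": 120},
]
flagged = [s for s in samples
           if throttling_error_percent(s["ThrottlingError"], s["TotalRequests"]) > 0]
```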
Optional: You should continuously monitor your Azure applications to ensure they are healthy and performing as expected. |
You can configure alerting for different services, including Azure Storage. When you configure alerting, you have the option of having those alerts emailed to the Co-administrators for the subscription.
An alert rule enables you to monitor an Azure service based on a metric value that is set by your organization. When the metric value reaches the threshold assigned to a rule, the alert rule becomes active and registers an alert. This alert is then logged in the system.
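The threshold evaluation an alert rule performs can be sketched as below. This is an assumption-laden illustration of the concept, not Azure's implementation; the metric name and comparison direction are hypothetical examples.

```python
# Sketch: a minimal alert-rule threshold check.
from dataclasses import dataclass

@dataclass
class AlertRule:
    metric: str               # e.g. "ThrottlingErrorPercentage" (illustrative)
    threshold: float
    greater_than: bool = True  # direction of the comparison

def rule_fires(rule, observed):
    """True when the observed metric value crosses the rule's threshold."""
    if rule.greater_than:
        return observed > rule.threshold
    return observed < rule.threshold

rule = AlertRule(metric="ThrottlingErrorPercentage", threshold=1.0)
```

When `rule_fires` returns True, the service would register the alert and notify the configured Co-administrators.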
When there are problems with your storage, there are a number of ways that you may become aware of these issues. These include:
Typically, issues related to Azure Storage services fall into one of the following four broad categories:
There are some common issues that you may encounter from your Azure Storage services that you will need to troubleshoot. These issues include:
To troubleshoot applications that use Azure Storage, you can use a combination of tools to determine when an issue has occurred and what the cause of the problem may be. These tools include:
The Storage Metrics feature is available in the Azure portal to help you monitor your storage performance. Storage Metrics can be thought of as an equivalent to Windows Performance Monitor counters in the Microsoft Azure service.
A comprehensive set of metrics (counters) enables you to view service data, such as the percentage of successful or failed service requests and the service's availability. The following image shows the monitoring page in the Azure portal, where you can view metrics such as total requests, success percentage (Blob), success percentage (Table), and availability.
Microsoft System Center Operations Manager allows for monitoring Azure Storage by utilizing Management Packs. The management pack for Microsoft Azure enables monitoring the availability and performance of Azure Fabric resources that are running on Microsoft Azure.
The management pack runs on a specified server pool and then uses various Microsoft Azure APIs to remotely discover and collect instrumentation information about a specified Microsoft Azure resource, such as a cloud service, storage, or virtual machines.
Feature References | |
Monitor a Storage Account in the Azure Management Portal |
http://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/ |
Monitor, diagnose, and troubleshoot Microsoft Azure Storage |
|
End-to-End Troubleshooting using Azure Storage Metrics and Logging, AzCopy, and Message Analyzer |
http://azure.microsoft.com/en-us/documentation/articles/storage-e2e-troubleshooting |
Storage Analytics |
http://azure.microsoft.com/en-us/documentation/articles/storage-analytics |
How to: Receive Alert Notifications and Manage Alert Rules in Azure |
https://msdn.microsoft.com/en-us/library/azure/dn306638.aspx |
Understanding Monitoring Alerts and Notifications in Azure |
https://msdn.microsoft.com/en-us/library/azure/dn306639.aspx |
Azure Storage Analytics Metrics Management Pack |
|
System Center Management Pack for Microsoft Azure |
http://www.microsoft.com/en-us/download/details.aspx?id=38414 |
How to Monitor for Storage Account Throttling |
http://blogs.msdn.com/b/mast/archive/2014/08/02/how-to-monitor-for-storage-account-throttling.aspx |
Mandatory:
| |
Recommended:
| |
Optional:
| |
Design Guidance |
When you design storage monitoring, consider the following:
Capability Considerations |
Capability Decision Points |
Capability Models in Use |
Storage monitoring and analytics are not enabled by default. Enabling monitoring grants access to:
|
Because storage monitoring and analytics are not enabled by default, the decision is whether to enable them, based on why and how they will be used. It is possible to log not only storage metrics and performance information, but also authentication requests, anonymous requests, transaction metrics, and capacity metrics. |
In most models where organizations are starting their initial use of Azure, it is wise to enable storage monitoring and analytics to observe the data that is available during the collection process. In general, we recommend that you enable storage analytics at least:
Note that analytics and monitoring can be enabled or disabled at any time. |
All metrics data is written by the services of a storage account. As a result, each Write operation performed by Storage Analytics is billable. The amount of storage used by metrics data is also billable. |
Every request made to a storage account is billable or non-billable. Storage Analytics logs each individual request made to a service, including a status message that indicates how the request was handled. Similarly, Storage Analytics stores metrics for a service and the API operations of that service, including the percentages and count of certain status messages. Together, these features can help you analyze your billable requests, make improvements on your application, and diagnose issues with requests to your services. |
When looking at Storage Analytics data, you can use the tables in the Storage Analytics Logged Operations and Status Messages areas to determine what requests are billable. Then you can compare your logs and metrics data with the status messages to see if you were charged for a particular request. You can also use the tables in this area to investigate availability for a storage service or individual API operation. |
A Cloud Integrated Storage solution is one that uses a combination of on-premises storage and cloud storage. The purpose of cloud integrated storage is to take advantage of the lower cost of cloud storage (as compared to traditional on-premises SAN storage), but still connect to and manage it similarly to how you would treat on-premises storage.
Microsoft Azure StorSimple is a cloud integrated storage solution that manages storage tasks between on-premises devices and Azure cloud storage. Azure StorSimple is designed to reduce storage costs, simplify storage management, improve disaster recovery capability and efficiency, and provide data mobility. StorSimple has several components:
Feature References | |
StorSimple MSDN Reference Site |
https://msdn.microsoft.com/en-us/library/azure/dn772442.aspx |
StorSimple 8000 Series Chalktalk |
|
StorSimple Hybrid Cloud Storage Features and Benefits |
http://www.microsoft.com/en-us/server-cloud/products/storsimple/overview.aspx |
Hybrid Cloud Storage |
Mandatory:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Recommended:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Optional: Remote management is turned off by default. You can use the StorSimple Manager service to enable it. As a security best practice, remote access should be enabled only during the time period that it is actually needed.
Configuration best practices are implemented during deployment and are specific to the way StorSimple is deployed. The following configuration recommendations are provided.
Operational best practices are part of everyday (or ongoing) operations. The following operational guidance is provided:
|
Microsoft Azure provides a comprehensive platform of Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) capabilities and services that can support a wide range of customer applications and services. Cloud infrastructures can comprise on-premises customer-, partner-, or public-hosted cloud computing infrastructures that provide a range of capabilities for organizations to consume natively or across these models in a hybrid capacity.
Focusing on the public cloud, the primary difference between IaaS and PaaS constructs is the division of responsibility for common operational functions between the provider and the consumer. In Microsoft Azure, the Microsoft Corporation acts as the cloud provider, and the organization acts as one of many cloud consumers. This relationship is outlined at a high level in the following diagram.
The focus of this section is on the Microsoft IaaS capabilities, which in large part consist of storage, networking, backup and recovery, large scale computing, and traditional virtual machine deployments. The primary building block for IaaS solutions deployed on Microsoft Azure is Virtual Machines and this section explains how Virtual Machines can be used to build solutions for your environment.
At a high level, Azure Virtual Machines provides a traditional virtualized server infrastructure to deploy a given application or service. Typically, it includes a compute instance consisting of virtual CPUs (cores), virtual memory, a persistent operating disk, potential persistent data disks, and internal/external networking to allow the system to interact with other aspects of a customer's environment, application, or solution.
Azure Virtual Machines has several considerations as part of its design, including size, storage, placement, source images, and additional functionality, which can be provided through Microsoft and third-party add-ins. This section explains each of these areas to provide an overview, guidance, and potential design decisions that are required when implementing Azure Virtual Machines.
Feature References | |
Azure Virtual Machines documentation |
http://azure.microsoft.com/en-us/documentation/services/virtual-machines/ |
When deploying applications and solutions using Microsoft Azure Virtual Machines, there are various sizing configurations that are available to organizations. Virtual Machines are available in different sizing series (A, D, DS, and G series as examples). Within each sizing series there are incremental sizes (A0, A1, and so on) and different tiers (Standard and Basic).
The sizing and tiering options provide customers with a consistent set of compute sizing options, which expand as time goes on. From a sizing perspective, each sizing series represents various properties, such as:
As outlined earlier, some virtual machine series include the concept of Basic and Standard tiers. A Basic tier virtual machine is only available on A0-A4 instances, and a Standard tier virtual machine is available on all size instances. Virtual machines that are available in the Basic tier are provided at a reduced cost and carry slightly less functionality than those offered at the Standard tier. This includes the following areas:
Capability Consideration |
Capability Decision Points |
CPU |
Standard tier virtual machines are expected to have slightly better CPU performance than Basic tier virtual machines |
Disk |
Data disk IOPS for Basic tier virtual machines is 300 IOPS, which is slightly lower than Standard tier virtual machines (which have 500 IOPS data disks). |
Features |
Basic tier virtual machines do not support features such as load balancing or auto-scaling. |
The following table summarizes key decision points when using Basic tier virtual machines:
Size |
Available CPU Cores |
Available Memory |
Available Disk Sizes |
Maximum Data Disks |
Maximum IOPS |
Basic_A0 – Basic_A4 |
1 – 8 |
768 MB – 14 GB |
Operating system = 1023 GB Temporary = 20 - 240 GB |
1 - 16 |
300 IOPS per disk |
In comparison, Standard tier virtual machines are available for all compute sizes.
Capability Consideration |
Capability Decision Points |
CPU |
Standard tier virtual machines have better CPU performance than Basic tier virtual machines. |
Disk |
Data disk IOPS for Standard tier virtual machines is 500. (This is higher than Basic tier virtual machines, which have 300 IOPS data disks.) If the DS series is selected, IOPS start at 3200. |
Availability |
Standard tier virtual machines are available on all size instances. |
A-Series features |
|
D-Series features |
|
DS-Series features |
|
G-Series features |
|
The capabilities of each virtual machine series are summarized in the following table:
Size |
Available CPU Cores |
Available Memory |
Available Disk Sizes |
Maximum Data Disks |
Maximum IOPS |
Basic_A0 – Basic_A4 |
1 – 8 |
768 MB – 14 GB |
Operating system = 1023 GB Temporary = 20-240 GB |
1 - 16 |
300 IOPS per disk |
Standard_A0 – Standard_A11 (Includes compute intensive A8-11) |
1 - 16 |
768 MB - 112 GB |
Operating system = 1023 GB Temporary = 20-382 GB |
1 - 16 |
500 IOPS per disk |
Standard_D1-D4 (High memory) |
1 - 16 |
3.5 GB – 112 GB |
Operating system = 1023 GB Temporary (SSD) =50 – 800 GB |
2 - 32 |
500 IOPS per disk |
Standard_DS1-DS4 (Premium storage) |
1 - 16 |
3.5 – 112 GB |
Operating system = 1023 GB Local SSD disk = 7 GB – 112 GB |
2 - 32 |
Cache size = 43 – 576 GB 3,200 – 50,000 total IOPS |
Standard_G1 – G5 (High performance) |
2 - 32 |
28 GB – 448 GB |
Operating system = 1023 GB Local SSD disk = 384 – 6,144 GB |
4 - 64 |
500 IOPS per disk |
These sizes and capabilities reflect the current release of Azure Virtual Machines, and they might expand over time. For a complete list of size tables to help you configure your virtual machines, please see: Sizes for Virtual Machines.
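Size selection from tables like those above can be sketched as a simple lookup. The entries below are an illustrative subset only (values as listed in the tables above, which may change); the function picks the smallest listed size that satisfies both a core and a memory requirement.

```python
# Sketch: choosing the smallest VM size that meets minimum requirements.
# Illustrative subset of sizes: (name, cores, memory in GB).
SIZES = [
    ("Basic_A1", 1, 1.75),
    ("Standard_D1", 1, 3.5),
    ("Standard_D4", 8, 28.0),
    ("Standard_G5", 32, 448.0),
]

def smallest_size(min_cores, min_memory_gb):
    """Name of the smallest listed size meeting both requirements, else None."""
    fits = [(cores, mem, name) for name, cores, mem in SIZES
            if cores >= min_cores and mem >= min_memory_gb]
    return min(fits)[2] if fits else None
```

For example, a workload needing 4 cores and 16 GB of memory would land on Standard_D4 in this subset; a requirement no listed size satisfies returns None.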
Design Guidance |
When you design solutions for using virtual machines, consider the following:
Capability Considerations |
Capability Decision Points |
Deployment order |
If you intend to deploy an application that may require compute intensive resources, we recommend that you provision a virtual machine to a cloud service with the largest virtual machine (such as Standard_G5) and scale it down to a more appropriate size. This is because the virtual machine will be placed on a cluster that has the faster processors; it also makes later scaling easier and combining resources more efficient. |
Supportability |
The following are not supported in a virtual machine on Microsoft Azure:
|
With respect to IaaS solutions, images and disks that are used by Azure virtual machines are stored within virtual hard disks (VHDs). Azure virtual machines are compute instances that have VHDs attached. The VHDs provide persistent and temporary storage to the underlying operating system within the virtual machine.
Like other components of Azure, virtual machines require a storage account to store virtual machine data, which is in the form of VHDs. The VHD specification has several formats, including fixed, dynamic, and differencing. However, Azure supports only the fixed VHD format. VHDs are stored as page blobs in the target storage account, and they can be accessed through automation, the Azure API, or by the virtual machines themselves.
Design Guidance |
The following general considerations are provided when planning storage accounts for virtual machines:
Capability Considerations |
Capability Decision Points |
Storage considerations for virtual machines |
|
Scalability and storage throttling |
Microsoft provides general guidance about the number of virtual hard disks that should reside in a storage account. This guidance assumes that at any point all virtual machines could consume the available IOPS of all assigned disks, which could result in storage account throttling and have an adverse impact on the workloads running within the virtual machines:
For more information, please see Sizes for Cloud Services, Sizes for Virtual Machines. |
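The arithmetic behind that guidance can be made explicit. Using the per-account limit of 20,000 IOPS cited earlier and the per-disk figures from the tier tables (500 IOPS for Standard, 300 IOPS for Basic), this sketch computes how many fully driven disks one storage account can sustain before throttling begins:

```python
# Sketch: disks-per-storage-account planning from the IOPS limits above.
ACCOUNT_IOPS_LIMIT = 20_000                    # current per-account limit
DISK_IOPS = {"basic": 300, "standard": 500}    # per data disk, by tier

def max_fully_driven_disks(tier):
    """Disks one account sustains at full IOPS before throttling begins."""
    return ACCOUNT_IOPS_LIMIT // DISK_IOPS[tier]

# Standard tier: 40 disks; Basic tier: 66 disks.
```

Real workloads rarely drive every disk at its maximum simultaneously, so these figures are a conservative planning ceiling rather than a hard rule.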
The placement of virtual machines within a given Azure subscription is critical in multiple ways. It is important to consider the consumers of the services that are associated with each virtual machine. It is equally important to understand the relationships each virtual machine has between the Azure resources these compute instances consume. Azure provides several constructs that support effectively placing virtual machines and associated resources within the Azure infrastructure.
Affinity groups tell the Fabric Controller to logically group dependent items together, such as the compute and storage of a given virtual machine. When the Fabric Controller is searching for the best suited container, it chooses where it can deploy these two elements in the same cluster, thereby reducing latency and increasing performance.
Affinity groups provide the following:
A Resource Group is a unit of management for operations such as deployments, updates, and standard lifecycle operations across a number of different Azure services, including virtual machines. A resource group provides:
Resource Manager enables the creation of reusable deployment templates that declaratively describe the resources that make up your application (for example, a website and a SQL Database). In essence, it provides an environment to handle the infrastructure and configuration information as code.
Resource Manager provides the following:
A consistent management layer - Get the same experience of deployment and management whether you work from the portal, command line, or tools in Azure.
Feature References | |
Importance of Azure Affinity Groups |
|
About Regional Virtual network and Affinity Groups |
|
Role-based access control in the Microsoft Azure portal |
http://azure.microsoft.com/en-us/documentation/articles/role-based-access-control-configure/ |
Using Azure PowerShell with Azure Resource Manager |
http://azure.microsoft.com/en-us/documentation/articles/powershell-azure-resource-manager/ |
Azure Resource Groups |
http://azure.microsoft.com/en-us/documentation/articles/azure-preview-portal-using-resource-groups/ |
Importance of Azure Affinity Groups |
It is important to tell the Azure Fabric Controller which resources must be aligned and placed near one another for performance, management, and so on. However, it is critical that Azure be informed of systems that must be placed across mutually exclusive boundaries to ensure that the availability of a given service that spans multiple virtual machines is maintained and not interrupted by planned maintenance activities within Azure.
Azure natively understands the tiers in a PaaS application; and thus, it can properly distribute them across fault and update domains. In contrast, the tiers in an IaaS application must be manually defined using Availability Sets. To meet a given SLA, availability sets are required when building IaaS solutions using Azure virtual machines.
Placing virtual machines in an availability set tells the Fabric Controller in Azure to place the virtual machines in separate fault domains. This ultimately provides redundancy for the services provided by the virtual machines when both systems are responsible for the same tier of service in an application. This is illustrated in the following diagram:
As illustrated, availability sets ensure that all instances of each tier have hardware redundancy by distributing them across fault domains, and are not taken down during an update.
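The Fabric Controller performs this distribution automatically for virtual machines placed in an availability set; the sketch below only illustrates the round-robin pattern, using hypothetical VM names and the assumption of two fault domains per availability set.

```python
# Sketch: how an availability set's VMs spread across fault domains.
# This mimics the fabric's behavior for illustration; Azure does this for you.
def assign_fault_domains(vms, fault_domains=2):
    """Round-robin placement of one availability set's VMs across fault domains."""
    return {vm: i % fault_domains for i, vm in enumerate(vms)}

placement = assign_fault_domains(["web-01", "web-02", "web-03", "web-04"])
```

With four web-tier VMs and two fault domains, each domain holds two VMs, so a hardware failure or planned maintenance in one domain leaves half the tier serving traffic.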
If the virtual machines should have traffic distributed across them, you must group the virtual machines in a cloud service, and load balance them across a specific TCP or UDP endpoint. For more information, see Load Balancing Virtual Machines.
If the virtual machines receive input from another source (such as a queuing mechanism), a load balancer is not required. The load balancer uses a basic health check to determine if traffic should be sent to the node. It is also possible to create your own probes to implement application-specific health metrics that determine if the virtual machine should receive traffic. Load balancing mechanisms and constructs within Microsoft Azure are covered in the Networking section of this document.
The Azure Gallery contains a library of images that are provided by Microsoft and Microsoft partners, and they can be used to create IaaS virtual machines. Custom images that you upload to your Azure subscription are also available in the Azure Gallery. This section outlines each of the image types and how they can be utilized.
Image families are virtual hard disks (VHDs) that are managed and supported by Azure. In some instances, these VHDs may include preinstalled software and configuration. A few examples are a SQL Database, SharePoint, and BizTalk.
The goal of the image families is to make it easier for you to deploy an application into the Azure environment. These image families are updated once a month, with versions retained going back as far as two months.
Partner images are VHDs that were uploaded by partners so that their applications can be consumed by Azure customers. Virtual machines that are deployed by using partner images are not deployed on the same clusters as other virtual machine workloads. Example partners that provide images include Oracle and Puppet Labs.
Azure actively maintains the images that are part of the Azure Gallery. In some instances, you may want to get the latest image. By default, the latest image is chosen when you deploy a virtual machine by using the portal. However, if you would like to use Azure PowerShell, please use the following code snippet:
$image_name = (Get-AzureVMImage |
    Where-Object { $_.ImageFamily -eq $ImageFamily } |
    Sort-Object PublishedDate -Descending |
    Select-Object -First 1).ImageName
For various reasons, many customers would like to upload their own images as opposed to using the images provided by Azure. Reasons range from internal security standards to cost, especially in the scenario of licensing.
A good example is SQL Server. You may want to use your own SQL Server license in the cloud as opposed to paying the additional cost of the SQL Server license included in the image that is provided by Azure. In this scenario, you may want to upload your image and have it available as a gallery item for tenants in your company to consume. To upload a customized image, see Create and upload a Windows Server VHD to Azure.
Feature References | |
Virtual Machine Images in Azure |
https://msdn.microsoft.com/en-us/library/azure/dn790290.aspx |
Preparing SQL Image |
Azure Virtual Machines extensions are built by Microsoft and trusted third-party providers to enable security, runtime, debugging, management, and other features that you can take advantage of to increase your productivity. This section describes the various features that Azure Virtual Machines extensions provide for Windows and Linux virtual machines, and it points to documentation for each operating system.
Virtual Machines extensions implement most of the critical functionality that you want to use with your virtual machines, including basic functionality such as resetting passwords and configuring Remote Desktop Protocol (RDP). Because new extensions are added all the time, the number of possible features that your virtual machines support in Azure continues to increase.
By default, several basic Virtual Machines extensions are installed when you create your virtual machine from the image gallery, including IaaSDiagnostics and BGInfo (currently available for Windows virtual machines only), and VMAccess. However, not all extensions are implemented on both Windows and Linux operating systems at any specific time. This is due to the constant flow of feature updates and new extensions.
Virtual Machines extensions provide dynamic features that Microsoft and third-parties provide. The agent and extensions are added primarily through the Management portal, but you can also use the following options to add and configure extensions when you create a virtual machine or for existing virtual machines:
Extensions include support for Remote Debugging in Visual Studio, System Center 2012, Microsoft Azure Diagnostics, and Docker (to name a few).
Recommended: Evaluate each Virtual Machines extension and define which of them will be used as a standard for all of your Azure Virtual Machines. For example, you may standardize one antivirus extension to use in addition to the Azure PowerShell DSC extension. |
The Azure Virtual Machine Agent (VM Agent) is used to install, configure, manage, and run Azure Virtual Machines extensions. The following extensions are critical for enabling, re-enabling, or disabling basic connectivity with your virtual machines after they are created and running.
VM Extension Name |
Feature Description |
More Information |
VMAccessAgent (Windows) VMAccessForLinux (Linux) |
Create, update, and reset user information and RDP and SSH connection configurations. |
The Azure CustomScript extension automatically runs a specified script or a set of scripts on a virtual machine after it is created and running.
Name |
Custom Script Extension |
Description |
Execution of Windows PowerShell on the target resource |
Applicability |
Provisioning in Azure:
|
Pros |
|
Cons |
|
Supported operating systems |
Windows, Linux |
The extensions detailed in the following table support different kinds of deployment and configuration management scenarios and features.
Name |
Chef |
Description |
With Chef, you can automate how you build, deploy, and manage your infrastructure. Your infrastructure becomes as versionable, testable, and repeatable as application code. For more information, see Get Chef. |
Applicability |
Provisioning in Azure: Infrastructure as a Service (IaaS) |
Pros |
|
Cons |
|
Supported operating systems |
Windows, Linux |
Name |
Puppet Enterprise |
Description |
With Puppet Enterprise, you can easily configure and manage your Windows environments. Whether you are managing a large datacenter, are taking advantage of Microsoft Azure, or a combination of both, Puppet Enterprise lets you manage your Microsoft Windows machines faster than ever. For more information, see Puppet Labs. |
Applicability |
Provisioning in Azure: Infrastructure as a Service (IaaS) |
Pros |
|
Cons |
|
Supported operating systems |
Windows, Linux |
Name |
Windows PowerShell DSC |
Description |
Desired State Configuration (DSC) is a management platform in Windows PowerShell that enables deploying and managing configuration data for software services and managing the environment in which these services run. For more information, see Windows PowerShell Desired State Configuration Overview. |
Applicability |
Provisioning in Azure: Infrastructure as a Service (IaaS) |
Pros |
|
Cons |
|
Supported operating systems |
Windows, Linux For more information, see Installing and configuring DSC for Linux |
Name |
System Center 2012 R2 Virtual Machine Role Authoring Guide |
Description |
Implements features for support by System Center. For more information, see System Center 2012 R2 Virtual Machine Role Authoring Guide - Resource Extension Package |
Applicability |
Provisioning in Azure: Infrastructure as a Service (IaaS) |
Pros |
|
Cons |
|
Supported operating systems |
Windows |
The extensions in this section provide critical security features for your Azure Virtual Machines.
Virtual Machines Extension Name |
Feature Description |
More Information |
CloudLinkSecureVMWindowsAgent |
Provides Azure customers with the capability to encrypt their virtual machine data on a multitenant, shared infrastructure, and fully control the encryption keys for their encrypted data in Azure Storage |
|
McAfeeEndpointSecurity |
Protects your virtual machine against malicious software |
|
TrendMicroDSA |
Enables Trend Micro Deep Security platform support to provide intrusion detection and prevention, firewall, antimalware, web reputation, log inspection, and integrity monitoring |
How to install and configure Trend Micro Deep Security as a Service on an Azure VM |
PortalProtectExtension |
Guards against threats to your Microsoft SharePoint environment |
|
IaaSAntimalware |
Microsoft Antimalware for Azure Cloud Services and Virtual Machines is a real-time protection capability that helps identify and remove viruses, spyware, and other malicious software, with configurable alerts when known malicious or unwanted software attempts to install itself or run on your system. |
|
SymantecEndpointProtection |
Symantec Endpoint Protection 12.1.4 enables security and performance across physical and virtual systems |
How to install and configure Symantec Endpoint Protection on an Azure VM |
Virtual Machines Extension Name |
Feature Description |
More Information |
IaaSDiagnostics |
Enables, disables, and configures Azure Diagnostics, and is also used by the AzureCATExtensionHandler to support SAP monitoring |
Microsoft Azure Virtual Machine Monitoring with Azure Diagnostics Extension |
OSPatchingForLinux |
You can use the OSPatching extension to configure operating system updates for your virtual machines, including:
|
Operating System Patching Extension Blog Post See also the Readme and source on Github at Operating System Patching Extension. |
An Azure cloud service is a compute capability within Microsoft Azure that is available to IaaS and specific PaaS workloads. From an IaaS perspective, Azure cloud services leverage virtual machines to provide a unit of access through public endpoints, load balancing, and scalability through auto-scale capabilities. This relationship is illustrated in the following conceptual diagram:
The following diagram shows a visual comparison between leveraging virtual machines and native PaaS capabilities within Azure Cloud Services:
Load balancing cloud services can be managed between and within each deployed cloud service. To load balance network traffic between deployed cloud services, Azure Traffic Manager can provide redundant and performant paths to the publicly routable virtual IP that is used by the systems within the cloud service.
Azure Traffic Manager provides control over the distribution of network traffic to public Internet endpoints. Traffic Manager works by applying an intelligent policy engine to Domain Name System (DNS) queries for the domain names of your Internet resources. Azure Traffic Manager uses three load-balancing methods to distribute traffic: performance, round robin, and failover.
For more information, see Traffic Manager routing methods.
The following image shows an example of the round robin load-balancing method for distributing traffic between different cloud services.
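The round robin behavior shown in the image can be sketched in a few lines. This is a conceptual simulation of the policy only; real Traffic Manager applies the rotation when answering DNS queries, and the endpoint names below are illustrative.

```python
import itertools

# Conceptual sketch of the round robin method: configured endpoints are handed
# out in strict rotation. Traffic Manager applies this policy to DNS answers,
# not to individual TCP connections. Endpoint names are illustrative.

def round_robin(endpoints):
    """Yield configured endpoints in rotation, as the round robin policy would."""
    return itertools.cycle(endpoints)
```

Each successive query receives the next endpoint in the list, wrapping around when the list is exhausted.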
To load balance network traffic across systems deployed within cloud services, the Azure Load Balancer can be used. Virtual machines in the same cloud service or virtual network can communicate with each other directly by using their private IP addresses. Computers and services outside the cloud service or virtual network can only communicate with virtual machines in a cloud service or virtual network with a configured endpoint.
An endpoint is a mapping of a public IP address and port to the private IP address and port of a virtual machine or web role within an Azure cloud service. The Azure Load Balancer randomly distributes a specific type of incoming traffic across multiple virtual machines or services in a configuration known as a load-balanced set.
The following image shows a load-balanced endpoint for standard (unencrypted) web traffic that is shared among three virtual machines for the public and private TCP port of 80. These three virtual machines are configured in a load-balanced set.
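The arrangement in the image can be modeled as one public endpoint mapped to three backends in a load-balanced set. In the sketch below, the private addresses are illustrative, and `random.choice` merely stands in for the load balancer's own per-flow distribution algorithm.

```python
import random

# Sketch: a load-balanced endpoint for TCP port 80 shared by three virtual
# machines. Addresses are illustrative; random.choice stands in for the Azure
# Load Balancer's own per-flow distribution.

ENDPOINT = {"vip_port": 80, "backends": ["10.0.0.4:80", "10.0.0.5:80", "10.0.0.6:80"]}

def pick_backend(endpoint=ENDPOINT, rng=random):
    """Choose the backend that will receive a new incoming flow."""
    return rng.choice(endpoint["backends"])
```

Over many flows, traffic spreads across all members of the load-balanced set.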
By default, a cloud service has a single public-facing virtual IP (VIP) address that is assigned from the Azure IPv4 public address space. Each endpoint uses the VIP for the address component and a unique port. It is possible to add additional public-facing VIPs to a cloud service load balancer to support endpoints that have different IP addresses but the same port.
Azure can also load balance within a cloud service or virtual network by using the internal load balancer. The internal load balancer can be used in the following ways:
Internal load balancing is also facilitated by configuring an internal load-balanced set.
The following figure shows an example of an internal load-balanced endpoint for an LOB application that is shared among three virtual machines in a cross-premises virtual network.
Feature References | |
Cloud Services |
https://azure.microsoft.com/en-us/documentation/services/cloud-services/ |
Multiple VIPs per Cloud Service |
https://azure.microsoft.com/en-us/documentation/articles/load-balancer-multivip/ |
Azure Load Balancer |
https://azure.microsoft.com/en-us/documentation/articles/load-balancer-internet-overview/ |
Internal Load Balancer |
https://azure.microsoft.com/en-us/documentation/articles/load-balancer-internal-overview/ |
Azure RemoteApp is a service that runs on Microsoft's Azure fabric. It provides an environment for Windows applications to be remotely accessed over the Internet. This environment is scalable to accommodate the end-user demand.
Azure RemoteApp technology expands on the native Windows on-premises service to provide a secure remote connection to applications hosted in Azure. Azure RemoteApp enables remote LOB applications to appear like they are running on the end user's local computer.
RemoteApp uses Microsoft Remote Desktop Protocol (RDP) and RemoteFX. RDP is a WAN-optimized protocol designed to tolerate network latency and loss. RemoteFX provides a 3D virtual adapter for rendering images. Application delivery provides a highly reliable, fast, and consistent user experience to support content ranging from text to streaming multimedia via the Azure global network of datacenters.
Azure RemoteApp can run from the following supported end-user devices:
End-users can use the client-side software from their preferred devices to access the Azure RemoteApp programs. Azure RemoteApp provides users with 50 GB of persistent storage. This storage is protected by the fault tolerant nature of Azure Storage accounts.
To test Azure RemoteApp, see: Azure RemoteApp
On the integrated Azure RemoteApp menu, select an application (for example, Excel). The Connecting to dialog will start and you may be prompted for credentials depending on the deployment type.
After the authentication process is complete, the RemoteApp will launch, and the user will have remote access to the application.
RemoteApp is available in two deployment types, which are referred to as collections.
The key difference between the hybrid and cloud collections is how the installation of software updates (patching) is handled. A cloud collection uses preinstalled images (from Office 365 or Office 2013), and the patching process is handled by Microsoft.
For both types of collections created from a custom template image, the subscription owner is responsible for managing the image and the applications. Domain-joined images can be managed by Windows Update, Group Policy, Desired State Configuration, or System Center Configuration Manager. After updates to the custom template image are applied, it is uploaded to Azure and the collections (hybrid or cloud) are updated to consume the new image.
Feature References | |
Introducing Microsoft Azure RemoteApp |
|
How to create a custom template image for RemoteApp |
http://azure.microsoft.com/en-us/documentation/articles/remoteapp-create-cloud-deployment/ |
How to create a hybrid collection of RemoteApp |
http://azure.microsoft.com/en-us/documentation/articles/remoteapp-create-hybrid-deployment/ |
How does licensing work in RemoteApp? |
http://azure.microsoft.com/en-us/documentation/articles/remoteapp-licensing/ |
Best practices for using Azure RemoteApp |
http://azure.microsoft.com/en-us/documentation/articles/remoteapp-bestpractices/ |
Azure RemoteApp FAQ |
http://azure.microsoft.com/en-us/documentation/articles/remoteapp-faq/ |
There are several considerations when deploying IaaS solutions within Microsoft Azure. Deployment considerations include cost, load balancing, resiliency, security, networking, and disaster recovery. Although not exhaustive, this section explores many of these considerations at a high level.
Cost is one of the top considerations for most organizations consuming services from Microsoft Azure. Being able to develop a predictable consumption model is key for the success of any solution deployed in Azure. The following table itemizes cost factors that you should consider:
Considerations |
Decision Points |
The size and number of virtual machines |
Windows Server licensing costs may be included. Compute hours don't include any Azure Storage costs that are associated with the Windows Server image running in virtual machines. These costs are billed separately. |
Azure Storage requirements |
Charges apply for Azure Storage costs that are required for virtual machines. |
Azure Virtual Network |
Charges apply for the creation of a virtual private network (VPN) connection between a virtual network and your VPN gateway. The charge is for each hour that the VPN connection is provisioned and available (referred to as a VPN connection hour), and the connection is expected to be provisioned 24 hours a day, seven days a week. All data transferred over the VPN connection is charged separately at the Azure standard data transfer rates. |
Network traffic |
Outbound data is charged based on the total amount of data moving out of the Azure datacenters through the Internet in a given billing cycle. This applies to any traffic, including traffic that traverses the VPN tunnel. In this document, outbound directory synchronization traffic is expected to represent the most significant portion of the network traffic, depending on the amount of directory changes. |
Support |
Azure offers flexible support options for organizations of all sizes. Enterprises that deploy business-critical applications in Azure should consider additional support options. |
Customers who deploy applications that require more than one server in Azure Virtual Machines must consider how to load balance their virtual machines. Using an on-premises load balancer with Azure Virtual Machines is not supported today, and Azure Virtual Machines currently supports only a round robin load-balancing configuration.
There are two levels of load balancing available for Azure infrastructure services:
DNS level: Load balancing of traffic to cloud services located in different datacenters or to external endpoints. This is done with Azure Traffic Manager.
Network level: Load balancing of incoming Internet traffic to different virtual machines of a cloud service, or load balancing of traffic between virtual machines in a cloud service or virtual network. This is done with the Azure Load Balancer.
Feature References | |
Load Balancing for Azure Infrastructure Services |
http://www.windowsazure.com/en-us/manage/windows/common-tasks/how-to-load-balance-virtual-machines/ |
About Traffic Manager Load Balancing Methods |
http://azure.microsoft.com/documentation/articles/traffic-manager-load-balancing-methods |
Internal load balancing |
http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview |
A key consideration for workloads deployed in Azure virtual machines is encryption for data at rest. For virtual machines, most customers want platform-level encryption that they are able to control.
Currently, Microsoft BitLocker Drive Encryption is not supported because there is no way for Azure to handle the key management portion during virtual machine startup. Given that Azure consists of multiple physical servers, there is not a simple way to manage BitLocker encryption keys.
Third parties, such as CloudLink, have the capability to manage disk encryption keys on Windows and Linux platforms. You can use CloudLink to support encrypting virtual hard disks that are attached to virtual machines and that use published virtual machine extensions. Additional details about CloudLink are provided in the following table.
Feature References | |
Azure Virtual Machine Disk Encryption using CloudLink |
http://azure.microsoft.com/blog/2014/08/19/azure-virtual-machine-disk-encryption-using-cloudlink/ |
Encrypting Azure Virtual Machines with CloudLink SecureVM |
The following table itemizes what to consider when you are deciding how to provision virtual machines on a virtual network:
Considerations |
Decision Points |
Name resolution |
When you deploy virtual machines and cloud services to a virtual network you can use Azure-provided name resolution or your own DNS solution, depending on your name resolution requirements. |
Enhanced security and isolation |
Because each virtual network is run as an overlay, only virtual machines and services that are part of the same network can access each other. Services outside the virtual network have no way to identify or connect to services hosted within virtual networks. This provides an added layer of isolation to your services. |
Extended connectivity boundary |
The virtual network extends the connectivity boundary from a single service to the virtual network boundary. You can create several cloud services and virtual machines within a single virtual network and have them communicate with each other without having to go through the Internet. You can also set up services that use a common back-end database tier or use a shared management service. |
Extend your on-premises network to the cloud |
You can join virtual machines in Azure to your domain running on-premises. You can access and leverage all on-premises investments for monitoring and identity for your services hosted in Azure. |
Use persistent private IP addresses |
Virtual machines within a virtual network will have a stable private IP address. We assign an IP address from the address range you specify and offer an infinite DHCP lease on it. You can also choose to configure your virtual machine with a specific private IP address from the address range when you create it. This ensures that your virtual machine retains its private IP address even when it is stopped or deallocated. For more information, see Configure a static internal IP address for a VM. |
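The persistent private IP behavior in the last row can be illustrated with a small allocator. This sketch reserves a requested address from the virtual network's range and rejects out-of-range or duplicate requests; the subnet and the reservation table are assumptions for illustration, not platform internals.

```python
import ipaddress

# Sketch: reserving a specific private IP from a virtual network's address
# range, mirroring how a static internal IP stays with a VM even when it is
# stopped or deallocated. The subnet and reservation table are illustrative.

def reserve_ip(subnet, requested, reserved):
    """Reserve `requested` from `subnet`, rejecting out-of-range or taken addresses."""
    if ipaddress.ip_address(requested) not in ipaddress.ip_network(subnet):
        raise ValueError(f"{requested} is outside {subnet}")
    if requested in reserved:
        raise ValueError(f"{requested} is already reserved")
    reserved.add(requested)
    return requested
```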
There are two models for network configurations for Azure virtual machines: cloud-only and cross-premises:
Feature References | |
About Virtual Network Secure Cross-Premises Connectivity |
https://msdn.microsoft.com/en-us/library/azure/dn133798.aspx |
Although the capabilities of Azure Virtual Machines are quite comprehensive, some native limitations exist, and they should be understood by organizations prior to deploying solutions in Azure. The following table explores these limitations.
Limitation |
Impact |
Workaround |
Auto-scaling |
The application environment does not automatically increase or decrease role instances in response to increases or decreases in load.
|
Utilize monitoring and automation capabilities such as the Azure Monitoring Agent and Azure Automation to dynamically scale and deploy application code to virtual machine instances in the environment. |
Load balancing |
Virtual machines are not load balanced by default.
|
After the virtual machine is provisioned, create an Internal Load Balancer and associate it with the virtual machine. |
Multiple network adapters |
For more information, see: Multiple virtual machine network adapters and network virtual appliances in Azure |
|
Density |
The current limit is 2,048 virtual machines per virtual network. |
Create a new virtual network and extend the network by connecting virtual networks together. |
Concurrent TCP connections |
The limit on concurrent TCP connections for a virtual machine or role instance is 500,000. |
|
Static IP address or multiple IP address |
|
Azure Diagnostics provides Azure extensions that enable you to collect diagnostic telemetry data from a worker role, web role, or virtual machine running in Azure. The telemetry data is stored in an Azure Storage account. It can be used for debugging and troubleshooting, measuring performance, monitoring resource usage, traffic analysis and capacity planning, and auditing.
The following table explains the types of telemetry Azure Diagnostics can collect.
Data Source |
Description |
IIS logs |
Information about IIS websites |
Azure Diagnostic infrastructure logs |
Information about diagnostics |
IIS failed request logs |
Information about failed requests to an IIS site or application |
Windows Event logs |
Information sent to the Windows event logging system |
Performance counters |
Operating system and custom performance counters |
Crash dumps |
Information about the state of the process in the event of an application crash |
Custom error logs |
Logs created by your application or service |
.NET EventSource |
Events generated by your code using the .NET EventSource class |
Manifest-based ETW |
Event Tracing for Windows (ETW) events generated by any process |
Operational Insights is an analysis service that enables IT administrators to gain deep insight across on-premises and cloud environments. It enables you to interact with real-time and historical machine data to rapidly develop custom insights, and provides Microsoft and community-developed patterns for analyzing data.
For more information about these topics, please refer to the Cloud Platform Integration Framework section later in this document.
Azure Platform-as-a-Service (PaaS) workloads share some common elements with IaaS, but they also have some key differences that should be considered when they are deployed. This service has been a part of the Azure offering since its inception, and in many ways is a desirable service to realize the true value of cloud computing.
A primary goal of PaaS is to remove the need to manage the underlying virtual machines. This allows customers to focus on the real value of the application, which is the functionality that it provides, not the underlying operating system or virtual machine.
PaaS provides great value in that management duties are significantly reduced for most organizations. The ability for Microsoft to maintain the operating system and virtual machines, keeping them patched with the latest security updates, is a key differentiator from many cloud solutions in place today.
Another key benefit for targeting PaaS for applications and services is the dynamic scaling features that it affords. A side benefit of not managing the underlying virtual machines is the ability to scale the workloads to upper limits without any preplanning. New instances can be created and destroyed by the Azure platform and controlled by the customer. The real value of auto-scaling is in full effect with PaaS.
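The scaling behavior described above boils down to a periodic decision based on a metric. The following sketch shows one such decision rule; the CPU thresholds and instance bounds are illustrative assumptions, not platform defaults.

```python
# Sketch of a metric-driven scale decision of the kind the platform can apply
# on each evaluation cycle. Thresholds and instance bounds are illustrative.

def desired_instances(current, avg_cpu, minimum=2, maximum=10,
                      scale_out_at=75, scale_in_at=25):
    """Return the instance count for the next cycle, clamped to the bounds."""
    if avg_cpu > scale_out_at:
        return min(current + 1, maximum)
    if avg_cpu < scale_in_at:
        return max(current - 1, minimum)
    return current
```

The customer sets the bounds and thresholds; the platform creates and destroys the instances.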
The integration of application deployment and release management into the service offering makes PaaS very desirable for customers looking to automate and orchestrate deployment of their application. Every application that gets deployed to Azure is a self-contained, packaged asset. This package is simply deployed to a virtual machine that is provisioned by the platform based on a configuration that the customer provides.
This makes automated and continuous integration of application code a real option. Combined with deployment slots that allow VIP swapping, it makes deployments to the cloud more predictable and safer. In addition, rolling back to a snapshot is possible with these options.
Feature References | |
Cloud Services Explained |
|
Websites explained |
http://azure.microsoft.com/en-us/documentation/articles/fundamentals-application-models/#websites |
Cloud Service details / architecture |
https://msdn.microsoft.com/en-us/library/azure/jj155995.aspx |
Large Scale Services in Azure |
https://msdn.microsoft.com/en-us/library/azure/jj717232.aspx |
Development Considerations |
https://msdn.microsoft.com/en-us/library/azure/jj156146.aspx |
Platform updates in PaaS |
https://msdn.microsoft.com/en-us/library/azure/hh472157.aspx |
Deploying Azure Cloud Service with Release Management |
Mandatory: Azure solutions must contain at least two instances if running web or worker roles. For apps (such as Web Apps), this is not a requirement because the design has inherent fault tolerance built in. | |
Recommended: Azure solutions should contain multiple upgrade domains to avoid outages caused by updates to the guest and host by the platform. This is a unique item that exists for PaaS services. | |
Optional: Azure solutions can optionally contain auto-scaling configurations to increase and decrease instance counts for the service, based on a schedule or metric. |
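The upgrade domain recommendation above works because the platform updates one domain at a time while the remaining domains keep serving. The sketch below walks that process for an illustrative instance layout.

```python
# Sketch: rolling an update through upgrade domains. At each step one domain is
# taken down for update while the instances in all other domains keep serving,
# which is why a single-domain deployment would incur an outage during updates.
# The domain layout is illustrative.

def rolling_update(upgrade_domains):
    """Yield (instances updating, instances still serving) for each step."""
    for i, updating in enumerate(upgrade_domains):
        serving = [vm for j, domain in enumerate(upgrade_domains)
                   if j != i for vm in domain]
        yield updating, serving
```

With two domains of one instance each, some instance is always serving, which is the availability property the mandatory two-instance rule relies on.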
Design Guidance |
For more information, see these applicable Azure design patterns:
The common design patterns for PaaS workloads can be split into two primary categories:
Typically, web-based workloads for modern frameworks work on PaaS with few changes.
Web applications that use a framework prior to .NET Framework 4.0 usually require some code changes to fit this cloud model. The key point to consider is whether extra components need to be installed as part of the application, for example, custom ISAPI filters, drivers, or security models that require full trust. These can be adapted to PaaS web applications, but they require varying levels of changes.
The other very important point to remember for both web and back-end workloads is that the application needs to be stateless. Applications that require additional components for state management, or that tightly couple the tiers of the application, tend to have problems when using modern cloud scale models.
At a minimum, it's important to understand all the components that make up the application and the architecture of the data, business, and front-end tiers. Additionally, it's key to understand whether the deployment can be file based and whether the application is self-contained from a binary perspective.
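The statelessness requirement can be made concrete with a small sketch: session state lives in an external store, so any instance can serve any request. The store (a dict) and the request shape below are stand-ins for a real cache or storage service, not a prescribed API.

```python
# Sketch: a stateless web tier. Session state is kept in an external store (a
# dict stands in for a cache or table service), so any instance can handle any
# request. Instance names and the request shape are illustrative.

SESSION_STORE = {}  # stand-in for an external cache or storage service

def handle_request(instance_id, session_id, item):
    """Append an item to the session's cart; works from any instance."""
    cart = SESSION_STORE.setdefault(session_id, [])
    cart.append(item)
    return {"served_by": instance_id, "cart": list(cart)}
```

Because no state lives on the instance itself, the load balancer is free to route successive requests for the same session to different instances.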
Scenario |
Model |
Points to Consider |
Web-based workload |
Web-based applications |
|
Back-end workload |
Service-based applications |
|
Azure Cloud Services, in the context of PaaS, provide the units that contain the role instances that comprise a given application. An Azure cloud service binds to the virtual IP (VIP) that services requests and load balances them across the underlying role instances. An Azure cloud service can be considered a unit of deployment that can be versioned and stored. When you deploy a cloud service, it contains a package that defines the service (such as networking, load balancing, or role instance counts) in addition to the actual code for the application.
This model of deployment makes it very easy to control and deploy specific versions of an application. A cloud service can have multiple deployments running simultaneously. This is possible because of a concept of deployment slots that are implemented with cloud services. There are two deployment slots available for each cloud service.
The intention is for the staging slot to be used to stage new or updated versions of the cloud service, which are accessible to the deployment or DevOps teams for testing, while the production slot hosts the production deployment of the application.
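The production/staging arrangement makes the swap itself easy to reason about. The sketch below models a VIP swap between the two slots; the version labels are illustrative, and the real operation is performed by the Azure platform, not by your code.

```python
# Sketch: a VIP swap between the two cloud service deployment slots. The swap
# re-points the production VIP at the staged deployment; rolling back is just
# swapping again. Version labels are illustrative.

def vip_swap(slots):
    """Exchange the deployments behind the production and staging VIPs."""
    slots["production"], slots["staging"] = slots["staging"], slots["production"]
    return slots
```

Because the swap only re-points VIPs, promotion and rollback are symmetric and fast, which is what makes this deployment model predictable.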
Cloud services also contain the binaries and scripts needed to install additional components on the PaaS instance at startup. These are necessary because a deployment that is running in Azure will move within the Azure datacenter: as updates are deployed to the host and guest operating systems, the PaaS instances are moved to other hosts. This means that everything required to make the PaaS instance and application run must be part of the cloud service package.
Feature References | |
Cloud services explained |
https://msdn.microsoft.com/en-us/library/azure/jj155995.aspx |
Startup tasks in cloud services |
https://msdn.microsoft.com/en-us/library/azure/hh180155.aspx |
Tools for packaging and deployment |
https://msdn.microsoft.com/en-us/library/azure/gg433055.aspx |
Manage guest operating system updates |
https://msdn.microsoft.com/en-us/library/azure/ff729422.aspx |
Recommended practices for large scale Web Apps |
Mandatory: Cloud services must contain all the assets, including code and other installations required to run the application. Everything must be included in the cloud service package, including scripts for installation. | |
Recommended: Give consideration to deployment models that will be used when updating the application. There are a few options to understand, and they each have pros and cons. | |
Optional: Cloud services can contain multiple running deployments in the form of production and testing or staging. | |
Design Guidance |
It is best to understand that Azure Cloud Services, in the simplest form, provide a container or package wrapper for applications that are deployed to Azure. This type of application deployment model is not necessarily new. You will find similar models in client applications, such as the .appx format used by modern Windows applications.
The core idea is to build, version, and deploy the service package as a unit. This will make it easier for the DevOps or release management team to deploy updates to the application and to roll back if there are unforeseen side effects from an application update.
It is also important to realize that scaling Azure Cloud Services in the PaaS model is trivial. Because the application and service definition are wrapped in a package, deploying more instances of this is simply a matter of telling the Azure platform how many instances you want.
Application Type |
Description |
Web role |
This role is used primarily to host and support applications that target IIS and ASP.NET. The role is provisioned with IIS installed, and it can be used to host front-end, web-based applications. |
Worker role |
This role is used primarily to host and support service applications. These applications target back-end processing workloads. They can be long running processes and can be thought of as providing services in the cloud. |
It is important to remember that web and worker roles have a dedicated underlying virtual machine per instance. Typically, this is transparent to the consumer, but it's particularly important from a diagnostics perspective. You can enable access to and log on to the underlying virtual machine if needed; however, this option is disabled by default.
Important things to keep in mind when deploying to the PaaS model are:
Feature References | |
Web and worker roles |
https://msdn.microsoft.com/en-us/library/azure/hh180152.aspx |
IIS configuration in PaaS |
https://msdn.microsoft.com/en-us/library/azure/gg433059.aspx |
Configure web role with multiple sites |
https://msdn.microsoft.com/en-us/library/azure/gg433110.aspx |
Mandatory: Web and worker roles require at least two instances to provide fault tolerance for the automatic maintenance nature of PaaS. | |
Recommended: Web and worker roles should be considered if the application requires installing binaries on the web or application servers. | |
Optional: Virtual networking is common to allow the communication needed for databases, management services, and other services, but it is not a hard requirement for deploying an application via PaaS to a web or worker role. |
Design Guidance |
Web roles are specifically tailored for IIS-based applications. This limits their use to Windows applications that target the Microsoft operating system and services. The common design pattern is to configure the scale unit for the instances and ensure that multiple instances (at least two) are used for production workloads. This is done by simply setting the configuration in the service definition.
Worker roles specifically target service applications (non-web based). As such, error handling that would be required for an out-of-band management service application should be employed. If exceptions are not handled in the service inner loop, the role instance will be restarted, which will result in downtime for processing.
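The restart-on-unhandled-exception behavior suggests guarding the inner loop explicitly. The sketch below shows one way to do that; the work-item list and the processing function are illustrative, not a worker role API.

```python
import logging

# Sketch: guarding a worker role's inner loop. An exception that escapes the
# loop would restart the role instance; catching and logging per item keeps the
# instance processing. The work-item list and handler are illustrative.

def run_worker(work_items, process):
    """Process each item, logging failures instead of letting them escape."""
    completed, failed = [], []
    for item in work_items:
        try:
            process(item)
            completed.append(item)
        except Exception:
            logging.exception("work item %r failed; instance keeps running", item)
            failed.append(item)
    return completed, failed
```

A failed item is recorded and skipped, so one bad work item does not take down the instance or stall processing of the remaining items.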
The PaaS offering previously called Azure Websites has been integrated into a model called Azure App Service. App Service is comprised of the following subcomponents:
Web Apps is the new term used to describe what was previously named Azure Websites. The Web Apps feature in Azure App Service is a type of PaaS workload that differs slightly from the traditional web and worker role applications. The model is based on decoupling from the underlying infrastructure, even more than traditional PaaS applications. This greatly reduces the operational burden of maintaining applications, because maintenance is no longer required for the infrastructure and shifts to the application itself.
This model primarily removes the customer from any connection with the underlying virtual machines that are hosting the application. This means components such as Remote Desktop are not an option and that the installation of components and software is not something a customer can directly execute.
There are extensions available via the Azure portal (Azure Marketplace), which are essentially packages of software that have been tested and can be added to a website deployed via Web Apps.
Web Apps are primarily used to provide a platform to host various web applications and web services. Additionally, Web Apps can run back-end processes via a service offering in Azure WebJobs.
WebJobs encapsulate an existing executable or script that provides some processing output. WebJobs can also be scheduled or run on demand. For more information about WebJobs, see Azure WebJobs documentation resources.
Deployment of Web Apps is in some ways different from other PaaS and IaaS deployment models. Supported deployment models include:
There are some fundamental differences in deployment slots in Web Apps as compared with the web and worker role deployments. Web Apps supports up to five deployment slots.
Web Apps is deployed in an App Service plan, previously called a Web Hosting plan. The service plan represents a set of features and capacity that can be contained and shared with multiple Web Apps in an Azure App Service. The following pricing tiers are provided:
For apps to share a hosting plan, they need to be in the same subscription and geographical location. In an Azure App Service, an app can be associated with only a single app hosting plan at one time.
Feature References | |
App Services explained |
https://msdn.microsoft.com/en-us/library/azure/dn948515.aspx |
App Services deep dive |
http://channel9.msdn.com/Series/Windows-Azure-Web-Sites-Tutorials |
App Service migration tools |
Mandatory: These lighter weight PaaS services do not allow direct access to the underlying virtual machines. This means no installation of components on the underlying web server (outside of the application folder). | |
Recommended: Match the service offering with the type of workload. API apps differ from Web Apps because one needs more focus on the back end and the other needs more focus on the front end. | |
Optional: Plan for capacity needs. Although some thought should be given to how many instances or sizes should be used, these can easily be changed later. The focus here is on rapid deployment. |
Design Guidance |
Azure App Service is one of the latest models to be employed on Azure. The idea is to simplify the management and cost of running a variety of services in PaaS. A performance level can be set once at the App Service plan level, and the various apps can then be deployed inside that plan.
For example, a web app could be deployed that is using an API app or a Logic App, and the cost and performance levels are set at the service level. This simplifies the deployments because each app doesn't need to be configured and billed separately.
The app model is growing very fast, and it makes integrating deployed services, APIs, and applications much simpler and faster than previous PaaS models, such as web roles.
Azure SQL Database is the realization of one of the most popular relational databases in a managed, multitenant PaaS model. Although this model is attractive for many reasons, when choosing a database deployment model there are key factors to consider to ensure the end goal is met. In particular, it is important to understand that there are key differences between running Azure SQL Database and running SQL Server on-premises or in an IaaS virtual machine.
A key point is that system-level functions cannot be performed from Azure SQL Database. This includes database backups, system level profiling, and extensions to SQL Database, including FILESTREAM and CLR extensions.
Operations such as backups have been accommodated by extensions to TSQL, which allows for the backups without the need to create a backup media object (which would tie to a file system object on the operating system running SQL Server). All other operations such as management for the SQL Database instance are accommodated by using dynamic management views (DMVs) instead of extended stored procedures.
A key benefit of leveraging Azure SQL Database over traditional deployments of SQL Server is that databases and servers can be created in seconds, not hours or days. Another benefit is that there is less need to focus on the infrastructure and processes to replicate and back up the data in the databases.
By default, Azure SQL Database commits data to three separate instances in the same Azure region. This is similar to how Azure Storage commits all writes to three stamps in Azure. This provides high availability and protects against hardware failures inside Azure.
Feature References | |
Understanding Azure SQL |
|
Development considerations |
|
Performance and scaling |
https://msdn.microsoft.com/library/azure/e6f95976-cc09-4b46-9d8c-4cf23119598d |
Mandatory: Understand the performance and management differences between a traditional SQL Server database and an Azure SQL Database. | |
Recommended: Analyze databases to be migrated to Azure SQL Database for incompatibilities that might be present (for example, FILESTREAM). | |
Optional: Leverage built-in tools for BACPAC and DACPAC to move databases to Azure SQL Databases. |
Design Guidance |
Running relational databases in the cloud has deep implications to the application, performance, and resiliency. In many ways, the database is as important as the application in terms of how to run it effectively in a public cloud. There are pros and cons to running SQL Server in an IaaS environment as well as Azure SQL Database.
Some fundamental considerations are encryption requirements, performance requirements, and feature requirements. Although data can be encrypted by the application and stored in an SQL Database, TDE is not yet supported in SQL Databases.
If this is required, SQL Server in an IaaS environment would be the preferred target. Performance of Azure SQL Database can appear slower, but consider that each write is committed synchronously to three databases in the local datacenter, and asynchronously to another replica if geo-replication is enabled. This affords the benefit of not having to maintain as many local backups (backups are built in), with the downside of write performance. For high-TPS loads, consider adding a caching layer to insulate the application from the performance impact of multiple commits as much as possible.
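The caching-layer recommendation can be illustrated with a minimal read-through cache. This is a sketch, not a substitute for a managed cache service: `loader` stands in for a hypothetical Azure SQL query, and the TTL value is arbitrary.

```python
import time

class ReadThroughCache:
    """Minimal read-through cache to reduce database round trips.

    Entries expire after `ttl` seconds so the cache does not serve
    stale data indefinitely. `clock` is injectable for testing.
    """
    def __init__(self, loader, ttl=30.0, clock=time.monotonic):
        self._loader = loader        # falls through to the database
        self._ttl = ttl
        self._clock = clock
        self._store = {}             # key -> (value, expiry)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]            # served from cache: no DB round trip
        self.misses += 1
        value = self._loader(key)    # e.g. a SELECT against Azure SQL
        self._store[key] = (value, now + self._ttl)
        return value
```

For read-heavy workloads this pattern insulates the database from repeated identical queries; writes would still go through to the database and invalidate the affected keys.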
Advanced features in SQL Server that require access at the disk level or operating system level obviously will not work the same with SQL Database. For example, CLR integration, backup sets, and FILESTREAM tables are not possible with Azure SQL Database.
Leveraging Azure SQL Database has some unique security considerations. Azure SQL Database has a public-facing IP address that is reachable by anyone. Communications to Azure SQL Database can be secured by using the server-level firewall and the per-database firewall.
These firewalls restrict which IP addresses can connect to the database. When using ExpressRoute and public peering to access Azure SQL Database, the access flows through a network address translation (NAT) interface. This means the NAT address has to be specified in the firewall rules for Azure SQL Database, which prevents end-to-end security from being specified.
Scenario | Model | Points to Consider |
FILESTREAM needed | Shred the file objects to blobs in Azure Storage and store indexes in SQL Server | FILESTREAM is not available with Azure SQL Database |
SQL backups | Use geo-replication and point-in-time backups for Azure SQL Database workloads | Traditional backup sets are not supported in Azure SQL Database |
As discussed earlier, there is often a need to run processing that has no immediate UI and runs as a background service. The traditional PaaS approach was to leverage a worker role and implement the business logic via custom code.
Azure Batch offers a similar type of service, but with a unique twist. It has been designed to run background processes, but it is centered on high-performance data processing. It can provide scheduling, auto-scaling of compute resources, and partitioning of jobs.
This type of service targets the following type of workloads:
Feature References | |
Azure Batch Technical Overview |
http://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/ |
Azure Batch APIs |
https://msdn.microsoft.com/en-us/library/azure/dn820177.aspx |
Mandatory: Define pools of virtual machines that will perform the underlying work for Azure Batch jobs. | |
Recommended: Analyze the workload to determine which model is the better fit: Azure Batch or Azure Batch Apps. | |
Optional: Leverage the REST API to output monitoring and telemetry to existing systems. |
Design Guidance |
Azure Batch is a good consideration for workloads in which an existing process or executable is used to process the data. For this to work effectively, the data should be in a format that can allow parallelization (which cuts the data into several chunks). This service can be highly effective for custom processing and it is easy to configure.
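A minimal sketch of the partitioning idea, assuming the work items arrive as a flat list. In a real Azure Batch job each chunk would become a task assigned to a virtual machine pool; the chunk size here is a tuning assumption.

```python
def partition(items, chunk_size):
    """Split a work list into chunks that independent tasks could
    process in parallel."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    # Each slice is an independent unit of work; the last chunk may be short.
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
```

For example, `partition(list(range(7)), 3)` yields three chunks, which could map to three parallel Batch tasks.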
Azure HDInsight is the implementation of Hadoop as a service in Azure. The goal of this service is to enable customers to create Hadoop cluster services in seconds or minutes instead of hours or days, which significantly reduces the cost of this big data service. Additionally, the service provides storage in the form of the Hadoop Distributed File System (HDFS), which has become the standard for Hadoop clusters. Azure extends this concept to allow Azure Storage to be leveraged by Hadoop.
At its core, HDInsight can be run on Linux-based or Windows-based servers. This makes using Hadoop easy for those coming from a Linux background and approachable by those who use Windows.
The HortonWorks Data Platform (HDP) is the Hadoop distribution used by HDInsight. Additionally, there are several high-level configurations for running Hadoop, which can be used to optimize the cluster based on the operations and activities it will target. Along with this are other components that have been developed primarily by the open source community. These customize the system for specific types of workloads.
Feature References | |
Hadoop on Azure |
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/ |
Components matrix |
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/ |
Apache Hadoop Core |
|
Using Pig with Hadoop |
http://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-pig/ |
Mandatory: Create storage accounts for data repositories for HDInsight. Also, be sure to deprovision your clusters when not in use because the cost for running thousands of cores can add up quickly. | |
Recommended: Know the type of data you have in your system, and think about what actions are most important for your workloads. Get a good sense for your data and what is most important to your workloads. This will help guide what components in HDInsights can be leveraged to take best advantage of the platform services. | |
Optional: Spend some time checking what is already built by the open-source communities that you can use with much less effort than writing from scratch. |
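The classic illustration of Hadoop-style processing is word count. The sketch below keeps the mapper and reducer as plain Python functions so the logic can be tested locally; with HDInsight, similar logic could run as a Hadoop Streaming job reading from stdin, but that wiring is omitted here.

```python
from collections import defaultdict

def map_words(lines):
    """Mapper: emit (word, 1) pairs for every word seen."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_counts(pairs):
    """Reducer: sum the counts per word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

In a real cluster the shuffle phase between map and reduce is handled by Hadoop; here the reducer simply consumes the mapper's output directly.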
Machine learning and data science is an exciting technology in today's market. Unlocking the insights that our data holds is key to getting the competitive advantage that most companies need for success.
Although the concepts behind neural networks and machine learning have been in place for decades, only recently has computing power been able to provide the computational performance required to run these algorithms at large scale. This service allows modeling of data and algorithms, provides a low bar for entry, and lets users start simply with a web browser. Azure Machine Learning also provides a tool set called Azure Machine Learning Studio.
This service offering turns the algorithms and code from languages such as R and Python into services that can target users' data. Users upload more data, and it can be compared and processed against models created by data scientists. Machine Learning also provides training with existing data to feed the models.
This offering lowers the bar for getting the most out of a predictive analytics system. Historically, it required very different skill sets to develop the models and algorithms as opposed to exposing the results via the web or services. Machine Learning marries these primary skills and focuses on the underlying business logic.
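A published classic Azure ML Studio web service accepts a JSON envelope of inputs. The helper below builds such a request body as a sketch; the input name (`input1`) and the column names are assumptions that depend on the published experiment, and actually sending the request (with the API key in an Authorization header) is omitted.

```python
def build_scoring_request(column_names, rows):
    """Build a request body in the classic Azure ML Studio
    request-response shape. Column names and rows are caller-supplied;
    values are stringified, which the classic contract accepts."""
    return {
        "Inputs": {
            "input1": {                      # input name depends on the experiment
                "ColumnNames": list(column_names),
                "Values": [[str(v) for v in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
```

The returned dictionary would be serialized to JSON and POSTed to the service's scoring URL.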
Feature References | |
Machine learning overview |
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-machine-learning/ |
Azure for Research |
http://research.microsoft.com/en-us/projects/azure/default.aspx |
Machine Learning Studio |
http://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-ml-studio/ |
Publishing the API |
Mandatory: Bring your R and Python code libraries, and understand how to leverage Machine Learning Studio to provide a streamlined development experience. | |
Recommended: Partition your logic to create consumable services by using the platform services of Machine Learning. | |
Optional: Explore what the data science community has already created and work to extend or enhance these to speed your development effort and time. |
Azure provides high-performance computing (HPC) in the form of high-performance virtual machines. These virtual machines are tailored to support this specific computing need via features such as a back-end network with MPI latency under three microseconds and up to 32 Gbps of throughput. The back-end network leverages RDMA to enable scaling workloads beyond the typical limits (even for cloud platforms) to thousands of cores.
Azure supports high-performance capacity of up to 500 GB of memory, 6.9 TB of SSD storage, and up to 32 CPUs per virtual machine. When combined with the Microsoft HPC Pack, a system can be architected to run on-premises in a Windows compute cluster and then extend to Azure as capacity demands dictate. This also allows organizations to run HPC wholly in Azure if desired.
Combined with a rich ecosystem of applications, libraries, and tools, Azure provides a premier platform for high-performance computing.
Feature References | |
HPC overview |
https://msdn.microsoft.com/en-us/library/azure/dn482130.aspx |
Cluster pack for Azure IaaS |
https://msdn.microsoft.com/en-us/library/azure/dn518135.aspx |
Running MPI apps |
https://msdn.microsoft.com/en-us/library/azure/dn592104.aspx |
Hybrid cluster |
http://azure.microsoft.com/en-us/documentation/articles/cloud-services-setup-hybrid-hpcpack-cluster/ |
Mandatory: Determine a strategy for on-premises, cloud, and hybrid clustering for HPC workloads. | |
Recommended: Scale the application resources dynamically, to take advantage of extreme size virtual machines only when it makes sense. | |
Optional: Make changes to the applications to allow disconnecting the tiers to take advantage of features, such as queuing to allow scaling of independent compute clusters. |
A key component for applications that require global reach is to be able to reach customers in the quickest manner. One of the industry standards for distributing static content (typically images and video) is content delivery networks (CDNs).
Although this type of service has been available for years, the Azure platform integrates the CDN with existing data in Azure Blob Storage. This integration enables hosting the applications for a global reach, and makes it easier and less expensive to implement.
Typically, CDNs are used for static content, such as media, images, or documents, which is read more than written to. With an end goal of pushing content globally, this integration with Azure results in a faster response time for clients, based on their location. It also facilitates automatic redundancy and replication of content.
Feature References | |
CDN overview |
http://azure.microsoft.com/en-us/documentation/articles/cdn-overview/ |
POP locations for CDN in Azure |
https://azure.microsoft.com/en-us/documentation/articles/cdn-pop-locations/ |
Integration of CDN |
Mandatory: Determine regions to target for the CDN and update the application to use the root URI of the CDN rather than the URI of the local content. | |
Recommended: Leverage parameters to vary the caching characteristics and lifetime of the content cache. | |
Optional: Map the CDN content to a custom domain name. |
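Updating an application to serve static assets from the CDN often reduces to URL rewriting. A small helper sketches this, assuming a hypothetical endpoint such as `https://myapp.azureedge.net`:

```python
from urllib.parse import urljoin

def to_cdn_url(cdn_root, asset_path):
    """Rewrite a local static-asset path to the CDN endpoint so clients
    fetch it from the nearest point of presence."""
    if not cdn_root.endswith("/"):
        cdn_root += "/"                       # urljoin needs a trailing slash
    return urljoin(cdn_root, asset_path.lstrip("/"))
```

Templates would then call this helper for images, scripts, and stylesheets while dynamic pages stay on the origin.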
Application-level caching can be achieved with a variety of products, some from Microsoft and others from external vendors. There are multiple types of caching options in Azure, including Managed Cache Service, In-Role Cache, and Redis Cache.
Redis Cache is a cache-as-a-service offering for the Azure platform. This means that the Azure platform manages the underlying infrastructure to host the caching servers. From an application point of view, the service can be accessed via the same Redis clients that have been in use since Redis was created. These vary based on platforms, and they are available for most of the popular selections (for example, Java, Node, and .NET).
Redis goes beyond simple key/value pairs: it can cache entire data structures, such as collections and sets. Redis also supports non-blocking first synchronization and automatic reconnection, which support cache replication to increase uptime.
Feature References | |
Redis overview |
|
Caching data in Redis |
|
Management in Azure Portal |
|
Cache planning |
Mandatory: Understand which tier of service will be required and implement the Redis client in the application. | |
Recommended: Use the advanced structure caching options with Redis to simplify the application caching code. | |
Optional: Set up policies for cache rejection, lifetime, and so on. |
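To show why Redis structures go beyond simple key/value pairs, the sketch below mimics a few redis-py commands with an in-memory stand-in. A real application would instead create a `redis.StrictRedis` client against the Azure Redis endpoint (with SSL and the access key); the command names here mirror that client, but the semantics are simplified.

```python
class MiniRedis:
    """In-memory stand-in for a few Redis commands, for illustration only."""
    def __init__(self):
        self._kv, self._lists, self._sets = {}, {}, {}

    def set(self, key, value):           # plain key/value
        self._kv[key] = value

    def get(self, key):
        return self._kv.get(key)

    def lpush(self, key, *values):       # list: e.g. a recent-activity feed
        self._lists.setdefault(key, [])[:0] = reversed(values)
        return len(self._lists[key])

    def lrange(self, key, start, stop):  # simplified: stop=-1 means "to end"
        data = self._lists.get(key, [])
        stop = len(data) if stop == -1 else stop + 1
        return data[start:stop]

    def sadd(self, key, *members):       # set: e.g. unique visitors
        s = self._sets.setdefault(key, set())
        before = len(s)
        s.update(members)
        return len(s) - before           # number of new members added
```

Caching a feed as a list or a visitor set as a set avoids serializing and rewriting a whole structure on every update, which is the practical advantage over flat key/value caching.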
Azure Service Bus is one of the core services, and it provides a high-performance, durable-messaging service. But it is actually a bit more than this—Service Bus offers queueing and relay services.
Service Bus queues provide an option to decouple any processing from the request pipelines. This type of architecture is very important, especially when migrating workloads to the cloud because loosely coupled applications can scale, and they are more fault resilient.
Service Bus can use a variety of models, from simple queue-based storage, to topics (which target and partition messages in a namespace). You can even use Event Hubs on top of Service Bus to service very large client bases, where input to Service Bus will include several thousands to millions of messages in rapid succession.
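The decoupling benefit can be sketched with Python's standard library: the producer returns as soon as messages are enqueued, and a worker drains the queue independently. In Azure the queue would be a durable Service Bus queue that survives process restarts; the in-process queue here is only for illustration.

```python
import queue
import threading

def decoupled_pipeline(messages):
    """Producer enqueues quickly; a separate worker thread processes."""
    q = queue.Queue()
    results = []

    def worker():
        while True:
            msg = q.get()
            if msg is None:              # sentinel: no more messages
                break
            results.append(msg.upper())  # stand-in for real processing

    t = threading.Thread(target=worker)
    t.start()
    for m in messages:                   # producer: fast, non-blocking enqueue
        q.put(m)
    q.put(None)
    t.join()
    return results
```

Because the producer never waits on processing, it can keep accepting requests even when the worker falls behind, which is exactly the resilience property the text describes.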
Feature References | |
Service Bus overview |
|
Sample scenarios |
|
Event Hubs |
|
Application architecture |
Mandatory: Determine which model to use when storing messages in Service Bus, based on transaction, lifetimes, and message rates. | |
Recommended: Modify applications to provide transient fault handling to complement decoupling message posting from message processing. | |
Optional: Leverage Event Hubs to handle large scale intake of Service Bus messages. |
In addition to providing business logic, a key consideration when implementing web-service workloads is to allow for features such as throttling, authentication, and partitioning services. Until this point, developers were tasked with building code for this infrastructure, which became the framework for the web services that were deployed.
API Management Service was designed to accommodate this need. It provides this infrastructure with very little effort. Developers can concentrate on the business logic of the web services instead of how they are deployed. This also allows deploying the underlying web services on different servers and different technologies, as needed.
API Management Service can leverage existing on-premises web services in addition to cloud-deployed services. Services such as throttling, rate limits, and service quotas can be applied in a central point, similar to load balancing. Services are based on rules that are established in the API Management service. This allows consolidating services from multiple back-ends to a single entry point for service consumers.
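Rate limiting is what API Management applies declaratively in policy (for example, N calls per renewal period). To make the behavior concrete, here is a minimal fixed-window limiter in application code; the numbers and the injected clock are illustrative, and a real deployment would leave this to the API Management policy engine.

```python
class RateLimiter:
    """Fixed-window rate limiter: allow `calls` requests per `period` seconds."""
    def __init__(self, calls, period, clock):
        self.calls, self.period, self.clock = calls, period, clock
        self.window_start, self.count = clock(), 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.period:
            self.window_start, self.count = now, 0   # start a new window
        if self.count < self.calls:
            self.count += 1
            return True
        return False                                  # caller should return HTTP 429
```

Centralizing this in API Management means every back-end service behind the gateway gets the same enforcement without carrying this code.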
Feature References | |
API Management overview |
http://azure.microsoft.com/en-us/documentation/articles/api-management-key-concepts/ |
Getting started |
http://azure.microsoft.com/en-us/documentation/articles/api-management-get-started/ |
API management for developers |
https://msdn.microsoft.com/en-us/library/azure/dn776327.aspx |
Securing APIs |
http://azure.microsoft.com/en-us/documentation/articles/api-management-howto-mutual-certificates/ |
Mandatory: Configure policies for services and profiles for existing web services to use API Management. | |
Recommended: Protect web services with API Management rate limits and quota policies. | |
Optional: Customize the developer portal to allow for developer registration and subscription models. |
Probably one of the most common components in applications is a platform or infrastructure that supports searching application data. There are quite a few products on the market from Microsoft and third parties. Azure Search was created to provide a search-as-a-service offering in Azure.
While Azure Search does not offer a crawler to index application data sources, it does provide the infrastructure to ingest index data and provides interfaces for the actual search functions. This service is targeted at developers; it is not a service that is directly customer facing. At its core, it is a web service that follows the model of a REST-based interface for connected applications.
The index schemas are expressed in JavaScript Object Notation (JSON). Essentially, the index contains a list of fields, each with attributes that control how the field can be used (for example, whether it is searchable, filterable, sortable, or retrievable).
Feature References | |
Azure Search Overview |
|
Getting started |
http://azure.microsoft.com/en-us/documentation/articles/fundamentals-azure-search-chappell/ |
Azure Search API |
https://msdn.microsoft.com/en-us/library/azure/dn798935.aspx |
Creating indexes |
https://msdn.microsoft.com/en-us/library/azure/dn798941.aspx |
Mandatory: Construct and update indexes for Azure Search consumption via back-end services. PaaS-based worker roles work well for these types of jobs. | |
Recommended: Add additional attributes to the index to support advanced features, such as automatic suggestions. | |
Optional: Build monitoring data integration to existing monitors to ensure storage or indexes don't exceed the limits for the service. |
Design Guidance |
For more information about applicable Azure design patterns, see Azure Search Tier (Azure Architecture Patterns).
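To make the JSON schema concrete, here is a hypothetical index definition in the general shape the Azure Search REST API expects. The index name, fields, and attribute choices are all illustrative assumptions, not a schema from the source.

```python
import json

# Hypothetical index definition: one key field, one full-text field,
# and one numeric field used for filtering, sorting, and faceting.
hotel_index = {
    "name": "hotels",
    "fields": [
        {"name": "hotelId", "type": "Edm.String", "key": True},
        {"name": "description", "type": "Edm.String",
         "searchable": True, "retrievable": True},
        {"name": "rating", "type": "Edm.Int32",
         "filterable": True, "sortable": True, "facetable": True},
    ],
}

def index_json():
    """Serialize the schema as a request body for a (hypothetical)
    PUT against the service's /indexes/hotels endpoint."""
    return json.dumps(hotel_index, indent=2)
```

A back-end worker role (as the mandatory guidance above suggests) would create the index once and then push documents matching these fields.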
As with IaaS, there are several considerations when deploying solutions within Microsoft Azure, specifically when targeting PaaS models. Deployment considerations include deployment methods, load balancing, resiliency, security, and networking. This section covers these areas at a high level.
When running services in PaaS, it's important to understand and configure the services such that upgrades to the application and upgrades as part of the Azure platform do not result in outages or downtime to the application.
Models for the application lifecycle vary from simple to complex. Some of the most important tradeoffs are detailed in the following table:
Considerations |
Decision Points |
Upgrade domains |
Services deployed to Azure (specifically PaaS web and worker roles) include the concept of upgrade domains. It's important to configure services to use multiple upgrade domains to avoid unnecessary outages when new deployments or upgrades to the application or service are initiated. |
Deployment slots |
Deployment slots can be used to test new versions or upgrades without affecting the production application. A model for using staging slots before releasing to production can enable better testing to avoid downtime. |
Web Deploy |
Web Deploy is a way to deploy services to cloud services in Azure. Although this is simple, user interaction is typically required with this model. This makes it a good option for developers and smaller apps, but larger apps might require more governance and control for deployments. |
Continuous integration |
Continuous integration is a great option for larger applications and organizations that require automating deployments. This allows gated check-ins (approval) and continuous check-ins (triggered). |
Customers who deploy applications as PaaS in Azure must consider load balancing as a core part of the application. This is a must for applications hosted in PaaS because even if hardware failures never happen (which is unlikely), the servers need to be upgraded (guest and host upgrades). This means that these instances will be moved at some point.
The Azure fabric will ensure that it doesn't take all the server instances down at one time, but this requires the use of at least two instances (fault and upgrade domains allow the fabric to operate in this way).
Keep in mind that there are two levels of load balancing available for Azure PaaS services:
Feature References | |
Load Balancing for Azure Services |
http://azure.microsoft.com/documentation/articles/load-balancer-overview |
About Traffic Manager Load Balancing Methods |
http://azure.microsoft.com/documentation/articles/traffic-manager-load-balancing-methods |
Internal load balancing |
http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview |
When deciding to provision PaaS instances that need to communicate with other servers or services in Azure, a virtual network is required. Some areas of consideration include:
Considerations |
Decision Points |
Name resolution |
When you deploy virtual machines and cloud services to a virtual network, you can use Azure-provided name resolution or your own DNS solution, depending on your name resolution requirements. For information about name resolution options, see Name Resolution (DNS). |
Enhanced security and isolation |
Because each virtual network is run as an overlay, only virtual machines and services that are part of the same network can access each other. Services outside the virtual network have no way to identify or connect to services hosted within virtual networks. This provides an added layer of isolation to your services. |
Extended connectivity boundary |
The virtual network extends the connectivity boundary from a single service to the virtual network boundary. You can create several cloud services and virtual machines within a single virtual network and have them communicate with each other without having to go through the Internet. You can also set up services that use a common backend database tier or use a shared management service. |
Extend your on-premises network to the cloud |
You can join virtual machines in Azure to your domain running on-premises. You can access and leverage all on-premises investments related to monitoring and identity for your services hosted in Azure. |
Use persistent public IP addresses |
Cloud services within a virtual network have a stable public VIP address. You can also choose to configure your cloud services when you create them by using a reserved public IP address from the address range. This ensures that your instances retain their public IP address even when moved or restarted. See Reserved IP Overview. |
There are two models for network configurations for Azure cloud services: cloud-only and cloud-premises virtual network configurations:
Feature References | |
How to create a virtual network |
https://azure.microsoft.com/documentation/articles/virtual-networks-create-virtual-network/ |
Although the capabilities of Azure Virtual Machines are quite comprehensive, there are some native limitations that exist and should be understood by organizations prior to deploying solutions in Azure. These include:
Consideration |
Impact |
Workaround |
Auto-scaling |
The application environment does not automatically increase or decrease role instances for increased or decreased loads.
|
Utilize monitoring and automation capabilities such as the Azure Monitoring Agent and Azure Automation to dynamically scale and deploy application code to cloud service instances in the environment. |
Load balancing |
Application instances are not load balanced by default.
|
After the cloud service is provisioned, create an Internal Load Balancer and associate it with the cloud service endpoint. |
Density |
The default limit is 20 cloud services per subscription. |
Leverage multiple subscriptions to provide the proper level of segmentation. |
Azure Diagnostics are Azure extensions that enable you to collect diagnostic telemetry data from a worker role, web role, or virtual machine running in Azure. The telemetry data is stored in an Azure Storage account and can be used for debugging and troubleshooting, measuring performance, monitoring resource usage, traffic analysis, capacity planning, and auditing.
Azure Diagnostics can collect the following types of telemetry:
Data Source |
Description |
IIS logs |
Information about IIS websites. |
Azure Diagnostics infrastructure logs |
Information about Azure Diagnostics. |
IIS failed request logs |
Information about failed requests to an IIS site or application. |
Windows Event logs |
Information sent to the Windows event logging system. |
Performance counters |
Operating system and custom performance counters. |
Crash dumps |
Information about the state of the process in the event of an application crash. |
Custom error logs |
Logs created by your application or service. |
.NET EventSource |
Events generated by your code using the .NET EventSource class. |
Manifest-based ETW |
ETW events generated by any process. |
Microsoft Azure networking leverages a combination of software-defined networking within the Azure fabric infrastructure and physical networking at the edge where customers interface with Azure. Within the Azure fabric infrastructure, there is the concept of virtual networks, subnets within the virtual networks, and the network gateways that allow connectivity between virtual networks and customer networks.
At the edge of the Azure fabric infrastructure, enterprise customers typically use physical devices to provide communications from the on-premises enterprise datacenter environments, where small or medium businesses might use virtual devices or only connect from the client computer to the Azure environment.
This section covers the concepts and planning guidance required for networking infrastructures that interface with and exist within the Microsoft Azure platform.
Connecting to Azure can be accomplished directly by enterprise customers or using a cloud service provider as the interface. When customers connect directly, they create subscriptions, establish connections, and are responsible for managing the private network interfaces and manage all aspects of establishing services within Azure. When customers leverage a cloud service provider, they offload various aspects of the subscription, networking, identity, and management of the Azure environment to the cloud service provider.
From a networking perspective, cloud service providers offer two types of network connectivity to Azure for customers.
When leveraging Azure networking technologies, a key consideration is which capabilities can be implemented, depending on the API and portal approach that you are targeting. Before a capability can be used, the implementation API (ARM or ASM) has to enable that capability. Most common Azure networking technologies are available using the ASM API; however, advanced capabilities such as logging and diagnostics data are only implemented in the ARM API. While the transition between the ASM and ARM APIs continues, it is important to verify that a specific area of functionality is available in either API before committing to a path of implementation.
For customer-managed environments, the customer has the option of choosing whether to use the existing ASM API and migrate to the ARM API when required networking capabilities are available in ARM. Conversely, they can choose to immediately adopt the ARM API understanding that there are certain capabilities they cannot leverage until they are made available.
For CSP-managed scenarios, the networking capabilities are limited to the ARM API due to the requirement for RBAC to separate management scope between the provider and the customer.
Azure Virtual Networks provide a key building block for establishing virtual private networks. Virtual networks can be used to allow isolated network communication within the Azure environment or establish cross-premises network communication between an organization's network infrastructure and Azure. By default, when virtual machines are created and connected to Azure Virtual Network, they are allowed to route to any subnet within the virtual network, and outbound access to the Internet is provided by Azure's Internet connection.
A fundamental first step in creating services within Microsoft Azure is establishing a Virtual Network. To establish a virtual private network within Azure, you must create a minimum of one virtual network. Each virtual network must contain an IP address space and a minimum of one subnet that leverages all or part of the virtual network address space. To establish remote network communications to on-premises or other virtual networks, a gateway subnet must be allocated for the virtual network and a virtual network gateway must be added to it.
To enable cross-premises connectivity, a Virtual Network must have a virtual network gateway (often simply referred to as a gateway) attached to it. Currently, there are three types of gateways that can be deployed:
The type of gateway determines the cross-premises connectivity capabilities, the performance, and the features that are offered. Static and dynamic gateways are used when establishing Point-to-Site (P2S) and Site-to-Site (S2S) VPN connections where the cross-premises connectivity leverages the Internet for the transport path. ExpressRoute gateways are designed for high-speed, private, cross-premises connectivity where the traffic flows across dedicated circuits and not the Internet.
Static gateways are for establishing low-cost connections to a single virtual network in Azure. Dynamic gateways are used to establish low-cost connections to an on-premises environment or to connect multiple virtual networks for routing purposes in Azure. ExpressRoute gateways are used for connecting on-premises environments to Azure over high-speed private connections.
Feature References | |
Azure Virtual Network Overview | |
Virtual Network FAQ | https://msdn.microsoft.com/en-us/library/azure/dn133803.aspx |
Virtual Network Cross Premises Connectivity | https://msdn.microsoft.com/en-us/library/azure/dn133798.aspx |
VPN Devices and Gateway Information | https://msdn.microsoft.com/en-us/library/azure/jj156075.aspx |
ExpressRoute | http://azure.microsoft.com/en-us/documentation/services/expressroute/ |
Mandatory: Azure solutions must contain a minimum of one virtual network to establish network communications within Azure. A Virtual Network must contain a minimum of one subnet for virtual machine placement and one gateway subnet if cross premises network connectivity is required. Proper network address space planning is required when implementing virtual networks and subnets. | |
Recommended: Azure solutions should use the dynamic routing or ExpressRoute gateway versus the static routing gateway. |
Design Guidance |
When you design Virtual Networks, consider the following:
Capability Considerations | Capability Decision Points |
RBAC | The Virtual Network Contributor resource role grants the ability to manage the entire virtual network and its subnets. The Virtual Machine Contributor role can be used to grant the ability to use a subnet but not manage it. |
CSP management | CSP scenarios might drive additional virtual networks to allow customers separate management capabilities. |
Core limitations | Ensure that the virtual network design supports the number of virtual machines that are desired. |
Virtual network gateways provide connectivity from on-premises networks to Azure and between Azure virtual networks. Types of gateway connection technologies include Point-to-Site (P2S), Site-to-Site (S2S), and ExpressRoute. This section covers gateways in the context of Site-to-Site (S2S) and ExpressRoute connections.
For Site-to-Site gateways, an IPsec/IKE VPN tunnel is created between the virtual networks and the on-premises sites by using Internet Key Exchange (IKE) protocol handshakes. For ExpressRoute, the gateways advertise the prefixes of your virtual networks by using the Border Gateway Protocol (BGP) via the peering circuits. The gateways also forward packets from your ExpressRoute circuits to the virtual machines inside your virtual networks.
Currently there are two types of S2S virtual private network connections (VPNs) that require the use of two types of gateways: static routing and dynamic routing. A static routing gateway uses policy-based VPNs. Policy-based VPNs encrypt and route packets through an interface based on a customer-defined policy. The policy is usually defined as an access list. Static routing VPNs require a static routing VPN gateway. Static gateways are effective for single virtual network connections, but they are limited to a single virtual network per VPN connection.
In contrast, dynamic routing gateways use route-based VPNs. Route-based VPNs depend on a tunnel interface specifically created for forwarding packets. Any packet arriving at the tunnel interface is forwarded through the VPN connection. Dynamic routing VPNs require a dynamic routing VPN gateway.
From a performance perspective there are three types of dynamic routing gateways: Basic, Standard, and High-Performance. The differences between these gateway types are outlined in the following table.
Type | S2S connectivity | Authentication method | Maximum number of S2S vNet connections | Maximum number of P2S connections | S2S VPN Throughput | ExpressRoute Throughput |
Basic Dynamic Routing Gateway | Route-based VPN configuration | Pre-shared key | 10 | 128 | ~100 Mbps | ~500 Mbps |
Standard Dynamic Routing Gateway | Route-based VPN configuration | Pre-shared key | 10 | 128 | ~100 Mbps | ~1000 Mbps |
High Performance Dynamic Routing Gateway | Route-based VPN configuration | Pre-shared key | 30 | 128 | ~200 Mbps | ~2 Gbps |
Creating and connecting a gateway for a Virtual Network is a multiple step process, and it requires certain configurations to be complete. A high-level set of steps is outlined here:
After you have a successful gateway connection, the gateway status will show as active within the virtual network dashboard in the Azure portal. Note that for ExpressRoute, S2S, and virtual network-to-virtual network connections, the portal will provide a gateway connection status, Data-In traffic amount, Data-Out traffic amount, and the gateway address. Here is an example of this information shown in the Azure portal:
The configuration of an S2S or P2S virtual network gateway can be performed within the portal, but at this time, ExpressRoute gateways can only be provisioned and configured by using the ExpressRoute PowerShell module.
Note that it is possible to resize a gateway (between Basic, Standard, and High-Performance) with the Resize-AzureVNetGateway cmdlet. This allows organizations to start at one class of service and expand their capabilities as their requirements grow. Some downtime is required during the resizing process, but no other configuration changes are required. Resizing operations include increasing or decreasing between the Basic, Standard, and High-Performance gateways.
Feature References | |
Configure a Cross-Premises Site-to-Site connection to an Azure Virtual Network | http://azure.microsoft.com/documentation/articles/vpn-gateway-site-to-site-create/ |
Configure a Virtual Network Gateway in the Management Portal | https://msdn.microsoft.com/en-us/library/azure/jj156210.aspx |
Connect Multiple On-premises Sites to a Virtual Network | https://msdn.microsoft.com/en-us/library/azure/dn690124.aspx |
Configure a Virtual Network and Gateway for ExpressRoute | http://azure.microsoft.com/documentation/articles/expressroute-configuring-vnet-gateway/ |
ExpressRoute Technical Overview | http://azure.microsoft.com/documentation/articles/expressroute-introduction/ |
ExpressRoute Prerequisites | http://azure.microsoft.com/documentation/articles/expressroute-prerequisites/ |
Coexistence Gateway | https://azure.microsoft.com/en-us/documentation/articles/expressroute-coexist/ |
Mandatory:
| |
Recommended:
| |
Design Guidance |
For more information about Applicable Azure design patterns, see Hybrid Networking (Azure Architecture Patterns).
When you design virtual network gateways, consider the following:
Capability Considerations | Capability Decision Points |
Gateway provisioning performance | Note that when you create a gateway, it can take from 15 to 30 minutes for the gateway to become available via the provisioning process. |
Gateway limits | A virtual network can have a maximum of a single gateway attached to it. |
Static routing gateway limits | Multi-site, virtual network-to-virtual network, and Point-to-Site gateway connection technologies are not supported with static routing VPN gateways. |
Co-existence gateway | A new co-existence gateway exists that combines both the BGP and IKE protocols (ExpressRoute and S2S connections). With this gateway, it is possible to support ExpressRoute and S2S VPN connections to a single virtual network. The co-existence gateway supports two modes of operation: failover and coexistence. Gateway performance remains the same for the connection types. |
Cisco ASA VPN Device | The most common VPN device that customers use is a Cisco ASA device. This device does not currently support dynamic routing, and Azure does not support multiple policy configurations with a static routing gateway, so only a single virtual network can be connected to a Cisco ASA VPN device. |
CSP limits | CSP scenarios currently only support S2S VPN gateways due to limitations of the ARM API. |
In the previous section, we discussed the three types of gateways available today in Azure. Gateways are used to connect on-premises environments to Azure and to enable virtual network-to-virtual network connectivity. Gateways are created at the virtual network level, and a virtual network can have only a single gateway connected to it.
Regardless of which gateway type you create, you must have a gateway subnet defined for the virtual network. The gateway subnet has different address space requirements depending on the type of gateway created.
A static routing gateway or a dynamic routing gateway must have a subnet with a /29 CIDR definition. When the gateway is connected, it actually takes the /29 segment and breaks it into two /30 segments to provide redundant connections as part of the Site-to-Site VPN. The address requirements are the same for a standard and a high-performance static or dynamic routing gateway.
An ExpressRoute gateway must have a subnet with a /28 CIDR definition. When the ExpressRoute gateway is established, it breaks the /28 into two /29 segments that are used to provide the redundant connections as part of the ExpressRoute circuit establishment.
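The gateway subnet splits described above can be verified with Python's standard ipaddress module (the address ranges used here are hypothetical examples, not required values):

```python
import ipaddress

# A /29 gateway subnet is split into two /30 segments to provide the
# redundant connections of a Site-to-Site VPN.
vpn_gw_subnet = ipaddress.ip_network("10.0.255.0/29")
print([str(s) for s in vpn_gw_subnet.subnets(new_prefix=30)])
# -> ['10.0.255.0/30', '10.0.255.4/30']

# An ExpressRoute gateway subnet (/28) is split into two /29 segments
# for the redundant circuit connections.
er_gw_subnet = ipaddress.ip_network("10.0.254.0/28")
print([str(s) for s in er_gw_subnet.subnets(new_prefix=29)])
# -> ['10.0.254.0/29', '10.0.254.8/29']
```

The same arithmetic applies to any /29 or /28 range you allocate for the gateway subnet.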
Mandatory:
| |
Recommended:
|
Virtual network-to-virtual network routing allows establishing routing paths across the Azure network fabric without having to send the traffic on-premises. Establishing virtual network-to-virtual network routing requires creating an IPsec tunnel and dynamic routing across that segment.
The static and dynamic routing gateways use IKE protocol to establish an IPsec tunnel and route traffic, but only the dynamic routing gateway supports dynamic routing. Based on those requirements, the static routing gateway does not support virtual network-to-virtual network routing. Every virtual network-to-virtual network segment requires a dynamic routing gateway on both ends.
The ExpressRoute gateway leverages the BGP routing protocol to establish communication and route traffic, and therefore it does not meet the requirements to establish a virtual network-to-virtual network connection. This means a separate connection to each virtual network must be established when using ExpressRoute. In addition, traffic between virtual machines on different virtual networks must route from Azure to the edge of the ExpressRoute circuit and back into Azure.
When you establish virtual network routing, by default a virtual network allows traffic to flow across only a single virtual network-to-virtual network gateway connection. This is an isolation feature that forces establishing multiple-hop routing definitions to enable broader communications.
The following section discusses the different options of connecting virtual networks together to support routing scenarios. Note that it is possible to provision ASM and ARM versions of virtual networks. The process for connecting both versions is slightly different.
Each gateway has a limited number of other gateway connections that it can establish. The connection model between gateways dictates how far you can route within Azure. There are three distinct models that you can leverage to connect multiple virtual networks to one another:
Mesh
Hub and Spoke
Daisy-Chain
By default, in the Mesh approach, every virtual network can talk to every other virtual network with a single hop. Therefore, this approach does not require you to define multiple-hop routing. Challenges with this approach include the rapid consumption of gateway connections, which limits the size of the virtual network routing capability.
By default, in the Hub and Spoke approach (as illustrated in the previous example) a virtual machine on vNet1 will be able to communicate to a virtual machine on vNet2, vNet3, vNet4, or vNet5. A virtual machine on vNet2 could talk to virtual machines on vNet1, but not a virtual machine on vNet3, vNet4, or vNet5. This is due to the default single hop isolation of the virtual network in this configuration.
By default, in a Daisy-Chain approach, a virtual machine on vNet1 can communicate to a virtual machine on vNet2, but not vNet3, vNet4 or vNet5. A virtual machine on vNet2 could talk to virtual machines on vNet1 and vNet3. The same virtual network single hop isolation applies.
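The single-hop reachability rules described for these models can be sketched as a small Python check (the topology and vNet names are hypothetical, mirroring the examples above):

```python
# Gateway connections per topology; with the default single-hop isolation,
# a virtual network can reach only its directly connected peers.
daisy_chain = [("vNet1", "vNet2"), ("vNet2", "vNet3"),
               ("vNet3", "vNet4"), ("vNet4", "vNet5")]
hub_and_spoke = [("vNet1", "vNet2"), ("vNet1", "vNet3"),
                 ("vNet1", "vNet4"), ("vNet1", "vNet5")]

def reachable(edges, net):
    """Networks reachable in a single hop (the default isolation boundary)."""
    return {b for a, b in edges if a == net} | {a for a, b in edges if b == net}

print(sorted(reachable(daisy_chain, "vNet2")))    # -> ['vNet1', 'vNet3']
print(sorted(reachable(hub_and_spoke, "vNet1")))  # -> ['vNet2', 'vNet3', 'vNet4', 'vNet5']
```

Extending reach beyond one hop would require explicit multiple-hop routing definitions, as noted above.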
Feature References | |
Connecting ASM virtual networks to ARM virtual networks | https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-arm-asm-s2s/ |
Mandatory:
| |
Recommended:
| |
Design Guidance |
For more information about applicable Azure design patterns, see Hybrid Networking (Azure Architecture Patterns).
Additional implementation guidance and examples are provided in the Appendix of this document.
A Virtual Network in Azure is an address space container that can have a gateway connected to it to allow communications. As part of the Virtual Network configuration, customers must configure non-overlapping IP address space for their Azure environment.
This IP address space can consist of private IPV4 IP address ranges (as described in RFC 1918) or public (non-RFC 1918) IPV4 IP address ranges owned by the organization. Exceptions to public address ranges include:
A virtual network address space can be subdivided into smaller groups of address spaces called subnets. Subnets, not the virtual network itself, are the connection points for virtual machines and specific PaaS roles. The subnets are connected to the virtual network and form part of a flat routed network in which traffic that flows through the gateway can reach each subnet.
There are two types of subnets that can be created:
Virtual machine subnets can have virtual machines, PaaS roles (web and worker), and internal load balancers. Gateway subnets can have connections only with other gateways—provided they are using non-overlapping IP address spaces.
Feature References | |
Non-RFC 1918 space now allowed in a virtual network | http://azure.microsoft.com/en-us/updates/non-rfc-1918-space-is-now-allowed-in-a-virtual-network |
About Public IP Address Space and Virtual Network | https://azure.microsoft.com/documentation/articles/virtual-networks-public-ip-within-vnet/ |
Microsoft Azure Datacenter IP Ranges | http://www.microsoft.com/en-us/download/details.aspx?id=41653 |
Mandatory:
|
Design Guidance |
When you design virtual network IP address spaces, consider the following:
Capability Considerations | Capability Decision Points |
Virtual network and subnet configuration | Although there is a limit on the number of virtual networks that can be placed in a subscription, there is no limit on subnets other than how small the address space of the virtual network can be subdivided. Each subnet has its first three host addresses reserved for Azure usage, so the first available address is the fourth host address; together with the network and broadcast addresses, this means the smallest usable subnet has a CIDR of /29, which leaves three assignable addresses. It is possible to have multiple IP address space definitions in a virtual network definition. |
Virtual machines and address space planning | Currently, a virtual network can have a total of 2048 virtual machines attached to its subnets. By default, every virtual machine has a single network adapter; therefore, the virtual network space needs a minimum of 2048 IP addresses (plus those reserved by Azure) if you are going to maximize the density of the virtual network. |
Address space considerations | When designing an address space for a virtual network, consider the following: |
Planning for internal load balancing | Each virtual machine can have multiple network adapters, and if internal load balancers are used, you also need a single IP address for every internal load balancer. So a formula would be: virtual network address space = # of virtual machines + # of additional network adapters + # of internal load balancers + 3. Note that you need to round this number up to a CIDR boundary. For example, if the formula results in a minimum requirement of 8003 addresses, you must round up to the next CIDR boundary of /19, which provides 8190 addresses for the virtual network address space. |
Duplicate or overlapping IP ranges | One limitation of address space design is that no duplicate IP address ranges can exist in any routed network. This means that you cannot use the same address space for an Azure virtual network or subnet that already exists somewhere else (such as on-premises) where the Azure subnets need to route. CSPs need to ensure that customers do not implement overlapping address spaces in Azure. |
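The sizing formula and the no-overlap constraint above can be sketched with Python's standard ipaddress and math modules (the network ranges and counts are hypothetical examples):

```python
import ipaddress
import math

def vnet_prefix_length(vms, extra_nics, internal_lbs, reserved=3):
    """Smallest CIDR prefix satisfying the sizing formula in the text:
    VMs + additional network adapters + internal load balancers + 3 reserved."""
    needed = vms + extra_nics + internal_lbs + reserved
    return 32 - math.ceil(math.log2(needed))

# The text's example: a requirement of 8003 addresses rounds up to a /19.
print(vnet_prefix_length(7800, 150, 50))  # -> 19

# No duplicate or overlapping ranges may exist anywhere in the routed network.
on_prem = ipaddress.ip_network("10.1.0.0/16")   # hypothetical on-premises range
proposed = ipaddress.ip_network("10.1.8.0/24")  # proposed Azure virtual network
print(on_prem.overlaps(proposed))  # -> True, so this range cannot be used
```

A range such as 10.2.0.0/16 would pass the overlap check and could be assigned to the Azure virtual network.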
Currently virtual network logging is limited to change management (create, modify, and delete) audit logging. In the Azure portal, it is available via Management Services > Operations Logs. You can also use the Azure PowerShell cmdlet Get-AzureSubscriptionIDLog for:
Data plane and control plane logging is not available at this time.
Design Guidance |
Consider the impact of logging virtual network information for customers who require regulatory compliance (such as PCI) or other operational requirements.
Azure supports two types of connectivity options to connect customers' networks to Azure virtual networks: Site-to-Site VPN and ExpressRoute. Although Point-to-Site is another viable connectivity option, it is client-focused and is not specific to this discussion.
Site-to-Site VPN connections use VPN devices over public Internet connections to create a path to route traffic to a virtual network in a customer subscription. Traffic to the virtual network flows across an encrypted VPN connection, while traffic to the Azure public services flows over the Internet.
It is not possible to create a Site-to-Site VPN connection that provides direct connectivity to the public Azure services via a public peering path. To provide multiple VPN connections to the virtual network, you must use multiple VPN devices connected to different sites. These relationships are depicted in the following diagram:
If a customer selects to engage a cloud service provider in a "connect through" scenario, the customers connect to the CSP network over a S2S VPN and the CSP is connected to Azure over separate S2S VPN connections.
If a customer selects to engage a cloud service provider in a "connect to" scenario, the customer connects to Azure network over a S2S VPN and the CSP is connected to the customer's network over a separate S2S VPN connection that allows the CSP to manage the Azure subscription and resources on behalf of the customer.
ExpressRoute connections use routers and private network paths to route traffic to Azure Virtual Network, and optionally, to the Azure public services. Private connections are made through a network provider by establishing an ExpressRoute circuit with a selected provider. The customer's router is connected to the provider's router and the provider creates the ExpressRoute circuit to connect to the Azure Routers.
When the circuit is created, VLANs can be created that allow separate paths to the private peering network to link to virtual networks and to the public peering network to access Azure public services.
Design Guidance |
Cloud Service Provider scenarios are currently limited to Site-to-Site connection options due to the current lack of support for ExpressRoute in ARM.
ExpressRoute is a high-speed private routed network connection to Azure. The connections between the customer's network edge and the provider's network edge are redundant as are the connections from the provider's edge to the Azure edge.
From the provider to the Azure edge, you can have private peering connections to customer virtual networks and public peering connections to the Azure PaaS services, such as Azure SQL Database. There are two carrier models provided for ExpressRoute: Network Service Providers (NSPs) and Exchange Providers (IXPs). NSP and IXP connectivity models, speeds, costs, and capacities vary. These differences are summarized in the following table:
| Network Service Provider | Exchange Provider |
Bandwidth | 10, 50, 100, 500, 1000 Mbps | 200, 500, 1000, 10000 Mbps |
Route management | Provider manages | Customer manages |
High availability | Provider manages | Customer manages |
MPLS support | Yes | No |
Azure circuit costs | Ingress and egress included in monthly fee | Ingress and egress allocation included in monthly fee and based on consumption |
Provider circuit costs | Based on consumption—some provide all-inclusive plans | Based on consumption |
Feature References | |
Configure an ExpressRoute Connection through a Network Service Provider | http://azure.microsoft.com/documentation/articles/expressroute-configuring-nsps/ |
Configure an ExpressRoute Connection through an Exchange Provider | http://azure.microsoft.com/documentation/articles/expressroute-configuring-exps/ |
ExpressRoute Whitepaper with detailed steps for connecting via IXP model |
Establishing a connection to the public peering network allows virtual machines on Azure Virtual Networks and on-premises systems to leverage the ExpressRoute circuit to connect to Azure PaaS services on the public peering network without traversing the Internet.
Establishing a public peering connection is an optional configuration step for an ExpressRoute circuit. When the public peering connection is established, the routes for all the Azure datacenters worldwide are published to the edge router. This directs traffic to the Azure services instead of going out to the Internet.
The interface between the Azure public services and the customer's network is protected by redundant NAT firewalls. These NAT devices allow customers' systems to access the Azure public services, but they only allow stateful traffic back to the customer's networks.
This interaction is outlined in the following diagram:
Design Guidance |
When you design ExpressRoute peering, consider the following:
Capability Consideration | Capability Decision Points |
Azure Services in the datacenter | All Azure services reside within an Azure datacenter and are assigned routable IP addresses. |
Public peering services | From a design perspective, any Azure public service only sees the NAT device address. If the Azure public service provides firewall protection, only the NAT addresses can be used in the firewall rules. From a security perspective, specifying the NAT addresses will prevent connections from the Internet for the customer's instance of that public service. This also means that any system behind the NAT can access the public service, which may not be desired from a security perspective. |
ExpressRoute circuits provide a private path to route traffic to the Azure datacenter. When the traffic reaches the Azure edge device, it must leverage the software-defined routing within the Azure datacenter to isolate traffic. Currently, ExpressRoute connections from the customer's datacenter to the Azure edge can achieve up to 10 Gbps.
At that edge, virtual connections are established to the customer's private virtual network gateways to enable routing traffic. Today the maximum performance that a single virtual network gateway can provide is 2 Gbps. To optimize traffic through the ExpressRoute circuit, it may be required to leverage multiple virtual networks and gateways.
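Given the figures above (a 10 Gbps circuit ceiling and a 2 Gbps per-gateway ceiling), the number of gateways, and therefore virtual networks, needed to drive the full circuit is simple arithmetic:

```python
import math

circuit_gbps = 10   # maximum ExpressRoute circuit bandwidth to the Azure edge
gateway_gbps = 2    # maximum throughput of a single virtual network gateway

# Number of gateways (one per virtual network) required to use the full circuit.
gateways_needed = math.ceil(circuit_gbps / gateway_gbps)
print(gateways_needed)  # -> 5
```

A smaller circuit would of course need proportionally fewer gateways to saturate.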
Customers have the ability to purchase the optimal ExpressRoute circuit bandwidth to meet their throughput requirements. Circuits can be upgraded to provide additional performance with minimal impact. Circuits cannot be downgraded without impact.
Design Guidance |
When you design for ExpressRoute performance, consider the following:
Capability Consideration | Capability Decision Points |
Bursting traffic | ExpressRoute circuits allow traffic to burst up to two times the rated bandwidth of the circuit. Gateways also support this bursting capability. Gateways and circuits will drop packets if the burst limit is exceeded. |
Standard versus Premium ExpressRoute | ExpressRoute comes in two SKUs: Standard and Premium. Although the performance of the ExpressRoute circuit does not change, the number of routes, the number of virtual network connections per circuit, and the ability to route traffic across Azure regions are upgraded when using Premium. |
Gateway performance | Gateways come in three SKUs: Basic, Standard, and High Performance. The maximum speed of the gateway is a function of the SKU and affects the performance that you can achieve over an ExpressRoute circuit to a single virtual network. |
ExpressRoute connectivity and pricing is made up of two components: the service connection costs (Azure) and the authorized carrier costs (telco partner). Customers are charged by Azure for the ExpressRoute monthly access fee, and potentially an egress traffic fee based on the type and performance of the ExpressRoute connection. Customers also have costs associated with the selected provider, typically comprised of the circuit connection and monthly traffic fees.
From an Azure perspective, an NSP connection is an inclusive plan where customers are charged a monthly fee and get unlimited ingress and egress traffic. Fees associated with IXP connections include a monthly service charge and potential traffic egress charges when a high watermark of traffic is exceeded. In these cases, the customer is charged an additional fee based on the amount of egress traffic above the included amount.
Feature References | |
Azure ExpressRoute cost information | http://azure.microsoft.com/en-us/pricing/details/expressroute/ |
Recommended: When planning for network connectivity with ExpressRoute, ensure that the costs are well understood and that conversations with authorized carriers are addressed early in the planning process. |
Design Guidance |
When you design for ExpressRoute costs, consider the following:
Capability Consideration | Capability Decision Points |
Provider costs | The provider costs are much harder to ascertain because egress traffic is inclusive and additional monthly circuit fees typically apply. NSP provider costs can range from a flat monthly fee per gigabyte to a premium service that is a large flat monthly fee. Potential additional costs may include the number of MPLS circuits that have been configured. |
IXP connections | An IXP provider connection is a fiber connection, and it typically includes monthly and one-time fiber connection fees. The customer's connection typically includes the fiber from the datacenter to the provider's access point, the costs for transmitting the traffic to the consolidation point of presence, and a "last mile" fiber connection to get connected to the Azure datacenter. |
NSP and IXP management | For the NSP model, the provider typically provides and manages the provider edge routers and the configuration and management of the published routes. However, for an IXP model, the customer must provide the router that is placed at the provider access point and manage all the route publishing. The advantage of the IXP model is that the customer is typically given a rack and is allowed to place hardware in addition to the router. This allows the customer to include security hardware and other appliances. |
ExpressRoute Premium is an add-on package that allows an increase in the number of BGP routes, allows for global connectivity, and increases the number of virtual networks per ExpressRoute circuit. This add-on can be applied to Network Service Provider or Exchange Provider circuits.
Summary of ExpressRoute Premium features:
Increased number of virtual network links per ExpressRoute circuit (from 10 to a larger limit, depending on the bandwidth of the circuit).
Feature References | |
Azure ExpressRoute Premium Circuit connection information | https://azure.microsoft.com/en-us/documentation/articles/expressroute-faqs/ |
Design Guidance |
When you design for ExpressRoute Premium, consider the following:
Capability Consideration | Capability Decision Points |
Service availability and access | Although ExpressRoute Premium is available in regions such as India and Australia, to leverage the cross virtual network connectivity, you must have a business presence within the country and a local Azure billing account to establish a cross-region virtual network connection. |
Microsoft Site-to-Site (S2S) connectivity allows low-cost connections from customer locations to Azure private peering networks. S2S leverages the Internet for transport and IPsec encryption to protect the data flowing across the connection.
Requirements:
Potential Use Cases:
Feature References | |
VPN Device information | https://msdn.microsoft.com/en-us/library/azure/jj156075.aspx |
Configuring Multiple Sites Connectivity | https://msdn.microsoft.com/en-us/library/azure/dn690124.aspx |
Configure a Cross-Premises Site-to-Site connection to an Azure Virtual Network | http://azure.microsoft.com/documentation/articles/vpn-gateway-site-to-site-create/ |
Mandatory:
| |
Recommended: If automating the creation of the gateway for the S2S VPN, specify your shared key versus retrieving a shared key from Azure.
| |
Optional: Leverage the New-GUID cmdlet to generate a complex shared key |
Design Guidance |
When you design for Site-to-Site Connections, consider the following:
Capability Consideration | Capability Decision Points |
S2S VPN performance | A maximum of 200 Mbps per VPN connection at the gateway interface to the virtual network, regardless of the Internet connection speed. |
On-premises VPN device | The supported VPN device used determines the virtual network routing capability (static or dynamic routing). |
Shared keys | Shared keys are required for establishing Site-to-Site connectivity. |
Multi-site support | You must determine whether multiple on-premises sites need to access a single virtual network gateway. |
Public peering | S2S VPNs do not have access to the public peering network to connect to Azure services. |
Microsoft Point-to-Site (P2S) connectivity allows low-cost connections from customer workstations to Azure private peering networks. P2S leverages the Internet for transport and certificate-based encryption to protect the data flowing across the connection. Neither a VPN device nor a public-facing IPv4 address is required to establish a P2S VPN connection.
Requirements:
Potential use cases:
Feature References | |
Configure a Cross-Premises Point-to-Site connection to an Azure Virtual Network |
http://azure.microsoft.com/documentation/articles/vpn-gateway-point-to-site-create/ |
Mandatory: The following are required for implementation of Point-to-Site VPN connections:
|
Design Guidance |
When you design for Point-to-Site connections, consider the following:
Capability Consideration |
Capability Decision Points |
P2S VPN limitations |
There is a maximum of 128 P2S VPN connections per virtual network. At the time of writing, the client package is available for x86 and x64 Windows clients. |
Certificate requirements |
Self-signed or Enterprise Certification Authority (CA) certificates must be used |
Interoperability with ExpressRoute |
You cannot leverage P2S connections with a virtual network connected to an ExpressRoute circuit due to existing gateway limitations. |
Forced tunneling allows you to specify that the default route for one or more virtual networks is the on-premises VPN or ExpressRoute gateway. This is implemented by publishing a 0.0.0.0/0 route that points to that gateway. In effect, any packet transmitted from a virtual machine connected to the virtual network that is not destined for another IP address within the scope of the virtual network is sent to that default gateway.
When using forced tunneling, any outbound packet that is attempting to go to an Internet address will be routed to the default gateway and not to the Azure Internet interface. For a virtual machine that has a public endpoint defined that allows inbound traffic, a packet from the Internet will be able to enter the virtual machine on the defined port. A response might be sent, but the reply will not go back out the public endpoint to the Internet. Rather, it will be routed to the default gateway. If the default gateway does not have a route path to the Internet, the packets will be dropped, effectively blocking any Internet access.
Forced tunneling has different implementation requirements and scope depending on the type of Azure connectivity of the virtual network. A virtual network that is connected over a S2S VPN connection requires forced tunneling to be defined and configured on a per virtual network basis by using Azure PowerShell. A virtual network that is connected over an ExpressRoute connection requires forced tunneling to be defined at the ExpressRoute circuit, and this affects all virtual networks that are connected to that circuit.
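For the S2S case, the per-virtual-network configuration can be sketched with the classic Azure PowerShell cmdlets (the virtual network and local site names are illustrative):

```powershell
# Designate an on-premises local network site as the default route
# (0.0.0.0/0) for the virtual network; all non-local traffic from
# connected virtual machines is then forced through that site.
Set-AzureVNetGatewayDefaultSite -VNetName "MultiTier-VNet" `
                                -DefaultSite "DefaultSiteHQ"

# To remove forced tunneling from the virtual network later:
Remove-AzureVNetGatewayDefaultSite -VNetName "MultiTier-VNet"
```

For ExpressRoute, by contrast, the equivalent effect is achieved by advertising a default route over BGP on the circuit, which affects every virtual network connected to it.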
Determining how forced tunneling will be used in a design should involve the following design decisions:
It is always a good security practice to apply defense-in-depth, where additional layers of security remain in place in case one layer is compromised or inadvertently removed. Forced tunneling forces all packets back to the default gateway; however, relying on that approach alone is not a good defense-in-depth design.
If you leverage forced tunneling, there is no reason to define public endpoints for virtual machines when they are provisioned. Leaving the default public ports open provides an attack vector if forced tunneling is ever disabled.
A good design practice is to implement Network Security Groups for the subnets of every virtual network configured for forced tunneling. This allows you to have an additional layer of network protection. Understand that although you can use Network Security Groups to create rules for a virtual machine or a subnet that restricts outbound access, any co-administrator can temporarily or permanently override those rules. (Note that Network Security Group rule changes are logged.)
Forced tunneling provides the best defense-in-depth for a virtual network that is connected by ExpressRoute. A forced tunnel configuration with an ExpressRoute circuit requires a network engineer to be involved because it is implemented as a BGP routing configuration. This is not something an Azure co-administrator has the rights to configure. However, a forced tunneling configuration with a S2S VPN is something that can be performed by a co-administrator on each virtual network.
Mandatory: A design that leverages forced tunneling (default route) must typically provide Internet access via a path other than the Azure Internet interface. | |
Recommended: Combine forced tunneling with Network Security Groups to achieve defense-in-depth for traffic isolation. | |
Optional: Investigate the use of a dual network adapter edge firewall appliance with an extranet subnet as one alternative to Network Security Groups. |
Network security in Azure can present many challenges, especially to organizations that rely exclusively on network security measures for isolation. Many on-premises and IaaS deployments leverage a point-to-point firewall, rule-based approach to secure access to resources. They combine this with platform-based authentication access.
PaaS deployments present new challenges because they are designed to be driven by application and identity controls, but many organizations attempt these deployments with traditional network-based approaches.
Hybrid PaaS and IaaS deployments (where there may be IaaS or PaaS roles combined with Azure public services such as Azure SQL Database, Redis Cache, and Service Bus) are the most challenging to plan. This is because the Azure public services are multitenant, and in some cases, they cannot be connected directly to the network infrastructure owned by the customer.
Many of the public Azure services also do not have the construct of a service-level firewall that the customer can configure directly, so leveraging traditional approaches to secure those solutions with classic network approaches can be challenging.
Regulatory requirements introduce potential complications because they sometimes explicitly require (or are interpreted to require) a traditional, on-premises, point-to-point network security approach to mitigations.
Another complication with attempting to use traditional network-based security controls exclusively is that most of these controls assume the IP address is a good proxy for machine or service identity.
IP addresses are a poor proxy for identity outside of a corporate LAN that is using static assignments, particularly in a globally scaled Internet service such as Azure where IP addresses change rapidly. This typically creates significant challenges for organizations that are overly reliant on network security measures and are using static IP addresses for server and service mapping.
Review the guidance in the Microsoft Azure Security section (specifically the Containment and Segmentation Strategy) for how to design complete security containment strategies that overcome the limitations of networking controls alone.
Applying traditional security approaches to Azure networking involves the following:
Security Feature |
Description |
When to Use |
Network security group |
Access control rules that can be applied to subnets or virtual machines |
|
Forced tunneling |
Default route for a gateway that sends all non-local traffic to the customer's on-premises edge router for processing |
|
Firewall appliances – single network adapter |
Software-based firewall that can be placed between virtual machines and the Internet. Requires all traffic to be routed through the firewall, typically by using agents and IPsec. |
|
Firewall appliances – dual network adapters |
Software-based firewall that can be placed between subnets or between a subnet and the Internet |
|
IPsec |
Traffic authentication and encryption at the server level. Requires machines to be domain joined. |
|
Hardware firewall appliances at the network edge |
Placing a hardware firewall appliance at the customer's network edge |
|
Web application firewalls |
Software-based firewall that is used to control ingress traffic from the Internet. Typically a layer 7 firewall. |
|
When customers extend their datacenters to Azure or deploy an application within the Azure infrastructure, they must select an approach for access control and security, based on an access scenario. Common access scenarios include:
Note that a security-access approach might have multiple options to provide access. For example, accessing an application in Azure via the Internet can be accomplished with different security and traffic routing approaches.
Application Access Approach |
Description |
When to Use |
Direct to Azure |
Internet access is accomplished by exposing the UI tier directly on the Internet. |
|
Using the existing security solution |
Internet access could be blocked by using forced tunneling. All traffic must flow through the corporate Internet-facing security stack, be routed over the corporate backbone, and get to Azure using ExpressRoute or S2S connections. |
|
Using a provider security solution |
Internet access could be blocked by using forced tunneling. All traffic must flow through a service provider's Internet-facing security stack, be routed over the service provider's backbone, and get to Azure by using S2S or ExpressRoute connections. |
|
Using an Azure-based security solution |
Internet access is accomplished by building a security stack in Azure by using network virtual appliances. |
|
Virtual Appliances are third-party-based virtual machine solutions that can be selected from the Azure Gallery or Marketplace to provide services like network firewall, application firewall and proxy, load balancing, and logging. Appliances are licensed by:
Appliances are available in single network adapter or multiple network adapter configurations depending on the type of appliance and the required capabilities. For example, a logging appliance might only require a virtual machine with a single network adapter because all the traffic is written to the appliance. A network firewall typically requires a virtual machine with a multiple network adapter configuration that supports layer 3 routing so that the traffic has to flow through the appliance to reach its destination.
To leverage an appliance that supports layer 3 routing, the network architecture must include user defined routes that override the default implicit routes. This allows the specification of routing rules that direct traffic to the appliance network adapter, to the local virtual network, or to the on-premises environment.
The following table lists virtual appliances types and when to use them:
Virtual Appliance Type |
Description |
When to Use |
Network firewall |
Virtual appliance that leverages a virtual machine with a multiple network adapter configuration and layer 3 routing support to enable a network firewall between multiple subnets in Azure. |
|
Load balancer |
Provides layer 4 or layer 7 load balancing |
|
Security appliance |
Intrusion detection appliance |
|
User defined routing allows you to configure and assign routes that override the default implicit system routes, ExpressRoute BGP advertised routes, or the local-site network-defined routes for S2S connections. Configuring a user defined route allows the specification of next-hop definition rules that control traffic flow within a subnet, between subnets, from a subnet through an appliance to another subnet, to the Internet, and to on-premises networks.
Configuring user defined routes involves modifying the default routing table. Each entry in the routing table requires a set of information:
User defined routing is applied only to virtual machines and cloud services in Azure. Even if you place a virtual appliance and define user defined routes between on-premises networks and Azure, traffic that flows from on-premises networks into Azure is not affected by the user defined routes; it follows the system routes and bypasses the virtual appliance.
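A user defined route that directs a subnet's outbound traffic through a virtual appliance can be sketched with the classic Azure PowerShell cmdlets (the route table, virtual network, subnet names, region, and appliance address are illustrative):

```powershell
# Create a route table in the region of the virtual network.
New-AzureRouteTable -Name "FrontEndRouteTable" -Location "West US" `
                    -Label "Routes for the front-end subnet"

# Add a route that sends all Internet-bound traffic to a firewall
# appliance at an assumed address on the back-end subnet.
$rt = Get-AzureRouteTable -Name "FrontEndRouteTable"
Set-AzureRoute -RouteTable $rt -RouteName "DefaultViaAppliance" `
               -AddressPrefix "0.0.0.0/0" `
               -NextHopType VirtualAppliance `
               -NextHopIpAddress "192.168.100.4"

# Associate the route table with a subnet; a subnet can have only
# one route table, but one table can serve multiple subnets.
Set-AzureSubnetRouteTable -VirtualNetworkName "ContosoVNet" `
                          -SubnetName "FrontEnd" `
                          -RouteTableName "FrontEndRouteTable"
```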
Mandatory: To leverage user defined routing with a virtual appliance, both must be implemented together. | |
Recommended:
|
While you can have multiple route tables defined, a subnet can only have a single route table associated with it. A single route table can be associated to multiple subnets. All virtual machines and cloud services connected to a subnet are affected by the route table decisions.
Routing of traffic from a virtual machine is accomplished by using implicit system routing via a distributed router that is implemented at the virtual network level. Every packet follows a set of implicit routes that are implemented at the host level. These routes control the flow of traffic within the virtual network to on-premises networks (if enabled), and to the Internet. Traffic flow to the Internet is achieved through NAT by the host.
The following diagram shows the implicit routing rules that a virtual machine follows by default without any user defined routing.
The following rules are applied to the packet in this scenario:
When a network firewall virtual appliance is introduced to the scenario, user defined routing must be configured to control the traffic routing through the appliance. Without user defined routing, no traffic will flow through the appliance.
The following diagram shows a virtual appliance inserted into the scenario to control traffic routing to the Internet via front-end and back-end subnets in Azure:
The following rules are applied to the packet in this scenario:
Mandatory: For CSP scenarios where the provider attempts to leverage a single VPN device to connect multiple customers to Azure, user defined routing is required to maintain proper traffic separation and flow. |
Feature References | |
User Defined Routing |
https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-udr-overview/ |
A Network Security Group is a top-level object that is associated with your subscription. It can be used to control traffic to one or more virtual machine instances in your virtual network. A Network Security Group contains access control rules that allow or deny traffic to virtual machine instances. The rules of a Network Security Group can be changed at any time, and changes are applied to all associated instances.
A Network Security Group requires a regional virtual network. Network Security Groups are not compatible with virtual networks that are associated with an affinity group.
Network Security Groups are similar to firewall rules in that they provide the ability to control the inbound and outbound traffic to a subnet, a virtual machine, or virtual network adapter.
Network Security Groups allow you to define rules that specify the source IP address, source port, destination address, destination port, priority, and traffic action (Allow or Deny). The rules can be applied to inbound and outbound traffic independently.
Traditionally, a firewall rule is applied to a port on a router that is connected to a switch. It affects all traffic flowing inbound and outbound to the switch, but it does not affect any traffic within the switch. A Network Security Group rule that is applied to a subnet is more like a firewall rule that is applied at the switch and affects inbound and outbound traffic on every port in the switch.
Any virtual machine connected to the switch port would be affected by the Network Security Group rule applied to the subnet.
For example, if a Network Security Group is created and a Network Security Group rule is defined that denies inbound Remote Desktop Protocol (RDP) traffic for all addresses over port 3389, no virtual machine outside the subnet can connect via RDP to a virtual machine that is connected to the subnet, and no virtual machine connected to the subnet can connect via RDP to any other connected virtual machine.
Network Security Groups can also be applied to the virtual machine or to the network adapter of a virtual machine. This allows greater flexibility in how traffic is filtered.
Mandatory: For Ingress traffic to the VM, rules are applied at a subnet level, then VM level, and then NIC level. For Egress traffic from the VM, rules are applied at the NIC level, then VM level, and then subnet level. Rules are applied in priority order. |
To allow the virtual machines within the subnet to connect via RDP to each other, a new rule with higher priority has to be added that allows inbound traffic from the subnet CIDR on port 3389.
Description |
Priority |
Source Address |
Source Port |
Destination Address |
Destination Port |
Protocol |
Action |
Deny inbound RDP |
1010 |
* |
* |
* |
3389 |
TCP |
Deny |
Allow inbound for subnet |
1000 |
192.168.100.0/24 |
* |
* |
3389 |
TCP |
Allow |
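The two rules in the table can be sketched with the classic Azure PowerShell cmdlets (the group, virtual network, and subnet names are illustrative):

```powershell
# Create the Network Security Group in the region of the virtual network.
New-AzureNetworkSecurityGroup -Name "FrontEndNSG" -Location "West US" `
                              -Label "NSG for the front-end subnet"

# Deny inbound RDP from all sources (priority 1010).
Get-AzureNetworkSecurityGroup -Name "FrontEndNSG" |
    Set-AzureNetworkSecurityRule -Name "DenyInboundRDP" -Type Inbound `
        -Priority 1010 -Action Deny `
        -SourceAddressPrefix "*" -SourcePortRange "*" `
        -DestinationAddressPrefix "*" -DestinationPortRange "3389" `
        -Protocol TCP

# Allow RDP between machines on the subnet with a higher priority (1000).
Get-AzureNetworkSecurityGroup -Name "FrontEndNSG" |
    Set-AzureNetworkSecurityRule -Name "AllowSubnetRDP" -Type Inbound `
        -Priority 1000 -Action Allow `
        -SourceAddressPrefix "192.168.100.0/24" -SourcePortRange "*" `
        -DestinationAddressPrefix "*" -DestinationPortRange "3389" `
        -Protocol TCP

# Associate the group with the subnet so the rules take effect.
Set-AzureNetworkSecurityGroupToSubnet -Name "FrontEndNSG" `
    -VirtualNetworkName "ContosoVNet" -SubnetName "FrontEnd"
```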
Every Network Security Group is created with a set of default inbound and outbound rules that cannot be deleted, although they can be overridden by higher priority rules. User defined rules can use priority values from 100 through 4096, where 100 is the highest priority.
Default Inbound Network Security Group Rules
Description |
Priority |
Source Address |
Source Port |
Destination Address |
Destination Port |
Protocol |
Action |
Allow virtual network inbound |
65000 |
VIRTUAL_NETWORK |
* |
VIRTUAL_NETWORK |
* |
* |
Allow |
Allow Azure load balancer inbound |
65001 |
AZURE_LOADBALANCER |
* |
* |
* |
* |
Allow |
Deny all inbound |
65500 |
* |
* |
* |
* |
* |
Deny |
Default Outbound Network Security Group Rules
Description |
Priority |
Source Address |
Source Port |
Destination Address |
Destination Port |
Protocol |
Action |
Allow virtual network outbound |
65000 |
VIRTUAL_NETWORK |
* |
VIRTUAL_NETWORK |
* |
* |
Allow |
Allow Internet outbound |
65001 |
* |
* |
INTERNET |
* |
* |
Allow |
Deny all outbound |
65500 |
* |
* |
* |
* |
* |
Deny |
Subscription Limits for Network Security Groups
Object |
Service Management Subscription Limit |
Resource Management Subscription Limits |
Network Security Groups |
100 per subscription 1 Network Security Group per subnet 1 Network Security Group per virtual machine 1 Network Security Group per network adapter 1 Network Security Group can be linked to multiple subnets, virtual machines, or network adapters |
100 per region/per subscription 1 Network Security Group per subnet 1 Network Security Group per virtual machine 1 Network Security Group per network adapter 1 Network Security Group can be linked to multiple subnets, virtual machines, or network adapters |
Network Security Group rules |
100 Rules/Network Security Group* |
100 rules per Network Security Group* |
*Can be increased by Microsoft support personnel to a maximum of 400 rules per Network Security Group.
Default tags are system-provided identifiers to address a category of IP addresses. Default tags can be specified in customer-defined rules. The default tags are as follows:
Tag |
Description |
VIRTUAL_NETWORK |
This default tag denotes all of your network address space. It includes the virtual network address space (IP CIDR in Azure) and all connected on-premises address spaces (local networks). It also includes virtual network-to-virtual network address spaces. |
AZURE_LOADBALANCER |
This default tag denotes the load balancer for the Azure infrastructure. This translates to an IP address for an Azure datacenter where the health probes originate. This is needed only if the virtual machine or set of virtual machines associated with the Network Security Group is participating in a load balanced set. Note this is not the actual load balancer IP address. |
INTERNET |
This default tag denotes the IP address space that is outside the virtual network and reachable by public Internet. This range includes the public IP space that is owned by Azure. If you use this tag for outbound restrictions, you potentially will not be able to access an Azure PaaS service unless you have a higher priority rule that grants access to that service. |
Mandatory: Network Security Groups must be assigned to a subnet, virtual machine, or network adapter for any of the rules to affect traffic. | |
Recommended: For CSP scenarios, consider using network security groups to protect subnets from improperly configured CSP routing tables. |
Design Guidance |
When you are designing Network Security Groups, consider the following:
Capability Consideration |
Capability Decision Points |
Compliance |
Network Security Groups can present design challenges from a PCI or other compliance perspective because of the limited logging capabilities of the service. |
Priority numbering |
When designing Network Security Group rules for inbound or outbound scenarios, leave unused priority numbers between rules so that additional rules can be inserted later. Note that the priority values are independent for inbound and outbound rules. |
Default rules |
To override the default rules, define a rule that has a priority number in the 4000 range to allow as many additional rules as possible. |
Port numbers |
Although you can specify a contiguous range of ports in a rule definition (for example, 1024-1048), you cannot specify a set of noncontiguous port numbers (for example, 1024, 1036, 30000) in a single rule. |
Targeting |
There are limitations that affect how many Network Security Groups you can have, how many rules each Network Security Group can contain, and how they can be applied to objects. Consider the following when determining Network Security Group targets:
|
Precedence |
Consider where the Network Security Group is being deployed and whether other Network Security Groups in play at the virtual machine or subnet levels may prevent a Network Security Group applied at the virtual machine or network adapter level from functioning.
|
Endpoints allow for communication between Azure compute instances and the Internet. Endpoints can be defined in Azure so that they allow translation of a public port and IP address to a private port and private address. By default, when provisioning a virtual machine, two endpoints are automatically created:
In the portal, you can see the defined endpoints on the Endpoints tab of the virtual machine configuration.
To provision a virtual machine with no public endpoints, you have two choices:
If a cloud service has more than one virtual machine, the endpoints have to share the single public facing VIP, but they require different public ports for the port translation to be redirected to the correct virtual machine.
Endpoint design requires consideration of the security exposure that the endpoint creates. A public-facing endpoint can be used to access the provided service, but it can also be attacked.
Azure provides denial-of-service protection at the edge of its network, but this does not prevent someone from attempting to attack a public-facing port. Any public-facing port for a provided service should leverage a strong authentication mechanism to help prevent an attacker from gaining access.
Mandatory: Public endpoints are not required unless you need inbound access from the Internet. | |
Recommended: Only enable public endpoints if inbound Internet access is the only way to achieve the required communication. Use P2S, S2S, or ExpressRoute connections for RDP or PowerShell access to a virtual machine instead of a public endpoint. |
To further protect resources deployed within Azure, you can manage incoming traffic to the public port by configuring rules for the network access control list (ACL) of the endpoint. An ACL provides the ability to selectively permit or deny traffic for a virtual machine endpoint for an additional layer of security.
By using network ACLs, you can do the following:
For instructions about configuring ACLs for your Azure virtual machine endpoints, see:
The following diagram outlines the UI for creating an ACL for a public endpoint.
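An endpoint ACL can also be scripted; a sketch with the classic Azure PowerShell cmdlets (the cloud service, virtual machine, endpoint name, and remote subnet are illustrative):

```powershell
# Build an ACL that permits RDP only from a corporate address range;
# all other sources are implicitly denied once a permit rule exists.
$acl = New-AzureAclConfig
Set-AzureAclConfig -AddRule -ACL $acl -Order 0 -Action Permit `
    -RemoteSubnet "203.0.113.0/24" -Description "Corporate clients only"

# Apply the ACL to the RDP endpoint of an existing virtual machine.
Get-AzureVM -ServiceName "WebApp" -Name "WebVM01" |
    Set-AzureEndpoint -Name "RemoteDesktop" -Protocol tcp `
        -LocalPort 3389 -PublicPort 3389 -ACL $acl |
    Update-AzureVM
```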
Feature References | |
Microsoft Azure Network Security Whitepaper version 3 |
|
Security Considerations for SQL Server in Azure Virtual Machines |
https://msdn.microsoft.com/en-us/library/azure/dn133147.aspx |
Active Directory Considerations in Azure Virtual Machines and Virtual Networks Part 5 – Domains and GCs |
|
Security Considerations for Infrastructure as a Service–IaaS-Private Cloud |
Azure allocates IP addresses based on the type of object being provisioned and the options selected. In some cases, having a reserved IP address is required. The following table outlines the options for reserved IP addresses and the use cases for each type:
Type |
Description |
When to Use |
DIP |
Dynamic IP address. Internal IP address assigned by default and associated with a virtual machine. |
Always assigned |
VIP |
Virtual IP address. Assigned to a virtual machine, cloud service load balancer, or an internal load balancer. Address is private for an internal load balancer and is public for a cloud service load balancer or a virtual machine. Address is shared across all virtual machines within the same cloud service. |
Always assigned |
PIP |
Public IP address. Public instance-level IP address that can be assigned to a virtual machine. A PIP allows direct communication to a virtual machine without going through the cloud service load balancer. |
Use only when you need to communicate directly with an instance in a cloud service |
Reserved |
This is a static public-facing VIP address for a cloud service that must be specially requested. There are a limited number of these addresses per subscription. |
Use only when you need a public facing static IP address |
Internal static |
A static address allocated from the subnet address pool. Internal facing only. The number is only limited by the number of addresses assigned to the subnet address pool. This is implemented as a DHCP reservation. |
Use only when you need an internal facing static IP address |
For more information, see VIPs, DIPs and PIPs in Microsoft Azure.
This relationship is illustrated in the following diagram:
When you create an object that connects to a subnet in Azure, two IP addresses are automatically allocated to that object:
Both addresses are assigned to the single network adapter and that adapter is connected to the subnet. The internal facing DIP address is allocated from the address space pool of the subnet to which the virtual machine is attached. The public facing VIP address is allocated from the pool of Azure datacenter addresses that are assigned to the datacenter where it resides.
Azure provides dynamic allocation of IP addresses to compute resources within each subscription. Addresses are assigned starting from the first available address in the subnet pool. If a virtual machine is allocated an address and then it releases that address, the address is available for reassignment.
An IP address assigned to a virtual machine is associated with the virtual machine until the machine is in a stopped (deallocated) state or it is destroyed completely. Using the Shutdown option in the Azure portal places the virtual machine in the stopped-deallocated state, and the DHCP reservation is released. When the virtual machine is restarted, it will receive a new IP address.
Actions such as rebooting the virtual machine, shutting down from the operating system via RDP, or using the Stop-AzureVM PowerShell cmdlet with the -StayProvisioned parameter will not deallocate the IP address of the virtual machine.
Azure IP addresses that are released to the available address pool are immediately available for reassignment to a virtual machine. When Azure allocates an address, it searches sequentially from the beginning of the subnet address pool until it finds an available address, and then assigns it to the virtual machine. This assignment method is used for dynamic and static addresses from the subnet address pool.
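The difference between the two stop states can be sketched as follows (the cloud service and virtual machine names are illustrative):

```powershell
# Stops the virtual machine but keeps it provisioned: the DHCP
# reservation is retained, and the same internal IP address is
# reassigned when the machine restarts.
Stop-AzureVM -ServiceName "WebApp" -Name "WebVM01" -StayProvisioned

# Stops and deallocates the virtual machine: the internal IP address
# is released back to the subnet pool and may be reassigned to
# another machine before this one restarts.
Stop-AzureVM -ServiceName "WebApp" -Name "WebVM01"
```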
Mandatory: Every object that connects to a subnet in Azure requires a DIP (including IaaS virtual machines, internal load balancers, PaaS roles) A public-facing VIP is always assigned to a cloud service and shared by all virtual machines or PaaS roles within the cloud service. | ||
Recommended: Do not use the Azure portal to shut down a virtual machine unless you are trying to change its IP address or delete the virtual machine; otherwise, you will lose the assigned IP address. Use static IP addresses only when a dynamic address will not meet requirements, not simply because static assignment is the current on-premises approach. | ||
Feature References | ||
Stop-AzureVM cmdlet command reference |
https://msdn.microsoft.com/en-us/library/azure/dn495269.aspx |
By default, all addresses in Azure are dynamic, regardless of whether they are provisioned through the Azure portal or through PowerShell. Statically assigned IP addresses can only be requested or assigned by using Azure PowerShell. During the object creation process, a command-line option allows a static address to be specified.
There is not a way within Azure to preallocate or reserve an address prior to assignment. All address assignments are done at the time of object provisioning. To determine if an address is available to use as a static address, you can use the Azure PowerShell cmdlet Test-AzureStaticVNetIP to test whether an IP address has already been allocated from the subnet address pool.
If the address is not available, the cmdlet returns a list of addresses that are available. Note that using the Test-AzureStaticVNetIP cmdlet to determine whether an address is allocated does not guarantee that the address will still be available when you provision the object.
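This test-then-assign flow can be sketched as follows (the virtual network, address, service, and machine names are illustrative):

```powershell
# Check whether the desired address is free in the subnet pool; if it
# is not, the result includes a list of available alternatives.
$result = Test-AzureStaticVNetIP -VNetName "ContosoVNet" `
                                 -IPAddress "192.168.100.10"

if ($result.IsAvailable) {
    # Assign the address as a static (DHCP-reserved) internal IP.
    Get-AzureVM -ServiceName "WebApp" -Name "WebVM01" |
        Set-AzureStaticVNetIP -IPAddress "192.168.100.10" |
        Update-AzureVM
}
else {
    # Inspect the alternatives that Azure suggests.
    $result.AvailableAddresses
}
```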
Reserved IP addresses are static public facing VIP addresses that are typically used to provide a static IP address for a public facing application. Using a reserved IP address allows a DNS A record to be created with minimum management overhead required. It also provides a consistent IP address that can be used for point-to-point security rules in firewalls. Reserved IP addresses must be requested by using the Azure PowerShell cmdlet New-AzureReservedIP, and then given a name. The name is used by the New-AzureVM cmdlet during provisioning.
New-AzureVM -ServiceName "WebApp" -ReservedIPName "MyWebSiteIP"
-Location "West US"
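The reserved IP name referenced by New-AzureVM must exist before it can be used; a sketch of the complete flow (the names, label, and the $vm configuration variable are illustrative):

```powershell
# Request a reserved (static public) VIP in the target region. The
# name is what New-AzureVM references through -ReservedIPName.
New-AzureReservedIP -ReservedIPName "MyWebSiteIP" -Location "West US" `
                    -Label "Static VIP for the WebApp cloud service"

# Provision a virtual machine behind the reserved VIP. $vm stands in
# for a VM configuration built with New-AzureVMConfig and
# Add-AzureProvisioningConfig.
New-AzureVM -ServiceName "WebApp" -ReservedIPName "MyWebSiteIP" `
            -Location "West US" -VMs $vm
```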
There are a limited number of reserved IP addresses in a given subscription. The default is five addresses, but through a limit increase request, it may be increased to a maximum of 100.
Reserved IP addresses are scarce resources, and they should only be used when a static address is absolutely required.
Mandatory: Carefully plan and track reserved IP address usage to prevent running out of the address quota. | |
Recommended: If more than five reserved addresses are required, contact Microsoft support early to increase the reserved address quota to prevent running out and preventing deployments. Leverage reserved IP address names that can be easily associated with the service they are being used for. |
When IaaS- and PaaS-provisioned services need to resolve host names and FQDNs, they can use either Azure provided name resolution or their own DNS server, depending on the actual scenario.
Azure automatically registers a new virtual machine or PaaS role in the Azure default *.cloudapp.net DNS suffix. Storage accounts are registered in *.blob.core.windows.net. Azure Web Apps (a feature in Azure App Service) are registered in *.azurewebsites.net. It may also be desirable to have those services resolvable under a custom domain name, for example *.contoso.com.
The following table is provided to outline scenarios that are related to name resolution.
Scenario |
Name resolution provided by: |
Name resolution between role instances or virtual machines located in the same cloud service |
Azure-provided name resolution |
Name resolution between virtual machines and role instances located in the same virtual network |
Azure-provided name resolution using FQDN ~or~ Name resolution using your DNS server |
Name resolution between virtual machines and role instances located in different virtual networks |
Name resolution using your DNS server |
Cross-premises: Name resolution between role instances or virtual machines in Azure and on-premises computers |
Name resolution using your DNS server |
Reverse lookup of internal IP addresses |
Name resolution using your DNS server |
Name resolution for custom domains (such as Active Directory domains or domains that you register) |
Name resolution using your DNS server |
Name resolution between role instances located in different cloud services, not in a virtual network |
Not applicable. Connectivity between virtual machines and role instances in different cloud services is not supported outside a virtual network. |
Feature References | |
Azure Name Resolution |
https://msdn.microsoft.com/en-us/library/azure/jj156088.aspx |
Configure a custom domain name for blob data in an Azure Storage account |
http://azure.microsoft.com/en-us/documentation/articles/storage-custom-domain-name/ |
Configure a custom domain name in Azure App Service |
http://azure.microsoft.com/en-us/documentation/articles/web-sites-custom-domain-name/ |
Design Guidance |
When you are planning name resolution, consider the following:
Capability Consideration |
Capability Decision Points |
Subscription creation |
When preparing a new Azure subscription for provisioning or migrating resources, configure DNS servers at the subscription level, and then assign them at the virtual network level so that the Azure DHCP service hands out those DNS servers for name resolution. |
Service limits |
A maximum of 10 custom DNS servers can be configured per subscription. |
Azure DNS is a global scale DNS service for hosting tenant DNS domains and providing name resolution by using Microsoft Azure infrastructure. Azure DNS has been tuned to be a highly available DNS service with fast query response times. Azure DNS provides updates of DNS records and global distribution.
By hosting domains in Azure DNS, tenant DNS records can be managed by using the same credentials, APIs, tools, and billing as other Azure services.
Mandatory: Automation scripts must be created to automate the creation and update of Azure DNS domains and records. |
Azure DNS domains are hosted on the Azure global network of DNS name servers. Azure uses Anycast networking, so that each DNS query is answered by the closest available DNS Server. This provides fast performance and high availability for your domain.
Mandatory: Azure DNS does not currently support purchasing domain names. Tenants purchase domains from a third-party domain name registrar, who typically charges an annual fee. These purchased domains can then be hosted in Azure DNS to manage DNS records. For more information, see Delegate a Domain to Azure DNS. | |
Mandatory: Azure DNS does not currently support CNAME records at the root (apex) of the domain. |
To create the domains and domain records within Azure DNS, you can use Azure PowerShell, Azure CLI, REST APIs, or the SDK.
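For example, using the Azure Resource Manager PowerShell cmdlets (the resource group, zone, record name, and IP address here are illustrative):

```powershell
# Create a DNS zone in an existing resource group
New-AzureRmDnsZone -Name "contoso.com" -ResourceGroupName "MyResourceGroup"

# Create an A record set with a single record and a one-hour TTL
New-AzureRmDnsRecordSet -Name "www" -RecordType A -ZoneName "contoso.com" `
    -ResourceGroupName "MyResourceGroup" -Ttl 3600 `
    -DnsRecords (New-AzureRmDnsRecordConfig -Ipv4Address "203.0.113.10")
```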
ETags are used to manage concurrency in a highly distributed DNS infrastructure where changes could be implemented at any location that has access to Azure. Azure DNS uses ETags to handle concurrent changes to the same resource safely.
Each DNS resource (zone or record set) has an ETag associated with it. Whenever a resource is retrieved, its ETag is also retrieved. When updating a resource, you have the option to pass back the ETag so Azure DNS can verify that the ETag on the server matches.
Because each update to a resource results in the ETag being regenerated, an ETag mismatch indicates that a concurrent change has occurred. ETags are also used when creating a new resource to ensure that the resource does not already exist.
By default, Azure PowerShell uses ETags to block concurrent changes to DNS zones and record sets. The optional –Overwrite switch can be used to suppress ETag checks, in which case any concurrent changes that have occurred are overwritten.
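A sketch of how the ETag check surfaces in PowerShell, using illustrative zone and resource group names:

```powershell
# Retrieve the record set; its current ETag comes back with it
$rs = Get-AzureRmDnsRecordSet -Name "www" -RecordType A -ZoneName "contoso.com" -ResourceGroupName "MyResourceGroup"
Add-AzureRmDnsRecordConfig -RecordSet $rs -Ipv4Address "203.0.113.11"

# Fails if the record set changed on the server after it was retrieved
Set-AzureRmDnsRecordSet -RecordSet $rs

# Suppresses the ETag check and overwrites any concurrent changes
Set-AzureRmDnsRecordSet -RecordSet $rs -Overwrite
```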
Tags are different from ETags. Tags are name-value pairs used by Azure Resource Manager to label resources for billing or grouping purposes. For more information about Tags, see Using tags to organize your Azure resources.
Azure PowerShell supports Tags for zones and record sets. Tags are specified using the –Tag parameter:
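A sketch (the exact –Tag syntax has varied across Azure PowerShell releases; early releases expected an array of Name/Value hash tables, as shown here with illustrative names):

```powershell
# Create a zone labeled with tags for billing or grouping
New-AzureRmDnsZone -Name "contoso.com" -ResourceGroupName "MyResourceGroup" `
    -Tag @( @{ Name = "project"; Value = "demo" }, @{ Name = "dept"; Value = "finance" } )
```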
There are several mechanisms that provide load balancing capabilities within Azure. The following table outlines these features and their potential use in Azure designs:
Type |
Description |
When to Use |
External load balancer |
A software load balancer that is automatically created when a cloud service is created. It is Internet facing only. It has a single Internet-facing VIP by default, but additional Internet-facing VIPs can be added. VIP addresses are dynamically assigned from the Azure public datacenter address pool by default, but they can be assigned a reserved static address. |
Use external load balancers to provide Internet-facing load balancing capabilities for the UI tier of an application. The remaining tiers of the application should use internal load balancers if required. |
Load balanced sets |
A way to combine multiple virtual machines or PaaS roles from a single cloud service into a group that is associated with a port of the load balancer. |
Use load balanced sets when you need to use a single VIP with multiple load balanced applications in a single cloud service. |
Internal load balancer |
A software load balancer that is internal facing only. It has a single VIP that is allocated from the local subnet address pool. |
Use an internal load balancer when an application needs load balancing but should not be Internet-facing, for example, the second and third tiers of a three-tier application. |
Traffic manager |
A public-facing load balancer that is designed to support cross datacenter balancing of loads and geolocation optimization so the user is sent to the closest datacenter. |
Typically used to load balance two cloud services in separate datacenters to provide geolocation optimization. |
Feature References | |
Azure Load Balancer |
http://azure.microsoft.com/documentation/articles/load-balancer-overview |
Azure Traffic Manager Overview |
https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-overview/ |
Azure Traffic Manager Load Balancing Methods |
https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-load-balancing-methods/ |
About Traffic Manager Monitoring |
https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-monitoring/ |
Internal Load Balancer |
http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview |
Configure Load Balanced Sets |
http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview |
Configure an internal load balanced set |
http://azure.microsoft.com/documentation/articles/load-balancer-internal-getstarted |
Azure Internal Load Balancer SQL Always-On |
https://azure.microsoft.com/en-us/documentation/articles/load-balancer-configure-sqlao/ |
Choosing a name for any asset in Microsoft Azure is important because:
This table covers the naming requirements for various elements of Azure networking.
Item |
Length |
Casing |
Valid characters |
Virtual network |
Case-insensitive |
Alphanumeric and hyphen; cannot start with a space or end with a hyphen |
Subnet |
Case-insensitive |
Alphanumeric, underscore, and hyphen; must be unique within a virtual network |
Network Security Group |
Case-insensitive |
Alphanumeric and hyphen | |
Network Security Group rule |
Case-insensitive |
Alphanumeric and hyphen | |
AT&T VLAN name |
15 |
Case-insensitive |
Alphanumeric and hyphen |
Identity and Access Management is a daunting space in technology. As trends come and go and Internet threats mature, identity and access management solutions must constantly evolve.
A few years ago, a framework for building identity solutions emerged. It's called "The Four Pillars of Identity." Many organizations have adopted this framework to forge their identity strategy at macro and micro levels.
The Four Pillars of Identity are areas that identity solutions must address to be successful:
For more information about the Four Pillars of Identity, please read the whitepaper titled The Four Pillars of Identity – Identity Management in the Age of Hybrid IT.
Several options for leveraging identity and access management solutions exist when working with Azure. Most often, it's helpful to distinguish between two audiences when determining a solution: the developer and the IT pro.
The Developer Audience
For developers, the most important thing with regard to identity is integrating their applications with the organization's preferred identity and access management platform. In the past, many developers didn't have a good grasp of how to integrate applications with enterprise identity and access management platforms, so they often took on the task of managing identities and access within the application itself.
This places a lot of burden on the developers, because they have to take on all the work of each of the four identity pillars described previously. This means that they have to provide a place to store identities and provide a way for users to change identity data, manage credentials, deactivate their access, request new access, and so on.
Developers would also have to securely authenticate users, manage entitlements that authorize users to various resources in the application, and maintain audit trails of authentication and access events.
Even if a single development team can do this well for a given application, organizations typically have hundreds of applications in use. The result is that users have multiple identities sprawled throughout the organization, with each application operating independently with regard to identity and access management.
The IT Professional Audience
IT pros are under pressure from the organization to facilitate the adoption of cloud services by extending the traditional identity and access management enterprise into the cloud. Without this important integration, cloud services such as Azure become virtually unusable.
When there is no integration between the cloud and an organization's on-premises identity and access management platform, users have multiple identities with different credentials and different access rights to an organization's data. Not only is this a bad experience for end users, but it makes it impossible to manage access to all of an organization's applications and resources. The issue gets worse when non-Microsoft clouds are introduced into the equation.
Another important concept is that identity becomes the "control plane" for the cloud. In the past, an organization could keep sensitive information on-premises and put up firewalls and extranets to protect it and keep potential malicious users out.
This becomes much more difficult in a cloud-connected world. The network edge is being pushed outward and becoming more diffuse, while users on mobile devices access both on-premises applications inside the organization's network and cloud services provided by the organization.
Organizations can no longer depend on firewalls to keep out potential attackers because those firewalls also keep out the people who require access to resources. Because of this, the identity and access management platform is the primary means of protecting an organization's applications and data in the cloud-connected world.
Azure Active Directory (Azure AD) interacts with the cloud in two ways:
IT professionals will mostly be concerned with Azure AD as an enabler of the cloud because they are often tasked with integrating the enterprise identity and access management platform into the cloud.
On the other hand, developers will mostly be concerned with the identity services that Azure AD provides as a consumer of the cloud. Most often, they are looking to understand how their applications can leverage the cloud identity service.
Enabler of the Cloud
Azure AD plays a pivotal role in enabling the cloud. To use Microsoft cloud services, such as Office 365, the cloud services must:
Rather than having each cloud service keep its own identity repository, all Microsoft cloud services use Azure AD. The capabilities of the Microsoft cloud cannot be enabled without it.
After identities are populated in the Microsoft cloud, Azure AD becomes an identity and access management hub that enables other clouds. Azure AD can facilitate access to an organization's custom applications regardless of whether they are on-premises or hosted in the cloud, in addition to other Software-as-a-Service applications that do not reside in the Microsoft cloud.
Consumer of the Cloud
Azure AD is a single, multitenant directory that contains over 200 million active identities and serves billions of authentication requests each day. A cloud-scale identity service like this can only be built by using the scale and breadth of the cloud. In addition, Azure AD has features that rely on cloud services, such as Azure Multi-Factor Authentication and machine learning technologies. In this way, Azure AD consumes the cloud to provide its services.
Azure AD becomes a cloud service that can be consumed by other applications and services. Application programming interfaces (APIs) and endpoints are exposed so that developers can use Azure AD to store and retrieve their identity data, and they can depend on Azure AD to authenticate users to their applications.
IT professionals can use the cloud Identity Management-as-a-Service features, such as Self-service password reset, to enable new identity management capabilities that traditionally took months to deploy on-premises.
Feature References | |
Azure Active Directory |
http://azure.microsoft.com/en-us/documentation/services/active-directory/ |
What is Azure Active Directory |
http://azure.microsoft.com/en-us/documentation/articles/active-directory-whatis |
The existence of an Azure AD directory is a requirement for an Azure subscription. Therefore, each Azure tenant has at least one directory associated with it. This directory is used for signing in to and accessing the Azure portal, Office 365, and other Microsoft cloud services.
Additionally, Azure tenants can have multiple directories. These directories are separate and unique. For example, if two Azure AD directories exist in the same tenant, they have their own set of administrators and there is no data shared between them. Administrators of one directory in the tenant do not have access to another directory in the same tenant, unless they are explicitly granted access to it.
Mandatory: There must be at least one directory in the tenant. You do not have a choice. All tenants created within Azure are assigned a default directory if one doesn't exist. |
How Many Directories Should a Customer Have?
Most tenants should have at least two directories—one for the production users using the cloud services that are integrated with Azure AD, and another directory for testing.
If a customer has software development teams, it is possible that those teams might need Azure AD directories that they can use for developing applications. The following criteria should be used to determine if separate development directories are needed:
Optional: Software development teams might want their own Azure AD directories in the tenant. |
Some complex government organizations look like a single entity on paper; but in reality, they are multiple, independently run organizations. The question of whether to have a single tenant or multiple tenants is a very important discussion to have with these customers before they get locked into a model that doesn't work for them.
There is no definitive answer for every situation. Rather, this must be addressed on a case-by-case basis. The following criteria should help you understand how to guide customers.
Considerations for a cross-organizational directory:
Considerations for unique organizational directories:
The topic of a cross-organizational directory is important to discuss with commercial customers who often buy and sell other companies. The following criteria can be used to help you determine if the customer should have a cross-organizational directory.
Considerations for a cross-organizational directory:
Considerations for unique organizational directories:
When a directory is created, the default name of the directory is <something>.onmicrosoft.com. The <something> is chosen by the directory administrator during the creation of the directory. Usually, customers want to use their own domain name, such as contoso.com. This can be achieved by using a custom domain name.
Recommended: Add a customer's public-facing DNS name as a custom domain name for the production Azure AD directory. Otherwise, users will sign in with accounts such as bob@contoso.onmicrosoft.com instead of bob@contoso.com. |
Multiple custom domain names can be added to each Azure AD directory, but a custom domain name can only be used in one Azure AD directory. For example, if there are two Azure AD directories in the tenant, and the first directory assigns the custom domain name of contoso.com, the second directory cannot use that name.
Mandatory: Custom domain names must be publicly registered with an Internet domain name registrar, and the customer must be able to modify the public DNS records to prove ownership of the domain. |
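Domain verification is typically driven through the MSOnline PowerShell cmdlets. A hedged sketch with an illustrative domain name:

```powershell
# Add the custom domain to the directory (unverified at this point)
New-MsolDomain -Name "contoso.com"

# Retrieve the TXT record that must be created in the public DNS zone
Get-MsolDomainVerificationDns -DomainName "contoso.com" -Mode DnsTxtRecord

# After the TXT record is published, confirm ownership
Confirm-MsolDomain -DomainName "contoso.com"
```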
Feature References | ||
Add your custom domain to the Azure AD tenant |
https://msdn.microsoft.com/en-us/library/azure/hh969247.aspx |
When you create a new Azure AD tenant, the contents of the directory will be managed independently from the on-premises Active Directory forest. This means that when a new user comes in to the organization, an administrator must create an on-premises Active Directory account and an Azure Active Directory account for the employee. Because these two accounts are separate by default, they also may have different user names and passwords, and they need to be managed separately.
However, an organization can use Azure AD Connect to connect the on-premises Active Directory to Azure AD. When this is in place, users that are added or removed from the on-premises Active Directory are automatically added to Azure AD. The user names and passwords are also kept synchronized between the two directories, so end users do not have different credentials for cloud and on-premises systems.
AD FS can be used to add an identity federation trust between on-premises Active Directory and Azure AD, which enables the users in the organization to have a single sign-on experience. We call this scenario the "identity bridge" because it bridges the on-premises identity systems with the cloud, thereby enabling a single identity service for the enterprise.
Recommended: Unless you have a cloud-only company (with no on-premises systems), you should incorporate this integration. Even if you are not using Azure AD, you will have a better experience with Azure and the other Microsoft cloud services that you may subscribe to. |
The goal for synchronization of identities is to extend the on-premises Active Directory into Azure AD. After synchronization is in place, Active Directory and Azure AD should be viewed as a single identity service with on-premises and cloud components, instead of two separate identity services.
In most cases, managing the identities (such as on-boarding, off-boarding, and entitlement changes) still occurs on-premises by using identity management solutions that were specifically created for these scenarios.
This is depicted in the "On-Prem" box in the following diagram. These systems are usually going to be different than the identity bridge systems that connect the on-premises Active Directory to Azure AD.
Historically, there have been four tools available to do the job of the identity bridge, which has caused a lot of confusion. Therefore, we released a single tool that can be used for everything except the most complex of scenarios.
When deciding on which synchronization tool to use, the choice should be between using Azure AD Connect or the Microsoft Identity Manager Synchronization Services with the Azure AD Connector. This is summarized in the following diagram.
In general, the default stance should be to use Azure AD Connect, unless the scenario is extremely complex, requiring a lot of customization. Some key features of Azure AD are lost (such as password synchronization and write-back) when using Identity Manager, so it should only be used as a fallback option if absolutely necessary.
Many customers do not have simple single-forest Active Directory environments, and dealing with multiple forests can be a challenge when integrating with Azure AD. Typically, customers fall in two scenarios:
Single Forest with Multiple Domains
Some customers have a single forest environment with multiple domains. Azure AD Connect natively handles this scenario when the following conditions need to be met:
Account and Resource Forest Model
When a customer has an account and resource forest model, there is a dedicated forest where all of the user identities reside (the account forest) and a dedicated forest for some or all of the applications (the resource forest). A one-way trust (often a forest trust) is in place so that the resource forest trusts the account forest. This relationship is depicted in the following diagram.
This is most commonly seen with complex Exchange Server deployments. Often, there needs to be a representation of the user in the resource forest's Active Directory for the application to use. This is sometimes referred to as a shadow account. In most cases, it's a duplicate of the user's account from the account forest, but in a disabled state, which prevents users from signing in to it.
Azure AD Connect natively handles this scenario. If the resource forest contains data that needs to be added to Azure AD (such as mailbox information for an Exchange user), the synchronization engine detects the presence of disabled accounts with linked mailboxes. The appropriate data is then contributed to the Azure AD user account.
Multiple Forests with Unique Users
In this scenario, there are multiple independent forests in the environment, which may or may not have Active Directory trust relationships between them. This situation will be encountered in highly segmented organizations or companies that acquire other companies via mergers and acquisitions. The following diagram depicts what this architecture might look like.
Users in this scenario have only a single account in one of the forests (they do not have multiple user accounts across forests). Because of this, you do not need the synchronization tool to match a user to multiple accounts.
However, one decision that needs to be made is whether the accounts will be migrated into a single forest at some point. This is an important thing to consider, because it will determine whether you can use the objectGUID of the user accounts as the source anchor (which is used to match the Active Directory accounts to the Azure AD accounts).
If the users will be migrated to a single forest at some point, you'll need to use a different source anchor, such as the user's email address or UPN. The reason is that the objectGUID can't be migrated with the user. After migration, there would be multiple accounts in Azure AD for migrated users—one for the old forest and another for the new forest.
Mandatory: If users from the additional forests will be migrated into a single forest in the future, you must choose something other than the objectGUID as the source anchor attribute (such as the mail attribute). |
Multiple Forests with Duplicate Users
This scenario is the same as the previous scenario (multiple forests with unique users) with the exception that a single user has multiple user accounts in different forests in the environment. These accounts are either:
Even though there are multiple user accounts in the organization, there should be only a single account for the user in Azure AD. To enable this, the synchronization service needs to be able to match user accounts across the forests to a single person. For this to happen, the accounts in each forest need to have an attribute that contains the same, unique value for a user.
Mandatory: If a single person has multiple user accounts in different forests, you must choose a common attribute to match the accounts together. |
The User Principal Name (UPN) is the attribute in Azure AD that is used for a user's sign-in name. By default, this is sourced from the on-premises Active Directory directory by using the userPrincipalName attribute for the user account. Because of legacy guidance, some customers' AD forests use non-routable UPN suffixes or UPN suffixes that are different from the public-facing DNS name of the organization.
For example, the UPN suffix in Active Directory might be @contoso.local, while the public facing DNS name is @contoso.com. In this situation, the Active Directory users have a log-in name similar to bob@contoso.local, rather than bob@contoso.com.
Azure AD requires that the UPN suffix be a valid public domain name that is registered with an Internet name registrar. This is to ensure that it's unique across all Azure AD tenants and that only one organization owns the domain name. When the tenant is federated with an on-premises identity provider, the UPN suffix is used to determine where to redirect the user for authentication.
Customers that have a UPN suffix that is not routable or not desirable for the user logon name have two options:
UPN Rationalization
UPN rationalization entails adding a new UPN suffix to the Active Directory forest, and then changing the UPN suffix of every account to match. This is the preferred approach for UPN alignment because it provides the best experience for users after the alignment is complete. However, UPN rationalization has its challenges.
Applications that are Dependent on the UPN Attribute
It is possible that some of a customer's applications use the UPN to store data about users in the application. If this is the case, changing the UPN in Active Directory would break those applications. The risk associated with performing a UPN rationalization exercise increases with the size of the organization.
For smaller customers with a well-defined set of applications, it's easier to determine if changing the UPN suffix will impact any of the applications in use. However, for larger organizations, it is nearly impossible to gauge the impact. In that situation, it is best to pick a sample of users that is representative of all of the business groups in the organization, and first test the change with their accounts.
Mandatory: If user certificates use the UPN in the Subject Name field, the certificates need to be reissued during the UPN rationalization. | |
Recommended: Before rationalizing UPNs, build a catalog of the applications that depend on the UPN attribute of users, and test the new UPN with users of those applications. |
User Certificates Issued with UPN as the Subject Name
Another big challenge when changing UPNs is that some organizations issue x.509 certificates that have the UPN value in the Subject Name field of the certificate. The impact varies with each customer because the certificates could be used for authentication or for signing or encrypting data, such as when sending email messages.
The data in a certificate cannot simply be changed because the certificates are digitally signed by the Certification Authority that issues them. If the data is changed, the signature is broken and the certificate is no longer valid. Therefore, the certificates must be reissued when the UPN is changed.
The process of obtaining new certificates varies between customers, so if there are certificates that rely on the UPN attribute, it's important to understand the process that the customer uses for reissuing those certificates. In some cases, this may mean provisioning a new "soft" certificate (a certificate with a private key that resides on the computer, rather than a hardware device) to the user's machine. Or it may require that the user write the new certificate to their smartcard.
Recommended: Understand the process used for reissuing user certificates, so that you can adequately communicate with users and prepare for a massive reissuance event, if needed. |
If an identity management solution, such as Microsoft Identity Manager, is in place, there is likely a dependency on the UPN attribute; the identity management system is probably managing the value of the UPN attribute for users. If the UPN is changed on the user account in Active Directory, the identity management system would set it back to the old value (which it deems authoritative).
Depending on the configuration of the identity management system, it is also possible that the UPN attribute is being used as an anchor for joining identities in different systems to the identities in Active Directory. Therefore, changing the UPN without updating the identity management system could result in identities being disassociated in the connected systems. At best, this would cause the identities to stop synchronizing to those systems. At worst, the identities would be deleted from the target system.
Recommended: Spend some time understanding the identity management systems that are used for managing Active Directory within the organization to ensure that there isn't a dependency on the UPN of user accounts. |
Alternate Login ID
The Alternate Login ID is a way to achieve UPN alignment without having to modify the UPN attribute of user accounts in Active Directory. When using the Alternate Login ID, an Active Directory attribute other than userPrincipalName is selected to feed the UPN of Azure AD. This can be any unique, indexed attribute that uses the user@domain.com format. The impact to users is much less than changing their UPNs.
Although the Alternate Login ID can help in some situations, it should not be the default solution because it has some drawbacks, including:
Due to these issues, it is recommended that Alternate Login ID be used as a secondary option only when UPN rationalization is not possible with a customer.
It is not possible to have a high availability design for the server hosting the Azure AD Connect service. By default, the synchronization server runs the synchronization job to Azure AD every three hours by using a scheduled task on the server. This interval can be decreased, if needed. High availability for the Azure AD Connect server should not be necessary in most situations because synchronization is not a continuous event.
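The schedule does not prevent an administrator from forcing a synchronization on demand. Depending on the Azure AD Connect build, this is done either by starting the scheduled task directly or, in later builds, with the ADSync cmdlets, as sketched here:

```powershell
# Later Azure AD Connect builds expose a sync-cycle cmdlet on the sync server
Import-Module ADSync
Start-ADSyncSyncCycle -PolicyType Delta   # or -PolicyType Initial for a full sync
```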
In the event of a catastrophic failure, a new Azure AD Connect server can be built and synchronized in a couple of hours for a medium-sized business. Larger businesses with more than 100,000 users take longer to synchronize. If a faster recovery time is needed, Azure AD Connect can be configured to use a dedicated SQL Server deployment with high availability.
Consider a dedicated SQL Server environment in the following scenarios:
Optional: A dedicated SQL Server instance can be used to provide better performance and high availability options for the Azure AD Connect synchronization service. |
With password hash synchronization, the Azure AD Connect service will synchronize one-way SHA256 hashes of Active Directory password hashes into Azure AD. This allows a user that signs into Azure AD to use the same password that is used to sign in to the on-premises Active Directory.
Even though the default synchronization frequency for Azure AD Connect is every three hours, password hash synchronization occurs every two minutes, allowing users who change their passwords in on-premises Active Directory to begin using their new password in Azure AD almost immediately.
When you enable password hash synchronization, it applies to all users that are being synchronized to Azure AD. This means that you cannot pick and choose which users' password hashes get synchronized. The only way to prevent a user's password hash from being synchronized to Azure AD is by filtering out the user in the synchronization policies, thereby removing their account from Azure AD.
Even if you are using federated authentication for Azure AD, it is still often recommended to enable password hash synchronization. This allows password-based sign-in to be used as a fallback if the customer's on-premises AD FS instance goes down.
If a user's password is already synchronized to Azure AD, enabling password-based sign in is as simple as running a PowerShell script. Users can safely be switched back to federated authentication after the problem is resolved and AD FS is back online.
Recommended: Even if all of a customer's users are signing in to Azure AD with AD FS, it is recommended to enable password synchronization. Doing so provides a good fall-back method for user authentication if AD FS goes offline. |
After user accounts are synchronized from the on-premises Active Directory to the Azure AD tenant, users can sign in to the accounts and access applications that are integrated with Azure AD, such as Office 365. There are two options for signing in users to Azure AD:
Authenticating to Azure AD
The user object in Azure AD is separate from the object in the on-premises Active Directory. Because of this, the Azure AD object has its own user name and password. Unless password hash synchronization is enabled in Azure AD Connect, users will have different passwords for Active Directory and Azure AD.
This can confuse users and lead to a poor cloud experience. Therefore, it's recommended to enable password hash synchronization unless there is a specific reason that the customer doesn't want it enabled.
Recommended: Enable password hash synchronization so that the Azure AD password for users is the same as the on-premises Active Directory password. |
Authenticating to an On-Premises Identity Provider
Azure AD supports the ability to establish an identity federation trust with an on-premises identity provider (IdP), such as Active Directory Federation Services (AD FS). This enables users to have a desktop single sign-on experience when accessing resources that are integrated with Azure AD.
With this experience, an end user would sign in to a domain-joined workstation and not be prompted again for a password throughout the entire session, regardless of which applications are used.
When a federation trust is in place, Azure AD defers to the on-premises identity provider to collect the user's credentials and perform the authentication. After authenticating the user, the on-premises identity provider creates a signed security token to serve as proof that the user was successfully authenticated.
This security token may also contain data about the user (called claims), which can then be provided to Azure AD for various purposes. The security token is given to Azure AD, which then verifies the signature on the token and uses it to provide access to the applications. The following diagram illustrates this behavior:
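The trust described above reduces to signature verification: Azure AD holds the identity provider's public signing key (exchanged through federation metadata) and accepts only tokens whose signature validates. The sketch below illustrates that sign-and-verify exchange; it uses a shared HMAC key purely for brevity, whereas AD FS actually signs with the private key of an X.509 token-signing certificate, and all names here are illustrative:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"illustrative-key"  # stands in for the token-signing certificate

def issue_token(claims: dict) -> dict:
    # The on-premises identity provider serializes the claims and signs them.
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def verify_token(token: dict):
    # Azure AD recomputes the signature; a mismatch means the token was
    # forged or tampered with, and access is denied (None returned here).
    expected = hmac.new(SIGNING_KEY, token["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return None
    return json.loads(token["payload"])
```

The claims carried in the payload (such as the user's UPN or the authentication method used) are what Azure AD consumes after the signature check succeeds.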
Domain Names
When enabling a federated identity relationship between Azure AD and an on-premises identity provider, an entire domain name in Azure AD is converted from a standard domain to a federated domain. This impacts all of the users that have UPNs under the domain name. You cannot have a mix of federated and non-federated users in a domain name.
Note: You cannot convert the default <tenant>.onmicrosoft.com domain name to a federated domain name. Only custom domain names added to Azure AD can be federated.
Any subdomains under a domain namespace will have the same configuration as the parent domain. For example, if the custom domain name contoso.com is configured as a federated domain, child.contoso.com will also be a federated domain. Azure AD applies this behavior automatically, and it cannot be overridden.
Recommended: Federated domains can be converted back to standard domains at any time. Using this option in conjunction with password synchronization can provide a great fall-back strategy if the customer's identity provider goes down for a period of time. |
After the domain name is converted to federated, all users who attempt to sign in to Azure AD with a UPN from the converted domain (or one of its child domains) will be redirected to the on-premises identity provider for authentication. If the user does not have a valid account in the on-premises identity provider, the user will not be able to authenticate to Azure AD or to any of the connected applications.
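The routing decision Azure AD makes at sign-in, including the automatic inheritance of federation by child domains, can be sketched as follows (domain names and return strings are illustrative):

```python
FEDERATED_DOMAINS = {"contoso.com"}  # custom domains converted to federated

def sign_in_route(upn: str) -> str:
    """Return where a user is sent to authenticate, based on UPN suffix."""
    domain = upn.rsplit("@", 1)[1].lower()
    labels = domain.split(".")
    # A child domain (e.g. child.contoso.com) inherits the parent
    # domain's federation setting; this cannot be overridden.
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in FEDERATED_DOMAINS:
            return "on-premises identity provider"
    return "Azure AD password prompt"
```

A user with a UPN under a federated domain (or any of its child domains) is always redirected on-premises; users under standard domains authenticate directly against Azure AD.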
Federating Multiple Domains
If there are multiple custom domain names in an Azure AD tenant that need to be federated, Azure AD can be configured to redirect users to a single identity provider or to multiple identity providers. The following criteria should be used to determine whether to use a single or multiple identity providers.
Use a single identity provider if:
Use multiple identity providers if:
Recommended: Use a single identity provider for the organization, if possible. Otherwise, you will have to manage multiple federation service instances on-premises. |
Multi-Factor Authentication
Multi-factor authentication (MFA) adds a secondary authentication prompt for users when they sign in to an application that is integrated with Azure AD. This secondary authentication takes the form of something other than a password prompt. Azure AD uses Azure Multi-Factor Authentication, which performs MFA by using a phone call, a text message, or an authentication request to a smart phone application.
Depending on the on-premises identity provider, additional third-party MFA providers can be used in addition to Azure Multi-Factor Authentication. These providers offer MFA methods that Azure Multi-Factor Authentication does not support, such as the use of OATH tokens.
Most identity providers support the use of some form of multi-factor authentication when authenticating a user on-premises. With Azure AD, you have the choice of having the identity provider perform multi-factor authentication or have Azure AD perform it with Azure Multi-Factor Authentication.
Performing MFA Through an Identity Provider
When an identity provider performs the MFA, you must configure the identity provider to use Azure Multi-Factor Authentication or a third-party multi-factor authentication service.
If the identity provider performs multi-factor authentication for a user, it must pass a special claim in the security token that it sends to Azure AD to indicate to Azure AD that MFA was performed on-premises. When this claim is passed into Azure AD, Azure AD bypasses its own prompt for MFA. For more information, see Getting started with Azure Multi-Factor Authentication and Active Directory Federation Services.
Perform MFA through the on-premises identity provider if any of the following conditions exist:
Performing MFA with Azure AD
If MFA is performed with Azure AD, the on-premises identity provider is not configured to perform MFA at all. Instead, MFA is activated globally for the users in the Azure AD tenant, or it is activated for a specific application by using a conditional access rule in Azure AD.
After authentication succeeds at the on-premises identity provider, the identity provider creates a security token and sends the user's identification to Azure AD with the security token. Azure AD receives and verifies the token, and then performs the MFA process as a secondary authentication step.
Perform MFA with Azure AD if any of the following conditions exist:
Optional: Multi-factor authentication is an optional service that can increase the security of Azure AD. To use Azure Multi-Factor Authentication with the on-premises identity provider, an Azure AD Premium license is required. |
Multiple Forest Configurations
When dealing with an environment that has multiple Active Directory forests, the main consideration is whether a single identity provider is needed, or if you need multiple identity providers. In general, the answer is going to depend on what product is used as the on-premises identity provider, and how it supports authentication across Active Directory forests.
Regardless of the vendor, however, the following key tenets of federation remain true with Azure AD:
Using AD FS
AD FS can support a multiple forest configuration, but only if all the forests have two-way Active Directory trust relationships between them. If there are no forest trusts between the multiple Active Directory Domain Services (AD DS) forests, you must have multiple AD FS deployments (one for each forest that is untrusted).
Mandatory: If using AD FS in a multiple forest configuration with no forest trusts between Active Directory forests, you must have multiple deployments of AD FS (one for each untrusted forest). |
If possible, we recommend that customers have trusts between their multiple Active Directory forests, so that only a single AD FS farm is needed for Azure AD. If this is the case, the UPN suffixes need to be unique across each forest, otherwise the domain controllers will not be able to properly route the UPN suffixes to the correct forest for cross-forest authentication. For example, you can't have two domains that contain users with an @contoso.com UPN suffix.
Recommended: If possible, we recommend that you have trusts between each Active Directory forest and use a single AD FS instance with Azure AD. This simplifies the architecture and prevents you from having to manage multiple AD FS farms. | |
Mandatory: If AD FS is used in a multiple forest configuration with trusts between the Active Directory forests, the UPN suffixes for each domain must be unique. |
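The uniqueness requirement above implies a simple pre-flight check before consolidating on a single AD FS farm: verify that no UPN suffix is registered in more than one forest. A sketch (forest and suffix names are examples):

```python
def conflicting_upn_suffixes(forests: dict) -> set:
    """Return UPN suffixes claimed by more than one trusted forest.

    forests maps a forest name to the set of UPN suffixes it hosts.
    """
    owner = {}
    conflicts = set()
    for forest, suffixes in forests.items():
        for suffix in suffixes:
            if suffix in owner:
                conflicts.add(suffix)  # cross-forest UPN routing would break
            owner[suffix] = forest
    return conflicts
```

Any suffix this check flags must be resolved (by rationalizing UPNs in one of the forests) before a single AD FS instance can route authentication correctly.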
Using a Third-Party Identity Provider
If a customer is using a third-party identity provider, the level of multiple forest support is going to depend on what the identity provider is capable of. For example, the "Optimal IDM Virtual Identity Server Federation Service" identity provider can be implemented with a single deployment whether or not Active Directory forest trusts are in place across the multiple domains.
You'll want to check with the vendor of the identity provider that the customer wants to use to find out their multiple forest support capabilities.
Using AD FS as the Azure AD Identity Provider
Most commonly, AD FS is used as the identity provider for Azure AD. Designing and deploying an AD FS farm tends to be a complex task, and we typically recommend a separate engagement for doing so. However, if a customer is implementing AD FS for only Office 365, a simple load-balanced AD FS deployment is probably fine if it's scaled appropriately.
Authentication Options
In general, it's best to provide a different authentication experience for users, based on whether they are on the corporate intranet. If a user is on the intranet and has direct access to an AD DS domain controller, you want to allow AD FS to use Windows Integrated Authentication (WIA). If a user is not on the intranet, however, there is no access to a domain controller, and therefore, no capability to perform WIA. Instead, it's best to provide the user with a forms-based authentication experience.
AD FS is able to accommodate this scenario through an external-facing server role called the Web Application Proxy. Although AD FS servers are configured for WIA, Web Application Proxy servers sit at the edge of the network in the extranet, and they are configured with Forms authentication.
The determination of whether a user hits the AD FS server or the Web Application Proxy server is based on how the client computer resolves the DNS name of AD FS. This architecture requires a split-brain DNS implementation so that users outside of the corporate network resolve the AD FS DNS to the Web Application Proxy servers, while the users inside the corporate network resolve it to the AD FS servers directly.
Mandatory: If you want to implement Web Application Proxy servers in AD FS (which is highly recommended), you need to have a split-brain DNS implementation. |
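In effect, the split-brain zone behaves like the lookup below: the same DNS name resolves to different endpoints depending on which view serves the client. Addresses, ranges, and the service name are illustrative:

```python
import ipaddress

INTRANET = ipaddress.ip_network("10.0.0.0/8")  # illustrative corporate range
INTERNAL_VIP = "10.0.1.10"                     # AD FS farm load balancer
EXTERNAL_VIP = "203.0.113.10"                  # Web Application Proxy VIP

def resolve_federation_service(client_ip: str) -> str:
    """Answer for the federation service name (e.g. sts.contoso.com)."""
    if ipaddress.ip_address(client_ip) in INTRANET:
        return INTERNAL_VIP   # intranet users reach AD FS and get WIA
    return EXTERNAL_VIP       # external users reach the proxy and get Forms auth
```

The internal DNS zone answers with the AD FS farm so intranet clients get Windows Integrated Authentication, while the public zone answers with the Web Application Proxy so external clients get forms-based sign-in.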
Another authentication option for users is to allow them to sign in with an X.509 certificate, which is on a smart card, a virtual smart card, or a soft certificate that is installed on the client. This is used as an alternate primary authentication option, and therefore it bypasses the user name and password prompt at the AD FS log-on page.
You should consider using client certificate authentication if:
Optional: You can optionally use client certificate authentication with AD FS to provide certificate-based multi-factor authentication. This is a good MFA option if your users already have authentication certificates issued to them. |
Certificate Options
AD FS requires the following certificates:
Certificate Type |
Description |
Token signing certificate |
AD FS uses the private key of this certificate to sign the security token that it sends to Azure AD. The public key of the certificate is provided to Azure AD through a metadata file when federated authentication is configured. |
Web application SSL certificate |
This certificate is used for transport security when users browse to the AD FS log-on page. This is a traditional TLS certificate used for HTTPS connections with users' web browsers. |
Web service SSL certificate |
This certificate is different from the Web Application SSL certificate because it's used specifically to protect the web service in the AD FS active endpoints. Although not required by Azure AD, some Office 365 properties (such as Exchange Online and Lync Online) use this endpoint. Most deployments use the same certificate as the Web Application SSL Certificate. |
Although there is no single configuration of certificates that makes sense for every situation, we've distilled the guidance into a few scenarios. Use the following table to make the appropriate decisions for certificate configuration:
Level of Security Consciousness |
Certificate |
Issuer |
Validity Period |
Uniqueness |
Favors convenience over security |
Token Signing |
Self-Signed |
1 year |
Unique |
Web App |
CA-Issued (Publicly Trusted) |
5 years |
Common | |
Web Service |
CA-Issued (Publicly Trusted) |
5 years |
Common | |
Somewhat security conscious |
Token Signing |
Self-Signed |
1 year |
Unique |
Web App |
CA-Issued (Publicly Trusted) |
2 years |
Common | |
Web Service |
CA-Issued (Publicly Trusted) |
2 years |
Common | |
Favors security over convenience |
Token Signing |
CA-Issued |
1 year |
Unique |
Web App |
CA-Issued (Publicly Trusted) |
1 year |
Unique | |
Web Service |
CA-Issued (Publicly Trusted) |
1 year |
Unique | |
Mandatory: Azure AD requires that the key size of the token signing certificate is a minimum of 2048 bits. |
Web Application Proxy Servers
In the context of Azure AD, the Web Application Proxy role is essentially an edge role for the on-premises AD FS implementation. Using the Web Application Proxy has the following benefits for customers:
Because of these benefits, we typically recommend using the Web Application Proxy role with an AD FS deployment that interacts with Azure Active Directory.
Recommended: It is very rare to not include Web Application Proxy servers in the AD FS deployment for Azure AD. We recommend that they be implemented unless you have a sound reason for not wanting to do so. |
Although Web Application Proxy servers can handle a heavier load than AD FS, we generally recommend that you deploy Web Application Proxy servers in a 1:1 relationship with AD FS servers. For example, if an organization has three AD FS servers on the intranet, it's a good idea to start with three Web Application Proxy servers in the extranet. You can monitor the performance of the Web Application Proxy servers and determine if you can remove some of the servers from the load balancer, based on the utilization metrics of the servers.
Recommended: As a general rule, we recommend starting with an equal number of Web Application Proxy servers and AD FS servers. |
High Availability
AD FS is a web application and a web service, so making it highly available and responsive to client requests consists of using traditional web-based load balancing methods. We've performed AD FS deployments with most of the major load-balancer manufacturers on the market.
Some of the load balancers we've used are Citrix NetScaler, F5 BIG-IP, and Windows Network Load Balancing. We've found that most load-balancing manufacturers have specific guidelines for integrating AD FS with their products.
The following diagram illustrates what a common load-balancing architecture for AD FS looks like.
AD FS supports active/active load balancing within a network segment, and it is 100% stateless. This means that persistent connections (sticky connections) are not needed at the load balancer. This simplifies the load balancer configuration to the point where you only have to select the load balancing algorithm to be used.
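Because no session affinity is needed, the balancer's job reduces to a plain distribution algorithm. A minimal round-robin sketch (class and node names are illustrative):

```python
import itertools

class RoundRobinBalancer:
    """Stateless AD FS needs no sticky sessions, so plain round-robin works."""

    def __init__(self, nodes: list):
        self._cycle = itertools.cycle(nodes)

    def route(self) -> str:
        # Any farm node can serve any request; no session table is required.
        return next(self._cycle)
```

Any node can pick up any request mid-conversation, which is what makes active/active deployment safe and the load-balancer configuration trivial.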
Database Selection
AD FS has the ability to use a database that is hosted on a full deployment of SQL Server or it can use an internal database known as the Windows Internal Database (WID). The following table provides a comparison of these options:
Consideration |
SQL Server |
WID |
Database availability |
Capable of active/active deployments, with multiple database nodes having write capabilities |
Active/active deployment, but only one node can write to the database; all other nodes are read-only |
Replication |
Any supported SQL replication option |
Uses a primary/secondary replication methodology, with replication occurring at five-minute intervals |
Performance |
Can provide very high performance, which is good for environments consisting of hundreds of thousands of users and more than 100 applications integrated with AD FS |
Performance is good, but not as high as SQL Server. If the customer has fewer than 100,000 users and is only using AD FS for Azure AD, this is a good option. |
Scalability |
No restrictions on scalability; scales as high as SQL Server allows |
Soft limit of 10 servers in the AD FS farm |
Location |
Can be stored on separate, dedicated database servers, which can be used to increase performance |
Local copy resides on each AD FS server |
Complexity |
Can be very high in complexity (we've seen AD FS deployments where more than half of the project consisted of deploying SQL Server) |
Very low and works with virtually no configuration necessary |
Aside from the general recommendations, there are two situations where SQL Server may be required over the WID:
Scenario |
Description |
Token replay protection |
Token replay protection comes into play when AD FS is not the ultimate identity provider being used. In these scenarios, there is another identity provider that AD FS defers authentication to and receives a security token from. Token replay protection prevents an attacker from replaying the token sent to AD FS from the additional identity provider. To do this, AD FS writes a hashed value of the token to the database. Therefore, every AD FS server must be able to write to the database, making this scenario unsuitable for WID-based deployments. This is typically not a common scenario with Azure AD, but customers may be using this capability, so it can be a key factor in deciding which type of database to use with AD FS. |
SAML artifact resolution |
This is a capability where AD FS can give a portion of the security token to the application through the user's web browser, instead of the entire token. The application can then go back to the AD FS server and retrieve the rest of the security token without the user's browser being involved. This capability requires every AD FS server in the farm to write to the database, so it's not suitable for WID-based deployments. It's rare that we see customers using SAML artifact resolution. Azure AD in particular does not use it, so if a customer is only using AD FS for Azure AD integration, this scenario can be ignored. |
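The write-on-every-node requirement behind token replay protection can be sketched as a shared hash store. In a real farm, the store is the SQL Server database that every AD FS node writes to; the class below is a simplified stand-in:

```python
import hashlib

class ReplayGuard:
    """Sketch of token replay protection: hash each accepted token and
    reject any token whose hash has been seen before. Because every farm
    node must write to this shared store, WID (one writable node) cannot
    host it; a full SQL Server deployment can."""

    def __init__(self):
        self._seen = set()  # stands in for the shared AD FS database

    def accept(self, token: bytes) -> bool:
        digest = hashlib.sha256(token).hexdigest()
        if digest in self._seen:
            return False    # replayed token is rejected
        self._seen.add(digest)
        return True
```

Only a hash of the token is persisted, so the store never contains the token itself; the check simply guarantees each token is honored at most once.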
Unfortunately, there is no single architecture that is right for every situation. However, we recommend that you default to a WID-based architecture, unless there is a solid reason to depart from it. This keeps the architecture simpler, particularly in cases where multisite deployments are needed.
Recommended: Plan to use a WID-based architecture by default, unless there is a solid reason to depart from it. |
Using Third-Party Identity Providers with Azure AD
The use of third-party identity providers is supported in Azure AD if the identity provider is on the approved list (see Use Third-Party Identity Providers to Implement Single Sign-On in the following References table).
The guidance for AD FS in the previous section of this document is not applicable to third-party identity providers. If a customer chooses to go that route, you'll need to work with the vendor to obtain the recommended practices for implementing the identity provider.
References | |
Install the Azure Active Directory Sync Service |
https://msdn.microsoft.com/en-us/library/azure/dn757602.aspx |
DirSync: Using Alternate Login IDs with Azure Active Directory |
|
Configuring Alternate Login ID |
|
Set up a trust between AD FS and Azure AD |
https://msdn.microsoft.com/en-us/library/azure/jj205461.aspx |
SSO for On Prem IWA Apps Using Kerberos constrained delegation with Application Proxy |
https://msdn.microsoft.com/en-us/library/azure/dn879065.aspx |
Use Third-Party Identity Providers to Implement Single Sign-On |
https://msdn.microsoft.com/en-us/library/azure/jj679342.aspx |
Azure Multi-Factor Authentication Options for Federated Users |
https://msdn.microsoft.com/en-us/library/azure/dn394284.aspx |
How to Switch from Single Sign-On to Password Sync |
For an overview of Azure Multi-Factor Authentication, please see What is Azure Multi-Factor Authentication.
There are multiple ways to acquire licenses for Azure Multi-Factor Authentication. The following methods currently exist:
Method |
Description |
Direct purchase of Azure Multi-Factor Authentication licenses |
The ability to pay on either a per-user or per-authentication basis |
Purchase as part of Azure AD Premium |
Azure AD Premium includes Azure Multi-Factor Authentication licenses in the per-user cost |
Purchase through the Enterprise Mobility Suite |
The Enterprise Mobility Suite includes Azure AD Premium as part of the package, which includes Azure Multi-Factor Authentication per-user licenses |
You may encounter a situation where a customer purchases licenses but there is a delay before the licenses become available. If this is the case, do not simply create a temporary trial license and expect the trial licenses to be converted into production licenses. Instead, understand what licenses the customer purchased, and work with the customer's account team to make sure that the appropriate licenses become available.
Recommended: Be sure to understand how a customer acquired the Azure Multi-Factor Authentication licenses. Doing so may save you time and problems in the future. |
There are two types of Azure Multi-Factor Authentication services available:
Situations for Using Azure Multi-Factor Authentication Server
You should use Azure Multi-Factor Authentication Server in the following conditions:
Conditional access provides a way to specify advanced authorization rules for requiring MFA. Without conditional access, users that are enabled for MFA require MFA every time they attempt to access an application. For more information about conditional access, please see Azure Conditional Access Preview for SaaS Apps.
Exception Policies
Conditional access allows you to use "exception policies" so that users who are members of certain Azure AD security groups are exempt from the MFA requirement. When a user is exempt from MFA, the exemption overrides any MFA requirement, regardless of the application or the other groups that the user is a member of. Keep this in mind when working through exception policies with customers, and use them sparingly and only when necessary.
Recommended: Exception policies override the MFA requirement for certain users. This may cause a security violation for some customers, so be sure that you understand a customer's security policies when helping a customer define MFA exemptions. | ||
References | ||
What is Azure Multi-Factor Authentication? |
http://azure.microsoft.com/en-us/documentation/articles/multi-factor-authentication/ |
Azure Active Directory contains a series of reports that can be used by customers to gain insight into various user activities. These reports are broken into three categories:
These reports are available to customers with Azure Active Directory Premium licenses. There is nothing architecturally to be considered with these reports. Customers simply need to be aware that they exist and should be monitored.
As a cloud service, Azure Active Directory does not require customer monitoring. You can, however, visit the Azure Status page to determine whether there are any current issues with the Azure AD service.
One monitoring capability that Azure Active Directory does provide is monitoring on-premises AD FS servers. This can be any AD FS implementation; it is not required to be the AD FS implementation that federates an organization with Azure AD.
This feature is called Azure AD Connect Health. This service allows you to install agents on your AD FS servers that push audit data to Azure AD. When the data is there, Azure AD can provide lots of useful information about the health and metrics of AD FS.
For more information, please see Azure AD Connect Health.
Agent Installation
The Azure AD Connect Health agent must be installed on each AD FS server that is being monitored. The package can be downloaded and deployed via automated software installation (such as Active Directory Group Policy Objects or System Center Configuration Manager), or it can be manually installed on each AD FS server.
Azure AD Connect Health has an option to keep the agents automatically up to date (which is turned on by default), so the installation of the agents is a one-time event.
For some of the agent functionality to work, auditing must be turned on at each of the AD FS servers. This is turned off by default, so you need to ensure that a customer does not have any issues enabling it.
Mandatory: Auditing must be enabled on each AD FS server for the Usage Analytics in Azure AD Connect Health to work properly. |
There is also a set of outbound URLs that the agent contacts. These URLs must not be blocked by firewalls. For a complete list of these addresses, see the "Outbound Connectivity to Azure Service Endpoints" section on the Azure AD Connect Health Requirements page.
Network Connectivity
The Azure AD Connect Health agent sends audit and event log data to Azure AD. If network connectivity is disrupted, the agent will queue data up to an amount equal to 10% of the total system memory. If connectivity isn't restored before the queue is full, the newer data will overwrite the older data until network connectivity is restored. It is estimated that 1000 requests consume about 80 MB of data.
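The agent's outage behavior amounts to a bounded buffer that lets newer entries displace the oldest ones when full. A sketch (the cap here is a fixed record count, whereas the real agent caps at roughly 10% of system memory):

```python
from collections import deque

class AuditBuffer:
    """Queue audit records during a network outage; when the buffer is
    full, newer records displace the oldest ones until connectivity
    returns and the buffer is drained."""

    def __init__(self, max_records: int):
        self._queue = deque(maxlen=max_records)

    def enqueue(self, record: str) -> None:
        self._queue.append(record)  # silently evicts the oldest if full

    def drain(self) -> list:
        records = list(self._queue)
        self._queue.clear()
        return records
```

The practical consequence is that a long outage loses the oldest audit data first, which is why sizing the buffer (and restoring connectivity promptly) matters.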
Recommended: You'll want to ensure that there's a big enough buffer on the AD FS audit channel to prevent wrapping of data. We recommend that you have at least 1 GB of storage allocated to the AD FS audit channel. | ||
References | ||
Azure Status |
||
Azure AD Connect Health Requirements |
https://msdn.microsoft.com/en-us/library/azure/dn906733.aspx | |
Azure AD Connect Health FAQ |
https://msdn.microsoft.com/en-us/library/azure/dn906723.aspx |
Each user must be specifically assigned a Basic, Premium, or Enterprise Mobility Suite license to use the associated features. There are two ways to associate a license with a user in Azure AD:
Direct License Assignment
With a direct license assignment, you are assigning a license to an individual person. For large organizations, this model can be unsustainable because it requires you to manage each user's license individually.
However, if you are using an Identity Management service in your on-premises environment (such as Microsoft Identity Manager), you can directly assign licenses to users by having the Identity Management service run a PowerShell command. Therefore, we recommend assigning licenses directly if:
Group Membership
Another approach for assigning licenses to Azure AD users is to add the users to an Azure AD group, and then assign the license to the group, instead of to individual users. The group used for the license assignment can be a group that is sourced from the on-premises Active Directory or a group that is sourced from Azure AD.
This approach tends to be more manageable for large organizations, especially those that already have a group management solution in place. We recommend assigning licenses via group membership if:
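The effective result of combining direct and group-based assignment can be sketched as a union over a user's memberships (group and license names are examples):

```python
def effective_licenses(user_groups: set,
                       group_licenses: dict,
                       direct_licenses: set) -> set:
    """A user's licenses are those assigned directly plus those inherited
    from every licensed group the user belongs to."""
    licenses = set(direct_licenses)
    for group in user_groups:
        licenses |= group_licenses.get(group, set())
    return licenses
```

With group-based assignment, adding or removing a user from a group changes their entitlements automatically, which is what makes the model scale for large organizations.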
References | |
Manage Azure AD Subscriptions and Licenses |
https://msdn.microsoft.com/en-us/library/azure/dn919664.aspx |
Azure Active Directory provides a set of capabilities that allow users to manage their identities in the cloud.
With self-service password reset, users can reset their forgotten passwords in Azure AD and the new password can optionally be written back to the on-premises Active Directory. To support password write-back, the following must be in place:
Authentication
When a user initiates the process to replace a forgotten password, the user must be authenticated. Because the password is not known, an alternate means of authentication must be used. Azure AD self-service password reset supports the following forms of authentication for forgotten passwords:
Phone-Based Authentication
To use the phone-based methods, the user's mobile phone or office phone must be populated in the directory. This can be done through Azure AD Connect synchronization or by manually updating the phone number through the web interface or PowerShell.
Email-Based Authentication
If you want to use the email option, the Alternate Email Address attribute of the user account needs to be populated with a valid email address. Similar to the phone number, this can be done through Azure AD Connect synchronization or by manually updating the email address through the web interface or PowerShell.
Question-Based Authentication
When using security questions for self-service password reset authentication, you must specify a pool of security questions that Azure AD can choose from. Out of this question pool, Azure AD requires that a certain number of questions be answered during enrollment, and all or a subset of those questions must be answered during a password reset event.
We recommend that customers define their own questions, based on legal counsel and the approval of the customer's security team. This is to ensure that the questions don't inappropriately ask users for sensitive personally identifiable information, and that the questions are secure with the answers not being easily guessable.
Enablement
Self-service password reset can be enabled for all users or for a subset of users in the directory. To enable self-service password reset for a subset of users, the users can be added to an Azure AD security group, which is referenced in the self-service password reset configuration. There are two options for adding users to this group:
Registration
Before self-service password reset can be used, the appropriate data must be populated in Azure AD. This data could be prepopulated based on the Azure AD Connect synchronization job, or users can be asked to self-register their information.
If you choose the latter method, users can be provided with a web page link to take them to the registration portal. Azure AD can also be configured to automatically take the user to the registration portal when they sign in to the Access Panel.
Recommended: We recommend that you create an end-user communication plan to provide the users with the details about how to register for self-service password reset, reset their password, and know what to expect. |
Self-service group management enables users to manage their groups and group memberships in Azure Active Directory. The following scenarios are supported by self-service group management:
Recommended: At the time of this writing, managing groups that are created in on-premises Active Directory and synchronized to Azure AD is not supported. Please check for updates about self-service group management on TechNet (see the References table at the end of this section). | |
Mandatory: Self-service group management requires an Azure AD Premium license, and the user performing group management must have a license assigned to them. |
Enablement
Self-service group management can be enabled for all users or for a subset of users in a specific Azure AD security group. There are two options for adding users to this group:
Managing Groups
After users are enabled for self-service group management, they are able to create groups and submit requests to join groups. After self-service group management is enabled, there are a couple of options available:
Option |
Description |
Allow users to create groups |
Not all customers are comfortable allowing employees to create groups, so this is used to control group creation for all users. |
Restrict who can use self-service group management |
This option limits which users are enabled for self-service group management. Customers who use this option need to add all users who can perform self-service group management to a security group in Azure AD. |
Groups that are created in the Azure portal have self-service group management disabled by default. If a customer wants to enable self-service group management for these groups, the groups must first be assigned owners, and the owner must change the policy for the group in the MyApps portal.
Mandatory: Groups cannot be enabled for self-service group management unless they have an identified owner. Finding owners for groups can be a painful exercise for customers for various reasons—for example, the previous owner may have left the organization, or the group may have been created without an owner and no one is sure what organization the group should be managed by. When working with customers in self-service group management, it's important to make sure that customers know that they may need to perform a group owner reconciliation exercise. |
Dedicated Groups
Dedicated groups are special groups that are automatically created and managed by Azure AD. For example, the dedicated group for All Users contains every user account in Azure AD.
Customers can choose to enable or disable dedicated groups. If customers require a group that consists of all users, using a dedicated group is the preferred method of achieving that. Otherwise, customers need to proactively manage the membership of the group.
Mandatory: To use dedicated groups, self-service group management must be enabled in Azure AD. However, if customers want to use dedicated groups and they are not ready for their users to participate in self-service group management, they can configure self-service group management to apply only to users in a security group and keep that security group empty. |
Dynamic Groups
Dynamic groups allow an administrator to specify some criteria by which all users who meet the criteria are automatically members of the group. These criteria take the form of a user attribute query, based on attributes that are present in the directory. For example, if a customer wants to create a dynamic group called HR Users, the criteria for the group would consist of a query where the value of the Department attribute is Human Resources.
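The attribute-query model behind dynamic groups can be sketched in a few lines. This is an illustrative Python sketch, not the Azure AD implementation; the user records and names are made up:

```python
# Illustrative sketch (not Azure AD code): a dynamic group's membership is
# derived from a user-attribute query, like the "HR Users" example above,
# rather than from a manually maintained member list.

users = [
    {"displayName": "Alice", "department": "Human Resources"},
    {"displayName": "Bob",   "department": "Finance"},
    {"displayName": "Carol", "department": "Human Resources"},
]

def dynamic_members(directory, attribute, value):
    """Return every user whose attribute matches the group's criteria."""
    return [u["displayName"] for u in directory if u.get(attribute) == value]

hr_users = dynamic_members(users, "department", "Human Resources")
print(hr_users)  # ['Alice', 'Carol']
```

Because membership is computed from the query, adding a user with Department set to Human Resources makes them a member automatically; no one has to edit the group.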
Mandatory:
Recommended:
References | ||
Self-service group management for users in Azure AD |
https://msdn.microsoft.com/en-us/library/azure/dn641267.aspx | |
Manage Your Groups |
https://msdn.microsoft.com/en-us/library/azure/dn641268.aspx | |
Dedicated Groups in Azure AD |
https://msdn.microsoft.com/en-us/library/azure/dn889921.aspx | |
Dynamic Memberships for Groups in Azure AD |
https://msdn.microsoft.com/en-us/library/azure/dn913807.aspx |
Although Azure AD is a cloud service, some elements of the user interfaces can be branded by customers. To brand the user interface in Azure AD, an Azure AD Basic or Azure AD Premium license is required.
Mandatory: Azure AD Basic or Azure AD Premium licenses are required to customize the sign-in page and Access Panel. The administrator who is making the customizations must have an assigned license; otherwise, the option to configure the branding and customizations will not show up in the Azure portal. |
The following Azure AD components can be branded or customized:
Optional: Language-specific customizations can also be made. So if your customer is a multinational, you could customize the sign-in page text for each language that the customer supports. | ||
Mandatory: Do not design solutions for Azure AD that require custom code to be run as part of the sign-in page or Access Panel. Azure AD does not allow HTML elements or client-side scripts to be run as a customization. | ||
References | ||
Add Company Branding to your Sign-In and Access Panel pages |
When working with virtual machines in an Infrastructure-as-a-Service (IaaS) environment, the virtual machines most often need to be joined to an Active Directory domain. This is so the operating system can be properly managed and the software running on the virtual machines can function properly. Many customers who move virtual machines to Azure have come to the conclusion that extending Active Directory into Azure IaaS is a recommended course of action.
One of the questions we are often asked is whether a customer should deploy Active Directory domain controllers into IaaS. The alternative option is to keep them on-premises and provide a VPN connection. There are various considerations to be made when answering this question. This section answers those questions and provides guidance about how to extend Active Directory to Azure virtual machines in a safe and reliable way.
When considering extending Active Directory to Azure, there are two primary areas of networking that need some thought:
Whether the domain controllers are on-premises or deployed in Azure, there needs to be connectivity between the Azure virtual network and your on-premises network. If you want to keep domain controllers on-premises, you need an ExpressRoute connection or a Site-to-Site VPN connection to Azure.
Every time a virtual machine in Azure needs to access a domain controller, it will traverse this connection over the WAN. Depending on the stability and performance of the connection, this may cause issues. You should ask the following questions:
Virtual machines in Azure get IP addresses assigned dynamically from the virtual network that they reside in. To the operating system, this assignment occurs via DHCP. If you set a static IP address inside the operating system, Azure will isolate the virtual machine, and it won't be able to communicate on the network.
Mandatory: Do not set a static IP address on the network adapter in the operating system for virtual domain controllers in Azure. Doing so will isolate the virtual machines and prevent them from communicating on the virtual network. |
The IP address that Azure gives the domain controller will never change unless you deprovision the virtual machine. In general, it is safe to allow Azure to assign a dynamic IP address to a domain controller. If, however, you want a domain controller to have a specific IP address, you can configure Azure to provide a static IP address to the domain controller by using the methods outlined in the networking section of this document.
The IP address is still dynamically assigned from the perspective of the operating system on the virtual machine, but it's really an address that you choose. This also has the benefit of ensuring that the virtual machine retains the same IP address if it is accidentally deprovisioned by an administrator.
Recommended: To give a domain controller the IP address that you want and prevent it from changing if the virtual machine is deprovisioned, provide the virtual machine with a static virtual network IP address. |
There are three types of disks in Azure that can be attached to virtual machines: the operating system disk, the temporary disk, and data disks.
To prevent the Active Directory database and SYSVOL from being deleted or corrupted, both must be placed on a data disk. The virtual machine's operating system disk has write-behind disk caching in place, so placing the Active Directory database and SYSVOL on the operating system disk could cause writes to be lost if a virtual machine is stopped before the cache is committed. Never place the Active Directory database or SYSVOL on a temporary disk; the contents of the temporary disk are deleted during certain virtual machine operations.
Mandatory: Make sure that you place the Active Directory database and SYSVOL on a data disk. If you use the operating system disk or a temporary disk, the database may get corrupted or purged during an outage. |
Most customers will strongly consider placing domain controllers in Azure because they will want the applications they place in Azure IaaS to have reliable and low latency access to the domain controllers.
Domain controllers are highly sensitive roles. If someone compromises a domain controller, they can gain access to virtually everything in a customer's environment. Some customers are nervous about hosting domain controllers in an Azure tenant, and we always respect that concern.
The best way to handle the situation is to present the customer with an understanding of how Azure is secured and information about how we prevent the compromise of a domain controller.
Key resources in any informed conversation about how Azure protects domain controllers should include:
Active Directory and its structure also provide pivotal tools for hardening and enforcing good credential hygiene. Defending against Pass-the-Hash and Pass-the-Ticket attacks comes from design principles in Active Directory, including:
For further considerations, refer to the section in this document about AD Design Considerations.
Microsoft is committed to ensuring that the Azure platform is secure, and new capabilities are constantly being added that increase the security of solutions that can be deployed on Azure. It is highly recommended to read the section of this document called Microsoft Azure Security to gain an understanding of the core security strategies in Azure.
You can also visit the Microsoft Azure Trust Center for additional Azure security documentation. Rather than expound on security itself (this is covered in detail in other places), the remainder of this section will discuss the security aspects of hosting Active Directory domain controllers in Azure.
Read-Only Domain Controllers (RODCs) are a type of Active Directory domain controller that do not allow Write operations to take place. For more information on RODCs, please see AD DS: Read-Only Domain Controllers.
Customers frequently ask if they should use RODCs in Azure as a security measure. In short, the answer is, "No."
The intent of RODCs is to limit the scope of damage that can be done in the case of poor physical security. For example, if a branch domain controller sits in an unsecured trailer in a construction lot, the theft of that domain controller could expose user credentials and provide a way for someone to modify and inject data into the organization's Active Directory infrastructure.
Some might argue that an Azure virtual machine could be stolen in a similar manner as a physical domain controller. Although this may be the case, there are security measures in place that prevent someone from downloading the virtual hard disk (VHD) associated with a domain controller's virtual machine. There are also additional protective measures that can be taken to secure VHDs, which are covered later in this section.
A primary reason that the use of RODCs is discouraged in Azure is that application compatibility is unpredictable. Many services and applications are not compatible with RODCs, and it's difficult to fully assess an application or service to determine its compatibility. In addition, RODCs, by design, redirect a client's Lightweight Directory Access Protocol (LDAP) Write requests to a Read/Write domain controller (RWDC), meaning a client needs to be able to touch an RWDC.
Another reason why we do not recommend using RODCs is that there is still a dependency on the network being in place. If the organization deploys domain controllers in Azure as a way to ensure that Active Directory is available if the connection to the on-premises datacenter goes down, an RODC will not suffice.
It is also inaccurate to believe that an RODC helps secure LDAP. To secure LDAP, we recommend using Controlling Object Visibility – List Object Mode in Active Directory.
Recommended: Do not use Read-Only Domain Controllers as a security measure in Azure. |
The Server Core installation option of Windows Server is a reduced-attack surface implementation of Windows Server that removes the user interface and other features of Windows that may not be necessary for all applications and services. Because of the reduced footprint, Server Core provides a smaller attack vector and is less susceptible to viruses and malware than a full Windows Server installation. For more information, please see Windows Server Installation Options.
Although Server Core can provide better security for a virtual machine than a full Windows Server installation, it does not address any specific security concerns with regards to running domain controllers in Azure. If you are already using Server Core for domain controllers on-premises, you can continue this practice for domain controllers in Azure. However, do not go out of your way to deploy Server Core specifically for Azure-based domain controllers.
Recommended: If you are not already using Server Core for your on-premises domain controllers, we do not recommend switching to it for your Azure-based domain controllers. |
Perhaps the biggest threat to domain controllers in Azure is the possibility of someone downloading the virtual hard disk (VHD). The Active Directory database is not encrypted; if attackers got hold of the disk, they could execute a brute-force attack against the accounts or edit the database offline to inject their own data. For this reason, protecting the VHDs of domain controllers in Azure is very important.
The VHDs associated with a virtual machine are stored in Azure Blob storage, which has an API that is accessible from the Internet. Anyone who holds one of the API keys can download the domain controller's hard disk over the Internet. The URL for the Blob storage container hosting the VHDs is a standard URL, for example:
https://<StorageAccountName>.blob.core.windows.net/vhds
This allows an attacker to potentially guess which URL the VHDs are stored in, or determine the URL by sniffing DNS queries while on the same network as a tenant administrator that accesses the API.
There are two API keys, and each key is 512 bits. The two keys exist so that they can be rotated without interruption of the services that use Azure Storage. Either key provides access to the storage container, so it's important that the keys are not given to anyone.
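The no-downtime rotation that the two keys enable can be sketched as follows. This is an illustrative Python sketch, not the Azure Storage API; the key values are made up:

```python
# Illustrative sketch (not the Azure Storage API) of why two keys exist:
# either key grants full access, so clients can be moved to the secondary
# key while the primary is regenerated, and the service never loses access.

class StorageAccount:
    def __init__(self, primary: str, secondary: str):
        self.keys = {"primary": primary, "secondary": secondary}

    def authorize(self, key: str) -> bool:
        # Either key grants full access to the storage account.
        return key in self.keys.values()

    def regenerate(self, which: str, new_key: str) -> None:
        self.keys[which] = new_key

account = StorageAccount("key-A", "key-B")

# 1. Move clients onto the secondary key.
client_key = account.keys["secondary"]
# 2. Regenerate the primary; clients keep working on the secondary.
account.regenerate("primary", "key-C")
print(account.authorize(client_key))  # True
# 3. The old primary no longer grants access.
print(account.authorize("key-A"))     # False
```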
The first step for protecting the domain controller VHDs is to create a separate Azure Storage account for them. Keep this storage account separate from all other storage accounts and make sure that no one has the API keys. There is no reason for administrators to have the API keys for their day-to-day operations, so the keys should remain a secret. You'll also want to limit access to the Azure portal, to prevent unauthorized people from obtaining the API keys.
Recommended:
|
In addition to protecting access to the API, we recommend that organizations take additional measures to encrypt the domain controller VHDs. There are various third-party solutions available for encrypting VHDs, but we recommend that the customer look at CloudLink first.
CloudLink SecureVM leverages native BitLocker capabilities in the virtual machine and provides a key management solution that releases BitLocker keys to virtual machines at reboot, either through manual intervention or through preauthorization. Please visit the CloudLink website for more information.
Recommended: We highly recommend that you encrypt domain controller VHDs in Azure by using a third-party partner solution, such as CloudLink SecureVM. |
When a virtual machine is created by using the Azure portal, two endpoints are created by default, Remote Desktop and Windows Remote Management (WinRM), which inherently can be accessed over the Internet.
We recommend that you remove the Remote Desktop endpoint and only log in to the virtual machine that hosts the domain controller role via the local network. If the VPN connection goes down and you don't have access to the private IP address of the domain controller, you can sign in to the Azure subscription and temporarily re-create the Remote Desktop endpoint to give you access to the virtual machine.
If, for some reason, it's not desirable to remove the endpoint, we recommend (at a minimum) placing an ACL on the endpoint that permits only the IP addresses that will be logging in to the domain controller.
Recommended: Remove the Remote Desktop endpoint from virtual machines in Azure that host the domain controller role. |
We also recommend that you remove the WinRM endpoint for virtual machines that host the domain controller role. If you are actually using the WinRM endpoint for remote PowerShell access to domain controllers, we recommend that you look at doing this over the private virtual network. If that's not possible, you should place an ACL on the endpoint for the IP addresses that automation will be running from.
Recommended: Remove the WinRM endpoint from virtual machines that host the domain controller role in Azure. | ||
References | ||
AD DS: Read-Only Domain Controllers |
https://technet.microsoft.com/en-us/library/cc732801(v=ws.10).aspx | |
Microsoft Azure Trust Center |
http://azure.microsoft.com/en-us/support/trust-center/security/ | |
Windows Server Installation Options |
||
Azure Virtual Machine Disk Encryption using CloudLink |
http://azure.microsoft.com/blog/2014/08/19/azure-virtual-machine-disk-encryption-using-cloudlink/ |
The following section walks through some considerations for deploying domain controllers on virtual machines in Azure.
Active Directory makes efficient use of the available memory in a domain controller, so one of the best ways to make sure that a domain controller is performing optimally is to provide enough memory to adequately cache the database.
To support this, leverage the virtual machine profiles that have larger memory footprints. In particular, look into using the A5 virtual machine, which has a larger memory footprint and fewer cores. A5 virtual machines tend to strike a good balance between performance and cost for domain controllers.
Recommended: Start out by using A5 virtual machines for domain controllers in Azure. If you need more memory in the domain controller for caching the database, consider using an A6 virtual machine. |
In addition to standard virtual machine roles, Azure provides Web and Worker roles for applications hosted on Azure PaaS. Web and Worker roles are not persistent, and they should not be used for domain controllers. Do not install domain controllers on Web or Worker roles.
Mandatory: Do not use Web or Worker roles for domain controllers in Azure. |
There are multiple ways to deploy a domain controller into a given Azure subscription. The following table lists supported methods for deploying a domain controller in Azure IaaS:
Method |
Description |
Migrate from physical computer to virtual machine |
A P2V solution, such as System Center Virtual Machine Manager, can be used to convert a physical domain controller on-premises to a virtual machine that can be imported into Azure. When doing so, the on-premises domain controller must be shut down and must not be turned on again before the virtualized domain controller is started in Azure. |
Move an existing virtual domain controller |
If customers have domain controllers in Hyper-V, those virtual hard disks can be moved and directly attached to an Azure virtual machine. |
Build a new domain controller and replicate from on-premises |
This traditional approach works well for Azure, also. A new virtual machine is created in Azure, and then promoted to a domain controller when connectivity to the on-premises network is established. If this is the first domain controller in Azure, the Active Directory data will be replicated over the WAN. Only egress traffic has a charge, so this should have minimal impact on networking cost. The Install from Media (IFM) option is also available as an alternative to downloading the Active Directory database over the WAN connection. |
Clone a domain controller |
Because Azure supports the VMGenerationID feature, domain controllers on-premises can be cloned and imported into Azure. This is a quick way to get a domain controller deployed in Azure. |
Mandatory: Protect all copies of domain controller VHDs, including backups and temporary copies, by using good security practices. |
For the most part, managing domain controllers in Azure is similar to managing domain controllers on-premises. However, there are some specific things that Active Directory administrators need to be aware of. This section outlines those considerations and recommendations for Active Directory administrators.
Active Directory replication relies on update sequence numbers (USNs), and it expects that these values only move forward. A particular issue occurs when an old snapshot of a domain controller is introduced; this is called USN rollback. A USN rollback occurs when there is a divergence in domain controller data.
The issue is that a change that occurs on a rolled-back domain controller is made successfully on the domain controller, but the change never replicates to other domain controllers, because the other domain controllers believe that they are already up-to-date. For more information about USN rollback, please see KB875495.
USN rollback most commonly occurs on virtualized domain controllers, because it's so easy to take a snapshot of a virtual machine that is hosting a domain controller and restore the snapshot later. When the snapshot is restored, the domain controller becomes "rolled back."
To mitigate the effects of USN rollback on virtual domain controllers, a feature was introduced in Windows Server 2012 called VMGenerationID. When a virtual machine is restored from a snapshot, restarted after being deprovisioned, or restored through a "service healing" event, the VMGenerationID of the virtual machine changes. This has the following effect on domain controllers:
Recommended: If all DCs are hosted in Azure, do not shut down all of the DCs at the same time from the Azure console. This will deprovision the DCs and cause the VMGenerationID to change when the VMs are started back up, ultimately causing SYSVOL replication to break. |
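The VMGenerationID safeguard can be sketched as follows. This is an illustrative Python sketch, not Windows code; the safeguard actions on a mismatch (resetting the replication invocation ID and discarding the local RID pool) are documented behaviors of the Windows Server 2012 virtualization-safe Active Directory feature:

```python
# Illustrative sketch (not Windows code) of the VMGenerationID safeguard.
# On boot, the domain controller compares the generation ID exposed by the
# hypervisor with the value it last stored in the directory
# (msDS-GenerationId). A mismatch means the VM was restored from a snapshot
# or redeployed, so protective actions are taken before replicating.

def on_boot(dc_state: dict, current_generation_id: str) -> dict:
    if dc_state["generation_id"] != current_generation_id:
        # Windows Server 2012 safeguards on mismatch:
        dc_state["invocation_id"] = "new-invocation-id"  # reset replication identity
        dc_state["rid_pool"] = None                      # discard the local RID pool
        dc_state["generation_id"] = current_generation_id
    return dc_state

# A snapshot restore (or an Azure deprovision-and-restart) changes the ID:
state = {"generation_id": "gen-1", "invocation_id": "old-id", "rid_pool": [500, 999]}
state = on_boot(state, current_generation_id="gen-2")
print(state["rid_pool"])  # None: the DC must request a fresh RID pool
```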
In the past, when a virtual machine in Azure was shut down, Microsoft would continue to charge for the virtual machine, even though it was not running. The reason is that the resources were still committed to the virtual machine, so they couldn't be used for other customers.
To address this, a new shutdown option was added to Azure, called "Stop and Deallocate". Now, when a virtual machine is stopped via the Azure portal, the hard disk remains intact, but several of the resources are deprovisioned. The result is that when the virtual machine is turned on, several things change, including the IP address, the CPU ID, and most importantly, the VMGenerationID.
As discussed earlier, a change of the VMGenerationID can greatly impact Active Directory. Therefore, we strongly recommend that virtual machines hosting domain controllers be shut down only from within the virtual machine's operating system. Shutting down in this manner ensures that the resources are not deprovisioned and that the VMGenerationID will not change when the domain controller boots up.
Mandatory: Never stop a domain controller through the Azure portal. Always shut down the domain controller from the operating system inside the virtual machine. | ||
References | ||
Service Healing – Auto-recovery of Virtual Machines |
http://azure.microsoft.com/blog/2015/04/07/service-healing-auto-recovery-of-virtual-machines/ | |
KB875495 – How to Detect and Recover From a USN Rollback in WS2003, WS2008, and WS2008 R2 |
||
How the Active Directory Replication Model Works |
https://technet.microsoft.com/en-us/library/cc772726(v=ws.10).aspx |
When deploying domain controllers in Azure, there are some specific things to consider for your Active Directory design. This section outlines these considerations.
It is generally recommended to consider the Azure datacenter as a separate Active Directory site, because it will have its own IP address space and routing considerations. For many applications and services, it is preferable to have a domain controller available within the site, and it's typically preferable to have a local connection instead of traversing the WAN.
More importantly, the customer needs to consider what happens to a virtual machine in Azure if it can't reach a domain controller. The standard recommendation is to place two domain controllers in each Azure region where virtual machines reside.
It is important to note that a resource domain or forest is not recommended given the additional overhead, and these do not represent an effective security boundary.
Recommended: Place two domain controllers within an availability set in all Azure regions where virtual machines reside. |
In addition, there should be an Active Directory site created for each Azure region, and all of the virtual networks in that region should be associated with that site. Standard guidance applies for the definition of Active Directory site links, that is, create the appropriate site links so replication and DCLocator functionality works correctly.
Recommended: Create a unique Active Directory site object for each Azure region where virtual machines reside, and associate all of the virtual networks in that region with the Active Directory site. |
In modern Active Directory deployments, there's little reason not to make every domain controller a Global Catalog server. The standard guidance for Global Catalogs also applies to domain controllers in Azure. As a recommended practice, make all of the domain controllers in Azure Global Catalog servers.
Recommended: Make all domain controllers in Azure Global Catalog servers. |
DNS is instrumental to the operation of Active Directory. There should always be DNS servers located alongside the domain controllers, and most of the time we recommend that DNS be Active Directory-integrated. This does not change with Azure.
The domain controllers in Azure should run the DNS Server service, if possible. If you are not using DNS in Windows, there should be a DNS appliance in Azure for the domain controllers to use. Otherwise, a VPN outage will render DNS unavailable and prevent the domain controllers in Azure from operating correctly.
Recommended: Domain controllers in Azure should also be DNS servers, if that is in line with your existing Active Directory architecture. If you are using third-party DNS appliances, there should be a virtual appliance available in the Azure tenant. |
Azure provides a default DNS service to virtual machines if you don't specify a DNS server. The Azure name resolution services do not support the complex name resolution needs of Active Directory, so do not attempt to use Azure DNS servers on domain controllers.
Mandatory: Make sure that domain controllers are pointing to a DNS server in Windows that hosts the Active Directory zones, rather than the default DNS servers in Azure. |
Organizational units (OUs) in an Active Directory design are important for operational and security management of Azure assets, especially when extending existing on-premises forests into the Azure cloud. OUs, security groups, and Group Policy Objects (GPOs) provide key administrative controls that can provide containment boundaries within a security zone.
One of the primary challenges of integrating applications into Azure is getting the applications to work with an identity system. The most obvious case is when an application is moved from an on-premises web server to the cloud as a PaaS application.
When on-premises, the application could leverage Active Directory for authentication, group memberships for authorization, and other on-premises identity repositories for accessing identity data. But when that application is moved to a PaaS application, those on-premises systems are no longer accessible.
Even if Active Directory is extended to an IaaS application as discussed in the previous section, Active Directory is still contained in a virtual network that is isolated from PaaS applications. This makes it difficult to migrate applications to Azure, and it is one of the biggest blockers to moving those workloads to the cloud.
The majority of applications can't function without an identity layer. Commonly, applications rely on an identity system for:
Applications have two options with their identity layer. The first option is to create the necessary identity functionality within the application. This adds a considerable amount of complexity to an application because the application now has to deal with the storage of identity data, security of the data, and user interfaces for interacting with it.
Applications that take on the identity layer need to do the following, at minimum:
When considering the amount of work that an application developer has to undertake to manage the identities, it's obvious why most developers are integrating with trusted cloud identity solutions instead. This section focuses on primary scenarios for integrating applications with Azure-based identity solutions.
Before digging in to the various scenarios, there are some baseline concepts that need to be covered. There are several ways to integrate an application with an Azure-based identity solution, so before going into the specific recommendations, it's important to understand these approaches. The concepts that are discussed in this section are:
Identity federation gives an application the ability to defer authentication (and some types of authorization) to a source that is, in theory, more authoritative for authenticating users.
For example, if you have an application that you want to share with a partner, the application can be federated with that partner. The partner then becomes the entity that authenticates the users. The application simply receives proof that the user was successfully authenticated by the partner. The following diagram depicts what a federated relationship looks like between an organization and one of its partners.
Notice in the diagram that both organizations have an Identity Federation Service deployed. The federation service in the Resource Org (the organization hosting the application) trusts authentications that occur at the federation service in the Partner Org (where the users authenticate). Also, the web application trusts the federation service inside of its own organization.
There are several reasons for this architecture. The biggest benefit to the application is that the federation service in the Resource Org acts as a buffer between the Partner Org and application. Because of this, the Resource Org has better control over who is authorized to access the application.
If there was more than one partner (as is usually the case), the Identity Federation Service in the Resource Org manages those relationships, so the application doesn't have to. The application has to only manage the relationship between itself and the federation service in its own organization.
Identity federation is a big subject, and there's been a lot of content written about it. One of the best sources for understanding identity federation is a free eBook called A Guide to Claims-Based Identity and Access Control (2nd Edition).
Common Identity Federation Protocols
When it comes to actually implementing identity federation, a variety of protocols exist. The job of the protocol is to lay out the process that an application needs to go through to obtain a security token. The security token serves as proof that a user has successfully authenticated to a trusted identity system.
Most of the protocols that are available were written for web-based applications, and therefore, they use common web methods (such as HTTP redirects and HTTP POST methods) for carrying out the protocol exchange. There are also protocols available for non-web-based applications that need identities for calling SOAP-based or RESTful web services. The following protocols are common today:
WS-Federation PRP and SAML 2.0 Web SSO Profile
The WS-Federation Passive Requestor Profile (WS-Federation) and the SAML 2.0 Web SSO Profile (SAML 2.0) protocols use the same general process for providing an application with a security token. Therefore, the following description applies to both protocols.
However, these protocols are not compatible. That is, an application that only understands WS-Federation cannot request a security token from a federation service that only understands SAML 2.0.
Although the protocol flow is the same, the difference between the two is the messaging. For example, in WS-Federation, the parameters are given to the federation service via individual query string elements in the URL. In SAML 2.0, there is an XML message that is encoded and placed in a single query string element to the federation service. Therefore, a SAML 2.0-based federation service wouldn't know what to do with a WS-Federation request.
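The difference in messaging can be made concrete with a short sketch. The following Python fragment (all URLs, realm identifiers, and parameter values are illustrative, not from a real deployment) builds a WS-Federation sign-in URL, where each parameter is a separate query string element, and a SAML 2.0 HTTP-Redirect URL, where the whole XML message is deflated, base64-encoded, and carried in a single SAMLRequest element:

```python
import base64
import zlib
from urllib.parse import urlencode, parse_qs, urlparse

def wsfed_signin_url(sts_url, realm, context):
    """WS-Federation: parameters are individual query string elements."""
    params = {
        "wa": "wsignin1.0",   # action: sign-in request
        "wtrealm": realm,     # identifier of the relying party (application)
        "wctx": context,      # opaque context returned with the token
    }
    return sts_url + "?" + urlencode(params)

def saml2_redirect_url(idp_url, authn_request_xml):
    """SAML 2.0 HTTP-Redirect binding: the entire XML message is deflated,
    base64-encoded, and placed in the single SAMLRequest query element."""
    raw = zlib.compress(authn_request_xml.encode("utf-8"))[2:-4]  # raw DEFLATE
    encoded = base64.b64encode(raw).decode("ascii")
    return idp_url + "?" + urlencode({"SAMLRequest": encoded})
```

A SAML 2.0 federation service inflates and parses the SAMLRequest element, which is why a WS-Federation-style query string means nothing to it.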
WS-Federation and SAML 2.0 are web sign-in protocols that enable web applications to authenticate users over a web browser. The first step of this process is to establish a relationship between the application and the federation service. This relationship is called a federation trust.
This is depicted in the following diagram, where the web application on the right trusts authentications performed by the Identity Federation Service on the left.
After a trust is established, users can sign in to the web application. To initiate the log-on process, the application redirects the user's web browser to the log-on URL in the trusted Identity Federation Service. This is done via an HTTP 302 Redirect, so the user's web browser automatically browses to the log-on page after clicking the sign-in button.
After the user is redirected to the log-on page on the Identity Federation Service, the user authenticates by using their credentials. These credentials can take any form that the Identity Federation Service supports, such as user name/password, the Kerberos protocol, or even X.509 certificate authentication.
After the user is authenticated, the Identity Federation Service creates a security token that proves the user successfully authenticated. This security token also contains some additional information about the user (called claims), so that the application has some idea of who the user is.
This security token is signed by using an X.509 certificate to prevent a malicious user from tampering with its contents. The security token can exist in a variety of formats; the most common are SAML assertions and JSON Web Tokens (JWTs).
The Identity Federation Service then provides this security token to the application by returning it to the user's web browser in a web form. The user does not have to click the Submit button in the web form because it's hidden.
Instead, the Identity Federation Service loads some JavaScript into the browser that automatically clicks the Submit button for the user, which initiates an HTTP POST event to the application. The data that is POSTed to the application is the security token that the Identity Federation Service created.
Now that the web application has the security token, it can inspect it and check the certificate to make sure it was created by the same Identity Federation Service that it has a trust relationship with. The application can then extract the identity information from the security token and use it.
OpenID Connect
OpenID Connect (OIDC) is another web sign-in protocol that was ratified in February 2014. Its purpose was to take the basic OAuth 2.0 flows and add an identity layer to make it suitable for authenticating users to web applications. OIDC is a more modern web sign-in protocol, and it has distinct advantages over WS-Federation and SAML 2.0.
Similar to WS-Federation and SAML 2.0, the first step in OIDC is to establish a relationship between the application and the Identity Federation Service, which is also referred to as the OpenID Connect Provider.
During this process, the application registers with the OpenID Connect Provider. Unlike WS-Federation and SAML 2.0, however, the web application registers a credential with the OpenID Connect Provider, which will allow the web application to authenticate itself later during the token exchange.
Now that the application is registered, the user can sign in. When the user browses to the web application and clicks the Sign in link, the web application redirects the user to the OpenID Connect Provider.
To accomplish this, the web application uses a standard HTTP 302 Redirect message. All of the data that the OpenID Connect Provider needs to do the job of authenticating the user is included by the application in the query string elements of the URL in the redirect.
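As a sketch, the redirect target for an OIDC authorization code request might be assembled as follows. The endpoint and client values are hypothetical; the parameter names come from the OpenID Connect Core specification:

```python
from urllib.parse import urlencode, parse_qs, urlparse

def oidc_authorization_url(authorize_endpoint, client_id, redirect_uri, state, nonce):
    """Build the target of the HTTP 302 redirect. Everything the OpenID
    Connect Provider needs to authenticate the user rides in the query string."""
    params = {
        "response_type": "code",    # authorization code flow
        "scope": "openid profile",  # 'openid' marks this as an OIDC request
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,             # echoed back to correlate the response
        "nonce": nonce,             # bound into the ID token to detect replay
    }
    return authorize_endpoint + "?" + urlencode(params)
```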
In the next step, the OpenID Connect Provider prompts the user for authentication. This authentication challenge can take any form that the OpenID Connect Provider supports, including user name/password, the Kerberos protocol, or X.509 certificate authentication.
After authentication succeeds, the OpenID Connect Provider prompts the user for consent to release some of the user's identity data to the application. This is one of the main differences between OIDC and other web sign-in protocols.
After authentication, the OpenID Connect Provider creates an authorization code for the application. This authorization code is not the security token. Instead, the application can use this authorization code to retrieve the security token from the OpenID Connect Provider.
After the authorization code is created, the OpenID Connect Provider redirects the user to the web application with the authorization code in a query string parameter, so that the application can receive it.
Now that the web application has the authorization code, the user's browser is finished with its job. The application goes back to the OpenID Connect Provider and redeems the authorization code for an identity token. It does this by calling a RESTful web service on the OpenID Connect Provider and sending an HTTP POST method to the OpenID Connect Provider with the authorization code.
The web application authenticates to the OpenID Connect Provider by using the credential that it established during registration.
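The redemption request is an HTTP POST with a form-encoded body. A minimal sketch follows, with hypothetical values; the client_secret is the credential the application registered with the provider:

```python
from urllib.parse import urlencode, parse_qs

def token_request_body(code, redirect_uri, client_id, client_secret):
    """Form-encoded body POSTed to the provider's token endpoint to
    redeem the authorization code for an identity token."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,       # identifies the registered application
        "client_secret": client_secret,  # authenticates the application
    })
```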
As a final step, to prevent a man-in-the-middle attack, the web application verifies that the identity token was indeed created by the OpenID Connect Provider by checking the signature. The web application can then extract the identity data from the token and use it for the session.
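The verification step can be sketched as follows. Note the assumption: to keep the example self-contained, it uses an HMAC-based (HS256) signature with a shared key; a real OpenID Connect Provider such as Azure AD typically signs tokens with an asymmetric key (RS256) published in its metadata, so production code should use a vetted JWT library rather than this sketch:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def make_id_token(claims: dict, key: bytes) -> str:
    """Create a signed JWT (HS256 here, purely for illustration)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    sig = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_id_token(token: str, key: bytes, issuer: str, audience: str) -> dict:
    """Check the signature, then the issuer, audience, and expiry claims."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode("ascii")
    expected = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature check failed")
    claims = json.loads(b64url_decode(payload))
    if claims["iss"] != issuer or claims["aud"] != audience:
        raise ValueError("wrong issuer or audience")
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```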
WS-Trust
WS-Trust is a protocol that is used for authenticating to SOAP-based web services. This has traditionally been used in client/server applications that run on the desktop. The client application must collect the credentials from the user, and call the Identity Federation Service to exchange those credentials for a security token that the web service will accept.
Similar to the other protocols, the first step is for the web service to establish a trust with the Identity Federation Service. The client does not have a trust relationship because the client is not the consumer of the security token; its role is akin to the web browser in the previously discussed flows.
WS-Trust generally assumes that the client running on the user's device is a trusted entity. Therefore, the first step in the process is for the client to prompt the user for credentials. Like the other protocols, these credentials can take any form that the Identity Federation Service accepts, such as a user name/password or a Kerberos ticket. The client then sends a request to the Identity Federation Service called a Request for Security Token (RST).
The Identity Federation Service verifies the credentials, and sends a response back to the client called a Request for Security Token Response (RSTR). The RSTR contains the security token that the client will present to the web service.
The client then calls the web service, including the security token as an authenticator in the SOAP message. When the web service receives the message, it extracts the security token and verifies that it was issued by the Identity Federation Service that it trusts.
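A minimal RST body might look like the following sketch. The SOAP envelope and the security header that carries the user's credentials are omitted, and the service URL is hypothetical; the namespaces are those of WS-Trust 1.3:

```python
import xml.etree.ElementTree as ET

WST = "http://docs.oasis-open.org/ws-sx/ws-trust/200512"   # WS-Trust 1.3
WSP = "http://schemas.xmlsoap.org/ws/2004/09/policy"
WSA = "http://www.w3.org/2005/08/addressing"

def build_rst(applies_to: str) -> str:
    """Minimal RequestSecurityToken (RST) body asking the Identity
    Federation Service to issue a token scoped to the given web service."""
    rst = ET.Element(f"{{{WST}}}RequestSecurityToken")
    req_type = ET.SubElement(rst, f"{{{WST}}}RequestType")
    req_type.text = f"{WST}/Issue"            # request type: issue a token
    applies = ET.SubElement(rst, f"{{{WSP}}}AppliesTo")
    epr = ET.SubElement(applies, f"{{{WSA}}}EndpointReference")
    ET.SubElement(epr, f"{{{WSA}}}Address").text = applies_to
    return ET.tostring(rst, encoding="unicode")
```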
OAuth 2.0
OAuth 2.0 is a client/server protocol. Unlike WS-Trust, OAuth 2.0 is a RESTful protocol, which makes it most suitable for tablet or phone applications that need to access RESTful web services. Because most modern applications leverage RESTful web services, this protocol has gained a lot of popularity.
One thing to note about OAuth 2.0 is that its original purpose was not to be an identity protocol. Rather, it was authored with the intention of providing delegated access to web services for applications running on a user's device.
The following table describes the four main flows in OAuth 2.0, which can provide access tokens for a variety of application types:
OAuth Flow Type | Description
Authorization code flow | Perhaps the flow most often used by clients, this flow allows applications installed on a device to use RESTful web services. The password is not given to the application; rather, it's given to the OAuth server, which means that the application is not able to store the password.
Client credentials flow | This flow is typically used by a web application or web service that needs to access another web service. The user is not involved in this flow; rather, it's for one application authenticating to another.
Resource owner password flow | This flow allows an application installed on the device to collect credentials from the user and exchange them for an access token that can be used to access the web service. This should only be used with first-party applications, because the user is trusting the application with a user name and password in most cases.
Implicit flow | This flow is used for applications that are not installed on a device, for example, a JavaScript client that needs to access a web service.
For more details on the flows, please see the OAuth 2.0 specification, RFC 6749. Rather than diving into each of these flows, the remainder of this section details the authorization code flow, which is the flow that most customers will likely use.
Like the other protocols, the first step in an OAuth 2.0 flow is to establish a relationship with the Identity Federation Service (the OAuth server). This process is similar to the OpenID Connect registration process, in which the application registers an authentication secret. In addition to the web service, the client also registers with the OAuth server.
These registrations are performed by the developers of the respective applications.
When the user launches the application on their device and attempts to sign in, the client opens a web browser control and directs the user to the log-on page on the OAuth server. The user is presented with the log-on page, as well as a consent page that allows the client to use the web service on the user's behalf.
After the user authenticates and consents, the OAuth server creates an authorization code and returns it to the client as a query string parameter in an HTTP redirect. However, because the client is not a web browser, it pulls the authorization code out of the URL and does not follow through on the redirect.
For modern Windows applications, this is a special URL that begins with ms-app://, but it could be a standard https:// URL pointing to a server that doesn't exist. Because the application does not follow through on the redirect, it does not matter if the URL is real.
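Extracting the code from the intercepted redirect is straightforward; the following sketch uses a hypothetical ms-app:// redirect URI:

```python
from urllib.parse import urlparse, parse_qs

def extract_authorization_code(redirect_url: str) -> str:
    """The client watches for navigation to its registered redirect URI,
    cancels the navigation, and pulls the code out of the query string.
    The URL never has to resolve to a real server."""
    query = parse_qs(urlparse(redirect_url).query)
    return query["code"][0]
```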
After receiving the authorization code, the client returns it to the OAuth server and redeems it for an access token.
Now that the client is in possession of the access token, the client can call the web service, using the access token as an authenticator. Note that this authenticator does not authenticate the user. Instead, it authenticates the fact that the client has access to the web service on the user's behalf.
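In practice, the access token is typically carried as a bearer authenticator in the HTTP Authorization header. A sketch with hypothetical values:

```python
import urllib.request

def authorized_request(service_url: str, access_token: str) -> urllib.request.Request:
    """Attach the access token as a bearer authenticator. The token proves
    the client's delegated access on the user's behalf, not the user's
    identity directly."""
    request = urllib.request.Request(service_url)
    request.add_header("Authorization", f"Bearer {access_token}")
    return request
```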
Identity Federation Support in Azure AD
Azure Active Directory natively supports the following identity protocols and security token formats:
What this means is that there are a variety of protocols to choose from for integrating an application with Azure AD. This also means that for custom applications, it is likely that some portions of the application need to be rewritten to support one of these protocols.
If it's a commercial off-the-shelf (COTS) application and the application doesn't support any of these identity federation protocols, you'll need to use one of the other approaches outlined in this section. The following diagram provides the general approach for integrating applications into Azure AD:
Identity Federation Support in AD FS
Active Directory Federation Services (AD FS) is Microsoft's on-premises identity federation service product. Similar to Azure AD, AD FS supports WS-Federation, SAML 2.0, and a limited subset of the OAuth 2.0 protocol flows. The following diagram illustrates the protocol support built in to AD FS in Windows Server 2012 R2.
Note that unlike Azure AD, AD FS in Windows Server 2012 R2 does not provide support for OpenID Connect; however, this support is present in later versions of AD FS, beginning with Windows Server 2016.
Identity Federation APIs
In theory, if application developers read the identity protocol specification documents, they can write the application to support each protocol natively. This, however, is a very large undertaking because these specifications aren't simple, and there are several cases that divert from the protocol specifications.
The better approach for an application developer is to leverage an existing application programming interface (API), which takes on all of the complexity associated with following the protocol specification. These APIs typically come in the form of a library that the developers can use.
When AD FS 2.0 was released, we also released one of these APIs for .NET applications. This API was called the Windows Identity Foundation (WIF). WIF came with a variety of .NET libraries that made it simple for developers to interact with the WS-Federation and WS-Trust protocols. The problem with including all of the protocol support in a single monolithic library is that it's difficult to update the library to support newer and emerging protocols.
Also, in recent years, there's been an effort to decouple the .NET Framework from the Microsoft IIS server. As a result, newer interface methodologies for .NET have come forth. In particular, the Open Web Interface for .NET (OWIN) is now the preferred way to provide federation protocol support to .NET applications.
OWIN provides a modular platform, and Microsoft has started building modules to support each of the identity federation protocols. Developers can integrate these modules into their applications without a fee, and they can enable identity federation support for the application very quickly.
Recommended: Use OWIN modules instead of the Windows Identity Foundation SDK for adding identity federation protocol support to .NET applications. |
The Azure AD Application Proxy enables external access for on-premises applications. When a user connects to an on-premises application from outside the network, the connection is made directly with Azure and conveyed via proxy into the private network via an application proxy agent.
This allows a customer to externalize their on-premises applications through Azure AD and provide pre-authentication and single sign-on. For more information about Azure AD Application Proxy, please see Using Application Proxy to Publish Applications for Secure Remote Access.
The Azure AD Application Proxy is a good solution for applications that are on-premises and rely on Windows authentication. Because it can provide Kerberos delegation, users who are outside the network can authenticate their credentials in Azure AD before the connection with the application is established. By doing this, you can establish policies for the application, such as a requirement for Multi-Factor Authentication.
Recommended: Do not use the Azure AD Application Proxy for users who are external to a customer's organization. For Kerberos constrained delegation to work, a shadow account needs to be provisioned and managed in the on-premises Active Directory forest. |
Mandatory: Azure AD Application Proxy requires an Azure AD Basic or an Azure AD Premium subscription. |
Password Vaulting solves a very specific problem of integrating with web applications that don't support any type of Single Sign-On (SSO) solution. With password vaulting, users store their passwords for applications in Azure AD. The application is integrated with the Azure AD Access Panel, which is an application launcher in Azure AD.
When the user launches the application from the Access Panel, a web browser plug-in is launched. The plug-in retrieves the password from Azure AD, enters it into the sign-in form, and clicks the Sign In button for the user. By using this technique, users don't have to enter credentials in the web applications. For more information on password vaulting, please see Application Access Enhancements for Azure Active Directory.
Mandatory: When using password-vaulted applications, a browser plug-in is used. Therefore, a user cannot run their web browser in "private" mode. |
Recommended: Use password vaulting for integrating applications that use forms-based authentication with Azure AD. These applications are typically difficult to integrate with SSO solutions because they do not accept standard SSO protocols. |
One of the main benefits of password vaulting is that it can integrate with almost any application that has a sign-in page, even if it's not an application that is integrated into the Application Gallery in Azure AD. So when a customer wants to integrate every application with Azure AD, password vaulting is an integral piece of the solution.
Password vaulting is not a perfect solution, though, and it has its challenges. When working with customers that want to use password vaulting, ensure that they understand the following key points:
References
A Guide to Claims-Based Identity and Access Control (2nd Edition)
OpenID Connect Website
Using Application Proxy to Publish Applications for Secure Remote Access | https://msdn.microsoft.com/en-us/library/azure/dn768219.aspx
Application Access Enhancements for Azure Active Directory | https://msdn.microsoft.com/en-us/library/azure/dn308588.aspx#bkmk_passwordsso
Traditionally, on-premises applications integrate with a customer's Active Directory implementation, or some other well-known directory in the environment. Although customers don't have 100% of their applications integrated with Active Directory, they usually prefer to do it if the application supports it.
Some of the key reasons a customer might decide to integrate an on-premises application with an Azure identity solution are to:
Regardless of the reason, there are two ways to integrate an on-premises application with Azure Active Directory:
Most of the time, when federating an on-premises application, the sign-in and sign-out portion of the application will have to be rewritten to handle one of the identity federation protocols discussed earlier in this document.
To accomplish this, the developer needs a library that implements the protocol. Federated identity libraries aren't available for every application platform, but most platforms have them.
For .NET 3.5 and .NET 4.0, we recommend you use the WS-Federation protocol support that is available in Windows Identity Foundation (WIF). For more information about WIF, please see 7 Hours of Recordings from the WIF Workshops.
Recommended: For .NET 4.0 and .NET 3.5 applications, use Windows Identity Foundation with the WS-Federation protocol. |
Mandatory: We do not have any native WS-Federation libraries available for .NET 3.0 or earlier. Customers must update their applications to .NET 3.5 at a minimum to use the WIF API. |
For applications that use .NET 4.5, we recommend using the OWIN libraries for WS-Federation or OpenID Connect. For more information, please see:
Recommended: For .NET 4.5 applications, use the OWIN OpenID Connect module, if possible. |
For Java applications, there are two approaches. First, if the Java application is running on a J2EE server, such as Oracle WebLogic, it's possible that the J2EE server natively supports one or more identity federation protocols.
If that's the case, we recommend that you use the native support of the web platform. We've found that most J2EE platforms support SAML 2.0, and usually, a few additional protocols. If the Java application is running on Apache Tomcat, the protocol integration can be done with a servlet, such as the Oracle OpenSSO Fedlet.
Recommended: For Java applications running on a J2EE server, use the native SAML 2.0 support that most platforms have. If the application is running on Apache Tomcat, use the Oracle OpenSSO Fedlet, or use a suitable servlet alternative with SAML 2.0 support. |
If re-writing the application to support an identity federation protocol is not possible, the Azure AD Application Proxy is an alternative that you can use for on-premises applications. When integrating with the Azure AD Application Proxy, the application gets the following benefits:
When using the Azure AD Application Proxy, the application is still authenticating against the on-premises Active Directory—the application proxy does not change that. However, the application proxy is able to pre-authenticate users with Azure AD, and then facilitate SSO to the on-premises Active Directory by using a protocol transition technology called Kerberos constrained delegation.
Because there is still a dependency on Active Directory, all users of the application must have an active Active Directory account in the customer's forest. Because of this, Azure AD Application Proxy is most suitable for internal users who access on-premises applications over the Internet.
References
7 Hours of Recordings from the WIF Workshops
WS-Federation in Microsoft OWIN Components – A Quick Start
OWIN Security Components in ASP.NET: OpenID Connect
When applications are running in IaaS, they are actually running on standard web servers (such as IIS) on virtual machines in Azure. If you consider Azure IaaS as a second datacenter that extends a customer's on-premises network, applications running in Azure IaaS are going to have similar options to applications running on-premises. Therefore, the options outlined in previous sections should be considered.
When deciding which integration approach to take, it's first important to understand the customer's motivation for wanting to move the application to IaaS.
If the customer is moving the application to IaaS for physical reasons, the application will likely behave much as it would if it were hosted on-premises. Some examples of physical reasons are:
If any of these scenarios are the motivation for the customer to move the application to IaaS, the best approach likely would be to also extend the on-premises Active Directory forest into IaaS. This allows the application to leverage Active Directory without any modifications to the application or the identity management processes for the customer.
There is, however, some additional engineering required for planning to extend Active Directory to Azure IaaS, in addition to some administrative burden. Please see the previous section for more guidance about how to extend Active Directory into IaaS.
Another potential motivation for customers is to move the application to the cloud so that they can provide access to the application for employees without VPN access. As the workforce becomes increasingly mobile, the motivation for employees to access internal applications from their personal devices is growing. To accomplish this goal, there are three potential solutions.
Proxy an Active Directory-Integrated Application
The Azure AD Application Proxy can be used to make an application on a private network accessible over the Internet. If the application is integrated with Active Directory, the customer should use Azure AD pre-authentication. Otherwise, a user on the Internet will have a poor sign-in experience.
They will see the uninformative HTTP 401 credential dialog from the web browser, instead of a web-based sign-in page. This may also prevent the user from getting a SSO experience, and could cause the user's credentials to be stored in the Windows Credential Manager on their personal computers.
Recommended: If the application is Active Directory-integrated, enable pre-authentication at the Azure AD Application Proxy. |
In addition, the customer should deploy Active Directory into the Azure IaaS tenant with the application, as discussed in the previous section. Unless this happens, the application will not be able to authenticate users if the network link to the on-premises environment is down.
Recommended: If the application is Active Directory-integrated, extend the Active Directory forest to the IaaS tenant. This ensures that the application can authenticate users over the Internet in the event that the S2S VPN connection is down. |
Proxy a Forms-Based Application
The Azure AD Application Proxy can also proxy applications that do not integrate with Active Directory. It is expected that these applications have their own sign-in page, which authenticates users against some other type of credential store.
If this is the case, the customer will likely want to integrate the application in pass-through mode (without pre-authentication). Otherwise, the user will get two password prompts—one for Azure AD and another for the application.
Recommended: If the application uses forms-based authentication, proxy it in pass-through mode. |
If the customer wants to also achieve SSO for these applications, they can use the password vaulting technique discussed in previous sections.
Recommended: To achieve SSO for forms-based applications, use password vaulting. |
Proxy a Claims-Based Application
If the application already understands one of the identity federation protocols discussed in previous sections, Azure AD Application Proxy can proxy the application either with pre-authentication or in pass-through mode.
If the application is published in pass-through mode, Azure AD will not interfere with the authentication process. Instead, the application walks the user through the process of performing federated authentication. Customers would likely want to use this option if the application is claims-based, but is not federated with Azure AD.
Recommended: For claims-based applications that are federated with something other than Azure AD, proxy them in pass-through mode. |
If the application is federated with Azure AD, the customer should enable pre-authentication. Pre-authentication provides an additional layer of protection for the application, and in this case, the user experience would not be compromised as a result.
Recommended: For claims-based applications that are federated with Azure AD, use pre-authentication. |
If the customer wants to move an application to IaaS and make it available for partners to use, the solution becomes a bit more difficult.
The Application Already Manages Partner Identities
If the application already manages partner identities, it has an existing identity repository and authentication mechanism for partner users. In this case, the customer should simply publish the application with the Azure AD Application Proxy in pass-through mode. The application performs the authentication process end-to-end, and the application proxy makes the application accessible from outside the intranet.
The Application Does Not Already Consume Partner Identities
If the application only allows employee access, and the customer wants to now open it up to partners, the difficulty of the solution increases. In general, the application needs to be updated to support an identity federation protocol, which is previously discussed. Depending on the complexity of the application, this may not be a minor undertaking.
In general, applications that are deployed in Azure PaaS do not have access to Active Directory. The PaaS platform does not support Integrated Windows Authentication, so integrating these applications with Active Directory is not possible. There are a few ways to support these applications.
The first approach is for the application to handle the identity layer entirely on its own; however, this is typically not desired, because it carries a lot of consequences for the developers and the application support team. This approach is not recommended as the first option.
Recommended: Do not write the identity layer directly into a PaaS application. |
If a customer chooses to use this approach, password vaulting can be used to provide users with a SSO experience for these applications. The one requirement is that the application must use a sign-in form with a user name and password field.
Recommended: If a customer builds the identity layer into the application, use password vaulting to provide users with a SSO experience. |
Mandatory: To integrate a PaaS application with password vaulting, the application must have a sign-in form with a user name and password field. |
Another approach for the PaaS application is to directly integrate it with Azure AD. To perform this integration, the application needs to support federated identity and must integrate over one of the protocols that are discussed earlier in this document.
By integrating with Azure AD, the PaaS application can leverage some of the identity service capabilities that Azure AD provides for developers.
Authentication
When a PaaS application is federated with Azure AD, the user is taken to the Azure AD sign-in page to authenticate. Azure AD can choose to sign in the user, or if the customer has integrated Azure AD with another identity provider (such as AD FS), the user can authenticate to an on-premises Identity Federation Service.
The benefit to the application is that the developer doesn't have to be concerned with how the authentication process happens. Additionally, if the customer is using the Azure AD version of Azure Multi-Factor Authentication, Azure AD can perform MFA on behalf of the application.
Authorization
Another benefit for integrating PaaS applications with Azure AD is that user and group-based authorization can be performed. If a customer is synchronizing group memberships into Azure AD, those group memberships can be used to grant users access to the PaaS application. Otherwise, customers can manage the group memberships in Azure AD directly.
Mandatory: The customer must have an Azure AD Basic or an Azure AD Premium license to use group-based access control.
Identity Data
One benefit that a PaaS application gets from Azure AD that it does not get from AD FS is access to the identity data repository. Applications can store identity data in Azure AD by using a RESTful interface called the Graph API. If an application needs to store data about users for out-of-band use, it can use Azure AD instead of having its own identity data store.
Access Panel
When the PaaS application is directly integrated with Azure AD, it can be displayed in the Azure AD Access Panel. The Access Panel is an application dashboard that serves as a single portal for all of the user's applications and self-service identity management capabilities.
There are two primary identity functions that are common to most SaaS applications:
When integrating SaaS applications with an Azure-based identity solution, the options are going to vary greatly, depending on what the SaaS provider supports. Because a SaaS application isn't owned by the organization that is using it, there isn't much flexibility in changing the application to support a particular integration approach.
Therefore, a comprehensive enterprise approach for integrating SaaS applications is going to require a variety of integration techniques. There are two authentication methods, in particular, that Azure AD offers to get SaaS applications integrated with the enterprise:
Azure AD offers a catalog of pre-integrated applications.
Federation through Azure AD
An increasing number of SaaS applications are supporting identity federation standards and SAML 2.0, in particular. The main benefit of integrating a SaaS application with Azure AD is that Azure AD can become the hub of application integration.
Customers who use 100 applications would normally have to integrate each application individually with their on-premises identity system. However, Azure AD can integrate with the 100 applications so that the on-premises systems don't have to. There only needs to be a single integration between the on-premises systems and Azure AD, which many customers already have in place through Office 365.
A federated identity trust also provides users with a first class experience when signing in to SaaS applications. If the customer uses AD FS with Azure AD, the user gets a seamless desktop SSO experience to the application when accessing the application from the intranet.
A user who is signed in to the AD domain on the intranet is signed in to the SaaS application seamlessly, without being prompted again for credentials.
It's important to note that not every SaaS application offers an option to perform a federated identity trust. Therefore, only a subset of the SaaS applications that are pre-integrated with Azure AD use this approach.
When integrating SaaS applications with Azure AD, it's important to understand whether Azure AD uses identity federation with the application. If it does not, the application uses the password vaulting approach described in the Microsoft Azure Identity section, which provides a very different user experience.
Password Vaulting
Password vaulting is an alternative approach for SaaS applications that don't support identity federation. Almost every application has a sign-in form, where users enter their credentials and click the Sign In button.
Azure AD can securely store the passwords that users have for these applications. When the user launches the application from the Azure Access Panel, a browser plug-in securely retrieves the password from Azure AD, fills in the sign-in form, and clicks the Sign In button for the user.
One main limitation of this approach is that the application must be launched from the Access Panel. If the user browses to the application first and clicks its Sign In button directly, the browser plug-in will not intervene to sign in the user.
In addition to authentication, most SaaS applications require that each user has an account in their identity repository. Even if the application is fully federated, there is often a requirement for an account so that a subscription can be associated with an individual user.
Any customer integrating with a SaaS application should be prepared to populate the SaaS provider with user accounts for the people using the application. There are two ways to approach this:
Manual Account Provisioning
Most SaaS providers offer a web interface that can be used for creating user accounts and assigning them roles, subscriptions, and attributes. This can be a time-consuming process, but smaller customer organizations might not be averse to using this approach.
One drawback with the manual method is that many organizations don't off-board employees or contractors from all of their applications as employees leave the organization. There's a lot of room for mistakes when manually managing accounts in the SaaS provider, because it's not likely that administrators will remember to delete the accounts for former contractors and employees in the SaaS applications. This not only costs more money in terms of subscription fees, but it is also a big security problem.
Account Synchronization
With account synchronization, there is an automated process that copies the identities on-premises into the SaaS provider's system. Sometimes, SaaS providers give their customers their own synchronization tool (such as that provided by Azure AD). Other times, SaaS providers simply provide an API for their customers, and ask that they use their own synchronization tool (such as Microsoft Identity Manager) to interface with the API.
When a customer uses Azure AD to integrate with SaaS providers, this process can be greatly simplified. Because Azure AD already has a copy of all the user accounts for a customer, it can synchronize those accounts into the SaaS app on behalf of the customer. In this way, Azure AD becomes not only the hub of identity federation, but also the hub of application provisioning.
Recommended: The provisioning capability is only present for a small subset of the SaaS applications that integrate with Azure AD. When discussing this with customers, it's prudent to check the Azure AD Application Gallery to determine if the application supports provisioning from Azure AD.
If account provisioning through Azure AD isn't available for a SaaS application that a customer wants to integrate with, the customer needs to provision those accounts in an out-of-band process on their own.
Azure AD plays a critical role for CSPs, providing the identity control plane for access to Azure management portals and APIs. This allows authorized CSP agents to provision, manage, and support tenant Azure subscriptions.
Each CSP has a provider-specific Azure AD tenant (Provider Tenant) created automatically upon registration in the Microsoft Partner Center Portal. The Provider Tenant uses the standard Azure AD domain naming convention of <tenantname>.onmicrosoft.com as the default domain name. Additional custom domains may be added to the tenant to improve the user sign-on experience. The Provider Tenant can be managed through the Partner Center Portal or through standard Azure AD administration interfaces.
The Provider Tenant stores user identities for CSP staff and is not intended to store customer user identities. CSP users authenticate against the Provider Tenant to perform administrative functions through management tools and APIs.
Provider Directory Integration
The Provider Tenant can be integrated with a provider's on-premises Active Directory to enable single sign-on or simplified sign-on, self-service password reset with on-premises write-back, and other Azure hybrid identity capabilities for provider users. Directory integration for the Provider Tenant may include a combination of directory synchronization, identity federation, and password synchronization. This is achieved using Azure AD Connect between the provider's on-premises Active Directory and the Provider Tenant.
Additional Considerations
The Provider Tenant has the same capabilities as a standard Azure AD tenant. As such, the identity principles described throughout this document apply to the Provider Tenant. This includes integration of an on-premises CSP Active Directory with the Provider Tenant, multi-factor authentication, reporting, monitoring, and more.
A role-based security model permits degrees of access and control over CSP customers and over the Provider Tenant itself. A CSP user's role can fall into one of the following general categories:
A CSP user can also be both an agent and an admin, but this role assignment is not recommended for security reasons.
Provider Agent Roles
Agent roles are intended for CSP staff who need to perform management functions over a customer's Azure subscription. Management functions include technical capabilities such as delegated administration as well as non-technical capabilities such as billing and subscription management.
Role in Partner Center | What they can do | What they can't do
Admin agent | |
Sales agent | |
Helpdesk agent | |
Provider Admin Roles
Admin roles are intended for CSP staff who perform management functions over the CSP's Provider Tenant. These roles mirror the standard Azure subscription administration roles.
Role in Partner Center | What they can do
Global admin |
Billing admin |
User management admin |
Customer Azure Active Directory
A customer Azure AD tenant (Customer Tenant) is created when the customer is provisioned through the Partner Center or through the CREST API. When a new Azure CSP subscription is ordered for the customer, the provider's AdminAgent Group is configured as the owner of the subscription. The customer has no rights to manage the CSP subscription unless specifically granted by the provider.
Admin On Behalf Of (AOBO) is an administrative construct that allows delegation of administration to internal or external entities. Office 365 and Intune honor the Azure AD Tenant Admin role. The AdminAgents group in the Provider Tenant is made a member of the Tenant Admin role in the Customer Tenant. This allows members of the AdminAgents group in the provider's Azure AD tenant to perform administrative functions in a customer's Office 365 and Intune instances.
In a similar fashion to Office 365, the Azure CSP subscription is provisioned into the Customer Tenant. However, Azure subscriptions have "subscription-scoped" boundaries and are not managed by Tenant Admins. Instead, the provider AdminAgents group is directly granted the Owner role in Azure CSP subscriptions. This allows members of the AdminAgents group in the Provider Tenant to perform administrative functions in a customer's Azure CSP subscription.
Management tools vary depending on the service being managed. A link to management tools for each customer can be accessed from the Partner Center on the main customer page and in the Service Management section for each customer.
At the time of this writing, management of a customer's Azure AD was not supported directly through the Partner Center portal and links. The URL below can be used as a workaround for Azure AD management until the Partner Center Azure Active Directory management tool links are operational.
Azure AD Management URL Format:
https://manage.windowsazure.com/[provider tenant name]#Workspaces/
ActiveDirectoryExtension/Directory/[tenant ID]/directoryQuickStart
Azure AD Management URL Example:
https://manage.windowsazure.com/FabrikamCSP.onmicrosoft.com#Workspaces/
ActiveDirectoryExtension/Directory/3477e0d3-fecd-4586-9917-d34807715a6f/
directoryQuickStart
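The URL format above can be assembled programmatically. The following Python sketch builds the workaround URL from a provider tenant name and a Customer Tenant ID, using the Fabrikam values from the example; the helper function name is hypothetical, not part of any Azure API.

```python
# Hypothetical helper: build the Azure AD management workaround URL
# from a provider tenant name and a Customer Tenant ID (Microsoft ID).
def build_aad_management_url(provider_tenant: str, tenant_id: str) -> str:
    base = "https://manage.windowsazure.com"
    return (f"{base}/{provider_tenant}"
            f"#Workspaces/ActiveDirectoryExtension/Directory/"
            f"{tenant_id}/directoryQuickStart")

# Values from the example above
url = build_aad_management_url(
    "FabrikamCSP.onmicrosoft.com",
    "3477e0d3-fecd-4586-9917-d34807715a6f")
print(url)
```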
The Customer Tenant ID can be found in the Partner Center as the customer's Microsoft ID.
Each CSP customer has a customer-specific Azure AD tenant (Customer Tenant) created automatically when the CSP provisions the customer in the Microsoft Partner Center Portal or through the CREST API. The Customer Tenant uses the standard Azure AD domain naming convention of <tenantname>.onmicrosoft.com as the default domain name, and additional custom domains may be added to the tenant to improve user sign-on experiences.
The Customer Tenant contains identities used by the customer to manage CSP subscription resources (if authorized by the CSP) and to authenticate to services and applications within the CSP subscription. The Customer Tenant is an Azure Active Directory tenant and all guidance found earlier in this document applies to the Customer Tenant.
Service Providers connect their customers to their CSP Azure subscriptions in one of two ways – Connect To and Connect Through. In both models, customers may use their Customer Tenant to store independent identities with separate credentials. They may also desire identity integration between the Customer Tenant and an Active Directory identity store. The identity integration approach differs slightly between the two models.
"Connect Through" Identity Integration
In the Connect Through model, the Provider creates a direct connection between the provider datacenter and the provisioned customer Azure subscription using Site-to-Site VPN over the provider's network. This connectivity scenario requires that the customer pass through a provider network to access CSP provisioned Azure subscription services, using a network connection that is created, owned, and managed by the service provider.
For Connect Through customers, it is assumed that the provider has a previously-established Active Directory tenant identity store for each tenant. This identity store should be integrated with the Customer Tenant using Azure AD Connect. Guidance on integrating on-premises Active Directory with Azure Active Directory can be found earlier in this document. This guidance should be consulted to determine if directory integration is desirable for the Connect Through customer, and if so, how best to implement the integration. For the purposes of integration, the provider-hosted tenant identity store for a Connect Through customer should be considered the customer's on-premises Active Directory.
"Connect To" Identity Integration
In the Connect To model, the provider creates a direct connection between the customer datacenter and the provisioned customer Azure subscription using Site-to-Site VPN (or, in the future, an ExpressRoute connection) over the customer's network. This connectivity scenario requires that the customer connect directly through a customer network to access CSP provisioned Azure subscription services, using a direct network connection that is created, owned, and managed either wholly or in part by the customer.
For Connect To customers, it is assumed that the provider does not currently have an Active Directory tenant identity store established for the customer. If the customer has an existing Active Directory, the provider may consider integrating the customer's Active Directory environment with the Customer Tenant for management of the customer's CSP subscription or for user authentication to CSP-provided services. Some customers may choose not to implement this integration and will instead use the Customer Tenant to store independent identities with separate credentials. Guidance on integrating on-premises Active Directory with Azure Active Directory can be found earlier in this document. This guidance should be consulted to determine if directory integration is desirable for the Connect To customer, and if so, how best to implement the integration.
The Azure Active Directory Graph API provides programmatic access to Azure AD through REST API endpoints. Applications can use the Graph API to perform create, read, update, and delete (CRUD) operations on directory data and objects. This enables CSP partners to programmatically access Azure Active Directory to automate many end-user management functions, including user license assignment, user role assignment, and managing domains. Programmatic access using Graph and other APIs allows for comprehensive automation of management activities and the ability for CSPs to integrate Partner Center functionality into custom applications and management tools for use by CSP agents or customers.
Activity | Partner Center | Azure Portal | CREST API | Graph API | Windows PowerShell
Manage Reseller Users | ✔ | ✔(e) | ✔ | ✔ |
Manage Customer Users | ✔(e) | ✔ | | |
Manage Customers | ✔ | ✔ | | |
Manage Customer Profiles | ✔ | ✔ | | |
Manage Orders | ✔ | ✔ | | |
Manage Subscriptions | ✔ | ✔ | | |
Manage Entitlements | ✔ | ✔ | | |
Manage Event Streams | ✔ | | | |
Perform Admin Tasks | ✔(n) | ✔* | | |
View Service Health | ✔(n) | | | |
View Service Requests | ✔(n) | | | |
(e) Denotes the existing Azure portal, i.e. https://manage.windowsazure.com
(n) Denotes the new preview Azure portal i.e. https://portal.azure.com
* Includes other Azure management interfaces like the Azure CLI and Azure REST API
The Graph API for CSP Partners is provided to allow for the automation of end-user management functions. Note that the Graph API for CSP Partners is similar to the existing Azure Active Directory (AD) Graph API, but it includes the ability to be used by partners who have a delegated admin relationship with their customers.
Application Registration
Before you can call the Graph API from an application, the application must be registered with Azure AD. Registration generates a security key for the application to use when accessing the Graph API and allows you to grant specific permissions to the application. Application registration can be done from the Azure portal. For more details on application registration, please see Integrating Applications with Azure Active Directory.
Multi-Tenant Access
Within the Azure AD application configuration, you may decide whether to allow sign-in to your application from other Azure AD tenants by selecting the multi-tenant option. In contrast to a single-tenant application, external users will need to consent to your application before it can query their directory with the Graph API.
Application Permissions
By default, applications have access to authenticate users in the Provider Tenant and read the authenticated user's profile. This does not permit access to Graph API for reading or writing other directory objects. To enable access to Graph API, you must grant the appropriate permissions (Read directory data, Read and write directory data) for Windows Azure Active Directory.
Acquiring a Security Token
Acquiring a security token from Azure Active Directory is the first step an application takes before making calls into Graph API. The security token is acquired once at the start of a session and can be used in multiple subsequent calls to Graph API.
Two pieces of information are needed for an application to obtain a security token:
Both values are obtained from the application's configuration page within the Azure AD management portal. The client ID can be read at any time; keys can be generated and are displayed only once. After that, the keys remain active but cannot be viewed again.
The following example shows how to obtain a security token from Azure Active Directory. Replace the client ID value with your application's client ID and replace the client secret value with a valid API key generated for your application.
POST
https://login.windows.net/contoso.com/oauth2/token?api-version=1.0
HEADERS
Content-Type: application/x-www-form-urlencoded
BODY
grant_type=client_credentials&resource=https%3a%2f%2fgraph.windows.net&client_id=52752c8e-d73c-4f9a-a0f9-2d75607ecb8e&client_secret=qKDjII5%2FK8WyKj6sRo5a5vD6%2Bm74uk1A%2BpIlM%3D
RESPONSE: 200 OK
The security token will be returned to the calling application if all values are valid.
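The raw HTTP request above can be sketched in Python. This fragment only builds the form-encoded request body for the OAuth 2.0 client-credentials grant; the endpoint is the one from the example, the secret is a placeholder, and actually sending the POST (with Content-Type: application/x-www-form-urlencoded) is described in a comment rather than executed.

```python
from urllib.parse import urlencode

# Token endpoint from the example above (contoso.com tenant).
TOKEN_URL = "https://login.windows.net/contoso.com/oauth2/token?api-version=1.0"

def build_token_request_body(client_id: str, client_secret: str) -> str:
    # OAuth 2.0 client-credentials grant; the resource identifies the Graph API.
    return urlencode({
        "grant_type": "client_credentials",
        "resource": "https://graph.windows.net",
        "client_id": client_id,
        "client_secret": client_secret,
    })

# Client ID from the example; the secret here is a placeholder, not a real key.
body = build_token_request_body("52752c8e-d73c-4f9a-a0f9-2d75607ecb8e", "s3cret")
# POSTing this body to TOKEN_URL with
# Content-Type: application/x-www-form-urlencoded returns 200 OK with the
# security token in the JSON response body when all values are valid.
```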
Getting user information
After obtaining an Azure AD security token, the Graph API can be called to read and write information from Azure AD.
The following example shows how to get user objects through the Graph API. Replace the Authorization: Bearer value with the security token returned from the prior call to Azure AD.
GET
https://graph.windows.net/contoso.com/users?api-version=2013-11-08
HEADERS
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik5HVEZ2ZEstZnl0aEV1T….
Content-Type: application/json;odata=minimalmetadata
RESPONSE: 200 OK
User objects in JSON format will be returned if the request is successful.
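As a sketch of what a caller might do with that response, the following Python fragment parses an illustrative OData payload of the shape the users query returns; the sample users and attribute values are invented for the example.

```python
import json

# Illustrative Graph API response: results arrive wrapped in an OData
# "value" array, and each user object carries attributes such as
# displayName and userPrincipalName.
sample_response = json.loads("""
{
  "value": [
    {"displayName": "Alice Example", "userPrincipalName": "alice@contoso.com"},
    {"displayName": "Bob Example",   "userPrincipalName": "bob@contoso.com"}
  ]
}
""")

# Collect the user principal names from this page of results.
upns = [user["userPrincipalName"] for user in sample_response["value"]]
print(upns)
```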
Azure AD PowerShell can be used to manage objects in Azure AD, including objects in both the Provider Tenant and Customer Tenants. When managing objects in the Customer Tenant using an Azure Active Directory account from the Provider Tenant, you must provide some additional information to set the proper context for commands. By default, Azure AD PowerShell commands will operate against the Azure AD tenant of the authenticated user.
Before managing a Customer Tenant from PowerShell, first obtain the Customer Tenant ID. This can be found in the Partner Center as the customer's Microsoft ID.
The Tenant ID is then used to target PowerShell operations to the appropriate Azure AD tenant instead of the default tenant for the authenticated user. The Tenant ID can be provided as a parameter to most Azure AD PowerShell commands. For example:
# Prompt for credentials of a user in the Provider Tenant
$msolcred = Get-Credential
# Connect to Azure AD with the MSOnline module
Connect-MsolService -Credential $msolcred
# Target the Customer Tenant by its Tenant ID (the customer's Microsoft ID)
Get-MsolUser -All -TenantId 120355bc-ba8c-4207-b2db-80b2cb893a54
For more information on Azure AD PowerShell, please see Manage Azure AD using Windows PowerShell.
Security in the cloud-enabled world is full of determined adversaries who are constantly conducting targeted cyber-attacks. This threat environment is captured in a Microsoft whitepaper entitled Determined Adversaries and Targeted Attacks. These cybersecurity threats make securing a workload more challenging because these adversaries are adept at exploiting vulnerabilities in any layer.
An organization can leverage a trustworthy public cloud provider, such as Microsoft, to shift some of the security burden for lower-level components to the provider, so that it can focus its security resources on higher-layer and higher-value security operations. To help illustrate the challenges most organizations face in this space, Microsoft has published a Cloud Security Framework that maps customer responsibilities for security in a cloud-enabled world.
Moving the first enterprise workload to a public cloud service, such as Microsoft Azure, represents a significant change for several aspects of how IT services will be provided and secured by the organization.
The most significant change for the security organization is likely learning how to establish, validate, and maintain trust for the cloud platform provider. Although this document covers security topics for storage, network, compute, and identity throughout each topic area, this section will attempt to summarize the security strategies and approaches most customers can take when planning to adopt Microsoft Azure.
To facilitate rapid onboarding and the realization of security benefits that come from hosting workloads in the cloud, the security strategy in this reference architecture includes two target security levels:
Some key technical security and risk changes faced by organizations that transition from a traditional on-premises workload hosted on physical or virtual hardware to a hosted public cloud model include:
As organizations evaluate their security posture in the face of these changes, they need to decide what risks must be addressed as part of the Azure onboarding planning, which risks should be addressed independently, and which can be addressed after onboarding.
The primary sources for new or increased risk when moving a workload to the cloud are:
The following sections provide an overview of these risks, their considerations, and potential mitigations.
Microsoft Azure Infrastructure-as-a-Service (IaaS) is analogous to an on-premises virtualization solution in many ways. On-premises and public cloud IaaS solutions provide similar services. At the platform level, they effectively have full technical control of all hosted workloads, including data, applications, operating systems, associated secrets, and virtual devices.
The key security difference of an IaaS solution hosted in Azure is that the administrative interfaces are potentially accessible through public Internet interfaces. A typical on-premises virtualization deployment may be restricted to only a corporate Intranet or to a dedicated management VLAN.
This change of scope potentially results in a greater attack surface for a given system, and therefore a greater likelihood of attack on the system. This risk exposure is always present for Azure IaaS and PaaS solutions, but it may or may not be present given the current state of a workload.
Due to this consideration, additional measures must be taken to reduce this risk and protect against the accidental exposure of administrative controls to external services.
This risk is specific to publicly hosted IaaS workloads that are transitioned to Azure by using a "lift and shift" migration approach. Typically, these systems have internal controls that were tuned for their previous security posture, often favoring ease of use and ease of management over protection from external threats.
If not configured correctly during post-migration, a virtual machine may accept traffic from the Internet for remote management, including Remote Desktop (TCP 3389) and Windows PowerShell (TCP 5986). This potentially allows attackers to attempt to connect and authenticate to these resources, enabling them to abuse or test stolen credentials and attempt dictionary attacks or brute force attacks on the virtual machine passwords.
This is a "net new" risk to the workload when migrating to Azure because most on-premises virtual machines and physical servers are protected from unsolicited network traffic by Internet firewalls and other security appliances.
Mitigating threats such as these is critical to any migration activity and should be included in the post-migration tasks when transitioning workloads from on-premises to an Azure subscription.
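As an illustration of this kind of post-migration check, the following Python sketch flags inbound rules that expose the remote management ports described above (Remote Desktop on TCP 3389, Windows PowerShell remoting on TCP 5986) to the Internet. The rule structure and function name are hypothetical, not an Azure API.

```python
# Ports commonly left open by lift-and-shift migrations.
MANAGEMENT_PORTS = {
    3389: "Remote Desktop",
    5986: "Windows PowerShell (WinRM over HTTPS)",
}

def find_exposed_management_ports(rules):
    """Return (port, description) for each rule that exposes a
    management port to the Internet. Each rule is a dict with
    illustrative 'port' and 'source' keys."""
    findings = []
    for rule in rules:
        if rule["port"] in MANAGEMENT_PORTS and rule["source"] == "Internet":
            findings.append((rule["port"], MANAGEMENT_PORTS[rule["port"]]))
    return findings

rules = [
    {"port": 443,  "source": "Internet"},    # HTTPS: expected for a web workload
    {"port": 3389, "source": "Internet"},    # RDP open to the Internet: flagged
    {"port": 5986, "source": "10.0.0.0/8"},  # WinRM restricted to the intranet: fine
]
print(find_exposed_management_ports(rules))
```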
Organizations may need to rapidly adopt Azure services because of business priorities, budgetary reasons, security incidents, project schedules, or other reasons specific to the organization. When planning security for rapidly onboarding to Azure, organizations should take a "do no harm" approach to ensure that security risks are not increased when compared to the current on-premises posture for the organization.
These "do no harm" measures should be implemented during or prior to the onboarding process. They include the implementation of compensating controls for all new or increased risks created by moving a workload to the public cloud.
This approach is commonly used when moving Active Directory Domain Services (AD DS) domain controllers to Azure, given the sensitivity of the data, secrets, and password hashes stored there.
The risk created by the availability of administrative interfaces to the public Internet should be mitigated through the combination of protections for the authorized administrative accounts and the configurable tenant administrative interfaces. The following table describes these protections:
Mitigation Approach | Mitigation Strategy
Administrative standards and practices | Establish standards for administration that are compliant with the rules in this section of this document.
Multi-factor authentication | Ensure all administrative accounts leverage multi-factor authentication when accessing Azure subscriptions.
Hardened administrative workstations | Ensure all subscription administrators with privileges over "Tier 0" assets are using hardened administrative workstations. Guidance for building hardened workstations can be found in the following documents:
Administrative model | Document and validate the organization's intended Azure AD administrative model.
Logon restrictions | Restrict administrative activity to only authorized administrative workstations.
Administrator education | Educate administrators with subscription administration privileges.
Logging | Ensure security personnel are monitoring Azure AD administrative activity.
Response and recovery processes | Add guidance for the Azure portal and resources to the existing response and recovery framework. If a framework or processes don't exist, create ones that include Azure guidance.
The risk created by the potential inadvertent exposure of intranet services to the public Internet should be mitigated by configuring the network security protections and guidance found within this document.
Microsoft Azure has a strong commitment to security and compliance. Several international, industry, and regional organizations have independently certified that Microsoft cloud services, including the Microsoft Azure platform, meet rigorous security standards and are trustworthy.
By providing customers with compliant, independently verified cloud services, Microsoft also makes it easier for organizations to achieve compliance for infrastructure and applications deployed within the service. Organizations that use the Azure platform for workloads that require compliance validation or certification should:
Because certification status can be updated frequently, it is crucial that organizations regularly review the Office 365 and Microsoft Azure Trust Centers for a complete list of security certifications and more information.
Feature References
Office 365 Trust Center | http://products.office.com/en-us/business/office-365-trust-center-cloud-computing-security#welcome
Azure Trust Center |
To establish a full security strategy when implementing solutions in or consuming services from Microsoft Azure, the following considerations should apply:
Mandatory:
Recommended: Integrate Azure logs with current security tools, such as Security Information and Event Management (SIEM).
To establish security operations for Microsoft Azure through administrative controls, practices, and procedures, the following considerations apply:
Mandatory:
Recommended:
With the advent of cloud computing and capabilities like Microsoft Azure, many organizations are fundamentally reinventing how they acquire, use, and manage IT services. This shift presents an opportunity to take a fresh look at how core security elements are provided and how to improve those in the new designs for this age.
The strategies in this section apply to all enterprise resources, including those hosted on Azure and on-premises. It is critical to include the recommendations in this section early in the planning and implementation for Azure to prevent the cost of retrofitting them later.
This section describes considerations for designing security containment and segmentation strategies in an enterprise IT environment. They are shaped by:
As organizations move workloads to the cloud, they must address threats in new ways and shed legacy security practices that often have proven to be ineffective and burdensome. In some cases, extending to the cloud provides an opportunity to implement security controls and contain adversaries in ways that are more challenging to accomplish in existing on-premises environments. Although containment strategies are not new, the traditional network-centric approach has failed in several ways and needs to be updated.
This section defines the following terminology:
The notions of containment and segmentation have been around for a long time in IT security, though the interpretations of how to implement them have varied in practice. This document starts with an assume breach mindset and calls for designing security controls to prevent propagation of breaches among enterprise assets.
This requires architects and system designers to look at what a breached system or compromised account means to the environment so as to limit the impact of that breach, to make it detectable, and to enable the organization to respond.
This assume breach approach complements the traditional perimeter approach focused on preventing breaches for a combined approach that results in a more resilient strategy.
Traditionally, many IT security organizations have built segmentation and containment strategies primarily by using firewalls that filter IP traffic by protocol and port rules. These designs typically include a production intranet, an extranet (sometimes called a demilitarized zone, or DMZ), and sometimes additional segment isolation within or outside of production using firewalls.
Although many elements of these designs are valid, there are fundamental shortcomings with this approach that have led to failure of these strategies in the industry:
The net result of most of these designs is a failed strategy that is difficult to implement, costly to the organization, and repeatedly proven to be easily evaded by attackers and penetration testers.
The recommendations in this document intend to overcome these challenges by setting forth guidelines that allow effective segmentation to bring the greatest security value (blocking and detecting attackers) and minimize the operational overhead.
A containment strategy should focus on:
The strategy should also consider the lifecycle of the asset. Many organizations will develop, test, and train within the Azure environment. Each of these lifecycle stages potentially has different risk levels, asset valuations, and personnel responsible for administering the assets.
Mandatory:
Organizations should consider the following list of examples as the maximum set of zone types that should be created (note that some are not applicable to infrastructures hosted in Azure).
Zone | Description
Production | Environment to host collaboration, office automation, and line-of-business applications
Extranet | Environment to host applications interacting directly with Internet endpoints
Non-production | Lab environment for development and validation testing
Fabric environment | Environment for private clouds and out-of-band infrastructure management
Legacy isolation | Environment for assets that cannot be secured or updated, such as mission-critical computers running Windows XP, Windows Server 2003, and earlier operating systems
High-value asset | Environment such as specialized research and collaboration systems or Very Important Personnel (VIP) or executive collaboration
Regulated functions | Environment for financial (such as PCI or ATM) or medical/health/privacy (such as HIPAA) assets
Physical control and machinery zones | Environment for Process Control Network (PCN) or Supervisory Control and Data Acquisition (SCADA) devices
Containment between security zones needs to isolate groups of assets that have significantly different business and mission values, risk exposure, and/or trust level. The containment measures must be aggressive, and exception management should always include deep analysis and consideration of risks and mitigations.
Every exception and mitigation should consider controls at all layers (such as identity, application, data, and network)—not only network ports and protocols. The design of containment between zones should be holistic and include controls at all layers to provide:
In its simplest form, a security zone is a fully separate IT environment with an independent infrastructure of services, including identity, management, and monitoring. Although the same personnel may manage multiple zones, each shared service that interacts with multiple zones creates a technical means of control and a security risk.
Isolation between zones provides excellent security benefits by making it very difficult for an attacker to cross the trust boundary. However, each additional zone also adds overhead for implementation, maintenance, and daily operation that may create cost and security assurance degradation.
Each security zone has to be defended and monitored, so each additional zone brings equipment and configuration to maintain. Each zone may add steps to regular tasks performed across the boundary, and each adds complexity that is hard to sustain because personnel inevitably change. Additionally, complexity creates challenges with measuring compliance, making it difficult to be confident that the security assurances are working as expected.
For these reasons, it is important to choose the simplest strategy that meets your business and mission protection goals. This security zones and containment strategy should ultimately span across the on-premises and the Azure-hosted assets.
Most environments will require at least two security zones, production and extranet, to host assets for internal corporate and public-facing assets, as illustrated in the following image. Any additional security zone considerations should take into account the increased cost-over-time to maintain the environment. As the zone count and complexity increases, the cost and likelihood of security degradation over time also increases.
You should only create a security zone if you are prepared to maintain and defend that trust boundary with the same rigor that you would expect a software vendor to create a security update for a vulnerability—working constantly until the risk is mitigated.
Figure 5 - Examples of Security Zones
Some asset types almost always warrant a dedicated separate zone. If the organization owns and manages the following types of assets, it is strongly recommended to create a separate security zone to protect them from the risk of attack from a compromised production or extranet asset.
Some asset types may warrant a separate security zone depending on the asset valuation, trust levels, and/or risk exposure profiles in the context of the organization. These are some examples of when organizations might choose or circumstances might dictate a different strategy for the same type of asset:
Recommended:
For more information about threat modeling, see:
Because of the need for fluid interactions between similarly valued and trusted assets, the goal of containment within a security zone focuses on defending the integrity of a common administrative model and the resulting control relationships and permissions. This is similar to how you would put armed guards at a building entrance, but place electronic door locks with badge readers on rooms in a building.
Containment within the zone should focus on:
Containment strategies within a zone should start with logon restrictions that counter credential theft, because these attacks are widely available and prevalently used. These defenses should follow the tier model described in Mitigating Pass-the-Hash and Other Credential Theft, Version 2.
The primary means of maintaining the integrity of the common administrative model within a zone should focus on the following standards:
Additional information about these standards can be found in the Microsoft Azure Security section of this document.
Note: Because Active Directory domains are not security boundaries within a forest and they add management overhead, you should avoid using them for security containment within a zone.
Figure 6 - Examples of Security Zones with Tiers
Mandatory:
Recommended:
Most enterprise organizations that adopt Azure will continue to operate a production identity system that includes Windows Server Active Directory (AD) on-premises, hosted on Azure IaaS, or both.
Because of the importance of identity systems to the overall security posture of the organization, including assets hosted on Azure, this document includes a set of security standards for Active Directory and related assets.
This section describes the standards and expectations for securing administrative control of all of an organization's information technology systems. These standards will have to be adapted to the requirements and risk appetite of the organization, but the defaults are built on recommended practices so they can be used as a benchmark or end state for planning.
The standards in this section assume that the organization has the following attributes:
Administrative control represents the ability to exert full technical control of all system functions. This ability to "control" the system is separate and distinct from "security controls" that prevent or detect activity within the system.
By commonly targeting administrative control of identity systems, cybersecurity attackers can control most or all identities and computing assets in an organization. This makes the security assurances for all identity systems including Active Directory a critical mission imperative.
Maintaining the security standards of a system is critical to maintaining positive administrative control of that system. If administrative privileges are poorly defined or not followed, the entire enterprise is at high risk of compromise by hostile actors.
Getting positive enterprise control requires restricting all means of administrative control—much like controlling access into a city requires restricting all means of transportation, such as road, rail, and air. This document categorizes the means of controlling the identity store into three types:
The security of software that is developed in-house relies on using the security practices during the development cycle in addition to the maintenance and response processes.
Recommended: All custom-developed applications should be developed with a full security development lifecycle to minimize risk.

Mandatory: All custom applications should go through a threat modeling exercise to systematically identify risks and mitigations in the application's design and configuration.
For information about these topics, see:
The Cloud Platform Integration Framework (CPIF) is an extension of the IaaS Product Line Architecture (PLA) previously developed and released by Microsoft. This extension incorporates additional information about implementing workloads in public and hybrid cloud environments.
These scenarios can be deployed by using the principles of the framework, rather than being implemented against a rigid architecture. The CPIF architectural pillars group areas of Azure architecture together and define consumption methodologies within the pattern guides.
The Architectural Pattern Guides provide guidance about the services, construction, and deployment of the services within a larger solution. This section covers the various discipline areas within each pillar as they pertain to deploying solutions in a target Microsoft Azure subscription.
Deployment is a key facet of Azure implementations, and it is usually one of the first steps that follows an Azure subscription design. Customers new to Azure commonly have their first experiences with the platform during a deployment of Azure resources.
Deployment of Azure constructs can involve many layers – subscriptions, accounts, virtual networks, virtual machines, Azure websites, and other services. Although various mechanisms exist to provision Azure resources, to scale beyond simple deployments, many customers must incorporate automation solutions to effectively scale their deployments.
Additionally, automation solutions can provide standardization (such as naming conventions and configuration standards), which surpass the built-in capabilities and range of choices provided by the Azure portal.
Azure is a unique implementation of cloud services in that it provides every service at hyper-scale, over a global infrastructure, through a common API, and accessible through a common UI (often referred to by customers as a "single pane of glass"). This API layer is accessible through several programming languages and supports rich automation scenarios. Automation within the Azure infrastructure can be accomplished using a variety of toolsets.
Usually, an automation architecture will follow shortly after the Azure networking, storage, subscription, and other key architectural decisions are made. These prerequisites are inputs required by the automation toolsets that instantiate different parts of the environment or solution within Azure. Note that depending on the Azure region(s) to be used, certain toolsets may not be available.
Each customer environment should also be evaluated for existing toolsets used for on-premises or public cloud automation. Customers may wish to leverage expertise and investments in existing toolsets and extend on-premises functionality to support the automation of Azure constructs. However, other organizations may prefer to have the automation supporting Azure handled separately, without dependencies on legacy or existing platforms. This discovery of each organization's environment and automation strategy should serve as an input into the overall automation architecture.
The Azure APIs mentioned earlier in this document are complemented by a comprehensive library of PowerShell cmdlets. PowerShell scripting can provide a wide variety of automation solutions, and the Azure PowerShell cmdlet library can support native and third-party automation capabilities.
Although third-party tools are not directly highlighted in this document, customers with extensive third-party automation toolsets already deployed and that support PowerShell may consider extending the functionality by using Azure PowerShell scripts within the existing tools.
Standard PowerShell scripts are commonly the first iteration of automation that customers explore when they are new to Azure. Generally, this automation is later incorporated into new or existing toolsets to standardize automation within Azure across the organization.
Feature References
Install Azure PowerShell cmdlets | http://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/
Azure Cmdlet Reference (TechNet) | https://msdn.microsoft.com/en-us/library/azure/jj554330.aspx
Getting started with Azure PowerShell (Ed Wilson) |
Mandatory: Download and install the most current edition of the Azure PowerShell cmdlets through the Web Platform Installer.

Recommended: Azure Automation solutions that use PowerShell cmdlets should be designed using PowerShell scripting best practices. Additionally, PowerShell scripts should be designed by incorporating PowerShell workflow processes into the initial design, or the scripts can be refactored later to use PowerShell workflows.

Optional: If your Azure Automation solutions leverage third-party automation platforms, consider using the Azure PowerShell cmdlets to build automation solutions on your existing platform.

Design Guidance
Azure PowerShell scripts should follow a number of design recommendations. This section covers the following design considerations:
Credential protection - We recommend that authors create a dedicated Azure authentication function. These functions should leverage [System.Security.SecureString] to protect user passwords at run time. The following code snippet is an example of an authentication function:
# Example authentication function (the function name is illustrative)
Function Connect-AzureEnvironment {
    Param(
        [Parameter(Mandatory=$true)][string]$AzureUserName,
        [Parameter(Mandatory=$true)][System.Security.SecureString]$AzurePassword
    )
    Import-Module Azure
    $azureCred = New-Object System.Management.Automation.PSCredential($AzureUserName, $AzurePassword)
    # Authenticate the session with the secured credential object
    Add-AzureAccount -Credential $azureCred
}
Subscription details - After authenticating to Azure, it is advisable to display the subscriptions the user has access to. Generally, administrators in Azure will have access to multiple subscriptions, and they should choose which subscription to use at run time.
This approach is generally more flexible than static, file-based subscription assignment of the script actions. To simplify the subscription choice for the user, append an automatically generated number to each subscription for the user to input, and reference this Read-Host selection back to the collection of available subscriptions:
# Example function to enumerate subscriptions (the function name is illustrative)
Function Get-AzureSubscriptionList {
    $subscriptionResults = @()
    $subscriptionNumber = 0
    $subscriptionList = Get-AzureSubscription
    ForEach ($subscription in $subscriptionList) {
        $subscriptionNumber++
        $properties = @{
            'SubscriptionNumber' = $subscriptionNumber
            'SubscriptionName'   = $subscription.SubscriptionName
            'SubscriptionID'     = $subscription.SubscriptionId
        }
        $PSObject = New-Object -TypeName PSObject -Property $properties
        $subscriptionResults += @($PSObject)
    }
    return $subscriptionResults
}
Object naming deconfliction - For provisioning scripts, we recommend that you test the name of the object you intend to provision. Some Azure object names must be globally unique; for example, the name of a storage account must be unique across all of Azure.
Other objects must be unique within an individual subscription, for example, virtual machine names. Authors should test the name prior to attempting a provisioning action. Generally, test within the current subscription first to see if the object already exists. If the name is not already used in the subscription, use the Test-AzureName cmdlet, as in the following script, to test whether the name is taken elsewhere in Azure.
$testAzureServiceName = Test-AzureName -Service $cloudServiceName
If ($testAzureServiceName) {
    Write-Host 'Cloud Service Name already taken, choose another name and run script again'
    Return $null, $false
}
Else {
    Write-Host 'Cloud Service Name Available - creating cloud service...'
    New-AzureService -ServiceName $cloudServiceName -Location $azureRegion | Out-Null
    Write-Host "Cloud Service $cloudServiceName created"
}
Object pre-existence verification - When performing a provisioning action in a series of actions (for example, provisioning a storage account and then provisioning a storage container within the account), we recommend that you check the status of the newly provisioned object before moving to the next provisioning action. The following script example uses a Do While loop that incorporates a Get-AzureService request on the new object:
Do {
    Write-Host "Finding status of cloud service $cloudServiceName ..."
    $selectedCloudService = Get-AzureService -ServiceName $cloudServiceName `
        -ErrorAction SilentlyContinue -WarningAction SilentlyContinue
    If ($selectedCloudService.Status -eq 'Created') {
        Write-Host "Found cloud service $cloudServiceName - Cloud Service Status: $($selectedCloudService.Status)"
    }
    Start-Sleep -Seconds 1
}
While ($selectedCloudService.OperationStatus -eq 'InProcess')
Desired state configuration (DSC) within Azure allows for virtual machine configuration by using a configuration file (PowerShell script) and a virtual machine extension.
This configuration file is uploaded to Azure blob storage and is applied to a virtual machine that is enabled with the PowerShell DSC extension. Some examples of what DSC enables are:
• Install or remove server roles and features
• Manage registry settings
• Manage files and directories
• Start, stop, and manage processes and services
• Manage local groups and user accounts
• Install and manage packages, such as .msi and .exe
• Manage environment variables
• Run Windows PowerShell scripts
• Fix a configuration that has drifted away from the desired state
• Discover the actual configuration state on a given node
The following illustration provides insight into the DSC flow phases and the Push and Pull models.
These phases are outlined in the following table.
Phase | Definition
Authoring | A DSC configuration is created through PowerShell or by third-party languages and tools. The output from the Authoring phase is one or more Management Object Format (MOF) files, which is the format consumable by DSC.
Staging | DSC data (MOF files) is staged. When using the Pull model, DSC data and custom providers are kept on the pull server (an IIS server with an OData interface), and the target system contacts the pull server by passing a URI and a unique identifier to retrieve its configuration. When using the Push model, DSC data is pushed to the target system directly.
"Make it so" | The final phase applies the configuration. DSC data is pulled or pushed to the local configuration store, which contains the current, previous, and desired state configurations. The configuration is parsed, and the relevant WMI provider implements the change.
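To make the Authoring phase concrete, the following is a minimal sketch of a DSC configuration; the configuration name, node, and paths are hypothetical and not taken from this document. Compiling the configuration produces the MOF file that is then staged and applied:

```powershell
# Minimal DSC configuration sketch - names and paths are illustrative.
Configuration WebServerBaseline {
    Node 'localhost' {
        # Ensure the IIS web server role is installed
        WindowsFeature IIS {
            Ensure = 'Present'
            Name   = 'Web-Server'
        }
        # Ensure a deployment directory exists
        File DeployFolder {
            Ensure          = 'Present'
            Type            = 'Directory'
            DestinationPath = 'C:\Deploy'
        }
    }
}

# Authoring output: writes localhost.mof to the output path for staging
WebServerBaseline -OutputPath 'C:\DSC\WebServerBaseline'
```

The resulting MOF file can then be delivered through either the Push or Pull model described in the table above.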
Azure DSC extensions allow PowerShell desired state configuration to configure your Azure virtual machines. The DSC extension handler has a dependency on Windows Management Framework 5.0, which is automatically installed by the extension handler. Because of this dependency, the only Windows Server virtual machine images that currently support the DSC extension handler are Windows Server 2012 R2 images.
Feature References
Introduction to Azure PowerShell DSC |
Configuring a virtual machine via PowerShell DSC |
Built-In Windows PowerShell Desired State Configuration Resources |
Build Custom Windows PowerShell Desired State Configuration Resources |
Windows PowerShell Desired State Configuration for Azure | https://msdn.microsoft.com/en-us/library/azure/dn877980.aspx
Optional: Solutions that leverage provisioning automation can consider the use of DSC, in combination with other automation, to deliver a final product that is configured to the desired specifications. Because there are limitations on what DSC can deliver, organizations may find that they need to augment DSC with additional automation capabilities.
Service Management Automation (SMA) is primarily an engine that allows running .NET Framework Windows Workflow Foundation activities. SMA is typically accessed through the Azure portal; however, solutions can also leverage the SMA OData-based web service to call SMA directly.
The SMA environment leverages the .NET Framework, so 64-bit functions can be utilized. This is a key difference when compared to solutions such as System Center Orchestrator, which leverages the 32-bit Windows PowerShell 2.0 engine, regardless of the underlying operating system capabilities.
Feature References
Service Management Automation overview |
Service Management Automation comparison to other tools |
Getting Started with PowerShell Workflow |
Running PowerShell commands in a Workflow |
Recommended: Azure solutions should leverage SMA if parallel or long-running activities are required. Additionally, SMA should be part of solutions looking to automate both Azure resources and on-premises resources.

Optional: Solutions that leverage SMA can optionally integrate with Orchestrator runbooks via a third-party integration pack or by calling the Orchestrator/SMA APIs.

Design Guidance
SMA automation runbooks are based on PowerShell workflows, rather than traditional PowerShell scripts. PowerShell workflows provide additional benefits over traditional PowerShell scripts, specifically around activity parallelism and long-running activities.
Typically, long-running activities are seen as riskier in traditional PowerShell scripts due to the stateless nature of the execution environment. However, PowerShell workflows can accommodate interruptible activities, such as stop, restart, and reboot.
Traditional PowerShell scripts are generally not designed for parallelism, whether this is within one machine or across multiple machines. PowerShell workflows have parallelism as a built-in feature that can be leveraged (depending on the design of your workflows). The following considerations apply:
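As a hedged illustration of this built-in parallelism, the following workflow sketch stops several virtual machines concurrently. The workflow name, service name, and VM names are hypothetical; the cmdlet calls assume the classic Azure module and an already authenticated session:

```powershell
# Illustrative sketch only - assumes the Azure module and an authenticated session.
Workflow Stop-LabVirtualMachines {
    Param([string]$ServiceName, [string[]]$VMNames)

    # ForEach -Parallel runs each iteration concurrently,
    # a capability plain PowerShell scripts lack natively.
    ForEach -Parallel ($vmName in $VMNames) {
        Stop-AzureVM -ServiceName $ServiceName -Name $vmName -Force
    }

    # Persist workflow state so an interrupted run can resume from here
    Checkpoint-Workflow
}

# Hypothetical usage
Stop-LabVirtualMachines -ServiceName 'labsvc01' -VMNames @('labvm01','labvm02')
```

In a plain script, the equivalent loop would stop each machine sequentially and would lose all progress if the session were interrupted.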
Scenario | Model | Points to Consider
SMA-based automation | Green-field or existing SMA automation for hybrid scenarios |
SMA combined with other automation platforms | Leverage existing customer investment in other automation platforms |
Standard PowerShell in a PowerShell Workflow | Attempting to run existing PowerShell code or functions that are not natively supported in a PowerShell Workflow |
System Center 2012 R2 Orchestrator is a 32-bit automation platform that helps you automate the creation, monitoring, and deployment of resources in your environment. Many organizations have adopted Orchestrator based on the GUI designer, the third-party ecosystem of integration packs, and the relatively low investment required to create initial or simple automation solutions.
However, there are a number of considerations when using Orchestrator to automate Azure solutions. The primary consideration is that the 32-bit engine can limit performance (currently limited to 50 concurrent runbooks). PowerShell execution and capabilities are limited to 32-bit Windows PowerShell 2.0 (unless more complex external calls are used), and automation capabilities are enabled by (and limited to) the Orchestrator Integration Pack framework.
Feature References
Orchestrator Azure Integration Pack |
Kelverion Integration Pack | http://kelverion.com/integration-packs/ip-microsoft-system-center/
Orchestrator TechNet page |
Mandatory: Azure automation using Orchestrator should be deployed in a highly available architecture. Orchestrator may require a separate high-availability architecture, and it is generally complex (requiring high availability for the management server and SQL Server database).

Recommended: Orchestrator solutions that automate Azure should consider using Azure integration packs (Microsoft or third-party) to speed the creation of Orchestrator runbooks.

Optional: Azure can be automated by using System Center Orchestrator. However, if you do not currently have Orchestrator deployed, other automation platforms, such as SMA or Azure Automation, may be a better choice.

Design Guidance
System Center 2012 R2 Orchestrator can be leveraged as an automation solution for Azure environments. One of the key drivers that may support the use of Orchestrator is existing automation that is housed within Orchestrator runbooks. Additionally, if you have experience with Orchestrator automation from other projects or applications, the choice to leverage that existing investment in administrator skills may be right for some organizations.
One of the key considerations of an automation design that uses Orchestrator should include the selection of Integration Packs. Microsoft provides an Azure Integration pack for Orchestrator that provides pre-built activities that can be leveraged in a runbook.
Additionally, third parties have developed Orchestrator integration packs that can be evaluated for enhanced functionality. For example, the System Center 2012 integration pack by Kelverion provides automation activities beyond the default product functionality.
Orchestrator can also be used as a component of an automation solution by leveraging connections to Service Management Automation through API calls or by using an Orchestrator activity in the Kelverion Orchestrator integration pack.
This is particularly useful in a hybrid scenario where Orchestrator is used to provision and configure on-premises virtual machines and do post-provisioning activities. These same runbooks may be leveraged to do the same post-provisioning activities on Azure virtual machines with minimal changes.
Scenario | Model | Points to Consider
Orchestrator-based automation | GUI-driven automation |
Orchestrator-integrated automation | Orchestrator added to other automation tools |
Azure Automation is a PowerShell workflow-based environment hosted in Azure. This environment is similar to on-premises implementations of Service Management Automation. These PowerShell workflows can be started manually or at a particular time.
Feature References
Azure Automation platform comparison and reference | https://msdn.microsoft.com/en-us/library/azure/dn643629.aspx
Azure Automation Overview | https://azure.microsoft.com/en-us/documentation/articles/automation-intro/
Azure Automation E-book |
Azure Automation Training |
Hybrid Runbook Workers | https://azure.microsoft.com/en-us/documentation/articles/automation-hybrid-runbook-worker/
Mandatory: Azure Automation solutions cannot leverage on-premises resources that are not exposed publicly.

Recommended: Azure Automation-based solutions should contain checkpoints for long-running runbooks and workflows. This allows for more intelligent recovery should a runbook fail.

Optional: An Azure Automation-based solution can be developed locally with the PowerShell ISE because it follows the same design patterns as PowerShell workflows.

Design Guidance
Azure Automation is a platform to create, manage, and execute automation for various Azure components. There are a number of design considerations, particularly for solutions that will span on-premises resources and Azure.
One challenge is that Azure Automation cannot leverage on-premises resources that are not accessible over the Internet. This can be a significant limitation for hybrid deployments, or deployments that are fully cloud-based, but use existing on-premises management tools (such as antivirus and monitoring tools).
Note that Azure Automation cannot access Orchestrator or SMA runbooks. This can be a significant design consideration if an existing investment in automation platforms is required.
Whether self-hosted in Azure or on-premises, these platforms cannot be leveraged by Azure Automation as part of a hybrid automation solution. Typically, Azure Automation would be used to deploy or manipulate Azure resources, while other automation could be used for post-provisioning actions when runbooks already exist for on-premises resource configuration. Solution designs that require both automation environments should consider the tradeoffs associated with complex or multiple automation solutions.
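To illustrate the checkpoint guidance for long-running runbooks, the following is a hedged sketch of an Azure Automation runbook. The service, VM, and variable names are hypothetical, and the cmdlets assume the classic Azure module within an authenticated automation account:

```powershell
# Illustrative runbook sketch - names are hypothetical, not from this document.
Workflow New-LabEnvironment {
    # Step 1: create the cloud service that will host the virtual machine
    New-AzureService -ServiceName 'labsvc01' -Location 'West US'

    # Persist state: if a later step fails, the runbook resumes here
    # instead of re-running the service creation.
    Checkpoint-Workflow

    # Step 2: provision a virtual machine into the service
    New-AzureQuickVM -Windows -ServiceName 'labsvc01' -Name 'labvm01' `
        -ImageName $imageName -AdminUsername $adminUser -Password $adminPassword

    Checkpoint-Workflow
}
```

Placing a checkpoint after each completed provisioning step keeps a failed runbook from repeating work that has already succeeded.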
Scenario | Model | Points to Consider
Exclusively using Azure Automation | Automation environment as a service |
Multi-platform environment: leverage Azure Automation with other platforms | Azure Automation plus additional functionality |
Azure resource groups allow you to group all related components of your application as a logical unit. This simplifies the lifecycle of applications from creation to deletion.
Feature References
Azure Resource Groups | http://azure.microsoft.com/en-us/documentation/articles/azure-preview-portal-using-resource-groups/
Azure Resource Manager Template Language | https://msdn.microsoft.com/en-us/library/azure/dn835138.aspx
Design Guidance
When you design solutions by using Azure Resource Groups, consider the following scenarios.
When deciding between one large resource group or multiple smaller resource groups, it is best to model the structure based on your organizational structure and ultimately what you feel works well for your company.
Scenario | Model | Points to Consider
Complex structures (involving a website, SQL Database, Redis Cache, and so on) | Pattern or model for implementation |
When designing Azure Resource Groups by using JSON templates, we recommend that you adhere to the following guidance:
Scenario | Model | Points to Consider
Standard website deployment | Continuous deployment with MSDeploy |
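To ground the JSON-template guidance above, the following is a minimal, hypothetical Azure Resource Manager template skeleton. The schema version, resource type, and parameter names illustrate the template format only and are not prescriptive:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "siteName": { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.Web/sites",
      "apiVersion": "2015-08-01",
      "name": "[parameters('siteName')]",
      "location": "[resourceGroup().location]",
      "properties": { }
    }
  ]
}
```

Deploying such a template against a resource group creates or updates every resource it declares as one logical unit, which is the lifecycle benefit resource groups are designed to provide.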
Azure provides scalable, durable cloud storage, backup, and recovery solutions for any data, big and small. You can use Azure backup capabilities with your existing infrastructure and backup investments to:
This section provides an overview of the backup and disaster recovery capabilities provided by Azure.
Microsoft Azure Backup is a feature within Azure that enables off-site file and folder backups from an on-premises Windows Server or System Center Data Protection Manager server to Azure Storage.
By using incremental backups, only changes to files are transferred to the cloud. This helps ensure efficient use of storage, reduced bandwidth consumption, and point-in-time recovery of multiple versions of data.
Configurable data-retention policies, data compression, and data-transfer throttling also offer added flexibility and help boost efficiency. Backups are stored offsite in Azure, which reduces the need to secure and protect on-site backup media.
Scenarios include:
Feature References
Azure Backup Overview | https://azure.microsoft.com/documentation/articles/backup-introduction-to-azure-backup/
Configure Azure Backup | http://azure.microsoft.com/en-us/documentation/articles/backup-configure-vault/
Administer Azure Backup with Windows PowerShell | https://msdn.microsoft.com/en-us/library/azure/hh831765.aspx
Azure Backup in 10 minutes | https://azure.microsoft.com/en-us/documentation/articles/backup-try-azure-backup-in-10-mins/
Azure Import/Export Service | http://azure.microsoft.com/en-us/documentation/articles/storage-import-export-service/
Azure Backup for IaaS (Preview) |
Configure Azure Backup to quickly and easily back up Windows Server | http://azure.microsoft.com/en-us/documentation/articles/backup-configure-vault/
Backup and Recover Using the Azure Backup Agent | https://azure.microsoft.com/documentation/articles/backup-azure-backup-and-recover/
Mandatory:
| |
Recommended: If you are facing network constraints or need to accelerate the initial back up of your data, you can ship your data through a disk to the nearest Azure datacenter through the Azure Import/Export service. | |
Optional:
|
Azure Backup support for backing up Azure IaaS virtual machines is now generally available. With this support, customers can back up their virtual machines in Azure just as they back up their on-premises virtual machines and physical servers by using System Center 2012 R2 Data Protection Manager (DPM) or a third-party backup tool.
When you back up from an on-premises environment to Azure, you first need to create a backup vault. If you already have a backup vault in the region where your virtual machines reside, you can use it to back up your IaaS virtual machines. You manage Azure Backup for IaaS virtual machines from the same portal that you use for backing up on-premises workloads.
With Azure Backup, you get the expected features from a backup solution, such as a backup schedule to determine when to back up your virtual machines and a retention policy to determine how long you will keep your backup.
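A retention policy of this kind can be sketched as a rule over recovery-point dates. The daily and weekly windows below are hypothetical policy values chosen for illustration, not Azure Backup defaults.

```python
from datetime import date

def recovery_points_to_keep(today, points, daily_days=30, weekly_weeks=12):
    """Keep every recovery point younger than `daily_days`, plus one Sunday
    point per week out to `weekly_weeks` weeks. Window lengths are
    hypothetical examples of a retention policy."""
    keep = set()
    for point in points:
        age = (today - point).days
        if 0 <= age < daily_days:
            keep.add(point)          # recent: keep every daily point
        elif point.weekday() == 6 and age < weekly_weeks * 7:
            keep.add(point)          # older: keep only Sunday weeklies
    return keep

kept = recovery_points_to_keep(
    date(2015, 6, 30),
    [date(2015, 6, 29), date(2015, 5, 10), date(2015, 3, 1)])
```

In this example the day-old point and the seven-week-old Sunday point are retained, while the Sunday point older than twelve weeks is pruned.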
You do not need to shut down your virtual machine to do a backup. Azure Backup currently supports application-level consistency for virtual machines running Windows operating systems, and file-system-level consistency for virtual machines running Linux operating systems.
To back up IaaS virtual machines, you do not need to deploy any additional infrastructure. The storage and compute infrastructure requirements are handled automatically by the Azure Backup service.
You need not worry about scaling up either. You can protect as many virtual machines as needed, at any time. Azure Backup also reduces the overhead of maintenance by automatically handling the virtual machine extension upgrade, without user intervention.
Feature References | |
Back up Azure virtual machines with Azure Backup |
https://azure.microsoft.com/en-us/documentation/articles/backup-azure-vms/ |
Azure Backup - Announcing general availability of backup for Azure IaaS VMs |
https://azure.microsoft.com/en-us/blog/general-availability-of-backup-for-azure-iaas-vms/ |
Announcing pricing model for TCO reduction |
|
Estimating Azure Backup billing usage for DPM data sources |
https://gallery.technet.microsoft.com/Estimating-Azure-Backup-e0d4abbc |
Azure Backup Cost Calculator |
https://azure.microsoft.com/en-us/pricing/calculator/?scenario=data-management |
Azure Backup - Monthly bill estimate and TCO calculator |
https://gallery.technet.microsoft.com/Azure-Backup-Monthly-bill-093fd095 |
Mandatory: The backup is stored in the same backup vault that you used to register the virtual machine. To access those backups for restore purposes, click the Protected Items tab. | |
Optional: You have the option of choosing locally redundant storage (LRS) or geo-redundant storage (GRS) for the backup data, independent of other non-backup-related storage accounts. You can lower your storage costs by choosing LRS instead of the default GRS. For example, you could benefit from choosing LRS if you want to replace data on tapes with one copy in Azure. |
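The LRS-versus-GRS cost trade-off can be estimated with a simple calculation. The per-GB prices below are placeholder values for illustration only; consult the Azure pricing calculator for actual rates.

```python
# Placeholder per-GB/month prices for illustration; GRS is modeled here as
# twice the LRS rate. Check the Azure pricing calculator for real figures.
PRICE_PER_GB_MONTH = {"LRS": 0.0224, "GRS": 0.0448}

def monthly_storage_cost(gb: float, redundancy: str) -> float:
    """Estimated monthly cost of backup storage for a given redundancy."""
    return gb * PRICE_PER_GB_MONTH[redundancy]

lrs_cost = monthly_storage_cost(1000, "LRS")
grs_cost = monthly_storage_cost(1000, "GRS")
```

With these assumed rates, choosing LRS for a tape-replacement scenario halves the storage portion of the bill relative to the default GRS.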
Design Guidance |
Azure Backup works with your existing data protection software, whether it's Windows Server Backup or System Center Data Protection Manager (DPM). If you do not have an existing investment in Windows Server Backup or DPM, you can use Azure Backup as a standalone data retention and protection solution.
Azure Backup implements an optimized blob copy that ensures constant, predictable I/O and backup times. Consider the following scenarios:
Scenario |
Model |
Points to Consider |
Azure Backup (IaaS) |
Application-consistent backup of IaaS virtual machines |
|
Azure Backup (IaaS) |
Fabric-level backup |
|
Azure Backup (IaaS) |
Policy-driven backup and retention |
|
Azure Backup (SQL Server in IaaS) |
SQL Server instance backups |
|
System Center 2012 R2 Data Protection Manager (DPM) is an enterprise backup system. By using DPM, you can back up (copy) data from a source location to a target secondary location. If original data is unavailable because of planned or unexpected issues, you can restore data from the secondary location.
By using DPM, you can back up application data from Microsoft servers and workloads, and file data from servers and client computers. You can create full backups, incremental backups, differential backups, and bare-metal backups to completely restore a system.
Feature References | |
DPM and Azure Backup |
https://msdn.microsoft.com/en-us/library/azure/dn337332.aspx |
Recommended: Find and | |
Optional: If you use Operations Manager, you can download the DPM management pack to extend monitoring coverage to the Azure backup processes of your Data Protection Manager servers. |
Starting with Microsoft System Center 2012 SP1, Data Protection Manager (DPM) can back up production workloads directly to Microsoft Azure by using integration with the Azure Backup service. This integration provides organizations with an option for an offsite backup location without having to manage tape libraries or disk backups and then ship them to an offsite location.
To leverage DPM integration with Azure, you need to create a storage vault in the Azure portal. The data that is backed up in the storage vault is secured by using SSL certificates and strong passphrases.
After the storage vault is configured, you need to install a Windows Backup Agent on your DPM server and register the DPM server with Azure. System Center Data Protection Manager provides rich reporting capabilities, as illustrated in the following graphics.
The Data Protection Manager reporting framework includes the following feature set:
Feature Area |
Capabilities |
Customization |
Custom scripts; documented views; customized user interface |
Aggregation |
All of your DPM data in the System Center Operations Manager data warehouse; scalable to any number of DPM servers |
Flexibility |
Rich UI; no coding restrictions; editable SQL Server queries |
Feature References | |
Configure the Azure Backup vault and register the DPM server |
https://msdn.microsoft.com/en-us/library/azure/dn337336.aspx |
Set up DPM to back up to Azure |
https://msdn.microsoft.com/en-us/library/azure/dn337341.aspx |
Recover DPM data from Azure |
https://msdn.microsoft.com/en-us/library/azure/dn337334.aspx |
Recommended: To leverage the full functionality of DPM integration with Azure Backup and Storage, please ensure you are running a minimum configuration of System Center 2012 R2 with Update Rollup 5. |
Design Guidance |
When integrating DPM and Azure Backup, consider the following:
Capability Considerations |
Capability Decision Points |
Cost |
An Azure Backup vault combined with DPM offers a potentially less expensive replacement for traditional tape backups. |
Compute instance planning |
DPM is supported on any Azure IaaS virtual machine that is size A2 or higher. Create an instance in the Standard compute tier because the maximum IOPS per attached disk is higher in the Standard tier than in the Basic tier. |
Storage account planning |
Use a separate storage account for the DPM virtual machine, because there are size and IOPS limits on a storage account that might impact the performance of the DPM virtual machine if it is shared with other running virtual machines. |
Virtual network planning |
The DPM virtual machine and the protected workloads should be part of the same Azure virtual network. |
Data retention |
Retain data for one day on DPM-attached storage, and store data older than one day in the Azure Backup service. The goal is to protect a larger amount of data or have a longer retention range. Offloading backup data from DPM to Azure Backup provides retention flexibility without the need to scale the storage that is attached to the DPM server. |
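The one-day split described in the table can be expressed as a simple tiering rule. This is a sketch of the policy decision only; the threshold is the recommended policy value, not anything enforced by DPM itself.

```python
def tier_recovery_points(ages_in_days):
    """Split recovery points by age: points up to 1 day old stay on
    DPM-attached storage; older points are offloaded to Azure Backup."""
    on_dpm_disk = [a for a in ages_in_days if a <= 1]
    in_azure = [a for a in ages_in_days if a > 1]
    return on_dpm_disk, in_azure

# Example: four recovery points aged 0, 1, 2, and 30 days
disk, cloud = tier_recovery_points([0, 1, 2, 30])
```

Offloading everything older than a day keeps DPM-attached storage flat regardless of the retention range.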
Azure Site Recovery orchestrates replication and failover of physical servers and virtual machines. The following scenarios are supported by Azure Site Recovery:
Feature References | |
Azure Site Recovery Overview |
https://azure.microsoft.com/en-in/documentation/articles/site-recovery-overview/ |
Best Practices for Site Recovery Deployment |
https://azure.microsoft.com/en-in/documentation/articles/site-recovery-best-practices/ |
Mandatory: Azure Site Recovery requires creating a recovery vault and setting up a core infrastructure to support the level of protection desired (for example, integration with VMM and deployment of agents or an InMage infrastructure within the subscription). | |
Recommended: If you are facing network constraints or need to accelerate your initial backup of your data, you can ship your data through a disk to the nearest Azure datacenter through the Azure Import/Export service. | |
Design Guidance |
Azure Site Recovery (ASR) can be an effective backup solution for small businesses that don't necessarily have the resources to set up a secondary failover datacenter. ASR can seamlessly back up your virtual machines to Azure. Here is some design guidance:
Scenario |
Model |
Points to Consider |
Azure Site Recovery |
Small business model |
|
The Microsoft Azure Platform-as-a-Service (PaaS) solutions, such as Azure App Service and Azure SQL Database, have built-in backups that support self-service point-in-time restore, and geo-restore for the Azure SQL Database Basic, Standard, and Premium service tiers. This is a key consideration when deploying solutions that leverage Azure PaaS tiers.
Feature References | |
Built-in Automatic Backup in Azure SQL Database |
https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx |
Point in Time Restore for Azure SQL Database |
https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx |
Restoring an Active Database to a Point in Time |
https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx |
Geo-Restore |
https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx |
Azure SQL Database Business Continuity |
https://msdn.microsoft.com/en-us/library/azure/hh852669.aspx |
Geo Replication |
https://msdn.microsoft.com/en-us/library/azure/dn783447.aspx |
SQL Data Sync |
http://azure.microsoft.com/en-us/documentation/articles/sql-database-get-started-sql-data-sync/ |
Business Continuity |
http://azure.microsoft.com/en-us/documentation/articles/sql-database-business-continuity/ |
The Azure App Service backup strategy centers on the backup strategy of your cloud package and configuration files. The following guidance is provided:
Feature References | |
Azure App Service Overview |
http://azure.microsoft.com/en-us/documentation/articles/app-service-changes-existing-services/ |
Recommended: Leverage Azure blob storage to retain and restore your application's cloud snapshot files. |
Design Guidance |
The following guidance is provided for designing backup for cloud packages and configuration using Azure Storage:
Capability Considerations |
Capability Decision Points |
Containers |
Create containers that represent each of your environment's deployment stories (for example, daily builds, manual builds, and dynamic builds) |
Naming conventions |
Have a solid naming convention for each binary file and for your overall build structure (for example, Build2015.418.2338 and ContosoClientGatewayPaas.csfg) |
Continuous release management |
|
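A naming convention is easiest to keep when it is machine-checkable. The pattern below is a hypothetical reading of the example name in the table (Build&lt;year&gt;.&lt;day code&gt;.&lt;time&gt;), not an established standard:

```python
import re

# Hypothetical pattern inferred from the sample name Build2015.418.2338:
# "Build" + 4-digit year + "." + day code + "." + 4-digit time stamp.
BUILD_NAME = re.compile(r"Build\d{4}\.\d{1,4}\.\d{4}")

def is_valid_build_name(name: str) -> bool:
    """Check a blob/build name against the assumed convention."""
    return BUILD_NAME.fullmatch(name) is not None
```

A check like this can run in a release pipeline before a package is uploaded to its container, so nonconforming names never reach storage.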
There are multiple third-party backup applications that are used to back up an organization's data to Azure Storage. The following table describes a few:
Solution |
Summary |
CloudBerry Backup uploads files and folders from your computer running Windows Server to Azure and restores them in minutes. The software connects directly to an Azure account and securely transfers backup files and folders to the cloud, serving as a transport between your computer running Windows Server and Azure cloud storage. | |
Cloud-based EVault Backup Services for Microsoft Azure gives you the flexibility, scalability, economy, and offsite protection of the cloud without any up-front capital expenses. | |
Azure Backup is supported by all the Uranium Backup Pro editions, including Pro Tape, Pro DB, Pro Shadow, and Pro Virtual. |
Backup data is always encrypted prior to being stored in Microsoft Azure. The customer is responsible for managing their encryption keys (and the subsequent backup of those keys). Customer data is never decrypted in Azure. To restore the data, it is decrypted on the on-premises client side by the customer.
If a customer loses their encryption keys, Microsoft cannot recover those keys. Customers can additionally use hardware security modules or key management software to securely safeguard their encryption keys. There is also an option from Microsoft called "Azure Key Vault," which is described in the following section.
Mandatory: The customer is responsible for managing the encryption keys and the backup of those keys. Customer data is never decrypted in Azure. To restore the data, it is decrypted on the on-premises client by the customer. |
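The passphrase dependency can be illustrated with a standard key-derivation function. This is a conceptual sketch of why a lost passphrase is unrecoverable, not Azure Backup's actual implementation; the passphrases and salt are invented.

```python
import hashlib

# Conceptual sketch: the encryption key is derived from a customer-held
# passphrase. Without that exact passphrase (and salt), the key, and
# therefore the backup data, cannot be recovered by Microsoft or anyone.
def derive_key(passphrase: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, 100_000)

salt = b"example-salt-16B"  # in practice, random and stored with the backup
k1 = derive_key("my strong passphrase", salt)
k2 = derive_key("my strong passphrase", salt)
k3 = derive_key("wrong passphrase", salt)
```

The same passphrase and salt always reproduce the key; any other passphrase yields an unrelated key, which is why backing up the key material is mandatory.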
Design Guidance |
When you design storage for backups, consider the following:
Capability Considerations |
Capability Decision Points |
Capability Models in use today |
Any servers that are registered with the same vault and vault certificate can recover data backed up by other servers that use that certificate. |
If you want to ensure that recovery only occurs to specific servers in your organization, you should use a separate certificate designated for those servers. For example, human resources servers could use one certificate, accounting servers another, and storage servers a third. This provides a way to control recovery by installing the appropriate certificates on the recovery servers. |
When data is backed up to an Azure backup vault, it is important to understand who has access to it, and who has the privilege to restore that data to what servers. Be sure to review this security before establishing the backup vaults and backup certificates. |
There is a distinction between using Azure for backup and using Azure for disaster recovery |
Azure Backup is a classic backup solution that involves copying the data of the virtual machine to a backup vault in a storage account and making it available for a restore at a later point. Azure Site Recovery is for disaster recovery scenarios—specifically for virtual machines. The typical scenario for disaster recovery is when you want application availability if there is a disaster in your production environment. You also want to minimize data loss and the time it takes to bring your business back online. ASR is more of a "live" solution, whereas Azure Backup is more of an offline solution. |
Any organization should understand the difference between backup and disaster recovery. Sometimes, organizations use a backup copy for disaster recovery scenarios when they are willing to tolerate the time it takes to bring up their applications after they restore the data from backup copies. This is when the difference between disaster recovery and backup blurs a bit. |
Azure Key Vault helps customers safeguard cryptographic keys and secrets used by cloud applications and services. By using Azure Key Vault, you can encrypt keys and secrets (such as authentication keys, storage account keys, data encryption keys, .pfx files, and passwords) by using keys that are protected by hardware security modules.
For added assurance, you can import or generate keys in hardware security modules. Keys never leave the hardware security module boundary. Hardware security modules are certified to FIPS 140-2 level 2.
Azure Key Vault streamlines the key management process and enables you to maintain control of keys that access and encrypt your data. Developers can create keys for development and testing in minutes, and then seamlessly migrate them to production keys. Security administrators can grant and revoke permission to keys, as needed.
Anybody with an Azure subscription can create and use key vaults. Although Azure Key Vault benefits developers and security administrators, it could be implemented and managed by an administrator who manages other Azure services for an organization.
For example, this administrator would sign in with an Azure subscription, create a vault for the organization in which to store keys, and then be responsible for operational tasks, such as:
Feature References | |
Azure Key Vault |
|
What is Azure Key Vault |
http://azure.microsoft.com/en-us/documentation/articles/key-vault-whatis/ |
Get Started with Azure Key Vault |
http://azure.microsoft.com/en-us/documentation/articles/key-vault-get-started/ |
Azure Key Vault PowerShell Cmdlets |
Mandatory:
|
The Azure portal provides a default monitoring capability for your cloud assets without any additional investment in monitoring software. A summary of the monitoring services available within Azure is outlined in the following table:
Feature References | |
Portal Cloud Service Monitoring |
http://azure.microsoft.com/en-us/documentation/articles/cloud-services-how-to-monitor/ |
Portal Storage Account Monitoring |
http://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/ |
Customizing Monitoring with Azure Portal |
http://azure.microsoft.com/en-us/documentation/articles/insights-how-to-customize-monitoring/ |
Azure Traffic Manager Monitoring |
http://azure.microsoft.com/en-us/documentation/articles/traffic-manager-monitoring/ |
Organizations that have existing investments with System Center Operations Manager can leverage this infrastructure to monitor their Azure-based assets through the System Center Management Pack for Azure. This Management Pack extends monitoring of Azure resources by exposing them to Operations Manager. The following references are provided:
Feature References | |
Azure Management Pack Documentation download |
http://www.microsoft.com/en-us/download/details.aspx?id=38414 |
Azure Management Pack Monitoring scenarios |
Mandatory: You need the management certificate from your subscription before you can configure Operations Manager to discover your Azure resources. |
Design Guidance |
The following considerations apply when extending Azure monitoring through Operations Manager PowerShell:
With the ability to execute PowerShell scripts through Operations Manager, you can:
There are two methods that you can use to extend Azure monitoring to Operations Manager: calling the Azure REST APIs directly, or using Azure PowerShell cmdlets.
Calling the REST APIs directly is best used when you are unable to find a corresponding PowerShell cmdlet.
Using PowerShell cmdlets should be your first choice when designing a custom Azure PowerShell solution. Cmdlets are preferable because of the abstraction they provide: with REST APIs, the interface can change, which could force you to revise your solution.
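The abstraction argument can be illustrated with a small wrapper. All names and the stubbed response shape below are hypothetical; the point is only that your scripts depend on an interface you control.

```python
# Hypothetical sketch: your monitoring scripts call a stable function of
# your own, so a change in the raw REST response shape is fixed in one place.
def _rest_get_vm(vm_name):
    # Stubbed stand-in for a raw REST call; the payload shape is invented.
    return {"properties": {"instanceView": {"status": "Running"}}}

def get_vm_status(vm_name):
    """Stable interface used by the rest of your monitoring scripts."""
    payload = _rest_get_vm(vm_name)
    return payload["properties"]["instanceView"]["status"]

status = get_vm_status("vm1")
```

If the REST payload shape changes, only `_rest_get_vm` needs revision; cmdlets give you this insulation for free.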
Application Insights is a set of services that provide actionable insight into a production application. This data is then integrated into the development tools and process. The following references are provided when implementing this capability within Azure to monitor applications:
Feature References | |
Availability Monitoring |
http://azure.microsoft.com/en-us/documentation/articles/app-insights-monitor-web-app-availability/ |
Diagnostics and Performance |
http://azure.microsoft.com/en-us/documentation/articles/app-insights-detect-triage-diagnose/ |
Usage |
http://azure.microsoft.com/en-us/documentation/articles/app-insights-overview-usage/ |
SharePoint Monitoring with Application Insights |
http://azure.microsoft.com/en-us/documentation/articles/app-insights-sharepoint/ |
Mandatory:
| |
Recommended:
|
Design Guidance |
The following considerations apply when extending Azure monitoring through Application Insights:
Capability Considerations |
Capability Decision Points |
Application code development integration |
|
Application logging |
|
Operational Insights is an analysis service that enables IT administrators to gain deep insight across on-premises and cloud environments. It enables you to interact with real-time and historical machine data to rapidly develop custom insights, and it provides Microsoft and community-developed patterns for analyzing data.
Mandatory: This feature requires a Microsoft or Organizational Account to perform initial setup. | |
Recommended: If you are using Operational Insights with Operations Manager, it is recommended to download the latest updates to ensure you take advantage of new features and functionalities. |
Design Guidance |
The following considerations apply when extending Azure monitoring through Operational Insights:
Scenario |
Points to Consider |
Threat analysis |
|
Global Service Monitor is a cloud service that provides a simplified way to monitor the availability of external web-based applications from multiple locations around the world. Importantly, Global Service Monitor monitors applications from the perspective of the customers who use them.
Because Global Service Monitor monitors from locations that are correlated to customer geographies, application owners can gain insight into customer experiences and can separate issues caused by external factors, such as Internet or network problems, from application or service issues.
The monitoring experience with Global Service Monitor focuses on the application instead of the infrastructure or individual URL. Global Service Monitor extends the monitoring capabilities of the System Center Operations Manager console so that you can monitor external- and internal-facing web applications in the same place you monitor other applications.
Global Service Monitor uses points-of-presence in Microsoft Azure to monitor and identify external factors to help give you a true reflection of an end-user's experience of a web application.
Feature References | |
Features |
|
Web Application Availability Monitoring |
|
Visual Studio Web Tests |
Recommended:
|
Azure Diagnostics 1.3 and 1.2 are Azure extensions that enable you to collect diagnostic telemetry data from a worker role, web role, or virtual machine running in Azure. The telemetry data is stored in an Azure Storage account, and it can be used for debugging and troubleshooting, measuring performance, monitoring resource usage, traffic analysis and capacity planning, and auditing.
Feature References | |
Overview/Configuring |
http://azure.microsoft.com/en-us/documentation/articles/cloud-services-dotnet-diagnostics/ |
Design Guidance |
The following considerations apply when extending Azure monitoring through Azure Diagnostics:
Scenario |
Points to Consider |
Multi-role/tier application |
|
Traditional maintenance of solutions in Microsoft Azure is largely dependent on the services that are consumed within Azure (PaaS or IaaS). Customers have a shared responsibility for solutions deployed using Azure services, and the amount of shared responsibility is dependent on the services consumed.
As a practical example, Azure IaaS virtual machines must be maintained by the customer; there is no automated update service provided for the guest operating system. The underlying fabric hardware, virtualization, and service layers are managed by Azure. Using this example, decision points that should drive the maintenance strategy of the architecture include:
This section covers some of the available options, and it uses the Azure IaaS scenario as a common example.
Keeping up to date with Microsoft updates for Windows-based virtual machines is critical to ensure that a proper security posture is maintained for these systems. Microsoft updates should be applied to Azure IaaS virtual machines in a similar way that updates are applied in the customer's existing environment.
When updating from on-premises or public Microsoft update servers, the update source location is largely driven by the Azure network design decisions and customer configurations, like any other Windows-based virtual machine.
For example, if forced tunneling of all network traffic is implemented, it would be recommended to leverage on-premises content servers, such as System Center Configuration Manager Distribution Points, Windows Server Update Services (WSUS) servers, or third-party patch management solutions.
This configuration would reduce the amount of network transit. However, if the Azure virtual machines are permitted to access the public Internet by egressing through the Azure datacenters (the default configuration), we recommend that you configure the Windows Update client settings to download the updates from Microsoft Update directly.
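The routing-driven decision above reduces to a simple rule. The returned labels are illustrative summaries of the guidance, not configuration values consumed by any tool.

```python
def update_source(forced_tunneling: bool) -> str:
    """Pick an update source for Azure IaaS VMs based on the network design:
    forced tunneling favors an on-premises or in-Azure content server, while
    direct Internet egress favors Microsoft Update. Labels are illustrative."""
    if forced_tunneling:
        return "on-premises or in-Azure content server (e.g., WSUS)"
    return "Microsoft Update (direct Internet egress)"
```

Encoding the rule this way makes the dependency explicit: the network design must be settled before the update topology is architected.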
Like Windows Update, Windows Server Update Services (WSUS) can be used to patch Azure IaaS virtual machines for customers who want a higher degree of control over patch distribution, release, and reporting.
We recommend that organizations review patching and update requirements with customers, specifically requirements around Azure IaaS virtual machines. Currently, Microsoft does not provide a centralized patch management offering for IaaS virtual machines outside of currently shipping patch management solutions such as WSUS and System Center Configuration Manager. Therefore, it is important that the customer understands that their organization is responsible for patching.
Feature References | |
WSUS TechNet site |
|
WSUS Deployment Guide |
https://technet.microsoft.com/en-us/library/dd939906(WS.10).aspx |
Mandatory: Azure solutions that deploy virtual machines must update the virtual machine operating systems. Microsoft does not provide guest operating system patching as a service. | |
Recommended: Azure solutions should contain a patching solution that has reporting or feedback on the status of individual patches. WSUS provides this minimum level of reporting. |
Design Guidance |
For organizations with small environments, or organizations that have not invested in a patching infrastructure (such as System Center Configuration Manager or a similar third-party tool), WSUS can provide a basic patch management infrastructure. However, the virtual machines need to be configured to use the WSUS instance like any other Windows-based system in the enterprise.
Typically, updates are applied manually on individual servers, although this can be automated. WSUS can provide an update repository for content download, in addition to the approval and release of patches to the environment. WSUS provides only simple update automation at the endpoint, and this should be taken into account in any WSUS design. More advanced update features require an infrastructure similar to that provided by System Center Configuration Manager.
If the customer has an existing WSUS topology, a recommended approach is to deploy an additional WSUS server within the organization's Azure subscription and join it to the WSUS hierarchy. Optionally, this additional server can be configured as a content store, so that Azure virtual machines download content from this new server.
Alternatively, the WSUS server can be configured not to be a content store; virtual machines that use this WSUS server would then download updates from Microsoft Update while simply reporting to the new WSUS server.
A key design input to the WSUS topology and design is the network configuration in Azure. If forced tunneling is used, the WSUS design leverages a content store in Azure. However, if virtual machines have Internet access from Azure directly, the virtual machines can be configured to use Microsoft Update for updates. Consider your network configuration prior to architecting your update topology.
System Center 2012 R2 Configuration Manager can provide services including application installation, update management, and other system configuration tasks. This is particularly attractive in hybrid scenarios where customers may have significant existing investments in Configuration Manager packaging, software update groups, and so on. Microsoft Azure presents additional configurations that should be considered, such as update location settings, boundaries, and client authentication.
Within Configuration Manager, cloud-based distribution points can provide content hosted in Azure for IaaS virtual machines (or other computers) to consume locally from their virtual network. This minimizes the traffic that those systems would otherwise draw from the organization's on-premises infrastructure for update services.
Additional considerations and limitations include the requirement for the site server to use certificate-based authentication to Azure. Task sequences should be configured with the Download all content locally option as part of any deployment package. Also, some features are not available from cloud-based distribution points, so Azure supportability and feasibility should be considered with any Azure deployment.
Feature References | |
Cloud-based Distribution Points |
https://technet.microsoft.com/en-us/library/hh272770.aspx#BKMK_InstallCloudSiteSystems |
PKI Certificates for Configuration Manager |
Mandatory:
| |
Recommended: Azure solutions that leverage Configuration Manager should consider the update source settings for Windows and third-party updates because these are not delivered by cloud-based distribution points. We recommend that you allow clients in Azure to connect to Windows Update to retrieve content. Client settings for Allow access to cloud distribution points must be set to Yes. Configuration Manager client settings can be configured in the Configuration Manager administration console. | |
Optional: Configuration Manager deployments within Azure can optionally contain a dedicated primary site. This may depend on the on-premises infrastructure, the scale of the Azure environment to be deployed, and the location of the Azure region in respect to the on-premises environments. |
Design Guidance |
If Configuration Manager is going to be used, we recommend that you leverage an existing Configuration Manager infrastructure, if available. Designing the Configuration Manager architecture (such as primary sites and distribution points) can be involved and is generally a separate engagement beyond an Azure scope of work.
When using an existing Configuration Manager infrastructure, consider the anticipated size of the Azure deployment. If the scale is large enough, consider using a dedicated primary site to service Azure virtual machines.
A key input to the architecture of Configuration Manager is the networking topology. If Azure virtual machines are directly exposed to the Internet via Azure networking, cloud distribution points should be used to deliver content to virtual machines, and Microsoft updates can be downloaded directly from Microsoft Update.
Configuration Manager client settings can be configured in the Configuration Manager console. If forced tunneling is enabled, consider a traditional Configuration Manager distribution point to host content and updates.
A primary consideration with either architecture should be the network bandwidth between the on-premises infrastructure and Azure. Solutions should be architected to minimize the use of this limited bandwidth for patching and maintenance purposes.
It is possible to enable a hub-and-spoke or daisy-chain approach to support multiple-hop access. This requires making changes to the default network configuration file.
The process is as follows:
In the network configuration file, you must define the DNS servers, the local networks, and the virtual networks.
When designing the DNS server requirements, you can use DNS servers that are managed by Azure, or you can use customer-managed DNS servers. The DNS servers are added at the subscription level, assigned to each virtual network, and delivered by the DHCP server that is servicing the subnet. The customer-managed DNS servers can reside in Azure or on-premises.
When designing the local network sites for the configuration file, you must define a local network site for each remote network, describing its address space and VPN gateway address.
For example, consider a network configuration file that connects three virtual networks together in a daisy chain:
You need to define the local network sites that correspond to each virtual network:
lvnet1, lvnet2, lvnet3
Then define the virtual networks that you want to create:
vNet1, vNet2, vNet3
Now you need to create the network configuration file for the scenario where you want to connect these three virtual networks, and you want traffic to route the entire length of the daisy chain. This means that you need a dynamic routing gateway on each virtual network.
Note: Because the gateways have not been created yet, you do not know their actual addresses; use placeholder addresses for now.
First, define the DNS servers:
<Dns>
<DnsServers>
<DnsServer name="DNS1" IPAddress="10.0.0.4" />
<DnsServer name="DNS2" IPAddress="10.0.0.5" />
</DnsServers>
</Dns>
Then define the local networks lvnet1, lvnet2, lvnet3 with placeholder gateway addresses:
<LocalNetworkSites>
<LocalNetworkSite name="lvnet1">
<AddressSpace>
<AddressPrefix>10.1.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>1.1.1.1</VPNGatewayAddress>
</LocalNetworkSite>
<LocalNetworkSite name="lvnet2">
<AddressSpace>
<AddressPrefix>10.2.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>2.2.2.2</VPNGatewayAddress>
</LocalNetworkSite>
<LocalNetworkSite name="lvnet3">
<AddressSpace>
<AddressPrefix>10.3.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>3.3.3.3</VPNGatewayAddress>
</LocalNetworkSite>
</LocalNetworkSites>
Then define the virtual networks: vNet1, vNet2, and vNet3.
To define the local network sites that specify the routing, create a single local network site definition that traverses multiple subnets. Instead of defining lvnet1, lvnet2, lvnet3, and specifying multiple local network sites in the virtual network definition, you define "transitive" local network site definitions and specify those in the virtual network definition.
To allow routing from vNet1 to vNet3, you would define a single local network site definition called lvnet2-3 and place the address spaces (that you want to get to on the other side of the gateway) for vNet2 and vNet3 in the single definition, for example:
<LocalNetworkSite name="lvnet2-3">
<AddressSpace>
<AddressPrefix>10.2.0.0/16</AddressPrefix>
<AddressPrefix>10.3.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>
</LocalNetworkSite>
To allow routing from vNet3 to vNet2 and vNet1, you would define the opposite local network site definition, lvnet2-1, and include the vNet2 and vNet1 address spaces:
<LocalNetworkSite name="lvnet2-1">
<AddressSpace>
<AddressPrefix>10.2.0.0/16</AddressPrefix>
<AddressPrefix>10.1.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>
</LocalNetworkSite>
For the vNet1 virtual network definition, you would specify the local network site that allows the routing path through the gateway that you want to traverse, lvnet2-3:
<VirtualNetworkSite name="vnet1" Location="EAST US">
<AddressSpace>
<AddressPrefix>10.1.0.0/16</AddressPrefix>
</AddressSpace>
<Subnets>
<Subnet name="Subnet-1">
<AddressPrefix>10.1.0.0/19</AddressPrefix>
</Subnet>
<Subnet name="GatewaySubnet">
<AddressPrefix>10.1.32.0/29</AddressPrefix>
</Subnet>
</Subnets>
<Gateway>
<ConnectionsToLocalNetwork>
<LocalNetworkSiteRef name="lvnet2-3">
<Connection type="IPsec" />
</LocalNetworkSiteRef>
</ConnectionsToLocalNetwork>
</Gateway>
</VirtualNetworkSite>
For vNet3, specify the lvnet2-1 local network site definition.
For vNet2, specify the local network sites you want to route to (going both directions) as separate single-hop definitions:
<VirtualNetworkSite name="vnet2" Location="EAST US">
<AddressSpace>
<AddressPrefix>10.2.0.0/16</AddressPrefix>
</AddressSpace>
<Subnets>
<Subnet name="Subnet-1">
<AddressPrefix>10.2.0.0/19</AddressPrefix>
</Subnet>
<Subnet name="GatewaySubnet">
<AddressPrefix>10.2.32.0/29</AddressPrefix>
</Subnet>
</Subnets>
<Gateway>
<ConnectionsToLocalNetwork>
<LocalNetworkSiteRef name="lvnet1">
<Connection type="IPsec" />
</LocalNetworkSiteRef>
<LocalNetworkSiteRef name="lvnet3">
<Connection type="IPsec" />
</LocalNetworkSiteRef>
</ConnectionsToLocalNetwork>
</Gateway>
</VirtualNetworkSite>
By using this approach with local network site definitions that specify the routing subnets, the network configuration file for the vNet1<->vNet2<->vNet3 daisy chain configuration would look like this:
<NetworkConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/ServiceHosting/2011/07/NetworkConfiguration">
<VirtualNetworkConfiguration>
<Dns>
<DnsServers>
<DnsServer name="DNS1" IPAddress="10.0.0.4" />
<DnsServer name="DNS2" IPAddress="10.0.0.5" />
</DnsServers>
</Dns>
<LocalNetworkSites>
<LocalNetworkSite name="lvnet1">
<AddressSpace>
<AddressPrefix>10.1.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>138.91.10.68</VPNGatewayAddress>
</LocalNetworkSite>
<LocalNetworkSite name="lvnet2-3">
<AddressSpace>
<AddressPrefix>10.2.0.0/16</AddressPrefix>
<AddressPrefix>10.3.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>
</LocalNetworkSite>
<LocalNetworkSite name="lvnet2-1">
<AddressSpace>
<AddressPrefix>10.1.0.0/16</AddressPrefix>
<AddressPrefix>10.2.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>
</LocalNetworkSite>
<LocalNetworkSite name="lvnet3">
<AddressSpace>
<AddressPrefix>10.3.0.0/16</AddressPrefix>
</AddressSpace>
<VPNGatewayAddress>138.91.18.174</VPNGatewayAddress>
</LocalNetworkSite>
</LocalNetworkSites>
<VirtualNetworkSites>
<VirtualNetworkSite name="vnet1" Location="EAST US">
<AddressSpace>
<AddressPrefix>10.1.0.0/16</AddressPrefix>
</AddressSpace>
<Subnets>
<Subnet name="Subnet-1">
<AddressPrefix>10.1.0.0/19</AddressPrefix>
</Subnet>
<Subnet name="GatewaySubnet">
<AddressPrefix>10.1.32.0/29</AddressPrefix>
</Subnet>
</Subnets>
<Gateway>
<ConnectionsToLocalNetwork>
<LocalNetworkSiteRef name="lvnet2-3">
<Connection type="IPsec" />
</LocalNetworkSiteRef>
</ConnectionsToLocalNetwork>
</Gateway>
</VirtualNetworkSite>
<VirtualNetworkSite name="vnet2" Location="EAST US">
<AddressSpace>
<AddressPrefix>10.2.0.0/16</AddressPrefix>
</AddressSpace>
<Subnets>
<Subnet name="Subnet-1">
<AddressPrefix>10.2.0.0/19</AddressPrefix>
</Subnet>
<Subnet name="GatewaySubnet">
<AddressPrefix>10.2.32.0/29</AddressPrefix>
</Subnet>
</Subnets>
<Gateway>
<ConnectionsToLocalNetwork>
<LocalNetworkSiteRef name="lvnet1">
<Connection type="IPsec" />
</LocalNetworkSiteRef>
<LocalNetworkSiteRef name="lvnet3">
<Connection type="IPsec" />
</LocalNetworkSiteRef>
</ConnectionsToLocalNetwork>
</Gateway>
</VirtualNetworkSite>
<VirtualNetworkSite name="vnet3" Location="EAST US">
<AddressSpace>
<AddressPrefix>10.3.0.0/16</AddressPrefix>
</AddressSpace>
<Subnets>
<Subnet name="Subnet-1">
<AddressPrefix>10.3.0.0/19</AddressPrefix>
</Subnet>
<Subnet name="GatewaySubnet">
<AddressPrefix>10.3.32.0/29</AddressPrefix>
</Subnet>
</Subnets>
<Gateway>
<ConnectionsToLocalNetwork>
<LocalNetworkSiteRef name="lvnet2-1">
<Connection type="IPsec" />
</LocalNetworkSiteRef>
</ConnectionsToLocalNetwork>
</Gateway>
</VirtualNetworkSite>
</VirtualNetworkSites>
</VirtualNetworkConfiguration>
</NetworkConfiguration>
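Before importing a hand-edited network configuration file, it can be worth validating it offline. The following Python sketch (an illustration, not part of any Azure tooling) parses the schema shown above with only the standard library and reports virtual networks whose address spaces overlap, which would break routing across the chained gateways:

```python
import ipaddress
import xml.etree.ElementTree as ET

# Namespace used by the network configuration schema shown above.
NS = "{http://schemas.microsoft.com/ServiceHosting/2011/07/NetworkConfiguration}"

def vnet_address_spaces(netcfg_xml):
    """Map each VirtualNetworkSite name to its list of address prefixes."""
    root = ET.fromstring(netcfg_xml)
    spaces = {}
    for site in root.iter(NS + "VirtualNetworkSite"):
        space = site.find(NS + "AddressSpace")
        spaces[site.get("name")] = [
            ipaddress.ip_network(p.text)
            for p in space.findall(NS + "AddressPrefix")
        ]
    return spaces

def overlapping_vnets(netcfg_xml):
    """Return name pairs of virtual networks whose address spaces overlap.

    A daisy-chained configuration like the one above must return an empty
    list, because IPsec routing cannot disambiguate overlapping prefixes
    on either side of a gateway."""
    spaces = vnet_address_spaces(netcfg_xml)
    names = sorted(spaces)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if any(p.overlaps(q) for p in spaces[a] for q in spaces[b])
    ]
```

The same parse could be extended to confirm that each referenced local network site's prefixes match the remote virtual networks it is meant to reach.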
When designing a peering approach with an NSP model, there is a set of prerequisites that must be determined and allocated to accomplish a circuit connection. For the purposes of illustrating this behavior, AT&T NetBond is used as an example.
Prerequisites:
When you have all the prerequisites identified, proceed to the provisioning process within the Microsoft Azure subscription:
The process for creating the public peer connection is similar, starting with a VLAN creation for the existing virtual network circuit, but you need to use the second /29 CIDR address space and append _public to the service key.
When designing a peering approach with an IXP model, there is a set of prerequisites that must be determined and allocated to accomplish a circuit connection.
Prerequisites:
When you have all the prerequisites identified, the provisioning process can proceed:
Microsoft published a tiered model of administrative control in the Microsoft whitepaper Mitigating Pass-the-Hash and Other Credential Theft, version 2 (pages 15-19). This section contains additional detail about this tier model.
Tier 2 – Control of user workstations and devices. Tier 2 administrator accounts have administrative control of a significant amount of business value that is hosted on user workstations and devices. Examples include Help Desk and computer support administrators because they can impact the integrity of almost any user data.
The Tier model prevents escalation of privilege by restricting what administrators can control and where they can log on (because logging on to a computer grants control of those credentials and all assets managed by those credentials).
Tier 0 administrator: manages the identity store and the small number of systems that are in effective control of it.
Tier 1 administrator: manages enterprise servers, services, and applications.
Tier 2 administrator: manages enterprise desktops, laptops, printers, and other user devices.
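As an illustration of the containment rule described above, the check can be sketched in a few lines of Python (the event shape and account names here are hypothetical, not from any Microsoft tooling):

```python
from dataclasses import dataclass

@dataclass
class LogonEvent:
    account: str
    account_tier: int  # 0 = identity systems, 1 = enterprise servers, 2 = user devices
    host: str
    host_tier: int

def tier_violations(events):
    """Flag logons that cross tier boundaries.

    Under this reading of the model, an administrative account may only
    log on to hosts trusted at its own tier: logging on to a lower-trust
    host exposes the credential to that host's administrators, while
    logging on to a higher-trust host requires rights the account must
    not hold."""
    return [e for e in events if e.account_tier != e.host_tier]
```

For example, a Tier 0 domain admin credential appearing on a Tier 2 help desk workstation would be flagged as a violation.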
The security dependencies of the domain controllers include all hosts, groups, accounts, and other objects that have effective control of them.
Note that there are assets that can have Tier 0 impact to availability of the environment, but do not directly impact the confidentiality or integrity of the assets. These include the DNS Server service and critical network devices like Internet proxies.
Service or application accounts that are granted Tier 0 privileges introduce a significant amount of risk to an organization's mission and business. This configuration should be remediated as a top security priority.
Organizations may have one or more service accounts in a group that grants Tier 0 access, such as Domain Admins, domain\Administrators, Schema Admins, Backup Operators, or Enterprise Admins. Frequently, domains are configured such that Account Operators and Server Operators are also effective Tier 0 operators via the accounts, servers, and applications these roles can effectively control.
The specific options to mitigate these risks depend on the application functionality that initiated the granting of these rights. You should consult the product documentation and vendor support to ensure all designed configurations are supported by the vendor. Always test changes before deploying to production.
In most cases, these configurations have been made to support the need for an application to exercise administrative rights for one of the following scopes:
No known applications require both of these kinds of rights, so granting Tier 0 rights for both creates a state of "over-permissioning" that only benefits an attacker or malicious insider.
Important: Tier 0 rights also provide the ability to steal all password hashes in the Active Directory database, which is not required by any known legitimate applications except for the Active Directory Migration Tool (for password synch during migration) and password synchronization to Azure Active Directory.
To remediate this risk, we recommend using preventive and detective controls.
Preventive controls should always include a least-privilege design of permissions, regular password changes, and any applicable application-specific controls. Each role should be designed to the least privilege required for the tool and tasks required using one of the three sets of guidance in this section.
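As one concrete example of the preventive controls above, scheduled service-account password changes should draw from a cryptographically secure source. A minimal Python sketch (the alphabet and minimum length here are illustrative assumptions; substitute your organization's actual password policy):

```python
import secrets
import string

# Hypothetical character set for a service-account password policy.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_service_password(length: int = 32) -> str:
    """Generate a password for a scheduled service-account password
    change, using the secrets module rather than random (which is not
    suitable for security purposes)."""
    if length < 16:
        raise ValueError("service account passwords should be long")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```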
Detective controls such as monitoring for anomalous behavior should be implemented. Expected behavior will vary with each application, but organizations should look for:
Other detective controls specific to the application (such as unused functionality)
This access is typically required for configuration and management tools for a variety of tasks including agent installation, agent repair, troubleshooting actions for the operating system and applications, or security compliance scanning.
Access is granted by adding an account directly or indirectly to the membership of the local administrators group on each computer. By default, Domain Admins is in this group so it is used frequently in many organizations, resulting in a dangerous over-permissioning state.
Group Policy preferences can be used to centrally grant local administrators access without also granting domain administrators privileges. For more information, see Configure a Local Group Item.
This is a configuration that is frequently required for identity management tools that perform provisioning, synchronization, and other similar functions.
Frequently, domain admins or account operators are used for these accounts, resulting in over-permissioning risk. Account operators is effectively a Tier 0 group because they have the ability to reset passwords and take control of any user or computer account not protected by the AdminSDHolder process, which frequently includes Tier 0 applications and administrators.
This can be addressed by delegating permissions to manage objects in the OUs where the target accounts reside:
For more information, see KB303972: How to grant the "Replicating Directory Changes" permission for the Microsoft Metadirectory Services ADMA service account.
For more information about Forefront Identity Manager ADMA, see Management Agent Communication Ports, Rights, and Permissions.
Several technical approaches may be used to automate granting privileges, see How to View or Delete Active Directory Delegated Permissions.
Important: These changes should be tested in a lab prior to implementing them in production.
When moving from pilot to production, create a new account with the reduced permissions described previously, and reconfigure the application to use it during a maintenance window. Test all application functionality during that window where possible.
A Change Advisory Board (CAB) is the discussion forum and approving authority for changes that could impact the security profile of the organization. Any exceptions to these standards should be submitted to the CAB with a risk assessment and justification.
Each standard in this document is broken out by the criticality of meeting the standard for hosts trusted at a given Tier level.
Mandatory
Strongly Recommended
Recommended
Optional
All exceptions for Mandatory and Strongly Recommended items (marked with a red octagon or an orange triangle in this document) are considered temporary, and they need to be approved by the CAB. Guidelines include:
All exceptions for Recommended items (marked with a yellow circle in this document) are considered temporary, and need to be approved by the CAB. Guidelines include:
Exceptions for Optional items (marked with a blue square in this document) do not require acceptance by the CAB.
Note that there are specific conditions for exceptions to some standards in this document.
The security value of any given IT asset (application, server, workstation, and so on) is evaluated by the importance of the system and its data to the mission. A system or its data may have intrinsic value, which makes it a primary target that is inherently valuable to an attacker.
Systems are also frequently targeted because they allow attackers to directly or indirectly access their primary target. These intermediary targets are not inherently valuable, but they serve as a valuable stepping stone to get the attacker closer to their objective of capturing the primary target.
Primary Target – Intrinsically has value to an attacker (for example, a valuable piece of data or a mission-critical system).
Intermediary Target – Has value to the attacker as a means to reaching a primary target of interest.
Because the Active Directory forest is in effective control of all the organization's business assets joined to a domain in that forest, it is a security dependency of all these business assets.
The Active Directory configuration and operation should adhere to the tier model published in the Microsoft whitepaper Mitigating Pass-the-Hash and Other Credential Theft, version 2 (pages 15-19).
A detailed description of these tiers was provided previously; they are:
Tier 0 – Control of enterprise identities and the systems that manage them.
Tier 1 – Control of enterprise servers, services, and applications.
Tier 2 – Control of user workstations and devices.
The Active Directory forest, domain controllers, and their security dependencies are collectively classified as Tier 0. The security dependencies of the domain controllers include all hosts, groups, accounts, and other objects that have effective control of them. Additional details on Tier 0 contents are described in a previous Appendix.
These Tier 0 functions have been identified as required for effective management of the environment. Each of these assets should be fully under positive control, and they should meet the Tier 0 hardening standards:
This section includes a summary of the standards and their applicability to hosts in each tier.
This table describes the systems to which the standards should be applied.
| Standard | Tier 0 | Tier 1 | Tier 2 |
| --- | --- | --- | --- |
| Identity System Hosts | | | |
| Active Directory and domain controllers | | N/A | N/A |
| Certification authorities (CAs) | | N/A | N/A |
| Privileged Identity Management (PIM) systems | | N/A | N/A |
| Enterprise Identity Management Systems | | N/A | N/A |
| Identity federation services (such as AD FS) | | N/A | N/A |
| Identity system administrative workstations | | | |
| Infrastructure Management and Security Hosts | | | |
| Configuration management | | | |
| Operational monitoring | | | |
| Backup | | | |
| Virtualization | | | |
| Security tools | | | |
| Infrastructure system administrative workstations | | | |
| Cloud Services | | | |
| Microsoft Azure Infrastructure as a Service (IaaS) | | | |
Table 1 - Standards Applicability
The Tier columns in this table refer to the trust level that must be met for assets (workstations, servers, domain controller, admin workstation, and so on) to be used by that Tier of administration.
| Standard | Tier 0 | Tier 1 | Tier 2 |
| --- | --- | --- | --- |
| Host Security | | | |
| Current Operating System Version | | | |
| Block access to public internet and email | | | |
| Verification of all media in build as clean | | | |
| Rapid or automated patching | | | |
| Stringent restrictions for applications, middleware, management agents | | | |
| Restricted local administrators membership | | | |
| Compliance with Microsoft Security baselines | | | |
| Up-to-date anti-malware | | | |
| Standard security tools | | | |
| Enhanced Mitigation Experience Toolkit (EMET) | | | |
| Enforce RDP RestrictedAdmin on admin workstations | | | |
| Attack surface analysis | | | |
| Physical security | | | |
| Full disk encryption | | | |
| Baseboard management controller (BMC) security | | | N/A |
| UEFI, TPM, and secure boot enabled | | | |
| Application whitelisting | | | |
| USB media restrictions | | | |
| Outbound traffic restrictions (no Internet) | | | N/A |
| Inbound traffic restrictions (default block) | | | |
| Use of scheduled tasks | | | |
| Logon restrictions | | | |
| Enable rapid rebuild process | | | |
| Follow application security guidance (if available) | | | |
Table 2 - Operating System Standards Summary
The Tier columns in this table refer to the Tier level of the data or objects, the control of which typically impacts all assets in that tier.
| Standard | Tier 0 | Tier 1 | Tier 2 |
| --- | --- | --- | --- |
| Follow administrative OU structure | | | |
| AdminSDHolder ACLs must be default | | N/A | N/A |
| ACLs on Active Directory objects must adhere to tier model | | | |
| Access to stored LAPS Local Account Passwords | | | |
| Administrative accounts restricted from delegation | | | |
| Regularly randomize password on smart card accounts (SCRIL Cycling) | | | |
| Group Policy Objects (GPOs) | | | |
| GPO adherence to Microsoft baselines | | | |
| Group Policy permissions | | | |
| Startup and logon script quality control and change management | | | |
| Service Accounts | | | |
| No service accounts will have Tier 0 privileges | | N/A | N/A |
| Service accounts tracked, documented, and reviewed | N/A | | |
| Service account passwords complexity and expiration | N/A | | |
| Service account privileges assigned through groups | N/A | | |
| Service accounts restricted from delegation | N/A | | |
| Service accounts monitored for anomalous logon behavior | N/A | | |
| Use Group Managed service accounts (gMSAs) instead of user accounts | N/A | | |
| Service account logon restrictions | N/A | | |
| Certification Authority Data | | | |
| GPO trust | | N/A | N/A |
| NTAuth store | | N/A | N/A |
Table 3 - Active Directory and Identity Data Standards Summary
The Tier columns in this table refer to the Tier level of the administrative account, the control of which typically impacts all assets in that tier.
| Standard | Tier 0 | Tier 1 | Tier 2 |
| --- | --- | --- | --- |
| Administrator Enablement, Accountability, and Lifecycle Enforcement | | | |
| Administrative personnel standards | | | |
| Administrative security briefing and accountability | | | |
| Provisioning and de-provisioning processes for administrative accounts | | | |
| Operationalize Least Privilege | | | |
| Limit count of administrators | | | |
| Dynamically assign privileges | | | |
| Manage Risk of Credential Exposure | | | |
| Separate administrative accounts | | | |
| Administrator logon practices | | | |
| Use of approved support technology | | | |
| No browsing the public internet with admin accounts or from admin workstations | | | |
| No accessing email with admin accounts or from admin workstations | | | |
| Store service and application account passwords in a secure location | | | |
| Strong Authentication | | | |
| Enforce smartcard multi-factor authentication for all admin accounts | | | |
| Enforce multi-factor authentication for all cloud admin accounts | | | |
| Rare Use / Emergency Procedures | | | |
| Correctly follow established processes for all emergency access accounts | | N/A | N/A |
| Restrict and monitor usage of emergency access accounts | | N/A | N/A |
| Temporarily assign Enterprise Admin and Schema Admin membership | | N/A | N/A |
Table 4 - Operational Standards Summary
These standards are designed to protect operating systems against unauthorized administrative control.
Tier 0 host standards are required for all Tier 0 computer assets including domain controllers, Tier 0 management servers, and Tier 0 management workstations.
All administrative workstations must meet or exceed the standards of the highest value assets they manage. For example, domain admin workstations must meet the Tier 0 security standards because Tier 0 accounts will be logging on to them to administer the domain.
Access to the public Internet is disallowed for all servers, all administrative workstations, and all administrative users. No email accounts will be assigned to any administrative account.
Exceptions can be approved by the change approval board for required Internet connectivity to specific Internet assets, such as:
Technically restrict all exceptions as tightly as possible to DNS addresses or IP ranges.
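As a sketch of how such a restriction can be expressed, the following Python fragment checks a destination address against a small approved list (the ranges shown are documentation-only example prefixes, not real service addresses; in practice this logic lives in a firewall or proxy policy):

```python
import ipaddress

# Hypothetical CAB-approved destinations for hosts that must reach the
# Internet, for example update servers or antimalware signature sources.
APPROVED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def destination_allowed(ip: str) -> bool:
    """Allow outbound traffic only to destinations inside an approved range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in APPROVED_RANGES)
```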
To provide the latest security capabilities and designs, all systems should be installed with the latest version of the operating system available at the time of installation. All hosts in operation should run the latest operating system version or one major version older than the current (N-1).
Use verified installation media to build all hosts to mitigate against supply chain risks, such as malware installed in a master image or injected into an installation file during download or storage. This includes all executable code, such as operating system installations, application installations, tools, and plug-ins. Any unsigned code should be analyzed for security concerns with extra rigor.
The media should be protected from tampering throughout the lifecycle, including:
Software Source
The source of the software should be validated through one of the following means:
certutil -hashfile <filename>
When possible, all application software, such as application installers and tools, should be digitally signed, and the signatures should be verified by using Windows Authenticode with the Windows Sysinternals tool sigcheck.exe, with revocation checking. Some required software may come from vendors that do not provide this type of digital signature.
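The same hash validation that certutil performs on Windows can be scripted on any platform. A minimal Python sketch, assuming the software source publishes a SHA-256 hash for the media:

```python
import hashlib

def file_sha256(path: str) -> str:
    """Compute the SHA-256 digest of downloaded media, reading in 1 MB
    chunks so large ISO images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def media_verified(path: str, published_hash: str) -> bool:
    """Compare against the hash published by the software source
    (case-insensitive, tolerating surrounding whitespace)."""
    return file_sha256(path) == published_hash.strip().lower()
```

The comparison should always be made against a hash obtained from a channel independent of the download itself, such as the vendor's HTTPS site.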
Software Storage
After obtaining the software, it should be stored in a location that is protected from modification, especially by internet-connected hosts or personnel trusted at a lower level than the systems where the software or operating system will be installed. This storage can be physical media or a secure electronic location.
Software Usage
Ideally, the software should be validated at the time it is used, such as when it is manually installed, packaged for a configuration management tool, or imported into a configuration management tool.
Physical Build Environment
The credential theft solutions and server resources should ideally be built in a physically secure lab, using only known good software media. This lab should be established for creating the solution and used for any updates or maintenance.
All security updates available for operating systems and applications should be applied within five days of becoming generally available. This protects against attacks that leverage rapid reverse engineering of security updates to develop exploits.
Security updates should be applied automatically where operationally feasible, such as administrative workstations and administrative forest hosts. Automated security updates may not be feasible for production domain controllers given the risk for production outages.
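The five-day window above is easy to check mechanically when auditing patch compliance; a minimal Python sketch:

```python
from datetime import date, timedelta

# The five-day application deadline from the standard above.
PATCH_SLA = timedelta(days=5)

def patch_overdue(released: date, today: date) -> bool:
    """True when a security update has been generally available for
    longer than the five-day application deadline."""
    return today - released > PATCH_SLA
```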
To prevent additional attack surfaces on hosts, install only approved software that is specifically required. The number of management agents with full system control should be limited as much as possible.
Each management agent should be approved by the change approval board, and the justification should specifically explain how the business and mission value of the tool's presence exceeds the business and mission risk of not having the functionality.
To limit the number of accounts that can create risk to the organization, restrict the number of accounts in the local administrators group of all systems to the smallest number possible. To protect against inadvertent weakening of the security posture and enforce governance standards, operate all administrative workstations with standard user privileges on the local hosts.
Configure local administrative groups and accounts as follows:
To protect against configuration vulnerabilities, configure all Windows Tier 0 hosts to comply with the appropriate security baselines from Microsoft Security Compliance Manager (SCM). Apply them with Group Policy Objects (GPOs) to ensure consistent enforcement. Apply security configuration guidance for all other operating systems from the manufacturers.
All hosts should include antimalware software to protect against known threats and malware. Note that System Center Endpoint Protection (SCEP) is generally preferable for locked-down systems, such as administrative workstations, because it can leverage WSUS servers to obtain signatures.
Enhanced Mitigation Experience Toolkit
The Enhanced Mitigation Experience Toolkit (EMET) provides exploit mitigations for many popular applications that may process untrusted data. These mitigations can protect against known and unknown threats and exploits. All hosts that run applications that have been tested to work with EMET (primarily web browsers, media players, and productivity applications) should have EMET installed, and those applications opted in to EMET protections.
To provide protection against a lateral traversal attack that uses a domain account, on all administration workstations, enforce the use of RestrictedAdmin mode for all Remote Desktop Protocol (RDP) connections to remote servers and workstations.
This is enabled by applying the following Group Policy setting to the admin workstation to enforce it on all RDP connections from this computer:
Computer Configuration\Administrative Templates\System\Credentials Delegation\Restrict delegation of credentials to remote servers
The following parameter for the Remote Desktop client application can also be supplied from the command line to enable this mode: mstsc.exe /RestrictedAdmin.
For more information about this capability, see:
Hosts should undergo attack surface analysis to prevent introduction of new attack vectors to Windows during installation of new software. The Attack Surface Analyzer should be used at the following times to help assess configuration settings on a host and identify attack vectors:
Only authorized personnel should have physical access to the tiered assets, including servers, storage, administrative workstations, and backup files.
All systems should use BitLocker or similar full disk/volume encryption to mitigate against physical loss of computers, such as administrative laptops that are used remotely.
Baseboard management controllers (such as Hewlett-Packard's iLO and Dell's DRAC) can be leveraged as an attack vector in much the same way as any other software component through unpatched vulnerabilities, weak passwords, or misconfigurations.
The most comprehensive mitigation for attacks based on a baseboard management controller is to disable this functionality in the system BIOS or UEFI.
If lights-out management functionality is required, reduce the risks exposed by a baseboard management controller with measures including the following:
All physical systems should be configured with the secure boot feature to mitigate against attackers or malware attempting to load malicious code during the boot process. This feature was introduced in Windows 8 and Windows Server 2012 to leverage the Unified Extensible Firmware Interface (UEFI). See UEFI Firmware for more information.
All hosts should implement software restriction with AppLocker or a similar technology to ensure that only authorized administrative software is run on the host operating system.
All hosts should implement USB restrictions to protect against physical infection vectors. See Control Read or Write Access to Removable Devices or Media for more information.
To protect against network attacks, host firewalls or network devices should block all incoming connections to hosts from the public Internet.
Exceptions are only allowed for designated hosts that download security updates, such as the Windows Server Update Services (WSUS) servers or antimalware servers that obtain signature updates. In these exceptions, controls should be implemented if possible to restrict Internet access to only the authorized Internet locations.
To protect against attacks that leverage inadvertent admin actions, such as browsing and email, host firewalls or network devices should block all outbound access to the public Internet. All changes to hosts that require software from Internet locations should be obtained by using the media verification process or through vendor-provided automated update mechanisms.
Scheduled tasks provide the ability to automatically run arbitrary code and scripts on many hosts. All use of scheduled tasks beyond tasks created automatically by installing the operating system and authorized applications should be carefully reviewed and managed.
Restrict sign-in to the host to only the accounts that are expected to log onto them for regular, daily use or to provide support.
To limit the risk of leaving a potentially compromised Tier 0 host in operation, a rapid rebuild process should be established for all Tier 0 hosts. This allows the organization to rapidly deploy replacement units if compromise of Tier 0 assets is suspected.
Important: This rebuild capability does not constitute the complete response process! A complete response process must meet organizational operational security, investigation, and intelligence-gathering requirements, which can include forensic analysis, allowing an adversary to persist to gather intelligence, and other actions as required.
Document the justification for varying from the manufacturer's security recommendations for each application installed on the host.
Manufacturers can provide explicit or de facto security guidance in several forms, including:
Avoid disabling any security features that are enabled by default and follow the manufacturer's guidance for configuring any security-related settings.
Active Directory is the authoritative identity store, and the data in it is composed of two types of data:
Many of these objects can provide an effective means of controlling the directory through use, abuse, or modification. These standards prevent abuse of objects that provide a well-known means of control of Tier 0.
Each of these standards should be explicitly followed to prevent an escalation of privilege vulnerability in the Active Directory data. Additionally, all Active Directory objects that are classified at Tier 0 may not be modified without review and approval by the change approval board.
Well-known Tier 0 objects include:
Misconfigurations or misunderstandings about the OU model can create elevation of privilege vulnerabilities to Tier 0. To prevent these unauthorized means of control, place all administrative accounts, groups, servers, and workstations into an "Admin" OU structure that is distinct and separate from the managed enterprise servers and user workstations.
Ensure that all objects are placed into appropriate OUs to receive the correct permissions and GPO settings.
Only Tier 0 administrative accounts should have permissions to modify any objects in the Admin OU structure.
The AdminSDHolder object in the system container of every Active Directory domain ensures that the access control list (ACL) permissions are consistently enforced on protected accounts and groups including the Domain Admins group, Enterprise Admins group, built-in administrators, and members of those groups. The Security Descriptor Propagator (SDProp) runs every 60 minutes to ensure the permissions for these protected objects match the permissions for the domain's AdminSDHolder object.
The permissions on this object should never be modified from the default configuration.
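The SDProp behavior described above can be illustrated with a minimal simulation. This is a sketch only: the object names and the dictionary-based ACL model are hypothetical stand-ins, while real SDProp operates on directory security descriptors.

```python
# Simulated SDProp pass: stamp the AdminSDHolder ACL onto every
# protected object so that drift introduced by an attacker (or by an
# operational mistake) is reverted on the next cycle. The data model
# here is illustrative, not the real directory API.

ADMIN_SD_HOLDER_ACL = {"Domain Admins": "Full", "SYSTEM": "Full"}

protected_objects = {
    "Domain Admins":     {"Domain Admins": "Full", "SYSTEM": "Full"},
    "Enterprise Admins": {"Domain Admins": "Full", "SYSTEM": "Full",
                          "Attacker": "Full"},   # unauthorized extra grant
}

def sdprop_pass(objects, template_acl):
    """Reset each protected object's ACL to the AdminSDHolder template."""
    corrected = []
    for name, acl in objects.items():
        if acl != template_acl:
            objects[name] = dict(template_acl)
            corrected.append(name)
    return corrected

# The unauthorized entry on Enterprise Admins is detected and removed.
print(sdprop_pass(protected_objects, ADMIN_SD_HOLDER_ACL))
```

Because the pass is idempotent, running it on a schedule (SDProp uses 60 minutes) continually restores the protected state.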
For more information, see Best Practices for Securing Active Directory (page 43).
Ensure that the tier model is adhered to for all permissions on Active Directory objects, such as those for enterprise servers and enterprise workstations. Lower tier administrators should never have permission to higher tier resources.
Local account passwords managed by LAPS are stored in an Active Directory attribute that, by default, only domain admins are granted permission to read. Additional administrators must be delegated access to read this value by using the tools provided by the Local Administrator Password Solution (LAPS). To prevent elevation of privilege, access to read this password value must be restricted according to the Tier model.
All admin accounts must have the Account is sensitive and cannot be delegated attribute enabled. For more information, see Security Focus: Analyzing "Account is sensitive and cannot be delegated" for Privileged Accounts.
Enabling the Smart card required for interactive logon (SCRIL) attribute on an account will require a smart card for future interactive sign-ins, and set a random value in the account's password attribute.
A new random value will be generated each time this attribute is enabled, so you can cycle this attribute by periodically disabling and enabling it. This change can cause technical issues with currently open Windows sessions, so this operation should not be performed while administrators are working on tasks.
A script to perform this action should be run every 24 hours for administrators at a time when they are unlikely to be working on administrative tasks (such as 3:00 A.M. local time).
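The cycling behavior can be sketched as follows. This is a simulation of the mechanism, not an operational script: the account class is hypothetical, and a real implementation would toggle the SCRIL flag in the account's userAccountControl attribute in Active Directory.

```python
import secrets

# Simulated SCRIL cycling: each time the "smart card required" flag is
# re-enabled, the account's password attribute is replaced with a fresh
# random value, invalidating any previously stolen password hash.
# The Account class is an illustrative stand-in for a directory object.

class Account:
    def __init__(self, name):
        self.name = name
        self.scril = False
        self.password_hash = None

    def set_scril(self, enabled):
        if enabled and not self.scril:
            # A new random secret on every enable makes old hashes useless.
            self.password_hash = secrets.token_hex(16)
        self.scril = enabled

def cycle_scril(account):
    """Disable then re-enable SCRIL to rotate the hidden password value."""
    account.set_scril(False)
    account.set_scril(True)

admin = Account("tier0-admin")
admin.set_scril(True)
first = admin.password_hash
cycle_scril(admin)
assert admin.password_hash != first  # the stored secret has rotated
```

Scheduling `cycle_scril` daily during off-hours (as recommended above) bounds the useful lifetime of any captured credential material to about one day.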
A significant portion of the security configuration for Active Directory and the Windows hosts in the organization is set in the Group Policy Objects (GPOs).
Preventing and detecting unauthorized and unsafe modification of these policies is critical to the security posture of Active Directory and the organization. This is critically important for Tier 0 GPOs that include any policies linked to the domain, to Active Directory sites, to the domain controller's OU, or to other Tier 0 OUs.
All exceptions from the recommended GPO settings and permissions must include an impact analysis and proposed change that are approved by the change approval board. Examples of changes include, but are not limited to:
Exceptions to GPO settings for a specific set of users or workstations
Changes to GPO settings can have a significant impact on the enterprise security posture, which must be evaluated prior to allowing that change into a production environment. All GPOs must be compliant with the appropriate Microsoft baselines in the Security Compliance Manager (SCM).
Any exceptions to the baselines or changes to the GPOs must be assessed for threats, potential impact, and countermeasures by using the guidance in the SCM for the divergent settings.
Because Group Policy can grant a means of control (run code on a host or in context of an account), the security permissions for Group Policy must meet the rules of the Tier model and the organization's administrative model.
Group Policy objects and their associated script content may only be modified per the following:
Policy that applies to | Can only be modified by
Tier 0 hosts and accounts | Tier 0 administrative accounts
Administrative workstations and accounts (all) | Tier 0 administrative accounts
Enterprise servers | Tier 1 (and above) administrative accounts
Enterprise workstations and users | Tier 1 (and above) administrative accounts
Note: The Tier model can allow greater GPO control by Tier 1 and 2 administrators, but in our example, CONTOSO has chosen to centralize the management of administrative workstations and enterprise workstations.
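The modification rule in the table above can be expressed as a simple tier comparison. The sketch below uses CONTOSO's example mapping; the policy names and the numeric tier encoding (0 = most privileged) are illustrative assumptions.

```python
# Tiers are numbered 0 (most privileged) through 2. A GPO that applies
# to assets at tier N may only be modified by administrative accounts
# at tier N or a more privileged tier (a lower number). The mapping
# mirrors CONTOSO's example table and is not a universal standard.

GPO_TIER = {
    "Tier 0 hosts and accounts": 0,
    "Administrative workstations and accounts": 0,
    "Enterprise servers": 1,
    "Enterprise workstations and users": 1,
}

def can_modify_gpo(policy_target, admin_tier):
    """True if an admin at admin_tier may edit a GPO for policy_target."""
    return admin_tier <= GPO_TIER[policy_target]

assert can_modify_gpo("Enterprise servers", 0)       # Tier 0 may edit
assert can_modify_gpo("Enterprise servers", 1)       # Tier 1 may edit
assert not can_modify_gpo("Tier 0 hosts and accounts", 1)  # blocked
```

A change-control workflow could run such a check before accepting a GPO edit, rejecting requests that cross tier boundaries.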
Startup and logon scripts provide the ability to automatically run arbitrary code and scripts on many hosts. All script content should be carefully reviewed and managed.
A service account is created for the use of an application or service, rather than for a specific person. Service accounts are frequently targeted by attackers, and they should meet specific security standards to limit and secure their use.
Service accounts that are granted Tier 0 privileges introduce a significant amount of risk to the organization's mission and business. No service account should be granted full Tier 0 privileges through built-in groups such as domain admins, enterprise admins, built-in administrators, account operators, or any other groups listed in Appendix C.
Any service account that is alleged to require Tier 0 privileges should use one of the following approaches to support the functionality:
For more detailed instructions and options for granting the correct level of permissions to a service account, see Appendix D: Removing service accounts from Tier 0.
Any exceptions to these standards must:
To mitigate risk of unknown service accounts, all service accounts must be tracked, documented, and regularly reviewed for operational needs. Use a single tracking system for all service accounts to document:
This information must be reported to the change approval board (including all changes). All service accounts should be reviewed by the change approval board at least annually.
All service accounts should have passwords that meet the domain complexity requirements and length in the SCM tool.
All service account passwords should be changed at least every 90 days if they are manually managed.
We recommend (but do not require) that you acquire and implement a tool to generate random passwords for service accounts and manage the service account password lifecycle.
Never enable the attribute Password never expires for a service account.
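For manually managed accounts, random passwords meeting the standards above can be generated with a short routine. This is a minimal sketch: the 32-character default length and the four required character classes are assumptions standing in for your SCM baseline's actual complexity and length requirements.

```python
import secrets
import string

# Illustrative generator for long random service-account passwords.
# Length and character-class rules here are placeholder assumptions;
# substitute the complexity requirements from your SCM baseline.

def generate_service_password(length=32):
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Require at least one character from each class before accepting.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate

pwd = generate_service_password()
assert len(pwd) == 32
```

Using the `secrets` module (rather than `random`) matters here: it draws from the operating system's cryptographically strong entropy source.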
Service account permissions assigned through groups
All service accounts are assigned permissions via an Active Directory group. Don't assign permissions directly to a service account.
Service accounts restricted from delegation
Enable the attribute Account is sensitive and cannot be delegated for all service accounts. For more information, see Security Focus: Analyzing "Account is sensitive and cannot be delegated" for Privileged Accounts. All exceptions must be approved by the change approval board.
Service accounts monitored for anomalous logon behavior
All service accounts should be monitored for anomalous logon behavior. This can be accomplished by specifically documenting the expected logon patterns and manually identifying alerts for violations or by using tools that apply machine learning techniques for the environment. One such tool is Microsoft Advanced Threat Analytics.
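The "documented expected logon patterns" approach can be sketched as a baseline check. The account names, hosts, and logon types below are hypothetical; a real deployment would feed security event log data into this kind of comparison, or use a tool such as Microsoft Advanced Threat Analytics.

```python
# Minimal baseline check: record the hosts and logon types each
# service account is expected to use, then flag any logon event that
# deviates. All data shown is hypothetical example content.

EXPECTED = {
    "svc-sql": {"hosts": {"SQL01", "SQL02"}, "logon_types": {"Service"}},
}

def is_anomalous(account, host, logon_type):
    profile = EXPECTED.get(account)
    if profile is None:
        return True  # undocumented service account: always alert
    return (host not in profile["hosts"]
            or logon_type not in profile["logon_types"])

assert not is_anomalous("svc-sql", "SQL01", "Service")
assert is_anomalous("svc-sql", "DC01", "Interactive")  # wrong host and type
```

Treating undocumented accounts as anomalous by default reinforces the tracking requirement above: any service account missing from the inventory generates an alert.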
Use Group Managed Service Accounts (gMSAs) instead of user accounts
When supported by the application, use Managed Service Accounts (MSAs) and Group Managed Service Accounts (gMSAs) instead of user accounts.
Windows Server 2008 R2 introduced the concept of managed service accounts, which are accounts that are tied to a specific computer and are automatically set up and maintained with a complex password that is updated every 30 days by default. Managed service accounts are exempt from domain password policies and cannot be used to log on interactively.
Use Managed Service Accounts whenever possible so that passwords for the accounts are set and managed automatically. This mitigation is appropriate for service accounts that run Windows services, but it is not applicable for accounts that applications use to perform tasks (which require the application to store the account password).
Create and use Managed Service Accounts with the default Managed Service Account container. For more information, see Managed Service Accounts (documentation for Windows 7 and Windows Server 2008 R2) or Group Managed Service Accounts Overview.
Service account logon restrictions
To prevent adversary abuse if a service account is compromised, service accounts should be restricted to only the authentication profile they require to perform authorized tasks. This includes which hosts to access and what logon types to use. This can be accomplished by managing logon user rights for enterprise hosts with a set of GPOs or the use of Authentication Policies and Silos. For more information, see:
Certification authorities (CA) can issue PKI certificates that can be trusted to authenticate as any account. Any system administration of the CAs or access to the private keys of the CAs in any form represents Tier 0 access. All means of administrative control of these systems must be secured at or above the Tier 0 standard.
The change approval board must approve any certificates to be trusted across the enterprise and any certificates to be published to the NTAuth store.
These locations should be monitored for any unauthorized changes with available tools.
Group Policy offers the ability to manage which certificates are trusted by workstations and servers in the domain. This trust can be granted at the operating system level for complete trust or for only certain purposes by using Certificate Trust Lists. For more information, see:
All trust of certification authorities must be managed at the domain level. All changes to the default trust must be approved by the change approval board.
Publishing a CA certificate to the NTAuth store grants a level of trust beyond normal certificate trust. Any certificate directly issued by a CA in the NTAuth store is trusted to authenticate accounts in Active Directory by putting the user name in an attribute of that certificate. This is most commonly used for smart card authentication.
Operational decisions that are made on a regular basis are critical to maintaining the security posture of the environment. These standards for processes and practices help ensure that an operational error does not lead to an exploitable operational vulnerability in the environment.
Administrators must be informed, involved, trained, and accountable to operate the environment as securely as possible.
Administrative Personnel Standards
Assigned administrative personnel must be vetted to ensure they are trustworthy and have a need for administrative privileges:
Administrative Security Briefing and Accountability
Administrators must be informed and accountable for the risks to the organization and their role in managing that risk.
Administrators should be trained yearly on:
To provide accountability, all personnel with administrative accounts should sign an Administrative Privilege Code of Conduct document that says they intend to follow organization-specific administrative policy practices.
Provisioning and Deprovisioning Processes for Administrative Accounts
The following standards must be met for administrative account lifecycle requirements:
Account Privilege Level | Approving Authority | Membership Review Frequency
Tier 0 Administrator | Change approval board | Monthly or automated
Tier 1 Administrator | Tier 0 administrators or security | Monthly or automated
Tier 2 Administrator | Tier 0 administrators or security | Monthly or automated
Achieving least privilege requires understanding the organizational roles and their requirements, and then designing mechanisms to ensure that roles are able to accomplish their job by using least privilege. Achieving a state of least privilege in an administrative model frequently requires the use of multiple approaches:
This section contains the standards for achieving least privilege.
Limit Count of Administrators
A minimum of two qualified personnel should be assigned to each administrative role to ensure business continuity.
If the number of personnel assigned to any role exceeds two, the change approval board must approve the specific reasons for assigning privileges to each individual member (including the original two). The justification for the approval must include:
Dynamically Assign Privileges
Administrators are required to obtain permissions "just-in-time" to use them as they perform tasks. No permissions will be permanently assigned to administrative accounts.
Permanently assigned administrative privileges naturally create a "most privilege" strategy because administrative personnel require rapid access to permissions to maintain operational availability if there is an issue. Just-in-time permissions provide the ability to:
Separate Administrative Accounts
All personnel that are authorized to possess administrative privileges must have separate accounts for administrative functions that are distinct from user accounts.
Standard user accounts – Grant standard user privileges for standard user tasks, such as email, web browsing, and using line-of-business applications. These accounts should not be granted administrative privileges.
Administrative accounts – Create separate accounts for personnel who are assigned the appropriate administrative privileges. An administrator who is required to manage assets in each Tier should have a separate account for each Tier. These accounts should have no access to email or the public Internet.
Administrator Logon Practices
Before an administrator can log on to a host interactively (locally over standard RDP, by using RunAs, or by using the virtualization console), that host must meet or exceed the standard for the admin account Tier (or a higher Tier). This is because logging onto a host interactively grants control of the credentials to that host.
See the Mitigating Pass-the-Hash (PtH) Attacks and Other Credential Theft Techniques whitepaper (version 1) for details about logon types, common management tools, and credential exposure.
Administrators can only sign in to admin workstations with their administrative accounts. Administrators only log on to managed resources by using the approved support technology described in the next section.
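The logon rule above reduces to a tier comparison: interactive logon exposes the credential to the host, so the host's hardening tier must meet or exceed the account's tier. The sketch below encodes this with tier numbers (0 = most privileged), an illustrative convention rather than anything defined by the standard itself.

```python
# Interactive logon (console, RDP, RunAs, virtualization console)
# grants the host control of the credential, so a host must be
# hardened at (or above) the standard of the account's tier before
# the logon is allowed. Lower tier number = more privileged/hardened.

def interactive_logon_allowed(account_tier, host_tier):
    """A Tier N account may only log on interactively to hosts hardened
    at Tier N or better (a lower or equal tier number)."""
    return host_tier <= account_tier

assert interactive_logon_allowed(account_tier=0, host_tier=0)      # Tier 0 admin workstation
assert not interactive_logon_allowed(account_tier=0, host_tier=1)  # would expose Tier 0 creds
assert interactive_logon_allowed(account_tier=2, host_tier=1)      # harder host is acceptable
```

The asymmetry is deliberate: a lower-tier account on a higher-hardened host risks nothing new, but a higher-tier account on a lower-hardened host hands its credentials to a weaker system.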
Use of Approved Support Technology
Administrators who support remote systems and users must follow these guidelines to prevent an adversary in control of the remote computer from stealing their administrative credentials.
Tier 1 Server and Enterprise Application Support
Tier 2 Help Desk and User Support
Help Desk and user support organizations perform support for end users (which doesn't require administrative privileges) and the user workstations (which does require administrative privileges).
User support – Tasks include assisting users with performing tasks that require no modification to the workstation, frequently showing them how to use an application feature or operating system feature.
Workstation support – Tasks include performing workstation maintenance or troubleshooting that requires access to a system for viewing logs, installing software, updating drivers, and so on.
No browsing the public Internet with admin accounts or from admin workstations
Administrative personnel cannot browse the public Internet while logged on with an administrative account or while logged on to an administrative workstation. The only authorized exceptions are the use of a web browser to administer a cloud-based service, such as Microsoft Azure, Amazon Web Services, Microsoft Office 365, or enterprise Gmail.
No accessing email with admin accounts or from admin workstations
Administrative personnel cannot access email while logged on with an administrative account or while logged on to an administrative workstation.
Store service and application account passwords in a secure location
The following guidelines should be used for the physical security processes that control access to the password:
Enforce smartcard multi-factor authentication (MFA) for all admin accounts
No administrative account is allowed to use a password for authentication. The only authorized exceptions are the emergency access accounts that are protected by the appropriate processes.
Link all administrative accounts to a smart card and enable the attribute "Smart Card Required for Interactive Logon."
Allow no exceptions for accounts used by human personnel beyond the emergency access accounts.
Enforce Multi-Factor Authentication for All Cloud Admin Accounts
All accounts with administrative privileges in a cloud service, such as Microsoft Azure and Office 365, must use multi-factor authentication.
Operational practices must support the following standards:
Correctly Follow Appropriate Processes for All Emergency Access Accounts
Ensure that each emergency access account has a tracking sheet in the safe.
The procedure documented on the password tracking sheet should be followed for each account, which includes changing the password after each use and logging out of any workstations or servers used after completion.
All use of emergency access accounts should be approved by the change approval board in advance or after the fact as an approved emergency usage.
Restrict and Monitor Usage of Emergency Access Accounts
Follow these standards for each use of the emergency access accounts:
Temporarily Assign Enterprise Admin and Schema Admin membership
This privilege should be added as needed and removed after use. The emergency account should have these privileges assigned for only the duration of the task to be completed, and for a maximum of 10 hours. All usage and duration of these privileges should be captured in the change approval board record after the task is completed.
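The time-boxed grant can be sketched as follows. The in-memory grant object is illustrative: a real implementation would modify group membership in the directory, remove it when the window closes, and write the usage record to the change approval board log.

```python
from datetime import datetime, timedelta

# Sketch of a time-boxed privilege grant: membership in Enterprise
# Admins or Schema Admins is recorded with an expiry (10 hours at
# most, per the standard above) and is considered inactive once the
# window closes. Names and the storage model are hypothetical.

MAX_GRANT = timedelta(hours=10)

class PrivilegeGrant:
    def __init__(self, account, group, duration):
        if duration > MAX_GRANT:
            raise ValueError("grant exceeds the 10-hour maximum")
        self.account = account
        self.group = group
        self.expires = datetime.utcnow() + duration

    def is_active(self, now=None):
        return (now or datetime.utcnow()) < self.expires

grant = PrivilegeGrant("emergency-admin", "Schema Admins", timedelta(hours=2))
assert grant.is_active()
assert not grant.is_active(now=grant.expires + timedelta(seconds=1))
```

Rejecting over-long durations at creation time, rather than trusting later cleanup, keeps the 10-hour ceiling enforceable even if the removal step fails.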
Active Directory and Domain Controllers
Active Directory domain controllers store a copy of the password hashes for all accounts. All means of administrative control of these hosts must be secured at or above the Tier 0 standard.
Certification Authorities
Certification authorities (CAs) can issue PKI certificates that can be used to authenticate as any account. All means of administrative control of these hosts must be secured at or above the Tier 0 standard.
Privileged Identity Management Systems
Privileged identity management systems can reset passwords and provide access to administrative accounts and groups, so they are considered Tier 0 assets. The application server hosts and workstations where PIM administrators log on should meet the Tier 0 host hardening standards.
Enterprise Identity Management Systems
Identity management systems can reset passwords and provide access to any user's account, and they are considered Tier 2 assets. Enterprise identity management systems should not be granted privileges to administrative accounts, groups, or OUs. If they are, they must be classified as a privileged identity management solution and secured at Tier 0 standards. The enterprise identity management application servers and the workstations where administrators log on should meet the appropriate host hardening standards.
Identity Federation Systems
Identity federation systems (AD FS) process authentications for any accounts that use the system, so they must be considered Tier 0 assets. These systems frequently host websites that are exposed to direct Internet traffic, so those exceptions must be allowed, and application-specific security guidance from the vendor should be followed.
Identity System Administrative Workstations and Servers
Administrative workstations must be secured at or above the level of the systems to be managed, so all locations that host administrative accounts for identity systems must be hardened at the appropriate Tier standard.
These standards describe the requirements that must be met for the infrastructure management and security tools that will be used to manage hosts in a given Tier. All components of a management tool must meet or exceed the hardening standards of that Tier, including:
Configuration Management
Configuration management tools allow the ability to run arbitrary code as a system on managed computers. These are classified at the Tier that they manage, and all components must be secured at or above that level.
As an example, a configuration management tool with an agent installed on a domain controller will be Tier 0.
Operational Monitoring
Operational management and performance management tools allow the ability to run arbitrary code as a system on managed computers. These are classified at the highest tier of assets that they manage, and all components must be secured at or above that level.
As an example, an operations management tool with an agent installed on a Tier 0 configuration management tool will be Tier 0.
Backup
Backup systems provide the ability to access backup files for an operating system, the ability to back up an operating system, or the ability to restore backups of an operating system to arbitrary locations.
Any of these needs provide the ability to read and control any element of that operating system (or to copy the operating system to a back-up file), including the operating system secrets.
These are classified at the highest tier of assets that they manage, and all components must be secured at or above that level, including backup file storage systems, storage administrators and their workstations, and any administrators of those systems.
Virtualization
Virtualization tools allow the ability to fully control all operating systems and data hosted on them. They are classified at the highest level of virtual machines that are hosted on them. These provide the ability to read and control any element of the operating system (on disk or in memory), including the operating system secrets.
New technology called "shielded virtual machines" has been announced by Microsoft. Its purpose is to change this, but it is not generally available at the time of the writing of this standard. For more information, see Shielded VMs and Guarded Fabric Validation Guide for Windows Server 2016.
The virtualization components are classified at the highest tier of assets that they manage, and all components must be secured at or above that level, including virtual machine hosts, hosts where virtualization management applications run, hosts where virtual machine storage is managed, and administrative workstations where the admin accounts manage the virtualization solution and its storage.
Security Tools
Any security monitoring and enforcement tools generally allow the ability to run arbitrary code as a system on managed computers through directly accessible features or through manipulation of extensibility features. These tools are classified at the highest tier of assets that they manage, and all components must be secured at or above that level.
Some solutions are fixed function, and they do not offer extensibility, so the risk of those tools should be assessed individually. These tools are:
Security monitoring should be provided by the security team.
Infrastructure System Administrative Workstations
Administrative workstations must be secured at or above the level of the systems to be managed, so all locations that host administrative accounts for infrastructure systems must be hardened at the appropriate Tier standard.
Microsoft Azure
Much like virtualization tools, an administrator of an Azure subscription has the ability to fully control all operating systems and the data hosted in it. Because of this, the subscription is classified at the Tier of the highest-level virtual machines that it hosts.
If Tier 0 assets, such as domain controllers, are hosted in Azure, they must be hosted in a separate Azure subscription from the Tier 1 and Tier 2 assets.
Microsoft Azure is a cloud service that is maintained by Microsoft, and it does not require traditional application security for the platform because tenants are not responsible for application-code security practices or software updates for the service.
The software components that are under tenant control require appropriate application security, including:
This is important to the risk posture of the tenant because attackers can gain control of a tenant service by attacking vulnerable applications that the tenant depends on for security assurances, such as Active Directory, federation, synchronization, and infrastructure management capabilities. Additionally, adversaries can attack applications hosted on Azure.
The security of software purchased from a vendor depends on security measures taken throughout the lifecycle, including:
Detailed guidance about software configuration, security updates, and software supply risks is in the Security Standardization sections of this document.
Following is additional guidance for purchasing policies.
Security Purchasing Policies and Preferences (Example)
The following elements represent security preferences and requirements when you are acquiring applications, devices, and services: