Azure Reference Architecture

Introduction

The goal of the Azure Reference Architecture is to help organizations quickly develop and implement Microsoft Azure-based solutions while reducing complexity and risk. The Azure Reference Architecture combines Microsoft software with recommended compute, network, and storage guidance to support the extension of an organization's datacenter environment through the use of Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) constructs.

Scope

The scope of this document is to provide the necessary guidance to develop Microsoft Azure-based solutions by establishing an Azure subscription model that meets the business, identity, security, infrastructure, and development requirements held by most organizations adopting a public cloud services strategy.

The focus of this document is on the design and implementation guidelines for general Azure subscription planning. This document is not intended to replace existing documentation about Microsoft Azure features. It seeks to integrate and complement that information with associated design guidance. For most organizations that want to seamlessly integrate Azure services, a firm understanding of the features and capabilities of the Azure platform along with tested models and practices is key towards proper consumption and adoption of the services.

This document's primary scope focuses on the generally available (GA) feature set of Azure. Azure features and capabilities are surfaced in one of three ways:

  • Private Preview
  • Public Preview
  • Generally Available

Preview features are included in this document where possible; however, the primary focus is on conveying the tested design practices and solutions based on GA features. Preview features may not have full capabilities, global scale, or repeatable design patterns that can be leveraged in your planning activities.

Cloud OS

The Cloud Platform is Microsoft's vision of a modern platform for the world's apps. It provides a platform that is unified across on-premises, service provider, and Microsoft Azure environments. The Cloud Platform delivers the hybrid cloud, which effectively provides one consistent platform that spans customer datacenters and multiple clouds.

The Infrastructure as a Service (IaaS) product line architecture (PLA) utilizes the core capabilities of Windows Server, Hyper-V, and System Center to deliver a private cloud IaaS offering.

The Azure Reference Architecture complements the IaaS PLA and completes the Cloud Platform vision by providing a reference architecture and design patterns for the public cloud (Microsoft Azure).

Microsoft Cloud Solution Provider Program

The Microsoft Cloud Solution Provider (CSP) program allows service providers to sell Microsoft cloud services along with their own offerings and services. Partners own the complete customer lifecycle through direct billing, provisioning, management, and support. The CSP program enables service providers to:

  • Create a customer offer, set the price, and own the billing terms
  • Integrate service offerings with Microsoft cloud services
  • Stay at the center of the Microsoft cloud customer lifecycle

Microsoft Azure is an open and flexible cloud platform that enables service providers to rapidly build, deploy, and manage secure applications at scale on-premises, in the cloud, or both. Bringing Azure to Cloud Solution Providers enables partners to capitalize on this Azure opportunity with the capabilities of a CSP, where partners own the end-to-end customer lifecycle with direct provisioning, billing, and support of Microsoft's cloud services.

Modern Datacenter and Cloud Offering Portfolio

The Datacenter and Cloud Infrastructure Services portfolio from Microsoft Enterprise Services is designed to help organizations implement technologies that introduce the efficiency and agility of cloud computing, along with the increased control and management of infrastructure resources.

The key attribute of the Cloud Platform vision is a hybrid infrastructure, in which customers have the option of utilizing an on-premises infrastructure or services provided by Azure. The IT organization is a consumer and a provider of services. This enables workload and application development teams to make sourcing selections for services from any of the provided infrastructures or to create solutions that span them.

The Datacenter and Cloud Infrastructure Services portfolio comprises Microsoft Services engagements and frameworks through which Intellectual Property (IP), such as the IaaS Product Line Architecture (PLA) and the Azure Reference Architecture, is delivered. The portfolio includes offerings for scenarios such as infrastructure deployment, consolidation and migration, modernization, automation, and operations. All of the offerings and scenarios leverage the best practices and design patterns found in the IaaS PLA and the Azure Reference Architecture.

Azure Reference Architecture Overview

The Azure Reference Architecture (AZRA) is an initiative to address the need for detailed, modular, and current architecture guidance for solutions being built on Microsoft Azure. AZRA is a collection of materials including design guidance and design patterns to support a structured approach to architecting services and applications hosted within Microsoft Azure.

Unlike the Microsoft PLAs, it is not the intention of the Azure Reference Architecture to result in a single design, nor will it encompass an exhaustive definition of Azure features. The primary reason for this approach is that customer solutions that use Azure services vary greatly in their implementation. Given the pace of changes and enhancements to Azure services, it is critical that organizations are provided with durable recommended practices related to subscription and architectural planning within Microsoft Azure.

The focus of the Azure Reference Architecture is to identify common services and reusable models that can be used broadly when designing cloud-based solutions. These models assist customers through a series of decision points that lead to reusable design patterns. They are based on successful customer implementations and recommended practices.

Azure Deployment Models and Audiences

Unlike many on-premises solutions, Azure deployment models vary in size, composition, and end-state design, which presents a clear challenge to organizations looking to build solutions based on established standards and best practices. Although there are significant variances between projects that use Azure services, many of these can be classified into a small number of key deployment models and audiences. Each audience or model falls into one of two broad focus areas: Development or Infrastructure. Additionally, deployment models differ by subscription ownership: whether the customer organization or its Cloud Solution Provider (CSP) manages the Azure subscriptions.

Within these categories and corresponding constraints, the following deployment models and audiences can be defined:

  • Customer-Owned Models
  • Cloud Service Provider-Managed Models

Customer-Owned Models

  • Application Owner hosting their application in Azure – A native public cloud or hybrid cloud scenario where an application development team within a customer environment wants to take advantage of Azure capabilities to host their application outside of services managed by Enterprise IT.
  • Application Division (Business Unit IT) hosting their services in Azure - A native public cloud or hybrid cloud scenario where an application development division within a customer environment wants to take advantage of Azure capabilities to host their suite of applications and development/test environment outside of Enterprise IT.
  • Enterprise IT extending their datacenter infrastructure to Azure - Typically a hybrid cloud scenario where a mature Enterprise IT organization wants to extend their existing physical or virtual environment to Azure to support the large number of growing IT requirements for their organization and its customers.
  • Organization without on-premises IT hosting all infrastructure in Azure - Typically a hybrid cloud scenario where a startup, a divested business, or an enterprise organization with a distributed workforce is looking to provide traditional IT services without an on-premises infrastructure.


Cloud Solution Provider-Managed Models

  • Cloud Solution Provider - Connect Through – A Cloud Solution Provider (CSP) public cloud scenario where a CSP provides cloud services and/or hosts, and directly manages, customer application workloads deployed on top of Azure services. In the "Connect-Through" model, the customer consumes the provider's cloud services delivered via the provider's network with end services hosted in the provider's provisioned Azure subscription. The Azure subscription is created, owned, and managed by the service provider. The following diagram illustrates the Connect Through model:

  • Cloud Solution Provider - Connect To – A Cloud Solution Provider (CSP) public cloud scenario where a CSP hosts and manages customer application workloads deployed on top of Azure services. In the "Connect-To" model, the provider makes cloud services accessible directly to the customer's network. The Azure subscription is created, owned, and managed by the service provider, but the customer consumes cloud services by interacting directly with the Azure cloud footprint. The following diagram illustrates the Connect To model:

When choosing a management approach for consuming Azure services, the decision is driven by how much management the customer wants to perform versus how much the cloud solution provider will deliver, as well as by the connectivity approach to Azure. The figure below provides a comparison of management responsibility based on the CSP scenarios described above. It's important to note that CSP models provide both built-in and optional services that the customer can select from.

When it comes to planning, designing, and consuming Azure services, these categories are complementary in some respects. In other respects, they have the potential to create divergent paths.

A key consideration to remember is that within any organization or project, developers consume infrastructure. Similarly, infrastructure is deployed to support applications and services. Understanding the needs of both is important to developing an Azure subscription model that satisfies the needs of the project or organization.

Azure Reference Architecture Guide Use

As outlined previously, the Azure Reference Architecture guide provides the basis for the decisions that must be considered as part of any project that encompasses a solution design using Azure services. The design of the solution should leverage the architecture design patterns (infrastructure, foundation, and solution) described later in this document.

The Azure Reference Architecture guide does not outline a single Azure design for hybrid enterprise solutions. Rather, it provides a comprehensive framework for decisions based on the core Microsoft Azure services, features, and capabilities required by most solutions. The guide is structured to cover each of the broader topic areas outlined previously, and it uses the following framework for each component:

  1. Technology Definition – Each topic area has a brief section that outlines "what" the technology is and general information about its role within the Microsoft Azure service. Given the rate of changes to the Azure service, all applicable references to product documentation for the feature set or capability are provided.
  2. Design Principle – Some features and capabilities within Microsoft Azure are critical to any solution design, and they require that decisions be made by organizations as part of their project. When a specific topic area requires a decision or has a recommended practice, a clearly defined rule for that technology or design principle is identified. This is a useful component in recording key design decisions as part of the project.
  3. Design Guidance – For each technology topic area, this section provides key information, considerations, and a potential design model to help organizations understand design constraints and recommended practices towards implementing this feature or capability within Microsoft Azure.

A sample topic area is outlined here to illustrate this relationship:

Figure 2: Azure Reference Architecture Sample Topic Area

Rule Set Criteria

Rule set requirements are vendor-agnostic and are categorized as one of the following:

Mandatory: A recommended practice or area that is critical to building solutions and services within Microsoft Azure. These requirements are necessary for alignment with the reference architecture.

Recommended: A practice or area that represents a standard, strongly recommended approach when developing a solution or service within Microsoft Azure. However, implementation is at the discretion of each customer and is not required for alignment with the Azure Reference Architecture.

Optional: A voluntary practice that can be implemented in the solution or service being developed in Microsoft Azure and can be followed at the discretion of each customer.

Azure Architecture Patterns

Both public and private cloud environments provide common elements to support running complex workloads. Although these architectures are relatively well understood in traditional on-premises physical and virtualized environments, the constructs found within Microsoft Azure require additional planning to rationalize the infrastructure and platform capabilities found within public cloud environments.

To support the development of a hosted application or service in Azure, a series of patterns are required to outline the various components and to compose a given workload solution. These architectural patterns fall within the following categories:

  • Infrastructure – Microsoft Azure is a platform that provides Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) services, and it is composed of several underlying services and capabilities. These services can largely be decomposed into compute, storage, and network services; however, several capabilities may fall outside of these definitions. Infrastructure patterns detail a functional area in Microsoft Azure that is required to provide a service to one or more solutions hosted within an Azure subscription.
  • Foundation – When composing a multi-tiered application or service within Microsoft Azure, several components must be used in combination to provide a suitable hosting environment. Foundation patterns compose one or more services from Microsoft Azure to support a layer of functionality within an application. This may require the use of one or more of the components described in the infrastructure patterns outlined previously. For example, the presentation layer of a multi-tier application requires compute, network, and storage capabilities within Azure to become functional. Foundation patterns are meant to be composed with other patterns as part of a solution.
  • Solution – Solution patterns are composed of infrastructure and/or foundation patterns to represent an end application or service that is being developed. It is assumed that complex solutions are not developed independently of other patterns. Rather, they should utilize the components and interfaces defined in each of the pattern categories outlined previously.

This spectrum of patterns is illustrated in the following model.

Figure 3: Azure Architecture Model

Architectural patterns for cloud-hosted workloads (applications and services) should generally adhere to this model, and complex scenarios can be implemented using one or more of the pattern types outlined previously. To learn more about the Azure architectural patterns, see Cloud Platform Integration Framework (Azure Architecture Patterns).

The following diagram illustrates how they can be composed to define a solution, application, or service in Microsoft Azure.

Microsoft Azure Overview

Microsoft Azure Services

What is Azure? In short, it's the Microsoft public cloud platform. Microsoft Azure includes a growing collection of integrated services (compute, storage, data, networking, and applications) that help customers move faster, do more, and save money. With Microsoft Azure, you can build an infrastructure, develop modern applications, gain insights from data, and manage identity and access.

Azure offers dozens of different services in the cloud. These services include all of the commonly referenced cloud computing models:

  • Software as a Service (SaaS)
  • Infrastructure as a Service (IaaS)
  • Platform as a Service (PaaS)

These models can be combined and integrated to build complex, robust solutions for any audience and use case.

Available Azure Services

The availability of Azure services varies by region and whether the service is currently in Preview or is Generally Available (GA). For up-to-date information about service availability in each datacenter, see the Services by region page. Determining which services are available is a key consideration when deploying applications or enabling services within Azure.

The concept of Azure regions will be covered later, but consider the sample Services by region webpage that follows, in which the area outlined in red serves as an example of a customer-selected region. Using this example, if the requirement was to deploy a solution within the South Central US Azure region, the solution would be constrained from using G-Series virtual machines (currently in Preview and covered in the Compute IaaS section of this document). Conversely, if the solution required G-Series virtual machines, the organization would need to select an Azure region that supports that feature or service.

When deploying solutions in Azure and planning Azure subscription models for your organization, consider the following questions:

  • What services are needed to support your solution?
  • From where are your customers accessing the solution?
  • Do internal or external users (or both) need access to your solution?
  • Do you require geographic redundancy, and do both of your selected regions support the service and feature sets used in your solution?

The answers to these questions will help govern your decisions about regions and service consumption in Microsoft Azure. For additional details about each Azure service offering, refer to the Directory of Azure Cloud Services.
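Region and service availability can also be checked programmatically rather than from the portal pages alone. The following is a minimal sketch using the current Azure SDK for Python (azure-identity and azure-mgmt-resource), which post-dates the portals described in this document; the subscription ID is a placeholder, and Microsoft.Compute/virtualMachines is simply one example resource type.

    # Minimal sketch: list the regions in which a given resource type is offered.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

    # Each resource provider exposes its resource types and the locations
    # in which each type is available.
    provider = client.providers.get("Microsoft.Compute")
    for rt in provider.resource_types:
        if rt.resource_type == "virtualMachines":
            print("virtualMachines is offered in:", ", ".join(rt.locations))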

Cloud Computing Models

The United States National Institute of Standards and Technology (NIST) published Special Publication (SP) 800-145, "The NIST Definition of Cloud Computing," to provide a clear definition of cloud computing to United States government agencies. Since its release, it has become an unofficial standard in the computing industry when it comes to defining cloud models.

Using the definitions provided in NIST SP 800-145, Microsoft Azure (and other online properties, such as Office 365) is classified as a Public cloud offering because it is owned and managed by Microsoft and is open for use by the general public. Within Microsoft Azure and other cloud solutions, Microsoft provides Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) capabilities.

Infrastructure as a Service (IaaS)

NIST SP 800-145 defines Infrastructure-as-a-Service (IaaS) as "The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components."

Deploying an application into and managing an IaaS environment provides the most flexibility that Azure has to offer. As with any deployment choice, there are pros and cons to consider. The greatest benefit of an IaaS implementation is control: the customer manages everything from the operating system up to access to the application.

IaaS is most like traditional IT delivery. Customers provision their own virtual machines, define their own networks, and allocate their own virtual hard disks. IaaS shifts the burden of operating datacenters, virtualization hosts, and hypervisors to the service provider. In addition, the business continuity and disaster recovery infrastructure is shifted from the enterprise to the service provider.

Platform as a Service (PaaS)

NIST SP 800-145 defines Platform-as-a-Service (PaaS) as "The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment."

With PaaS applications, many of the layers of management are removed, providing more agility than an application running on IaaS instances. Specifically, there is no need to manage the operating system, including patching, which reduces some of the complexity of designing the deployment.

A significant benefit of deploying an application in a PaaS environment is the ability to quickly and automatically scale up the application to meet demand when traffic is high, and to scale back down when demand subsides. Deploying an application in the PaaS model is very cost effective from a scalability and manageability perspective.

PaaS extends IaaS further by providing multitenant services that customers subscribe to. Platform services are a transformational computing model that can dramatically reduce the costs and increase the agility of delivering applications to end users internally and externally. PaaS users bring their own application code but leverage robust platforms, which they do not need to maintain.

Software as a Service (SaaS)

NIST SP 800-145 defines Software-as-a-Service (SaaS) as "The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings."

Choosing an Azure SaaS offering places the least amount of responsibility on the customer's side, while at the same time providing less flexibility in comparison with an IaaS or PaaS approach.

SaaS is the real promise of cloud computing. By integrating applications from one or multiple vendors, customers need to bring only their data and configurations. They can eliminate the costs of building and maintaining applications and platform services and still deliver secure, robust solutions to end users.

Hybrid

Many organizations need to implement a blend of Azure offerings to meet their organizational and application requirements. The following diagram highlights the main differences, from a manageability perspective, among public cloud SaaS, PaaS, IaaS, and on-premises implementations.

This is important to understand when making a decision about implementation, because each offering has a different impact on the cost, security, scalability, and staff needed to maintain the application or environment.

Azure Datacenter Model

Like most cloud computing services, Microsoft Azure's cloud computing capacity and capabilities are delivered at hyperscale across a series of well-connected global datacenters. These datacenters are represented in constructs such as Azure regions, which are intended to be easily understood by customers and easily consumed based on customer needs. This section reviews the Azure datacenter model and provides an overview of the constructs established for their use by customers.

Global Datacenter Presence

Microsoft Azure is deployed around the world in strategic areas that best meet the demand of customers. These areas are known as regions, and are placed at distances greater than 300 miles from each other to help avoid the possibility that a common natural disaster would affect more than one region at a time.

Azure Regions

When it comes to deploying an application or service in Microsoft Azure, there needs to be an understanding of the following:

  • What is a region?
  • Where are the regions located?
  • What are the capabilities of regions with respect to each other?
  • Which features are in preview versus generally available within a region?
  • Are there any restrictions on where you can deploy to a region? For example:
    • Legal compliance
    • Government regulations

Microsoft Azure is a worldwide network of distributed datacenters that are strategically located around the world to support Microsoft Azure customers. This global presence of datacenters provides Microsoft customers with the ability to deploy an application or service in any datacenter in the world, or in multiple datacenters. Whether a customer is a small company or a major corporation, all the services Azure has to offer in a particular region can be consumed.

Locations

For a list of Azure datacenter locations, see Azure Regions.

Azure operates out of 17 regions around the world. Geographic expansion is a priority for Azure, because it enables our customers to achieve higher performance and it supports their requirements and preferences regarding data location. The following table is provided as a reference:

Azure Region          Location
Central US            Iowa
East US               Virginia
East US 2             Virginia
US Gov Iowa           Iowa
US Gov Virginia       Virginia
North Central US      Illinois
South Central US      Texas
West US               California
North Europe          Ireland
West Europe           Netherlands
East Asia             Hong Kong
Southeast Asia        Singapore
Japan East            Saitama Prefecture
Japan West            Osaka Prefecture
Brazil South          Sao Paulo State
Australia East        New South Wales
Australia Southeast   Victoria

It is very important to choose a region or regions that meet your organization's needs. There are a number of elements to consider when choosing a region in which to deploy your applications and services:

  • Data
  • Location of service consumers
  • Service capability and availability
  • Network performance
  • Pricing
  • Redundancy for high availability

Data

Where Azure data is physically stored is very important to most customers. If the organization is restricted by government regulations or internal company policies about data storage and location, these restrictions need to be clearly understood. There are often restrictions on data export and Government Regulatory Compliance (GRC) requirements for some data sets. This information needs to be understood before deploying any applications or services.

When you create a storage account, you select the primary region for the account. When enabling geographic replication of a storage account, the secondary region is determined based on the primary region, and it cannot be changed. The following table shows the current primary and secondary region pairings when geographically replicated storage is used:

Primary Region        Secondary Region
North Central US      South Central US
South Central US      North Central US
East US               West US
West US               East US
East US 2             Central US
Central US            East US 2
North Europe          West Europe
West Europe           North Europe
Southeast Asia        East Asia
East Asia             Southeast Asia
East China            North China
North China           East China
Japan East            Japan West
Japan West            Japan East
Brazil South          South Central US
Australia East        Australia Southeast
Australia Southeast   Australia East
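To make the pairing behavior concrete, the following is a minimal sketch that creates a geo-redundant storage account using the current Azure SDK for Python (azure-mgmt-storage), which post-dates this document; the resource group and account names are illustrative. Selecting the primary region implicitly selects the paired secondary region from the preceding table.

    # Minimal sketch: create a geo-redundant (GRS) storage account. Choosing
    # East US as the primary region implicitly selects West US as the secondary.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

    poller = client.storage_accounts.begin_create(
        "example-rg",             # illustrative resource group
        "examplegrsaccount01",    # illustrative, globally unique account name
        {
            "location": "eastus",              # primary region
            "kind": "StorageV2",
            "sku": {"name": "Standard_GRS"},   # geo-redundant replication
        },
    )
    account = poller.result()
    print(account.name, account.sku.name)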

Service Capability and Availability

As described earlier, not all Azure regions are equal when it comes to available capabilities and services. Azure typically releases a new feature first in Preview, where it may be available only in certain regions, prior to it becoming Generally Available (GA).

Before deploying an Azure service, review the following link to verify which services are available in your selected region or regions: Services by region.

Network Performance

The network topology of the Internet is complex with respect to bandwidth and routing. Routes from one endpoint to another are not fixed, and they pass through different ISPs en route. It is best to validate the latency between the customer location and the candidate Microsoft Azure regions, and then choose the region with the lowest latency, which will provide the best performance from a networking perspective.
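A simple way to perform that validation is to time a TCP connection to an endpoint in each candidate region. The following sketch uses only the Python standard library; the hostnames are placeholders and should be replaced with endpoints you actually host.

    # Minimal sketch: measure TCP connect latency to candidate regions.
    import socket
    import time

    candidate_endpoints = {
        "East US": "exampleeastus.blob.core.windows.net",      # placeholder
        "West Europe": "examplewesteu.blob.core.windows.net",  # placeholder
    }

    for region, host in candidate_endpoints.items():
        start = time.perf_counter()
        try:
            with socket.create_connection((host, 443), timeout=5):
                elapsed_ms = (time.perf_counter() - start) * 1000
            print(f"{region}: {elapsed_ms:.1f} ms")
        except OSError as exc:
            print(f"{region}: unreachable ({exc})")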

Pricing

The costs associated with services in the different Azure regions are not necessarily the same. The cost of Azure services is controlled by many factors. If latency and GRC are not influencing the architectural design of the application or service, it may be best to deploy to the region with the lowest costs. Refer to the following site for the pricing details of each service provided by Microsoft Azure: Azure Pricing.

Geographic Redundancy and High Availability Across Datacenters

One way to reduce the impact of a datacenter or regional service outage is to place applications and services in multiple regions. Placing a web application or service in multiple Azure regions and tying those services together with Traffic Manager provides the required redundancy to keep the service running.

When an outage happens in one of the regions, the required high availability components will already be in place, and the services will remain available to end users. Establishing virtual network-to-virtual network VPNs between datacenters to route data and infrastructure services is another way to support enterprises with high availability.
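As a sketch of the Traffic Manager approach, the following uses the azure-mgmt-trafficmanager package from the current Azure SDK for Python, which post-dates this document; all names are illustrative, and the exact parameter shape may vary between package versions. Endpoints for each regional deployment would then be added to the profile.

    # Minimal sketch: create a Traffic Manager profile that fails traffic over
    # between two regional deployments of the same service.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.trafficmanager import TrafficManagerManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = TrafficManagerManagementClient(DefaultAzureCredential(), subscription_id)

    client.profiles.create_or_update(
        "example-rg",           # illustrative resource group
        "example-tm-profile",   # illustrative profile name
        {
            "location": "global",                   # profiles are global objects
            "traffic_routing_method": "Priority",   # active/passive failover
            "dns_config": {"relative_name": "example-app", "ttl": 30},
            "monitor_config": {"protocol": "HTTPS", "port": 443, "path": "/"},
        },
    )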

Using Regions and Affinity Groups

Affinity Groups tell the Azure Fabric Controller that two or more Azure virtual machines should always be placed together, or close to one another, within a cluster of compute resources. In the past, it was a requirement to have an affinity group associated with a virtual network. Recent architectural improvements have removed this requirement, and it is no longer generally recommended to use affinity groups for virtual networks or virtual machines.

Virtual Machine Considerations

Although it is not generally recommended to use affinity groups with virtual machines, there is one scenario where an affinity group may be necessary: when the absolute lowest network latency between the virtual machines is required. Associating a virtual machine with an affinity group ensures that all virtual machines in the affinity group are in the same compute cluster, or scale unit.

Although it may be necessary to use an affinity group when configuring a virtual machine to ensure the least amount of latency, the following drawbacks can be difficult to change later should they occur:

  • Limitation of virtual machine sizes available on the compute scale unit associated with the affinity group
  • Higher probability of not being able to allocate a new virtual machine (caused by the scale unit of the affinity group being out of capacity)

The link to the affinity group is made through the cloud service rather than through the virtual machine alone. Should capacity issues occur, or should it become impossible to resize an existing virtual machine to a larger size, it is necessary to either:

  • Remove the virtual machine and import to a new cloud service associated with the region.
  • Remove all virtual machines from the existing cloud service before deleting and re-creating the cloud service to reference the region rather than the affinity group.

The process of removing virtual machines from an affinity group is not easy, which further emphasizes why you should associate deployments with a region rather than an affinity group unless the requirement for the least amount of latency is present.

For more information about the current guidance for affinity groups and specifics regarding virtual networks and virtual machines, see How to migrate from Affinity Groups to a Regional Virtual Network.

Virtual Network Considerations

In May of 2014, the ability to create a virtual network that can span an entire region (datacenter) was introduced. Now, when creating a new virtual network in the Azure portal, the only option is to associate the virtual network with a location rather than with an affinity group.

A regional virtual network is required for many of the newer Azure features, including internal load balancers. Customers with an affinity group virtual network need to request support to migrate the virtual network to a regional type.

Optionally, you can create a new virtual network that is associated with the region, and then migrate the existing deployments from the affinity group virtual network.
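A regional virtual network can also be created programmatically. The following is a minimal sketch using the azure-mgmt-network package from the current Azure SDK for Python (which targets the ARM model rather than the classic portal described here); the names and address space are illustrative.

    # Minimal sketch: create a virtual network associated directly with a
    # region, with no affinity group involved.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

    client.virtual_networks.begin_create_or_update(
        "example-rg",      # illustrative resource group
        "example-vnet",    # illustrative virtual network name
        {
            "location": "southcentralus",  # regional association
            "address_space": {"address_prefixes": ["10.10.0.0/16"]},
            "subnets": [{"name": "frontend", "address_prefix": "10.10.1.0/24"}],
        },
    ).result()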

For more information, see: Regional Virtual Networks.

Datacenter Architecture

As described earlier, Azure hosts its services in a series of globally distributed datacenters. These datacenters are grouped together in regions, and datacenters within a given region are divided into "clusters," which host services. This interaction is outlined in the following diagram:

Within each datacenter, the racks of equipment are built to be fault tolerant with respect to networking, physical host servers, storage, and power. The physical host servers are placed in high-availability units called clusters. The cluster configurations are spread across multiple server racks.

A single rack is referred to as a Fault Domain (FD), and it can be viewed as a vertical partitioning of the hardware. The fault domain is considered the lowest common denominator within the datacenter for fault tolerance. Microsoft Azure can lose a complete rack, and the hosted services can continue unaffected.

A second partition within the datacenter is called the Upgrade Domain (UD), and it can be viewed as a set of horizontal stripes passing through the vertical racks of fault domains. Upgrade domains are used to deploy updates (such as security patches) within Azure without affecting the availability of the running services within the Azure fabric. The following diagram shows a high-level relationship between fault domains and upgrade domains in the Azure datacenters.

Virtual machines are placed in specific fault domains and upgrade domains based on their membership in an Availability Set. For more information about properly configuring availability sets, refer to the Compute (IaaS) section.
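The fault domain and upgrade domain counts are surfaced when an availability set is created. The following is a minimal sketch using the azure-mgmt-compute package from the current Azure SDK for Python (the SDK names the upgrade domain setting platform_update_domain_count); the names are illustrative.

    # Minimal sketch: create an availability set so that its member virtual
    # machines are spread across fault domains (racks) and upgrade domains.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

    avset = client.availability_sets.create_or_update(
        "example-rg",      # illustrative resource group
        "example-avset",   # illustrative availability set name
        {
            "location": "eastus",
            "platform_fault_domain_count": 2,    # spread across 2 racks
            "platform_update_domain_count": 5,   # spread across 5 upgrade domains
        },
    )
    print(avset.platform_fault_domain_count, avset.platform_update_domain_count)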

Server Optimization

Servers reside within each Azure datacenter. The servers are divided into clusters, which are then partitioned by the Azure Fabric Controller to deliver a given service. This relationship is outlined in the following diagram:

Additional details about virtual machine compute instances are provided later in this document; however, a brief overview is provided here to give a general understanding of Azure compute services.

Concerning compute sizes for Azure IaaS (virtual machines), Azure currently has three series: A, D, and G. Each series has different characteristics. For example, the D series offers up to 800 GB of temporary SSD storage, while the G series machines are the largest and offer the highest performance.

The following article provides details about each series, along with examples of scenario-based sizing decisions: Azure A-SERIES, D-SERIES and G-SERIES: Consistent Performances and Size Change Considerations.

With the newer D and G series virtual machines, the temporary drive (D:\ on Windows, /mnt or /mnt/resource on Linux) is a local SSD. This high-speed local disk is best used for workloads that replicate across multiple instances, such as MongoDB, or for workloads that can leverage this high I/O disk as a local, temporary cache, such as the Buffer Pool Extensions in SQL Server 2014.

Note: These drives are not guaranteed to be persistent. Thus, although physical hardware failure is rare, when it occurs, the data on this disk may be lost, unlike your operating system drive and any attached durable disks that are persisted in Azure Storage.

Also available when using premium storage are DS series virtual machines, which offer high-performance, low-latency disk support for I/O-intensive workloads. The underlying disks for DS series virtual machines are SSDs rather than HDDs, and they can achieve up to 64,000 IOPS.

The following tables list the details of the D Series and G Series virtual machines.

D Series

General Purpose Sizes

Name          vCores   Memory (GB)   Local SSD (GB)   Max Persistent Data Disks
Standard_D1   1        3.5           50               2
Standard_D2   2        7             100              4
Standard_D3   4        14            200              8
Standard_D4   8        28            400              16

Memory Intensive Sizes

Name          vCores   Memory (GB)   Local SSD (GB)   Max Persistent Data Disks
Standard_D11  2        14            100              4
Standard_D12  4        28            200              8
Standard_D13  8        56            400              16
Standard_D14  16       112           800              32

G Series

Name          vCores   Memory (GB)   Local SSD (GB)   Max Persistent Data Disks
Standard_G1   2        28            412              4
Standard_G2   4        56            824              8
Standard_G3   8        112           1,649            16
Standard_G4   16       224           3,298            32
Standard_G5   32       448           6,596            64
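Because series availability varies by region, it is worth confirming programmatically that a required size exists in the target region before committing to it. The following is a minimal sketch using the azure-mgmt-compute package from the current Azure SDK for Python; the region is illustrative.

    # Minimal sketch: list the G-series sizes offered in a candidate region.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

    for size in client.virtual_machine_sizes.list(location="eastus"):
        if size.name.startswith("Standard_G"):
            print(size.name, size.number_of_cores, "cores,", size.memory_in_mb, "MB")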

Microsoft Azure Enterprise Operations

To operate your application, service, or infrastructure within Microsoft Azure, it is important to understand the roles, access methods, and components that make up a given organization's Azure environment. This section covers each of these areas at a high level.

Enterprise Roles and Portals

Roles

Within a given enterprise enrollment, Microsoft Azure has several roles that individuals play. These roles range from creating subscriptions (covered later in this document) to provisioning resources. The following top-level roles exist within Azure:

Role: Enterprise Administrator
Quantity/Description: There may be multiple Enterprise Administrators per Enterprise Enrollment
Functions/Permissions:
  • Manage accounts and Account Owners
  • Manage Enterprise Administrators
  • View usage across all accounts
  • View unbilled charges across all accounts

Role: Account Owner
Quantity/Description: Each account requires a unique Microsoft Account or Organizational Account
Functions/Permissions:
  • Create and manage subscriptions – only the Account Owner is able to perform these functions
  • Manage Service Administrators and Co-Administrators
  • View usage for subscriptions
  • View account charges – if the Enterprise Administrator has provided access

Role: Service Administrator
Quantity/Description: A single Microsoft Account or Organizational Account may be used across subscriptions and between hierarchal levels
Functions/Permissions:
  • Access and manage subscriptions and development projects on the developer portal

A detailed breakdown of each role, how it is created, and the primary tool it uses is provided below:

Role: Enterprise Administrator
How Created: First account created at on-boarding. Full access and visibility into all activity and resources of a corporate enrollment.
Primary Tool: https://ea.azure.com

Role: Departmental Administrator
How Created: Delegated by the Enterprise Administrator, this role is typically cost focused at the business unit level. Approves rolled-up IT budgetary requests for multiple organizations. Can create and have visibility into multiple Account Owners. Consumption information can be rolled up and isolated at this level.
Primary Tool: https://account.windowsazure.com

Role: Account Owner
How Created: Delegated by a Departmental Administrator, this role is typically cost focused at the departmental or project level. This role creates the subscriptions and Service Administrators, and approves hardware and resource requests by project. Can create and have visibility into multiple Service Administrators and subscriptions.
Primary Tool: https://account.windowsazure.com

Role: Service Administrator
How Created: Owns a subscription at the resource level. Manages who can create and use IT resources; is solution and project delivery focused. Sets roles and responsibilities at the project level. Has visibility into a single subscription's consumption.
Primary Tool: https://manage.windowsazure.com

Role: Co-administrator
How Created: A resource administrator within a subscription that can manage provisioning and delegation of additional co-administrators. Project and resource focused.
Primary Tool: https://manage.windowsazure.com

Role: Resource Group Administrator
How Created: Manages a group of resources within a subscription that collectively provide a service and share a lifecycle. Single project or service focused. (Currently in Preview)
Primary Tool: https://manage.windowsazure.com

Portals

Microsoft Azure has several portals to support holistic management of the accounts, subscriptions, and features outlined in this document. The following sections cover available portals depending on the account management model:

Customer-Owned Models

When the Azure subscription is provisioned using the customer-owned account models described previously in this document, the customer's organization deploys and manages Azure workloads on its own. The following portal offerings are available for resource management:

Portal: Enterprise Portal
Location: https://ea.azure.com/
Purpose:
  • Manage access
  • Manage accounts
  • Manage subscriptions
  • View price sheet
  • View usage summary
  • Manage usage and lifecycle email notifications
  • Manage Authentication Type
    • Microsoft Account Only – for organizations using only Microsoft Accounts
    • Organizational Account – for organizations that have set up Active Directory in Azure or synchronized from an on-premises Active Directory using ADFS or Directory Synchronization (DirSync), and chose to add users with cloud-based Active Directory authentication
    • Organizational Account Cross Tenant – for organizations that want to add an Enterprise Azure user from an Active Directory tenant outside of their own
    • Mixed Account – for organizations that want to add a combination of Microsoft Account users and cloud-based Active Directory users

Portal: Account Portal
Location: https://account.windowsazure.com
Purpose:
  • Edit subscription details
  • Enroll in or enable Preview features

Portal: Management Portal
Location: https://manage.windowsazure.com or https://portal.azure.com (Preview)
Purpose:
  • Provision/deprovision Azure services
  • Manage co-administrators on subscriptions
  • Open support tickets for issues within the subscription

Note: Any support ticket under a Premier Azure Support agreement should be opened using the Premier portal.

Cloud Solution Provider-Managed Models

When the Azure subscription is provisioned by a Cloud Solution Provider, who manages end-customer Azure subscriptions, the following portal offerings are available for resource management:

Portal: Partner Portal
Location: https://partnercenter.microsoft.com
Purpose:
  • Manage CSP customers
  • Manage CSP customer accounts
  • Manage CSP customer subscriptions
  • Manage user accounts assigned to CSP customer subscription administration
  • Retrieve billing data as a CSP on behalf of CSP customers
  • Administer services provisioned within CSP customer subscriptions
  • Create and manage Azure service requests

Portal: Management Portal
Location (if managed by the CSP; exact URLs are provided in the customer subscription "Service management" view on the Partner portal):
  • Azure Active Directory management: https://account.windowsazure.com/PremiumOffer/Index?offer=MS-AZR-0110P&returnUrl=https://manage.windowsazure.com/<-csp->.onmicrosoft.com#Workspaces/ActiveDirectoryExtension/Directory/<-guid->/directoryQuickStart
  • Customer Azure resource management: https://portal.azure.com/<tenant>.onmicrosoft.com (Preview)
Location (when accessed by the end customer): https://manage.windowsazure.com or https://portal.azure.com (Preview)
Purpose:
  • Provision/de-provision Azure services
  • Manage co-administrators on subscriptions
  • Open support tickets for issues within the subscription

The following summarizes portal access by role:

Enterprise Administrator
  Enterprise Portal: Yes
  Account Portal: Yes – if the account is also an Account Owner
  Management Portal: Yes – if the account is also the Service Administrator or a Co-administrator

Account Owner
  Enterprise Portal: Yes – limited access if provided by the Enterprise Administrator
  Account Portal: Yes
  Management Portal: Yes – if the account is also the Service Administrator or a Co-administrator

Service Administrator
  Enterprise Portal: No
  Account Portal: No
  Management Portal: Yes

Partner Center Portal

The Partner Center Portal is the primary destination for Cloud Solution Providers (CSPs) to onboard customers, resell first-party and third-party services, and manage customer services. It also provides access to billing data, powerful analytics, and tools that enable upsell and cross-sell for Cloud Solution Provider partners. The following sequence of steps demonstrates the onboarding process of a new customer to the Azure platform as a CSP-managed entity:

Additional Role Considerations

The following considerations are provided for the operational roles identified within this section:

  • By default, the Account Owner will be the Service Administrator on any new subscriptions. The Service Administrator can be updated to any other eligible ID in the Account Portal.
  • For subscriptions created by a Microsoft Account Owner, the Service Administrators and Co-Administrators must also be Microsoft Accounts. For subscriptions created by an Organizational Account, either a Microsoft Account or an Organizational ID may be used for the Service Administrators and/or Co-Administrators.
  • Discounted Offer Subscribers (MSDN, BizSpark, Microsoft Action Pack) – When an ID that is receiving one of these listed benefits is associated as an Account Owner, the benefit will be lost and cannot be recovered. This is not the case when associating an ID as an Enterprise Administrator or Co-Administrator on a subscription.
  • By default, each new subscription is named Microsoft Azure Enterprise. It is best practice to rename it to something more unique so that each subscription can be identified by name when you are managing them. See the following section about Azure subscriptions for information about subscriptions and naming conventions.
  • When you first activate an Enterprise Azure enrollment, we recommend that the customer request a concierge onboarding meeting so staff can provide an overview of Enterprise Azure and answer any questions. To request this onboarding session, use the following URL: http://aka.ms/AzureEntSupport. Choose the problem type Onboarding, and for the category choose Scheduling a Customer Onboarding Call.

Azure Subscriptions

What is a Subscription?

Initially, a subscription was the administrative security boundary of Microsoft Azure. With the advent of the Azure Resource Management (ARM) model, a subscription now has two administrative models: Azure Service Management and Azure Resource Management. With ARM, the subscription is no longer needed as an administrative boundary.

ARM provides a more granular Role-Based Access Control (RBAC) model for assigning administrative privileges at the resource level. RBAC is currently being released in stages with 22 new roles available at this time.

A subscription additionally forms the billing unit. Service charges currently accrue to the subscription. As part of the new Azure Resource Management model, it will be possible to roll up costs to a resource group. A standard naming convention for Azure resource object types can be used to manage billing across project teams, business units, or other desired views.

A subscription is also a logical limit of scale by which resources can be allocated. These limits include hard and soft caps of various resource types (for example, 10,000 compute cores per subscription). Scalability is a key element for understanding how the subscription strategy will account for growth as consumption increases.
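One practical way to support the billing rollup described above is to encode the naming convention and cost-center tags in code so they are applied uniformly. The following is a minimal sketch using the current Azure SDK for Python; the convention and tag names are assumptions for illustration, not a Microsoft standard.

    # Minimal sketch: apply a consistent name and cost-center tags at the
    # resource group level so that consumption can be rolled up per business unit.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder

    def resource_group_name(business_unit: str, workload: str, environment: str) -> str:
        """Compose a name such as 'rg-finance-payroll-prod' (illustrative convention)."""
        return f"rg-{business_unit}-{workload}-{environment}".lower()

    client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)
    client.resource_groups.create_or_update(
        resource_group_name("finance", "payroll", "prod"),
        {
            "location": "eastus",
            "tags": {"costCenter": "CC-1234", "businessUnit": "finance"},  # illustrative
        },
    )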

Design Considerations

Assessment

One of the most critical items in the process of designing a subscription is assessing your current environment and needs.

It is critical to develop the Subscription, Network, Storage, Availability, and Administrative models together to have a cohesive approach. Understanding how each component is limited and how each impacts the others is critical to a solution that can scale and be flexible enough to support the needs of the business.

Specifically, it is important to have a thorough understanding of the following aspects:

Identify business requirements

  • Availability
  • Recoverability
  • Performance

Identify technical requirements

  • Is network connectivity a shared resource or dedicated to a single user or group?
  • Are there Active Directory requirements?
  • Do you need to consider clustering, identity, or management tools?

Security requirements

  • Who are the subscription administrators?
  • Are the appropriate network connectivity and identity requirements being deployed?
  • Have you implemented a least privilege administrative model?

Scalability requirements

  • What are the growth plans?
  • How will limited resources be allocated?
  • How will the model evolve over time considering additional users, shared access, and resource limits?

Additional considerations

  • Is the subscription owned by the customer or managed by a cloud service provider?
  • Is an Office 365 Azure Active Directory tenant set up?
  • Are there plans for Office 365 enrollment?
  • Are there other Azure subscriptions in use?
  • Have you deployed a trial Azure subscription?
  • Have you run a Power BI trial evaluation?
  • Have you run an RMS evaluation?
  • Can you use the desired OrgID *.onmicrosoft.com for the company directory?

Many of the early decisions in architecting and planning an Azure environment and related subscriptions can have an impact on future decisions and designs as the cloud environment grows. As such, it is important to have participation and input from many groups within an organization including networking, security, identity, domain administrators, and IT leadership.

Pulling in specific teams early and having an open dialogue across different perspectives produces a better design and implementation. Ensuring that any objections are exposed early, when they can be dealt with thoroughly, is far better than discovering them in the middle of a project, where they can have a negative impact on the schedule.

Following is an example subscription design based on a subscription per organizational unit.

Here is another example subscription design that is based on one subscription per environment in the development process of an application.

For CSP-managed scenarios, here is an example subscription design that illustrates a model of one or more subscriptions per specific customer, where a separate service deployment for a given customer may be assigned a dedicated subscription.

Administration

At its core, a subscription is a logical grouping of services and administration. It is the base unit of administrative granularity and it is used to track and bill service consumption.

Subscription Administrators have the ability to read and download anything stored in an Azure Storage account, including operating system VHDs, SQL Server data disks, and blobs.

Subscription Administrators can stop, start, provision and delete existing and new services.

Subscription Administrators can grant co-administrative access to new users.

All of these capabilities require careful consideration for who is given these rights in the subscription. Domain administrators have a similar situation regarding the level of rights and the need to carefully choose who has these rights.

In CSP-specific scenarios, customer subscriptions are often created, owned, and managed by the service provider, who then designates administrative agents to manage customer subscription resources. In this scenario, the subscriptions are ARM-based subscriptions and require an RBAC model to control access to, and management of, the subscription and its resources.

Recommended: The minimum number of users should be assigned as Subscription Administrators and/or Co-administrators.

Recommended: Use Azure Resource Management RBAC whenever possible to control the amount of access that administrators have, and log what changes are made to the environment.
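As a sketch of the ARM RBAC approach, the following grants the built-in Reader role on a single resource group instead of adding another subscription co-administrator. It uses the azure-mgmt-authorization package from the current Azure SDK for Python; the GUIDs other than the built-in Reader role ID are placeholders, and the parameter shape can differ between package versions.

    # Minimal sketch: scope a read-only role assignment to one resource group.
    import uuid

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.authorization import AuthorizationManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

    scope = f"/subscriptions/{subscription_id}/resourceGroups/example-rg"
    reader_role_id = (
        f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/"
        "roleDefinitions/acdd72a7-3385-48ef-bd42-f606fba81ae7"  # built-in Reader role
    )

    client.role_assignments.create(
        scope,
        str(uuid.uuid4()),  # each role assignment needs a unique GUID name
        {
            "role_definition_id": reader_role_id,
            "principal_id": "11111111-1111-1111-1111-111111111111",  # placeholder object ID
        },
    )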

Connectivity

Adding network connectivity (whether using a site-to-site VPN or a dedicated ExpressRoute connection) brings additional considerations to the subscription requirements discussion.

The subscription is a required container to hold a virtual network, and often networking is a shared resource within an enterprise.

Site-to-site VPNs and ExpressRoute circuits require defining IP address ranges that do not overlap with on-premises ranges.
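The overlap check is easy to automate before any gateway configuration begins. The following is a minimal sketch using only the Python standard library; the address ranges are illustrative.

    # Minimal sketch: verify a proposed virtual network range does not overlap
    # any on-premises range.
    import ipaddress

    on_premises_ranges = [
        ipaddress.ip_network("10.0.0.0/16"),    # illustrative corporate range
        ipaddress.ip_network("172.16.0.0/20"),  # illustrative branch range
    ]
    proposed_vnet = ipaddress.ip_network("10.10.0.0/16")

    conflicts = [r for r in on_premises_ranges if proposed_vnet.overlaps(r)]
    if conflicts:
        print("Overlap detected with:", ", ".join(map(str, conflicts)))
    else:
        print(f"{proposed_vnet} is safe to allocate")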

Site-to-site VPN connectivity requires setting up and configuring a public-facing gateway and VPN services at the corporate edge.

ExpressRoute connectivity is through a private connection from an on-premises datacenter to Azure through a service provider's private network. For more information, see the Microsoft Azure Networking section later in this document.

Routing and firewall configurations are typically necessary when enabling connectivity. Administration and connectivity are often at odds with respect to autonomy and sharing resources, but when designing the subscription architecture for the enterprise, both must be part of the solution. Business requirements including availability and reliability will impact the network architecture, and subsequently, the subscriptions necessary to support that architecture.

Because a virtual network must exist inside a subscription, some constraints of a subscription also impact decisions made for virtual networks. For example, only 20 virtual networks can be attached to a single ExpressRoute circuit. Therefore, only 20 subscriptions could be attached to that circuit.

In another scenario, if a design used 20 virtual networks within a single subscription, and ExpressRoute was used for connectivity to corporate network resources, there would be no way to attach another subscription to the same ExpressRoute, regardless of the bandwidth utilization on the circuit.

If multiple virtual networks share a single enterprise ExpressRoute connection, there is essentially no network isolation between those networks. In this case, any separation the subscription design may try to define is eliminated and must be achieved through subnet-layer Network Security Groups (NSGs). When the virtual networks are attached to the same ExpressRoute circuit, they are essentially a single routing domain.
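As a sketch of that subnet-layer separation, the following creates an NSG with a rule denying inbound traffic from another environment's range, using the azure-mgmt-network package from the current Azure SDK for Python; all names and address ranges are illustrative.

    # Minimal sketch: an NSG that blocks inbound traffic from a peer
    # environment's address range; associate it with the subnet to isolate.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient

    subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
    client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

    client.network_security_groups.begin_create_or_update(
        "example-rg",         # illustrative resource group
        "qa-isolation-nsg",   # illustrative NSG name
        {
            "location": "eastus",
            "security_rules": [
                {
                    "name": "deny-production-range",
                    "priority": 100,
                    "direction": "Inbound",
                    "access": "Deny",
                    "protocol": "*",
                    "source_address_prefix": "10.20.0.0/16",  # illustrative production range
                    "source_port_range": "*",
                    "destination_address_prefix": "*",
                    "destination_port_range": "*",
                },
            ],
        },
    ).result()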

A subscription hosting only PaaS services could have no virtual network at all, and the design limitations discussed above would not apply.

If a subscription will host a virtual network for on-premises connectivity and will not be used to host IaaS and/or PaaS resources, it can be inferred that the cost of the subscription with a virtual network is about one-tenth the cost of the ExpressRoute circuit.

Security and Identity

Identity services provided by an IaaS Active Directory, an Azure Active Directory tenant, or a customer OrgID tenant will have an impact on how security is implemented and subsequently on how that security impacts the number and configuration of subscriptions necessary.

Subscription Administrators have a broad authority, and as such they must be considered administrators over all the resources in the subscription. If the subscription includes Azure Active Directory, IaaS domain controllers, or if it connects to domain controllers from an on-premises Active Directory, the Subscription Administrators and Co-administrators are also domain owners. They must be trusted individuals and treated like any domain administrator appropriate for that directory.

Productivity goals, single sign-on, and federation requirements impact identity services decisions, and subsequently, the supporting subscriptions.

Scale

Subscriptions form the scale limit in Azure. Many resources—from computing cores and storage accounts to reserved IP addresses—have quantity and size limitations based on the subscription.

When thinking about the subscriptions for an environment, it is important to think about how the design will scale if and when limits are reached.

In subscription discussions, a number of considerations determine the decisions made about the design. The number of connections that can be shared by a tunnel or circuit, bandwidth requirements, the source of identity, and the number of groups, users, and applications associated with a subscription are all important topics when considering scale.

Multiple Subscriptions Introduce Complexities

The use of a subscription as a security boundary may be considered when designing an Azure subscription model. A project requiring isolation should consider subscription administration very carefully. Some considerations for multiple subscriptions include:

  • A subscription on its own doesn't cost anything.
  • A subscription has its own administrators.
  • A subscription is accountable for its own consumption.

Complexities are introduced when you consider that the on-premises networking and security infrastructures are typically shared resources.

Patching, monitoring, and auditing are frequently provided by dedicated organizations, and staff is trained in the related tools. Business continuity and disaster recovery are almost always dependent on enterprise solutions to mitigate the cost.

An enterprise that allowed Azure subscriptions to be created per project or team could find itself:

  • Purchasing dedicated network circuits arbitrarily rather than for bandwidth need.
  • Supporting multiple edge gateway devices.
  • Increasing management of IP address space allocation.
  • Increasing management of routing and firewall configurations.
  • Duplicating services required, including monitoring, patching, and anti-virus.

If a business unit manages its own networking, operations, business continuity, and disaster recovery, or the use case is such that a dedicated VPN connection to on-premises resources is sufficient, this type of subscription model could work very efficiently.

Enterprise Model

The following diagram shows a robust enterprise Azure enrollment. There are multiple subscriptions, one of which is a "Tier 0" subscription used to host domain controllers and other sensitive roles when extending an on-premises Active Directory forest to Azure.

This is configured as a separate subscription to ensure that only administrators with domain administrator level privileges are able to exert administrative control over these sensitive servers through Azure subscriptions, while still allowing server administrators to manage virtual machines in other subscriptions.

QA and production networks share the same dedicated ExpressRoute circuit to on-premises resources. They are separated into distinct subscriptions to allow separation of access and to allow the QA subscription to scale on its own without impacting production.

This model will scale based on need. Second, third, and subsequent QA and production subscriptions can be added to this design without significant impact on operations. The same applies to network bandwidth—the circuit can be used until its limits are reached without any artificial limitations forcing additional purchases.

Subscriptions are the foundational building block of an Azure enterprise enrollment. The requirements for administration, operations, accountability, connectivity, scalability, and security shape the subscription model.

Note that multiple existing resource forests are depicted here only to show that some forests can be extended to Azure while others don't have to be. Microsoft does not recommend creating a separate resource forest for Azure-hosted resources as a security separation method.

This approach typically requires two-way trust relationships that negate any potential security isolation benefits, leaving the organization with increased operational overhead for no benefit. The use of Read Only Domain Controllers (RODCs) for Azure-hosted resources also offers no meaningful security benefits, while adding increased operational overhead.

Subscription Naming Convention Considerations

When naming a Microsoft Azure subscription, it is a recommended practice to be verbose. Use the following format, or a format that has been agreed upon by the stakeholders of the company.

<Company> <Department (optional)> <Product Line (optional)> <Environment>

  • Company, in most cases, would be the same for each subscription. However, some companies may have child companies within the organizational structure. These companies may be managed by a central IT group, in which case, they could be differentiated by having both the parent company name (Contoso) and child company name (North Wind).
  • Department is a name within the organization where a group of individuals works. This item within the namespace is optional, because some companies may not need to drill into such detail due to their size; such a company may want to use a different identifier instead.
  • Product line is a specific name for a product or function that is performed from within the department. As with the department namespace, this area is optional and can be swapped out as needed.
  • Environment is the name that describes the deployment lifecycle of the applications or services, such as Dev, Lab, or Prod.

The goal of a naming convention is to produce a meaningful name that identifies the particular subscription and how it is represented within the company. Many organizations have more than one subscription, which is why it is important to define a naming convention and use it consistently when creating subscriptions.

NOTE: In CSP-specific scenarios the naming convention can also incorporate the CSP identifier to mark the subscription as managed by a service provider.

This is simply an example naming convention to use as a base. Many of the decisions about the naming convention will come from the subscription model that is chosen.

The following table shows how a company might use the naming convention outlined previously.

Company      Department (OU)   Product Line   Environment   Full Name
Contoso      Services          Business       Dev           Contoso Services Business Dev
Contoso      Services          Business       Lab           Contoso Services Business Lab
Contoso      Services          Business       Prod          Contoso Services Business Prod
Contoso      Services          Consumer       Dev           Contoso Services Consumer Dev
Contoso      Services          Consumer       Lab           Contoso Services Consumer Lab
Contoso      Services          Consumer       Prod          Contoso Services Consumer Prod
North Wind   Databases         Business       Dev           North Wind Databases Business Dev
North Wind   Databases         Business       Lab           North Wind Databases Business Lab
North Wind   Databases         Business       Prod          North Wind Databases Business Prod

Subscription Management

Azure AD Authentication

The recommended way to access your subscription when using Azure PowerShell is to authenticate by using the Add-AzureAccount PowerShell cmdlet. This cmdlet prompts for authentication in a window where you input your credentials that are associated with Azure Active Directory. You input either your Microsoft Account credentials or Org ID credentials that are associated with the Azure subscription.

Using this method of authentication even once with your subscription takes precedence over any management certificates you may have for your profile (that is, from running the Import-AzurePublishSettingsFile cmdlet). To remove the Azure AD token and restore the management certificate method, use the Remove-AzureAccount cmdlet.

When using Azure AD authentication, occasionally you may see an error message: "Your credentials have expired. Please use Add-AzureAccount to log on again." To restore access to your subscription by using Azure PowerShell, simply run Add-AzureAccount again and authenticate.

This method of authenticating to the subscription is most convenient when working with commands or scripts interactively. It is possible to use this method with automated processes by passing secured credentials with the -Credential parameter. However, at this time this method only works when you are using Org ID credentials, not Microsoft Account credentials.
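For example, a scheduled task might authenticate non-interactively with stored Org ID credentials. The following is a minimal sketch; the account name and password are hypothetical placeholders, and a plain-text password is shown only for brevity:

# Build a credential object from an Org ID user name and password
# (both values here are hypothetical placeholders)
$password = ConvertTo-SecureString "PlaceholderPassword" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential -ArgumentList "svc-deploy@contoso.onmicrosoft.com", $password

# Authenticate without an interactive prompt (works with Org ID credentials only)
Add-AzureAccount -Credential $credential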

Certificate Authentication

Management certificates are used to allow client devices to access resources within the Microsoft Azure subscription. The management certificates are x.509 v3 certificates that only contain a public key. They have the .cer file extension.

If a user requires the ability to deploy or change services running in Microsoft Azure, but does not require access to the Microsoft Azure portal, they will need a management certificate. For example, it is very common for developers to deploy to Azure services through Visual Studio, which requires a certificate to accomplish this task.

The x.509 v3 certificates are mapped to one or more Azure subscriptions. The possession of the private keys associated with these certificates should be given the same level of security as passwords. If the certificate private key becomes compromised, whoever holds this key can perform actions on the subscriptions for which the certificate is valid.

At this time, an Azure subscription can import 100 certificates. Certificates can be shared across multiple subscriptions. There is also a 100 certificate limit for all subscriptions for a specific Service Administrator's ID.

There are a few ways to generate a certificate. You can create a self-signed management certificate or you can download a certificate from the Microsoft Azure portal as part of what is known as a Publish Settings file.

To create your own self-signed certificate, use makecert.exe, a command-line tool that ships with Visual Studio. Alternatively, if you have access to a computer running Internet Information Services (IIS), you can generate one from there.
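As a sketch, the following makecert command, run from a Visual Studio command prompt, creates a self-signed management certificate in the current user's personal store and writes the public .cer file to disk; the certificate name is a placeholder:

makecert -sky exchange -r -n "CN=AzureMgmtCert" -pe -a sha1 -len 2048 -ss My "AzureMgmtCert.cer"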

Using the Publish Settings File

The Publish Settings file is an XML file that contains information about your Microsoft Azure subscriptions. The file contains details about all subscriptions associated with the user's Microsoft ID, that is, the subscriptions for which that Microsoft ID is an Administrator or Co-administrator. The Publish Settings file exposes your Azure subscription for use with Visual Studio and Azure PowerShell.

To set up Azure PowerShell within your environment, open an elevated Windows PowerShell console and complete the following steps:

  • Run Get-AzurePublishSettingsFile to open a browser window and download the Publish Settings file.
  • Record the path used to save the Publish Settings file, for example: C:\Users\ProfileName\Documents\AzurePublishSettingFile\YourFileName.publishsettings
  • Import-AzurePublishSettingsFile C:\Users\ProfileName\Documents\AzurePublishSettingFile\YourFileName.publishsettings
  • Get-AzureSubscription (this lists your subscriptions).
  • Find the "SubscriptionName" value.
  • Select-AzureSubscription "Your Subscription Name"

Now Azure PowerShell has set up a management certificate to interface with your Microsoft Azure subscription. To validate the association between Azure PowerShell and your subscription, execute the following Azure PowerShell cmdlet: Get-AzureLocation.

Some drawbacks of using management certificates for interfacing with an Azure subscription include:

  • It is difficult to manage and keep track of certificates in the portal.
  • You cannot ensure that access to a subscription is revoked when you remove a Co-administrator unless all of that person's certificates are removed.
  • Management certificates might be shared without your knowledge.

Additional Setup

When using either authentication method in Azure PowerShell, some scripts, such as provisioning new virtual machines, will not function properly until you have associated a storage account with your subscription. To add this association, run the following script:

Set-AzureSubscription -SubscriptionName 'My Subscription Name' -CurrentStorageAccountName 'storageacctname001'

After running this script, you can verify that your storage account is now associated with the subscription by running Get-AzureSubscription. There should now be a value under CurrentStorageAccountName. You should only need to set this value once for most Azure PowerShell operations, and the value can be changed at any time by running Set-AzureSubscription again.

If you have multiple subscriptions, you also have to ensure that you are targeting the correct subscription with Azure PowerShell operations. There is a default and current subscription setting that you can use to control this. When you load the Publish Settings file or use Add-AzureAccount with access to multiple subscriptions, one subscription is configured with the default and current tag. Any operation will target this subscription unless you change the focus. To redirect PowerShell operations to a different subscription for the current session, add the -Current parameter to the Select-AzureSubscription cmdlet along with the subscription name you want to target. To permanently change the default subscription, use the -Default parameter.
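As a brief sketch, using the subscription names from the naming convention example earlier:

# Target a different subscription for the current session only
Select-AzureSubscription -SubscriptionName "Contoso Services Business Dev" -Current

# Permanently change the default subscription for future sessions
Select-AzureSubscription -SubscriptionName "Contoso Services Business Prod" -Default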

Development and Management Tools

For the development and management of Azure resources, a wide variety of tools can be used, including the Azure Management portal, Azure PowerShell, SDKs, and cross-platform and third-party downloads.

Software Development Kits (SDKs)

Following are examples of SDKs that are available for download, along with the platforms on which each SDK can be used for development. To get the SDKs and command-line tools you need, see the Microsoft Azure Downloads site.

.NET               Java             Node.js          PHP
VS 2015 install    Windows install  Windows install  Windows install
VS 2013 install    Mac install      Mac install      Mac install
VS 2012 install    Linux install    Linux install    Linux install
Client libraries

Python           Ruby             Mobile                    Media
Windows install  Windows install  iOS install               iOS SDK install
Mac install      Mac install      Android install           Flash OSMF install
Linux install    Linux install    Windows Store C# install  Silverlight install
                                  Windows Store JS install  .NET SDK install
                                  Windows Phone 8 install   Java SDK install
                                  Windows 8 install

Azure PowerShell

You can use Windows PowerShell to perform a variety of tasks in Azure, either interactively at a command prompt or automatically through scripts. Azure PowerShell is a module that provides cmdlets to manage Azure through Windows PowerShell.

You can use the cmdlets to create, test, deploy, and manage solutions and services delivered through the Azure platform. In most cases, you can use the cmdlets to perform the same tasks that you can perform through the Azure Management portal. For example, you can create and configure cloud services, virtual machines, virtual networks, and web applications.
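For example, the following minimal sketch lists the available regions and then creates a website in one of them; the site name is a hypothetical placeholder:

# List the Azure regions available to the subscription
Get-AzureLocation

# Create a new website in the West US region ("contosodemo01" is a placeholder)
New-AzureWebsite -Name "contosodemo01" -Location "West US"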

The module is distributed as a downloadable file and the source code is managed through a publicly available repository. A link to the downloadable files is provided in the installation instructions later in this topic. For information about the source code, see the Azure PowerShell code repository.

XPlat-CLI

The Azure Cross-Platform Command-Line Interface (Azure CLI, sometimes referred to as xplat-cli) provides a set of open source, cross-platform commands for working with the Azure platform. The Azure CLI provides much of the same functionality found in the Azure Management portal, such as the ability to manage websites, virtual machines, mobile services, SQL databases, and other services provided by the Azure platform.

The Azure CLI is written in JavaScript, and requires Node.js. It is implemented by using the Azure SDK for Node.js, and it is released under an Apache 2.0 license. To access the project repository, see Microsoft Azure Cross Platform Command Line.

Azure Service Limits Considerations

Subscriptions now exist for both ARM and ASM models. Subscriptions have associated "hard" (upper boundary) and "soft" (default) limits for many of the Azure services, features, and capabilities. Many of these soft limits can be increased greatly by simply creating a support request, but some of the hard limits have a big impact on decisions regarding subscription design. ASM-based subscriptions have limits based purely on the subscription, and these limits are cumulative across all regions. ARM-based subscriptions typically have limits based on the region that is being targeted in the subscription. Following are some of the hard limits in a subscription that have the greatest impact on design decisions.

Azure Object                Limit
Virtual networks            100 per subscription
Virtual machines            10,000 CPU cores per subscription
ExpressRoute                1 circuit shared across 20 subscriptions; 10 dedicated circuits per subscription
Cloud services              200 per subscription
Network security groups     100 per subscription
Storage accounts            100 per subscription
Management certificates     100 per Service Administrator
Co-administrators           200 per subscription

For a more detailed and up-to-date list of Azure limits, see Azure Subscription and Service Limits, Quotas, and Constraints.
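Some of the consumption counters behind these limits can be inspected from Azure PowerShell. The following is a sketch using the classic cmdlets; the -ExtendedDetails switch returns quota-related properties for ASM subscriptions, though the property names may vary by module version:

# Show core and storage account usage against the subscription quotas
Get-AzureSubscription -Current -ExtendedDetails |
    Select-Object SubscriptionName, CurrentCoreCount, MaxCoreCount,
                  CurrentStorageAccounts, MaxStorageAccounts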

Azure Billing

There are differences in how billing is viewed and the available data in the output based on type of subscription. Billing information and details for pay-as-you-go subscriptions are viewed in the Usage and Billing Portal, whereas billing information for enterprise subscriptions is viewed in the account portal.

For either subscription type, the billing details can make it difficult to discern charges for the items listed if a specific naming convention is not being used. For more information about naming conventions, see the respective sections in this document, such as Virtual Machines, Virtual Networks, and Storage Accounts.

Billing Unit Conversions

An important item to note regarding detailed billing, regardless of the subscription type, is that all standard virtual machine instances are converted into Small instance hours on the bill. For example, a Windows Extra Small (A0) would have a clock hour of 1, but it shows as ¼ hour when converted to Small instance hours. Similarly, any A-Series Cloud Service instance is converted into Small (A1) instance hours on the billing detail.

For a complete list of Windows and non-Windows conversions, use the Azure FAQ page titled How do various instance sizes get billed?

The following chart shows the details for each A-Series cloud service instance conversion:

Cloud Services Instance   Clock Hours   Small Instance Hours
Extra Small (A0)          1             ¼ hour
Small (A1)                1             1 hour
Medium (A2)               1             2 hours
Large (A3)                1             4 hours
Extra Large (A4)          1             8 hours

Pay-As-You-Go Usage Details

Following is an example of usage details for a pay-as-you-go subscription. Notice that the Component column contains the names of the resources, which further highlights why a good naming convention is key.

In this particular subscription, there are very few applications or databases, so it is not difficult to keep track of them. However, if there were many applications following a naming style of app1, app2, app3, with randomly generated names for the databases, it would quickly become very difficult to decipher costs per application.

For a full list of usage details and the definition of each, refer to Understand your bill for Microsoft Azure.

Enterprise Usage Detail

Enterprise subscription billing details are slightly different. In the following example, you can see that a Component column also exists in the billing details; however, not all fields are the same as those in the previous subscription details.

The following table lists the fields for the enterprise usage details:

Detail Fields

AccountOwnerId           Day               ResourceQtyConsumed
AccountName              Year              ResourceRate
ServiceAdministratorId   Product           ExtendedCost
SubscriptionId           ResourceGUID      ServiceSubRegion
SubscriptionGuid         Service           ServiceInfo
SubscriptionName         ServiceType       Component
Date                     ServiceRegion     ServiceInfo1
Month                    ServiceResource   ServiceInfo2
AdditionalInfo           Tags              Store Service Identifier
Department Name          Cost Center

 
Billing Visibility and Rollup

Under Manage Access, it's possible to enable Department Administrators to see the costs associated with all accounts and subscriptions in their departments. You can also enable Account owners to see their costs.

Billing and ARM Tags

ARM tags can be used to group billing data: ARM-compliant Azure services allow defining and applying tags to organize the billing usage for Azure resources. For example, if a customer organization is running multiple virtual machines for different organizations, tags can be used to group usage by cost center. Alternatively, tags can be used to categorize costs by runtime environment, such as the billing usage for virtual machines running in the production environment. Tags appear in billing and usage artifacts, such as usage CSV data or billing statements. For more information about ARM tags, see the respective section in this document.

Billing and CSP Model

The Microsoft Cloud Service Provider (CSP) program allows service providers to own the complete customer lifecycle, including direct billing. The service providers are able to implement their own pricing and billing policies to create customer offers, set the price, and own the billing terms. CSP partners can automatically receive monthly invoices and billing statements and incorporate incurred costs into their billing accounting for value-added services they provide to their customers.

Microsoft Azure Pricing Models

Pricing for pay-as-you-go subscriptions is published on the Pricing details page. Because each service is listed on an individual page, it is often easiest to use the Pricing Calculator for the majority of cost estimating.

For MSDN subscribers, Partner Network, and BizSpark accounts, pricing may differ from the pay-as-you-go model. For more information on these types of accounts, use the Member Offers page as a resource.

Enterprise account pricing differs based on commitment and other variables. The Licensing Azure for the Enterprise page reviews some of the benefits of this type of agreement. For more details about the Enterprise account pricing model, the Pricing Overview for Microsoft Azure in Enterprise Programs document is a great resource.

Azure Accounts and Access Permissions

Account Types and Directory Source

Azure Active Directory Accounts

Azure Active Directory (Azure AD) is the standalone directory service within Azure. Customers can create their administrative structure within Azure AD by defining their users and groups. This service can work on its own, because Azure AD can perform authentication without integrating with an on-premises directory.

On the other hand, organizations can choose to synchronize Azure AD with their users and groups from an on-premises Active Directory to Azure AD. This syncing effort rapidly provides availability to resources within Azure for on-premises users and groups.

All users who access the organization's Azure subscriptions are then present in the Azure AD instance with which the subscription is associated. This enables the company to manage what users can access, and to revoke access to Azure by disabling the account in the directory.

Microsoft Accounts

The creation of Microsoft accounts is typically controlled by users rather than by the organization. For access to resources within an Azure subscription, we recommend using Organizational Accounts where possible.

When creating Microsoft Accounts, we recommend establishing guidelines that will be used within the organization.

Do not allow the use of existing personal Microsoft Accounts. Depending on individual permissions, these accounts may be tied to the company's Azure subscriptions and have access to storage accounts and billing information.

A Microsoft account is mapped to a person and it should be formatted to identify the user, for example: FirstName.LastName.xyz@outlook.com and not alias2763@outlook.com.

The reason for using specific naming for Microsoft Accounts is that at the time an account is created, the identity of the user is known. However, as time goes on and roles change within the company, poorly named accounts become difficult to identify.

In the previous example, the xyz after FirstName.LastName is optional, and it could be used for any number of things, such as environment name, development, lab, or organization name, if that is preferred.

Organizational Accounts

Using Organizational Accounts for managing an Azure subscription is recommended over Microsoft Accounts for various reasons.

The main reason is that the organization has more control over access for adding administrators and removing access when an employee is no longer with the company.

Additionally, many of the newer Azure services offerings are relying heavily on Organizational Accounts. In some cases, having existing Microsoft Accounts tied to services prior to switching to Organizational Accounts can cause issues with the respective tenant IDs.

Access Permissions

Service Administrators and Co-Administrators

A Microsoft Azure subscription and the associated resources can be accessed via the Azure Management portal, Azure PowerShell, Visual Studio, or other SDKs and tools. When a subscription is created, a Service Administrator is assigned. The default Service Administrator is the same as the Account Administrator, who is also the contact person (via email) for the subscription.

The Account Administrator can assign a different Service Administrator by editing the subscription in the Microsoft Online Services Customer Portal.

To assist with the management of the Azure Services, the Service Administrator will add Co-administrators to the subscription. To be added as a Co-administrator, a user must have a valid Microsoft Account or Org ID, if this is the method of authentication used in the subscription. The first Co-administrator in the subscription must be added by the Service Administrator. After that, any Co-administrator can add or remove other Co-administrators in the subscription.

Removing or adding Co-administrators must be done in the Azure Management portal, and the option is located under Settings > Administrators.

Subscription Co-administrators share the same rights and permissions that the Service Administrator has, with the following exception: a Co-administrator cannot remove the Service Administrator from a subscription. Only the Microsoft Azure account owner (Account Administrator) can change the Service Administrator for a subscription, by editing the subscription in the Microsoft Online Services Customer Portal, as shown previously.

The Co-administrator account can sign in to the Microsoft Azure Management portal, and view all services. The Service Administrator and Co-administrator have the ability to add, modify, or delete Azure services such as websites, cloud services, and mobile services. A single subscription is limited to a maximum of 200 Co-administrators.

Role Based Access Control Models

With the introduction of Role Based Access Control (RBAC), Microsoft Azure now has a security model to perform access control of resources by users on a more granular level. Users specified in RBAC permissions can access and execute actions on the resources within their scope of work. Because there is a limit of 200 Co-administrators per subscription, RBAC allows more users to manage their Azure Services. At the same time, RBAC limits access to only the specific resources needed rather than the entire subscription.

RBAC is only available in the Azure Preview portal and when using the Azure Resource Manager APIs. The Service Administrator and Co-administrators will continue to have access to all portals and APIs; however, any user added only via RBAC will not be able to access the current version of the Azure Management portal or the Service Management APIs.

Mandatory:

  • Service Administrator and Co-Administrators see all resources in all portals and through APIs.
  • Users defined in RBAC models do not have access to the Service Management portal or APIs.
  • Users not assigned to either group see only the empty Azure Resource Management portal, and they cannot access the Service Management portal.

With RBAC, the subscription is no longer the management boundary for permissions in Azure. Resource Groups are new constructs to group resources that have a common application or service lifecycle. In addition to granting access at the Resource Group level, RBAC permissions can be applied to an individual resource such as SQL Database, websites, virtual machines, and storage accounts.

RBAC administration is implemented by the subscription Service Administrator and Co-administrators. Customers can leverage their existing Azure AD users and groups, or use on-premises Active Directory accounts for access management.
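As an illustrative sketch in the ARM mode of the Azure PowerShell cmdlets, an administrator might grant a user Reader access to a single resource group; the user, role, and group names below are placeholders:

# Switch the cmdlets into Azure Resource Manager mode
Switch-AzureMode AzureResourceManager

# List the available built-in role definitions
Get-AzureRoleDefinition | Select-Object Name

# Grant Reader access on one resource group to a single user
New-AzureRoleAssignment -Mail "alice@contoso.com" -RoleDefinitionName "Reader" -ResourceGroupName "Group-1"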

Role Permissions

There are twenty-two built-in Azure RBAC roles for controlling access to Azure resources:

  1. The Owner can perform all management operations for a resource and its child resources, including access management or granting access to others.
  2. The Contributor can perform all management operations for a resource, including creating and deleting resources. A contributor cannot grant access to others.
  3. The Reader has Read-only access to a resource and its child resources. A Reader cannot read secrets.
  4. The API Management Service Contributor lets users manage API Management services, but not access them.
  5. The Application Insights Component Contributor lets users manage Application Insights components, but not access them.
  6. The BizTalk Contributor lets users manage BizTalk services, but not access them.
  7. The ClearDB MySQL DB Contributor lets users manage ClearDB MySQL Databases, but not access them.
  8. The Data Factory Contributor lets users manage data factories, but not access them.
  9. The DocumentDB Account Contributor lets users manage DocumentDB, but not access it.
  10. The Intelligent Systems Account Contributor lets users manage Intelligent Systems accounts, but not access them.
  11. The NewRelic APM Account Contributor lets you manage New Relic Applications Performance Management accounts and applications, but not access them.
  12. The Redis Cache Contributor lets users manage Redis caches, but not access them.
  13. The Scheduler Job Collections Contributor lets users manage scheduled job collections, but not access them.
  14. The Search Service Contributor lets users manage Search Service, but not access it.
  15. The SQL DB Contributor lets users manage SQL Databases, but not access them. Users also cannot manage security-related policies or parent SQL servers.
  16. The SQL Security Manager lets users manage the security-related policies of SQL Server instances and databases, but not access them.
  17. The SQL Server Contributor lets users manage SQL Server instances and databases, but not access them or their security-related policies.
  18. The Storage Account Contributor lets users manage storage accounts, but not access them.
  19. The User Access Administrator lets users manage user access to Azure resources.
  20. The Virtual Network Contributor lets users manage virtual networks, but not access them.
  21. The Web Plan Contributor lets users manage the web plans for websites, but not access them.
  22. The Website Contributor lets users manage websites (not web plans), but not access them.

Command-Line and API Access for Azure Role Based Access Control

Enforcing the access policies that you configure using RBAC is done by using Azure Resource Manager APIs. The Azure Preview portal, command-line tools, and Azure PowerShell use the Resource Manager APIs to run management operations. This ensures that access is consistently enforced regardless of what tools are used to manage Azure resources.

The following article provides additional details: Role-based access control in the Microsoft Azure portal.

Extending the Datacenter Fabric to Microsoft Azure

Extending services from on-premises implementations to Azure resources is largely driven by operational requirements.

  • How will systems be patched and maintained?
  • How will systems be monitored?
  • What security and audit tools will be required?

If the answer to any of these and similar questions involves an on-premises management tool, decisions need to be made about how that is achieved.

  • Do on-premises systems need to be augmented with gateways or additional infrastructure that resides in Azure?
  • Are agents necessary, and can they be built into an image, or must they be deployed afterward?
  • What protocols and communications flows are needed?
  • What identities and permissions are required?
  • Is a common user interface required by operations?
  • Is it the same or different operators on-premises and in Azure?

The answers to these questions can drive decisions about identity, security, and network connectivity, or be driven by them, depending on the organization's priorities.

Moving to Azure and the cloud provides opportunities to do things differently. It is important to think about processes and functions from a cloud perspective. Treat everything like a service. Can Azure Services meet the needs? Think about minimum viable solutions—the agility and cost benefits can be enormous.

Azure offers Operational Insights, Application Insights, log collection, and antivirus solutions from multiple vendors, as well as encryption and backup solutions from Microsoft and third parties. The more the platform can be leveraged and SaaS offerings utilized, the greater the benefits to the organization.

Start with a cloud first mentality. That means using platform services to reduce infrastructure, management costs, and anchors to legacy solutions. Focus on being agile and scalable so the organization can capitalize on the elasticity and pay-for-use characteristics of Azure.

Microsoft Azure Management Models

Azure Service Management Overview

The Azure Service Management (ASM) Representational State Transfer (REST) API has historically been the primary model for managing Azure resources. The original and present iterations of the Azure Portal, the Azure PowerShell cmdlets, the cross-platform CLI, and the Azure Management Libraries for .NET are all built on top of the ASM API. The ASM API was initially developed several years ago and is missing many modern cloud management capabilities, whether that is desired state configuration, role based access control (RBAC), or a flexible extensibility model for future Azure first-party services. ASM supports authentication with either X.509 certificates or Azure Active Directory (AAD).

Note: Azure Service Management (ASM) is not supported in the CSP subscriptions as defined in the CSP model. Only customer-managed subscriptions can be managed using Azure Service Management (ASM). CSP subscriptions are compliant only with the Azure Resource Management model described in the following section.

Azure Resource Manager Overview

The Azure Resource Manager REST API (ARM) has been developed to replace ASM as the authoritative method to manage Azure resources. ARM supports both desired state configuration and RBAC, while providing a pluggable model allowing new Azure services to be cleanly integrated. The preview Azure Portal and the ARM mode of the Azure PowerShell cmdlets both use ARM. AAD is the only authentication method supported by ARM.

ARM introduces the concept of a resource group which is a collection of individual Azure resources. A resource group is associated with a specific Azure region but may contain resources from more than one region.

A resource group can be described in the following scenarios:

  • Vertical: Contains all resources comprising a single application. Example: Company HR Application Resource Group.
  • Horizontal: Combines all resources that comprise a specific deployment topology layer, such as shared services used by multiple applications or an app-specific tier. Example: Shared Management Services Resource Group.

ARM supports the use of a parameterized resource group template file that can be used to create one or more resource groups along with their individual resources. The deployment of a resource group uses desired state configuration. ARM ensures that the resources are deployed in accordance with the appropriately parameterized template file for the resource group. Resource providers exist for many types of Azure resources, and more Azure services are currently adding ARM support, gradually migrating from the legacy ASM model.

ARM supports role based access control (RBAC), and this support is expressed in the preview Azure Portal and the ARM mode of the Azure PowerShell cmdlets. ARM provides several core roles – Owner, Contributor, and Reader. Individual resource providers support additional resource-specific roles, such as Search Service Contributor and Virtual Machine Contributor.

Azure Resource Manager Templates

Azure Resource Manager (ARM) templates enable quick and easy provisioning of Azure applications via declarative JSON. A single JSON template can be constructed to deploy multiple services, such as virtual machines, virtual networks, storage, app services, and databases. The same template can be used to repeatedly and consistently deploy the application during every stage of the application lifecycle. Consequently, templates provide a reusable declarative model that complements imperative management patterns defined by PowerShell.

Azure Resource Manager (ARM) templates can be deployed from Azure PowerShell, Azure CLI, or the Azure preview portal. The following excerpt demonstrates how a known quickstart template defining a simple Azure VM can be deployed using Azure PowerShell.

# Deployment name, resource group, and location are placeholders
$deployName="<deployment name>"
$RGName="<resource group name>"
$locName="<Azure location, such as West US>"
$templateURI="https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/101-simple-windows-vm/azuredeploy.json"

# Create the resource group, then deploy the quickstart template into it
New-AzureResourceGroup -Name $RGName -Location $locName
New-AzureResourceGroupDeployment -Name $deployName -ResourceGroupName $RGName -TemplateUri $templateURI

Template Constructs

The list below describes typical constructs that can be found in an ARM template:

The parameters section declares the values that must be supplied when the resource group is deployed; these values are then referenced by the resources defined in the template.

"parameters": {
  "siteName": {
    "type": "string"
  },
  "hostingPlanName": {
    "type": "string"
  },
  "siteLocation": {
    "type": "string"
  },
}

The resources section lists the resources that the template creates. Each resource is described in detail, including its properties and any parameters for user-defined values.

{
    "name": "[parameters('databaseName')]",
    "type": "databases",
    "location": "[parameters('serverLocation')]",
    "apiVersion": "2.0",
    "dependsOn": [
      "[concat('Microsoft.Sql/servers/', parameters('serverName'))]"
    ],
    "properties": {
      "edition": "[parameters('edition')]",
      "collation": "[parameters('collation')]",
      "maxSizeBytes": "[parameters('maxSizeBytes')]",
      "requestedServiceObjectiveId": "[parameters('requestedServiceObjectiveId')]"
    }
},

The templateLink property references another template from the current one. The following excerpt shows how a dependent JSON template file located in Azure Storage can be linked from the primary template definition:

{
  "properties": {
    "template": {
      "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
        "childParameter": { "type": "string" }
      },
      "resources": [ {
        "name": "Sub-deployment",
        "type": "Microsoft.Resources/deployments",
        "apiVersion": "2015-01-01",
        "properties": {
          "mode": "Incremental",
          "templateLink": {
            "uri": "http://<stac>.blob.core.windows.net/templates/template.json",
            "contentVersion": "1.0.0.0"
          },
          "parameters": {
            "subParameterName": { "value": "[parameters('childParameter')]" }
          }
        }
      } ]
    }
  }
}

The CustomScriptExtension construct references the Custom Script Extension for Windows, which executes Windows PowerShell scripts on a remote virtual machine without requiring a logon. The scripts can be executed after provisioning the VM, or at any time during its lifecycle, without opening any additional ports on the VM. The most common use cases for the Custom Script Extension include running, installing, and configuring additional software on the VM after provisioning.

The following excerpt illustrates how to reference the Custom Script Extension from within the JSON template to run a custom Windows PowerShell script to apply post-provisioning configuration:

{
   "type": "Microsoft.Compute/virtualMachines/extensions",
   "name": "MyCustomScriptExtension",
   "apiVersion": "2015-05-01-preview",
   "location": "[parameters('location')]",
   "dependsOn": [
      "[concat('Microsoft.Compute/virtualMachines/', parameters('vmName'))]"
   ],
   "properties": {
      "publisher": "Microsoft.Compute",
      "type": "CustomScriptExtension",
      "typeHandlerVersion": "1.4",
      "settings": {
         "fileUris": [
            "http://<stacn>.blob.core.windows.net/customscriptfiles/start.ps1"
         ],
         "commandToExecute": "powershell.exe -ExecutionPolicy Unrestricted -File start.ps1"
      }
   }
}

Common Template Scopes

The following key solution template scopes have been identified through practical experience. These three scopes (capacity, capability, and end-to-end solution) are described in more detail below.

  • Capacity scope: Delivers a set of resources in a standard topology that is preconfigured to comply with regulations and policies. Example: deploying a standard development environment in an Enterprise IT or SI scenario.
  • Capability scope: Deploys and configures a topology for a given technology. Example: common scenarios involving technologies such as SQL Server, Cassandra, or Hadoop.
  • End-to-end solution scope: Targeted beyond a single capability, and instead focused on delivering an end-to-end solution comprised of multiple capabilities. A solution-scoped template manifests itself as a set of one or more capability-scoped templates combined with solution-specific resources, logic, and desired state. Example: an end-to-end data pipeline solution template that mixes solution-specific topology and state with multiple capability-scoped templates, such as Kafka, Storm, and Hadoop.

Template Free-form vs. Known Configurations

While templates are generally perceived to give customers the utmost flexibility, many considerations affect the choice between free-form configurations and known configurations.

Free-form Configurations

Free-form configurations provide the most flexibility by allowing customization of the resource type and supplying values for all resource properties, such as selecting a VM type and providing an arbitrary number of nodes and attached disks for those nodes.

Nonetheless, because mature organizations are expected to use templates to deploy large Azure resource topologies, the complexity of building a template for a sophisticated infrastructure deployment, potentially containing hundreds of varied resources, results in substantial overhead for designing, maintaining, and deploying the free-form template.

Known Configurations

Rather than offer a template that provides total flexibility and countless variations, the common pattern is to provide the ability to select known configurations—in effect, standard sizes such as sandbox, small, medium, and large. Other examples of such sizes are product offerings, such as community edition or enterprise edition. In other cases, it may be workload-specific configurations of a technology, such as MapReduce or NoSQL.

Many enterprise IT organizations, OSS vendors, and SIs make their offerings available today in this way in on-premises, virtualized environments (enterprises) or as software-as-a-service (SaaS) offerings (CSVs and OSVs). This approach provides good, known configurations of varying sizes that are preconfigured for customers.

Without known configurations, end customers must determine cluster sizing on their own, factor in platform resource constraints, and do math to identify the resulting partitioning of storage accounts and other resources (due to cluster size and resource constraints). Known configurations enable customers to easily select the right standard size for a given deployment. In addition to making a better experience for the customer, a small number of known configurations is easier to support and can help deliver a higher level of density.

Template Dependencies

For a given resource, there can be multiple upstream and child dependencies that are critical to the success of the deployment topology. Such dependencies can be defined on other resources by using the dependsOn keyword and the resources property of a resource in the ARM template. As an example, a virtual machine may be dependent on having a database resource successfully provisioned. In another case, multiple cluster nodes must be installed before deploying a virtual machine with the cluster management tool.

While dependsOn is a useful tool for mapping dependencies between the resources comprising a deployment, it must be used judiciously because it can affect deployment performance. dependsOn should not be used to document how resources are interconnected: its lifecycle is limited to deployment, it is not available post-deployment, and once deployed there is no way to query these dependencies. Use of the dependsOn keyword can also prevent the deployment engine from using parallelism where it otherwise might. The mechanism called resource linking should be used instead to document and provide query capability over the relationships between resources.

Azure Resource Manager Resource Locks

There are numerous scenarios where an administrator needs to place a lock on a resource or resource group to prevent other users in the organization from committing write actions or accidentally deleting a critical resource. Azure Resource Manager provides the ability to restrict operations on resources through resource management locks. Resource locks are policies which enforce a lock level at a particular scope. The lock level identifies the type of enforcement for the policy, which presently has two values – CanNotDelete and ReadOnly. The scope is expressed as a URI and can be either a resource or a resource group.

For example, various resources are used in an off-and-on pattern, such as virtual machines that are turned on periodically to process data for a given interval of time and then turned off. In this scenario, shutting down the VM must be allowed, but it is imperative that the underlying storage account not be deleted. Here, a resource lock with a lock level of CanNotDelete can be applied to the storage account.

In another scenario, a business organization may have periods where updates must not go into production. In these cases, the ReadOnly lock level stops creation or updates. For example, a retail company may not want to allow updates during holiday shopping periods; a financial services company may have constraints related to deployments during certain market hours. A resource lock can provide a policy to lock the resources as appropriate. This could be applied to just certain resources or to the entirety of the resource group. A resource lock can be applied via Azure PowerShell or within the context of an Azure Resource Manager template, as shown in the sketch below.
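The following sketch applies a CanNotDelete lock to a storage account from the ARM mode of Azure PowerShell; the lock, account, and group names are placeholders, and the exact cmdlet surface may vary by module version:

# Prevent deletion of the storage account while still allowing management operations
New-AzureResourceLock -LockLevel CanNotDelete `
    -LockName "ProtectStorage" `
    -ResourceName "storageacctname001" `
    -ResourceType "Microsoft.Storage/storageAccounts" `
    -ResourceGroupName "Group-1"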

Azure Resource Manager QuickStart Templates

The Microsoft Azure product group maintains a community-driven set of quickstart ARM templates that can be used as building blocks when authoring custom JSON templates for complex workloads to be deployed in Azure. For representative purposes, a subset of the provided templates is listed below:

QuickStart ARM Templates

  • Application Gateway with Public IP: Create an Application Gateway with Public IP.
  • Virtual Network with Subnets, a local network, and a VPN gateway: Create a Virtual Network with two Subnets, a local network, and a VPN gateway.
  • VM with multiple NICs: Create a VM with multiple network interfaces that is RDP accessible.
  • Windows VM with tags: Deploys a Windows VM with tags from a minimal set of parameters, using the latest patched version.
  • 2 VMs in a Load Balancer: Create 2 VMs in a Load Balancer with Load Balancer rules.
  • Multi tier VNet with NSGs and DMZ: Install a Virtual Network with a DMZ Subnet.
  • Network Interface in a Virtual Network with Public IP Address: Create a Network Interface in a Virtual Network with a Public IP Address.

Azure Resource Manager Tags

Azure Resource Manager provides a tagging feature that facilitates resource categorization according to customer requirements for management or billing. Tags are defined as Name-Value pairs assigned to resources or resource groups, and they can be used where customer business processes and organizational hierarchy call for a complex collection of resource groups and resources, and where subscription assets need to be structured according to established policies. Each resource can have up to 15 tags. Users can sort and organize resources by tag. Tags may be placed on a resource at the time of creation or added to an existing resource. Once a tag is placed on a billable resource created via Azure Resource Manager, the tag is included in the usage details found in the Usage and Billing portal.
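For example, tags can be applied to an existing resource group from the ARM mode of Azure PowerShell. This is a sketch; note that -Tag sets the whole tag collection, and the Name/Value hashtable format shown reflects the module versions current at the time of writing:

# Set the tag collection on an existing resource group (replaces any existing tags)
Set-AzureResourceGroup -Name "Group-1" -Tag @( @{ Name = "Department"; Value = "MarketingDepartment" } )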

Tags are persisted in the resource's properties in the order in which they are added. The following Azure PowerShell excerpt retrieves the tags associated with an existing virtual machine, showing the order in which they were applied to the virtual machine resource.

PS C:\> Get-AzureVM -Name MyVM -ResourceGroupName Group-1

ResourceGroupName   : Group-1
Id                  : /subscriptions/<..>/resourceGroups/Group-1/providers/Microsoft.Compute/virtualMachines/MyVM
Name                : MyVM
Type                : Microsoft.Azure.Management.Compute.Models.VirtualMachineGetResponse
Location            : westus
Tags                : {
                        "Department": "MarketingDepartment",
                        "Application": "LOBApp",
                        "Created By": "CEO",
                        "AppPropOn1": "AppInsightsComponent",
                        "AppPropOne": "One"
                      }
...
NetworkInterfaceIDs : {...c}

Alternatively, tags can be defined in the Resource Manager template as demonstrated in the excerpt below:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "newStorageAccountName": {
      "type": "string",
      "metadata": {
        "description": "Unique DNS Name for the Storage Account where the Virtual Machine's disks will be placed."
      }
    }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('newStorageAccountName')]",
      "apiVersion": "2015-05-01-preview",
      "location": "[variables('location')]",
      "tags": {
        "Department": "[parameters('departmentName')]",
        "Application": "[parameters('applicationName')]",
        "Created By": "[parameters('createdBy')]"
      }
    },
    {
      "apiVersion": "2015-05-01-preview",
      "type": "Microsoft.Network/publicIPAddresses",
      "name": "[variables('publicIPAddressName')]",
      "location": "[variables('location')]",
      "tags": {
        "Department": "[parameters('departmentName')]",
        "Application": "[parameters('applicationName')]",
        "Created By": "[parameters('createdBy')]"
      }
    }
  ]
}

Generally, subscription resources can have tags defined to accommodate the following scenarios:

  • Resources that serve a similar role in the customer organization
  • Resources that belong to the same department, e.g., finance, legal, or retail
  • Resources comprising the same environment, e.g., dev, test, or prod
  • Resources that are managed by the same responsible party, e.g., Alice or Bob

Mature organizations are encouraged to create a custom tag taxonomy that is applied to all Azure organizational assets, to ensure that all actors consuming Azure resources comply with established policies. For example, users should apply organization-specific tags such as "Contoso-DeptOne" instead of duplicate but slightly different tags (such as "dept" and "department").

The following template excerpt contains JSON that describes tags for a resource specifying the environment type, project name, and internal billing chargeback ID. The values are passed in via parameters to make the template more reusable and of higher value for Systems Integrators, Corporate IT, and Cloud Service Vendors. This approach enables them to use the same template to deploy capacity or capabilities for many customers, each with distinct values for these tags.

"tags": {
"ChargebackID": "[parameters(chargebackID)]",
"ProjectName": "[parameters(projectName)]",
"EnvironmentType" :"[parameters('environmentType')]"
},

Azure has many partners that have incorporated tags into their cost management solutions. Partners such as Apptio, Cloudability, Cloudyn, Cloud Cruiser, Hanu Insight, and RightScale are leveraging tags in their products.

Microsoft Azure Storage

Microsoft Azure Storage Services Overview

Every solution deployed in Microsoft Azure leverages an aspect of Azure Storage, making storage a common component and critical to planning any Azure-based solution design. The storage planning considerations covered in this section include:

  • Designing Azure Storage solutions; the service and application use cases impact storage account quantity, the type of storage leveraged, locations, and availability models.
  • Planning considerations including the types of workloads, the method of access (internal, external, or peered) and storage account security.
  • Considerations surrounding cloud-integrated storage solutions including Azure Backup, System Center Data Protection Manager, StorSimple, and third-party tools and solutions that leverage Azure Storage.

  • Managing, monitoring, and troubleshooting topics, including storage throttling behaviors, storage analytics, and IaaS virtual machine considerations such as I/O profiles and maintaining disk consistency within workloads.

Recommended: For organizations that are new to Azure Storage, it is often helpful to draw comparisons to private cloud storage or traditional SAN storage as a way to understand some of the basic concepts required in an Azure Storage design (for example, comparing a Blob storage account to a LUN).

Storage Planning

Planning the storage account infrastructure is perhaps the most important step of any Microsoft Azure deployment because it sets the foundation for performance, scalability and functionality. It is first necessary to understand the two types of storage accounts (standard and premium) and the services available within each type of account. The following sections outline the differences at a high level.

Storage Account Types and Services

A standard storage account includes Blob, Table, Queue, and File storage services. These storage services are included in every storage account created. A storage account provides a unique namespace for working with the blobs, queues, and tables.

  • Blob storage stores file data. A blob can contain any type of text or binary data, such as a document, media file, or application installer. Blobs are organized into containers within each storage account. A storage account can contain any number of containers (but it must have at least one), and a container can contain any number of blobs. Blob storage offers two types of blobs: block blobs and page blobs.
    • Block blobs are optimized for streaming and storing cloud objects, and they are a good choice for storing documents, media files, and backups.
    • Page blobs are optimized for representing IaaS disks and supporting random writes. An Azure virtual machine IaaS disk is a virtual hard disk that is stored as a page blob.
  • Table storage stores structured datasets. Table storage is a NoSQL key-attribute data store, which allows for development and fast access to large quantities of data.
  • Queue storage provides messaging for workflow processing and for communication between components of cloud services.
  • File storage offers shared storage for legacy applications that leverage the SMB 2.1 protocol. Azure virtual machines and cloud services can share file data across application components by using mounted shares. On-premises applications can also access data in a share via the File service REST API.
    Note: Access to files is available to virtual machines residing in the same Azure region over standard universal naming convention (UNC) paths. Access across regions or from on-premises infrastructures is not supported. There are key differences to areas such as storage capacity and I/O between blob and file access, which should be accounted for in every design. These differences are outlined in the Storage Limits section of the reference article Azure Subscription and Service Limits, Quotas, and Constraints.

Standard storage accounts are available with four redundancy types:

  • Locally redundant (LRS)
  • Geo-redundant (GRS)
  • Zone redundant (ZRS)
  • Read access geo-redundant (RA-GRS)

These redundancy values and their potential use will be covered in later sections of this document.

A premium storage account currently supports only Azure virtual machine disks that are backed by page blobs. A premium storage account stores only page blobs, and only REST APIs for page blobs and their containers are supported. From an infrastructure perspective, premium storage stores data on solid-state drives (SSDs), whereas standard storage stores data on hard disk drives (HDDs). As a result, premium storage delivers high-performance, low-latency disk support for I/O intensive workloads running on Azure virtual machines. The following characteristics summarize the current capabilities of Azure premium storage. Premium storage offers:

  1. 4 to 5 IOPS per GB at a 256 KB I/O size.
  2. 5,000 IOPS per disk with no cache; much higher with cache, depending on virtual machine size.
  3. 200 MB per second of bandwidth per disk.
  4. 1 TB maximum disk size.
  5. A per-virtual machine limit of 50,000 IOPS at 8 KB and 512 MB per second.

Premium storage is limited to local replication only; premium storage GRS is not currently available. However, you can optionally create snapshots of your disks and copy those snapshots to a standard GRS storage account, which makes it possible to maintain a geo-redundant snapshot of data for disaster recovery purposes.

For high-scale applications and services, you can attach several premium storage disks to a single virtual machine, and support up to 32 TB of disk storage per virtual machine and drive more than 64,000 IOPS per virtual machine at less than 1 millisecond latency for read operations. Like standard storage accounts, premium storage keeps three replicas of data within the same region, and ensures that a Write operation will not be confirmed until it is durably replicated.

Every object that is stored in Azure Storage has a unique uniform resource identifier (URI) address; the storage account name forms the subdomain of that address. The subdomain together with the domain name, which is specific to each service, form an endpoint for your storage account.

For example, if your storage account is named azra1, the default endpoints for your storage account would be:

  • Blob service: http://azra1.blob.core.windows.net
  • Table service: http://azra1.table.core.windows.net
  • Queue service: http://azra1.queue.core.windows.net
  • File service: http://azra1.file.core.windows.net

The endpoints for each storage account are visible on the storage Dashboard in the Azure Management portal after the account has been created.

The URI for accessing an object in a storage account is built by appending the object's location in the storage account to the endpoint. For example, a blob address might have this format: http://azra1.blob.core.windows.net/mycontainer/myblob.
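
As a brief illustration, the following Azure PowerShell sketch builds a storage context for the azra1 account used in the example above and displays the service endpoints that are derived from the account name. This assumes the Azure PowerShell module is installed; the key value is a placeholder.

    # A minimal sketch; the account key below is a placeholder.
    $key = "<storage-account-key>"
    $ctx = New-AzureStorageContext -StorageAccountName "azra1" -StorageAccountKey $key

    # Each service endpoint is derived from the account name as a subdomain.
    $ctx.BlobEndPoint    # http://azra1.blob.core.windows.net/
    $ctx.TableEndPoint   # http://azra1.table.core.windows.net/
    $ctx.QueueEndPoint   # http://azra1.queue.core.windows.net/
    $ctx.FileEndPoint    # http://azra1.file.core.windows.net/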

Storage Security

Access to storage accounts is possible through two means:

  • Azure authentication – These mechanisms include Azure Active Directory (Organization ID) and Microsoft Accounts (formerly Live ID)
  • Storage Account Key – This mechanism is primarily used for programmatic access by applications. This includes cloud and line-of-business applications, as well as graphical user interface (GUI) tools used to manage storage accounts, such as CloudBerry or Storage Explorer.

Please refer to the Storage Security section later in this document to understand common practices and implications for using storage account keys.

Feature References

Introduction to Microsoft Azure Storage

http://azure.microsoft.com/en-us/documentation/articles/storage-introduction/

Azure Storage documentation and intro videos

http://azure.microsoft.com/en-us/documentation/services/storage/

Introduction to Premium Storage

http://azure.microsoft.com/en-us/documentation/articles/storage-premium-storage-preview-portal/

Technical Overview

http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/

Quick Start Guide

http://azure.microsoft.com/en-us/documentation/articles/storage-getting-started-guide/

Microsoft Azure Storage Team Blog

http://blogs.msdn.com/b/windowsazurestorage

Understanding Block Blobs and Page Blobs

https://msdn.microsoft.com/en-us/library/ee691964.aspx

Introducing Azure Storage Append Blob

http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/13/introducing-azure-storage-append-blob.aspx

Mandatory:

  • There are hard limits on the quantity, size, and expected performance of Azure Storage accounts. It is critical to review the Azure subscription and service limits, quotas, and constraints pertaining to storage accounts when planning Azure solutions. For more information, see Standard Storage Limits.
  • A single storage account is limited to a maximum of 500 TB. If this limit is exceeded, a new storage account must be created.
  • The maximum size of any Azure file share is 5 TB. If this limit is exceeded, a new file share must be created.

Recommended: Consider the use of premium storage when a higher level of disk performance is needed for a given workload or application. Premium storage is high performance SSD-based storage designed to support I/O intensive workloads with significantly high throughput and low latency. With premium storage, you can provision a persistent disk and configure its size and performance characteristics to meet your application requirements.

Design Guidance

When designing storage account types and services, consider the following:

Capability Considerations

Capability Decision Points

Capability Models in Use

Different storage account types serve different purposes.

Each storage account should be allocated for a specific purpose and not be a generic, all-purpose container.

You need to decide how to allocate storage accounts for different purposes within your project.

Within an IaaS deployment, consider a separate storage account for maintenance of master images that can be deployed to other storage accounts throughout the subscription.

Within an IaaS deployment, consider a separate storage account for any backup purposes, separate from any production data, such that it can be created in a different region than the primary data.

Different storage services provide unique capabilities.

Understand the type of data and data flow that the storage account will serve to determine the storage service that the account will provide.

For key lookups at scale for structured data, use Tables.

For scans or retrievals of large amount of raw data, such as analytics or metrics, use Blobs.

For streaming and storing documents, videos, pictures, backups, and other unstructured text or binary data, use Blobs.

For IaaS virtual machine VHDs, use Blobs.

For process workflows or decoupling applications, use Queues.

To share files between applications running in virtual machines that are using familiar Windows APIs or the File service REST API, use Files.

The storage service offers two types of blobs: block blobs and page blobs.

Understand and decide on the use of block blobs or page blobs when you create the blob.

In the majority of cases, page blobs will be utilized. Page blobs are optimized for random Read and Write operations (best for virtual machines and VHDs).

Page blobs have a maximum storage of 1 TB (compared to only 200 GB for a block blob), and they commit immediately (compared to a block blob, which remains uncommitted until a commit is issued).

Block blobs are for streaming and storing documents, videos, pictures, backups, and other unstructured text or binary data.

Additionally, there are cost differences associated to each type of storage.

Microsoft Azure provides several ways to store and randomly access data in the cloud (for example, blobs).

Decide when to use Azure Blobs, Azure Files, or Azure data disks.

Azure Files is most often used when data stored in the cloud needs to be accessed by multiple IaaS or PaaS virtual machines with a standard SMB interface or UNC path.

Azure Blobs is most often used for larger capacity uses and where random access is needed, such as for multiple disks, and you want to be able to access data from anywhere.

Azure data disks are most often used when you want to store data that is not required to be accessed from outside the virtual machine to which the disk is attached. It is exclusive to a single virtual machine (only one at a time).

How data is replicated (LRS, ZRS, GRS, or RA-GRS), the blob type, the storage service, the volume of storage transactions, and the use of premium storage all affect the overall cost of the Azure Storage solution.

When making decisions about how your data is stored and accessed, you should also consider the costs involved.

Your total cost depends on how much you store, the volume of storage transactions and outbound data transfers, and which data redundancy option you choose.

The type of data will drive most of these decisions. For example, data that is critical to a business may drive the decision to have GRS, whereas data that is less critical may suffice with LRS.

Data that must be quickly accessed with the highest possible IOPS may drive the usage of premium storage, where data without that requirement may accept the use of standard storage.

These requirements will necessitate and support the higher costs associated with the storage services.

Storage containers can be used to further organize data in storage accounts.

Decide how you want the data in Azure Storage to be organized.

Deciding how to design and build containers is similar to how you would design and build a folder structure on a file server. It is simply how you want to organize the data.

By default, all VHDs are put into a container named "vhds", but you can change or specify whatever container structure you want to use, as in the sketch below.
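
As a sketch of this step, the following Azure PowerShell commands create a private container and upload a VHD into it ($ctx is a storage context created with New-AzureStorageContext; the local path is illustrative):

    # A minimal sketch; the local path and container name are illustrative.
    # Create a private container (no anonymous access) to hold VHDs.
    New-AzureStorageContainer -Name "vhds" -Permission Off -Context $ctx

    # Upload a local VHD into the container as a page blob.
    Set-AzureStorageBlobContent -File "C:\images\base.vhd" -Container "vhds" `
        -Blob "base.vhd" -BlobType Page -Context $ctx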

Concurrency settings can be modified for Azure Storage accounts.

Modern applications usually have multiple users viewing and updating data simultaneously. This requires developers to think carefully about how to provide a predictable experience to their end users, particularly for scenarios where multiple users can update the same data.

There are three main data concurrency strategies developers typically consider:

  • Optimistic concurrency
  • Pessimistic concurrency
  • Last writer wins

You can opt to use optimistic or pessimistic concurrency models to manage access to blobs and containers in the Blob service. If you do not explicitly specify a strategy, last writer wins is the default.

For IaaS, concurrency settings do not need to be modified.

For PaaS, the developers need to consider the type of application, the user base, and the data types to help determine the concurrency settings.
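
As an illustrative sketch only (the container and blob names are hypothetical), both concurrency models can be exercised against a blob through the .NET client objects that the Azure PowerShell cmdlets expose:

    # A minimal sketch; $ctx is an existing storage context.
    $blob = Get-AzureStorageBlob -Container "mycontainer" -Blob "myblob" -Context $ctx

    # Optimistic concurrency: capture the ETag; the application then writes
    # only if the ETag is unchanged (it changes on every update to the blob).
    $etag = $blob.ICloudBlob.Properties.ETag

    # Pessimistic concurrency: take a 30-second exclusive lease so no other
    # writer can modify the blob until the lease expires or is released.
    $leaseId = $blob.ICloudBlob.AcquireLease([TimeSpan]::FromSeconds(30), $null)
    $blob.ICloudBlob.ReleaseLease(
        [Microsoft.WindowsAzure.Storage.AccessCondition]::GenerateLeaseCondition($leaseId))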

There are storage account limitations that must be understood and respected.

Aside from the size limits of Azure Storage accounts, you must consider the throughput limitations of each account and design your storage accounts with those in mind.

You are more likely to hit the throughput limitations before you hit the size limitations. You are also limited by the number of storage accounts per subscription.

The primary constraining factor is the number of VHD files that can be stored in each storage account. For virtual machines in the Basic tier, do not place more than 66 highly used VHDs in a storage account to avoid the 20,000 total request rate limit (20,000/300).

For virtual machines in the Standard tier, do not place more than 40 highly used VHDs in a storage account (20,000/500). The term highly used refers to VHDs that push the upper limits of the total request rates.

If you have VHDs that are not highly used and do not come close to the maximum request rates, you can put more VHDs in the storage account.

Note that this refers to virtual hard disks and not virtual machines. Virtual machines may indeed contain multiple virtual hard disks.

Single or multiple storage accounts can be used.

Additional storage accounts may be used to get more scale than a single storage account.

Consider how to design the IaaS or PaaS workloads to dynamically add accounts, in the event that more scale is needed for the solution in the future, beyond what a single storage account can provide.

Storage account throughput is the determining factor in using single or multiple storage accounts.

Consider the throughput limitations of each of the storage account types. Also, consider that throughput can be maximized by using:

  • More simultaneous outstanding I/O
  • Parallel page writes or block writes in a single blob
  • Parallel uploads for multiple blobs
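
For example, the following sketch uploads a set of VHDs with a raised concurrent task count, which increases the number of parallel page or block writes per blob (the paths and container name are illustrative):

    # A minimal sketch; $ctx is an existing storage context.
    Get-ChildItem "C:\upload\*.vhd" | ForEach-Object {
        Set-AzureStorageBlobContent -File $_.FullName -Container "vhds" `
            -Blob $_.Name -BlobType Page -Context $ctx -ConcurrentTaskCount 16 -Force
    }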

Naming Conventions

The choice of a name for any asset in Microsoft Azure is an important choice because:

  • It is difficult (though not impossible) to change that name at a later time.
  • There are certain constraints and requirements that must be met when choosing a name.

This table covers the naming requirements for each element of a storage account.

Item                  | Length | Casing           | Valid characters
----------------------|--------|------------------|----------------------
Storage account name  | 3-24   | Lower case       | Alphanumeric
Blob name             | 1-1024 | Case sensitive   | Any URL character
Container name        | 3-63   | Lower case       | Alphanumeric and dash
Queue name            | 3-63   | Lower case       | Alphanumeric and dash
Table name            | 3-63   | Case insensitive | Alphanumeric

It is also possible to configure a custom domain name for accessing blob data in your Azure Storage account. The default endpoint for the Blob service is:

https://mystorage.blob.core.windows.net

But if you map a custom domain (such as www.contoso.com) to the blob endpoint for your storage account, you can also access blob data in your storage account by using that domain. For example, with a custom domain name, http://mystorage.blob.core.windows.net/mycontainer/myblob could be accessed as http://www.contoso.com/mycontainer/myblob.

Use the following reference when this capability is required.

Feature References

Naming and Referencing Containers, Blobs, and Metadata

https://msdn.microsoft.com/en-us/library/dd135715.aspx

Naming Queues and Metadata

https://msdn.microsoft.com/en-us/library/dd179349.aspx

Naming Tables

https://msdn.microsoft.com/en-us/library/azure/dd179338.aspx

Configure a custom domain name for blob data in an Azure Storage account

http://azure.microsoft.com/en-us/documentation/articles/storage-custom-domain-name

Mandatory:

  • A blob name can contain any combination of characters, but reserved URL characters must be properly escaped. Avoid blob names that end with a period (.), a forward slash (/), or a sequence or combination of the two. By convention, the forward slash is the virtual directory separator.
    Also, do not use a backward slash (\) in a blob name. The client APIs may allow it, but the name will then fail to hash properly, and the signatures will not match.
  • It is not possible to modify the name of a storage account or container after it has been created. You must delete it and create a new one if you want to use a new name.

Recommended: Establish a naming convention for all storage accounts and types before you create any.
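
One lightweight way to enforce such a convention is to validate proposed names against the documented rules before any resources are created. The following sketch checks a candidate storage account name; the function name and the sample names are illustrative only:

    # A minimal sketch: storage account names must be 3-24 characters,
    # lower-case letters and numbers only.
    function Test-StorageAccountName {
        param([string]$Name)
        return $Name -cmatch '^[a-z0-9]{3,24}$'
    }

    Test-StorageAccountName "azra1vmimages"   # True
    Test-StorageAccountName "Azra-Backups"    # False (upper case and hyphen)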

Design Guidance

When you choose naming conventions for storage objects, consider the following:

Storage account

Restrictions: Must be between 3 and 24 characters in length and use numbers and lower-case letters only. Not only must it be unique within the subscription, but it also must be unique across Azure.

Recommendations: Naming should be representative of its contents (for example, virtual machines, backup data, archive data, or images).

Storage Blob container

Restrictions: Container names must start with a letter or number, and they can contain only letters, numbers, and the hyphen (-) character. Every hyphen must be immediately preceded and followed by a letter or number; consecutive hyphens are not permitted in container names. All letters in a container name must be lower case. Container names must be from 3 through 63 characters long.

Recommendations: Naming should be representative of its contents (for example, vhds, server images, or backup-Mar03-2015).

Storage Blob

Restrictions: A blob name can contain any combination of characters. A blob name must be at least one character long and cannot be more than 1,024 characters long. Blob names are case sensitive. Reserved URL characters must be properly escaped. The number of path segments comprising the blob name cannot exceed 254. A path segment is the string between consecutive delimiter characters (for example, the forward slash) that corresponds to the name of a virtual directory.

Recommendations: Naming should be representative of its contents.

Queues

Restrictions: Every queue within an account must have a unique name. The queue name must be a valid DNS name. A queue name must start with a letter or number, and can contain only letters, numbers, and the hyphen (-) character. The first and last characters in the queue name must be alphanumeric; the hyphen cannot be the first or last character, and consecutive hyphens are not permitted in the queue name. All letters in a queue name must be lower case. A queue name must be from 3 through 63 characters long.

Recommendations: Naming should be representative of its contents.

Tables

Restrictions: Table names must be unique within an account. Table names can contain only alphanumeric characters and cannot begin with a numeric character. Table names must be from 3 to 63 characters long. Table names preserve the case with which they were created, but are case insensitive when used. Some table names are reserved, including "tables"; attempting to create a table with a reserved table name returns error code 400 (Bad Request).

Recommendations: Naming should be representative of its contents.

Location, Durability, and Redundancy

The location and durability of the storage accounts must also be taken into account. Durability and redundancy options also have an impact on the cost of the storage. When creating a storage account (either through the portal, Azure PowerShell, or REST APIs), you are required to specify an affinity group or a location.

  • Affinity groups allow you to group your Azure services to optimize performance. All services, storage accounts, and virtual machines within an affinity group will be located in the same Azure datacenter or region. An affinity group can improve service performance by locating compute workloads in the same Azure datacenter or region, or near the target user audience. Also, no billing charges are incurred for egress when data in a storage account is accessed from another service that is part of the same affinity group.
  • Location refers to Azure region where the storage account will be deployed. Using a location instead of affinity groups will allow you to deploy different services in different locations independently, as needed.
  • Resource groups enable you to manage all your resources in an application or service deployed together in Microsoft Azure. Given the close relationship of Azure Storage to Azure Services, the alignment of Azure Storage accounts to associated resource groups is a key consideration when used.

To ensure that the Azure SLAs can be met, there are several levels of data replication available for the storage accounts:

  • Locally redundant storage (LRS) maintains three copies of your data. LRS is replicated three times within a single facility in a single region. LRS protects your data from normal hardware failures, but not from the failure of a single facility.
  • Zone-redundant storage (ZRS) maintains three copies of your data. ZRS is replicated three times across two to three facilities, either within a single region or across two regions, providing higher durability than LRS. ZRS ensures that your data is durable within a single region.
  • Geo-redundant storage (GRS) is enabled for your storage account by default when you create it. GRS maintains six copies of your data. With GRS, your data is replicated three times within the primary region. It is also replicated three times in a secondary region hundreds of miles away from the primary region. This provides the highest level of durability. In the event of a failure at the primary region, Azure Storage will fail over to the secondary region. GRS ensures that your data is durable in two separate regions.
  • Read-access geo-redundant storage (RA-GRS) allows you to have higher read availability for your storage account by providing Read-only access to the data replicated to the secondary location. When you enable this feature, the secondary location can be used to achieve higher availability in the event the data is not available in the primary region. Read-access geo-redundant storage is recommended for maximum availability and durability.
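
The replication type is selected when the account is created. As a sketch, the following Azure PowerShell commands create accounts with different redundancy options (the account names and region are illustrative):

    # A minimal sketch; names and location are illustrative.
    New-AzureStorageAccount -StorageAccountName "azra1scratch" -Location "West US" -Type "Standard_LRS"
    New-AzureStorageAccount -StorageAccountName "azra1data" -Location "West US" -Type "Standard_GRS"
    New-AzureStorageAccount -StorageAccountName "azra1reports" -Location "West US" -Type "Standard_RAGRS"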

Feature References

Azure Storage Replication for Durability and High Availability

http://azure.microsoft.com/en-us/documentation/articles/storage-introduction/#replication-for-durability-and-high-availability

Azure Storage Redundancy Options

http://azure.microsoft.com/en-us/documentation/articles/storage-redundancy/

Azure SLAs (including Storage)

http://azure.microsoft.com/en-us/support/legal/sla/

Azure Storage Pricing Guide

http://azure.microsoft.com/en-us/pricing/details/storage/

Using resource groups to manage your Azure resources

http://azure.microsoft.com/en-us/documentation/articles/azure-preview-portal-using-resource-groups

Mandatory:

  • ZRS is currently available only for block blobs. Consider the storage type when selecting the redundancy type.
  • When you create a GRS storage account, you select the primary region for the account. The secondary region is determined based on the primary region, and it cannot be changed.

Recommended: Not all storage services are available in all regions. Be sure to check the availability of the service you desire, in the region you desire, during the planning phase. (For example, premium storage is limited to only a few regions.) For more information, see Services by regions.

Optional:

  • GRS is recommended over ZRS or LRS for maximum durability. However, please note that there is a price difference amongst the different redundancy types.
  • You can change how your data is replicated after your storage account has been created, but note that you may incur an additional one-time data transfer cost if you switch from LRS to GRS or RA-GRS.
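
As a sketch of such a change, the following command switches an existing account to read-access geo-redundant storage (the account name is illustrative; as noted elsewhere, ZRS accounts cannot be converted):

    # A minimal sketch; switching from LRS to GRS or RA-GRS may incur a
    # one-time data transfer cost.
    Set-AzureStorageAccount -StorageAccountName "azra1data" -Type "Standard_RAGRS"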

Design Guidance

When you design storage durability and redundancy, consider the following:

Capability Considerations

Capability Decision Points

Capability Models in Use

Each availability option provides a different level of data redundancy.

Carefully consider the level of redundancy that you may need with your data. Not all data needs the same level of redundancy (and redundancy costs).

Locally redundant storage (LRS) is less expensive than geographically redundant storage (GRS), and it also offers higher throughput. If your application stores data that can be easily reconstructed, you may opt for LRS.

Some applications are restricted to replicating data only within a single region due to data governance or privacy requirements.

If your application has its own geo-replication strategy (for example, SQL AlwaysOn and Active Directory domain controllers), then it may not require GRS.

ZRS is currently available only for block blobs. Note that once you have created your storage account and selected zone-redundant replication, you cannot convert it to any other type of replication, or vice versa.

Locally redundant storage (LRS) provides economical local storage or data governance compliance.

Zone redundant storage (ZRS) provides an economical, yet higher durability option for block Blob storage.

Geographically redundant storage (GRS) provides protection against a major datacenter outage or disaster.

Read-access geographically redundant storage (RA-GRS) provides Read access to data during an outage, for maximum data availability and durability.

In general, plan to design to regional redundancy (GRS), unless the workload already accounts for it. In that case, there is no need to duplicate it. Also, see the previous section about premium storage for special considerations on premium storage redundancy options.

Plan with failure in mind.

Redundancy options are available not because failures may occur, but because they will occur. Accept that hardware failures are part of running hyper-scale datacenters, and plan that failures will occur through the use of available redundancy options.

Ensure that applications and workloads have retry logic for storage connection failures, in the event that storage becomes unavailable in the primary location. Failover typically requires no code changes, but a small increase in latency may occur; latency-sensitive applications may also benefit from the use of a cache.

Storage Security

When a storage account is created, only the owner of that account may access the blobs, tables, files, and queues within that account. There are several ways to grant and share access to storage accounts to other users. This section discusses some of the available options.

Access Control

When you create a storage account, Azure generates two storage access keys, which are used for authentication when the storage account is accessed. By providing two storage access keys, Azure enables you to regenerate the keys with no interruption to your storage service or access to that service.

One simple way to grant access to the storage account is to share that storage access key. However, if your service or application needs to make these resources available to other clients without sharing your access key, you have other options for permitting access:

  • You can set a container's permissions to permit anonymous Read access to the container and its blobs. This is for blobs only, not for tables or queues.
  • You can use a shared access signature, which enables you to delegate restricted access to a container, blob, table, or queue resource by specifying the interval for which the resources are available and the permissions that a client will have to it.
  • You can also use a stored access policy to manage shared access signatures for a container, blob, queue, or table. The stored access policy gives you an additional measure of control over your shared access signatures. You can use a stored access policy to change the start time, expiration time, or permissions for a signature, or to revoke it after it has been issued.
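
To make the shared access signature option concrete, the following sketch issues a signature that grants one hour of read-only access to a single blob; the container and blob names are hypothetical:

    # A minimal sketch; $ctx is a storage context built from the account key.
    $sasUri = New-AzureStorageBlobSASToken -Container "mycontainer" -Blob "myblob" `
        -Permission "r" -ExpiryTime (Get-Date).AddHours(1) -Context $ctx -FullUri

    # Hand $sasUri to the client; it embeds the signature, so the account key
    # is never shared.
    $sasUri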

Authentication

Every request made to an Azure Storage account must be authenticated, unless it is an anonymous request against a public container or its blobs. There are two ways to authenticate a request against the storage accounts:

  • Use the shared key or shared key lite authentication schemes for the Blob, Queue, Table, and File services.
  • Create a shared access signature. A shared access signature includes the credentials required for authentication and the address of the resource being accessed. Because the shared access signature includes all data needed for authentication, it can be used to grant access to a Blob, Queue, or Table service, and it can be distributed separately from any code.

Feature References

Manage Access to Azure Storage Resources

http://azure.microsoft.com/en-us/documentation/articles/storage-manage-access-to-resources/

Authenticating Access to Your Azure Storage Account

https://msdn.microsoft.com/en-us/library/azure/hh225339.aspx

Authentication for the Azure Storage Services REST API reference

https://msdn.microsoft.com/library/azure/dd179428.aspx

Microsoft Azure Storage Explorers

http://blogs.msdn.com/b/windowsazurestorage/archive/2014/03/11/windows-azure-storage-explorers-2014.aspx

Constructing the Shared Access Signature URI

https://msdn.microsoft.com/en-us/library/azure/dn140255.aspx

Mandatory: You need the Azure Storage access key to access the storage account through any GUI tools, such as Azure Storage Explorer or any third-party tools.

Mandatory: The primary access key and secondary access key for storage accounts should be changed periodically to mitigate unauthorized access.

  • Be sure to update applications that use a key with the new key value as you change each key.
  • Do not change both storage account access keys at the same time. This can result in loss of access by applications.

We suggest changing each key (primary or secondary) every 60 to 120 days. This allows for an ongoing monthly or quarterly key change cadence that affects only one key at a time.

Additional events that could cause you to regenerate keys include when a security incident occurs, if you fear compromise of storage account keys, or when key administrative personnel leave your organization.

Comparisons can be made within each organization regarding password changes for critical service accounts or credentials in Active Directory and other authentication systems.

As a general practice, Azure storage account and vault keys should follow similar practices and procedures currently established within the organization.
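
A rotation pass along those lines might look like the following sketch, which regenerates one key at a time for an illustrative account named azra1 (moving dependent applications between keys at each step is assumed):

    # A minimal sketch of a one-key-at-a-time rotation.
    # 1. Point applications at the secondary key, then regenerate the primary.
    New-AzureStorageKey -StorageAccountName "azra1" -KeyType Primary

    # 2. At the next cadence window, point applications at the new primary
    #    key, then regenerate the secondary.
    New-AzureStorageKey -StorageAccountName "azra1" -KeyType Secondary

    # Retrieve the current key values to update application configuration.
    (Get-AzureStorageKey -StorageAccountName "azra1").Primary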

Recommended:

  • Only regenerate storage account keys if absolutely necessary. Regenerating your access keys affects virtual machines, media services, and any applications that are dependent on the storage account. All clients that use the access key to access the storage account must be updated to use the new key.
  • If you require more granular control over blob resources, or if you want to provide permissions for operations other than Read operations, you can use a shared access signature to make a resource accessible to users.

Optional: A container or blob can be made available for public access by setting a container's permissions. A container, blob, queue, or table may be available for signed access via a shared access signature (a shared access signature is authenticated through a different mechanism).

Design Guidance

When you design storage security, consider the following:

Capability Considerations

Capability Decision Points

Capability Models in Use

Storage accounts can be created with internal (private) or external (public) access.

The decision about what type of access control to apply to the storage accounts depends entirely on the type of data stored in those accounts and how that data needs to be accessed and protected.

This is something that is unique to every customer in Azure. In general, the guidance is to always start with internal (private) access only, then find reasons (exceptions) why the data may need external (public) access.

Most companies do not need to have external access directly to their data.

Storage keys can be used to protect storage accounts against unauthorized usage.

Storage keys should be treated like highly privileged credentials (such as Domain Admin credentials). They should be limited to a few selected, trusted resources within the organization.

If you need to grant access to storage accounts without sharing the storage keys, there are other methods to accomplish this.

To permit access to storage resources without giving out your access keys, you can use a shared access signature. A shared access signature provides access to a resource in your account for an interval that you define and with the permissions that you specify.

If your service requires that you exercise more granular control over blob resources, or if you want to provide permissions for operations other than Read operations, you can use a shared access signature to make a resource accessible to users.

You can specify that a container should be public, in which case all Read operations in the container and any blobs within it are available for anonymous access.

An anonymous request does not need to be authenticated, so a user can perform the operation without providing account credentials.

Encryption

Client-side encryption for Microsoft Azure Storage contains new functionality to help developers encrypt their data inside client applications before uploading it to Azure Storage. The data can also be decrypted as it is downloaded.

Client-side encryption also supports integration with Azure Key Vault to store and manage the encryption keys. The storage service never sees the keys and is incapable of decrypting the data. This gives you the most control you can have. It's also fully transparent so you can inspect exactly how the library is encrypting your data to ensure that it meets your standards.

Storage Protection

Service-Level Storage Protection

From a service-level perspective, Microsoft has a responsibility to protect stored data to mitigate threats related to physical drives within each Azure datacenter. Storage within Azure is exposed in one or more storage accounts within each Azure subscription.

Within a Microsoft Azure datacenter, storage accounts do not reside on a single disk. Rather, data is distributed across several disks in the form of extents within the Azure fabric. They are replicated within and across datacenters based on customer-selected preferences, such as locally redundant storage or geo-redundant storage.

Microsoft protects the data stored within each datacenter with a comprehensive set of controls in alignment with the security certifications outlined at the Azure Trust Center website.

Subscription-Level Storage Protection

From a subscription-level perspective, customers can additionally protect storage accounts within their Azure subscription to mitigate threats related to subscription administrators within their organization. Data within each storage account is accessible to workloads in multiple ways: Queues, Tables, Blobs, and Files (SMB).

Each storage account has several layers of protection, including those that are provided by Microsoft and those that are controlled by the customer, using both Microsoft and third-party mechanisms.

Depending on the workload type and how it is accessed, data can be protected in the following ways:

Subscription-Level Workload Types and Storage

IaaS

IaaS workloads (virtual machines) contain their storage inside virtual hard disks (VHDs), which are stored as page blobs in one or more storage accounts. Some lift-and-shift virtual machine workloads access data over Files (SMB).

PaaS

PaaS workloads access storage by using one or more of the accessible methods outlined previously (Queues, Tables, and Blobs).

StorSimple

StorSimple appliances access storage over the storage account's REST API URL and encrypt the data that they store.

Subscription-Level Protection Types

 

IaaS

Data-at-rest: Performed by the customer by encrypting the virtual hard disk (VHD) files. Microsoft and third-party mechanisms are used. Workloads (such as SQL Server) also support Transparent Data Encryption (TDE). Technologies that assist with this are:

  • Key Vault
  • SQL Server Transparent Data Encryption
  • Azure Disk Encryption
  • Third-party virtual machine volume encryption

Data-in-transit: Performed by the customer by using transport encryption of traffic traversing exposed virtual machine network endpoints. Microsoft and third-party mechanisms are used. Actions performed by Microsoft include disk encryption using BitLocker Drive Encryption for bulk import/export operations and encrypting traffic between Azure datacenters. Technologies that assist with this are:

  • HTTPS/REST API
  • Azure endpoints
  • Azure Import/Export service

Data access: Performed by the customer by using native protections within the installed operating system to authenticate and authorize access to the virtual hard disk (VHD) data that is exposed through the operating system and published endpoints (for example, operating system file shares).

PaaS

Data-at-rest: Performed by the customer by encrypting data located in Queue, Table, and Blob storage. Uses Microsoft encryption mechanisms. Technologies that assist with this are:

  • Key Vault
  • Client-Side Encryption (Preview)
  • Azure SQL Database Transparent Data Encryption (Preview)
  • Azure Table Encryption

Data-in-transit: Performed by the customer by using transport encryption of traffic traversing storage account network endpoints. Microsoft and third-party mechanisms are used. Actions performed by Azure include encryption of traffic between Azure datacenters. Technologies that assist with this are:

  • HTTPS/REST API
  • Storage account endpoints

Data access: Performed by the customer by using shared access keys and shared access signatures to provide access to data stored in Queue, Table, and Blob storage. Technologies that assist with this are:

  • Shared access signatures
  • Storage account access keys
  • Storage account endpoints

StorSimple

Data-at-rest: Performed by the appliance using AES-256 encryption with Cipher Block Chaining (CBC) prior to saving to the mapped Azure storage account.

Data-in-transit: Performed by the customer by using transport encryption of traffic traversing exposed physical or virtual machine network endpoints. Performed by the appliance using SSL encryption.

Data access: Performed by the customer by using native protections within the installed operating system to authenticate and authorize access to attached StorSimple volumes. Performed by the appliance by using authentication protocols (such as CHAP), ACLs, network access control, and Role-Based Access Control (RBAC).

Feature References

BitLocker Drive Encryption

https://technet.microsoft.com/en-us/library/cc732774.aspx

Key Vault

http://blogs.technet.com/b/kv/archive/2015/01/08/azure-key-vault-making-the-cloud-safer.aspx

http://azure.microsoft.com/en-us/services/key-vault

Azure Disk Encryption

https://channel9.msdn.com/Events/Ignite/2015/BRK3490

http://blogs.msdn.com/b/azuresecurity/archive/2015/05/11/azure-disk-encryption-management-for-windows-and-linux-virtual-machines.aspx

Third-Party virtual machine Volume Encryption

http://azure.microsoft.com/blog/2014/08/19/azure-virtual-machine-disk-encryption-using-cloudlink/

http://azure.microsoft.com/blog/2014/11/13/encrypting-azure-virtual-machines-with-cloudlink-securevm/

http://www.cloudlinktech.com/choose-your-cloud/microsoft-azure/

https://channel9.msdn.com/Blogs/AzurePartner/Guest-Post-CloudLink-Secures-Azure-VMs-via-BitLocker-and-Native-Linux-Encryption

Client-Side Encryption (Preview)

http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/29/microsoft-azure-storage-client-library-for-c-v1-0-0-general-availability.aspx
http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/29/getting-started-with-client-side-encryption-for-microsoft-azure-storage.aspx

https://azure.microsoft.com/en-us/documentation/articles/storage-client-side-encryption

Azure Import/Export Service

https://azure.microsoft.com/en-us/documentation/articles/storage-import-export-service/

http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/13/announcing-microsoft-azure-import-export-service-ga.aspx

Storage Account Access Keys

https://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account
http://blogs.msdn.com/b/mast/archive/2013/11/07/why-does-a-storage-account-have-two-access-keys.aspx

Shared Access Signatures

https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-shared-access-signature-part-1/ and https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-shared-access-signature-part-2/

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/12/introducing-table-sas-shared-access-signature-queue-sas-and-update-to-blob-sas.aspx

http://blogs.msdn.com/b/skaufman/archive/2012/10/15/blob-storage-and-shared-access-signatures.aspx

SQL Server Transparent Data Encryption

https://msdn.microsoft.com/en-us/library/bb934049.aspx

http://blogs.msdn.com/b/sqlsecurity/archive/2015/04/29/announcing-transparent-data-encryption-for-azure-sql-database.aspx

https://channel9.msdn.com/Shows/Data-Exposed/TDE-in-Azure-SQL-Database

http://azure.microsoft.com/blog/2013/08/01/using-microsoft-sql-server-security-features-in-windows-azure-virtual-machines/

StorSimple Security

http://www.storsimple.com/Portals/65157/docs/StorSimple-Solution-Overview-Security.pdf

http://download.microsoft.com/download/6/9/A/69A81B3A-E111-4797-AD31-02671D501D87/StorSimple_Security_Brief.pdf

Storage for IaaS

A Microsoft Azure virtual machine is created from an image or a disk. All virtual machines use one operating system disk and a temporary local disk, and they can attach multiple data disks, depending on the selected size of the virtual machine. All images and disks, except the temporary local disk, are created from virtual hard disk (VHD) files that are stored as page blobs in a storage account in Microsoft Azure.

You can use platform images that are available in Microsoft Azure to create virtual machines, or you can upload your own images to create customized virtual machines. The disks that are created from images are also stored in Azure Storage.

Disks

Disks can be leveraged in different ways with a virtual machine in Microsoft Azure. An operating system disk is a VHD that you use to provide an operating system for a virtual machine. A data disk is a VHD that you attach to a virtual machine to store application data. You can create and delete data disks whenever you need to.

Temporary Disk

Each virtual machine that you create has a temporary local disk, which is labeled as drive D by default. This disk exists only on the physical host server on which the virtual machine is running. It is not stored in blobs in Azure Storage. This disk is used by applications and processes that are running in the virtual machine for transient and temporary storage of data. It is also used to store page files for the operating system.

Caching

The operating system disk and data disk each have a host caching configuration setting called Host cache preference, which can improve performance under some circumstances. By default, Read/Write caching is enabled for operating system disks, and all caching is off for data disks. Note that some workloads have specific configuration requirements for this setting; review its use carefully with the vendor and against the workload's specific needs.
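
Where a workload does call for a change, the setting can be adjusted per disk, as in this sketch (the cloud service and virtual machine names are hypothetical):

    # A minimal sketch: enable read-only host caching on the data disk at LUN 0.
    # Data disk cache changes do not require a reboot; OS disk changes do.
    Get-AzureVM -ServiceName "azra1svc" -Name "azra1vm01" |
        Set-AzureDataDisk -LUN 0 -HostCaching "ReadOnly" |
        Update-AzureVM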

Images

An image is a VHD file (.vhd) that you can use as a template to create a new virtual machine. You can use images from the Azure Image Gallery, or you can create and upload your own custom images. To create a Windows Server image, you must run the Sysprep command on your server to generalize and shut it down before you can upload the .vhd file that contains the operating system.

VHD Files

A .vhd file is stored as a page blob in Microsoft Azure Storage, and it can be used to create images and operating system disks or data disks in Microsoft Azure. You can upload a .vhd file to Microsoft Azure and manage it as you would any other page blob. The .vhd files can be copied, moved, or deleted as long as a lease does not exist on the VHD (a lease exists, for example, when the VHD belongs to an existing virtual machine).

A VHD can be in a fixed format or a dynamic format. Currently, however, only the fixed format is supported in Microsoft Azure. Often, the fixed format wastes space because most disks contain large unused ranges. However, in Microsoft Azure, fixed VHD files are stored in a sparse format, so you receive the benefits of fixed and dynamic disks at the same time.

When you create a virtual machine from an image, a disk is created for the virtual machine, which is a copy of the original VHD file. To protect against accidental deletion, a lease is created if you create an image, an operating system disk, or a data disk from a VHD file.

Feature References

Disks and Images in Azure

https://msdn.microsoft.com/en-us/library/azure/jj672979.aspx

Virtual Machine Disks in Azure

https://msdn.microsoft.com/en-us/library/azure/dn790303.aspx

Virtual Machine Images in Azure

https://msdn.microsoft.com/en-us/library/azure/dn790290.aspx

VHDs in Azure

https://msdn.microsoft.com/en-us/library/azure/dn790344.aspx

Manage Images using Windows PowerShell

https://msdn.microsoft.com/en-us/library/azure/dn790330.aspx

How To Change the Drive Letter of the Windows Temporary Disk

http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-change-drive-letter

Mandatory:

  • Note that Microsoft Azure virtual machines do not currently support the VHDX format. Ensure that any virtual machines or images that are planned for use in or migration to Microsoft Azure use the VHD format.
  • Note that if a VHD file on-premises is a dynamic disk, it is converted to a fixed disk when it is uploaded to Microsoft Azure.
  • Do not store data on the temporary disk. This disk provides temporary storage for applications and processes, and it is intended to only store transient data such as page or swap files. No data on the temporary disk will persist a host-machine failure or any other operation that requires moving the virtual machine to another piece of hardware.
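
A typical preparation step, sketched below with illustrative paths and names, converts a dynamic VHDX to a fixed VHD by using the Hyper-V cmdlets and then uploads it as a page blob (Add-AzureVhd also converts dynamic VHDs to fixed format during upload):

    # A minimal sketch; paths, account, and container names are illustrative.
    # Convert a dynamic VHDX to the fixed VHD format that Azure requires.
    Convert-VHD -Path "C:\disks\app01.vhdx" -DestinationPath "C:\disks\app01.vhd" -VHDType Fixed

    # Upload the fixed VHD as a page blob in the target storage account.
    Add-AzureVhd -LocalFilePath "C:\disks\app01.vhd" `
        -Destination "https://azra1.blob.core.windows.net/vhds/app01.vhd"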

Recommended: You can read or write to a single blob at up to a maximum of 60 MB per second (approximately 480 Mbps), which exceeds the capabilities of many client-side networks, including the physical network adapter on the client device.

In addition, a single blob supports up to 500 requests per second. If you have multiple clients that need to read the same blob, and you might exceed these limits, you should consider using a content delivery network (CDN) for distributing the blob.

Optional: Different virtual machine sizes allow for a different number of data disks to be attached. Be sure to choose the appropriate size virtual machine, based on the number of data disks that you may anticipate needing. For example, a size A1 virtual machine can have a maximum of two data disks. If you need more than two data disks, choose something bigger than an A1.
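
For reference, attaching an empty data disk is a single pipeline in Azure PowerShell, as in this sketch (the service, virtual machine, and disk label names are hypothetical):

    # A minimal sketch: attach a new, empty 100 GB data disk at LUN 1.
    Get-AzureVM -ServiceName "azra1svc" -Name "azra1vm01" |
        Add-AzureDataDisk -CreateNew -DiskSizeInGB 100 -DiskLabel "data01" -LUN 1 |
        Update-AzureVM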

Design Guidance

When you design storage for IaaS, consider the following:

Capability Considerations

Capability Decision Points

Capability Models in Use

The IaaS design will depend heavily on the storage account design.

For IaaS workloads, it is important to first understand the I/O (or IOPS) and the profile of a workload to determine the stress and expectations it will put on the storage accounts. Based on this information, you can determine how VHDs should be stored in storage accounts and what kind of limitations you will be subject to.

The primary constraining factor is the number of VHD files that can be in each storage account.

For virtual machines in the Basic tier, do not place more than 66 highly used VHDs in a storage account to avoid the 20,000 total request rate limit (20,000/300).

For virtual machines in the Standard tier, do not place more than 40 highly used VHDs in a storage account (20,000/500). The term highly used refers to VHDs that push the upper limits of the total request rates. If you have VHDs that are not highly used and do not come close to the maximum request rates, you can put more VHDs in the storage account.

Note that this refers to virtual hard disks, not virtual machines. Virtual machines may indeed contain multiple virtual hard disks.

Deployable virtual machine images also reside in storage accounts.

When uploading and deploying images, a storage account must be used to house those images. The decision comes down to which storage account should be used for images versus live virtual machines.

We recommend that all custom virtual machine images be stored in a separate, dedicated storage account, from which deployments can occur.

This keeps images separate from live virtual machines and prevents them from usurping any IOPS. Deployment can occur by copying an image from one storage account to another, thus keeping the images isolated and protected. This also allows you to give special permissions to the images storage account that you might not grant to the live virtual machines storage account (such as permissions for image deployment engineers).

Also, never deploy an image across a VPN connection. Always maintain the source image in an Azure Storage account. This will provide for a much faster deployment, instead of pushing it across the VPN each time.
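
A deployment copy of that kind can be scripted as in the following sketch, where the image is copied server-side between the two storage accounts ($imgCtx and $vmCtx are contexts for the image and production accounts; the container and blob names are illustrative):

    # A minimal sketch; the copy runs server-side within Azure.
    Start-AzureStorageBlobCopy -SrcContainer "images" -SrcBlob "win2012r2-base.vhd" `
        -Context $imgCtx -DestContainer "vhds" -DestBlob "vm01-os.vhd" -DestContext $vmCtx

    # The copy is asynchronous; poll its state until it completes.
    Get-AzureStorageBlobCopyState -Container "vhds" -Blob "vm01-os.vhd" -Context $vmCtx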

Choosing which storage account to deploy a virtual machine into is not a permanent decision.

If necessary, it is possible to migrate a virtual machine from one storage account to another. For detailed procedures, see Migrate Azure Virtual Machines between Storage Accounts.

Migrating a virtual machine to another storage account should be done on an as-needed basis. If this is needed more frequently, we recommend that you re-examine the storage account architecture to ensure adequate coverage for all deployment points.

Consider what to do if multiple data disks are needed in a single (striped) volume.

If multiple data disks need to appear as a single volume within a virtual machine, you are limited to using LRS only (you cannot use GRS for those VHDs).

If multiple data disks need to appear as a single volume, it is not possible to enable this in a storage account that is configured with GRS.

Those VHDs must be stored in a storage account configured as LRS, or if GRS is still a requirement, each data disk must be kept as a separate volume.

Data loss may occur if you use striped volumes (Windows or Linux) in geo-replicated storage accounts, because geo-replication provides only loose write-order consistency across the VHDs that make up the stripe set.

If a storage outage occurs and it requires restoring data from a replicated copy, there is no guarantee that the Write order of the striped disk set would be intact after it is restored.

Disk cache settings have an effect on the performance of the virtual machine disks.

The operating system disk and the data disk have a host caching setting that can improve performance under some circumstances. However, these settings can also negatively affect performance in other circumstances, depending on the application.

Host caching is off by default for Read and Write operations for data disks. Host caching is on by default for Read and Write operations for operating system disks.

Only change these settings if the workload would benefit from the change in cache to improve performance.

Cache setting changes for the operating system disk require a reboot. Cache setting changes for a data disk do not.

Storage Management

Data in Azure Storage can be accessed and managed in a variety of ways, and through numerous tools and processes. This section covers the various mechanisms that support managing Microsoft Azure Storage.

Graphical User Interface (GUI) Tools

Graphical user interface (GUI)-based tools are those that access Azure Storage in an interface that mimics File Explorer. This includes functionality such as drag-and-drop-based tools that allow you to view and access data as you would on a local or network drive on a server. These tools are easy to use and understand, and they are the best option for those who are new to Azure Storage.

Command-Line Interface (CLI) Tools

Command-line interface (CLI)-based tools are those that access Azure Storage from a command line, such as Azure PowerShell. This allows you to include data operations (such as move, copy, and delete) within automation scripts. These interfaces have many options and switches to allow for a variety of options in working with the data. They are best used by advanced users and those who are already familiar with Windows PowerShell and require automation as part of their Azure-based solution.

REST Interfaces

The REST APIs for the Azure Storage services offer programmatic access to the Blob, Queue, Table, and File services in Azure, or in the development environment, via the storage emulator.

All storage services are accessible via REST APIs. Storage services can be accessed from within a service running in Azure, or directly over the Internet from any application that can send an HTTP/HTTPS request and receive an HTTP/HTTPS response. These interfaces are best suited for developers or solutions that require detailed information or control over Azure Storage services.

Client Libraries

The Azure Storage Client Library reference for .NET contains the current version of the Storage Client Library for .NET. You can install the Storage Client Library for .NET from NuGet or from the Azure SDK for .NET. The source code for the Storage Client Library for .NET is publicly available in GitHub. The Azure Storage Native Client Library is a C++ library for working with the Azure Storage services.

Cross-Platform Options

The Azure Cross-Platform Command-Line Interface (xplat-cli) provides a set of open source, cross-platform commands for working with the Azure platform. The xplat-cli provides much of the same functionality found in the Azure portal, such as the ability to manage websites, virtual machines, storage, and SQL Databases. The xplat-cli is written in JavaScript, and it requires Node.js.

Vendor Solutions

Other vendors are free to distribute their Azure Storage management tools, in addition to what Microsoft provides.

Storage Emulator

The Azure Storage Emulator provides a local environment that emulates the Azure Blob, Queue, and Table services for development purposes. By using the storage emulator, you can test your application against the storage services locally, without incurring any costs.

Data Migration

After the suitability of the data has been determined, there are multiple methods to move data to Azure and manage its lifecycle:

  • StorSimple (outlined in the following section) can be used for unstructured data that is typically stored on traditional file servers.
  • The Azure Import/Export service is used to transfer large amounts of data to Azure Blob storage in situations where uploading data over the network is not feasible. You can also use this service to transfer a large amount of data from Blob storage to your on-premises storage without the network transfer time and network egress costs. Data is stored on physical hard drives, encrypted with BitLocker, and physically shipped to an Azure datacenter.

For smaller amounts of data, a manual copy using a GUI tool or a CLI program can accomplish the data move. AzCopy is perhaps the most popular choice to move small amounts of data to and from Azure Storage accounts. For more information, see Getting Started with the AzCopy Command-Line Utility.
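
As an example of the classic AzCopy syntax, a recursive upload of a local folder looks like the following (the local path, destination URL, and key are placeholders):

    AzCopy /Source:C:\upload /Dest:https://azra1.blob.core.windows.net/backups /DestKey:<storage-account-key> /S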

Feature References

Storage Services REST API Reference

https://msdn.microsoft.com/en-us/library/azure/dd179355.aspx

Storage Client Library Reference

https://msdn.microsoft.com/en-us/library/azure/dn261237.aspx

Import/Export Service REST API Reference

https://msdn.microsoft.com/en-us/library/azure/dn529096.aspx

Azure GUI Storage Explorers

http://blogs.msdn.com/b/windowsazurestorage/archive/2014/03/11/windows-azure-storage-explorers-2014.aspx

Using the Azure Cross-Platform Command-Line Interface

http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-command-line-tools/#Commands_to_manage_your_Storage_objects

Install and Configure the Azure Cross-Platform Command-Line Interface

http://azure.microsoft.com/en-us/documentation/articles/xplat-cli

Use the Microsoft Azure Import/Export Service to Transfer Data to Blob Storage

http://azure.microsoft.com/en-us/documentation/articles/storage-import-export-service

Using the Azure Storage Emulator

https://azure.microsoft.com/en-us/documentation/articles/storage-use-emulator/

Azure Throughput Analyzer

http://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx

Mandatory:

  • When you import publish settings, the information for accessing your Azure subscription is stored in a ".azure" directory located in your user directory. Your user directory is protected by your operating system; however, it is recommended that you take additional steps to encrypt your user directory to protect that information. Products for encryption include BitLocker (Windows), FileVault (Mac), and Encrypted Home (Ubuntu or other equivalent systems).
  • When using the Import/Export service, the data on the drive must be encrypted by using BitLocker prior to creating the job and transferring data. This protects your data while it is in transit. For an export job, the Import/Export service encrypts your data before shipping the drive back to you. Additional details about the Import/Export service and its operation are provided in the previous Feature References table.
  • You must provide your shipping tracking number to the Azure Import/Export service; otherwise, your job cannot be processed.

Recommended: Azure Storage supports HTTP and HTTPS; however, using HTTPS is highly recommended.

Optional: During shipping, physical media may need to cross international borders. You are responsible for ensuring that your physical media and data are imported and exported in accordance with the applicable laws. Before shipping the physical media, check with your advisors to verify that your media and data legally can be shipped to the identified datacenter. This helps ensure that it reaches Microsoft in a timely manner.

Design Guidance

When you design storage management, consider the following:

Capability Considerations

Capability Decision Points

Capability Models in Use

There is a need to determine whether it is appropriate to store the data in Azure Storage.

When moving data into Azure Storage (aside from IaaS VHD files), it is important to examine and determine the suitability of placing that data into Azure. There are three primary factors to examine to determine if data should move to Azure Storage:

  • Access frequency – How often is the data accessed? Data that is accessed on a frequent or regular basis may not be a good candidate to store in the cloud. Data that is accessed less frequently, such as archive data, is better suited for the cloud.
  • Active data vs. passive data – How much data latency is tolerable? When data that is stored in the cloud needs to be accessed, additional latency should be expected when reading or writing that data. Applications or services that cannot tolerate this latency should not store their data in the cloud. For example, do not install SQL Server on-premises and store its databases in cloud storage.
  • Privacy requirements – What kind of data is permitted in cloud services for your organization? Depending on the country, industry, and customer you are working with, there may be unique data privacy rules that you must take into consideration. These data privacy rules may govern what type of data is allowed to be stored in the cloud. For a good reference, see International Privacy Laws.

Suitability is something that is unique to every customer. It is important to first understand the data privacy laws that may apply to the customer's country, industry, and any regulatory bodies that may govern it.

From there, begin to understand the sensitivity of the data that may be stored. Then look at the technical aspects of access frequency and active vs. passive data.

It doesn't make sense to look at the technical aspects of the data before looking at the legal aspects of storing the data in public cloud storage.

Storage Monitoring

Monitoring

Monitoring your Azure Storage environment is as important as monitoring your on-premises storage environment. Within your storage service, the following areas need to be monitored: service health, availability, performance, and capacity.

Monitoring Service Health

You can use the Azure portal to view the health of the storage service and other Azure services in all Azure regions. This is where you can see if there are any issues outside of your control that may be affecting your storage service.

Monitoring Availability

You should monitor the availability of the storage services in your storage account by monitoring the value in the Availability column in the hourly or minute metrics tables. The availability of your storage should be at 100%. If not, you need to identify what is causing degradation in your storage.

Monitoring Performance

There are multiple areas in your storage services that you should monitor for performance trends. Some of the key areas to monitor include AverageE2ELatency, TotalIngress, and TotalEgress.

Additionally, it is important to consider monitoring for storage account throttling. Throttling is the mechanism that Azure uses to limit the IOPS allocated to a given Azure storage account (currently 20,000 IOPS). When this amount is exceeded, Azure implements throttling to ensure that limitations in the service are preserved.

Although it is somewhat unlikely in well-planned Azure environments, monitoring of the Throttling Error and Throttling Error Percentage metrics is an effective mechanism to identify when throttling events occur in the service. This operation is outlined in the following article: How to Monitor for Storage Account Throttling.

Optional: You should continuously monitor your Azure applications to ensure they are healthy and performing as expected.

Alerting

You can configure alerting for different services, including Azure Storage. When you configure alerting, you have the option of having those alerts emailed to the Co-administrators for the subscription.

An alert rule enables you to monitor an Azure service based on a metric value that is set by your organization. When the threshold value that is configured for a specific metric in a rule is reached, the alert rule becomes active and registers an alert. This alert is then logged in the system.

Diagnosing Azure Storage Issues

When there are problems with your storage, there are a number of ways that you may become aware of these issues. These include:

  • A major failure that causes the application to crash or to stop working.
  • Significant changes from baseline values in the metrics you are monitoring as described in the previous section.
  • Reports from users of your application that some particular operation didn't complete as expected or that some feature is not working.
  • Errors generated within your application that appear in log files or through some other notification method.

Typically, issues related to Azure Storage services fall into one of the following four broad categories:

  • Your application has a performance issue – This problem is reported by your users or revealed by changes in the performance metrics.
  • There is a problem with the Azure Storage infrastructure in one or more regions.
  • Your application is encountering an error - This problem is reported by your users or revealed by an increase in one of the error count metrics you monitor.
  • Storage emulator-related issues - During development and test you may be using the local storage emulator, and you may encounter some issues that relate specifically to usage of the storage emulator.

Troubleshooting

There are some common issues that you may encounter from your Azure Storage services that you will need to troubleshoot. These issues include:

  • Performance of the storage services
  • Availability of the storage services

To troubleshoot applications that use Azure Storage, you can use a combination of tools to determine when an issue has occurred and what the cause of the problem may be. These tools include:

  • Azure Storage Analytics provides metrics and logging for Azure Storage.
  • Storage metrics tracks transaction metrics and capacity metrics for your storage account. By using metrics, you can determine how your application is performing according to a variety of different measures.
  • Storage logging logs each request to the Azure Storage services to a server-side log. The log tracks detailed data for each request, including the operation performed, the status of the operation, and latency information.
  • The Azure Management portal allows you to configure metrics and logging for your storage accounts. You can also view charts and graphs that show how your application is performing over time, and configure alerts in the portal to notify you if your application performs differently than expected for a specified metric.
  • Server logs for Azure Storage are stored as blobs, so you can use AzCopy to copy the log blobs to a local directory for analysis using Microsoft Message Analyzer (see the sketch after this list).
  • Microsoft Message Analyzer is a tool that consumes log files and displays log data in a visual format that makes it easy to filter, search, and group log data into useful sets that you can use to analyze errors and performance issues.
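
For example, the server-side logs are written to a container named $logs, and they can be pulled down with the Azure PowerShell storage cmdlets before analysis. The following is a minimal sketch; the account details and local path are illustrative, and the destination folder is assumed to exist:

# Build a context for the storage account that holds the logs
# (the account name and key are placeholders).
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account key>"

# Enumerate the log blobs and copy each one to a local folder for analysis.
Get-AzureStorageBlob -Container '$logs' -Context $ctx |
    Get-AzureStorageBlobContent -Destination "C:\StorageLogs" -Context $ctx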

Azure Portal Options

The Storage Metrics feature is available in the Azure portal to help you monitor your storage performance. Storage Metrics can be thought of as an equivalent to Windows Performance Monitor counters in the Microsoft Azure service.

A comprehensive set of metrics (counters) makes it possible to view data from services, such as the percentage of successful or failed service requests and the service's availability. The following image shows the monitoring page in the Azure portal where you can view metrics such as total requests, success percentage (Blob), success percentage (Table), and availability.


System Center Operations Manager Management Packs

Microsoft System Center Operations Manager allows for monitoring Azure Storage by utilizing Management Packs. The management pack for Microsoft Azure enables monitoring the availability and performance of Azure Fabric resources that are running on Microsoft Azure.

The management pack runs on a specified server pool and then uses various Microsoft Azure APIs to remotely discover and collect instrumentation information about a specified Microsoft Azure resource, such as a cloud service, storage, or virtual machines.

Feature References

Monitor a Storage Account in the Azure Management Portal

http://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/

Monitor, diagnose, and troubleshoot Microsoft Azure Storage

http://azure.microsoft.com/en-us/documentation/articles/storage-monitoring-diagnosing-troubleshooting/

End-to-End Troubleshooting using Azure Storage Metrics and Logging, AzCopy, and Message Analyzer

http://azure.microsoft.com/en-us/documentation/articles/storage-e2e-troubleshooting

Storage Analytics

http://azure.microsoft.com/en-us/documentation/articles/storage-analytics

How to: Receive Alert Notifications and Manage Alert Rules in Azure

https://msdn.microsoft.com/en-us/library/azure/dn306638.aspx

Understanding Monitoring Alerts and Notifications in Azure

https://msdn.microsoft.com/en-us/library/azure/dn306639.aspx

Azure Storage Analytics Metrics Management Pack

http://blogs.technet.com/b/omx/archive/2014/08/15/azure-storage-analytics-metrics-management-pack.aspx

System Center Management Pack for Microsoft Azure

http://www.microsoft.com/en-us/download/details.aspx?id=38414

How to Monitor for Storage Account Throttling

http://blogs.msdn.com/b/mast/archive/2014/08/02/how-to-monitor-for-storage-account-throttling.aspx

Mandatory:

  • Storage Analytics is not enabled by default so you need to enable this feature for any services that you want to monitor.
  • You can create up to 10 alert rules for each Azure subscription. If you reach the maximum number of allowable rules, you must remove an existing rule before you can create another.
  • The Azure File service does not currently support Storage Analytics.
  • Currently, capacity metrics are only available for the Blob service.

Recommended:

  • Consider costs when you select the metrics. There are transaction and egress costs associated with refreshing monitoring displays. Additional costs are associated with examining monitoring data in the Management portal.
  • As a user of Azure Storage services, you should continuously monitor the services your application uses for any unexpected changes in behavior (such as slower than usual response times). Also use logging to collect more detailed data and to analyze a problem in depth. The diagnostics information you obtain from monitoring and logging will help you determine the root cause of the issue your application encountered.

Optional:

  • It is not possible to set minute metrics by using the Azure portal. However, you can set minute metrics programmatically by using Azure PowerShell (see the sketch after this list), or via the Azure Preview portal. We recommend that you set minute metrics for the purposes of testing and for investigating performance issues with your application. Note that the Azure portal cannot display minute metrics, only hourly metrics.
  • The Management Pack for Microsoft Azure does not provide any monitoring functionality immediately after you import it. For each Microsoft Azure subscription that contains Azure resources you want to monitor, you must configure discovery and monitoring. First use the Microsoft Azure wizard in the administration section of the Operations console, and then use the Microsoft Azure Monitoring template in the authoring section of the Operations console.
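
For example, minute metrics for the Blob service of a storage account might be enabled as follows. This is a minimal sketch; the account details and retention period are illustrative:

# Build a context for the target storage account (the name and key are placeholders).
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account key>"

# Enable minute metrics at the service and API level, retained for 3 days.
Set-AzureStorageServiceMetricsProperty -ServiceType Blob -MetricsType Minute `
    -MetricsLevel ServiceAndApi -RetentionDays 3 -Context $ctx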

Design Guidance

When you design storage monitoring, consider the following:

Capability Considerations

Capability Decision Points

Capability Models in Use

Storage monitoring and analytics are not enabled by default. Enabling monitoring provides access to:

  • Metrics - Service health, capacity, availability and performance
  • Logs – Detailed logs of every operation to support troubleshooting

Because storage monitoring and analytics are not enabled by default, the decision is whether to enable them, based on why they are needed and how they will be used.

It is possible to log not only storage metrics and performance information, but also authentication requests, anonymous requests, transaction metrics, and capacity metrics.

In most models where organizations are starting their initial use of Azure, it is wise to enable storage monitoring and analytics to observe the data that is available during the collection process. In general, we recommend that you enable storage analytics at least:

  • During testing to establish baseline metrics.
  • In production to assist with troubleshooting.

Note that analytics and monitoring can be enabled or disabled at any time.
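
For example, Storage Analytics can be enabled per service with the Azure PowerShell storage cmdlets. The following is a minimal sketch; the account details, logged operations, and retention periods are illustrative:

# Build a context for the target storage account (the name and key are placeholders).
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account key>"

# Enable hourly metrics (service and API level) for the Blob service, retained for 7 days.
Set-AzureStorageServiceMetricsProperty -ServiceType Blob -MetricsType Hour `
    -MetricsLevel ServiceAndApi -RetentionDays 7 -Context $ctx

# Enable logging of read, write, and delete requests, retained for 7 days.
Set-AzureStorageServiceLoggingProperty -ServiceType Blob -LoggingOperations Read,Write,Delete `
    -RetentionDays 7 -Context $ctx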

All metrics data is written by the services of a storage account. As a result, each Write operation performed by Storage Analytics is billable. The amount of storage used by metrics data is also billable.

Every request made to a storage account is billable or non-billable. Storage Analytics logs each individual request made to a service, including a status message that indicates how the request was handled.

Similarly, Storage Analytics stores metrics for a service and the API operations of that service, including the percentages and count of certain status messages.

Together, these features can help you analyze your billable requests, make improvements on your application, and diagnose issues with requests to your services.

When looking at Storage Analytics data, you can use the tables in the Storage Analytics Logged Operations and Status Messages areas to determine what requests are billable.

Then you can compare your logs and metrics data with the status messages to see if you were charged for a particular request.

You can also use the tables in this area to investigate availability for a storage service or individual API operation.

Cloud Integrated Storage

A Cloud Integrated Storage solution is one that uses a combination of on-premises storage and cloud storage. The purpose of cloud integrated storage is to take advantage of the lower cost of cloud storage (as compared to traditional on-premises SAN storage), but still connect to and manage it similarly to how you would treat on-premises storage.

StorSimple

Microsoft Azure StorSimple is a cloud integrated storage solution that manages storage tasks between on-premises devices and Azure cloud storage. Azure StorSimple is designed to reduce storage costs, simplify storage management, improve disaster recovery capability and efficiency, and provide data mobility. StorSimple has several components:

  • The Azure StorSimple device is an on-premises hybrid storage appliance that provides primary storage and iSCSI access to data stored elsewhere. It manages communication with cloud storage, and helps ensure the security and confidentiality of all data that is stored in the Microsoft Azure StorSimple system.
  • You can use Azure StorSimple to create a virtual device that replicates the architecture and capabilities of the actual hybrid storage device. The StorSimple Virtual Appliance runs on a single node in an Azure virtual machine. (A virtual device can only be created on an Azure virtual machine. You cannot create one on a StorSimple device or an on-premises server.)
  • You can use it to back up and clone data from your hosts. Note that data stored in Azure Storage by StorSimple is not directly accessible with the standard storage tools outlined previously. This is because it is encrypted and protected by the StorSimple devices that use it. Only StorSimple 8000 series devices can take advantage of the Microsoft StorSimple Virtual Appliance to provide on-premises-like access to data stored in Azure.
  • Azure StorSimple provides a web-based user interface (the StorSimple Manager Service) that enables you to centrally manage datacenter and cloud storage. This is built-in to the Azure portal.
  • The Azure StorSimple Snapshot Manager is an optional console that you can use to create application consistent, point-in-time, backup copies of local and cloud data.
  • The Azure StorSimple Adapter for SharePoint is an optional component that transparently extends StorSimple storage and data protection features to SharePoint server farms. The adapter works with a Remote Blob Storage (RBS) provider and the SQL Server RBS feature, allowing you to move blobs to a server that is backed up by the Microsoft Azure StorSimple system. StorSimple then stores the blob data locally or in the cloud, based on usage.

Feature References

StorSimple MSDN Reference Site

https://msdn.microsoft.com/en-us/library/azure/dn772442.aspx

StorSimple 8000 Series Chalktalk

https://www.youtube.com/watch?v=4MhJT5xrvQw

StorSimple Hybrid Cloud Storage Features and Benefits

http://www.microsoft.com/en-us/server-cloud/products/storsimple/overview.aspx

Hybrid Cloud Storage

http://www.controlyourstorage.com/

Mandatory:

  • StorSimple Update 1 provides a migration toolkit to enable customers to upgrade from a StorSimple 5000/7000 Series device to an 8000 Series device to take advantage of its improved feature offerings while maintaining their data in-place.
  • There are user name and password length requirements for the Snapshot Manager user name and password. Be sure to reference the latest device series documentation before you select a user name or password.
  • There are user name and password length requirements for the CHAP user name and password. Be sure to reference the latest device series documentation before selecting a user name or password.
  • The service data encryption key is generated only on the first device registered with the service. All subsequent devices that are registered with the service must use the same service data encryption key. It is very important to save this in a secure location. A copy of the service data encryption key should be stored in such a way that it can be accessed by an authorized person and can be easily communicated to the device administrator.

Recommended:

  • When registering the StorSimple device with the StorSimple manager, you are provided with a registration key. The service registration key is a long key that contains 100+ characters. We recommend that you copy the key and save it in a text file in a secure location so that you can use it to authorize additional devices as necessary.
  • When using the StorSimple Virtual Appliance, make sure that the virtual network is in the same region as the cloud storage accounts that you are going to be using with the virtual device.

Optional: Remote management is turned off by default. You can use the StorSimple Manager service to enable it. As a security best practice, remote access should be enabled only during the time period that it is actually needed.

Design Guidance

Configuration best practices are implemented during deployment and are specific to the way StorSimple is deployed. The following configuration recommendations are provided.

Area

Recommendation

High availability

Connect the power supply cables from both the power control modules. Connect each power supply cable to different power circuits in the datacenter. This prevents loss of power in the event that a single power circuit fails.

Connect all four network ports from each of the controllers to the respective network subnets. During the initial configuration of StorSimple, designate at least one port for use with the management interface and at least two ports for use with the iSCSI data interface. If eight network ports are not available, fewer ports can be used, albeit at a lowered level of redundancy.

When using multiple VLANs, disable iSCSI for the management (MGMT) interface.

When initially configuring the network interfaces for the appliance and completing the setup, you must configure a secondary DNS server. This provides high availability if the primary DNS server goes down.

Microsoft Multipath I/O (MPIO) for iSCSI should be installed and configured on the host to leverage the multiple data paths connected to StorSimple. Each iSCSI host should have more than one network adapter and be connected to more than one network switch for physical data path redundancy. The appropriate MPIO policies (for example, performance, round robin, or failover) should be selected based on the desired behavior across the multiple network paths.

Configure StorSimple with 2-node file server cluster configurations. By removing single points of failure and building redundancy on the host, the entire solution becomes highly available.

Use Continuously Available (CA) shares, which became available in Windows Server 2012 (SMB 3.0) for high availability during failover of the storage controllers.

Access control

Always associate at least one Access Control Record (ACR) with a volume.

When assigning more than one ACR to a volume, care should be taken to ensure the combinations of the ACRs do not expose the volume in a way where it can be concurrently accessed by more than one non-clustered host. StorSimple is designed to display a pop-up warning message if multiple ACRs, which together expose the volume to more than one host, are assigned to a volume.

Storage accounts

Keep primary volume data at 30-40% and backup data at 60-70% of the total capacity for the storage account. This works out to be about 60-70 TB of primary data and 130-140 TB of backup data per storage account.

When calculating the amount of primary data, it is important to consider the change rate or growth rate of primary data. Factor that into the 60-70 TB limit for primary data.

Customers should create storage accounts with Geo-Redundant Storage (GRS) enabled. This option is included within the pricing agreement.

Customers should also choose storage accounts within the same subscription as the StorSimple Manager resource they have created. This simplifies workflows behind the scenes and provides a more streamlined experience.

Volume size

To minimize the processing overhead, create the volume size as close to the anticipated usage as possible. The volume size can be increased later to accommodate any increase in usage beyond what was originally anticipated if the host file system allows it.

If the dataset within a volume is no longer required, do not reformat the volume. Instead, delete the original volume and create a new volume. This prevents unnecessary overhead from the UNMAP feature on the existing volume.

Create a volume size such that 80% of the volume is used for currently expected data consumption and 20% is available for future growth. For example, if you have a data set of 800 GB, create a volume that is 1 TB in size. As the volume approaches fullness, the volume size can be increased. We recommend that you increase the volume capacity when it reaches 95% full.

When configuring volumes that will be used on the StorSimple Virtual Appliance (by failover or cloning), it is important to consider the size of those volumes. The StorSimple Virtual Appliance has a maximum capacity of 30 TB. Therefore, all volumes that fail over to or are cloned on the StorSimple Virtual Appliance cannot together exceed the 30 TB maximum capacity.

This capacity is the actual provisioned capacity. Therefore, volumes that are provisioned as 10 TB, but only have 2 TB of data on them, still account for 10 TB of the 30 TB maximum capacity on the StorSimple Virtual Appliance.

We recommend that you size volumes so that a single volume does not exceed 30 TB or that all the volumes within the volume container do not exceed 30 TB. Volume containers are the unit of failover and all volumes within a volume container will fail over to a StorSimple Virtual Appliance during the failover process.

For volumes that will be used with the StorSimple Virtual Appliance, it is best to consider the use case. For disaster recovery use cases, smaller volumes of 2-4 TB are recommended. This helps meet expected recovery point objectives (RPOs) and recovery time objectives (RTOs).

For development and test use cases, larger volumes can be used. However, we recommend that you not exceed a maximum volume size of 16 TB for volumes that will be cloned on the StorSimple Virtual Appliance.

Volume usage type

Select the correct volume usage type: primary or archive. Select primary volumes for data that is expected to be used, and select archive volumes for data that is not expected to be used or is already classified as archival data.

Although the tiering algorithm does not change for primary versus archive volumes, the block size is adjusted to expedite data movement to the cloud. The chunk size for primary volumes is 64 KB, and the chunk size for archive volumes is 512 KB. With larger chunk sizes for archive volumes, when a chunk tiers to the cloud, more data is transferred to the cloud as compared to primary volumes with smaller chunk sizes.

NTFS allocation unit size

For all StorSimple volumes, regardless of the volume type, format with a 64 KB allocation unit size.

Always select quick formatting when formatting the volume. Windows Server 2012 and later uses GPT by default.

Do not use spanned volumes. They are not supported by StorSimple.

Index search and virus scan

Configure the index search or virus scan application to always perform incremental operations. An incremental operation means that only new data, which is most likely still on the local tiers of StorSimple, is operated on by way of the index search or virus scan. The data that has been tiered to the cloud is not accessed during the incremental operation.

Disable any automatically configured full-scan operations.

Ensure the correct search filters and settings are configured so that only the intended types of files are scanned. For example, image files (JPEG, GIF, and TIFF) and engineering drawings should not be scanned during the incremental or full index rebuild.

Virtual devices

Create the StorSimple Virtual Appliance in the same geographical region as the physical appliance that hosts the primary volumes that will be ported to the StorSimple Virtual Appliance. Specifically, the backups of the volumes need to have a storage account that is in the same geographical region as the storage account for the StorSimple Virtual Appliance. If the StorSimple is deployed with a storage account in a different geographical region than the primary volumes, a customer may see a degradation in performance.

Provision the StorSimple Virtual Appliance before it will actually be needed—especially for disaster recovery use cases. The StorSimple Virtual Appliance can be on standby without incurring additional costs. By having the StorSimple Virtual Appliance provisioned, the customer saves time when the actual workflow for porting data to the StorSimple Virtual Appliance needs to occur.

Design Guidance

Operational best practices are part of everyday (or ongoing) operations. The following operational guidance is provided:

Area

Recommendation

Network connectivity

The Internet bandwidth available to StorSimple should be at least 20 Mbps at all times. This 20 Mbps bandwidth should be dedicated and not shared with any other applications. The optimal bandwidth for a customer deployment is dependent on the RTO.

The maximum Internet network latency between the on-premises appliance and Azure should not exceed a certain limit. The main variables that impact network latency are the distance from the datacenter and the internal network architecture, including the number of routers. Test network latency with the solution and verify that it does not exceed any maximum latency thresholds.

Configure the Quality of Service (QoS) templates for StorSimple to enable throttling the network throughput by the appliance and at different times of the day. QoS templates can be used very effectively in conjunction with backup schedules to leverage additional network bandwidth for cloud operations during off-peak hours.

Ensure connectivity to the Internet is available at all times. This recommendation also applies to use cases where a very small amount of data exists on StorSimple and the data has not been tiered to the cloud storage.

Although StorSimple can easily buffer temporary glitches in network connectivity, prolonged outages result in the storage becoming unavailable, and iSCSI error messages are returned to the application. StorSimple has a built-in alert mechanism that can be configured. It sends an alert message and displays the alert on the Web UI.

If the network infrastructure supports jumbo frames, they can be configured on the data ports that transmit the iSCSI traffic between the host servers and StorSimple. The jumbo frames should be enabled end-to-end for the network components. This includes all components that interface on the network from StorSimple, and then to the host.

The ports on StorSimple are configured to automatically accept jumbo frames if the network and the iSCSI initiators are set up to support them. Jumbo frames should not be configured on a network port that is connected to the Internet or a network adapter that is cloud-enabled on StorSimple.

StorSimple Virtual Appliance provisioning and sizing

Provisioning a Virtual Appliance may take several hours. Therefore, it is best to provision the Virtual Appliance and turn it off until it is needed.

Migrating data

Set up cloud snapshots that are taken frequently during data migration. When a cloud snapshot is taken, all data in the volume is copied to the cloud. This improves the overall performance of StorSimple when normal operations resume.

Before migrating the data, use a file classification tool like Microsoft File Server Resource Manager to gain insight on the amount, type, and location of the data on the original storage system—specifically the data that is not being accessed and should be tiered to the cloud (this data is generally referred to as cold data). Contact StorSimple Support for additional assistance with developing a data migration strategy.

Migrate the infrequently accessed (cold) data first, if possible, and then the frequently accessed (hot) data. Migration should be done during periods where there is minimal activity on the systems.

The time it takes to migrate data from the older storage system to StorSimple is dependent on the amount of data and any shared resources. In general terms, during a period of migration, a user can expect to see Write throughput between 20 and 100 MB/s depending on how full the solution is.

For a solution that has data on the solid-state drive (SSD) tier only, expected performance would be around 100 MB/s (up to 200 MB/s for deployments with MPIO).

For a solution that has SSD and hard disk drive (HDD) tiers that are relatively full and pushing data to the cloud, expected performance would be up to 20 MB/s, depending on Internet bandwidth available. Deployments configured with MPIO can expect to see higher Write throughput.

Data security and encryption

The backup XML configuration file should be stored outside of Azure. This can be on a password-protected server or USB drive.

Use different storage accounts for different departments, projects, roles, and so on. There can be 64 storage accounts per StorSimple solution.

Monitoring, reporting, and system health

For ease of debugging, have a serial console connection from both controllers to a server that can be reached remotely.

Regularly check the health of the system to ensure that all hardware is functioning as expected. The health of the system components can be viewed from the StorSimple Manager.

Enable email alerts to be sent when there is a change in the system. This allows prompt notification to any changes that may impact or degrade the system.

StorSimple is architected with high availability in mind and to withstand a failure of a single hardware component (or in some cases, more than one component). We recommend that you correct any failures as soon as possible. Email alerts will expedite this process.

Monitor the system performance and ensure that there is consistent Read performance. Tiering data to the cloud is best for data that is not frequently accessed. However, if the appliance gets to a point where the working set of data exceeds the available on-premises storage capacity, there will be an impact to performance. Contact StorSimple technical support if Read performance becomes inconsistent or falls below an acceptable threshold.

Data protection

Implement a data protection plan with three tiers of backup in mind:

  • Short term retention that provides high performance for backup and restore.
  • Medium term retention that provides medium performance for backup and restore.
  • Long term retention that provides slow performance for backup and restore.

Carefully consider the required RPO and RTO when selecting a combination of short, medium, and long term schedule policies to back up the data.

The backup retention period should always be set to the desired value to ensure that old backup copies are automatically deleted and the maximum number of backups per volume (256) is not exhausted.

If VSS-based application-level consistency is required for the application backup, the backup should be scheduled by using the StorSimple Snapshot Manager. If only crash consistency is required for the backup, the backup can be scheduled by using the StorSimple Snapshot Manager or the StorSimple Manager.

When using the Azure StorSimple Snapshot Manager, make sure that the host server on which the StorSimple Snapshot Manager is running has sufficient processing power to initiate the backup and restore operation.

Microsoft Azure Compute: IaaS

Microsoft Azure provides a comprehensive platform of Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) capabilities and services that can support a wide range of customer applications and services. Cloud infrastructures can consist of on-premises customer-, partner-, or public-hosted cloud computing infrastructures that provide a range of capabilities for organizations to consume natively or across these models in a hybrid capacity.

Focusing on the public cloud, the primary difference between IaaS and PaaS constructs is the division of responsibility for common operational functions between the provider and the consumer. In Microsoft Azure, the Microsoft Corporation acts as the cloud provider, and the organization acts as one of many cloud consumers. This relationship is outlined at a high level in the following diagram.

The focus of this section is on the Microsoft IaaS capabilities, which in large part consist of storage, networking, backup and recovery, large scale computing, and traditional virtual machine deployments. The primary building block for IaaS solutions deployed on Microsoft Azure is Virtual Machines, and this section explains how Virtual Machines can be used to build solutions for your environment.

Virtual Machines

At a high level, Azure Virtual Machines provides a traditional virtualized server infrastructure to deploy a given application or service. Typically, it includes a compute instance consisting of virtual CPUs (cores), virtual memory, a persistent operating disk, potential persistent data disks, and internal/external networking to allow the system to interact with other aspects of a customer's environment, application, or solution.

Azure Virtual Machines has several considerations as part of its design, including size, storage, placement, source images, and additional functionality, which can be provided through Microsoft and third-party add-ins. This section explains each of these areas to provide an overview, guidance, and potential design decisions that are required when implementing Azure Virtual Machines.

Feature References

Azure Virtual Machines documentation

http://azure.microsoft.com/en-us/documentation/services/virtual-machines/

Virtual Machine Sizes and Tiers

When deploying applications and solutions using Microsoft Azure Virtual Machines, there are various sizing configurations that are available to organizations. Virtual Machines are available in different sizing series (A, D, DS, and G series as examples). Within each sizing series there are incremental sizes (A0, A1, and so on) and different tiers (Standard and Basic).

The sizing and tiering options provide customers with a consistent set of compute sizing options, which expand as time goes on. From a sizing perspective, each sizing series represents various properties, such as:

  • Number of CPUs
  • Memory allocated to each virtual machine
  • Temporary local storage
  • Allocated bandwidth for the virtual machines
  • Maximum data disks

As outlined earlier, some virtual machine series include the concept of Basic and Standard tiers. A Basic tier virtual machine is available only on A0-A4 instances, and a Standard tier virtual machine is available on all size instances. Virtual machines that are available in the Basic tier are provided at a reduced cost and carry slightly less functionality than those offered at the Standard tier. This includes the following areas:

Capability Consideration

Capability Decision Points

CPU

Standard tier virtual machines are expected to have slightly better CPU performance than Basic tier virtual machines

Disk

Data disk IOPS for Basic tier virtual machines is 300 IOPS, which is slightly lower than Standard tier virtual machines (which have 500 IOPS data disks).

Features

Basic tier virtual machines do not support features such as load balancing or auto-scaling.

The following table is provided to illustrate a summary of key decision points when using Basic tier virtual machines:

Size

Available CPU Cores

Available Memory

Available Disk Sizes

Maximum Data Disks

Maximum IOPS

Basic_A0 –
Basic_A4

1 – 8

768 MB –

14 GB

Operating system = 1023 GB

Temporary = 20 - 240 GB

1 - 16

300 IOPS per disk

In comparison, Standard tier virtual machines are available for all compute sizes.

Capability Consideration

Capability Decision Points

CPU

Standard tier virtual machines have better CPU performance than Basic tier virtual machines.

Disk

Data disk IOPS for Standard tier virtual machines is 500 IOPS per disk. (This is higher than Basic tier virtual machines, which have 300 IOPS data disks.) If the DS series is selected, IOPS start at 3200.

Availability

Standard tier virtual machines are available on all size instances.

A-Series features

  • Standard tier virtual machines include load balancing and auto-scaling.
  • For A8, A9, A10, and A11 instances, hardware is designed and optimized for compute and network intensive applications including high-performance computing (HPC) cluster applications, modeling, and simulations.
  • A8 and A9 instances have the ability to communicate over a low-latency, high-throughput network in Azure, which is based on remote direct memory access (RDMA) technology. This boosts performance for parallel Message Passing Interface (MPI) applications. (RDMA access is currently supported only for cloud services and Windows Server-based virtual machines.)
  • A10 and A11 instances are designed for HPC applications that do not require constant and low-latency communication between nodes (also known as parametric or embarrassingly parallel applications). The A10 and A11 instances have the same performance optimizations and specifications as the A8 and A9 instances. However, they do not include access to the RDMA network in Azure.

D-Series features

  • Standard tier virtual machines include load balancing and auto-scaling.
  • D-series virtual machines are designed to run applications that demand higher compute power and temporary disk performance. D-series virtual machines provide faster processors, a higher memory-to-core ratio, and a solid-state drive (SSD) for the temporary disk.

DS-Series features

  • Standard tier virtual machines include load balancing and auto-scaling.
  • DS-series virtual machines can use premium storage, which provides high-performance and low-latency storage for I/O intensive workloads. It uses solid-state drives (SSDs) to host a virtual machine's disks and offers a local SSD disk cache. Currently, premium storage is only available in certain regions.
  • The maximum input/output operations per second (IOPS) and throughput (bandwidth) possible with a DS series virtual machine is affected by the size of the disk.

G-Series features

  • Standard tier virtual machines include load balancing and auto-scaling.
  • Leverages local SSD disks to provide the highest performance virtual machine series that is available in Azure.

A summary of the capabilities of each virtual machine series is provided in the following table:

Size

Available CPU Cores

Available Memory

Available Disk Sizes

Maximum Data Disks

Maximum IOPS

Basic_A0 –
Basic_A4

1 – 8

768 MB –

14 GB

Operating system = 1023 GB

Temporary = 20-240 GB

1 - 16

300 IOPS per disk

Standard_A0 – Standard_A11

(Includes compute intensive A8-11)

1 - 16

768 MB - 112 GB

Operating system = 1023 GB

Temporary = 20-382 GB

1 - 16

500 IOPS per disk

Standard_D1-D4
Standard_D11-D14

(High memory)

1 - 16

3.5 GB – 112 GB

Operating system = 1023 GB

Temporary (SSD) =50 – 800 GB

2 - 32

500 IOPS per disk

Standard_DS1-DS4
Standard_DS11-DS14

(Premium storage)

1 - 16

3.5 – 112 GB

Operating system = 1023 GB

Local SSD disk = 7 GB – 112 GB

2 - 32

43 – 576 GB cache size

3200-50000 IOPS total

Standard_G1 – G5

(High performance)

2 - 32

28 GB – 448 GB

Operating system = 1023 GB

Local SSD disk = 384 – 6,144 GB

4 - 64

500 IOPS per disk

These sizes and capabilities reflect the current release of Azure Virtual Machines, and they might expand over time. For a complete list of size tables to help you configure your virtual machines, please see Sizes for Virtual Machines.

Design Guidance

When you design solutions for using virtual machines, consider the following:

Capability Considerations

Capability Decision Points

Deployment order

If you intend to deploy an application that may require compute-intensive resources, we recommend that you provision the first virtual machine in a cloud service by using the largest virtual machine size (such as Standard_G5), and then scale it down to a more appropriate size. The reason is that the virtual machines will be placed on the clusters that have the faster processors. It also makes scaling easier, and it is more efficient to combine resources.

Supportability

The following are not supported in a virtual machine on Microsoft Azure:

  • Multiple IP addresses
  • 32-bit operating systems

Virtual Machine Storage

With respect to IaaS solutions, images and disks that are used by Azure virtual machines are stored within virtual hard disks (VHDs). Azure virtual machines are compute instances that have VHDs attached. The VHDs provide persistent and temporary storage to the underlying operating system within the virtual machine.

Like other components of Azure, virtual machines require a storage account to store virtual machine data, which is in the form of VHDs. The VHD specification has several formats, including fixed, dynamic, and differencing. However, Azure supports only the fixed VHD format. VHDs are stored as page blobs in the target storage account, and they can be accessed through automation, the Azure API, or by the virtual machines themselves.
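
For example, a new empty data disk (a fixed-format VHD stored as a page blob) can be created and attached to an existing virtual machine with Azure PowerShell. The following is a minimal sketch; the service, virtual machine, and disk label names are illustrative:

# Retrieve the virtual machine, attach a new 100 GB data disk at LUN 0,
# and apply the change. The VHD is created as a page blob in the storage
# account that backs the virtual machine.
Get-AzureVM -ServiceName "contoso-svc" -Name "contoso-vm01" |
    Add-AzureDataDisk -CreateNew -DiskSizeInGB 100 -DiskLabel "data01" -LUN 0 |
    Update-AzureVM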

Feature References

About VHDs in Azure

https://msdn.microsoft.com/en-us/library/azure/dn790344.aspx

Exploring Azure Drives, Disks, and Images

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/28/exploring-windows-azure-drives-disks-and-images.aspx

How to Attach a Data Disk to a Windows Virtual Machine

http://azure.microsoft.com/en-us/documentation/articles/storage-windows-attach-disk/

Create and upload a Windows Server VHD to Azure

http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-create-upload-vhd-windows-server/

Sizes for Cloud Services

https://azure.microsoft.com/documentation/articles/cloud-services-sizes-specs/

Sizes for Virtual Machines

https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-size-specs/

Azure Subscription and Service Limits, Quotas, and Constraints

http://azure.microsoft.com/en-us/documentation/articles/azure-subscription-service-limits/

Basic Tier Virtual Machines

http://azure.microsoft.com/blog/2014/04/28/basic-tier-virtual-machines-2/

Azure Storage Scalability and Performance Targets

http://azure.microsoft.com/en-us/documentation/articles/storage-scalability-targets/

Performance Best Practices for SQL Server in Azure Virtual Machines – I/O Performance Considerations

https://msdn.microsoft.com/en-us/library/azure/dn133149.aspx#io

Design Guidance

The following general considerations are provided when planning storage accounts for virtual machines:

Capability Considerations

Capability Decision Points

Storage considerations for virtual machines

  • Keep the storage account and virtual machines in the same region.
  • Disable Azure geo-replication on the storage account when the workload itself provides availability or if multiple disks are spanned together.
  • Avoid using the operating system or temporary disks for database storage or logging.
  • Avoid using Azure data disk caching options (caching policy = None) when it is not supported by the workload or solution running in the virtual machine.
  • Stripe multiple Azure data disks to get increased IO throughput.
  • Format with documented allocation sizes.
  • Separate data and log file I/O paths to obtain dedicated IOPs for data and logs.
  • Move all data-related files to data disks, including system databases.
  • Back up directly to Blob storage when possible.

Scalability and storage throttling

Microsoft provides general guidance about the number of virtual hard disks that should reside in a storage account. The following guidance is provided with the assumption that at any point all virtual machines could consume the available IOPS for all assigned disks, which could result in storage account throttling and have an adverse impact on the workload running within the virtual machine.

The following guidance is provided:

  • For virtual machines in the Basic tier, do not place more than 66 highly used VHDs in a storage account to avoid the 20,000 total request rate limit (20,000/300).
  • For virtual machines in the Standard tier, do not place more than 40 highly used VHDs in a storage account (20,000/500).

For more information, please see Sizes for Cloud Services and Sizes for Virtual Machines.

Virtual Machine Placement

The placement of virtual machines within a given Azure subscription is critical in multiple ways. It is important to consider the consumers of the services that are associated with each virtual machine. It is equally important to understand the relationships each virtual machine has between the Azure resources these compute instances consume. Azure provides several constructs that support effectively placing virtual machines and associated resources within the Azure infrastructure.

Affinity groups tell the Fabric Controller to logically group dependent items together, such as the compute and storage of a given virtual machine. When the Fabric Controller is searching for the best-suited container, it chooses a location where it can deploy these two elements in the same cluster, thereby reducing latency and increasing performance.

Affinity groups provide the following:

  • Aggregation – Affinity groups aggregate items, such as compute and storage services, and provide the Azure Fabric Controller the information needed for them to be kept in the same Azure datacenter and cluster.
  • Reduce latency - Affinity groups provide information to the Fabric Controller about resources that should be kept together, and the result is reduced latency between components.
  • Lower costs – When servers are assigned to affinity groups, these services are placed in the same cluster, reducing intercommunication between resources and potentially reducing costs associated with interconnectivity.
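
For example, an affinity group can be created and then referenced when creating a storage account, so that related resources are placed together. The following is a minimal sketch; the names and location are illustrative:

# Create the affinity group in the desired region.
New-AzureAffinityGroup -Name "contoso-east" -Location "East US"

# Create a storage account inside the affinity group so that compute and
# storage placed in it land in the same cluster where possible.
New-AzureStorageAccount -StorageAccountName "contosoeastdata" -AffinityGroup "contoso-east"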

A Resource Group is a unit of management for operations such as deployments, updates, and standard lifecycle operations across a number of different Azure services, including virtual machines. A resource group provides:

  • A single grouping of resources (for example, metering, billing, or quota management)
  • Lifecycle management (deployment, update, delete, status)
  • The ability to assign administrative control (RBAC permissions)

Resource Manager enables the creation of reusable deployment templates that declaratively describe the resources that make up your application (for example, a website and a SQL Database). In essence, it provides an environment to handle the infrastructure and configuration information as code.

Resource Manager provides the following:

  • An application lifecycle container - Deploy and manage your application as you see fit.
  • A declarative solution for deployment and configuration - Deploy multiple instantiations of your application with a single click.

  • A consistent management layer - Get the same experience of deployment and management whether you work from the portal, command line, or tools in Azure.
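
For example, with the resource management mode of Azure PowerShell, a resource group can be created and a declarative template deployed into it. The following is a minimal sketch; it assumes the 0.9.x Azure PowerShell module (which uses the Switch-AzureMode cmdlet), and the group name, location, and template path are illustrative:

# Switch the session into Resource Manager mode (Azure PowerShell 0.9.x).
Switch-AzureMode -Name AzureResourceManager

# Create a resource group and deploy a template that declares the application's resources.
New-AzureResourceGroup -Name "contoso-app-rg" -Location "East US"
New-AzureResourceGroupDeployment -ResourceGroupName "contoso-app-rg" `
    -TemplateFile "C:\templates\azuredeploy.json"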

Feature References

Importance of Azure Affinity Groups

http://social.technet.microsoft.com/wiki/contents/articles/7916.importance-of-windows-azure-affinity-groups.aspx

About Regional Virtual Networks and Affinity Groups

https://azure.microsoft.com/documentation/articles/virtual-networks-migrate-to-regional-vnet/

Role-based access control in the Microsoft Azure portal

http://azure.microsoft.com/en-us/documentation/articles/role-based-access-control-configure/

Using Azure PowerShell with Azure Resource Manager

http://azure.microsoft.com/en-us/documentation/articles/powershell-azure-resource-manager/

Azure Resource Groups

http://azure.microsoft.com/en-us/documentation/articles/azure-preview-portal-using-resource-groups/

Virtual Machine Availability

It is important to tell the Azure Fabric Controller which resources must be aligned and placed near one another for performance, management, and so on. However, it is critical that Azure be informed of systems that must be placed across mutually exclusive boundaries to ensure that the availability of a given service that spans multiple virtual machines is maintained and not interrupted by planned maintenance activities within Azure.

Azure natively understands the tiers in a PaaS application; and thus, it can properly distribute them across fault and update domains. In contrast, the tiers in an IaaS application must be manually defined using Availability Sets. To meet a given SLA, availability sets are required when building IaaS solutions using Azure virtual machines.

Placing virtual machines in an availability set tells the Fabric Controller in Azure to place the virtual machines in separate fault domains. This ultimately provides redundancy for the services provided by the virtual machines when both systems are responsible for the same tier of service in an application. This is illustrated in the following diagram:

As illustrated, availability sets ensure that all instances of each tier have hardware redundancy by distributing them across fault domains, and are not taken down during an update.
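
For example, two virtual machines that serve the same application tier can be placed into an availability set when they are created. The following is a minimal sketch; the names, size, image, and credentials are illustrative placeholders:

# Build two VM configurations that share an availability set, so the Fabric
# Controller places them in separate fault and update domains.
$vm1 = New-AzureVMConfig -Name "web01" -InstanceSize "Standard_A2" `
    -ImageName "<gallery image name>" -AvailabilitySetName "web-avset" |
    Add-AzureProvisioningConfig -Windows -AdminUsername "azureadmin" -Password "<password>"

$vm2 = New-AzureVMConfig -Name "web02" -InstanceSize "Standard_A2" `
    -ImageName "<gallery image name>" -AvailabilitySetName "web-avset" |
    Add-AzureProvisioningConfig -Windows -AdminUsername "azureadmin" -Password "<password>"

# Create both virtual machines in the same cloud service.
New-AzureVM -ServiceName "contoso-web" -VMs $vm1,$vm2 -Location "East US"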

Virtual Machine Load Balancing

If the virtual machines should have traffic distributed across them, you must group the virtual machines in a cloud service, and load balance them across a specific TCP or UDP endpoint. For more information, see Load Balancing Virtual Machines.

If the virtual machines receive input from another source (such as a queuing mechanism), a load balancer is not required. The load balancer uses a basic health check to determine if traffic should be sent to the node. It is also possible to create your own probes to implement application-specific health metrics that determine if the virtual machine should receive traffic. Load balancing mechanisms and constructs within Microsoft Azure are covered in the Networking section of this document.
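
For example, a load-balanced HTTP endpoint with a health probe can be added to each virtual machine in a cloud service. The following is a minimal sketch; the service, virtual machine, and load-balanced set names are illustrative:

# Add VM "web01" to a load-balanced set on TCP port 80 with an HTTP health
# probe; repeat for each virtual machine that should receive traffic.
Get-AzureVM -ServiceName "contoso-web" -Name "web01" |
    Add-AzureEndpoint -Name "http" -Protocol tcp -LocalPort 80 -PublicPort 80 `
        -LBSetName "web-lb" -ProbeProtocol http -ProbePort 80 -ProbePath "/" |
    Update-AzureVM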

Virtual Machine Gallery Items and Images

The Azure Gallery contains a library of images that are provided by Microsoft and Microsoft partners, and they can be used to create IaaS virtual machines. Custom images that you upload to your Azure subscription are also available in the Azure Gallery. This section outlines each of the image types and how they can be utilized.

Image Families

Image families are virtual hard disks (VHDs) that are managed and supported by Azure. In some instances, these VHDs may include preinstalled software and configuration. A few examples are a SQL Database, SharePoint, and BizTalk.

The goal of the image families is to make it easier for you to deploy an application into the Azure environment. These image families are updated once a month, going as far back as two months.

Partner Images

Partner images are VHDs that were uploaded by partners so that their applications can be consumed by Azure customers. Virtual machines that are deployed by using partner images are not deployed on the same clusters as other virtual machine workloads. Example partners that provide images include Oracle and Puppet Labs.

Latest Images

Azure actively maintains the images that are part of the Azure Gallery. In some instances, you may want to get the latest image. By default, the latest image is chosen when you deploy a virtual machine by using the portal. However, if you would like to use Azure PowerShell, please use the following code snippet:

$image_name = (Get-AzureVMImage |
    Where-Object { $_.ImageFamily -eq $ImageFamily } |
    Sort-Object PublishedDate -Descending |
    Select-Object -First 1).ImageName

Customized Images

For various reasons, many customers would like to upload their own images as opposed to using the images provided by Azure. Reasons range from internal security standards to cost, especially in the scenario of licensing.

A good example is SQL Server. You may want to leverage an existing SQL Server license in the cloud as opposed to paying the additional cost for the SQL Server license that is provided by Azure. In this scenario, you may want to upload your image and have it available as a gallery item for tenants in your company to consume. To upload a customized image, see Create and upload a Windows Server VHD to Azure.
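
For example, a generalized VHD can be uploaded and registered as a gallery image with Azure PowerShell. The following is a minimal sketch; the local path, storage account URL, and image name are illustrative placeholders:

# Upload the fixed-format VHD to a page blob in the target storage account.
Add-AzureVhd -LocalFilePath "C:\vhds\sql2014-sysprepped.vhd" `
    -Destination "https://mystorageaccount.blob.core.windows.net/vhds/sql2014-sysprepped.vhd"

# Register the uploaded VHD as an operating system image for the subscription.
Add-AzureVMImage -ImageName "contoso-sql2014" -OS Windows `
    -MediaLocation "https://mystorageaccount.blob.core.windows.net/vhds/sql2014-sysprepped.vhd"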

Feature References

Virtual Machine Images in Azure

https://msdn.microsoft.com/en-us/library/azure/dn790290.aspx

Preparing SQL Image

https://msdn.microsoft.com/en-us/library/ee210664.aspx

Virtual Machine Extensions

Azure Virtual Machines extensions are built by Microsoft and trusted third-party providers to enable security, runtime, debugging, management, and other features that you can take advantage of to increase your productivity. This section describes the various features that Azure Virtual Machines extensions provide for Windows and Linux virtual machines, and it points to documentation for each operating system.

Virtual Machines extensions implement most of the critical functionality that you want to use with your virtual machines, including basic functionality such as resetting passwords and configuring Remote Desktop Protocol (RDP). Because new extensions are added all the time, the number of possible features that your virtual machines support in Azure continues to increase.

By default, several basic Virtual Machines extensions are installed when you create your virtual machine from the image gallery, including IaaSDiagnostics, BGInfo (currently available for Windows virtual machines only), and VMAccess. However, not all extensions are implemented on both Windows and Linux operating systems at any specific time. This is due to the constant flow of feature updates and new extensions.

Virtual Machines extensions provide dynamic features from Microsoft and third parties. The agent and extensions are added primarily through the Management portal, but you can also use Azure PowerShell to add and configure extensions when you create a virtual machine or for existing virtual machines.

Extensions include support for Remote Debugging in Visual Studio, System Center 2012, Microsoft Azure Diagnostics, and Docker (to name a few).
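As an illustrative sketch, the following Azure PowerShell snippet adds an extension to an existing virtual machine with the Set-AzureVMExtension cmdlet. The service name, virtual machine name, publisher, and version shown are placeholders:

# Retrieve an existing virtual machine, attach an extension, and apply
# the change. "MyService" and "MyVM" are hypothetical names.
$vm = Get-AzureVM -ServiceName "MyService" -Name "MyVM"
$vm | Set-AzureVMExtension -ExtensionName "IaaSAntimalware" `
        -Publisher "Microsoft.Azure.Security" -Version "1.*" |
    Update-AzureVM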

Recommended: Evaluate each Virtual Machines extension and define which of them will be used as a standard for all of your Azure Virtual Machines. For example, you may standardize one antivirus extension to use in addition to the Azure PowerShell DSC extension.

Connectivity and Basic Management

The Azure Virtual Machine Agent (VM Agent) is used to install, configure, manage, and run Azure Virtual Machines extensions. The following extensions are critical for enabling, re-enabling, or disabling basic connectivity with your virtual machines after they are created and running.

VM Extension Name

Feature Description

More Information

VMAccessAgent (Windows)

VMAccessForLinux (Linux)

Create, update, and reset user information and RDP and SSH connection configurations.

Windows

Linux

Deployment and Configuration Management

The Azure CustomScript extension automatically runs a specified script or set of scripts on a virtual machine after it is created and running.

Name

Custom Script Extension

Description

Execution of Windows PowerShell on the target resource

Applicability

Provisioning in Azure:

  • Post virtual machine installations
  • Multiple points of integration (for example, Azure, virtual machines, or Active Directory)

Pros

  • Developed in Windows PowerShell, code can be easily edited and ported into Azure Automation
  • Reproducible artifact that can be used multiple times
  • Can be executed remotely
  • Extensibility can be integrated with multiple systems (such as Azure, virtual machines, Active Directory, or System Center)
  • Can be integrated with the Azure virtual machine provisioning process

Cons

  • Involves a lot of Windows PowerShell code development
  • Though it is extensible, may not be the best tool for all scenarios
  • Only for IaaS-based and Web Roles scenarios

Supported operating systems

Windows, Linux
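
As a minimal sketch, the Custom Script Extension can be applied to an existing virtual machine with the Set-AzureVMCustomScriptExtension cmdlet. The storage container, script name, and virtual machine names below are illustrative placeholders:

# Run a script (previously uploaded to a storage container) on a VM.
Get-AzureVM -ServiceName "MyService" -Name "MyVM" |
    Set-AzureVMCustomScriptExtension -ContainerName "scripts" `
        -FileName "configure-app.ps1" -Run "configure-app.ps1" |
    Update-AzureVM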

The extensions detailed in the following table support different kinds of deployment and configuration management scenarios and features.

Name

Chef

Description

With Chef, you can automate how you build, deploy, and manage your infrastructure. Your infrastructure becomes as versionable, testable, and repeatable as application code.

For more information, see Get Chef.

Applicability

Provisioning in Azure:

Infrastructure as a Service (IaaS)

Pros

  • Extensible and can be implemented many ways.
  • Leveraged mainly for post virtual machine provisioning tasks, updates, and software deployment.
  • Integrates with Windows PowerShell DSC

Cons

  • Involves knowledge about the product
  • Though it is extensible, may not be the best tool for all scenarios
  • Recipes may pose potential limitations
  • Only for IaaS-based scenarios
  • Potential license cost

Supported operating systems

Windows, Linux

Name

Puppet Enterprise

Description

With Puppet Enterprise, you can easily configure and manage your Windows environments. Whether you are managing a large datacenter, are taking advantage of Microsoft Azure, or a combination of both, Puppet Enterprise lets you manage your Microsoft Windows machines faster than ever.

For more information, see Puppet Labs.

Applicability

Provisioning in Azure:

Infrastructure as a Service (IaaS)

Pros

  • Extensible and can be implemented many ways
  • Leveraged mainly for post virtual machine provisioning tasks, updates, and software deployment
  • .NET integration
  • Native integration with Open Source repositories

Cons

  • Involves knowledge of the product
  • Though it is extensible, may not be the best tool for all scenarios
  • Only for IaaS-based scenarios
  • Potential license cost

Supported operating systems

Windows, Linux

Name

Windows PowerShell DSC

Description

Desired State Configuration (DSC) is a management platform in Windows PowerShell that enables deploying and managing configuration data for software services and managing the environment in which these services run.

For more information, see Windows PowerShell Desired State Configuration Overview.

Applicability

Provisioning in Azure:

Infrastructure as a Service (IaaS)

Pros

  • Extensible and can be implemented many ways
  • Leveraged mainly for post virtual machine provisioning tasks and desired state configurations
  • Integrated with release management so it can perform application deployments

Cons

  • Involves knowledge of Windows PowerShell DSC
  • Though it is extensible, may not be the best tool for all scenarios
  • Only for IaaS-based scenarios
  • Can be challenging to implement based on requirements

Supported operating systems

Windows, Linux

For more information, see Installing and configuring DSC for Linux
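
To illustrate the model, the following sketch shows a minimal DSC configuration and one possible way to apply it through the Azure PowerShell DSC extension cmdlets. The file, service, and virtual machine names are illustrative placeholders:

# A minimal DSC configuration that ensures IIS is installed.
Configuration WebServerConfig {
    Node "localhost" {
        WindowsFeature IIS {
            Ensure = "Present"
            Name   = "Web-Server"
        }
    }
}

# Publish the configuration to Azure Storage, and then apply it to an
# existing virtual machine by using the DSC extension.
Publish-AzureVMDscConfiguration -ConfigurationPath ".\WebServerConfig.ps1"
Get-AzureVM -ServiceName "MyService" -Name "MyVM" |
    Set-AzureVMDscExtension -ConfigurationArchive "WebServerConfig.ps1.zip" `
        -ConfigurationName "WebServerConfig" |
    Update-AzureVM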

Name

System Center 2012 R2 Virtual Machine Role Authoring Guide

Description

Implements resource extension features for virtual machine roles that are deployed and managed through System Center.

For more information, see System Center 2012 R2 Virtual Machine Role Authoring Guide - Resource Extension Package

Applicability

Provisioning in Azure:

Infrastructure as a Service (IaaS)

Pros

  • Extensible and can be implemented many ways
  • Leveraged mainly for post virtual machine provisioning tasks, updates, and software deployment

Cons

  • Involves knowledge of Virtual Machine Manager (VMM)
  • Though it is extensible, may not be the best tool for all scenarios
  • Only for IaaS-based scenarios

Supported operating systems

Windows

Security and Protection

The extensions in this section provide critical security features for your Azure Virtual Machines.

Virtual Machines Extension Name

Feature Description

More Information

CloudLinkSecureVMWindowsAgent

Provides Azure customers with the capability to encrypt their virtual machine data on a multitenant, shared infrastructure, and fully control the encryption keys for their encrypted data in Azure Storage

Securing Microsoft Azure Virtual Machines leveraging BitLocker and Native operating system encryption

McAfeeEndpointSecurity

Protects your virtual machine against malicious software

McAfee

TrendMicroDSA

Enables Trend Micro Deep Security platform support to provide intrusion detection and prevention, firewall, antimalware, web reputation, log inspection, and integrity monitoring

How to install and configure Trend Micro Deep Security as a Service on an Azure VM

PortalProtectExtension

Guards against threats to your Microsoft SharePoint environment

Securing Your SharePoint Deployment on Azure

IaaSAntimalware

Microsoft Antimalware for Azure Cloud Services and Virtual Machines is a real-time protection capability that helps identify and remove viruses, spyware, and other malicious software, with configurable alerts when known malicious or unwanted software attempts to install itself or run on your system.

Download antimalware documentation

SymantecEndpointProtection

Symantec Endpoint Protection 12.1.4 enables security and performance across physical and virtual systems

How to install and configure Symantec Endpoint Protection on an Azure VM

Virtual Machine Operations and Management

Virtual Machines Extension Name

Feature Description

More Information

IaaSDiagnostics

Enables, disables, and configures Azure Diagnostics, and is also used by the AzureCATExtensionHandler to support SAP monitoring

Microsoft Azure Virtual Machine Monitoring with Azure Diagnostics Extension

OSPatchingForLinux

Enables Azure Virtual Machines administrators to automate operating system updates with customized configurations.

You can use the OSPatching extension to configure operating system updates for your virtual machines, including:

  • Specify how often and when to install operating system patches
  • Specify what patches to install
  • Configure the reboot behavior after updates

Operating System Patching Extension Blog Post

See also the readme and source on GitHub at Operating System Patching Extension.

Cloud Services

An Azure cloud service is a compute capability within Microsoft Azure that is available to IaaS and specific PaaS workloads. From an IaaS perspective, Azure cloud services leverage virtual machines to provide a unit of access through public endpoints, load balancing, and scalability through auto-scale capabilities. This relationship is illustrated in the following conceptual diagram:

The following diagram shows a visual comparison between leveraging virtual machines and native PaaS capabilities within Azure Cloud Services:

Cloud Services Load Balancing

Load balancing cloud services can be managed between and within each deployed cloud service. To load balance network traffic between deployed cloud services, Azure Traffic Manager can provide redundant and performant paths to the publicly routable virtual IP that is used by the systems within the cloud service.

Azure Traffic Manager provides control over the distribution of network traffic to public Internet endpoints. Traffic Manager works by applying an intelligent policy engine to Domain Name System (DNS) queries for the domain names of your Internet resources. Azure Traffic Manager uses three load-balancing methods to distribute traffic:

  • Failover: Use this method when you want to use a primary endpoint for all traffic, but provide backups in case the primary becomes unavailable.
  • Performance: Use this method when you have endpoints in different geographic locations, and you want requesting clients to use the "closest" endpoint in terms of the lowest latency.
  • Round robin: Use this method when you want to distribute load across a set of cloud services in the same datacenter or across cloud services or websites in different datacenters.

For more information, see Traffic Manager routing methods.
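
As an illustrative sketch, a Traffic Manager profile that uses the round robin method could be created with Azure PowerShell; the profile name and DNS prefix below are placeholders:

# Create a Traffic Manager profile that distributes DNS responses
# round robin across its endpoints.
New-AzureTrafficManagerProfile -Name "MyProfile" `
    -DomainName "myapp.trafficmanager.net" `
    -LoadBalancingMethod "RoundRobin" -Ttl 30 `
    -MonitorProtocol "Http" -MonitorPort 80 -MonitorRelativePath "/"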

The following image shows an example of the round robin load-balancing method for distributing traffic between different cloud services.

To load balance network traffic across systems deployed within cloud services, the Azure Load Balancer can be used. Virtual machines in the same cloud service or virtual network can communicate with each other directly by using their private IP addresses. Computers and services outside the cloud service or virtual network can communicate with virtual machines in a cloud service or virtual network only through a configured endpoint.

An endpoint is a mapping of a public IP address and port to the private IP address and port of a virtual machine or web role within an Azure cloud service. The Azure Load Balancer randomly distributes a specific type of incoming traffic across multiple virtual machines or services in a configuration known as a load-balanced set.

The following image shows a load-balanced endpoint for standard (unencrypted) web traffic that is shared among three virtual machines for the public and private TCP port of 80. These three virtual machines are configured in a load-balanced set.
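
As a sketch, such an endpoint could be added to each virtual machine with Azure PowerShell; running the command against each virtual machine with the same load-balanced set name builds the set. The service, virtual machine, and set names are illustrative placeholders:

# Add a load-balanced endpoint for TCP port 80 to one virtual machine.
Get-AzureVM -ServiceName "MyService" -Name "WebVM1" |
    Add-AzureEndpoint -Name "Web" -Protocol "tcp" `
        -LocalPort 80 -PublicPort 80 -LBSetName "WebFarm" `
        -ProbePort 80 -ProbeProtocol "http" -ProbePath "/" |
    Update-AzureVM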

By default, a cloud service has a single public-facing virtual IP (VIP) address that is assigned from the Azure IPv4 public address space. Each endpoint uses the VIP for the address component and a unique port. It is possible to add additional public-facing VIPs to a cloud service load balancer to support endpoints with different IP addresses but the same port.

Azure can also load balance within a cloud service or virtual network by using the internal load balancer. The internal load balancer can be used in the following ways:

  • To balance loads between servers in different tiers of a multitier application (for example, between web and database tiers).
  • To balance loads for line-of-business (LOB) applications hosted in Azure without requiring additional load-balancer hardware or software.
  • To include on-premises servers in the set of computers with traffic that is load balanced.

Internal load balancing is also facilitated by configuring an internal load-balanced set.
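
As an illustrative sketch (all names and the IP address are placeholders), an internal load balancer can be added to a cloud service and used by an endpoint as follows:

# Create the internal load balancer instance on the cloud service, and
# then attach a virtual machine endpoint to it.
Add-AzureInternalLoadBalancer -ServiceName "MyService" `
    -InternalLoadBalancerName "MyILB" `
    -SubnetName "Backend" -StaticVNetIPAddress "10.0.1.10"

Get-AzureVM -ServiceName "MyService" -Name "AppVM1" |
    Add-AzureEndpoint -Name "LOB" -Protocol "tcp" `
        -LocalPort 8080 -PublicPort 8080 -LBSetName "LOBSet" `
        -InternalLoadBalancerName "MyILB" `
        -ProbePort 8080 -ProbeProtocol "tcp" |
    Update-AzureVM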

The following figure shows an example of an internal load-balanced endpoint for an LOB application that is shared among three virtual machines in a cross-premises virtual network.

Feature References

Cloud Services

https://azure.microsoft.com/en-us/documentation/services/cloud-services/

Multiple VIPs per Cloud Service

https://azure.microsoft.com/en-us/documentation/articles/load-balancer-multivip/

Azure Load Balancer

https://azure.microsoft.com/en-us/documentation/articles/load-balancer-internet-overview/

Internal Load Balancer

https://azure.microsoft.com/en-us/documentation/articles/load-balancer-internal-overview/

Azure RemoteApp

Azure RemoteApp is a service that runs on Microsoft's Azure fabric. It provides an environment for Windows applications to be remotely accessed over the Internet. This environment is scalable to accommodate the end-user demand.

Azure RemoteApp technology expands on the native Windows on-premises service to provide a secure remote connection to applications hosted in Azure. Azure RemoteApp enables remote LOB applications to appear like they are running on the end user's local computer.

RemoteApp uses Microsoft Remote Desktop Protocol (RDP) and RemoteFX. RDP is a WAN-optimized protocol designed to tolerate network latency and packet loss. RemoteFX provides a 3D virtual adapter for rendering images. Application delivery provides a highly reliable, fast, and consistent user experience, supporting content ranging from text to streaming multimedia via the Azure global network of datacenters.

Azure RemoteApp can run from the following supported end-user devices:

  • Windows operating system
  • Windows RT
  • Mac operating system
  • iOS
  • Android operating system

End-users can use the client-side software from their preferred devices to access the Azure RemoteApp programs. Azure RemoteApp provides users with 50 GB of persistent storage. This storage is protected by the fault tolerant nature of Azure Storage accounts.

To test Azure RemoteApp, see: Azure RemoteApp

On the integrated Azure RemoteApp menu, select an application (for example, Excel). The Connecting to dialog will open, and you may be prompted for credentials, depending on the deployment type.

After the authentication process is complete, the RemoteApp will launch, and the user will have remote access to the application.

RemoteApp is available in two deployment types, which are referred to as collections.

  • A cloud collection is hosted in and stores all data for the programs within the Azure cloud. End-users access apps by signing in with their Microsoft account, synchronized corporate credentials, or credentials that are federated with Azure Active Directory.
    The RemoteApp cloud collection offers a standalone way to host applications within Azure. A cloud collection exists only in the Azure cloud and cannot access the local on-premises network. Cloud collections support creating and sharing custom applications through the use of a custom template image for the application that is being published.
  • A hybrid collection is hosted in and stores data in the Azure cloud, but it allows end-users to access resources that are stored on an on-premises network. Users can access apps by signing in with their synchronized corporate credentials or credentials that are federated with Azure Active Directory.
    The hybrid RemoteApp collection provides a custom set of applications to end-users and access to resources that are stored on an on-premises network. Unlike a custom image that is used with the cloud collection, the image you create for a hybrid collection runs apps in a domain-joined environment, granting full access to the local network and resources.
    When integrating Active Directory with Azure Active Directory by using DirSync or Azure AD Connect, corporate policies can be used within Azure to control the applications being offered, and end-users can use Active Directory credentials to access the RemoteApp applications and resources.

A key difference between the hybrid and cloud collections is how the installation of software updates (patching) is handled. A cloud collection uses preinstalled images (from Office 365 or Office 2013), and the patching process is handled by Microsoft.

For both types of collections created from a custom template image, the subscription owner is responsible for managing the image and the applications. Domain-joined images can be managed by Windows Update, Group Policy, Desired State Configuration, or System Center Configuration Manager. After updates to the custom template image are applied, the image is uploaded to Azure and the collections (hybrid or cloud) are updated to consume the new image.

Feature References

Introducing Microsoft Azure RemoteApp

http://blogs.msdn.com/b/rds/archive/2014/05/12/windows-apps-in-the-cloud-introducing-microsoft-azure-remoteapp.aspx

How to create a custom template image for RemoteApp

http://azure.microsoft.com/en-us/documentation/articles/remoteapp-create-cloud-deployment/

How to create a hybrid collection of RemoteApp

http://azure.microsoft.com/en-us/documentation/articles/remoteapp-create-hybrid-deployment/

How does licensing work in RemoteApp?

http://azure.microsoft.com/en-us/documentation/articles/remoteapp-licensing/

Best practices for using Azure RemoteApp

http://azure.microsoft.com/en-us/documentation/articles/remoteapp-bestpractices/

Azure RemoteApp FAQ

http://azure.microsoft.com/en-us/documentation/articles/remoteapp-faq/

IaaS Considerations

There are several considerations when deploying IaaS solutions within Microsoft Azure. Deployment considerations include cost, load balancing, resiliency, security, networking, and disaster recovery. Although not exhaustive, this section explores many of these considerations at a high level.

Cost

Cost is one of the top considerations for most organizations consuming services from Microsoft Azure. Being able to develop a predictable consumption model is key for the success of any solution deployed in Azure. The following table itemizes cost factors that you should consider:

Considerations

Decision Points

The size and number of virtual machines

Windows Server licensing costs may be included. Compute hours don't include any Azure Storage costs that are associated with the Windows Server image running in virtual machines. These costs are billed separately.

Azure Storage requirements

Charges apply for Azure Storage costs that are required for virtual machines.

Azure Virtual Network

Charges apply for the creation of a virtual private network (VPN) connection between a virtual network and your VPN gateway. The charge accrues for each hour that the VPN connection is provisioned and available (referred to as a VPN connection hour); expect the connection to be provisioned and available 24 hours a day, seven days a week. All data transferred over the VPN connection is charged separately at the Azure standard data transfer rates.

Network traffic

Outbound data is charged based on the total amount of data moving out of the Azure datacenters through the Internet in a given billing cycle. This applies to any traffic, including traffic that traverses the VPN tunnel. In this document, outbound directory synchronization traffic is expected to represent the most significant portion of the network traffic, depending on the amount of directory changes.

Support

Azure offers flexible support options for organizations of all sizes. Enterprises that deploy business-critical applications in Azure should consider additional support options.

Load Balancing

Customers deploying applications in Azure Virtual Machines must consider load balancing their virtual machines when an application deployment requires more than one server. Using on-premises load balancers with Azure Virtual Machines is not supported today. When considering load balancing in Azure Virtual Machines, note that only a round robin load-balancing configuration is currently supported.

There are two levels of load balancing available for Azure infrastructure services:

  • DNS level: Load balancing for traffic to different cloud services located in different datacenters, to different Azure websites located in different datacenters, or to external endpoints. This is done with Traffic Manager and the round robin load balancing method.

  • Network level: Load balancing of incoming Internet traffic to different virtual machines of a cloud service, or load balancing of traffic between virtual machines in a cloud service or virtual network. This is done with the Azure Load Balancer.

Feature References

Load Balancing for Azure Infrastructure Services

http://www.windowsazure.com/en-us/manage/windows/common-tasks/how-to-load-balance-virtual-machines/

About Traffic Manager Load Balancing Methods

http://azure.microsoft.com/documentation/articles/traffic-manager-load-balancing-methods

Internal load balancing

http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview

Encryption

A key consideration for workloads deployed in Azure virtual machines is encryption for data at rest. For virtual machines, most customers seek platform encryption that they can control.

Currently, Microsoft BitLocker Drive Encryption is not supported because there is no way for Azure to handle the key management portion during virtual machine startup. Given that Azure consists of multiple physical servers, there is not a simple way to manage BitLocker encryption keys.

Third parties, such as CloudLink, have the capability to manage disk encryption keys on Windows and Linux platforms. CloudLink can be used to encrypt virtual hard disks that are attached to virtual machines through its published virtual machine extensions. Additional details about CloudLink are provided in the following table.

Feature References

Azure Virtual Machine Disk Encryption using CloudLink

http://azure.microsoft.com/blog/2014/08/19/azure-virtual-machine-disk-encryption-using-cloudlink/

Encrypting Azure Virtual Machines with CloudLink SecureVM

http://azure.microsoft.com/blog/2014/11/13/encrypting-azure-virtual-machines-with-cloudlink-securevm/

Networking

The following table itemizes what to consider when you are deciding how to provision virtual machines on a virtual network:

Considerations

Decision Points

Name resolution

When you deploy virtual machines and cloud services to a virtual network you can use Azure-provided name resolution or your own DNS solution, depending on your name resolution requirements.

Enhanced security and isolation

Because each virtual network is run as an overlay, only virtual machines and services that are part of the same network can access each other. Services outside the virtual network have no way to identify or connect to services hosted within virtual networks. This provides an added layer of isolation to your services.

Extended connectivity boundary

The virtual network extends the connectivity boundary from a single service to the virtual network boundary. You can create several cloud services and virtual machines within a single virtual network and have them communicate with each other without having to go through the Internet. You can also set up services that use a common back-end database tier or use a shared management service.

Extend your on-premises network to the cloud

You can join virtual machines in Azure to your domain running on-premises. You can access and leverage all on-premises investments for monitoring and identity for your services hosted in Azure.

Use persistent private IP addresses

Virtual machines within a virtual network will have a stable private IP address. We assign an IP address from the address range you specify and offer an infinite DHCP lease on it. You can also choose to configure your virtual machine with a specific private IP address from the address range when you create it. This ensures that your virtual machine retains its private IP address even when it is stopped or deallocated.

For more information, see Configure a static internal IP address for a VM.
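
As a sketch of this option, a specific private IP address can be requested at creation time with the Set-AzureStaticVNetIP cmdlet; the image variable, credentials, names, and address below are illustrative placeholders:

# Create a virtual machine on a virtual network with a specific
# private IP address. $image_name and $password are placeholders.
New-AzureVMConfig -Name "MyVM" -InstanceSize "Small" -ImageName $image_name |
    Add-AzureProvisioningConfig -Windows -AdminUsername "azureadmin" `
        -Password $password |
    Set-AzureStaticVNetIP -IPAddress "10.0.0.10" |
    New-AzureVM -ServiceName "MyService" -VNetName "MyVNet"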

There are two models for network configurations for Azure virtual machines: cloud-only and cross-premises:

  • Cloud-only virtual network configurations are virtual networks that don't use a virtual network gateway to connect back to your on-premises network or directly to another Azure virtual network. They aren't really a different type of virtual network, but rather, they are a way to configure a virtual network without configuring cross-premises connectivity. You connect to the virtual machines and cloud services from the endpoints, rather than through a VPN connection. For cloud-only configurations, see How to create a virtual network.
  • Cross-premises connections offer an enormous amount of flexibility. You can create multisite configurations, virtual network to virtual network configurations, ExpressRoute connections, and combinations of multiple configuration types. If you are extending your on-premises network to the cloud, this is the way to do it.
    Most cross-premises connections involve using a VPN device to create a secure connection to your Azure virtual network. If you prefer, you can create an ExpressRoute direct connection to Azure through your network service provider or exchange provider and bypass the public Internet altogether.

Feature References

About Virtual Network Secure Cross-Premises Connectivity

https://msdn.microsoft.com/en-us/library/azure/dn133798.aspx

Limitations

Although the capabilities of Azure Virtual Machines are quite comprehensive, some native limitations exist, and they should be understood by organizations prior to deploying solutions in Azure. The following table explores these limitations.

Limitation

Impact

Workaround

Auto-scaling

The application environment does not automatically increase or decrease role instances in response to changes in load.

  • Someone needs to manually monitor the load.
  • A sudden increase in load will impact performance.

Utilize monitoring and automation capabilities such as the Azure Monitoring Agent and Azure Automation to dynamically scale and deploy application code to virtual machine instances in the environment.

Load balancing

Virtual machines are not load balanced by default.

  • Azure Virtual Machines does not allow for elasticity of the application environment.
  • A sudden increase in load will impact performance.

After the virtual machine is provisioned, create an Internal Load Balancer and associate it with the virtual machine.

Multiple network adapters

  • The current release does not support adding or removing network adapters after a virtual machine is created.
  • Network adapters in Azure Virtual Machines cannot forward traffic or act as Layer 3 (IP) gateways.
  • Internet-facing VIP is only supported on the "default" network adapter, and there is only one VIP mapped to the IP of the default network adapter. The additional network adapters cannot be used in a load-balance set.
  • The order of the network adapters inside the virtual machine will be random, but the IP addresses and the corresponding MAC addresses will remain the same.
  • You cannot apply Network Security Groups or Forced Tunneling to the non-default network adapters.

For more information, see:

Multiple virtual machine network adapters and network virtual appliances in Azure

 

Density

The maximum number of virtual machines per virtual network is currently 2,048.

Create a new virtual network and extend the network by connecting virtual networks together.

Concurrent TCP connections

The maximum number of concurrent TCP connections for a virtual machine or role instance is 500,000.

 

Static IP address or multiple IP addresses

  • Cannot assign a static IP address to a virtual machine instance
  • Cannot assign multiple IP addresses to a virtual machine instance
 

Management Considerations for Azure IaaS Virtual Machines

Azure Diagnostics provides Azure extensions that enable you to collect diagnostic telemetry data from a worker role, web role, or virtual machine running in Azure. The telemetry data is stored in an Azure Storage account. It can be used for debugging and troubleshooting, measuring performance, monitoring resource usage, traffic analysis, capacity planning, and auditing.

The following table explains the types of telemetry Azure Diagnostics can collect.

Data Source

Description

IIS logs

Information about IIS websites

Azure Diagnostic infrastructure logs

Information about diagnostics

IIS failed request logs

Information about failed requests to an IIS site or application

Windows Event logs

Information sent to the Windows event logging system

Performance counters

Operating system and custom performance counters

Crash dumps

Information about the state of the process in the event of an application crash

Custom error logs

Logs created by your application or service

.NET EventSource

Events generated by your code using the .NET EventSource class

Manifest-based ETW

Event Tracing for Windows (ETW) events generated by any process
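
As an illustrative sketch, the diagnostics extension can be enabled on an existing virtual machine with Azure PowerShell; the diagnostics configuration file, storage account, key variable, and virtual machine names are placeholders:

# Enable the Azure Diagnostics extension, directing telemetry to an
# Azure Storage account. $storageKey is a placeholder value.
$storageContext = New-AzureStorageContext -StorageAccountName "mystorage" `
    -StorageAccountKey $storageKey
Get-AzureVM -ServiceName "MyService" -Name "MyVM" |
    Set-AzureVMDiagnosticsExtension -DiagnosticsConfigurationPath ".\wadcfg.xml" `
        -StorageContext $storageContext |
    Update-AzureVM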

Operational Insights is an analysis service that enables IT administrators to gain deep insight across on-premises and cloud environments. It enables you to interact with real-time and historical machine data to rapidly develop custom insights, and provides Microsoft and community-developed patterns for analyzing data.

For more information about these topics, please refer to the Cloud Platform Integration Framework section later in this document.

Microsoft Azure Compute: PaaS

Azure Platform-as-a-Service (PaaS) workloads share some common elements with IaaS, but they also have some key differences that should be considered when they are deployed. This service has been a part of the Azure offering since its inception, and in many ways is a desirable service to realize the true value of cloud computing.

A primary goal of PaaS is to remove the need to manage the underlying virtual machines. This allows customers to focus on the real value of the application, which is the functionality that it provides, not the underlying operating system or virtual machine.

PaaS provides great value in that management duties are significantly smaller for most organizations. The ability for Microsoft to maintain the operating system and virtual machines, and keep them patched with the latest security updates is a key differentiator to many cloud solutions in place today.

Another key benefit for targeting PaaS for applications and services is the dynamic scaling features that it affords. A side benefit of not managing the underlying virtual machines is the ability to scale the workloads to upper limits without any preplanning. New instances can be created and destroyed by the Azure platform and controlled by the customer. The real value of auto-scaling is in full effect with PaaS.

The integration of application deployment and release management into the service offering makes PaaS very desirable for customers looking to automate and orchestrate deployment of their application. Every application that gets deployed to Azure is a self-contained, packaged asset. This package is simply deployed to a virtual machine that is provisioned by the platform based on a configuration that the customer provides.

This makes automated and continuous integration of application code a real option. Paired with deployment slots that allow VIP swapping, it makes deployments to the cloud more predictable and safer. In addition, rolling back to a snapshot is possible with these options.

Feature References

Cloud Services Explained

http://azure.microsoft.com/en-us/documentation/articles/fundamentals-application-models/#cloud-services

Websites explained

http://azure.microsoft.com/en-us/documentation/articles/fundamentals-application-models/#websites

Cloud Service details / architecture

https://msdn.microsoft.com/en-us/library/azure/jj155995.aspx

Large Scale Services in Azure

https://msdn.microsoft.com/en-us/library/azure/jj717232.aspx

Development Considerations

https://msdn.microsoft.com/en-us/library/azure/jj156146.aspx

Platform updates in PaaS

https://msdn.microsoft.com/en-us/library/azure/hh472157.aspx

Deploying Azure Cloud Service with Release Management

http://blogs.msdn.com/b/visualstudioalm/archive/2015/02/09/deploying-azure-cloud-service-using-release-management.aspx

Mandatory: Azure solutions must contain at least two instances if running web or worker roles. For apps (such as Web Apps), this is not a requirement because the design has inherent fault tolerance built in.

Recommended: Azure solutions should contain multiple upgrade domains to avoid outages caused by updates to the guest and host by the platform. This is a unique item that exists for PaaS services.

Optional: Azure solutions can optionally contain auto-scaling configurations to increase and decrease instance counts for the service, based on a schedule or metric.

Design Guidance

The common design patterns for PaaS workloads can be split into two primary categories:

  • Web-based workloads
  • Back-end processing workloads

Typically, web-based workloads for modern frameworks work on PaaS with few changes.

Web applications that use a framework prior to .NET Framework 4.0 usually require some code changes to fit this cloud model. The key point to consider is whether extra components need to be installed as part of the application, for example, custom ISAPI filters, drivers, or security models that require full trust. These can be adapted to PaaS web applications, but they require varying levels of change.

The other very important point to remember when using both web and back-end workloads is that the application needs to be stateless. Applications that require additional components for state management and tight coupling of the application tiers tend to have problems in modern cloud-scale models.

At a minimum, it's important to understand all the components that make up the application and the architecture for the data, business, and front-end tiers. Additionally, it's key to understand whether the deployment can be file-based and whether the application is self-contained from a binary perspective.

Scenario

Model

Points to Consider

Web-based workload

Web-based applications

  • Whether software outside of the web server needs to be installed
  • Whether startup tasks are needed

Back-end workload

Service-based applications

  • Avoid tight processing loops
  • Leverage competing consumer for multiple instances

Cloud Services

Azure Cloud Services, in the context of PaaS, provide the units that contain the role instances that comprise a given application. A cloud service binds to the virtual IP (VIP) that services requests and load balances those requests over the underlying role instances. Azure Cloud Services can be considered a unit of deployment that can be versioned and stored. When you deploy a cloud service, it contains a package that defines the service (such as networking, load balancing, or role instance counts) in addition to the actual code for the application.

This model of deployment makes it very easy to control and deploy specific versions of an application. A cloud service can have multiple deployments running simultaneously. This is possible because of a concept of deployment slots that are implemented with cloud services. There are two deployment slots available for each cloud service.

The intention is for the staging slot to be used to stage new or updated versions of the cloud service, which are accessible to the deployment or DevOps teams for testing, while the production slot hosts the production deployment of the application.

Cloud services also contain the binaries and scripts that install additional components on the PaaS instance at startup. These are necessary because a deployment that is running in Azure will move inside the Azure datacenter. As updates are deployed to the host and guest operating systems, the PaaS instances are moved to other hosts. This means that everything required to make the PaaS instance and application run must be part of the cloud service package.

Feature References

Cloud services explained

https://msdn.microsoft.com/en-us/library/azure/jj155995.aspx

Startup tasks in cloud services

https://msdn.microsoft.com/en-us/library/azure/hh180155.aspx

Tools for packaging and deployment

https://msdn.microsoft.com/en-us/library/azure/gg433055.aspx

Manage guest operating system updates

https://msdn.microsoft.com/en-us/library/azure/ff729422.aspx

Recommended practices for large scale Web Apps

https://msdn.microsoft.com/en-us/library/azure/jj717232

Mandatory: Cloud services must contain all the assets, including code and other installations required to run the application. Everything must be included in the cloud service package, including scripts for installation.

Recommended: Give consideration to deployment models that will be used when updating the application. There are a few options to understand, and they each have pros and cons.

Optional: Cloud services can contain multiple running deployments in the form of production and testing or staging.

Design Guidance

It is best to understand that Azure Cloud Services, in the simplest form, provide a container or package wrapper for applications that are deployed to Azure. This type of application deployment model is not necessarily new. You will find similar models in client applications, such as the .appx format used by modern Windows applications.

The core idea is to build, version, and deploy the service package as a unit. This will make it easier for the DevOps or release management team to deploy updates to the application and to roll back if there are unforeseen side effects from an application update.

It is also important to realize that scaling Azure Cloud Services in the PaaS model is trivial. Because the application and service definition are wrapped in a package, deploying more instances of this is simply a matter of telling the Azure platform how many instances you want.
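
For example, a running role can be scaled out with a single Azure PowerShell command; the service and role names below are illustrative placeholders:

# Set the instance count of a role in a deployed cloud service.
Set-AzureRole -ServiceName "MyService" -Slot "Production" `
    -RoleName "WebRole1" -Count 4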

Web and Worker Roles

Application Type

Description

Web role

This role is used primarily to host and support applications that target IIS and ASP.NET. The role is provisioned with IIS installed, and it can be used to host front-end, web-based applications.

Worker role

This role is used primarily to host and support service applications. These applications target back-end processing workloads. They can be long-running processes, and they can be thought of as providing services in the cloud.

It is important to remember that web and worker roles have dedicated underlying virtual machines per instance. Typically, this is transparent to the consumer, but it's particularly important from a diagnostics perspective. You can enable and log on to the underlying virtual machine if needed; however, this option is disabled by default.

Important things to keep in mind when deploying to the PaaS model are:

  • Additional components and software can be installed on the PaaS instances. This is accomplished via startup scripts.
  • There will most likely be some changes to the application when you migrate from an existing on-premises or private cloud deployment. At the very least, you will be repackaging the application in the cloud service.
  • It's important to consider session affinity and session state management when deploying applications to the cloud.
  • You should consider how upgrade domains are configured with the PaaS application. This affects how deployments are rolled out to PaaS workloads.

Feature References

Web and worker roles

https://msdn.microsoft.com/en-us/library/azure/hh180152.aspx

IIS configuration in PaaS

https://msdn.microsoft.com/en-us/library/azure/gg433059.aspx

Configure web role with multiple sites

https://msdn.microsoft.com/en-us/library/azure/gg433110.aspx

Mandatory: Web and worker roles require at least two instances to provide fault tolerance for the automatic maintenance nature of PaaS.

Recommended: Web and worker roles should be considered if the application requires installing binaries on the web or application servers.

Optional: Virtual networking is common to allow the communication needed for databases, management services, and other services, but it is not a hard requirement for deploying an application via PaaS to a web or worker role.

Design Guidance

Web roles are specifically tailored for IIS-based applications. This limits their use to Windows applications that can target the Microsoft operating system and services. The common design pattern is to configure the scale unit for the instances and ensure that multiple instances (at least two) are used for production workloads. This is done by simply setting the configuration in the service definition.

Worker roles specifically target service applications (non-web-based). As such, the error handling that would be required for any unattended service application should be employed. If exceptions are not handled in the service's inner loop, the role instance will be restarted, which will result in downtime for processing.

Azure App Service

The PaaS applications previously called Azure Websites have been integrated into a model called Azure App Service. App Service is composed of the following subcomponents:

  • Web Apps
  • Mobile Apps
  • API Apps
  • Logic Apps

Web Apps is the new term used to describe what was previously named Azure Websites. The Web Apps feature in Azure App Service is a type of PaaS workload that differs slightly from the traditional web and worker role applications. The model is based on decoupling from the underlying infrastructure, even more so than traditional PaaS applications. This greatly reduces the operational burden of maintaining applications because infrastructure maintenance is no longer required; the focus shifts to the application itself.

This model primarily removes the customer from any connection with the underlying virtual machines that are hosting the application. This means components such as Remote Desktop are not an option and that the installation of components and software is not something a customer can directly execute.

There are extensions available via the Azure portal (Azure Marketplace), which are essentially packages of software that have been tested and can be added to a website deployed via Web Apps.

Web Apps are primarily used to provide a platform to host various web applications and web services. Additionally, Web Apps can run back-end processes via a service offering in Azure WebJobs.

WebJobs encapsulate an existing executable or script that provides some processing output. WebJobs can also be scheduled or run on demand. For more information about WebJobs, see Azure WebJobs documentation resources.

Deployment of Web Apps is in some ways different from other PaaS and IaaS deployment models. Supported deployment models include:

  • Manual – Use a file copy, FTP, or WebMatrix
  • Local Git – Use the Kudu environment
  • Continuous integration – Use GitHub or Team Foundation Version Control (TFVC)

There are some fundamental differences in deployment slots in Web Apps as compared with the web and worker role deployments. Web Apps supports up to five deployment slots.
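
As a sketch, swapping a staging slot into production can be done with a single Azure PowerShell command; the web app name below is an illustrative placeholder:

# Swap the staging slot of a web app into production.
Switch-AzureWebsiteSlot -Name "mywebapp" -Slot1 "staging" -Slot2 "production"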

Web Apps is deployed in an App Service plan, previously called a Web Hosting plan. The service plan represents a set of features and capacity that can be contained and shared with multiple Web Apps in an Azure App Service. The following pricing tiers are provided:

  • Free
  • Shared
  • Basic
  • Standard
  • Premium

For apps to share a hosting plan, they need to be in the same subscription and geographical location. In an Azure App Service, an app can be associated with only a single app hosting plan at one time.

Feature References

App Services explained

https://msdn.microsoft.com/en-us/library/azure/dn948515.aspx

App Services deep dive

http://channel9.msdn.com/Series/Windows-Azure-Web-Sites-Tutorials

App Service migration tools

https://www.movemetothecloud.net/

Mandatory: These lighter weight PaaS services do not allow direct access to the underlying virtual machines. This means no installation of components on the underlying web server (outside of the application folder).

Recommended: Match the service offering with the type of workload. API apps differ from Web Apps because one needs more focus on the back end and the other needs more focus on the front end.

Optional: Plan for capacity needs. Although some thought should be given to how many instances or sizes should be used, these can easily be changed later. The focus here is on rapid deployment.

Design Guidance

Azure App Service is one of the latest models to be employed on Azure. The idea is to simplify the management and cost of running a variety of services in PaaS. A performance level is set on the App Service plan, and the various apps are then deployed inside that plan.

For example, a web app could be deployed that uses an API app or a Logic App, with cost and performance levels set at the plan level. This simplifies deployments because each app doesn't need to be configured and billed separately.

The app model is growing very fast, and it makes integrating deployed services, APIs, and applications much simpler and faster than previous PaaS models, such as web roles.

Azure SQL Database

Azure SQL Database is the realization of one of the most popular relational databases in a managed, multitenant, PaaS model. When choosing a database deployment model, there are key factors to consider to ensure the end goal is met. Although this is attractive for many reasons, it is important to understand that there are key differences between running an Azure SQL Database and an on-premises or IaaS virtual machine with SQL Server installed.

A key point is that system-level functions cannot be performed from Azure SQL Database. This includes traditional database backups, system-level profiling, and SQL Server extensions such as FILESTREAM and CLR integration.

Operations such as backups are accommodated by extensions to T-SQL, which allow backups without the need to create a backup media object (which would tie to a file system object on the operating system running SQL Server). Other operations, such as management of the SQL Database instance, are accommodated by using dynamic management views (DMVs) instead of extended stored procedures.

A key benefit of leveraging Azure SQL Database over traditional deployments of SQL Server is that databases and servers can be created in seconds, not hours or days. Another benefit is that there is less need to focus on the infrastructure and processes to replicate and back up the data in the databases.

By default, Azure SQL Database commits the data to three separate instances in the same Azure region. This is similar to how Azure Storage commits all writes to three stamps in Azure. This provides high availability and protects against hardware failures inside Azure.

Feature References

Understanding Azure SQL

http://azure.microsoft.com/en-us/documentation/articles/data-management-azure-sql-database-and-sql-server-iaas/

Development considerations

https://msdn.microsoft.com/library/azure/ee730903.aspx

Performance and scaling

https://msdn.microsoft.com/library/azure/e6f95976-cc09-4b46-9d8c-4cf23119598d

Mandatory: Understand the performance and management differences between a traditional SQL Server database and an Azure SQL Database.

Recommended: Analyze databases to be migrated to Azure SQL Database for incompatibilities that might be present (for example, FILESTREAM).

Optional: Leverage built-in tools for BACPAC and DACPAC to move databases to Azure SQL Databases.

Design Guidance

Running relational databases in the cloud has deep implications for the application, performance, and resiliency. In many ways, the database is as important as the application in terms of how to run it effectively in a public cloud. There are pros and cons to running SQL Server in an IaaS environment as well as Azure SQL Database.

Some fundamental considerations are encryption requirements, performance requirements, and feature requirements. Although data can be encrypted by the application and stored in a SQL Database, Transparent Data Encryption (TDE) is not yet supported in SQL Database.

If TDE is required, SQL Server in your IaaS environment would be the preferred target. Performance of an Azure SQL Database can appear to be slower, but one must consider that each write is committed synchronously to three databases in the local datacenter, and asynchronously to another replica if geo-replication is enabled. This affords the benefit of not having to maintain as many local backups (backups are built in), with a trade-off in write performance. For high-TPS loads, consider adding a caching layer to insulate the application from the performance impact of multiple commits as much as possible.

Advanced features in SQL Server that require access at the disk level or operating system level obviously will not work the same with SQL Database. For example, CLR integration, backup sets, and FILESTREAM tables are not possible with Azure SQL Database.

Leveraging Azure SQL Database has some unique security considerations. Azure SQL Database has a public-facing IP address that is reachable by anyone. Communications to Azure SQL Database can be secured by using the server-level firewall and the per-database firewall.

These firewalls allow you to specify which IP addresses can connect to the database. When using ExpressRoute and public peering to access Azure SQL Database, the access flows through a network address translation (NAT) interface. This means the NAT address has to be specified in the firewall rules for Azure SQL Database, which prevents specifying end-to-end security.
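
As an illustrative sketch, a server-level firewall rule can be created with Azure PowerShell; the server name, rule name, and address range are placeholders (in the ExpressRoute scenario, the range would cover the NAT address):

# Allow a specific public IP address range to reach the server.
New-AzureSqlDatabaseServerFirewallRule -ServerName "myserver" `
    -RuleName "CorpNat" `
    -StartIpAddress "203.0.113.10" -EndIpAddress "203.0.113.10"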

Scenario

Model

Points to Consider

FILESTREAM needed

Store the file objects as blobs in Azure Storage and store the indexes in the database

FILESTREAM is not available with Azure SQL Databases

SQL backups

Use geo-replication and point-in-time backups for Azure SQL Database workloads

Traditional backup sets are not supported in Azure SQL Databases

Azure Batch

As discussed earlier, there is often a need to run processing that might not have an immediate UI and that runs as a service in the background. The traditional PaaS approach was to leverage a worker role and implement the business logic via custom code.

Azure Batch offers a similar type of service, but with a unique twist. It has been designed to run background processes, but it is centered on high-performance data processing. It can provide scheduling, auto-scaling of compute resources, and partitioning of jobs.

This type of service targets the following types of workloads:

  • Financial risk modeling
  • Image rendering
  • Image processing
  • Media encoding and transcoding
  • Genetic sequence analysis
  • Software testing
Azure Batch is comprised of two primary components: Azure Batch and Azure Batch Apps. The Azure Batch APIs focus on the ability to create pools of virtual machines and define work items to be scheduled for processing.

Before the advent of Azure Batch, the typical model was to create an HPC cluster and devise a software model that would allow scheduling or queueing items that required processing.

Azure Batch includes an API to support the infrastructure. There is no need to manually build servers and software libraries to handle job scheduling. This allows consumers to focus purely on the business logic and use configuration to drive the infrastructure.

Azure Batch Apps takes this a step further. Consumers can publish an "app," which is essentially a service that allows data to be fed to it, and it runs as needed. The pool for the compute workload is defined in the configuration. This adds the ability to monitor these jobs via the Azure portal and, via the REST API, the extensibility to build custom dashboards.

Feature References

Azure Batch Technical Overview

http://azure.microsoft.com/en-us/documentation/articles/batch-technical-overview/

Azure Batch APIs

https://msdn.microsoft.com/en-us/library/azure/dn820177.aspx

Mandatory: Define pools of virtual machines that will perform the underlying work for Azure Batch jobs.

Recommended: Analyze the workload to determine which model is the better fit: Azure Batch or Azure Batch Apps.

Optional: Leverage the REST API to output monitoring and telemetry to existing systems.

Design Guidance

Azure Batch is a good consideration for workloads in which an existing process or executable is used to process the data. For this to work effectively, the data should be in a format that can allow parallelization (which cuts the data into several chunks). This service can be highly effective for custom processing and it is easy to configure.

Azure HDInsight (Hadoop)

Azure HDInsight is the implementation of Hadoop as a service in Azure. The goal of this service is to enable customers to create Hadoop cluster services in seconds or minutes instead of hours or days, which significantly reduces the cost of this big data service. Additionally, the service provides storage in the form of the Hadoop Distributed File System (HDFS), which has become the standard for Hadoop clusters. Azure extends this concept by allowing Hadoop to leverage Azure Storage.

At its core, HDInsight can be run on Linux-based or Windows-based servers. This makes using Hadoop easy for those coming from a Linux background and approachable by those who use Windows.

The HortonWorks Data Platform (HDP) is the Hadoop distribution used by HDInsight. Additionally, there are several high-level configurations for running Hadoop, which can be used to optimize the cluster based on the operations and activities it will target. Along with this are other components that have been developed primarily by the open source community. These customize the system for specific types of workloads.
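
As an illustrative sketch, a cluster can be provisioned with Azure PowerShell; the cluster name, storage account, location, node count, and key variable below are placeholders:

# Provision an HDInsight cluster backed by an Azure Storage account.
# $storageKey is a placeholder for the storage account key.
New-AzureHDInsightCluster -Name "mycluster" -Location "West US" `
    -DefaultStorageAccountName "mystorage.blob.core.windows.net" `
    -DefaultStorageAccountKey $storageKey `
    -DefaultStorageContainerName "mycluster" `
    -ClusterSizeInNodes 4 -Credential (Get-Credential)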

Feature References

Hadoop on Azure

http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/

Components matrix

http://azure.microsoft.com/en-us/documentation/articles/hdinsight-component-versioning/

Apache Hadoop Core

http://hadoop.apache.org/

Using Pig with Hadoop

http://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-pig/

Mandatory: Create storage accounts for data repositories for HDInsight. Also, be sure to deprovision your clusters when not in use because the cost for running thousands of cores can add up quickly.

Recommended: Know the type of data you have in your system, and get a good sense of which actions are most important for your workloads. This will help guide which components in HDInsight can be leveraged to take best advantage of the platform services.

Optional: Spend some time checking what the open-source communities have already built that you can use with much less effort than writing from scratch.

Azure Machine Learning

Machine learning and data science is an exciting technology in today's market. Unlocking the insights that our data holds is key to getting the competitive advantage that most companies need for success.

Although the language around neural networks and machine learning has been in place for decades, only recently has computing power been able to provide the computational performance required to run these algorithms at large scale. This service allows modeling of data and algorithms, and it provides a low bar for entry; users can start simply with a web browser. Azure Machine Learning also provides a tool set called Azure Machine Learning Studio.

This service offering turns algorithms and code from languages (for example, R and Python) into services that can target users' data. Users upload more data, and it can be compared and processed against models created by data scientists. Machine Learning also provides training with existing data to feed the models.

This offering lowers the bar for getting the most out of a predictive analytics system. Historically, developing the models and algorithms required a very different skill set than exposing the results via the web or services. Machine Learning marries these skills so that teams can focus on the underlying business logic.

Feature References

Machine learning overview

https://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-machine-learning/

Azure for Research

http://research.microsoft.com/en-us/projects/azure/default.aspx

Machine Learning Studio

http://azure.microsoft.com/en-us/documentation/articles/machine-learning-what-is-ml-studio/

Publishing the API

http://azure.microsoft.com/en-us/documentation/articles/machine-learning-publish-a-machine-learning-web-service/

Mandatory: Bring your R and Python code libraries, and understand how to leverage Machine Learning Studio to provide a streamlined development experience.

Recommended: Partition your logic to create consumable services by using the platform services of Machine Learning.

Optional: Explore what the data science community has already created and work to extend or enhance these to speed your development effort and time.

High-Performance Computing

Azure provides high-performance computing (HPC) in the form of high-performance virtual machines. These virtual machines are tailored to support this specific computing need via features such as a back-end network with MPI latency under three microseconds and up to 32 Gbps of throughput. The back-end network leverages RDMA to enable scaling workloads beyond the typical limits (even for cloud platforms) to thousands of cores.

Azure supports high-performance configurations of up to 500 GB of memory, 6.9 TB of SSD storage, and up to 32 CPU cores per virtual machine. When combined with the Microsoft HPC Pack, a system can be architected to run on-premises in a Windows compute cluster and extend to Azure as capacity demands dictate. This also allows organizations to run HPC workloads wholly in Azure if desired.

Combined with a rich ecosystem of applications, libraries, and tools, Azure provides a premier platform for high-performance computing.
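
The programming model that these virtual machines accelerate is standard MPI. As a minimal sketch (assuming the mpi4py package and an MPI runtime are installed on the cluster nodes), the following computes a partial sum on every rank and reduces the results on rank 0; it would typically be launched with mpiexec.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's ID within the job
    size = comm.Get_size()   # total number of processes in the job

    # Each rank sums a strided slice of the series 1/i^2 (~ pi^2 / 6).
    partial = sum(1.0 / (i * i) for i in range(rank + 1, 1000000, size))

    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum of 1/i^2 =", total)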

Feature References

HPC overview

https://msdn.microsoft.com/en-us/library/azure/dn482130.aspx

Cluster pack for Azure IaaS

https://msdn.microsoft.com/en-us/library/azure/dn518135.aspx

Running MPI apps

https://msdn.microsoft.com/en-us/library/azure/dn592104.aspx

Hybrid cluster

http://azure.microsoft.com/en-us/documentation/articles/cloud-services-setup-hybrid-hpcpack-cluster/

Mandatory: Determine a strategy for on-premises, cloud, and hybrid clustering for HPC workloads.

Recommended: Scale the application resources dynamically so that extreme-size virtual machines are used only when it makes sense.

Optional: Make changes to the applications to decouple the tiers, taking advantage of features such as queuing to allow independent compute clusters to scale.

Content Delivery Network

A key requirement for applications with global reach is delivering content to customers as quickly as possible. One of the industry-standard approaches for distributing static content (typically images and video) is a content delivery network (CDN).

Although this type of service has been available for years, the Azure platform integrates the CDN with existing data in Azure Blob storage. This integration enables hosting applications for global reach, and it makes a CDN easier and less expensive to implement.

Typically, CDNs are used for static content, such as media, images, or documents, that is read more often than it is written. With the end goal of pushing content globally, this integration with Azure results in faster response times for clients based on their location. It also facilitates automatic redundancy and replication of content.
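
One common integration pattern is to rewrite static asset URLs so they resolve against the CDN endpoint instead of the local web root. The following Python sketch shows the idea; the CDN endpoint name and version value are hypothetical, and the version query string anticipates the cache-variation guidance below.

    # Hypothetical helper: serve static assets from the CDN endpoint rather
    # than the local web root. The version parameter means a new deployment
    # effectively invalidates previously cached copies.
    CDN_ROOT = "https://az123456.vo.msecnd.net"   # assumed CDN endpoint
    ASSET_VERSION = "2015-06-01"                  # bumped on each deployment

    def cdn_url(local_path):
        return "{0}/{1}?v={2}".format(CDN_ROOT, local_path.lstrip("/"), ASSET_VERSION)

    print(cdn_url("/images/logo.png"))
    # https://az123456.vo.msecnd.net/images/logo.png?v=2015-06-01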

Feature References

CDN overview

http://azure.microsoft.com/en-us/documentation/articles/cdn-overview/

POP locations for CDN in Azure

https://azure.microsoft.com/en-us/documentation/articles/cdn-pop-locations/

Integration of CDN

http://azure.microsoft.com/en-us/documentation/articles/cdn-serve-content-from-cdn-in-your-web-application/

Mandatory: Determine the regions to target for the CDN, and update the application to use the root URI of the CDN rather than the local content path.

Recommended: Leverage parameters to vary the caching characteristics and lifetime of the content cache.

Optional: Map the CDN content to a custom domain name.

Redis Cache

Application-level caching can be achieved with a variety of products, some from Microsoft and others from external vendors. There are multiple types of caching options in Azure, including Managed Cache Service, In-Role Cache, and Redis Cache.

Redis Cache is a cache-as-a-service offering for the Azure platform. This means that the Azure platform manages the underlying infrastructure to host the caching servers. From an application point of view, the service can be accessed via the same Redis clients that have been in use since Redis was created. These vary based on platforms, and they are available for most of the popular selections (for example, Java, Node, and .NET).

Redis goes beyond simple key/value pairs: it can cache entire data structures, such as collections and sets. Redis also supports non-blocking first synchronization and automatic reconnection, which enable cache replication to increase uptime.
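
Because the service speaks the standard Redis protocol, any existing Redis client works. A minimal sketch with the redis-py client follows; the cache host and key values are placeholders, and Azure Redis Cache accepts SSL connections on port 6380.

    import redis  # the standard redis-py client

    # Placeholder cache host and access key; the SSL endpoint is port 6380.
    r = redis.StrictRedis(host="mycache.redis.cache.windows.net",
                          port=6380,
                          password="<access-key>",
                          ssl=True)

    # Simple key/value usage with a one-hour expiry.
    r.setex("session:42", 3600, "serialized-session-state")

    # Beyond key/value pairs: cache a list structure directly.
    r.rpush("recent:products", "sku-001", "sku-002", "sku-003")
    print(r.lrange("recent:products", 0, -1))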

Feature References

Redis overview

http://azure.microsoft.com/en-us/services/cache/

Caching data in Redis

https://msdn.microsoft.com/library/azure/dn690521.aspx

Management in Azure Portal

https://msdn.microsoft.com/library/azure/dn793612.aspx

Cache planning

https://msdn.microsoft.com/library/azure/dn762132.aspx

Mandatory: Understand which tier of service will be required and implement the Redis client in the application.

Recommended: Use the advanced structure caching options with Redis to simplify the application caching code.

Optional: Set up policies for cache rejection, lifetime, and so on.

Service Bus

Azure Service Bus is one of the core Azure services, and it provides a high-performance, durable messaging service. It is actually a bit more than that: Service Bus offers both queuing and relay services.

Service Bus queues provide an option to decouple processing from the request pipeline. This type of architecture is very important when migrating workloads to the cloud, because loosely coupled applications can scale and are more fault resilient.

Service Bus can use a variety of models, from simple queue-based storage to topics (which target and partition messages in a namespace). You can even use Event Hubs on top of Service Bus to service very large client bases, where input to Service Bus arrives at rates of thousands to millions of messages in rapid succession.
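
A decoupled producer/consumer exchange over a Service Bus queue might look like the following Python sketch, which uses the Service Bus client from the Azure SDK for Python available at the time of writing; the namespace, key, queue name, and process function are placeholders.

    from azure.servicebus import ServiceBusService, Message

    # Placeholder namespace and access key.
    sbs = ServiceBusService(service_namespace="mynamespace",
                            shared_access_key_name="RootManageSharedAccessKey",
                            shared_access_key_value="<key>")
    sbs.create_queue("orders")

    # Producer: post work and return immediately, decoupled from processing.
    sbs.send_queue_message("orders", Message(b"order-12345"))

    # Consumer: peek-lock so the message is removed only after success.
    msg = sbs.receive_queue_message("orders", peek_lock=True)
    if msg.body is not None:
        process(msg.body)  # hypothetical handler for the message payload
        msg.delete()       # complete (remove) the locked message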

Feature References

Service Bus overview

https://msdn.microsoft.com/en-us/library/ee732537.aspx

Sample scenarios

https://msdn.microsoft.com/en-us/library/dn194201.aspx

Event Hubs

https://msdn.microsoft.com/en-us/library/dn789973.aspx

Application architecture

http://azure.microsoft.com/en-us/documentation/articles/service-bus-build-reliable-and-elastic-cloud-apps/

Mandatory: Determine which model to use when storing messages in Service Bus, based on transaction, lifetimes, and message rates.

Recommended: Modify applications to provide transient fault handling to complement decoupling message posting from message processing.

Optional: Leverage Event Hubs to handle large-scale intake of Service Bus messages.

API Management

In addition to providing business logic, a key consideration when implementing web-service workloads is to allow for features such as throttling, authentication, and partitioning of services. Historically, developers were tasked with building the code for this infrastructure, which became the framework for the web services that were deployed.

API Management Service was designed to accommodate this need. It provides this infrastructure with very little effort. Developers can concentrate on the business logic of the web services instead of how they are deployed. This also allows deploying the underlying web services on different servers and different technologies, as needed.

API Management Service can leverage existing on-premises web services in addition to cloud-deployed services. Capabilities such as throttling, rate limits, and service quotas can be applied at a central point, similar to load balancing, based on rules established in the API Management service. This allows consolidating services from multiple back ends into a single entry point for service consumers.
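
From a consumer's perspective, calling a service through API Management is an ordinary HTTP request that carries a subscription key; the gateway applies the configured policies before forwarding the call. A sketch follows, with a hypothetical gateway URL and key; a 429 response indicates the rate-limit policy was triggered.

    import urllib.error
    import urllib.request

    # Hypothetical gateway URL and subscription key. The key travels in the
    # Ocp-Apim-Subscription-Key header; API Management enforces rate-limit
    # and quota policies centrally before the call reaches the back end.
    request = urllib.request.Request(
        "https://contoso.azure-api.net/orders/v1/orders/12345",
        headers={"Ocp-Apim-Subscription-Key": "<subscription-key>"})

    try:
        print(urllib.request.urlopen(request).read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 429:  # throttled by the rate-limit policy
            print("Throttled; retry after", err.headers.get("Retry-After"))
        else:
            raise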

Feature References

API Management overview

http://azure.microsoft.com/en-us/documentation/articles/api-management-key-concepts/

Getting started

http://azure.microsoft.com/en-us/documentation/articles/api-management-get-started/

API management for developers

https://msdn.microsoft.com/en-us/library/azure/dn776327.aspx

Securing APIs

http://azure.microsoft.com/en-us/documentation/articles/api-management-howto-mutual-certificates/

Mandatory: Configure policies for services and profiles for existing web services to use API Management.

Recommended: Protect web services with API Management rate limits and quota policies.

Optional: Customize the developer portal to allow for developer registration and subscription models.

Azure Search

Probably one of the most common components in applications is a platform or infrastructure that supports searching application data. There are quite a few products on the market from Microsoft and third parties. Azure Search was created to provide a search-as-a-service offering in Azure.

While Azure Search does not offer a crawler to index application data sources, it does provide the infrastructure to ingest index data, and it provides interfaces for the actual search functions. This service is targeted at developers; it is not directly customer facing. At its core, it is a web service that follows the model of a REST-based interface for connected applications.

The index schemas are expressed in JavaScript Object Notation (JSON). Essentially, the index contains a list of fields and associated attributes (a sketch of creating an index follows this list). These attributes are:

  • Name: Describes the data this field contains.
  • Type: Indicates the type of data in this field. Some of the options are String, Int32, Double, Boolean.
  • Searchable: Determines whether a user's search request can access this field.
  • Suggestions: Determines whether Azure Search can provide suggestions for this field. If this is set, an application can call Azure Search regularly while the user is typing in the Search box to get suggestions. These suggestions are added to the index by the people who own that index—they're not created automatically by the search service.
  • Sortable: Indicates that search results can be sorted by this field. Some fields, such as a string containing a paragraph of text, might not allow this because sorting on a paragraph probably wouldn't make much sense.
  • Retrievable: Indicates whether this field can be returned in the search results.
  • Filterable: Indicates that this field can be used as a filter. For example, if a user wants to search for "high heels," the field that contains these search terms must be marked as filterable. This lets Azure Search return only the rows in the index that contain "high heels" in that field.
  • Facetable: Indicates whether a search request can return the number of items in the index with a specific characteristic. An application can also request the number of items within a specific range.
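
A sketch of creating such an index through the REST interface follows; the service name, admin key, API version, and field definitions are illustrative, and exact property names can vary by API version.

    import json
    import urllib.request

    service = "myservice"     # placeholder search service name
    api_key = "<admin-key>"   # placeholder admin key

    # Field attributes mirror the list above: searchable, sortable,
    # retrievable, filterable, and facetable per field.
    index = {
        "name": "products",
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True,
             "searchable": False, "retrievable": True,
             "filterable": False, "sortable": False, "facetable": False},
            {"name": "title", "type": "Edm.String",
             "searchable": True, "retrievable": True,
             "filterable": True, "sortable": False, "facetable": False},
            {"name": "price", "type": "Edm.Double",
             "searchable": False, "retrievable": True,
             "filterable": True, "sortable": True, "facetable": True}
        ]
    }

    url = ("https://{0}.search.windows.net/indexes/products"
           "?api-version=2015-02-28").format(service)
    request = urllib.request.Request(
        url,
        data=json.dumps(index).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT")
    urllib.request.urlopen(request)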

Feature References

Azure Search Overview

https://msdn.microsoft.com/library/azure/dn798933.aspx

Getting started

http://azure.microsoft.com/en-us/documentation/articles/fundamentals-azure-search-chappell/

Azure Search API

https://msdn.microsoft.com/en-us/library/azure/dn798935.aspx

Creating indexes

https://msdn.microsoft.com/en-us/library/azure/dn798941.aspx

Mandatory: Construct and update indexes for Azure Search consumption via back-end services. PaaS-based worker roles work well for these types of jobs.

Recommended: Add additional attributes to the index to support advanced features, such as automatic suggestions.

Optional: Build monitoring data integration to existing monitors to ensure storage or indexes don't exceed the limits for the service.

Design Guidance

For more information about applicable Azure design patterns, see Azure Search Tier (Azure Architecture Patterns).

PaaS Considerations

As with IaaS, there are several considerations when deploying solutions within Microsoft Azure, specifically when targeting PaaS models. Deployment considerations include deployment methods, load balancing, resiliency, security, and networking. This section covers these areas at a high level.

Deployment Methods

When running services in PaaS, it's important to understand and configure the services such that upgrades to the application and upgrades as part of the Azure platform do not result in outages or downtime to the application.

Models for the application lifecycle vary from simple to complex. Some of the most important tradeoffs are detailed in the following table:

Considerations

Decision Points

Upgrade domains

Services deployed to Azure (specifically PaaS web and worker roles) include the concept of upgrade domains. It's important to configure services to use multiple upgrade domains to avoid unnecessary outages when new deployments or upgrades to the application or service are initiated.

Deployment slots

Deployment slots can be used to test new versions or upgrades without affecting the production application. A model that uses staging slots before releasing to production can enable better testing and help avoid downtime.

Web Deploy

Web Deploy is a way to deploy services to cloud services in Azure. Although it is simple, user interaction is typically required with this model. This makes it a good option for developers and smaller apps, but larger apps might require more governance and control over deployments.

Continuous integration

Continuous integration is a great option for larger applications and organizations that require automating deployments. This allows gated check-ins (approval) and continuous check-ins (triggered).

Load Balancing

Customers who deploy applications as PaaS in Azure must consider load balancing a core part of the application. This is a must for applications hosted in PaaS because even if hardware failures never happened (which is unlikely), servers still need to be upgraded (guest and host upgrades), which means these instances will be moved at some point.

The Azure fabric will ensure that it doesn't take all the server instances down at one time, but this requires the use of at least two instances (fault and upgrade domains allow the fabric to operate in this way).

Keep in mind that there are two levels of load balancing available for Azure PaaS services:

  • DNS level: Load balancing for traffic to different cloud services located in different datacenters, to different Azure websites located in different datacenters, or to external endpoints. This is done with Traffic Manager and the round robin load balancing method.
  • Network level: Load balancing of incoming Internet traffic to different virtual machines of a cloud service, or load balancing of traffic between virtual machines in a cloud service or virtual network. This is done with the Azure Load Balancer.

Feature References

Load Balancing for Azure Services

http://azure.microsoft.com/documentation/articles/load-balancer-overview

About Traffic Manager Load Balancing Methods

http://azure.microsoft.com/documentation/articles/traffic-manager-load-balancing-methods

Internal load balancing

http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview

Networking

When deciding to provision PaaS instances that need to communicate with other servers or services in Azure, a virtual network is required. Some areas of consideration include:

Considerations

Decision Points

Name resolution

When you deploy virtual machines and cloud services to a virtual network, you can use Azure-provided name resolution or your own DNS solution, depending on your name resolution requirements. For information about name resolution options, see Name Resolution (DNS).

Enhanced security and isolation

Because each virtual network is run as an overlay, only virtual machines and services that are part of the same network can access each other. Services outside the virtual network have no way to identify or connect to services hosted within virtual networks. This provides an added layer of isolation to your services.

Extended connectivity boundary

The virtual network extends the connectivity boundary from a single service to the virtual network boundary. You can create several cloud services and virtual machines within a single virtual network and have them communicate with each other without having to go through the Internet. You can also set up services that use a common backend database tier or use a shared management service.

Extend your on-premises network to the cloud

You can join virtual machines in Azure to your domain running on-premises. You can access and leverage all on-premises investments related to monitoring and identity for your services hosted in Azure.

Use persistent public IP addresses

Cloud services within a virtual network have a stable public VIP address. You can also choose to configure a cloud service, when you create it, to use a reserved public IP address from the address range. This ensures that your instances retain their public IP address even when moved or restarted. See Reserved IP Overview.

There are two models for network configurations for Azure cloud services: cloud-only and cross-premises virtual network configurations:

  • Cloud-only virtual network configurations are virtual networks that don't use a virtual network gateway to connect back to your on-premises network or directly to another virtual network in Azure. They aren't really a different type of virtual network, but rather, they are a way to configure a virtual network without configuring cross-premises connectivity.
    You connect to the virtual machines and cloud services from the endpoints, rather than through a VPN connection.
  • Cross-premises connections offer an enormous amount of flexibility. You can create multisite configurations, virtual network to virtual network configurations, ExpressRoute connections, and combinations of multiple configuration types. If you are extending your on-premises network to the cloud, this is the way to do it.
    Most cross-premises connections involve using a VPN device to create a secure connection to your Azure virtual network. Or if you prefer, you can create an ExpressRoute direct connection to Azure through your network service provider or exchange provider and bypass the public Internet altogether.

Feature References

How to create a virtual network

https://azure.microsoft.com/documentation/articles/virtual-networks-create-virtual-network/

Limitations

Although the capabilities of Azure PaaS cloud services are quite comprehensive, some native limitations exist that organizations should understand prior to deploying solutions in Azure. These include:

Consideration: Auto-scaling

Impact: The application environment does not automatically increase or decrease role instances for increased or decreased loads.

  • Someone needs to manually monitor the environment.
  • A sudden increase in load impacts performance.

Workaround: Utilize monitoring and automation capabilities such as the Azure Monitoring Agent and Azure Automation to dynamically scale and deploy application code to cloud service instances in the environment.

Consideration: Load balancing

Impact: Application instances are not load balanced by default.

  • It does not allow for elasticity of the application environment.
  • A sudden increase in load impacts performance.

Workaround: After the cloud service is provisioned, create an Internal Load Balancer and associate it with the cloud service endpoint.

Consideration: Density

Impact: The total number of cloud services per subscription is 20.

Workaround: Leverage multiple subscriptions to provide the proper level of segmentation.

Management Considerations for Azure PaaS Cloud Services

Azure Diagnostics are Azure extensions that enable you to collect diagnostic telemetry data from a worker role, web role, or virtual machine running in Azure. The telemetry data is stored in an Azure Storage account and can be used for debugging and troubleshooting, measuring performance, monitoring resource usage, traffic analysis, capacity planning, and auditing.

Azure Diagnostics can collect the following types of telemetry:

Data Source

Description

IIS logs

Information about IIS websites.

Azure Diagnostics infrastructure logs

Information about Azure Diagnostics.

IIS failed request logs

Information about failed requests to an IIS site or application.

Windows Event logs

Information sent to the Windows event logging system.

Performance counters

Operating system and custom performance counters.

Crash dumps

Information about the state of the process in the event of an application crash.

Custom error logs

Logs created by your application or service.

.NET EventSource

Events generated by your code using the .NET EventSource class.

Manifest-based ETW

ETW events generated by any process.

Microsoft Azure Networking

Microsoft Azure networking leverages a combination of software-defined networking within the Azure fabric infrastructure and physical networking at the edge where customers interface with Azure. Within the Azure fabric infrastructure, there is the concept of virtual networks, subnets within the virtual networks, and the network gateways that allow connectivity between virtual networks and customer networks.

At the edge of the Azure fabric infrastructure, enterprise customers typically use physical devices to provide communications from the on-premises enterprise datacenter environments, where small or medium businesses might use virtual devices or only connect from the client computer to the Azure environment.

This section covers the concepts and planning guidance required for networking infrastructures that interface with and exist within the Microsoft Azure platform.

Cloud Service Provider and Enterprise Customer Connectivity

Connecting to Azure can be accomplished directly by enterprise customers or by using a cloud service provider as the interface. When customers connect directly, they create subscriptions, establish connections, and are responsible for managing the private network interfaces and all aspects of establishing services within Azure. When customers leverage a cloud service provider, they offload various aspects of the subscription, networking, identity, and management of the Azure environment to the cloud service provider.

From a networking perspective, cloud service providers offer two types of network connectivity to Azure for customers.

  • The customer can choose to "connect through" the CSP to connect to Azure. In this model, the CSP creates the customer subscription, connects their datacenter to the subscription, and then the customer connects to the CSP network to access Azure and the subscription's resources.

  • The customer can also choose a "connect to" model. The CSP creates the customer's subscription, but the customer is responsible for connecting their datacenter to Azure. The CSP connects to the customer network to obtain access to the Azure subscription and assists the customer in management aspects.

Azure Resource Management versus Service Management

When leveraging Azure networking technologies, a key consideration is which capabilities can be implemented, depending on the API and portal approach that you are targeting. Before a capability can be used, the implementing API (ARM or ASM) has to enable that capability. Most common Azure networking technologies are available using the ASM API; however, advanced capabilities such as logging and diagnostics data are implemented only in the ARM API. While the transition between the ASM and ARM APIs continues, it is important to verify that a specific area of functionality is available in either API before committing to a path of implementation.

For customer-managed environments, the customer has the option of choosing whether to use the existing ASM API and migrate to the ARM API when required networking capabilities are available in ARM. Conversely, they can choose to immediately adopt the ARM API understanding that there are certain capabilities they cannot leverage until they are made available.

For CSP-managed scenarios, the networking capabilities are limited to the ARM API due to the requirement for RBAC to separate management scope between the provider and the customer.

Virtual Networks

Azure Virtual Networks provide a key building block for establishing virtual private networks. Virtual networks can be used to allow isolated network communication within the Azure environment or establish cross-premises network communication between an organization's network infrastructure and Azure. By default, when virtual machines are created and connected to Azure Virtual Network, they are allowed to route to any subnet within the virtual network, and outbound access to the Internet is provided by Azure's Internet connection.

A fundamental first step in creating services within Microsoft Azure is establishing a Virtual Network. To establish a virtual private network within Azure, you must create a minimum of one virtual network. Each virtual network must contain an IP address space and a minimum of one subnet that leverages all or part of the virtual network address space. To establish remote network communications to on-premises or other virtual networks, a gateway subnet must be allocated for the virtual network and a virtual network gateway must be added to it.

To enable cross-premises connectivity, a virtual network must have an attached virtual network gateway (often referred to simply as a gateway). Currently, there are three types of gateways that can be deployed:

  • Static routing gateway (basic, standard, and high performance)
  • Dynamic routing gateway (basic, standard, and high performance)
  • ExpressRoute gateway

The type of gateway determines the cross-premises connectivity capabilities, the performance, and the features that are offered. Static and dynamic gateways are used when establishing Point-to-Site (P2S) and Site-to-Site (S2S) VPN connections where the cross-premises connectivity leverages the Internet for the transport path. ExpressRoute gateways are designed for high-speed, private, cross-premises connectivity where the traffic flows across dedicated circuits and not the Internet.

Static gateways are for establishing low-cost connections to a single virtual network in Azure. Dynamic gateways are used to establish low-cost connections to an on-premises environment or to connect multiple virtual networks for routing purposes in Azure. ExpressRoute gateways are used for connecting on-premises environments to Azure over high-speed private connections.

Feature References

Azure Virtual Network Overview

https://msdn.microsoft.com/library/azure/jj156007.aspx

Virtual Network FAQ

https://msdn.microsoft.com/en-us/library/azure/dn133803.aspx

Virtual Network Cross Premises Connectivity

https://msdn.microsoft.com/en-us/library/azure/dn133798.aspx

VPN Devices and Gateway Information

https://msdn.microsoft.com/en-us/library/azure/jj156075.aspx

ExpressRoute

http://azure.microsoft.com/en-us/documentation/services/expressroute/

Mandatory: Azure solutions must contain a minimum of one virtual network to establish network communications within Azure. A virtual network must contain a minimum of one subnet for virtual machine placement, plus one gateway subnet if cross-premises network connectivity is required.

Proper network address space planning is required when implementing virtual networks and subnets.

Recommended: Azure solutions should use the dynamic routing or ExpressRoute gateway versus the static routing gateway.

Design Guidance

When you design Virtual Networks, consider the following:

Capability Considerations

Capability Decision Points

RBAC

The Virtual Network Contributor resource role allows the ability to manage the entire virtual network and subnets. The Virtual Machine Contributor role can be used to grant the ability to use a subnet but not manage it.

CSP management

CSP scenarios might drive additional virtual networks to allow customer separate management capabilities.

Core limitations

Ensure that the virtual network design supports the number of virtual machines that are desired.

Virtual Network Gateways

Virtual network gateways provide connectivity from on-premises networks to Azure and between Azure virtual networks. Types of gateway connection technologies include Point-to-Site (P2S), Site-to-Site (S2S), and ExpressRoute. This section covers gateways in the context of Site-to-Site (S2S) and ExpressRoute connections.

For Site-to-Site gateways, an IPsec/IKE VPN tunnel is created between the virtual networks and the on-premises sites by using Internet Key Exchange (IKE) protocol handshakes. For ExpressRoute, the gateways advertise the prefixes by using the Border Gateway Protocol (BGP) in your virtual networks via the peering circuits. The gateways also forward packets from your ExpressRoute circuits to your virtual machines inside your virtual networks.

Currently there are two types of S2S virtual private network connections (VPNs) that require the use of two types of gateways: static routing and dynamic routing. A static routing gateway uses policy-based VPNs. Policy-based VPNs encrypt and route packets through an interface based on a customer-defined policy. The policy is usually defined as an access list. Static routing VPNs require a static routing VPN gateway. Although they are effective for single virtual network connections, static gateways are limited to a single virtual network per VPN connection.

In contrast, dynamic routing gateways use route-based VPNs. Route-based VPNs depend on a tunnel interface specifically created for forwarding packets. Any packet arriving at the tunnel interface is forwarded through the VPN connection. Dynamic routing VPNs require a dynamic routing VPN gateway.

From a performance perspective there are three types of dynamic routing gateways: Basic, Standard, and High-Performance. The differences between these gateway types are outlined in the following table.

Type: Basic Dynamic Routing Gateway

  • S2S connectivity: Route-based VPN configuration
  • Authentication method: Pre-shared key
  • Maximum number of S2S vNet connections: 10
  • Maximum number of P2S connections: 128
  • S2S VPN throughput: ~100 Mbps
  • ExpressRoute throughput: ~500 Mbps

Type: Standard Dynamic Routing Gateway

  • S2S connectivity: Route-based VPN configuration
  • Authentication method: Pre-shared key
  • Maximum number of S2S vNet connections: 10
  • Maximum number of P2S connections: 128
  • S2S VPN throughput: ~100 Mbps
  • ExpressRoute throughput: ~1000 Mbps

Type: High-Performance Dynamic Routing Gateway

  • S2S connectivity: Route-based VPN configuration
  • Authentication method: Pre-shared key
  • Maximum number of S2S vNet connections: 30
  • Maximum number of P2S connections: 128
  • S2S VPN throughput: ~200 Mbps
  • ExpressRoute throughput: ~2 Gbps

Creating and connecting a gateway for a Virtual Network is a multiple step process, and it requires certain configurations to be complete. A high-level set of steps is outlined here:

  1. Define a virtual network gateway subnet within the available IP address space within the virtual network.
  2. Create the virtual network gateway and allow it to provision.
  3. Establish the connection between locations.
    1. For S2S connections, you must download the supported VPN device configuration script (or instructions) and establish the connection. For more information, see About VPN Devices and Gateways for Virtual Network Connectivity.
    2. For virtual network-to-virtual network connections, you must define and register shared keys to establish the connection.
    3. For ExpressRoute connections, you must define a service key to establish connections.
  4. Initiate the connection handshake to connect the gateways.

After you have a successful gateway connection, the gateway status shows as active within the virtual network dashboard in the Azure portal. Note that for ExpressRoute, S2S, and virtual network-to-virtual network connections, the portal also provides the gateway connection status, the Data-In and Data-Out traffic amounts, and the gateway address.

The configuration of an S2S or P2S virtual network gateway can be performed within the portal, but at this time, ExpressRoute gateways can only be provisioned and configured by using the ExpressRoute PowerShell module.

Note that it is possible to resize a gateway (between Basic, Standard, and High-Performance) with the Resize-AzureVNetGateway cmdlet. This allows organizations to start at one class of service and expand their capabilities as their requirements grow. Some downtime is required during the resizing process, but no other configuration changes are required. Resizing operations include increasing or decreasing between a Basic, Standard, and High-Performance gateway.

Feature References

Configure a Cross-Premises Site-to-Site connection to an Azure Virtual Network

http://azure.microsoft.com/documentation/articles/vpn-gateway-site-to-site-create/

Configure a Virtual Network Gateway in the Management Portal

https://msdn.microsoft.com/en-us/library/azure/jj156210.aspx

Connect Multiple On-premises Sites to a Virtual Network

https://msdn.microsoft.com/en-us/library/azure/dn690124.aspx

Configure a Virtual Network and Gateway for ExpressRoute

http://azure.microsoft.com/documentation/articles/expressroute-configuring-vnet-gateway/

ExpressRoute Technical Overview

http://azure.microsoft.com/documentation/articles/expressroute-introduction/

ExpressRoute Prerequisites

http://azure.microsoft.com/documentation/articles/expressroute-prerequisites/

Coexistence Gateway

https://azure.microsoft.com/en-us/documentation/articles/expressroute-coexist/

Mandatory:

  • Connecting a virtual network to an on-premises environment or to another virtual network requires the creation of a virtual network gateway.
  • Only a single gateway can be attached to a virtual network.

Recommended:

  • Because gateways take time to be provisioned and they must be accessible to establish the handshake, create the gateway as soon as possible after the virtual network is created.
  • Use a high-performance gateway for ExpressRoute connections of 1 Gbps or higher.
  • Use a high-performance dynamic routing gateway in S2S scenarios if more than 10 virtual network connections are required per gateway or if more than 100 Mbps of throughput is required.
  • Consider using a coexistence gateway as the default gateway for all ExpressRoute circuits.

Design Guidance

For more information about Applicable Azure design patterns, see Hybrid Networking (Azure Architecture Patterns).

When you design virtual network gateways, consider the following:

Capability Considerations

Capability Decision Points

Gateway provisioning performance

Note that when you create a gateway, it can take anywhere from 15-30 minutes for the gateway to be available via the provisioning process.

Gateway limits

A virtual network can have a maximum of a single gateway attached to it.

Static routing gateway limits

Multi-site, virtual network to virtual network, and Point-to-Site gateway connection technologies are not supported with static routing VPN gateways.

Co-existence gateway

A new co-existence gateway exists that combines both the BGP and IKE protocols (ExpressRoute and S2S connections). With this gateway, it is possible to support ExpressRoute and S2S VPN connections to a single virtual network.

The coexistence gateway supports two modes of operation: failover and coexistence.

  • In failover mode, the ExpressRoute side handles all traffic until a failure occurs and then the S2S gateway takes over.
  • In coexistence mode, the gateway allows a customer to leverage a high-speed ExpressRoute connection to provide ingress and egress to Azure while also providing virtual network-to-virtual network connections across the Azure network fabric.

Gateway performance remains the same for the connection types.

Cisco ASA VPN Device

The most common VPN device that customers use is a Cisco ASA device. This device does not currently support dynamic routing, and Azure does not support multiple policy configurations with a static routing gateway, so only a single virtual network can be connected to a Cisco ASA VPN device.

CSP Limits

CSP scenarios currently only support S2S VPN gateways due to limitations of the ARM API.

Virtual Network Gateway Address Requirements

In the previous section, we discussed the three types of gateways available today in Azure. Gateways are used to connect on-premises environments to Azure and to enable virtual network-to-virtual network connectivity. Gateways are created at the virtual network level, and a virtual network can have only a single gateway connected to it.

Regardless of which gateway type you create, you must have a gateway subnet defined for the virtual network. The gateway subnet has different address space requirements based on the type of gateway created.

A static routing gateway or a dynamic routing gateway must have a subnet with a /29 CIDR definition. When the gateway is connected, it actually takes the /29 segment and breaks it into two /30 segments to provide redundant connections as part of the Site-to-Site VPN. The address requirements are the same for a standard and a high-performance static or dynamic routing gateway.

An ExpressRoute gateway must have a subnet with a /28 CIDR definition. When the ExpressRoute gateway is established, it breaks the /28 into two /29 segments that are used to provide the redundant connections as part of the ExpressRoute circuit establishment.
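
The split can be visualized with Python's standard ipaddress module; the address ranges below are illustrative only.

    import ipaddress

    # A /29 gateway subnet splits into two /30s (routing gateway)...
    gw_subnet = ipaddress.ip_network("10.0.255.0/29")
    print(list(gw_subnet.subnets(new_prefix=30)))
    # [IPv4Network('10.0.255.0/30'), IPv4Network('10.0.255.4/30')]

    # ...and a /28 splits into two /29s (ExpressRoute gateway).
    er_subnet = ipaddress.ip_network("10.0.255.0/28")
    print(list(er_subnet.subnets(new_prefix=29)))
    # [IPv4Network('10.0.255.0/29'), IPv4Network('10.0.255.8/29')]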

Mandatory:

  • An Azure gateway requires the creation of a gateway subnet in a virtual network to create the gateway. The gateway subnet must meet the address requirements based on the type of connection the gateway will support (S2S or ExpressRoute).
  • Address prefixes used in Azure or for Azure connectivity must be non-overlapping.
  • CSP scenarios will require address planning with the customer to ensure non-overlapping address prefixes are used.

Recommended:

  • Develop an address prefix plan for Azure, and create a spreadsheet that pre-allocates address space for the virtual network architecture.
  • Plan for production, pre-production, and non-production environments.

Virtual Network to Virtual Network Routing

Virtual network-to-virtual network routing allows establishing routing paths across the Azure network fabric without having to send the traffic on-premises. Establishing virtual network-to-virtual network routing requires creating an IPsec tunnel and dynamic routing across that segment.

The static and dynamic routing gateways use the IKE protocol to establish an IPsec tunnel and route traffic, but only the dynamic routing gateway supports dynamic routing. Based on those requirements, the static routing gateway does not support virtual network-to-virtual network routing. Every virtual network-to-virtual network segment requires a dynamic routing gateway on both ends.

The ExpressRoute gateway leverages the BGP routing protocol to establish communication and route traffic, and therefore it does not meet the requirements to establish a virtual network-to-virtual network connection. When using ExpressRoute, a separate connection to each virtual network must be established, and traffic between virtual machines on different virtual networks must route from Azure to the edge of the ExpressRoute circuit and back into Azure.

When you establish virtual network routing, by default a virtual network allows traffic to flow across only a single virtual network-to-virtual network gateway connection. This is an isolation feature that forces establishing multiple-hop routing definitions to enable broader communications.

The following section discusses the different options of connecting virtual networks together to support routing scenarios. Note that it is possible to provision ASM and ARM versions of virtual networks. The process for connecting both versions is slightly different.

Multiple Virtual Network Routing Configuration

Each gateway can establish a limited number of connections to other gateways. The connection model between gateways dictates how far you can route within Azure. There are three distinct models that you can leverage to connect multiple virtual networks to one another:

  • Mesh
  • Hub and Spoke
  • Daisy-Chain

By default, in the Mesh approach, every virtual network can talk to every other virtual network with a single hop. Therefore, this approach does not require you to define multiple-hop routing. The challenge with this approach is the rapid consumption of gateway connections, which limits the size of the virtual network routing capability.

By default, in the Hub and Spoke approach, a virtual machine on vNet1 can communicate with a virtual machine on vNet2, vNet3, vNet4, or vNet5. A virtual machine on vNet2 can talk to virtual machines on vNet1, but not to virtual machines on vNet3, vNet4, or vNet5. This is due to the default single-hop isolation of the virtual network in this configuration.

By default, in the Daisy-Chain approach, a virtual machine on vNet1 can communicate with a virtual machine on vNet2, but not vNet3, vNet4, or vNet5. A virtual machine on vNet2 can talk to virtual machines on vNet1 and vNet3. The same virtual network single-hop isolation applies.
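
The single-hop behavior can be modeled with a simple adjacency check, as in the following Python sketch (the virtual network names and connections are illustrative):

    # Default single-hop isolation: two virtual networks can communicate
    # only if a gateway connection links them directly.
    hub_and_spoke = {("vNet1", "vNet2"), ("vNet1", "vNet3"),
                     ("vNet1", "vNet4"), ("vNet1", "vNet5")}
    daisy_chain = {("vNet1", "vNet2"), ("vNet2", "vNet3"),
                   ("vNet3", "vNet4"), ("vNet4", "vNet5")}

    def can_route(connections, a, b):
        return (a, b) in connections or (b, a) in connections

    print(can_route(hub_and_spoke, "vNet2", "vNet5"))  # False: needs two hops
    print(can_route(daisy_chain, "vNet2", "vNet3"))    # True: direct connection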

Feature References

Connecting ASM virtual networks to ARM virtual networks

https://azure.microsoft.com/en-us/documentation/articles/virtual-networks-arm-asm-s2s/

Mandatory:

  • Multiple hop routing requires selecting a connection model and modifying the default single hop routing.
  • It is possible to establish virtual network to virtual network connections between ASM and ARM virtual networks.

Recommended:

  • Do not use the Mesh connection option because of the limitations on expandability due to gateway connection limits.
  • For higher virtual network gateway connection limits, deploy the high-performance gateway.

Design Guidance

For more information about applicable Azure design patterns, see Hybrid Networking (Azure Architecture Patterns).

Additional implementation guidance and examples are provided in the Appendix of this document.

Virtual Network IP Address Space Planning

A Virtual Network in Azure is an address space container that can have a gateway connected to it to allow communications. As part of the Virtual Network configuration, customers must configure non-overlapping IP address space for their Azure environment.

This IP address space can consist of private IPv4 address ranges (as described in RFC 1918) or public (non-RFC 1918) IPv4 address ranges owned by the organization. Exceptions to public address ranges include the following (a validation sketch follows the list):

  • 224.0.0.0/4 (multicast)
  • 255.255.255.255/32 (broadcast)
  • 127.0.0.0/8 (loopback)
  • 169.254.0.0/16 (link-local)
  • 168.63.129.16/32 (internal DNS)
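
A minimal validation sketch, using Python's standard ipaddress module, checks a candidate prefix against these excluded ranges and against already-allocated prefixes; the candidate and existing prefixes below are illustrative.

    import ipaddress

    # Ranges that cannot be used in a virtual network, per the list above.
    EXCLUDED = [ipaddress.ip_network(n) for n in
                ("224.0.0.0/4", "255.255.255.255/32", "127.0.0.0/8",
                 "169.254.0.0/16", "168.63.129.16/32")]

    def is_usable(prefix, existing=()):
        candidate = ipaddress.ip_network(prefix)
        taken = EXCLUDED + [ipaddress.ip_network(n) for n in existing]
        return not any(candidate.overlaps(n) for n in taken)

    print(is_usable("10.1.0.0/16", existing=["10.0.0.0/16"]))    # True
    print(is_usable("10.0.128.0/17", existing=["10.0.0.0/16"]))  # False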

A virtual network address space can be subdivided into smaller groups of address spaces called subnets. Subnets, not the virtual network itself, are the connection points for virtual machines and specific PaaS roles. The subnets are connected to the virtual network as part of a flat routed network, where traffic that flows through the gateway can reach each subnet.

There are two types of subnets that can be created:

  • Virtual machine subnet
  • Gateway subnet

Virtual machine subnets can have virtual machines, PaaS roles (web and worker), and internal load balancers. Gateway subnets can have connections only with other gateways, provided they use non-overlapping IP address spaces.

Feature References

Non-RFC 1918 space now allowed in a virtual network

http://azure.microsoft.com/en-us/updates/non-rfc-1918-space-is-now-allowed-in-a-virtual-network

About Public IP Address Space and Virtual Network

https://azure.microsoft.com/documentation/articles/virtual-networks-public-ip-within-vnet/

Microsoft Azure Datacenter IP Ranges

http://www.microsoft.com/en-us/download/details.aspx?id=41653

Mandatory:

  • When designing Azure Virtual Network infrastructures, planning for IP address spaces is a required initial step prior to deploying and configuring virtual networks.
  • For CSP "Connect Through" scenarios, the provider will control the IP address space, but must coordinate with the customer to achieve non-overlapping space.
  • For CSP "Connect To" scenarios the customer will control the IP address space.

Design Guidance

When you design virtual network IP address spaces, consider the following:

Capability Considerations

Capability Decision Points

Virtual network and subnet configuration

Although there is a limit on the number of virtual networks that can be placed in a subscription, there is no limit on subnets other than how finely the virtual network address space can be subdivided.

Each subnet has its first three addresses reserved for Azure usage, so the first available address is the fourth. The smallest subnet allowed is a /29, which provides eight addresses; after the network and broadcast addresses and the three Azure-reserved addresses, three remain assignable.

It is possible to have multiple IP address space definitions in a virtual network definition.

Virtual machines and address space planning

Currently, a virtual network can have a total of 2048 virtual machines attached to subnets. By default, every virtual machine has a single network adapter, and therefore, the virtual network space needs a minimum of 2048 IP addresses (plus the three for Azure) if you are going to maximize the density of the virtual network.

Address space considerations

When designing an address space for a virtual network, consider the following:

  • Limits to the number of objects that can consume IP addresses in a virtual machine subnet
  • Requirements of the gateway subnet based on the type of gateway connection

Planning for internal load balancing

Each virtual machine can have multiple network adapters, and if internal load balancers are used, you also need a single IP address for every internal load balancer. So a formula would be:

Virtual network address space = # of Virtual machines + # of additional network adapters + # of internal load balancers + 3

Note that you need to round up this number to a CIDR boundary. For example, if the formula results in a minimum requirement of 8003 addresses, you must round up to the next CIDR boundary of /19, which provides 8190 usable addresses for the virtual network address space (see the sizing sketch after this table).

Duplicate or overlapping IP ranges

One limitation of address space design is that no duplicate IP address ranges can exist in any routed network. This means that you cannot use the same address space for an Azure virtual network or subnet that already exists somewhere else (such as on premises) where the Azure subnets need to route.

CSPs need to ensure that customers do not implement overlapping address spaces in Azure.
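
The sizing formula above can be automated; the following Python sketch finds the smallest CIDR prefix whose usable host count covers the requirement, reproducing the /19 result from the worked example (the input counts are illustrative):

    import math

    def min_vnet_prefix(vms, extra_nics, ilbs):
        # Formula from the table above, plus the three Azure-reserved
        # addresses; +2 accounts for the network and broadcast addresses.
        required = vms + extra_nics + ilbs + 3
        host_bits = math.ceil(math.log2(required + 2))
        return 32 - host_bits

    # 2000 VMs + 4000 extra NICs + 2000 ILBs -> 8003 addresses -> /19
    print(min_vnet_prefix(vms=2000, extra_nics=4000, ilbs=2000))  # 19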

Virtual Network Logging

Currently, virtual network logging is limited to change management (create, modify, and delete) audit logging. In the Azure portal, it is available via Management Services > Operations Logs. You can also use the Azure PowerShell cmdlet Get-AzureSubscriptionIDLog for:

  • virtualNetworks (write, delete)
  • publicIPAddresses (write, delete)
  • networkInterfaces (write, delete)
  • loadBalancers (write, delete)
  • networkSecurityGroups (write, delete)

Data plane and control plane logging is not available at this time.

Design Guidance

Consider the impact of logging virtual network information for customers who require regulatory compliance (such as PCI) or other operational requirements.

Network Connectivity

Azure supports two types of connectivity options to connect customer's networks to Azure virtual networks: Site-to-Site VPN and ExpressRoute. Although Point-to-Site is another viable connectivity option, it is client-focused and is not specific to this discussion.

Site-to-Site VPN connections use VPN devices over public Internet connections to create a path to route traffic to a virtual network in a customer subscription. Traffic to the virtual network flows across an encrypted VPN connection, while traffic to the Azure public services flows over the Internet.

It is not possible to create a Site-to-Site VPN connection that provides direct connectivity to the public Azure services via a public peering path. To provide multiple VPN connections to the virtual network, you must use multiple VPN devices connected to different sites.

If a customer chooses to engage a cloud service provider in a "connect through" scenario, the customer connects to the CSP network over an S2S VPN, and the CSP is connected to Azure over separate S2S VPN connections.

If a customer chooses to engage a cloud service provider in a "connect to" scenario, the customer connects to the Azure network over an S2S VPN, and the CSP is connected to the customer's network over a separate S2S VPN connection that allows the CSP to manage the Azure subscription and resources on behalf of the customer.

ExpressRoute connections use routers and private network paths to route traffic to Azure Virtual Network, and optionally, to the Azure public services. Private connections are made through a network provider by establishing an ExpressRoute circuit with a selected provider. The customer's router is connected to the provider's router and the provider creates the ExpressRoute circuit to connect to the Azure Routers.

When the circuit is created, VLANs can be created that allow separate paths to the private peering network to link to virtual networks and to the public peering network to access Azure public services.

Design Guidance

Cloud Service Provider scenarios are currently limited to Site-to-Site connection options due to the current lack of support for ExpressRoute in ARM.

ExpressRoute Overview

ExpressRoute is a high-speed, privately routed network connection to Azure. The connections between the customer's network edge and the provider's network edge are redundant, as are the connections from the provider's edge to the Azure edge.

From the provider to the Azure edge, you can have private peering connections to customer virtual networks and public peering connections to the Azure PaaS services, such as Azure SQL Database. There are two carrier models provided for ExpressRoute: Network Service Providers (NSPs) and Exchange Providers (IXPs). NSP and IXP connectivity models, speeds, costs and capacities vary. These differences are summarized in the following table:

Network Service Provider (NSP):

  • Bandwidth: 10, 50, 100, 500, or 1000 Mbps
  • Route management: Provider manages
  • High availability: Provider manages
  • MPLS support: Yes
  • Azure circuit costs: Ingress and egress included in monthly fee
  • Provider circuit costs: Based on consumption; some providers offer all-inclusive plans

Exchange Provider (IXP):

  • Bandwidth: 200, 500, 1000, or 10000 Mbps
  • Route management: Customer manages
  • High availability: Customer manages
  • MPLS support: No
  • Azure circuit costs: Ingress and egress allocation included in monthly fee; additional egress based on consumption
  • Provider circuit costs: Based on consumption

Feature References

Configure an ExpressRoute Connection through a Network Service Provider

http://azure.microsoft.com/documentation/articles/expressroute-configuring-nsps/

Configure an ExpressRoute Connection through an Exchange Provider

http://azure.microsoft.com/documentation/articles/expressroute-configuring-exps/

ExpressRoute Whitepaper with detailed steps for connecting via IXP model

http://download.microsoft.com/download/0/F/B/0FBFAA46-2BFD-478F-8E56-7BF3C672DF9D/Microsoft%20Azure%20ExpressRoute.pdf

ExpressRoute Public Peering Design Considerations

Establishing a connection to the public peering network allows virtual machines on Azure Virtual Networks and on-premises systems to leverage the ExpressRoute circuit to connect to Azure PaaS services on the public peering network without traversing the Internet.

Establishing a public peering connection is an optional configuration step for an ExpressRoute circuit. When the public peering connection is established, the routes for all the Azure datacenters worldwide are published to the edge router. This directs traffic to the Azure services instead of going out to the Internet.

The interface between the Azure public services and the customer's network is protected by redundant NAT firewalls. These NAT devices allow customers' systems to access the Azure public services, but they only allow stateful traffic back to the customer's networks.


Design Guidance

When you design ExpressRoute peering, consider the following:

Capability Consideration

Capability Decision Points

Azure Services in the datacenter

All Azure services reside within an Azure datacenter and are assigned routable IP addresses.

Public peering services

From a design perspective, any Azure public service only sees the NAT device address. If the Azure public service provides firewall protection, only the NAT addresses can be used in the firewall rules.

From a security perspective, specifying the NAT addresses will prevent connections from the Internet for the customer's instance of that public service. This also means that any system behind the NAT can access the public service, which may not be desired from a security perspective.

ExpressRoute Performance Design Considerations

ExpressRoute circuits provide a private path to route traffic to the Azure datacenter. When the traffic reaches the Azure edge device, it must leverage the software-defined routing within the Azure datacenter to isolate traffic. Currently, ExpressRoute connections from the customer's datacenter to the Azure edge can achieve up to 10 Gbps.

At that edge, virtual connections are established to the customer's private virtual network gateways to enable traffic routing. Today, the maximum performance that a single virtual network gateway can provide is 2 Gbps. To optimize traffic through the ExpressRoute circuit, it may be necessary to leverage multiple virtual networks and gateways.

Customers have the ability to purchase the optimal ExpressRoute circuit bandwidth to meet their throughput requirements. Circuits can be upgraded to provide additional performance with minimal impact. Circuits cannot be downgraded without impact.

Design Guidance

When you design for ExpressRoute performance, consider the following:

Capability Consideration

Capability Decision Points

Bursting traffic

ExpressRoute circuits allow traffic to burst up to two times the rated bandwidth of the circuit. Gateways also support this bursting capability. Gateways and circuits will drop packets if the burst limit is exceeded.

Standard versus Premium ExpressRoute

ExpressRoute comes in two SKUs: Standard and Premium. Although the performance of the ExpressRoute circuit does not change, the number of routes, the number of virtual network connections per circuit, and the ability to route traffic across Azure regions all increase when using Premium.

Gateway performance

Gateways come in three SKUs: Basic, Standard, and High Performance. The maximum speed of the gateway is a function of the SKU and affects the performance that you can achieve over an ExpressRoute circuit to a single virtual network.

ExpressRoute Cost Design Considerations

ExpressRoute connectivity and pricing consist of two components: the service connection costs (Azure) and the authorized carrier costs (telco partner). Customers are charged by Azure for the ExpressRoute monthly access fee, and potentially an egress traffic fee, based on the type and performance of the ExpressRoute connection. Customers also have costs associated with the selected provider, typically comprising the circuit connection and monthly traffic fees.

From an Azure perspective, an NSP connection is an inclusive plan where customers are charged a monthly fee and get unlimited ingress and egress traffic. Fees associated with IXP connections include a monthly service charge and potential traffic egress charges when a high watermark of traffic is exceeded. In these cases, the customer is charged an additional fee based on the amount of egress traffic above the included amount.

Feature References

Azure ExpressRoute cost information

http://azure.microsoft.com/en-us/pricing/details/expressroute/

Recommended: When planning for network connectivity with ExpressRoute, ensure that the costs are well understood and that conversations with authorized carriers are addressed early in the planning process.

Design Guidance

When you design for ExpressRoute costs, consider the following:

Capability Consideration

Capability Decision Points

Provider costs

The provider costs are much harder to ascertain because pricing models vary by provider. NSP provider costs can range from a monthly fee based on traffic volume to a premium service with a large flat monthly fee. Potential additional costs may include the number of MPLS circuits that have been configured.

IXP connections

An IXP provider connection is a fiber connection, and it typically includes monthly and one-time fiber connection fees. The customer's connection typically includes the fiber from the datacenter to the provider's access point, the costs for transmitting the traffic to the consolidation point of presence, and a "last mile" fiber connection to get connected to the Azure datacenter.

NSP connections

For the NSP model, the provider typically provides and manages the provider edge routers and the configuration and management of the published routes. However, for an IXP model, the customer must provide the router that is placed at the provider access point and manage all the route publishing.

The advantage of the IXP model is that the customer is typically given a rack and is allowed to place hardware in addition to the router. This allows the customer to include security hardware and other appliances.

ExpressRoute Premium

ExpressRoute Premium is an add-on package that allows an increase in the number of BGP routes, allows for global connectivity, and increases the number of virtual networks per ExpressRoute circuit. This add-on can be applied to Network Service Provider or Exchange Provider circuits.

Summary of ExpressRoute Premium features:

  • Increased route limits for public and private peering (from 4,000 routes to 10,000 routes).
  • Global connectivity for services. An ExpressRoute circuit created in any region (excluding China and government cloud) will have access to resources across any other region in the world. For example, a virtual network created in West Europe can be accessed through an ExpressRoute circuit provisioned in the West US region.

  • Increased number of virtual network links per ExpressRoute circuit (from 10 to a larger limit, depending on the bandwidth of the circuit).

Feature References

Azure ExpressRoute Premium Circuit connection information

https://azure.microsoft.com/en-us/documentation/articles/expressroute-faqs/

Design Guidance

When you design for ExpressRoute Premium, consider the following:

Capability Consideration

Capability Decision Points

Service availability and access

Although ExpressRoute Premium is available in regions such as India and Australia, to leverage the cross virtual network connectivity, you must have a business presence within the country and a local Azure billing account to establish a cross-region virtual network connection.

Microsoft Azure Site-to-Site VPN

Microsoft Site-to-Site (S2S) connectivity allows low cost connections from customer locations to Azure private peering networks. S2S leverages the Internet for transport and IPsec encryption to protect the data flowing across the connection.

Requirements:

  • Public facing IPv4 address for the on-premises VPN device that is not behind a NAT
  • Compatible hardware VPN device or RRAS

Potential Use Cases:

  • CSP Connectivity
  • Branch office connectivity
  • Low cost primary datacenter connection where you do not want to require configuration at the client level
  • Virtual network-to-virtual network routing within the Azure datacenter is required

Feature References

VPN Device information

https://msdn.microsoft.com/en-us/library/azure/jj156075.aspx

Configuring Multiple Sites Connectivity

https://msdn.microsoft.com/en-us/library/azure/dn690124.aspx

Configure a Cross-Premises Site-to-Site connection to an Azure Virtual Network

http://azure.microsoft.com/documentation/articles/vpn-gateway-site-to-site-create/

Mandatory:

  • A dedicated IPv4 address is required for the on-premises VPN device to establish a S2S VPN connection
  • For CSP scenarios, a separate S2S VPN device is required for the CSP connection

Recommended: If automating the creation of the gateway for the S2S VPN, specify your shared key versus retrieving a shared key from Azure.

  • Always use encrypted VPN connections for S2S VPNs.
  • Always use VPN devices that support dynamic routing.
  • Leverage multi-site S2S support to provide redundant paths to a virtual network.
  • Leverage the high performance gateways to maximize virtual network to virtual network connections and to obtain the high performance connection for S2S scenarios.

Optional: Leverage the New-GUID cmdlet to generate a complex shared key
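
For example, the following sketch generates a GUID-based shared key and applies it to a gateway connection (assuming the classic Set-AzureVNetGatewayKey cmdlet; the network and site names are illustrative):

    # Generate a complex shared key (New-Guid requires PowerShell 5.0 or later)
    $sharedKey = (New-Guid).Guid

    # Apply the key to the gateway connection for a specific local network site
    Set-AzureVNetGatewayKey -VNetName "ContosoVNet" -LocalNetworkSiteName "HQ" -SharedKey $sharedKey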

Design Guidance

When you design for Site-to-Site Connections, consider the following:

Capability Consideration

Capability Decision Points

S2S VPN performance

A maximum of 200 Mbps per VPN connection at the gateway interface to the virtual network, regardless of the Internet connection speed

On-premises VPN device

The supported VPN device used determines virtual network routing capability (static or dynamic routing)

Shared keys

Shared keys are required for establishing site-to-site connectivity

Multi-site support

You must determine if multiple on-premises sites can access a single virtual network gateway.

Public peering

S2S VPNs do not have access to the public peering network to connect to Azure services.

Microsoft Azure Point-to-Site VPN

Microsoft Point-to-Site (P2S) connectivity allows low cost connections from customer workstations to Azure private peering networks. P2S leverages the Internet for transport and certificate-based encryption to protect the data flowing across the connection. A VPN device or a public-facing IPv4 address is not required to establish a P2S VPN connection.

Requirements:

  • Microsoft VPN client installed on the workstation with a supported operating system
  • Outbound Internet access
  • Root certificate installed in Azure to support encryption
  • Client certificate installed on the workstation
  • Virtual network with a dynamic routing gateway

Potential use cases:

  • Developers accessing virtual networks without any other dedicated network connectivity
  • Companies that have no datacenter or branch offices where they can place a VPN device to establish a S2S connection
  • Temporary connection while you are away from your S2S network
  • No public IPv4 address is available to establish a S2S VPN

Feature References

Configure a Cross-Premises Point-to-Site connection to an Azure Virtual Network

http://azure.microsoft.com/documentation/articles/vpn-gateway-point-to-site-create/



    Mandatory: The following are required for implementation of Point-to-Site VPN connections:

    • A certificate to encrypt the connection
    • Microsoft VPN client package installed on the workstation
    • P2S is only supported with a dynamic routing gateway

    Design Guidance

    When you design for Point-to-Site connections, consider the following:

    Capability Consideration

    Capability Decision Points

    P2S VPN limitations

    There is a maximum of 128 P2S VPN connections per virtual network. At the time of writing, the client package is available for x86 and x64 Windows clients.

    Certificate requirements

    Self-signed or Enterprise Certification Authority (CA) certificates must be used

    Interoperability with ExpressRoute

    You cannot leverage P2S connections with a virtual network connected to an ExpressRoute circuit due to existing gateway limitations.

    Forced Tunneling

    Forced tunneling allows you to specify the default route for one or more virtual networks to be the on-premises VPN or ExpressRoute gateway. This is implemented by publishing a 0.0.0.0/0 route that points to that gateway. In effect, this results in any packet that is transmitted from a virtual machine connected to the virtual network that is not destined to another IP address within the scope of the virtual network to be sent to that default gateway.

    When using forced tunneling, any outbound packet that is attempting to go to an Internet address will be routed to the default gateway and not to the Azure Internet interface. For a virtual machine that has a public endpoint defined that allows inbound traffic, a packet from the Internet will be able to enter the virtual machine on the defined port. A response might be sent, but the reply will not go back out the public endpoint to the Internet. Rather, it will be routed to the default gateway. If the default gateway does not have a route path to the Internet, the packets will be dropped, effectively blocking any Internet access.

    Forced tunneling has different implementation requirements and scope depending on the type of Azure connectivity of the virtual network. A virtual network that is connected over a S2S VPN connection requires forced tunneling to be defined and configured on a per virtual network basis by using Azure PowerShell. A virtual network that is connected over an ExpressRoute connection requires forced tunneling to be defined at the ExpressRoute circuit, and this affects all virtual networks that are connected to that circuit.
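
    As a minimal sketch of the S2S case (assuming the classic gateway cmdlets; the virtual network and local site names are illustrative), forced tunneling is enabled by designating a default site:

        # Designate an on-premises local network site as the default route (0.0.0.0/0)
        Set-AzureVNetGatewayDefaultSite -VNetName "MultiTier-VNet" -DefaultSite "DefaultSiteHQ"

        # Remove forced tunneling from the virtual network
        Remove-AzureVNetGatewayDefaultSite -VNetName "MultiTier-VNet"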

    Determining how forced tunneling will be used in a design should involve the following design decisions:

    • Type of virtual network connectivity (S2S or ExpressRoute) (defines scope of impact)
    • Requirements for direct Internet egress via Azure's Internet connection (direct conflict with a business requirement)
    • Security requirements and flexibility (forced tunneling can provide isolation of network traffic while the connection is up)
    • Connectivity costs (forcing all traffic back over the S2S or ExpressRoute circuit)
    • CSP considerations on how the traffic will be routed to the customer. CSP scenarios will usually require user defined routing.

    Defense-in-Depth Considerations for Forced Tunneling

    It is always a good security practice to have defense-in-depth, where there are additional layers of security in case a layer is compromised or inadvertently removed. Forced tunneling forces all packets back to the default gateway; however, relying only on that approach is not a good defense-in-depth design.

    If you leverage forced tunneling, there is no reason to define any public endpoints when virtual machines are provisioned. Leaving the default public ports open creates an attack vector if forced tunneling is ever disabled.

    A good design practice is to implement Network Security Groups for the subnets of every virtual network configured for forced tunneling. This allows you to have an additional layer of network protection. Understand that although you can use Network Security Groups to create rules for a virtual machine or a subnet that restricts outbound access, any co-administrator can temporarily or permanently override those rules. (Note that Network Security Group rule changes are logged.)

    Forced tunneling provides the best defense-in-depth for a virtual network that is connected by ExpressRoute. A forced tunnel configuration with an ExpressRoute circuit requires a network engineer to be involved because it is implemented as a BGP routing configuration. This is not something an Azure co-administrator has the rights to configure. However, a forced tunneling configuration with a S2S VPN is something that can be performed by a co-administrator on each virtual network.

    Mandatory: A design that leverages forced tunneling (default route) must typically provide Internet access via a path other than the Azure Internet connection.

    Recommended: Combine forced tunneling with Network Security Groups to achieve defense-in-depth for traffic isolation.

    Optional: Investigate the use of a dual network adapter edge firewall appliance with an extranet subnet as one alternative to Network Security Groups.

    Network Security

    Network security in Azure can present many challenges, especially to organizations that rely exclusively on network security measures for isolation. Many on-premises and IaaS deployments leverage a point-to-point firewall, rule-based approach to secure access to resources. They combine this with platform-based authentication access.

    PaaS deployments present new challenges because they are designed to be driven by application and identity controls, but many organizations attempt these deployments with traditional network-based approaches.

    Hybrid PaaS and IaaS deployments (where there may be IaaS or PaaS roles combined with Azure public services such as Azure SQL Database, Redis Cache, and Service Bus) are the most challenging to plan. This is because the Azure public services are multitenant, and in some cases, they cannot be connected directly to the network infrastructure owned by the customer.

    Many of the public Azure services also do not have the construct of a service-level firewall that the customer can configure directly, so leveraging traditional approaches to secure those solutions with classic network approaches can be challenging.

    Regulatory requirements introduce potential complications because they sometimes explicitly require (or are interpreted to require) a traditional, on-premises, point-to-point network security approach to mitigations.

    Another complication with attempting to use traditional network-based security controls exclusively is that most of these controls assume the IP address is a good proxy for machine or service identity.

    IP addresses are a poor proxy for identity outside of a corporate LAN that is using static assignments, particularly in a globally scaled Internet service such as Azure where IP addresses change rapidly. This typically creates significant challenges for organizations that are overly reliant on network security measures and are using static IP addresses for server and service mapping.

    Review the guidance in the Microsoft Azure Security section (specifically the Containment and Segmentation Strategy) for how to design complete security containment strategies that overcome the limitations of networking controls alone.

    Traditional Security Approaches

    Applying traditional security approaches to Azure networking involves the following:

    Security Feature   

    Description

    When to Use

    Network security group

    Access control rules that can be applied to subnets or virtual machines

    • Control the ingress and egress traffic of a subnet
    • Control traffic between virtual machines in a subnet
    • Control ingress and egress traffic for a single virtual machine

    Forced tunneling

    Default route for a gateway that sends all non-local traffic to the customer's on-premises edge router for processing

    • Block outbound Internet traffic in a virtual network
    • Block inbound traffic in a virtual network
    • With ExpressRoute, forced tunneling is implemented at the BGP routing level, which provides defense-in-depth through role separation.

    Firewall appliances – single network adapter

    Software-based firewall that can be placed between virtual machines and the Internet. Requires all traffic to be routed through the firewall, typically by using agents and IPsec.

    • Need additional firewall protection from the Internet
    • Want traffic flow control of all virtual machine traffic
    • Want to log and monitor traffic

    Firewall appliances – dual network adapters

    Software-based firewall that can be placed between subnets or between a subnet and the Internet

    • Need additional firewall protection from the Internet and also need high throughput.
    • Want packet capture and inspection
    • Want detailed logging

    IPsec

    Traffic authentication and encryption at the server level. Requires machines to be domain joined.

    • Want to use policy-based traffic encryption
    • Want to control which servers can communicate with each other

    Hardware firewall appliances at the network edge

    Placing a hardware firewall appliance at the customer's network edge

    • Want to control ingress and egress traffic between Azure and on-premises

    Web application firewalls

    Software-based firewall that is used to control ingress traffic from the Internet. Typically a layer 7 firewall.

    • Want SSL session termination
    • Want session affinity

    Network Access Approaches

    When customers extend their datacenters to Azure or deploy an application within the Azure infrastructure, they must select an approach for access control and security, based on an access scenario. Common access scenarios include:

    • Internal users
      • Accessing the solution from on premises
      • Accessing the solution from non-corporate locations
      • Accessing the solution via VPN
    • External users
      • Accessing the solution via the Internet
    • Dependent applications
      • Accessing the solution from on premises
      • Accessing the solution from Azure
      • Accessing the solution from Internet locations
    • Solution
      • Accessing on-premises resources
      • Accessing Azure public services
      • Accessing other Internet-facing services

    Note that a security-access approach might have multiple options to provide access. For example, accessing an application in Azure via the Internet can be accomplished with different security and traffic routing approaches.

    Application Access Approach   

    Description

    When to Use

    Direct to Azure

    Internet access is accomplished by exposing the UI tier directly on the Internet.

    • The application needs to be accessed from the Internet, and minimal security is required.
    • The application has no connection to corporate resources.

    Using the existing security solution

    Internet access could be blocked by using forced tunneling. All traffic must flow through the corporate Internet-facing security stack, be routed over the corporate backbone, and get to Azure using ExpressRoute or S2S connections.

    • The application needs to be accessed from the Internet, and high security is required.
    • A security stack that meets requirements cannot be created in Azure.

    Using a provider security solution

    Internet access could be blocked by using forced tunneling. All traffic must flow through a service provider's Internet-facing security stack, be routed over the service provider's backbone, and get to Azure by using S2S or ExpressRoute connections.

    • The application needs to be accessed from the Internet, and high security is required.
    • Internet access is being provided by a service provider that has a backbone connection to Azure.
    • Existing corporate Internet bandwidth cannot handle application load.

    Using an Azure-based security solution

    Internet access is accomplished by building a security stack in Azure by using network virtual appliances.

    • Application needs to be accessed from the Internet, and high security is required.
    • Existing corporate Internet bandwidth cannot handle the application load.
    • There is a desire to have no dependency on corporate resources.
    • High-speed Internet access is required.
    • Application requires global load balancing and the lowest latency connection.

    Virtual Appliances

    Virtual Appliances are third-party-based virtual machine solutions that can be selected from the Azure Gallery or Marketplace to provide services like network firewall, application firewall and proxy, load balancing, and logging. Appliances are licensed by:

    • Using a license key that you already own.
    • Including the licensing cost into the hourly cost of the appliance.

    Appliances are available in single network adapter or multiple network adapter configurations depending on the type of appliance and the required capabilities. For example, a logging appliance might only require a virtual machine with a single network adapter because all the traffic is written to the appliance. A network firewall typically requires a virtual machine with a multiple network adapter configuration that supports layer 3 routing so that the traffic has to flow through the appliance to reach its destination.

    To leverage an appliance that supports layer 3 routing, the network architecture must include user defined routing to override the default implicit routes to specify explicit user-defined routes. This allows the specification of routing rules that can direct traffic to the appliance network adapter, to the local virtual network, or to the on-premises environment.

    The following table lists virtual appliances types and when to use them:

    Virtual Appliance Type   

    Description

    When to Use

    Network firewall

    Virtual appliance that leverages a virtual machine with a multiple network adapter configuration and layer 3 routing support to enable a network firewall between multiple subnets in Azure.

    • Control outbound traffic flow to the Internet from an application tier
    • Control inbound traffic flow from the Internet to a UI tier of an application
    • Control traffic flow between two subnets in Azure
    • Collect detailed packet captures or network logs of traffic flowing through the appliance

    Load balancer

    Provides layer 4 or layer 7 load balancing

    • A load balancer with more features than the Azure Load Balancer is required
    • Detailed logging is required
    • SSL termination is required

    Security appliance

    Intrusion detection appliance

    • Attempting to create a security stack to manage inbound Internet traffic
    • Advanced security monitoring and mitigation solution is needed

    User Defined Routing

    User defined routing allows you to configure and assign routes that override the default implicit system routes, ExpressRoute BGP advertised routes, or the local-site network-defined routes for S2S connections. Configuring a user defined route allows the specification of next-hop definition rules that control traffic flow within a subnet, between subnets, from a subnet through an appliance to another subnet, to the Internet, and to on-premises networks.

    Configuring user defined routes involves modifying the default routing table. Each entry in the routing table requires a set of information:

    • Destination address CIDR
    • NextHop type specification: Includes Local, VPN Gateway, Internet, Virtual Appliance, NULL
    • If the NextHop type is Virtual Appliance, you need the address of the appliance network adapter.

    User defined routing applies only to traffic originating from virtual machines and cloud services in Azure. Placing a virtual appliance and defining user defined routes allows you to control traffic flowing from Azure toward on-premises networks. However, traffic that flows from on-premises networks to Azure is not affected by the user defined routes; it follows the system routes and bypasses the virtual appliance.
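
    The following sketch shows this pattern with the classic route table cmdlets (the table name, subnet, and appliance address are illustrative):

        # Create a route table with a default route that points at the appliance
        New-AzureRouteTable -Name "FrontEndRouteTable" -Location "West US" -Label "Front-end routes"
        $rt = Get-AzureRouteTable -Name "FrontEndRouteTable"
        Set-AzureRoute -RouteTable $rt -RouteName "ToAppliance" -AddressPrefix "0.0.0.0/0" -NextHopType VirtualAppliance -NextHopIpAddress "10.1.1.10"

        # Associate the route table with a subnet; all VMs in the subnet inherit it
        Set-AzureSubnetRouteTable -VirtualNetworkName "ContosoVNet" -SubnetName "FrontEnd" -RouteTableName "FrontEndRouteTable"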

    Mandatory: Leveraging a layer 3 virtual appliance requires user defined routing; the two must be implemented together.

    Recommended:

    • Ensure that any user defined routes are more specific than ExpressRoute BGP routes or local site network routes; otherwise, they will not be used.
    • CSP scenarios should leverage user defined routing where control of customer traffic is required.

    Route Table Design Considerations

    While you can have multiple route tables defined, a subnet can only have a single route table associated with it. A single route table can be associated to multiple subnets. All virtual machines and cloud services connected to a subnet are affected by the route table decisions.

    Default Routing in a Subnet

    Routing of traffic from a virtual machine is accomplished by using implicit system routing via a distributed router that is implemented at the virtual network level. Every packet follows a set of implicit routes that are implemented at the host level. These routes control the flow of traffic within the virtual network to on-premises networks (if enabled), and to the Internet. Traffic flow to the Internet is achieved through NAT by the host.

    The following diagram shows the implicit routing rules that a virtual machine follows by default without any user defined routing.

    The following rules are applied to the packet in this scenario:

    • If the address is within the virtual network address prefix, route to the local virtual network.
    • If the address is within the on-premises address prefixes or BGP published routes (BGP or local site network for S2S), route to the gateway.
    • If the address is not part of the virtual network, BGP, or local site network routes, route to Internet via NAT.
    • If the destination is an Azure datacenter address and ExpressRoute public peering is enabled, it is routed to the gateway because the gateway has the Azure datacenter address via BGP.
    • If the destination is an Azure datacenter address with S2S or ExpressRoute without public peering enabled, it is routed to the host NAT for the Internet path, but it never leaves the datacenter.

    Routing Changes Introduced by User Defined Routing and Virtual Appliances

    When a network firewall virtual appliance is introduced to the scenario, user defined routing must be configured to control the traffic routing through the appliance. Without user defined routing, no traffic will flow through the appliance.

    The following diagram shows a virtual appliance inserted into the scenario to control traffic routing to the Internet via front-end and back-end subnets in Azure:

    The following rules are applied to the packet in this scenario:

    • If the user-defined routing is defined with NextHop Local routing, route to a virtual machine in the virtual network, based on address.
    • If the user-defined routing is defined with NextHop VPN Gateway routing, route to a machine on-premises, based on address.
    • If the user-defined routing is defined with NextHop Appliance routing, route to the virtual appliance, based on address.
    • If the user-defined routing is defined with NextHop Internet routing, route to the Internet over the host NAT.

    Mandatory: For CSP scenarios where the provider attempts to leverage a single VPN device to connect multiple customers to Azure, user defined routing is required to maintain proper traffic separation and flow.

    Network Security Groups

    A Network Security Group is a top-level object that is associated with your subscription. It can be used to control traffic to one or more virtual machine instances in your virtual network. A Network Security Group contains access control rules that allow or deny traffic to virtual machine instances. The rules of a Network Security Group can be changed at any time, and changes are applied to all associated instances.

    A Network Security Group requires a regional virtual network. Network Security Groups are not compatible with virtual networks that are associated with an affinity group.

    Network Security Groups are similar to firewall rules in that they provide the ability to control the inbound and outbound traffic to a subnet, a virtual machine, or virtual network adapter.

    Network Security Groups allow you to define rules that specify the source IP address, source port, destination address, destination port, priority, and traffic action (Allow or Deny). The rules can be applied to inbound and outbound traffic independently.

    Traditionally, a firewall rule is applied to a port on a router that is connected to a switch. It affects all traffic flowing inbound and outbound to the switch, but it does not affect any traffic within the switch. A Network Security Group rule that is applied to a subnet is more like a firewall rule that is applied at the switch and affects inbound and outbound traffic on every port in the switch.

    Any virtual machine connected to the switch port would be affected by the Network Security Group rule applied to the subnet.

    For example, if a Network Security Group is created and a Network Security Group rule is defined that denies inbound Remote Desktop Protocol (RDP) traffic for all addresses over port 3389, no virtual machine outside the subnet can connect via RDP to a virtual machine that is connected to the subnet, and no virtual machine connected to the subnet can connect via RDP to any other connected virtual machine.

    Network Security Groups can also be applied to the virtual machine or to the network adapter of a virtual machine. This allows greater flexibility in how traffic is filtered.

    Mandatory: For Ingress traffic to the VM, rules are applied at a subnet level, then VM level, and then NIC level. For Egress traffic from the VM, rules are applied at the NIC level, then VM level, and then subnet level. Rules are applied in priority order.

    To allow the virtual machines within the subnet to connect via RDP to each other, a new rule with higher priority has to be added that allows inbound traffic from the subnet CIDR on port 3389.

    | Description              | Priority | Source Address   | Source Port | Destination Address | Destination Port | Protocol | Action |
    |--------------------------|----------|------------------|-------------|---------------------|------------------|----------|--------|
    | Deny inbound RDP         | 1010     | *                | *           | *                   | 3389             | TCP      | Deny   |
    | Allow inbound for subnet | 1000     | 192.168.100.0/24 | *           | *                   | 3389             | TCP      | Allow  |
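
    A minimal sketch of creating this configuration with the classic Network Security Group cmdlets follows (the group, virtual network, and subnet names are illustrative):

        # Create the Network Security Group
        New-AzureNetworkSecurityGroup -Name "FrontEndNSG" -Location "West US" -Label "NSG for front-end subnet"

        # Add the two rules from the table above (the lower priority number wins)
        Get-AzureNetworkSecurityGroup -Name "FrontEndNSG" |
            Set-AzureNetworkSecurityRule -Name "DenyInboundRDP" -Type Inbound -Priority 1010 -Action Deny -SourceAddressPrefix "*" -SourcePortRange "*" -DestinationAddressPrefix "*" -DestinationPortRange "3389" -Protocol TCP
        Get-AzureNetworkSecurityGroup -Name "FrontEndNSG" |
            Set-AzureNetworkSecurityRule -Name "AllowSubnetRDP" -Type Inbound -Priority 1000 -Action Allow -SourceAddressPrefix "192.168.100.0/24" -SourcePortRange "*" -DestinationAddressPrefix "*" -DestinationPortRange "3389" -Protocol TCP

        # Associate the group with a subnet
        Set-AzureNetworkSecurityGroupToSubnet -Name "FrontEndNSG" -VirtualNetworkName "ContosoVNet" -SubnetName "FrontEnd"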

    Every Network Security Group that is created has a set of default inbound and outbound rules that cannot be deleted; however, they can be overridden by higher priority rules. User defined rules can use priority values from 100 through 4096, where 100 is the highest priority.

    Default Inbound Network Security Group Rules

    | Description                       | Priority | Source Address     | Source Port | Destination Address | Destination Port | Protocol | Action |
    |-----------------------------------|----------|--------------------|-------------|---------------------|------------------|----------|--------|
    | Allow virtual network inbound     | 65000    | VIRTUAL_NETWORK    | *           | VIRTUAL_NETWORK     | *                | *        | Allow  |
    | Allow Azure load balancer inbound | 65001    | AZURE_LOADBALANCER | *           | *                   | *                | *        | Allow  |
    | Deny all inbound                  | 65500    | *                  | *           | *                   | *                | *        | Deny   |

    Default Outbound Network Security Group Rules

    | Description                    | Priority | Source Address  | Source Port | Destination Address | Destination Port | Protocol | Action |
    |--------------------------------|----------|-----------------|-------------|---------------------|------------------|----------|--------|
    | Allow virtual network outbound | 65000    | VIRTUAL_NETWORK | *           | VIRTUAL_NETWORK     | *                | *        | Allow  |
    | Allow Internet outbound        | 65001    | *               | *           | INTERNET            | *                | *        | Allow  |
    | Deny all outbound              | 65500    | *               | *           | *                   | *                | *        | Deny   |

    Subscription Limits for Network Security Groups

    | Object                       | Service Management Subscription Limit | Resource Management Subscription Limit |
    |------------------------------|---------------------------------------|----------------------------------------|
    | Network Security Groups      | 100 per subscription                  | 100 per region, per subscription       |
    | Network Security Group rules | 100 rules per Network Security Group* | 100 rules per Network Security Group*  |

    In both models, one Network Security Group can be applied per subnet, per virtual machine, and per network adapter, and a single Network Security Group can be linked to multiple subnets, virtual machines, or network adapters.

    *Can be increased by Microsoft support personnel to a maximum of 400 rules per Network Security Group.

    Default tags are system-provided identifiers to address a category of IP addresses. Default tags can be specified in customer-defined rules. The default tags are as follows:

    Tag

    Description

    VIRTUAL_NETWORK

    This default tag denotes all of your network address space. It includes the virtual network address space (IP CIDR in Azure) and all connected on-premises address spaces (local networks). It also includes virtual network-to-virtual network address spaces.

    AZURE_LOADBALANCER

    This default tag denotes the load balancer for the Azure infrastructure. This translates to an IP address for an Azure datacenter where the health probes originate. This is needed only if the virtual machine or set of virtual machines associated with the Network Security Group is participating in a load balanced set. Note this is not the actual load balancer IP address.

    INTERNET

    This default tag denotes the IP address space that is outside the virtual network and reachable by public Internet. This range includes the public IP space that is owned by Azure. If you use this tag for outbound restrictions, you potentially will not be able to access an Azure PaaS service unless you have a higher priority rule that grants access to that service.

    Mandatory: Network Security Groups must be assigned to a subnet, virtual machine, or network adapter for any of the rules to affect traffic.

    Recommended: For CSP scenarios, consider using network security groups to protect subnets from improperly configured CSP routing tables.

    Design Guidance

    When you are designing Network Security Groups, consider the following:

    Capability Consideration

    Capability Decision Points

    Compliance

    Network Security Groups can present design challenges from a PCI or other compliance perspective because of the limited logging capabilities that exist in the service.

    Priority numbering

    When designing Network Security Group rules for inbound or outbound scenarios, be sure to leave a blank priority number space between rules. Note that the priority values are independent for inbound and outbound rules.

    Default rules

    To override the default rules, define a rule that has a priority number in the 4000 range to allow as many additional rules as possible.

    Port numbers

    Although you can specify a contiguous range of ports in the rule definition (1024-1048), you cannot specify random port numbers (1024, 1036, 30000).

    Targeting

    There are limits on the number of Network Security Groups per subscription, the number of rules per Network Security Group, and the objects to which a group can be applied.

    Consider the following when determining Network Security Group targets:

    • Targeting by virtual machine is useful when the number of systems that require the rule set (immediately and in the future) is unknown. This requires per-virtual machine management, but it avoids issues such as IP space exhaustion through over-provisioning subnet address space to plan for future growth of a given role.
    • Targeting by subnet is useful when the rule sets are defined by role, the number of systems that the Network Security Group is expected to cover is well known, and appropriate subnet sizes can be determined for these systems. This requires pre-planning of the IP address space, but it provides simple application and management of rule sets, similar to management models for on-premises VLAN ACLs.

    Precedence

    Consider where the Network Security Group is being deployed and whether other Network Security Groups in play at the virtual machine or subnet level may prevent a Network Security Group applied at the virtual machine or network adapter level from functioning.

    • Deny at the subnet level takes precedence over Allow at virtual machine or network adapter level.
    • Deny at the virtual machine level takes precedence over Allow at the network adapter level or the subnet level.
    • Deny at network adapter level takes precedence over Allow at virtual machine or subnet level.

    Azure Endpoints and ACLs

    Endpoints allow for communication between Azure compute instances and the Internet. Endpoints can be defined in Azure so that they allow translation of a public port and IP address to a private port and private address. By default, when provisioning a virtual machine, two endpoints are automatically created:

    • Remote Desktop with a private port of 3389 and a random public port
    • PowerShell with a private port of 5986 and a public port of 5986

    In the portal, you can see the defined endpoints on the Endpoints tab of the virtual machine configuration.

    To provision a virtual machine with no public endpoints, you have two choices:

    • Use the portal and delete the default ports in the wizard
    • Use the parameter options in Azure PowerShell to disable the endpoints

    If a cloud service has more than one virtual machine, the endpoints have to share the single public facing VIP, but they require different public ports for the port translation to be redirected to the correct virtual machine.

    Endpoint design requires consideration of the security exposure that the endpoint creates. A public facing endpoint can be used to access the provided service, but it can also be attacked by hackers.

    Azure provides denial-of-service protection at the edge of its network, but this does not prevent someone from attempting to attack a public facing port. Any public facing port for a provided service should leverage a strong authentication mechanism to help prevent an attacker from gaining access.

    Mandatory: Public endpoints are not required unless you need inbound access from the Internet.

    Recommended: Only enable public endpoints if the inbound Internet is the only way to achieve communication.

    Use P2S, S2S, or ExpressRoute to reach the RDP or PowerShell interfaces of a virtual machine instead of using a public endpoint.

    To further protect resources deployed within Azure, you can manage incoming traffic to the public port by configuring rules for the network access control list (ACL) of the endpoint. An ACL provides the ability to selectively permit or deny traffic for a virtual machine endpoint for an additional layer of security.

    By using network ACLs, you can do the following:

    • Selectively permit or deny incoming traffic based on the remote subnet IPv4 address range to a virtual machine input endpoint.
    • Block lists of IP addresses
    • Create multiple rules per virtual machine endpoint
    • Specify up to 50 ACL rules per virtual machine endpoint
    • Use rule ordering to ensure the correct set of rules are applied on a given virtual machine endpoint (lowest to highest)
    • Specify an ACL for a specific remote subnet IPv4 address.
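
    As a sketch (assuming the classic ACL cmdlets; the service, VM, endpoint, and subnet values are illustrative), an ACL is built and applied to an endpoint as follows:

        # Build an ACL that only permits a known remote subnet
        $acl = New-AzureAclConfig
        Set-AzureAclConfig -AddRule -ACL $acl -Order 100 -Action Permit -RemoteSubnet "203.0.113.0/24" -Description "Allow corporate NAT range"

        # Apply the ACL to an existing endpoint on the virtual machine
        Get-AzureVM -ServiceName "WebApp" -Name "web01" |
            Set-AzureEndpoint -Name "RemoteDesktop" -Protocol tcp -LocalPort 3389 -ACL $acl |
            Update-AzureVM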

    For instructions about configuring ACLs for your Azure virtual machine endpoints, see the feature references below.

    The following diagram outlines the UI for creating an ACL for a public endpoint.

    Feature References

    Microsoft Azure Network Security Whitepaper version 3

    http://download.microsoft.com/download/C/A/3/CA3FC5C0-ECE0-4F87-BF4B-D74064A00846/AzureNetworkSecurity_v3_Feb2015.pdf

    Security Considerations for SQL Server in Azure Virtual Machines

    https://msdn.microsoft.com/en-us/library/azure/dn133147.aspx

    Active Directory Considerations in Azure Virtual Machines and Virtual Networks Part 5 – Domains and GCs

    http://blogs.technet.com/b/privatecloud/archive/2013/04/09/active-directory-considerations-in-azure-virtual-machines-and-virtual-networks-part-5-domains-and-gcs.aspx

    Security Considerations for Infrastructure as a Service (IaaS) Private Cloud

    http://blogs.technet.com/b/privatecloud/archive/2011/10/12/security-considerations-for-infrastructure-as-a-service-iaas-private-cloud.aspx

    IP Addresses

    Azure allocates IP addresses based on the type of object being provisioned and the options selected. In some cases, having a reserved IP address is required. The following table outlines the options for reserved IP addresses and the use cases for each type:

    Type

    Description

    When to Use

    DIP

    Dynamic IP address. Internal IP address assigned by default and associated with a virtual machine.

    Always assigned

    VIP

    Virtual IP address. Assigned to a virtual machine, cloud service load balancer, or an internal load balancer.

    Address is private for an internal load balancer and is public for a cloud service load balancer or a virtual machine.

    Address is shared across all virtual machines within the same cloud service.

    Always assigned

    PIP

    Public IP address. Public instance-level IP address that can be assigned to a virtual machine. A PIP allows direct communication to a virtual machine without going through the cloud service load balancer.

    Use only when you need to communicate directly with an instance in a cloud service

    Reserved

    This is a static public-facing VIP address for a cloud service that must be specially requested. There are a limited number of these addresses per subscription.

    Use only when you need a public facing static IP address

    Internal static

    A static address allocated from the subnet address pool. Internal facing only. The number is only limited by the number of addresses assigned to the subnet address pool. This is implemented as a DHCP reservation.

    Use only when you need an internal facing static IP address

    For more information, see VIPs, DIPs and PIPs in Microsoft Azure.

    This relationship is illustrated in the following diagram:

    Address Assignment

    When you create an object that connects to a subnet in Azure, two IP addresses are automatically allocated to that object:

    • VIP: Public facing IP address associated with the cloud service of which the object is a member
    • DIP: Internal facing private IP address

    Both addresses are assigned to the single network adapter and that adapter is connected to the subnet. The internal facing DIP address is allocated from the address space pool of the subnet to which the virtual machine is attached. The public facing VIP address is allocated from the pool of Azure datacenter addresses that are assigned to the datacenter where it resides.

    Azure provides dynamic allocation of IP addresses to compute resources within each subscription. Addresses are assigned starting from the first available address in the subnet pool. If a virtual machine is allocated an address and then it releases that address, the address is available for reassignment.

    An IP address assigned to a virtual machine is associated with the virtual machine until the machine is in a stopped (deallocated) state or it is destroyed completely. Using the Shutdown option in the Azure portal results in the virtual machine being placed in the stopped-deallocated state, and the DHCP reservation is released. When the virtual machine is restarted, it will receive a new IP address.

    Actions like rebooting the virtual machine, shutting it down from the operating system via RDP, or using the Stop-AzureVM PowerShell cmdlet with the -StayProvisioned parameter will not deallocate the IP address of the virtual machine, as shown in the example below.
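
    A minimal sketch of the two behaviors (the service and VM names are illustrative):

        # Stops the VM but keeps it provisioned; the DIP is retained (billing continues)
        Stop-AzureVM -ServiceName "WebApp" -Name "web01" -StayProvisioned

        # Stops and deallocates the VM; the DHCP reservation is released
        Stop-AzureVM -ServiceName "WebApp" -Name "web01"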

    Azure IP addresses that are released to the available address pool are immediately available for reassignment to a virtual machine. When Azure allocates an address, it searches sequentially from the beginning of the subnet address pool until it finds an available address, and then assigns it to the virtual machine. This assignment method is used for dynamic and static addresses from the subnet address pool.

    Mandatory: Every object that connects to a subnet in Azure requires a DIP (including IaaS virtual machines, internal load balancers, PaaS roles)

    A public-facing VIP is always assigned to a cloud service and shared by all virtual machines or PaaS roles within the cloud service.

    Recommended: Do not use the Azure portal to shut down a virtual machine unless you are trying to change its IP address or delete the virtual machine, otherwise you will lose the assigned IP address.

    Use static IP addresses only when a dynamic address will not meet requirements. Do not use them because that is the current on-premises approach.

    Feature References

    Stop-AzureVM cmdlet command reference

    https://msdn.microsoft.com/en-us/library/azure/dn495269.aspx

    Static Addresses

    By default, all addresses in Azure are dynamic, regardless of whether they are provisioned through the Azure portal or through PowerShell. Statically assigned IP addresses can only be requested or assigned by using Azure PowerShell. During the object creation process, a command-line option is provided to allow a static address to be specified.

    There is no way within Azure to preallocate or reserve an address prior to assignment; all address assignments are made at the time of object provisioning. To determine whether an address is available to use as a static address, you can use the Azure PowerShell cmdlet Test-AzureStaticVNetIP to test whether an IP address has already been allocated from the subnet address pool.

    If the address is not available, the cmdlet will return a list of addresses that are available. Note that using the Test-AzureStaticVNetIP cmdlet to determine whether an address is allocated does not guarantee that the address will still be unallocated by the time you provision the object.
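
    A minimal sketch of testing and assigning a static DIP follows (the names and addresses are illustrative, and provisioning details such as Add-AzureProvisioningConfig are omitted):

        # Test whether the address is free; if taken, the cmdlet suggests alternatives
        Test-AzureStaticVNetIP -VNetName "ContosoVNet" -IPAddress "192.168.100.10"

        # Assign the static DIP while provisioning ($image is a hypothetical OS image name)
        New-AzureVMConfig -Name "web01" -InstanceSize Small -ImageName $image |
            Set-AzureSubnet -SubnetNames "FrontEnd" |
            Set-AzureStaticVNetIP -IPAddress "192.168.100.10" |
            New-AzureVM -ServiceName "WebApp" -VNetName "ContosoVNet"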

    Reserved IP addresses are static public facing VIP addresses that are typically used to provide a static IP address for a public facing application. Using a reserved IP address allows a DNS A record to be created with minimum management overhead required. It also provides a consistent IP address that can be used for point-to-point security rules in firewalls. Reserved IP addresses must be requested by using the Azure PowerShell cmdlet New-AzureReservedIP, and then given a name. The name is used by the New-AzureVM cmdlet during provisioning.

        # Reserve the public VIP, and then reference it by name during provisioning;
        # $vmConfig is a hypothetical VM configuration built with New-AzureVMConfig
        New-AzureReservedIP -ReservedIPName "MyWebSiteIP" -Location "West US"
        New-AzureVM -ServiceName "WebApp" -ReservedIPName "MyWebSiteIP" -Location "West US" -VMs $vmConfig

    There are a limited number of reserved IP addresses in a given subscription. The default is five addresses, but through a limit increase request, it may be increased to a maximum of 100.

    Reserved IP addresses are scarce resources, and they should only be used when a static address is absolutely required.

    Mandatory: Carefully plan and track reserved IP address usage to prevent running out of the address quota.

    Recommended: If more than five reserved addresses are required, contact Microsoft support early to increase the reserved address quota to prevent running out and preventing deployments.

    Leverage reserved IP address names that can be easily associated with the service they are being used for.

    Name Resolution

    When IaaS- and PaaS-provisioned services need to resolve host names and FQDNs, they can use either Azure provided name resolution or their own DNS server, depending on the actual scenario.

    Azure automatically registers a new virtual machine or PaaS role in the Azure default *.cloudapp.net DNS suffix. Storage accounts are registered in *.blob.core.windows.net. Azure Web Apps (a feature in Azure App Service) are registered in *.azurewebsites.net. It may also be desirable to have those services resolvable under a custom domain name, for example *.contoso.com.

    The following table is provided to outline scenarios that are related to name resolution.

    | Scenario | Name resolution provided by |
    |----------|-----------------------------|
    | Name resolution between role instances or virtual machines located in the same cloud service | Azure-provided name resolution |
    | Name resolution between virtual machines and role instances located in the same virtual network | Azure-provided name resolution using FQDN, or name resolution using your DNS server |
    | Name resolution between virtual machines and role instances located in different virtual networks | Name resolution using your DNS server |
    | Cross-premises: Name resolution between role instances or virtual machines in Azure and on-premises computers | Name resolution using your DNS server |
    | Reverse lookup of internal IP addresses | Name resolution using your DNS server |
    | Name resolution for custom domains (such as Active Directory domains or domains that you register) | Name resolution using your DNS server |
    | Name resolution between role instances located in different cloud services, not in a virtual network | Not applicable. Connectivity between virtual machines and role instances in different cloud services is not supported outside a virtual network. |

    Feature References

    Azure Name Resolution

    https://msdn.microsoft.com/en-us/library/azure/jj156088.aspx

    Configure a custom domain name for blob data in an Azure Storage account

    http://azure.microsoft.com/en-us/documentation/articles/storage-custom-domain-name/

    Configure a custom domain name in Azure App Service

    http://azure.microsoft.com/en-us/documentation/articles/web-sites-custom-domain-name/

    Design Guidance

    When you are planning name resolution, consider the following:

    Capability Consideration

    Capability Decision Points

    Subscription creation

    When preparing a new Azure subscription for provisioning or migrating resources, configure DNS servers at the subscription level, and then assign them at the virtual network level so that the Azure DHCP Server service will hand out the DNS servers for name resolution.

    Service limits

    A maximum of 10 custom DNS servers can be configured per subscription.

    Azure DNS Service

    Azure DNS is a global scale DNS service for hosting tenant DNS domains and providing name resolution by using Microsoft Azure infrastructure. Azure DNS has been tuned to be a highly available DNS service with fast query response times. Azure DNS provides updates of DNS records and global distribution.

    By hosting domains in Azure DNS, tenant DNS records can be managed by using the same credentials, APIs, tools, and billing as other Azure services.

    Mandatory: Automation scripts must be created to automate the creation and update of Azure DNS domains and records.

    Azure DNS domains are hosted on the Azure global network of DNS name servers. Azure uses Anycast networking, so that each DNS query is answered by the closest available DNS Server. This provides fast performance and high availability for your domain.

    Mandatory: Azure DNS does not currently support purchasing domain names. Tenants purchase domains from a third-party domain name registrar, who typically charges an annual fee. These purchased domains can then be hosted in Azure DNS to manage DNS records. For more information, see Delegate a Domain to Azure DNS.

    Mandatory: Azure DNS does not currently support CNAME records at the root (apex) of a domain.

    To create the domains and domain records within Azure DNS, you can use Azure PowerShell, Azure CLI, REST APIs, or the SDK.
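
    As a sketch (assuming the Azure Resource Manager DNS cmdlets in the AzureRM.Dns module; the zone, record, and resource group names are illustrative):

        # Create a DNS zone in an existing resource group
        New-AzureRmDnsZone -Name "contoso.com" -ResourceGroupName "ContosoRG"

        # Create an A record set named "www" containing a single record
        New-AzureRmDnsRecordSet -Name "www" -RecordType A -ZoneName "contoso.com" -ResourceGroupName "ContosoRG" -Ttl 3600 -DnsRecords (New-AzureRmDnsRecordConfig -Ipv4Address "203.0.113.10")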

    Etags and Tags

    ETags are used to manage concurrency in a highly distributed DNS infrastructure where changes could be implemented at any location that has access to Azure. Azure DNS uses ETags to handle concurrent changes to the same resource safely.

    Each DNS resource (zone or record set) has an ETag associated with it. Whenever a resource is retrieved, its ETag is also retrieved. When updating a resource, you have the option to pass back the ETag so Azure DNS can verify that the ETag on the server matches.

    Because each update to a resource results in the ETag being regenerated, an ETag mismatch indicates that a concurrent change has occurred. ETags are also used when creating a new resource to ensure that the resource does not already exist.

    By default, Azure PowerShell uses ETags to block concurrent changes to DNS zones and record sets. The optional -Overwrite switch can be used to suppress ETag checks, in which case any concurrent changes that have occurred are overwritten.

    Tags are different from ETags. Tags are name-value pairs used by Azure Resource Manager to label resources for billing or grouping purposes. For more information about Tags, see Using tags to organize your Azure resources.

    Azure PowerShell supports Tags for zones and record sets. Tags are specified by using the -Tag parameter:
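
    For example (a sketch assuming the AzureRM.Dns module; the exact -Tag syntax varies across Azure PowerShell versions, and the tag names are illustrative):

        # Create a zone tagged with name-value pairs for billing or grouping
        New-AzureRmDnsZone -Name "contoso.com" -ResourceGroupName "ContosoRG" -Tag @{ dept = "finance"; environment = "production" }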

    Load Balancing

    There are several mechanisms that provide load balancing capabilities within Azure. The following table outlines these features and their potential use in Azure designs:

    Type

    Description

    When to Use

    External load balancer

    A software load balancer that is automatically created when a cloud service is created. It is Internet facing only. It has a single Internet-facing VIP by default, but additional Internet-facing VIPs can be added.

    VIP addresses are dynamically assigned from the Azure public datacenter address pool by default, but they can be assigned a reserved static address.

    Use external load balancers to provide Internet-facing load balancing capabilities for the UI tier of an application.

    The remaining tiers of the application should use internal load balancers if required.

    Load balanced sets

    A way to combine multiple virtual machines or PaaS roles from a single cloud service into a group that is associated with a port of the load balancer.

    Use load balanced sets when you need to use a single VIP with multiple load balanced applications in a single cloud service.

    Internal load balancer

    A software load balancer that is internal facing only. It has a single VIP that is allocated from the local subnet address pool.

    Use an internal load balancer when you need load balancing capabilities for an application.

    However, that application should not be Internet facing (for example, the second and third tiers of a three-tier application).

    Traffic manager

    A public-facing load balancer that is designed to support cross datacenter balancing of loads and geolocation optimization so the user is sent to the closest datacenter.

    Typically used to load balance two cloud services in separate datacenters to provide geolocation optimization.
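
    As an internal load balancer example, the following sketch uses the classic cmdlets to add an ILB to an existing cloud service and place a virtual machine endpoint into a load balanced set behind it (the service, subnet, and address values are illustrative):

        # Add an internal load balancer with a static front-end IP from the subnet
        Add-AzureInternalLoadBalancer -InternalLoadBalancerName "AppTierILB" -ServiceName "AppTier" -SubnetName "Backend" -StaticVNetIPAddress "192.168.101.100"

        # Place a VM endpoint into a load balanced set behind the ILB
        Get-AzureVM -ServiceName "AppTier" -Name "app01" |
            Add-AzureEndpoint -Name "sql" -Protocol tcp -LocalPort 1433 -PublicPort 1433 -LBSetName "SQLSet" -ProbePort 1433 -ProbeProtocol tcp -InternalLoadBalancerName "AppTierILB" |
            Update-AzureVM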

    Feature References

    Azure Load Balancer

    http://azure.microsoft.com/documentation/articles/load-balancer-overview

    Azure Traffic Manager Overview

    https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-overview/

    Azure Traffic Manager Load Balancing Methods

    https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-load-balancing-methods/

    About Traffic Manager Monitoring   

    https://azure.microsoft.com/en-us/documentation/articles/traffic-manager-monitoring/

    Internal Load Balancer

    http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview

    Configure Load Balanced Sets

    http://azure.microsoft.com/documentation/articles/load-balancer-internal-overview

    Configure Internal Load Balanced Sets

    http://azure.microsoft.com/documentation/articles/load-balancer-internal-getstarted

    Azure Internal Load Balancer SQL Always-On

    https://azure.microsoft.com/en-us/documentation/articles/load-balancer-configure-sqlao/

    Naming Conventions

    The choice of a name for any asset in Microsoft Azure is important because:

    • It is difficult (though not impossible) to change that name at a later time.
    • There are certain constraints and requirements that must be met when choosing a name.

    This table covers the naming requirements for various elements of Azure networking.

    | Item                        | Length | Casing           | Valid characters |
    |-----------------------------|--------|------------------|------------------|
    | Virtual network             |        | Case-insensitive | Alphanumeric and hyphen; cannot start with a space or end with a hyphen |
    | Subnet                      |        | Case-insensitive | Alphanumeric, underscore, and hyphen; must be unique within a virtual network |
    | Network Security Group      |        | Case-insensitive | Alphanumeric and hyphen |
    | Network Security Group rule |        | Case-insensitive | Alphanumeric and hyphen |
    | AT&T VLAN name              | 15     | Case-insensitive | Alphanumeric and hyphen |

    Microsoft Azure Identity

    Identity and Access Management is a daunting space in technology. As trends come and go and Internet threats mature, identity and access management solutions must constantly evolve.

    A few years ago, a framework for building identity solutions emerged. It's called "The Four Pillars of Identity." Many organizations have adopted this framework to forge their identity strategy at macro and micro levels.

    The Four Pillars of Identity are areas that identity solutions must address to be successful:

    • Administration
    • Authentication
    • Authorization
    • Auditing

    For more information about the Four Pillars of Identity, please read the whitepaper titled The Four Pillars of Identity – Identity Management in the Age of Hybrid IT.

    Several options for leveraging identity and access management solutions exist when working with Azure. Most often, it's helpful to distinguish between two audiences when determining a solution: the developer and the IT pro.

    The Developer Audience

    For developers, the most important thing with regards to identity is to integrate their applications with the organization's preferred identity and access management platform. In the past, many developers didn't have a good grasp on how to integrate applications with enterprise identity and access management platforms, so they often took on the task of managing the identities and access within the application itself.

    This places a significant burden on developers, because they have to take on all the work of each of the four identity pillars described previously. This means that they have to provide a place to store identities and provide a way for users to change identity data, manage credentials, deactivate their access, request new access, and so on.

    Developers would also have to securely authenticate users, manage entitlements that authorize users to various resources in the application, and maintain audit trails of authentication and access events.

    Even if a single development team can do this well for a given application, organizations typically have hundreds of applications in use. The result is that users have multiple identities sprawled throughout an organization, with each application operating independently with regards to identity and access management.

    The IT Professional Audience

    IT pros are under pressure from the organization to facilitate the adoption of cloud services by extending the traditional enterprise identity and access management platform into the cloud. Without this important integration, cloud services, such as Azure, become virtually unusable.

    When there is no integration between the cloud and an organization's on-premises identity and access management platform, users have multiple identities with different credentials and different access rights to an organization's data. Not only is this a bad experience for end users, but it makes it impossible to manage access to all of an organization's applications and resources. The issue gets worse when non-Microsoft clouds are introduced into the equation.

    Another important concept is that identity becomes the "control plane" for the cloud. In the past, an organization could keep sensitive information on-premises and put up firewalls and extranets to protect it and keep potential malicious users out.

    This becomes much more difficult in a cloud-connected world. The network edge is being pushed outward and becoming broader, while users on mobile devices access both on-premises applications inside the organization's network and cloud services provided by the organization.

    Organizations can no longer depend on firewalls to keep out potential attackers because those firewalls also keep out the people who require access to resources. Because of this, the identity and access management platform is the primary means of protecting an organization's applications and data in the cloud-connected world.

    Azure Active Directory

    Azure Active Directory (Azure AD) interacts with the cloud in two ways:

    • An enabler of the cloud
    • A consumer of the cloud

    IT professionals will mostly be concerned with Azure AD as an enabler of the cloud because they are often tasked with integrating the enterprise identity and access management platform into the cloud.

    On the other hand, developers will mostly be concerned with the identity services that Azure AD provides as a consumer of the cloud. Most often, they are looking to understand how their applications can leverage the cloud identity service.

    Enabler of the Cloud

    Azure AD plays a pivotal role in enabling the cloud. To use Microsoft cloud services, such as Office 365, the cloud services must:

    • Store identity data that is used to identify the user
    • Store profile data about the user
    • Entitle the user to specific applications and data in the cloud service

    Rather than having each cloud service keep its own identity repository, all Microsoft cloud services use Azure AD. The capabilities of the Microsoft cloud cannot be enabled without it.

    After identities are populated in the Microsoft cloud, Azure AD becomes an identity and access management hub that enables other clouds. Azure AD can facilitate access to an organization's custom applications regardless of whether they are on-premises or hosted in the cloud, in addition to other Software-as-a-Service applications that do not reside in the Microsoft cloud.

    Consumer of the Cloud

    Azure AD is a single, multitenant directory that contains over 200 million active identities and serves billions of authentication requests each day. A cloud-scale identity service like this can only be built by using the scale and breadth of the cloud. In addition, Azure AD has features that rely on cloud services, such as Azure Multi-Factor Authentication and machine learning technologies. In this way, Azure AD consumes the cloud to provide its services.

    Azure AD becomes a cloud service that can be consumed by other applications and services. Application programming interfaces (APIs) and endpoints are exposed so that developers can use Azure AD to store and retrieve their identity data, and they can depend on Azure AD to authenticate users to their applications.

    IT professionals can use the cloud Identity Management-as-a-Service features, such as self-service password reset, to enable new identity management capabilities that traditionally took months to deploy on-premises.

    Tenant Directory Planning

    The existence of an Azure AD directory is a requirement for an Azure subscription. Therefore, each Azure tenant has at least one directory associated with it. This directory is used for signing in to and accessing the Azure portal, Office 365, and other Microsoft cloud services.

    Additionally, Azure tenants can have multiple directories. These directories are separate and unique. For example, if two Azure AD directories exist in the same tenant, they have their own set of administrators and there is no data shared between them. Administrators of one directory in the tenant do not have access to another directory in the same tenant, unless they are explicitly granted access to it.

    Mandatory: There must be at least one directory in the tenant. You do not have a choice. All tenants created within Azure are assigned a default directory if one doesn't exist.

    How Many Directories Should a Customer Have?

    Most tenants should have at least two directories—one for the production users using the cloud services that are integrated with Azure AD, and another directory for testing.

    If a customer has software development teams, it is possible that those teams might need Azure AD directories that they can use for developing applications. The following criteria should be used to determine if separate development directories are needed:

    • Is there any reason why the development team can't use the test directory? For example, developers might need a separate directory if they need to control user accounts and attributes in the directory.
    • Does the development team need to have the full log-in experience that an end-user will go through? If so, the development directory might require a deeper level of integration with the on-premises Active Directory if the production tenant is federated with Active Directory Federation Services (AD FS). Maintaining this integration for each developer directory would be extremely arduous because it would require multiple on-premises servers. Most organizations would develop applications against the test directory in this case, rather than maintaining multiple Azure AD Connect instances.
    • Are any Azure AD Premium features (such as Multi-Factor Authentication) needed by the development team? Azure AD Premium is licensed per-directory. Therefore, if Azure AD Premium features are needed in the development directory, the customer must purchase independent licenses for the development accounts.

    Optional: Software development teams might want their own Azure AD directories in the tenant.

    Cross-Organizational Directories: Complex Government Organizations

    Some complex government organizations look like a single entity on paper; but in reality, they are multiple, independently-run organizations. The question of whether to have a single tenant or multiple tenants is a very important discussion to have with these customers before they get locked in to a model that doesn't work for them.

    There is no definitive answer for every situation. Rather, this must be addressed on a case-by-case basis. The following criteria should help you understand how to guide customers.

    Considerations for a cross-organizational directory:

    • The customer has a long-term goal of operating as a single entity with a consolidated Active Directory environment.
    • Applications in one organization within the tenant should be readily accessible by users in other organizations.

    Considerations for unique organizational directories:

    • Each organization in the tenant has an Active Directory environment and unique IT staff.
    • There are security requirements that prevent the customer from having a single set of directory administrators over all organizations.
    • Applications within an organization are restricted only to users within that organization.

    Cross-Organizational Directories: Mergers and Acquisitions

    The topic of a cross-organizational directory is important to discuss with commercial customers who often buy and sell other companies. The following criteria can be used to help you determine if the customer should have a cross-organizational directory.

    Considerations for a cross-organizational directory:

    • The customer plans to permanently integrate the acquired company with no foreseeable plans to divest it.
    • Users in the acquired company should be able to access applications and data in the acquiring company.

    Considerations for unique organizational directories:

    • The customer plans to divest the acquired companies in the future.
    • The acquired company is already an Azure AD customer and the cost and disruption of migrating the users to the acquiring tenant is prohibitive.
    • Users in the acquired company cannot access applications or data in the acquiring company.

    Custom Domain Names

    When a directory is created, the default name of the directory is <something>.onmicrosoft.com. The <something> is chosen by the directory administrator during the creation of the directory. Usually, customers want to use their own domain name, such as contoso.com. This can be achieved by using a custom domain name.

    Recommended: Add a customer's public-facing DNS name as a custom domain name for the production Azure AD directory. Otherwise, users will sign in with accounts such as bob@contoso.onmicrosoft.com instead of bob@contoso.com.

    Multiple custom domain names can be added to each Azure AD directory, but a custom domain name can only be used in one Azure AD directory. For example, if there are two Azure AD directories in the tenant, and the first directory assigns the custom domain name of contoso.com, the second directory cannot use that name.

    Mandatory: Custom domain names must be publicly registered with an Internet domain name registrar, and the customer must be able to modify the domain's public DNS records to prove ownership of the domain.
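
    The end-to-end flow can be sketched with the MSOnline PowerShell module (contoso.com is a placeholder): add the domain, retrieve the verification record, publish that record in the public DNS zone, and confirm ownership.

        Connect-MsolService

        # Add contoso.com to the directory as an unverified custom domain
        New-MsolDomain -Name "contoso.com" -Authentication Managed

        # Retrieve the TXT record that must be published at the registrar to prove ownership
        Get-MsolDomainVerificationDns -DomainName "contoso.com" -Mode DnsTxtRecord

        # After the TXT record is in public DNS, complete the verification
        Confirm-MsolDomain -DomainName "contoso.com"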

    Feature References

    Add your custom domain to the Azure AD tenant

    https://msdn.microsoft.com/en-us/library/azure/hh969247.aspx

    Integrating On-Premises Active Directory (Identity Bridge)

    When you create a new Azure AD tenant, the contents of the directory will be managed independently from the on-premises Active Directory forest. This means that when a new user comes in to the organization, an administrator must create an on-premises Active Directory account and an Azure Active Directory account for the employee. Because these two accounts are separate by default, they also may have different user names and passwords, and they need to be managed separately.

    However, an organization can use Azure AD Connect to connect the on-premises Active Directory to Azure AD. When this is in place, users that are added or removed from the on-premises Active Directory are automatically added to Azure AD. The user names and passwords are also kept synchronized between the two directories, so end users do not have different credentials for cloud and on-premises systems.

    AD FS can be used to add an identity federation trust between on-premises Active Directory and Azure AD, which enables the users in the organization to have a single sign-on experience. We call this scenario the "identity bridge" because it bridges the on-premises identity systems with the cloud, thereby enabling a single identity service for the enterprise.

    Recommended: Unless you have a cloud-only company (with no on-premises systems), you should incorporate this integration. Even if you are not using Azure AD, you will have a better experience with Azure and the other Microsoft cloud services that you may subscribe to.

    Synchronizing Users to the Cloud

    The goal for synchronization of identities is to extend the on-premises Active Directory into Azure AD. After synchronization is in place, Active Directory and Azure AD should be viewed as a single identity service with on-premises and cloud components, instead of two separate identity services.

    In most cases, managing the identities (such as on-boarding, off-boarding, and entitlement changes) still occurs on-premises by using identity management solutions that were specifically created for these scenarios.

    This is depicted in the "On-Prem" box in the following diagram. These systems are usually going to be different than the identity bridge systems that connect the on-premises Active Directory to Azure AD.

    Historically, there have been four tools available to do the job of the identity bridge, which has caused a lot of confusion. Therefore, we released a single tool that can be used for everything except the most complex of scenarios.

    When deciding on which synchronization tool to use, the choice should be between using Azure AD Connect or the Microsoft Identity Manager Synchronization Services with the Azure AD Connector. This is summarized in the following diagram.

    In general, the default stance should be to use Azure AD Connect, unless the scenario is extremely complex, requiring a lot of customization. Some key features of Azure AD are lost (such as password synchronization and write-back) when using Identity Manager, so it should only be used as a fallback option if absolutely necessary.

    Multiple Active Directory Forests

    Many customers do not have simple single-forest Active Directory environments, and dealing with multiple forests can be a challenge when integrating with Azure AD. Typically, customers fall into one of two scenarios:

    • They have an account and resource forest model.
    • They have multiple forests with active users in many of them.

    Single Forest with Multiple Domains

    Some customers have a single forest environment with multiple domains. Azure AD Connect natively handles this scenario when the following conditions are met:

    • Users need to exist uniquely across the forest. A user cannot have an active account in more than one domain, because both accounts will be synchronized as separate identities in Azure AD.
    • If the domains in the forest use different UPN suffixes, each UPN suffix needs to be added to the Azure AD tenant as a custom domain name.

    Account and Resource Forest Model

    When a customer has an account and resource forest model, there is a dedicated forest where all of the user identities reside (the account forest) and a dedicated forest for some or all of the applications (the resource forest). A one-way trust (often a forest trust) is in place so that the resource forest trusts the account forest. This relationship is depicted in the following diagram.

    This is most commonly seen with complex Exchange Server deployments. Often, there needs to be a representation of the user in the resource forest's Active Directory for the application to use. This is sometimes referred to as a shadow account. In most cases, it's a duplicate of the user's account from the account forest, but it is put into a disabled state, which prevents users from signing in to it.

    Azure AD Connect natively handles this scenario. If the resource forest contains data that needs to be added to Azure AD (such as mailbox information for an Exchange user), the synchronization engine detects the presence of disabled accounts with linked mailboxes. The appropriate data is then contributed to the Azure AD user account.

    Multiple Forests with Unique Users

    In this scenario, there are multiple independent forests in the environment, which may or may not have Active Directory trust relationships between them. This situation will be encountered in highly segmented organizations or companies that acquire other companies via mergers and acquisitions. The following diagram depicts what this architecture might look like.

    Users in this scenario have only a single account in one of the forests (they do not have multiple user accounts across forests). Because of this, you do not need the synchronization tool to match a user to multiple accounts.

    However, one decision that needs to be made is whether the accounts will be migrated into a single forest at some point. This is an important thing to consider, because it will determine whether you can use the objectGUID of the user accounts as the source anchor (which is used to match the Active Directory accounts to the Azure AD accounts).

    If the users will be migrated to a single forest at some point, you'll need to use a different source anchor, such as the user's email address or UPN. The reason is that the objectGUID can't be migrated with the user. After migration, there would be multiple accounts in Azure AD for migrated users—one for the old forest and another for the new forest.

    Mandatory: If users from the additional forests will be migrated into a single forest in the future, you must choose something other than the objectGUID as the source anchor attribute (such as the mail attribute).

    Multiple Forests with Duplicate Users

    This scenario is the same as the previous scenario (multiple forests with unique users) with the exception that a single user has multiple user accounts in different forests in the environment. These accounts are either:

    • Enabled (users likely have a password and sign in to these accounts)
    • Disabled (a shadow account is used to store attributes for an application, such as Exchange).

    Even though there are multiple user accounts in the organization, there should be only a single account for the user in Azure AD. To enable this, the synchronization service needs to be able to match user accounts across the forests to a single person. For this to happen, the accounts in each forest need to have an attribute that contains the same, unique value for a user.

    Mandatory: If a single person has multiple user accounts in different forests, you must choose a common attribute to match the accounts together.

    UPN Alignment

    The User Principal Name (UPN) is the attribute in Azure AD that is used for a user's sign-in name. By default, this is sourced from the on-premises Active Directory by using the userPrincipalName attribute of the user account. Because of legacy guidance, some customers' Active Directory forests use non-routable UPN suffixes or UPN suffixes that are different from the public-facing DNS name of the organization.

    For example, the UPN suffix in Active Directory might be @contoso.local, while the public facing DNS name is @contoso.com. In this situation, the Active Directory users have a log-in name similar to bob@contoso.local, rather than bob@contoso.com.

    Azure AD requires that the UPN suffix be a valid public domain name that is registered with an Internet name registrar. This is to ensure that it's unique across all Azure AD tenants and that only one organization owns the domain name. When the tenant is federated with an on-premises identity provider, the UPN suffix is used to determine where to redirect the user for authentication.

    Customers that have a UPN suffix that is not routable or not desirable for the user logon name have two options:

    • Perform a UPN rationalization exercise
    • Use the Alternate Login ID

    UPN Rationalization

    UPN rationalization entails that the organization add a new UPN suffix to the Active Directory forest, and then change the UPN suffix of every account to match the new UPN suffix. This is the preferred approach for UPN alignment because it provides the best experience for users after the alignment is complete. There are challenges with UPN rationalization, however.
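
    Mechanically, the exercise can be sketched with the Active Directory PowerShell module (the suffixes and search base are placeholders); the bulk change should only be run after the testing described in the sections that follow:

        # Register the new UPN suffix in the forest
        Get-ADForest | Set-ADForest -UPNSuffixes @{Add="contoso.com"}

        # Rewrite the UPN suffix on each user account in scope
        Get-ADUser -Filter 'UserPrincipalName -like "*@contoso.local"' `
            -SearchBase "OU=Staff,DC=contoso,DC=local" |
            ForEach-Object {
                $newUpn = $_.UserPrincipalName -replace '@contoso\.local$', '@contoso.com'
                Set-ADUser -Identity $_ -UserPrincipalName $newUpn
            }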

    Applications that are Dependent on the UPN Attribute

    It is possible that some of a customer's applications use the UPN to store data about users in the application. If this is the case, changing the UPN in Active Directory would break those applications. The risk associated with performing a UPN rationalization exercise increases with the size of the organization.

    For smaller customers with a well-defined set of applications, it's easier to determine if changing the UPN suffix will impact any of the applications in use. However, for larger organizations, it is nearly impossible to gauge the impact. In that situation, it is best to pick a sample of users that is representative of all of the business groups in the organization, and first test the change with their accounts.

    Mandatory: If user certificates use the UPN in the Subject Name field, the certificates need to be reissued during the UPN rationalization.

    Recommended: Before rationalizing UPNs, build a catalog of the applications that have a dependency on the UPN attribute of users, and test the new UPN with users of those applications.

    User Certificates Issued with UPN as the Subject Name

    Another big challenge when changing UPNs is that some organizations issue x.509 certificates that have the UPN value in the Subject Name field of the certificate. The impact varies with each customer because the certificates could be used for authentication or for signing or encrypting data, such as when sending email messages.

    The data in a certificate cannot simply be changed because the certificates are digitally signed by the Certification Authority that issues them. If the data is changed, the signature is broken and the certificate is no longer valid. Therefore, the certificates must be reissued when the UPN is changed.

    The process of obtaining new certificates varies between customers, so if there are certificates that rely on the UPN attribute, it's important to understand the process that the customer uses for reissuing those certificates. In some cases, this may mean provisioning a new "soft" certificate (a certificate with a private key that resides on the computer, rather than a hardware device) to the user's machine. Or it may require that the user write the new certificate to their smartcard.

    Recommended: Understand the process used for reissuing user certificates, so that you can adequately communicate with users and prepare for a massive reissuance event, if needed.

    Identity Management Systems

    If an identity management solution is in place, such as Microsoft Identity Manager, it's likely that there's a dependency on the UPN attribute. In these cases, it's likely that the identity management system is managing the value of the UPN attribute for users. So if the UPN is changed on the user account in Active Directory, the identity management system would set it back to the old value (which it deems is authoritative).

    Depending on the configuration of the identity management system, it is also possible that the UPN attribute is being used as an anchor for joining identities in different systems to the identities in Active Directory. Therefore, changing the UPN without updating the identity management system could result in identities being disassociated in the connected systems. At best, this would cause the identities to stop synchronizing to those systems. At worst, the identities would be deleted from the target system.

    Recommended: Spend some time understanding the identity management systems that are used for managing Active Directory within the organization to ensure that there isn't a dependency on the UPN of user accounts.

    Alternate Login ID

    The Alternate Login ID is a way to achieve UPN alignment without having to modify the UPN attribute of user accounts in Active Directory. When using the Alternate Login ID, an Active Directory attribute other than userPrincipalName is selected to feed the UPN of Azure AD. This can be any unique, indexed attribute that uses the user@domain.com format. The impact to users is much less than changing their UPNs.

    Although the Alternate Login ID can help in some situations, it should not be the default solution because it has some drawbacks, including:

    • Cannot be used with an Exchange hybrid online deployment.
    • Configuring it on an existing Azure AD Sync implementation that has already synchronized to Azure AD requires that you manually change the UPN on each Azure AD account.
    • Kerberos-based single sign-on no longer works for applications that rely on the Sign-in Assistant (such as Lync, OneDrive for Business, and Office Pro Plus). Users are prompted to enter credentials, which then can be cached by the Windows Credential Manager, but users will be prompted on a regular basis when their password changes.
    • Azure AD Application Proxy requires that the UPN in Azure AD is the same as the UPN in the on-premises Active Directory for Kerberos constrained delegation to work. Therefore, an Alternate Login ID will break Kerberos constrained delegation for Azure AD Application Proxy.

    Due to these issues, it is recommended that Alternate Login ID be used as a secondary option only when UPN rationalization is not possible with a customer.
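
    When Alternate Login ID is used with AD FS, it is configured on the Active Directory claims provider trust, as in this sketch based on the Configuring Alternate Login ID reference below (the mail attribute and forest name are examples):

        # Authenticate users by the mail attribute instead of userPrincipalName
        Set-AdfsClaimsProviderTrust -TargetIdentifier "AD AUTHORITY" `
            -AlternateLoginID mail -LookupForests contoso.local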

    Synchronization Server Availability

    It is not possible to have a high availability design for the server hosting the Azure AD Connect service. By default, the synchronization server runs the synchronization job to Azure AD every three hours by using a scheduled task on the server. This interval can be decreased, if needed. High availability for the Azure AD Connect server should not be necessary in most situations because synchronization is not a continuous event.
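
    As a sketch, the task and its interval can be adjusted with the ScheduledTasks module; the task name match below is an assumption and should be confirmed on the synchronization server:

        # Locate the synchronization task (the exact name can vary between versions)
        $task = Get-ScheduledTask | Where-Object { $_.TaskName -like "*Azure AD Sync*" }

        # Replace its trigger with one that repeats every hour instead of every three hours
        $trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) `
            -RepetitionInterval (New-TimeSpan -Hours 1) -RepetitionDuration ([TimeSpan]::MaxValue)
        Set-ScheduledTask -TaskName $task.TaskName -Trigger $trigger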

    In the event of a catastrophic failure, a new Azure AD Connect server can be built and synchronized in a couple of hours for a medium-sized business. Larger businesses with more than 100,000 users will take more time to synchronize. If you need a faster recovery time, Azure AD Connect can be configured to use a dedicated SQL Server deployment with high availability.

    Consider a dedicated SQL Server environment in the following scenarios:

    • The organization has more than 100,000 users. The SQL Express LocalDB used by Azure AD Connect has a limitation of a 10 GB database. Therefore, if an organization has more users than SQL Express can hold, a full SQL Server implementation is required.
    • A large organization wants to have a low recovery time for the synchronization service.

    Optional: A dedicated SQL Server instance can be used to provide better performance and high availability options for the Azure AD Connect synchronization service.

    Password Hash Synchronization

    With password hash synchronization, the Azure AD Connect service will synchronize one-way SHA256 hashes of Active Directory password hashes into Azure AD. This allows a user that signs into Azure AD to use the same password that is used to sign in to the on-premises Active Directory.

    Even though the default synchronization frequency for Azure AD Connect is every three hours, password hash synchronization occurs every two minutes, allowing users who change their passwords in on-premises Active Directory to begin using their new password in Azure AD almost immediately.

    When you enable password hash synchronization, it applies to all users that are being synchronized to Azure AD. This means that you cannot pick and choose which users' password hashes get synchronized. The only way to prevent a user's password hash from being synchronized to Azure AD is by filtering out the user in the synchronization policies, thereby removing their account from Azure AD.

    If you are using federated authentication for Azure AD, we still often recommend enabling password hash synchronization. This approach is recommended to allow password-based sign in to be used as a fallback if the customer's on-premises AD FS instance goes down.

    If a user's password is already synchronized to Azure AD, enabling password-based sign in is as simple as running a PowerShell script. Users can safely be switched back to federated authentication after the problem is resolved and AD FS is back online.
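
    A hedged sketch of that fallback with the MSOnline module (the domain name and password file path are placeholders); this assumes password hash synchronization is already enabled in Azure AD Connect:

        Connect-MsolService

        # Convert the federated domain back to standard (managed) authentication;
        # temporary passwords for any unsynchronized users are written to the file
        Convert-MsolDomainToStandard -DomainName "contoso.com" `
            -SkipUserConversion $false -PasswordFile "C:\Temp\userpasswords.txt"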

    Recommended: Even if all of a customer's users are signing in to Azure AD with AD FS, it is recommended to enable password synchronization. Doing so provides a good fall-back method for user authentication if AD FS goes offline.

    Signing In to Azure Active Directory

    After user accounts are synchronized from the on-premises Active Directory to the Azure AD tenant, users can sign in to the accounts and access applications that are integrated with Azure AD, such as Office 365. There are two options for signing in users to Azure AD:

  1. Provide the user name and password to Azure AD for verification
  2. Sign in to an on-premises identity provider that is trusted by Azure AD

Authenticating to Azure AD

The user object in Azure AD is separate from the object in the on-premises Active Directory. Because of this, the Azure AD object has its own user name and password. Unless password hash synchronization is enabled in Azure AD Connect, users will have different passwords for Active Directory and Azure AD.

This can confuse users and lead to a poor cloud experience. Therefore, it's recommended to enable password hash synchronization unless there is a specific reason that the customer doesn't want it enabled.

Recommended: Enable password hash synchronization so that the Azure AD password for users is the same as the on-premises Active Directory password.

Authenticating to an On-Premises Identity Provider

Azure AD supports the ability to establish an identity federation trust with an on-premises identity provider (IdP), such as Active Directory Federation Services (AD FS). This enables users to have a desktop single sign-on experience when accessing resources that are integrated with Azure AD.

With this experience, an end user would sign in to a domain-joined workstation and not be prompted again for a password throughout the entire session, regardless of which applications are used.

When a federation trust is in place, Azure AD defers to the on-premises identity provider to collect the user's credentials and perform the authentication. After authenticating the user, the on-premises identity provider creates a signed security token to serve as proof that the user was successfully authenticated.

This security token may also contain data about the user (called claims), which can then be provided to Azure AD for various purposes. The security token is given to Azure AD, which then verifies the signature on the token and uses it to provide access to the applications. The following diagram illustrates this behavior:

Domain Names

When enabling a federated identity relationship between Azure AD and an on-premises identity provider, an entire domain name in Azure AD is converted from a standard domain to a federated domain. This impacts all of the users that have UPNs under the domain name. You cannot have a mix of federated and non-federated users in a domain name.

Note: You cannot convert the default <tenant>.onmicrosoft.com domain name to a federated domain name. Only custom domain names added to Azure AD can be federated.

Any subdomains under a domain namespace will have the same configuration as the parent domain. For example, if the custom domain name contoso.com is configured as a federated domain, child.contoso.com will also be a federated domain. This is handled automatically by Azure AD, and it cannot be overridden.

Recommended: Federated domains can be converted back to standard domains at any time. Using this option in conjunction with password synchronization can provide a great fall-back strategy if the customer's identity provider goes down for a period of time.

After the domain name is converted to federated, all users who attempt to sign in to Azure AD with a UPN from the converted domain (or one of its child domains) will be redirected to the on-premises identity provider for authentication. If the user does not have a valid account in the on-premises identity provider, the user will not be able to authenticate to Azure AD or to any of the connected applications.
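
Converting a custom domain to a federated domain is typically performed with the MSOnline module against the primary AD FS server, as in this minimal sketch (the server and domain names are placeholders):

    Connect-MsolService

    # Point the MSOnline cmdlets at the AD FS farm, then federate the domain
    Set-MsolADFSContext -Computer "adfs01.contoso.local"
    Convert-MsolDomainToFederated -DomainName "contoso.com"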

Federating Multiple Domains

If there are multiple custom domain names in an Azure AD tenant that need to be federated, Azure AD can be configured to redirect users to a single identity provider or to multiple identity providers. The following criteria should be used to determine whether to use a single or multiple identity providers.

Use a single identity provider if:

  • There is a single Active Directory forest with multiple UPN suffixes synchronizing to different domain names in Azure AD.
  • There is a single Active Directory forest with child domains.
  • There are multiple Active Directory forests with two-way forest trusts to the forest that the identity provider resides in, and UPN suffixes can be correctly routed between the forest trusts.

Use multiple identity providers if:

  • There are multiple Active Directory forests with no trust relationship between them.
  • There are multiple Active Directory forests with only one-way trusts between them.

Recommended: Use a single identity provider for the organization, if possible. Otherwise, you'll have to manage multiple instances of the Identity Federation Service on-premises.

Multi-Factor Authentication

Multi-factor authentication (MFA) adds a secondary authentication prompt for users when they sign in to an application that is integrated with Azure AD. This secondary authentication takes the form of something other than a password prompt. Azure AD uses Azure Multi-Factor Authentication, which performs MFA by using a phone call, a text message, or an authentication request to a smart phone application.

Depending on the on-premises identity provider, third-party MFA providers can be used in addition to Azure Multi-Factor Authentication. These providers offer additional MFA methods that Azure Multi-Factor Authentication does not support, such as the use of OATH tokens.

Most identity providers support the use of some form of multi-factor authentication when authenticating a user on-premises. With Azure AD, you have the choice of having the identity provider perform multi-factor authentication or have Azure AD perform it with Azure Multi-Factor Authentication.

Performing MFA Through an Identity Provider

When an identity provider performs the MFA, you must configure the identity provider to use Azure Multi-Factor Authentication or a third-party multi-factor authentication service.

If the identity provider performs multi-factor authentication for a user, it must pass a special claim in the security token that it sends to Azure AD to indicate to Azure AD that MFA was performed on-premises. When this claim is passed into Azure AD, Azure AD bypasses its own prompt for MFA. For more information, see Getting started with Azure Multi-Factor Authentication and Active Directory Federation Services.

Perform MFA through the on-premises identity provider if any of the following conditions exist:

  • User phone numbers aren't populated in Azure AD
  • You want to use a smart card or client certificate authentication for the MFA method (Azure AD does not support certificate authentication)
  • You want to use a non-certificate-based MFA method that isn't available in Azure AD (such as an OATH token)
  • You want to trigger MFA, based on user attribute data in the on-premises Active Directory
  • You want to customize the web page that a user sees while undergoing MFA
  • You want to use the device registration status as a trigger for MFA
  • You want to use an MFA provider other than Azure Multi-Factor Authentication

Performing MFA with Azure AD

If MFA is performed with Azure AD, the on-premises identity provider is not configured to perform MFA at all. Instead, MFA is activated globally for the users in the Azure AD tenant, or it is activated for a specific application by using a conditional access rule in Azure AD.

After authentication succeeds at the on-premises identity provider, the identity provider creates a security token and sends the user's identification to Azure AD with the security token. Azure AD receives and verifies the token, and then performs the MFA process as a secondary authentication step.
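
For example, per-user MFA can be enabled through the MSOnline module, as in this sketch (the UPN is a placeholder):

    # Require Azure Multi-Factor Authentication for a single user
    $mfa = New-Object -TypeName Microsoft.Online.Administration.StrongAuthenticationRequirement
    $mfa.RelyingParty = "*"
    $mfa.State = "Enabled"
    Set-MsolUser -UserPrincipalName "bob@contoso.com" -StrongAuthenticationRequirements @($mfa)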

Perform MFA with Azure AD if any of the following conditions exist:

  • The identity provider does not support MFA
  • You want to perform MFA for only certain applications. If the on-premises identity provider is not able to determine which Azure AD-integrated application the user is authenticating to, this cannot be done through the identity provider.

Optional: Multi-factor authentication is an optional service that can increase the security of Azure AD. To use Azure Multi-Factor Authentication with the on-premises identity provider, an Azure AD Premium license is required.

Multiple Forest Configurations

When dealing with an environment that has multiple Active Directory forests, the main consideration is whether a single identity provider is needed, or if you need multiple identity providers. In general, the answer is going to depend on what product is used as the on-premises identity provider, and how it supports authentication across Active Directory forests.

Regardless of the vendor, however, the following key tenets of federation remain true with Azure AD:

  • UPN suffixes are tied to the federated domain name in Azure AD, and therefore they must be unique across federated domains. For example, if you have a federated domain named contoso.com, all users with UPN suffixes of @contoso.com will use this federated domain. You cannot associate @contoso.com UPN suffixes with any other federated domain.
  • A single federated domain in Azure AD also covers the child namespaces. So a federated domain of contoso.com would cover @contoso.com UPN suffixes and @child.contoso.com.

Using AD FS

AD FS can support a multiple forest configuration, but only if all the forests have two-way Active Directory trust relationships between them. If there are no forest trusts between the multiple Active Directory Domain Services (AD DS) forests, you must have multiple AD FS deployments (one for each forest that is untrusted).

Mandatory: If using AD FS in a multiple forest configuration with no forest trusts between Active Directory forests, you must have multiple deployments of AD FS (one for each untrusted forest).

If possible, we recommend that customers have trusts between their multiple Active Directory forests, so that only a single AD FS farm is needed for Azure AD. If this is the case, the UPN suffixes need to be unique across each forest; otherwise, the domain controllers will not be able to properly route the UPN suffixes to the correct forest for cross-forest authentication. For example, you can't have two domains that contain users with an @contoso.com UPN suffix.

Recommended: If possible, we recommend that you have trusts between each Active Directory forest and use a single AD FS instance with Azure AD. This simplifies the architecture and prevents you from having to manage multiple AD FS farms.

Mandatory: If AD FS is used in a multiple forest configuration with trusts between the Active Directory forests, the UPN suffixes for each domain must be unique.

Using a Third-Party Identity Provider

If a customer is using a third-party identity provider, the level of multiple forest support is going to depend on what the identity provider is capable of. For example, the "Optimal IDM Virtual Identity Server Federation Service" identity provider can be implemented with a single deployment whether or not Active Directory forest trusts are in place across the multiple domains.

You'll want to check with the vendor of the identity provider that the customer wants to use to find out their multiple forest support capabilities.

Using AD FS as the Azure AD Identity Provider

Most commonly, AD FS is used as the identity provider for Azure AD. Designing and deploying an AD FS farm tends to be a complex task, and we typically recommend a separate engagement for doing so. However, if a customer is implementing AD FS for only Office 365, a simple load-balanced AD FS deployment is probably fine if it's scaled appropriately.

Authentication Options

In general, it's best to provide a different authentication experience for users, based on whether they are on the corporate intranet. If a user is on the intranet and has direct access to an AD DS domain controller, you want to allow AD FS to use Windows Integrated Authentication (WIA). If a user is not on the intranet, however, there is no access to a domain controller, and therefore, no capability to perform WIA. Instead, it's best to provide the user with a forms-based authentication experience.

AD FS is able to accommodate this scenario through an external-facing server role called the Web Application Proxy. Although AD FS servers are configured for WIA, Web Application Proxy servers sit at the edge of the network in the extranet, and they are configured with Forms authentication.

The determination of whether a user hits the AD FS server or the Web Application Proxy server is based on how the client computer resolves the DNS name of AD FS. This architecture requires a split-brain DNS implementation so that users outside of the corporate network resolve the AD FS DNS to the Web Application Proxy servers, while the users inside the corporate network resolve it to the AD FS servers directly.

Mandatory: If you want to implement Web Application Proxy servers in AD FS (which is highly recommended), you need to have a split-brain DNS implementation.
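
On the internal Windows DNS servers, the split-brain configuration can be sketched as a dedicated zone for just the federation service name that resolves to the internal AD FS load balancer VIP (the name and address are placeholders); the external DNS zone continues to point the same name at the Web Application Proxy servers:

    # Create an internal zone for only the federation service name
    Add-DnsServerPrimaryZone -Name "sts.contoso.com" -ReplicationScope "Forest"

    # Resolve the zone apex to the internal AD FS farm VIP
    Add-DnsServerResourceRecordA -ZoneName "sts.contoso.com" -Name "@" -IPv4Address "10.0.1.50"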

Another authentication option for users is to allow them to sign in with an x.509 certificate, which is on a smart card, a virtual smart card, or a soft certificate that is installed on the client. This is used as an alternate primary authentication option, and therefore it bypasses the user name and password prompt at the AD FS log-on page.

You should consider using client certificate authentication if:

  • Users have been issued smart cards or soft certificates that can be used for authentication
  • You want multi-factor authentication (MFA), but you don't want to implement Azure Multi-Factor Authentication Server or another third-party MFA product

Optional: You can optionally use client certificate authentication with AD FS to provide certificate-based multi-factor authentication. This is a good MFA option if your users already have authentication certificates issued to them.

Certificate Options

AD FS requires the following certificates:

Token signing certificate

AD FS uses the private key of this certificate to sign the security token that it sends to Azure AD. The public key of the certificate is provided to Azure AD through a metadata file when federated authentication is configured.

Web application SSL certificate

This certificate is used for transport security when users browse to the AD FS log-on page. This is a traditional TLS certificate used for HTTPS connections with users' web browsers.

Web service SSL certificate

This certificate is different from the web application SSL certificate because it is used specifically to protect the web service at the AD FS active endpoints. Although not required by Azure AD, some Office 365 properties (such as Exchange Online and Lync Online) use this endpoint. Most deployments use the same certificate as the web application SSL certificate.

Although there is no single configuration of certificates that makes sense for every situation, we've distilled the guidance into a few scenarios. Use the following table to make the appropriate decisions for certificate configuration:

Level of security consciousness | Certificate | Issuer | Validity period | Uniqueness
Favors convenience over security | Token signing | Self-signed | 1 year | Unique
Favors convenience over security | Web app | CA-issued (publicly trusted) | 5 years | Common
Favors convenience over security | Web service | CA-issued (publicly trusted) | 5 years | Common
Somewhat security conscious | Token signing | Self-signed | 1 year | Unique
Somewhat security conscious | Web app | CA-issued (publicly trusted) | 2 years | Common
Somewhat security conscious | Web service | CA-issued (publicly trusted) | 2 years | Common
Favors security over convenience | Token signing | CA-issued | 1 year | Unique
Favors security over convenience | Web app | CA-issued (publicly trusted) | 1 year | Unique
Favors security over convenience | Web service | CA-issued (publicly trusted) | 1 year | Unique

Mandatory: Azure AD requires that the key size of the token signing certificate is a minimum of 2048 bits.
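
The key size can be verified on an AD FS server with a quick check like this sketch:

    # Report the public key size of each token signing certificate (expect 2048 or higher)
    Get-AdfsCertificate -CertificateType Token-Signing |
        ForEach-Object { $_.Certificate.PublicKey.Key.KeySize }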

Web Application Proxy Servers

In the context of Azure AD, the Web Application Proxy role is essentially an edge role for the on-premises AD FS implementation. Using the Web Application Proxy has the following benefits for customers:

  • Provides a layer of network insulation from the Internet so that users outside the network aren't hitting the AD FS servers directly
  • Provides a soft-account lockout capability (the user is locked out from the proxy server, but not from their Active Directory account) so that an attacker on the Internet cannot lock out internal user accounts
  • Allows you to split the authentication method being used, based on where the connection is coming from (for example, use Forms Authentication for users over the Internet, but use Windows Authentication for users on the corporate network)
  • Provides location-awareness to AD FS, so that decisions can be made based on whether the user is on the corporate intranet.

Because of these benefits, we typically recommend using the Web Application Proxy role with an AD FS deployment that interacts with Azure Active Directory.

Recommended: It is very rare to not include Web Application Proxy servers in the AD FS deployment for Azure AD. We recommend that they be implemented unless you have a sound reason for not wanting to do so.

Although Web Application Proxy servers can handle a heavier load than AD FS, we generally recommend that you deploy Web Application Proxy servers in a 1:1 relationship with AD FS servers. For example, if an organization has three AD FS servers on the intranet, it's a good idea to start with three Web Application Proxy servers in the extranet. You can monitor the performance of the Web Application Proxy servers and determine if you can remove some of the servers from the load balancer, based on the utilization metrics of the servers.

Recommended: As a general rule, we recommend starting with an equal number of Web Application Proxy servers and AD FS servers.

High Availability

AD FS is a web application and a web service, so making it highly available and responsive to client requests consists of using traditional web-based load balancing methods. We've performed AD FS deployments with most of the major load-balancer manufacturers on the market.

Some of the load balancers we've used are Citrix NetScaler, F5 BIG-IP, and Windows Network Load Balancing. We've found that most load-balancer manufacturers have specific guidelines for integrating AD FS with their products.

The following diagram illustrates what a common load-balancing architecture for AD FS looks like.

AD FS supports active/active load balancing within a network segment, and it is 100% stateless. This means that persistent connections (sticky connections) are not needed at the load balancer. This simplifies the load balancer configuration to the point where you only have to select the load balancing algorithm to be used.

Database Selection

AD FS can use a database hosted on a full deployment of SQL Server, or it can use an internal database known as the Windows Internal Database (WID). The following table provides a comparison of these options:

Consideration | SQL Server | WID
Database availability | Capable of active/active deployments, with multiple database nodes having write capabilities | Active/active deployment, but only one node can write to the database; all other nodes are read-only
Replication | Any supported SQL replication option | Primary/secondary replication, with replication occurring at five-minute intervals
Performance | Can provide very high performance, suited to environments with hundreds of thousands of users and more than 100 applications integrated with AD FS | Good, but not as high as SQL Server; a good option if the customer has fewer than 100,000 users and only uses AD FS for Azure AD
Scalability | No restrictions on scalability; scales as high as SQL Server allows | Soft limit of 10 servers in the AD FS farm
Location | Can be stored on separate, dedicated database servers, which can be used to increase performance | A local copy resides on each AD FS server
Complexity | Can be very high (we've seen AD FS deployments where more than half of the project consisted of deploying SQL Server) | Very low; works with virtually no configuration necessary

Aside from the general recommendations, there are two situations where SQL Server may be required instead of WID:

Token replay protection

Token replay protection comes into play when AD FS is not the ultimate identity provider being used. In these scenarios, there is another identity provider that AD FS defers authentication to and receives a security token from.

Token replay protection prevents an attacker from replaying the token sent to AD FS from the additional identity provider. To do this, AD FS writes a hashed value of the token to the database. Therefore, every AD FS server must be able to write to the database, making this scenario unsuitable for WID-based deployments.

This is typically not a common scenario with Azure AD, but it's possible that customers may be using this capability, so this may be a key decision factor in determining which type of database to use with AD FS.

SAML artifact resolution

This is a capability where AD FS can give a portion of the security token to the application through the user's web browser, instead of the entire token. The application can then go back to the AD FS server and retrieve the rest of the security token without the user's browser being involved.

This capability requires every AD FS server in the farm to write to the database, so it's not suitable for WID-based deployments. It's rare that we see customers using SAML artifact resolution. Azure AD in particular does not use it, so if a customer is only using AD FS for Azure AD integration, this scenario can be ignored.

Unfortunately, there is no single architecture that is right for every situation. However, we recommend that you default to a WID-based architecture, unless there is a solid reason to depart from it. This keeps the architecture simpler, particularly in cases where multisite deployments are needed.

Recommended: Plan to use a WID-based architecture by default, unless there is a solid reason to depart from it.

Using Third-Party Identity Providers with Azure AD

The use of third-party identity providers is supported in Azure AD if the identity provider is on the approved list (see Use Third-Party Identity Providers to Implement Single Sign-On in the following References table).

The guidance for AD FS in the previous section of this document is not applicable to third-party identity providers. If a customer chooses to go that route, you'll need to work with the vendor to obtain the recommended practices for implementing the identity provider.

References

Install the Azure Active Directory Sync Service

https://msdn.microsoft.com/en-us/library/azure/dn757602.aspx

DirSync: Using Alternate Login IDs with Azure Active Directory

http://social.technet.microsoft.com/wiki/contents/articles/24096.dirsync-using-alternate-login-ids-with-azure-active-directory.aspx

Configuring Alternate Login ID

https://technet.microsoft.com/en-us/library/dn659436.aspx

Set up a trust between AD FS and Azure AD

https://msdn.microsoft.com/en-us/library/azure/jj205461.aspx

SSO for On Prem IWA Apps Using Kerberos constrained delegation with Application Proxy

https://msdn.microsoft.com/en-us/library/azure/dn879065.aspx

Use Third-Party Identity Providers to Implement Single Sign-On

https://msdn.microsoft.com/en-us/library/azure/jj679342.aspx

Azure Multi-Factor Authentication Options for Federated Users

https://msdn.microsoft.com/en-us/library/azure/dn394284.aspx

How to Switch from Single Sign-On to Password Sync

http://social.technet.microsoft.com/wiki/contents/articles/17857.how-to-switch-from-single-sign-on-to-password-sync.aspx

Azure Multi-Factor Authentication

For an overview of Azure Multi-Factor Authentication, please see What is Azure Multi-Factor Authentication.

Licensing

There are multiple ways to acquire licenses for Azure Multi-Factor Authentication. The following methods currently exist:

Method | Description
Direct purchase of Azure Multi-Factor Authentication licenses | Pay on either a per-user or per-authentication basis
Purchase as part of Azure AD Premium | Azure AD Premium includes Azure Multi-Factor Authentication licenses in the per-user cost
Purchase through the Enterprise Mobility Suite | The Enterprise Mobility Suite includes Azure AD Premium as part of the package, which includes Azure Multi-Factor Authentication per-user licenses

You may encounter a situation where a customer purchases licenses and there is a delay before the licenses become available. If this is the case, do not simply create a temporary trial license and expect the trial licenses to be converted into production licenses. You'll want to understand what licenses the customer purchased, and work with the customer's account team to make sure that the appropriate licenses become available.

Recommended: Be sure to understand how a customer acquired the Azure Multi-Factor Authentication licenses. Doing so may save you time and problems in the future.

Azure MFA Server vs Azure AD MFA

There are two types of Azure Multi-Factor Authentication services available:

  1. Azure Multi-Factor Authentication Server: This is a server product that you install on-premises, which can add multi-factor authentication to services other than Azure AD.
  2. Azure AD Multi-Factor Authentication: This adds multi-factor authentication to an Azure AD account. This is a pure cloud service with no on-premises components.

Situations for Using Azure Multi-Factor Authentication Server

You should use Azure Multi-Factor Authentication Server in the following conditions:

  • You want to add multi-factor authentication to on-premises devices, such as VPN devices or networking gear
  • You want to add multi-factor authentication to applications that don't support identity federation protocols or are not integrated with Azure AD
  • You want to use one of the following multi-factor authentication (MFA) methodologies:
    • OATH token
    • Two-way SMS
  • You want to integrate Azure Multi-Factor Authentication with AD FS directly

Conditional Access

Conditional access provides a way to specify advanced authorization rules for requiring MFA. Without conditional access, users that are enabled for MFA require MFA every time they attempt to access an application. For more information about conditional access, please see Azure Conditional Access Preview for SaaS Apps.

Exception Policies

Conditional access allows you to use "exception policies" so that users who are members of certain Azure AD security groups are exempt from the MFA requirement. When a user is exempt, the exemption overrides any MFA requirement, regardless of the application or the user's other group memberships. Keep this in mind when working through exception policies with customers; use them sparingly and only when necessary.

Recommended: Exception policies override the MFA requirement for the users they cover, which may violate some customers' security policies. Be sure that you understand a customer's security policies before helping the customer define MFA exemptions.

References

What is Azure Multi-Factor Authentication?

http://azure.microsoft.com/en-us/documentation/articles/multi-factor-authentication/

Reporting

Azure Active Directory contains a series of reports that customers can use to gain insight into user activity in the directory. These reports are broken into three categories:

  1. Anomalous activity – Reports potentially suspicious activity that could be an indicator of a security incident.
  2. Activity logs – Provides reports about various activities that are taking place within the directory, such as password management or self-service identity activities.
  3. Integrated applications – Provides statistics about which applications are being used.

These reports are available to customers with Azure Active Directory Premium licenses. There are no architectural considerations for these reports; customers simply need to be aware that they exist and should be monitored.

Monitoring

Because Azure Active Directory is a managed cloud service, customers do not need to monitor the service itself. You can, however, visit the Azure Status page to determine whether there are currently any issues with the Azure AD service.

Azure AD Connect Health

One monitoring capability that Azure Active Directory does provide is monitoring on-premises AD FS servers. This can be any AD FS implementation; it is not required to be the AD FS implementation that federates an organization with Azure AD.

This feature is called Azure AD Connect Health. The service allows you to install agents on your AD FS servers that push audit data to Azure AD. Once the data is there, Azure AD can provide a wealth of useful information about the health and usage metrics of AD FS.

For more information, please see Azure AD Connect Health.

Agent Installation

The Azure AD Connect Health agent must be installed on each AD FS server that is being monitored. The package can be downloaded and deployed via automated software installation (such as Active Directory Group Policy Objects or System Center Configuration Manager), or it can be manually installed on each AD FS server.

Azure AD Connect Health has an option to keep the agents automatically up to date (which is turned on by default), so the installation of the agents is a one-time event.

For some of the agent functionality to work, auditing must be turned on at each of the AD FS servers. Auditing is turned off by default, so you need to ensure that the customer can enable it without issues (a sketch of the required commands follows the note below).

Mandatory: Auditing must be enabled on each AD FS server for the Usage Analytics in Azure AD Connect Health to work properly.
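
As a sketch of what enabling auditing involves (assuming Windows Server 2012 R2 AD FS, and that the AD FS service account has already been granted the "Generate security audits" user right), the following commands would be run on each federation server:

    # Enable success and failure auditing for the "Application Generated" subcategory.
    auditpol.exe /set /subcategory:"Application Generated" /failure:enable /success:enable

    # Add success and failure audits to the AD FS logging levels.
    Set-AdfsProperties -LogLevel Errors,Warnings,Information,SuccessAudits,FailureAudits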

There is also a set of outbound URLs that the agent contacts. These URLs must not be blocked by firewalls. For a complete list of these addresses, see the "Outbound Connectivity to Azure Service Endpoints" section on the Azure AD Connect Health Requirements page.

Network Connectivity

The Azure AD Connect Health agent sends audit and event log data to Azure AD. If network connectivity is disrupted, the agent will queue data up to an amount equal to 10% of the total system memory. If connectivity isn't restored before the queue is full, the newer data will overwrite the older data until network connectivity is restored. It is estimated that 1000 requests consume about 80 MB of data.

Recommended: Ensure that the AD FS audit channel has a buffer large enough to prevent data from wrapping. We recommend allocating at least 1 GB of storage to the AD FS audit channel.

References

Azure Status

http://azure.microsoft.com/en-us/status/#current

Azure AD Connect Health Requirements

https://msdn.microsoft.com/en-us/library/azure/dn906733.aspx

Azure AD Connect Health FAQ

https://msdn.microsoft.com/en-us/library/azure/dn906723.aspx

Directory Management

Azure AD License Assignment

All users of Azure AD Basic, Azure AD Premium, or Enterprise Mobility Suite features must be specifically assigned the corresponding license to use the associated features. There are two ways to associate a license with a user in Azure AD:

  • Direct assignment
  • Group membership

Direct License Assignment

With direct license assignment, you assign a license to an individual person. For large organizations, this model can be unsustainable, because it requires managing each user's license individually.

However, if you are using an Identity Management service in your on-premises environment (such as Microsoft Identity Manager), you can directly assign licenses to users by having the Identity Management service run a PowerShell command (see the sketch after the following list). Therefore, we recommend assigning licenses directly if:

  • The organization is small and an administrator can manage the license assignments through the Azure portal.
  • An Identity Management system is used and it can integrate with Windows PowerShell as part of its provisioning process.
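
If an Identity Management system drives the assignment, the PowerShell step might look like the following sketch using the MSOnline module; the tenant prefix "contoso", the SKU name, and the user principal name are placeholder assumptions:

    # Discover the license SKUs available in the tenant (for example, contoso:AAD_PREMIUM).
    Connect-MsolService
    Get-MsolAccountSku

    # A usage location must be set before a license can be assigned.
    Set-MsolUser -UserPrincipalName "jsmith@contoso.com" -UsageLocation "US"

    # Assign the license directly to the user.
    Set-MsolUserLicense -UserPrincipalName "jsmith@contoso.com" -AddLicenses "contoso:AAD_PREMIUM"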

Group Membership

Another approach for assigning licenses to Azure AD users is to add the users to an Azure AD group, and then assign the license to the group instead of to individual users. The group used for the license assignment can be a group that is sourced from the on-premises Active Directory or a group that is sourced from Azure AD.

This approach tends to be more manageable for large organizations, especially those that already have a group management solution in place. We recommend assigning licenses via group membership if:

  • The organization has an Identity Management system in place that is capable of managing group memberships
  • The organization is large and can appropriately assign users to role-based group memberships

References

Manage Azure AD Subscriptions and Licenses

https://msdn.microsoft.com/en-us/library/azure/dn919664.aspx

Identity Management-as-a-Service Capabilities

Azure Active Directory provides a set of capabilities that allow users to manage their identities in the cloud.

Self-Service Password Reset

With self-service password reset, users can reset their forgotten passwords in Azure AD and the new password can optionally be written back to the on-premises Active Directory. To support password write-back, the following must be in place:

  1. Azure AD Sync or Azure AD Connect must be used for synchronizing user accounts to Azure AD. Forefront Identity Manager with the Azure AD Connector is not supported for password synchronization.
  2. Password hash synchronization must be enabled, so that the on-premises password hashes are synchronized to the Azure AD tenant for each user.

Authentication

When a user initiates the process to reset a forgotten password, the user must be authenticated. Because the password is not known, an alternate means of authentication must be used. Azure AD self-service password reset supports the following forms of authentication for forgotten passwords:

  • Phone call with a one-time code
  • Text message with a one-time code
  • Email with a one-time code to an alternate email address
  • Security questions

Phone-Based Authentication

To use the phone-based methods, the user's mobile phone or office phone number must be populated in the directory. This can be done through Azure AD Connect synchronization or by manually updating the phone number through the web interface or PowerShell.

Email-Based Authentication

If you want to use the email option, the Alternate Email Address attribute of the user account needs to be populated with a valid email address. As with the phone number, this can be done through Azure AD Connect synchronization or by manually updating the email address through the web interface or PowerShell (a sketch follows below).
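
As an illustration, both attributes can be populated with the MSOnline PowerShell module; this is a minimal sketch, and the user principal name, phone number, and email address are placeholder assumptions:

    # Populate the contact data used for self-service password reset verification.
    Connect-MsolService
    Set-MsolUser -UserPrincipalName "jsmith@contoso.com" `
        -MobilePhone "+1 4255550100" `
        -AlternateEmailAddresses @("jsmith@outlook.com")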

Question-Based Authentication

When using security questions for self-service password reset authentication, you must specify a pool of security questions that Azure AD can choose from. Out of this question pool, Azure AD requires users to answer a set number of questions during enrollment, and to answer all or a subset of those questions during a password reset event.

We recommend that customers define their own questions, based on legal counsel and the approval of the customer's security team. This is to ensure that the questions don't inappropriately ask users for sensitive personally identifiable information, and that the questions are secure, with answers that are not easily guessable.

Enablement

Self-service password reset can be enabled for all users or for a subset of users in the directory. To enable self-service password reset for a subset of users, the users can be added to an Azure AD security group, which is referenced in the self-service password reset configuration. There are two options for adding users to this group:

  1. Create the group in the cloud and manage members through Azure AD. This process works well for organizations with smaller numbers of people, where administrators can effectively manage the users through the Azure portal.
  2. Create the group in the on-premises Active Directory, manage its membership in the on-premises Active Directory, and synchronize it into Azure AD via Azure AD Connect. This is the preferred process for large- and medium-sized organizations that already have a process for managing group memberships on-premises.

Registration

Before self-service password reset can be used, the appropriate data must be populated in Azure AD. This data could be prepopulated based on the Azure AD Connect synchronization job, or users can be asked to self-register their information.

If you choose the latter method, users can be provided with a web page link to take them to the registration portal. Azure AD can also be configured to automatically take the user to the registration portal when they sign in to the Access Panel.

Recommended: Create an end-user communication plan that tells users how to register for self-service password reset, how to reset their passwords, and what to expect from the process.

Self-Service Group Management

Self-service group management enables users to manage their groups and group memberships in Azure Active Directory. The following scenarios are supported by self-service group management:

  • Users can create groups in Azure AD, which can be used to authorize people to access applications that are integrated with Azure AD
  • Users can manage the memberships of groups that were created in Azure AD
  • Users can request to join groups that were created in Azure AD

Recommended: At the time of this writing, managing groups that are created in the on-premises Active Directory and synchronized to Azure AD is not supported. Please check for updates about self-service group management on TechNet (see the References table at the end of this section).

Mandatory: Self-service group management requires an Azure AD Premium license, and the user performing group management must have a license assigned to them.

Enablement

Self-service group management can be enabled for all users or for a subset of users in a specific Azure AD security group. There are two options for adding users to this group:

  1. Create the group in the cloud and manage members through Azure AD. This process works well for organizations with smaller numbers of people, where administrators can effectively manage the users through the Azure portal.
  2. Create the group in the on-premises Active Directory, manage its membership in the on-premises Active Directory, and synchronize it to Azure AD via Azure AD Connect. This is the preferred process for large- and medium-sized organizations that already have a process for managing group memberships on-premises.

Managing Groups

After users are enabled for self-service group management, they are able to create groups and submit requests to join groups. After self-service group management is enabled, there are a couple of options available:

  • Allow users to create groups – Not all customers are comfortable allowing employees to create groups, so this option controls group creation for all users.
  • Restrict who can use self-service group management – This option limits which users are enabled for self-service group management. Customers who use this option need to add all users who can perform self-service group management to a security group in Azure AD.

Groups that are created in the Azure portal have self-service group management disabled by default. If a customer wants to enable self-service group management for these groups, the groups must first be assigned owners, and the owner must change the policy for the group in the MyApps portal.

Mandatory: Groups cannot be enabled for self-service group management unless they have an identified owner. Finding owners for groups can be a painful exercise for customers: the previous owner may have left the organization, or the group may have been created without an owner, leaving no one sure which team should manage it. When working with customers on self-service group management, make sure they know that they may need to perform a group-owner reconciliation exercise.

Dedicated Groups

Dedicated groups are special groups that are automatically created and managed by Azure AD. For example, the dedicated group for All Users contains every user account in Azure AD.

Customers can choose to enable or disable dedicated groups. If customers require a group that consists of all users, using a dedicated group is the preferred method of achieving that. Otherwise, customers need to proactively manage the membership of the group.

Mandatory: To use dedicated groups, self-service group management must be enabled in Azure AD. However, if customers want to use dedicated groups and they are not ready for their users to participate in self-service group management, they can configure self-service group management to apply only to users in a security group and keep that security group empty.

Dynamic Groups

Dynamic groups allow an administrator to specify some criteria by which all users who meet the criteria are automatically members of the group. These criteria take the form of a user attribute query, based on attributes that are present in the directory. For example, if a customer wants to create a dynamic group called HR Users, the criteria for the group would consist of a query where the value of the Department attribute is Human Resources.
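
As an illustration, the advanced membership rule for such a group would resemble the following; the attribute and value come from the example above, and the exact rule syntax should be verified against current documentation:

    (user.department -eq "Human Resources")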

Mandatory:

  • Customers are limited in the number of dynamic groups they can have in the directory. At the time of this writing, the limit is 10 dynamic groups per Azure AD tenant. Check the resources in the References table at the end of this section for current limits before consulting a customer.
  • To use dynamic groups, self-service group management must be enabled in Azure AD. However, if customers want to use dynamic groups and they are not ready for their users to participate in self-service group management, they can configure self-service group management to only apply to users in a security group, and keep that security group empty.
  • All attributes that you want to use for group membership evaluation must be present on the User objects in Azure AD. This can be done by manually adding the attributes or by synchronizing them from the on-premises Active Directory.

Recommended:

  • When you convert a standard group to a dynamic group, the previous membership of the group is lost. Make sure that the previous membership is exported first if the customer wants to retain a list of those members.
  • When creating a custom (advanced) membership evaluation policy, there is a limit of 255 characters. Therefore, complex evaluations that involve multiple attributes and AND/OR conditions should be limited.

References

Self-service group management for users in Azure AD

https://msdn.microsoft.com/en-us/library/azure/dn641267.aspx

Manage Your Groups

https://msdn.microsoft.com/en-us/library/azure/dn641268.aspx

Dedicated Groups in Azure AD

https://msdn.microsoft.com/en-us/library/azure/dn889921.aspx

Dynamic Memberships for Groups in Azure AD

https://msdn.microsoft.com/en-us/library/azure/dn913807.aspx

Branding and Customization

Although Azure AD is a cloud service, some elements of the user interfaces can be branded by customers. To brand the user interface in Azure AD, an Azure AD Basic or Azure AD Premium license is required.

Mandatory: Azure AD Basic or Azure AD Premium licenses are required to customize the sign-in page and Access Panel. The administrator who is making the customizations must have an assigned license; otherwise, the option to configure the branding and customizations will not show up in the Azure portal.

The following Azure AD components can be branded or customized:

  • Sign-in page elements:
    • Illustration image displayed on the left side of the page
    • Banner logo above the sign-in box
    • Sign-in page text
  • Application Gallery:
    • Logo (in the upper-left corner of the page)

Optional: Language-specific customizations can also be made. If your customer is a multinational organization, you could customize the sign-in page text for each language that the customer supports.

Mandatory: Do not design solutions for Azure AD that require custom code to run as part of the sign-in page or Access Panel. Azure AD does not allow HTML elements or client-side scripts to be added as a customization.

References

Add Company Branding to your Sign-In and Access Panel pages

https://technet.microsoft.com/en-us/library/dn532270.aspx

Extending On-Premises Active Directory to Microsoft Azure

When working with virtual machines in an Infrastructure-as-a-Service (IaaS) environment, the virtual machines most often need to be joined to an Active Directory domain, so that the operating system can be managed and the software running on the virtual machines can function correctly. Many customers who move virtual machines to Azure have concluded that extending Active Directory into Azure IaaS is a recommended course of action.

One of the questions we are often asked is whether a customer should deploy Active Directory domain controllers into IaaS or keep them on-premises and provide a VPN connection. There are various considerations involved in answering this question. This section addresses those considerations and provides guidance about how to extend Active Directory to Azure virtual machines in a safe and reliable way.

Networking Considerations

When considering extending Active Directory to Azure, there are two primary areas of networking that need some thought:

  • Connecting on-premises domain controllers to Azure virtual machines
  • Networking the domain controllers in Azure with the virtual networks in IaaS

Connectivity to Azure

Whether the domain controllers are on-premises or deployed in Azure, there needs to be connectivity between the Azure virtual network and your on-premises network. If you want to keep domain controllers on-premises, you need an ExpressRoute connection or a Site-to-Site VPN connection to Azure.

Every time a virtual machine in Azure needs to access a domain controller, it will traverse this connection over the WAN. Depending on the stability and performance of the connection, this may cause issues. You should ask the following questions:

  • What kind of connection is available between the on-premises network and Azure? If it's an ExpressRoute connection, you are more likely to keep the domain controllers on-premises; an ExpressRoute connection through an exchange provider (IXP) is even better than one through a network service provider (NSP). For more information about ExpressRoute, see the ExpressRoute Overview section earlier in this document.
  • What is the cost of network traffic across the connection? If there's a high cost associated with the connection, and you expect the virtual machines to communicate heavily with Active Directory, it would probably be more cost-effective to include the domain controllers in IaaS.
  • How stable is the network connection with Azure? Accessing domain controllers over the WAN will not work well if the connection is unstable. Inaccessibility of Active Directory can cause many applications and services to be unusable. If you have applications with dependencies on Active Directory, you'll want to put in a more stable connection (consider ExpressRoute) or put the domain controllers locally in the IaaS tenant.

IP Addressing

Virtual machines in Azure get IP addresses assigned dynamically from the virtual network that they reside in. To the operating system, this assignment occurs via DHCP. If you set a static IP address inside the operating system, Azure will isolate the virtual machine, and it won't be able to communicate on the network.

Mandatory: Do not set a static IP address on the network adapter in the operating system for virtual domain controllers in Azure. Doing so will isolate the virtual machines and prevent them from communicating on the virtual network.

The IP address that Azure gives the domain controller will never change unless you deprovision the virtual machine. In general, it is safe to allow Azure to assign a dynamic IP address to a domain controller. If, however, you want a domain controller to have a specific IP address, you can configure Azure to provide a static IP address to the domain controller by using the methods outlined in the networking section of this document.

The IP address is still dynamically assigned from the perspective of the operating system on the virtual machine, but it's an address that you choose. This also has the benefit of ensuring that the virtual machine retains the same IP address if it is accidentally deprovisioned by an administrator.

Recommended: To give a domain controller the IP address that you want and prevent it from changing if the virtual machine is deprovisioned, provide the virtual machine with a static virtual network IP address.
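
A minimal sketch with the classic Azure PowerShell module follows; the cloud service name, VM name, and IP address are placeholder assumptions, and the address must fall within the subnet that the virtual machine is deployed to:

    # Reserve a specific virtual network IP address for an existing domain controller.
    Get-AzureVM -ServiceName "contoso-ad" -Name "AZURE-DC01" |
        Set-AzureStaticVNetIP -IPAddress "10.0.0.4" |
        Update-AzureVM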

Storage Considerations

There are three types of disks in Azure that can be attached to virtual machines:

  • Operating system disks: The default operating system disk that is used in the virtual machine. This disk is durable, but it has Write-behind disk caching enabled.
  • Data disks: Additional durable disks that have Write-behind disk caching disabled.
  • Temporary disks: Disks whose contents can be wiped away during certain virtual machine operations. This disk type is intended for paging files, so that paging files don't count against the storage quota.

To prevent the Active Directory database and SYSVOL from being deleted or corrupted, both must be placed on a data disk. The virtual machine's operating system disk has Write-behind disk caching enabled, so placing the Active Directory database and SYSVOL on the operating system disk could cause Writes to be lost if a virtual machine is stopped before the cache is committed. Never place the Active Directory database or SYSVOL on a temporary disk; the contents of the temporary disk are deleted during certain virtual machine operations.

Mandatory: Make sure that you place the Active Directory database and SYSVOL on a data disk. If you use the operating system disk or a temporary disk, the database may get corrupted or purged during an outage.
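
As a sketch (the names, disk size, and drive letter are placeholder assumptions), a data disk with host caching disabled can be attached with the classic Azure PowerShell module, and the AD DS paths pointed at it during promotion:

    # Attach a new data disk with host caching disabled.
    Get-AzureVM -ServiceName "contoso-ad" -Name "AZURE-DC01" |
        Add-AzureDataDisk -CreateNew -DiskSizeInGB 40 -DiskLabel "ADDS" -LUN 0 -HostCaching None |
        Update-AzureVM

    # Inside the guest, after initializing the disk as F:, place the database,
    # logs, and SYSVOL on the data disk during promotion.
    Install-ADDSDomainController -DomainName "contoso.com" `
        -DatabasePath "F:\NTDS" -LogPath "F:\NTDS" -SysvolPath "F:\SYSVOL"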

Security Considerations

Domain Controllers in Azure

Most customers will strongly consider placing domain controllers in Azure because they will want the applications they place in Azure IaaS to have reliable and low latency access to the domain controllers.

Domain controllers are highly sensitive roles. If someone compromises a domain controller, they can gain access to virtually everything in a customer's environment. Some customers are nervous about hosting domain controllers in an Azure tenant, and we always respect that concern.

The best way to handle the situation is to present the customer with an understanding of how Azure is secured and information about how we prevent the compromise of a domain controller.

Key resources for an informed conversation about how Azure protects domain controllers include:

  • Use the Tier 0 subscription described in the reference model in this document. It allows you to manage who has explicit control over the domain controllers (and their security equivalents in Tier 0).
  • The logical flow of choosing security controls in "10.2 Azure Security Strategies" later in this document (including the "do no harm" approach to security controls).

Active Directory and its structure also provide pivotal tools for hardening and enforcing good credential hygiene. Defense against Pass-the-Hash and Pass-the-Ticket attacks comes from sound design principles in Active Directory, including:

  • Organizational unit (OU) structure
  • Security Group memberships
  • Group Policy Objects (GPOs) and their respective local policies for User Rights Assignments

For further considerations, refer to the section in this document about AD Design Considerations.

Azure Security

Microsoft is committed to ensuring that the Azure platform is secure, and new capabilities are constantly being added that increase the security of solutions that can be deployed on Azure. It is highly recommended to read the section of this document called Microsoft Azure Security to gain an understanding of the core security strategies in Azure.

You can also visit the Microsoft Azure Trust Center for additional Azure security documentation. Rather than expound on security itself (this is covered in detail in other places), the remainder of this section will discuss the security aspects of hosting Active Directory domain controllers in Azure.

Read-Only Domain Controller

Read-Only Domain Controllers (RODCs) are a type of Active Directory domain controller that does not allow Write operations to take place. For more information about RODCs, please see AD DS: Read-Only Domain Controllers.

Customers frequently ask if they should use RODCs in Azure as a security measure. In short, the answer is, "No."

The intent of RODCs is to limit the scope of damage that can be done in the case of poor physical security. For example, if a branch domain controller sits in an unsecured trailer in a construction lot, the theft of that domain controller could expose user credentials and provide a way for someone to modify and inject data into the organization's Active Directory infrastructure.

Some might argue that an Azure virtual machine could be stolen in a similar manner as a physical domain controller. Although this may be the case, there are security measures in place that prevent someone from downloading the virtual hard disk (VHD) associated with a domain controller's virtual machine. There are also additional protective measures that can be taken to secure VHDs, which are covered later in this section.

A primary reason that the use of RODCs is discouraged in Azure is that application compatibility is unpredictable. Many services and applications are not compatible with RODCs, and it's difficult to fully assess an application or service to determine its compatibility. In addition, RODCs, by design, redirect a client's Lightweight Directory Access Protocol (LDAP) Write requests to a Read/Write domain controller (RWDC), meaning a client needs to be able to touch an RWDC.

Another reason why we do not recommend using RODCs is that there is still a dependency on the network being in place. If the organization deploys domain controllers in Azure as a way to ensure that Active Directory is available if the connection to the on-premises datacenter goes down, an RODC will not suffice.

It is also inaccurate to believe that an RODC helps secure LDAP. To secure LDAP, we recommend using Controlling Object Visibility – List Object Mode in Active Directory.

Recommended: Do not use Read-Only Domain Controllers as a security measure in Azure.

Windows Server Core

The Server Core installation option of Windows Server is a reduced-footprint implementation of Windows Server that removes the user interface and other features that may not be necessary for all applications and services. Because of the reduced footprint, Server Core presents a smaller attack surface and is less susceptible to viruses and malware than a full Windows Server installation. For more information, please see Windows Server Installation Options.

Although Server Core can provide better security for a virtual machine than a full Windows Server installation, it does not address any specific security concerns with regards to running domain controllers in Azure. If you are already using Server Core for domain controllers on-premises, you can continue this practice for domain controllers in Azure. However, do not go out of your way to deploy Server Core specifically for Azure-based domain controllers.

Recommended: If you are not using Server Core for your on-premises domain controllers, we do not recommend switching to it for your Azure-based domain controllers.

Protecting VHDs

Perhaps the biggest threat to domain controllers in Azure is the possibility of someone downloading the virtual hard disk (VHD). The Active Directory database is not encrypted; if attackers obtained the disk, they could execute a brute force attack against the accounts, or edit the database offline to inject their own data. For this reason, protecting the VHDs of domain controllers in Azure is very important.

The VHDs associated with a virtual machine are stored in Azure Blob storage, which has an API that is accessible from the Internet. This means that the domain controller's hard disk can be downloaded over the Internet by anyone who has one of the API keys. The URL for the Blob storage container hosting the VHDs is a standard URL, for example:

https://<StorageAccountName>.blob.core.windows.net/vhds

This allows an attacker to potentially guess the URL where the VHDs are stored, or to determine the URL by sniffing DNS queries while on the same network as a tenant administrator who accesses the API.

There are two API keys, and each key is 512 bits. The two keys exist so that they can be rotated without interruption of the services that use Azure Storage. Either key provides access to the storage container, so it's important that the keys are not given to anyone.

The first step for protecting the domain controller VHDs is to create a separate Azure Storage account for them. Keep this storage account separate from all other storage accounts and make sure that no one has the API keys. There is no reason for administrators to have the API keys for their day-to-day operations, so the keys should remain a secret. You'll also want to limit access to the Azure portal, to prevent unauthorized people from obtaining the API keys.

Recommended:

  • Create a separate Azure Storage account for domain controller VHDs, and make sure that no one has the API keys. Only store domain controller VHDs in this storage account. All other VHDs should be in a separate storage account.
  • Limit access to the Azure portal to administrators who really need it to prevent unauthorized people from getting access to the API keys for the Azure Storage account that the domain controller VHDs are stored in.
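
A sketch of these two recommendations with the classic Azure PowerShell module follows; the storage account name and location are placeholder assumptions:

    # Create a storage account dedicated to domain controller VHDs.
    New-AzureStorageAccount -StorageAccountName "contosodcvhds" -Location "East US"

    # Regenerate the account keys periodically so that any leaked key is short-lived.
    New-AzureStorageKey -KeyType Primary -StorageAccountName "contosodcvhds"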

In addition to protecting access to the API, we recommend that organizations take additional measures to encrypt the domain controller VHDs. There are various third-party solutions available for encrypting VHDs, but we recommend that the customer look at CloudLink first.

CloudLink SecureVM leverages native BitLocker capabilities in the virtual machine and provides a key management solution that releases BitLocker keys to virtual machines when they need to reboot via manual intervention or preauthorization. Please visit the CloudLink website for more information.

Recommended: We highly recommend that you encrypt domain controller VHDs in Azure by using a third-party partner solution, such as CloudLink SecureVM.

Limiting Endpoint Exposure

When a virtual machine is created by using the Azure portal, two endpoints are created by default, both of which can be accessed over the Internet:

  • Remote Desktop: Used for logging into the virtual machine from outside the network
  • WinRM: Used for PowerShell access to the virtual machine from outside the network

We recommend that you remove the Remote Desktop endpoint and log in to the virtual machine that hosts the domain controller role only via the local network. If the VPN connection goes down, and you don't have access to the private IP address of the domain controller, you can sign in to the Azure subscription and temporarily re-create the Remote Desktop endpoint to give yourself access to the virtual machine.

If, for some reason, it's not desirable to remove the endpoint, we recommend (at a minimum) placing an ACL on the endpoint that permits only the IP addresses from which administrators will log in to the domain controller (see the sketch after the WinRM recommendation below).

Recommended: Remove the Remote Desktop endpoint from virtual machines in Azure that host the domain controller role.
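
A sketch of the removal with the classic Azure PowerShell module follows; the service and VM names are placeholders, and the endpoint name should be verified first with Get-AzureEndpoint, because it can vary depending on how the VM was created:

    # Remove the Remote Desktop endpoint from a domain controller virtual machine.
    Get-AzureVM -ServiceName "contoso-ad" -Name "AZURE-DC01" |
        Remove-AzureEndpoint -Name "RemoteDesktop" |
        Update-AzureVM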

We also recommend that you remove the WinRM endpoint for virtual machines that host the domain controller role. If you are actually using the WinRM endpoint for remote PowerShell access to domain controllers, we recommend that you look at doing this over the private virtual network. If that's not possible, you should place an ACL on the endpoint for the IP addresses that automation will be running from.

Recommended: Remove the WinRM endpoint from virtual machines that host the domain controller role in Azure.
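
If an endpoint must remain, a network ACL can restrict it to known management addresses. In this sketch, the endpoint name, port, and management subnet are placeholder assumptions:

    # Permit only the management subnet to reach the WinRM (PowerShell) endpoint.
    $acl = New-AzureAclConfig
    Set-AzureAclConfig -AddRule -Action Permit -RemoteSubnet "203.0.113.0/24" `
        -Order 100 -Description "Management network" -ACL $acl
    Get-AzureVM -ServiceName "contoso-ad" -Name "AZURE-DC01" |
        Set-AzureEndpoint -Name "PowerShell" -Protocol tcp -LocalPort 5986 -PublicPort 5986 -ACL $acl |
        Update-AzureVM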

References

AD DS: Read-Only Domain Controllers

https://technet.microsoft.com/en-us/library/cc732801(v=ws.10).aspx

Microsoft Azure Trust Center

http://azure.microsoft.com/en-us/support/trust-center/security/

Windows Server Installation Options

https://technet.microsoft.com/library/hh831786

Azure Virtual Machine Disk Encryption using CloudLink

http://azure.microsoft.com/blog/2014/08/19/azure-virtual-machine-disk-encryption-using-cloudlink/

Deployment Considerations

The following section walks through some considerations for deploying domain controllers on virtual machines in Azure.

Virtual Machine Sizing

Active Directory makes efficient use of the available memory in a domain controller, so one of the best ways to make sure that a domain controller performs optimally is to provide enough memory to adequately cache the database.

To support this, leverage the virtual machine sizes that have larger memory footprints. In particular, look into the A5 size, which pairs a larger memory footprint with fewer cores. A5 virtual machines tend to strike a good balance between performance and cost for domain controllers.

Recommended: Start out by using A5 virtual machines for domain controllers in Azure. If you need more memory in the domain controller for caching the database, consider using an A6 virtual machine.
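
As a sketch (the image name, credentials, cloud service, and virtual network are placeholder assumptions, and the cloud service is assumed to exist), an A5 virtual machine can be created with the classic Azure PowerShell module:

    # Create an A5 virtual machine attached to the virtual network
    # that will host the domain controller.
    New-AzureVMConfig -Name "AZURE-DC01" -InstanceSize "A5" -ImageName $imageName |
        Add-AzureProvisioningConfig -Windows -AdminUsername "localadmin" -Password $password |
        New-AzureVM -ServiceName "contoso-ad" -VNetName "ContosoVNet"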

Virtual Machine Roles

In addition to standard virtual machine roles, Azure provides Web and Worker roles for applications hosted on Azure PaaS. Web and Worker roles are not persistent, so never install domain controllers on them.

Mandatory: Do not use Web or Worker roles for domain controllers in Azure.

Getting Domain Controllers into Azure

There are multiple ways to deploy a domain controller into a given Azure subscription. The following table lists supported methods for deploying a domain controller in Azure IaaS:

  • Migrate from physical computer to virtual machine – A P2V solution, such as System Center Virtual Machine Manager, can be used to convert a physical on-premises domain controller to a virtual machine that can be imported into Azure. When doing so, the on-premises domain controller must be shut down and must not be turned on again before the virtualized domain controller is started in Azure.
  • Move an existing virtual domain controller – If customers have domain controllers in Hyper-V, those virtual hard disks can be moved and directly attached to an Azure virtual machine.
  • Build a new domain controller and replicate from on-premises – This traditional approach works well for Azure, too. A new virtual machine is created in Azure and promoted to a domain controller after connectivity to the on-premises network is established. If this is the first domain controller in Azure, the Active Directory data will be replicated over the WAN. Only egress traffic has a charge, so this should have minimal impact on networking cost. The Install from Media (IFM) option is also available as an alternative to downloading the Active Directory database over the WAN connection.
  • Clone a domain controller – Because Azure supports the VMGenerationID feature, domain controllers on-premises can be cloned and imported into Azure. This is a quick way to get a domain controller deployed in Azure.


    Mandatory: Protect all copies of domain controller VHDs, including backups and temporary copies, by using good security practices.

    Administrative Considerations

    For the most part, managing domain controllers in Azure is similar to managing domain controllers on-premises. However, there are some specific things that Active Directory administrators need to be aware of. This section outlines those considerations and recommendations for Active Directory administrators.

    Virtualization Safe Domain Controllers

    Active Directory replication relies on update sequence numbers (USNs), which function as logical time stamps and are expected to only move forward. A particular issue occurs when an old snapshot of a domain controller is reintroduced; this is called USN rollback. A USN rollback creates a divergence in domain controller data.

    The issue is that a change made on a rolled-back domain controller succeeds locally, but the change never replicates to other domain controllers, because those domain controllers believe that they are already up-to-date. For more information about USN rollback, please see KB875495.

    USN rollback most commonly occurs on virtualized domain controllers, because it's so easy to take a snapshot of a virtual machine that is hosting a domain controller and restore the snapshot later. When the snapshot is restored, the domain controller becomes "rolled back."

    To mitigate the effects of USN rollback on virtual domain controllers, a feature called VMGenerationID was introduced in Windows Server 2012. When a virtual machine is restored from a snapshot, restarted after being deprovisioned, or restored through a "service healing" event, the VMGenerationID of the virtual machine changes. This has the following effects on domain controllers:

    • Changes the InvocationId: A change in VMGenerationID also causes the InvocationId to be reset. Active Directory is resilient to changes in the InvocationId, but if the change occurs on multiple domain controllers, there is a large processor spike while the domain controllers figure out how up-to-date they are with respect to other domain controllers. This potentially causes a denial-of-service to Active Directory. For more information on the InvocationId, please see How the Active Directory Replication Model Works.
    • Invalidates the RID Pool: A relative identifier (RID) is a unique identifier that domain controllers generate to ensure that security principals have unique security identifiers (SIDs) when they are created. Each domain controller maintains a "pool" of RIDs, and each domain controller's pool is unique from every other domain controller's pool. This enables security principals to be created at any domain controller without fear of a duplicate SID being assigned.
      When the VMGenerationID changes, the domain controller's current RID pool is discarded. This poses no immediate threat, but the upper limit of RIDs is around two billion. So if the RID pool is large, and many domain controllers are deprovisioned and rehydrated over and over again, it is possible to exhaust the available RIDs in the domain.
    • SYSVOL replication is set to Non-Authoritative mode: SYSVOL is used to hold scripts and security policies that are applied throughout the domain. When the VMGenerationID changes, SYSVOL replication is set to non-authoritative mode, which means that it will accept replication from other domain controllers, but it will not replicate any changes that it made.
      If you shut down and deprovision all of your domain controllers, when they come up, the VMGenerationID will change on all of them. The result is that all the domain controllers have non-authoritative copies of SYSVOL, and replication will stop working, even though everything appears to be working fine.

    Recommended: If all DCs are hosted in Azure, do not shut down all of the DCs at the same time from the Azure console. Doing so will deprovision the DCs and cause the VMGenerationID to change when the VMs start back up, ultimately causing SYSVOL replication to break.

    Virtual Machine De-Provisioning

    In the past, when a virtual machine in Azure was shut down, Microsoft would continue to charge for the virtual machine, even though it was not running. The reason is that the resources were still committed to the virtual machine, so they couldn't be used for other customers.

    To address this, a new shutdown option was added to Azure, called "Stop and Deallocate". Now, when a virtual machine is stopped via the Azure portal, the hard disk remains intact, but several of the resources are deprovisioned. The result is that when the virtual machine is turned on again, several things change, including the IP address, the CPU ID, and, most importantly, the VMGenerationID.

    As discussed earlier, a change of the VMGenerationID can greatly impact Active Directory. Therefore, we strongly recommend that virtual machines hosting domain controllers be shut down only from within the virtual machine's operating system. Shutting down in this manner ensures that the resources are not deprovisioned and that the VMGenerationID will not change when the domain controller boots up.

    Mandatory: Never stop a domain controller through the Azure portal. Always shut down the domain controller from the operating system inside the virtual machine.
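
    If a domain controller must nevertheless be stopped through Azure automation rather than from inside the guest, the classic Azure PowerShell module offers a switch that avoids deallocation; the service and VM names in this sketch are placeholders:

        # Stop the VM without deallocating it, which preserves the VMGenerationID.
        # Note that a provisioned VM continues to accrue compute charges.
        Stop-AzureVM -ServiceName "contoso-ad" -Name "AZURE-DC01" -StayProvisioned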

    References

    Service Healing – Auto-recovery of Virtual Machines

    http://azure.microsoft.com/blog/2015/04/07/service-healing-auto-recovery-of-virtual-machines/

    KB875495 – How to Detect and Recover From a USN Rollback in WS2003, WS2008, and WS2008 R2

    https://support.microsoft.com/en-us/kb/875495

    How the Active Directory Replication Model Works

    https://technet.microsoft.com/en-us/library/cc772726(v=ws.10).aspx

    AD Design Considerations

    When deploying domain controllers in Azure, there are some specific things to consider for your Active Directory design. This section outlines these considerations.

    Active Directory Sites and Subnets

    It is generally recommended to consider the Azure datacenter as a separate Active Directory site, because it will have its own IP address space and routing considerations. For many applications and services, it is preferable to have a domain controller available within the site, and it's typically preferable to have a local connection instead of traversing the WAN.

    More importantly, the customer needs to consider what happens to a virtual machine in Azure if it can't reach a domain controller. The standard recommendation is to place two domain controllers in each Azure region where virtual machines reside.

    It is important to note that a separate resource domain or forest is not recommended, given the additional administrative overhead; these constructs also do not represent an effective security boundary.

    Recommended: Place two domain controllers within an availability set in all Azure regions where virtual machines reside.

    In addition, there should be an Active Directory site created for each Azure region, and all of the virtual networks in that region should be associated with that site. Standard guidance applies for the definition of Active Directory site links, that is, create the appropriate site links so replication and DCLocator functionality works correctly.

    Recommended: Create a unique Active Directory site object for each Azure region where virtual machines reside, and associate all of the virtual networks in that region with the Active Directory site.
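
    A sketch with the ActiveDirectory PowerShell module follows; the site name, address space, and site link values are placeholder assumptions:

        # Define a site for the Azure region, associate the virtual network
        # address space with it, and link it to the on-premises site.
        New-ADReplicationSite -Name "Azure-EastUS"
        New-ADReplicationSubnet -Name "10.0.0.0/16" -Site "Azure-EastUS"
        New-ADReplicationSiteLink -Name "OnPrem-AzureEastUS" `
            -SitesIncluded "Default-First-Site-Name","Azure-EastUS" `
            -Cost 100 -ReplicationFrequencyInMinutes 15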

    Global Catalog

    In modern Active Directory deployments, there is little reason not to make every domain controller a Global Catalog server. The standard guidance for Global Catalogs also applies to domain controllers in Azure. As a recommended practice, make all of the domain controllers in Azure Global Catalog servers.

    Recommended: Make all domain controllers in Azure Global Catalog servers.

    DNS

    DNS is instrumental to the operation of Active Directory. There should always be DNS servers located alongside the domain controllers, and most of the time we recommend that DNS be Active Directory-integrated. This does not change with Azure.

    The domain controllers in Azure should run the DNS Server service, if possible. If you are not using DNS in Windows, there should be a DNS appliance in Azure for the domain controllers to use. Otherwise, a VPN outage will render DNS unavailable and prevent the domain controllers in Azure from operating correctly.

    Recommended: Domain controllers in Azure should also be DNS servers, if that is in line with your existing Active Directory architecture. If you use third-party DNS appliances, there should be a virtual appliance available in the Azure tenant.

    Azure provides a default DNS service to virtual machines if you don't specify a DNS server. The Azure name resolution services do not support the complex name resolution needs of Active Directory, so do not attempt to use Azure DNS servers on domain controllers.

    Mandatory: Make sure that domain controllers are pointing to a DNS server in Windows that hosts the Active Directory zones, rather than the default DNS servers in Azure.
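
    As a sketch (the DNS name label, IP address, image, credentials, and service names are placeholder assumptions), the classic Azure PowerShell module lets you attach the domain controller's DNS service to new deployments in the virtual network:

        # Point a new deployment at the domain controller's DNS service
        # instead of the Azure-provided resolver.
        $dns = New-AzureDns -Name "AZURE-DC01" -IPAddress "10.0.0.4"
        New-AzureVMConfig -Name "APP01" -InstanceSize "Medium" -ImageName $imageName |
            Add-AzureProvisioningConfig -Windows -AdminUsername "localadmin" -Password $password |
            New-AzureVM -ServiceName "contoso-app" -VNetName "ContosoVNet" -DnsSettings $dns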

    Organizational Units

    Organizational units (OUs) in an Active Directory design are important for operational and security management of Azure assets, especially when extending existing on-premises forests into the Azure cloud. OUs, security groups, and Group Policy Objects (GPOs) provide key administrative controls that can provide containment boundaries within a security zone.

    Integrating Applications with Azure Identity Solutions

    One of the primary challenges of integrating applications into Azure is getting the applications to work with an identity system. The most obvious case is when an application is moved from an on-premises web server to the cloud as a PaaS application.

    When on-premises, the application could leverage Active Directory for authentication, group memberships for authorization, and other on-premises identity repositories for accessing identity data. But when that application is moved to a PaaS application, those on-premises systems are no longer accessible.

    Even if Active Directory is extended to an IaaS application as discussed in the previous section, Active Directory is still contained in a virtual network that is isolated from PaaS applications. This makes it difficult to migrate applications to Azure, and it is one of the biggest blockers to moving those workloads to the cloud.

    The majority of applications can't function without an identity layer. Commonly, applications rely on an identity system for:

    • Authentication – Proving that the user sitting at the keyboard is the same person each time an application is used.
    • Authorization – Ensuring that the user can only gain access to the services and components that they are authorized for.
    • Personalization – Applications often store information about the end user for personalizing the user's experience in the application, such as a user's avatar.
    • Background Processing – Many applications store data about the user that is not available during the user's session, so that an out-of-band background process can leverage the identity data. An example of this is an application that stores user email addresses so that email notifications can be sent throughout the week.

    Applications have two options for their identity layer. The first option is to build the necessary identity functionality within the application; the second, discussed shortly, is to integrate with a trusted external identity system. Building it in-house adds a considerable amount of complexity, because the application now has to deal with storing identity data, securing that data, and providing user interfaces for interacting with it.

    Applications that take on the identity layer need to do the following, at minimum:

    • Build and manage a secure user repository
    • Obtain expertise in enterprise-grade cryptographic algorithms for storing and validating passwords or other user secrets
    • Develop user interfaces for signing a user in and managing profile data
    • Develop processes and services for resetting a user's forgotten password
    • Develop a process to ensure that the data stored about the user is kept up-to-date
    • Provide Help Desk functionality when users have issues with their accounts, such as forgetting passwords

    When considering the amount of work that an application developer has to undertake to manage the identities, it's obvious why most developers are integrating with trusted cloud identity solutions instead. This section focuses on primary scenarios for integrating applications with Azure-based identity solutions.

    Application Integration Methodologies

    Before digging in to the various scenarios, there are some baseline concepts that need to be covered. There are several ways to integrate an application with an Azure-based identity solution, so before going into the specific recommendations, it's important to understand these approaches. The concepts that are discussed in this section are:

    • Federating the application
    • Proxying the application
    • Vaulting the user's credentials for the application

    Federating the Application

    Identity federation gives an application the ability to defer authentication (and some types of authorization) to a source that is, in theory, more authoritative for authenticating users.

    For example, if you have an application that you want to share with a partner, the application can be federated with that partner. The partner then becomes the entity that authenticates the users. The application simply receives proof that the user was successfully authenticated by the partner. The following diagram depicts what a federated relationship looks like between an organization and one of its partners.

    Notice in the diagram that both organizations have an Identity Federation Service deployed. The federation service in the Resource Org (the organization hosting the application) trusts authentications that occur at the federation service in the Partner Org (where the users authenticate). Also, the web application trusts the federation service inside of its own organization.

    There are several reasons for this architecture. The biggest benefit to the application is that the federation service in the Resource Org acts as a buffer between the Partner Org and the application. Because of this, the Resource Org has better control over who is authorized to access the application.

    If there was more than one partner (as is usually the case), the Identity Federation Service in the Resource Org manages those relationships, so the application doesn't have to. The application has to only manage the relationship between itself and the federation service in its own organization.

    Identity federation is a big subject, and there's been a lot of content written about it. One of the best sources for understanding identity federation is a free eBook called A Guide to Claims-Based Identity and Access Control (2nd Edition).

    Common Identity Federation Protocols

    When it comes to actually implementing identity federation, a variety of protocols exist. The job of the protocol is to lay out the process that an application needs to go through to obtain a security token. The security token serves as proof that a user has successfully authenticated to a trusted identity system.

    Most of the protocols that are available were written for web-based applications, and therefore, they use common web methods (such as HTTP redirects and HTTP POST methods) for carrying out the protocol exchange. There are also protocols available for non-web-based applications that need identities for calling SOAP-based or RESTful web services. The following protocols are common today:

    • WS-Federation Passive Requestor Profile (PRP) – Browser-based apps
    • SAML 2.0 Web SSO Profile – Browser-based apps
    • OpenID Connect – Browser-based apps
    • WS-Trust 1.3 – Client and service apps
    • OAuth 2.0 – Client and service apps

    WS-Federation PRP and SAML 2.0 Web SSO Profile

    The WS-Federation Passive Requestor Profile (WS-Federation) and the SAML 2.0 Web SSO Profile (SAML 2.0) protocols use the same general process for providing an application with a security token. Therefore, the following description applies to both protocols.

    However, these protocols are not compatible. That is, an application that only understands WS-Federation cannot request a security token from a federation service that only understands SAML 2.0.

    Although the protocol flow is the same, the difference between the two is the messaging. For example, in WS-Federation, the parameters are given to the federation service via individual query string elements in the URL. In SAML 2.0, there is an XML message that is encoded and placed in a single query string element to the federation service. Therefore, a SAML 2.0-based federation service wouldn't know what to do with a WS-Federation request.

    WS-Federation and SAML 2.0 are web sign-in protocols that enable web applications to authenticate users over a web browser. The first step of this process is to establish a relationship between the application and the federation service. This relationship is called a federation trust.

    This is depicted in the following diagram, where the web application on the right trusts authentications performed by the Identity Federation Service on the left.

    After a trust is established, users can sign in to the web application. To initiate the log-on process, the application redirects the user's web browser to the log-on URL in the trusted Identity Federation Service. This is done via an HTTP 302 Redirect, so the user's web browser automatically browses to the log-on page after clicking the sign-in button.
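
    As an illustration, the redirect carries the protocol parameters in the query string. In this hypothetical WS-Federation example, the federation service address and the application realm are placeholders (wa identifies the requested action, and wtrealm identifies the application):

        https://sts.contoso.com/adfs/ls/?wa=wsignin1.0&wtrealm=https%3A%2F%2Fapp.contoso.com%2F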

    After the user is redirected to the log-on page on the Identity Federation Service, users authenticate by using their credentials. These credentials can take any form that the Identity Federation Service supports, such as user name/password, Kerberos protocol, or even X509 certificate authentication.

    After the user is authenticated, the Identity Federation Service creates a security token that proves the user successfully authenticated. This security token also contains some additional information about the user (called claims), so that the application has some idea of who the user is.

    This security token is signed by using an X509 certificate to prevent a malicious user from tampering with its contents. The security token can exist in a variety of formats, but the most common formats are:

    • An XML-based format called SAML (not to be confused with the SAML protocol)
    • A JSON-based format called a JSON Web Token (JWT)
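
    As a point of reference, the payload of a JWT is simply a JSON object that is base64url-encoded and then signed. The following is a hedged sketch with illustrative values; actual claim sets vary by issuer:

    {
      "iss": "https://sts.contoso.com",
      "aud": "https://app.contoso.com",
      "iat": 1440000000,
      "exp": 1440003600,
      "upn": "jane@contoso.com",
      "given_name": "Jane"
    }

    Here iss identifies the issuing Identity Federation Service, aud identifies the application the token is intended for, and the remaining claims carry lifetime and user information.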

    The Identity Federation Service then provides this security token to the application by returning it to the user's web browser in a web form. The user does not have to click the Submit button, because the form is hidden.

    Instead, the Identity Federation Service loads some JavaScript into the browser that submits the form automatically, which initiates an HTTP POST to the application. The data that is POSTed to the application is the security token that the Identity Federation Service created.
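
    The resulting POST might resemble the following hedged sketch. In WS-Federation, the security token travels in the wresult form field; the endpoint and values shown are illustrative:

    POST
    https://app.contoso.com/signin

    HEADERS
    Content-Type: application/x-www-form-urlencoded

    BODY
    wa=wsignin1.0&wresult=%3Ct%3ARequestSecurityTokenResponse%20...%3E&wctx=rm%3d0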

    Now that the web application has the security token, it can inspect it and check the certificate to make sure it was created by the same Identity Federation Service that it has a trust relationship with. The application can then extract the identity information from the security token and use it.

    OpenID Connect

    OpenID Connect (OIDC) is another web sign-in protocol that was ratified in February 2014. Its purpose was to take the basic OAuth 2.0 flows and add an identity layer to make it suitable for authenticating users to web applications. OIDC is a more modern web sign-in protocol, and it has distinct advantages over WS-Federation and SAML 2.0.

    Similar to WS-Federation and SAML 2.0, the first step in OIDC is to establish a relationship between the application and the Identity Federation Service, which is also referred to as the OpenID Connect Provider.

    During this process, the application registers with the OpenID Connect Provider. Unlike WS-Federation and SAML 2.0, however, the web application registers a credential with the OpenID Connect Provider, which will allow the web application to authenticate itself later during the token exchange.

    Now that the application is registered, the user can sign in. When the user browses to the web application and clicks the Sign in link, the web application redirects the user to the OpenID Connect Provider.

    To accomplish this, the web application uses a standard HTTP 302 Redirect message. All of the data that the OpenID Connect Provider needs to do the job of authenticating the user is included by the application in the query string elements of the URL in the redirect.
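
    A hedged sketch of such a redirect follows. The endpoint path and the client_id value are illustrative, but response_type, client_id, redirect_uri, scope, state, and nonce are the standard parameters of an OIDC authorization code request:

    GET
    https://sts.contoso.com/authorize?response_type=code&client_id=52752c8e-d73c-4f9a-a0f9-2d75607ecb8e&redirect_uri=https%3a%2f%2fapp.contoso.com%2fsignin-oidc&scope=openid+profile&state=xyz123&nonce=abc456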

    In the next step, the OpenID Connect Provider prompts the user for authentication. This authentication challenge can take any form that the OpenID Connect Provider supports, including user name/password, Kerberos protocol, or X509 certificate authentication.

    After authentication succeeds, the OpenID Connect Provider prompts the user for consent to release some of the user's identity data to the application. This consent step is one of the main differences between OIDC and the other web sign-in protocols.

    After authentication, the OpenID Connect Provider creates an authorization code for the application. This authorization code is not the security token. Instead, the application can use this authorization code to retrieve the security token from the OpenID Connect Provider.

    After the authorization code is created, the OpenID Connect Provider redirects the user to the web application with the authorization code in a query string parameter, so that the application can receive it.

    Now that the web application has the authorization code, the user's browser is finished with its job. The application goes back to the OpenID Connect Provider and redeems the authorization code for an identity token. It does this by calling a RESTful web service on the OpenID Connect Provider and sending an HTTP POST method to the OpenID Connect Provider with the authorization code.

    The web application authenticates to the OpenID Connect Provider by using the credential that it established during registration.
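
    The redemption call might look like the following hedged sketch (values illustrative). The authorization code, the grant type, and the client credential established at registration are the essential elements:

    POST
    https://sts.contoso.com/token

    HEADERS
    Content-Type: application/x-www-form-urlencoded

    BODY
    grant_type=authorization_code&code=AwABAAAA...&redirect_uri=https%3a%2f%2fapp.contoso.com%2fsignin-oidc&client_id=52752c8e-d73c-4f9a-a0f9-2d75607ecb8e&client_secret=<registered secret>

    RESPONSE: 200 OK

    If the request is valid, the JSON response contains the identity token (id_token) and, depending on the scopes requested, an access token.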

    As a final step, to prevent a man-in-the-middle attack, the web application verifies that the identity token was indeed created by the OpenID Connect Provider by checking the signature. The web application can then extract the identity data from the token and use it for the session.

    WS-Trust

    WS-Trust is a protocol that is used for authenticating to SOAP-based web services. This has traditionally been used in client/server applications that run on the desktop. The client application must collect the credentials from the user, and call the Identity Federation Service to exchange those credentials for a security token that the web service will accept.

    Similar to the other protocols, the first step is for the web service to establish a trust with the Identity Federation Service. The client does not need a trust relationship of its own, because it is not the consumer of the security token; its role is akin to that of the web browser in the previously discussed flows.

    WS-Trust generally assumes that the client running on the user's device is a trusted entity. Therefore, the first step in the process is for the client to prompt the user for credentials. Like the other protocols, these credentials can take any form that the Identity Federation Service accepts, such as a user name/password or a Kerberos ticket. The client then sends a request to the Identity Federation Service called a Request for Security Token (RST).

    The Identity Federation Service verifies the credentials, and sends a response back to the client called a Request for Security Token Response (RSTR). The RSTR contains the security token that the client will present to the web service.
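
    A heavily trimmed, hedged sketch of an RST follows, assuming the WS-Trust 1.3 namespace. The endpoint is illustrative, and a real message also carries a SOAP envelope and WS-Security headers containing the user's credentials:

    POST
    https://sts.contoso.com/trust/13/usernamemixed

    HEADERS
    Content-Type: application/soap+xml

    BODY (trimmed)
    <trust:RequestSecurityToken
        xmlns:trust="http://docs.oasis-open.org/ws-sx/ws-trust/200512">
      <trust:RequestType>http://docs.oasis-open.org/ws-sx/ws-trust/200512/Issue</trust:RequestType>
      <wsp:AppliesTo xmlns:wsp="http://schemas.xmlsoap.org/ws/2004/09/policy">
        <wsa:EndpointReference xmlns:wsa="http://www.w3.org/2005/08/addressing">
          <wsa:Address>https://service.contoso.com/</wsa:Address>
        </wsa:EndpointReference>
      </wsp:AppliesTo>
    </trust:RequestSecurityToken>

    The RSTR that comes back wraps the issued security token in a RequestSecurityTokenResponse element.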

    The client then calls the web service, including the security token as an authenticator in the SOAP message. When the web service receives the message, it extracts the security token and verifies that it was issued by the Identity Federation Service that it trusts.

    OAuth 2.0

    OAuth 2.0 is a client/server protocol. Unlike WS-Trust, OAuth 2.0 is a RESTful protocol which is most suitable for tablet or phone applications that need to access RESTful web services. Most modern applications leverage RESTful web services, so this protocol has gained a lot of popularity.

    One thing to note about OAuth 2.0 is that its original purpose was not to be an identity protocol. Rather, it was authored with the intention of providing delegated access to web services for applications running on a user's device.

    The following list describes the four main flows in OAuth 2.0, which can provide access tokens for a variety of application types:

    • Authorization code flow – Perhaps the flow most often used by clients, this flow allows applications installed on a device to use RESTful web services. The password is not given to the application; rather, it's given to the OAuth server, which means that the application is not able to store the password.

    • Client credentials flow – This flow is typically used by a web application or web service that needs to access another web service. The user is not involved in this flow; rather, it's for one application authenticating to another.

    • Resource owner password flow – This flow allows an application installed on the device to collect credentials from the user and exchange them for an access token that can be used to access the web service. This should only be used with first-party applications, because the user is trusting the application with a user name and password in most cases.

    • Implicit flow – This flow is used for applications that are not installed on a device, for example, a JavaScript client that needs to access a web service.

    For more details on the flows, please see the OAuth 2.0 specification, RFC 6749. Rather than diving into each of these flows, the remainder of this section details the authorization code flow, which is the flow that most customers will likely use.

    Like the other protocols, the first step in an OAuth 2.0 flow is to establish a relationship with the Identity Federation Service (the OAuth server). This process is similar to the OpenID Connect registration process, where the application registers an authentication secret. In OAuth 2.0, both the client and the web service register with the OAuth server.

    These registrations are performed by the developers of the respective applications.

    When the user launches the application on a device and attempts to sign in, the client opens a web browser control and directs the user to the log-on page on the OAuth server. There, the user is presented with the log-on page, followed by a consent page that authorizes the client to use the web service on the user's behalf.

    After the user authenticates and consents at the OAuth server, the OAuth server creates an authorization code and returns it to the client as a query string parameter in an HTTP redirect. However, because the client is not a web browser, it pulls the authorization code out of the URL and does not follow through on the redirect.

    For modern Windows applications, this is a special URL that begins with ms-app://, but it could be a standard https:// URL pointing to a server that doesn't exist. Because the application does not follow through on the redirect, it does not matter if the URL is real.

    After receiving the authorization code, the client returns it to the OAuth server and redeems it for an access token.

    Now that the client is in possession of the access token, the client can call the web service, using the access token as an authenticator. Note that this authenticator does not authenticate the user. Instead, it authenticates the fact that the client has access to the web service on the user's behalf.
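
    Presenting the access token to the RESTful web service is then a matter of attaching it as a bearer credential in the Authorization header, as in this hedged sketch (endpoint illustrative):

    GET
    https://service.contoso.com/api/orders

    HEADERS
    Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs...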

    Identity Federation Support in Azure AD

    Azure Active Directory natively supports the following identity federation protocols:

    • WS-Federation
    • SAML 2.0
    • OpenID Connect
    • OAuth 2.0

    What this means is that there are a variety of protocols to choose from for integrating an application with Azure AD. This also means that for custom applications, it is likely that some portions of the application need to be rewritten to support one of these protocols.

    If it's a commercial off-the-shelf (COTS) application, and the application doesn't support any of these identity federation protocols, then you'll need to use one of the other approaches outlined in this section. The following diagram provides the general approach for integrating applications into Azure AD:

    Identity Federation Support in AD FS

    Active Directory Federation Services (AD FS) is Microsoft's on-premises identity federation service product. Similar to Azure AD, AD FS supports WS-Federation, SAML 2.0, and a limited subset of the OAuth 2.0 protocol flows. The following diagram illustrates the protocol support built in to AD FS in Windows Server 2012 R2.

    Note that unlike Azure AD, AD FS in Windows Server 2012 R2 does not provide support for OpenID Connect; however, this support is present in the AD FS release that ships with Windows Server 2016.

    Identity Federation APIs

    In theory, if application developers read the identity protocol specification documents, they can write their applications to support each protocol natively. In practice, however, this is a very large undertaking: the specifications aren't simple, and there are several cases where real-world implementations deviate from them.

    The better approach for an application developer is to leverage an existing application programming interface (API), which takes on all of the complexity associated with following the protocol specification. These APIs typically come in the form of a library that the developers can use.

    When AD FS 2.0 was released, we also released one of these APIs for .NET applications. This API was called the Windows Identity Foundation (WIF). WIF came with a variety of .NET libraries that made it simple for developers to interact with the WS-Federation and WS-Trust protocols. The problem with including all of the protocol support in a single monolithic library is that it's difficult to update the library to support newer and emerging protocols.

    Also, in recent years, there's been an effort to decouple the .NET Framework from the Microsoft IIS server. As a result, newer interface methodologies for .NET have emerged. In particular, the Open Web Interface for .NET (OWIN) is now the preferred way to provide federation protocol support to .NET applications.

    OWIN provides a modular platform, and Microsoft has started building modules to support each of the identity federation protocols. Developers can integrate these modules into their applications without a fee, and they can enable identity federation support for the application very quickly.

    Recommended: Use OWIN modules instead of the Windows Identity Foundation SDK for adding identity federation protocol support to .NET applications.

    Azure AD Application Proxy

    The Azure AD Application Proxy enables external access to on-premises applications. When a user connects to an on-premises application from outside the network, the connection is made to Azure and then relayed into the private network by an application proxy agent.

    This allows a customer to externalize their on-premises applications through Azure AD and provide pre-authentication and single sign-on. For more information about Azure AD Application Proxy, please see Using Application Proxy to Publish Applications for Secure Remote Access.

    The Azure AD Application Proxy is a good solution for applications that are on-premises and rely on Windows authentication. Because it can provide Kerberos delegation, users who are outside the network can authenticate to Azure AD before the connection to the application is established. By doing this, you can establish policies for the application, such as a requirement for Multi-Factor Authentication.

    Recommended: Do not use the Azure AD Application Proxy for users who are external to a customer's organization. For Kerberos constrained delegation to work, a shadow account would need to be provisioned and managed in the on-premises Active Directory forest for each such user.

    Mandatory: Azure AD Application Proxy requires an Azure AD Basic or an Azure AD Premium subscription.

    Password Vaulting

    Password Vaulting solves a very specific problem of integrating with web applications that don't support any type of Single Sign-On (SSO) solution. With password vaulting, users store their passwords for applications in Azure AD. The application is integrated with the Azure AD Access Panel, which is an application launcher in Azure AD.

    When the user launches the application from the Access Panel, a web browser plug-in is launched. The plug-in retrieves the password from Azure AD, enters it into the sign-in form, and clicks the Sign In button for the user. By using this technique, users don't have to enter credentials in the web applications. For more information on password vaulting, please see Application Access Enhancements for Azure Active Directory.

    Mandatory: When using password-vaulted applications, a browser plug-in is used; therefore, a user cannot run the web browser in "private" mode.

    Recommended: Use password vaulting for integrating applications that use forms-based authentication with Azure AD. These applications are typically difficult to integrate with SSO solutions because they do not accept standard SSO protocols.

    One of the main benefits of password vaulting is that it can integrate with almost any application that has a sign-in page, even if it's not an application that is integrated into the Application Gallery in Azure AD. So when a customer wants to integrate every application with Azure AD, password vaulting is an integral piece of the solution.

    Password vaulting is not a perfect solution, though; it comes with its own challenges. When working with customers that want to use password vaulting, ensure that they understand the following key points:

    • Password vaulting requires a browser plug-in that must be installed on the clients.
    • Users must still manage their password for the application. If the password in the application is changed, the user must also change it in the password vault.
    • The entry point for the application must be through either the Azure AD Access Panel or a predefined link. The user cannot browse to the application and click the Sign In button from within the app.

    References

    A Guide to Claims-Based Identity and Access Control (2nd Edition)

    https://msdn.microsoft.com/en-us/library/ff423674.aspx

    OpenID Connect Website

    http://openid.net/connect/

    Using Application Proxy to Publish Applications for Secure Remote Access

    https://msdn.microsoft.com/en-us/library/azure/dn768219.aspx

    Application Access Enhancements for Azure Active Directory

    https://msdn.microsoft.com/en-us/library/azure/dn308588.aspx#bkmk_passwordsso

    On-Premises Applications

    Traditionally, on-premises applications integrate with a customer's Active Directory implementation, or some other well-known directory in the environment. Although customers don't have 100% of their applications integrated with Active Directory, they usually prefer to do it if the application supports it.

    Some of the key reasons a customer might decide to integrate an on-premises application with an Azure identity solution are to:

    • Make the application available to employees from outside of the network
    • Align heterogeneous applications under a single authentication framework, typically for single sign-on
    • Integrate applications with a cloud-based authorization model
    • Share applications with other organizations

    Regardless of the reason, there are two ways to integrate an on-premises application with Azure Active Directory:

    • Through identity federation
    • Through the Application Proxy

    Identity Federation

    Most of the time, when federating an on-premises application, the sign-in and sign-out portions of the application will have to be rewritten to handle one of the identity federation protocols discussed in the Microsoft Azure Identity section of this document.

    To accomplish this, the developer needs a library that implements the protocol. Federated identity libraries won't be available for every application platform, but most of the platforms have them available.

    For .NET 3.5 and .NET 4.0, we recommend you use the WS-Federation protocol support that is available in Windows Identity Foundation (WIF). For more information about WIF, please see 7 Hours of Recordings from the WIF Workshops.

    Recommended: For .NET 4.0 and .NET 3.5 applications, use Windows Identity Foundation with the WS-Federation protocol.

    Mandatory: We do not have any native WS-Federation libraries available for .NET 3.0 or earlier. Customers must update their applications to .NET 3.5 at a minimum to use the WIF API.

    For applications that use .NET 4.5, we recommend using the OWIN libraries for WS-Federation or OpenID Connect.

    Recommended: For .NET 4.5 applications, use the OWIN OpenID Connect module, if possible.

    For Java applications, there are two approaches. First, if the Java application is running on a J2EE server, such as Oracle WebLogic, it's possible that the J2EE server natively supports one or more identity federation protocols.

    If that's the case, we recommend that you use the native support of the web platform. We've found that most J2EE platforms support SAML 2.0 and, usually, a few additional protocols. If the Java application is running on Apache Tomcat, the protocol integration can be done with a servlet, such as the Oracle OpenSSO Fedlet.

    Recommended: For Java applications running on a J2EE server, use the native SAML 2.0 support that most platforms have. If the application is running on Apache Tomcat, use the Oracle OpenSSO Fedlet, or use a suitable servlet alternative with SAML 2.0 support.

    Azure AD Application Proxy

    If re-writing the application to support an identity federation protocol is not possible, the Azure AD Application Proxy is an alternative that you can use for on-premises applications. When integrating with the Azure AD Application Proxy, the application gets the following benefits:

    • The Application Proxy makes the app accessible over the Internet.
    • Application Proxy can pre-authenticate users against Azure AD.
    • Application Proxy can enforce Azure AD-based Multi-Factor Authentication for the application.
    • If integrating multiple applications with Azure AD, the Application Proxy can facilitate SSO to Azure AD for on-premises applications.

    When using the Azure AD Application Proxy, the application is still authenticating against the on-premises Active Directory—the application proxy does not change that. However, the application proxy is able to pre-authenticate users with Azure AD, and then facilitate SSO to the on-premises Active Directory by using a protocol transition technology called Kerberos constrained delegation.

    Because there is still a dependency on Active Directory, all users of the application must have an account in the customer's Active Directory forest. Because of this, Azure AD Application Proxy is most suitable for internal users who access on-premises applications over the Internet.

    IaaS Applications

    When applications are running in IaaS, they are actually running on standard web servers (such as IIS) on virtual machines in Azure. If you consider Azure IaaS as a second datacenter that extends a customer's on-premises network, applications running in Azure IaaS are going to have similar options to applications running on-premises. Therefore, the options outlined in previous sections should be considered.

    When deciding which integration approach to take, it's first important to understand the customer's motivation for wanting to move the application to IaaS.

    Implementing in IaaS for Physical Reasons

    If the customer is moving the application to IaaS for physical reasons, the application will likely behave much as it did when hosted on-premises. Some examples of physical reasons are:

    • On-premises datacenter is out of space
    • Desire to cut hardware costs
    • Datacenter modernization efforts under way
    • Provide better geographic redundancy to an existing application
    • Make the application available in a business continuity or disaster recovery scenario

    If any of these scenarios are the motivation for the customer to move the application to IaaS, the best approach likely would be to also extend the on-premises Active Directory forest into IaaS. This allows the application to leverage Active Directory without any modifications to the application or the identity management processes for the customer.

    Extending Active Directory to Azure IaaS does, however, require some additional engineering and planning, and it adds some administrative burden. Please see the previous section for more guidance about how to extend Active Directory into IaaS.

    Making the Application Available to Employees over the Internet

    Another potential motivation for customers is to move the application to the cloud so that they can provide access to the application for employees without VPN access. As the workforce becomes increasingly mobile, the motivation for employees to access internal applications from their personal devices is growing. To accomplish this goal, there are three potential solutions.

    Proxy an Active Directory-Integrated Application

    The Azure AD Application Proxy can be used to make an application on a private network accessible over the Internet. If the application is integrated with Active Directory, the customer should use Azure AD pre-authentication. Otherwise, a user on the Internet will have a poor sign-in experience.

    They will see the uninformative HTTP 401 credential dialog from the web browser instead of a web-based sign-in page. This may also prevent the user from getting an SSO experience, and it could cause the user's credentials to be stored in Windows Credential Manager on their personal computers.

    Recommended: If the application is Active Directory-integrated, enable pre-authentication at the Azure AD Application Proxy.

    In addition, the customer should deploy Active Directory into the Azure IaaS tenant with the application, as discussed in the previous section. Unless this happens, the application will not be able to authenticate users if the network link to the on-premises environment is down.

    Recommended: If the application is Active Directory-integrated, extend the Active Directory forest to the IaaS tenant. This ensures that the application can authenticate users over the Internet in the event that the S2S VPN connection is down.

    Proxy a Forms-Based Application

    The Azure AD Application Proxy can also proxy applications that do not integrate with Active Directory. It is expected that these applications have their own sign-in page, which authenticates users against some other type of credential store.

    If this is the case, the customer will likely want to integrate the application in pass-through mode (without pre-authentication). Otherwise, the user will get two password prompts—one for Azure AD and another for the application.

    Recommended: If the application uses forms-based authentication, proxy it in pass-through mode.

    If the customer wants to also achieve SSO for these applications, they can use the password vaulting technique discussed in previous sections.

    Recommended: To achieve SSO for forms-based applications, use password vaulting.

    Proxy a Claims-Based Application

    If the application already understands one of the identity federation protocols discussed in previous sections, Azure AD Application Proxy can proxy the application either with pre-authentication or in pass-through mode.

    If the application is published in pass-through mode, Azure AD will not interfere with the authentication process. Instead, the application walks the user through the process of performing federated authentication. Customers would likely want to use this option if the application is claims-based, but is not federated with Azure AD.

    Recommended: For claims-based applications that are federated with something other than Azure AD, proxy them in pass-through mode.

    If the application is federated with Azure AD, the customer should enable pre-authentication. Pre-authentication provides an additional layer of protection for the application, and in this case, the user experience would not be compromised as a result.

    Recommended: For claims-based applications that are federated with Azure AD, use pre-authentication.

    Making the Application Available to Partners

    If the customer wants to move an application to IaaS and make it available for partners to use, the solution becomes a bit more difficult.

    The Application Already Manages Partner Identities

    If the application already manages partner identities, it has an existing identity repository and authentication mechanism for partner users. In this case, the customer should simply publish the application with the Azure AD Application Proxy in pass-through mode: the application performs the authentication process end-to-end, and the application proxy is there to make the application accessible from outside the intranet.

    The Application Does Not Already Consume Partner Identities

    If the application only allows employee access, and the customer wants to now open it up to partners, the difficulty of the solution increases. In general, the application needs to be updated to support an identity federation protocol, as discussed previously. Depending on the complexity of the application, this may not be a minor undertaking.

    PaaS Applications

    In general, applications that are deployed in Azure PaaS do not have access to Active Directory. The PaaS platform does not support Integrated Windows Authentication, so integrating these applications with Active Directory is not possible. There are two ways to support such an application.

    Application Handles Identity

    The first approach is for the application to handle the identity layer entirely. This is typically not desired, because it places a significant burden on the developers and the application support team, and it should not be the first option.

    Recommended: Do not write the identity layer directly into a PaaS application.

    If a customer chooses to use this approach, password vaulting can be used to provide users with a SSO experience for these applications. The one requirement is that the application must use a sign-in form with a user name and password field.

    Recommended: If a customer builds the identity layer into the application, use password vaulting to provide users with an SSO experience.

    Mandatory: To integrate a PaaS application with password vaulting, the application must have a sign-in form with a user name and password field.

    Federation with Azure AD

    The second approach for the PaaS application is to integrate it directly with Azure AD. To perform this integration, the application needs to support federated identity and must integrate over one of the protocols discussed earlier in this document.

    By integrating with Azure AD, the PaaS application can leverage some of the identity service capabilities that Azure AD provides for developers.

    Authentication

    When a PaaS application is federated with Azure AD, the user is taken to the Azure AD sign-in page to authenticate. Azure AD can choose to sign in the user, or if the customer has integrated Azure AD with another identity provider (such as AD FS), the user can authenticate to an on-premises Identity Federation Service.

    The benefit to the application is that the developer doesn't have to be concerned with how the authentication process happens. Additionally, if the customer is using the Azure AD version of Azure Multi-Factor Authentication, Azure AD can perform MFA on behalf of the application.

    Authorization

    Another benefit for integrating PaaS applications with Azure AD is that user and group-based authorization can be performed. If a customer is synchronizing group memberships into Azure AD, those group memberships can be used to grant users access to the PaaS application. Otherwise, customers can manage the group memberships in Azure AD directly.

    Mandatory: The customer must have an Azure AD Basic or an Azure AD Premium license to use group-based access control.

    Identity Data

    One benefit that a PaaS application gets from Azure AD that it does not get from AD FS is access to the identity data repository. Applications can store identity data in Azure AD by using a RESTful interface called the Graph API. If an application needs to store data about users for out-of-band use, it can use Azure AD instead of having its own identity data store.

    Access Panel

    When the PaaS application is directly integrated with Azure AD, it can be displayed in the Azure AD Access Panel. The Access Panel is an application dashboard that serves as a single portal for all of the user's applications and self-service identity management capabilities.

    SaaS Applications

    There are two primary identity functions that are common to most SaaS applications:

    • Authenticating users
    • Provisioning user accounts into the application

    Authenticating to SaaS Applications

    When integrating SaaS applications with an Azure-based identity solution, the options are going to vary greatly, depending on what the SaaS provider supports. Because a SaaS application isn't owned by the organization that is using it, there isn't much flexibility in changing the application to support a particular integration approach.

    Therefore, a comprehensive enterprise approach for integrating SaaS applications is going to require a variety of integration techniques. There are two authentication methods, in particular, that Azure AD offers to get SaaS applications integrated with the enterprise:

    • Federation
    • Password vaulting

    Azure AD offers a catalog of pre-integrated applications, the Azure AD Application Gallery.

    Federation through Azure AD

    An increasing number of SaaS applications are supporting identity federation standards and SAML 2.0, in particular. The main benefit of integrating a SaaS application with Azure AD is that Azure AD can become the hub of application integration.

    Customers who use 100 applications would normally have to integrate each application individually with their on-premises identity system. However, Azure AD can integrate with the 100 applications so that the on-premises systems don't have to. There only needs to be a single integration between the on-premises systems and Azure AD, which many customers already have in place through Office 365.

    A federated identity trust also provides users with a first class experience when signing in to SaaS applications. If the customer uses AD FS with Azure AD, the user gets a seamless desktop SSO experience to the application when accessing the application from the intranet.

    The experience for a user who is signed in to the AD domain on the intranet would resemble the following:

  1. The user browses to the web application.
  2. The user clicks the sign-in button at the application to initiate sign in.
  3. The user enters her email address in the user name field of the application.
  4. The user is redirected to Azure AD.
  5. Azure AD redirects the user to AD FS on-premises.
  6. AD FS automatically signs in the user via Kerberos protocol.
  7. AD FS provides a security token to Azure AD.
  8. Azure AD provides a security token to the application.
  9. The user is successfully signed in without having entered a credential.

It's important to note that not every SaaS application offers an option to perform a federated identity trust. Therefore, only a subset of the SaaS applications that are pre-integrated with Azure AD use this approach.

When integrating SaaS applications with Azure AD, it's important to understand whether Azure AD integrates with the application through identity federation. If not, the integration uses the password vaulting approach described in the Microsoft Azure Identity section, which has a very different user experience.

Password Vaulting

Password vaulting is an alternative approach for SaaS applications that don't support identity federation. Almost every application has a sign-in form, where the users enter their credentials and click the Sign In button.

Azure AD is able to securely store the password that users have for these applications. When the user launches the application from the Azure Access Panel, a browser plug-in securely retrieves the password from Azure AD, fills in the sign-in form, and clicks the Sign In button for the user. The experience resembles the following:

  1. The user browses to the Azure Access Panel (https://myapps.microsoft.com).
  2. The user signs in with an Azure AD account or an AD FS account (desktop SSO is possible).
  3. The user selects the application to use in the Access Panel.
  4. The application opens in a separate browser tab.
  5. The user momentarily sees the sign-in page, while the browser plug-in populates the credentials and signs in the user.
  6. The user is signed in to the application.

One main limitation of this approach that customers should be aware of is that the application must be launched from the Access Panel. If the user browses to the application first and clicks the Sign In button at the application, the browser plug-in will not step in to sign in the user.

Provisioning Identities to SaaS Applications

In addition to authentication, most SaaS applications require that each user has an account in their identity repository. Even if the application is fully federated, there is often a requirement for an account so that a subscription can be associated with an individual user.

Any customer integrating with a SaaS application should be prepared to populate the SaaS provider with user accounts for the people using the application. There are two ways to approach this:

  • Manually
  • Automated synchronization

Manual Account Provisioning

Most SaaS providers offer a web interface that can be used to create user accounts and assign them roles, subscriptions, and attributes. This can be a time-consuming process, but smaller customer organizations might not be averse to using this approach.

One drawback with the manual method is that many organizations don't off-board employees or contractors from all of their applications as employees leave the organization. There's a lot of room for mistakes when manually managing accounts in the SaaS provider, because it's not likely that administrators will remember to delete the accounts for former contractors and employees in the SaaS applications. This not only costs more money in terms of subscription fees, but it is also a big security problem.

Account Synchronization

With account synchronization, an automated process copies on-premises identities into the SaaS provider's system. Some SaaS providers give their customers a synchronization tool (as Azure AD does). Others simply provide an API and ask customers to use their own synchronization tool (such as Microsoft Identity Manager) to interface with that API.

When a customer uses Azure AD to integrate with SaaS providers, this process can be greatly simplified. Because Azure AD already has a copy of all the user accounts for a customer, it can synchronize those accounts into the SaaS app on behalf of the customer. In this way, Azure AD becomes not only the hub of identity federation, but also the hub of application provisioning.

Recommended: The provisioning capability is only present for a small subset of the SaaS applications that integrate with Azure AD. When discussing this with customers, it's prudent to check the Azure AD Application Gallery to determine if the application supports provisioning from Azure AD.

If account provisioning through Azure AD isn't available for a SaaS application that a customer wants to integrate with, the customer needs to provision those accounts in an out-of-band process on their own.

Azure Hybrid Identity for CSPs

Azure AD plays a critical role for CSPs, providing the identity control plane for access to Azure management portals and APIs. This allows authorized CSP agents to provision, manage, and support tenant Azure subscriptions.

Provider Azure Active Directory Tenant

Each CSP has a provider-specific Azure AD tenant (Provider Tenant) created automatically upon registration in the Microsoft Partner Center Portal. The Provider Tenant uses the standard Azure AD domain naming convention of <tenantname>.onmicrosoft.com as the default domain name. Additional custom domains may be added to the tenant to improve user sign-on experiences. The Provider Tenant can be managed through the Partner Center Portal as shown below or through standard Azure AD administration interfaces.

The Provider Tenant stores user identities for CSP staff and is not intended to store customer user identities. CSP users authenticate against the Provider Tenant to perform administrative functions through management tools and APIs.

Provider Directory Integration

The Provider Tenant can be integrated with a provider's on-premises Active Directory to enable single sign-on or simplified sign-on, self-service password reset with on-premises write-back, and other Azure hybrid identity capabilities for provider users. Directory integration for the Provider Tenant may include a combination of directory synchronization, identity federation, and password synchronization. This is achieved using Azure AD Connect between the provider's on-premises Active Directory and the Provider Tenant.

Additional Considerations

The Provider Tenant has the same capabilities as a standard Azure AD tenant. As such, the identity principles described throughout this document apply to the Provider Tenant. This includes integration of an on-premises CSP Active Directory with the Provider Tenant, multi-factor authentication, reporting, monitoring, and more.

Provider Agent and Admin Roles

A role-based security model permits degrees of access and control over CSP customers and over the Provider Tenant itself. A CSP user's role can fall into one of the following general categories:

  • Agent (supporting customers)
  • Admin (supporting the provider)

A CSP user can also be both an agent and an admin, but this role assignment is not recommended for security reasons.

Provider Agent Roles

Agent roles are intended for CSP staff who need to perform management functions over a customer's Azure subscription. Management functions include technical capabilities such as delegated administration as well as non-technical capabilities such as billing and subscription management.

Role in Partner Center: Admin agent

What they can do:

  • Customer management
  • Subscription management
  • Service health and service requests for customers
  • Request delegated admin privileges
  • View pricing and offers
  • Billing
  • Admin on behalf of
  • Register a value added reseller

What they can't do:

  • User management
  • Service requests for Partner Center

Role in Partner Center: Sales agent

What they can do:

  • Customer management
  • Subscription management
  • View support tickets
  • Request a relationship
  • Manage customer leads
  • View the customer agreement
  • Register a value added reseller

What they can't do:

  • Create support tickets for services or Partner Center
  • Resolve support tickets
  • View service health
  • View pricing and offers
  • Billing
  • Admin on behalf of

Role in Partner Center: Helpdesk agent

What they can do:

  • Search for and view a customer
  • Edit customer details
  • Service health and service requests
  • Admin on behalf of

What they can't do:

  • View partner profiles
  • Create a new customer listing
  • Edit customer billing info
  • Subscription management
  • Request a relationship
  • Manage customer leads
  • View pricing and offers
  • View the customer agreement
  • Billing
  • Register a value added reseller

Provider Admin Roles

Admin roles are intended for CSP staff who perform management functions over the CSP's Provider Tenant. These roles mirror the standard Azure subscription administration roles.

Role in Partner Center: Global admin

What they can do:

  • Can access all Microsoft account/services with full privileges
  • Create support tickets for Partner Center
  • View agreements, price lists, and offers
  • Billing
  • View, create, and manage partner users

Role in Partner Center: Billing admin

What they can do:

  • Can access all bills from Microsoft with full privileges
  • View agreements, price lists, and offers
  • Billing

Role in Partner Center: User management admin

What they can do:

  • View, create, and manage users
  • View all partner profiles

Customer Azure Active Directory

A customer Azure AD tenant (Customer Tenant) is created when the customer is provisioned through the Partner Center or through the CREST API. When a new Azure CSP subscription is ordered for the customer, the provider's AdminAgents group is configured as the owner of the subscription. The customer has no rights to manage the CSP subscription unless specifically granted by the provider.

Administered On Behalf of (AOBO) and Tenant Management

Administered On Behalf of (AOBO) is an administrative construct to allow delegation of administration to internal or external entities. Office 365 and Intune honor the Azure AD Tenant Admin role. The AdminAgents group in the Provider Tenant is made a member of the Tenant Admin role in the Customer Tenant. This allows members of the AdminAgents group in the provider's Azure AD tenant to perform administrative functions in a customer's Office 365 and Intune instances.

In a similar fashion to Office 365, the Azure CSP subscription is provisioned into the Customer Tenant. However, Azure subscriptions have "subscription-scoped" boundaries and are not managed by Tenant Admins. Instead, the provider AdminAgents group is directly granted the Owner role in Azure CSP subscriptions. This allows members of the AdminAgents group in the Provider Tenant to perform administrative functions in a customer's Azure CSP subscription.

Management tools vary depending on the service being managed. A link to management tools for each customer can be accessed from the Partner Center on the main customer page and in the Service Management section for each customer.

At the time of this writing, management of a customer's Azure AD is not supported directly through the Partner Center portal and links. The following URL format can be used as a workaround for Azure AD management until the Partner Center Azure Active Directory management tool links are operational.

Azure AD Management URL Format:

https://manage.windowsazure.com/[provider tenant name]#Workspaces/
ActiveDirectoryExtension/Directory/[tenant ID]/directoryQuickStart

Azure AD Management URL Example:

https://manage.windowsazure.com/FabrikamCSP.onmicrosoft.com#Workspaces/
ActiveDirectoryExtension/Directory/3477e0d3-fecd-4586-9917-d34807715a6f/
directoryQuickStart

The Customer Tenant ID can be found in the Partner Center as the customer's Microsoft ID.

Customer Azure Active Directory Tenant

Each CSP customer has a customer-specific Azure AD tenant (Customer Tenant) created automatically when the CSP provisions the customer in the Microsoft Partner Center Portal or through the CREST API. The Customer Tenant uses the standard Azure AD domain naming convention of <tenantname>.onmicrosoft.com as the default domain name, and additional custom domains may be added to the tenant to improve user sign-on experiences.

The Customer Tenant contains identities used by the customer to manage CSP subscription resources (if authorized by the CSP) and to authenticate to services and applications within the CSP subscription. The Customer Tenant is an Azure Active Directory tenant and all guidance found earlier in this document applies to the Customer Tenant.

Service Providers connect their customers to their CSP Azure subscriptions in one of two ways – Connect To and Connect Through. In both models, customers may use their Customer Tenant to store independent identities with separate credentials. They may also desire identity integration between the Customer Tenant and an Active Directory identity store. The identity integration approach differs slightly between the two models.

"Connect Through" Identity Integration

In the Connect Through model, the Provider creates a direct connection between the provider datacenter and the provisioned customer Azure subscription using Site-to-Site VPN over the provider's network. This connectivity scenario requires that the customer pass through a provider network to access CSP provisioned Azure subscription services, using a network connection that is created, owned, and managed by the service provider.

For Connect Through customers, it is assumed that the provider has a previously-established Active Directory tenant identity store for each tenant. This identity store should be integrated with the Customer Tenant using Azure AD Connect. Guidance on integrating on-premises Active Directory with Azure Active Directory can be found earlier in this document. This guidance should be consulted to determine if directory integration is desirable for the Connect Through customer, and if so, how best to implement the integration. For the purposes of integration, the provider-hosted tenant identity store for a Connect Through customer should be considered the customer's on-premises Active Directory.

"Connect To" Identity Integration

In the Connect To model, the provider creates a direct connection between the customer datacenter and the provisioned customer Azure subscription by using Site-to-Site VPN (or, in the future, an ExpressRoute connection) over the customer's network. This connectivity scenario requires that the customer connect directly through a customer network to access CSP provisioned Azure subscription services, using a direct network connection that is created, owned, and managed either wholly or in part by the customer.

For Connect To customers, it is assumed that the provider does not currently have an Active Directory tenant identity store established for the customer. If the customer has an existing Active Directory, the provider may consider integrating the customer's Active Directory environment with the Customer Tenant for management of the customer's CSP subscription or for user authentication to CSP-provided services. Some customers may choose not to implement this integration and will instead use the Customer Tenant to store independent identities with separate credentials. Guidance on integrating on-premises Active Directory with Azure Active Directory can be found earlier in this document. This guidance should be consulted to determine if directory integration is desirable for the Connect To customer, and if so, how best to implement the integration.

Integration using Graph API

The Azure Active Directory Graph API provides programmatic access to Azure AD through REST API endpoints. Applications can use the Graph API to perform create, read, update, and delete (CRUD) operations on directory data and objects. This enables CSP partners to programmatically access Azure Active Directory to automate many end-user management functions, including user license assignment, user role assignment, and managing domains. Programmatic access using Graph and other APIs allows for comprehensive automation of management activities and the ability for CSPs to integrate Partner Center functionality into custom applications and management tools for use by CSP agents or customers.

CSP management activities span several interfaces: the Partner Center, the Azure portal, the CREST API, the Graph API, and Windows PowerShell. The activities include:

  • Manage Reseller Users
  • Manage Customer Users
  • Manage Customers
  • Manage Customer Profiles
  • Manage Orders
  • Manage Subscriptions
  • Manage Entitlements
  • Manage Event Streams
  • Perform Admin Tasks
  • View Service Health
  • View Service Requests

Reseller and customer user management is performed in the existing Azure portal (https://manage.windowsazure.com). Admin tasks, service health, and service requests surface in the new preview Azure portal (https://portal.azure.com), and admin tasks can also be performed through Windows PowerShell and other Azure management interfaces, such as the Azure CLI and the Azure REST API.

The Graph API for CSP partners is provided to allow for the automation of end-user management functions. Note that the Graph API for CSP partners is similar to the existing Azure Active Directory Graph API, but it adds the ability to be used by partners who have a delegated admin relationship with their customers.

Application Registration

Before you can call the Graph API from an application, the application must be registered with Azure AD. Registration generates a security key for the application to use when accessing the Graph API and allows you to grant specific permissions to the application. Application registration can be done from the Azure portal. For more details on application registration, please see Integrating Applications with Azure Active Directory.

Multi-Tenant Access

Within the Azure AD application configuration, you may decide whether your application allows sign-in from other Azure AD tenants by selecting the multi-tenant option. In contrast to a single-tenant application, external users will need to consent to your application before it can query their directory with the Graph API.

Application Permissions

By default, applications can authenticate users in the Provider Tenant and read the authenticated user's profile. This does not permit access to the Graph API for reading or writing other directory objects. To enable that access, you must grant the application the appropriate permissions (Read directory data, or Read and write directory data) for Windows Azure Active Directory.

Acquiring a Security Token

Acquiring a security token from Azure Active Directory is the first step an application takes before making calls into Graph API. The security token is acquired once at the start of a session and can be used in multiple subsequent calls to Graph API.

Two pieces of information are needed for an application to obtain a security token:

  • Client ID
  • API key

Both values are obtained from the application's configuration page within the Azure AD management portal. The client ID can be read at any time; keys can be generated and are displayed only once. After that, a key remains active but can no longer be viewed.

The following example shows how to obtain a security token from Azure Active Directory. Replace the client ID value with your application's client ID and replace the client secret value with a valid API key generated for your application.

POST
https://login.windows.net/contoso.com/oauth2/token?api-version=1.0

HEADERS
Content-Type: application/x-www-form-urlencoded

BODY
grant_type=client_credentials&resource=https%3a%2f%2fgraph.windows.net&client_id=52752c8e-d73c-4f9a-a0f9-2d75607ecb8e&client_secret=qKDjII5%2FK8WyKj6sRo5a5vD6%2Bm74uk1A%2BpIlM%3D

RESPONSE: 200 OK

The security token will be returned to the calling application if all values are valid.
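
The response body is a JSON object resembling the following hedged sketch (values truncated); the access_token value is what the application presents on subsequent Graph API calls:

{
  "token_type": "Bearer",
  "expires_in": "3599",
  "expires_on": "1440003600",
  "resource": "https://graph.windows.net",
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs..."
}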

Getting User Information

After obtaining an Azure AD security token, the application can call the Graph API to read and write information in Azure AD.

The following example shows how to get user objects through the Graph API. Replace the Authorization: Bearer value with the security token returned from the prior call to Azure AD.

GET
https://graph.windows.net/contoso.com/users?api-version=2013-11-08

HEADERS
Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik5HVEZ2ZEstZnl0aEV1T….
Content-type: application/json;odata=minimalmetadata

RESPONSE: 200 OK

User objects in JSON format will be returned if the request is successful.
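
A successful response resembles the following hedged sketch (truncated to a single user with a few representative properties):

{
  "odata.metadata": "https://graph.windows.net/contoso.com/$metadata#directoryObjects/Microsoft.DirectoryServices.User",
  "value": [
    {
      "objectId": "00000000-0000-0000-0000-000000000001",
      "displayName": "Jane Doe",
      "userPrincipalName": "jane@contoso.com"
    }
  ]
}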

Azure AD PowerShell for Tenant Management

Azure AD PowerShell can be used to manage objects in Azure AD, including objects in both the Provider Tenant and Customer Tenants. When managing objects in the Customer Tenant using an Azure Active Directory account from the Provider Tenant, you must provide some additional information to set the proper context for commands. By default, Azure AD PowerShell commands will operate against the Azure AD tenant of the authenticated user.

Before managing a Customer Tenant from PowerShell, first obtain the Customer Tenant ID. This can be found in the Partner Center as the customer's Microsoft ID.

The Tenant ID is then used to target PowerShell operations to the appropriate Azure AD tenant instead of the default tenant for the authenticated user. The Tenant ID can be provided as a parameter to most Azure AD PowerShell commands. For example:

  1. To read a credential interactively:

    $msolcred = get-credential

  2. To connect to Azure AD with the specified credential:

    connect-msolservice -credential $msolcred

  3. To read all users in Azure AD, specifying the Customer Tenant ID to target the operation to the Customer Tenant:

    get-msoluser -All -TenantId 120355bc-ba8c-4207-b2db-80b2cb893a54

For more information on Azure AD PowerShell, please see Manage Azure AD using Windows PowerShell.

Microsoft Azure Security

Microsoft Azure Fabric and Datacenter Security

The cloud-enabled world is full of determined adversaries who constantly conduct targeted cyber-attacks. This threat environment is described in a Microsoft whitepaper entitled Determined Adversaries and Targeted Attacks. These cybersecurity threats make securing a workload more challenging, because these adversaries are adept at exploiting vulnerabilities in any layer of the stack.

An organization can leverage a trustworthy public cloud provider, such as Microsoft, to shift some security burden for lower-level components to the provider, so that they can focus their security resources on higher-layer and higher-value security operations. To help illustrate the challenges faced by most organizations in this space, Microsoft has published a Cloud Security Framework that maps customer responsibilities for security in a cloud enabled world.

Moving the first enterprise workload to a public cloud service, such as Microsoft Azure, represents a significant change for several aspects of how IT services will be provided and secured by the organization.

The most significant change for the security organization is likely learning how to establish, validate, and maintain trust for the cloud platform provider. Although this document covers security topics for storage, network, compute, and identity throughout each topic area, this section will attempt to summarize the security strategies and approaches most customers can take when planning to adopt Microsoft Azure.

Azure Security Strategies

To facilitate rapid onboarding and the realization of security benefits that come from hosting workloads in the cloud, the security strategy in this reference architecture includes two target security levels:

  • Rapid Azure onboarding security (start with "Do No Harm")
  • Advanced security strategy for Azure

Some key technical security and risk changes faced by organizations that transition from a traditional on-premises workload hosted on physical or virtual hardware to a hosted public cloud model include:

  • Availability of administrative interfaces to the public Internet
  • Potential inadvertent exposure of intranet services to the public Internet
  • Different capabilities for detective and protective controls

As organizations evaluate their security posture in the face of these changes, they need to decide what risks must be addressed as part of the Azure onboarding planning, which risks should be addressed independently, and which can be addressed after onboarding.

The primary sources for new or increased risk when moving a workload to the cloud are:

  • Availability of administrative interfaces to the public Internet
  • Potential inadvertent exposure of intranet services to the public Internet (IaaS workloads)

The following sections provide an overview of these risks, their considerations, and potential mitigations.

Risk 1: Availability of Administrative Interfaces to the Public Internet

Microsoft Azure Infrastructure-as-a-Service (IaaS) is analogous to an on-premises virtualization solution in many ways. On-premises and public cloud IaaS solutions provide similar services, and at the platform level their operators effectively have full technical control of all hosted workloads, including data, applications, operating systems, associated secrets, and virtual devices.

The key security difference of an IaaS solution hosted in Azure is that the administrative interfaces are potentially accessible through public Internet interfaces. A typical on-premises virtualization deployment may be restricted to only a corporate Intranet or to a dedicated management VLAN.

This change of scope potentially increases the exposure of a given system to attack, and therefore the likelihood of an attack on the system. This risk exposure is always present for Azure IaaS and PaaS solutions, but it may or may not be present given the current state of a workload.

Due to this consideration, additional measures must be taken to reduce this risk and protect against the accidental exposure of administrative controls to external networks.

Risk 2: Potential Inadvertent Exposure of Intranet Services to Public Internet

This risk is specific to publicly hosted IaaS workloads that are transitioned to Azure by using a "lift and shift" migration approach. Typically, these systems have internal controls that were tuned for ease of use and ease of management rather than for exposure to external threats, given their previous security posture.

If not configured correctly during post-migration, a virtual machine may accept traffic from the Internet for remote management, including Remote Desktop (TCP 3389) and Windows PowerShell (TCP 5986). This potentially allows attackers to attempt to connect and authenticate to these resources, enabling them to abuse or test stolen credentials and attempt dictionary attacks or brute force attacks on the virtual machine passwords.

This is a "net new" risk to the workload when migrating to Azure because most on-premises virtual machines and physical servers are protected from unsolicited network traffic by Internet firewalls and other security appliances.

Mitigating threats such as these is critical to any migration activity, and it should be included in the post-migration tasks when transitioning workloads from on-premises to an Azure subscription.

Rapid Azure Onboarding Security: Start with "Do No Harm"

Organizations may need to rapidly adopt Azure services because of business priorities, budgetary reasons, security incidents, project schedules, or other reasons specific to the organization. When planning security for rapidly onboarding to Azure, organizations should take a "do no harm" approach to ensure that security risks are not increased when compared to the current on-premises posture for the organization.

These "do no harm" measures should be implemented during or prior to the onboarding process. They include the implementation of compensating controls for all new or increased risks created by moving a workload to the public cloud.

This approach is commonly used when moving Active Directory Domain Services (AD DS) domain controllers to Azure, given the sensitivity of the data, secrets, and password hashes stored there.

Rapid Mitigation Strategy for Risk 1

The risk created by the availability of administrative interfaces to the public Internet should be mitigated through the combination of protections for the authorized administrative accounts and the configurable tenant administrative interfaces. The following list describes these protections:

  • Administrative standards and practices - Establish standards for administration that are compliant with the rules in this section of this document.
  • Multi-factor authentication - Ensure all administrative accounts leverage multi-factor authentication when accessing Azure subscriptions.
  • Hardened administrative workstations - Ensure all subscription administrators with privileges over "Tier 0" assets are using hardened administrative workstations. Guidance for building hardened administrative workstations is available in Microsoft's published documentation.
  • Administrative model - Document and validate the organization's intended Azure AD administrative model, including:
    • Administrative roles (including Tier 0 roles at a minimum)
    • Determining which roles will have Tier 0 rights (explicitly and through control of Azure assets, such as domain controllers, through subscription administration)
    • Criteria of who can fulfill and manage those roles
    • Determining which standards and practices the roles will follow
  • Logon restrictions - Restrict administrative activity to only authorized administrative workstations.
  • Administrator education - Educate administrators with subscription administration privileges on the following:
    • Current threat environment - Organizations can use resources published by Microsoft to help create this content, but they should tailor it to their needs.
    • Administrative practices and standards
    • Log-on practices and administrative workstations:
      • Know what tasks to perform on an admin workstation vs. a standard user desktop
      • Never log on to a user's desktop with administrative credentials
      • Know what logs to check every day for anomalous activity
  • Logging - Ensure security personnel are monitoring Azure AD administrative activity.
  • Response and recovery processes - Add guidance for the Azure portal and resources to the existing response and recovery framework. If a framework or processes do not exist, create them and include Azure guidance.

Rapid Mitigation Strategy for Risk 2

The risk created by the potential inadvertent exposure of intranet services to the public Internet should be mitigated by configuring the network security protections and following the guidance found within this document.

Security Compliance and Accreditation

Microsoft Azure has a strong commitment to security and compliance. Several international, industry, and regional organizations have independently certified that Microsoft cloud services, including the Microsoft Azure platform, meet rigorous security standards and are trustworthy.

By providing customers with compliant, independently verified cloud services, Microsoft also makes it easier for organizations to achieve compliance for infrastructure and applications deployed within the service. Organizations that use the Azure platform for workloads that require compliance validation or certification should:

  1. Validate the components controlled by their organization per any regulatory requirements.
  2. Reference the current compliance state of the Azure platform and components.

Because certification status can be updated frequently, it is crucial that organizations regularly review the Office 365 and Microsoft Azure Trust Centers for a complete list of security certifications and more information.

Feature References

Office 365 Trust Center

http://products.office.com/en-us/business/office-365-trust-center-cloud-computing-security#welcome

Azure Trust Center

http://azure.microsoft.com/en-us/support/trust-center/

To establish a full security strategy when implementing solutions in or consuming services from Microsoft Azure, the following considerations should apply:

Mandatory:

  • Establish the "Do No Harm" controls outlined in the preceding section.
  • Follow each of the customer responsibilities for security in a cloud-enabled world outlined here: Microsoft cloud IT architecture resources.
  • Integrate the security guidance provided throughout this document into your organization's Microsoft Azure security strategy.
  • Educate the organization's security staff with knowledge of Azure capabilities and security features.

Recommended: Integrate Azure logs with current security tools, such as Security Information and Event Management (SIEM).

To establish security operations for Microsoft Azure through administrative controls, practices, and procedures, the following considerations apply:

Mandatory:

  • All personnel with subscription admin privileges have separate, dedicated administrative accounts. These accounts are used only for administrative tasks, and the corresponding standard user accounts have no administrative privileges.
  • All subscription admins utilize Azure Multi-Factor Authentication except for one emergency access account, if used.
  • Subscription admin accounts are used to log on only to authorized administrative workstations and are never used on any other workstation.
  • Standard user accounts associated with administrative personnel may be used anywhere standard user accounts are permitted to log on.
  • Administrative workstations are not used for email or general web browsing; they are used only for administration of Azure and, as appropriate, on-premises resources. Administrative personnel can use virtualization (such as Hyper-V) or Remote Desktop to access an alternate operating system to perform standard user tasks such as web browsing and email.
  • Subscription admins should check administrative logs for anomalies each day at a minimum, ideally every time they log on.

Recommended:

  • All subscription admin accounts should be restricted by AD FS policy to be authenticated only from authorized administrative workstations.
  • An emergency access account should be created, and the password for the account should be physically secured. The password should be changed on each use, and the account should be configured for Azure Multi-Factor Authentication. If this account is not created, emergency access will be provided through Azure support processes.

Containment and Segmentation Strategy

With the advent of cloud computing and capabilities like Microsoft Azure, many organizations are fundamentally reinventing how they acquire, use, and manage IT services. This shift presents an opportunity to take a fresh look at how core security elements are provided and how to improve those in the new designs for this age.

The strategies in this section apply to all enterprise resources, including those hosted on Azure and on-premises. It is critical to include the recommendations in this section early in the planning and implementation for Azure to prevent the cost of retrofitting them later.

This section describes considerations for designing security containment and segmentation strategies in an enterprise IT environment. They are shaped by:

  • The availability of new cloud services and capabilities spanning basic Infrastructure as a Service (IaaS) to advanced machine-learning technologies
  • A threat environment where attackers have proven adept at exploiting any type of weakness (functional, configuration, operations) at any layer of the enterprise IT stack (strategy, data, identity, application, middleware, operating system, network, physical).

As organizations move workloads to the cloud, they must address threats in new ways and shed legacy security practices that often have proven to be ineffective and burdensome. In some cases, extending to the cloud provides an opportunity to implement security controls and contain adversaries in ways that are more challenging to accomplish in existing on-premises environments. Although containment strategies are not new, the traditional network-centric approach has failed in several ways and needs to be updated.

This section defines the following terminology:

  • Containment strategy - High-level strategic approach designed to limit the risk and scope of any given compromise
  • Segmentation strategy - Component of the containment strategy that separates computing assets into security zones that reflect significantly different asset valuation, trust levels, and/or risk exposure profiles
  • Security zone - Set of computing assets with a common asset valuation, trust level, and/or risk exposure profile.

The notions of containment and segmentation have been around for a long time in IT security, though the interpretations of how to implement them have varied in practice. This document starts with an assume breach mindset and calls for designing security controls to prevent propagation of breaches among enterprise assets.

This requires architects and system designers to look at what a breached system or compromised account means to the environment so as to limit the impact of that breach, to make it detectable, and to enable the organization to respond.

This assume breach approach complements the traditional perimeter approach focused on preventing breaches for a combined approach that results in a more resilient strategy.

Traditional Approaches and Challenges

Traditionally, many IT security organizations have built segmentation and containment strategies primarily by using firewalls that filter IP traffic by protocol and port rules. These designs typically include a production intranet, an extranet (sometimes called a demilitarized zone, or DMZ), and sometimes additional segment isolation within or outside of production using firewalls.

Although many elements of these designs are valid, there are fundamental shortcomings with this approach that have led to failure of these strategies in the industry:

  • The trust level of a network segment is defined by the lowest trust device connected to it. The effective trust level of the production network and security value of the perimeter around it has been significantly diminished by:
    • The ability of attackers to easily compromise standard user workstations with targeted malware.
    • The increased business need to make internal systems available to third parties.
    • The proliferation of unmanaged and partially managed devices on production through an increasingly mobile workforce.
  • Network layer security relies on the IP address as a proxy for identity and a TCP/UDP port as a proxy for applications. These properties are not reliable in a cloud-enabled world that heavily leverages dynamic IP addresses. Attackers and application/device makers alike have been successful at using whatever ports happen to be available to transport traffic for many different applications and protocols.
    • Note that this document refers to security controls and devices as operating at the layer that they defend. For example, detonation chamber solutions like Microsoft's Exchange Online Advanced Threat Protection or FireEye Email Security products are considered application protections because they protect email applications from malicious data payloads, even though they are connected to "the network."
  • Firewall and IP filtering rules and exceptions are typically high in volume and complexity, making risk difficult to assess. Network traffic filtering devices are commonly placed at unnatural points in the network, such as within a production forest with a deny-by-default rule. This creates a requirement for a substantial number of exceptions as security personnel attempt to retain some semblance of least privilege, but it results in:
    • A set of firewall exceptions that are:
      • Extremely permissive, undermining the original goal.
      • Generally incomprehensible and difficult to understand and assess from a risk perspective.
    • Additional IT operational overhead.
    • A constant point of friction between IT operations and security personnel, damaging collaboration and security effectiveness.

The net result of most of these designs is a failed strategy that is difficult to implement, costly to the organization, and repeatedly proven to be easily evaded by attackers and penetration testers.

The recommendations in this document intend to overcome these challenges by setting forth guidelines that allow effective segmentation to bring the greatest security value (blocking and detecting attackers) and minimize the operational overhead.

Building a Containment Strategy

A containment strategy should focus on:

  • Containing the propagation of risk at all feasible levels
  • Prioritizing operational effectiveness and protection of assets by business impact

The strategy should also consider the lifecycle of the asset. Many organizations will develop, test, and train within the Azure environment. Each of these lifecycle stages potentially has different risk levels, asset valuations, and personnel responsible for administering the assets.

Mandatory:

  • A containment strategy should include the following elements:
    • Security zones
    • Isolation between security zones
    • Containment within a security zone
    • Containment within a host (between users and administrators)
    • Containment within an application (as applicable)
  • A containment strategy should minimize the number of zones. It is taxing on the organization to set up a security zone correctly because each additional zone requires:
    • Establishing a zone infrastructure (identity systems and so on) and associated bounded roles
    • Securing the inter-zone trust boundary including the network and security infrastructure. This also includes threat modelling any application or data flows across that trust boundary.

Organizations should consider the following list of examples as the maximum set of zone types that should be created (note that some are not applicable to infrastructures hosted in Azure).

  • Production - Environment to host collaboration, office automation, and line-of-business applications
  • Extranet - Environment to host applications interacting directly with Internet endpoints
  • Non-production - Lab environment for development and validation testing
  • Fabric environment - Environment for private clouds and out-of-band infrastructure management
  • Legacy isolation - Environment for assets that cannot be secured or updated, such as mission-critical computers running Windows XP, Windows Server 2003, and earlier operating systems
  • High-value asset - Environment for specialized research and collaboration systems, or for Very Important Personnel (VIP) or executive collaboration
  • Regulated functions - Environment for financial (such as PCI or ATM) or medical/health/privacy (such as HIPAA) assets
  • Physical control and machinery zones - Environment for Process Control Networks (PCN) or Supervisory Control and Data Acquisition (SCADA) devices

Isolation Between Security Zones: Security, Complexity, and Cost Tradeoffs

Containment between security zones needs to isolate groups of assets that have significantly different business and mission values, risk exposure, and/or trust level. The containment measures must be aggressive, and exception management should always include deep analysis and consideration of risks and mitigations.

Every exception and mitigation should consider controls at all layers (such as identity, application, data, and network)—not only network ports and protocols. The design of containment between zones should be holistic and include controls at all layers to provide:

  • Preventive controls
  • Alerts for known bad and anomalous events
  • Logging to enable investigations and pattern matching
  • Response processes (and recovery plans) that reflect the context and priority of the security/trust boundary

In its simplest form, a security zone is a fully separate IT environment with an independent infrastructure of services, including identity, management, and monitoring. Although the same personnel may manage multiple zones, each shared service that interacts with multiple zones creates a technical means of control and a security risk.

Isolation between zones provides excellent security benefits by making it very difficult for an attacker to cross the trust boundary. However, each additional zone also adds overhead for implementation, maintenance, and daily operation that may create cost and security assurance degradation.

Each security zone has to be defended and monitored, so each additional zone brings equipment and configuration to maintain, may add steps to regular tasks performed across the boundary, and adds complexity that is hard to sustain because personnel inevitably change. Additionally, complexity creates challenges with measuring compliance, making it difficult to be confident that the security assurances are working as expected.

For these reasons, it is important to choose the simplest strategy that meets your business and mission protection goals. This security zones and containment strategy should ultimately span across the on-premises and the Azure-hosted assets.

Most environments will require at least two security zones, production and extranet, to host internal corporate and public-facing assets, as illustrated in the following image. Any additional security zone considerations should take into account the increased cost over time to maintain the environment. As the zone count and complexity increase, the cost and likelihood of security degradation over time also increase.

You should only create a security zone if you are prepared to maintain and defend that trust boundary with the same rigor that you would expect a software vendor to create a security update for a vulnerability—working constantly until the risk is mitigated.

Figure 5 - Examples of Security Zones

Some asset types almost always warrant a dedicated separate zone. If the organization owns and manages the following types of assets, it is strongly recommended to create a separate security zone to protect them from the risk of attack by a compromised production or extranet asset.

  • Payment Card Industry (PCI)-regulated data and assets, such as automated teller machines (ATMs) and point-of-sale (POS) devices
  • Process Control Networks (PCN) or Supervisory Control and Data Acquisition (SCADA) devices that provide control of physical devices that could cause loss of life or major financial damage (such as manufacturing equipment, power generation and distribution, and medical devices)

Some asset types may warrant a separate security zone depending on the asset valuation, trust levels, and/or risk exposure profiles in the context of the organization. These are some examples of when organizations might choose or circumstances might dictate a different strategy for the same type of asset:

  • Non-production lab environment for development and testing: This lab can be placed within a production zone or in a separate zone, depending on the relative value and trust of the assets. For example, a financial institution may have a website that handles sensitive financial information. Consider the following scenarios when making your decisions:
    • Separate Zone: The organization has chosen to outsource general website development, but maintains a regimented security validation and quality control process in-house. The developers are not trusted or authorized to access live sites, so this organization would likely place these development and validation assets in a dedicated zone that is separate from production or extranet.
    • Within Existing Zone: The organization has found that development of analysis components must use highly regulated customer data to be effective. In this case, the security controls and the trust of the employees doing the development are similar to the production instance, and the components can reside within the same security zone, protected by administrative controls.
  • Personal Health Information (PHI): This is highly regulated information that must be protected at a high level to maintain privacy. Consider the following scenarios when making your decisions:
    • Separate Zone: A retail organization maintains a pharmacy with fixed-function terminals in each store to handle the pharmacy data. There is no need for non-pharmacy employees to access the PHI data; they need access only to financial and retail reporting data for the business. This organization creates separate zones for these systems. It might create a dedicated zone for the PHI, or this zone could be combined with the existing PCI security zone that hosts the point-of-sale (POS) systems, which is already isolated.
    • Within Existing Zone: A health insurance organization must have most employees handle PHI data as part of their work to process claims and provide customer support. In this case, there is very little need or ability to provide a separate zone for PHI data. The organization should focus security efforts within the production zone on maintaining and monitoring the administrative model, particularly roles with access to large amounts of PHI.

Recommended:

  • Design security zones to minimize the number of application data flows across the trust boundaries between zones. Every inter-zone trust boundary must be secured and defended, and each application that crosses the boundary incurs security overhead to analyze, protect, and monitor the data. Minimizing the application data flows can be accomplished through a number of means such as:
    • Blocking data flow and accepting impact to application functionality
    • Changing the application deployment architecture (for example, move monitoring and reporting assets for SCADA systems to a different security zone)
    • Changing the zone design to combine zones (for example, host PCI and HIPAA data for a retail store's point-of-sales in a single zone)
    • Changing the application or system design
  • Perform a threat modelling exercise for every system and application data flow that crosses the trust boundary between zones, particularly focusing on:
    • Data flows that cross the boundary
    • Processes and external entities on either side of the data flow crossing the trust boundary

For more information about threat modelling, see:

Containment within Security Zones

Because of the need for fluid interactions between similarly valued and trusted assets, the goal of containment within a security zone focuses on defending the integrity of a common administrative model and the resulting control relationships and permissions. This is similar to how you would put armed guards at a building entrance, but place electronic door locks with badge readers on rooms in a building.

Containment within the zone should focus on:

  • Protecting administrative control of assets in the security zone
  • Ensuring that least privilege is applied to the access of business sensitive assets

Containment strategies within a zone should start with log-on restrictions that counter credential theft, because these attack techniques are widely available and prevalently used. These defenses should follow the Tier model described in Mitigating Pass-the-Hash and Other Credential Theft, Version 2.

The primary means of maintaining the integrity of the common administrative model within a zone should focus on the following standards:

  • Administrator operational practices
  • Administrative model elements, including organizational units (OUs), security groups, and delegated permissions in Active Directory
  • Means of violating the administrative model by using management tools, unpatched vulnerabilities, and configuration errors

Additional information can be found about these standards in the Microsoft Azure Security section in this document.

Note: Because Active Directory domains are not security boundaries within a forest and they add management overhead, you should avoid using them for security containment within a zone.

Figure 6 - Examples of Security Zones with Tiers

Mandatory:

  • Containment within a zone should not interfere with application functionality or business operations (unless those operations natively present a specific security risk).
  • Apply escalation of privilege mitigations within a zone by using the Tier model described in Mitigating Pass-the-Hash and Other Credential Theft, Version 2.
  • No firewalls or other IP filtering technology should be used to block traffic between domain controllers within the same forest or security zone.
  • No firewalls or other IP filtering technology should be used to block traffic between domain controllers and any other hosts within the same forest or security zone.

Recommended:

  • Detective controls within a security zone should favor machine learning technologies that automatically learn normal traffic patterns. These are generally more successful at identifying anomalous behavior with a much lower investment of time by skilled human personnel.
    For an example of this approach, see Microsoft Advanced Threat Analytics.
  • All preventive controls for network traffic within a zone should be accompanied by a detective control to identify unexpected or anomalous behavior. This ensures that security operations personnel have visibility into anomalous events and operational issues that are reported by IT operations personnel.
  • Use a single Active Directory forest for identity in any given zone if you are using Active Directory in Windows Server for identity.

Security Standardization for Administrative Control

Most enterprise organizations that adopt Azure will continue to operate a production identity system that includes Windows Server Active Directory (AD) on-premises, hosted on Azure IaaS, or both.

Because of the importance of identity systems to the overall security posture of the organization, including assets hosted on Azure, this document includes a set of security standards for Active Directory and related assets.

This section describes the standards and expectations for securing administrative control of all of an organization's information technology systems. These standards will have to be adapted to the requirements and risk appetite of the organization, but the defaults reflect recommended practices, so they can be used as a benchmark or end state for planning.

The standards in this section assume that the organization has the following attributes:

  • Most or all servers and workstations are joined to Active Directory.
  • Smart cards can be used for all authentication of administrative accounts.
  • An enterprise identity management solution is deployed.
  • There is a privileged access management solution, such as Microsoft Identity Manager, in place, or there is a plan to adopt one.
  • Personnel are assigned to monitor security alerts and respond to them.
  • The technical capability to rapidly apply Microsoft security updates is available.
  • Baseboard management controllers on servers will not be used, or will adhere to strict security controls.
  • Administrator accounts and groups for servers (Tier 1 admins) and workstations (Tier 2 admins) will be managed by domain admins (Tier 0).
  • There is a Change Advisory Board (CAB) or another designated authority in place for Active Directory changes.

Administrative Control

Administrative control represents the ability to exert full technical control of all system functions. This ability to "control" the system is separate and distinct from "security controls" that prevent or detect activity within the system.

By commonly targeting administrative control of identity systems, cybersecurity attackers can control most or all identities and computing assets in an organization. This makes the security assurances for all identity systems including Active Directory a critical mission imperative.

Maintaining the security standards of a system is critical to maintaining positive administrative control of that system. If administrative privileges are poorly defined or controls are not followed, the entire enterprise is at high risk of compromise by hostile actors.

Getting positive enterprise control requires restricting all means of administrative control—much like controlling access into a city requires restricting all means of transportation, such as road, rail, and air. This document categorizes the means of controlling the identity store into three types:

  • Operating system – Technical control of any operating system through means such as management agents, local administrative group membership, and local configurations.
  • Active Directory and identity data – Technical control of any data in Active Directory, other identity systems, or other entities. Means of control include permission to reset passwords, Active Directory group membership, Active Directory and identity system object permissions, Group Policy permissions, or permissions on log-on or start-up scripts called by Group Policy.
  • Operational practices – Operational practices or actions can provide technical control of assets, such as password and credential handling practices (where administrative credentials are used to log on) and the physical security of computing assets.

Securing Custom Developed Software

The security of software that is developed in-house relies on applying sound security practices during the development cycle, in addition to the maintenance and response processes.

Recommended: All custom developed applications should be developed with a full security development lifecycle to minimize risk.

Mandatory: All custom applications should go through a threat modelling exercise to systematically identify risk and mitigations to the application's design and configuration.

For information about these topics, see:

Leveraging the Cloud Platform Integration Framework in Azure

The Cloud Platform Integration Framework (CPIF) is an extension of the IaaS Product Line Architecture (PLA) previously developed and released by Microsoft. This extension incorporates additional information about implementing workloads in public and hybrid cloud environments.

These scenarios can be deployed by using the principles of the framework, rather than being implemented against a rigid architecture. The CPIF architectural pillars group areas of Azure architecture together and define consumption methodologies within the pattern guides.

The Architectural Pattern Guides provide guidance about the services, construction, and deployment of the services within a larger solution. This section covers the various discipline areas within each pillar as they pertain to deploying solutions in a target Microsoft Azure subscription.

Deployment

Deployment is a key facet of Azure implementations, and it is usually one of the first steps in implementing an Azure subscription design. Customers new to Azure commonly have their first experiences with Azure during a deployment of Azure resources.

Deployment of Azure constructs can involve many layers – subscriptions, accounts, virtual networks, virtual machines, Azure websites, and other services. Although various mechanisms exist to provision Azure resources, many customers must incorporate automation solutions to scale effectively beyond simple deployments.

Additionally, automation solutions can provide standardization (such as naming conventions and configuration standards), which surpass the built-in capabilities and range of choices provided by the Azure portal.

Automation

Azure is a unique implementation of cloud services in that it provides each and every service at hyper-scale, over a global infrastructure, through a common API, and through a common UI (often referred to by customers as a "single pane of glass"). This API layer is accessible through several programming languages and supports rich automation scenarios.

Automation within the Azure infrastructure can be accomplished by using a variety of toolsets. Usually, an automation architecture will follow shortly after the Azure networking, storage, subscription, and other key architectural decisions are made. These prerequisites are inputs required by the automation toolsets that instantiate different parts of the environment or solution within Azure. Note that depending on the Azure region(s) to be used, certain toolsets may not be available.

Each customer environment should also be evaluated for existing toolsets used for on-premises or public cloud automation. Customers may wish to leverage existing expertise and investments in existing toolsets and extend the on-premises functionality to support the automation of Azure constructs. However, other organizations may wish to have the automation supporting Azure handled separately, without dependencies on legacy or existing platforms. This discovery of each organization's environment and automation strategy should serve as an input into the overall automation architecture.

Microsoft Azure PowerShell

The Azure APIs mentioned earlier in this document are complemented through a comprehensive library of PowerShell cmdlets. PowerShell scripting can provide a wide variety of automation solutions, and the Azure PowerShell cmdlet library can support native and third-party automation capabilities.

Although third-party tools are not directly highlighted in this document, customers with extensive third-party automation toolsets already deployed and that support PowerShell may consider extending the functionality by using Azure PowerShell scripts within the existing tools.

Standard PowerShell scripts are commonly the first iteration of automation that customers explore when they are new to Azure. Generally, this automation is later incorporated into new or existing toolsets to standardize automation within Azure across the organization.

Feature References

Install Azure PowerShell cmdlets

http://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/

Azure Cmdlet Reference (TechNet)

https://msdn.microsoft.com/en-us/library/azure/jj554330.aspx

Getting started with Azure PowerShell (Ed Wilson)

http://blogs.technet.com/b/heyscriptingguy/archive/2013/06/22/weekend-scripter-getting-started-with-windows-azure-and-powershell.aspx

Mandatory: Download and install the most current edition of the Azure PowerShell cmdlets through the Web Platform Installer.

Recommended: Azure Automation solutions that use PowerShell cmdlets should be designed using PowerShell scripting best practices. Additionally, PowerShell scripts should be designed by incorporating PowerShell workflow processes into the initial design, or the scripts can be refactored in the future to use PowerShell workflows.

Optional: Consider using the Azure PowerShell cmdlets to build automation solutions on your existing platform if your Azure Automation solutions leverage third-party automation platforms.
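
As a quick validation after installation, the following hedged sketch imports the classic Azure module, confirms its version, and authenticates interactively:

# Confirm the classic Azure module is available and check its version
Import-Module Azure
(Get-Module Azure).Version

# Sign in interactively and list the subscriptions the account can access
Add-AzureAccount
Get-AzureSubscription | Select-Object SubscriptionName, SubscriptionId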

Design Guidance

Azure PowerShell scripts should follow a number of design recommendations. This section covers the following design considerations:

Credential protection - We recommend that authors create a dedicated Azure authentication function. These functions should leverage [System.Security.SecureString] to secure the user passwords during runtime. Following is a code snippet example of an authentication function:

Param(
    # Azure administrative user name (UPN), prompted for if not supplied
    [Parameter(Mandatory=$true)][string]$AzureUserName,

    # Password passed as a SecureString so it is protected in memory at runtime
    [Parameter(Mandatory=$true)][System.Security.SecureString]$AzurePassword
)

Import-Module Azure

# Build a PSCredential object from the secured inputs for use with Azure cmdlets
$azureCred = New-Object System.Management.Automation.PSCredential($AzureUserName, $AzurePassword)

Subscription details - After authenticating to Azure, it is advisable to display the subscriptions the user has access to. Generally, administrators in Azure will have access to multiple subscriptions, and they should choose which subscription to leverage at run time.

This approach is generally more flexible than static, file-based assignment of the subscription for the script actions. To simplify the subscription choice for the user, append an automatically generated number to each subscription for the user to input, and reference this Read-Host selection back to the collection of available subscriptions:

# Enumerate the subscriptions available to the authenticated user and
# build a numbered list the user can select from
$subscriptionList = Get-AzureSubscription

# Initialize the counter and results collection used below
$subscriptionNumber = 0
$subscriptionResults = @()

ForEach ($subscription in $subscriptionList) {
    $subscriptionNumber++
    $properties = @{
        'SubscriptionNumber' = $subscriptionNumber
        'SubscriptionName'   = $subscription.SubscriptionName
        'SubscriptionID'     = $subscription.SubscriptionId
    }
    $PSObject = New-Object -TypeName PSObject -Property $properties
    $subscriptionResults += @($PSObject)
}

return $subscriptionResults

Object naming deconfliction - For provisioning scripts, we recommend that you test the name of the object you intend to provision. Some Azure objects are required to be globally unique. For example, the name of a storage account must be unique across all of Azure.

Other objects must be unique within an individual subscription, such as virtual machine names. Authors should test the name prior to attempting a provisioning action. Generally, test within the current subscription first to see if the object already exists. If the name is not already used in the subscription, leverage the Test-AzureName cmdlet, as in the following script, to test whether the name is taken elsewhere in Azure.

# Check whether the cloud service name is already taken anywhere in Azure
$testAzureServiceName = Test-AzureName -Service $cloudServiceName

If ($testAzureServiceName) {
    Write-Host 'Cloud Service Name already taken, choose another name and run script again'
    Return $null, $false
}
Else {
    Write-Host 'Cloud Service Name Available - creating cloud service...'
    New-AzureService -ServiceName $cloudServiceName -Location $azureRegion | Out-Null
    Write-Host "Cloud Service $cloudServiceName created"
}

Object pre-existence verification - While performing a provisioning action in a series of actions (for example, provisioning a storage account, then provisioning a storage container within the account), we recommend that you check the status of the newly provisioned object prior to moving to the next provisioning action. The following script example uses a Do While loop that incorporates a Get-AzureService request on the new object:

# Poll the newly requested cloud service until its provisioning
# operation completes, checking once per second
Do {
    Write-Host "Finding status of cloud service $cloudServiceName ..."
    $selectedCloudService = Get-AzureService -ServiceName $cloudServiceName `
        -ErrorAction SilentlyContinue -WarningAction SilentlyContinue

    If ($selectedCloudService.Status -eq 'Created') {
        Write-Host "Found cloud service $cloudServiceName - Cloud Service Status: $($selectedCloudService.Status)"
    }

    Start-Sleep -Seconds 1
}
While ($selectedCloudService.OperationStatus -eq 'InProcess')

Desired State Configuration

Desired state configuration (DSC) within Azure allows for virtual machine configuration by using a configuration file (PowerShell script) and a virtual machine extension.
This configuration file is uploaded to Azure blob storage and is applied to the virtual machine that is enabled with the PowerShell DSC extension. Some examples of what DSC enables are:

• Install or remove server roles and features
• Manage registry settings
• Manage files and directories
• Start, stop, and manage processes and services
• Manage local groups and user accounts
• Install and manage packages, such as .msi and .exe
• Manage environment variables
• Run Windows PowerShell scripts
• Fix a configuration that has drifted away from the desired state
• Discover the actual configuration state on a given node

DSC proceeds through a set of phases and supports both Push and Pull models. These phases are outlined in the following list:

  • Authoring - A DSC configuration is created through PowerShell or by third-party languages and tools. The output of the authoring phase is one or more Management Object Format (MOF) files, which is the format consumable by DSC.
  • Staging - DSC data (MOF files) is staged. When using the Pull model, DSC data and custom providers are kept on the Pull server (an IIS server with an OData interface), and the target system contacts the Pull server by passing a URI and a unique identifier to retrieve its configuration. When using the Push model, DSC data is pushed to the target system directly.
  • "Make it so" - The final phase applies the configuration. DSC data is pulled or pushed to the local configuration store, which contains the current, previous, and desired state configurations. The configuration is parsed, and the relevant WMI provider implements the change.

Azure DSC extensions allow PowerShell Desired State Configuration to configure your Azure virtual machines. The DSC extension handler depends on Windows Management Framework 5.0, which the extension handler installs automatically. Currently, Windows Server 2012 R2 images are the only Windows Server virtual machine images that support the DSC extension handler, due to this dependency on Windows Management Framework 5.0.
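
As a hedged illustration of the authoring, staging, and "make it so" flow described above, the following sketch defines a configuration that installs the Web Server role, publishes it to Azure blob storage, and applies it to an existing virtual machine through the DSC extension handler. The service, virtual machine, and configuration names are placeholders:

# Authoring: a configuration that ensures the Web-Server role is present
# (save this as ContosoWebServer.ps1)
Configuration ContosoWebServer {
    Node 'localhost' {
        WindowsFeature IIS {
            Ensure = 'Present'
            Name   = 'Web-Server'
        }
    }
}

# Staging: package the configuration script and upload it to Azure blob storage
Publish-AzureVMDscConfiguration -ConfigurationPath '.\ContosoWebServer.ps1'

# "Make it so": attach the DSC extension to an existing VM and apply the configuration
$vm = Get-AzureVM -ServiceName 'contososvc' -Name 'contosovm01'
$vm = Set-AzureVMDSCExtension -VM $vm `
    -ConfigurationArchive 'ContosoWebServer.ps1.zip' `
    -ConfigurationName 'ContosoWebServer'
$vm | Update-AzureVM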

Feature References

Introduction to Azure PowerShell DSC

http://blogs.msdn.com/b/powershell/archive/2014/08/07/introducing-the-azure-powershell-dsc-desired-state-configuration-extension.aspx

Configuring a virtual machine via PowerShell DSC

http://blogs.msdn.com/b/powershell/archive/2014/04/03/configuring-an-azure-vm-using-powershell-dsc.aspx

Built-In Windows PowerShell Desired State Configuration Resources

https://technet.microsoft.com/en-us/library/dn249921.aspx

Build Custom Windows PowerShell Desired State Configuration Resources

https://technet.microsoft.com/en-us/library/dn249927.aspx

Windows PowerShell Desired State Configuration for Azure

https://msdn.microsoft.com/en-us/library/azure/dn877980.aspx

Optional: Solutions that leverage provisioning automation can consider the use of DSC, in combination with other automation, to deliver a final product that is configured to the desired specifications. Because there are limitations on what DSC can deliver, organizations may find that they need to augment DSC with additional automation capabilities.

Service Management Automation

Service Management Automation (SMA) is primarily an engine for running .NET Framework Windows Workflow Foundation activities. SMA is typically accessed through the Windows Azure Pack portal; however, solutions can also leverage the SMA OData-based web service to call SMA directly.

The SMA environment leverages the .NET Framework, so 64-bit functions can be utilized. This is a key difference when compared to solutions such as System Center Orchestrator, which leverages the 32-bit Windows PowerShell 2.0 engine, regardless of the underlying operating system capabilities.

Feature References

Service Management Automation overview

http://blogs.technet.com/b/privatecloud/archive/2013/08/09/automation-an-introduction-to-service-management-automation.aspx

Service Management Automation comparison to other tools

https://technet.microsoft.com/en-us/library/dn469260.aspx

Getting Started with PowerShell Workflow

https://technet.microsoft.com/en-us/library/jj134242.aspx

Running PowerShell commands in a Workflow

https://technet.microsoft.com/en-us/library/jj574197.aspx

Recommended: Azure solutions should leverage SMA if parallel or long running activities are required. Additionally, SMA should be part of solutions looking to automate Azure resources and on-premises resources.

Optional: Solutions that leverage SMA can optionally integrate with Orchestrator runbooks via a third-party integration pack or by calling the Orchestrator/SMA APIs.

Design Guidance

SMA automation runbooks are based on PowerShell workflows, rather than traditional PowerShell scripts. PowerShell workflows provide additional benefits over traditional PowerShell scripts, specifically around activity parallelism and long-running activities.

Typically, long-running activities are seen as riskier in traditional PowerShell scripts due to the stateless nature of the execution environment. However, PowerShell workflows can accommodate interruptible activities, such as stop, restart, and reboot.
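
To make these benefits concrete, the following is a minimal sketch of a workflow-based runbook that restarts a set of virtual machines in parallel and then checkpoints its progress. The workflow name and parameters are illustrative, and the sketch assumes the runbook environment has already established an authenticated Azure connection:

workflow Restart-ContosoVMs {
    param(
        [string]$ServiceName,
        [string[]]$VMNames
    )

    # ForEach -Parallel executes the loop body for all VMs concurrently,
    # a capability traditional PowerShell scripts do not provide natively
    ForEach -Parallel ($vmName in $VMNames) {
        Restart-AzureVM -ServiceName $ServiceName -Name $vmName
    }

    # Persist workflow state so a long-running runbook can resume from
    # this point if it is interrupted
    Checkpoint-Workflow
}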

Traditional PowerShell scripts are generally not designed for parallelism, whether this is within one machine or across multiple machines. PowerShell workflows have parallelism as a built-in feature that can be leveraged (depending on the design of your workflows). The following considerations apply:

Scenario: SMA-based automation
Model: Green-field or existing SMA automation for hybrid scenarios
Points to consider:
  • Complexity and limitations in hybrid scenarios
  • Customer familiarity with PowerShell workflows

Scenario: SMA combined with other automation platforms
Model: Leverage existing customer investment in other automation platforms
Points to consider:
  • Existing customer investment in other platforms
  • Integration points and complexity

Scenario: Standard PowerShell in a PowerShell workflow
Model: Attempting to run existing PowerShell code or functions that are not natively supported in a PowerShell workflow
Points to consider:
  • Leverage the inline script capability to enable this scenario

System Center 2012 R2 Orchestrator

System Center 2012 R2 Orchestrator is a 32-bit automation platform that helps you automate the creation, monitoring, and deployment of resources in your environment. Many organizations have adopted Orchestrator because of its GUI designer, the third-party ecosystem of integration packs, and the relatively low investment required to create initial or simple automation solutions.

However, there are a number of considerations when using Orchestrator to automate Azure solutions. The primary consideration is that the 32-bit engine can potentially limit performance (currently limited to 50 concurrent runbooks). PowerShell execution and capabilities are limited to Windows PowerShell 2.0 32-bit (unless more complex external calls are used), and automation capabilities are enabled by (and limited to) the Orchestrator Integration Pack framework.

Feature References

Orchestrator Azure Integration Pack

https://technet.microsoft.com/en-us/library/jj721956.aspx

Kelverion Integration Pack

http://kelverion.com/integration-packs/ip-microsoft-system-center/

Orchestrator TechNet page

https://technet.microsoft.com/en-us/library/hh237242.aspx

Mandatory: Azure automation using Orchestrator should be deployed in a highly available architecture. Orchestrator may require a separate high-availability architecture, and it is generally complex (requiring high availability for the management server and SQL database).


Recommended: Orchestrator solutions that automate Azure should consider using Azure Integration Packs (Microsoft or third-party) to speed the creation of the Orchestrator runbooks.

Optional: Azure can be automated by using System Center Orchestrator. However, if you do not currently have Orchestrator deployed, other automation platforms, such as SMA or Azure Automation may be a better choice.

Design Guidance

System Center 2012 R2 Orchestrator can be leveraged as an automation solution for Azure environments. One of the key drivers that may support the use of Orchestrator is existing automation that is housed within Orchestrator runbooks. Additionally, if you have experience with Orchestrator automation for other projects or applications, the choice to leverage that existing investment and administrator skill set may be right for some organizations.

One of the key considerations of an automation design that uses Orchestrator should include the selection of Integration Packs. Microsoft provides an Azure Integration pack for Orchestrator that provides pre-built activities that can be leveraged in a runbook.

Additionally, third parties have developed Orchestrator integration packs that can be evaluated for enhanced functionality. For example, the System Center 2012 integration pack from Kelverion provides automation activities beyond the default product functionality.

Orchestrator can also be used as a component of an automation solution by leveraging connections to Service Management Automation through API calls or by using an Orchestrator activity in the Kelverion Orchestrator integration pack.

This is particularly useful in a hybrid scenario where Orchestrator is used to provision and configure on-premises virtual machines and do post-provisioning activities. These same runbooks may be leveraged to do the same post-provisioning activities on Azure virtual machines with minimal changes.

Scenario: Orchestrator-based automation
Model: GUI-driven automation
Points to consider:
  • Leverage existing Orchestrator infrastructure
  • Limitations of a 32-bit application

Scenario: Orchestrator-integrated automation
Model: Orchestrator added to other automation tools
Points to consider:
  • Complexity of the entire solution
  • Where to run which pieces of automation

Azure Automation

Azure Automation is a PowerShell workflow-based environment hosted in Azure. This environment is similar to on-premises implementations of Service Management Automation. These PowerShell workflows can be started manually or at a particular time.

Feature References

Azure Automation platform comparison and reference

https://msdn.microsoft.com/en-us/library/azure/dn643629.aspx

Azure Automation Overview

https://azure.microsoft.com/en-us/documentation/articles/automation-intro/

Azure Automation E-book

http://blogs.msdn.com/b/microsoft_press/archive/2015/03/11/free-ebook-microsoft-azure-essentials-azure-automation.aspx

Azure Automation Training

https://www.microsoftvirtualacademy.com/en-US/training-courses/automating-the-cloud-with-azure-automation-8323

Hybrid Runbook Workers

https://azure.microsoft.com/en-us/documentation/articles/automation-hybrid-runbook-worker/

Mandatory: Azure Automation solutions cannot leverage on-premises resources that are not exposed publicly.

Recommended: Azure Automation based solutions should contain checkpoints for long-running runbooks and workflows. This allows for more intelligent recovery should a runbook fail.

Optional: An Azure Automation based solution can be developed locally with the PowerShell ISE because it follows the same design patterns as PowerShell workflows.
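
As a brief sketch under assumed names, the following starts a published runbook in an Automation account and then inspects the resulting job; the account and runbook names are hypothetical:

# Start a published runbook in a (hypothetical) Automation account
$job = Start-AzureAutomationRunbook -AutomationAccountName 'ContosoAutomation' `
    -Name 'Restart-ContosoVMs'

# Check the status of the job that was created
Get-AzureAutomationJob -AutomationAccountName 'ContosoAutomation' -Id $job.Id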

Design Guidance

Azure Automation is a platform to create, manage, and execute automation for various Azure components. There are a number of design considerations, particularly for solutions that will span on-premises resources and Azure.

One challenge is that Azure Automation cannot leverage on-premises resources that are not accessible over the Internet. This can be a significant limitation for hybrid deployments, or deployments that are fully cloud-based, but use existing on-premises management tools (such as antivirus and monitoring tools).

Note that Azure Automation cannot access Orchestrator or SMA runbooks. This can be a significant design consideration if an existing investment in those automation platforms must be retained.

Whether they are self-hosted in Azure or on-premises, these platforms cannot be leveraged as part of a single hybrid automation solution with Azure Automation. Typically, Azure Automation would be used in a scenario to deploy or manipulate Azure resources, while additional automation could be used for post-provisioning actions when existing automation runbooks already exist for on-premises resource configuration. Solution designs that require both automation environments should consider the tradeoffs associated with complex or multiple automation solutions.

Scenario: Exclusively using Azure Automation
Model: Automation environment as a service
Points to consider:
  • On-premises resources
  • Automation limitations

Scenario: Multi-platform environment that leverages Azure Automation with other platforms
Model: Azure Automation plus additional functionality
Points to consider:
  • Complexity
  • Leverage existing automation investment

Azure Resource Groups

Azure Resource Groups allow you to group all related components of your application as a logical unit. This simplifies the application lifecycle from creation to deletion.

Feature References

Azure Resource Groups

http://azure.microsoft.com/en-us/documentation/articles/azure-preview-portal-using-resource-groups/

Azure Resource Manager Template Language

https://msdn.microsoft.com/en-us/library/azure/dn835138.aspx

Design Guidance

When you design solutions by using Azure Resource Groups, consider the following scenarios.

When deciding between one large resource group and multiple smaller resource groups, model the structure on your organizational structure and, ultimately, on what works well for your company.

Scenario: Complex structures (involving a website, SQL Database, Redis Cache, and so on)
Model: Pattern or model for implementation
Points to Consider:
  • Create resource groups that contain objects that have very similar lifecycles and are generally managed by the same organizational group.
  • If the SQL Server instances are operated by a dedicated group that manages the lifecycle of those servers, it might be best to create a dedicated resource group for the servers and reference it in your primary resource group as a dependency.

When designing Azure Resource Groups by using JSON templates, we recommend that you adhere to the following guidance (a deployment sketch follows the list):

  • Use consistent ordering of JSON elements, such as name, type, apiVersion, location, tags, dependsOn, properties, and resources (child).
  • Use parameters for template reusability. Avoid hardcoding values to increase the reusability of your template.
  • If credential capture is required, use securestring as the parameter type.
  • Use variables for commonly used constants.
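For example, a securestring template parameter can be bound at deployment time from PowerShell. The following is a minimal sketch that assumes the early Azure Resource Manager cmdlets (Azure PowerShell 0.9.x, where Switch-AzureMode selects the Resource Manager cmdlets); the resource group, template file, and parameter names are illustrative:

Switch-AzureMode -Name AzureResourceManager

# Capture the credential as a securestring rather than hardcoding it
$adminPassword = Read-Host -AsSecureString -Prompt "Administrator password"

New-AzureResourceGroup -Name "ContosoWebRG" -Location "East US"

# Template parameters surface as dynamic parameters on the deployment cmdlet
New-AzureResourceGroupDeployment -ResourceGroupName "ContosoWebRG" `
    -TemplateFile ".\azuredeploy.json" `
    -adminPassword $adminPassword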

Scenario: Standard website deployment
Model: Continuous deployment with MSDeploy
Points to Consider:
  • Leverage tagging to label the build version

Backup and Disaster Recovery

Azure provides scalable, durable cloud storage, backup, and recovery solutions for data of any size. You can use Azure backup capabilities with your existing infrastructure and backup investments to:

  • Cost-effectively enhance your business continuity strategy
  • Provide the storage required by your cloud applications

This section provides an overview of the backup and disaster recovery capabilities provided by Azure.

Microsoft Azure Backup

Microsoft Azure Backup is a feature within Azure that enables off-site file and folder backups from an on-premises Windows Server or System Center Data Protection Manager server to Azure Storage.

By using incremental backups, only changes to files are transferred to the cloud. This helps ensure efficient use of storage, reduced bandwidth consumption, and point-in-time recovery of multiple versions of data.

Configurable data-retention policies, data compression, and data-transfer throttling also offer added flexibility and help boost efficiency. Backups are stored offsite in Azure, which reduces the need to secure and protect on-site backup media.

Scenarios include:

  • Small businesses: Low-cost backup and recovery solution for single-server backups
  • Departmental backups: Low-cost backup alternative for departments in mid- to large-size organizations
  • Remote office backup and recovery consolidation: Consolidate backups of remote offices
  • Enterprises: Off-site protection of data, databases, and virtual machines from System Center Data Protection Manager

Feature References

Azure Backup Overview

https://azure.microsoft.com/documentation/articles/backup-introduction-to-azure-backup/

Configure Azure Backup

http://azure.microsoft.com/en-us/documentation/articles/backup-configure-vault/

Administer Azure Backup with Windows PowerShell

https://msdn.microsoft.com/en-us/library/azure/hh831765.aspx

Azure Backup in 10 minutes

https://azure.microsoft.com/en-us/documentation/articles/backup-try-azure-backup-in-10-mins/

Azure Import/Export Service

http://azure.microsoft.com/en-us/documentation/articles/storage-import-export-service/

Azure Backup for IaaS (Preview)

http://azure.microsoft.com/blog/2015/03/26/azure-backup-announcing-support-for-backup-of-azure-iaas-vms

Configure Azure Backup to quickly and easily back up Windows Server

http://azure.microsoft.com/en-us/documentation/articles/backup-configure-vault/

Backup and Recover Using the Azure Backup Agent

https://azure.microsoft.com/documentation/articles/backup-azure-backup-and-recover/

Mandatory:

  • Azure Backup requires creating a backup vault and schedule
  • The backup data is encrypted prior to being stored in Azure. The customer is responsible for managing encryption keys and backing up those keys. Customer data is never decrypted in Azure. To restore the data, it is decrypted on the on-premises client-side by the customer.

Recommended: If you are facing network constraints or need to accelerate the initial backup of your data, you can ship your data on a disk to the nearest Azure datacenter by using the Azure Import/Export service.

Optional:

  • Azure Backup can work with your existing Data Protection Manager investments.
  • Using Azure Backup does not require that you install Windows Server Backup. However, the two backup methods complement each other. Windows Server Backup can perform tasks such as bare-metal and system-state restores, which are not available by using Azure Backup.
  • If you are using Azure Backup to back up or recover data, additional backup operations cannot be started for that computer; while a backup operation is in progress, recovery cannot be started. If you need to cancel a backup operation to perform an unexpected recovery operation, the completed portion of the backup is not deleted. Instead, an incremental backup is performed the next time the backup runs to pick up any changes and finish the backup.
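As a sketch of driving the Azure Backup agent from Windows PowerShell (see the Administer Azure Backup reference above), assuming the agent's MSOnlineBackup module is installed; the schedule, retention, and paths are illustrative:

Import-Module MSOnlineBackup

# Build a policy: what to back up, when, and for how long
$policy = New-OBPolicy
$schedule = New-OBSchedule -DaysOfWeek Saturday, Sunday -TimesOfDay 16:00
Set-OBSchedule -Policy $policy -Schedule $schedule
Set-OBRetentionPolicy -Policy $policy -RetentionPolicy (New-OBRetentionPolicy -RetentionDays 30)
Add-OBFileSpec -Policy $policy -FileSpec (New-OBFileSpec -FileSpec "D:\Data")

# Activate the policy, then run an on-demand backup
Set-OBPolicy -Policy $policy
Start-OBBackup -Policy $policy
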
Azure Backup for IaaS

Azure Backup support for backing up Azure IaaS virtual machines is now generally available. With this support, customers can back up their virtual machines in Azure just as they back up their on-premises virtual machines and physical servers by using System Center 2012 R2 Data Protection Manager (DPM) or a third-party backup tool.

When you back up from an on-premises environment to Azure, you first need to create a backup vault. If you have an existing backup vault in the region where your virtual machines reside, you can use it for backing up your IaaS virtual machines. You manage Azure Backup for IaaS virtual machines from the same portal that you use for backing up on-premises workloads.

With Azure Backup, you get the expected features from a backup solution, such as a backup schedule to determine when to back up your virtual machines and a retention policy to determine how long you will keep your backup.

You do not need to shut down your virtual machine to do a backup. Azure Backup currently supports application-level consistency for virtual machines running Windows operating systems, and file-system-level consistency for virtual machines running Linux operating systems.

To back up IaaS virtual machines, you do not need to deploy any additional infrastructure. The storage and compute infrastructure requirements are automatically handled by the Azure Backup service.

You need not worry about scaling up either. You can protect as many virtual machines as needed, at any time. Azure Backup also reduces the overhead of maintenance by automatically handling the virtual machine extension upgrade, without user intervention.

Feature References

Back up Azure virtual machines with Azure Backup

https://azure.microsoft.com/en-us/documentation/articles/backup-azure-vms/

Azure Backup - Announcing general availability of backup for Azure IaaS VMs

https://azure.microsoft.com/en-us/blog/general-availability-of-backup-for-azure-iaas-vms/

Announcing pricing model for TCO reduction

http://azure.microsoft.com/blog/2015/03/31/azure-backup-announcing-new-pricing-model-for-tco-reduction/

Estimating Azure Backup billing usage for DPM data sources

https://gallery.technet.microsoft.com/Estimating-Azure-Backup-e0d4abbc

Azure Backup Cost Calculator

https://azure.microsoft.com/en-us/pricing/calculator/?scenario=data-management

Azure Backup - Monthly bill estimate and TCO calculator

https://gallery.technet.microsoft.com/Azure-Backup-Monthly-bill-093fd095

Mandatory: The backup is stored in the same backup vault that you used to register the virtual machine. To access those backups for restore purposes, click the Protected Items tab.

Optional: You have the option of choosing locally redundant storage (LRS) or geo-redundant storage (GRS) for the backup data, independent of other non-backup-related storage accounts. You can lower your storage costs by choosing LRS instead of the default GRS. For example, you could benefit from choosing LRS if you want to replace data on tapes with one copy in Azure.

Design Guidance

Azure Backup works with your existing data protection software, whether it's Windows Server Backup or System Center Data Protection Manager (DPM). If you do not have an existing investment in Windows Server Backup or DPM, you can leverage Azure Backup as a standalone data retention and protection solution.

Azure Backup implements an optimized blob copy that ensures constant, predictable I/O and backup times. Consider the following scenarios:

Scenario: Azure Backup (IaaS)
Model: Application-consistent backup of IaaS virtual machines
Points to Consider:
  • Backup with no impact on production workloads
  • No shutdown of the virtual machine required
  • Application-level consistency for Windows operating systems
  • File-system-level consistency for Linux operating systems

Scenario: Azure Backup (IaaS)
Model: Fabric-level backup
Points to Consider:
  • Unlimited scalability with no customer resources required for the backup
  • Agentless backup of multiple virtual machines at the same time
  • Single, central management interface through the Azure portal
  • Detailed Jobs view for tracking progress and success or failure

Scenario: Azure Backup (IaaS)
Model: Policy-driven backup and retention
Points to Consider:
  • Configuration of scheduled backups
  • On-demand backups
  • Automatic management of recovery points within an Azure backup vault
  • Retention of backup data in an Azure backup vault even if the original virtual machine is deleted

Scenario: Azure Backup (SQL Server in IaaS)
Model: SQL Server instance backups
Points to Consider:
  • Enable backup compression to reduce costs
  • Monitor and delete failed backups
  • Set container-level access to Private
  • Ensure your backup location is in the same region as your IaaS SQL Server instance

System Center Data Protection Manager

System Center 2012 R2 Data Protection Manager (DPM) is an enterprise backup system. By using DPM, you can back up (copy) data from a source location to a target secondary location. If original data is unavailable because of planned or unexpected issues, you can restore data from the secondary location.

By using DPM, you can back up application data from Microsoft servers and workloads, and file data from servers and client computers. You can create full backups, incremental backups, differential backups, and bare-metal backups to completely restore a system.

Feature References

DPM and Azure Backup

https://msdn.microsoft.com/en-us/library/azure/dn337332.aspx

Recommended: Find and download the latest update rollup for System Center 2012 R2 Data Protection Manager and the Azure Backup agent to leverage the latest features.

Optional: If you leverage Operations Manager you can download the DPM management pack to extend monitoring coverage of your Azure backup process for Data Protection Manager.

Azure Backup with System Center Data Protection Manager

Starting with Microsoft System Center 2012 SP1, Data Protection Manager (DPM) can back up production workloads directly to Microsoft Azure by using integration with the Azure Backup service. This integration gives organizations an offsite backup location without having to manage tape libraries or disk backups and ship them offsite.

To leverage DPM integration with Azure, you need to create a backup vault in the Azure portal. The data that is backed up to the vault is secured by using SSL certificates and strong passphrases.

After the vault is configured, you need to install the Azure Backup agent on your DPM server and register the DPM server with Azure. System Center Data Protection Manager also provides rich reporting capabilities.

The Data Protection Manager reporting framework includes the following feature set:

Feature Area: Customization
Capabilities:
  • Custom scripts
  • Documented views
  • Customized user interface

Feature Area: Aggregation
Capabilities:
  • All your DPM data in the System Center Operations Manager data warehouse
  • Scalable to any number of DPM servers

Feature Area: Flexibility
Capabilities:
  • Rich UI
  • No coding restriction
  • Edit SQL Server queries

Feature References

Configure the Azure Backup vault and register the DPM server

https://msdn.microsoft.com/en-us/library/azure/dn337336.aspx

Set up DPM to back up to Azure

https://msdn.microsoft.com/en-us/library/azure/dn337341.aspx


Recover DPM data from Azure

https://msdn.microsoft.com/en-us/library/azure/dn337334.aspx

Recommended: To leverage the full functionality of DPM integration with Azure Backup and Storage, please ensure you are running a minimum configuration of System Center 2012 R2 with Update Rollup 5.

Design Guidance

When integrating DPM and Azure Backup, consider the following:

Capability Consideration: Cost
Decision Points:
  • The Azure backup vault and DPM offer a potentially less expensive replacement for traditional tape backups.

Capability Consideration: Compute instance planning
Decision Points:
  • DPM is supported on any Azure IaaS virtual machine that is size A2 or higher.
  • Create the instance in the Standard compute tier because the maximum IOPS per attached disk is higher in the Standard tier than in the Basic tier.

Capability Consideration: Storage account planning
Decision Points:
  • Use a separate storage account for the DPM virtual machine, because there are size and IOPS limits on a storage account that might impact the performance of the DPM virtual machine if it is shared with other running virtual machines.

Capability Consideration: Virtual network planning
Decision Points:
  • The DPM virtual machine and the protected workloads should be part of the same Azure virtual network.

Capability Consideration: Data retention
Decision Points:
  • Retain data for one day on DPM-attached storage, and store data older than one day in the Azure Backup service, with the goal of protecting a larger amount of data or achieving a longer retention range.
  • Offloading backup data from DPM to Azure Backup provides retention flexibility without the need to scale the storage that is attached to the DPM server.

Azure Site Recovery

Azure Site Recovery orchestrates replication and failover of physical servers and virtual machines. The following scenarios are supported by Azure Site Recovery:

  • On-premises Hyper-V site to Azure protection with Hyper-V replication
  • On-premises Virtual Machine Manager (VMM) site to on-premises VMM site protection with Hyper-V replication
  • On-premises VMM site to on-premises VMM site protection with SAN replication
  • On-premises VMM site to Azure protection
  • On-premises VMware site to on-premises VMware site
  • On-premises VMware site to Azure protection
  • On-premises physical Windows Server or Linux server to Azure protection

Feature References

Azure Site Recovery Overview

https://azure.microsoft.com/en-in/documentation/articles/site-recovery-overview/

Best Practices for Site Recovery Deployment

https://azure.microsoft.com/en-in/documentation/articles/site-recovery-best-practices/

Mandatory: Azure Site Recovery requires creating a recovery vault and setting up a core infrastructure to support the level of protection desired (for example, integration with VMM and deployment of agents or an InMage infrastructure within the subscription).

Recommended: If you are facing network constraints or need to accelerate the initial backup of your data, you can ship your data on a disk to the nearest Azure datacenter by using the Azure Import/Export service.

Design Guidance

Azure Site Recovery (ASR) can be an effective disaster recovery solution for small businesses that don't necessarily have the resources to set up a secondary failover datacenter. ASR can seamlessly replicate your virtual machines to Azure. Here is some design guidance:

  • Azure currently supports only generation 1 virtual machines. In a failover event, generation 2 virtual machines are converted to generation 1 in Azure, and they are converted back to generation 2 when they fail back to the on-premises site.
  • With the support for a larger operating system drive size, ASR removes restrictions on operating system drives that are larger than 127 GB.
  • ASR adds support for subnet mapping and for multiple network adapters on virtual machines that fail over to Azure, and retains the virtual machine's IP address after failover to Azure.

Scenario: Azure Site Recovery
Model: Small business model
Points to Consider:
  • Compute costs are incurred when you trigger a failover. Virtual machines in normal mode would not exist in Azure.
  • When configuring the frequency of replication, assess the bandwidth impact of each setting through the calculator.
  • Leverage a hybrid network between the on-premises site and Azure to ensure clients resume communication with the failover Azure virtual machines.

Native Service Backup (PaaS)

The Microsoft Azure Platform-as-a-Service (PaaS) solutions, such as Azure App Service and Azure SQL Database, have built-in backups that support self-service, point-in-time restore options and geo-restore for the Azure SQL Database Basic, Standard, and Premium service tiers. This is a key consideration when deploying solutions that leverage Azure PaaS tiers.

Feature References

Built-in Automatic Backup in Azure SQL Database

https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx

Point in Time Restore for Azure SQL Database

https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx

Restoring an Active Database to a Point in Time

https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx

Geo-Restore

https://msdn.microsoft.com/en-us/library/azure/jj650016.aspx

Azure SQL Database Business Continuity

https://msdn.microsoft.com/en-us/library/azure/hh852669.aspx

Geo Replication

https://msdn.microsoft.com/en-us/library/azure/dn783447.aspx

SQL Data Sync

http://azure.microsoft.com/en-us/documentation/articles/sql-database-get-started-sql-data-sync/

Business Continuity

http://azure.microsoft.com/en-us/documentation/articles/sql-database-business-continuity/

The Azure App Service backup strategy centers on backing up your cloud package and configuration files. The following guidance is provided:

Feature References

Azure App Service Overview

http://azure.microsoft.com/en-us/documentation/articles/app-service-changes-existing-services/

Recommended: Leverage Azure blob storage to retain and restore your application's cloud snapshot files.

Design Guidance

The following guidance is provided for designing backup for cloud packages and configuration using Azure Storage:

Capability Consideration: Containers
Decision Points:
  • Create containers that represent each of your environment's deployment stories (for example, daily builds, manual builds, and dynamic builds).

Capability Consideration: Naming conventions
Decision Points:
  • Have a solid naming convention for each binary file and for your overall build structure (for example, Build2015.418.2338 and ContosoClientGatewayPaas.csfg).

Capability Consideration: Continuous release management
Decision Points:
  • Consider using Azure Storage to stage the deployments to your Azure application and also as a short-term repository of versions.
  • Ensure you have a proper naming convention when saving your files to Azure Storage.
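A sketch of staging a build to blob storage by using the Azure Storage cmdlets, following the container and naming conventions above (the storage account, key variable, and file names are illustrative):

$ctx = New-AzureStorageContext -StorageAccountName "contosobuilds" -StorageAccountKey $storageKey

# One container per deployment story, with private access
New-AzureStorageContainer -Name "manual-builds" -Permission Off -Context $ctx

# The blob name encodes the build version for later retrieval
Set-AzureStorageBlobContent -File ".\ContosoClientGatewayPaas.cspkg" `
    -Container "manual-builds" `
    -Blob "Build2015.418.2338/ContosoClientGatewayPaas.cspkg" `
    -Context $ctx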

Third-Party Backup Using Azure Storage

There are multiple third-party backup applications that are used to back up an organization's data to Azure Storage. The following table describes a few:

Solution: CloudBerry Backup
Summary: CloudBerry Backup uploads files and folders from your computer running Windows Server to Azure and restores them in minutes. The software connects directly to an Azure account and securely transfers backup files and folders to the cloud, serving as a transport between your computer running Windows Server and Azure cloud storage.

Solution: Seagate EVault Backup Services for Microsoft Azure
Summary: Cloud-based EVault Backup Services for Microsoft Azure gives you the flexibility, scalability, economy, and offsite protection of the cloud without any up-front capital expenses.

Solution: Uranium Backup
Summary: Azure backup is supported by all the Uranium Backup Pro editions, including Pro Tape, Pro DB, Pro Shadow, and Pro Virtual.

Backup Security

Backup data is always encrypted prior to being stored in Microsoft Azure. The customer is responsible for managing their encryption keys (and the subsequent back up of those keys). Customer data is never decrypted in Azure. To restore the data, it is decrypted on the on-premises client side by the customer.

If a customer loses their encryption keys, Microsoft cannot recover those keys. Customers can additionally use hardware security modules or key management software to securely safeguard their encryption keys. There is also an option from Microsoft called "Azure Key Vault," which is described in the following section.

Mandatory: The customer is responsible for managing the encryption keys and the backup of those keys. Customer data is never decrypted in Azure. To restore the data, it is decrypted on the on-premises client by the customer.

Design Guidance

When you design storage for backups, consider the following:

Capability Consideration: Models in use today
Decision Points:
  • Any servers that are registered using the same vault can recover the data backed up by other servers that use that certificate.
  • If you want to ensure that recovery only occurs to specific servers in your organization, use a separate certificate designated for those servers. For example, human resources servers could use one certificate, accounting servers another, and storage servers a third. This provides a way to control recovery by installing the appropriate certificates on the recovery servers.
  • When data is backed up to an Azure backup vault, it is important to understand who has access to it, and who has the privilege to restore that data to which servers. Be sure to review this security before establishing the backup vaults and backup certificates.

Capability Consideration: There is a distinction between using Azure for backup and using Azure for disaster recovery
Decision Points:
  • Azure Backup is a classic backup solution that involves copying the data of the virtual machine to a backup vault in a storage account and making it available for a restore at a later point.
  • Azure Site Recovery is for disaster recovery scenarios, specifically for virtual machines. The typical scenario for disaster recovery is when you want application availability if there is a disaster in your production environment, while minimizing data loss and the time it takes to bring your business back online.
  • ASR is more of a "live" solution, whereas Azure Backup is more of an offline solution.
  • Any organization should understand the difference between backup and disaster recovery. Sometimes, organizations use a backup copy for disaster recovery scenarios when they are willing to tolerate the time it takes to bring up their applications after they restore the data from backup copies. This is when the difference between disaster recovery and backup blurs a bit.

Azure Key Vault

Azure Key Vault helps customers safeguard cryptographic keys and secrets used by cloud applications and services. By using Azure Key Vault, you can encrypt keys and secrets (such as authentication keys, storage account keys, data encryption keys, .pfx files, and passwords) by using keys that are protected by hardware security modules.

For added assurance, you can import or generate keys in hardware security modules. Keys never leave the hardware security module boundary. Hardware security modules are certified to FIPS 140-2 level 2.

Azure Key Vault streamlines the key management process and enables you to maintain control of keys that access and encrypt your data. Developers can create keys for development and testing in minutes, and then seamlessly migrate them to production keys. Security administrators can grant and revoke permission to keys, as needed.

Anybody with an Azure subscription can create and use key vaults. Although Azure Key Vault benefits developers and security administrators, it could be implemented and managed by an administrator who manages other Azure services for an organization.

For example, this administrator would sign in with an Azure subscription, create a vault for the organization in which to store keys, and then be responsible for operational tasks, such as:

  • Create or import a key or secret
  • Revoke or delete a key or secret
  • Authorize users or applications to manage or use keys and secrets
  • Configure key usage (for example, sign or encrypt)
  • Monitor key usage

Feature References

Azure Key Vault

http://azure.microsoft.com/en-us/services/key-vault/

What is Azure Key Vault

http://azure.microsoft.com/en-us/documentation/articles/key-vault-whatis/

Get Started with Azure Key Vault

http://azure.microsoft.com/en-us/documentation/articles/key-vault-get-started/

Azure Key Vault PowerShell Cmdlets

https://msdn.microsoft.com/library/azure/dn868052

Mandatory:

  • Azure Key Vault is only configurable using Azure PowerShell at this time.
  • Key usage logging information is not currently available.
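Because configuration is PowerShell-only at this time, the operational tasks listed earlier map directly to cmdlets in the Key Vault module. A minimal sketch follows (the vault, resource group, and user principal names are illustrative, and the resource group is assumed to exist):

Switch-AzureMode -Name AzureResourceManager

# Create a vault for the organization
New-AzureKeyVault -VaultName "ContosoVault" -ResourceGroupName "ContosoSecurityRG" -Location "East US"

# Create a software-protected key and store a secret
Add-AzureKeyVaultKey -VaultName "ContosoVault" -Name "ContosoDataKey" -Destination "Software"
$secret = Read-Host -AsSecureString -Prompt "Secret value"
Set-AzureKeyVaultSecret -VaultName "ContosoVault" -Name "SqlSaPassword" -SecretValue $secret

# Authorize a user to use (but not manage) keys
Set-AzureKeyVaultAccessPolicy -VaultName "ContosoVault" `
    -UserPrincipalName "alice@contoso.com" `
    -PermissionsToKeys encrypt,decrypt,sign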

Monitoring

Azure Management Services

The Azure portal provides a default monitoring capability for your cloud assets without any additional investment in monitoring software. A summary of the monitoring services available within Azure is outlined in the following references:

Feature References

Portal Cloud Service Monitoring

http://azure.microsoft.com/en-us/documentation/articles/cloud-services-how-to-monitor/

Portal Storage Account Monitoring

http://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/

Customizing Monitoring with Azure Portal

http://azure.microsoft.com/en-us/documentation/articles/insights-how-to-customize-monitoring/

Azure Traffic Manager Monitoring

http://azure.microsoft.com/en-us/documentation/articles/traffic-manager-monitoring/

System Center Operations Manager

System Center Management Pack for Microsoft Azure Fabric

Organizations that have existing investments with System Center Operations Manager can leverage this infrastructure to monitor their Azure-based assets through the System Center Management Pack for Azure. This Management Pack extends monitoring of Azure resources by exposing them to Operations Manager. The following references are provided:

Feature References

Azure Management Pack Documentation download

http://www.microsoft.com/en-us/download/details.aspx?id=38414

Azure Management Pack Monitoring scenarios

http://blogs.technet.com/b/momteam/archive/2013/04/11/pre-release-of-the-management-pack-for-windows-azure-fabric-now-available.aspx

Mandatory: You need the management certificate from your subscription before you can configure Operations Manager to discover your Azure resources.

Design Guidance

The following considerations apply when extending Azure monitoring through Operations Manager PowerShell:

With the ability to execute PowerShell scripts through Operations Manager, you can:

  • Extend existing management packs to retrieve additional data that is not provided by the management pack for Azure
  • Expose functionality present in the Azure portal to Operations Manager

There are two methods that you can leverage to extend Azure monitoring to Operations Manager:

  • Use Invoke-RestMethod against the Azure REST APIs.
  • Load the Azure module as part of your monitoring solution and use native Windows PowerShell commands.

Method 1 is best used when you are unable to find a corresponding PowerShell cmdlet.

Method 2 should be your first choice for designing your custom Azure PowerShell solution. Using cmdlets is preferable due to the abstraction factor. With REST APIs, the interface can change, which could potentially force you to revise your solution.
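A sketch contrasting the two methods; the subscription ID, certificate subject, and service name are illustrative, and the REST call targets the classic Service Management endpoint:

# Method 2 (preferred): the Azure module cmdlets abstract the REST interface
Import-Module Azure
Get-AzureVM -ServiceName "ContosoService"

# Method 1 (fallback): call the Service Management REST API directly
$subscriptionId = "00000000-0000-0000-0000-000000000000"
$mgmtCert = Get-ChildItem Cert:\CurrentUser\My |
    Where-Object { $_.Subject -eq "CN=AzureMgmt" }   # management certificate
Invoke-RestMethod -Method Get `
    -Uri "https://management.core.windows.net/$subscriptionId/services/hostedservices" `
    -Certificate $mgmtCert `
    -Headers @{ "x-ms-version" = "2014-06-01" }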

Application Insights

Application Insights is a set of services that provide actionable insight into a production application and integrate that data into the development tools and process. The following references are provided for implementing this capability within Azure to monitor applications:

Feature References

Availability Monitoring

http://azure.microsoft.com/en-us/documentation/articles/app-insights-monitor-web-app-availability/

Diagnostics and Performance

http://azure.microsoft.com/en-us/documentation/articles/app-insights-detect-triage-diagnose/

Usage

http://azure.microsoft.com/en-us/documentation/articles/app-insights-overview-usage/

SharePoint Monitoring with Application Insights

http://azure.microsoft.com/en-us/documentation/articles/app-insights-sharepoint/

Mandatory:

  • You are required to integrate the SDK into your existing application.
  • You are required to tie your Application Insights integration to a subscription.
  • For organizations with existing applications, you must manually add the Application Insights NuGet package to integrate with Application Insights.

Recommended:

  • When you integrate Application Insights with your application, it is advised that you update Visual Studio 2013 to the latest update package.
  • When you design an application for the first time and you want to integrate it to Application Insights from the beginning, you might want to set up your subscription and storage account first.
  • If you have a choice between integrating the SDK into your application or leveraging the wizard to drop specific DLLs for Application Insights into your website bin folder, choose to integrate the SDK through Visual Studio, and redeploy the application.

Design Guidance

The following considerations apply when extending Azure monitoring through Application Insights:

Capability Consideration: Application code development integration
Decision Points:
  • Always consider server-side event logging first over client-side (JavaScript) logging.
  • Application Insights will capture exceptions that you would normally have to explicitly catch.

Capability Consideration: Application logging
Decision Points:
  • Avoid using multiple trace listeners because they execute sequentially.
  • Avoid using the default WADLogsTable to store all events, because after some time (depending on the volume of transactions recorded), the table becomes so large that you would be unable to filter the data effectively. Depending on your application needs, use other frameworks such as NLog, log4net, or SLAB.

Operational Insights

Operational Insights is an analysis service that enables IT administrators to gain deep insight across on-premises and cloud environments. It enables you to interact with real-time and historical machine data to rapidly develop custom insights, and it provides Microsoft and community-developed patterns for analyzing data.

Feature References

Capacity Planning

http://azure.microsoft.com/en-us/documentation/articles/operational-insights-capacity/

System Updates

http://azure.microsoft.com/en-us/documentation/articles/operational-insights-updates/

Log Management

http://azure.microsoft.com/en-us/documentation/articles/operational-insights-search/

Malware Assessment

http://azure.microsoft.com/en-us/documentation/articles/operational-insights-antimalware/

Security and Audit

http://azure.microsoft.com/en-us/documentation/articles/operational-insights-security-audit/

AD and SQL Assessment

http://azure.microsoft.com/en-us/documentation/articles/operational-insights-assessment/

Alert Management

http://azure.microsoft.com/en-us/documentation/articles/operational-insights-alerts/

Mandatory: This feature requires a Microsoft or Organizational Account to perform initial setup.

Recommended: If you are using Operational Insights with Operations Manager, it is recommended to download the latest updates to ensure you take advantage of new features and functionalities.

Design Guidance

The following considerations apply when extending Azure monitoring through Operational Insights:

Scenario: Threat analysis
Points to Consider:
  • When investigating an active threat, always look first at the security panel to quickly assess your security situation.
  • If you suspect an executable might be malicious, you can use event filtering to find all the servers in your environment that are detected running the same executable to see if a pattern emerges.

Global Service Monitor

Global Service Monitor is a cloud service that provides a simplified way to monitor the availability of external web-based applications from multiple locations around the world. Importantly, Global Service Monitor monitors applications from the perspective of the customers who use them.

Because Global Service Monitor monitors from locations that are correlated to customer geographies, application owners can gain insight into customer experiences and can separate issues related to external factors (such as Internet or network issues) from application or service issues.

The monitoring experience with Global Service Monitor focuses on the application instead of the infrastructure or individual URL. Global Service Monitor extends the monitoring capabilities of the System Center Operations Manager console so that you can monitor external- and internal-facing web applications in the same place you monitor other applications.

Global Service Monitor uses points-of-presence in Microsoft Azure to monitor and identify external factors to help give you a true reflection of an end-user's experience of a web application.

Feature References

Features

https://technet.microsoft.com/library/jj860368.aspx

Web Application Availability Monitoring

https://technet.microsoft.com/en-us/library/jj860370.aspx

Visual Studio Web Tests

https://technet.microsoft.com/en-us/library/jj860376.aspx

Recommended:

  • When configuring Global Service Monitor to leverage global locations, it is advisable to also use an internal location so that you get both internal and external perspectives.
  • When monitoring on a large scale (thousands of links), it is best to create custom scope views to target specific sets of URLs for Web Application Availability Monitoring.

Microsoft Azure Diagnostics

Azure Diagnostics 1.3 and 1.2 are Azure extensions that enable you to collect diagnostic telemetry data from a worker role, web role, or virtual machine running in Azure. The telemetry data is stored in an Azure Storage account, and it can be used for debugging and troubleshooting, measuring performance, monitoring resource usage, traffic analysis and capacity planning, and auditing.

Feature References

Overview/Configuring

http://azure.microsoft.com/en-us/documentation/articles/cloud-services-dotnet-diagnostics/

Design Guidance

The following considerations apply when extending Azure monitoring through Azure Diagnostics:

Scenario: Multi-role/tier application
Points to Consider:
  • When your application consists of multiple tiers, with each tier containing multiple roles, you might want to have a single storage account for each tier to pool your events and facilitate quick troubleshooting in the event of an issue.
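As a sketch of the per-tier storage account recommendation, the diagnostics extension on each virtual machine in a tier can be pointed at that tier's storage account (the account, service, virtual machine, and configuration file names are illustrative):

$ctx = New-AzureStorageContext -StorageAccountName "contosowebtier" -StorageAccountKey $storageKey

# Send this web-tier virtual machine's telemetry to the web-tier account
Get-AzureVM -ServiceName "ContosoWeb" -Name "web01" |
    Set-AzureVMDiagnosticsExtension -DiagnosticsConfigurationPath ".\diagnostics.wadcfg" -StorageContext $ctx |
    Update-AzureVM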

Maintenance

Traditional maintenance of solutions in Microsoft Azure is largely dependent on the services that are consumed within Azure (PaaS or IaaS). Customers have a shared responsibility for solutions deployed using Azure services, and the amount of shared responsibility is dependent on the services consumed.

As a practical example of this, Azure IaaS virtual machines have the requirement to be maintained by the customer, and there is not an existing automated update service provided for the guest operating system. The underlying fabric hardware, virtualization, and service layers are managed by Azure. Using this example, decision points that should drive the maintenance strategy of the architecture include:

  • Existing toolsets – Does the customer have an existing investment in maintenance systems (such as SCCM, third parties, or maintenance scripting)? Is there a desire to continue to use these toolsets to maintain the Azure environment?
  • Degree of connectivity from on-premises to Azure – Is the Azure environment operating as a "standalone" environment, or is there connectivity (such as through ExpressRoute or VPN) to the on-premises infrastructure?

This section covers some of the available options, and it uses the Azure IaaS scenario as a common example.

Microsoft Update

Keeping up-to-date with Microsoft updates for Windows-based virtual machines is critical to ensure that a proper security posture is maintained for these systems. Microsoft updates should be applied to Azure IaaS virtual machines in a similar way that updates are applied in the customer's existing environment.

When updating from on-premises or public Microsoft update servers, the update source location is largely driven by the Azure network design decisions and customer configurations, like any other Windows-based virtual machine.

For example, if forced tunneling of all network traffic is implemented, it would be recommended to leverage on-premises content servers, such as System Center Configuration Manager Distribution Points, Windows Server Update Services (WSUS) servers, or third-party patch management solutions.

This configuration would reduce the amount of network transit. However, if the Azure virtual machines are permitted to access the public Internet by egressing through the Azure datacenters (the default configuration), we recommend that you configure the Windows Update client settings to download the updates from Microsoft Update directly.

Windows Server Update Service

Like Windows Update, Windows Server Update Services (WSUS) can be utilized to patch Azure IaaS virtual machines for customers who wish to have a higher degree of control over patch distribution, release, and reporting.

We recommend that organizations review patching and update requirements with customers, specifically requirements around Azure IaaS virtual machines. Currently, Microsoft does not provide a centralized patch management offering for IaaS virtual machines outside of currently shipping patch management solutions such as WSUS and System Center Configuration Manager. Therefore, it is important that the customer understands that their organization is responsible for patching.

Feature References

WSUS TechNet site

https://technet.microsoft.com/en-us/windowsserver/bb332157

WSUS Deployment Guide

https://technet.microsoft.com/en-us/library/dd939906(WS.10).aspx

Mandatory: Azure solutions that deploy virtual machines must update the virtual machine operating systems. Microsoft does not provide guest operating system patching as a service.

Recommended: Azure solutions should contain a patching solution that has reporting or feedback on the status of individual patches. WSUS provides this minimum level of reporting.

Design Guidance

For organizations with small environments, or organizations that have not invested in a patching infrastructure (such as System Center Configuration Manager or a similar third-party tool), WSUS can provide a basic patch management infrastructure. However, the virtual machines need to be configured to utilize the WSUS instance like any other Windows-based system in the enterprise.

Typically, the application of updates needs to be done manually on individual servers (or it can be automated). WSUS can provide an update repository for content download, in addition to the approval and release of the patches to the environment. WSUS provides simple update automation at the endpoint, and this should be taken into account in any WSUS design. More advanced update features require an infrastructure similar to that provided by System Center Configuration Manager.

If the customer has an existing WSUS topology, a recommended approach is to deploy an additional WSUS server within the organization's Azure subscription and joined to the WSUS hierarchy. Optionally, this additional server can be configured to be a content store, such that Azure virtual machines download content from this new server.

Alternatively, the WSUS server can be configured to not be a content store, and virtual machines leveraging this WSUS server would download updates from Microsoft Update, while simply reporting to the new WSUS server.

A key design input to the WSUS topology and design is the network configuration in Azure. If forced tunneling is used, the WSUS design leverages a content store in Azure. However, if virtual machines have Internet access from Azure directly, the virtual machines can be configured to use Microsoft Update for updates. Consider your network configuration prior to architecting your update topology.
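Where Group Policy is not available, a sketch of pointing an Azure virtual machine at a WSUS server uses the standard Windows Update policy registry values (the WSUS URL is illustrative):

$wu = "HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
New-Item -Path "$wu\AU" -Force | Out-Null

# Point the Windows Update client at the WSUS instance
Set-ItemProperty -Path $wu -Name WUServer -Value "http://wsus01.contoso.com:8530"
Set-ItemProperty -Path $wu -Name WUStatusServer -Value "http://wsus01.contoso.com:8530"
Set-ItemProperty -Path "$wu\AU" -Name UseWUServer -Value 1 -Type DWord

# Restart the client so the new policy takes effect
Restart-Service wuauserv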

System Center 2012 R2 Configuration Manager

System Center 2012 R2 Configuration Manager can provide services including application installation, update management, and other system configuration tasks. This is particularly attractive in hybrid scenarios where customers may have significant existing investments in Configuration Manager packaging, software update groups, and so on. Microsoft Azure presents additional configurations that should be considered, such as update location settings, boundaries, and client authentication.

Within Configuration Manager, cloud-based distribution points can provide content hosted in Azure for IaaS virtual machines (or other computers) to consume local to their virtual network. This minimizes egress traffic from those systems consuming update services from the organization's infrastructure.

Additional considerations and limitations include the requirement for a site server to have certificate-based authentication to Azure. Task sequences should be configured as Download all content locally as part of any deployment package. Also, some features are not viable in cloud-based distribution points, and Azure supportability and feasibility should be considered with any Azure deployment.

Feature References

Cloud-based Distribution Points

https://technet.microsoft.com/en-us/library/hh272770.aspx#BKMK_InstallCloudSiteSystems

PKI Certificates for Configuration Manager

https://technet.microsoft.com/en-us/library/230dfec0-bddb-4429-a5db-30020e881f1e#BKMK_clouddp2008_cm2012

Mandatory:

  • If Configuration Manager cloud-based distribution points are to be used, Public key infrastructure (PKI) certificates must be installed on the site server for authentication.
  • Any task sequences deployed to Azure virtual machines configured to use SCCM must be configured as Download all content locally.
  • PXE and multicast, streamed applications, Apple and UNIX clients, and Windows and third-party updates are not supported features of cloud-based distribution points.

Recommended: Azure solutions that leverage Configuration Manager should consider the update source settings for Windows and third-party updates because these are not delivered by cloud-based distribution points. We recommend that you allow clients in Azure to connect to Windows Update to retrieve content. Client settings for Allow access to cloud distribution points must be set to Yes. Configuration Manager client settings can be configured in the Configuration Manager administration console.

Optional: Configuration Manager deployments within Azure can optionally contain a dedicated primary site. This may depend on the on-premises infrastructure, the scale of the Azure environment to be deployed, and the location of the Azure region in respect to the on-premises environments.

Design Guidance

If Configuration Manager is going to be used, we recommend that you leverage an existing Configuration Manager infrastructure, if available. Architecting Configuration Manager (such as primary sites and distribution points) can be involved and is generally a separate engagement beyond an Azure scope of work.

When using an existing Configuration Manager infrastructure, consider the anticipated size of the Azure deployment. If the scale is large enough, consider using a dedicated primary site to service Azure virtual machines.

A key input to the architecture of Configuration Manager is the networking topology. If Azure virtual machines are directly exposed to the Internet via Azure networking, cloud distribution points should be used to deliver content to virtual machines, and Microsoft updates can be downloaded directly from Microsoft Update.

Configuration Manager client settings can be configured in the Configuration Manager console. If forced tunneling is enabled, consider a traditional Configuration Manager distribution point to host content and updates.

A primary consideration with either architecture should be the network bandwidth between the on-premises infrastructure and Azure. Solutions should be architected to minimize the use of this limited bandwidth for patching and maintenance purposes.

Appendix A: Designing a Virtual Network Configuration File to Support Multiple Hop Routing

It is possible to enable the hub and spoke or the daisy-chain approach to support multiple-hop access. This requires making changes to the default network configuration file.

The process is as follows:

  1. Establish all the DNS servers for the subscription.
  2. Establish all the local network site definitions for multi-hop routing.
  3. Establish all the virtual networks for the subscription:
    1. Enable site-to-site connectivity for all the virtual networks that you need to interconnect.
    2. Specify the local networks that you need to connect and route to.
  4. Create the gateways.
  5. Update the network configuration with the gateway addresses.
  6. Establish the shared keys for each side of the gateway.
  7. Connect the gateways.

In the network configuration file, you must define the DNS servers, the local networks, and the virtual networks.

When designing the DNS server requirements, you can use DNS servers that are managed by Azure, or you can use customer-managed DNS servers. The DNS servers are added at the subscription level, assigned to each virtual network, and delivered by the DHCP server that is servicing the subnet. The customer-managed DNS servers can reside in Azure or on-premises.

When designing the local network sites for the configuration file, you must define local network sites that describe the:

  • Subnet addresses of the virtual networks in Azure
  • Set of subnets that define the on-premises environment that you want to route to
  • Local network site definitions for multi-hop routing between virtual networks

For example, consider a network configuration file that connects three virtual networks together in a daisy chain.

You need to define the local network sites that correspond to each virtual network:

lvnet1, lvnet2, lvnet3

Then define the virtual networks that you want to create:

vNet1, vNet2, vNet3

Now you need to create the network configuration file for the scenario where you want to connect these three virtual networks, and you want traffic to route the entire length of the daisy chain. This means that you need a dynamic routing gateway on each virtual network.

Note: Because the gateways have not been created yet, you do not know their actual addresses; use placeholder addresses for now.

First, define the DNS servers:

<Dns>

<DnsServers>

<DnsServer name="DNS1" IPAddress="10.0.0.4" />

<DnsServer name="DNS2" IPAddress="10.0.0.5" />

</DnsServers>

</Dns>

Then define the local networks lvnet1, lvnet2, lvnet3 with placeholder gateway addresses:

<LocalNetworkSites>

<LocalNetworkSite name="lvnet1">

<AddressSpace>

<AddressPrefix>10.1.0.0/16</AddressPrefix>

</AddressSpace>

<VPNGatewayAddress>1.1.1.1</VPNGatewayAddress>

</LocalNetworkSite>

<LocalNetworkSite name="lvnet2">

<AddressSpace>

<AddressPrefix>10.2.0.0/16</AddressPrefix>

</AddressSpace>

<VPNGatewayAddress>2.2.2.2</VPNGatewayAddress>

</LocalNetworkSite>

<LocalNetworkSite name="lvnet3">

<AddressSpace>

<AddressPrefix>10.3.0.0/16</AddressPrefix>

</AddressSpace>

<VPNGatewayAddress>3.3.3.3</VPNGatewayAddress>

</LocalNetworkSite>

</LocalNetworkSites>

Then define the virtual networks, vNet1, vNet2, vNet3. You need to know the following details:

  • The regional location where you want the virtual network placed
  • The available address space
  • Subnets to break down the address space (minimum of one)
  • The subnets for the gateways to be placed
  • Local network names where you want to enable routing

To define the local network sites that specify the routing, create a single local network site definition that traverses multiple subnets. Instead of defining lvnet1, lvnet2, lvnet3, and specifying multiple local network sites in the virtual network definition, you define "transitive" local network site definitions and specify those in the virtual network definition.

To allow routing from vNet1 to vNet3, you would define a single local network site definition called lvnet2-3 and place the address spaces (that you want to get to on the other side of the gateway) for vNet2 and vNet3 in the single definition, for example:

      <LocalNetworkSite name="lvnet2-3">

        <AddressSpace>

          <AddressPrefix>10.2.0.0/16</AddressPrefix>

          <AddressPrefix>10.3.0.0/16</AddressPrefix>

        </AddressSpace>

        <VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>

      </LocalNetworkSite>

You would define the opposite local network site definition to allow routing from vNet3 to vNet1 and vNet2 to vNet1, and specify the additional vNet1 address space:

      <LocalNetworkSite name="lvnet2-1">

        <AddressSpace>

          <AddressPrefix>10.2.0.0/16</AddressPrefix>

          <AddressPrefix>10.1.0.0/16</AddressPrefix>

        </AddressSpace>

        <VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>

      </LocalNetworkSite>

For the vNet1 virtual network definition, you would specify the local network site that allowed the routing path through the gateway that you want to traverse lvnet2-3:

<VirtualNetworkSite name="vnet1" Location="EAST US">

        <AddressSpace>

          <AddressPrefix>10.1.0.0/16</AddressPrefix>

        </AddressSpace>

        <Subnets>

          <Subnet name="Subnet-1">

            <AddressPrefix>10.1.0.0/19</AddressPrefix>

          </Subnet>

          <Subnet name="GatewaySubnet">

            <AddressPrefix>10.1.32.0/29</AddressPrefix>

          </Subnet>

        </Subnets>

        <Gateway>

          <ConnectionsToLocalNetwork>

            <LocalNetworkSiteRef name="lvnet2-3">

              <Connection type="IPsec" />

            </LocalNetworkSiteRef>

          </ConnectionsToLocalNetwork>

        </Gateway>

      </VirtualNetworkSite>

For vNet3, specify the lvnet2-1 local network site definition.

For vNet2, specify the local network sites you want to route to (going both directions) as separate single-hop definitions:


<VirtualNetworkSite name="vnet2" Location="EAST US">

<AddressSpace>

<AddressPrefix>10.2.0.0/16</AddressPrefix>

</AddressSpace>

<Subnets>

<Subnet name="Subnet-1">

<AddressPrefix>10.2.0.0/19</AddressPrefix>

</Subnet>

<Subnet name="GatewaySubnet">

<AddressPrefix>10.2.32.0/29</AddressPrefix>

</Subnet>

</Subnets>

<Gateway>

<ConnectionsToLocalNetwork>

<LocalNetworkSiteRef name="lvnet1">

<Connection type="IPsec" />

</LocalNetworkSiteRef>

<LocalNetworkSiteRef name="lvnet3">

<Connection type="IPsec" />

</LocalNetworkSiteRef>

</ConnectionsToLocalNetwork>

</Gateway>

</VirtualNetworkSite>

By using this approach with local network site definitions that specify the routing subnets, the network configuration file for the vNet1<->vNet2<->vNet3 daisy chain configuration would look like this:

<NetworkConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/ServiceHosting/2011/07/NetworkConfiguration">

<VirtualNetworkConfiguration>

<Dns>

<DnsServers>

<DnsServer name="DNS1" IPAddress="10.0.0.4" />

<DnsServer name="DNS2" IPAddress="10.0.0.5" />

</DnsServers>

</Dns>

<LocalNetworkSites>

<LocalNetworkSite name="lvnet1">

<AddressSpace>

<AddressPrefix>10.1.0.0/16</AddressPrefix>

</AddressSpace>

<VPNGatewayAddress>138.91.10.68</VPNGatewayAddress>

</LocalNetworkSite>

<LocalNetworkSite name="lvnet2-3">

<AddressSpace>

<AddressPrefix>10.2.0.0/16</AddressPrefix>

<AddressPrefix>10.3.0.0/16</AddressPrefix>

</AddressSpace>

<VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>

</LocalNetworkSite>

<LocalNetworkSite name="lvnet2-1">

<AddressSpace>

<AddressPrefix>10.1.0.0/16</AddressPrefix>

<AddressPrefix>10.2.0.0/16</AddressPrefix>

</AddressSpace>

<VPNGatewayAddress>138.91.18.148</VPNGatewayAddress>

</LocalNetworkSite>

<LocalNetworkSite name="lvnet3">

<AddressSpace>

<AddressPrefix>10.3.0.0/16</AddressPrefix>

</AddressSpace>

<VPNGatewayAddress>138.91.18.174</VPNGatewayAddress>

</LocalNetworkSite>

</LocalNetworkSites>

<VirtualNetworkSites>

<VirtualNetworkSite name="vnet1" Location="EAST US">

<AddressSpace>

<AddressPrefix>10.1.0.0/16</AddressPrefix>

</AddressSpace>

<Subnets>

<Subnet name="Subnet-1">

<AddressPrefix>10.1.0.0/19</AddressPrefix>

</Subnet>

<Subnet name="GatewaySubnet">

<AddressPrefix>10.1.32.0/29</AddressPrefix>

</Subnet>

</Subnets>

<Gateway>

<ConnectionsToLocalNetwork>

<LocalNetworkSiteRef name="lvnet2-3">

<Connection type="IPsec" />

</LocalNetworkSiteRef>

</ConnectionsToLocalNetwork>

</Gateway>

</VirtualNetworkSite>

<VirtualNetworkSite name="vnet2" Location="EAST US">

<AddressSpace>

<AddressPrefix>10.2.0.0/16</AddressPrefix>

</AddressSpace>

<Subnets>

<Subnet name="Subnet-1">

<AddressPrefix>10.2.0.0/19</AddressPrefix>

</Subnet>

<Subnet name="GatewaySubnet">

<AddressPrefix>10.2.32.0/29</AddressPrefix>

</Subnet>

</Subnets>

<Gateway>

<ConnectionsToLocalNetwork>

<LocalNetworkSiteRef name="lvnet1">

<Connection type="IPsec" />

</LocalNetworkSiteRef>

<LocalNetworkSiteRef name="lvnet3">

<Connection type="IPsec" />

</LocalNetworkSiteRef>

</ConnectionsToLocalNetwork>

</Gateway>

</VirtualNetworkSite>

<VirtualNetworkSite name="vnet3" Location="EAST US">

<AddressSpace>

<AddressPrefix>10.3.0.0/16</AddressPrefix>

</AddressSpace>

<Subnets>

<Subnet name="Subnet-1">

<AddressPrefix>10.3.0.0/19</AddressPrefix>

</Subnet>

<Subnet name="GatewaySubnet">

<AddressPrefix>10.3.32.0/29</AddressPrefix>

</Subnet>

</Subnets>

<Gateway>

<ConnectionsToLocalNetwork>

<LocalNetworkSiteRef name="lvnet2-1">

<Connection type="IPsec" />

</LocalNetworkSiteRef>

</ConnectionsToLocalNetwork>

</Gateway>

</VirtualNetworkSite>

</VirtualNetworkSites>

</VirtualNetworkConfiguration>

</NetworkConfiguration>
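After the configuration file is saved, steps 4 through 7 of the process can be completed with the Service Management cmdlets. A minimal sketch for one virtual network follows (the file path and shared key are illustrative, and the same steps repeat for vnet2 and vnet3):

# Apply the network configuration to the subscription
Set-AzureVNetConfig -ConfigurationPath "C:\Config\NetworkConfig.xml"

# Step 4: create a dynamic routing gateway on the virtual network
New-AzureVNetGateway -VNetName "vnet1" -GatewayType DynamicRouting

# Step 5: retrieve the real gateway address to replace the placeholder
# VPNGatewayAddress in the configuration file
(Get-AzureVNetGateway -VNetName "vnet1").VIPAddress

# Step 6: set the shared key for this side of the gateway link
Set-AzureVNetGatewayKey -VNetName "vnet1" -LocalNetworkSiteName "lvnet2-3" -SharedKey "Abcd1234Abcd1234"

# Step 7: connect the gateway to the local network site
Set-AzureVNetGateway -Connect -VNetName "vnet1" -LocalNetworkSiteName "lvnet2-3"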

Appendix B: Azure ExpressRoute Connectivity Examples

ExpressRoute Peering Requirements Using NSP Model

When designing a peering approach with an NSP model, there are a set of prerequisites that must be determined and allocated to accomplish a circuit connection. For the purposes of illustrating this behavior, AT&T NetBond will be used as an example.

Prerequisites:

  • Customer must have access to the provider's self-service portal.
  • The AT&T circuit must be allocated and viewable in the AT&T customer self-service portal.
  • A dedicated /29 CIDR address space must be allocated to create the VLANs from the AT&T edge router to the Azure edge router. The CIDR cannot overlap any address space in Azure or the on-premises network. A separate /29 CIDR is required for the private peering and public peering connections.
  • A dedicated /28 CIDR address space must be available in the virtual network that will be connected to the ExpressRoute circuit. The CIDR cannot overlap any address space in Azure or the on-premises network.
  • The latest version of Azure PowerShell must be installed on the system that will be creating the peering connection. You must import the ExpressRoute cmdlet module from:
    'C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement\Azure\ExpressRoute\ExpressRoute.psd1'

When you have all the prerequisites identified, proceed to the provisioning process within the Microsoft Azure subscription:

  1. Retrieve the provider locations and connection speeds by using:
    Get-AzureDedicatedCircuitServiceProvider
  2. Find the provider, the circuit location, and the desired speed.
  3. Use the New-AzureDedicatedCircuit cmdlet to create the Azure side of the circuit that will be linked to up to 10 virtual networks.
  4. Retrieve the circuit service key that will be used to establish the shared key for both sides of the circuit links.
  5. Go to the Azure portal and build the virtual networks (make sure you select Connect to local network), subnets, and gateway subnet, and then create the gateway.
  6. Go to the provider's self-service portal (AT&T in this example).
  7. Create the virtual network circuit by using Microsoft as the provider, the Azure datacenter location, and a name for the virtual network circuit, and then set the bandwidth.

  8. Within the provider portal (AT&T in this example), assign the VLAN by using the virtual network circuit you created. This is where you provide the /29 CIDR for the private peering, a VLAN name, and the service key that you created earlier.
  9. The process takes a few minutes, and the portal then reports success.
  10. Run the Get-AzureDedicatedCircuit cmdlet; you should see the state of the circuit change to Provisioned.
  11. Use New-AzureDedicatedCircuitLink with the service key to establish connections to virtual networks in Azure.
  12. Use the virtual network dashboard to verify that the circuit has been established.
  13. To check the connectivity between the on-premises location and Azure, you must use a TCP connectivity testing tool (such as PsPing) because Internet Control Message Protocol (ICMP) connectivity testing is not possible.

The process for creating the public peer connection is similar, starting with a VLAN creation for the existing virtual network circuit, but you need to use the second /29 CIDR address space and append _public to the service key.
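
For reference, the Azure-side portion of this workflow can be sketched with the classic Azure PowerShell ExpressRoute cmdlets named in the steps above. The circuit name, provider name, location, bandwidth, and virtual network name below are illustrative assumptions, not prescribed values:

    # Import the ExpressRoute module (path listed in the prerequisites)
    Import-Module 'C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement\Azure\ExpressRoute\ExpressRoute.psd1'

    # Step 1: list providers, locations, and supported speeds
    Get-AzureDedicatedCircuitServiceProvider

    # Step 3: create the Azure side of the circuit (values are examples only)
    $circuit = New-AzureDedicatedCircuit -CircuitName 'ContosoNetBond' `
        -ServiceProviderName 'ATT' -Location 'Washington DC' -Bandwidth 1000

    # Step 4: the service key is shared with the provider portal
    $circuit.ServiceKey

    # Step 11 (after the provider shows the circuit as provisioned): link a virtual network
    New-AzureDedicatedCircuitLink -ServiceKey $circuit.ServiceKey -VNetName 'vnet1'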

ExpressRoute Peering Requirements Using IXP Model

When designing a peering approach with an IXP model, there is a set of prerequisites that must be determined and allocated before a circuit connection can be established.

Prerequisites:

  • The customer must provide redundant routers that are placed in the Exchange Provider's datacenter.
  • A dedicated /29 CIDR address space must be allocated to create the VLANs from the provider's edge router to the Azure edge router. The CIDR cannot overlap any address space in Azure or the on-premises network.
  • VLAN IDs must be allocated for the peering circuits.
  • A separate /29 CIDR and VLAN ID is required for the private peering and public peering connections.
  • A dedicated /28 CIDR address space must be available in the virtual network that will be connected to the ExpressRoute circuit. The CIDR cannot overlap any address space in Azure or the on-premises network.
  • An autonomous system number (ASN) must be provided to represent the customer network; either a public or private ASN can be used. If a private ASN is used, it must be greater than 65000.
  • An MD5 hash password is required to configure Border Gateway Protocol (BGP) routing.
  • The latest version of Azure PowerShell must be installed on the system that will be creating the peer connection. You must import the ExpressRoute cmdlet module from:
    'C:\Program Files (x86)\Microsoft SDKs\Azure\PowerShell\ServiceManagement\Azure\ExpressRoute\ExpressRoute.psd1'

When you have all the prerequisites identified, the provisioning process can proceed:

  1. Retrieve the provider locations and connection speeds by using the Get-AzureDedicatedCircuitServiceProvider cmdlet.
  2. Find the provider, the circuit location, and the desired speed.
  3. Use the New-AzureDedicatedCircuit cmdlet to create the Azure side of the circuit, which will be linked to up to 10 virtual networks.
  4. Retrieve the circuit service key that will be used to establish the shared key for both sides of the circuit links.
  5. Within the Azure portal, build the virtual networks (make sure you select Connect to local network), subnets, and gateway subnet, and then create the gateway.
  6. Contact the Exchange Provider (IXP) and provide the service key that was returned when you created the dedicated circuit.
  7. The IXP performs the cross-connection on the circuits.
  8. Establish BGP peering by using the New-AzureBGPPeering cmdlet. You need to run the cmdlet twice, once for private peering and once for public peering.
  9. Use the New-AzureDedicatedCircuitLink cmdlet to establish circuit to virtual network links for the private peering access.
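
A hedged sketch of steps 8 and 9 with the same classic module; the peer subnets (two /30s carved from the dedicated /29), VLAN ID, ASN, and MD5 key are illustrative assumptions:

    # Establish private peering (step 8); run again with -AccessType Public for public peering
    New-AzureBGPPeering -AccessType Private `
        -ServiceKey $serviceKey `
        -PrimaryPeerSubnet '192.168.100.0/30' `
        -SecondaryPeerSubnet '192.168.100.4/30' `
        -VlanId 100 `
        -PeerAsn 65010 `
        -SharedKey 'Md5SecretValue'

    # Link each virtual network to the circuit for private peering access (step 9)
    New-AzureDedicatedCircuitLink -ServiceKey $serviceKey -VNetName 'vnet1'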

Appendix C: Active Directory Administrative Tier Model

Microsoft published a tiered model of administrative control in the Microsoft whitepaper Mitigating Pass-the-Hash and Other Credential Theft, version 2 (pages 15-19). This section contains additional detail about this tier model.

  • Tier 0 – Control of most or all assets in the environment. Tier 0 includes accounts, groups, and other assets that have direct or indirect administrative control of the Active Directory forest, domains, or domain controllers, and all the assets in it. The security sensitivity of all Tier 0 assets is equivalent as they are all effectively in control of each other.
  • Tier 1 – Control of enterprise servers and applications. Tier 1 assets include server operating systems, cloud services, and enterprise applications. Tier 1 administrator accounts have administrative control of a significant amount of business value that is hosted on these assets. Examples include server administrators who maintain these operating systems or Exchange administrators with the ability to impact all users' messaging data.
  • Tier 2 – Control of user workstations and devices. Tier 2 administrator accounts have administrative control of a significant amount of business value that is hosted on user workstations and devices. Examples include Help Desk and computer support administrators because they can impact the integrity of almost any user data.

The Tier model prevents escalation of privilege by restricting what administrators can control and where they can log on (because logging on to a computer grants control of those credentials and all assets managed by those credentials).

Primary responsibilities

Tier 0 administrator:

To manage the identity store and a small number of systems that are in effective control of it, and:

  • Can manage and control assets at any level as required
  • Can only log on interactively or access assets trusted at the Tier 0 level

Tier 1 administrator:

To manage enterprise servers, services, and applications, and:

  • Can only manage and control assets at the Tier 1 or Tier 2 level
  • Can only access assets (via network logon type) that are trusted at the Tier 1 or Tier 0 levels
  • Can only interactively log on to assets trusted at the Tier 1 level

Tier 2 administrator:

To manage enterprise desktops, laptops, printers, and other user devices, and:

  • Can only manage and control assets at the Tier 2 level
  • Can access assets (via network logon type) at any level as required
  • Can only interactively log on to assets trusted at Tier 2 level

The security dependencies of the domain controllers include all hosts, groups, accounts, and other objects that have effective control of them.

Note that there are assets that can have Tier 0 impact to availability of the environment, but do not directly impact the confidentiality or integrity of the assets. These include the DNS Server service and critical network devices like Internet proxies.

Appendix D: Removing Service Accounts from Tier 0

Service or application accounts that are granted Tier 0 privileges introduce a significant amount of risk to an organization's mission and business. This configuration should be remediated as a top security priority.

Organizations may have one or more service accounts in a group that grants Tier 0 access, such as Domain Admins, domain\Administrators, Schema Admins, Backup Operators, or Enterprise Admins. Frequently, domains are configured such that Account Operators and Server Operators are also effective Tier 0 operators via the accounts, servers, and applications these roles can effectively control.

The specific options to mitigate these risks depend on the application functionality that initiated the granting of these rights. You should consult the product documentation and vendor support to ensure all designed configurations are supported by the vendor. Always test changes before deploying to production.

In most cases, these configurations have been made to support the need for an application to exercise administrative rights for one of the following scopes:

  • All clients and/or servers in the domain
  • All user or computer accounts in the domain

No known applications require both of these kinds of rights, so granting Tier 0 rights for both creates a state of "over-permissioning" that only benefits an attacker or malicious insider.

Important: Tier 0 rights also provide the ability to steal all password hashes in the Active Directory database, which is not required by any known legitimate applications except for the Active Directory Migration Tool (for password synch during migration) and password synchronization to Azure Active Directory.

To remediate this risk, we recommend using preventive and detective controls.

Preventive controls should always include a least-privilege design of permissions, regular password changes, and any applicable application-specific controls. Each role should be designed to the least privilege required for the tool and tasks required using one of the three sets of guidance in this section.

Detective controls such as monitoring for anomalous behavior should be implemented. Expected behavior will vary with each application, but organizations should look for:

  • Use of the account on an unexpected computer
  • Modification of the group that grants privilege
  • Modification of the account

  • Other detective controls specific to the application (such as use of normally unused functionality)

Scenario I – Admin access on all clients/servers/domain members

This access is typically required for configuration and management tools for a variety of tasks including agent installation, agent repair, troubleshooting actions for the operating system and applications, or security compliance scanning.

Access is granted by adding an account directly or indirectly to the membership of the local administrators group on each computer. By default, Domain Admins is in this group so it is used frequently in many organizations, resulting in a dangerous over-permissioning state.

Group Policy preferences can be used to centrally grant local administrators access without also granting domain administrators privileges. For more information, see Configure a Local Group Item.

Scenario II – Administrator control of Active Directory accounts

This is a configuration that is frequently required for identity management tools that perform provisioning, synchronization, and other similar functions.

Frequently, domain admins or account operators are used for these accounts, resulting in over-permissioning risk. Account operators is effectively a Tier 0 group because they have the ability to reset passwords and take control of any user or computer account not protected by the AdminSDHolder process, which frequently includes Tier 0 applications and administrators.

This can be addressed by delegating permissions to manage the OUs where the target accounts reside:

  1. Identify the objects types required (typically user accounts).
  2. Identify the permissions required on the objects (typically full control).
  3. Identify the OUs that contain the objects that need to be managed.
  4. Create a group and a new service account, and place the service account in the group.
  5. Grant permissions to this group to the OUs for the object type and permission required.
    For example, Forefront Identity Manager Active Directory Management Agent (ADMA) accounts require:
  • Create, Delete, and Modify permissions for user objects in the OUs where Forefront Identity Manager will maintain user objects.
  • "Replicating Directory Changes" on the directory partition of the domain you're connecting Forefront Identity Manager to.

For more information, see KB303972: How to grant the "Replicating Directory Changes" permission for the Microsoft Metadirectory Services ADMA service account.

For more information about Forefront Identity Manager ADMA, see Management Agent Communication Ports, Rights, and Permissions.

Several technical approaches may be used to automate granting privileges; see How to View or Delete Active Directory Delegated Permissions.
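
As an illustration of step 5, the OU-level delegation can be granted with the dsacls utility; the OU path and group name below are hypothetical:

    # Allow the group to create and delete user objects in the OU and its children
    dsacls "OU=Managed Users,DC=contoso,DC=com" /I:T /G "CONTOSO\svc-idm-admins:CCDC;user"

    # Grant the group full control over user objects beneath the OU
    dsacls "OU=Managed Users,DC=contoso,DC=com" /I:S /G "CONTOSO\svc-idm-admins:GA;;user"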

Important: These changes should be tested in a lab prior to implementing them in production.

When piloting the change in production, create a new account with the reduced permissions described previously, and reconfigure the application to use it during a maintenance window. Test all application functionality during that window where possible.

Appendix E: Recommended Change Advisory Board Approval Process for Security Standards

A Change Advisory Board (CAB) is the discussion forum and approving authority for changes that could impact the security profile of the organization. Any exceptions to these standards should be submitted to the CAB with a risk assessment and justification.

Each standard in this document is broken out by the criticality of meeting the standard for hosts trusted at a given Tier level.

  • Mandatory
  • Strongly Recommended
  • Recommended
  • Optional Elements

All exceptions for Mandatory items (marked with a red octagon or an orange triangle in this document) are considered temporary, and they need to be approved by the CAB. Guidelines include:

  • The initial request requires justification and risk acceptance signed by the personnel's immediate supervisor, and it expires after six months.
  • Renewals require justification and risk acceptance signed by a business unit director, and they expire after six months.

All exceptions for Strongly Recommended items (marked with a yellow circle in this document) are considered temporary, and need to be approved by the CAB. Guidelines include:

  • The initial request requires justification and risk acceptance signed by the personnel's immediate supervisor, and it expires after 12 months.
  • Renewals require justification and risk acceptance signed by a business unit director, and they expire after 12 months.

Exceptions for Recommended items (marked with a blue square in this document) do not require acceptance by the CAB.

Note that there are specific conditions for exceptions to some standards in this document.

Administrative Security Standards

Strategic Approach

The security value of any given IT asset (application, server, workstation, and so on) is evaluated by the importance of the system and its data to the mission. A system or its data may have intrinsic value, which makes it a primary target that is inherently valuable to an attacker.

Systems are also frequently targeted because they allow attackers to directly or indirectly access their primary target. These intermediary targets are not inherently valuable, but they serve as a valuable stepping stone to get the attacker closer to their objective of capturing the primary target.

Primary Target – Intrinsically has value to an attacker (for example, a valuable piece of data or a mission-critical system).

Intermediary Target – Has value to the attacker as a means to reaching a primary target of interest.

Because the Active Directory forest is in effective control of all the organization's business assets joined to a domain in that forest, it is a security dependency of all these business assets.

The Active Directory configuration and operation should adhere to the tier model published in the Microsoft whitepaper Mitigating Pass-the-Hash and Other Credential Theft, version 2 (pages 15-19).

A detailed description of these tiers is provided in Appendix C. In summary:

  • Tier 0 – Control of most or all assets in the environment.
  • Tier 1 – Control of enterprise servers and applications.
  • Tier 2 – Control of user workstations and devices.

Critical Tier 0 Functions

The Active Directory forest, domain controllers, and their security dependencies are collectively classified as Tier 0. The security dependencies of the domain controllers include all hosts, groups, accounts, and other objects that have effective control of them. Additional details on Tier 0 contents are described in a previous Appendix.

These Tier 0 functions have been identified as required for effective management of the environment. Each of these assets should be fully under positive control, and they should meet the Tier 0 hardening standards:

  • Active Directory and domain controllers
  • Certification authorities (CAs)
  • Tier 0 identity management systems
  • Tier 0 configuration management
  • Tier 0 operational monitoring
  • Tier 0 backup
  • Tier 0 virtualization
  • Tier 0 security tools

Summary of Standards

This section includes a summary of the standards and their applicability to hosts in each tier.

Standards Applicability Summary

This table describes the systems to which the standards should be applied.

| Tier 0 | Tier 1 | Tier 2 | Standard |
| --- | --- | --- | --- |
| | | | Identity System Hosts: |
| | N/A | N/A | Active Directory and domain controllers |
| | N/A | N/A | Certification authorities (CAs) |
| | N/A | N/A | Privileged Identity Management (PIM) systems |
| | N/A | N/A | Enterprise Identity Management Systems |
| | N/A | N/A | Identity federation services (such as AD FS) |
| | | | Identity system administrative workstations |
| | | | Infrastructure Management and Security Hosts: |
| | | | Configuration management |
| | | | Operational monitoring |
| | | | Backup |
| | | | Virtualization |
| | | | Security tools |
| | | | Infrastructure system administrative workstations |
| | | | Cloud Services: |
| | | | Microsoft Azure Infrastructure as a Service (IaaS) |

Table 1 - Standards Applicability

Operating System Standards Summary

The Tier columns in this table refer to the trust level that must be met for assets (workstations, servers, domain controllers, admin workstations, and so on) to be used by that Tier of administration.

| Tier 0 | Tier 1 | Tier 2 | Standard |
| --- | --- | --- | --- |
| | | | Host Security: |
| | | | Current Operating System Version |
| | | | Block access to public internet and email |
| | | | Verification of all media in build as clean |
| | | | Rapid or automated patching |
| | | | Stringent restrictions for applications, middleware, management agents |
| | | | Restricted local administrators membership |
| | | | Compliance with Microsoft Security baselines |
| | | | Up-to-date anti-malware |
| | | | Standard security tools |
| | | | Enhanced Mitigation Experience Toolkit (EMET) |
| | | | Enforce RDP RestrictedAdmin on admin workstations |
| | | | Attack surface analysis |
| | | | Physical security |
| | | | Full disk encryption |
| | | N/A | Baseboard management controller (BMC) security |
| | | | UEFI, TPM, and secure boot enabled |
| | | | Application whitelisting |
| | | | USB media restrictions |
| | | N/A | Outbound traffic restrictions (no Internet) |
| | | | Inbound traffic restrictions (default block) |
| | | | Use of scheduled tasks |
| | | | Logon restrictions |
| | | | Enable rapid rebuild process |
| | | | Follow application security guidance (if available) |

Table 2 - Operating System Standards Summary

Active Directory and Identity Data Standards Summary

The Tier columns in this table refer to the Tier level of the data or objects, the control of which typically impacts all assets in that tier.

| Tier 0 | Tier 1 | Tier 2 | Standard |
| --- | --- | --- | --- |
| | | | Follow administrative OU structure |
| | N/A | N/A | AdminSDHolder ACLs must be default |
| | | | ACLs on Active Directory objects must adhere to tier model |
| | | | Access to stored LAPS Local Account Passwords |
| | | | Administrative accounts restricted from delegation |
| | | | Regularly randomize password on smart card accounts (SCRIL Cycling) |
| | | | Group Policy Objects (GPOs): |
| | | | GPO adherence to Microsoft baselines |
| | | | Group Policy permissions |
| | | | Startup and logon script quality control and change management |
| | | | Service Accounts: |
| | N/A | N/A | No service accounts will have Tier 0 privileges |
| N/A | | | Service accounts tracked, documented, and reviewed |
| N/A | | | Service account passwords complexity and expiration |
| N/A | | | Service account privileges assigned through groups |
| N/A | | | Service accounts restricted from delegation |
| N/A | | | Service accounts monitored for anomalous logon behavior |
| N/A | | | Use Group Managed service accounts (gMSAs) instead of user accounts |
| N/A | | | Service account logon restrictions |
| | | | Certification Authority Data: |
| | N/A | N/A | GPO trust |
| | N/A | N/A | NTAuth store |

Table 3 - Active Directory and Identity Data Standards Summary

Operational Practices Standards Summary

The Tier columns in this table refer to the Tier level of the administrative account, the control of which typically impacts all assets in that tier.

| Tier 0 | Tier 1 | Tier 2 | Standard |
| --- | --- | --- | --- |
| | | | Administrator Enablement, Accountability, and Lifecycle Enforcement: |
| | | | Administrative personnel standards |
| | | | Administrative security briefing and accountability |
| | | | Provisioning and de-provisioning processes for administrative accounts |
| | | | Operationalize Least Privilege: |
| | | | Limit count of administrators |
| | | | Dynamically assign privileges |
| | | | Manage Risk of Credential Exposure: |
| | | | Separate administrative accounts |
| | | | Administrator logon practices |
| | | | Use of approved support technology |
| | | | No browsing the public internet with admin accounts or from admin workstations |
| | | | No accessing email with admin accounts or from admin workstations |
| | | | Store service and application account passwords in a secure location |
| | | | Strong Authentication: |
| | | | Enforce smartcard multi-factor authentication for all admin accounts |
| | | | Enforce multi-factor authentication for all cloud admin accounts |
| | | | Rare Use / Emergency Procedures: |
| | N/A | N/A | Correctly follow established processes for all emergency access accounts |
| | N/A | N/A | Restrict and monitor usage of emergency access accounts |
| | N/A | N/A | Temporarily assign Enterprise Admin and Schema Admin membership |

Table 4 - Operational Standards Summary

Operating System Standards

These standards are designed to protect operating systems against unauthorized administrative control.

Tier 0 host standards are required for all Tier 0 computer assets including domain controllers, Tier 0 management servers, and Tier 0 management workstations.

All administrative workstations must meet or exceed the standards of the highest value assets they manage. As an example, domain admin workstations must meet the Tier 0 security standards because Tier 0 accounts will be logging on to them to administer the domain.

Host Hardening Standard

Block Access to Public Internet and Email

Access to the public Internet is disallowed for all servers, all administrative workstations, and all administrative users. No email accounts will be assigned to any administrative account.

Exceptions can be approved by the change approval board for required Internet connectivity to specific Internet assets, such as:

  • Software updates from Windows Update or similar services
  • Antimalware updates from vendor sites
  • Inbound connectivity to identity federation applications that service authentications from the Internet
  • Administration of the cloud infrastructure, such as Microsoft Azure, Office 365, or Google Gmail

Technically restrict all exceptions as tightly as possible to DNS addresses or IP ranges.

Current Operating System Version

To provide the latest security capabilities and designs, all systems should be installed with the latest version of the operating system available at the time of installation. All hosts in operation should be running the latest operating system version or one major version older than the current version (N-1).

Verification of All Media in Build as Clean

Use verified installation media to build all hosts to mitigate against supply chain risks, such as malware installed in a master image or injected into an installation file during download or storage. This includes all executable code, such as operating system installations, application installations, tools, and plug-ins. Any unsigned code should be analyzed for security concerns with extra rigor.

The media should be protected from tampering throughout the lifecycle, including:

  • Software source
  • Software storage
  • Software usage
  • Physical build environment

Software Source

The source of the software should be validated through one of the following means:

  • Software is obtained from physical media that is known to come from the manufacturer or a reputable source, typically manufactured media shipped from a vendor.
  • Software is obtained from the Internet and validated with vendor-provided file hashes.
  • Software is obtained from the Internet and validated by downloading and comparing two independent copies:
    • Download to two hosts with no security relationship (not in the same domain and not managed by the same tools), preferably from separate Internet connections.
    • Compare the downloaded files using a utility like certutil:

      certutil -hashfile <filename>

When possible, all application software, such as application installers and tools, should be digitally signed and verified using Windows Authenticode with the Windows Sysinternals tool sigcheck.exe, with revocation checking. Some required software may not have this type of digital signature available from the vendor.
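
A minimal PowerShell sketch of the two-copy comparison described above, assuming hypothetical download paths:

    # Compare SHA-256 hashes of two independently downloaded copies
    $copyA = Get-FileHash -Path 'C:\DownloadA\tool-setup.exe' -Algorithm SHA256
    $copyB = Get-FileHash -Path 'D:\DownloadB\tool-setup.exe' -Algorithm SHA256
    if ($copyA.Hash -eq $copyB.Hash) { 'Hashes match' } else { 'MISMATCH - do not trust this media' }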

Software Storage

After obtaining the software, it should be stored in a location that is protected from modification, especially by internet-connected hosts or personnel trusted at a lower level than the systems where the software or operating system will be installed. This storage can be physical media or a secure electronic location.

Software Usage

Ideally, the software should be validated at the time it is used, such as when it is manually installed, packaged for a configuration management tool, or imported into a configuration management tool.

Physical Build Environment

The credential theft solutions and server resources should ideally be built in a physically secure lab, using only known good software media. This lab should be established for creating the solution and used for any updates or maintenance.

Rapid or Automated Patching

All security updates available for operating systems and applications should be applied within five days of becoming generally available. This protects against attacks that rapidly reverse engineer security updates to develop exploits.

Security updates should be applied automatically where operationally feasible, such as administrative workstations and administrative forest hosts. Automated security updates may not be feasible for production domain controllers given the risk for production outages.

Stringent Restrictions for Applications, Middleware, and Management Agents

To prevent additional attack surfaces on hosts, install only approved software that is specifically required. The number of management agents with full system control should be limited as much as possible.

Each management agent should be approved by the change approval board, and the justification should include specific support of how the business and mission value of the tool's presence exceeds the business and mission risk of not having the functionality.

Restricted Local Administrators Membership

To limit the number of accounts that can create risk to the organization, restrict the number of accounts in the local administrators group of all systems to the smallest number possible. To protect against inadvertent weakening of the security posture and enforce governance standards, operate all administrative workstations with standard user privileges on the local hosts.

Configure local administrative groups and accounts as follows:

  • Interactive users of systems run with standard privileges and are not local administrators of the system, including:
    • Standard user accounts on standard user desktops and laptops.
    • Administrative accounts on their assigned admin workstations. Make no exceptions that grant local administrative privileges to administrative accounts on their assigned administrative workstations.
  • All exceptions for standard users are temporary, and they need to be approved by the change approval board.
    • The initial request requires justification and risk acceptance signed by the personnel's immediate supervisor, and it expires after six months.
    • Renewals require justification and risk acceptance signed by the business unit director, and they expire after six months.
  • Create no local accounts on the system beyond the built-in accounts.
  • The membership of the local administrators group on workstations and servers is managed with the Restricted Groups feature to ensure compliance. Authorized members are limited to:
    • The built-in local administrator account (the SID ending in -500)
    • Workstation or server groups approved by the change approval board
  • The built-in local administrator account is the only local administrative account on the system.

Compliance with Microsoft Security Baselines

To protect against configuration vulnerabilities, configure all Windows Tier 0 hosts to comply with the appropriate security baselines from Microsoft Security Compliance Manager (SCM). Apply them with Group Policy Objects (GPOs) to ensure consistent enforcement. Apply security configuration guidance for all other operating systems from the manufacturers.

Up-to-Date Antimalware

All hosts should include antimalware software to protect against known threats and malware. Note that System Center Endpoint Protection (SCEP) is generally preferable for locked-down systems, such as administrative workstations, because it can leverage WSUS servers to obtain signatures.

Enhanced Mitigation Experience Toolkit

The Enhanced Mitigation Experience Toolkit (EMET) provides exploit mitigations for many popular applications that may process untrusted data. These mitigations can protect against known and unknown threats and exploits. All hosts that run applications that have been tested to work with EMET (primarily web browsers, media players, and productivity applications) should have EMET installed, and those applications opted in to EMET protections.

Enforce RDP RestrictedAdmin on Admin Workstations

To provide protection against a lateral traversal attack that uses a domain account, on all administration workstations, enforce the use of RestrictedAdmin mode for all Remote Desktop Protocol (RDP) connections to remote servers and workstations.

This is enabled by applying the following Group Policy setting to the admin workstation to enforce it on all RDP connections from this computer:

Computer Configuration\Administrative Templates\System\Credentials Delegation\Restrict delegation of credentials to remote servers

The following parameter for the Remote Desktop client application can also be supplied from the command line to enable this mode: mstsc.exe /RestrictedAdmin.

For more information about this capability, see the Microsoft documentation for Remote Desktop RestrictedAdmin mode.

Attack Surface Analysis

Hosts should undergo attack surface analysis to prevent introduction of new attack vectors to Windows during installation of new software. The Attack Surface Analyzer should be used at the following times to help assess configuration settings on a host and identify attack vectors:

  • Completion of an installation image (analyze attack surface for unexpected exposure)
  • During a major configuration change, such as the installation of a new application (analyze deltas)

Physical Security

Only authorized personnel should have physical access to the tier's assets, including servers, storage, administrative workstations, and backup files.

Full Disk Encryption

All systems should use BitLocker or a similar full disk/volume encryption solution to mitigate against physical loss of computers, such as administrative laptops that are used remotely.

Baseboard Management Controller Security

Baseboard management controllers (such as Hewlett-Packard's iLO and Dell's DRAC) can be leveraged as an attack vector in much the same way as any other software component through unpatched vulnerabilities, weak passwords, or misconfigurations.

The most comprehensive mitigation for attacks based on a baseboard management controller is to disable this functionality in the system BIOS or UEFI.

If lights-out management functionality is required, reduce the risks exposed by a baseboard management controller with measures including the following:

  • Isolate computers with a baseboard management controller that is enabled from the production network so they can't be attacked from a compromised standard user desktop or laptop.
  • Use a system that provides automatic alerts for when baseboard management controller firmware is updated, and update your computers as soon as possible after regression testing is complete.
  • Use strong passwords, and if it is available for your baseboard management controller instantiation, use multi-factor authentication for administrative access.

UEFI/TPM/Secure Boot Enabled

All physical systems should be configured with the secure boot feature to mitigate against attackers or malware attempting to load malicious code during the boot process. This feature was introduced in Windows 8 and Windows Server 2012 to leverage the Unified Extensible Firmware Interface (UEFI). See UEFI Firmware for more information.
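
Where the operating system supports it, the state can be verified from an elevated PowerShell prompt:

    # Returns True when Secure Boot is enabled; throws an error on non-UEFI (BIOS) systems
    Confirm-SecureBootUEFI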

Application Whitelisting

All hosts should implement software restriction with AppLocker or a similar technology to ensure that only authorized administrative software is run on the host operating system.
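
As a sketch, a baseline policy can be generated from a reference system with the AppLocker cmdlets; the paths are illustrative assumptions, and rules should be deployed through a GPO only after testing:

    # Build publisher and hash rules from software installed in a trusted path
    Get-AppLockerFileInformation -Directory 'C:\Program Files' -Recurse |
        New-AppLockerPolicy -RuleType Publisher,Hash -User Everyone -Xml |
        Out-File 'C:\Policies\AppLockerBaseline.xml'

    # Apply locally for testing before enforcing through Group Policy
    Set-AppLockerPolicy -XmlPolicy 'C:\Policies\AppLockerBaseline.xml'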

USB Media Restrictions

All hosts should implement USB restrictions to protect against physical infection vectors. See Control Read or Write Access to Removable Devices or Media for more information.

Inbound Traffic Restrictions (Default Block)

To protect against network attacks, host firewalls or network devices should block all incoming connections to hosts from the public Internet.

Exceptions are only allowed for designated hosts that download security updates, such as the Windows Server Update Services (WSUS) servers or antimalware servers that obtain signature updates. In these exceptions, controls should be implemented if possible to restrict Internet access to only the authorized Internet locations.

Outbound Traffic Restrictions (No Internet)

To protect against attacks that leverage inadvertent admin actions, such as browsing and email, host firewalls or network devices should block all outbound access to the public Internet. All changes to hosts that require software from Internet locations should be obtained by using the media verification process or through vendor-provided automated update mechanisms.
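
A hedged sketch with the Windows Firewall cmdlets (Windows Server 2012 or later); the WSUS server address and port are illustrative assumptions:

    # Default-deny all outbound traffic on every profile
    Set-NetFirewallProfile -Profile Domain,Private,Public -DefaultOutboundAction Block

    # Allow an approved exception, scoped to the internal WSUS server only
    New-NetFirewallRule -DisplayName 'Allow WSUS' -Direction Outbound -Action Allow `
        -RemoteAddress '10.0.10.5' -Protocol TCP -RemotePort 8530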

Use of Scheduled Tasks

Scheduled tasks provide the ability to automatically run arbitrary code and scripts on many hosts. All use of scheduled tasks beyond tasks created automatically by installing the operating system and authorized applications should be carefully reviewed and managed.

  • All startup and logon scripts should be reviewed for security risks, including the ability of unauthorized accounts to modify:
    • Script and executable content.
    • Files called by or used by a scheduled task.
  • Perform a threat modelling exercise to discover other types of risks. For more information, see SDL Threat Modeling Tool.
  • All changes to the scripts or executables should be submitted to the change approval board.

Logon Restrictions

Restrict sign-in to the host to only the accounts that are expected to log on to it for regular, daily use or to provide support.

Enable Rapid Rebuild Process

To limit the risk of leaving a potentially compromised Tier 0 host in operation, a rapid rebuild process should be established for all Tier 0 hosts. This allows the organization to rapidly deploy replacement units if compromise of Tier 0 assets is suspected.

Important: This rebuild capability does not constitute the complete response process! A complete response process must meet organizational operational security, investigation, and intelligence-gathering requirements, which can include forensic analysis, allowing an adversary to persist to gather intelligence, and other actions as required.

Follow Application Security Guidance (if available)

Document the justification for varying from the manufacturer's security recommendations for each application installed on the host.

Manufacturers can provide explicit or de facto security guidance in several forms, including:

  • Product-specific security configurations
  • Documentation such as TechNet security guidance
  • Default product configurations

Avoid disabling any security features that are enabled by default and follow the manufacturer's guidance for configuring any security-related settings.

Active Directory and Identity Data Standards

Active Directory is the authoritative identity store, and the data in it is composed of two types of data:

  • Objects stored directly in the directory database
  • Files stored on the SYSVOL share (primarily Group Policy settings and scripts)

Many of these objects can provide an effective means of controlling the directory through use, abuse, or modification. These standards prevent abuse of objects that provide a well-known means of control of Tier 0.

Each of these standards should be explicitly followed to prevent an escalation of privilege vulnerability in the Active Directory data. Additionally, all Active Directory objects that are classified at Tier 0 may not be modified without review and approval by the change approval board.

Well-known Tier 0 objects include:

  • All domain objects (DC=DomainName,DC=Extensions)
  • Domain controller's OU and all contents
  • Built-in groups and all members (including nested members)
    • Domain Admins
    • Enterprise Admins
    • Schema Admins
    • Built-in\Administrators
    • Account Operators
    • Backup Operators
    • Server Operators
    • Print Operators
    • Cert Publishers
    • Group Policy Creator Owners
  • All objects in the forest configuration container (with the possible exception of sites and subnets)
  • All objects in the forest schema
  • Computer accounts for Tier 0 Hosts
  • OUs that contain any of the previous objects (including parent OUs up to the domain object)
  • Group Policy Objects linked to any site or domain object
  • Group Policy Objects linked to any OUs previously described
  • All files stored on the SYSVOL share
  • Any Active Directory object that is not:
    • Manageable through these standard Microsoft tools:
      • Active Directory Users and Computers
      • Active Directory Sites and Services
    • Created by the organization and explicitly known to be a lower privilege or impact than Tier 0

Follow Administrative OU Structure

Misconfigurations or misunderstandings about the OU model can create elevation of privilege vulnerabilities to Tier 0. To prevent these unauthorized means of control, place all administrative accounts, groups, servers, and workstations into an "Admin" OU structure that is distinct and separate from the managed enterprise servers and user workstations.

Ensure that all objects are placed into appropriate OUs to receive the correct permissions and GPO settings.

Only Tier 0 administrative accounts should have permissions to modify any objects in the Admin OU structure.

AdminSDHolder ACLs Must Be Default

The AdminSDHolder object in the system container of every Active Directory domain ensures that the access control list (ACL) permissions are consistently enforced on protected accounts and groups including the Domain Admins group, Enterprise Admins group, built-in administrators, and members of those groups. The Security Descriptor Propagator (SDProp) runs every 60 minutes to ensure the permissions for these protected objects match the permissions for the domain's AdminSDHolder object.

The permissions on this object should never be modified from the default configuration.

For more information, see Best Practices for Securing Active Directory (page 43).

ACLs on Active Directory Objects Must Adhere to Tier Model

Ensure that the tier model is adhered to for all permissions on Active Directory objects, such as those for enterprise servers and enterprise workstations. Lower tier administrators should never have permission to higher tier resources.

  • No accounts other than Tier 0 administrators should be granted permissions to modify objects in a Tier 0 OU such as domain controllers or an OU under Admin\Tier 0.
  • No accounts should have the ability to impact a resource higher than their privilege tier; for example, Tier 2 Help Desk administrators should not have permission to an OU that contains enterprise servers.

Access to Stored LAPS Local Account Passwords

LAPS local account passwords are stored in an attribute in Active Directory that, by default, only domain admins are granted access to read. Additional administrators must be delegated access to read this value by using the tools provided by the Local Administrator Password Solution (LAPS). To prevent elevation of privilege, access to read this password value must be restricted according to the Tier model.

  • Tier 0 administrators have access to password attributes for the computer accounts:
    • Tier 0 management servers
    • All administrative workstations
  • Tier 1 administrators have access to the password attribute for the computer accounts:
    • Enterprise servers
  • Tier 2 administrators have access to the password attribute for the computer account:
    • Enterprise workstations (but not administrative workstations)
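
A sketch of this delegation using the LAPS PowerShell module (AdmPwd.PS); the OU and group names are illustrative assumptions:

    Import-Module AdmPwd.PS

    # Delegate read access to stored local account passwords per the Tier model
    Set-AdmPwdReadPasswordPermission -OrgUnit 'OU=Enterprise Servers,DC=contoso,DC=com' `
        -AllowedPrincipals 'CONTOSO\Tier1-Admins'
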
Admin Accounts Restricted from Delegation    

All admin accounts must have the Account is sensitive and cannot be delegated attribute enabled. For more information, see Security Focus: Analyzing "Account is sensitive and cannot be delegated" for Privileged Accounts.
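
With the Active Directory PowerShell module, the attribute can be set as in this sketch (the account name is an illustrative assumption):

    Import-Module ActiveDirectory

    # Sets "Account is sensitive and cannot be delegated" on the admin account
    Set-ADUser -Identity 'admin-jdoe' -AccountNotDelegated $true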

Regularly Randomize Password on Smart Card Accounts (SCRIL Cycling)

Enabling the Smart card required for interactive logon (SCRIL) attribute on an account restricts interactive sign-ins to smart card only and sets a random value in the account's password attribute.

A new random value will be generated each time this attribute is enabled, so you can cycle this attribute by periodically disabling and enabling it. This change can cause technical issues with currently open Windows sessions, so this operation should not be performed while administrators are working on tasks.

A script to perform this action should be run every 24 hours for administrators at a time when they are unlikely to be working on administrative tasks (such as 3:00 A.M. local time).
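
A minimal sketch of such a script, assuming a hypothetical Tier 0 administrators group; each toggle generates a new random password value as described above:

    Import-Module ActiveDirectory

    $admins = Get-ADGroupMember -Identity 'Tier0-Admins' -Recursive |
        Where-Object { $_.objectClass -eq 'user' }

    foreach ($admin in $admins) {
        # Clearing and re-enabling SCRIL sets a new random password value
        Set-ADUser -Identity $admin.distinguishedName -SmartcardLogonRequired $false
        Set-ADUser -Identity $admin.distinguishedName -SmartcardLogonRequired $true
    }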

Group Policy Objects (GPOs)

A significant portion of the security configuration for Active Directory and the Windows hosts in the organization are set in the Group Policy Objects (GPOs).

Preventing and detecting unauthorized and unsafe modification of these policies is critical to the security posture of Active Directory and the organization. This is critically important for Tier 0 GPOs that include any policies linked to the domain, to Active Directory sites, to the domain controller's OU, or to other Tier 0 OUs.

All exceptions from the recommended GPO settings and permissions must include an impact analysis and proposed change that are approved by the change approval board. Examples of changes include, but are not limited to:

  • New GPOs or modification of GPOs (including modification of logon or startup scripts)
  • Deviations from the Microsoft baselines for the domain
  • Deviations from the Microsoft baselines for domain controllers, enterprise servers, or enterprise workstations
  • Exceptions to GPO settings for a specific set of users or workstations

GPO Adherence to Microsoft Baselines

Changes to GPO settings can have a significant impact on the enterprise security posture, which must be evaluated prior to allowing that change into a production environment. All GPOs must be compliant with the appropriate Microsoft baselines in the Security Compliance Manager (SCM).

Any exceptions to the baselines or changes to the GPOs must be assessed for threats, potential impact, and countermeasures by using the guidance in the SCM for the divergent settings.

Group Policy Permissions

Because Group Policy can grant a means of control (run code on a host or in context of an account), the security permissions for Group Policy must meet the rules of the Tier model and the organization's administrative model.

Group Policy and associated script content may only be modified per the following table:

| Policy that applies to | Can only be modified by |
| --- | --- |
| Tier 0 hosts and accounts | Tier 0 administrative accounts |
| Administrative workstations and accounts (all) | Tier 0 administrative accounts |
| Enterprise servers | Tier 1 (and above) administrative accounts |
| Enterprise workstations and users | Tier 1 (and above) administrative accounts |

Note: The Tier model can allow greater GPO control by Tier 1 and 2 administrators, but in our example, CONTOSO has chosen to centralize the management of administrative workstations and enterprise workstations.

Startup and Logon Script Quality Control and Change Management

Startup and logon scripts provide the ability to automatically run arbitrary code and scripts on many hosts. All script content should be carefully reviewed and managed.

  • All startup and logon scripts should be reviewed for security risks including the ability of unauthorized accounts to modify:
    • Script content.
    • Files called or used by a script.
  • Perform a threat modelling exercise to discover other types of risks. For more information, see SDL Threat Modeling Tool.
  • All changes to the scripts should be submitted to the change approval board.

Service Accounts

A service account is created for the use of an application or service, rather than for a specific person. Service accounts are frequently targeted by attackers, and they should meet specific security standards to limit and secure their use.

No Service Accounts Should Have Tier 0 Privileges

Service accounts that are granted Tier 0 privileges introduce a significant amount of risk to the organization's mission and business. No service account should be granted full Tier 0 privileges through built-in groups such as Domain Admins, Enterprise Admins, built-in Administrators, Account Operators, or any other group listed in Appendix C.

Any service account that is alleged to require Tier 0 privileges should use one of the following approaches to support the functionality:

  • Service accounts that require administrative access to all enterprise workstations or servers should manage local administrative groups for the required scope by using Group Policy preferences (see Scenario I in Appendix D).
  • Service accounts that manage all non-administrative user or computer accounts in the domain should have permissions delegated on the OUs that contain the target objects (see Scenario II in Appendix D).

For more detailed instructions and options for granting the correct level of permissions to a service account, see Appendix D: Removing service accounts from Tier 0.

Any exceptions to these standards must:

  • Be submitted to the change approval board by a business unit director.
  • Be temporary, and approvals and renewals will expire within 30 days.
  • Include a Plan of Actions and Milestones (POAM) for meeting the standards within 60 days.
  • Include specific technical requirements and why alternatives are not feasible, including:
    • Permissions to only a subset of Active Directory objects (typically identity management solutions).
    • Membership for local administrator on Tier 1 or Tier 2 hosts (typically management tools).
  • Include documented risk mitigations for how the account will be managed.

Service Accounts Tracked, Documented, and Reviewed

To mitigate risk of unknown service accounts, all service accounts must be tracked, documented, and regularly reviewed for operational needs. Use a single tracking system for all service accounts to document:

  • Service account name
  • Service account group used to assign any permissions
  • Permissions assigned to service account group
  • Business purpose
  • Business risk of service accounts
  • Technical alternatives to the use of the service account

This information must be reported to the change approval board (including all changes). All service accounts should be reviewed by the change approval board at least annually.

Service Account Password Complexity and Expiration

All service accounts should have passwords that meet the domain password complexity and length requirements in the SCM tool.

All service account passwords should be changed at least every 90 days if they are manually managed.

We recommend (but it is not required) that you acquire and implement a tool to generate random passwords for service accounts and manage the service account password lifecycle.

Never enable the attribute Password never expires for a service account.
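
A quick audit sketch for this standard with the Active Directory PowerShell module:

    Import-Module ActiveDirectory

    # List enabled accounts that violate the "Password never expires" standard
    Get-ADUser -Filter 'PasswordNeverExpires -eq $true -and Enabled -eq $true' `
        -Properties PasswordNeverExpires |
        Select-Object SamAccountName, DistinguishedName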

Service account permissions assigned through groups

All service accounts are assigned permissions via an Active Directory group. Don't assign permissions directly to a service account.

Service accounts restricted from delegation

Enable the attribute Account is sensitive and cannot be delegated for all service accounts. For more information, see Security Focus: Analyzing "Account is sensitive and cannot be delegated" for Privileged Accounts. All exceptions must be approved by the change approval board.

Service accounts monitored for anomalous logon behavior

All service accounts should be monitored for anomalous logon behavior. This can be accomplished by documenting the expected logon patterns and manually creating alerts for violations, or by using tools that apply machine learning techniques to the environment. One such tool is Microsoft Advanced Threat Analytics.

Use Group Managed Service Accounts (gMSAs) instead of user accounts

When supported by the application, use Managed Service Accounts (MSAs) and Group Managed Service Accounts (gMSAs) instead of user accounts.

Windows Server 2008 R2 introduced the concept of managed service accounts, which are accounts that are tied to a specific computer and are automatically set up and maintained with a complex password that is updated every 30 days by default. Managed service accounts are exempt from domain password policies and cannot be used to log on interactively.

Use Managed Service Accounts whenever possible so that passwords for the accounts are set and managed automatically. This mitigation is appropriate for service accounts that run Windows services, but it is not applicable for accounts that applications use to perform tasks (which require the application to store the account password).

Create and use Managed Service Accounts with the default Managed Service Account container. For more information, see Managed Service Accounts (documentation for Windows 7 and Windows Server 2008 R2) or Group Managed Service Accounts Overview.
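
A minimal gMSA creation sketch; the account, DNS, and group names are illustrative assumptions, and the KDS root key is a one-time forest prerequisite:

    Import-Module ActiveDirectory

    # One-time forest prerequisite (in production, allow time for replication)
    Add-KdsRootKey -EffectiveImmediately

    # Create the gMSA and authorize hosts to retrieve its managed password
    New-ADServiceAccount -Name 'svc-app01' -DNSHostName 'svc-app01.contoso.com' `
        -PrincipalsAllowedToRetrieveManagedPassword 'CONTOSO\AppServers'

    # On each authorized host, install the account before configuring the service
    Install-ADServiceAccount -Identity 'svc-app01'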

Service account logon restrictions

To prevent adversary abuse if a service account is compromised, service accounts should be restricted to only the authentication profile they require to perform authorized tasks. This includes which hosts they access and what logon types they use. This can be accomplished by managing logon user rights for enterprise hosts with a set of GPOs or by using Authentication Policies and Silos. For more information, see the Microsoft documentation for Authentication Policies and Authentication Policy Silos.

Certification Authority Data

Certification authorities (CAs) can issue PKI certificates that can be trusted to authenticate as any account. Any system administration of the CAs or access to the private keys of the CAs in any form represents Tier 0 access. All means of administrative control of these systems must be secured at or above the Tier 0 standard.

The change approval board must approve any certificates to be trusted across the enterprise and any certificates to be published to the NTAuth store.

These locations should be monitored for any unauthorized changes with available tools.

GPO Trust

Group Policy offers the ability to manage which certificates are trusted by workstations and servers in the domain. This trust can be granted at the operating system level for complete trust or for only certain purposes by using Certificate Trust Lists. For more information, see the Microsoft documentation for Certificate Trust Lists.

All trust of certification authorities must be managed at the domain level. All changes to the default trust must be approved by the change approval board.

Enterprise NTAuth Store

Publishing a CA certificate to the NTAuth store grants a level of trust beyond normal certificate trust. Any certificate directly issued by a CA in the NTAuth store is trusted to authenticate accounts in Active Directory by putting the user name in an attribute of that certificate. This is most commonly used for smart card authentication.

Operational Standards

Operational decisions that are made on a regular basis are critical to maintaining the security posture of the environment. These standards for processes and practices help ensure that an operational error does not lead to an exploitable operational vulnerability in the environment.

Administrator Enablement and Accountability

Administrators must be informed, involved, trained, and accountable to operate the environment as securely as possible.

Administrative Personnel Standards

Assigned administrative personnel must be vetted to ensure they are trustworthy and have a need for administrative privileges:

  • Perform background checks on personnel prior to assigning administrative privileges.
  • Review administrative privileges each quarter to determine which personnel still have a legitimate business need for administrative access.

Administrative Security Briefing and Accountability

Administrators must be informed and accountable for the risks to the organization and their role in managing that risk.

Administrators should be trained yearly on:

  • General threat environment
    • Determined adversaries
    • Attack techniques including pass-the-hash and credential theft
  • Organization-specific threats and incidents
  • Administrator's roles in protecting against attacks
    • Managing credential exposure with the Tier model
    • Use of administrative workstations
    • Use of Remote Desktop Protocol RestrictedAdmin mode
  • Organization-specific administrative practices
    • Review all operational guidelines in this standard
    • Implement the following key rules:
      • Do not use administrative accounts on anything but administrative workstations
      • Do not disable or dismantle security controls on your account or workstations (for example, logon restrictions or attributes required for smart cards)
      • Report issues or unusual activity

To provide accountability, all personnel with administrative accounts should sign an Administrative Privilege Code of Conduct document that says they intend to follow organization-specific administrative policy practices.

Provisioning and Deprovisioning Processes for Administrative Accounts

The following standards must be met to satisfy lifecycle requirements for administrative accounts:

  • All administrative accounts must be approved by the Approving Authority outlined in the following table.
    • Approval for administrative privileges should not exceed six months.
    • Approval must only be granted if the personnel have a legitimate business need for administrative privileges.
  • Access to administrative privileges must be immediately deprovisioned when:
    • Personnel change positions.
    • Personnel leave the organization.
  • Accounts must be immediately disabled following personnel leaving the organization.
  • Disabled accounts must be deleted within six months and the record of their deletion must be entered into change approval board records.
  • Review all privileged account memberships monthly to ensure that no unauthorized permissions have been granted. This review can be replaced by an automated tool that alerts on changes.

| Account Privilege Level | Approving Authority | Membership Review Frequency |
| --- | --- | --- |
| Tier 0 Administrator | Change approval board | Monthly or automated |
| Tier 1 Administrator | Tier 0 administrators or security | Monthly or automated |
| Tier 2 Administrator | Tier 0 administrators or security | Monthly or automated |
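
The monthly membership review can be supported by a sketch like the following; the group list is an illustrative selection of privileged groups:

    Import-Module ActiveDirectory

    foreach ($group in 'Domain Admins','Enterprise Admins','Schema Admins','Administrators') {
        # Recursive expansion catches nested membership
        Get-ADGroupMember -Identity $group -Recursive |
            Select-Object @{ n = 'Group'; e = { $group } }, SamAccountName, objectClass
    }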

Operationalize Least Privilege

Achieving least privilege requires understanding the organizational roles and their requirements, and designing mechanisms that ensure personnel can accomplish their jobs by using least privilege. Achieving a state of least privilege in an administrative model frequently requires the use of multiple approaches:

  • Limit the count of administrators or members of privileged groups
  • Delegate fewer privileges to accounts
  • Provide privileges on demand
  • Provide ability for other personnel to perform tasks (a concierge approach)
  • Provide processes for emergency access and rare-use scenarios

This section contains the standards for achieving least privilege.

Limit Count of Administrators

A minimum of two qualified personnel should be assigned to each administrative role to ensure business continuity.

If the number of personnel assigned to any role exceeds two, the change approval board must approve the specific reasons for assigning privileges to each individual member (including the original two). The justification for the approval must include:

  • What technical tasks are performed by the administrators that require the administrative privileges?
  • How often are the tasks performed?
  • Why can't the tasks be performed by another administrator on their behalf?
  • What alternatives to providing the administrative privileges exist?

Dynamically Assign Privileges

Administrators are required to obtain permissions "just-in-time" to use them as they perform tasks. No permissions will be permanently assigned to administrative accounts.

Permanently assigned administrative privileges naturally create a "most privilege" strategy because administrative personnel require rapid access to permissions to maintain operational availability if there is an issue. Just-in-time permissions provide the ability to:

  1. Assign permissions more granularly, getting closer to least privilege.
  2. Reduce the time footprint of assigned (and exposed) permissions.
  3. Track permission use to detect abuse or attacks (a minimal sketch follows this list).
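
One way to realize just-in-time assignment on-premises is to add the administrative account to the privileged group only for the approved task window and remove it afterward. The following is a minimal sketch, assuming the Python ldap3 library; the DNs and credentials are hypothetical, and a production deployment would use a privileged access management (PAM) product rather than a standalone script.

    # Minimal just-in-time sketch: grant group membership for a bounded window,
    # then remove it. Assumes the ldap3 library; DNs and credentials are
    # hypothetical placeholders.
    import time
    from ldap3 import Server, Connection, NTLM, MODIFY_ADD, MODIFY_DELETE

    GROUP_DN = "CN=Tier1-ServerAdmins,OU=Groups,DC=contoso,DC=com"  # hypothetical
    USER_DN  = "CN=alice-t1,OU=Admins,DC=contoso,DC=com"            # hypothetical

    conn = Connection(Server("dc01.contoso.com"), user="CONTOSO\\pamsvc",
                      password="...", authentication=NTLM, auto_bind=True)

    def grant_jit(conn, user_dn, group_dn, duration_seconds):
        """Add the user to the group, hold for the approved window, then remove."""
        conn.modify(group_dn, {"member": [(MODIFY_ADD, [user_dn])]})
        print(f"Granted {user_dn} membership in {group_dn}")
        try:
            time.sleep(duration_seconds)   # stand-in for task/expiry tracking logic
        finally:
            conn.modify(group_dn, {"member": [(MODIFY_DELETE, [user_dn])]})
            print(f"Revoked {user_dn} membership in {group_dn}")

    grant_jit(conn, USER_DN, GROUP_DN, duration_seconds=3600)  # one-hour grant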

Manage Risk of Credential Exposure

Separate Administrative Accounts

All personnel that are authorized to possess administrative privileges must have separate accounts for administrative functions that are distinct from user accounts.

Standard user accounts – Grant standard user privileges for standard user tasks, such as email, web browsing, and using line-of-business applications. These accounts should not be granted administrative privileges.

Administrative accounts – Create separate accounts for personnel who are assigned the appropriate administrative privileges. An administrator who is required to manage assets in each Tier should have a separate account for each Tier. These accounts should have no access to email or the public Internet.

Administrator Logon Practices

Before an administrator can log on to a host interactively (locally over standard RDP, by using RunAs, or by using the virtualization console), that host must meet or exceed the standard for the admin account Tier (or a higher Tier). This is because logging onto a host interactively grants control of the credentials to that host.

See the Mitigating Pass-the-Hash (PtH) Attacks and Other Credential Theft Techniques whitepaper (version 1) for details about logon types, common management tools, and credential exposure.

Administrators can only sign in to admin workstations with their administrative accounts. Administrators only log on to managed resources by using the approved support technology described in the next section.

Use of Approved Support Technology

Administrators who support remote systems and users must follow these guidelines to prevent an adversary in control of the remote computer from stealing their administrative credentials.

  • The primary support options should be used if they are available.
  • The secondary support options should only be used if the primary support option is not available.

Tier 1 Server and Enterprise Application Support

  • Remote server support - When remotely accessing a server, Tier 1 administrators must follow these guidelines:
    • Primary (tool) - Remote tools that use network logons (type 3). For more information, see Mitigating Pass-the-Hash and Other Credential Theft v1 (pp 42-47).
    • Primary (interactive) - Use RDP RestrictedAdmin (for example, mstsc.exe /restrictedAdmin) from an admin workstation with a domain account that uses permissions obtained just-in-time from a privileged access management solution.
    • Secondary - Log on to the server by using a local account password set by LAPS, retrieved from an admin workstation (see the retrieval sketch after this list).
    • Forbidden - Standard RDP may not be used with a domain account.
    • Forbidden - Using the domain account credentials while in the session (for example, using RunAs or authenticating to a share). This exposes the logon credentials to the risk of theft.
  • Physical server support – When physically present at a server console or at a virtual machine console (Hyper-V or VMware tools), Tier 1 administrators must retrieve the local account password from LAPS prior to accessing the server.
    • Primary – Retrieve the local account password set by LAPS from an admin workstation before logging on to the server.
    • Forbidden – Logging on with a domain account is not allowed in this scenario.
    • Forbidden – Using the domain account credentials while in the session (for example, RunAs or authenticating to a share). This exposes the logon credentials to the risk of theft.
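
Legacy Microsoft LAPS stores the managed local administrator password in the ms-Mcs-AdmPwd attribute of the computer object, readable only by authorized principals. The following is a minimal retrieval sketch, assuming the Python ldap3 library; the server, account, and computer names are hypothetical.

    # Minimal sketch: retrieve a LAPS-managed local admin password from the
    # computer object's ms-Mcs-AdmPwd attribute (legacy Microsoft LAPS schema).
    # Assumes the ldap3 library; the caller must already be authorized to read
    # the attribute. Names and credentials are hypothetical.
    from ldap3 import Server, Connection, NTLM

    BASE_DN = "DC=contoso,DC=com"                       # hypothetical domain

    conn = Connection(Server("dc01.contoso.com"), user="CONTOSO\\alice-t1",
                      password="...", authentication=NTLM, auto_bind=True)

    def laps_password(conn, computer_name):
        conn.search(BASE_DN, f"(&(objectClass=computer)(cn={computer_name}))",
                    attributes=["ms-Mcs-AdmPwd"])
        if not conn.entries:
            raise LookupError(f"{computer_name} not found or password not readable")
        return conn.entries[0]["ms-Mcs-AdmPwd"].value

    print(laps_password(conn, "APP01"))                 # hypothetical server name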

Tier 2 Help Desk and User Support

Help Desk and user support organizations perform support for end users (which doesn't require administrative privileges) and the user workstations (which does require administrative privileges).

User support – Tasks include assisting users with performing tasks that require no modification to the workstation, frequently showing them how to use an application feature or operating system feature.

  • Desk-side user support – Tier 2 support personnel are physically at the user's workspace.
    • Primary – "Over the shoulder" support can be provided with no tools.
    • Forbidden – Logging on with domain account administrative credentials is not allowed in this scenario. Switch to desk-side workstation support if administrative privileges are required.
  • Remote user support – Tier 2 support personnel are physically remote from the user.
    • Primary – Remote Assistance, Skype for Business, or similar user-screen sharing may be used. For more information, see What is Windows Remote Assistance?
    • Forbidden – Logging on with domain account administrative credentials is not allowed in this scenario. Switch to workstation support if administrative privileges are required.

Workstation support – Tasks include performing workstation maintenance or troubleshooting that requires access to a system for viewing logs, installing software, updating drivers, and so on.

  • Desk-side workstation support – Tier 2 support personnel are physically at the user's workstation.
    • Primary – Retrieve the local account password set by LAPS from an admin workstation before connecting to the user's workstation.
    • Forbidden – Logging on with domain account administrative credentials is not allowed in this scenario.
  • Remote workstation support – Tier 2 support personnel are physically remote from the workstation.
    • Primary – Use RDP RestrictedAdmin from an admin workstation with a domain account that uses permissions obtained just-in-time from a privileged access management solution.
    • Secondary – Retrieve a local account password set by LAPS from an admin workstation before connecting to the user's workstation.
    • Forbidden – Use standard RDP with a domain account.

No browsing the public Internet with admin accounts or from admin workstations

Administrative personnel cannot browse the public Internet while logged on with an administrative account or while logged on to an administrative workstation. The only authorized exceptions are the use of a web browser to administer a cloud-based service, such as Microsoft Azure, Amazon Web Services, Microsoft Office 365, or enterprise Gmail.

No accessing email with admin accounts or from admin workstations

Administrative personnel cannot access email while logged on with an administrative account or while logged on to an administrative workstation.

Store service and application account passwords in a secure location

The following guidelines should be used for the physical security processes that control access to the password:

  • Lock the service account passwords in a physical safe.
  • Ensure that only personnel trusted at or above the Tier classification of the account have access to the account password.
  • Limit the number of people who have access to the passwords to the minimum needed, to preserve accountability.
  • Ensure that all access to the password is logged, tracked, and monitored by a disinterested party, such as a manager who is not trained to perform IT administration.

Strong Authentication

Enforce smartcard multi-factor authentication (MFA) for all admin accounts

No administrative account is allowed to use a password for authentication. The only authorized exceptions are the emergency access accounts that are protected by the appropriate processes.

Link all administrative accounts to a smart card and enable the attribute "Smart Card Required for Interactive Logon."

Allow no exceptions for accounts used by human personnel beyond the emergency access accounts.
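
"Smart Card Required for Interactive Logon" corresponds to the ADS_UF_SMARTCARD_REQUIRED bit (0x40000) of the userAccountControl attribute. The following is a minimal enforcement sketch, assuming the Python ldap3 library; the server, accounts, and base DN are hypothetical.

    # Minimal sketch: set the SMARTCARD_REQUIRED bit (0x40000) of
    # userAccountControl on an admin account. Assumes the ldap3 library;
    # server, accounts, and base DN are hypothetical.
    from ldap3 import Server, Connection, NTLM, MODIFY_REPLACE

    SMARTCARD_REQUIRED = 0x40000        # ADS_UF_SMARTCARD_REQUIRED
    BASE_DN = "DC=contoso,DC=com"       # hypothetical domain

    conn = Connection(Server("dc01.contoso.com"), user="CONTOSO\\t0svc",
                      password="...", authentication=NTLM, auto_bind=True)

    def require_smartcard(conn, sam_account_name):
        conn.search(BASE_DN, f"(sAMAccountName={sam_account_name})",
                    attributes=["userAccountControl"])
        entry = conn.entries[0]
        uac = int(entry.userAccountControl.value) | SMARTCARD_REQUIRED
        conn.modify(entry.entry_dn,
                    {"userAccountControl": [(MODIFY_REPLACE, [uac])]})

    require_smartcard(conn, "alice-t0")   # hypothetical admin account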

Enforce Multi-Factor Authentication for All Cloud Admin Accounts

All accounts with administrative privileges in a cloud service, such as Microsoft Azure and Office 365, must use multi-factor authentication.
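
In Azure Active Directory, one way to enforce this is a Conditional Access policy that requires MFA for privileged directory roles. The following is a minimal sketch against the Microsoft Graph conditional access API, assuming the Python requests library and a pre-acquired bearer token; the policy name, pilot state, and role scope are illustrative and should match your tenant's requirements.

    # Minimal sketch: create a Conditional Access policy requiring MFA for
    # accounts holding the Global Administrator directory role. Assumes the
    # requests library and a pre-acquired Graph bearer token (an assumption;
    # acquire via MSAL with the Policy.ReadWrite.ConditionalAccess scope).
    # Extend includeRoles to cover all admin roles in scope.
    import requests

    TOKEN = "..."  # placeholder bearer token

    policy = {
        "displayName": "Require MFA for privileged roles",
        "state": "enabledForReportingButNotEnforced",  # pilot first, then enable
        "conditions": {
            "users": {"includeRoles": ["62e90394-69f5-4237-9190-012177145e10"]},
            "applications": {"includeApplications": ["All"]},
        },
        "grantControls": {"operator": "OR", "builtInControls": ["mfa"]},
    }

    resp = requests.post(
        "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=policy,
    )
    resp.raise_for_status()
    print(resp.json()["id"])   # ID of the created policy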

Rare Use Emergency Procedures

Operational practices must support the following standards:

  • Ensure outages can be resolved quickly.
  • Ensure rare high-privilege tasks can be completed as needed.
  • Ensure safe procedures are used to protect the credentials and privileges.
  • Ensure appropriate tracking and approval processes are followed.

Correctly Follow Appropriate Processes for All Emergency Access Accounts

Ensure that each emergency access account has a tracking sheet in the safe.

The procedure documented on the password tracking sheet should be followed for each account, which includes changing the password after each use and logging out of any workstations or servers used after completion.

All use of emergency access accounts should be approved by the change approval board, either in advance or after the fact as an approved emergency usage.

Restrict and Monitor Usage of Emergency Access Accounts

Follow these standards for each use of the emergency access accounts:

  • Only authorized domain admins can access the emergency access accounts with domain admin privileges.
  • The emergency access accounts can be used only on domain controllers and other Tier 0 hosts.
  • This account should be used only to:
    • Perform troubleshooting and correction of technical issues that are preventing the use of the correct administrative accounts.
    • Perform rare tasks, such as:
      • Schema administration
      • Forest-wide tasks that require enterprise administrative privileges
        Note that topology management including Active Directory site and subnet management is delegated to limit the use of these privileges.
  • All usage of one of these accounts should have written authorization by the security group lead. Email format is acceptable.
  • The procedure on the tracking sheet for each emergency access account requires the password to be changed after each use. A security team member should validate that this happened correctly (the sketch below shows one way to automate this check).
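
The password-rotation check can be automated by comparing the account's pwdLastSet timestamp with the usage time recorded on the tracking sheet. The following is a minimal sketch, assuming the Python ldap3 library; the server, accounts, and timestamp are hypothetical.

    # Minimal sketch: confirm an emergency access account's password was
    # changed after its recorded use, by reading pwdLastSet. Assumes the
    # ldap3 library; names and credentials are hypothetical. ldap3 typically
    # converts pwdLastSet to a datetime; adjust if a raw FILETIME is returned.
    from datetime import datetime, timezone
    from ldap3 import Server, Connection, NTLM

    BASE_DN = "DC=contoso,DC=com"                  # hypothetical domain

    conn = Connection(Server("dc01.contoso.com"), user="CONTOSO\\secaudit",
                      password="...", authentication=NTLM, auto_bind=True)

    def password_rotated_since(conn, sam_account_name, used_at):
        conn.search(BASE_DN, f"(sAMAccountName={sam_account_name})",
                    attributes=["pwdLastSet"])
        last_set = conn.entries[0].pwdLastSet.value   # datetime (UTC)
        return last_set > used_at

    used_at = datetime(2016, 5, 1, 14, 0, tzinfo=timezone.utc)  # from tracking sheet
    assert password_rotated_since(conn, "breakglass01", used_at), \
        "Emergency account password was NOT rotated after use"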

Temporarily Assign Enterprise Admin and Schema Admin membership

These privileges should be added as needed and removed after use, following the just-in-time pattern described earlier. The emergency account should hold these privileges only for the duration of the task to be completed, and for a maximum of 10 hours. All usage and the duration of these privileges should be captured in the change approval board record after the task is completed.

Hardening Standards Applicability

Identity System Hosts

Active Directory and Domain Controllers

Active Directory domain controllers store a copy of the password hashes for all accounts. All means of administrative control of these hosts must be secured at or above the Tier 0 standard.

Certification Authorities

Certification authorities (CAs) can issue PKI certificates that can be used to authenticate as any account. All means of administrative control of these hosts must be secured at or above the Tier 0 standard.

Privileged Identity Management Systems

Privileged identity management systems can reset passwords and provide access to administrative accounts and groups, so they are considered Tier 0 assets. The application server hosts and workstations where PIM administrators log on should meet the Tier 0 host hardening standards.

Enterprise Identity Management Systems

Identity management systems can reset passwords and provide access to any user's account, so they are considered Tier 2 assets. Enterprise identity management systems should not be granted privileges over administrative accounts, groups, or OUs. If they are, they must be classified as privileged identity management solutions and secured at Tier 0 standards. The enterprise identity management application servers, and the workstations where their administrators log on, should meet the appropriate host hardening standards.

Identity Federation Systems

Identity federation systems (AD FS) process authentications for any accounts that use the system, so they must be considered Tier 0 assets. These systems frequently host websites that are exposed to direct Internet traffic, so those exceptions must be allowed, and application-specific security guidance from the vendor should be followed.

Identity System Administrative Workstations and Servers

Administrative workstations must be secured at or above the level of the systems to be managed, so all locations that host administrative accounts for identity systems must be hardened at the appropriate Tier standard.

Infrastructure Management and Security Hosts

These standards describe the requirements that must be met for the infrastructure management and security tools that will be used to manage hosts in a given Tier. All components of a management tool must meet or exceed the hardening standards of that Tier, including:

  • Server hosts where the tool and databases are installed
  • Administrative workstations used to operate the management tool
  • Accounts with administrative privileges over the tool
  • Accounts with administrative privileges over the servers that host the tool, and the administrative workstations that are used to manage the tool
  • Storage devices where the tool database, server virtual machines, or admin workstation virtual machines are stored

Configuration Management

Configuration management tools provide the ability to run arbitrary code with system privileges on managed computers. These tools are classified at the Tier that they manage, and all components must be secured at or above that level.

As an example, a configuration management tool with an agent installed on a domain controller will be Tier 0.
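
The classification rule generalizes: a management tool inherits the most privileged (numerically lowest) Tier among the assets it manages. The following trivial Python sketch illustrates the rule; the asset inventory is hypothetical.

    # Trivial sketch of the classification rule: a management tool is
    # classified at the highest-privilege (lowest-numbered) Tier among its
    # managed assets. The asset inventory below is hypothetical.
    def tool_tier(managed_asset_tiers):
        """Return the Tier a tool must be secured at, given its managed assets."""
        return min(managed_asset_tiers)

    agents = {
        "DC01": 0,       # domain controller        -> Tier 0
        "APP07": 1,      # line-of-business server  -> Tier 1
        "WKS113": 2,     # user workstation         -> Tier 2
    }

    print(tool_tier(agents.values()))   # 0: one DC agent makes the tool Tier 0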

Operational Monitoring

Operational management and performance management tools provide the ability to run arbitrary code with system privileges on managed computers. These tools are classified at the highest Tier of assets that they manage, and all components must be secured at or above that level.

As an example, an operations management tool with an agent installed on a Tier 0 configuration management tool will be Tier 0.

Backup

Backup systems provide the ability to access backup files for an operating system, the ability to back up an operating system, or the ability to restore backups of an operating system to arbitrary locations.

Any of these capabilities provides the ability to read and control any element of that operating system (or to copy it to a backup file), including the operating system secrets.

These are classified at the highest tier of assets that they manage, and all components must be secured at or above that level, including backup file storage systems, storage administrators and their workstations, and any administrators of those systems.

Virtualization

Virtualization tools provide the ability to fully control all operating systems and data hosted on them. They are classified at the highest Tier of virtual machines that are hosted on them. These tools provide the ability to read and control any element of the operating system (on disk or in memory), including the operating system secrets.

Microsoft has announced a technology called "shielded virtual machines" that is designed to change this, but it was not generally available at the time this standard was written. For more information, see Shielded VMs and Guarded Fabric Validation Guide for Windows Server 2016.

The virtualization components are classified at the highest Tier of assets that they manage, and all components must be secured at or above that level, including virtual machine hosts, hosts where virtualization management applications run, hosts where virtual machine storage is managed, and administrative workstations where admin accounts manage the virtualization solution and its storage.

Security Tools

Security monitoring and enforcement tools generally provide the ability to run arbitrary code with system privileges on managed computers, either through directly accessible features or through manipulation of extensibility features. These tools are classified at the highest Tier of assets that they manage, and all components must be secured at or above that level.

Some solutions are fixed function, and they do not offer extensibility, so the risk of those tools should be assessed individually. These tools are:

  • Microsoft System Center Endpoint Protection Agent (when managed by GPOs instead of SCCM)
  • Microsoft Windows Server Update Service (WSUS)

Security monitoring should be provided by the security team.

Infrastructure System Administrative Workstations

Administrative workstations must be secured at or above the level of the systems to be managed, so all locations that host administrative accounts for infrastructure systems must be hardened at the appropriate Tier standard.

Cloud and Contracted Services

Microsoft Azure

Much like an administrator of virtualization tools, an administrator of an Azure subscription can fully control all operating systems and data hosted in it. Because of this, the subscription is classified at the Tier of the highest-level virtual machines that it hosts.

If Tier 0 assets, such as domain controllers, are hosted in Azure, they must be hosted in a separate Azure subscription from the Tier 1 and Tier 2 assets.
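
This separation can be audited programmatically. The following is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-compute), assuming a hypothetical convention that every virtual machine carries a "tier" tag; the subscription IDs are placeholders.

    # Minimal sketch: verify no Azure subscription mixes Tier 0 VMs with
    # lower-tier VMs. Assumes azure-identity and azure-mgmt-compute, and a
    # hypothetical convention that every VM carries a "tier" tag.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    SUBSCRIPTION_IDS = ["<tier0-sub-id>", "<workload-sub-id>"]  # placeholders

    credential = DefaultAzureCredential()
    for sub_id in SUBSCRIPTION_IDS:
        client = ComputeManagementClient(credential, sub_id)
        tiers = {(vm.tags or {}).get("tier", "untagged")
                 for vm in client.virtual_machines.list_all()}
        if "0" in tiers and tiers != {"0"}:
            print(f"VIOLATION: subscription {sub_id} mixes Tier 0 with {tiers}")
        else:
            print(f"OK: subscription {sub_id} tiers: {tiers}")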

Mitigating Software Security Risks

Microsoft Azure is a cloud service that is maintained by Microsoft, so tenants do not need to apply traditional application security to the platform itself: tenants are not responsible for application-code security practices or software updates for the service.

The software components that are under tenant control require appropriate application security, including:

  • Securing purchased software products
  • Securing custom developed software

This is important to the risk posture of the tenant because attackers can gain control of a tenant service by attacking vulnerable applications that the tenant depends on for security assurances, such as Active Directory, federation, synchronization, and infrastructure management capabilities. Additionally, adversaries can attack applications hosted on Azure.

Securing Purchased Software Products

The security of software purchased from a vendor depends on security measures taken throughout the lifecycle, including:

  • Security purchasing policies and preferences
  • Software supply chain risk mitigations
  • Software configuration
  • Software security updates
  • End-of-life risk mitigations

Detailed guidance about software configuration, security updates, and software supply chain risks is in the Security Standardization sections of this document.

Following is additional guidance for purchasing policies.

Security Purchasing Policies and Preferences (Example)

The following elements represent security preferences and requirements when you are acquiring applications, devices, and services:

  • Software updates
    • Require the vendor to provide security updates for the expected operational lifetime of the product in production.
    • Prefer that security updates and patches are published automatically or on a regular cycle.
    • Prefer that the vendor provide an automatic update mechanism or integrate into an existing one (such as WSUS) rather than requiring manual downloads and installation.
  • Compatibility support processes
    • Require the vendor to maintain compatibility with new releases of the operating system and applications that it is dependent on. Ensure that the vendor has a good track record and makes a commitment to update products within six months of the release of a new operating system or application by Microsoft (or other vendors).
      This helps avoid your organization being forced to run outdated (and potentially vulnerable) operating systems and applications to support required business capabilities.
  • Authentication standards
    • Identify and document protocol standards for your environment, preferably leveraging industry-standard protocols. Using such standards positions you for a single sign-on solution, and it eases interoperability between different platforms.
    • When evaluating applications, consider requiring applications to utilize Security Assertion Markup Language (SAML) 2.0 or the Kerberos protocol. This enables federated scenarios and increases your use of a centralized and controlled authentication platform (such as Active Directory).
  • Security guidance
    • Require vendors to publish guidance for how to securely configure and operate hardware and software products.
      • This should include guidance about all settings that have a security impact including documentation of:
        • Threats – What the setting is designed to mitigate
        • Countermeasures – How to configure the setting correctly to mitigate the threat
        • Potential impact – The operational impact of the setting configurations
      • What tools are available to check the integrity of the system after deployment?
      • What security considerations need to be taken into account by a developer calling the system's APIs, if applicable?
      • Is the product designed to be exposed to Internet traffic without requiring a firewall?
      • Identify what security logs are produced by the product, how they are protected against tampering, and how to access them.
    • Require vendors to minimize and document attack surface changes resulting from installing software. This can be measured by using the Microsoft Attack Surface Analyzer tool.
    • Require vendors to document which threats the product is and is not designed to protect itself and its data against.
  • Software digital signatures
  • Software development practices: Gather information about the vendor's development practices
    • Secure development education
      • Do you educate all engineers about security development?
      • How frequently?
      • How much time per year for each engineer?
      • What methodology is taught?
      • Who teaches the classes?
    • Development security and privacy processes
      • What security development methods do you practice? The Microsoft Security Development Lifecycle? ISO 27034? Something else?
      • Do you perform threat modeling on designs? Which methods and tools do you use?
      • Do you have documented development security and privacy principles? Can you share them?
    • Development practices
      • What compilers do you use?
      • Do you enforce specific compiler defenses?
      • Do you use static analysis tools?
        • Which tools?
        • When are they run?
      • Do you have banned API requirements?
        • How are they enforced?
      • What are your cryptographic requirements?
    • Security testing practices
      • Do you perform penetration testing?
        • Who does it and when?
      • Do you perform fuzz testing?
        • What is your fuzz testing policy?
    • Security response
      • Describe your process for prioritizing and responding to security bugs.
      • Describe your process for responding to cybersecurity attacks and compromises.
      • Who should my company contact to report a security bug?
    • Code integrity
      • How do you protect source code and related artifacts (such as specifications and test plans)?
      • Do you perform background checks on your software developers?
      • Can you trace the lineage of any line of code in your software to the original developer? Can you trace the lineage of all developers who modified the code?