Aligning ALM and Cloud strategies

Introduction

A government site is launched and announced on every major news network. The publicity drives significant traffic to the site. Despite being tested previously for up to 8 million users, the site starts to fail with just a fraction of that number. Tempers flare, fingers are pointed, and developers insist nothing has changed. In reality, a junior developer made a change that created a dependency on an external URL shortening service with a lower SLA, did no additional testing, and deployed it directly from his laptop. Despite all of the effort and planning for the launch, one junior developer transformed the site from being designed to scale to being designed to fail.

The CEO of a hot Silicon Valley company is reading and enjoying a very complimentary article in a major national newspaper. He hears a notification from his social media monitoring tool that a lot of people are talking about his company. He assumes it's related to the article, but is disappointed to see that it's about an issue with the site. A developer had deployed an update to the production environment, but had done so using a configuration file intended for the test environment. The result? Millions of users experienced problems because the site was now configured to use a test database instead of the one intended for the production site.

These examples of real-world issues happened on cloud-based platforms, but they weren't the fault of the cloud hosting providers. These failures resulted from breakdowns in the organizations' application lifecycle management (ALM). ALM is a field that is seeing rapid improvement in the quality of tools and in the strategies for applying those tools to continuous software development. This paper provides an overview of current technology and strategy that will enable teams to create solutions that are more stable and evolve faster through continuous feedback loops. Although the cloud isn't the cause of these issues, it does underscore the need for ALM. The cloud provides democratic access to massive compute power and empowers the current world of 24x7 always-on, always-connected consumers. These trends require that organizations without ALM processes adopt them, and that those with legacy processes revise them.

As organizations adopt or adapt ALM processes to conduct their business, they can look to successful startups and consumer cloud services for insight. These organizations have pioneered approaches, some in contrast to legacy enterprise norms, for delivering solutions in the cloud with agility and efficiency.

Automation is part of the foundation of these approaches. Today, software automation enables devices and services, whether physical or virtual, to be configured, built, compiled, tested, and deployed without manual intervention. The result is an opportunity to accelerate many aspects of the ALM lifecycle, increasing the speed of development and testing, reducing the risk of human error, and shortening time to market.

As a result, ALM has evolved to include the following processes and practices:

  • Continuous delivery. Delivering business value on a continual basis by deploying high quality solutions frequently.
  • Continuous integration. Integrating different parts of the solution in a continuous fashion ensures predictable delivery of a working product and enables you to quickly discover and correct issues.
  • Continuous deployment. Automating and streamlining the software build process to deploy solutions automatically into the appropriate software environment each time a build succeeds.
  • Continuous improvement. Learning from previous delivery cycles of a solution by receiving actionable feedback enables an agile approach to changing market circumstances and staying in tune with the needs of customers.

The increased expectations placed on solutions in production force teams to be able to detect and resolve issues in production quickly, measured by two KPIs: Mean Time To Detect (MTTD) and Mean Time to Recovery (MTTR).

Beyond requiring great tooling support to detect, analyze, and resolve issues in production quickly and reliably, the evolution of ALM processes that deliver solutions in cloud environments has also influenced the way architectural modeling is performed.

One area of particular importance to large-scale software architectures, especially in the cloud era, is resilience modeling and analysis: planning and testing your solution to be both available and resilient to failure. Both activities are key to delivering a great user experience and ensuring your solution lives up to its potential, even in the face of failure. Practicing resilience modeling and analysis supports business objectives by ensuring a reduced-functionality solution can continue to provide value between the moment of failure and the point of full recovery.

In addition, new ALM processes require tight integration and coordination between IT operations and software development teams. This requirement has resulted in the broad adoption of a new philosophy called DevOps.

DevOps is an evolved way of delivering software solutions in which teams involved in delivering and operating the solution work closely together to meet business goals. In this context, development teams and IT operations integrate and work closely together throughout the lifecycle, from planning, through development and release, up to and including operating the solution. This integrated approach is crucial to long-term success because it promotes a shared thinking about business value and allows for continuous feedback between the development and IT operations functions.

This paper highlights both considerations and opportunities for devising an ALM strategy in the cloud era. The paper also reviews the technology options that are available, including those from Microsoft and the open source community.

Business benefits

As their name implies, software developers have mostly been involved in developing software solutions. The responsibility for the infrastructure that solutions were deployed on as well as the availability of the solutions was the task of another organization, IT operations. This separation is fairly common and often results in limited involvement of the IT operations team until very late in the project lifecycle. Because IT operations personnel need to define and develop requirements for effectively operating the solution, involving them late in the lifecycle is clearly not ideal and could create unnecessary delays and rework costs.

By adapting or adopting approaches such as continuous delivery and practicing the DevOps philosophy, development and IT operations activities can be streamlined, lowering the total cost of ownership (TCO), decreasing time to market, reducing the scope of change, increasing customer satisfaction, and reducing production risk by minimizing the Mean Time To Detect (MTTD) and Mean Time to Recovery (MTTR).

The increasing number of software platforms and infrastructure platforms that expose application programming interfaces (APIs) provides the means to automate many aspects of ALM. Taking advantage of this proliferation of APIs can provide a number of benefits, from speeding up development and testing to reducing the risk of human error.

With the introduction of cloud platforms, organizations are empowered to establish development and test environments with reduced lead time and cost, while the cloud platform APIs provide additional value to enable consistent provisioning of these environments. Enterprises can incorporate this capability into their overall ALM strategy and see the benefit of increasing team productivity from project start to delivery. In engagements with customers, Microsoft has observed that these capabilities and benefits often result in increased team confidence in their solutions. Teams now have greater agility in terms of releasing new versions of their services that are built atop cloud platforms.

Components of an ALM strategy

A solid ALM strategy includes practices that help enable the organization to deliver higher quality products at lower cost and with more agility.

Continuous delivery builds on the historical approach to ALM by streamlining the production process while DevOps makes its way to center stage, helping to enable a faster pace of deployment and reducing operational friction between teams.

Continuous delivery

Today it has become imperative for development teams to ensure they can deliver value to their customers quickly and reliably, learning by releasing continuously in small increments. Continuous delivery is an approach in which a number of techniques work together to ensure new features and fixes are delivered in a timely and predictable fashion at low risk and with a minimal need for manual intervention.

Continuous delivery helps ensure you are building the right solution as your stakeholders have insight into what they are receiving at any time, which allows you to pivot your approach if necessary.

Figure 1. Continuous value delivery

Fundamental aspects of continuous delivery are:

  • Automation. Reduce costly and error-prone manual intervention in the build, test, and deployment processes, enabling consistency and increasing integrity and efficiency throughout the lifecycle.
  • Continuous integration. Enable a constant development feedback loop to reduce integration efforts and help ensure a stable solution.
  • Continuous deployment. Enable a deployment loop after each build to reduce the time to push out enhancements and defect fixes as a way to provide more stable releases.
  • Continuous improvement. Collect usage telemetry that allows your teams to understand the next set of investments, whether that is new features, improving the discoverability of existing features, paying down critical technical debt, or deprecating unnecessary services or features.

Automation

Automation reduces the potential for human errors across the development process. It supports a predictable and repeatable process of building and releasing a solution while increasing the speed at which this can be achieved.

Because increasing numbers of APIs are being exposed by services, devices, and infrastructure, there are many opportunities to take advantage of automation.

Infrastructure automation

Infrastructure automation delivers automatic provisioning and configuration of standardized environments that consist of the various components of a specific solution, such as compute, storage, and identity. This automation dramatically reduces lead time in infrastructure acquisition and provisioning for the development team while building trust between developers and IT pros through ensuring the development team adheres to corporate standards. If used to provision duplicates of solution dependencies such as ERP or CRM instances, infrastructure automation can also remove the planning efforts and associated lead time often involved with testing solution integration.
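As an illustration, the following Windows PowerShell sketch provisions a standardized dev/test virtual machine with the Azure PowerShell module. This is a minimal sketch, assuming the Azure module is installed; the subscription, cloud service, machine, and image names are placeholders rather than prescribed values.

    # Minimal sketch: provision a standardized dev/test VM with the Azure PowerShell module.
    # Assumes the module is installed and the subscription and cloud service names already exist.
    Import-Module Azure
    Select-AzureSubscription -SubscriptionName "DevTest"

    New-AzureQuickVM -Windows `
        -ServiceName "contoso-devtest" `
        -Name "web01" `
        -InstanceSize "Small" `
        -ImageName "<windows-server-image-name>" `
        -AdminUsername "labadmin" `
        -Password "<placeholder-password>" `
        -Location "West Europe"

Because the same script is run for every environment, developers and IT pros can be confident that each environment adheres to the same corporate standards.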

Deployment automation

Setting up the environment to automatically deploy the latest build on a given schedule or in a continuous manner allows tests to be automated and enables project stakeholders to always see the latest version of the solution in development. Deployment automation should ensure the environment ends up in a state that reflects what is desired in production. It should include deploying all elements that are part of the solution, such as databases of relevant data, configuration of external product dependencies, and authentication.
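The sketch below illustrates the idea with Windows PowerShell and the WebAdministration module; the site name, build drop path, and configuration file layout are assumptions made for this example, not a prescribed structure.

    # Minimal sketch: deploy the latest build output together with the configuration
    # intended for the target environment (names and paths are illustrative).
    Import-Module WebAdministration

    $buildDrop   = "\\buildserver\drops\ContosoWeb\latest"
    $siteRoot    = "C:\inetpub\wwwroot\ContosoWeb"
    $environment = "Test"

    Stop-Website -Name "ContosoWeb"

    # Copy binaries and content, then overlay the environment-specific configuration so a
    # test configuration can never reach production by accident.
    Copy-Item -Path "$buildDrop\*" -Destination $siteRoot -Recurse -Force
    Copy-Item -Path "$buildDrop\config\web.$environment.config" -Destination "$siteRoot\web.config" -Force

    Start-Website -Name "ContosoWeb"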

Test automation

Given the available studies and anecdotal evidence about the cost of software defects and the potential impact of these defects, testing is a necessary aspect of software development.

Tests enable the development team to continuously validate they are delivering the expected quality, help them avoid regression defects, and ensure they continue to meet requirements.

By running tests (unit tests, module tests, integration tests, functional tests, and non-functional tests such as load and performance tests) as part of an automated cycle, you ensure that tests can be executed quickly and that they are repeatable. Running tests as part of a continuous cycle helps find defects earlier, when they can usually be fixed at significantly lower cost than when they are discovered later in the lifecycle.
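As a simple illustration, an automated build can invoke the test runner and stop the pipeline when any test fails; the path to vstest.console.exe and the test assembly name below are assumptions for this sketch.

    # Minimal sketch: run automated tests as part of the pipeline and fail fast on errors.
    $vstest = "${env:ProgramFiles(x86)}\Microsoft Visual Studio 12.0\Common7\IDE\CommonExtensions\Microsoft\TestWindow\vstest.console.exe"

    & $vstest ".\bin\Release\ContosoWeb.Tests.dll" /Logger:trx

    if ($LASTEXITCODE -ne 0) {
        throw "Test run failed; stopping the pipeline so the defect is fixed before deployment."
    }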

Repetitive and complex task automation

As solution complexity and scale increases, managing systems in a manual fashion is no longer an option. Automation of repetitive and complex tasks such as creating databases, loading reference data, and setting permissions for folders allows for reduced human error and shorter delivery time while increasing the stability of the solution development process.
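A minimal Windows PowerShell sketch of the tasks mentioned above might look like the following; the server, database, script, and folder names are illustrative, and the sqlcmd and icacls utilities are assumed to be available on the machine.

    # Create the database if it does not exist and load reference data.
    & sqlcmd -S "localhost\SQLEXPRESS" -Q "IF DB_ID('ContosoRef') IS NULL CREATE DATABASE ContosoRef"
    & sqlcmd -S "localhost\SQLEXPRESS" -d "ContosoRef" -i ".\scripts\load-reference-data.sql"

    # Grant the application pool identity read access to a content folder.
    & icacls "C:\inetpub\wwwroot\ContosoWeb\App_Data" /grant "IIS AppPool\ContosoWeb:(OI)(CI)RX"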

All of these automation elements work together to ensure that each change within the solution is tested before being deployed into production. This continuously repeating process increases the level of confidence in the deployment process for both the development and operations team.

Continuous integration

Components that are developed by different members of the development team often have dependencies on each other. Ensuring that they integrate correctly is vital to a successful result; if integration is deferred until development efforts have grown too large, defects can occur that are hard to triage and that hinder project progress.

As developers work on the features or changes to the solution they set out to implement, they do so independently of others, taking the current code base as a starting point. As other developers commit their changes to the repository, this local code base over time no longer reflects the shared repository. The longer a developer works in isolation, the bigger this disconnect becomes. Finally merging changes back into the shared repository can then be a time-consuming task and can pose a risk to the stability of the code base.

Continuous integration is the practice of merging all developer work into a shared repository at a frequent pace. It enables teams to catch integration issues quickly and deliver working solutions of higher quality more rapidly. Depending on the size of the project team, your organizational experience with setting up continuous integration, and the amount of time allocated to enabling it, you could use a phased approach in which build automation, unit test automation, deployment automation, and integration test automation are introduced in an incremental manner.

Continuous deployment

Continuous deployment is a process by which software is deployed continuously, possibly even several times throughout the day, in iterations measured in minutes as opposed to days, weeks, or months. Based on a lean technique called continuous flow, continuous deployment can be designed to accelerate productivity by rearranging software deployment processes from a batch and queue approach to a single step flow that focuses on eliminating waste in each step of the process.

The main goal is to eliminate the time wasted in the "waits" between steps: waiting to code, waiting to test, and waiting to deploy. Reducing or eliminating these waits by automating the deployment process leads to faster iterations, which is the key to successfully implementing continuous deployment. Using a concept called feature switches, newly added capabilities can be tested on a subset of the intended audience for the solution, allowing fine-grained testing in production, as sketched below. Feature switches further reduce risk and allow large cross-sprint features to be released in a manageable way.
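The following is a minimal, hypothetical sketch of a feature switch with a percentage-based rollout, written in Windows PowerShell for illustration; real solutions typically read switch values from configuration or a service so they can be changed without redeploying.

    # Hypothetical feature switch table: feature name mapped to a rollout percentage.
    $featureSwitches = @{
        "NewCheckoutFlow" = 10    # expose to roughly 10% of users
        "SocialSharing"   = 100   # fully enabled
    }

    function Test-FeatureEnabled([string]$Feature, [string]$UserId) {
        $rollout = $featureSwitches[$Feature]
        if ($null -eq $rollout) { return $false }
        # Map the user id to a bucket from 0 to 99; users in buckets below the rollout percentage get the feature.
        $bucket = [math]::Abs($UserId.GetHashCode()) % 100
        return ($bucket -lt $rollout)
    }

    if (Test-FeatureEnabled -Feature "NewCheckoutFlow" -UserId "user-42") {
        Write-Output "Render the new checkout flow for this user."
    }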

Continuous improvement

For every organization, it's important to continuously adapt to customer needs and ensure that solutions work as envisioned. Continuous improvement is the practice of this adaptation, supported by constant monitoring and telemetry-driven insights that allow developers to learn from the ways that solutions are used and the ways that they function.

To enable the resolution of production issues and defects in a correct and timely manner, it is of critical importance that actionable feedback about the details of those issues flows from IT ops back to the development and associated business teams.

Telemetry

Many software projects have focused solely on the implementation and launch of software that meets the functional requirements set by stakeholders. Concerns about providing deployment infrastructure and operating the solution were left to IT operations.

This disconnect between the software development team and IT operations can often result in unnecessary technical debt, where issues and defects in an operational solution arise because of insufficient engineering effort applied to ensuring the solution was designed for operations. In addition to this debt, business owners of applications have had little or no telemetry in place to monitor software usage or the organization's key performance indicators.

Four important types of telemetry to consider include:

  • Infrastructure. Capturing the health of the underlying infrastructure the solution depends on for functioning correctly. Example metrics include CPU utilization, memory pressure, and network utilization. The main goal of capturing this information is to monitor available capacity, health, and resource consumption, and to verify that the infrastructure is operating according to specification.
  • Application. Capturing the health metrics that are specific to the application to troubleshoot issues, find defects, and capture early warning of application failure. Some example metrics for application health include dependent service calls, SQL logins, API calls, queue depth, errors that occur, and external service availability and reliability.
  • Business. Capturing metrics that relate to business goals and KPIs enables solution stakeholders to gain insight into trends and actual usage of the solution, which allows them to plan and act on changing market conditions faster. Information within the business insight realm includes the number of active users, their device usage, geography, and trend analysis.
  • User. Capturing information about user actions enables the tailoring of the solution to specific user needs, delivering an individualized experience, and allowing for selling and marketing opportunities. On top of the user behavior, there is also an opportunity to capture the user experience through client-side instrumentation and outside-in monitoring.

Capturing all these metrics provides insight into the various perspectives of the solution.

When establishing a telemetry strategy, it is important to understand how the data will be used and which activities it will support. The most common usage scenarios include:

  • Inform. By having a dashboard that visualizes the metrics in a useful manner, stakeholders can see usage patterns, identify utilization trends, and see important statistics that relate to consumption of the solution offered. These abilities allow them to forecast demand and behavior of the solution.
  • React. Thresholds can be set to inform operators, which allows proactive intervention to maintain or restore solution health. This capability leads to quickly resolving or, ideally, preventing production issues.
  • Explore. The captured information can also be explored in more free-form ways, which allows for ad hoc exploration through different pivots or perspectives that weren't previously apparent.

Certain types of dashboards have traditionally been tailored towards and consumed mostly by IT operations personnel. As we move into adopting DevOps, the solution quality and behavior in production becomes a shared responsibility throughout the lifecycle. It is critical that these dashboards are shared between these teams for a faster identification of systemic issues and resulting fixes to the solution.

DevOps

Historically, a number of the ALM practices have focused on planning and development activities with an emphasis on functional quality. Poor functional quality adversely impacts the value delivery cycle because of time spent fixing defects. In the same way, poor nonfunctional quality (implementing nonfunctional requirements, instrumentation, and the manageability of a solution in production) affects the user experience and business benefits. Poor nonfunctional quality often results in issues that are usually not found until the operate phase.

As stated earlier, DevOps is all about extending the agility gained in dev/test to IT operations, which requires that every part of the organization works more closely together to meet business goals. In this context, development teams and IT operations are integrated and engaged throughout the lifecycle. This integration is crucial to long-term success because it promotes a shared thinking about business value and allows for continuous feedback between the development and IT operations functions.

Figure 3. Application Lifecycle Management overview

By working together and applying DevOps practices, a level of trust and transparency is established between development and operations teams and in the solution being built. This integrated approach can enable a reduced time to market and result in higher quality solutions.

Employing a DevOps approach touches the layers of people, process, and technology.

Essential components for success in a DevOps environment

DevOps will continue to evolve, adding new behaviors and technology support. This section describes some of the essential components of a successful DevOps approach.

Figure 4. Essential components for success in a DevOps environment

People

DevOps starts with a mindset that has everyone clearly focused on the business value and a solution responsibility that transcends code complete. Working in collaboration between the various disciplines involved and doing so with a clear purpose helps lay the groundwork for a successful and timely delivery of high-quality solutions.

Within the People layer, we identify the top priority for every action taken to be that of adding business value. It starts with organizational leaders shifting focus toward shared value propositions, moving from "code complete" and "zero defects in UAT" to "onboarding 4-6 partners a month".

If an activity, optimization, rewrite or feature does not add business value, it should be reviewed or abandoned. In order to effectively and continuously deliver a high quality product, a strong sense of collaboration between all people involved, on equal footing, is needed.

In the past, code complete was the point at which a handover of the produced artifacts would be performed, and the team would move on to other things. In the era of continuous delivery, this approach needs to move into one where people involved have a responsibility for the product throughout its lifecycle. This sense of responsibility in turn raises awareness of the impact of choices during the process of creating or enhancing the product, which leads to better quality results.

A well-known fact of operating software systems, especially at large scale, is that hardware will fail, software will fail, and people will make mistakes. Failures are a fact of life; the main point is to learn from them and try to prevent them from occurring again. The last important behavior to ensure DevOps success from a People perspective is focusing primarily on an issue when it occurs, not on who might have caused it.

Process

On top of the people foundation, there are key elements that pertain to the process of delivering these solutions. Involving nonfunctional and operational requirements from ideation onwards ensures delivery of a manageable solution and minimizes the technical debt that is accrued.

Determining the next set of investments by surfacing actionable usage data allows for continuously evolving the solution in short cycle times, supporting the adoption of rapidly changing business needs.

Practicing simplicity over elegance helps reduce complexity and makes it easier for others to find defects. Lastly, ensuring releases are developed in small increments enables continuous improvement and ensures changes have minor impact on both functionality and integration with other components within the solution.

Technology

The technology perspective of DevOps has key principles and practices. These include automating the pipeline to further shorten cycle times and increase predictability, and treating the runtime environment of a solution (its configuration and hardware ecosystem) as foundational to its success, which requires the team to manage the environment and configuration as code from both a change management and a source control perspective.

As the reach of solutions broadens and the impact of outages deepens, architecting for resiliency becomes key in ensuring customers have the most optimal experience possible.

Instrumenting a solution is key to measuring both the technical (what is going wrong) and business (what functionality is used in which manner by users) perspectives of a solution. By standardizing build, packaging and deployment activities, the reliability and speed of product delivery can be increased.

Modeling and fault injection testing

Failure is a fact for every solution of significant scale. The spectrum of failure ranges from common failures to infrequently occurring failures. As failure is inevitable, fault tolerance techniques should be incorporated into the service design to reduce the impact when failures occur.

The process of protecting against failure starts with defining what failure entails from a customer and solution perspective, and determining what acceptable levels of failure would be. It might be acceptable for the social sharing capabilities of your solution to be unavailable for some time, but the browse and order capabilities might need a high level of availability in an acceptable form for the solution to provide business value.

To ensure customers receive the best possible experience from the solution being built, it is critical to create and maintain a model that details the ways in which it can fail and actions that can mitigate the impact of the failure. Microsoft calls this approach Resilience Modeling and Analysis (RMA).

A practical test for this modeling is fault injection testing, which tests whether the system can withstand failures that result from underlying infrastructure issues. Use of fault injection testing, now a proven practice, has grown as organizations increase the scale of their cloud solutions and dependencies upon external platform providers.

Modeling and design

Because of the ever-increasing number of people and devices that connect to your solution, it is important that it be designed for scalability, availability, and resiliency. As part of this endeavor, RMA is used to:

  • Model the various points within the architecture of your solution at which failures might occur.
  • Document the projected impact.
  • Define countermeasures to ensure failures do not occur or have a minimal impact on the users of the solution.

In addition to RMA, there are a set of other up front modeling and design activities that can better inform the architecture at the beginning of a project. We recommend these activities, as they will help to optimize the solution and prioritize fault remediation.

The combined set of activities should include:

  • Decomposing the application by workload. Applications are typically composed of multiple workloads. Each workload provides a function to the solution, often described in user stories, such as browsing a catalogue, adding products to the cart, and checking out. These workloads can, and often do, have different nonfunctional requirements and business criticality. Decomposing a solution by workload gives an organization agility: the freedom to make appropriate technology choices, and to control costs, at the level of the individual workload.
  • Establishing a lifecycle model. An application lifecycle model defines the expected behavior of an application when operational. At different phases and times, an application will have different usage requirements and put different demands on the system, either at a functional or scale level. Whether there are monthly, weekly, special day/holiday, or intraday patterns, the lifecycle model(s) will capture and reflect this usage. The lifecycle model will be a key input in establishing both an availability model as well as a plan for prioritizing remediation of identified failure modes.
  • Establishing an availability model and plan. After a lifecycle model is identified, the next step is to establish an availability model and plan. An availability model for an application identifies the level of availability that is expected for a workload. It is critical, as it will inform many of the decisions made during development of the solution.
  • Identifying failure points and failure modes. To create a resilient architecture, it's important to understand where and how it can fail. Specifically, understanding requires a proactive effort to determine and document what can cause an outage. Understanding the points within a solution that could result in a failure and the resulting failure modes enables the team to make specific, targeted decisions on strategies for resiliency and availability during the development phase.

Following these steps enables the team to make informed decisions and appropriately prioritize remediation actions to help ensure the solution will be available for use and resilient to failure.

Technologies

Within the ecosystem that includes ALM, continuous delivery, and DevOps there are many different solutions developed across a mix of technology providers and community contributors.

This section will review cloud computing, relevant service models, some popular third-party and community software and the technologies that make up the Microsoft integrated approach.

Cloud computing as an enabler

Cloud computing enables a shift, whether in whole or in part, from a traditional capital expense (CAPEX) approach of purchasing infrastructure for an on-premises data center to an operating expense (OPEX) model in which the organization pays only for the infrastructure that it uses. The cloud can provide a potent combination of increased agility, reduced risk, and lower costs.

Cloud environments will have one or more service models: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS).

Infrastructure as a service (IaaS)

With the availability of cloud computing and its API-driven provisioning and configuration of virtualized server infrastructure, it becomes easier and more cost-effective to set up development, test, and production environments using IaaS. The ability to obtain access to capacity in minutes reduces the lead time for these environments. In addition, the ability to script these environments, combined with a pay-as-you-go model, makes it reasonable and affordable to set up multiple environment types (development, test, production) in an automated and consistent fashion.

IaaS also allows for quick architectural validation and reduces the number of servers that need on-premises configuration and maintenance, which drives down IT operational costs and allows IT to focus on value-added services to the business. The ability to provide new members of the project team with a readily available and isolated learning environment that represents the solution increases their understanding and helps them become effective in less time than was previously needed.

Platform as a service (PaaS)

PaaS offers additional opportunities to automate many of the aspects of your solution such as configuring identity management, creating websites, setting up databases, deploying cloud services, and more.

These opportunities empower you to set up the production pipeline for your project with a limited amount of human intervention and enable continuous delivery.

PaaS provides the platform, which will typically include operational maintenance and a service level objective that the platform provider will deliver.

Software as a service (SaaS)

Beyond these capabilities, SaaS removes the need for managing the infrastructure and configuration of components traditionally offered on premises, such as version control. SaaS from a DevOps perspective can provide capabilities that are used during the application lifecycle.

With SaaS solutions such as Visual Studio Online, you can use productivity tools like Team Explorer Everywhere, Git, Team Room, Application Insights, the hosted build controller, and cloud-based load testing. These tools allow teams developing cross-platform solutions to be more efficient by helping them plan, test, build, and deploy in the cloud and collaborate using SaaS services.

Hybrid

As a deployment model, hybrid cloud enables you to safely share and extend resources between your on-premises environment and a public cloud (PaaS/IaaS). It provides an additional layer of control, and allows for the use of the proven security controls and monitoring tools that are already used on-premises.

Hybrid cloud allows for the rapid creation of shared development, test, and production environments that solve on-premises dependencies such as connectivity to data repositories and identity solutions (for example, Active Directory).

Hybrid cloud starts with an on-premises environment that is connected (via a VPN) to a public cloud. Which resources are shared between the two environments depends on the needs of the solution being built.

The Microsoft integrated approach

The preceding sections have discussed key considerations, approaches, and opportunities for an organization's cloud-enabled ALM strategy. Bringing these strategies together and realizing their potential can be achieved using Microsoft technologies. This section discusses the Microsoft proposition.

Defining the modern application lifecycle

As shown previously, the modern application lifecycle entails four stages:

  • Plan. Capture requirements and plan the solution using the backlog, combined with visual storyboarding and UI prototyping capabilities in Microsoft PowerPoint.
  • Develop + Test. Materialize and test the solution using Microsoft Visual Studio and Team Foundation Server or Visual Studio Online for solution development and Microsoft Test Manager for lab management and test execution.
  • Release. Integrated release management and configuration-based deployments are key to a stable and predictable continuous deployment cadence. Release Management for Visual Studio provides tooling for release pipeline management and automated deployments.
  • Monitor + Learn. Manage the solution using Microsoft System Center, Application Insights, and Team Foundation Server or Visual Studio Online for managing continuous feedback, and IntelliTrace for capturing actionable information when defects are discovered.

Figure 5. The modern application lifecycle

Microsoft provides tools for the entire modern application lifecycle. The next section provides more details about these tools and technologies.

Microsoft tools and technologies

First, let's illustrate which of the many Microsoft technologies support an organization's cloud-enabled ALM strategy as described in this paper.

Windows PowerShell

As more and more components have APIs, scripting repetitive and complex tasks becomes both feasible and important. Windows PowerShell is the Microsoft command-line shell and scripting language that is specifically designed for system administration. Key Windows PowerShell concepts are shown in the following figure:

Figure 6. Key Windows PowerShell concepts

  • Cmdlets. Provide lightweight commands using a verb-noun construct (Get-AzureLocation, Set-AzureVNetConfig), typically to execute an action and return a status. Cmdlets are written in .NET (C# and VB.NET) and are available for many different tasks and products, such as Microsoft SharePoint, Microsoft Exchange, Microsoft Azure, and many others.
  • Modules. Extend the core functionality of Windows PowerShell. Modules provide a way to group related functionality, and can be manifested in the form of script modules, binary modules, or manifest modules.
  • Providers. Used to access data and components not normally easily consumed from the command line. Providers act like file system drives. Built-in providers include the certificate store, environment, and the registry.
  • Host applications. Windows PowerShell allows re-use by other applications by hosting its engine to invoke PowerShell functionality. Two host applications are provided: powershell.exe (the default hosting application) and the Integrated Scripting Environment (ISE), with the latter supporting the creation, saving, and interactive debugging of scripts.
  • Workflows. Using Visual Studio, Windows PowerShell cmdlets can be sequenced in XAML workflows.
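The short sketch below illustrates several of these concepts: cmdlets composed in a pipeline, importing a module, and browsing data through built-in providers. The Azure module import is an assumption and requires that module to be installed separately.

    # Cmdlets follow a verb-noun pattern and can be composed in a pipeline.
    Get-Service | Where-Object { $_.Status -eq 'Running' } | Sort-Object DisplayName

    # Modules group related functionality; the Azure module must be installed for this to succeed.
    Import-Module Azure
    Get-AzureLocation            # lists the data center regions available to the subscription

    # Providers expose data stores as drives, such as environment variables and the registry.
    Get-ChildItem Env:
    Get-ChildItem HKLM:\SOFTWARE | Select-Object -First 5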

Desired State Configuration

Introduced with PowerShell 4.0, Desired State Configuration (DSC) is a management platform in Windows PowerShell that enables deploying and managing configuration data for software services and managing the environment in which these services run.

DSC provides a set of Windows PowerShell language extensions, new Windows PowerShell cmdlets, and resources that you can use to declaratively specify how you want your software environment to be configured. It also provides a means to maintain and manage existing configurations.

Figure 7. Example DSC script

The platform can be used to perform many tasks, such as enabling or disabling server roles and features, starting and stopping services, managing user accounts, and deploying software.

Because DSC is able to discover the actual configuration state of nodes (target computers or Virtual Machines) it manages, it is able to fix a configuration if it has drifted away from the desired configuration state (which could happen through manual intervention on the machine).
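As an illustration, the following is a minimal DSC sketch (assuming PowerShell 4.0 or later) that declares a web server role and a content folder for a node; the configuration name and paths are placeholders.

    # Declare the desired state: IIS installed and a content folder present on the node.
    Configuration WebServerBaseline
    {
        Node "localhost"
        {
            WindowsFeature IIS
            {
                Ensure = "Present"
                Name   = "Web-Server"
            }
            File ContentFolder
            {
                Ensure          = "Present"
                Type            = "Directory"
                DestinationPath = "C:\inetpub\wwwroot\ContosoWeb"
            }
        }
    }

    # Compile the configuration into MOF documents and apply them to the node.
    WebServerBaseline -OutputPath .\WebServerBaseline
    Start-DscConfiguration -Path .\WebServerBaseline -Wait -Verbose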

Microsoft Build Engine

The Microsoft Build Engine (MSBuild) is a platform for building applications. Used by Visual Studio, MSBuild is completely transparent about how it processes and builds software, and it enables developers to orchestrate and build products in build lab environments where Visual Studio is not installed. MSBuild uses an XML project file to describe what to do when building source code.

Key MSBuild concepts are:

  • Tasks. Executing actions is key to any build engine. MSBuild uses tasks for this purpose. A task is a unit of executable work, used by MSBuild as an atomic build operation. Examples of built-in tasks include MakeDir for creating a folder, Copy for copying files, and Csc for compiling C# source code. MSBuild allows for custom tasks by implementing the ITask interface.
  • Properties. Name-value pairs that are used to configure builds by passing information to tasks, evaluating conditions, and storing values referenced throughout the project file run by MSBuild.
  • Items. Usually relate to files and represent input to the build. Items are grouped into Item types based on their element names. These groups can be referenced throughout the project file and used as parameters to tasks.
  • Targets. Targets group tasks together and allow for the build to be segmented into smaller units. These smaller units can be executed in isolation from the command-line. An example of a target would be "clean," which removes any previous output from a specified folder.
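To make these concepts concrete, the sketch below writes a minimal MSBuild project file from Windows PowerShell and invokes it with explicit targets and a property override; the project content and folder names are illustrative, and msbuild.exe is assumed to be on the PATH (for example, in a Developer Command Prompt).

    # Write a minimal project that demonstrates properties, items, targets, and built-in tasks.
    $projectXml = @(
        '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003" DefaultTargets="Build">'
        '  <PropertyGroup>'
        '    <OutputFolder>out</OutputFolder>'
        '  </PropertyGroup>'
        '  <ItemGroup>'
        '    <SourceFiles Include="**\*.cs" />'
        '  </ItemGroup>'
        '  <Target Name="Clean">'
        '    <RemoveDir Directories="$(OutputFolder)" />'
        '  </Target>'
        '  <Target Name="Build" DependsOnTargets="Clean">'
        '    <MakeDir Directories="$(OutputFolder)" />'
        '    <Message Text="Building @(SourceFiles) into $(OutputFolder)" Importance="high" />'
        '  </Target>'
        '</Project>'
    )
    Set-Content -Path .\sample.proj -Value $projectXml

    # Run specific targets and override a property from the command line
    # (quotes keep PowerShell from treating ';' as a statement separator).
    & msbuild.exe .\sample.proj "/t:Clean;Build" "/p:OutputFolder=drop" /verbosity:minimal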

Visual Studio

Visual Studio has a long history of enabling architects, developers, and testers. Its family of products include the following main components:

  • Integrated development environment. Enabling an integrated team development experience, the Visual Studio IDE provides support for solution architecture, development, testing, and integrating version control by using Team Foundation Version Control or Git.
  • Team Foundation Server (TFS). The collaboration platform at the core of the Microsoft ALM solution. TFS supports agile development practices, multiple IDEs, and multiple platforms. It provides version control, work item tracking, and bug tracking, and its 'build, deploy and test' workflow can build the solution, deploy it into a test environment, run defined tests in an automated fashion, and report on the results. Team Foundation Build orchestrates the build process, with MSBuild performing the various build tasks.
  • Visual Studio Online. A cloud service that provides a fully featured version control system supporting both Team Foundation Version Control and Git, removing the need for infrastructure deployment and the associated time loss. It allows global teams to become productive quickly, plan in an agile fashion using Kanban boards, and communicate effectively using team rooms. It supports automatically triggered builds, as well as cloud-based load testing, which allows for simple and elastic load testing without the need to set up, configure, and maintain a testing infrastructure. With the recently announced OpsHub Online Migration Utility, moving source code, history, and work items from an on-premises Team Foundation Server to Visual Studio Online is easier than ever; with the ability to map users between environments, artifact ownership and traceability can be retained. On top of this, Visual Studio Online provides APIs and service hooks that allow third-party integration, making it easier for organizations to adopt Visual Studio Online without abandoning the tools they use today. The APIs also enable developers to build applications on any platform that can consume Visual Studio Online services.
  • Test Manager. Provides the ability to run automated tests in a lab environment that are provisioned from templates associated with your solution.
  • Release Management for Visual Studio. Provides the ability to manage releases for custom applications to any environment and enables the definition of the release pipeline through the software development lifecycle (SDLC), with stages, acceptance criteria, and approvals, and the configuration of an automated deployment pipeline for each application, all while tracking trends in release successes and failures. All of these capabilities help automate the development-to-production release process of Team Foundation Server.

Microsoft Project Server

Microsoft Project Server can be integrated with Visual Studio Team Foundation Server to enable project managers to gain insight into the progress of ongoing projects, help them understand how they support the business needs, and help identify ways to improve existing processes.

System Center

System Center is a platform that offers comprehensive management of applications, services, physical resources, hypervisors, software defined networks, configuration, and automation in a single offering. It is used to manage data centers, client devices, and hybrid cloud IT environments.

Some of the key components of System Center that enable modern ALM practices, issue detection, and production feedback are:

  • Virtual Machine Manager. Manages the virtual machines and templates that are used by TFS to provide automatic provisioning of test environments when running on-premises. Virtual Machine Manager (VMM) uses the concept of services. In VMM, a service is a set of virtual machines and their software, all working together to provide functionality, such as a deployed multitier line-of-business application. VMM allows for the creation of service templates, which define the configuration of a service: the virtual machines used to run the service, the applications to install as dependencies, and the networking configuration needed to run the service. When combined with Microsoft Test Manager, VMM can be used to quickly provision a test environment suited to the exact needs of the solution, store lab environment state if issues are encountered, and run multiple copies of labs. For a hands-on experience with System Center Virtual Machine Manager, see Test Lab Guides: System Center 2012 SP1 - Virtual Machine Manager.
  • Operations Manager. Provides operational insight into the production environment and allows production issues to be input to TFS for the development team to engage, which enables continuous improvement.
  • Orchestrator. A workflow engine for operations staff, allowing for the scheduled execution of packages called runbooks. It allows teams to automate the creation, monitoring, and deployment of resources.
  • App Controller. Manages applications across the private cloud and the Windows Azure platform from a single console. Application components can be managed in the context of the service they represent to the business, which allows for the management of services rather than servers.
  • Global Service Monitor. Constantly checks the production environment from multiple locations worldwide. Global Service Monitor (GSM) is integrated into the Operations Manager environment and feeds detected SLA issues into the backlog for timely analysis and resolution by the development team. GSM allows for the use of Visual Studio Web Tests, and can execute the tests used by the team during solution development to validate production health based on deep solution engineering expertise.

Figure 9. Realizing predictable applications by combining Visual Studio and System Center

In realizing predictable applications, different technologies are combined in a complementary manner:

  1. Web tests created during development can be imported into Operations Manager, carrying forward the engineering experience the team has gained from implementing the solution.
  2. The web test is deployed into Global Service Monitor (GSM).
  3. GSM uses its worldwide presence to continuously execute the tests against the production application.
  4. SLA information is sent back to Operations Manager, which provides an overview of application performance and health to the IT operator.
  5. IT operations can assign any SLA issues to engineering by creating a work item for the development team, including IntelliTrace information for root cause analysis.

Azure Resource Manager

Providers of cloud services have traditionally focused on bringing highly available and scalable services to customers. In this effort, much of the focus has been on the basic building blocks of applications: compute power, network connectivity, and storage capabilities. Building applications that use these resources has become a challenge for many customers as the size and complexity of solutions increase. The problem extends beyond managing the resources into the space of reusable components, centralized or policy-driven configuration, and versioning of deployment topologies.

The challenge is one of Application Management, including deploying and orchestrating groups of virtual machines, or combinations of PaaS resources.

Deploying multi-tier applications often requires custom scripts that can contain bugs and need to be maintained, often without clear rollback mechanisms or restartability, leading to fragile and risky deployments.

Announced during Build 2014, Azure Resource Manager is Microsoft's declarative and extensible approach to composing the groups of resources that make up applications. It allows users to define templates, called Azure Templates, in JavaScript Object Notation (JSON), referencing resource items defined in other places for easier reuse.

Figure 10. Moving from Resource Centric Views into Resource Groups

A combination of resources is called a Resource Group, which acts as a container for the resources and a unit of management for operations. Resource Groups ease activities such as deployment, updates, monitoring, and lifecycle management, simplifying DevOps activities.

Azure Resource Manager allows developers and IT pros to work closely together on applications and combines the power of Infrastructure as a Service and Platform as a Service to quickly and easily create applications on Microsoft Azure.
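As a brief sketch of what this looks like in practice, the Azure PowerShell cmdlets of the time could deploy an Azure Template into a new resource group; the resource group name, location, and template file below are assumptions for illustration.

    # Switch the Azure PowerShell module into Resource Manager mode (required at the time).
    Switch-AzureMode -Name AzureResourceManager

    # Create a resource group and deploy every resource described in the template into it.
    New-AzureResourceGroup -Name "contoso-web-prod" `
        -Location "West Europe" `
        -TemplateFile ".\templates\contoso-web.json"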

Application Insights

Figure 12. Example Application Insights dashboard view

Historically, getting information from production applications has been challenging, limiting the ability of the business to adjust to changing market demand and of the DevOps team to proactively manage the application and resolve bugs before they become hard-to-manage issues. Instrumenting applications was often a hard nut to crack: adding too little instrumentation would not provide enough information, while adding too much would strain resources and create performance issues.

Application Insights, currently in Preview, allows for the discovery of popular features in applications, determining the way users work with the application, as well as monitoring its health and availability.

When developing new features, Application Insights can be used to measure usage patterns in canary testing, exposing the new feature to a gradually expanding subset of users and measuring its effectiveness. When trying to decide between variants of a new feature, use A/B testing with Application Insights, exposing slightly different variants to user focus groups and finding out which is more successful.

As a cloud-based offering, Application Insights allows you to leverage the capabilities of System Center Global Service Monitor for your public web properties. By providing a web-based portal, it gives software teams the autonomy they need to successfully monitor their applications.

Given the capabilities of Global Service Monitor within Microsoft System Center, a comparison of the two options from various viewpoints provides value when selecting either or both to help realize an updated ALM strategy. When applications span both the cloud and on-premises environments, a hybrid scenario would potentially include both products in order to yield an effective insights solution.

Azure Automation Services

Figure 13. Integrated authoring, management and testing of PowerShell scripts within Microsoft Azure Automation Services

One of the foundational elements of enabling DevOps is automation. Automation removes manual steps in the process of ALM and enables repeatability and consistency across resources, improving efficiency on many levels. A service available in Microsoft Azure, called Azure Automation Services, provides automation capabilities to easily operate against cloud resources. Azure Automation Services allows PowerShell runbooks to be executed.

Azure Automation Services provides capabilities such as:

  • Centralized storage of runbooks, allowing coauthoring.
  • Scheduled execution, allowing operations to be repeated irrespective of client machine availability.
  • Recoverable execution, so that checkpoints can be captured and restartability is achieved (for example, not creating a virtual machine multiple times if a failure occurs during execution).
  • Global resources (assets), abstracting key information such as credentials and allowing easy reuse across scripts.
  • A browser-integrated authoring experience, allowing scripts to be authored, tested, and published.
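As an illustration, the sketch below shows a hypothetical runbook, authored as a PowerShell Workflow, that uses a stored credential asset, takes a checkpoint, and stops the virtual machines of a dev/test cloud service outside business hours; the asset, subscription, and service names are placeholders.

    workflow Stop-DevTestVMs
    {
        # Credentials come from a centrally stored automation asset, not from the script itself.
        $cred = Get-AutomationPSCredential -Name "DevTestAdmin"
        Add-AzureAccount -Credential $cred
        Select-AzureSubscription -SubscriptionName "DevTest"

        # A checkpoint lets the runbook resume from here if execution is interrupted.
        Checkpoint-Workflow

        foreach ($vm in Get-AzureVM -ServiceName "contoso-devtest")
        {
            Stop-AzureVM -ServiceName $vm.ServiceName -Name $vm.Name -Force
        }
    }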

Enabling modern ALM with Microsoft technologies

Given the power of the identified products and the level of integration between them, Microsoft delivers a mix of technologies that can be easily combined to deliver the capabilities outlined in this white paper.

Visual Studio enables continuous integration, automated testing, lab management, and continuous deployment when combined with Team Foundation Build or the Visual Studio Online build service, and both System Center and Application Insights enable IT operators to set solution-specific thresholds that trigger alerts for production systems.

If an issue is detected in production, it can be easily diagnosed by capturing data from the production system using IntelliTrace, and feeding this data back to the development team. The captured data provides the development team with actionable and contextual information about the issue and allows them to debug the issue in their development environment.

With System Center Monitoring Packs that span public, private, and hybrid implementations, System Center can facilitate monitoring an organization's infrastructure and applications from a "single pane of glass".

With the integration of Team Foundation Server and System Center Operations Manager, production issues can be effectively reported back to the development team and placed on the backlog for prioritization and fixing. For Microsoft Azure based solutions, Azure Resource Manager adds templating capabilities and holistic insight for a clean and continuous DevOps lifecycle.

Application Insights provides detailed information to inform business users and service teams alike on a wide range of aspects, including usage, availability, and performance, while allowing developers to effectively debug production issues.

Conclusion

A number of lessons can be learned from the ALM and DevOps approaches of startups and consumer cloud services. These learnings, technologies, and approaches can assist either the definition or evolution of the ALM strategy for an organization.

Although there are a number of third-party and community technologies in the marketplace, many vendors and community contributors concentrate on only a single ALM focus area. Microsoft stands apart in having a holistic vision that is realized across a set of integrated technologies and products. Where it can be hard to pinpoint support for an ALM solution composed from multiple offerings, Microsoft provides a single source for the entire lifecycle.

Microsoft Services can help establish an effective strategy for your ALM lifecycle and provide strategic direction, implementation guidance, delivery, and support to help organizations realize ALM strategy in their software development.

For more information about Consulting and Support solutions from Microsoft, contact your Microsoft Services representative or visit www.microsoft.com/services.