Turbonomic conducted this survey to better understand the role, measurement, and mitigation of latency in the modern datacenter. The investigation extended into the implementation of containers and microservices, as these architectures introduce new latency challenges. Our hope is that the results will instigate a data-driven conversation across the broader virtual and cloud community.
The data in this report were collected through a survey conducted from July 22, 2015 to August 3, 2015. The 554 survey respondents came from across the Enterprise IT and data center landscape; all respondents were 18 years of age or older. To capture the range of characteristics in this population, respondents were identified demographically by business and environment characteristics such as role, business type, hosts in production, and virtual machines in production. The sample represents organizations spanning SMB to large enterprise, with respondents holding various roles and responsibilities within those organizations.
This survey recruited participants from an internal email database. Participants were given an opportunity to win a pass to a major industry conference by entering their email address at the completion of the survey. Additionally, participants were given the option to participate in a one-on-one interview subsequent to completing the survey. While the survey successfully recruited a significant sample size, the sample is weighted heavily toward Operations Management as a role; it was, however, well distributed across business types. Data were collected electronically through an online survey. The survey itself was designed internally by a team that included microservices subject matter experts, namely Turbonomic engineers.
The data in this survey report were collected within a twelve-day period. Progression through the twenty-nine survey questions depended on a respondent's level of interest in containers and microservices and their current adoption status. All respondents were asked the same questions about latency and its role, measurement, and mitigation within their organization. If a respondent was investigating containers or microservices as a current or future application delivery architecture, they were then asked about expected deployment time, perceived challenges and benefits, industry influence, as well as running these architectures in production or non-production environments.
We welcome your use of the results in this survey as you share insights with members of the broader IT community. Please reference Turbonomic and include our homepage URL, turbonomic.com, as you do so. A downloadable version of the complete dataset is available at github.com/turbonomic/turbonomicsurvey. Thank you.
In July 2015, Turbonomic conducted an industry survey titled How Are You Fighting Latency? The survey aimed to explore trends among three related themes:
Several high-level findings included the following:
The following analysis addresses these findings and their implications, and enables organizations to benchmark where they fall within the participant mix.
Latency is the time interval between a stimulus and a response or, more generally, the time delay between the cause and the effect of some physical change in the system being observed. In computing, latency is a physical constraint determined by the distance between networked components, their physical transmission limits, and the manner in which software interacts with the infrastructure on which it runs.
Although your purview may lead you to associate latency with a given entity, storage or network for example, latency is truly the sum of all operations – including overhead inherent in application code – required to transmit the encoded impulses that constitute a service. Even as advances in compute, storage, and network architectures have reduced latency from minutes to seconds and milliseconds, so too have advances in business raised the expectations and reliance upon increasingly fast application response times.
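The point that latency is the sum of all operations along a service path, rather than the property of any one component, can be sketched in a few lines. This is a minimal illustration, not a measurement methodology: the tier names and delay values below are hypothetical stand-ins for real compute, storage, and network work.

```python
import time

# Hypothetical per-tier delays in seconds; illustrative numbers only.
TIER_DELAYS = {"network": 0.002, "app_code": 0.005, "storage": 0.008}

def handle_request():
    """Simulate a request passing through each tier in turn.

    The user-facing latency is the sum of every operation on the
    path, so no single tier's number tells the whole story.
    """
    start = time.perf_counter()
    for tier, delay in TIER_DELAYS.items():
        time.sleep(delay)  # stand-in for real work performed by that tier
    return time.perf_counter() - start

observed = handle_request()
budget = sum(TIER_DELAYS.values())
print(f"observed latency: {observed * 1000:.1f} ms "
      f"(at least {budget * 1000:.1f} ms by construction)")
```

Note that the observed latency can only ever meet or exceed the sum of its parts; shaving milliseconds off one tier helps only in proportion to that tier's share of the total.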
Amid this shift in business expectations, a shift in software architectures – toward virtualization, containerization, and distributed computing – has introduced a new wave of complexities that threaten the millisecond scale on which we operate, and a new wave of challenges for the humans tasked with assuring this scale.
Our survey sought to understand the prevalence and measurement of latency-critical applications in today's data centers, the methods used to assure SLAs, and the perceived efficacy of these methods.
Latency: Measurement & Verticals
Latency: Mitigation Tactics
The charts on the previous pages illustrate three noteworthy survey findings. Survey Question 11, "How does your company measure network latency?" was compared against an uncharted question (Question 10) in which 90.7% (n = 356) of respondents Agreed or Strongly Agreed that the avoidance and minimization of latency is important to their company. When this proportion is considered in light of Question 11, a full 32.3% of participants either do not measure latency or do not know whether they measure it. Given the reported importance of latency mitigation in Question 10, we conclude that participants have either overstated the importance of latency mitigation, or that a sizable portion of respondents take a reactive approach to it.
In Question 7, "Approximately what proportion of your production workloads are Latency-Critical (i.e. business needs cannot tolerate high latency levels)?" 22.8% of respondents (n = 356) identified a clear majority (60%-100%) of their workloads as being Latency-Critical. Of these (n = 81), 51.8% fell within the Managed Service Provider, Financial Services & Insurance, and Healthcare verticals.
The scenarios and applications contributing to this effect are rather intuitive. Service Providers, whose customers must already contend with the speed of the Internet, cannot afford to harbor latency within their local networks that impacts service delivery. Financial Services, whose environments often consist of banking and trading applications reliant on ultra-low latency, are a logical fit. And as Meaningful Use becomes a healthcare imperative, the need for immediate access to and availability of Electronic Medical Records necessitates latency-critical classification in those environments.
IT culture is inextricably bound to the architecture it supports. The days of monolithic client-server architecture were marked by appropriately monolithic teams, each fixed on its domain and its domain alone. As IT has largely transitioned to an era marked by virtualization, cloud, and mobility, the data suggest that IT organizations have adjusted culturally as well, working across silos in a way that reflects their interdependence.
Our survey asked participants to rate their agreement, on a 4-point scale where 1 = Strongly Disagree and 4 = Strongly Agree, with the following statements:
For all participants, a majority focused on minimizing application latency across various IT domains (i.e. compute, storage, and network) (83.1%, x̄ = 3.49) as opposed to within their specific domain (71.8%, x̄ = 3.15). When scoped to the 81 participants with a clear majority of Latency-Critical workloads, the mix shifted to 100% (x̄ = 4.00) and 76.2% (x̄ = 3.29), respectively.
At a high level, this data supports the notion that as the imperative of low latency increases, so does the imperative of cross-functional latency mitigation. While additional surveying would be required to explore precisely what this means, it is clear that the old pattern of passing problems down the stack no longer suffices. Our hypothesis is that in addition to cross-domain cooperation, more and more IT professionals have become generalists of necessity; it is no longer sufficient to specialize in one limited area. IT professionals must expand their skill sets to span domains and keep pace with architectural change. Again, this hypothesis requires additional surveying.
A primary objective of this survey was to identify and rank the methods organizations use to mitigate latency. Our findings were rather pronounced across methods; the most notable, however, was that the most effective tactics are not practiced by a majority of organizations. This finding is most likely explained by budget, as the tactics deemed most effective also tend to be the most expensive.
Survey Question 14 asked, "Which tactics does your organization use to mitigate application latency (check all that apply)?" and offered the following selections:
Operations) and manual troubleshooting
Survey Question 15 asked, "For each of the practiced tactics you selected in Question 14, please rate their effectiveness at mitigating application latency within your organization."
The top tactics used to mitigate application latency (N=356):
The most effective tactics used to mitigate application latency:
When scoped to participants with a clear majority of Latency-Critical workloads, the results were thus:
The top tactics used to mitigate application latency (n=81):
The most effective tactics used to mitigate application latency:
Of particular note is that for both groups, the full participant population and the scoped sample, the usage of Infrastructure Monitoring Software was the number one tactic used to mitigate application latency. In both groups, however, this tactic failed to make the top tactics in terms of efficacy. Notable write-ins for Question 14 included the following:
Containers are not new. In fact, their conceptual lineage dates back to the 1970s and chroot jail in UNIX, wherein an application's processes and dependencies were isolated from the rest of the system.
In 2014, the rise of Docker catapulted containers back into the spotlight as a prospective and seemingly inevitable replacement for their heavier cousins, virtual machines. Since Docker made its first headline, numerous production-ready alternatives have surfaced for consideration.
Similar to how virtualization abstracts the operating system away from the hardware, containerization abstracts the application away from the operating system. This concept unlocks a world of new methods for quickly developing, deploying, and delivering applications. Containers are orders of magnitude smaller than VMs, bear a fraction of the memory overhead, and can be provisioned in a matter of seconds instead of minutes.
Our purpose in exploring containers as part of this survey was to anticipate and investigate three related phenomena: (1) The continued adoption of containers in both pre-production and production (2) The driving forces behind this adoption and (3) The emergence of new latency-related challenges in containerized environments.
Microservices is an application architecture most succinctly defined as loosely-coupled service-oriented architecture (SOA) with bounded contexts. The term was first coined in 2005 by Dr. Peter Rodgers, CEO of 1060 Research, at Cloud Computing Expo during his presentation on Service-Oriented-Development on NetKernel. Specifically, Rodgers coined the term "Micro-Web-Services."
Microservices, as opposed to monoliths, relate application components as a graph of independent, service-invoked functions, rather than a series of persistent and dependent tiers. Microservices deliver numerous benefits, including modularity, full encapsulation, and isolated persistence. Operationally, they enable continuous development/continuous integration, as well as independent testing, deployment, and scalability. When invoked, each microservices component communicates with its siblings laterally over the network, using language-agnostic APIs. When a component fails or requires an update, that component (rather than an entire tier) gets cloned or updated.
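The lateral, language-agnostic communication described above can be sketched in a few dozen lines. The sketch below is illustrative only: the service names, the `/price` endpoint, and the payload are hypothetical, and a real deployment would use separate processes or containers rather than a thread, but the shape is the same: one service exposing an HTTP/JSON API, a sibling calling it over the network.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class PricingService(BaseHTTPRequestHandler):
    """A hypothetical 'pricing' microservice exposing an HTTP/JSON API.

    Because the contract is plain HTTP and JSON, a sibling service
    written in any language could consume it unchanged.
    """

    def do_GET(self):
        body = json.dumps({"sku": "A1", "price": 9.99}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging for this sketch

# Bind to port 0 so the OS picks a free port, then serve in the background.
server = HTTPServer(("127.0.0.1", 0), PricingService)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A sibling "checkout" service calls pricing laterally over the network.
url = f"http://127.0.0.1:{server.server_port}/price"
with urllib.request.urlopen(url) as resp:
    quote = json.load(resp)

print(quote)
server.shutdown()
```

Every such lateral call crosses the network, which is precisely why critics point to east-west latency as the architecture's cost: each hop adds a serialization, transmission, and deserialization step that an in-process function call avoids.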
Our survey questioned respondents about their experience with microservices for two reasons. First, the rise of containers in production was hypothesized conducive to and/or correlated with the implementation of microservices architecture. Second, one of the greatest criticisms of microservices has been their introduction of network latency.
Critics argue that microservices trade one type of complexity (code complexity) for another (operational complexity). These viewpoints notwithstanding, our survey sought to understand microservices as they exist in the wild, at least as they do in 2015.
Our findings are consistent with prevailing commentary that, despite a great deal of hype and publicity surrounding containers, very few organizations are actually using them. Question 16 asked participants, "Is your organization investigating using containers (i.e. Docker, CoreOS, LXC) for future deployment?" Of all respondents (n = 354), just 25% answered either 'Yes' (21%) or 'We already use them' (4%); an overwhelming 75% answered 'No'. While the promise and specific use cases of containers are well known, documented, and widely accepted, the data suggest that, at least for now, most organizations find their virtualized or bare-metal deployments to be sufficient.
When scoped to the subset of respondents already using containers in their production environment, 42.9% (n = 14) have IT budgets of $5 million or greater and 50% have 250 or more hosts. Compared to the overall participant population, just 16% have budgets of $5 million or greater and 23% have 250 or more hosts. It is evident from the container-adopting sample that organizations with greater spending power are more prone to experiment with next-generation technologies like containerization. One must interpret this data and analysis with caution, as the sample size representative of container adopters is small.
Although at this time a majority of respondents are dismissive of containers, a similar survey in the future is likely to produce different results, especially since 84% of container contemplators plan to deploy in 2016 or beyond.
The momentum of central players like Docker and CoreOS, as well as the significant investments made by industry leaders VMware (Project Photon), Red Hat (Project Atomic), Microsoft (Hyper-V/ Windows Server Containers), and others, indicate that container technology is widely regarded as the future. It simply remains in its infancy.
59% of container contemplators (n = 71, those who responded 'Yes' to Q16) plan to adopt Docker, and 71% of container adopters have chosen Docker. Clearly, the benefits of first-mover advantage are playing out for the container leader. Differentiation is a particular challenge for every standard in a category with so little room to differentiate; containers, by their nature, are simple constructs. CoreOS' Rocket has positioned itself as a purer, more secure alternative to Docker. Management frameworks such as Docker Hub are one means of differentiation; however, a burgeoning landscape of third-party tools dilutes much of this advantage.
"lmctfy" was the only notable mention of non-Docker/CoreOS/LXC container standards under consideration.
Just 11% of respondents (n = 348) are currently running microservices in production. Given that only 4% of respondents are currently using containers, it can be concluded that a majority of existing microservices deployments are running on virtual machines (VMs).
Similar to the trends observed with containers, a disproportionately high number of respondents that have deployed microservices are from companies with IT budgets of $5 million or greater (x̄ = 32.4%). Microservices, it is said, trade code complexity for operational complexity. A portion of this operational complexity is that discrete services are assigned to many small development teams who develop, refine, and own that service. It follows logically that the organizations most capable of operating in this fashion are those with the budgets to support the requisite developer headcount.
While the primary driver for adopting both containers and microservices is implementing DevOps, we were surprised to find that the top challenge in mitigating both container- and microservices-related latency was fighting storage latency. The top criticism of these architectures is the east-west network latency they generate. We expected our survey to support this sentiment; however, given the low adoption rates for both, we believe that future surveying, when microservices are more mature, will yield different challenge rankings.