In the hybrid cloud era, network monitoring is no longer restricted to your data centers and offices. The modern enterprise network spans your data centers and offices, and most likely extends to multiple cloud services. The hybrid network scenario brings with it new requirements for administrators in several areas, including availability, security, and performance. It often brings a more complex network structure, including several locations and networks, with varying connectivity and security requirements.
You also no longer have full control over each part of the network. For example, if you extend your local network to Azure, you cannot see all details of the VPN gateway on the Azure side. From a network security perspective, you now have many more endpoints to protect and many more points of ingress into your network. From a performance perspective, you can no longer rely on a stable, high-speed local network, as your network now contains many different types of networks and connections of varying quality.
In this chapter, we will cover the capabilities of the Network Performance Monitoring (NPM) feature of OMS, including:
NPM supports near real-time monitoring of network performance, collecting information such as packet loss and latency, which enables you to diagnose and troubleshoot network issues. OMS can be configured to generate an alert when a configured threshold is reached in data collected by the NPM solution. The thresholds can be learned and configured automatically based on collected data, or configured manually by an administrator. The NPM dashboard displays a summary of your network health. As with other solutions, you can drill into the data to find details about issues such as unhealthy network links or packet loss.
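To make the idea of a manually configured threshold concrete, the following is a minimal Python sketch, not NPM's actual implementation. The names `LinkSample`, `Thresholds`, and `breaches` are hypothetical, and the numeric limits are illustrative only. It shows how a measured packet loss and latency value could be compared against administrator-defined limits to decide whether an alert is warranted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSample:
    """One measurement for a network link (hypothetical structure)."""
    source: str
    destination: str
    loss_percent: float
    latency_ms: float


@dataclass
class Thresholds:
    """Manually configured limits (illustrative values, not NPM defaults)."""
    max_loss_percent: float
    max_latency_ms: float


def breaches(sample: LinkSample, limits: Thresholds) -> List[str]:
    """Return a description of every threshold the sample reaches or exceeds."""
    reasons = []
    if sample.loss_percent >= limits.max_loss_percent:
        reasons.append(f"packet loss {sample.loss_percent}% >= {limits.max_loss_percent}%")
    if sample.latency_ms >= limits.max_latency_ms:
        reasons.append(f"latency {sample.latency_ms} ms >= {limits.max_latency_ms} ms")
    return reasons


if __name__ == "__main__":
    sample = LinkSample("172.16.200.0", "10.1.4.0", loss_percent=12.0, latency_ms=40.0)
    for reason in breaches(sample, Thresholds(max_loss_percent=10.0, max_latency_ms=100.0)):
        print(f"ALERT {sample.source} -> {sample.destination}: {reason}")
```

An empty result means the link is within its limits. Automatically learned thresholds would replace the fixed `Thresholds` values with values derived from historical data; how NPM derives them is not described here.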
The primary method of data collection in NPM is synthetic transactions. NPM uses TCP ping or ICMP ECHO to track latency and packet loss. Only control packets are exchanged, not data packets, so the solution has no noticeable effect on bandwidth. The tests run every 5 seconds, and the results are sent to OMS every 3 minutes. As with other OMS solutions, all data collected by NPM is accessible with Log Analytics search.
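The relationship between the 5-second test interval and the 3-minute upload interval can be illustrated with a short Python sketch. This is an assumption about how such results could be summarized, not a description of the agent's internals; the function name `summarize_window` and the output field names are hypothetical. A lost probe is represented as `None`.

```python
from statistics import mean
from typing import Dict, List, Optional

TEST_INTERVAL_SECONDS = 5        # from the text: a test runs every 5 seconds
UPLOAD_INTERVAL_SECONDS = 180    # from the text: results are sent every 3 minutes
PROBES_PER_UPLOAD = UPLOAD_INTERVAL_SECONDS // TEST_INTERVAL_SECONDS


def summarize_window(rtts_ms: List[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one upload window; None entries are probes with no reply."""
    if not rtts_ms:
        raise ValueError("at least one probe result is required")
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    lost = len(rtts_ms) - len(answered)
    return {
        "probes": len(rtts_ms),
        "loss_percent": 100.0 * lost / len(rtts_ms),
        # Average latency is undefined when every probe was lost.
        "avg_latency_ms": mean(answered) if answered else None,
    }


if __name__ == "__main__":
    print(f"Probes per upload window: {PROBES_PER_UPLOAD}")
    print(summarize_window([10.0, None, 20.0, 30.0]))
```

With the intervals given above, each upload would summarize 36 probes per monitored link.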
We will begin with a walkthrough of how to configure the NPM solution. Installing and configuring the NPM solution is divided into four major steps:
Enabling the NPM solution is done in the same way as all other OMS solutions. Simply navigate to the Solutions Gallery and add the solution. Figure 1 shows the Network Performance Monitor solution in the Solutions Gallery.
FIGURE 1. THE NETWORK PERFORMANCE MONITOR SOLUTION
When you first navigate to the NPM dashboard, you will see the configuration page, shown in Figure 2. On this page, you can choose to use either ICMP or TCP for synthetic transactions. If you select ICMP, an ICMP ECHO message will be sent to estimate network latency. ICMP ECHO uses the same message as Ping. If you select TCP, a TCP SYN packet will be sent to the other NPM agent. The second NPM agent will reply to complete the TCP handshake, and the connection will then be torn down with RST packets. There are some things to think about before choosing a protocol:
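The TCP variant of the test can be approximated with a few lines of Python. This is a rough sketch and not the NPM agent's code: it times a full TCP connect (SYN, SYN-ACK, ACK) using the standard `socket` module and then closes the connection normally, whereas NPM is described above as removing the connection with RST packets. The function name `tcp_connect_latency_ms` and the host name in the usage example are hypothetical.

```python
import socket
import time
from typing import Optional


def tcp_connect_latency_ms(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Time a TCP handshake to host:port; return None if it fails or times out."""
    start = time.perf_counter()
    try:
        # create_connection returns once the three-way handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
    except OSError:
        # Refused, unreachable, or timed out: treat the probe as lost.
        return None
    return elapsed * 1000.0


if __name__ == "__main__":
    # "agent02.example.local" is a placeholder for another monitoring node.
    result = tcp_connect_latency_ms("agent02.example.local", 8084)
    print("probe lost" if result is None else f"handshake took {result:.2f} ms")
```

A probe like this only succeeds if something is listening on the target port and every firewall along the path allows the traffic, which is one of the considerations when choosing between TCP and ICMP.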
FIGURE 2. NPM CONFIGURATION PAGE
If you select to use TCP for synthetic transactions, you will see a configuration steps overview page, shown in Figure 3. This page outlines all the steps needed to configure the agent, also described later in this chapter.
FIGURE 3. TCP CONFIGURATION
NPM uses the Microsoft Monitoring Agent to perform synthetic tests and collect data. It is recommended to have at least two agents on each subnet that you plan to monitor. More agents running the synthetic tests means more collected data and more required storage, but also more granular data when you are troubleshooting network issues.
On the first configuration page, we chose to use TCP for synthetic tests. These tests are executed on port 8084. You can change the default port, but then you need to change the port on all agents. OMS provides a PowerShell script that opens the needed port in the local Windows Firewall and configures a couple of registry keys. If you have other firewalls in your network, or if you use Network Security Groups (NSGs) in Azure, you must manually configure them to allow the NPM traffic.
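When planning agent placement, it can help to check the two-agents-per-subnet recommendation against an inventory. The following Python sketch uses only the standard `ipaddress` module; the function name `subnets_below_minimum`, the /24 prefix lengths, and the agent addresses are illustrative assumptions, not values taken from NPM.

```python
import ipaddress
from typing import Dict, List


def subnets_below_minimum(subnets: List[str], agent_ips: List[str], minimum: int = 2) -> List[str]:
    """Return the subnets that contain fewer than `minimum` agent addresses."""
    networks = [ipaddress.ip_network(s) for s in subnets]
    counts: Dict[str, int] = {str(n): 0 for n in networks}
    for ip in agent_ips:
        address = ipaddress.ip_address(ip)
        for network in networks:
            if address in network:
                counts[str(network)] += 1
                break  # count each agent once, in the first matching subnet
    return [str(n) for n in networks if counts[str(n)] < minimum]


if __name__ == "__main__":
    # Prefix lengths and agent addresses are assumed for the example.
    planned_subnets = ["10.1.4.0/24", "172.16.200.0/24"]
    agents = ["10.1.4.10", "10.1.4.11", "172.16.200.5"]
    for subnet in subnets_below_minimum(planned_subnets, agents):
        print(f"{subnet} has fewer than two agents")
```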
NPM uses the NPMDAgent.exe application to run synthetic transactions. The application is downloaded to the agent machine as soon as the NPM solution is added to the OMS workspace, even if the agent is not configured as an NPM node.
If you select the ICMP protocol for synthetic transactions, no manual configuration is needed on each agent.
To configure a Windows Server to allow the NPM TCP traffic and be discovered as a monitoring node, perform the following steps:
It is possible to run the EnableRules.ps1 script with the portNumber parameter to specify another port. It is also possible to run the script with the DisableRule parameter to delete firewall rules.
FIGURE 4. RUNNING THE ENABLERULES.PS1 SCRIPT
The PowerShell script creates five inbound firewall rules, shown in Figure 5.
FIGURE 5. FIREWALL RULES CREATED BY THE ENABLERULES.PS1 SCRIPT
Once the solution is added, and the agents are deployed and configured, it is time to configure networks. In NPM, a network is one or more subnets, logically grouped together in OMS. You will notice there is a default network, which contains all subnets that are not specified in a user-defined network. The networks you create can have any structure; they do not have to reflect your current network layout. For example, you can base the networks on services instead of the actual network structure.
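The grouping logic described above, where any subnet not placed in a user-defined network falls into the default network, can be modeled in a few lines of Python. This is only a conceptual sketch of the data model; the function name `group_subnets`, the subnet values, and the second network name are hypothetical.

```python
from typing import Dict, List


def group_subnets(
    discovered: List[str],
    user_networks: Dict[str, List[str]],
    default_name: str = "Default",
) -> Dict[str, List[str]]:
    """Assign each discovered subnet to its user-defined network or the default."""
    owner = {subnet: name for name, members in user_networks.items() for subnet in members}
    grouped: Dict[str, List[str]] = {name: [] for name in user_networks}
    grouped[default_name] = []
    for subnet in discovered:
        grouped[owner.get(subnet, default_name)].append(subnet)
    return grouped


if __name__ == "__main__":
    discovered = ["10.1.4.0", "10.1.5.0", "172.16.200.0"]
    # Networks can follow services rather than physical layout.
    networks = {"North Production Network": ["10.1.4.0", "10.1.5.0"]}
    for name, subnets in group_subnets(discovered, networks).items():
        print(f"{name}: {subnets}")
```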
When you navigate to the NPM Configuration dashboard, you will see five categories on the left side: TCP Setup, Networks, Subnetworks, Nodes, and Monitors. TCP Setup is the main configuration page, shown in Figure 6. The TCP Setup page describes the steps needed to enable and configure NPM. It also contains a direct link to the PowerShell script for agent configuration.
FIGURE 6. THE TCP SETUP PAGE
On the Networks tab, you can configure and review networks. There is also a list of unallocated subnetworks, which are discovered subnetworks not yet included in a network. In Figure 7, you can see that all subnetworks are in a network named Default.
FIGURE 7. CONFIGURATION OF NETWORKS
The Subnetworks tab shows all discovered subnetworks. For each subnetwork, you can enable or disable the option "Use for monitoring", as shown in Figure 8. In the Subnetworks view you can also configure which nodes within the subnetwork you want to use for monitoring.
FIGURE 8. SUBNETWORKS CONFIGURATION
On the Networks page, you can click "Add network" and define a network based on the discovered subnetworks. Figure 9 shows how a network with two subnetworks is configured. In Figure 9, you can also see a description for each subnetwork. The description is configured per subnetwork on the Subnetworks page.
FIGURE 9. CREATING A NEW NETWORK
In the Nodes view, you work with your monitoring nodes. In Figure 10, you can see the Nodes view, including the "Use for monitoring" option. With this option, you can disable or enable nodes for monitoring. If you disable a node that is the only node on a subnetwork, monitoring of that subnetwork will also be disabled.
FIGURE 10. CONFIGURATION OF NODES
On the Monitors tab, shown in Figure 11, you can configure monitoring rules. There is a default rule that monitors connectivity between all networks and subnetworks. This default rule cannot be deleted, but you can disable it. In Figure 11 you can also see how to configure the protocol per rule.
FIGURE 11. CONFIGURATION OF MONITORING RULES
Figure 12 shows the page for configuring a new rule. In Figure 12, you can see that user-defined networks show up in this view. A good naming convention will make it easier to build new rules.
FIGURE 12. CONFIGURATION OF MONITORING RULES
The network you select in the first drop-down will be the source network for the tests, such as the 'North Production Network' shown in Figure 12. Figures 13 and 14 show two different monitoring rules. The difference is the network selected in the first drop-down.
FIGURE 13. CONFIGURATION OF NEW MONITORING RULE
FIGURE 14. CONFIGURATION OF NEW MONITORING RULE
Figure 15 shows the log for network node links. You can see that the source network is always the network selected first.
FIGURE 15. REVIEW COLLECTED NETWORK NODE LINK DATA
When configuring networks in the portal UI, you will see a link named "Create Alerts". This link can be used to enable an alert for the monitor rule you have configured. If you click the "Create Alerts" link, an alert rule will be automatically created based on the monitor rule, as shown in Figure 16. Once you have clicked "Create Alerts", the link will change to "Manage Alerts". You can also find the new alert under Alerts on the Settings page. On the page shown in Figure 16, you can reconfigure alert settings if needed.
FIGURE 16. ALERT RULE FOR NETWORK LINK MONITORING
When all deployment and configuration is complete, it is time to review and monitor collected data. The NPM solution includes a default dashboard, shown in Figure 17. The default dashboard provides an overview of network health and connectivity.
FIGURE 17. DEFAULT NPM DASHBOARD
The default NPM dashboard, shown in Figure 17, includes the following blades
FIGURE 18. NETWORK PATHS
In most of the default NPM views, you can click the Action tab and then enable auto-refresh, as shown in Figure 19. Auto-refresh will automatically update the view with the latest information.
Note: It is important to know that auto-refresh is configured per view. In some scenarios, it can be misleading when one view has auto-refresh enabled and another does not.
The ability to select a snapshot on the same tab is a great capability to have when troubleshooting. You can easily review the status at a specified point in time.
FIGURE 19. CONFIGURATION OF TIMEFRAME
When there is a health event or alert, you can click on it and drill down for deeper analysis and troubleshooting. Figure 20 shows an unhealthy network link, and it seems that agents on the 172.16.200.0 subnetwork cannot connect to the 10.1.4.0 subnetwork, but other agents can. This is also an example of how important it is to use multiple agents to test connectivity between subnetworks. In this example, we can see that it is most likely an isolated incident, as other communication is working in and out of both affected subnets.
We can click on the different blades, shown in Figure 20, to drill deeper into this information, and use the sample queries to drill into the raw data collected. In Figure 21 we can see when this problem started, and we can also see details about which tests are currently working to and from the affected subnetworks.
FIGURE 20. ERROR SHOWN IN THE DEFAULT DASHBOARD
These servers were running in Azure, and OMS can collect activity logs from Azure too. Figure 21 lists activity logs from Azure. There we can see all changes and, in this example, that someone had changed a firewall rule, which blocked the traffic.
FIGURE 21. DETAILS ABOUT NETWORK SECURITY GROUP CHANGE
Microsoft has delivered several improvements in NPM in recent months. A few of the key enhancements are described here.
Many customers are using NPM in complex networks, with Microsoft Monitoring Agents installed on multiple nodes. In the past, it has been difficult to determine why an agent is not working as expected. Microsoft has now added agent diagnostic capabilities to the solution, which will help you keep tabs on any health and configuration problems with NPM agents in your network. You can now view the health of all NPM agents in a single view, find those that are misconfigured or unresponsive, and get actionable diagnostic information to resolve the issues.
NPM now provides a hop-by-hop breakdown of latency between two points in your network, on the topology map. This ability complements the other capabilities of the topology map, such as fault localization, path filters, hop compression slider, and advanced search. With latency data on each hop, you can now isolate network latency by identifying problem spots that occur along the network path.
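The reasoning behind hop-by-hop isolation can be sketched in Python. This example assumes, purely for illustration, that you have a cumulative round-trip latency from the source to each hop along a path (similar to traceroute output); the function name `slowest_hop`, the hop names, and the data format are hypothetical and do not reflect how NPM stores or exposes topology data.

```python
from typing import List, Tuple


def slowest_hop(cumulative_ms: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the hop that adds the most latency, given cumulative latencies."""
    if not cumulative_ms:
        raise ValueError("the path must contain at least one hop")
    worst_name, worst_delta = "", -1.0
    previous = 0.0
    for name, total in cumulative_ms:
        # Latency added by this hop; clamp jitter-induced negatives to zero.
        delta = max(total - previous, 0.0)
        if delta > worst_delta:
            worst_name, worst_delta = name, delta
        previous = total
    return worst_name, worst_delta


if __name__ == "__main__":
    path = [("hop1", 2.0), ("hop2", 3.5), ("hop3", 48.0), ("destination", 50.0)]
    name, added = slowest_hop(path)
    print(f"{name} adds the most latency: {added} ms")
```

In this made-up path the end-to-end latency is 50 ms, but almost all of it is introduced at a single hop, which is the kind of problem spot the per-hop view is meant to expose.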
NPM is now available in the Azure portal. You can now add NPM from the Azure Marketplace, and use the solution in the Azure portal itself to monitor your environment. You can also continue to use the solution in the OMS portal.
Slow networks can lead to slow applications and affect business-critical services. Network Performance Monitor is a solution for near real-time monitoring of your network that provides monitoring, diagnostics, and troubleshooting for network-related issues with minimal configuration effort. As the solution does not require access to network devices, it is easy to get started. In this chapter, we have looked at what Network Performance Monitor is and how to get started. We walked through the configuration steps on the agent and how to model your network in the OMS portal. Finally, we looked at how to analyze the data collected by NPM.