Troubleshooting Performance Analysis
There are a number of common causes of test failure which may result in the location heat tile going red or tests not being created on a device.
Performance testing is supported on Cisco devices for all test types, and on Juniper devices for all but MOS.
Troubleshooting depends on whether the Failures line on the chart shows as a coloured line or is clear.
Failures line clear
A blank Failures line indicates that the test is not running. This usually indicates the test has not been created on the device. Use a router command to determine if the test is present:
|Vendor||Router command to display test setup|
The likely causes of this are:
- SNMP read-write community string missing or mistyped on the testing device
- SNMP read-write community string mistyped in the performance test setup in Highlight
- The target device is not capable of running performance tests
- There is a missing parameter in the test setup in Highlight, for example the port number for a UDP test (typically port 7)
- The VRF field has been completed in Highlight but no VRF is defined on the device
- If the previous points have been checked, Highlight may just need to repush the test setup details. Make any change to the test setup in Highlight to force a repush. A typical option is to change the polling period by 1 second (for example, change 30 to 31). It can be changed back immediately.
Test is defined in Highlight and also appears on the router, but still no results appear in Highlight:
- ip sla responder has not been configured on the target device; this is required for Precision, MOS and UDP Echo tests
- UDP 1967 is blocked between the source and target device, which is required to set up a Precision or MOS test
- The polling interval is significantly shorter than the test interval, for example if the watch is polled every minute but the MOS test runs every 175 seconds.
Configure the test interval to be less than or equal to the polling interval
- Alternatively this is likely to be an IOS code problem. We know for example that IOS version 15.1(4)M5 does not respond to Highlight with its test results.
Upgrade the IOS
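The interval rule in the list above can be checked before a test is saved. The following is a minimal Python sketch and is not part of Highlight; the function name and the use of seconds for both values are assumptions made for illustration.

```python
def interval_is_valid(test_interval_s: int, polling_interval_s: int) -> bool:
    """Return True when the test interval is less than or equal to the polling interval."""
    if test_interval_s <= 0 or polling_interval_s <= 0:
        raise ValueError("intervals must be positive numbers of seconds")
    return test_interval_s <= polling_interval_s


# The example from the text: watch polled every minute, MOS test every 175 seconds.
print(interval_is_valid(175, 60))  # False
print(interval_is_valid(60, 60))   # True
```

A result of False suggests lowering the test interval until it is at or below the polling interval.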
Failures on chart show as a coloured line
This indicates the test has been created on the device but the test is failing.
To check if the test is running successfully on a Cisco router, issue
show ip sla statistics test#
Likely causes of this symptom include:
- There is no route between source and target or vice versa
- The test is blocked by a firewall or similar
- A source IP has been specified in the test which does not exist on the router
- ip sla responder has not been configured on the target device; this is required for Precision, MOS and UDP Echo
- (For TCP Open) The target device is not listening on the specified port
- (For HTTP Tests) Click on the chart to find out the reported cause of the failure
If, after clicking on a heat tile, a test shows a red vertical strip next to the strip chart and the cause is not covered above, we suggest you check the following:
- Check the target is not being exceeded. Refer to the blue line present in the test chart on the Details page and determine if the results currently exceed it. Similarly for a MOS test, check the MOS chart target
- Check for the presence of Failures (ICMP, UDP, TCP, HTTP, MOS)
- Check for the presence of Loss (MOS milliseconds, Precision)
If a test result is showing only irregularly across the chart, this may be a router issue.
- Check the uptime of the device from the Details > Technical page ("Has Been Up Since ...") and consider a restart if it has been up for a year or more
There are two known situations where this symptom can occur:
If one of the IP addresses used in the test contains a trailing space, then the final byte is written to the router as zero rather than the host IP. For example an IP address of "192.168.100.15 " can appear on the router as IP address "192.168.100.0".
Check the test definition and remove the space
Some tests set up for Cisco devices fail to be set correctly on the router if the IP address of the test source or target contains .0. For example 172.20.0.240 would be created as 172.20.0.0, and 10.0.56.1 would be created as 10.0.0.0
This can be discovered using "show ip sla configuration" and is likely to be caused by an IOS issue. Based on our customers' experience, the IOS versions displaying this symptom include 15.1(4)M5 and 15.1(4)M6.
The solution to the problem is to carry out an IOS upgrade of the device. The recommendation is to move to 15.2(4)M6 or above.
Workaround options we have found are:
- Use an alternative interface for the test whose IP address does not contain .0.
- Add a secondary address to the interface used for the test which does not contain .0.
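Both situations described above (a trailing space, and an address containing .0) can be spotted in a test definition before it reaches the router. The sketch below is a hypothetical helper, not a Highlight feature; the function name and the warning wording are assumptions for illustration.

```python
from typing import List


def check_test_address(raw: str) -> List[str]:
    """Return warnings for a test source or target address exactly as typed."""
    warnings = []
    if raw != raw.strip():
        warnings.append("leading or trailing whitespace")
    octets = raw.strip().split(".")
    valid = len(octets) == 4 and all(
        o.isascii() and o.isdigit() and int(o) <= 255 for o in octets
    )
    if not valid:
        warnings.append("not a dotted-quad IPv4 address")
        return warnings
    # The text describes addresses "containing .0", so look at octets 2 to 4.
    if any(int(o) == 0 for o in octets[1:]):
        warnings.append("contains a .0 octet")
    return warnings


# Addresses taken from the examples in the text, plus one clean address.
for address in ["192.168.100.15 ", "172.20.0.240", "10.0.56.1", "192.168.100.15"]:
    print(repr(address), check_test_address(address))
```

An empty list means neither known situation applies to that address. A ".0" warning only matters on the affected IOS versions.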
When trying a Techtest from the Performance tab, if the device reports "No Write Access is available – the device cannot be used for Performance Tests", check the following:
- set the SNMP RW community string in Highlight to match that configured on the device
- set RW access to the appropriate MIB area via the SNMP view statement
- set the licence on the Cisco device to be greater than "Base". Check using
show version
Alternatively, licence details are available from Reporting inventory
- check that the "SAA capacity" value at the bottom of the Technical Test Results window is non-zero.
If SAA capacity is zero then there is insufficient free memory. Try the command
show ip sla application
which displays the configured minimum memory required in order to set up another test ("IP SLA Monitor low memory water mark"). If you check the memory available, it will probably be less than this value.
You can change the minimum setting with the command
ip sla low-memory value
- test if sla number 10000000 (10 million) exists on the router:
show ip sla statistics | i Index
and see if any line has 10000000
If so remove it by configuring the router as
no ip sla 10000000
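If the filtered command output has been saved to a file or captured by a script, the search for entry 10000000 can be automated. This is a rough Python sketch; the function name is hypothetical, and the sample text is illustrative only and is not real router output.

```python
import re


def sla_entry_present(show_output: str, number: int = 10000000) -> bool:
    """Return True if any line mentions the SLA number as a whole token."""
    pattern = re.compile(rf"\b{number}\b")
    return any(pattern.search(line) for line in show_output.splitlines())


# Illustrative text only; real output from the router will look different.
sample = "Index 13524\nIndex 10000000\n"
print(sla_entry_present(sample))  # True
```

The word-boundary match avoids a false positive on a longer number that merely contains the same digits.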
This is most apparent from Reporting when the Delay Average is consistently 1ms, but the point to point distance between locations would make this figure unrealistic.
We have also seen a similar symptom where occasionally the average delay looks reasonable before cascading back to 1ms.
We have found this to be an IOS code issue for various devices running code versions 15.2(4)M3, 15.2(4)M4 or 15.2(4)M5 (and perhaps others). The recommendation is to move to 15.2(4)M6 at least.
- the "View by" period is prior to the test being configured in Highlight
- the test is configured on a child watch (e.g. Class) that is no longer valid on the bearer watch being monitored. For example this may occur if classes are discovered when monitoring interface "A" but Highlight is subsequently changed to monitor interface "B".
- the parent watch is created using SNMPv3 access. Currently Highlight allows tests to be created on an SNMPv3 parent watch but does not then collect the data. This will be fixed in a future release.
How do I determine the Highlight test number?
To determine the test number Highlight has created on the router, follow these steps:
- Select the test strip chart from Status Heat Tiles to display the Details page
- Identify the test in question and hover over the test description on the left
- A line appears starting PI_xxxxx_yyyyy (in my example PI_13048_13524)
- The test number on the router is the second number, 13524 in my example
- Display the router configuration of this test using
show ip sla configuration 13524
- Display the results in the router using
show ip sla statistics 13524
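The steps above can be scripted once the PI_xxxxx_yyyyy text has been copied from the Details page. This is a minimal sketch, assuming both parts of the name are decimal digits; the function names are hypothetical.

```python
import re
from typing import List

# Matches the start of the hover text, e.g. "PI_13048_13524".
PI_NAME = re.compile(r"^PI_(\d+)_(\d+)")


def router_test_number(pi_name: str) -> int:
    """Return the second number of a PI_xxxxx_yyyyy name."""
    match = PI_NAME.match(pi_name.strip())
    if match is None:
        raise ValueError(f"not a PI_xxxxx_yyyyy name: {pi_name!r}")
    return int(match.group(2))


def show_commands(pi_name: str) -> List[str]:
    """Build the two router commands used to inspect the test."""
    number = router_test_number(pi_name)
    return [
        f"show ip sla configuration {number}",
        f"show ip sla statistics {number}",
    ]


for command in show_commands("PI_13048_13524"):
    print(command)
```

For the example name this prints the same two commands shown above, with test number 13524.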
Does deleting a test in Highlight remove it from the router?
Deleting a test in Highlight does not remove the test from the router. Every test is written with a lifetime setting of approximately two months, after which the router will remove the test automatically. To remove it earlier, delete the test from the router manually.
What happens to tests after a device restart?
If the monitored device is restarted, Highlight will automatically rewrite an active test; it takes 3 poll cycles to determine that the test is missing. Similarly, it will take 3 poll cycles to rewrite a test once it expires.
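As a rough guide, the wait before a test is rewritten can be estimated from the polling period. The sketch below assumes the rewrite follows as soon as the third poll cycle completes, which the text does not state exactly; the names are hypothetical.

```python
POLL_CYCLES_TO_DETECT = 3  # taken from the text above


def rewrite_delay_seconds(polling_period_s: int) -> int:
    """Approximate seconds before a missing test is rewritten (assumption-based)."""
    if polling_period_s <= 0:
        raise ValueError("polling period must be a positive number of seconds")
    return POLL_CYCLES_TO_DETECT * polling_period_s


# With a 30 second polling period the estimate is 90 seconds.
print(rewrite_delay_seconds(30))  # 90
```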