Understanding Heat Tiles
What are heat tiles?
Highlight’s unique heat tiles instantly show a clear status of both network and application services. They are more powerful than a simple traffic light or dashboard display because they also show trends. Heat tiles are time-based: one incident does not make a network link bad, nor is a link good the minute a long-standing problem is resolved.
Highlight uniquely measures performance using problem levels: an ongoing rating that smooths the display and indicates whether a situation is improving or deteriorating. There are two types of heat tile:
Location tiles: every location within your network is shown as an individual summary tile, with chevrons to indicate improving or deteriorating issues across stability, load and health metrics.
Service tiles: group watches from any location in your network into a single tile, and custom-define thresholds to change the tile colour based on the total number of issues.
Warning Triangle: Number of issues
Impact Meter: Percent of total tile elements with issues (see examples below)
|Explanation||1 amber issue||at least 1 red issue, 3 total issues||at least 1 red issue, 3 total issues|
|Number of issues||1||3||3|
|Total elements||many (50)||6||3|
|Percent of total||approx. 2%||50%||100%|
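The Impact Meter figures in the table above are a simple ratio of issues to tile elements. A minimal sketch of that arithmetic follows; the function name `impact_percent` and the rounding to a whole percent are assumptions for illustration, not Highlight's actual implementation.

```python
def impact_percent(issues: int, total_elements: int) -> int:
    """Return the share of tile elements with issues, as a whole percent.

    Rounding to the nearest whole number is an assumption; the table
    only shows approximate values.
    """
    if total_elements <= 0:
        raise ValueError("total_elements must be positive")
    if not 0 <= issues <= total_elements:
        raise ValueError("issues must be between 0 and total_elements")
    return round(100 * issues / total_elements)


# The three examples from the table: (number of issues, total elements).
for issues, total in [(1, 50), (3, 6), (3, 3)]:
    print(f"{issues} of {total} elements: {impact_percent(issues, total)}%")
```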
These tiles are created by adding watches to a container and setting it to display as a tile.
Service Tile Example
A service tile for email could comprise the summary status of:
- A host watch for the email server (CPU/memory/disk)
- TCP tests from the local router to the server listening ports (application status)
- Link Health watches for the Internet service
- WAN tests from remote locations to the Data Centre (to assess impact of network)
Problem levels work like a fuel gauge: whenever an error condition is encountered in any element, a counter is decreased; in all other circumstances it is increased. When the counter falls below pre-set thresholds, Highlight signals first an amber condition and then a red one. The conditions that decrease the counters are listed below, grouped under Stability, Load and Health, which we refer to as level 3 metrics.
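The counter behaviour described above can be sketched as follows. This is only an illustration: the class name `ProblemCounter`, the counter range, the step size and both thresholds are hypothetical values, since the actual pre-set thresholds are not stated here.

```python
from dataclasses import dataclass


@dataclass
class ProblemCounter:
    # All numeric values below are illustrative assumptions,
    # not Highlight's real settings.
    level: int = 10        # current "fuel gauge" reading
    maximum: int = 10      # the gauge cannot rise above this
    amber_below: int = 7   # below this reading the state is amber
    red_below: int = 4     # below this reading the state is red

    def record(self, error: bool) -> None:
        """Decrease the counter on an error, increase it otherwise."""
        if error:
            self.level = max(0, self.level - 1)
        else:
            self.level = min(self.maximum, self.level + 1)

    @property
    def colour(self) -> str:
        if self.level < self.red_below:
            return "red"
        if self.level < self.amber_below:
            return "amber"
        return "green"


counter = ProblemCounter()
history = []
# Seven consecutive errors followed by seven clean samples.
for error in [True] * 7 + [False] * 7:
    counter.record(error)
    history.append(counter.colour)
print(history)
```

With these assumed values a single error leaves the state green, and a resolved problem passes back through amber before returning to green, which matches the smoothing behaviour described for problem levels.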
Decrease the counter if any of the following conditions are met, which we refer to as level 2 metrics affecting stability:
- The device loses connection with either of the Highlight pollers
- The monitored interface is down or indicating a brief outage or has been taken out of service or no longer exists
- There has been a device restart
- A switch port designated as critical is down
- Performance tests: ICMP Ping, UDP Echo and TCP Open: 100% packet loss of all tests in a sample
- Performance tests: HTTP and HTTPS: Application failure, HTTP response 4XX or timed out
- Performance test: MOS: Application failure, MOS is less than 1.0
- Performance test: Precision Delay: 100% packet loss of all tests in a sample. Health index is also affected.
Decrease the counter if any of these conditions are met, which we refer to as level 2 metrics affecting load:
- Link utilisation (traffic in or out) exceeds threshold (default is 80%)
- Tunnel utilisation (traffic in or out) exceeds threshold (default is 82%)
- Traffic in or out on a dormant watch exceeds threshold (default is 1000 Kbps)
- CPU for a router exceeds 60%
- CPU for a host exceeds threshold (default is 75%)
- Client Count exceeds threshold (default is 30 client devices)
- Wireless Utilisation exceeds threshold (default is 50%)
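The load conditions above amount to comparing each reading with its threshold. The sketch below uses the default values listed above; the dictionary keys and the function name `load_breaches` are hypothetical names chosen for illustration, and "exceeds" is taken to mean strictly greater than.

```python
# Default thresholds as listed above. The key names are hypothetical.
DEFAULT_LOAD_THRESHOLDS = {
    "link_utilisation_pct": 80,
    "tunnel_utilisation_pct": 82,
    "dormant_traffic_kbps": 1000,
    "router_cpu_pct": 60,
    "host_cpu_pct": 75,
    "client_count": 30,
    "wireless_utilisation_pct": 50,
}


def load_breaches(readings, thresholds=DEFAULT_LOAD_THRESHOLDS):
    """Return the sorted names of readings that exceed their threshold.

    Readings without a known threshold are ignored.
    """
    return sorted(
        name
        for name, value in readings.items()
        if name in thresholds and value > thresholds[name]
    )


sample = {"link_utilisation_pct": 80, "host_cpu_pct": 76, "client_count": 31}
# Link utilisation is exactly at the threshold, so it does not count.
print(load_breaches(sample))
```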
Decrease the counter if any of these conditions are met, which we refer to as level 2 metrics affecting health:
- Link errors exceed threshold (default is 1% or 10,000 packets per million)
- Link congestion occurs:
- Queue length exceeds 0
- Discards exceed threshold (default is 1% or 10,000 packets per million)
- Class drops exceed 0
- Broadband Clarity:
- Connection speed of the broadband service drops below the speed threshold, which is auto-learned or manually set
- Cellular Clarity:
- Signal strength (RSSI) of the cellular service drops below the threshold, which may be set (default is -120 dBm)
- Physical Memory utilisation exceeds threshold (default is 80%)
- Disk utilisation exceeds threshold (default is 80%)
- Congestion (discards) exceed threshold (default is 1% or 10,000 packets per million) or
- Signal Problems exceeds threshold (default is 25%)
- Performance tests - ICMP Ping, UDP Echo and TCP Open:
- Any one of the tests in a sample shows a response that exceeds the target
- At least one test in a sample fails to respond (lost packet)
Note: One sample can contain up to six test results; if all tests in a sample fail, it affects both stability and health.
- Performance tests - HTTP and HTTPS:
- Page load response exceeds target
- Performance tests - Precision Delay and MOS:
|Condition||Precision Delay||MOS|
|Average response from the burst exceeds response target||Yes||Yes|
|Percentage of lost packets exceeds packet loss target||Yes||Yes|
|Jitter measured over the burst exceeds jitter target||Yes||Yes|
|MOS Score is less than target||N/A||Yes|
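The rules for ICMP Ping, UDP Echo and TCP Open samples can be sketched as below. The function name `classify_sample` and the convention of using `None` for a lost packet are assumptions for illustration only.

```python
from typing import List, Optional, Set


def classify_sample(responses_ms: List[Optional[float]], target_ms: float) -> Set[str]:
    """Return which level 3 metrics one sample affects.

    Each entry is a response time in milliseconds, or None for a lost
    packet (an assumed convention). A sample holds up to six results.
    """
    affected: Set[str] = set()
    if not responses_ms:
        return affected
    lost = [r is None for r in responses_ms]
    # 100% packet loss across the sample affects stability.
    if all(lost):
        affected.add("stability")
    # Any lost packet, or any response over target, affects health.
    if any(lost) or any(r is not None and r > target_ms for r in responses_ms):
        affected.add("health")
    return affected


print(sorted(classify_sample([None] * 6, target_ms=100)))
print(sorted(classify_sample([20.0, None, 30.0], target_ms=100)))
print(sorted(classify_sample([20.0, 30.0], target_ms=100)))
```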
Note that each level 2 metric above can trigger its own alert. For example, you may get an alert when a heat tile goes red because of Inbound Link utilisation, then another alert caused by Outbound Link utilisation, even though the tile is already red. The tile colour represents the worst case of all level 2 metrics associated with it.
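The "worst case" rule can be illustrated with a short sketch. The metric names and the function `tile_colour` are hypothetical; only the ordering green, amber, red comes from the text.

```python
# Higher number means worse state.
SEVERITY = {"green": 0, "amber": 1, "red": 2}


def tile_colour(metric_colours):
    """Return the worst colour among a tile's level 2 metrics.

    A tile with no metrics is treated as green (an assumption).
    """
    return max(metric_colours.values(), key=SEVERITY.__getitem__, default="green")


metrics = {
    "inbound_link_utilisation": "red",
    "outbound_link_utilisation": "red",
    "link_errors": "amber",
}
# Two metrics are red, so two alerts may fire, but the tile is simply red.
print(tile_colour(metrics))
```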
The Status Heat tiles page refreshes every 180 seconds by default, with auto refresh on.
Alternatively, switching auto refresh off keeps the page unchanged until you select a different position in the network explorer tree.
Use the options panel for services and locations to display only certain colour tiles, adjust the size of your tiles, or group, arrange or filter tiles.
- Open the options panel using the chevron button. If you leave this panel open, it will be open when you next log in.
- If any settings have changed from default, the chevron background changes to bright blue and the "reset all filters" button is enabled. These options revert to default when you next log in but tile size remains as previously set.
Refer to the Tile colours displayed and Tile resizing sections below for details on these features.
Only selected folder
By default all service tiles are shown which contain any watch from the currently selected folder or below. By checking this option only services defined in the selected folder are displayed.
- By default show location tiles
- Optionally view locations in a grid
Refer to the Tile colours displayed and Tile resizing sections below for details on these options.
Group by folder
By default all locations below the selected folder are shown unless this option is checked. Multiple location tiles are shown as a single, stacked folder tile, which simplifies the location tile view for large networks. The overall number of issues is shown and not filtered by stability, load or health. Click a stacked tile to drill down to subfolders and locations.
- Sort A to Z
- By default tiles are arranged alphabetically.
- Issue duration
- Tiles are displayed in three groups: Now → 24 hours, 1 → 7 days and 7+ days. The newest issue in each section is in the top left. By default, stability, load and health issues are included; deselect one or more to change this (see Issues below). Note: As green tiles do not have any issues, they are not included when arranging by issue duration.
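Arranging by issue duration can be pictured as bucketing tiles by the age of their issue. In the sketch below the location names, the function names and the exact handling of the 24 hour and 7 day boundaries are assumptions; a tile with no issue (green) is represented by `None` and left out.

```python
from datetime import timedelta
from typing import Dict, List, Optional

GROUPS = ("Now → 24 hours", "1 → 7 days", "7+ days")


def duration_group(age: timedelta) -> str:
    """Map an issue age to one of the three display groups."""
    if age < timedelta(hours=24):
        return GROUPS[0]
    if age < timedelta(days=7):
        return GROUPS[1]
    return GROUPS[2]


def arrange_by_duration(tiles: Dict[str, Optional[timedelta]]) -> Dict[str, List[str]]:
    """Group tiles by issue age, newest issue first within each group."""
    arranged: Dict[str, List[str]] = {group: [] for group in GROUPS}
    # Green tiles have no issue age and are skipped.
    with_issues = [(age, name) for name, age in tiles.items() if age is not None]
    for age, name in sorted(with_issues):
        arranged[duration_group(age)].append(name)
    return arranged


# Hypothetical locations and issue ages.
tiles = {
    "Leeds": timedelta(hours=5),
    "York": timedelta(days=3),
    "Bath": timedelta(minutes=30),
    "Hull": None,
    "Kent": timedelta(days=10),
}
for group, names in arrange_by_duration(tiles).items():
    print(group, names)
```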
By default, all issue types are shown: stability, load and health. Use these icons to hide or show one or more issue types. These buttons have no effect on green tiles.
React to issues: Any or All
Refer to the section below for details on these options.
- A summary of heat tiles in each state (red, amber and green) is displayed in the panel header at all times
- Use the check box below or click on each colour to hide or show those tiles in the display
- Use the "reset all filters" button to return all settings to their default
The display resets to show all colours each time you log in
Heat tiles can be resized using the slider control with changes automatically applied. The sizes available are from 100 to 400 pixels wide. Depending on your browser, you may be able to incrementally adjust tile size by clicking the slider control then using the UP and RIGHT arrows on your keyboard to increase or the DOWN and LEFT arrows to decrease.
Use to restore the tile size to the default width:
- Service tiles
- 185 pixels
- Location tiles
- 150 pixels
Your selected tile size will be remembered when you next log in
Watch-centric (React to issues: Any)
With this default setting, Highlight's location tiles change colour based on the worst case of any single watch in the location. This view enables service provider operations teams to quickly see locations with issues to be resolved but can occasionally result in a sea of red tiles for relatively minor issues.
With Any selected on the "React to issues" toggle, location tiles are:
- Green: no issues
- Amber: some watches are in a transitional (amber) state
- Red: some watches are in a degraded (red) state
Location-centric (React to issues: All)
Highlight provides an alternative view which is based on the location as a whole, rather than on any single watch. In this view an amber tile indicates a location with degraded connectivity which is still online. This view is beneficial to enterprises and service providers wanting a higher-level overview of the entire network.
With All selected on the "React to issues" toggle, location tiles are:
- Green: no issues
- Amber: some watches are in a transitional or degraded (amber or red) state, or all watches are amber
- Red: only when all watches have issues (amber or red) and at least one is red
*Note: the location tile is also amber when all
- This setting is remembered when you next log in
- The summary of heat tiles in each state (red, amber and green) adapts to the Any/All setting
There is one watch in the Cellular Clarity location with a health issue which has been ongoing for 5 hours.
With "react to ANY issues" set, the Cellular Clarity location tile is red.
With "react to ALL issues" set, the Cellular Clarity location tile is amber.
The tile will only turn red if all 4 watches at the location are red.
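The difference between the two settings can be sketched as follows. This follows the rules listed for each "React to issues" setting earlier in this section; the function name `location_colour` and the representation of watches as a list of colour strings are assumptions for illustration.

```python
def location_colour(watch_colours, react_to="any"):
    """Return a location tile colour from its watches' colours.

    watch_colours is a list of "green", "amber" or "red".
    react_to is "any" (watch-centric) or "all" (location-centric).
    """
    if react_to not in ("any", "all"):
        raise ValueError("react_to must be 'any' or 'all'")
    if all(colour == "green" for colour in watch_colours):
        return "green"
    has_red = "red" in watch_colours
    if react_to == "any":
        # Worst case of any single watch.
        return "red" if has_red else "amber"
    # Location-centric: red only when every watch has an issue
    # and at least one of them is red.
    all_have_issues = all(colour != "green" for colour in watch_colours)
    return "red" if all_have_issues and has_red else "amber"


# Four watches at a location, one of them with a red issue.
watches = ["red", "green", "green", "green"]
print(location_colour(watches, "any"))
print(location_colour(watches, "all"))
```

With one red watch out of four, the sketch returns red for Any and amber for All, matching the Cellular Clarity example.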
Refer to the troubleshooting tile colour page for known side-effects with this functionality.