Line Health Events Dialog
The Line Health events dialog provides additional detail about the coloured blips on the Availability line. This dialog can also be used to exclude specific events from Highlight's availability calculations. This is useful if a maintenance period occurred but was not scheduled in advance or if an incident happened which was not the fault of the service provider. Highlight will show Operational and Adjusted availability figures in the SLA Compliance report.
For a general overview of SLA Now, watch the SLA Now video.
For the selected time period (day, week or month), each event shows its type, the date and time it occurred, the event details, its duration and whether or not it impacts the availability figure. All events are grouped into these types:
- typically outages, these show a duration and impact availability
- events removed through planned maintenance or manual exclusion
- events which do not have a duration and do not impact availability
- shows the number of events for each category
The header shows Event list (showing X of Y) where Y is the total number of events for the time period including events in maintenance. X is a reduced number if filters are applied.
Purple background areas on the chart show current and past maintenance periods. Events in maintenance do not impact availability.
Download the information to a CSV file to see all events and any comments on excluded events.
By default, events are sorted by date and time with the most recent event at the top. This can be reversed or you can sort by any of the column headers:
- Events are impacting, excluded, maintenance or non-impacting
- Sort by event start time [default]
- Event Details
- An alphabetical sort of events
- Show longest duration events first or last
- Sort by impacting or non-impacting
- Use the text filter to show specific Event Details and comments. The button is bright blue to indicate a filter is applied to the list. If an incident number is included with a comment, use the text filter to search for it.
- Click this icon to see the saved comment
- Toggle each check box to show or hide events in the Event list
Users with the permission to Manage maintenance have two ways to exclude an outage from Highlight's availability calculations:
1. Before the event - set up a maintenance window
Refer to the Help & Support Centre for details on creating containers and putting them into maintenance.
Purple background areas on the chart show current and past maintenance periods. Watches in maintenance do not send alerts nor change the colour of heat tiles.
2. After the event - exclude an outage
Events can only be excluded if the parent folder has SLA Now enabled on the Features tab and the user has the Manage maintenance permission.
- Check one or more impacting events then add a comment and select this button
- To reverse exclusions, check those events which have previously been excluded (indicated with the grey excluded icon), add a replacement comment and select this button
A comment is required for each excluded event. We recommend including an incident number in the comment. This can then be searched for using the text filter in the dialog, in the CSV download or in the Audit Log.
Event exclusions are recorded in the Audit Log against the watch with details of which user made the change, any comment and when.
Events which have been excluded have the following characteristics:
- the non-impacting indicator
- the duration of the event will be faded, for example 4h 42m
- and either:
  - the maintenance icon, which shows "Event occurred whilst watch in maintenance" on hover; or
  - the grey excluded icon, with the indicator which displays the comment when clicked
In the CSV download, events which have been excluded show:
- Impacting Duration
- as entered by the user who excluded the event
The number of exclusions made and adjusted availability show in the SLA Compliance report with a red background if the availability figure is less than the SLA target.
The following totals for the time period (day, week or month) selected are shown. These are not affected by any Event list filter:
Total events: a count and cumulative duration of all events in the selected time period, including events which occurred during a maintenance window for this watch.
Note: event duration will generally not be the same as the duration shown in Reporting for Exceptions, as Exceptions are based on heat tile state changes but Line Health event duration is based on impacting events. The difference is related to the watch Sensitivity settings.
Impacting events: a count and cumulative duration of the non-excluded impacting events in the selected time period. The duration is used to calculate Adjusted availability.
Operational availability: the percentage of time this watch was available in the time period. This excludes time before the device was activated and outages during maintenance.
Adjusted availability: the availability figure which is calculated using non-excluded impacting events. If no events are manually excluded, the operational and adjusted availability figures match.
Operational and adjusted availability correspond to the figures shown in the SLA Compliance report for the same period.
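The relationship between the two availability figures can be illustrated with a minimal sketch. This is not Highlight's implementation: the `Event` fields, the `summarise` helper and the simple formula (monitored time minus downtime, divided by monitored time) are assumptions made for illustration, based only on the definitions above.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    duration_minutes: float
    impacting: bool                  # event type counts as an outage
    in_maintenance: bool = False     # occurred during a maintenance window
    manually_excluded: bool = False  # excluded after the event with a comment


def availability(monitored_minutes: float, downtime_minutes: float) -> float:
    """Percentage of monitored time that was not downtime (assumed formula)."""
    if monitored_minutes <= 0:
        raise ValueError("monitored_minutes must be positive")
    return 100.0 * (monitored_minutes - downtime_minutes) / monitored_minutes


def summarise(events: list[Event], monitored_minutes: float) -> dict:
    # Operational: impacting events outside maintenance windows.
    operational_down = sum(
        e.duration_minutes for e in events
        if e.impacting and not e.in_maintenance
    )
    # Adjusted: as above, but manually excluded events are also ignored.
    adjusted_down = sum(
        e.duration_minutes for e in events
        if e.impacting and not e.in_maintenance and not e.manually_excluded
    )
    return {
        "operational_downtime": operational_down,
        "adjusted_downtime": adjusted_down,
        "operational_availability": availability(monitored_minutes, operational_down),
        "adjusted_availability": availability(monitored_minutes, adjusted_down),
    }


if __name__ == "__main__":
    week = 7 * 24 * 60  # minutes monitored in the selected week
    events = [
        Event(60, impacting=True),
        Event(30, impacting=True, manually_excluded=True),
        Event(120, impacting=True, in_maintenance=True),
        Event(0, impacting=False),
    ]
    print(summarise(events, week))
```

In this example the operational figure counts 90 minutes of downtime and the adjusted figure counts 60, so excluding the 30-minute event raises availability from roughly 99.11% to roughly 99.40%. With no manual exclusions the two figures are identical.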
Affects the Stability metric [CPU Util affects the Load metric, Discards and Line Errors affect the Health metric]
This line shows how available your connection has been. Availability is worked out as a percentage of the day, week or month that the watch was monitored. Several issues will cause outages which are shown as coloured blips on this line and decrease the availability percentage.
You can click on the line health graphic or on the View Events button for a full list of issues and outages, with exact timestamps. See the View Events section below.
Note: Mustard and blue blips can be configured to be included in, or excluded from, Total outages. This is set by the service provider at a folder level and inherited by subfolders unless overridden. When a blip is excluded, its duration is shown in grey text to indicate that it is not part of the Total outages figure and does not affect Operational availability.
The entries you may see are summarised in this table with full details below:
|Number||Blip Colour||Issue||Effect in Highlight||Included in Total outages?|
|1||Green||Transients or brief outages||Decrements stability||Yes|
|2||Mustard||Ongoing issue, cause unknown or monitoring an unknown interface||Decrements stability||Custom|
|3||Blue||No data, device not contactable or monitored interface does not exist||Stability previously decremented (only blue after contact re-established)||Custom|
|4||Black||Device restarted, powered off/on||Decrements stability||No|
|5||Turquoise||Device configuration changed||No effect||No|
|6||Red||Circuit reported as down, or period of non-contact ends in a device restart||Stability previously decremented (only red after contact re-established)||Yes|
|7||Deep Red||Circuit taken out of service||Decrements stability||Yes|
|8||Bright Blue||Unknown event type||No|
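The "Included in Total outages?" column can be read as a simple lookup with two folder-level switches. The sketch below is illustrative only: `FolderSettings`, its field names and `included_in_total_outages` are hypothetical and do not correspond to a Highlight API. Red is treated as always included because the description of red blips states that they decrease the availability percentage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderSettings:
    """Hypothetical folder-level options set by the service provider."""
    mustard_is_outage: bool
    blue_is_outage: bool


# Colours whose treatment does not depend on folder settings.
FIXED = {
    "green": True,         # transients or brief outages
    "black": False,        # device restarted
    "turquoise": False,    # device configuration changed
    "red": True,           # circuit reported as down (assumed from the text)
    "deep red": True,      # circuit taken out of service
    "bright blue": False,  # unknown event type
}


def included_in_total_outages(colour: str, settings: FolderSettings) -> bool:
    """Return True if a blip of this colour counts towards Total outages."""
    key = colour.strip().lower()
    if key == "mustard":
        return settings.mustard_is_outage
    if key == "blue":
        return settings.blue_is_outage
    if key in FIXED:
        return FIXED[key]
    raise ValueError(f"unrecognised blip colour: {colour!r}")


if __name__ == "__main__":
    settings = FolderSettings(mustard_is_outage=True, blue_is_outage=False)
    for colour in ("Green", "Mustard", "Blue", "Black"):
        print(colour, included_in_total_outages(colour, settings))
```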
1. Green: Transient or brief outage
Shown as a green blip, these are brief outages which indicate the link being monitored went down and up very briefly between the times Highlight polled it. Highlight can tell this has happened because the device timestamps the last interface change, so if the state changed between consecutive successful polls there must have been a brief outage.
For example, if this link state timestamp had increased between the check at 00:00 and the check at 00:03, we know the line was briefly down between those times, even though it looked healthy both times we checked it.
For a green blip to be drawn, the line must be down for less than the polling period. The outage may have lasted for a few seconds, a minute or more. When the line comes back up, it will take a short time for routing to settle down, for the router to start passing packets again, and then for any connections using it to time out and retry, so the effective outage will be longer. Highlight counts each transient outage as one minute of downtime, which is a sensible estimate.
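The detection idea can be sketched as follows. This is a simplified illustration, not Highlight's code: the poll tuples, `count_transients` and the constant name are hypothetical, and the sketch assumes each successful poll returns the link state together with the device's last-change timestamp.

```python
# One minute of downtime is counted per transient, as described above.
TRANSIENT_DOWNTIME_MINUTES = 1


def count_transients(polls: list[tuple[bool, int]]) -> int:
    """Count brief outages between consecutive successful polls.

    Each poll is (link_up, last_change), where last_change is the
    device-reported timestamp of the most recent interface state change.
    """
    transients = 0
    for (prev_up, prev_change), (cur_up, cur_change) in zip(polls, polls[1:]):
        # The link looked healthy at both polls, yet the last-change
        # timestamp increased: it must have gone down and up in between.
        if prev_up and cur_up and cur_change > prev_change:
            transients += 1
    return transients


if __name__ == "__main__":
    polls = [(True, 100), (True, 100), (True, 250), (True, 250)]
    n = count_transients(polls)
    print(n, "transient(s),", n * TRANSIENT_DOWNTIME_MINUTES, "minute(s) of downtime")
```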
2. Mustard: Ongoing connectivity problems
When Highlight loses contact with the device it is monitoring, we treat the outage as an ongoing issue shown by a mustard blip. Only once Highlight reconnects with the device can we check the last change timestamp for the monitored interface, which determines whether:
- the link went down whilst we were out of touch with it (the mustard colour changes to red 'Circuit reported as down'); or
- the link did not go down (the mustard colour changes to blue 'Device not contactable') - see point 3 below; or
- the period of non-contact ends in a device restart (the mustard colour changes to red 'Circuit reported as down')
A mustard blip will also be shown if Highlight is set to monitor an unknown interface.
3. Blue: No data : device not contactable
If the last change timestamp for the monitored interface hasn't changed when connection is re-established after connectivity problems, Highlight knows that the line has been up all the time and we simply lost contact because of a problem elsewhere on the network. Highlight marks these events by turning mustard blips to blue with the comment 'No data : Device not contactable.' There was no outage detected by the device, although it was not reachable from the Highlight platform.
The availability percentage is not decreased unless the folder option to treat blue as an outage is set to Yes.
A blue blip will also be shown if Highlight was previously set to monitor an unknown interface; or if the device does not support the "last change" parameter, unless the period of non-contact ends in a device restart which would cause the mustard blips to turn red.
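The way a mustard blip is resolved once contact is re-established can be summarised in a short decision function. This is a sketch under stated assumptions, not Highlight's implementation: `Blip`, `resolve_mustard` and its parameters are hypothetical names, and `None` stands for a device that does not support the last change parameter.

```python
from enum import Enum
from typing import Optional


class Blip(Enum):
    RED = "Circuit reported as down"
    BLUE = "No data : Device not contactable"


def resolve_mustard(
    last_change_before: Optional[int],
    last_change_after: Optional[int],
    device_restarted: bool,
) -> Blip:
    """Decide what a mustard blip becomes after contact is regained."""
    # A period of non-contact that ends in a device restart turns red.
    if device_restarted:
        return Blip.RED
    # No last-change support: there is no evidence the link went down.
    if last_change_before is None or last_change_after is None:
        return Blip.BLUE
    # The timestamp moved, so the link changed state while out of contact.
    if last_change_after != last_change_before:
        return Blip.RED
    # Timestamp unchanged: the link stayed up, only monitoring was lost.
    return Blip.BLUE


if __name__ == "__main__":
    print(resolve_mustard(100, 100, device_restarted=False).value)
    print(resolve_mustard(100, 180, device_restarted=False).value)
```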
4. Black: Device restarted
This is where the monitored device has been powered off/on or restarted and is shown by a black blip. If the link itself was down before the device was restarted you will see red blips with the last blip being black. For more details see point 6 below (Red: Circuit reported as down).
5. Turquoise: Device configuration changed
This indicates a change in the configuration of the monitored device itself and is shown as a turquoise blip.
6. Red: Circuit reported as down
This is where we regain contact with the device after an outage and establish that the monitored link definitely went down. The mustard blips will turn red. Like other red issues, this counts as a formal outage on your watch, shown by a red blip and the comment 'Circuit reported as down.' The availability percentage for that watch will be decreased.
A red blip is also shown if a period of non-contact ends in a device restart.
7. Deep Red: Circuit taken out of service
If the monitored link has been shut down deliberately on the device, the comment will be 'Circuit taken out of service.'
8. Bright Blue: Unknown event type
If the event type cannot be determined, the comment will be 'Unknown event type.'