Risks and Assumptions
- For Cluster Object Discovery and Monitoring implementation, we consider the object whose Name equals the Cluster Name in the Get-ClusterResource response.
- For ClusterGroup monitoring implementation, we consider the object whose Name is Cluster Group in the Get-ClusterGroup response.
- The possible instance values of the windows_cluster_group_failover_status metric are: 0 if there is no change in OwnerNode, 1 if there is a change in OwnerNode, and 2 if there is no OwnerNode.
- The integration can manage critical/recovery failure alerts for the following two scenarios when the user activates App Failure Notifications in the settings:
  - Connectivity Exception
  - Authentication Exception
- If the user enables agent monitoring templates on the Cluster/Node resource, they might see duplicate metrics with different naming conventions.
- If the user enables the same thresholds on the Additional OS level monitoring metrics on both Cluster and Node, they might see two alerts with the same details under the respective metric names (for example, Windows_cluster_system_disk_Utilization and Windows_cluster_node_system_disk_Utilization).
- When fetching the node IP address, we receive multiple IP addresses, which include local IPs as well as the actual node IP. For example, if the actual node IP is 10.1.1.1, the response contains two IPs: one associated with the cluster (192.168.0.0) and the actual node IP. To identify the actual node IP address in the list, we assume that the node IP address is in the same subnet as the cluster IP address. That is, if the cluster IP is 10.1.1.1, the node IPs will be 10.1.X.X.
- The configuration accepts either the Cluster IP Address or the HostName. The HostName option works only if host name resolution works.
- Macro replacement is supported for threshold breach alerts (that is, the subject and description of a threshold breach alert can be customised).
- Displaying activity logs is not supported.
- The Template Applied Time will only be displayed if the collector profile (Classic and NextGen Gateway) is version 18.1.0 or higher.
- PowerShell execution does not work on the arm64 architecture, so the windows-failover-cluster application does not work on arm64.
- This application supports both Classic Gateway and NextGen Gateway.
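Two of the assumptions above (the same-subnet rule for picking the node IP, and the 0/1/2 failover status values) can be illustrated with a minimal Python sketch. This is not the integration's actual code. The function names are hypothetical, the /16 prefix length is only inferred from the "10.1.X.X" example, and treating the first observation of an OwnerNode as "no change" is an assumption.

```python
import ipaddress
from typing import Iterable, Optional


def pick_node_ip(cluster_ip: str, candidate_ips: Iterable[str],
                 prefix_len: int = 16) -> Optional[str]:
    """Return the first candidate in the same subnet as the cluster IP, else None.

    prefix_len=16 is an assumption taken from the 10.1.X.X example.
    """
    # strict=False accepts a host address and yields its enclosing network.
    cluster_net = ipaddress.ip_network(f"{cluster_ip}/{prefix_len}", strict=False)
    for ip in candidate_ips:
        if ipaddress.ip_address(ip) in cluster_net:
            return ip
    return None


def failover_status(previous_owner: Optional[str],
                    current_owner: Optional[str]) -> int:
    """Map an OwnerNode comparison to the 0/1/2 metric values."""
    if not current_owner:
        return 2  # no OwnerNode reported
    if previous_owner is not None and current_owner != previous_owner:
        return 1  # OwnerNode changed since the previous poll
    return 0  # OwnerNode unchanged (or first observation, by assumption)
```

For example, `pick_node_ip("10.1.1.1", ["192.168.0.5", "10.1.7.20"])` skips the address outside the cluster subnet and returns the one inside it.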
Log File Monitoring Configuration
The Log File Monitoring configuration is expected in the following JSON format:
```json
[
  {
    "Name": "",
    "Folder Path": "",
    "File Name": "",
    "Rotated Folder Path": "",
    "Rotated File Name": "",
    "Notify": [
      {
        "Monitor Type": "",
        "Expression": "",
        "Operator": "",
        "Threshold": "",
        "Severity": ""
      }
    ]
  }
]
```
- Name: The name of the log file configuration.
- Folder Path: The absolute path of the folder containing the log file. (Regex and macros are not supported).
- File Name: The name of the log file to be monitored. (Regex and supported macros can be used).
- Rotated Folder Path: The absolute path of the folder containing the rotated log file. (Regex and macros are not supported).
- Rotated File Name: The name of the rotated log file. (Regex and supported macros can be used).
- Monitor Type: Specifies the type of monitoring to be performed for raising an alert. The possible values are:
  - EXPRESSION_MATCH: Searches for a specific expression in the log file.
  - LAST_UPDATED_TIME_IN_MIN: Uses the last updated time of the log file as the monitoring value.
  - FILE_SIZE_LIMIT_IN_MB: Uses the size of the log file as the monitoring value.
- Expression:
  - If the Monitor Type is EXPRESSION_MATCH, provide the search string (Regex is supported) to verify its entries in the log file. This field is case-insensitive.
  - If the Monitor Type is LAST_UPDATED_TIME_IN_MIN or FILE_SIZE_LIMIT_IN_MB, this field is not applicable and can be left empty.
- Operator: Specifies the operator to verify threshold breaches. Possible values depend on the Monitor Type:
  - If Monitor Type is EXPRESSION_MATCH, possible values are:
    - EXISTS: Raises an alert if the specified expression is found in the log file at least the specified Threshold number of times.
    - NOT_EXISTS: Raises an alert if the specified expression is not found in the log file. In this case, Threshold can be left empty.
  - If Monitor Type is LAST_UPDATED_TIME_IN_MIN or FILE_SIZE_LIMIT_IN_MB, possible values are:
    - GREATER_THAN
    - LESS_THAN
    - EQUAL_TO
    - NOT_EQUAL_TO
    - GREATER_THAN_EQUAL_TO
    - LESS_THAN_EQUAL_TO

    An alert is raised when {monitored_value} {Operator} {Threshold} evaluates to true.
- Threshold: Specifies the threshold value as a valid integer. The interpretation of the threshold depends on the Monitor Type:
  - If Monitor Type is EXPRESSION_MATCH, the threshold represents the frequency of the expression in the log file.
  - If Monitor Type is LAST_UPDATED_TIME_IN_MIN, the threshold represents the time in minutes.
  - If Monitor Type is FILE_SIZE_LIMIT_IN_MB, the threshold represents the size in megabytes (MB).
- Severity: Specifies the alert type to be created if a threshold breach occurs. Possible values (case-insensitive) are:
  - CRITICAL
  - WARNING
  - INFO
  - OK
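The Operator and Threshold semantics described above can be sketched as a small Python function. This is an illustration under stated assumptions, not the integration's implementation: `rule_breached` and `COMPARATORS` are hypothetical names, operator and monitor type strings are matched exactly, and for EXPRESSION_MATCH the monitored value is assumed to be the number of matches found.

```python
import operator

# Comparison operators for LAST_UPDATED_TIME_IN_MIN and FILE_SIZE_LIMIT_IN_MB.
COMPARATORS = {
    "GREATER_THAN": operator.gt,
    "LESS_THAN": operator.lt,
    "EQUAL_TO": operator.eq,
    "NOT_EQUAL_TO": operator.ne,
    "GREATER_THAN_EQUAL_TO": operator.ge,
    "LESS_THAN_EQUAL_TO": operator.le,
}


def rule_breached(rule: dict, monitored_value: float) -> bool:
    """Return True if the Notify rule should raise an alert.

    For EXPRESSION_MATCH, monitored_value is the number of matches found
    (an assumption); otherwise it is the file age in minutes or size in MB.
    """
    monitor_type = rule["Monitor Type"]
    op = rule["Operator"]
    if monitor_type == "EXPRESSION_MATCH":
        if op == "NOT_EXISTS":
            return monitored_value == 0  # Threshold may be left empty here
        if op == "EXISTS":
            return monitored_value >= int(rule["Threshold"])
        raise ValueError(f"Unsupported operator for EXPRESSION_MATCH: {op}")
    if op not in COMPARATORS:
        raise ValueError(f"Unsupported operator for {monitor_type}: {op}")
    # Evaluates {monitored_value} {Operator} {Threshold}.
    return COMPARATORS[op](monitored_value, int(rule["Threshold"]))
```

With a rule of FILE_SIZE_LIMIT_IN_MB, GREATER_THAN and Threshold "10", a 10 MB file does not breach the rule while a 10.5 MB file does.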
The following macros are supported in File Name and Rotated File Name:

Macro | Description |
---|---|
$hour$ | 0 - 23 |
$month$ | 1 - 12 |
$day$ | 1 - 31 |
$year$ | 1000 - 9999 |
$shortYear$ | 00 - 99 |
$weekdayName$ | Sun - Sat |
$fullWeekdayName$ | Sunday - Saturday |
$0hour$ | 00 - 23 |
$0day$ | 01 - 31 (two-digit day format) |
$0month$ | 01 - 12 (two-digit month format) |
$monthName$ | Jan - Dec (three-letter month format in English) |
$fullMonthName$ | January - December |
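To make the macro table concrete, here is a minimal Python sketch of how such macros could be expanded in a file name. The function name `expand_macros` is hypothetical, English day and month names are hard-coded to match the table, and whether the integration uses local or UTC time is not stated in this document, so the caller supplies the timestamp.

```python
import re
from datetime import datetime

_WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
_MONTHS = ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"]


def expand_macros(template: str, now: datetime) -> str:
    """Replace $macro$ tokens from the table above with values taken from `now`."""
    weekday = _WEEKDAYS[now.weekday()]  # datetime.weekday(): Monday is 0
    month_name = _MONTHS[now.month - 1]
    values = {
        "hour": str(now.hour),
        "month": str(now.month),
        "day": str(now.day),
        "year": str(now.year),
        "shortYear": f"{now.year % 100:02d}",
        "weekdayName": weekday[:3],
        "fullWeekdayName": weekday,
        "0hour": f"{now.hour:02d}",
        "0day": f"{now.day:02d}",
        "0month": f"{now.month:02d}",
        "monthName": month_name[:3],
        "fullMonthName": month_name,
    }
    # Tokens that are not known macros are left untouched.
    return re.sub(r"\$(\w+)\$",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)
```

For a timestamp of 4 January 2025, `expand_macros("log_daily_$year$-$month$-$day$.log", ...)` yields `log_daily_2025-1-4.log`, because `$month$` and `$day$` are not zero-padded.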
Limitations
- Log files larger than 1 GB are not processed.
- Recovery alerts are sent only for file-not-found alerts. No recovery alerts are generated for alerts caused by EXPRESSION_MATCH, LAST_UPDATED_TIME_IN_MIN, or FILE_SIZE_LIMIT_IN_MB.
Sample Log File configuration
Suppose a customer wants to monitor the log file C:\ProgramData\logs\test.log and the rotated log file C:\ProgramData\logs\test.1.log with the following use cases:
- Notify if the expression ‘warning’ occurs 4 or more times in the log file.
- Notify if the file size is greater than 10 MB.
- Notify if the file was last updated 30 or more minutes ago.

The log file configuration then looks as follows:
```json
[
  {
    "Name": "log_config1",
    "Folder Path": "C:\\ProgramData\\logs",
    "File Name": "test.log",
    "Rotated Folder Path": "C:\\ProgramData\\logs",
    "Rotated File Name": "test.1.log",
    "Notify": [
      {
        "Monitor Type": "EXPRESSION_MATCH",
        "Expression": "warning",
        "Operator": "EXISTS",
        "Threshold": "4",
        "Severity": "WARNING"
      },
      {
        "Monitor Type": "FILE_SIZE_LIMIT_IN_MB",
        "Expression": "",
        "Operator": "GREATER_THAN",
        "Threshold": "10",
        "Severity": "CRITICAL"
      },
      {
        "Monitor Type": "LAST_UPDATED_TIME_IN_MIN",
        "Expression": "",
        "Operator": "GREATER_THAN_EQUAL_TO",
        "Threshold": "30",
        "Severity": "INFO"
      }
    ]
  }
]
```
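Before submitting a configuration like the one above, it can help to check it against the field rules described earlier. The following Python sketch is one possible pre-check, not part of the integration. `validate_config` is a hypothetical name, and treating every top-level key as required is an assumption.

```python
import json

REQUIRED_KEYS = {"Name", "Folder Path", "File Name",
                 "Rotated Folder Path", "Rotated File Name", "Notify"}
_COMPARISONS = {"GREATER_THAN", "LESS_THAN", "EQUAL_TO", "NOT_EQUAL_TO",
                "GREATER_THAN_EQUAL_TO", "LESS_THAN_EQUAL_TO"}
OPERATORS = {
    "EXPRESSION_MATCH": {"EXISTS", "NOT_EXISTS"},
    "LAST_UPDATED_TIME_IN_MIN": _COMPARISONS,
    "FILE_SIZE_LIMIT_IN_MB": _COMPARISONS,
}
SEVERITIES = {"CRITICAL", "WARNING", "INFO", "OK"}


def _is_int(value) -> bool:
    try:
        int(str(value))
        return True
    except ValueError:
        return False


def validate_config(raw: str) -> list:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    for i, entry in enumerate(json.loads(raw)):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
        for j, rule in enumerate(entry.get("Notify", [])):
            where = f"entry {i}, rule {j}"
            mtype, op = rule.get("Monitor Type"), rule.get("Operator")
            if mtype not in OPERATORS:
                problems.append(f"{where}: unknown Monitor Type {mtype!r}")
            elif op not in OPERATORS[mtype]:
                problems.append(f"{where}: Operator {op!r} not valid for {mtype}")
            if mtype == "EXPRESSION_MATCH" and not rule.get("Expression"):
                problems.append(f"{where}: Expression is required")
            # Threshold may be left empty only for NOT_EXISTS.
            if op != "NOT_EXISTS" and not _is_int(rule.get("Threshold", "")):
                problems.append(f"{where}: Threshold must be an integer")
            if str(rule.get("Severity", "")).upper() not in SEVERITIES:
                problems.append(f"{where}: unknown Severity")
    return problems
```

Passing the sample configuration above as a JSON string returns an empty list. Each rule that violates the field descriptions adds one message per problem.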
Suppose a customer wants to monitor a daily file named in the format log_daily_{year}-{month}-{date}.log (for instance, log_daily_2025-1-4.log) with the following use cases:
- Notify if the log file contains expressions starting with ‘error’.
- Notify if the log file contains the string ‘retry’ 4 or more times.
- Notify if the file size is less than 1 MB.
- Notify if the file was last updated 4 or more hours (240 minutes) ago.

The log file configuration then looks as follows:
```json
[
  {
    "Name": "log_config2",
    "Folder Path": "C:\\ProgramData\\logs",
    "File Name": "log_daily_$year$-$month$-$day$.log",
    "Rotated Folder Path": "C:\\ProgramData\\logs",
    "Rotated File Name": "log_daily_$year$-$month$-$day$.log",
    "Notify": [
      {
        "Monitor Type": "EXPRESSION_MATCH",
        "Expression": "error\\w*",
        "Operator": "EXISTS",
        "Threshold": "1",
        "Severity": "CRITICAL"
      },
      {
        "Monitor Type": "EXPRESSION_MATCH",
        "Expression": "retry",
        "Operator": "EXISTS",
        "Threshold": "4",
        "Severity": "CRITICAL"
      },
      {
        "Monitor Type": "FILE_SIZE_LIMIT_IN_MB",
        "Expression": "",
        "Operator": "LESS_THAN",
        "Threshold": "1",
        "Severity": "CRITICAL"
      },
      {
        "Monitor Type": "LAST_UPDATED_TIME_IN_MIN",
        "Expression": "",
        "Operator": "GREATER_THAN_EQUAL_TO",
        "Threshold": "240",
        "Severity": "CRITICAL"
      }
    ]
  }
]
```
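The EXPRESSION_MATCH rules in this sample depend on how often a regular expression occurs in the log. A rough Python sketch of such a count follows. `count_matches` is a hypothetical helper, and counting every non-overlapping occurrence (rather than, say, matching lines) is an assumption about how the frequency is measured. Matching is case-insensitive, as stated for the Expression field. Note that the JSON string "error\\w*" decodes to the regex error\w*.

```python
import re


def count_matches(expression: str, log_text: str) -> int:
    """Count non-overlapping, case-insensitive regex matches in the log text."""
    return sum(1 for _ in re.finditer(expression, log_text, flags=re.IGNORECASE))


sample_log = (
    "ERROR: disk full\n"
    "retry 1\n"
    "Retry 2\n"
    "errors found\n"
    "retry 3\n"
)
error_count = count_matches(r"error\w*", sample_log)  # matches "ERROR" and "errors"
retry_count = count_matches("retry", sample_log)      # matches all three retry lines
```

In this sample text the error rule (Threshold 1) would be breached, while the retry rule (Threshold 4) would not, since only three occurrences are present.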
Suppose a customer wants to monitor log files whose names start with ‘temp’, with the following use cases:
- Notify if the string “success” does not exist in the log file.
- Notify if the log file contains the string “failed”.
- Notify if the file size is greater than 25 MB.
- Notify if the file was last updated 1 or more hours (60 minutes) ago.

The log file configuration then looks as follows:
```json
[
  {
    "Name": "log_config3",
    "Folder Path": "C:\\ProgramData\\logs",
    "File Name": "^temp.*\\.log$",
    "Rotated Folder Path": "C:\\ProgramData\\logs",
    "Rotated File Name": "^temp.*\\.log$",
    "Notify": [
      {
        "Monitor Type": "EXPRESSION_MATCH",
        "Expression": "success",
        "Operator": "NOT_EXISTS",
        "Threshold": "",
        "Severity": "CRITICAL"
      },
      {
        "Monitor Type": "EXPRESSION_MATCH",
        "Expression": "failed",
        "Operator": "EXISTS",
        "Threshold": "1",
        "Severity": "CRITICAL"
      },
      {
        "Monitor Type": "FILE_SIZE_LIMIT_IN_MB",
        "Expression": "",
        "Operator": "GREATER_THAN",
        "Threshold": "25",
        "Severity": "CRITICAL"
      },
      {
        "Monitor Type": "LAST_UPDATED_TIME_IN_MIN",
        "Expression": "",
        "Operator": "GREATER_THAN_EQUAL_TO",
        "Threshold": "60",
        "Severity": "CRITICAL"
      }
    ]
  }
]
```
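To see which files a regex File Name such as the one in this sample would select, the pattern can be tried against a list of names. The Python sketch below is only an illustration. `matching_files` is a hypothetical helper, and matching is case-sensitive here, which may differ from how the integration treats Windows file names.

```python
import re
from typing import Iterable, List


def matching_files(file_name_pattern: str, names: Iterable[str]) -> List[str]:
    """Return the names that match the File Name regex, in input order."""
    pattern = re.compile(file_name_pattern)
    return [name for name in names if pattern.search(name)]


# The JSON string "^temp.*\\.log$" decodes to the regex ^temp.*\.log$
selected = matching_files(
    r"^temp.*\.log$",
    ["temp1.log", "temp_app.log", "test.log", "temp.log.bak", "mytemp.log"],
)
```

Only names that both start with temp and end with .log are kept, so test.log, temp.log.bak, and mytemp.log are excluded.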