@kongou-ae
Last active May 2, 2019 16:20
alerts.json
{
"ArmAlertTemplates": [
{
"Title": "A resource provider failed to process Azure Resource Manager requests.",
"Severity": "Warning",
"Description": "The resource provider {ResourceProviderNamespace} is not responding to some requests from Azure Resource Manager. This could be due to network issues or represent an unidentified problem with the resource provider.",
"Remediations": [
"Try the following actions to restore the resource provider to full operation.",
"1. In the portal, go to \u0027Resource provider manifests\u0027, locate the provider that is listed as the COMPONENT in this alert, and if that provider reports health information, review that provider’s health.",
"2. If the resource provider has active alerts, follow the remediation steps for those specific alerts first.",
"3. Review the network configuration and health for possible issues.",
"4. If the preceding steps don’t solve the problem, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
]
},
{
"Title": "A resource provider failed to process Azure Resource Manager requests.",
"Severity": "Critical",
"Description": "The resource provider {ResourceProviderNamespace} is not responding to a high percentage of requests from Azure Resource Manager. This could be due to network issues or represent an unidentified problem with the resource provider.",
"Remediations": [
"Try the following actions to restore the resource provider to full operation.",
"1. In the portal, go to \u0027Resource provider manifests\u0027, locate the provider that is listed as the COMPONENT in this alert, and if that provider reports health information, review that provider’s health.",
"2. If the resource provider has active alerts, follow the remediation steps for those specific alerts first.",
"3. Review the network configuration and health for possible issues.",
"4. If the preceding steps don’t solve the problem, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
]
},
{
"Title": "A resource provider failed to process Azure Resource Manager requests.",
"Severity": "Warning",
"Description": "The resource provider {ResourceProviderNamespace} is not processing requests from Azure Resource Manager due to unexpected errors.",
"Remediations": [
"Try the following actions to restore the resource provider to full operation.",
"1. In the portal, go to \u0027Resource provider manifests\u0027, locate the provider that is listed as the COMPONENT in this alert, and if that provider reports health information, review that provider’s health.",
"2. If the resource provider has active alerts, follow the remediation steps for those specific alerts first.",
"3. If the preceding steps don’t solve the problem, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
]
},
{
"Title": "A resource provider failed to process Azure Resource Manager requests.",
"Severity": "Critical",
"Description": "The resource provider {ResourceProviderNamespace} is not processing requests from Azure Resource Manager due to unexpected errors.",
"Remediations": [
"Try the following actions to restore the resource provider to full operation.",
"1. In the portal, go to \u0027Resource provider manifests\u0027, locate the provider that is listed as the COMPONENT in this alert, and if that provider reports health information, review that provider’s health.",
"2. If the resource provider has active alerts, follow the remediation steps for those specific alerts first.",
"3. If the preceding steps don’t solve the problem, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
]
},
{
"Title": "Service Principal for provider authorization is missing from guest directory tenant.",
"Severity": "Critical",
"Description": "Service Principal for provider authorization is missing from guest directory tenants. Please run multi-tenancy configuration scripts for Azure Stack.",
"Remediations": [
"Try the following actions to restore the resource provider to full operation.",
"1. Make sure multi-tenancy is enabled for Azure Stack. See instructions at: \u003clink type=\u0027Url\u0027 uri=\u0027https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-enable-multitenancy#configure-azure-stack-directory\u0027\u003ehttps://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-enable-multitenancy#configure-azure-stack-directory\u003c/link\u003e",
"2. Configure all guest directories by registering Azure Stack with each guest directory: \u003clink type=\u0027Url\u0027 uri=\u0027https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-enable-multitenancy#configure-guest-directory\u0027\u003ehttps://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-enable-multitenancy#configure-guest-directory\u003c/link\u003e",
"3. It takes a few minutes for permissions to propagate. However, if the preceding steps don’t solve the problem, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support.",
"{GuestDirectoryTenantId}"
]
},
{
"Title": "Azure Stack AAD Home Directory needs to be configured.",
"Severity": "Critical",
"Description": "Azure Stack AAD Home Directory {HomeDirectoryTenantId} needs to be configured.",
"Remediations": [
"Try the following actions to restore the resource provider to full operation.",
"1. Configure Azure Stack AAD Home Directory. See instructions at: \u003clink type=\u0027Url\u0027 uri=\u0027https://github.com/Azure/AzureStack-Tools/tree/master/Identity#updating-the-azure-stack-aad-home-directory-after-installing-updates-or-new-resource-providers\u0027\u003ehttps://github.com/Azure/AzureStack-Tools/tree/master/Identity#updating-the-azure-stack-aad-home-directory-after-installing-updates-or-new-resource-providers\u003c/link\u003e",
"2. It takes a few minutes for permissions to propagate. However, if the preceding steps don’t solve the problem, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
]
}
],
"AzureBridgeServiceAlertTemplates": [
{
"Title": "Activation Required",
"Severity": "Warning",
"Description": "Azure Stack is not activated.",
"Remediations": "You have not activated Azure Stack. To do so, see the following help article: \u003clink type=\u0027Url\u0027 uri=\u0027https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-register\u0027\u003ehttps://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-register\u003c/link\u003e."
},
{
"Title": "Activation Expired",
"Severity": "Warning",
"Description": "Azure Stack activation expired. Please reactivate your Azure Stack.",
"Remediations": "Your activation for Azure Stack has expired. To re-activate, see the following help article: \u003clink type=\u0027Url\u0027 uri=\u0027https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-register\u0027\u003ehttps://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-register\u003c/link\u003e. Resource Identifier: {AzureBridgeResourceUri}"
},
{
"Title": "Activation Expiring Soon",
"Severity": "Warning",
"Description": "Your Azure Stack activation will expire on {AzureBridgeActivationExpiration}",
"Remediations": "Your activation for Azure Stack will expire on {AzureBridgeActivationExpiration}. To re-activate, see the following help article: \u003clink type=\u0027Url\u0027 uri=\u0027https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-register\u0027\u003ehttps://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-register\u003c/link\u003e. Resource Identifier: {AzureBridgeResourceUri}"
}
],
"BMCAliveAlertTemplates": [
{
"Title": "BMC credentials are not valid",
"Severity": "Warning",
"Description": "The baseboard management controller (BMC) credentials on {NodeName} do not match the credentials stored in Azure Stack.",
"Remediations": "1. Update the BMC password in the Azure Stack store. To do so, use PowerShell to open a remote session to the privileged endpoint using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackpep\u0027\u003ehttps://aka.ms/azurestackpep\u003c/link\u003e. \u003cnewline/\u003e 2. Run the command Set-BmcPassword. For guidance, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotatebmcpassword\u0027\u003ehttps://aka.ms/azsrotatebmcpassword\u003c/link\u003e. \u003cnewline/\u003e 3. If the issue persists, contact your Azure Stack solution supplier."
},
{
"Title": "BMC connection timeout",
"Severity": "Warning",
"Description": "Azure Stack cannot connect to the baseboard management controller (BMC) on {NodeName}.",
"Remediations": "1. Check the cabling to the BMC. \u003cnewline/\u003e2. If the issue persists, contact your Azure Stack solution supplier."
},
{
"Title": "The password on the BMC is either default or a well-known password string. Consider updating the credentials to use a strong password.",
"Severity": "Warning",
"Description": "Default or well-known credentials for the BMC are in use.",
"Remediations": "Update the BMC credentials using the steps provided by your OEM. Then, update the credentials stored in Azure Stack by using Set-BmcPassword. For guidance on using Set-BmcPassword, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotatebmcpassword\u0027\u003ehttps://aka.ms/azsrotatebmcpassword\u003c/link\u003e."
}
],
"ComputeControllerAlertTemplate": [
{
"Title": "The compute scale unit is inaccessible for virtual machine placement",
"Severity": "Critical",
"Description": "Scale unit {ScaleUnitName} is inaccessible. No new virtual machines can be created on the scale unit. Virtual machines on the scale unit may be inaccessible.",
"Remediations": "1. Review the health of network switches and check network cabling.\u003cnewline/\u003e2. If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Node inaccessible for virtual machine placement",
"Severity": "Critical",
"Description": "Node {ScaleUnitNodeName} in the scale unit {ScaleUnitName} is inaccessible. There is now decreased capacity for virtual machine creation. Virtual machines on node {ScaleUnitNodeName} will be moved to other nodes. If there is no available capacity, some virtual machines may not be restarted.",
"Remediations": "1. Initiate the Field Replaceable Unit (FRU) process if a network card failure is reported for the node.\u003cnewline/\u003e2. If no network card failure is reported, power off/power on the unhealthy node from the scale unit node blade.\u003cnewline/\u003e3. If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Low memory capacity",
"Severity": "Warning",
"Description": "The region has consumed more than {Percentage} of available memory. Creating virtual machines with large amounts of memory may fail.",
"Remediations": "See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Low memory capacity",
"Severity": "Critical",
"Description": "The region has consumed more than {Percentage} of available memory. Creating virtual machines with large amounts of memory may fail.",
"Remediations": "See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "All available memory capacity exhausted",
"Severity": "Critical",
"Description": "The region has consumed more than {Percentage} of available memory. Creating virtual machines will fail. Continued management of existing VMs may fail. Do not attempt to upgrade Azure Stack until you have resolved the memory capacity issue.",
"Remediations": "See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Low core capacity",
"Severity": "Warning",
"Description": "The region has consumed more than {Percentage} of available logical cores. Creating virtual machines with large core counts may fail.",
"Remediations": "See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Low core capacity",
"Severity": "Critical",
"Description": "The region has consumed more than {Percentage} of available logical cores. Creating virtual machines with large core counts may fail.",
"Remediations": "See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Unable to provision virtual machines for specific class and size due to low memory capacity",
"Severity": "Critical",
"Description": "Low memory capacity prevented one or more virtual machines of the size {VMSize} from being provisioned.",
"Remediations": "See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Unable to provision virtual machines for specific class and size due to low logical core capacity",
"Severity": "Critical",
"Description": "Low logical core capacity prevented one or more virtual machines of the size {VMSize} from being provisioned.",
"Remediations": "See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Infrastructure role is unresponsive",
"Severity": "Critical",
"Description": "The compute controller infrastructure role is unresponsive. The region is unable to create new virtual machines. Virtual machine actions will not be available. Additionally, Azure Stack administrators are not able to administer scale units and nodes.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "{InfraOrTenant} VM has invalid cluster group cold start setting.",
"Severity": "Critical",
"Description": "The {InfraOrTenant} VM Cluster Resource {ClusterResourceName} has an invalid cluster group cold start setting on node {OwnerNode}. Having an invalid cold start setting inhibits the ability for Hyper-V to manage the VM and may cause the VM to go missing.",
"Remediations": "Please contact Support to resolve the issue."
},
{
"Title": "VM {FabricVmName} has invalid power state {VmState}.",
"Severity": "Critical",
"Description": "The VM {FabricVmName} has an invalid power state {VmState}.",
"Remediations": "Please contact Support to resolve the issue."
}
],
"DomainControllerAlertTemplates": [
{
"Title": "Infrastructure role is unhealthy",
"Severity": "Warning",
"Description": "The infrastructure role Directory Management is operating in a degraded state.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Infrastructure role is unhealthy",
"Severity": "Critical",
"Description": "The infrastructure role Directory Management is not functional.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Infrastructure role is unhealthy",
"Severity": "Warning",
"Description": "The infrastructure role, Directory Management, has reported time synchronization errors.",
"Remediations": "1. Connect to the privileged endpoint. Instructions can be found \u003clink type=\u0027Url\u0027 uri=\u0027https://docs.microsoft.com/azure/azure-stack/azure-stack-privileged-endpoint\u0027\u003ehere\u003c/link\u003e. \u003cnewline/\u003e 2. Once connected, run the Set-AzSTimeSource cmdlet to set an updated time source or force a resync. \u003cnewline/\u003eTo set a new time source, use the following syntax: Set-AzSTimeSource -TimeServer (String) [-Resync] \u003cnewline/\u003eTo force a resync, use the following syntax: Set-AzSTimeSource -Resync \u003cnewline/\u003e3. Once the command completes, close the alert. \u003cnewline/\u003e 4. If the alert returns or if the Set-AzSTimeSource cmdlet fails, contact Support. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Infrastructure role is unhealthy",
"Severity": "Critical",
"Description": "The infrastructure role, Directory Management, has reported that domain name resolution is unavailable.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Pending Service account password expiration",
"Severity": "Warning",
"Description": "A service account password will expire within 30 days",
"Remediations": "1. Follow the steps for rotating service account passwords at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotateserviceaccounts\u0027\u003ehttps://aka.ms/azsrotateserviceaccounts\u003c/link\u003e. \u003cnewline/\u003e2. If the problem persists, please contact support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Pending Service account password expiration",
"Severity": "Critical",
"Description": "A service account password will expire within 7 days",
"Remediations": "1. Follow the steps for rotating service account passwords at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotateserviceaccounts\u0027\u003ehttps://aka.ms/azsrotateserviceaccounts\u003c/link\u003e. \u003cnewline/\u003e 2. If this didn\u0027t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. If hardware replacement is required, there are important pre- and post-replacement steps. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackreplacenode\u0027\u003ehttps://aka.ms/azurestackreplacenode\u003c/link\u003e."
},
{
"Title": "Pending user account password expiration",
"Severity": "Warning",
"Description": "The password for the {UserName} privileged endpoint account will expire within 30 days.",
"Remediations": "1. Follow the steps for rotating user account passwords in a privileged endpoint session at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotateuseraccounts\u0027\u003ehttps://aka.ms/azsrotateuseraccounts\u003c/link\u003e. \u003cnewline/\u003e2. If the problem persists, please contact support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Pending user account password expiration",
"Severity": "Critical",
"Description": "The password for the {UserName} privileged endpoint account will expire within 7 days.",
"Remediations": "1. Follow the steps for rotating user account passwords in a privileged endpoint session at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotateuseraccounts\u0027\u003ehttps://aka.ms/azsrotateuseraccounts\u003c/link\u003e. \u003cnewline/\u003e2. If the problem persists, please contact support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Low disk space for Azure Stack infrastructure",
"Severity": "Warning",
"Description": "The computer {Component} has only {FreeSpace} GB of available disk space on {DriveName}. When available disk space is less than {Threshold} GB, Azure Stack service availability may be at risk and Azure Stack updates will fail.",
"Remediations": "1. Select the ‘Repair’ action to try to free up disk space, and then wait for the action to complete. Do not attempt to repair more than one alert at a time. Do not attempt the repair action if an update is in progress.\u003cnewline/\u003e 2. The alert will automatically close if sufficient disk space is available after the repair process completes. It may take up to 15 minutes before the alert will close.\u003cnewline/\u003e 3. If the alert remains active for more than 30 minutes after the repair action completes, please contact customer support. Assistance from customer support is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"FrpHeartbeatAlertTemplates": [
{
"Title": "Infrastructure role instance unavailable",
"Severity": "Warning",
"Description": "The infrastructure role instance {NodeName} is unavailable. This might impact performance and availability of Azure Stack services.",
"Remediations": "1. Select the \u0027Repair\u0027 action to try to start the Infrastructure role instance, and then wait for the action to complete. Do not attempt to repair more than one alert at a time. Do not attempt the repair action if an update is in progress.\u003cnewline/\u003e 2. A few minutes after the Infrastructure role instance starts, the alert will automatically close. You can view the operational status of the role instance by navigating to the following \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027FRP\u0027 resourceType=\u0027infraroleinstances\u0027 resourceName=\u0027{NodeName}\u0027\u003e{NodeName}\u003c/link\u003e. \u003cnewline/\u003e 3. If the alert remains active for more than a few minutes after the repair action completes, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact support."
},
{
"Title": "Scale unit node is offline",
"Severity": "Critical",
"Description": "The node {NodeName} in the scale unit is inaccessible. There is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart.",
"Remediations": "1. Navigate to the \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027FRP\u0027 resourceType=\u0027scaleunitnodes\u0027 resourceName=\u0027{NodeName}\u0027\u003e{NodeName}\u003c/link\u003e and try to cycle the node using the Power off/Power on actions on the node blade. (A physical node restart might take up to 10 minutes.) \u003cnewline/\u003e 2. If this didn\u0027t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. If hardware replacement is required, there are important pre- and post-replacement steps. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackreplacenode\u0027\u003ehttps://aka.ms/azurestackreplacenode\u003c/link\u003e."
},
{
"Title": "Infrastructure role instance unavailable",
"Severity": "Warning",
"Description": "The infrastructure role instance {NodeName} is unavailable. This may impact performance and availability of Azure Stack services.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"HardwareMonitorAlertTemplates": [
{
"Title": "Power supply health is degraded",
"Severity": "Warning",
"Description": "{SensorName} health on {NodeName} is degraded and the power supply might fail.",
"Remediations": "1. Contact your Azure Stack solution supplier to replace the faulty power supply. \u003cnewline/\u003e"
},
{
"Title": "CPU health is degraded",
"Severity": "Critical",
"Description": "{SensorName} health on {NodeName} is degraded and CPU operations might place data at risk.",
"Remediations": "1. Perform a drain operation on the node with the faulty CPU. \u003cnewline/\u003e2. Contact your Azure Stack solution supplier to replace the faulty CPU. \u003cnewline/\u003e3. After the CPU is replaced, repair the node."
},
{
"Title": "Fan health is degraded",
"Severity": "Warning",
"Description": "{SensorName} health on {NodeName} is degraded indicating the cooling fan might be at risk of failure.",
"Remediations": "1. Contact your Azure Stack solution supplier to replace the faulty fan."
},
{
"Title": "Temperature breached",
"Severity": "Warning",
"Description": "System temperature on {NodeName} has exceeded a critical threshold.",
"Remediations": "1. Verify that the rack has enough cooling. \u003cnewline/\u003e2. If the rack cannot be sufficiently cooled, contact your Azure Stack solution supplier for assistance."
},
{
"Title": "Memory health is degraded",
"Severity": "Warning",
"Description": "{SensorName} health on {NodeName} is degraded, indicating a potential issue that could put data at risk.",
"Remediations": "1. Perform a drain operation on the node with the faulty RAM. \u003cnewline/\u003e2. Contact your Azure Stack solution supplier to replace faulty RAM. \u003cnewline/\u003e3. After the RAM is replaced, repair the node."
},
{
"Title": "Local disk drive health is degraded.",
"Severity": "Warning",
"Description": "{SensorName} health on {NodeName} is degraded and the drive might be at risk of failure.",
"Remediations": "1. Perform a drain operation on the node with the faulty drive. \u003cnewline/\u003e2. Contact your Azure Stack solution supplier to replace the faulty drive. \u003cnewline/\u003e3. After the drive is replaced, repair the node."
},
{
"Title": "Boot drive health is degraded",
"Severity": "Warning",
"Description": "{SensorName} health on {NodeName} is degraded and the boot drive might be at risk of failure.",
"Remediations": "1. Perform a drain operation on the node with the faulty drive. \u003cnewline/\u003e2. Contact your Azure Stack solution supplier to replace the faulty drive. \u003cnewline/\u003e3. After the drive is replaced, repair the node."
},
{
"Title": "Hardware monitoring test trap",
"Severity": "Warning",
"Description": "This is a test trap from {NodeName} to validate BMC settings for hardware monitoring",
"Remediations": "This is a test trap to validate BMC settings for hardware monitoring"
},
{
"Title": "Invalid SSD Firmware",
"Severity": "Critical",
"Description": "The firmware on HPE physical SSD drive {SensorName} on {NodeName} is no longer supported with Azure Stack",
"Remediations": "Please update the SSD drive firmware by installing the latest available HPE ProLiant for Microsoft Azure Stack Solution Update Bundle. Please log a case with HPE Support for any questions or issues related to installing the HPE Solution Update Bundle."
}
],
"HrpAlertTemplates": [
{
"Title": "Health controller cannot access storage account",
"Severity": "Warning",
"Description": "The health controller {Component} cannot access the storage account {StorageAccountName}. This may prevent alert generation from certain sources.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Infrastructure role unhealthy",
"Severity": "Warning",
"Description": "The health controller {Component} is unavailable. This may affect health reports and metrics.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Test Alert",
"Severity": "Warning",
"Description": "This is a test Alert. Please disregard.",
"Remediations": "No remediation is required."
}
],
"NRPFabricAlertTemplate": [
{
"Title": "Node unreachable",
"Severity": "Critical",
"Description": "MUX is unhealthy (Common case is BGPRouter disconnected)",
"Remediations": "BGP peer on the RRAS (BGP virtual machine) or top-of-rack (ToR) switch is unreachable or not peering successfully. Execute the following steps to remediate the problem.\u003cnewline/\u003eYou may have a misconfiguration on the ToR switch, and you may need to contact Support and your networking team to look at the switch configuration. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e"
},
{
"Title": "Route publication failure",
"Severity": "Critical",
"Description": "Loadbalancer Mux is not connected to a BGP router.",
"Remediations": "BGP Peering has failed between the SLB MUX and the top-of-rack (ToR) switch. Take the following steps to remediate this problem.\u003cnewline/\u003eYou may need to contact Support and your networking team to check the configuration of the ToR switch. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Load balancer MUX is overloaded",
"Severity": "Critical",
"Description": "There are performance issues with the MUX indicating that it may be at full capacity.",
"Remediations": "Deploy additional SLB MUX instances via the portal to help distribute the load. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddcapacity\u0027\u003ehttps://aka.ms/azurestackaddcapacity\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Certificate not authorized",
"Severity": "Critical",
"Description": "Failed to connect to Mux due to network or cert errors",
"Remediations": "Renew the Certificates via the certificate updates console."
},
{
"Title": "Certificate not trusted",
"Severity": "Critical",
"Description": "Failed to connect to Mux due to network or cert errors",
"Remediations": "Renew the Certificates via the certificate updates console."
},
{
"Title": "Public IP address utilization at 70% across all pools.",
"Severity": "Warning",
"Description": "Public IP address utilization is at 70% across all pools. If utilization reaches 100%, users will be unable to create VM instances, or create public IP addresses.",
"Remediations": "You need to add another block of public IP addresses to continue to service requests from users.\u003cnewline/\u003e1. Acquire another block of IP addresses from your network service provider.\u003cnewline/\u003e2. Sign in to the Azure Stack administrator portal using your service manager account.\u003cnewline/\u003e3. Open the \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027FRP\u0027\u003e Capacity Management\u003c/link\u003e blade and click Add IP pool.\u003cnewline/\u003e4. Provide the details of your new Public IP address pool."
},
{
"Title": "Public IP address utilization at 90% across all pools.",
"Severity": "Critical",
"Description": "Public IP address utilization is at 90% across all pools. If utilization reaches 100%, users will be unable to create VM instances, or create public IP addresses.",
"Remediations": "You need to add another block of public IP addresses to continue to service requests from users.\u003cnewline/\u003e1. Acquire another block of IP addresses from your network service provider.\u003cnewline/\u003e2. Sign in to the Azure Stack administrator portal using your service manager account.\u003cnewline/\u003e3. Open the \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027FRP\u0027\u003e Capacity Management\u003c/link\u003e blade and click Add IP pool.\u003cnewline/\u003e4. Provide the details of your new Public IP address pool."
},
{
"Title": "Public IP address utilization at 100% across all pools.",
"Severity": "Critical",
"Description": "Public IP address utilization is at 100% across all pools. Users are unable to create VM instances, or create public IP addresses.",
"Remediations": "You need to add another block of public IP addresses to continue to service requests from users.\u003cnewline/\u003e1. Acquire another block of IP addresses from your network service provider.\u003cnewline/\u003e2. Sign in to the Azure Stack administrator portal using your service manager account.\u003cnewline/\u003e3. Open the \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027FRP\u0027\u003e Capacity Management\u003c/link\u003e blade and click Add IP pool.\u003cnewline/\u003e4. Provide the details of your new Public IP address pool."
},
{
"Title": "Edge Gateway Pool at 70% utilization",
"Severity": "Warning",
"Description": "The Edge Gateway Pool {Name} is 70% utilized. If utilization reaches 100% users will be unable to create new gateway connections and performance may be impacted.",
"Remediations": "You need to add more capacity to the gateway pool to continue to service requests from users. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddcapacity\u0027\u003ehttps://aka.ms/azurestackaddcapacity\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Edge Gateway Pool at 90% utilization",
"Severity": "Critical",
"Description": "The Edge Gateway Pool {Name} is 90% utilized. If utilization reaches 100% users will be unable to create new gateway connections and performance may be impacted.",
"Remediations": "You need to add more capacity to the gateway pool to continue to service requests from users. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddcapacity\u0027\u003ehttps://aka.ms/azurestackaddcapacity\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "Edge Gateway Pool at 100% utilization",
"Severity": "Critical",
"Description": "The Edge Gateway Pool {Name} is 100% utilized. Users will be unable to create new gateway connections and performance may be impacted.",
"Remediations": "You need to add more capacity to the gateway pool to continue to service requests from users. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddcapacity\u0027\u003ehttps://aka.ms/azurestackaddcapacity\u003c/link\u003e for information about increasing capacity."
}
],
"OEMActivationOneNodePreviewAlertTemplates": [
{
"Title": "Missing OEM activation BIOS marker.",
"Severity": "Warning",
"Description": "The node is missing the OEM activation BIOS marker. A missing activation BIOS marker prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning.",
"Remediations": "Contact your Azure Stack OEM."
},
{
"Title": "Invalid OEM activation BIOS marker.",
"Severity": "Warning",
"Description": "The node has an invalid OEM activation BIOS marker. An invalid activation BIOS marker prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning.",
"Remediations": "Contact your Azure Stack OEM."
},
{
"Title": "Missing OEM activation license file.",
"Severity": "Warning",
"Description": "The node is missing the OEM activation license file. A missing activation license file prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning.",
"Remediations": "Apply an OEM update package that is later than the current stamp OEM version, or contact your Azure Stack OEM."
},
{
"Title": "Mismatched OEM activation license.",
"Severity": "Warning",
"Description": "The physical OEM activation BIOS marker does not match the license file. A mismatched activation license file prevents Windows from activating. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning.",
"Remediations": "Apply an OEM update package that is later than the current stamp OEM version, or contact your Azure Stack OEM."
},
{
"Title": "OEM activation error.",
"Severity": "Warning",
"Description": "The node failed OEM activation with error {LicenseStatusReason}. Windows is not activated on this node. If the evaluation period for Windows is exceeded, Azure Stack will stop functioning.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"PolicyServiceAlertTemplate": [
{
"Title": "The graph service is experiencing unexpected errors.",
"Severity": "Warning",
"Description": "The policy service is not processing some incoming requests due to unexpected errors calling the graph service, {FailurePercentage}% failure detected.",
"Remediations": [
"Alert Remediations.",
"1. Check availability of the graph service. For connected environment, it is Azure Active Directory Graph and for disconnected environment, it is the local graph service. If latter, restart that service.",
"2. If the issue persists, please contact support."
]
},
{
"Title": "The graph service is experiencing unexpected errors.",
"Severity": "Critical",
"Description": "The policy service is not processing most or all incoming requests due to unexpected errors calling the graph service, {FailurePercentage}% failure detected.",
"Remediations": [
"Alert Remediations.",
"1. Check availability of the graph service. For connected environment, it is Azure Active Directory Graph and for disconnected environment, it is the local graph service. If latter, restart that service.",
"2. If the issue persists, please contact support."
]
}
],
"RdAgentAlert": [
{
"Title": "Compute Host Agent is not healthy",
"Severity": "Warning",
"Description": "Compute Host Agent is not healthy on node: {HostName}.",
"Remediations": "Please reach out to customer support to disable the Compute Host Agent feature."
},
{
"Title": "Compute Host Agent is not responding to calls.",
"Severity": "Warning",
"Description": "Could not communicate with the Compute Host Agent running on node: {HostName}.",
"Remediations": "Please disable Compute Host Agent feature flag and collect logs for further diagnosis."
},
{
"Title": "Unexpected Compute Host Agent found running on host: {HostName}",
"Severity": "Warning",
"Description": "Found an unexpected Compute Host Agent running on host: {HostName}.",
"Remediations": "Please disable Compute Host Agent feature flag and collect logs for further diagnosis."
}
],
"SCAlertTemplates": [
{
"Title": "Add Scale Unit Node operation failed to expand the storage capacity.",
"Severity": "Warning",
"Description": "Storage capacity failed to be expanded as part of the operation that adds a scale unit node in cluster {ClusterName}.",
"Remediations": "Customer Assistance is required to resolve this issue. If you require that the storage capacity be expanded, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"SecretExpirationAlertTemplates": [
{
"Title": "Pending internal certificate expiration",
"Severity": "Warning",
"Description": "An internal certificate will expire within 30 days.",
"Remediations": "1. Follow the steps to rotate internal certificates at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotateinternalcertificates\u0027\u003ehttps://aka.ms/azsrotateinternalcertificates\u003c/link\u003e. \u003cnewline/\u003e2. If the problem persists, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Pending internal certificate expiration",
"Severity": "Critical",
"Description": "An internal certificate will expire within 7 days.",
"Remediations": "1. Follow the steps to rotate internal certificates at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotateinternalcertificates\u0027\u003ehttps://aka.ms/azsrotateinternalcertificates\u003c/link\u003e. \u003cnewline/\u003e2. If the problem persists, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Pending external certificate expiration",
"Severity": "Warning",
"Description": "An external certificate will expire within 30 days.",
"Remediations": "1. Follow the steps to rotate external certificates at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotatecertificates\u0027\u003ehttps://aka.ms/azsrotatecertificates\u003c/link\u003e. \u003cnewline/\u003e2. If the problem persists, please contact support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Pending external certificate expiration",
"Severity": "Critical",
"Description": "An external certificate will expire within 7 days.",
"Remediations": "1. Follow the steps to rotate external certificates at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azsrotatecertificates\u0027\u003ehttps://aka.ms/azsrotatecertificates\u003c/link\u003e. \u003cnewline/\u003e2. If the problem persists, please contact support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"SecurityAlertTemplates": [
{
"Title": "Code Integrity Off",
"Severity": "Critical",
"Description": "Code Integrity on {Component} is not enabled. Azure Stack is at risk of running unauthorized binaries.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Code Integrity in Audit Mode",
"Severity": "Critical",
"Description": "Code Integrity on {Component} is in audit mode. Azure Stack is at risk of running unauthorized binaries.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Code Integrity Violation",
"Severity": "Critical",
"Description": "Code Integrity detected a violation that a process {ProcessName} attempted to load {BinaryName} which violated code integrity policy.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "User Account Created",
"Severity": "Critical",
"Description": "A user account {UserName} was created for {Component}. It\u0027s a potential security risk.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"ServiceFabricAlertTemplates": [
{
"Title": "Infrastructure role is unhealthy",
"Severity": "Warning",
"Description": "The infrastructure role {Name} is experiencing issues.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Infrastructure role is unhealthy",
"Severity": "Warning",
"Description": "The infrastructure role {Name} is experiencing issues. This might impact performance and availability of Azure Stack services.",
"Remediations": "1. Select the “Repair” action to try to restart the infrastructure role {Name} , and then wait for the action to complete. Do not attempt to repair more than one alert at a time. Do not attempt the repair action if an update is in progress.\u003cnewline/\u003e 2. After the infrastructure role starts, the alert will automatically close. \u003cnewline/\u003e 3. If the alert remains active for more than a few minutes after the repair action completes, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact support."
},
{
"Title": "Infrastructure role cannot be monitored",
"Severity": "Warning",
"Description": "The infrastructure role {Name} cannot be monitored.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"StorageAlertTemplates": [
{
"Title": "Internal data store offline.",
"Severity": "Warning",
"Description": "An Internal data store service is offline. This might impact performance and availability of Azure Stack services.",
"Remediations": "1. Select the \u0027Repair\u0027 action to try to restart the Internal data store Infrastructure Role, and then wait for the action to complete. Do not attempt to repair more than one alert at a time. Do not attempt the repair action if an update is in progress. \u003cnewline/\u003e 2. After the infrastructure role starts, the alert will automatically close. \u003cnewline/\u003e 3. If the alert remains active for more than a few minutes after the repair action completes, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact support."
}
],
"UrpAlertTemplates": [
{
"Title": "Update failed.",
"Severity": "Critical",
"Description": "The most recent update failed. Microsoft recommends opening a service request as soon as possible. As part of the update process, Test-AzureStack is performed, and based on the output we generate the most appropriate alert. In this case, Test-AzureStack also failed.",
"Remediations": "Click the \"Download full logs\" button from the Update run details blade to review details on the update issue. For more information, visit \u003clink type=\u0027Url\u0027 uri=\u0027http://aka.ms/azurestackupdate\u0027\u003ehttp://aka.ms/azurestackupdate\u003c/link\u003e"
},
{
"Title": "Update needs attention.",
"Severity": "Warning",
"Description": "The most recent update needs attention. Microsoft recommends opening a service request during normal business hours. As part of the update process, Test-AzureStack is performed, and based on the output we generate the most appropriate alert. In this case, Test-AzureStack passed.",
"Remediations": "Click the \"Download full logs\" button from the Update run details blade to review details on the update issue. For more information, visit \u003clink type=\u0027Url\u0027 uri=\u0027http://aka.ms/azurestackupdate\u0027\u003ehttp://aka.ms/azurestackupdate\u003c/link\u003e"
}
],
"UsageBridgeServiceAlertTemplates": [
{
"Title": "Unable to connect to storage",
"Severity": "Warning",
"Description": "The Azure Stack Usage Bridge service is not able to connect to storage. Resource utilization data will not be sent.",
"Remediations": [
"1.Reconfigure the service by running the registration and activation procedure using the guidance from https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-registration",
"2.If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles."
]
},
{
"Title": "Unable to connect to the remote service",
"Severity": "Warning",
"Description": "The Azure Stack Usage Bridge service is unable to connect to the remote service. Resource utilization data will not be sent.",
"Remediations": [
"1.Verify that network configuration allows Usage Bridge to connect to the remote service.",
"2.Verify that the resource exists by running the following PowerShell commands: Login-AzureRmAccount -SubscriptionId (subscriptionId), followed by Get-AzureRmResource -ResourceGroupName (ResourceGroupName) -ResourceName (ResourceName)",
"3.If the resource does not exist, run the registration and activation procedure.",
"4. If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles."
]
},
{
"Title": "Unable to process usage data",
"Severity": "Warning",
"Description": "The Azure Stack Usage Bridge service has encountered an error. Resource utilization data will not be sent.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from https://aka.ms/azurestacklogfiles."
}
],
"WhsAlertTemplates": [
{
"Title": "A physical disk has failed",
"Severity": "Warning",
"Description": "A physical disk located at {FaultingObjectLocation} has failed. The storage repair process has started.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "Replace the physical disk as soon as possible to ensure full resiliency. To monitor the progress of virtual disk storage repair, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/virtualdiskhealth\u0027\u003ehttps://aka.ms/virtualdiskhealth\u003c/link\u003e. Please contact your Azure Stack solution supplier for support of this issue."
},
{
"Title": "Connectivity to a physical disk has been lost",
"Severity": "Warning",
"Description": "Connectivity to a physical disk has been lost located at {FaultingObjectLocation}.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "Replace the physical disk as soon as possible to ensure full resiliency. To monitor the progress of virtual disk storage repair, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/virtualdiskhealth\u0027\u003ehttps://aka.ms/virtualdiskhealth\u003c/link\u003e."
},
{
"Title": "A physical disk is failing",
"Severity": "Warning",
"Description": "The physical disk located at {FaultingObjectLocation} is sometimes unresponsive and is showing signs of failure.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "Replace the physical disk as soon as possible to ensure full resiliency. To monitor the progress of virtual disk storage repair, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/virtualdiskhealth\u0027\u003ehttps://aka.ms/virtualdiskhealth\u003c/link\u003e. Please contact your Azure Stack solution supplier if you need support for this issue."
},
{
"Title": "A failure of a physical disk is predicted to occur soon",
"Severity": "Warning",
"Description": "A failure of the physical disk at {FaultingObjectLocation} is predicted to occur soon.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "Replace the physical disk as soon as possible to ensure full resiliency. To monitor the progress of virtual disk storage repair, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/virtualdiskhealth\u0027\u003ehttps://aka.ms/virtualdiskhealth\u003c/link\u003e. Please contact your Azure Stack solution supplier if you need support for this issue."
},
{
"Title": "A physical disk is quarantined because it is not supported",
"Severity": "Warning",
"Description": "The physical disk located at {FaultingObjectLocation} is quarantined because it is not supported by your solution vendor. Only disks that are approved for the solution and have the correct disk firmware are supported.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "Replace this disk with a disk that has an approved manufacturer and model number for the solution. Please contact your Azure Stack solution supplier if you need support for this issue. "
},
{
"Title": "A physical disk is quarantined because its firmware version is not supported",
"Severity": "Warning",
"Description": "The physical disk located at {FaultingObjectLocation} is quarantined because its firmware version is not supported by your solution vendor. The physical disk does not have the minimum firmware version level required by this Azure Stack solution.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "1. Update the firmware on the physical disk to the target version, and then reseat the disk. \u003cnewline/\u003e2. If this issue persists, please please contact your Azure Stack solution supplier for support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "A replacement disk has existing data and is quarantined",
"Severity": "Warning",
"Description": "The replacement disk located at {FaultingObjectLocation} was previously used and may contain data from an unknown storage system. The disk is quarantined.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "Replace the disk with a new disk. If you must use this disk, remove the disk from the system, make sure there is no useful data on the disk, format the disk, and then reseat the disk."
},
{
"Title": "Failed attempt to update firmware on a physical disk",
"Severity": "Warning",
"Description": "There was a failed attempt to update firmware on the physical disk located at {FaultingObjectLocation}.\u003cnewline/\u003eDisk Description: {FaultingObjectDescription}",
"Remediations": "Please contact your Azure Stack solution provider for support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Storage device failure",
"Severity": "Critical",
"Description": "A storage device failure occurred which may cause one or more file shares to be inaccessible. Some data may be lost.",
"Remediations": "1. Check the physical and network connectivity of all storage devices.\u003cnewline/\u003e2. If all devices are connected correctly, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. You may have to restore from backup after the failed connection is restored."
},
{
"Title": "The scale unit does not have the minimum recommended storage reserve capacity",
"Severity": "Warning",
"Description": "The scale unit {FaultingObjectDescription} does not have the minimum recommended storage reserve capacity of one node. This may limit the ability to restore data resiliency if one or more drive failures occur.",
"Remediations": "1. Replace any failed or quarantined disks.\u003cnewline/\u003e2. If this issue persists, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddnode\u0027\u003ehttps://aka.ms/azurestackaddnode\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "The node is isolated from the scale unit",
"Severity": "Critical",
"Description": "The node {FaultingObjectDescription} is isolated from the scale unit because of connectivity issues.",
"Remediations": "Please contact Support. Customer Assistance is required to resolve this issue. Do not try to resolve this issue without their assistance. Before you open a support request, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Node quarantined because of recurring failures",
"Severity": "Critical",
"Description": "The node {FaultingObjectDescription} is quarantined by the scale unit because of recurring failures. There is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart.",
"Remediations": "1. Click the node name link in the Description field and try to cycle the node using the Power off/Power on actions on the node blade. (A physical node restart might take up to 10 minutes.)\u003cnewline/\u003e2. If this didn\u0027t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. If hardware replacement is required, there are important pre- and post-replacement steps. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackreplacenode\u0027\u003ehttps://aka.ms/azurestackreplacenode\u003c/link\u003e."
},
{
"Title": "Scale unit will go down if one more node fails",
"Severity": "Critical",
"Description": "The scale unit {FaultingObjectDescription} has multiple node failures. If another node fails, the scale unit will go down.",
"Remediations": "1. Try to recover each failed node in the scale unit one at a time. Navigate to Region management -\u003e Scale units -\u003e scale unit name -\u003e node name. On the node blade, try to cycle the node using the Power off/Power on actions. (A physical node restart might take up to 10 minutes.)\u003cnewline/\u003e2. If this didn\u0027t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. If hardware replacement is required, there are important pre- and post-replacement steps. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackreplacenode\u0027\u003ehttps://aka.ms/azurestackreplacenode\u003c/link\u003e."
},
{
"Title": "Network interface disconnected",
"Severity": "Critical",
"Description": "The network interface {FaultingObjectDescription} is disconnected. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart.",
"Remediations": "Reconnect the network cable."
},
{
"Title": "Node has missing network adapter(s) associated with the scale unit network",
"Severity": "Critical",
"Description": "The node {FaultingObjectDescription} has missing network adapter(s) associated with the scale unit network {FaultingObjectUniqueId}. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart.",
"Remediations": "1. Click the node name link in the Description field and try to cycle the node using the Power off/Power on actions on the node blade. (A physical node restart might take up to 10 minutes.)\u003cnewline/\u003e2. If this didn\u0027t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. If hardware replacement is required, there are important pre- and post-replacement steps. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackreplacenode\u0027\u003ehttps://aka.ms/azurestackreplacenode\u003c/link\u003e."
},
{
"Title": "A network interface has failed",
"Severity": "Critical",
"Description": "The network interface {FaultingObjectDescription} has failed. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart.",
"Remediations": "1. Click the node name link in the Description field and try to cycle the node using the Power off/Power on actions on the node blade. (A physical node restart might take up to 10 minutes.)\u003cnewline/\u003e2. If this didn\u0027t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. If hardware replacement is required, there are important pre- and post-replacement steps. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackreplacenode\u0027\u003ehttps://aka.ms/azurestackreplacenode\u003c/link\u003e."
},
{
"Title": "A network interface is not enabled",
"Severity": "Critical",
"Description": "The network interface {FaultingObjectDescription} is not enabled. The node is unavailable and there is less capacity available for tenant workloads. A process has been started to move tenant workloads from this node to other nodes. If there is no available capacity, some workloads may not restart.",
"Remediations": "1. Navigate to Region management -\u003e Scale units -\u003e scale unit name -\u003e node name. On the node blade, try to cycle the node using the Power off/Power on actions. (A physical node restart might take up to 10 minutes.)\u003cnewline/\u003e2. If this didn\u0027t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e. If hardware replacement is required, there are important pre- and post-replacement steps. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackreplacenode\u0027\u003ehttps://aka.ms/azurestackreplacenode\u003c/link\u003e."
},
{
"Title": "A file share is over 80% utilized",
"Severity": "Warning",
"Description": "The file share {FaultingObjectDescription} on volume {VolumeName} is over 80% utilized. If it reaches 100%, this can affect system functionality.",
"Remediations": "Do not add more load to the system, and see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddcapacity\u0027\u003ehttps://aka.ms/azurestackaddcapacity\u003c/link\u003e for information about increasing capacity."
},
{
"Title": "A file share is over 90% utilized",
"Severity": "Critical",
"Description": "The file share {FaultingObjectDescription} on volume {VolumeName} is over 90% utilized. If it reaches 100%, this can affect system functionality.",
"Remediations": "Do not add more load to the system, and see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackaddcapacity\u0027\u003ehttps://aka.ms/azurestackaddcapacity\u003c/link\u003e for information about increasing capacity."
}
],
"NRPAlertTemplate": [
{
"Title": "Node not connected to network controller",
"Severity": "Critical",
"Description": "\u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027FRP\u0027 resourceType=\u0027scaleunitnode\u0027 resourceName=\u0027{NodeName}\u0027\u003e{NodeName}\u003c/link\u003e is not connected to the network controller.",
"Remediations": "The port profile is not applied on the node or the node is not reachable from the network controller. Use the following steps to remediate this problem:\u003cnewline/\u003e1. Navigate to \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027FRP\u0027 resourceType=\u0027scaleunitnodes\u0027 resourceName=\u0027{NodeName}\u0027\u003e{NodeName}\u003c/link\u003e to open the node properties blade.\u003cnewline/\u003e2. Drain the node and then power off/power on the node using the available node actions.\u003cnewline/\u003e3. If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"SRPAlertTemplate": [
{
"Title": "Storage Resource Provider internal data store unavailable",
"Severity": "Critical",
"Description": "Cannot connect to the Storage Resource Provider internal data store.",
"Remediations": [
"1. Review the alerts on the Network Controller Role blade. If there are any active alerts, follow their remediation steps to fix them.",
"2. If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Storage Resource Provider internal data store corruption",
"Severity": "Critical",
"Description": "Storage Resource Provider internal data store corruption is detected.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Corrupted blob",
"Severity": "Critical",
"Description": "Couldn\u0027t access some of the blobs stored on these file share(s):\n{FileShares}.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "The blob service isn\u0027t running on a node",
"Severity": "Critical",
"Description": "Couldn\u0027t utilize all of your physical hardware for blob service resiliency and scale-out because the blob service didn\u0027t start on these node(s):\n{Nodes}.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "The blob service can\u0027t attach to a file share",
"Severity": "Critical",
"Description": "Couldn\u0027t access blobs stored on these file share(s):\n{FileShares}.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Blob service data is corrupted",
"Severity": "Critical",
"Description": "We couldn\u0027t access blobs stored on these shares because of problems with data corruption:\n{Shares}.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Blob service data store is corrupted",
"Severity": "Critical",
"Description": "Failures in the following shares indicate corruption issues in the blob service data store:\n{Shares}.",
"Remediations": [
"Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Table server data corruption",
"Severity": "Critical",
"Description": "There\u0027s a data corruption error in the Table server, which could cause a drop in availability for the table or queue service and result in data loss. The affected data repositories are:\n{DataRepositories}.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Account and Container Service data corruption",
"Severity": "Critical",
"Description": "There\u0027s a data corruption error in the Account and Container Service, which could cause a drop in availability for the table, queue, or blob service and result in data loss. The affected data repositories are:\n{DataRepositories}.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Share is not accessible",
"Severity": "Critical",
"Description": "Services on nodes {Nodes} can\u0027t access data on the following shares:\n{Shares}\nStorage services might be unavailable.",
"Remediations": [
"1. Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"2. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Storage service internal communication error",
"Severity": "Critical",
"Description": "Storage service internal communication error occurred when sending requests to the following nodes:\n{Nodes}.",
"Remediations": [
"1. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "ESENT data corruption",
"Severity": "Critical",
"Description": "There is ESENT data corruption in the following services:\n{Services}.",
"Remediations": "Please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Table service errors",
"Severity": "Critical",
"Description": "Errors in the Table service may cause a decrease in tenant workload performance.",
"Remediations": [
"Open the Storage Resource Provider blade, and then open the Table Service blade. Click the Availability past day chart, and then open the Availability metrics chart. If you see problems with availability, follow these steps in order:",
"1. Open the Storage Service Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"2. Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"3. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"4. Open the Infrastructure Management Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Table service errors",
"Severity": "Warning",
"Description": "Errors in the Table service may cause a decrease in tenant workload performance.",
"Remediations": [
"Open the Storage Resource Provider blade, and then open the Table Service blade. Click the Availability past day chart, and then open the Availability metrics chart. If you see problems with availability, follow these steps in order:",
"1. Open the Storage Service Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"2. Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"3. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"4. Open the Infrastructure Management Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Blob service errors",
"Severity": "Critical",
"Description": "Errors in the Blob service may cause a decrease in tenant workload performance.",
"Remediations": [
"Open the Storage Resource Provider blade, and then open the Blob Service blade. Click the Availability past day chart, and then open the Availability metrics chart. If you see problems with availability, follow these steps in order:",
"1. Open the Storage Service Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"2. Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"3. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"4. Open the Infrastructure Management Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Blob service errors",
"Severity": "Warning",
"Description": "Errors in the Blob service may cause a decrease in tenant workload performance.",
"Remediations": [
"Open the Storage Resource Provider blade, and then open the Blob Service blade. Click the Availability past day chart, and then open the Availability metrics chart. If you see problems with availability, follow these steps in order:",
"1. Open the Storage Service Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"2. Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"3. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"4. Open the Infrastructure Management Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Queue service errors",
"Severity": "Critical",
"Description": "Errors in the Queue service may cause a decrease in tenant workload performance.",
"Remediations": [
"Open the Storage Resource Provider blade, and then open the Queue Service blade. Click the Availability past day chart, and then open the Availability metrics chart. If you see problems with availability, follow these steps in order:",
"1. Open the Storage Service Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"2. Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"3. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"4. Open the Infrastructure Management Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
},
{
"Title": "Queue service errors",
"Severity": "Warning",
"Description": "Errors in the Queue service may cause a decrease in tenant workload performance.",
"Remediations": [
"Open the Storage Resource Provider blade, and then open the Queue Service blade. Click the Availability past day chart, and then open the Availability metrics chart. If you see problems with availability, follow these steps in order:",
"1. Open the Storage Service Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"2. Open the Storage Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"3. Open the Network Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"4. Open the Infrastructure Management Controller Role blade, check for any active alerts, and then follow the suggested steps to fix them.",
"If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
]
}
],
"WhsAlertTemplatesSrp": [
{
"Title": "A file share is over 80% utilized",
"Severity": "Warning",
"Description": "The file share {FaultingObjectDescription} on volume {VolumeName} is over 80% utilized. If it reaches 100%, affected tenants will not be able to use blobs, tables, or queues.",
"Remediations": "1. Navigate to Region management -\u003e Resource providers -\u003e \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027SRP\u0027\u003eStorage\u003c/link\u003e.\u003cnewline/\u003e2. On the Storage blade, click the Storage accounts tile.\u003cnewline/\u003e3. On the Storage accounts page, click Reclaim space to reclaim deleted account space. For more information, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/reclaimcapacity\u0027\u003ehttps://aka.ms/reclaimcapacity\u003c/link\u003e.\u003cnewline/\u003e4. If the issue persists, migrate storage to another file share. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/migratecontainer\u0027\u003ehttps://aka.ms/migratecontainer\u003c/link\u003e.\u003cnewline/\u003e5. If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "A file share is over 90% utilized",
"Severity": "Critical",
"Description": "The file share {FaultingObjectDescription} on volume {VolumeName} is over 90% utilized. If it reaches 100%, affected tenants will not be able to use blobs, tables, or queues.",
"Remediations": "1. Navigate to Region management -\u003e Resource providers -\u003e \u003clink type=\u0027ResourceId\u0027 resourceProvider=\u0027SRP\u0027\u003eStorage\u003c/link\u003e.\u003cnewline/\u003e2. On the Storage blade, click the Storage accounts tile.\u003cnewline/\u003e3. On the Storage accounts page, click Reclaim space to reclaim deleted account space. For more information, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/reclaimcapacity\u0027\u003ehttps://aka.ms/reclaimcapacity\u003c/link\u003e.\u003cnewline/\u003e4. If the issue persists, migrate storage to another file share. See \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/migratecontainer\u0027\u003ehttps://aka.ms/migratecontainer\u003c/link\u003e.\u003cnewline/\u003e5. If this didn’t solve the problem, please contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
}
],
"BRPAlertTemplates": [
{
"Title": "Backup failed.",
"Severity": "Critical",
"Description": "Infrastructure backup failed.",
"Remediations": "1. Start a new backup. \u003cnewline/\u003e2. If the problem persists, contact Support. Do not try to troubleshoot further without assistance from Support. Before you \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/newsupportrequest\u0027\u003eopen a support request\u003c/link\u003e, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Backup failed because the backup share can\u0027t be accessed.",
"Severity": "Critical",
"Description": "Infrastructure backup failed because the backup file share is not accessible. This might be because of an authentication issue, or access is denied by the external file server.",
"Remediations": "1. Confirm that you can map a file share from a computer on the same network as the Azure Stack solution. Run Test-AzureStack -Include AzsBackupShareAccessibility from the privileged endpoint (PEP) to get additional diagnostic information to help troubleshoot this problem. For more information, see the validation documentation at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/validateazurestackbackupsettings\u0027\u003ehttps://aka.ms/validateazurestackbackupsettings\u003c/link\u003e.\u003cnewline/\u003e2. If the problem persists, check that the username and password used to access the file share {ExternalSharePath} are still valid. \u003cnewline/\u003e3. If the problem persists, sign in to the server on which the share is located, and check that the share is enabled and accessible. \u003cnewline/\u003e4. If this didn\u0027t solve the problem, contact Support. Do not try to further troubleshoot without their assistance. Before you \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/newsupportrequest\u0027\u003eopen a support request\u003c/link\u003e, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Backup failed because backup file share path is not valid.",
"Severity": "Critical",
"Description": "Infrastructure backup failed because of an issue with the path of the external backup file share.",
"Remediations": "1. Confirm that the backup file share path is valid and accessible over the network. Run Test-AzureStack -Include AzsBackupShareAccessibility from the privileged endpoint (PEP) to get additional diagnostic information to help troubleshoot this problem. For more information, see the validation documentation at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/validateazurestackbackupsettings\u0027\u003ehttps://aka.ms/validateazurestackbackupsettings\u003c/link\u003e.\u003cnewline/\u003e2. If the file share path is incorrect or no longer valid, navigate to Infrastructure backup resource provider and update the file share path and credentials. \u003cnewline/\u003e3. If the problem persists, check for any errors on the file server that may cause the file share to be invalid. \u003cnewline/\u003e4. If this didn\u0027t solve the problem, contact Support. Do not try to further troubleshoot without their assistance. Before you \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/newsupportrequest\u0027\u003eopen a support request\u003c/link\u003e, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Backup failed because of network connectivity issues.",
"Severity": "Critical",
"Description": "Infrastructure backup failed to complete because of an issue connecting to the external backup file share. This may be a temporary issue caused by a network outage, or some other infrastructure issue.",
"Remediations": "1. Confirm that the network infrastructure between the Azure Stack solution and the backup file share is healthy. Run Test-AzureStack -Include AzsBackupShareAccessibility from the privileged endpoint (PEP) to get additional diagnostic information to help troubleshoot this problem. For more information, see the validation documentation at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/validateazurestackbackupsettings\u0027\u003ehttps://aka.ms/validateazurestackbackupsettings\u003c/link\u003e.\u003cnewline/\u003e2. If the problem persists, contact Support. Do not try to further troubleshoot without their assistance. Before you \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/newsupportrequest\u0027\u003eopen a support request\u003c/link\u003e, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Backup failed because file share is full.",
"Severity": "Critical",
"Description": "Infrastructure backup failed because the backup file share is out of capacity.",
"Remediations": "1. Navigate to the backup file share {ExternalSharePath} and delete one or more of the older backups. \u003cnewline/\u003e2. Start a new backup. The new backup folder will have a different time stamp so there is no risk of conflict. \u003cnewline/\u003e3. Confirm that the backup completed, and the resource shows up in the list of available backups."
},
{
"Title": "Cannot write to the backup file share.",
"Severity": "Critical",
"Description": "The file share is accessible over the network, but infrastructure backup failed to write to the file share.",
"Remediations": "1. Check that the backup account in use by Azure Stack has permissions to the backup location. Run Test-AzureStack -Include AzsBackupShareAccessibility from the privileged endpoint (PEP) to get additional diagnostic information to help troubleshoot this problem. For more information, see the validation documentation at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/validateazurestackbackupsettings\u0027\u003ehttps://aka.ms/validateazurestackbackupsettings\u003c/link\u003e.\u003cnewline/\u003e2. If the problem persists, contact the administrator of the backup location to troubleshoot the problem. \u003cnewline/\u003e3. After fixing the problem, start a new backup. \u003cnewline/\u003e4. If the problem persists, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
},
{
"Title": "Backup file share is not accessible.",
"Severity": "Critical",
"Description": "Infrastructure backup failed to access the file share over the network. This could be due to a problem with the backup account, the backup account\u0027s permissions to the file share, or a general network issue.",
"Remediations": "1. Run Test-AzureStack -Include AzsBackupShareAccessibility from the privileged endpoint (PEP) to get additional diagnostic information to help troubleshoot this problem. For more information, see the validation documentation at \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/validateazurestackbackupsettings\u0027\u003ehttps://aka.ms/validateazurestackbackupsettings\u003c/link\u003e.\u003cnewline/\u003e2. Contact the identity service, network, or backup location administrator to help investigate the problem. \u003cnewline/\u003e3. After fixing the issue, start a new backup. \u003cnewline/\u003e4. If the issue persists, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
},
{
"Title": "Backup failed during file copy to external share.",
"Severity": "Critical",
"Description": "Infrastructure backup failed because of an issue writing the backup files to the external backup file share. This may be a temporary issue caused by a network outage or some other issue with the infrastructure.",
"Remediations": "1. Navigate to the backup file share {ExternalSharePath} and delete the latest incomplete backup. If you are not sure which one to delete, do not make any changes to the file share. \u003cnewline/\u003e2. Start a new backup. The new backup folder will have a different time stamp so there is no risk of conflict. \u003cnewline/\u003e3. Confirm that the backup completed, and the resource shows up in the list of available backups. \u003cnewline/\u003e4. If the problem persists, contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Backup only partially completed because of an unexpected error.",
"Severity": "Critical",
"Description": "Infrastructure backup completed only for a subset of services. Backup of the following roles failed: {RolesFailedInBackup}. This might be a temporary issue with the backup controller.",
"Remediations": "1. Start a new backup. \u003cnewline/\u003e2. Confirm that the backup completed, and the resource shows up in the list of available backups. \u003cnewline/\u003e3. If this didn\u0027t solve the problem, contact Support. Before you do, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e."
},
{
"Title": "Backup is not enabled for a location.",
"Severity": "Warning",
"Description": "Backup is not enabled for the location. If this is a production environment, make sure to enable backup.",
"Remediations": "1. Navigate to Infrastructure backup, and configure the settings to enable backup. \u003cnewline/\u003e2. For additional guidance and sizing recommendations, see \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/backupazurestack\u0027\u003ehttps://aka.ms/azurestackbackup\u003c/link\u003e."
},
{
"Title": "Automatic backups are disabled.",
"Severity": "Warning",
"Description": "Automatic backups are currently disabled. Infrastructure backups have not been created in the past 24 hours. This warning will appear every 24 hours until the issue is resolved.",
"Remediations": "1. Address the issues that require the automatic backups to be disabled. \u003cnewline/\u003e 2. After all issues preventing backups are resolved, navigate to infrastructure backup and configure the settings to enable automatic backups."
},
{
"Title": "The scheduled backup was skipped due to a conflict with failed operations.",
"Severity": "Warning",
"Description": "The infrastructure backup process did not start because another operation failed to complete. Automatic backups cannot start if another process has not completed. Examples of operations that must complete before an automatic backup can start include an Azure Stack update, secrets rotation, or replacement of a field replaceable unit.",
"Remediations": "1. Address issues with the following failed operations: {FailedAdminOperations}. \u003cnewline/\u003e 2. If the Azure Stack update failed, you need to address any underlying issues that are causing the update to fail and restart the update. \u003cnewline/\u003e 3. If secrets rotation failed to complete, contact Support. Do not try to further troubleshoot without their assistance. \u003cnewline/\u003e 4. If servicing of a node has not completed, or replacement of a failed node is in progress, you need to complete those operations before backups can resume. \u003cnewline/\u003e 5. If you cannot resolve the failed operations, contact Support. Do not try to further troubleshoot without their assistance. \u003cnewline/\u003e 6. After resolving issues for failed operations, manually start the infrastructure backup or wait for the next automatic backup."
},
{
"Title": "Backup could not be deleted.",
"Severity": "Warning",
"Description": "Infrastructure backup failed to delete backup data from the backup location. The backup data that was not deleted has the following IDs: {FailedToDeleteBackupIds}. The backup location is accessible over the network. This issue could be due to a problem with the permissions of the backup account, the permissions set for the files on the backup location, or due to locked files.",
"Remediations": "1. Manually delete the backup(s) by deleting the following folder(s): {FailedToDeleteBackupFullPaths}. \u003cnewline/\u003e2. If you cannot delete a folder, contact the backup location administrator to help investigate the problem. \u003cnewline/\u003e3. After you manually delete the folder(s), wait 10 minutes so that the list of available backups can refresh. \u003cnewline/\u003e4. If the issue persists, start the log file collection process using the guidance from \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestacklogfiles\u0027\u003ehttps://aka.ms/azurestacklogfiles\u003c/link\u003e, and then contact Support."
},
{
"Title": "The backup file share is almost full.",
"Severity": "Warning",
"Description": "The backup file share is at {ExternalShareCapacityThreshold} utilization. If you don\u0027t free space, future infrastructure backups will fail to complete.",
"Remediations": "1. Navigate to the backup file share {ExternalSharePath} and delete one or more of the older backups."
},
{
"Title": "Infrastructure backup settings need to be updated.",
"Severity": "Warning",
"Description": "Infrastructure backup settings need to be updated to use a certificate instead of a symmetric key. Support for symmetric keys is deprecated and will be removed in a future release. Make sure to update your settings by providing a certificate. The certificate can be self-signed. Trusted CA certificates are not required. If you choose to close this warning manually, it will appear again in 24 hours until the corrective actions are taken.",
"Remediations": "1. Navigate to the infrastructure backup resource provider in the administrator portal and open the Settings blade. \u003cnewline/\u003e2. Under Encryption Settings, select use certificate, upload certificate, and select Save. \u003cnewline/\u003e3. For additional information, see the best practices documentation \u003clink type=\u0027Url\u0027 uri=\u0027https://aka.ms/azurestackbackupbestpractices\u0027\u003ehttps://aka.ms/azurestackbackupbestpractices\u003c/link\u003e."
}
]
}