Application Reliability Monitor with Log Analytics

If there is one certainty when it comes to computing, it is that applications will at some point crash or hang. The reasons vary widely, from incompatibilities between installed frameworks to faulty dynamic link libraries or drivers, but the issue is universal. Today, however, we are for the most part protected from the worst side effects of crash and hang events, as Windows itself will attempt to silently recover the application, resulting in fewer blue screens of death.

There is one important thing to note though: although Windows does a much better job of masking these issues today, they will not go unnoticed by the end user. The relaunching of a background service might manifest itself as nothing more than a slight delay or “hang” on the system, whereas a foreground application might have to be relaunched, which is of course far more noticeable.

Windows Reliability Monitor

Monitoring these events in Windows used to be something IT engineers would turn to the event logs for, and you still can. However, the built-in Windows Reliability Monitor provides a much nicer graphical representation of system stability issues over time.

As we can see from the screenshot below, there are multiple critical events on this device, mostly focused around the “Killer Intelligence Center” (not pointing fingers, merely an example).

If we double-click on one of these critical instances, we obtain additional information about the application version, the faulting modules, the manufacturer, etc.

This is all helpful when troubleshooting stability issues on Windows devices, and elements of what we see in the Reliability Monitor are of course found in Endpoint Analytics today.

Endpoint Analytics

Endpoint Analytics in Microsoft Endpoint Manager is a set of insights focused on areas such as;

  • Startup Performance
  • Application Reliability
  • Resource Performance
  • Battery Health (in preview)

Focusing on Application Reliability for the purpose of this post, we can then obtain information from one of the following categories;

  • App Reliability Score
  • App Performance
  • Model Performance
  • Device Performance
  • OS Version Performance

Furthermore, limiting our focus to the App Reliability Score, we can then see how stable an application is, with the number of crashes, and the mean time to failure being included;

This is of course all very useful information, and is something that we can use to track how stable our systems are based on the combination of applications, OS versions etc. On reviewing the application stability events in Windows however, I found that there was a disconnect from what I was seeing here, both in terms of the crash events, and details such as which device was impacted, at what time, and additional information on the crash event.

This got me thinking, wouldn’t it be great to have the Windows Reliability Monitor ported to something in the cloud that we could monitor all of our devices with?

Log Analytics

At this point, if you have read any of my previous Log Analytics posts, you will know I am quite the fan. I really appreciate that Microsoft has given us a platform, other than SSRS or PowerBI, in which we can send, query, and visualise datasets. In fact, I did overhear Rod Trent coin the phrase “KQL is the new PowerShell”, and for some, the addiction to finding out what you can do with KQL is a major thing.

Using the Reliability Monitor as the template, I looked at the WMI class which contains the application reliability events, this being Win32_ReliabilityRecords (Win32_ReliabilityRecords class | Microsoft Docs). Querying this class returns a heap of useful information;

At this point we have the information about the type of event, the offending application, any associated modules, and the event time. Using PowerShell to extract this information we can then also combine that to obtain other items, such as the Computer Name, ManagedDeviceID, and signing information for both the application and associated modules. At this point we essentially have what the Windows Reliability Monitor uses, just not visualised in a friendly manner (I mean non admins here).
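As a rough sketch of that first step, the filter below keeps only crash (event ID 1000) and hang (event ID 1002) records from a recent time window. The helper name `Select-RecentReliabilityEvent` and its parameters are hypothetical and not part of the published script; the property names are those exposed by Win32_ReliabilityRecords.

```powershell
# Hypothetical helper: keep only crash/hang records newer than a cutoff.
function Select-RecentReliabilityEvent {
    param (
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject]$Record,
        # Size of the look-back window in hours (assumed default: 24)
        [int]$Hours = 24,
        # 1000 = Application Error (crash), 1002 = Application Hang
        [int[]]$EventId = @(1000, 1002)
    )
    begin {
        $Cutoff = (Get-Date).AddHours(-$Hours)
    }
    process {
        if ($Record.EventIdentifier -in $EventId -and $Record.TimeGenerated -ge $Cutoff) {
            $Record
        }
    }
}

# On a Windows device the records would come from WMI, for example:
# Get-CimInstance -ClassName Win32_ReliabilityRecords |
#     Select-RecentReliabilityEvent |
#     Select-Object TimeGenerated, SourceName, ProductName, EventIdentifier
```

Matching on exact event IDs rather than a pattern avoids accidentally picking up other identifiers that merely contain the same digits.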

Taking that information and pushing up to Log Analytics however, results in not only the ability to gather information from all our devices, but also the ability to query, and use the information in a workbook (like PowerBI but easier IMO). So, I created a process to do just that, with examples available in GitHub for sending data by hardcoding the workspace ID and key, or using a function app such as the one Jan Ketil Skanke covers in this post – Securing Intune Enhanced Inventory with Azure Function – MSEndpointMgr

In the below example script, I can use a single script to collect the following information through Proactive Remediations on a scheduled basis (recommendation is every 24 hours);

<#
.SYNOPSIS
Collect application reliability and upload to Log Analytics for further processing.

.DESCRIPTION
This script will audit application reliability events and upload them to a Log Analytics Workspace. This allows you to easily search crash and hang events across your devices.
The script is meant to be run on a daily schedule either via Proactive Remediations (RECOMMENDED) in Intune or manually added as a local scheduled task on your Windows 10 computer.

.EXAMPLE
Invoke-CustomAppReliabilityWithAPI.ps1 (Required to run as System or Administrator)

.NOTES
FileName:    Invoke-CustomAppReliabilityWithAPI.ps1
Author:      Maurice Daly
Contributor: Jan Ketil Skanke / Sandy Zeng
Contact:     @modaly_it
Created:     2022-01-05
Updated:     2022-20-05

Version history:
1.0.0 - (2022 - 01 - 05) Script created
#>

#region initialize
# Define your azure function URL: 
# Example 'https://<appname>.azurewebsites.net/api/<functionname>'

$AzureFunctionURL = "https://<YOUR URL HERE>/api/LogCollectorAPI"

# Enable TLS 1.2 support 
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
#Set Log Analytics Log Name
$AppReliabilityLogName = "AppReliability"
$Date = (Get-Date)
#endregion initialize

#region functions
# Function to get Azure AD DeviceID
function Get-AzureADDeviceID {
    <#
    .SYNOPSIS
        Get the Azure AD device ID from the local device.
    
    .DESCRIPTION
        Get the Azure AD device ID from the local device.
    
    .NOTES
        Author:      Nickolaj Andersen
        Contact:     @NickolajA
        Created:     2021-05-26
        Updated:     2021-05-26
    
        Version history:
        1.0.0 - (2021-05-26) Function created
    #>
	Process {
		# Define Cloud Domain Join information registry path
		$AzureADJoinInfoRegistryKeyPath = "HKLM:\SYSTEM\CurrentControlSet\Control\CloudDomainJoin\JoinInfo"
		
		# Retrieve the child key name that is the thumbprint of the machine certificate containing the device identifier guid
		$AzureADJoinInfoThumbprint = Get-ChildItem -Path $AzureADJoinInfoRegistryKeyPath | Select-Object -ExpandProperty "PSChildName"
		if ($AzureADJoinInfoThumbprint -ne $null) {
			# Retrieve the machine certificate based on thumbprint from registry key
			$AzureADJoinCertificate = Get-ChildItem -Path "Cert:\LocalMachine\My" -Recurse | Where-Object { $PSItem.Thumbprint -eq $AzureADJoinInfoThumbprint }
			if ($AzureADJoinCertificate -ne $null) {
				# Determine the device identifier from the subject name
				$AzureADDeviceID = ($AzureADJoinCertificate | Select-Object -ExpandProperty "Subject") -replace "CN=", ""
				# Handle return value
				return $AzureADDeviceID
			}
		}
	}
} #endfunction 
function Get-AzureADJoinDate {
    <#
    .SYNOPSIS
        Get the Azure AD join date from the local device.
    
    .DESCRIPTION
        Get the Azure AD join date from the local device, based on the NotBefore date of the device certificate.
    
    .NOTES
        Author:      Nickolaj Andersen
        Contact:     @NickolajA
        Created:     2021-05-26
        Updated:     2021-05-26
    
        Version history:
        1.0.0 - (2021-05-26) Function created
    #>
	Process {
		# Define Cloud Domain Join information registry path
		$AzureADJoinInfoRegistryKeyPath = "HKLM:\SYSTEM\CurrentControlSet\Control\CloudDomainJoin\JoinInfo"
		
		# Retrieve the child key name that is the thumbprint of the machine certificate containing the device identifier guid
		$AzureADJoinInfoThumbprint = Get-ChildItem -Path $AzureADJoinInfoRegistryKeyPath | Select-Object -ExpandProperty "PSChildName"
		if ($AzureADJoinInfoThumbprint -ne $null) {
			# Retrieve the machine certificate based on thumbprint from registry key
			$AzureADJoinCertificate = Get-ChildItem -Path "Cert:\LocalMachine\My" -Recurse | Where-Object { $PSItem.Thumbprint -eq $AzureADJoinInfoThumbprint }
			if ($AzureADJoinCertificate -ne $null) {
				# Determine the join date from the certificate's validity start date
				$AzureADJoinDate = ($AzureADJoinCertificate | Select-Object -ExpandProperty "NotBefore")
				# Handle return value
				return $AzureADJoinDate
			}
		}
	}
} #endfunction 
#Function to get AzureAD TenantID
function Get-AzureADTenantID {
	# Cloud Join information registry path
	$AzureADTenantInfoRegistryKeyPath = "HKLM:\SYSTEM\CurrentControlSet\Control\CloudDomainJoin\TenantInfo"
	# Retrieve the child key name that is the tenant id for AzureAD
	$AzureADTenantID = Get-ChildItem -Path $AzureADTenantInfoRegistryKeyPath | Select-Object -ExpandProperty "PSChildName"
	return $AzureADTenantID
}
# Function to get all Installed Application
function Get-InstalledApplications() {
	param (
		[string]$UserSid
	)
	
	New-PSDrive -PSProvider Registry -Name "HKU" -Root HKEY_USERS | Out-Null
	$regpath = @("HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\*")
	$regpath += "HKU:\$UserSid\Software\Microsoft\Windows\CurrentVersion\Uninstall\*"
	if (-not ([IntPtr]::Size -eq 4)) {
		$regpath += "HKLM:\Software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*"
		$regpath += "HKU:\$UserSid\Software\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*"
	}
	$propertyNames = 'DisplayName', 'DisplayVersion', 'Publisher', 'UninstallString'
	$Apps = Get-ItemProperty $regpath -Name $propertyNames -ErrorAction SilentlyContinue | . { process { if ($_.DisplayName) { $_ } } } | Select-Object DisplayName, DisplayVersion, Publisher, UninstallString, PSPath | Sort-Object DisplayName
	Remove-PSDrive -Name "HKU" | Out-Null
	Return $Apps
}
#endregion functions

#region script

#Get Common data for App and Device Inventory: 
#Get Intune DeviceID and ManagedDeviceName
if (@(Get-ChildItem HKLM:SOFTWARE\Microsoft\Enrollments\ -Recurse | Where-Object { $_.PSChildName -eq 'MS DM Server' })) {
	$MSDMServerInfo = Get-ChildItem HKLM:SOFTWARE\Microsoft\Enrollments\ -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.PSChildName -eq 'MS DM Server' }
	$ManagedDeviceInfo = Get-ItemProperty -LiteralPath "Registry::$($MSDMServerInfo)" -ErrorAction SilentlyContinue
}
$ManagedDeviceName = $ManagedDeviceInfo.EntDeviceName
$ManagedDeviceID = $ManagedDeviceInfo.EntDMID
$AzureADDeviceID = Get-AzureADDeviceID
$AzureADTenantID = Get-AzureADTenantID

#Get Computer Info
$ComputerInfo = Get-CimInstance -ClassName Win32_ComputerSystem
$ComputerName = $ComputerInfo.Name
$ComputerManufacturer = $ComputerInfo.Manufacturer

# Collect log flag
$CollectAppReliability = $true

if ($ComputerManufacturer -match "HP|Hewlett-Packard") {
	$ComputerManufacturer = "HP"
}

#region APPRELIABILITY
if ($CollectAppReliability) {
	# Obtain reliability data
	$AppReliabilityInventory = @()
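	# Event ID 1000 = application crash, 1002 = application hang; only the last 24 hours are collected
	$ReliabilityEvents = Get-CimInstance -ClassName Win32_ReliabilityRecords | Where-Object { $_.EventIdentifier -in @(1000, 1002) -and $_.TimeGenerated -ge (Get-Date).AddHours(-24) }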
	
	# Loop and process reliability events
	foreach ($ReliabilityEvent in $ReliabilityEvents) {
		
		$ApplicationName = $ReliabilityEvent.ProductName
		$ReliabilityEventId = $ReliabilityEvent.EventIdentifier
		$ReliabilityEventType = $ReliabilityEvent.SourceName
		[datetime]$ReliabilityEventTime = $ReliabilityEvent.TimeGenerated
		
		$ApplicationPath = $ReliabilityEvent.InsertionStrings | Where-Object { $_ -like "*\$ApplicationName" } | Select-Object -Unique
		
		if (-not ([string]::IsNullOrEmpty($ApplicationPath)) -and (Test-Path -Path $ApplicationPath)) {
			$ApplicationDetails = Get-ItemProperty -Path $ApplicationPath
			
			# Get file signing details
			$ApplicationSigningDetails = Get-AuthenticodeSignature -FilePath $ApplicationDetails.FullName
			$ApplicationSigningCert = $ApplicationSigningDetails.SignerCertificate
			
			# Get application publisher details
			$ApplicationPublisher = $ApplicationDetails.VersionInfo | Select-Object -ExpandProperty CompanyName
			if ([string]::IsNullOrEmpty($ApplicationPublisher)) {
				$ApplicationPublisher = "Unknown"
			}
			
			# Get version information
			$ApplicationVersion = $ApplicationDetails.VersionInfo.FileVersionRaw
			if ([string]::IsNullOrEmpty($ApplicationVersion)) {
				$ApplicationVersion = "Unavailable"
			}
			
			# Get faulting module
			if ($ReliabilityEvent.Message -match "module name") {
				$ApplicationFaultingModule = ((($ReliabilityEvent.Message.Split(",")) | Where-Object { $_ -match "Faulting module name:" }).Split(":") | Select-Object -Last 1).Trim()
				$ApplicationFaultingModulePath = $($ReliabilityEvent.Message).Split() | Where-Object { $_ -like "*\$ApplicationFaultingModule" }
			} else {
				$ApplicationFaultingModule = $null
				$ApplicationFaultingModulePath = $null
			}
			
			# Create JSON to Upload to Log Analytics
			$ReliabilityEventPayload = New-Object System.Object
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "ManagedDeviceName" -Value "$ManagedDeviceName" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "ManagedDeviceID" -Value "$ManagedDeviceID" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "ComputerName" -Value "$ComputerName" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "Application" -Value "$ApplicationName" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "EventType" -Value "$ReliabilityEventType" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "EventId" -Value "$ReliabilityEventId" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "ApplicationPublisher" -Value "$ApplicationPublisher" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "ApplicationPath" -Value "$ApplicationPath" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "ApplicationSignatureCert" -Value "$ApplicationSigningCert" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "ApplicationVersion" -Value "$ApplicationVersion" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "FaultingModule" -Value "$ApplicationFaultingModule" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "FaultingModulePath" -Value "$ApplicationFaultingModulePath" -Force
			$ReliabilityEventPayload | Add-Member -MemberType NoteProperty -Name "EventGenerated" -Value "$ReliabilityEventTime" -Force
			
			# Add event to array
			$AppReliabilityInventory += $ReliabilityEventPayload
		}
	}
}

#endregion APPRELIABILITY

#Randomize over 50 minutes to spread load on Azure Function - disabled on date of enrollment 
$JoinDate = Get-AzureADJoinDate
# Only randomize once the device has been joined for at least one day
if ($JoinDate -and ((Get-Date) -ge $JoinDate.AddDays(1))) {
	Write-Output "Randomizing execution time"
	#$ExecuteInSeconds = (Get-Random -Maximum 3000 -Minimum 1)
	#Start-Sleep -Seconds $ExecuteInSeconds
}
#Start sending logs
$date = Get-Date -Format "dd-MM HH:mm"
$OutputMessage = "InventoryDate:$date "

$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Content-Type", "application/json")

$LogPayLoad = New-Object -TypeName PSObject
$LogPayLoad | Add-Member -NotePropertyMembers @{ $AppReliabilityLogName = $AppReliabilityInventory }

# Construct main payload to send to LogCollectorAPI
$MainPayLoad = [PSCustomObject]@{
	AzureADTenantID = $AzureADTenantID
	AzureADDeviceID = $AzureADDeviceID
	LogPayloads	    = $LogPayLoad
}

$MainPayLoadJson = $MainPayLoad | ConvertTo-Json -Depth 9

# Sending data to API
try {
	$ResponseInventory = Invoke-RestMethod $AzureFunctionURL -Method 'POST' -Headers $headers -Body $MainPayLoadJson
	$OutputMessage = $OutPutMessage + "Inventory:OK " + $ResponseInventory
} catch {
	$ResponseInventory = "Error Code: $($_.Exception.Response.StatusCode.value__)"
	$ResponseMessage = $_.Exception.Message
	$OutputMessage = $OutPutMessage + "Inventory:Fail " + $ResponseInventory + $ResponseMessage
}

# Check status and report to Proactive Remediations
if ($ResponseInventory -match "200") {
	$AppReliabilityResponse = $ResponseInventory.Split(",") | Where-Object { $_ -match "AppReliability:" }
	if ($AppReliabilityResponse -match "AppReliability:200") {
		$OutputMessage = $OutPutMessage + " AppReliability:OK " + $AppReliabilityResponse
	} else {
		$OutputMessage = $OutPutMessage + " AppReliability:Fail " + $AppReliabilityResponse
	}
	Write-Output $OutputMessage
	if ($AppReliabilityResponse -notmatch "AppReliability:200") {
		Exit 1
	} else {
		Exit 0
	}
} else {
	Write-Output "Error: $($ResponseInventory), Message: $($ResponseMessage)"
	Exit 1
}
#endregion script

The above script and the alternative script without the Azure function API integration are available on the MSEndpointMgr GitHub site here – Reporting/Scripts at main · MSEndpointMgr/Reporting (github.com)
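For the variant without the function app, the workspace ID and key are used to sign each request on the device. The sketch below assumes that variant posts to the Azure Monitor HTTP Data Collector API, which expects an HMAC-SHA256 `SharedKey` authorization header; the function name `New-LogAnalyticsSignature` and the sample values are hypothetical, so check the script in the repository for the exact implementation.

```powershell
# Hypothetical helper: build the SharedKey authorization header value.
function New-LogAnalyticsSignature {
    param (
        [Parameter(Mandatory)][string]$WorkspaceId,
        [Parameter(Mandatory)][string]$SharedKey,    # Base64 workspace key
        [Parameter(Mandatory)][string]$Rfc1123Date,  # e.g. [DateTime]::UtcNow.ToString("r")
        [Parameter(Mandatory)][int]$ContentLength    # Body length in bytes
    )
    # String-to-sign layout used by the HTTP Data Collector API
    $StringToSign = "POST`n$ContentLength`napplication/json`nx-ms-date:$Rfc1123Date`n/api/logs"

    $Hmac = New-Object System.Security.Cryptography.HMACSHA256
    $Hmac.Key = [Convert]::FromBase64String($SharedKey)
    try {
        $Hash = $Hmac.ComputeHash([Text.Encoding]::UTF8.GetBytes($StringToSign))
    } finally {
        $Hmac.Dispose()
    }
    "SharedKey {0}:{1}" -f $WorkspaceId, [Convert]::ToBase64String($Hash)
}

# Usage sketch with placeholder values (no request is sent here)
$WorkspaceId = "00000000-0000-0000-0000-000000000000"   # placeholder
$SharedKey   = "c2VjcmV0"                                # placeholder key
$Body        = [Text.Encoding]::UTF8.GetBytes('[{"Application":"example.exe"}]')
$Date        = [DateTime]::UtcNow.ToString("r")
$Headers = @{
    "Authorization" = New-LogAnalyticsSignature -WorkspaceId $WorkspaceId -SharedKey $SharedKey -Rfc1123Date $Date -ContentLength $Body.Length
    "Log-Type"      = "AppReliability"
    "x-ms-date"     = $Date
}
# Invoke-RestMethod -Uri "https://${WorkspaceId}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01" `
#     -Method Post -ContentType "application/json" -Headers $Headers -Body $Body
```

Because the key has to be present on the device to compute this signature, the function app route avoids distributing it at all.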

Application Reliability Monitor Workbook

With the data now flowing from the clients to Log Analytics, we can visualise it within a workbook, something that your CIO will appreciate. Below then is the result of this, a combination of a master and child workbook that help you to visualise the following information about your environment;

  • Application Event Timeline
    This time chart shows application reliability incidents over time defined in the time range drop down
  • Application Event Type
    This area chart shows us the split between application crash and hang events over time
  • Application Details
    The data grid provides detailed information including;
    • Time of the event
    • Computer name
    • Application Name
    • Application Publisher
    • Application Path
    • Application Version
  • Application Error Events
    This data grid with bar chart shows the top applications which have critical errors
  • Application Hang Events
    This data grid provides the same information as above, but focused on hang events only
  • Manufacturer Summary
    This pie chart shows the manufacturer stats based on both hang and crash events
  • Application Reliability Issues – By Exe
    This data grid with bar chart shows the top applications with events in your environment

Clicking on the applications highlighted in blue (linked) will then open a slide in workbook with additional details about the events impacting that executable.

This workbook contains the following additional information;

  • Application Timeline
    Time chart of events based on the application
  • Application Count
    Count of application versions detected in your environment
  • Associated Faulting Modules
    Faulting modules (DLLs etc.)
  • Computers Impacted
    List of devices which have events for the application
  • Additional Application Issues
    Additional application events from the same manufacturer (this will dynamically display where the count is more than 0)
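The faulting module values shown in this view are parsed on the client from the event message text. As an alternative sketch to the string splitting used in the collection script, the hypothetical function below uses regular expressions, which should also keep module paths containing spaces intact. The sample message is illustrative only and assumes the usual "Faulting module name:" and "Faulting module path:" fields of a crash event.

```powershell
# Hypothetical helper: extract faulting module name and path from an event message.
function Get-FaultingModuleInfo {
    param (
        [Parameter(Mandatory)][string]$Message
    )
    $Name = $null
    $Path = $null
    # The name field ends at the next comma or line break
    if ($Message -match 'Faulting module name:\s*([^,\r\n]+)') {
        $Name = $Matches[1].Trim()
    }
    # The path field runs to the end of its line, spaces included
    if ($Message -match 'Faulting module path:\s*([^\r\n]+)') {
        $Path = $Matches[1].Trim()
    }
    [pscustomobject]@{
        FaultingModule     = $Name
        FaultingModulePath = $Path
    }
}

# Illustrative message, not taken from a real device
$SampleMessage = @'
Faulting application name: example.exe, version: 1.2.3.4, time stamp: 0x5f000000
Faulting module name: example.dll, version: 1.2.3.4, time stamp: 0x5f000001
Exception code: 0xc0000005
Faulting application path: C:\Program Files\Example\example.exe
Faulting module path: C:\Program Files\Example\example.dll
'@

Get-FaultingModuleInfo -Message $SampleMessage
```

Messages without these fields, such as hang events, simply return empty values for both properties.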

The workbook JSON files (AppReliabilityDashboard & AppReliabilityDetails) are available here – Reporting/Workbooks at main · MSEndpointMgr/Reporting (github.com). Putting this together, we have a reporting solution where you can do what you like with the data, including extending it to Logic Apps or Azure Monitor alerts to inform you of spikes etc.

Workbook Configuration & Data Collection Prerequisites

Obviously there are some modifications that are required in order to get these workbooks to function, and you will need a Log Analytics workspace in the first instance to send the data to (something I am not covering here, but Microsoft docs have you covered – Create a Log Analytics workspace in the Azure portal – Azure Monitor | Microsoft Docs).

From a client device side of things, we are leveraging Proactive Remediations in this instance and therefore you will need to be licensed. You can view the Microsoft docs (What is Endpoint analytics? – Microsoft Endpoint Manager | Microsoft Docs) for more information, and it is possible to do something similar through scheduled tasks, but you will not have overview of the jobs, as you do within PR.

Deploying PowerShell script via Proactive Remediations

First things first, my recommendation here is to use a function app for log ingestion, as this provides the most secure method, with no hardcoding of workspace information. If you do opt to use the script with these values hardcoded, just be aware that an admin of the device could obtain the workspace details, as they are in plain text.

  • Log into the Endpoint Manager Admin Center
  • Click on Reports
  • Click on Endpoint Analytics
  • Click on Proactive Remediations
  • Now click on “+ Create Script Package”
  • Give your PR script a name, “Collect Custom Inventory” for instance
  • Click on the folder icon on the “Detection Script File” line to browse for the collection PowerShell script.

    Please note that you will have to either update the API URI or the Workspace ID / Primary key depending on which script you are using
  • Add any required Scope Tags
  • Assign the script to a Group of your choice, defining a time to run
  • Click on Create at the “Review + Create” and now your clients will need time to check in
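Because the collection script runs as the detection script, its exit code and final output are what Proactive Remediations reports back per device: exit code 0 is treated as no issue, exit code 1 as an issue. A minimal sketch of that status handling follows, assuming the function app returns a comma-separated string containing entries such as `AppReliability:200` (the format the collection script checks for). The helper name `Get-UploadStatus` is hypothetical.

```powershell
# Hypothetical helper: turn the function app response into an exit code and message.
function Get-UploadStatus {
    param (
        [string]$Response,
        [string]$LogName = "AppReliability"
    )
    # Find the entry for this log, e.g. "AppReliability:200"
    $Entry = $Response.Split(",") |
        Where-Object { $_ -match "${LogName}:" } |
        Select-Object -First 1

    $Succeeded = [bool]($Entry -match "${LogName}:200")
    $ExitCode  = if ($Succeeded) { 0 } else { 1 }
    $Message   = if ($Succeeded) { "${LogName}:OK" } else { "${LogName}:Fail $Entry" }

    [pscustomobject]@{
        ExitCode = $ExitCode
        Message  = $Message
    }
}

$Status = Get-UploadStatus -Response "AppReliability:200"
Write-Output $Status.Message
# Inside a detection script you would finish with:
# exit $Status.ExitCode
```

Keeping the output to a single short line makes the per-device status easier to scan in the portal.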

Workbook Modifications

Before we attempt to add the workbook, we should confirm that we have data to work with. Go into your Log Analytics workspace, click on Logs, and check for “AppReliability_CL” under your custom logs section, as per the screenshot;

Running a quick query on the custom log you should have something like the below;

Now onto the workbook modifications. Taking the JSON for the main workbook, and pasting that into a new blank workbook, you can click on Apply. At this point you should have something similar to the screenshot below;

In order to launch the details workbook, we need to specify a Log Analytics workspace ID and workbook ID, this is done by editing the parameters at the top of the main workbook.

Editing the Log Workspace parameter first, we need to specify the Log Analytics workspace name we are using for the log file, and then press the “Run Query” button, which should result in a single entry in the results pane.

For the child “App Details” workbook, you will need to import the JSON and save as “Application Reliability – App Details” (in this example), then editing the “DetailsWorkbook” parameter you should enter something similar to the below;

The above values are important for the child workbook; without them it will not work, so please make sure to edit these values. Once your clients check in, your workbooks should behave as in the video below;

Conclusion

Leveraging PowerShell, Azure Functions (recommended), and Log Analytics we now have an application reliability dashboard that you can use to monitor the stability of client device applications across your entire environment (where of course you have internet, co-managed or Intune managed, and proactive remediations).

I hope you find this useful and I am looking for feedback on this, and any other functions you would like to have added.

Cost Note: Log ingestion costs will apply, however, they should be very small as the payload sizes are typically less than 100 KB. As an example, in an environment with 5000 devices / circa 5GB of data, log ingestion would cost circa $15-$25/€14-€23 per month based on a combination of hardware, application, and app reliability data. You can monitor this cost, and of course it depends on your region. I would suggest that the logs are stored within their own resource group and LA workspace to ensure you have clear visibility of the cost.

Maurice Daly

Maurice has been working in the IT industry for the past 20 years and currently working in the role of Senior Cloud Architect with CloudWay. With a focus on OS deployment through SCCM/MDT, group policies, active directory, virtualisation and office 365, Maurice has been a Windows Server MCSE since 2008 and was awarded Enterprise Mobility MVP in March 2017. Most recently his focus has been on automation of deployment tasks, creating and sharing PowerShell scripts and other content to help others streamline their deployment processes.
