macOS Performance Monitoring: Analysis and Improved Collection

I have spent the last few days improving the collection and analysis features of the original powermetrics command to better support Splunk ingestion.

The downside to the new approach is that we have to run the new command as a script; the upside is that we now have easy access to timestamped energy impact data that is easy to parse and chart in any log analytics tool. I know Splunk best, so I use it for my examples, but the energy data is completely portable.

Note: This script generates a fair bit of data; I recommend limiting your deployment of it to only a few machines.

Why the change to a script?

The answer has to do with how LaunchDaemons are loaded without a shell environment. My solution for adding timestamps is very simple, but it relies on a shell pipe to have the sed command prepend the date to each output line. Pipes and other forms of output redirection are not supported by LaunchDaemons directly, though there are some workarounds.

For example, the following will not work in a LaunchDaemon but will work in a terminal or other shell environment.

echo "hi" | sed 's/hi/goodbye/'
# a shell environment will output "goodbye"
# a LaunchDaemon will produce an error

The good news is loading a shell script via a LaunchDaemon side-steps these limitations and is what Jamf and most other MDM tools use to be able to remotely run commands and scripts from the controlling LaunchDaemon. I will detail why this works and the intricacies of LaunchDaemons/Agents in a later blog post.

By loading the new powermetrics collection command as a script, we can also use a while loop so that the date command is re-evaluated on every iteration, keeping the prepended timestamp accurate for each sample.
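The timestamp-prepend mechanism can be sketched on its own. (The full script below uses sed -l for line buffering on macOS; this sketch drops that flag so it runs in any shell.)

```shell
# Prepend a formatted timestamp to each line of output.
# The $(date ...) substitution is evaluated once per invocation,
# which stays accurate here because each loop iteration of the
# full script produces a single powermetrics sample.
echo "sample output line" | sed "s/^/$(date '+%Y-%m-%dT%H:%M:%S %Z') /"
```

Each run prints the line prefixed with the current date, time, and timezone, in the same format the regular expression later in this post expects.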

How To: Install the Script

1) Create the file /usr/local/bin/powermetrics_custom.sh with the following contents.

#!/bin/bash

# While loop to make the date/time prepended to each line accurate
# sed command prepends a specifically formatted date and time to each line.

while true
do
    /usr/bin/powermetrics -b 1 -n 1 -s tasks --show-process-energy --show-process-io | sed -l "s/^/$(date '+%Y-%m-%dT%H:%M:%S %Z') /" >> /tmp/powermetrics.log
    sleep 60
done

2) Create the file /Library/LaunchDaemons/com.cmdsec.energyimpact.logger.plist with the following contents.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>KeepAlive</key>
	<true/>
	<key>Label</key>
	<string>com.cmdsec.energyimpact.logger</string>
	<key>LowPriorityBackgroundIO</key>
	<true/>
	<key>Program</key>
	<string>/usr/local/bin/powermetrics_custom.sh</string>
	<key>RunAtLoad</key>
	<true/>
</dict>
</plist>

3) Important! Set permissions on both files

#################################
#  IMPORTANT
#################################  
# Set the proper permissions on the script
chmod 744 /usr/local/bin/powermetrics_custom.sh

# Set the proper permissions on the LaunchDaemon
chmod 644 /Library/LaunchDaemons/com.cmdsec.energyimpact.logger.plist

4) Load the LaunchDaemon and start powermetrics collection

sudo launchctl load /Library/LaunchDaemons/com.cmdsec.energyimpact.logger.plist

Parsing the Data in Splunk

When uploading or indexing the powermetrics data into Splunk, make sure to select “generic_single_line” as the sourcetype. The output will be slightly messy on lines that do not contain the actual energy impact output; this is normal and does not affect anything.

At this point in the process you will have lines of text in Splunk with the correct timestamps, but Splunk will have no idea which value is which field. In the next step we will define field extractions via a regular expression to tell Splunk how to map fields to values.

To extract the fields required for analysis, navigate in the Splunk web console to: Settings > Fields > Field Extractions (add new)

Configure the field extraction options as shown below:

# Here is the regex to put in the "Extraction/Transform" field:

\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\s\w{3}\s(?P<process_name>[\w \(\)\-\.]+)(\b|\))\s{3,}(?P<pid>[-\d]+)\s+(?P<cpu_ms_s>[\d\.]+)\s+(?P<percent_cpu_user>[\d\.]+)\s+(?P<deadlines_lt_2ms>[\d\.]+)\s+(?P<deadlines_2_to_5ms>[\d\.]+)\s+(?P<wakeups>[\d\.]+)\s+(?P<intr_pkg_idle>[\d\.]+)\s+(?P<bytes_read>[\d\.]+)\s+(?P<bytes_written>[\d\.]+)\s+(?P<pageins>[\d\.]+)\s+(?P<energy_impact>[\d\.]+)
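To sanity-check the extraction before configuring it in Splunk, you can test the same regex locally against a sample line. (The line below is illustrative, not real powermetrics output, and grep -P requires a grep built with PCRE support, as on most Linux systems.)

```shell
# A hypothetical line in the format the script writes to /tmp/powermetrics.log
line='2024-03-01T10:15:00 PST WindowServer                        153    245.11    12.40    0.00    0.00    120.00    5.00    1024    2048    0    345.67'

# The same expression used for the Splunk field extraction
regex='\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\s\w{3}\s(?P<process_name>[\w \(\)\-\.]+)(\b|\))\s{3,}(?P<pid>[-\d]+)\s+(?P<cpu_ms_s>[\d\.]+)\s+(?P<percent_cpu_user>[\d\.]+)\s+(?P<deadlines_lt_2ms>[\d\.]+)\s+(?P<deadlines_2_to_5ms>[\d\.]+)\s+(?P<wakeups>[\d\.]+)\s+(?P<intr_pkg_idle>[\d\.]+)\s+(?P<bytes_read>[\d\.]+)\s+(?P<bytes_written>[\d\.]+)\s+(?P<pageins>[\d\.]+)\s+(?P<energy_impact>[\d\.]+)'

# Prints "match" if the line parses against the extraction regex
echo "$line" | grep -qP "$regex" && echo "match"
```

If a real log line from /tmp/powermetrics.log fails this check, the field extraction in Splunk will silently produce no fields for that line, so this is a quick way to debug the format.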

Analytics Basics

If you are an advanced Splunk user the data will be properly key/value mapped and any Splunk analytics searches or functions will work. I have detailed one of the easier and most useful commands for this analysis below.

To map out the energy impact score over time the best tool in Splunk is the “timechart” command. Click Search in the top left of the Splunk console to be presented with a search field, enter the below text, and hit enter.

source="/tmp/powermetrics.log" | timechart avg(energy_impact) by process_name

The command above means:

  1. Pipe the output of the base search 'source="/tmp/powermetrics.log"' to the timechart command
  2. Average the energy_impact score by x-axis time frame (aka span)
  3. Track each process_name separately.

To narrow the search to one or a few processes you can modify the first part of the search like this:

source="/tmp/powermetrics.log" process_name=cmdReporter | timechart avg(energy_impact) by process_name

# OR

source="/tmp/powermetrics.log" (process_name=cmdReporter OR process_name=WindowServer) | timechart avg(energy_impact) by process_name

# To quickly see all possible process_name values run
source="/tmp/powermetrics.log" | stats count by process_name

And lastly, to search on total combined energy impact:

source="/tmp/powermetrics.log" process_name=ALL_TASKS | timechart avg(energy_impact) by process_name

An important note about time on the x-axis: timechart automatically determines the best resolution for the x-axis based on the time frame of the search. If the search covers the past 24 hours, the x-axis will be in one-hour steps; if it covers the past 60 minutes, the steps will be every minute. Energy impact data is collected every 60 seconds, so any span of less than 60 seconds will produce misleading data.

You can manually specify the x-axis steps with the 'span=1h' option to timechart. A basic example would be "timechart span=1h avg(energy_impact) by process_name". More details about the span option can be found here: https://docs.splunk.com/Documentation/Splunk/8.0.2/SearchReference/Timechart#Span_options

Happy charting!

Author: Dan Griggs