By: Christophe Parisel
Detecting Service Principal anomalies in Azure activity logs is challenging:
- Busy services generate thousands of logs per minute, if not more;
- Service Principal Names (SPNs) are numerous: you might end up with more service principals than named users in your AAD;
- A significant number of SPNs hold administrative roles, meaning wide-ranging role assignments that allow nearly arbitrary operations against Azure resource providers;
- With system-assigned Managed Identities, many SPNs have become transient.
For all those reasons, relying on traditional queries to hunt for anomalies is mostly ineffective.
If we turn to a statistical approach as an alternative way of chasing anomalies, the only ready-made tool at our disposal is Azure Sentinel's time series analysis. This article is the first instalment of a discussion about SPN anomaly detection in Azure Activity:
- This instalment will explain how time series, like traditional queries, fail to meet our expectations;
- The next instalment will propose a more efficient solution;
- I might add extra instalments depending on the interest raised in the cyber/architects community.
Time series analysis
Let me pick an SPN at random in an automated infra-as-code workload. Over a sample period of about a month, the time series decomposition of its Azure operations looks as follows:
This SPN is not very active: 221k ops/month is not that much. Despite this, and even at a high resolution (we used 1-hour steps to build the time series), the decomposition does not show any seasonal component.
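For readers who want to reproduce this kind of chart, here is a minimal sketch of the decomposition query. It assumes the logs live in the AzureActivity table and that the SPN can be filtered on the Caller column; `<spn-object-id>` is a hypothetical placeholder for the service principal's GUID:

```kusto
// Build an hourly series of operations for one SPN, then decompose it
// into baseline, seasonal, trend and residual components.
AzureActivity
| where Caller == "<spn-object-id>"   // placeholder: the SPN's object ID
| make-series OpCount = count() default = 0
    on TimeGenerated from ago(30d) to now() step 1h
| extend (baseline, seasonal, trend, residual) = series_decompose(OpCount)
| render timechart
```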
Let’s dive further into the series and run a default[*] anomalies decomposition:
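A sketch of that step, with the same assumptions as above (AzureActivity table, Caller filter, placeholder SPN ID); series_decompose_anomalies is called here with its defaults:

```kusto
// Flag anomalies on the hourly series using the default threshold (1.5).
AzureActivity
| where Caller == "<spn-object-id>"   // placeholder: the SPN's object ID
| make-series OpCount = count() default = 0
    on TimeGenerated from ago(30d) to now() step 1h
| extend (anomalies, score, baseline) = series_decompose_anomalies(OpCount)
| render anomalychart with (anomalycolumns = anomalies)
```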
The only spike spotted lies between October 22 and 23. This comes as no surprise, since it is the most prominent feature of the original decomposition. But are there other anomalies that the decomposition missed?
Let's run the anomalies decomposition with the lowest detection threshold[**] to capture more cases:
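The only change from the previous query is the threshold argument. The article does not state the exact value used, so 0.5 below is purely illustrative:

```kusto
// Lower the detection threshold to surface weaker anomalies.
// 0.5 is an illustrative value, not the one used in the article.
AzureActivity
| where Caller == "<spn-object-id>"   // placeholder: the SPN's object ID
| make-series OpCount = count() default = 0
    on TimeGenerated from ago(30d) to now() step 1h
| extend (anomalies, score, baseline) =
    series_decompose_anomalies(OpCount, 0.5)
| render anomalychart with (anomalycolumns = anomalies)
```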
Now we see new spikes with very low scores: a plateau between October 4 and 7, a negative spike on October 5, an oscillation between October 7 and 9, a negative spike on October 19. But are they actual anomalies in terms of cybersecurity?
To answer this question, we need more insight: let's summarize count() on the actual operations performed by the SPN:
The result is fuzzy, since one operation overwhelms all the others, but we do see something unexpected: a Microsoft.Network action with a count of just one. Because of the fuzziness it is invisible on the chart, so let's refine the summarization with log(count()):
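A sketch of this two-step summarization, under the same assumptions as the earlier queries; applying log() flattens the dominant operation so that rare ones remain visible:

```kusto
// Count each operation type per hour, then log-scale the counts
// so that a one-off operation is still visible next to a dominant one.
AzureActivity
| where Caller == "<spn-object-id>"   // placeholder: the SPN's object ID
| summarize Ops = count() by OperationNameValue, bin(TimeGenerated, 1h)
| extend LogOps = log(Ops)            // natural log; log(1) = 0 for one-off ops
| project TimeGenerated, OperationNameValue, LogOps
| render timechart
```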
We see that the anomalies raised by the decomposer do not look so suspicious after all. But there is a security issue on October 10 that the analysis misses: on the left-hand side of the green arrow, I have highlighted a unique, unprecedented call to the Network resource provider to modify a network interface.
In the end, the only way to pinpoint the October 10 anomaly is to make series on all the operation values[***], but this triggers many high-score false positives (we can see at least 6 of them in the picture below):
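This per-operation variant can be sketched as follows (same assumptions as before): adding `by OperationNameValue` to make-series produces one series per operation type, each of which is then decomposed independently.

```kusto
// One hourly series per operation type, each decomposed on its own.
// Rare operations now stand out, at the cost of many false positives.
AzureActivity
| where Caller == "<spn-object-id>"   // placeholder: the SPN's object ID
| make-series OpCount = count() default = 0
    on TimeGenerated from ago(30d) to now() step 1h
    by OperationNameValue
| extend (anomalies, score, baseline) = series_decompose_anomalies(OpCount)
| render anomalychart with (anomalycolumns = anomalies)
```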
Unfortunately, false positives only get worse as we put more SPNs under our supervision.
There is obviously room for improvement. In the next instalment, we will see what is wrong with time series and how to remedy it in Azure Sentinel.
[*]: default arguments are: threshold=1.5, seasonality=autodetect, trend=’linefit’
[***]: make-series by OperationNameValue