Created
March 5, 2023 17:58
-
-
Save ufrisk/66ab549ae85015891165124df46cb1e5 to your computer and use it in GitHub Desktop.
MemProcFS example Jupyter Notebook
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "2c655741-6f87-4d2d-a563-644c441254a1", | |
"metadata": {}, | |
"source": [ | |
"# MemProcFS memory forensics using Jupyter notebook\n", | |
"\n", | |
"The [MemProcFS](https://github.com/ufrisk/MemProcFS) Jupyter example notebook showcase how it is possible to leverage the [MemProcFS Python API](https://github.com/ufrisk/MemProcFS/wiki/API_Python) to perform fast and efficient memory analysis and forensics on memory dump files. The [example notebook](https://github.com/ufrisk/MemProcFS/wiki/API_Python_Jupyter) is not a production ready notebook.\n", | |
"\n", | |
"MemProcFS is available on Python pip for both Windows and Linux. MemProcFS analyzes Windows memory. MemProcFS is also able to analyze live memory from PCIe FPGA devices, drivers and virtual machines. But for the purposes of this notebook it is recommended to use a full memory dump file.\n", | |
"\n", | |
"If you decide to publish your own notebook using MemProcFS, or if you have improvement suggestions please let me know at Github or [contact me](https://github.com/ufrisk/MemProcFS#Links)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "93d01273-b62f-4c5f-8877-22d16ac567ed", | |
"metadata": {}, | |
"source": [ | |
"## Install and Import\n", | |
"First lets install and import the dependencies." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "2849ab43-45e6-4850-abbe-ab1375531ea1", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"#%%capture\n", | |
"%pip install --upgrade memprocfs colorama matplotlib networkx pandas tqdm pyvis" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "fd57ec08-2aee-43b4-93c8-bd4dea2b26a6", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"import memprocfs\n", | |
"import time\n", | |
"import pandas as pd\n", | |
"import networkx as nx\n", | |
"import matplotlib.pyplot as plt\n", | |
"from tqdm import tqdm\n", | |
"from io import StringIO\n", | |
"from pyvis.network import Network\n", | |
"from IPython.display import display, HTML\n", | |
"from colorama import *\n", | |
"# Do not truncate outputs from pandas:\n", | |
"pd.set_option('display.max_rows', None)\n", | |
"pd.set_option('display.max_columns', None)\n", | |
"pd.set_option('display.width', None)\n", | |
"pd.set_option('display.max_colwidth', None)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ef33ce73-6fee-4e9c-ad8a-5e06f9887037", | |
"metadata": {}, | |
"source": [ | |
"## Configure Memory Dump file\n", | |
"**Set the full path to the memory dump file you wish to analyze.** Optionally set the path to the page files as well (not required but will improve analysis quality)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "867eafa8-bcc1-4952-ab70-c90021d8ed46", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"memory_image_path_file = \"C:/Dumps/warren.mem\"\n", | |
"memory_image_path_pagefile = \"\"\n", | |
"memory_image_path_swapfile = \"\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "089da84c-e485-478d-8486-5e095ba83e89", | |
"metadata": {}, | |
"source": [ | |
"## Initialize MemProcFS\n", | |
"Initialize MemProcFS and execute the forensic mode. The forensic mode will take a short while to process the entire memory dump file. It will run multiple analysis tasks and generate CSV files which is used by pandas for data analytics. Lets wait for the forensic mode to complete." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "9ff6b125-a50f-4d39-8db6-16f86f8b316f", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"# Process arguments\n", | |
"args = ['-device', memory_image_path_file, '-forensic', '1', '-waitinitialize', '-vm']\n", | |
"if len(memory_image_path_pagefile) > 0:\n", | |
" args = args.append(\"-pagefile0\").append(memory_image_path_pagefile)\n", | |
" if len(memory_image_path_swapfile) > 0: args = args.append(\"-pagefile0\").append(memory_image_path_swapfile)\n", | |
" \n", | |
"# Initialize MemProcFS\n", | |
"vmm = memprocfs.Vmm(args)\n", | |
"\n", | |
"# Wait for forensic mode to complete...\n", | |
"with tqdm(desc=\"MemProcFS forensic analysis progress\",total=100) as pbar:\n", | |
" while True:\n", | |
" pbar.n = int(vmm.vfs.read('/forensic/progress_percent.txt'))\n", | |
" pbar.update(0)\n", | |
" time.sleep(0.5)\n", | |
" if pbar.n == 100:\n", | |
" pbar.disable = True\n", | |
" break\n", | |
"print(Fore.GREEN + \"[✓] MemProcFS forensic mode completed.\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "26a2a6de-99d1-4b77-b4e7-257a3512cc1c", | |
"metadata": {}, | |
"source": [ | |
"## Ingest Data\n", | |
"Ingest CSV data from the MemProcFS virtual file system (VFS) path `/forensic/csv/` into pandas for data analytics. Even if the MemProcFS virtual file system isn't mounted as a virtual drive it is possible to use the API to retrieve these files." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "e1026373-875d-441a-8a37-b40d8556f3dd", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"def get_csv(f):\n", | |
" # by default vmm.vfs.read(str) will read 1MB, since the csv files may be larger allow reads up to 256MB.\n", | |
" return StringIO(vmm.vfs.read('/forensic/csv/' + f, 0x10000000).decode(\"utf-8\"))\n", | |
"\n", | |
"dfdevice = pd.read_csv(get_csv('devices.csv'))\n", | |
"dfdriver = pd.read_csv(get_csv('drivers.csv'))\n", | |
"dfhandle = pd.read_csv(get_csv('handles.csv'))\n", | |
"dfmodule = pd.read_csv(get_csv('modules.csv'))\n", | |
"dfprocess = pd.read_csv(get_csv('process.csv'))\n", | |
"dfservice = pd.read_csv(get_csv('services.csv'))\n", | |
"dfthread = pd.read_csv(get_csv('threads.csv'))\n", | |
"dfunloadedmodule = pd.read_csv(get_csv('unloaded_modules.csv'))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3c34cd56-8e7f-4136-9800-98e89a1bddb0", | |
"metadata": {}, | |
"source": [ | |
"## Suspicious Drivers\n", | |
"\n", | |
"The kernel drivers of the analyzed system is already retrieved into a dataframe. First lets show the driver dataframe as-is.\n", | |
"\n", | |
"Then lets go hunting for suspicious drivers. We assume suspicious drivers don't have the words `SystemRoot` or `System32` in them.\n", | |
"\n", | |
"NB! if a driver called ad_driver.sys is found it is indeed loaded from a suspicious path, but it belongs to FTK which probably was the driver used to dump the memory." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "2c90f33a-e7b1-4471-9a3a-15b5dcf843e6", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"dfdriver" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "35e19128-d58b-4242-b646-d03e72aa0352", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"mask = ~dfdriver['DriverPath'].str.contains('SystemRoot|System32', case=False).fillna(True)\n", | |
"suspicious = dfdriver[mask][['Name', 'DriverName', 'DriverPath']]\n", | |
"\n", | |
"if len(suspicious) == 0:\n", | |
" print(Fore.GREEN + \"[✓] No suspicious drivers.\")\n", | |
"else:\n", | |
" display(HTML(suspicious.to_html()))\n", | |
" print(Fore.RED + \"[!] Suspicious drivers found!\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "17118c15-add4-4389-996d-1e167904ed8f", | |
"metadata": {}, | |
"source": [ | |
"## Open file handles in Word\n", | |
"\n", | |
"Word may have open file handles to interesting files such as autosave files. These files may be recovered from memory.\n", | |
"\n", | |
"Join the process and handle dataframes on PID. Then filter on the word process and file handles which contain the text AppData.\n", | |
"\n", | |
"These files may be downloaded from the MemProcFS path `/<pid>/files/handles/<object-address>-<filename>` using either the `vmm.vfs.read()` function showcased above in the csv import or using the mounted MemProcFS file system." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "8f2626d0-d9c2-43d0-8fd7-65850add3ca4", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"df_word = pd.merge(dfprocess, dfhandle, on=\"PID\")[['PID', 'Name', 'Object', 'Type', 'Description']].dropna()\n", | |
"df_word = df_word[ df_word['Name'].str.contains('word', case=False) ]\n", | |
"df_word = df_word[ df_word['Type'].str.match('File') ]\n", | |
"df_word = df_word[ df_word['Description'].str.contains('AppData', case=False) ]\n", | |
"\n", | |
"if len(df_word) == 0:\n", | |
" print(Fore.GREEN + \"[✓] No word process with open handles in AppData found.\")\n", | |
"else:\n", | |
" display(HTML(df_word.to_html()))\n", | |
" print(Fore.YELLOW + \"[!] Interesting open files in the word process found!\")\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "fffacf30-bee3-4db5-b177-67bce12d07e2", | |
"metadata": {}, | |
"source": [ | |
"## Process Relationships #1\n", | |
"\n", | |
"The MemProcFS process CSV file does not contain the parent process name - only the PPID. Lets perform a left outer join to add the ParentName.\n", | |
"\n", | |
"Lets display a simplified version of the process relationship table with some additional information" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "1a717045-1269-43cc-807c-28ba75dac763", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"df_proc2_ppid = dfprocess[['PID', 'Name']].rename(columns={'PID': 'P_PID', 'Name': 'ParentName'})\n", | |
"df_proc2 = dfprocess.merge(df_proc2_ppid, left_on='PPID', right_on='P_PID', how='left')\n", | |
"df_proc2 = df_proc2[['PID', 'Name', 'PPID', 'ParentName', 'User', 'UserPath', 'KernelPath']].fillna('---')\n", | |
"\n", | |
"df_proc2" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f67954c9-9316-4b76-b3dd-291544f810c7", | |
"metadata": {}, | |
"source": [ | |
"## Process Relationships #2\n", | |
"\n", | |
"Process relationships may be visualized in a directed graph. Lets visualize the proces relationships in a pyvis graph. Here we'll just use the normal process dataframe.\n", | |
"\n", | |
"NB! this does not take reused PIDs into account!" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "57eff587-0823-4ed6-823c-bbfb75884aae", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"net = Network(directed=True, cdn_resources='remote', notebook=True, height=800, width=800)\n", | |
"for index, row in dfprocess.iterrows():\n", | |
" net.add_node(row['PID'], row['Name'] + '(' + str(row['PID']) + ')')\n", | |
"for index, row in dfprocess.iterrows():\n", | |
" net.add_node(row['PPID'], '(' + str(row['PPID']) + ')')\n", | |
"for index, row in dfprocess.iterrows():\n", | |
" try: net.add_edge(row['PID'], row['PPID'])\n", | |
" except: pass\n", | |
"net.show('_pyvis.html')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "8249bd24-3989-464b-bef8-00a337bf5bbd", | |
"metadata": {}, | |
"source": [ | |
"## Loaded Modules / DLLs\n", | |
"\n", | |
"Show loaded DLLs in which it's possible to retrieve the company name from the PE VersionInfo. Also exclude Microsoft from here - so show only non-Microsoft loaded modules." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "e724cfd8-02f0-422f-a937-3e3b1c3edfb9", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"df_noms = dfmodule[['PID', 'Process', 'Name', 'VerCompanyName', 'Path']]\n", | |
"\n", | |
"df_noms = df_noms[ ~df_noms['VerCompanyName'].str.contains('microsoft', case=False).fillna(True) ]\n", | |
"df_noms" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "06042b4f-7c8a-4d2c-aa27-923db5ed765d", | |
"metadata": {}, | |
"source": [ | |
"## API: Physical Memory Map\n", | |
"\n", | |
"The physical memory map describes at which physical memory address ranges the actual physical RAM available to the operating system is located. MemProcFS retrieves it from the kernel. Check out the guide for info about [vmm.maps.memmap()](https://github.com/ufrisk/MemProcFS/wiki/API_Python_Base)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "cb800225-9f2b-4f41-994b-d82afacc9132", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"memmap = vmm.maps.memmap()\n", | |
"for memrange in memmap:\n", | |
" print(\"%10x -> %10x\" % (memrange[0], memrange[0] + memrange[1] - 1))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "12f14109-ebee-4a0c-9139-5205aa34c635", | |
"metadata": {}, | |
"source": [ | |
"## API: Read 0x100 bytes from kernel32 PE header (in explorer.exe)\n", | |
"\n", | |
"This should be the PE file header. Display it as hex. Use the MemProcFS API to achieve this. Check out the guide about the [process](https://github.com/ufrisk/MemProcFS/wiki/API_Python_Process#VmmProcess) and [module](https://github.com/ufrisk/MemProcFS/wiki/API_Python_Process#VmmModule) objects." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "1b70d8df-8aa2-4856-a2b8-b722426377a0", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"explorer = vmm.process('explorer.exe')\n", | |
"kernel32 = explorer.module('kernel32.dll')\n", | |
"memory_peheader = explorer.memory.read(kernel32.base, 0x100)\n", | |
"print('PE header of explorer.exe!kernel32.dll:')\n", | |
"print(vmm.hex(memory_peheader))\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4f3dcf13-166b-4aa7-942c-a49bed1e4453", | |
"metadata": {}, | |
"source": [ | |
"## API: List exported functions from kernel32.dll (in explorer.exe)\n", | |
"\n", | |
"Show only the first 20 entries." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "ff173b26-011a-4a95-8916-db4e595b1330", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"explorer = vmm.process('explorer.exe')\n", | |
"kernel32 = explorer.module('kernel32.dll')\n", | |
"exports = kernel32.maps.eat()\n", | |
"count = 0\n", | |
"print(\"ordinal address name forward\")\n", | |
"print(\"===============================================================\")\n", | |
"for e in exports['e']:\n", | |
" print(\"%2i %12x %-36s %s\" % (e['ord'], e['va'], e['fn'], e['fwdfn']))\n", | |
" if count == 20: break\n", | |
" count += 1" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "419439ad-b945-4c7d-9f53-c6ef0ff4c4dc", | |
"metadata": {}, | |
"source": [ | |
"## API: Network Connections\n", | |
"\n", | |
"The network connections can be retrieved using the [MemProcFS API](https://github.com/ufrisk/MemProcFS/wiki/API_Python_Base)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "ff049a6e-4299-4bd3-a5e1-5b0183d0feba", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"import pprint \n", | |
"netmap = vmm.maps.net()\n", | |
"for e in netmap:\n", | |
" print(\"%04i: %32s -> %32s\" % (e['pid'], e['src-ip'], e['dst-ip']))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ce91ea24-3e09-4f7a-9ded-e327ad97df7e", | |
"metadata": {}, | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.11.2" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment