| <# | |
| .Synopsis | |
| Parses html files produced by Google Takeout to present Google Voice call and text history in a useful way. | |
| .DESCRIPTION | |
| When exporting Google Voice data using the Google Takeout service, the data is delivered in the form | |
| of many individual .html files, one for each call (placed, received, or missed), each text message | |
| conversation, and each voicemail or recorded call. For heavy users of Google Voice, this could mean many | |
| thousands of individual files, all located in a single directory. This script parses all of the html | |
| files to collect details of each call or message and outputs them as an object which can then be | |
| manipulated further within PowerShell or exported to a file. | |
| This script requires the "HTML Agility Pack", available via NuGet. The script was written and tested | |
| with HTML Agility Pack version 1.4.9. For more information, go to http://htmlagilitypack.codeplex.com/ | |
| To run this script, you must have at least PowerShell version 3.0. If you're running Windows 7, | |
| you can update PowerShell by installing the latest "Windows Management Framework" from Microsoft, currently | |
| WMF 4. | |
| About This Script | |
| ----------------- | |
| Author: Steve Whitcher | |
| Web Site: http://www.neighborgeek.net | |
| Version: 1.1 | |
| Date: 11/16/2015 | |
| .EXAMPLE | |
| import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45 | |
| This command parses files in c:\temp\takeout\voice\calls using the HtmlAgilityPack.dll file located | |
| in 'C:\packages\HtmlAgilityPack.1.4.9\lib\net45'. Run this way, all of the text message and call history would be | |
| output to the screen only. | |
| .EXAMPLE | |
| import-gvhistory -path c:\temp\takeout\voice\calls -agilitypath C:\packages\HtmlAgilityPack.1.4.9\lib\net45\ | | |
| where-object {$_.Type -eq "Text"} | export-csv c:\temp\TextMessages.csv | |
| This command uses the same parameters as Example 1, but then passes the output on to be filtered | |
| by Where-Object so that it includes only records of text messages, not calls. After filtering, the information | |
| is saved to c:\temp\TextMessages.csv by passing the output of Where-Object to Export-CSV. | |
| .EXAMPLE | |
| import-gvhistory -path c:\temp\takeout\voice\calls | export-csv c:\temp\GVHistory.csv | |
| This command does not include the -agilitypath parameter, so the script will attempt to find | |
| and use HTMLAgilityPack.dll in the current working directory. The command will process all call and text | |
| message information and save it to c:\temp\GVHistory.csv | |
| #> | |
| function import-gvhistory | |
| { | |
| [CmdletBinding()] | |
| [Alias()] | |
| [OutputType("Selected.System.Management.Automation.PSCustomObject")] | |
| #Requires -Version 3.0 | |
| Param | |
| ( | |
| # Path to the "Calls" directory containing Google Voice data exported from Google Takeout. | |
| [Parameter(Mandatory=$true, | |
| ValueFromPipelineByPropertyName=$true, | |
| Position=0)] | |
| $Path, | |
| # Path to "HtmlAgilityPack.dll" if not located in the working directory. | |
| $AgilityPath = "." | |
| ) | |
| Begin | |
| { | |
| $option = [System.StringSplitOptions]::None | |
| $separator = "-" | |
| $Records = (get-childitem $Path) | Where-object {$_.Name -like "*.html"} | |
| $Calls = @() | |
| $Texts = @() | |
| $GVHistory = @() | |
| add-type -assemblyname system.web | |
| add-type -path "$($AgilityPath)\HtmlAgilityPack.dll" | |
| } | |
| Process | |
| { | |
| ForEach ($Record in $Records) | |
| { | |
| Write-Verbose "Record $($Record.Name)" # File name being processed | |
| # Split File Name into Contact Name, Call Type, and Timestamp. | |
| # BaseName drops the extension; TrimEnd(".html") would strip any trailing '.', 'h', 't', 'm', or 'l' characters. | |
| $RecordName = $Record.BaseName.split($separator,3,$option) | |
| Write-Verbose "RecordName $RecordName" | |
| $Contact = $RecordName[0].trim() | |
| $Type = $RecordName[1].trim() | |
| $FileTime = $RecordName[2] | |
| Write-Verbose "Name $Contact" | |
| Write-Verbose "Type $Type" | |
| Write-Verbose "TimeStamp $FileTime" | |
| Write-Verbose "" | |
| $doc = New-Object HtmlAgilityPack.HtmlDocument | |
| $doc.Load($Record.fullname) # Load() returns nothing, so there is no result to store | |
| if ($Type -ne "Text") | |
| { | |
| # Record is of a phone call that was placed, received, or missed, or of a voicemail message. | |
| # v1.1 Changed to get time from time attribute instead of innertext due to a change in how google formats | |
| # the text format of the time. | |
| # $GMTTime = $doc.documentnode.selectnodes("//abbr [@class='published']").InnerText.Trim() | |
| # $CallTime = get-date $GMTTime | |
| $AttribTime = $doc.documentnode.selectnodes("//abbr [@class='published']").getattributevalue('title','') | |
| $CallTime = get-date $attribtime | |
| $Tel = $doc.documentnode.selectnodes(".//a [@class='tel']") | |
| $ContactName = $tel.selectsinglenode(".//span[1]").InnerText.Trim() | |
| $ContactNum = $tel.GetAttributeValue("href", "Number").TrimStart("tel:+") | |
| If ($Type -ne "Missed" -and $Type -ne "Recorded") | |
| { | |
| # Missed Calls don't have a duration listed. Some recorded calls might also be zero length. | |
| # Get duration for all other call types. | |
| $Duration = $doc.documentnode.selectnodes(".//abbr[@class='duration']").InnerText.Trim("(",")") | |
| } | |
| Else | |
| { | |
| $Duration = "" | |
| } | |
| If ($Type -eq "Voicemail") | |
| { | |
| # Get the Automated Transcription of voicemail messages as well as the name of the mp3 audio file. | |
| $FullText = $doc.documentnode.selectnodes("//span [@class='full-text']").InnerText | |
| $Fulltext = [System.Web.HttpUtility]::HtmlDecode($FullText) | |
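| $Audio = $doc.documentnode.selectsinglenode("//audio") | |
| # Reset first so a voicemail without audio doesn't inherit the previous record's file name. | |
| $AudioFilePath = "" | |
| If ($Audio) | |
| { | |
| # If there was no audio recorded, the mp3 file won't exist. | |
| $AudioFilePath = $Audio.GetAttributeValue("src", "") | |
| } | |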
| } | |
| Else | |
| { | |
| # Calls of type other than "Voicemail" won't have audio or transcription, so blank the associated variables. | |
| $FullText = "" | |
| $Audio = "" | |
| $AudioFilePath = "" | |
| } | |
| # Add the details of this call record to $Calls | |
| $Calls += [PSCustomObject]@{ | |
| Contact = $ContactName | |
| Time = $CallTime | |
| Type = $Type | |
| Number = $ContactNum | |
| Duration = $Duration | |
| Message = $FullText | |
| AudioFile = $AudioFilePath | |
| Direction = "" | |
| } | |
| } | |
| else | |
| { | |
| # Record is of an SMS Conversation containing one or more messages | |
| $Messages = $doc.documentnode.selectnodes("//div[@class='message']") | |
| # Each HTML file represents a single SMS "Conversation". A conversation could include many messages. | |
| # Process each individual message. | |
| ForEach ($Msg in $messages) { | |
| # v1.1 Changed to get time from time attribute instead of innertext due to a change in how google formats | |
| # the text format of the time. | |
| # $GMTTime = $msg.selectsinglenode(".//abbr[@class='dt']").InnerText.Trim() | |
| # $MsgTime = get-date $GMTTime | |
| $AttribTime = $msg.selectsinglenode(".//abbr[@class='dt']").getattributevalue('title','') | |
| $MsgTime = get-date $AttribTime | |
| $Tel = $msg.selectsinglenode(".//a [@class='tel']") | |
| $SenderName = $tel.InnerText.Trim() | |
| $SenderNum = $tel.GetAttributeValue("href", "Number").TrimStart("tel:+") | |
| $Body = $msg.selectsinglenode(".//q").InnerText.Trim() | |
| $Body = [System.Web.HttpUtility]::HtmlDecode($Body) | |
| if ($SenderName -eq "Me") | |
| { | |
| # A message whose sender is "Me" was sent by the account owner. | |
| $Direction = "Sent" | |
| } | |
| else | |
| { | |
| $Direction = "Received" | |
| } | |
| # Add the details of this message to $Texts | |
| $Texts += [PSCustomObject]@{ | |
| Contact = $Contact | |
| Time = $MsgTime | |
| Type = $Type | |
| Direction = $Direction | |
| Message = $Body | |
| } | |
| } | |
| } | |
| } | |
| } | |
| End | |
| { | |
| # Combine all $Calls and $Texts, sort based on the timestamp. | |
| $GVHistory = $Calls + $Texts | |
| $GVHistory | Sort-Object Time | |
| } | |
| } |
@NeighborGeek I am getting the same errors as @bouchacha, except in my case the CSV file is blank.
This is what I am running on an admin prompt for PowerShell:
. 'C:\Users\user.name\Documents\GV History\Scripts\import-gvhistory.ps1'
$agilitypath = 'C:\Users\user.name\Documents\GV History\Scripts\htmlagilitypack.1.11.24\lib\Net45'
$path = 'C:\Users\user.name\Documents\GV History\Takeout\Voice\Calls'
import-gvhistory -Path $path -AgilityPath $agilitypath |
where-object {$_.Type -eq "Text"} | export-csv 'C:\Users\user.name\Documents\GV History\TextMessages.csv'
Can you let me know what I am doing wrong?
I've been working w/ this and it looks like the null valued errors are group conversations.
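If group conversation files are indeed the source of the null-valued errors, one way to keep the script from failing on them is to check a node lookup for `$null` before calling methods on it. The thread doesn't say which lookup comes back empty, so this is only a rough sketch that assumes it is the sender link (`a` with class `tel`). `Get-SenderInfo` is a hypothetical helper, not part of the original script.

```powershell
# Hypothetical helper (not in the original script): returns sender details for a
# message node, or empty placeholders when the expected sender link is missing.
function Get-SenderInfo
{
    Param
    (
        # Result of $msg.SelectSingleNode(".//a[@class='tel']"); may be $null.
        $TelNode
    )
    if ($null -eq $TelNode)
    {
        # Assumption: return blanks rather than calling methods on $null.
        return [PSCustomObject]@{ Name = ""; Number = ""; Found = $false }
    }
    $Href = $TelNode.GetAttributeValue("href", "")
    [PSCustomObject]@{
        Name   = $TelNode.InnerText.Trim()
        # Strip only a leading "tel:" and optional "+".
        Number = $Href -replace '^tel:\+?', ''
        Found  = $true
    }
}
```

Inside the message loop this could be called as `$Sender = Get-SenderInfo $msg.selectsinglenode(".//a [@class='tel']")`, and records where `$Sender.Found` is false could be skipped or logged with Write-Verbose.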
I tried getting this to run, but in all cases I just get a blank GVHistory.csv.
@NeighborGeek, I admit I have no idea what dot sourcing is and am a complete PowerShell noob, so thank you for your patience. I did in fact manage to get this to work! One of the issues I ran into is that I didn't know you had to enter the code you included above in separate lines.
That said, I get a TON of these two errors:
I still get a very full GVHistory.csv, but it's not clear whether any records are missing.
Also, is there an elegant way to include MMS like photos and videos? I realize that's asking a lot. It could work if this was exported to HTML instead of CSV.
Thank you for your work on this.
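On the idea of exporting to HTML instead of CSV: the objects the script outputs can be piped to the built-in ConvertTo-Html cmdlet in the same way they are piped to Export-CSV. This is a minimal sketch with invented sample records standing in for the output of import-gvhistory, and the output path is just an example. It only produces a table of the text fields; it does not handle MMS photos or videos.

```powershell
# Invented sample records standing in for the output of import-gvhistory.
$History = @(
    [PSCustomObject]@{
        Contact   = "Alice"
        Time      = [datetime]"2015-11-16 09:30"
        Type      = "Text"
        Direction = "Sent"
        Message   = "On my way"
    }
    [PSCustomObject]@{
        Contact   = "Bob"
        Time      = [datetime]"2015-11-15 18:05"
        Type      = "Text"
        Direction = "Received"
        Message   = "Call me later"
    }
)

# Hypothetical output location; any writable path works.
$OutFile = Join-Path ([System.IO.Path]::GetTempPath()) "GVHistory.html"

# ConvertTo-Html emits a complete page containing one table row per record.
$Html = $History | Sort-Object Time |
    ConvertTo-Html -Property Contact, Time, Type, Direction, Message -Title "Google Voice History"
$Html | Set-Content -Path $OutFile
```

With the real script, `$History` would be replaced by the output of `import-gvhistory -Path ... -AgilityPath ...`.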