Created August 2, 2016 02:58
Benchmark of PowerShell de-duplication cmdlets
# Benchmarking code used to produce results from blog post
# home-grown HashSet-based approach
function hashunique {
    param(
        [Parameter(Mandatory=$true, ValueFromPipeline = $true)]
        [Object[]] $InputObject
    )
    begin { $t = New-Object System.Collections.Generic.HashSet[Object] }
    process {
        foreach($x in $InputObject) {
            # Add() returns $true only the first time a value is seen
            if($t.Add($x)) { $x }
        }
    }
}
1..20 |%{
    $collectionSize = $_ * 500
    # random values drawn from a range half the collection size, so the data contains many duplicates
    $data = 1..$collectionSize |%{ Get-Random -Min 1 -Max ($collectionSize/2) }
    $selectTime = Measure-Command { $data | select -unique }
    $sortTime = Measure-Command { $data | sort -unique }
    $groupTime = Measure-Command { $data | group |% Name }
    $uniqTime = Measure-Command { $data | hashunique }
    # one result row per collection size, timings in milliseconds
    [PSCustomObject] @{
        'Collection Size' = $collectionSize
        'select -unique' = $selectTime.TotalMilliseconds
        'sort -unique' = $sortTime.TotalMilliseconds
        'group |% Name' = $groupTime.TotalMilliseconds
        'hashunique' = $uniqTime.TotalMilliseconds
    }
}