@inaz2
Last active January 4, 2016 03:21
IE random crawler by PowerShell
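$initial = "http://dir.yahoo.co.jp/"
$next = $initial
while ($True) {
    Write-Host ("[+] Navigate to {0}" -f $next)
    # Start a fresh, visible IE window for every page
    $ie = New-Object -ComObject InternetExplorer.Application
    $ie.Left = 0
    $ie.Top = 0
    $ie.Visible = $True
    $ie.Navigate($next)
    # Wait until loading finishes (ReadyState 4 = READYSTATE_COMPLETE)
    while ($ie.Busy -or $ie.ReadyState -ne 4) {
        Start-Sleep -Milliseconds 100
    }
    Write-Host ("[+] Collecting links...")
    $doc = $ie.Document
    # Direct call, kept for reference; the reflection-based call below is used instead
    #$links = $doc.getElementsByTagName("a") | ? { $_.protocol -eq "http:" -or $_.protocol -eq "https:" } | % { $_.href }
    $elements = [System.__ComObject].InvokeMember("getElementsByTagName", [System.Reflection.BindingFlags]::InvokeMethod, $null, $doc, "a")
    # Keep only http/https links; @() guarantees an array, so .Count is valid for zero or one link
    $links = @($elements | Where-Object { $_.protocol -eq "http:" -or $_.protocol -eq "https:" } | ForEach-Object { $_.href })
    $linkcount = $links.Count
    Write-Host ("[+] Found {0} links" -f $linkcount)
    # Exclude the current page so the crawler always moves on; this also avoids
    # looping forever when every link points back to the current page
    $candidates = @($links | Where-Object { $_ -ne $next })
    if ($candidates.Count) {
        $next = $candidates | Get-Random
    } else {
        # Dead end: start over from the initial page
        $next = $initial
    }
    $ie.Quit()
}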
>powershell -ex remotesigned ./test.ps1
[+] Navigate to http://dir.yahoo.co.jp/
[+] Collecting links...
[+] Found 298 links
[+] Navigate to http://dir.yahoo.co.jp/News/Television/?q=Television
[+] Collecting links...
[+] Found 111 links
[+] Navigate to http://www.fujitv.co.jp/ana/
[+] Collecting links...
[+] Found 234 links
[+] Navigate to http://blog.fujitv.co.jp/anatamago/E20151128001.html
[+] Collecting links...
[+] Found 181 links
...