@icy
Last active October 2, 2018 06:39
split.rb
#!/usr/bin/env ruby
# Author : Ky-Anh Huynh
# Date : Sep 2018
# License: MIT
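# Purpose: Given N lines on STDIN, this script writes ceil(N/P) files,
# Purpose: each containing up to P lines of the input (P = $max = 5000).
# Usage  : presumably `ruby -n split.rb < FILE`, since the body reads $_;
#          the s3sync-input/ directory must already exist.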
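BEGIN {
  # First output file is numbered START + 1 (START defaults to 0).
  $start_group = (ENV["START"] ? ENV["START"].to_i : 0)
  $idx = 0             # lines in the current buffer
  $buffers = []        # lines waiting to be written
  $max = 5000          # lines per output file (P)
  $groups = $start_group

  # Write the buffered lines to the next numbered file, then reset the buffer.
  def ship
    return if $buffers.empty?  # avoid writing an empty trailing file
    $groups += 1
    puts "flushing buffer #{$groups}, size = #{$buffers.size}"
    File.open("s3sync-input/input.#{$groups}.txt", "w") { |f|
      f.puts $buffers.join
    }
    $buffers = []
    $idx = 0
  end
}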
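# Runs once per input line (the line is in $_ when run with `ruby -n`).
# ship resets $idx, so the buffer never holds more than $max lines.
$buffers << $_
$idx += 1
ship if $idx >= $max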
END {
  ship
  puts "number of groups: #{$groups - $start_group}"
}
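
The same chunking idea can also be written without globals or the `-n` loop. The following is only a minimal sketch, not part of split.rb: `split_lines` and its `size:`, `dir:` and `start:` parameters are hypothetical names chosen for illustration. Unlike the script above, it creates the output directory if it is missing and writes each chunk exactly as read.

```ruby
require "fileutils"

# Hypothetical helper (not part of split.rb): write every `size` lines of
# `io` to <dir>/input.<n>.txt, numbering files from start + 1.
# Returns the paths of the files written, in order.
def split_lines(io, size:, dir:, start: 0)
  FileUtils.mkdir_p(dir) # assumption: creating the directory is acceptable
  io.each_line.each_slice(size).with_index(start + 1).map do |lines, n|
    path = File.join(dir, "input.#{n}.txt")
    File.write(path, lines.join)
    path
  end
end
```

Calling `split_lines($stdin, size: 5000, dir: "s3sync-input")` would mirror the defaults used above. An empty input produces no files.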