@mtrolle
Created November 14, 2018 13:57
High memory consumption with HTTParty; see the question asked on Stack Overflow: https://stackoverflow.com/questions/53169751/ruby-memory-usage-gone-wild
require 'bundler/inline'
require 'json'
require 'logger'

gemfile do
  source 'https://rubygems.org'
  gem 'httparty'
  gem 'nokogiri'
  gem 'memory_profiler'
end

xml_base_url = 'http://noneed.dk/test.xml?date='
auth_end_point = 'http://noneed.dk/session.php'
auth_payload = {"user" => "[email protected]", "password" => "secret"}
image_search_base_url = 'http://noneed.dk/images.php?id='
image_base_url = 'http://noneed.dk/'

logger = Logger.new(STDOUT)
download_size = 0
time_start = Time.now
requests = 0

report = MemoryProfiler.report do
  # Authenticate and collect the session cookies
  response = HTTParty.post(auth_end_point, body: {payload: auth_payload})
  cookie_jar = HTTParty::CookieHash.new
  # get_fields returns nil when the header is absent, hence Array()
  Array(response.headers.get_fields('Set-Cookie')).each { |c| cookie_jar.add_cookies(c) }
  logger.debug "Got a cookiejar - #{cookie_jar.to_cookie_string}"
  download_size += response.body.bytesize
  requests += 1

  # Run X days of data
  1.times do |n|
    date = (Time.now + n * 86400).strftime("%F")
    logger.info "Downloading data for #{date}"
    response = HTTParty.get(xml_base_url + date)
    download_size += response.body.bytesize
    requests += 1
    logger.debug "Download completed with code #{response.code} with a total size of #{response.body.bytesize / 1024}KB."

    # Parse downloaded XML and process data items
    xml = Nokogiri::XML(response.body)
    items = xml.xpath("//noneed/item")
    logger.debug "-- found #{items.count} records in the file"

    items.each do |item|
      # Safe navigation: id is nil (instead of raising) when the item has no <id> element
      id = item.at("id")&.content
      unless id.nil?
        logger.info "Find images for id:#{id}"
        images = HTTParty.post(image_search_base_url + id, cookies: cookie_jar)
        download_size += images.body.bytesize
        requests += 1
        logger.debug "Download completed with code #{images.code} with a total size of #{images.body.bytesize} bytes."

        image_json = JSON.parse(images.body)
        image_json['ids'].each do |img_id|
          logger.debug "-- downloading image #{img_id}"
          image = HTTParty.post(image_base_url + img_id.to_s, cookies: cookie_jar)
          download_size += image.body.bytesize
          requests += 1
          logger.debug "---- download status: #{image.code} with a total size of #{(image.body.bytesize / 1024.0 / 1024.0).round(2)}MB."
        end
      end
    end
  end

  logger.info "Total download size: #{(download_size / 1024.0 / 1024.0).round(2)}MB in #{(Time.now - time_start).round(2)} seconds through #{requests} requests."
end

report.pretty_print
logger.info "Completed!"
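
The script above keeps every response body in memory as a complete string and only uses it for `bytesize` and parsing. One way to look at the effect of that is to compare it with consuming a body fragment by fragment. The following is a minimal sketch of that idea, not the gist author's method: `drain_fragments` is a hypothetical helper, and the `fragments` array is a stand-in for a streamed response so the example runs without network access.

```ruby
require 'stringio'

# Hypothetical helper (not part of the gist): writes each fragment to `io`
# as it arrives and returns the total number of bytes seen, so the caller
# never has to build the full body as a single string.
def drain_fragments(fragments, io)
  fragments.sum do |fragment|
    io.write(fragment)
    fragment.bytesize
  end
end

# Stand-in for a streamed response: three fragments instead of a network call.
fragments = ['<noneed>', '<item><id>1</id></item>', '</noneed>']
sink = StringIO.new
total = drain_fragments(fragments, sink)
puts "streamed #{total} bytes"
```

With HTTParty, fragments like these can be obtained by passing `stream_body: true` together with a block, for example `HTTParty.get(url, stream_body: true) { |fragment| file.write(fragment) }`. Whether this reduces the memory usage reported in the linked question is an assumption that would need to be checked with `MemoryProfiler`, and it only fits the image downloads, since the XML and JSON responses are parsed from the full body.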

mtrolle commented Nov 14, 2018

Identical to https://gist.github.com/mtrolle/96f55822122ecabd3cc46190a6dc18a5, but this version uses HTTParty instead of RestClient, since it was suggested on https://stackoverflow.com/questions/53169751/ruby-memory-usage-gone-wild that HTTParty is the more "commonly used" gem.
