@MarkMurphy
Last active October 1, 2024 20:27
Rails resumable uploads

Uploads large files in multiple chunks and can resume an upload that was interrupted.

Typical usage:

  1. Send a POST request to /upload with the first chunk of the file and receive an upload id in return.
  2. Repeatedly PATCH subsequent chunks using the upload id to identify the upload in progress and an offset representing the number of bytes transferred so far.
  3. After each chunk has been uploaded, the server returns a new offset representing the total amount transferred.
  4. After the last chunk, commit the upload by passing its id to another endpoint, such as POST /upload/commit/:id:
POST /upload/commit/20140806036f1378006746bcb40a0e3981257552 HTTP/1.1
Host: api.example.com
Accept: application/vnd.example.v1+json
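
The steps above can be sketched from the client side roughly as follows. This is a minimal sketch using only Ruby's standard library. The helper names `each_chunk` and `build_chunk_request` are hypothetical, and the requests are only built, not sent. Paths and header names follow the examples in this document.

```ruby
require "net/http"
require "stringio"

# Yields [offset, data] for each chunk of the IO, in order.
# chunk_size is in bytes; the last chunk may be shorter.
def each_chunk(io, chunk_size)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?

  offset = 0
  while (data = io.read(chunk_size))
    yield offset, data
    offset += data.bytesize
  end
end

# Builds (but does not send) the request for one chunk.
# Without an upload id the chunk is the first one and goes to POST /upload;
# later chunks go to PATCH /upload/:id.
def build_chunk_request(upload_id, offset, data, file_name:, file_size:)
  request =
    if upload_id.nil?
      Net::HTTP::Post.new("/upload").tap do |req|
        req["File-Name"] = file_name
        req["File-Size"] = file_size.to_s
      end
    else
      Net::HTTP::Patch.new("/upload/#{upload_id}")
    end

  request["Content-Type"] = "application/offset+octet-stream"
  request["Offset"] = offset.to_s
  request.body = data
  request
end
```

A caller would iterate with `each_chunk`, send each request, and take the upload id from the first response before building the next request.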

Chunks can be any size up to 150 MB. A typical chunk is 4 MB. Using large chunks will mean fewer calls to /upload and faster overall throughput. However, whenever a transfer is interrupted, you will have to resume at the beginning of the last chunk, so it is often safer to use smaller chunks.
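
The trade-off can be made concrete with a little arithmetic. The sketch below assumes "MB" means 2^20 bytes (an assumption, since the text does not say), and the helper names are hypothetical.

```ruby
MEBIBYTE = 1024 * 1024

# Number of requests needed to upload file_size bytes in chunks of chunk_size bytes.
def chunk_count(file_size, chunk_size)
  (file_size + chunk_size - 1) / chunk_size
end

# Upper bound on the bytes that must be re-sent after an interruption:
# the interrupted chunk restarts from its beginning.
def max_resent_bytes(file_size, chunk_size)
  [file_size, chunk_size].min
end
```

Under these assumptions, a 1024 MB file takes 256 requests at 4 MB per chunk and 7 requests at 150 MB per chunk, but an interruption can cost up to 150 MB of repeated transfer in the second case versus 4 MB in the first.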

If the offset you submit does not match the offset the server expects, the server rejects the request and responds with a 400 error that includes the current offset. To resume the upload, seek to that offset (in bytes) within the file and continue uploading from that point.
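
A client could handle the 400 case along these lines. This is a sketch only: it assumes the error body is JSON with an `offset` key, as in the response examples in this document, and the helper names are hypothetical.

```ruby
require "json"
require "stringio"

# Returns the offset reported by the server when the request was rejected
# with 400, or nil for any other status.
def resume_position(status, body)
  return nil unless status == 400

  Integer(JSON.parse(body).fetch("offset"))
end

# Seeks to the given byte offset and reads the next chunk from there.
def read_chunk_at(io, offset, chunk_size)
  io.seek(offset)
  io.read(chunk_size)
end
```

The chunk returned by `read_chunk_at` would then be sent with the `Offset` header set to the value the server reported.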

A chunked upload can take a maximum of 48 hours before expiring.
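
As a rough illustration of the expiry rule, the sketch below derives an `expires` timestamp in the same ISO 8601 form used by the responses in this document. Which reference time the 48 hours are counted from is an assumption here, and the constant and method names are hypothetical.

```ruby
require "time"

UPLOAD_TTL_SECONDS = 48 * 60 * 60

# Expiry timestamp as an ISO 8601 UTC string, 48 hours after reference_time.
def expires_at(reference_time)
  (reference_time.getutc + UPLOAD_TTL_SECONDS).iso8601
end

# True once `now` has reached the expiry timestamp.
def expired?(expires, now = Time.now)
  now >= Time.iso8601(expires)
end
```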


Create a resumable multipart upload

POST /upload

Request

POST /upload HTTP/1.1
Host: api.example.com
Content-Type: application/offset+octet-stream
Content-Length: 221993
File-Name: video.mp4
File-Size: 443987
Offset: 0

...[binary data]...

Parameters

| Name | Type | Description |
| --- | --- | --- |
| File-Name | string | The name of the file that you're uploading. |
| File-Size | number | The full/final size of the file in bytes. |
| Content-Length | number | The length in bytes of the chunk you are sending. |
| Checksum-Type | string | (Optional) Passing Checksum-Type signals to the server that you'd like it to return a checksum generated via MD5, SHA1 or SHA2. The default is MD5. |

Response

HTTP/1.1 201 Created
Checksum: d41d8cd98f00b204e9800998ecf8427e
Checksum-Type: MD5

{
  "id": "20140806036f1378006746bcb40a0e3981257552",
  "offset": 221993,
  "expires": "2014-08-02T22:43:49Z"
}
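
A client can compare the returned `Checksum` header with a digest it computes locally. The sketch below assumes the checksum covers the body of the chunk that was just sent, which is how the controller further down computes it. The names `DIGESTS`, `chunk_checksum` and `checksum_matches?` are hypothetical. The sample value `d41d8cd98f00b204e9800998ecf8427e` shown in the response is the MD5 digest of empty input.

```ruby
require "digest/md5"
require "digest/sha1"
require "digest/sha2"

# Checksum-Type values from the parameter table mapped to digest classes.
DIGESTS = {
  "MD5" => Digest::MD5,
  "SHA1" => Digest::SHA1,
  "SHA2" => Digest::SHA2
}.freeze

# Hex digest of the chunk; unknown types fall back to MD5, the documented default.
def chunk_checksum(data, type = "MD5")
  DIGESTS.fetch(type.to_s.upcase, Digest::MD5).hexdigest(data)
end

# Compares a locally computed digest with the Checksum header value.
def checksum_matches?(data, header_value, type = "MD5")
  chunk_checksum(data, type) == header_value.to_s.downcase
end
```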

Resume a multipart upload

PATCH /upload/:id

Request

PATCH /upload/20140806036f1378006746bcb40a0e3981257552 HTTP/1.1
Host: api.example.com
Content-Type: application/offset+octet-stream
Content-Length: 221993
Offset: 221993

...[binary data]...

Parameters

| Name | Type | Description |
| --- | --- | --- |
| Offset | number | The byte offset of this chunk, relative to the beginning of the full file. The server will verify that this matches the offset it expects. If it does not, the server will return an error with the expected offset. |
| Checksum-Type | string | (Optional) Passing Checksum-Type signals to the server that you'd like it to return a checksum generated via MD5, SHA1 or SHA2. The default is MD5. |

Response

HTTP/1.1 200 OK
Checksum: d41d8cd98f00b204e9800998ecf8427e
Checksum-Type: MD5

{
  "id": "20140806036f1378006746bcb40a0e3981257552",
  "offset": 443986,
  "expires": "2014-08-02T22:43:49Z"
}

Errors

404 - The upload id does not exist or has expired.

400 - The offset parameter does not match up with what the server expects. The body of the error response will be JSON similar to the above, indicating the correct offset to upload.
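
On the server side, the offset check behind the 400 response can be reduced to comparing the submitted offset with the number of bytes already stored. The following is a standalone sketch in plain Ruby, not the gist's own implementation. The top-level `UnexpectedOffset` class and `append_chunk` helper are hypothetical stand-ins for `Upload::UnexpectedOffset` and `Upload::File#apply`.

```ruby
require "tempfile"

# Raised when the submitted offset differs from the bytes already stored.
class UnexpectedOffset < StandardError
  attr_reader :expected

  def initialize(expected)
    @expected = expected
    super("expected offset #{expected}")
  end
end

# Appends data at offset only if offset equals the current file size.
# Returns the new offset (total bytes stored so far).
def append_chunk(path, data, offset)
  File.open(path, "r+b") do |file|
    raise UnexpectedOffset.new(file.size) if file.size != offset

    file.seek(offset)
    file.write(data)
    file.flush
    file.size
  end
end
```

With this check, a chunk that arrives out of order or is sent twice fails the size comparison and is rejected, and the error carries the offset the client should resume from.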


Get info about a multipart upload

GET /upload/:id

Request

GET /upload/20140806036f1378006746bcb40a0e3981257552 HTTP/1.1
Host: api.example.com

Response

HTTP/1.1 200 OK

{
  "id": "20140806036f1378006746bcb40a0e3981257552",
  "offset": 221993,
  "expires": "2014-08-02T22:43:49Z"
}

module Upload
  module Error; end

  class FileNotFound < StandardError; end
  class UnexpectedOffset < StandardError; end
end
class Upload::File
  include Upload::Error
  include ActiveModel::Model
  include ActiveModel::Serializers::JSON

  DATA_DIR = "#{Rails.root}/data"
  DATE_EXP_TIME = 2.days

  attr_accessor :id, :offset, :expires, :file_name, :file_size, :data, :metadata, :created_at, :updated_at

  validates :file_name, presence: true

  # Loads an upload by id from its JSON info file.
  # Raises Upload::FileNotFound when no data file exists for the id.
  def self.find(id)
    file_path = self.file_path(id)
    info_path = self.info_path(id)

    # File.exist? replaces File.exists?, which was removed in Ruby 3.2.
    # raise ActiveRecord::RecordNotFound unless File.exist?(file_path)
    raise Upload::FileNotFound unless File.exist?(file_path)

    begin
      json = File.open(info_path, "r") { |file| file.read }
      info = JSON.parse(json, object_class: Upload::FileInfo).merge!(id: id)
      new(info)
    rescue SystemCallError => e
      raise e # (PermissionError, e.message) if e.class.name.start_with?("Errno::")
    end
  end

  # Creates an upload, writes the first chunk to disk and saves the info file.
  # Accepts an array of attribute hashes to create several uploads.
  def self.create(attributes = nil, &block)
    if attributes.is_a?(Array)
      attributes.collect { |attr| create(attr, &block) }
    else
      object = new(attributes || {})
      # Return the invalid object (not false) so callers can inspect object.errors.
      return object unless object.valid?

      # String#delete always returns a string; delete! returns nil when nothing changes.
      object.id = SecureRandom.uuid.delete("-")

      begin
        File.open(file_path(object.id), "wb") do |file|
          file.sync = true
          file.seek(object.offset) unless object.offset.nil?
          file.write(object.data) unless object.data.nil?
          object.data = nil
          object.offset = file.size
          object.expires = file.ctime + DATE_EXP_TIME
          object.created_at = file.ctime
          object.updated_at = file.mtime
        end
        yield(object) if block_given?
        object.save
      rescue SystemCallError => e
        raise e # (PermissionError, e.message) if e.class.name.start_with?("Errno::")
      end

      object
    end
  end

  def self.file_path(id)
    File.join(DATA_DIR, "#{id}.bin")
  end

  def self.info_path(id)
    File.join(DATA_DIR, "#{id}.json")
  end

  def initialize(attributes = {})
    attributes.each do |name, value|
      send("#{name}=", value)
    end
  end

  def created_at=(value)
    @created_at = value.to_time.utc.iso8601
  end

  def updated_at=(value)
    @updated_at = value.to_time.utc.iso8601
  end

  def expires=(value)
    @expires = value.to_time.utc.iso8601
  end

  def persisted?
    id.present?
  end

  def attributes
    { id: id, offset: offset, expires: expires }
  end

  def file_path
    self.class.file_path(id)
  end

  def info_path
    self.class.info_path(id)
  end

  # Appends a chunk at the given offset and refreshes the stored info.
  # Raises Upload::UnexpectedOffset unless offset equals the bytes already
  # stored; a nil offset never matches, so callers must always pass one.
  def apply(data, offset = nil)
    begin
      File.open(file_path, "r+b") do |file|
        # Verify offset matches offset expected
        raise Upload::UnexpectedOffset if file.size != offset

        file.sync = true
        file.seek(offset) unless offset.nil?
        file.write(data) unless data.nil?
        # Unset data since it has been flushed
        self.data = nil
        self.offset = file.size
        self.expires = file.ctime + DATE_EXP_TIME
        self.created_at = file.ctime
        self.updated_at = file.mtime
      end
      save
    rescue SystemCallError => e
      raise e # (PermissionError, e.message) if e.class.name.start_with?("Errno::")
    end
  end
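
  # Opens the stored data for reading and yields it with an
  # original_filename accessor, as attachment libraries commonly expect.
  def open(&block)
    begin
      File.open(file_path, "rb") do |file|
        # Define the accessor on this file object only, not on the File class.
        file.singleton_class.class_eval do
          attr_accessor :original_filename
        end
        file.original_filename = file_name
        block.call(file) unless block.nil?
      end
    rescue SystemCallError => e
      raise e # (PermissionError, e.message) if e.class.name.start_with?("Errno::")
    end
  end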

  def delete!
    unless id.blank?
      Dir.glob("#{DATA_DIR}/#{id}*").each do |path|
        File.delete(path)
      end
    end
  end

  def save
    return false unless valid?

    info = Upload::FileInfo.new({
      offset: offset,
      expires: expires,
      file_name: file_name,
      file_size: file_size,
      created_at: created_at,
      updated_at: updated_at
    })

    begin
      File.open(info_path, "w") do |file|
        file.write(info.to_json)
      end
    rescue SystemCallError => e
      raise e # (PermissionError, e.message) if e.class.name.start_with?("Errno::")
    end

    true
  end

  def self.assign_to_attachment(id, instance, attachment_name)
    if upload = Upload::File.find(id)
      upload.open do |file|
        instance.send("#{attachment_name}=", file)
      end
    end
  end

  def self.generate_checksum(file, type = "MD5")
    "Digest::#{type}".constantize.file(file).hexdigest
  end
end
class Upload::FileInfo < Hash
  def initialize(args = {})
    self["offset"] = args[:offset] || 0
    self["expires"] = args[:expires] || nil
    self["file_name"] = args[:file_name] || nil
    self["file_size"] = args[:file_size] || nil
    self["created_at"] = args[:created_at] || nil
    # Use :updated_at here; the original read :created_at for this key.
    self["updated_at"] = args[:updated_at] || nil
    self["metadata"] = args[:metadata] || nil
  end

  def offset=(value)
    self["offset"] = value.to_i
  end

  def offset
    self["offset"]
  end

  def file_name=(value)
    self["file_name"] = value
  end

  def file_name
    self["file_name"]
  end

  def file_size=(value)
    self["file_size"] = value.to_i
  end

  def file_size
    self["file_size"]
  end

  def created_at=(value)
    self["created_at"] = value.to_s
  end

  def created_at
    self["created_at"].to_time.utc.iso8601()
  end

  def updated_at=(value)
    self["updated_at"] = value.to_s
  end

  def updated_at
    self["updated_at"].to_time.utc.iso8601()
  end

  def expires=(value)
    self["expires"] = value.to_s
  end

  def expires
    self["expires"].to_time.utc.iso8601()
  end

  def remaining_length
    file_size - offset
  end
end
module API::V1
  # 411 Length Required
  # The request did not specify the length of its content, which is required by the requested resource.
  #
  # 413 :request_entity_too_large
  # The request is larger than the server is willing or able to process.
  #
  # 415 Unsupported Media Type
  # The request entity has a media type which the server or resource does not support. For example, the client uploads an image as image/svg+xml, but the server requires that images use a different format.
  class UploadController < APIController
    before_action :doorkeeper_authorize!

    HTTP_ETAG_HEADER = API::HTTP::Header.new("ETag")
    HTTP_CHECKSUM_HEADER = API::HTTP::Header.new("Checksum")
    HTTP_CHECKSUM_TYPE_HEADER = API::HTTP::Header.new("Checksum-Type")

    CHECKSUM_TYPES = %w(MD5 SHA1 SHA2)
    DEFAULT_CHECKSUM_TYPE = "MD5"

    # GET /multipart_upload/:id
    # GET /multipart_upload/:id.json
    def show
      begin
        @upload = Upload::File.find(params[:id])
        respond_with @upload
      rescue Upload::FileNotFound
        # The upload id does not exist or has expired.
        render json: {}, status: :not_found
      end
    end

    # POST /multipart_upload
    # POST /multipart_upload.json
    def create
      @upload = Upload::File.create(create_params)

      # Chained assignment: ETag and Checksum carry the same digest.
      headers[HTTP_ETAG_HEADER] =
        headers[HTTP_CHECKSUM_HEADER] = request_digest.checksum
      headers[HTTP_CHECKSUM_TYPE_HEADER] = request_digest.type

      if @upload.errors.empty?
        render json: @upload, status: :created, location: api_multipart_upload_url(@upload.id)
      else
        respond_with @upload
      end
    end

    # PATCH /multipart_upload/:id
    # PATCH /multipart_upload/:id.json
    def update
      begin
        @upload = Upload::File.find(params[:id])
        @upload.apply(data, offset)

        # Chained assignment: ETag and Checksum carry the same digest.
        headers[HTTP_ETAG_HEADER] =
          headers[HTTP_CHECKSUM_HEADER] = request_digest.checksum
        headers[HTTP_CHECKSUM_TYPE_HEADER] = request_digest.type

        if @upload.errors.empty?
          render json: @upload, status: :ok
        else
          respond_with @upload
        end
      rescue Upload::UnexpectedOffset
        # The offset parameter does not match up with what the server expects.
        render json: @upload, status: :bad_request
      rescue Upload::FileNotFound
        # The upload id does not exist or has expired.
        render json: {}, status: :not_found
      end
    end

    private

    def create_params
      params[:file_name] ||= request.headers["File-Name"]
      params[:file_size] ||= request.headers["File-Size"].to_i
      params[:checksum_type] ||= request.headers["Checksum-Type"]
      params.permit(:file_name, :file_size).merge!(data: data)
    end

    def update_params
      params[:offset] ||= request.headers["Offset"].to_i
      params[:checksum_type] ||= request.headers["Checksum-Type"]
      params.permit(:offset).merge!(data: data)
    end

    # Reads the request body once and memoizes it. A second read of the body
    # stream may return an empty string, which would make the checksum cover
    # no data.
    def data
      @data ||= request.body.read
    end

    def offset
      (params[:offset] || request.headers["Offset"]).to_i
    end

    def request_digest
      @request_digest ||= begin
        type = CHECKSUM_TYPES.detect { |key| key == checksum_type.to_s.upcase } || DEFAULT_CHECKSUM_TYPE
        checksum = "Digest::#{type}".constantize.hexdigest(data)
        OpenStruct.new(type: type, checksum: checksum)
      end
    end

    def checksum_type
      params[:checksum_type] ||= request.headers["Checksum-Type"]
    end
  end
end
@sidd-kulk
Nice! How does it handle if chunks arrive out of order?

@MarkMurphy
Author

MarkMurphy commented Jan 11, 2024 via email
