availability: September 2013
$ gem install rails
Successfully installed rake-0.8.4
Successfully installed activesupport-2.3.2
Successfully installed activerecord-2.3.2
Successfully installed actionpack-2.3.2
Successfully installed actionmailer-2.3.2
Successfully installed activeresource-2.3.2
Successfully installed rails-2.3.2
The first step is to create a controller with two actions: 'index' shows the form 'uploadfile.html.erb', and 'upload' handles the upload.
$ gem install sqlite3-ruby
$ rails upload-test
$ cd upload-test
$ script/generate controller Upload
exists app/controllers/
exists app/helpers/
create app/views/upload
exists test/functional/
create test/unit/helpers/
create app/controllers/upload_controller.rb
create test/functional/upload_controller_test.rb
create app/helpers/upload_helper.rb
create test/unit/helpers/upload_helper_test.rb
#app/controllers/upload_controller.rb
class UploadController < ApplicationController
  def index
    render :file => 'app/views/upload/uploadfile.html.erb'
  end

  def upload
    post = Datafile.save(params[:uploadform])
    render :text => "File has been uploaded successfully"
  end
end
The second step is to create the view that shows the file upload form in the browser. Note the :multipart parameter, which is needed to POST a file.
#app/views/upload/uploadfile.html.erb
<% form_for :uploadform, :url => { :action => 'upload'}, :html => {:multipart => true} do |f| %>
<%= f.file_field :datafile %><br />
<%= f.submit 'Create' %>
<% end %>
The last step is to create the model, which saves the uploaded file to public/data. Note that we use original_filename to name the saved file.
#app/models/datafile.rb
class Datafile < ActiveRecord::Base
def self.save(upload)
name = upload['datafile'].original_filename
directory = "public/data"
# create the file path
path = File.join(directory, name)
# write the file
File.open(path, "wb") { |f| f.write(upload['datafile'].read) }
end
end
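The model above calls read without a length, which pulls the whole upload into one Ruby string before writing it. A minimal sketch of a chunked alternative follows. The names save_in_chunks and CHUNK_SIZE are hypothetical, a StringIO stands in for the uploaded file, and it is an assumption that the uploaded object in your Rails version accepts read(length) the way a plain IO does.

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical chunk size; tune it for your setup.
CHUNK_SIZE = 64 * 1024

# Copy `io` to `path` one chunk at a time, so only a single chunk is
# held in memory. `io` must respond to read(length) and return nil at EOF.
# Returns the number of bytes written.
def save_in_chunks(io, path, chunk_size = CHUNK_SIZE)
  bytes = 0
  File.open(path, "wb") do |f|
    while (chunk = io.read(chunk_size))
      f.write(chunk)
      bytes += chunk.bytesize
    end
  end
  bytes
end

# Usage: a StringIO stands in for the uploaded file object.
Dir.mktmpdir do |dir|
  upload = StringIO.new("x" * 200_000)
  puts save_in_chunks(upload, File.join(dir, "large.bin"))  # prints 200000
end
```

This only addresses the final write in the model; it does not change how the web server or the multipart parser buffers the request before the controller is reached.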
$ mkdir public/data
$ ./script/server webrick
=> Booting WEBrick
=> Rails 2.3.2 application starting on http://0.0.0.0:3000
=> Call with -d to detach
=> Ctrl-C to shutdown server
[2009-04-10 13:18:27] INFO WEBrick 1.3.1
[2009-04-10 13:18:27] INFO ruby 1.8.6 (2008-03-03) [universal-darwin9.0]
[2009-04-10 13:18:27] INFO WEBrick::HTTPServer#start: pid=5057 port=3000
Point your browser to http://localhost:3000/upload and you can upload a file. If all goes well, there should be a file in public/data with the same name as the file you uploaded.
curl -Fuploadform['datafile']=@large.zip http://localhost:3000/upload/upload
If you use this command, Rails throws the following error: "ActionController::InvalidAuthenticityToken (ActionController::InvalidAuthenticityToken):"
#app/controllers/upload_controller.rb
class UploadController < ApplicationController
  skip_before_filter :verify_authenticity_token
  # ... index and upload actions as before ...
end
Webrick and Large File Uploads
7895 ruby 16.0% 0:26.61 2 33 144 559M 188K 561M 594M
====> Memory GROWS: The ruby process keeps growing during the upload. My guess is that this happens because WEBrick first loads the whole request body into a string.
#gems/rails-2.3.2/lib/webrick_server.rb
def handle_dispatch(req, res, origin = nil) #:nodoc:
data = StringIO.new
Dispatcher.dispatch(
CGI.new("query", create_env_table(req, origin), StringIO.new(req.body || "")),
ActionController::CgiRequest::DEFAULT_SESSION_OPTIONS,
data
)
=====> Files get written to disk multiple times for the multipart parsing: When the file is uploaded, a message appears in the WEBrick log. It refers to a file in /var/folder/EI/....
Processing UploadController#upload (for ::1 at 2009-04-09 13:51:23) [POST]
Parameters: {"commit"=>"Create", "authenticity_token"=>"rf4V5bmHpxG74q6ueI3hUjJzwhTLUJCp9VO1uMV1Rd4=", "uploadform"=>{"datafile"=>#<File:/var/folders/EI/EIPLmNwOEea96YJDLHTrhU+++TI/-Tmp-/RackMultipart.7895.1>}}
[2009-04-09 14:09:03] INFO WEBrick::HTTPServer#start: pid=7974 port=3000
It turns out that the part that handles the multipart parsing writes the files to disk in $TMPDIR. It creates files like
$ ls $TMPDIR/
RackMultipart.7974.0 RackMultipart.7974.1
Strange, two files? We only uploaded one. I figure this is handled by the rack/utils.rb bundled in action_controller. Possibly related is the bug described at https://rails.lighthouseapp.com/projects/8994/tickets/1904-rack-middleware-parse-request-parameters-twice
#gems/actionpack-2.3.2/lib/action_controller/vendor/rack-1.0/rack/utils.rb
# Stolen from Mongrel, with some small modifications:
def self.parse_multipart(env)
write multi
Optimizing the last write to disk
# write the file
File.open(path, "wb") { |f| f.write(upload['datafile'].read) }
Instead of writing the file to disk ourselves, we can use the following:
FileUtils.mv upload['datafile'].path, path
This makes use of the fact that the file is already on disk. A file move is much faster than rewriting the file, at least when the source and the destination are on the same filesystem.
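The move can be sketched on its own, outside Rails. In this sketch move_upload is a hypothetical helper, a plain file stands in for the RackMultipart temp file, and the File.basename call is an extra precaution that is not part of the original code.

```ruby
require 'fileutils'
require 'tmpdir'

# Move an already-written temp file into place instead of rewriting it.
# File.basename drops any directory part a client may have put in the
# file name (extra precaution, not in the original code).
def move_upload(temp_path, directory, original_name)
  FileUtils.mkdir_p(directory)
  target = File.join(directory, File.basename(original_name))
  FileUtils.mv(temp_path, target)
  target
end

# Usage: a plain file stands in for the multipart temp file.
Dir.mktmpdir do |dir|
  temp_path = File.join(dir, "RackMultipart.example")
  File.binwrite(temp_path, "payload")
  target = move_upload(temp_path, File.join(dir, "data"), "../large.zip")
  puts File.basename(target)  # prints large.zip
end
```

When the temp directory and the target directory are on different filesystems, FileUtils.mv falls back to copying the file and removing the original, so the speed gain disappears in that case.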
$ gem install mongrel
Successfully installed gem_plugin-0.2.3
Successfully installed daemons-1.0.10
Successfully installed fastthread-1.0.7
Successfully installed cgi_multipart_eof_fix-2.5.0
Successfully installed mongrel-1.1.5
$ mongrel_rails start
OK, let's start the upload again using our curl command:
lib/mongrel/const.rb
# This is the maximum header that is allowed before a client is booted. The parser detects
# this, but we'd also like to do this as well.
MAX_HEADER=1024 * (80 + 32)
In our tests, we saw that aside from the RackMultipart.<pid>.x files, there is an additional file written in $TMPDIR: mongrel.<pid>.0
# Maximum request body size before it is moved out of memory and into a tempfile for reading.
MAX_BODY=MAX_HEADER
lib/mongrel/http_request.rb
# must read more data to complete body
if remain > Const::MAX_BODY
  # huge body, put it in a tempfile
  @body = Tempfile.new(Const::MONGREL_TMP_BASE)
  @body.binmode
else
  # small body, just use that
  @body = StringIO.new
end
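The decision in the Mongrel excerpt can be tried in isolation. This is a sketch only: body_buffer_for is a hypothetical helper, the "mongrel" tempfile prefix is an assumption based on the mongrel.<pid>.0 file names seen in $TMPDIR, and the constants are copied from the excerpt rather than loaded from the gem.

```ruby
require 'stringio'
require 'tempfile'

# Values as quoted from lib/mongrel/const.rb above.
MAX_HEADER = 1024 * (80 + 32)
MAX_BODY   = MAX_HEADER

# Hypothetical helper: keep small bodies in memory, put large ones in a
# tempfile, following the same rule as the excerpt above.
def body_buffer_for(remaining_bytes, max_body = MAX_BODY)
  if remaining_bytes > max_body
    body = Tempfile.new("mongrel") # prefix is an assumption
    body.binmode
    body
  else
    StringIO.new
  end
end

small = body_buffer_for(10 * 1024)
large = body_buffer_for(300 * 1024 * 1024)
puts small.class  # prints StringIO
puts large.class  # prints Tempfile
large.close!      # remove the tempfile again
```

With these values the threshold is 112 KB, so any realistic file upload ends up in a tempfile, which matches the extra file observed in $TMPDIR.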
$ gem install merb
Successfully installed dm-aggregates-0.9.11
Successfully installed dm-validations-0.9.11
Successfully installed randexp-0.1.4
Successfully installed dm-sweatshop-0.9.11
Successfully installed dm-serializer-0.9.11
Successfully installed merb-1.0.11
Let's create the merb application:
$ merb-gen app uploader-app
$ cd uploader-app
We need to create the controller, but this is a bit different from our original controller:
#app/controllers/upload.rb
class Upload < Application
  def index
    render :file => 'app/views/upload/uploadfile.rhtml'
  end

  def upload
    post = Datafile.save(params[:uploadform])
    render :text => "File has been uploaded successfully"
  end
end
The model looks like this:
#app/models/datafile.rb
class Datafile
include DataMapper::Resource
def self.save(upload)
name = upload['datafile']['filename']
directory = "public/data"
# create the file path
path = File.join(directory, name)
# write the file
File.open(path, "wb") { |f| f.write(upload['datafile']['tempfile'].read) }
end
end
We create the public/data directory:
$ mkdir public/data
And start merb:
$ merb
~ Connecting to database...
~ Loaded slice 'MerbAuthSlicePassword' ...
~ Parent pid: 57318
~ Compiling routes...
~ Activating slice 'MerbAuthSlicePassword' ...
merb : worker (port 4000) ~ Starting Mongrel at port 4000
When you start the upload, a merb worker becomes active.
merb : worker (port 4000) ~ Successfully bound to port 4000
=====> 3 file writes: 1 mongrel + 1 merb + 1 final write
merb : worker (port 4000) ~ Params: {"format"=>nil, "action"=>"upload", "id"=>nil, "controller"=>"upload", "uploadform"=>{"datafile"=>{"content_type"=>"application/octet-stream",
"size"=>306609434, "tempfile"=>#<File:/var/folders/EI/EIPLmNwOEea96YJDLHTrhU+++TI/-Tmp-/Merb.13243.0>, "filename"=>"large.zip"}}}
merb : worker (port 4000) ~
After that Merb handles the multipart stream and writes once in $TMPDIR/Merb.<pid>.0
$ gem install sinatra
Successfully installed sinatra-0.9.1.1
1 gem installed
Installing ri documentation for sinatra-0.9.1.1...
Installing RDoc documentation for sinatra-0.9.1.1...
Create a sample upload handler:
#sinatra-test-upload.rb
require 'rubygems'
require 'sinatra'
post '/upload' do
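# Rack's multipart parser hands over a hash per uploaded file; the file itself is under :tempfile
File.open("/tmp/theuploadedfile","wb") { |f| f.write(params[:datafile]['file'][:tempfile].read) }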
end
$ ruby sinatra-test-upload.rb
== Sinatra/0.9.1.1 has taken the stage on 4567 for development with backup from Mongrel
====> No memory increase: good!
So instead of port 3000, it listens on port 4567.
require 'rubygems'
require 'mongrel'

class HelloWorldHandler < Mongrel::HttpHandler
  def process(request, response)
    puts request.body.path
    response.start(200) do |head,out|
      head['Content-Type'] = "text/plain"
      out << "Hello world!"
    end
  end

  def request_progress(params, clen, total)
  end
end

Mongrel::Configurator.new do
  listener :port => 3000 do
    uri "/", :handler => HelloWorldHandler.new
  end
  run; join
end
=====> No memory increase: good!
request.body.path = /var/folders/EI/EIPLmNwOEea96YJDLHTrhU+++TI/-Tmp-/mongrel.93690.0
# Allow the metal piece to run in isolation
require(File.dirname(__FILE__) + "/../../config/environment") unless defined?(Rails)

class Uploader
  def self.call(env)
    if env["PATH_INFO"] =~ /^\/uploader/
      puts env["rack.input"].path
      [200, {"Content-Type" => "text/html"}, ["It worked"]]
    else
      [400, {"Content-Type" => "text/html"}, ["Error"]]
    end
  end
end
env["rack.input"].path is actually /var/folders/EI/EIPLmNwOEea96YJDLHTrhU+++TI/-Tmp-/mongrel.81685.0
If we want to parse this, we can pass the env to Rack::Request.new, but this kicks in the RackMultipart handling again.
request = Rack::Request.new(env)
puts request.POST
#uploaded_file = request.POST["file"][:tempfile].read
=====> No memory increase: good!
curl -v -F datafile['file']=@large.zip http://localhost:80/
* About to connect() to localhost port 80
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 80
> POST /datafiles HTTP/1.1
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: localhost
> Accept: */*
> Content-Length: 421331151
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=----------------------------1bf75aea2f35
>
< HTTP/1.1 100 Continue
Setting up mod_rails is beyond the scope of this document, so we assume you have it working for your Rails app.
LoadModule passenger_module /opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-2.1.3/ext/apache2/mod_passenger.so
PassengerRoot /opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-2.1.3
PassengerRuby /opt/ruby-enterprise-1.8.6-20090201/bin/ruby
Mod_rails has a nice setting that lets you specify your temp directory per virtual host:
5.10. PassengerTempDir <directory>
Specifies the directory that Phusion Passenger should use for storing temporary files. This includes things such as Unix socket files, buffered file uploads, etc.
This option may be specified once, in the global server configuration. The default temp directory that Phusion Passenger uses is /tmp.
This option is especially useful if Apache is not allowed to write to /tmp (which is the case on some systems with strict SELinux policies) or if the partition that /tmp lives on doesn’t have enough disk space.
OK, let's start the upload and see what happens:
# ./passenger-memory-stats
-------------- Apache processes ---------------
PID    PPID   Threads  VMSize    Private  Name
-----------------------------------------------
30840  1      1        184.3 MB  0.0 MB   /usr/sbin/httpd
30852  30840  1        186.2 MB  ?        /usr/sbin/httpd
30853  30840  1        184.3 MB  ?        /usr/sbin/httpd
30854  30840  1        184.3 MB  ?        /usr/sbin/httpd
30855  30840  1        184.3 MB  ?        /usr/sbin/httpd
30856  30840  1        184.3 MB  ?        /usr/sbin/httpd
30857  30840  1        184.3 MB  ?        /usr/sbin/httpd
30858  30840  1        184.3 MB  ?        /usr/sbin/httpd
30859  30840  1        184.3 MB  ?        /usr/sbin/httpd
### Processes: 9
### Total private dirty RSS: 0.03 MB (?)

---------- Passenger processes -----------
PID    Threads  VMSize     Private   Name
------------------------------------------
30847  4        14.1 MB    0.1 MB    /opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-2.1.3/ext/apache2/ApplicationPoolServerExecutable 0 /opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-2.1.3/bin/passenger-spawn-server /opt/ruby-enterprise-1.8.6-20090201/bin/ruby /tmp/passenger.30840/info/status.fifo
30848  1        87.7 MB    ?         Passenger spawn server
30888  1        123.6 MB   0.0 MB    Passenger ApplicationSpawner: /home/myrailsapp
30892  1        1777.4 MB  847.5 MB  Rails: /home/myrailsapp
### Processes: 4
### Total private dirty RSS: 847.62 MB (?)
Very strange: the Rails process has grown to 847.5 MB of private memory. In /opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-2.1.3/ext/apache2/Hooks.cpp of the Passenger source we find:
expectingUploadData = ap_should_client_block(r);
if (expectingUploadData && atol(lookupHeader(r, "Content-Length"))
> UPLOAD_ACCELERATION_THRESHOLD) {
uploadData = receiveRequestBody(r);
}
The expectingUploadData part is the one that answers the request header
> Expect: 100-continue
But it seems curl isn't handling this exchange: it keeps on streaming the file, ignoring the response.
$ curl -v -0 -F datafile['file']=@large.zip http://localhost:80
* About to connect() to localhost port 80
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 80
> POST /uploader/ HTTP/1.0
> User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> Host: localhost
> Accept: */*
> Content-Length: 421331151
> Content-Type: multipart/form-data; boundary=----------------------------1b04b7cb6566
Now the correct mechanism happens: with -0, curl speaks HTTP/1.0 and no longer sends the Expect: 100-continue header.
/tmp/passenger.1291/backends/backend.g0mi40ARBFbEdb08pxB3uzyh3JJyfR1eaI9xPuQwyLEd3NjQ24rbpSBb9FrZfNX5WI5VYQ
====> Memory doesn't go up: good! (again)
Using a Raw HTTP Server, Plain sockets to implement webserver, http://lxscmn.com/tblog/?p=25