Google App Engine Blobstore API and AppScale Implementation
Google App Engine's Blobstore API is the primary method of storing large objects. This blog post talks about the API and how it is implemented in AppScale.
Google App Engine Blobstore Upload
There are two ways to upload blobs: the Files API, in which you supply a large binary object programmatically, and an HTML form. When uploading a file via a form, an upload URL must be created:
upload_url = blobstore.create_upload_url('/upload')
This URL becomes the action path in your HTML form. The upload URL actually sends the browser client to another App Engine application, which handles the upload directly from the user's browser. If you try to upload a file with a bad session, you'll see this application report an error (http://temporary-blobstore-error.appspot.com).
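As a rough sketch of how the upload URL ends up in the form, here is a plain Python 3 helper with no App Engine imports. The function name `render_upload_form` and the `file` field name are made up for illustration; the only requirements carried over from the real API are that the form uses POST and the `multipart/form-data` encoding.

```python
from html import escape


def render_upload_form(upload_url, file_field="file"):
    """Return an HTML form whose action is the given upload URL.

    Hypothetical helper: in a real app, upload_url would come from
    blobstore.create_upload_url('/upload').
    """
    template = (
        '<form action="{action}" method="POST" enctype="multipart/form-data">\n'
        '  <input type="file" name="{field}">\n'
        '  <input type="submit" value="Upload">\n'
        '</form>'
    )
    # Escape both values so a URL with query parameters stays valid HTML.
    return template.format(
        action=escape(upload_url, quote=True),
        field=escape(file_field, quote=True),
    )
```

Calling `render_upload_form(upload_url)` in the page handler gives you markup you can drop straight into the response.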
Behind the scenes, Google could be storing the blob in the Google File System (GFS) or as blocks in Megastore/Bigtable. The '/upload' path tells Google where to send the blob's information after it has been successfully uploaded. The upload handler will get a POST from the blobstore application with the file swapped out for a BlobInfo object. This object has information such as the file's name, creation date, content type, and size. The POST also contains the other elements from the form, which are simply forwarded on. A direct link for hosting images can be obtained from your blob:
image_url = images.get_serving_url(blob_key)
The image URL will be hosted on the same hosting platform as Picasa (ggpht.com), providing high availability.
Blob Download
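Downloading starts from a BlobKey (stored within a BlobInfo object). The key retrieves the blob's metadata:
blobstore.BlobInfo.get(blob_key)
To send the blob's contents to the client, pass the same key to send_blob() in a BlobstoreDownloadHandler. Or if you are serving up an image, just provide the image URL.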
AppScale Implementation
There are three components for the blobstore service in AppScale.
- Application server (Modified GAE SDK)
- Blobstore server (Tornado server)
- Datastore (AppScale supports a multitude of datastores)
The application server is single-threaded (although multiple instances/processes run on all machines), and we don't want an application server to get tied up handling uploads. Therefore we have a Tornado server to handle these uploads, and it does so across all applications.
Let's step through how blobs are uploaded within AppScale.
1. The user requests a web page which has an upload file form
2. The application will create a blobstore session
   1. Store the session info into the datastore (prevents unauthorized uploads)
   2. Create a unique path to the blobstore server running on port 6106 (blob in alpha-numeric)
3. The action path of the HTML form contains the path from step 2.2
4. When the user submits the form, it goes to the blobstore server
5. The blobstore server interacts with the datastore
   1. Verify the session
   2. Store a BlobInfo object
   3. Store the uploaded file in 1MB chunks
   4. Remove the session
6. A POST is done to the success path given in step 2
   1. Any uploaded files are replaced with their BlobInfo entity
   2. All other form elements are forwarded
7. The success path handler must do a redirect
8. The redirect is forwarded to the user client
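To make the workflow above more concrete, here is a minimal sketch in plain Python 3. It is not the AppScale blobstore server code: the `InMemoryDatastore` class, the function names, the `/_upload/` path layout, and the dictionary shapes are all assumptions made for illustration. Only the sequence (verify the session, store the blob info, store 1MB chunks, remove the session, forward the rewritten form fields) comes from the steps described.

```python
import uuid
from datetime import datetime, timezone

CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as described in the workflow


class InMemoryDatastore(object):
    """Hypothetical stand-in for the datastore: three plain dicts."""

    def __init__(self):
        self.sessions = {}    # session id -> success path
        self.blob_infos = {}  # blob key -> metadata dict
        self.chunks = {}      # (blob key, chunk index) -> bytes


def create_upload_session(db, success_path):
    """Record a session and return a unique upload path for the form."""
    session_id = uuid.uuid4().hex  # alphanumeric identifier
    db.sessions[session_id] = success_path
    return "/_upload/" + session_id  # assumed path layout


def handle_upload(db, upload_path, field_name, filename, data, other_fields):
    """Verify the session, store the blob, and build the forwarded POST."""
    session_id = upload_path.rsplit("/", 1)[-1]
    if session_id not in db.sessions:
        raise KeyError("unknown or already used upload session")

    # Store the metadata record for the uploaded file.
    blob_key = uuid.uuid4().hex
    info = {
        "key": blob_key,
        "filename": filename,
        "size": len(data),
        "creation": datetime.now(timezone.utc),
    }
    db.blob_infos[blob_key] = info

    # Store the file contents in fixed-size chunks.
    for index, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        db.chunks[(blob_key, index)] = data[start:start + CHUNK_SIZE]

    # Remove the session so the same upload path cannot be reused.
    success_path = db.sessions.pop(session_id)

    # The file field is replaced by its metadata; other fields pass through.
    forwarded = dict(other_fields)
    forwarded[field_name] = info
    return success_path, forwarded
```

Because the session is removed once the upload succeeds, submitting to the same path a second time raises an error, which is the point of storing the session in the first place.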
Application Example
Blobstore Example source code: http://tinyurl.com/3n8fjuj
Additional Resources
Official Blobstore Documentation: http://code.google.com/appengine/docs/python/blobstore/
AppScale Blobstore Server: http://tinyurl.com/3tue8dk
Previous blog on blobstore: http://nlake44.posterous.com/blobstore-in-appscale
-- Raj
