BlobStore in AppScale

This brief article describes the blobstore implementation in AppScale. Three pieces of AppScale's AppServer differ from the GAE SDK version: two files, plus one new server:

  • blobstore_stub.py
  • file_blob_storage.py -> datastore_blob_storage.py
  • a blobstore server for uploads
The blobstore implementation in AppScale must be distributed and fault tolerant. We could store the files on disk and replicate them, but tracking which files are located where becomes cumbersome. Instead, AppScale stores the files in the datastore, splitting each blob into 1MB chunks. Each blob has a "BlobInfo" entity which describes the blob (owner, size, etc.), and the key of this entity lets us find the blob's chunks. Each chunk's key name is the blob info key name with two underscores and the chunk number (its position in the sequence of blocks) appended. A 3MB file with a blob info key of "xxx" would have chunks referenced by the key names "xxx__0", "xxx__1", and "xxx__2".
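
The chunk naming scheme can be illustrated with a minimal sketch. This is not AppScale's actual code: the function names (chunk_key_name, split_blob, join_blob) are hypothetical, and a plain dict stands in for the datastore. Only the 1MB chunk size and the "key__N" naming come from the description above.

```python
CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as described in the article


def chunk_key_name(blob_key, index):
    """Key name for chunk number `index` of the blob (hypothetical helper)."""
    return "%s__%d" % (blob_key, index)


def split_blob(blob_key, data, chunk_size=CHUNK_SIZE):
    """Split `data` (bytes) into a list of (key_name, chunk_bytes) pairs."""
    return [
        (chunk_key_name(blob_key, offset // chunk_size),
         data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def join_blob(blob_key, size, store, chunk_size=CHUNK_SIZE):
    """Reassemble a blob of `size` bytes from `store`, a dict of chunks."""
    count = (size + chunk_size - 1) // chunk_size  # ceiling division
    return b"".join(store[chunk_key_name(blob_key, n)] for n in range(count))
```

Given the blob size recorded in the BlobInfo entity, the number of chunks can be computed, so the blob can be read back by key name alone without scanning the datastore.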

To prevent an AppServer from being tied up by a large upload, all blobs are uploaded to a separate web server running Tornado. When a user's application creates a session, a session object is stored in the database. The session ID is passed to the blobserver in the URI. The blobserver validates the upload, then stores the file and a BlobInfo object holding meta information about the file. After storing the file in the datastore, it sends a form request containing the BlobInfo key to the application's success path. The redirect which the application sends in response is then forwarded to the application user.
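
The upload flow can be sketched roughly as follows. This is a simplified illustration, not the real blobserver: it leaves out Tornado and HTTP entirely, uses in-memory dicts in place of the database, and every name in it (create_upload_session, handle_upload, the "/upload/" URI prefix, the "blob-key" form field) is an assumption made for the example.

```python
import uuid

CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as described in the article

sessions = {}   # stand-in for session objects stored in the database
datastore = {}  # stand-in for the datastore holding BlobInfo and chunks


def create_upload_session(success_path):
    """Called on behalf of the application: store a session, return the upload URI."""
    session_id = uuid.uuid4().hex
    sessions[session_id] = {"success_path": success_path}
    return "/upload/" + session_id  # hypothetical URI layout


def handle_upload(upload_uri, filename, data):
    """Blobserver side: validate the session, then store the chunks and BlobInfo."""
    session_id = upload_uri.rsplit("/", 1)[-1]
    session = sessions.get(session_id)
    if session is None:
        raise ValueError("unknown upload session")

    blob_key = uuid.uuid4().hex
    for offset in range(0, len(data), CHUNK_SIZE):
        key_name = "%s__%d" % (blob_key, offset // CHUNK_SIZE)
        datastore[key_name] = data[offset:offset + CHUNK_SIZE]
    datastore[blob_key] = {"filename": filename, "size": len(data)}

    # The real server would now send a form request with the BlobInfo key
    # to the application's success path; here we just return what it would send.
    return session["success_path"], {"blob-key": blob_key}
```

The point of the design is that the AppServer only handles the small session-creation and success-path requests, while the blobserver handles the large file body.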