My Notes on Fantasm for Google App Engine

Fantasm is a Google App Engine library that abstracts away Task Queues by configuring workflows as finite state machines. Comparable projects include the Pipeline API and the MapReduce API. Fantasm is well suited to processing large amounts of data that cannot be handled in a single request because of App Engine's timeout constraints.

Configuration
Hook Fantasm into your app.yaml file:

- url: /fantasm/.*
  script: fantasm/main.py
  login: admin

State machines are specified in an fsm.yaml file. In that file you give your state machine a name and define its individual states and transitions.

State machines have a single starting state and can have multiple final states.

Each state's execution ends with that state emitting a string to signify what the next state should be.
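To make the "emit a string, move to the next state" idea concrete, here is a minimal toy sketch in plain Python. It is not Fantasm's implementation and does not use its fsm.yaml schema; every name in it (TRANSITIONS, run_machine, the state and event names) is made up for illustration.

```python
# Toy illustration only: NOT Fantasm's implementation or its fsm.yaml schema.
# All names here (TRANSITIONS, ACTIONS, run_machine, states, events) are hypothetical.

# (current state, emitted event) -> next state
TRANSITIONS = {
    ("start", "ok"): "process",
    ("process", "more"): "process",
    ("process", "done"): "finish",
}

FINAL_STATES = {"finish"}


def start_action(context):
    context["remaining"] = 2
    return "ok"


def process_action(context):
    context["remaining"] -= 1
    return "more" if context["remaining"] > 0 else "done"


def finish_action(context):
    return None  # a final state emits nothing


ACTIONS = {"start": start_action, "process": process_action, "finish": finish_action}


def run_machine(initial_state, context):
    """Run actions until a final state has executed; return the visited states."""
    state = initial_state
    visited = [state]
    while True:
        event = ACTIONS[state](context)
        if state in FINAL_STATES:
            return visited
        # The emitted string selects the transition out of the current state.
        state = TRANSITIONS[(state, event)]
        visited.append(state)


print(run_machine("start", {}))
```

The single starting state is "start", and the string each action returns is the only thing that decides where the machine goes next.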

Make sure you use the full path of the action class. Example:

  - action: serverside.computations.InitialClass

Otherwise you'll get a ModuleNotFound error.

Communication Between States
The "context" is passed from one state to the next as arguments in the URL. By default you should pass only strings, and never send more context than fits in a single POST request.
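A rough way to see why plain strings are the safe choice: anything that travels as request arguments is flattened to text. The sketch below uses only the Python 3 standard library, not Fantasm's own serialization code, and the context keys are invented.

```python
# Sketch only: shows what happens to values that travel as request arguments.
# This is not Fantasm's serialization code; the context keys are made up.
from urllib.parse import urlencode, parse_qs

context = {"account_key": "abc123", "batch": "num_accounts", "retries": 3}

# Everything is flattened into text when it is placed in the request.
encoded = urlencode(context)

# The receiving side gets strings back, so the integer 3 comes back as "3".
decoded = {key: values[0] for key, values in parse_qs(encoded).items()}

print(encoded)
print(decoded)
```

If you pass a number or a key, expect to convert it back yourself in the next state.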

Communication Internal to a State
The "obj" is passed from the continuation step to the actual execution of a state. It is not serialized between states.

Advanced Settings
It is possible to fork off a new process by calling context.fork(data=dictionary_of_new_context).

Be Careful
Make sure non-idempotent statements (statements with side effects, like updating an entity in the datastore) are done last. There are probably still some race conditions even if you do this, but they should be rare. Use locks via memcache to guard against them.
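One possible shape for such a memcache lock is sketched below. It assumes a cache object that behaves like google.appengine.api.memcache, where add() returns True only if the key did not already exist. FakeCache, acquire_lock and release_lock are hypothetical names, and FakeCache exists only so the sketch runs outside App Engine.

```python
# Sketch of a best-effort lock built on memcache-style add/delete.
# Assumption: `cache` behaves like google.appengine.api.memcache, where
# add() returns True only if the key did not already exist.
# FakeCache, acquire_lock and release_lock are hypothetical helper names.


class FakeCache(object):
    """In-memory stand-in so the sketch runs outside App Engine."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, time=0):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def acquire_lock(cache, name, timeout_seconds=30):
    # The expiry keeps a crashed task from holding the lock forever.
    return cache.add("lock:" + name, "1", time=timeout_seconds)


def release_lock(cache, name):
    cache.delete("lock:" + name)


cache = FakeCache()
first = acquire_lock(cache, "num_accounts")
second = acquire_lock(cache, "num_accounts")  # already held, so this fails
release_lock(cache, "num_accounts")
third = acquire_lock(cache, "num_accounts")
print(first, second, third)
```

Keep in mind that memcache entries can be evicted at any time, so a lock like this is best effort rather than a guarantee.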

Every state that uses a continuation should also be allowed to be a final state. The execute method needs this for the case where the query returns no results.

When is your job done?

Right now there is no way to get a callback or a trigger that a job is done.

Useful Iteration

The documentation on the Google Article Site does not mention this method, but it shows up in the testing code. Unlike the continuation function, it does not require you to handle cursors yourself. Here's how to count all the accounts in your application if your application is really popular (otherwise it is probably best to just call count() on the query):

from google.appengine.ext import db

from fantasm.action import FSMAction, DatastoreContinuationFSMAction

# Accounts and Batch are the application's own db.Model classes.
# Batch is expected to have an integer "counter" property that defaults to 0.

class AllAccountsClass(DatastoreContinuationFSMAction):
  def getQuery(self, context, obj):
    # No cursor handling is needed here
    return Accounts.all()

  def execute(self, context, obj):
    # No result for this step, so emit nothing and let the machine finish
    if not obj['result']:
      return None
    return "peraccount"

# Fan in here every X seconds

class CountAccountsClass(FSMAction):
  def execute(self, contexts, obj):
    """Transactionally update our batch counter"""
    batch_key = "num_accounts"

    def tx():
      batch = Batch.get_by_key_name(batch_key)
      if not batch:
        # For whatever reason it was not already created in a previous state
        batch = Batch(key_name=batch_key)
      # One increment covers every context that was fanned in
      batch.counter += len(contexts)
      batch.put()
    db.run_in_transaction(tx)

What Does Your State Machine Look Like?

See your state machine by going to the URL: fantasm/graph/<state_machine_name>

It uses the Google Chart API.

Fanning In

In the fsm.yaml file you can mark a state to accumulate contexts every X seconds (fan_in: X). In that state's execute function you'll get a contexts or list_of_contexts variable, from which you can take just the length (or read more from each context if need be). Then increment some counter inside a transaction.
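The payoff of fanning in is fewer writes: one counter update per batch instead of one per context. The toy simulation below illustrates that in plain Python. It is not Fantasm code, and group_by_window and the sample arrival times are invented for the example.

```python
# Toy simulation of fan-in, not Fantasm code: contexts that arrive within the
# same X-second window are handed over together as one list.
# group_by_window and the sample arrival times are hypothetical.
from collections import defaultdict


def group_by_window(arrivals, window_seconds):
    """arrivals: list of (timestamp_seconds, context) pairs."""
    windows = defaultdict(list)
    for timestamp, context in arrivals:
        windows[int(timestamp // window_seconds)].append(context)
    return [windows[index] for index in sorted(windows)]


arrivals = [
    (0.5, {"id": "a"}),
    (2.0, {"id": "b"}),
    (4.9, {"id": "c"}),
    (7.1, {"id": "d"}),
]

counter = 0
writes = 0
for contexts in group_by_window(arrivals, window_seconds=5):
    counter += len(contexts)  # one increment per batch of contexts
    writes += 1

print(counter, writes)
```

Four contexts are counted with only two writes here; with a busy application the saving is much larger.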

Code examples: http://code.google.com/p/userinfuser/wiki/Analytics
Fantasm Site: http://code.google.com/p/fantasm/w/list

Fantasm is developed by: http://www.vendasta.com/
