
App Engine Channel API in AppScale

One of Google's newest App Engine features is the Channel API, which pushes messages to a client's JavaScript code. This blog entry explains AppScale's scalable implementation, which is built using ejabberd and strophejs.

There are two sets of APIs for the developer. First is the Python API, which consists of create_channel(app_client_id) and send_message(app_client_id, message). Under the covers, create_channel uses AppScale's xmpp service implementation, so we are able to leverage ejabberd to take care of distributing and sending messages for us. The trick is that we must create a temporary account for each new channel. This requires garbage collection of channels which live longer than a prescribed period of time.
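
The bookkeeping behind that garbage collection can be sketched roughly as follows. This is only an illustration and not AppScale's code: the ChannelRegistry class, its method names, and the max_age_seconds parameter are all hypothetical, and the real implementation would also have to remove the temporary account from ejabberd.

```python
import time


class ChannelRegistry:
    """Hypothetical bookkeeping for temporary channel accounts."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self._created_at = {}  # app_client_id -> creation timestamp

    def register_channel(self, app_client_id, now=None):
        # Record when the channel (and its temporary account) was created.
        now = time.time() if now is None else now
        self._created_at[app_client_id] = now

    def collect_garbage(self, now=None):
        # Drop every channel older than max_age_seconds and report which
        # ones were dropped so their accounts can be deleted elsewhere.
        now = time.time() if now is None else now
        expired = [cid for cid, created in self._created_at.items()
                   if now - created > self.max_age_seconds]
        for cid in expired:
            del self._created_at[cid]
        return sorted(expired)

    def live_channels(self):
        return sorted(self._created_at)
```

Passing the current time in explicitly (the now argument) keeps the sketch easy to exercise without waiting for real channels to expire.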

Second is the JavaScript API, which the developer can include by adding the following line to the head of the HTML:
<script src='/_ah/channel/jsapi'></script>
This API creates connections using strophejs. Strophejs is a robust, open source project that enables BOSH connections to ejabberd (https://github.com/metajack/strophejs). A channel socket is actually built on strophejs's connections and its message callbacks. The functions keep the same names and behavior to preserve the API, but the implementation is different. Google's implementation uses Google Talk and their xmpp service. Their production JavaScript is minified and hard to decode, while their SDK version uses polling (500ms poll time) instead of long lived BOSH connections. AppScale's JavaScript library is also minified to save bandwidth, but the unminified version can be found in appscale/AppServer/google/appengine/tools/appscale-js.js. Within this file you will see a goog.appengine library which maintains the APIs, the strophe library, and the MD5 and SHA libraries which strophe needs.

Nginx is used as a proxy in front of ejabberd's http-bind path on port 5280 (see http://tinyurl.com/68qbwyc on why a proxy is needed). Long lived ajax calls are used because they have lower overhead than constant polling. This can be seen when using resource tracking in Firefox or Chrome: you'll notice a call which blocks until a message is returned, followed immediately by another long lived connection. The JavaScript library also listens for the unload event, which fires when the client window is closed. Before a full exit, the client library sends a disconnect message to free up resources.

AppScale's implementation allows sending messages to multiple receivers, which is more functionality than the one sender and one receiver restriction in GAE. Any clients given the same application key will see the messages sent to that key with the send_message(client_id, message) function.

Naming issues
Each xmpp account is registered as <username>@<head-ip>, where username is the first part of your email (e.g. joe.smith of joe.smith@gmail.com). This reserves that username and blocks any other email with the same username (e.g. joe.smith@yahoo.com).

The xmpp API implementation also creates an xmpp account for each app. If your username conflicts with an appname, you will not be able to use that email. We have ideas on how to alleviate this problem but it's low on our list. If we see that users definitely don't like this limitation we will address it.
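
The naming rule and the two kinds of conflict can be illustrated with a small sketch. The function names here are made up for the example and are not taken from the AppScale source; the sets of taken usernames and app names stand in for whatever the User/App Server actually stores.

```python
def xmpp_account(email, head_ip):
    """Build <username>@<head-ip> from the first part of the email."""
    username = email.split("@", 1)[0]
    return "%s@%s" % (username, head_ip)


def find_conflict(email, taken_usernames, app_names):
    """Return "app", "user", or None depending on what the username hits.

    taken_usernames and app_names are hypothetical stand-ins for the
    accounts already registered on the head node.
    """
    username = email.split("@", 1)[0]
    if username in app_names:
        return "app"
    if username in taken_usernames:
        return "user"
    return None
```

With joe.smith@gmail.com already registered, joe.smith@yahoo.com maps to the same account name and is reported as a "user" conflict, while an email whose first part matches a deployed app name is reported as an "app" conflict.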

The User/App Server within AppScale, which is a SOAP frontend to the APPS and USERS table in the datastore, must keep track of which User entry is an app, user, or channel. This is for authentication and also to know which accounts need to be garbage collected.  

Scalable Implementation
In order for xmpp to scale we need DNS. Without it we cannot route between machines because each machine's domain (its ip address) is different. The default setting will be to route all messages to the head node using nginx, but we will support DNS configuration for advanced users in the future.

How to add a database in AppScale

This blog entry discusses how to add a datastore in AppScale ("datastore" and "database" are used interchangeably). There are three primary procedures which must be automated by the developer: installing, starting and stopping the datastore. Installation is done using shell scripts. Starting and stopping must be written in Ruby (the AppController's language). Moreover, the AppScale DB interface must be implemented in Python.

Reference Code
There are currently nine different datastores already implemented in AppScale. Each one of these can serve as an example of how best to integrate your given datastore. Some datastores, however, cannot do range queries or return an entire table. For these datastores you must use the dhash interface. The dhash interface shards the key space amongst 16 special keys within the datastore to get around this limitation, but these datastores do not scale as well because each put must access these special keys.
Datastores which use the dhash interface:
  • MemcacheDB (master/slave, written in C)
  • Voldemort (peer to peer, Java)
  • SimpleDB
  • Scalaris
Datastores which use the regular DB interface:
  • Cassandra (peer to peer, Java)
  • HBase (master/slave, Java)
  • Hypertable (master/slave, C++)
  • MongoDB (master/slave, C++)
  • MySQL (peer to peer, C++)
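
To make the dhash idea more concrete, here is a minimal sketch of sharding a table's row keys across 16 index keys on top of a store that only supports get and put by key. It is not the code in dhash_datastore.py: the KeyValueStore stand-in, the key naming scheme, and the use of MD5 to pick a shard are all assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 16  # the 16 special keys mentioned above


class KeyValueStore:
    """Stand-in for a datastore that only supports get/put by key."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def index_key_for_shard(table_name, shard):
    # Hypothetical naming scheme for the special index keys.
    return "__index__/%s/%d" % (table_name, shard)


def shard_of(row_key):
    # Hash the row key so rows spread evenly over the shards.
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put_row(store, table_name, row_key, value):
    store.put("%s/%s" % (table_name, row_key), value)
    # Every put also reads and rewrites one special key, which is the
    # scaling cost described in the text.
    index_key = index_key_for_shard(table_name, shard_of(row_key))
    keys = store.get(index_key) or []
    if row_key not in keys:
        store.put(index_key, keys + [row_key])


def get_table(store, table_name):
    # Walk the 16 special keys to recover every row key in the table.
    rows = {}
    for shard in range(NUM_SHARDS):
        for row_key in store.get(index_key_for_shard(table_name, shard)) or []:
            rows[row_key] = store.get("%s/%s" % (table_name, row_key))
    return rows
```

The read-modify-write on the index key in put_row is not atomic in this sketch; a real implementation would need to handle concurrent writers.
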
Code Locations
Starting, Stopping, and AppDB Interface paths:
appscale/AppDB/
appscale/AppDB/dbinterface.py
appscale/AppDB/dhash_datastore.py
appscale/AppDB/dbname/
appscale/AppDB/dbname/py_dbname.py
appscale/AppDB/dbname/dbname_helper.rb
appscale/AppDB/dbname/prime_dbname.py
appscale/AppDB/datastore_tester.py
appscale/AppDB/dbname/templates/
appscale/AppDB/dbname/patches/

Installation paths:
appscale/debian/appscale_install_functions.sh
appscale/debian/appscale_install.sh
appscale/debian/control.all
appscale/debian/makedeb_all.sh
appscale/debian/rules.dbname

Tools:
appscale-tools/bin/appscale-run-instances

Installing the Datastore
The scripts needed to install the datastore go in appscale/debian/. Here you will see shell scripts for automating installation. Grep this folder for the name of an existing database to find a reference example.

Initializing and Stopping the Datastore
Your datastore may need configuration files that are custom made each time it is spawned. All configuration files, or templates for them, must go into appscale/AppDB/dbname/templates. The function named setup_db_config_files in dbname_helper.rb should use these templates. This function is passed the master ip, the slave ips, and credentials (a dictionary of additional args). See a reference helper file for the functions which must be implemented.

AppScale DB Interface
The interface is a template for the following functions:
get_entity(table_name, row_key, column_names)
put_entity(table_name, row_key, column_names, cell_values)
get_table(table_name, column_names)
delete_entity(table_name, row_key)
get_schema(table_name)
delete_table(table_name)

The interface is very particular as to what is expected for each template function. Fully understand one of the reference implementations before implementing a new one.
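
As a rough picture of the shape of an implementation, here is an in-memory sketch of the six functions listed above. The method names and arguments follow that list, but the return values and error handling are assumptions made for this example; the conventions in dbinterface.py and the reference implementations are what a real datastore must follow.

```python
class InMemoryDatastore:
    """Toy implementation of the six interface functions (illustrative)."""

    def __init__(self):
        self._tables = {}  # table_name -> {row_key: {column: value}}

    def put_entity(self, table_name, row_key, column_names, cell_values):
        if len(column_names) != len(cell_values):
            raise ValueError("column_names and cell_values differ in length")
        table = self._tables.setdefault(table_name, {})
        table.setdefault(row_key, {}).update(zip(column_names, cell_values))

    def get_entity(self, table_name, row_key, column_names):
        # Assumed convention: None for a missing row, else a list of values.
        row = self._tables.get(table_name, {}).get(row_key)
        if row is None:
            return None
        return [row.get(column) for column in column_names]

    def get_table(self, table_name, column_names):
        # Assumed convention: one list of values per row, ordered by row key.
        table = self._tables.get(table_name, {})
        return [[row.get(column) for column in column_names]
                for _, row in sorted(table.items())]

    def delete_entity(self, table_name, row_key):
        return self._tables.get(table_name, {}).pop(row_key, None) is not None

    def get_schema(self, table_name):
        # Assumed convention: the sorted set of columns seen in the table.
        columns = set()
        for row in self._tables.get(table_name, {}).values():
            columns.update(row)
        return sorted(columns)

    def delete_table(self, table_name):
        return self._tables.pop(table_name, None) is not None
```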

AppScale Tools
Add the new database name to the run instances script (appscale-tools/bin/appscale-run-instances).

Testing
Beyond trying out multiple applications and seeing if they behave correctly, there is also the datastore_tester.py in appscale/AppDB/.
Run this with args: -t <dbname>
This will check to make sure the peculiarities of the interface are correctly implemented.

Code Placement of AppScale

This blog entry explains the different components of AppScale and its code layout. After using apt-get or building from scratch, you'll find the appscale directory in the root folder.

Controller: appscale/AppController
This is the main controller of the system. All nodes have an AppController, but the master node is in charge of telling all other AppControllers what to do. The code in djinn.rb tells other nodes what to run, using remote commands via ssh and SOAP calls. This spawns the databases, the AppServers (both Python and Java), and all services which are needed for the APIs.

Application Servers: appscale/AppServer and appscale/AppServer_Java

The AppServer is a modified Google App Engine SDK. Stubs from the original SDK are removed and replaced with scalable components. 

Load Balancer and Login: appscale/AppLoadBalancer

The AppLoadBalancer is in charge of routing traffic to AppServers as well as providing a login service. Routing is done using Nginx and HAProxy.

Scratch Install: appscale/debian

To build AppScale from scratch use the appscale_build.sh script located in this directory.

Monitoring: appscale/AppMonitoring

AppScale employs Monitor which uses collectd to gather cluster wide information. 

Randomized Killing of Services: appscale/Loki
This service kills components randomly within AppScale to test our fault tolerance.

Datastores: appscale/AppDB

Each datastore's interface can be found here under that datastore's given directory. The naming convention is py_<dbname>.py. Each datastore implements the AppScale DB Interface found in AppDB/dbinterface.py. Each datastore must also provide a helper script which starts up and shuts down the datastore. This is a Ruby script and is called by the AppController during initialization. Two services which abstract the db away are the appscale protocol buffer server (interfaces to the AppServers via HTTP) and soap_server.py (provides SOAP calls for managing and storing information about users and applications). Moreover, the ZooKeeper code lives here (used for transactions).

Logs: On each node, a multitude of places

  • General logs: /tmp/<ip> of node
  • System log: /var/log/syslog
  • HBase log: appscale/AppDB/hbase/hbase-{version}/logs
  • Hadoop logs: appscale/AppDB/hadoop-{version}/logs
  • Hypertable logs: /opts/Hypertable/current/logs
  • Cassandra logs: /var/log/cassandra/system.log
  • MongoDB logs: /var/log/mongodb/
  • MySQL logs: /var/log/mysql/
  • ZooKeeper logs: /var/log/zookeeper/
  • ejabberd logs: /var/log/ejabberd/
  • nginx logs: /var/log/nginx/
  • scalaris logs: /var/log/scalaris/
  • memcachedb logs: /var/log/memcachdb.log
  • appscale datastore logs (if enabled): AppDB/logs

Any questions? Just ask.

Building AppScale From Scratch

Here is how to build AppScale from the latest code from Launchpad. 

  • Create a blank Ubuntu Karmic image
  • Start the image up and get a console
    • xm create xen.conf
    • xm console <console id>
  • Become root with sudo su if you are not already
  • Install any basic packages such as ssh using apt-get
  • Make sure to allow root ssh login
  • Edit the /etc/apt/sources.list file  
    • In order to install java you must add multiverse as one of the repositories.
  • Install bzr
    • apt-get -y install bzr
  • Check out the code
    • cd ~
    • bzr branch lp:appscale
    • if you intend to run tools from your head node then
      • cd; bzr branch lp:appscale/trunk-tools
  • cd ~/appscale/debian
  • sh appscale_build.sh
The script will then take a while to build and install all the needed packages. This will install all the databases. To install only a subset of databases, use the apt-get install method or comment out databases in the build script. If there is a problem with the build please email the mailing list or contact me.
On success, halt the image and copy the root.img file to your other instances. Make sure the images are correctly shut down before moving or copying them around. More info on setting up a Xen or KVM image can be found on the google code site (http://code.google.com/p/appscale/).

For Contributors 
To contribute code back to the appscale branch 
  • Create a new branch 
  • After you have set up your ssh keys, push your version of the branch
    • bzr commit for any changes (locally stored)
    • bzr launchpad-login <yourlogin>
    • bzr push <branch> --use-existing-dir
  • The modifications must be tested 
    • for 1-4 nodes
    • for all databases
    • and built from scratch
    • with applications "guestbook", "tasks", and any custom applications
  • Go to the branch's Launchpad page and click "Propose for merging"
  • Select the appscale trunk as the target branch
  • Give good information as to what the changes were and so on in the comment section
  • We will test your proposed branch before merging it into the main branch
Testing Tips
Modifications to the datastores should be tested by running
  • python ~/appscale/AppDB/datastore_tester.py -t <db-name>
  • python ~/appscale/AppDB/soap_tester.py
  • see other unit tests in ~/appscale/AppDB/tests/
If you're looking for a research project to work on or just want to contribute, contact the mailing list.

AppScale 1.4 just released

We've finally got our newest version of AppScale out thanks to the hard work of the AppScale team. Check out the new and improved AppScale at http://code.google.com/p/appscale. Here are some cool new features:
  1. Install appscale using apt-get
  2. Advanced placement of components can be specified in a yaml file
  3. We have transaction support for all the backends
  4. Java GAE has more supported APIs
  5. HTTPS support
  6. and much more...
