Friday, August 27, 2010

GAE and background workers in Tapestry5 app

In GAE you use the Task Queue API to implement background workers.

For instance, Ping Service uses task queues to ping web pages in batches according to a cron schedule.

In the Task Queue API, every task is executed as an HTTP request to your application.



If you have only a few task requests per billing period (say, ~100 per day), then using Tapestry5 to handle them is not a bad idea, since it doesn't hurt billing much.

But if you have thousands of requests (for instance, Ping Service currently serves ~13K background jobs per day), using Tapestry5 for this purpose becomes a problem. Why? Because of GAE's load-balancing policy: GAE may (and does) shut down and start up instances of your application sporadically to make better use of its internal resources. Every time this results in a loading request, your T5 application has to load and initialize its entire configuration all over again, and you pay for that work on every restart.

Tapestry5 page as background worker


To implement this approach, you can simply create a new T5 page and put the worker logic in its onActivate() method. This gives you all the power of Tapestry5 (IoC, built-in services, activation context, etc.).

In Tapestry5 every page must have a template file with markup. For background workers, though, these would typically be files with empty or dummy markup, since nobody will access these pages from a browser. When I used this approach, I marked such pages with the @Meta(Application.NO_MARKUP) annotation and overrode the MarkupRenderer service so that, for pages with this annotation, it skips the normal rendering queue and returns empty content (<html></html>) to the client. Here's the discussion of implementation details.
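To illustrate the idea behind the NO_MARKUP trick without depending on Tapestry internals, here is a plain-Java sketch: a marker annotation on the page class is checked at render time, and annotated pages get the empty `<html></html>` response instead of going through normal rendering. `@NoMarkup`, `PingJobPage`, `IndexPage` and `render()` are hypothetical names; the approach described above uses Tapestry's `@Meta` annotation and an overridden rendering service, which this sketch does not reproduce.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.function.Supplier;

public class NoMarkupSketch {

    // Hypothetical marker annotation standing in for @Meta(Application.NO_MARKUP).
    // RUNTIME retention is required so the check below can see it via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface NoMarkup {}

    // Hypothetical worker page: the job runs in onActivate(), no real template needed.
    @NoMarkup
    public static class PingJobPage {
        public void onActivate() { /* worker logic would go here */ }
    }

    // Hypothetical regular page that should render as usual.
    public static class IndexPage {}

    public static final String EMPTY_CONTENT = "<html></html>";

    // Returns empty content for @NoMarkup pages; otherwise delegates to the normal renderer.
    public static String render(Class<?> pageClass, Supplier<String> normalRenderer) {
        if (pageClass.isAnnotationPresent(NoMarkup.class)) {
            return EMPTY_CONTENT; // skip normal rendering entirely
        }
        return normalRenderer.get();
    }

    public static void main(String[] args) {
        System.out.println(render(PingJobPage.class, () -> "<html>full page</html>"));
        System.out.println(render(IndexPage.class, () -> "<html>full page</html>"));
    }
}
```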

Custom filter as background worker



Using the Filter API, you can declare a custom filter that handles task requests. This way Tapestry5 isn't involved in processing at all, so it adds no overhead to loading requests.

    <filter>
        <filter-name>runJob</filter-name>
        <filter-class>dmitrygusev.ping.filters.RunJobFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>runJob</filter-name>
        <url-pattern>/filters/runJob/*</url-pattern>
    </filter-mapping>


The problem is that Tapestry5 also uses the Filter API to handle requests and is usually mapped to all incoming requests:

    <filter-mapping>
        <filter-name>app</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>


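To avoid loading Tapestry5 for such requests, I implemented a LazyTapestryFilter class that checks whether a request is for a background worker URL and, if so, bypasses Tapestry. Tapestry itself is created and initialized only when the first request that actually needs it arrives.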

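    import java.io.IOException;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;

    import org.apache.tapestry5.TapestryFilter;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LazyTapestryFilter implements Filter {

        private static final Logger logger = LoggerFactory.getLogger(LazyTapestryFilter.class);

        // Created on the first request that actually needs Tapestry
        private Filter tapestryFilter;

        private FilterConfig config;

        // Makes the filter config available to other parts of the application
        public static FilterConfig FILTER_CONFIG;

        @Override
        public void init(FilterConfig config) throws ServletException
        {
            FILTER_CONFIG = config;
            // Only remember the config here; Tapestry is initialized lazily in doFilter()
            this.config = config;
            // Note: uncomment the line below to profile Google API requests
            // ApiProxy.setDelegate(new ProfilingDelegate(ApiProxy.getDelegate()));
        }

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException
        {
            String requestURI = ((HttpServletRequest) request).getRequestURI();

            // Background worker URLs and favicon requests never need Tapestry:
            // pass them down the chain without creating TapestryFilter
            if (requestURI.startsWith("/filters/") || requestURI.equalsIgnoreCase("/favicon.ico"))
            {
                chain.doFilter(request, response);
                return;
            }

            if (tapestryFilter == null)
            {
                long startTime = System.currentTimeMillis();

                logger.info("Creating Tapestry Filter...");

                tapestryFilter = new TapestryFilter();
                tapestryFilter.init(config);

                logger.info("Tapestry Filter created and initialized ({} ms)", System.currentTimeMillis() - startTime);
            }

            tapestryFilter.doFilter(request, response, chain);
        }

        @Override
        public void destroy()
        {
            // Tapestry may never have been created if no request needed it
            if (tapestryFilter != null)
            {
                tapestryFilter.destroy();
            }
        }

    }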
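One possible refinement, sketched here under the assumption that you want to unit-test the routing decision without a servlet container: pull the URI check out of doFilter() into a small static helper. `TapestryBypass` and `shouldBypass` are hypothetical names; `/filters/` is the worker prefix from the web.xml above, and the `/_ah/` prefix anticipates the GAE paths discussed next.

```java
import java.util.Arrays;
import java.util.List;

public final class TapestryBypass {

    // Hypothetical list of path prefixes that should never reach Tapestry.
    private static final List<String> BYPASS_PREFIXES = Arrays.asList("/filters/", "/_ah/");

    private TapestryBypass() {}

    // Returns true when LazyTapestryFilter should skip Tapestry for this URI.
    public static boolean shouldBypass(String requestURI) {
        if (requestURI == null) {
            return false;
        }
        if (requestURI.equalsIgnoreCase("/favicon.ico")) {
            return true;
        }
        for (String prefix : BYPASS_PREFIXES) {
            if (requestURI.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String uri : new String[] {"/filters/runJob/42", "/_ah/admin", "/favicon.ico", "/index"}) {
            System.out.println(uri + " -> " + (shouldBypass(uri) ? "bypass" : "tapestry"));
        }
    }
}
```

Inside doFilter() the `if` condition would then become `TapestryBypass.shouldBypass(requestURI)`, keeping the filter itself free of path details.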


It's also a good idea to make LazyTapestryFilter skip all the non-Tapestry requests that you usually declare as ignored paths in AppModule.java, like this:

    public static void contributeIgnoredPathsFilter(Configuration<String> configuration) {
        // GAE filters (Admin Console)
        configuration.add("/_ah/.*");
    }
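Each contributed entry is a regular expression, so it can help to check which URLs a pattern actually covers before mirroring it in LazyTapestryFilter. The sketch below assumes each pattern must match the whole request path (as `Matcher.matches()` does); `IgnoredPathsCheck` and `isIgnored` are hypothetical names, not Tapestry API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class IgnoredPathsCheck {

    // Same regular expression as contributed in AppModule.
    private static final List<Pattern> IGNORED = new ArrayList<>();
    static {
        IGNORED.add(Pattern.compile("/_ah/.*"));
    }

    // Assumption: a path is ignored only if some pattern matches it completely.
    public static boolean isIgnored(String path) {
        for (Pattern p : IGNORED) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String path : new String[] {"/_ah/admin", "/_ah/queue/default", "/_ah", "/index"}) {
            System.out.println(path + " ignored? " + isIgnored(path));
        }
    }
}
```

Note that `/_ah` without a trailing slash is not matched by `/_ah/.*`.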


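Note that Tapestry5 must ignore these paths for the Admin Console to work in the development server. Skipping them in LazyTapestryFilter as well means such a request won't initialize Tapestry only to have it ignore the path.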

The worker filter itself should initialize as late as possible, i.e. in doFilter() rather than in init(), because init() may be invoked during app server startup even if the incoming request doesn't match that filter:

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException
    {
        long startTime = System.currentTimeMillis();

        // Create expensive resources (here emf) on the first matching
        // request rather than in init()
        if (emf == null) {
            lazyInit();
        }

        // ...
    }
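The `emf == null` check above assumes requests are not processed concurrently. If your app enables concurrent requests (the `<threadsafe>` setting in appengine-web.xml), a thread-safe variant of the same lazy initialization is double-checked locking on a volatile field. `Lazy` is a hypothetical helper, sketched here in plain Java; the factory passed to it would do whatever `lazyInit()` does.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: creates an expensive resource on first use, exactly once.
public final class Lazy<T> {

    private final Supplier<T> factory;
    private volatile T value; // volatile publishes the created instance safely to all threads

    public Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    public T get() {
        T result = value;
        if (result == null) {            // fast path: no locking once initialized
            synchronized (this) {
                result = value;
                if (result == null) {    // re-check under the lock
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AtomicInteger creations = new AtomicInteger();
        Lazy<String> emf = new Lazy<>(() -> {
            creations.incrementAndGet();
            return "expensive resource";
        });
        System.out.println("created so far: " + creations.get()); // 0, nothing created yet
        emf.get();
        emf.get();
        System.out.println("created so far: " + creations.get()); // 1, created exactly once
    }
}
```

In a filter, the field would become a `Lazy<...>` built in the constructor or init() (which is cheap), while the expensive factory only runs inside doFilter() on the first matching request.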


Also note that with this approach you have to manage transactions manually, since the filter runs outside Tapestry and its services. You can use Ping Service's AbstractFilter as a reference.
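As a rough illustration of manual transaction handling, here is a sketch of the usual begin/commit/rollback template: commit when the work succeeds, roll back if anything throws. The `Tx` interface is hypothetical, though JPA's `EntityTransaction` has methods with the same names (`begin`, `commit`, `rollback`, `isActive`); `FakeTx` is an in-memory stand-in used only to show the control flow. This is not the AbstractFilter code.

```java
import java.util.function.Supplier;

public final class ManualTransactions {

    // Hypothetical minimal transaction interface.
    public interface Tx {
        void begin();
        void commit();
        void rollback();
        boolean isActive();
    }

    private ManualTransactions() {}

    // Runs work inside tx: commits on success, rolls back if work or commit fails.
    public static <T> T inTransaction(Tx tx, Supplier<T> work) {
        tx.begin();
        try {
            T result = work.get();
            tx.commit();
            return result;
        } finally {
            if (tx.isActive()) {
                tx.rollback(); // still active only if something threw before commit completed
            }
        }
    }

    // In-memory fake that records the calls made on it.
    public static final class FakeTx implements Tx {
        public final StringBuilder log = new StringBuilder();
        private boolean active;
        public void begin() { active = true; log.append("begin "); }
        public void commit() { active = false; log.append("commit "); }
        public void rollback() { active = false; log.append("rollback "); }
        public boolean isActive() { return active; }
    }

    public static void main(String[] args) {
        FakeTx ok = new FakeTx();
        inTransaction(ok, () -> "pinged");
        System.out.println(ok.log.toString().trim()); // begin commit

        FakeTx failing = new FakeTx();
        try {
            inTransaction(failing, () -> { throw new IllegalStateException("ping failed"); });
        } catch (IllegalStateException expected) {
            // the exception still reaches the caller after the rollback
        }
        System.out.println(failing.log.toString().trim()); // begin rollback
    }
}
```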