Friday, August 27, 2010

GAE and background workers in Tapestry5 app

In GAE you use the Task Queue API to implement background workers.

For instance, Ping Service uses task queues to ping web pages in batches according to a cron schedule.

In the Task Queue API every task is treated as an HTTP request to your application.



If you only have a few requests per billing period (say ~100 per day), then using Tapestry5 to handle task requests is not a bad idea, since it doesn't hurt billing too much.

But if you have thousands of requests (for instance, Ping Service currently serves ~13K background jobs per day), using Tapestry5 for this purpose becomes a problem. Why? Because of the GAE load balancing policy. GAE may (and does) shut down and start up instances of your application sporadically to make better use of its internal resources. Every time a loading request happens, your T5 application has to load and initialize its entire configuration all over again.

Tapestry5 page as background worker


To implement this approach you can simply create a new T5 page and implement the worker logic in its onActivate() method. This way you have all the power of Tapestry5 at hand (IoC, built-in services, activation context, etc.).

In Tapestry5 every page should have a template file with markup. For background workers these would typically be files with empty or dummy markup, since nobody accesses these pages from a browser. When I used this approach I marked such pages with the @Meta(Application.NO_MARKUP) annotation and overrode the MarkupRender service so that it skips the normal rendering queue for pages having this annotation and returns empty content (<html></html>) to the client. Here's the discussion of implementation details.

Custom filter as background worker



Using the Filter API you can declare a custom filter that handles task requests. This way Tapestry5 isn't involved in processing at all, and there is no additional overhead during loading requests.

    <filter>
        <filter-name>runJob</filter-name>
        <filter-class>dmitrygusev.ping.filters.RunJobFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>runJob</filter-name>
        <url-pattern>/filters/runJob/*</url-pattern>
    </filter-mapping>


The problem here is that Tapestry5 also uses the Filter API to handle requests, and its filter is usually declared to serve all incoming requests:

    <filter-mapping>
        <filter-name>app</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>


To avoid loading Tapestry5 on such requests I implemented a LazyTapestryFilter class that checks whether the request is for a background worker URL and, if so, ignores it.

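    import java.io.IOException;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;

    import org.apache.tapestry5.TapestryFilter;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LazyTapestryFilter implements Filter {

        private static final Logger logger = LoggerFactory.getLogger(LazyTapestryFilter.class);

        // Created on the first request that really needs Tapestry
        private Filter tapestryFilter;

        private FilterConfig config;

        public static FilterConfig FILTER_CONFIG;

        @Override
        public void init(FilterConfig config) throws ServletException
        {
            // Only remember the config here; Tapestry itself is not started yet
            FILTER_CONFIG = config;
            this.config = config;
            // Note: Uncomment the line below to profile Google API requests
            // ApiProxy.setDelegate(new ProfilingDelegate(ApiProxy.getDelegate()));
        }

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException
        {
            String requestURI = ((HttpServletRequest) request).getRequestURI();

            // Background worker and favicon requests are not for Tapestry:
            // stop here without loading it
            if (requestURI.startsWith("/filters/") || requestURI.equalsIgnoreCase("/favicon.ico"))
            {
                return;
            }

            if (tapestryFilter == null)
            {
                long startTime = System.currentTimeMillis();

                logger.info("Creating Tapestry Filter...");

                tapestryFilter = new TapestryFilter();
                tapestryFilter.init(config);

                logger.info("Tapestry Filter created and initialized ({} ms)", System.currentTimeMillis() - startTime);
            }

            tapestryFilter.doFilter(request, response, chain);
        }

        @Override
        public void destroy()
        {
            // Tapestry may never have been loaded on this instance
            if (tapestryFilter != null)
            {
                tapestryFilter.destroy();
            }
        }

    }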


It's also a good idea to skip all non-Tapestry requests, which you usually declare in AppModule.java like this:

    public static void contributeIgnoredPathsFilter(Configuration<String> configuration) {
        // GAE filters (Admin Console)
        configuration.add("/_ah/.*");
    }


Note that Tapestry5 must ignore these paths for the Admin Console to work in the development server.

The worker filter implementation should initialize as late as possible, i.e. not in the init() method but in doFilter(), because init() may be invoked during app server startup even if the incoming request doesn't match that filter:
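To make the effect of the contributed pattern concrete, here is a minimal standalone sketch of how a list of ignored-path regular expressions could be checked against a request path. It assumes each contributed string is compiled as a case-insensitive regular expression and matched against the whole path. The class IgnoredPaths and its methods are hypothetical names for illustration, not Tapestry5 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tapestry5.
class IgnoredPaths {
    private final List<Pattern> patterns = new ArrayList<>();

    // Registers a regular expression, compiled case-insensitively (an assumption of this sketch).
    void add(String regex) {
        patterns.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
    }

    // Returns true when the whole path matches at least one registered pattern.
    boolean isIgnored(String path) {
        for (Pattern pattern : patterns) {
            if (pattern.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    // Builds the same configuration as the contribution above: GAE's own /_ah/ URLs.
    static IgnoredPaths gaeDefaults() {
        IgnoredPaths paths = new IgnoredPaths();
        paths.add("/_ah/.*");
        return paths;
    }
}
```

With this configuration a path such as /_ah/admin is ignored, while an ordinary page path such as /index is not.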

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException
    {
        long startTime = System.currentTimeMillis();

        if (emf == null) {
            lazyInit();
        }

        // ...
    }
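If the container can call doFilter() from several threads at once, the emf == null check above could run the initialization more than once. Below is a minimal sketch of one way to guard against that. The Lazy holder is a hypothetical class written for this example, not something taken from Ping Service or Tapestry5.

```java
import java.util.function.Supplier;

// Hypothetical holder that creates its value on first use and then reuses it.
class Lazy<T> {
    private final Supplier<T> factory;
    private T value;
    private boolean initialized;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    // synchronized keeps two concurrent first calls from both running the factory.
    synchronized T get() {
        if (!initialized) {
            value = factory.get();
            initialized = true;
        }
        return value;
    }

    // Lets cleanup code skip work when nothing was ever created.
    synchronized boolean isInitialized() {
        return initialized;
    }
}
```

A filter could keep such a holder in a field and call get() at the top of doFilter(), with the supplier doing whatever lazyInit() does today. The isInitialized() check is meant for destroy(), so that shutdown does not trigger initialization.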


Also note that with this approach you have to manage transactions manually. You may use Ping Service's AbstractFilter as a reference.
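Managing transactions by hand usually comes down to a begin/commit/rollback template around the job body. The sketch below shows that shape with a hypothetical Tx interface standing in for a real transaction object (for example JPA's EntityTransaction). It is an illustration under that assumption, not the code of Ping Service's AbstractFilter.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a real transaction API; only the three calls the template needs.
interface Tx {
    void begin();
    void commit();
    void rollback();
}

class TxTemplate {
    // Runs the job inside a transaction: commit on success, rollback on any failure.
    static <T> T inTransaction(Tx tx, Supplier<T> job) {
        tx.begin();
        boolean committed = false;
        try {
            T result = job.get();
            tx.commit();
            committed = true;
            return result;
        } finally {
            // Reached with committed == false when job.get() or commit() threw.
            if (!committed) {
                tx.rollback();
            }
        }
    }
}
```

The exception thrown by the job still propagates to the caller after the rollback, so the filter can log it or return an error status for the task.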

Thursday, August 26, 2010

GAE and Tapestry5 Exception Handling

Tapestry5 uses its own technique to process unhandled exceptions.
When an unhandled exception occurs, Tapestry5 redirects the response to a special error page that is responsible for displaying the exception details.

Tapestry5 has a standard error page that can be very helpful for a developer if you configure your application to run in development mode. To do this, contribute the SymbolConstants.PRODUCTION_MODE symbol with the value "false" in your AppModule.java like this:

    public static void contributeApplicationDefaults(
            MappedConfiguration<String, String> configuration)
    {
        // ...
        configuration.add(SymbolConstants.PRODUCTION_MODE, "false");
        // ...
    }

The standard error page gives you all the information necessary to understand the cause of the exception:


And here is what the exception report looks like in production:


This is reasonable, because in production you usually don't want to display all this information to clients. But it is also not very user-friendly, because it displays the value of Throwable.getMessage().

Tapestry5 allows overriding the standard error page with your own exception page, so you can display more user-friendly messages.

There's also another scenario: you don't want Tapestry5 to generate an exception report at all, and instead let the application server serve a static HTML page with apologies to the client. This approach suits production better, but in development mode it's better to leave the detailed error report as is.

To change the way Tapestry5 handles exceptions you should provide another implementation of RequestExceptionHandler. One way of doing this is to decorate RequestExceptionHandler:

    public RequestExceptionHandler decorateRequestExceptionHandler(
            final Logger logger,
            final Response response,
            @Symbol(SymbolConstants.PRODUCTION_MODE)
            boolean productionMode)
    {
        // Leave default implementation of RequestExceptionHandler in development mode
        if (!productionMode) return null;

        // Provide simple implementation that logs exception and returns
        // HTTP error code which will be handled by application server
        return new RequestExceptionHandler()
        {
            public void handleRequestException(Throwable exception) throws IOException
            {
                logger.error("Unexpected runtime exception", exception);

                // Return HTTP error code 500
                response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, null);
            }
        };
    }


Next, add this markup to web.xml:

    <error-page>
        <error-code>500</error-code>
        <location>/500.html</location>
    </error-page>


Now, in case of any exception, the client will see the contents of 500.html.



This approach has one more advantage on GAE: generating exception reports consumes billable CPU cycles and takes up request processing time.

Saving CPU cycles is good. And there is one note about request processing time. As you may know, on GAE each request has to be processed within 30 seconds. If it isn't, the runtime raises DeadlineExceededException and gives the application a few hundred milliseconds to fail gracefully. As practice shows, the default T5 RequestExceptionHandler plus error report generation usually takes longer.

One more note about GAE exception handling. Since version 1.3.6, GAE allows developers to declare custom static error handlers for GAE-specific errors: over_quota, dos_api_denial and timeout.
For the first two errors GAE doesn't even pass requests to application code. Timeout errors appear as a result of application code execution, and (I suppose) this static error handler may conflict with a RequestExceptionHandler that replaces DeadlineExceededException with HTTP error code 500.

I also want to share my implementation of the over_quota.html page. I noticed that free quotas get reset every day around 11am-12pm Moscow Summer Time (that's around 7am-8am UTC; I'm not sure whether it's the same for other applications). I thought it would be good to show how much time is left until GAE re-enables the free quotas. And although over_quota.html is a static page, it is possible to include a piece of JavaScript that calculates this time in the client's timezone. Here it is:

    <html>
    <head>
        <title>Ping Service - Over Capacity</title>
    </head>
    <body>
        <h1>
            Over Capacity
        </h1>

        <p>
            We apologize for the inconvenience.
        </p>

        <p>
            Service is temporarily unavailable until <span id="deadline">8:00 am UTC time.</span>

            <script type="text/javascript">
                var element = document.getElementById("deadline");
                var now = new Date();
                // Today's 8:00 UTC, built from UTC fields so it is right in every timezone
                var deadline = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), 8));
                if (deadline <= now) {
                    // Already past today's reset, so use tomorrow's
                    deadline.setUTCDate(deadline.getUTCDate() + 1);
                }
                // toLocaleTimeString() renders the deadline in the client's timezone
                element.innerHTML = deadline.toLocaleTimeString().replace(/:00$/, "")
                    + " your time ("
                    + Math.round((deadline - now) / 60 / 60 / 1000)
                    + " hours left).";
            </script>
        </p>
    </body>
    </html>
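The same calculation can be cross-checked outside the browser. Below is a small sketch using java.time that finds the next 08:00 UTC after a given instant and the whole hours remaining. The 08:00 UTC reset hour is only the observation from this post, not a documented GAE guarantee, and QuotaReset is a hypothetical class name.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class QuotaReset {
    // Hour of day (UTC) at which quotas were observed to reset; an assumption taken from the post.
    static final int RESET_HOUR_UTC = 8;

    // Returns the first 08:00 UTC strictly after 'now'.
    static Instant nextReset(Instant now) {
        ZonedDateTime utcNow = now.atZone(ZoneOffset.UTC);
        ZonedDateTime candidate = utcNow.toLocalDate()
                .atTime(RESET_HOUR_UTC, 0)
                .atZone(ZoneOffset.UTC);
        if (!candidate.isAfter(utcNow)) {
            // Today's reset has already happened, so take tomorrow's.
            candidate = candidate.plusDays(1);
        }
        return candidate.toInstant();
    }

    // Rounds the remaining time to whole hours, like Math.round in the page script.
    static long hoursLeft(Instant now) {
        long minutes = Duration.between(now, nextReset(now)).toMinutes();
        return Math.round(minutes / 60.0);
    }
}
```

For example, at 09:00 UTC on August 27 the next reset is 08:00 UTC on August 28, which is 23 hours away.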


