Wednesday, September 01, 2010

GAE and Tapestry5 Data Access Layer

GAE provides two ways of communicating with its datastore from Java:


  1. Using the low-level API

  2. Using JDO/JPA (with DataNucleus appengine edition)

In this post I will explain some ways to improve performance when using JPA. Of course, a high-level API always adds some overhead, but I use JPA in Ping Service and think it is worth it.

Update (17.09.2010): There is another way to communicate with GAE datastore from Java: Objectify

Spring vs. Tapestry-JPA



It's good practice to use JPA together with an IoC container that injects the EntityManager into your services. At the very beginning of development I used Spring 3.0 as the IoC container and for transaction management. It worked, but it took too long to initialize during loading requests, and every time a user opened their first web page they ended up with a DeadlineExceededException.

Then I tried tapestry-jpa from Tynamo, and it fit perfectly. It runs pretty fast and lets you:

  • inject EntityManager to DAO classes (as regular T5 services)

  • manage transactions using @CommitAfter annotation
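
To make the second point more concrete, here is a minimal plain-Java sketch of the commit-after idea: a dynamic proxy begins a transaction before an annotated method and commits once it returns. This is not how tapestry-jpa implements @CommitAfter. The names CommitAfterSketch, Tx, CounterDao and CommitAfterProxy are all hypothetical, and rolling back on any exception is an assumption made only for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical marker annotation, standing in for @CommitAfter.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CommitAfterSketch {}

// Hypothetical transaction handle, standing in for a JPA EntityTransaction.
interface Tx {
    void begin();
    void commit();
    void rollback();
}

// Hypothetical DAO: only increment() is marked as transactional.
interface CounterDao {
    @CommitAfterSketch
    void increment();

    int value();
}

final class CommitAfterProxy {

    // Wraps target so that annotated interface methods run inside a transaction.
    static <T> T wrap(Class<T> type, T target, Tx tx) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.isAnnotationPresent(CommitAfterSketch.class)) {
                return call(method, target, args); // no transaction needed
            }
            tx.begin();
            try {
                Object result = call(method, target, args);
                tx.commit();
                return result;
            } catch (Throwable t) {
                tx.rollback(); // assumed policy: roll back on any failure
                throw t;
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    // Invokes the method reflectively and unwraps the target's own exception.
    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}
```

With a wrapper like this, calling increment() on the proxy runs begin, the method body, then commit, while value() goes straight to the target without touching the transaction.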


DAO and Caching



Since a GAE datastore transaction can't span multiple entity groups, I've added the @CommitAfter annotation to every method of each DAO class.

Datastore access is an expensive operation in GAE, so I've implemented DAO-level caching:

DAO interface

public interface JobDAO {

    // ...

    @CommitAfter
    Job find(Key jobKey);

    @CommitAfter
    void update(Job job, boolean commitAfter);
}

DAO implementation

public class JobDAOImpl implements JobDAO {

    // ...

    @Override
    public Job find(Key jobKey) {
        return em.find(Job.class, jobKey);
    }

    @Override
    public void update(Job job, boolean commitAfter) {
        if (!em.getTransaction().isActive()) {
            // see Application#internalUpdateJob(Job)
            logger.debug("Transaction is not active. Begin new one...");

            // XXX Rewrite this to handle transactions more gracefully
            em.getTransaction().begin();
        }
        em.merge(job);

        if (commitAfter) {
            em.getTransaction().commit();
        }
    }
}

DAO cache

public class JobDAOImplCache extends JobDAOImpl {

    // ...

    @Override
    public Job find(Key jobKey) {
        Object entityCacheKey = getEntityCacheKey(Job.class, getJobWideUniqueData(jobKey));
        // Serve from the cache when possible
        Job result = (Job) cache.get(entityCacheKey);
        if (result != null) {
            return result;
        }
        // Cache miss: load from the datastore and remember the result
        result = super.find(jobKey);
        if (result != null) {
            cache.put(entityCacheKey, result);
        }
        return result;
    }

    @Override
    public void update(Job job, boolean commitAfter) {
        super.update(job, commitAfter);
        Object entityCacheKey = getEntityCacheKey(Job.class, getJobWideUniqueData(job.getKey()));

        Job cachedJob = (Job) cache.get(entityCacheKey);

        if (cachedJob != null) {
            // The cron string changed: drop the cached job lists for both the old and the new value
            if (!cachedJob.getCronString().equals(job.getCronString())) {
                abandonJobsByCronStringCache(cachedJob.getCronString());
                abandonJobsByCronStringCache(job.getCronString());
            }

            cache.put(entityCacheKey, job);
        } else {
            // The previous state is unknown, so drop all cached job lists
            abandonJobsByCronStringCache();
        }

        updateJobInScheduleCache(job);
    }
}


Notice how the update method is implemented in JobDAOImplCache. If a DAO method changes an object in the database, it is responsible for updating every cached copy of that object across the entire cache. Such an implementation can be hard to maintain, but it can also be very effective because you have full control over the cache.
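
The same rule can be sketched in isolation with plain Java maps. This is only an illustration: Item, ItemStore and CachedItemStore are hypothetical names, and HashMap stands in for both the datastore and the JSR-107 cache. Unlike JobDAOImplCache, the sketch evicts the affected lists on every update rather than only when the cron string changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an entity such as Job.
final class Item {
    final String key;
    final String cron;

    Item(String key, String cron) {
        this.key = key;
        this.cron = cron;
    }
}

// Hypothetical stand-in for the datastore-backed DAO.
class ItemStore {
    private final Map<String, Item> rows = new HashMap<>();
    int reads = 0; // counts simulated datastore reads

    Item find(String key) {
        reads++;
        return rows.get(key);
    }

    List<Item> findByCron(String cron) {
        reads++;
        List<Item> result = new ArrayList<>();
        for (Item item : rows.values()) {
            if (item.cron.equals(cron)) {
                result.add(item);
            }
        }
        return result;
    }

    void update(Item item) {
        rows.put(item.key, item);
    }
}

// Caching subclass: reads go through the caches, writes keep them consistent.
class CachedItemStore extends ItemStore {
    private final Map<String, Item> byKey = new HashMap<>();
    private final Map<String, List<Item>> byCron = new HashMap<>();

    @Override
    Item find(String key) {
        Item cached = byKey.get(key);
        if (cached != null) {
            return cached;
        }
        Item loaded = super.find(key);
        if (loaded != null) {
            byKey.put(key, loaded);
        }
        return loaded;
    }

    @Override
    List<Item> findByCron(String cron) {
        List<Item> cached = byCron.get(cron);
        if (cached != null) {
            return cached;
        }
        List<Item> loaded = super.findByCron(cron);
        byCron.put(cron, loaded);
        return loaded;
    }

    @Override
    void update(Item item) {
        super.update(item);
        Item old = byKey.get(item.key);
        if (old == null) {
            byCron.clear(); // previous state unknown: drop every cached list
        } else {
            byCron.remove(old.cron);  // list that held the old version
            byCron.remove(item.cron); // list that must now include the new version
        }
        byKey.put(item.key, item);
    }
}
```

The write path is where the maintenance cost shows up: every derived cache (here, the per-cron lists) needs its own eviction rule inside update.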

Each *DAOImplCache class uses a two-level cache based on JSR-107:

  • Level-1: Local memory (appserver instance, request scoped)

    provides quick access to objects that were "touched" during current request


  • Level-2: Memcache (cluster wide)

    allows application instances to share cached objects across entire appengine cluster



Note that the local memory cache must be request scoped; otherwise an appserver instance could keep serving stale copies of objects that other instances have since updated. To reset the local cache after each request, register it as a ThreadCleanupListener:

    public static Cache buildCache(Logger logger, PerthreadManager perthreadManager) {
        try {
            CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
            Cache cache = cacheFactory.createCache(Collections.emptyMap());

            LocalMemorySoftCache cache2 = new LocalMemorySoftCache(cache);

            // perthreadManager may be null if we are creating the cache from AbstractFilter
            if (perthreadManager != null) {
                perthreadManager.addThreadCleanupListener(cache2);
            }

            return cache2;
        } catch (CacheException e) {
            logger.error("Error instantiating cache", e);
            return null;
        }
    }


Here's what the LocalMemorySoftCache implementation looks like:

public class LocalMemorySoftCache implements Cache, ThreadCleanupListener {

    private final Cache cache;

    private final Map<Object, Object> map;

    @SuppressWarnings("unchecked")
    public LocalMemorySoftCache(Cache cache) {
        this.map = new SoftValueMap(100);
        this.cache = cache;
    }

    @Override
    public void clear() {
        map.clear();
        cache.clear();
    }

    @Override
    public boolean containsKey(Object key) {
        return map.containsKey(key)
                || cache.containsKey(key);
    }

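    @Override
    public Object get(Object key) {
        Object value = map.get(key);
        if (value == null) {
            value = cache.get(key);
            // Only remember real hits; a miss should not leave a null entry in the local map
            if (value != null) {
                map.put(key, value);
            }
        }
        return value;
    }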

    @Override
    public Object put(Object key, Object value) {
        map.put(key, value);
        return cache.put(key, value);
    }

    @Override
    public Object remove(Object key) {
        map.remove(key);
        return cache.remove(key);
    }

    // ...

    /**
     * Reset in-memory cache but leave original cache untouched.
     */
    public void reset() {
        map.clear();
    }

    @Override
    public void threadDidCleanup() {
        reset();
    }
}


Make Tapestry-JPA Lazy



On every request, Tapestry-JPA creates a new EntityManager and starts a new transaction on it. At the end of the request, if the current transaction is still active, it gets rolled back.

But if all the data came from the cache, there is no interaction with the database at all. In that case creating the EntityManager and beginning and rolling back the transaction were unnecessary, yet they still consumed time and other resources.

Moreover, Tapestry-JPA creates the EntityManagerFactory instance on application load, which is very expensive, even though you might not need it (because of the DAO cache, or simply because the request doesn't use the datastore at all).

To avoid this I created lazy implementations of JPAEntityManagerSource, JPATransactionManager and EntityManager. You can find them here: LazyJPAEntityManagerSource and LazyJPATransactionManager.
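
The linked classes are not reproduced here. What follows is only a minimal plain-Java sketch of the general idea: defer creating the expensive object until something actually asks for it, and skip the end-of-request rollback when nothing did. Lazy, Session and LazyTransaction are hypothetical names, and Session is a stand-in for an EntityManager with a transaction flag, not a real JPA type.

```java
import java.util.function.Supplier;

// Creates its value on first access. Not thread-safe; per-request use is assumed.
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private T value;
    private boolean created;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        if (!created) {
            value = factory.get();
            created = true;
        }
        return value;
    }

    boolean isCreated() {
        return created;
    }
}

// Hypothetical stand-in for an expensive resource such as an EntityManager.
final class Session {
    boolean inTransaction;

    void begin() {
        inTransaction = true;
    }

    void rollback() {
        inTransaction = false;
    }
}

// Opens the session and begins a transaction only when the session is first requested.
final class LazyTransaction {
    private final Lazy<Session> session;

    LazyTransaction(Supplier<Session> factory) {
        this.session = new Lazy<>(factory);
    }

    Session session() {
        Session s = session.get();
        if (!s.inTransaction) {
            s.begin();
        }
        return s;
    }

    // Called at the end of a request: does nothing if the session was never used.
    void endOfRequest() {
        if (session.isCreated() && session.get().inTransaction) {
            session.get().rollback();
        }
    }
}
```

A request that is answered entirely from the cache never calls session(), so the factory never runs and endOfRequest() returns immediately.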

2 comments:

  1. Have you thought about contributing this back to the tapestry-jpa library?

  2. Sure, why not.
    What is the best way to do this?
