GAE provides two ways of communicating with its datastore from Java:
- the low-level API
- JDO/JPA (via the DataNucleus App Engine edition)
In this post I will explain some performance improvements for JPA usage. Of course, there is always some overhead in using a high-level API, but I use JPA in Ping Service and think it is worth it.
Update (17.09.2010): There is another way to communicate with the GAE datastore from Java: Objectify.
Spring vs. Tapestry-JPA
It's good practice to use JPA in conjunction with an IoC container to inject the EntityManager into your services. At the very beginning of development I used Spring 3.0 for IoC and transaction management. It worked, but it took too much time to initialize during loading requests, and every time a user opened their first web page, it ended with a DeadlineExceededException. Then I tried tapestry-jpa from Tynamo, and it fits perfectly. It runs pretty fast and allows you to:
- inject the EntityManager into DAO classes (as regular T5 services)
- manage transactions using the @CommitAfter annotation
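To illustrate the mechanics behind @CommitAfter-style advice, here is a minimal, self-contained sketch of the idea: a dynamic proxy that commits the active transaction after any method carrying the annotation. This is not Tynamo's actual implementation; the Tx, RecordingTx, and CommitAfterAdvice names are invented for this example.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Tapestry's @CommitAfter annotation.
@Retention(RetentionPolicy.RUNTIME)
@interface CommitAfter {}

// Minimal transaction abstraction (stand-in for EntityTransaction).
interface Tx {
    void begin();
    void commit();
    boolean isActive();
}

// Records transaction events so behavior can be observed.
class RecordingTx implements Tx {
    final List<String> log = new ArrayList<>();
    private boolean active;
    public void begin()  { active = true;  log.add("begin"); }
    public void commit() { active = false; log.add("commit"); }
    public boolean isActive() { return active; }
}

// Example DAO interface mirroring the article's JobDAO.
interface JobDAO {
    @CommitAfter
    void update(String job);
    String find(String key);
}

class CommitAfterAdvice {
    // Wraps a service so that @CommitAfter methods commit the active transaction.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, Tx tx) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[]{iface},
            (proxy, method, args) -> {
                Object result = method.invoke(target, args);
                // Commit only after annotated methods, and only if a
                // transaction is actually active.
                if (method.isAnnotationPresent(CommitAfter.class) && tx.isActive()) {
                    tx.commit();
                }
                return result;
            });
    }
}
```

The real library does the same kind of interception through Tapestry's service advice mechanism rather than a raw JDK proxy.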
DAO and Caching
Since the GAE datastore can't operate on multiple entity groups in a single transaction, I've added the @CommitAfter annotation to every method of each DAO class.
Datastore access is an expensive operation in GAE, so I've implemented DAO-level caching:
DAO interface
public interface JobDAO {
// ...
@CommitAfter
public abstract Job find(Key jobKey);
@CommitAfter
public abstract void update(Job job, boolean commitAfter);
}
DAO implementation
public class JobDAOImpl implements JobDAO {
// ...
@Override
public Job find(Key jobKey) {
return em.find(Job.class, jobKey);
}
public void update(Job job, boolean commitAfter) {
if (!em.getTransaction().isActive()){
// see Application#internalUpdateJob(Job)
logger.debug("Transaction is not active. Begin new one...");
// XXX Rewrite this to handle transactions more gracefully
em.getTransaction().begin();
}
em.merge(job);
if (commitAfter) {
em.getTransaction().commit();
}
}
}
DAO cache
public class JobDAOImplCache extends JobDAOImpl {
// ...
@Override
public Job find(Key jobKey) {
Object entityCacheKey = getEntityCacheKey(Job.class, getJobWideUniqueData(jobKey));
Job result = (Job) cache.get(entityCacheKey);
if (result != null) {
return result;
}
result = super.find(jobKey);
if (result != null) {
cache.put(entityCacheKey, result);
}
return result;
}
@Override
public void update(Job job, boolean commitAfter) {
super.update(job, commitAfter);
Object entityCacheKey = getEntityCacheKey(Job.class, getJobWideUniqueData(job.getKey()));
Job cachedJob = (Job)cache.get(entityCacheKey);
if (cachedJob != null) {
if (!cachedJob.getCronString().equals(job.getCronString())) {
abandonJobsByCronStringCache(cachedJob.getCronString());
abandonJobsByCronStringCache(job.getCronString());
}
cache.put(entityCacheKey, job);
} else {
abandonJobsByCronStringCache();
}
updateJobInScheduleCache(job);
}
}
Notice how the update method is implemented in JobDAOImplCache. If a DAO method changes an object in the database, it is responsible for updating all cached instances of that object in the entire cache. Such an implementation may be hard to maintain; on the other hand, it can be very effective, because you have full control over the cache.
Each *DAOImplCache class uses a two-level JSR-107 based cache:
- Level 1: local memory (appserver instance, request scoped): provides quick access to objects that were "touched" during the current request
- Level 2: Memcache (cluster wide): allows application instances to share cached objects across the entire appengine cluster
Note that the local memory cache must be request scoped, otherwise it may lead to stale data across appserver instances. To reset the local cache after each request, register it as a ThreadCleanupListener:
public static Cache buildCache(Logger logger, PerthreadManager perthreadManager) {
try {
CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
Cache cache = cacheFactory.createCache(Collections.emptyMap());
LocalMemorySoftCache cache2 = new LocalMemorySoftCache(cache);
// perthreadManager may be null if we are creating the cache from AbstractFilter
if (perthreadManager != null) {
perthreadManager.addThreadCleanupListener(cache2);
}
return cache2;
} catch (CacheException e) {
logger.error("Error instantiating cache", e);
return null;
}
}
Here is what the LocalMemorySoftCache implementation looks like:
public class LocalMemorySoftCache implements Cache, ThreadCleanupListener {
private final Cache cache;
private final Map<Object, Object> map;
@SuppressWarnings("unchecked")
public LocalMemorySoftCache(Cache cache) {
this.map = new SoftValueMap(100);
this.cache = cache;
}
@Override
public void clear() {
map.clear();
cache.clear();
}
@Override
public boolean containsKey(Object key) {
return map.containsKey(key)
|| cache.containsKey(key);
}
@Override
public Object get(Object key) {
    Object value = map.get(key);
    if (value == null) {
        value = cache.get(key);
        if (value != null) {
            // Remember only non-null values locally; don't cache misses
            map.put(key, value);
        }
    }
    return value;
}
@Override
public Object put(Object key, Object value) {
map.put(key, value);
return cache.put(key, value);
}
@Override
public Object remove(Object key) {
map.remove(key);
return cache.remove(key);
}
// ...
/**
* Reset in-memory cache but leave original cache untouched.
*/
public void reset() {
map.clear();
}
@Override
public void threadDidCleanup() {
reset();
}
}
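The two-level read path above (a request-local L1 map in front of a shared L2 cache) can be boiled down to the following self-contained sketch. It uses plain HashMaps as stand-ins for SoftValueMap and Memcache, and TwoLevelCache is an illustrative name, not a class from the actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the LocalMemorySoftCache read path: an L1
// per-request map in front of a shared L2 cache. Plain Maps are used
// here; the real code uses SoftValueMap and a JSR-107 Cache.
class TwoLevelCache {
    private final Map<Object, Object> l1 = new HashMap<>(); // request scoped
    private final Map<Object, Object> l2;                   // shared (Memcache stand-in)
    private int l2Lookups; // counts how often we fell through to L2

    TwoLevelCache(Map<Object, Object> l2) { this.l2 = l2; }

    Object get(Object key) {
        Object value = l1.get(key);
        if (value == null) {
            value = l2.get(key);
            l2Lookups++;
            if (value != null) {
                l1.put(key, value); // promote to L1 for the rest of the request
            }
        }
        return value;
    }

    void put(Object key, Object value) {
        l1.put(key, value);
        l2.put(key, value);
    }

    // Equivalent of threadDidCleanup(): drop L1, keep L2 intact.
    void reset() { l1.clear(); }

    int l2Lookups() { return l2Lookups; }
}
```

Repeated reads of the same key within one request hit only the local map; after the end-of-request reset, the next read falls through to the shared cache again.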
Make Tapestry-JPA Lazy
On every request, Tapestry-JPA creates a new EntityManager and starts a new transaction on it. At the end of the request, if the current transaction is still active, it gets rolled back.
But if all data was served from the cache, there is no interaction with the database at all. In that case the EntityManager creation and the transaction begin/rollback were not required, yet they still consumed time and other resources. Moreover, Tapestry-JPA creates the EntityManagerFactory instance on application load, which is very expensive, even though you might not need it (because of the DAO cache, or simply because the request doesn't use the datastore at all). To avoid this I created lazy implementations of JPAEntityManagerSource, JPATransactionManager and EntityManager; you can find them here: LazyJPAEntityManagerSource and LazyJPATransactionManager.
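The core idea behind those lazy wrappers can be sketched as a generic lazy holder that defers an expensive factory call (such as building the EntityManagerFactory) until first use. This is an illustration of the pattern, not the actual LazyJPAEntityManagerSource code:

```java
import java.util.function.Supplier;

// Generic lazy holder: the factory runs only on the first get(),
// not when the holder is constructed at application load time.
class Lazy<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    Lazy(Supplier<T> factory) { this.factory = factory; }

    boolean isCreated() { return instance != null; }

    T get() {
        // Double-checked locking: create the instance on first access only.
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    instance = result = factory.get();
                }
            }
        }
        return result;
    }
}
```

A lazy EntityManagerSource then wraps its expensive EntityManagerFactory creation in such a holder, so requests that never touch the datastore never pay the initialization cost.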
Comments
"Have you thought about contributing this back to the tapestry-jpa library?"
"Sure, why not. What is the best way to do this?"