GAE provides two ways of communicating with its datastore from Java:
- the low-level API
- JDO/JPA (via the DataNucleus App Engine edition)
In this post I will explain some performance improvements for JPA usage. Of course, there is always some overhead in using a high-level API, but I use JPA in Ping Service and think it is worth it.
Update (17.09.2010): There is another way to communicate with the GAE datastore from Java: Objectify.
Spring vs. Tapestry-JPA
It's good practice to use JPA in conjunction with an IoC container to inject the EntityManager into your services. At the very beginning of development I used Spring 3.0 for IoC and transaction management. It worked, but it took too much time to initialize during loading requests, and every time a user opened their first web page, it ended with a DeadlineExceededException. Then I tried tapestry-jpa from Tynamo, and it fits perfectly. It runs pretty fast and allows you to:
- inject the EntityManager into DAO classes (as regular T5 services)
- manage transactions using the @CommitAfter annotation
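To illustrate the mechanics behind @CommitAfter-style advice, here is a minimal, self-contained sketch of the idea: a dynamic proxy that commits the active transaction after any method carrying the annotation. This is not Tynamo's actual implementation; the Tx, RecordingTx, and CommitAfterAdvice names are invented for this example.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Tapestry's @CommitAfter annotation.
@Retention(RetentionPolicy.RUNTIME)
@interface CommitAfter {}

// Minimal transaction abstraction (stand-in for EntityTransaction).
interface Tx {
    void begin();
    void commit();
    boolean isActive();
}

// Records transaction events so behavior can be observed.
class RecordingTx implements Tx {
    final List<String> log = new ArrayList<>();
    private boolean active;
    public void begin()  { active = true;  log.add("begin"); }
    public void commit() { active = false; log.add("commit"); }
    public boolean isActive() { return active; }
}

// Example DAO interface mirroring the article's JobDAO.
interface JobDAO {
    @CommitAfter
    void update(String job);
    String find(String key);
}

class CommitAfterAdvice {
    // Wraps a service so that @CommitAfter methods commit the active transaction.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, Tx tx) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[]{iface},
            (proxy, method, args) -> {
                Object result = method.invoke(target, args);
                // Commit only after annotated methods, and only if a
                // transaction is actually active.
                if (method.isAnnotationPresent(CommitAfter.class) && tx.isActive()) {
                    tx.commit();
                }
                return result;
            });
    }
}
```

The real library does the same kind of interception through Tapestry's service advice mechanism rather than a raw JDK proxy.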
DAO and Caching
Since the GAE datastore can't operate on multiple entity groups in a single transaction, I've added the @CommitAfter annotation to every method of each DAO class.
Datastore access is an expensive operation in GAE, so I've implemented DAO-level caching:
DAO interface
public interface JobDAO {
// ...
@CommitAfter
public abstract Job find(Key jobKey);
@CommitAfter
public abstract void update(Job job, boolean commitAfter);
}
DAO implementation
public class JobDAOImpl implements JobDAO {
// ...
@Override
public Job find(Key jobKey) {
return em.find(Job.class, jobKey);
}
public void update(Job job, boolean commitAfter) {
if (!em.getTransaction().isActive()){
// see Application#internalUpdateJob(Job)
logger.debug("Transaction is not active. Begin new one...");
// XXX Rewrite this to handle transactions more gracefully
em.getTransaction().begin();
}
em.merge(job);
if (commitAfter) {
em.getTransaction().commit();
}
}
}
DAO cache
public class JobDAOImplCache extends JobDAOImpl {
// ...
@Override
public Job find(Key jobKey) {
Object entityCacheKey = getEntityCacheKey(Job.class, getJobWideUniqueData(jobKey));
Job result = (Job) cache.get(entityCacheKey);
if (result != null) {
return result;
}
result = super.find(jobKey);
if (result != null) {
cache.put(entityCacheKey, result);
}
return result;
}
@Override
public void update(Job job, boolean commitAfter) {
super.update(job, commitAfter);
Object entityCacheKey = getEntityCacheKey(Job.class, getJobWideUniqueData(job.getKey()));
Job cachedJob = (Job)cache.get(entityCacheKey);
if (cachedJob != null) {
if (!cachedJob.getCronString().equals(job.getCronString())) {
abandonJobsByCronStringCache(cachedJob.getCronString());
abandonJobsByCronStringCache(job.getCronString());
}
cache.put(entityCacheKey, job);
} else {
abandonJobsByCronStringCache();
}
updateJobInScheduleCache(job);
}
}
Notice how the update method is implemented in JobDAOImplCache. If a DAO method changes an object in the database, it is responsible for updating all cached instances of that object in the entire cache. Such an implementation may be hard to maintain; on the other hand, it can be very effective, because you have full control over the cache.
Each *DAOImplCache class uses a two-level JSR-107 based cache:
- Level 1: local memory (appserver instance, request scoped): provides quick access to objects that were "touched" during the current request
- Level 2: Memcache (cluster wide): allows application instances to share cached objects across the entire appengine cluster
Note that the local memory cache must be request scoped, otherwise it may lead to stale data across appserver instances. To reset the local cache after each request, register it as a ThreadCleanupListener:
public static Cache buildCache(Logger logger, PerthreadManager perthreadManager) {
try {
CacheFactory cacheFactory = CacheManager.getInstance().getCacheFactory();
Cache cache = cacheFactory.createCache(Collections.emptyMap());
LocalMemorySoftCache cache2 = new LocalMemorySoftCache(cache);
// perthreadManager may be null if we are creating the cache from AbstractFilter
if (perthreadManager != null) {
perthreadManager.addThreadCleanupListener(cache2);
}
return cache2;
} catch (CacheException e) {
logger.error("Error instantiating cache", e);
return null;
}
}
Here is what the LocalMemorySoftCache implementation looks like:
public class LocalMemorySoftCache implements Cache, ThreadCleanupListener {
private final Cache cache;
private final Map<Object, Object> map;
@SuppressWarnings("unchecked")
public LocalMemorySoftCache(Cache cache) {
this.map = new SoftValueMap(100);
this.cache = cache;
}
@Override
public void clear() {
map.clear();
cache.clear();
}
@Override
public boolean containsKey(Object key) {
return map.containsKey(key)
|| cache.containsKey(key);
}
@Override
public Object get(Object key) {
    Object value = map.get(key);
    if (value == null) {
        value = cache.get(key);
        if (value != null) {
            // Remember only non-null values locally; don't cache misses
            map.put(key, value);
        }
    }
    return value;
}
@Override
public Object put(Object key, Object value) {
map.put(key, value);
return cache.put(key, value);
}
@Override
public Object remove(Object key) {
map.remove(key);
return cache.remove(key);
}
// ...
/**
* Reset in-memory cache but leave original cache untouched.
*/
public void reset() {
map.clear();
}
@Override
public void threadDidCleanup() {
reset();
}
}
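The two-level read path above (a request-local L1 map in front of a shared L2 cache) can be boiled down to the following self-contained sketch. It uses plain HashMaps as stand-ins for SoftValueMap and Memcache, and TwoLevelCache is an illustrative name, not a class from the actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the LocalMemorySoftCache read path: an L1
// per-request map in front of a shared L2 cache. Plain Maps are used
// here; the real code uses SoftValueMap and a JSR-107 Cache.
class TwoLevelCache {
    private final Map<Object, Object> l1 = new HashMap<>(); // request scoped
    private final Map<Object, Object> l2;                   // shared (Memcache stand-in)
    private int l2Lookups; // counts how often we fell through to L2

    TwoLevelCache(Map<Object, Object> l2) { this.l2 = l2; }

    Object get(Object key) {
        Object value = l1.get(key);
        if (value == null) {
            value = l2.get(key);
            l2Lookups++;
            if (value != null) {
                l1.put(key, value); // promote to L1 for the rest of the request
            }
        }
        return value;
    }

    void put(Object key, Object value) {
        l1.put(key, value);
        l2.put(key, value);
    }

    // Equivalent of threadDidCleanup(): drop L1, keep L2 intact.
    void reset() { l1.clear(); }

    int l2Lookups() { return l2Lookups; }
}
```

Repeated reads of the same key within one request hit only the local map; after the end-of-request reset, the next read falls through to the shared cache again.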
Make Tapestry-JPA Lazy
On every request, Tapestry-JPA creates a new EntityManager and starts a new transaction on it. At the end of the request, if the current transaction is still active, it gets rolled back.
But if all data was served from the cache, there is no interaction with the database at all. In that case the EntityManager creation and the transaction begin/rollback were not required, yet they still consumed time and other resources. Moreover, Tapestry-JPA creates the EntityManagerFactory instance on application load, which is very expensive, even though you might not need it (because of the DAO cache, or simply because the request doesn't use the datastore at all). To avoid this I created lazy implementations of JPAEntityManagerSource, JPATransactionManager and EntityManager; you can find them here: LazyJPAEntityManagerSource and LazyJPATransactionManager.
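The core idea behind those lazy wrappers can be sketched as a generic lazy holder that defers an expensive factory call (such as building the EntityManagerFactory) until first use. This is an illustration of the pattern, not the actual LazyJPAEntityManagerSource code:

```java
import java.util.function.Supplier;

// Generic lazy holder: the factory runs only on the first get(),
// not when the holder is constructed at application load time.
class Lazy<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    Lazy(Supplier<T> factory) { this.factory = factory; }

    boolean isCreated() { return instance != null; }

    T get() {
        // Double-checked locking: create the instance on first access only.
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    instance = result = factory.get();
                }
            }
        }
        return result;
    }
}
```

A lazy EntityManagerSource then wraps its expensive EntityManagerFactory creation in such a holder, so requests that never touch the datastore never pay the initialization cost.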
Comments
"Have you thought about contributing this back to the tapestry-jpa library?"
"Sure, why not. What is the best way to do this?"