Wednesday 5 November 2008

To shard or not to shard

I remember back in the day it was simple, HTML --> Server Side Scripting --> DB. Thats it. It was easy.

It seems that as with the gaming industry driving the development of hardware to handle the increased requirements of the gaming software, data is driving the development of frameworks and designs to solve one issue;

"How do we create reliable architecture to scale with increased storage and retrieval requirements". That to me is the bottom line. To further segment those requirements with proposed solutions

Storage
  1. More Disk Space
  2. More Servers
  3. Sharding
  4. Archiving
Retrieval
  1. In Memory caches
  2. Indexing(DB)
  3. Indexing(Application)
  4. Sharding
There are more but these seem to be the most prevelant. The performance and scalablity gains are paid for in complexity, which in turn excacerbates maintenance issues, but I suppose that is the price of progress.
In our teams latest prototype, we have created a prototype which works extremely well using a java suite of framworkes.
Briefly if uses Spring MCV and IOC,Hibernate Core,Hibernate Search,Memcached and MYSQL.
We have a DAO design pattern with Hibernate implementations, A single Hibernate Search(Lucene Index) as well as a Memcached object cache with an indexed mysql DB being used fro persistent storage.
After hammering the web application using JMETER, I have come to the conclusion that it performs very well.
Now enter Hibernate Sharding and the wheels are coming off!

My first blow came from my extensive use of DetachedCriteria.

My design all stems from a base object with a shardId in all object hierarchies, and a constraint ensuring all objects descended from that base object will be in the same shard. My ShardSelectionStrategy takes care of that based on entity type
So I subclassed DetachedCriteria and using a factory based on entity type I can retrieve the shard I need, which is restrictive but works for me.

I am now busy with the FullTextSession and I am running into big trouble.

Basically EventSource isnt supported by shards and a ShardedSessionImplementor and a SessionImplementor are quite different. I can created shard aware classes for all the main players.
  1. FullTextSession
  2. FullTextQuery
  3. AbstractQuery
In a nutshell, I have got to point, as expected where it returns the correct hits from the index, but since I needed to reference a single shard(The master shard) in order to do this, it will only return the objects on that shard. So my result size may be 3 but my list of objects will only contain the objects on that shard.
Now without rewriting the whole implementation, and going through that pain, how do I solve the issue of implementing a distributed fetch?

3 comments:

Anonymous said...

Well I got it to work. I am not sure if its the best solution, but it appears to work. I havent tested all the features but in a nutshell I can get all the hits from my index and then hydrate using the sharded session factory. It was actually a bit easier than I thought. What I did was.

1) I created a subclass of FullTextSession and added a property so that I could pass in the referecne to the ShardedSession independent of the constructor. I then created a method createShardedFullTextQuery
which returns an instance of a new class called ShardedFullTextQueryImpl which extends AbstractQueryImpl and implements FullTextQuery just like the FullTextQueryImpl.

2) The new ShardedFullTextQueryImpl is nearly identical to the FullTextQueryImpl save for the addition of a new Session property which takes the sharded session passed in in the constructor. Then in the method list() I passed the sharded session into the loader and the framework took care of the rest.

I am not sure if this is going to work but its start and if there are 10 hits in the index stored across all the shards it will return 10 objects.

So when you invoke the method createFullTextSession you pass your master shard(usually the first shard in your array) in as the session paramater and then pass in the reference to the full session as a property.

This is not the most elegant way but any other way will certainly be a lot more coding.

Unknown said...

It is very interesting to hear it is working;
Did your solution manage to avoid creating a mutual dependency from Shards to Search and back?
you should drop a word about it on hibernate's dev-list.

Anonymous said...

Hi Sanne,

There is a dependency.

If you refer to the original FullTextQueryImpl.list() method (Line 237 in the latest version)

It uses the SessionImplementor passed in as a constructor parameter. My subclass passes in the sharded session as a property and uses this instead.

So this line
Session sess = (Session) this.session, points to the sharded session

Session sess = (Session) this.shardedSession.

This gets passed into the Loader and the framework takes care of it from there.

To answer you question there is a dependency but by doing this

Session sess =
(this.shardedSession == null) ?
(Session) this.session : (Session)this.shardedSession;

You should be able to use a sharded or normal session.

This is still very much work in progress, I am still not sure if it will work accross the whole application, and I am having issues with other areas of the Hibernate framework using a sharded session. Once I have fully articulated the issues I will post them here.

This issue is also on the hibernate forum.
http://forum.hibernate.org/viewtopic.php?t=992085