Friday, September 6, 2013

Neo4j and beauty of role-based entity referencing

Polyglot persistence is all the rage now, and graph databases are among the more exotic types of databases around, so we decided to give one a shot for a part of a larger system. We picked Neo4j. Even if we were not a Java shop, we would probably have stumbled on it anyway, since it looks like the most popular graph database right now.

After working with it for some time, I noticed a thing that I really like - the object-graph mismatch is much smaller than the object-relational one. Although there are numerous places where the object-relational mismatch shows its face, one of the things that bothered me the most is that I always had to take good care of what type/role I would be referencing an object by.

In Java land, even with a meta-model as poor as it has (compared to some more exotic languages out there), we can reference an entity from another one in many ways - using a class, a subclass or an interface. And we all know that one of the great principles of good OO design is to reference objects by their role, which can be expressed with any of the mentioned language constructs. An interface, if sufficient, is usually the preferred way to express an object's role.

Here's an example ...

Let's say we have a User class that has a reference to its owner entity, described by the UserOwner interface. This UserOwner interface is the role that the owner entity plays in that case.

 public class User {
   private String name;
   private UserOwner owner;  // referenced by role, not by concrete class
   // ...
 }

And let's say this UserOwner role can be played by several different entities - a company and a department. Let's even say that users themselves can be the owners of other users. If we were to express this in Java, we would implement the UserOwner interface in several classes:

 public class User implements UserOwner { /* ... */ }
 public class Company implements UserOwner { /* ... */ }
 public class Department implements UserOwner { /* ... */ }
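
To make the idea concrete, here is a minimal, self-contained sketch of the three classes sharing the role. The constructors, the ownerName() method, the getOwner() accessor and the wrapping RoleReferenceSketch class are assumptions made up for illustration; the fragments above do not say what the UserOwner interface actually declares.

```java
// Illustrative sketch only: method and field names are assumptions, not the original code.
class RoleReferenceSketch {

    /** The role an entity plays when it owns a user. */
    interface UserOwner {
        String ownerName(); // hypothetical role method
    }

    static class Company implements UserOwner {
        private final String name;
        Company(String name) { this.name = name; }
        @Override public String ownerName() { return name; }
    }

    static class Department implements UserOwner {
        private final String name;
        Department(String name) { this.name = name; }
        @Override public String ownerName() { return name; }
    }

    /** A user references its owner by role, and can itself play that role. */
    static class User implements UserOwner {
        private final String name;
        private final UserOwner owner; // may be a Company, a Department or another User

        User(String name, UserOwner owner) {
            this.name = name;
            this.owner = owner;
        }

        UserOwner getOwner() { return owner; }
        @Override public String ownerName() { return name; }
    }

    public static void main(String[] args) {
        User alice = new User("alice", new Company("Acme"));
        User bob = new User("bob", new Department("Sales"));
        User carol = new User("carol", alice); // a user owned by another user

        // The calling code never needs to know the concrete owner type.
        for (User u : new User[] {alice, bob, carol}) {
            System.out.println(u.ownerName() + " belongs to " + u.getOwner().ownerName());
        }
    }
}
```

The point of the sketch is that User depends only on the UserOwner role, so adding another kind of owner later would not require touching the User class.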

So how would we map this case to the SQL world? We would have a USERS table, but also COMPANIES and DEPARTMENTS tables.

And to express the reference from a user to its owner, we would need a foreign key that points from the USERS table to... to... to what? There is no concept in the SQL world that would "mark" the USERS, DEPARTMENTS and COMPANIES tables as belonging to some "USER_OWNERS" type so that we could define the foreign key against that target. The problem is that the SQL meta-model is still much poorer than the OO meta-model: in its commonly used form it has no concept of supertables (for hierarchies of tables), or any other construct that would mark records as being of multiple types.

In Neo4j it is straightforward - unlike a SQL database, it is schema-less, so we don't burden ourselves with types. We just have a special relationship type (let's name it "BELONGS_TO") that corresponds to the association between a User and its UserOwner.

Different users reference their owners via the BELONGS_TO relationship, regardless of whether the owning entity is a company, a department or another user. Now you can write simple Cypher queries such as this one, which fetches the owner of some entity without any fuss:

 START user=node(<someUserId>) MATCH user-[:BELONGS_TO]->owner RETURN owner;  

In the application layer, we would cast the result of that query to a UserOwner object and do with it whatever that role allows us (via the methods on that interface).
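
As a rough sketch of that last step, the code below turns the properties of a returned owner node into a UserOwner. It deliberately avoids any real Neo4j API: a plain Map stands in for the node's properties, and the "type" and "name" property keys, the displayName() method and the toOwner() helper are all hypothetical names chosen for the example. How the mapping is really done depends on the driver or mapping library in use.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a Map stands in for the properties of the owner node
// returned by the query; the "type" and "name" keys are assumed, not prescribed by Neo4j.
class OwnerMappingSketch {

    interface UserOwner {
        String displayName(); // hypothetical role method
    }

    static final class Company implements UserOwner {
        private final String name;
        Company(String name) { this.name = name; }
        @Override public String displayName() { return "Company " + name; }
    }

    static final class Department implements UserOwner {
        private final String name;
        Department(String name) { this.name = name; }
        @Override public String displayName() { return "Department " + name; }
    }

    static final class User implements UserOwner {
        private final String name;
        User(String name) { this.name = name; }
        @Override public String displayName() { return "User " + name; }
    }

    /** Builds the matching UserOwner implementation from the node's properties. */
    static UserOwner toOwner(Map<String, Object> nodeProperties) {
        String type = (String) nodeProperties.get("type");
        String name = (String) nodeProperties.get("name");
        if (type == null) {
            throw new IllegalArgumentException("Owner node has no type property");
        }
        switch (type) {
            case "Company":    return new Company(name);
            case "Department": return new Department(name);
            case "User":       return new User(name);
            default:
                throw new IllegalArgumentException("Unknown owner type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> ownerNode = new HashMap<>();
        ownerNode.put("type", "Company");
        ownerNode.put("name", "Acme");

        // From here on the caller works only with the role.
        UserOwner owner = toOwner(ownerNode);
        System.out.println(owner.displayName());
    }
}
```

Only toOwner() knows about concrete classes; everything that consumes its result sees just the UserOwner role, which mirrors how the BELONGS_TO relationship ignores the type of the node it points to.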