So what seemed to work better in this case is simply to describe it using the all-too-familiar RDBMS/SQL model and its elements: tables, columns, records, foreign keys... In other words, let's try to describe the Neo4j graph model as it would look if it were built on top of the SQL model.
Actually, this is quite easy to do. We need just two tables; let's call them "NODES" and "RELATIONSHIPS". They reflect the two main elements of the Neo4j model: graph nodes and the relationships between them.
"NODES" table
This is where entities are stored. It contains two columns: "ID" and "PROPERTIES".
ID | PROPERTIES |
334 | {"name": "John Doe", "age": 31, "salary": 80000} |
335 | {"name": "ACME Inc.", "address": "Broadway 345, New York City, NY"} |
336 | {"manufacturer": "Toyota", "model": "Corolla", "year": 2005} |
337 | {"name": "Annie Doe", "age": 30, "salary": 82000} |
The PROPERTIES column stores a map-like data structure containing arbitrary properties and their values; purely for presentation, I picked JSON serialization here. Because of this schema-less design, there are no constraints on which properties the PROPERTIES column contains. That is actually the only practical way to do it, since all entity types (department, company, employee, vehicle...) are stored in this single table.
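As a rough illustration, the NODES table can be sketched with Python's built-in sqlite3 module, used here purely as a stand-in SQL database. The assumptions are mine: PROPERTIES is stored as JSON text and decoded in application code, and `load_node` is a hypothetical helper, not part of any Neo4j API.

```python
import json
import sqlite3

# In-memory database standing in for the imagined SQL "backend" of the graph.
conn = sqlite3.connect(":memory:")

# One table for all entities; PROPERTIES is free-form JSON text, so the
# database enforces nothing about which keys a node has.
conn.execute("CREATE TABLE NODES (ID INTEGER PRIMARY KEY, PROPERTIES TEXT NOT NULL)")

nodes = [
    (334, {"name": "John Doe", "age": 31, "salary": 80000}),
    (335, {"name": "ACME Inc.", "address": "Broadway 345, New York City, NY"}),
    (336, {"manufacturer": "Toyota", "model": "Corolla", "year": 2005}),
    (337, {"name": "Annie Doe", "age": 30, "salary": 82000}),
]
conn.executemany(
    "INSERT INTO NODES (ID, PROPERTIES) VALUES (?, ?)",
    [(node_id, json.dumps(props)) for node_id, props in nodes],
)

def load_node(node_id):
    """Fetch a node and deserialize its PROPERTIES map (None if it does not exist)."""
    row = conn.execute("SELECT PROPERTIES FROM NODES WHERE ID = ?", (node_id,)).fetchone()
    return json.loads(row[0]) if row else None

# A person and a car live side by side in the same table, with different keys.
print(load_node(334)["name"])   # prints: John Doe
print(load_node(336)["model"])  # prints: Corolla
```

Nothing stops a caller from storing a node with entirely different keys; that freedom is exactly what the single PROPERTIES column buys.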
"RELATIONSHIPS" table
This table contains the "ID", "NAME", "SOURCE_NODE_ID", "TARGET_NODE_ID" and "PROPERTIES" columns, and its purpose is to store associations between nodes. Records stored here can be seen as a schema-less version of SQL foreign keys.
ID | NAME | SOURCE_NODE_ID | TARGET_NODE_ID | PROPERTIES |
191 | MARRIED_TO | 334 | 337 | {"wedding_date": "20070213"} |
192 | OWNS | 337 | 336 | |
193 | WORKS_FOR | 337 | 335 | {"job-position": "IT manager"} |
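Putting both tables together, a minimal sketch (again assuming SQLite as the stand-in database) declares SOURCE_NODE_ID and TARGET_NODE_ID as ordinary foreign keys into NODES and resolves a relationship into its two endpoints with two joins. The `describe` helper is hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE NODES (ID INTEGER PRIMARY KEY, PROPERTIES TEXT NOT NULL);
CREATE TABLE RELATIONSHIPS (
    ID             INTEGER PRIMARY KEY,
    NAME           TEXT NOT NULL,                        -- the relationship "type"
    SOURCE_NODE_ID INTEGER NOT NULL REFERENCES NODES(ID),
    TARGET_NODE_ID INTEGER NOT NULL REFERENCES NODES(ID),
    PROPERTIES     TEXT                                  -- optional JSON map
);
""")
conn.executemany("INSERT INTO NODES VALUES (?, ?)", [
    (334, json.dumps({"name": "John Doe", "age": 31, "salary": 80000})),
    (335, json.dumps({"name": "ACME Inc.", "address": "Broadway 345, New York City, NY"})),
    (336, json.dumps({"manufacturer": "Toyota", "model": "Corolla", "year": 2005})),
    (337, json.dumps({"name": "Annie Doe", "age": 30, "salary": 82000})),
])
conn.executemany("INSERT INTO RELATIONSHIPS VALUES (?, ?, ?, ?, ?)", [
    (191, "MARRIED_TO", 334, 337, json.dumps({"wedding_date": "20070213"})),
    (192, "OWNS", 337, 336, None),
    (193, "WORKS_FOR", 337, 335, json.dumps({"job-position": "IT manager"})),
])

def describe(rel_id):
    """Resolve a relationship into (source props, NAME, target props, relationship props)."""
    row = conn.execute("""
        SELECT s.PROPERTIES, r.NAME, t.PROPERTIES, r.PROPERTIES
        FROM RELATIONSHIPS r
        JOIN NODES s ON s.ID = r.SOURCE_NODE_ID
        JOIN NODES t ON t.ID = r.TARGET_NODE_ID
        WHERE r.ID = ?""", (rel_id,)).fetchone()
    if row is None:
        return None
    source, name, target, props = row
    return json.loads(source), name, json.loads(target), (json.loads(props) if props else {})

src, name, tgt, props = describe(193)
print(src["name"], name, tgt["name"], props["job-position"])
# prints: Annie Doe WORKS_FOR ACME Inc. IT manager
```

In this sketch the database checks only that both endpoints exist; what a NAME means, and which kinds of nodes it may connect, is not constrained at all.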
A relationship's NAME marks its "type". New association types can be added to the system dynamically, simply by storing relationship records with previously unused names, whereas in an SQL database the available foreign keys have to be defined upfront.
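To make this concrete, here is a small sketch (SQLite as an assumed stand-in; the DRIVES type and the `relationship_types` helper are hypothetical) showing that the set of relationship types is nothing more than the distinct NAME values, so a new type exists as soon as a row uses it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE RELATIONSHIPS (
    ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL,
    SOURCE_NODE_ID INTEGER NOT NULL, TARGET_NODE_ID INTEGER NOT NULL,
    PROPERTIES TEXT)""")
conn.executemany("INSERT INTO RELATIONSHIPS VALUES (?, ?, ?, ?, ?)", [
    (191, "MARRIED_TO", 334, 337, '{"wedding_date": "20070213"}'),
    (192, "OWNS", 337, 336, None),
    (193, "WORKS_FOR", 337, 335, '{"job-position": "IT manager"}'),
])

def relationship_types():
    """The 'types' known to the system are just whatever NAMEs are stored."""
    return {name for (name,) in conn.execute("SELECT DISTINCT NAME FROM RELATIONSHIPS")}

print(sorted(relationship_types()))  # prints: ['MARRIED_TO', 'OWNS', 'WORKS_FOR']

# Introduce a brand-new association type: no ALTER TABLE, just another row.
conn.execute("INSERT INTO RELATIONSHIPS VALUES (194, 'DRIVES', 334, 336, NULL)")
print(sorted(relationship_types()))  # prints: ['DRIVES', 'MARRIED_TO', 'OWNS', 'WORKS_FOR']
```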
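Since relationships have a direction, we have the "SOURCE_NODE_ID" and "TARGET_NODE_ID" foreign keys, pointing to the respective NODES records. (In Neo4j every relationship is stored with a direction, but queries can traverse it either way, so it can effectively be treated as bi-directional.) The direction is valuable mainly for its semantics: Annie owns the Toyota, not the other way round.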
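In the SQL picture, following a relationship "forwards" or "backwards" is simply a matter of which foreign-key column you filter on. A minimal sketch, assuming SQLite and a hypothetical `related` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE RELATIONSHIPS (
    ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL,
    SOURCE_NODE_ID INTEGER NOT NULL, TARGET_NODE_ID INTEGER NOT NULL,
    PROPERTIES TEXT)""")
conn.executemany(
    "INSERT INTO RELATIONSHIPS (ID, NAME, SOURCE_NODE_ID, TARGET_NODE_ID) VALUES (?, ?, ?, ?)",
    [(191, "MARRIED_TO", 334, 337), (192, "OWNS", 337, 336), (193, "WORKS_FOR", 337, 335)],
)

def related(node_id, direction="out"):
    """Return sorted (relationship NAME, other node ID) pairs for node_id.

    "out" follows SOURCE -> TARGET, "in" follows TARGET -> SOURCE,
    and "both" ignores the stored direction altogether.
    """
    if direction not in ("out", "in", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    out_q = "SELECT NAME, TARGET_NODE_ID FROM RELATIONSHIPS WHERE SOURCE_NODE_ID = ?"
    in_q = "SELECT NAME, SOURCE_NODE_ID FROM RELATIONSHIPS WHERE TARGET_NODE_ID = ?"
    result = []
    if direction in ("out", "both"):
        result += conn.execute(out_q, (node_id,)).fetchall()
    if direction in ("in", "both"):
        result += conn.execute(in_q, (node_id,)).fetchall()
    return sorted(result)

print(related(337, "out"))  # prints: [('OWNS', 336), ('WORKS_FOR', 335)]
print(related(337, "in"))   # prints: [('MARRIED_TO', 334)]
```

The stored direction never prevents a lookup from the other end; it only records which way the fact reads.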
As in the NODES table, there is a PROPERTIES column here too, storing additional information about the association. In the SQL world we would need to introduce a "link" table to store this kind of data.
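For contrast, one conventional fixed-schema way to record the WORKS_FOR fact together with its job position is a dedicated link table. The sketch uses SQLite, and the PERSONS, COMPANIES and EMPLOYMENT tables are hypothetical names chosen for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PERSONS   (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
CREATE TABLE COMPANIES (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
-- The link table exists only to hold the association and its attributes.
CREATE TABLE EMPLOYMENT (
    PERSON_ID    INTEGER NOT NULL REFERENCES PERSONS(ID),
    COMPANY_ID   INTEGER NOT NULL REFERENCES COMPANIES(ID),
    JOB_POSITION TEXT,
    PRIMARY KEY (PERSON_ID, COMPANY_ID)
);
INSERT INTO PERSONS   VALUES (337, 'Annie Doe');
INSERT INTO COMPANIES VALUES (335, 'ACME Inc.');
INSERT INTO EMPLOYMENT VALUES (337, 335, 'IT manager');
""")

row = conn.execute("""
    SELECT p.NAME, c.NAME, e.JOB_POSITION
    FROM EMPLOYMENT e
    JOIN PERSONS p   ON p.ID = e.PERSON_ID
    JOIN COMPANIES c ON c.ID = e.COMPANY_ID""").fetchone()
print(row)  # prints: ('Annie Doe', 'ACME Inc.', 'IT manager')
```

Each new kind of association that carries its own attributes would call for another table like EMPLOYMENT, whereas in the graph model it is just another RELATIONSHIPS row.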
Recap
Having no schema brings a well-known trade-off to the table. On the one hand, the structure of such a system is less obvious, and special care has to be taken not to corrupt the data; on the other hand, the flexibility it gives can be exploited in domains that are rich and rapidly changing. And of course, since the database imposes no constraints here, the application is now solely responsible for the correctness of the stored data.
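As an illustration of what "the application is solely responsible" can mean in practice, here is a minimal sketch of checks an application might run before writing a relationship record. The rules themselves (upper-case names, existing endpoints, PROPERTIES being a JSON object) and the helper names are assumptions of mine, not something Neo4j or the model above prescribes:

```python
import json
import re

# Hypothetical application-level rules; the storage layer itself checks none of these.
REL_NAME = re.compile(r"[A-Z][A-Z0-9_]*")  # e.g. MARRIED_TO, WORKS_FOR

def parse_properties(text):
    """Parse a PROPERTIES value; an empty or NULL value means 'no properties'."""
    if not text:
        return {}
    props = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(props, dict):
        raise ValueError("PROPERTIES must be a JSON object")
    return props

def validate_relationship(name, source_id, target_id, properties, existing_node_ids):
    """Collect every problem with a relationship record before it is written."""
    errors = []
    if not REL_NAME.fullmatch(name):
        errors.append(f"bad relationship name: {name!r}")
    for label, node_id in (("source", source_id), ("target", target_id)):
        if node_id not in existing_node_ids:
            errors.append(f"{label} node {node_id} does not exist")
    try:
        parse_properties(properties)
    except ValueError as exc:
        errors.append(f"invalid properties: {exc}")
    return errors

nodes = {334, 335, 336, 337}
print(validate_relationship("WORKS_FOR", 337, 335, '{"job-position": "IT manager"}', nodes))  # prints: []
print(len(validate_relationship("works for", 337, 999, '{"age" 30}', nodes)))  # prints: 3
```

Returning a list of problems rather than raising on the first one lets the caller report everything wrong with a record at once; either style works, as long as the check runs before the write.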