Tuesday, October 22, 2013

Introducing Teuta, laughingly simple dependency injection container in Clojure

It was year 2002, when I tried my first dependency injection container in Java (these were mostly called Inversion-of-Control containers then). It was one of Apache Avalon subprojects, namely Fortress (beside ECM, Merlin and some others). Before it, I designed my applications in any custom way I saw fit, and sometimes there wasn't much design at all, so that moment really felt enlightening. I know it sounds silly now because these containers are so common now in all mainstream languages, but back then, it really took quality of my apps to whole new level, and I felt I could comprehend my code much more easily.

Now it's 2013, and destiny took me to Clojure language. I'm still fresh to it, but what I noticed is there isn't much info around about structuring the applications, as if namespaces and vars contained in them are sufficient for anything. If there wasn't few presentations from Stuart Sierra or Prismatic team, I would probably go on thinking it must be an issue with my OO legacy. Fortunately, after these talks, I could see there is a real need for some kind of componentization, and although there are some libraries out there such as Prismatic Graph or Jig, they are somewhat different from what Java programmers are used to, so I decided to write my own, especially because it's so dead-simple idea. The final result is small GitHub project called Teuta.

Library Dependencies

Add the necessary dependency to your Leiningen project.clj and require the library in your ns:
[vmarcinko/teuta "0.1.0"] ; project.clj

(ns my-app (:require [vmarcinko.teuta :as teuta])) ; ns

Container Specification

Anyway, to create a component container, we have to start by defining a specification, and it is simply a map of entries [component-id component-specification]. Component ID is usually a keyword, though String or some other value can be used. Component specification is vector of [component-factory-fn & args], so a component can be constructed later, during container construction time, by evaluating factory function with given arguments. So you see, this is just an ordinary function, and a component can be constructed in any arbitrary way, though maybe most usual way would be to use records and their map factory functions which are very descriptive. If a component depends upon some other component, then it should be configured to use it. Referring to other components is done via
(teuta/comp-ref some-comp-id)
If components form circular dependencies, exception will be reported during container construction time. Similarly, if we want to parametrize some piece of component configuration, then we simply do that via:
(teuta/param-ref some-param-id-path)
So, specification would look something like:
{:my-comp-1 [mycompany.myapp/map->MyComp1Record 
             {:my-prop-1  "Some string"
              :my-prop-2  334
              :my-prop-3  (teuta/param-ref :comp-1-settings :some-remote-URL)
              :comp2-prop (teuta/comp-ref :my-comp-2)}]
 :my-comp-2 [mycompany.myapp/map->MyComp2Record 
             {:my-prop-1 6161
              :my-prop-2 (atom nil)
              :my-prop-3 (teuta/param-ref :comp-2-settings :admin-email)}]}
Since whole specification is simply a regular map, it is useful to have some common map containing always present components, and have separate profile-specific maps with components for production, test, development... That way you simply merge those maps together to construct desired final specification.

Container Construction

Once we have our specification, we can simply create a container by calling
(def my-container (teuta/create-container my-specification my-parameters))
The container is just a sorted map of [component-id component] entries. When the container map is printed, in order to make it a bit more clear, referred components will be printed as << component some-comp-id >>.

Since whole application state is also contained in this container map, this means it plays nicely with Stuart Sierra "reloaded" workflow.

Component Lifecycle

If a component's functions depend upon some side-effecting logic being executed prior to using them, then a component can implement vmarcinko.teuta/Lifecycle protocol. The protocol combines start and stop functions which will get called during starting and stopping of a container.
(defprotocol Lifecycle
  (start [this] "Starts the component. Returns nil.")
  (stop [this] "Stops the component. Returns nil."))
Container is started by:
(teuta/start-container my-container)
Components are started in dependency order. If any component raises exception during startup, the container will automatically perform stopping of all already started components, and rethrow the exception afterwards. Likewise, stopping of container is done via:
(teuta/stop-container my-container)
If any component raises exception during this process, the exception will be logged and the process will continue with other components.

Example

Here we define 2 components - divider and alarmer.

Divider takes 2 numbers and returns result of their division. Let's define working interface of the component as protocol, so we can allow many implementations.
(ns vmarcinko.teutaexample.divider)

(defprotocol Divider
  (divide [this n1 n2] 
  "Divides 2 numbers and returns vector [:ok result]. 
  In case of error, [:error \"Some error description\"] will be returned"))
Unlike this example, component interfaces will mostly contain multiple related functions. Request-handler components, such as web handlers, usually don't have a working interface since we don't "pull" them for some functionality, they just need to be started and stopped by container, thus implement Lifecycle protocol. Default implementation of our divider component will naturally return the result of dividing the numbers, but in case of division by zero, it will also send notification about the thing to alarmer component (by calling vmarcinko.teutaexample.alarmer/raise-alarm). Placing component implementation in separate namespace is just a nice way of separating component interface and implementation.
(ns vmarcinko.teutaexample.divider-impl
  (:require [vmarcinko.teutaexample.alarmer :as alarmer]
            [vmarcinko.teutaexample.divider :as divider]
            [vmarcinko.teuta :as teuta]))

(defrecord DefaultDividerImpl [alarmer division-by-zero-alarm-text]
  divider/Divider
  (divide [_ n1 n2]
    (if (= n2 0)
      (do
        (alarmer/raise-alarm alarmer division-by-zero-alarm-text)
        [:error "Division by zero error"])
      [:ok (/ n1 n2)])))
Alarmer is defined as follows:
(ns vmarcinko.teutaexample.alarmer)

(defprotocol Alarmer
  (raise-alarm [this description] "Raise alarm about some issue. Returns nil."))
Default implementation of alarmer "sends" alarm notifications to preconfigured email addresses. For this example, sending an email is just printing the message to stdout. It also prints alarm count, which is mutable state of this component, and is held in an atom passed to it during construction. Atom state is initialized and cleaned up during lifecycle phases - start and stop.
(ns vmarcinko.teutaexample.alarmer-impl
  (:require [vmarcinko.teutaexample.alarmer :as alarmer]
            [vmarcinko.teuta :as teuta]))

(defrecord DefaultAlarmerImpl [notification-emails alarm-count]
  alarmer/Alarmer
  (raise-alarm [_ description]
    (let [new-alarm-count (swap! alarm-count inc)]
      (println (str "Alarm Nr." new-alarm-count " raised: '" description "'; notifying emails: " notification-emails))))
  teuta/Lifecycle
  (start [_]
    (reset! alarm-count 0))
  (stop [_]
    (reset! alarm-count nil)))
So let's finally create container specification and wire these 2 components. We will also extract alarmer email addresses as application parameters.
(def my-parameters {:alarmer-settings {:emails ["admin1@mycompany.com" "admin2@mycompany.com"]}})

(def my-specification
  {:my-divider [vmarcinko.teutaexample.divider-impl/map->DefaultDividerImpl
                {:alarmer                       (teuta/comp-ref :my-alarmer)
                 :division-by-zero-alarm-text   "Arghhh, somebody tried to divide with zero!"}]

   :my-alarmer [vmarcinko.teutaexample.alarmer-impl/map->DefaultAlarmerImpl
                {:notification-emails   (teuta/param-ref :alarmer-settings :emails)
                 :alarm-count           (atom nil)}]})
Now we can construct the container, start it and try out dividing 2 numbers via divider component.
(def my-container (teuta/create-container my-specification my-parameters))

(teuta/start-container my-container)

(vmarcinko.teutaexample.divider/divide (:my-divider my-container) 3 44)
=> [:ok 3/44]

(vmarcinko.teutaexample.divider/divide (:my-divider my-container) 3 0)
=> Alarm Nr.1 raised: 'Arghhh, somebody tried to divide with zero!': notifying emails: ["admin1@mycompany.com" "admin2@mycompany.com"]
=> [:error "Division by zero error"]
In order to call vmarcinko.teutaexample.divider/divide function "from outside", we needed to pick divider component from the container first. But if request-handling piece of application is also a component in container, as could be the case with some web handler serving HTTP requests to our vmarcinko.teutaexample.divider/divide function, then container specification will handle wiring specified divider component. Let's create such a web handler component using popular Jetty web server:
(ns vmarcinko.teutaexample.web-handler
  (:require [ring.adapter.jetty :as jetty]
            [vmarcinko.teuta :as teuta]
            [ring.middleware.params :as ring-params]
            [vmarcinko.teutaexample.divider :as divider]))

(defn- create-handler [divider]
  (fn [request]
    (let [num1 (Integer/parseInt ((:params request) "arg1"))
          num2 (Integer/parseInt ((:params request) "arg2"))
          result (nth (divider/divide divider num1 num2) 1)]
      {:status 200
       :headers {"Content-Type" "text/html"}
       :body (str "<h1>Result of dividing " num1 " with " num2 " is: " result " </h1>")})))

(defn- ignore-favicon [handler]
  (fn [request]
    (when-not (= (:uri request) "/favicon.ico")
      (handler request))))

(defrecord DefaultWebHandler [port divider server]
  teuta/Lifecycle
  (start [this]
    (reset! server
      (let [handler (->> (create-handler divider)
                         ring-params/wrap-params
                         ignore-favicon)]
        (jetty/run-jetty handler {:port port :join? false}))))
  (stop [this]
    (.stop @server)
    (reset! server nil)))
Jetty server is held in an atom, and is started on configured port during lifecycle start phase. As can be seen, divider component is the only dependency of this component, and request URL parameters "arg1" and "arg2" are passed as arguments to vmarcinko.teutaexample.divider/divide function. We added also favicon request ignoring handler to simplify testing it via browser. This component requires popular Ring library, so one needs to add that to project.clj as:
:dependencies [[ring/ring-core "1.2.0"]
               [ring/ring-jetty-adapter "1.2.0"]
               ...
Let's expand our specification to wire this new component.
(def my-parameters { ...previous parameters ...
                    :web-handler-settings {:port 3500}})

(def my-specification
  { ....previous components ....
   :my-web-handler [vmarcinko.teutaexample.web-handler/map->DefaultWebHandler
                    {:port (teuta/param-ref :web-handler-settings :port)
                     :divider (teuta/comp-ref :my-divider)
                     :server (atom nil)}]})
Now, after the container has been started, we can try out HTTP request: 
http://localhost:3500?arg1=3&arg2=44
Division result should be returned as HTML response. Division with zero should print alarming message to REPL output.

Wednesday, October 2, 2013

Neo4j model for (SQL) dummies

In general, one characteristic of the mind is that it has hard time grasping new concepts if these are presented without comparison to some familiar ones. And I experienced that when trying to explain Neo4j data model to people who are stumbling on it for first time. Mostly they are confused by lack of schema, because when visualized, those scattered graph nodes, connected into some kind of spider web, bring confusion into minds so long accustomed to nicely ordered rectangular SQL world.

So what seemed to work better in this case is just to describe it using all too familiar RDBMS/SQL model and its elements: tables, columns, records, foreign keys ...In other words, let's try to describe Neo4j-graph model as it would be if built on top of SQL model.

Actually, this is quite easy to do. We just need 2 tables, and let's call them "NODES" and "RELATIONSHIPS". Both reflect 2 main elements in Neo4j model - graph nodes and relationships between them.

"NODES" table

This one would be where entities are stored, and it contains 2 columns - "ID" and "PROPERTIES".

ID PROPERTIES
334 {"name": "John Doe", "age": 31, "salary": 80000}
335 {"name": "ACME Inc.", "address": "Broadway 345, New York City, NY"}
336 {"manufacturer": "Toyota", "model": "Corolla", "year": 2005}
337{"name": "Annie Doe", "age" 30, "salary": 82000}

PROPERTIES column stores map-like data structure containing arbitrary properties with their values. Just for purpose of presentation, I picked JSON serialization here. So you see, due to this schema-less design, there are no constraints upon what properties are contained in the PROPERTIES column - which is actually the only practical/possible way since all entity types (department, company, employee, vehicle...) are stored in this single table.

"RELATIONSHIPS" table

This table would contain "ID", "NAME", "SOURCE_NODE_ID", "TARGET_NODE_ID" and "PROPERTIES" columns, and purpose is to store associations between nodes. We can say that records stored here represent schema-less version of SQL foreign-keys.

ID NAME SOURCE_NODE_ID TARGET_NODE_ID PROPERTIES
191 MARRIED_TO 334 337 {"wedding_date": "20070213"}
192 OWNS 337 336
193 WORKS_FOR 337 335 {"job-position": "IT manager"}

Relationship's NAME marks its "type", and we can add new association "types" into the system dynamically, just by storing new relationship records with previously non-existing names, whereas in SQL database, we need to pre-define available foreign keys upfront.

Since relationships usually have a direction (though they can be bi-directional also in Neo4j), thus we have "SOURCE_NODE_ID" and "TARGET_NODE_ID" foreign keys, pointing to respective NODES. Direction is mainly valuable for its semantic purpose.

Similar to NODES table, here we also have PROPERTIES column to store additional information about association - in SQL world we would need to introduce "link" table to store this kind of data.

Recap

Having no schema brings well known trade-off to the table. On one hand, the structure of such system is less obvious, and special care has to be taken not to corrupt the data, but on the other hand, given flexibility can be exploited for domains that are rich and rapidly changing. And of course, since there are no constraints imposed by database here, it means that application now is solely responsible for correctness of stored data.