Monday, April 27, 2015

Setting up Undertow with Spring MVC

In case you still don't know about it, Undertow is web server developed under JBoss community, and is praised a lot for its impressive performance. I usually deployed my web application in .war form under some servlet container (mostly Tomcat), but now I wanted to try out embedded servlet container, and Undertow had seemed especially good for that considering its small footprint.

I am also heavy Spring user, so I wanted to integrate this web server with Spring MVC, but only examples so far where I could find Spring-Undertow integration is within Spring Boot project, and for some reason I still use plain Spring Framework, so I set out to do it my own.

Of course, as everything in Spring is a bean, I developed my own UndertowServer bean which was pretty straightforward, and bigger question was how to define my web application deployment in it because there are couple of ways to do it. I decided to use Servlet 3.0+ ServletContainertInitializer (one other option would be to straight use Undertow's API to specify web application deployment).

Here is piece of code from our implementation of this interface:
 public class WebAppServletContainerInitializer implements ServletContainerInitializer, ApplicationContextAware {  
   private ApplicationContext applicationContext;  

   @Override  
   public void onStartup(Set<Class<?>> c, ServletContext ctx) throws ServletException {  
     XmlWebApplicationContext rootWebAppContext = new XmlWebApplicationContext();  
     rootWebAppContext.setConfigLocation("/WEB-INF/applicationContext.xml");  
     rootWebAppContext.setParent(applicationContext);  
     ctx.addListener(new ContextLoaderListener(rootWebAppContext));
  
     FilterRegistration.Dynamic encodingFilter = ctx.addFilter("encoding-filter", CharacterEncodingFilter.class);  
     encodingFilter.setInitParameter("encoding", "UTF-8");  
     encodingFilter.setInitParameter("forceEncoding", "true");  
     encodingFilter.addMappingForServletNames(EnumSet.allOf(DispatcherType.class), false, "admin");  

     FilterRegistration.Dynamic springSecurityFilterChain = ctx.addFilter("springSecurityFilterChain", DelegatingFilterProxy.class);  
     springSecurityFilterChain.addMappingForServletNames(EnumSet.allOf(DispatcherType.class), false, "admin");  
     ServletRegistration.Dynamic dispatcher = ctx.addServlet("admin", DispatcherServlet.class);  
     dispatcher.setLoadOnStartup(1);  
     dispatcher.addMapping("/admin/*");  
 ...  
 ...  

We had to inject our main ApplicationContext (that contains UndertowServer bean) via ApplicationContextAware marker, to be able to use it as parent context for root WebApplicationContext that we put under /WEB-INF/ together with other DispatcherServlet contexts.

Later on, we use this ServletContainertInitializer implementation during startup phase of Undertow server to construct DeploymentInfo object:
 InstanceFactory<? extends ServletContainerInitializer> instanceFactory = new ImmediateInstanceFactory<>(servletContainerInitializer);  
 ServletContainerInitializerInfo sciInfo = new ServletContainerInitializerInfo(WebAppServletContainerInitializer.class, instanceFactory, new HashSet<>());  

 DeploymentInfo deploymentInfo = constructDeploymentInfo(sciInfo);  

FInal result is that we can now readily use this web server bean in our Spring XML deployment descriptors, such as:
 <bean class="vmarcinko.undertow.UndertowServer">  
   <property name="port" value="8080"/>  
   <property name="webAppName" value="myapp"/>  
   <property name="webAppRoot" value="${distribution.dir}/web-app-root"/>  
   <property name="servletContainerInitializer">  
     <bean class="vmarcinko.web.admin.WebAppServletContainerInitializer"/>  
   </property>  
 </bean>  

If you ask me why I still use Spring XML in 2015 instead of Java config - well, I think XML is still nicer DSL for describing deployment than Java, but that's just me :)

Anyway, when you boot the system, this little web server will be up and running under specified port, serving this single web application under given root directory. In the example above, my administration web console (specific DispatcherServlet serving that under '/admin/*') would be available under:

http://localhost:8080/myapp/admin

Whole code for both classes is available at this Gist.



Monday, January 26, 2015

Nanocube feeder in Java, part 2

Continuing on the previous post where we prepared DMP encoder, we now want to see how can we use it to encode and stream the data into Nanocube. And because Nanocube process uses standard input for data provision, we have to first start the process in order to obtain that input stream.

We will use ProcessBuilder class to set up everything needed to start the Nanocube. Configuration-wise, we only need to set up the directory where Nanocube binaries are present (in particular nanocube-leaf command).
 private final static Logger appLogger = Logger.getLogger("Feeder");  
 private final static Logger nanocubeOutLogger = Logger.getLogger("Nanocube OUT");  
 ...  
 ...  
 String nanocubeBinPath = "/home/vmarcinko/nanocube/bin";  
 ProcessBuilder pb = new ProcessBuilder(nanocubeBinPath + "/nanocube-leaf", "-q", "29512", "-f", "10000");  
   
 Map<String, String> env = pb.environment();  
 env.put("NANOCUBE_BIN", nanocubeBinPath);  
   
 pb.redirectOutput(ProcessBuilder.Redirect.PIPE);  
 pb.redirectInput(ProcessBuilder.Redirect.PIPE);  
 pb.redirectErrorStream(true);  
   
 appLogger.info("Starting Nanocube process...");  
 Process nanocubeProcess = pb.start();  
   
 ExecutorService executorService = Executors.newSingleThreadExecutor();  
 startNanocubeOutputLoggingTask(executorService, nanocubeProcess);  
   
 OutputStream inPipeOutputStream = nanocubeProcess.getOutputStream();  
 ...  
 ...  
As seen above, when we start Nanocube process, we fetch output end of the process input "pipe" in form of OutputStream, and we will use that to stream the data into Nanocube.

One interesting piece above is the way we handle Nanocube process output - regardless if it was error or standard output, we start new task in another thread (thus java.util.concurrent.Executor) to read that output and print it via some logging framework used in our java application (here plain java.util.logging loggers). Here is relevant startNanocubeOutputLoggingTask method:
 private static void startNanocubeOutputLoggingTask(Executor executor, final Process nanocubeProcess) {  
   InputStream outPipeInputStream = nanocubeProcess.getInputStream();  
   Runnable runnable = new Runnable() {  
     @Override  
     public void run() {  
       try {  
         try (BufferedReader reader = new BufferedReader(new InputStreamReader(outPipeInputStream))) {  
           String line;  
           while ((line = reader.readLine()) != null) {  
             nanocubeOutLogger.info(line);  
           }  
         }  
       } catch (IOException e) {  
         appLogger.log(Level.SEVERE, "Error reading from Nanocube output: " + e.getMessage(), e);  
       }  
     }  
   };  
   // start logging task  
   executor.execute(runnable);  
 }  
   
By having ExecutorService instance around, we can use it to stop that logging task at the application shutdown.

And finally, with the code above, and instance of NanocubeDmpEncoder ready (see previous post), we perform the feeding:
 appLogger.info("Streaming DMP content into Nanocube process...");  
   
 NanocubeDmpEncoder dmpEncoder = ...;  

 byte[] headerBytes = dmpEncoder.encodeHeaders();
 inPipeOutputStream.write(headerBytes);

 byte[] recordBytes = dmpEncoder.encodeRecord(....);
 inPipeOutputStream.write(recordBytes);

 inPipeOutputStream.flush();  
 inPipeOutputStream.close();  
   
 appLogger.info("Streaming of data finished");  
   
 nanocubeProcess.waitFor();  
 executorService.shutdown();  
 appLogger.info("Nanocube process stopped");  
   
At the end, we wait for Nanocube process to stop before doing the cleanup of our output logging task.


Wednesday, January 21, 2015

Nanocube feeder in Java, part 1

For some time I've been sporadically following Nanocube project, which is a novel way to index spatio-temporal data, all in memory, enabling very fast analytical querying and visualization of fairly large datasets. Just to get rough impression of Nanocube memory consumption - for up to 10 billions of records, it can require around few tens of GB of RAM, and the querying will work sufficiently fast to allow real-time interactive exploration of such huge pool of data.

Note - this post is based on nanocube version 3.1.

Currently provided way to feed data into nanocube is via python command line tool - nanocube-binning-csv.py.
In a nuthshell, this tool is consuming CSV datasets and converts it into so called "dmp" format which is than piped into nanocube server. This can be cumbersome way for some use cases, and being a java developer I wanted to have some java-friendly way to feed the data, so I set out to develop my own feeder in Java.

DMP Encoder

Biggest portion of my feeder would be DMP encoder which would convert input data, given as plain java objects, into DMP format required by nanocube server. Since documentation about DMP format is somewhat scarce, the easiest way seemed to be analyzing python code within nanocube-binning-csv.py tool, and translate it to Java.

The result encoder is not the most generic one, but focused on the most common use cases - when we have following dimensions - spatial, temporal, count, and arbitrary number of categorical dimensions. Of course, you need accordingly to have compiled binaries of nanocube for given combination of dimensions and their details (such as byte lengths of certain dimensions, zoom level etc..), and few of them are found in nanocube-dir/bin directory, example being nc_q25_c1_c1_u2_u4.

Whole source code for DMP encoder is available here.

The way to use it is as follows.

You first instantiate the encoder by specifying schema for the dataset. Constructor arguments are:
  • name
  • number of location zoom levels (eg. 25 is typically sufficient)
  • time offset
  • length of time bin in seconds
  • length of time value in bytes (2, 4 or 8)
  • map that describes categorical dimensions and their options
 NanocubeDmpEncoder dmpEncoder = new NanocubeDmpEncoder("telco-data", 25, timeOffset, 3600, 2, categoryEnumerations);  
Map of category data is actually a 2-level map. Top level map contains entries where keys are category names and values are maps again, containing enum options specification for given category. Each enum options map ties domain-specific option value to (label, byte) pair which specifies how that option will be presented (label) and also how it will be encoded in nanocube (byte value).

For example, if we want to have categories "NetType" and "Status" which will represent network type and status for some mobile call, then in Java we will usually use enum class, such as:
 public enum NetType {  
   GSM, GPRS, UMTS, EDGE  
 }  
 public enum Status {  
   SUCCEEDED, FAILED  
 }  
So you can define our category definition map something like:
 Map<String, Map<Object, CategoryEnumInfo>> categoryEnumerations = new HashMap<>();  
 categoryEnumerations.put("NetType", createEnumInfosFromEnumClass(NetType.class));  
 categoryEnumerations.put("Status", createEnumInfosFromEnumClass(Status.class));  
 ....  
 private <E extends Enum> Map<Object, CategoryEnumInfo> createEnumInfosFromEnumClass(Class<E> enumClass) {  
   Map<Object, CategoryEnumInfo> enumInfos = new HashMap<>();  
   for (E constant : enumClass.getEnumConstants()) {  
     enumInfos.put(constant, new CategoryEnumInfo(constant.name(), (byte) constant.ordinal()));  
   }  
   return enumInfos;  
 }  
So now when we have NanocubeDmpEncoder instance constructed, now we can encode the headers and records into required DMP format.
 // headers  
 byte[] headerBytes = dmpEncoder.encodeHeaders();  
 // record  
 Map<String, Object> categoryValues = new HashMap<>();  
 categoryValues.put("NetType", NetType.GSM);  
 categoryValues.put("Status", Status.SUCCEEDED);  
 byte[] recordBytes = dmpEncoder.encodeRecord(44.8789661034259, 17.72673345412568, new Date(), 1, categoryValues);  

In second part of this post we will see how to boot Nanocube server from Java process, and how to stream encoded data into it.