Last night I promoted a new nREPL middleware project, ‘nrepl-inspect’, derived from the javert library.
This repository contains:
- an Emacs client file that extends nrepl.el, and
- a rich generic middleware inspector that runs under nREPL.
Key features include:
- ‘C-c C-i’ inspects the var at point, or any value returned by eval of an arbitrary expression in the current buffer’s active namespace,
- a simple model for recursing into sub-objects based on a value index map maintained in the middleware during serialization of the value, and
- a rendering method extensible for custom types.
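To make the index-map idea concrete, here is a minimal sketch of how such an inspector could work. It is written in Python purely for illustration and is not the middleware's actual code; the `Inspector` class and its `render`, `down`, and `up` methods are hypothetical names. Each render pass numbers the sub-objects it prints, and navigating "into" item N just looks N up in that map.

```python
from typing import Any, Dict, List, Tuple


class Inspector:
    """Toy inspector: renders a value and remembers its sub-objects by index."""

    def __init__(self, value: Any) -> None:
        self.stack: List[Any] = []       # parent values, so we can go back up
        self.value = value
        self.index: Dict[int, Any] = {}  # index -> sub-object from the last render
        self.render()

    @staticmethod
    def _children(value: Any) -> List[Tuple[str, Any]]:
        """Return (label, child) pairs for the values we know how to open."""
        if isinstance(value, dict):
            return [(f"{k!r} = ", v) for k, v in value.items()]
        if isinstance(value, (list, tuple)):
            return [("", v) for v in value]
        return []  # atoms have nothing to descend into

    def render(self) -> List[str]:
        """Render the current value, rebuilding the index map as we go."""
        self.index = {}
        lines = [f"Type: {type(self.value).__name__}"]
        for label, child in self._children(self.value):
            idx = len(self.index)
            self.index[idx] = child  # remember the sub-object under its index
            lines.append(f"  [{idx}] {label}{child!r}")
        return lines

    def down(self, idx: int) -> List[str]:
        """Descend into the sub-object that the last render labelled idx."""
        child = self.index[idx]
        self.stack.append(self.value)
        self.value = child
        return self.render()

    def up(self) -> List[str]:
        """Return to the parent value, if there is one."""
        if self.stack:
            self.value = self.stack.pop()
        return self.render()
```

Because the client only ever sends back an integer, the value itself never has to round-trip through the editor; the map stays on the server side next to the live object.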
My goal is to be able to stack-navigate a Datomic database given an Entity; the example in the repository should support that with just a little more work.
Please note: while I use this in my day-to-day development, it’s not yet well packaged and has been minimally tested. It currently does not truncate maps or sequences, so please don’t inspect an infinite sequence such as ‘(repeat 1)’!
[NOTE: The release of cider deprecates much of the content here. I will post an update on Clojure Debugging ’14 early in the new year.]
I’m ramping up for a new set of development projects in 2013 and 2014. My 2010-era setup with slime and swank-clojure is unlikely to remain a viable approach throughout the project. I’ve decided it is time to join the nREPL community and to take advantage of some of the architectural innovations there, which may make it easier to debug the distributed systems I’m going to be working on.
Features I’m accustomed to from Common Lisp slime/swank:
- Code navigation via Meta-. and Meta-,
- Fuzzy completion in editor windows and the repl
- Documentation help in mini-buffer
- Object inspector. Ability to walk any value in the system
- Walkable backtraces with one-key navigation to offending source
- Evaluate an expression in a specific frame, inspect result
- Easy tracing of functions to the repl or a trace buffer (in emacs)
- Trigger a continuable backtrace via watchpoint or breakpoint
Only the first three of these features are available in the stock nrepl. The rest of this post will discuss how to set up a reasonable approximation of this feature set in Emacs using nREPL middleware providers as of May 2013.
My GitHub fork of the Clojure library for HBase, clojure-hbase, is now deprecated. I’ve extracted the functionality from David Santiago’s original library (with permission), along with a duplicate of his admin functions, to create a parallel repository with the schema-oriented API I developed.
I recently wrote a plugin in Clojure for the Cloudera Flume framework. As it was my first time writing a full Java class interface, I had to learn about the proper use of both proxy and gen-class. Given the poor error reporting at the Java-Clojure boundary, figuring out what you did wrong when you don’t get every detail exactly right (particularly when loading a class in the plugin’s final environment) can be difficult.
The clojure-hadoop library is an excellent facility for running Clojure functions within the Hadoop MapReduce framework. At Compass Labs we’ve been using its job abstraction for elements of our production flow and found a number of limitations that motivated us to implement extensions to the base library. We’ve promoted these for integration into a new release of clojure-hadoop, which should be released shortly.
There are still some design and implementation warts in the current release which should be fixed by ourselves or other users in the coming weeks.
Compass Labs is a heavy user of Clojure for analytics and data processing. We have been using Stuart Sierra’s excellent clojure-hadoop package for running a wide variety of algorithms derived from dozens of different Java libraries over our datasets on large Elastic MapReduce clusters.
The standard way to build a jar for MapReduce is to pack all the libraries for a given package into a single jar and upload it to EMR. Unfortunately, building uberjars for Hadoop is a mallet when a small tap is needed.
We recently reached a point where the overhead of uploading large jars caused a noticeable slowdown in our development cycle, especially when launching jobs on connections with limited upload bandwidth and with the slower uberjar creation of lein 1.4.
There are (at least) two solutions to this:
- Build smaller jars by having projects with dependencies specific to a given job
- Cache the dependencies on the cluster, send over your dependency list, and pull the set you need into the Distributed Cache at job configuration time.
To allow us to continue to package all our job steps in a single jar, source tree and lein project, we opted for the latter solution, which I will now describe.
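As a rough picture of the second option, the planning step at job configuration time might look like the sketch below. It is Python for illustration only and is not the code we run: `jar_name`, `plan_job_classpath`, and the cache directory are hypothetical, and the step that registers the cached paths with Hadoop's Distributed Cache is left out. The only convention assumed is the usual `artifact-version.jar` file naming.

```python
from typing import Dict, Iterable, List, Set, Tuple


def jar_name(coordinate: str, version: str) -> str:
    """Map a lein-style coordinate to the conventional jar file name.

    "org.clojure/clojure", "1.2.0" -> "clojure-1.2.0.jar"
    """
    artifact = coordinate.split("/")[-1]  # drop the group id, if any
    return f"{artifact}-{version}.jar"


def plan_job_classpath(
    deps: Iterable[Tuple[str, str]],
    cached: Set[str],
    cache_dir: str,
) -> Dict[str, List[str]]:
    """Split a dependency list into jars already cached on the cluster
    and jars that still have to be uploaded."""
    from_cache: List[str] = []
    to_upload: List[str] = []
    for coordinate, version in deps:
        name = jar_name(coordinate, version)
        if name in cached:
            # Already on the cluster: reference it by its cached path.
            from_cache.append(f"{cache_dir}/{name}")
        else:
            # Not cached yet: this one must be sent over the wire.
            to_upload.append(name)
    return {"from_cache": from_cache, "to_upload": to_upload}
```

With a warm cache, only the small job jar and any new or changed dependencies cross the slow link; everything else is resolved to a path the cluster already holds.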