Saturday, November 21, 2009

The Great Chef: A Parable

There was once a boy who loved food. He would sit for hours and read anything that had to do with food and cooking, including the instruction manuals for the kitchen appliances. The boy began to cook by trying out simple recipes. He would make the same recipes over and over and over, until he understood how each part contributed to the whole. His father noticed the boy's near obsession with cooking, and arranged for the boy to apprentice with a chef.

Though the chef was not particularly masterful, he was a good chef. The boy was enthralled. He studied every minute detail of the kitchen, every movement of the chef. He studied the way the chef assigned tasks to the others that worked in the kitchen. He studied the way the chef chose his ingredients, and prepared his recipes. The boy became familiar with every tool at the chef's disposal, and eagerly did every job assigned to him, from chopping vegetables to mopping the floor.

Eventually, the boy became a man and he decided to enter cooking school. The first few years were boring drudgery to him, and he learned almost nothing. He had already experienced the inner workings of a disciplined kitchen, and he had continued to be a voracious reader of anything to do with food and cooking. However, in the later years of cooking school a new world was opened to him. He learned cooking techniques that only a few in the world understood. He learned the potent and exotic flavor of each spice. He learned the magic and the music of cooking.

Delicate dishes were like a symphony: each part had its purpose, and when everything came together the person eating it shared in some great truth. It was almost...mystical. The man had such a profound and heartfelt appreciation for food and cooking that he found he couldn't even explain it to anyone else. In fact, the only people he could explain it to were others who had the same deep appreciation for food and cooking.

The man excelled at cooking school. His instructors and professors loved him because he was a passionate student. He eventually felt as though his professors were his peers, instead of his teachers. He even taught them a few things as he experimented with creating some truly unique recipes.

Finally, the man graduated from cooking school with the highest honors that could be achieved. He had become a master craftsman, an artisan. He had become The Great Chef. He struck out on his own to start a restaurant. His restaurant soon gained acclaim as everyone recognized his genius. Reservations had to be made over a year in advance.

The Great Chef decided that he would try something that would tax his skills to the breaking point. He had always toyed with the idea of creating this one dish that everyone said was impossible to create. He set his mind to it. If he cooked the tomatoes just so it would bring out the flavor he desired, and that would combine perfectly with the mushrooms. The combination of spices would perfectly complement the other ingredients. He sought out only the freshest meats and vegetables that were at the peak of ripeness, and at great expense.

All of The Great Chef's life's preparation had led up to this moment. By following his finely tuned intuition, he had created a recipe where no single ingredient could be removed without causing the flavor of the whole dish to collapse, because each ingredient depended on the others to draw out and complement its flavors. He only had to make the recipe once to know that it was perfect. He had done the impossible! He had proven beyond any doubt that he had no equal, and in the process he had created a new branch of the culinary arts.

One day, a man came to have dinner at The Great Chef's restaurant. He wasn't a connoisseur but he liked to try strange new things, and when someone told him The Great Chef's new dish was the best, he had to have it. He had made a reservation a year ago, and had come in from out-of-town expressly for this purpose. When it came time to order, the man said he would like to try the new dish, but as he read the description he said, "Oh...I don't like tomatoes. Can you make it without the tomatoes?"

Friday, November 20, 2009

The Programming of Philosophy

Just recently I watched Rich Hickey's presentation from the JVM Languages Summit 2009. It is a very interesting and thought-provoking presentation, and well worth viewing. In it he takes from philosophy an understanding of time, state, and identity and applies it to the design of computer programming languages and models for concurrency.

Rich's presentation is also further proof that no science (computer science included) is philosophically or religiously neutral[1]. Computer science in particular is one of the more "philosophical" sciences. We model and reason about information in its purest and most elemental forms. We learn De Morgan's laws in CS classes, for goodness' sake!

The connection between philosophy, religion, and computer science is illustrated best in the field of artificial intelligence, where we must answer questions like:

  • What is intelligence?
  • Are human brains nothing more than biochemical computers?
  • Can an electronic computer model intelligence accurately?

I find it telling that most AI researchers today punt on these issues. They have moved away from trying to define and model some theory of general intelligence, and toward statistical techniques and agents that "act rationally" by imitating human behavior. However, even a pragmatic approach[2] has a philosophical underpinning. Take, for instance, Jeff Hawkins's work at Numenta, which I have written about previously: there are clear influences of Empiricism.

We each have our own worldview, and our worldview not only influences the way we model the world, but it also limits our model of the world. Programmers are fond of talking about how the programming language you use limits the way you think about solving a problem[3]. We encourage each other to learn multiple languages, especially from different paradigms (declarative, functional, object oriented, imperative, etc.), to gain new ways of solving problems. We have arguments about which languages model the world better. Is action (functional languages) primary or is existence/state (object oriented) primary? In other words, which came first: God existing, or God creating? You could even liken the debate about static versus dynamic typing to a debate about absolute versus relative morality. Can we possibly come up with a system of rules beforehand that are applied in every situation, or is that too rigid? ... OK maybe that last analogy is pushing it a bit... :)

I like what Rich has done. He acknowledges that his philosophy of state and time has come from Alfred North Whitehead. Perhaps there are more lessons that can be learned from philosophy and applied to computer science. Or perhaps if we are explicit about our philosophical underpinnings and follow them to their logical conclusions we will gain some useful insights.

Footnotes:

[1] This will most assuredly set some people afire, but I do not see a distinction between philosophy and religion. They both answer questions we have about the nature of the universe, the limits and proper use of reasoning, etc.

[2] Pragmatism is a branch of philosophy, by the way.

[3] Again from philosophy! See "Wittgenstein philosophy of language"

Friday, October 30, 2009

Simplicity and Complexity in Software

Avdi Grimm wrote "Simplicity is Complicated" where he argues that simplicity isn't simple. It brought to mind a few pet peeves of my own.

I think I agree with what Avdi has to say, but I might couch it in different language and come at it from the complexity side of the issue. I also have some opinions on simplicity in programs, which basically boil down to readability.

I like to think of complexity as being of two kinds: essential, and accidental. Accidental complexity is complexity that is introduced by the way in which you solve the problem. It is complexity that is unnecessary, and that could be removed by solving the problem in a more elegant fashion. Essential complexity, however, is complexity that is inherent to the problem, and cannot be reduced no matter how you solve the problem.

I had this conversation with a coworker recently. We were working on several reports for a Rails application. We were discussing whether it would be better to have a single controller with sixteen actions, or sixteen controllers with one action each. This is an example of essential complexity. If you have sixteen reports, then you are going to have sixteen "somethings"; there is just no way around it. This is like Avdi's example of a pocket of air trapped under plastic: if you squeeze on the actions, then sixteen controllers will pop up somewhere else.

Then when it comes to simplicity in programs, my view is to reduce the accidental complexity as much as possible, and also to stick to the conventions and idioms of your language. Obviously, there would be no innovation if we only stuck to conventions, but wherever possible, a Ruby programmer should be able to read and quickly understand a Ruby program. Consider these two examples from Avdi's article:

Example 1


sum = 0
i = 0
while i < times.length
  time = times[i]
  # parse / manipulate the time
  sum = sum + time
  i = i + 1
end

Example 2


def average_time_of_day(times)
  sum = times.map(&:to_time).inject(&:+)
end

I don't think that Avdi is necessarily arguing for this viewpoint, but he says that "[Example 1] uses language constructs that are familiar to almost all programmers, not just Ruby programmers," and that this can be considered a form of simplicity. While it's true that one might see that as a form of simplicity, I think the most important thing when writing a Ruby program should be writing it in a way that is easy for Ruby programmers to understand.

When a Ruby programmer sees Example 1, he has to stop and think about what is going on, because it is not idiomatic Ruby. When he sees Example 2, he can immediately grasp the essence of what the code is doing. This is more an issue of readability than simplicity.

I'm not saying that the Ruby programmer grasps it more easily because it is terser. A Java programmer would look at Example 2 and perhaps be a bit befuddled. What I'm saying is that a Ruby programmer grasps it easily because it is idiomatic Ruby. I do not think it would be appropriate to write something like Example 2 in Java.
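To make the comparison concrete, here is a minimal runnable sketch of both styles. It assumes the times are already plain numbers (seconds since midnight), which sidesteps the parsing step that Avdi's excerpts leave out. The method names sum_with_loop, sum_idiomatic, and average_idiomatic are my own, not from his article.

```ruby
# Index-based loop, in the style of Example 1.
def sum_with_loop(times)
  sum = 0
  i = 0
  while i < times.length
    sum += times[i]
    i += 1
  end
  sum
end

# Idiomatic Ruby, in the style of Example 2.
def sum_idiomatic(times)
  times.inject(0, :+)
end

# The average that the method name in Example 2 hints at.
def average_idiomatic(times)
  sum_idiomatic(times) / times.length.to_f
end

# Hypothetical sample data: 08:00, 09:30, and 17:00 as seconds since midnight.
sample = [8 * 3600, 9 * 3600 + 1800, 17 * 3600]
puts sum_with_loop(sample) == sum_idiomatic(sample)
puts average_idiomatic(sample)
```

Both versions compute the same sum. The difference is only in how much of the bookkeeping (the index, the running total) the reader has to trace by hand.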

I would not advocate writing Java code with Ruby (like Example 1), nor would I advocate writing Ruby code with Java. Stick to the idioms and conventions of the language you are working within. And to those who say, "Well some Ruby programmers wouldn't easily grasp the essence of Example 2," I say, "You mean newbies?" Tough. They should master their tool. They should read more code written by others. That's part of being a professional programmer.

So to summarize, the distinction between essential and accidental complexity is, I think, a useful and important one in thinking about complexity of code. Second, to me the issue isn't so much simplicity as readability, and the key to readability is to stick to the idioms of your language as much as possible.

Wednesday, October 28, 2009

H1N1 "Swine" Flu

I have been confused about H1N1. The news reports that 1,000 people have died from it, but is that 1,000 out of 1,000 or 1,000 out of 17 million? There is no context to it. They report hyped-up stories about a person dying from H1N1, but then it turns out the person did not die from H1N1 but from "complications"--where their immune system was compromised and they got an additional infection. You hear about how 60% of the hospitalizations and deaths are occurring in those under 65, when with the seasonal flu it is usually the reverse (i.e. 60% of the serious cases are in those over 65).

Then if you're like us, you hear about H1N1 infections in friends and friends of friends, and they're not dying. You hear about school classrooms where 5-6 kids at a time contract it, yet there are no follow-up reports about schools where half of the kids have died...apparently they are recovering from the infections without much fanfare. You hear about doctors telling their patients that they do not need to get confirmation that they have H1N1, because they would treat it the same either way. So much for tracking the "deadly" pandemic as it spreads like wildfire across the nation.

Many health professionals are still seriously urging people to get vaccinated, but are saying that H1N1 is presenting in about the same way as the seasonal flu. Although there are some (perhaps disconcerting) differences from the seasonal flu, it is not some super deadly virus ravaging the earth's population.

The US Federal Government's Flu.gov site has a page about H1N1. Here are the highlights (if you trust the government ;) ):

  • About 70% of the people who have been hospitalized have had one or more medical conditions that placed them in the "high risk" category.
  • People over 64 do not appear to have an increased risk of complications.
  • Although there have been hospitalizations and deaths, the vast majority of people who have contracted H1N1 have recovered without medical treatment.
  • H1N1 spreads the same way as seasonal flu (coughing, sneezing, person-to-person contact). It is not an airborne, super-contagious version of the flu.

The CDC also has a page about the characteristics of H1N1. More highlights:

  • Between April and July of 2009, it is estimated that about 1 million people had been infected with H1N1, and of those 1 million about 5,000 people had been hospitalized and about 300 had died.
  • H1N1 occurs most often among 5-24 year olds.
  • Hospitalizations occur most often among 0-4 year olds.
  • Deaths occur more often among 5-24 year olds. But again the deaths usually occur in cases where there are other underlying medical conditions.
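To give those CDC estimates the context I was missing, here is a quick back-of-the-envelope calculation. It uses only the three figures quoted above; the constant names and the percent_of helper are my own, and the results are only as good as the estimates themselves.

```ruby
# CDC estimates quoted above for April through July 2009.
ESTIMATED_INFECTIONS = 1_000_000
HOSPITALIZATIONS = 5_000
DEATHS = 300

# Percentage of `count` out of `total`, rounded to three decimal places.
def percent_of(count, total)
  (100.0 * count / total).round(3)
end

puts "Hospitalized: #{percent_of(HOSPITALIZATIONS, ESTIMATED_INFECTIONS)}% of estimated infections"
puts "Died: #{percent_of(DEATHS, ESTIMATED_INFECTIONS)}% of estimated infections"
```

By these numbers roughly 1 in 200 estimated infections led to a hospitalization, and roughly 3 in 10,000 led to a death.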

Another whole set of issues, which I won't go into here, has to do with the vaccination. All I can say is that as far as I know there is nothing "different" or "untested" about this vaccine. It is prepared the same way as the vaccine for the seasonal flu; it just contains a different strain of the virus. There is more information about the vaccination at flu.gov.

I am not a health professional, statistician, or expert in any way. I may have misinterpreted something; if so, let me know. This is just food for thought, and perhaps a voice of balance among the hype.

Tuesday, October 27, 2009

Google Reader gets magic

Something I have long desired in a feed reader is magic. Not just magic, but personalized magic. I was all excited with Google Reader's "auto" sorting, which turned out to be less than useful. The problem was that it was not personalized.

I had even thought about creating a "smart" reader. Something as simple as a naive Bayes filter seemed like it would be a step in the right direction. If I can teach a computer to recognize spam, then why can't I teach it to recognize the feed articles that I enjoy? I had experimented with such a smart reader, but it was never enough of a problem for me to pursue it far enough.

Enter Google Reader's new "magic" sorting. Unlike the previous "auto" sorting, this one is personalized. It takes into account the articles that I "like", "star" and "share." I've been a big fan of true personalization. Sites like digg and reddit (and postrank in the blogosphere) are nice, but I don't want to read what the "community" finds interesting--which is often puerile 13-year-old male content--I want to read what I find interesting.
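For the curious, here is a minimal sketch of what such a filter might look like. This is not the code from my experiment and has nothing to do with how Google Reader works; the class name NaiveBayesFilter, the labels, and the sample articles are all made up for illustration. It is a plain multinomial naive Bayes classifier with add-one (Laplace) smoothing, and it uses Array#sum, so it assumes Ruby 2.4 or later.

```ruby
# A tiny naive Bayes classifier over the words of an article (illustrative only).
class NaiveBayesFilter
  def initialize
    # label => { word => number of times seen under that label }
    @word_counts = Hash.new { |counts, label| counts[label] = Hash.new(0) }
    # label => number of training articles
    @doc_counts = Hash.new(0)
    @vocabulary = {}
  end

  # Record one article under a label such as :liked or :skipped.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the label with the highest log-probability for the text.
  def classify(text)
    words = tokenize(text)
    @doc_counts.keys.max_by { |label| log_score(label, words) }
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end

  # log P(label) plus the sum of log P(word | label), with add-one smoothing.
  def log_score(label, words)
    total_docs = @doc_counts.values.sum
    total_words = @word_counts[label].values.sum
    score = Math.log(@doc_counts[label].to_f / total_docs)
    words.each do |word|
      smoothed = @word_counts[label][word] + 1
      score += Math.log(smoothed.to_f / (total_words + @vocabulary.size))
    end
    score
  end
end

filter = NaiveBayesFilter.new
filter.train(:liked, "clojure concurrency stm refs")
filter.train(:liked, "ruby idioms readability")
filter.train(:skipped, "celebrity gossip photos")
filter.train(:skipped, "funny cat pictures")
puts filter.classify("clojure refs and agents")
```

In a real reader the training calls would come from whatever signals you have (starring, sharing, or simply skipping an article), and a handful of training articles like this would be far too few to trust.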

Finally a feed reader seems to have what I'm looking for: true personalization. I plan on using this feature, and I hope I won't be disappointed.

Friday, July 10, 2009

metric_fu, rcov, and rspec

I've been setting up metric_fu on one of my projects, and I ran into a problem where the rcov report was not including my specs. (This is an old project that has both Test::Unit tests and RSpec specs.) I did lots and lots of digging, and was baffled that no one had mentioned that they could not run coverage reports for their specs using metric_fu. It seemed like an obvious and heinous oversight on the part of metric_fu.

As always, I suspected that I was missing something. Test::Unit tests can be run individually from the command line like ruby path/to/test, and metric_fu appeared to be trying to do this with the specs, but when I did ruby path/to/spec I got nothing.

It is probably very obvious to any serious RSpec user, but by reading the metric_fu specs I discovered that I needed to add require "spec/autorun" to the top of my spec_helper.rb file. This includes the magic that makes ruby path/to/spec work, and makes rcov include my specs in its report when it is invoked by metric_fu.

Tuesday, April 14, 2009

Terracotta Bug Reports

The integration work with Clojure and Terracotta has been progressing. After I brought up a couple of the issues I had encountered on the tc-dev mailing list, it turned out that I had discovered a couple of bugs in Terracotta.

These three issues were the source of some of the major changes that were required to integrate Clojure and Terracotta, and they should all be fixed in Terracotta 3.0.1:

In particular, CDV-1233 required some ugly changes in the compiler, which should now be (thankfully!) unnecessary.

I believe I also have a solution to the problem that comes about when a root Var binding is a non-portable object. Stay tuned for that!

Monday, March 30, 2009

Clojure + Terracotta Update

I have gotten to the point in my Clojure + Terracotta experiment where I believe all of the features of Clojure are functional (Refs, Atoms, transactions, etc.). I do not have a way to extensively test the Clojure functionality, but I have run the clojure.contrib.test-clojure test suites successfully, as well as some simple tests on my machine.

There are still some open issues, and given the limited extent to which I have tested this, I would not consider this production quality in the least. I would welcome help from the Clojure community in testing this integration module. I'm sure there are unexplored corners.

Because several of the changes are relatively trivial, they could be easily integrated back into the Clojure core. I have detailed as best as possible the changes I had to make to Clojure in this report: Clojure + Terracotta Integration Report

The code is available at GitHub (http://github.com/pjstadig/tim-clojure-1.0-snapshot/tree/master), and there are instructions on setting it up and running the code. If you have any difficulties or questions, please feel free to e-mail me at paul@stadig.name

Thursday, March 5, 2009

Clojure + Terracotta: We Have REPLs!

Update: The Clojure + Terracotta integration is (I believe) feature complete. Details at http://paul.stadig.name/2009/03/clojure-terracotta-update.html.

JVM #1

paul@pstadig-laptop:~/tim-clojure/tim-clojure-1.0-SNAPSHOT/examples/shared-everything$ ./bin/dso-clojure repl.clj 
Starting BootJarTool...
2009-03-05 15:00:10,868 INFO - Terracotta 2.7.3, as of 20090129-100125 (Revision 11424 by cruise@su10mo5 from 2.7)
2009-03-05 15:00:11,428 INFO - Configuration loaded from the file at '/home/paul/tim-clojure/tim-clojure-1.0-SNAPSHOT/examples/shared-everything/tc-config.xml'.

Starting Terracotta client...
2009-03-05 15:00:14,904 INFO - Terracotta 2.7.3, as of 20090129-100125 (Revision 11424 by cruise@su10mo5 from 2.7)
2009-03-05 15:00:15,436 INFO - Configuration loaded from the file at '/home/paul/tim-clojure/tim-clojure-1.0-SNAPSHOT/examples/shared-everything/tc-config.xml'.
2009-03-05 15:00:15,656 INFO - Log file: '/home/paul/terracotta/client-logs/org.terracotta.modules.sample/20090305150015636/terracotta-client.log'.
2009-03-05 15:00:17,870 INFO - Statistics buffer: '/home/paul/tim-clojure/tim-clojure-1.0-SNAPSHOT/examples/shared-everything/statistics-127.0.1.1'.
2009-03-05 15:00:18,421 INFO - Connection successfully established to server at 127.0.1.1:9510
user=> (defn foo [] 42)
#'user/foo
user=>

JVM #2

paul@pstadig-laptop:~/tim-clojure/tim-clojure-1.0-SNAPSHOT/examples/shared-everything$ ./bin/dso-clojure repl.clj
Starting BootJarTool...
2009-03-05 15:01:39,663 INFO - Terracotta 2.7.3, as of 20090129-100125 (Revision 11424 by cruise@su10mo5 from 2.7)
2009-03-05 15:01:40,225 INFO - Configuration loaded from the file at '/home/paul/tim-clojure/tim-clojure-1.0-SNAPSHOT/examples/shared-everything/tc-config.xml'.

Starting Terracotta client...
2009-03-05 15:01:45,507 INFO - Terracotta 2.7.3, as of 20090129-100125 (Revision 11424 by cruise@su10mo5 from 2.7)
2009-03-05 15:01:46,091 INFO - Configuration loaded from the file at '/home/paul/tim-clojure/tim-clojure-1.0-SNAPSHOT/examples/shared-everything/tc-config.xml'.
2009-03-05 15:01:46,275 INFO - Log file: '/home/paul/terracotta/client-logs/org.terracotta.modules.sample/20090305150146254/terracotta-client.log'.
2009-03-05 15:01:50,868 INFO - Connection successfully established to server at 127.0.1.1:9510
"user=> "(foo)
42
"user=> "*print-dup*
false
"user=> "

Commentary

This is obviously an example of the "shared everything" approach. It's neither perfect nor complete, but it's a start. There are still open problems: some non-portable classes need to be reworked; for some reason the second VM is printing the REPL prompt readably even though *print-dup* (as you can see) is false; and I still haven't worked out the problem with *in*, *out*, and *err*.

It's still very raw, but I'll see if I can't push to GitHub in the next day or two. This is an exciting first step!

Tuesday, March 3, 2009

Clojure + Terracotta: The Next Steps

Update: I've gotten *multiple* REPLs running with Terracotta. http://paul.stadig.name/2009/03/clojure-terracotta-we-have-repl.html.

In my last post about Clojure + Terracotta I gave an example of sharing specific references between JVMs through Terracotta. This is what I call the "shared somethings" approach. You specify exactly what you would like to share. Another approach is what I call the "shared everything" approach.

Shared Everything

The goal of shared everything is to have multiple VMs working within a single global context through Terracotta. All of your vars and refs would be shared by default. The canonical test case would be to define a function in one VM and have it show up automatically in another VM.

The first task was to move my work into a Terracotta Integration Module (TIM). When using a TIM, in addition to packaging the configuration for reuse, classes can be replaced with clustered versions that will work with Terracotta, without having to fork the original code base.

The second task was to replace a couple of classes in the Clojure codebase. The Namespace class uses AtomicReference, which is not supported by Terracotta. There was a minor change necessary in Var, too, because it was using its dvals field as a sentinel value to indicate that the var is not bound. This does not work with Terracotta, because dvals is a ThreadLocal, so I created a NOT_BOUND sentinel field. There were some other changes as well. I'm not going to detail all of them, but you get the idea.

At this point I had hoped that I could run a REPL and possibly even try my canonical test case, but it was not so easy. I have run into two major roadblocks:

  1. *in*, *out*, and *err*. Being I/O streams, *in*, *out*, and *err* obviously cannot be shared through Terracotta. The problem is that they are stored as Vars and interned into the clojure.core namespace. This means that Terracotta will try to share them, because they are part of the shared object graph. I could make clojure.lang.Var.root a transient field (through Terracotta's configuration file), but that would make the roots of all Vars transient, which is not what we want. Instead, what I thought I needed is some kind of TransientVar that could have a different root value (not just bindings) for each JVM. I pursued this a bit using the class replacement of the TIM, and concluded that if that is the route to go, then it should probably be made in the Clojure code (it got messy), or that at least there are some changes to the Clojure code that would ease this. What I settled on (after a suggestion from Rich) is to leave the root bindings for *in*, *out*, and *err* as nil and allow the REPL to bind them. However, the REPL did not bind them for me, so I created my own repl.clj file that binds them and calls clojure.main/repl, and it works! However, this is only a temporary solution. Whether it is creating a TransientVar class, or something else, we need a more permanent solution.
  2. Classes. I am able to connect a single JVM to Terracotta, and run the REPL, but I cannot connect multiple JVMs, nor can I disconnect and reconnect a single JVM. When an instance of Clojure connects to Terracotta, it pulls a compiled function out of the object cache, and then throws a ClassNotFoundException because it cannot find the associated class. I started to pursue modifying the DynamicClassLoader and Compiler to store the compiled classes in the Terracotta object graph, and I still think that this might be the direction to go in. However, I wanted to go ahead and share what I have and get some feedback to see if there are any other solutions.

The code is available at http://github.com/pjstadig/tim-clojure-1.0-snapshot/tree/master. In the "examples" directory I have a "shared-everything" example and a "shared-somethings" example. If you have any trouble running the examples, then let me know. There are some dead ends and some commented code that may not make sense, but my goal was to do a proof-of-concept first and to clean it up once I understand what needs to be done.

Conclusion

We are getting close to a "shared-everything" approach to integrating Clojure and Terracotta. We have some issues to deal with, but we are on our way to making this dream a reality.

Friday, February 27, 2009

Clojure + Terracotta = Yeah, Baby!

Update: I've gotten a REPL running with Terracotta. http://paul.stadig.name/2009/03/clojure-terracotta-next-steps.html.

What is Terracotta?

Terracotta provides a network-attached, virtual, persistent heap and transparent inter-JVM thread coordination. With Terracotta, you no longer need to map your objects to database tables and back. You simply hand your object to Terracotta and it will cache your data. Not only does it cache your data, but it will make your object available to a cluster of networked JVMs. Not only that, but it will also spill your objects to disk if necessary (just like Virtual Memory), so you need not worry about having gobs of memory to hold all of your objects.

What is Clojure?

Clojure is a Lisp for the JVM with a software transactional memory, and agents (asynchronous, message based concurrency). It is a functional language with immutable datatypes. It can also inter-operate with any existing Java code.

NOTE: you need to use Clojure r1310 or later, because the Keyword class needs to have hashCode implemented to play nicely with Terracotta.

Clojure + Terracotta = ?

These two seem like an interesting combination. Imagine the possibilities...kill your database, simple POJO applications, free distributed transactions, clustered JVMs with limitless memory...your hair would grow back, you'd get women, and become filthy rich...well...maybe not, but at least you'd have more fun writing software.

After some initial setup (the code and instructions are at http://github.com/pjstadig/terraclojure/tree/master/), there are two things that need to be done to integrate Clojure and Terracotta: 1) instead of running Clojure with the 'java' command, you run it with the 'dso-java.{sh,bat}' script provided with Terracotta, and 2) you need to create a configuration file that defines how your objects will be shared between JVMs.

Configuration

The configuration for Terracotta (at least in our case) consists of defining: roots, instrumented classes, auto-locks, additional boot jar classes, and servers. At this point it's probably helpful to take a peek at the config.xml file that comes with the code and follow along.

  • Roots. A root is an object that is shared between JVMs. Any objects that can be reached from the root through its object graph (objects assigned to its data members, and so on) are also shared. A common use case is to have a ConcurrentHashMap (or in our case a PersistentHashMap from Clojure) that is shared as a root. This creates a flexible hierarchy of shared objects. In Clojure's case, we also share clojure.lang.Keyword.table, so that our keywords are unique across all of the JVMs; otherwise inserting into a hash map would create multiple entries for the same keyword.
  • Instrumented classes. Any class that is shared (either directly as a root, or indirectly as a part of a root's object graph), must be instrumented. I made all of the clojure.lang.* classes instrumented. It's a bit of a broad stroke, but there aren't any performance problems that result from instrumenting too many classes. Terracotta is helpful in this case: if you end up inserting an uninstrumented class into the object graph, it'll throw a RuntimeException that explains exactly how to modify your config file to instrument that class.
  • Auto-locks. Terracotta will transparently convert your synchronized blocks into distributed transactions across all the JVMs in the cluster. Again, I made broad strokes here and just defined auto-locks for all of the methods on any clojure.lang.* class, and again, there aren't any performance penalties for auto-locking methods that don't have any synchronized blocks. I used write locks, and Terracotta has a few different types of locks that are worth looking into if you need to do something more serious. In the case of auto-locks, Terracotta will also help you out by throwing a RuntimeException if you leave out anything.
  • Additional boot jar classes. Frankly, this was something Terracotta told me to do, and I don't know exactly what is going on here. (Perhaps someone else can explain?) I think what happens is that by default Terracotta instruments the java.lang.* and java.util.concurrent.* classes, but to instrument other Java core classes you have to add them in this configuration element.
  • Servers. Terracotta is very easy to work with, and by default will just run a single server on localhost. You can define more than one server in a cluster. In my case, I only wanted one server, but I wanted to change the persistence mode. By default the persistence mode is a temporary-swap-only mode. The objects will be preserved across stopping and starting clients, but once the server is stopped, the data disappears. To have the objects persisted across restarting the server, you have to set the persistence mode to permanent-store. The temporary swap mode will be faster for data like the intermediate results of calculations, caching, etc., but if you need to permanently persist the data, then you need to use permanent-store.

There are instructions about how to run this example in the README with the code, so I won't bother to duplicate that here. I'd just like to share some of the issues I encountered, the results, and any future direction that could be taken.

Issues

The first major issue that I encountered was that Keywords weren't unique across JVMs, so I had to make clojure.lang.Keyword.table a root. This ensured that keywords are unique across JVMs, but I still ran into an issue when using keywords as keys for a PersistentHashMap. The result of identical? was true for keywords from two JVMs, but I was still getting duplicate entries in my hash map. After some debugging, I was able to determine that the issue is that the keyword class did not override the default implementation of hashCode. After mentioning this to Rich, and a quick fix in r1310, it worked nicely.
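The same class of bug is easy to reproduce outside of Clojure. Here is a loose analogy in Ruby, not the actual Clojure or Terracotta code: one class overrides equality but keeps the default identity-based hash, and a subclass adds a hash derived from the name. The class names and the entries_for helper are hypothetical.

```ruby
# Equality is overridden, but hash is inherited from Object (identity-based).
class KeywordWithoutHash
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def eql?(other)
    other.is_a?(KeywordWithoutHash) && name == other.name
  end
  alias == eql?
end

# Same equality, plus a hash derived from the name.
class KeywordWithHash < KeywordWithoutHash
  def hash
    name.hash
  end
end

# Inserts two equal keywords as keys and reports how many entries result.
def entries_for(keyword_class)
  table = {}
  table[keyword_class.new("status")] = 1
  table[keyword_class.new("status")] = 2
  table.size
end

puts "without hash override: #{entries_for(KeywordWithoutHash)} entries"
puts "with hash override: #{entries_for(KeywordWithHash)} entries"
```

With the hash override the two equal keys collapse into a single entry. Without it, the table will normally end up with a duplicate entry even though the keys compare as equal, which is the symptom I was seeing across JVMs.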

The only other major issue was how to reference Clojure vars and refs from the Java side. The main reason to do this is to define a root that will be shared by Terracotta. When Clojure code gets compiled, some Java classes get generated with mangled names. Clojure generates a class for each namespace called my/namespace/namespace__init.class, and it creates static fields on that class for various definitions (functions, vars, etc.). Those fields are named const_1, const_2, const_3, etc., so as far as I can tell there is no reliable, flexible way to predict the field name for a particular Var.

My solution was to create a simple Java class called terraclojure.Root with a couple of static fields containing refs. At first I just used that class directly to access the refs, but then I decided to actually assign the static fields to some vars in my namespace, i.e. (def *hash* terraclojure.Root/hash). This works and it makes it a little more transparent on the Clojure side. I would be happy to hear if there is another way to do this.

Result

The result of this whole experiment was that I am able to use the Software Transactional Memory with a couple of refs, and to have my changes shared across multiple JVMs. I didn't do any extensive testing to verify that transaction retries work as expected, but since Clojure uses the java.util.concurrent.* classes and standard synchronization, I don't expect there would be any issues.

Where do we go from here?

I only experimented with the STM. I didn't experiment with Agents, so that is certainly an area for future work. On the Terracotta side, I only used one server; I didn't set up a whole array of servers, nor did I try using one or more servers on different machines. All my testing was local, so the performance reflected that (it was pretty good! :)). If you do any further experimentation, then please share it on a blog or on the Google group.

Conclusion

I don't have a lot of experience with Terracotta, but it seems to be quite mature and easy-to-use. I also think that Clojure is a very exciting language, and the combination of the two opens up some interesting possibilities for how to architect highly available, scalable, database-less applications.

P.S. I have a B.S. in Computer Science and will have an M.S. in Computer Science in May. I don't do anything near this interesting at my job. If you have any need for consulting, or if you'd like to offer me a job ;), then feel free to contact me at paul@stadig.name.

Friday, February 20, 2009

Rails, respond_to, IE6, and the Accept Header

Pain...much pain caused by IE6.

If you've worked with respond_to in Rails, you know what a cool idea it is. It provides access to the same resource in different formats based on either the extension on the URL (e.g. http://something/people/1.xml), or an HTTP header that your browser sends to the web server, called the Accept header.

It sounds good, but in practice there is one particular browser (*cough* IE) that causes problems. I got into it thinking, "I don't need to worry about this 'Accept' header thing. If a user pulls up http://something/people/1 they'll get an HTML version and if they pull up http://something/people/1.xml they'll get an XML version." This fallacious (?!) reasoning works like a champ with Firefox and IE7 (I think; it's getting hazy at this point), but IE6 FAIL!

Change the order of my respond_to block? FAIL! How about forcing a sane Accept header? Sweet! It works, until I upgrade Rails and now the request headers are frozen. FAIL! (This may have been my problem because I wasn't doing it right, but it doesn't matter, there is a better way.) How about just explicitly specifying the :format for every URL in the application? Annoying, tedious, but it works, until I get a call from a user, "When I search here and click there I get a 'data dump.'" FAIL!

At this point, I may be doing something wrong. Perhaps one of the above solutions "should" have worked, but I'm mad...there has to be a better way. Can't Rails just serve HTML by default and some other format when you specify the extension? Can't Rails just ignore the Accept header? It turns out that there was a commit on June 27, 2008 that did just that. This was supposedly done for Rails 2.2, and I'm running 2.2.2, so why am I not benefiting from it? Because two weeks later it was undone. However, we're on the right track now.

Given that this is such a widely known issue, I don't know why someone hasn't posted the magic solution until now, but here it is...Are you ready? Add this line to config/environments/{test,development,production}.rb:

config.action_controller.use_accept_header = false

There...that was simple. You're welcome.
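
For the curious, here is a rough plain-Ruby sketch of the behavior I was asking for: the URL extension wins, the Accept header is consulted only if you allow it, and HTML is the fallback. This is not how Rails implements it. The pick_format helper, the MIME_TO_FORMAT table, and the sample header are all made up for illustration, and the sketch ignores q-values entirely.

```ruby
# Hypothetical illustration only; not Rails' actual format negotiation.
MIME_TO_FORMAT = {
  "text/html"       => :html,
  "application/xml" => :xml
}.freeze

def pick_format(path, accept_header, use_accept_header: true)
  # 1. An explicit extension on the URL always wins.
  ext = File.extname(path).delete_prefix(".")
  return ext.to_sym unless ext.empty?

  # 2. Optionally consult the Accept header, first known type wins.
  if use_accept_header && accept_header
    accept_header.split(",").each do |entry|
      mime = entry.split(";").first.to_s.strip
      return MIME_TO_FORMAT[mime] if MIME_TO_FORMAT.key?(mime)
    end
  end

  # 3. Otherwise serve HTML.
  :html
end

# A made-up header that lists XML before anything else.
odd_header = "application/xml, */*"

puts pick_format("/people/1.xml", "text/html")                    # xml
puts pick_format("/people/1", odd_header)                         # xml
puts pick_format("/people/1", odd_header, use_accept_header: false) # html
```

With the header ignored, the only way to get something other than HTML is to ask for it in the URL, which is exactly what I wanted.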

Friday, February 13, 2009

To 'and' or not to 'and'

Ruby has two 'or' operators ('||' and 'or'). It also has two 'and' operators ('&&' and 'and'). This can be confusing, especially to those learning the language. There is a temptation to use 'and' and 'or' because they are more readable, and I can certainly appreciate that. However, there are some serious differences between these operators, and I recommend only using '&&' and '||' in boolean expressions.

Of the two, 'and' has lower precedence than '&&', and it is the same with 'or' and '||'. This means that there is a difference between:

irb(main):001:0> true || false && false
=> true

and:

irb(main):002:0> false || true and false
=> false

You might then be tempted to just adopt the practice of always using 'or' and 'and', but that also might surprise you:

irb(main):023:0> true or false and false
=> false

This surprising result follows from the fact that, whereas '&&' has a higher precedence than '||', 'or' has the same precedence as 'and', so Ruby just evaluates the expression left to right, handling first the 'or' and then the 'and'.
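
If it helps, here is a small sketch of the three cases with the implicit grouping spelled out in comments. The method names are made up for illustration; the parenthesized forms in the comments are how Ruby groups each expression.

```ruby
# '&&' binds tighter than '||', so the right-hand pair groups first.
def symbolic_mix
  true || false && false       # grouped as: true || (false && false)
end

# 'or' and 'and' share one precedence level and group left to right.
def word_mix
  (true or false and false)    # grouped as: (true or false) and false
end

# Mixing styles: '||' binds tighter than 'and'.
def mixed
  (false || true and false)    # grouped as: (false || true) and false
end

puts symbolic_mix  # true
puts word_mix      # false
puts mixed         # false
```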

Even though you may understand the nuances between these operators, not everyone may understand, and the fact is that 99% of programmers in the world (really 100% I would hope) can understand statements involving '&&' and '||'. So let's just stick with the traditional boolean operators, because in the end it is actually more readable.

Friday, February 6, 2009

CSS: multiple class selection

I don't know how many times I've wished that I could select an HTML element that has two classes. I want to select the table rows that are both 'odd' and 'awesome'. So instead of doing this:

...
<tr class="even awesome">...</tr>
<tr class="odd awesome">...</tr>
...

I end up doing this:

...
<tr class="even awesome even_awesome">...</tr>
<tr class="odd awesome odd_awesome">...</tr>
...

I always felt dirty doing something like that, and thought there had to be a better way to do it. Well there is! It turns out that '.odd.awesome' will select those elements with both the 'odd' and 'awesome' classes.

This stylesheet:

.odd_awesome {
  color: red;
  font-size: 48pt;
}

Has now become:

.odd.awesome {
  color: red;
  font-size: 48pt;
}

And the HTML is simply:

...
<tr class="even awesome">...</tr>
<tr class="odd awesome">...</tr>
...

Now that I know this secret, I vaguely remember having known it many years ago (like when I was first introduced to CSS), but somehow I had forgotten it. It's like running into an old friend. "Hello Mr. CSS Selector! It's been a long time."

Now go simplify your HTML/CSS!