Tuesday, August 30, 2016

"Clojure Polymorphism" Released!

From my new blog Real World Clojure. What am I doing with this new blog? I have no idea, but you can follow along.

~ ~ ~ ~

I have released a short e-book (30 pages) titled "Clojure Polymorphism." You can get 50% off by using this coupon link http://www.leanpub.com/clojurepolymorphism/c/ONeJZ629Isy7.
What is this book about?
When it comes to Clojure there are many tutorials, websites, and books about how to get started (language syntax, set up a project, configure your IDE, etc.). There are also many tutorials, websites, and books about how language features work (protocols, transducers, core.async). There are precious few tutorials, websites, and books about when and how to use Clojure's features.


This is a comparative architecture class. I assume you are familiar with Clojure and even a bit proficient at it.  I will pick a theme and talk about the tools Clojure provides in that theme.  I will use some example problems, solve them with different tools, and then pick them apart for what is good and what is bad.  There will not be one right answer.  There will be principles that apply in certain contexts.
In this installment, I pick up the theme of "Polymorphism," looking at the tools of polymorphism that Clojure provides. Then I take a couple of problems and solve them several ways. At the end of it all, we look back at the implementations and extract principles. The end goal is for you to develop an understanding of tradeoffs and a taste for good Clojure design.


I have some ideas for other e-books. Perhaps a concurrency tour of Clojure taking a look at futures, STM, reducers, core.async, etc. Or maybe talk about identity by looking at atom, agent, ref, volatile!, etc. Or maybe look at code quality tools. Or how to organize namespaces. Or adding a new data structure with deftype?

What would you like to see? Contact me. :)

Friday, August 19, 2016

Reducible Streams

Laziness is a great tool, but there are some gotchas. The classic:

;; Assumes (require '[clojure.java.io :as io]) and that some-file names a readable file.
(with-open [f (io/reader (io/file some-file))]
  (line-seq f))

line-seq will return a lazy seq of lines read from some-file, but if the lazy seq escapes the dynamic extent of with-open, then you will get an exception:

IOException Stream closed  java.io.BufferedReader.ensureOpen (BufferedReader.java:115)

With laziness, the callee produces data, but the caller controls when that data is produced. However, sometimes the data has associated resources that must be managed. Leaving the caller in control of when data is produced means the caller must know about and manage those resources. Using a lazy sequence is like using coroutines that pass control back and forth between the caller and callee, but control is only transferred for each item; there is no way to run a cleanup routine after the caller has decided to stop consuming the sequence.
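
To see the failure without touching the filesystem, here is a minimal sketch that reads from an in-memory string. The name leaky-lines and the sample text are made up for illustration; the behavior shown is that of a BufferedReader that has been closed before the lazy seq is realized.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.StringReader)

;; Hypothetical helper: returns the lazy seq *after* with-open has closed the reader.
(defn leaky-lines [text]
  (with-open [rdr (io/reader (StringReader. text))]
    (line-seq rdr)))

(def lines (leaky-lines "one\ntwo\nthree"))

;; line-seq reads the first line eagerly, so this still works.
(first lines)

;; Realizing any further line calls .readLine on the closed reader and throws.
(try
  (doall lines)
  (catch java.io.IOException e
    (.getMessage e)))
```

Note that the first element gives no hint of trouble, which is part of what makes this gotcha easy to miss in quick REPL experiments.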

A Tempting Solution

One might immediately think about putting the resource control into the lazy seq:

(defn my-line-seq* [rdr [line & lines]]
  (if line
    (cons line (lazy-seq (my-line-seq* rdr lines)))
    (do (.close rdr)
        nil)))

(defn my-line-seq [some-file]
  (let [rdr (io/reader (io/file some-file))
        lines (line-seq rdr)]
    (my-line-seq* rdr lines)))

This way the caller can consume the sequence how it wants, but the callee remains in control of the resources. The problem with this approach is the caller is not guaranteed to fully consume the sequence, and unless the caller fully consumes the sequence the file reader will never get closed.
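
The leak is easy to observe if the reader is swapped for something we can inspect. The sketch below is a hypothetical generalization of my-line-seq* (the name closing-seq and the close! callback are not from any library): it runs a cleanup function only when the sequence is exhausted, and an atom stands in for the file reader.

```clojure
;; Hypothetical generalization of my-line-seq*: run close! once the seq is exhausted.
(defn closing-seq [close! coll]
  (lazy-seq
    (if-let [s (seq coll)]
      (cons (first s) (closing-seq close! (rest s)))
      (do (close!) nil))))

;; An atom stands in for "the reader has been closed".
(def closed? (atom false))
(def items (closing-seq #(reset! closed? true) ["a" "b" "c"]))

(first items)   ; the caller takes one item and stops
@closed?        ; still false: the cleanup has not run

(dorun items)   ; only walking to the very end triggers the cleanup
@closed?        ; now true
```

A caller that does (take 1 ...) and drops the rest would leave the resource open until the process exits or the reader is garbage collected.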

An Actual Solution

There is a way to fix this. You can require the caller to pass in a function to consume the generated data, then the callee can manage the resource and execute the function. It might look something like:

(defn process-the-file [some-file some-fn]
  (with-open [f (io/reader (io/file some-file))]
    (doall (some-fn (line-seq f)))))

(process-the-file my-file-name do-the-things)

clojure.java.jdbc once had a with-query-results macro that exposed a lazy seq of query results, and it had these same resource management issues. It was later changed to use this second approach, where you pass in functions.

There is a hitch to this approach. Now the callee has to know more about how the caller's logic works. For instance, in the above code you are assuming that some-fn returns a sequence that you can pass to doall, but what if some-fn reduces the sequence of lines down to a scalar value? Perhaps process-the-file could take two functions seq-fn and item-fn:

(defn process-the-file [some-file item-fn seq-fn]
  (with-open [f (io/reader (io/file some-file))]
    (seq-fn (map item-fn (line-seq f)))))

(process-the-file my-file-name do-a-thing identity)

That's better? I still see two problems:
  1. The caller is back to having to know/worry about resource management, because it could pass a seq-fn that does not fully realize the lazy seq before it escapes the with-open.
  2. The logic hooks that process-the-file provides may never be quite right. What about a hook for when the file is open? How about when it is closed?
I could argue that this whole situation is worse, since the caller still has to worry about resource management, and now the callee has this additional burden of trying to predict all of the logic hooks the caller might want.
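
The first problem can be reproduced directly. This is a sketch under a few assumptions: it repeats the two-function process-the-file from above, writes a throwaway temp file (the name tmp and its contents are made up), and uses string/upper-case as a stand-in for item-fn.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; Same shape as the two-function process-the-file above.
(defn process-the-file [some-file item-fn seq-fn]
  (with-open [f (io/reader (io/file some-file))]
    (seq-fn (map item-fn (line-seq f)))))

;; Throwaway three-line file so the example is self-contained.
(def tmp (doto (java.io.File/createTempFile "lines" ".txt")
           (.deleteOnExit)))
(spit tmp "a\nb\nc\n")

;; seq-fn realizes everything inside with-open: safe.
(process-the-file tmp string/upper-case doall)

;; seq-fn collapses the lines to a scalar inside with-open: also safe.
(process-the-file tmp string/upper-case count)

;; seq-fn hands back the lazy seq untouched: it escapes with-open,
;; and realizing it later hits the closed reader.
(try
  (doall (process-the-file tmp string/upper-case identity))
  (catch java.io.IOException e
    :stream-closed))
```

Nothing in the signature of process-the-file tells the caller which of these seq-fn choices is safe, which is the sense in which the caller is still responsible for resource management.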

An additional design consequence is that you are inverting control from what it was in the lazy seq case. Whereas before the caller had control over when the data is consumed, now the callee does. You have to break your logic up into small chunks that can be passed into process-the-file, which can make the code a bit harder to follow, and you must put your fragmented logic close to the call site for process-the-file (i.e. you cannot take a lazy sequence from process-the-file and pass it to another part of your code for processing). There are advantages and disadvantages to this consequence, so it is not necessarily bad; it is just something you have to consider.

Another Solution

We can also solve this by using a different mechanism in Clojure: reduction. Normally you would think of the reduction process as taking a collection and producing a scalar value:

(defn process-the-file [some-file some-fn]
  (with-open [f (io/reader (io/file some-file))]
    (reduce (fn [a v] (conj a (some-fn v))) [] (line-seq f))))

(process-the-file my-file-name do-a-thing)

While this may look very similar to our first attempt, we have some options for improving it. Ideally we'd like to push the resource management into the reduction process and pull the logic out. We can do this by reifying a couple of Clojure interfaces, and by taking advantage of transducers.

If we can wrap a stream in an object that is reducible, then it can manage its own resources. The reduction process puts the collection in control of how it is reduced, so it can clean up resources even in the case of early termination. When we also make use of transducers, we can keep our logic together as a single transformation pipeline, but pass the logic into the reduction process.
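
As a rough illustration of the idea (this is a hypothetical sketch, not the implementation used by any library), a reducible wrapper can be built by reifying clojure.lang.IReduceInit. The name reducible-lines and the open-reader argument, a zero-argument function that returns a fresh BufferedReader, are assumptions made for this example.

```clojure
(import '(java.io BufferedReader StringReader))

;; Hypothetical sketch. open-reader is a zero-argument function returning a
;; fresh BufferedReader, so the resource is acquired only when a reduction starts.
(defn reducible-lines [open-reader]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [^BufferedReader rdr (open-reader)]
        (loop [acc init]
          (if (reduced? acc)
            @acc                            ; early termination: unwrap and stop
            (if-let [line (.readLine rdr)]
              (recur (f acc line))
              acc)))))))                    ; end of input

;; Remember the reader so we can check afterwards that it was closed.
(def last-reader (atom nil))

(def lines
  (reducible-lines
    #(reset! last-reader (BufferedReader. (StringReader. "a\nbb\nccc\ndddd")))))

;; take stops the reduction early; with-open still closes the reader.
(into [] (comp (map count) (take 2)) lines)
```

Because the collection drives the loop, the with-open surrounds the whole reduction, and both normal completion and a reduced value pass through the same cleanup. One design difference worth noting: this sketch opens a new reader per reduction, whereas a wrapper around an already-open stream can only be consumed once.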

I have created a library called pjstadig/reducible-stream, which will create this wrapper object around a stream. There are several functions that will fuse an input stream, a decoding process, and resource management into a reducible object. Let's take a look at them:
  • decode-lines! will take an input stream and produce a reducible collection of the lines from that stream.
  • decode-edn! will take an input stream and produce a reducible collection of the objects read from that stream (using clojure.edn/read).
  • decode-clojure! will take an input stream and produce a reducible collection of the objects read from that stream (using clojure.core/read).
  • decode-transit! will take an input stream and produce a reducible collection of the objects read from that stream.
Finally, there is a decode! function that encapsulates the general abstraction, and can be used for some other kind of decoding process. Here is an example of the use of decode-lines!:

(into []
      (comp (filter (comp odd? count))
            (take-while (complement #(string/starts-with? % "1"))))
      (decode-lines! (io/input-stream (io/file "/etc/hosts"))))

This code will split /etc/hosts into lines, keeping only lines with an odd number of characters, until it finds such a line that starts with the character '1'. Whether the process consumes the entire file or not, the input stream will be closed.

Advantages:
  • This reducible object can be created and passed around to other bits of code until it is ready to be consumed.
  • When the object is consumed either partially or fully the related resources will be cleaned up.
  • Logic can be defined separately and in total (as a transducer), and can be applied to other sources like channels, collections, etc.
Disadvantages:
  • This object can only be consumed once. If you try to consume it again, you will get an exception because the stream is already closed.
  • If you treat this object like a sequence, it will fully consume the input stream and fully realize the decoded data in memory. In certain use cases this may be an acceptable tradeoff for having the resources automatically managed for you.
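
To illustrate the point about defining logic separately, here is the transducer from the decode-lines! example given a name and applied to an ordinary vector. The name odd-lines-before-1 and the sample data are made up for this sketch.

```clojure
(require '[clojure.string :as string])

;; The pipeline from the decode-lines! example, named so it can be reused.
(def odd-lines-before-1
  (comp (filter (comp odd? count))
        (take-while (complement #(string/starts-with? % "1")))))

;; Made-up sample data standing in for the lines of a hosts file.
(def sample ["abc" "ab" "x" "127.0.0.1 localhost" "zzz"])

;; The same transducer works eagerly with into...
(into [] odd-lines-before-1 sample)

;; ...and lazily with sequence, with no stream involved at all.
(sequence odd-lines-before-1 sample)
```

Since the filter runs before the take-while, an even-length line that starts with "1" is dropped rather than ending the pipeline.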

Summary

Clojure affords you several different tools for deciding how to construct your logic and manage resources when you are processing collections. Laziness is one tool and it has advantages and disadvantages. Its main disadvantage is around managing resources.

By making use of transducers and the reduction process in a smart way, we can produce an object that can manage its own resources while also allowing collection processing logic to be defined externally. The library pjstadig/reducible-stream provides a way to construct these reducible wrappers with decoding and resource management fused to a stream.

Acknowledgments


Special hat tip to hiredman. His treatise on reducers is well worth the read. Many moons ago it got me started thinking about these things, and I think with transducers on the scene, the idea of a collection managing its own resources during reduction is even more interesting.

Monday, May 9, 2016

The March of Technology

"Our inventions are wont to be pretty toys, which distract our attention from serious things. They are but improved means to an unimproved end, an end which it was already but too easy to arrive at; as railroads lead to Boston or New York. We are in great haste to construct a magnetic telegraph from Maine to Texas; but Maine and Texas, it may be, have nothing important to communicate. Either is in such a predicament as the man who was earnest to be introduced to a distinguished deaf woman, but when he was presented, and one end of her ear trumpet was put into his hand, had nothing to say. As if the main object were to talk fast and not to talk sensibly. We are eager to tunnel under the Atlantic and bring the Old World some weeks nearer to the New; but perchance the first news that will leak through into the broad, flapping American ear will be that the Princess Adelaide has the whooping cough. After all, the man whose horse trots a mile in a minute does not carry the most important messages; he is not an evangelist, nor does he come round eating locusts and wild honey. I doubt if Flying Childers ever carried a peck of corn to mill."
Thoreau, Henry David. Walden, and on the Duty of Civil Disobedience. Project Gutenberg. Web. 09 May 2016. https://www.gutenberg.org/

Or in the words of a more modern philosopher and poet:

Saturday, March 5, 2016

Making Fake Things

Software is fake. There are bits inside a computer represented by a magnetic or electrical charge or mechanical potential or some such thing. But software is not an electrical charge. Electrical charges can represent ones and zeroes, and a series of ones and zeroes like "10111101" can represent the JVM opcode "anewarray", or the fraction one-half ("½") in the ISO-8859-1 character encoding, or the number -67 in two's complement. Software is not electrical charges; it is a particular interpretation imposed on electrical charges. An interpretation does not weigh anything. It has no color, taste, temperature, volume, mass, or any physical features. It is fake, but fake things can be useful.
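
For the curious, the interpretations of "10111101" mentioned above can be checked at a Clojure REPL. This is only a small sketch; the var name bits is made up.

```clojure
;; The bit pattern from the paragraph above, written as a binary literal.
(def bits 2r10111101)

;; Read as an unsigned number: 189, i.e. 0xBD, the value the JVM uses for anewarray.
bits

;; Read as a signed 8-bit two's-complement value.
(unchecked-byte bits)

;; Read as a single ISO-8859-1 encoded character.
(String. (byte-array [(unchecked-byte bits)]) "ISO-8859-1")
```

The bits never change between the three expressions; only the interpretation imposed on them does.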

Fake things can represent real things (or other fake things). For example, you can represent a couch with a 3D model in a computer. You can represent cities and towns and roads with fake things. You can also represent fake things with other fake things. JVM opcodes, characters, and numbers are all fake things represented by "10111101", which is fake. Fake things are useful because they can represent real and fake things in a way that can be cheaply manipulated and transported instantly across the world. Fake things also have challenges.

Software is a little unique even among fake things because in making software we are often making something that has never existed before. When someone creates a stove there are hundreds of thousands of other stoves in existence to draw upon. There are wood stoves, electric stoves, and gas stoves. But when someone created a text editor, they created something that had never existed before. Here is how Richard Gabriel describes it:

"But, consider the first people to design and build a text editor. Before that, there was never a text editor. Changes to a manuscript were always made by retyping or retypesetting. How would people want to make textual changes? How would people want to navigate? Searching? - no one ever heard of that before. Systematic changes? Huh? By the way, there were no display terminals, so how do you even look at the manuscript?" -- http://dreamsongs.com/LessonsFromNothing.html

Web applications, virtual currencies, automated theorem provers, and many other software applications had never existed before or were so different in nature from their physical counterparts that they were a unique thing. Making fake things is hard enough, but making things that have never existed before is that much harder. That's not the end of it, though.

Fake things have no real world to help co-design them. Stoves have a real world to help co-design them. There are accessories that are used with stoves that help co-design them. Real things like pots and pans. Stoves have to fit through doorways, nestle between kitchen cabinets, and match the colors on the walls. Text editors have accessories like keyboards and mice that were invented to give real people made of meat a way of manipulating a conceptual world by proxy. Perhaps a mouse has to be compatible with a human hand, but a text editor has to be compatible with the mental model of a text editor that exists in a human mind, a model which no one had ever thought of before. Ultimately making software is a process of collaborating with other humans to dream up some mental model, and then making a fake thing out of software that other humans can use to manipulate that model (assuming they properly understand the mental model).

Which reminds me, collaboration is also a fake thing. Collaboration is about using real things, like vibrating air, to push around fake things, like words. It is about using real things, like markers and whiteboards, to manipulate fake things, like ideas. All of these real things can be replaced by fake things, like video conferencing software and text editors. And fake things like words and ideas can be replaced by other fake things, and all of these fake things can be instantly transported, copied, and manipulated by real people in real (and very distant) places. Collaboration is not a real thing, it is a fake thing produced through the interaction of real people thinking creatively.

And making software is a creative act. Writing software is writing instructions to make a computer do something. You must choose the instructions, determine their order, name things. You develop your own style. Writing software is writing words that have effect. Writing software is as close as you can get to God with words speaking reality into existence, the ultimate creative act. But writing software is not just for telling computers what to do. It is also collaboration with other humans. They must read, understand, modify, and extend what you write. They must understand your vision. You must collaborate with them through your source code.

So, here we are. We have discovered that software is a fake thing, that it is often an entirely new thing, that it is a pure product of the mind, that it is born of collaboration, and it is creative expression. Now what? We must systematically question the constraints we place on ourselves, because those constraints are often meant for real things and our things are fake. Here are a few examples:

A top-down management hierarchy is for making real things, not fake things. Top-down, command-and-control hierarchies are about control and efficiency. Control and efficiency are important for real things, because real things have locality, cost, and scarcity. Software has none of these things. Control and efficiency are important when you are manufacturing the same thing over and over. Software is often exploratory. Software is valuable not because we repetitively make lots of little copies of the same thing, but because we dream up some new way of doing things that has never been done before. Control and efficiency are important when you have a predictable process. A creative process is not predictable. You may think for hours about a problem, sleep on it, and then have the answer pop into your head the instant you wake up. We need to think differently, not just about what we make, but how we make it.

Offices are about locality. An office puts materials, means of production, and managers in the same physical location. Yet with software there is no material and the means of production are mental. There is no reason to be concerned about locality. Ostensibly having a bunch of people in the same office enables them to collaborate, but collaboration is a fake thing. Collaboration does not exist in San Francisco or Saint Louis. It does not weigh 1kg. It is not blue. Having an office for collaboration is a rationalization that projects the past onto the future. Is collaboration different using video conferencing and Google Docs than it is using tables and chairs in an office? Yes, because fake things are different than real things. I do not recommend mixing fake things like video conferencing with real things like offices. It may take getting used to, but embracing the fakeness of collaboration has advantages like hiring people where they want to live instead of trying to convince them to live where you live. It also means having permanent, searchable, modifiable artifacts that can be shared instantly across the world, instead of a whiteboard in a room.

Software can process data, but software is also data. This creates leverage. You can flip a bit, and that bit can flip ten others, and those ten another one hundred, etc. Compilers, build tools, continuous integration, and automated tests are all software doing things to software. "The cloud" has created a lot of leverage because it took something that was real (a machine) and made it fake (a "cloud instance"), and once it is fake it can be manipulated by software. The higher you can climb the mountain of abstraction the more powerful you will become. Before selling to Facebook, WhatsApp had ~450 million active users and ~55 employees. Yahoo has ~12,500 employees. I don't know how many active users they have, but let's just pretend it is ~450 million. Don't be Yahoo.

These are just examples, and you can agree or disagree. My point is, we as an industry can achieve market success and realize our visions much more powerfully, but we must understand the nature of the software we are creating (it is fake), and the newness of what we are doing every day, and its collaborative nature, and the tools that we can take advantage of, and we must have the courage to give up on arbitrary constraints that are optimized for making real things. We must pursue leverage, because leverage will enable us to do amazing things.