Monday, August 13, 2007

Programming in C, ummm, Haskell Here's a small Haskell program for computing factorial.
fac n = do {
    a <- auto 1;
    i <- auto n;
    while (i >. 0) $ do {
        a *= i;
        i -= 1;
    };
    a;
  }
It even runs, runE (fac 10) produces 3628800. Let's compare that to the C program doing the same thing.
int
fac(int n) {
    auto a = 1;
    auto i = n;
    while (i > 0) {
        a *= i;
        i -= 1;
    }
    return a;
}
They look rather similar, don't they? (BTW, I decided to use the heavily underused C keyword auto. If you don't know C, don't worry, auto doesn't mean anything. A local declaration auto int i; is the same as int i; in C.) I often hear people complain that C-like programming in Haskell is ugly and verbose, but I don't think that has to be the case. What people complain about is when they write code like this:
fac' n = do
    a <- newIORef 1
    i <- newIORef n
    whileM (do i' <- readIORef i; return $ i' > 0) $ do
        i' <- readIORef i
        a' <- readIORef a
        writeIORef a (a' * i')
        writeIORef i (i' + 1)
    a' <- readIORef a
    return a'
And I agree, it's ugly, it's verbose, but it's not necessary. One of the reasons it's verbose are the uses of readIORef and writeIORef. In C you just refer to the variable and you'll get an r-value when you need it, and you'll get an l-value when you need it. And you can do the same in Haskell. Let's look at a tiny version of the problem. We want the following operations:
type V a ...                     -- variables of type a
type E a ...                     -- expressions of type a
(=:) :: V a -> E a -> IO ()      -- assignment
plus :: E Int -> E Int -> E Int  -- addition
one :: E Int                     -- constant 1
Here's a tiny sample:
  x <- auto one
  x =: x `plus` x

-- the following should be a type error
  (x `plus` x) =: one
How can we make this happen? We want x to be able to be the left hand side of an assignment as well as an expression on the right hand side. But the assignment operator has different types on the left and right. So we could possibly make V the same as E. But this would be bad, because we want the left hand side to be a variable and now we'd have no way of knowing, which means that it would no longer be a type error to assign to a non-variable. So, we need a little bit of type trickery. First, we'll make V and E "subtypes" of the same type, so they have something in common.
data E' v a = ...
data LValue
data RValue
type V a = E' LValue a
type E a = E' RValue a
The type E' has an additional type argument that determines if the value is an l-value or an r-value. So what about auto, what type should it have? It needs to return something that is both an l-value and an r-value. So we'll make it polymorphic! First attempt:
auto :: E a -> IO (E' v a)
But this won't work. Saying "x <- auto 2; ..." is the same as "auto 2 >>= \ x -> ...". And when a variable is lambda bound it is not polymorphic. What does this mean? It means that we can use x as either an l-value or as an r-value inside ..., but not both; the v type variable can only have a single type. Are we stuck? Well, we would have been without higher ranked polymorphism. But here's a type that works:
auto :: E a -> IO (forall v . E' v a)
So now x will have type forall v . E' v a which is indeed polymorphic in v just as we wanted. OK, so now we have some type signatures that work (using stub implementations it's easy to check that the types work out). So now we need to implement the types and the operations. Let's start with r-values. To make a constructor that only can construct r-values we'll use a GADT. The E type will simply embed an IO value. So when we see the type E' RValue a it's really isomorphic to IO a. To extract the IO value we have the function runE.
data E' v a where
    E :: IO a -> E' RValue a

runE :: E' v a -> IO a
runE (E t) = t
So now we can do plus and one; they are totally straightforward except that we need to unwrap and wrap the E constructor.
plus :: E Int -> E Int -> E Int
plus x y = E $ do
    x' <- runE x
    y' <- runE y
    return $ x' + y'

one :: E Int
one = E $ return 1
And then the hard part, variables. To represent a variable we'll use two fields, one that reads the variable and one that assigns the variable. We need to extend the runE function to use the variable read field to get the value. The auto function allocates a new variable and then packages up the two fields.
data E' v a where
    E :: IO a -> E' RValue a
    V :: IO a -> (a -> IO ()) -> E' v a

runE :: E' v a -> IO a
runE (E t) = t
runE (V t _) = t

auto :: E a -> IO (forall v . E' v a)
auto x = do
    x' <- runE x
    r  <- newIORef x'
    return (V (readIORef r) (writeIORef r))
We're about done, just assignment left. It's easy, just use the assignment field from the V constructor.
(=:) :: V a -> E a -> IO ()
(V _ asg) =: e = do
    e' <- runE e
    asg e'
Hmmm, but there's a missing case in that definition. What about the constructor E? Can't we get a runtime failure? No. Look at the type of the first argument, it's V a, i.e., E' LValue a. The E constructor can only construct values of type E' RValue a, so we just can't get a match failure. Now our little example above compiles, and trying the bad expression (x `plus` x) =: one gives this error:
    Couldn't match expected type `LValue'
    against inferred type `RValue'
      Expected type: V a
      Inferred type: E Int
Once we've made the E data type it's totally routine to make the factorial function I first showed work. We just need to create an instance for Num (E a) and a few more functions. Is this the end of the story? Is everything rosy? Is C programming in Haskell as smooth as that? Well, there are some flies in the ointment. While it's true that programming with the E type is easy, it's a little cumbersome if we want to use pure functions. Pute functions will not have the E type and will have to be lifted into this new brave world. Likewise, IO types will have to be lifted. So while we have reduced some verbosity, it has popped up again in a different place. Which is better? I don't know, but I thought I'd share the l-value/r-value trick. Oh, and (of course), there's nothing special about the IO monad. I just used it as an example; any monad with references will work. You can parametrize E' over the underlying monad.

Labels:

5 Comments:

Blogger sigfpe said...

Hey! You're still not using "ampersand l t semicolon" in your HTML for the less than symbol.

Anyway, neat! Just for fun I was trying to embed BASIC in Haskell, pretty much the same thing as C. But you got further than me with your l/r trick.

And how elegantly can you support gotos?

Tuesday, August 14, 2007 at 2:29:00 AM GMT+1  
Blogger augustss said...

Getting 'goto' working with anything like C syntax looks a bit if a challenge. I'll settle for 'break' and 'continue' for a start. Oh, and a working 'return'. I think those are doable.

Tuesday, August 14, 2007 at 9:00:00 AM GMT+1  
Blogger Matthew N said...

Hi! I also enjoyed reading your article, and your lvalue/rvalue trick.

Your embedding reminds me of when I was writing Recipe, a little Lava library for imperative programming defined in terms of circuit primitives. To allow pure Lava functions and imperative code to be mixed, I defined an "apply" function, similar to "modifyIORef", except that pure functions over tuples of variables (or other structures of variables) could be used. Using this approach, "fac" would look something like:

fac n = do
a <- auto 1
i <- auto n

while (> 0) i $
apply (\(a, i) -> (a * i, i - 1)) (a, i)

return a

(Sorry, I can't get pre tags to work!)

I found this to be slightly more convenient than something like modifyIORef, especially considering that the semantics of parallel assigment is different to two sequential assignments. So, I just thought I'd mention it!

Tuesday, August 14, 2007 at 5:19:00 PM GMT+1  
Blogger newsham said...

I played with something like this a while ago. Mine wasn't as polished but the gist was the same:
http://www.thenewsh.com/%7Enewsham/x/impermand.hs

Wednesday, August 15, 2007 at 5:44:00 AM GMT+1  
Blogger augustss said...

The crux of my blog posting is really the fact that you can use a variable both as an lvalue and an rvalue.

Wednesday, August 15, 2007 at 8:36:00 AM GMT+1  

Post a Comment

<< Home