Functional Semantics in Imperative Clothing

There's an old joke about programming with pure functions:

“Eventually you have to do some effects. Otherwise you're just heating up the CPU.”

I've always wanted the purely functional Roc programming language to be delightful for I/O-heavy use cases. But when I recently sat down to port an I/O-heavy shell script from Bash to Roc, I wasn't happy with how the code felt.

Fortunately, all it took was a bit of syntax sugar to change that. Thanks to one little operator, purely functional I/O in Roc has now become a real delight!

From Bash to Roc

The shell script in question assembles some static assets for the roc-lang.org website. Here's part of the Bash version of the script:

cp -r public/ build/

# If this is set, assume we're on a build server
if [ -v GITHUB_TOKEN ]; then
    echo 'Fetching latest roc nightly...'
    # Get roc release archive
    curl -fOL https://github.com…
    # Extract archive
    ls | grep "roc_nightly" | xargs tar -xzvf
    # Delete archive
    ls | grep "roc_nightly.*tar.gz" | xargs rm
    roc='./roc_nightly/roc'
else
    # Build the `roc` CLI from source
    cargo build --release
    roc=target/release/roc
fi

$roc --version

echo 'Building site markdown content'

$roc run www/main.roc www/content/ www/build/

Here's how this code can now look in Roc, thanks to the new syntax sugar that our awesome contributor Luke Boswell recently implemented: the ! suffix.

Dir.copyAll! "public/" "build/"

pathToRoc =
    # If this is set, assume we're on a build server
    if Result.isOk (Env.var! "GITHUB_TOKEN") then
        Stdout.line! "Fetching latest roc nightly..."

        # Get Roc release archive
        filename = "nightly.tar.gz"
        Http.download! "https://github.com…" filename
        Cmd.exec! "tar" ["-xzvf", filename]
        # Delete archive
        File.removeIfExists! filename

        "./roc_nightly/roc"
    else
        # Build the `roc` CLI from source
        Cmd.exec! "cargo" ["build", "--release"]

        "target/release/roc"

roc = \args -> Cmd.exec! pathToRoc args

roc! ["--version"]

Stdout.line! "Building site markdown content"

roc! ["www/main.roc", "www/content/", "www/build/"]

I really like how this reads! It looks totally imperative, which is a nice fit for a script that's doing lots of I/O and not much else.

In fact, it's so visually similar to the Bash version, you might not even guess that the Roc version desugars to a big pile of 100% statically type-checked pure functions.

It's functional semantics in imperative clothing!

Desugaring the Sugar

To explain how this imperative-looking code can actually be compiling down to nothing but pure functions, I need to start by explaining how the ! suffix works.

It's very similar to the await keyword in other languages. For example, this line…

if Result.isOk (Env.var! "GITHUB_TOKEN") then

…might look like this in JavaScript:

if (Result.isOk(await Env.var("GITHUB_TOKEN"))) {

Before we had the ! suffix, code like this didn't look nearly as nice. The closest we had was backpassing, which was unhelpful in nested expressions; this one line would probably have been two lines instead:

result <- Env.var "GITHUB_TOKEN" |> Task.await

if Result.isOk result then

Even when conditionals weren't involved, seeing <- for some assignments and = for others, plus lots of |> Task.await, wasn't nearly as nice as the style we have now.

It might look like a minor difference when comparing one small line to another, but multiplied across the whole program, the ! version of the script felt much nicer.

So what does the ! suffix actually do? It basically desugars into two things:

A call to Task.await
An anonymous function which gets passed to that Task.await call

Let's walk through an example.

html = Http.getUtf8! url
path = Path.fromStr filename
File.writeUtf8! path html
Stdout.line! "Wrote HTML to: $(filename)"

This desugars to the following Roc code.

Task.await (Http.getUtf8 url) \html ->
    path = Path.fromStr filename
    Task.await (File.writeUtf8 path html) \_ ->
        Stdout.line "Wrote HTML to: $(filename)"

If you wanted to, you could have written the code this way and it would have compiled to exactly the same program! Going line by line:

html = Http.getUtf8! url

…becomes:

Task.await (Http.getUtf8 url) \html ->

The Task.await function plays a similar role to the await keyword in other languages: it says "wait until this Task successfully completes, then pass its output to a function." (A Task in Roc is a value that represents an asynchronous effect; the Http.getUtf8 function returns a Task.)

Tasks can be chained together using the Task.await function, similarly to how JavaScript Promises can be chained together using a Promise's then() method. (You might also know functions in other languages similar to Task.await which go by names like andThen, flatMap, or bind.)
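For comparison, here is roughly how the desugared chain above might look with JavaScript Promises, where each .then() callback plays the role of the anonymous function passed to Task.await. The functions httpGetUtf8, writeUtf8, and stdoutLine are hypothetical in-memory stand-ins (not real Node APIs), so this sketch runs without touching the network or the disk.

```javascript
// Hypothetical stand-ins for real I/O: they only record what was requested.
const log = []
const httpGetUtf8 = (url) => Promise.resolve(`<html>${url}</html>`)
const writeUtf8 = (path, text) => {
  log.push(`write ${path}: ${text}`)
  return Promise.resolve()
}
const stdoutLine = (line) => {
  log.push(`stdout: ${line}`)
  return Promise.resolve()
}

const url = "example.com"
const filename = "out.html"

// Mirrors: Task.await (Http.getUtf8 url) \html -> ...
const program = httpGetUtf8(url).then((html) => {
  const path = filename // stand-in for Path.fromStr filename
  // Mirrors: Task.await (File.writeUtf8 path html) \_ -> ...
  return writeUtf8(path, html).then(() =>
    // Last step: returned directly, no further chaining needed.
    stdoutLine(`Wrote HTML to: ${filename}`)
  )
})

program.then(() => console.log(log))
```

One difference worth keeping in mind: a Promise starts running as soon as it is created, whereas a Roc Task is only a description of work until a runtime performs it, as the rest of this post explains.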

The next line in our example was path = Path.fromStr filename, but that line was unchanged since it didn't use ! at all. The next ! was in this line:

File.writeUtf8! path html

It desugars to:

Task.await (File.writeUtf8 path html) \_ ->

Notice that here, since we didn't have html = at the start (because we don't care about the output of a file write), we also didn't have the named argument \html -> in the function being passed to Task.await. Instead we had \_ ->, which is how in Roc we write a function that ignores its argument.

It's worth noting that both Http.getUtf8 and File.writeUtf8 are operations that can fail. If they do, the whole chain of tasks will short-circuit to some error-handling code. That's part of what Task.await has always done, and the ! sugar doesn't affect error handling at all.

Finally, we had:

Stdout.line! "Wrote HTML to: $(filename)"

This desugars to:

Stdout.line "Wrote HTML to: $(filename)"

Since this is the last task in a chain, the ! doesn't do anything and isn't necessary…so we just drop it during desugaring instead of giving a compiler error or generating an unnecessary Task.await. This allows for a more consistent visual style, where async I/O operations always end in the ! suffix, but doesn't have any runtime cost.
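The desugaring rules walked through above are mechanical enough to sketch as a tiny text transformation. This is a toy written in JavaScript, not how the Roc compiler actually does it (a real compiler rewrites syntax trees rather than strings), and its step format ({ bang, bind, call } for a ! call, { text } for a plain line) is invented for illustration. Every ! call except the last becomes Task.await plus a lambda, a missing name becomes \_, plain lines pass through, and a trailing ! is simply dropped.

```javascript
// Toy desugarer for the `!` suffix (hypothetical step format, not Roc's compiler).
// A step is either { bang: true, call, bind? } for a `!` call, where `bind`
// is the name on the left of `=` if there is one, or { text } for a plain line.
function desugar(steps) {
  const lines = []
  let depth = 0

  steps.forEach((step, i) => {
    const indent = "    ".repeat(depth)
    const isLast = i === steps.length - 1

    if (!step.bang) {
      // Lines without `!` pass through unchanged.
      lines.push(indent + step.text)
    } else if (isLast) {
      // A trailing `!` is dropped: the task itself is the result.
      lines.push(indent + step.call)
    } else {
      // Any other `!` becomes Task.await plus a lambda for the rest.
      const arg = step.bind ?? "_"
      lines.push(`${indent}Task.await (${step.call}) \\${arg} ->`)
      depth += 1
    }
  })

  return lines.join("\n")
}

console.log(
  desugar([
    { bang: true, bind: "html", call: "Http.getUtf8 url" },
    { text: "path = Path.fromStr filename" },
    { bang: true, call: "File.writeUtf8 path html" },
    { bang: true, call: 'Stdout.line "Wrote HTML to: $(filename)"' },
  ])
)
```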

I/O from Pure Functions

Earlier I said that this style of Roc code "desugars to a big pile of 100% statically type-checked pure functions."

The 100% statically type-checked part is easy to explain: Roc has full type inference, so your types always get checked, but you never need to write type annotations. You can optionally add annotations anywhere you think they'll be helpful, but for this shell script I didn't think they'd be worth including. (For Roc programs that aren't shell scripts, the common practice is to annotate all top-level functions and that's usually about it.)

What about the "all pure functions" part? By definition, pure functions don't have side effects, right? (A side effect is when a function changes some state outside the function itself, like a global variable or the file system.) So…how can these functions be pure if all this I/O is happening?

It's surprisingly simple:

Each function returns a value describing what I/O it wants done.
The compiled program has a runtime which looks at those values and actually performs the I/O they describe.

There are practical benefits to separating things this way (more on those later), but to illustrate what's happening behind the scenes here, let's go back to that example from earlier:

html = Http.getUtf8! url
path = Path.fromStr filename
File.writeUtf8! path html
Stdout.line! "Wrote HTML to: $(filename)"

We already went through how that code desugars to this code:

Task.await (Http.getUtf8 url) \html ->
    path = Path.fromStr filename
    Task.await (File.writeUtf8 path html) \_ ->
        Stdout.line "Wrote HTML to: $(filename)"

This code in turn compiles down to something which looks similar to the following at runtime. (I'm using JavaScript syntax here rather than Roc, which will be convenient in the next example.)

{
  operation: "Http.getUtf8",
  args: [url],
  afterwards: (html) => {
    const path = Path.fromStr(filename)

    return {
      operation: "File.writeUtf8",
      args: [path, html],
      // Parentheses make the arrow function return an object literal
      // (without them, the braces would be parsed as a function body).
      afterwards: () => ({
        operation: "Stdout.line",
        args: [`Wrote HTML to: ${filename}`],
        afterwards: () => ({ operation: "done" })
      })
    }
  }
}

Nothing but nested object literals where one field is a function that returns another object literal. No side effects anywhere! (This structure isn't exactly how Roc represents Task values in memory—the operation isn't a string, for example—but let's go with it for simplicity's sake.)
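Under that same simplified representation, Task.await itself can be written as an ordinary pure function that rebuilds the structure instead of performing anything. The sketch below is not Roc's actual implementation: it assumes two extra operations that don't appear above, "done" carrying a value and "fail" carrying an error, so that it can show both chaining and the short-circuiting on failure described earlier.

```javascript
// Simplified Task representation (assumed for illustration):
//   { operation: "done", value }      finished successfully
//   { operation: "fail", error }      finished with an error
//   { operation, args, afterwards }   an I/O request plus its continuation
const succeed = (value) => ({ operation: "done", value })
const fail = (error) => ({ operation: "fail", error })

// Pure: returns a new description of work; no I/O happens here.
const taskAwait = (task, next) => {
  if (task.operation === "done") return next(task.value)
  if (task.operation === "fail") return task // short-circuit: skip `next`
  return {
    ...task,
    // Once the runtime supplies this step's output, keep chaining.
    afterwards: (output) => taskAwait(task.afterwards(output), next),
  }
}

// A primitive I/O request whose output becomes the task's value.
const request = (operation, ...args) => ({
  operation,
  args,
  afterwards: (output) => succeed(output),
})

// Roughly: html = Http.getUtf8! url, then File.writeUtf8! "out.html" html
const program = taskAwait(request("Http.getUtf8", "example.com"), (html) =>
  taskAwait(request("File.writeUtf8", "out.html", html), () =>
    succeed("finished")
  )
)

console.log(program.operation) // prints Http.getUtf8
```

Because taskAwait only wraps afterwards functions, building program performs no I/O at all; the nesting appears one layer at a time as a runtime feeds outputs into each afterwards.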

Implementing a Runtime

Now let's look at how a runtime can translate those nested object literals into I/O.

In Node.js I could do this by writing a loop which:

Starts with one of these "Task" values to run
Looks at the task's operation field and performs the requested I/O operation
Calls the function in the task's afterwards field, passing the output of that I/O operation. (This function will return another Task value.)
Loops back to the start to repeat this process until we encounter a Task whose operation field is "done", which tells us we're done.

Here's how that would look in code:

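const fs = require("fs")

// `task` holds the Task value to run. `httpRequest` is a hypothetical
// blocking HTTP client used to keep this loop synchronous.
while (task.operation !== "done") {
  if (task.operation === "Http.getUtf8") {
    const [url] = task.args
    const response = httpRequest(url)

    task = task.afterwards(response.text())
  } else if (task.operation === "File.writeUtf8") {
    const [path, content] = task.args
    fs.writeFileSync(path, content)

    task = task.afterwards()
  } else if (task.operation === "Stdout.line") {
    const [line] = task.args
    console.log(line)

    task = task.afterwards()
  } else {
    // Without this, an unrecognized operation would loop forever.
    throw new Error(`Unknown operation: ${task.operation}`)
  }
}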

Although this would work, Node encourages doing asynchronous I/O instead of the synchronous I/O we've done here.

Fortunately, one of the benefits of representing effects as values that hold "afterwards" continuation functions like this is that the Task value can also be translated into async I/O. Here's the same Node code done in an asynchronous callback style instead—and using recursion instead of a while loop.

const fs = require("fs")

const interpretTask = (task) => {
  if (task.operation === "Http.getUtf8") {
    const [url] = task.args

    // fetch (global in Node 18+) returns a Promise, so the continuation
    // is passed as a callback to .then()
    fetch(url)
      .then((response) => response.text())
      .then((text) => {
        const nextTask = task.afterwards(text)
        interpretTask(nextTask)
      })
  } else if (task.operation === "File.writeUtf8") {
    const [path, content] = task.args

    fs.writeFile(path, content, (err) => {
      if (err) throw err

      const nextTask = task.afterwards()
      interpretTask(nextTask)
    })
  } else if (task.operation === "Stdout.line") {
    const [line] = task.args
    console.log(line)

    const nextTask = task.afterwards()
    interpretTask(nextTask)
  } else if (task.operation === "done") {
    // Don't recurse. We're done!
  }
}

The actual I/O can be implemented in any number of styles: Promises, for example, or async/await. Outside of JavaScript, it could be done with high-performance, low-level asynchronous I/O primitives provided by the operating system, like Linux's io_uring, used from C, Zig, or Rust (including inside async runtimes like Tokio).

Separating these I/O descriptions from the runtime that performs the actual I/O lets you drop in whatever async I/O runtime system you want, without having to change your application code at all!

In fact, one of the main motivations for representing effects as values like this is so that future Roc platforms can do all this "traversing the data structure" work behind the scenes to quietly give you excellent async I/O performance that's potentially even tailored to a particular domain (e.g. web servers, CLIs, games), while your application code gets to look as straightforward as it would in any imperative language:

html = Http.getUtf8! url
path = Path.fromStr filename
File.writeUtf8! path html
Stdout.line! "Wrote HTML to: $(filename)"

Platform authors can also use this representation to offer features like "dry-run mode" in which all the requests for disk I/O are performed on a fake in-memory filesystem (perhaps using the current state of the real filesystem for its initial structure) so that scripts can be tried out without their "I/O operations" affecting the actual disk. Or automatic logging of all I/O operations, with the application code specifying the logging system to use. The list goes on!
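Here is a minimal sketch of the dry-run and logging ideas, assuming the simplified Task objects from earlier (with the path kept as a plain string). The interpreter uses async/await and receives its I/O handlers as a parameter, so the same Task could be run against real Node APIs or, as here, against a hypothetical dry-run handler set that writes to an in-memory Map and logs every requested operation. The names runTask, dryRunHandlers, and makeTask are made up for illustration.

```javascript
// Runs a simplified Task value, delegating each operation to `handlers`.
async function runTask(task, handlers) {
  while (task.operation !== "done") {
    const handler = handlers[task.operation]
    if (!handler) throw new Error(`Unknown operation: ${task.operation}`)
    const output = await handler(...task.args)
    task = task.afterwards(output)
  }
}

// Hypothetical dry-run handlers: canned pages instead of the network,
// and an in-memory Map instead of the real disk.
function dryRunHandlers(fakePages) {
  const files = new Map()
  const log = [] // every operation the program requested, in order

  // Wraps a handler so each call is recorded before it runs.
  const logged = (name, fn) => async (...args) => {
    log.push(`${name}(${args.map((arg) => JSON.stringify(arg)).join(", ")})`)
    return fn(...args)
  }

  const handlers = {
    "Http.getUtf8": logged("Http.getUtf8", (url) => fakePages[url] ?? ""),
    "File.writeUtf8": logged("File.writeUtf8", (path, text) => {
      files.set(path, text)
    }),
    "Stdout.line": logged("Stdout.line", (line) => console.log(line)),
  }

  return { handlers, files, log }
}

// The example task from earlier, built for a given url and filename.
const makeTask = (url, filename) => ({
  operation: "Http.getUtf8",
  args: [url],
  afterwards: (html) => ({
    operation: "File.writeUtf8",
    args: [filename, html],
    afterwards: () => ({
      operation: "Stdout.line",
      args: [`Wrote HTML to: ${filename}`],
      afterwards: () => ({ operation: "done" }),
    }),
  }),
})

const dryRun = dryRunHandlers({ "example.com": "<h1>hi</h1>" })
const finished = runTask(makeTask("example.com", "out.html"), dryRun.handlers)
finished.then(() => {
  console.log(dryRun.files.get("out.html"))
  console.log(dryRun.log)
})
```

Swapping dryRunHandlers for a set backed by fetch and fs.promises.writeFile would run the identical Task for real; only the handlers object changes.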

Besides platforms being able to apply different I/O runtimes to the same application, the functional semantics underneath the sugar have benefits for application authors too. For example, they unlock nicer testability.

Testability

We've seen how individual functions can be pure while resulting in an overall program that does I/O. But at the end of the day, if the code is just resulting in the I/O being performed anyway, what could possibly make it easier to test?

The key is that we can run a test on the value being returned, instead of handing it off to the runtime. That means no actual I/O gets performed, and the test is completely deterministic—yet all of the actual logic around the I/O can be tested!

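For example, here's the kind of test I can write using only Task values (written against the simplified representation from earlier, with its operation, args, and afterwards fields):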

expect task.operation == "Http.getUtf8"
expect task.args == ["example.com/something"]

fakeResponse = "<html><body>testing!</body></html>"
next = task.afterwards fakeResponse

expect next.operation == "File.writeUtf8"
expect next.args == [Path.fromStr filename, fakeResponse]

This test will run extremely quickly, and it will never flake. All it does is look at values!

I could also write a property test that runs this many times with randomly generated inputs and verifies that (for example) no matter what HTML the server returns, exactly that content is what gets written to the file. I could simulate a third-party server I have no control over timing out or returning 500 errors, and verify that my application handles that correctly…all without actually contacting that server!

(By the way, you can already write tests in Roc using the built-in expect keyword and the roc test CLI command, although writing simulation-style tests of Tasks relies on a language feature that hasn't been fully implemented yet. I plan to write about that after it ships!)

Of course, many programming languages have ways to test logic without actually running I/O, such as monkey patching, mocking, and so on. What I like about this "simulation" style is that I don't have to guess which APIs need to be monkey patched, I don't have to make my implementation more generic than I want it to be (just so I can swap in test doubles), and the simulation works just fine even if third-party packages are involved in assembling the Task values.

To be fair, any language can benefit from representing effects as values without going as far as to make all functions in the language pure, but there are separate practical benefits to having all functions be pure. Some of the other benefits take longer to explain than testability, so I'll stick to just that one example in this article…but I'd like to write more about some of the others in the future!

Trying Out Roc

If you're intrigued by this "functional semantics in imperative clothing" idea and want to give Roc a try for yourself, the tutorial is the easiest way to get up and running. It takes you from no Roc experience whatsoever to building your first program, while explaining language concepts along the way.

I also highly recommend dropping in to say hi on Roc Zulip Chat. There are lots of friendly people on there, and we love to help beginners get started with the language!

Finally, if you'd like to meet up with a bunch of Roc enthusiasts in person, there will be 3 different Roc talks at Software You Can Love 2024 in Milan this May, and we expect it to be the largest in-person gathering of Roc programmers to date. It's going to be amazing!