This post is mirrored on my blog.
It has been a while since I felt the need to add features and make some
more significant changes to Cakelisp.
Two came in this past month:
defer and CRC builds.
The defer feature is one I had been wanting for a while, but I was
unsure how to implement it cleanly.
Here's an example usage of defer:
(defun main (&return int)
  (var file (* FILE) (fopen "File.txt", "rb"))
  (defer (fclose file))
  ;; Do file operations...
  (return 0))
With the defer there, I am guaranteed to have the file closed whenever the
function returns. This removes the need to copy-paste
(fclose file) before every
return, which can be very cumbersome. It
also makes the program more reliable, because I might forget to paste it before one of them.
This feature can be found in many other new languages, including
Go. It is a simple way to
have some automatic actions without needing to add C++-style
constructors and destructors, which can be quite complicated.
In Cakelisp, macros and compile-time code modification make it tricky to
know when the code has reached the final state that will be compiled.
To implement defer, I needed to know two things:
- The commands that should be deferred, which can be specified in many ways
- Everywhere a scope exit occurs, so that the commands can be executed there
Scopes are sequences of code that will always be executed together. An
if clause can have a scope executed if the condition is true, or
(optionally) one that should be executed when the condition is false.
Loop constructs like C's
while enter and exit the loop body
scope on each iteration. Finally, functions themselves constitute a scope.
How Cakelisp code generation works
In Cakelisp, code generation happens through either a macro or a
generator. Macros output tokens. There are only four kinds of tokens:
strings, like "Hello, world!"; symbols; open
parenthesis; and close parenthesis. Cakelisp macros can run arbitrary
code, including custom validation, creating and setting compile-time
variables, etc. I have written about macros many times.
This extremely restricted world makes it simple to write the
"evaluator". When the evaluator encounters an open parenthesis token,
it expects the next token to be a symbol. If it isn't, that is a syntax
error. Otherwise, the evaluator looks up the symbol by name in its list
of known macros and generators. If one is found, it is evaluated
immediately. If not, the evaluator creates a "reference", which it
hopes to resolve later.
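The lookup rule above can be sketched in a few lines. This is a minimal Python illustration, not Cakelisp's actual code: the `Token` class, `evaluate_invocation`, the `environment` dictionary, and the `references` list are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# The four token kinds described above.
OPEN, CLOSE, SYMBOL, STRING = "open", "close", "symbol", "string"


@dataclass
class Token:
    kind: str
    text: str = ""


def evaluate_invocation(tokens, start, environment, references):
    """Handle the invocation that begins at tokens[start], an open parenthesis.

    `environment` maps names to macro/generator callables and `references`
    collects names that are not known yet. Both are hypothetical stand-ins.
    """
    assert tokens[start].kind == OPEN
    name_token = tokens[start + 1]
    if name_token.kind != SYMBOL:
        raise SyntaxError("expected a symbol after open parenthesis")
    handler = environment.get(name_token.text)
    if handler is not None:
        # Known macro or generator: evaluate it immediately.
        return handler(tokens, start)
    # Unknown for now: record a reference and hope it is resolved later.
    references.append(name_token.text)
    return None
```

Because every invocation has the same shape, the evaluator needs no grammar beyond "open parenthesis, then symbol".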
Generators output C or C++ code in the form of "string operations".
These operations have various different flags such as "double quote"
or "newline after", which are processed by the writer. The writer
simply goes operation by operation, following its flags and outputting
text into a file as requested.
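As a rough picture of string operations and the writer, here is a small Python sketch. The `StringOperation` class and its two flags are modeled on the "double quote" and "newline after" flags mentioned above, but the names and layout are assumptions, and the sketch returns a string instead of writing to a file.

```python
from dataclasses import dataclass


@dataclass
class StringOperation:
    text: str
    double_quote: bool = False   # wrap the text in "..."
    newline_after: bool = False  # emit a newline after the text


def write_operations(operations):
    """Go operation by operation, obeying each operation's flags."""
    pieces = []
    for op in operations:
        pieces.append(f'"{op.text}"' if op.double_quote else op.text)
        if op.newline_after:
            pieces.append("\n")
    return "".join(pieces)


# A generator for a C function call might emit something like this.
operations = [
    StringOperation("puts("),
    StringOperation("Hello, world!", double_quote=True),
    StringOperation(");", newline_after=True),
]
c_source = write_operations(operations)
```

Here `c_source` holds one line of C, `puts("Hello, world!");`, followed by a newline.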
How defer was implemented
Implementing defer consisted of three major parts.
First, the defer statement itself was implemented as a generator. The
generator outputs the body of the defer into a splice.
Splices are special string operations that say, "output the array of
string operations at this address". Splices accomplish a few things:
- They create "holes" that can be filled later. This is used for
invocations where Cakelisp doesn't yet know whether you are trying
to call a C function or a macro/generator that has not been defined
yet. Cakelisp generates everything in that state as if it were a
C function call; if the macro/generator is later defined, it
clears the splice's operations and replaces them with the macro/generator's output.
- They make it possible to change the output later. This enables code
modification, which is when a function has already finished being
generated, then a second pass is done at compile-time which rewrites
that function with modifications. For example,
GameLib has a compile-time
function which rewrites every Cakelisp function to add performance instrumentation.
- They create a place to stow code for other operations. This is how
defer uses them.
The defer generator outputs a single splice string operation with a
flag telling the writer that it should output the contents of that
splice on every scope exit.
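To make the "hole" idea concrete, here is a minimal Python sketch of a splice being filled, rendered, and then refilled. The `Splice` class and `render` function are hypothetical, and plain strings stand in for full string operations.

```python
class Splice:
    """An operation that means "output the operations stored in here"."""

    def __init__(self, operations=None):
        self.operations = list(operations or [])


def render(operations):
    out = []
    for op in operations:
        if isinstance(op, Splice):
            out.append(render(op.operations))  # follow the splice
        else:
            out.append(op)
    return "".join(out)


# First guess: treat the unknown invocation as a C function call.
call = Splice(["my_thing", "(", "42", ")"])
function_body = ["int x = ", call, ";"]
before = render(function_body)

# Later a macro/generator named my_thing is defined, so the splice's
# operations are cleared and replaced with that macro/generator's output.
call.operations = ["(42 * 2)"]
after = render(function_body)
```

The surrounding operations never change. Only the contents of the splice do, which is what lets the output be revised after the fact.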
Second, I needed to mark all the places where scopes enter and exit. I
was worried this would be complex, but it turned out simpler than I
expected. I had to audit all existing control flow generators (if, while,
for, etc.) and mark up
their Open and Close operations as scope-entering and scope-exiting.
return, break, and continue statements needed special handling.
Third, the writer needed to have a stack of scopes as well as discovered
defer splices. When a scope enter operation is encountered, it adds a
scope to the stack. When a
defer is encountered, it adds a pointer to
its splice to the current scope on the stack.
Scope exits are when the
defer statements need to be output. The
writer has three different ways to handle scope exits:
- If the exit is "natural", e.g. the end of an if true block is
reached, the writer simply outputs all defer splices in the
current scope before the if block's closing bracket.
- If the exit is from a return, the writer must output the defer
splices for all scopes currently on the stack, because return
exits all scopes.
- If the exit is from a continue or break, the writer outputs the
defer splices of all scopes until it hits a "continue breakable
scope", which is the start of a loop.
Finally, the writer pops the most recently entered scope off the stack
to finish the exit.
One subtle detail is that the writer always outputs separate defer
splices in reverse order within the scope. This ensures that the first
defer is always the last to be executed, in case subsequent defers
depend on it.
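The scope stack and the three kinds of exit can be sketched as follows. This is a simplified Python model, not the real writer: the operation kinds ("enter", "enter_loop", "exit", "defer", "return", "break", "text") are hypothetical names, deferred code is a plain string rather than a splice, and I am assuming that a break also runs the defers of the loop scope it stops at.

```python
def write_with_defers(operations):
    """operations is a list of (kind, text) pairs; returns the generated text."""
    out = []
    scopes = []  # each scope: {"loop": bool, "defers": [text, ...]}

    def emit_defers(scope):
        # Reverse order: the first defer in a scope runs last.
        out.extend(reversed(scope["defers"]))

    for kind, text in operations:
        if kind in ("enter", "enter_loop"):
            scopes.append({"loop": kind == "enter_loop", "defers": []})
            out.append(text)
        elif kind == "defer":
            scopes[-1]["defers"].append(text)
        elif kind == "return":
            # return leaves every scope, innermost first.
            for scope in reversed(scopes):
                emit_defers(scope)
            out.append(text)
        elif kind == "break":
            # Unwind up to and including the nearest loop scope.
            for scope in reversed(scopes):
                emit_defers(scope)
                if scope["loop"]:
                    break
            out.append(text)
        elif kind == "exit":
            # Natural exit: defers go just before the closing bracket,
            # then the scope is popped.
            emit_defers(scopes.pop())
            out.append(text)
        else:
            out.append(text)
    return "".join(out)


example = [
    ("enter", "void f() {\n"),
    ("defer", "a();\n"),
    ("defer", "b();\n"),
    ("enter_loop", "while (1) {\n"),
    ("defer", "c();\n"),
    ("enter", "if (done()) {\n"),
    ("break", "break;\n"),
    ("exit", "}\n"),
    ("exit", "}\n"),
    ("exit", "}\n"),
]
generated = write_with_defers(example)
```

In `generated`, `c();` appears both before `break;` and before the loop's closing bracket, and `b();` comes before `a();` at the end of the function. Note that this sketch still emits defers at a natural exit that directly follows a return or break, which produces unreachable code. Whether the real writer avoids that is not something I have checked.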
Adding defer did make the writer more complex, but not significantly. I
implemented it in the writer because I didn't want to add an extra
evaluator stage; as implemented,
defer is very inexpensive in terms of
performance during compile-time.
It is limited in that there is no compile-time place where the user
could analyze the final code after
defer has been applied, then make
changes to it. This is because it happens in the writing stage, which is
after any compile-time code generation or modification can occur. I will
keep it implemented as is until I find I need to do that, in which case
it will need to be moved into an evaluator stage.
My work was disrupted when I had problems with stale builds. I was trying to
create an auto-update build for the distributed-automation worker on
Windows, but the executable wasn't being updated.
Cakelisp used file modification times to decide whether an "artifact"
(an executable, object file, etc.) needed to be rebuilt. If the source
(.c file, header file, etc.) had a file modification time later than
the artifact, the artifact was out of date and had to be rebuilt.
The problem was that my Windows clock wasn't the correct time--it had
drifted into the future.[^1] When I ran a build, all the artifacts were
marked as being built at that future time. Once I set the clock to the
correct time, no artifacts would be built, because they were already
marked as being more recently modified than their source.
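The failure is easy to reproduce with plain numbers. The following Python sketch uses made-up timestamps in seconds and a hypothetical `is_stale_by_mtime` function that mirrors the timestamp-only rule.

```python
def is_stale_by_mtime(source_mtime, artifact_mtime):
    """Timestamp-only rule: rebuild when the source is newer than the artifact."""
    return source_mtime > artifact_mtime


correct_now = 1_000_000           # what the clock should have read
drifted_now = correct_now + 3600  # the clock was running an hour ahead

artifact_mtime = drifted_now      # artifact built while the clock was wrong
source_mtime = correct_now + 60   # source edited after the clock was fixed

missed_edit = not is_stale_by_mtime(source_mtime, artifact_mtime)
```

`missed_edit` is true here: the source really did change after the build, but its timestamp is still earlier than the artifact's, so nothing gets rebuilt until the real time passes the drifted one.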
This might be obvious to someone who has already written a build system.
I knew it was an issue when I wrote the timestamp system, but I figured
the clocks were reliable enough that it wouldn't matter.
Now, Cakelisp takes the CRC of every source and header file and records
it in a cache. On next build, Cakelisp checks the source files against
the recorded CRCs. If they do not match, the artifact is rebuilt. This
is slower and more cumbersome than just checking modified times, but is
absolutely necessary if the modification times cannot be trusted.
I now invalidate artifacts if the CRC is different or the source has a
newer timestamp. This lets the user e.g.
touch a file without changing
its contents to force a rebuild, for whatever reason. I may remove all
modification time code in the future, because it's not really providing
value past this.
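A combined check along these lines might look like the Python sketch below. The function names and the in-memory `crc_cache` dictionary are hypothetical, and `zlib.crc32` (CRC-32) is only an assumed stand-in, since the post does not say which CRC Cakelisp uses or how its cache is stored.

```python
import os
import zlib


def file_crc(path):
    with open(path, "rb") as source_file:
        return zlib.crc32(source_file.read())


def needs_rebuild(source_path, artifact_path, crc_cache):
    """crc_cache maps a source path to the CRC recorded at its last build."""
    if not os.path.exists(artifact_path):
        return True
    # Content check: works even when the clock cannot be trusted.
    if crc_cache.get(source_path) != file_crc(source_path):
        return True
    # Timestamp check: lets the user force a rebuild by touching the file.
    return os.path.getmtime(source_path) > os.path.getmtime(artifact_path)


def record_build(source_path, crc_cache):
    """Call after a successful build so the next check has a baseline."""
    crc_cache[source_path] = file_crc(source_path)
```

The CRC comparison alone is enough for correctness. The timestamp comparison only ever adds rebuilds, so a wrong clock can cause extra work but not a stale artifact.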
Neither of these features is flashy, but
defer is a big
quality-of-life feature, and the CRC builds are an important fix for
what was an untrustworthy build system.
I don't have anything specific planned for Cakelisp in the near future.
I am still following the strategy where I only implement things when I
have a pressing need for them, so I can't say what I'll do next.
[^1]: The clock problem consistently happens because I dual-boot Windows
and Linux on that machine. The two operating systems don't agree on
how the hardware clock should keep time, so I must manually tell
Windows to reset the clock to the network time after I've booted. I
know I could solve this problem by configuring one or the other, but
haven't gotten to it yet. It's good to have solved the problem
with timestamps either way, because time in general shouldn't be
relied on for this kind of system.