Handmade Seattle

November 16 - 18. In person and online.

We are a community of programmers producing quality software through deeper understanding.

Originally inspired by Casey Muratori's Handmade Hero, we have grown into a thriving community focused on building truly high-quality software. We're not low-level in the typical sense. Instead we realize that to write great software, you need to understand things on a deeper level.

Modern software is a mess. The status quo needs to change. But we're optimistic that we can change it.

Around the Network

New forum thread: GUI problems
desiredusername
New forum thread: RemedyBG 0.3.8.5
x13pixels
longtran2904
Christoffer Lernö

Macros and compile time evaluation are popular ways to extend a language. While macros fell out of favour by the time Java was created, they've returned to the mainstream in Nim and Rust. Zig has compile time execution, and JAI has both compile time execution and macros.

At one point I assumed that the more power macros and compile time execution provided, the better. I'll try to break down why I no longer think so.

Code with meta programming is hard to read

Macros and compile time execution form a set of meta programming tools, and in general meta programming has very strong downsides when it comes to maintaining and refactoring code. To understand code that uses meta programming, you first have to resolve the meta program in your head; only then can you reason about the runtime code. This is far harder than reading normal code.
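
As a purely hypothetical illustration (not from the original article), here is a C-style macro, valid as C++ as well, where the runtime behaviour of the call site only becomes clear after you have expanded the macro in your head:

#include <stdio.h>

// The reader has to expand this mentally before the call site makes sense.
#define REPEAT_TWICE(stmt) do { stmt; stmt; } while (0)

int main(void)
{
    int counter = 0;
    // Looks like a single statement, but expansion reveals two increments.
    REPEAT_TWICE(counter += 1);
    printf("%d\n", counter); // prints 2
    return 0;
}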

Bye bye, refactoring tools

It's not just you as a programmer who needs to resolve the meta programming – any refactoring tool needs to do the same in order to perform refactorings safely, even simple ones such as variable renames.

And if a name is created through some meta code, the refactoring tool would essentially need to rewrite your meta program to keep it correct, which is unreasonably complex. This is why everything from preprocessor macros to reflection code simply won't refactor correctly with tools.
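
To see why, consider this minimal, hypothetical C-style example (valid as C++ too): the function names below never appear literally in the source, so a rename tool has nothing to find.

#include <stdio.h>

// handle_load and handle_save are produced by token pasting; a textual
// search or rename of "handle_load" will not find this definition.
#define DEFINE_HANDLER(name) \
    static void handle_##name(void) { printf("handling %s\n", #name); }

DEFINE_HANDLER(load)
DEFINE_HANDLER(save)

int main(void)
{
    handle_load(); // renaming handle_load means reworking the macro calls too
    handle_save();
    return 0;
}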

Making it worse: arbitrary type creation

Some languages allow arbitrary types to be created at compile time. Now the IDE can't even know what the types look like unless it runs the meta code. If the meta code is arbitrarily complex, the IDE has to be just as complex in order to "understand" the code. And while the meta programming evaluation might be nicely ordered when running the compiler, a responsive IDE tries to compile source files iteratively, which means it has to compile more code than necessary just to get the ordering right.
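
Even the comparatively weak C preprocessor shows the flavour of the problem. In this hypothetical sketch (valid as C or C++), the fields of Config only exist after macro expansion, so a tool has to run the expansion to know what the type looks like; languages with full compile time execution, like Zig or JAI, make this strictly harder.

#include <stdio.h>

// The field list lives in a macro, not in the struct definition itself.
#define CONFIG_FIELDS(X) \
    X(int, width)        \
    X(int, height)       \
    X(float, scale)

#define DECLARE_FIELD(type, name) type name;

// A tool has to expand CONFIG_FIELDS to know which members Config has.
struct Config {
    CONFIG_FIELDS(DECLARE_FIELD)
};

int main(void)
{
    struct Config c = { 800, 600, 1.5f };
    printf("%d x %d at scale %.1f\n", c.width, c.height, c.scale);
    return 0;
}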

Code and meta code living together

Many languages try to make the code and the meta code look very similar. This leads to lots of potential confusion. Is a given identifier a compile time variable (and thus liable to change during compilation, with any expression containing it potentially compile time resolved), or is it a real runtime variable?

Here's some code; how easy is it to identify the meta code?

fn performFn(comptime prefix_char: u8, start_value: i32) i32  {
    var result: i32 = start_value;
    comptime var i = 0;
    inline while (i < cmd_fns.len) : (i += 1) {
        if (cmd_fns[i].name[0] == prefix_char) {
            result = cmd_fns[i].func(result);
        }
    } 
    return result;
}

I've tried to make this easier in C3 by not mixing meta and runtime code syntax. This is similar to how macros in C are conventionally written in all upper case to avoid confusion:

macro int performFn(char $prefix_char, int start_value)
{
    int result = start_value;
    // Prefix $ all compile time vars and statements
    $for (var $i = 0; $i < CMD_FNS.len; $i++):
        $if (CMD_FNS[$i].name[0] == $prefix_char):
            result = CMD_FNS[$i].func(result);
        $endif;   
    $endfor;   
    return result;
}    

The intention with C3's separate syntax is that the approximate runtime code can be recovered by removing all rows starting with $:

macro int performFn(char $prefix_char, int start_value)
{
    int result = start_value;


            result = CMD_FNS[$i].func(result);


    return result;
}    

Not elegant, but the intention is to maximize readability. In particular, look at the "if/$if" statement. In the top example you can only infer that it is compile time evaluated and folded by looking at the definitions of i and prefix_char. In the C3 example, the $if itself guarantees the constant folding and will produce an error if the boolean expression inside the parentheses can't be folded at compile time.

Extending syntax for the win?

A popular use for macros is extending syntax, but this often goes wrong. Even if a language has a macro system that does this well, what does it mean? It means that suddenly you can't look at something like foo(x) and make assumptions about it. In C without macros, we can assume that neither x nor any other local variable will change (unless they were passed by reference to some function earlier), and that execution will resume after the foo call (unless setjmp/longjmp is used). With C++ we can assume less, since foo may throw an exception and x might implicitly be passed by reference.

The more powerful the macro system, the less we can assume. Maybe it's pulling variables from the calling scope and changing them? Maybe it's returning from the current context? Maybe it's formatting the drive? Who knows. You need to know the exact definition or you can't read the local code, and this undermines the idea behind most languages.
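
As a hedged illustration of this kind of assumption breaking, here is a hypothetical C-style macro (also valid as C++) that silently uses a variable from the calling scope and can return from the calling function:

#include <stdio.h>

// Looks like an ordinary function call, but it silently uses the caller's
// local variable `errors` and may return from the *calling* function.
#define CHECK(x)            \
    do {                    \
        if ((x) < 0) {      \
            errors += 1;    \
            return -1;      \
        }                   \
    } while (0)

static int process(int value)
{
    int errors = 0;
    CHECK(value);            // may modify errors and return early
    printf("errors so far: %d\n", errors);
    return 0;
}

int main(void)
{
    process(-1);             // returns -1 without printing anything
    process(5);              // prints "errors so far: 0"
    return 0;
}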

Because in a typical language you know what "breaks the rules": the built-in statements like if, for and return. Then there is a way to extend the language that follows certain rules: functions and types. This forms the common ground a developer relies on when "knowing a language": you know the syntax and semantics of the built-in statements.

If code can extend the language's syntax, then every code base becomes a DSL you have to learn from scratch. This is similar to having to buy into some huge framework in the JS/Java space, just worse.

The point is that while we are always extending the language in some sense, doing so through limited mechanisms like functions works well; the more unbounded the extension mechanism, the harder the code becomes to read and understand.

When meta programming is needed

In some cases meta programming can make code more readable. If the problem is something like needing a pre-calculated table for fast calculations, or types defined from a protocol, then code generation can often solve it. Languages can improve this with better compiler support for triggering codegen.
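
As a rough sketch of the codegen approach (the file and table names are invented for the example), a small generator program can be run as a build step and its output included like any hand-written header:

#include <stdio.h>

// A tiny generator, run at build time, e.g.  ./gen_squares > squares_table.h
int main(void)
{
    printf("/* generated file, do not edit */\n");
    printf("static const int kSquares[] = {\n");
    for (int i = 0; i < 16; ++i) {
        printf("    %d,\n", i * i);
    }
    printf("};\n");
    return 0;
}

The generated header contains plain data, so nothing needs to be mentally expanded at the use site.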

In other cases the meta programming can be replaced by running code at startup. Having "static init", like Java's static blocks, can help when libraries need to do initialization.
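
A minimal sketch of the run-at-startup alternative, assuming the same squares table as in the previous example: the table is filled once by an ordinary init function instead of being computed at compile time.

#include <stdio.h>

static int squares[16];

// Plain runtime code: nothing to expand or fold, just call it once at startup.
static void init_squares(void)
{
    for (int i = 0; i < 16; ++i) {
        squares[i] = i * i;
    }
}

int main(void)
{
    init_squares();
    printf("%d\n", squares[7]); // 49
    return 0;
}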

If none of those options work, there is always copy-paste.

Summary

So to summarize:

  • Code with meta programming is hard to read (so minimize and support readability).
  • Meta programming is hard to refactor (so adopt a subset that can work with IDEs).
  • Arbitrary type creation is hard for tools (so restrict it to generics).
  • Same syntax is bad (so make meta code distinct).
  • Extending syntax with macros is bad (so don't do it).
  • Codegen and init at runtime can replace some use of compile time.

Macros and compile time execution can be made extremely powerful, but that power is tempered by huge drawbacks. A good macro system is not defined by what you can do with it, but by whether it manages to balance readability with the necessary features.

Gaurav Gautam

It's increasingly popular to use type inference for variable declarations.

– and it's understandable; after all, who wants to write something like Foobar<Baz<Double, String>, Bar> more than once?

I would argue that "auto" (or your particular language's equivalent) is an anti-pattern when the type is fully known.

When is type inference used?

Few are arguing for replacing:

int i = get_val();

by

auto i = get_val();

The latter is longer and gives less information. Still, some "auto all the things!" fanatics argue that it's right: maybe at some point you change what get_val() returns, and then there's one less place to update. So rather than getting a type error where the function is invoked, you get one later at some other place, making it extra hard to debug...

But most people will argue it's mainly for when the type gets complex. For example:

std::map<std::string,std::vector<int> >::iterator it = myMap.begin();
// vs
auto it = myMap.begin();

Another important use is when you write macros or templates and the type has to be inferred. Here's a C3 example:

// No type inference
macro @swap1(&a, &b)
{
  $typeof(a) temp = a;
  a = b;
  b = temp;	
}
// vs
macro @swap2(&a, &b)
{
  var temp = a;
  a = b;
  b = temp;	
}

So we have two common cases:

  • When the type is unknown.
  • When the type name grows long and complex.

Where do long type names come from?

No one is arguing against the use of type inference when the type isn't known or generic – this use makes perfect sense.

– But there is a problem with the auto it = myMap.begin() use, where type inference is desired as a shorthand only because the type names are too long.

Type names only become long because parameterized types usually carry their full parameterization in the type name (well, some Java "enterprise" code manages long type names anyway, but that's beside the point).

This inevitably causes type signatures to blow up. It's usually possible to write typedefs to make the types shorter, but few do, because it's more convenient to write the parameterized type directly than to maintain type definitions, and sometimes the parameterization is actually helpful for determining whether a type matches a particular generic function.

So basically, the way we parameterize types in most languages causes the type name blowup that is then mitigated with type inference.
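
For reference, this is roughly what the typedef mitigation mentioned above looks like in C++ (a sketch with invented names):

#include <map>
#include <string>
#include <vector>

// One alias hides the parameterization at every later use site.
using ScoresByName = std::map<std::string, std::vector<int>>;

int main()
{
    ScoresByName myMap;
    myMap["alice"].push_back(10);

    // The iterator type stays readable without resorting to auto.
    ScoresByName::iterator it = myMap.begin();
    return it == myMap.end() ? 1 : 0;
}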

Again, the problem with type inference

I'm not going to rehash the arguments made here: https://austinhenley.com/blog/typeinference.html. I am mostly in agreement with them.

I think the most important thing is that the type declaration locally documents the assumptions in the code. If I ever need to "hover over a variable in the IDE to find the type" (as some suggest as a solution), it means that the type is unclear from the local code context. Since the type of a variable is fundamental to how the code works, this should never be unclear – which is why the type declaration serves as strong support for reading code. (Explicit variable types also make it easy to text search for type usage and for the IDE to track types.)

While this is bad, the pain of long type signatures often outweighs it. Type inference becomes a necessity because of how parameterized types work.

I would strongly object to introducing type inference in languages that don't have issues with long type names, such as C (or C3), because it fundamentally makes code less clear to read and, consequently, makes bugs harder to catch.

The design smell

"auto" is a language design smell because it is typically a sign of the language having types parameterized in a way that makes them inconveniently long.

Type inference thus becomes a language design band-aid that lets people avoid tackling the very real issue of long type names.

If long type names are bad, why is everyone doing it?

Unfortunately there is an added complication: there aren't many good alternatives. Enforcing something like typedefs in order to use parameterized types works, but it is not particularly elegant.

There are other possibilities that could be explored, such as eliding the parameterization completely while retaining the rest of the type (e.g. iterator it = myMap.begin()), and similar ideas that straddle inference and explicit types, trying to get the best of both worlds.

Such explorations are uncommon though, and "auto"-style type inference is probably to blame: a popular band-aid is easier to apply than a more innovative solution is to find.

This article is mirrored on my blog.

News

It has been nearly a month since my last post, and there have been some interesting developments.

RSS now supported

This blog now generates an RSS feed. I wrote the generator in Cakelisp, of course. You should be able to add this blog to your favorite feed reader by entering https://macoy.me as the URL. Note that plain HTTP is not supported, so depending on your feed reader's defaults, you will likely need to specify https:// explicitly rather than just macoy.me.

Comments on Linker-Loader

My post on bringing a dynamic environment to C was on the front page of Hacker News a few weeks ago. The post got a good number of positive comments, which was rewarding. I also got private emails from readers who expressed their interest in and support for the endeavor.

Some commenters astutely pointed out that the Tiny C Compiler already supported something similar. In fact, I had already started modifying TCC to work towards my goals before that post went up. I mentioned this project in my argument for self-modifying applications.

Progress on that front is going very well, and I'm excited to demonstrate my results soon. I plan on recording a video demonstration once it is ready. Suffice it to say, I think TCC is the way to go for enabling a dynamic environment for C. I plan to write about why TCC is the answer, and what my modifications to it are, in a future article.

Recent Cakelisp changes

The dynamic environment project and the RSS feed generator both required some Cakelisp changes.

Deferred macro execution

I wrote about my XML generator in Writing XML with S-expressions. I used this same generator to write the RSS feed XML. However, due to a subtle execution order issue, the RSS feed generator broke the XML writer.

The XML writer consisted of two Cakelisp macros:

  • A syntax definition, which tells Cakelisp which things are XML tags.
  • A writer. The writer allows the programmer to freely interweave XML tags and Cakelisp code because the writer can simply write the syntax-defined XML tags and execute everything else as Cakelisp code.

The problem arose when the writer macro was executed before the syntax definition macro. There was no facility in Cakelisp for detecting dependencies between macros. Most dependencies were resolved simply by using different compilation phases, which I talk about in the Basics tutorial.

Now, macros can decide to defer their execution when they detect that e.g. a definition macro hasn't been executed yet. Cakelisp will respond to this by simply resolving other macros until there are none left to resolve. If all remaining macros are still trying to defer, that means the definition will never be resolved, and the build will fail. A helpful error indicates this by saying e.g. "Failed to resolve deferred references to <macro name>" and indicating the invocations that couldn't be resolved.

Essentially, this lets macros "wait and see" if they will ever be able to successfully execute.
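
This is not Cakelisp's actual implementation, just a rough sketch in C++ of the resolution strategy described above: keep making passes over the pending macro invocations, and fail if a full pass defers everything.

#include <stdio.h>
#include <functional>
#include <string>
#include <vector>

// Each pending invocation either resolves or asks to be deferred until later.
struct PendingMacro {
    std::string name;
    std::function<bool()> tryResolve; // returns false to defer
};

// Keep making passes until nothing is pending; if a whole pass defers, fail.
static bool resolveAll(std::vector<PendingMacro> pending)
{
    while (!pending.empty()) {
        std::vector<PendingMacro> stillDeferred;
        for (PendingMacro& macro : pending) {
            if (!macro.tryResolve())
                stillDeferred.push_back(macro);
        }
        if (stillDeferred.size() == pending.size()) {
            for (const PendingMacro& macro : stillDeferred)
                fprintf(stderr, "Failed to resolve deferred references to %s\n",
                        macro.name.c_str());
            return false; // no progress was made in this pass
        }
        pending = stillDeferred;
    }
    return true;
}

int main()
{
    bool tagsDefined = false;
    std::vector<PendingMacro> macros = {
        // The writer defers until the tag definitions have been executed.
        { "write-xml", [&] { return tagsDefined; } },
        { "define-xml-tags", [&] { tagsDefined = true; return true; } },
    };
    return resolveAll(macros) ? 0 : 1;
}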

This is admittedly strange and a bit complex, but because I had only encountered this construct once in all my Cakelisp code, I considered it acceptable.

C support

My dynamic environment project depends on my modified version of Tiny C Compiler. Before I started this project, Cakelisp transpiled to C++ by default. Needless to say, Tiny C Compiler cannot compile C++, so it was in my best interest to prefer C whenever possible.

I had done some preliminary work to output to C instead. The remaining work I did to make outputting to C the default was:

  • C has stricter header inclusion rules. For example, stdbool.h needs to be included in C but not in C++ (see the sketch after this list).
  • Some declarations and definitions needed to be tweaked to prefer C syntax that C++ also supports.
  • I had to eschew C++'s nullptr and instead prefer NULL, which is supported in both languages. Luckily, Cakelisp already had a special keyword null, which meant no "end user" code changes were required there.
  • Various changes needed to be made to GameLib, usually to header file inclusions.
  • A separate C compiler option, build-time-c-compiler, was added to allow setting a specific compiler for C. The existing build-time-compiler option specifies the C++ compiler.
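
To make the first and third points concrete, here is a small hypothetical snippet that stays within the common subset and therefore compiles as either C or C++:

#include <stdbool.h> /* needed for bool in C; harmless in C++ */
#include <stddef.h>  /* NULL */
#include <stdio.h>

/* NULL is valid in both languages, unlike C++'s nullptr. */
static bool has_value(const int *ptr)
{
    return ptr != NULL;
}

int main(void)
{
    int x = 42;
    printf("%d %d\n", has_value(&x), has_value(NULL));
    return 0;
}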

Cakelisp will keep track of which language features are required on a module-by-module basis, which means you can intermix C and C++ in the same Cakelisp project. For example, the instant your module references a namespace, that module will then require C++ and be automatically compiled as a C++ file. If you remove that reference it will become a C file instead. This is a nice feature because it means you can still reference C++ dependencies without having to infect your entire project with C++.

Exciting times

My dynamic environment project should be usable regardless of whether you want to write Cakelisp or prefer C. In fact, any language which produces ELF, PE, or COFF files should work with Tiny C Compiler's linker, though any features which depend on knowing the symbol's underlying type will not function.

I am very excited to be approaching my goal of a fully dynamic environment. I think this will not only open the door to new development processes, but also new forms of end-user applications, a la malleable systems.

Community Showcase

This is a selection of recent work done by community members. Want to participate? Join us on Discord.