layer8 14 minutes ago

If this implementation had existed in the 1980s, the C standard would have a rule that different tokens hashing to the same 16-bit value invoke undefined behavior, and optimizing compilers in the 2000s would simply optimize such tokens away to a no-op. ;)

mati365 3 hours ago

Oh, it looks like the x86-16 boot sector C compiler that I made recently [1]. Writing boot sector games has a nostalgic magic to it, from a time when programming was actually fun and showed off your skills. It's a shame that the AI era has terribly devalued these projects.

[1] https://github.com/Mati365/ts-c-compiler

xorvoid 4 hours ago

I may be the author.. enjoy! It was an absolute blast making this!

  • JamesTRexx 4 hours ago

    Would it shrink, and by how much, if "if", "while", and "for" were replaced by a simple goto routine? (After all, in assembly there is only jmp and no other fancy jump instruction, I assume.)

    And PS, it's "chose your own adventure". :-) I love minimalism.

    • SAI_Peregrinus 3 hours ago

      What fancy jumps are present in assembly depends on the CPU architecture. But there are always conditional jumps, like JNZ that jumps if the Zero flag isn't set.
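
      A minimal sketch of what that lowering looks like, written in C for illustration rather than taken from the compiler under discussion. The function names sum_to_while and sum_to_goto are hypothetical. The second version uses only an "if ... goto" (which plays the role of a conditional jump such as JZ/JNZ) and a plain goto (which plays the role of jmp):

```c
/* Structured version: sums n + (n-1) + ... + 1. */
static unsigned sum_to_while(unsigned n)
{
    unsigned total = 0;
    while (n != 0) {
        total += n;
        n--;
    }
    return total;
}

/* Same loop using only goto plus one conditional branch. */
static unsigned sum_to_goto(unsigned n)
{
    unsigned total = 0;
loop:
    if (n == 0) goto done;   /* conditional jump out of the loop */
    total += n;
    n--;
    goto loop;               /* unconditional jump back, like jmp */
done:
    return total;
}
```

      Both functions compute the same result. The goto form still needs the conditional branch, which is why dropping if/while/for from the language does not remove the need for conditional jumps in the generated code.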

  • veltas 4 hours ago

    This is very nice. I'm currently writing a minimalist C compiler although my goal isn't fitting in a boot sector, it's more targeted at 8-bit systems with a lot more room than that.

    This is a great demonstration of how simple the bare bones of C are, which I think is one reason I and many others find it so appealing despite how Spartan it is. C really evolved from B, which was a demake of Fortran, if Ken Thompson is to be trusted.

  • einpoklum 4 hours ago

    An interesting use case, for the compiler as-is or for the essential idea of barely-C, might be in bootstrapping chains, i.e. starting from tiny platform-specific binaries whose disassembly one could verify, and gradually building more complex tools, interpreters, and compilers, so that eventually you get to something like a version of GCC and can then build an entire OS distribution.

    Examples:

    https://github.com/cosinusoidally/mishmashvm/

    and https://github.com/cosinusoidally/tcc_bootstrap_alt/

mojuba 3 hours ago

Compare that to the C compiler in 100,000 lines written by Claude in two weeks for $20,000 (I think it was posted on HN just yesterday).

  • vidarh 3 hours ago

    It's a fun comparison, but with the notable difference that that one can compile the Linux kernel and generate code for multiple different architectures, while this one can only compile a small proportion of valid C. It's a great project, but it's not so much a C compiler, as a compiler for a subset of C that allows all programs this compiler can compile to also be compiled by an actual C compiler, but not vice versa.

    • d_silin 2 hours ago

      But can it compile the "Hello, World" example from its own README.md?

      https://github.com/anthropics/claudes-c-compiler/issues/1

      • vidarh 2 hours ago

        Noticed the part where all it requires is to actually have the headers in the right location?

        • d_silin 2 hours ago

          "The location of Standard C headers do not need to be supplied to a conformant compiler."

          From https://news.ycombinator.com/item?id=46920922 discussion.

          • vidarh 2 hours ago

            And it doesn't for the compiler in question either, as long as the headers exist in the places it looks for them. No compiler magically knows where the headers are if you haven't placed them in the right location.

            • Retr0id an hour ago

              stddef.h (et al) should be shipped by the compiler itself, and so it should know where it is. But they rely on gcc for it, hence it doesn't always know where to look. Seems totally fine for a prototype.

              • vidarh 41 minutes ago

                Especially given they're not shipping anything. The GCC binaries can't find misplaced or not installed headers either.

            • d_silin an hour ago

              Would you accept the same quality of implementation from a human team?

              • vidarh 43 minutes ago

                A compiler that can't magically know how to find headers that don't exist in the expected directory?

                Yes, that is the case for pretty much every compiler. I suppose you could build the headers into the binary, but nobody does that.

      • Retr0id 2 hours ago

        It's fascinating how few people read past the issue title

sanufar 4 hours ago

The way hashing is used for tokens and for making a pseudo symbol table is such an elegant idea.
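
A minimal sketch of the idea, not the project's actual code. It assumes a multiply-by-10 accumulate hash truncated to 16 bits, and the names token_hash, var_store, and var_load are hypothetical. The hash doubles as the index into a 64K-slot table, so no symbol names are ever stored:

```c
#include <stdint.h>

/* Hypothetical hash: fold each character into a 16-bit accumulator.
   The result wraps modulo 65536, so distinct tokens can collide. */
static uint16_t token_hash(const char *tok)
{
    uint16_t h = 0;
    while (*tok) {
        h = (uint16_t)(h * 10u + (unsigned char)*tok - (unsigned)'0');
        tok++;
    }
    return h;
}

/* Pseudo symbol table: one slot per possible hash value. */
static int16_t variables[65536];

static void var_store(const char *name, int16_t value)
{
    variables[token_hash(name)] = value;
}

static int16_t var_load(const char *name)
{
    return variables[token_hash(name)];
}
```

With this particular hash a decimal literal hashes to its own value, so "42" hashes to 42. The price is collisions: "a" and "49" land in the same slot, and so do "0" and "65536". The approach simply accepts that.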

  • fix4fun 4 hours ago

    I think the same. Really nice project and good trick with hashing tokens.

    PS. There are 21 bytes left (21 * 0x00 - from 0x01e0 to 0x01fd). Maybe something can be packed there ;)
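
    A rough way to check that kind of count, as a sketch under assumptions: it takes a 512-byte boot sector whose 0x55 0xAA signature sits at offsets 0x1FE and 0x1FF, and counts the run of zero bytes directly before the signature. The name unused_tail_bytes is hypothetical, and a zero byte could in principle be real code or data, so treat the result as an upper bound on free space.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512
#define SIGNATURE_OFFSET 0x1FE  /* boot signature 0x55, 0xAA at 0x1FE and 0x1FF */

/* Count consecutive 0x00 bytes immediately before the boot signature. */
static size_t unused_tail_bytes(const uint8_t sector[SECTOR_SIZE])
{
    size_t count = 0;
    size_t i = SIGNATURE_OFFSET;
    while (i > 0 && sector[i - 1] == 0x00) {
        count++;
        i--;
    }
    return count;
}
```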

SeanSullivan86 3 hours ago

Why is it called a C Compiler if it's a subset of C?

gonzus 3 hours ago

Lacking support for structs, I think this is too minimalistic to be called "a C compiler".

  • pilord314 34 minutes ago

    you bootstrap it into a library you can include optionally, duh

NooneAtAll3 4 hours ago

> I wrote a fairly straight-forward and minimalist lexer and it took >150 lines of C code

was it supposed to be "<150"?

  • owalt 3 hours ago

    They're saying the naive implementation was more than 150 lines of C code (300-450 bytes), i.e. too big.