wasm and C - The bare necessities

I am currently resuming my self-improvement task of learning WebAssembly. Or, I should rather say, using WebAssembly.

Since I have some optimized small code chunks that I want to use in an embedded environment, it seemed sensible that the first step would be to establish two-way communication with C.

In my last outing around three years ago, I was using Emscripten to bridge the gap. That tool adds quite a few bells and whistles, and doesn't quite yield that warm, fuzzy bare-metal rush. Emscripten relies on Clang and LLVM, both of which seem to have gotten wasm support built in in the meantime (at least on my Arch Linux system). This integrates nicely with wabt, the Swiss Army knife of WebAssembly.

So how far do we get with just clang, LLVM and wabt? Let's see if we can at least set up a code snippet which simply writes "foobar" to memory. The host will write "foo", and wasm will write "bar".

Without libc

This excellent tutorial by Surma provides a good starting point. Go ahead and read that first. This text is not a WebAssembly primer, so the following will make a lot more sense if you do.

That setup still adds some magic. Namely, the memory and the function table are added by the wasm linker there. It would be even more fun to pass these in from the host system instead.

And so we start: [1]

#include "string.h"

extern unsigned char __heap_base;
extern void call_me_sometime(unsigned char *b);

void foo() {
        unsigned char *buf = (unsigned char*)&__heap_base;
        *(buf+3) = 'b';
        *(buf+4) = 'a';
        *(buf+5) = 'r';
        call_me_sometime(buf);
}

Compiling this without linking gives us a hint on what needs to be defined.

$ clang --target=wasm32 -nostdlib -nostartfiles -o bare.wasm -c bare.c
$ wasm-objdump -x bare.wasm
bare.wasm:      file format wasm 0x1

Section Details:

Type[2]:
 - type[0] () -> nil
 - type[1] (i32) -> nil
Import[4]:
 - memory[0] pages: initial=0 <- env.__linear_memory
 - table[0] type=funcref initial=0 <- env.__indirect_function_table
 - global[0] i32 mutable=1 <- env.__stack_pointer
 - func[0] sig=1 <env.call_me_sometime> <- env.call_me_sometime
Function[1]:
 - func[1] sig=0 <foo>
Code[1]:
 - func[1] size=138 <foo>
Custom:
 - name: "linking"
  - symbol table [count=4]
   - 0: F <foo> func=1 binding=global vis=hidden
   - 1: G <env.__stack_pointer> global=0 undefined binding=global vis=default
   - 2: D <__heap_base> undefined binding=global vis=default
   - 3: F <env.call_me_sometime> func=0 undefined binding=global vis=default
Custom:
 - name: "reloc.CODE"
  - relocations for section: 3 (Code) [5]
   - R_WASM_GLOBAL_INDEX_LEB offset=0x000007(file=0x000099) symbol=1 <env.__stack_pointer>
   - R_WASM_GLOBAL_INDEX_LEB offset=0x00001c(file=0x0000ae) symbol=1 <env.__stack_pointer>
   - R_WASM_MEMORY_ADDR_SLEB offset=0x000031(file=0x0000c3) symbol=2 <__heap_base>
   - R_WASM_FUNCTION_INDEX_LEB offset=0x000073(file=0x000105) symbol=3 <env.call_me_sometime>
   - R_WASM_GLOBAL_INDEX_LEB offset=0x000086(file=0x000118) symbol=1 <env.__stack_pointer>
Custom:
 - name: "producers"

Using Node.js as the host, we check whether we can instantiate a WebAssembly module:

const fs = require('fs');

const imports = {};

async function init() {
        const code = fs.readFileSync('./bare.wasm');
        const m = new WebAssembly.Module(code);
        const i = new WebAssembly.Instance(m, imports);
}
init();

Running this tells us we are apparently missing a property env in the imports object.

$ node bare_naive.js
/home/lash/src/tests/wasm/bare/bare_naive.js:8
const i = new WebAssembly.Instance(m, imports);
          ^

TypeError: WebAssembly.Instance(): Import #0 module="env" error: module is not an object or function

That seems to match with the Import section in the objdump output above. Let's stick the memory and table in there. [2]
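
The same list can also be pulled out on the host side, without wabt, using WebAssembly.Module.imports(). The following is only a sketch: to keep it self-contained it does not read bare.wasm, but hand-assembles a tiny stand-in module that has nothing except a type section and an import section. The helper names str and section, and the choice of just two imports, are my own assumptions for illustration.

```javascript
// Helpers for hand-assembling a tiny module. All lengths used here are
// below 128, so a single byte is a valid LEB128 length.
const str = (s) => [s.length, ...new TextEncoder().encode(s)];
const section = (id, body) => [id, body.length, ...body];

const bytes = Uint8Array.from([
        0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
        0x01, 0x00, 0x00, 0x00, // version 1
        // type section: one signature, (i32) -> nil
        ...section(1, [1, 0x60, 1, 0x7f, 0]),
        // import section: env.memory (at least 1 page) and
        // env.call_me_sometime (function using type 0)
        ...section(2, [
                2,
                ...str('env'), ...str('memory'), 0x02, 0x00, 0x01,
                ...str('env'), ...str('call_me_sometime'), 0x00, 0x00,
        ]),
]);

const m = new WebAssembly.Module(bytes);

// Each entry describes one import: which module it lives under,
// its name, and what kind of thing the host has to supply.
const wanted = WebAssembly.Module.imports(m)
        .map((i) => `${i.kind} ${i.module}.${i.name}`);
console.log(wanted);
```

With the real file, the bytes would simply come from fs.readFileSync('./bare.wasm') instead.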

And let's make a bold guess that the callback function call_me_sometime needs to go in there as well.

const fs = require('fs');

const memory = new WebAssembly.Memory({initial: 2});
const table = new WebAssembly.Table({initial: 3, element: 'anyfunc'});
const importsObj = {
        env: {
                memory: memory,
                __linear_memory: memory,
                __indirect_function_table: table,
                call_me_sometime: (n) => {
                        let a = new Uint8Array(memory.buffer, n, 9);
                        a.set([0x66, 0x6f, 0x6f], 0);
                        console.debug('heap is at: ' + n);
                        console.log('heap contains: ' + new TextDecoder().decode(a));
                },
        },
};

async function init() {
        const code = fs.readFileSync('./bare.wasm');
        const m = new WebAssembly.Module(code);
        const i = new WebAssembly.Instance(m, importsObj);
        i.exports.foo();
}
init();

The linker needs a little help from us for this:

  • Our callback function will not be available at link time, so we have to pass --allow-undefined to promise that the host has got this covered.
  • --import-memory and --import-table enable us to get the memory and the function table from the host.
  • --export="foo" makes sure we export exactly what we intend to from our wasm.

$ clang --target=wasm32 -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined -o bare.wasm bare.c

And that should give us:

$ node bare.js
heap is at: 66560
heap contains: foobar
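
For anyone who wants to see the same handshake without having clang at hand, here is a rough sketch that hand-assembles a module doing what foo() does. This is not the output of the compiler: the bytes are written out by me, the address 1024 is an arbitrary stand-in for __heap_base, and the helpers str and section are hypothetical names. The module imports env.memory and env.call_me_sometime, stores 'b', 'a', 'r' at offsets 3 to 5, and then calls back into the host.

```javascript
// All lengths used here are below 128, so one byte is a valid LEB128 length.
const str = (s) => [s.length, ...new TextEncoder().encode(s)];
const section = (id, body) => [id, body.length, ...body];

// Body of foo(): i32.const <addr>, i32.const <char>, i32.store8, three times,
// then i32.const 1024 and call 0 (the imported call_me_sometime).
const body = [
        0x00,                                                 // no locals
        0x41, 0x83, 0x08, 0x41, 0xe2, 0x00, 0x3a, 0x00, 0x00, // mem[1027] = 'b'
        0x41, 0x84, 0x08, 0x41, 0xe1, 0x00, 0x3a, 0x00, 0x00, // mem[1028] = 'a'
        0x41, 0x85, 0x08, 0x41, 0xf2, 0x00, 0x3a, 0x00, 0x00, // mem[1029] = 'r'
        0x41, 0x80, 0x08, 0x10, 0x00,                         // call_me_sometime(1024)
        0x0b,                                                 // end
];

const bytes = Uint8Array.from([
        0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,   // magic and version
        ...section(1, [2, 0x60, 1, 0x7f, 0, 0x60, 0, 0]), // types: (i32)->nil, ()->nil
        ...section(2, [
                2,
                ...str('env'), ...str('memory'), 0x02, 0x00, 0x01,
                ...str('env'), ...str('call_me_sometime'), 0x00, 0x00,
        ]),
        ...section(3, [1, 1]),                            // one function, type 1
        ...section(7, [1, ...str('foo'), 0x00, 0x01]),    // export func 1 as "foo"
        ...section(10, [1, body.length, ...body]),        // code
]);

const memory = new WebAssembly.Memory({initial: 1});
let result = '';
const importsObj = {
        env: {
                memory: memory,
                call_me_sometime: (n) => {
                        const a = new Uint8Array(memory.buffer, n, 6);
                        a.set([0x66, 0x6f, 0x6f], 0); // the host writes "foo"
                        result = new TextDecoder().decode(a);
                },
        },
};

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), importsObj);
instance.exports.foo();
console.log('heap contains: ' + result);
```

The view is only six bytes long here, so the decoded string carries no trailing zero bytes.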

This way of pointing to memory is of course grossly inadequate and unsafe and ridiculous for any purpose more advanced than this one. So some proper memory management would not be a bad thing.
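
To put the printed address into perspective: WebAssembly memory is sized in pages of 64 KiB, so 66560 sits 1024 bytes into the second page. The small calculation below is just a sanity check of that arithmetic. The heap address is copied from the run above, and the nine-byte length is the size of the view used in the callback.

```javascript
const PAGE_SIZE = 65536; // bytes per WebAssembly page (64 KiB)

// Same memory as handed to the module: two pages.
const memory = new WebAssembly.Memory({initial: 2});

const heapBase = 66560; // "heap is at" from the run above
const viewLength = 9;   // length of the Uint8Array view in the callback

const pageOfHeap = Math.floor(heapBase / PAGE_SIZE); // zero-based page index
const offsetInPage = heapBase % PAGE_SIZE;
const pagesNeeded = Math.ceil((heapBase + viewLength) / PAGE_SIZE);

console.log('memory size in bytes: ' + memory.buffer.byteLength);
console.log('heap starts in page ' + pageOfHeap + ' at offset ' + offsetInPage);
console.log('pages needed to reach the end of the view: ' + pagesNeeded);
```

A memory of a single page would therefore be too small for the view the callback creates.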

Adding libc

And what do you know: another piece of news since I last looked at this is the addition of "a libc for WebAssembly programs built on top of WASI system calls" [wasi-libc]. Let's see if we can add a slightly less manual way of handling memory with malloc and memcpy.

#ifdef HAVE_LIBC
#include <string.h>
#include <stdlib.h>
#endif

extern unsigned char __heap_base;
extern void call_me_sometime(unsigned char *b);

void foo() {

#ifdef HAVE_LIBC
        unsigned char *buf;
        buf = malloc(9);
        memcpy(buf+3, "bazbar", 6);
#else
        unsigned char *buf = (unsigned char*)&__heap_base;
        *(buf+3) = 'b';
        *(buf+4) = 'a';
        *(buf+5) = 'r';
#endif
        call_me_sometime(buf);

#ifdef HAVE_LIBC
        free(buf);
#endif

}

As you can see, we need a few more parameters for the compiler and linker at this point. The --target=wasm32-unknown-wasi --sysroot /opt/wasi-libc .. /opt/wasi-libc/lib/wasm32-wasi/libc.a parts are needed to hook us up with headers and symbols for the libc.

My Arch Linux install puts that sysroot in /opt/wasi-libc; that may of course not be the case elsewhere.

$ clang -DHAVE_LIBC=1 --target=wasm32-unknown-wasi --sysroot /opt/wasi-libc -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined  -o bare.wasm bare.c /opt/wasi-libc/lib/wasm32-wasi/libc.a
$ wasm-objdump -x bare.wasm

bare.wasm:      file format wasm 0x1

Section Details:

Type[3]:
 - type[0] (i32) -> nil
 - type[1] () -> nil
 - type[2] (i32) -> i32
Import[3]:
 - memory[0] pages: initial=2 <- env.memory
 - table[0] type=funcref initial=1 <- env.__indirect_function_table
 - func[0] sig=0 <call_me_sometime> <- env.call_me_sometime
Function[7]:
 - func[1] sig=1 <foo>
 - func[2] sig=2 <malloc>
 - func[3] sig=2 <dlmalloc>
 - func[4] sig=0 <free>
 - func[5] sig=0 <dlfree>
 - func[6] sig=1 <abort>
 - func[7] sig=2 <sbrk>
Global[2]:
 - global[0] i32 mutable=1 - init i32=67072
 - global[1] i32 mutable=0 <__heap_base> - init i32=67072
Export[2]:
 - func[1] <foo> -> "foo"
 - global[1] -> "__heap_base"
Code[7]:
 - func[1] size=171 <foo>
 - func[2] size=10 <malloc>
 - func[3] size=6984 <dlmalloc>
 - func[4] size=10 <free>
 - func[5] size=1908 <dlfree>
 - func[6] size=4 <abort>
 - func[7] size=78 <sbrk>
Data[2]:
 - segment[0] memory=0 size=7 - init i32=1024
  - 0000400: 6261 7a62 6172 00                        bazbar.
 - segment[1] memory=0 size=500 - init i32=1032
  - 0000408: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000418: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000428: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000438: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000448: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000458: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000468: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000478: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000488: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000498: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00004a8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00004b8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00004c8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00004d8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00004e8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00004f8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000508: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000518: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000528: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000538: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000548: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000558: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000568: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000578: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000588: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 0000598: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00005a8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00005b8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00005c8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00005d8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00005e8: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  - 00005f8: 0000 0000                                ....
Custom:
 - name: "name"
 - func[0] <call_me_sometime>
 - func[1] <foo>
 - func[2] <malloc>
 - func[3] <dlmalloc>
 - func[4] <free>
 - func[5] <dlfree>
 - func[6] <abort>
 - func[7] <sbrk>
Custom:
 - name: "producers"

What luxury. And of course, our bare.wasm file just grew from 350 bytes to 10k...

We don't have to change our JavaScript code at this point. Simply run again, and get:

$ node bare.js
heap is at: 67088
heap contains: foobazbar
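
One thing that may be worth keeping in mind now that an allocator is involved: as far as I understand, the sbrk in wasi-libc asks for more space by growing the wasm memory, and growing a WebAssembly.Memory detaches the ArrayBuffer that memory.buffer returned earlier. The callback above creates its Uint8Array afresh on every call, which sidesteps the problem. The sketch below only shows the host-side effect, with the grow triggered manually from JavaScript instead of from malloc.

```javascript
const memory = new WebAssembly.Memory({initial: 2});

// A view created before the grow, as a host might be tempted to cache it.
const stale = new Uint8Array(memory.buffer);

// grow() takes a number of pages and returns the previous size in pages.
// The old ArrayBuffer is detached as a side effect.
const previousPages = memory.grow(1);

// A view created after the grow sees all three pages.
const fresh = new Uint8Array(memory.buffer);

console.log('pages before grow: ' + previousPages);
console.log('stale view length: ' + stale.length);
console.log('fresh view length: ' + fresh.length);
```

So any typed array over wasm memory is best created at the point of use, and not kept around between calls into the module.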

[1] __heap_base will be set by default by the wasm environment, and is thus available as an external symbol.
[2] After linking, the memory import needs to be called memory instead of __linear_memory for some reason. Thus we add both here for clarity.
[wasi-libc] https://github.com/WebAssembly/wasi-libc