I am currently resuming my self-improvement project of learning WebAssembly. Or rather, I should say, of using WebAssembly.
Since I have some optimized small code chunks that I want to use in an embedded environment, it seemed sensible that the first step would be to establish two-way communication with C.
On my last outing, around three years ago, I used Emscripten to bridge the gap. That tool adds quite a few bells and whistles, and doesn't quite deliver that warm, fuzzy bare-metal rush. Emscripten relies on Clang and LLVM, both of which seem to have gained built-in wasm support in the meantime (at least on my Arch Linux system). This integrates nicely with wabt, the Swiss Army knife of WebAssembly.
So how far do we get with just clang, LLVM and wabt? Let's see if we can at least set up a code snippet that simply writes "foobar" to memory. The host will write "foo", and the wasm code will write "bar".
Without libc
This excellent tutorial by Surma provides a good starting point. Go ahead and read that first. This text is not a WebAssembly primer, so the following will make a lot more sense if you do.
That setup still involves some magic: there, the memory and the (indirect function) table are created by the wasm linker. It would be even more fun to pass them in from the host system instead.
And so we start: [1]
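/* Nothing from libc is used here, so no #include is needed. */

extern unsigned char __heap_base;               /* defined by the linker, see [1] */
extern void call_me_sometime(unsigned char *b); /* provided by the host */

void foo() {
	/* Use the start of the heap as a scratch buffer. */
	unsigned char *buf = (unsigned char*)&__heap_base;
	/* Leave bytes 0..2 for the host's "foo" and write "bar" after them. */
	*(buf+3) = 'b';
	*(buf+4) = 'a';
	*(buf+5) = 'r';
	call_me_sometime(buf);
}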
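To make the division of labour concrete, here is a host-native C sketch (it does not target wasm) of the same handshake. A plain byte array stands in for the linear memory, HEAP_BASE stands in for __heap_base, and the "pointer" handed to the callback is simply a byte offset into that array, which is what the i32 argument of call_me_sometime amounts to on the host side. All names here (linear_memory, HEAP_BASE, host_callback, guest_foo, last_seen) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the module's linear memory and __heap_base. */
#define MEMORY_SIZE 4096u
#define HEAP_BASE   1024u  /* illustrative only; the real value comes from the linker */

unsigned char linear_memory[MEMORY_SIZE];  /* zero-initialized, like fresh wasm memory */
char last_seen[10];                         /* the 9 bytes the "host" saw, plus a NUL */

/* Host side: mirrors the JS call_me_sometime. It receives an offset (the i32
 * "pointer"), writes "foo" at offset+0..2 and records the 9 bytes it sees. */
void host_callback(uint32_t offset) {
    assert(offset + 9 <= MEMORY_SIZE);
    memcpy(&linear_memory[offset], "foo", 3);
    memcpy(last_seen, &linear_memory[offset], 9);
    last_seen[9] = '\0';
}

/* Guest side: mirrors foo() from bare.c, writing "bar" at offset+3..5. */
void guest_foo(void) {
    unsigned char *buf = &linear_memory[HEAP_BASE];
    buf[3] = 'b';
    buf[4] = 'a';
    buf[5] = 'r';
    /* Crossing the boundary, the pointer travels as its offset into memory. */
    host_callback((uint32_t)(buf - linear_memory));
}
```

Calling guest_foo() leaves "foobar" at offset HEAP_BASE, with the guest's "bar" and the host's "foo" meeting in the same buffer.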
Compiling this without linking gives us a hint about what needs to be defined.
$ clang --target=wasm32 -nostdlib -nostartfiles -o bare.wasm -c bare.c
$ wasm-objdump -x bare.wasm
bare.wasm: file format wasm 0x1
Section Details:
Type[2]:
- type[0] () -> nil
- type[1] (i32) -> nil
Import[4]:
- memory[0] pages: initial=0 <- env.__linear_memory
- table[0] type=funcref initial=0 <- env.__indirect_function_table
- global[0] i32 mutable=1 <- env.__stack_pointer
- func[0] sig=1 <env.call_me_sometime> <- env.call_me_sometime
Function[1]:
- func[1] sig=0 <foo>
Code[1]:
- func[1] size=138 <foo>
Custom:
- name: "linking"
- symbol table [count=4]
- 0: F <foo> func=1 binding=global vis=hidden
- 1: G <env.__stack_pointer> global=0 undefined binding=global vis=default
- 2: D <__heap_base> undefined binding=global vis=default
- 3: F <env.call_me_sometime> func=0 undefined binding=global vis=default
Custom:
- name: "reloc.CODE"
- relocations for section: 3 (Code) [5]
- R_WASM_GLOBAL_INDEX_LEB offset=0x000007(file=0x000099) symbol=1 <env.__stack_pointer>
- R_WASM_GLOBAL_INDEX_LEB offset=0x00001c(file=0x0000ae) symbol=1 <env.__stack_pointer>
- R_WASM_MEMORY_ADDR_SLEB offset=0x000031(file=0x0000c3) symbol=2 <__heap_base>
- R_WASM_FUNCTION_INDEX_LEB offset=0x000073(file=0x000105) symbol=3 <env.call_me_sometime>
- R_WASM_GLOBAL_INDEX_LEB offset=0x000086(file=0x000118) symbol=1 <env.__stack_pointer>
Custom:
- name: "producers"
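The R_WASM_*_LEB and R_WASM_*_SLEB relocations point at immediates in the code section that are stored as LEB128 variable-length integers. As far as I understand, LLVM pads these to the full five bytes in object files, so that the linker can patch in the final index or address without moving any code. A small C sketch of padded encoding and plain decoding follows; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned LEB128, padded to exactly 5 bytes (enough for any 32-bit value). */
void uleb128_encode_padded5(uint32_t v, uint8_t out[5]) {
    for (int i = 0; i < 5; i++) {
        uint8_t byte = (uint8_t)(v & 0x7f);
        v >>= 7;
        if (i < 4)
            byte |= 0x80;  /* continuation bit on every byte but the last */
        out[i] = byte;
    }
}

/* Signed LEB128, padded to exactly 5 bytes. */
void sleb128_encode_padded5(int32_t v, uint8_t out[5]) {
    for (int i = 0; i < 5; i++) {
        uint8_t byte = (uint8_t)((uint32_t)v & 0x7f);
        /* floor(v / 128), avoiding implementation-defined signed right shift */
        v = (v < 0) ? ~(~v >> 7) : (v >> 7);
        if (i < 4)
            byte |= 0x80;
        out[i] = byte;
    }
}

/* Decodes an unsigned LEB128 value; *len receives the number of bytes read. */
uint64_t uleb128_decode(const uint8_t *p, size_t *len) {
    assert(p != NULL && len != NULL);
    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t byte;
    do {
        byte = p[i++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 64);
    *len = i;
    return result;
}

/* Decodes a signed LEB128 value, sign-extending from bit 6 of the last byte. */
int64_t sleb128_decode(const uint8_t *p, size_t *len) {
    assert(p != NULL && len != NULL);
    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t byte;
    do {
        byte = p[i++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 64);
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;
    *len = i;
    return (int64_t)result;
}
```

For example, 66560 encodes as 80 88 84 80 00: every byte but the last carries the 0x80 continuation bit, even though the shortest encoding would need only three bytes.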
Using Node.js as the host, we check whether we can instantiate a WebAssembly object:
const fs = require('fs');

const imports = {};

async function init() {
	const code = fs.readFileSync('./bare.wasm');
	const m = new WebAssembly.Module(code);
	const i = new WebAssembly.Instance(m, imports);
}
init();
Running this tells us that we are apparently missing an env property in the imports object.
$ node bare_naive.js
/home/lash/src/tests/wasm/bare/bare_naive.js:8
const i = new WebAssembly.Instance(m, imports);
^
TypeError: WebAssembly.Instance(): Import #0 module="env" error: module is not an object or function
That matches the Import section in the objdump output above. Let's stick the memory and table in there. [2] And let's make a bold guess that the callback function call_me_sometime needs to go in there as well.
const fs = require('fs');

// Memory and table are created here, on the host side, and handed to the module.
const memory = new WebAssembly.Memory({initial: 2});
const table = new WebAssembly.Table({initial: 3, element: 'anyfunc'});
const importsObj = {
	env: {
		memory: memory,
		__linear_memory: memory,
		__indirect_function_table: table,
		// Called from foo() with a pointer, i.e. a byte offset into memory.
		call_me_sometime: (n) => {
			let a = new Uint8Array(memory.buffer, n, 9);
			// Write "foo" into the first three bytes.
			a.set([0x66, 0x6f, 0x6f], 0);
			console.debug('heap is at: ' + n);
			console.log('heap contains: ' + new TextDecoder().decode(a));
		},
	},
};

async function init() {
	const code = fs.readFileSync('./bare.wasm');
	const m = new WebAssembly.Module(code);
	const i = new WebAssembly.Instance(m, importsObj);
	i.exports.foo();
}
init();
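The TypeError earlier comes from the two-level naming of imports: each import has a module name and a field name (here the module is always env), and the imports object must hold an object under the module name, with a value under each field name. Below is a simplified C model of that lookup, not engine code; the structs and first_unresolved are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An import declared by the module, e.g. env.call_me_sometime. */
struct import_desc {
    const char *module;
    const char *name;
};

/* A value the host offers under module.name (kept opaque here). */
struct host_binding {
    const char *module;
    const char *name;
    const void *value;
};

/* Returns the index of the first import the host cannot satisfy, or -1.
 * *module_missing is set when not even the module namespace was offered,
 * loosely mirroring "module is not an object or function". */
int first_unresolved(const struct import_desc *imports, size_t n_imports,
                     const struct host_binding *host, size_t n_host,
                     int *module_missing) {
    assert(module_missing != NULL);
    for (size_t i = 0; i < n_imports; i++) {
        int module_seen = 0;
        int found = 0;
        for (size_t j = 0; j < n_host && !found; j++) {
            if (strcmp(host[j].module, imports[i].module) != 0)
                continue;
            module_seen = 1;
            found = strcmp(host[j].name, imports[i].name) == 0;
        }
        if (!found) {
            *module_missing = !module_seen;
            return (int)i;
        }
    }
    *module_missing = 0;
    return -1;
}
```

With an empty host list it reports import #0 with the whole env module missing, much like the error above; once memory, __indirect_function_table and call_me_sometime are all offered under env, every import resolves.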
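The linker needs a little help from us for this (each flag is prefixed with -Wl, so that clang passes it on to the linker):
- Our callback function will not be available at link time, so we have to pass --allow-undefined to promise that the host has got this covered.
- --import-memory and --import-table let us get the memory and the indirect function table from the host.
- --export="foo" makes sure we export exactly what we intend to from our wasm, and nothing else.
- --no-entry tells the linker not to look for a _start entry point, since this module is a library rather than a program.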
$ clang --target=wasm32 -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined -o bare.wasm bare.c
And that should give us:
$ node bare.js
heap is at: 66560
heap contains: foobar
This way of pointing to memory is of course grossly inadequate and unsafe and ridiculous for any purpose more advanced than this one. So some proper memory management would not be a bad thing.
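A step up from writing straight at __heap_base, without pulling in a libc at all, would be a bump allocator that hands out aligned chunks from the heap base upward and never frees them individually. The sketch below is host-native C: a static array plays the part of linear memory, and HEAP_BASE, memory_sim, bump_alloc and bump_reset are hypothetical names. In an actual wasm build the base would presumably come from &__heap_base, and the limit from clang's __builtin_wasm_memory_size(0) times the 64 KiB page size, with __builtin_wasm_memory_grow to extend it; this sketch does not exercise those builtins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: one 64 KiB page of "linear memory" and a heap base. */
#define MEM_SIZE  65536u
#define HEAP_BASE 1024u  /* in a wasm build this would be (size_t)&__heap_base */

unsigned char memory_sim[MEM_SIZE];  /* backing store; offsets index into it */
size_t heap_top = HEAP_BASE;         /* next free offset */

/* Returns an offset into memory_sim aligned to `align` (a power of two),
 * or 0 when the request does not fit. Offset 0 is below the heap base,
 * so it can never be a valid result and doubles as NULL. */
size_t bump_alloc(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (heap_top + align - 1) & ~(align - 1);
    if (start > MEM_SIZE || size > MEM_SIZE - start)
        return 0;  /* a wasm build might try to grow memory here instead */
    heap_top = start + size;
    return start;
}

/* There is no per-block free; everything is released at once. */
void bump_reset(void) {
    heap_top = HEAP_BASE;
}
```

The trade-off is the usual one: allocation is a couple of arithmetic operations and needs no bookkeeping, but memory only comes back when everything is reset at once.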
Adding libc
And what do you know: another novelty since I last looked at this is "a libc for WebAssembly programs built on top of WASI system calls" [wasi-libc]. Let's see if that gives us a slightly less manual way of handling memory, with malloc and memcpy:
#ifdef HAVE_LIBC
#include <string.h>
#include <stdlib.h>
#endif

extern unsigned char __heap_base;               /* defined by the linker */
extern void call_me_sometime(unsigned char *b); /* provided by the host */

void foo() {

#ifdef HAVE_LIBC
	/* 9 bytes: 3 for the host's "foo", 6 for our "bazbar" */
	unsigned char *buf;
	buf = malloc(9);
	memcpy(buf+3, "bazbar", 6);
#else
	/* No allocator: write directly at the start of the heap. */
	unsigned char *buf = (unsigned char*)&__heap_base;
	*(buf+3) = 'b';
	*(buf+4) = 'a';
	*(buf+5) = 'r';
#endif
	call_me_sometime(buf);

#ifdef HAVE_LIBC
	free(buf);
#endif

}
As you will see, we need a few more parameters for the compiler and linker at this point. The --target=wasm32-unknown-wasi --sysroot /opt/wasi-libc .. /opt/wasi-libc/lib/wasm32-wasi/libc.a parts are needed to hook us up with the headers and symbols of the libc, and -DHAVE_LIBC=1 switches our code over to its libc branch.
My Arch Linux installation puts that sysroot in /opt/wasi-libc; that may of course not be the case elsewhere.
$ clang -DHAVE_LIBC=1 --target=wasm32-unknown-wasi --sysroot /opt/wasi-libc -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined -o bare.wasm bare.c /opt/wasi-libc/lib/wasm32-wasi/libc.a
$ wasm-objdump -x bare.wasm
bare.wasm: file format wasm 0x1
Section Details:
Type[3]:
- type[0] (i32) -> nil
- type[1] () -> nil
- type[2] (i32) -> i32
Import[3]:
- memory[0] pages: initial=2 <- env.memory
- table[0] type=funcref initial=1 <- env.__indirect_function_table
- func[0] sig=0 <call_me_sometime> <- env.call_me_sometime
Function[7]:
- func[1] sig=1 <foo>
- func[2] sig=2 <malloc>
- func[3] sig=2 <dlmalloc>
- func[4] sig=0 <free>
- func[5] sig=0 <dlfree>
- func[6] sig=1 <abort>
- func[7] sig=2 <sbrk>
Global[2]:
- global[0] i32 mutable=1 - init i32=67072
- global[1] i32 mutable=0 <__heap_base> - init i32=67072
Export[2]:
- func[1] <foo> -> "foo"
- global[1] -> "__heap_base"
Code[7]:
- func[1] size=171 <foo>
- func[2] size=10 <malloc>
- func[3] size=6984 <dlmalloc>
- func[4] size=10 <free>
- func[5] size=1908 <dlfree>
- func[6] size=4 <abort>
- func[7] size=78 <sbrk>
Data[2]:
- segment[0] memory=0 size=7 - init i32=1024
- 0000400: 6261 7a62 6172 00 bazbar.
- segment[1] memory=0 size=500 - init i32=1032
- 0000408: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000418: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000428: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000438: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000448: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000458: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000468: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000478: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000488: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000498: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004a8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004b8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004c8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004d8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004e8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004f8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000508: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000518: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000528: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000538: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000548: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000558: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000568: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000578: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000588: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000598: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005a8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005b8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005c8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005d8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005e8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005f8: 0000 0000 ....
Custom:
- name: "name"
- func[0] <call_me_sometime>
- func[1] <foo>
- func[2] <malloc>
- func[3] <dlmalloc>
- func[4] <free>
- func[5] <dlfree>
- func[6] <abort>
- func[7] <sbrk>
Custom:
- name: "producers"
What luxury. And of course, our bare.wasm file just grew from 350 bytes to 10k...
We don't have to change our JavaScript code at this point. Simply run it again, and we get:
$ node bare.js
heap is at: 67088
heap contains: foobazbar
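The addresses in the two runs and in the last objdump listing fit a simple layout. Assume wasm-ld places static data from a global base of 1024, then a 64 KiB stack at a 16-byte boundary, with __heap_base (and the initial __stack_pointer, as the stack grows downward) at the top of that stack. Without any data, that gives 1024 + 65536 = 66560, the address printed in the libc-less run. With libc, the data segments end at 1032 + 500 = 1532, which rounds up to 1536, and 1536 + 65536 = 67072, the initial value of both globals. The buffer malloc hands back, at 67088, lies 16 bytes above that. The C sketch below only redoes this arithmetic; its constants are assumptions picked to match the listings, not values read from the linker.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed wasm-ld defaults, chosen because they reproduce the listings above. */
#define GLOBAL_BASE 1024u   /* where static data starts */
#define STACK_SIZE  65536u  /* 64 KiB stack, placed after the data */
#define STACK_ALIGN 16u

uint32_t align_up(uint32_t x, uint32_t a) {
    return (x + a - 1) & ~(a - 1);  /* a must be a power of two */
}

/* data_end: first byte past the last data segment (GLOBAL_BASE if none).
 * Returns the expected __heap_base, which is also the initial stack
 * pointer, since the stack grows down from there. */
uint32_t heap_base_for(uint32_t data_end) {
    assert(data_end >= GLOBAL_BASE);
    return align_up(data_end, STACK_ALIGN) + STACK_SIZE;
}
```

heap_base_for(GLOBAL_BASE) gives 66560 and heap_base_for(1032 + 500) gives 67072, matching the two builds.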
[1] __heap_base is defined by the linker (wasm-ld) and marks the first address after the static data and the stack, so our code can refer to it as an external symbol.
[2] After linking, the memory import is called memory instead of __linear_memory, for some reason (compare the Import sections of the unlinked and the linked objdump listings). Thus we add both here for clarity.
[wasi-libc] https://github.com/WebAssembly/wasi-libc