I am currently resuming my self-improvement task of learning WebAssembly. Or, I should rather say, using WebAssembly.
Since I have some optimized small code chunks that I want to use in an embedded environment, it seemed sensible that the first step would be to establish two-way communication with C.
In my last outing around three years ago, I was using Emscripten to bridge the gap. That tool adds quite a few bells and whistles, and doesn't quite yield that warm, fuzzy bare-metal rush. Emscripten relies on Clang and LLVM, both of which seem to have gotten wasm support built in in the meantime (at least on my Arch Linux system). This integrates nicely with wabt, the Swiss Army knife of WebAssembly.
So how far do we get with just clang, LLVM and wabt? Let's see if we can at least set up a code snippet which simply writes "foobar" to memory. The host will write "foo", and wasm will write "bar".
Without libc
This excellent tutorial by Surma provides a good starting point. Go ahead and read that first. This text is not a WebAssembly primer, so the following will make a lot more sense if you do.
That setup still adds some magic. Namely, the memory and the function table are added by the wasm linker there. It would be even more fun to pass these in from the host system instead.
And so we start: [1]
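#include "string.h"

/* Marks where the heap begins, see [1]. */
extern unsigned char __heap_base;
/* Supplied by the host when the module is instantiated. */
extern void call_me_sometime(unsigned char *b);

void foo() {
    unsigned char *buf = (unsigned char*)&__heap_base;
    /* Bytes 0-2 are left for the host, which writes "foo" there. */
    *(buf+3) = 'b';
    *(buf+4) = 'a';
    *(buf+5) = 'r';
    call_me_sometime(buf);
}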
Compiling this without linking gives us a hint on what needs to be defined.
$ clang --target=wasm32 -nostdlib -nostartfiles -o bare.wasm -c bare.c
$ wasm-objdump -x bare.wasm
bare.wasm: file format wasm 0x1
Section Details:
Type[2]:
- type[0] () -> nil
- type[1] (i32) -> nil
Import[4]:
- memory[0] pages: initial=0 <- env.__linear_memory
- table[0] type=funcref initial=0 <- env.__indirect_function_table
- global[0] i32 mutable=1 <- env.__stack_pointer
- func[0] sig=1 <env.call_me_sometime> <- env.call_me_sometime
Function[1]:
- func[1] sig=0 <foo>
Code[1]:
- func[1] size=138 <foo>
Custom:
- name: "linking"
- symbol table [count=4]
- 0: F <foo> func=1 binding=global vis=hidden
- 1: G <env.__stack_pointer> global=0 undefined binding=global vis=default
- 2: D <__heap_base> undefined binding=global vis=default
- 3: F <env.call_me_sometime> func=0 undefined binding=global vis=default
Custom:
- name: "reloc.CODE"
- relocations for section: 3 (Code) [5]
- R_WASM_GLOBAL_INDEX_LEB offset=0x000007(file=0x000099) symbol=1 <env.__stack_pointer>
- R_WASM_GLOBAL_INDEX_LEB offset=0x00001c(file=0x0000ae) symbol=1 <env.__stack_pointer>
- R_WASM_MEMORY_ADDR_SLEB offset=0x000031(file=0x0000c3) symbol=2 <__heap_base>
- R_WASM_FUNCTION_INDEX_LEB offset=0x000073(file=0x000105) symbol=3 <env.call_me_sometime>
- R_WASM_GLOBAL_INDEX_LEB offset=0x000086(file=0x000118) symbol=1 <env.__stack_pointer>
Custom:
- name: "producers"
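The same import list can also be read from the host side, without wabt, using WebAssembly.Module.imports() and WebAssembly.Module.exports(). The following is a minimal sketch. The byte array is a hand-assembled stand-in module, not the real bare.wasm: it is assumed here purely for illustration, imports env.memory and env.call_me_sometime, and exports a foo that calls call_me_sometime(1024).

```javascript
// Hand-assembled stand-in module (NOT the real bare.wasm, an assumption for
// illustration): imports env.memory and env.call_me_sometime, exports foo.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic, version 1
  // Type section: type[0] (i32) -> nil, type[1] () -> nil
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00,
  // Import section: two entries
  0x02, 0x26, 0x02,
  // "env" "memory": memory, at least 1 page
  0x03, 0x65, 0x6e, 0x76, 0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,
  0x02, 0x00, 0x01,
  // "env" "call_me_sometime": function of type 0
  0x03, 0x65, 0x6e, 0x76, 0x10,
  0x63, 0x61, 0x6c, 0x6c, 0x5f, 0x6d, 0x65, 0x5f,
  0x73, 0x6f, 0x6d, 0x65, 0x74, 0x69, 0x6d, 0x65,
  0x00, 0x00,
  // Function section: func[1] has type 1
  0x03, 0x02, 0x01, 0x01,
  // Export section: "foo" -> func[1]
  0x07, 0x07, 0x01, 0x03, 0x66, 0x6f, 0x6f, 0x00, 0x01,
  // Code section: i32.const 1024; call 0; end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x41, 0x80, 0x08, 0x10, 0x00, 0x0b,
]);

const mod = new WebAssembly.Module(bytes);

// Each descriptor carries the module name, the field name and the kind.
const wanted = WebAssembly.Module.imports(mod);
for (const imp of wanted) {
  console.log('import ' + imp.module + '.' + imp.name + ' (' + imp.kind + ')');
}

const exported = WebAssembly.Module.exports(mod);
for (const exp of exported) {
  console.log('export ' + exp.name + ' (' + exp.kind + ')');
}
```

To inspect a real file instead, the byte array can be swapped for fs.readFileSync('./bare.wasm').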
Using Node.js as the host, we check whether we can instantiate a WebAssembly object:
const fs = require('fs');

// Deliberately empty: we want to see what the module asks for.
const imports = {}

async function init() {
    const code = fs.readFileSync('./bare.wasm');
    const m = new WebAssembly.Module(code);
    const i = new WebAssembly.Instance(m, imports);
}
init();
Running this tells us we are apparently missing a property env in the imports object.
$ node bare_naive.js
/home/lash/src/tests/wasm/bare/bare_naive.js:8
const i = new WebAssembly.Instance(m, imports);
^
TypeError: WebAssembly.Instance(): Import #0 module="env" error: module is not an object or function
That seems to match the Import section in the objdump output above. Let's stick the memory and table in there. [2]
And let's make a bold guess that the callback function call_me_sometime needs to go in there as well.
const fs = require('fs');

const memory = new WebAssembly.Memory({initial: 2});
const table = new WebAssembly.Table({initial: 3, element: 'anyfunc'});
const importsObj = {
    env: {
        memory: memory,
        __linear_memory: memory,
        __indirect_function_table: table,
        // n is the address that wasm passes in, i.e. the start of the buffer.
        call_me_sometime: (n) => {
            let a = new Uint8Array(memory.buffer, n, 9)
            // Write "foo" into the first three bytes.
            a.set([0x66, 0x6f, 0x6f], 0);
            console.debug('heap is at: ' + n);
            console.log('heap contains: ' + new TextDecoder().decode(a));
        },
    },
}

async function init() {
    const code = fs.readFileSync('./bare.wasm');
    const m = new WebAssembly.Module(code);
    const i = new WebAssembly.Instance(m, importsObj);
    i.exports.foo();
}
init();
The linker needs a little help from us for this:
- Our callback function will not be available at link time, so we have to --allow-undefined to promise that the host has got this covered.
- --import-memory and --import-table to enable us to get the memory and the function table from the host.
- --export="foo" to make sure we only export exactly what we intend to from our wasm.
$ clang --target=wasm32 -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined -o bare.wasm bare.c
And that should give us:
$ node bare.js
heap is at: 66560
heap contains: foobar
This way of pointing to memory is of course grossly inadequate, unsafe and ridiculous for any purpose more advanced than this one. So some proper memory management would not be a bad thing.
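On the host side, at least, a little care is cheap. As a minimal sketch, the hypothetical helper viewAt below (a name made up here, not part of any API) checks that a pointer and length received from wasm actually fall inside the memory before handing out a view. The address 66560 is simply the one printed above.

```javascript
// Hypothetical helper: returns a Uint8Array view of `len` bytes at `ptr`,
// or throws if that range does not fit inside the memory.
function viewAt(memory, ptr, len) {
  const size = memory.buffer.byteLength;
  const ok = Number.isInteger(ptr) && Number.isInteger(len) &&
    ptr >= 0 && len >= 0 && ptr + len <= size;
  if (!ok) {
    throw new RangeError('range at ' + ptr + ' (+' + len + ') is outside ' + size + ' bytes');
  }
  return new Uint8Array(memory.buffer, ptr, len);
}

const memory = new WebAssembly.Memory({ initial: 2 }); // 2 pages of 64 KiB each

const view = viewAt(memory, 66560, 9);
view.set([0x62, 0x61, 0x72], 3); // "bar", standing in for what wasm writes
view.set([0x66, 0x6f, 0x6f], 0); // "foo", as the host callback writes it

// Decode only the six bytes that were actually written.
const text = new TextDecoder().decode(view.subarray(0, 6));
console.log('heap contains: ' + text);
```

The Uint8Array constructor would reject an out-of-range view on its own, but checking up front gives a clearer error and one place to add further checks.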
Adding libc
And what do you know: since I last looked at this, "a libc for WebAssembly programs built on top of WASI system calls" [wasi-libc] has come along. Let's see if we can add a slightly less manual way of handling memory with malloc and memcpy:
#ifdef HAVE_LIBC
#include <string.h>
#include <stdlib.h>
#endif

extern unsigned char __heap_base;
extern void call_me_sometime(unsigned char *b);

void foo() {

#ifdef HAVE_LIBC
    /* 9 bytes: 3 for the host's "foo" plus 6 for "bazbar". */
    unsigned char *buf;
    buf = malloc(9);
    memcpy(buf+3, "bazbar", 6);
#else
    unsigned char *buf = (unsigned char*)&__heap_base;
    *(buf+3) = 'b';
    *(buf+4) = 'a';
    *(buf+5) = 'r';
#endif
    call_me_sometime(buf);

#ifdef HAVE_LIBC
    free(buf);
#endif

}
We need a few more parameters for the compiler and linker at this point. The --target=wasm32-unknown-wasi --sysroot /opt/wasi-libc .. /opt/wasi-libc/lib/wasm32-wasi/libc.a parts are needed to hook us up with headers and symbols for the libc.
My Arch Linux puts that sysroot in /opt/wasi-libc; that may of course not be the case elsewhere.
$ clang -DHAVE_LIBC=1 --target=wasm32-unknown-wasi --sysroot /opt/wasi-libc -nostdlib -nostartfiles -Wl,--no-entry -Wl,--export="foo" -Wl,--import-memory -Wl,--import-table -Wl,--allow-undefined -o bare.wasm bare.c /opt/wasi-libc/lib/wasm32-wasi/libc.a
$ wasm-objdump -x bare.wasm
bare.wasm: file format wasm 0x1
Section Details:
Type[3]:
- type[0] (i32) -> nil
- type[1] () -> nil
- type[2] (i32) -> i32
Import[3]:
- memory[0] pages: initial=2 <- env.memory
- table[0] type=funcref initial=1 <- env.__indirect_function_table
- func[0] sig=0 <call_me_sometime> <- env.call_me_sometime
Function[7]:
- func[1] sig=1 <foo>
- func[2] sig=2 <malloc>
- func[3] sig=2 <dlmalloc>
- func[4] sig=0 <free>
- func[5] sig=0 <dlfree>
- func[6] sig=1 <abort>
- func[7] sig=2 <sbrk>
Global[2]:
- global[0] i32 mutable=1 - init i32=67072
- global[1] i32 mutable=0 <__heap_base> - init i32=67072
Export[2]:
- func[1] <foo> -> "foo"
- global[1] -> "__heap_base"
Code[7]:
- func[1] size=171 <foo>
- func[2] size=10 <malloc>
- func[3] size=6984 <dlmalloc>
- func[4] size=10 <free>
- func[5] size=1908 <dlfree>
- func[6] size=4 <abort>
- func[7] size=78 <sbrk>
Data[2]:
- segment[0] memory=0 size=7 - init i32=1024
- 0000400: 6261 7a62 6172 00 bazbar.
- segment[1] memory=0 size=500 - init i32=1032
- 0000408: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000418: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000428: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000438: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000448: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000458: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000468: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000478: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000488: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000498: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004a8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004b8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004c8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004d8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004e8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00004f8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000508: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000518: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000528: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000538: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000548: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000558: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000568: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000578: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000588: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 0000598: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005a8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005b8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005c8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005d8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005e8: 0000 0000 0000 0000 0000 0000 0000 0000 ................
- 00005f8: 0000 0000 ....
Custom:
- name: "name"
- func[0] <call_me_sometime>
- func[1] <foo>
- func[2] <malloc>
- func[3] <dlmalloc>
- func[4] <free>
- func[5] <dlfree>
- func[6] <abort>
- func[7] <sbrk>
Custom:
- name: "producers"
What luxury. And of course, our bare.wasm file just grew from 350 bytes to 10k...
We don't have to change our JavaScript code at this point. Simply run it again, and get:
$ node bare.js
heap is at: 67088
heap contains: foobazbar
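The addresses printed in the two runs are not arbitrary. The sketch below shows one way to arrive at them. It rests on an assumption that is not stated anywhere above: that the linker lays out memory as data starting at address 1024, rounded up to 16 bytes, followed by a 64 KiB stack, with the heap after that. Under that assumption the numbers from the objdump output and the two runs line up.

```javascript
const PAGE = 65536;  // size of one WebAssembly memory page in bytes
const STACK = 65536; // ASSUMPTION: a 64 KiB stack sits between data and heap

// Round n up to the next multiple of a.
const alignUp = (n, a) => Math.ceil(n / a) * a;

// ASSUMPTION: heap base = end of data, aligned to 16 bytes, plus the stack.
const heapBase = (dataEnd) => alignUp(dataEnd, 16) + STACK;

// Without libc there are no data segments, so the data region ends at 1024.
const bare = heapBase(1024);

// With libc, segment[1] starts at 1032 and is 500 bytes long.
const libc = heapBase(1032 + 500);

// malloc handed out 67088; the gap up from the heap base is presumably
// allocator bookkeeping (also an assumption).
const gap = 67088 - libc;

// Pages needed to cover the 9-byte buffer at 67088.
const pagesNeeded = Math.ceil((67088 + 9) / PAGE);

console.log('heap base without libc: ' + bare);
console.log('heap base with libc: ' + libc);
console.log('malloc offset from heap base: ' + gap);
console.log('pages needed: ' + pagesNeeded);
```

If this reading is right, it also explains why the host memory was created with initial: 2: the heap already starts beyond the first 64 KiB page.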
[1] __heap_base will be set by default by the wasm linker, and is thus available as an external symbol.
[2] After linking, the memory import needs to be called memory instead of __linear_memory for some reason. Thus we add both here for clarity.
[wasi-libc] https://github.com/WebAssembly/wasi-libc