Wasm binary structure
When the Wasm Runtime executes a Wasm binary, there are two main steps: decoding the Wasm binary and processing instructions. To decode the binary, it is necessary to understand the structure of the binary, so this chapter will explain that structure.
By the end of this chapter, you will understand what the structure of the binary looks like.
Overview of Wasm Binary
A Wasm binary consists of an 8-byte preamble at the beginning, followed by various sections.
The preamble consists of the Magic Number
'\0asm'
and the version value 1
, each occupying 4 bytes at the beginning of the file.
The following is a modified output of a binary generated by wat2wasm -v
, with some additional explanations.
\0asm
┌───┴───┐
0000000: 0061 736d ; WASM_BINARY_MAGIC
~~~~~~~ ~~ ~~~~~~~~~~~~~~~~~~~~
│ │ │
│ │ └ Comment
│ └ Hexadecimal notation, 2 digits = 1 byte
└ Offset of address
0000004: 0100 0000 ; WASM_BINARY_VERSION
In the explanations of the binary structure that follow, we will generally use this output with some added clarity.
There are multiple sections, each storing information necessary for execution at runtime. For example, there is information about function signatures, memory initialization, and instructions to be executed.
It is worth noting that sections are optional, so it is possible to create a minimal Wasm binary consisting only of the preamble.
In this document, we will implement the following sections, so for other sections, please refer to the specification.
Section | Description |
---|---|
Type Section | Information about function signatures |
Code Section | Information about instructions per function |
Function Section | Reference information to function signatures |
Memory Section | Information about linear memory |
Data Section | Information about data to be placed in memory during initialization |
Export Section | Information about exporting to other modules |
Import Section | Information about importing from other modules |
Subsequently, we will explain the data structures of each section.
Type Section
An area that holds function signature information. In brief, a signature refers to the type of a function.
A function signature is uniquely determined by the following combination:
- Types and order of arguments
- Types and order of return values
For example, functions $a
and $b
in List 1 have differences in argument order and presence of return values, so they have different signatures, but $a
and $c
have the same signature.
Signatures essentially define the input and output of a function, independent of the function's content, so $a
and $c
will reference the same signature information.
List 1
(module
(func $a (param i32 i64))
(func $b (param i64 i32) (result i32 i64)
(local.get 1)
(local.get 0)
)
(func $c (param i32 i64))
)
Function signatures provide information on how many arguments and return values to push onto the stack when executing a function.
The detailed usage of signature information will be explained in the chapter on Runtime
implementation.
List 2 represents the binary structure.
List 2
; section "Type" (1)
0000008: 01 ; section code
0000009: 0d ; section size
000000a: 02 ; num types
; func type 0
000000b: 60 ; func
000000c: 02 ; num params
000000d: 7f ; i32
000000e: 7e ; i64
000000f: 00 ; num results
; func type 1
0000010: 60 ; func
0000011: 02 ; num params
0000012: 7e ; i64
0000013: 7f ; i32
0000014: 02 ; num results
0000015: 7f ; i32
0000016: 7e ; i64
The first 2 bytes represent the section code
and section size
, which are common to all sections.
Although they do not have an official name, we will refer to them as section headers in this document.
; section "Type" (1)
0000008: 01 ; section code
0000009: 0d ; section size
000000a: 02 ; num types
The section code
is a unique value used to identify the section, with 1
representing the Type Section
.
The section size
indicates the number of bytes in the section data excluding the first 2 bytes.
This helps determine how much of the binary needs to be read when decoding the section.
num types
represents the number of function signatures. For each of these, the function signatures will be decoded.
The remaining part of the section defines function signatures. Each function signature starts with 0x60
and is defined in the order of the number and types of arguments, followed by the number and types of return values.
In List 3, func type 0
contains the signature information of (func $a (param i32 i64))
and (func $c (param i32 i64))
, while func type 1
contains the signature information of (func $b (param i64 i32) (result i32 i64))
.
List3
; func type 0
000000b: 60 ; func ┐
000000c: 02 ; num params │ (func $a (param i32 i64))
000000d: 7f ; i32 ├ (func $c (param i32 i64))
000000e: 7e ; i64 │
000000f: 00 ; num results ┘
; func type 1
0000010: 60 ; func ┐
0000011: 02 ; num params │
0000012: 7e ; i64 │ (func $b
0000013: 7f ; i32 ├ (param i64 i32)
0000014: 02 ; num results │ (result i32 i64)
0000015: 7f ; i32 │ )
0000016: 7e ; i64 ┘
Decoding function signatures generally involves the following steps:
- Read 1 byte and verify if it is
0x60
. - Read 1 byte to obtain the number of arguments.
- Read the bytes corresponding to the number obtained in step 2. For example, if it is 2, read 2 bytes.
- Read through the bytes obtained in step 3 one by one to get the type information corresponding to the values (e.g., if
0x7e
, it represents the i64 type). - Obtain the return value type information following steps 2 to 4.
Code Section
The Code Section
primarily stores the instruction information of functions.
List 4 represents the binary structure of the Code Section
.
List4
; section "Code" (10)
000001d: 0a ; section code
000001e: 0e ; section size
000001f: 03 ; num functions
; function body 0
0000020: 02 ; func body size
0000021: 00 ; local decl count
0000022: 0b ; end
; function body 1
0000023: 06 ; func body size
0000024: 00 ; local decl count
0000025: 20 ; local.get
0000026: 01 ; local index
0000027: 20 ; local.get
0000028: 00 ; local index
0000029: 0b ; end
; function body 2
000002a: 02 ; func body size
000002b: 00 ; local decl count
000002c: 0b ; end
num functions
indicates the number of functions, and you decode functions based on this number.
The remaining part consists of the definitions of local variables and instruction information for each function, which need to be decoded iteratively.
func body size
indicates the number of bytes in the function body.
local decl count
indicates the number of local variables.
If it is 0
, no action is taken, but if it is greater than 1
, the subsequent byte sequence defines the types of local variables.
The byte sequence up to end
represents the function instructions, and the Runtime
processes these instructions.
Decoding functions generally involves the following steps:
- Read 1 byte to obtain the size of the function.
- Read the byte sequence corresponding to the function size obtained in step 1.
- Read 1 byte to obtain the number of local variables.
- Read through the bytes obtained in step 3 one by one to get the type information.
- Obtain the instructions until the byte sequence read in step 2 is exhausted.
Function Section
The Function Section
holds information that links function bodies (Code Section
) with type information (Type Section
).
List 5 represents the binary structure.
List5
; section "Function" (3)
0000017: 03 ; section code
0000018: 04 ; section size
0000019: 03 ; num functions
000001a: 00 ; function 0 signature index
000001b: 01 ; function 1 signature index
000001c: 00 ; function 2 signature index
The value of function x signature index
represents the index information (0-based) to the function signature.
For example, function 2
indicates that it has the signature 0
from the Type Section
.
To clarify the relationship, refer to Figure 1.
Figure 1
Memory Section
The Memory Section
stores information on how much memory to allocate for the Runtime
.
Memory can be extended in page units, with 1 page being 64KiB as specified in the specification.
Memory is formatted as (memory $initial $max)
as shown in List 6, where 2
represents the initial memory page count, and 3
represents the maximum page count.
max
is optional, and if not specified, there is no upper limit.
List6
(module
(memory 2 3)
)
The binary structure is represented as shown in List 7.
List7
; section "Memory" (5)
0000008: 05 ; section code
0000009: 04 ; section size
000000a: 01 ; num memories
; memory 0
000000b: 01 ; limits: flags
000000c: 02 ; limits: initial
000000d: 03 ; limits: max
num memories
indicates the number of memories, but in version 1 of the specification, only one memory can be defined per module, making this value effectively fixed at 1.
limits: flags
is a value used to determine whether max
exists, meaning that if it is 0
, only initial
exists, and if it is 1
, both initial
and max
exist. This allows you to understand how to decode it.
Data Section
The Data Section
is the area where data to be placed after memory allocation in the Runtime
is defined. In other words, it defines the initial data of the memory.
List 8 is an example defining the string Hello, World!\n
in memory.
List 8
(module
(memory 1)
(data 0 (i32.const 0) "Hello, World!\n")
)
The data is formatted as (data $memory $offset $data)
and consists of the following elements:
$memory
is the index of the memory where the data is placed$offset
is the instruction sequence to calculate the offset of the memory to place the data$data
is the actual data to be placed in memory
In this example, the string Hello, World!\n
is placed in the 0th byte of the 0th memory.
The binary structure is as shown in List 9.
List 9
; section "Data" (11)
000000d: 0b ; section code
000000e: 14 ; section size
000000f: 01 ; num data segments
; data segment header 0
0000010: 00 ; segment flags
0000011: 41 ; i32.const
0000012: 00 ; i32 literal
0000013: 0b ; end
0000014: 0e ; data segment size
; data segment data 0
0000015: 4865 6c6c 6f2c 2057 6f72 6c64 210a ; data segment data
The data is organized into units called segments
, and there may be multiple segments
. A segment
consists of header
and data
areas, where header
contains the instruction sequence to calculate the offset and data
holds the actual data.
num data segments
is the number of segments
.
The data segment header
is the area that holds metadata such as the memory where the data is placed and the offset. There is one for each segment
.
segment flags
indicate the index of the memory where the data is placed. In version 1, only one memory can be defined, so it is effectively fixed at 0.
From i32.const
to end
is the instruction sequence to calculate the offset. In this case, only fixed values are handled, but global values can also be referenced.
data segment size
is the length of the actual data to be placed, and data segment data
is the actual data to be placed in memory.
Figure 2 illustrates the structure of the segment
in List 9.
Figure 2
Export Section
The Export Section
is the area where information accessible from other modules is defined. In version 1, memories, functions, etc., can be exported.
On the Runtime
side, only exported functions can be called, so if, for example, a function to perform addition needs to be called from the Runtime
, the function must be exported.
List 10 is an example of exporting the function $dummy
that the module itself has as dummy
.
List 10
(module
(func $dummy)
(export "dummy" (func $dummy))
)
The export format is (export $name ($type $index))
. $name
is the name to be exported, $type
is the type of data to be exported such as func
or memory
, and $index
is the index or name of that data. For example, in the case of func 0
, it refers to the 0th function. In this example, the function name $dummy
is specified, but it will be converted to an index when it becomes binary.
The binary structure is as shown in List 11.
List 11
; section "Export" (7)
0000012: 07 ; section code
0000013: 09 ; section size
0000014: 01 ; num exports
0000015: 05 ; string length
0000016: 6475 6d6d 79 ; export name (dummy)
000001b: 00 ; export kind
000001c: 00 ; export func index
num exports
is the number of data to be exported.
string length
is the length of the byte sequence of the exported name, and export name
is the actual byte sequence of characters.
export kind
is the type of data, where for memory it is 0x02
.
export func index
is the index of the function to be exported.
Import Section
Import Section
is an area where information is defined to import entities such as memory and functions that exist outside the module. The term "outside the module" refers to memory and functions provided by other modules or the Runtime
.
In this case, we are implementing WASI, and the actual implementation of WASI functions is done on the Runtime
side, so we plan to import and use them.
List 12 is an example of importing a function named add
from a module called adder
.
List 12
(module
(import "adder" "add" (func (param i32 i32) (result i32)))
)
The import format is (import $module $name $type)
.
$module
is the module name, $name
is the name of the function or memory to import, and $type
contains the type definition information. For functions, it includes the function's signature information, and for memory, it defines the min
and max
information of the memory.
The binary structure looks like List 13.
List 13
; section "Type" (1)
0000008: 01 ; section code
0000009: 07 ; section size
000000a: 01 ; num types
; func type 0
000000b: 60 ; func
000000c: 02 ; num params
000000d: 7f ; i32
000000e: 7f ; i32
000000f: 01 ; num results
0000010: 7f ; i32
; section "Import" (2)
0000011: 02 ; section code
0000012: 0d ; section size
0000013: 01 ; num imports
; import header 0
0000014: 05 ; string length
0000015: 6164 6465 72 ; import module name (adder)
000001a: 03 ; string length
000001b: 6164 64 ; import field name (add)
000001e: 00 ; import kind
000001f: 00 ; import signature index
string length
represents the length of the byte sequence of the characters, import module name
represents the byte sequence of the actual module name, and import field name
represents the byte sequence of the function or memory name to import.
import kind
indicates the type of import, where 0
is used for functions.
import signature index
points to the index of the function's signature information, referring to func type 0
in the Type Section
.
Summary
In this chapter, we explained the sections targeted for implementation. If you are not familiar with handling binaries, it may seem challenging, but we recommend revisiting this chapter repeatedly until you become comfortable with it.
In the next chapter, we will proceed with implementing the process of decoding a Wasm binary.