[Example study] Solana program Instruction data encoding
In this article, we walk through an example to show how instruction data encoding works, down to the byte level.
(The example is written with Anchor, a popular framework for Solana programs.)
The Example
Our example is a Solana program implementing a "Forum" application that supports creating posts and replies. We will look at a call to the function that creates a new post.
(Program source code: https://github.com/airicyu/solaforum/blob/master/programs/solaforum/src/lib.rs )
Solscan link of this transaction:
In Solscan, you can view the transaction's instructions.
You can see that the instruction data is (hex):
7b5cb81de7180fca01000000000000000d00000044656d6f206e657720706f73741700000048656c6c6f206e69636520746f206d65657420796f7521
What on earth do these hex bytes 7b5cb81de7180f......20796f7521 mean?
Explanation
Instruction signature bytes
By default, every program written in Anchor uses the first 8 bytes of the instruction data as the signature of the instruction. This is how the program knows which method is being invoked.
Take our example:
The program's method name is create_post. So the code name is createPost, and the snake case of it is create_post.
Anchor then uses the pattern "${namespace}:create_post" as the instruction identifier. If you don't do any customization, the namespace defaults to global,
i.e. the function identifier is global:create_post.
Anchor then takes the SHA-256 hash of the function identifier, which gives 7b5cb81de7180fcadcd116a38b9be03fb9b86c3ef256c12c735b7da11830bd3f (hex).
Remember that we said that we only take the first 8 bytes? So that is:
7b 5c b8 1d e7 18 0f ca
Now look back at the transaction instruction data:
**7b5cb81de7180fca**01000000......20796f7521
You see? It is exactly the first 8 bytes!
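To try the hashing step yourself, here is a minimal Python sketch. It assumes the default global namespace described above. The helper name anchor_discriminator is made up for illustration and is not part of Anchor's API. It hashes the identifier, keeps the first 8 bytes, and compares them with the prefix seen in the transaction.

```python
import hashlib


def anchor_discriminator(method_name: str, namespace: str = "global") -> bytes:
    """Hypothetical helper: first 8 bytes of SHA-256("<namespace>:<method_name>")."""
    preimage = f"{namespace}:{method_name}".encode("utf-8")
    return hashlib.sha256(preimage).digest()[:8]


# First 8 bytes of the instruction data shown in the transaction above.
observed_prefix = bytes.fromhex("7b5cb81de7180fca")

discriminator = anchor_discriminator("create_post")
print("computed:", discriminator.hex())
print("observed:", observed_prefix.hex())
print("match:", discriminator == observed_prefix)
```

Because the namespace is part of the hashed string, the same method name under a different namespace yields a different 8-byte prefix.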
Instruction data bytes
The remaining bytes are purely the instruction arguments, which Anchor serializes with Borsh.
If you look into the program source code, the payload is the encoding of the struct below.
#[account]
pub struct CreatePostData {
    earth_id: u64, // unsigned 64-bit number (8 bytes)
    title: String,
    content: String,
}
If you want to know how much space each data type takes when encoded, you can look into Anchor's docs at [ https://www.anchor-lang.com/docs/space ]. I will assume you already know this and not cover it in detail here.
The below figure shows how exactly the bytes are encoded:
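The byte layout can also be checked in code. The following is a minimal Python sketch that walks through the instruction data field by field. It assumes a u64 is 8 little-endian bytes and a String is a 4-byte little-endian length followed by UTF-8 bytes. The helper names read_u64, read_string and decode_create_post are made up for illustration.

```python
import struct

# Instruction data from the transaction, split by field for readability.
INSTRUCTION_DATA = bytes.fromhex(
    "7b5cb81de7180fca"                                # 8-byte instruction signature
    "0100000000000000"                                # earth_id (u64)
    "0d000000" "44656d6f206e657720706f7374"           # title: length + UTF-8 bytes
    "17000000" "48656c6c6f206e69636520746f206d65657420796f7521"  # content
)


def read_u64(buf: bytes, offset: int):
    """Read an 8-byte little-endian unsigned integer; return (value, next offset)."""
    (value,) = struct.unpack_from("<Q", buf, offset)
    return value, offset + 8


def read_string(buf: bytes, offset: int):
    """Read a 4-byte little-endian length, then that many UTF-8 bytes."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    return buf[start:end].decode("utf-8"), end


def decode_create_post(data: bytes) -> dict:
    """Decode the create_post instruction data into its fields."""
    offset = 8  # skip the 8-byte instruction signature
    earth_id, offset = read_u64(data, offset)
    title, offset = read_string(data, offset)
    content, offset = read_string(data, offset)
    if offset != len(data):
        raise ValueError("unexpected trailing bytes")
    return {
        "signature": data[:8].hex(),
        "earth_id": earth_id,
        "title": title,
        "content": content,
    }


decoded = decode_create_post(INSTRUCTION_DATA)
print(decoded)
```

The 60 bytes split into 8 (signature) + 8 (earth_id) + 4 + 13 (title) + 4 + 23 (content).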
P.S. Note that the u64 number and the string lengths are both encoded in little-endian byte order, i.e. the least significant byte comes first. If you are interested, you can read more about it on Google/Wikipedia.
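To see little-endian encoding in action, here is a small Python sketch that goes in the other direction. It builds the payload bytes from the argument values in our example, assuming the same layout (8-byte u64, 4-byte length prefix for strings). The function names encode_u64 and encode_string are illustrative only.

```python
import struct


def encode_u64(value: int) -> bytes:
    """Encode an unsigned 64-bit integer as 8 little-endian bytes."""
    return struct.pack("<Q", value)


def encode_string(text: str) -> bytes:
    """Encode a string as a 4-byte little-endian length plus its UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


# Little-endian puts the least significant byte first; big-endian puts it last.
print("u64 1, little-endian:", encode_u64(1).hex())
print("u64 1, big-endian   :", struct.pack(">Q", 1).hex())

# Payload that follows the 8-byte instruction signature in our example.
payload = (
    encode_u64(1)
    + encode_string("Demo new post")
    + encode_string("Hello nice to meet you!")
)
print("payload:", payload.hex())
```

Prepending the 8 signature bytes to this payload reproduces the full instruction data from the transaction.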
Read more
If you are interested in how the raw bytes of PDA data are structured, you may read my other article:
Or if packing instruction data into raw bytes feels too much like a black box to you, you may also read my other article, which talks about how the Anchor IDL helps us describe the program interface: