Introduction to Solana Programs: PDA
In this article, we will look at how a Solana program (smart contract) stores on-chain data via PDAs.
What is a PDA?
PDA stands for Program Derived Address.
(Read more: https://solanacookbook.com/core-concepts/pdas.html )
All on-chain data is stored in data accounts
In Solana, when we want to store data on-chain, we create data accounts and allocate bytes to them. We can then store data structs in those accounts.
(Read more: https://solanacookbook.com/core-concepts/accounts.html )
Every data account has a unique address, which looks like a public key. Each account also has an owner field that names the program that owns it (for a plain wallet account this is the System Program).
"Program -> PDA": it is like managing a key-value store
You can imagine that a program (smart contract) owns a lot of data accounts, and only this program can modify the data in them. (Note that reading data is public and transparent.)
In the above figure, we can see that there is a global variable "global_counter" which is stored in a PDA.
We may also need to store a balance for each wallet address (user). In this case, we will have key-value records mapped 1-to-1 to users, like:
"balance:dv4A ...... BVJB"
"balance:9YR7 ...... gyjY"
Note that in both cases above, the logical keys (e.g. "global_counter" or "balance:dv4A ...... BVJB") are not the PDA addresses themselves.
Instead, we hash them to derive the final PDA addresses.
Derive the PDA address
The way we do this is by providing some bytes as seeds for hash input. For example, we can turn the logical key into bytes as seeds.
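As a rough sketch of this step, the logical keys from the previous section could be turned into seeds as follows. The helper name `toSeeds` and the placeholder user key are hypothetical; the 32-byte limit per seed is the one Solana applies to PDA seeds.

```javascript
// Maximum length of a single PDA seed, in bytes.
const MAX_SEED_LENGTH = 32;

// Hypothetical helper: turn the parts of a logical key into seed buffers.
// Strings are encoded as utf8; buffers (e.g. raw address bytes) are kept as-is.
function toSeeds(...parts) {
  return parts.map((part) => {
    const seed = Buffer.isBuffer(part) ? part : Buffer.from(part, "utf8");
    if (seed.length > MAX_SEED_LENGTH) {
      throw new RangeError(`seed is ${seed.length} bytes, max is ${MAX_SEED_LENGTH}`);
    }
    return seed;
  });
}

// A global variable needs a single seed.
const counterSeeds = toSeeds("global_counter");
console.log(counterSeeds[0].length); // 14

// A per-user record: a label plus the user's 32 address bytes.
// Buffer.alloc(32, 7) is a placeholder, not a real address.
const balanceSeeds = toSeeds("balance", Buffer.alloc(32, 7));
console.log(balanceSeeds.map((seed) => seed.length)); // [ 7, 32 ]
```

A single seed such as "balance:" followed by a full 44-character address would be 52 bytes and exceed the limit, which is one reason to split a logical key into several seeds.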
The logic then looks like the pseudo code below:
Given "seeds" of type byte[][]:
for (let bump = 255; bump > 0; bump--) {
    let buffer: byte[] = []
    buffer += flattened "seeds"
    buffer += bump (1 byte)
    buffer += programId as bytes
    buffer += "ProgramDerivedAddress" as utf8 bytes
    let publicKeyBytes = SHA256 hash of buffer
    if (publicKeyBytes is NOT on the ED25519 curve) {
        return publicKeyBytes and bump   // we got it!
    }
    // otherwise, retry with the next bump
}
fail: no valid bump was found
That is, we take the initial seed bytes and append one more byte, the "bump", which drives the retry loop. Because each attempt has roughly a 50% chance of producing a valid result, we may need to retry a few times.
For each attempt, we compute the SHA256 hash to get 32 bytes and test whether this 32-byte result is on the ED25519 curve. We keep trying until we find a result that is NOT on the curve; that result is the PDA address.
In the next sections, we will use some real examples to explain the details.
Example 1: Global counter data
We have a program with the address 4HeVTFdGHgSzjmexn7k1zpJxFsBymJ7FcpwJzGARvswN.
We also have a global variable named earth_id_counter, and we want to know how to calculate the address of its PDA.
To find the PDA address, we use the utf8 bytes of "earth_id_counter" as the only seed.
Let's look at some NodeJS code that uses @solana/web3.js to calculate the address. It passes the seed and the program ID to the findProgramAddressSync function.
Looking into findProgramAddressSync, you can see that it uses a nonce (bump) that starts from 255 and is appended to the seeds.
It then calls the createProgramAddressSync function, which works basically like the pseudo code in the previous section.
According to the console output, we are lucky: the very first attempt, with bump 255, already yields a valid PDA address, 3RtxkgsA6ohhQo87dusfbwCfBaNS1bPGL554YFYRx6MZ.
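To experiment with this derivation without any dependency, here is a minimal sketch that re-implements the steps using only Node's built-in crypto module and BigInt arithmetic. It is not the source of @solana/web3.js: the function names mirror the library's for readability, but the Base58 helpers and the curve check are simplified stand-ins written for this example, and rare edge cases of point encoding are ignored.

```javascript
const { createHash } = require("crypto");

const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
const P = 2n ** 255n - 19n; // field prime used by ED25519
const mod = (a) => ((a % P) + P) % P;

// Modular exponentiation by repeated squaring.
function powMod(base, exp) {
  let result = 1n;
  for (base = mod(base); exp > 0n; exp >>= 1n) {
    if (exp & 1n) result = mod(result * base);
    base = mod(base * base);
  }
  return result;
}

const D = mod(-121665n * powMod(121666n, P - 2n)); // curve constant d

function base58Decode(text) {
  let value = 0n;
  for (const ch of text) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new Error(`invalid Base58 character: ${ch}`);
    value = value * 58n + BigInt(digit);
  }
  const leadingZeros = text.length - text.replace(/^1+/, "").length;
  let hex = value === 0n ? "" : value.toString(16);
  if (hex.length % 2) hex = "0" + hex;
  return Buffer.concat([Buffer.alloc(leadingZeros), Buffer.from(hex, "hex")]);
}

function base58Encode(bytes) {
  let value = BigInt("0x" + (bytes.toString("hex") || "0"));
  let text = "";
  for (; value > 0n; value /= 58n) text = ALPHABET[Number(value % 58n)] + text;
  for (const b of bytes) {
    if (b !== 0) break;
    text = "1" + text; // each leading zero byte becomes a "1"
  }
  return text;
}

// The 32 bytes encode y (little-endian, top bit = sign of x). They are a curve
// point if x^2 = (y^2 - 1) / (d*y^2 + 1) has a square root modulo P.
function isOnCurve(bytes) {
  const y = BigInt("0x" + Buffer.from(bytes).reverse().toString("hex")) & ((1n << 255n) - 1n);
  const y2 = mod(y * y);
  const x2 = mod((y2 - 1n) * powMod(D * y2 + 1n, P - 2n));
  return x2 === 0n || powMod(x2, (P - 1n) / 2n) === 1n;
}

// Hash seeds + program ID + marker; only off-curve results are valid PDAs.
function createProgramAddress(seeds, programId) {
  const marker = Buffer.from("ProgramDerivedAddress", "utf8");
  const hash = createHash("sha256").update(Buffer.concat([...seeds, programId, marker])).digest();
  return isOnCurve(hash) ? null : hash;
}

// Try bumps from 255 downwards, appending the bump as one extra seed byte.
function findProgramAddress(seeds, programId) {
  for (let bump = 255; bump > 0; bump--) {
    const address = createProgramAddress([...seeds, Buffer.from([bump])], programId);
    if (address) return { address, bump };
  }
  throw new Error("no valid bump found");
}

const programId = base58Decode("4HeVTFdGHgSzjmexn7k1zpJxFsBymJ7FcpwJzGARvswN");
const { address, bump } = findProgramAddress([Buffer.from("earth_id_counter", "utf8")], programId);
console.log(base58Encode(address), bump);
```

Running it prints the derived address and bump, which can be compared with the values reported above. In a real client, the same result should come from `PublicKey.findProgramAddressSync(seeds, programId)` in @solana/web3.js.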
Example 2: Per User record data
We have the same program with the address 4HeVTFdGHgSzjmexn7k1zpJxFsBymJ7FcpwJzGARvswN.
This time we want a per-user metadata record named user.
Let's assume the user's address is 8kChC4Fr7D7nQpRXJ9HkRYSLHbKFTF6Jb2hD5AEfE6KP.
Let's look at how to find the address. The seeds have two elements:
The utf8 bytes of the "user" string.
The user address bytes (not the utf8 bytes of the address string, but its Base58-decoded bytes).
According to the console output, the PDA address is Agfdvfde3dNPTRD7ngLHjpLszCRDDBbe8RfY9QwDd2jU with bump 255.
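The difference between the two encodings of the user address can be checked directly. The sketch below builds the two seeds of this example with a small hand-written Base58 decoder; `base58Decode` is a helper written for this illustration, not a library function.

```javascript
const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Minimal Base58 decoder (Bitcoin alphabet, as used for Solana addresses).
function base58Decode(text) {
  let value = 0n;
  for (const ch of text) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new Error(`invalid Base58 character: ${ch}`);
    value = value * 58n + BigInt(digit);
  }
  // Each leading "1" stands for one leading zero byte.
  const leadingZeros = text.length - text.replace(/^1+/, "").length;
  let hex = value === 0n ? "" : value.toString(16);
  if (hex.length % 2) hex = "0" + hex;
  return Buffer.concat([Buffer.alloc(leadingZeros), Buffer.from(hex, "hex")]);
}

const userAddress = "8kChC4Fr7D7nQpRXJ9HkRYSLHbKFTF6Jb2hD5AEfE6KP";

// Seeds for the per-user record: the label, then the raw 32 address bytes.
const seeds = [Buffer.from("user", "utf8"), base58Decode(userAddress)];
console.log(seeds.map((seed) => seed.length)); // [ 4, 32 ]

// The utf8 bytes of the address string are a different, longer byte sequence.
console.log(Buffer.from(userAddress, "utf8").length); // 44
```

Using the 44 utf8 bytes instead of the 32 decoded bytes would produce a different hash input and therefore a different address, and a 44-byte seed would also exceed the 32-byte limit for a single seed.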
Seeds must be consistent between the program and the client side
Note that the PDA seeds must be the same on the smart contract side and on the client side.
In example 2:
The smart contract side uses ["user", signer address bytes] to derive the PDA address.
The client side must use the same ["user", signer address bytes] to derive the PDA address.
This is because when the client side builds a transaction to submit, it must list the accounts that will be created or updated, so it has to derive the PDA address itself. The smart contract side then derives the address in the same way as part of its security checks.
Will another program with the same seed have the same PDA address?
Short answer: No.
Look at the logic that derives the PDA address: the program ID bytes are part of the hash input, together with the seeds, before the SHA256 hash is computed. Hence, even if another program uses the same seeds, the final hash result is different, so PDA address conflicts between programs are extremely unlikely.
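This can be illustrated with a small sketch that hashes the same seeds and bump for two different program IDs. The two program IDs are placeholder byte patterns, not real programs, and the sketch only shows the hashing step: it skips the bump search and the curve check.

```javascript
const { createHash } = require("crypto");

// Hash input layout used for PDA derivation: seeds, bump, program ID, marker.
function pdaHash(seeds, bump, programId) {
  const marker = Buffer.from("ProgramDerivedAddress", "utf8");
  return createHash("sha256")
    .update(Buffer.concat([...seeds, Buffer.from([bump]), programId, marker]))
    .digest("hex");
}

const seeds = [Buffer.from("global_counter", "utf8")];
const programA = Buffer.alloc(32, 1); // placeholder program IDs
const programB = Buffer.alloc(32, 2);

console.log(pdaHash(seeds, 255, programA));
console.log(pdaHash(seeds, 255, programB));
console.log(pdaHash(seeds, 255, programA) === pdaHash(seeds, 255, programB)); // false
```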
Can someone submit a transaction that writes data to a forged PDA address to attack other users?
What this question means is:
User A may submit a transaction that writes data to the PDA address of User B. If this were possible, User A could pretend to be User B and potentially attack them.
The answer is:
If the smart contract is written correctly, this should not happen. It is the responsibility of the smart contract to enforce this.
A simple way to protect against it is:
The smart contract derives the PDA address itself and checks that the signer's address, rather than some forged address, is used in the seeds. This ensures that we are always working on the signer's (user's) own record.
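The shape of that check can be sketched as follows. This is only a model of the logic, written in JavaScript to match the other examples; an actual on-chain program would perform it in its own language. All names are hypothetical, the keys are placeholder bytes, and `deriveStandIn` is a simplified stand-in for the real derivation (a single SHA256 without the bump search or curve check).

```javascript
const { createHash } = require("crypto");

// Stand-in for the real PDA derivation, for illustration only.
function deriveStandIn(seeds, programId) {
  return createHash("sha256").update(Buffer.concat([...seeds, programId])).digest();
}

// Hypothetical program-side check: the account passed in the transaction must be
// the address derived from the SIGNER's key, not from a key chosen by the caller.
function checkUserAccount(providedAccount, signer, programId, derive = deriveStandIn) {
  const expected = derive([Buffer.from("user", "utf8"), signer], programId);
  if (!expected.equals(providedAccount)) {
    throw new Error("provided account is not the signer's PDA");
  }
}

const programId = Buffer.alloc(32, 9); // placeholder keys, not real addresses
const userA = Buffer.alloc(32, 1);
const userB = Buffer.alloc(32, 2);
const pdaOfB = deriveStandIn([Buffer.from("user", "utf8"), userB], programId);

checkUserAccount(pdaOfB, userB, programId); // passes: B signs for B's own record

try {
  checkUserAccount(pdaOfB, userA, programId); // A signs but targets B's record
} catch (err) {
  console.log(err.message); // provided account is not the signer's PDA
}
```

Because the expected address is always recomputed from the signer's key, a caller cannot point the program at another user's record simply by passing a different account.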