Training Node

Training nodes are responsible for training and fine-tuning models for the AI tasks initiated by task creators. This mechanism ensures the integrity and health of the ecosystem, as nodes have vested interests via staking. In return, nodes are rewarded in proportion to their contributions. To become a training node, a user has to stake FLOCK.

0. Overview: reward drivers for training nodes

Put simply, a training node’s daily return from a task depends on three factors:

(1) the task's total stake relative to all other tasks, meaning a training node's stake in a particular task will indirectly affect its rewards from that task; and

(2) a training node’s stake in this task as well as stake delegated to this training node; and

(3) the quality of the node's submission, as shown by its relative ranking. Specifically, the node's rank determines a weight in a geometric series, which is then multiplied by the task's relative stake.

It's important to note that rank, rather than other metrics such as absolute scores, is used here to determine the quality of a training node's work, primarily because scores can be very close between nodes. This design decision is believed to make reward calculation fairer and simpler.
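For intuition only, here is a minimal Python sketch of how rank-based geometric weights could look. The decay ratio `r`, the function name `rank_weights`, and the normalisation are placeholder assumptions; this page does not specify them.

```python
# Purely illustrative: mapping ranks (1 = best) to normalised geometric-series weights.
# The decay ratio `r` is a placeholder, not a value specified by the protocol on this page.

def rank_weights(ranks, r=0.5):
    raw = [r ** (rank - 1) for rank in ranks]
    total = sum(raw)
    return [w / total for w in raw]

print(rank_weights([1, 2, 3]))  # ≈ [0.571, 0.286, 0.143]
```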

The calculation of reward distributions for training nodes follows a three-step formula:

1. Reward distribution within a single AI Arena task

Within a single AI Arena task, the reward distribution between training nodes and validators is determined based on their relative stake amounts.

We assume there are $n$ submissions $(O_1, \ldots, O_n)$ from $n$ training nodes with stakes $(t_1, \ldots, t_n)$, and $m$ validators $(V_1, \ldots, V_m)$ with stakes $(s_1, \ldots, s_m)$. Each validator $V_j$ $(1 \le j \le m)$ evaluates the $n$ models submitted by the training nodes.

Let the total daily reward allocated to a task be denoted as $R_0$, and let the parameter $\gamma$ control the reward split, defining the balance between the fixed and stake-dependent reward components.

The total rewards for training nodes are:

$$R_0 \cdot \left( \gamma + (1 - 2\gamma) \cdot \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} t_i + \sum_{j=1}^{m} s_j} \right)$$
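As a quick illustration, the following minimal Python sketch computes this task-level split. The function name `training_pool` and its arguments are our own, and the numbers in the usage line are taken from the example in section 4 below.

```python
# Minimal sketch of the task-level reward split between training nodes and validators.
# `training_pool` and its arguments are illustrative names, not part of the protocol.

def training_pool(r0, gamma, node_stakes, validator_stakes):
    """Total reward allocated to all training nodes (and their delegators) in one task."""
    t = sum(node_stakes)       # total training-node stake
    s = sum(validator_stakes)  # total validator stake
    return r0 * (gamma + (1 - 2 * gamma) * t / (t + s))

# With gamma = 0 the split is purely stake-proportional (values from the example below).
print(training_pool(309_157.68, 0.0, [3_000, 3_500], [3_000, 6_000, 3_000]))  # ≈ 108,623
```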

2. Rewards for training nodes & their delegators

We can now compute the total rewards allocated to each training node together with its delegators, based on the quality of its submission and its total stake:

$$f_i(g_i, t_i) = \frac{g_i \cdot t_i^{\alpha_t}}{\sum_{k=1}^{n} g_k \cdot t_k^{\alpha_t}}$$

Here $t_i$ is the total stake from training node $i$ and its delegators, $g_i$ is the score of the model submitted by training node $i$, and $k$ indexes the training nodes, ranked amongst their peers, within the same task. $\alpha_t$ is a system parameter that determines the influence of stake on the reward distribution.
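As a sketch of this step, the helper below computes each node's fraction $f_i$ from its score and stake. The name `node_fractions`, its argument names, and the default $\alpha_t = 1$ are our own choices for illustration.

```python
# Minimal sketch of f_i = g_i * t_i^alpha_t / sum_k (g_k * t_k^alpha_t).
# Function and argument names are illustrative; alpha_t = 1 is just a convenient default.

def node_fractions(scores, stakes, alpha_t=1.0):
    """Return one reward fraction per training node, in the order the inputs are given."""
    weights = [g * (t ** alpha_t) for g, t in zip(scores, stakes)]
    total = sum(weights)
    return [w / total for w in weights]

print(node_fractions([0.501435, 0.498565], [4_000, 3_500]))  # ≈ [0.535, 0.465]
```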

3. Rewards for training nodes

If training node $i$'s own stake in the task is $t_n$ and the stake delegated to training node $i$ is $t_d$, i.e. $t_i = t_n + t_d$, then the actual reward for training node $i$ is:

$$f_i \cdot \left(\sigma + (1-\sigma) \cdot \frac{t_n}{t_n+t_d}\right)$$

Note that in the front-end you will see a "reward-sharing ratio", which refers to $(1 - \sigma)$: when the reward-sharing ratio is 60%, $\sigma$ is 0.4. This ratio is set permissionlessly by training nodes and validators.
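Below is a minimal sketch of this final split between a node and its delegators, assuming the node-level reward from step 2 is already known; the function name and arguments are our own.

```python
# Minimal sketch of splitting a node-level reward between the training node and its delegators.
# `sigma` corresponds to 1 - (reward-sharing ratio) shown in the front-end.

def node_share(node_reward, own_stake, delegated_stake, sigma):
    """Portion of the node-level reward kept by the training node itself."""
    return node_reward * (sigma + (1 - sigma) * own_stake / (own_stake + delegated_stake))

# A 60% reward-sharing ratio means sigma = 0.4.
print(node_share(58_084, 3_000, 1_000, 0.4))  # ≈ 49,371.40
```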

4. Example

Let's assume the total rewards for all AI Arena tasks on a given day are 309,157.68 FLOCK, and that there is 1 task with 2 training nodes and 3 validators.

Nodes A and B stake 3,000 and 3,500 FLOCK respectively, while validators A, B and C stake 3,000, 6,000 and 3,000 respectively. Node A also receives an additional 1,000 FLOCK from its delegators, which brings its $t_i$ (total stake including delegated stake) to 4,000. For simplicity, we assume $\gamma$ to be 0 in this example.

First, for this given task, total rewards for *all* training nodes are:

$$R_0 \times \frac{\sum_{i=1}^n t_i}{\sum_{i=1}^n t_i + \sum_{j=1}^m s_j} = 309{,}157.68 \times \frac{6500}{6500 + 12000} \approx 108{,}623.7$$

We can then compute the rewards for *Node A and its delegators*. We assume that the scores for Nodes A and B are 0.501435 and 0.498565 respectively. With $\alpha_t = 1$, the rewards for Node A (together with its delegators) are:

$$f_i(g_i, t_i) \cdot 108{,}623.7 = \frac{0.501435 \times 4000}{(0.501435 \times 4000) + (0.498565 \times 3500)} \times 108{,}623.7 \approx 58{,}084$$

Finally, given $\sigma = 0.4$, the actual rewards for *Node A alone* are:

$$f_i \cdot \left(\sigma + (1-\sigma) \cdot \frac{t_n}{t_n + t_d}\right) = 58{,}084 \times \left(0.4 + 0.6 \times \frac{3000}{4000}\right) = 49{,}371.40$$
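Putting the three steps together, the short script below reproduces this example; the variable names are ours, and small differences from the figures above come from intermediate rounding.

```python
# End-to-end reproduction of the worked example (values from this section; names are illustrative).

r0 = 309_157.68                # daily rewards for all AI Arena tasks (one task here)
own_stakes = [3_000, 3_500]    # t_n: the nodes' own stakes (Node A, Node B)
node_stakes = [4_000, 3_500]   # t_i: own stake plus delegated stake (Node A has 1,000 delegated)
validator_stakes = [3_000, 6_000, 3_000]
scores = [0.501435, 0.498565]  # g_i for Node A and Node B
sigma = 0.4                    # i.e. a 60% reward-sharing ratio

# Step 1: task-level pool for all training nodes (gamma = 0).
pool = r0 * sum(own_stakes) / (sum(own_stakes) + sum(validator_stakes))

# Step 2: Node A's share together with its delegators (alpha_t = 1).
weights = [g * t for g, t in zip(scores, node_stakes)]
node_a_with_delegators = pool * weights[0] / sum(weights)

# Step 3: Node A alone, after sharing with its delegators.
node_a_alone = node_a_with_delegators * (sigma + (1 - sigma) * own_stakes[0] / node_stakes[0])

print(round(pool, 1), round(node_a_with_delegators, 1), round(node_a_alone, 2))
```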
