Training Node

Training nodes are responsible for training and fine-tuning models for the AI tasks initiated by task creators. This mechanism ensures the integrity and health of the ecosystem, as nodes have vested interests via staking. In return, nodes are rewarded in proportion to their contributions. To become a training node, a user has to stake FLOCK.

0. Overview: reward drivers for training nodes

Put simply, a training node’s daily return from a task depends on three factors:

(1) the task's total stake relative to all other tasks, meaning a training node's stake in a particular task will indirectly affect its rewards from that task; and

(2) a training node’s stake in this task as well as stake delegated to this training node; and

(3) the quality of the node's submission, as shown by its relative ranking. Specifically, the node's rank determines a weight in a geometric series, which is then multiplied by the task's relative stake.

It's important to note that rank, rather than other metrics such as absolute scores, is used here to determine the quality of a training node's work, primarily because scores can be very close between nodes. This design decision is believed to make reward calculation fairer and simpler.
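For intuition only, here is a minimal Python sketch of how rank-based geometric weights could look. The decay ratio `r`, the function name `rank_weights`, and the normalisation are placeholder assumptions; this page does not specify them.

```python
# Purely illustrative: mapping ranks (1 = best) to normalised geometric-series weights.
# The decay ratio `r` is a placeholder, not a value specified by the protocol on this page.

def rank_weights(ranks, r=0.5):
    raw = [r ** (rank - 1) for rank in ranks]
    total = sum(raw)
    return [w / total for w in raw]

print(rank_weights([1, 2, 3]))  # ≈ [0.571, 0.286, 0.143]
```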

The calculation of reward distributions for training nodes follows a three-step formula:

1. Reward distribution within a single AI Arena task

Within a single AI Arena task, the reward distribution between training nodes and validators is determined based on their relative stake amounts.

We assume there are $n$ submissions $(O_1, \ldots, O_n)$ from $n$ training nodes with stakes $(t_1, \ldots, t_n)$, and $m$ validators $(V_1, \ldots, V_m)$ with stakes $(s_1, \ldots, s_m)$. Each validator $V_j$ $(1 \le j \le m)$ evaluates the $n$ models submitted by the training nodes.

Let the total daily reward allocated to a task be denoted as $R_0$, and let the parameter $\gamma$ control the reward split, defining the balance between the fixed and stake-dependent reward components.

The total rewards for training nodes are:

$$R_0 \cdot \left( \gamma + (1 - 2\gamma) \cdot \frac{\sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} t_i + \sum_{j=1}^{m} s_j} \right)$$
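As a quick illustration, the following minimal Python sketch computes this task-level split. The function name `training_pool` and its arguments are our own, and the numbers in the usage line are taken from the example in section 4 below.

```python
# Minimal sketch of the task-level reward split between training nodes and validators.
# `training_pool` and its arguments are illustrative names, not part of the protocol.

def training_pool(r0, gamma, node_stakes, validator_stakes):
    """Total reward allocated to all training nodes (and their delegators) in one task."""
    t = sum(node_stakes)       # total training-node stake
    s = sum(validator_stakes)  # total validator stake
    return r0 * (gamma + (1 - 2 * gamma) * t / (t + s))

# With gamma = 0 the split is purely stake-proportional (values from the example below).
print(training_pool(309_157.68, 0.0, [3_000, 3_500], [3_000, 6_000, 3_000]))  # ≈ 108,623
```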

2. Rewards for training nodes & their delegators

We can now compute the total rewards allocated to each training node together with its delegators, based on the quality of its submission and its total stake:

$$f_i(g_i, t_i) = \frac{g_i \cdot t_i^{\alpha_t}}{\sum_{k=1}^{n} g_k \cdot t_k^{\alpha_t}}$$

Here $t_i$ is the total stake from training node $i$ and its delegators, $g_i$ is the score of the model submitted by training node $i$, and $k$ indexes the training nodes, ranked amongst their peers, within the same task. $\alpha_t$ is a system parameter that determines the influence of stake on the reward distribution.
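As a sketch of this step, the helper below computes each node's fraction $f_i$ from its score and stake. The name `node_fractions`, its argument names, and the default $\alpha_t = 1$ are our own choices for illustration.

```python
# Minimal sketch of f_i = g_i * t_i^alpha_t / sum_k (g_k * t_k^alpha_t).
# Function and argument names are illustrative; alpha_t = 1 is just a convenient default.

def node_fractions(scores, stakes, alpha_t=1.0):
    """Return one reward fraction per training node, in the order the inputs are given."""
    weights = [g * (t ** alpha_t) for g, t in zip(scores, stakes)]
    total = sum(weights)
    return [w / total for w in weights]

print(node_fractions([0.501435, 0.498565], [4_000, 3_500]))  # ≈ [0.535, 0.465]
```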

3. Rewards for training nodes

If training node $i$'s own stake in the task is $t_n$ and the stake delegated to training node $i$ is $t_d$, i.e. $t_i = t_n + t_d$, then the actual reward for training node $i$ is:

$$f_i \cdot \left(\sigma + (1-\sigma) \cdot \frac{t_n}{t_n+t_d}\right)$$

Note that in the front-end you will see a "reward-sharing ratio", which refers to $(1 - \sigma)$: when the reward-sharing ratio is 60%, $\sigma$ is 0.4. This ratio is set permissionlessly by training nodes and validators.
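Below is a minimal sketch of this final split between a node and its delegators, assuming the node-level reward from step 2 is already known; the function name and arguments are our own.

```python
# Minimal sketch of splitting a node-level reward between the training node and its delegators.
# `sigma` corresponds to 1 - (reward-sharing ratio) shown in the front-end.

def node_share(node_reward, own_stake, delegated_stake, sigma):
    """Portion of the node-level reward kept by the training node itself."""
    return node_reward * (sigma + (1 - sigma) * own_stake / (own_stake + delegated_stake))

# A 60% reward-sharing ratio means sigma = 0.4.
print(node_share(58_084, 3_000, 1_000, 0.4))  # ≈ 49,371.40
```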

4. Example

Let's assume the total rewards for all AI Arena tasks on a given day are 309,157.68 FLOCK, and that there is 1 task with 2 training nodes and 3 validators.

Nodes A and B stake 3,000 and 3,500 FLOCK respectively, while validators A, B and C stake 3,000, 6,000 and 3,000 respectively. Node A also receives an additional 1,000 FLOCK from its delegators, which brings its $t_i$ (total stake including delegated stake) to 4,000. For simplicity, we assume $\gamma$ to be 0 in this example.

First, for this given task, total rewards for *all* training nodes are:

$$R_0 \times \frac{\sum_{i=1}^n t_i}{\sum_{i=1}^n t_i + \sum_{j=1}^m s_j} = 309{,}157.68 \times \frac{6500}{6500 + 12000} \approx 108{,}623.7$$

We can then compute the rewards for *Node A and its delegators*. We assume that the scores for Nodes A and B are 0.501435 and 0.498565 respectively. With $\alpha_t = 1$, the rewards for Node A (together with its delegators) are:

$$f_i(g_i, t_i) \cdot 108{,}623.7 = \frac{0.501435 \times 4000}{(0.501435 \times 4000) + (0.498565 \times 3500)} \times 108{,}623.7 \approx 58{,}084$$

Finally, given $\sigma = 0.4$, the actual rewards for *Node A alone* are:

$$f_i \cdot \left(\sigma + (1-\sigma) \cdot \frac{t_n}{t_n + t_d}\right) = 58{,}084 \times \left(0.4 + 0.6 \times \frac{3000}{4000}\right) = 49{,}371.40$$
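Putting the three steps together, the short script below reproduces this example; the variable names are ours, and small differences from the figures above come from intermediate rounding.

```python
# End-to-end reproduction of the worked example (values from this section; names are illustrative).

r0 = 309_157.68                # daily rewards for all AI Arena tasks (one task here)
own_stakes = [3_000, 3_500]    # t_n: the nodes' own stakes (Node A, Node B)
node_stakes = [4_000, 3_500]   # t_i: own stake plus delegated stake (Node A has 1,000 delegated)
validator_stakes = [3_000, 6_000, 3_000]
scores = [0.501435, 0.498565]  # g_i for Node A and Node B
sigma = 0.4                    # i.e. a 60% reward-sharing ratio

# Step 1: task-level pool for all training nodes (gamma = 0).
pool = r0 * sum(own_stakes) / (sum(own_stakes) + sum(validator_stakes))

# Step 2: Node A's share together with its delegators (alpha_t = 1).
weights = [g * t for g, t in zip(scores, node_stakes)]
node_a_with_delegators = pool * weights[0] / sum(weights)

# Step 3: Node A alone, after sharing with its delegators.
node_a_alone = node_a_with_delegators * (sigma + (1 - sigma) * own_stakes[0] / node_stakes[0])

print(round(pool, 1), round(node_a_with_delegators, 1), round(node_a_alone, 2))
```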
