# Task Lifecycle Deep-dive

This section offers a deep dive into the lifecycle of an AI Arena task.

## 1. Task Creation

Task creation is the first stage of the training cycle. Task creators define the desired models and submit tasks to the platform.

To qualify as a task creator, users must meet one or more of the following criteria:

• Stake a sufficient amount of $FML

• Have successfully trained or validated a task previously, as evidenced by on-chain records

• Possess a reputation in the ML space or be recognised as a domain expert in relevant fields, as verified by the FLock community

## 2. Training Node and Validator Selection

Each participant must stake in order to take part, either as a training node or as a validator. Rate limiting also determines how many times a participant can be eligible to validate a given task. Essentially, the likelihood of a participant being selected to validate a task submission increases with their stake; however, the marginal gain in validation frequency diminishes as the staking amount grows.
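A minimal sketch of stake-weighted validator selection with diminishing returns. The square-root weighting below is an illustrative assumption, not the platform's actual formula: any concave function of the stake produces the described behaviour, where selection probability grows with stake but at a diminishing rate.

```python
import math
import random

def selection_weights(stakes):
    """Map staked $FML amounts to selection weights.

    sqrt is one illustrative concave function: quadrupling a stake
    only doubles the weight, so returns on stake diminish.
    """
    return [math.sqrt(s) for s in stakes]

def pick_validator(stakes, rng=None):
    """Sample one validator index with probability proportional to its weight."""
    rng = rng or random.Random()
    weights = selection_weights(stakes)
    return rng.choices(range(len(stakes)), weights=weights, k=1)[0]

stakes = [100, 400, 900]             # stakes of three candidate validators
weights = selection_weights(stakes)  # [10.0, 20.0, 30.0]
```

Note that the third candidate stakes nine times as much as the first but is only three times as likely to be selected.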

## 3. Training

Each training node is given $\mathcal{D}_{\text{local}}$, which contains locally sourced data samples comprising a feature set $X$ and a label set $Y$, with each sample $x_i \in X$ corresponding to a label $y_i \in Y$. The goal of training is to define a predictive model $f$ that learns patterns within $\mathcal{D}_{\text{local}}$ such that $f(x_i) \approx y_i$.

To quantify the success (i.e. the predictive ability) of the model $f$, we introduce a loss function $L(f(x_i), y_i)$, assessing the discrepancy between predictions $f(x_i)$ and actual labels $y_i$. A generic expression for this function is:

$$L(f) = \frac{1}{N} \sum_{i=1}^{N} l(f(x_i), y_i)$$

where $N$ denotes the total sample count, and $l$ signifies a problem-specific loss function, e.g., mean squared error or cross-entropy loss.

Ultimately, the optimisation goal of training is to adjust the model parameters $\theta$ to minimise $L$, typically through algorithms such as gradient descent.
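The loss minimisation above can be sketched in a few lines. This toy example (an assumption for illustration, not the platform's training code) fits a one-parameter linear model $f(x) = \theta x$ by plain gradient descent on the mean-squared-error loss:

```python
def mse_loss(theta, data):
    """L = (1/N) * sum of (f(x_i) - y_i)^2 for the linear model f(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def train(data, lr=0.1, steps=100):
    """Adjust theta to minimise the loss via gradient descent."""
    theta = 0.0
    n = len(data)
    for _ in range(steps):
        # dL/dtheta = (2/N) * sum of x_i * (theta * x_i - y_i)
        grad = 2 * sum(x * (theta * x - y) for x, y in data) / n
        theta -= lr * grad
    return theta

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples drawn from y = 2x
theta = train(data)                          # converges to ~2.0
```

Real training nodes would optimise millions of parameters with variants such as stochastic gradient descent or Adam, but the update rule is the same in spirit.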

## 4. Validation

After a training node produces a trained model $\theta^{task}_p$, a selected group of validators, denoted $V_j \in V$, each equipped with the evaluation dataset $\mathcal{D}_{\text{eval}}$ from the task creator, validates the model. The dataset consists of pairs $(x_i, y_i)$, where $x_i$ represents the features of the $i$-th sample and $y_i$ is the corresponding true label.

To assess the performance of the trained model, we use a general evaluation metric, accuracy, which is calculated as follows:

$$\text{Accuracy} = \frac{1}{|\mathcal{D}_{\text{eval}}|} \sum_{(x_i, y_i) \in \mathcal{D}_{\text{eval}}} \mathbf{1}(\hat{y}_i = y_i)$$

Here, $\mathbf{1}$ represents the indicator function that returns $1$ if the predicted label $\hat{y}_i$ matches the true label $y_i$, and $0$ otherwise, and $|\mathcal{D}_{\text{eval}}|$ denotes the total number of samples in the evaluation dataset.

Each predicted label $\hat{y}_i$ from the model $\theta^{task}_p$ is compared against its corresponding true label $y_i$ within $\mathcal{D}_{\text{eval}}$. The calculated metric result (accuracy here) serves as a quantifiable measure of $\theta^{task}_p$'s effectiveness at label prediction across the evaluation dataset.
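The accuracy computation a validator performs can be sketched as follows. The toy model and dataset are placeholders standing in for $\theta^{task}_p$ and $\mathcal{D}_{\text{eval}}$:

```python
def accuracy(model, eval_data):
    """Accuracy = (1/|D_eval|) * sum of 1(model(x_i) == y_i) over the dataset."""
    correct = sum(1 for x, y in eval_data if model(x) == y)
    return correct / len(eval_data)

# Hypothetical evaluation pairs (x_i, y_i) and a stand-in trained model.
eval_data = [(0, "a"), (1, "b"), (2, "a"), (3, "b")]
model = lambda x: "a" if x % 2 == 0 else "b"

score = accuracy(model, eval_data)  # 1.0: every prediction matches its label
```

Each validator would report a score like this back to the platform for the submission it was assigned.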
