Task Lifecycle Deep-dive
This section offers a deep-dive into the lifecycle of an AI Arena task.
Task creation is the first stage of the training cycle. Task creators define the desired models and submit tasks to the platform.
To qualify as a task creator, users must meet one or more of the following criteria:
• Stake a sufficient amount of $FML
• Have successfully trained or validated a task previously, as evidenced by on-chain records
• Possess a reputation in the ML space or be recognised as a domain expert in relevant fields, as verified by the FLock community
Every participant must stake $FML to take part in a task, either as a training node or as a validator. Rate limiting determines how many times a participant can be selected as a validator for a given task: the likelihood of being chosen to validate a submission increases with the participant's stake, but the marginal gain in validation frequency diminishes as the staked amount grows, as illustrated in the sketch below.
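As a rough illustration, the sketch below samples a validator with probability proportional to a sub-linear function of its stake. The square-root weighting and the function names are assumptions chosen for illustration, not the protocol's actual selection curve.

```python
# Illustrative only: stake-weighted validator selection with diminishing returns.
# The square-root weighting is an assumed curve, not FLock's actual formula.
import math
import random

def selection_weights(stakes):
    """Map each participant's stake to a selection weight (sqrt(stake) here)."""
    return {participant: math.sqrt(stake) for participant, stake in stakes.items()}

def pick_validator(stakes):
    """Sample one validator with probability proportional to its weight."""
    weights = selection_weights(stakes)
    participants = list(weights)
    return random.choices(participants, weights=[weights[p] for p in participants], k=1)[0]

# Quadrupling a stake from 100 to 400 only doubles the selection weight,
# so validation frequency grows sub-linearly with the staked amount.
stakes = {"validator_a": 100, "validator_b": 400}
print(pick_validator(stakes))
```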
Each training node $k$ is given a local dataset $D_k$, which contains locally sourced data samples comprising a feature set $X_k$ and a label set $Y_k$, with each sample $x_i \in X_k$ corresponding to a label $y_i \in Y_k$. The goal of training is to define a predictive model $f_k: X_k \rightarrow Y_k$, which learns patterns within $D_k$ such that $f_k(x_i) \approx y_i$.
To quantify the success (i.e. the ability to predict) of the predictive model $f_k$, we introduce a loss function $L(f_k)$, assessing the discrepancy between predictions $f_k(x_i)$ and actual labels $y_i$. A generic expression for this function is:

$$L(f_k) = \frac{1}{n_k} \sum_{i=1}^{n_k} \ell\big(f_k(x_i), y_i\big),$$

where $n_k$ denotes the total sample count in $D_k$, and $\ell$ signifies a problem-specific loss function, e.g., mean squared error or cross-entropy loss.
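The sketch below is a minimal, illustrative computation of $L(f_k)$ over a node's local dataset, assuming the model is a plain Python callable and the per-sample loss $\ell$ is passed in; the function names are hypothetical.

```python
# A minimal sketch of the generic loss L(f_k); function names are illustrative.
def total_loss(model, dataset, per_sample_loss):
    """L(f_k) = (1/n_k) * sum_i per_sample_loss(f_k(x_i), y_i) over D_k."""
    n_k = len(dataset)
    return sum(per_sample_loss(model(x_i), y_i) for x_i, y_i in dataset) / n_k

def squared_error(prediction, label):
    """A problem-specific loss ell: squared error for a scalar target."""
    return (prediction - label) ** 2
```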
Ultimately, the optimisation goal of training is to adjust the model parameters $\theta$ to minimise $L(f_k)$, typically through algorithms such as gradient descent.
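For concreteness, the following sketch shows one gradient-descent update for a linear model trained under mean squared error; the linear form and the learning rate are assumptions made only to keep the update explicit, not the platform's prescribed training procedure.

```python
# Illustrative gradient-descent step for a linear model f(x) = x . theta under MSE.
import numpy as np

def mse_loss(theta, X, y):
    """L(theta) = (1/n) * sum_i (x_i . theta - y_i)^2."""
    return np.mean((X @ theta - y) ** 2)

def gradient_step(theta, X, y, learning_rate=0.01):
    """Single update: theta <- theta - lr * dL/dtheta."""
    n = X.shape[0]
    grad = (2.0 / n) * X.T @ (X @ theta - y)
    return theta - learning_rate * grad
```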
After the training node produces a trained model $f_k$, a selected group of validators, denoted as $V$, each equipped with the evaluation dataset $D_{\text{eval}}$ from the task creator, will validate the model. The dataset consists of pairs $(x_j, y_j)$, where $x_j$ represents the features of the $j$-th sample and $y_j$ is the corresponding true label.
To assess the performance of the trained model, we use a general evaluation metric, accuracy, which is calculated as follows:

$$\text{Acc}(f_k) = \frac{1}{|D_{\text{eval}}|} \sum_{(x_j, y_j) \in D_{\text{eval}}} \mathbb{1}\big[f_k(x_j) = y_j\big]$$

Here, $\mathbb{1}[\cdot]$ represents the indicator function, which returns 1 if the predicted label $f_k(x_j)$ matches the true label $y_j$ and 0 otherwise, and $|D_{\text{eval}}|$ denotes the total number of samples within the evaluation dataset.
Each predicted label $f_k(x_j)$ from the model is compared against its corresponding true label $y_j$ within the dataset $D_{\text{eval}}$. The calculated metric result (accuracy here) serves as a quantifiable measure of $f_k$'s effectiveness at label prediction across the evaluation dataset.
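A minimal sketch of how a validator might compute this accuracy, assuming the evaluation set is a list of $(x_j, y_j)$ pairs and the trained model is a callable:

```python
# Illustrative accuracy computation over the evaluation dataset D_eval.
def accuracy(model, eval_dataset):
    """Acc = (1/|D_eval|) * sum_j 1[model(x_j) == y_j]."""
    correct = sum(1 for x_j, y_j in eval_dataset if model(x_j) == y_j)
    return correct / len(eval_dataset)
```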