API Endpoint

Endpoint

The endpoint is OpenAI-compatible. If you are using the OpenAI client, simply switch the base URL to

https://api.flock.io/v1
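
With the official OpenAI Python SDK, nothing else needs to change. A minimal sketch; the key value is a placeholder, and the x-litellm-api-key header mirrors the curl example below:

from openai import OpenAI

# Point the standard OpenAI client at the FLock endpoint. The curl example
# below authenticates via the x-litellm-api-key header, so we pass it
# explicitly in addition to the SDK's required api_key field.
client = OpenAI(
    base_url="https://api.flock.io/v1",
    api_key="sk-your-api-key",
    default_headers={"x-litellm-api-key": "sk-your-api-key"},
)

response = client.chat.completions.create(
    model="qwen3-30b-a3b-instruct-2507",
    messages=[{"role": "user", "content": "Hello, can you help me?"}],
)
print(response.choices[0].message.content)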


Example Request

curl -X POST 'https://api.flock.io/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'x-litellm-api-key: sk-your-api-key' \
  -d '{
    "model": "qwen3-30b-a3b-instruct-2507",
    "stream": true,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, can you help me?"}
    ]
  }'
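
The stream: true flag above asks for server-sent events. With the client configured as in the earlier sketch, the same request can be consumed chunk by chunk; the delta fields follow the standard OpenAI streaming schema:

# Reuses the `client` from the setup sketch above. Each chunk carries a
# delta with the next fragment of the assistant's reply.
stream = client.chat.completions.create(
    model="qwen3-30b-a3b-instruct-2507",
    stream=True,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, can you help me?"},
    ],
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)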

Request Body Parameters

Required

  • model string

    • The ID of the model to use.

    • Use the [List Models API] to see all available models.

  • messages array

    • A list of messages comprising the conversation so far.

    • Each message is an object with a role (system, user, or assistant) and content, as in the example request above; see the sketch after this list for a multi-turn conversation.
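
For illustration, a multi-turn conversation is just a longer messages array; this sketch alternates user and assistant turns:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is nucleus sampling?"},
    {"role": "assistant", "content": "It samples only from the smallest set of tokens whose cumulative probability exceeds top_p."},
    {"role": "user", "content": "How does that differ from temperature?"},
]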

Optional

  • best_of integer

    • Defaults to 1.

    • Generates multiple completions server-side and returns the one with the highest log probability.

    • Not supported with stream.

  • frequency_penalty number

    • Defaults to 0.

    • Range: -2.0 to 2.0.

    • Positive values penalize tokens in proportion to how often they have already appeared, reducing verbatim repetition.

  • logit_bias map

    • Defaults to null.

    • Adjust likelihood of specific tokens.

    • Example: {"50256": -100} prevents <|endoftext|>.

  • logprobs integer

    • Defaults to null.

    • Returns log probabilities for top n tokens.

    • Max: 5.

  • max_tokens integer

    • Defaults to 16.

    • Maximum tokens to generate in the completion.

  • n integer

    • Defaults to 1.

    • Number of completions to generate.

  • presence_penalty number

    • Defaults to 0.

    • Range: -2.0 to 2.0.

    • Positive values penalize tokens that have already appeared, encouraging the model to move on to new topics.

  • seed integer

    • If provided, makes sampling deterministic when possible, so repeated requests with identical parameters return the same output (see the sketch after this list).

  • stop string | array

    • Defaults to null.

    • Up to 4 sequences that will stop token generation.

  • stream boolean

    • Defaults to false.

    • If true, partial responses are streamed back as server-sent events, as in the example request above.

  • stream_options object

    • Additional options for streaming (for example, include_usage to return token counts in the final chunk). Only set when stream: true.

  • temperature number

    • Defaults to 1.

    • Range: 0–2.

    • Higher values make output more random; lower values make it more focused and deterministic.

  • top_p number

    • Defaults to 1.

    • Nucleus sampling: the model considers only the tokens comprising the top_p probability mass. An alternative to temperature; alter one or the other, generally not both.

  • user string

    • Unique identifier for the end-user.
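
As referenced in the seed entry above, here is a sketch combining several of these options; the parameter values are illustrative, and `client` is the one configured in the setup sketch:

# Exercise several optional parameters in one request.
response = client.chat.completions.create(
    model="qwen3-30b-a3b-instruct-2507",
    messages=[{"role": "user", "content": "Name three sorting algorithms."}],
    temperature=0.2,   # low randomness
    max_tokens=128,    # cap completion length
    stop=["\n\n"],     # stop at the first blank line
    seed=42,           # best-effort determinism across runs
)
print(response.choices[0].message.content)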


Response

Returns a completion object, or a sequence of completion objects if streaming is enabled.
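
A non-streaming completion object follows the standard OpenAI chat completion shape. A representative, illustrative example; exact IDs and token counts will differ:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1720000000,
  "model": "qwen3-30b-a3b-instruct-2507",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 9, "total_tokens": 27}
}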


Notes

  • Use max_tokens and stop wisely to control output length.

  • For deterministic results, combine seed with fixed parameters.
