Inference API

Describes the interfaces Valence exposes for using inference in smart contracts.

Use npm i valence-inference-lib to install our Solidity library.

Inference Precompile

Valence inference is provided through a standard interface, IValenceInference, that any smart contract can use. The inference itself is implemented by a custom precompile.

The inference precompile and its functions are accessible at address 0x00000000000000000000000000000000000000F4.

Valence exposes various types of inference, ranging from LLMs to classical ML models. It also supports various security techniques, such as ZKML and TEE inference. Developers can choose the methods best suited to their use case and requirements.

The two high-level functions exposed by the inference precompile are runModel and runLlm. runLlm is specifically designed for running large language models, whereas runModel is a generic method that can be used to execute any type of AI or ML model.

interface IValenceInference {

    enum ModelInferenceMode { VANILLA, ZK }
    enum LlmInferenceMode { VANILLA, TEE }

    function runModel(
        ModelInferenceMode mode, 
        ModelInferenceRequest memory request
    ) external returns (ModelOutput memory);

    function runLlm(
        LlmInferenceMode mode,
        LlmInferenceRequest memory request
    ) external returns (LlmResponse memory);
}
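
For example, a contract can bind this interface to the precompile address. The sketch below is illustrative: the import path, the contract name, and the constant/function names are assumptions, not part of the library's documented surface.

pragma solidity ^0.8.0;

// Adjust the import path to wherever IValenceInference lives in your
// project or in the valence-inference-lib package.
import "valence-inference-lib/IValenceInference.sol";

contract InferenceConsumer {
    // The inference precompile lives at the fixed address 0x00...00F4.
    address internal constant VALENCE_INFERENCE_PRECOMPILE = address(uint160(0xF4));

    // Convenience handle for calling the precompile.
    function inference() internal pure returns (IValenceInference) {
        return IValenceInference(VALENCE_INFERENCE_PRECOMPILE);
    }
}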

LLM Inference Requests

The request and response types for LLM inference (runLlm) are defined below. The main input is a prompt string, and the answer is returned as a string as well. In addition, there are some common LLM parameters that can be tuned.

The list of LLMs you can use can be found in Supported LLMs.

struct LlmInferenceRequest {
    string model; // ID of the LLM to use
    string prompt; // LLM prompt
    uint32 max_tokens; // max tokens to generate
    string[] stop_sequence; // stop sequences for model response
    uint32 temperature; // model temperature (between 0 and 100)
}

struct LlmResponse {
    string answer; // answer generated by the LLM
    
    bool is_simulation_result; // indicates whether the result is real
}

To read more about is_simulation_result, please see Simulation Results below.
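
As a sketch, a contract could build a request and call runLlm like this. The model ID, parameter values, and function name are illustrative, and the example assumes the structs above are available at file scope alongside the interface. Pick a real model ID from Supported LLMs.

function askLlm(IValenceInference inference, string memory prompt)
    internal
    returns (string memory)
{
    LlmInferenceRequest memory request = LlmInferenceRequest({
        model: "example-llm",           // illustrative; use an ID from Supported LLMs
        prompt: prompt,
        max_tokens: 256,
        stop_sequence: new string[](0), // no stop sequences
        temperature: 70                 // scaled 0-100
    });

    LlmResponse memory response =
        inference.runLlm(IValenceInference.LlmInferenceMode.VANILLA, request);

    // Empty during simulation; see Simulation Results below.
    return response.answer;
}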

Generic Inference Requests

For all other ML models, the input and response can take various shapes and forms (such as numbers and strings), so we built a flexible framework that lets you define any type of input that your ONNX model expects. The input is made up of an array of number tensors and an array of string tensors. You only need to set the tensors that your model expects (e.g., you can leave the string tensors empty if your model only takes numbers).


/**
 * Can be used to represent a floating-point number or integer.
 *
 * E.g., 10 can be represented as Number(10, 0),
 * and 1.5 can be represented as Number(15, 1)
 */
struct Number {
    int128 value;
    int128 decimals;
}

/**
 * Represents a model tensor input filled with numbers.
 */
struct NumberTensor {
    string name;
    Number[] values;
}

/**
 * Represents a model tensor input filled with strings.
 */
struct StringTensor {
    string name;
    string[] values;
}

/**
 * Model input, made up of various tensors of numbers and/or strings.
 */
struct ModelInput {
    NumberTensor[] numbers;
    StringTensor[] strings;
}

/**
 * Model inference request.
 */
struct ModelInferenceRequest {
    string modelId; // ID of the model to run
    ModelInput input; // model input tensors
}
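
For example, a request for a model that takes a single numeric tensor could be built as follows. This is a sketch: the model ID "example-model", the tensor name "input", and the choice of decimals are assumptions for illustration.

function buildRequest(int128 price) internal pure returns (ModelInferenceRequest memory) {
    // One numeric value, interpreted with 2 decimals (e.g. 1234 => 12.34).
    Number[] memory values = new Number[](1);
    values[0] = Number({ value: price, decimals: 2 });

    NumberTensor[] memory numbers = new NumberTensor[](1);
    numbers[0] = NumberTensor({ name: "input", values: values });

    ModelInput memory input = ModelInput({
        numbers: numbers,
        strings: new StringTensor[](0) // this model takes no string tensors
    });

    return ModelInferenceRequest({ modelId: "example-model", input: input });
}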

/**
 * Model output, made up of tensors of either numbers or strings, ordered
 * as defined by the model. 
 *
 * For example, if a model's output is: [number_tensor_1, string_tensor_1, number_tensor_2],
 * you could access them like this:
 *
 * number_tensor_1 = output.numbers[0];
 * string_tensor_1 = output.strings[0];
 * number_tensor_2 = output.numbers[1];
 *
 */
struct ModelOutput {
    NumberTensor[] numbers;
    StringTensor[] strings;
    
    bool is_simulation_result; // indicates whether the result is real
}

To read more about is_simulation_result, please see Simulation Results below.
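
Putting it together, a contract could run the model and read the first value of the first number tensor like this. Again a sketch: it reuses the hypothetical buildRequest helper from the previous example and runs in VANILLA mode.

function runExampleModel(IValenceInference inference, int128 price) internal returns (int128) {
    ModelOutput memory output = inference.runModel(
        IValenceInference.ModelInferenceMode.VANILLA,
        buildRequest(price)
    );

    if (output.is_simulation_result) {
        // Placeholder while the transaction is simulated; see Simulation Results below.
        return 0;
    }

    // First number tensor, first value; ordering follows the model's output definition.
    return output.numbers[0].values[0].value;
}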

Simulation Results

Both LlmResponse and ModelOutput have a flag called is_simulation_result that indicates whether the result returned is "real" or not. As explained in Parallelized Inference Pre-Execution (PIPE), Vanna transactions are executed in two phases. In the first phase, the transaction is executed in simulation mode to gather and execute all inference requests in the background. Once the results are ready, the transaction is re-executed with the actual inference results. is_simulation_result indicates whether the transaction is currently being executed in simulation mode. When it is set to false, the returned value comes from the model; when it is set to true, the value is empty and developers should explicitly handle this scenario in their code.

Transaction simulation results are never committed to the blockchain.

For example:

function calculateFeeFromModelResult(ModelOutput memory result) internal pure returns (int128) {
    if (result.is_simulation_result) {
        // when in simulation, return some sensible default value
        return 1;
    }

    return result.numbers[0].values[0].value * 2;
}
