# Inference API

{% hint style="info" %}
Use `npm i valence-inference-lib` to install our Solidity library.
{% endhint %}

### **Inference Precompile**

Valence Inference is provided through a standard interface that any smart contract can use. Inference is implemented by a custom precompile exposed through the `IValenceInference` interface.

{% hint style="info" %}
The Inference precompile and its functions are accessible at address `0x00000000000000000000000000000000000000F4`
{% endhint %}

Valence exposes various types of inference, ranging from LLMs to classical ML models. In addition, it supports various security techniques, such as ZKML and TEE inference. Developers can choose the methods best suited to their use case and requirements.

The two high-level functions exposed by the inference precompile are `runModel` and `runLlm`. `runLlm` is specifically designed for running large language models, whereas `runModel` is a generic method that can execute any type of AI or ML model.

```solidity
interface IValenceInference {

    enum ModelInferenceMode { VANILLA, ZK }
    enum LlmInferenceMode { VANILLA, TEE }

    function runModel(
        ModelInferenceMode mode, 
        ModelInferenceRequest memory request
    ) external returns (ModelOutput memory);

    function runLlm(
        LlmInferenceMode mode,
        LlmInferenceRequest memory request
    ) external returns (LlmResponse memory);
}
```
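As a minimal sketch, a contract can bind this interface to the precompile address and call it directly. The import path below is a hypothetical placeholder; adjust it to match the actual layout of `valence-inference-lib`.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical import path; adjust to match valence-inference-lib's layout.
// import "valence-inference-lib/contracts/IValenceInference.sol";

contract InferenceConsumer {
    // The inference precompile lives at a fixed, well-known address.
    IValenceInference constant INFERENCE =
        IValenceInference(0x00000000000000000000000000000000000000F4);
}
```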

### LLM Inference Requests

The request and response for LLMs (`runLlm`) are defined as follows. The main input is a prompt, and the answer is returned as a string as well. In addition, there are some common LLM parameters that can be tuned.

The list of LLMs you can use can be found in [supported-llms](https://docs.vannalabs.ai/build/models/supported-llms "mention").

```solidity
struct LlmInferenceRequest {
    string model; // ID of the LLM to use
    string prompt; // LLM prompt
    uint32 max_tokens; // max tokens to generate
    string[] stop_sequence; // stop sequences for model response
    uint32 temperature; // model temperature (between 0 and 100)
}

struct LlmResponse {
    string answer; // answer generated by the LLM

    bool is_simulation_result; // indicates whether the result is real
}
```

To read more about `is_simulation_result`, please see [#simulation-results](#simulation-results "mention").
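A hedged sketch of calling `runLlm` from a contract. The model ID and parameter values here are illustrative placeholders, not canonical (see the supported-llms page for valid IDs), and `inference` is assumed to be an `IValenceInference` bound to the precompile address:

```solidity
// Assumes: IValenceInference inference = IValenceInference(0x00000000000000000000000000000000000000F4);
function askLlm(string memory prompt) internal returns (string memory) {
    LlmInferenceRequest memory request = LlmInferenceRequest({
        model: "some-model-id", // hypothetical; see supported-llms for valid IDs
        prompt: prompt,
        max_tokens: 256,
        stop_sequence: new string[](0), // no stop sequences
        temperature: 70 // on the documented 0-100 scale
    });

    LlmResponse memory response = inference.runLlm(LlmInferenceMode.VANILLA, request);

    // In simulation mode the answer is empty; callers should handle that case.
    if (response.is_simulation_result) {
        return "";
    }
    return response.answer;
}
```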

### Generic Inference Requests

For all other ML models, the input and response can take various shapes and forms (such as numbers and strings), so we built a flexible framework that lets you define any type of input that your ONNX model expects. The input is made up of an array of number tensors and an array of string tensors. You only need to set the tensors that your model expects (e.g., you can leave the string tensors empty if your model only accepts numbers).

```solidity

/**
 * Can be used to represent a floating-point number or integer.
 *
 * eg 10 can be represented as Number(10, 0),
 * and 1.5 can be represented as Number(15, 1)
 */
struct Number {
    int128 value;
    int128 decimals;
}

/**
 * Represents a model tensor input filled with numbers.
 */
struct NumberTensor {
    string name;
    Number[] values;
}

/**
 * Represents a model tensor input filled with strings.
 */
struct StringTensor {
    string name;
    string[] values;
}

/**
 * Model input, made up of various tensors of numbers and/or strings.
 */
struct ModelInput {
    NumberTensor[] numbers;
    StringTensor[] strings;
}

/**
 * Model inference request.
 */
struct ModelInferenceRequest {
    string modelId;
    ModelInput input;
}

/**
 * Model output, made up of tensors of either numbers or strings, ordered
 * as defined by the model. 
 *
 * For example, if a model's output is: [number_tensor_1, string_tensor_1, number_tensor_2],
 * you could access them like this:
 *
 * number_tensor_1 = output.numbers[0];
 * string_tensor_1 = output.strings[0];
 * number_tensor_2 = output.numbers[1];
 *
 */
struct ModelOutput {
    NumberTensor[] numbers;
    StringTensor[] strings;
    
    bool is_simulation_result; // indicates whether the result is real
}
```

To read more about `is_simulation_result`, please see [#simulation-results](#simulation-results "mention").
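A hedged sketch of building a `ModelInferenceRequest` and calling `runModel`. The model ID and tensor name are hypothetical placeholders, and `inference` is assumed to be an `IValenceInference` bound to the precompile address:

```solidity
// Assumes: IValenceInference inference = IValenceInference(0x00000000000000000000000000000000000000F4);
function predict(int128 rawValue) internal returns (ModelOutput memory) {
    // Encode 1.50 as Number(150, 2): value 150 with 2 decimals.
    Number[] memory values = new Number[](1);
    values[0] = Number(rawValue, 2);

    NumberTensor[] memory numbers = new NumberTensor[](1);
    numbers[0] = NumberTensor("input", values); // tensor name is model-specific

    // This model takes no string inputs, so the string tensor array stays empty.
    ModelInferenceRequest memory request = ModelInferenceRequest({
        modelId: "some-model-id", // hypothetical
        input: ModelInput(numbers, new StringTensor[](0))
    });

    return inference.runModel(ModelInferenceMode.VANILLA, request);
}
```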

### Simulation Results

Both `LlmResponse` and `ModelOutput` have a flag called `is_simulation_result` that indicates whether the returned result is "real". As explained in [parallelized-inference-pre-execution-pipe](https://docs.vannalabs.ai/vanna-network/architecture/parallelized-inference-pre-execution-pipe "mention"), Vanna transactions are executed in two phases. In the first phase, the transaction is executed in simulation mode to gather and execute all inference requests in the background. Once the results are ready, the transaction is re-executed with the actual inference results. `is_simulation_result` indicates whether the transaction is currently being executed in simulation mode. When it is set to `false`, the returned value comes from the model; when it is set to `true`, the value is empty and developers should explicitly handle this scenario in their code.

{% hint style="info" %}
Transaction simulation results are never committed to the blockchain
{% endhint %}

For example:

```solidity
function calculateFeeFromModelResult(ModelOutput memory result) internal pure returns (int128) {
    if (result.is_simulation_result) {
      // when in simulation, return some sensible default value
      return 1;
    }
    
    return result.numbers[0].values[0].value * 2;
}
```
