Inference API
Describes the interfaces Valence exposes for using inference in smart contracts.
Inference Precompile
Valence Inference is provided through a standard interface that any smart contract can use. The inference itself is implemented by a custom precompile, exposed through the IValenceInference interface.
Valence exposes various types of inference, ranging from LLMs to classical ML models. It also supports various security techniques, such as ZKML and TEE inference. Developers can choose the methods best suited to their use case and requirements.
The two high-level functions exposed by the inference precompile are runModel and runLlm. runLlm is designed specifically for running large language models, whereas runModel is a generic method that can be used to execute any type of AI or ML model.
interface IValenceInference {
    enum ModelInferenceMode { VANILLA, ZK }
    enum LlmInferenceMode { VANILLA, TEE }

    function runModel(
        ModelInferenceMode mode,
        ModelInferenceRequest memory request
    ) external returns (ModelOutput memory);

    function runLlm(
        LlmInferenceMode mode,
        LlmInferenceRequest memory request
    ) external returns (LlmResponse memory);
}
LLM Inference Requests
The request and response for LLMs (runLlm) are defined as follows. The main input is a prompt string, and the answer is returned as a string as well. In addition, there are some common LLM parameters that can be tuned.
The list of LLMs you can use can be found in Supported LLMs.
struct LlmInferenceRequest {
    string model;            // ID of the LLM to use
    string prompt;           // LLM prompt
    uint32 max_tokens;       // max tokens to generate
    string[] stop_sequence;  // stop sequences for model response
    uint32 temperature;      // model temperature (between 0 and 100)
}
struct LlmResponse {
    string answer;              // answer generated by the LLM
    bool is_simulation_result;  // indicates whether the result is real
}
To read more about is_simulation_result, please see Simulation Results.
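The sketch below shows one way a contract might call runLlm, assuming the structs above are available at file level (e.g., via an import) and that the precompile address is supplied at deployment. The model ID, temperature, and other parameter values are illustrative assumptions, not recommendations.
contract LlmExample {
    // Inference precompile on the target network (assumption: its address is
    // passed in at deployment rather than hard-coded here).
    IValenceInference public immutable inference;

    constructor(IValenceInference _inference) {
        inference = _inference;
    }

    function ask(string memory prompt) external returns (string memory) {
        LlmInferenceRequest memory request = LlmInferenceRequest({
            model: "example-llm",            // hypothetical model ID; see Supported LLMs
            prompt: prompt,
            max_tokens: 256,
            stop_sequence: new string[](0),  // no stop sequences
            temperature: 70                  // between 0 and 100
        });

        LlmResponse memory response = inference.runLlm(
            IValenceInference.LlmInferenceMode.VANILLA,
            request
        );

        if (response.is_simulation_result) {
            // transaction is in simulation mode; return a placeholder
            return "";
        }
        return response.answer;
    }
}
Handling is_simulation_result explicitly, as above, is needed because the same code also runs during the simulation phase described under Simulation Results.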
Generic Inference Requests
For all other ML models, the input and response can take various shapes and forms (such as numbers and strings), so we built a flexible framework that lets you define any type of input that your ONNX model expects. The input is made up of an array of number tensors and an array of string tensors. You only need to set the tensors that your model expects (e.g., you can leave the string tensors empty if your model only takes numbers).
/**
 * Can be used to represent a floating-point number or integer.
 *
 * e.g. 10 can be represented as Number(10, 0),
 * and 1.5 can be represented as Number(15, 1)
 */
struct Number {
    int128 value;
    int128 decimals;
}

/**
 * Represents a model tensor input filled with numbers.
 */
struct NumberTensor {
    string name;
    Number[] values;
}

/**
 * Represents a model tensor input filled with strings.
 */
struct StringTensor {
    string name;
    string[] values;
}

/**
 * Model input, made up of various tensors of numbers and/or strings.
 */
struct ModelInput {
    NumberTensor[] numbers;
    StringTensor[] strings;
}

/**
 * Model inference request.
 */
struct ModelInferenceRequest {
    string modelId;
    ModelInput input;
}

/**
 * Model output, made up of tensors of either numbers or strings, ordered
 * as defined by the model.
 *
 * For example, if a model's output is: [number_tensor_1, string_tensor_1, number_tensor_2],
 * you could access them like this:
 *
 * number_tensor_1 = output.numbers[0];
 * string_tensor_1 = output.strings[0];
 * number_tensor_2 = output.numbers[1];
 */
struct ModelOutput {
    NumberTensor[] numbers;
    StringTensor[] strings;
    bool is_simulation_result;  // indicates whether the result is real
}
To read more about is_simulation_result, please see Simulation Results.
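As a rough sketch of how these pieces fit together, the function below builds a ModelInput with a single number tensor and passes it to runModel. The model ID ("example-model"), the tensor name ("input"), and the assumption that the model returns a single number tensor are all illustrative, not part of the API.
function runExampleModel(IValenceInference inference) internal returns (int128) {
    // Build a single number tensor holding [1.5, 10]
    Number[] memory values = new Number[](2);
    values[0] = Number({value: 15, decimals: 1});  // 1.5
    values[1] = Number({value: 10, decimals: 0});  // 10

    NumberTensor[] memory numbers = new NumberTensor[](1);
    numbers[0] = NumberTensor({name: "input", values: values});  // hypothetical tensor name

    ModelInput memory input = ModelInput({
        numbers: numbers,
        strings: new StringTensor[](0)  // this example model takes no string inputs
    });

    ModelOutput memory output = inference.runModel(
        IValenceInference.ModelInferenceMode.VANILLA,
        ModelInferenceRequest({modelId: "example-model", input: input})
    );

    if (output.is_simulation_result) {
        // transaction is in simulation mode; return a sensible default
        return 0;
    }
    return output.numbers[0].values[0].value;
}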
Simulation Results
Both LlmResponse and ModelOutput have a flag called is_simulation_result that indicates whether the result returned is "real" or not. As explained in Parallelized Inference Pre-Execution (PIPE), Vanna transactions are executed in two phases. In the first phase, the transaction is executed in simulation mode to gather and execute all inference requests in the background. Once the results are ready, the transaction is re-executed with the actual inference results. is_simulation_result indicates whether the transaction is currently being executed in simulation mode. When it is set to false, the value returned comes from the model; when it is set to true, the value is empty and developers should explicitly handle this scenario in their code.
For example:
function calculateFeeFromModelResult(ModelOutput memory result) internal pure returns (int128) {
    if (result.is_simulation_result) {
        // when in simulation, return some sensible default value
        return 1;
    }
    return result.numbers[0].values[0].value * 2;
}