If you want to run inference on visual machine learning models in C++, start by building OpenCV, which provides a wealth of useful functions for computer vision. If you haven't done that yet, refer to my previous article on how to do it.
You might be tempted to use the DNN module in OpenCV, and it does work in many cases. But it isn't updated as vigorously as directly supported libraries, so it may lack support for newer developments or less common layers and operations.
Suppose you find that your model cannot be run with the OpenCV DNN module. In that case, you should try running it with a directly supported library, and that is why I wrote this story/tutorial. I spent hours getting started with the ONNX Runtime library, so I hope this speeds up your development if you are doing it for the first time.
Setting up
Unlike building OpenCV, we can get a pre-built ONNX Runtime with GPU support from NuGet. If you're using Visual Studio, it's under “Tools > NuGet Package Manager > Manage NuGet packages for solution”; browse for “Microsoft.ML.OnnxRuntime.Gpu”. Always make sure your CUDA and cuDNN versions match the version you install. While this was the simpler way when I first wrote this article, it seemed harder to use when I revisited it, so now I recommend the method below.
Another way is to download a pre-built version from their GitHub page. Note that currently (2/25/2024) there are two CUDA versions available, one for CUDA 11 and one for CUDA 12, so choose carefully. Here are some pictures on how to link the library if you are unfamiliar with it.
Furthermore, you will have to either copy all the DLLs into your working folder or add their directories to the PATH environment variable.
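Before going further, you can sanity-check that the DLLs are found at load time by printing the runtime version. This is a minimal sketch; OrtGetApiBase() comes from the underlying C API, which the C++ header wraps:
#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
    // If this prints a version string, the onnxruntime DLL was found and loaded
    std::cout << "ONNX Runtime version: " << OrtGetApiBase()->GetVersionString() << std::endl;
    return 0;
}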
After setting up, you can try listing the available execution providers:
#include <onnxruntime_cxx_api.h>
#include <iostream>

int main() {
    // Print every execution provider this build supports
    auto providers = Ort::GetAvailableProviders();
    for (const auto& provider : providers) {
        std::cout << provider << std::endl;
    }
    return 0;
}
If you cannot run this or are experiencing error 0xc000007b, check whether there is an “onnxruntime.dll” in your System32 folder. If there is, delete it.
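If “CUDAExecutionProvider” shows up in that list, GPU inference is available in your build. As a small sketch, you can wrap the check in a helper and use it to drive the _UseCuda flag that appears in the next section (the provider name string is the one returned by Ort::GetAvailableProviders()):
// Returns true if this ONNX Runtime build ships the CUDA execution provider
bool cudaAvailable() {
    for (const auto& provider : Ort::GetAvailableProviders())
        if (provider == "CUDAExecutionProvider") return true;
    return false;
}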
Set up session options
To run a .onnx file, we start by filling in the settings.
Ort::Env env = Ort::Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "Default");
Ort::SessionOptions sessionOptions;
OrtCUDAProviderOptions cuda_options;

sessionOptions.SetInterOpNumThreads(1);
sessionOptions.SetIntraOpNumThreads(1);
// Optimization will take time and memory during startup, so it is disabled here
sessionOptions.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_DISABLE_ALL);

// CUDA options, if used. _UseCuda is our own flag (see the helper above)
if (_UseCuda)
{
    cuda_options.device_id = 0; // GPU ID
    cuda_options.cudnn_conv_algo_search = OrtCudnnConvAlgoSearchExhaustive; // cuDNN convolution algorithm search strategy
    cuda_options.arena_extend_strategy = 0;
    // May cause a data race in some conditions
    cuda_options.do_copy_in_default_stream = 0;
    sessionOptions.AppendExecutionProvider_CUDA(cuda_options); // Add CUDA options to the session options
}

// Declare the session outside the try block so it can be used afterwards
Ort::Session session{ nullptr };
try {
    // Model path is const wchar_t* on Windows
    session = Ort::Session(env, ModelPath, sessionOptions);
}
catch (const Ort::Exception& oe) {
    std::cout << "ONNX exception caught: " << oe.what() << ". Code: " << oe.GetOrtErrorCode() << ".\n";
    return -1;
}

Ort::MemoryInfo memory_info{ nullptr }; // Used to allocate memory for the input
try {
    memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
}
catch (const Ort::Exception& oe) {
    std::cout << "ONNX exception caught: " << oe.what() << ". Code: " << oe.GetOrtErrorCode() << ".\n";
    return -1;
}
Input and output
To run a model in ONNX Runtime, you must know the input and output of your model. You can use Netron to inspect them.
Alternatively, we can use ONNX Runtime directly to find the names and shapes, but it isn't effortless.
// Demonstration of getting input node info by code
Ort::AllocatorWithDefaultOptions allocator;
size_t num_input_nodes = session.GetInputCount();
std::vector<const char*>* input_node_names = new std::vector<const char*>; // Input node names
std::vector<std::vector<int64_t>> input_node_dims; // Input node dimensions
ONNXTensorElementDataType type; // Used to print input info

for (size_t i = 0; i < num_input_nodes; i++) {
    // Copy the name out before the AllocatedStringPtr frees it
    Ort::AllocatedStringPtr name_ptr = session.GetInputNameAllocated(i, allocator);
    size_t name_len = strlen(name_ptr.get()) + 1;
    char* tempstring = new char[name_len];
    strcpy_s(tempstring, name_len, name_ptr.get());
    input_node_names->push_back(tempstring);

    Ort::TypeInfo type_info = session.GetInputTypeInfo(i);
    auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
    type = tensor_info.GetElementType();
    input_node_dims.push_back(tensor_info.GetShape());

    // Print input shapes/dims
    if (_Debug) {
        printf("Input %zu : name=%s\n", i, input_node_names->back());
        printf("Input %zu : num_dims=%zu\n", i, input_node_dims.back().size());
        for (size_t j = 0; j < input_node_dims.back().size(); j++)
            printf("Input %zu : dim %zu=%jd\n", i, j, input_node_dims.back()[j]);
        printf("Input %zu : type=%d\n", i, type);
    }
}

// Set the output node name explicitly
std::vector<const char*> output_node_names;
output_node_names.push_back("output");
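If you would rather not hard-code the output name either, you can query it the same way as the inputs. A short sketch mirroring the loop above:
// Query the output node names instead of hard-coding them
size_t num_output_nodes = session.GetOutputCount();
for (size_t i = 0; i < num_output_nodes; i++) {
    Ort::AllocatedStringPtr out_name = session.GetOutputNameAllocated(i, allocator);
    printf("Output %zu : name=%s\n", i, out_name.get());
}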
Preprocess the input
Now, we reshape the input image to match the model's input. In my case, it is 1x3x640x640.
Note that Ort::Value objects do not have to be deleted; they manage their own memory.
std::vector<Ort::Value> inputTensor; // ONNX Runtime input
// This will reshape the input into 1x3x640x640
cv::Mat blob = cv::dnn::blobFromImage(image, 1 / 255.0, cv::Size(640, 640), cv::Scalar(0, 0, 0), false, false);
size_t input_tensor_size = blob.total();
try {
    inputTensor.emplace_back(Ort::Value::CreateTensor<float>(memory_info, (float*)blob.data, input_tensor_size, input_node_dims[0].data(), input_node_dims[0].size()));
}
catch (const Ort::Exception& oe) {
    std::cout << "ONNX exception caught: " << oe.what() << ". Code: " << oe.GetOrtErrorCode() << ".\n";
    return -1;
}
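If you are curious what blobFromImage does, here is a rough manual equivalent, assuming a 3-channel 8-bit image: resize to 640x640, scale to [0, 1], and repack the pixels from HWC to CHW order.
// Manual equivalent of the blobFromImage call above (illustration only)
cv::Mat resized;
cv::resize(image, resized, cv::Size(640, 640));
std::vector<float> chw(3 * 640 * 640);
for (int c = 0; c < 3; c++)
    for (int y = 0; y < 640; y++)
        for (int x = 0; x < 640; x++)
            chw[(c * 640 + y) * 640 + x] = resized.at<cv::Vec3b>(y, x)[c] / 255.0f;
// chw.data() could then be passed to Ort::Value::CreateTensor<float> the same way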
Run inference
Finally, we can run the model. Remember that the input and output depend on your model, so the same method of making sense of the result doesn't always work.
std::vector<Ort::Value> outputTensor;
try {
    outputTensor = session.Run(Ort::RunOptions{ nullptr }, input_node_names->data(), inputTensor.data(), inputTensor.size(), output_node_names.data(), 1);
}
catch (const Ort::Exception& oe) {
    std::cout << "ONNX exception caught: " << oe.what() << ". Code: " << oe.GetOrtErrorCode() << ".\n";
    return result;
}
// Pushing the results. Each detection occupies 7 floats: batch_id, x0, y0, x1, y1, class_id, score
if (outputTensor.size() > 0) {
    float* arr = outputTensor.front().GetTensorMutableData<float>();
    for (size_t i = 0; i < outputTensor.front().GetTensorTypeAndShapeInfo().GetElementCount(); i += 7) {
        Detected detected(
            static_cast<int>(arr[i + 5]), // Class type
            arr[i + 6], // Confidence
            arr[i + 1] / (float)ModelInputImageSize.width * original.cols, // X
            arr[i + 2] / (float)ModelInputImageSize.height * original.rows, // Y
            (arr[i + 3] - arr[i + 1]) / (float)ModelInputImageSize.width * (float)original.cols, // W
            (arr[i + 4] - arr[i + 2]) / (float)ModelInputImageSize.height * (float)original.rows // H
        );
        result.push_back(detected);
    }
}
inputTensor.clear();
// Always remember to delete the input name strings we allocated earlier
for (const char* name : *input_node_names)
    delete[] name;
delete input_node_names;
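To eyeball the results, you can draw them back onto the original image. This sketch is hypothetical: it assumes Detected exposes classType, confidence, x, y, w, and h as public members, matching the constructor arguments above.
// Draw each detection onto the original image (assumed Detected fields)
for (const Detected& d : result) {
    cv::Rect box((int)d.x, (int)d.y, (int)d.w, (int)d.h);
    cv::rectangle(original, box, cv::Scalar(0, 255, 0), 2);
    cv::putText(original, std::to_string(d.classType), cv::Point(box.x, box.y - 5),
        cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
}
cv::imshow("Detections", original);
cv::waitKey(0);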
As I am running YOLOv7 from https://github.com/WongKinYiu/yolov7, my result looks like this. If you experience an error stating that the ONNX Runtime library failed to load, check your CUDA installation.
Thank you for reading this hastily written article! Don't hesitate to ask me anything about it!
If you want the full code, go to my GitHub.