Since the last few stories, I have been trying to find a way to run inference with trained machine-learning models on edge devices. I am glad to say that I have reached a milestone: I can run the DTLN model with pure C code and only oneMKL as a dependency. Sadly, the code was written too hastily. The variable names are not coherent, debug code is scattered throughout, I haven't finished wrapping platform-specific code in "#ifdef"s, and I still need to implement fallback functions for every operation, so my programmer's pride won't let me make it fully public "yet". On top of that, the semester of my graduate program is starting, so I don't have much time, but I am eager to share what I have learned. I do have a small working example you can try out, and the link is here: https://github.com/JINSCOTT/DTLN-in-C.
Project environment and dependencies
While the end goal is to perform these computations on an edge device, I am still in the concept-development phase, so I program and deploy on my regular work computer. As stated previously, I depend on oneMKL, which means I am programming on an Intel CPU, and, to be frank, I am still using Windows as my OS. The reason for using oneMKL is that it provides a GEMM implementation and the discrete Fourier transform (DFT), both of which I need to run the DTLN model. If I find the time, I would like to swap in FFTW3 and OpenBLAS in place of oneMKL on Linux-based platforms.
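To give an idea of what the fallback functions mentioned above could look like when oneMKL is unavailable, here is a minimal sketch of a naive single-precision GEMM (row-major, no transposition). The name `fallback_sgemm` is purely illustrative, not an identifier from the repository:

```c
#include <stddef.h>

/* Naive row-major SGEMM fallback: C = alpha * A * B + beta * C.
 * A is M x K, B is K x N, C is M x N. This is a hypothetical sketch
 * of the kind of plain-C function that could stand in for a
 * BLAS-provided GEMM on platforms without oneMKL. */
static void fallback_sgemm(size_t M, size_t N, size_t K,
                           float alpha, const float *A, const float *B,
                           float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

A tuned BLAS will of course be far faster, but a reference version like this is useful both as a portability fallback and for checking optimized paths against.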
The Library: from top to bottom
To use the library, copy all the source files and the generated header into your project as one of its components.
From top to bottom, the hierarchy is "model", "nodes", and "ops". Each node consists of one or more "tensor"s. There are also other files containing data structures and utility functions.
The generated header file contains a function that creates a "model": it pushes the "nodes", i.e. the layer types and their associated attributes, onto a linked list. The "inference_model" function then traverses the linked list and runs the nodes one by one. The nodes are run through "function_ops" and "ops": the former take tensors as input, the latter take raw pointers.
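To make that structure concrete, here is a minimal sketch of how such a linked list of nodes could be built and traversed. All names here (`node_t`, `model_push`, `inference_model_sketch`, the op enum) are illustrative stand-ins, not the library's actual identifiers:

```c
#include <stdlib.h>

/* Illustrative sketch: each node stores an op type, opaque attributes,
 * and a pointer to the next node in the model's list. */
typedef enum { OP_ADD, OP_MUL, OP_LSTM /* ... */ } op_type_t;

typedef struct node {
    op_type_t type;     /* which op this node runs */
    void *attributes;   /* op-specific attributes (weights, axes, ...) */
    struct node *next;  /* next node in the linked list */
} node_t;

typedef struct {
    node_t *head;       /* first node pushed by the generated create function */
} model_t;

/* Push a node at the tail so traversal runs ops in insertion order. */
static void model_push(model_t *m, op_type_t type, void *attrs)
{
    node_t *n = malloc(sizeof *n);
    n->type = type;
    n->attributes = attrs;
    n->next = NULL;
    node_t **p = &m->head;
    while (*p)
        p = &(*p)->next;
    *p = n;
}

/* Walk the list node by node; returns the number of nodes visited.
 * The real inference_model would dispatch to function_ops / ops here. */
static int inference_model_sketch(const model_t *m)
{
    int count = 0;
    for (const node_t *n = m->head; n != NULL; n = n->next)
        ++count;
    return count;
}
```

A linked list is a natural fit here because a generated model is built once, in order, and then only ever traversed front to back.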
The GitHub repo linked above shows an example of how the library is envisioned to be deployed.
Currently implemented data types
Currently implemented nodes
Here is the list of currently implemented nodes. They are meant to match the ONNX definitions; however, many of them are not well tested and are only partially implemented. I may improve them if I have time.
- ADD
- SUB
- DIV
- MUL
- SQRT
- TANH
- SIGMOID
- RELU
- ABS
- ACOS
- ACOSH
- ATAN
- ATANH
- ASIN
- ASINH
- TRANSPOSE
- SLICE
- SQUEEZE
- LSTM
- CONCAT
- MATMUL
- UNSQUEEZE
- CONV (one- and two-dimensional only)
- REDUCEMEAN
- SPLIT
- PAD
- GEMM
- RESHAPE
- CONSTANT
- CLIP
- ARGMAX
- ARGMIN
- AVERAGE POOL
Final words
From now on, I will try to enhance every part of the library and write a better description of each component. Meanwhile, I am learning HPC and building a better understanding of operating systems, so hopefully I will be much more resourceful at optimizing code in the near future.
If you are interested, please give me advice or tell me what you want me to clarify. Go check the code if you think it might be useful for your use case!