Converting machine learning models to ONNX

Machine learning frameworks are a pain. They are big and complicated, and if you want GPU support you are in for a world of pain. I believe this is one of the big reasons we are seeing so little AI on the desktop: $300+ applications or outrageous subscription-based web services are way too common.

Another issue is that it is often difficult to see exactly how a model works, as its structure is embedded in (usually) custom Python code, and if it uses a framework you are not too familiar with it gets very hard to follow what is going on.

So I want to try to convert some projects to ONNX, an interchange format for machine learning models which stores all the operations needed and the trained parameters in a graph. You can use tools such as Netron to visualize the files and see how they work.
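You can also poke at that graph directly from Python with the onnx package; a minimal sketch (the file name is just a placeholder):

```python
import onnx

model = onnx.load("some_model.onnx")

# Every operation is a node in the graph, connected by named tensors.
for node in model.graph.node[:5]:
    print(node.op_type, list(node.input), "->", list(node.output))

# The trained parameters are stored alongside the nodes as initializers.
print(len(model.graph.initializer), "weight tensors")
```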

So I have two goals: to get a better understanding of some ML-based projects, and to create documented ONNX files which are easier for other people (and myself) to use. I will start with two projects: DeepCreamPy (now defunct) for inpainting and the old Waifu2x for super resolution. Maybe ESRGAN later, as I know there is a small community making game upscalers with it. Suggestions are welcome as well.

Inpainting

DeepCreamPy, while intended for more risqué applications, is I believe just trained to do general inpainting of anime images. My main question here is how this CNN handles arbitrary masked areas.

The project is based on Keras, which is a high-level interface to TensorFlow, and there is a converter available: GitHub - onnx/keras-onnx: Convert tf.keras/Keras models to ONNX
Like many of these converters, you need to embed a function call into the Python code which traces the in-memory model. So you need to be able to run the original code in order for this to work.
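With keras-onnx, the bit you splice in looks roughly like this (the tiny stand-in model below is just for illustration; in practice it is whatever model the original code has built and loaded its trained weights into):

```python
import keras2onnx
from tensorflow import keras

# Stand-in model -- in the real case this is the model object the original
# project constructs and loads its trained weights into.
model = keras.Sequential([
    keras.layers.Conv2D(8, 3, padding="same", input_shape=(256, 256, 3)),
])

# Trace the in-memory Keras model into an ONNX graph and write it to disk.
onnx_model = keras2onnx.convert_keras(model, model.name)
keras2onnx.save_model(onnx_model, "model.onnx")
```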
And despite Python having a package manager, so dependencies should just work, actually getting this running was much trickier than I expected. Version compatibility isn't great; you easily end up with incompatible packages and unintelligible error messages.

After much work I finally got it working, and ended up with a 256 MB model I could inspect with Netron:


A rather large and complicated model… and some of the complexity seems to come from the converter not being quite optimal either. Here is a closeup:

Simplifying the model

I tried some existing tools like GitHub - daquexian/onnx-simplifier: Simplify your onnx model, but this did not change anything. So instead I found this project, which tries to make it easier to manually modify ONNX files: GitHub - scailable/sclblonnx: Scailable ONNX python tools
It is not great, and I still have some warnings in the resulting ONNX files that I haven't resolved yet, but it works.

First things first: this model has three inputs, the first being the image, the second being the mask, and the third being… the mask again? The last input isn't even connected to anything, but the original code did require three inputs. So removing that is a good start.
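Either sclblonnx or the plain onnx package works for this; with the plain onnx API it comes down to something like the following (the input name is made up, check the real one in Netron):

```python
import onnx

model = onnx.load("inpainting.onnx")
graph = model.graph

# Drop the unconnected third input by name ("mask_2" is a made-up name).
for inp in [i for i in graph.input if i.name == "mask_2"]:
    graph.input.remove(inp)

onnx.checker.check_model(model)
onnx.save(model, "inpainting_two_inputs.onnx")
```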
The model also has these ReduceMean->Split->Concat->Split->Concat sections I did not quite understand at first. Resaving the ONNX file with fixed input sizes makes it possible to see the tensor size after each operation.
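Pinning the sizes is only a few lines with the onnx package (the input order and the 1 x 256 x 256 x 3 layout here are assumptions; the mask input needs the same treatment):

```python
import onnx
from onnx import shape_inference

model = onnx.load("inpainting.onnx")

# Fix every dynamic dimension of the first input to a concrete value,
# here Batch x Height x Width x Channels = 1 x 256 x 256 x 3.
dims = model.graph.input[0].type.tensor_type.shape.dim
for dim, value in zip(dims, (1, 256, 256, 3)):
    dim.dim_value = value

# With fixed inputs, shape inference can annotate every intermediate tensor.
model = shape_inference.infer_shapes(model)
onnx.save(model, "inpainting_fixed_size.onnx")
```

With the sizes pinned, Netron shows the shape after every operation, and that made these sections much clearer: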


It takes the mean value of each of the (here 64) channels and then repeats that value to fill the entire image plane. This is one of the central operations in the model: taking the mask, applying it to the internal convolution results, and then rescaling them based on the mean.
The silly thing is that expanding the 1x1x1x64 array just to divide a 1x256x256x64 array by it is completely unnecessary, as the Div operation handles the broadcasting just fine.
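A quick numpy check of that claim (the ONNX Div operator follows the same broadcasting rules as numpy):

```python
import numpy as np

x = np.random.rand(1, 256, 256, 64).astype(np.float32)
mean = x.mean(axis=(1, 2), keepdims=True)      # 1x1x1x64, like the ReduceMean output

# Explicitly tiled divisor, as the exported graph builds it with Split/Concat ...
tiled = np.broadcast_to(mean, x.shape)

# ... versus relying on broadcasting. The results are identical.
assert np.array_equal(x / tiled, x / mean)
```

So I can just remove those nodes without changing anything: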

Next are all those Transpose operations before and after each Convolution. They take the Batch x Height x Width x Channels array and change it into a Batch x Channels x Height x Width array. The convolution then works on the Height x Width part, and another Transpose puts it back.
I wanted to try to fold the Transpose into the Convolution, but that might actually not be possible. So instead I tried to move the Transpose operations through the network so that back-to-back Transpose->Transpose pairs could simply be dropped.
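The cancelling step itself can be scripted; here is a rough sketch with the onnx package (it assumes the first Transpose of each pair feeds nothing else and that every Transpose has an explicit perm attribute):

```python
import onnx

model = onnx.load("inpainting.onnx")
nodes = model.graph.node

def perm_of(node):
    # Read the explicit 'perm' attribute of a Transpose node.
    return [list(a.ints) for a in node.attribute if a.name == "perm"][0]

# Map every tensor name to the node that produces it.
producer = {out: n for n in nodes for out in n.output}

dead = []
for node in nodes:
    if node.op_type != "Transpose":
        continue
    prev = producer.get(node.input[0])
    if prev is None or prev.op_type != "Transpose":
        continue
    p1, p2 = perm_of(prev), perm_of(node)
    # The pair cancels if composing the two permutations gives the identity.
    if [p1[i] for i in p2] == list(range(len(p1))):
        # Rewire everything reading the second Transpose to the tensor
        # feeding the first one, skipping both.
        for consumer in nodes:
            for i, name in enumerate(consumer.input):
                if name == node.output[0]:
                    consumer.input[i] = prev.input[0]
        dead.extend([prev, node])

for node in dead:
    model.graph.node.remove(node)

onnx.save(model, "inpainting_fewer_transposes.onnx")
```

A lot of manual work later, the only Transposes left were the ones at the input and output, so I ended up changing the input and output dimensions to match that dimension order, removing all of them: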

Another minor optimization I could do was to move some of the mean multiplications in front of the upscale operations; this works because the upscale is linear, so a per-channel multiplication commutes with it.
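A small numpy sanity check of that, using a nearest-neighbour 2x upscale as a stand-in for the Resize in the graph:

```python
import numpy as np

x = np.random.rand(1, 8, 8, 64).astype(np.float32)
scale = np.random.rand(1, 1, 1, 64).astype(np.float32)   # per-channel factor

def upscale2x(a):
    # Nearest-neighbour 2x upscale along the height and width axes.
    return a.repeat(2, axis=1).repeat(2, axis=2)

# Scaling before or after the upscale gives the same result.
assert np.allclose(upscale2x(x * scale), upscale2x(x) * scale)
```

In the graph it looks like this: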

The last optimization required a bit more insight into what is happening. This model is based on a U-net architecture. The basic idea is to take the input image and downscale it several times, doing one or more convolutions at each resolution. Then you upscale again, combine the result with the features from the same resolution on the way down, and do another convolution.
For this model, there is a matching set of downscales and upscales for both the image and the mask, and the mask is then used in the ReduceMean/Mul/Div sections to mask and rescale the results at each resolution step.
But if you look at the weights of the convolutions on the mask, they are all 1.0. They are in fact just used to shrink the masked regions in the mask to match the area of effect of the main convolution on the image. And you have these large 512x1024x3x3 convolutions filled with 1.0s. Since they are all hardcoded like that, I could replace them with 1x1x3x3 and 1x2x3x3 convolutions instead without affecting the result. And this halved the file size to ~128 MB!
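A good way to make sure an edit like this really is a no-op is to run the file from before and after the change on the same random inputs; roughly like this (input names, shapes and file names are all placeholders):

```python
import numpy as np
import onnxruntime as ort

# Random image plus a mask with a hole in it.
image = np.random.rand(1, 3, 256, 256).astype(np.float32)
mask = np.ones_like(image)
mask[:, :, 96:160, 96:160] = 0.0
feeds = {"image": image, "mask": mask}

before = ort.InferenceSession("inpainting_before.onnx").run(None, feeds)[0]
after = ort.InferenceSession("inpainting_after.onnx").run(None, feeds)[0]

# Should be zero (or within float rounding) if the edit did not change anything.
print(np.abs(before - after).max())
```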

Results

It did require quite a bit of manual work, but working on the network directly like this was very interesting. I have a much better understanding of the network structure and operation than if I had tried to read the code, and it allowed me to make simplifications I would never have noticed otherwise.
Next I would like to try adding extra outputs to visualise intermediate results, to see if I can get some understanding of its internal representation.

I will share the code for running the models later (perhaps Python, perhaps C++, maybe both?), but for now you can take a look at the ONNX models and try them out for yourself here:

https://drive.google.com/drive/folders/1Rc_VxLYXKapwpde_7haCI6j4_wiooitE?usp=sharing

I will touch them up a bit at some point though. There are also some of the Waifu2x models for 2x upscaling; there was nothing interesting to say about those, as they are simple and converting them was easy as well using this project (without having to run the original code): GitHub - htshinichi/caffe-onnx: caffe model convert to onnx model
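In the meantime, a minimal Python example of running the inpainting model with onnxruntime would look something like this; the input names, the Batch x Channels x Height x Width layout and the 0-1 value range are all assumptions, so check them against the model in Netron first:

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

# Load an image and bring it to 1 x 3 x H x W, values in 0..1 (assumed layout).
image = np.array(Image.open("input.png").convert("RGB"), dtype=np.float32) / 255.0
image = image.transpose(2, 0, 1)[np.newaxis]

# Mask with the same shape (assumed convention: 1.0 = keep, 0.0 = region to inpaint).
mask = np.ones_like(image)
mask[:, :, 100:164, 100:164] = 0.0

sess = ort.InferenceSession("inpainting.onnx")
result = sess.run(None, {"image": image, "mask": mask})[0]

# Back to H x W x 3 uint8 and save.
out = np.clip(result[0].transpose(1, 2, 0) * 255.0, 0, 255).astype(np.uint8)
Image.fromarray(out).save("output.png")
```

If onnxruntime complains about the input names or shapes, Netron will show you what the model actually expects.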
