Nowadays, many Deep Learning (DL) tools are available on the market, so different Data Science teams use different DL libraries and frameworks to train and execute Neural Networks (NNs). Unfortunately, sharing models is quite challenging since each DL tool has its own format for storing a Neural Network.
This makes sense. We are still in the infancy of deep learning, so there are no set standards in terms of formats. Different vendors and organizations are competing to set the standard.
For example, we at Hasty are very much a PyTorch company. Still, for the past two years, we have worked on many different projects and had to work with various frameworks and formats for our customers. We found that having a solid understanding of NN storing formats and DL frameworks is an advantage. It can help you make the right architecture decisions when developing your next AI solution.
So, if you are new to the industry or want to learn more about DL frameworks and their NN storing formats - this is the post for you.
Deep Learning model formats
The DS community rarely talks about the difficulties of sharing DL projects, so if you are new to the space, it might be good to get an overview of why this is a potential issue. Let’s look at the variety of DL tools you might encounter when working on an ML project.
- TensorFlow/Keras is an open-source software library for ML and AI with the primary focus on the training and inference of deep neural networks (DL framework);
- PyTorch is also a Deep Learning framework;
- MXNet is yet another DL framework;
- Theano is a Python library for fast numerical computations that can be executed on the CPU or GPU;
- Caffe is a Deep Learning framework;
- DeepLearning4j is a software library with comprehensive support of DL algorithms;
- Apache TVM is an open-source Machine Learning compiler framework for CPUs, GPUs, and ML accelerators;
- MATLAB is a desktop environment and programming language used to analyze and visualize the data, develop algorithms, and create models;
- PaddlePaddle is a deep learning framework that might be unknown to most of us in the Western world but is huge in China.
Each of the tools presented above has its own serialization format (or even a couple of them). Therefore, it is difficult to provide an exact number of existing formats because the DL tools’ market constantly expands, bringing new storage approaches. Most organizations pick one or two of these tools, but sometimes you are not that lucky. For us, many customer requests have involved deploying to specific hardware. In these cases, you are often limited to the formats the hardware supports, which means that we had to learn and live with many formats from the tools outlined above.
The second question that might pop into your head is why there are so many formats. To answer that, let’s discuss socket outlets. You might know there are many styles of plugs and wall outlets across the globe. According to Google, there are no less than 15 different styles. What stands behind such diversity? Many countries preferred to develop a plug of their own instead of adopting the US standard. Partly, we can attribute that to national and regional pride, but there were also plenty of improvements that addressed the existing disadvantages of the US standard.
Drawing a parallel between plugs and DL formats, there are many NN storing formats because every developer wants their framework to be the best, leading to many competing formats. Today, there is no unified standard for an ML format, so Data Science teams can get creative when developing their approach and even turn it into a competitive advantage.
Still, as regularly happens, the availability of so many different formats does not mean every single one of them is widely used. In most AI teams, you will likely use one or both of the following:
- PyTorch + accompanying formats;
- TensorFlow/Keras + accompanying formats;
If you have ever worked on a Deep Learning project, you have probably heard of TensorFlow - the classic open-source platform for ML. The Data Science community usually associates TensorFlow with Keras since Keras is a high-level API that runs on top of TensorFlow. So, when talking about ML formats, we will treat TensorFlow and Keras as a unity.
In general, every DL model consists of multiple components:
- The model’s architecture, which describes its layers and the connections between them;
- Weights (parameter tensors) - the parameters a model learned while training;
- An optimizer and its state;
- A set of losses and metrics.
As you can see, there is a variety of information about a model you might want to store: all of these components at once or some individual pieces. The Keras API provides many storing capabilities, such as:
- Saving all the model’s information into a single archive using either TensorFlow SavedModel or Keras H5 format. This approach is the most popular and is considered the standard practice;
- Saving the model’s architecture only in the form of a JSON file;
- Saving the current state of the model while training. Usually, you can only save the weights’ values in such a case.
SavedModel format is the most comprehensive model storing format you can get when using TensorFlow as your DL framework.
With a single line of code, you get a folder with model architecture, weights, and training configuration such as metrics, losses, and optimizer. Moreover, SavedModel also stores the traced TensorFlow subgraphs of the call functions, allowing Keras to restore built-in layers and custom objects.
Still, you can save the entire model in a single file using the older Keras H5 format. Such a format is a lightweight alternative to SavedModel. The file contains the model’s architecture, weights, and optimizer.
However, the H5 format does not store information about metrics, losses, and the computation graph of custom objects. Thus, when importing a model from an h5 file, Keras will need access to custom objects’ Python classes or functions to reconstruct the model entirely.
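As a quick sketch (the model and file names here are illustrative), saving and restoring a model in the H5 format takes one call each:

```python
import tensorflow as tf

# A toy model; any compiled Keras model works the same way
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])
model.compile(optimizer="adam", loss="mse")

# One HDF5 file holding the architecture, weights, and optimizer
model.save("model.h5")

# If the model used custom layers or functions, Keras would need their
# Python definitions to rebuild it, e.g. (my_relu is a hypothetical example):
# restored = tf.keras.models.load_model("model.h5", custom_objects={"my_relu": my_relu})
restored = tf.keras.models.load_model("model.h5")
```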
As for storing the model’s architecture, it is easiest to do in the JSON format. With the to_json function, your model turns into a JSON string that can be further saved via the Python JSON module. This approach is beneficial since you can load the file without the original model class.
json_config = model.to_json()
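Expanding on that one-liner, a minimal round trip might look like this (the model itself is just a toy example):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])

# Serialize the architecture only: no weights, no optimizer state
json_config = model.to_json()

# Rebuild a fresh, untrained model with the same architecture
rebuilt = tf.keras.models.model_from_json(json_config)
```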
Additionally, when training a model, you might want to save the model’s state on each epoch. Fortunately, the Keras API allows you to store weights in the TF Checkpoint or H5 formats.
These formats use similar approaches to weights storage. For example, a TF Checkpoint can be represented as a Python dictionary where the layers are the keys and the weights are the values. As for the H5 format, it stores weights grouped by layer names. The parameters are stored in lists, ordered by concatenating the list of trainable weights with the list of non-trainable weights.
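A minimal weights-only round trip can be sketched as follows (the .weights.h5 name follows the convention newer Keras versions expect; in older TF 2.x releases, a path without an extension would produce a TF Checkpoint instead):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])

# HDF5 weights file: only the parameter tensors, no architecture
model.save_weights("model.weights.h5")

# Restoring requires a model with a matching architecture already built
model.load_weights("model.weights.h5")
```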
PyTorch is a Python open-source Deep Learning framework that has two key features. Firstly, it is good at tensor computation that can be accelerated using GPUs. Secondly, PyTorch allows you to build deep neural networks on a tape-based autograd system and has a dynamic computation graph. PyTorch is a well-known, tested, and popular deep learning framework among Data Scientists. It is commonly used in Kaggle competitions and by various DS teams across the globe. You can store PyTorch’s DL models in files with the pth or pt extensions.
When saving a model, PyTorch heavily relies on Python’s pickle module. In fact, torch.save() and torch.load() functions simply use pickle.dump() and pickle.load(). One of the most effective ways to save a PyTorch model is to use a model’s state_dict attribute.
A state_dict is a Python dictionary that maps each layer to its parameter tensors. So, state_dict contains information about all layers with learnable parameters. Python dictionary can be easily pickled, unpickled, updated, and restored. Moreover, you can expand the dictionary and manually add the optimizer state, hyperparameters, etc., as key-value pairs along with the model’s state_dict to resume training later. As a result, PyTorch strongly recommends this approach because of its flexibility. In a real-life scenario, it is frequently used to save a model for inference.
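A minimal checkpoint along these lines might look like this (the file name, the toy model, and the epoch value are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Save the learnable parameters plus whatever else is needed to resume training
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "epoch": 10,  # illustrative extra metadata
}, "checkpoint.pth")

# Restoring: instantiate the model first, then load the parameters into it
model = nn.Linear(4, 2)
checkpoint = torch.load("checkpoint.pth")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # switch to inference mode before serving predictions
```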
Still, you can save the model object itself. By the way, it is also done with the help of the pickle module.
Although this approach requires the least amount of code, it is less popular because the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. This happens because pickle stores a path to the file containing the class, not the model class itself. As a result, you might face various errors when using the serialized file in a new environment.
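A sketch of this whole-object approach (note that the weights_only flag is only available in recent PyTorch versions, where it must be set to False to unpickle arbitrary objects):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Pickle the entire model object; the class is referenced by path, not stored
torch.save(model, "model.pth")

# Loading needs the original class importable from the same location
model = torch.load("model.pth", weights_only=False)
```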
Also, there is a valuable PyTorch feature called TorchScript. TorchScript helps Data Scientists create serializable and optimizable models from PyTorch code. These models can be saved and exported from a Python environment and then loaded in a process with no Python dependency. So, when using TorchScript, you can serialize ML models in such a way that they can later run outside of Python, which allows you to embed your models in various production environments like mobile or IoT devices.
From the code perspective, saving a model in a TorchScript format is straightforward.
The format includes code, parameters, attributes, and debug information. In other words, within the archive, you have a model representation that you can load in an entirely separate process.
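As a sketch of how straightforward this is (the model here is a toy stand-in):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Compile the model to TorchScript; torch.jit.trace is the other option
scripted = torch.jit.script(model)
scripted.save("model_scripted.pt")

# The archive loads without the original Python class definition
loaded = torch.jit.load("model_scripted.pt")
```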
Open Neural Network Exchange (ONNX) is the open-source standard for representing traditional Machine Learning and Deep Learning models. If you want to learn more about ONNX specifications, please refer to their official website or GitHub page.
In general, ONNX’s philosophy is as follows:
- Build and train a Machine Learning model using the framework of your choice;
- Export the model to ONNX format. Nowadays, many ML tools support ONNX. For example, you can directly export your model to ONNX when working with TensorFlow/Keras, PyTorch, MXNet, or Hasty;
- Accelerate inferencing using a supported runtime;
- Convert from ONNX format to the desired framework.
So, ONNX allows Data Scientists to create models in their framework of choice without worrying about the deployment environment. Also, in some cases, through ONNX, you can even convert a model from one framework to another. Moreover, the ONNX format massively accelerates inferencing if you choose the correct runtime (for example, check out the ONNX runtime).
However, ONNX is too young to live up to the lofty expectations you might have. First, it might be challenging to convert complex models to ONNX (it might require adjusting the code depending on the model's architecture and implementation). Second, ONNX's support for converting models from one framework to another does not work as well as it sounds. The majority of frameworks do not support importing ONNX models. Additionally, if you want to convert a model from ONNX to another format, you might need additional software like the onnx2pytorch Python library. Still, some ML tools, such as Apache TVM, support ONNX and believe it has potential, so it is just a matter of time and community contribution before ONNX unveils its true power.
There are so many ML storing formats that it is impossible to cover all of them over a single post. In the sections above, we talked about the formats we at Hasty encountered the most. Still, we want to mention other storage approaches which are of interest to us.
As mentioned above, Apache TVM is a powerful tool to accelerate NNs’ performance. Its popularity has increased over the past couple of years, so you might come across TVM’s neural network format, which consists of 3 separate files. For example, let’s assume you want to convert an ONNX model to TVM. What will you get?
- model.so file represents the model in the form of a C++ library that the TVM runtime can load;
- model.json file is a text representation of the TVM Relay computation graph;
- model.params file contains the parameter tensors for the pre-trained model.
Also, if you have ever wondered whether all specialists across the industry work in Python, you should know that MATLAB and R are still very popular among students and R&D researchers. So, if you ever encounter either of these tools, keep in mind that:
- MATLAB supports ONNX. You can import your neural network into MATLAB through an ONNX file which is very convenient. Moreover, you also can export NN from MATLAB if you ever need to;
- R has a great H2O package that supports exporting models in MOJO (Model Object, Optimized) and POJO (Plain Old Java Object) formats that can be used in any Java environment.
Additionally, we have used Caffe for one of our projects and wanted to mention its caffemodel format. With Caffe as a DL framework, you will define your models in plain-text prototxt files. However, when the model is trained, it will be saved in a caffemodel file - a binary protocol buffer file. Therefore, files with the caffemodel extension cannot be opened or edited in a source code editor.
So many formats and frameworks exist, and we only touched on a few in this article. But two are much more popular than the rest - PyTorch and TensorFlow. With that in mind, the safest bet for any newcomer is to stick with what has the broadest adoption.
However, needs differ across organizations, and for some, other formats will be preferable. For example, if you are a manufacturer working with Texas Instruments hardware, you need to go with Caffe.
We hope this was interesting and that you got some insight into the lively and ever-changing world of deep-learning model formats. With that said, let us quickly do a…
How Hasty can help (aka a shameless plug)
For those of you who are looking for an ML platform - look no further! Hasty is a vision AI platform that helps you throughout the ML lifecycle. To date, we can help you with:
- Automating up to 90% of all annotation work
- Making quality control 35x faster
- Training models directly on your data using our low-code model builder
- Taking any custom models trained in Hasty and deploying them back to the annotation environment in one click
- Exporting any models you create in commonly used formats
- Hosting any model in our cloud
- Monitoring inferences made in production
- Most importantly, offering all of this through an API for easy integration.
In short, we take care of a lot of the MLOps so you don’t have to. Book a demo if you want to know more.