Luminoth is an open source toolkit for computer vision. It supports object detection and is built in Python on top of TensorFlow. The code is available on GitHub.
The first step is getting familiar with the Luminoth CLI tool, that is, the tool you interact with through the lumi command. It is the main gateway to Luminoth, allowing you to train new models, evaluate them, use them for predictions, manage checkpoints and more.
Suppose we want Luminoth to predict the objects present in one of our pictures (image.jpg). We can do that by running the following command:
lumi predict image.jpg
The result will be like this:
Found 1 files to predict.
At this point you haven’t told Luminoth what an “object” is for you, nor taught it how to recognize such objects. One way to do this is to use a pre-trained model that has been trained to detect popular types of objects, for example a model trained on the COCO dataset or on Pascal VOC. What’s more, each pre-trained model might be associated with a different algorithm. A checkpoint corresponds to the weights of a particular model (Faster R-CNN or SSD) trained on a particular dataset. The name “accurate” is just a label for the Deep Learning model underneath: here, Faster R-CNN trained with images from the COCO dataset.
Once the checkpoint has been downloaded, the following commands write everything (the resulting image and a JSON file) to a preds directory:
mkdir preds
First, run the lumi checkpoint refresh command so that Luminoth knows which checkpoints are available for download. After refreshing the local index, you can list the available checkpoints by running lumi checkpoint list:
================================================================================
Here, you can see the “accurate” checkpoint and a second, “fast” checkpoint, which is the SSD model trained on the Pascal VOC dataset. Let’s get some information about the “accurate” checkpoint with the following command:
lumi checkpoint info e1c2565b51e9
or
lumi checkpoint info accurate
To get predictions for an image or video using a specific checkpoint (e.g., fast), use the --checkpoint parameter:
lumi predict img.jpg --checkpoint fast -f preds/objects.json -d preds/
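Once the predictions have been written to preds/objects.json, they can be post-processed from Python. The sketch below assumes that each line of the file is a JSON record with a `file` name and an `objects` list whose entries carry `label`, `prob` and `bbox` keys. That layout is an assumption, so check it against your own output file. The sample record and the `filter_objects` helper are made up for illustration.

```python
import json

# Hypothetical sample, standing in for one line of preds/objects.json.
# The key names (file, objects, bbox, label, prob) are an assumption.
SAMPLE_LINE = json.dumps({
    "file": "img.jpg",
    "objects": [
        {"bbox": [10, 20, 110, 220], "label": "person", "prob": 0.98},
        {"bbox": [200, 40, 260, 90], "label": "car", "prob": 0.42},
    ],
})


def filter_objects(line, min_prob=0.5):
    """Return (file, objects), keeping detections with prob >= min_prob."""
    record = json.loads(line)
    kept = [obj for obj in record["objects"] if obj["prob"] >= min_prob]
    return record["file"], kept


if __name__ == "__main__":
    name, objects = filter_objects(SAMPLE_LINE, min_prob=0.5)
    for obj in objects:
        print(name, obj["label"], obj["prob"], obj["bbox"])
```

Raising or lowering `min_prob` has the same effect as moving a probability threshold: fewer, more confident detections versus more, less certain ones.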
Luminoth includes a simple web frontend so you can play around with detected objects in images using different thresholds. To launch this, simply type lumi server web and then open your browser at http://localhost:5000. If you are running on an external VM, you can do lumi server web --host 0.0.0.0 --port
In order to use a custom dataset, we must first convert the data from whatever format it is in to TFRecords files (one for each split: train, val, test). Luminoth reads datasets natively only in TensorFlow’s TFRecords format, a binary format that lets Luminoth consume the data very efficiently. Fortunately, Luminoth provides several CLI tools for transforming popular dataset formats (such as Pascal VOC, ImageNet, COCO, CSV, etc.) into TFRecords.
We should start by downloading the annotation files (this and this, for train) and the class description file.
After we get the class-descriptions-boxable.csv file, we can go over all the classes available in the OpenImages dataset and see which ones are relevant to a traffic dataset. The following were hand-picked after examining the full file:
/m/015qff,Traffic light
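A keyword search can narrow down the class list before hand-picking. The sketch below is one possible way to do it, assuming the file has two columns (class id, class name) as in the line above. Only the Traffic light row comes from this tutorial; the other sample rows, the keyword list and the `find_classes` helper are illustrative and should be checked against the real file.

```python
import csv
import io

# Sample rows in the "id,name" layout of class-descriptions-boxable.csv.
# Only the first row is taken from the tutorial; the rest are illustrative.
SAMPLE_CSV = """/m/015qff,Traffic light
/m/0k4j,Car
/m/01g317,Person
/m/0bt9lr,Dog
"""

# Hypothetical keywords used to shortlist candidate classes.
KEYWORDS = ("traffic", "car", "person")


def find_classes(csv_file, keywords):
    """Return (class_id, name) rows whose name contains any keyword."""
    matches = []
    for class_id, name in csv.reader(csv_file):
        if any(keyword in name.lower() for keyword in keywords):
            matches.append((class_id, name))
    return matches


if __name__ == "__main__":
    # With the real file, use open("class-descriptions-boxable.csv", newline="")
    for class_id, name in find_classes(io.StringIO(SAMPLE_CSV), KEYWORDS):
        print(f"{class_id},{name}")
```

Substring matching is deliberately loose (for instance, "car" would also match "Cart"), so the result is a shortlist to review by hand rather than a final selection.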
Luminoth includes a dataset reader for the OpenImages format. The reader expects a particular directory layout so that it knows where the files are located: the files for each set of examples must be in a folder named after their split (train, test, …). So, you should have the following:
.
Then run the following command:
lumi dataset transform \
This will generate a TFRecord file for the train split:
INFO:tensorflow:Saved 360 records to "./out/train.tfrecords"
Training orchestration, including the model to be used, the dataset location and the training schedule, is specified in a YAML config file. Luminoth reads this file and merges it with the default configuration to start the training session.
You can see a minimal config file example in sample_config.yml. This file illustrates the entries you’ll most probably need to modify, which are:
1) train.run_name: the run name for the training session, used to identify it.
2) train.job_dir: directory in which both model checkpoints and summaries (for TensorBoard consumption) will be saved. The actual files will be stored under
3) dataset.dir: directory from which to read the TFRecord files.
4) model.type: model to use for object detection (fasterrcnn, or ssd).
5) network.num_classes: number of classes to predict (depends on your dataset).
To see all the possible configuration options, most of which relate to the model itself, you can check the base_config.yml file.
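The idea of merging your config file with the default configuration can be pictured as a recursive dictionary merge in which your values win. The sketch below only illustrates that idea and is not Luminoth's actual implementation. The `merge_config` helper and all values in `DEFAULTS` and `USER` are hypothetical; only the key names come from the list above.

```python
import copy


def merge_config(base, override):
    """Recursively merge override into a copy of base; override wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, or new keys: take the override as-is.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults, not Luminoth's real default values.
DEFAULTS = {
    "train": {"run_name": None, "job_dir": "jobs/"},
    "dataset": {"dir": None},
    "model": {"type": "fasterrcnn"},
}

# Hypothetical user config, as it might look after parsing a YAML file.
USER = {
    "train": {"run_name": "traffic"},
    "dataset": {"dir": "./out"},
    "model": {"type": "ssd"},
}

if __name__ == "__main__":
    print(merge_config(DEFAULTS, USER))
```

Because the merge is recursive, a config file only needs to list the entries it changes; everything else, such as train.job_dir here, keeps its default.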
Probably the most important setting for training is the learning rate. You will most likely want to tune this depending on your dataset, and you can do it via the train.learning_rate setting in the configuration. For example, this would be a good setting for training on the full COCO dataset:
learning_rate:
To get to this, you will need to run some experiments and see what works best.
train:
Assuming you already have both your dataset (TFRecords) and the config file ready, you can start your training session by running the following command:
lumi train -c config.yml
You can use the -o option to override any configuration option using dot notation (e.g. -o model.rpn.proposals.nms_threshold=0.8). If you are using a CUDA-based GPU, you can select the GPU to use by setting the CUDA_VISIBLE_DEVICES environment variable.
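Dot notation simply addresses a nested configuration entry by its path. As a rough illustration of how an override string such as model.rpn.proposals.nms_threshold=0.8 could map onto a nested config, here is a minimal sketch. The `apply_override` helper is hypothetical and is not how Luminoth parses the option internally.

```python
import ast


def apply_override(config, assignment):
    """Apply a 'dotted.key=value' string to a nested dict, in place."""
    dotted_key, _, raw_value = assignment.partition("=")
    *parents, leaf = dotted_key.split(".")

    # Walk (and create, if missing) the intermediate dictionaries.
    node = config
    for part in parents:
        node = node.setdefault(part, {})

    # Interpret numbers, booleans and lists; fall back to a plain string.
    try:
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        value = raw_value
    node[leaf] = value
    return config


if __name__ == "__main__":
    config = {"model": {"type": "fasterrcnn"}}
    apply_override(config, "model.rpn.proposals.nms_threshold=0.8")
    print(config)
```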
As the training progresses, Luminoth will periodically save a checkpoint with the current weights of the model. The files will be output in your
Generally, datasets (like OpenImages, which we just used) provide “splits”. The “train” split is the largest, and is the one the model actually learns from. Then there is the “validation” (or “val”) split, which consists of different images and from which you can draw metrics of your model’s performance in order to better tune your hyperparameters. Finally, a “test” split is provided for the final evaluation of how your model would perform in the real world once it is trained.
Let’s start by building TFRecords from the validation split of OpenImages. For this, we can download the files with the annotations and use the same lumi dataset transform that we used to build our training data.
In your dataset folder (where the class-descriptions-boxable.csv is located), run the following commands:
mkdir validation
After the downloads finish, we can build the TFRecords with the following:
lumi dataset transform \
In Luminoth, lumi eval runs the model over every image in your chosen dataset split (i.e., validation or test) and then computes metrics such as loss and mAP. If you are lucky enough to have more than one GPU in your machine, it is advisable to run both train and eval at the same time.
Start by running the evaluation:
lumi eval --split validation -c custom.yml
Mean Average Precision (mAP) is the metric commonly used to evaluate object detection tasks. It measures how well your model works across all classes; mAP is a number between 0 and 1, and the higher the better. Moreover, it can be calculated at different IoU (Intersection over Union) thresholds. For example, the Pascal VOC challenge metric uses 0.5 as its threshold (notation mAP@0.5), while the COCO dataset computes mAP at several thresholds and averages them (notation mAP@[0.5:0.95]). Luminoth will print out several of these metrics, specifying the thresholds that were used in this notation.
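To make the IoU threshold concrete, the following sketch computes the IoU of two boxes and checks whether a prediction counts as a match at a given threshold. It assumes boxes are given as (x_min, y_min, x_max, y_max) tuples. The `iou` and `is_match` helpers are illustrative, not Luminoth functions, and a full mAP computation also involves ranking detections by confidence, which is not shown here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (if there is one).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes have no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred_box, true_box) >= threshold


# Thresholds behind the mAP@[0.5:0.95] notation: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    predicted, truth = (0, 0, 10, 10), (2, 0, 12, 10)
    print("IoU:", iou(predicted, truth))
    for t in COCO_THRESHOLDS:
        print(t, is_match(predicted, truth, t))
```

The same prediction can be a match at mAP@0.5 and a miss at stricter thresholds, which is why the averaged COCO-style metric is usually lower than the Pascal VOC one.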
TensorBoard is a very good tool for monitoring training, letting you see plenty of plots of the training-related metrics. By default, Luminoth writes TensorBoard summaries during training, so you can use this tool without any extra effort:
tensorboard --logdir <job_dir>/<run_name>
If you are running from an external VM, make sure to use --host 0.0.0.0 and --port if you need a port other than the default 6006.
First, go to the “Scalars” tab. You are going to see several tags.
Here, you will get the same loss values that Luminoth computes for the train split, but for the chosen dataset split (validation, in this case). As with train, you should mostly look at validation_losses/no_reg_loss.
These will be the mAP metrics that will help you judge how well your model performs.
The mAP values refer to the entire dataset split, so they will not jump around as much as other metrics.
You can also use the lumi server web command that we have seen before to try your partially trained model on a bunch of novel images. For this, you can launch it with a config file like:
lumi server web -c config.yml
Here you can also use the --host and --port options.
We can create checkpoints and set some metadata like name, alias, etc. This time, we are going to create the checkpoint for our traffic model:
lumi checkpoint create \
You can verify that you do indeed have the checkpoint when running lumi checkpoint list, which should get you an output similar to this:
================================================================================
Moreover, if you inspect the ~/.luminoth/checkpoints/ folder, you will see that you now have a folder corresponding to your newly created checkpoint. Inside this folder are the actual weights of the model, plus some metadata and the configuration file that was used during training.
Simply run lumi checkpoint export cb0e5d92a854. You will get a file named cb0e5d92a854.tar in your current directory, which you can easily share with someone else. By running lumi checkpoint import cb0e5d92a854.tar, the checkpoint will be listed locally.
Calling Luminoth from your Python app is very straightforward.
from luminoth import Detector, read_image, vis_objects
Hope you enjoyed the simple tutorial! :)
Computer-Vision, Object-Detection — Feb 14, 2020