Data labeling at scale is an important concern for any organisation, since creating labels for large datasets by hand is slow and expensive. Current state-of-the-art solutions for object labeling still rely on humans. A typical approach is model-assisted labeling: an existing ML model predicts where the objects are and what their labels should be, and a human only needs to verify the results. This is definitely faster than marking every object by hand. However, it only works if there already exists an ML model that can detect the objects reasonably well. At the beginning of a project, all the objects must be labeled manually to create the first dataset and the first version of the model. This is a very slow process, especially when preparing a dataset for an object detector, where one also has to mark the object positions in every image. Would it be possible to automate this further?
As we are a mobile application team, our solution is of course to build a mobile application for every problem we face. Our idea is to record a short video of the objects and automatically track them through all the video frames. The user only has to mark the objects in the first frame; the rest of the frames are labeled automatically. By saving all the video frames, we get a lot of labeled images.
The first step is to capture a short video of the object(s) we want to label. It is important to capture it from different angles so that we get variation in the labeled images; a video easily produces many near-identical frames for the same label, which we don't need. Therefore we decided to record videos at 10 fps in VGA resolution. It is also important to note that all the objects to be labeled must be visible in the first video frame.
The next step is to mark the objects. Our user interface works as follows: the user simply swipes from the upper-left corner to the lower-right corner of the desired object to mark it. From these two coordinates we can draw a rectangle around the object.
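The two swipe endpoints are enough to define the bounding box. A minimal sketch in Python (the function name and tuple layout are our own illustration, not the app's actual code); taking the min/max of the coordinates makes the rectangle independent of swipe direction:

```python
def rect_from_swipe(start, end):
    """Build an (x, y, w, h) rectangle from the swipe's start and end points.

    Using min/max makes the result independent of swipe direction, so the
    marking also works if the user swipes lower-right to upper-left.
    """
    x0, y0 = start
    x1, y1 = end
    x, y = min(x0, x1), min(y0, y1)
    w, h = abs(x1 - x0), abs(y1 - y0)
    return x, y, w, h
```

For example, a swipe from (120, 80) to (260, 200) produces the rectangle (120, 80, 140, 120), and the reverse swipe produces the same rectangle.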
We were also lucky to have an image classifier from an earlier project which can identify our objects (we are now building an object detector, for which we need a new dataset). So every time the user marks an object, we cut it out of the image and send it to the image classifier, which gives us the correct label. We can therefore quickly mark the objects on the first video frame and get their labels instantly.
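Cutting a marked object out of a frame is plain array slicing. A hedged sketch, assuming frames are NumPy arrays in (height, width, channels) layout; `classify` below is a placeholder name for the earlier project's classifier, whose real interface we do not know:

```python
import numpy as np

def crop_object(frame, rect):
    """Cut the marked rectangle (x, y, w, h) out of a frame array."""
    x, y, w, h = rect
    return frame[y:y + h, x:x + w]

# Hypothetical usage -- classify() stands in for the existing
# image classifier from the earlier project:
#   label = classify(crop_object(frame, rect))
```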
Now it is time to analyse the whole video. A well-known algorithm for object tracking in computer vision is the Lucas-Kanade algorithm, which calculates the optical flow of objects in successive images. The assumption is that the two images are separated by a small time increment Δt, so that the objects in them have not moved significantly. The algorithm estimates in which direction an object has moved so that the local changes in image intensity are explained. It does not scan the second image looking for an exact match for a given pixel, as people often assume such matching algorithms do.
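The core of Lucas-Kanade is the linearized brightness-constancy equation Ix·u + Iy·v + It = 0, solved by least squares over a small window. The following NumPy sketch estimates the displacement for a single window of grayscale frames; it is a toy version for intuition, not the pyramidal, per-feature implementation a real tracker would use:

```python
import numpy as np

def lk_flow(prev, curr, window):
    """One Lucas-Kanade step for a single window.

    prev, curr: 2-D grayscale frames (float arrays).
    window: (y0, y1, x0, x1) region over which gradients are aggregated.
    Solves the least-squares system [Ix Iy] [u v]^T = -It
    accumulated over all pixels in the window.
    """
    y0, y1, x0, x1 = window
    # Spatial derivatives via finite differences, temporal via frame difference.
    Ix = np.gradient(prev, axis=1)[y0:y1, x0:x1].ravel()
    Iy = np.gradient(prev, axis=0)[y0:y1, x0:x1].ravel()
    It = (curr - prev)[y0:y1, x0:x1].ravel()
    A = np.stack([Ix, Iy], axis=1)
    # Least-squares solution of A d = -It gives the displacement (u, v).
    d, *_ = np.linalg.lstsq(A, -It, rcond=None)
    return d
```

Applied to two synthetic frames in which a smooth blob has moved one pixel to the right, the estimate comes out close to (u, v) ≈ (1, 0) without ever searching the second frame for a matching pixel.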
Each video frame is then fed to the object tracker, which calculates the position of every object marked in the first frame based on the object information in the previous frame. At the end of this process we have all the video frames saved as images, together with an annotation file. Object tracking is a very compute-intensive process, so we implemented it in C++ in our Android application.
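The tracker's output can be stored as a simple per-frame annotation file. A sketch of one possible layout, assuming a CSV with one row per tracked object per frame; the article does not specify the actual format, so the columns here are our own illustration:

```python
import csv

def save_annotations(path, frame_boxes):
    """Write one row per tracked object per frame.

    frame_boxes: a list with one entry per video frame, each entry a list
    of (label, (x, y, w, h)) tuples produced by the tracker.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "label", "x", "y", "w", "h"])
        for i, boxes in enumerate(frame_boxes):
            for label, (x, y, w, h) in boxes:
                writer.writerow([i, label, x, y, w, h])
```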
Review of the Labels
The last step in the labeling process is to review and save the results. The mobile application shows all the labeled frames to the end user, who can choose which of them are taken into the dataset. Consecutive frames are sometimes too similar, so one may choose not to keep all of them. We noticed that instead of saving every frame, it is better to take, for example, only every 5th frame to get more diversity.
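The every-5th-frame rule is essentially a one-line slice; a sketch with the step size as a tunable parameter:

```python
def subsample(frames, step=5):
    """Keep only every `step`-th frame to add diversity to the dataset."""
    return frames[::step]
```

For example, a 10-second clip recorded at 10 fps gives 100 frames, of which this keeps 20.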
By using this method we can quickly create a lot of object labels and send them directly to the ML training process. The labeler marks the objects, gets the labels instantly, and transfers them to the dataset. This mobile application approach actually changes the way a labeler works and what their task is: we can share our labeling application with the experts who know their objects best, and they can create the labels for us.