Setting up an image annotation web platform

Published by Samy Doloris on


The importance of data

Most of the time, when people talk about Artificial Intelligence (AI), they talk about new algorithms or the computing power needed to train them.
The most famous examples are AI algorithms beating professional players at games such as Go or chess, or popular products that embed AI in one form or another (e.g. Google Assistant, Apple Siri, Amazon Echo).
Yet artificial intelligence does not rely only on algorithms and computing power. In fact, today’s rise of AI is mainly due to four factors:

Four key success factors of AI

Data is one of the most important factors in AI.
There is a direct relationship between the success of an AI project and the quality and amount of data available. Gartner estimates that up to 85% of AI projects will fail, with data quality among the reasons.

Data labelling

Data labelling is one of the key processes in building powerful models. Supervised learning, the most classical and most frequently used subfield of machine learning, aims to approximate the mathematical function that describes the relationship between input and output.
This requires annotated data, and the more of it, the better!
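
To make the idea of approximating a function from annotated data concrete, here is a minimal sketch on a toy problem. It is only an illustration: the dataset is invented and the model is a straight line fitted by least squares, which is far simpler than anything used on images.

```python
from statistics import mean


def fit_line(xs, ys):
    """Closed-form least-squares fit of y ~ a * x + b from labelled pairs."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: covariance of x and y divided by the variance of x.
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    a = covariance / variance
    b = y_bar - a * x_bar
    return a, b


# Toy labelled dataset (invented): each input comes with an annotated output.
inputs = [0.0, 1.0, 2.0, 3.0]
labels = [1.0, 3.0, 5.0, 7.0]  # these follow y = 2x + 1

a, b = fit_line(inputs, labels)
prediction = a * 4.0 + b  # estimate for an input the model has never seen
```

Without the `labels` list there is nothing to fit, which is the point: the annotations are what lets the model recover the relationship.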

A web platform to label images

Many industrial applications of machine learning rely on images, and the vast majority of them cannot rely on open data.
This, together with the motivations given above, drove us to look into existing tools and see which would suit our needs.

Looking at open-source projects, we found Make Sense, which offered several of the features we were looking for:

  • User-friendly web platform (no installation required)
  • Bounding boxes, points and polygon annotations
  • Does not require a backend on its own
  • Well designed

We adapted some features and added our own to match our needs. For example, we wanted to make the tool as simple to use as possible, so we modified the “Export” option so that it uploads the images and their annotations directly to our own storage.
We also added a project picker that automatically imports a set of predefined labels, so that our annotators seamlessly use exactly the same labels.
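
The idea behind the project picker can be sketched as a simple lookup from project to label set. This is not the platform's actual code: the project names, the labels and the `labels_for_project` helper are all hypothetical, and the sketch is written in Python for brevity.

```python
from typing import Dict, List

# Hypothetical presets: project names and labels are invented for illustration.
PROJECT_LABELS: Dict[str, List[str]] = {
    "road-inspection": ["crack", "pothole", "marking"],
    "retail-shelves": ["product", "price-tag", "empty-slot"],
}


def labels_for_project(project: str) -> List[str]:
    """Return a copy of the predefined labels for the selected project."""
    try:
        # Copy the list so callers cannot alter the shared preset by accident.
        return list(PROJECT_LABELS[project])
    except KeyError:
        raise ValueError(f"Unknown project: {project!r}") from None
```

Keeping the label sets in one place means every annotator on a project starts from the same list instead of typing label names by hand.
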
The website is hosted entirely on AWS: an S3 bucket hosts the (static) website, with AWS CloudFront as the content delivery network. We also store the annotated images in an S3 bucket, and we implemented a Flask server that handles uploading those files to the bucket.
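
The upload step could look roughly like the sketch below. It is an assumption-laden illustration, not our server code: the key layout (`<project>/images/...`, `<project>/annotations/...`), the allowed extensions and the function names are all hypothetical. The only external call it relies on is `upload_fileobj(fileobj, bucket, key)`, which is the method a boto3 S3 client exposes; the client is passed in as a parameter so the logic can be tried without AWS credentials.

```python
import io
import json
from pathlib import PurePosixPath

# Assumption: only these image types are accepted.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_keys(project, filename):
    """Return the (image_key, annotation_key) pair for one uploaded image."""
    # Drop any directory part sent by the client.
    name = PurePosixPath(filename).name
    suffix = PurePosixPath(name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported image type: {suffix or 'none'}")
    stem = PurePosixPath(name).stem
    # Hypothetical layout: images and annotations side by side per project.
    return f"{project}/images/{name}", f"{project}/annotations/{stem}.json"


def upload_annotated_image(s3_client, bucket, project, filename,
                           image_stream, annotations):
    """Upload an image and its annotations (as JSON) to the given bucket."""
    image_key, annotation_key = build_keys(project, filename)
    s3_client.upload_fileobj(image_stream, bucket, image_key)
    payload = io.BytesIO(json.dumps(annotations).encode("utf-8"))
    s3_client.upload_fileobj(payload, bucket, annotation_key)
    return image_key, annotation_key
```

In a Flask view, `s3_client` would typically be `boto3.client("s3")` and `image_stream` the `.stream` of an entry in `request.files`; the form field name and the bucket name are up to the deployment.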

We made some more changes, but I invite you to go through the website yourself and see how such a platform can be useful: LegIAnnotate!

We are still developing more features and plan to keep doing so as things progress, and we are of course open to any suggestions or questions!