Creating ready-to-use AI training datasets – with AIQ

The creators of Pascal and Harvey, two AI-based platforms that are revolutionizing compliance screening and scientific desk research respectively, are onto something new. And the Dutch province of Gelderland has awarded them a research grant to pursue their idea. What's it about? Creating datasets that are suitable for training AI algorithms. Not with the usual massive manual effort by data scientists, but with AI itself. They call it AIQ.

Vartion wins provincial research grant for feasibility project

Artificial intelligence is an immensely powerful technique that can perform all kinds of data-heavy processes in a fraction of the time that humans need. Creating and training an AI algorithm, however, is itself a painstaking process. The algorithm must learn to make certain decisions independently, and its success depends heavily on the training dataset used. That dataset has to be sufficiently large and of good quality (reproducible, representative, with a high signal-to-noise ratio, etc.). All the relevant items have to be labelled in advance, and the data must be cleansed of inaccuracies and bias. A good training dataset is essential to achieve "explainable AI", where users trust the results because they can trace every step of the algorithm's decision making. It ensures that an AI algorithm ultimately does what it is meant to do, and nothing else.

Until now, data scientists have been doing this work manually. Collecting, cleaning up and validating data for training purposes costs a great deal of time and money: creating a training dataset typically takes up 80 to 90% of the total development time of an AI application. As a consequence, many organisations that could benefit from AI technology but lack the expertise in house have not yet taken the plunge. This got the scientists at Vartion thinking. What if you could reduce the time and effort needed to create an AI training dataset, so that data scientists can spend their time on other important tasks? The solution they came up with, being AI experts, was to build an AI algorithm for this task: an algorithm that can assess data quality. They called it AIQ.

Gelderland awards research grant
Based on this idea, the company applied for a research grant from its home province of Gelderland, and recently received the good news that the application had been approved. With the grant, Vartion will explore whether it is feasible to automate the process of making publicly available data suitable for training purposes. The project will also assess whether this can be turned into an economically viable product.

Vartion's COO/CTO Patrick Croonen is delighted with the support from the province of Gelderland and excited about the project and its potential benefits. "We're hoping to eventually use this algorithm first and foremost to support our own AI development processes. We're constantly innovating our existing platforms Pascal and Harvey, and AIQ can help us do so even faster and better. Beyond that, it's our dream to be the first to be able to use AI to produce quality-certified datasets for the training of new AI applications."