NuMind is Out!

Etienne Bernard, Co-Founder & CEO
August 5, 2024
NuMind is a tool to create custom NLP models by “teaching” an AI. It can be used to create high-quality, lightweight information extraction models without sharing your data with anyone. Under the hood, NuMind uses in-house foundation models, automatic machine learning, and an active learning strategy to get high performance without spending too much time correcting the model.

TLDR

  • NuMind is now public: you can use it for free and join our Discord server.
  • NuMind is a tool to create custom information-extraction models (classification, multi-label classification, NER, and soon Structured Extraction).
  • Custom models are <1 GB and reach higher performance than the best generic LLMs.
  • NuMind relies on in-house open-source foundation models, automatic machine learning, and active learning.
  • Data and computations are kept local.
  • NuMind is multi-industry & multi-lingual.

A Tool to Teach AIs

At the highest level, NuMind is a tool to teach an AI how to perform a task automatically. By “teaching”, we essentially mean three things:

  1. Telling the AI what to do
  2. Showing the AI what to do
  3. Iteratively correcting the AI’s mistakes

This is the core of how humans teach each other and is very effective. Note that there are other effective human-to-human teaching modalities, such as giving explanations, but we decided not to include them for now (they are harder to deal with from a product perspective!).

NuMind currently deals with information extraction tasks such as Classification, Named Entity Recognition (NER), and Structured Extraction. We will extend it to all kinds of NLP tasks later on.

NuMind is industry agnostic. You can use it to extract information from medical reports, legal documents, financial documents, social media posts, chat messages, and so on. Also, NuMind is multilingual.

Ok, let’s now understand what this “AI teaching” is by creating an NER model from scratch.

Creating an NER Model

Starting a Project

To create an NER model, we first need to download NuMind on a desktop/laptop. No need for GPUs: NER models are small and run just fine on CPUs. We can then start a project and select the Entity Detection task:

Information extraction tasks that can be tackled with NuMind.

We then import the documents we want to work with; a few hundred documents is typically fine. These documents are stored on the computer and will not be shared.

Teaching the AI

We can now start to teach the AI. The first step is to “tell the AI what to do”:

Project information. “Tell the AI what to do” phase.

We can describe the project, create classes, add keywords, etc. This part is optional and is, in fact, not so useful for NER (unlike for generative tasks).

The next step is to “show the AI what to do”, which means demonstrating the task by annotating a first document:

First document annotated. "Show the AI what to do" phase.

Here we create the entity types “Lawyer”, “Tribunal”, and “Party” via the text field, and click-and-drag the text to annotate it. After doing so, we click on the “validate” button and three things happen, almost instantaneously:

  1. A custom model is created (via fine-tuning) using the annotated example
  2. An active learning procedure selects the next document to annotate
  3. The selected document is pre-annotated by the custom model (the full loop is sketched after this list)
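
To make this loop concrete, here is a minimal sketch of how such an annotate / fine-tune / select cycle could be implemented. This is not NuMind's actual code: fine_tune, predict_proba, and annotate are hypothetical stand-ins for the underlying model and interface, and uncertainty (entropy) sampling is just one common active learning strategy.

import math

def entropy(distribution):
    # Shannon entropy of a probability distribution: higher means more uncertain
    return -sum(p * math.log(p) for p in distribution if p > 0)

def teaching_loop(documents, fine_tune, predict_proba, annotate, rounds=20):
    # fine_tune trains a model on the corrected documents, predict_proba returns one
    # probability distribution over entity types per token, and annotate asks the human
    # to validate or correct the pre-annotations of a single document.
    unlabeled = list(documents)
    labeled = [annotate(unlabeled.pop(0), pre_annotations=None)]  # "show the AI what to do"
    for _ in range(min(rounds, len(unlabeled))):
        model = fine_tune(labeled)  # 1. retrain on all corrections so far
        # 2. active learning: pick the document the model is least sure about
        next_doc = max(unlabeled,
                       key=lambda doc: sum(entropy(p) for p in predict_proba(model, doc)))
        unlabeled.remove(next_doc)
        # 3. pre-annotate it so the human only has to correct mistakes
        labeled.append(annotate(next_doc, pre_annotations=predict_proba(model, next_doc)))
    return fine_tune(labeled)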

Note that this procedure happens on the computer. Again, the data is not shared. Here is what the next document looks like after this process:

Second document. Dashed borders show the pre-annotations of the model.

The dashed borders show the pre-annotations of the model. It correctly identifies the lawyer and the tribunal but fails to identify the parties involved. If we select one of the missing parties, we can see the probabilistic belief of the model about the entity type:

The model incorrectly believes that "OceanView Resorts" is not one of the parties involved.

In this case, the model believes that it might be a tribunal with a 25% chance, or a party with a 2% chance. It is clearly wrong since it is a party. So we correct the model by selecting “Party” and clicking “validate.” Again, a new model is trained via fine-tuning and a document is selected for us to correct and improve the AI further.
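
As an aside, the probabilities shown for a selected span are simply a distribution over the possible entity types plus a "no entity" option. Here is a rough illustration; the raw scores below are made up and this is not NuMind's internal representation:

import math

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1
    exps = {label: math.exp(score) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Hypothetical raw scores for the span "OceanView Resorts"
span_scores = {"No entity": 2.0, "Tribunal": 1.0, "Party": -1.5, "Lawyer": -3.0}
print(softmax(span_scores))
# roughly {'No entity': 0.71, 'Tribunal': 0.26, 'Party': 0.02, 'Lawyer': 0.005}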

As we repeat this “correcting the AI’s mistakes” process a few times, we see the model improving, and we soon only have to make minor corrections, or none at all. Here is the teaching panel once we reach the 20th document:

Teaching panel after correcting 19 documents. The model is correct on the 20th document and shows decent performance.

We can see that the model is doing the task perfectly for this document, and that the performance plot on the bottom right looks pretty good. Such performance after a rather small number of corrections is essentially due to three things:

  1. The foundation model, NuNER, is good at the NER task (we wrote a paper about it).
  2. The automatic machine learning figures out the right regularization and calibration via cross-validation (see the sketch after this list).
  3. The active learning procedure selected the most informative documents to annotate.
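
To illustrate the second point, here is a hedged sketch of how a regularization strength could be chosen by k-fold cross-validation. The train and f_score functions are hypothetical placeholders, and NuMind's actual automatic machine learning is more involved:

from statistics import mean

def choose_regularization(examples, train, f_score,
                          candidates=(0.0, 0.01, 0.1, 1.0), k=5):
    # For each candidate weight decay, train on k-1 folds, evaluate on the
    # held-out fold, and keep the value with the best average F-score.
    folds = [examples[i::k] for i in range(k)]
    best_value, best_score = None, -1.0
    for weight_decay in candidates:
        scores = []
        for i in range(k):
            held_out = folds[i]
            train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
            model = train(train_set, weight_decay=weight_decay)
            scores.append(f_score(model, held_out))
        if mean(scores) > best_score:
            best_value, best_score = weight_decay, mean(scores)
    return best_value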

Here is a depiction of these aspects as part of NuMind’s teaching workflow:

NuMind teaching workflow. It allows creating high-quality custom models with a small amount of effort.

This workflow (which we sometimes call “Interactive AI Development”) makes it possible to obtain good performance with a small number of corrections, and it also lets you see the performance of the model in real time in order to make important project decisions early on.

Overall, we typically find that after 10 corrected examples, the custom model is better than a well-prompted (including examples) GPT-3.5, and after 100 corrected examples, it is better than a well-prompted GPT-4. The interesting thing is that this custom model has about 100M parameters, which is at least 10,000x fewer than GPT-4. This makes it possible to process large volumes of documents at an exceedingly low inference cost and to keep these documents private.

Reviewing Disagreements

What we just demonstrated is the core of how we teach an AI with NuMind, but there is more to it. One important thing is to review annotated documents to find instances where we disagree with the model. For example, let’s look at the 16th document:

The model wrongly believes that "Morgan" is not part of the Lawyer entity.

The colored frames indicate our annotations, while the colored backgrounds indicate model predictions (which are obtained via cross-validation). That way, we can quickly see whether the model agrees with us or not. In this case, there is agreement everywhere except for the name of the lawyer, “Morgan”, which the model incorrectly believes does not belong to the “Lawyer” type. Here, there is nothing to do since we were correct. Sometimes it is the model that is correct, as in the 10th document, where we forgot to annotate a party involved:

The model correctly believes that "Mrs. Gray" is a party involved. We were wrong.

This review process is effective at improving the model, and it can be done efficiently by entering the “Review disagreements” mode, which only shows you documents on which you and the model disagree. It is also useful if you import data that is already annotated and want to clean it.
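
For intuition, finding disagreements essentially amounts to comparing your annotated spans with the model's cross-validated predictions and keeping the documents where the two differ. Here is a simplified sketch (not NuMind's implementation) in which entities are represented as (start, end, label) spans:

def find_disagreements(documents):
    # Each document carries human annotations and cross-validated model
    # predictions, both as sets of (start, end, label) spans.
    to_review = []
    for doc in documents:
        human = set(doc["annotations"])
        model = set(doc["predictions"])
        missed_by_model = human - model  # e.g. "Morgan" not tagged as Lawyer
        missed_by_human = model - human  # e.g. "Mrs. Gray" we forgot to annotate
        if missed_by_model or missed_by_human:
            to_review.append((doc["id"], missed_by_model, missed_by_human))
    return to_review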

Breaking the AI

Another thing we can do to improve the model is to actively look for its weaknesses via the model playground. This playground is an editable text field that is continuously updated with model predictions. Here we altered a document until the model could not recognize the lawyer anymore:

Model Playground. We managed to break the model by modifying an example.

To do so, we just had to remove the word “Attorney” and modify the name. Arguably, the model should still figure out that this is a lawyer from the context. Since we managed to break the model, we click “Add to dataset” so the model can learn from this new document.

This interactive debugging is a nice way to understand how the model works and is effective at making it more robust.

Estimating Performance

We already have an idea of how good the model is from correcting it. To go further, we can look at the performance report:

Performance report. The model is excellent for the "Tribunal" and "Party" classes, but could be better for the "Lawyer" class.

The most important metric to look at is the F-score, which, ideally, should be close to 1. Each class has its own F-score value. We can see that “Party” and “Tribunal” have great F-scores, 0.98 and 1 respectively. However, “Lawyer” is not so great at the moment, at only 0.86. This performance should increase as we continue to correct the model, but if that is not the case, it would be good to figure out what is going on. Sometimes bad performance is caused by inconsistent annotations. Sometimes it is caused by a bad ontology. Sometimes it is just because the task is difficult.

Note that this performance report is computed via cross-validation: models are trained on some documents and tested on others. That way, this performance should be representative of what we will obtain in production.
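
For reference, a per-class F-score can be computed from gold and predicted entity spans roughly as follows. This is a simplified sketch using exact span matching; NuMind's exact scoring may differ:

def f_scores_per_class(gold_spans, predicted_spans):
    # gold_spans and predicted_spans are sets of (doc_id, start, end, label) tuples
    labels = {span[-1] for span in gold_spans | predicted_spans}
    report = {}
    for label in labels:
        gold = {span for span in gold_spans if span[-1] == label}
        pred = {span for span in predicted_spans if span[-1] == label}
        true_positives = len(gold & pred)
        precision = true_positives / len(pred) if pred else 0.0
        recall = true_positives / len(gold) if gold else 0.0
        if precision + recall == 0:
            report[label] = 0.0
        else:
            report[label] = 2 * precision * recall / (precision + recall)
    return report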

Deploying the AI

Once we are happy with our model, it is time to use it!

In the simplest case, we have a bunch of documents that we need to process offline. We can easily do this via the “Process File” button.

In the most common case, we want to deploy our model on a server. We can do this by wrapping it in an API. We first create this API on our machine to run integration tests locally:

Local model API. This is a classic REST API that outputs JSON containing predictions, probabilities, positions, and so on.

Here is what this API looks like:

Entry points:

POST http://localhost:64166/entities
  data: a single string
  ex: curl -d "mytext" -X POST http://localhost:64166/entities
or
  data: a JSON Array of strings
  ex: curl -d '["text1", "text2"]' -X POST http://localhost:64166/entities

Response schema:
{
  "document": str,
  "predictions": [
    {
      "label": str,
      "text": str,
      "probability": double,
      "from": int,
      "to": int
    },
    ...
  ]
}
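
For example, here is a small Python client for this local API. The port below is the one from the example above and will likely differ on your machine; the requests library is assumed to be available, and the response is assumed to be a list of objects following the schema above, one per input text:

import requests

API_URL = "http://localhost:64166/entities"  # replace the port with the one shown by NuMind

def extract_entities(texts):
    # Send a batch of documents and return the predicted entities for each one
    response = requests.post(API_URL, json=texts)
    response.raise_for_status()
    return response.json()  # assumed: one {"document", "predictions"} object per input text

for result in extract_entities(["Attorney Jane Doe appeared before the Supreme Court of Florida."]):
    for prediction in result["predictions"]:
        print(f'{prediction["label"]}: "{prediction["text"]}" '
              f'(p={prediction["probability"]:.2f}, chars {prediction["from"]}-{prediction["to"]})')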

We can then deploy our model on a server. To do this, we first export the model via the “Export Model” button and then create the API on the server of our choice using a Docker image.

The nice thing is that the model is less than 1 GB, which means we can use almost any cheap CPU server to host it. Also, note that this deployment is independent of NuMind: no vendor lock-in!

Of course, deploying on your own server requires a bit of work, which is why we are also working on a production cloud for those who do not have strong privacy or confidentiality requirements.

NuMind Tasks

We demonstrated an example of the NER task, but you can also use NuMind for classification, which means assigning a label to a document:

Classification example.

Classification can be useful for detecting sentiment, moderating content, or identifying the topic of a document. We also created open-source foundation models for these applications, which NuMind uses.

In some cases, we want to be able to assign more than one label to each document, which can be done via the multi-label classification task:

Multi-label classification example.

In all these cases, the teaching procedure is the same: you iteratively correct the model until it “gets it”.

At the moment, these are the tasks you can use NuMind for. There is, however, an extra task that we should release soon: Structured Extraction. This task lets you extract arbitrarily complex information from text and turn it into structured data:

Structured Extraction example.

Structured Extraction is the most generic information extraction task and lets you do pretty much anything you want. We recently released the foundation model for this task, called NuExtract, and are working hard to implement the full task in NuMind.

Try it out!

Alright, that's pretty much what NuMind is and how you can use it. Feel free to try it for yourself and to give us feedback on our Discord server. We hope you will find it useful 🙂!
