At the highest level, NuMind is a tool to teach an AI how to perform a task automatically. By “teaching”, we essentially mean three things: telling the AI what to do, showing it what to do, and correcting its mistakes.
This is the core of how humans teach each other and is very effective. Note that there are other efficient human-to-human teaching modalities, such as giving explanations, but we decided not to include them for now (they are harder to handle from a product perspective!).
NuMind currently deals with information extraction tasks such as Classification, Named Entity Recognition (NER), and Structured Extraction. We will extend it to all kinds of NLP tasks later on.
NuMind is industry agnostic. You can use it to extract information from medical reports, legal documents, financial documents, social media posts, chat messages, and so on. Also, NuMind is multilingual.
Ok, let’s now understand what this “AI teaching” is by creating an NER model from scratch.
To create an NER model, we first need to download NuMind on a desktop/laptop. No need for GPUs: NER models are small and run just fine on CPUs. We can then start a project and select the Entity Detection task:
We then import the documents we want to extract information from. A few hundred documents are typically enough. These documents are stored locally on the computer and will not be shared.
We can now start to teach the AI. The first step is to “tell the AI what to do”:
We can describe the project, create classes, add keywords, etc. This part is optional and is, in fact, not so useful for NER (unlike for generative tasks).
The next step is to “show the AI what to do”, which means demonstrating the task by annotating a first document:
Here we create the entity types “Lawyer”, “Tribunal”, and “Party” via the text field, and click-and-drag the text to annotate it. After doing so, we click on the “validate” button and three things happen, almost instantaneously: a model is fine-tuned on this annotation, a new document is selected for us, and that document is pre-annotated by the model.
Note that this procedure happens on the computer. Again, the data is not shared. Here is what the next document looks like after this process:
The dashed borders show the pre-annotations of the model. It correctly identifies the lawyer and the tribunal but fails to identify the parties involved. If we select one of the missing parties, we can see the probabilistic belief of the model about the entity type:
In this case, the model believes that it might be a tribunal with a 25% chance, or a party with a 2% chance. It is clearly wrong since it is a party. So we correct the model by selecting “Party” and clicking “validate.” Again, a new model is trained via fine-tuning and a document is selected for us to correct and improve the AI further.
As we repeat this “correcting the AI’s mistakes” process a few times, we see the model improving, and we quickly only have to make minor corrections, or no corrections at all. Here is the teaching panel once we reach the 20th document:
We can see that the model is doing the task perfectly for this document, and that the performance plot on the bottom right looks pretty good. Such performance after a rather small number of corrections is essentially due to three things:
Here is a depiction of these aspects as part of NuMind’s teaching workflow:
This workflow (which we sometimes call “Interactive AI Development”) makes it possible to obtain good performance with a small number of corrections, and also lets us see the model’s performance in real time so we can take important project decisions early on.
Overall, we typically find that after 10 corrections, the custom model is better than a well-prompted GPT-3.5 (including examples in the prompt), and after 100 corrections, it is better than a well-prompted GPT-4. The interesting thing is that this custom model has about 100M parameters, at least 10,000x fewer than GPT-4. This makes it possible to process large volumes of documents at an exceedingly low inference cost and to keep these documents private.
What we just demonstrated is the core of how we teach an AI with NuMind, but there is more to it. One important thing is to review annotated documents to find instances where we disagree with the model. For example, let’s look at the 16th document:
The colored frames indicate our annotations, while the colored backgrounds indicate the model’s predictions (obtained via cross-validation). That way, we can quickly see whether the model agrees with us. Here, there is agreement everywhere except for the name of the lawyer, “Morgan”, which the model incorrectly believes does not belong to the “Lawyer” type. In this case, there is nothing to do since we were correct. Sometimes, it is the model that is correct, like in the 10th document, where we forgot to annotate a party involved:
This review process is effective at improving the model, and it can be done efficiently by entering the “Review disagreements” mode which will only show you documents for which you and the model disagree. It is also useful if you import data that is already annotated and you want to clean it.
Another thing we can do to improve the model is to actively look for its weaknesses via the model playground. This playground is an editable text field that is continuously updated with model predictions. Here we altered a document until the model could not recognize the lawyer anymore:
To do so, we just had to remove the word “Attorney” and modify the name. Arguably, the model should still figure out that it is a lawyer from the context. Since we managed to break the model, we click “Add to dataset” for the model to learn from this new document.
This interactive debugging is a nice way to understand how the model works, and it is effective at making the model more robust.
We already have an idea of how good the model is from correcting it. To go further, we can look at the performance report:
The most important metric to look at is the F-score, which, ideally, should be close to 1. Each class has its own F-score. We can see that “Party” and “Tribunal” have great F-scores, 0.98 and 1 respectively. However, “Lawyer” is not so great at the moment, at only 0.86. This performance should increase as we continue to correct the model, but if it does not, it would be good to figure out what is going on. Sometimes, poor performance is caused by inconsistent annotations. Sometimes it is caused by a bad ontology. Sometimes it is just because the task is difficult.
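As a reminder, the F-score of a class is the harmonic mean of its precision and recall over the entities of that class. Here is a minimal sketch of how such a per-class score is typically computed from entity counts; the counts are made up for illustration and this is not NuMind’s internal code:

def f_score(tp: int, fp: int, fn: int) -> float:
    # tp: entities found correctly, fp: spurious entities, fn: missed entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for the "Lawyer" class: 19 correct, 2 spurious, 4 missed
print(round(f_score(tp=19, fp=2, fn=4), 2))  # 0.86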
Note that this performance report is computed via cross-validation: models are trained on some documents and tested on others. That way, this performance should be representative of what we will obtain in production.
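To make the idea concrete, here is a minimal sketch of k-fold cross-validation, assuming hypothetical train and predict functions; every document ends up being scored by a model that never saw it during training:

import random

def cross_validate(documents, train, predict, k=5):
    # Shuffle and split the annotated documents into k folds.
    docs = list(documents)
    random.shuffle(docs)
    folds = [docs[i::k] for i in range(k)]
    results = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one (train/predict are hypothetical).
        train_docs = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = train(train_docs)
        for doc in held_out:
            results.append((doc, predict(model, doc)))
    # Comparing these predictions to the human annotations yields the report's metrics.
    return results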
Once we are happy with our model, it is time to use it!
In the simplest case, we have a bunch of documents that we need to process offline. We can easily do this via the “Process File” button.
In the most common case, we want to deploy our model on a server. We can do this by wrapping it into an API. We first create this API on our own machine to run integration tests locally:
This is a classic REST API that returns JSON containing predictions, probabilities, positions, and so on:
Entry points:
POST http://localhost:64166/entities
    data: a single string
    ex: curl -d "mytext" -X POST http://localhost:64166/entities
    or
    data: a JSON array of strings
    ex: curl -d '["text1", "text2"]' -X POST http://localhost:64166/entities
Response schema:
{
    "document": str,
    "predictions": [
        {
            "label": str,
            "text": str,
            "probability": double,
            "from": int,
            "to": int
        },
        ...
    ]
}
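For integration tests, this API can be called from any HTTP client. Here is a minimal Python sketch using the requests library; it assumes the port shown above (64166, assigned by NuMind when the local API is created), that “from”/“to” are character offsets, and that a batch request returns a JSON array of such objects:

import requests

API_URL = "http://localhost:64166/entities"  # adjust the port to match your local API

documents = ["text1", "text2"]  # a single string also works
response = requests.post(API_URL, json=documents)
response.raise_for_status()

for doc in response.json():  # assuming one result object per input document
    print(doc["document"])
    for pred in doc["predictions"]:
        # Assuming "from"/"to" are character offsets into the document.
        span = doc["document"][pred["from"]:pred["to"]]
        print(f'  {pred["label"]}: {span!r} (p={pred["probability"]:.2f})')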
We can then deploy our model on a server. To do this, we first export the model via the “Export Model” button, and create the API on the server of our choice using a Docker image.
The nice thing is that the model is less than 1 GB, which means we can use almost any cheap CPU server to host it. Also, note that this deployment is independent of NuMind: no vendor lock-in!
Of course, deploying on your own server requires a bit of work, which is why we are also working on a production cloud for those who do not have strong privacy/confidentiality requirements.
We demonstrated an example of the NER task, but you can also use NuMind for classification, which means assigning a label to a document:
Classification can be useful to detect sentiment, moderate content, or identify the topic. We also created open-source foundation models for these applications, which are used by NuMind.
In some cases, we want to be able to assign more than one label to each document, which can be done via the multi-label classification task:
In all these cases, the teaching procedure is the same: you iteratively correct the model until it “gets it”.
At the moment, these are the tasks you can use NuMind for. There is, however, an extra task that we should release soon: Structured Extraction. This task lets you extract arbitrarily complex information from text and turn it into structured data:
Structured Extraction is the most generic information extraction task and lets you do pretty much anything you want. We recently released the foundation model for this task, called NuExtract, and are working hard to implement the full task into NuMind.
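To give an idea of what “structured data” means here, the sketch below shows the kind of nested output one might want to extract from a court decision; the field names and values are purely illustrative and do not reflect an actual NuMind or NuExtract schema:

# Hypothetical output of a Structured Extraction task on a court decision.
extraction = {
    "tribunal": "Superior Court of California",
    "decision_date": "2023-05-12",
    "parties": [
        {"name": "Acme Corp.", "role": "plaintiff", "lawyer": "J. Morgan"},
        {"name": "John Doe", "role": "defendant", "lawyer": "A. Smith"},
    ],
}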
Alright, that's pretty much what NuMind is and how you can use it. Feel free to try it for yourself and to give us feedback on our Discord server. We hope you will find it useful 🙂!