What is the proper format for uploading a multi-label, multi-class classification dataset with text and label columns in Doccano?


I'd like to upload a dataset to my Doccano annotation project, in which 8 label classes have already been defined with tags.
What is the correct CSV or JSON upload format for a multi-label classification dataset with a text column and a label column?
For example, I have 8 classes (a, b, c, ..., h).
When I upload a file in this format:

| text   | label     |
| ------ | --------- |
| text_1 | [a, b]    |
| text_2 | [a, b ,c] |
| text_3 | [a, c]    |

For text_1, I expected only the labels a and b to be shown, yet it turns out to be shown as [a, b].

Another example with screenshot.
0-7 are the classes defined in my project. In this case, I expected only the labels tagged 5 and 6 to be marked. However, it returns a long, mixed-up list of labels. How should I modify the format of my upload dataset to fix this?
enter image description here


There is 1 best solution below

Weber Huang On

I found a solution.
The project contained many incorrect labels because I initially uploaded the label column in the wrong format, the string "[a, b]" (an array is required), and those labels were stored in the project. Such incorrect labels may interfere with subsequent uploads.

My debugging steps:
  1. Delete all labels in label management.
  2. Re-create the labels with tags.
  3. Re-upload the file in JSON format; after that, it works.

Now the annotation looks fine:
enter image description here
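
For anyone preparing the re-upload, here is a minimal sketch of converting rows like the ones in the question into JSON Lines. It assumes that Doccano's text classification import accepts one JSON object per line with a `text` key and a `label` key holding an array of label names; the exact field names may depend on your Doccano version and import settings. The helper names `parse_labels` and `rows_to_jsonl` are hypothetical, not part of Doccano.

```python
import json


def parse_labels(raw):
    """Turn a string such as "[a, b ,c]" into the list ["a", "b", "c"]."""
    inner = raw.strip().strip("[]")
    return [part.strip() for part in inner.split(",") if part.strip()]


def rows_to_jsonl(rows):
    """Serialize (text, raw_label) pairs as one JSON object per line.

    Assumption: the importer expects "text" and "label" keys,
    with "label" as a JSON array of label names.
    """
    return "\n".join(
        json.dumps({"text": text, "label": parse_labels(raw)}, ensure_ascii=False)
        for text, raw in rows
    )


# Sample rows taken from the table in the question.
rows = [("text_1", "[a, b]"), ("text_2", "[a, b ,c]"), ("text_3", "[a, c]")]
jsonl = rows_to_jsonl(rows)
print(jsonl)
```

Writing `jsonl` to a file and uploading it gives each text a real array of labels instead of a single string such as "[a, b]", which is what caused the stray labels in the first place.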