What is the proper format for uploading a multi-label, multi-class classification dataset with text and labels to Doccano?

Asked by Weber Huang on 16 June 2022 at 09:03 (429 views)

I'd like to upload datasets to my Doccano annotation project, in which eight label classes with tags have already been set up beforehand.
What is the correct CSV or JSON upload format for a multi-label classification dataset with a text column and a label column?
For example, I have 8 classes (a, b, c, ..., h).
When I upload a file in this format:

| text   | label     |
| ------ | --------- |
| text_1 | [a, b]    |
| text_2 | [a, b, c] |
| text_3 | [a, c]    |

I expect text_1 to show only the labels a and b, but instead it shows a single label that literally reads [a, b].

Another example: 0-7 are the classes defined in my project, and in this case I expect only the labels tagged 5 and 6 to be marked. Instead, a long list of mixed-up labels appears. How should I change the format of my upload file to fix this?


Answer by Weber Huang on 16 June 2022 at 09:52

I found a solution.
The project contained many incorrect labels because, at the beginning, I uploaded the label column in the wrong format: as the string "[a, b]" rather than as an array. These malformed labels were stored in the project, and they can mess up subsequent uploads.
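The fix described above comes down to sending each row's labels as a real array instead of a string that merely looks like one. Below is a minimal sketch that converts rows like the ones in the question's table into JSON Lines, assuming Doccano's text classification import accepts one JSON object per line with a `text` key and a `label` key holding a list (the key names may be configurable in the import dialog of your Doccano version, so check it). The helper names `parse_label_cell` and `rows_to_jsonl` are hypothetical, written for this illustration.

```python
import json


def parse_label_cell(cell: str) -> list[str]:
    """Turn a cell such as "[a, b, c]" into ["a", "b", "c"].

    Assumption: labels are bare names separated by commas and wrapped in
    square brackets, as in the question's table.
    """
    inner = cell.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # Split on commas, trim whitespace, and drop empty pieces.
    return [part.strip() for part in inner.split(",") if part.strip()]


def rows_to_jsonl(rows: list[tuple[str, str]]) -> str:
    """Serialize (text, label_cell) pairs as one JSON object per line."""
    lines = []
    for text, label_cell in rows:
        # "label" is a JSON array, not the string "[a, b]".
        record = {"text": text, "label": parse_label_cell(label_cell)}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [
        ("text_1", "[a, b]"),
        ("text_2", "[a, b, c]"),
        ("text_3", "[a, c]"),
    ]
    print(rows_to_jsonl(rows), end="")
```

Running it prints one object per line, e.g. `{"text": "text_1", "label": ["a", "b"]}`. To produce an upload file, the returned string can be written to a file such as `dataset.jsonl` (a hypothetical name); the rows themselves could come from `csv.DictReader` over the original CSV.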

My debugging steps:

1. Delete all labels in Label Management.
2. Re-create the labels with their tags.
3. Re-upload the file in JSON format. This time it works.

Now the annotations are displayed correctly.
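Because the answer suggests that labels not matching the predefined ones end up stored in the project, it may help to check a file before re-uploading it. The following is a minimal sketch, assuming the same JSON Lines layout (`text` plus a `label` list) and a hypothetical `PROJECT_LABELS` set standing in for the eight classes a to h; it reports any label value that is not a list or names a class the project does not define.

```python
import json

# Hypothetical: the eight class names already defined in the Doccano project.
PROJECT_LABELS = {"a", "b", "c", "d", "e", "f", "g", "h"}


def find_unknown_labels(jsonl_text: str, allowed: set[str]) -> list[tuple[int, object]]:
    """Return (line_number, bad_value) pairs for label values that would not
    match an existing project label."""
    problems = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        labels = record.get("label")
        if not isinstance(labels, list):
            # A string such as "[a, b]" is the malformed case from the question.
            problems.append((lineno, labels))
            continue
        for label in labels:
            if not isinstance(label, str) or label not in allowed:
                problems.append((lineno, label))
    return problems


if __name__ == "__main__":
    sample = (
        '{"text": "text_1", "label": ["a", "b"]}\n'
        '{"text": "text_2", "label": "[a, b]"}\n'
        '{"text": "text_3", "label": ["a", "z"]}\n'
    )
    print(find_unknown_labels(sample, PROJECT_LABELS))
```

For the sample above, line 2 is flagged because its label is a string and line 3 because `z` is not one of the defined classes; an empty result means every label should map onto an existing tag.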
