I'm training BertForSequenceClassification for a binary classification task. My labels are 'contains adverse effect' (1) and 'does not contain adverse effect' (0). The dataset contains all of the 1s first and then the 0s (the data isn't shuffled). For training I've shuffled my data, and I get logits out of the model. From what I've understood, the logits are the raw, unnormalized scores before softmax. An example logit pair is [-4.673831, 4.7095485]. Does the first value correspond to label 1 (contains AE), because the 1s appear first in the dataset, or to label 0? Any help would be appreciated, thanks.
How does the BERT model select the label ordering?
The first value corresponds to label 0 and the second value corresponds to label 1. What BertForSequenceClassification does is feed the output of the pooler to a linear layer (after a dropout, which I will ignore in this answer). Let's look at the following example:
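A minimal sketch, assuming bert-base-uncased; the input sentence is just an illustrative placeholder:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Placeholder input sentence.
inputs = tokenizer("paracetamol gives me a headache", return_tensors="pt")
outputs = model(**inputs)

pooled_output = outputs.pooler_output
print(pooled_output.shape)
```

Output:

```
torch.Size([1, 768])
```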
The pooled_output is a tensor of shape [batch_size, hidden_size] and represents the contextualized (i.e. attention was applied) [CLS] token of your input sequences. This tensor is fed to a linear layer to calculate the logits for your sequence. When we normalize these logits with softmax, we can see which label the linear layer currently predicts; the exact values will differ from run to run, since the linear layer is initialized randomly:
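A sketch of that head, continuing from the snippet above; the sizes match bert-base (hidden size 768, two labels), but the weights here are freshly initialized rather than fine-tuned:

```python
# Classification head: a linear layer mapping [1, 768] -> [1, 2].
classifier = torch.nn.Linear(768, 2)

logits = classifier(pooled_output)
print(logits)                         # two raw scores, one per label
print(torch.softmax(logits, dim=1))   # normalized per-label probabilities
```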
The linear layer applies a linear transformation, y = xA^T + b, and you can already see that the linear layer is not aware of your labels. It 'only' has a weight matrix of shape [2, 768] to produce logits of shape [1, 2]: the first row of the weight matrix produces the first value and the second row the second.
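To make that concrete, here is the same computation done by hand with the classifier from the sketch above:

```python
# Recompute the logits manually via y = x @ A^T + b.
weight = classifier.weight  # shape [2, 768]
bias = classifier.bias      # shape [2]

print(pooled_output @ weight.T + bias)      # identical to classifier(pooled_output)
print(pooled_output @ weight[0] + bias[0])  # first logit  -> label 0
print(pooled_output @ weight[1] + bias[1])  # second logit -> label 1
```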
The BertForSequenceClassification model learns by applying a CrossEntropyLoss. This loss function produces a small loss when the logits deviate only slightly from the expectation for the true label, and a large loss otherwise. That means the CrossEntropyLoss is what lets your model learn that the first logit should be high when the input does not contain an adverse effect and small when it does contain an adverse effect. You can check this for our example with the following:
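A quick check with the sketch's logits from above:

```python
# The label whose logit is larger yields the smaller loss.
loss_fct = torch.nn.CrossEntropyLoss()

print(loss_fct(logits, torch.tensor([0])))  # loss if the true label were 0
print(loss_fct(logits, torch.tensor([1])))  # loss if the true label were 1
```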