I am fine-tuning a pretrained RoBERTa model for a custom named entity recognition task, using the `TrainingArguments` and `Trainer` classes for training. The input files (sentences and labels) are read with `encoding="UTF-8"`. Even so, `trainer.train()` fails with the error `UnicodeEncodeError: 'charmap' codec can't encode characters in position 5063-5066: character maps to <undefined>`.
Training then stops completely, after showing the following traceback: `
ConnectionResetError Traceback (most recent call last)
File ~\anaconda3\Lib\site-packages\wandb\sdk\wandb_init.py:438, in _WandbInit._pause_backend(self, *args, **kwargs)
436 if self.backend.interface is not None:
437 logger.info("pausing backend") # type: ignore
--> 438 self.backend.interface.publish_pause()
File ~\anaconda3\Lib\site-packages\wandb\sdk\interface\interface.py:656, in InterfaceBase.publish_pause(self)
654 def publish_pause(self) -> None:
655 pause = pb.PauseRequest()
--> 656 self._publish_pause(pause)
File ~\anaconda3\Lib\site-packages\wandb\sdk\interface\interface_shared.py:355, in InterfaceShared._publish_pause(self, pause)
353 def _publish_pause(self, pause: pb.PauseRequest) -> None:
354 rec = self._make_request(pause=pause)
--> 355 self._publish(rec)
File ~\anaconda3\Lib\site-packages\wandb\sdk\interface\interface_sock.py:51, in InterfaceSock._publish(self, record, local)
49 def _publish(self, record: "pb.Record", local: Optional[bool] = None) -> None:
50 self._assign(record)
---> 51 self._sock_client.send_record_publish(record)
File ~\anaconda3\Lib\site-packages\wandb\sdk\lib\sock_client.py:221, in SockClient.send_record_publish(self, record)
219 server_req = spb.ServerRequest()
220 server_req.record_publish.CopyFrom(record)
--> 221 self.send_server_request(server_req)
File ~\anaconda3\Lib\site-packages\wandb\sdk\lib\sock_client.py:155, in SockClient.send_server_request(self, msg)
154 def send_server_request(self, msg: Any) -> None:
--> 155 self._send_message(msg)
File ~\anaconda3\Lib\site-packages\wandb\sdk\lib\sock_client.py:152, in SockClient._send_message(self, msg)
150 header = struct.pack("<BI", ord("W"), raw_size)
151 with self._lock:
--> 152 self._sendall_with_error_handle(header + data)
File ~\anaconda3\Lib\site-packages\wandb\sdk\lib\sock_client.py:130, in SockClient._sendall_with_error_handle(self, data)
128 start_time = time.monotonic()
129 try:
--> 130 sent = self._sock.send(data)
131 # sent equal to 0 indicates a closed socket
132 if sent == 0:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host`
I tried the character encodings 'UTF-8' and 'cp437', but neither works. Every time, the code stops after showing the error mentioned above.
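For context, a `UnicodeEncodeError` is raised when text is converted to bytes, that is, when something is written out rather than read in. The following is a minimal sketch of that behaviour, not the code from this project: the sample string and the `try_encode` helper are hypothetical, and 'cp1252' is included only as an assumed example of a Windows code page. It also prints the encodings Python reports for its output stream and locale, which may differ from the one passed when opening the input files.

```python
import locale
import sys


def try_encode(text: str, encoding: str) -> str:
    """Return 'ok' if text encodes cleanly, otherwise the error message."""
    try:
        text.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return str(exc)


# Hypothetical sample containing characters outside the Windows code pages.
sample = "na\u00efve caf\u00e9 \u4e2d\u6587"

# Compare UTF-8 with two 'charmap'-based code pages.
for enc in ("utf-8", "cp1252", "cp437"):
    print(enc, "->", try_encode(sample, enc))

# Encodings used for output, which the encoding argument of open() does not change.
# In a notebook, sys.stdout may be a wrapper object, hence the getattr.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))
print("preferred encoding:", locale.getpreferredencoding(False))
```

If the last two lines report a code page rather than UTF-8, the failing write is probably going through that default and not through the files opened with an explicit encoding, although the traceback above does not confirm where the write happens.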
I am new to wandb, and I don't know what to do next. Any thoughts or suggestions?
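One way to check whether the wandb connection is involved at all is to run training once with logging switched off. This is a minimal sketch under assumptions: the `output_dir` value is hypothetical, and whether disabling wandb makes either error go away is untested here. `WANDB_MODE` is an environment variable read by wandb, and `report_to` is a `TrainingArguments` parameter.

```python
import os

# Must be set before wandb is imported or initialised in the session.
os.environ["WANDB_MODE"] = "disabled"

# Keyword arguments intended for transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "ner-output",  # hypothetical path
    "report_to": "none",         # do not report to any logging integration
}

# With transformers installed, these would be used as:
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)
print(training_kwargs)
```

If training completes with logging disabled, the problem is likely on the logging side; if the `UnicodeEncodeError` still appears, it comes from somewhere else in the pipeline.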