Running Colab model locally - "OOM detection from /sys/fs/cgroup/memory.events is not supported in this environment."

All, I've tried to run my CNN model on a local Google Colab runtime, following "Option 1. Colab Docker runtime image" at https://research.google.com/colaboratory/local-runtimes.html. The problem is that the model takes nearly 8 hours to compile and train, and as you know the timeout on Colab's free account is 90 minutes. I got 2 errors while running the commands through PowerShell and WSL.

I thought it was an issue related to the NVIDIA CUDA drivers; however, everything is up to date (Windows and WSL Ubuntu 20.04), and I still cannot run it, even through CMD. Can anyone help?

These are my Windows 11 PC settings:

PS C:\Users\MyUSER> systeminfo | findstr /B /C:"OS Name" /B /C:"OS Version"
OS Name:                   Microsoft Windows 11 Pro
OS Version:                10.0.22631 N/A Build 22631

PS C:\Users\MyUSER> docker version
Client: Docker Engine - Community
Cloud integration: 1.0.7
Version:           20.10.2
API version:       1.41
Go version:        go1.13.15
Git commit:        2291f61
Built:             Mon Dec 28 16:14:16 2020
OS/Arch:           windows/amd64
Context:           default
Experimental:      true

Server: Docker Engine - Community
Engine:
Version:          20.10.2
API version:      1.41 (minimum version 1.12)
Go version:       go1.13.15
Git commit:       8891c58
Built:            Mon Dec 28 16:15:28 2020
OS/Arch:          linux/amd64
Experimental:     false
containerd:
Version:          1.4.3
GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version:          1.0.0-rc92
GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version:          0.19.0
GitCommit:        de40ad0

PS C:\Users\MYUSER> nvidia-smi
Sun Mar 17 12:21:48 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76                 Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060      WDDM  |   00000000:07:00.0  On |                  N/A |
|  0%   47C    P8             14W /  168W |    1561MiB /   6144MiB |     17%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1984    C+G   ...ces\Razer Central\Razer Central.exe      N/A      |
|    0   N/A  N/A      2448    C+G   ...es\Docker\Docker\Docker Desktop.exe      N/A      |
|    0   N/A  N/A      7728    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A      8832    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A      8856    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     12880    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     15232    C+G   ...les\Microsoft OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     15416    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A      |
|    0   N/A  N/A     16844    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A      |
|    0   N/A  N/A     18024    C+G   ...US\ArmouryDevice\asus_framework.exe      N/A      |
|    0   N/A  N/A     20008    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     20392    C+G   ...les\Microsoft OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     22912    C+G   ...on\122.0.2365.92\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     23692    C+G   ...l\Microsoft\Teams\current\Teams.exe      N/A      |
|    0   N/A  N/A     25144    C+G   ...l\Microsoft\Teams\current\Teams.exe      N/A      |
|    0   N/A  N/A     26272    C+G   ...m Files\Mozilla Firefox\firefox.exe      N/A      |
|    0   N/A  N/A     26436    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     26472    C+G   ...\cef\cef.win7x64\steamwebhelper.exe      N/A      |
|    0   N/A  N/A     26732    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
|    0   N/A  N/A     27920    C+G   ...m Files\Mozilla Firefox\firefox.exe      N/A      |
|    0   N/A  N/A     29656    C+G   ...ft Office\root\Office16\ONENOTE.EXE      N/A      |
|    0   N/A  N/A     29780    C+G   ...\Docker\frontend\Docker Desktop.exe      N/A      |
|    0   N/A  N/A     31416    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
+-----------------------------------------------------------------------------------------+
PS C:\Users\MyU>

And the error output:

\`
PS C:\Users\MYUSER> docker run -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
OOM detection from /sys/fs/cgroup/memory.events is not supported in this environment.
OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 2 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 3 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 4 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 6 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 7 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 8 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 9 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 11 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/google/colab/__init__.py", line 20, in <module>
    from google.colab import _reprs
  File "/usr/local/lib/python3.10/dist-packages/google/colab/_reprs.py", line 13, in <module>
    import numpy as np
  File "/usr/local/lib/python3.10/dist-packages/numpy/__init__.py", line 140, in <module>
    from . import core
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
KeyboardInterrupt

PS C:\Users\MYUSER> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
PS C:\Users\MUYSER> docker run --gpus=all -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.2, please update your driver to a newer version, or use an earlier cuda container: unknown
\`
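
The `--gpus=all` error asks for `cuda>=12.2`, while the `nvidia-smi` header above reports CUDA Version 12.4. A minimal sketch of that comparison follows; the two strings are copied from the output in this post, and the helper names are hypothetical. If the driver already satisfies the stated requirement, the check may be failing somewhere other than the driver, for example in the NVIDIA container tooling bundled with this fairly old Docker Desktop build, though that is an unconfirmed guess.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '12.4' into (12, 4) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def driver_cuda_version(nvidia_smi_output):
    """Extract the 'CUDA Version' field from nvidia-smi header text (hypothetical helper)."""
    match = re.search(r"CUDA Version:\s*(\d+(?:\.\d+)*)", nvidia_smi_output)
    return parse_version(match.group(1)) if match else None


def required_cuda_version(error_text):
    """Extract the minimum version from a 'cuda>=X.Y' requirement error (hypothetical helper)."""
    match = re.search(r"cuda>=(\d+(?:\.\d+)*)", error_text)
    return parse_version(match.group(1)) if match else None


# Both strings are taken from the console output quoted in this question.
smi_header = "| NVIDIA-SMI 551.76   Driver Version: 551.61   CUDA Version: 12.4 |"
error_text = "nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.2"

driver = driver_cuda_version(smi_header)
required = required_cuda_version(error_text)
driver_is_new_enough = driver >= required

print(driver, required, driver_is_new_enough)
```

Here the driver-reported version is not lower than the required one, which is consistent with the observation that updating the drivers did not help.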

I also have WSL with Ubuntu 20.04:

\`
MYUSER@ME:~$ uname -r
5.10.102.1-microsoft-standard-WSL2
MYUSER@ME:~$ docker version
Client: Docker Engine - Community
Cloud integration: 1.0.7
Version:           20.10.2
API version:       1.41
Go version:        go1.13.15
Git commit:        2291f61
Built:             Mon Dec 28 16:17:34 2020
OS/Arch:           linux/amd64
Context:           default
Experimental:      true

Server: Docker Engine - Community
Engine:
Version:          20.10.2
API version:      1.41 (minimum version 1.12)
Go version:       go1.13.15
Git commit:       8891c58
Built:            Mon Dec 28 16:15:28 2020
OS/Arch:          linux/amd64
Experimental:     false
containerd:
Version:          1.4.3
GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version:          1.0.0-rc92
GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version:          0.19.0
GitCommit:        de40ad0
MYUSER@ME:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sde        251G   12G  227G   5% /
none            7.8G   23M  7.8G   1% /mnt/wslg
none            7.8G  399M  7.5G   5% /mnt/wsl
/dev/sdd        251G   31G  208G  13% /mnt/wsl/docker-desktop-data/isocache
none            7.8G  8.0K  7.8G   1% /mnt/wsl/docker-desktop/shared-sockets/host-services
/dev/sdc        251G  126M  239G   1% /mnt/wsl/docker-desktop/docker-desktop-proxy
/dev/loop0      384M  384M     0 100% /mnt/wsl/docker-desktop/cli-tools
tools           931G  575G  357G  62% /init
none            7.8G     0  7.8G   0% /dev
none            7.8G  8.0K  7.8G   1% /run
none            7.8G     0  7.8G   0% /run/lock
none            7.8G     0  7.8G   0% /run/shm
none            7.8G     0  7.8G   0% /run/user
tmpfs           7.8G     0  7.8G   0% /sys/fs/cgroup
none            7.8G   76K  7.8G   1% /mnt/wslg/versions.txt
none            7.8G   76K  7.8G   1% /mnt/wslg/doc
drivers         931G  575G  357G  62% /usr/lib/wsl/drivers
lib             931G  575G  357G  62% /usr/lib/wsl/lib
drvfs           931G  575G  357G  62% /mnt/c
drvfs           231G  212G   20G  92% /mnt/d
drvfs           932G  856G   76G  92% /mnt/g
MYUSER@ME:~$ lspci | grep -i nvidia
MYUSER@ME:~$ uname -r
5.10.102.1-microsoft-standard-WSL2
MYUSER@ME:~$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
MYUSER@ME:~$ nvidia-smi
Sun Mar 17 12:21:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01              Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060        On  |   00000000:07:00.0  On |                  N/A |
|  0%   46C    P8             14W /  168W |    1564MiB /   6144MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
MYUSR@ME:~$
\`

And the error output looks similar:

MYSUER@ME:~$ docker run --gpus=all -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.2, please update your driver to a newer version, or use an earlier cuda container: unknown.
ERRO[0002] error waiting for container: context canceled
MYSUER@ME:~$ docker run -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
OOM detection from /sys/fs/cgroup/memory.events is not supported in this environment.
OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 2 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 3 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 4 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 6 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 7 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 8 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 9 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 11 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/google/colab/__init__.py", line 20, in <module>
    from google.colab import _reprs
  File "/usr/local/lib/python3.10/dist-packages/google/colab/_reprs.py", line 13, in <module>
    import numpy as np
  File "/usr/local/lib/python3.10/dist-packages/numpy/__init__.py", line 140, in <module>
    from . import core
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
KeyboardInterrupt
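
The `pthread_create failed ... Operation not permitted` lines may be tied to the age of the Docker engine rather than to CUDA. As far as I understand, engines older than 20.10.10 ship a default seccomp profile that does not allow the `clone3` syscall, which newer glibc-based images use when creating threads. The sketch below reads the server version out of `docker version` text and compares it with that threshold; the helper name, the sample text, and the `ASSUMED_MIN_ENGINE` constant are hypothetical, and the threshold itself is an assumption to verify against the Docker release notes.

```python
import re

# Assumption: first engine release whose default seccomp profile allows clone3.
ASSUMED_MIN_ENGINE = (20, 10, 10)


def server_engine_version(docker_version_output):
    """Return the server engine version as a tuple of ints, or None if not found."""
    # Look only at the text after "Server:" so the client version is skipped.
    server_part = docker_version_output.split("Server:", 1)[-1]
    match = re.search(r"Version:\s*(\d+)\.(\d+)\.(\d+)", server_part)
    return tuple(int(group) for group in match.groups()) if match else None


# Abbreviated from the `docker version` output shown in this question.
sample = """Client: Docker Engine - Community
Version:           20.10.2

Server: Docker Engine - Community
Engine:
Version:          20.10.2
"""

engine = server_engine_version(sample)
needs_upgrade = engine < ASSUMED_MIN_ENGINE

print(engine, needs_upgrade)
```

The versions are compared as integer tuples because a plain string comparison would rank "20.10.2" above "20.10.10". If the assumption holds, updating Docker Desktop would be the first thing to try.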