All, I've tried to run my CNN model on a local Google Colab runtime, following the steps at https://research.google.com/colaboratory/local-runtimes.html ("Option 1. Colab Docker runtime image"). The problem is that the model needs nearly 8 hours to compile and train, and as you know, the timeout on Colab's free account is 90 minutes. I got 2 errors while running the commands through PowerShell and WSL.
I thought it was an issue with the NVIDIA CUDA drivers, but everything is up to date (Windows and WSL Ubuntu 20.04) and I still cannot run it, not even through CMD. Can anyone help?
These are my Windows 11 PC settings:
PS C:\Users\MyUSER> systeminfo | findstr /B /C:"OS Name" /B /C:"OS Version"
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.22631 N/A Build 22631
PS C:\Users\MyUSER> docker version
Client: Docker Engine - Community
Cloud integration: 1.0.7
Version: 20.10.2
API version: 1.41
Go version: go1.13.15
Git commit: 2291f61
Built: Mon Dec 28 16:14:16 2020
OS/Arch: windows/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.2
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 8891c58
Built: Mon Dec 28 16:15:28 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
PS C:\Users\MYUSER> nvidia-smi
Sun Mar 17 12:21:48 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.76 Driver Version: 551.61 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2060 WDDM | 00000000:07:00.0 On | N/A |
| 0% 47C P8 14W / 168W | 1561MiB / 6144MiB | 17% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1984 C+G ...ces\Razer Central\Razer Central.exe N/A |
| 0 N/A N/A 2448 C+G ...es\Docker\Docker\Docker Desktop.exe N/A |
| 0 N/A N/A 7728 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 8832 C+G ...nt.CBS_cw5n1h2txyewy\SearchHost.exe N/A |
| 0 N/A N/A 8856 C+G ...2txyewy\StartMenuExperienceHost.exe N/A |
| 0 N/A N/A 12880 C+G ...5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 N/A N/A 15232 C+G ...les\Microsoft OneDrive\OneDrive.exe N/A |
| 0 N/A N/A 15416 C+G ...t.LockApp_cw5n1h2txyewy\LockApp.exe N/A |
| 0 N/A N/A 16844 C+G ...GeForce Experience\NVIDIA Share.exe N/A |
| 0 N/A N/A 18024 C+G ...US\ArmouryDevice\asus_framework.exe N/A |
| 0 N/A N/A 20008 C+G ...CBS_cw5n1h2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 20392 C+G ...les\Microsoft OneDrive\OneDrive.exe N/A |
| 0 N/A N/A 22912 C+G ...on\122.0.2365.92\msedgewebview2.exe N/A |
| 0 N/A N/A 23692 C+G ...l\Microsoft\Teams\current\Teams.exe N/A |
| 0 N/A N/A 25144 C+G ...l\Microsoft\Teams\current\Teams.exe N/A |
| 0 N/A N/A 26272 C+G ...m Files\Mozilla Firefox\firefox.exe N/A |
| 0 N/A N/A 26436 C+G ...crosoft\Edge\Application\msedge.exe N/A |
| 0 N/A N/A 26472 C+G ...\cef\cef.win7x64\steamwebhelper.exe N/A |
| 0 N/A N/A 26732 C+G ...__8wekyb3d8bbwe\WindowsTerminal.exe N/A |
| 0 N/A N/A 27920 C+G ...m Files\Mozilla Firefox\firefox.exe N/A |
| 0 N/A N/A 29656 C+G ...ft Office\root\Office16\ONENOTE.EXE N/A |
| 0 N/A N/A 29780 C+G ...\Docker\frontend\Docker Desktop.exe N/A |
| 0 N/A N/A 31416 C+G ...ekyb3d8bbwe\PhoneExperienceHost.exe N/A |
+-----------------------------------------------------------------------------------------+
PS C:\Users\MyU>
And the error output:
PS C:\Users\MYUSER> docker run -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
OOM detection from /sys/fs/cgroup/memory.events is not supported in this environment.
OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 2 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 3 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 4 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 6 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 7 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 8 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 9 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 11 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/google/colab/__init__.py", line 20, in <module>
    from google.colab import _reprs
  File "/usr/local/lib/python3.10/dist-packages/google/colab/_reprs.py", line 13, in <module>
    import numpy as np
  File "/usr/local/lib/python3.10/dist-packages/numpy/__init__.py", line 140, in <module>
    from . import core
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
KeyboardInterrupt
PS C:\Users\MYUSER> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
PS C:\Users\MUYSER> docker run --gpus=all -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.2, please update your driver to a newer version, or use an earlier cuda container: unknown
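The second error is puzzling because `nvidia-smi` reports CUDA 12.4 while the container hook complains about `cuda>=12.2`. A minimal sketch for checking the two numbers against each other, assuming bash with GNU `sort -V`; `version_ge` and `cuda_version_from_header` are hypothetical helper names, and the header line is pasted from the output above rather than read live:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when dotted version $1 is >= $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: pull the "CUDA Version" field out of an nvidia-smi header read from stdin.
cuda_version_from_header() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Header line copied from the nvidia-smi output above.
header='| NVIDIA-SMI 551.76 Driver Version: 551.61 CUDA Version: 12.4 |'
reported="$(printf '%s\n' "$header" | cuda_version_from_header)"
required="12.2"   # taken from the "cuda>=12.2" condition in the error message

if version_ge "$reported" "$required"; then
  echo "driver reports CUDA $reported, which satisfies >= $required"
else
  echo "driver reports CUDA $reported, which is below $required"
fi
```

On a live system the header could be piped in from `nvidia-smi` itself instead of being pasted. With the values above the comparison passes, so the driver's reported CUDA version alone does not seem to explain the failed requirement.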
I also have WSL with Ubuntu 20.04:
MYUSER@ME:~$ uname -r
5.10.102.1-microsoft-standard-WSL2
MYUSER@ME:~$ docker version
Client: Docker Engine - Community
Cloud integration: 1.0.7
Version: 20.10.2
API version: 1.41
Go version: go1.13.15
Git commit: 2291f61
Built: Mon Dec 28 16:17:34 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.2
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 8891c58
Built: Mon Dec 28 16:15:28 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
MYUSER@ME:~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sde 251G 12G 227G 5% /
none 7.8G 23M 7.8G 1% /mnt/wslg
none 7.8G 399M 7.5G 5% /mnt/wsl
/dev/sdd 251G 31G 208G 13% /mnt/wsl/docker-desktop-data/isocache
none 7.8G 8.0K 7.8G 1% /mnt/wsl/docker-desktop/shared-sockets/host-services
/dev/sdc 251G 126M 239G 1% /mnt/wsl/docker-desktop/docker-desktop-proxy
/dev/loop0 384M 384M 0 100% /mnt/wsl/docker-desktop/cli-tools
tools 931G 575G 357G 62% /init
none 7.8G 0 7.8G 0% /dev
none 7.8G 8.0K 7.8G 1% /run
none 7.8G 0 7.8G 0% /run/lock
none 7.8G 0 7.8G 0% /run/shm
none 7.8G 0 7.8G 0% /run/user
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
none 7.8G 76K 7.8G 1% /mnt/wslg/versions.txt
none 7.8G 76K 7.8G 1% /mnt/wslg/doc
drivers 931G 575G 357G 62% /usr/lib/wsl/drivers
lib 931G 575G 357G 62% /usr/lib/wsl/lib
drvfs 931G 575G 357G 62% /mnt/c
drvfs 231G 212G 20G 92% /mnt/d
drvfs 932G 856G 76G 92% /mnt/g
MYUSER@ME:~$ lspci | grep -i nvidia
MYUSER@ME:~$ uname -r
5.10.102.1-microsoft-standard-WSL2
MYUSER@ME:~$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
MYUSER@ME:~$ nvidia-smi
Sun Mar 17 12:21:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.60.01 Driver Version: 551.61 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2060 On | 00000000:07:00.0 On | N/A |
| 0% 46C P8 14W / 168W | 1564MiB / 6144MiB | 11% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
MYUSR@ME:~$
And the error output looks similar:
MYSUER@ME:~$ docker run --gpus=all -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.2, please update your driver to a newer version, or use an earlier cuda container: unknown.
ERRO[0002] error waiting for container: context canceled
MYSUER@ME:~$ docker run -p 127.0.0.1:9000:8080 us-docker.pkg.dev/colab-images/public/runtime
OOM detection from /sys/fs/cgroup/memory.events is not supported in this environment.
OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 2 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 3 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 4 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 6 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 7 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 8 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 9 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
OpenBLAS blas_thread_init: pthread_create failed for thread 11 of 12: Operation not permitted
OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/google/colab/__init__.py", line 20, in <module>
    from google.colab import _reprs
  File "/usr/local/lib/python3.10/dist-packages/google/colab/_reprs.py", line 13, in <module>
    import numpy as np
  File "/usr/local/lib/python3.10/dist-packages/numpy/__init__.py", line 140, in <module>
    from . import core
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
KeyboardInterrupt
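The `pthread_create failed ... Operation not permitted` lines may be connected to the age of the Docker Engine shown above (20.10.2, built December 2020): Docker's 20.10.10 release notes mention adding `clone3` to the default seccomp policy for containers based on recent Ubuntu and Fedora images. Whether that is what is happening here is an assumption, not something the output proves. A minimal sketch of the version check, assuming bash with GNU `sort -V`; `version_ge` is a hypothetical helper and the installed version is pasted from the post rather than read live:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when dotted version $1 is >= $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Server Engine version copied from the `docker version` output above.
# On a live machine it could be read with: docker version --format '{{.Server.Version}}'
installed="20.10.2"

# Assumption: 20.10.10 is used as the threshold because its release notes
# mention clone3 support in the default seccomp policy.
minimum="20.10.10"

if version_ge "$installed" "$minimum"; then
  docker_status="ok"
else
  docker_status="older"
fi
echo "Docker Engine $installed vs $minimum: $docker_status"
```

With the values from this post the check reports the engine as older than the threshold, so updating Docker Desktop would be a cheap thing to rule out before looking further at the CUDA side.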