I installed ensembl-vep as instructed in the documentation:
cd $HOME/vep_data
curl -O https://ftp.ensembl.org/pub/release-110/variation/vep/homo_sapiens_vep_110_GRCh38.tar.gz
tar xzf homo_sapiens_vep_110_GRCh38.tar.gz
When I tried running the program as follows, the following error message was printed, even though the file does exist:
$ sudo docker run -v $HOME/vep_data:/data ensemblorg/ensembl-vep vep --cache --offline --format vcf --vcf --force_overwrite
--input_file input/1000123_23191_0_0.g.vcf --output_file output/my_output.vcf
-------------------- EXCEPTION --------------------
MSG: ERROR: File "input/1000123_23191_0_0.g.vcf" does not exist
STACK Bio::EnsEMBL::VEP::Parser::file /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Parser.pm:237
STACK Bio::EnsEMBL::VEP::Parser::new /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Parser.pm:131
STACK Bio::EnsEMBL::VEP::Runner::get_Parser /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:802
STACK Bio::EnsEMBL::VEP::Runner::get_InputBuffer /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:829
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:136
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/src/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:200
STACK toplevel /opt/vep/src/ensembl-vep/vep:46
Date (localtime) = Wed Aug 2 15:41:03 2023
Ensembl API version = 110
---------------------------------------------------
In your Docker container you have a completely different overlay filesystem, and therefore input/1000123_23191_0_0.g.vcf simply does not exist there. You first need to make every directory or drive available inside the container using volumes (-v), as you did with the downloaded dataset:
-v $HOME/vep_data:/data
One option would be to add
-v $(pwd)/input:/input
to have everything available inside your container. It's completely fine to have several -v parameters in your docker run command; just make sure they are placed between run and the Docker image name.
Edit: In this specific case the program assumes that you put everything into $HOME/vep_data. As you can see from the Dockerfile, the working directory in the container is /data. The example you are trying to reproduce assumes that there is, for example, a /data/input directory containing the file to be processed. So you ultimately have to put your files under $HOME/vep_data/input so that the relative paths used in the example work. The alternative is to use absolute paths and additional volumes as described above:
-v /path/to/input:/input .... --input_file /input/myfile.vcf
I personally prefer absolute paths when working with these containers, as I don't want to rely on some directory structure that might be expected (and might change over time with newer versions).
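The path mapping described above can be checked on the host before starting the container. The following is a minimal sketch, assuming a single bind mount of $HOME/vep_data to /data and a container working directory of /data, as described above. The names HOST_DIR, host_path and visible_in_container are hypothetical helpers for illustration, not part of VEP or Docker.

```shell
#!/bin/sh
# Illustrative pre-flight check (not part of VEP or Docker).
# Assumption: the container is started with -v "$HOST_DIR":/data
# and its working directory is /data.

HOST_DIR="${HOST_DIR:-$HOME/vep_data}"   # host side of the bind mount

# Print the host path that a path used inside the container maps to.
# Returns non-zero for absolute paths that lie outside the /data mount.
host_path() {
    case "$1" in
        /data/*) printf '%s/%s\n' "$HOST_DIR" "${1#/data/}" ;;  # absolute, inside the mount
        /*)      return 1 ;;                                     # absolute, outside the mount
        *)       printf '%s/%s\n' "$HOST_DIR" "$1" ;;            # relative to the working directory /data
    esac
}

# Succeed only if the file would be visible inside the container.
visible_in_container() {
    p=$(host_path "$1") && [ -f "$p" ]
}
```

Under these assumptions, visible_in_container input/1000123_23191_0_0.g.vcf only succeeds once the file has been placed under $HOME/vep_data/input, which is the same condition the relative --input_file path needs in order to resolve inside the container.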