NCBI blast+ from commandline not recognizing the blastn command


I've been trying to generate HCR probes using the script provided here, which should be a pretty straightforward, user-friendly method, and I'd like to get it ready for the whole lab to use. I have some experience in Python, but I have never used NCBI BLAST+ before.

I'm running into an issue when trying to BLAST the generated probes. For some reason it errors on the 'cline()' call. I've installed (via pip) and imported the cline package, but to no avail. Any ideas where the problem lies?

I haven't specified the location of the NCBI BLAST+ executable, which I've read on other pages might solve it, but I wouldn't know how to integrate it into such complex code. It is currently installed on my C:/ drive in Program Files.

Any suggestions are more than welcome !

I tried pip installing cline (which the script doesn't request) and providing the path to the executable as follows:

blastn = r"C:\Program Files\NCBI\blast-BLAST_VERSION+\bin\blastn.exe"

However, it is not my own code and it's quite complex, so I don't know the correct way to integrate it. I've read about similar issues, like this one, where it was fixed after specifying the blastn.exe path, but I've not been able to do this.

FYI, I'm on Windows, using Jupyter Notebook via Anaconda.

ApplicationError                          Traceback (most recent call last)
Cell In[1], line 16
     14 strt = start()
     15 name,fullseq,amplifier,pause,choose,polyAT,polyCG,BlastProbes,db,dropout,show,report,maxprobe,numbr = strt[0],strt[1],strt[2],strt[3],strt[4],strt[5],strt[6],strt[7],strt[8],strt[9],strt[10],strt[11],strt[12],strt[13]
---> 16 maker(name,fullseq,amplifier,pause,choose,polyAT,polyCG,BlastProbes,db,dropout,show,report,maxprobe,numbr)

File c:\Users\wilke\OneDrive - Hubrecht Institute\Jupyter notebooks\insitu_probe_generator-v.0.3.2\maker37cb.py:411, in maker(name, fullseq, amplifier, pause, choose, polyAT, polyCG, BlastProbes, db, dropout, show, report, maxprobe, numbr)
    408 ## Probe BLAST setup and execution from FASTA file prepared in previous step
    410     cline = bn(query = str(name)+"PrelimProbes.fa", subject = db, outfmt = 6, task = 'blastn-short') #this uses biopython's blastn formatting function and creates a commandline compatible command 
--> 411     stdout, stderr = cline() #cline() calls the string as a command and passes it to the command line, outputting the blast results to one variable and errors to the other
    413     ## From results of blast creating a numpy array (and Pandas database)
    414     dt = [(np.unicode_,8),(np.unicode_,40),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.float),(np.float)]

File ~\anaconda3\envs\hcr\lib\site-packages\Bio\Application\__init__.py:574, in AbstractCommandline.__call__(self, stdin, stdout, stderr, cwd, env)
    571     stderr_arg.close()
    573 if return_code:
--> 574     raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
    575 return stdout_str, stderr_str

ApplicationError: Non-zero return code 1 from 'blastn -outfmt 6 -query tbx18PrelimProbes.fa -subject "C:\\Users\\wilke\\OneDrive - Hubrecht Institute\\Jupyter notebooks\\insitu_probe_generator-v.0.3.2\\fastas\\Tbx18-cDNA.fa" -task blastn-short', message "'blastn' is not recognized as an internal or external command,"

Current code:

from start import start
from maker37cb import maker
import pandas as pd
from Bio.Seq import Seq
from Bio.Blast.Applications import NcbiblastnCommandline
import io
import numpy as np
import pandas as pd
import cline

blastn = r"C:\Program Files\NCBI\blast-BLAST_VERSION+\bin\blastn.exe"

    
strt = start()
name,fullseq,amplifier,pause,choose,polyAT,polyCG,BlastProbes,db,dropout,show,report,maxprobe,numbr = strt[0],strt[1],strt[2],strt[3],strt[4],strt[5],strt[6],strt[7],strt[8],strt[9],strt[10],strt[11],strt[12],strt[13]
maker(name,fullseq,amplifier,pause,choose,polyAT,polyCG,BlastProbes,db,dropout,show,report,maxprobe,numbr)

Update:

I tried modifying the source code to include the path to the executable, as described in the above-mentioned Biostars post. Still no luck: now the path is not recognized, and I still get an error.


There are 2 best solutions below

WilkeM On BEST ANSWER

Turns out, running the script from my OneDrive created this issue. A fresh install of the GitHub script to my C:/ drive ran just fine straight from the start.

Leaving it up in case anyone else runs into a similar issue. But to amplify @Wayne's tip in the comments: don't run things from OneDrive. :)

Spaces and unusual characters in paths are asking for trouble, and with OneDrive I've seen cases where things aren't actually where you think they are. It is a common cause of issues on the Jupyter Community Discourse Forum. However, it makes no sense that if you copy the file and place it next to the notebook, it cannot be seen. Can you list the files that are alongside the notebook by running ls or dir in the notebook? Also run pwd to see if the current working directory is really where you suspect it is. If it's not, you can use %cd to change it. – Wayne
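The checks suggested in that comment (pwd, ls/dir, %cd) can also be done from plain Python, which behaves the same on Windows and Linux. This is only a minimal sketch; the helper name describe_working_dir is made up for illustration and is not part of the probe generator.

```python
import os
from pathlib import Path


def describe_working_dir():
    """Return the current working directory and the sorted names of its entries."""
    cwd = Path.cwd()
    names = sorted(entry.name for entry in cwd.iterdir())
    return cwd, names


cwd, names = describe_working_dir()
print("Working directory:", cwd)
print("Entries next to the notebook:", names)

# To move elsewhere (the plain-Python counterpart of %cd), something like
# os.chdir(r"C:\some\folder") could be used; that path is a placeholder.
```

If the FASTA file or the script does not show up in the printed list, the notebook is not running where you think it is.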

Wayne On

Having a place where it works might help you sort out what your problems are when running it locally.

(As noted in my comments, my big concern is that putting in a path to the sequence to BLAST that has spaces in it [or is overly complex] is causing issues that manifest in ways that don't really point you at the problem. In this example, I put the sequence right alongside the notebook and script in the file hierarchy and refer to it simply.)
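To illustrate the concern about paths, here is a minimal sketch of calling blastn with an explicit executable path through subprocess, passing the arguments as a list so that spaces in paths need no shell quoting. This is not how the probe generator itself calls BLAST; the helper names are made up, the flags are the ones visible in the question's error message, and the executable path is the placeholder from the question.

```python
import subprocess


def build_blastn_command(blastn_exe, query, subject):
    """Return the blastn call as an argument list (no shell quoting needed)."""
    return [
        str(blastn_exe),
        "-query", str(query),
        "-subject", str(subject),
        "-outfmt", "6",
        "-task", "blastn-short",
    ]


def run_blastn(blastn_exe, query, subject):
    """Run blastn and return (stdout, stderr) as text; requires BLAST+ to be installed."""
    result = subprocess.run(
        build_blastn_command(blastn_exe, query, subject),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout, result.stderr


# Placeholder paths taken from the question; adjust them to your installation.
command = build_blastn_command(
    r"C:\Program Files\NCBI\blast-BLAST_VERSION+\bin\blastn.exe",
    "tbx18PrelimProbes.fa",
    "fastas/Tbx18-cDNA.fa",
)
print(command)
```

Only build_blastn_command is exercised here; run_blastn would be called the same way once the paths point at real files.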

Here is how I was able to successfully run the code:

  1. I went here and launched a session via MyBinder where all dependencies needed are already installed. Shortcut to reliable launch for now: press here.

  2. When the session started, I made a new cell under the list of 'Available notebooks' and ran the command !git clone https://github.com/rwnull/insitu_probe_generator.git. That gets the 'insitu_probe_generator' software.

  3. Then I right-clicked on the Jupyter logo in the upper left, above the opening notebook, and chose 'Open Link in New Window' to open a JupyterLab interface into the same session. (OPTIONAL: To make things easier for you, you can now click on the tile that says 'terminal' in the Launcher pane on the right side and, when it comes up, enter any command, such as ls. This will keep the session active longer even if your notebook kernel times out after 10 minutes of inactivity.)

  4. Then, in the file navigator panel on the left side in JupyterLab, I double-clicked on the 'insitu_probe_generator' folder in the list to go into the 'insitu_probe_generator' directory. In that directory I opened a new notebook and ran in it the command !curl -O http://sgd-archive.yeastgenome.org/sequence/S288C_reference/chromosomes/fasta/chrmt.fsa, which came from the first cell of the first notebook listed under 'Available notebooks'. That command gets chrmt.fsa. I'll use that as the file to BLAST the probes against. (Note that in JupyterLab, you can drag-and-drop from your local machine into the file navigation panel on the left. So you can later test with your own sequence files.)

  5. Then I opened maker37cb.py and edited line 414 to read the following:

dt = [(np.unicode_,8),(np.unicode_,40),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.int32),(np.float64),(np.float64)]

I made sure to click 'Save' to save the edited version of the script. (You have to do this before running the next step. If you happen to run the notebook first according to the next step, make sure you restart the kernel after editing the script or the changes will not take effect.)

  6. Then I double-clicked on 'UserInterface_v0.3.2.ipynb' in the list to open it and began running cell number 1 of the notebook.
    I answered the questions that came up like so:
    Gene name: eGFP

    As the sequence of the sense strand "of your cDNA", I used the following, for no particular reason other than to have something to use; I took it from chrmt.fsa:

TAAATTAATAAAATAATAATACCATTTATATATTCCATTATATATATATATTTAATAAAAATAATAATATCATTTATATATTTTATTATATATTATATATATTTTATATAAAATAATAATAATAAATTTATATTTTTATATATTATTATTAAATAATAATAATATAAATAACTCCTTCGGGGTTCGGTCCCCACGGGTCCCTCACTCCTTCTTAAGAATAAAAAGGGGTTCGGTCCCCCTCCCGTTAGTACACGGGAGGGGGTCTCTCACTCCTTCTTAAAAAATAAAAAGGTGGAAGGACTAATATAATTTTAAATAATAATTAATACTTTAATAATAATTTGTATTTCTTTATTATTAATATATTAAATATAATAATAATTAATATAATTACAATATATTAATATTATCAAATATTAATAAATATACTTTTTTATATAATTTATTTATTTATTTATTTTTTTTTTATTAAACTAATTA

amplifier: B1
How many bases from 5' end: 40
max acceptable length for polyA or polyT homopolymers: 6
max acceptable length for polyC or polyG homopolymers: 6
Do you want to choose program options? Y
able to select between potential longest probe sets? N
BLAST potential probes against a FASTA file? Y
Where is the FASTA file: 'chrmt.fsa' <== with single-quotes!!
display detailed BLAST outputs: Y
eliminate probes that appear in low quaility[sic] BLAST outputs: Y
display chosen parameters: Y
limit the number of probes made: N

Then it will run. The BLAST results are shown under the heading 'This is a detailed look at the probes with good matches'.
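The edit in step 5 matters because recent NumPy releases no longer provide the np.float alias (it was deprecated in NumPy 1.20 and removed in 1.24), while np.float64 is available in old and new versions alike. Here is a minimal sketch of converting text to floats with np.float64; the numbers are invented and are not real BLAST output.

```python
import numpy as np

# Invented text values standing in for two float columns of tabular BLAST output.
raw_values = np.array(["98.5", "100.0"])

# np.float64 exists on old and new NumPy releases alike; the bare np.float alias does not.
values = raw_values.astype(np.float64)

print(values.dtype)
print(values.tolist())
```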


I do note that entering an incorrect path to the sequence file to use for BLAST results in the following (the wrong file in this case is 'C:/users/user fake/***.fasta'):

ApplicationError: Non-zero return code 1 from 'blastn -outfmt 6 -query eGFPPrelimProbes.fa -subject "\'C:/users/user fake/***.fasta\'" -task blastn-short', message 'Command line argument error: Argument "subject". File is not accessible:  `\'C:/users/user fake/***.fasta\'\''

That's not the same error you reported, but is it possible that you fixed it later, or that Windows reports something different than Linux when the path to the sequence file does not work?
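One way to tell the two failure modes apart is to check, before running the script, whether the blastn executable can be found on PATH and whether the subject file exists. The message "'blastn' is not recognized as an internal or external command" is what the Windows command prompt prints when it cannot find an executable, so the first check is the relevant one for the original error. This is a minimal sketch with a made-up helper name (preflight); it assumes that commands launched from the notebook inherit the environment of the Python process, which is the default for subprocess.

```python
import os
import shutil
from pathlib import Path


def preflight(executable="blastn", subject=None, extra_dir=None):
    """Report whether the executable is on PATH and whether the subject file exists."""
    if extra_dir is not None:
        # Prepend the directory to PATH for this Python process only.
        os.environ["PATH"] = str(extra_dir) + os.pathsep + os.environ.get("PATH", "")
    return {
        "executable": shutil.which(executable),  # full path, or None if not found
        "subject_exists": None if subject is None else Path(subject).is_file(),
    }


# 'chrmt.fsa' is the file used in the walkthrough above; substitute your own.
print(preflight(subject="chrmt.fsa"))
```

If "executable" comes back as None, passing the BLAST+ bin folder as extra_dir (for example the folder from the question) and running the check again shows whether adding it to PATH is enough.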



Note that when you run it above, you will see several warnings about future deprecation, such as this:

/home/jovyan/insitu_probe_generator/maker37cb.py:396: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  remove = remove.append({'pos1' : newlist[a][0], 'seq' : str(fullseq[newlist[a][0]:(newlist[a][0]+25)]+"nn"+fullseq[(newlist[a][0]+27):newlist[a][1]]), 'pos2' : newlist[a][1], 'fasta':nm, 'num':a}

For now the code works, but if pandas begins enforcing this, the code will fail when run with a newer version of pandas.
This is one of the big problems with using abandonware. Plus, the developers haven't specified which versions of things worked at the time they developed the code. If that were included, the versions installed in the environment could be pinned to those, and things should work as they did when the code was written.

In this case, it should be a simple fix, similar to what I already built in for np.float64 use. But you can see how things like this can accumulate over time. It would be preferable if the developers periodically updated the code as approaches and methods within the dependencies change, to keep it current.
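As a rough idea of what that fix could look like, here is a minimal sketch that swaps the deprecated frame.append for pandas.concat, as the warning suggests. The column names come from the warning text above; the row values and the helper name append_row are invented for illustration, and this has not been tested against maker37cb.py itself.

```python
import pandas as pd


def append_row(frame, row):
    """Return a new DataFrame with one dict-shaped row added at the end."""
    return pd.concat([frame, pd.DataFrame([row])], ignore_index=True)


# Invented example rows using the column names seen in the warning.
remove = pd.DataFrame(
    [{"pos1": 0, "seq": "ACGTnnACGT", "pos2": 52, "fasta": ">probe0", "num": 0}]
)
remove = append_row(
    remove,
    {"pos1": 60, "seq": "TTGAnnCCGA", "pos2": 112, "fasta": ">probe1", "num": 1},
)

print(remove)
```

When many rows are added in a loop, collecting the dicts in a list and building the DataFrame once at the end is usually the simpler option.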