Here is my working Python script for downloading FASTA sequences from UniProt (with real appreciation to the community).
'''
UniProt FASTA downloader: read accession IDs from a text file,
show the download progress for each sequence being downloaded,
and make a list of inaccessible sequences.
'''
import functools
import pathlib
import shutil
import requests
from tqdm.auto import tqdm
# Part I: read the file with IDs and make a list of URLs to download the respective sequences
with open('errtest.txt', 'r') as infile:
    lines = infile.readlines()
    listfile_name = infile.name

file_name = listfile_name.split('.', 1)[0]
downloaded = 0  # sequences downloaded
URL_list = []
for line in lines:
    access_id = line.strip()
    url_part1 = 'https://rest.uniprot.org/uniprotkb/'
    url_part2 = '.fasta'
    URL = url_part1 + access_id + url_part2
    URL_list.append(URL)
not_found = []
for url in URL_list:
    r = requests.get(url, stream=True, allow_redirects=True)
    file_size = int(r.headers.get('Content-Length', 0))
    if r.status_code != 200:
        Apart = url.removeprefix('https://rest.uniprot.org/uniprotkb/')
        short_id = Apart.removesuffix('.fasta')
        not_found.append(short_id)
        print(short_id, '-- not found')
    elif r.status_code == 200:
        path = pathlib.Path(file_name + 'seqs.fa').expanduser().resolve()
        path.parent.mkdir(parents=True, exist_ok=True)
        desc = "(Unknown total file size)" if file_size == 0 else ""
        r.raw.read = functools.partial(r.raw.read, decode_content=True)  # decompress if needed
        with tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
            with path.open("ab") as f:
                shutil.copyfileobj(r_raw, f)
        downloaded += 1
print('Sequences with these accession ids were not found:\n', not_found)
print(downloaded, 'sequences downloaded')
These are the contents of the errtest.txt file (some deliberately wrong IDs mixed with correct ones):
wrong1
D3VN13
B9W4V6
wrong2
A0A8S0XZH6
wrong3
This is the typical output:
wrong1 -- not found
0%| | 0/477 [00:00<?, ?it/s]
100%|██████████| 477/477 [00:00<00:00, 239kB/s]
0%| | 0/473 [00:00<?, ?it/s]
100%|██████████| 473/473 [00:00<00:00, 42.4kB/s]
wrong2 -- not found
0%| | 0/534 [00:00<?, ?it/s]
100%|██████████| 534/534 [00:00<00:00, 268kB/s]
wrong3 -- not found
Sequences with these accession ids were not found:
['wrong1', 'wrong2', 'wrong3']
3 sequences downloaded
So far, so good. Next, I want a single progress bar for all the downloads. In this test file there are only 3 valid IDs and 3 wrong IDs (which happens sometimes), so three progress bars shown one after another are fine. In reality, though, the list file will contain thousands of IDs, and therefore thousands of URLs and sequence downloads, so a single progress bar showing the overall download progress would be ideal.
I think you could compute the total size before starting the download loop and then use a single progress bar, something like this:
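Here is a rough sketch of that idea (not tested against the live service): a first pass of HEAD requests sums up the advertised file sizes, and a second pass streams the downloads while one byte-count bar advances chunk by chunk. I am assuming the UniProt endpoints return a Content-Length for HEAD requests; entries without a size just contribute 0 to the total, and if responses arrive compressed the bar total will only be approximate, because the decompressed bytes are what get counted. The chunk size and output file name are arbitrary choices.

'''
Sketch only: one overall progress bar for all downloads,
assuming HEAD requests report a Content-Length header.
'''
import pathlib
import requests
from tqdm.auto import tqdm

with open('errtest.txt', 'r') as infile:
    ids = [line.strip() for line in infile if line.strip()]
    listfile_name = infile.name
file_name = listfile_name.split('.', 1)[0]

urls = ['https://rest.uniprot.org/uniprotkb/' + acc + '.fasta' for acc in ids]

# First pass: ask the server for each file size without downloading the body.
total_bytes = 0
for url in urls:
    head = requests.head(url, allow_redirects=True)
    if head.status_code == 200:
        total_bytes += int(head.headers.get('Content-Length', 0))

not_found = []
downloaded = 0
path = pathlib.Path(file_name + 'seqs.fa').expanduser().resolve()
path.parent.mkdir(parents=True, exist_ok=True)

# Second pass: one bar measured in bytes, advanced by each chunk written.
with tqdm(total=total_bytes, unit='B', unit_scale=True, desc='Downloading') as bar:
    with path.open('ab') as f:
        for url in urls:
            r = requests.get(url, stream=True, allow_redirects=True)
            if r.status_code != 200:
                short_id = url.removeprefix('https://rest.uniprot.org/uniprotkb/').removesuffix('.fasta')
                not_found.append(short_id)
                continue
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
                bar.update(len(chunk))
            downloaded += 1

print('Sequences with these accession ids were not found:\n', not_found)
print(downloaded, 'sequences downloaded')

This costs one extra HEAD request per ID. If that first pass is too slow for thousands of IDs, a simpler variant is tqdm(total=len(urls)) with bar.update(1) after each sequence, i.e. progress by sequence count rather than by bytes.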