Adding progress bars and parallelizing tasks in Python

data engineering

Publish Date: 2023-03-13

Use tqdm and pqdm in Python to add progress bars to your code and parallelize tasks.

Projects in Python often involve long-running tasks like training models or processing large datasets. To make these tasks more manageable, it’s helpful to add progress bars to your code and parallelize tasks to take advantage of multiple CPU cores.

Two popular Python libraries for achieving these goals are tqdm and pqdm. tqdm provides a simple way to add progress bars to your code, while pqdm is a wrapper around tqdm and concurrent.futures that allows you to parallelize tasks while also showing a progress bar.

In this blog post, we’ll walk through how to use tqdm and pqdm in Python to add progress bars and parallelize tasks.

Adding Progress Bars with tqdm

tqdm is a Python library that provides a simple way to add progress bars to your code. To use tqdm, you first need to install it using pip:

pip install tqdm

Once you’ve installed tqdm, you can use it in your code like this:

from tqdm import tqdm
import time

for i in tqdm(range(100)):
    time.sleep(0.1) # simulate a longer-running task

In this example, we wrap a loop that runs 100 times in tqdm to add a progress bar, and we use the time.sleep function to simulate a longer-running task. When you run this code, you’ll see a progress bar that updates in real time as the loop runs. At 0.1 seconds per iteration, the whole loop takes about 10 seconds:

100%|███████████████████████████████████████████████████| 100/100 [00:10<00:00,  9.87it/s]

The progress bar shows you the percentage of the loop that’s completed, the iteration count, the elapsed time, an estimate of the time remaining, and the iteration rate. You can customize the bar through keyword arguments to tqdm, such as desc (a label shown before the bar), unit (the name used for one iteration), ncols (the total width) and colour (the bar color), to match your preferences and the needs of your project.
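As a rough illustration of those options, the sketch below labels and colors a bar, then drives a second bar by hand for work that isn't a simple loop over an iterable. The file names are hypothetical placeholders, and the colour argument assumes a reasonably recent tqdm release.

```python
import time

from tqdm import tqdm

# Hypothetical inputs, used only for illustration.
items = ["a.csv", "b.csv", "c.csv"]

# desc labels the bar, unit renames "it", ncols fixes the width,
# and colour tints the bar (assumes a recent tqdm version).
for name in tqdm(items, desc="Loading", unit="file", ncols=80, colour="green"):
    time.sleep(0.1)  # simulate work on one file

# When progress doesn't map onto a single iterable, create the bar
# with a known total and advance it manually.
processed = 0
with tqdm(total=len(items), desc="Manual") as bar:
    for name in items:
        time.sleep(0.1)  # simulate work on one file
        processed += 1
        bar.update(1)  # advance the bar by one step
```

The manual form is handy when one step of progress corresponds to something other than one loop iteration, such as a chunk of bytes read or a batch of rows written.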

Parallelizing Tasks with pqdm

pqdm is a Python library that builds on top of tqdm and concurrent.futures to provide a simple way to parallelize tasks while also showing a progress bar. To use pqdm, you first need to install it using pip:

pip install pqdm

Once you’ve installed pqdm, you can use it in your code like this:

from pqdm.processes import pqdm
import time

def process_data(data):
    time.sleep(0.1) # simulate a longer-running task
    return data * 2

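# The guard keeps worker processes from re-running this block
# when they import the module.
if __name__ == '__main__':
    data = range(100)
    # Apply process_data to every item using 4 worker processes.
    processed_data = pqdm(data, process_data, n_jobs=4, desc='Processing data')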

In this example, we define a function process_data that takes in a piece of data and returns a processed version of that data. We then use pqdm to apply this function to each piece of data in parallel, using four processes (n_jobs=4) and displaying a progress bar with the label “Processing data” (desc='Processing data').
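To make the "tqdm plus concurrent.futures" idea more concrete, here is a minimal sketch of the general pattern. It is not pqdm's actual implementation. The helper name parallel_map_with_progress is made up for this post, and it uses a thread pool rather than processes to keep the example simple.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

from tqdm import tqdm


def process_data(data):
    time.sleep(0.01)  # simulate a longer-running task
    return data * 2


def parallel_map_with_progress(items, func, n_jobs=4, desc=None):
    """Hypothetical helper: apply func to each item in parallel with a progress bar."""
    items = list(items)
    results = [None] * len(items)
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        # Map each future to its input position so results keep input order.
        futures = {executor.submit(func, item): i for i, item in enumerate(items)}
        # as_completed yields futures as they finish, so the bar advances
        # once per completed task.
        for future in tqdm(as_completed(futures), total=len(futures), desc=desc):
            results[futures[future]] = future.result()
    return results


processed_data = parallel_map_with_progress(
    range(100), process_data, n_jobs=4, desc="Processing data"
)
print(processed_data[:5])  # [0, 2, 4, 6, 8]
```

Threads suit I/O-bound work such as network calls. For CPU-bound work, ProcessPoolExecutor offers the same submit interface, but the function must be picklable and the driver code should sit under an if __name__ == '__main__': guard, as in the pqdm example above.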

robot learner

https://datasciencebyexample.github.io/2023/03/13/parrallelizing-and-visualizing-tasks-with-pqdm/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.

tqdm pqdm progress bars
