Tuesday, 21 May 2019

How to make a progress bar on a web page for pandas operation

I have been googling for a while and couldn't figure out a way to do this. I have a simple Flask app that takes a CSV file, reads it into a pandas DataFrame, converts it, and outputs the result as a new CSV file. I have managed to upload and convert it successfully with this HTML:

<div class="container">
  <form method="POST" action="/convert" enctype="multipart/form-data">
    <div class="form-group">
      <br />
      <input type="file" name="file">
      <input type="submit" name="upload"/>
    </div>
  </form>
</div>

After I click submit, it runs the conversion in the background for a while and automatically triggers a download once it's done. The code that takes result_df and triggers the download looks like this:

from flask import Flask, make_response, request
import pandas as pd

app = Flask(__name__)

@app.route('/convert', methods=["POST"])
def convert():
  if request.method == 'POST':
    # Read uploaded file to df
    input_csv_f = request.files['file']
    input_df = pd.read_csv(input_csv_f)
    # TODO: Add progress bar for pd_convert
    result_df = pd_convert(input_df)  # pd_convert is defined elsewhere in the app
    if result_df is not None:
      # Send the converted CSV back as a file download
      resp = make_response(result_df.to_csv())
      resp.headers["Content-Disposition"] = "attachment; filename=export.csv"
      resp.headers["Content-Type"] = "text/csv"
      return resp

I'd like to add a progress bar to pd_convert, which is essentially a pandas apply operation. I found that tqdm now works with pandas and provides a progress_apply method to use in place of apply. But I'm not sure whether it is relevant for making a progress bar on a web page. I guess it should be, since it works in Jupyter notebooks. How do I add a progress bar for pd_convert() here?
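To make the idea concrete: tqdm draws its bar in a terminal or notebook, so for a web page the percentage probably has to be written somewhere the server can read it back. Below is a minimal sketch of that, assuming the conversion can be expressed as a per-row function. The names apply_with_progress and report are hypothetical, not part of pandas or tqdm. The rows could come from something like input_df.itertuples(index=False), with total=len(input_df).

```python
from typing import Any, Callable, Iterable, List


def apply_with_progress(
    rows: Iterable[Any],
    total: int,
    func: Callable[[Any], Any],
    report: Callable[[int], None],
) -> List[Any]:
    """Apply func to each row and call report(percent) whenever the
    whole-number percentage changes. All names here are illustrative."""
    results = []
    last_percent = -1
    for done, row in enumerate(rows, start=1):
        results.append(func(row))
        percent = done * 100 // total
        if percent != last_percent:
            # Only report on a change, to avoid one write per row
            report(percent)
            last_percent = percent
    if total == 0:
        # Nothing to process, so the job is trivially complete
        report(100)
    return results


# Usage with plain Python values standing in for dataframe rows
seen = []
doubled = apply_with_progress([1, 2, 3, 4], total=4, func=lambda x: x * 2, report=seen.append)
print(doubled)  # [2, 4, 6, 8]
print(seen)     # [25, 50, 75, 100]
```

In the app, report could be a small function that stores the percentage (for example in Redis), so that another endpoint can read it.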

The ultimate result I want is:

  1. User clicks upload and selects the CSV file from their filesystem
  2. User clicks submit
  3. The progress bar starts to run
  4. Once the progress bar reaches 100%, a download is triggered

Steps 1 and 2 are done now. The next question is how to trigger the download. For now, my convert function triggers the download with no problem because the response is formed with a file. If I want to render the page, I form a response with return render_template(...). Since I can only have one response, is it possible to have steps 3 and 4 with only one call to /convert?

Not a web developer, still learning about the basics. Thanks in advance!


====EDIT====

I tried the example here with some modifications. I get the progress from the row index in a for loop over the dataframe and put it in Redis. The client gets the progress from Redis through a stream by requesting a new endpoint, /progress. Something like:

import logging
import time

import redis
from flask import Response

@app.route('/progress')
def progress():
  """Get percentage progress for the dataframe process"""
  # redis_host, redis_port and redis_password are defined elsewhere in the app
  r = redis.StrictRedis(
    host=redis_host, port=redis_port, password=redis_password, decode_responses=True)
  r.set("progress", str(0))
  # TODO: Problem: a 2nd submit doesn't reset progress to 0%. How to keep independent progress for each client and reset to 0% on each submit?
  def get_progress():
    p = int(r.get("progress"))
    while p <= 100:
      p = int(r.get("progress"))
      # Each server-sent event is a "data:" line followed by a blank line
      p_msg = "data:" + str(p) + "\n\n"
      yield p_msg
      logging.info(p_msg)
      if p == 100:
        r.set("progress", str(0))
      time.sleep(1)

  return Response(get_progress(), mimetype='text/event-stream')

It currently works, but with some issues, which are surely due to my limited understanding of this approach.

Issues:

  • I need the progress to be reset to 0 every time the submit button is pressed. I tried resetting it in several places but haven't found a version that works. It's definitely related to my lack of understanding of how the stream works. Right now it only resets when I refresh the page.
  • How do I handle concurrent requests, i.e. the Redis race condition? If multiple users make requests at the same time, the progress should be independent for each of them. I'm thinking about generating a random job_id for each submit event and making it the key in Redis. Since I don't need the entry once a job is done, I will just delete it afterwards.
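The job_id idea from the second point could be sketched roughly as follows. This is only an illustration under stated assumptions: InMemoryStore is a hypothetical stand-in that mimics the get/set/delete calls of a Redis client so the example runs without a server, and the helper names (progress_key, start_job, finish_job) and the "progress:" key prefix are made up for this sketch.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a Redis client; only get/set/delete are mimicked."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Values are stored as strings, like a client using decode_responses=True
        self._data[key] = str(value)

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


def progress_key(job_id: str) -> str:
    """Build the per-job key so each submit has its own progress entry."""
    return f"progress:{job_id}"


def start_job(store) -> str:
    """Create a fresh random job id and initialise its progress to 0."""
    job_id = uuid.uuid4().hex
    store.set(progress_key(job_id), 0)
    return job_id


def finish_job(store, job_id: str) -> None:
    """Remove the entry once the job is done and the file has been served."""
    store.delete(progress_key(job_id))


store = InMemoryStore()
job_a = start_job(store)
job_b = start_job(store)
store.set(progress_key(job_a), 40)      # job A advances on its own
print(store.get(progress_key(job_a)))   # 40
print(store.get(progress_key(job_b)))   # 0
finish_job(store, job_a)
print(store.get(progress_key(job_a)))   # None
```

Because every submit starts from a new key initialised to 0, this would also cover the reset problem from the first point, provided the client is told which job_id to ask about.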

I feel the part I'm missing is an understanding of text/event-stream, and that I'm close to a working solution. Please share your opinion on the "proper" way to do this. I'm just guessing and trying to put together something that works with my very limited understanding.
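For reference, a text/event-stream body is a sequence of events, each made of "field: value" lines and terminated by a blank line. The connection stays open for as long as the server-side generator keeps yielding. A minimal sketch of a generator that emits one event per poll and stops once progress reaches 100 might look like this. The names sse_progress and read_progress are hypothetical, and read_progress stands for whatever fetches the current value (for example a Redis get).

```python
import time
from typing import Callable, Iterator, Optional


def sse_progress(
    read_progress: Callable[[], Optional[str]],
    poll_seconds: float = 1.0,
) -> Iterator[str]:
    """Yield one server-sent event per poll and stop once progress reaches 100."""
    while True:
        raw = read_progress()
        # Treat a missing value as 0 so a job that has not started still reports
        percent = int(raw) if raw is not None else 0
        # One event: a "data:" line followed by a blank line
        yield f"data: {percent}\n\n"
        if percent >= 100:
            break  # ending the generator ends the response
        time.sleep(poll_seconds)


# Usage with canned values instead of a real store
values = iter(["0", "35", "100"])
events = list(sse_progress(lambda: next(values), poll_seconds=0))
print(events)  # ['data: 0\n\n', 'data: 35\n\n', 'data: 100\n\n']
```

In Flask, a generator like this can be passed to Response(..., mimetype='text/event-stream'), as in the endpoint above. Note that a browser EventSource reconnects by default when the connection closes, so the client side would likely need to call close() on it after receiving 100.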


