Saturday, 1 July 2023

How to save user session info when the user closes the website

I'm trying to build session-replay functionality for a website, and I'm using the rrweb library to do that.

When recording, the library captures every event on the webpage. I save those events by pushing them into an array, and when I want to replay the session I just pass the array to the replay function, which handles the playback.

Currently, for testing purposes, I'm saving this array in sessionStorage: every time a new event is emitted, I read the array, push the new event into it, and write it back:

rrweb.record({
    emit(event) {
        const sessions = JSON.parse(sessionStorage.getItem('sessions')) || [];
        sessions.push(event);
        sessionStorage.setItem('sessions', JSON.stringify(sessions));
    },
});

However, for production, instead of saving that array in sessionStorage and updating it every time a new event is emitted, I would like to save it to my database. I want to call the function that persists the array once, either when the user logs out or when the user closes the website (for example, by pressing the X button).

The first part (when the user logs out) is pretty straightforward: I'll just add an event listener to the logout button. It's the second part (when the user decides to close the website) that is giving me a headache.

I know there is the beforeunload event, but after a quick search it became clear to me that it's unreliable. So what I'm looking for is a reliable way to determine when the user has closed my website, so I can fire an async function that saves the array to my database.
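For reference, the alternative most often suggested in place of beforeunload is listening for visibilitychange and flushing with navigator.sendBeacon, which lets the browser queue a small POST even as the page goes away. A minimal sketch, assuming a hypothetical /api/sessions endpoint; note that sendBeacon payloads are size-limited, so very long event arrays may need periodic flushing instead:

// Sketch only: flush the recorded events when the page is hidden
// or about to be closed. '/api/sessions' is a hypothetical endpoint.
document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden') {
        const sessions = sessionStorage.getItem('sessions') || '[]';
        // sendBeacon queues the request even while the tab is closing;
        // it returns false if the browser rejects the payload (e.g. too big).
        navigator.sendBeacon(
            '/api/sessions',
            new Blob([sessions], { type: 'application/json' })
        );
    }
});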



from How to save user session info when the user closes the website

Foundation Sticky | Element should stick when the top of viewport hits it, not immediately

I am not entirely sure how Foundation sticky works. An element within a section needs to be sticky when the top of the viewport hits the element, not immediately. There are multiple sections on the page and the element that needs to be sticky might be further down the page, below another section. After the top of the viewport hits the element, it should be sticky all the way down.

The element becomes sticky immediately, as soon as you start scrolling, not when the top of the viewport hits it. I tried many data-anchor values and played around with data-sticky-container, but I can't seem to make it work. Maybe it is more difficult because the element that needs to be sticky is inside a section rather than being a section of its own, but I can't change the HTML structure.

I created a Codepen to display and play with the issue here: https://codepen.io/vialito/pen/yLQMQZR

This is the HTML; I hope someone can help me out!

<main>
    <div class="nav">
        <div class="gridcontainer">
            <div class="grid-x">
                <div class="content"></div>
            </div>
        </div>
    </div>

    <section class="main">
        <div class="gridcontainer">
            <div class="grid-x">

                <div class="one cell small-12 large-8">
                    <div class="content"></div>
                </div>

                <div class="two cell small-12 large-4">
                    <div class="content"></div>
                </div>

                <div class="three cell small-12" data-sticky-container>
                    <div class="content sticky" data-sticky data-margin-top="0"></div>
                </div>

            </div>
        </div>
    </section>

    <section class="random">
        <div class="gridcontainer">
            <div class="grid-x">
                <div class="content"></div>
            </div>
        </div>
    </section>
</main>
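For reference, one thing worth checking is the data-top-anchor option: Foundation's Sticky plugin decides where sticking starts and stops from its anchors, and an explicit "id:top" anchor pins the start point to where the element sits in the page, while a bottom anchor further down keeps it stuck past the end of its own container. A sketch of the relevant markup, where "stick-here" and "page-end" are illustrative ids that would need to be added to the real page (untested against the Codepen above):

<!-- Sketch only: "stick-here" and "page-end" are illustrative ids.
     data-top-anchor delays sticking until the viewport top reaches
     the anchor; data-btm-anchor marks where it should stop. -->
<div class="three cell small-12" data-sticky-container id="stick-here">
    <div class="content sticky"
         data-sticky
         data-margin-top="0"
         data-top-anchor="stick-here:top"
         data-btm-anchor="page-end:bottom"></div>
</div>

Since the sticky element's container here sits inside the .main section, a bottom anchor on (or after) the last section would be needed to keep it stuck "all the way down".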


from Foundation Sticky | Element should stick when the top of viewport hits it, not immediately

Performance degradation with increasing threads in Python multiprocessing

I have a machine with 24 cores and 2 threads per core. I'm trying to optimize the following code for parallel execution. However, I noticed that the code's performance starts to degrade after a certain number of threads.

import argparse
import glob
import h5py
import numpy as np
import pandas as pd
import xarray as xr
from tqdm import tqdm
import time
import datetime
from multiprocessing import Pool, cpu_count, Lock
import multiprocessing
import cProfile, pstats, io


def process_parcel_file(f, bands, mask):
    start_time = time.time()
    test = xr.open_dataset(f)
    print(f"Elapsed in process_parcel_file for reading dataset: {time.time() - start_time}")

    start_time = time.time()
    subset = test[bands + ['SCL']].copy()
    subset = subset.where(subset != 0, np.nan)
    if mask:
        subset = subset.where((subset.SCL >= 3) & (subset.SCL < 7))
    subset = subset[bands]

    # Adding a new dimension week_year and performing grouping
    subset['week_year'] = subset.time.dt.strftime('%Y-%U')
    subset = subset.groupby('week_year').mean().sortby('week_year')
    subset['id'] = test['id'].copy()

    # Store the dates and counting pixels for each parcel
    dates = subset.week_year.values
    n_pixels = test[['id', 'SCL']].groupby('id').count()['SCL'][:, 0].values.reshape(-1, 1)

    # Converting to dataframe
    grouped_sum = subset.groupby('id').sum()
    ids = grouped_sum.id.values
    grouped_sum = grouped_sum.to_array().values
    grouped_sum = np.swapaxes(grouped_sum, 0, 1)
    grouped_sum = grouped_sum.reshape((grouped_sum.shape[0], -1))
    colnames = ["{}_{}".format(b, str(x).split('T')[0]) for b in bands for x in dates] + ['count']
    values = np.hstack((grouped_sum, n_pixels))
    df = pd.DataFrame(values, columns=colnames)
    df.insert(0, 'id', ids)
    print(f"Elapsed in process_parcel_file til end: {time.time() - start_time}")
    return df


def fs_creation(input_dir, out_file, labels_to_keep=None, th=0.1, n=64, days=5, total_days=180, mask=False,
                mode='s2', method='patch', bands=['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12']):
    files = glob.glob(input_dir)
    times_pool = []  # For storing execution times
    times_seq = []
    cpu_counts = list(range(2, multiprocessing.cpu_count() + 1, 4))  # The different CPU counts to use

    for count in cpu_counts:
        print(f"Executing with {count} threads")
        if method == 'parcel':
            start_pool = time.time()
            with Pool(count) as pool:
                arguments = [(f, bands, mask) for f in files]
                dfs = list(tqdm(pool.starmap(process_parcel_file, arguments), total=len(arguments)))


            end_pool = time.time()
            start_seq = time.time()
            dfs = pd.concat(dfs)
            dfs = dfs.groupby('id').sum()
            counts = dfs['count'].copy()
            dfs = dfs.div(dfs['count'], axis=0)
            dfs['count'] = counts
            dfs.drop(index=-1).to_csv(out_file)
            end_seq = time.time()
            times_pool.append(end_pool - start_pool)  
            times_seq.append(end_seq - start_seq)

    pd.DataFrame({'CPU_count': cpu_counts, 'Time pool': times_pool, 
                  'Time seq' : times_seq}).to_csv('cpu_times.csv', index=False)

    return 0
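One incidental note on the snippet above: pool.starmap only returns after every task has finished, so wrapping its result in tqdm cannot show live progress. A sketch of an equivalent call that reports progress as results arrive (same worker function; results come back unordered, which should be fine here since the frames are concatenated and grouped by id afterwards):

# Sketch: imap_unordered yields each DataFrame as soon as its worker
# finishes, so tqdm tracks real progress; starmap blocks until the end.
from functools import partial

with Pool(count) as pool:
    worker = partial(process_parcel_file, bands=bands, mask=mask)
    dfs = list(tqdm(pool.imap_unordered(worker, files), total=len(files)))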

When executing the code, it scales well up to around 7-8 workers, but after that the performance starts to deteriorate. I have profiled the code, and it seems that each worker process takes longer and longer to execute the same code.

For example, with 2 threads:

Elapsed in process_parcel_file for reading dataset: 0.012271404266357422
Elapsed in process_parcel_file til end: 1.6681673526763916
Elapsed in process_parcel_file for reading dataset: 0.014229536056518555
Elapsed in process_parcel_file til end: 1.5836331844329834

However, with 22 threads:

Elapsed in process_parcel_file for reading dataset: 0.17968058586120605
Elapsed in process_parcel_file til end: 12.049026727676392
Elapsed in process_parcel_file for reading dataset: 0.052398681640625
Elapsed in process_parcel_file til end: 6.014119625091553

I'm struggling to understand why the performance degrades with more workers. I've already verified that the system has the required number of cores and hardware threads.

I would appreciate any guidance or suggestions to help me identify the cause of this issue and optimize the code for better performance.
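For what it's worth, a frequent cause of exactly this curve is oversubscription: numpy and xarray call into native BLAS/OpenMP libraries that may spin up their own thread pools inside every worker process, so 22 workers can easily contend for more than the 48 hardware threads, on top of sharing disk and memory bandwidth while reading the files. A sketch of how one might cap the native threads per worker to rule this out, assuming the threadpoolctl package is installed (the wrapper names are illustrative):

# Sketch only: restrict each worker to one native (BLAS/OpenMP) thread
# so the process pool is the only source of parallelism.
import os
from multiprocessing import Pool
from threadpoolctl import threadpool_limits  # assumed installed

def _worker_init():
    # Best-effort env caps; libraries already loaded before fork may
    # ignore these, hence the threadpool_limits guard below as well.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = "1"

def process_parcel_file_single(f, bands, mask):
    with threadpool_limits(limits=1):
        return process_parcel_file(f, bands, mask)

with Pool(count, initializer=_worker_init) as pool:
    dfs = pool.starmap(process_parcel_file_single, [(f, bands, mask) for f in files])

If the per-file timings stay flat as the worker count grows after this change, native-thread contention was the culprit; if they still balloon, I/O or memory bandwidth is the more likely bottleneck.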

It's really hard for me to provide a minimal working example, so please take that into account.

Thank you in advance.



from Performance degradation with increasing threads in Python multiprocessing