Thursday 30 November 2023

Python IntelliJ style 'search everywhere' algorithm

I have a list of file names in python like this:

HelloWorld.csv
hello_windsor.pdf
some_file_i_need.jpg
san_francisco.png
Another.file.txt
A file name.rar

I am looking for an IntelliJ style search algorithm where you can enter whole words or simply the first letter of each word in the file name, or a combination of both. Example searches:

hw -> HelloWorld.csv, hello_windsor.pdf
hwor -> HelloWorld.csv
winds -> hello_windsor.pdf

sf -> some_file_i_need.jpg, san_francisco.png
sfin -> some_file_i_need.jpg
file need -> some_file_i_need.jpg
sfr -> san_francisco.png

file -> some_file_i_need.jpg, Another.file.txt, A file name.rar
file another -> Another.file.txt
fnrar -> A file name.rar

You get the idea.
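For reference, the matching rule itself (ignoring the 'frecency' ranking) can be sketched in plain Python: each character of a query token either continues the current word or jumps to the start of a later word. This is only an illustrative sketch of the behaviour described above, not an existing package:

import re

def words(name):
    # split on non-alphanumeric characters and camelCase boundaries,
    # e.g. "HelloWorld.csv" -> ["hello", "world", "csv"]
    parts = re.split(r"[^A-Za-z0-9]+|(?<=[a-z])(?=[A-Z])", name)
    return [p.lower() for p in parts if p]

def camel_match(token, name):
    # True if every character of token either continues the previous match
    # inside the current word or starts a later word.
    ws = words(name)

    def rec(qi, wi, ci):
        if qi == len(token):
            return True
        # continue matching inside the current word
        if wi < len(ws) and ci < len(ws[wi]) and ws[wi][ci] == token[qi]:
            if rec(qi + 1, wi, ci + 1):
                return True
        # or jump to the start of a later word
        start = wi + 1 if ci > 0 else wi
        return any(ws[nwi][0] == token[qi] and rec(qi + 1, nwi, 1)
                   for nwi in range(start, len(ws)))

    return rec(0, 0, 0)

def search(query, names):
    tokens = query.lower().split()
    return [n for n in names if all(camel_match(t, n) for t in tokens)]

files = ["HelloWorld.csv", "hello_windsor.pdf", "some_file_i_need.jpg",
         "san_francisco.png", "Another.file.txt", "A file name.rar"]
print(search("hw", files))         # ['HelloWorld.csv', 'hello_windsor.pdf']
print(search("file need", files))  # ['some_file_i_need.jpg']
print(search("fnrar", files))      # ['A file name.rar']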

Are there any Python packages that can do this? Ideally they'd also rank matches by 'frecency' (how often the files have been accessed, and how recently) as well as by how strong the match is.

I know PyLucene is one option, but it seems very heavyweight given that the list of file names is short and I have no interest in searching the contents of the files. Are there any other options?



from Python IntelliJ style 'search everywhere' algorithm

telethon and opentele conflict

I'm calling this in main.py:

from opentele.td import TDesktop
from opentele.tl import TelegramClient
from opentele.api import API, UseCurrentSession


async def save(name, path):
    client = TelegramClient(f"{name}.session")

    tdesk = await client.ToTDesktop(flag=UseCurrentSession)
    tdesk.SaveTData(path)

where I'm using TelegramClient from telethon, and I'm getting the error 'bytes or str expected, not <class 'int'>'. However, when I call it separately or in another file, there are no errors and everything works.

# main.py
await client.start()
await client(UpdateUsernameRequest(user_name))
await save_tdata.save(user_name, f"{user_name}/tdata")

When I remove the line await save_tdata.save(user_name, f"{user_name}/tdata"), the error persists. However, the issue is resolved only when I remove import save_tdata. I want to emphasize that if I run it from other files, for example, create test.py and call save_tdata.save(), everything works fine. I think it somehow conflicts with Telethon. I've tried giving them different names using 'as'.



from telethon and opentele conflict

How to block users from closing a window in Javascript?

Is it possible to block users from closing the window using the exit button [X]? I am actually providing a close button in the page for the users to close the window. Basically, what I'm trying to do is force the users to fill in the form and submit it. I don't want them to close the window until they have submitted it.

I really appreciate your comments; I'm not thinking of hosting this on any commercial website. It's an internal thing: we are actually getting all the staff to participate in this survey we have designed...

I know it's not the right way, but I was wondering if there is a solution to the problem we have here...
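For what it's worth, browsers will not let a page block closing outright; the closest standard mechanism is a beforeunload handler, which only asks the browser to show a generic confirmation prompt while the form is still unsubmitted. A minimal sketch (the #surveyForm selector and the formSubmitted flag are hypothetical):

let formSubmitted = false;

document.querySelector('#surveyForm').addEventListener('submit', () => {
  formSubmitted = true;
});

window.addEventListener('beforeunload', (event) => {
  if (!formSubmitted) {
    event.preventDefault();
    event.returnValue = ''; // some browsers need this to show the prompt
  }
});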



from How to block users from closing a window in Javascript?

Wednesday 29 November 2023

How to decode PKCS#7/ASN.1 using Javascript?

Recently I started to work on Apple App Store receipt validation, and due to the deprecation of the legacy /verifyReceipt endpoint, I decided to go for on-device validation. The guideline gives a step-by-step solution for macOS; however, we want to perform this validation in a backend service using Node.js. For this validation, information defined in the PKCS#7 container needs to be decoded. Here my knowledge comes up short, as I am unable to retrieve this information (e.g. receipt_creation_date, bundle_id). I managed to convert the receipt from PKCS#7 to ASN.1 but could not find a way to retrieve the actual key values from it. I tried several libraries like node-forge, asn1js, and asn1.js. What I found really useful were these resources:

AFAIK the information should be encoded in OCTET STRING format.

How can information such as bundle_id or receipt_creation_date be retrieved from ASN.1 using Javascript?



from How to decode PKCS#7/ASN.1 using Javascript?

Python code packet window is shown as full by Wireshark

I have a python code with such a structure:

server_info = (server_address, listen_port)
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 100000)
server_socket.bind(server_info)
server_socket.listen(100)

while True:
    try:
        client_conn, client_address = server_socket.accept()

        more_to_read = True
        while more_to_read:
            client_conn.settimeout(3.0)
            data_part = client_conn.recv(10240)
            if data_part:
                print('Data part received ...')
                # record {data_part} somewhere
            else:
                more_to_read = False
                # finalize the message
        client_conn.close()
    except socket.error as e:
        print('Communication error:', str(e))
        continue

server_socket.close()

I know the outer loop does not exit!

In Wireshark I see this black record:

Wireshark failed packet

My first surprise is that I set the recv buffer size to 10240, so why does it show a window size of 14656? And the second surprise: why is the window marked as full while the data length is only 146 bytes?
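As a side note on the first surprise (a general observation, not an analysis of this particular capture): the kernel does not have to honour the exact SO_RCVBUF value. On Linux, for instance, the value set with setsockopt is doubled internally for bookkeeping, and the advertised TCP window is derived from that buffer rather than from the size passed to recv(). The effective value can be read back with getsockopt; a small standalone sketch:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 100000)
# On Linux this typically prints about double the requested value, because the
# kernel reserves extra space for its own bookkeeping.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()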


The error message is:

Expert Info (Warning/Sequence): TCP window specified by the receiver is now completely full

How to fix the code?

PS.

1- Listen port is 7010

2- In the wireshark filter, tcp.port == 7010 the 1 after it is a typo that does not impact the filter.



from Python code packet window is shown as full by Wireshark

Nuxt : n2 is not a function

I'm using Nuxt 3 and I have a TypeError like this:

Uncaught TypeError: n2 is not a function

I have a button calling my function toggleSelectRow with a @click.prevent

And here is the function :

const toggleSelectRow = (keyword) => {
    if (keywordsSelected.value.find((kw) => kw.uid_kw === keyword.uid_kw) && props.keywords.length > 1) {
        keywordsSelected.value = []
        nuxtApp.$bus.$emit('filter-from-list', '');
    } else {
        keywordsSelected.value = [keyword]
        nuxtApp.$bus.$emit('filter-from-list', keyword.uid_kw);
  }
}

Here is the receiver part :

$bus.$on('filter-from-list', (newKeyword) => {
  filters.keyword = (props.keywords.find((kw) => kw.uid_kw === newKeyword)) ? newKeyword : null
  if (!filters.keyword) {
    refreshNuxtData('average-keyword-position');
    return;
  }

  fetchKeywordPosition({ keyword: newKeyword });
});

nuxtApp is defined above with: const nuxtApp = useNuxtApp();

I think my problem comes from the emit, but I don't know why, because if I refresh the page, it works.

Fixing test 1

I tried to use a declared function in the receiver component like this :

function filterFromList(newKeyword){
  filters.keyword = (props.keywords.find((kw) => kw.uid_kw === newKeyword)) ? newKeyword : null
  if (!filters.keyword) {
    refreshNuxtData('average-keyword-position');
    return;
  }

  fetchKeywordPosition({ keyword: newKeyword });
}

to call it like this : $bus.$on('filter-from-list', filterFromList); but I still have the same error so I think the problem must come from the sender/receiver.



from Nuxt : n2 is not a function

How do I overwrite a BigQuery table (data and schema) from PySpark?

I am trying to write a PySpark DataFrame to a BigQuery table. The schema for this table may change between job executions (columns may be added or omitted). So, I would like to overwrite this table each execution.

An example:

df = spark.createDataFrame(data=[(1, "A")],schema=["col1","col2"])
df.write.format("bigquery")\
    .option("temporaryGcsBucket","temporary-bucket-name")\
    .mode("overwrite")\
    .save(path="dataset_name.table_name")

When `dataset_name.table_name` doesn't already exist, the above works great and generates the expected table.

However, subsequent jobs may be as below:

df.withColumnRenamed("col1", "col3").write.format("bigquery")\
    .option("writeDisposition", "WRITE_TRUNCATE")\
    .option("temporaryGcsBucket","temporary-bucket-name")\
    .mode("overwrite")\
    .save(path="dataset_name.table_name")

The above job does not generate what I want: I get no col3 and col1 still appears.

Even more disturbing, I get no error message.

So, what options should I specify so that the result in BigQuery is just col2 and col3 with appropriate data?

Basically, I want to mimic the SQL statement CREATE OR REPLACE TABLE from PySpark.



from How do I overwrite a BigQuery table (data and schema) from PySpark?

Tuesday 28 November 2023

How to use PublisherServiceAsyncClient

I am trying to use AsyncPublisherClient from pubsublite

pip install google-cloud-pubsublite==1.8.3
from google.cloud import pubsublite_v1

pubsub = pubsublite_v1.AsyncPublisherClient()

error: Module has no attribute "AsyncPublisherClient"

The documentation is very scarce and I couldn't even find this class in the virtualenv directory, just its interface.

How do I use this library?

EDIT: It looks like the correct class is PublisherServiceAsyncClient


EDIT2:

from google.cloud.pubsublite_v1.types.publisher import PublishRequest
from google.cloud.pubsublite_v1.types import PubSubMessage
from google.cloud.pubsublite_v1 import PublisherServiceAsyncClient

pubsub = PublisherServiceAsyncClient()

message = PubSubMessage(data=json.dumps(payload).encode("utf-8"))

request = PublishRequest(topic=os.environ["TOPIC"], messages=[message])

async def request_generator():
    yield request

await pubsub.publish(requests=request_generator())

ValueError: Unknown field for PublishRequest: topic



from How to use PublisherServiceAsyncClient

SqlAlchemy extending an existing query with additional select columns from raw sql

I'm quite new to SQLAlchemy and Python, and have to fix some bugs in a legacy environment so please bear with me..

Environment:
Python 2.7.18
Bottle: 0.12.7
SQLAlchemy: honestly don't know but something from 2014 (might be 0.9.8?)
MySQL: 5.7

Scenario: I have the SQL statement below that pivots a linked table and adds its rows dynamically as additional columns in the original table. I constructed it in SQL Workbench and it returns the results I want: all specified columns from table t1 plus the additional columns with values from table t2 appear in the result.

SET SESSION group_concat_max_len = 1000000; -- needed to keep GROUP_CONCAT from cutting off after 1024 characters; there are quite a few columns involved that easily exceed the default limit
SET @sql = NULL;
SELECT GROUP_CONCAT(DISTINCT CONCAT(
  'SUM(
  CASE WHEN custom_fields.FIELD_NAME = "', custom_fields.FIELD_NAME, '" THEN
custom_fields_tags_link.field_value END) 
  AS "', custom_fields.FIELD_NAME, '"')
) AS t0
INTO @sql
FROM tags t1 LEFT OUTER JOIN custom_fields_tags_link t2 ON t1.id = t2.tag_id JOIN
custom_fields ON custom_fields.id = t2.custom_field_id;

SET @sql = CONCAT('SELECT tags.id AS tags_id,
tags.tag_type AS tags_tag_type, tags.name AS
tags_name, tags.version AS tags_version, ', @sql,
' FROM tags LEFT OUTER JOIN custom_fields_tags_link ON tags.id = 
custom_fields_tags_link.tag_id JOIN custom_fields ON custom_fields.id = 
custom_fields_tags_link.custom_field_id GROUP BY tags.id');
SELECT @sql;

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Problem: I have an already existing SQLAlchemy session that expands a query to be used for pagination. Currently this query returns all my specified columns from table t1, which is joined with tables t2 and custom_fields to get all necessary columns. The missing part is the SQLAlchemy representation of the SELECT GROUP_CONCAT part of the statement above; the rest is all taken care of. Since I have control over and know how the frontend presentation of this table looks, and now also have the raw SQL version from SQL Workbench, I tried to work backwards to get the SQLAlchemy / Python part right by consulting https://docs.sqlalchemy.org/en/14/orm/queryguide.html#orm-queryguide-selecting-text and https://docs.sqlalchemy.org/en/14/core/sqlelement.html#sqlalchemy.sql.expression.TextClause.columns. But now I am stuck on how to get this TextClause object converted into a TextualSelect without typing the columns statically in the .columns() function, because I don't know what column names the users will provide for these custom_fields.

Goal: concat a dynamically created raw SQL statement to my existing SQLAlchemy query to select these dynamically created fields from a linked table so that I have the same result as when I execute this raw SQL statement in a SQL editor

Attempts:

# session is an app-wide shared MySQL session object created via sqlalchemy.orm's sessionmaker / scoped_session functions
alchemy = session.query(Tag)
try:
  #mainly the next line will be changed (x)
  select_custom_fields_columns_stmt = select().from_statement(text(
  '''SET SESSION group_concat_max_len = 1000000;
    SET @sql = NULL;
    SELECT GROUP_CONCAT(DISTINCT CONCAT(
    'SUM(
      CASE WHEN custom_fields.FIELD_NAME = "', custom_fields.FIELD_NAME, '" THEN
      custom_fields_tags_link.field_value END) AS "', custom_fields.FIELD_NAME, '"')) 
    AS t0
      INTO @sql
      FROM tags t1 LEFT OUTER JOIN custom_fields_tags_link t2 ON t1.id = t2.tag_id JOIN custom_fields ON custom_fields.id = t2.custom_field_id;''')) #or here after the statement (y)
  # next line is my attempt to add the columns that have been generated by the previous function but of course unsuccessful
  alchemy = alchemy.add_columns(select_custom_fields_columns_stmt)
except:
  logException()
joined_query = alchemy.outerjoin(model.tag.CustomFieldTagLink) \
    .join(model.tag.CustomField)

A: This results in this error: AttributeError: 'Select' object has no attribute 'from_statement'

B: Changing the line (x) above that constructs the query for the additional rows to select_custom_fields_columns_stmt = session.select().from_statement(text(... --> results in: AttributeError: 'Session' object has no attribute 'select'

C: adding a .subquery("from_custom_fields") statement at (y) --> results in: AttributeError: 'AnnotatedTextClause' object has no attribute 'alias'

D: other attempts for (x) substituting select() with session.query() or session.query(Tags) also didn't result in additional columns

What else can I try? Would it be preferable/easier to write the whole raw SQL part in SQLAlchemy and if so, how could I do that?



from SqlAlchemy extending an existing query with additional select columns from raw sql

Monday 27 November 2023

Email queue keeps sending out duplicate emails

I'm trying to implement an email queue, but it's not quite working as expected. The idea is to respond from an endpoint as quickly as possible and then just have the queue manager attempt to process the queue in the background, in this case send out an email.

One of the problems is that the interval keeps executing when this.running is set to true. Another problem is that an email sometimes sends out multiple times, when it should only be sent out once.

I've also tried using Redis for this because I recognise the closure issue with initialize, and I also saw bull as a potential solution, but I'm trying to avoid as many unnecessary libraries as possible. Ideally, I'd like to make some sort of base class that I can reuse for different queues, for example an SMS queue, not just an email queue. But for now, I guess I'm just trying to get this email queue to work.

What am I doing wrong here? Here's my code:

import emailClient from 'services/emailClient';
import randomUuid from 'services/randomUuid'

export interface EmailClientConfig {
    to: string;
}

const MAX_ATTEMPTS_FOR_EMAILS = 5;
class EmailQueueManager {
    private running = false;
    private emailQueue = new Map();

    private handleRemoveSentEmailsFromQueue(sentEmailUuids: string[]) {
        // delete each sent out email from queue
        sentEmailUuids.forEach((sentEmailUuid) => {
            this.emailQueue.delete(sentEmailUuid);
        });

        // set state to "ready to process"
        this.running = false;
    }

    private async handleProcessEmailQueue() {
        // determines if queue is currently being executed
        this.running = true;
        const processedEmails: Record<string, boolean> = {};
        // list of successfully sent emails
        const sentEmailUuids: string[] = [];

        // do nothing if currently being executed, so as
        // to not send out duplicate emails
        if (!this.emailQueue.size) {
            this.running = false;
            return;
        }

        const emails = [...this.emailQueue];

        for (let i = 0; i < emails.length; i++) {
            const [emailUuid, { attempts, emailLogEvent, ...emailConfig }] = emails[i];

            emailClient(emailConfig)
                .then((result) => {
                    sentEmailUuids.push(emailUuid);

                    // maybe do something else
                })
                .catch((err) => {
                    if (attempts > MAX_ATTEMPTS_FOR_EMAILS) {
                        // remove from queue permanently if
                        // attempt count exceeds what is allowed
                        this.emailQueue.delete(emailUuid);
                    } else {
                        // send back to queue and attempt to send email out again
                        this.emailQueue.set(emailUuid, {
                            attempts: attempts + 1,
                            emailLogEvent,
                            ...emailConfig,
                        });
                    }
                })
                .finally(() => {
                    // flag email as "processed", regardless of whether
                    // or not is was successfully sent
                    processedEmails[emailUuid] = true;
                });
        }

        // check every 250ms if all emails in queue
        // were processed (not necessarily successfully sent out)
        const checkProcessedEmailsInterval = setInterval(() => {
            if (Object.values(processedEmails).every(Boolean)) {
                this.handleRemoveSentEmailsFromQueue(sentEmailUuids);
                clearInterval(checkProcessedEmailsInterval);
            }
        }, 250);
    }
    public addEmailToQueue({
        emailConfig,
        emailLogEvent,
    }: {
        emailConfig: EmailClientConfig;
        emailLogEvent: string;
    }) {
        // add email to queue
        this.emailQueue.set(randomUuid(), { attempts: 0, emailLogEvent, ...emailConfig });
        // and immediately process the queue
        this.handleProcessEmailQueue();
    }

    public initialize() {
        // check every minute if there are
        // emails in queue that need to be processed
        setInterval(() => {
            if (!this.running) {
                this.handleProcessEmailQueue();
            }
        }, 1000 * 60);
    }
}

const emailQueueManager = new EmailQueueManager();

// instantiate the interval
emailQueueManager.initialize();

export default emailQueueManager;

And I use this like so:

import emailQueueManager from 'src/managers/queues/emailQueue';

. . .
async function someEndpoint(req, res) {
    // do stuff
    emailQueueManager.addEmailToQueue({
        emailConfig: {
            to: 'test@test.com',
        },
        emailLogEvent: 'sent_email',
    });
    // do more stuff

    res.end()
}
. . .
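One detail worth noting about handleProcessEmailQueue above (an observation about a likely cause, not a confirmed fix): processedEmails only receives keys for emails whose promises have already settled, and Array.prototype.every is vacuously true on an empty array. The 250ms check can therefore pass before any email has actually been processed, flipping this.running back to false while sends are still in flight:

// every() over an empty array is vacuously true, so this check passes even
// when none of the emailClient() promises have settled yet.
const processedEmails = {};
console.log(Object.values(processedEmails).every(Boolean)); // true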


from Email queue keeps sending out duplicate emails

How properly store and load own embeddings in Redis vector db

Here is some simple code to use Redis and embeddings, but it's not clear how I can build and load my own embeddings and then pull them from Redis and use them in a search.

But I want to load text from e.g. a text file, create embeddings, and load them into Redis for later use.

Does anyone have good example to implement this approach?



from How properly store and load own embeddings in Redis vector db

Incorrect image matching results despite differences (human fingerprints)

I want to use Python to compare two images to check whether they are the same or not. I want to use this for fingerprint functionality in a Django app, to validate whether the provided fingerprint matches the one stored in the database. I have decided to use OpenCV for this purpose, utilizing ORB_create with detectAndCompute and passing the provided fingerprint to the BFMatcher. However, with the code below, when attempting to match the images, it consistently returns True even though the provided images are not the same, along with the print statement "Images are the same".

def compared_fingerprint(image1, image2):

    finger1 = cv2.imread(image1, cv2.IMREAD_GRAYSCALE)
    finger2 = cv2.imread(image2, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()

    keypoints1, descriptors1 = orb.detectAndCompute(finger1, None)
    keypoints2, descriptors2 = orb.detectAndCompute(finger2, None)

    bf = cv2.BFMatcher()
    matches = bf.match(descriptors1, descriptors2)
    threshold = 0.7

    similar = len(matches) > threshold * len(keypoints1)

    if similar:
        print('Images are the same')
        return similar
    else:
        print('Images are not the same')
        return similar

result = compared_fingerprint('c.jpg', 'a.jpg')
print(result)

With the provided images, the function is supposed to return the second statement, since they are not the same. I thought it was the threshold assigned to 0.7, and when I increased the threshold to 1.7, it returned the second statement saying "Images are not the same" False; but when I try to make the images the same, I mean result = compared_fingerprint('a.jpg', 'a.jpg'), it still returns "Images are not the same" False.

a.jpg

this is an a image

c.jpg

this is an c image
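As an aside (a hedged sketch, not a verified fingerprint matcher): len(matches) from bf.match() grows with the number of keypoints rather than with match quality, since bf.match returns one best match per descriptor, which would explain the always-True result. A common refinement is knnMatch with Lowe's ratio test, counting only clearly better matches; the 0.75 ratio and min_good_matches=30 below are arbitrary assumptions and compare_fingerprints is a hypothetical helper, not a drop-in replacement:

import cv2

def compare_fingerprints(image1, image2, min_good_matches=30):
    finger1 = cv2.imread(image1, cv2.IMREAD_GRAYSCALE)
    finger2 = cv2.imread(image2, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(finger1, None)
    kp2, des2 = orb.detectAndCompute(finger2, None)

    # ORB produces binary descriptors, so Hamming distance is the usual norm
    # (the default BFMatcher() norm is NORM_L2, which is meant for SIFT/SURF).
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = bf.knnMatch(des1, des2, k=2)

    # Lowe's ratio test: keep a match only if it is clearly better than the
    # second-best candidate.
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    return len(good) >= min_good_matches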



from Incorrect image matching results despite differences (human fingerprints)

Deadlock with asyncio.Semaphore

I have asyncio code that sometimes freezes in a deadlock, which should not be possible in my opinion. As reality always wins over theory, I must obviously be missing something. Can somebody spot a problem in the following code and tell me why it is possible at all that I can run into a deadlock?

async def main():
    sem = asyncio.Semaphore(8)
    loop = asyncio.get_running_loop()
    tasks = []

    #   Wrapper around the 'do_the_work' function to make sure that the
    #   semaphore is released in every case. In my opinion it should be
    #   impossible to leave this code without releasing the semaphore.
    #
    #   But as I can observe a deadlock in real life, I must be missing
    #   something!?
    async def task(**params):
        try:
            return await do_the_work(**params)
        finally:
            #   Whatever happens in do_the_work (that does not crash the whole
            #   interpreter), the semaphore should be released.
            sem.release()

    for params in jobs:
        #   Without the wait_for my code freezes at some point. The do_the_work
        #   function does not take too long, so the 10min timeout is
        #   unrealistically high and just a plausibility check to "prove" the
        #   deadlock.
        try:
            await asyncio.wait_for(sem.acquire(), 60*10)
        except TimeoutError as e:
            raise RuntimeError("Deadlock?") from e

        #   Start the task. Due to the semaphore there can only be 8 tasks
        #   running at the same time.
        tasks.append(loop.create_task(task(**params)))

        #   Check tasks which are already done for an exception. If there was
        #   one just stop immediately and raise it.
        for t in tasks:
            if t.done():
                e = t.exception()
                if e:
                    raise e

    #   If I reach this point, all tasks were scheduled and the results are
    #   ready to be consumed.
    for result in await asyncio.gather(*tasks):
        handle_result(result)


from Deadlock with asyncio.Semaphore

Sunday 26 November 2023

Creating an ensemble of classifiers based on predefined feature subsets

The following MWE creates an ensemble method from the features selected using SelectKBest algorithm and RandomForest classifier.

# required import
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.pipeline import Pipeline

# ensemble created from features selected
def get_ensemble(n_features):
  # define base models
  models = []
  # enumerate the features in the training dataset
  for i in range(1, n_features + 1):
    # feature selection transform
    fs = SelectKBest(score_func=f_classif, k=i)
    # create the model
    model = RandomForestClassifier(n_estimators=50)
    # create the pipeline
    pipe = Pipeline([('fs', fs), ('m', model)])
    # list of tuple of models for voting
    models.append((str(i), pipe))

  # define the voting ensemble
  ensemble_clf = VotingClassifier(estimators=models, voting='hard')

  return ensemble_clf

So, to use the ensemble model:

# generate data for a 3-class classification
X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                             n_informative=3)

X = pd.DataFrame(X, columns=list('ABCDEFGHIJ'))

X_train, X_test, y_train, y_test = train_test_split(X, y,
         test_size=0.3, random_state=42)

X_train.head()
       A       B       C       D       E       F       G       H       I      J
541  0.1756 -0.3772 -1.6396 -0.7524  0.2138  0.3113 -1.4906 -0.2885  0.1226  0.2057
440 -0.4381 -0.3302  0.7514 -0.4684 -1.2477 -0.5081 -0.7934 -0.3138  0.8423 -0.4038
482 -0.6648  1.2337 -0.2878 -1.6737 -1.2377 -0.4479 -1.1843 -0.2424 -0.9935 -1.4537
422  0.6099  0.2475  0.9612 -0.7339  0.6926 -1.5761 -1.6061 -0.3879 -0.1895  1.3738
778 -1.4893  0.5234  1.6126  0.8704 -2.7363 -1.3818 -0.2196 -0.7894 -1.1755 -2.8779

# get the ensemble model
ensemble_clssifier = get_ensemble(X_train.shape[1])

ensemble_clssifier.fit(X_train, y_train)

This creates 10 base models (n_features=10) and then an ensemble VotingClassifier based on majority voting (voting='hard').

Question:

The MWE described above works fine. However, I would like to replace the SelectKBest feature selection process in the get_ensemble function.

I have conducted a different feature selection process, and discovered the "optimal" feature subset for each class in this dataset as follows:

             | best predictors
-------------+-------------------
   class 0   |  A, B, C
   class 1   |  D, E, F, G
   class 2   |  G, H, I, J
-------------+-------------------

So the modification I would like to make to get_ensemble is that, instead of iterating over the number of available features and creating n base models, it should create 3 (the number of classes) base models, where:

  • base-model 1 will be fitted using the feature subset ['A', 'B', 'C'].

  • base-model 2 will be fitted using the feature subset ['D', 'E', 'F', 'G'].

  • base-model 3 will be fitted using the feature subset ['G', 'H', 'I', 'J'].

  • finally the ensemble_classifier based on majority voting of the sub-models output.

That is, when I make the call to:

ensemble_clssifier.fit(X_train, y_train)

It proceeds like so:

# 1st base model on fitted on its feature subset
model.fit(X_train[['A', 'B', 'C']], y_train)
# 2nd base model
model.fit(X_train[['D', 'E', 'F', 'G']], y_train)
# 3rd model also
model.fit(X_train[['G', 'H', 'I', 'J']], y_train)

This scenario should apply during prediction as well, making sure each base model selects the appropriate feature subset from X_test to make its prediction in ensemble_clssifier.predict(X_test) before the final voting.

I am not sure how to proceed. Any ideas?
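One way this could be approached (a sketch under the assumption that the per-class feature subsets are fixed up front, not a confirmed solution) is to keep the VotingClassifier but replace SelectKBest with a ColumnTransformer that simply passes the named columns through, so each base model only ever sees its own subset during both fit and predict:

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.pipeline import Pipeline

def get_ensemble_from_subsets(feature_subsets):
    # feature_subsets: {class_label: [column names]} as in the table above
    models = []
    for class_label, cols in feature_subsets.items():
        # "passthrough" keeps the named columns and drops all others
        fs = ColumnTransformer([("keep", "passthrough", cols)])
        model = RandomForestClassifier(n_estimators=50)
        models.append((str(class_label), Pipeline([("fs", fs), ("m", model)])))
    return VotingClassifier(estimators=models, voting="hard")

ensemble_clf = get_ensemble_from_subsets({
    0: ["A", "B", "C"],
    1: ["D", "E", "F", "G"],
    2: ["G", "H", "I", "J"],
})
# ensemble_clf.fit(X_train, y_train) then works exactly as before; note that
# each base model is still trained on all classes, just with fewer columns.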

EDIT

Regarding this question, I made some changes (e.g. not using the VotingClassifier) to further train the final ensemble on the output of the base models (the base models' confidences), and then finally make predictions.

I created the following ensemble class:

from sklearn.base import clone

class CustomEnsemble:
    def __init__(self, base_model, best_feature_subsets):
        self.base_models = {class_label: clone(base_model) for class_label in best_feature_subsets}
        self.best_feature_subsets = best_feature_subsets
        self.final_model = base_model

    def train_base_models(self, X_train, y_train):
        for class_label, features in self.best_feature_subsets.items():
            model = self.base_models[class_label]
            model.fit(X_train[features], (y_train == class_label))
        
        return self
    
    def train_final_model(self, X_train, y_train):
        """
        Probably better to implement the train methods (base models & ensemble)
        in one method suc as the train_base_models  altogether.
        """
        predictions = pd.DataFrame()

        for class_label, model in self.base_models.items():
            predictions[class_label] = model.predict_proba(X_train[self.best_feature_subsets[class_label]])[:, 1]

        self.final_model.fit(predictions, y_train)


    def predict_base_models(self, X_test):
        predictions = pd.DataFrame()

        for class_label, model in self.base_models.items():
            predictions[class_label] = model.predict_proba(X_test[self.best_feature_subsets[class_label]])[:, 1]

        return predictions

    def predict(self, X_test):
        base_model_predictions = self.predict_base_models(X_test)
        return self.final_model.predict(base_model_predictions)

    def predict_proba_base_models(self, X_test):
        predictions = pd.DataFrame()

        for class_label, model in self.base_models.items():
            predictions[class_label] = model.predict_proba(X_test[self.best_feature_subsets[class_label]])[:, 1]

        return predictions

    def predict_proba(self, X_test):
        base_model_predictions = self.predict_proba_base_models(X_test)
        return self.final_model.predict_proba(base_model_predictions)

Usage:

  1. Define dictionary of best feature subsets for classes:
optimal_features = {
    0: ['A', 'B', 'C'],

    1: ['D', 'E', 'F', 'G'],

    2: ['G', 'H', 'I', 'J']
}

  2. Instantiate the class:
classifier = RandomForestClassifier()
ensemble   = CustomEnsemble(classifier, optimal_features)
  3. Train models:
# first, train base models
ensemble.train_base_models(X_train, y_train)
# then, train the ensemble
ensemble.train_final_model(X_train, y_train)
  4. Make predictions:
yhat = ensemble.predict(X_test)
yhat_proba = ensemble.predict_proba(X_test) # so as to calculate roc_auc_score() 
  1. However, it appears I am not doing things right. I am not training the ensemble on the output of base models, but on the original input features.

  2. Also, I am not sure if separating train_base_models() and train_final_model() is the best approach (this implies fitting twice: base models then final model as in the usage). Or better to combine these into one method (say train_ensemble()).



from Creating an ensemble of classifiers based on predefined feature subsets

Meteor Blaze dynamic template CSS results in duplicate element IDs

I was given an old Bootstrap template to make dynamic, and I had issues with dynamic CSS using ::before and ::after selectors.


The core issue is that it works fine for a single usage/class, but when I tried to use the same CSS for dynamic templates (more than one) it behaved strangely. Initially I noticed that the ID was the same, so I tried to convert it to a class, but the issue was not resolved. Then I realized that the issue only comes with the ::before and ::after selectors; all other template classes work fine. So I tried to add that CSS dynamically in combination with unique document IDs, and I even ended up with a dynamic style tag generated with JS, but that seems like a very hacky solution.

h4, h5, h6,
h1, h2, h3 {margin: 0;}
ul, ol {margin: 0;}
p {margin: 0;}

html, body{
    font-family: 'Roboto Condensed', sans-serif;
    font-size: 100%;
    overflow-x: hidden;
    background: #FFFFFF;
}

#page-wrapper3 .widget-shadow {
    background-color: #fff;
    box-shadow: 0 -1px 3px rgba(0,0,0,0),0 1px 2px rgba(0,0,0,0);
    -webkit-box-shadow: 0 -1px 3px rgba(0,0,0,0),0 1px 2px rgba(0,0,0,0);
    -moz-box-shadow: 0 -1px 3px rgba(0,0,0,0),0 1px 2px rgba(0,0,0,0);
}
#page-wrapper3 .login-top {
    padding: 1.5em;
    border-bottom: 0 solid #DED9D9!important;
    text-align: center;
}

#wrapper {
    width: 100%;
}
#page-wrapper {
    padding:7em 2em 2.5em 12em;
    background-color: #EFF4FA;
}
#page-wrapper2 {
    padding:7em 2em 2.5em 2em;
    background-color: #EFF4FA;
}
#page-wrapper3 {
    padding:4em 2em 2.5em 2em;
    background-color: #FFFFFF;
}

#page-wrapper3 .login-page{
    width: 70%!important;
}

.selecOpc3{
    width: 100%;
    display: block;
    margin: 10px 40px 40px;
}

.selecOpc3 div{
    display: flex;
    justify-content: space-between;
    width: 100%;
}

.selecOpc3 .dRati{
    display: inline-block;
    position: relative;
}

.selecOpc3 .dRati .escala{
    display: grid;
    grid-template-columns: auto auto auto auto auto;
    margin: 25px auto;
}

.selecOpc3 img{
    width: 35px;
}


.textCuest1{
    margin: 10px 0px 0px;
    text-align: left;
}

.rating, .rating1, .escala {
  position: absolute;
  top: 50%;
  left: 50%;
  transform: translate(-50%, -50%) rotate(180deg);
  display: flex;
  justify-content: space-around;
}

.rating input, .rating1 input, .escala input {
  display: none;
}


.escala label {
    display: inline-grid;
    cursor: pointer;
    width: 40px;
    transform: rotate(180deg);
    margin: 5px;
}


/* here is the issue which I had to resolve dynamically */
.escala label[for=escala1]::before {
  content: '1';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala2]::before {
  content: '2';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala3]::before {
  content: '3';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala4]::before {
  content: '4';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala5]::before {
  content: '5';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala6]::before {
  content: '6';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala7]::before {
  content: '7';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala8]::before {
  content: '8';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala9]::before {
  content: '9';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala10]::before {
  content: '10';
  position: relative;
  display: block;
  border: 1px solid;
    border-radius: 50px;
    padding: 7px;
}

.escala label[for=escala1]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '1';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala2]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '2';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala3]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '3';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala4]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '4';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala5]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '5';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala6]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '6';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala7]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '7';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala8]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '8';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala9]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '9';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}

.escala label[for=escala10]::after {
  position: absolute;
  top: 0px;
  opacity: 0;
  transition: 0.5s;
  content: '10';
    width: 40px;
    border: 1px solid #44987b;
    border-radius: 50px;
    padding: 7px;
    background-color: rgb(68 152 123 / 17%);
    color: #44987b;
}


.escala label:hover::after,
.escala label:hover ~ label::after,
.escala input:checked ~ label::after {
  opacity: 1;
}
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
<div class="selecOpc3">
    <div class="dRati" id="scale">
        <div class="escala">
            <input type="radio" name="escala" id="escala10" value="10"><label for="escala10"></label>
            <input type="radio" name="escala" id="escala9" value="9"><label for="escala9"></label>
            <input type="radio" name="escala" id="escala8" value="8"><label for="escala8"></label>
            <input type="radio" name="escala" id="escala7" value="7"><label for="escala7"></label>
            <input type="radio" name="escala" id="escala6" value="6"><label for="escala6"></label>
            <input type="radio" name="escala" id="escala5" value="5"><label for="escala5"></label>
            <input type="radio" name="escala" id="escala4" value="4"><label for="escala4"></label>
            <input type="radio" name="escala" id="escala3" value="3"><label for="escala3"></label>
            <input type="radio" name="escala" id="escala2" value="2"><label for="escala2"></label>
            <input type="radio" name="escala" id="escala1" value="1"><label for="escala1"></label>
        </div>
    </div>
</div>

<br />
<hr />
<br />

<div class="selecOpc3">
    <div class="dRati" id="scale">
        <div class="escala">
            <input type="radio" name="escala" id="escala10" value="10"><label for="escala10"></label>
            <input type="radio" name="escala" id="escala9" value="9"><label for="escala9"></label>
            <input type="radio" name="escala" id="escala8" value="8"><label for="escala8"></label>
            <input type="radio" name="escala" id="escala7" value="7"><label for="escala7"></label>
            <input type="radio" name="escala" id="escala6" value="6"><label for="escala6"></label>
            <input type="radio" name="escala" id="escala5" value="5"><label for="escala5"></label>
            <input type="radio" name="escala" id="escala4" value="4"><label for="escala4"></label>
            <input type="radio" name="escala" id="escala3" value="3"><label for="escala3"></label>
            <input type="radio" name="escala" id="escala2" value="2"><label for="escala2"></label>
            <input type="radio" name="escala" id="escala1" value="1"><label for="escala1"></label>
        </div>
    </div>
</div>
</body>
</html>

I added sample template code (extracted from a big template here) https://github.com/raza2022/dynamic-css-sample

the live demo is here https://dynamic-css.meteorapp.com/

You can test it: whenever the user selects the second question, only the first one's selected CSS applies (I want both to behave independently), as I have other types (smilies/stars) as well. The version with the hacky solution is https://dynamic-css.meteorapp.com/v1
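One possible direction (a sketch of an assumption, not a tested fix for the Blaze template): the hover/checked behaviour already relies on sibling selectors, so the only ID-dependent CSS is the numbering in the [for=escalaN] rules. If each label carried its own number in a data attribute (e.g. <label for="escala10" data-value="10">), the twenty per-number rules could collapse into two generic ones; the input/label IDs would still need to be unique per template instance for the radio association, but the CSS would no longer have to enumerate them:

/* Hypothetical replacement for the per-number [for=escalaN] rules above,
   assuming each label gets a data-value attribute in the template markup. */
.escala label::before {
  content: attr(data-value);
  position: relative;
  display: block;
  border: 1px solid;
  border-radius: 50px;
  padding: 7px;
}

.escala label::after {
  content: attr(data-value);
  position: absolute;
  top: 0;
  opacity: 0;
  transition: 0.5s;
  width: 40px;
  border: 1px solid #44987b;
  border-radius: 50px;
  padding: 7px;
  background-color: rgb(68 152 123 / 17%);
  color: #44987b;
}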



from Meteor Blaze dynamic template CSS results in duplicate element IDs

Proxy won't connect after adding user directory and profile directory

I am having trouble connecting to this proxy with seleniumwire; it will connect and ask me for a user and password if I remove the line below. And yes, I have the actual directory and proxy in the real code; they are changed here for privacy reasons.

chrome_options.add_argument(f'--profile-directory={profile_directory}')

I'm not sure if maybe the profile directory changes the proxy or something, but it will only connect if that line is removed. How can I make it connect and still use the profile directory?

from seleniumwire import webdriver

proxy_address = "your_proxy_address:your_proxy_port"
user_data_dir = "path/to/your/user/data/directory"
profile_directory = "YourProfileDirectory"

chrome_options = webdriver.ChromeOptions()

proxy_options = {
    'proxy': {
        'http': f'http://{proxy_address}',
        'https': f'https://{proxy_address}',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

chrome_options.add_argument(f'--user-data-dir={user_data_dir}')
chrome_options.add_argument(f'--profile-directory={profile_directory}')

driver = webdriver.Chrome(seleniumwire_options=proxy_options, chrome_options=chrome_options)


driver.get("https://myurl.com/")


from Proxy won't connect after adding user directory and profile directory

JavaScript - Attach context to async function call?

Synchronous function call context

In JavaScript, it's easy to associate some context with a synchronous function call by using a stack in a global scope.

// Context management

let contextStack = [];
let context;

const withContext = (ctx, func) => {
  contextStack.push(ctx);
  context = ctx;

  try {
    return func();
  } finally {
    context = contextStack.pop();
  }
};

// Example

const foo = (message) => {
  console.log(message);
  console.log(context);
};

const bar = () => {
  withContext("calling from bar", () => foo("hello"));
};

bar();

This allows us to write context-specific code without having to pass around a context object everywhere and have every function we use depend on this context object.

This is possible in JavaScript because of the guarantee of sequential code execution, that is, these synchronous functions are run to completion before any other code can modify the global state.

Generator function call context

We can achieve something similar with generator functions. Generator functions give us an opportunity to take control just before conceptual execution of the generator function resumes. This means that even if execution is suspended for a few seconds (that is, the function is not run to completion before any other code runs), we can still ensure that there is an accurate context attached to its execution.

const iterWithContext = function* (ctx, generator) {
  // not a perfect implementation

  let iter = generator();
  let reply;

  while (true) {
    const { done, value } = withContext(ctx, () => iter.next(reply));
    
    if (done) {
      return;
    }
    
    reply = yield value;
  }
};

Question: Async function call context?

It would also be very useful to attach some context to the execution of an async function.

const timeout = (ms) => new Promise(res => setTimeout(res, ms));

const foo = async () => {
  await timeout(1000);
  console.log(context);
};

const bar = async () => {
  await asyncWithContext("calling from bar", foo);
};

The problem is, to the best of my knowledge, there is no way of intercepting the moment before an async function resumes execution, or the moment after the async function suspends execution, in order to provide this context.

Is there any way of achieving this?

My best option right now is to not use async functions, but to use generator functions that behave like async functions. But this is not very practical as it requires the entire codebase to be written like this.
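For completeness, in Node.js this is exactly what AsyncLocalStorage from the built-in async_hooks module provides: the store set by run() follows the asynchronous execution across awaits. A minimal sketch (Node-only; it does not help in browsers, which is what the bounty below is about):

const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage();

const asyncWithContext = (ctx, func) => als.run(ctx, func);
const getContext = () => als.getStore();

const timeout = (ms) => new Promise((res) => setTimeout(res, ms));

const foo = async () => {
  await timeout(1000);
  console.log(getContext()); // "calling from bar", even after the await
};

const bar = async () => {
  await asyncWithContext('calling from bar', foo);
};

bar();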

Background / Motivation

Using context like this is incredibly valuable because the context is available deep down the call-stack. This is especially useful if a library needs to call an external handler such that if the handler calls back to the library, the library will have the appropriate context. For example, I'd imagine React hooks and Solid.js extensively use context in this way under-the-hood. If not done this way, the programmer would have to pass a context object around everywhere and use it when calling back to the library, which is both messy and error-prone. Context is a way to neatly "curry" or abstract away a context object from function calls, based on where we are in the call stack. Whether it is good practice or not is debatable, but I think we can agree that it's something library authors have chosen to do. I would like to extend the use of context to asynchronous functions, which are supposed to conceptually behave like synchronous functions when it comes to the execution flow.

Bounty

I'm only realizing now that the previously-accepted answer does not work in browsers, as browsers do not implement async stack traces. I hope that an alternative hack is possible and so I am starting another bounty.



from JavaScript - Attach context to async function call?

Saturday 25 November 2023

More than 6 characters string repeated

I am trying to find the repeated strings (not words) from text.

x = 'This is a sample text and this is lowercase text that is repeated.'

In this example, the string ' text ' should not be returned because only 6 characters match one another. But the string 'his is ' is the expected return value.

I tried using range, Counter and regular expressions.

import re
from collections import Counter

duplist = list()
for i in range(1, 30):
  mylist = re.findall('.{1,'+str(i)+'}', x)
  duplist.append([k for k,v in Counter(mylist).items() if v>1])
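For what it's worth, here is a brute-force sketch of one interpretation (assuming 'repeated' means any substring longer than 6 characters that occurs at least twice anywhere in the text, keeping only substrings not already contained in a longer hit). It is O(n^3), so only practical for short texts:

x = 'This is a sample text and this is lowercase text that is repeated.'

def repeated_substrings(text, min_len=7):
    found = set()
    n = len(text)
    # check longer substrings first so shorter ones inside them are skipped
    for length in range(n, min_len - 1, -1):
        for start in range(n - length + 1):
            sub = text[start:start + length]
            if text.count(sub) > 1 and not any(sub in f for f in found):
                found.add(sub)
    return found

print(repeated_substrings(x))  # contains 'his is ' (and 'e text ')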



from More than 6 characters string repeated

Code splitting not working with parcel in react

I have this component, which according to Parcel's documentation should work by default.

import React from 'react';

const Router = () => {
 ...
  const ComponentToRender = React.lazy(() => import(`./Pages/${variableCompoentName}/index.js`));
  return (
    <React.Suspense fallback={<div>Loading...</div>}>
      <ComponentToRender />
    </React.Suspense>
  );
};

export default Router;

But I get an error in the browser

Failed to load module script: Expected a JavaScript module script but the server responded with a MIME type of "text/html". Strict MIME type checking is enforced for module scripts per HTML spec.

2src.7ed060e2.js:1362 Uncaught TypeError: Failed to fetch dynamically imported module: http://localhost:1234/Pages/Home/index.js

What is the way around this, and what am I doing wrong?

Here is my package.json:

  "dependencies": {
    "@babel/core": "^7.12.3",
    "@babel/preset-env": "^7.12.1",
    "@babel/preset-react": "^7.12.5",
    "babel-plugin-transform-class-properties": "^6.24.1",
    "parcel-bundler": "^1.12.5",
    "react": "^16.9.0 || ^17.0.0 || ^18",
    "react-dom": "^18.2.0",
     ...
  },
  "scripts": {
    "start": "parcel public/index.html",
    "build": "parcel build public/index.html",
  }


from Code splitting not working with parcel in react

Friday 24 November 2023

How can I communicate with 2 reactjs components - using Jotai

I have 2 reactjs files:

  • Reports.js (used to request report and display the result)
  • AuthContext.js (has a socket connection to maintain communication with the backend server)

The user first goes to the report page generated by Reports.js, and then there is a call to the backend server which returns right away while a loading spinner is shown. When the report is completed, it will send the data to AuthContext.js. However, I have trouble getting AuthContext.js to call setReportLoading() in Reports.js in order to stop the loading spinner. Can you please advise how to solve this?

I tried the setReportLoading() method below, but it gives this error:

setReportLoading.js:115 Uncaught TypeError: setReportLoading is not a function

here is my code snippet

In file AuthContext.js
import { setReportLoading } from 'Reports';

export const AuthProvider = ({ children }) => {

            socket.on('processmessage', (msg) => {
              if (msg.type == 'checking'){
                  setReportLoading(2);         
              }            
            }); 
   
}

In file Reports.js

const Reports = () => {

    const [loading, setLoading] = useState(1)

    const setReportLoading = (isLoading) => {
        setLoading(isLoading);
      };

      const renderContent = () => {

        if (loading === 1) return (
            <div className="d-flex justify-content-center align-items-center" style=>
                <div className="spinner-border text-primary" role="status">
                    <span className="visually-hidden">{t('loc_react_loading')}</span>
                </div>
            </div>
        )


        if (loading === 2) return (
            <div className="mb-4 card border-0 shadow-sm" style=min-width>
                {renderReportSelection()}
                {showReport && renderReport()}
            </div>   
        )
    }


    return (
        <DashboardLayout>
        </DashboardLayout>
    )
}

export default Reports;

UPDATE: I tried using jotai 2.0.0 and somehow there is no crash, but it didn't seem to reach the handleReportloadingChange() function.

in AuthContext.js

import { atom, useAtom } from 'jotai';

export const reportloadingAtom = atom(1);

export const AuthProvider = ({ children }) => {
  const [reportloading, setReportloading] = useAtom(reportloadingAtom);

  socket.on('processmessage', (msg) => {
    if (msg.type == 'checking') {
      setReportloading(2);
    }
  });

  // ...
};

in Reports.js

import { useAtom } from 'jotai';
import { reportloadingAtom } from 'AuthContext'

const Reports = () => {
  const [reportloading, setReportloading] = useAtom(reportloadingAtom);
  const [loading, setLoading] = useState(1);

  useEffect(() => {
    // This function will run every time the value of reportloadingAtom changes
    function handleReportloadingChange() {
      console.log("handleReportloadingChange")
      setLoading(reportloading);
    }

  }, [reportloading]);

  // ...
};
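One thing that stands out in the AuthContext.js snippet above (an observation/assumption, not a confirmed fix): socket.on is called directly in the component body, so a new listener is attached on every render of AuthProvider. A minimal sketch of registering it once via useEffect, with socket being the existing connection from the question:

import { useEffect } from 'react';
import { atom, useAtom } from 'jotai';

export const reportloadingAtom = atom(1);

export const AuthProvider = ({ children }) => {
  const [, setReportloading] = useAtom(reportloadingAtom);

  useEffect(() => {
    const handler = (msg) => {
      if (msg.type === 'checking') setReportloading(2);
    };
    socket.on('processmessage', handler);
    // clean up so handlers do not accumulate across re-renders/unmounts
    return () => socket.off('processmessage', handler);
  }, [setReportloading]);

  // ...
};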


from How can I communicate with 2 reactjs components - using Jotai

Logarithmic heatmap in Plotly

I am using heatmap from Plotly. I want to use a logarithmic scale for the color but cannot find how to do so. Here is a MWE:

import plotly.graph_objects as go
import numpy as np

z = [[1e-4,1e-3,1e-2],
    [1e-1, 1, 1e1],
    [1e2, 1e3, 1e4]]

go.Figure(
    data = go.Heatmap(
        z = z,
    )
).show()

go.Figure(
    data = go.Heatmap(
        z = np.log(z),
    )
).show()

In the MWE I manually calculate the logarithm of the data. I want the color map to be shown as in the second figure but without having to manually transform the data, and also displaying the real z values in the color scale, not the logarithm.
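As far as I can tell, Heatmap does not expose a log option for the z color axis, but the manual-transform approach can at least be made to display the real z values by relabelling the colorbar ticks. A sketch of that workaround (the tick range is an assumption matching the sample data):

import plotly.graph_objects as go
import numpy as np

z = [[1e-4, 1e-3, 1e-2],
     [1e-1, 1, 1e1],
     [1e2, 1e3, 1e4]]

tick_exponents = np.arange(-4, 5)  # exponents covered by the sample data
go.Figure(
    data = go.Heatmap(
        z = np.log10(z),
        colorbar = dict(
            tickvals = tick_exponents,
            ticktext = [f"1e{e}" for e in tick_exponents],
        ),
    )
).show()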



from Logarithmic heatmap in Plotly

Thursday 23 November 2023

What events are fired from the Selectmenu Widget when the refresh method is run?

I am using jQuery to attempt to update a drop-down list however the appearance of the list is not properly reflecting the change. Running the refresh method is often recommended:

$("#ddlExample").selectmenu('refresh');

to correct this; however, when I do so, other form elements unrelated to the specific list are altered. There are multiple JavaScript business rules on the page, and it occurs to me that the refresh method may be firing events I am unaware of and triggering these.

Has anyone else experienced unexpected side effects when using this method?



from What events are fired from the Selectmenu Widget when the refresh method is run?

PySpark: CumSum with Salting over Window w/ Skew

How can I use salting to perform a cumulative sum window operation? While this is a tiny sample, my id column is heavily skewed and I effectively need to perform this operation on it:

window_unsalted = Window.partitionBy("id").orderBy("timestamp")  

# expected value
df = df.withColumn("Expected", F.sum('value').over(window_unsalted))

However, I want to try salting because at the scale of my data, I cannot compute it otherwise.

Consider this MWE. How can I replicate the expected value, 20, using salting techniques?

from pyspark.sql import functions as F  
from pyspark.sql.window import Window  

data = [  
    (7329, 1636617182, 1.0),  
    (7329, 1636142065, 1.0),  
    (7329, 1636142003, 1.0),  
    (7329, 1680400388, 1.0),  
    (7329, 1636142400, 1.0),  
    (7329, 1636397030, 1.0),  
    (7329, 1636142926, 1.0),  
    (7329, 1635970969, 1.0),  
    (7329, 1636122419, 1.0),  
    (7329, 1636142195, 1.0),  
    (7329, 1636142654, 1.0),  
    (7329, 1636142484, 1.0),  
    (7329, 1636119628, 1.0),  
    (7329, 1636404275, 1.0),  
    (7329, 1680827925, 1.0),  
    (7329, 1636413478, 1.0),  
    (7329, 1636143578, 1.0),  
    (7329, 1636413800, 1.0),  
    (7329, 1636124556, 1.0),  
    (7329, 1636143614, 1.0),  
    (7329, 1636617778, -1.0),  
    (7329, 1636142155, -1.0),  
    (7329, 1636142061, -1.0),  
    (7329, 1680400415, -1.0),  
    (7329, 1636142480, -1.0),  
    (7329, 1636400183, -1.0),  
    (7329, 1636143444, -1.0),  
    (7329, 1635977251, -1.0),  
    (7329, 1636122624, -1.0),  
    (7329, 1636142298, -1.0),  
    (7329, 1636142720, -1.0),  
    (7329, 1636142584, -1.0),  
    (7329, 1636122147, -1.0),  
    (7329, 1636413382, -1.0),  
    (7329, 1680827958, -1.0),  
    (7329, 1636413538, -1.0),  
    (7329, 1636143610, -1.0),  
    (7329, 1636414011, -1.0),  
    (7329, 1636141936, -1.0),  
    (7329, 1636146843, -1.0)  
]  
  
df = spark.createDataFrame(data, ["id", "timestamp", "value"])  
  
# Define the number of salt buckets  
num_buckets = 100  
  
# Add a salted_id column to the dataframe  
df = df.withColumn("salted_id", (F.concat(F.col("id"),   
                (F.rand(seed=42)*num_buckets).cast("int")).cast("string")))  
  
# Define a window partitioned by the salted_id, and ordered by timestamp  
window = Window.partitionBy("salted_id").orderBy("timestamp")  
  
# Add a cumulative sum column  
df = df.withColumn("cumulative_sum", F.sum("value").over(window))  
  
# Define a window partitioned by the original id, and ordered by timestamp  
window_unsalted = Window.partitionBy("id").orderBy("timestamp")  
  
# Compute the final cumulative sum by adding up the cumulative sums within each original id  
df = df.withColumn("final_cumulative_sum",   
                   F.sum("cumulative_sum").over(window_unsalted))  

# expected value
df = df.withColumn("Expected", F.sum('value').over(window_unsalted))

# incorrect trial
df.agg(F.sum('final_cumulative_sum')).show()

# expected value
df.agg(F.sum('Expected')).show()
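For what it's worth, one pattern that can be sketched here (an assumption about the approach, not a verified solution for this exact data): make the salt deterministic and order-preserving, e.g. a timestamp-range bucket, instead of random. The cumulative sum is then computed inside each (id, bucket) partition, and the totals of all earlier buckets are added back, which only needs a window over one row per bucket:

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Deterministic, order-preserving "salt": a timestamp-range bucket. The bucket
# width (roughly 116 days in seconds) is an arbitrary assumption; it just needs
# to split the skewed id into several partitions.
bucket_width = 10_000_000
df2 = df.withColumn("bucket", F.floor(F.col("timestamp") / F.lit(bucket_width)))

# Cumulative sum inside each (id, bucket) partition -- these stay small.
w_bucket = Window.partitionBy("id", "bucket").orderBy("timestamp")
df2 = df2.withColumn("partial_cumsum", F.sum("value").over(w_bucket))

# Running total of all earlier buckets, computed over one row per bucket.
totals = df2.groupBy("id", "bucket").agg(F.sum("value").alias("bucket_total"))
w_prev = (Window.partitionBy("id").orderBy("bucket")
          .rowsBetween(Window.unboundedPreceding, -1))
totals = totals.withColumn(
    "prev_total", F.coalesce(F.sum("bucket_total").over(w_prev), F.lit(0.0)))

# Final value: within-bucket cumsum plus everything from earlier buckets;
# this should reproduce the "Expected" column row by row.
result = (df2.join(totals.select("id", "bucket", "prev_total"), ["id", "bucket"])
             .withColumn("salted_cumsum", F.col("partial_cumsum") + F.col("prev_total")))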


from PySpark: CumSum with Salting over Window w/ Skew

Wednesday 22 November 2023

Sending an excel file to teams bot in a channel

I have made a Teams bot using the botbuilder SDK 4.0. There is a feature in the bot where the user uploads a file, and the bot collects the download_url and sends it to the backend for the file to be downloaded and processed. This was all working fine until I added the bot to a channel.

I can send and receive messages from the bot without any problems, but the file upload is not working: the file can be uploaded in the channel, but the bot does not receive it and context.activity.attachments.length is 0, indicating that the bot has not received the attachment.

const url = context.activity.attachments[0].content.downloadUrl;

This is the code I use to get the download URL after checking that the attachments length is greater than 0.

I would appreciate any help in getting the download URL for a file uploaded to the bot from a Teams channel.
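
For what it's worth, here is a minimal sketch (Botbuilder JS, with assumptions about the channel payload): in channel messages the attachments array often carries an HTML attachment for the message body alongside the actual file, so filtering by the Teams file-info content type before reading downloadUrl may be more robust than always indexing attachments[0].

// Hypothetical handler body; the content type below is the one Teams uses for
// file attachments that expose a pre-authenticated downloadUrl.
const FILE_INFO_TYPE = 'application/vnd.microsoft.teams.file.download.info';

async function handleMessage(context) {
    const attachments = context.activity.attachments || [];
    const files = attachments.filter(a => a.contentType === FILE_INFO_TYPE);

    if (files.length === 0) {
        await context.sendActivity('No file attachment found in this message.');
        return;
    }

    const url = files[0].content.downloadUrl; // hand this to the backend for download
}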



from Sending an excel file to teams bot in a channel

Flask restx Debug "Unauthorized" response

I have a Flask (2.2.3) app with Flask-RESTX (1.1.0) used as an API (without a frontend). I'm using the flask-azure-oauth library to authenticate users via Azure AD. The setup is:

from flask import Flask, current_app
from flask_azure_oauth import FlaskAzureOauth
from flask_restx import Api

app = Flask(__name__)
api = Api(app, <...>)
CORS(app)
auth = FlaskAzureOauth()
auth.init_app(app)

# App routes
@api.route("/foo")
class FooCollection(Resource):
    @auth('my_role')
    def get(self):
        return [<...>]

This was working fine, but for the past few days I have been receiving Unauthorized responses when passing a valid token. Unfortunately I am not able to track down the reason - tokens seem fine (examined manually or decoded using jwt.ms) and the only response I get from the API is: 401 UNAUTHORIZED with response body { "message": null }.

I tried to add error logging and error handlers:

# Logging request/response
@app.before_request
def log_request_info():
    app.logger.debug(f"{request.method} {request.path} {request.data}")

@app.after_request
def log_response_info(response):
    app.logger.debug(f"{response.status}")
    return response

# Error handling
@app.errorhandler(Unauthorized)
def handle_error(error):
    current_app.logger.debug(f"Oops")
    <...>

@app.errorhandler
def handle_error(error):
    current_app.logger.debug(f"Noooo...!")
    <...>

With this, request and response are logged and non-HTTP exceptions are handled by handle_error. But HTTP errors like 404, 401, ... just pass through, ignored by both the generic error handler and the specific one (@app.errorhandler(Unauthorized)).

So how do I properly intercept and examine them? (with focus on: how do I find out why it denied token authorization)
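
A sketch of one way to intercept these, under the assumption that the auth decorator ultimately raises werkzeug's Unauthorized: Flask-RESTX routes exceptions raised inside Resource methods through its own error handlers, so handlers registered on the Api object (rather than on the app) are the ones that see the 401. If flask-azure-oauth raises a library-specific exception instead, register that class here.

from werkzeug.exceptions import Unauthorized

@api.errorhandler(Unauthorized)
def handle_unauthorized(error):
    current_app.logger.debug(f"Token rejected: {error.description!r}")
    return {"message": error.description}, 401

@api.errorhandler
def handle_any_error(error):
    # default handler for anything not matched by a more specific handler
    current_app.logger.debug(f"Unhandled error: {error!r}")
    return {"message": str(error)}, getattr(error, "code", 500)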



from Flask restx Debug "Unauthorized" response

StackingClassifier with base-models trained on feature subsets

I can best describe my goal using a synthetic dataset. Suppose I have the following:

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# assumed call; the original snippet omitted how X, y were generated
X, y = make_classification(n_samples=1000, n_features=10, n_informative=8,
                           n_classes=3, random_state=42)

df = pd.DataFrame(X, columns=list('ABCDEFGHIJ'))

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.3, random_state=42)

X_train.head()
         A       B           C        D         E       F          G         H       I        J
541 -0.277848 1.022357 -0.950125 -2.100213  0.883638 0.821387  1.154613  0.075376  1.176242 -0.470087
440  1.089665 0.841446 -1.701004 -1.036256 -1.229357 0.345068  1.876470 -0.750067  0.080685 -1.318271
482  0.016010 0.025488 -1.189296 -1.052935 -0.623029 0.669521  1.518927  0.690019 -0.045486 -0.494186
422 -0.133358 -2.16219  1.170989 -0.942150  1.933444 -0.55118 -0.059908 -0.938672 -0.924097 -0.796185
778  0.901954 1.479360 -2.639176 -2.588845 -0.753915 -1.650621 2.727146  0.075260  1.330432 -0.941594

After conducting a feature importance analysis, I discovered that each of the 3 classes in the dataset can best be predicted using a feature subset, as opposed to the whole. For example:

class  | optimal predictors
-------+-------------------
   0   |  A, B, C
   1   |  D, E, F, G
   2   |  G, H, I, J
-------+-------------------

At this point, I would like to use 3 one-vs-rest classifiers to train sub-models (as the base models), one for each class, each using that class's best predictors, and then a StackingClassifier for the final prediction.

I have a high-level understanding of the StackingClassifier, where different base models can be trained (e.g. DT, SVC, KNN, etc.) and a meta-classifier uses another model, e.g. Logistic Regression.

In this case, however, each base model is a DT classifier, only each is to be trained using the feature subset best for its class, as above.

Then, finally, make predictions on X_test.

But I am not sure how this can be done, so I have described my goal using the synthetic data above.

How do I design this to train the base models and make a final prediction?
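
Not an authoritative recipe, but one way to wire this up in scikit-learn is to make each base estimator a Pipeline whose first step is a ColumnTransformer that keeps only that class's preferred columns; StackingClassifier then hands the full DataFrame to every pipeline, so each tree only ever sees its own subset. A sketch, assuming X_train/X_test are the pandas DataFrames from above (for strict one-vs-rest sub-models, each pipeline's tree could additionally be wrapped in OneVsRestClassifier):

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import StackingClassifier

# class -> optimal predictors, from the feature importance analysis
subsets = {
    0: ['A', 'B', 'C'],
    1: ['D', 'E', 'F', 'G'],
    2: ['G', 'H', 'I', 'J'],
}

def subset_tree(cols):
    # keep only `cols`, then fit a decision tree on that subset
    return Pipeline([
        ('select', ColumnTransformer([('keep', 'passthrough', cols)], remainder='drop')),
        ('tree', DecisionTreeClassifier(random_state=42)),
    ])

estimators = [(f'dt_class_{k}', subset_tree(cols)) for k, cols in subsets.items()]

stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method='predict_proba',   # the meta-learner sees each tree's class probabilities
)

stack.fit(X_train, y_train)          # column selection by name needs the DataFrame input
print(stack.score(X_test, y_test))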



from StackingClassifier with base-models trained on feature subsets

Tuesday 21 November 2023

Google Colab Annotate

The bounding box coordinates from colab_utils.annotate are:

np.array([[0.41941667, 0.08333333, 0.89941667, 0.7296875]], dtype=np.float32),

These are the bounding box coordinates from apps like labelImg:

<bndbox>
            <xmin>948</xmin>
            <ymin>537</ymin>
            <xmax>1416</xmax>
            <ymax>650</ymax>
</bndbox>

Question: are the Google Colab coordinates normalised according to the pixel dimensions of the image, like this?

# Bounding box coordinates
xmin = 948
ymin = 537
xmax = 1416
ymax = 650

# Image dimensions
width = 1920
height = 1080

# Compute normalized coordinates
xmin_norm = xmin / width
ymin_norm = ymin / height
xmax_norm = xmax / width
ymax_norm = ymax / height
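
If it helps, here is a small sketch of the conversion in the other direction, under the assumption that colab_utils.annotate follows the TensorFlow Object Detection convention of normalised [ymin, xmin, ymax, xmax] boxes (the image size below is hypothetical):

import numpy as np

height, width = 1080, 1920   # hypothetical image dimensions
box = np.array([0.41941667, 0.08333333, 0.89941667, 0.7296875], dtype=np.float32)

ymin, xmin, ymax, xmax = box
# back to labelImg-style pixel coordinates
bndbox = {
    "xmin": int(round(xmin * width)),
    "ymin": int(round(ymin * height)),
    "xmax": int(round(xmax * width)),
    "ymax": int(round(ymax * height)),
}
print(bndbox)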


from Google Colab Annotate

Bundle Vue project into a single html file that can be embedded into a Blogger blogspot post

I want to bundle all the Vue.js project files (HTML, JS, CSS) into a single HTML file so that it can be deployed in a Blogger blogspot post.

A similar question was asked in the past for a ghost blog but it was about bundling the files into a single js file.

Bundle Vue project into single js file that can be embedded in Ghost blog post

I am using @vue/cli 5.0.1, yarn v1.22.21



from Bundle Vue project into a single html file that can be embedded into a Blogger blogspot post

Python Filtering OData Entity by Parent - Requests Library

I have an OData API I'm trying to query using the Python Requests library. I'm trying to filter a table based on the parent table for an incremental data pull. The child tables don't have a last_updated_date, but I can get this date from the parent, as the parent's last_updated_date changes even if the change is in a child table. Note that in my example, employees is the parent entity of employee_texts.

My code looks like the below:

data = requests.get(
    endpoint_url + "employee_texts"
    , auth=(api_username, api_password)
    , params={
        "$filter": "employees/LastModifiedDate gt 2023-11-15T00:00:00.00Z",
        "$format": "json"
    }
    , verify=True
)

If I run this, I get the below error:

{'error': {'code': '',
'message': 'Value cannot be null.\r\nParameter name: type'}}

Please note that this code works as expected:

data = requests.get(
    endpoint_url + "employees"
    , auth=(api_username, api_password)
    , params={
        "$filter": "LastModifiedDate gt 2023-11-15T00:00:00.00Z",
        "$format": "json"
    }
    , verify=True
)
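
One low-tech way to narrow this down is to build the request without sending it and inspect the exact URL (filter path and encoding) that the service receives; whether employees really is the navigation property name exposed on employee_texts can then be checked against the service's $metadata document. A sketch reusing the variables from above:

import requests

req = requests.Request(
    "GET",
    endpoint_url + "employee_texts",
    auth=(api_username, api_password),
    params={
        "$filter": "employees/LastModifiedDate gt 2023-11-15T00:00:00.00Z",
        "$format": "json",
    },
)
print(req.prepare().url)   # paste this into a browser or curl to compare behaviour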


from Python Filtering OData Entity by Parent - Requests Library

Monday 20 November 2023

Extracting feature embeddings from an image

I'm trying to use TensorFlow.js to extract feature embeddings from images.

Elsewhere I'm using PyTorch and ResNet152 to extract feature embeddings to good effect.

The following is a sample of how I'm extracting those feature embeddings.

import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

# Load the model
resnet152_torch = models.resnet152(pretrained=True)

# Enumerate all of the layers of the model except the last (classification) layer.
# This leaves the global average pooling layer as the final stage.
layers = list(resnet152_torch.children())[:-1]

resnet152 = torch.nn.Sequential(*layers)

# Set to evaluation mode.
resnet152_torch.eval()

# Load and preprocess the image, it's already 224x224
image_path = "test.png" 
img = Image.open(image_path).convert("RGB")

# Define the image transformation
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Apply the preprocessing steps to the image
img_tensor = preprocess(img).unsqueeze(0)

with torch.no_grad():
    # Get the image features from the ResNet-152 model
    img_features = resnet152(img_tensor)

print(img_features.squeeze())

Essentially, I'm using the pre-trained model and dropping the last layer to get my feature embeddings.

The result of the above script is:

tensor([0.2098, 0.4687, 0.0914,  ..., 0.0309, 0.0919, 0.0480])

So now, I want to do something similar with TensorFlow.js.

The first thing that I need is an instance of the ResNet152 model that I can use with TensorFlow.js. So I created the following Python script to export ResNet152 to the Keras format...

from tensorflow.keras.applications import ResNet152
from tensorflow.keras.models import save_model

# Load the pre-trained ResNet-152 model without the top (fully connected) layer
resnet152 = ResNet152(weights='imagenet')

# Set the model to evaluation mode
resnet152.trainable = False

# Save the ResNet-152 model
save_model(resnet152, "resnet152.h5")

And then I exported the Keras (.h5) model to the TensorFlow.js format using the "tensorflowjs_converter" utility...

tensorflowjs_converter --input_format keras resnet152.h5 resnet152   

Once I have the model in the appropriate format (I think), I switch over to Javascript.

import * as tf from '@tensorflow/tfjs-node';
import fs from 'fs';

async function main() {
    const model = await tf.loadLayersModel('file://resnet152/model.json');

    const modelWithoutFinalLayer = tf.model({
        inputs: model.input,
        outputs: model.getLayer('avg_pool').output
    });

    // Load the image from disk
    const image = fs.readFileSync('example_images/test.png'); // This is the exact same image file.
    const imageTensor = tf.node.decodeImage(image, 3);
    const preprocessedInput = tf.div(tf.sub(imageTensor, [123.68, 116.779, 103.939]), [58.393, 57.12, 57.375]);

    const batchedInput = preprocessedInput.expandDims(0);
    const embeddings = modelWithoutFinalLayer.predict(batchedInput).squeeze();

    embeddings.print();

    return;
}

await main();

The result of the above script is:

Tensor
    [0, 0, 0, ..., 0, 0, 0.029606]

Looking at the first three values of the outputs between the two versions of the script, I expected there to be some variation but not THIS MUCH.

Where do I go from here? Is this much variation expected? Am I just doing this wrong?

Any help would be greatly appreciated.
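
Two things may be worth checking before comparing numbers across frameworks: torchvision's and Keras' ResNet-152 checkpoints are trained separately, so the embeddings are not expected to match exactly even with identical preprocessing; and the Keras model expects its own caffe-style preprocessing (RGB to BGR plus ImageNet mean subtraction), which differs from the torch-style mean/std normalisation used in the TensorFlow.js snippet above. A sketch that verifies the Keras side in Python first, so the TF.js output can be compared against a known-good reference:

import numpy as np
from tensorflow.keras.applications.resnet import ResNet152, preprocess_input
from tensorflow.keras.preprocessing import image

# Same test image, preprocessed the way Keras ResNet expects
img = image.load_img("test.png", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# include_top=False with pooling="avg" is equivalent to slicing at avg_pool
keras_model = ResNet152(weights="imagenet", include_top=False, pooling="avg")
embedding = keras_model.predict(x).squeeze()
print(embedding[:3])   # compare these values against the TF.js output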



from Extracting feature embeddings from an image

Adding Durations to dates in a manner stable across timezones

I'm currently working with software that does computations with date-fns Durations both on the server- and client-side.

This software gathers data for a time window that is specified using Durations from a URL. The intention then is to gather data and perform computations on both sides for the same time window.

Now because of DST there are cases where these windows do not align when adding the Durations to a current date on either end.

For example when computing add(new Date('2023-11-13T10:59:13.371Z'), { days: -16 }) in UTC the computation arrives at 2023-10-28T10:59:13.371Z, but a browser in CET will arrive at 2023-10-28T09:59:13.371Z instead.

Attempted solution

I've been trying to conjure up a special addDuration function that adds durations the way UTC does, in the hope of obtaining a reproducible way to apply durations independent of the browser. However (because time is hard) this appears quite hard to get right, and I'm not sure it is even entirely possible with what we've got. (I wish Temporal were ready to aid me in this.)

So I came up with this function:

const addDuration = (date, delta) => {
  const { years = 0, months = 0, weeks = 0, days = 0, hours = 0, minutes = 0, seconds = 0 } = delta

  const dateWithCalendarDelta = add(date, { months, years, days, weeks })
  const tzDelta = date.getTimezoneOffset() - dateWithCalendarDelta.getTimezoneOffset()

  return add(dateWithCalendarDelta, { hours, minutes: minutes + tzDelta, seconds })
}

I then went on to test it with several examples and print outputs like this:

console.table(
  examples.map(({ start, delta, utc }) => {
    const add1 = add(new Date(start), delta)
    const ok1 = add1.toISOString() === utc ? '✅' : '❌'
    const add2 = addDuration(new Date(start), delta)
    const ok2 = add2.toISOString() === utc ? '✅' : '❌'

    return { start: new Date(start), delta, utc: new Date(utc), add1, ok1, add2, ok2 }
  }),
)

With this I went ahead and executed the code with different TZ environment variables:

Output of TZ=UTC node example.js: Screenshot of TZ=UTC node example.js

Output of TZ=CET node example.js: Screenshot of TZ=CET node example.js

Here in the add2 column we see how addDuration behaves and a ✅ is displayed in the ok2 column when it matches the UTC output. Similarly add1 is the behaviour of the typical date-fns/add function.

Open ends

I'd like to specifically learn more about these aspects:

  • Is it generally possible to apply Durations to a Date in Browsers without shipping a whole dump of different timezone data?
    • Is there a simple way to correct the broken case for addDuration in TZ=CET?
  • Is there an easy/simple way to achieve the desired outcome using date-fns? Maybe I've just overlooked something?
  • Is what I'm trying here a bad idea for some reason and I just struggle to understand that?

I think I want this:

A pure function to apply Durations (deltas) to a Date independent of the local timezone. Ideally it should work the same as UTC, but that feels secondary to working the same across different browsers.

I'm under the impression that this is hindered to some extent by how Date in JavaScript behaves dependent on the local TZ.

Existence of such a function would - I think - imply that a statement such as 'yesterday' or '1 year ago' could be interpreted in a way that makes sense independent of the local TZ and independent of DST.

I know that it would be possible to gloss over the facts of how many days the current year or month have exactly and 'just' compute a number of hours for this, to then accept the same delta for all - but I'd like things like { months: -1 } to work in a way that makes 'sense' for humans if possible.
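
For comparison, here is a minimal sketch of a timezone-independent variant that does all the calendar arithmetic through the UTC accessors of a plain Date, so no timezone data has to be shipped. One semantic difference from date-fns add to be aware of: month and year arithmetic here overflows (Oct 31 minus one month becomes Oct 1) instead of clamping to the end of the shorter month, so it is not a drop-in replacement.

const addDurationUTC = (date, delta) => {
  const { years = 0, months = 0, weeks = 0, days = 0, hours = 0, minutes = 0, seconds = 0 } = delta
  const d = new Date(date.getTime())
  // calendar part first (same order as date-fns add), entirely in UTC
  d.setUTCFullYear(d.getUTCFullYear() + years, d.getUTCMonth() + months, d.getUTCDate() + days + weeks * 7)
  // then the clock part
  d.setUTCHours(d.getUTCHours() + hours, d.getUTCMinutes() + minutes, d.getUTCSeconds() + seconds)
  return d
}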

Related notes

Complete example

Here's the complete example.js source:

// const add = require('date-fns/add')

const examples = [{
    start: '2023-10-29T03:00:00.000Z',
    delta: {
      hours: 0
    },
    utc: '2023-10-29T03:00:00.000Z',
  },
  {
    start: '2023-10-29T03:00:00.000Z',
    delta: {
      hours: -1
    },
    utc: '2023-10-29T02:00:00.000Z',
  },
  {
    start: '2023-10-29T03:00:00.000Z',
    delta: {
      hours: -2
    },
    utc: '2023-10-29T01:00:00.000Z',
  },
  {
    start: '2023-10-29T03:00:00.000Z',
    delta: {
      hours: -3
    },
    utc: '2023-10-29T00:00:00.000Z',
  },
  {
    start: '2023-10-29T03:00:00.000Z',
    delta: {
      hours: -4
    },
    utc: '2023-10-28T23:00:00.000Z',
  },
  {
    start: '2023-11-13T10:59:13.371Z',
    delta: {
      days: -15,
      hours: -4
    },
    utc: '2023-10-29T06:59:13.371Z',
  },
  {
    start: '2023-11-13T10:59:13.371Z',
    delta: {
      days: -16
    },
    utc: '2023-10-28T10:59:13.371Z',
  },
  {
    start: '2023-11-13T10:59:13.371Z',
    delta: {
      days: -16,
      hours: -4
    },
    utc: '2023-10-28T06:59:13.371Z',
  },
  {
    start: '2023-11-13T10:59:13.371Z',
    delta: {
      hours: -(16 * 24 + 4)
    },
    utc: '2023-10-28T06:59:13.371Z',
  },
  {
    start: '2023-10-30T00:00:00.000Z',
    delta: {
      days: -1
    },
    utc: '2023-10-29T00:00:00.000Z',
  },
  {
    start: '2023-10-30T00:00:00.000Z',
    delta: {
      days: -2
    },
    utc: '2023-10-28T00:00:00.000Z',
  },
  {
    start: '2023-03-26T04:00:00.000Z',
    delta: {
      hours: 0
    },
    utc: '2023-03-26T04:00:00.000Z',
  },
  {
    start: '2023-03-26T04:00:00.000Z',
    delta: {
      hours: -1
    },
    utc: '2023-03-26T03:00:00.000Z',
  },
  {
    start: '2023-03-26T04:00:00.000Z',
    delta: {
      hours: -2
    },
    utc: '2023-03-26T02:00:00.000Z',
  },
  {
    start: '2023-03-26T04:00:00.000Z',
    delta: {
      hours: -3
    },
    utc: '2023-03-26T01:00:00.000Z',
  },
  {
    start: '2023-03-26T04:00:00.000Z',
    delta: {
      days: -1
    },
    utc: '2023-03-25T04:00:00.000Z',
  },
  {
    start: '2023-03-26T04:00:00.000Z',
    delta: {
      days: -1,
      hours: 1
    },
    utc: '2023-03-25T05:00:00.000Z',
  },
  {
    start: '2023-10-30T00:00:00.000Z',
    delta: {
      months: 1,
      days: -1
    },
    utc: '2023-11-29T00:00:00.000Z',
  },
  {
    start: '2023-10-30T00:00:00.000Z',
    delta: {
      months: -1,
      days: 1
    },
    utc: '2023-10-01T00:00:00.000Z',
  },
  {
    start: '2023-10-30T00:00:00.000Z',
    delta: {
      years: 1,
      days: -1
    },
    utc: '2024-10-29T00:00:00.000Z',
  },
  {
    start: '2023-10-30T00:00:00.000Z',
    delta: {
      years: -1,
      days: 1
    },
    utc: '2022-10-31T00:00:00.000Z',
  },
  {
    start: '2023-10-29T00:00:00.000Z',
    delta: {
      months: 1,
      days: -1
    },
    utc: '2023-11-28T00:00:00.000Z',
  },
  {
    start: '2023-10-29T00:00:00.000Z',
    delta: {
      months: -1,
      days: 1
    },
    utc: '2023-09-30T00:00:00.000Z',
  },
  {
    start: '2023-10-29T00:00:00.000Z',
    delta: {
      years: 1,
      days: -1
    },
    utc: '2024-10-28T00:00:00.000Z',
  },
  {
    start: '2023-10-29T00:00:00.000Z',
    delta: {
      years: -1,
      days: 1
    },
    utc: '2022-10-30T00:00:00.000Z',
  },
  {
    start: '2023-10-28T00:00:00.000Z',
    delta: {
      months: 1,
      days: -1
    },
    utc: '2023-11-27T00:00:00.000Z',
  },
  {
    start: '2023-10-28T00:00:00.000Z',
    delta: {
      months: -1,
      days: 1
    },
    utc: '2023-09-29T00:00:00.000Z',
  },
  {
    start: '2023-10-28T00:00:00.000Z',
    delta: {
      years: 1,
      days: -1
    },
    utc: '2024-10-27T00:00:00.000Z',
  },
  {
    start: '2023-10-28T00:00:00.000Z',
    delta: {
      years: -1,
      days: 1
    },
    utc: '2022-10-29T00:00:00.000Z',
  },
  {
    start: '2023-03-27T00:00:00.000Z',
    delta: {
      months: 1,
      days: -1
    },
    utc: '2023-04-26T00:00:00.000Z',
  },
  {
    start: '2023-03-27T00:00:00.000Z',
    delta: {
      months: -1,
      days: 1
    },
    utc: '2023-02-28T00:00:00.000Z',
  },
  {
    start: '2023-03-27T00:00:00.000Z',
    delta: {
      years: 1,
      days: -1
    },
    utc: '2024-03-26T00:00:00.000Z',
  },
  {
    start: '2023-03-27T00:00:00.000Z',
    delta: {
      years: -1,
      days: 1
    },
    utc: '2022-03-28T00:00:00.000Z',
  },
  {
    start: '2023-03-26T00:00:00.000Z',
    delta: {
      months: 1,
      days: -1
    },
    utc: '2023-04-25T00:00:00.000Z',
  },
  {
    start: '2023-03-26T00:00:00.000Z',
    delta: {
      months: -1,
      days: 1
    },
    utc: '2023-02-27T00:00:00.000Z',
  },
  {
    start: '2023-03-26T00:00:00.000Z',
    delta: {
      years: 1,
      days: -1
    },
    utc: '2024-03-25T00:00:00.000Z',
  },
  {
    start: '2023-03-26T00:00:00.000Z',
    delta: {
      years: -1,
      days: 1
    },
    utc: '2022-03-27T00:00:00.000Z',
  },
  {
    start: '2023-03-25T00:00:00.000Z',
    delta: {
      months: 1,
      days: -1
    },
    utc: '2023-04-24T00:00:00.000Z',
  },
  {
    start: '2023-03-25T00:00:00.000Z',
    delta: {
      months: -1,
      days: 1
    },
    utc: '2023-02-26T00:00:00.000Z',
  },
  {
    start: '2023-03-25T00:00:00.000Z',
    delta: {
      years: 1,
      days: -1
    },
    utc: '2024-03-24T00:00:00.000Z',
  },
  {
    start: '2023-03-25T00:00:00.000Z',
    delta: {
      years: -1,
      days: 1
    },
    utc: '2022-03-26T00:00:00.000Z',
  },
]

const addDuration = (date, delta) => {
  const {
    years = 0, months = 0, weeks = 0, days = 0, hours = 0, minutes = 0, seconds = 0
  } = delta

  const dateWithCalendarDelta = add(date, {
    months,
    years,
    days,
    weeks
  })
  const tzDelta = date.getTimezoneOffset() - dateWithCalendarDelta.getTimezoneOffset()

  return add(dateWithCalendarDelta, {
    hours,
    minutes: minutes + tzDelta,
    seconds
  })
}

const main = () => {
  console.table(
    examples.map(({
      start,
      delta,
      utc
    }) => {
      const add1 = add(new Date(start), delta)
      const ok1 = add1.toISOString() === utc ? '✅' : '❌'
      const add2 = addDuration(new Date(start), delta)
      const ok2 = add2.toISOString() === utc ? '✅' : '❌'

      return {
        start: new Date(start),
        delta,
        utc: new Date(utc),
        add1,
        ok1,
        add2,
        ok2
      }
    }),
  )
}

setTimeout(main, 500)
<script type="module">
  import { add } from 'https://esm.run/date-fns';
  window.add = add;
</script>


from Adding Durations to dates in a manner stable across timezones

Leaflet JS widget in Filament won't execute

For a non-profit I'm creating a Filament admin panel with a widget running a Leaflet.js map.

I've set this up according to the Filemant docs

All goes well, the data gets passed to the JS. But leaflet doesn't create the map. It seems that the L.map() command can't find the DIV I'm referrencing.

I've tried hardcoding it as const map = L.map('map'), L.map(document.getElementById('map')), L.map(this.$refs.leafletMap), L.map(this.$refs.leafletMap.id), ... none does the trick.

Any help is hugely appreciated :)

AdminPanelProvider.php

class AdminPanelProvider extends PanelProvider
{
    public function boot()
    {
        FilamentAsset::register([
            Css::make('leaflet-1-9-4-css', 'https://unpkg.com/leaflet@1.9.4/dist/leaflet.css'),
            Js::make('leaflet-1-9-4-js', 'https://unpkg.com/leaflet@1.9.4/dist/leaflet.js'),
            AlpineComponent::make('visitor-heatmap-js', __DIR__ . '/../../../resources/js/leaflet-heatmap.js'),         
        ]);
    }
    
   ..
}

visitor-heatmap-widget.blade

<x-filament-widgets::widget>
    <x-filament::section>
    HEATMAP
        
        
        <div
            x-ignore
            ax-load
            ax-load-src=""
            x-data="leafletVisitorsHeatmap({
                cities: @js($data)
            })"
        >
            
                <div x-ref="leafletMap" id="map" style="width: 100%; height: 100%">

                </div>
            
        </div>
    </x-filament::section>
</x-filament-widgets::widget>

leaflet-heatmap.js (stripped to the bare minimum)

export default function leafletVisitorsHeatmap({ cities }) 
{
    return {
 
        init: function () {
            console.log('start leafletVisitorsHeatmap');
            //console.log('cities:');
            //console.log(cities);
                    
            const map = L.map(this.$refs.leafletMap.id).setView([51.505, -0.09], 13);

            const tiles = L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
                maxZoom: 19,
                attribution: '&copy; <a href="http://www.openstreetmap.org/copyright">OpenStreetMap</a>'
            }).addTo(map);
            
            
        },
        
    }
}


from Leaflet JS widget in Filament won't execute

Can the results of UMAP for HDBScan clustering be made more consistent?

I have a set of ~40K phrases which I'm clustering with HDBScan after using UMAP for dimensionality reduction. The steps are:

  1. Generate embeddings using a fine-tuned BERT model
  2. Reduce dimensions with UMAP
  3. Cluster with HDBScan

I'm finding that sometimes, HDBScan finds 100-200 clusters, which is the desired result. But other times, it finds only 2-4. This is with the same dataset and no change in parameters either for UMAP or HDBScan.

From the UMAP documentation I see that UMAP is a stochastic algorithm, so complete reproducibility should not be expected. But it also says "the variance between runs should ideally be relatively small", which is not the case here. Also, the variance seems to be bimodal -- I either end up with 2-4 clusters or 100+, nothing in between.

I've tried different values of parameters for both UMAP (n_components: 3, 4, 6, 10; min_dist: 0.0, 0.1, 0.3, 0.5; n_neighbors: 15, 30) and HDBScan (min_cluster_size: 50, 100, 200) but with all combinations so far, I still occasionally get the undesired 2-4 clusters.

Why is UMAP behaving this way, and how can I ensure it yields the desired 100+ clusters rather than 2-4?
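
One knob worth trying before anything else: pinning UMAP's random_state makes the embedding deterministic (at the cost of some parallelism), which at least separates the run-to-run bimodality from genuine parameter effects. A sketch with placeholder parameter values and an assumed phrase_embeddings array of shape (~40000, 768):

import hdbscan
import umap

reducer = umap.UMAP(n_components=5, n_neighbors=30, min_dist=0.0, random_state=42)
embedding = reducer.fit_transform(phrase_embeddings)

clusterer = hdbscan.HDBSCAN(min_cluster_size=100)
labels = clusterer.fit_predict(embedding)
print(len(set(labels)) - (1 if -1 in labels else 0), "clusters")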



from Can the results of UMAP for HDBScan clustering be made more consistent?

Sunday 19 November 2023

How to get the Chart.js type information into my plain javascript file using JSDoc?

I can't seem to get JSDoc-based intellisense in VSCode to work for the Chart.js library.

Reproducible scenario

The steps are minimal and detailed below, but you can also clone my repository created with the steps from this question.

I have a rather old application, here's a minimally reproducible variant of what I have:

  • Run npm i chart.js chartjs-adapter-moment moment
  • Create an index.html like this:
<html>
  <script src="node_modules/moment/moment.js"></script>
  <script src="node_modules/chart.js/dist/chart.umd.js"></script>
  <script src="node_modules/chartjs-adapter-moment/dist/chartjs-adapter-moment.js"></script>
  <script src="app.js"></script>
  <div style="height: 200px; width: 300px;"><canvas id="my-chart"></canvas></div>
</html>
  • Create an app.js file like this:
// @ts-check

/**
 * @typedef moment
 * @property {import('moment')} moment
 */

/**
 * @global {import('chart.js')} Chart
 */

document.addEventListener("DOMContentLoaded", () => {
  const ctx = document.getElementById("my-chart");
  const chartRef = new Chart(ctx, {
    type: "line",
    options: {
      scales: {
        x: { type: "time" },
        y: { min: 0, max: 10 },
      },
    },
    data: {
      datasets: [
        {
          label: 'Stuff',
          data: [
            { x: moment("2023-01-01"), y: 2 },
            { x: moment("2023-02-01"), y: 8 },
            { x: moment("2023-03-01"), y: 3 },
            { x: moment("2023-04-01"), y: 6 },
          ],
        },
      ],
    },
  });
});

The problem

The moment types seem to be working well in VSCode. But the Chart typings are not working at all. Here's what it does in VSCode currently:

error tooltip over 'Chart'

I've tried a bunch of different things instead of the @global declaration, but nothing seems to work.

Bottom line / The question

How do I properly get the type completion for Chart.js Chart and related types in VSCode when I'm not using any import or require or similar?


Things I've tried

A helpful answer appeared, so I can add its suggestions to the "things I've tried", but they don't fix my specific issue, as detailed below.

The first suggestion was to use /** @typedef {import("chart.js")} */ but that leaves me with the same error:

Cannot find name 'Chart'. ts(2304)

Screenshot showing the error from above

As I wrote in my comment to the answer, at some point I think I also saw:

'Chart' only refers to a type, but is being used as a value here. ts(2693)

But I must've gotten that on a slight variation to the suggestion, and can't seem to get that specific error now anymore.

An edit to the question was made to suggest trying /** @typedef {import("chart.js").Chart} Chart */ which does give me:

'Chart' only refers to a type, but is being used as a value here. ts(2693)

As you can tell from the corresponding screenshot:

Screenshot of the error detailed above
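
One pattern that often works without any import machinery is to alias the UMD global once, casting it to the constructor type exported by the chart.js package. This is only a sketch, not the officially documented route:

// @ts-check

// Chart is attached to window by the UMD bundle loaded in index.html
// @ts-ignore -- window.Chart is not declared in the DOM typings
const Chart = /** @type {typeof import("chart.js").Chart} */ (window.Chart);

// from here on, new Chart(...) gets completion for options, scales, datasets, etc.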



from How to get the Chart.js type information into my plain javascript file using JSDoc?