Friday, 5 November 2021

How to normalize unix timestamps to sum to discrete numeric values?

I have an "ideal" scoring formula that sums several values, like

score[i] = SUM(properties[i]) * frequency[i] + recency[i]

with properties being a vector of values, and frequency and recency being scalar values, taken from a given dataset of N items. While all variables here are numeric with discrete integer values, the recency value is a UNIX timestamp within a given time range (such as the last month or the last week since now, sampled on a daily basis).

In the dataset, each item i has a date value recency[i], a frequency value frequency[i], and a list of properties properties[i]. The properties of item[i] are therefore evaluated on each day, given by recency[i], within the proposed time range.

According to this formula, recency contributes negatively to the score of item[i]: the older the timestamp, the better the score (hence the + sign in the formula).
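For concreteness, here is a minimal sketch of the raw formula on synthetic data (the item values, dates, and array contents are illustrative assumptions, not taken from the real dataset); it also shows how the raw UNIX timestamps, on the order of 10^9, dwarf the other terms, which is what prompts the normalization question below.

import numpy as np

# Synthetic example data (values are illustrative assumptions)
properties = np.array([[3, 1, 2],    # properties[i]: vector of discrete values
                       [0, 4, 1],
                       [2, 2, 2]])
frequency = np.array([5, 2, 7])      # frequency[i]: scalar value per item
recency = np.array([1633046400,      # recency[i]: daily UNIX timestamps
                    1633651200,
                    1635897600])

# score[i] = SUM(properties[i]) * frequency[i] + recency[i]
score = properties.sum(axis=1) * frequency + recency
print(score)  # the timestamp term (~1.6e9) dominates the discrete terms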

My idea was to use a rescaling approach over the given range, like

from sklearn.preprocessing import MinMaxScaler
import numpy as np

values = np.array(recencyVec).reshape(-1, 1)  # scikit-learn expects a 2D array
scaler = MinMaxScaler(feature_range=(min(recencyVec), max(recencyVec)))
normalized = scaler.fit_transform(values)

where recencyVec collects the recency values of all data points, min(recencyVec) being the first day and max(recencyVec) the last day.

This uses the scikit-learn MinMaxScaler object, transforming the recency values by scaling each feature to the given range, as suggested in How to Normalize and Standardize Time Series Data in Python.
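As a concrete illustration of what such a rescaling does, here is a minimal self-contained sketch on a synthetic week of daily timestamps; the data are made up, and it uses a [0, 1] target range instead of the (min(recencyVec), max(recencyVec)) range above, purely to make the effect of the scaling visible.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One synthetic UNIX timestamp per day over one week (illustrative assumption)
start = 1635724800                            # 2021-11-01 00:00:00 UTC
recencyVec = [start + d * 86400 for d in range(7)]

values = np.array(recencyVec).reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))   # first day -> 0, last day -> 1
normalized = scaler.fit_transform(values)
print(normalized.ravel())                     # [0. 0.1667 0.3333 ... 1.]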

Is this the correct approach for this numerical formulation? What alternative approaches could be used to normalize the timestamp values when they are summed with other discrete numeric values?



from How to normalize unix timestamps to sum to discrete numeric values?
