Friday, 22 March 2019

sqlalchemy query of timeseries data formatted as pairs of (step, next_step) time-adjacent samples

I have some time series data where I have sets of time series, each Timeseries instance of which has a one-to-many relationship with Point instances. Below is a simplified representation of the data.

tables.py:

class Timeseries(Base):
    __tablename__ = "timeseries"

    id = Column("id", Integer, primary_key=True)
    points = relationship("Point", back_populates="ts")


class Point(Base):
    __tablename__ = "point"

    id = Column("id", Integer, primary_key=True)
    t = Column("t", Float)
    v = Column("v", Float)
    ts_id = Column(Integer, ForeignKey("timeseries.id"))
    ts = relationship("Timeseries", back_populates="points")

Question: I'm trying to come up with a query with these kind of columns: "timeseries_id", "id", "t", "v", "id_next", "t_next", "v_next". That is, I want to be able to see each point's data alongside the next points data in the time series in chronological order, but I've been struggling get a table that doesn't elements from a implicit join? Here's the basic start of what I got, (Note that I haven't run this code since this is a simplified version of my actual database, but it's the same idea):

# The point data actually in the database.
sq = (session.query(
    Timeseries.id.label("timeseries_id"),
    Point.id,
    Point.t,
    Point.v)
.select_from(
    join(Timeseries, Point, Timeseries.id==Point.ts_id))
.group_by('timeseries_id')
.subquery())

# first point manually added to each list in query
sq_first = (session.query(
    Timeseries.id.label("timeseries_id"),
    sa.literal_column("-1", Integer).label("id"), # Some unused Point.id value
    sa.literal_column(-math.inf, Float).label("t"),
    sa.literal_column(-math.inf, Float).label("v"))
.select_from(
    join(Timeseries, Point, Timeseries.id==Point.ts_id))
.subquery())

# last point manually added to each list in query.
sq_last = (session.query(
    Timeseries.id.label("timeseries_id"),
    sa.literal_column("-2", Integer).label("id"), # Another unused Point.id value
    sa.literal_column(math.inf, Float).label("t"),
    sa.literal_column(math.inf, Float).label("v"))
.select_from(
    join(Timeseries, Point, Timeseries.id==Point.ts_id))
.subquery())

# Append each timeseries in `sq` table with last point
sq_points_curr = session.query(sa.union_all(sq_first, sq)).subquery()
sq_points_next = session.query(sa.union_all(sq, sq_last)).subquery()

Assuming what I've done so far is useful, this is the part where I get stuck:

#I guess rename the columns in `sq_points_next` to append them by "_next"....
sq_points_next = (session.query(
    sq_points_curr.c.timeseries_id
    sq_points_curr.c.id.label("id_next"),
    sq_points_curr.c.t.label("t_next"),
    sq_points_curr.c.v.label("v_next"))
.subquery())

# ... and then perform a join along "timeseries_id" somehow to get the table I originally wanted...
sq_point_pairs = (session.query(
    Timeseries.id.label("timeseries_id")
    "id",
    "t",
    "v",
    "id_next",
    "t_next",
    "v_next"
).select_from(
    sq_points, sq_points_next, sq_points.timeseries_id==sq_points_next.timeseries_id)
)

I'm not even sure if this last would compile at this point since again it is adapted/simplified from real code but it doesn't yield a table of adjacent points in time, etc..



from sqlalchemy query of timeseries data formatted as pairs of (step, next_step) time-adjacent samples

No comments:

Post a Comment