Wednesday 4 September 2019

SQLAlchemy: group by day over multiple tables

In my Flask application, I have something similar to a bank account: One User has one Account, credit entries are modeled as Incomings, deductions are modeled as Outgoings.

The problem:

Get an "account statement" for one user, i.e. credit entries / deductions per day, e.g.

Thu 29 Aug 2019
  Some deduction: -23.00
  Some credit: 123.00
Fri 30 Aug 2019
  Big credit: 4223.00
  Another deduction: -42.00

My data model:

This is what (a simplified version of) my models.py looks like:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy     import Column, Float, ForeignKey, Integer, Text, TIMESTAMP
from sqlalchemy.orm import relationship

Base = declarative_base()

class Account(Base):
    __tablename__ = 'account'
    id        = Column(Integer, primary_key=True)
    balance   = Column(Float,   nullable=False)
    userID    = Column(Integer, ForeignKey('user.id'))
    incomings = relationship("Incoming", back_populates="account")
    outgoings = relationship("Outgoing", back_populates="account")
    user      = relationship("User",     back_populates="account")

class Incoming(Base):
    __tablename__ = 'incoming'
    id          = Column(Integer,   primary_key=True)
    accountID   = Column(Integer,   ForeignKey('account.id'))
    amount      = Column(Float,     nullable=False)
    description = Column(Text,      nullable=False)
    timestamp   = Column(TIMESTAMP, nullable=False)
    account     = relationship("Account", back_populates="incomings")

class Outgoing(Base):
    __tablename__ = 'outgoing'
    id          = Column(Integer,   primary_key=True)
    accountID   = Column(Integer,   ForeignKey('account.id'))
    amount      = Column(Float,     nullable=False)
    description = Column(Text,      nullable=False)
    timestamp   = Column(TIMESTAMP, nullable=False)
    account     = relationship("Account", back_populates="outgoings")

class User(Base):
    __tablename__ = 'user'
    id      = Column(Integer,   primary_key=True)
    name    = Column(Text,      nullable=False)
    account = relationship("Account", back_populates="user")

My general intended approach:

  1. get all Incomings for user, grouped by day
  2. get all Outgoings for user, grouped by day
  3. somehow merge the two lists, grouped by day

My background:

It's been a while since I've worked with the underlying database PostgreSQL (but then, I did manage to set up a trigger function to auto-update the balance), but as far as SQLAlchemy (the ORM in use) is concerned, I merely seem to have scratched the surface.

Step 1: get all Incomings for user, grouped by day

Following the first SO hit, I tried

from sqlalchemy import func

# existing sample account ID
accountID  = 42
# not relevant to the point at hand, known to work
db_session = get_a_scoped_session_from_elsewhere()

db_incomings = db_session.query(Incoming)                         \
                         .filter(Incoming.accountID == accountID) \
                         .group_by(func.day(Incoming.timestamp))  \
                         .all()

but this fails with

ProgrammingError: (psycopg2.errors.UndefinedFunction) ...
 ... function day(timestamp without time zone) does not exist

which seems to indicate that PostgreSQL doesn't support day.

According to this SO answer,

# imports and variables as above
db_incomings = db_session.query(Incoming)                                      \
                         .filter(Incoming.accountID == accountID)              \
                         .group_by(func.date_trunc('day', Incoming.timestamp)) \
                         .all()

works for PostgreSQL, but for me fails with

ProgrammingError: (psycopg2.errors.GroupingError) ...
 ... column "incoming.id" must appear in the GROUP BY clause ...
 ... or be used in an aggregate function

When I just blindly try to do what the error message tells me and add incoming.id to the GROUP BY clause as in

db_incomings = db_session.query(Incoming)                                      \
                         .filter(Incoming.accountID == accountID)              \
                         .group_by(Incoming.id,
                                   func.date_trunc('day', Incoming.timestamp)) \
                         .all()

the code works, but does not return the wanted result; instead, I get a list of objects like

{'timestamp': datetime.datetime(2019, 8, 29, 10, 4, 27, 459000), 'id': 1, 'accountID': 42, ...}
{'timestamp': datetime.datetime(2019, 8, 29, 10, 8, 21, 493000), 'id': 2, 'accountID': 42, ...}
{'timestamp': datetime.datetime(2019, 8, 29, 10, 8, 42, 660000), 'id': 3, 'accountID': 42, ...}

which isn't really surprising, considering I'm grouping by Incoming.id.

Trying to understand the underlying issue (see e.g. here or here), it seems I cannot reference a field on the SELECT statement (i.e. the SQLAlchemy .query) if it doesn't appear on the GROUP BY clause (i.e. the SQLAlchemy .group_by). Looking at the error message, this also seems to be the case vice versa.

I've been wrecking my brain for a couple of hours now, found lots of alternatives to func.date_trunc and have 800 browser tabs open, but still no idea how to approach this.

My question: How do I need to structure / build up the SQLAlchemy query ?



from SQLAlchemy: group by day over multiple tables

No comments:

Post a Comment