Friday, 24 March 2023

SQLAlchemy How to use JSON columns with indexes interchangeably between SQLite and PostgreSQL

I'm working on defining a schema with SQLAlchemy that works across multiple engine backends, notably SQLite and PostgreSQL.

I'm having an issue because I have a JSON column with an index on it. This works on SQLite, but on PostgreSQL it complains that the index type can't be a btree. I've seen docs that highlight a JSONB type specific to the postgresql dialect, but the issue is that my schema is declarative: I don't know in advance whether I'm going to connect to an SQLite or a PostgreSQL database.

As an example, here is a toy declarative schema:


    # from sqlalchemy.dialects.postgresql import JSONB
    from sqlalchemy import create_engine
    from sqlalchemy import inspect
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker
    from sqlalchemy.sql.schema import Column, Index
    from sqlalchemy.types import Integer, JSON
    from sqlalchemy_utils import database_exists, create_database

    CustomBase = declarative_base()

    class User(CustomBase):
        __tablename__ = 'users'
        id = Column(Integer, primary_key=True, doc='unique internal id')
        name = Column(JSON)
        loose_identifer = Column(JSON, index=True, unique=False)
        # loose_identifer = Column(JSONB, index=True, unique=False)

    uri = 'sqlite:///test_sqlite_v7.sqlite'
    # uri = 'postgresql+psycopg2://admin:admin@localhost:5432/test_postgresql_v4.postgres'

    engine = create_engine(uri)
    DBSession = sessionmaker(bind=engine)
    session = DBSession()

    if 'postgresql' in uri:
        if not database_exists(uri):
            create_database(uri)

    inspector = inspect(engine)
    table_names = inspector.get_table_names()
    if len(table_names) == 0:
        CustomBase.metadata.create_all(engine)

    user_infos = [
        {'name': 'user1', 'loose_identifer': "AA" },
        {'name': 'user2', 'loose_identifer': "33" },
        {'name': 'user3', 'loose_identifer': 33 },
        {'name': 'user4', 'loose_identifer': 33 },
        {'name': 'user5', 'loose_identifer': "AA" },
        {'name': 'user6', 'loose_identifer': None},
        {'name': 'user7', 'loose_identifer': [1, 'weird']},
    ]
    for row in user_infos:
        user = User(**row)
        session.add(user)

    session.commit()

    import pandas as pd
    import json
    table_df = pd.read_sql_table('users', con=engine)
    table_df['loose_identifer'] = table_df['loose_identifer'].apply(repr)
    print(table_df)

    query = session.query(User.name, User.loose_identifer).filter(User.loose_identifer == json.dumps(33))
    results = list(query.all())
    print(f'results={results}')

    query = session.query(User.name, User.loose_identifer).filter(User.loose_identifer == json.dumps('33'))
    results = list(query.all())
    print(f'results={results}')

The User table has a loose_identifer column that I want to hold fairly arbitrary JSON values, and I want an index on it. The main reason I have it like this is that I have to support these "loose" identifiers, which could be integers or strings.

Using Column(JSON, index=True, unique=False) seems to work just fine when I use SQLite, but when I switch to target a PostgreSQL engine I get this error:

ProgrammingError: (psycopg2.errors.UndefinedObject) data type json has no default operator class for access method "btree"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.

[SQL: CREATE INDEX ix_users_loose_identifer ON users (loose_identifer)]
(Background on this error at: https://sqlalche.me/e/14/f405)

I've tried adding the index explicitly by adding this class attribute:

    __table_args__ = (
        # https://stackoverflow.com/questions/30885846/how-to-create-jsonb-index-using-gin-on-sqlalchemy
        Index(
            "ix_users_loose_identifer", loose_identifer,
            postgresql_using="gin",
        ),
    )

But that doesn't seem to work either. It's possible I'm doing something wrong in that declaration.
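For what it's worth, the declaration itself seems syntactically fine: compiling the Index against the PostgreSQL dialect (no server needed) shows the USING gin clause is emitted. My understanding, which may be wrong, is that the remaining problem is the column type: PostgreSQL's GIN operator classes (jsonb_ops, jsonb_path_ops) are defined for jsonb, not json, so a GIN index on a plain JSON column fails for a similar reason as the btree one. A minimal sketch, with names mirroring the question:

```python
from sqlalchemy import Column, Index, Integer, MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex
from sqlalchemy.types import JSON

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("loose_identifer", JSON),
)

# postgresql_using only affects the PostgreSQL dialect; other backends ignore it.
ix = Index("ix_users_loose_identifer", users.c.loose_identifer,
           postgresql_using="gin")

# Compile the DDL for PostgreSQL without connecting to a server.
ddl = str(CreateIndex(ix).compile(dialect=postgresql.dialect()))
print(ddl)  # includes "USING gin (loose_identifer)"
```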

If I change JSON to JSONB in the above schema, it does work, but JSONB isn't compatible with SQLite. So my question is: how do I declare my schema with indexed JSON columns using syntax that is compatible with both SQLite and PostgreSQL backends?
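One direction I've been experimenting with (not fully verified against a live PostgreSQL server) is SQLAlchemy's with_variant(), which keeps one declarative type but swaps in a dialect-specific type at DDL compile time. A minimal sketch, assuming SQLAlchemy 1.4+, with names mirroring the question:

```python
from sqlalchemy import Column, Integer, create_engine, inspect
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateTable
from sqlalchemy.types import JSON

# Plain JSON everywhere, except on PostgreSQL, where it compiles to JSONB
# (which, unlike json, supports a default btree index, so index=True works).
LooseJSON = JSON().with_variant(JSONB(), "postgresql")

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    loose_identifer = Column(LooseJSON, index=True, unique=False)

# No server needed to sanity-check the PostgreSQL DDL:
pg_ddl = str(CreateTable(User.__table__).compile(dialect=postgresql.dialect()))
print(pg_ddl)  # the loose_identifer column renders as JSONB

# And the same model creates cleanly (table plus index) on SQLite:
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
```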


