I'm working on defining a schema with SQLAlchemy that works with multiple engine backends, notably SQLite and PostgreSQL.
I'm having an issue because I have a JSON column with an index on it. This works on SQLite, but PostgreSQL complains that the json data type has no default operator class for a btree index. I've seen docs that describe a JSONB type that is specific to the postgresql dialect, but the issue is that my schema is declarative: I don't know ahead of time whether I'm going to connect to an SQLite or a PostgreSQL database.
As an example, here is a toy declarative schema:
# from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy import create_engine
from sqlalchemy import inspect
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql.schema import Column, Index
from sqlalchemy.types import Integer, JSON
from sqlalchemy_utils import database_exists, create_database

CustomBase = declarative_base()


class User(CustomBase):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True, doc='unique internal id')
    name = Column(JSON)
    loose_identifer = Column(JSON, index=True, unique=False)
    # loose_identifer = Column(JSONB, index=True, unique=False)


# Toggle between these two URIs to reproduce the issue
uri = 'sqlite:///test_sqlite_v7.sqlite'
# uri = 'postgresql+psycopg2://admin:admin@localhost:5432/test_postgresql_v4.postgres'

engine = create_engine(uri)
DBSession = sessionmaker(bind=engine)
session = DBSession()

# SQLite creates its database file on demand; PostgreSQL needs it created explicitly
if 'postgresql' in uri:
    if not database_exists(uri):
        create_database(uri)

# Create the tables on the first run (this is where PostgreSQL errors out)
inspector = inspect(engine)
table_names = inspector.get_table_names()
if len(table_names) == 0:
    CustomBase.metadata.create_all(engine)

# The "loose" identifiers are deliberately a mix of strings, ints, None, and lists
user_infos = [
    {'name': 'user1', 'loose_identifer': "AA"},
    {'name': 'user2', 'loose_identifer': "33"},
    {'name': 'user3', 'loose_identifer': 33},
    {'name': 'user4', 'loose_identifer': 33},
    {'name': 'user5', 'loose_identifer': "AA"},
    {'name': 'user6', 'loose_identifer': None},
    {'name': 'user7', 'loose_identifer': [1, 'weird']},
]
for row in user_infos:
    user = User(**row)
    session.add(user)
session.commit()

import pandas as pd
import json

table_df = pd.read_sql_table('users', con=engine)
table_df['loose_identifer'] = table_df['loose_identifer'].apply(repr)
print(table_df)

# Check that the int 33 and the string "33" are distinguishable when querying
query = session.query(User.name, User.loose_identifer).filter(User.loose_identifer == json.dumps(33))
results = list(query.all())
print(f'results={results}')

query = session.query(User.name, User.loose_identifer).filter(User.loose_identifer == json.dumps('33'))
results = list(query.all())
print(f'results={results}')
The User table has a loose_identifer column that I want to allow to hold fairly arbitrary JSON values, and I want an index on it. The main reason I have it set up like this is that I have to support these "loose" identifiers, which could be integers or strings.
Using Column(JSON, index=True, unique=False) works just fine when I use SQLite, but when I switch to target a PostgreSQL engine I get this error:
ProgrammingError: (psycopg2.errors.UndefinedObject) data type json has no default operator class for access method "btree"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
[SQL: CREATE INDEX ix_users_loose_identifer ON users (loose_identifer)]
(Background on this error at: https://sqlalche.me/e/14/f405)
I've tried adding the index explicitly by adding this class attribute:
__table_args__ = (
    # https://stackoverflow.com/questions/30885846/how-to-create-jsonb-index-using-gin-on-sqlalchemy
    Index(
        "ix_users_loose_identifer", loose_identifer,
        postgresql_using="gin",
    ),
)
But that doesn't seem to be working either. It's possible I'm doing something wrong in that declaration, although from what I can tell from the PostgreSQL docs, GIN only defines operator classes for jsonb, not for plain json, so a GIN index over a json column may not be possible at all.
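One variation I considered is a GIN index over an explicit jsonb cast, which my reading of the PostgreSQL docs suggests should be accepted. A minimal untested sketch (the cast and the JSONB import are the parts I added):

from sqlalchemy import cast
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.schema import Column, Index
from sqlalchemy.types import Integer, JSON

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    loose_identifer = Column(JSON, unique=False)
    __table_args__ = (
        # A functional GIN index over the column cast to jsonb. This
        # presumably satisfies PostgreSQL (GIN has operator classes for
        # jsonb), but JSONB is a postgresql-only type, so the cast fails
        # to compile on SQLite.
        Index(
            'ix_users_loose_identifer',
            cast(loose_identifer, JSONB),
            postgresql_using='gin',
        ),
    )

But because the cast reintroduces a postgresql-only type, this still isn't portable to SQLite, which is the whole problem.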
If I change JSON to JSONB in the above schema it does work, but JSONB isn't compatible with SQLite. So my question is: how do I declare my schema with indexed JSON columns using syntax that is compatible with both SQLite and PostgreSQL backends?
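The closest thing I've found so far is TypeEngine.with_variant, which looks like it should let a column be plain JSON in general but JSONB on the postgresql dialect. A minimal sketch of what I have in mind, which I haven't verified end to end (LooseJSON is just a name I made up):

from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.schema import Column, Index
from sqlalchemy.types import Integer, JSON

Base = declarative_base()

# Plain JSON on every backend, upgraded to JSONB when the dialect is postgresql
LooseJSON = JSON().with_variant(JSONB(), 'postgresql')

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(JSON)
    loose_identifer = Column(LooseJSON, unique=False)
    __table_args__ = (
        # postgresql_using is a dialect-specific keyword, so SQLite should
        # build an ordinary index here while PostgreSQL builds a GIN index
        Index('ix_users_loose_identifer', 'loose_identifer',
              postgresql_using='gin'),
    )

Is something along these lines the intended approach, or is there a more idiomatic way to keep the schema portable?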