Monday, 31 August 2020

How are blobs removed in RelStorage pack?

This question is related to "How to pack blobstorage with Plone and RelStorage".

I am using a ZODB database with RelStorage and SQLite as its backend, and I am trying to remove unused blobs. Currently db.pack does not remove the blob files from disk. The minimal working example below demonstrates this behavior:

import logging
import numpy as np
import os
import persistent
from persistent.list import PersistentList
import shutil
import time
from ZODB import config, blob

connectionString = """
%import relstorage
<zodb main>
<relstorage>
blob-dir ./blob
keep-history false
cache-local-mb 0
<sqlite3>
    data-dir .
</sqlite3>
</relstorage>
</zodb>
"""


class Data(persistent.Persistent):
    def __init__(self, data):
        super().__init__()

        self.children = PersistentList()

        self.data = blob.Blob()
        with self.data.open("w") as f:
            np.save(f, data)


def main():
    logging.basicConfig(level=logging.INFO)
    # Initial cleanup
    for f in os.listdir("."):
        if f.endswith("sqlite3"):
            os.remove(f)

    if os.path.exists("blob"):
        shutil.rmtree("blob", True)

    # Initializing database
    db = config.databaseFromString(connectionString)
    with db.transaction() as conn:
        root = Data(np.arange(10))
        conn.root.Root = root

        child = Data(np.arange(10))
        root.children.append(child)

    # Removing child reference from root
    with db.transaction() as conn:
        conn.root.Root.children.pop()

    db.close()

    print("blob directory:", [[os.path.join(rootDir, f) for f in files] for rootDir, _, files in os.walk("blob") if files])
    db = config.databaseFromString(connectionString)
    db.pack(time.time() + 1)
    db.close()
    print("blob directory:", [[os.path.join(rootDir, f) for f in files] for rootDir, _, files in os.walk("blob") if files])


if __name__ == "__main__":
    main()

The example above does the following:

  1. Removes any previous database in the current directory, along with the blob directory.
  2. Creates a database/storage from scratch and commits a transaction that adds two objects (root and child), with child referenced by root.
  3. Removes the reference from root to child and commits a second transaction.
  4. Closes the database/storage.
  5. Reopens the database/storage and calls db.pack with a pack time one second in the future.

The output of the minimal working example is the following:

INFO:ZODB.blob:(23376) Blob directory '<some path>/blob/' does not exist. Created new directory.
INFO:ZODB.blob:(23376) Blob temporary directory './blob/tmp' does not exist. Created new directory.
blob directory: [['blob/.layout'], ['blob/3/.lock', 'blob/3/0.03da352c4c5d8877.blob'], ['blob/6/.lock', 'blob/6/0.03da352c4c5d8877.blob']]
INFO:relstorage.storage.pack:pack: beginning pre-pack
INFO:relstorage.storage.pack:Analyzing transactions committed Thu Aug 27 11:48:17 2020 or before (TID 277592791412927078)
INFO:relstorage.adapters.packundo:pre_pack: filling the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: Filled the pack_object table
INFO:relstorage.adapters.packundo:pre_pack: analyzing references from 7 object(s) (memory delta: 256.00 KB)
INFO:relstorage.adapters.packundo:pre_pack: objects analyzed: 7/7
INFO:relstorage.adapters.packundo:pre_pack: downloading pack_object and object_ref.
INFO:relstorage.adapters.packundo:pre_pack: traversing the object graph to find reachable objects.
INFO:relstorage.adapters.packundo:pre_pack: marking objects reachable: 4
INFO:relstorage.adapters.packundo:pre_pack: finished successfully
INFO:relstorage.storage.pack:pack: pre-pack complete
INFO:relstorage.adapters.packundo:pack: will remove 3 object(s)
INFO:relstorage.adapters.packundo:pack: cleaning up
INFO:relstorage.adapters.packundo:pack: finished successfully
blob directory: [['blob/.layout'], ['blob/3/.lock', 'blob/3/0.03da352c4c5d8877.blob'], ['blob/6/.lock', 'blob/6/0.03da352c4c5d8877.blob']]

As you can see, db.pack does remove three objects (the log reports "will remove 3 object(s)"), but the blob files on the file system are unchanged.
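To make the before/after comparison less error-prone than reading two printed lists, the check could be written as a small helper. This is only a sketch using the standard library; the function names list_blob_files and removed_blob_files are hypothetical and not part of ZODB or RelStorage. It assumes that blob payloads are the files ending in ".blob", as in the output above, and ignores bookkeeping files such as .layout and .lock.

```python
import os


def list_blob_files(blob_dir):
    """Return sorted paths (relative to blob_dir) of all *.blob files.

    Hypothetical helper: assumes blob payloads end in ".blob" and that
    .layout / .lock files are bookkeeping that should be ignored.
    """
    found = []
    for root_dir, _, files in os.walk(blob_dir):
        for name in files:
            if name.endswith(".blob"):
                full_path = os.path.join(root_dir, name)
                found.append(os.path.relpath(full_path, blob_dir))
    return sorted(found)


def removed_blob_files(before, after):
    """Return blob files that were present before the pack but not after."""
    return sorted(set(before) - set(after))
```

In the script above, one would call list_blob_files("blob") before and after db.pack and pass both results to removed_blob_files. With the output shown, the result would be an empty list, which is exactly the problem described here.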

The RelStorage unit tests appear to check that blobs are removed from the file system (see here), but that does not happen with the script above.

What am I doing wrong? Any hint/link/help is appreciated.


