Sunday, 29 November 2020

Python library for hierarchical sets (or set operations on trees)

Let's imagine I have the following hierarchy/tree:

                Commit
             /     |    \
           /      Fix     \
         /       /   \      \
  Feature   CodeFix  DocFix  Refactoring

I want to be able to define sets where the elements are the leaves of this tree. The way to do it with standard python sets would be to list all the leaves to be included, e.g. {Feature, CodeFix, DocFix}. What I want is to use a shortcut and pass the entire sub-trees instead. For that, we need some abstraction level that handles that, let's say class HSet. With such class, it should be possible not only to create a set like this:HSet(Feature, BugFix, DocFix) but also like this: HSet(Feature, Fix). Both alternatives should yield the same object and represent the same set since Fix == CodeFix|DocFix according to the hierarchy.

Having such a shortcut would be especially beneficial if Fix had dozens of children. Having to create large sets in the code manually is exactly my use case. Except for set creation, I need also a shortcut for listing the elements in a set. For example, if a set contains all the leaves in the hierarchy, it could be represented by the root of the hierarchy, which would be {Commit} for the example provided above. Please see the examples that will hopefully illustrate the concept better.

>>> from somelibrary import HSet 

>>> commit = HSet.create_root_element("Commit")
>>> feature, fix, refactoring = commit.add_children("Feature", "Fix", "Refactoring")
>>> code_fix, doc_fix = fix.add_children("CodeFix", "DocFix")

>>> str(code_fix | doc_fix)
'{Fix}'
>>> str(code_fix | refactoring)
'{CodeFix|Refactoring}'
>>> str(feature | fix | refactoring)
'{Commit}'

>>> str(~ code_fix)
'{Feature|DocFix|Refactoring}'
>>> str(~ commit)
'{}'
>>> str(~ (feature | refactoring))
'{Fix}'

I am curious if there exists such somelibrary that implements this.

Update: Implementation

In the meantime, I wrote my implementation of the concept I was talking about. But I am always concerned about not reinventing the wheel. So, I would be grateful to pointers to existing data structures in the standard python library or custom libraries that can be used to fully/partially replace my code.

hset.py

from typing import List

from hset.tree import _Tree


class HSet:
    def __init__(self, subtrees: List[_Tree], domain: _Tree):
        self.subtrees = domain.collapse_subtrees(subtrees)
        self.domain = domain

    @classmethod
    def create_root_element(cls, label: str) -> 'HSet':
        tree = _Tree(label, None, [])
        return cls([tree], tree)

    def add_children(self, *labels: str) -> List['HSet']:
        if len(self.subtrees) != 1:
            raise ValueError(f"Cannot add children to this HSet since it has multiple root elements: {self}")
        if self.subtrees[0].children:
            raise ValueError(f"This HSet already has children.")

        trees = self.subtrees[0].add_children(*labels)
        return list(map(lambda t: self._create([t]), trees))

    def _create(self, subtrees: List[_Tree]) -> 'HSet':
        return HSet(subtrees, self.domain)

    def __str__(self):
        return '{' + "|".join(map(lambda x: x.label, self.subtrees)) + '}'

    def __invert__(self) -> 'HSet':
        return self._create(self.domain.complement(self.subtrees))

    def __or__(self, other: 'HSet') -> 'HSet':
        if self.domain != other.domain:
            raise ValueError("Cannot perform operations on HSets with different domains")

        return self._create(self.domain.collapse_subtrees(self.subtrees + other.subtrees))

tree.py

from dataclasses import dataclass
from typing import Optional, List


@dataclass
class _Tree:
    label: str
    parent: Optional['_Tree']
    children: List['_Tree']

    def add_children(self, *labels: str) -> List['_Tree']:
        children = [_Tree(label, self, []) for label in labels]
        self.children = children
        return children

    def collapse_subtrees(self, subtrees: List['_Tree']) -> List['_Tree']:
        if self in subtrees:
            return [self]
        elif not self.children:
            return []
        fully_covered = True
        res = []
        for child in self.children:
            collapsed_child = child.collapse_subtrees(subtrees)
            fully_covered &= (collapsed_child == [child])
            res.extend(collapsed_child)

        return [self] if fully_covered else res

    def complement(self, subtrees: List['_Tree']) -> List['_Tree']:
        if self in subtrees:
            return []
        elif not self.children:
            return [self]
        return [elem for lst in map(lambda x: x.complement(subtrees), self.children) for elem in lst]

Cheers, Hlib.



from Python library for hierarchical sets (or set operations on trees)

No comments:

Post a Comment