Hemant Vishwakarma: How to get partial subtree depending on dependency relations with SpaCy?

Wednesday, 19 April 2023

I have parsed the dependency relations of some text with SpaCy. How can I impose a condition relating to those dependency relations when extracting the subtree of a given token/span?

For example, I would like to get the subtree of a given token but exclude all portions of the subtree where the immediate child of my original token has a conjunction ("conj") dependency relation with that token.

To give an even more concrete example: I would like to extract the names and the corresponding attributes from the following sentence: "The entrepreneur and philanthropist Bill Gates and the Apple's Steve Jobs ate hamburgers."

person	attribute
Bill Gates	entrepreneur and philanthropist
Steve Jobs	Apple's

The dependency relations look like this: [image of the dependency parse not included]

The following code succeeds at extracting the person entities but Bill Gates' subtree overlaps that of Steve Jobs:

import spacy

# Transformer-based English pipeline; it has to be installed separately.
nlp = spacy.load("en_core_web_trf")

s = "The entrepreneur and philanthropist Bill Gates and the Apple's Steve Jobs ate hamburgers."
doc = nlp(s)

# Keep only the named entities labelled as persons.
persons = [ent for ent in doc.ents if ent.label_ == "PERSON"]
# [Bill Gates, Steve Jobs]

# Span.subtree yields the span's tokens and all of their descendants.
[list(p.subtree) for p in persons]
# [[The, entrepreneur, and, philanthropist, Bill, Gates, and, the, Apple, 's, Steve, Jobs], [the, Apple, 's, Steve, Jobs]]

So I would like to either keep only the parts of Bill Gates' subtree whose first child has an nmod dependency relation, or remove the parts that hang off a first child with the conj dependency relation. In R, the rsyntax package would get the job done, so I assume something similar is already built into SpaCy.

(Any tips for smarter ways to get the table above are also appreciated. I'm not very well-versed in SpaCy or in Python in general.)
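To make the intended filtering concrete, here is a minimal sketch of the traversal I have in mind. It is not a built-in SpaCy feature: `partial_subtree` is a hypothetical helper, and `Node` is a hypothetical stand-in that only mimics the `i`, `text`, `dep_` and `children` attributes of a SpaCy token so the example runs without a model. The heads and labels in the toy tree are assumed for illustration and are not actual parser output. Skipping "cc" alongside "conj" is also my assumption, so that the connecting "and" is dropped too.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a spaCy Token (only i, text, dep_, children)."""
    i: int
    text: str
    dep_: str
    children: List["Node"] = field(default_factory=list)


def partial_subtree(root, skip_deps=("conj", "cc")):
    """Return root plus its descendants in sentence order, skipping every
    branch that starts at an immediate child of root labelled in skip_deps."""
    kept = [root]
    # Filter only the first level; conj/cc tokens deeper in the tree are kept.
    stack = [child for child in root.children if child.dep_ not in skip_deps]
    while stack:
        node = stack.pop()
        kept.append(node)
        stack.extend(node.children)
    return sorted(kept, key=lambda node: node.i)


# Toy tree for the noun phrase; heads and labels are assumed, not parser output.
entrepreneur = Node(1, "entrepreneur", "nmod", [
    Node(0, "The", "det"),
    Node(2, "and", "cc"),
    Node(3, "philanthropist", "conj"),
])
apple = Node(8, "Apple", "poss", [Node(7, "the", "det"), Node(9, "'s", "case")])
jobs = Node(11, "Jobs", "conj", [apple, Node(10, "Steve", "compound")])
gates = Node(5, "Gates", "nsubj", [
    entrepreneur,
    Node(4, "Bill", "compound"),
    Node(6, "and", "cc"),
    jobs,
])

gates_part = [node.text for node in partial_subtree(gates)]
jobs_part = [node.text for node in partial_subtree(jobs)]
print(gates_part)  # ['The', 'entrepreneur', 'and', 'philanthropist', 'Bill', 'Gates']
print(jobs_part)   # ['the', 'Apple', "'s", 'Steve', 'Jobs']
```

Since SpaCy tokens expose the same four attributes, the helper should presumably also accept `p.root` for each span in `persons`, but the labels the real parser assigns may differ from the ones assumed here, so it is worth inspecting `token.dep_` first.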
