For each concept in my dataset, I have stored the corresponding Wikipedia categories. For example, consider the following five concepts and their Wikipedia categories.
- hypertriglyceridemia:
['Category:Lipid metabolism disorders', 'Category:Medical conditions related to obesity']
- enzyme inhibitor:
['Category:Enzyme inhibitors', 'Category:Medicinal chemistry', 'Category:Metabolism']
- bypass surgery:
['Category:Surgery stubs', 'Category:Surgical procedures and techniques']
- perth:
['Category:1829 establishments in Australia', 'Category:Australian capital cities', 'Category:Metropolitan areas of Australia', 'Category:Perth, Western Australia', 'Category:Populated places established in 1829']
- climate:
['Category:Climate', 'Category:Climatology', 'Category:Meteorological concepts']
As you can see, the first three concepts belong to the medical domain, whereas the remaining two are not medical terms.
More precisely, I want to divide my concepts into medical and non-medical. However, it is very difficult to do this using the categories alone. For example, even though the two concepts enzyme inhibitor and bypass surgery are both in the medical domain, their categories are very different from each other.
Therefore, I would like to know if there is a way to obtain the parent category of the categories (for example, the categories of enzyme inhibitor and bypass surgery both belong to a medical parent category).
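To make the request concrete, here is a minimal sketch of how the parent categories of a single category could be fetched straight from the MediaWiki API, using only the Python standard library. The helper names (`build_parent_query`, `parse_parent_categories`, `fetch_parent_categories`) and the User-Agent string are my own placeholders, not part of pymediawiki or pywikibot, and the sketch ignores API continuation, so it assumes all parents fit into one response.

```python
import json
import urllib.parse
import urllib.request

# English Wikipedia endpoint of the MediaWiki Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parent_query(category_title):
    """Build query parameters asking which categories a category page is in."""
    return {
        "action": "query",
        "format": "json",
        "prop": "categories",      # categories the given page belongs to
        "titles": category_title,  # e.g. "Category:Enzyme inhibitors"
        "cllimit": "max",
        "clshow": "!hidden",       # skip hidden maintenance categories
    }


def parse_parent_categories(response):
    """Extract parent category titles from a decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    parents = []
    for page in pages.values():
        for cat in page.get("categories", []):
            parents.append(cat["title"])
    return parents


def fetch_parent_categories(category_title):
    """Call the API once and return the parent categories (no continuation)."""
    query = urllib.parse.urlencode(build_parent_query(category_title))
    request = urllib.request.Request(
        API_URL + "?" + query,
        # Placeholder User-Agent; replace with something identifying your tool.
        headers={"User-Agent": "category-grouping-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as resp:
        return parse_parent_categories(json.load(resp))
```

Calling `fetch_parent_categories('Category:Enzyme inhibitors')` would then give one level of parents, and the same call can be repeated on each result to climb further up.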
I am currently using pymediawiki and pywikibot. However, I am not restricted to those two libraries and am happy to have solutions using other libraries as well.
EDIT
As suggested by @IlmariKaronen, I am also using the categories of categories, and the results I got are as follows (the small font near each category shows the categories of that category).
However, I still could not find a way to use these category details to decide whether a given term is medical or non-medical.
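One direction I can imagine, sketched below under stated assumptions, is to walk upward through the parent categories for a limited number of levels and check whether any ancestor falls in a hand-picked set of medical "seed" categories. The function name `is_under_any`, the `get_parents` callback, and the depth limit are my own choices. The `TOY_PARENTS` graph is invented purely for illustration and does not reflect the real Wikipedia category structure.

```python
from collections import deque


def is_under_any(categories, targets, get_parents, max_depth=3):
    """Return True if any target is reachable within max_depth parent steps.

    categories:  starting category titles of one concept
    targets:     seed categories that define the domain (e.g. medical)
    get_parents: callable mapping a category title to its parent titles
    """
    targets = set(targets)
    seen = set(categories)
    queue = deque((c, 0) for c in categories)
    while queue:
        category, depth = queue.popleft()
        if category in targets:
            return True
        if depth == max_depth:
            continue  # do not climb above the depth limit
        for parent in get_parents(category):
            if parent not in seen:  # the category graph can contain cycles
                seen.add(parent)
                queue.append((parent, depth + 1))
    return False


# Invented toy graph, NOT real Wikipedia data.
TOY_PARENTS = {
    "Category:Surgical procedures and techniques": ["Category:Surgery"],
    "Category:Surgery": ["Category:Medical specialties"],
    "Category:Medical specialties": ["Category:Medicine"],
    "Category:Climatology": ["Category:Atmospheric sciences"],
}


def toy_get_parents(category):
    """Look up parents in the toy graph; unknown categories have none."""
    return TOY_PARENTS.get(category, [])
```

The depth limit matters because the Wikipedia category graph is not a strict tree: if you climb far enough, almost every category becomes reachable from almost every other, so an unbounded walk would label nearly everything as medical.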
Moreover, as pointed out by @IlmariKaronen, using WikiProject details could be useful. However, it seems that the Medicine WikiProject does not cover all the medical terms, so we also need to check other WikiProjects.
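For the WikiProject idea, a rough sketch of what I have in mind is to list the templates used on an article's talk page and keep the ones whose names start with "Template:WikiProject ", since that is where the project banners are placed. The helper names and the `ignore` list are my own assumptions, and only the query parameters and the filtering step are shown here, not the HTTP call itself.

```python
def build_talk_templates_query(article_title):
    """Build API parameters listing the templates used on the talk page."""
    return {
        "action": "query",
        "format": "json",
        "prop": "templates",
        "titles": "Talk:" + article_title,
        "tlnamespace": "10",  # Template namespace only
        "tllimit": "max",
    }


def wikiprojects_from_templates(template_titles, ignore=("banner shell",)):
    """Return WikiProject names found among talk-page template titles.

    `ignore` lists helper templates that share the prefix but are not
    projects; the default is an assumption and may need extending.
    """
    prefix = "Template:WikiProject "
    projects = []
    for title in template_titles:
        if not title.startswith(prefix):
            continue
        name = title[len(prefix):]
        # Skip subtemplates such as "X/class" and known helper templates.
        if "/" in name or name in ignore:
            continue
        projects.append(name)
    return projects
```

A concept could then be treated as medical if its list of projects intersects a chosen set, for example Medicine plus whichever related projects turn out to be needed.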
I am happy to provide more details if needed.
from How to group wikipedia categories in python?