I have a dataframe with a column containing text.
I want to create a new column that contains a tuple/list of the n highest-scoring TF-IDF words in each row, as a way of summarizing what is in the text.
An example dataframe (heavily abbreviated) is:
import pandas as pd

df = pd.DataFrame({'Ref': [1, 2, 3, 4, 5],
                   'Text': ["the cow jumped off the other cow",
                            "the fox had a fox",
                            "the spanner was a tool to tool",
                            "the football player played football",
                            "the house had a house"]})
I have spent the last few days trying to find a solution, but I can only find examples that find the top TF-IDF words for the whole corpus, rather than for each row scored against the whole corpus.
Can anyone steer me in the right direction?
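One possible direction, as a minimal sketch rather than a definitive answer: score every word in a row by its term frequency in that row multiplied by its inverse document frequency across all rows, then keep the n best words per row. The function name `top_tfidf_words`, the simple regex tokenizer, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration; library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed and normalized variant, so their scores (and occasionally their rankings) can differ.

```python
import math
import re
from collections import Counter


def top_tfidf_words(texts, n=2):
    """Return, for each text, a tuple of its n highest-scoring TF-IDF words.

    Hypothetical helper: uses the textbook weighting tf * log(N / df),
    not the smoothed formula of any particular library.
    """
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    tokenized = [re.findall(r"[a-z0-9]+", text.lower()) for text in texts]
    n_docs = len(tokenized)

    # Document frequency: the number of texts containing each word at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency in this text times inverse document frequency in the corpus.
        scores = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(tuple(ranked[:n]))
    return results


texts = [
    "the cow jumped off the other cow",
    "the fox had a fox",
    "the spanner was a tool to tool",
    "the football player played football",
    "the house had a house",
]
for words in top_tfidf_words(texts, n=2):
    print(words)
```

With the example dataframe, the result could be attached as a new column with something like `df['Top words'] = top_tfidf_words(df['Text'].tolist(), n=2)`, where the column name is only a placeholder. Because "the" appears in every row, its inverse document frequency is zero under this weighting, so it never outranks the row-specific words.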
from Python - Using TF-IDF to summarise dataframe text column