Keyword Extraction Using an Artificial Neural Network

Shaomin Zhang (Nottingham Trent University)
Heather Powell (Nottingham Trent University)
Dominic Palmer-Brown (Nottingham Trent University)


The research presented in this paper investigates domain-independent
techniques for automatically extracting knowledge from text, with the aim of
organising that knowledge into a knowledge base. The techniques presented here
address the first stage of this process: the automatic identification of keywords.

Artificial Neural Networks (ANNs) are trained to recognise keywords on the basis of their relationships to one or more seed words, which are manually selected as indicative of the areas of knowledge required. These relationships are obtained from an electronic dictionary. Training data are generated from example words that humans have identified as keywords associated with particular seed words. After training, the ANN can be used to extract keywords automatically from other documents.
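The training procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions: the abstract does not specify the network architecture, so a single-layer perceptron stands in for the ANN, and the relation types, the toy dictionary, the `lookup`, `encode` and `train_perceptron` helpers and the labelled nouns are all hypothetical. Each noun is encoded as one binary input per (seed word, relation type) pair, so the network learns which dictionary relationships to the seed words indicate a keyword.

```python
# Hypothetical relation types between a candidate noun and a seed word.
# The abstract does not list the relations actually used; these are assumptions.
RELATIONS = ("synonym", "hypernym", "hyponym", "in_definition")

# Toy stand-in for an electronic dictionary (hypothetical data).
TOY_DICTIONARY = {
    ("school", "education"): {"hyponym", "in_definition"},
    ("teacher", "education"): {"in_definition"},
    ("curriculum", "education"): {"hyponym", "in_definition"},
}


def lookup(noun, seed):
    """Return the set of relations linking noun to seed (empty if none)."""
    return TOY_DICTIONARY.get((noun, seed), set())


def encode(noun, seeds):
    """Input pattern: one binary input per (seed word, relation type) pair."""
    return tuple(
        1 if rel in lookup(noun, seed) else 0
        for seed in seeds
        for rel in RELATIONS
    )


def predict(weights, bias, x):
    """Classify an input pattern: 1 = keyword, 0 = non-keyword."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule over (pattern, label) pairs (stand-in for the ANN)."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            error = label - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


SEEDS = ("education",)
# Human-labelled example nouns for the seed word (hypothetical).
labelled = [("school", 1), ("teacher", 1), ("banana", 0), ("weather", 0)]
training = [(encode(noun, SEEDS), label) for noun, label in labelled]
weights, bias = train_perceptron(training)

print(predict(weights, bias, encode("curriculum", SEEDS)))  # → 1
print(predict(weights, bias, encode("table", SEEDS)))  # → 0
```

Here "curriculum" has the same input pattern as the training noun "school", so classifying it correctly exercises natural rather than pure generalisation; a noun whose relation pattern never occurred in training would test pure generalisation.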

Two measures, natural and pure generalisation, are used to evaluate this new approach. Natural generalisation is the percentage of nouns in new text that are correctly classified as keywords or non-keywords. Pure generalisation is the same percentage computed only over those nouns in the new text whose input patterns were not seen during training. Experiments so far, on documents concerning education, show good natural and pure generalisation for non-keywords (84% and 82% respectively) and reasonable generalisation for keywords (62% natural and 47% pure).
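The two measures can be made concrete with a short Python sketch. It assumes each noun in the new text is recorded as a triple of (input pattern, true label, predicted label), with 1 for keyword and 0 for non-keyword, and that the set of input patterns used in training is available; the function name `generalisation` and the toy results are hypothetical. Because the figures above are reported separately for keywords and non-keywords, the sketch computes both percentages per class.

```python
from collections import defaultdict


def generalisation(results, training_patterns):
    """Per-class natural and pure generalisation, as percentages.

    results: iterable of (pattern, true_label, predicted_label) for nouns in new text.
    training_patterns: set of input patterns seen during training.
    Returns {label: (natural %, pure %)}; pure is None if a class has no unseen patterns.
    """
    natural = defaultdict(lambda: [0, 0])  # label -> [correct, total], all nouns
    pure = defaultdict(lambda: [0, 0])     # label -> [correct, total], unseen patterns only
    for pattern, truth, predicted in results:
        correct = int(truth == predicted)
        natural[truth][0] += correct
        natural[truth][1] += 1
        if pattern not in training_patterns:
            pure[truth][0] += correct
            pure[truth][1] += 1

    def pct(counts):
        correct, total = counts
        return 100.0 * correct / total if total else None

    return {label: (pct(natural[label]), pct(pure[label])) for label in natural}


# Hypothetical data for illustration only.
training_patterns = {(0, 0, 1, 1), (0, 0, 0, 1), (0, 0, 0, 0)}
results = [
    ((0, 0, 1, 1), 1, 1),  # seen pattern, keyword, correct
    ((1, 0, 0, 1), 1, 0),  # unseen pattern, keyword, wrong
    ((0, 1, 0, 0), 1, 1),  # unseen pattern, keyword, correct
    ((0, 0, 0, 0), 0, 0),  # seen pattern, non-keyword, correct
    ((0, 0, 1, 0), 0, 1),  # unseen pattern, non-keyword, wrong
]
scores = generalisation(results, training_patterns)
print(scores[0])  # non-keywords → (50.0, 0.0)
print(scores[1])  # keywords: natural 2 of 3 correct, pure 1 of 2 correct
```

With real data, the results list would come from running the trained network over the nouns of an unseen document; the gap between the two numbers for a class indicates how much of its accuracy relies on input patterns already seen in training.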