Weighted Probability Distribution Voting: an introduction

Hans van Halteren (University of Nijmegen)

Many tasks that have to be handled in natural language processing
systems can be formulated as classification tasks, i.e. tasks in which
an output, taken from a finite set of possible values, is calculated
on the basis of a specific set of input information units.  An example
is wordclass tagging, where the input consists of features of the
token to be tagged and its context, and the output consists of a
wordclass tag.  Classification tasks can generally be handled
relatively well with machine learning techniques, and a variety of
such techniques has already been applied to NLP classification
tasks, e.g. decision trees, neural networks, case bases and maximum
entropy models.

I am developing a new machine learning technique called Weighted Probability Distribution Voting (WPDV). During learning, WPDV takes every possible combination of input features in turn, searches the training data for all instances of each combination, and calculates a probability distribution over the co-occurring output features. During classification, WPDV takes all input feature combinations that occur in the new input and sums the corresponding probability distributions, each multiplied by a weight factor which increases exponentially with the number of elements in the combination. The output feature with the highest sum is then selected.
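The learning and classification steps described above can be sketched in a minimal form. This is an illustrative reconstruction, not the author's implementation: the data representation (feature dictionaries), the helper names, and the choice of weight base are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def feature_combos(features):
    # Yield every non-empty combination of (feature, value) pairs.
    items = sorted(features.items())
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)

def train(instances):
    # instances: list of (features_dict, output) pairs.
    # For each feature combination, count the co-occurring outputs
    # and normalise the counts into a probability distribution.
    counts = defaultdict(lambda: defaultdict(int))
    for features, output in instances:
        for combo in feature_combos(features):
            counts[combo][output] += 1
    model = {}
    for combo, outs in counts.items():
        total = sum(outs.values())
        model[combo] = {o: c / total for o, c in outs.items()}
    return model

def classify(model, features, base=2.0):
    # Sum the distributions of all combinations seen in training,
    # each weighted by base**len(combo), so the weight grows
    # exponentially with the number of elements in the combination.
    scores = defaultdict(float)
    for combo in feature_combos(features):
        dist = model.get(combo)
        if dist is None:
            continue
        weight = base ** len(combo)
        for output, p in dist.items():
            scores[output] += weight * p
    # Select the output with the highest weighted sum.
    return max(scores, key=scores.get) if scores else None
```

For example, after training on tagged tokens such as `({'word': 'run', 'prev': 'to'}, 'VERB')`, classifying a new token sums the distributions for the `word` feature alone, the `prev` feature alone, and the two together, with the pair contributing most; the exponential weighting thus lets the most specific matching combinations dominate the vote.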

In this paper, I describe the WPDV technique in detail and evaluate its performance on several NLP tasks.