We discuss an approach, based on the maximum entropy technique, for building a model distribution over parses that is as uniform as possible subject to constraints derived from statistical features collected from the training data and weighted according to their frequency in that data. The major difficulty with this approach is overfitting caused by considering too many features: parses that appear in the training data are assigned unduly high probability, while parses not seen in training are assigned unduly low probability. Our strategy, feature merging, combines multiple features that commonly occur together into a single, more general feature. We discuss methods for doing this that reduce the size of the model, increasing its generality and reducing overfitting.
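The core of feature merging can be sketched in a few lines. The code below is a minimal illustration, not the paper's actual method: it merges only features whose occurrence patterns across the training parses are identical, and all feature names and data are hypothetical.

```python
# Illustrative sketch (hypothetical features and data): merge features that
# fire on exactly the same set of training parses into one general feature,
# reducing the number of model parameters.
from collections import defaultdict

def merge_cooccurring_features(parses):
    """Group features by the set of parses they occur in; features sharing
    an occurrence pattern are merged into a single combined feature."""
    pattern = defaultdict(list)  # occurrence set -> features sharing it
    all_feats = {f for parse in parses for f in parse}
    for feat in sorted(all_feats):
        occurs = frozenset(i for i, parse in enumerate(parses) if feat in parse)
        pattern[occurs].append(feat)
    # each group of perfectly co-occurring features becomes one merged feature
    return {"+".join(group): group for group in pattern.values()}

# toy training parses, each represented as a set of active features
parses = [
    {"NP-under-S", "det-the", "head-noun"},
    {"NP-under-VP", "det-the", "head-noun"},
    {"NP-under-S", "bare-noun"},
]
merged = merge_cooccurring_features(parses)
# "det-the" and "head-noun" occur in exactly the same parses, so they
# collapse into one feature; the model shrinks from 5 features to 4.
```

A real implementation would merge features whose co-occurrence is merely frequent rather than exact, trading a small loss in fit for a larger gain in generality.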