Monday, 12 September 2016

Improving N-Gram probability estimates by compound-head clustering

Compounding is one of the most productive word formation processes in many languages and is therefore a main source of data sparsity in language modeling. Many solutions have been suggested to model compound words, most of which break the compound into its constituents and train a new model with them. In earlier work, we argued that this approach is suboptimal and we presented a novel technique that clusters new, domain-specific compound words together with their semantic heads. The clusters were then used to build a class-based n-gram model that enabled a reliable estimation of n-gram probabilities, without the need for additional training data. In this paper, we investigate how this "semantic head mapping" can best be made an integral part of the language modeling strategy and find that, with some adaptations, our technique is capable of producing more accurate compound probability estimates than a baseline word-based n-gram language model, which leads to a significant word error rate reduction for Dutch read speech.
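To make the idea of a class-based estimate concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes the standard class-based bigram factorization P(word | prev) ≈ P(class | prev) · P(word | class), where the class of a compound is its semantic head. The toy corpus, the `head_of` mapping, and the within-class probabilities in `p_word_given_class` are hypothetical values chosen only for illustration; the abstract does not say how these quantities are obtained.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count histories and bigrams over tokenized sentences."""
    history = Counter()
    bigrams = Counter()
    for tokens in sentences:
        for prev, word in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, word)] += 1
    return history, bigrams


def word_bigram_prob(prev, word, history, bigrams):
    """Baseline: unsmoothed maximum-likelihood word bigram estimate."""
    if history[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / history[prev]


def class_bigram_prob(prev, word, history, bigrams, head_of, p_word_given_class):
    """Class-based estimate: P(class | prev) * P(word | class).

    The class of a compound is its semantic head; any word without an
    entry in head_of forms its own class.
    """
    cls = head_of.get(word, word)
    p_class = word_bigram_prob(prev, cls, history, bigrams)
    # Words outside every cluster get within-class probability 1.0.
    p_member = p_word_given_class.get(word, 1.0)
    return p_class * p_member


# Hypothetical toy data: the compound "tomatensoep" never occurs in training.
corpus = [
    ["ik", "eet", "soep"],
    ["ik", "eet", "brood"],
    ["zij", "eet", "soep"],
]
head_of = {"tomatensoep": "soep"}  # assumed compound -> head mapping
p_word_given_class = {"soep": 0.9, "tomatensoep": 0.1}  # assumed values

history, bigrams = train_bigrams(corpus)
baseline = word_bigram_prob("eet", "tomatensoep", history, bigrams)
clustered = class_bigram_prob(
    "eet", "tomatensoep", history, bigrams, head_of, p_word_given_class
)
print(f"word-based: {baseline:.4f}, class-based: {clustered:.4f}")
```

In this toy setting the unseen compound gets zero probability from the word-based counts, while the class-based estimate borrows the counts of its head "soep" without any additional training data. A real system would also apply smoothing and would need a principled way to estimate the within-class probabilities.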

from #MedicinebyAlexandrosSfakianakis via xlomafota13 on Inoreader http://ift.tt/2cQqx2T
