Friday, 4 March 2016

Multi-Modular Text Normalization of Dutch User-Generated Content

Because social media constitute a valuable data source for a wide range of applications, there is a growing need to process such data. However, the non-standard language used on social media poses problems for Natural Language Processing (NLP) tools, as these are typically trained on standard language material. We propose a text normalization approach to tackle this problem. More specifically, we investigate the usefulness of a multi-modular approach to account for the diversity of normalization issues encountered in user-generated content. We consider three different types of user-generated content written in Dutch (social networking site posts (SNS), SMS messages and tweets) and provide a detailed analysis of the performance of the individual modules and of the overall system. We also perform an extrinsic evaluation by measuring the performance of a part-of-speech tagger, a lemmatizer and a named-entity recognizer before and after normalization.
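The abstract does not spell out which modules the system contains. As a rough illustration only, the sketch below shows one way a multi-modular normalizer could be organized: several independent modules each propose a standard form for a non-standard token, and a simple vote picks one. The module names, the tiny lexicon and abbreviation list, and the majority-vote decision step are all hypothetical assumptions, not the authors' system; the spelling module uses Python's standard `difflib.get_close_matches` as a stand-in for a real spell checker.

```python
import itertools
import re
from collections import Counter
from difflib import get_close_matches
from typing import Callable, Dict, List, Optional, Set

# Hypothetical toy resources; a real system would use large Dutch lexicons.
LEXICON: Set[str] = {"dat", "is", "echt", "super", "mooi", "leuk", "even",
                     "niet", "inderdaad", "waarschijnlijk", "weer", "vandaag"}
ABBREVIATIONS: Dict[str, str] = {"ff": "even", "idd": "inderdaad",
                                 "wss": "waarschijnlijk", "nie": "niet"}

Module = Callable[[str], Optional[str]]


def abbreviation_module(token: str) -> Optional[str]:
    """Hypothetical module: look the token up in an abbreviation dictionary."""
    return ABBREVIATIONS.get(token.lower())


def repetition_module(token: str) -> Optional[str]:
    """Hypothetical module: shrink runs of 3+ identical characters to 1 or 2
    characters and return the first variant found in the lexicon."""
    runs = [(m.group(1), len(m.group(0)))
            for m in re.finditer(r"(.)\1*", token.lower())]
    if all(n < 3 for _, n in runs):
        return None  # no exaggerated repetition to fix
    options = [[ch, ch * 2] if n >= 3 else [ch * n] for ch, n in runs]
    for parts in itertools.product(*options):
        candidate = "".join(parts)
        if candidate in LEXICON:
            return candidate
    return None


def spelling_module(token: str) -> Optional[str]:
    """Stand-in spell checker: closest lexicon word by difflib similarity."""
    matches = get_close_matches(token.lower(), sorted(LEXICON), n=1, cutoff=0.7)
    return matches[0] if matches else None


MODULES: List[Module] = [abbreviation_module, repetition_module, spelling_module]


def choose(proposals: List[str]) -> str:
    """Assumed decision step: majority vote; ties go to the earliest module."""
    counts = Counter(proposals)
    # dict.fromkeys keeps first-occurrence order and max() returns the first
    # maximal element, so ties resolve in module order.
    return max(dict.fromkeys(proposals), key=lambda c: counts[c])


def normalize_token(token: str) -> str:
    if token.lower() in LEXICON:
        return token  # already standard, leave untouched
    proposals = []
    for module in MODULES:
        candidate = module(token)
        if candidate is not None:
            proposals.append(candidate)
    return choose(proposals) if proposals else token


def normalize(text: str) -> str:
    return " ".join(normalize_token(t) for t in text.split())


print(normalize("dat is echd suuuuper mooooi ff"))
# prints "dat is echt super mooi even"
```

Keeping each module as a separate function with the same signature makes it easy to switch modules on or off and score them one at a time, which is the kind of per-module analysis the abstract describes; how the actual system combines module outputs is not stated here.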
