Publication date: Available online 18 May 2017
Source: Computer Speech & Language
Author(s): Herman Kamper, Aren Jansen, Sharon Goldwater
Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-coverage systems attempt to completely segment and cluster the audio into word-like units—effectively performing unsupervised speech recognition. This article presents the first attempt we are aware of to apply such a system to large-vocabulary multi-speaker data. Our system uses a Bayesian modelling framework with segmental word representations: each word segment is represented as a fixed-dimensional acoustic embedding obtained by mapping the sequence of feature frames to a single embedding vector. We compare our system on English and Xitsonga datasets to state-of-the-art baselines, using a variety of measures including word error rate (obtained by mapping the unsupervised output to ground truth transcriptions). Very high word error rates are reported—in the order of 70–80% for speaker-dependent and 80–95% for speaker-independent systems—highlighting the difficulty of this task. Nevertheless, in terms of cluster quality and word segmentation metrics, we show that by imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, both single-speaker and multi-speaker versions of our system outperform a purely bottom-up single-speaker syllable-based approach. We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding). Our system's discovered clusters are still less pure than those of unsupervised term discovery systems, but provide far greater coverage.
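To make the idea of a fixed-dimensional acoustic embedding concrete, the sketch below maps a variable-length sequence of feature frames to a single vector by keeping a fixed number of evenly spaced frames and concatenating them. This is only one simple way such a mapping could be done and is not claimed to be the authors' exact method; the function name `downsample_embedding`, the parameter `n_keep`, and the toy two-dimensional frames are all assumptions made for illustration.

```python
from typing import List, Sequence


def downsample_embedding(
    frames: Sequence[Sequence[float]], n_keep: int = 10
) -> List[float]:
    """Map a variable-length frame sequence to one fixed-size vector.

    Hypothetical helper: picks n_keep frames at evenly spaced positions
    and concatenates them, so the output length is n_keep * frame_dim
    regardless of how many frames the segment has.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")

    n_frames = len(frames)
    embedding: List[float] = []
    for k in range(n_keep):
        if n_keep == 1:
            idx = 0
        else:
            # Evenly spaced position in [0, n_frames - 1], rounded to an index.
            idx = round(k * (n_frames - 1) / (n_keep - 1))
        embedding.extend(frames[idx])
    return embedding


# Toy segments of different durations (7 and 40 frames, 2 features each).
short_segment = [[float(t), float(-t)] for t in range(7)]
long_segment = [[float(t), float(-t)] for t in range(40)]

emb_short = downsample_embedding(short_segment, n_keep=5)
emb_long = downsample_embedding(long_segment, n_keep=5)

# Both embeddings have the same dimensionality despite different durations.
print(len(emb_short), len(emb_long))
```

Because every segment ends up in the same vector space, segments of different durations can be compared or clustered directly, which is what a segmental word model of the kind described in the abstract needs.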