Erjon Sulaj | Text To Speech

Text-To-Speech

2014 - Ongoing

Ongoing project: development of an open-source, human-voice text-to-speech engine for the Albanian language.

Speech synthesis is the artificial production of human speech. The two most important qualities of a speech synthesis system are naturalness and intelligibility.
Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible.

HMM-based synthesis, also called statistical parametric synthesis, is a synthesis method based on hidden Markov models (HMMs). In such a system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modeled simultaneously by HMMs. Speech waveforms are then generated from the HMMs themselves, based on the maximum likelihood criterion.
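To make the idea of maximum likelihood generation more concrete, here is a deliberately simplified toy sketch in Python. It is not how MaryTTS or any production engine implements it: the `State` class, the parameter values, and the function name are all hypothetical, variances and dynamic (delta) features are left out, and no waveform is produced. Under those assumptions, the most likely output is simply each state's mean parameters, repeated for that state's most likely (mean) duration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class State:
    """One HMM state, reduced to its Gaussian means (hypothetical toy model)."""
    mean_f0_hz: float             # fundamental frequency (vocal source)
    mean_spectrum: List[float]    # toy spectral parameters (vocal tract)
    mean_duration_frames: float   # duration model (prosody)


def generate_parameters(states: List[State]) -> Tuple[List[float], List[List[float]]]:
    """Return per-frame F0 and spectrum trajectories for a state sequence.

    Simplifying assumption: with Gaussian densities and no dynamic features,
    the most likely duration of a state is its mean duration (rounded to whole
    frames), and the most likely observation in every frame is the state mean.
    """
    f0_track: List[float] = []
    spectrum_track: List[List[float]] = []
    for state in states:
        n_frames = max(1, round(state.mean_duration_frames))
        for _ in range(n_frames):
            f0_track.append(state.mean_f0_hz)
            spectrum_track.append(list(state.mean_spectrum))
    return f0_track, spectrum_track


# Three made-up states, as if they belonged to one phone model.
toy_states = [
    State(mean_f0_hz=120.0, mean_spectrum=[0.1, 0.4], mean_duration_frames=2.0),
    State(mean_f0_hz=130.0, mean_spectrum=[0.2, 0.3], mean_duration_frames=3.4),
    State(mean_f0_hz=110.0, mean_spectrum=[0.3, 0.1], mean_duration_frames=1.2),
]

f0_track, spectrum_track = generate_parameters(toy_states)
print(len(f0_track), "frames generated")
```

The result is a step-like trajectory. Real HMM-based synthesizers typically also model how parameters change between frames, which smooths these steps, and then pass the generated parameters to a vocoder to obtain audio.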

My work is currently focused on adding Albanian support to MaryTTS, an open-source, multilingual text-to-speech synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.

Currently, this open-source software supports German, British and American English, French, Italian, Swedish, Russian, Turkish, and other languages; I hope to add Albanian to the list of supported languages.
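As a rough illustration of what language support means in practice, the sketch below builds a request URL for a locally running MaryTTS server, where the language is selected through a locale parameter. It assumes the server's default HTTP interface (port 59125, `/process` endpoint); the helper name `build_marytts_url` is made up for this example, and the Albanian locale code `sq` is hypothetical, since Albanian is not yet among the supported languages.

```python
from urllib.parse import urlencode


def build_marytts_url(text: str, locale: str,
                      host: str = "localhost", port: int = 59125) -> str:
    """Build a synthesis request URL for a MaryTTS server (assumed defaults)."""
    query = urlencode({
        "INPUT_TEXT": text,      # text to be spoken
        "INPUT_TYPE": "TEXT",    # plain-text input
        "OUTPUT_TYPE": "AUDIO",  # ask for synthesized audio
        "AUDIO": "WAVE_FILE",    # audio container format
        "LOCALE": locale,        # language of the input text
    })
    return f"http://{host}:{port}/process?{query}"


# "sq" is a hypothetical locale: it would only work once Albanian is supported.
url = build_marytts_url("Mirëdita", locale="sq")
print(url)
```

Fetching this URL from a running server (for example with `urllib.request.urlopen`) would return WAV audio for a supported locale such as `en_US`; for an unsupported locale the server is expected to report an error.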

