Doshisha University International Science and Technology Course


Graduate School of Science and Engineering
Information and Computer Science

Spoken Language Processing Laboratory

Speech and Language Processing for Development of Human Communication



Acceptable course
Master's degree course
Doctoral degree course
Telephone : +81-774-65-6983
Office : KC-221

Tsuneo KATO
[Associate Professor]
Acceptable course
Master's degree course
Doctoral degree course : × (not accepted)
Telephone : +81-774-65-6981
Office : KC-321

Research Topics

1) Longitudinal analysis of English speech produced by Japanese children
2) Computer-Assisted / Robot-Assisted Language Learning (CALL/RALL)
3) Multimodal conversation analysis
4) Spoken dialogue system
5) Music research
6) Text-entry methods for smart devices

Research Contents

Research background and goals
The world is globalizing, and opportunities to communicate in foreign languages are increasing everywhere. Second language learning is becoming increasingly important in this globalized community, and in Japan the importance of English communication is particularly emphasized.
The Spoken Language Processing Laboratory at Doshisha University conducts research on the nature of development in speech production, the linguistic properties of second language (L2) speech, and assistive technologies for L2 learning based on natural language processing (NLP) and signal processing (SP) technologies.
Although the accuracy of automatic speech recognition has improved significantly, recognizing L2 speech remains challenging because L2 speech varies widely and contains errors at different levels, i.e., in pronunciation, lexicon, and grammar. Automatic assistance for L2 learning, e.g., automatic detection and correction of errors, is the next big challenge.
We are conducting research on various aspects of speech and spoken language from a number of perspectives. We regularly collect samples of English speech produced by Japanese elementary school children and analyze the collected speech longitudinally. We are developing automatic speech recognition for Japanese-accented English and automatic prosody assessment of L2 English speech. We are also developing a joining-in-type robot-assisted language learning (RALL) system and exploring effective ways of learning with it by measuring learning effectiveness.
Our research on speech and spoken language is based mainly on signal processing and statistical techniques; the core techniques are SP, NLP, and machine learning, in which deep learning is a major trend. Our research draws not only on these core techniques but also on phonetics, cognitive science, and user-interface design theory.
Specific research themes
1) Longitudinal analysis of English speech produced by Japanese children
In Japan, there is a plan to make English education compulsory from the 5th grade of elementary school (at 11 years old, two years earlier than at present) so that children can learn communication skills in English more effectively. Until now, the English speech of native Japanese children who learn English as a foreign language domestically has not been recorded or analyzed longitudinally.
We biannually collect samples of English speech produced by children at Doshisha elementary school and analyze the speech longitudinally based on signal processing and phonetics. We are studying how the children's pronunciation changes during their physical and intellectual development and how the educational program affects these changes.
We are also developing techniques for automatically assessing L2 English speech. We are currently focusing on the development of an original metric for assessing rhythm, including the correctness of sentence and word stress in speech.
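One standard rhythm metric from the phonetics literature, the normalized Pairwise Variability Index (nPVI) over successive syllable or vowel durations, illustrates the kind of measurement involved (this is a generic example for orientation, not the laboratory's original metric):

```python
def npvi(durations):
    """Normalized Pairwise Variability Index of successive interval durations.

    Takes a list of syllable or vowel durations (in seconds) and returns a
    value that grows with the variability between adjacent intervals:
    stress-timed rhythm tends to score higher than syllable-timed rhythm.
    """
    if len(durations) < 2:
        raise ValueError("need at least two intervals")
    diffs = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(diffs) / len(diffs)

# Perfectly even intervals give 0; alternating long/short intervals score high.
print(npvi([0.2, 0.2, 0.2]))   # 0.0
print(npvi([0.1, 0.3, 0.1]))   # 100.0
```

In practice the interval durations would come from forced alignment of the learner's recording against its transcript.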

2) Computer-Assisted / Robot-Assisted Language Learning (CALL/RALL)
We aim to develop a computer-/robot-assisted language learning (CALL/RALL) system that enables multi-level English self-training, covering pronunciation, construction of sentence patterns, and conversation. This project includes research on elemental techniques, such as automatic recognition of spoken English produced by Japanese learners and automatic assessment of their prosody and linguistic performance, as well as the development of a joining-in-type robot-assisted language learning (JIT-RALL) system consisting of two NAO robots. The JIT-RALL system is an integrated system that operates two robots, one acting as a teacher and the other as a co-learner alongside the human learner. The robots collaborate to simulate English conversation, ask the same question to both the human learner and the other robot, or present a model answer to the question. With Japanese university students, we are measuring how repetitive training helps learners' knowledge develop from declarative to procedural.
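The joining-in-type idea, a teacher robot addressing each question either to the human learner or to a co-learner robot that demonstrates a model answer, can be sketched as a simple session loop (an assumed illustration of the control flow, not the actual JIT-RALL implementation; all function names here are hypothetical):

```python
import random

def run_session(questions, model_answers, get_human_answer, seed=0):
    """Toy joining-in-type session: the teacher robot poses each question
    either to the human learner or to the co-learner robot, which then
    demonstrates a model answer the human can imitate."""
    rng = random.Random(seed)
    log = []
    for q in questions:
        addressee = rng.choice(["human", "robot"])
        if addressee == "robot":
            # co-learner robot models the answer
            log.append((q, "robot", model_answers[q]))
        else:
            # human learner answers; a real system would then assess
            # pronunciation and prosody here
            log.append((q, "human", get_human_answer(q)))
    return log
```

A real system would replace `get_human_answer` with speech recognition and drive the robots' speech and gestures; the sketch only shows the alternation between learner and co-learner turns.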

3) Multimodal conversation analysis
In typical human-human conversation, we use nonverbal behaviors as well as verbal information. Nonverbal behaviors such as gestures, nodding, and eye gaze play important roles in establishing smooth conversation, especially when the participants' communicative abilities are limited (as in L2 conversations or human-robot conversations).
The Multimodal Conversation Analysis project studies how nonverbal behaviors function in conversation under communicative insufficiency by comparing eye gaze in native-language (L1) and L2 conversations. The results of these analyses can inform models of natural gaze patterns for robots and methods for estimating people's intentions from their gaze activity.

4) Spoken dialogue system
The recent prevalence of smart-speaker products and dialogue agents on smartphones has rapidly increased the demand for intelligent information processing in spoken dialogue systems. We have been working on the natural language understanding (NLU) unit of spoken dialogue systems, specifically utterance intent classification and slot filling with recent neural network techniques. We plan to advance further into automatic response generation with neural network techniques.
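The two NLU subtasks mentioned above produce an intent label for the whole utterance and a BIO slot tag per token. The following keyword-based toy (all lexicons and labels are invented for illustration; a real system would use neural sequence models as described) only shows that input/output structure:

```python
# Toy NLU illustrating intent classification and slot filling.
# The intent is chosen by keyword overlap; slots are tagged with BIO labels
# from a tiny lexicon. This is a format illustration, not a neural model.
INTENT_KEYWORDS = {
    "weather": {"weather", "rain", "sunny"},
    "play_music": {"play", "song", "music"},
}
SLOT_LEXICON = {"kyoto": "city", "tokyo": "city", "jazz": "genre"}

def nlu(utterance):
    tokens = utterance.lower().split()
    scores = {intent: len(kw & set(tokens))
              for intent, kw in INTENT_KEYWORDS.items()}
    intent = max(scores, key=scores.get)
    tags = ["B-" + SLOT_LEXICON[t] if t in SLOT_LEXICON else "O"
            for t in tokens]
    return intent, tags

print(nlu("play some jazz music"))
# ('play_music', ['O', 'O', 'B-genre', 'O'])
```

A neural NLU unit would replace the keyword match with an utterance classifier and the lexicon lookup with a sequence tagger, but would emit the same (intent, BIO tags) structure.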

5) Music research
Piano performance can be analyzed as audio signals in combination with key and hammer movements captured by a digitally instrumented Bösendorfer (Bösendorfer CEUS) and finger and arm movements captured by an optical motion capture (MoCap) system. We are interested in the human perception of musical tempo change; it is well known that musical performances tend to accelerate unconsciously. We have studied the relationships among the perception of musical tempo changes, musical training, and the ability to keep track of tempo.
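The unconscious acceleration mentioned above can be quantified from beat onset times: each inter-onset interval (IOI) gives an instantaneous tempo in BPM, and a least-squares line fitted over those tempi exposes any drift. A minimal sketch of this generic analysis (an assumption for illustration, not the laboratory's published method):

```python
def tempo_drift(onsets):
    """Return (instantaneous BPM per beat, drift slope in BPM per beat).

    onsets: beat onset times in seconds, e.g. from key-press data.
    A positive slope means the performance is speeding up.
    """
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    bpm = [60.0 / d for d in iois]
    n = len(bpm)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(bpm) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, bpm))
             / sum((x - mx) ** 2 for x in xs))
    return bpm, slope
```

For a metronomic performance the slope is zero; shrinking IOIs (e.g. 0.50 s, 0.45 s, 0.40 s) yield rising BPM values and a positive slope.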

6) Text-entry methods for smart devices
We are studying human-computer interaction (HCI), aiming to develop intelligent user interfaces (IUI) based on behavioral SP and NLP. For the last three years, we have been studying efficient Japanese text-entry methods for smartwatches and an algorithm, based on acceleration signals, for correcting pointing positions on touch screens in vibrating environments.
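One simple way such acceleration-based correction could work, sketched here purely as an assumption (the linear model, gains, and filter constant are invented; this is not the published algorithm): smooth the device acceleration around the touch event and subtract a displacement proportional to it.

```python
def correct_touch(x, y, accel_samples, gain=(5.0, 5.0), alpha=0.2):
    """Hypothetical vibration compensation for a touch point.

    x, y: reported touch coordinates in pixels.
    accel_samples: recent (ax, ay) accelerometer readings.
    The samples are smoothed with an exponential low-pass filter, and the
    touch point is shifted against the estimated vibration displacement.
    """
    ax = ay = 0.0
    for sx, sy in accel_samples:
        ax = (1 - alpha) * ax + alpha * sx   # simple exponential smoothing
        ay = (1 - alpha) * ay + alpha * sy
    return x - gain[0] * ax, y - gain[1] * ay
```

With zero acceleration the touch point is returned unchanged; a sustained lateral acceleration shifts the corrected point against the vibration. A deployed algorithm would learn the gains from user data rather than fix them by hand.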


  • Speech recognition
  • Natural language processing
  • Nonlinear speech signal processing
  • Acquiring foreign language (L2) ability
  • Spoken language processing