Graduate School of Science and Engineering, Information and Computer Science
Spoken Language Processing Laboratory
Speech and Language Processing for Development of Human Communication
Staff
KATO Tsuneo
[Professor]
| Acceptable course | |
|---|---|
| Master's degree course | ✓ |
| Doctoral degree course | ✓ |
Telephone : +81-774-65-6981
tsukato@mail.doshisha.ac.jp
Office : KC-221
Database of Researchers
TAMURA Akihiro
[Associate Professor]
| Acceptable course | |
|---|---|
| Master's degree course | ✓ |
| Doctoral degree course | |
Telephone : +81-774-65-6983
aktamura@mail.doshisha.ac.jp
Office : KC-321
Database of Researchers
Research Topics
- Longitudinal analysis of English speech produced by Japanese children
- Computer-Assisted / Robot-Assisted Language Learning (CALL/RALL)
- Multimodal conversation analysis
- Spoken dialogue system
- Music research
- Text-entry methods for smart devices
Research Contents
Research background and goals
As the world globalizes, opportunities to communicate in foreign languages are increasing everywhere, and second
language learning is becoming increasingly important. In Japan, the importance of English communication is
particularly emphasized.
The Spoken Language Processing Laboratory at Doshisha University conducts research on the nature of development in
speech production, the linguistic properties of second language (L2) speech, and assistive technology for L2
learning, based on natural language processing (NLP) and signal processing (SP) technologies.
Although the accuracy of automatic speech recognition has improved significantly, recognizing L2 speech remains
challenging because L2 speech varies widely and contains errors at different levels, i.e., in pronunciation,
lexicon, and grammar. Automatic assistance for L2 learning, e.g., automatic detection and correction of errors, is
the next big challenge.
We are conducting research on various aspects of speech and spoken language from a number of perspectives. We
regularly collect samples of English speech produced by Japanese elementary school children and analyze the
collected speech longitudinally. We are developing automatic speech recognition for Japanese-accented English and
automatic prosody assessment of L2 English speech. We are also developing a joining-in-type robot-assisted language
learning (RALL) system and exploring effective ways of learning with it by measuring learning effectiveness.
Approaches
Our research on speech and spoken language is mainly based on signal processing and statistical techniques. The core techniques are SP, NLP and machine learning. In the machine learning field, deep learning is a big trend. Our research is based not only on core techniques, but also on phonetics, cognitive science and user-interface design theory.
Specific research themes
1) Longitudinal analysis of English speech produced by Japanese children
In Japan, there is a plan to make English education compulsory from the 5th grade of elementary school (at 11
years old, two years earlier than at present) so that children can learn communication skills in English more
effectively. Until now, the English speech of native Japanese children who learn English as a foreign language
domestically has not been recorded or analyzed longitudinally.
We collect samples of English speech produced by children at Doshisha elementary school twice a year, and we
analyze the speech longitudinally based on signal processing and phonetics. We are studying how the children's
pronunciation changes during their physical and intellectual development and how the educational program affects
those changes.
We are also developing techniques for automatically assessing L2 English speech. We are currently focusing on
developing our own metric for assessing rhythm, including the correctness of sentence and word stress in speech.
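The laboratory's own rhythm metric is not published here, but a standard measure from speech-rhythm research, the normalized Pairwise Variability Index (nPVI), illustrates the general idea: it quantifies how much successive segment durations alternate, which distinguishes stress-timed rhythm (as in native English) from more syllable-timed rhythm. The durations below are hypothetical, not measured data.

```python
# Illustrative only: the nPVI (normalized Pairwise Variability Index),
# a standard durational-variability measure, stands in for the
# laboratory's own rhythm metric, which is not reproduced here.

def npvi(durations):
    """Compute the nPVI over a sequence of segment durations (seconds).

    nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    Higher values indicate more alternation between long and short
    segments (stress-timed rhythm); lower values indicate more
    uniform durations (syllable-timed rhythm).
    """
    if len(durations) < 2:
        raise ValueError("need at least two segment durations")
    terms = [abs(a - b) / ((a + b) / 2)
             for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)

# Hypothetical vowel durations for the same sentence:
native = [0.18, 0.06, 0.21, 0.05, 0.19]   # alternating long/short
learner = [0.12, 0.11, 0.13, 0.12, 0.12]  # nearly uniform
print(npvi(native) > npvi(learner))  # -> True
```

In practice the segment durations would come from forced alignment of the learner's recording, and a metric like this would be one feature among several (alongside stress-placement correctness) in an overall rhythm score.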
2) Computer-Assisted / Robot-Assisted Language Learning (CALL/RALL)
We aim to develop a computer-/robot-assisted language learning (CALL/RALL) system that enables multi-level English
self-training, covering pronunciation, construction of sentence patterns, and conversation. This project includes
research on elemental techniques, such as automatic recognition of spoken English produced by Japanese learners and
automatic assessment of their prosody and linguistic performance, as well as the development of a joining-in-type
robot-assisted language learning (JIT-RALL) system consisting of two NAO robots. The JIT-RALL system is an
integrated system that operates two robots, one acting as a teacher and the other as a co-learner alongside the
human learner. The robots can collaborate to simulate English conversation, ask the same question of the human
learner and the other robot, or show a model answer to the question. With Japanese university students, we are
measuring how repetitive training helps learners progress from declarative to procedural knowledge.
3) Multimodal conversation analysis
In typical human-human conversations, we use nonverbal behaviors as well as verbal information. Nonverbal behaviors
such as gestures, nodding, and eye gaze play important roles in establishing smooth conversations, especially when
the communicative abilities of the participants are limited, as in L2 conversations or human-robot conversations.
The Multimodal Conversation Analysis project studies how nonverbal behaviors function in conversations affected by
communicative insufficiency, by comparing eye gaze in native-language (L1) and L2 conversations. The results of
these analyses can be applied to modeling natural gaze patterns for robots and to estimating people's intentions
from their gaze activities.
4) Spoken dialogue system
The recent prevalence of smart speaker products and dialogue agents on smartphones has caused a rapid increase in
demand for intelligent information processing in spoken dialogue systems. We have been working on the natural
language understanding (NLU) unit of spoken dialogue systems, specifically utterance intent classification and
slot filling with recent neural network techniques. We will next advance to automatic response generation with
neural network techniques.
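To make the NLU step concrete, here is a minimal, self-contained sketch of intent classification plus slot filling. Real systems, including the neural models mentioned above, learn both tasks from annotated data; in this sketch, simple keyword overlap and regular-expression patterns stand in for the learned models, and all intents, slot names, and example utterances are hypothetical.

```python
# Toy NLU for a spoken dialogue system: map an utterance to an intent
# label and a dictionary of slot values. Rule-based stand-in for the
# neural classifiers described in the text; everything here is illustrative.
import re

INTENT_KEYWORDS = {
    "weather_query": {"weather", "rain", "sunny", "forecast"},
    "alarm_set": {"alarm", "wake", "remind"},
}

def classify_intent(utterance):
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    # Pick the intent whose keyword set overlaps the utterance most.
    scores = {i: len(tokens & kw) for i, kw in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def fill_slots(utterance):
    slots = {}
    m = re.search(r"\b(?:in|at|for)\s+([A-Z][a-z]+)", utterance)
    if m:
        slots["location"] = m.group(1)
    m = re.search(r"\b(\d{1,2}(?::\d{2})?\s*(?:am|pm))\b", utterance, re.I)
    if m:
        slots["time"] = m.group(1)
    return slots

def understand(utterance):
    return {"intent": classify_intent(utterance), "slots": fill_slots(utterance)}

print(understand("What's the weather in Kyoto?"))
# -> {'intent': 'weather_query', 'slots': {'location': 'Kyoto'}}
```

A neural NLU unit replaces the keyword sets with an utterance classifier and the regular expressions with a sequence-labeling model, but the input-output contract, an intent label plus filled slots passed to the dialogue manager, stays the same.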
5) Music research
Piano performance can be analyzed in combination with audio signals: the movement of the keys and hammers can be
captured by a digital-recording-equipped Bosendorfer (Bosendorfer CEUS), and the movements of fingers and arms can
be captured by an optical motion capture (MoCap) system. We are interested in the human perception of musical tempo
change. It is well known that musical performances tend to accelerate unconsciously. We have studied the
relationships between the human perception of musical tempo changes, musical training, and the ability to keep
track of tempo.
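The unconscious acceleration mentioned above can be quantified in a simple way from note-onset times: fit a least-squares line to the successive inter-onset intervals (IOIs) and check the sign of the slope. This is a generic sketch, not the laboratory's actual analysis, and the onset times below are hypothetical.

```python
# Illustrative sketch: detect tempo drift from note-onset times.
# A negative slope over the inter-onset intervals (IOIs) means the
# intervals are shrinking, i.e. the performance is speeding up.

def ioi_trend(onsets):
    """Return (mean_bpm, slope): mean tempo and per-beat change in IOI (s)."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    n = len(iois)
    mx = (n - 1) / 2                     # mean of x = 0..n-1
    my = sum(iois) / n                   # mean IOI
    slope = (sum((x - mx) * (y - my) for x, y in enumerate(iois))
             / sum((x - mx) ** 2 for x in range(n)))
    return 60 / my, slope

# Hypothetical onsets: nominally 120 BPM (0.5 s IOI), drifting faster.
onsets = [0.0, 0.50, 0.99, 1.47, 1.94, 2.40]
bpm, slope = ioi_trend(onsets)
print(slope < 0)  # -> True: IOIs shrink, so the tempo accelerates
```

With CEUS key-onset data, the same fit over a sliding window would give a tempo-drift curve that can be compared against listeners' perceptual judgments of whether the tempo changed.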
6) Text-entry methods for smart devices
We are studying human-computer interaction (HCI), aiming to develop an intelligent user interface (IUI) based on
behavioral SP and NLP. For the last three years, we have been studying efficient Japanese text-entry methods on
smartwatches and an algorithm that corrects pointing positions on touch screens in vibrating environments based on
acceleration signals.
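One simple way such a correction can work, offered here as a hedged one-dimensional sketch rather than the algorithm actually studied in the laboratory, is to double-integrate the device's acceleration over the short window before contact to estimate the unintended drift of the finger, then shift the reported touch coordinate back by that amount. The sample values and the pixels-per-meter scale below are hypothetical.

```python
# Hypothetical 1-D sketch of vibration-aware touch correction:
# estimate finger drift by double-integrating acceleration, then
# subtract the drift (converted to pixels) from the touch coordinate.

def estimated_displacement(accel, dt):
    """Double-integrate an acceleration trace (m/s^2) sampled every dt seconds."""
    v = 0.0   # relative velocity (m/s)
    x = 0.0   # accumulated displacement (m)
    for a in accel:
        v += a * dt
        x += v * dt
    return x

def corrected_touch(touch_x, accel, dt, px_per_m=4000.0):
    """Shift the reported touch coordinate (px) against the estimated drift."""
    return touch_x - estimated_displacement(accel, dt) * px_per_m

accel = [0.0, 2.0, 2.0, 0.0, -2.0, -2.0]  # hypothetical vibration burst
print(corrected_touch(100.0, accel, dt=0.01))  # -> 95.2 (drift of 4.8 px removed)
```

A deployed method would be more careful: naive double integration drifts quickly, so the window must be short and the velocity estimate high-pass filtered or reset between touches.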
Keywords
- Speech recognition
- Natural language processing
- Nonlinear speech signal processing
- Acquiring foreign language (L2) ability
- Spoken language processing