Mon, Mar 2 AM 09:20 - 11:00 |
(1) |
09:20-09:45 |
Investigation of neural speech rate conversion with multi-speaker WaveNet vocoder |
Takuma Okamoto (NICT), Keisuke Matsubara (Kobe Univ./NICT), Tomoki Toda (Nagoya Univ./NICT), Yoshinori Shiga, Hisashi Kawai (NICT) |
(2) |
09:45-10:10 |
|
|
(3) |
10:10-10:35 |
Multichannel NMF with Joint-Diagonalizable Constraint Based on Generalized Gaussian Distribution for Blind Source Separation |
Keigo Kamo, Yuki Kubo, Norihiro Takamune (UTokyo), Daichi Kitamura (NIT Kagawa), Hiroshi Saruwatari (UTokyo), Yu Takahashi, Kazunobu Kondo (Yamaha) |
(4) |
10:35-11:00 |
Dimension reduction without multiplication in machine learning |
Nobutaka Ono (TMU) |
|
11:00-11:10 |
Break ( 10 min. ) |
Mon, Mar 2 AM 11:10 - 12:00 |
(5) |
11:10-12:00 |
[Invited Talk]
Target speech extraction in speech mixtures with SpeakerBeam |
Marc Delcroix (NTT), Katerina Zmolikova (BUT), Keisuke Kinoshita, Tsubasa Ochiai, Tomohiro Nakatani, Shoko Araki (NTT) |
|
12:00-13:00 |
Lunch Break ( 60 min. ) |
Mon, Mar 2 PM 13:00 - 14:30 |
(6) |
13:00-14:30 |
Vulnerability investigation of speaker verification against black-box adversarial attacks |
Hiroto Kai, Sayaka Shiota, Hitoshi Kiya (TMU) |
(7) |
13:00-14:30 |
Learning of Classification Models using Emotion-specific Soft Labels for Speech Emotion Recognition |
Mayuko Ozawa, Keisuke Imoto, Ryosuke Yamanishi, Yoichi Yamashita (Ritsumeikan Univ.) |
(8) |
13:00-14:30 |
Japanese dialect speech classification using sequence-to-one neural networks |
Ryo Imaizumi (TMU), Ryo Masumura (NTT), Sayaka Shiota, Hitoshi Kiya (TMU) |
(9) |
13:00-14:30 |
[Poster Presentation]
Neural Voice Activity Detection using Multiple Auxiliary Networks |
Ryo Masumura, Kiyoaki Matsui, Yuma Koizumi, Takanobu Oba (NTT) |
(10) |
13:00-14:30 |
Data augmentation for ASR system by using locally time-reversed speech
-- Temporal inversion of feature sequence -- |
Takanori Ashihara, Tomohiro Tanaka, Takafumi Moriya, Ryo Masumura, Yusuke Shinohara, Makio Kashino (NTT) |
(11) |
13:00-14:30 |
Adaptation to Meeting Speech and Mitigation of Wraparound Speech for End-to-end Speech Recognition |
Kazua Ouchi, Atsuhiko Kai (Shizuoka Univ.) |
(12) |
13:00-14:30 |
The Effectiveness of Additional Context in DNN-based Spontaneous Speech Synthesis |
Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi (UTokyo), Yusuke Ijima, Ryo Masumura (NTT), Hiroshi Saruwatari (UTokyo) |
(13) |
13:00-14:30 |
Production and auditory features of speech sounds before and after vocal training using hearing impairment simulator, WHIS |
Soichi Higashiyama, Hanako Yoshigi, Hideki Kawahara, Toshio Irino (Wakayama Univ.) |
(14) |
13:00-14:30 |
[Poster Presentation]
A study on loudspeaker measurement with reflection cancellation for each frequency in a reflective environment |
Akiyuki Moritani, Yutaka Kaneda (Tokyo Denki Univ.) |
(15) |
13:00-14:30 |
[Poster Presentation]
Study on Error factors and improvement method in low frequency band on MUSIC sound source direction estimation |
Kazuki Yuasa, Yutaka Kaneda (Tokyo Denki Univ.) |
(16) |
13:00-14:30 |
[Poster Presentation]
Development of Distributed Wireless Synchronous Recording System for Event Detection |
Taku Kitajima, Kan Okubo, Norio Tagawa (TMU) |
(17) |
13:00-14:30 |
[Poster Presentation]
Evaluation of spatial impression of stereo sound source conversion methods for headphone reproduction |
Yui Ueno, Mitsunori Mizumachi (Kyutech), Toshiharu Horiuchi (KDDI Research, Inc.) |
(18) |
13:00-14:30 |
[Poster Presentation]
Feature Analysis of Accuracy and Direction of Sound Image Localization Using Narrow-band Signal |
Michika Yamada, Fumikazu Saze (TMU), Toshiharu Horiuchi (KDDI Research), Kan Okubo (TMU) |
(19) |
13:00-14:30 |
[Poster Presentation]
Selective synthesis of sound field using directivity control and stereo width control |
Toshiharu Horiuchi, Sumaru Niida (KDDI Research) |
(20) |
13:00-14:30 |
[Poster Presentation]
Sway Angle Estimation for Jib Cranes with three microphones |
Naoki Horie, Masayoshi Nakamoto, Toru Yamamoto (Hiroshima Univ.) |
(21) |
13:00-14:30 |
[Poster Presentation]
Shadow Detection and Removal with CNN using Generative Adversarial Networks |
Takahiro Nagae, Ryo Abiko, Takuro Yamaguchi, Masaaki Ikehara (Keio Univ.) |
(22) |
13:00-14:30 |
[Poster Presentation]
A Robust Approach to Jointly-Sparse Signal Recovery Based on Minimax Concave Loss Function |
Kyohei Suzuki, Masahiro Yukawa (Keio Univ.) |
(23) |
13:00-14:30 |
[Poster Presentation]
A Study on application of Nonlinear IIR Filter to Nonlinear Acoustic Echo Canceller |
Kenta Iwai (Ritsumeikan Univ.), Yoshinobu Kajikawa (Kansai Univ.) |
(24) |
13:00-14:30 |
[Poster Presentation]
High-precision modeling of distortion stomp box by deep learning using spectral features |
Kento Yoshimoto, Daichi Kitahara, Akira Hirabayashi (Ritsumeikan Univ.) |
(25) |
13:00-14:30 |
[Poster Presentation]
Sensor placement allowing independent setting of estimation and candidate regions for field estimation based on Gaussian process |
Tomoya Nishida, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari (Univ. Tokyo) |
(26) |
13:00-14:30 |
[Poster Presentation]
Restoration of clipped signal using oversampling based on differentiable and convex loss function |
Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari (Univ. Tokyo) |
(27) |
13:00-14:30 |
[Poster Presentation]
Beam steering of portable parametric array loudspeaker using phased array technique with weighting function |
Kyosuke Nakagawa, Yoshinobu Kajikawa (Kansai Univ.) |
(28) |
13:00-14:30 |
[Poster Presentation]
Effective Sound Source Arrangement for Three Sound Source Localization Using Two Microphones |
Yoshiki Kikuchi, Tomoyuki Ishiguro, Kenji Suyama (Tokyo Denki Univ.) |
|
14:30-14:45 |
Break ( 15 min. ) |
Mon, Mar 2 PM 14:45 - 15:35 |
(29) |
14:45-15:10 |
Iterative phase reconstruction to embed image into speech spectrogram |
Arata Kawamura (Kyoto Sangyo Univ.) |
(30) |
15:10-15:35 |
A Pattern Recognition Method Using Secure Sparse Representations in L0 Norm Minimization |
Takayuki Nakachi, Yitu Wang (NTT), Hitoshi Kiya (Tokyo Metro. Univ.) |
|
15:35-15:45 |
Break ( 10 min. ) |
Mon, Mar 2 PM 15:45 - 16:35 |
(31) |
15:45-16:10 |
Performance evaluation of distilling knowledge using encoder-decoder for CTC-based automatic speech recognition systems |
Takafumi Moriya, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara (NTT) |
(32) |
16:10-16:35 |
Dysarthric Speech Recognition Based on Deep Metric Learning |
Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki (Kobe Univ.) |
|
16:35-16:45 |
Break ( 10 min. ) |
Mon, Mar 2 PM 16:45 - 17:25 |
(33) |
16:45-17:25 |
[Fellow Memorial Lecture]
Building Dictionary for Media Information |
Kunio Kashino (NTT) |
Tue, Mar 3 AM 09:00 - 10:30 |
(34) |
09:00-10:30 |
[Poster Presentation]
Implementation of a high-accuracy method for automatic fluency scoring of spontaneous English utterances by Japanese learners |
Ayano Yasukagawa, Shintaro Ando, Eisuke Konno, Zhenchao Lin, Yusuke Inoue, Daisuke Saito, Nobuaki Minematsu (UTokyo), Kazuya Saito (UCL) |
(35) |
09:00-10:30 |
[Poster Presentation]
Initial analysis of oral reading skills obtained from large scale subjective evaluation |
Takuya Ozuru (Univ. of Tokyo), Yusuke Ijima (NTT), Daisuke Saito, Nobuaki Minematsu (Univ. of Tokyo) |
(36) |
09:00-10:30 |
[Poster Presentation]
Automatic estimation of prosodic control made in English utterances using DNN-based acoustic models trained with prosodic features and labels |
Yang Shen, Shintaro Ando, Nobuaki Minematsu, Daisuke Saito (UTokyo), Satoshi Kobashikawa (NTT) |
(37) |
09:00-10:30 |
[Poster Presentation]
An Educational Study on Prosodic Symbols and Their Acoustic Realization Using Japanese End-to-end Speech Synthesis |
Fuki Yoshizawa (UTokyo), Tadashi Kumano (NHK), Nobuaki Minematsu (UTokyo), Kiyoshi Kurihara (NHK) |
(38) |
09:00-10:30 |
Evaluation of vocal personality and expression for speech synthesized by non-parallel voice conversion with narrative speech |
Ryotaro Nagase, Keisuke Imoto, Ryosuke Yamanishi, Yoichi Yamashita (Ritsumeikan Univ.) |
(39) |
09:00-10:30 |
Cross-Lingual Voice Conversion using Cyclic Variational Auto-encoder |
Hikaru Nakatani, Patrick Lumban Tobing, Kazuya Takeda, Tomoki Toda (Nagoya Univ.) |
(40) |
09:00-10:30 |
Semi-supervised Self-produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-conducted Signals Using Variational Autoencoder |
Shogo Seki, Moe Takada, Kazuya Takeda, Tomoki Toda (Nagoya Univ.) |
(41) |
09:00-10:30 |
A Study for HMM-based embedded speech synthesis using a large-scale speech corpus |
Nobuyuki Nishizawa, Tomohiro Obara, Hiromi Ishizaki (KDDI Research, Inc.) |
(42) |
09:00-10:30 |
Large-Context Pointer-Generator Networks for Spoken-to-Written Style Conversion |
Mana Ihori, Akihiko Takashima, Ryo Masumura (NTT) |
(43) |
09:00-10:30 |
A study on step size control method for pre-estimating adaptive filter in feedback type active noise control systems |
Kensaku Fujii (Kodaway Lab.), Mitsuji Muneyasu (Kansai Univ.), Yoshifumi Chisaki (CIT) |
(44) |
09:00-10:30 |
[Poster Presentation]
Design of automatic soundscape generation based on image object detection |
Yoshifumi Chisaki (CIT), Toshiharu Horiuchi (KDDI Research, Inc.) |
(45) |
09:00-10:30 |
[Poster Presentation]
A study on reverberation time estimation based on regression error |
Yohei Iiyama, Yutaka Kaneda (Tokyo Denki Univ.) |
(46) |
09:00-10:30 |
[Poster Presentation]
High Resolution Acoustic Analysis for Classification of Bell Crickets |
Hideto Otsuka, Fumikazu Saze, Kan Okubo (TMU) |
(47) |
09:00-10:30 |
[Poster Presentation]
Basic Examination on Omni-Directional Sound Source Using Facing Ultrasonic Sensor Arrays |
Kyoka Okamoto, Kan Okubo (TMU) |
(48) |
09:00-10:30 |
[Poster Presentation]
Study on method for calculating loudness of stationary sound using auditory filterbank |
Takuto Isoyama, Shunsuke Kidani, Masashi Unoki (JAIST) |
(49) |
09:00-10:30 |
[Poster Presentation]
Time-domain audio source separation using multiresolution deep layered analysis based on simultaneous learning of neural networks and wavelet basis functions |
Shihori Kozuka, Tomohiko Nakamura, Hiroshi Saruwatari (UTokyo) |
(50) |
09:00-10:30 |
[Poster Presentation]
Bed ANC System with AF-VS for Reducing the Noise in ICU |
Reo Maeda, Yoshinobu Kajikawa (Kansai Univ.), Liu Lichuan, Bi Congzhi (NIU) |
(51) |
09:00-10:30 |
[Poster Presentation]
Multi-scale graph construction method for graph signal coding with SPIHT algorithm |
Kosuke Abe, Yuichi Tanaka (TUAT) |
(52) |
09:00-10:30 |
[Poster Presentation]
A Comparison of Language Models for a Design of Reduced Phoneme Set |
Shuji Komeiji, Toshihisa Tanaka (TUAT), Koichi Shinoda (Tokyo Tech) |
(53) |
09:00-10:30 |
[Poster Presentation]
Decoding of Non-Isochronous Rhythms Imagery from EEG Using Convolutional Neural Network |
Naoki Yoshimura, Toshihisa Tanaka (TUAT) |
(54) |
09:00-10:30 |
[Poster Presentation]
Performance Evaluation of Convolutional-Sparse-Coded Dynamic Mode Decomposition in River Groynes Model Experiment |
Yusuke Arai, Yuhei Kaneko, Shogo Muramatsu, Hiroyasu Yasuda, Kiyoshi Hayasaka, Yu Otake (Niigata Univ.) |
(55) |
09:00-10:30 |
[Poster Presentation]
EEG-Based Estimation of Attentional Direction while Simultaneously Listening to Music and Speech |
Ryosuke Matsui, Toshihisa Tanaka (TUAT) |
(56) |
09:00-10:30 |
[Poster Presentation]
Comparison of Neural Network Models for Detection of Spatiotemporal Abnormal Intervals in Epileptic EEG |
Kosuke Fukumori (TUAT), Noboru Yoshida (Juntendo Univ.), Toshihisa Tanaka (TUAT) |
(57) |
09:00-10:30 |
[Poster Presentation]
EpiNet: Convolutional Neural Network for Epileptic Seizure Localization from Interictal Intracranial EEG |
Kosuke Mori, Kosuke Fukumori, Toshihisa Tanaka (TUAT), Yasushi Iimura, Takumi Mitsuhashi, Hidenori Sugano (Juntendo Univ.) |
|
10:30-10:45 |
Break ( 15 min. ) |
Tue, Mar 3 AM 10:45 - 11:35 |
(58) |
10:45-11:35 |
[Invited Talk]
How to incorporate spatial model in deep learning based speech source separation? |
Masahito Togami (LINE) |
|
11:35-12:35 |
Lunch Break ( 60 min. ) |
Tue, Mar 3 PM 12:35 - 13:25 |
(59) |
12:35-13:00 |
Real-time visualization of reverberation time using frequency domain variants of velvet noise |
Hideki Kawahara (Wakayama Univ.), Ken-Ichi Sakakibara (Health Science Univ. Hokkaido), Mitsunori Mizumachi (Kyushu Inst. Tech.), Masanori Morise (Meiji Univ.), Hideki Banno (Meijo Univ.) |
(60) |
13:00-13:25 |
An objective value for dialogue level auto adjustment on the production of second audio program |
Hiroki Kubo, Satoshi Oode (NHK) |
|
13:25-13:35 |
Break ( 10 min. ) |
Tue, Mar 3 PM 13:35 - 14:50 |
(61) |
13:35-14:00 |
Application of frequency-domain variant of velvet noise to the measurement of auditory effects on the fundamental frequency of sustained voicing |
Hideki Kawahara (Wakayama Univ.), Ken-Ichi Sakakibara (Health Science Univ. of Hokkaido), Minoru Tsuzaki (KCU), Toshie Matsui (TIT), Masanori Morise (Meiji Univ.), Toshio Irino (Wakayama Univ.) |
(62) |
14:00-14:25 |
Comparison of feature parameters from original speech, LPC-based estimated speech and residual speech for speaker identification |
Seiichi Nakagawa, Kohto Hanai, Kazumasa Yamamoto () |
(63) |
14:25-14:50 |
|
|
|
14:50-15:00 |
Break ( 10 min. ) |
Tue, Mar 3 PM 15:00 - 16:40 |
(64) |
15:00-15:25 |
Multiresolutional graph learning |
Koki Yamada, Yuichi Tanaka (TUAT) |
(65) |
15:25-15:50 |
Mixed norm minimization based on epigraphical projection |
Seisuke Kyochi (The Univ. of Kitakyushu), Shunsuke Ono (Tokyo Tech) |
(66) |
15:50-16:15 |
Rice Field Heat Map based on The Gaze of Experts in Growth Diagnosis |
Hidenori Watanabe (Kumamoto IRI) |
(67) |
16:15-16:40 |
A Portscan Detection Based on Low-rankness of Destination Port Matrices |
Hiroki Nousou, Masao Yamagishi, Isao Yamada (Tokyo Tech) |