Committee |
Date Time |
Place |
Venue |
Paper Title / Authors |
Abstract |
Paper # |
SeMI, IPSJ-UBI, IPSJ-MBL |
2024-02-29 11:30 |
Fukuoka |
|
Detecting Distress Variations Using Multimodal Data Obtained through Interaction with A Smart Speaker Chingyuan Lin, Yuki Matsuda, Hirohiko Suwa, Keiichi Yasumoto (NAIST) SeMI2023-73 |
Mental health significantly affects people, with excessive stress potentially causing depression, low productivity, and ... [more] |
SeMI2023-73 pp.13-18 |
EMM |
2023-03-03 09:10 |
Nagasaki |
Fukue culture hall (Primary: On-site, Secondary: Online) |
Study on Analysis of Amplitude and Frequency Perturbation in the Voice for Fake Audio Detection Kai Li, Yao Wang, Minh Le Nguyen, Masato Akagi, Masashi Unoki (JAIST) EMM2022-88 |
Fake audio detection (FAD) aims to detect fake speech generated by advanced voice conversion and text-to-speech technolo... [more] |
EMM2022-88 pp.110-115 |
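As a rough illustration of the perturbation analysis mentioned in this abstract (not the method of EMM2022-88), the sketch below computes frame-to-frame frequency and amplitude perturbation measures, jitter/shimmer-like values, from a crude autocorrelation pitch tracker; such values could feed a fake-audio classifier. The frame length, search range, and pitch estimator are assumptions for illustration only.

```python
# Minimal sketch (not the authors' method): frame-to-frame frequency and
# amplitude perturbation measures (jitter/shimmer-like) that a fake-audio
# detector could use as input features. Assumes a mono signal and a crude
# autocorrelation-based period estimate per frame.
import numpy as np

def frame_f0_and_peak(x, sr, frame_len=1024, hop=512, fmin=60.0, fmax=400.0):
    """Return per-frame F0 (Hz) and peak amplitude via autocorrelation."""
    f0s, peaks = [], []
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len]
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        if ac[0] <= 0:
            continue
        lag = lag_min + np.argmax(ac[lag_min:lag_max])
        f0s.append(sr / lag)
        peaks.append(np.abs(frame).max())
    return np.array(f0s), np.array(peaks)

def perturbation_features(x, sr):
    """Relative mean absolute frame-to-frame change of F0 and peak amplitude."""
    f0, amp = frame_f0_and_peak(x, sr)
    jitter = np.mean(np.abs(np.diff(f0))) / (np.mean(f0) + 1e-12)
    shimmer = np.mean(np.abs(np.diff(amp))) / (np.mean(amp) + 1e-12)
    return jitter, shimmer

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    rng = np.random.default_rng(0)
    tone = np.sin(2 * np.pi * 120 * t) * (1.0 + 0.02 * rng.standard_normal(t.size))
    print(perturbation_features(tone, sr))
```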
SP, IPSJ-SLP, EA, SIP [detail] |
2023-02-28 15:55 |
Okinawa |
(Primary: On-site, Secondary: Online) |
Self-Supervised Learning With Spatial Audio-Visual Recording for Sound Event Localization and Detection Yoto Fujita (Kyoto Univ.), Yoshiaki Bando (AIST), Keisuke Imoto (Doshisha Univ./AIST), Masaki Onishi (AIST), Kazuyoshi Yoshii (Kyoto Univ.) EA2022-89 SIP2022-133 SP2022-53 |
This paper describes an unsupervised pre-training method for sound event localization and detection (SELD) on multi-chan... [more] |
EA2022-89 SIP2022-133 SP2022-53 pp.78-82 |
EMM |
2023-01-26 13:35 |
Miyagi |
Tohoku Univ. (Primary: On-site, Secondary: Online) |
Audio zero-watermarking method based on auditory spectral representation Atsuki Ichikawa, Masashi Unoki (JAIST) EMM2022-65 |
The audio zero-watermarking technique creates a detection key from a watermark and a binary pattern generated from features of the ... [more] |
EMM2022-65 pp.20-25 |
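To illustrate the zero-watermarking idea described above (not the auditory spectral representation of EMM2022-65), the sketch below derives a binary pattern from frame energies of the host signal and forms the detection key as the XOR of that pattern with the watermark; nothing is embedded in the audio itself, and detection XORs a re-derived pattern with the key.

```python
# Minimal zero-watermarking sketch (frame-energy features are an assumption,
# not the paper's auditory-spectrum features). The key, not the signal,
# carries the watermark information.
import numpy as np

def binary_pattern(x, n_bits, frame_len=2048):
    """1 if a frame's log-energy exceeds the median over all used frames."""
    frames = [x[i * frame_len:(i + 1) * frame_len] for i in range(n_bits)]
    energies = np.array([np.log(np.sum(f ** 2) + 1e-12) for f in frames])
    return (energies > np.median(energies)).astype(np.uint8)

def make_key(x, watermark_bits):
    return binary_pattern(x, len(watermark_bits)) ^ watermark_bits

def detect(x_received, key):
    return binary_pattern(x_received, len(key)) ^ key

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    host = rng.standard_normal(2048 * 64)
    wm = rng.integers(0, 2, 64, dtype=np.uint8)
    key = make_key(host, wm)
    attacked = host + 0.01 * rng.standard_normal(host.size)  # mild noise "attack"
    recovered = detect(attacked, key)
    print("bit error rate:", np.mean(recovered != wm))
```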
SP, WIT, IPSJ-SLP [detail] |
2022-10-22 15:40 |
Kyoto |
Kyoto University (Primary: On-site, Secondary: Online) |
Conformer based early fusion model for audio-visual speech recognition Nobukazu Aoki, Shun Sawada, Hidefumi Ohmura, Kouichi Katsurada (Tokyo Univ. of Sci.) SP2022-28 WIT2022-3 |
Previous studies of late fusion models with conformer encoders use independent encoders for both visual and audio inform... [more] |
SP2022-28 WIT2022-3 pp.8-13 |
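A minimal sketch of the early-fusion idea follows; a plain Transformer encoder stands in for the Conformer of SP2022-28 WIT2022-3, the feature dimensions and output vocabulary are placeholders, and the visual features are assumed to be already up-sampled to the audio frame rate. Per-frame audio and visual features are concatenated and fed to a single shared encoder rather than two modality-specific encoders fused late.

```python
# Early-fusion audio-visual encoder sketch (Transformer as a stand-in for the
# Conformer; dimensions and vocabulary size are illustrative assumptions).
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, audio_dim=80, visual_dim=512, d_model=256, n_tokens=500):
        super().__init__()
        self.proj = nn.Linear(audio_dim + visual_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.classifier = nn.Linear(d_model, n_tokens)  # e.g. CTC over sub-word units

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (batch, frames, audio_dim), visual_feats: (batch, frames, visual_dim)
        fused = torch.cat([audio_feats, visual_feats], dim=-1)  # fuse before encoding
        return self.classifier(self.encoder(self.proj(fused)))

if __name__ == "__main__":
    a = torch.randn(2, 100, 80)
    v = torch.randn(2, 100, 512)
    print(EarlyFusionEncoder()(a, v).shape)  # torch.Size([2, 100, 500])
```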
EA, ASJ-H |
2022-08-04 15:15 |
Miyagi |
(Primary: On-site, Secondary: Online) |
[Invited Talk] Audio Source Separation Combining Wavelet Transform and Deep Neural Network Tomohiko Nakamura (Univ. Tokyo) EA2022-32 |
Audio source separation is a technique of separating an observed audio signal into individual source signals. The use of... [more] |
EA2022-32 p.25 |
EA |
2022-05-13 15:00 |
Online |
Online |
Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino (NTT) EA2022-9 |
Many application studies rely on audio DNN models pre-trained on a large-scale dataset as essential feature extractors, ... [more] |
EA2022-9 pp.41-45 |
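The general idea of fusing multilayer features can be sketched as below; this is not the model of EA2022-9, and the random stand-in encoder, pooling choice, and softmax layer weights are assumptions. Hidden features from several layers of a frozen pre-trained encoder are pooled over time and combined into a single general-purpose embedding.

```python
# Multilayer feature fusion sketch (the "encoder" is a random stand-in for a
# frozen pre-trained audio model; pooling and weighting are assumptions).
import numpy as np

rng = np.random.default_rng(0)

def fake_pretrained_encoder(waveform, n_layers=4, dim=256, n_frames=50):
    """Stand-in for a frozen pre-trained model: per-layer frame features."""
    return [rng.standard_normal((n_frames, dim)) for _ in range(n_layers)]

def fuse_multilayer(layer_feats, layer_weights):
    """Temporal mean+max pooling per layer, then weighted sum across layers."""
    pooled = [np.concatenate([f.mean(axis=0), f.max(axis=0)]) for f in layer_feats]
    w = np.exp(layer_weights) / np.exp(layer_weights).sum()  # softmax weights
    return sum(wi * pi for wi, pi in zip(w, pooled))

if __name__ == "__main__":
    feats = fake_pretrained_encoder(waveform=None)
    weights = np.zeros(len(feats))  # would be learned on downstream tasks
    embedding = fuse_multilayer(feats, weights)
    print(embedding.shape)  # (512,) = 2 * dim
```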
MBE, NC (Joint) |
2022-03-03 15:55 |
Online |
Online |
A study on hit classification by machine learning of Japanese popular music using Spotify Audio Features Kengo Kitamura, Susumu Kuroyanagi (NIT) NC2021-67 |
It is assumed that hit songs share common characteristics. Based on this assump... [more] |
NC2021-67 pp.112-117 |
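A minimal sketch of this kind of hit classification is shown below; the feature values are random placeholders rather than data retrieved from the Spotify Web API, and the random-forest model and cross-validation setup are assumptions, not the configuration reported in NC2021-67.

```python
# Hit/non-hit classification sketch on Spotify-style audio features
# (placeholder data and model; illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURES = ["danceability", "energy", "valence", "tempo",
            "acousticness", "speechiness", "loudness"]

rng = np.random.default_rng(0)
X = rng.random((500, len(FEATURES)))  # placeholder feature vectors
y = (X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(500) > 1.0).astype(int)  # placeholder labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold accuracy:", scores.mean())
```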
SP, EA, SIP |
2020-03-03 13:00 |
Okinawa |
Okinawa Industry Support Center (Cancelled but technical report was issued) |
An objective value for dialogue level auto adjustment on the production of second audio program Hiroki Kubo, Satoshi Oode (NHK) EA2019-160 SIP2019-162 SP2019-109 |
In recent years, next-generation audio services using object-based audio have been introduced to broadcasting services. A... [more] |
EA2019-160 SIP2019-162 SP2019-109 pp.343-348 |
EA, EMM |
2019-11-22 15:30 |
Ishikawa |
Kanazawa Institute of Technology |
EA2019-60 EMM2019-88 |
In this paper, we propose a time-domain audio source separation method using down-sampling and up-sampling layers based ... [more] |
EA2019-60 EMM2019-88 pp.41-48 |
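The down-/up-sampling structure mentioned in this abstract can be illustrated with a tiny Wave-U-Net-style model in PyTorch; this is only the generic encoder-decoder-with-skip-connections idea, not the exact architecture of EA2019-60, and all layer sizes are assumptions.

```python
# Tiny time-domain separator sketch: strided convolutions down-sample,
# transposed convolutions up-sample, skip connections bridge the two.
import torch
import torch.nn as nn

class TinyWaveUNet(nn.Module):
    def __init__(self, n_sources=2, ch=16):
        super().__init__()
        self.enc1 = nn.Conv1d(1, ch, kernel_size=15, stride=2, padding=7)
        self.enc2 = nn.Conv1d(ch, ch * 2, kernel_size=15, stride=2, padding=7)
        self.dec2 = nn.ConvTranspose1d(ch * 2, ch, kernel_size=16, stride=2, padding=7)
        self.dec1 = nn.ConvTranspose1d(ch * 2, ch, kernel_size=16, stride=2, padding=7)
        self.out = nn.Conv1d(ch + 1, n_sources, kernel_size=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):                      # x: (batch, 1, time)
        e1 = self.act(self.enc1(x))            # down-sample by 2
        e2 = self.act(self.enc2(e1))           # down-sample by 2
        d2 = self.act(self.dec2(e2))           # up-sample by 2
        d1 = self.act(self.dec1(torch.cat([d2, e1], dim=1)))  # skip connection
        return self.out(torch.cat([d1, x], dim=1))            # (batch, n_sources, time)

if __name__ == "__main__":
    mixture = torch.randn(4, 1, 16384)
    print(TinyWaveUNet()(mixture).shape)  # torch.Size([4, 2, 16384])
```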
SR, RCS (Joint) (2nd) |
2018-10-31 10:25 |
Overseas |
Mandarin Hotel, Bangkok, Thailand |
[Poster Presentation] A Comparison of Machine Learning Algorithms for Motor Sound Fault Detection Arpith Paida (AIT), Prerapong, Aimaschana Niruntasukrat, Koonlachat Meesublak, Panita (NECTEC) |
Automation plays an important role in making human activities easier. In industries, machines/motors are used for m... [more] |
|
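As an illustration of comparing classifiers for motor-sound fault detection, the sketch below evaluates a few standard scikit-learn models; the features are random placeholders standing in for spectral features of motor recordings, and the chosen algorithms are assumptions rather than the set compared in the paper.

```python
# Classifier comparison sketch for normal/fault motor sounds
# (placeholder features and labels; illustrative only).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((400, 20))                      # placeholder spectral features
y = (X[:, :5].sum(axis=1) > 2.5).astype(int)   # placeholder normal/fault labels

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF)": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```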
PRMU, SP |
2018-06-28 15:10 |
Nagano |
|
Multimodal voice conversion using deep bottleneck features and deep canonical correlation analysis Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu (Gifu Univ.), Tomoki Toda (Nagoya Univ.) PRMU2018-24 SP2018-4 |
In this paper, we aim at improving the speech quality in voice conversion and propose a novel multi-modal voice conversi... [more] |
PRMU2018-24 SP2018-4 pp.13-18 |
ITS, IE, ITE-MMS, ITE-HI, ITE-ME, ITE-AIT [detail] |
2018-02-15 15:45 |
Hokkaido |
Hokkaido Univ. |
A Note on Estimation of Users' Emotion Evoked During Listening to Music -- Performance Improvement Based on Deep Learning Method -- Hakusyou Dan, Takahiro Ogawa, Miki Haseyama (Hokkaido Univ.) |
This paper presents a method that estimates users’ emotion evoked during listening to music. In our method, we use audio... [more] |
|
HCS, HIP, HI-SIGCOASTER [detail] |
2017-05-16 15:45 |
Okinawa |
Okinawa Industry Support Center |
Extraction of acoustic features of emotional speech and their characteristics Takashi Yamazaki, Minoru Nakayama (Tokyo Tech.) HCS2017-17 HIP2017-17 |
In this paper, we extracted the acoustic features of emotional speech and examined the effect of the feature on emotiona... [more] |
HCS2017-17 HIP2017-17 pp.127-130 |
SP |
2017-01-21 11:00 |
Tokyo |
The University of Tokyo |
[Poster Presentation] Designing linguistic features for expressive speech synthesis using audiobooks Chiaki Asai, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda (Nagoya Inst. of Tech.) SP2016-70 |
In order to synthesize expressive speech, various statistical parametric speech synthesis systems have been proposed. Sp... [more] |
SP2016-70 pp.35-40 |
EA, ASJ-H |
2015-08-04 10:00 |
Miyagi |
Tohoku Univ., Research Inst. of Electrical Communication |
On Development of an Estimation Model for Instantaneous Presence in Audio-visual Content Shota Tsukahara, Kenji Ozawa, Yuichiro Kinoshita, Masanori Morise (Univ. Yamanashi) |
The sense of presence is often used to evaluate the performances of audio-visual (AV) content and systems. However, a pr... [more] |
EA2015-17 pp.41-46 |
EMM |
2015-03-13 13:50 |
Okinawa |
|
Study on Watermarking for Digital Audio based on Adaptive Phase Modulation Nhut Minh Ngo, Masashi Unoki (JAIST) EMM2014-102 |
This paper proposes a novel blind watermarking method for digital audio based on adaptive phase modulation. Audio signal... [more] |
EMM2014-102 pp.149-154 |
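The sketch below illustrates a simple, non-adaptive form of phase-based embedding, not the adaptive phase modulation of EMM2014-102: one bit per frame is written into the phase of a single FFT bin and recovered from the sign of that phase.

```python
# Simplified phase-coding watermark sketch (frame length, bin index, and
# phase strength are illustrative assumptions).
import numpy as np

def embed(x, bits, frame_len=2048, bin_idx=40, strength=np.pi / 2):
    """Set the phase of one FFT bin per frame to +/- strength per the bit."""
    y = x.copy()
    for i, b in enumerate(bits):
        frame = y[i * frame_len:(i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        mag = np.abs(spec[bin_idx])
        spec[bin_idx] = mag * np.exp(1j * (strength if b else -strength))
        y[i * frame_len:(i + 1) * frame_len] = np.fft.irfft(spec, n=frame_len)
    return y

def extract(y, n_bits, frame_len=2048, bin_idx=40):
    """Recover each bit from the sign of the phase of the same bin."""
    bits = []
    for i in range(n_bits):
        spec = np.fft.rfft(y[i * frame_len:(i + 1) * frame_len])
        bits.append(int(np.angle(spec[bin_idx]) > 0))
    return np.array(bits)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    host = rng.standard_normal(2048 * 32)
    wm = rng.integers(0, 2, 32)
    print("recovered correctly:", np.array_equal(extract(embed(host, wm), 32), wm))
```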
SP |
2015-01-22 10:25 |
Gifu |
Juroku Plaza |
A study for the robustness of multi-modal voice conversion Daiki Kawashima, Satoshi Tamura, Satoru Hayamizu (Gifu Univ.) SP2014-128 |
Voice conversion (VC) is a technique to convert the speech of a source speaker into that of a target speaker. VC has an issue... [more] |
SP2014-128 pp.7-12 |
SIS |
2014-12-18 15:50 |
Kyoto |
Kyoto Research Park (Kyoto City) |
[Invited Talk] A Hybrid Systems Approach to Modeling and Learning Multimedia Timing Structures Hiroaki Kawashima (Kyoto Univ.) SIS2014-78 |
Capturing dynamic events of human body motion, facial action, and speech, via sensors, e.g., cameras and microphones, we... [more] |
SIS2014-78 pp.63-68 |
PRMU |
2014-03-14 10:45 |
Tokyo |
|
A study on multi-modal speech recognition using depth images Naoya Ukai, Satoshi Tamura, Satoru Hayamizu (Gifu Univ.) PRMU2013-198 |
This paper presents a novel framework which uses depth information of human face and mouth movements as yet another moda... [more] |
PRMU2013-198 pp.179-184 |