IEICE Technical Report

Online edition: ISSN 2432-6380

Pattern Recognition and Media Understanding

Workshop Date: 2025-03-18 - 2025-03-19 / Issue Date: 2025-03-11

PRMU2024-46
Sign Language Recognition Method through Skeletal Estimation Information in Still Images
Momoko Yoda, Inoue Masato (Waseda Univ.)
pp. 1 - 6

PRMU2024-47
Consistency-Aware Sketch-Based 3D Shape Editing
Tomohiro Aizawa, Chunzhi Gu, Kuriyama Shigeru (TUT)
pp. 7 - 12

PRMU2024-48
Recipe-Based Food State Recognition Using CLIP for Partially Occluded Cooking Images
Rina Tagami, Naoto Hiramatsu, Hiroki Kobayashi, Shuichi Akizuki, Manabu Hashimoto (Chukyo Univ.)
pp. 13 - 18

PRMU2024-49
MultiSensor-Home Dataset and Sensor Integration Method for Wide-area Multi-view Multi-modal Action Recognition
Nguyen Trung Thanh (Meidai), Kawanishi Yasutomo, Vijay John (RIKEN), Komamizu Takahiro, Ide Ichiro (Meidai)
pp. 19 - 24

PRMU2024-50
[Invited Talk] Real-World Robot Applications of Foundation Models
Kento Kawaharazuka (The Univ. of Tokyo)
p. 25

PRMU2024-51
[Invited Talk] Image Recognition and Understanding with Large Vision Language Models
Teppei Suzuki (SB Intuitions)
pp. 26 - 27

PRMU2024-52
SVGEditBench V2: A Benchmark for Instruction-based SVG Editing
Kunato Nishina, Yusuke Matsui (UTokyo)
pp. 28 - 40

PRMU2024-53
Active Acoustic Sensing for Predicting Semantic Segmentation Results
Taiki Mori, Junpei Honma, Shogo Yonezawa, Seiya Kodama, Go Irie (TUS)
pp. 41 - 46

PRMU2024-54
Pedestrian Age Recognition by Contrastive Learning Using a Large Visual Language Model That Automatically Generates Well-Separated Descriptions Across Classes
Takumi Ozaki (Tottori Univ.), Hidenori Kuribayashi (GLORY), Michiko Inoue, Masashi Nishiyama (Tottori Univ.)
pp. 47 - 52

PRMU2024-55
Sign-to-text matching space for new sign selection
Matheus Silva de Lima, Pedro H. V. Valois (ITF), Erica Kido Shimomoto (AIST), Nobuko Kato (NTUT), Kazuhiro Fukui (ITF)
pp. 53 - 57

PRMU2024-56
Incremental Learning of Panoptic Lifting Without Catastrophic Forgetting by Viewpoint Selection for Maximizing Visible Areas
Akira Kohjin (Shiga Univ.), Motoharu Sonogashira (RIKEN), Masaaki Iiyama (Shiga Univ.), Yasutomo Kawanishi (RIKEN)
pp. 58 - 63

PRMU2024-57
Neural Real-Time RGB-D SLAM in Dynamic Environments
Qinyuan Zhou, Kazuhiko Sumi (Aoyama Gakuin Univ.)
pp. 64 - 69

PRMU2024-58
Proposal for Layout Generation in Graduation Albums Using Large Language Models
Hiraku Matsuda, Mutsuo Sano (OIT)
pp. 70 - 75

PRMU2024-59
Improvement of Gaze Direction Detection Model in Driving Simulator Using Novel Gaze Dynamic Parameters
Lu Zheyin, Kazuhiko Sumi (AGU)
pp. 76 - 81

PRMU2024-60
Fine-tuning Text-to-Motion generators based on embedding diffusion models into LLMs
Shinichi Tanaka (Waseda U.), Zhao Wang (Sony/Waseda U.), Yoichi Kato, Jun Ohya (Waseda U.)
pp. 82 - 87

PRMU2024-61
Learning VQ-VAE for Image Dimensionality Reduction with Local Phase Loss
Naoyuki Ichimura (AIST)
pp. 88 - 93

PRMU2024-62

Ayato Fujibayashi, Minoru Mori (KAIT)
pp. 94 - 99

PRMU2024-63
Siamese Network-based Answer Similarity for Automatically Scoring Handwritten Very Short Answers
Tuan Nam Ly, Hung Tuan Nguyen, Masaki Nakagawa (TUAT)
pp. 100 - 105

Note: Each article is a technical report without peer review, and its polished version will be published elsewhere.

The Institute of Electronics, Information and Communication Engineers (IEICE), Japan