Paper Abstract and Keywords |
Presentation |
2023-03-17 15:40
Improvement of cross-attention modules for image captioning using pixel-wise semantic information Zhihao Chen, Keisuke Doman, Yoshito Mekada (Chukyo Univ.) IMQ2022-84 IE2022-161 MVE2022-114 |
Abstract |
(in English) |
Most image captioning models have attention modules, each of which outputs an attention map (weighted feature map) from an image and a word sequence. However, a predicted word sequence may contain errors, and thus the module output may also not be as expected. We have previously shown that pixel-wise semantic information (PSI) is effective for improving the attention modules of CNN-based models. This paper proposes introducing the PSI into the cross-attention modules of Transformer-based models. Experimental results showed that the PSI contributed to improving captioning accuracy on GRIT, a Transformer-based model. The proposed method achieved a BLEU-4 score of 42.9, equivalent to sixth place on the COCO Captions Benchmark. This paper also studies grouping synonyms (e.g., "table" and "coffee table") in the PSI as a solution to the problem of attention being distributed among synonyms, and discusses its effectiveness. |
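The abstract does not state how the PSI is fed into the cross-attention module, so the following is only a minimal sketch under assumptions of our own, not the authors' method or GRIT's code. It assumes that each image grid cell's key/value is its visual feature concatenated with a per-pixel class-probability vector (the PSI), uses identity key/value projections for brevity, and merges synonym classes by summing their probability channels. The function names `group_synonyms` and `cross_attention` and the synonym mapping are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def group_synonyms(psi: Sequence[float], classes: Sequence[str],
                   groups: Dict[str, str]) -> Dict[str, float]:
    """Merge one pixel's class probabilities across synonym classes.

    `groups` is a hypothetical mapping from a class name to its group name;
    classes not listed keep their own name. Probabilities in a group are summed,
    so attention is no longer split between, e.g., "table" and "coffee table".
    """
    merged: Dict[str, float] = {}
    for name, p in zip(classes, psi):
        key = groups.get(name, name)
        merged[key] = merged.get(key, 0.0) + p
    return merged


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, features: List[Vector],
                    psi: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention of one word query over image grid cells.

    Assumption: each cell's key (and value) is its visual feature concatenated
    with its PSI vector. Returns (attention weights over cells, attended vector).
    """
    keys = [f + s for f, s in zip(features, psi)]  # list concatenation
    d = len(query)
    if any(len(k) != d for k in keys):
        raise ValueError("query length must equal feature + PSI length")
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # the attention map over grid cells
    # Values equal keys here, so the output is the weighted feature vector.
    output = [sum(w * key[i] for w, key in zip(weights, keys))
              for i in range(d)]
    return weights, output


if __name__ == "__main__":
    classes = ["table", "coffee table", "person"]
    print(group_synonyms([0.25, 0.5, 0.25], classes,
                         {"coffee table": "table"}))
    # → {'table': 0.75, 'person': 0.25}
    weights, _ = cross_attention([1.0, 0.0, 1.0],
                                 [[1.0, 0.0], [0.0, 1.0]],
                                 [[1.0], [0.0]])
    print(weights)  # the first cell, whose PSI matches the query, gets more weight
```

Concatenation is chosen here only because it is the simplest way to let semantic labels influence attention scores; the report may use a different fusion scheme.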
Keyword |
(in English) |
Image Captioning / Vision and Language / Attention / Semantic segmentation |
Reference Info. |
IEICE Tech. Rep., vol. 122, no. 440, MVE2022-114, pp. 333-338, March 2023. |
Paper # |
MVE2022-114 |
Date of Issue |
2023-03-08 (IMQ, IE, MVE) |
ISSN |
Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Conference Information |
Committee |
IMQ IE MVE CQ |
Conference Date |
2023-03-15 - 2023-03-17 |
Place (in English) |
Okinawaken Seinenkaikan (Naha-shi) |
Topics (in English) |
Media of five senses, Multimedia, Media experience, Picture coding, Image media quality, Network, quality and reliability, etc. (AC) |
Paper Information |
Registration To |
MVE |
Conference Code |
2023-03-IMQ-IE-MVE-CQ |
Language |
Japanese |
Title (in English) |
Improvement of cross-attention modules for image captioning using pixel-wise semantic information |
Keyword(1) |
Image Captioning |
Keyword(2) |
Vision and Language |
Keyword(3) |
Attention |
Keyword(4) |
Semantic segmentation |
1st Author's Name |
Zhihao Chen |
1st Author's Affiliation |
Chukyo University (Chukyo Univ.) |
2nd Author's Name |
Keisuke Doman |
2nd Author's Affiliation |
Chukyo University (Chukyo Univ.) |
3rd Author's Name |
Yoshito Mekada |
3rd Author's Affiliation |
Chukyo University (Chukyo Univ.) |
Speaker |
Author-1 |
Date Time |
2023-03-17 15:40:00 |
Presentation Time |
25 minutes |
Registration for |
MVE |
Paper # |
IMQ2022-84, IE2022-161, MVE2022-114 |
Volume (vol) |
vol.122 |
Number (no) |
no.437(IMQ), no.439(IE), no.440(MVE) |
Page |
pp.333-338 |
#Pages |
6 |
|