Paper Abstract and Keywords |
Presentation |
2023-06-30 14:20
Analogy Tasks in BioConceptVec using Biological Pathways. Hiroaki Yamagiwa, Ryoma Hashimoto (Kyoto Univ.), Kiwamu Arakane, Ken Murakami (IPR), Momose Oyama, Hidetoshi Shimodaira (Kyoto Univ.), Mariko Okada (IPR). NC2023-18 IBISML2023-18 |
Abstract |
(in Japanese) |
(See Japanese page) |
(in English) |
Natural language processing (NLP), often employing models like skip-gram, is widely utilized across numerous application domains to convert words in text into feature vectors known as word embeddings. The utility of this approach has recently been noted in the field of biology, with the introduction of BioConceptVec, a model trained on about 30 million PubMed abstracts using normalized concepts. In general, skip-gram can solve analogy tasks by manipulating word embeddings, such as predicting $\textit{queen}$ from $\textit{king} - \textit{man} + \textit{woman}$. In this study, we applied this principle to biological pathways, conducting analogy tasks for pairs of drugs and genes, treating pathway types as relationships. Our results demonstrated high accuracy in these tasks when defining a vector to represent the pathway relationship for pairs of drugs and genes that belong to the same pathway. |
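The analogy arithmetic described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vectors, the concept names (`drug_A`, `gene_A`, and so on), the helper functions, and in particular the choice to represent the pathway relationship as the mean of gene-minus-drug differences. The abstract does not specify how the relation vector is defined, and this is not BioConceptVec's API or the authors' code.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(query, embeddings, exclude=()):
    """Return the key whose embedding is most cosine-similar to `query`."""
    candidates = {k: v for k, v in embeddings.items() if k not in exclude}
    return max(candidates, key=lambda k: cosine(query, candidates[k]))

def relation_vector(pairs, embeddings):
    """ASSUMPTION: relation = mean of (gene - drug) over known pairs."""
    return np.mean([embeddings[g] - embeddings[d] for d, g in pairs], axis=0)

# Toy 2-D word embeddings (made up, not from any trained model).
word_vecs = {
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man": np.array([0.2, 1.0]),
    "woman": np.array([0.2, -1.0]),
    "apple": np.array([-1.0, 0.1]),
}
query = word_vecs["king"] - word_vecs["man"] + word_vecs["woman"]
predicted_word = nearest(query, word_vecs, exclude=("king", "man", "woman"))

# Toy 3-D concept embeddings for hypothetical drugs and genes in one pathway.
concept_vecs = {
    "drug_A": np.array([1.0, 0.0, 0.0]),
    "gene_A": np.array([1.0, 1.0, 0.1]),
    "drug_B": np.array([0.0, 0.0, 1.0]),
    "gene_B": np.array([0.1, 1.0, 1.0]),
    "drug_C": np.array([0.5, 0.0, 0.5]),
    "gene_C": np.array([0.5, 1.0, 0.6]),
}
known_pairs = [("drug_A", "gene_A"), ("drug_B", "gene_B")]
rel = relation_vector(known_pairs, concept_vecs)

# Predict the gene for a held-out drug: drug + relation, then nearest neighbor.
predicted_gene = nearest(concept_vecs["drug_C"] + rel, concept_vecs,
                         exclude=("drug_C",))

print(predicted_word, predicted_gene)
```

With these toy vectors the two lookups return `queen` and `gene_C`. Averaging over several known pairs is one simple way to obtain a shared relation vector; other definitions are possible, and the report itself should be consulted for the one actually used.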
Keyword |
(in Japanese) |
(See Japanese page) |
(in English) |
natural language processing / distributed representations / word embeddings / analogy / Biology / PubMed |
Reference Info. |
IEICE Tech. Rep., vol. 123, no. 91, IBISML2023-18, pp. 113-120, June 2023. |
Paper # |
IBISML2023-18 |
Date of Issue |
2023-06-22 (NC, IBISML) |
ISSN |
Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |