Presentation abstract / Keywords
Presentation title
2022-05-27 15:10
[Poster presentation] A Transformer for Long Medical Documents ○Cherubin Mugisha, Incheon Paik (UoA) SC2022-8
Abstract
(English)
Natural language processing models advance technology by extracting valuable information from diverse datasets. Biomedical texts pose challenges that require domain-specific models for effective biomedical text mining, and long sequences demand large memory because standard self-attention is quadratic in sequence length. In this work, we introduce a transformer for long medical documents. We trained the model on several biomedical datasets, and it can handle sequences of up to 4,096 tokens. To build the model, we used a byte-level Byte-Pair Encoding tokenizer inspired by RoBERTa, together with an attention mechanism that scales linearly with sequence length. We fine-tuned the model and demonstrated competitive results on question answering tasks.
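The abstract does not specify which linearly scaling attention mechanism the model uses; a Longformer-style sliding-window attention is one common choice for 4,096-token inputs. The NumPy sketch below is illustrative only (the function name and window size are assumptions, not the authors' implementation): restricting each query to a fixed local window makes the cost O(n x window) rather than O(n^2).

```python
import numpy as np

def sliding_window_attention(q, k, v, window=2):
    # Illustrative sketch, not the paper's implementation.
    # Each query attends only to keys within `window` positions on
    # either side, so cost grows as O(n * window) instead of O(n^2).
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # scaled dot-product
        weights = np.exp(scores - scores.max())   # stable softmax
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out
```

Note that when `window` covers the whole sequence, this reduces to ordinary full softmax attention, which is a convenient sanity check for the sketch.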
Keywords
(English)
Transformer / Medical text / Tokenization / MIMIC
Bibliographic information
IEICE Technical Report, vol. 122, no. 50, SC2022-8, pp. 43-53, May 2022.
Report number
SC2022-8
Publication date
2022-05-20 (SC)
ISSN
Online edition: ISSN 2432-6380
Copyright
The copyright of papers published in the Technical Report belongs to IEICE. (License numbers: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034)