Abstract and Keywords
Presentation title
2004-11-19 14:45
Artistic Line Extraction from Indian Documents
Umapada Pal (presenter), Partha Pratim Roy, N. Tripathy (Indian Statistical Inst.), Hiroyuki Hase (Fukui Univ.)
Abstract
(Japanese)
(Not yet registered)
(English)
There are printed artistic documents in which the text lines on a single page are not parallel to each other: the lines may have different orientations or may be curved. For Optical Character Recognition (OCR) of such documents, these lines must be extracted properly, but their multi-oriented and curved nature makes extracting individual text lines very difficult. In this paper, we propose a scheme based on the water reservoir principle to extract individual text lines from printed Indian artistic documents. First, by analyzing the areas of the reservoirs obtained in a component, we compute the mode of the component (portrait, landscape, reverse portrait, or reverse landscape). Next, based on the mode and on water reservoir features such as the number of reservoirs, the heights of the reservoirs, and the overlapping portion of two reservoirs, each component is classified as isolated or touching. Then, depending on the reservoir base areas and the loops of a component, some candidate envelope points are detected. Each touching component is then classified as either straight or curved, depending on its candidate envelope points. Based on this type, two boundary points are computed for each touching component. Finally, candidate regions (neighborhoods) of the boundary points of each component are detected, and individual text lines are segmented by analyzing these candidate regions.
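The abstract does not give formulas for the reservoir features. As a rough illustration only, the Python sketch below computes a component's top-reservoir area under a simplified model (an assumption, not the paper's definition): each column is reduced to its topmost ink pixel, and water poured from above is trapped between taller columns, as in the classic "trapping rain water" problem. Bottom, left, and right reservoirs are obtained by rotating the component so that the corresponding side faces up. The names `top_reservoir_area`, `directional_reservoir_areas`, and `dominant_side` are hypothetical. Because the abstract does not say how reservoir areas map to the four modes, `dominant_side` only reports the side with the largest reservoir area, not a mode.

```python
from typing import Dict, List

Grid = List[List[int]]  # binary component image: 1 = ink, 0 = background


def top_reservoir_area(img: Grid) -> int:
    """Approximate top-reservoir area (simplified column-profile model)."""
    rows = len(img)
    if rows == 0:
        return 0
    cols = len(img[0])
    # Height of the topmost ink pixel in each column, measured from the bottom.
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if img[r][c]:
                h = rows - r
                break
        heights.append(h)
    # Two-pointer water trapping over the top profile.
    left, right = 0, cols - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water


def rotate_cw(img: Grid) -> Grid:
    """Rotate 90 degrees clockwise; the former left edge becomes the top row."""
    return [list(row) for row in zip(*img[::-1])]


def directional_reservoir_areas(img: Grid) -> Dict[str, int]:
    """Reservoir area for water poured from each side of the component."""
    r90 = rotate_cw(img)    # left edge now on top
    r180 = rotate_cw(r90)   # bottom edge now on top
    r270 = rotate_cw(r180)  # right edge now on top
    return {
        "top": top_reservoir_area(img),
        "left": top_reservoir_area(r90),
        "bottom": top_reservoir_area(r180),
        "right": top_reservoir_area(r270),
    }


def dominant_side(img: Grid) -> str:
    """Hypothetical helper: side with the largest reservoir area (ties keep dict order)."""
    areas = directional_reservoir_areas(img)
    return max(areas, key=areas.get)


if __name__ == "__main__":
    u_shape = [
        [1, 0, 1],
        [1, 0, 1],
        [1, 1, 1],
    ]
    print(directional_reservoir_areas(u_shape))
    # {'top': 2, 'left': 0, 'bottom': 0, 'right': 0}
    print(dominant_side(u_shape))  # top
```

The column-profile model ignores water that could flow sideways under overhanging strokes, and it does not count closed loops as reservoirs; the abstract lists loops as a separate feature used when detecting candidate envelope points.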
Keywords
(Japanese)
(none)
(English)
Text line extraction / Artistic document analysis / Multi-oriented document recognition / Indian document analysis
Bibliographic information
IEICE Technical Report, vol. 104, no. 448, PRMU2004-116, pp. 65-70, Nov. 2004.
Report number
PRMU2004-116 |
Publication date
2004-11-12 (PRMU) |
ISSN |
Print edition: ISSN 0913-5685 |
|
|