
Optical recognition of printed mathematical documents

R. Miyazaki, K. Inoue
RICOH Ltd

M. Suzuki
suzuki@math.kyushu-u.ac.jp
Graduate School of Mathematics
Kyushu University
Japan

Abstract

Recent developments in OCR technology have made it practical to use OCR in a wide variety of applications. However, no commercial OCR software can recognize scientific documents that contain mathematical formulas; in fact, until recently there has been very little research on this subject. The lack of this technology is a serious obstacle to the use of OCR in scientific fields.

This paper describes our new OCR system, which can handle scientific documents containing mathematical expressions. The system consists of the following two major steps.

After extracting the text lines from a scanned image, we first segment each line into Japanese-text areas and mathematical-formula areas. This segmentation and the recognition of the Japanese characters are performed simultaneously within a DP-matching framework. We also implemented a correction algorithm for the recognition results based on linguistic morphology, in which each mathematical area is treated as a noun.

The second step analyzes the mathematical-formula areas. Here we have considerably improved on two pioneering works on mathematical formula recognition, [1] and [2]. We take a top-down approach throughout the formula recognition process, using a recursive algorithm. With several refinements, this recursive algorithm enables the system to correctly recognize formulas even more complicated than those in ordinary use, with the exception, at present, of matrices. The system works reliably on nearly noise-free images obtained by scanning clearly printed documents at 400 dpi.
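The top-down analysis with a recursive algorithm can be sketched under strong simplifying assumptions: every symbol is already recognized and carries a bounding box, a horizontal bar is labelled `-`, and only fractions and super/subscripts are handled (no radicals, large operators, or matrices, and no fractions inside scripts). At each level the sketch first looks for the dominant structure, the widest bar with symbols both above and below it, splits the remaining symbols around that bar, and recurses on each part; otherwise it reads the symbols left to right and attaches scripts by comparing each box with the vertical midpoint of the current base. `Sym`, `parse`, `parse_row`, and the midpoint rule are hypothetical choices, not the method of this paper or of [1] and [2].

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sym:
    """A recognized symbol with its bounding box (image coordinates, y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def parse(symbols: List[Sym]) -> str:
    """Top-down: split on the dominant structure first, then recurse on the parts."""
    if not symbols:
        return ""
    best = None  # (width, left, right, above, below) of the widest fraction bar
    for k, bar in enumerate(symbols):
        if bar.label != "-":
            continue
        left, right, above, below = [], [], [], []
        for m, s in enumerate(symbols):
            if m == k:
                continue
            if s.cx < bar.x0:
                left.append(s)
            elif s.cx > bar.x1:
                right.append(s)
            elif s.cy < bar.cy:
                above.append(s)
            else:
                below.append(s)
        width = bar.x1 - bar.x0
        # A bar is a fraction bar only if it has something above AND below it.
        if above and below and (best is None or width > best[0]):
            best = (width, left, right, above, below)
    if best is None:
        return parse_row(symbols)
    _, left, right, above, below = best
    frac = "\\frac{" + parse(above) + "}{" + parse(below) + "}"
    return " ".join(p for p in (parse(left), frac, parse(right)) if p)


def parse_row(symbols: List[Sym]) -> str:
    """Read symbols left to right; attach super/subscripts to the current base."""
    syms = sorted(symbols, key=lambda s: s.x0)
    out, k = [], 0
    while k < len(syms):
        base = syms[k]
        k += 1
        mid = base.cy
        sup, sub = [], []
        # Hypothetical rule: entirely above the base's midpoint -> superscript,
        # entirely below it -> subscript; otherwise a new base symbol begins.
        while k < len(syms):
            s = syms[k]
            if s.y1 <= mid:
                sup.append(s)
            elif s.y0 >= mid:
                sub.append(s)
            else:
                break
            k += 1
        piece = base.label
        if sub:
            piece += "_{" + parse(sub) + "}"
        if sup:
            piece += "^{" + parse(sup) + "}"
        out.append(piece)
    return " ".join(out)


# Usage: x^2 + y_i laid out with illustrative box coordinates.
expr = [Sym("x", 0, 10, 8, 20), Sym("2", 9, 4, 13, 12), Sym("+", 16, 11, 24, 19),
        Sym("y", 26, 10, 34, 20), Sym("i", 35, 17, 38, 25)]
print(parse(expr))  # x^{2} + y_{i}
```

Choosing the widest qualifying bar before anything else is what makes the sketch top-down: the outermost fraction is fixed before its numerator and denominator are analyzed, and a short bar with nothing above and below it falls through to `parse_row` as a minus sign.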

[1] M. Okamoto, Y. Azuma, Mathematical structure recognition based on the big symbol layouts (in Japanese), Shingakuron, J78-D, No. 3, pp. 474-482 (1995)
[2] R. J. Fateman, T. Tokuyama, B. Berman, N. Mitchell, Optical Character Recognition and Parsing of Typeset Mathematics, Computer Science Division, EECS Dept. (1995)



© 2005 ATCM, Inc. © 2005 Any2Any Technologies, Ltd.