Optical recognition
of printed mathematical documents
R. Miyazaki, K. Inoue
RICOH Ltd
M. Suzuki
suzuki@math.kyushu-u.ac.jp
Graduate School of Mathematics
Kyushu University
Japan
Abstract
Recent developments in OCR technology
have made it practical to use
in a variety of applications. However,
there is no commercial OCR software
that can recognize scientific
documents containing mathematical
formulas. In fact, there has been very
little research on this subject until
recent years. The lack of this
technology presents a serious obstacle
to the use of OCR in
scientific fields.
This paper describes our new OCR
system, which can handle scientific
documents containing mathematical
expressions. Our system consists
of the following two major steps.
After extracting the lines
from a scanned image, we first segment
each line into Japanese text areas
and mathematical formula areas.
This segmentation and the recognition
of the Japanese characters are performed
simultaneously in a DP-matching
framework. A correction algorithm
for the recognition results, based on
linguistic morphology, is also implemented;
it treats the mathematical areas
as nouns.
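The segmentation step can be
illustrated with a minimal sketch.
It is not the DP-matching framework
of the system itself: it assumes that
each connected component on a line
already carries a score as Japanese
text and a score as mathematics, and
it only chooses the labelling with the
highest total score, charging a
penalty for each change of area. The
names segment_line, to_areas and
switch_penalty, and the scores, are
hypothetical.

```python
def segment_line(text_scores, math_scores, switch_penalty=1.0):
    """Label each component 'T' (text) or 'M' (math) by dynamic programming.

    Assumption: the per-component scores come from some recognizer;
    higher means a better fit. Every change of label costs switch_penalty.
    """
    n = len(text_scores)
    if n == 0:
        return []
    labels = ("T", "M")
    scores = (text_scores, math_scores)
    # best[i][k]: best total score of components 0..i with component i labelled k
    best = [[0.0, 0.0] for _ in range(n)]
    # back[i][k]: label of component i-1 on that best path
    back = [[0, 0] for _ in range(n)]
    best[0] = [text_scores[0], math_scores[0]]
    for i in range(1, n):
        for k in range(2):
            stay = best[i - 1][k]
            switch = best[i - 1][1 - k] - switch_penalty
            if stay >= switch:
                best[i][k] = stay + scores[k][i]
                back[i][k] = k
            else:
                best[i][k] = switch + scores[k][i]
                back[i][k] = 1 - k
    # Trace the best path back from the last component.
    k = 0 if best[n - 1][0] >= best[n - 1][1] else 1
    path = [k]
    for i in range(n - 1, 0, -1):
        k = back[i][k]
        path.append(k)
    path.reverse()
    return [labels[k] for k in path]


def to_areas(labels):
    """Merge runs of equal labels into (label, start, end) areas, end exclusive."""
    areas = []
    for i, lab in enumerate(labels):
        if areas and areas[-1][0] == lab:
            areas[-1] = (lab, areas[-1][1], i + 1)
        else:
            areas.append((lab, i, i + 1))
    return areas
```

In this sketch a larger switch_penalty
yields fewer and longer areas, which
keeps an isolated ambiguous character
from splitting a formula in two.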
The second step analyses the
mathematical formula areas. Here we
considerably improved on the two
pioneering works [1] and [2]
below on mathematical formula
recognition. We adopted a top-down
method throughout the formula
recognition process, making use
of a recursive algorithm. With
several refinements, this recursive
algorithm enables the system
to recognize correctly formulas even
more complicated than those in
ordinary use, with the exception of
matrices at present. The system works
reliably on nearly noise-free images
obtained by scanning ordinary, clearly
printed documents at 400 dpi.
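The top-down, recursive idea can be
sketched as follows. This is only an
illustration under stated assumptions,
not the algorithm of the system: the
symbols are assumed to be already
recognized and given with bounding
boxes, a fraction bar is assumed to
carry the label "frac", and only
fractions and superscripts are
handled. The names Symbol and parse
are hypothetical.

```python
from typing import List, NamedTuple


class Symbol(NamedTuple):
    name: str   # recognized label; "frac" marks a fraction bar (assumed convention)
    x0: float   # left edge
    y0: float   # top edge; y grows downward, as in image coordinates
    x1: float   # right edge
    y1: float   # bottom edge


def parse(symbols: List[Symbol]) -> str:
    """Turn a set of symbols into a LaTeX-like string, top-down."""
    if not symbols:
        return ""
    syms = sorted(symbols, key=lambda s: s.x0)
    bars = [s for s in syms if s.name == "frac"]
    if bars:
        # Split around the widest bar first, then recurse into each part.
        bar = max(bars, key=lambda s: s.x1 - s.x0)
        bar_cy = (bar.y0 + bar.y1) / 2
        left, num, den, right = [], [], [], []
        for s in syms:
            if s is bar:
                continue
            cx = (s.x0 + s.x1) / 2
            cy = (s.y0 + s.y1) / 2
            if cx < bar.x0:
                left.append(s)
            elif cx > bar.x1:
                right.append(s)
            elif cy < bar_cy:
                num.append(s)
            else:
                den.append(s)
        return (parse(left) + r"\frac{" + parse(num) + "}{"
                + parse(den) + "}" + parse(right))
    # No bar: read left to right; symbols lying entirely above the
    # vertical middle of the current base symbol form its superscript.
    out = []
    i = 0
    while i < len(syms):
        base = syms[i]
        out.append(base.name)
        mid = (base.y0 + base.y1) / 2
        j = i + 1
        while j < len(syms) and syms[j].y1 <= mid:
            j += 1
        if j > i + 1:
            out.append("^{" + parse(syms[i + 1:j]) + "}")
        i = j
    return "".join(out)
```

Each recursive call receives strictly
fewer symbols than its caller, so the
recursion always terminates. Nested
fractions are handled because a
numerator or denominator is parsed by
the same function.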
[1] M. Okamoto, Y. Azuma, Mathematical
structure recognition based on the
big symbol layouts (in Japanese),
Shingakuron, J78-D, No. 3, pp. 474-482
(1995)
[2] R. J. Fateman, T. Tokuyama, B. Berman,
N. Mitchell, Optical Character Recognition
and Parsing of Typeset Mathematics,
Computer Science Division, EECS
Dep't (1995)