A differential-processing extraction approach to text and image segmentation

Research output: Contribution to journal › Article › peer-review

Abstract

To efficiently store the information found in paper documents, text and non-text regions need to be separated. Non-text regions include half-tone photographs and line diagrams. The text regions can be converted (via an optical character reader) to a computer-searchable form, and the non-text regions can be extracted and preserved in compressed form using image-compression algorithms. In this paper, an effective system for automatically segmenting a document image into regions of text and non-text is proposed. The system first performs an adaptive thresholding to obtain a binarized image. Subsequently the binarized image is smeared using a run-length differential algorithm. The smeared image is then subjected to a text characteristic filter to remove error smearing of non-text regions. Next, baseline cumulative blocking is used to rectangularize the smeared region. Finally, a text block growing algorithm is used to block out a text sentence. The recognition of text is carried out on a text sentence basis.
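
The abstract does not spell out the run-length differential algorithm, so the following is only a minimal sketch of the general idea of run-length smearing, using the classic horizontal run-length smoothing scheme as a stand-in rather than the authors' method. It assumes a binarized image stored as rows of 0 (background) and 1 (ink); the names `smear_row`, `smear_image`, and `max_gap` are hypothetical, and the choice to fill only gaps bounded by ink on both sides is an assumption.

```python
def smear_row(row, max_gap):
    """Fill background runs of at most max_gap pixels that lie between two ink pixels.

    Assumption: 1 = ink, 0 = background; leading and trailing background is left alone.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            # Length of the background run between the previous ink pixel and this one.
            if last_ink is not None and 0 < i - last_ink - 1 <= max_gap:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def smear_image(image, max_gap):
    """Apply horizontal smearing to every row of a binarized image."""
    return [smear_row(row, max_gap) for row in image]


row = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
print(smear_row(row, max_gap=2))  # [0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
```

With a small `max_gap`, the characters of a word merge into one blob while wider gaps, such as those between columns or around figures, stay open. That is the property the later filtering and blocking stages in the abstract would rely on.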

Original language: American English
Pages (from-to): 639-651
Number of pages: 13
Journal: Engineering Applications of Artificial Intelligence
Volume: 7
Issue number: 6
State: Published - Dec 1994
Externally published: Yes

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering

Keywords

  • Text and image segmentation
  • character recognition
  • document processing
  • run-length differential algorithm
