Automatic handwriter identification using advanced machine learning

Durou, Amal (2019) Automatic handwriter identification using advanced machine learning. Doctoral thesis, Northumbria University.

[img]
Preview
Text (Doctoral thesis)
Durou.Amal_phd.pdf - Submitted Version

Download (4MB) | Preview

Abstract

Handwriter identification a challenging problem especially for forensic investigation. This topic has received significant attention from the research community and several handwriter identification systems were developed for various applications including forensic science, document analysis and investigation of the historical documents. This work is part of an investigation to develop new tools and methods for Arabic palaeography, which is is the study of handwritten material, particularly ancient manuscripts with missing writers, dates, and/or places. In particular, the main aim of this research project is to investigate and develop new techniques and algorithms for the classification and analysis of ancient handwritten documents to support palaeographic studies.

Three contributions were proposed in this research. The first is concerned with the development of a text line extraction algorithm on colour and greyscale historical manuscripts. The idea uses a modified bilateral filtering approach to adaptively smooth the images while still preserving the edges through a nonlinear combination of neighboring image values. The proposed algorithm aims to compute a median and a separating seam and has been validated to deal with both greyscale and colour historical documents using different datasets. The results obtained suggest that our proposed technique yields attractive results when compared against a few similar algorithms.

The second contribution proposes to deploy a combination of Oriented Basic Image features and the concept of graphemes codebook in order to improve the recognition performances. The proposed algorithm is capable to effectively extract the most distinguishing handwriter’s patterns. The idea consists of judiciously combining a multiscale feature extraction with the concept of grapheme to allow for the extraction of several discriminating features such as handwriting curvature, direction, wrinkliness and various edge-based features. The technique was validated for identifying handwriters using both Arabic and English writings captured as scanned images using the IAM dataset for English handwriting and ICFHR 2012 dataset for Arabic handwriting. The results obtained clearly demonstrate the effectiveness of the proposed method when compared against some similar techniques.

The third contribution is concerned with an offline handwriter identification approach based on the convolutional neural network technology. At the first stage, the Alex-Net architecture was employed to learn image features (handwritten scripts) and the features obtained from the fully connected layers of the model. Then, a Support vector machine classifier is deployed to classify the writing styles of the various handwriters. In this way, the test scripts can be classified by the CNN training model for further classification. The proposed approach was evaluated based on Arabic Historical datasets; Islamic Heritage Project (IHP) and Qatar National Library (QNL). The obtained results demonstrated that the proposed model achieved superior performances when compared to some similar methods

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: Text, line, extraction, writer, identification, oriented, image, dimensionality, reduction, machine, learning
Subjects: G400 Computer Science
G600 Software Engineering
Department: Faculties > Engineering and Environment > Computer and Information Sciences
University Services > Graduate School > Doctor of Philosophy
Depositing User: John Coen
Date Deposited: 07 Feb 2020 15:40
Last Modified: 12 Feb 2020 14:19
URI: http://nrl.northumbria.ac.uk/id/eprint/42061

Actions (login required)

View Item View Item

Downloads

Downloads per month over past year

View more statistics