Development of a computer-aided chromosome analysis system to assist cancer diagnosis

Dissertation
Author: Xingwei Wang

Table of Contents

Abstract
1 Introduction
  1.1 Objectives
  1.2 Organization
2 Background
  2.1 Human Chromosomes
  2.2 Chromosome Abnormality
  2.3 What is Metaphase Finding
  2.4 What is Karyotyping
    2.4.1 Non-Banding Techniques
    2.4.2 Banding Techniques
    2.4.3 Karyotyping
  2.5 Fluorescent in Situ Hybridization Analysis
3 Why Automated Computer-aided Diagnostic Chromosome Analysis Systems
  3.1 Block Diagram of a CAD Chromosome Analysis System
    3.1.1 Metaphase Finding
    3.1.2 Feature Extraction and Selection
    3.1.3 Karyotyping - Classification of Chromosomes
  3.2 Current Status of CAD Chromosome Analysis Systems
  3.3 Necessity of New CAD Chromosome Analysis Systems
4 First Module - Automated Metaphase Finding
  4.1 A Popular Metaphase Finding Classifier - Backpropagation Algorithm
    4.1.1 Artificial Neural Network
    4.1.2 Network Topology
    4.1.3 Backpropagation Algorithm with Momentum Updating
  4.2 A Practical Classifier in Medical Imaging Systems - Decision Tree
  4.3 Kappa Coefficient - a Statistical Measure of Inter-rater Reliability
    4.3.1 The Definition of Kappa Coefficient
    4.3.2 Kappa Coefficient and Prevalence Index
    4.3.3 Kappa Coefficient and Bias Index
  4.4 Metaphase Finding by a DT Classifier and a BP Classifier
    4.4.1 Experiment Data
    4.4.2 A Flow Chart to Identify Metaphase Chromosomes
    4.4.3 Segmenting Potential Metaphase Cells
    4.4.4 Identifying Metaphase Cells by a DT Classifier
    4.4.5 Identifying Metaphase Cells by a Back Propagation Classifier
    4.4.6 Experimental Results using two Classifiers
  4.5 Automatic On-line Metaphase Finding in High Magnification
    4.5.1 The High-speed Microscopy Scanning System
    4.5.2 Flat Fielding Correction
    4.5.3 Fast Automated On-line Metaphase Finding
  4.6 Discussions
5 Second Module - Extracting Features
  5.1 Important Features of Chromosomes
  5.2 A Rule-based Computer Scheme for Extracting Features
    5.2.1 Experimental Dataset
    5.2.2 A Flow Chart of a Rule-based Computer Scheme
    5.2.3 A Modified Thinning Algorithm
    5.2.4 Computation of Feature Profiles
    5.2.5 Identification of Centromeres
    5.2.6 Feature Analysis and Polarity Assignment
    5.2.7 Experimental Results
  5.3 Discussions
6 Third Module - Karyotyping
  6.1 Previous Studies in Classifying Chromosomes
  6.2 An Optimization Algorithm - Genetic Algorithm
  6.3 An Effective Evaluation Method: Training-Testing-Validation
  6.4 A New Optimized Adaptive Scheme to Classify Chromosomes
    6.4.1 Experimental Dataset
    6.4.2 An Initial Feature Pool Extracted from Individual Chromosomes
    6.4.3 Karyotyping by a Two-Layer BP-based GA-Optimized Classifier
    6.4.4 Karyotyping Results
  6.5 Discussions
7 Evaluation of CAD Systems by Receiver Operating Characteristic Curves
  7.1 Basic Concepts
  7.2 What is an ROC Curve
    7.2.1 An Empirical ROC Curve
    7.2.2 A Conventional Binormal ROC Curve
  7.3 Why ROC Curves
8 A Robust Experiment to Test the Performance and Robustness of a CAD Chromosome Analysis System
  8.1 Experimental Dataset
  8.2 Results of Testing Performance and Robustness
  8.3 Discussions and Conclusions
9 Evaluation of Prognosis for Childhood Acute Lymphoblastic Leukemia
  9.1 The Introduction of Childhood Acute Lymphoblastic Leukemia
  9.2 Prognosis Evaluation of Childhood Acute Lymphoblastic Leukemia
    9.2.1 Experiment Materials
    9.2.2 Analyzing Numerical Change and Computing DNA Index
    9.2.3 Testing Results
  9.3 Discussions
10 Automated Analysis of FISH Chromosomes in Interphase Nuclei of Pap-smear Specimens to Assist Cervical Cancer Diagnosis
  10.1 Background about Cervical Cancer Screening Methods
  10.2 Segmentation of Interphase Cells and Analysis of FISH Spots
    10.2.1 Experimental Materials and Data
    10.2.2 Detection and Segmentation of Analyzable Interphase Cells
    10.2.3 Detection of FISH Signal Spots
    10.2.4 A Knowledge-based Classifier
    10.2.5 Classification between Normal and Abnormal Cells
    10.2.6 Detecting Interphase Cell and Analyzing FISH Results
  10.3 Discussions
11 Conclusions and Future Work
  11.1 Contributions and Conclusions
  11.2 Future Work
12 References

List of Tables

Table 2.1: The classification of chromosomes based on Denver Group classification.

Table 4.1: The strength of agreement for different kappa coefficients.

Table 4.2: Comparison of classification results between a cytogeneticist and the DT based scheme for training dataset.

Table 4.3: Comparison of classification results between a cytogeneticist and the DT based scheme for testing dataset.

Table 5.1: The details of the testing dataset.

Table 5.2: The classification of chromosomes based on Denver Group classification.

Table 5.3: The proportions of correct centromere identification.

Table 5.4: The proportions of correct polarity assignments.

Table 5.5: The results of correct centromere identification and polarity assignments for severely bent chromosomes.

Table 5.6: The results of correct centromere identification and polarity assignments for abnormal chromosomes.

Table 6.1: Possible features used to classify chromosomes.

Table 6.2: GA optimization of eight ANNs and classification results.

Table 6.3: The topologies of eight ANNs used in the second layer of the classifier.

Table 6.4: Summary of different chromosome databases.

Table 7.1: Mammography data used to construct an ROC curve.

Table 7.2: Sensitivity, specificity, FPF for the mammography data at each decision rule.

Table 8.1: Summary of automated centromere identification and polarity assignment results.

Table 8.2: Topologies and classification results of eight ANNs.

Table 9.1: The prognosis analysis results for three groups.

Table 10.1: Comparison of classification results of analyzable interphase cells between a cytogeneticist and the DT based computerized scheme.

Table 10.2: Comparison of classification results of normal and abnormal cells between a cytogeneticist and the DT based computerized scheme.

List of Figures

Figure 2.1: Analyzable metaphase chromosome images from three groups.

Figure 2.2: The example of structural changes of chromosomes.

Figure 2.3: Digital images of two metaphase chromosome cells.

Figure 2.4: Ideograms of G-banding patterns for normal human chromosome #1 at five different levels of resolution.

Figure 2.5: An example of karyotyping an analyzable metaphase cell. (a) An original analyzable metaphase cell. (b) Corresponding karyotyped chromosomes.

Figure 3.1: The block diagram of a computer-aided chromosome analysis system.

Figure 4.1: The model of an artificial neuron.

Figure 4.2: The sigmoid threshold unit.

Figure 4.3: The topologies of networks.

Figure 4.4: Back propagation algorithm with momentum updating.

Figure 4.5: The calculation of kappa coefficient, prevalence index and bias index.

Figure 4.6: The relation of kappa coefficient and prevalence index.

Figure 4.7: The relation of kappa coefficient and bias index.

Figure 4.8: A flow diagram of a computerized scheme to segment chromosomes and classify metaphase cells into analyzable and un-analyzable cells.

Figure 4.9: A five-feature based DT for recognizing analyzable and un-analyzable metaphase chromosome cells.

Figure 4.10: An example of eliminating the non-chromosome objects. (a) Original image. (b) Processed image.

Figure 4.11: A scatter diagram between two features of 100 training samples including 35 analyzable (“positive”) and 65 un-analyzable (“negative”) cells.

Figure 4.12: A scatter diagram between two features of 70 testing samples including 37 analyzable (“positive”) and 33 un-analyzable (“negative”) cells.

Figure 4.13: Metaphase chromosome images captured with different magnifications and different scanning rates. (a) Still image with 10X objective lens. (b) Image captured with 1 mm/sec scan rate with 10X objective lens. (c) Image captured with 1 mm/sec scan rate with 10X objective lens. (d) Still image with 60X objective lens. (e) Image captured with 1 mm/sec scan rate. (f) Image captured with 2 mm/sec scan rate with 60X objective lens.

Figure 5.1: Ideograms of chromosomes #1 - metacentric, #18 - submetacentric, and #21 - acrocentric.

Figure 5.2: Various morphologies of chromosome #1 acquired from different metaphase cells.

Figure 5.3: The algorithm steps for the centromere identification and polarity assignment of a chromosome.

Figure 5.4: Finding the “true” medial axis of chromosomes #2 and X.

Figure 5.5: 3 × 3 window used for the thinning algorithm.

Figure 5.6: The density and shape profiles of three different types: chromosome #22 (acrocentric), chromosome #10 (submetacentric), and chromosome #1 (metacentric). (a) The shape profile of chromosome #22. (b) The shape profile of chromosome #10. (c) The shape profile of chromosome #1. (d) The density profile of chromosome #22. (e) The density profile of chromosome #10. (f) The density profile of chromosome #1.

Figure 5.7: The procedure of obtaining an idealized banding profile of chromosome #19. (a) An example of chromosome #19. (b) An original banding profile. (c) A reversed banding profile. (d) An idealized density profile obtained by a non-linear filter.

Figure 5.8: The banding features of ideogram of chromosome #18.

Figure 6.1: Three-way data splits.

Figure 6.2: Display of weighted functions.

Figure 6.3: Illustration of an ANN based DT used to classify 24 different types of chromosomes.

Figure 6.4: Illustration of two ANNs optimized and used in the scheme to classify chromosomes.

Figure 6.5: Distribution of selection of each feature used in the seven ANNs.

Figure 7.1: A matrix for defining four basic concepts as defined in the test.

Figure 7.2: Illustration of an empirical ROC curve.

Figure 7.3: Illustration of a binormal model.

Figure 7.4: An ROC curve constructed by sensitivity and FPF (1-specificity) based on different decision thresholds.

Figure 7.5: The model used in fitting binormal ROC curves.

Figure 7.6: A conventional binormal ROC curve plotted by ROCFIT.

Figure 7.7: Understanding AUCs. (a) AUC. (b) Four ROC curves with different values of the AUCs.

Figure 7.8: An illustration of a comparison between the sensitivities of ROC curve A and ROC curve B at a specific FPF.

Figure 8.1: An ROC-type performance curve generated by an ANN in the dataset.

Figure 8.2: A scatter diagram between two features of 200 testing samples including 100 analyzable and 100 un-analyzable cells.

Figure 9.1: The main idea of evaluating the prognosis of ALL.

Figure 9.2: Distribution of the number of chromosomes in three testing cases.

Figure 10.1: (a) A large region with many fluorescence artifacts. (b) A cluster of overlapped cells. (c) A huge cluster involving many small areas and stain debris. (d) Examples of analyzable normal interphase cells.

Figure 10.2: The distribution of FISH spots in abnormal cells.

Figure 10.3: (a) An illustration of typical FISH spots. (b) Related features of FISH spots. (c) An example of red/green spots with compact oval shape and stringy diffuse oval shape in a normal cell. (d) An example of a splitting red FISH spot in a normal cell. (e) An example of splitting red spots in an abnormal cell.

Figure 10.4: A flow diagram of a knowledge-based classifier to recognize split, stringy, and diffuse cells.

Figure 10.5: Comparisons of different feature distributions for analyzable interphase cells and un-analyzable cell clusters. (a) – (b) The size distribution of all analyzable and un-analyzable cells. (c) – (d) The compactness distribution of all analyzable and un-analyzable cells. (e) – (f) The circularity distribution of all analyzable and un-analyzable cells, respectively.

Figure 10.6: (a) The distance distribution between red FISH spots in normal cells. (b) The distance distribution between green spots in normal cells.

Figure 10.7: Automated detection result by the scheme.

Figure 10.8: Examples of analyzable interphase cells.

Abstract

Metaphase-finding and karyotyping are standard and fundamental procedures that are routinely performed in genetic laboratories to help clinicians detect cancers and other genetic diseases. However, visual search, manual identification, and classification of analyzable metaphase chromosomes using optical microscopes are tedious and time-consuming tasks. Hence, an integrated computer-aided detection (CAD) system, which includes both an automated metaphase finding module and a chromosome karyotyping module, may significantly improve diagnostic efficiency and performance in a hectic clinical practice. Despite previous efforts, several challenges have limited the development of automated metaphase-finding and karyotyping schemes: (1) most current automated metaphase finding schemes use low-resolution images to search for the location of potentially analyzable cells, and they still need human intervention; (2) automatic metaphase finding results obtained at low magnification require technicians to switch to a high-magnification lens to double-check whether each detected cell is useful for later karyotyping; (3) most automated karyotyping schemes use a single classifier to classify 24 different types of chromosomes; (4) it remains unclear how to select effective features for classification; and (5) it remains unclear how to find the optimal classifier and thus reduce the complexity of karyotyping. To address these problems, this thesis reports the results of research on developing a new chromosome analysis system for cancer diagnosis.

The integrated chromosome analysis system in this study includes three basic modules. The first module detects whether a microscopic digital image depicts a metaphase chromosome cell in high magnification. A computerized scheme has been developed and tested. It can automatically identify chromosomes in the metaphase stage and classify them into analyzable and un-analyzable groups in high magnification. By computing a set of features for each individual chromosome as well as for each identified metaphase cell, two machine learning classifiers, a decision tree (DT) and an artificial neural network (ANN), are optimized and tested to classify between analyzable and un-analyzable cells. The DT uses the features of individual chromosomes, and the ANN uses the features of the metaphase cells. The on-line metaphase finding takes less than 350 milliseconds to process an eight-bit chromosome image of size 4096 × 2048.

Second, a multi-stage rule-based scheme has been developed to automatically detect centromeres and determine polarities for both abnormal and normal metaphase chromosomes. Automatic centromere identification and polarity assignment are two key factors in the automatic karyotyping of human chromosomes. The scheme implements a modified thinning algorithm to identify the medial axis of a chromosome and extracts four related feature profiles: pixel distribution, shape, density, and idealized banding profile. According to a set of pre-optimized classification rules, the scheme adaptively identifies the centromeres, assigns corresponding polarities, and extracts related features of chromosomes. The experimental results demonstrate that the scheme developed in this study can be successfully applied to diverse chromosomes, including severely bent chromosomes and abnormal chromosomes extracted from cancer cells.

Third, through analyzing the four feature profiles and calculating specific features by weighted density distribution functions, a vector of 31 features is obtained and used to classify chromosomes by a two-layer ANN-based classifier. This two-layer classifier with eight ANNs is optimized by a genetic algorithm (GA). In the first layer, a testing chromosome is classified into one of seven groups by a single ANN. Another ANN is then automatically selected from the seven ANNs in the second layer (one for each group) to further classify this chromosome into one of 24 types. The scheme was evaluated using a “training-testing-validation” method.
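The two-layer dispatch described above can be sketched as follows. This is a minimal illustration rather than the thesis implementation: it assumes the seven first-layer groups are the Denver groups A to G (the text here does not say so explicitly), and the names `DENVER_GROUPS`, `classify_two_layer`, `group_classifier`, and `type_classifiers` are hypothetical. The classifiers are passed in as plain callables standing in for the trained ANNs.

```python
from typing import Callable, Dict, Sequence, Tuple

# Assumed grouping: the standard Denver groups A-G, covering 24 chromosome types.
DENVER_GROUPS: Dict[str, Sequence[str]] = {
    "A": ["1", "2", "3"],
    "B": ["4", "5"],
    "C": ["6", "7", "8", "9", "10", "11", "12", "X"],
    "D": ["13", "14", "15"],
    "E": ["16", "17", "18"],
    "F": ["19", "20"],
    "G": ["21", "22", "Y"],
}

Features = Sequence[float]
Classifier = Callable[[Features], str]


def classify_two_layer(
    features: Features,
    group_classifier: Classifier,
    type_classifiers: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Run the first-layer classifier, then the second-layer classifier of that group."""
    group = group_classifier(features)
    if group not in DENVER_GROUPS:
        raise ValueError(f"unknown group: {group!r}")
    # Only the classifier belonging to the predicted group is consulted.
    chromosome_type = type_classifiers[group](features)
    if chromosome_type not in DENVER_GROUPS[group]:
        raise ValueError(f"type {chromosome_type!r} is not in group {group}")
    return group, chromosome_type
```

One consequence of this design is that each second-layer classifier only has to separate the few types within its own group (two to eight under the assumed grouping) instead of all 24 at once, while an error in the first layer cannot be corrected by the second.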

To assess the performance and robustness of the chromosome analysis system introduced in this thesis, 200 digital microscopic chromosome images are used. The automatic metaphase-finding module detects analyzable metaphase cells using a feature-based ANN. The ANN-generated outputs are analyzed by an ROC method, and the area under the ROC curve is 0.966. In the feature extraction module, the overall accuracy is 89% for centromere identification and 90.7% for polarity assignment. In the karyotyping module, a two-layer DT-based classifier with eight ANNs at its connection nodes is optimized by a GA. Chromosomes are classified into seven groups by the ANN in the first layer. The chromosomes in these groups are then separately classified by seven ANNs into 24 types in the second layer. The classification accuracy is 94.5% in the first layer. In the second layer, six ANNs achieved accuracy above 95% and only one had lower performance (80.6%). The results demonstrate that the developed automated scheme can achieve high and robust performance in the identification and classification of metaphase chromosomes. Besides assisting cancer detection, this analysis system can help doctors evaluate the prognosis of specific cancers. By analyzing the numerical change of chromosomes and computing the DNA index, the correlated results obtained in this thesis can help doctors analyze the prognosis of childhood acute lymphoblastic leukemia.
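As a hedged illustration of what an area under the ROC curve measures, the sketch below computes the empirical AUC directly from classifier outputs as the fraction of (analyzable, un-analyzable) pairs that are ranked correctly. The function name `empirical_auc` and the scores used with it are hypothetical, and this empirical estimate is not necessarily the method that produced the 0.966 value reported above.

```python
from typing import Sequence


def empirical_auc(positive_scores: Sequence[float],
                  negative_scores: Sequence[float]) -> float:
    """Empirical AUC: probability that a positive case outscores a negative one.

    Ties count as half a correct ranking, which matches the area under the
    empirical ROC curve computed with the trapezoidal rule.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))
```

An AUC of 1.0 means every analyzable cell received a higher output than every un-analyzable cell, while 0.5 corresponds to chance-level ranking.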

In addition, fluorescence in situ hybridization (FISH) chromosome analysis is also investigated in this research, demonstrating the feasibility of applying automatic FISH image analysis to expedite the screening of cervical cancer. First, three feature-based classification rules are applied to detect analyzable cells and delete un-analyzable ones. Second, a knowledge-based expert classifier is implemented to identify splitting FISH signals and improve the accuracy in counting independent FISH spots. The scheme then classifies each detected analyzable cell as normal or abnormal. The automated detection results are compared with those visually identified by the cytogeneticist. The results show that (1) the agreement between the computer scheme and the cytogeneticist is 96.9% in classifying between analyzable and un-analyzable cells (kappa = 0.917) and (2) the agreements between the scheme and the cytogeneticist in detecting normal and abnormal cells based on FISH signals are 90.5% and 95.8%, respectively, with a kappa coefficient of 0.867.
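For readers unfamiliar with the kappa coefficient quoted above, the following minimal sketch computes Cohen's kappa from a 2 × 2 agreement table between two raters (here, the scheme and the cytogeneticist). The function name `cohen_kappa` and the cell labels `a` to `d` are illustrative choices, and any counts passed to it in examples are made up; they are not data from this thesis.

```python
def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 agreement table.

    a: both raters say positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both raters say negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    observed = (a + d) / n
    # Agreement expected by chance, from the marginal totals of each rater.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

With the illustrative counts a = 20, b = 5, c = 10, d = 15, the observed agreement is 0.7, the chance agreement is 0.5, and kappa is 0.4. This shows why kappa is reported alongside raw percent agreement: it discounts the agreement two raters would reach by chance alone.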

Key Words: Human chromosomes, Metaphase finding, Feature extraction, Karyotyping, Artificial neural network, Decision tree, Kappa coefficients, Receiver operating characteristic curves, Acute lymphoblastic leukemia, Genetic algorithm, Training-Testing-Validation, Cervical cancer screening, Fluorescent in situ hybridization, Interphase cells, Pap-smear test.

1 Introduction

1.1 Objectives

Identification and classification of human chromosomes is a fundamental process required in the diagnosis of cancers (e.g., leukemia) and genetic diseases. However, it is a tedious and time-consuming process, which may also affect the diagnostic performance of clinicians in a busy clinical environment. Although a number of research groups around the world have been developing computer-assisted chromosome analysis systems (metaphase finding and karyotyping) since the 1980s, most research systems and a number of commercial systems used in clinical laboratories to date are semi-automated and often require human intervention to correct the results [1]. The purpose of this thesis is to propose and develop a new computer-aided chromosome analysis system to expedite the diagnosis procedure and help clinicians evaluate the prognosis of patients. The system includes metaphase finding (the identification of analyzable metaphase chromosomes in high magnification), feature extraction from individual chromosomes, and classification, or karyotyping, of chromosomes. Automatic fluorescence chromosome analysis is also investigated in this thesis as a potential tool to expedite cervical cancer screening.

1.2 Organization

The structure of this dissertation is as follows.

Chapter 2 describes the background of human chromosomes. First, it defines chromosomes. Second, it introduces chromosome anomalies, which include numerical and structural changes. Third, it introduces two fundamental procedures, metaphase finding and karyotyping, which are routinely performed in genetic laboratories. These two procedures help doctors diagnose cancer and other genetic diseases.

Chapter 3 analyzes why new chromosome analysis systems are needed and introduces the components of a CAD chromosome analysis system.

Chapter 4 addresses metaphase finding. Visual search and identification of analyzable metaphase chromosomes using optical microscopes is a tedious and time-consuming task, which is routinely performed in genetic laboratories to detect and diagnose cancers and genetic diseases. The purpose of this chapter is to develop and test a computerized scheme that can automatically identify chromosomes in the metaphase stage and classify them into analyzable and un-analyzable groups. Two independent datasets involving 170 images are used to train and test the scheme. The scheme uses image filtering, thresholding, and labeling algorithms to detect chromosomes, and then computes a set of features for each individual chromosome as well as for each identified metaphase cell. Two machine learning classifiers, a decision tree (DT) based on the features of individual chromosomes and an artificial neural network (ANN) using the features of the metaphase cells, are optimized and tested to classify between analyzable and un-analyzable cells. Using the DT-based classifier, the Kappa coefficients for agreement between the cytogeneticist and the scheme are 0.83 and 0.89 for the training and testing datasets, respectively. An independent test and a two-fold cross-validation method are employed to assess the performance of the ANN-based classifier. This preliminary study demonstrates the feasibility of developing a computerized scheme to automatically identify and classify metaphase chromosomes.
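The detection step summarized for Chapter 4 (thresholding followed by labeling) can be illustrated with a minimal sketch. The summary above does not give the filter, the threshold value, or the connectivity, so the fixed threshold of 128, the 8-connectivity, the toy image, and the function names below are all assumptions made for illustration. The filtering step is omitted, and the only feature computed is the pixel area of each object.

```python
from collections import deque

def threshold(image, level):
    """Binarize a grayscale image: pixels darker than `level` become 1 (object)."""
    return [[1 if px < level else 0 for px in row] for row in image]

def label_components(mask):
    """Label 8-connected objects by breadth-first flood fill.

    Returns the label image and a dict mapping each label to its pixel area.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or labels[r][c] != 0:
                continue  # background, or already part of a labeled object
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            areas[next_label] = area
    return labels, areas

# Hypothetical 8-bit image: two dark objects on a bright background.
image = [
    [250, 250, 250, 250, 250, 250],
    [250,  40,  50, 250, 250, 250],
    [250,  45, 250, 250,  60, 250],
    [250, 250, 250, 250,  55, 250],
    [250, 250, 250, 250,  50, 250],
]
labels, areas = label_components(threshold(image, 128))
print(areas)
```

In a full scheme, per-object measurements such as these areas would feed the feature set used by the classifiers. The actual features and their thresholds are defined in Chapter 4, not here.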

Chapter 5 discusses the second module, which extracts features from individual chromosomes. Automatic centromere identification and polarity assignment are two key factors in the automatic karyotyping of human chromosomes. A multi-stage rule-based computer scheme is investigated to automatically extract features, detect centromeres, and identify which arms are p-arms for both abnormal and normal metaphase chromosomes. The scheme first implements a modified thinning algorithm to identify the medial axis of a chromosome and extracts four related feature profiles: pixel distribution, shape, density, and idealized banding profiles. Based on a set of pre-optimized classification rules, the scheme adaptively identifies the centromere and then assigns the corresponding polarity to the chromosome. An image dataset of 2287 chromosomes acquired from 24 abnormal and 26 normal Giemsa metaphase cells is used to optimize and test the scheme. The overall accuracy is 91.4% for centromere identification and 97.4% for polarity assignment. The experimental results demonstrate that the scheme can be successfully applied to diverse chromosomes, including severely bent chromosomes and abnormal chromosomes extracted from cancer cells.

The purpose of Chapter 6 is to classify individual chromosomes into 24 types. A new chromosome karyotyping scheme is developed and tested using a two-layer classification platform. The hypothesis is that by selecting the most effective feature sets and adaptively optimizing classifiers for the different groups of chromosomes with similar image characteristics, the complexity of an automated karyotyping scheme can be reduced and its performance and robustness improved. For this purpose, an image database involving 6900 chromosomes is assembled, and a genetic algorithm (GA) is implemented to optimize the topology of the multi-feature-based ANNs. In the first layer of the scheme, a single ANN is employed to classify the 24 chromosome types into seven classes. In the second layer, seven ANNs are adaptively optimized, one for each class, to identify individual chromosomes. The scheme is evaluated using a “training-testing-validation” method. In the first layer, the classification accuracy is 92.9%. In the second layer, the classification accuracy of the seven ANNs ranges from 67.5% to 97.5%; six ANNs achieve accuracy above 93.7% and only one has lower performance. The maximum difference in classification accuracy between the testing and validation datasets is less than 1.7%.

Chapter 7 covers receiver operating characteristic (ROC) analysis. ROC curves have been used extensively in medical imaging systems and medical research since the 1980s, and ROC analysis is an effective method for evaluating the performance or quality of diagnostic tests. This chapter introduces the definition of ROC curves and two methods to construct them.

Chapter 8 assesses the performance and robustness of the chromosome analysis system put forward in this thesis, using 200 digital microscopic chromosome images. The automatic metaphase-finding module detects analyzable metaphase cells using a feature-based ANN. The ANN-generated outputs are analyzed by an ROC method, and the area under the ROC curve is 0.966. Then, in the karyotyping module, a two-layer DT-based classifier with eight ANNs established in its connection nodes is optimized by a GA. Chromosomes are first classified into seven groups by the ANN in the first layer. The chromosomes in these groups are then separately classified by seven ANNs into 24 types in the second layer. The classification accuracy is 94.5% in the first layer. In the second layer, six ANNs achieve accuracy above 95% and only one has lower performance (80.6%). The results demonstrate that the automated scheme developed in this thesis can achieve high and robust performance in the identification and classification of metaphase chromosomes.

Chapter 9 analyzes the numerical changes of chromosomes and computes the DNA index; the results can help doctors assess the prognosis of childhood acute lymphoblastic leukemia. In this preliminary study with 60 testing images acquired from three pediatric patients, the results generated by the computer scheme match the diagnostic results provided by the clinical cytogeneticists.

Chapter 10 addresses fluorescence in situ hybridization (FISH) technology, which has been widely recognized as a very promising molecular imaging tool and applied to screen for and detect cervical cancer. However, manual FISH analysis is time-consuming and introduces large inter-reader variability. In this study, a computerized scheme is developed and tested to automatically detect and analyze FISH signals depicted on microscopic fluorescence images acquired from Pap-smear specimens. The scheme includes two stages: identification of analyzable interphase cells and detection of FISH spots. In the first stage, three feature-based classification rules are applied to detect analyzable cells and delete un-analyzable ones. In the second stage, a knowledge-based expert classifier is implemented to identify splitting FISH signals and improve the accuracy in counting independent FISH spots. The scheme then classifies each detected analyzable cell as normal or abnormal. The automated detection results are compared with those visually identified by the cytogeneticist. The results show that (1) the agreement between the computer scheme and the cytogeneticist is 96.9% (315 of 325) in classifying between analyzable and un-analyzable cells (Kappa = 0.917), and (2) the agreements between the scheme and the cytogeneticist in detecting normal and abnormal cells based on FISH signals are 90.5% (95 of 105) and 95.8% (137 of 143), respectively (Kappa = 0.867). This study demonstrates the feasibility of applying a computerized scheme to FISH image analysis, which may improve detection efficiency and produce more accurate and consistent results than the manual FISH detection method.
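The Kappa coefficient reported for the normal versus abnormal classification can be checked from the counts given above. The following is a minimal sketch of Cohen's kappa. It assumes that 105 and 143 are the numbers of cells the cytogeneticist called normal and abnormal, and that the remaining 10 and 6 cells are the disagreements. The 2 x 2 table layout and the function name are illustrative assumptions and are not taken from the thesis.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    Rows are one rater's categories, columns the other's; cell [i][j] counts
    the cases rated i by the first rater and j by the second.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Assumed layout: rows = cytogeneticist (normal, abnormal),
# columns = computer scheme (normal, abnormal).
fish_table = [
    [95, 10],   # 95 of 105 normal cells agreed
    [6, 137],   # 137 of 143 abnormal cells agreed
]
kappa = cohens_kappa(fish_table)
print(round(kappa, 3))  # prints 0.867
```

Under these assumptions the computed value agrees with the reported Kappa of 0.867. The same function cannot reproduce the 0.917 figure, because only the total agreement (315 of 325) is given for the analyzable versus un-analyzable comparison, not the full table.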

2 Background

Since Tjio and Levan discovered in 1956 that the number of human chromosomes is 46, using an improved cell culturing and staining technique [2], knowledge about chromosomal abnormalities as a cause of disease has increased enormously. The characteristics of abnormal chromosomes help in the diagnosis of diseases, including cancer. Because of their advantages over traditional anatomical imaging techniques in detecting cancers and monitoring cancer treatment efficacy, molecular and chromosome imaging have attracted extensive research interest. Chromosomal disorder is a powerful indicator in the diagnosis of cancers (e.g., leukemia, skin and breast cancers) and other genetic diseases. Although identification of chromosomal aberrations or disorders is routinely performed in clinical laboratories to provide physicians with diagnostic results and help them decide on optimal therapeutic treatment plans for patients, it is quite difficult to obtain clear microscopic chromosome images and find analyzable chromosomes in a clinical genetics laboratory because of variation in cell culturing conditions, chromosome staining, and microscope illumination. Identification and classification of chromosomes is a tedious and time-consuming task, which can also introduce inter-reader variation and affect diagnostic accuracy. Automatic identification and classification of chromosomes in noisy images has been a long-standing technical challenge in the development of computer-assisted metaphase finding and karyotyping systems. Therefore, automated chromosome analysis systems need to be developed for routine use in clinical laboratories to help doctors analyze genetic disorders, diagnose cancer, and predict the prognosis of patients.

2.1 Human Chromosomes

The term chromosome was coined by Waldeyer in 1888 [3]. Chromosomes, which contain a person’s deoxyribonucleic acid (DNA), are found within the cell nucleus. Each chromosome is made up of a single, extremely long DNA molecule. Tjio and Levan, using cells cultured from fetal lung tissue, demonstrated that the correct human chromosome number is 46 [2]. Chromosomes are usually described by the way they appear during cell division, or mitosis [4]. Usually, a cell of a healthy human being contains 44 autosomes and 2 sex chromosomes (XX in females, XY in males). In the clinic, the test samples or cells used for chromosome imaging and analysis are taken mostly from amniotic fluid, blood, and bone marrow. There are standard procedures to obtain metaphase chromosomes. The test samples are cultured overnight or longer with a mitotic arresting agent to collect metaphase cells. After culturing, the cells are harvested and treated with a hypotonic solution, which increases cell volume and spreads the chromosomes apart. A methanol-acetic acid fixative is then used to fix them for study. The fixed cells are dropped onto a standard glass microscope slide and allowed to dry. The slide is then subjected to a staining process that reveals distinctive, reproducible patterns of transverse bands along the chromosomes. These bands permit accurate identification of different chromosomes and recognition of a host of structural changes.

2.2 Chromosome Abnormality

Chromosome anomalies can be divided into two major categories: structural and numerical changes. Usually, a normal metaphase cell contains 46 chromosomes. A numerical change of chromosomes (see Figure 2.1) means that chromosomes are missing or redundant, as in monosomy (missing one chromosome) or trisomy (gaining an extra chromosome). Figure 2.1 (f) is an example of monosomy (only one chromosome in types #7, #10, #17, and #21). Figure 2.1 (g) is an example of trisomy (three chromosomes in types #1, #13, #16, #17, and #21).
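Once a cell has been karyotyped, numerical changes of this kind can be read directly from the number of copies of each chromosome type. The sketch below is illustrative only: the string labels, the function name, and the restriction to autosomes (a male cell normally carries one X and one Y, so a single copy of a sex chromosome is not flagged) are assumptions, and the example cell is a hypothetical one constructed to resemble the monosomies described for Figure 2.1 (f).

```python
from collections import Counter

# Autosome types 1-22, labeled as strings (an assumed labeling convention).
AUTOSOMES = [str(n) for n in range(1, 23)]

def numerical_changes(karyotype):
    """Flag autosome types whose copy number differs from the normal two."""
    counts = Counter(karyotype)
    changes = {}
    for chrom in AUTOSOMES:
        copies = counts.get(chrom, 0)
        if copies == 1:
            changes[chrom] = "monosomy"
        elif copies == 3:
            changes[chrom] = "trisomy"
        elif copies != 2:
            changes[chrom] = f"{copies} copies"
    return changes

# Hypothetical male cell: two copies of every autosome plus X and Y,
# then one copy removed from types 7, 10, 17, and 21.
cell = [chrom for chrom in AUTOSOMES for _ in range(2)] + ["X", "Y"]
for lost in ("7", "10", "17", "21"):
    cell.remove(lost)

result = numerical_changes(cell)
print(len(cell), result)
```

A cell like this contains fewer than 46 chromosomes, which places it in the same group as panels (b) and (f) of Figure 2.1.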

Figure 2.1: Analyzable metaphase chromosome images from three groups. (a)-(c): Examples of metaphase cells before karyotyping. (e)-(g): Examples of metaphase cells after karyotyping. (a), (e): The number of chromosomes is equal to 46. (b), (f): The number of chromosomes is less than 46. (c), (g): The number of chromosomes is more than 46.

A structural change means that there is a translocation between two chromosomes, in which the chromosomes can exchange genetic information. For example, Nowell and Hungerford discovered a small marker chromosome, the Philadelphia chromosome, in patients with chronic myeloid leukemia (CML) in 1960 [5]. This proved to be the first consistent chromosomal abnormality in human cancer, and it greatly stimulated interest in cancer cytogenetics. Figure 2.2 illustrates a structural change of chromosomes, a reciprocal translocation between chromosomes #8 and #21: one of the arms of chromosome #8 has been shortened, while chromosome #21 has been lengthened.

Full document contains 222 pages