Validation Methods

Validation is the process of establishing the validity of an automated algorithm. Closely related topics are performance assessment and quality control, and the three must be considered together. Validation and performance assessment are of utmost importance: a carefully validated algorithm with known (and acceptable) performance can be deployed with confidence in biological studies, whereas an algorithm without such validation cannot. In the FARSIGHT project, we are interested in validation and performance assessment methods for detection, segmentation, classification, and change analysis algorithms. We are also interested in methods that are practical to use, scale well, and require minimal manpower compared to traditional methods.


'''Classical Validation Methods:''' The classical approach to validation is to compare automated results to “ground truth data” that is known in advance to be accurate. Two such cases are listed below:

'''a. Synthetic Data:''' One approach to ground truth generation is to create a man-made '''phantom image''' (or image sequence) with known parameters and perturbations. This is helpful for evaluating several aspects of image analysis algorithms in a tightly controlled manner. For example, one can make sure that the algorithm produces the correct result when zero noise or other perturbations are applied. Next, one can study performance degradation (increase in discrepancies compared to the phantom data) as increasing levels of noise and perturbations are added. This can be quantified by computing a measure of discrepancy (e.g., bias and variance) between the correct answer and the answer produced by the automated algorithm. This type of study can be readily extended to the task of comparing two or more algorithms under identical conditions, as illustrated in the sketch below.
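The following is a minimal sketch of such a phantom study. The disk phantom, the additive Gaussian noise model, the thresholding "algorithm" under test, and the area-based discrepancy measure are all hypothetical stand-ins chosen for illustration, not FARSIGHT components; the point is the pattern of sweeping noise levels and summarizing the bias and variance of the discrepancy at each level.

<pre>
import numpy as np

def make_phantom(size=128, radius=20):
    # Hypothetical phantom: a bright disk (value 1.0) on a dark background (0.0).
    y, x = np.mgrid[:size, :size]
    return ((x - size / 2) ** 2 + (y - size / 2) ** 2 <= radius ** 2).astype(float)

def add_noise(image, sigma, rng):
    # Simple perturbation model: additive Gaussian noise with standard deviation sigma.
    return image + rng.normal(0.0, sigma, image.shape)

def segment(image, threshold=0.5):
    # Stand-in for the automated algorithm under test.
    return image > threshold

def area_error(truth, result):
    # Discrepancy measure: signed error in detected object area, in pixels.
    return float(result.sum() - truth.sum())

rng = np.random.default_rng(0)
truth = make_phantom()
for sigma in (0.0, 0.1, 0.2, 0.4):
    errors = [area_error(truth, segment(add_noise(truth, sigma, rng)))
              for _ in range(50)]
    # Bias and variance of the discrepancy at this noise level.
    print("sigma=%.1f  bias=%+.1f  variance=%.1f"
          % (sigma, np.mean(errors), np.var(errors)))
</pre>

The same loop can be run for two or more candidate algorithms on identical phantom images and noise realizations, giving a direct comparison under identical conditions.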

'''b. Data from Physical Phantoms:''' At the next level of sophistication, one can place precisely fabricated man-made objects under the microscope (e.g., fluorescent microbeads, dye-filled micro-pipettes, etc.) and record images. This method is more realistic but less tightly controlled compared to the above case, and is often helpful in developing a better understanding of algorithm strengths and weaknesses.

'''c. Real Biological Data:''' A practical strategy for validating automated image analysis results for real data is to use the currently best-available “gold standard”. In this regard, the human observer continues to be the most widely accepted and versatile gold standard, since the human visual system remains unbeatable for visual tasks. In other words, image analysis systems are attempting to automate tasks that have traditionally been carried out by humans, and the human visual system, which engages a large fraction of the brain's computing capacity, is a formidable competitor. This is not to imply that the human observer is perfect; human observers have significant weaknesses as well. An example of scoring an automated result against a human-annotated gold standard is sketched below.
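As one illustration of scoring against a gold standard, the following sketch compares an automated segmentation mask to a hypothetical human-annotated mask using the Dice overlap coefficient. The toy masks and the choice of metric are assumptions made for illustration, not a prescribed FARSIGHT protocol; any agreement or discrepancy measure appropriate to the task could be substituted.

<pre>
import numpy as np

def dice(gold, auto):
    # Dice overlap between two binary masks: 1.0 = perfect agreement, 0.0 = none.
    gold = gold.astype(bool)
    auto = auto.astype(bool)
    denom = gold.sum() + auto.sum()
    return 2.0 * np.logical_and(gold, auto).sum() / denom if denom else 1.0

# Toy masks standing in for a human-annotated gold standard and an automated result.
gold = np.zeros((8, 8), dtype=bool); gold[2:6, 2:6] = True
auto = np.zeros((8, 8), dtype=bool); auto[3:7, 3:7] = True
print("Dice agreement: %.2f" % dice(gold, auto))
</pre>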
