How SEER for Software is Calibrated

By Dan Galorath

Recently this question was asked:

How often is the existing vendor knowledge base refreshed with industry standard data and how is this accomplished?

We gave a very public answer to this question in Crosstalk, April 2005. Because it comes up so often, I am copying the relevant portion of that article (http://www.stsc.hill.af.mil/crosstalk/2005/04/0504Fischman.html) here.

Calibrating SEER-SEM

Key components of the SEER-SEM model have been described, but we have not yet discussed how the model adapts to estimate particular development scenarios accurately, or how it is kept current as software development technologies and methodologies evolve. The answer is simple: masses of ongoing research and analysis.

The modeling team regularly combs through raw data and industry studies to determine the latest trends and their impact on project productivity. As part of this effort, Galorath maintains a software project repository of approximately 6,000 projects (and growing). About 3,500 projects containing effort and duration outcomes are stored in a unified repository that can be readily accessed for studies. These come from both defense and commercial sources and represent many development organizations, permitting calibration of the model to a wide array of potential projects. Hundreds of additional project outcomes are also available to the company, which has collected sizing and other information on thousands more projects.

Analysis involves running project data through SEER-SEM using a special calibration mode. The model is essentially run backwards to find calibration factors. These factors are evaluated across different data attributes (e.g., platform, application) to detect trends. A variety of methods are used to mitigate outlier data points and control for variation. The variance in the data set is also used to establish default parameter ranges, so that nearly all settings accommodate risk. Model settings are updated as new trends are established.
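To make this concrete, here is a minimal sketch of what "running the model backwards" can look like. SEER-SEM's actual equations are proprietary, so the power-law effort form, the constants, the attribute names, and the IQR-based outlier trim below are illustrative assumptions rather than the real model:

```python
# Sketch only: a hypothetical power-law effort model, solved backwards for a
# per-project calibration factor, then summarized robustly per attribute.
from statistics import median

def nominal_effort(size_ksloc: float, c: float = 3.0, e: float = 1.1) -> float:
    """Hypothetical nominal effort (person-months) for a given size."""
    return c * size_ksloc ** e

def calibration_factor(actual_effort: float, size_ksloc: float) -> float:
    """Run the model backwards: the multiplier that reproduces the actual."""
    return actual_effort / nominal_effort(size_ksloc)

def factors_by_attribute(projects, attribute):
    """Group per-project factors by a data attribute (e.g., platform)."""
    groups = {}
    for p in projects:
        f = calibration_factor(p["actual_effort"], p["size_ksloc"])
        groups.setdefault(p[attribute], []).append(f)
    return groups

def robust_summary(factors):
    """Median plus an IQR-based trim to mitigate outlier data points."""
    xs = sorted(factors)
    n = len(xs)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]
    iqr = q3 - q1
    kept = [x for x in xs if q1 - 1.5 * iqr <= x <= q3 + 1.5 * iqr]
    return median(kept), min(kept), max(kept)

projects = [
    {"platform": "avionics", "size_ksloc": 40.0, "actual_effort": 260.0},
    {"platform": "avionics", "size_ksloc": 25.0, "actual_effort": 170.0},
    {"platform": "business", "size_ksloc": 40.0, "actual_effort": 130.0},
]
for platform, factors in factors_by_attribute(projects, "platform").items():
    print(platform, robust_summary(factors))
```

The per-attribute medians play the role of calibration factors, and the retained min/max spread hints at how variance in the data set can inform default parameter ranges.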

Galorath also leverages its own work with findings from outside studies. For example, when examining relative language productivity, the company first uses its repository to determine empirically the impact of using different languages. However, because not all languages are well covered, it turns to outside sources that provide language descriptions, evolution trees, multidimensional comparisons, and the like. Putting all of this information together permits the company to make informed judgments about even rarely occurring languages.
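Here is a hedged sketch of that blending, assuming a simple repository format: languages with enough empirical coverage get a data-driven rating, while thinly covered languages fall back to a judgment table distilled from outside sources. The field names, the sample-size threshold, and the fallback values are all invented for illustration:

```python
# Sketch only: empirical language productivity where the data supports it,
# outside-source judgments where it does not.
from statistics import median

MIN_SAMPLES = 3  # assumed cutoff below which empirical coverage is too thin

# Hypothetical ratings distilled from outside language studies,
# expressed relative to the baseline language.
OUTSIDE_JUDGMENTS = {"Prolog": 1.2, "Forth": 0.9}

def relative_productivity(projects, baseline="C"):
    """Productivity (KSLOC per person-month) by language, relative to the
    baseline; thinly covered languages defer to outside judgments."""
    by_lang = {}
    for p in projects:
        by_lang.setdefault(p["language"], []).append(
            p["size_ksloc"] / p["effort_pm"])
    base = median(by_lang[baseline])
    rated = {}
    for lang, rates in by_lang.items():
        if len(rates) >= MIN_SAMPLES:
            rated[lang] = median(rates) / base      # empirical rating
        elif lang in OUTSIDE_JUDGMENTS:
            rated[lang] = OUTSIDE_JUDGMENTS[lang]   # outside-source fallback
    return rated

projects = [
    {"language": "C", "size_ksloc": 20, "effort_pm": 100},
    {"language": "C", "size_ksloc": 35, "effort_pm": 160},
    {"language": "C", "size_ksloc": 12, "effort_pm": 70},
    {"language": "Ada", "size_ksloc": 30, "effort_pm": 120},
    {"language": "Ada", "size_ksloc": 18, "effort_pm": 80},
    {"language": "Ada", "size_ksloc": 24, "effort_pm": 90},
    {"language": "Prolog", "size_ksloc": 8, "effort_pm": 40},
]
print(relative_productivity(projects))
```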

Cost estimation models must be able to estimate a wide array of projects. This is accomplished with a significant number of modeling instruments, most of which can be independently set by the user:

  • Sizing Measures. Software’s effective size varies according to many factors, and these factors change over time. As new languages are added to the developer’s toolbox and old ones evolve, language mappings get updated. Sizing proxies also permit entirely new metrics to be added.
  • Knowledge Bases. New platforms (or operating environments) and applications are regularly identified and added to SEER-SEM by way of its knowledge bases. A knowledge base is, in essence, a collection of parameter settings. Parameters in turn cover many facets of the development process and of a software product’s potential characteristics, so new platforms and applications usually can be defined with an appropriate collection of parameter settings (see the sketch after this list).
  • Allocations. The balance between types of activities and labor shifts according to project type. Within SEER-SEM, detailed activity, milestone, and labor allocation tables are used to establish baseline allocations, which are then further adjusted according to project-specific settings related to requirements, testing, and so forth.
  • Internal Calibrations. Several internal instruments, both linear and nonlinear, permit high-level, systematic adjustments to estimates.
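Returning to the knowledge base item above: the sketch below treats a knowledge base purely as a collection of parameter settings, each expressed as a least/likely/most range so that the setting accommodates risk, with a simple multiplicative adjustment standing in for SEER-SEM's internal (linear and nonlinear) instruments. All names and values are illustrative assumptions, not actual knowledge-base contents:

```python
# Sketch only: a "knowledge base" as a bundle of parameter ranges that
# adjust a nominal estimate up or down.
from dataclasses import dataclass

@dataclass
class ParamRange:
    least: float
    likely: float
    most: float

# A hypothetical "avionics platform" knowledge base: nothing more than
# a named collection of parameter settings.
AVIONICS_KB = {
    "specification_level":  ParamRange(1.1, 1.3, 1.5),
    "test_rigor":           ParamRange(1.2, 1.4, 1.7),
    "personnel_capability": ParamRange(0.9, 1.0, 1.2),
}

def apply_knowledge_base(base_effort: float, kb, pick: str = "likely") -> float:
    """Adjust a nominal effort by the product of the chosen parameter values."""
    adjustment = 1.0
    for rng in kb.values():
        adjustment *= getattr(rng, pick)
    return base_effort * adjustment

print(apply_knowledge_base(100.0, AVIONICS_KB, "least"))   # optimistic case
print(apply_knowledge_base(100.0, AVIONICS_KB, "likely"))  # nominal case
print(apply_knowledge_base(100.0, AVIONICS_KB, "most"))    # pessimistic case
```

Evaluating the same estimate at the least, likely, and most settings is one simple way a range-based parameterization can feed a risk picture rather than a single point estimate.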