Outcome measures for amputees

General introduction

Outcome measures can be used for many different purposes. A predictive measure should be able to classify individuals according to a set of pre-defined categories, either concurrently or prospectively, e.g. whether an amputee will use a prosthesis successfully [1] [2]. A discriminative measure detects differences between people or groups, e.g. the different abilities of trans-tibial and trans-femoral amputees, or differences between prosthetic components, from the scores or times recorded [3]. An evaluative measure should be able to detect changes in an individual or group, usually over a period of time, including changes that follow an intervention, e.g. a therapy programme [4] or provision of a prosthetic component. Some outcome measures are designed to do only one of the above, while others may do a combination, though some of the requirements of these different types of outcome measure compete with one another [5]. Whichever purpose it is designed for, the psychometric properties of the outcome measure need to be reported to satisfy the user that it is fit for purpose in the population with which they wish to use it [6]. The psychometric properties of an outcome measure are the characteristics that express its adequacy in terms of reliability, validity and responsiveness. Another term often used is clinimetric properties. Although it developed from similar origins to psychometrics, clinimetrics has been described as the practice of assessing or describing symptoms, signs and laboratory findings by means of scales, indices and other quantitative instruments, all of which should have adequate psychometric properties [7] [8].

Considerations before choosing an outcome measure

If you are considering using an outcome measure with an amputee it is worth asking yourself the questions posed on the Outcome Measures page here in Physiopedia (Guide to Selecting Outcome Measures). At the very least you should consider these questions with your amputee patient or group in mind.

Why am I using an outcome measure?

  • Am I trying to establish a baseline measure from which I can monitor changes over time for an individual patient?
  • Am I trying to predict how my patient is going to perform? 
  • Am I trying to evaluate the impact of a treatment programme or prosthetic component on an individual or a group?
  • Am I trying to evaluate the needs of the amputee attending my service?
  • Am I trying to evaluate how my service is responding to needs of the amputee?

What am I aiming to measure?

  • Impairments of body structure and function?
  • Activity limitations?
  • Participation restrictions?
  • Quality of life?
  • Something else?

When you think you may have an outcome measure in mind you should also consider these questions.

Have the clinimetric properties of the outcome measure I am considering been measured in a population similar to mine?

  • Is the outcome measure reliable?
      1. Do I know the rate of error detected with scores?
      2. Do I know the minimum detectable change?
  • Is the outcome measure valid?
      1. Does it measure what I want it to measure?
  • Is the outcome measure responsive to change?
      1. Is there a known minimum clinically important difference?

Here are some examples of studies where the clinimetric, sometimes called psychometric, properties have been reported in an amputee population and what the results may tell you.


Reliability

Reliability is usually measured by the intraclass correlation coefficient (ICC), which is presented as a number from 0 (no consistency) to 1 (complete consistency) [9].
Intra-rater Reliability: This indicates how consistently a rater administers and scores an outcome measure.
Inter-rater Reliability: This indicates how well two raters agree in the way they administer and score an outcome measure.
Test-retest reliability: If an individual completes a self-report survey and then repeats the survey on a second occasion when no change is expected, the results should be similar.

  • Brooks et al (2002) examined the reliability of the 2 minute walk test (2MWT) [10]. Participants completed 2 successive timed walks measured by 2 different raters on 2 consecutive days. Intraclass correlation coefficients (ICC) were >0.98, showing excellent intra- and inter-rater reliability.
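As an illustration of how an ICC is obtained from repeated scores, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rating), as described by Shrout and Fleiss [9]. Which ICC form a given study used is not stated here, so the choice of ICC(2,1), the function name `icc_2_1` and the walking distances are assumptions made purely for illustration.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    `scores` holds one row per participant and one column per rater
    (or per test occasion).
    """
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of raters or occasions
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Partition the total sum of squares (two-way ANOVA without replication).
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 2MWT distances in metres: 5 participants scored by 2 raters.
distances = [[150, 152], [98, 101], [176, 174], [120, 123], [143, 141]]
print(round(icc_2_1(distances), 3))
```

Values close to 1 arise when the differences between raters are small compared with the differences between participants.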

Measurement error: This is the degree to which scores or ratings are identical irrespective of who performs or scores the test. It can be reported using the standard error of measurement (SEM) or the minimal detectable change (MDC), which is the same as the smallest detectable change (SDC) [11].

  • Deathe & Miller (2005) reported the SEM in absolute values, which was 3 sec for the L-Test [12].
  • Resnik & Borgia (2011) also reported MDC in absolute values for all the measures they studied: 2MWT (34.3m), 6 minute walk test (6MWT) (45m), timed up and go (TUG) (3.6s) and amputee mobility predictor (AMP) (3.4pts) [13].
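The SEM and MDC are related by widely used formulas: SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM, where z is 1.96 for 95% confidence. The sketch below applies them to made-up numbers. The confidence level used in the studies cited above is not stated here, so `z` is left as a parameter, and the function names and input values are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem, z=1.96):
    """Minimal detectable change; z=1.96 gives MDC95, z=1.645 gives MDC90."""
    return z * math.sqrt(2.0) * sem


# Hypothetical timed-test data: SD of 10 s across the sample, ICC of 0.96.
sem = sem_from_icc(sd=10.0, icc=0.96)   # about 2.0 s
print(round(sem, 2), round(mdc(sem), 2))
```

A change score smaller than the MDC cannot be distinguished from measurement error at the chosen confidence level.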

Internal Consistency: This reliability property is reserved for outcome measures that are designed to test only one concept. Internal consistency assesses the extent to which all items or questions in an outcome measure address the same underlying concept, e.g. in a mobility scale, all the items should deal with mobility [5].
There are two main methods used to report internal consistency. Classical test theory uses Cronbach's alpha (α) to indicate the reliability of an outcome measure as a whole, whereas item response theory uses Rasch analysis to assess internal consistency by looking at each item within the outcome measure [14].

  • The internal consistency of the ABC scale was considered excellent as measured by Cronbach's alpha (0.93) in a study by Miller et al (2003) [15].
  • Rasch analysis was used to examine all the items in the Berg Balance Scale, which confirmed that it was able to test a range of difficulty and identify four levels of ability [16].
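Cronbach's alpha can be computed from item-level scores with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The following is a minimal sketch of that formula; the function name and the questionnaire responses are invented for illustration and do not come from any of the studies cited.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                                  # number of items
    items = [[row[j] for row in responses] for j in range(k)]
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1.0 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents answering a 3-item mobility scale.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 5, 4],
    [1, 2, 1],
]
print(round(cronbach_alpha(responses), 2))
```

Alpha rises when respondents who score high on one item also tend to score high on the others, which is what would be expected if all items address the same concept.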


Validity

Content / Face validity: This is the degree to which the content of an outcome measure is an adequate reflection of the construct or concept to be measured [5]. It is usually considered and agreed by consensus of an expert group of clinicians, which can and should include patient representatives. For example, an instrument measuring activity limitation in young athletic individuals should include not only walking but also running, jumping and climbing.
Structural validity: This refers to the degree to which the scores of an outcome measure are an adequate reflection of the dimension or factor of the construct being measured [5]. It can be measured by performing factor analysis: if more than 50% of the variance in the data is explained by one factor, this confirms that the outcome measure is measuring one factor / dimension. Anything less indicates that more than one factor is being assessed. Rasch analysis may also be used to examine the unidimensionality of the outcome measure, i.e. whether it is measuring one or more factors or dimensions.

  • Wong et al (2013) reported results from factor analysis performed on the Berg Balance Scale (BBS). The results showed that 70% of the data were explained by a model related to one dimension, i.e. balance capability [16].
  • Franchignoni et al (2007) used Rasch modelling on a modified Locomotor Capabilities Index to confirm good structural validity when level 1 and 2 category responses were combined and 4 items were deleted due to either over- or under-fitting [17]. The resultant modified index is known as the LCI-5, which many clinicians now use.

Construct Validity: This is the degree to which the scores of an outcome measure are consistent with pre-defined (a priori) hypotheses that outline relationships to the scores of other instruments, or differences between groups. If more than 75% of the hypotheses are confirmed, this is an indication of good validity [18].
It can also be referred to as concurrent validity, showing the ability to distinguish between groups (e.g. older and younger lower limb amputees (LLAs)), which is often measured by testing hypotheses; or as convergent validity, showing that measures that should be related are related, which can also be measured using the intraclass correlation coefficient (ICC), with high values indicating good validity.

  • Major et al (2013) hypothesised positive relationships between the Berg Balance Scale (BBS) scores and the Activities-specific Balance Confidence (ABC) scale, the mobility scale of the Prosthetic Evaluation Questionnaire (PEQ (ms)), the Frenchay Activities Index (FAI) and the 2MWT, and a negative relationship with the L-Test score [19]. All of these hypotheses were supported.
  • To assess concurrent and convergent validity of the L-Test, Deathe & Miller (2005) asked subjects to complete walk tests (TUG, 10-Meter Walk Test and 2MWT), followed by self-report measures (ABC scale, FAI and PEQ (ms)). Concurrent validity was high (ICC = 0.86-0.97) between the L-Test data and the other walk tests and fair-to-moderate (ICC = 0.22-0.54) for the self-report measures. Higher mean times were observed for those subjects who:
  1. were older
  2. used a walking aid
  3. had to concentrate on each step they took
  4. had a vascular amputation and
  5. had a trans-femoral (TF) amputation.

The L-Test was therefore shown to discriminate between all groups as hypothesised [12].

Criterion validity: This is the degree to which the scores of an outcome measure are an adequate reflection of a ‘gold standard’. However, there are very few situations in rehabilitation where such a gold standard test exists. If no gold standard is available then it may be appropriate to test hypothetical relationships with comparator measures.
The estimation of criterion validity depends on the type of data. Intraclass correlations are used if both instruments (outcome instrument and comparator) have continuous scores (e.g. time, distance etc) and the results should preferably be above 0.70. If the outcome instrument has a continuous score but the comparator has a dichotomous score (e.g. Yes / No) then the area under the receiver operating characteristic (ROC) curve is the preferred method. Again, a criterion of 0.70 is suggested [18].

  • Gremeaux et al (2012) presented ROC curves for the 2MWT. The modified Houghton Scale was used to stratify the patients into two groups: those with no mobility problems (score of 20/20) and those who scored less than 20, indicating a functional limitation. According to the ROC analysis, cut-off values of 130m or 150m were highly associated with the existence of functional limitations [20].
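To show what a ROC analysis of a continuous score against a dichotomous comparator involves, the sketch below computes the area under the ROC curve as the proportion of pairs in which a person with a limitation walks a shorter distance than a person without one, and then the sensitivity and specificity at a chosen cut-off. The distances, the group sizes and the function names are hypothetical; they are not data from the study above.

```python
def auc_shorter_is_limited(limited, not_limited):
    """Area under the ROC curve when a SHORTER distance indicates limitation.

    Equals the probability that a randomly chosen person with a limitation
    walks less far than a randomly chosen person without one (ties count 0.5).
    """
    wins = 0.0
    for a in limited:
        for b in not_limited:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(limited) * len(not_limited))


def sensitivity_specificity(limited, not_limited, cutoff):
    """Classify 'limited' when distance < cutoff and score the classification."""
    sensitivity = sum(d < cutoff for d in limited) / len(limited)
    specificity = sum(d >= cutoff for d in not_limited) / len(not_limited)
    return sensitivity, specificity


# Hypothetical 2MWT distances in metres for the two groups.
limited = [90, 110, 125, 140]
not_limited = [120, 150, 160, 180]
print(auc_shorter_is_limited(limited, not_limited))        # 0.875
print(sensitivity_specificity(limited, not_limited, 130))  # (0.75, 0.75)
```

An area of 0.5 means the score separates the groups no better than chance, while the 0.70 criterion mentioned above would be met by this made-up example.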


Responsiveness

Internal responsiveness is the ability of a measure to change over a specified time frame. It will depend on the particular population being studied, the treatment or intervention which occurs during the time frame and the outcome measure used to determine any changes [5].
Standardised effect size is the difference between the mean baseline score and the mean follow-up score, divided by the baseline standard deviation (SD). If the variability in baseline scores is high relative to the mean change, the effect size will be small and the ability of the outcome measure to detect meaningful change will also be limited. An effect size of 0.2 (a change of approximately 1/5 of the baseline SD) is considered small, 0.5 moderate, and anything over 0.8 (a change of at least 4/5 of the baseline SD) large [21].
The paired t-test is a statistical test that can be used to detect the change in the average scores at two time points, but it is dependent on the sample size and the variability / reliability of the outcome measure used [14].
In a study by Devlin et al (2004) the effect size calculated for the change in mean scores for the Houghton Scale, from discharge to follow-up, was 0.60, indicating a moderate difference [22].

  • Findings by Brooks et al (2001) indicated that the 2MWT was “responsive to change during rehabilitation”. Significant improvements were seen in means and SDs of the distances walked between baseline and discharge and follow-up [23]. However, effect sizes were not calculated.
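The standardised effect size described above can be calculated directly from baseline and follow-up scores. The sketch below is a minimal illustration that uses the 0.2 / 0.5 / 0.8 thresholds given in the text; the function names, the label for values under 0.2 and the walking distances are assumptions made for the example.

```python
from statistics import mean, stdev


def standardised_effect_size(baseline, follow_up):
    """(mean follow-up - mean baseline) divided by the baseline SD."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def interpret(effect_size):
    """Label an effect size using the thresholds quoted in the text."""
    size = abs(effect_size)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical scores for 5 patients at baseline and at follow-up.
baseline = [10, 12, 14, 16, 18]
follow_up = [12, 14, 16, 18, 20]
es = standardised_effect_size(baseline, follow_up)
print(round(es, 2), interpret(es))
```

Because the denominator is the baseline SD, the same mean change gives a smaller effect size in a more varied sample.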

Other Considerations

Other considerations (Outcome Measures) may come into play when deciding which outcome measure to use.

Financial Considerations

  • What is the cost of this test?
  • Is a licence required? 
  • Is equipment required?

Therapist Implementation

  • Is the measure easy for a clinician to conduct?
  • Is special training required/available?
  • Are there clear standardised instructions on how to carry out and score the measure?
  • How long does it take to carry out the measure?
  • How long does it take to record results?


  • Is special equipment or are special forms required?
  • Is space sufficient for this measure to be carried out?


  • How much time does it take for the person to complete?
  • Is the task difficult?
  • Is privacy required?

Patient-Reported Outcome Measures (PROMs)

  • Is face-to-face contact required or can this measure be completed in the waiting room?
  • Does the questionnaire cover sensitive personal issues?
  • Is there a specific reading level required?
  • Is the measure available in other languages?

List of outcome measures validated for use with lower limb amputees

Before using an outcome measure remember to consider some questions

Prosthetic Rehabilitation

The following outcome measures are included in the most recent version (2014) of the British Association of Chartered Physiotherapists in Amputee Rehabilitation (BACPAR) Outcome Measures Toolbox

High-level activity amputees

The outcome measures listed above can be used with high-level activity amputees, but it should be noted that ceiling effects may be seen, i.e. the amputee will achieve maximum scores when using ordinal scales, whether observed or self-reported.
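One simple way to check for a ceiling effect in a set of scores is to calculate the proportion of people who achieve the maximum possible score. The sketch below does this for made-up data. The 15% threshold is an assumption that follows a commonly quoted quality criterion and is not stated on this page, and the function names and scores are hypothetical.

```python
def proportion_at_ceiling(scores, max_score):
    """Fraction of scores equal to the maximum possible score."""
    return sum(s == max_score for s in scores) / len(scores)


def has_ceiling_effect(scores, max_score, threshold=0.15):
    """Flag a ceiling effect when more than `threshold` of scores are at maximum.

    The default threshold of 0.15 is an assumption, not a value from this page.
    """
    return proportion_at_ceiling(scores, max_score) > threshold


# Hypothetical balance scores (maximum 56) from 10 high-activity amputees.
scores = [56, 56, 50, 48, 56, 40, 56, 52, 56, 30]
print(proportion_at_ceiling(scores, 56), has_ceiling_effect(scores, 56))
```

When many people score at the maximum, the measure cannot show further improvement or distinguish between them.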

Pre-prosthetic and / or immediately post-op period

A narrative review was undertaken by the BACPAR Outcome Measures project group, looking into the evidence for the use of outcome measures for lower limb amputees (LLAs) in the acute or pre-prosthetic phase. The review was presented in the Spring 2014 BACPAR Journal bacpar.csp.org.uk/. A search of MEDLINE, CINAHL and PsychINFO in May 2013 using such search terms as ‘Acute Care’ and ‘Outcome Measures’ with ‘Lower-limb Amputees’ OR ‘Lower-limb Amputation’ resulted in a total of 26 articles which, after screening, produced two articles that merited further reading. From these articles only the Functional Independence Measure (FIM) was identified as of potential interest and a further search was conducted adding the specific FIM title. While there is evidence that the FIM is used in the acute and / or early rehabilitation phase with LLAs and can demonstrate an improvement between admission and discharge, the evidence was weak. There was no evidence that the total FIM score was effective as a predictor tool, but there was good correlation in one study between the motor subscale and prosthetic outcome.
When the Toolbox was updated in October 2014 it was decided not to include the FIM or any other specific outcome measure in the Toolbox for this population, as the current evidence was not strong enough. The other outcome measures included in the Toolbox all had good evidence with larger sample sizes. In addition, the FIM requires training before use and is recommended as an MDT tool, so it does not fit the “easy to use” criteria.

Outcome Measures for use with non limb-wearing lower limb amputees

There is little published evidence on the specific use of outcome measures with non limb-wearing lower limb amputees.



References

  1. Condie ME, McFadyen AK, Treweek S, Whitehead L. The trans-femoral fitting predictor: a functional measure to predict prosthetic fitting in transfemoral amputees--validity and reliability. Arch Phys Med Rehabil 2011 08;92(8):1293-1297.
  2. Raya MA, Gailey RS, Gaunaurd IA, Ganyard H, Knapp-Wood J, McDonough K, et al. Amputee Mobility Predictor-Bilateral: A performance-based measure of mobility for people with bilateral lower-limb loss. J Rehabil Res Dev 2013 11;50(7):961-968.
  3. Hafner BJ, Willingham LL, Buell NC, Allyn KJ, Smith DG. Evaluation of function, performance, and preference as transfemoral amputees transition from mechanical to microprocessor control of the prosthetic knee. Archives of Physical Medicine & Rehabilitation 2007 02;88(2):207-217.
  4. Rau B, Bonvin F, de Bie R. Short-term effect of physiotherapy rehabilitation on functional performance of lower limb amputees. Prosthet Orthot Int 2007;31(3):258-270.
  5. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63(7):737-745.
  6. Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis 1985;38(1):27-36.
  7. Streiner DL. Clinimetrics vs. psychometrics: an unnecessary distinction. J Clin Epidemiol 2003 12;56(12):1142-1145.
  8. Galea M. Introducing Clinimetrics. Australian Journal of Physiotherapy 2005;51(3):139-140.
  9. Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychol Bull 1979 03;86(2):420-428.
  10. Brooks D, Hunter JP, Parsons J, Livsey E, Quirt J, Devlin M. Reliability of the two-minute walk test in individuals with transtibial amputation. Arch Phys Med Rehabil 2002 11;83(11):1562-1565.
  11. Stratford PW, Riddle DL. When Minimal Detectable Change Exceeds a Diagnostic Test-Based Threshold Change Value for an Outcome Measure: Resolving the Conflict. Phys Ther 2012 10;92(10):1338-1347.
  12. Deathe AB, Miller WC. The L test of functional mobility: measurement properties of a modified version of the timed "up & go" test designed for people with lower-limb amputations. Phys Ther 2005 07;85(7):626-635.
  13. Resnik L, Borgia M. Reliability of outcome measures for people with lower-limb amputations: distinguishing true change from statistical error. Phys Ther 2011 04;91(4):555-565.
  14. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 3rd ed. Oxford: Oxford University Press; 2003.
  15. Miller WC, Deathe AB, Speechley M. Psychometric properties of the Activities-Specific Balance Confidence Scale among individuals with a lower-limb amputation. Arch Phys Med Rehabil 2003 05;84(5):656-661.
  16. Wong CK, Chen CC, Welsh J. Preliminary Assessment of Balance With the Berg Balance Scale in Adults Who Have a Leg Amputation and Dwell in the Community: Rasch Rating Scale Analysis. Phys Ther 2013 11;93(11):1520-1529.
  17. Franchignoni F, Giordano A, Ferriero G, Muñoz S, Orlandini D, Amoresano A. Rasch analysis of the Locomotor Capabilities Index-5 in people with lower limb amputation. Prosthet Orthot Int 2007 12;31(4):394-404.
  18. Terwee CB, Bot SDM, de Boer MR, van der Windt DAWM, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007 01;60(1):34-42.
  19. Major MJ, Fatone S, Roth EJ. Validity and reliability of the berg balance scale for community-dwelling persons with lower-limb amputation. Arch Phys Med Rehabil 2013 11;94(11):2194-2202.
  20. Gremeaux V, Damak S, Troisgros O, Feki A, Laroche D, Perennou D, et al. Selecting a test for the clinical assessment of balance and walking capacity at the definitive fitting state after unilateral amputation: a comparative study. Prosthet Orthot Int 2012 12;36(4):415-422.
  21. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989 03;27(3):S178-189.
  22. Devlin M, Pauley T, Head K, Garfinkel S. Houghton Scale of prosthetic use in people with lower-extremity amputations: reliability, validity, and responsiveness to change. Arch Phys Med Rehabil 2004 08;85(8):1339-1344.
  23. Brooks D, Parsons J, Hunter JP, Devlin M, Walker J. The 2-minute walk test as a measure of functional improvement in persons with lower limb amputation. Arch Phys Med Rehabil 2001 10;82(10):1478-1483.