Developing a Fitness Testing Battery


Multiple factors must be considered when choosing appropriate fitness tests. These factors need to be prioritized and may require certain compromises. Here are points to consider when devising a Fitness Battery.

Fitness Test Selection Criteria

  • Test Validity
    • Validity refers to the accuracy of an assessment, whether or not it measures what it is supposed to measure based on empirical evidence and theoretical rationales
  • Reliability
    • Reliability is the degree to which an assessment tool produces stable and consistent results.
    • Reliability covers both inter-rater and intra-rater reliability
      • consistency between testers and consistency within a single tester
  • Practicality
    • Practicality refers to affordability and portability of the equipment, as well as how easy the test is to construct, administer, score, and interpret.
  • Available Norms
    • Assessing participants' results against norms allows a more immediate evaluation
      • without norms, only before-and-after comparisons can be made
    • Norms should be derived from studies whose subjects closely match the demographics of the test participants
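
As a minimal sketch of the norms comparison described above, the following Python snippet converts a raw score into a z-score and an approximate percentile. It assumes the norm table reports a mean and standard deviation and that scores are roughly normally distributed. The function name and the example numbers are hypothetical and are not taken from any published norm.

```python
from statistics import NormalDist

def percentile_from_norm(score, norm_mean, norm_sd, higher_is_better=True):
    """Return (z_score, percentile) of a score relative to a norm group."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (score - norm_mean) / norm_sd
    if not higher_is_better:
        z = -z  # e.g. sprint times, where a lower score is the better result
    percentile = NormalDist().cdf(z) * 100
    return z, percentile

# Hypothetical values: a 55 cm vertical jump against a norm of 50 cm (SD 5 cm).
z, pct = percentile_from_norm(55, 50, 5)
print(f"z = {z:.2f}, percentile = {pct:.0f}")
```

Without a norm mean and standard deviation from a comparable population, only the raw score is available, which is why before-and-after comparison is then the only option.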

Test Validity

  • Adopt tests with good test validity
  • Finding a particular test protocol on a web site or seeing it on a YouTube video does not constitute test validity.
  • Generally speaking, field performance tests that emulate specific athletic activities will be more valid than tests that do not.
    • Performance tests should attempt to match the mechanics, speed, and duration of the movement.
  • Example A: Quadrant Jump
    • The Quadrant Jump is less valid as a measure of running agility, since its jumping movement pattern does not mimic running sports activities as well as running agility tests do.
  • Example B: Wall Squat Test
    • The Wall Squat Test assesses lower body static muscular endurance, a quality rarely required in sports, except perhaps tug of war and possibly certain wrestling or judo holds.
    • Static muscular endurance tests do not correlate well with maximum strength or power (Nunez 2016, Fleck & Kraemer 2014)
    • Other tests such as vertical jump, broad jump, sprint, and agility tests correlate much more strongly with lower body power, strength, and athleticism.
  • Example C: Cycle Ergometer

Understand Limitations of Tests

  • Every test has its limitations
    • It is the responsibility of the fitness professional to understand each test’s limitations and interpret the results appropriately with an understanding of these limitations.
  • Example A: Hydrostatic Weighing for Body Composition
    • Historically considered the criterion method (ie gold standard) for body composition despite measuring it indirectly.
    • Inaccurate with power athletes with high bone density since formulas assume a constant bone density.
      • Football players have been measured with negative percent body fat (Adams 1982)
      • Possibly inaccurate with individuals with low bone density.
    • Could overestimate body fat if more than the assumed 100 ml of gas is trapped in the gastrointestinal system.
  • Example B: Hamstring Flexibility
    • Sit & Reach
      • Easy to administer and norms are readily available
      • Potential issues
        • Relatively low validity (NSCA 2012)
        • Indirect measurement of flexibility recorded as distance rather than degrees.
        • Measures spine and hamstring flexibility together
        • Asymmetries can't be detected
      • Appears to be more valid for hamstring flexibility (Minkler & Patterson 1994)
      • Qualitative analysis of spinal flexion can be noted along with score
        • rigid, normal, or flexible spine
    • Active Lying Straight-leg Raise [ALSLR]
      • Part of the Functional Movement Screening [FMS]
      • Once an asymmetry or deficiency is detected through the FMS, more precise ROM assessments can pinpoint the insufficiency.
      • Potential issues
        • Validity and reliability not established (NSCA 2012)
        • Lack of Norms (NSCA 2012)
        • A single screen may not apply universally to all athletes (NSCA 2012)
        • Higher ALSLR scores have been correlated with poorer agility and leg power test scores (Lockie 2015)
    • Hip Flexion ROM
      • Direct measurement of flexibility
      • High validity (NSCA 2012)
      • Potential Issues
        • Mobility not assessed during activity
        • ROM Interaction between hip, knee, and ankle
        • Takes time to assess individual joints
          • Even greater time required when varying positions of adjacent joints
            • Eg: Hip Angle with bent-knee versus straight knee with ankle at 90°
        • See Joint Ranges of Motion
    • Pretest conditions should be similar for comparisons to be valid (NSCA 2012)
      • ie: Temperature and pre-activity
    • Also see Flexibility & Mobility Testing
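
To make the hydrostatic weighing limitations in Example A concrete, here is a minimal sketch using the commonly cited body density formula and the Siri two-compartment equation. That a particular lab uses these exact equations is an assumption. The function names and all input values are hypothetical.

```python
def body_density(dry_mass_kg, underwater_mass_kg, water_density_kg_per_l,
                 residual_volume_l, gi_gas_l=0.1):
    """Body density (kg/L); gi_gas_l is the assumed 100 ml of trapped gas."""
    body_volume_l = ((dry_mass_kg - underwater_mass_kg) / water_density_kg_per_l
                     - (residual_volume_l + gi_gas_l))
    return dry_mass_kg / body_volume_l

def siri_percent_fat(density):
    """Siri equation; assumes a fixed density for fat-free mass."""
    return 495.0 / density - 450.0

# A very dense athlete: any density above 1.100 kg/L yields a negative result.
print(f"{siri_percent_fat(1.105):.1f}% body fat")

# Same weighing, but the participant actually holds 0.3 L of gas, not 0.1 L.
reported = siri_percent_fat(body_density(100, 7.0, 0.9957, 1.5, gi_gas_l=0.1))
actual = siri_percent_fat(body_density(100, 7.0, 0.9957, 1.5, gi_gas_l=0.3))
print(f"reported {reported:.1f}% vs {actual:.1f}% with the true gas volume")
```

The first call shows how unusually dense tissue can push the equation below zero. The second shows that the standard gas assumption overestimates body fat when more gas is trapped than assumed.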

Understanding Implications of Assessments

  • Know how to advise participants based on results of fitness tests
  • Example: Flexibility
    • Assess results against:
      • normative data
      • minimum values required to perform activity
    • Standards
      • No universal standards for ROM (NSCA 2012)
      • No consensus on guidelines regarding asymmetries (NSCA 2012)
        • Some suggest 10%, others suggest 15%
    • Risk / Benefit
      • Specific inflexibility paired with other factors may contribute to increased risk of injury.
      • However, greater flexibility may impair performance in sports that do not require a high degree of flexibility.
      • Athletes without flexibility deficits will experience little benefit from increasing mobility (NSCA 2012)
    • Objective: Flexibility testing should identify mobility deficits (NSCA 2012)
    • A flexibility program should be customized to the individual and their specific goals
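
The asymmetry guidelines above (10% versus 15%) can be sketched as a simple check. Expressing asymmetry relative to the larger side is only one of several conventions and is an assumption here, as are the function names and the example ROM values.

```python
def asymmetry_percent(left, right):
    """Percent difference between sides, relative to the larger value."""
    larger = max(left, right)
    if larger <= 0:
        raise ValueError("measurements must be positive")
    return abs(left - right) / larger * 100

def flag_asymmetry(left, right, threshold_percent=10.0):
    """True when the side-to-side difference exceeds the chosen threshold."""
    return asymmetry_percent(left, right) > threshold_percent

# Hypothetical hip flexion ROM in degrees: left 110, right 125.
print(f"{asymmetry_percent(110, 125):.1f}% asymmetry")
print("flagged at 10%:", flag_asymmetry(110, 125, 10.0))
print("flagged at 15%:", flag_asymmetry(110, 125, 15.0))
```

The same participant is flagged under one guideline and not the other, which is why the chosen threshold should be stated alongside the result.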

Avoiding Frivolous Tests

  • Eliminate tests that do not practically distinguish good performers from poor performers.
  • Example A: Balance Tests:
    • Typical balance tests are generally unnecessary for most individuals
      • Stork stand
      • Dynamic Balance Tests
    • Balance is typically not a limiting factor in performance or function (Bompa 2015, NSCA 2012)
    • Other Fitness Tests can screen for possible balance deficiencies
      • Prescreening of the Exercise Readiness Questionnaire
        • Question 4: “Have you ever fainted or gotten dizzy and lost your balance?”
      • In-line Lunge of the Functional Movement Screening Assessment.
      • Agility running test will utilize dynamic mobile balance, most specific to athleticism.
    • Balance tests can be performed on participants with a known or suspected balance disorder or deficiency
      • Those with known vestibular disorders.
      • Those suspected of possible balance disorders
      • Elderly with compromised leg strength
      • Injured Participants
        • Orthopedic Injuries can compromise strength, decrease proprioception, and alter programming
        • Balance tests have been used as a screening tool for those suspected of suffering a concussion (Odom 2016)
  • Example B: Plank and Side Plank Tests
    • Requires participant to be in a static position that is very rarely replicated in the demands of any daily or sport-related activities (Shinkle 2012).
    • Core muscular contractions during nearly all athletic activities are typically dynamic, or last less than 1 second, far less time than what is being tested during the plank test.
    • Static core stability will likely play a greater role in core conditioning exercises, push-ups, and weight lifts than it will in any particular sports activity.
    • Consider alternative core stability tests from the Functional Movement Screening Assessment, which test core stability for brief durations.
    • Okada (2011) reported that core stability and FMS are not strong predictors of athletic performance.
      • Although training for core and functional movement are important to include in a fitness program, especially for injury prevention, they should not be the primary emphasis of any training program (Okada 2011)
  • Example C: Reaction
    • Limited value when measured as a general ability
    • Direct tests of general reaction are very rarely included in athletic fitness or health fitness batteries.
      • Eg: Ruler Drop test:
        • Participant catches ruler dropped immediately over hand between fingers and thumb
    • More valuable alternative tests
      • Reacting to the initiation of sprint or agility tests may arguably involve a more relevant reaction component
        • One test in particular, the Sherpell Reactive Agility Test, measures reaction time and accuracy of decision through the use of a high speed camera.
      • Sports specific tests involving a reaction component can be implemented after the athlete has had an opportunity to practice the sports skill.
  • Example D: Eye / Hand or Foot Coordination / General Motor Skill
    • Limited value when measured as a general ability
    • General eye / hand or foot coordination tests are generally not performed in fitness assessment batteries
      • Stick Test of Coordination (Corbin & Lindsey 1994):
        • Requires participants to juggle a wooden wand using a wooden wand held in each hand.
        • Scoring is achieved by attempting to perform half and full flips 5 times each.
        • Ratings provided based on total score.
      • SportsVision system
    • Studies examining validity of 'general' coordination to specific motor skills are lacking.
    • Potentially more valuable alternative tests
      • Sports or activity specific motor skills may be a more meaningful indicator of coordination.
        • Assess after the participant has had adequate training on the fundamentals of the sports task or activity.
      • Combine motor skill with actual running through a course, field, or court.
        • Eg: Hoff Soccer Test & Conditioning Protocol
          • Both a drill and test of cardiovascular endurance
          • Requires participants to keep control of the soccer ball as they run down the course.
      • If Developmental Coordination Disorder (DCD) is suspected
        • Screen with validated Motor Assessment Batteries (Hands 2015)
  • Example E: Peripheral Awareness
    • Limited value when measured as a general ability
    • General tests of peripheral awareness are generally not included in a fitness battery
      • SportsVision system
    • Factors affecting Peripheral Awareness
      • Different sports and different positions within a sport require varying degrees of peripheral awareness versus focus.
      • Peripheral awareness and focus also vary according to other factors
        • Arousal level (See Inverted-U and Optimal Mental States)
        • Skill Level and level of challenge (See Flow)
        • This implies that even if a general test suggests a particular level of peripheral awareness, this level of awareness will not necessarily manifest itself during the actual sporting event.
      • Peripheral awareness is a function of vision, understanding of which cues are important in the sport, the ability to manage arousal levels, and concentration and attention control skills.
    • Potentially more valuable assessment
      • Assess athletes' peripheral awareness and focus during sports practice and tournaments under varying challenges

Eliminate Redundant Tests

  • Performing more than one test that measures the same fitness component generally will not reveal any additional information.
  • Consequences of including too many tests:
    • Sessions will take more time than necessary.
      • This wastes both the participants' and the test administrators' time
        • particularly when using tests with low validity and/or measuring the same fitness component with multiple tests.
    • An extended test battery may become more of a feat of endurance, potentially affecting reliability of test scores
      • Lower performance from fatigue in later tests particularly those measuring same components
      • Possibly lower effort during early tests, as participants conserve energy in anticipation of upcoming tests
  • Avoid a shotgun approach in selecting your test battery.
    • “If one test is good, more must be better”
    • “Not sure which test is best so I’ll do them all.”
  • Adopt a ‘less is more’ mentality.
    • Identify best tests based on Fitness Test Selection Criteria (above).
    • Choose only one test for each fitness component to improve time efficiency and reliability of test results.
  • Example A: Lower Body Power
    • Choose between Vertical Jump and Broad Jump since they both test for leg power.
    • Vertical jump height and standing long jump distance have been found to be highly correlated in female (Mayhew & Salm 1990, Mayhew 1994) and male (Glencross 1966, Beckenholdt & Mathew 1983, Manning 1988, Seiler 1990) college age students.
  • Example B: Agility
    • Choose only one Running Agility Drill Test since they all test for agility
    • Stewart, Turner, and Miller (2012) found that the Illinois, L-Run, Pro-Agility, T-test, and 505 change of direction tests are all highly reliable and all measure the same physical attribute: a general athletic ability to change direction effectively rather than specific movement patterns.
    • Quadrant Jump has less validity for athletic agility since movement pattern does not mimic most sports activities as well as other agility tests.
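
One way to check for redundancy in your own data is to correlate scores from two candidate tests across the same group, as the studies cited in Example A did for the two jumps. The sketch below computes a Pearson correlation by hand. The scores are hypothetical, and the 0.8 cut-off is an illustrative assumption rather than a published standard.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical scores for the same six participants (cm).
vertical_jump = [41, 48, 52, 55, 60, 63]
broad_jump = [190, 205, 221, 226, 240, 251]

r = pearson_r(vertical_jump, broad_jump)
REDUNDANCY_CUTOFF = 0.8  # illustrative assumption, not a published standard
print(f"r = {r:.2f}; likely redundant: {r >= REDUNDANCY_CUTOFF}")
```

When two tests correlate this strongly in the population of interest, keeping both adds session time without adding much information.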

Testing Sequence

  • Recommended sequence of tests
    • Vitals
    • Body Composition / Circumferences
    • Flexibility / Mobility
    • Speed / Agility
    • Power
    • Muscular Strength
    • Muscular Endurance
    • Cardio
  • Testing sequence can be influenced by multiple factors (Hoffman 2006)
    • Number of participants
    • Number of test administrators
    • Duration of tests
    • Number of days available to test
  • Large group testing may warrant multiple test stations to be utilized concurrently
    • Allow for adequate recovery between maximum effort tests
      • Eg: Strength & Speed
      • 5 minutes recovery has been suggested to restore the phosphagen energy system (Hoffman 2006)
        • After maximal effort, creatine phosphate may take 4 minutes to replete between bouts (Tremblay 1994).
      • Perform most fatiguing test at end (Hoffman 2006)
        • Eg: Endurance and Shuttle runs
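
The recommended sequence and recovery guidance above can be sketched as a small scheduling helper. The category names follow the recommended sequence. Treating Speed / Agility, Power, and Muscular Strength as the maximum effort categories is an assumption, and the test names and durations are hypothetical.

```python
RECOMMENDED_ORDER = [
    "Vitals", "Body Composition / Circumferences", "Flexibility / Mobility",
    "Speed / Agility", "Power", "Muscular Strength", "Muscular Endurance",
    "Cardio",
]
# Assumption: categories treated as maximum effort for recovery purposes.
MAX_EFFORT = {"Speed / Agility", "Power", "Muscular Strength"}
RECOVERY_MIN = 5  # suggested recovery for the phosphagen system (Hoffman 2006)

def build_schedule(tests):
    """tests: list of (name, category, minutes). Returns [(label, minutes)]."""
    rank = {category: i for i, category in enumerate(RECOMMENDED_ORDER)}
    ordered = sorted(tests, key=lambda test: rank[test[1]])
    schedule = []
    previous_was_max = False
    for name, category, minutes in ordered:
        is_max = category in MAX_EFFORT
        if is_max and previous_was_max:
            # Insert a rest block between back-to-back maximum effort tests.
            schedule.append(("Recovery", RECOVERY_MIN))
        schedule.append((name, minutes))
        previous_was_max = is_max
    return schedule

# Hypothetical battery, entered in arbitrary order.
battery = [
    ("1.5-mile run", "Cardio", 15),
    ("1RM squat", "Muscular Strength", 10),
    ("Resting heart rate", "Vitals", 5),
    ("Vertical jump", "Power", 5),
]
for label, minutes in build_schedule(battery):
    print(f"{label}: {minutes} min")
```

With several stations running concurrently, the same ordering would apply per participant rather than per session.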

Validity over Tech

  • Avoid the temptation to choose a testing protocol primarily to impress your participants with the latest tech at the expense of validity.
  • Tech based testing protocols do not necessarily offer additional testing accuracy.
  • Example: Bioelectrical impedance versus 7-site skinfold
    • Jackson et al. (1988)
      • bioelectrical impedance
        • R = 0.71 to 0.76
        • SEE = 4.6 to 6.4%
      • 7 site skinfold
        • R = 0.88 to 0.92
        • SEE = 2.6 to 3.6%
    • Bioelectrical impedance overestimated body fat by more than 3% and 4% in men and women when body fat was lower than 15% and 25%, respectively (Sun 2005)
    • Rutherford (2011)
      • Bioelectrical impedance (hand to hand terminals) is not superior to 3 site skinfold measurements.
      • Caution should be exercised when using bioelectrical impedance (foot to foot terminals) based on the manufacturer's equations with young adults.
      • "Fitness practitioners who measure their clients’ bodyfat may be likely to pick up the most readily available BIA instrument, believing it to be accurate, and this may not be the case."
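
To show what the SEE values above mean for a single reading, the sketch below turns an SEE into an approximate error band around a body fat estimate. It assumes prediction errors are roughly normally distributed, so that about 95% of true values fall within 1.96 SEE of the estimate. The function name and the 20% reading are hypothetical.

```python
def error_band(estimate_pct, see_pct, z=1.96):
    """Approximate band around a body fat estimate, given the method's SEE.

    Assumes roughly normal prediction errors; z=1.96 covers about 95%.
    """
    margin = z * see_pct
    return max(0.0, estimate_pct - margin), estimate_pct + margin

# Upper end of each SEE range quoted above.
methods = {"bioelectrical impedance": 6.4, "7-site skinfold": 3.6}

for name, see in methods.items():
    low, high = error_band(20.0, see)  # hypothetical reading of 20% body fat
    print(f"{name}: {low:.1f}% to {high:.1f}%")
```

The wider band for bioelectrical impedance illustrates why a newer device does not by itself justify more confidence in an individual result.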


Adams J, Mottola M, Bagnall KM, McFadden KD (1982). Total body fat content in a group of professional football players. Can J Appl Sport Sci. 7(1):36-40.

Australian Institute of Sports (2013). Physiological Tests for Elite Athletes, pg 242, 243.

Bompa T, Carrera M (2015). Conditioning for Young Athletes, 263-264.

Corbin CB, Lindsey R (1994). Concepts of Fitness & Wellness, 181.

Fleck SJ, Kraemer WJ (2014). Designing Resistance Training Programs, 9.

Hands B, Licari M, Piek J (2015). A review of five tests to identify motor coordination difficulties in young adults. Res Dev Disabil. 41-42:40-51.

Hoffman J (2006). Norms for Fitness Performance, and Health, pg 98-99.

Jackson AS, Pollock ML, Graves JE, Mahar MT (1988). Reliability and validity of bioelectrical impedance in determining body composition. J Appl Physiol. 64(2):529-34.

Lockie RG, Schultz AB, Callaghan SJ, Jordan CA, Luczo TM, Jeffriess MD (2015). A preliminary investigation into the relationship between functional movement screen scores and athletic physical performance in female team sport athletes. Biol Sport. 32(1): 41–51.

Minkler S & Patterson P (1994) The validity of the modified sit-and-reach test in college-age students. Research Quarterly for Exercise and Sport. 65, 189-192.

NSCA (2012) NSCA's Guide to Tests and Assessments.

Nunez CE, (Accessed 2016). Relationships Between: Muscular Power, Muscular Strength and Muscular Endurance,

Odom MJ, Lee YM, Zuckerman SL, Apple RP, Germanos T, Solomon GS, Sills AK (2016). Balance Assessment in Sports-Related Concussion: Evaluating Test-Retest Reliability of the Equilibrate System. J Surg Orthop Adv. 25(2):93-8.

Okada T, Huxel KC, Nesser TW (2011) Relationship between core stability, functional movement, and performance. J Strength Cond Res. 25(1):252-61.

Reiman MP, Manskey RC (2009) Functional Testing in Human Performance. 194.

Rutherford, W. J.; Diemer, Gary A.; Scott, Eric D (2011). Comparison of Bioelectrical Impedance and Skinfolds with Hydrodensitometry in the Assessment of Body Composition in Healthy Young Adults. ICHPER-SD Journal of Research, 6(2):56-60.

Shinkle J, Nesser TW, Demchak TJ, McMannus DM (2012) Effect of core strength on the measure of power in the extremities. J Strength Cond Res. 26(2):373-80.

Stewart PF, Turner AN, Miller (2012). Reliability, factorial validity, and interrelationships of five commonly used change of direction speed tests. Scand J Med Sci Sports. 24(3):500-6.

Sun G, French CR, Martin GR, Younghusband B, Green RC, Xie YG, Mathews M, Barron JR, Fitzpatrick DG, Gulliver W, Zhang H (2005). Comparison of multifrequency bioelectrical impedance analysis with dual-energy X-ray absorptiometry for assessment of percentage body fat in a large, healthy population. American Journal of Clinical Nutrition. 81(1), 74-78.

Tremblay A, Simoneau JA, Bouchard C (1994). Impact of Exercise Intensity on Body Fatness and Skeletal Muscle Metabolism. Metabolism. 43(7):814-818.

Willardson JM (2004). The effectiveness of resistance exercises performed on unstable equipment. Strength and Conditioning Journal; 26 (5), 70-74.
