Chapter 5: Test Methods

The accredited test lab must design and perform procedures that test a voting system against the requirements outlined in Part 1. These procedures must address:

  • Overall system capabilities;
  • Pre-voting functions;
  • Voting functions;
  • Post-voting functions;
  • System maintenance; and
  • Transportation and storage.

The specific procedures to be used must be identified in the test plan prepared by the accredited test lab (see Part 2: Chapter 5: "Test Plan (test lab)"). These procedures must not rely on manufacturer testing as a substitute for independent testing.

1 Comment

Comment by E Smith/P Terwilliger (Manufacturer)

  • 5.1. It is not clear in this section whether each electronic device that comprises the voting system is to be separately tested, or if the entire system is to be tested as a whole.
  • 5.1.1.2. "the product" is not defined.
  • 5.1.1.2-A.1. "it is recommended" has no place. The limit must be specified exactly.
  • 5.1.1.2-A.2. "it is recommended" has no place. The limit must be specified exactly.
  • 5.1.1.2-B. The "industry-recognized standards" need to be cited. This section needs to acknowledge that not all devices have telephone ports.
  • 5.1.3.2-A. "Voting system" or "voting device"?
  • 5.2.2-A. This is not possible. Earlier VVSG sections (1: 6.4.1.8-B and 3: 4.6) acknowledge that not all paths can be verified. Other sections, such as 5.2.3-B.1, also are counter to this requirement.
  • 5.2.3. Using "system" instead of "voting system".
  • 5.2.3-F.1 and 5.2.3-F.2. How is "...tests to verify..." different from "...tests to check..."?
  • 5.3.2. Far too many significant figures in the calculation results. You can't use 7 or 8 significant digits when the input is only 2 or 3.
  • 5.4.2-B. "expert" is subjective.
  • 5.4.2-C vs 5.4.2-D. Why is the experience requirement higher for the election management "expert" than for the other "experts"?
  • 5.4.2-E.a. "complete knowledge" of anything is impossible.
  • 5.4.2-E. These requirements are heavily biased towards the election field. For instance, why would the Information Security expert be required to have designed a voting system?
  • 5.4.3-B. "system model" is not defined, nor used in any other place in the VVSG.
  • 5.4.3-C. "threat model" is not defined, nor used in any other place in the VVSG.
  • 5.4.4-A, 5.4.4-B, 5.4.4-C. Sections switch from "voting system" to "voting device". Is this intentional? The use of "voting device" implies that parts of a system may pass and others may fail.
  • 5.4.6. VSTL is used, not test lab. Is this intentional under the Program?

5.1 Hardware

5.1.1 Electromagnetic compatibility (EMC) immunity

Testing of voting systems for EMC immunity will be conducted using the black-box testing approach, which "ignores the internal mechanism of a system or component and focuses solely on the outputs generated in response to selected inputs and execution conditions" (from [IEEE00]). It will be necessary to subject voting systems to a regimen of tests including most, if not all, disturbances that might be expected to impinge on the system, as recited in the requirements of Part 1.

Note: Some EMC immunity requirements have been established by Federal Regulations or for compliance with authorities having jurisdiction as a condition for offering equipment to the US market. In such cases, the requirements include affixing a label or notice stating that the equipment complies with the technical requirements; therefore, the VVSG does not suggest performing a redundant test.

1 Comment

Comment by Al Backlund (Voting System Test Laboratory)

Does this mean that the VSTLs can accept FCC class B testing performed outside the certification timeframe?

5.1.1.1 Steady-state conditions

Testing laboratories that perform conformity assessments can be expected to have readily available a 120 V power supply from an energy service provider and access to a landline telephone service provider that will enable them to simulate the environment of a typical polling place.

5.1.1.2 Conducted disturbances immunity

Immunity to conducted disturbances will be demonstrated by appropriate industry-recognized tests and criteria for the ports involved in the operation of the voting system.

Adequacy of the product is demonstrated by satisfying specific "pass criteria" as the outcome of the tests, which include not producing failures in the functions, firmware, or hardware.

The test procedure, test equipment, and test sequences will be based on benchmark tests and on observation of the voltage and current waveforms during the tests, including (if relevant) detection of a "walking wounded" condition resulting from a severe but not immediately lethal stress that would produce a hardware failure some time later.

5.1.1.2-A Power port disturbances

Testing SHALL be conducted in accordance with the power port stress testing specified in IEEE Std C62.41.2™-2002 [IEEE02a] and IEEE Std C62.45™-2002 [IEEE02b].

Applies to: Electronic device

DISCUSSION

Both the IEEE and the IEC have developed test protocols for immunity of equipment power ports. In the case of a voting system intended for application in the United States, test equipment tailored to perform tests according to these two IEEE standards is readily available in test laboratories, thus facilitating the process of compliance testing.

Source: New requirement

1 Comment

Comment by E Smith/B Pevzner (Manufacturer)

There are two IEEE specs listed. Does this mean that testing to the requirements of either of them is satisfactory?

5.1.1.2-A.1 Combination wave

Testing SHALL be conducted in accordance with the power port stress of "Category B" to be applied by a Combination Waveform generator, in the powered mode, between line and neutral as well as between line and equipment grounding conductor.

Applies to: Electronic device

DISCUSSION

To satisfy this requirement, it is recommended that voting systems be capable of withstanding a 1.2/50 – 8/20 Combination Wave of 6 kV open-circuit voltage, 3 kA short-circuit current, with the following application points:

  1. Three surges, positive polarity at the positive peak of the line voltage;
  2. Three surges, negative polarity at the negative peak of the line voltage, line to neutral;
  3. Three surges, positive polarity at the positive peak of the line voltage, line to equipment grounding conductor; and
  4. Three surges, negative polarity at the negative peak of the line voltage, line to equipment grounding conductor.

The requirement of three successive pulses is based on the need to monitor any possible change in the equipment response caused by the application of the surges.

Source: [IEEE02a] Table 3

5.1.1.2-A.2 Ring wave

Testing SHALL be conducted in accordance with the power port stress of "Category B" to be applied by a "Ring Wave" generator, in the powered mode, between line and neutral as well as between line and equipment grounding conductor and neutral to equipment grounding conductor, at the levels shown below.

Applies to: Electronic device

DISCUSSION

Two different levels are recommended:

  1. 6 kV open-circuit voltage per Table 2 of [IEEE02a], applied as follows:
    1. Three surges, positive polarity at the positive peak of the line voltage, line to neutral;
    2. Three surges, negative polarity at the negative peak of the line voltage, line to neutral;
    3. Three surges, positive polarity at the positive peak of the line voltage, line to equipment grounding conductor; and
    4. Three surges, negative polarity at the negative peak of the line voltage, line to equipment grounding conductor.

  2. 3 kV open circuit voltage, per Table 5 of [IEEE02a], applied as follows:
    1. Three surges, positive polarity at the positive peak of the line voltage, neutral to equipment grounding conductor; and
    2. Three surges, negative polarity at the negative peak of the line voltage, neutral to equipment grounding conductor.

Source: [IEEE02a] Table 2 and Table 5

5.1.1.2-A.3 Electrical fast transient burst

Testing SHALL be conducted in accordance with the recommendations of IEEE Std C62.41.2™-2002 [IEEE02a] and IEEE Std C62.45™-2002 [IEEE02b].

Applies to: Electronic device

DISCUSSION

Unlike the preceding two tests that are deemed to represent possibly destructive surges, the Electrical Fast Transient (EFT) Burst has been developed to demonstrate equipment immunity to non-destructive but highly disruptive events. Repetitive bursts of unidirectional 5/50 ns pulses lasting 15 ms and with 300 ms separation are coupled into terminals of the voting system by coupling capacitors for the power port and by the coupling clamp for the telephone connection cables.

Source: [IEEE02a] Table 6, [ISO04b]

5.1.1.2-A.4 Sags and swells

Testing SHALL be conducted by applying gradual steps of overvoltage across the line and neutral terminals of the voting system unit.

Applies to: Electronic device

DISCUSSION

Testing for sag immunity within the context of EMC is not necessary in view of Requirement Part 1: 6.3.4.3-A.4 that the voting system be provided with a two-hour back-up capability (to be verified by inspection). Testing for swells and permanent overvoltage conditions is necessary to ensure immunity to swells (no loss of data) and to permanent overvoltages (no overheating or operation of a protective fuse).

A) Short-duration Swells

As indicated by the ITI Curve [ITIC00], it is necessary to ensure that voting systems not be disturbed by a temporary overvoltage of 120 % of normal line voltage lasting from 3 ms to 0.5 s. (Shorter durations fall within the definition of "surge.")

B) Permanent Overvoltage

As indicated by the ITI Curve [ITIC00], it is necessary to ensure that voting systems not be disturbed nor overheat for a permanent overvoltage of 110 % of the nominal 120 V rating of the voting system.

Source: New requirement

1 Comment

Comment by C R Williams (None)

"... permanent overvoltage of 110% of the nominal 120 V rating ..." A 110% overvoltage could mean that the total applied voltage is 120 v plus 110% of 120 v, or (120 + 132) v for 252 v total. I hope this is not what is meant. It would be good to clarify this, perhaps just stating what the total 'permanent withstand' voltage is for a nominal 120 v supply. (I'm guessing 132 v?)

5.1.1.2-B Communications (telephone) port disturbances

Testing SHALL be conducted in accordance with the telephone port stress testing specified in industry-recognized standards developed for telecommunications in general, particularly equipment connected to landline telephone service providers.

Applies to: Electronic device

DISCUSSION

Voting systems, by being connected to the outside service provider via premises wiring, can be exposed to a variety of electromagnetic disturbances. These have been classified as emissions from adjacent equipment, lightning-induced, power-fault induced, power contact, Electrical Fast Transient (EFT), and steady-state induced voltage.

Source: New requirement

5.1.1.2-B.1 Emissions from other connected equipment

Testing SHALL be conducted in accordance with the emissions limits stipulated for other equipment of the voting system connected to the premises wiring of the polling place.

Applies to: Electronic device

DISCUSSION

Emission limits for the power port of voting systems are discussed in Requirement Part 1: 6.3.4.2-B.1 with reference to numerical values stipulated in [Telcordia06]. EMC of a complete voting system installed in a polling facility thus implies that individual components of voting systems must demonstrate immunity against disturbances at a level equal to the limits stipulated for emissions of adjacent pieces of equipment.

Source: [Telcordia06] subclause 3.2.3

5.1.1.2-B.2 Lightning-induced disturbances

Testing SHALL be conducted in accordance with the requirements of Telcordia GR-1089 [Telcordia06] for simulation of lightning.

Applies to: Electronic device

DISCUSSION

Telcordia GR-1089 [Telcordia06] lists two types of tests (the First-Level Lightning Surge Test and the Second-Level Lightning Surge Test), as follows:

A) First-Level Lightning Surge Test

The particular voting system piece of equipment under test (generally referred to as "EUT") is placed in a complete operating system performing its intended functions, while monitoring proper operation, with checks performed before and after the surge sequence. Manual intervention or power cycling is not permitted before verifying proper operation of the voting system.

B) Second-Level Lightning Surge Test

Second-level lightning surge test is performed as a fire hazard indicator with cheesecloth applied to the particular EUT.

This second-level test, which can be destructive, may be performed with the EUT operating at a sub-assembly level equivalent to the standard system configuration, by providing dummy loads or associated equipment equivalent to what would be found in the complete voting system, as assembled in the polling place.

Source: [Telcordia06] subclauses 4.6.7 and 4.6.8

5.1.1.2-B.3 Power faults-induced disturbances

Testing SHALL be conducted in accordance with the requirements of Telcordia GR-1089 [Telcordia06] for simulation of power-fault-induced events.

Applies to: Electronic device

DISCUSSION

Tests that can be used to assess the immunity of voting systems to power fault-induced disturbances are described in detail in [Telcordia06] for several scenarios and types of equipment, each involving a specific configuration of the test generator, test circuit, and connection of the equipment.

Source: [Telcordia06] subclause 4.6

5.1.1.2-B.4 Power contact disturbances

Testing SHALL be conducted in accordance with the requirements of Telcordia GR-1089 [Telcordia06] for simulation of power-contact events.

Applies to: Electronic device

DISCUSSION

Tests for power contact (sometimes called "power cross") immunity of voting systems are described in detail in [Telcordia06] for several scenarios and types of equipment, each involving a specific configuration of the test generator, test circuit, and connection of the equipment.

Source: [Telcordia06] subclause 4.6

5.1.1.2-B.5 Electrical Fast Transient (EFT)

Testing SHALL be conducted in accordance with the requirements of Telcordia GR-1089 [Telcordia06] for application of the EFT Burst.

Applies to: Electronic device

DISCUSSION

Telcordia GR-1089 [Telcordia06] calls for performing EFT tests but refers to [ISO04b] for details of the procedure. While EFT generators, per the IEC standard [ISO04b], offer the possibility of injecting the EFT burst into a power port by means of coupling capacitors, the other method described by the IEC standard, the so-called "capacitive coupling clamp," would be the recommended method for coupling the burst into leads connected to the telephone port of the voting system under test. However, because the leads (subscriber wiring premises) vary from polling place to polling place, a more repeatable test is direct injection at the telephone port via the coupling capacitors.

Source: [ISO04b] clause 6

5.1.1.2-B.6 Steady-state induced voltage

Testing SHALL be conducted in accordance with the requirements of Telcordia GR-1089 [Telcordia06] for simulation of steady-state induced voltages.

Applies to: Electronic device

DISCUSSION

Telcordia GR-1089 [Telcordia06] describes two categories of tests, depending on the length of loops, the criterion being a loop length of 20 kft (sic). For metric system units, that criterion may be considered to be 6 km, a distance that can be exceeded for some low-density rural or suburban locations of a polling place. Therefore, the test circuit to be used should be the one applying the highest level of induced voltage.

Source: [Telcordia06] sub-clause 5.2

5.1.1.2-C Interaction between power port and telephone port

Inherent immunity against data corruption and hardware damage caused by interaction between the power port and the telephone port SHALL be demonstrated by applying a 0.5 µs – 100 kHz Ring wave between the power port and the telephone port.

Applies to: Electronic device

DISCUSSION

Although the IEEE is in the process of developing a standard (IEEE PC62.50) to address the interaction between the power port and the communications port, no standard has been promulgated at this date. However, published papers in peer-reviewed literature [Key94] suggest that a representative surge can be the Ring Wave of [IEEE02a] applied between the equipment grounding conductor terminal of the voting system component under test and each of the tip and ring terminals of the voting system components intended to be connected to the telephone network.

Inherent immunity of the voting system might have been achieved by the manufacturer, as suggested in PC62.50, by providing a surge-protective device between these terminals that will act as a temporary bond during the surge, a function which can be verified by monitoring the voltage between the terminals when the surge is applied.

The IEEE project is IEEE PC62.50 "Draft Standard for Performance Criteria and Test Methods for Plug-in, Portable, Multiservice (Multiport) Surge Protective Devices for Equipment Connected to a 120/240 V Single Phase Power Service and Metallic Conductive Communication Line(s)." This is an unapproved standard, with estimated approval date 2008.

Source: New requirement

5.1.1.3 Radiated disturbances immunity

5.1.1.3-A Electromagnetic field immunity (80 MHz to 6.0 GHz)

Testing SHALL be conducted according to procedures in CISPR 24 [ANSI97], and either IEC 61000-4-3 [ISO06a] or IEC 61000-4-21:2003 [ISO06d].

Applies to: Electronic device

DISCUSSION

IEC 61000-4-3 [ISO06a] specifies using an absorber-lined shielded room (fully or semi-anechoic chamber) to expose the device-under-test. An alternative is the immunity testing procedure of IEC 61000-4-21 [ISO06d], performed in a reverberating shielded room (radio-frequency reverberation chamber).

Source: [ANSI97], [ISO06a], [ISO06d]

5.1.1.3-B Electromagnetic field immunity (150 kHz to 80 MHz)

Testing for electromagnetic fields below 80 MHz SHALL be conducted according to procedures defined in IEC 61000-4-6 [ISO06b].

Applies to: Electronic device

Source: [FCC07], [ISO06b]

5.1.1.3-C Electrostatic discharge immunity

Testing SHALL be conducted in accordance with the recommendations of ANSI Std C63.16 [ANSI93], applying an air discharge or a contact discharge according to the nature of the enclosure of the voting system.

Applies to: Electronic device

DISCUSSION

Electrostatic discharges, simulated by a portable ESD simulator, involve an air discharge that can upset the logic operations of the circuits, depending on their status. In the case of a conducting enclosure, the resulting discharge current flowing in the enclosure can couple with the circuits and also upset the logic operations. Therefore, it is necessary to apply a sufficient number of discharges to significantly increase the probability that the circuits will be exposed to the interference at the time of the most critical transition of the logic. This condition can be satisfied by using a simulator with repetitive discharge capability while a test operator interacts with the voting terminal, mimicking the actions of a voter or initiating a data transfer from the terminal to the local tabulator.

Source: [ANSI93], [ISO01]

5.1.2 Electromagnetic compatibility (EMC) emissions limits

Testing of voting systems for EMC emission limits will be conducted using the black box testing approach, which "ignores the internal mechanism of a system or component and focuses solely on the outputs generated in response to selected inputs and execution conditions" [IEEE00].

It will be necessary to subject voting systems to a regimen of tests to demonstrate compliance with emission limits. The tests should include most, if not all, disturbances that might be expected to be emitted from the implementation under test, unless compliance with mandatory limits such as FCC regulations is explicitly stated for the implementation under test.

 

5.1.2.1 Conducted emissions limits

5.1.2.1.1 Power port – low/high frequency ranges

As discussed in Part 1: 6.3.5 "Electromagnetic Compatibility (EMC) emission limits", the low-frequency harmonic emissions of a voting station, relative to the current drawn by other loads in the polling place, will result in a negligible percentage of harmonics at the point of common connection (see [IEEE92]). Thus, no test is required to assess the harmonic emission of a voting station.

High-frequency emission limits have been established by Federal Regulations [FCC07] as a condition for offering equipment to the US market. In such cases, the requirements include affixing a label or notice stating that the equipment complies with the stipulated limits. Therefore, the VVSG does not suggest performing a redundant test.

 

5.1.2.1.2 Communications (Telephone) port

5.1.2.1-A Communications port emissions

Unintended conducted emissions from a voting system telephone port SHALL be tested on its analog voice band leads against both the metallic and the longitudinal voltage limits.

Applies to: Voting system

DISCUSSION

Telcordia GR-1089 [Telcordia06] stipulates limits for both the common mode (longitudinal) and differential mode (metallic) over a frequency range defined by maximum voltage and terminating impedances.

Source: [Telcordia06] subclause 3.2.3

5.1.2.2 Radiated emissions

5.1.2.2-A Radiated emission limits

Compliance with emission limits SHALL be documented on the hardware in accordance with the stipulations of FCC Part 15, Class B [FCC07].

Applies to: Voting system

Source: [FCC07]

5.1.3 Other (non-EMC) industry-mandated requirements

5.1.3.1 Dielectric stresses

5.1.3.1-A Dielectric withstand

Testing SHALL be conducted in accordance with the stipulations of industry-consensus telephone requirements of Telcordia GR-1089 [Telcordia06].

Applies to: Voting system

Source: [Telcordia06] Section 4.9.5

5.1.3.2 Leakage via grounding port

5.1.3.2-A Leakage current via grounding port

Simple verification of an acceptable low leakage current SHALL be performed by powering the voting system under test via a listed Ground-Fault Circuit Interrupter (GFCI) and noting that no tripping of the GFCI occurs when the voting system is turned on.

Applies to: Voting system

Source: New requirement

5.1.3.3 Safety

The presence of a listing label (required by authorities having jurisdiction) referring to a safety standard, such as [UL05], makes repeating the test regimen unnecessary. Details on the safety considerations are addressed in Part 1: 3.2.8.2 "Safety".

5.1.3.4 Label of compliance

Some industry-mandated requirements require demonstration of compliance, while for others the manufacturer affixes a label of compliance, which makes repeating the tests unnecessary and not economically justifiable.

5.1.4 Non-operating environmental testing

This type of testing is designed to assess the robustness of voting systems during storage between elections and during transport between the storage facility and the polling place.

Such testing is intended to simulate exposure to physical shock and vibration associated with handling and transportation of voting systems between a jurisdiction's storage facility and polling places. The testing additionally simulates the temperature and humidity conditions that may be encountered during storage in an uncontrolled warehouse environment or precinct environment. The procedures and conditions of this testing correspond to those of MIL-STD-810D, "Environmental Test Methods and Engineering Guidelines."

5.1.4-A Tests of non-operating equipment

All voting systems SHALL be tested in accordance with the appropriate procedures of MIL-STD-810D, "Environmental Test Methods and Engineering Guidelines" [MIL83].

Applies to: Voting system

Source: [VVSG2005]

1 Comment

Comment by Brian V. Jarvis (Local Election Official)

Note that the latest revision of MIL-STD-810 is revision F (dated 1 January 2000). The most recent change notice (Notice #3) for that standard is dated 5 May 2003. Recommend that this requirement be updated to indicate the most recent revision of this standard. (This latest revision may result in impacts to the requirements in the sub-sections below 5.1.4-A.)

5.1.4-A.1 Bench handling

All voting systems SHALL be tested in accordance with MIL-STD-810D, Method 516.3, Procedure VI.

Applies to: Voting system

DISCUSSION

This test simulates stresses faced during maintenance and repair.

Source: [VVSG2005]

5.1.4-A.2 Vibration

All voting systems SHALL be tested in accordance with MIL-STD-810D, Method 514.3, Category 1 – Basic Transportation, Common Carrier.

Applies to: Voting system

DISCUSSION

This test simulates stresses faced during transport between storage locations and polling places.

Source: [VVSG2005]

5.1.4-A.3 Storage temperature

All voting systems SHALL be tested in accordance with MIL-STD-810D: Method 502.2, Procedure I – Storage and Method 501.2, Procedure I – Storage. The minimum temperature SHALL be -4 degrees F, and the maximum temperature SHALL be 140 degrees F.

Applies to: Voting system

DISCUSSION

This test simulates stresses faced during storage.

Source: [VVSG2005]

5.1.4-A.4 Storage humidity

All voting systems SHALL be tested in accordance with humidity testing specified by MIL-STD-810D: Method 507.2, Procedure II – Natural (Hot-Humid), with test conditions that simulate a storage environment.

Applies to: Voting system

DISCUSSION

This test is intended to evaluate the ability of voting equipment to survive exposure to an uncontrolled temperature and humidity environment during storage.

Source: [VVSG2005]

5.1.5 Operating environmental testing

This type of testing is designed to assess the robustness of voting systems during operation.

5.1.5-A Tests of operating equipment

All voting systems SHALL be tested in accordance with the appropriate procedures of MIL-STD-810D, "Environmental Test Methods and Engineering Guidelines" [MIL83].

Applies to: Voting system

Source: [VVSG2005]

5.1.5-A.1 Operating temperature

All voting systems SHALL be tested according to the low temperature and high temperature testing specified by MIL-STD-810D [MIL83]: Method 502.2, Procedure II – Operation, and Method 501.2, Procedure II – Operation, with test conditions that simulate system operation.

Applies to: Voting system

Source: [VVSG2005]

5.1.5-A.2 Operating humidity

All voting systems SHALL be tested according to the humidity testing specified by MIL-STD-810D: Method 507.2, Procedure II – Natural (Hot-Humid), with test conditions that simulate system operation.

Applies to: Voting system

Source: New requirement

5.2 Functional Testing

Functional testing is performed to confirm the functional capabilities of a voting system. The accredited test lab designs and performs procedures to test a voting system against the requirements outlined in Part 1. Additions or variations in testing may be appropriate depending on the system's use of specific technologies and configurations, the system capabilities, and the outcomes of previous testing.

Functional tests cover the full range of system operations. They include tests of fully integrated system components, internal and external system interfaces, usability and accessibility, and security. During this process, election management functions, ballot-counting logic, and system capacity are exercised.

The accredited test lab tests the interface of all system modules and subsystems with each other against the manufacturer's specifications. For systems that use telecommunications capabilities, components that are located at the poll site or separate vote counting site are tested for effective interface, accurate vote transmission, failure detection, and failure recovery. For voting systems that use telecommunications lines or networks that are not under the control of the manufacturer (e.g., public telephone networks), the accredited test lab tests the interface of manufacturer-supplied components with these external components for effective interface, vote transmission, failure detection, and failure recovery.

The security tests focus on the ability of the system to detect, prevent, log, and recover from a broad range of security risks. The range of risks tested is determined by the design of the system and potential exposure to risk. Regardless of system design and risk profile, all systems are tested for effective access control and physical data security. For systems that use public telecommunications networks to transmit election management data or election results (such as ballots or tabulated results), security tests are conducted to ensure that the system provides the necessary identity-proofing, confidentiality, and integrity of transmitted data. The tests determine if the system is capable of detecting, logging, preventing, and recovering from types of attacks known at the time the system is submitted for qualification. The accredited test lab may meet these testing requirements by confirming the proper implementation of proven commercial security software.

5.2.1 General guidelines

5.2.1.1 General test template

Most tests will follow this general template; a sketch of the template as a test-harness outline appears after the list. Different tests will elaborate on the general template in different ways, depending on what is being tested.

  1. Establish initial state (clean out data from previous tests, verify resident software/firmware);
  2. Program election and prepare ballots and/or ballot styles;
  3. Generate pre-election audit reports;
  4. Configure voting devices;
  5. Run system readiness tests;
  6. Generate system readiness audit reports;
  7. Precinct count only:
    1. Open poll;
    2. Run precinct count test ballots; and
    3. Close poll.
  8. Run central count test ballots (central count / absentee ballots only);
  9. Generate in-process audit reports;
  10. Generate data reports for the specified reporting contexts;
  11. Inspect ballot counters; and
  12. Inspect reports.
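
The template above can also be read as a simple test-harness outline. The following sketch is purely illustrative; the driver object and all of its method names are inventions for this example, not VVSG-defined or manufacturer interfaces.

  # Illustrative sketch only: method names are hypothetical, not VVSG-defined.
  def run_general_template(system, election_definition, test_ballots, precinct_count=True):
      """Walk a system under test through the general test template of 5.2.1.1."""
      system.establish_initial_state()                 # 1. clean data, verify resident software/firmware
      system.program_election(election_definition)     # 2. prepare ballots and/or ballot styles
      reports = [system.generate_pre_election_audit_report()]       # 3.
      system.configure_voting_devices()                # 4.
      system.run_readiness_tests()                     # 5.
      reports.append(system.generate_readiness_audit_report())      # 6.
      if precinct_count:                               # 7. precinct count only
          system.open_poll()
          system.run_test_ballots(test_ballots.get("precinct", []))
          system.close_poll()
      system.run_central_count_ballots(test_ballots.get("central", []))   # 8.
      reports.append(system.generate_in_process_audit_report())     # 9.
      reports.append(system.generate_data_reports())                # 10.
      system.inspect_ballot_counters()                 # 11.
      return reports                                   # 12. reports are then inspected by the test lab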

5.2.1.2 General pass criteria

5.2.1.2-A Applicable tests

The test lab need only consider tests that apply to the classes specified in the implementation statement, including those tests that are designated for all systems. The test verdict for all other tests SHALL be Not Applicable.

Applies to: Voting system

1 Comment

Comment by Brian V. Jarvis (Local Election Official)

Under 5.2.1.2, even though the title of the chapter is "General Pass Criteria", none of the subsections to 5.2.1.2 defines criteria for "Pass". Recommend adding a section defining the criteria for "Pass" -- unless (a) applicability of tests, (b) test assumptions, (c) missing functionality, and (d) demonstrable violations comprises an absolute finite list of conditions considered "non-pass."

5.2.1.2-B Test assumptions

If the documented assumptions for a given test are not met, the test verdict SHALL be Waived and the test SHALL NOT be executed.

Applies to: Voting system

5.2.1.2-C Missing functionality

If the test lab is unable to execute a given test because the system does not support functionality that is required per the implementation statement or is required for all systems, the test verdict SHALL be Fail.

Applies to: Voting system

5.2.1.2-D Any demonstrable violation justifies an adverse opinion

A demonstrable violation of any applicable requirement of the VVSG during the execution of any test SHALL result in a test verdict of Fail.

Applies to: Voting system

DISCUSSION

The nonconformities observed during a particular test do not necessarily relate to the purpose of that test. This requirement clarifies that a nonconformity is a nonconformity, regardless of whether it relates to the test purpose.

See Part 3: 2.5.5 "Test practices" for directions on termination, suspension, and resumption of testing following a verdict of Fail.
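
Taken together, Requirements 5.2.1.2-A through 5.2.1.2-D can be read as a simple verdict-assignment rule. The sketch below is only an illustration of that reading; the enum and function names are assumptions, not VVSG text.

  # Illustrative reading of 5.2.1.2-A through 5.2.1.2-D; names are hypothetical.
  from enum import Enum

  class Verdict(Enum):
      NOT_APPLICABLE = "Not Applicable"   # 5.2.1.2-A: test does not apply to the declared classes
      WAIVED = "Waived"                   # 5.2.1.2-B: documented test assumptions not met
      FAIL = "Fail"                       # 5.2.1.2-C/D
      PASS = "Pass"

  def assign_verdict(applies_to_declared_classes, assumptions_met,
                     required_functionality_present, violations_observed):
      if not applies_to_declared_classes:
          return Verdict.NOT_APPLICABLE   # 5.2.1.2-A
      if not assumptions_met:
          return Verdict.WAIVED           # 5.2.1.2-B: the test is not executed
      if not required_functionality_present:
          return Verdict.FAIL             # 5.2.1.2-C: required functionality is missing
      if violations_observed:
          return Verdict.FAIL             # 5.2.1.2-D: any demonstrable violation, related to the test purpose or not
      return Verdict.PASS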

5.2.2 Structural coverage (white-box testing)

This section specifies requirements for "white-box" (glass-box, clear-box) testing of voting system logic.

For voting systems that reuse components or subsystems from previously tested systems, the test lab may, per Requirement Part 2: 5.1-D, find it unnecessary to repeat instruction, branch, and interface testing on the previously tested, unmodified components. However, the test lab must fully test all new or modified components and perform what regression testing is necessary to ensure that the complete system remains compliant.

5.2.2-A Instruction and branch testing

The test lab SHALL execute tests that provide coverage of every accessible instruction and branch outcome in application logic and border logic.

Applies to: Voting system

DISCUSSION

This is not exhaustive path testing, but testing of paths sufficient to cover every instruction and every branch outcome.

Full coverage of third-party logic is not mandated because it might include a large amount of code that is never used by the voting application. Nevertheless, the relevant portions of third-party logic should be tested diligently.

There should be no inaccessible code in application logic and border logic other than defensive code (including exception handlers) that is provided to defend against the occurrence of failures and "can't happen" conditions that cannot be reproduced and should not be reproducible by a test lab.
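
The distinction between branch-outcome coverage and exhaustive path testing can be seen in a small, purely illustrative example; the function and the threshold below are inventions for this example, not voting system logic.

  # Two tests cover every instruction and both outcomes of each branch,
  # yet exercise only 2 of the 4 possible paths through the function.
  def interpret_mark(darkness, inside_target):
      vote = False
      if darkness > 0.25:       # branch 1
          vote = True
      if not inside_target:     # branch 2
          vote = False
      return vote

  assert interpret_mark(0.9, True) is True     # branch 1 taken, branch 2 not taken
  assert interpret_mark(0.1, False) is False   # branch 1 not taken, branch 2 taken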

Source: Clarification of [VSS2002]/[VVSG2005] II.6.2.1 and II.A.4.3.3

3 Comments

Comment by Brian V. Jarvis (Local Election Official)

The voting system application software should not contain "...a large amount of code that is never used by the voting application." In fact, it should not contain any code that is not used by the voting application. All code in the voting application should only exist because it satisfies a requirement. If exception handlers in the source code cannot be logically invoked, recommend that it must be determined if any of this code is "deactivated code" or whether it is "dead code." If it is "deactivated code," evidence should be made available by the manufacturer that the deactivated code is disabled for the environments where its use is not intended. Unintended activation of deactivated code due to abnormal system conditions is the same as unintended activation of activated code. A combination of analysis and testing should show that the means by which such code could be inadvertently executed are prevented, isolated, or eliminated. "Dead code" is executable code which, as a result of a design error cannot be executed or used in an operational configuration of the target computer environment and is not traceable to a system or software requirement. The "dead code" should be removed and an analysis performed to assess the effect and the need for reverification.

Comment by Gail Audette (Voting System Test Laboratory)

While this cites the VSS and VVSG this is not a clarification but a new requirement. The requirement for 100% coverage of every accessible instruction and outcome processing during unit test is not achievable in a finite amount of time. Based upon personal experience of flight software the level of scope identified in this requirement exceeds industry best practices for high reliability commercial products. We suggest reconciling this requirement with this type of product.

Comment by Al Backlund (Voting System Test Laboratory)

What is meant by "this is not exhaustive path testing"? I believe that "coverage of every accessible instruction and branch outcome" requires instrumented code and/or unit testing procedures which can be exhaustive. Would recommend that this be written so as to indicate that all commonly used instruction and branch outcomes be exercised or something that more effectively communicates the scope expected.

5.2.2-B Interface testing

The test lab SHALL execute tests that test the interfaces of all application logic and border logic modules and subsystems, and all third-party logic modules and subsystems that are in any way used by application logic or border logic.

Applies to: Voting system

Source: Clarification of [VSS2002]/[VVSG2005] II.6.3

5.2.2-C Pass criteria for structural testing

The test lab SHALL define pass criteria using the VVSG (for standard functionality) and the manufacturer-supplied system documentation (for implementation-specific functionality) to determine acceptable ranges of performance.

Applies to: Voting system

DISCUSSION

Because white-box tests are designed based on the implementation details of the voting system, there can be no canonical test suite. Pass criteria must always be determined by the test lab based on the available specifications.

Since the nature of the requirements specified by the manufacturer-supplied system documentation is unknown, conformity for implementation-specific functionality may be subject to interpretation. Nevertheless, egregious disagreements between the behavior of the system and the behavior specified by the manufacturer should lead to a defensible adverse finding.

Source: [VSS2002]/[VVSG2005] II.A.4.3.3

5.2.3 Functional coverage (black-box testing)

All voting system logic, including any embedded in COTS components, is subject to functional testing.

For voting systems that reuse components or subsystems from previously tested systems, the test lab may, per Requirement Part 2: 5.1-D, find it unnecessary to repeat functional testing on the previously tested, unmodified components. However, the test lab must fully test all new or modified components and perform what regression testing is necessary to ensure that the complete system remains compliant.

 

5.2.3-A Functional testing, VVSG requirements

The test lab SHALL execute test cases that provide coverage of every applicable, mandatory ("SHALL"), functional requirement of the VVSG.

Applies to: Voting system

DISCUSSION

Depending upon the design and intended use of the voting system, all or part of the functions listed below must be tested:

  1. Ballot preparation subsystem;
  2. Test operations performed prior to, during, and after processing of ballots, including:
    1. Logic tests to verify interpretation of ballot styles, and recognition of precincts to be processed;
    2. Accuracy tests to verify ballot reading accuracy;
    3. Status tests to verify equipment status and memory contents;
    4. Report generation to produce test output data; and
    5. Report generation to produce audit data records.
  3. Procedures applicable to equipment used in the polling place for:
    1. Opening the polls and enabling the acceptance of ballots;
    2. Maintaining a count of processed ballots;
    3. Monitoring equipment status;
    4. Verifying equipment response to operator input commands;
    5. Generating real-time audit messages;
    6. Closing the polls and disabling the acceptance of ballots;
    7. Generating election data reports;
    8. Transfer of ballot counting equipment, or a detachable memory module, to a central counting location; and
    9. Electronic transmission of election data to a central counting location.
  4. Procedures applicable to equipment used in a central counting place:
    1. Initiating the processing of a ballot deck, programmable memory device, or other applicable media for one or more precincts;
    2. Monitoring equipment status;
    3. Verifying equipment response to operator input commands;
    4. Verifying interaction with peripheral equipment, or other data processing systems;
    5. Generating real-time audit messages;
    6. Generating precinct-level election data reports;
    7. Generating summary election data reports;
    8. Transfer of a detachable memory module to other processing equipment;
    9. Electronic transmission of data to other processing equipment; and
    10. Producing output data for interrogation by external display devices.
  5. Verification that security controls have been implemented, are free of obvious errors, and are operating as described in the security documentation, including:
    1. Cryptography;
    2. Access control;
    3. Setup inspection;
    4. Software installation;
    5. Physical security;
    6. System integrity management;
    7. Communications;
    8. Audit, electronic, and paper records; and
    9. System event logging.

This requirement is derived from [VSS2002]/[VVSG2005] II.A.4.3.4, "Software Functional Test Case Design," in lieu of a canonical functional test suite. Once a complete, canonical test suite is available, the execution of that test suite will satisfy this requirement. For reproducibility, use of a canonical test suite is preferable to development of custom test suites.

In those few cases where requirements specify "fail safe" behaviors in the event of freak occurrences and failures that cannot be reproduced and should not be reproducible by a test lab, the requirement is considered covered if the test campaign concludes with no occurrences of an event to which the requirement would apply. However, if a triggering event occurs, the test lab must assess conformity to the requirement based on the behaviors observed.

Source: [VSS2002]/[VVSG2005] II.A.4.3.4

 

1 Comment

Comment by alan (General Public)

Gather previous cards used in a previous election and use these as test samples to be input in the test cycle. Gather people off the street and ask them to fill in the test cards as a sample test set; give no instructions on how to fill in the cards, since all fill-in instructions should be on the material presented. Testers, developers, and team members will fill in the cards in ways that the public will not, so get a sample from the public.

5.2.3-B Functional testing, capacity tests

The test lab SHALL execute tests to verify that the system and its constituent devices are able to operate correctly at the limits specified in the implementation statement; for example:

  1. Maximum number of ballots;
  2. Maximum number of ballot positions;
  3. Maximum number of ballot styles;
  4. Maximum number of contests;
  5. Maximum vote total (counter capacity);
  6. Maximum number of provisional, challenged, or review-required ballots;
  7. Maximum number of contest choices per contest; and
  8. Any similar limits that apply.

Applies to: Voting system

DISCUSSION

See Part 1: 2.4 "implementation statement". Every kind of limit is not applicable to every kind of device. For example, EBMs may not have a limit on the number of ballots they can handle.

Source: Generalization from [VSS2002]/[VVSG2005] II.6.2.3

 

2 Comments

Comment by Carolyn Coggins (Voting System Test Laboratory)

This requirement is a problem in the earlier standards, and 5.2.3-B and 5.2.3-B.1 still do not provide sufficient guidance to ensure consistency of testing across all VSTLs. Specific benchmarks need to be provided to the labs, manufacturers, and election officials on acceptable stress test limits for a through g. These guidelines can provide a benchmark for 5.2.3-B.1 on what is a practical test. Further identify: 1) a matrix of what limits are applicable to which voting systems; 2) whether the manufacturer may set a limit lower than the acceptable stress test limit; and 3) what/where this information must be documented (provide a reference if it is already identified).

Comment by Frank Padilla (Voting System Test Laboratory)

Who sets the limits and determines if they are accurate or cover enough?

5.2.3-B.1 Practical limit on capacity operational tests

If an implementation limit is sufficiently great that it cannot be verified through operational testing without severe expense and hardship, the test lab SHALL attest this in the test report and substitute a combination of design review, logic verification, and operational testing to a reduced limit.

Applies to: Voting system

DISCUSSION

For example, since counter capacity can easily be designed to 2^32 and beyond without straining current technology, some reasonable limit for required operational testing is needed. However, it is preferable to test the limit operationally if there is any way to accomplish it.
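
As a rough illustration of the scale involved, the short calculation below uses an assumed tabulation rate (not a VVSG figure) to show why a full 2^32 counter cannot be exercised operationally:

  # Back-of-the-envelope estimate with an assumed tabulation rate.
  counter_capacity = 2 ** 32          # ballots
  assumed_rate = 10_000               # ballots per hour (assumption for this example)
  hours = counter_capacity / assumed_rate
  print(f"{hours:,.0f} hours, about {hours / (24 * 365):,.1f} years of continuous counting")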

 

1 Comment

Comment by Cem Kaner (Academic)

This is a perfect example of a situation in which a test fixture can drive the system to a limit that is impractical to reach with end-to-end testing. We should permit the use of risk-focused tests that are not end-to-end as a more desirable alternative to testing with reduced limits. (Affiliation Note: IEEE representative to TGDC)

5.2.3-C Functional testing, stress tests

The test lab SHALL execute tests to verify that the system is able to respond gracefully to attempts to process more than the expected number of ballots per precinct, more than the expected number of precincts, higher than expected volume or ballot tabulation rate, or any similar conditions that tend to overload the system's capacity to process, store, and report data.

Applies to: Voting system

DISCUSSION

In particular, Requirement Part 1: 7.5.6-A should be verified through operational testing if the limit is practically testable.

Source: [VSS2002]/[VVSG2005] II.A.4.3.5

 

2 Comments

Comment by Gail Audette (Voting System Test Laboratory)

"Gracefully" is not a testable requirement. Please identify testable pass/fail criteria.

Comment by Frank Padilla (Voting System Test Laboratory)

"Process more than the number of" is subjective and not testable or repeatable across labs.

5.2.3-D Functional testing, volume test

The test lab SHALL conduct a volume test in conditions approximating normal use in an election. The entire system SHALL be tested, from election definition through the reporting and auditing of final results.

Applies to: Voting system

DISCUSSION

Data collected during this test contribute substantially to the evaluations of reliability, accuracy, and misfeed rate (see Part 3: 5.3 "Benchmarks").

Source: [CA06]

 

2 Comments

Comment by Frank Padilla (Voting System Test Laboratory)

Subjective. What is normal use in an election? This needs to be defined.

Comment by ACCURATE (Aaron Burstein) (Academic)

Volume testing is a vital element of future certification with respect to voting system reliability. It simulates the load that a typical machine might encounter during its peak use period and does so on many devices at once. It has become important in California, at least; but state-level volume testing can never be as instrumentally effective as volume testing performed during national certification. Flaws found during national certification can be fixed immediately and the system re-certified during the ongoing certification process instead of having to re-submit a delta change under a new certification attempt. Thus, this requirement should be adopted.

5.2.3-D.1 Volume test, vote-capture devices

For systems that include VEBDs, a minimum of 100 VEBDs SHALL be tested and a minimum of 110 ballots SHALL be cast manually on each VEBD.

Applies to: VEBD

DISCUSSION

For vote-by-phone systems, this would mean having 100 concurrent callers, not necessarily 100 separate servers to answer the calls, if one server suffices to handle many incoming calls simultaneously. Other client-server systems would be analogous.

To ensure that the correct results are known, test voters should be furnished with predefined scripts that specify the votes that they should cast.

Source: [CA06]

 

1 Comment

Comment by Al Backlund (Voting System Test Laboratory)

100 voting terminals is not a practical number of units for several reasons:

  • Physical space requirements;
  • Power requirements; and
  • Availability from manufacturer.

A possible solution is a discrete event simulation model developed by the manufacturer and verified for accuracy by the VSTL.

5.2.3-D.2 Volume test, precinct tabulator

For systems that include precinct tabulators, a minimum of 50 precinct tabulators SHALL be tested. No fewer than 10000 test ballots SHALL be used. No fewer than 400 test ballots SHALL be counted by each precinct tabulator.

Applies to: Precinct tabulator

DISCUSSION

[GPO90] 7.5 specified, "The total number of ballots to be processed by each precinct counting device during these tests SHALL be at least ten times the number of ballots expected to be counted on a single device in an election (500 to 750), but in no case less than 5,000."

It is permissible to reuse test ballots. However, all 10000 test ballots must be used at least once, and each precinct tabulator must count at least 400 (distinct) ballots. Cycling 100 ballots 4 times through a given tabulator would not suffice. See also, Requirement Part 3: 2.5.3-A (Complete system testing).

Source: [CA06]

 

2 Comments

Comment by Frank Padilla (Voting System Test Laboratory)

Is the number of test units supportable and representative of the requirements?

Comment by Al Backlund (Voting System Test Laboratory)

50 tabulators is not a practical number of units for several reasons:

  • Physical space requirements;
  • Power requirements; and
  • Availability from manufacturer.

A possible solution is a discrete event simulation model developed by the manufacturer and verified for accuracy by the VSTL.

5.2.3-D.3 Volume test, central tabulator

For systems that include central tabulators, a minimum of 2 central tabulators SHALL be tested. No fewer than 10000 test ballots SHALL be used. A minimum ballot volume of 75000 (total across all tabulators) SHALL be tested, and no fewer than 10000 test ballots SHALL be counted by each central tabulator.

Applies to: Central tabulator

DISCUSSION

[CA06] did not specify test parameters for central tabulators. The test parameters specified here are based on the smallest case provided for central count systems in Exhibit J-1 of Appendix J, Acceptance Test Guidelines for P&M Voting Systems, of [GPO90]. An alternative would be to derive test parameters from the test specified in [GPO90] 7.3.3.2 and (differently) in [VSS2002]/[VVSG2005] II.4.7.1. A test of duration 163 hours with a ballot tabulation rate of 300 / hour yields a total ballot volume of 48900—presumably, but not necessarily, on a single tabulator.

[GPO90] 7.5 specified, "The number of test ballots for each central counting device SHALL be at least thirty times the number that would be expected to be voted on a single precinct count device, but in no case less than 15,000."

The ballot volume of 75000 is the total across all tabulators; so, for example, one could test 25000 ballots on each of 3 tabulators. The test deck must contain at least 10000 ballots. A deck of 15000 ballots could be cycled 5 times to generate the required total volume. See also, Requirement Part 3: 2.5.3-A (Complete system testing).
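
The minimums in Requirements 5.2.3-D.1 through 5.2.3-D.3 can be collected into a simple plan check. The helper below is a hypothetical illustration; the function name, parameters, and example values are assumptions, not prescribed by the VVSG.

  # Hypothetical check of a volume-test plan against the 5.2.3-D.1 / D.2 / D.3 minimums.
  def check_volume_test_plan(vebds=0, ballots_per_vebd=0,
                             precinct_tabulators=0, distinct_precinct_ballots=0,
                             ballots_per_precinct_tabulator=0,
                             central_tabulators=0, distinct_central_ballots=0,
                             total_central_volume=0, ballots_per_central_tabulator=0):
      problems = []
      if vebds and (vebds < 100 or ballots_per_vebd < 110):
          problems.append("5.2.3-D.1: >= 100 VEBDs, >= 110 ballots cast on each")
      if precinct_tabulators and (precinct_tabulators < 50
                                  or distinct_precinct_ballots < 10000
                                  or ballots_per_precinct_tabulator < 400):
          problems.append("5.2.3-D.2: >= 50 tabulators, >= 10000 distinct ballots, >= 400 per tabulator")
      if central_tabulators and (central_tabulators < 2
                                 or distinct_central_ballots < 10000
                                 or total_central_volume < 75000
                                 or ballots_per_central_tabulator < 10000):
          problems.append("5.2.3-D.3: >= 2 tabulators, >= 10000 distinct ballots, "
                          ">= 75000 total volume, >= 10000 per tabulator")
      return problems    # an empty list means the stated minimums are met

  # Example from the discussion above: a 15000-ballot deck, 25000 ballots counted on each of 3 central tabulators.
  print(check_volume_test_plan(central_tabulators=3, distinct_central_ballots=15000,
                               total_central_volume=75000, ballots_per_central_tabulator=25000))   # []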

Source: [GPO90] Exhibit J-1 (Central Count)

 
5.2.3-D.4 Test imperfect marks and folds

The testing of MCOS SHALL include marks filled according to the recommended instructions to voters, imperfect marks as specified in Requirement Part 1: 7.7.5-D, and ballots with folds that do not intersect with voting targets.

Applies to: MCOS

Source: Numerous public comments and issues

 

2 Comments

Comment by alan (General Public)

  • test with dull #2 pencil;
  • test with black ink pen;
  • test with blue ink pen;
  • test where X is marked instead of circle filled in;
  • test if two circles filled in;
  • test if single line drawn through circle (all directions - | \ /);
  • test with one circle is partially filled and the second circle on same line is filled completely;
  • test if mark is erased;
  • test if mark is partially erased;
  • test if mark is filled in and X'd out and other circle is filled in;
  • test if circle is ripped, torn or punctured;
  • test if circle is missing (bad form is used); and
  • test if circle is partially printed (bad form is used).

Comment by Frank Padilla (Voting System Test Laboratory)

Requirement is subjective. How are these imperfect marks to be input?

5.2.3-E Functional testing, languages

The test lab SHALL execute tests to verify that the system is able to produce and utilize ballots in all of the languages that are claimed to be supported in the implementation statement.

Applies to: Voting system

DISCUSSION

See Part 1: 2.4 "Implementation Statement".

 
5.2.3-F Functional testing, error cases

The test lab SHALL execute tests to verify that the system is able to detect, handle, and recover from abnormal input data, operator actions, and conditions.

Applies to: Voting system

DISCUSSION

See Requirement Part 1: 6.4.1.8-A and Part 1: 6.4.1.9.

Source: [VSS2002]/[VVSG2005] II.A.4.3.4

 

1 Comment

Comment by Frank Padilla (Voting System Test Laboratory)

Subjective: "abnormal input data" is not testable or repeatable.

5.2.3-F.1 Procedural errors

The test lab SHALL execute tests to verify that the system detects and handles operator errors such as inserting control cards out of sequence or attempting to install configuration data that are not properly coded for the device.

Applies to: Voting system

Source: [GPO90] 8.8

 

1 Comment

Comment by alan (General Public)

Testing should not be constrained to "sunny day"/positive tests. Testing should include negative test cases. Test cases should validate both the positive and negative aspects of a requirement. There should be no minimum number of tests (both positive/negative).

5.2.3-F.2 Hardware failures

The test lab SHALL execute tests to check that the system is able to respond to hardware malfunctions in a manner compliant with the requirements of Part 1: 6.4.1.9 "Recovery".

Applies to: Voting system

DISCUSSION

This capability may be checked by any convenient means (e.g., power off, disconnect a cable, etc.) in any equipment associated with ballot processing.

This test pertains to "fail safe" behaviors as discussed in Requirement Part 3: 5.2.3-A. The test lab may be unable to produce a triggering event, in which case the test is passed by default.

Source: [GPO90] 8.5

 
5.2.3-F.3 Communications errors

For systems that use networking and/or telecommunications capabilities, the test lab SHALL execute tests to check that the system is able to detect, handle, and recover from interference with or loss of the communications link.

Applies to: Voting system

DISCUSSION

This test pertains to "fail safe" behaviors as discussed in Requirement Part 3: 5.2.3-A. The test lab may be unable to produce a triggering event, in which case the test is passed by default.

Source: [VSS2002]/[VVSG2005] II.6.3

 
5.2.3-G Functional testing, manufacturer functionality

The test lab SHALL execute tests that provide coverage of the full range of system functionality specified in the manufacturer's documentation, including functionality that exceeds the specific requirements of the VVSG.

Applies to: Voting system

DISCUSSION

Since the nature of the requirements specified by the manufacturer-supplied system documentation is unknown, conformity for implementation-specific functionality may be subject to interpretation. Nevertheless, egregious disagreements between the behavior of the system and the behavior specified by the manufacturer should lead to a defensible adverse finding.

Source: [VSS2002]/[VVSG2005] II.3.2.3, II.6.7

 
5.2.3-H Functional test matrix

The test lab SHALL prepare a detailed matrix of VVSG requirements, system functions, and the tests that exercise them.

Applies to: Voting system

Source: [VSS2002]/[VVSG2005] II.A.4.3.4
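
One minimal way such a matrix might be recorded is sketched below. The format, the test case identifiers, the system function labels, and the file name are illustrative assumptions, not a prescribed layout; only the VVSG requirement references are taken from these guidelines.

  # Hypothetical traceability matrix; test case IDs and file name are invented for this sketch.
  import csv

  test_matrix = {
      # VVSG requirement        (system function exercised,        test case IDs)
      "Part 1: 7.7.5-D":        ("Imperfect mark interpretation",  ["TC-MCOS-012", "TC-MCOS-013"]),
      "Part 1: 6.4.1.9":        ("Recovery from hardware failure", ["TC-ERR-004"]),
      "Part 1: 6.3.4.3-A.4":    ("Battery back-up capability",     ["TC-PWR-002"]),
  }

  with open("functional_test_matrix.csv", "w", newline="") as f:
      writer = csv.writer(f)
      writer.writerow(["VVSG requirement", "System function", "Tests"])
      for requirement, (function, tests) in test_matrix.items():
          writer.writerow([requirement, function, "; ".join(tests)])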

 
5.2.3-I Pass criteria for functional testing

Pass criteria for tests that are adopted from a canonical functional test suite are defined by that test suite. For all other tests, the test lab SHALL define pass criteria using the VVSG (for standard functionality) and the manufacturer-supplied system documentation (for implementation-specific functionality) to determine acceptable ranges of performance.

Applies to: Voting system

DISCUSSION

Since the nature of the requirements specified by the manufacturer-supplied system documentation is unknown, conformity for implementation-specific functionality may be subject to interpretation. Nevertheless, egregious disagreements between the behavior of the system and the behavior specified by the manufacturer should lead to a defensible adverse finding.

Source: [VSS2002]/[VVSG2005] II.A.4.3.4

 

5.3 Benchmarks

5.3.1 General method

Reliability, accuracy, and misfeed rate are measured using ratios, each of which is the number of some kind of event (failures, errors, or misfeeds, respectively) divided by some measure of voting volume. The test method discussed here is applicable generically to all three ratios; hence, this discussion will refer to events and volume without specifying a particular definition of either.

By keeping track of the number of events and the volume over the course of a test campaign, one can trivially calculate the observed cumulative event rate by dividing the number of events by the volume. However, the observed event rate is not necessarily a good indication of the true event rate. The true event rate describes the expected performance of the system in the field, but it cannot be observed in a test campaign of finite duration, using a finite-sized sample. Consequently, the true event rate can only be estimated using statistical methods.

In accordance with the current practice in voting system testing, the system submitted for testing is assumed to be a representative sample, so the variability of devices of the same type is out of scope.

The test method makes the simplifying assumption that events occur in a Poisson distribution, which means that the probability of an event occurring is assumed to be the same for each unit of volume processed. In reality, there are random events that satisfy this assumption but there are also nonrandom events that do not. For example, a logic error in tabulation software might be triggered every time a particular voting option is used. Consequently, a test campaign that exercised that voting option often would be more likely to indicate rejection based on reliability or accuracy than a test campaign that used different tests. However, since these VVSG require absolute correctness of tabulation logic, the only undesirable outcome is the one in which the system containing the logic error is accepted. Other evaluations specified in these VVSG, such as functional testing and logic verification, are better suited to detecting systems that produce nonrandom errors and failures. Thus, when all specified evaluations are used together, the different test methods complement each other and the limitation of this particular test method with respect to nonrandom events is not bothersome.

For simplicity, all three cases (failures, errors, and misfeeds) are modeled using a continuous distribution (Poisson) rather than a discrete distribution (Binomial). In this application, where the probability of an event occurring within a unit of volume is small, the difference in results from the discrete and continuous models is negligible.
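
Because the per-unit event probability is small, the Poisson value and the exact binomial probability differ only negligibly. The following Octave sketch makes that comparison concrete; it is illustrative only and not part of the VVSG, and the helper functions and the values of r, v, and n are assumptions.

# Minimal sketch (recent Octave): compare the binomial and Poisson
# probabilities of observing n or fewer events when the per-unit event
# probability is small.  All names and values below are illustrative.
1;  # mark this file as a script so the functions below can be defined

function p = poisson_cdf_le (n, lambda)
  # P(N <= n) for a Poisson distribution with mean lambda (log-space sum).
  k = 0:n;
  p = sum (exp (k .* log (lambda) - lambda - gammaln (k + 1)));
endfunction

function p = binomial_cdf_le (n, v, q)
  # P(N <= n) for a binomial distribution: v trials, per-trial probability q.
  k = 0:n;
  logpmf = gammaln (v + 1) - gammaln (k + 1) - gammaln (v - k + 1) ...
           + k .* log (q) + (v - k) .* log1p (-q);
  p = sum (exp (logpmf));
endfunction

r = 1e-6;    # illustrative per-unit event rate
v = 1.5e6;   # illustrative volume
n = 2;       # number of observed events
printf ("Poisson:  %.8f\n", poisson_cdf_le (n, r * v));
printf ("Binomial: %.8f\n", binomial_cdf_le (n, v, r));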

The problem is approached through classical hypothesis testing. The null hypothesis (H_0) is that the true event rate, r_t, is less than or equal to the benchmark event rate, r_b (which means that the system is conforming).


The alternative hypothesis (H_1) is that the true event rate, r_t, is greater than the benchmark event rate, r_b (which means that the system is non-conforming).


Assuming an event rate of r, the probability of observing n or fewer events for volume v is the value of the Poisson cumulative distribution function, P(n, rv) = Σ_{k=0}^{n} e^(−rv) (rv)^k / k!.


Let n_o be the number of events observed during testing and v_o be the volume produced during testing. The probability α of rejecting the null hypothesis when it is in fact true is limited to be less than 0.1. Thus, H_0 is rejected only if the probability of n_o or more events occurring given a (marginally) conforming system is less than 0.1. H_0 is rejected if 1 − P(n_o−1, r_b v_o) < 0.1, which is equivalent to P(n_o−1, r_b v_o) > 0.9. This corresponds to the 90th percentile of the distribution of the number of events that would be expected to occur in a marginally conforming system.
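
For illustration, a minimal Octave sketch of this decision rule follows; the helper poisson_cdf_le and the values of r_b, v_o, and n_o are assumptions, not part of the VVSG.

# Minimal sketch (recent Octave) of the rejection rule above:
# reject H_0 only if P(n_o - 1, r_b * v_o) > 0.9.
1;  # mark this file as a script so the function below can be defined

function p = poisson_cdf_le (n, lambda)
  # P(N <= n) for a Poisson distribution with mean lambda (log-space sum).
  k = 0:n;
  p = sum (exp (k .* log (lambda) - lambda - gammaln (k + 1)));
endfunction

rb = 1e-4;    # benchmark event rate (illustrative)
vo = 50000;   # volume produced during testing (illustrative)
no = 12;      # number of events observed during testing (illustrative)

# With zero observed events, H_0 is never rejected.
if (no > 0 && poisson_cdf_le (no - 1, rb * vo) > 0.9)
  disp ("Reject H_0: non-conformity demonstrated at 90 % confidence.");
else
  disp ("Do not reject H_0: non-conformity has not been demonstrated.");
endif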

If at the conclusion of the test campaign the null hypothesis is not rejected, this does not necessarily mean that conformity has been demonstrated. It merely means that there is insufficient evidence to demonstrate non-conformity with 90 % confidence.

Calculating what has been demonstrated with 90 % confidence, after the fact, is completely separate from the test described above, but the logic is similar. Suppose there are n_o observed events after volume v_o. Solving the equation P(n_o, r_d v_o) = 0.1 for r_d finds the "demonstrated rate" r_d such that if the true rate r_t were greater than r_d, then the probability of having n_o or fewer events would be less than 0.1. The value of r_d could be greater or less than the benchmark event rate r_b mentioned above.
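
A minimal Octave sketch of this after-the-fact calculation is given below; it uses a bracketing root finder to solve P(n_o, r_d v_o) = 0.1 for r_d, and both the helper function and the input values are assumptions.

# Minimal sketch (recent Octave): solve P(n_o, r_d * v_o) = 0.1 for the
# demonstrated event rate r_d.  Inputs are illustrative.
1;  # mark this file as a script so the function below can be defined

function p = poisson_cdf_le (n, lambda)
  # P(N <= n) for a Poisson distribution with mean lambda (log-space sum).
  k = 0:n;
  p = sum (exp (k .* log (lambda) - lambda - gammaln (k + 1)));
endfunction

no = 0;     # events observed during testing (illustrative)
vo = 600;   # volume produced during testing (illustrative)

# fzero is given the bracket [eps, 1], over which the function changes sign.
rd = fzero (@(r) poisson_cdf_le (no, r * vo) - 0.1, [eps, 1]);
printf ("demonstrated event rate r_d = %.6e\n", rd);
# With no = 0 and vo = 600 this reproduces 2.302585/600 from Part 3: 5.3.2.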

Please note that the length of testing is determined in advance by the approved test plan. Adjusting the length of testing based on the observed performance of the system in the tests already executed would bias the results and is not permitted. A Probability Ratio Sequential Test (PRST) [Wald47][Epstein55][MIL96], as was specified in previous versions of these VVSG, varies the length of testing without introducing bias, but practical difficulties result when the length of testing determined by the PRST disagrees with the length of testing that is otherwise required by the test plan.

 

9 Comments

Comment by Gail Audette (Voting System Test Laboratory)

The benchmarks are determined by the 'observed event"; however, this critical value is not defined. It is the basis for benchmarking and must be defendable. As it is currently stated this is not defendable.

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

Amendment 2 to: USACM Comment #25. Section 5.3.1 General method [incorrect] Reference [2] Musa, Software Reliability Engineering (http://members.aol.com/JohnDMusa/book.htm)

Comment by Cem Kaner (Academic)

VVSG should not estimate reliability (or acceptability in any other way) of software by calculating the number of failures (test events?) divided by the number of tests (test volume?). It is too easy to influence this estimator by including large numbers of easy-to-pass tests. .......... (Affiliation Note: IEEE representative to TGDC)

Comment by Cem Kaner (Academic)

Lab tests focused on conformance testing do not model usage patterns in the field and therefore test results based on them cannot estimate failure rates in the field. This is not a defensible method for estimating reliability. .......... It may be possible to develop operational profiles from which reliability tests could be developed but this will require extensive research that would not be part of the approval process of any particular voting system. .......... (Affiliation Note: IEEE representative to TGDC)

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #25. Section 5.3.1 General method [incorrect] USACM recommends correcting the factual errors present in this subsection, as minimally enumerated below: 1. The system submitted for testing might be a representative sample of the pool of identical hardware/software systems, but the pool of tests should not be a representative sample of the events that happen during an election. 2. There is no reason to expect software reliability, software accuracy, and hardware misfeed rate to follow the same distribution. 3. The Poisson distribution is discrete, not continuous. 4. The Poisson process typically assumes a stationary underlying exponential distribution. The idea that software reliability, software accuracy, and hardware misfeed rates follow the same underlying distribution, or that the concatenation of these three (if there are only three) distributions would be anything like exponential is remarkable in its unlikelihood. 5. The observed event rate ("events" divided by "volume" over the course of a test campaign) is a highly biased measure. a. The first problem is that a regression test suite repeats the same tests from build to build. This gives rise to the classic problem of the "pesticide paradox" [ ]. The test suite is a tiny sample of the collection of possible tests. When the suite reveals bugs, they are fixed. Ultimately, the test suite becomes a collection of tests that have one thing in common: the software has passed all of them at least once. This differs from almost every other possible test (all of the ones that have not been run). Therefore, the reliability of the software is probably vastly overestimated. b. The second problem is that the pool of tests is structured to cover a specification. It does not necessarily target vulnerabilities of the software. Nor is it designed to reflect usage frequencies in the field [ ]. 6. Determining the length of testing in advance by an approved test plan sounds scientific, but many practitioners consider this software testing malpractice. There is substantial evidence that bugs cluster. Given a failure, there is reason to do follow-up testing to study this area of the product in more detail. Rigidly adhering to a plan created in the absence of failure data is to rigidly reject the idea of follow-up testing. This underestimates the number of problems in the code. Worse, this testing method reduces the chance that defects will be found and fixed because it reduces — essentially bans — the follow-up testing that would expose those additional defects.

Comment by Cem Kaner (Academic)

VVSG should not model software failure rates with a Poisson distribution or a Poisson process, or with any other distribution or stochastic process unless that distribution or process is derived from a logically and empirically-defensible model. .......... There is no reason to think that a mixture distribution that combines hardware and software events would be a simple Poisson. It is important to recognize that we are estimating performance in the tails of the distribution. To the extent that the true underlying distribution differs from the Poisson, deviations are particularly likely in the tails, yielding overestimates or underestimates of the significance of a given number of failures in a given period of activity. The statistical tables may have validity for hardware-related failures or for misfeeds, but there is no reason to think they would be valid for software or for the mixture distribution. .......... Robert Austin [Measuring and Managing Performance in Organizations, Dorset, 1996] wrote a particularly compelling discussion of the risks associating with basing high-stakes decisions on metrics that are not tightly tied to the underlying attribute that the metric attempts to estimate. Equipment vendors have a strong interest in making their numbers look good, or at least good enough to pass testing, and therefore they have an incentive to optimize their behavior in ways that improve the numbers. They also have an incentive to challenge the ways in which the numbers (total failures, total volume) are calculated, to arrive at a result that is more favorable. .......... I assume in all that follows that the manufacturers are acting in good faith. When we tell someone that their performance will be passed or failed according to a criterion, there is nothing dishonest in optimizing efforts to meet that criterion. If anything, that is what the criterion is there to accomplish. In particular, you should read the comments that follow with the understanding that I am explicitly and intentionally assuming that the vendors will be factually honest in everything that they do and that they are primarily motivated to achieve a "pass" from the system and not particularly motivated to do so in such a way as to mislead anyone about the underlying quality of the product. .......... * The VVSG (5.3.1) correctly notes that one of the characteristics of the Poisson model is that the probability of an event occurring stays constant over each "unit of volume processed." It then notes that this is not exactly correct for software because software errors might be nonrandom, that is, they might be triggered every time the same set of conditions is tested. It then dismisses this problem by saying "Thus, when all specific evaluations are used together, the different test methods complement each other and the limitation of the particular test method with respect to nonrandom events is not bothersome." I think this is a novel conclusion. I do not understand how mixing nonrandom events with random ones (to the extent that there are random failures in software) is a good foundation for a model that assumes that all events are random. .......... (a) For most software failures, the failure itself is not random at all. Given THESE conditions, THAT failure will occur. What might be thought of as random is whether and when the particular test that includes those conditions is presented to the software. 
The probability that a given test will yield a failure thus depends on at least two factors: how many problems remain in the software and how powerful the test is with respect to the types of problems that remain. As the software goes through testing, problems are fixed, and so the number of remaining problems diminishes. Therefore the assumption that the rate parameter of the Poisson distribution is stationary is implausible. .......... (b) The power of software tests run is not related to the underlying reliability of the software. Test power is (analogous to the power of a statistical test) the ability of the test to detect an error of a certain type if it is there. Note that no test has "absolute" power—a test that is optimized to expose an off-by-one error might be a weak detector of rounding errors. Thus, a lab can achieve a low failure rate (high apparent reliability) by running relatively low-power tests and a high failure rate (low apparent reliability) by running relatively high-power tests. Regression tests lose their power as they are used repeatedly, because the errors they are optimized to detect get found and fixed. (This problem was labeled the "pesticide paradox" by Boris Beizer in Software Testing Techniques, Van Nostrand, 2nd Edition, 1990; see also Kaner, Testing Computer Software, McGraw-Hill, 1987, p. 94). The improvement in apparent reliability with repeated use of regression tests should not be expected to predict improvement of reliability in the field, because users in the field do other things with the software beyond running these particular regression tests. Varying testing, for example by changing parameter values, combining tests, or running the tests in long random sequences, probably does a better job of mitigating operational risk but under the VVSG benchmarking definition, this testing will drive down estimated reliability at the same time as it contributes to the actual improvement of reliability. .......... (c) If there are B bugs in the software and we find a bug with 100% certainty if we run a specific test (or ones sufficiently like it), the probability of detecting one of the errors boils down to the probability that an error-revealing test makes it into the test suite. That depends on the sampling strategy (any test design strategy can be seen as a sampling strategy), whose details are under the control of the test lab, with some influence by the vendor and the VVSG. It is not clear what this sampling strategy has to do with the underlying reliability of the software. The VVSG-specified rate parameter probably has more to do with this sampling strategy than with operational reality. .......... (d) A Poisson process model for failures makes several assumptions. The first that I noted is that the probability of discovery of a failure is constant over time. This is implausible because the program presumably gets more stable, and the tests (if they are the fully scripted regression tests required by VVSG) get less powerful over time. The second is that instants of time (or units of volume) (that is, tests) are independent. This is also implausible. A widely reported pattern in test data is that some modules are much more error prone than others. Presumably, this is due to the inherent difficulty of some problems, the tremendous variations in individual programmer competence, perhaps a difference in time pressure associated with completing some tasks compared to others, etc. 
A sensible testing strategy adds new tests to further investigate areas that have shown some failures. If this is done, the probability of these tests exposing problems is relatively high, but that is a conditional probability—test X2 has a high probability of exposing a problem because test X1 did expose a problem. This is precisely the opposite of the assumption of independence between X1 and X2. Of course, one can preserve apparent independence of tests by never adding new tests to more carefully study areas of the program that seem weak. However, if the objective is to check the quality of the software, this restriction (no follow-up with related tests) would be bad testing, in conflict with the objective. Another problem for the idea of independence is the problem of identicality. Do we really think that the same test, run a second or third or fourth time (regression testing) should be treated as an independent sample from the pool of possible tests? .......... (e) Another problem with the Poisson process model is that some bugs are inherently harder to detect than others. Thus, if we have B bugs, the probability of detecting each one is not the same. It is usually easier to find a bug if it depends only on one feature or one parameter of a feature. A relatively simple test will do the trick. The only risk of obscurity is the possibility that only one value of the parameter would lead to failure. Special cases do exist, but they are often (and not always) at boundaries that are either visible externally or on review of the code. VVSG requires testing at boundaries and therefore most (and not all) of the single-variable special-case bugs are probably covered. However, some bugs involve combinations of two, three, or more variables or functions. Some of those variables might be relative timing of events (race conditions) or amount of free memory when a given task is attempted or access to some other resource. These are harder to detect with simple tests. .......... (f) Even the assumption that one test can only expose one defect is empirically challengeable. Unless a test is so successfully focused on the processing of one variable by one method that multiple problems are impossible (unit testing can achieve this, but not system testing), a given test might trip first over one feature and next over another. A test that combines 10 features might yield 10 (or more!) failures. This is not a merely theoretical possibility. It is a common heuristic in system testing that testing should start with single-feature tests and progress to relatively simple multi-feature combinations and then progress to user-meaningful rich scenarios. This is based on experience: companies that do mainly the multi-function scenario testing often find their tests blocked—a failure in the first steps of the test blocks continuation to the later steps. After the first bug is fixed, the test fails again, in a way that blocks further execution, and then when used again, it fails again. In the practitioner community, there are many anecdotes of bugs that should have been easy to find being found very late in testing because the planned test that finally exposed the bug was blocked by other bugs. Thus, we have commonplace examples of tests exposing many more than one defect. .......... In sum, there is no reason to think that any of the assumptions underlying a Poisson process model apply. .......... 
The VVSG provides a table for determining critical values associated with the ratio of the number of test events to the test volume. The idea is that even if the Poisson distribution is not a perfect estimator, perhaps it is a good first approximation. I am not a professional statistician, but I do have about 8-12 semesters of probability/statistics courses, a few more on modeling, and some practical research experience. My understanding is that if two distributions are similarly shaped, using one as an approximation of the other is possible—but the relative differences are likely to grow as you go out to the tails. That is, similar distributions often differ most in their assignment of probabilities to lower-probability events. A 90% criterion value is pretty far out in the tail of the distribution. If none of the assumptions of the Poisson model apply to software testing, it is hard to believe that numbers taken from the tail of that distribution accurately predict much about the system under test. .......... Here are some other problems associated with the VVSG's estimator of software reliability: .......... (1) As far as I can tell, in its treatment of reliability estimation, the VVSG assumes that the test volume is a fixed value, not itself a random variable. This is only true if one set of tests is run once and no other tests or other events are considered. Given that there will be regression tests, this is not true. Even if we count each regression tests only once, no matter how many times it is run (but that is unfair if the same test later exposes a different bug), a competent test lab does additional testing around any bug reported fixed. That is, if a given set of test conditions exposes a failure, and the equipment vendor fixes the bug and returns a new version of the software, a competent tester will not only test the fix with the original test that exposed the bug but will create new tests to see whether the fix actually covered the underlying problem. These can expose new problems and so they must be new units of test volume. Of course, to preserve a fixed volume of testing, we could choose not to allow such testing. However, as in many cases considered above, it might be highly undesirable to allow an incorrect model to be used as an excuse to constrain the power of testing. .......... (2) The VVSG assumes that the test results obtained in conformance testing can be used as a sound statistical estimator of the population reliability (see 5.3.1). This assumption is unreasonable. The reliability of the voting equipment software, in the field, will depend on how the software is used in the field. The tests designed for conformance testing are not designed with an objective of mapping to field usage. They are designed to achieve a level of simple coverage of the code, another level of simple coverage of documented requirements, another level of coverage of boundary values of individual variables, and so on. There are sound statistical methods available for estimating the reliability of the software in the field (see, for example, Musa, Software Reliability Engineering, McGraw-Hill, 1998), but they start from development of operational profiles—profiles of ways in which people will actually use the software. The next task is estimation of relative frequency of occurrence of each profile—a usage pattern twice as likely in the field should be involved in twice as many of the reliability tests. 
From here, one generates a large pool of tests, deriving each test from one of the profiles (varying specific parameters, or sequences of operation in ways consistent with the profile). Ideally, that generation should be itself driven by a random process that reflects usage patterns. From there, failure rate over the sample of tests might well be a valid estimator of field failure rate. If it is important to have a failure rate estimator, it is important to have a number that bears a defensible relationship to the underlying parameter. .......... (3) Development of operational profiles is an expensive proposition. Some vendors (such as AT&T and Microsoft) have access to customer usage patterns and, at significant expense, can develop profiles on their own. It is not clear that voting equipment vendors have this level of access to customer usage of their own equipment. In addition, the better study might be of usage of voting equipment generally, across vendors. If the profiles are essentially the same across vendors and equipment models, the same profiles can be used with new models as they are introduced, rather than requiring a hugely burdensome (in time and money) research program for each new model. Rather than requiring voting equipment vendors to do this type of research, it might make more sense for NIST (or some other agency) to fund independent (e.g. university-based) research to develop such profiles and assess their commonality across devices. This will take some number of years. Until those profiles are developed and usable, I think it is inadvisable to predicate any decisions on estimators of software reliability. .......... (5) If we assume that the TEST CAMPAIGN includes all tests done by the independent test lab, then the campaign includes all regression tests, no matter how many times these are repeated. Suppose that a given test is repeated ten times. When we compute the TEST VOLUME of the campaign, is this 10 tests or 1? .......... (6) It is one thing to say that the lab cannot qualify a device based on the testing of a prototype. It is another thing to bar the vendor from submitting a prototype to the lab for evaluation. Evaluating prototypes gives the lab an opportunity to build expertise with the system under test, making its ultimate testing of the final version more effective. And it gives the vendor an opportunity to discover the weaknesses it is blind to, enabling it to fix problems earlier in the development cycle. It is widely believed in the software engineering community that earlier testing improves quality and reduces costs. While VVSG should not require vendors to submit early versions to the lab (there may be more cost effective ways to evaluate early versions), surely it should not ban it. If a vendor does submit an early version for testing, do those tests count as part of the test campaign? Do those failures count in the ultimate total of test events? .......... (7) Suppose two equipment manufacturers have equivalent internal processes, in terms of the quality and functionality of their software, and (for simplicity) equivalent products. One submits its software to the independent test lab a little earlier in development than the other. The first submitter goes to the lab with a few more bugs and goes through one or more rounds of regression testing. Ultimately, the same bugs are found and fixed in both systems. Thus, at the end of testing, we have two equivalently reliable systems. What is the effect on the numbers? 
If the test campaign counts each time a regression test is run as a separate test, then the first submitter is increasing the measured test volume enormously by submitting early. If the product has only a few more bugs than the product submitted by the late submitter, then even though the first submitter’s test event total will be higher, its ratio of test events to test volume will be lower. In contrast, if regression tests are not counted twice but fixed bugs are counted as test events, then the incentive will go to the vendor who waits until the last possible minute to submit product to the lab. If we want VVSG to drive this strategy as a matter of policy, VVSG should explicitly consider and state the policy and the policy choice should be publicly reviewed. Instead, the method of calculation creates an implicit policy. .......... (8) To the extent that test volume is left loosely defined, the estimated reliability will vary enormously depending on how the test lab (paid by the vendor) computes the test volume. A rational vendor would spend effort advocating for the largest possible interpretation of volume, so as to make the denominator as large as possible. .......... (9) Consider applying a high-volume test strategy to the testing of the device. High-volume strategies have been used effectively for automotive software, telephone switching software, firmware in office automation products, and undoubtedly many other contexts. I will emphasize my own work below, and other work I am personally familiar with, not because I think it is the best in the field but because I can write with authority about the underlying observations. High-volume testing is a well-funded, fashionable area of work. Examples are state-model based testing that execute long sequences of sub-tests, each involving a controlled state transitions; testing using genetic algorithms; search-based testing, in which the test sequence involves test values chosen to be different from each other in a specified way (e.g. maximally dissimilar from the previous tests in the sequence); random-input tests or random-event tests in which a random source generates data or traffic for a long period or until the system crashes; and various types of extreme value attacks (heavy load, big input, extreme combinations, corrupted files) that string many individual tests into one long, grueling sequences of harsh tests. These are often done as security tests today, but they were seen as tests of robustness twenty years ago. These are not fundamentally new ideas. The concerns I raise below in the context of the testing types that I mention applies just as well to all of these other methods. Given that preamble, consider a specific example that I know well: suppose the lab applies long-sequence randomized regression testing (LSRRT), which McGee & Kaner ("Experiments with High Volume Test Automation," Workshop on Empirical Research in Software Testing, ACM SIGSOFT Software Engineering Notes, 29(5) 1-3 2004 discussed under the label "extended random regression"). In LSRRT, you take a set of tests that a particular build of the software has passed individually and string them together in an arbitrarily long random sequence. The key advantage of LSRRT over many other tests that push a device through very long sequences of tests is that the expected results of each test are known and therefore failures can be detected in terms of unexpected responses rather than waiting until the software crashes. 
A unexpected responses might be unexpected data, but it also might be unexpected behavioral timing. Oracle Corporation used a method like this in its early qualification of its database, for example. If a test that took T1 time to complete at one point in testing took T2 (much longer or much shorter) time a bit later, system engineers investigated the cause of the difference, often finding coding errors. (Unpublished oral personal communication from Bob Miner, 1987) As McGee & Kaner reported in their short case study summary, LSRRT exposed a large number of serious problems that were not being exposed by the individual tests themselves. Similarly, I have seen serious failures exposed in a different type of long-sequence testing by a PBX manufacturer whose code had gone through thorough unit testing. Stack corruption that built up over time, memory corruption triggered by particular subsequences of events or particular combinations of data, race conditions involving unexpected busy-ness of one of the processors in a multiprocessor system--these are examples of the kinds of problems exposed by long-sequence testing that are much harder to find by testing with one distinct functional test at a time. During an election, a voting system has to run without failure for many hours. Long-sequence testing addresses the question of operation over that long period. One-functional-test-at-a-time testing does not. Should a test lab employ this style of testing? If so, how should we count the test volume? At "Mentsville" (a fictitious name, requested by the well-known equipment manufacturer whose processes McGee and Kaner studied), LSRRT was often restricted to 25 distinct tests that were repeated in a random order. Fewer than 25 wasn’t seen as diverse enough. Many more than 25 made troubleshooting a failure a nightmare. (Why? Remember that the system can pass each test on its own, so the secret of the failure lies somewhere in the sequencing. If failure occurs after 48 hours of apparently-trouble-free operation, analysis of that sequence can be very complex. Limiting the number of distinct tests in the sequence was one way to limit that complexity.) Suppose that the test lab runs a 50-test LSRRT for 12 hours, i.e. a sequence that repeats 50 regression tests in random order until testing is terminated by failure or by successful completion of a 12-hour run. Suppose that on average, each of the 50 tests runs 100 times. Is this test volume 1, 50, or 5000? If the test volume is 1, equipment vendors will have a strong incentive to argue that very little of this testing should be done, because this is a very harsh style of testing. If test volume is 5000, equipment vendors will have a strong incentive to encourage the lab to do lots of this type of testing. I submit that the decision to apply this style of testing, the amount of testing to be done, and the characteristics of the tests combined in each suite should be based on other factors than the calculation of test-events/test-volume, but that calculation will drive potentially harsh debates. In practice, I have been told by testers of regulated products that they don’t do long sequence testing specifically because the metrics are impossible to agree on. Benchmark-estimation rules should NEVER drive decisions about what style of testing would be most effective for illuminating the risks associated with a product. .......... (Affiliation Note: IEEE representative to TGDC)

Comment by Cem Kaner (Academic)

It is inappropriate to treat software regression tests as if they were a representative sample of the behavior of the system under test because the system is optimized to pass them as they are repeatedly run. The more times they are run, the less predictive power they have with respect to other tests that involve other data, other combinations of functions, or other sequences of events. .......... (Affiliation Note: IEEE representative to TGDC)

Comment by Cem Kaner (Academic)

As it applies to software, this section's terminology is ambiguous or undefined. What is a test event? What is a test volume? What is a test campaign? .......... (Affiliation Note: IEEE representative to TGDC)

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

Amendment to: USACM Comment #25. Section 5.3.1 General method [incorrect] [1] Boris Beizer, Software Testing Techniques, Second Edition, 1990

5.3.2 Critical values

For a fixed probability p and a fixed value of n, the value of the product rv satisfying P(n, rv) = p is a constant. Part 3: Table 5-1 provides the values of rv for p = 0.1 and p = 0.9 for 0 ≤ n ≤ 750.

Given n_o observed events after volume v_o, the demonstrated event rate r_d is found by solving P(n_o, r_d v_o) = 0.1 for r_d. The pertinent factor is in the second column (p = 0.1) in the row for n = n_o; dividing this factor by v_o yields r_d. For example, a volume of 600 with zero observed events demonstrates an event rate of 2.302585/600, or 3.837642×10^−3.

Since the condition for rejecting H_0 is P(n_o−1, r_b v_o) > 0.9, the critical value v_c, which is the minimum volume at which H_0 is not rejected for n_o observed events and event rate benchmark r_b, is found by solving P(n_o−1, r_b v_c) = 0.9 for v_c. The pertinent factor is in the third column (p = 0.9) in the row for n = n_o−1; dividing this factor by r_b yields v_c. For example, if a test with event rate benchmark r_b = 10^−4 resulted in one observed event, then the system would be rejected unless the actual volume was at least 0.1053605/10^−4, or 1053.605. Where the measurement of volume is discrete rather than continuous, one would round up to the next integer.
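
As a cross-check, both worked examples above can be reproduced directly from the Table 5-1 factors for n = 0; the Octave lines below are illustrative only and are not part of the table-generation procedure.

# Minimal sketch (recent Octave) of the two worked examples above.
factor_p01 = 2.302585;    # Table 5-1, row n = 0, p = 0.1 column
factor_p09 = 0.1053605;   # Table 5-1, row n = 0, p = 0.9 column

vo = 600;                 # volume with zero observed events
rd = factor_p01 / vo;     # demonstrated event rate
printf ("r_d = %.6e\n", rd);

rb = 1e-4;                # benchmark event rate, one observed event
vc = factor_p09 / rb;     # minimum volume at which H_0 is not rejected
printf ("v_c = %.4f (round up to 1054 where volume is discrete)\n", vc);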

The values in Part 3: Table 5-1 were generated by the following script, using Octave [2] version 2.1.73.

silent_functions=1   # suppress echoing of results from expressions inside the functions below

# Function for the root finder to zero.  fsolve won't pass extra
# parameters to the function being solved, so we must use globals.
# nGlobal is number of events; pGlobal is probability.
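# (poisson_cdf is the Poisson CDF as named in Octave 2.1; newer Octave
# installations provide the equivalent poisscdf in the statistics package.)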
function rvRootFn = rvRoot (rv)
  global nGlobal pGlobal
  rvRootFn = poisson_cdf (nGlobal, rv) - pGlobal
endfunction

# Find rv given n and p.  To initialize the root finder, provide
# startingGuess that is greater than zero and approximates the
# answer.
function rvFn = rv (n, p, startingGuess)
  global nGlobal pGlobal
  nGlobal = n
  pGlobal = p
  startingGuess > 0 || error ("bad starting guess")
  [rvFn, info] = fsolve ("rvRoot", startingGuess)
  if (info != 1)
    perror ("fsolve", info)
  endif
endfunction

function table
  printf (" n      P=0.1        P=0.9\n")
  for n = 0:750
    rv01 = rv (n, 0.1, -4.9529e-05*n*n + 1.0715*n + 2.302585093)
    rv09 = rv (n, 0.9,  4.9522e-05*n*n + 0.9285*n + 0.105360516)
    printf ("%3u %.6e %.6e\n", n, rv01, rv09)
  endfor
endfunction

fsolve_options ("tolerance", 5e-12)
table

Table 5-1 Factors for calculation of critical values

(Each row contains three column groups: n, rv satisfying P(n,rv)=0.1, and rv satisfying P(n,rv)=0.9.)
  n   rv (P=0.1)   rv (P=0.9)     n   rv (P=0.1)   rv (P=0.9)     n   rv (P=0.1)   rv (P=0.9)
0 2.302585 0.1053605 251 272.5461 231.8821 501 530.9192 473.509
1 3.88972 0.5318116 252 273.5864 232.8418 502 531.9478 474.4804
2 5.32232 1.102065 253 274.6267 233.8015 503 532.9764 475.4519
3 6.680783 1.74477 254 275.6669 234.7613 504 534.0049 476.4233
4 7.99359 2.432591 255 276.707 235.7212 505 535.0334 477.3948
5 9.274674 3.151898 256 277.747 236.6812 506 536.0619 478.3663
6 10.53207 3.894767 257 278.787 237.6412 507 537.0904 479.3379
7 11.77091 4.656118 258 279.8269 238.6013 508 538.1188 480.3094
8 12.99471 5.432468 259 280.8667 239.5615 509 539.1472 481.2811
9 14.20599 6.221305 260 281.9064 240.5218 510 540.1755 482.2527
10 15.40664 7.020747 261 282.946 241.4822 511 541.2039 483.2243
11 16.59812 7.829342 262 283.9856 242.4426 512 542.2322 484.196
12 17.78159 8.645942 263 285.0251 243.4031 513 543.2605 485.1677
13 18.95796 9.469621 264 286.0645 244.3637 514 544.2887 486.1395
14 20.12801 10.29962 265 287.1039 245.3243 515 545.317 487.1113
15 21.29237 11.1353 266 288.1432 246.2851 516 546.3452 488.0831
16 22.45158 11.97613 267 289.1824 247.2459 517 547.3734 489.0549
17 23.60609 12.82165 268 290.2215 248.2067 518 548.4015 490.0267
18 24.75629 13.67148 269 291.2605 249.1677 519 549.4296 490.9986
19 25.90253 14.52526 270 292.2995 250.1287 520 550.4577 491.9705
20 27.0451 15.38271 271 293.3384 251.0898 521 551.4858 492.9424
21 28.18427 16.24356 272 294.3773 252.0509 522 552.5138 493.9144
22 29.32027 17.10758 273 295.416 253.0122 523 553.5418 494.8864
23 30.4533 17.97457 274 296.4547 253.9735 524 554.5698 495.8584
24 31.58356 18.84432 275 297.4934 254.9349 525 555.5978 496.8304
25 32.71121 19.71669 276 298.5319 255.8963 526 556.6257 497.8025
26 33.83639 20.59152 277 299.5704 256.8578 527 557.6536 498.7746
27 34.95926 21.46867 278 300.6088 257.8194 528 558.6815 499.7467
28 36.07992 22.34801 279 301.6472 258.781 529 559.7094 500.7189
29 37.1985 23.22944 280 302.6855 259.7428 530 560.7372 501.691
30 38.3151 24.11285 281 303.7237 260.7046 531 561.765 502.6632
31 39.42982 24.99815 282 304.7618 261.6664 532 562.7928 503.6355
32 40.54274 25.88523 283 305.7999 262.6283 533 563.8205 504.6077
33 41.65395 26.77403 284 306.8379 263.5903 534 564.8482 505.58
34 42.76352 27.66447 285 307.8758 264.5524 535 565.8759 506.5523
35 43.87152 28.55647 286 308.9137 265.5145 536 566.9036 507.5246
36 44.97802 29.44998 287 309.9515 266.4767 537 567.9313 508.497
37 46.08308 30.34493 288 310.9893 267.439 538 568.9589 509.4694
38 47.18676 31.24126 289 312.0269 268.4013 539 569.9865 510.4418
39 48.2891 32.13892 290 313.0646 269.3637 540 571.014 511.4142
40 49.39016 33.03786 291 314.1021 270.3261 541 572.0416 512.3866
41 50.48999 33.93804 292 315.1396 271.2886 542 573.0691 513.3591
42 51.58863 34.83941 293 316.177 272.2512 543 574.0966 514.3316
43 52.68612 35.74192 294 317.2144 273.2138 544 575.1241 515.3042
44 53.7825 36.64555 295 318.2517 274.1765 545 576.1515 516.2767
45 54.87781 37.55024 296 319.2889 275.1393 546 577.1789 517.2493
46 55.97209 38.45597 297 320.3261 276.1021 547 578.2063 518.2219
47 57.06535 39.36271 298 321.3632 277.065 548 579.2337 519.1945
48 58.15765 40.27042 299 322.4002 278.028 549 580.261 520.1672
49 59.249 41.17907 300 323.4372 278.991 550 581.2884 521.1399
50 60.33944 42.08863 301 324.4741 279.9541 551 582.3156 522.1126
51 61.42899 42.99909 302 325.511 280.9172 552 583.3429 523.0853
52 62.51768 43.9104 303 326.5478 281.8804 553 584.3702 524.0581
53 63.60553 44.82255 304 327.5845 282.8437 554 585.3974 525.0309
54 64.69257 45.73552 305 328.6212 283.807 555 586.4246 526.0037
55 65.77881 46.64928 306 329.6578 284.7704 556 587.4517 526.9765
56 66.86429 47.5638 307 330.6944 285.7338 557 588.4789 527.9493
57 67.94901 48.47908 308 331.7309 286.6973 558 589.506 528.9222
58 69.033 49.39509 309 332.7673 287.6609 559 590.5331 529.8951
59 70.11628 50.31182 310 333.8037 288.6245 560 591.5602 530.8681
60 71.19887 51.22923 311 334.84 289.5882 561 592.5872 531.841
61 72.28078 52.14733 312 335.8763 290.5519 562 593.6142 532.814
62 73.36203 53.06608 313 336.9125 291.5157 563 594.6412 533.787
63 74.44263 53.98548 314 337.9486 292.4796 564 595.6682 534.76
64 75.5226 54.90551 315 338.9847 293.4435 565 596.6952 535.7331
65 76.60196 55.82616 316 340.0208 294.4074 566 597.7221 536.7061
66 77.68071 56.74741 317 341.0568 295.3715 567 598.749 537.6792
67 78.75888 57.66924 318 342.0927 296.3355 568 599.7759 538.6523
68 79.83647 58.59165 319 343.1285 297.2997 569 600.8028 539.6255
69 80.9135 59.51463 320 344.1643 298.2639 570 601.8296 540.5986
70 81.98997 60.43815 321 345.2001 299.2281 571 602.8564 541.5718
71 83.06591 61.36221 322 346.2358 300.1924 572 603.8832 542.545
72 84.14132 62.2868 323 347.2714 301.1568 573 604.9099 543.5183
73 85.21622 63.21191 324 348.307 302.1212 574 605.9367 544.4915
74 86.29061 64.13753 325 349.3426 303.0857 575 606.9634 545.4648
75 87.3645 65.06364 326 350.378 304.0502 576 607.9901 546.4381
76 88.4379 65.99023 327 351.4135 305.0148 577 609.0168 547.4115
77 89.51083 66.91731 328 352.4488 305.9794 578 610.0434 548.3848
78 90.58329 67.84485 329 353.4842 306.9441 579 611.07 549.3582
79 91.65529 68.77285 330 354.5194 307.9088 580 612.0966 550.3316
80 92.72684 69.7013 331 355.5546 308.8736 581 613.1232 551.305
81 93.79795 70.63019 332 356.5898 309.8384 582 614.1498 552.2785
82 94.86863 71.55951 333 357.6249 310.8033 583 615.1763 553.2519
83 95.93888 72.48927 334 358.6599 311.7683 584 616.2028 554.2254
84 97.00871 73.41944 335 359.6949 312.7333 585 617.2293 555.1989
85 98.07813 74.35002 336 360.7299 313.6983 586 618.2558 556.1725
86 99.14714 75.281 337 361.7648 314.6634 587 619.2822 557.146
87 100.2158 76.21239 338 362.7996 315.6286 588 620.3086 558.1196
88 101.284 77.14416 339 363.8344 316.5938 589 621.335 559.0932
89 102.3518 78.07631 340 364.8692 317.5591 590 622.3614 560.0668
90 103.4193 79.00885 341 365.9038 318.5244 591 623.3878 561.0405
91 104.4864 79.94175 342 366.9385 319.4897 592 624.4141 562.0141
92 105.5531 80.87502 343 367.9731 320.4552 593 625.4404 562.9878
93 106.6195 81.80865 344 369.0076 321.4206 594 626.4667 563.9615
94 107.6855 82.74263 345 370.0421 322.3861 595 627.493 564.9353
95 108.7512 83.67695 346 371.0765 323.3517 596 628.5192 565.909
96 109.8165 84.61162 347 372.1109 324.3173 597 629.5454 566.8828
97 110.8815 85.54663 348 373.1453 325.283 598 630.5716 567.8566
98 111.9462 86.48197 349 374.1796 326.2487 599 631.5978 568.8304
99 113.0105 87.41764 350 375.2138 327.2144 600 632.624 569.8043
100 114.0745 88.35362 351 376.248 328.1802 601 633.6501 570.7781
101 115.1382 89.28993 352 377.2821 329.1461 602 634.6762 571.752
102 116.2016 90.22655 353 378.3162 330.112 603 635.7023 572.7259
103 117.2647 91.16347 354 379.3503 331.078 604 636.7284 573.6999
104 118.3275 92.1007 355 380.3843 332.044 605 637.7544 574.6738
105 119.3899 93.03823 356 381.4182 333.01 606 638.7804 575.6478
106 120.4521 93.97605 357 382.4521 333.9761 607 639.8064 576.6218
107 121.514 94.91416 358 383.486 334.9422 608 640.8324 577.5958
108 122.5756 95.85256 359 384.5198 335.9084 609 641.8584 578.5699
109 123.6369 96.79124 360 385.5536 336.8747 610 642.8843 579.5439
110 124.698 97.7302 361 386.5873 337.841 611 643.9102 580.518
111 125.7587 98.66944 362 387.6209 338.8073 612 644.9361 581.4921
112 126.8192 99.60895 363 388.6546 339.7737 613 645.962 582.4662
113 127.8794 100.5487 364 389.6881 340.7401 614 646.9879 583.4404
114 128.9394 101.4888 365 390.7217 341.7066 615 648.0137 584.4145
115 129.9991 102.4291 366 391.7552 342.6731 616 649.0395 585.3887
116 131.0586 103.3696 367 392.7886 343.6396 617 650.0653 586.3629
117 132.1177 104.3104 368 393.822 344.6062 618 651.0911 587.3372
118 133.1767 105.2515 369 394.8553 345.5729 619 652.1168 588.3114
119 134.2354 106.1928 370 395.8886 346.5396 620 653.1426 589.2857
120 135.2938 107.1344 371 396.9219 347.5063 621 654.1683 590.26
121 136.352 108.0762 372 397.9551 348.4731 622 655.194 591.2343
122 137.41 109.0182 373 398.9883 349.4399 623 656.2196 592.2086
123 138.4677 109.9605 374 400.0214 350.4068 624 657.2453 593.183
124 139.5252 110.903 375 401.0545 351.3737 625 658.2709 594.1573
125 140.5825 111.8457 376 402.0875 352.3407 626 659.2965 595.1317
126 141.6395 112.7887 377 403.1205 353.3077 627 660.3221 596.1061
127 142.6963 113.7318 378 404.1535 354.2748 628 661.3477 597.0806
128 143.7529 114.6753 379 405.1864 355.2419 629 662.3732 598.055
129 144.8093 115.6189 380 406.2192 356.209 630 663.3987 599.0295
130 145.8655 116.5627 381 407.252 357.1762 631 664.4242 600.004
131 146.9214 117.5068 382 408.2848 358.1434 632 665.4497 600.9785
132 147.9771 118.4511 383 409.3176 359.1107 633 666.4752 601.953
133 149.0326 119.3955 384 410.3503 360.078 634 667.5006 602.9276
134 150.088 120.3402 385 411.3829 361.0453 635 668.5261 603.9022
135 151.1431 121.2851 386 412.4155 362.0127 636 669.5515 604.8768
136 152.198 122.2302 387 413.4481 362.9802 637 670.5768 605.8514
137 153.2527 123.1755 388 414.4806 363.9476 638 671.6022 606.826
138 154.3072 124.121 389 415.5131 364.9152 639 672.6276 607.8007
139 155.3615 125.0667 390 416.5455 365.8827 640 673.6529 608.7754
140 156.4156 126.0126 391 417.5779 366.8503 641 674.6782 609.7501
141 157.4695 126.9586 392 418.6103 367.818 642 675.7035 610.7248
142 158.5233 127.9049 393 419.6426 368.7856 643 676.7287 611.6995
143 159.5768 128.8514 394 420.6749 369.7534 644 677.754 612.6743
144 160.6302 129.798 395 421.7071 370.7211 645 678.7792 613.649
145 161.6834 130.7448 396 422.7393 371.689 646 679.8044 614.6238
146 162.7364 131.6918 397 423.7714 372.6568 647 680.8296 615.5986
147 163.7892 132.639 398 424.8035 373.6247 648 681.8548 616.5735
148 164.8418 133.5864 399 425.8356 374.5926 649 682.8799 617.5483
149 165.8943 134.5339 400 426.8676 375.5606 650 683.905 618.5232
150 166.9465 135.4816 401 427.8996 376.5286 651 684.9302 619.4981
151 167.9987 136.4295 402 428.9316 377.4966 652 685.9552 620.473
152 169.0506 137.3776 403 429.9635 378.4647 653 686.9803 621.4479
153 170.1024 138.3258 404 430.9954 379.4329 654 688.0054 622.4229
154 171.154 139.2742 405 432.0272 380.401 655 689.0304 623.3978
155 172.2054 140.2228 406 433.059 381.3692 656 690.0554 624.3728
156 173.2567 141.1715 407 434.0907 382.3375 657 691.0804 625.3478
157 174.3078 142.1204 408 435.1225 383.3058 658 692.1054 626.3228
158 175.3587 143.0695 409 436.1541 384.2741 659 693.1304 627.2979
159 176.4095 144.0187 410 437.1858 385.2425 660 694.1553 628.2729
160 177.4601 144.9681 411 438.2174 386.2109 661 695.1802 629.248
161 178.5106 145.9176 412 439.2489 387.1793 662 696.2051 630.2231
162 179.5609 146.8673 413 440.2805 388.1478 663 697.23 631.1982
163 180.6111 147.8171 414 441.3119 389.1163 664 698.2549 632.1734
164 181.6611 148.7671 415 442.3434 390.0848 665 699.2797 633.1485
165 182.7109 149.7173 416 443.3748 391.0534 666 700.3045 634.1237
166 183.7606 150.6676 417 444.4062 392.0221 667 701.3293 635.0989
167 184.8102 151.618 418 445.4375 392.9907 668 702.3541 636.0741
168 185.8596 152.5686 419 446.4688 393.9594 669 703.3789 637.0493
169 186.9089 153.5193 420 447.5001 394.9282 670 704.4036 638.0246
170 187.958 154.4702 421 448.5313 395.8969 671 705.4284 638.9999
171 189.0069 155.4213 422 449.5625 396.8658 672 706.4531 639.9751
172 190.0558 156.3724 423 450.5936 397.8346 673 707.4778 640.9505
173 191.1045 157.3237 424 451.6247 398.8035 674 708.5025 641.9258
174 192.153 158.2752 425 452.6558 399.7724 675 709.5271 642.9011
175 193.2014 159.2268 426 453.6868 400.7414 676 710.5518 643.8765
176 194.2497 160.1785 427 454.7178 401.7104 677 711.5764 644.8518
177 195.2978 161.1304 428 455.7488 402.6794 678 712.601 645.8272
178 196.3458 162.0824 429 456.7797 403.6485 679 713.6256 646.8027
179 197.3937 163.0345 430 457.8106 404.6176 680 714.6501 647.7781
180 198.4414 163.9868 431 458.8415 405.5867 681 715.6747 648.7535
181 199.489 164.9392 432 459.8723 406.5559 682 716.6992 649.729
182 200.5365 165.8917 433 460.9031 407.5251 683 717.7237 650.7045
183 201.5839 166.8443 434 461.9338 408.4944 684 718.7482 651.68
184 202.6311 167.7971 435 462.9646 409.4637 685 719.7727 652.6555
185 203.6781 168.7501 436 463.9952 410.433 686 720.7972 653.6311
186 204.7251 169.7031 437 465.0259 411.4023 687 721.8216 654.6066
187 205.7719 170.6563 438 466.0565 412.3717 688 722.8461 655.5822
188 206.8186 171.6096 439 467.0871 413.3412 689 723.8705 656.5578
189 207.8652 172.563 440 468.1176 414.3106 690 724.8949 657.5334
190 208.9117 173.5165 441 469.1481 415.2801 691 725.9192 658.509
191 209.958 174.4702 442 470.1786 416.2496 692 726.9436 659.4847
192 211.0043 175.4239 443 471.209 417.2192 693 727.9679 660.4603
193 212.0504 176.3778 444 472.2394 418.1888 694 728.9922 661.436
194 213.0963 177.3319 445 473.2698 419.1584 695 730.0165 662.4117
195 214.1422 178.286 446 474.3001 420.1281 696 731.0408 663.3874
196 215.1879 179.2403 447 475.3304 421.0978 697 732.0651 664.3631
197 216.2336 180.1946 448 476.3607 422.0675 698 733.0893 665.3389
198 217.2791 181.1491 449 477.3909 423.0373 699 734.1136 666.3147
199 218.3245 182.1037 450 478.4211 424.0071 700 735.1378 667.2904
200 219.3698 183.0584 451 479.4513 424.9769 701 736.162 668.2662
201 220.415 184.0133 452 480.4814 425.9468 702 737.1862 669.2421
202 221.46 184.9682 453 481.5115 426.9167 703 738.2103 670.2179
203 222.505 185.9232 454 482.5416 427.8866 704 739.2345 671.1938
204 223.5498 186.8784 455 483.5716 428.8566 705 740.2586 672.1696
205 224.5945 187.8337 456 484.6016 429.8266 706 741.2827 673.1455
206 225.6392 188.789 457 485.6316 430.7966 707 742.3068 674.1214
207 226.6837 189.7445 458 486.6615 431.7667 708 743.3309 675.0973
208 227.7281 190.7001 459 487.6914 432.7368 709 744.355 676.0733
209 228.7724 191.6558 460 488.7213 433.7069 710 745.379 677.0492
210 229.8166 192.6116 461 489.7511 434.6771 711 746.403 678.0252
211 230.8607 193.5675 462 490.781 435.6473 712 747.427 679.0012
212 231.9047 194.5235 463 491.8107 436.6175 713 748.451 679.9772
213 232.9485 195.4797 464 492.8405 437.5878 714 749.475 680.9532
214 233.9923 196.4359 465 493.8702 438.5581 715 750.499 681.9293
215 235.036 197.3922 466 494.8999 439.5284 716 751.5229 682.9053
216 236.0796 198.3486 467 495.9295 440.4987 717 752.5468 683.8814
217 237.1231 199.3051 468 496.9591 441.4691 718 753.5708 684.8575
218 238.1664 200.2618 469 497.9887 442.4395 719 754.5946 685.8336
219 239.2097 201.2185 470 499.0182 443.41 720 755.6185 686.8097
220 240.2529 202.1753 471 500.0478 444.3805 721 756.6424 687.7859
221 241.296 203.1322 472 501.0773 445.351 722 757.6662 688.762
222 242.339 204.0892 473 502.1067 446.3215 723 758.6901 689.7382
223 243.3819 205.0463 474 503.1361 447.2921 724 759.7139 690.7144
224 244.4247 206.0035 475 504.1655 448.2627 725 760.7377 691.6906
225 245.4674 206.9608 476 505.1949 449.2333 726 761.7614 692.6668
226 246.51 207.9182 477 506.2242 450.204 727 762.7852 693.643
227 247.5525 208.8757 478 507.2535 451.1747 728 763.8089 694.6193
228 248.5949 209.8333 479 508.2828 452.1454 729 764.8327 695.5956
229 249.6372 210.791 480 509.312 453.1162 730 765.8564 696.5718
230 250.6795 211.7488 481 510.3413 454.087 731 766.8801 697.5482
231 251.7216 212.7066 482 511.3704 455.0578 732 767.9038 698.5245
232 252.7636 213.6646 483 512.3996 456.0287 733 768.9274 699.5008
233 253.8056 214.6226 484 513.4287 456.9995 734 769.9511 700.4772
234 254.8475 215.5807 485 514.4578 457.9704 735 770.9747 701.4535
235 255.8893 216.539 486 515.4869 458.9414 736 771.9983 702.4299
236 256.931 217.4973 487 516.5159 459.9123 737 773.0219 703.4063
237 257.9726 218.4557 488 517.5449 460.8833 738 774.0455 704.3827
238 259.0141 219.4141 489 518.5739 461.8544 739 775.0691 705.3592
239 260.0555 220.3727 490 519.6028 462.8254 740 776.0926 706.3356
240 261.0969 221.3314 491 520.6317 463.7965 741 777.1162 707.3121
241 262.1381 222.2901 492 521.6606 464.7676 742 778.1397 708.2885
242 263.1793 223.2489 493 522.6894 465.7388 743 779.1632 709.265
243 264.2204 224.2078 494 523.7183 466.71 744 780.1867 710.2416
244 265.2614 225.1668 495 524.7471 467.6812 745 781.2102 711.2181
245 266.3023 226.1259 496 525.7758 468.6524 746 782.2336 712.1946
246 267.3431 227.0851 497 526.8046 469.6237 747 783.2571 713.1712
247 268.3839 228.0443 498 527.8333 470.595 748 784.2805 714.1478
248 269.4246 229.0037 499 528.862 471.5663 749 785.3039 715.1243
249 270.4652 229.9631 500 529.8906 472.5376 750 786.3273 716.101

Table 5-2 Plot of values from Table 5-1


2 Comments

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

Amendment to: USACM Comment #26. Section 5.3.1 General method [incorrect] USACM recommends correcting the factual errors present in this subsection, as minimally enumerated below: 1. The system submitted for testing might be a representative sample of the pool of identical hardware/software systems, but the pool of tests should not be a representative sample of the events that happen during an election. 2. There is no reason to expect software reliability, software accuracy, and hardware misfeed rate to follow the same distribution. 3. The Poisson distribution is discrete, not continuous. 4. The Poisson process typically assumes a stationary underlying exponential distribution. The idea that software reliability, software accuracy, and hardware misfeed rates follow the same underlying distribution, or that the concatenation of these three (if there are only three) distributions would be anything like exponential is remarkable in its unlikelihood. 5. The observed event rate ("events" divided by "volume" over the course of a test campaign) is a highly biased measure. a. The first problem is that a regression test suite repeats the same tests from build to build. This gives rise to the classic problem of the "pesticide paradox" [1]. The test suite is a tiny sample of the collection of possible tests. When the suite reveals bugs, they are fixed. Ultimately, the test suite becomes a collection of tests that have one thing in common: the software has passed all of them at least once. This differs from almost every other possible test (all of the ones that have not been run). Therefore, the reliability of the software is probably vastly overestimated. b. The second problem is that the pool of tests is structured to cover a specification. It does not necessarily target vulnerabilities of the software. Nor is it designed to reflect usage frequencies in the field [2]. 6. Determining the length of testing in advance by an approved test plan sounds scientific, but many practitioners consider this software testing malpractice. There is substantial evidence that bugs cluster. Given a failure, there is reason to do follow-up testing to study this area of the product in more detail. Rigidly adhering to a plan created in the absence of failure data is to rigidly reject the idea of follow-up testing. This underestimates the number of problems in the code. Worse, this testing method reduces the chance that defects will be found and fixed because it reduces — essentially bans — the follow-up testing that would expose those additional defects. References [1] Boris Beizer, Software Testing Techniques, Second Edition, 1990 [2] Musa, Software Reliability Engineering (http://members.aol.com/JohnDMusa/book.htm

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #26. Section 5.3.2. Critical Values [incorrect] This section should be adjusted according to factual corrections made in the previous comment.

5.3.3 Reliability

5.3.3-A Reliability, pertinent tests

All tests executed during conformity assessment SHALL be considered "pertinent" for assessment of reliability, with the following exceptions:

  1. Tests in which failures are forced;
  2. Tests in which portions of the system that would be exercised during an actual election are bypassed (see Part 3: 2.5.3 "Test fixtures").

Applies to: Voting system

1 Comment

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #27. 5.3.3 Reliability [incorrect] USACM notes the apparent self-contradictions in this sub-section, enumerated in the discussion below. DISCUSSION: 1. Failure rate data are not relevant to prediction of reliability in the field unless we assume that the failure rate in the lab is representative of the failure rate that will be found in the field. This might be rational for hardware, but unless we structure the software tests to map to usage in the field, there is no rational basis for this assumption vis-à-vis the software. 2. Pass/fail criteria are based on the concatenation of hardware and software failures. A paper jam rates the same as miscount of votes. 3. Counting all "failures" for statistical purposes creates an adversarial dynamic around the classification of anomalous behaviors. To the extent that an apparently-incorrect behavior is arguably not inconsistent with the specification, there is an incentive to class it as a non-bug and therefore not fix it. The incentives should favor improving the software, not classifying problems as non-problems
5.3.3-B Failure rate data collection

The test lab SHALL record the number of failures and the applicable measure of volume for each pertinent test execution, for each type of device, and for each applicable failure type in Part 1: Table 6-3 (Part 1: 6.3.1.5 "Requirements").

Applies to: Voting device

DISCUSSION

"Type of device" refers to the different models produced by the manufacturer. These are not the same as device classes. The system may include several different models of the same class, and a given model may belong to more than one class.

1 Comment

Comment by Gail Audette (Voting System Test Laboratory)

How are failures defined? Is the number of failures (either recoverable or non-recoverable) counted without respect to the severity?
5.3.3-C Failure rate pass criteria

When operational testing is complete, the test lab SHALL calculate the failure total and total volume accumulated across all pertinent tests for each type of device and failure type. If, using the test method in Part 3: 5.3.1 "General method", these values indicate rejection of the null hypothesis for any type of device and type of failure, the verdict on conformity to Requirement Part 1: 6.3.1.5-A SHALL be Fail. Otherwise, the verdict SHALL be Pass.

Applies to: Voting device
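The statistical details of the decision rule live in Part 3: 5.3.1 "General method" and are not restated here. Purely for illustration, the sketch below assumes that rule takes the form of a one-sided exact Poisson test of the accumulated failure count against a benchmark failure rate; the benchmark rate, significance level, and sample numbers are hypothetical, not values taken from the VVSG.

    import math

    def poisson_upper_tail(k, mu):
        """P(X >= k) for X ~ Poisson(mu), summed from the PMF."""
        cdf = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
        return max(0.0, 1.0 - cdf)

    def reject_null(failures, volume, benchmark_rate, alpha=0.05):
        """True if the accumulated failures are statistically inconsistent
        with the benchmark rate (i.e., the null hypothesis is rejected)."""
        mu = benchmark_rate * volume  # expected failures if the device just meets the benchmark
        return poisson_upper_tail(failures, mu) < alpha

    # Hypothetical totals accumulated for one device type and failure type.
    failures, volume = 3, 163.0  # e.g., 163 hours of pertinent operation
    print("Fail" if reject_null(failures, volume, benchmark_rate=0.007) else "Pass")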

1 Comment

Comment by Gail Audette (Voting System Test Laboratory)

Benchmarks are typically based on industry-wide data. What comparable industry was used for the factors for calculation of critical values? We see how the lab is expected to generate the true event rate (although we don't agree with the collection method). How are the labs supposed to evaluate the benchmark event rate in order to evaluate the null hypothesis (conforming or non-conforming)?

5.3.4 Accuracy

The informal concept of voting system accuracy is formalized using the ratio of the number of errors that occur to the volume of data processed, also known as error rate.

5.3.4-A Accuracy, pertinent tests

All tests executed during conformity assessment SHALL be considered "pertinent" for assessment of accuracy, with the following exceptions:

  1. Tests in which errors are forced;
  2. Tests in which portions of the system that would be exercised during an actual election are bypassed (see Part 3: 2.5.3 "Test fixtures").

Applies to: Voting system

1 Comment

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #28. Section 5.3.4. Accuracy [incorrect] USACM notes the apparent self-contradictions in this sub-section as described in the discussion below. DISCUSSION: Accuracy is operationalized (not formalized) as a ratio of errors found to volume of data processed. One may assume that the word "error" is tied tightly to events that yield a miscount of the votes, allow someone to cast extra votes, or cause someone to be unable to cast a vote. If "error" includes anything in the behavior of the program that would not create an error in election result, it is difficult to understand what this operationalization has to do with the naturalistic concept of "accuracy" in a system that collects and counts votes. The operationalization is defective as an estimator unless the pool of tests is designed so as to be representative of the pool of behaviors in the field. If some aspect of the system causes a small mistake (e.g. 1-vote miscount), but is only tested once, that might be a major source of inaccuracy if everyone encounters it while voting, and it might be a trivial source if almost no one encounters it. For example, imagine a system that allowed ballots that could accept write-in votes for up to 100 candidates. Imagine an error in which 1 vote in 10 is lost in the 100th race that includes a write-in candidate. As a boundary case, this error might show up in several tests. However, it might never show up in an election. What is the accuracy using the described metric? Without a mapping from the estimator to the construct being estimated, the metric is worthless. This is a fundamental issue in measurement. We normally call it construct validity. The argument that this measure of accuracy estimates underlying system accuracy lacks even face validity.
5.3.4-B Calculation of report total error rate

Given a set of vote data reports resulting from the execution of tests, the observed cumulative report total error rate SHALL be calculated as follows:

  1. Define a "report item" as any one of the numeric values (totals or counts) that must appear in any of the vote data reports. Each ballot count, each vote, overvote, and undervote total for each contest, and each vote total for each contest choice in each contest is a separate report item. The required report items are detailed in Part 1: 7.8.3 "Vote data reports";
  2. For each report item, compute the "report item error" as the absolute value of the difference between the correct value and the reported value. Special cases: If a value is reported that should not have appeared at all (spurious item), or if an item that should have appeared in the report does not (missing item), assess a report item error of one. Additional values that are reported as a manufacturer extension to the standard are not considered spurious items;
  3. Compute the "report total error" as the sum of all of the report item errors from all of the reports;
  4. Compute the "report total volume" as the sum of all of the correct values for all of the report items that are supposed to appear in the reports. Special cases: When the same logical contest appears multiple times (e.g., when results are reported for each ballot configuration and then combined or when reports are generated for multiple reporting contexts), each manifestation of the logical contest is considered a separate contest with its own correct vote totals in this computation;
  5. Compute the observed cumulative report total error rate as the ratio of the report total error to the report total volume. Special cases: If both values are zero, the report total error rate is zero. If the report total volume is zero but the report total error is not, the report total error rate is infinite;

Applies to: Voting system

Source: Revision of [GPO90] F.6
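As a non-normative sketch of the five steps above, the Python below treats each vote data report as a mapping from report-item identifiers to numeric values. The item keys and counts are hypothetical; the actual report items are those required by Part 1: 7.8.3 "Vote data reports".

    def report_total_error_rate(correct, reported):
        """Steps 2-5: sum the report item errors and divide by the report total volume."""
        total_error = 0
        for key in set(correct) | set(reported):
            if key not in correct or key not in reported:
                # Spurious or missing report item: assess an error of one.
                # (Manufacturer-extension items would be excluded before this step.)
                total_error += 1
            else:
                total_error += abs(correct[key] - reported[key])
        total_volume = sum(correct.values())
        if total_volume == 0:
            return 0.0 if total_error == 0 else float("inf")
        return total_error / total_volume

    correct = {("Contest 1", "Candidate A", "votes"): 120,
               ("Contest 1", "Candidate B", "votes"): 80,
               ("Contest 1", "overvotes"): 0,
               ("Contest 1", "undervotes"): 5,
               ("ballots counted",): 205}
    reported = dict(correct)
    reported[("Contest 1", "Candidate A", "votes")] = 119  # one-vote discrepancy
    print(report_total_error_rate(correct, reported))      # 1 / 410, roughly 0.0024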

5.3.4-C Error rate data collection

The test lab SHALL record the report total error and report total volume for each pertinent test execution.

Applies to: Voting system

DISCUSSION

Accuracy is calculated as a system-level metric, not separated by device type.

1 Comment

Comment by Gail Audette (Voting System Test Laboratory)

Benchmarks are typically based on industry-wide data. What comparable industry was used for the factors for calculation of critical values? We see how the lab is expected to generate the true event rate (although we don't agree with the collection method). How are the labs supposed to evaluate the benchmark event rate in order to evaluate the null hypothesis (conforming or non-conforming)?
5.3.4-D Error rate pass criteria

When operational testing is complete, the test lab SHALL calculate the report total error and report total volume accumulated across all pertinent tests. If, using the test method in Part 3: 5.3.1 "General method", these values indicate rejection of the null hypothesis, the verdict on conformity to Requirement Part 1: 6.3.2-B SHALL be Fail. Otherwise, the verdict SHALL be Pass.

Applies to: Voting system

5.3.5 Misfeed rate

This benchmark applies only to paper-based tabulators and EBMs. Multiple feeds, misfeeds (jams), and rejections of ballots that meet all manufacturer specifications are all treated collectively as "misfeeds" for benchmarking purposes (i.e., only a single count is maintained).

5.3.5-A Misfeed rate, pertinent tests

All tests executed during conformity assessment SHALL be considered "pertinent" for assessment of misfeed rate, with the following exceptions:

  1. Tests in which misfeeds are forced.

Applies to: Voting system

5.3.5-B Calculation of misfeed rate

For paper-based tabulators and EBMs, the observed cumulative misfeed rate SHALL be calculated as follows:

  1. Compute the "misfeed total" as the number of times that unforced multiple feed, misfeed (jam), or rejection of a ballot that meets all manufacturer specifications has occurred during the execution of tests. It is possible for a given ballot to misfeed more than once – in such a case, each misfeed would be counted:
  2. Compute the "total ballot volume" as the number of successful feeds of ballot pages or cards during the execution of tests. (If the pages of a multi-page ballot are fed separately, each page counts; but if both sides of a two-sided ballot are read in one pass through the tabulator, it only counts once);
  3. Compute the observed cumulative misfeed rate as the ratio of the misfeed total to the total ballot volume. Special cases: If both values are zero, the misfeed rate is zero. If the total ballot volume is zero but the misfeed total is not, the misfeed rate is infinite.

Applies to: Paper-based device ∧ Tabulator, EBM

DISCUSSION

"During the execution of tests" deliberately excludes jams that occur during pre-testing setup and calibration of the equipment. Uncalibrated equipment can be expected to jam frequently. Source: New requirement

5.3.5-C Misfeed rate data collection

The test lab SHALL record the misfeed total and total ballot volume for each pertinent test execution, for each type of device.

Applies to: Paper-based device ∧ Tabulator, EBM

DISCUSSION

"Type of device" refers to the different models of paper-based tabulators and EBMs produced by the manufacturer.

 

1 Comment

Comment by Gail Audette (Voting System Test Laboratory)

Benchmarks are typically based on industry-wide data. What comparable industry was used for the factors for calculation of critical values? We see how the lab is expected to generate the true event rate (although we don't agree with the collection method). How are the labs supposed to evaluate the benchmark event rate in order to evaluate the null hypothesis (conforming or non-conforming)?
5.3.5-D Misfeed rate pass criteria

When operational testing is complete, the test lab SHALL calculate the misfeed total and total ballot volume accumulated across all pertinent tests. If, using the test method in Part 3: 5.3.1 "General method", these values indicate rejection of the null hypothesis for any type of device, the verdict on conformity to Requirement Part 1: 6.3.3-A SHALL be Fail. Otherwise, the verdict SHALL be Pass.

Applies to: Paper-based device ∧ Tabulator, EBM

5.4 Open-Ended Vulnerability Testing

Vulnerability testing is an attempt to bypass or break the security of a system or a device. Like functional testing, vulnerability testing can falsify a general assertion (namely, demonstrate that the system or device is not secure), but it cannot verify the security (show that the system or device is secure in all cases). Open-ended vulnerability testing (OEVT) is conducted without the confines of a pre-determined test suite. It instead relies heavily on the experience and expertise of the OEVT Team Members, their knowledge of the system, its component devices and associated vulnerabilities, and their ability to exploit those vulnerabilities.

The goal of OEVT is to discover architecture, design, and implementation flaws in the system that may not be detected using systematic functional, reliability, and security testing and which may be exploited to change the outcome of an election, interfere with voters’ ability to cast ballots or have their votes counted during an election, or compromise the secrecy of the vote. The goal of OEVT also includes attempts to discover logic bombs, time bombs, or other Trojan Horses that may have been introduced into the system hardware, firmware, or software for said purposes.

7 Comments

Comment by Gail Audette (Voting System Test Laboratory)

If the OEVT relies heavily on the experience and expertise of the OEVT Team Members, the testing is not repeatable and does not comply with NIST 150-22 for repeatability (4.13.3).

Comment by Brit Williams (Academic)

This entire section appears to be hastily written and poorly thought out. The purpose of certification testing is to verify that the voting system complies with the voting system guidelines. OEVT, as written, is a scatter-shot approach to testing that does not address compliance with any specific guideline. There is no management/oversight structure of the OEVT test team presented. There is no process for selecting team members presented. This is not to say that these types of tests have no value, but they should be part of the EAC Certification Procedures and not part of the VVSG. I will submit specific comments below.

Comment by David Beirne, Executive Director, Election Technology Council (Manufacturer)

OEVT is laudable, but difficult to incorporate into voting system design features. It is, by definition, subjective and undefined resulting in a security threshold that is difficult, if not impossible, to design for. Given the fact that the current dynamic for the industry is the financing of the voting system certification, no provider wishes to submit a product through an expensive process to have it fail based on a subjective standard that is not repeatable. This security process should be renamed and should incorporate clear security benchmarks that are broad in scope, but clear in their performance requirement.

Comment by E Smith/L Korb (Manufacturer)

The requirements, scope, mandate and documentation requirements of this specification are poorly defined and are dependent upon the makeup of the OEVT team. Additional work should be done to specify a fair and reasonable test. Many of the requirements of the VVSG would seem to open the possibility for new types of denial of service attacks. Much work needs to be accomplished before this section is ready for implementation.

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #29. OEVT Goal [imprecise] USACM recommends that the present stated goal for OEVT (Sect 5.4, par 2, first sentence) be modified to read as follows: "The goal of OEVT is to test and analyze the target system to discover flaws that could adversely affect the election process and that may reflect systemic development process weaknesses." DISCUSSION: The current text is focused on discovering flaws that could invalidate election results. The proposed language would allow OEVT to be used in checking for flaws in other aspects of election operations, including accessibility and usability. OEVT is not meant as a replacement for quality design. The use of OEVT must be carefully described. It is a process that is difficult to replicate (if it were easy to replicate, it would not really be open-ended), so any requirements on OEVT in this VVSG should focus on the process — how the team is selected, the scope of work — of OEVT rather than specific steps. As with other parts of the testing process, this must be open to review by qualified scholars.

Comment by Matt Bishop, Mark Gondree, Sean Peisert, Elliot Proebstel (Academic)

We are excited by the addition of an "open-ended vulnerability testing" (OEVT) phase of the certification process, as described in the Aug 31, 2007 draft of the EAC's VVSG standards. Adding this type of testing, which is widely used in other venues for systems that must provide high security and reliability, will be invaluable in finding problems that could disrupt elections. As computer security practitioners, researchers, and graduate students, we anticipate that such a phase will be invaluable in future versions of these standards. The value of the OEVT lies in its ability to detect flaws that arise during use. In computer security, many flaws arise because mechanisms are integrated into a single unit, and even if the mechanisms are secure, the integration may inject unanticipated errors and problems. Further, humans are imperfect; so, as they interact with systems, problems--including security flaws--become evident. Lastly, developers of security mechanisms often make assumptions about the environment and use of their systems that differ from the environment and use of their systems in practice. Red teams can exploit this discrepancy to find flaws that security analysts who examine the systems and source code cannot. Therefore, from the point of view of computer security, open ended vulnerability testing is a valuable and necessary addition to the voting system federal certification process. The federal certification process will be better able to catch unanticipated design and implementation vulnerabilities before these flaws become "headliners". This will improve the effectiveness of testing requirements that require source code, design, and document review. It can help to identify areas where procedural defenses are critical to system security. As a result, this may lead to an increase in voter confidence and a decrease in the discrepancies between the results of federal certification and state testing, ultimately reducing the need for testing at the state level. This can also provide a means for more thoroughly testing systems that are not software independent, should such systems be "grandfathered" for a period. Requiring software independence helps the red teams frame their analysis, because one part of their work can be to verify whether the system meets this requirement--for if it does, undetected software flaws become much less harmful. In fact, software independence provides assurance that the system will function correctly even if there are software flaws. Red teaming cannot establish the absence of flaws, merely their presence, and so software independence adds assurance that the voting system will meet its goals. Based on our experience with similar Red Teaming exercises (members of our group have participated in numerous Red Teaming exercises and security evaluations relevant to election security, including the 2007 FSU report and 2007 CA Top-to-Bottom review), we have enumerated several comments below. We often refer to the part of the test lab performing the OEVT review as the "Red Team," while referring to the rest of the staff as "the lab." We refer to those threats achieving goals which would fall under the focus of the Red Team (as defined in 5.4.1-B) as "5.4.1-B threats."

Comment by ACCURATE (Aaron Burstein) (Academic)

This section lists requirements for vulnerability testing including team composition, the scope of testing, resources made available, level of effort to be expended and the rules of engagement for evaluation of the system. The team make-up and qualifications requirements are designed such that the testers possess a high level of expertise. Contrary to some criticisms of the VVSG draft as having no requirements to define vulnerability testing, these concrete requirements help to define this type of testing. They should be included in the VVSG.

5.4.1 OEVT scope and priorities

5.4.1-A Scope of open-ended vulnerability testing

The scope of open ended vulnerability testing SHALL include the voting system security during all phases of the voting process and SHALL include all manufacturer supplied voting system use procedures.

DISCUSSION

The scope of OEVT includes but is not limited to the following:

  1. Voting system security;
  2. Voting system physical security while voting devices are:
    1. In storage;
    2. Being configured;
    3. Being transported; and
    4. Being used.
  3. Voting system use procedures.

Source: New requirement

1 Comment

Comment by Brit Williams (Academic)

This section should specify that the voting system under test should be installed in an election environment, including all of the procedural security features used during an actual election.
5.4.1-B Focus of open-ended vulnerability testing

OEVT Team members SHALL seek out vulnerabilities in the voting system that might be used to change the outcome of an election, to interfere with voters’ ability to cast ballots or have their votes counted during an election, or to compromise the secrecy of the vote.

Source: New requirement

 

4 Comments

Comment by alan (General Public)

The OEVT team SHALL have at least one member (and not be the same person) with 6 or more years of experience in the area of software engineering, at least one member with 6 or more years of experience in the area of information security, at least one member with 6 or more years of experience in the area of penetration testing and at least one member with 6 or more years of experience in the area of voting system security. This requirement should not individualize the experience described above. These experience requirements should be contained by multiple team members.

Comment by Gail Audette (Voting System Test Laboratory)

Is the review of resumes to verify compliance of the test lab established OEVT members part of the test plan submitted to the EAC for approval (extension of Part 2 section 5.2-F)?

Comment by Brit Williams (Academic)

What is the rationale for requiring that the security team members have six years of experience and the election management team member have eight years of experience?

Comment by Matt Bishop, Mark Gondree, Sean Peisert, Elliot Proebstel (Academic)

Red Team member requirements 5.4.2-C requires the Red Team be composed of "at least one member with 6 or more years of experience in the area of voting system security." Under this requirement, no members of the groups involved in the CA Top-to-Bottom review, the Florida State report, the Johns Hopkins report, and the RABA report would qualify. Further, expertise in voting systems, while helpful, has not proven essential to previous work. This clause should probably be a recommendation and not a requirement, and "6 years" should be changed to "3 years".
5.4.1-C OEVT General Priorities

The OEVT team SHALL prioritize testing efforts based on:

  1. threat scenarios for the voting system under investigation;
  2. the availability of time and resources;
  3. the OEVT team’s determination of easily exploitable vulnerabilities; and
  4. the OEVT team’s determination of which exploitation scenarios are more likely to impact the outcome of an election, interfere with voters’ ability to cast ballots or have their votes counted during an election or compromise the secrecy of the vote.

DISCUSSION

Following are suggestions for OEVT prioritization in the areas of threat scenarios, COTS products, and Internet-based threats. The intent here is to provide guidance on how to prioritize testing efforts given specific voting device implementations.

  1. All threat scenarios must be plausible in that they should not be in conflict with the anticipated implementation, associated use procedures, the workmanship requirements in section 6.4 (assuming those requirements were all met) or the development environment specification as supplied by the manufacturer in the TDP;
  2. Open-ended vulnerability testing should not exclude those threat scenarios involving collusion between multiple parties including manufacturer insiders. It is acknowledged that threat scenarios become less plausible as the number of conspirators increases;
  3. It is assumed that attackers may be well resourced and may have access to the system while under development;
  4. Threats that can be exploited to change the outcome of an election and flaws that can provide erroneous results for an election should have the highest priority;
  5. Threats that can cause a denial of service during the election should be considered of very high priority;
  6. Threats that can compromise the secrecy of the vote should be considered of high priority;
  7. A threat to disclosure or modification of metadata (e.g., security audit log) that does not change the outcome of the election, does not cause denial of service during the election, and does not compromise the secrecy of the ballot should be considered of lower priority;
  8. If the voting device uses COTS products, then the OEVT team should also investigate publicly known vulnerabilities; and
  9. The OEVT team should not consider voting device vulnerabilities that require Internet connectivity for exploitation if the voting device is not connected to the Internet during the election or at any other time. However, if the voting device is connected to another device which in turn may have been connected to the Internet (as may be the case with e-pollbooks), Internet-based attacks may be plausible and should be investigated.

Source: New requirement

2 Comments

Comment by Harry VanSickle (State Election Official)

Please explain how "experts" will be identified for purposes of OEVT team composition. More specifically, please identify who makes that decision. The standards outline the requisite skills that are necessary to be an OEVT team member, but the standards are not clear regarding who determines whether a candidate meets the criteria.

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #31. 5.4.2-E OEVT team knowledge [incorrect] USACM recommends that the word "Complete" in items numbered a and b in this subsection be replaced by the word "Expert". DISCUSSION: No one, and no small team, can have "complete knowledge." A patently impossible requirement offers no guidance as to the expected level of knowledge and competence. On the other hand, "expert knowledge", while subjective, is a recognized standard.

5.4.2 OEVT resources and level of effort

5.4.2-A OEVT team resources

The OEVT team SHALL use the manufacturer supplied Technical Data Package (TDP) and User documentation, have access to voting devices configured similar to how they are to be used in an election, and have access to all other material and tools necessary to conduct a thorough investigation.

DISCUSSION

Materials supplied to the OEVT team should include but not be limited to the following:

  1. Threat analysis describing threats mitigated by the voting system;
  2. Security architecture describing how threats to the voting system are mitigated;
  3. High level design of the system;
  4. Any other documentation provided to the testing laboratory;
  5. Source code;
  6. Operational voting system configured for election, but with the ability for the OEVT team to reconfigure it;
  7. Testing reports from the developer and from the testing laboratory including previous OEVT results;
  8. Tools sufficient to conduct a test lab build; and
  9. Procedures specified by the manufacturer as necessary for implementation and secure use.

Source: New requirement

1 Comment

Comment by Cem Kaner (Academic)

The task of the OEVT should not be to verify anything; it should operate from the assumption that what has been handed to it is defective. The OEVT is tasked with discovering new types of problems which were missed in conventional testing. Focusing the OEVT effort on the materials already relied on by the other testers is a good way to encourage OEVT failure. If you aren't going to unchain the OEVT team, don't waste the manufacturer's money on them. .......... (Affiliation Note: IEEE representative to TGDC)
5.4.2-B Open-ended vulnerability team establishment

The test lab SHALL establish an OEVT team of at least 3 security experts and at least one election management expert to conduct the open-ended vulnerability testing.

Source: New requirement

2 Comments

Comment by Brit Williams (Academic)

The OEVT team should not have the authority to fail a voting system. The OEVT team should conduct their tests and submit to the VSTL a report of their findings. Similarly, the VSTL prepares a report of their tests, including the results of the OEVT, and submits their report to the EAC. The EAC is the only entity that has the authority to fail a voting system. Also, the OEVT should not be allowed to release any of their findings to any organization other than the VSTL. Ultimately, the EAC has the responsibility to determine which reports should be released to the public and which should remain proprietary in order to protect the intellectual property of the vendor or to protect the national security.

Comment by Premier Election Solutions (Manufacturer)

A system's security is based on its threat model. If the OEVT Team modifies a system's threat model with subjective and unsubstantiated threats, then any system would fail. If the guidelines require OEVT testing, then the guidelines should provide a threat model to guide manufacturers in the design of their security. Proposed Change: Provide a threat model in the guidelines and remove the ability for the OEVT team to modify that threat model except through formal requests for changes to revisions of the guidelines.
5.4.2-C OEVT team composition – security experts

The OEVT team SHALL have at least one member with 6 or more years of experience in the area of software engineering, at least one member with 6 or more years of experience in the area of information security, at least one member with 6 or more years of experience in the area of penetration testing, and at least one member with 6 or more years of experience in the area of voting system security.

Source: New requirement

5.4.2-D OEVT team composition – election management expert

The OEVT team SHALL have at least one member with at least 8 years of experience in the area of election management.

DISCUSSION

The OEVT team will require consultation from an elections expert who is familiar with election procedures, how the voting systems are installed and used, and how votes are counted.

Source: New requirement

5.4.2-E OEVT team knowledge

The OEVT team knowledge SHALL include but not be limited to the following:

  1. Complete knowledge of work done to date on voting system design, research and analysis conducted on voting system security, and known and suspected flaws in voting systems;
  2. Complete knowledge of threats to voting systems;
  3. Knowledge equivalent to a Bachelor’s degree in computer science or related field;
  4. Experience in design, implementation, security analysis, or testing of technologies or products involved in voting systems; and
  5. Experience in the conduct and management of elections.

Source: New requirement

5.4.2-F OEVT level of effort – test plan

In determining the level of effort to apply to open-ended vulnerability testing, the test lab SHALL take into consideration the size and complexity of the voting system; any available results from the "closed-ended" functional, security, and usability testing activities and laboratory analysis and testing activities; and the number of vulnerabilities found in previous security analyses and testing of the voting system and its prior versions.

Source: New requirement

5.4.2-G OEVT level of effort – commitment of resources

The OEVT team SHALL examine the system for a minimum of 12 staff weeks.

Source: New requirement

5.4.3 Rules of engagement

5.4.3-A Rules of engagement – context of testing

Open ended vulnerability testing SHALL be conducted within the context of a process model describing a specific implementation of the voting system and a corresponding model of plausible threats.

DISCUSSION

The specification of these models is supported by information provided by the manufacturer as part of the TDP. See Requirement Part 2: 3.5.1.

Source: New requirement

5.4.3-B Rules of engagement – adequate system model

The OEVT team SHALL verify that the manufacturer provided system model sufficiently describes the intended implementation of the voting system.

DISCUSSION

Manufacturer’s system model and associated documentation should reliably describe the voting system and all associated use procedures given the environment in which the system will be used.

Source: New requirement

5.4.3-C Rules of engagement – adequate threat model

The OEVT team SHALL verify that the threat model sufficiently addresses significant threats to the voting system.

DISCUSSION

Significant threats are those that could:

  1. Change the outcome of an election;
  2. Interfere with voters’ ability to cast ballots or have their votes counted during an election; or
  3. Compromise the secrecy of the vote.

The OEVT team may modify the manufacturer’s threat model to include additional, plausible threats.

Source: New requirement

5.4.4 Fail criteria

5.4.4-A OEVT fail criteria – violation of requirements

The voting device SHALL fail open ended vulnerability testing if the OEVT team finds vulnerabilities or errors in the voting device that violate requirements in the VVSG.

DISCUSSION

While the OEVT is directed at issues of device and system security, a violation of any requirement in the VVSG can lead to failure. Following are examples of issues for which the test lab must give a recommendation of "fail":

  1. Evidence that any single person can cause a violation of a voting system security goal (e.g., integrity of election results, privacy of the voter, availability of the voting system), assuming that all other parties follow procedures appropriate for their roles as specified in the manufacturer’s documentation;
  2. Manufacturer's documentation fails to adequately document all aspects of system design, development, and proper usage that are relevant to system security. This includes but is not limited to the following:
    1. System security objectives;
    2. Initialization, usage, and maintenance procedures necessary to secure operation;
    3. All attacks the system is designed to resist or detect; and
    4. Any security vulnerabilities known to the manufacturer.
  3. Use of a cryptographic module that has not been validated against FIPS 140-2;
  4. Ability to modify electronic event logs without detection;
  5. A VVPR that has an inaccurate or incomplete summary of the cast electronic vote;
  6. Unidentified software on the voting system;
  7. Identified software which lacks documentation of the functionality it provides to the voting device;
  8. Access to configuration file without authentication;
  9. Ability to cast more than one ballot within a voting session;
  10. Ability to perform restore operations in Activated State;
  11. Enabled remote access in Activated State; and/or
  12. Ballot boxes without appropriate tamper evidence countermeasures.

Source: New requirement

6 Comments

Comment by Kevin Baas (Academic)

I that think instead of saying 3. All attacks the system is designed to resist or detect; and 4. Any security vulnerabilities known to the manufacturer. it would be better to know: 3. All attacks the system is NOT designed to resist or detect; and 4. Any security vulnerabilities NOT known to the manufacturer. to accomplish this, instead of - or rather, in addition to - listing these things it would be better to say "list all the points in the process and if each point has or does not have x. For instance, they might say "oh, we use an access database here, and that's pretty secure.. but fail to mention that it's not password protected, though it obviously could be with out much effort. So there should be a list of all points and a checklist for each point, perhaps made by security experts. The goal, of course, is to have as complete a checklist as possible, so community input through something like a wiki would be very helpful in making the list more complete (by reducing the probability that an item is missed). I propose this section be amended with the following additions (or something similar which retains the principle of them): 13. All ballot/vote storage/transfer/processing points/channels in the voting, tabulation, and tabulation reporting system, in the order of flow. (i.e. following the vote from the point of entry to final certification.) And their physical and logical security environment. 14. "Point of entry" includes the software (if a computer-like system is used, such as a touchscreen voting machine) which first records the vote, and by implication, therefore, includes the source code to that software. The source code to the software that runs on the machine should be provided for review, and its vulnerabilities and so forth should be considered as with. Regarding point 13: security environment includes: a. hardware used b. logical access to hardware 1. local access security (password protected account login?, is a screen saver password used?, etc.) 2. network access (through network, modem, etc. (including shared filesystems)) security as local access security, plus network-specific items such as firewalls etc. 3. surveillance (logging, transparency) c. physical access to hardware 1. who is authorized to use the equipment 2. what is required to get into the room where the hardware is stored? (security cards, keys, etc.) 3. who has these tools (to get into the room) and how are they secured? 4. surveillance - cameras, people, etc. b. software and protocols used c. encryption used or lack thereof d. password protection used or lack thereof Regarding point 14: By "computer-like system" I mean a system that includes a subsystem which is "turing complete" or near turing complete, such that it's programmability allows if functionality to be substantially altered. For instance, this applies to systems where computer software is used in the vote entry process, such as "e-voting" machines, due to vulnerabilities inherent in programmable devices. First thing that needs to be cleared up once and for all, is that all claims that the computer program that records the vote is "proprietary software" are bogus, as: 1. Releasing the source code, privately or publicly, does not put the company at a competitive disadvantage. Writing a computer program to record a vote is trivial. (And thus fails to meet the requirements for a software patent.) (And by trivial I mean TRIVIAL.) 
The manufacturer is not liable to suffer any monetary damage from the private or public disclosure of the source code, as any competent programmer could produce software for recording a vote just as easily without it. Therefore, when it comes to recording a vote, having access to and/or using a competitor's source code would not provide a company with a competitive advantage (as that software is trivial to produce), and therefore would not put the source company at a competitive disadvantage. Any claims to the contrary are bogus. Any honest, competent, computer programmer will tell you this. 2. Releasing the source code, privately or publicly, does subject the company to any potential monetary losses/damages, save those incurred from the consumer being informed of a potential hazard of the product (in this case, the hazard being a flawed election), which information a consumer is legally entitled to for their protection (in this case, the protection of their right to vote). 3. Releasing the source code, privately or publicly, does not create a security vulnerability, as: a. to write source code for a machine that interfaces with input/output devices and runs on the platform, one needs knowledge of the machines _hardware_. b. to install software on a machine one needs physical access to the machine c. if such physical access necessary to install software on the machine were available to a non-specialist, that in itself would constitute a severe security vulnerability. (and therefore a failure of this test.) 4. Withholding the source code from third-party review constitutes a security vulnerability, as: a. without third-party review, the software could be written to do almost anything. It could change votes arbitrarily, or even just ignore votes all together and report an arbitrary total. The total could be preprogrammed, or an outcome could be preprogrammed and a total calculated that reasonably matches the votes but guarantees a pre-programmed outcome. etc. etc. Without explicit software review there is absolutely NO protection against these threats. b. There remains, in fact, even after code review, a number of software-related threats, including but not limited to: 1. Installation testing: The threat that said software is not the software actually installed on the machines. To secure against this, reviewers must be able to install or observe the installation process. 2. Turing completeness: The threat that said software, after being installed on the machine, is not executed by the machine. (That instead, another program is run from previously installed software or from a hardware device such as Read Only Memory.) To make sure that the system is actually running software when it is installed on it, reviewers must be able to provide different programs on the machine and see if the machine executes those instructions and produces the predicted result. (I.e. the machine should be able to demonstrate turing completeness.) 3. Timed code: a system may be set up such that failure scenarios 1 and 2 above may occur only during pre-selected times, such as only on election day. A tester/reviewer must be able to roll forward/back any and all system/internal/connected clocks to an arbitrary date/time, and test the system while it is in a state such that all relevant hardware and software thinks that it is the date/time that the system has been rolled forward (or back) to. 
The system should be tested in this manner for the date/time that it is supposed to function properly during (such as election day and the day after). c. and as well, the threat of tampering with electronic data, which is much greater than that of tampering with data stored physically, such as on a piece of paper, because data stored electronically is easier to access and alter. To protect against this increased threat, electronic voting systems must provide: 1. Methods, procedures, equipment, storage, and retrieval mechanisms for backing up and securing voting records for later review & verification until a time to be determined by law should be in place and tested. 2. Methods, procedures, equipment, etc. for preventing the digital tampering of voting records should be in place, reviewed, and tested. Such methods include: a. distributed backup/parity check: multiple copies of the data are distributed to different physical locations, such that at a later date they can be compared against each other for discrepancies. (which would constitute evidence of tampering) b. public key encryption: this protects against tampering by making the data essentially read-only. The data is encrypted with an asymmetric encryption scheme, and the decryption key is made public, while the encryption key is kept private, perhaps even randomly generated by the machine at election time, and then wiped from computer memory when all the votes have been recorded, encrypted and transfered. The corresponding decryption key is stored on the machine so that people and widely distributed as soon as possible to prevent tampering (via distributed backup/parity check). In asymmetric encryption, the encryption key cannot decrypt, and the decryption key cannot encrypt. This provides read-only functionality because only those with the encryption key can create data that can be decrypted by the decryption key. Data that cannot be decrypted by the decryption key was not encrypted by the encryption key, and was therefore clearly either "tampered with" after being encrypted, or was not from a person or machine that has the encryption key. There are two weaknesses to this (besides the strength of the encryption): 1. you have to make sure you have the right decryption key (the one corresponding to the one that the desired data was encrypted in, and not some look-alike data that someone else made up and encrypted) Perhaps the machine could produce a key pair right before election, and distribute the public key, and a trial run could be done to ensure that that is in fact, the key that the machine is using. 2. you have to make sure that the encryption key is kept secure, or even "thrown away" immediately after use, so that people can't generate fake voting records that pass the decryption test. c. digital signing. similar to asymmetric encryption, this helps insure that the data is from a trusted source.

Comment by Gail Audette (Voting System Test Laboratory)

All requirements within the VVSG are being tested and the voting system is verified to be in compliance (Parts 1 and 2); however, this requirement is to again fail the voting system if any of those requirements are not met. This is already a basis for voting system certification and is not enhanced by the OEVT.

Comment by David Beirne, Executive Director, Election Technology Council (Manufacturer)

Despite assurances that the OEVT would not result in a "failing" mark, this section speaks to a troublesome feature of the new VVSG. The fail criteria reveals the redundant nature of the OEVT as it does not consider the role of the VSTL itself during this process.

Comment by Matt Bishop, Mark Gondree, Sean Peisert, Elliot Proebstel (Academic)

The low priority assigned to the goals in 5.4.1-C Discussion point 7 seems to contradict 5.4.4 part 4 & 5, which state that "the lab must give a recommendation of 'fail'" when it is possible to modify logs and cause incomplete audits to be generated.

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #32. Section 5.4.4-A OEVT Fail Criteria - Failure Interpretation [incomplete] USACM recommends that the following new subsection, with the title above, be added to Section 5.4 as follows: Software testing, including open-ended testing, cannot demonstrate the absence of flaws. Thus, its contribution to the certification process is twofold: a. A final filter to prevent faulty voting system software from achieving certification b. Detect vendors whose development processes are not sufficiently mature to consistently produce high assurance products. The OEVT team should consider a final finding of "failure" to indicate a need to redesign the system or the system testing strategy. DISCUSSION: There is no software testing regimen that can claim comprehensive fault detection. Thus, the best that an OEVT team can hope to do is (1) Detect well-known faults left as a result of immature development processes and (2) Detect subtle faults that the team’s specific skill sets enable them to find and that routine or even mature development processes may not prevent or detect. During deliberations, the OEVT team must assess the vulnerabilities as they apply relative to vendor-prescribed procedures. Fail criteria must reflect whether an attack based on the identified vulnerability would be likely to occur, succeed, and escape detection.

Comment by ACCURATE (Aaron Burstein) (Academic)

This section further defines concrete requirements for vulnerability testing by specifying the fail criteria for vulnerability tests: a system can fail if 1) the vendor's system in conjunction with use procedures and security controls does not adequately mitigate significant threats (Part 3:5.4.4-B); or 2) if found vulnerabilities could be used to: "change the outcome of an election, interfere with voters' ability to cast ballots or have their votes counted during an election, or compromise the secrecy of vote [...]" (Part 3:5.4.4-C). Thus, this section should be adopted by the EAC.
5.4.4-B Threat model - failure

Voting systems SHALL fail open-ended vulnerability testing if the manufacturer’s model of the system, along with associated use procedures and security controls, does not adequately mitigate all significant threats as described in the threat model.

DISCUSSION

The OEVT team may use a threat model that has been amended based on its findings, in accordance with 5.4.3-C.

Source: New requirement

2 Comments

Comment by Brit Williams (Academic)

This section should be deleted. The VSTL will determine whether or not a voting system fails certification testing based on the totality of their findings, including the report from the OEVT.

Comment by ACCURATE (Aaron Burstein) (Academic)

ACCURATE's comments to Part 3:5.4.4-B (including the recommendation to adopt) apply equally to this requirement.
5.4.4-C OEVT fail criteria – critical flaws

The voting device SHALL fail open-ended vulnerability testing if the OEVT team provides a plausible description of how vulnerabilities or errors found in a voting device or the implementation of its security features could be used to:

  1. Change the outcome of an election;
  2. Interfere with voters’ ability to cast ballots or have their votes counted during an election; or
  3. Compromise the secrecy of the vote;

without having to demonstrate a successful exploitation of said vulnerabilities or errors.

DISCUSSION

The OEVT team does not have to develop an attack and demonstrate the exploitation of the vulnerabilities or errors they find. They do, however, have to offer a plausible analysis to support their claims.

Source: New requirement

4 Comments

Comment by Brit Williams (Academic)

This section should be deleted. The VSTL will recommend that the system pass or fail based on the totality of the certification testing, including the report from the OEVT.

Comment by David Beirne, Executive Director, Election Technology Council (Manufacturer)

"The voting device SHALL fail open ended vulnerability testing if the OEVT team provides a plausible description of how vulnerabilities or errors found in a voting device or the implementation of its security features" This section should be stricken as it is too permissive. The OEVT is essentially saying that the security of a voting system doesn't actually have to be penetrated, only a "plausible description" of how the penetration would occur. The bar for failing a voting system has been set low that the OEVT will remain as the final arbiter for the certification of a voting system based on a subjective review and one that only requires a description of events that may cast doubt on the system's security.

Comment by ACCURATE (Aaron Burstein) (Academic)

ACCURATE's comments to Part 3:5.4.4-B (including the recommendation to adopt) apply equally to this requirement.

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #33. Section 5.4.4-C OEVT Fail Criteria - Critical Flaws [incorrect] USACM recommends that subsection 5.4.4-C be modified as follows: The voting device shall fail open-ended vulnerability testing if the OEVT team demonstrates one or more critical flaws that allow an attacker to violate VVSG requirements as specified in paragraph 5.4.4-A above, under a plausible description of how vulnerabilities or errors found in a voting device or the implementation of its security features are used to: a. Change the outcome of an election; b. Interfere with voters’ ability to cast ballots or have their votes counted during an election; or c. Compromise the secrecy of vote without having to demonstrate a successful exploitation of said vulnerabilities or errors. Potential vulnerabilities for which no exploit is demonstrated may be noted as observations, but may not rise to the level of findings. DISCUSSION: OEVT failure is a serious event that may have severe financial ramifications. Thus, it cannot be justified by hypothetical attacks. OEVT testers must be held to high scientific standards that can only be reflected by the three level process of: a. Detecting vulnerability b. Envisioning an exploit for each identified instance and by c. Demonstrating each envisioned attack under plausible conditions.

5.4.5 OEVT reporting requirements

5.4.5-A OEVT reporting requirements

The OEVT team SHALL record all information discovered during the open-ended vulnerability test, including but not limited to:

  1. Names, organizational affiliations, summary qualifications, and resumes of the members of the OEVT;
  2. Time spent by each individual on the OEVT activities;
  3. List of hypotheses considered;
  4. List of hypotheses rejected and rationale;
  5. List of hypotheses tested, testing approach, and testing outcomes; and
  6. List and description of remaining vulnerabilities in the voting system:
    1. A description of each vulnerability including how the vulnerability can be exploited and the nature of the impact;
    2. For each vulnerability, the OEVT team should identify any VVSG requirements violated; and
    3. The OEVT team should flag those vulnerabilities as serious if the vulnerability can result in the violation of one or more VVSG requirements; a change of the outcome of an election; or a denial of service (lack of availability) during the election.

DISCUSSION

Examples of the impact of an exploited vulnerability are an overcount of ballots for a candidate; an undercount for a candidate; very slow response time during an election; erasure of votes; and lack of availability of the voting device during an election.

Source: New requirement
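As a non-normative illustration of the records listed above, a lab might capture the OEVT report in a structure along the following lines; the field names are hypothetical and do not add to or alter the required content.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Vulnerability:
        description: str                      # how it can be exploited and the nature of the impact
        vvsg_requirements_violated: List[str] = field(default_factory=list)
        serious: bool = False                 # flagged per item 6.c above

    @dataclass
    class OEVTReport:
        team_members: List[str]               # names, affiliations, qualifications, resumes (item 1)
        hours_by_member: Dict[str, float]     # time spent by each individual (item 2)
        hypotheses_considered: List[str]      # item 3
        hypotheses_rejected: List[str]        # item 4, with rationale
        hypotheses_tested: List[str]          # item 5, with approach and outcomes
        remaining_vulnerabilities: List[Vulnerability] = field(default_factory=list)  # item 6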

5 Comments

Comment by Brit Williams (Academic)

This section needs to be expanded to state that these results will be contained in a report presented to the VSTL and, furthermore, that the OEVT team will not release any portion of this report to any organization other than the VSTL.

Comment by Harry VanSickle (State Election Official)

Please explain how the results of testing and threat vulnerability will be disseminated to state and county election officials. Access to this information is a critical element for the state’s own certification process. There should be a section that clearly outlines how and when the results of OEVT testing will be made available. Moreover, state and county election officials should have access to the actual OEVT report, not just a condensed report by EAC. Please see our suggestion for language below. Within six (6) weeks after testing is complete at the federal level, EAC shall provide to state election officials a copy of any and all OEVT reports for the voting system.

Comment by Cem Kaner (Academic)

The more reporting, the less testing. .......... If you want time-constrained testing to yield worthwhile test results, most of the time has to be spent on testing (imagining risks,designing tests, implementing tests, executing tests, and evaluating the results). Documentation time is on top of this, and time spent on it subtracts from the time available for the testing. .......... (Affiliation Note: IEEE representative to TGDC)

Comment by Matt Bishop, Mark Gondree, Sean Peisert, Elliot Proebstel (Academic)

Focus on defense in depth We approve of the view that the the Red Teaming exercises must consider the entire system---including physical security and security procedures---as a whole. But the current version of the OEVT section strongly implies that, when procedures ameliorate potential threats to the technological parts of the voting system, the weaknesses in the technology would not be considered flaws, and would go unreported. Thus, the reports from the OEVT would not provide enough information for election officials to know the consequences of failing to follow procedural defenses that mitigate technical flaws, or to determine whether the procedural defenses are appropriate for their locality. In our experience, the fundamental principle of defense-in-depth is best captured by layers of procedural and technical defenses. We would like to see the reporting requirements re-written as follows: Reporting requirements: Include in reporting requirements a list and description of any flaws in the voting system that are remediated by procedures, full descriptions of the associated procedures, and the consequences of not following those procedures. We feel such a discussion would be useful to election officials as they integrate the system's use procedures into their local procedures. In this process, the intention behind a specific procedure may be misunderstood (especially those procedures which serve multiple purposes) and integrated into local procedures in a way that does not address all the vulnerabilities. With a discussion of those system threats for which there are no, or very few, technological defenses in place would better assist officials during system integration.

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

Structured Note-taking [incomplete] USACM recommends adding a paragraph 5.4.5-B as follows: 5.4.5-B. OEVT team process documentation requirement. Each OEVT team will conduct structured note-taking during the analysis. Where possible, all notes will be shared among team members during the entire review period, but must be shared by all members during deliberations, before the final report is prepared. These structured notes become part of the team product and must be delivered along with the OEVT final report. DISCUSSION: It is difficult to overstate the value of structured note-taking during the review process and making the notes database a work-product of each review. The level of continuity it provides between reviews justifies including it as a VVSG requirement. There are also two other benefits that may be equally as important: 1. Process Improvement. Understanding the details of the process that each team goes through can be a gold mine of best practices. 2. Accountability. OEVT is critically dependent on the skill and knowledge of the investigators. Structured note taking provides an avenue to analyze the team’s effort.

5.4.6 VSTL response to OEVT

5.4.6-A VSTL Response to OEVT

The VSTL SHALL examine the OEVT results in the context of all other security, usability, and core function test results and update their compliance assessment of the voting system based on the OEVT.

DISCUSSION

The testing laboratory should examine each vulnerability that could result in the violation of one or more VVSG 2007 requirements; a change of the outcome of an election; or a denial of service (lack of availability) during the election, and use the information to form the basis for a finding of non-compliance. If significant vulnerabilities are discovered as a result of open-ended vulnerability testing, this may be an indication of problems with test lab procedures in other areas as well as with voting system design or implementation.

Source: New requirement

2 Comments

Comment by Kevin Wilson (Voting System Test Laboratory)

This section implies that the OEVT team is not necessarily a part of the VSTL. Can OEVT team members be members of the VSTL?

Comment by U.S. Public Policy Committee of the Association for Computing Machinery (USACM) (None)

USACM Comment #35. Section 5.4.6. VSTL Response to OEVT [incomplete] USACM recommends changing the first full sentence in Section 5.4.6-A to read: "The VSTL SHALL: 1. Forward the OEVT results to the VSTL licensing authority for their use in assessing vendor development process maturity and to assess potential corrective action; and 2. Examine the OEVT results in the context of all other security, usability, and core function test results and update their compliance assessment of the voting system based on the OEVT." DISCUSSION: The addition of requirement one will encourage feedback to testing lab authorities and the Election Assistance Commission about issues, errors and anomalies uncovered during the testing process that are not connected to specific requirements of the VVSG. Without a feedback process for problems outside the terms of the VVSG, the testing process would be subject to the voting systems equivalent of teaching to the test — covering only those items outlined in the test, and ignoring anything else — regardless of how it could influence voting, voting administration and elections.