The MolDx program conducts detailed technology assessments for molecular tests and has templates to guide these assessments in many test categories. Where the instructions are not specific, MolDx expects labs to follow "industry norms," such as validation guidelines for that type of test from AMP or other authorities.
On July 1, 2022, MolDx updated three of its tech assessment guidelines:
GEN CQ 003 - Tech Assessment CHECKLIST (hint: it's now "Version 9")
MRS PF 020 - Molecular Tests for Risk Stratification (this is "NEW" altogether)
PGX PF 007 - Tech Assessment for Pharmacogenomics (major update)
The Tech Assessment Checklist is a flow-chart-based form that guides you to what you need to do with questions like "Is this an NGS test?" It has been adapted to include the brand-new MRS Risk Stratification checklist as a landing point, for example.
The new MRS checklist is thorough and requires specific performance criteria, such as sensitivity, specificity, AUC, PPV, NPV, and "excess biopsies avoided" (where relevant). For a quick refresher on how those 2x2 metrics relate, see the sketch below.
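Here's a minimal sketch of the standard 2x2 metrics, using entirely made-up counts (these numbers are hypothetical, not from any MolDx submission):

```python
# Hypothetical 2x2 confusion matrix for a risk test (invented numbers).
tp, fn = 90, 10    # diseased patients: test positive / test negative
fp, tn = 45, 855   # healthy patients:  test positive / test negative

sensitivity = tp / (tp + fn)   # 0.90: share of diseased patients caught
specificity = tn / (tn + fp)   # 0.95: share of healthy patients cleared
ppv = tp / (tp + fp)           # ~0.67: positives who are truly diseased
npv = tn / (tn + fn)           # ~0.99: negatives who are truly healthy
print(sensitivity, specificity, ppv, npv)
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on how common the condition is in the tested population, a point I come back to below.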
This actually illustrates a point I've made to startup clients recently. MolDx won't ask whether your test averages "6" in positive cases and "7" in negative cases, with p<.05 between the two. Rather, MolDx will ask what the PPV is and how many biopsies are avoided (or other outcomes). A test can easily show p<.05 between Group A and Group B yet have very little value in distinguishing them, because 80% of the cases in A and B might overlap despite the small difference in the averages. A different test might also be p<.05 between the groups but have a cutpoint that separates the cases with little overlap. The simulation below makes the contrast concrete.
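Here's a minimal simulation of that contrast, with invented distributions (nothing here reflects any real assay): two tests that are both highly "significant," one nearly useless for sorting patients and one with excellent discrimination.

```python
# Two hypothetical tests, both p < .05, very different discriminating power.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500  # hypothetical patients per group

# Test 1: small mean shift, wide spread -- groups overlap heavily
neg1 = rng.normal(6.0, 2.0, n)   # analyte level in negative cases
pos1 = rng.normal(7.0, 2.0, n)   # analyte level in positive cases

# Test 2: same means, tight spread -- a cutpoint separates the groups
neg2 = rng.normal(6.0, 0.3, n)
pos2 = rng.normal(7.0, 0.3, n)

def auc(neg, pos):
    """Probability a random positive scores above a random negative
    (equivalent to the area under the ROC curve)."""
    u, _ = stats.mannwhitneyu(pos, neg, alternative="greater")
    return u / (len(pos) * len(neg))

for name, neg, pos in [("Test 1", neg1, pos1), ("Test 2", neg2, pos2)]:
    p = stats.ttest_ind(pos, neg).pvalue
    print(f"{name}: p = {p:.2e}, AUC = {auc(neg, pos):.2f}")
# Both p-values are far below .05, but Test 1's AUC is only ~0.64
# (weak discrimination) while Test 2's is ~0.99.
```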
(See the fine paper on Spectrum Effect by Juliet Usher-Smith, BMJ 2016.)
Risk stratification is a common test purpose (such as the Oncotype DX test for low- and high-risk breast cancer), and previously MolDx had provided little instruction as to how to show a test's real validity. The new checklist should help, and it also provides a guide for test developers during the clinical study and data write-up process.
I'd note, however, that risk stratification scores may not map neatly onto concepts like the requested "sensitivity" and "specificity." (For example, Oncotype DX yields low scores associated with a 5% or lower chance of recurrence and high scores associated with a 20% or higher chance of recurrence, but these aren't really "sens/spec" types of statistics.) Then again, AUC can also be misleading: it's anchored to a gold standard for truth, it's not adjusted for relative frequency (the base rate), and it assumes every case is either positive or negative with nothing in between. The short sketch below illustrates the base-rate issue.
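To make the base-rate point concrete, here's a small sketch with hypothetical numbers: sensitivity and specificity (and therefore AUC) are properties of the test itself, but PPV collapses as the condition gets rarer.

```python
# Fixed, hypothetical test characteristics; PPV swings with prevalence.
sens, spec = 0.90, 0.90

for prevalence in (0.50, 0.10, 0.01):
    # Bayes' rule: P(diseased | test positive)
    ppv = (sens * prevalence) / (
        sens * prevalence + (1 - spec) * (1 - prevalence))
    print(f"prevalence {prevalence:4.0%}: PPV = {ppv:.2f}")
# prevalence  50%: PPV = 0.90
# prevalence  10%: PPV = 0.50
# prevalence   1%: PPV = 0.08
```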
On the special needs of prognostic (risk stratification) tests, see the new article by Lee et al. (and the Dutch Bossuyt team), "QUAPAS: An Adaptation of the QUADAS-2 Tool to Assess Prognostic Accuracy Studies," Ann Intern Med, July 2022. They view the shift to prognostic test assessment as important enough to warrant issuing "QUAPAS," with a P for prognostic tests, distinct from QUADAS, with a D for diagnostic tests. The authors write, "Studies to evaluate prognostic tests are longitudinal, which introduces sources of bias different from those for diagnostic accuracy studies. At present, systematic reviews of prognostic tests often use the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool to assess risk of bias and applicability of included studies because no equivalent instrument exists for prognostic accuracy studies."
(QUAPAS stands for Quality Assessment of Prognostic Accuracy Studies.)
The updated PGX checklist is also a major upgrade and much more detailed. MolDx has been rolling out markedly upgraded tech assessment forms in recent months, such as the one for infectious pathogen molecular tests a few months ago.