LIRICAL vs Exomiser Phenotype-based Prioritization In 75 Rare Disease Cases

Exomiser [Smedley, Robinson 2015] has been widely used for phenotype-based gene prioritization in rare disease cohorts. Five years after publishing Exomiser, the same authors created LIRICAL [Robinson, Smedley 2020], partly to give users a way to tell when even the highest-ranked results in a given case are low-confidence and not worth considering.

The previous blog post evaluated LIRICAL on 75 previously solved phenotypically-heterogeneous cases from the Rare Genomes Project (RGP). The phenotypes of these cases are summarized in the phenotype distribution plot below.

This blog post compares LIRICAL to Exomiser on the same set of cases and finds that:

Exomiser and LIRICAL top-5 accuracy is identical (65%), but LIRICAL has better top-3 accuracy (56%) than Exomiser (49%).
Exomiser and LIRICAL ranks are NOT well correlated (R²=0.05), so rare disease pipelines should consider top hits from both tools.
Let's say users have time to evaluate 10 results per case. They would get better sensitivity by evaluating top-5 from Exomiser and top-5 from LIRICAL than top-10 from just one of the tools. In either scenario, users evaluating 75 cases would be looking at 750 results total, but the set of top-5 results from Exomiser and LIRICAL would include 80% of the true-positive genes, while the top-10 results from only one of these tools would include 72% of true-positive genes.

LIRICAL vs Exomiser Comparison in the LIRICAL Paper [Robinson 2020]

The LIRICAL paper [Robinson 2020] compared performance on 116 previously-solved singleton cases from the 100,000 Genomes Project and showed Exomiser slightly outperforms LIRICAL:

[Robinson et al. 2020] Figure 4

Figure 4: "The x axis shows the rank assigned by LIRICAL or Exomiser to the correct disease gene. The y axis shows the percentage of cases in which the given rank was achieved."

"Considering the 89 diagnoses where Exomiser was not utilized, Exomiser prioritized 57/89 (64%) in first place compared to 51/89 (57%) for LIRICAL."

LIRICAL vs Exomiser on 75 Cases from the Rare Genomes Project

These distributions show the rank that each tool assigns to the correct gene in 75 RGP cases:

LIRICAL and Exomiser rank distributions

To summarize these plots, both Exomiser and LIRICAL rank the correct gene among the top-5 results in 49 out of 75 (65%) cases, so top-5 performance is identical. Performance varies for other cut-offs:

Top-k results	Exomiser: correct genes	LIRICAL: correct genes
top 5	49 out of 75 (65%)	49 out of 75 (65%)
top 4	46 out of 75 (61%)	45 out of 75 (60%)
top 3	37 out of 75 (49%)	42 out of 75 (56%)
top 2	30 out of 75 (40%)	40 out of 75 (53%)
top 1	12 out of 75 (16%)	29 out of 75 (39%)
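
The percentages in this table are simple fractions of the 75 cases. As a minimal sketch of how such a table can be derived from per-case ranks, the snippet below defines a hypothetical helper `top_k_accuracy` (not part of either tool) and applies it to made-up ranks. It then re-derives the rounded percentages from the counts reported above. The `example_ranks` list is illustrative only and is not the RGP data.

```python
from typing import Dict, Optional, Sequence

TOTAL_CASES = 75


def top_k_accuracy(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose correct gene has rank <= k.

    A rank of None means the tool did not report the correct gene at all.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks for 8 illustrative cases (NOT the RGP results).
example_ranks = [1, 3, 2, None, 7, 1, 5, 12]
example_table = {k: top_k_accuracy(example_ranks, k) for k in (1, 2, 3, 4, 5)}

# Counts copied from the table above: {k: number of cases with correct gene in top k}.
exomiser_counts: Dict[int, int] = {5: 49, 4: 46, 3: 37, 2: 30, 1: 12}
lirical_counts: Dict[int, int] = {5: 49, 4: 45, 3: 42, 2: 40, 1: 29}


def as_percent(counts: Dict[int, int], total: int = TOTAL_CASES) -> Dict[int, int]:
    """Convert hit counts to whole-number percentages of `total`."""
    return {k: round(100 * n / total) for k, n in counts.items()}


exomiser_pct = as_percent(exomiser_counts)
lirical_pct = as_percent(lirical_counts)
```
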

Also, the median and mean ranks of the correct gene differ between the 2 tools:

	Exomiser	LIRICAL
Median rank of correct gene	4 *	2 *
Mean rank of correct gene	19 *	33 *

* NOTE: For cases where a tool didn't report the correct gene at all, its rank was arbitrarily set to 200 before calculating the median and mean ranks.
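
Because the value 200 is arbitrary, it helps to see how it affects each statistic. The sketch below uses made-up ranks (not the RGP data) and a hypothetical helper `summarize_ranks`. It shows that the mean moves with the sentinel value, while the median stays the same as long as fewer than half of the cases are missing.

```python
from statistics import mean, median
from typing import Optional, Sequence, Tuple

NOT_FOUND_RANK = 200  # sentinel value described in the note above


def summarize_ranks(
    ranks: Sequence[Optional[int]], not_found_rank: int = NOT_FOUND_RANK
) -> Tuple[float, float]:
    """Return (median, mean) after replacing missing ranks with the sentinel."""
    filled = [not_found_rank if r is None else r for r in ranks]
    return median(filled), mean(filled)


# Hypothetical ranks for 5 illustrative cases; None = correct gene not reported.
example_ranks = [1, 2, 4, None, 3]

median_200, mean_200 = summarize_ranks(example_ranks)          # sentinel = 200
median_1000, mean_1000 = summarize_ranks(example_ranks, 1000)  # sentinel = 1000
```

With these example values the median is 3 under both sentinels, while the mean changes from 42 to 202, so the mean ranks in the table should be read with the choice of 200 in mind.
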

LIRICAL and Exomiser Prioritize Different Genes

Interestingly, LIRICAL and Exomiser ranks aren't well correlated (R² is 0.05) as this plot shows.
NOTE: Here rank=200 is again a special value that means the correct gene wasn't in the results at all for that tool.
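
The post does not say how R² was computed. One common reading is the squared Pearson correlation between the two tools' ranks for the correct gene, with missing genes set to 200. A minimal sketch under that assumption, using a hypothetical helper `r_squared` and made-up inputs:

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Illustrative inputs only (NOT the RGP ranks).
perfectly_correlated = r_squared([1, 2, 3], [2, 4, 6])
uncorrelated = r_squared([1, 2, 3, 4], [1, -1, -1, 1])
```
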

To summarize this plot:

Quadrant	The Correct Gene is in this Quadrant
Bottom-Left: Both Exomiser and LIRICAL top-5	38 out of 75 (51%) cases
Upper-Left: Exomiser but not LIRICAL top-5	11 out of 75 (15%) cases
Bottom-Right: LIRICAL but not Exomiser top-5	11 out of 75 (15%) cases
Upper-Right: Neither tool's top-5	15 out of 75 (20%) cases

Conclusion:

Let's say users have time to evaluate 10 genes per case. They would get better sensitivity by evaluating the top-5 from Exomiser and the top-5 from LIRICAL than by evaluating the top-10 results from just one of the tools. In either scenario, users evaluating 75 cases would look at up to 750 results in total, but the combined top-5 results from Exomiser and LIRICAL would include 60 correct genes (80%), while the top-10 results from only one of these tools would include 54 correct genes (72%).
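
The figure of 60 correct genes follows directly from the quadrant table: it is the number of cases where at least one tool ranks the correct gene in its top-5. The short check below reproduces that arithmetic. The count of 54 for a single tool's top-10 is taken from the text above, since it can't be derived from the quadrant table. Variable names are illustrative.

```python
TOTAL_CASES = 75

# Counts from the quadrant table above.
both_top5 = 38
exomiser_only_top5 = 11
lirical_only_top5 = 11
neither_top5 = 15

# The four quadrants should account for every case.
quadrant_total = both_top5 + exomiser_only_top5 + lirical_only_top5 + neither_top5

# Cases where at least one tool has the correct gene in its top-5.
combined_top5_hits = both_top5 + exomiser_only_top5 + lirical_only_top5
combined_top5_pct = round(100 * combined_top5_hits / TOTAL_CASES)

# Reported in the post for the top-10 of a single tool (not derivable here).
single_tool_top10_hits = 54
single_tool_top10_pct = round(100 * single_tool_top10_hits / TOTAL_CASES)

# Each tool's own top-5 count, for comparison with the top-k table.
exomiser_top5_hits = both_top5 + exomiser_only_top5
lirical_top5_hits = both_top5 + lirical_only_top5
```
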

What Explains the Low Concordance Between LIRICAL and Exomiser?

The low concordance is likely due to how the two tools handle phenotype prioritization. Exomiser uses model organism data (mouse and zebrafish) in addition to OMIM gene-disease associations. It also uses protein-protein interactions to check the phenotype match scores of adjacent genes. All of these are combined into an Exomiser phenotype match score, which is then combined with a variant pathogenicity score to get the final "exomiser score".

LIRICAL computes variant pathogenicity scores the same way as Exomiser (reusing its code and reference data), but takes its own approach to phenotypes, using only OMIM data and computing a post-test probability.

Spot-checking one of the cases where Exomiser ranked the correct gene as #1 and LIRICAL ranked it as #5: 2 out of the 4 false-positive LIRICAL genes weren't in the Exomiser results at all, and the other 2 were quite far down in the list.

Interestingly, the number of HPO terms specified per case does not significantly differentiate the tools.
NOTE: Here rank=200 is again a special value that means the correct gene wasn't in the results at all for that tool.

LIRICAL and Exomiser Command-Lines

The analysis above prefiltered variants to gnomAD v3.1 popmax allele frequencies < 0.01. It then used these commands to run LIRICAL and Exomiser:

The Exomiser command used the default settings defined in this exome_analysis.yml:

java -jar exomiser-cli-13.0.1.jar --analysis exome_analysis.yml

The LIRICAL command used LIRICAL v1.3.4 with default settings, except that --mindiff 200 tells LIRICAL to output up to 200 results regardless of their post-test probability.

java -jar LIRICAL.jar P -p *.phenopacket.json -e /exomiser-cli-13.0.0/2109_hg38 --tsv --mindiff 200

Extra Plots

RGP Phenotype Distribution: This shows HPO term categories for the 75 RGP cases analyzed above. Cases with terms in multiple categories are counted once in each category:

Exomiser Rank vs Exomiser Score: The Exomiser rank and combined score reported for each correct gene are correlated (R²=0.32). Partly because of this, I don't include Exomiser scores in the above analysis.

LIRICAL vs. Exomiser Total Number of Results per Case: This is the total number of results reported by each tool.
NOTE: LIRICAL was run with --mindiff 200, so it outputs up to 200 results per case regardless of their post-test probability.