Briefings in Bioinformatics, vol. 27, no. 2, 2026 (SCI-Expanded, Scopus)
Accurate determination of antibody-antigen (Ab-Ag) complex structures is critical for therapeutic development. While deep learning-based methods, beginning with AlphaFold2 (AF2), have revolutionized multimer prediction, the optimal strategies for Ab-Ag modeling and the reliability of the associated confidence scores remain active areas of research. This study evaluates the performance of AF2, Boltz-1, Boltz-1x, Boltz-2, Chai-1, Protenix, Protenix-1, OpenFold3, and ESMFold on a curated dataset of 200 Ab-Ag complexes. Among the nine methods tested, Protenix-1 emerged as the top performer, with Chai-1 consistently ranking second across multiple success metrics, closely followed by AF2. The effect of recycling iterations varied by method: AF2, Chai-1, and the Protenix variants benefited from increased cycles, whereas the Boltz variants did not. We analyzed various model confidence scores and found high precision for pDockQ2 and high recall for the predicted template modeling (pTM) score. By integrating these two scores, we developed antibody confidence (AntiConf), a novel metric that achieves superior precision and recall for all methods. These strengths make AntiConf a valuable post-prediction score for both computational predictions and downstream experimental workflows, reflecting its potential to improve Ab-Ag complex predictions from AF2 and AF3 architectures. Altogether, this study addresses current limitations in deep learning-based Ab-Ag complex prediction, showcases the potential of AntiConf for future assessment studies, and provides a guideline for improving the accuracy of Ab-Ag complex prediction.
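
The precision and recall comparison of confidence scores described above can be illustrated with a minimal Python sketch. It assumes a confidence score is turned into an accept/reject call with a single cutoff and compared against a per-model correctness label. The function name `precision_recall`, the cutoff, and the toy values are hypothetical and are not taken from the study. The sketch does not reproduce the AntiConf formula, which the abstract does not specify.

```python
from typing import Sequence, Tuple


def precision_recall(
    scores: Sequence[float],
    is_correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Precision and recall of the rule "accept a model if score >= threshold".

    scores      -- one confidence value per predicted complex (hypothetical input)
    is_correct  -- True if that prediction counts as a success
    threshold   -- illustrative cutoff, not a value from the study
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must have the same length")

    tp = fp = fn = 0
    for score, correct in zip(scores, is_correct):
        accepted = score >= threshold
        if accepted and correct:
            tp += 1  # accepted and actually correct
        elif accepted and not correct:
            fp += 1  # accepted but wrong
        elif not accepted and correct:
            fn += 1  # rejected although correct

    # Convention chosen here: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up toy values for illustration only.
    toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7]
    toy_labels = [True, False, True, False, True]
    print(precision_recall(toy_scores, toy_labels, threshold=0.5))
```

In this framing, a high-precision score rarely accepts an incorrect model, while a high-recall score rarely rejects a correct one, which is why combining two scores with complementary behavior can be attractive.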