Characterization of the major histocompatibility complex class II in primates of the genus Aotus

Carlos Fernando Suárez Martínez

"Doctoral thesis submitted as a requirement for the degree of Doctor in Biomedical and Biological Sciences, Universidad del Rosario"

Bogotá D.C., 2017

Student
Carlos Fernando Suárez Martínez

Directors
Manuel Alfonso Patarroyo Gutiérrez M.D., Dr.Sc.
Fundación Instituto de Inmunología de Colombia (FIDIC)
Universidad del Rosario

Luis Fernando Cadavid Gutiérrez M.D., Ph.D.
Universidad Nacional de Colombia

DOCTORATE IN BIOMEDICAL AND BIOLOGICAL SCIENCES
UNIVERSIDAD DEL ROSARIO

Acknowledgments
I wish to express my gratitude to my family, especially my parents, for their constant support and for being my moral compass. To my directors, Dr. Manuel Alfonso Patarroyo and Dr. Luis Fernando Cadavid, for their contributions and for the freedom and trust with which they allowed me to develop this project. To Professor Manuel Elkin Patarroyo, for his generosity, tenacity and inspiration. To my colleagues at FIDIC, especially Carolina López, Hugo Bohórquez and Ronald González, without whose support and contributions this project would not have been possible. To the Universidad del Rosario, for the extraordinary opportunity to pursue my studies, and especially to Dr. Luisa Matheus, for her diligence and help in making every procedure as simple as possible.

Contents
Abstract
Summary
Introduction
Aotus, general features and distribution
Aotus as an experimental model
Characterization of Aotus immune system molecules to corroborate its suitability as an experimental model
Major histocompatibility complex. General features
MHC. Polymorphism and convergence
MHC. Polymorphism and presentation repertoire
MHC. Prediction of binding peptides
Study of the MHC-peptide interaction using quantum methods
MHC architecture and vaccine design
Objectives
General objective
Specific objectives
Preamble to the chapters
Polymorphism
Types of amino acid substitution
Evaluation and analysis of MHC-peptide binding
Chapter 1. Characterisation and comparative analysis of MHC-DPA1 exon 2 in the owl monkey (Aotus nancymaae)
Chapter 2. Characterising a Microsatellite for DRB Typing in Aotus vociferans and Aotus nancymaae
Chapter 3. Structural analysis of owl monkey MHC-DR shows that fully protective malaria vaccine components can be readily used in humans
Chapter 4. Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements
Chapter 5. Semi-empirical quantum evaluation of peptide – MHC class II binding
General conclusions
Perspectives and recommendations
References
Annex 1. MHC-DRB pocket dictionary
Annex 2. TCR-contacting residues orientation and HLA-DR* binding preference determine long-lasting protective immunity against malaria
Annex 3. Estimation of the frequency of MHC-DRB allelic lineages in human populations
Annex 4. Use of the FMO-PIEDA methodology to analyze the effect of mutations in proteins

LIST OF FIGURES AND TABLES
Figure 1. Genomic organization of HLA and domain arrangement of MHC I and II.
Figure 2. Architecture of MHC-DR.
Figure 3. Persistence of secondary structure and hydrogen-bond network in MHC-DR.
Figure A, Annex 1. Human/Aotus MHC-DRB pocket 1 profiles.
Table 1, Annex 1. Most frequent pocket profiles in HLA-DRB.
Table 2, Annex 1. Most frequent pocket profiles in Aotus MHC-DRB.

Abstract
The study of the major histocompatibility complex (MHC) of monkeys of the genus Aotus, together with an understanding of the MHC-peptide binding process, is important for recognizing the similarities and differences between the immune responses of humans and Aotus monkeys. This bears on the appropriate use of these animals as experimental models in vaccine and drug development, and on the validity of the conclusions drawn from them. The aim of this work is to contribute to knowledge of the Aotus MHC class II. Determining the sequences of the MHC-DPA and MHC-DRA genes completed the characterization of the Aotus MHC, contributing to the validation of this primate as an experimental model and expanding knowledge of MHC gene evolution in primates. In addition, convergence and polymorphism of MHC-DR genes in primates were analyzed in depth. Computational methods for modeling MHC-peptide binding (based on quantum chemistry and neural networks) were also implemented as tools for understanding how MHC class II presents peptides to T lymphocytes. The study of peptide-binding region polymorphism led to strategies (pocket profiles) that efficiently reduce the number of systems to be considered when designing peptides as malaria vaccine candidates. Data mining of Ramachandran distributions yielded an amino acid structural similarity scale intended for use in developing peptide vaccine candidates.
It was also found that protein secondary structure is clearly related to evolutionary substitution patterns and amino acid mutability. The result is a conceptual framework for developing peptide-based vaccines, grounded in the study of MHC polymorphism, in the physicochemical and structural constraints that shape the molecular recognition involved in the MHC-peptide interaction, and in the application of computational methods to quantify MHC-peptide binding.

Summary
Studying the Aotus major histocompatibility complex (MHC) and understanding MHC-peptide binding are important for recognizing similarities and differences in the immune response of humans and Aotus. This has implications for the appropriate use of these animals as experimental models in vaccine and drug development, and for the validity of the conclusions reached with them. This work aimed to increase knowledge of MHC class II in monkeys of the genus Aotus. Determining the sequences of the MHC-DPA and MHC-DRA genes made it possible to complete the characterisation of the Aotus MHC, contributing to validating this primate as an experimental model and increasing knowledge of MHC gene evolution in primates. It also included an in-depth analysis of MHC-DR gene convergence and polymorphism in primates. The study involved computational methods for modelling MHC-peptide binding (based on quantum chemistry and neural networks) as tools for understanding the mechanisms of MHC class II peptide presentation to T lymphocytes. Studying peptide-binding region polymorphism enabled strategies (pocket profiles) for efficiently reducing the number of systems to be considered when designing peptides as candidates for an antimalarial vaccine.
Data mining of Ramachandran distributions led to an amino acid structural similarity scale for use in designing peptide vaccine candidates. Protein secondary structure was found to have a clear relationship with evolutionary patterns of amino acid substitution and mutability. A conceptual framework has thus emerged for developing peptide-based vaccines, based on the study of major histocompatibility complex polymorphism, the physicochemical and structural restrictions shaping the molecular recognition involved in the MHC-peptide interaction, and the use of computational methods for quantifying MHC-peptide binding.

Introduction
Aotus, general features and distribution
All species of this genus are small (50–80 cm) and weigh 500–1000 g. Their pelage ranges from grey to bright brown, with reddish coloring around the neck, on the inner side of the limbs and at the base of the tail, which is not prehensile. The taxonomic classification of Aotus species is complex because of their great morphological similarity, which has hindered consensus on the number of species. Several taxonomic studies based on phenotypic and cytogenetic characteristics and on the geographic distribution of Aotus monkeys have proposed the existence of 9 to 12 species, ranging from Panama to northern Argentina (3-7). There is a fossil record of the genus in the middle Miocene fauna of La Venta, Colombia, dated to 12–15 million years ago (Aotus dindensis) (8, 9). The origin of the genus has been dated to about 20 million years ago (~19.3 million years using 54 nuclear genes (10), or ~20.0 million years using mitochondrial genomes (11)). The divergence of the extant species has been estimated at 3.1–6.4 million years ago based on several mitochondrial regions (12), or at 3.2–7.9 million years ago using nuclear genes (10).
Species of this genus are found from sea level up to 3,200 m, in tropical and subtropical moist forests. Aotus is the only group of nocturnal Neotropical primates, a trait that represents an adaptive advantage for reproduction and survival (5-7, 13-18). Seven species have been reported in Colombia to date: Aotus zonalis (northern Pacific region), A. griseimembra (Atlantic coast and Andean region), A. lemurinus (Atlantic coast and Andean region), A. brumbacki (Meta department), A. vociferans (Amazon region), A. nancymaae (Amazon region) and A. jorgehernandezi (Andean region) (3-7, 19).

Aotus as an experimental model
The availability of well-characterized animal experimental models is essential for developing therapeutic methods, and it also supports research in comparative immunology and in the evolution of the immune system. The need for primate models is underscored by the inability of other widely used animal models (such as the mouse) to develop diseases or infectious processes specific to humans (for example, hypertension and osteoporosis occur naturally in all primates). Experimental information obtained in primates is more readily extrapolated to humans and other primates, making it possible to determine treatment efficacy where other animal models fail (20-22). For the last 35 years, monkeys of the genus Aotus (family Aotidae, parvorder Platyrrhini) have been used in malaria vaccine development, first by the Instituto de Inmunología of the Hospital San Juan de Dios and later by the Fundación Instituto de Inmunología de Colombia (FIDIC) (23, 24). Some Aotus species have been used as a model for studying malaria for more than 50 years (25, 26).
Unlike other primate models, Aotus monkeys are susceptible to sporozoite infection, which allows vaccines and drugs to be developed against every stage of the disease (21). These monkeys are also susceptible to other human diseases, such as leishmaniasis, schistosomiasis, hepatitis, tuberculosis and several enteric infections such as campylobacteriosis, and they are used to study these diseases and to develop drugs against them (27-33). Aotus is also one of the best-known primate models for the physiology of vision and the electrophysiology of the central nervous system (34). All of the above, together with their ease of handling in the laboratory (size, adaptation, longevity), makes primates of this genus a valuable model and justifies deepening the biological knowledge of the species that belong to it.

Characterization of Aotus immune system molecules to corroborate its suitability as an experimental model
Several key components of the Aotus immune system have been characterized: KIRs (35), CD1 (36), CD3 (37), CD45 (38, 39), IGKV (40), IGHV (41), TCR (42-44), some cytokines (45), Toll-like receptors (46), dendritic cells (47), T cells (48), the lymphoproliferative profile (49) and splenocytes (50). Beyond these, the major histocompatibility complex (MHC) genes are of particular interest. The proteins encoded by MHC genes play a central role in self/non-self recognition by presenting peptides for recognition by T cells, and are therefore fundamental to defense against foreign agents. Genetic variation of the MHC is key to understanding host responses to vaccines (51, 52). In Aotus, both class I (53-55) and class II (56-63) genes have been characterized.
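Comparisons between characterized Aotus immune molecules and their human counterparts are usually summarized as percent identity over a pairwise alignment. The minimal sketch below assumes the two sequences are already aligned (gaps written as '-') and counts identical residues over ungapped columns only; this is one of several conventions in use, and the two strings are illustrative stand-ins, not actual Aotus or human sequence data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are excluded from the
    denominator (one common convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Illustrative aligned fragments (not real data).
seq_a = "RFLEYSTSECHFFNGTERVRFL"
seq_b = "RFLEQVKHECHFFNGTERVRYL"
print(round(percent_identity(seq_a, seq_b), 1))  # → 77.3
```

Excluding gapped columns keeps indels from lowering the score; other conventions divide by the full alignment length or by the shorter sequence, so reported identities are only comparable when the same convention is used.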
Aotus shows high identity (>~80%) with humans in all the immune system molecules characterized to date, supporting the feasibility of using it to obtain results that can be extrapolated to humans (64).

Major histocompatibility complex. General features
MHC genes form a multigene family encoding receptor glycoproteins expressed on the cell membrane. These proteins play a central role in self/non-self recognition and are key to defense against foreign agents, since they present peptides for recognition by T cells. In humans and other primates, these genes are organized in a cluster with other genes, mostly immune-related, and are divided into three chromosomal regions (I, II and III) that also reflect functional specializations. This arrangement is relatively conserved across mammals: (I) the MHC class I region, whose molecules have a peptide-binding region formed by two domains (α1 and α2) encoded by a single gene; class I molecules are expressed in all nucleated cells and present peptides of intracellular origin to CD8+ T lymphocytes. This region also contains other genes critical for antigen processing, such as tapasin. (II) The class II region, whose peptide-binding region is encoded by two genes (α and β chains); class II molecules are expressed on antigen-presenting cells such as monocytes, macrophages and B lymphocytes, and present to CD4+ T lymphocytes peptides acquired mainly through endocytosis/phagocytosis of exogenous proteins or by direct loading at the cell surface. (III) The class III region, whose genes encode other immune components, such as the complement system (e.g., C2, C4, factor B) and cytokines (e.g., TNF-α) (Figure 1).

Figure 1. Genomic organization of HLA and domain arrangement of MHC I and II. A. Representation of the human major histocompatibility complex on chromosome 6p21 (taken from (1)). B. Domain architecture of MHC I and II (own figure).

The MHC region shows synteny across mammals. In humans it lies on chromosome 6, contains 140 genes and spans 3.6 Mbp (65), so it can be used as a template for characterizing this region in other primates, as in Macaca fascicularis, where the region spans 4.3 Mbp (66).

MHC. Polymorphism and convergence
The MHC includes the most polymorphic genes in vertebrates and is a paradigmatic model for studying mechanisms of adaptation at the molecular level (67-69). For example, for the most polymorphic human loci (HLA-B in class I and HLA-DRB in class II), more than 4,800 HLA-B alleles and more than 2,100 HLA-DRB alleles have been reported to date in the IMGT/HLA database (70). Population dynamics (changes in population size and genetic drift), recombination, gene conversion and natural selection are the forces that generate and shape MHC polymorphism (71-73). Multiple processes have been proposed to maintain MHC polymorphism (74): balancing selection, whether through overdominance (75, 76), frequency-dependent selection (77, 78) or spatial/temporal variation in pathogen pressure (52, 79, 80). Other mechanisms, not directly linked to the host-pathogen relationship, involve mating patterns based on odor- or symmetry-related traits that maximize offspring heterozygosity, avoiding inbreeding and favoring mating with individuals carrying a different MHC repertoire (81-89). There are also reproductive mechanisms related to selective fertilization (90, 91).
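To illustrate how overdominance alone can keep two alleles segregating, the classical one-locus, two-allele model of heterozygote advantage can be iterated numerically. This is a textbook population-genetics sketch, not an analysis from this thesis; the genotype fitnesses (A1A1 = 1 − s, A1A2 = 1, A2A2 = 1 − t) and the values of s and t are arbitrary assumptions. Under random mating the frequency of A1 converges to p* = t/(s + t) from any starting value strictly between 0 and 1.

```python
def next_freq(p: float, s: float, t: float) -> float:
    """Frequency of allele A1 after one generation of selection.

    Genotype fitnesses: A1A1 = 1 - s, A1A2 = 1, A2A2 = 1 - t
    (heterozygote advantage); random mating is assumed.
    """
    q = 1.0 - p
    w11, w12, w22 = 1.0 - s, 1.0, 1.0 - t
    mean_w = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    # Marginal fitness of A1 times its frequency, normalized by mean fitness.
    return p * (p * w11 + q * w12) / mean_w


def iterate(p0: float, s: float, t: float, generations: int = 500) -> float:
    """Apply selection for a fixed number of generations."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


s, t = 0.2, 0.1  # hypothetical selection coefficients
# Both a rare and a common starting allele approach p* = t / (s + t) = 1/3.
print(round(iterate(0.01, s, t), 4), round(iterate(0.99, s, t), 4))  # → 0.3333 0.3333
```

The stable interior equilibrium is what distinguishes overdominance from directional selection, which would drive one allele to fixation; frequency-dependent or fluctuating selection require different models.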
The MHC also shows adaptive trans-species polymorphism (TSP), with long-lived alleles shared by several species (92-94). Furthermore, studies of MHC-DRB evolution have shown molecular convergence in the peptide-binding region between New World primates (Platyrrhini) and Old World primates (Catarrhini) (60, 62, 95, 96). Alleles shared by New World and Old World primates may reflect the conservation of peptide-binding motifs (97).

The Aotus research initiative at FIDIC has allowed us to characterize several MHC genes in A. nancymaae, A. vociferans and A. nigriceps, focusing on their variability and comparing them with humans and other primates. All of these studies have focused essentially on the peptide-binding region (encoded by exon 2 in MHC class II and by exons 2 and 3 in class I). MHC class I has been characterized (53, 54), as well as MHC class II: DQA and DQB (58), DPA (61), DPB (59) and DRB (56, 60, 62, 63). The evidence to date indicates that the most polymorphic MHC class II locus in Aotus is MHC-DRB (as in humans and other primates), in contrast to MHC-DRA, which shows very low polymorphism; MHC-DRB is followed by MHC-DQ (A and B) and finally MHC-DP (A and B), an order that differs from humans and resembles the rhesus monkey (98-100). Although the divergence between New World monkeys and humans can be dated to ~43 million years ago (10, 11, 101), these studies indicate that Aotus and human MHC genes share some similarities, either through homology (MHC class I) (53, 54) or through convergence (MHC class II DRB) (60, 62).

MHC. Polymorphism and presentation repertoire
A critical step in antigen presentation is the binding of peptides to the MHC for presentation to the T-cell receptor (Figure 2A). The MHC peptide-binding region is formed by a set of sub-sites called binding pockets (Figure 2B). Typically, in class II the peptide is anchored through a binding core of 9 amino acids, and multiple binding frames are possible because the groove is open (Figure 2C). In MHC class I, by contrast, the binding groove is closed, which imposes a single binding frame (102, 103). These features give MHC class II molecules a larger ligand repertoire than class I molecules (104).

Figure 2. Architecture of MHC-DR. A. Structure of HLA-DR1 presenting a triosephosphate isomerase peptide to the TCR (PDB=2IAN). B. Top view of HLA-DR1 (PDB=1DLH) presenting the hemagglutinin peptide; pocket 1 in purple, pocket 4 in dark blue, pocket 6 in orange, pocket 7 in grey and pocket 9 in green. C. Top view of the peptide-binding region (in grey), showing the possible binding frames of the peptide to the MHC. The open structure of the receptor allows multiple binding frames (own figure generated from coordinates downloaded from the Protein Data Bank (PDB)).

Thus, although the repertoire of peptides that a given MHC molecule can recognize is vast, it is constrained by the architecture of the receptor. This constraint on the presentation repertoire is related to the diversity of MHC molecules: as evidence, allelic diversity is concentrated in the residues involved in peptide binding (67-69).
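Because the class II groove is open, a peptide of length L can in principle be accommodated in L − 8 overlapping 9-residue cores. The sketch below enumerates those frames and lists, for each, the residues that would occupy the pockets colored in Figure 2B (core positions P1, P4, P6, P7 and P9). Treating every frame as a candidate is a simplifying assumption. The example uses the influenza hemagglutinin 306–318 sequence; for this peptide bound to HLA-DR1 the frame usually reported places Tyr in P1 (offset 2 below).

```python
from typing import Dict, List, Tuple

CORE_LENGTH = 9
# Core positions (1-based) whose side chains point into the main DR pockets.
POCKET_POSITIONS = (1, 4, 6, 7, 9)


def binding_frames(peptide: str) -> List[Tuple[int, str, Dict[str, str]]]:
    """Enumerate every 9-mer core of a peptide in an open (class II) groove.

    Returns (offset, core, pocket residues) for each frame; offset is the
    0-based index within the peptide of the residue placed in pocket P1.
    Peptides shorter than the core yield an empty list.
    """
    frames = []
    for offset in range(len(peptide) - CORE_LENGTH + 1):
        core = peptide[offset:offset + CORE_LENGTH]
        pockets = {f"P{pos}": core[pos - 1] for pos in POCKET_POSITIONS}
        frames.append((offset, core, pockets))
    return frames


for offset, core, pockets in binding_frames("PKYVKQNTLKLAT"):
    print(offset, core, pockets)
```

A 13-mer therefore yields five candidate cores; in a closed class I groove the peptide termini are fixed, which is why only one frame needs to be considered there.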
Lineage diversification toward the greatest possible presentation capacity is related to the existence of allelic variants and trans-species polymorphism (105). The sets of peptides that different MHC molecules can bind may overlap, so MHC supertypes can be defined on the basis of these binding spectra. Similar binding capacity is generally associated with significant similarity in the sequences that form the binding pockets, and it has implications for the resistance of natural populations (106-111). Estimating peptide binding capacity to the MHC is of primary interest for optimizing vaccine design (103), and, from the standpoint of validating the experimental model, it is important to study whether the similarity between human and Aotus binding pockets implies similar peptide-binding repertoires.

MHC. Prediction of binding peptides
Estimating peptide binding to the MHC experimentally is complex. Obtaining a viable receptor for binding assays, whether by purification from immortalized cell lines or by expression using recombinant DNA technology, requires laborious and costly procedures. Moreover, the number of systems to study is enormous, given MHC polymorphism and the diversity of potentially binding peptides (112, 113). For these reasons, implementing and developing computational methods to study and predict the MHC-peptide interaction is a rational and necessary alternative.

Computational methods for estimating the MHC-peptide interaction can be divided into sequence-based and structure-based methods.
The former use experimental binding data as a starting point to generate position-specific binding motifs (114-121), artificial intelligence methods (neural networks such as NetMHCIIpan) (122-125), hidden Markov models (126-128) and support vector machines/kernel methods (129-132). These approaches address only the prediction problem and provide no insight into the nature of the peptide-MHC binding process. Structure-based methods estimate peptide-MHC binding energy from structural properties and do not require training with experimental binding data. At first sight this approach is more attractive because, besides being predictive, it makes it possible to dissect the processes involved in binding. The most widely used approach for calculating binding energy relies on classical molecular mechanics approximations, such as molecular dynamics, in which force fields describing the types and magnitudes of the interactions involved are used to estimate the change in Gibbs free energy upon formation of the MHC-peptide complex, defined as the difference in free energy between the free and bound peptide (133-141). Both structure-based and sequence-based methods promise to predict MHC-peptide binding while reducing the cost of verifying it experimentally in the wet lab. However, the artificial intelligence approach has developed faster than the structural approach, producing promising results (especially for class I molecules) that so far surpass those of structural methods, but that depend on the quantity and quality of the data set used for training.
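As an illustration of the position-specific motif methods mentioned above (not of NetMHCIIpan or any other published tool), a peptide can be scored by sliding a 9-position scoring matrix over every possible class II binding frame and keeping the best-scoring core. The matrix below is a toy: its weights are hypothetical, loosely echoing the hydrophobic/aromatic P1 preference commonly described for HLA-DR, whereas real matrices are derived from experimental binding data. Positions are assumed to contribute independently and additively.

```python
from typing import Dict, Tuple

# Hypothetical position-specific weights: core position (1-9) -> residue -> score.
# Positions or residues not listed score 0. Values are illustrative only.
TOY_MATRIX: Dict[int, Dict[str, float]] = {
    1: {"Y": 2.0, "F": 2.0, "W": 1.8, "L": 1.0, "I": 1.0, "V": 0.8},
    4: {"Q": 0.8, "L": 0.6, "M": 0.6},
    6: {"T": 0.7, "S": 0.6, "G": 0.3},
    9: {"L": 0.8, "I": 0.6, "V": 0.5},
}


def score_core(core: str, matrix: Dict[int, Dict[str, float]]) -> float:
    """Additive score of a 9-mer core, assuming independent positions."""
    return sum(matrix.get(pos, {}).get(aa, 0.0) for pos, aa in enumerate(core, start=1))


def best_frame(peptide: str, matrix: Dict[int, Dict[str, float]]) -> Tuple[int, str, float]:
    """Slide over every 9-mer frame; return (offset, core, score) of the best one."""
    if len(peptide) < 9:
        raise ValueError("peptide is shorter than the 9-residue core")
    candidates = (
        (i, peptide[i:i + 9], score_core(peptide[i:i + 9], matrix))
        for i in range(len(peptide) - 8)
    )
    return max(candidates, key=lambda c: c[2])


print(best_frame("PKYVKQNTLKLAT", TOY_MATRIX))
```

With these toy weights the best frame places Tyr in P1 (core YVKQNTLKL). The additive, independent-position assumption is exactly what neural-network predictors relax, which is one reason they outperform simple matrices when enough training data exist.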
For Aotus, where the means to perform MHC-peptide binding assays have not been developed, the most suitable option is an approach based solely on structural properties inferred from sequence.

Study of the MHC-peptide interaction using quantum methods
Studying the MHC-peptide interaction with computational theoretical chemistry has been one of FIDIC's lines of research, in which we have chosen to analyze these systems with quantum chemistry, using ab initio methods (142-148). This approach has focused on understanding receptor-ligand interaction mechanisms (mainly by analyzing binding-pocket residues), using electrostatic properties such as multipole moments and the electrostatic potential, together with wave function analysis to identify the orbitals that contribute to MHC-peptide binding. As a result, the key residues in the MHC-peptide interaction have been identified, the electrostatic landscape has been described, the relative importance of each pocket in binding has been established and per-pocket amino acid binding profiles have been estimated. These findings reproduce the observed experimental trends (132, 149), demonstrating the plausibility and descriptive power of this approach. Despite its robustness, the computational cost (hardware vs. time) of studying macromolecules with ab initio quantum mechanics limits the size of the systems that can be studied. The development over the last decade of semi-empirical methods, parallel processing strategies and fragmentation techniques has addressed this problem, making it possible to analyze whole proteins in reasonable times (150, 151).
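The fragmentation techniques mentioned above can be illustrated with the two-body Fragment Molecular Orbital expansion (FMO2), in which the total energy is approximated from fragment and fragment-pair energies, E ≈ Σ_I E_I + Σ_{I>J} (E_IJ − E_I − E_J). The pair terms are the pair interaction energies (PIEs), which PIEDA decomposes into electrostatic (ES), exchange-repulsion (EX), charge-transfer plus mixing (CT) and dispersion (DI) components. The sketch below does not run FMO; it post-processes a hypothetical PIEDA table (made-up values in kcal/mol, fragment names invented for illustration, intermolecular pairs only) to obtain a peptide-MHC interaction energy, its breakdown by component and by MHC residue. Summing PIEs ignores deformation and desolvation terms.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical PIEDA output: (MHC fragment, peptide fragment) -> components (kcal/mol).
# ES: electrostatic, EX: exchange-repulsion, CT: charge transfer + mixing, DI: dispersion.
PIEDA_TABLE: Dict[Tuple[str, str], Dict[str, float]] = {
    ("mhc_A", "pep_1"): {"ES": -10.0, "EX": 4.0, "CT": -2.0, "DI": -1.0},
    ("mhc_B", "pep_1"): {"ES": -3.0, "EX": 1.0, "CT": -0.5, "DI": -1.5},
    ("mhc_A", "pep_4"): {"ES": -2.0, "EX": 0.5, "CT": -0.5, "DI": -1.0},
    ("mhc_C", "pep_9"): {"ES": 1.5, "EX": 0.0, "CT": 0.0, "DI": -0.5},
}


def peptide_mhc_interaction(table: Dict[Tuple[str, str], Dict[str, float]]):
    """Sum inter-fragment PIEs between peptide and MHC fragments.

    Returns (total, totals per PIEDA component, totals per MHC fragment).
    """
    total = 0.0
    by_component: Dict[str, float] = defaultdict(float)
    by_mhc_fragment: Dict[str, float] = defaultdict(float)
    for (mhc_frag, _pep_frag), components in table.items():
        pie = sum(components.values())  # PIE = ES + EX + CT + DI
        total += pie
        for name, value in components.items():
            by_component[name] += value
        by_mhc_fragment[mhc_frag] += pie
    return total, dict(by_component), dict(by_mhc_fragment)


total, components, per_residue = peptide_mhc_interaction(PIEDA_TABLE)
print(total, components)
# Rank MHC fragments from most attractive (most negative) to least.
print(sorted(per_residue.items(), key=lambda item: item[1]))
```

Ranking fragments by summed PIE is one simple way to flag candidate key contact residues; comparing such tables before and after an in silico substitution gives a per-pair view of the substitution's effect.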
We have therefore implemented the semi-empirical methods PM7 (152) and DFTB (153) to treat proteins. These semi-empirical quantum chemistry methods rest on the same formalisms as ab initio methods (Hartree-Fock theory for the former and density functional theory for the latter), but they make various simplifications and take some parameters from empirical data to compensate for the inaccuracies those simplifications introduce (151, 154). In addition, our group has implemented the Fragment Molecular Orbital (FMO) method (155, 156) together with PIEDA (pair interaction energy decomposition analysis) (157). These methods divide the molecule into fragments (here, at the amino acid level) and compute energies for each of them, so that the properties of the whole system, or of parts of it, can be obtained by combining those of the fragments. As a result, we have been able to simulate MHC-peptide binding considering the entire system, with results more accurate than those obtained by other structure-based approaches (158). This approach allows detailed evaluation of the effects of amino acid substitutions in proteins and can be applied both to the MHC and to other systems (159).

MHC architecture and vaccine design
In developing peptide-based vaccines, FIDIC has implemented a methodology based on modifying peptides derived from conserved regions of parasite proteins that are critical for multiple biological functions, including invasion of host cells (HABPs, high activity binding peptides) (24, 160).
These peptides are modified according to substitution principles that involve two kinds of property:
- physicochemical properties, such as mass, volume and polarity;
- structural properties, such as the distance between the MHC-anchoring residues (161), side-chain orientation (162) and secondary structure (163).

These substitutions ultimately change the peptide's affinity for the MHC. This triggers an immune response against these regions, which would otherwise be immunologically silent (24). Fitting to MHC-DR is especially relevant for a malaria vaccine, because immunity to the parasite is mainly controlled by this molecule (164-166), not only in humans but also in other species (110, 111). Our studies have shown that human and Aotus MHC-DR are similar in polymorphism, selective pressures and correlation with immune activity (57, 60, 64, 161, 166, 167).

Figure 3. Persistence of secondary structure and of the hydrogen-bond network in MHC-DR. A. Ten peptides bound to human and murine MHC-DR. Note the remarkable conservation of secondary structure and side-chain orientation. Positions anchoring to pockets 1, 4, 6 and 9 are shown in bold. B. Secondary-structure conformation of 50 MHC-peptide complexes, including both human and mouse MHC-DR molecules (A and B are the author's own plots). C. Top view of the hydrogen-bond network of the haemagglutinin peptide bound to HLA-DR1 (taken from (2)).

The largest contribution to peptide-MHC binding comes from a set of conserved hydrogen bonds that interact with the peptide backbone. As a result, bound peptides adopt secondary structures that are variations on a single theme (Figure 3).
The side chains of the residues that interact with the binding pockets contribute a specific interaction that modulates binding affinity (103, 168-170). The consensus secondary structure of MHC-binding peptides is the polyproline II helix (PPII). Together with α-helices and β-sheets, it is one of the three stable structures observed in natural proteins. This structure favours protein-protein interactions and is frequently found at binding sites (163, 171).

Structural variables, especially secondary-structure propensities, influence MHC-peptide interactions. Designing binding peptides therefore requires a tool for making substitutions according to a structural-similarity criterion. We had previously characterised amino acids according to non-structural properties, which allowed substitution principles to be established (144). We then analysed the Ramachandran distributions of amino acids in available crystal structures and established a quantitative measure of their structural similarity. This measure can improve the design of peptides able to adopt the conformation favourable for MHC binding. It also reveals a fundamental finding: structural propensities, together with mass, explain the evolutionary substitution patterns of amino acids (172).

Objectives

General objective
- To complete the characterisation of the major histocompatibility complex class II (MHC-DPA, MHC-DRA and MHC-DRB) in Aotus nancymaae and Aotus vociferans.

Specific objectives
- To characterise the sequences of the MHC-DPA, MHC-DRA and MHC-DRB genes in Aotus nancymaae and Aotus vociferans.
- To carry out a comparative analysis of the evolution of the MHC-DPA, MHC-DRA and MHC-DRB genes in the context of primates.
- To study the patterns and nature of protein-level variation in the classical MHC class II molecules MHC-DRA and MHC-DRB, modelling their structure and peptide-binding profiles with computational methods.

Preamble to the chapters

The common thread of this work is solving problems related to the development of vaccines for human use, with monkeys of the genus Aotus as the animal model. The work continues earlier efforts to characterise the immune system of these primates and to establish how similar humans and Aotus are. This comparison is also an opportunity to understand how these molecules evolve. The work further continues the development and application of computational methods to elucidate the mechanisms involved in MHC-peptide binding.

In FIDIC's malaria vaccine development, the methodology focuses on designing peptides that bind the MHC successfully. Such binding is a sine qua non condition for a successful protective response. Strategies for designing "tailor-made peptides" for the MHC must address several problems:
- Polymorphism (in both humans and Aotus).
- The types of amino acid substitution needed to ensure that peptides fit the MHC.
- Evaluation and analysis of MHC-peptide binding.

Polymorphism

The high polymorphism of MHC molecules greatly complicates peptide design. Unlike in other molecular design problems, the receptor is not unique, so the number of possible solutions increases enormously. Besides estimating the magnitude of this polymorphism, strategies to handle it must therefore be devised.
In this work, we completed the characterisation of the genes encoding the α domain of MHC-DP (61) (Chapter 1) and of MHC-DR; both showed limited polymorphism. In contrast, Aotus MHC-DRB is highly polymorphic. Our characterisation revealed not only new alleles but also new allelic lineages in these primates (62) (Chapter 2). Characterising this polymorphism experimentally is a challenge in itself. We therefore described a microsatellite associated with intron 2 of Aotus MHC-DRB and evaluated its ability to discriminate between MHC-DRB alleles, with promising results (62) (Chapter 2). Overall, MHC-DR polymorphism in Aotus resembles that observed in humans: MHC-DRA shows limited polymorphism, whereas MHC-DRB is highly polymorphic.

Computational modelling of systems with thousands of receptors and millions of peptides is unfeasible. To reduce the number of molecules needed to describe the binding pockets and to support model-based inferences, we focused solely on the residues crystallographically defined as critical for binding. In this way, we generated an MHC-DRB "pocket dictionary" (Annex 1), which effectively reduces the number of systems to be considered. For example, only 27 pocket 1 variants are found in humans and Aotus. Two of them account for 91% in humans and 72% in Aotus (the V/G dimorphism at position 86, Figure A, Annex 1). Each MHC-DRB allele can be described as the concatenation of its pockets, giving a "pocket profile" that directly reduces the number of alleles to be considered for peptide design.
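The pocket-profile idea just described can be sketched in a few lines: each allele is reduced to the residues at pocket positions, and alleles with the same tuple of pocket variants are grouped. The sketch makes the following assumptions:
- The pocket definitions (`POCKETS`), the allele names (`X*01` to `X*05`) and their residues are hypothetical placeholders. They are not the crystallographic pocket dictionary of Annex 1.
- The helper names are illustrative only.
- The "prototype" of a profile is taken, as an assumption, to be the first allele listed that carries it.

```python
"""Sketch: describe alleles as 'pocket profiles' and group them.

Assumptions: the pocket definitions and allele residues below are
hypothetical placeholders, not the pocket dictionary of Annex 1.
"""
from collections import Counter

# Hypothetical pocket definitions: pocket name -> DRB beta-chain positions.
POCKETS = {
    "P1": (85, 86),
    "P4": (13, 71),
    "P9": (9, 57),
}


def pocket_variant(allele, positions):
    """Concatenate the residues an allele carries at the given positions."""
    return "".join(allele[pos] for pos in positions)


def pocket_profile(allele):
    """An allele's profile is the tuple of its pocket variants, in pocket order."""
    return tuple(pocket_variant(allele, POCKETS[name]) for name in sorted(POCKETS))


def prototypes(alleles):
    """Map each profile to the first allele (in input order) that carries it."""
    result = {}
    for name, residues in alleles.items():
        result.setdefault(pocket_profile(residues), name)
    return result


def profile_coverage(alleles, top_k):
    """Return the top_k most frequent profiles and the fraction of alleles they cover."""
    counts = Counter(pocket_profile(residues) for residues in alleles.values())
    top = counts.most_common(top_k)
    return top, sum(n for _, n in top) / len(alleles)


# Hypothetical alleles: position -> residue (only pocket positions are needed).
ALLELES = {
    "X*01": {85: "V", 86: "G", 13: "F", 71: "R", 9: "W", 57: "D"},
    "X*02": {85: "V", 86: "G", 13: "F", 71: "R", 9: "W", 57: "D"},
    "X*03": {85: "V", 86: "V", 13: "F", 71: "R", 9: "W", 57: "D"},
    "X*04": {85: "V", 86: "G", 13: "F", 71: "R", 9: "W", 57: "D"},
    "X*05": {85: "A", 86: "G", 13: "S", 71: "K", 9: "E", 57: "S"},
}

if __name__ == "__main__":
    top, covered = profile_coverage(ALLELES, top_k=1)
    print(top, covered)
    print(prototypes(ALLELES))
```

In this toy set, the single most common profile covers three of the five hypothetical alleles. At small scale, this mirrors how a few profiles can summarise many alleles in a lineage.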
As an illustration, 130 alleles have been reported for the HLA-DRB1*01 allelic lineage, but two profiles characterise 68% of the described alleles. Each profile is named after a "prototype allele" that represents the alleles sharing that profile within a given allelic lineage. Annex 1, Tables 1 (HLA-DRB) and 2 (Aotus MHC-DRB), show the profiles that cover at least 60% of the alleles studied for each lineage.

Peptides can be designed against each pocket profile using the available experimental binding data and the substitution principles described above. To evaluate binding capacity quickly and with reasonable accuracy, we optimised the use of the NetMHCIIpan 3 algorithm (122). The optimisation uses a reduced set of prototype alleles that covers most pocket profiles of all human allelic lineages. This strategy has been used successfully to design peptides that induce long-term protection (161) (Annex 2).

The pocket profiles of each allelic lineage can also be used to extrapolate the potential population coverage of the designed peptides. To this end, we mined the Allele Frequency Net Database (AFND) (173) to estimate the frequency of MHC-DRB allelic lineages in human populations (Annex 3). Potential coverage is then the probability of finding a given allelic lineage in a population multiplied by the probability of the pocket profile within that lineage. This approach was used to calculate the potential coverage of peptides designed to bind both human and Aotus alleles (64) (Chapter 3). That article also explores, from a structural and physicochemical point of view, how similar the pocket profiles of humans and Aotus are.
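The coverage estimate just described is a product of two probabilities, so it is easy to write down. The sketch below assumes that contributions from several targeted (lineage, profile) pairs are simply added. It also ignores diploidy, that is, the fact that each individual carries two alleles. The lineage frequencies and the profile names (`profile_01a` and so on) are hypothetical and are not taken from Annex 3.

```python
"""Sketch of the potential-coverage estimate described above.

Assumptions: frequencies and profile names are hypothetical; contributions
from several targeted (lineage, profile) pairs are simply added, and
diploidy (two alleles per individual) is ignored.
"""


def potential_coverage(lineage_freq, profile_freq, targets):
    """Sum P(lineage in population) * P(profile | lineage) over targeted pairs."""
    total = 0.0
    for lineage, profile in targets:
        p_lineage = lineage_freq.get(lineage, 0.0)
        p_profile = profile_freq.get(lineage, {}).get(profile, 0.0)
        total += p_lineage * p_profile
    return total


# Hypothetical lineage frequencies in one population.
LINEAGE_FREQ = {"DRB1*01": 0.10, "DRB1*04": 0.20}
# Hypothetical profile frequencies within each lineage.
PROFILE_FREQ = {
    "DRB1*01": {"profile_01a": 0.5, "profile_01b": 0.25},
    "DRB1*04": {"profile_04a": 0.4},
}

if __name__ == "__main__":
    targets = [("DRB1*01", "profile_01a"), ("DRB1*04", "profile_04a")]
    print(potential_coverage(LINEAGE_FREQ, PROFILE_FREQ, targets))
```

For the two hypothetical targets, the coverage is 0.10 × 0.5 + 0.20 × 0.4 = 0.13, up to floating-point rounding.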
Types of amino acid substitution

Peptides must tend to adopt an extended conformation (PPII) to fit the MHC. We therefore set out to classify amino acids by their structural properties, analysing secondary-structure patterns in biological proteins by mining the Protein Geometry Database (PGD) (174). The result was a classification of amino acids together with a quantitative measure of their similarity, which can be used in peptide modelling and design. This work also made a novel contribution to understanding evolutionary substitution patterns in biological proteins and their relationship with secondary structure (172) (Chapter 4).

Evaluation and analysis of MHC-peptide binding

An optimised strategy based on the neural-network method NetMHCIIpan 3 (122) allows MHC-peptide binding to be evaluated rapidly. However, there is still much room for innovation in this field. More accurate approaches are needed, particularly ones that can describe the forces at play in MHC-peptide binding. We have therefore implemented quantum methods, whose predictive power exceeds that of the available methods (158) (Chapter 5).

These alternatives constitute a second methodological line, which allows deeper analysis of the interacting forces that shape the binding process. For the moment, they are not screening methods. Nevertheless, they are very promising for studying the effects of substitutions in protein systems, given their accuracy and explanatory power (159) (Annex 4).

* * *

Chapter 1. Characterisation and comparative analysis of MHC-DPA1 exon 2 in the owl monkey (Aotus nancymaae)
Suarez CF, Patarroyo MA, Patarroyo ME.
Characterisation and comparative analysis of MHC-DPA1 exon 2 in the owl monkey (Aotus nancymaae). Gene. 2011;470(1-2):37-45.
The published version of the article is available at: http://www.sciencedirect.com/science/article/pii/S0378111910003823

Title: Characterisation and comparative analysis of MHC-DPA1 Exon 2 in the owl monkey (Aotus nancymaae)

Authors: Carlos F. Suárez M. 1, 2, Manuel A. Patarroyo 1, 2, Manuel E. Patarroyo 1, 3

Addresses and Affiliations:
1 Fundación Instituto de Inmunología de Colombia (FIDIC), Carrera 50 No. 26-20, Bogotá, Colombia.
2 Universidad del Rosario, Calle 63D No. 24-31, Bogotá, Colombia.
3 Universidad Nacional de Colombia, Carrera 45 No. 26-85 Bogotá, Colombia.

Corresponding Author: Prof. Manuel Elkin Patarroyo. E-mail: mepatarr@gmail.com Fax: (57-1) 4815269 Telephone: (57-1) 4815219

Abstract: The owl monkey (Aotus nancymaae) is an important animal model in biomedical research. It is used particularly for the pre-clinical evaluation of vaccine candidates against Plasmodium falciparum and Plasmodium vivax, which requires a precisely typed major histocompatibility complex. We characterised exon 2 of the A. nancymaae MHC-DPA1 gene to infer its allelic diversity and evolutionary history. Aona-DPA1 shows no polymorphism and is related to other primate DPA alleles, including those of Catarrhini and Platyrrhini. Together these form an ancient, trans-specific and strongly supported lineage whose variability and selective patterns differ from those of other primate MHC-DPA1 lineages. A. nancymaae monkeys thus have less MHC-DP polymorphism than MHC-DQ or MHC-DR polymorphism.

Key words: Animal model; MHC class-II molecule; molecular evolution; new world monkeys; Platyrrhini.
Abbreviations: Major histocompatibility complex (MHC), antigen-presenting cells (APC), peptide binding region (PBR), new world monkeys (NWM), old world monkeys (OWM), neighbour joining (NJ), minimum-evolution (ME), maximum likelihood (ML), local rearrangements of tree topology around an edge (LRSH), parsimony (Pars), global rate minimum deformation method (GRMD), million years (MY), single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), random effects likelihood (REL), substitutions per site per million years (Sub/S/MY), trans-specific polymorphism (TSP).

1. Introduction

Major histocompatibility complex (MHC) Class II molecules display peptides on the surface of antigen-presenting cells (APC) for recognition by T cells, and thereby play a key role in defence against pathogens. They are heterodimers assembled from an α chain and a β chain, both glycoproteins, encoded by the MHC Class II A and B genes, respectively. Three main MHC Class II loci, named HLA-DR, -DQ and -DP, encode functional antigen-presenting molecules in primates. These loci are characterised by genetic polymorphism and by diversifying selection constrained by functional and structural restrictions. The polymorphism is mainly restricted to the second exon of MHC class II A and B genes, which encodes the molecule's peptide binding region (PBR) (Klein et al., 1993b).

MHC-DP is an ancient locus, shared by divergent mammalian orders (Takahashi et al., 2000; Yuhki et al., 2003). However, its polymorphism and functionality vary. For example, MHC-DP has become a pseudogene in felines and in murinae (mouse-like rodents). In contrast, it is the most polymorphic MHC Class II locus in other rodents, such as the mole rat (genus Spalax) (Klein et al., 1993a; Yuhki et al., 2003; Kelley et al., 2005).
MHC-DP is the most centromeric locus within the primate MHC gene cluster. It consists of four genes: the DPA1 and DPB1 genes and the DPA2 and DPB2 pseudogenes. This arrangement (position and number) is apparently the same in all primates. It was established before the split between Platyrrhini and Catarrhini, ≈43 million years (MY) ago (Klein et al., 1993a; Steiper and Young, 2006).

In primates, MHC-DPA1 polymorphism ranges from absent to low, whereas MHC-DPB1 polymorphism ranges from moderate to high (Otting and Bontrop, 1995; Slierendregt et al., 1995; Bontrop et al., 1999; Doxiadis et al., 2001). In humans, 28 HLA-DPA1 alleles have been reported to date, compared with 138 alleles for HLA-DPB1 (Robinson et al., 2003). The common marmoset (Callithrix jacchus), a neotropical primate, has an inactive MHC-DP region and expresses no MHC-DP molecule (Antunes et al., 1998). Despite this low polymorphism, MHC-DPA1 can be important in modulating an immune response. For example, HLA-DPA1*0301 appears to be involved in genetic susceptibility to Schistosoma haematobium and to several chronic inflammatory diseases (May et al., 1998; Dai et al., 2010).

Previous studies have characterised several Aotus MHC Class II genes and molecules: MHC-DQA-DQB (Diaz et al., 2000), MHC-DRB1 (Niño-Vasquez et al., 2000; Suarez et al., 2006) and MHC-DPB1 (Diaz et al., 2002). These neotropical primates have been shown to be susceptible to various human infectious diseases (Lujan et al., 1986; Polotsky et al., 1994; Noya et al., 1998). They can develop human malaria, particularly Plasmodium falciparum infections (Gysin, 1988; Rodriguez et al., 1990; Collins, 1994) and asexual blood-stage Plasmodium vivax infections (Pico de Coana et al., 2003). This makes the owl monkey a highly valuable animal model for biomedical research.
To complete this characterisation, the study of MHC-DPA1 might play a key role in understanding the immune response against Plasmodium (Diaz et al., 2002). It would also deepen our knowledge of the owl monkey immune system. We characterised exon 2 of the A. nancymaae MHC-DPA1 gene to infer its allelic diversity, its variability patterns and the amount and kind of its variation. We also examined the types of change involved, the extent of natural selection and the evolutionary relationships of the gene within the primate context.

2. Materials and Methods

2.1 Animals

Six Aotus nancymaae monkeys (4 males, 2 females) were randomly caught from different family groups in Lagos de Leticia and the Atacuari River, two widely separated zones (80 km apart) in the Colombian Amazon. The monkeys were captured with the authorisation of CORPOAMAZONIA, the official environmental authority of Colombia in this region. CORPOAMAZONIA granted the Fundación Instituto de Inmunología de Colombia (FIDIC) permission to capture, study and carry out scientific research with these primates in the Colombian Amazon (Resolutions #1966/2006 and 0028/2010 and previous authorisations beginning in 1982). This research followed the guidelines approved by FIDIC's ethics committee. The animals were always supervised by expert veterinarians and biologists. After the experimental procedures, they were released back into the Amazon jungle in optimal health, in the presence of a representative from CORPOAMAZONIA.

2.2 RNA extraction, cDNA synthesis, PCR, cloning and sequencing

Leukocytes were obtained from six healthy A. nancymaae monkeys by density gradient separation of peripheral blood collected by venepuncture. Total cellular RNA was isolated from peripheral blood mononuclear cells using the TRIzol one-step procedure (Invitrogen Life Technologies, CA, USA).
Moloney murine leukaemia virus reverse transcriptase (Promega, Madison, WI, USA) was used for cDNA synthesis according to the manufacturer's instructions.

Two independent PCRs of MHC-DPA1 exon 2 were performed for each monkey. The primers were GH98 (5'-CGCGGATCCTGTGTCAACTTATGCCGCG-3') and GH99 (5'-CTGGCTGCAGTGTGGTTGGAACGCTG-3') (Otting and Bontrop, 1995), at a final concentration of 0.8 μM. The PCR mixture contained 1.5 μM MgCl2, 50 mM Tris (pH 8.3) and 2.5 U Taq DNA polymerase (Promega). Five microlitres of cDNA were added to each reaction, for a final volume of 25 μl. Reactions were heated to 95°C for 5 min and then amplified for 40 cycles of:
- denaturation for 30 s at 94°C;
- annealing for 1 min at 65°C;
- extension for 2 min at 68°C.

A final extension cycle was run at 65°C for 1 min and 68°C for 5 min.

PCR products were purified with a WIZARD PCR Preps Purification kit (Promega) and ligated into the pGEM-T vector (Promega). Double-stranded plasmid DNA was isolated with the MiniPreps Purification Kit (Mo Bio, Carlsbad, CA, USA). Three clones from each PCR were randomly chosen and sequenced with fluorescent dye-labelled dideoxy terminators (Applied Biosystems, Foster City, CA, USA) on an ABI Prism 310 genetic analyser (Applied Biosystems).

2.3 MHC-DPA1 sequences

Sixty-four MHC-DPA1 exon 2 sequences from 11 primate species (suborder Anthropoidea) were used.
Platyrrhini (New World monkeys, NWM):
- Aotus nancymaae, owl monkey (Aona, one sequence, reported here);
- Saimiri sciureus, squirrel monkey (Sasc, three sequences).

Catarrhini, Cercopithecoidea (Old World monkeys, OWM):
- Macaca arctoides, stump-tailed macaque (Maar, one sequence);
- Macaca fascicularis, crab-eating macaque (Mafa, six sequences);
- Macaca mulatta, rhesus monkey (Mamu, 17 sequences);
- Papio hamadryas, hamadryas baboon (Paha, one sequence).

Catarrhini, Hominoidea (humans and apes):
- Homo sapiens, human (HLA, 25 sequences);
- Pan troglodytes, chimpanzee (Patr, three sequences);
- Gorilla gorilla, gorilla (Gogo, three sequences);
- Pongo pygmaeus, Bornean orangutan (Popy, three sequences);
- Pongo abelii, Sumatran orangutan (Poab, one sequence).

The GenBank accession numbers of the studied sequences are: Aona-DPA1*01-AF529200, Gogo-DPA1*0401-AF026701, Gogo-DPA1*0402-AF026702, Gogo-DPA1-CU104655, HLA-DPA1*010302-AF074848, HLA-DPA1*010304-DQ274060, HLA-DPA1*0104-X78198, HLA-DPA1*0105-X96984, HLA-DPA1*010601-U87556, HLA-DPA1*010602-EU729350, HLA-DPA1*0107-AF076284, HLA-DPA1*0108-AF346471, HLA-DPA1*0109-AY650051, HLA-DPA1*0110-DQ274061, HLA-DPA1*020101-X78199, HLA-DPA1*020102-L31624, HLA-DPA1*020103-AF015295, HLA-DPA1*020104-AF074847, HLA-DPA1*020105-AF098794, HLA-DPA1*020106-AF165160, HLA-DPA1*020203-AF092049, HLA-DPA1*02021-X79475, HLA-DPA1*02022-X79476, HLA-DPA1*0203-Z48473, HLA-DPA1*0204-EU304462, HLA-DPA1*0301-M83908, HLA-DPA1*0302-AF013767, HLA-DPA1*0303-AY618553, HLA-DPA1*0401-L11643, Maar-DPA1*0201-AF026703, Mafa-DPA1*0201-AF026704, Mafa-DPA1*0202-EF208806, Mafa-DPA1*0204-AM943632, Mafa-DPA1*0401-EF208808, Mafa-DPA1*0701-EF208809, Mafa-DPA1*0702-EF208810, Mamu-DPA1*0101-Z32411, Mamu-DPA1*0201-EF204945, Mamu-DPA1*0203-EF204950, Mamu-DPA1*0208-FJ544416, Mamu-DPA1*0401-FJ544417, Mamu-DPA1*0402-FJ544415, Mamu-DPA1*0403-GQ471885, Mamu-DPA1*0601-EF204949, Mamu-DPA1*0701-EF204946, Mamu-DPA1*0801-EU305663, Mamu-DPA1-AB219099,
Mamu-DPA1-AB219100, Mamu-DPA1-AB219101, Mamu-DPA1-AB250754, Mamu-DPA1-AB250756, Mamu-DPA1-AB219102, Mamu-DPA1-AB250757, Paha-DPA1*0201-AF026706, Patr-DPA1*0201-AF026707, Patr-DPA1*0202-AF026693, Patr-DPA1*0301-AF026694, Poab-DPA1-AC207096, Popy-DPA1*0201-AF026695, Popy-DPA1*0202-AF026696, Popy-DPA1*0401-AF026697, Sasc-DPA1*0501-AF026698, Sasc-DPA1*0502-AF026699, Sasc-DPA1*0601-AF026700.

2.4 Sequence analysis

The MHC-DPA1 exon 2 sequences, including the A. nancymaae sequence, were aligned with Clustal X (Thompson et al., 1997). An amino acid alignment was also performed. HLA-DRA1*010101 and HLA-DQA1*010101 were used as outgroups. The resulting alignments had 189 nucleotide and 63 amino acid positions (supplementary material 1 and 2).

GENEDOC (Nicholas et al., 1997) was used to calculate percent identity and similarity in the alignments. Identity is the proportion of identical positions between sequences. Similarity is the proportion of positions with conservative substitutions, assessed here with the PAM 250 substitution matrix. Means and standard deviations of pairwise identity (nucleotide and amino acid) and of pairwise similarity (amino acid only) were calculated analytically within each group of sequences.

The variation at each position of the aligned MHC-DPA1 exon 2 amino acid sequences was represented with WebLogo (Crooks et al., 2004). The logo shows all amino acids occupying each position, with the height of each letter proportional to its relative frequency at that position. It also distinguishes conservative from non-conservative substitutions at each position, using colours based on the PAM 250 substitution matrix groups (DENQH/SAT/KR/FYW/LIVM/C/G/P) (Dayhoff et al., 1978). A change in symbol colour within a position indicates non-conservative changes, and a constant colour indicates conservative changes.
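The identity, similarity and per-position classification described in this section can be sketched as follows. The sketch assumes that "similar" means either identical or belonging to the same Dayhoff group listed above. GENEDOC itself scores similarity with the PAM 250 matrix, so its figures may differ. Gapped positions are skipped. The example sequences (`SEQS`) are hypothetical short fragments, not the real DPA1 alignment.

```python
"""Sketch of pairwise identity/similarity and per-column classification.

Assumption: 'similar' means identical or in the same Dayhoff (PAM 250)
group listed in the text; GENEDOC's own similarity scoring may differ.
Gap-containing positions are skipped. Example sequences are hypothetical.
"""
from itertools import combinations
from statistics import mean, stdev

DAYHOFF_GROUPS = ["DENQH", "SAT", "KR", "FYW", "LIVM", "C", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(DAYHOFF_GROUPS) for aa in group}


def same_group(x, y):
    """True if both residues are known and belong to the same Dayhoff group."""
    return x in GROUP_OF and y in GROUP_OF and GROUP_OF[x] == GROUP_OF[y]


def pairwise_scores(a, b):
    """Percent identity and similarity over ungapped aligned positions."""
    compared = identical = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif same_group(x, y):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100 * identical / compared, 100 * similar / compared


def summarize(seqs):
    """((mean, sd) of identity, (mean, sd) of similarity) over all pairs."""
    scores = [pairwise_scores(a, b) for a, b in combinations(seqs, 2)]
    result = []
    for values in ([s[0] for s in scores], [s[1] for s in scores]):
        sd = stdev(values) if len(values) > 1 else 0.0
        result.append((mean(values), sd))
    return tuple(result)


def classify_columns(seqs):
    """Label each variable column (1-based) as conservative or non-conservative."""
    labels = {}
    for pos, column in enumerate(zip(*seqs), start=1):
        residues = {aa for aa in column if aa != "-"}
        if len(residues) < 2:
            continue  # invariant column
        groups = {GROUP_OF.get(aa) for aa in residues}
        conservative = len(groups) == 1 and None not in groups
        labels[pos] = "conservative" if conservative else "non-conservative"
    return labels


SEQS = ["QIVM", "HIVA", "QMFA"]  # hypothetical aligned fragments

if __name__ == "__main__":
    print(summarize(SEQS))
    print(classify_columns(SEQS))
```

In the toy alignment, Q/H and I/M fall within one Dayhoff group, so those columns are conservative. V/F and M/A cross groups, so those columns are non-conservative.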
2.5 Phylogenetic analysis

Neighbour-joining (NJ) and minimum-evolution (ME) (Rzhetsky and Nei, 1993) trees were constructed with MEGA 4.0 (Tamura et al., 2007). For nucleotide sequences, genetic distances were estimated with the Kimura 2-parameter (Kimura, 1980), LogDet (Tamura and Kumar, 2002) and Maximum Composite Likelihood (Tamura et al., 2004) substitution models. For deduced amino acid sequences, the JTT (Jones et al., 1992) and Dayhoff (Schwarz and Dayhoff, 1979) models were used. Confidence levels were assigned to branch nodes by bootstrap analysis (Hillis and Bull, 1993) and the interior branch test (IBT) (Sitnikova, 1996), both with 10,000 replicates. Nodes with bootstrap values greater than 70% or interior branch test values greater than 95% were considered statistically significant.

Maximum likelihood (ML) (Felsenstein, 1981) trees were constructed with TREEFINDER (Jobb et al., 2004) and with DNAML/PROTML from the PHYLIP package (Felsenstein, 1989). Bootstrap analysis (Hillis and Bull, 1993) with 10,000 replicates was used to assign confidence levels to branch nodes. For TREEFINDER, genetic distances were calculated with the model selected from the data by the AICc criterion. This was the HKY model (Hasegawa et al., 1985) for nucleotide sequences and the JTT model (Jones et al., 1992) for amino acid sequences. Confidence levels for these trees were assigned by bootstrap analysis (Hillis and Bull, 1993) and local rearrangements of tree topology around an edge (LRSH) (Shimodaira and Hasegawa, 1999), both with 10,000 replicates. Nodes with LRSH values greater than 95% were considered statistically significant.

Parsimony (Pars) (Felsenstein, 1983) trees were constructed with MEGA 4.0 (Tamura et al., 2007) and with DNAPARS from the PHYLIP package (Felsenstein, 1989). Bootstrap analysis (Hillis and Bull, 1993) with 10,000 replicates was used to assign confidence levels to branch nodes.
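Of the nucleotide distance models listed above, the Kimura 2-parameter correction is simple enough to write out. It separates transitions (A↔G, C↔T) from transversions. With P and Q the observed proportions of each, the distance is d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q). The sketch below is a standalone illustration of that formula, not MEGA's implementation. It assumes that sites with gaps or ambiguous bases are simply skipped.

```python
"""Kimura 2-parameter (K2P) distance between two aligned nucleotide sequences.

Assumption: sites where either sequence has a gap or ambiguous base are skipped.
"""
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared  # proportion of transitional differences (P)
    q = transversions / compared  # proportion of transversional differences (Q)
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in 10 sites
    print(k2p_distance("AAAAAAAAAA", "CAAAAAAAAA"))  # one transversion in 10 sites
```

A single difference gives a slightly larger distance when it is a transition (about 0.112) than when it is a transversion (about 0.108). This is because the model treats the two kinds of change separately.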
Phylogenetic relationships were also inferred with a Bayesian approach in MrBayes (Ronquist and Huelsenbeck, 2003). Default settings were used for the GTR model with gamma-distributed rate variation across sites and a proportion of invariable sites (nucleotide sequences) and for a mixed model (amino acid sequences). Two simultaneous Markov chain Monte Carlo analyses were run, each with one cold and three heated chains (temperature set to the default 0.2). Simulations were run for 15,000,000 generations, sampling a tree every 100th generation. The standard deviation of split frequencies fell below 0.01 at approximately 10 million generations for the nucleotide alignment and 11 million for the amino acid alignment. This indicates that both analyses had converged on similar trees. The first 25% of sampled generations were discarded as burn-in, and the remaining trees were used to build a consensus tree. Nodes with posterior probabilities of 85 to 89% were considered to have low statistical support, 90 to 94% moderate support and greater than 95% high support (Huelsenbeck and Ronquist, 2001).

2.6 Tree Calibration

The evolutionary rates of the DPA groups deduced from the Bayesian tree (calculated in MrBayes for nucleotide sequences) were estimated with the Global Rate Minimum Deformation method (GRMD), implemented in TREEFINDER (Jobb et al., 2004). The following divergence times were used as calibration points (Goodman et al., 1998; Opazo et al., 2006; Osada et al., 2008):
- Catarrhini-Platyrrhini, 42.9 MY (36.1-51.1 MY);
- Platyrrhini-Platyrrhini, 21.0 MY (19.15-22.05 MY);
- Catarrhini-Catarrhini, 30.5 MY (26.9-36.4 MY);
- Hominoidea-Hominoidea, 18.3 MY (16.3-20.8 MY);
- Homo-Pan, 6.6 MY (6-7 MY);
- M. mulatta-M. fascicularis, 0.9 MY.

2.7.
Natural selection analysis

Natural selection was detected with the single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL) and random effects likelihood (REL) methods in HYPHY (Kosakovsky-Pond et al., 2005). These maximum likelihood-based methods estimate the rates of non-synonymous and synonymous change at each site of the alignment and identify sites under positive or negative selection (Kosakovsky-Pond and Muse, 2005; Kosakovsky-Pond and Frost, 2005b). Significance thresholds were p ≤ 0.1 for SLAC and FEL and a Bayes factor of 50 for REL. The algorithms are available on the Datamonkey web server (Kosakovsky-Pond and Frost, 2005a; Poon et al., 2009). MEGA 4.0 was also used to calculate synonymous and non-synonymous substitution rates by the Nei-Gojobori method (Nei and Gojobori, 1986). Their variances were estimated by bootstrap with 1,000 replicates.

2.8. 3D representations

Positions under variation or selection were represented in a 3D model of each DPA Pocket, including adjacent residues within 5 Å. The models were based on the crystal structure of the DPA1*0103-DPB1*0201 complex (PDB 3LQZ) (Dai et al., 2010) and rendered with VMD 1.87 (Humphrey et al., 1996).

3. Results

3.1. MHC Aona-DPA1 sequence

MHC-DPA1 exon 2 from six A. nancymaae monkeys was amplified by RT-PCR. The amplification products were 189 bp long, corresponding to exon 2 positions 34-222 (positions 12-74 of the α domain). All 36 sequenced clones yielded an identical sequence. The analysed sequences, including Aona-DPA1, are shown in supplementary material 1 (exon 2) and supplementary material 2 (α domain).

3.2 Evolutionary analysis of Aona-DPA1 exon 2

The MHC-DPA1 exon 2 sequences clustered into similar groups regardless of the tree construction method (Bayesian, parsimony, NJ, ME or ML) or the substitution model assumed. For the sake of simplicity, five MHC-DPA1 groups were defined (Fig.
1):
- Group 1 is supported by high posterior probability and LRSH values. It is formed by DPA1*05 and DPA1*07 alleles from all Anthropoidea groups, including the A. nancymaae DPA sequence. MHC Aona-DPA1 clearly falls within the MHC-DPA1*05 lineage, with high statistical support in all the phylogenetic methods used.
- Group 2 is supported by LRSH. It is formed mainly by DPA1*01 and DPA1*03 alleles from Catarrhini, most of them human sequences.
- Group 3 is supported by LRSH and formed mainly by HLA-DPA1*02 sequences.
- Group 4 contains sequences from all Anthropoidea groups, distributed in four well-supported subgroups: DPA1*04 from Hominoidea; DPA1*04 from Cercopithecidae; DPA1*06 from S. sciureus (Platyrrhini) and M. mulatta (Cercopithecidae); and two unnamed alleles from M. mulatta.
- Group 5 comprises Catarrhini sequences, primarily DPA1*02 sequences from Cercopithecidae and DPA1*02 from P. pygmaeus.

All group associations were relatively well conserved at the level of the deduced protein sequences, although some were less well supported (data not shown). Groups 1, 2, 4 and 5 displayed a trans-species or convergent nature (Fig. 1), and some sequences were identical between species.

3.3 Evolutionary rate estimation in primate MHC-DPA1 exon 2

Aona-DPA1*01 exon 2 is one of the most divergent primate MHC-DPA1 sequences. A tree calibration was carried out to establish whether this divergence reflects a high evolutionary rate or a long time of existence (Fig. 1). For the sake of simplicity, the divergence times used as calibration points for MHC-DPA1 exon 2 were assumed to equal the divergence times between the species.
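The rate unit used in this section, substitutions per site per million years (Sub/S/MY), can be illustrated with the simplest strict-clock calculation. Two lineages that split T MY ago have accumulated changes independently, so a pairwise distance d corresponds to a rate r = d / (2T). This is only an illustration of the unit. It is not the GRMD rate-smoothing method run in TREEFINDER. The distance of 0.35 below is hypothetical; the two rates passed to `fold_difference` are the MHC-DPA1*05 and Macaca *07 subgroup rates reported in this section.

```python
"""Sketch: express a pairwise distance as a rate in Sub/S/MY.

Assumption: a simple strict-clock calculation for two lineages that split
`divergence_my` million years ago; this is NOT the GRMD rate-smoothing
method used in TREEFINDER, and the distance used below is hypothetical.
"""


def rate_per_site_per_my(distance, divergence_my):
    """Both lineages evolve independently since the split, hence the factor 2."""
    if divergence_my <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_my)


def fold_difference(rate_a, rate_b):
    """How many times faster the faster of two rates is."""
    return max(rate_a, rate_b) / min(rate_a, rate_b)


if __name__ == "__main__":
    # Hypothetical distance of 0.35 substitutions/site across the
    # Catarrhini-Platyrrhini calibration point (42.9 MY).
    print(rate_per_site_per_my(0.35, 42.9))
    # Subgroup rates reported in the text (MHC-DPA1*05 vs. Macaca *07).
    print(fold_difference(7.3e-3, 7.9e-2))
```

The second call gives about 10.8, consistent with the statement that the MHC-DPA1*05 subgroup evolves about 10 times more slowly than the Macaca *07 subgroup.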
As shown in Fig. 1, primate MHC-DPA1 groups follow two tendencies. Groups 1, 4 and 5 have similar rates, between 3.8 and 4.5 × 10⁻³ Sub/S/MY. They evolve about 4 to 4.5 times more slowly than groups 2 and 3, whose rates are between 1.7 and 1.8 × 10⁻² Sub/S/MY.

Rates within groups are often very variable. For example, group 1 contains the MHC-DPA1*05 subgroup (Sasc-DPA1*0501, *0502 and Aona-DPA1*01) and a subgroup of Macaca MHC-DPA1*07 sequences. The *05 subgroup evolves about 10 times more slowly than the *07 subgroup, which has the highest rate observed in the analysis (7.3 × 10⁻³ vs. 7.9 × 10⁻² Sub/S/MY). In contrast, Mafa-DPA1*0702 shows the lowest evolutionary rate observed (9.1 × 10⁻⁴ Sub/S/MY). This pattern of variability occurs within all groups considered. The rate variation within and between groups may reflect different evolutionary constraints among alleles and species.

3.4 Primate MHC-DPA1 exon 2 variability

Overall identity at the nucleotide level was high, with a mean of 94% (range 88%-100%) (Fig. 1). The logo of the deduced MHC-DPA1 α-domain amino acid sequences for all analysed species showed them to be remarkably conserved (Fig. 2). Mean similarity was 95.1% (range 88%-100%) and mean identity 90.7% (range 75%-100%). Across all sequences analysed (Anthropoidea DPA, Fig. 2), most amino acid substitutions were non-conservative (24 of 33 variable positions). Groups 1 and 4 were the most variable, followed by Group 5, whereas the remaining lineages were more conserved. This held for nucleotide identity, amino acid identity and amino acid similarity (Fig. 1 and Fig. 2). Aona-DPA1 carried distinctive nucleotide and amino acid substitutions (16Q→H, 31I→M, 54V→F, 56V→A, 65A→I) and was the most non-conservative sequence (Fig. 2, supplementary material 1 and supplementary material 2).
This characteristic highlights its divergent nature, which it shares with other NWM-DPA sequences. Most variable positions at the nucleotide and amino acid levels were clustered between amino acid positions 50 and 74 (nucleotides 148 to 222, red line in Fig. 2). This sector includes most of the residues involved in peptide interaction (Pocket residues) in the PBR, as assigned by homology with HLA-DPA1*0103 (Dai et al., 2010). The region between amino acids 12 and 49 was more conserved (nucleotides 34 to 147, black line in Fig. 2). The variability of MHC-DPA1 exon 2 is concentrated especially in Pocket residues and their neighbours. Pocket 9 is the most variable, followed by Pockets 6 and 1. Each group varies in a distinct way at the Pocket level; e.g., at both the nucleotide and amino acid levels, group 4 is the most variable at Pocket 1, group 5 at Pocket 6 and group 1 at Pocket 9, whereas group 3 only varies at Pocket 9 (Fig. 2). In the PBR, substitutions are concentrated in the first and second codon positions in all groups except group 4, in which all codon positions are equally variable. In the rest of the sequence, substitutions at the third codon position prevail (supplementary material 3).

3.5 Natural selection in primate MHC-DPA1 exon 2
The SLAC, FEL and REL selection tests did not agree completely across the analysed positions; only some positions were detected in common (Fig. 2, supplementary material 4). For the set of all analysed groups, MHC-DPA1 exon 2 displayed an accumulation of negatively selected positions (when present) in the less variable region (codons 12 to 49) and of positively selected positions (when present) in the most variable region (codons 50 to 74). This pattern also occurred in most MHC-DPA1 groups (except Group 5). With a few exceptions, the positions under selection were not shared between groups.
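The codon-position pattern described above comes from tallying where nucleotide differences fall within each codon. The sketch below illustrates that tally on sequences written in the dot notation of Supplementary Material 1. It assumes aligned, gap-free coding sequences that start at a codon boundary, and that the first codon of the alignment is residue 12 (nucleotide 34), consistent with the numbering used in the text; the helper names are hypothetical.

```python
def expand_dots(ref: str, dotted: str) -> str:
    """Rebuild a sequence written in dot notation ('.' = same base as ref)."""
    ref, dotted = ref.replace(" ", ""), dotted.replace(" ", "")
    if len(ref) != len(dotted):
        raise ValueError("reference and dotted sequence differ in length")
    return "".join(r if d == "." else d for r, d in zip(ref, dotted))


def codon_position_tally(ref, other, first_codon=1, window=None):
    """Count differences at codon positions 1, 2 and 3.

    first_codon: residue number of the first codon in the strings.
    window: optional (lo, hi) inclusive codon range, e.g. (50, 74) for the PBR sector.
    """
    if len(ref) != len(other) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free, whole codons")
    tally = {1: 0, 2: 0, 3: 0}
    for i, (a, b) in enumerate(zip(ref, other)):
        codon_number = first_codon + i // 3
        if window is not None and not (window[0] <= codon_number <= window[1]):
            continue
        if a != b:
            tally[i % 3 + 1] += 1
    return tally


# First five codons (residues 12 to 16) of Aona-DPA1*01, Supplementary Material 1.
aona = "TTTGTACAGACGCAG"
mafa = expand_dots(aona, "... ... ... ..A ..T")  # Mafa-DPA1*0702
sasc = expand_dots(aona, "... ... ... .T. ...")  # Sasc-DPA1*0501

print(codon_position_tally(aona, mafa, first_codon=12))  # {1: 0, 2: 0, 3: 2}
print(codon_position_tally(aona, sasc, first_codon=12))  # {1: 0, 2: 1, 3: 0}
print(codon_position_tally(aona, mafa, first_codon=12, window=(15, 15)))  # {1: 0, 2: 0, 3: 1}
```

Restricting the tally with `window` is what allows PBR and non-PBR sectors to be compared separately.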
Pocket positions assigned by homology (Dai et al., 2010) and nearby residues showed greater variability and an accumulation of non-synonymous and non-conservative substitutions and, in some cases, are under positive selection. Negatively selected positions tend to occur more frequently in non-PBR sectors, as previously reported (Hughes and Yeager, 1998) (Fig. 2). All Pockets are subject to selective pressure, but in a group-dependent way. Groups 1 and 2 show more positions under positive selection than the other lineages analysed (seven of seven in Group 2, seven of eight in Group 1), but in Group 1 those positions are more variable than in Group 2 and comprise Pockets 6 and 9, whereas Group 2 shows selective forces in Pockets 1, 6 and 9. Groups 3 and 4 show only positions under negative selection, and in both groups these positions lie outside the Pockets. In Group 5, Pockets 1 and 6 are under positive selection; this group has more positively selected positions in the less variable sector, two of which occur at Pockets 1 and 6. The analysis of all sequences together (Anthropoidea DPA) shows positively selected positions (four of seven) interspersed amongst the negatively selected positions in the less variable sector of the molecule. Two of them occur in residues with potential peptide contact (13 and 42), and two in Pocket residues (23 and 31). No positions under negative selection were observed in Pocket residues. The remaining positively selected positions occur in the most variable region, at Pocket 9 (72 and 73) and at a neighbouring residue (68).

A detailed analysis of the positions under variation and/or selection in MHC-DPA1 was performed (Fig. 3). Considering all positions under selection for the groups and for all sequences together, all Pockets are under positive pressure (Fig. 3A, 3C, 3E), and this condition might extend to some neighbour residues (defined as residues in direct contact with Pocket residues, at distances < 5 Å, Fig. 3B, 3D, 3F). Pocket 1 has the largest number of neighbour residues amongst the DPA1 Pockets, and these show every possible tendency. Six neighbour positions are under positive selection (red) and two are variable (green) at positions that might be involved in peptide contact (Fig. 3B). Some of these positions are considered Pocket residues in the DRA locus (Stern et al., 1994; Cardenas et al., 2004). Pocket 1 is highly conserved amongst the DPA1 sequences analysed (Fig. 2), showing the invariant residue 32F (blue) and negatively selected neighbour positions (ice blue) (Fig. 3A and 3B, respectively). Despite this conservation, the subtle variation observed results from diversifying forces. Pocket 6 shows six variable neighbour positions above the Pocket residues, and a negatively selected position at the Pocket base alongside a positively selected position (Fig. 3D). Pocket 6 is also highly conserved, though less so than Pocket 1, with variable positions such as 31 (Fig. 2). Pocket 9 is the most variable of the Pockets considered, with variable positions 69, 72 and 73, all under positive selection (Fig. 2 and Fig. 3E). Only one neighbour position (68) shows high variability and positive selection; as in the previous cases, it is considered a Pocket position in the DRA locus.

Nei–Gojobori's method confirms these results, showing a significant accumulation of non-synonymous substitutions in the PBR for all sequences together; Groups 1, 2 and 5 show significant positive selection in the PBR, group 3 shows a near-neutral substitution pattern, and group 4 a non-significant excess of synonymous over non-synonymous substitutions. The non-PBR region displays the opposite behaviour, with a significant excess of synonymous over non-synonymous substitutions for all sequences together; Groups 3 and 4 show significant negative selection in this region, whilst the remaining groups show the same tendency without statistical support. When the entire sequence is analysed, all DPA1 groups and all DPA1 sequences together show an excess of synonymous over non-synonymous substitutions, which is significant only for group 3. In the less variable region (codons 12 to 49), all groups display the same pattern, and groups 3, 4 and 5 show significant negative selection. In the most variable region (codons 50 to 74), all groups show more non-synonymous than synonymous substitutions, but without statistical support (Supplementary material 3 and 5).

4. Discussion
Studying MHC-DPA1 is essential to improve our understanding of both MHC Class II and the immune system of the owl monkey. The central role of MHC Class II in defence against pathogens, and its continuous struggle with changing pathogen strategies, have produced a complex evolutionary scenario in which multiple factors, such as adaptive evolution by overdominance, gene conversion, intra-allelic recombination and other recombination processes, have shaped MHC polymorphism. The degree of polymorphism varies between MHC loci as a result of different functional constraints (adaptive diversification or conservation) and of stochastic processes (such as population bottlenecks); these differences become relevant when comparing different immune systems (Hughes and Yeager, 1998; Bontrop et al., 1999; Yuhki et al., 2003).

A. nancymaae MHC Class II polymorphism and evolutionary relationships have previously been explored using strategies similar to those used in this article. In the case of MHC-DQ, Diaz et al. (2000) found 5 MHC-DQA1 (Aona-DQA1*27) alleles isolated from 3 owl monkeys, and 14 MHC-DQB1 (Aona-DQB1*22 and Aona-DQB1*23) alleles and two Aona-DQB2 alleles isolated from 19 monkeys. Suarez et al. (2006) found 98 MHC-DRB alleles (split into 12 lineages) isolated from 86 owl monkeys, and Diaz et al. (2002) reported 3 MHC-DPB1 alleles isolated from 7 owl monkeys. This work reports one Aona-DPA1 sequence isolated from 6 owl monkeys, suggesting that Aona-DPA1 may display limited or no polymorphism.

Aona-DPA1 is a divergent sequence located in one of the most variable groups of primate MHC-DPA1. Despite the support for the MHC-DPA1*05 lineage, its internal similarity and identity were often lower than the similarity and identity observed between Aona-DPA1 and other MHC-DPA1 sequences from different loci (not shown). However, they share exclusive substitutions (supplementary material 1 and Fig. 2). Such high variability reflects the age of the group and has also been associated with a larger number of variable positions and non-conservative changes; when the large number of positively selected positions is added to these findings (Fig. 2), particular functional constraints might be inferred for the evolution of this group.

The most polymorphic locus in the owl monkey, as in humans and other primates, is thus MHC-DR. This polymorphism is concentrated in MHC-DRB, whilst MHC-DRA is conserved in the primates studied. MHC-DRB is the only polymorphic gene in the common marmoset (C. jacchus) (Antunes et al., 1998; Bontrop et al., 1999). The second most polymorphic MHC Class II locus varies between species: it is MHC-DQ in the owl monkey and the rhesus monkey (M. mulatta), whereas it is MHC-DP in humans. The least variable locus is MHC-DP in the owl monkey and the rhesus monkey (Slierendregt et al., 1995) and MHC-DQ in humans (Bontrop et al., 1999; Robinson et al., 2003). Although more sampling may be necessary, these results support A. nancymaae having less polymorphism in MHC-DP than in MHC-DQ or MHC-DR.
These data establish differences between Aotus and Callithrix, denoting different MHC class II restrictions and specialisation in New World monkeys and, more broadly, the different strategies used by each primate species for the specialisation and diversification of their MHC class II repertoires.

The existence of trans-species polymorphism (TSP) has been well established for several MHC loci in primates (Klein, 1987; Klein et al., 1998), but it can be mimicked by molecular convergence, as established for exon 2 of the DRB1, DQA, DQB and DPB MHC Class II genes (O'huigin, 1995; Trtkova et al., 1995; Kriener et al., 2000a; Kriener et al., 2000b; Kriener et al., 2001). Taken together, these studies show that trans-species polymorphism has been found only within Anthropoidea infraorders, i.e., within Catarrhini or within Platyrrhini. If TSP occurs, it persists longer in MHC-DPA1 than in MHC-DPB1, as can be observed in A. nancymaae (Diaz et al., 2002) and in other primates (Otting and Bontrop, 1995). The association of Aona-DPA1 with Sasc-DPA1*05 in a highly supported NWM clade (Fig. 1) constitutes a trans-specific lineage. Other, Catarrhini-exclusive, trans-specific MHC-DPA1 lineages have also been detected (Fig. 2). Group 1, formed by MHC-DPA1*05 and MHC-DPA1*07, includes Catarrhini sequences from Pongo and Macaca. This indicates the considerable antiquity of this group, which is the best-supported clade in the primate order. This group and the MHC-DPA1*06 lineage (Group 4) show long evolutionary times, predating the divergence between Catarrhini and Platyrrhini. The absence of Platyrrhini sequences from other groups might be due to the limited sampling of MHC-DPA1 in these primates. Interestingly, humans, the best-sampled primate, restrict most of their allelic repertoire to two groups (2 and 3) that are highly conserved but also have a high evolutionary rate.
The evolutionary significance of this apparent specialisation may be explained by the birth-and-death model (Takahashi et al., 2000; Piontkivska and Nei, 2003), or by the origin of the sequences, which are frequently derived from expressed genes only. In Group 2, an almost exclusively human lineage (except for Patr-DPA1*0301), the virtual identity between HLA-DPA1*0105 and Mamu-DPA1*0101 may indicate the existence of an ancient TSP or of molecular convergence. Interestingly, this lineage shows a high number of positively selected positions, indicating a strong process of diversifying selection within the human lineage. These results show the existence of clade-specific MHC-DPA lineages in some primates, but also of long-term lineages such as Group 1 or MHC-DPA1*06 in Group 4. This emphasises the need for broader sampling amongst primate species to better understand MHC-DPA evolution.

In spite of its low variation, MHC-DPA1 exon 2 displayed different variability constraints along the sequence: a conserved region (residues 12 to 49) in which synonymous substitutions and negatively selected positions prevailed, and a more variable region (residues 50 to 74) in which non-synonymous substitutions and positively selected positions predominated (Fig. 2). This observation may have functional relevance, indicating compartmentalisation. As in other MHC genes, positive selection is focused on PBR positions and negative selection on non-PBR positions; however, each group analysed shows specific variation and selection patterns. For example, Group 1 shows relatively high sequence diversity, a slow evolutionary rate and a predominance of diversifying selection; Group 4 also shows relatively high diversity and a slow evolutionary rate, but evidence of purifying selection, explained by the accumulation of substitutions at the third codon position, which are mostly synonymous.
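The contrast between the conserved region (codons 12 to 49) and the variable region (codons 50 to 74) rests on counting synonymous and non-synonymous differences per site. The sketch below is a simplified illustration in the spirit of Nei and Gojobori (1986), not the software used in the study: sites are counted with the one-third rule (changes to stop codons count as non-synonymous sites), codons differing at several positions are averaged over all mutational pathways that avoid stop codons, and no Jukes–Cantor correction or significance test is included. Sequences are assumed aligned, gap-free and free of stop codons; the helper names are hypothetical.

```python
from itertools import permutations
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """One third of the number of single-base changes that keep the amino acid."""
    aa = CODE[codon]
    syn = sum(CODE[codon[:p] + b + codon[p + 1:]] == aa
              for p in range(3) for b in BASES if b != codon[p])
    return syn / 3.0


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """Synonymous and non-synonymous differences between two codons,
    averaged over all mutational pathways that avoid stop codons."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    total_s = total_n = 0.0
    paths = 0
    for order in permutations(positions):
        current, syn, non = c1, 0, 0
        for p in order:
            step = current[:p] + c2[p] + current[p + 1:]
            if CODE[step] == "*":
                break  # pathway through a stop codon: discard it
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:  # pathway completed without hitting a stop codon
            total_s += syn
            total_n += non
            paths += 1
    if paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return total_s / paths, total_n / paths


def p_distances(seq1: str, seq2: str) -> Tuple[float, float]:
    """Proportions of synonymous (pS) and non-synonymous (pN) differences."""
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        sites_s += s
        sites_n += 3 - s
        ds, dn = codon_differences(c1, c2)
        diff_s += ds
        diff_n += dn
    return diff_s / sites_s, diff_n / sites_n


def codon_window(seq: str, first_codon: int, lo: int, hi: int) -> str:
    """Codons lo..hi (inclusive) of a sequence whose first codon is numbered first_codon."""
    return seq[(lo - first_codon) * 3:(hi - first_codon + 1) * 3]


# Toy pair: CTG->CTA is synonymous (Leu), AAA->AGA is non-synonymous (Lys->Arg).
pS, pN = p_distances("CTGAAA", "CTAAGA")
print(round(pS, 4), round(pN, 4))  # 0.5455 0.24
```

Applying `p_distances` separately to `codon_window(seq, 12, 12, 49)` and `codon_window(seq, 12, 50, 74)` for each sequence pair is one way to reproduce the kind of region-by-region comparison discussed above, with pN > pS suggesting diversifying and pN < pS purifying selection.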
On the other hand, Group 2 shows relatively low diversity and, like Group 3, a high evolutionary rate, but Group 2 shows evidence of diversifying selection whilst Group 3 displays evidence of purifying selection. These differences amongst groups also involve the Pockets themselves, which are subject to different selection patterns depending on the group. All the above suggests that the evolutionary constraints shaping the peptide-binding boundaries differ between the groups analysed.

The detection of variation and selective constraints beyond the Pocket residues may be functionally important. In some cases, these positions might be involved in peptide contact or, as surrounding residues, in modifying the electrostatic properties of the Pocket (Fig. 3). Visualising these “extended Pockets” suggests that the binding interactions described by crystallographic studies might be fuzzier, and the evolutionary analysis provides evidence of different binding capacities for non-crystallised alleles. These subtle residue variations might be functionally relevant, as has been described in other MHC contexts (Posch et al., 1995; Posch et al., 1996).

The above results led us to conclude that Aona-DPA1 shows limited or no polymorphism and is associated with Sasc-DPA1*05, forming a strongly supported lineage whose variability and selection patterns differ from those of the other primate MHC-DPA1 lineages. Our results show differences in the evolutionary pattern of HLA-DPA, suggesting a recent but strong diversifying process in the human lineage. The groups delimited by our analyses possess distinctive diversity and selection patterns, indicating several modes of evolution in primate MHC-DPA.

Acknowledgements
This work was funded by COLCIENCIAS; contract RC-140-2009.
We would like to thank Monica Estupiñan for technical laboratory support in obtaining the sequences and Gisselle Rivera for help in translating this manuscript.

References
Antunes, S.G., De Groot, N.G., Brok, H., Doxiadis, G., Menezes, A.A., Otting, N. and Bontrop, R.E., 1998. The common marmoset: a new world primate species with limited MHC class II variability. Proc Natl Acad Sci U S A. 95, 11745-11750.
Bontrop, R.E., Otting, N., De Groot, N.G. and Doxiadis, G.G., 1999. Major histocompatibility complex class II polymorphisms in primates. Immunol Rev. 167, 339-350.
Cardenas, C., Villaveces, J.L., Bohorquez, H., Llanos, E., Suarez, C., Obregon, M. and Patarroyo, M.E., 2004. Quantum chemical analysis explains hemagglutinin peptide-MHC Class II molecule HLA-DRbeta1*0101 interactions. Biochem Biophys Res Commun. 323, 1265-1277.
Collins, W.E., 1994. The owl monkey as a model for malaria, in: W. K. Baer (Eds.), Aotus: the owl monkey. Academic Press, pp. 245-258.
Crooks, G.E., Hon, G., Chandonia, J.M. and Brenner, S.E., 2004. WebLogo: a sequence logo generator. Genome Res. 14, 1188-1190.
Dai, S., Murphy, G.A., Crawford, F., Mack, D.G., Falta, M.T., Marrack, P., Kappler, J.W. and Fontenot, A.P., 2010. Crystal structure of HLA-DP2 and implications for chronic beryllium disease. Proc Natl Acad Sci U S A. 107, 7425-7430.
Dayhoff, M.O., Schwartz, R.M. and Orcutt, B., 1978. A model of evolutionary change in proteins, in: Dayhoff, M. (Eds.), Atlas of protein sequence and structure. National Biomedical Research Foundation, pp. 345-352.
Diaz, D., Daubenberger, C.A., Zalac, T., Rodriguez, R. and Patarroyo, M.E., 2002. Sequence and expression of MHC-DPB1 molecules of the New World monkey Aotus nancymaae, a primate model for Plasmodium falciparum. Immunogenetics. 54, 251-259.
Diaz, D., Naegeli, M., Rodriguez, R., Nino-Vasquez, J.J., Moreno, A., Patarroyo, M.E., Pluschke, G. and Daubenberger, C.A., 2000. Sequence and diversity of MHC DQA and DQB genes of the owl monkey Aotus nancymaae. Immunogenetics. 51, 528-537.
Doxiadis, G.G., Otting, N., De Groot, N.G. and Bontrop, R.E., 2001. Differential evolutionary MHC class II strategies in humans and rhesus macaques: relevance for biomedical studies. Immunol Rev. 183, 76-85.
Felsenstein, J., 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 17, 368-376.
Felsenstein, J., 1983. Parsimony in systematics: biological and statistical issues. Annu Rev Ecol Syst. 14, 313-333.
Felsenstein, J., 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 5, 164-166.
Goodman, M., Porter, C.A., Czelusniak, J., Page, S.L., Schneider, H., Shoshani, J., Gunnell, G. and Groves, C.P., 1998. Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol Phylogenet Evol. 9, 585-598.
Gysin, J., 1988. Animal models: primates, in: Sherman, I.W. (Eds.), Malaria: parasite biology, pathogenesis, and protection. ASM Press, pp. 419-439.
Hasegawa, M., Kishino, H. and Yano, T., 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 22, 160-174.
Hillis, D.M. and Bull, J.J., 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol. 42, 182-192.
Huelsenbeck, J.P. and Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 17, 754-755.
Hughes, A.L. and Yeager, M., 1998. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 32, 415-435.
Humphrey, W., Dalke, A. and Schulten, K., 1996. VMD: visual molecular dynamics. J Mol Graph. 14, 33-38, 27-28.
Jobb, G., Von Haeseler, A. and Strimmer, K., 2004. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol. 4, 18.
Jones, D.T., Taylor, W.R. and Thornton, J.M., 1992. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 8, 275-282.
Kelley, J., Walter, L. and Trowsdale, J., 2005. Comparative genomics of major histocompatibility complexes. Immunogenetics. 56, 683-695.
Kimura, M., 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 16, 111-120.
Klein, J., 1987. Origin of major histocompatibility complex polymorphism: the trans-species hypothesis. Hum Immunol. 19, 155-162.
Klein, J., O'huigin, C., Figueroa, F., Mayer, W.E. and Klein, D., 1993a. Different modes of MHC evolution in primates. Mol Biol Evol. 10, 48-59.
Klein, J., Satta, Y., O'huigin, C. and Takahata, N., 1993b. The molecular descent of the major histocompatibility complex. Annu Rev Immunol. 11, 269-295.
Klein, J., Sato, A., Nagl, S. and O'huigin, C., 1998. Molecular trans-species polymorphism. Annu Rev Ecol Syst. 29, 1-21.
Kosakovsky-Pond, S.L. and Muse, S.V., 2005. Site-to-site variation of synonymous substitution rates. Mol Biol Evol. 22, 2375-2385.
Kosakovsky-Pond, S.L. and Frost, S.D., 2005a. Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics. 21, 2531-2533.
Kosakovsky-Pond, S.L. and Frost, S.D., 2005b. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol. 22, 1208-1222.
Kosakovsky-Pond, S.L., Frost, S.D. and Muse, S.V., 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 21, 676-679.
Kriener, K., O'huigin, C. and Klein, J., 2000a. Alu elements support independent origin of prosimian, platyrrhine, and catarrhine Mhc-DRB genes. Genome Res. 10, 634-643.
Kriener, K., O'huigin, C., Tichy, H. and Klein, J., 2000b. Convergent evolution of major histocompatibility complex molecules in humans and New World monkeys. Immunogenetics. 51, 169-178.
Kriener, K., O'huigin, C. and Klein, J., 2001. Independent origin of functional MHC class II genes in humans and New World monkeys. Hum Immunol. 62, 1-14.
Lujan, R., Chapman, W.L., Jr., Hanson, W.L. and Dennis, V.A., 1986. Leishmania braziliensis: development of primary and satellite lesions in the experimentally infected owl monkey, Aotus trivirgatus. Exp Parasitol. 61, 348-358.
May, J., Kremsner, P.G., Milovanovic, D., Schnittger, L., Loliger, C.C., Bienzle, U. and Meyer, C.G., 1998. HLA-DP control of human Schistosoma haematobium infection. Am J Trop Med Hyg. 59, 302-306.
Nei, M. and Gojobori, T., 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 3, 418-426.
Nicholas, K.B., Nicholas, H.B. and Deerfield, D.W., 1997. Genedoc. Analysis and visualization of genetic variation. EMBNEW NEWS. 4, 14.
Niño-Vasquez, J.J., Vogel, D., Rodriguez, R., Moreno, A., Patarroyo, M.E., Pluschke, G. and Daubenberger, C.A., 2000. Sequence and diversity of DRB genes of Aotus nancymaae, a primate model for human malaria parasites. Immunogenetics. 51, 219-230.
Noya, O., Gonzalez-Rico, S., Rodriguez, R., Arrechedera, H., Patarroyo, M.E. and Alarcon De Noya, B., 1998. Schistosoma mansoni infection in owl monkeys (Aotus nancymai): evidence for the early elimination of adult worms. Acta Trop. 70, 257-267.
O'huigin, C., 1995. Quantifying the degree of convergence in primate Mhc-DRB genes. Immunol Rev. 143, 123-140.
Opazo, J.C., Wildman, D.E., Prychitko, T., Johnson, R.M. and Goodman, M., 2006. Phylogenetic relationships and divergence times among New World monkeys (Platyrrhini, Primates). Mol Phylogenet Evol. 40, 274-280.
Osada, N., Hashimoto, K., Kameoka, Y., Hirata, M., Tanuma, R., Uno, Y., Inoue, I., Hida, M., Suzuki, Y., Sugano, S., Terao, K., Kusuda, J. and Takahashi, I., 2008. Large-scale analysis of Macaca fascicularis transcripts and inference of genetic divergence between M. fascicularis and M. mulatta. BMC Genomics. 9, 90.
Otting, N. and Bontrop, R.E., 1995. Evolution of the major histocompatibility complex DPA1 locus in primates. Hum Immunol. 42, 184-187.
Pico De Coana, Y., Rodriguez, J., Guerrero, E., Barrero, C., Rodriguez, R., Mendoza, M. and Patarroyo, M.A., 2003. A highly infective Plasmodium vivax strain adapted to Aotus monkeys: quantitative haematological and molecular determinations useful for P. vivax malaria vaccine development. Vaccine. 21, 3930-3937.
Piontkivska, H. and Nei, M., 2003. Birth-and-death evolution in primate MHC class I genes: divergence time estimates. Mol Biol Evol. 20, 601-609.
Polotsky, Y.E., Vassell, R.A., Binn, L.N. and Asher, L.V., 1994. Immunohistochemical detection of cytokines in tissues of Aotus monkeys infected with hepatitis A virus. Ann N Y Acad Sci. 730, 318-321.
Poon, A.F., Frost, S.D. and Pond, S.L., 2009. Detecting signatures of selection from DNA sequences using Datamonkey. Methods Mol Biol. 537, 163-183.
Posch, P.E., Araujo, H.A., Creswell, K., Praud, C., Johnson, A.H. and Hurley, C.K., 1995. Microvariation creates significant functional differences in the DR3 molecules. Hum Immunol. 42, 61-71.
Posch, P.E., Hurley, C.K., Geluk, A. and Ottenhoff, T.H., 1996. The impact of DR3 microvariation on peptide binding: the combinations of specific DR beta residues critical to binding differ for different peptides. Hum Immunol. 49, 96-105.
Robinson, J., Waller, M.J., Parham, P., De Groot, N., Bontrop, R., Kennedy, L.J., Stoehr, P. and Marsh, S.G., 2003. IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex. Nucleic Acids Res. 31, 311-314.
Rodriguez, R., Moreno, A., Guzman, F., Calvo, M. and Patarroyo, M.E., 1990. Studies in owl monkeys leading to the development of a synthetic vaccine against the asexual blood stages of Plasmodium falciparum. Am J Trop Med Hyg. 43, 339-354.
Ronquist, F. and Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19, 1572-1574.
Rzhetsky, A. and Nei, M., 1993. Theoretical foundation of the minimum-evolution method of phylogenetic inference. Mol Biol Evol. 10, 1073-1095.
Schwarz, R. and Dayhoff, M., 1979. Matrices for detecting distant relationships, in: Dayhoff, M. (Eds.), Atlas of protein sequences. National Biomedical Research Foundation, pp. 353-358.
Shimodaira, H. and Hasegawa, M., 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 16, 1114-1116.
Sitnikova, T., 1996. Bootstrap method of interior-branch test for phylogenetic trees. Mol Biol Evol. 13, 605-611.
Slierendregt, B.L., Otting, N., Kenter, M. and Bontrop, R.E., 1995. Allelic diversity at the Mhc-DP locus in rhesus macaques (Macaca mulatta). Immunogenetics. 41, 29-37.
Steiper, M.E. and Young, N.M., 2006. Primate molecular divergence dates. Mol Phylogenet Evol. 41, 384-394.
Stern, L.J., Brown, J.H., Jardetzky, T.S., Gorga, J.C., Urban, R.G., Strominger, J.L. and Wiley, D.C., 1994. Crystal structure of the human class II MHC protein HLA-DR1 complexed with an influenza virus peptide. Nature. 368, 215-221.
Suarez, C.F., Patarroyo, M.E., Trujillo, E., Estupiñan, M., Baquero, J.E., Parra, C. and Rodriguez, R., 2006. Owl monkey MHC-DRB exon 2 reveals high similarity with several HLA-DRB lineages. Immunogenetics. 58, 542-558.
Takahashi, K., Rooney, A.P. and Nei, M., 2000. Origins and divergence times of mammalian class II MHC gene clusters. J Hered. 91, 198-204.
Tamura, K., Dudley, J., Nei, M. and Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 24, 1596-1599.
Tamura, K. and Kumar, S., 2002. Evolutionary distance estimation under heterogeneous substitution pattern among lineages. Mol Biol Evol. 19, 1727-1736.
Tamura, K., Nei, M. and Kumar, S., 2004. Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc Natl Acad Sci U S A. 101, 11030-11035.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G., 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876-4882.
Trtkova, K., Mayer, W.E., O'huigin, C. and Klein, J., 1995. Mhc-DRB genes and the origin of New World monkeys. Mol Phylogenet Evol. 4, 408-419.
Yuhki, N., Beck, T., Stephens, R.M., Nishigaki, Y., Newmann, K. and O'Brien, S.J., 2003. Comparative genome organization of human, murine, and feline MHC class II region. Genome Res. 13, 1169-1179.

Figure legends
Fig. 1. Phylogenetic tree of primate MHC-DPA1 exon 2 sequences calculated using a Bayesian approach. The topologies obtained with parsimony (Pars), maximum likelihood (ML) and minimum evolution (ME) are similar, and significant node support from these analyses is also shown (bootstrap > 70%; IBT > 90%; LRSH > 95%; see the code at the bottom of the figure). Allelic lineages are shown in different colours. Primate species divergence times in million years (MY) and mean substitutions per site per million years (Sub/S/MY) are shown for each group and subgroup; the average nucleotide identity obtained from all possible pairwise comparisons of exon 2 is also shown. The scale bar indicates 0.2 substitutions per site. See the Materials and Methods section for species abbreviations and calculation details.
Fig. 2. MHC-DPA1 exon 2-deduced amino acid sequence logo. PAM 250 substitution matrix groups (DENQH (green), SAT (blue), KR (red), FYW (black), LIVM (purple), C (grey), G (brown) and P (yellow)) are used to show conservative or non-conservative substitutions; colour changes imply non-conservative substitutions. Above each logo, sites under positive selection (combined results of the SLAC, FEL and REL tests) are marked with +, whilst those under negative selection are marked with –; the remaining sites are unmarked and considered neutral.
Coloured numbers below each logo denote Pocket positions (fuchsia, P1; orange, P6; green, P9); coloured arrows indicate other residues in contact with Pocket residues. On the right-hand side, the amino acid identity and amino acid similarity in primate MHC-DPA1 are shown. The averages were obtained from all possible pairwise comparisons of deduced MHC-DPA1 protein sequences within each group. Similarity was calculated based on the PAM 250 substitution matrix.
Fig. 3. Pockets of MHC-DPA1. The pockets and their neighbouring residues are shown based on PDB entry 3LQZ (DPA1*0103, DPB1*0201): A, Pocket 1; B, Pocket 1 neighbour residues; C, Pocket 6; D, Pocket 6 neighbour residues; E, Pocket 9; F, Pocket 9 neighbour residues. Red, positively selected residues; ice blue, negatively selected residues; blue, invariant residues; green, variable residues; white, residues not considered.

Figure 1.
Figure 2.
Figure 3.

Supplementary Material 1. Exon 2 nucleotide sequence alignment of MHC-DPA1 alleles from 11 primates. Positions are indicated by the top numbers, and asterisks (*) mark 10-base intervals. A dot (.) denotes identity with the Aona-DPA1 sequence. GenBank accession numbers appear after each sequence name.

34 * * * * * * * * * * * 147
Aona-DPA1*01 - AF529200 : TTT GTA CAG ACG CAG AGA CCA ACA GGG GAG TTT ATG TTT GAG TTT GAT GAG GAT GAG ATA TTC TAC GTG GAT CTG GAC AAG AAG GAG ACC GTC TGG CAT CTG GAG GAG TTT GGC
Sasc-DPA1*0501 - AF026698 : ... ... ... .T. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... A.. ... ... ... ... ... ... ...
Sasc-DPA1*0502 - AF026699 : ... ... ... .T. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mafa-DPA1*0702 - EF208810 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Poab-DPA - AC207096 : ... A.. ... ... ..T ... ..G ... ... ... .A. ... ... ... ... ... ... ... ... ..G ... C.T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... ... ...
Mafa-DPA1*0701 - EF208809 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mamu-DPA1*0701 - EF204946 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mamu-DPA - AB219102 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mamu-DPA - AB250757 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*010302 - AF074848 : ... ... ... ... ..T ... ... ... ..A ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0107 - AF076284 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*010304 - DQ274060 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0105 - X96984 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Mamu-DPA1*0101 - Z32411 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0109 - AY650051 : ... ... ... ... ..T ... ... ... ... ... ... .C. ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0110 - DQ274061 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ..T ... ... ... ... ... ...
HLA-DPA1*0104 - X78198 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..C ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0108 - AF346471 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..C ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0302 - AF013767 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0303 - AY618553 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..C ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0301 - M83908 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Patr-DPA1*0301 - AF026694 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0203 - Z48473 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*010601 - U87556 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ..T ..A ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*020102 - L31624 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ..T ..A ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*020101 - X78199 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ..T ..A ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*020104 - AF074847 : ... ... ..A ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ..T ..A ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*020106 - AF165160 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ..T ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*02021 - X79475 : ... ... ... ..C ..T ... ... ... ..A ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ..T ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*02022 - X79476 : ... ... ... ..C ..T ... ... ... ..A ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*0204 - EU304462 : ... ... ... ..C ..T ... ... ... ..A ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*010602 - EU729350 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*020103 - AF015295 : ... ... ... ... ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*020105 - AF098794 : ... ... ... ... ..T ... ... ... ..A ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
HLA-DPA1*020203 - AF092049 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Patr-DPA1*0201 - AF026707 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Patr-DPA1*0202 - AF026693 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Gogo-DPA1 - CU104655 : ... ... ... ..C ..T ... ... ... ... ... ... ... ... ..A ... ... ..A ...
... ..G ... ..T ... ... ... ..T ... ... ... ... ... ... ... ... ... ... ... ... HLA-DPA1*0401 - L11643 : ... ... ... ... ..T ... A.. ... ..A ... ... ... ... ... ... ... ..T ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Popy-DPA1*0401 - AF026697 : ... ... ... ... ..T ... A.. ... ..A ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Gogo-DPA1*0402 - AF026702 : ... ... ... ... ..T ... A.. ... ..A ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Gogo-DPA1*0401 - AF026701 : ... ... ... ... ..T ... A.. ... ..A ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Sasc-DPA1*0601 - AF026700 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1 - AB219101 : ... ... ... ..A ..T ... ... ... ..A ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1*0601 - EF204949 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1 - AB219099 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mafa-DPA1*0401 - EF208808 : ... ... ... ... ..T ... ... ... ... ... ... ... .A. ... ..G ... ..A ... ..A ... ... .TT ... ... ... ... ... ... ... ... A.. ... ... ... ... ... ... ... Mamu-DPA1*0403 - GQ471885 : ... ... ... ... ..T ... ... ... ... ... .A. ... .A. ... ..G ... ..A ... ..A ... ... .TT ... ... ... ... ... ... ... ... A.. ... ... ... ... ... ... ... Mamu-DPA1*0401 - FJ544417 : ... ... ... ... ..T ... ... ... ... ... .A. ... .A. ... ..G ... ..A ... ..A ..G ... .TT ... ... ... ... 
... ..A ... ... A.. ... ... ... ... ... ... ... Mamu-DPA1*0402 - FJ544415 : ... ... ... ... ..T ... ... ... ... ... .A. ... .AC ... ..G ... ..A ... ..A ... ... .TT ... ... ... ... ... ... ... ..T ... ... ... ... ... ... ... ... Mamu-DPA1 - AB219100 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1 - AB250756 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A ... ... ..G ... ..T ... ... ... ... ... ... ... ... A.. ... ... ... ... ... ... ... Maar-DPA1*0201 - AF026703 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Paha-DPA1*0201 - AF026706 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mafa-DPA1*0204 - AM943632 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mafa-DPA1*0201 - AF026704 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A .G. ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mafa-DPA1*0202 - EF208806 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1*0801 - EU305663 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ... ... ... ..G ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1*0208 - FJ544416 : ... ... ... ..A ..T ... ... ... ... ... .A. ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Popy-DPA1*0201 - AF026695 : ... ... ... ... ..T ... ..G ... ..A ... ... ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ..T ... ... ... ... ... ... ... 
... ... ... ... ... Popy-DPA1*0202 - AF026696 : ... ... ... ... ..T ... ..G ... ... ... .A. ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ..T ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1 - AB250754 : ... AC. ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ..T ... ... ... ... ... ... ... ... Mamu-DPA1*0201 - EF204945 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Mamu-DPA1*0203 - EF204950 : ... ... ... ..A ..T ... ... ... ... ... ... ... ... ... ... ... ..A ... ... CAG ... ..T ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..T 148 * * * * * * * * 222 Aona-DPA1*01 - AF529200 : CGG GCC TTT TCC GTT GAG GTT CAG GGT GGG CTG GCT AAC ATT GCT GCA TTG AAC AAC AAC TTG AAT ATC CTG ATC Sasc-DPA1*0501 - AF026698 : ..A ... .C. ... T.. ... T.. ... AA. ... ... T.. ... ... ... ... ... ... ... C.. ... ... ... A.. ... Sasc-DPA1*0502 - AF026699 : ..A ... ... ... T.. ... T.. ... AA. ... ... T.. ... ... ... ... ... ... ... C.. ... ... ... A.. ... Mafa-DPA1*0702 - EF208810 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... ... ... ... ... ... ... ... .C. T.. ... Poab-DPA - AC207096 : ..A ... ... ... T.G ... .C. ... ..C ... ... ... ... ... ... ... ... ... G.. C.. ... ... GC. A.. ... Mafa-DPA1*0701 - EF208809 : ..A ... ... ... T.. ... .C. ... ..A ... ... ... ... ... ... ... ... ... ... ... ... ... .C. T.. ... Mamu-DPA1*0701 - EF204946 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... ... ... ... C.. ... ... ... .C. T.. ... Mamu-DPA - AB219102 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... ... ... ... C.. ... ... ... ... T.. ... Mamu-DPA - AB250757 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... ... ... ... C.. ... ... ... .C. T.. ... HLA-DPA1*010302 - AF074848 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... 
... ... .C. T.. ... HLA-DPA1*0107 - AF076284 : .AA A.. ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*010304 - DQ274060 : .AA ... ... ... T.. ... .C. ... ... ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0105 - X96984 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... Mamu-DPA1*0101 - Z32411 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0109 - AY650051 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0110 - DQ274061 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0104 - X78198 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0108 - AF346471 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0302 - AF013767 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0303 - AY618553 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. .C. ... ... ... ... ... .C. T.. ... HLA-DPA1*0301 - M83908 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. .C. ... ... ... ... ... .C. T.. ... Patr-DPA1*0301 - AF026694 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... T.. ... HLA-DPA1*0203 - Z48473 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*010601 - U87556 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*020102 - L31624 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*020101 - X78199 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... 
... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*020104 - AF074847 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*020106 - AF165160 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*02021 - X79475 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*02022 - X79476 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*0204 - EU304462 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... G.. ... ... .C. T.. ... HLA-DPA1*010602 - EU729350 : .AA ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*020103 - AF015295 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*020105 - AF098794 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... HLA-DPA1*020203 - AF092049 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... .C. T.. ... Patr-DPA1*0201 - AF026707 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... T.. ... Patr-DPA1*0202 - AF026693 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ... Gogo-DPA1 - CU104655 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ... HLA-DPA1*0401 - L11643 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... GCT ... Popy-DPA1*0401 - AF026697 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ..T ... ... ... GCT ... Gogo-DPA1*0402 - AF026702 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... GCT ... Gogo-DPA1*0401 - AF026701 : ..A ... 
... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... ACT ... Sasc-DPA1*0601 - AF026700 : ..A ... ... ... T.. ... .C. ... A.C ATC .A. ... C.. ... A.. AT. ... ..G G.. G.. ... ... ... AC. ... Mamu-DPA1 - AB219101 : ..A ... ... .T. T.. ... .C. ... A.G .T. ... T.. C.. ... .T. AT. ... ..T G.. .G. ... ... ... A.. ... Mamu-DPA1*0601 - EF204949 : .AA ... A.. ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... C.. ... ... ... ... AC. ... Mamu-DPA1 - AB219099 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ... Mafa-DPA1*0401 - EF208808 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. .C. ... ... ... ... ... G.. AC. ... Mamu-DPA1*0403 - GQ471885 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. .C. ... ... ... ... ... G.. AC. ... Mamu-DPA1*0401 - FJ544417 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. .C. ... ... ... ... ... G.. AC. ... Mamu-DPA1*0402 - FJ544415 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... G.. AC. ... Mamu-DPA1 - AB219100 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... ACC ... Mamu-DPA1 - AB250756 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... ACC ... Maar-DPA1*0201 - AF026703 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ... Paha-DPA1*0201 - AF026706 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ... Mafa-DPA1*0204 - AM943632 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ... Mafa-DPA1*0201 - AF026704 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ... Mafa-DPA1*0202 - EF208806 : ..A ... ... ... T.. ... .C. ... ..C ... ... .A. G.. ... ... A.. ... ..G T.. ... ... ... ... AC. ... 
Mamu-DPA1*0801 - EU305663 : ..A ... ... ... T.. ... .C. ... ..C ... ... .A. G.. ... ... A.. ... ..G T.. ... ... ... ... AC. ...
Mamu-DPA1*0208 - FJ544416 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... A.. ...
Popy-DPA1*0201 - AF026695 : ..A ... ... ... T.. ... .C. ... ..C .C. ... ... ... ... ... AT. ... ... ... ... ... ... ... A.. ...
Popy-DPA1*0202 - AF026696 : ..A ... ... ... T.. ... .C. ... ..C .C. ... ... G.. ... ... AT. ... ... ... ... ... ... ... A.. ...
Mamu-DPA1 - AB250754 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. .C. ... ... ... ... ... .C. A.. ...
Mamu-DPA1*0201 - EF204945 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ...
Mamu-DPA1*0203 - EF204950 : ..A ... ... ... T.. ... .C. ... ..C ... ... ... ... ... ... AT. ... ... ... ... ... ... ... AC. ...

2. Alpha domain sequence alignment of MHC-DPA1 alleles from 11 primates. Positions are indicated by the numbers at the top, and asterisks (*) mark 10-amino-acid intervals. A dot (.) denotes identity with the Aona-DPA1*01 sequence. GenBank accession numbers appear after each sequence name.

12 * * * * * * 74
Aona-DPA1*01 - AF529200 : FVQTQRPTGEFMFEFDEDEIFYVDLDKKETVWHLEEFGRAFSVEVQGGLANIAALNNNLNILI
Sasc-DPA1*0501 - AF026698 : ...M...............M..........I.........S.F.F.N..S.......H...M.
Sasc-DPA1*0502 - AF026699 : ...M...............M......................F.F.N..S.......H...M.
Mafa-DPA1*0702 - EF208810 : ....H.....Y........M......................F.A...............T..
Poab-DPA - AC207096 : .I..H.....Y........M.H....................L.A...........DH..AM.
Mafa-DPA1*0701 - EF208809 : ....H..............M......................F.A...............T..
Mamu-DPA1*0701 - EF204946 : ....H..............M......................F.A...........H...T..
Mamu-DPA - AB219102 : ....H..............Q......................F.A...........H......
Mamu-DPA - AB250757 : ....H..............M......................F.A...........H...T..
HLA-DPA1*010302 - AF074848 : ....H..............M..................Q...F.A........I......T..
HLA-DPA1*0107 - AF076284 : ....H..............M..................QT..F.A........I......T..
HLA-DPA1*010304 - DQ274060 : ....H..............M..................Q...F.A........I......T..
HLA-DPA1*0105 - X96984 : ....H..............M..................Q...F.A........I......T..
Mamu-DPA1*0101 - Z32411 : ....H..............M..................Q...F.A........I......T..
HLA-DPA1*0109 - AY650051 : ....H......T.......M..................Q...F.A........I......T..
HLA-DPA1*0110 - DQ274061 : ....H..............M...........C......Q...F.A........I......T..
HLA-DPA1*0104 - X78198 : ....H...........D..M..................Q...F.A........I......T..
HLA-DPA1*0108 - AF346471 : ....H...........D..M......................F.A........I......T..
HLA-DPA1*0302 - AF013767 : ....H..............M..................Q...F.A........I......T..
HLA-DPA1*0303 - AY618553 : ....H...........D..M..................Q...F.A........IS.....T..
HLA-DPA1*0301 - M83908 : ....H..............M..................Q...F.A........IS.....T..
Patr-DPA1*0301 - AF026694 : ....H..............M......................F.A........I.........
HLA-DPA1*0203 - Z48473 : ....H..............M......................F.A........I......T..
HLA-DPA1*010601 - U87556 : ....H..............Q..................Q...F.A........I......T..
HLA-DPA1*020102 - L31624 : ....H..............Q......................F.A........I......T..
HLA-DPA1*020101 - X78199 : ....H..............Q......................F.A........I......T..
HLA-DPA1*020104 - AF074847 : ....H..............Q......................F.A........I......T..
HLA-DPA1*020106 - AF165160 : ....H..............Q......................F.A........I......T..
HLA-DPA1*02021 - X79475 : ....H..............Q......................F.A........I......T..
HLA-DPA1*02022 - X79476 : ....H..............Q......................F.A........I......T..
HLA-DPA1*0204 - EU304462 : ....H..............Q......................F.A........I...D..T..
HLA-DPA1*010602 - EU729350 : ....H..............Q..................Q...F.A........I......T..
HLA-DPA1*020103 - AF015295 : ....H..............Q......................F.A........I......T..
HLA-DPA1*020105 - AF098794 : ....H..............Q......................F.A........I......T..
HLA-DPA1*020203 - AF092049 : ....H..............Q......................F.A........I......T..
Patr-DPA1*0201 - AF026707 : ....H..............Q......................F.A........I.........
Patr-DPA1*0202 - AF026693 : ....H..............Q......................F.A........I.......T.
Gogo-DPA1 - CU104655 : ....H..............M......................F.A........I.......T.
HLA-DPA1*0401 - L11643 : ....H.T.........D..M......................F.A........I.......A.
Popy-DPA1*0401 - AF026697 : ....H.T............M......................F.A........I.......A.
Gogo-DPA1*0402 - AF026702 : ....H.T............M......................F.A........I.......A.
Gogo-DPA1*0401 - AF026701 : ....H.T............M......................F.A........I.......T.
Sasc-DPA1*0601 - AF026700 : ....H..............M......................F.A.SIQ.H.TI.KDD...T.
Mamu-DPA1 - AB219101 : ....H..............M.....................FF.A.RV.SH.VI..DS...M.
Mamu-DPA1*0601 - EF204949 : ....H..............M..................Q.I.F.A........I..H....T.
Mamu-DPA1 - AB219099 : ....H..............M......................F.A........I.......T.
Mafa-DPA1*0401 - EF208808 : ....H.......Y.L......F........I...........F.A........IS.....VT.
Mamu-DPA1*0403 - GQ471885 : ....H.....Y.Y.L......F........I...........F.A........IS.....VT.
Mamu-DPA1*0401 - FJ544417 : ....H.....Y.Y.L....M.F........I...........F.A........IS.....VT.
Mamu-DPA1*0402 - FJ544415 : ....H.....Y.Y.L......F....................F.A........I......VT.
Mamu-DPA1 - AB219100 : ....H.....Y........M......................F.A........I.......T.
Mamu-DPA1 - AB250756 : ....H.....Y........M..........I...........F.A........I.......T.
Maar-DPA1*0201 - AF026703 : ....H.....Y........Q......................F.A........I.......T.
Paha-DPA1*0201 - AF026706 : ....H.....Y........Q......................F.A........I.......T.
Mafa-DPA1*0204 - AM943632 : ....H.....Y........Q......................F.A........I.......T.
Mafa-DPA1*0201 - AF026704 : ....H.....Y......G.Q......................F.A........I.......T.
Mafa-DPA1*0202 - EF208806 : ....H.....Y........Q......................F.A....DD..T.KY....T.
Mamu-DPA1*0801 - EU305663 : ....H.....Y........M......................F.A....DD..T.KY....T.
Mamu-DPA1*0208 - FJ544416 : ....H.....Y........Q......................F.A........I.......M.
Popy-DPA1*0201 - AF026695 : ....H..............Q......................F.A..A.....I.......M.
Popy-DPA1*0202 - AF026696 : ....H.....Y........Q......................F.A..A..D..I.......M.
Mamu-DPA1 - AB250754 : .T..H..............Q......................F.A........IS.....TM.
Mamu-DPA1*0201 - EF204945 : ....H..............Q......................F.A........I.......T.
Mamu-DPA1*0203 - EF204950 : ....H..............Q......................F.A........I.......T.

3. Codon position variability and Nei-Gojobori test. The p-distance (pd) and its standard error (S.E.) were calculated for whole codons and for each codon position, for each group of analysed sequences, in different sectors of DPA exon 2. Synonymous (dS) and non-synonymous (dN) substitution rates and their variances (estimated by the bootstrap method with 1,000 replicates) were calculated with the Nei-Gojobori method for every group of analysed sequences in each sector of exon 2. Z values of statistically significant tests (5% significance level, Z ≥ 1.64) were marked in red for positive selection (dN > dS) and in gray for negative selection (dS > dN). Exon 2 region 1 comprises the less variable sector (codons 12 to 49), and exon 2 region 2 the more variable sector (codons 50 to 74). PBR and extended PBR positions are defined as in figure 2.
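Supplementary materials 2 and 3 rely on two simple operations. The first is reading the dot notation, in which a dot means identity with the Aona-DPA1*01 reference. The second is computing the p-distance, the proportion of differing sites, separately for each codon position. The sketch below applies both to codons 50 to 74 of Aona-DPA1*01 and Sasc-DPA1*0501, taken from the nucleotide alignment. It assumes ungapped, codon-aligned sequences and the standard genetic code, and the function names are hypothetical. It is not the Nei-Gojobori estimator: `count_codon_changes` only tallies synonymous and non-synonymous codon differences and does not normalise them by the numbers of synonymous and non-synonymous sites.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order, third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expand_dots(reference: str, row: str) -> str:
    """Replace every '.' in an alignment row with the reference base at that position."""
    ref, seq = reference.replace(" ", ""), row.replace(" ", "")
    if len(ref) != len(seq):
        raise ValueError("row and reference must have the same length")
    return "".join(r if s == "." else s for r, s in zip(ref, seq))


def p_distance_by_position(a: str, b: str) -> dict:
    """Proportion of differing sites at codon positions 1, 2 and 3, and over whole codons."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("sequences must be equal-length and codon-aligned")
    n_codons = len(a) // 3
    diffs = [sum(a[i] != b[i] for i in range(pos, len(a), 3)) for pos in range(3)]
    result = {f"pos{pos + 1}": d / n_codons for pos, d in enumerate(diffs)}
    result["codon"] = sum(diffs) / len(a)
    return result


def count_codon_changes(a: str, b: str) -> tuple:
    """Count differing codons that are synonymous or non-synonymous (no site normalisation)."""
    syn = nonsyn = 0
    for i in range(0, len(a), 3):
        ca, cb = a[i:i + 3], b[i:i + 3]
        if ca == cb:
            continue
        if GENETIC_CODE[ca] == GENETIC_CODE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Codons 50-74 copied from the nucleotide alignment (Aona-DPA1*01 and Sasc-DPA1*0501).
aona = "CGG GCC TTT TCC GTT GAG GTT CAG GGT GGG CTG GCT AAC ATT GCT GCA TTG AAC AAC AAC TTG AAT ATC CTG ATC"
ref = aona.replace(" ", "")
sasc = expand_dots(aona, "..A ... .C. ... T.. ... T.. ... AA. ... ... T.. ... ... ... ... ... ... ... C.. ... ... ... A.. ...")

print(p_distance_by_position(ref, sasc))  # {'pos1': 0.24, 'pos2': 0.08, 'pos3': 0.04, 'codon': 0.12}
print(count_codon_changes(ref, sasc))     # (1, 7)
```

For this pair, the single synonymous change (CGG to CGA at codon 50) and the seven non-synonymous ones agree with the seven amino acid replacements shown for Sasc-DPA1*0501 between positions 50 and 74 in the protein alignment.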
Columns: Exon 2 - Region 1 (G1 G2 G3 G4 G5 Total) | Exon 2 - Region 2 (G1 G2 G3 G4 G5 Total) | Exon 2 (G1 G2 G3 G4 G5 Total)

Codon 1st position pd: 0.04 0.00 0.00 0.00 0.04 0.10 | 0.23 0.00 0.03 0.21 0.08 0.18 | 0.02 0.01 0.00 0.11 0.00 0.04
Codon 1st position S.E.: 0.04 0.00 0.00 0.00 0.04 0.08 | 0.11 0.00 0.03 0.06 0.07 0.07 | 0.02 0.01 0.00 0.06 0.00 0.02
Codon 2nd position pd: 0.12 0.00 0.00 0.18 0.15 0.19 | 0.13 0.07 0.00 0.10 0.16 0.20 | 0.13 0.04 0.00 0.14 0.15 0.20
Codon 2nd position S.E.: 0.07 0.00 0.00 0.10 0.09 0.09 | 0.08 0.04 0.00 0.05 0.08 0.08 | 0.06 0.03 0.00 0.05 0.06 0.06
Codon 3rd position pd: 0.04 0.03 0.00 0.10 0.00 0.04 | 0.00 0.00 0.00 0.12 0.00 0.04 | 0.02 0.01 0.00 0.11 0.00 0.04
Codon 3rd position S.E.: 0.04 0.03 0.00 0.06 0.00 0.02 | 0.00 0.00 0.00 0.12 0.00 0.03 | 0.02 0.01 0.00 0.06 0.00 0.02
Codon pd: 0.07 0.01 0.00 0.09 0.06 0.11 | 0.12 0.02 0.01 0.14 0.04 0.14 | 0.10 0.03 0.01 0.12 0.07 0.13
Codon S.E.: 0.03 0.01 0.00 0.04 0.04 0.04 | 0.05 0.02 0.01 0.04 0.04 0.04 | 0.03 0.01 0.01 0.03 0.03 0.03
Nei-Gojobori dS: 0.01 0.00 0.00 0.06 0.00 0.03 | 0.04 0.00 0.00 0.20 0.00 0.06 | 0.03 0.00 0.00 0.17 0.00 0.05
Nei-Gojobori dN: 0.08 0.01 0.00 0.09 0.07 0.12 | 0.13 0.03 0.01 0.13 0.09 0.16 | 0.11 0.02 0.01 0.11 0.08 0.14
Nei-Gojobori Z-value: 1.00 0.17 nc 0.43 1.75 1.00 | 1.80 1.50 1.00 0.64 3.00 2.00 | 1.75 2.00 1.00 0.67 2.67 2.25

Codon 1st position pd: 0.07 0.00 0.00 0.03 0.03 0.06 | 0.18 0.01 0.01 0.12 0.04 0.11 | 0.07 0.02 0.00 0.09 0.02 0.05
Codon 1st position S.E.: 0.03 0.00 0.00 0.03 0.02 0.04 | 0.05 0.01 0.01 0.03 0.03 0.03 | 0.03 0.01 0.00 0.03 0.01 0.02
Codon 2nd position pd: 0.05 0.01 0.00 0.11 0.09 0.10 | 0.09 0.04 0.02 0.07 0.11 0.12 | 0.07 0.03 0.01 0.09 0.10 0.11
Codon 2nd position S.E.: 0.03 0.01 0.00 0.05 0.04 0.04 | 0.04 0.02 0.02 0.02 0.04 0.04 | 0.03 0.02 0.01 0.03 0.03 0.03
Codon 3rd position pd: 0.08 0.04 0.00 0.12 0.02 0.07 | 0.06 0.01 0.00 0.07 0.02 0.03 | 0.07 0.02 0.00 0.09 0.02 0.05
Codon 3rd position S.E.: 0.05 0.03 0.00 0.06 0.02 0.03 | 0.04 0.01 0.00 0.03 0.02 0.01 | 0.03 0.01 0.00 0.03 0.01 0.02
Codon pd: 0.07 0.02 0.00 0.09 0.05 0.08 | 0.11 0.02 0.01 0.09 0.06 0.09 | 0.09 0.02 0.01 0.09 0.05 0.08
Codon S.E.: 0.02 0.01 0.00 0.03 0.02 0.02 | 0.03 0.01 0.01 0.02 0.02 0.02 | 0.02 0.01 0.00 0.01 0.01 0.01
Nei-Gojobori dS: 0.14 0.00 0.00 0.11 0.04 0.08 | 0.07 0.01 0.00 0.07 0.00 0.03 | 0.09 0.01 0.00 0.08 0.01 0.04
Nei-Gojobori dN: 0.06 0.02 0.00 0.09 0.05 0.08 | 0.12 0.02 0.01 0.09 0.07 0.11 | 0.09 0.02 0.06 0.09 0.07 0.09
Nei-Gojobori Z-value: 0.82 2.00 nc 0.20 0.40 0.00 | 1.50 0.50 1.00 0.75 3.50 4.00 | 1.86 1.17 1.20 0.34 2.89 1.97

Codon 1st position pd: 0.02 0.00 0.00 0.03 0.01 0.01 | 0.11 0.01 0.00 0.05 0.02 0.05 | 0.05 0.00 0.00 0.04 0.01 0.03
Codon 1st position S.E.: 0.01 0.00 0.00 0.02 0.01 0.01 | 0.04 0.01 0.00 0.02 0.02 0.02 | 0.02 0.00 0.00 0.01 0.01 0.01
Codon 2nd position pd: 0.01 0.00 0.00 0.01 0.01 0.01 | 0.05 0.02 0.02 0.04 0.06 0.06 | 0.02 0.01 0.01 0.02 0.03 0.03
Codon 2nd position S.E.: 0.01 0.00 0.00 0.01 0.01 0.00 | 0.03 0.02 0.02 0.02 0.03 0.02 | 0.01 0.01 0.01 0.01 0.01 0.01
Codon 3rd position pd: 0.07 0.03 0.07 0.09 0.06 0.09 | 0.06 0.01 0.00 0.03 0.02 0.02 | 0.06 0.02 0.04 0.07 0.04 0.07
Codon 3rd position S.E.: 0.03 0.02 0.03 0.03 0.02 0.03 | 0.04 0.01 0.00 0.03 0.02 0.01 | 0.02 0.01 0.02 0.02 0.01 0.02
Codon pd: 0.03 0.01 0.02 0.04 0.02 0.04 | 0.07 0.01 0.01 0.04 0.03 0.04 | 0.05 0.01 0.02 0.04 0.03 0.04
Codon S.E.: 0.01 0.01 0.01 0.01 0.01 0.01 | 0.02 0.01 0.01 0.01 0.01 0.01 | 0.01 0.01 0.01 0.01 0.01 0.01
Nei-Gojobori dS: 0.08 0.03 0.11 0.13 0.09 0.13 | 0.06 0.01 0.00 0.02 0.00 0.02 | 0.07 0.02 0.06 0.08 0.05 0.08
Nei-Gojobori dN: 0.02 0.01 0.00 0.02 0.01 0.01 | 0.07 0.01 0.01 0.05 0.04 0.05 | 0.04 0.01 0.00 0.03 0.02 0.03
Nei-Gojobori Z-value: 1.78 1.02 2.23 2.06 2.55 2.58 | 0.46 0.07 1.03 1.97 2.64 1.91 | 1.23 0.95 2.12 1.65 1.59 1.96

Codon 1st position pd: 0.00 0.00 0.00 0.02 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.01 0.00 0.00
Codon 1st position S.E.: 0.00 0.00 0.00 0.02 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.01 0.00 0.00
Codon 2nd position pd: 0.01 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.01 0.00 0.00 0.00 0.00 0.00
Codon 2nd position S.E.: 0.01 0.00 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.01 0.00 0.00 0.00 0.00 0.00
Codon 3rd position pd: 0.06 0.03 0.09 0.08 0.06 0.09 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.05 0.02 0.07 0.06 0.05 0.07
Codon 3rd position S.E.: 0.03 0.02 0.04 0.03 0.02 0.03 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.02 0.02 0.03 0.03 0.02 0.03
Codon pd: 0.02 0.01 0.03 0.03 0.02 0.03 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.02 0.01 0.02 0.03 0.02 0.03
Codon S.E.: 0.01 0.01 0.01 0.01 0.01 0.01 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.01 0.01 0.01 0.01 0.01 0.01
Nei-Gojobori dS: 0.05 0.04 0.14 0.13 0.11 0.14 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.05 0.04 0.12 0.11 0.09 0.12
Nei-Gojobori dN: 0.02 0.00 0.00 0.01 0.00 0.01 | 0.00 0.00 0.00 0.00 0.00 0.00 | 0.01 0.00 0.00 0.01 0.00 0.00
Nei-Gojobori Z-value: 1.55 1.32 2.25 2.25 2.85 2.60 | nc nc nc nc nc nc | 1.57 1.34 2.23 2.28 2.77 2.60

Codon 1st position pd: 0.02 0.00 0.00 0.02 0.01 0.02 | 0.14 0.01 0.01 0.09 0.03 0.08 | 0.07 0.00 0.00 0.05 0.02 0.05
Codon 1st position S.E.: 0.01 0.00 0.00 0.02 0.01 0.01 | 0.04 0.01 0.01 0.03 0.02 0.03 | 0.02 0.00 0.00 0.01 0.01 0.01
Codon 2nd position pd: 0.03 0.00 0.00 0.03 0.03 0.03 | 0.07 0.03 0.01 0.06 0.08 0.09 | 0.04 0.02 0.00 0.04 0.05 0.06
Codon 2nd position S.E.: 0.01 0.00 0.00 0.02 0.02 0.01 | 0.03 0.02 0.01 0.02 0.03 0.03 | 0.02 0.01 0.00 0.01 0.02 0.02
Codon 3rd position pd: 0.06 0.03 0.06 0.09 0.05 0.08 | 0.04 0.01 0.00 0.05 0.01 0.03 | 0.06 0.02 0.04 0.08 0.04 0.06
Codon 3rd position S.E.: 0.02 0.02 0.03 0.03 0.02 0.02 | 0.03 0.01 0.00 0.03 0.01 0.01 | 0.02 0.01 0.02 0.02 0.01 0.02
Codon pd: 0.04 0.01 0.02 0.05 0.03 0.05 | 0.08 0.02 0.01 0.07 0.04 0.07 | 0.06 0.01 0.01 0.06 0.03 0.05
Codon S.E.: 0.01 0.01 0.01 0.01 0.01 0.01 | 0.02 0.01 0.00 0.01 0.01 0.01 | 0.01 0.00 0.01 0.01 0.01 0.01
Nei-Gojobori dS: 0.08 0.03 0.10 0.12 0.09 0.13 | 0.05 0.01 0.00 0.05 0.00 0.03 | 0.07 0.02 0.06 0.09 0.05 0.08
Nei-Gojobori dN: 0.03 0.01 0.00 0.03 0.02 0.03 | 0.09 0.02 0.01 0.07 0.06 0.08 | 0.05 0.01 0.00 0.05 0.03 0.05
Nei-Gojobori Z-value: 1.46 0.98 2.26 1.90 2.18 2.24 | 1.04 0.61 1.41 0.49 3.51 2.82 | 0.60 0.71 2.05 1.51 0.80 1.19

Sectors (row groups): All positions; No PBR; Ext 5 Å No PBR; PBR; Ext 5 Å PBR.

4. Sites under selection identified with the SLAC, FEL and REL methods. For SLAC and FEL, a p-value ≤ 0.1 was considered significant; for REL, a Bayes factor ≥ 50 was considered significant. Significant positively selected codons have been marked in red, and significant negatively selected codons in gray.

Codon | SLAC dN-dS | SLAC p-value | FEL dN-dS | FEL p-value | REL dN-dS | REL Bayes factor

13 | 4.381 | 0.701 | 2.008 | 0.249 | 2.182 | 54.212
22 | 8.203 | 0.626 | 4.096 | 0.303 | 2.157 | 72.289
28 | -13.876 | 0.245 | -15.503 | 0.046 | -1.571 | 3.840
54 | 8.467 | 0.539 | 3.719 | 0.318 | 2.158 | 73.199
56 | 9.027 | 0.462 | 3.731 | 0.211 | 2.209 | 109.947
68 | 11.177 | 0.559 | 4.610 | 0.213 | 2.183 | 87.433
69 | 7.451 | 0.679 | 2.957 | 0.313 | 2.182 | 86.489
72 | 13.811 | 0.296 | 4.443 | 0.156 | 2.216 | 117.933

23 | 3.519 | 0.991 | 10.738 | 0.870 | 1.815 | 320.525
28 | 8.987 | 0.746 | 27.692 | 0.404 | 1.840 | 1.8E+05
43 | 4.416 | 0.996 | -9.1E+04 | 0.953 | 1.816 | 339.136
50 | 9.612 | 0.673 | 26.301 | 0.303 | 1.840 | 1.7E+05
51 | 5.230 | 0.667 | 14.578 | 0.357 | 1.819 | 381.336
66 | 5.258 | 0.742 | 15.439 | 0.521 | 1.819 | 391.442
72 | 5.206 | 0.670 | 13.690 | 0.373 | 1.818 | 369.870

14 | -25.190 | 0.201 | -193.283 | 0.056 | -7.924 | 5.6E+12
15 | -25.171 | 0.111 | -91.231 | 0.052 | -7.931 | 8.4E+09
20 | -37.757 | 0.037 | -246.682 | 0.004 | -7.940 | 2.4E+38
37 | -16.672 | 0.252 | -95.249 | 0.103 | -7.924 | 3.6E+12
38 | -19.729 | 0.213 | -122.547 | 0.090 | -7.920 | 4.4E+12

15 | -5.586 | 0.037 | -6.536 | 0.007 | -2.575 | 3.2E+14
20 | -3.724 | 0.114 | -6.080 | 0.016 | -2.565 | 3.5E+13
25 | -3.080 | 0.216 | -9.523 | 0.042 | -2.525 | 3.1E+03
30 | -3.024 | 0.220 | -5.553 | 0.063 | -2.521 | 1.1E+03
37 | -2.990 | 0.208 | -3.098 | 0.098 | -2.536 | 2.0E+03
39 | -3.082 | 0.216 | -10.204 | 0.037 | -2.544 | 4.9E+03
41 | -1.862 | 0.333 | -2.099 | 0.116 | -2.575 | 1.4E+08

13 | 8.832 | 0.453 | 8.159 | 0.140 | 0.979 | 111.828
22 | 12.293 | 0.651 | 10.881 | 0.350 | 1.018 | 1.2E+03
28 | -23.357 | 0.143 | -13.958 | 0.069 | -0.586 | 2.143
31 | 7.567 | 0.768 | 5.927 | 0.590 | 0.968 | 88.879
37 | -26.673 | 0.111 | -20.159 | 0.044 | -0.608 | 2.349
62 | 6.687 | 0.790 | 6.074 | 0.412 | 0.969 | 90.300

13 | 0.745 | 0.300 | 0.381 | 0.070 | 0.962 | 25.815
14 | -0.842 | 0.249 | -0.765 | 0.055 | -1.707 | 20.904
15 | -4.800 | 0.000 | -2.530 | 0.000 | -6.771 | 2.5E+07
20 | -2.991 | 0.001 | -2.082 | 0.000 | -6.988 | 1.5E+06
22 | 1.506 | 0.191 | 0.873 | 0.055 | 2.693 | 102.582
25 | -1.617 | 0.049 | -1.122 | 0.009 | -3.151 | 270.393
28 | -3.201 | 0.019 | -1.665 | 0.022 | -5.871 | 205.564
30 | -0.837 | 0.213 | -0.669 | 0.054 | -1.517 | 24.624
31 | 2.620 | 0.252 | 1.390 | 0.217 | 2.618 | 66.889
37 | -2.418 | 0.009 | -1.005 | 0.004 | -3.104 | 1.9E+03
38 | -0.838 | 0.213 | -0.733 | 0.043 | -1.680 | 34.413
39 | -0.841 | 0.212 | -0.760 | 0.042 | -1.735 | 34.964
41 | -0.997 | 0.111 | -0.419 | 0.029 | -1.066 | 151.425
42 | 0.745 | 0.299 | 0.394 | 0.084 | 0.966 | 20.834
68 | 1.259 | 0.250 | 0.659 | 0.058 | 2.027 | 31.837
72 | 1.676 | 0.078 | 0.719 | 0.025 | 2.467 | 74.320
73 | 1.363 | 0.209 | 0.047 | 0.954 | 0.197 | 50.089

Sequence groups (row groups): Anthropoidea DPA; Group 5; Group 4; Group 3; Group 2; Group 1.

5. Relationship between synonymous (dS) and non-synonymous (dN) substitutions in the different sectors of DPA exon 2. Values above the neutrality line (dS = dN) denote an accumulation of non-synonymous substitutions (positive selection pressure); values below it denote an accumulation of synonymous substitutions (negative selection pressure). Bold markers indicate a statistically significant Nei-Gojobori test. Test significance, the definition of the exon 2 sectors, and the PBR and extended PBR positions are as in supplementary material 3 and figure 2.
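The significance criteria stated in supplementary materials 3 to 5 reduce to simple decision rules. The sketch below encodes them with hypothetical function names. It reproduces only the thresholds, not the SLAC, FEL, REL or Nei-Gojobori estimators themselves, which are computed by dedicated software. It assumes two things, following the red/gray marking convention: the sign of dN-dS gives the direction of selection, and the standard error of dN-dS passed to `z_test` is a bootstrap estimate. The standard error used in the example is hypothetical, because the table does not report it.

```python
def classify_site(method: str, dn_minus_ds: float, statistic: float) -> str:
    """Classify one codon with the stated criteria.

    `statistic` is a p-value for SLAC and FEL (significant if <= 0.1)
    and a Bayes factor for REL (significant if >= 50).
    The direction of selection is read from the sign of dN - dS.
    """
    method = method.upper()
    if method in ("SLAC", "FEL"):
        significant = statistic <= 0.1
    elif method == "REL":
        significant = statistic >= 50
    else:
        raise ValueError(f"unknown method: {method}")
    if not significant:
        return "not significant"
    return "positive" if dn_minus_ds > 0 else "negative"


def z_test(dn: float, ds: float, se_of_difference: float, critical: float = 1.64) -> str:
    """Z test of dN against dS at the 5% level (|Z| >= 1.64)."""
    z = (dn - ds) / se_of_difference
    if abs(z) < critical:
        return "not significant"
    return "positive" if z > 0 else "negative"


def side_of_neutrality_line(ds: float, dn: float) -> str:
    """Position of a (dS, dN) point relative to the neutrality line dN = dS."""
    if dn > ds:
        return "above"
    if dn < ds:
        return "below"
    return "on the line"


# Rows taken from supplementary material 4 (codons 13 and 20).
print(classify_site("SLAC", 4.381, 0.701))    # not significant
print(classify_site("REL", 2.182, 54.212))    # positive
print(classify_site("FEL", -246.682, 0.004))  # negative
# A (dS, dN) pair from supplementary material 3: dS = 0.01, dN = 0.08.
print(side_of_neutrality_line(0.01, 0.08))    # above
# Hypothetical standard error of (dN - dS) = 0.035, used only for illustration.
print(z_test(0.08, 0.01, 0.035))              # positive
```

With these rules, codon 13 in the first row group is significant only under REL, whereas codon 20 in the third row group is significantly negative under all three methods. This pattern is consistent with the thresholds stated in the caption.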
[Figure panels: dN (y-axis) versus dS (x-axis), both from 0 to 0.25, for each combination of sector (Exon 2 - Region 1, Exon 2 - Region 2, Exon 2) and position set (All positions, No PBR, Ext 5 Å No PBR, PBR, Ext 5 Å PBR); points correspond to Total and groups G1 to G5.]

Chapter 2. Characterising a Microsatellite for DRB Typing in Aotus vociferans and Aotus nancymaae

López C, Suárez CF, Cadavid LF, Patarroyo ME, Patarroyo MA. Characterising a microsatellite for DRB typing in Aotus vociferans and Aotus nancymaae (Platyrrhini). PLoS One. 2014;9(5):e96973.

The published version of the article can be consulted at: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0096973

Characterising a Microsatellite for DRB Typing in Aotus vociferans and Aotus nancymaae (Platyrrhini)

Carolina López1,2,3, Carlos F. Suárez1,2, Luis F. Cadavid4, Manuel E. Patarroyo5, Manuel A.
Patarroyo1,2*

1 Molecular Biology and Immunology Department, Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá, Cundinamarca, Colombia, 2 School of Medicine and Health Sciences, Universidad del Rosario, Bogotá, Cundinamarca, Colombia, 3 MSc Microbiology Programme, Instituto de Biotecnología (IBUN), Universidad Nacional de Colombia, Bogotá, Cundinamarca, Colombia, 4 Genetics Institute, Universidad Nacional de Colombia, Bogotá, Cundinamarca, Colombia, 5 School of Medicine, Universidad Nacional de Colombia, Bogotá, Cundinamarca, Colombia

Abstract

Non-human primates belonging to the Aotus genus have been shown to be excellent experimental models for evaluating drugs and vaccine candidates against malaria and other human diseases. The immune system of this animal model must be characterised to assess whether the results obtained with it can be extrapolated to humans. Class I and II major histocompatibility complex (MHC) proteins are amongst the most important molecules involved in the response to pathogens. In spite of this, the techniques available for genotyping these molecules are usually expensive and/or time-consuming. Previous studies have reported MHC-DRB class II gene typing by microsatellite in Old World primates and humans. These studies showed that this technique provides a fast, reliable and effective alternative to the commonly used ones. Based on this information, a microsatellite present in MHC-DRB intron 2 and its evolutionary patterns were identified in two Aotus species (A. vociferans and A. nancymaae), and its potential for genotyping class II MHC-DRB in these primates was assessed.

Citation: López C, Suárez CF, Cadavid LF, Patarroyo ME, Patarroyo MA (2014) Characterising a Microsatellite for DRB Typing in Aotus vociferans and Aotus nancymaae (Platyrrhini). PLoS ONE 9(5): e96973.
doi:10.1371/journal.pone.0096973

Editor: Roscoe Stanyon, University of Florence, Italy

Received October 17, 2013; Accepted April 14, 2014; Published May 12, 2014

Copyright: © 2014 López et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This research was supported by the “Departamento Administrativo de Ciencia, Tecnología e Innovación (COLCIENCIAS)”, contract RC#0309-2013. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: mapatarr.fidic@gmail.com

Carolina López and Carlos F. Suárez contributed equally to this work.

Introduction

Using non-human primates in the field of biomedical research is useful for validating methodologies for diagnosing and treating diseases affecting human beings [1,2]. Monkeys from the Aotus genus are used for studying the main types of human malaria (Plasmodium falciparum and Plasmodium vivax). They are suitable models because of their susceptibility to the infection, which facilitates the evaluation of vaccines and drugs for treating and controlling this disease. These primates have also been used for studying leishmaniasis, schistosomiasis, hepatitis, tuberculosis and various types of enteric infection [3–9].

Previous studies have shown that this animal model is similar to humans regarding immune system molecules, particularly MHC class II and especially those corresponding to human HLA-DR. Such similarity enables evaluating the immune response to different pathogens and assessing candidate vaccine molecules aimed at controlling diseases of importance for human health [10–12].

MHC-DRB molecules in humans and other primates show a high degree of polymorphism and allele diversity. They also play an important role in binding peptides so that these can be presented to the T-lymphocyte receptor. Together, these features make their typing relevant for evaluating the immune response to malaria and to vaccines designed for controlling it [13]. MHC-DR variability is mainly concentrated in MHC-DRB exon 2 and to a lesser extent in MHC-DRA exon 2 [14], both regions encoding the peptide binding region (PBR). This sector mainly defines the alleles observed in vertebrates. It is subject to diversifying selection and recombination, which shape its variability [15–17]. Twelve allele lineages have been characterised for Aotus MHC class II DRB, having considerable similarity with human HLA-DRB lineages [12,18,19].

Precise typing of MHC genes requires laborious and costly techniques for three reasons: their complex genomic organisation (usually into different haplotypes), their individual variability (expressing different genes) and their population variability (polymorphism) [13]. Current techniques include the following:
- restriction fragment length polymorphism (RFLP)
- single strand conformation polymorphism (SSCP)
- denaturing gradient gel electrophoresis (DGGE)
- reference strand-mediated conformational analysis (RSCA)
- amplifying, cloning and sequencing fragments of interest, especially exon 2

The latter is the most precise approach, but it has disadvantages such as its high cost and the longer time needed to obtain results. The other approaches give results with variable agreement with the data obtained by sequencing [20–22].

In addition to the above, a microsatellite located at the start of intron 2 in humans, macaques and chimpanzees has been used for typing MHC-DRB [23,24]. Short tandem repeat (STR) polymorphism has been shown to be well-correlated with the diversity shown by exon 2. The microsatellite basically consists of (GT)x (GA)y dinucleotide repeats, showing different degrees of complexity according to the species being analysed [23].

PLOS ONE | www.plosone.org 1 May 2014 | Volume 9 | Issue 5 | e96973

Aotus Intron 2 MHC-DRB STR

Regarding HLA-DRB, the STR has been called D6S2878. It is present in all HLA-DRB genes/pseudogenes except HLA-DRB2, HLA-DRB8 and HLA-DRB9, where the first part of intron 2 is lost. It is highly polymorphic in composition and length and can specifically differentiate between HLA-DRB gene alleles [25]. This sector also exhibits polymorphism in Macaca mulatta, with high variability in length and sequence, thus allowing the characterisation of different MHC-DRB alleles in this primate [24]. The ancestral structure of the DRB-STR microsatellite in Old World monkeys (OWM) contains a simple nucleotide repeat, whilst the HLA- and Mamu-DRB-associated microsatellite structure is more complex [25]. In humans and macaques, this microsatellite's variability pattern is correlated with exon 2 polymorphism, which makes it an attractive option for typing these genes [25,26]. It was therefore of interest to verify whether the same occurs in New World monkeys (NWM). The MHC-DRB intron 2 in Platyrrhini is very variable in length, ranging from 50 bp to 1 kbp [27]. It includes a simple repeat sequence of around 50 bp downstream of the boundary between exon 2 and intron 2 [28,29].

The microsatellite present at the start of intron 2 of MHC-DRB genes in individuals from the A. vociferans and A. nancymaae species has thus been verified and characterised here. This is the first systematic characterisation of this marker in NWM, and it indicates the feasibility of its use for typing MHC-DRB in these primates.

Materials and Methods

Sample origin

Monkeys from the Aotus nancymaae (25 adults) and Aotus vociferans (23 adults) species were studied; they came from FIDIC's primate station in Leticia, Amazonas, Colombia. Blood samples from A. vociferans were collected fresh, whilst those from A. nancymaae had been collected in 2001. All primates were kept in conditions laid down by the Colombian Ministry of Health (law 84/1989) and the Colombian Institute of Health regulations for animal care. These conditions were monitored weekly by CORPOAMAZONIA (resolutions 0202/1999 and 0028/2010). All procedures were approved and supervised by the Health Research Ethics Committee and FIDIC's Primate Station Ethics Committee. The US Committee on the Care and Use of Laboratory Animals' guidelines were followed for all animal handling procedures, in turn complying with Colombian regulations for biomedical research (resolution 8430/1993 and law 84/1989).

Monkeys at the station were numbered, sexed, weighed, given a physical-clinical exam and kept temporarily in individual cages prior to all experimental procedures. They were kept in controlled conditions of temperature (25–30 °C) and relative humidity (83%), similar to those present in their natural environment. The monkeys' diet was based on a supply of fruit typical of the Amazon region (i.e. such primates' natural diet), vegetables and a nutritional supplement including vitamins, minerals and proteins. Environmental enrichment included visual barriers to avoid social conflict, feeding devices, some branches and vegetation, perches and habitat. Any procedure requiring animal handling was undertaken by trained veterinary personnel, and animals were submitted to sedation and analgesia procedures to reduce stress when necessary [30].

Molecular characterisation of species of owl monkeys

Mitochondrial gene cytochrome c oxidase subunit II (mtCOII) sequences were used for determining the species to which the owl monkeys being studied belonged, following the methodology described by Ashley & Vaughn [31]. The gene was amplified by PCR using high fidelity Taq DNA polymerase. Two independent PCR reactions were performed, and the amplified products were purified using a Wizard SV Gel and PCR Clean-Up System kit (Promega, Madison, WI, USA). The purified products were sent for sequencing with mtCOII-specific primers using the BigDye Terminator method (MACROGEN, Seoul, South Korea). The resulting sequences were used to construct phylogenetic trees and were then compared with previously described primate mtCOII sequences from databases.

DNA, RNA extraction and cDNA synthesis

For A. vociferans, genomic DNA (gDNA) from each specimen was isolated from 300 μL peripheral blood samples using an UltraClean Blood DNA Isolation kit (Carlsbad, CA, USA), following the manufacturer's instructions. Total RNA was isolated from 2 mL peripheral blood in EDTA diluted 1:1 with PBS. Mononuclear cells were isolated on a Ficoll-Hypaque density gradient (Lymphocyte Separation Medium, ICN Biomedicals, CA, USA), according to the manufacturer's recommendations. The recovered lymphocytes were immediately homogenised with TRIzol reagent (Life Technologies, NY, USA). cDNA was synthesised with a SuperScript III First-Strand Synthesis System for RT-PCR kit (Life Technologies, NY, USA), using Oligo(dT)20 (Invitrogen, NY, USA) as primer, according to the manufacturer's instructions.

For A. nancymaae, genomic DNA was isolated from leucocytes using a NucleoSpin C+T kit (Macherey-Nagel AG, Oensingen, Switzerland), according to the manufacturer's protocol. Total RNA was isolated from PBMC using a NucleoSpin RNA kit (Macherey-Nagel AG, Oensingen, Switzerland), according to the manufacturer's recommendations. Reverse transcription was performed using SuperScript and Oligo(dT)12–18 primer (Gibco BRL Life Technologies, Basel, Switzerland). Both gDNA and cDNA were preserved in 95% ethanol at −80 °C until use. DNA integrity was verified by electrophoresis on a 1% agarose gel stained with SYBR Safe (Invitrogen) for visualisation under UV light. A NanoDrop 2000 (Thermo Scientific) was used for determining the concentration.

Amplifying, cloning and sequencing

The primers used here were designed by aligning available genome sequences for the Callithrix jacchus, Homo sapiens and Macaca mulatta MHC-DRB region (Table S1 in File S1), using Netprimer software [32] for optimising parameters. Two sets of primers were used for amplifying exon 2+ intron 2 sequences:
- First set: direct primer GEX2DRBf (5′-GGTCAAGGTTCCCAGAGC-3′) at the end of intron 1, and reverse primer GEX2DRBr (5′-CTCCAAGGATAAGAAGAAGCC-3′) located about 100 bp downstream of the end of the microsatellite.
- Second set: direct primer F-DRBINT1-2 (5′-TTCGTGTCCCCACAGCAC-3′) at the end of intron 1, and reverse primer R-DRBINT2-2 (5′-TAAACCCTCACCCCAGCC-3′) situated about 160 bp downstream of the end of the microsatellite (Figure 1).

For amplification from cDNA, the direct primer DRBExon1PF (5′-CACTGGCTTTGGCTGGGGAC-3′) in exon 1 was used with one of two reverse primers in exon 6: DRBExon6PR1 (5′-CCACAAGGGAGGACATTTCTGC-3′) or DRBExon6PR2 (5′-CCAAGGGCAGAAGCTGAGGAA-3′).

Two independent PCR reactions were carried out for each studied primate, following the recommendations of Lenz et al. [33] for avoiding chimera formation. The KAPA HiFi HotStart Readymix enzyme (Kapa Biosystems, Woburn, MA, USA) was used with 0.3 μM each primer and either 10–40 ng DNA (for gDNA) or 2 μL recently synthesised cDNA, in a 25 μL final volume. The PCR reaction at saturation was carried out in a PerkinElmer GeneAmp 9600 thermocycler. The following thermal profiles were used:
- cDNA: 95 °C for 5 min; 35 cycles of 98 °C for 20 s, 66 °C/67 °C (when using the first or the second reverse primer, respectively) for 15 s and 72 °C for 30 s; and a final 5 min extension step at 72 °C.
- gDNA: 95 °C for 5 min; 35 cycles of 98 °C for 20 s, 57 °C/66 °C (for primer set 1 or 2, respectively) for 15 s and 72 °C for 30 s; and a final extension step at 72 °C for 5 min.

Amplified products were purified using a Wizard SV Gel and PCR Clean-Up System kit (Promega, USA). An A-tailing protocol with GoTaq Flexi DNA polymerase (Promega) was then used so that the products could be ligated into the pGEM-T Easy Vector System (Promega, Madison, WI, USA), following the manufacturer's recommendations. The transformation was carried out in Escherichia coli JM109 strain competent cells. The clones were selected by positive selection with ampicillin and lacZ α-complementation. Plasmid DNA was extracted using an UltraClean 6 Minute Mini Plasmid Prep kit (MO BIO, USA).

The primer pairs used for amplifying the exon 2+ intron 2 STR sector also amplified other targets. A primer was therefore designed at the end of exon 2 (PRExon2) (5′-TCGCCGCTGCACTGTGAAG-3′) to enable confirmatory colony PCR, with the primers used for amplifying gDNA as direct primers (Figure 1). The reaction contained 1 μL enzyme buffer, 0.6 μL MgCl2 [25 mM], 1.6 μL dNTPs [1.25 mM], 0.8 μL of each primer [5 μM], 0.12 μL GoTaq Flexi DNA polymerase (Promega) and 10–40 ng colony DNA, in a 10 μL final volume. PCR conditions were one cycle at 95 °C for 5 min; 35 cycles of 95 °C for 1 min, 60 °C for 1 min and 72 °C for 1 min; and a final extension step at 72 °C for 5 min. At least 8 clones (confirmed from each amplification) were selected for sequencing. Their DNA was sequenced in both directions with T7 and SP6 primers, using the BigDye Terminator method (MACROGEN, Seoul, South Korea).

Figure 1. Diagram of the MHC-DRB region studied. The primers used for amplifying exon 2+ intron 2 (partial) from gDNA are shown as arrows (purple and green); the PRExon2 primer was designed for confirmatory colony PCR (pink arrow). The MHC-DRB amplified sector (exon 2, intron 2 alignable sectors (A and B) and STR) was partitioned for sequence analysis (positions and sites).
doi:10.1371/journal.pone.0096973.g001

Sequence analysis

The MHC-DRB sequence electropherograms were assembled using CLC Main Workbench software v.5 (CLC bio, Cambridge, MA, USA). To be considered valid, a sequence had to meet one of the following requirements:
- it was found in at least two independent PCRs from the same individual, or
- it came from two different individuals (previously reported sequences were included in this category).

The alleles found were validated and named by a curator from the Immuno Polymorphism Database (IPD) [34,35].

Clustal X software (v2.1) was used for aligning all the MHC-DRB exon 2 and exon 2+ intron 2 sequences [36], with BioEdit Sequence Alignment Editor software for manual editing [37]. MEGA software (v5.2) was used for selecting the best nucleotide substitution model using the Bayesian Information Criterion (BIC). Phylogenetic trees were constructed using minimum evolution, neighbour joining, parsimony and maximum likelihood methods. The bootstrap test was used for supporting the trees so obtained. In addition, the interior branch test was used for supporting trees constructed using the minimum evolution and neighbour joining methods. A total of 1,000 replicates were carried out. Groups with ≥70% bootstrap support and ≥95% interior branch test support were considered supported groups [38,39].

Microsatellite analysis

Microsatellite search and building database (MSDB) software [40] was used for identifying the microsatellite in imperfect search mode. Valid repeats were those with 12 or more units for mononucleotide repeats, and 4 or more units for di-, tri-, tetra-, penta- and hexanucleotide repeats. Their descriptors were constructed using previous results and manual editing as guidelines. An unambiguous alignment of the repeat sectors was difficult to obtain when they were analysed exclusively, so a compressibility method was used instead. Each sequence was organised as 100 tandem repeats and compressed into a separate file using an adaptive Lempel–Ziv algorithm (the Linux command compress). The compressed size in bytes of each sequence formed a vector, from which a distance matrix was calculated with the Euclidean, maximum or Manhattan metric using the dist function of R [41]. Hierarchical clusters were constructed with the R hclust function [41], using single and complete linkage.

Results

Amplicons ranging from ~700 bp to ~1,000 bp were obtained for A. vociferans and A. nancymaae samples (Figure 2), and 289 sequences were obtained from exon 2+ intron 2 STR. One to five different MHC-DRB sequences per animal were observed from two independent PCR reactions, implying duplication of this locus, as reported previously [12]. A total of 34 distinct nucleotide sequences were validated, 28 of which were also isolated from cDNA:
- two new sequences belonging to two new A. nancymaae lineages (Aona-DRB*W9101 and Aona-DRB*W8901);
- 7 new sequences belonging to five new A. vociferans lineages (Aovo-DRB*W9101, Aovo-DRB*W9102, Aovo-DRB*W9201, Aovo-DRB*W9202, Aovo-DRB*W9301, Aovo-DRB*W8801, Aovo-DRB*W9001);
- 11 new sequences from previously reported A. vociferans lineages (Aovo-DRB1*0304, Aovo-DRB1*0305, Aovo-DRB1*0306, Aovo-DRB1*0307, Aovo-DRB3*0601, Aovo-DRB*W1801, Aovo-DRB*W1802, Aovo-DRB*W1803, Aovo-DRB*W2901, Aovo-DRB*W3001, Aovo-DRB*W4501);
- 6 new sequences from previously reported A. nancymaae lineages (Aona-DRB1*031701, Aona-DRB1*0329, Aona-DRB3*062502, Aona-DRB3*0628, Aona-DRB*W1808, Aona-DRB*W3002);
- 8 already reported sequences for A. nancymaae lineages (Aona-DRB1*0328, Aona-DRB3*0615, Aona-DRB3*062501, Aona-DRB3*0626, Aona-DRB3*0627, Aona-DRB*W1806, Aona-DRB*W2908, Aona-DRB*W2910).

These sequences are listed in Table S1 in File S1.

The MHC-DRB amplified sector was divided into the following partitions for sequence analysis:
- intron 1 (positions 1–15: 15 sites)
- exon 2 (positions 16–285: 270 sites)
- intron 2A (alignable; positions 286–325: 40 sites)
- intron 2R (STR sector; positions 326–1,110: 785 sites)
- intron 2B (alignable; positions 1,111–1,378: 268 sites)

The partitions are shown in Figure 1. These ranges refer to the alignment of the sequences given in Figure S1 (within File S1).

The alignable areas were more conserved in intron 2 (A+B, 95 ± 1% identity) than in exon 2 (91 ± 1% identity). An unambiguous alignment could not be made for the intron 2 STR, which varied substantially in size, from 83 bp (Aovo-DRB*W9301) to 761 bp (Aona-DRB1*0329GA).

Exon 2 of the sequences reported here was analysed together with 57 representative sequences of Aotus MHC-DRB allele lineages reported in previous studies by Suárez et al. and Niño et al. [12,18] and with others available in GenBank. The evolutionary analysis methods described in the methodology were applied to an alignment of 268 positions. Figure 3 shows the tree obtained with the maximum likelihood method using a GTR+G+I model.

The alleles observed came from some lineages previously reported by Suárez et al. [12], and the analysis also highlighted the existence of seven new lineages. Most lineages were supported by all the phylogenetic reconstruction and support methods (those supported by only some of them are indicated by circles at the node); however, the relationships between these lineages had low support (Figure 3). Based on the sequences studied here, most observed lineages were trans-specific. The exceptions were species-specific lineages: DRB1*03GB and DRB*W89 for A. nancymaae, and DRB*W88, DRB*W92, DRB*W90, DRB*W45 and DRB*W93 for A. vociferans.

Molecular phylogenetic analysis of the 34 sequences reported here was performed separately on exon 2 and on the concatenated intron 2 alignable sectors (2A+2B), using the evolutionary analysis methods described previously. Figure 4A shows the tree obtained from the exon 2 alignment (271 positions) with the maximum likelihood method, using an HKY+G+I model. Figure 4B shows the tree obtained from the intron 2 alignable sectors (344 positions) using the maximum likelihood method and an HKY+G+I model.

Most groups retained their identity in the intron 2 alignable sectors compared with exon 2. However, some groups became fused (i.e. DRB3*06 - DRB1*03GA, DRB*W45 - DRB*W92 and DRB*W89 - DRB*W29), changing their relationships in each partition. Lineage differentiation was nonetheless well supported. Some associations between lineages (e.g. DRB3*06 - DRB1*03GA, DRB*W30 - DRB*W92) were very clear and were maintained across the data sets and methods analysed.

Compressibility was used for estimating similarity between sequences, because the repetitive nature of the intron 2 repeat sector prevented an unequivocal alignment. Files were compressed with the Lempel–Ziv algorithm using the standard Linux command compress. Each sequence was repeated 100 times in tandem to improve resolution, giving files of 734–7,249 bytes after compression (Figure 4C). Equivalent results were obtained using different metrics and clustering methods. Figure 4C shows the results using the Manhattan metric and the complete linkage agglomeration method. The STR grouping pattern is intermediate between that of exon 2 and that generated from intron 2 A+B sectors.

The DRB3*06 and DRB1*03GA lineages were associated in all the sectors analysed. In the intron 2 A+B sectors and in the STR, the DRB1*03GB lineage sequence was also included in this grouping. Individual lineage definition was lost in the STR. The Aona-DRB1*0329GA, Aona-DRB1*031701GA and Aona-DRB1*0328GB sequences differed in STR length but remained in a common cluster with the remaining DRB3*06 and DRB1*03 sequences. The DRB*W88, DRB*W29, DRB*W30, DRB*W92, DRB*W91 and DRB*W90 lineages were associated in both exon 2 and the STR. The difference was that the DRB*W89 and DRB*W45 lineages were inserted in the latter analysis, grouping with the DRB*W29 and DRB*W30/*W91 lineages, respectively, in the STR and intron 2 A+B sectors. In exon 2, DRB*W89 and DRB*W45 were grouped with the DRB1*03GA - DRB3*06 - DRB*W18 group. The DRB*W30 and DRB*W92 lineages formed a cluster with the DRB1*03GA and DRB3*06 group in the intron 2 A+B sectors. The DRB*W18 lineage was always well characterised, forming a cluster in the STR and exon 2 that included the DRB1*03GA - DRB3*06 - DRB1*03GB lineages. The DRB*W92/*W91/*W45 lineages were also included in this group in the intron 2 A+B sectors.

Figure 2. A. nancymaae and A. vociferans exon 2+ intron 2 partial amplicons. Amplicons ranging from ~700 bp to ~1,000 bp were obtained from A. vociferans and A. nancymaae samples. A. Lanes 1–10 show A. nancymaae amplicons. B. Lanes 1–10 show A. vociferans amplicons; lane 11, negative control. MW, molecular weight.
doi:10.1371/journal.pone.0096973.g002

Figure 3. Maximum likelihood tree constructed from Aotus MHC-DRB exon 2 sequences (91 OTUs, 268 aligned positions). The analysis used the general time reversible model with invariable positions and Gamma distribution (5 categories, +G, parameter = 0.3371). Bootstrap values >70% are displayed. Green dots represent nodes supported by parsimony (>70% bootstrap), neighbour joining and minimum evolution (>70% bootstrap and >95% interior branch test), but not in the maximum likelihood analysis. Nodes represented by blue dots were supported only by parsimony (>70% bootstrap), but not in the maximum likelihood analysis. Bootstrap and interior branch tests used 1,000 replicates. The scale bar represents substitutions per site. New sequences reported in this study are shown in bold. Abbreviations and GenBank accession numbers for the sequences compared here are shown in Table S1 (within File S1).
doi:10.1371/journal.pone.0096973.g003

The DRB*W93 lineage appeared in all analyses as a divergent member of the cluster formed by DRB3*06 - DRB1*03GA - DRB1*03GB - DRB*W18. It was related to the DRB*W45 lineage in exon 2 but lost this relationship in intron 2. This lineage had a pattern similar to that of DRB*W89, whose grouping was very different between exon 2 and intron 2.

MSDB software [40] was used for characterising the motifs of the amplified sequences (exon 2+ intron 2 (partial)) (Table S2 in File S1). The different types of microsatellite agreed with the results found by the compression method.

Figure 4. Comparison amongst exon 2, alignable sectors of intron 2 and intron 2 STR. A. Maximum likelihood tree constructed from Aotus MHC-DRB exon 2 sequences (34 OTUs, 271 aligned positions). The analysis used the Hasegawa-Kishino-Yano model with invariable positions and Gamma distribution (5 categories, +G, parameter = 0.2659, +I, 51.7393% sites). B. Maximum likelihood tree constructed from Aotus MHC-DRB intron 2 (A+B) sequences (34 OTUs, 344 aligned positions). The analysis used the Hasegawa-Kishino-Yano model with invariable positions and Gamma distribution (5 categories, +G, parameter = 0.2378, +I, 0.0% sites). C. Complete linkage tree constructed from Aotus MHC-DRB intron 2 STR sequences, using a Manhattan distance over Lempel–Ziv compression. Compression in bytes (B) and length in nucleotides (L) are also shown. Nodes indicated by red dots were supported by all methods. Nodes shown by green dots were supported by parsimony (>70% bootstrap), neighbour joining and minimum evolution (>70% bootstrap and >95% interior branch test), but not in the maximum likelihood analysis. Nodes represented by blue dots were supported only by parsimony (>70% bootstrap), but not in the maximum likelihood analysis. Bootstrap and interior branch tests were performed using 1,000 replicates. The scale bar represents substitutions per site (A and B) and bytes (C). Abbreviations and GenBank accession numbers of the analysed sequences are shown in Table S1 (within File S1).
doi:10.1371/journal.pone.0096973.g004

The microsatellite was characteristic of some lineages, which were clearly differentiated by length and structure. These formed 3 groups that included the 34 sequences described for the Aotus species in this study. The STR could be divided into 3 sectors (Table S2 in File S1). The initial and final sectors were similar in all sequences, whereas greater variability (both within and between lineages) was observed in the central region. (GA)y was the main repeat motif found in all cases.

The STR had a similar structure throughout the repeat sector of the DRB1*03 and DRB3*06 lineage sequences, but the number of repeats differed. In A. nancymaae and A. vociferans DRB3*06 lineage sequences, the microsatellite ranged from 294 to 354 bp in length, with a very similar structure in the initial and final parts. There were slight differences in the repeats towards the central part, and some STRs were even identical, such as Aona-DRB3*062501/*0615. The DRB1*03 lineage sequences did not have a specific STR pattern, and their length varied from 274 to 462 bp. However, two defined groups with similar structure and length were identified: one for the Aovo-DRB1*0304, 1*0307 and 1*0306GA sequences, and another for the Aona-DRB1*0328GB, Aona-DRB1*031701 and 1*0329GA sequences. The Aona-DRB1*0329GA and *031701GA sequences had a very similar distribution, with minimal differences in length at the start of the STR. Aovo-DRB1*0305GA had an STR with a particular structure, while maintaining similarity with its lineage. The repeat sectors of Aona-DRB3*0627 and Aona-DRB3*0628 had a distribution of repeats and a length similar to those of the DRB1*03 lineage sequences.

In DRB*W18 lineage sequences, the STR ranged from 144 to 160 bp, with a similar composition and number of repeats at the beginning and end of the STR. Each sequence varied specifically in the central part, in both nucleotide sequence and number of repeats. The Aovo-DRB*W9301 sequence had a 66 bp STR, the smallest of all the sequences. Its initial and final parts had a structure similar to that described in other lineages, and its central region was relatively short (26 bp).

The microsatellite had a similar structure at the start and end in the DRB*W89/*W29/*W88/*W90/*W91/*W45/*W30 and *W92 lineages, with a length ranging from 68 to 156 bp. Various sequences in this group had practically identical STRs, such as Aovo-DRB*W9201 and Aovo-DRB*W3001 (only one repeat being different). Others had identical STRs, such as Aona-DRB*W8901, Aona-DRB*W2910 and Aona-DRB*W2908. Within this group, the Aovo-DRB*W2901 sequence had a very similar STR organisation, with slight differences in structure and number of repeats; although it belongs to the same lineage (W29), it came from a different species. The Aovo-DRB*W8801 sequence was similar to the DRB*W29 lineage, but differed in the number of repeats in the central region. The Aovo-DRB*W9102 and Aona-DRB*W9101 sequences in the DRB*W91 lineage had similar microsatellite structure, with few differences in the number of repeats in the central region.

Regarding primates, 34 sequences from the MHC-DRB gene's exon 2+ intron 2 (partial) were analysed in A. nancymaae and A. vociferans. Sequences related to the sector being studied were selected from previous typing reports [14,27] and from a search of available complete or ongoing primate genomes using the BLAST algorithm [42]. This led to 86 primate sequences being included, including representatives for distinct human lineages (Table S1 in File S1). Clustal X v2.1 software was used for aligning the

however, the presence of a central motif (GA)y was constant, being very idiosyncratic for each allelic lineage analysed. The sequences obtained from the C. jacchus genome were illustrative in this respect: whilst Caja-DRB*04 had only 3 base pairs in the repeat sector, that of Caja-DRB*05 was 849 bp.

The selected sequences were subjected to two molecular phylogenetic analyses; one used just exon 2 and the other used the intron 2 alignable sectors (A+B). Figure 5A shows the maximum likelihood analysis for exon 2. Several Catarrhini and Platyrrhini sequences were associated, presenting a mixture of alleles from both types of primate in several groups. For Catarrhini, some groups were formed by a mixture of species belonging to different genera and families. This did not happen for NWM: the Callitrichidae maintained their identity in well-supported nodes, whilst the Cebus sequence was associated with one of the groups formed by Aotus sequences.

Figure 5B shows the maximum likelihood analysis for the intron 2 alignable sectors (2A+2B), with a clear division between Platyrrhini and Catarrhini sequences. Most Catarrhini groups were well differentiated, mainly containing either Hominoidea (Homo, Pan, Gorilla) or Cercopithecoidea (Macaca, Chlorocebus) exclusively; few groups contained both. A genus-specific disposition predominated in Platyrrhini. The Aotus sequences were configured into three groups, whilst Callitrichidae formed multiple genus-specific clusters. The result for this sector was similar to that observed for exon 2+ intron 2 (A+B) (not shown).

Discussion

Analysis of Aotus MHC-DRB exon 2 sequences showed that the number of trans-specific lineages for the genus increased, and that these lineages became better defined, with improved A. vociferans sampling. All Aotus lineages were sampled in the present study except DRB*W41, DRB*W43, DRB*W44, DRB*W38, DRB*W42, DRB*W47, DRB*W13 and DRB1*03GC (Figure 3).

Two sub-lineages could be defined in lineages such as DRB*W18 (having no report of alleles for A. vociferans). One belonged to species typically found north of the Amazon region (A. vociferans and A. trivirgatus), and the other to species typically found south of it (A. nancymaae and A. nigriceps). A similar, although less marked, tendency was observed for the DRB1*03GA lineage, where a well-supported sub-lineage was exclusively grey-neck (there were also exclusively red-neck sub-lineages). An A. vociferans sequence (Aovo-DRB3*0601) was reported for the DRB3*06 lineage (apparently exclusive to red-neck monkeys), and it was identical to an A. nancymaae sequence (Aona-DRB3*062501). This was also true for the DRB*W45 and DRB*W30 lineages, where A. vociferans sequences were described (Figure 3). Apparently exclusive lineages exist, such as the DRB1*03GB lineage, which has just A.
nancymaae sequences; sequences [36]; these were then edited manually (especially in the however, differing degrees of trans-specificity were observed in the repeat sector). The MHC-DRB sector was divided into the rest of the lineages, even though there could be specific sub- partitions shown in Figure 1 for their analysis. lineages. A satisfactory alignment could not be made for the intron 2 There were differences regarding frequencies but not regarding repeat area (which is why it has not been considered in the the repertoires of the two Aotus species studied here, indicating that phylogenetic analysis); however, the alignable sectors from intron 2 each had undergone diversification; however, they maintained (A and B) had a notable degree of identity (9060.8% for all notable identity between their MHC-DRB repertoires over a primates), this being 94.160.7% for NWM and 90.260.7% for relatively long period of time (from 13–8 mya) [43]. Such trans- OWM. Such degree of conservation was even greater than that specific polymorphism in repertoires suggests that using both observed for exon 2, whose average identity for the primates species as animal models could be equivalent for MHC-DRB- studied here was 87.360.1% (similar values being obtained for mediated processes [44]. both OWM and NWM). The intron 2 repeat region had notable Comparative analysis of Aotus DRB genes’ exon 2 phylogenies variation regarding length between the primates analysed here; (Figure 4A) and intron 2 alignable sectors (Figure 4B) showed that PLOS ONE | www.plosone.org 7 May 2014 | Volume 9 | Issue 5 | e96973 Aotus Intron 2 MHC-DRB STR PLOS ONE | www.plosone.org 8 May 2014 | Volume 9 | Issue 5 | e96973 Aotus Intron 2 MHC-DRB STR Figure 5. Maximum likelihood trees. A. Maximum likelihood tree constructed from Aotus MHC-DRB exon 2 sequences (120 OTUs, 271 aligned positions). 
The analysis involved using Kimura’s 2 parameter model with invariable positions and Gamma distribution (5 categories, + G, parameter = 0.5550). B Maximum likelihood tree constructed from Aotus MHC-DRB alignable sectors of intron 2 (132 OTUs, 359 aligned positions). The analysis involved using the general time reversible model with invariable positions and Gamma distribution (5 categories, + G, parameter = 1.2072). .70% bootstrap values are displayed. The bootstrap test involved using 1,000 replicates. The scale bar represents substitutions per site. Abbreviations and GenBank accession numbers for the sequences compared here are shown in Table S1 (within File S1). doi:10.1371/journal.pone.0096973.g005 some of the lineages clearly maintained their identity, whilst others from exon 2 and those observed for the STR was not always became merged. The relationship between lineages also changed consistent, just as in previous reports concerning OWM published from one sector to another, groups of well-supported lineages by Bontrop et al., [23,24,49]. becoming formed in analysis of intron 2 (this did not happen in The ancestral structure of the microsatellite in Catarrhini has exon 2). The degree of intron 2 A+B sector conservation was evolved from dinucleotide repeats (GT)x (GA)y. Current structure notable compared to exon 2, thereby highlighting the magnitude of the HLA- and Mamu-DRB-associated microsatellite was seen to of the latter’s selection process. be more complex (Figure 6). The repeat in the 59 extreme was the Differential grouping showed that distinct forces have modu- longest, uninterrupted part; the second part (GA)z was short and lated each DRB gene sector’s evolution, thereby posing the interrupted by other dinucleotides, being able to correlate well question, ‘‘Which one reflects more accurately the origin of DRB with different DRB gene lineages. 
The length of the third segment genes en Aotus?’’ If the intron 2 alignable sectors were to be chosen (GA)y could also be correlated with some DRB gene lineages in M. (given that they apparently have not undergone the previously mulatta. The 39 extreme consisted of a short (GC)n repeat part. It is described phenomena generating diversity in exon 2), then one known that mutation tendency depends on repeat length, since would have a scenario where the number of lineages would be less there is less microsatellite stability in the longer dinucleotide than that proposed based on exon 2 polymorphism, and the repeats than in the shorter ones [23,28,47]. relationships between them would have been different. Positive The (GA)y dinucleotide in Aotus was maintained in STR selection and recombination would thus have generated variability structure and the (GT)x repeat was not present. Initial and final which would have grouped (by convergence and/or recombina- extreme repeat length in the microsatellites was similar between tion) the sequences in previously described lineages. If exon 2 were lineages, whilst repeat composition and number in the middle part to be chosen, the scenario would be marked by intron 2 could have been associated with specific lineages, sequences or recombination which would lead to the different groups’ groups; this could have been explained by the inherent differences homogenisation in fewer lineages. in mutation rate between the different parts of the microsatellite. Recombination substantially affects support for trees [45,46], The A. nancymaae and A. vociferans MHC-DRB microsatellite was thereby making the first scenario more probable, given that the present in all the DRB genes studied here, having considerable tree for the intron 2 alignable sectors was better supported than differences regarding length and variability, enabling it to that for exon 2. 
However, complete DRB gene sequences differentiate some lineages, and even DRB sequences, thereby (including coding and non-coding sectors) are needed to clarify agreeing with exon 2 diversity. STR variability in other primate this point. species was not always consistent with a given lineage; however, STR in Aotus mainly had (GA)y repeats interrupted by CT others could be characterised by a unique pattern [23,26]. motifs and a similar structure between sequences at the 59 and 39 Analysis of the repeat region of 5 sequences from another extremes belonging to the same group according to phylogeny for Platyrrhini genus, Callithrix jacchus (Caja-DRB*01/*02/*03/*05/ the intron 2 alignable sectors (Figure S1 and Table S2 in File S1, *06), revealed the same organisational pattern described for Aotus, Figure 4B). The (GA)y repeats form part of the ancestral structure having a (GA)y repeat in the central sector which was complex, described for Catarrhini [29,47,48]. interrupted by CT motifs, highly variable in length and number of The Aotus MHC-DRB microsatellite is variable in length, as has repeats; it came within the same ranges observed for Aotus, having been described for humans, macaques and chimpanzees. Exon 2 130-554 bp repeats. The initial and final parts of the Caja-DRB analysis led to observing that the microsatellite for the DRB3*06 STR had similar length and sequence, the initial part being similar lineage (the Aovo-DRB3*0601, Aona-DRB3*062502, Aona- to that for Aotus, but having a more complex final part (Table S2 in DRB3*0626, Aona-DRB3*0628 and Aona-DRB3*0627 sequence File S1). group) could differentiate them due to their variable length, except Using techniques which did not require sequence alignment for for the Aona-DRB3*062501 and Aona-DRB3*0615 sequences comparing them was useful in cases where this was impractical (i.e. which had identical length and sequence, meaning that sequencing analysis of complete genomes). 
As compression gives a basic methods were needed for identifying these alleles. measurement of a sequence of characters’ algorithmic complexity, The microsatellite had highly variable length in the it could be especially useful when dealing with biological DRB1*03GA, DRB*W18, DRB*W91, DRB*W93, DRB*W88, sequences. Using Lempel-Ziv complexity as a tool for data-mining DRB*W90, DRB*W91, DRB*W45 and DRB*W30 lineage and and classifying nucleic acid and protein sequences has already could differentiate the sequences to which it belonged in A. been proposed [50,51]. nancymaae and A. vociferans, except for the Aona-DRB*W8901, Compression in the present work measured two relevant Aona-DRB*W2910/*W2908 and Aovo-DRB*W9201 sequences parameters in microsatellite analysis, given that compressed size where the microsatellite had the same length thereby differenti- (in bytes) would have depended on a sequence’s length and degree ating it as a group, but not individually, and thus working as a of simplicity (monotony), being very correlated with length in this screening but not as a typing method for these alleles. case (R2 = 0.9793) given that the repeats between sequences were According to the results reported here, the composition of the the same type and had the same complexity, mainly varying microsatellite described for MHC-DRB sequences in A. nancymaae regarding number (Figure S1 and Table S2 in File S1). and A. vociferans was more variable and complex than in humans Results for the repeat sector and exon 2 and intron 2 alignable and other Catarrhini (Figure 6). Comparison of the groups deduced sectors (Figures 4A and B) highlighted sector agreement. There PLOS ONE | www.plosone.org 9 May 2014 | Volume 9 | Issue 5 | e96973 Aotus Intron 2 MHC-DRB STR Figure 6. MHC-DRB STR model for Platyrrhini cf Catarrhini. 
The Figure shows the STR structure described by Bontrop et al., for Human HLA- DRB (STR-HLA) and Macaca mulatta MHC-DRB (STR-Mamu); and our proposed Aotus MHC-DRB model (STR-Aotus). The lengths ranges for each STR are shown. The ancestral structure of the microsatellite in Catarrhini has evolved from dinucleotide repeats (GT)x (GA)y; the (GA)y dinucleotide in Aotus was maintained in STR structure and the (GT)x repeat was not present. STR in Aotus mainly had (GA)y repeats interrupted by CT motifs, this being more complex and bigger than Catarrhini STR. doi:10.1371/journal.pone.0096973.g006 were two large groups, one formed by DRB3*06, DRB1*03, [27,53,57] and other orders of mammals [58–60]. Evidence DRB*W18 and DRB*W93 and another formed by DRB*W29, sustaining such observation has been based on independent DRB*W91 and DRB*W88, DRB*W89, DRB*W45, DRB*W30 - analysis of other MHC-DRB sectors not implicated in PBR DRB*W92 lineages being associated with one of the two, formation, where sequences belonging to Catarrhini and Platyr- according to the sector being analysed. The DRB*W89 and rhini have been shown to cluster apart, whilst for exon 2, they DRB*W45 lineages had the greatest differences regarding cluster within common allelic lineages [27,53,57], thus favouring grouping pattern between exon 2 and the STR, whilst this the appearance of common motifs between different lineages, occurred between the STR and intron 2 alignable sectors in thereby contributing towards reducing bootstrap support [45]. DRB*W30 - DRB*W92 groups. 
Phylogenetic comparison of exon 2 (Figure 5A) and intron 2 There was no differentiation between lineages for the alignable sectors (Figure 5B) from the Aotus sequences so obtained DRB3*06-DRB1*03, DRB*W89, DRB*W29, DRB*W30, and a representative sample from other primates, showed that DRB*W92 and DRB*W91 groups, suggesting that exon 2 origin whilst the last displays a clear division between Platyrrhini MHC- and diversity represented a characteristic which could have been DRB sequences (shown in red) and Catarrhini (Hominoidea in derived from a less diverse original set. This agreed with the origin blue and Cercopithecoidea in green), the analysis of exon 2 of NWM arising from these primates’ African transfer during the presented a mixture of alleles from both types of primate, and thus Eocene age (35 mya) [52], implying that current class I and class II molecular convergence between several groups is observed. This MHC lineages were generated from a founding event [53]. agreed with previous reports [27,53]. Phylogenetic analysis of MHC-DRB gene exon 2 in primates Differently to the convergence regarding phenotypical features, (Figure 5A) highlighted the difficulty of inferring this gene’s convergence at molecular level is a rare phenomenon producing evolutionary relationships based just on this sector. Previous the same effect as another phenomenon which has shaped MHC studies [12,27,54] have shown that even though the alleles being evolution, trans-specific polymorphism, implying the maintenance studied have been associated in assigned lineages, there has been of allele diversity going beyond speciation events due to balanced poor support for such relationships, given the occurrence of selection [61]. phenomena guaranteeing PBR functional and structural stability. 
The extent of the convergence between related groups’ lineages However, as a response to the diversity exhibited by pathogen has not been previously described for DRB genes in primates; our proteins as a mechanism for avoiding the immune response, analysis showed that the phylogenies obtained from exon 2 and variation in the PBR has been produced by several mechanisms, those obtained for intron 2 differed regarding the relationship thereby establishing a co-evolutionary arms race [55]. The most inside Platyrrhini and Catarrhini. The occurrence of groups relevant features would include balanced selection (for conserving containing Hominoidea and Cercopithecoidea sequences was both functional integrity and diversifying the receptor) and greater in analysis inferred from exon 2 (Figure 5A) than in clusters recombination (intra-locus and inter-loci) [15–17,56]. obtained from intron 2 (Figure 5B). The same was true for Analysis of just exon 2 has revealed the occurrence of groups of Platyrrhini, where the C. apella sequence appeared to be included multiple primate species, thus showing the existence of groups within a group of Aotus sequences in analysis of exon 2 (Figure 5A), containing Platyrrhini and Catarrhini sequences, even though whilst this did not occur regarding inference from intron 2 most groups of sequences were biased regarding the types of (Figure 5B). The foregoing could imply more recent convergence primate forming them (i.e. showing some group as being than that described to date. It also shows that MHC-DRB in predominant) (Figure 5A). The inferences drawn regarding exon primates has had a complex evolutionary mode in which trans- 2 did not lead to concluding whether such grouping reflected a specific evolution has occurred at the same time as convergence common origin for these lineages or convergence. between the different species analysed, underlining a predomi- Concerning the particular case of MHC-DRB, molecular nantly intra-generic TSP pattern. 
convergence at exon 2 level has been described in both primates PLOS ONE | www.plosone.org 10 May 2014 | Volume 9 | Issue 5 | e96973 Aotus Intron 2 MHC-DRB STR The molecular study in primates of the DRB gene in intron 2 for analysing MHC-DRB exon 2+ intron 2 (partial) in primates. (without considering the repeat sector) showed a high degree of Table S2. Microsatellite sequence and length in Platyr- identity for all the primates, indicating a clear division between rhini MHC-DRB. STR structure corresponding to each DRB NWM and OWM and between DRB gene lineages, demonstrat- gene sequence for A. nancymaae and A. vociferans. The colours signify ing an independent origin for each DRB repertoire in Platyrrhini microsatellite identity or similarity and microsatellite sequences and Catarrhini. The study also verified that the microsatellite corresponding to MHC-DRB Callithrix jacchus (in bold) are shown present in A. nancymaae and A. vociferans MHC-DRB gene intron 2 at the end. Figure S1. Aligning A. vociferans and A. could be a useful marker for high and medium resolution nancymaae MHC-DRB gene exon 2+ intron 2 (partial) genotyping of the MHC-DRB gene in these species, and probably sequences. in NWM. The microsatellite sequences could have been associated (PDF) with the polymorphism observed for the corresponding Aotus MHC-DRB exon 2, making this a valuable tool for studying these Acknowledgments genes’ variability. We would like to thank Wendy Ortiz, Luis Alfredo Baquero and Yoelis Supporting Information Yepes for their technical assistance, and Jason Garry for translating the manuscript. File S1 Supporting tables and figure. Table S1. Se- quences used for designing primers and analysis of exon Author Contributions 2+ intron 2. Available genome sequences for the Callithrix jaccus, Conceived and designed the experiments: CL CFS. Performed the Homo sapiens and Macaca mulatta MHC-DRB region were used for experiments: CL. Analyzed the data: CFS CL. 
Wrote the paper: CFS designing the primers. Sequences used for comparative analysis of CL LFC MEP MAP. Aotus MHC-DRB exon 2+ intron 2 (partial), as well as those used References 1. Ward JM, Vallender EJ (2012) The resurgence and genetic implications of New 19. Middleton SA, Anzenberger G, Knapp LA (2004) Identification of New World World primates in biomedical research. Trends Genet 28: 586–591. monkey MHC-DRB alleles using PCR, DGGE and direct sequencing. 2. Bontrop RE (2001) Non-human primates: essential partners in biomedical Immunogenetics 55: 785–790. research. Immunol Rev 183: 5–9. 20. Ujvari B, Belov K (2011) Major histocompatibility complex (MHC) markers in 3. Bone JF, Soave OA (1970) Experimental tuberculosis in owl monkeys (Aotus conservation biology. Int J Mol Sci 12: 5168–5186. trivirgatus). Lab Anim Care 20: 946–948. 21. Baquero JE, Miranda S, Murillo O, Mateus H, Trujillo E, et al. (2006) 4. Gysin J (1988) Animal models: primates. In: Sherman IW, editor.Malaria: Reference strand conformational analysis (RSCA) is a valuable tool in identifying parasite biology, pathogenesis and protection. Washington DC: ASM. pp. 419– MHC-DRB sequences in three species of Aotus monkeys. Immunogenetics 58: 439. 590–597. 5. Jones FR, Baqar S, Gozalo A, Nunez G, Espinoza N, et al. (2006) New World 22. Knapp LA, Cadavid LF, Eberle ME, Knechtle SJ, Bontrop RE, et al. (1997) monkey Aotus nancymae as a model for Campylobacter jejuni infection and Identification of new mamu-DRB alleles using DGGE and direct sequencing. immunity. Infect Immun 74: 790–793. Immunogenetics 45: 171–179. 6. Lujan R, Dennis VA, Chapman WL Jr, Hanson WL (1986) Blastogenic 23. Doxiadis GG, de Groot N, Claas FH, Doxiadis II, van Rood JJ, et al. (2007) A responses of peripheral blood leukocytes from owl monkeys experimentally highly divergent microsatellite facilitating fast and accurate DRB haplotyping in infected with Leishmania braziliensis panamensis. Am J Trop Med Hyg 35: humans and rhesus macaques. 
Proc Natl Acad Sci U S A 104: 8907–8912. 1103–1109. 24. de Groot NG, Heijmans CM, de Groot N, Doxiadis GG, Otting N, et al. (2009) 7. Noya O, Gonzalez-Rico S, Rodriguez R, Arrechedera H, Patarroyo ME, et al. The chimpanzee Mhc-DRB region revisited: gene content, polymorphism, (1998) Schistosoma mansoni infection in owl monkeys (Aontus nancymai): pseudogenes, and transcripts. Mol Immunol 47: 381–389. evidence for the early elimination of adult worms. Acta Trop 70: 257–267. 25. Doxiadis GG, de Groot N, Dauber EM, van Eede PH, Fae I, et al. (2009) High 8. Pico de Coana Y, Rodriguez J, Guerrero E, Barrero C, Rodriguez R, et al. resolution definition of HLA-DRB haplotypes by a simplified microsatellite (2003) A highly infective Plasmodium vivax strain adapted to Aotus monkeys: typing technique. Tissue Antigens 74: 486–493. quantitative haematological and molecular determinations useful for P. vivax 26. de Groot N, Doxiadis GG, de Vos-Rouweler AJ, de Groot NG, Verschoor EJ, et malaria vaccine development. Vaccine 21: 3930–3937. al. (2008) Comparative genetics of a highly divergent DRB microsatellite in 9. Polotsky YE, Vassell RA, Binn LN, Asher LV (1994) Immunohistochemical different macaque species. Immunogenetics 60: 737–748. detection of cytokines in tissues of Aotus monkeys infected with hepatitis A virus. 27. Kriener K, O’Huigin C, Tichy H, Klein J (2000) Convergent evolution of major Ann N Y Acad Sci 730: 318–321. histocompatibility complex molecules in humans and New World monkeys. 10. Diaz D, Naegeli M, Rodriguez R, Nino-Vasquez JJ, Moreno A, et al. (2000) Immunogenetics 51: 169–178. Sequence and diversity of MHC DQA and DQB genes of the owl monkey Aotus 28. Riess O, Kammerbauer C, Roewer L, Steimle V, Andreas A, et al. (1990) nancymaae. Immunogenetics 51: 528–537. Hypervariability of intronic simple (gt)n(ga)m repeats in HLA-DRB genes. 11. Guerrero JE, Pacheco DP, Suarez CF, Martinez P, Aristizabal F, et al. (2003) Immunogenetics 32: 110–116. 
Characterizing T-cell receptor gamma-variable gene in Aotus nancymaae owl 29. Andersson G, Larhammar D, Widmark E, Servenius B, Peterson PA, et al. monkey peripheral blood. Tissue Antigens 62: 472–482. (1987) Class II genes of the human major histocompatibility complex. 12. Suarez CF, Patarroyo ME, Trujillo E, Estupinan M, Baquero JE, et al. (2006) Organization and evolutionary relationship of the DR beta genes. J Biol Chem Owl monkey MHC-DRB exon 2 reveals high similarity with several HLA-DRB 262: 8748–8758. lineages. Immunogenetics 58: 542–558. 30. National Research Council (U.S.). Committee for the Update of the Guide for 13. Bontrop RE, Otting N, de Groot NG, Doxiadis GG (1999) Major the Care and Use of Laboratory Animals, Institute for Laboratory Animal histocompatibility complex class II polymorphisms in primates. Immunol Rev Research (U.S.), National Academies Press (U.S.) (2011) Guide for the care and 167: 339–350. use of laboratory animals. Washington, D.C.: National Academies Press. xxv, 14. Doxiadis GG, de Groot N, de Groot NG, Doxiadis, II, Bontrop RE (2008) 220 p. p. Reshuffling of ancient peptide binding motifs between HLA-DRB multigene 31. Ashley A (1995) Owl monkeys (Aotus) are highly divergent in mitochondrial family members: old wine served in new skins. Mol Immunol 45: 2743–2751. cytochrome C oxidase (COII) sequences. Journal of Primatology 16: 793–806. 15. Yeager M, Hughes AL (1999) Evolution of the mammalian MHC: natural 32. PREMIER Biosoft International PA, CA, USA (2013) Netprimer. selection, recombination, and convergent evolution. Immunol Rev 167: 45–58. 33. Lenz TL, Becker S (2008) Simple approach to reduce PCR artefact formation 16. Takahata N, Satta Y (1998) Selection, convergence, and intragenic recombi- leads to reliable genotyping of MHC and other highly polymorphic loci– nation in HLA diversity. Genetica 102–103: 157–169. implications for evolutionary analysis. Gene 427: 117–123. 17. 
Takahata N, Satta Y (1998) Footprints of intragenic recombination at HLA loci. 34. de Groot NG, Otting N, Robinson J, Blancher A, Lafont BA, et al. (2012) Immunogenetics 47: 430–441. Nomenclature report on the major histocompatibility complex genes and alleles 18. Nino-Vasquez JJ, Vogel D, Rodriguez R, Moreno A, Patarroyo ME, et al. (2000) of Great Ape, Old and New World monkey species. Immunogenetics 64: 615– Sequence and diversity of DRB genes of Aotus nancymaae, a primate model for 631. human malaria parasites. Immunogenetics 51: 219–230. 35. Robinson J, Halliwell JA, McWilliam H, Lopez R, Parham P, et al. (2013) The IMGT/HLA database. Nucleic Acids Res 41: D1222–1227. PLOS ONE | www.plosone.org 11 May 2014 | Volume 9 | Issue 5 | e96973 Aotus Intron 2 MHC-DRB STR 36. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) 49. Doxiadis GG, de Groot N, de Groot NG, Rotmans G, de Vos-Rouweler AJ, et Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948. al. (2010) Extensive DRB region diversity in cynomolgus macaques: recombi- 37. Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and nation as a driving force. Immunogenetics 62: 137–147. analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41: 95–98. 50. Otu HH, Sayood K (2003) A new sequence distance measure for phylogenetic 38. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. (2011) MEGA5: tree construction. Bioinformatics 19: 2122–2130. molecular evolutionary genetics analysis using maximum likelihood, evolution- 51. Gusev VD, Nemytikova LA, Chuzhanova NA (1999) On the complexity ary distance, and maximum parsimony methods. Mol Biol Evol 28: 2731–2739. measures of genetic sequences. Bioinformatics 15: 994–999. 39. Sitnikova T (1996) Bootstrap method of interior-branch test for phylogenetic 52. Schrago CG, Russo CA (2003) Timing the origin of New World monkeys. Mol trees. Mol Biol Evol 13: 605–611. Biol Evol 20: 1620–1625. 40. 
Du L, Li Y, Zhang X, Yue B (2013) MSDB: a user-friendly program for 53. Trtkova K, Mayer WE, O’Huigin C, Klein J (1995) Mhc-DRB genes and the reporting distribution and building databases of microsatellites from genome origin of New World monkeys. Mol Phylogenet Evol 4: 408–419. sequences. J Hered 104: 154–157. 54. Suarez CF, Cardenas PP, Llanos-Ballestas EJ, Martinez P, Obregon M, et al. (2003) Alpha1 and alpha2 domains of Aotus MHC class I and Catarrhini MHC 41. R Core Team (2013) R: A language and environment for statistical computing. class Ia share similar characteristics. Tissue Antigens 61: 362–373. Available: http://www.R-project.org/: R Foundation for Statistical Computing. 55. Acevedo-Whitehouse K, Cunningham AA (2006) Is MHC enough for 42. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped understanding wildlife immunogenetics? Trends Ecol Evol 21: 433–438. BLAST and PSI-BLAST: a new generation of protein database search 56. Reusch TB, Langefors A (2005) Inter- and intralocus recombination drive MHC programs. Nucleic Acids Res 25: 3389–3402. class IIB gene diversification in a teleost, the three-spined stickleback 43. Menezes AN, Bonvicino CR, Seuanez HN (2010) Identification, classification Gasterosteus aculeatus. J Mol Evol 61: 531–541. and evolution of owl monkeys (Aotus, Illiger 1811). BMC Evol Biol 10: 248. 57. O’Huigin C (1995) Quantifying the degree of convergence in primate Mhc-DRB 44. Klein J (1987) Origin of major histocompatibility complex polymorphism: the genes. Immunol Rev 143: 123–140. trans-species hypothesis. Hum Immunol 19: 155–162. 58. Srithayakumar V, Castillo S, Mainguy J, Kyle CJ (2012) Evidence for 45. Efron B, Halloran E, Holmes S (1996) Bootstrap confidence levels for evolutionary convergence at MHC in two broadly distributed mesocarnivores. phylogenetic trees. Proc Natl Acad Sci U S A 93: 7085–7090. Immunogenetics 64: 289–301. 46. 
Martin D, Rybicki E (2000) RDP: detection of recombination amongst aligned 59. Gustafsson K, Andersson L (1994) Structure and polymorphism of horse MHC sequences. Bioinformatics 16: 562–563. class II DRB genes: convergent evolution in the antigen binding site. 47. Bergstrom TF, Engkvist H, Erlandsson R, Josefsson A, Mack SJ, et al. (1999) Immunogenetics 39: 355–358. Tracing the origin of HLA-DRB1 alleles by microsatellite polymorphism. 60. Gustafsson K, Brunsberg U, Sigurdardottir S, Andersson L (1991) A Am J Hum Genet 64: 1709–1718. Phylogenetic Investigation of MHC Class II DRB Genes Reveals Convergent 48. Epplen C, Santos EJ, Guerreiro JF, van Helden P, Epplen JT (1997) Coding Evolution in the Antigen Binding Site. In: Klein J, Klein D, editors.Molecular versus intron variability: extremely polymorphic HLA-DRB1 exons are flanked Evolution of the Major Histocompatibility Complex: Springer Berlin Heidel- by specific composite microsatellites, even in distant populations. Hum Genet berg. pp. 119–130. 99: 399–406. 61. Klein J, Sato A, Nikolaidis N (2007) MHC, TSP, and the origin of species: from immunogenetics to evolutionary genetics. Annu Rev Genet 41: 281–304. PLOS ONE | www.plosone.org 12 May 2014 | Volume 9 | Issue 5 | e96973 Table S1. 
Sequences used in this article.
Sequence | Remarks | GenBank Accession Number

Sequences used for primer design
Caja_DRB_G_01 43380-44258 | Callithrix jacchus BAC clone CH259-77F15 from chromosome unknown, complete sequence | AC242730
Caja_DRB_G_02 105908-107178 | Callithrix jacchus BAC clone CH259-77F15 from chromosome unknown, complete sequence | AC242730
Caja_DRB_G_03 192613-193403 | Callithrix jacchus BAC clone CH259-77F15 from chromosome unknown, complete sequence | AC242730
Caja_DRB_G_04 138963-138357 | Callithrix jacchus BAC clone CH259-15O7 from chromosome unknown, complete sequence | AC243457
Caja_DRB_G_05 76772-75338 | Callithrix jacchus BAC clone CH259-15O7 from chromosome unknown, complete sequence | AC243457
Caja_DRB_G_06 161835-162584 | Callithrix jacchus BAC clone CH259-49P2 from chromosome unknown, complete sequence | AC242576
Mamu_DRB_G_01 140068-140822 | Macaca mulatta Major Histocompatibility Complex BAC MMU012K22 | AC148663
Mamu_DRB_G_02 28917-29606 | Macaca mulatta Major Histocompatibility Complex BAC MMU012K22 | AC148663
Mamu_DRB_G_03ps 111593-112214 | Macaca mulatta Major Histocompatibility Complex BAC MMU012K22 | AC148663
Mamu_DRB_G_05 c75283-74620 | Macaca mulatta Major Histocompatibility Complex BAC MMU248N17 | AC148697
Mamu_DRB_G_06 c151148-150454 | Macaca mulatta Major Histocompatibility Complex BAC MMU248N17 | AC148697
Mamu_DRB_G_07 c29018-28359 | Macaca mulatta Major Histocompatibility Complex BAC MMU248N17 | AC148697
Mamu_DRB_G_08 c26068-25409 | Macaca mulatta Major Histocompatibility Complex BAC MMU281E18 | AC148700
Mamu_DRB_G_04 173697-174407 | Macaca mulatta Major Histocompatibility Complex BAC MMU370O02 | AC148706
HLA_DRB1*040101 | Homo sapiens major histocompatibility complex, class II, DR53 haplotype (DR53) | NG_002433
HLA_DRB1*070101 | Human DNA sequence from clone DADB-102D14 on chromosome 6 | CR753309
HLA_DRB1*1501 | Homo sapiens major histocompatibility complex, class II, DR51 haplotype (DR51) on chromosome 6 | NG_002432
HLA_DRB1*1602 | Homo sapiens MHC class II antigen (HLA-DRB1) gene DR51 | AB774985
HLA_DRB3*01012 | Homo sapiens major histocompatibility complex, class II, DR52 haplotype (DR52) on chromosome 6 | NG_002392
HLA_DRB3*020201 | Human DNA sequence from clone DAQB-97F8 on chromosome 6 | AL929581
HLA_DRB4*01030101 | Homo sapiens major histocompatibility complex, class II, DR53 haplotype (DR53) | NG_002433
HLA_DRB5*0101 | Homo sapiens major histocompatibility complex, class II, DR51 haplotype (DR51) on chromosome 6 | NG_002432
HLA_DRB6ps | Homo sapiens major histocompatibility complex, class II, DR51 haplotype (DR51) on chromosome 6 | NG_002432
HLA_DRB7ps | Homo sapiens major histocompatibility complex, class II, DR53 haplotype (DR53) | NG_002433
HLA_DRB1*03 c91958-91273 | Human DNA sequence from clone DAQB-97F8 on chromosome 6 | AL929581

Sequences reported here
Aovo-DRB1*03:05 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447756
Aovo-DRB1*03:06 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447757
Aovo-DRB1*03:07 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447758
Aovo-DRB*W91:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447733
Aovo-DRB*W92:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447735
Aovo-DRB*W92:02 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447736
Aovo-DRB*W91:02 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447737
Aovo-DRB*W93:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447740
Aovo-DRB*W88:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447741
Aovo-DRB*W29:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447742
Aovo-DRB1*03:04 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447759
Aovo-DRB*W18:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447762
Aovo-DRB*W18:02 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447763
Aovo-DRB*W18:03 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447764
Aovo-DRB*W90:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447765
Aovo-DRB3*06:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447766
Aovo-DRB*W30:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447738
Aovo-DRB*W45:01 | Aotus vociferans MHC class II antigen beta chain (Aovo-DRB) gene | KF447739
Aona-DRB*W91:01 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447734
Aona-DRB*W29:10 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447743
Aona-DRB*W29:08 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447744
Aona-DRB*W30:02 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447745
Aona-DRB1*03:28 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447746
Aona-DRB*W89:01 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447747
Aona-DRB3*06:25:01 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447748
Aona-DRB3*06:26 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447749
Aona-DRB3*06:27 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447750
Aona-DRB3*06:25:02 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447751
Aona-DRB3*06:15 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447752
Aona-DRB3*06:28 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447753
Aona-DRB1*03:29 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447754
Aona-DRB1*03:17:01 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447755
Aona-DRB*W18:08 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447760
Aona-DRB*W18:06 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene | KF447761

Sequences of Aotus MHC-DRB (molecular phylogenetic analysis, exon 2)
Aoaz-DRB*W3801 | Aotus azarai MHC class II antigen (Aoaz-DRB) gene Aoaz-DRB*W3801 allele | AY429143
Aoaz-DRB3*0601 | Aotus azarai MHC class II antigen (Aoaz-DRB3) gene Aoaz-DRB3*0601 allele | AY429142
Aona-DRB*W1301 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene Aona-DRB*W1301 allele | AF132767
Aona-DRB*W1305 | Aotus nancymaae isolate 20896_2 MHC class II antigen beta chain (DRB) mRNA DRB*W1305 allele | AY563223
Aona-DRB*W1306 | Aotus nancymaae isolate 21100_1 MHC class II antigen beta chain (DRB) mRNA DRB*W1306 allele | AY563218
Aona-DRB*W1309 | Aotus nancymaae isolate 22417-7 MHC class II antigen beta chain (DRB) mRNA DRB*W1309 allele | AY563255
Aona-DRB*W1312 | Aotus nancymaae MHC class II antigen (DRB) mRNA DRB*W1312 allele | DQ162705
Aona-DRB*W1801 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene Aona-DRB*W1801 allele | AF132768
Aona-DRB*W1802 | Aotus nancymaae MHC-DRB (DRB*W) mRNA DRB*W1802 allele | AF169487
Aona-DRB*W2901 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene Aona-DRB*W2901 allele | AF129806
Aona-DRB*W2906 | Aotus nancymaae MHC class II antigen (DRB) mRNA DRB*W2906 allele | DQ162688
Aona-DRB*W2907 | Aotus nancymaae isolate 20894_3 MHC class II antigen beta chain (DRB) mRNA DRB*W2907 allele | AY563201
Aona-DRB*W3001 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB) gene Aona-DRB*W3001 allele | AF132766
Aona-DRB*W3801 | Aotus nancymaae isolate 16606_7 MHC class II antigen beta chain (DRB) mRNA DRB*W3801 allele | AY563194
Aona-DRB*W4201 | Aotus nancymaae isolate 20337_3 MHC class II antigen beta chain (DRB) mRNA DRB*W4201 allele | AY563209
Aona-DRB*W4401 | Aotus nancymaae isolate 20559_2 MHC class II antigen beta chain (DRB) mRNA DRB*W4401 allele | AY563206
Aona-DRB*W4501 | Aotus nancymaae isolate 20249_12 MHC class II antigen beta chain (DRB) mRNA DRB*W4501 allele | AY563180
Aona-DRB*W4701 | Aotus nancymaae isolate 20465_3 MHC class II antigen beta chain (DRB) mRNA DRB*W4701 allele | AY563181
Aona-DRB*W4702 | Aotus nancymaae isolate 22822_13 MHC class II antigen beta chain (DRB) mRNA DRB*W4702 allele | AY563183
Aona-DRB*W470404 | Aotus nancymaae MHC class II antigen (DRB) mRNA DRB*W470404 allele | DQ162645
Aona-DRB1*0301 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB1) gene Aona-DRB1*0301 allele | AF129793
Aona-DRB1*0302 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB1) gene Aona-DRB1*0302 allele | AF129792
Aona-DRB1*0303 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB1) gene Aona-DRB1*0303 allele | AF129794
Aona-DRB1*0305 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB1) gene Aona-DRB1*0305 allele | AF129796
Aona-DRB1*0307 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB1) gene Aona-DRB1*0307 allele | AF129798
Aona-DRB1*0313 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB1) gene Aona-DRB1*0313 allele | AF132760
Aona-DRB1*0314 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB1) gene Aona-DRB1*0314 allele | AF132761
Aona-DRB1*0319 | Aotus nancymaae isolate 20719_2 MHC class II antigen beta chain (DRB1) mRNA DRB1*0319 allele | AY563188
Aona-DRB1*0324 | Aotus nancymaae isolate 21955_10 MHC class II antigen beta chain (DRB1) mRNA DRB1*0324 allele | AY563193
Aona-DRB3*0601 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB3) gene Aona-DRB3*0601 allele | AF129799
Aona-DRB3*0602 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB3) gene Aona-DRB3*0602 allele | AF129800
Aona-DRB3*0603 | Aotus nancymaae MHC class II antigen beta chain (Aona-DRB3) gene Aona-DRB3*0603 allele | AF129801
Aona-DRB3*0614 | Aotus nancymaae isolate 20444_7 MHC class II antigen beta chain (DRB3) mRNA DRB3*0614 allele | AY563212
Aoni-DRB*W1301 | Aotus nigriceps isolate 21921_5 MHC class II antigen beta chain (DRB) mRNA DRB*W1301 allele | AY563261
Aoni-DRB*W1303 | Aotus nigriceps MHC class II antigen (DRB) mRNA DRB*W1303 allele | DQ162732
Aoni-DRB*W2901 | Aotus nigriceps isolate 20596_8 MHC class II antigen beta chain (DRB) mRNA DRB*W2901 allele | AY563259
Aoni-DRB*W2902 | Aotus nigriceps isolate 21919_18 MHC class II antigen beta chain (DRB) mRNA DRB*W2902 allele | AY563246
Aoni-DRB*W3801 | Aotus nigriceps isolate 16584_5 MHC class II antigen beta chain (DRB) mRNA DRB*W3801 allele | AY563245
Aoni-DRB*W4201 | Aotus nigriceps isolate 20483_1 MHC class II antigen beta chain (DRB) mRNA DRB*W4201 allele | AY563253
Aoni-DRB*W4301 | Aotus nigriceps isolate 20848_16 MHC class II antigen beta chain (DRB) mRNA DRB*W4301 allele | AY563249
Aoni-DRB*W4401 | Aotus nigriceps isolate 20596_4 MHC class II antigen beta chain (DRB) mRNA DRB*W4401 allele | AY563247
Aoni-DRB1*0301 | Aotus nigriceps MHC class II antigen beta chain (Aoni-DRB1) gene Aoni-DRB1*0301 allele | AF129797
Aoni-DRB1*0304 | Aotus nigriceps isolate 21791_15 MHC class II antigen beta chain (DRB1) mRNA DRB1*0304 allele | AY563242
Aoni-DRB1*0307 | Aotus nigriceps MHC class II antigen (DRB1) mRNA DRB1*0307 allele | DQ162711
Aoni-DRB1*W1801 | Aotus nigriceps isolate 20456_8 MHC class II antigen beta chain (DRB1) mRNA DRB1*W1801 allele | AY563257
Aoni-DRB3*0601 | Aotus nigriceps isolate 20506_2 MHC class II antigen beta chain (DRB3) mRNA DRB3*0601 allele | AY563229
Aotr-DRB*W1801 | Aotus trivirgatus MHC class II DRB*W1801 gene exon | L12477
Aotr-DRB1*0301 | Aotus trivirgatus MHC class II DRB1*0301 gene exon | L12472
Aotr-DRB1*0303 | Aotus trivirgatus partial DRB1 gene for MHC class II antigen DRB1*0303 allele exon 2 | AJ544176
Aotr-DRB3*0602 | Aotus trivirgatus partial DRB3 gene for MHC class II antigen DRB3*0602 allele exon 2 | AJ544174
Aotr-DRB3*0603 | Aotus trivirgatus partial DRB3 gene for MHC class II antigen DRB3*0603 allele exon 2 | AJ544175
Aotr-DRB3*06 | Aotus trivirgatus MHC class II DRB gene exon | L12474
Aovo-DRB*W130101 | Aotus vociferans MHC class II antigen (DRB) mRNA DRB*W130101 allele | DQ162634
Aovo-DRB*W1303 | Aotus vociferans isolate 20789_1 MHC class II antigen beta chain (DRB) mRNA DRB*W1303 allele | AY563258
Aovo-DRB*W4301 | Aotus vociferans MHC class II antigen (DRB) mRNA DRB*W4301 allele | DQ162630
Aovo-DRB*W4701 | Aotus vociferans isolate 16704_9 MHC class II antigen beta chain (DRB) mRNA DRB*W4701 allele | AY563227
Aovo-DRB1*0301 | Aotus vociferans MHC class II antigen (DRB1) mRNA DRB1*0301 allele | DQ162628

Sequences of primate MHC-DRB (molecular phylogenetic analysis, exon 2 + intron 2)
Caja_DRB_G_01 43380-44258 | Callithrix jacchus BAC clone CH259-77F15 from chromosome unknown, complete sequence | AC242730
Caja_DRB_G_02 105908-107178 | Callithrix jacchus BAC clone CH259-77F15 from chromosome unknown, complete sequence | AC242730
Caja_DRB_G_03 192613-193403 | Callithrix jacchus BAC clone CH259-77F15 from chromosome unknown, complete sequence | AC242730
Caja_DRB_G_04 138963-138357 | Callithrix jacchus BAC clone CH259-15O7 from chromosome unknown, complete sequence | AC243457
Caja_DRB_G_05 76772-75338 | Callithrix jacchus BAC clone CH259-15O7 from chromosome unknown, complete sequence | AC243457
Caja_DRB_G_06 161835-162584 | Callithrix jacchus BAC clone CH259-49P2 from chromosome unknown, complete sequence | AC242576
SAOE-DRB1*0303 | Saguinus oedipus MHC class II antigen (SAOE-DRB1) pseudogene | AF173332
SAOE-DRB3*0501 | Saguinus oedipus MHC class II antigen (SAOE-DRB3) pseudogene | AF173333
SAOE-DRB11*0102 | Saguinus oedipus MHC class II antigen (SAOE-DRB11) gene | AF173334
SAOE-DRB11*0105 | Saguinus oedipus MHC class II antigen (SAOE-DRB11) gene | AF173335
SAOE-DRB*W2209 | Saguinus oedipus MHC class II antigen (SAOE-DRB) gene | AF173336
CAJA-DRB1*0304 | Callithrix jacchus MHC class II antigen (CAJA-DRB1) gene | AF173337
CAMO-DRB1*0302 | Callicebus moloch MHC class II antigen (CAMO-DRB1) gene | AF173338
CAMO-DRB3*0503 | Callicebus moloch MHC class II antigen (CAMO-DRB3) gene | AF173339
CAMO-DRB3*0504 | Callicebus moloch MHC class II antigen (CAMO-DRB3) pseudogene | AF173340
CEAP-DRB*W1301 | Cebus apella MHC class II antigen (CEAP-DRB) gene | AF173341
CAJA-DRB*W1201 | Callithrix jacchus MHC class II antigen (CAJA-DRB) gene | AF173348
Mamu_DRB_G_01 140068-140822 | Macaca mulatta Major Histocompatibility Complex BAC MMU012K22 | AC148663
Mamu_DRB_G_02 28917-29606 | Macaca mulatta Major Histocompatibility Complex BAC MMU012K22 | AC148663
Mamu_DRB_G_03ps 111593-112214 | Macaca mulatta Major Histocompatibility Complex BAC MMU012K22 | AC148663
Mamu_DRB_G_05 c75283-74620 | Macaca mulatta Major Histocompatibility Complex BAC MMU248N17 | AC148697
Mamu_DRB_G_06 c151148-150454 | Macaca mulatta Major Histocompatibility Complex BAC MMU248N17 | AC148697
Mamu_DRB_G_07 c29018-28359 | Macaca mulatta Major Histocompatibility Complex BAC MMU248N17 | AC148697
Mamu_DRB_G_08 c26068-25409 | Macaca mulatta Major Histocompatibility Complex BAC MMU281E18 | AC148700
Mamu_DRB_G_04 173697-174407 | Macaca mulatta Major Histocompatibility Complex BAC MMU370O02 | AC148706
Chae_DRB_G_01 63309-64000 | Chlorocebus aethiops BAC clone CH252-249I23 from chromosome 6 | AC241608
Chae_DRB_G_02 143590-144262 | Chlorocebus aethiops BAC clone CH252-249I23 from chromosome 6 | AC241608
Chae_DRB_G_03 111557-112224 | Chlorocebus aethiops BAC clone CH252-249I23 from chromosome 6 | AC241608
Chae_DRB_G_04 21256-21901 | Chlorocebus aethiops BAC clone CH252-249I23 from chromosome 6 | AC241608
MAFA-DRB*W3301 | Macaca fascicularis MHC class II antigen (MAFA-DRB) gene | AF173349
MAAR-DRB1*0301 | Macaca arctoides MHC class II antigen (MAAR-DRB1) gene | AF173350
MAAR-DRB1*0302 | Macaca arctoides MHC class II antigen (MAAR-DRB1) gene | AF173351
MAAR-DRB1*0701 | Macaca arctoides MHC class II antigen (MAAR-DRB1) gene | AF173352
MAFA-DRB*W301 | Macaca fascicularis MHC class II antigen (MAFA-DRB) gene | AF173353
MAMU-DRB*W402 | Macaca mulatta MHC class II antigen (MAMU-DRB) gene | AF173354
MAAR-DRB*W601 | Macaca arctoides MHC class II antigen (MAAR-DRB) pseudogene | AF173355
MAAR-DRB5*0301 | Macaca arctoides MHC class II antigen (MAAR-DRB5) gene | AF173356
MAMU-DRB6*0106 | Macaca mulatta MHC class II antigen (MAMU-DRB6) pseudogene | AF173357
Mamu-DRB*W201 7681-8413 | Macaca mulatta partial Mamu-DRB gene for MHC class II antigen | AM910410
Mamu-DRB*W305 7704-8400 | Macaca mulatta partial Mamu-DRB gene for MHC class II antigen | AM910411
Mamu-DRB*W603 6501-7247 | Macaca mulatta partial Mamu-DRB gene for MHC class II antigen | AM910412
Mamu-DRB*W2507 6540-7190 | Macaca mulatta partial Mamu-DRB gene for MHC class II antigen | AM910413
Mamu-DRB1*0303 3087-3748 | Macaca mulatta partial Mamu-DRB gene for MHC class II | AM910414
Mamu-DRB1*0306 | Macaca mulatta partial Mamu-DRB1 gene for MHC class II antigen | AM910415
Mamu-DRB1*0309 7622-8327 | Macaca mulatta partial Mamu-DRB1 gene for MHC class II antigen | AM910417
Mamu-DRB1*0404 2994-3688 | Macaca mulatta partial Mamu-DRB1 gene for MHC class II antigen | AM910419
Mamu-DRB1*1007 8014-8684 | Macaca mulatta partial Mamu-DRB1 gene for MHC class II antigen | AM910420
Mamu-DRB3*0405 | Macaca mulatta partial Mamu-DRB3 gene for MHC class II antigen | AM910421
Mamu-DRB3*0408 2995-3667 | Macaca mulatta partial Mamu-DRB3 gene for MHC class II antigen | AM910422
Mamu-DRB5*0301 976-8612 | Macaca mulatta partial Mamu-DRB5 gene for MHC class II antigen | AM910423
HLA_DRB1*010101 | Homo sapiens HLA-DRB1 gene for MHC class II antigen | AM493435
HLA_DRB1*010201 | Homo sapiens voucher Coriell Cell Repository DNA sample NA01018 | AY663400
HLA_DRB1*040101 | Homo sapiens major histocompatibility complex, class II, DR53 haplotype (DR53) | NG_002433
HLA_DRB1*0405 | Homo sapiens MHC class II antigen (HLA-DRB1) gene, HLA-DRB1*0404 | AB715390
HLA_DRB1*070101 | Human DNA sequence from clone DADB-102D14 on chromosome 6 | CR753309
HLA_DRB1*08:03:02 | Homo sapiens HLA-DRB1 gene for MHC class II antigen | FN823238
HLA_DRB1*110101 | Homo sapiens voucher Coriell Cell Repository DNA sample NA00576 | AY663412
HLA_DRB1*110401 | Homo sapiens voucher Coriell Cell Repository DNA sample NA14661 | AY663394
HLA_DRB1*12:01:01 | Homo sapiens HLA-DRB1 gene for major histocompatibility complex | AB715399
HLA_DRB1*130201 | Homo sapiens voucher Coriell Cell Repository DNA sample NA14663 | AY663413
HLA_DRB1*140101 | Homo sapiens voucher Coriell Cell Repository DNA sample NA10540 | AY663405
HLA_DRB1*140501 | Homo sapiens voucher Coriell Cell Repository DNA sample NA04535 | AY663408
HLA_DRB1*1501 | Homo sapiens major histocompatibility complex, class II, DR51 haplotype (DR51) on chromosome 6 | NG_002432
HLA_DRB1*16 | Homo sapiens major histocompatibility complex, class II | NG_002432
HLA_DRB1*1602 | Homo sapiens MHC class II antigen (HLA-DRB1) gene DR51 | AB774985
HLA_DRB3*01012 | Homo sapiens major histocompatibility complex, class II, DR52 haplotype (DR52) on chromosome 6 | NG_002392
HLA_DRB3*020201 | Human DNA sequence from clone DAQB-97F8 on chromosome 6 | AL929581
HLA_DRB4*01030101 | Homo sapiens major histocompatibility complex, class II, DR53 haplotype (DR53) | NG_002433
HLA_DRB5*0101 | Homo sapiens major histocompatibility complex, class II, DR51 haplotype (DR51) on chromosome 6 | NG_002432
HLA_DRB6ps | Homo sapiens major histocompatibility complex, class II, DR51 haplotype (DR51) on chromosome 6 | NG_002432
HLA_DRB7ps | Homo sapiens major histocompatibility complex, class II, DR53 haplotype (DR53) | NG_002433
HLA_DRB1*10:01:01 | Homo sapiens MHC class II antigen (HLA-DRB1) gene | JN157606
HLA_DRB1*15:02:01 | Homo sapiens HLA-DRB1 gene for MHC class II antigen | AB774991
HLA_DRB1*03-AL929581 c91958-91273 | Human DNA sequence from clone DAQB-97F8 on chromosome 6 | AL929581
Patr-DRB1*020101 | Pan troglodytes partial Patr-DRB1 gene for MHC class II antigen, Patr-DRB1*020101 | AM910425
Patr-DRB3*0208 | Pan troglodytes partial Patr-DRB3 gene for MHC class II antigen | AM910428
Patr-DRB1*10:01 | Pan troglodytes verus partial patr-DRB1 gene for MHC class II antigen | HE800526
Gogo_DRB_01 44822-45507 | Gorilla gorilla voucher Coriell Cell Repository DNA sample NG05251 | AY663402
Patr-DRB*W902 | Pan troglodytes partial Patr-DRB gene for MHC class II antigen | AM910424
Patr-DRB3*01020101 | Pan troglodytes partial Patr-DRB3 gene for MHC class II antigen | AM910426
Patr_DRB_01 | Pan troglodytes voucher Coriell Cell Repository DNA sample NS03646 | AY663401
Gogo_DRB_02 c218276-217597 | Gorilla DNA sequence from clone CH255-114D6, complete sequence | CU104652
Gogo_DRB_03 c218276-217597 | Gorilla DNA sequence from clone CH255-114D6, complete sequence | CU104652
Gogo_DRB_04 73853067:151462-152106 | Gorilla DNA sequence from clone CH255-351B13 | CT025711
Gogo_DRB_05 73853067:151462-152106 | Gorilla DNA sequence from clone CH255-351B13 | CT025711
Patr-DRB4*02:01 5938-6579 | Pan troglodytes troglodytes partial patr-DRB4 | HE800525
Poab_DRB_01 200269-200926 | Pongo abelii BAC clone CH276-191M9 from chromosome 6 | AC206450
Patr_DRB_01 27628-28281 | Pan troglodytes genomic DNA, chromosome 5 | AP006503

Table S2. Microsatellite sequence and length in Platyrrhini MHC-DRB.
MICROSATELLITE: FIRST PART | CENTRAL PART | FINAL PART
CA(GA)4CT(G (GA)5CACA(GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CACAGT(GA)2CA(GA)2AACA( (GA)15C A)4CT(GA)4C GA)2AAGACT(GA)3TACACT(GA)4CA(GA)2CT(GA)3CA(GA)3CACA(GA)3CA(GA)3CTGAAAGACT(GA)4CA((GA)3,4C T(GA)3G C T)(GA)3AAGACT((GA)4,3CT)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT((GA)3,5CT)(GA) CGCCTT3CACTGATA((GA)2,5,3CT) G (GA)5(CA)2(GA)2CACT(GA)4CAGAG(GA)4CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CACAGT(GA)2CA(GA)2AA CA(GA)4CT(G CA(GA)2AAGACT (GA)15C A)4CT(GA)4C (GA)3TACACT(GA)4CA(GA)2CT(GA)3CA(GA)3CACA(GA)3CA(GA)3CTGAAAGACT(GA)4CA((GA)3,4CT)(GA)3AAGA T(GA)3G C CT((GA)4,3)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3 CGCCTT CACAGATA(GA)2TT((GA)3,5CT)(GA)3CACTGATA((GA)2,5,3CT) G CA(GA)4CT(G (GA)5CACA(GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CACAGT(GA)2CA(GA)2AACA( (GA)16C A)4CC(GA)4C GA)2AAGACT(GA)3TACACT(GA)4CA(GA)2CT((GA)3,3CA)CA(GA)3CA(GA)3CTGAAAGACT(GA)4CA((GA)3,4CT)GA T(GA)3G C AAGACT((GA)4,3CT)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT((GA)3,5CT)(GA)3CACT CGCCTTGATA(GA)2CT G CA(GA)4CT(G (GA)5CACA(GA)2CACT(GA)4CA(GA)5CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CACAGT(GA)2CA(GA)2AACA( (GA)15C A)4CT(GA)4C GA)2AAGACT(GA)3TACACT(GA)4CA(GA)2CT((GA)3,3CA)CA(GA)3CA(GA)3CTGAAAGACT(GA)4CA((GA)3,4CT)(GA T(GA)3G C )3AAGACT((GA)4,3CT)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT((GA)3,5CT)(GA)3CAC CGCCTTTGATA((GA)2,5,3CT) G
CA(GA)4CT(G (GA)5CACA(GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CAGAG((GA)2,3CA) (GA)11A A)4CT(GA)4C (GA)2AAGACT(GA)3CA(GA)2CT(GA)3TACACT(GA)4CA(GA)2CT((GA)3,3CA)CT(GA)4CA(GA)3CTGAAAGAT ACT(GA C (GA)4CA((GA)4,5,3CT)(GA)3TACTTAGG(GA)2CT(GA)4CA(GA)3CT(GA)3AAGACT((GA)4,3,3CT)(GA)3AACA )3GCAC((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT(GA)3CT(GA)3CACTGATA((GA)5,3) TTG CA(GA)4CT(G (GA)5CACA(GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CAGAG((GA)2,3CA) (GA)10A A)4CT(GA)4C (GA)2AAGACT(GA)3CA(GA)4CT(GA)3TACACT(GA)4CA(GA)2CT((GA)3,3CA)CT(GA)4CA(GA)3CTGAAAGACTCA(G ACT(GA C A)4CA(GA)4CT(GA)4GGCT(GA)3CT(GA)3TACTTAGG(GA)2CT(GA)4CA(GA)3CT(GA)3AAGACT((GA)4,3,3CT)(GA)3A )3GCACACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT(GA)3CT(GA)3CACTGATA((GA)2,5,3CT) TTG CA(GA)4CT(G (GA)5CACA(GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CACAGT(GA)2CA(GA)2AACA( (GA)15C A)4CT(GA)4C GA)2AAGACT(GA)3TACACT(GA)4CA(GA)2CT((GA)3,3CA)CA(GA)3CA(GA)3CTGAAAGACT(GA)4CA((GA)3,4CT)(GA T(GA)3G C )3AAGACT((GA)4,3CT)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA CGCCTT(GA)2TT((GA)3,5CT)(GA)3CACTGATA((GA)2,5,3CT) G (GA)2GT((GA)2,3CA)(GA)2AAGACT(GACA)2(GA)4CT(GA)2GGTACACT(GA)4CAGAGT((GA)2,3CA)GAGTAAGACTG CA(GA)4CTG ACA(GA)4CT(GA)3TACACT(GA)4CA(GA)2CT((GA)3,3CA)CT(GA)4CA(GA)3CTGAAAGACT(GA)4CA((GA)4,5,3CT)(G AAA(GA)2CT( A)3TACTTAGG(GA)2CT(GA)4CA(GA)3CT(GA)2CAGACT(GA)3AAGACT(GA)6CT(GA)3GGCT(GA)5CTGAGG(GA)2CA (GA)23C GA)4CACTGA (GA)2AT(GA)5CT(GA)3CAAGACA(GA)2CT(GA)4T(GA)2CA((GA)4,4CT)AA T(GA)3G GT (GA)3CTGAGG((GA)2,4CT)GAAAGACT(GA)5CT(GA)4CAAGACA(GA)2CT(GA)4T(GA)2CA(GA)4CT(GA)3 CGCCG AAGACT(GA)3CA((GA)2,3,2CT)(GA)3AATAGACT(GA)3CT(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT TG ((GA)3,5CT)(GA)3CACTGATA((GA)3,5,3CT) (GA)4GT((GA)2,3CA)(GA)2AAGACTGACAGACA(GA)4CT(GA)2GGTACACT(GA)4CAGAGT((GA)2,3CA)GAGTAAGAC CA(GA)4CT(G TGACA(GA)4CT(GA)3TACACT(GA)4CA(GA)2CT(GA)3CAGC(GA)2CACT(GA)4CA(GA)3CTGAAAGACT(GA)4CA((GA (GA)25C A)4CT(GA)4C 
)4,5,3CT)(GA)3TACTTAGG(GA)2CT(GA)4CA(GA)3CT(GA)2CAGACT(GA)3AAGACT(GA)6CT(GA)3GGCT(GA)5CTGA T(GA)3G ACT GG(GA)2CC(GA)2AT(GA)5CT(GA)3CAAGACA(GA)2CT(GA)4T(GA)2CA((GA)4,4CT)AA(GA)3CTGAGG((GA)2,4CT)GA CGCCGAAGACT(GA)5CT(GA)4CAAGACA(GA)2CT(GA)4T(GA)2CA(GA)4CT(GA)3AAGACT(GA)3CA((GA)2,3,2CT)(GA)3AAT TG AGACT(GA)3CT(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT((GA)3,5CT)(GA)3CACTGATA ((GA)2,5,3CT)
SEQUENCE | LENGTH | LZ Size (Bytes)
Aona-DRB1*031701 | 462 | 7205
Aona-DRB1*0329GA | 446 | 7249
Aona-DRB3*0615 | 314 | 4333
Aona-DRB3*0627 | 350 | 4951
Aona-DRB3*0628 | 346 | 4855
Aona-DRB3*0626 | 312 | 4357
Aona-DRB3*062502 | 294 | 4310
Aovo-DRB3*0601 | 310 | 4343
Aona-DRB3*062501 | 314 | 4333
CA(GA)4CT(G (GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3CT)(GA)4CACT(GA)4CAGAGT(GA)2CA(GA)2AACA(GA)2AAGACT (GA)11A A)4CC(GA)5C (GA)3TACACT(GA)4CA(GA)2CT((GA)3,3CA)CT(GA)4CA(GA)3CTGAAAGACT(GA)4CA(GA)3CT(GA)3AAGACT((GA)4 ACT(GA ACA ,3CT)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT(GA)3CT(GA)4CACTGATA((GA)2,4,3CT) )3GCACTTG (GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,2CT)(GA)4CACT(GA)4CAGAGG((GA)2,3CA)(GA)2AAGACT(GA)3C CA(GA)4CT(G A(GA)3CT(GA)3 (GA)9A A)4CC(GA)5C TACACT(GA)4CA(GA)2CT(GA)3CAGG(GA)2CACT(GA)4CA(GA)3CTGAAAGACT(GA)4CA(GA)4CT((GA)5,3CT)(GA)3 ACT(GA ACA TACTTAGG(GA)2CT )3GCAC(GA)4CA(GA)3CT(GA)3AAGACTCT(GA)4CT(GA)3AACAGGGACT(GA)3CT(GA)3CACT(GA)3CT(GA)3CACAGATAGA TTG AATT(GA)3CT(GA)3CACTGATA ((GA)2,5,3CT)(GA)3CA(GA)2CT CA(GA)4CT(G (GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3CT)(GA)4CT(GA)4CACT(GA)4CAGAGT(GA)2CA(GA)2 (GA)13A A)4CC(GA)5C AACA(GA)2AAGACT(GA)2AATACACT(GA)4CA(GA)2CT((GA)3,3CA)CT(GA)4CA(GA)3CTGAAAGACT ACT(GA ACA (GA)4CA(GA)3CT(GA)3AAGACT((GA)4,3CT)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA )3GCAC(GA)2TT(GA)3CT(GA)3CACTGATA((GA)2,4,3CT) TTG CA(GA)4CT(G (GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,4CT)(GA)4CACT(GA)4CAGAGT(GA)2CA(GA)2AACA (GA)13A A)4CC(GA)7C (GA)2AAGACT(GA)2AATACACT(GA)4CA(GA)2CT((GA)3,3CA)CT(GA)4CA(GA)3CTGAAAGACT(GA)4CA ACT(GA ACA
(GA)3CT(GA)3AAGACT((GA)4,3CT)(GA)3AACA((GA)2,3CT)(GA)3CACT(GA)3CT(GA)3CACAGATA(GA)2TT )3GCAC(GA)3CT(GA)3CACTGATA((GA)2,4,3CT) TTG CA(GA)2CA(G (GA)5CA A)2CT(GA)4C (GA)7AT((GA)4,5,4CT)(GA)3TA(GA)2CTGAAAGAC(GA)2CA(GA)3AACT(GA)3AACATA(GA)2CT(GA)5CACA GAGC T (GA)4 GCG CA(GA)5CT(G (GA)6 A)4CT(GA)10 (GA)4TAGACA(GA)2CA((GA)3,3CT)GT((GA)4,4CT)GTGCAAGACC((GA)5,2CT)(GA)3TT(GA)4CA(GA)2CT (GC)3 CACA (GA)3GGCT(GA)3CTCG(GA)3CTGAGC(GA)3CT(GA)2TAGACT(GA)3AACT(GA)3TACT GA CAGCG CA(GA)5CT(G A)4CT(GA)10 (GA)4TAGACA(GA)2CA((GA)3,3CT)GT(GA)4CTGTGCAAGACC((GA)5,2CT)(GA)3TT(GA)4CA((GA)2,4CT) (GA)6 CACA (GA)3CTCG(GA)3CTGAGC(GA)3CT(GA)2TAGACT(GA)3AACT(GA)3TACT (GC)3GA CAGCG CA(GA)5CT(G (GA)4TAGACA(GA)2CA((GA)3,3CT)GT(GA)4CT(GA)4CTGTGCAAGACC((GA)5,2,4CT)(GA)3CTCG(GA)3 (GA)6 A)15CACA CTGAGC(GA)3CT(GA)2TAGACT(GA)3AACT(GA)3TACT (GC)3GACAGCG CA(GA)7CAC (GA)4TAGACA(GA)2CA((GA)3,3CT)GT((GA)4,4CT)GTGCAAGACC(GA)5TT(GA)2CT(GA)2TT(GA)4CA (GA)6 A ((GA)2,4CT)(GA)3CTCG(GA)3CTGAGC(GA)3CT(GA)2TAGACT(GA)3AACT(GA)3TACT (GC)3GACAGCG CA(GA)5CT(G A)4CT(GA)13 (GA)4TAGACA((GA)2,3CT)GT(GA)4CTGTGCAAGACC((GA)5,2CT)(GA)3TT(GA)4CA((GA)2,4CT)(GA)3 (GA)6 CACA CTCG(GA)3CTGAGC(GA)3CT(GA)2TAGACT(GA)3AACA(GA)2TACT (GC)3GA CAGCG (GA)5CACA(GA)2CACT(GA)4CA(GA)6CT(GA)4CAGT((GA)3,3,2CT)(GA)4CACT(GA)4CAGAGT((GA)2,3CA)(GA)2AAG GCTGACAGACA CA(GA)4CT(G (GA)4CT(GA)3TACACT(GA)4CA(GA)2CT(GA)3CAGACACTCAGACACT((GA)4,3CT)GT(GA)5CT(GA)4CA(GA)4CT(GA (GA)11C A)4CT(GA)4C )4CACT(GA)4CA T(GA)3G C (GA)3CTGAAAGACT(GA)3TACTTAGG(GA)2CT(GA)3AACA((GA)3,4CT)(GA)3AAGACT(GA)5CT(GA)3GGCT(GA)3CT CACGT GAGG(GA)2CA(GA)2AT(GA)5CT(GA)3TAAGACA(GA)2CT(GA)4T(GA)2CA(GA)4CT(GA)3AAGACT(GA)3CA((GA)3,3C G T)(GA)4CA((GA)2,3,3CT)(GA)3CACAGATA(GA)2TT((GA)3,5CT)GGGGGGG
SEQUENCE | LENGTH | LZ Size (Bytes)
Aona-DRB1*0328 | 414 | 6113
Aovo-DRB*W1802 | 160 | 2663
Aovo-DRB*W1801 | 144 | 2591
Aovo-DRB*W1803 | 154 | 2479
Aona-DRB*W1806 | 162 | 2729
Aona-DRB*W1808 | 168 | 2893
Aona-DRB*W8901 | 114 | 1833
Aovo-DRB1*0307 | 286 | 4096
Aovo-DRB1*0306 | 282 | 4102
Aovo-DRB1*0305 | 336 | 4282
Aovo-DRB1*0304 | 274 | 3949
CA(GA)2CA(G (GA)2GC A)2CT(GA)4C
(GA)3CAGACT(GA)5GT(GA)12G((GA)2,4CA)(GA)2GGCT(GA)3GCCA(GA)3CT(GA)2AAGACA CAGAG T CCA(GA)3GCA CA(GA)4CT (GA)6CT(GA)5CT(GA)15CT (GA)3GCGCGTG CA(GA)2CA(G (GA)5CA A)2CT(GA)4C (GA)7AT((GA)4,5,4CT)(GA)3TA(GA)2CTGAAAGAC(GA)2CA(GA)3AACT(GA)3AACATA(GA)2CT(GA)5CACA GAGC T (GA)4 GCG CA(GA)2CA(G (GA)5CA A)2CT(GA)4C (GA)7AT((GA)4,5,4CT)(GA)3TA(GA)2CTGAAAGAC(GA)2CA(GA)3AACT(GA)3AACATA(GA)2CT(GA)5CACA GAGC T (GA)4 GCG CA(GA)2CA(G (GA)5CA A)2CT(GA)4C (GA)7ATGAAA((GA)2,5,4CT)(GA)3TA(GA)2CTGAAAGAC(GA)2CA(GA)4CT(GA)3AACATA(GA)2CT(GA)5CACA GAGC T (GA)4 GCG CA(GA)2CA(G (GA)5CA A)2CT(GA)4C (GA)4AT((GA)4,4CT)(GA)3TA(GA)2CT(GA)3CAGACACA GAGC T (GA)4 GCG CA(GA)4CA(G (GA)5CA A)7CT(GA)7C (GA)7CT(GA)4AT((GA)4,5,4,6CT)(GA)6CTGAAAGAC(GA)2CA(GA)4CT(GA)3AACA(GA)3CT(GA)3GGGACACA GAGC T (GA)4 GCG CA(GA)2CA(G (GA)5CA A)3CC(GA)5C (GA)6CTGAAAGAC(GA)2CA(GA)4CT(GA)3GGGACACA GAGC T (GA)4 GCG (GA)5CA CA(GA)2CA(G A)3CC (GA)3GCGACT((GA)4,6CT)(GA)6CTGAAAGAC(GA)2CA(GA)4CT(GA)3GGGACACA GAGC (GA)4 GCG (GA)5CA CA(GA)2CA(G A)3CC ((GA)5,4CT)GC((GA)5,6CT)GAAAGAC(GA)2CA(GA)4CT(GA)3GGGACACACACA GAGC (GA)4 GCG (GA)2GC CA(GA)2CA(G A)2CA (GA)10AA(GA)7AGACA(GA)14G((GA)2,4CA)((GA)3,3CT)(GA)2AAGACA(GA)3GCCA CAGAG CCA(GA )3GCA (GA)2GC CA(GA)2CA(G A)2CA (GA)9AA(GA)7AGACA(GA)15G((GA)2,4CA)(GA)3CT(GA)2AAGACA(GA)3GCCA CAGAG CCA(GA )2GCA
SEQUENCE | LENGTH | LZ Size (Bytes)
Aovo-DRB*W8602 | 106 | 1513
Aovo-DRB*W8601 | 114 | 1528
Aona-DRB*W8501 | 90 | 1443
Aovo-DRB*W8502 | 84 | 1477
Aovo-DRB*W8501 | 68 | 1244
Aovo-DRB*W8801 | 156 | 1975
Aovo-DRB*W9001 | 74 | 1182
Aovo-DRB*W2901 | 112 | 1854
Aona-DRB*W2908 | 114 | 1833
Aona-DRB*W2910 | 114 | 1833
Aovo-DRB*W8701 | 66 | 734
Aovo-DRB*W4501 | 98 | 1593
(GA)2GC CA(GA)2CA(G A)2CA (GA)10AA(GA)7AGACA(GA)15G((GA)2,4CA)((GA)3,3CT)(GA)2AAGACA(GA)3GCCA CAGAG CCA(GA )3GCA CA(GA)2CA(G (GA)2GC A)2CA(GA)9A (GA)11AA(GA)7AGACA(GA)9GTG((GA)2,4CA)(GA)3GCCA(GA)3CT(GA)2AAGACA CAGAG A CCA(GA)3GCA (GA)3G CA(GA)3CA( (GA)4CT(GA)3GG(GA)2CA(GA)2CT(GA)4CA(GA)9CA(GA)2TACA(GA)4CT(GA)4CA(GA)2GC(GA)2GCCA(GA)3AA GAAA( GA)2TACA GA)30G GGCG (GA)7G CA(GA)3CA( GA)2TACA
(GA)4CT(GA)4CA(GA)2GC(GA)2GC(GA)2GC(GA)2GCCA GAAA( GA)37G GGCG (GA)10TTGACA(GA)2CA(GA)3CT(GA)3CGAGGCAAAGACT(GA)2CAGACGAGAAG(GA)4CA(GA)2TT(GA)3AAG ACT(GA)3CA(GA)4CT(GA)2AACT(GA)4CA(GA)2CT(GA)2CACT(GA)3CT(GA)3CC(GA)4C(GA)2CA(GA)4TA(GA)2 (GA)2G CA(GA)11CA( AAGACA(GA)4CT(GA)3CA(GA)2CAGACT(GA)4CT(GA)2CT(GA)4CTTA(GA)2CT(GA)3CTGAAAGACT(GA)5CT(G CCAGA GA)9CA A)3CACT(GA)4CT(GA)2GC(GA)2CA(GA)3CT(GA)5CCGAGG(GA)2CT(GA)2CAGAAACT(GA)3CT(GA)2AAGACT( GTGACGA)3CTAA(GA)2TACT(GA)5CT(GA)2CACTGATA(GA)2CT(GA)4CA(GA)2CT(GA)4TA(GA)4CTGAGCGACTAA(G ACT(GA A)2CACT(GA)3CT(GA)2CACTCATA(GA)2CT(GA)3CT(GA)2GGGACT(GA)3CTGACAGACACTGATA(GA)2CTCA( )16GCG GA)4CT(GA)3CACT(GA)4CT (GA)2CT(GA)3CT(GA)3CGAGGCAAAGACT(GA)2CAGAC(GA)2AG(GA)4CA(GA)2TT(GA)3AAGACT(GA)3CA(GA) (GA)2G 4CT(GA)2AACT(GA)4CA(GA)2CT(GA)2CACT(GA)3CT(GA)3CTGAAAGTAAGACT(GA)5CT(GA)2GT(GA)2CA(GA) CCAGA CA(GA)34CA( 3CA(GA)3GCCT(GA)3CAGACA(GA)2CA(GA)4CT(GA)2CT(GA)5TT(GA)4CT(GA)3CACT(GA)7CT(GA)3CC(GA)4C( GTGAC GA)13TTGAC GA)2CA(GA)3GATAAAGACA(GA)4CT(GA)3CA(GA)2CAGACT(GA)4CT(GA)2CT(GA)4CTTA(GA)2CT(GA)3CTGA ACT(GA A AAGACT(GA)3CACT(GA)4CT(GA)2GC(GA)2CA(GA)3CT(GA)5CCGAGG(GA)2CT(GA)2CAGAAACT(GA)3CT(GA) )15(GCG4CT(GA)3CTAA(GA)2TACT(GA)5CT(GA)2CACTGATA(GA)2CT(GA)4CA(GA)2CT(GA)4TA(GA)4CT(GA)3CTAA(G A)2GCG A)2CACT(GA)3CT(GA)2CACTCATA(GA)2CT(GA)2GGGACT(GA)3CTGACAGACACTGATA(GA)2CTCA(GA)4CT( CAAGC GA)3CACT(GA)4CT G (GA)16CA(GA (GA)5CT(GA)3GTCTCA(GA)2CT(GA)4CA(GA)2CT(GA)3CTGT(GA)5CT(GA)4C(GA)2GG(GA)2(CAGA)2(GA)2AAG (GA)2A )9CT(GACA)2 ACT(GA)2CT(GA)5CT(GA)4CTGAAAGACT(GA)3CT(GA)3CAAGACA(GA)2CT(GA)4TAAGACAGACT(GA)4CTGA AGG(G CT TG(GA)4CT(GA)9GG A)3GGGCG Caja-DRB01 Caja-DRB05 Caja-DRB02 Caja- Caja- Aona Aovo DRB06 DRB03 DRB*W3002 DRB*W3001 212 554 434 130 158 118 116 1623 1527 Figure S1. 
Aotus MHC-DRB Exon 2 + Intron 2 (partial)

                     * 20 * 40 * 60
                     TTCGTGTCCCCACAGCACGTTTCtTg C G TA g TGAGTGT ATTTC TCAAcGGGACGGAGC
Aona-DRB1*0328GB   : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB1*0329GA   : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB1*031701GA : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB1*0304GA   : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB1*0305GA   : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB1*0306GA   : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB1*0307GA   : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aona-DRB3*0615     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB3*0627     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB3*062501   : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB3*0626     : TTCGTGTCCCCACAGCACGTTTCTTTGAGCTGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB3*062502   : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB3*0628     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB3*0601     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aona-DRB*W8901     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGACTAAGAGTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aona-DRB*W1808     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGGTTAAGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aona-DRB*W1806     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGTTAAGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB*W1801     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGCTAAGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB*W1802     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGCTAAGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB*W1803     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGGTAAATCTGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB*W8801     : TTCGTGTCCCCACAGCACGTTTCTTGGAACAGGTTAAGGATGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aovo-DRB*W2901     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGACTAAGAGTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aona-DRB*W2908     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGACTAAGAGTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aona-DRB*W2910     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGACTAAGAGTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aovo-DRB*W3001     : TTCGTGTCCCCACAGCACGTTTCCTGGAGCAGGTTAAGTATGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aona-DRB*W3002     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGTTAAGTATGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aovo-DRB*W9201     : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB*W9202     : TTCGTGTCCCCACAGCACGTTTCTTGTTCCAGACTACGTCTGAGTGTTATTTCTTCAACGGGACGGAGC : 69
Aona-DRB*W9101     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGCTAAGTGTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aovo-DRB*W9101     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGCTAAGTGTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aovo-DRB*W9102     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGCTAAGGGTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aovo-DRB*W9001     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGGTAAGTCTGAGTGTCATTTCCTCAACGGGACGGAGC : 69
Aovo-DRB*W4501     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCAGGTTAAGCATGAGTGTCATTTCTTCAACGGGACGGAGC : 69
Aovo-DRB*W9301     : TTCGTGTCCCCACAGCACGTTTCTTGGAGCTGATTAAGTTTGAGTGTCATTTCTTCAATGGGACGGAGC : 69

                     * 80 * 100 * 120 * 1
                     GGGTGCgGt cCTGgA AGAtactT tATAACCaGgAGGAG gtGCGCTTCgACAGCGACGTGGGGG
Aona-DRB1*0328GB   : GGGTGCGGTTCCTGGACAGATACTTCTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB1*0329GA   : GGGTGCGGTACCTGGACAGATACTTTTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB1*031701GA : GGGTGCGGTACCTGGACAGATACTTTTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB1*0304GA   : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGAAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB1*0305GA   : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGGAGGAGTTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB1*0306GA   : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGGAGGAGTTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB1*0307GA   : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGGAGGAGTTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB3*0615     : GGGTGCGGTACCTGGACAGATACATCCATAACCAGGAGGAGGTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB3*0627     : GGGTGCGGTACCTGGACAGATACATCCATAACCAGGAGGAGGTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB3*062501   : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGAAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB3*0626     : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGAAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB3*062502   : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGAAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB3*0628     : GGGTGCGGTACCTGGACAGATACCTTTATAACCAGAAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB3*0601     : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGAAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB*W8901     : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB*W1808     : GGGTGCAGTTCCTGGAAAGATACTTTCATAACCAGGAGGAGTTGGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB*W1806     : GGGTGCAGTTCCTGGAAAGATACTTTCATAACCAGGAGGAGTTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W1801     : GGGTGCAGTTCCTGGAAAGATACTTTTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W1802     : GGGTGCAGTTCCTGGAAAGATACTTTTATAACCAGGAGGAGTATGTGCGCTTCAACAGCGACGTGGGGG : 138
Aovo-DRB*W1803     : GGGTGCAGTTCCTGGAAAGATACTTTTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W8801     : GGGTGCGGCTCCTGCAAAGATACTTCTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W2901     : GGGTGCGGCTCCTGCAAAGATACTTCTATAACCAGGAGGAGTATGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB*W2908     : GGGTGCGGCTCCTGCAAAGATACTTCTATAACCAGGAGGAGTATGCGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB*W2910     : GGGTGCGGCTCCTGCAAAGATACTTCTATAACCAGGAGGAGTATGCGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W3001     : GGGTGCGGTACCTGGAAAGACTCATCTATAACCGGGAGGAGTTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB*W3002     : GGGTGCGGTACCTGGAAAGATACTTCTATAACCGGGAGGAGTTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W9201     : GGGTGCTGTACCTGGACAGATACTTCTATAACCAGGAGGAGTTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W9202     : GGGTGCGGTACCTGGACAGATACTTCTATAACCAGGAGGAGGTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aona-DRB*W9101     : GGGTGCGGTTCCTGGAAAGATACTTCTATAACCAGGAGGAGGTCCTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W9101     : GGGTGCGGTTCCTGGAAAGATACTTCTATAACCAGGAGGAGGTCCTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W9102     : GGGTGCGGTTCCTGGAAAGATACTTCTATAACCAGGAGGAGGTCCTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W9001     : GGGTGCGGCTCCTGCAAAGATACTTCTATAACCAGGAGGAGGTCCTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W4501     : GGGTGCGGTTCCTGGACAGATACATCCATAACCAGGAGGAGGTCGTGCGCTTCGACAGCGACGTGGGGG : 138
Aovo-DRB*W9301     : GGGTGCGGTTTCTGGAAAGACAAATCCATAACCAGGAGGAGTATCTGCGCTTCGACAGCGACGTGGGGG : 138

                     1 40 * 160 * 180 * 200
                     AGTaCCGGGCGGTGACGGAGCTGGGgCGGCctg GC gAGtactggAACaGcCaGaAGGAc c TGG
Aona-DRB1*0328GB   : AGTACCGGGCGGTGACGGAGCTGGGGCGGCGGAGCGCAGAGTACTGGAACAGCCAGAAGGACTTCCTGG : 207
Aona-DRB1*0329GA   : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aona-DRB1*031701GA : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aovo-DRB1*0304GA   : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aovo-DRB1*0305GA   : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aovo-DRB1*0306GA   : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aovo-DRB1*0307GA   : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aona-DRB3*0615     : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aona-DRB3*0627     : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aona-DRB3*062501   : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aona-DRB3*0626     : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aona-DRB3*062502   : AGTACCGGGCGGTGACGGAGCTGGGCCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aona-DRB3*0628     : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207
Aovo-DRB3*0601     :
AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207 Aona-DRB*W8901 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCGGACCGCAGAGTACTGGAACAGCCAGAAGGACTACGTGG : 207 Aona-DRB*W1808 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCAAAGTACTGGAACGGTCAGCAGGACATCCTGG : 207 Aona-DRB*W1806 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCAAAGTACTGGAACGGTCAGCAGGACATCCTGG : 207 Aovo-DRB*W1801 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCAAAGTACTGGAACGGTCAGCAGGACATCCTGG : 207 Aovo-DRB*W1802 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCAAAGTACTGGAACGGTCAGCAGGACATCCTGG : 207 Aovo-DRB*W1803 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGATGCAAAGTACTGGAACGGTCAGCAGGACATCCTGG : 207 Aovo-DRB*W8801 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCAGAAGGACATCCTGG : 207 Aovo-DRB*W2901 : AGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCTGAAGGAACGCCTGG : 207 Aona-DRB*W2908 : AGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCTGAAGGAACGCCTGG : 207 Aona-DRB*W2910 : AGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCCGAGTACTGGAACAGCCTGAAGGAACGCCTGG : 207 Aovo-DRB*W3001 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGTTGCAGAGAAGCTCAACAGCCAGAAGGACATCCTGG : 207 Aona-DRB*W3002 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTAGCGCAGAGAAGCTCAACAGCCAGAAGGACATCCTGG : 207 Aovo-DRB*W9201 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGTTGCAGAGAAGCTCAACAGCCAGAAGGACATCCTGG : 207 Aovo-DRB*W9202 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGTTGCAGAGAAGCTCAACAGCCAGAAGGACATCCTGG : 207 Aona-DRB*W9101 : AGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTGAGGCCGAGTCCTGGAACAGCCAGAAGGACATCCTGG : 207 Aovo-DRB*W9101 : AGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTGAAGCAGAGAAGTACAACAGCCAGAAGGACTTCCTGG : 207 Aovo-DRB*W9102 : AGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTAGCGCAGAGAAGTACAACAGCCAGAAGGACATCCTGG : 207 Aovo-DRB*W9001 : AGTTCCGGGCGGTGACGGAGCTGGGGCGGCCTAGCGCAGAGAAGTTAAACAGCCAGAAGGAAAGCCTGG : 207 Aovo-DRB*W4501 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCGGAGTACTGGAACAGCCAGAAGGACATCCTGG : 207 Aovo-DRB*W9301 : AGTACCGGGCGGTGACGGAGCTGGGGCGGCCTGACGCGGAGTCCTGGAACAGCCAGAAGGACTTAATGG : 207 * 220 * 240 * 260 * AG a GCGGG C GGTgGACA 
CTaCTGcAgAcACAAcTACGGGGTTg TGAGAGCTTCACAGTGC Aona-DRB1*0328GB : AGGAGAGGCGGGCCTTGGTGGACACCTACTGTAGATACAACTACGGGGTTGCTGAGAGCTTCACAGTGC : 276 Aona-DRB1*0329GA : AGCGGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aona-DRB1*031701GA : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB1*0304GA : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB1*0305GA : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aovo-DRB1*0306GA : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB1*0307GA : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aona-DRB3*0615 : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aona-DRB3*0627 : AGCAGAAGCGGGGCCAGGTGGACAACTACTGCAGACACAACTACGGGGTTTTTGAGAGCTTCACAGTGC : 276 Aona-DRB3*062501 : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aona-DRB3*0626 : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aona-DRB3*062502 : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aona-DRB3*0628 : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aovo-DRB3*0601 : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aona-DRB*W8901 : AGCAGAAGCGGGGCCGGGTGGACAACTACTGCAGACACAATTACGGGGTTGCTGAGAGCTTCACAGTGC : 276 Aona-DRB*W1808 : AGCTCAAGCGGGGCCAGGTAGACAACTACTGCAGACACAACTACGGGGTTTTTGAGAGCTTCACAGTGC : 276 Aona-DRB*W1806 : AGCTGAAGCGGGGCCAGGTAGACAACTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W1801 : AGCTGAAGCGGGGCCAGGTAGACAACTACTGCAGACACAACTACGGGGTTTTTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W1802 : AGCTGAAGCGGGGCCAGGTAGACAACTACTGCAGACACAACTACGGGGTTTTTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W1803 : AGCTGAAGCGGGGCCAGGTAGACAACTACTGCAGACACAACTACGGGGTTTTTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W8801 : 
AGTATCTGCGGGCCGCGGTGGACAACTACTGCAGACACAACTACGGGGTTGCTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W2901 : AGTATCTGCGGGCCGCGGTGGACACCTGCTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aona-DRB*W2908 : AGTATCTGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aona-DRB*W2910 : AGTATCTGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTTTTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W3001 : AGGACAGGCGGGCCTCGGTGGACACCTACTGTAAACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aona-DRB*W3002 : AGGACAGGCGGGCCGCGGTGGACACCTACTGTAAACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W9201 : AGGACAGGCGGGCCTCGGTGGACACCTACTGTAAACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W9202 : AGGACAGGCGGGCCTCGGTGGACACCTACTGTAAACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aona-DRB*W9101 : AGACCAGGCGGGCCGCGGTGGACACCTTCTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W9101 : AGACCAGGCGGGCCGCGGTGGACACCTTCTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W9102 : AGACCAGGCGGGCCGCGGTGGACACCTTCTGCAGACACAACTACGGGGTTTTTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W9001 : AGTATCTGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGGTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W4501 : AGGACAAGCGGGCCTCGGTGGACACCTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 Aovo-DRB*W9301 : AGGACAGGCGGGCCGCGGTGGACACCTACTGCAGACACAACTACGGGGTTGTTGAGAGCTTCACAGTGC : 276 2 280 * 300 * 320 * 340 AGCGGAgAGGTGAGcGCGGCGGGgCGGGGCCTCCCTGTGAgCTGCcgaTCAGAGA gaga ga Aona-DRB1*0328GB : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB1*0329GA : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAAAGAGA : 345 Aona-DRB1*031701GA : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aovo-DRB1*0304GA : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aovo-DRB1*0305GA : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aovo-DRB1*0306GA : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aovo-DRB1*0307GA : 
AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB3*0615 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB3*0627 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB3*062501 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB3*0626 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB3*062502 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB3*0628 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aovo-DRB3*0601 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACTGAGAGAGA : 345 Aona-DRB*W8901 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----CAGAGACTGA : 341 Aona-DRB*W1808 : AGCGGAGAGGTGAGTGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----GAGAGACTGA : 341 Aona-DRB*W1806 : AGCGGAGAGGTGAGTGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----GAGAGACTGA : 341 Aovo-DRB*W1801 : AGCGGAAAGGTGAGTGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----GAGAGA---- : 337 Aovo-DRB*W1802 : AGCGGAAAGGTGAGTGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----GAGAGACTGA : 341 Aovo-DRB*W1803 : AGCGGAAAGGTGAGTGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----GAGAGACTGA : 341 Aovo-DRB*W8801 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGAGAGACAGAGAGAGA : 345 Aovo-DRB*W2901 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----CAGAGACTGA : 341 Aona-DRB*W2908 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----CAGAGACTGA : 341 Aona-DRB*W2910 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA----CAGAGACTGA : 341 Aovo-DRB*W3001 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCGAATCAGAGA----CAGAGACAGA : 341 Aona-DRB*W3002 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCGGATCAGAGA----CAGAGACAGA : 341 Aovo-DRB*W9201 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCGAATCAGAGA----CAGAGACAGA : 341 Aovo-DRB*W9202 : 
AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCGAGTCAGAGA----CAGAGACAGA : 341 Aona-DRB*W9101 : AGCGGAGAGGTGAGCGCGGCGGGACGGGGCCTCCCTGTGAACTGCCAATCAGAGA-------------- : 331 Aovo-DRB*W9101 : AGCGGAGAGGTGAGCGCGGCGGGACGGGGCCTCCCTGTGAACTGCCAATCAGAGA-------------- : 331 Aovo-DRB*W9102 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA-------------- : 331 Aovo-DRB*W9001 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA-------------- : 331 Aovo-DRB*W4501 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCGAATCAGAGA----CAGAGACTGA : 341 Aovo-DRB*W9301 : AGCGGAGAGGTGAGCGCGGCGGGGCGGGGCCTCCCTGTGAGCTGCCGATCAGAGA-------------- : 331 * 360 * 380 * 400 * gAga agagaGA gA Aona-DRB1*0328GB : CTGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 406 Aona-DRB1*0329GA : CTGAGAGAGACACTGAGTGAGAGTGAGACAGAGAGACAGAGAAAGACTGACAGACAGAGAGAGACTGAG : 414 Aona-DRB1*031701GA : CTGAGAGAGACACTGAGAGAGAGTGAGACAGAGAGACAGAGAAAGACTGACAGACAGAGAGAGACTGAG : 414 Aovo-DRB1*0304GA : CCGAGAGAGA----------------GACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 396 Aovo-DRB1*0305GA : CCGAGAGAGA----------------GACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 396 Aovo-DRB1*0306GA : CCGAGAGAGA----------------GACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 396 Aovo-DRB1*0307GA : CCGAGAGAGA------------GAGAGACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 400 Aona-DRB3*0615 : CTGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 406 Aona-DRB3*0627 : CTGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 406 Aona-DRB3*062501 : CTGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 406 Aona-DRB3*0626 : CTGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAG--AGAGAGACTGAG : 404 Aona-DRB3*062502 : CCGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 406 Aona-DRB3*0628 : CTGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAGAGAGAGAGACTGAG : 406 Aovo-DRB3*0601 : CTGAGAGAGAC------CGAGAGAGAGACACAGAGACACTGAGAGA--GACAGAG-GAGAGAGACTGAG : 405 
Aona-DRB*W8901 : GA--------------------------------------------------GAGACTGAGAGAGAGAG : 360 Aona-DRB*W1808 : GAGAGACTGAG----------------------AGAGAGAGAGAGA------GAGACACAGAGAGAGAT : 382 Aona-DRB*W1806 : GAGAGACTGAG----------------------AGAGAGAGAGAGA------GAGACACAGAGAGAGAT : 382 Aovo-DRB*W1801 : ----------------------------------------------------GAGACACAGAGAGAGAT : 354 Aovo-DRB*W1802 : GAGAGACTGAG----------------------AGAGAG----AGA------GAGAGAGAGAGAGACAC : 378 Aovo-DRB*W1803 : GAGAGAGAGAG----------------------AGAGAGAGAGAGA------GAGACACAGAGAGAGAT : 382 Aovo-DRB*W8801 : GA--------------------------------------------------GAGACTGAGAGAGAGAG : 364 Aovo-DRB*W2901 : GA--------------------------------------------------GAGACTGAGAGAGAGAG : 360 Aona-DRB*W2908 : GA--------------------------------------------------GAGACTGAGAGAGAGAG : 360 Aona-DRB*W2910 : GA--------------------------------------------------GAGACTGAGAGAGAGAG : 360 Aovo-DRB*W3001 : GA--------------------------------------------------GAGAGAGAGAGAGAGAA : 360 Aona-DRB*W3002 : GA--------------------------------------------------GAGAGAGAGAGAGAAAG : 360 Aovo-DRB*W9201 : GA--------------------------------------------------GAGAGAGAGAGAGAGAA : 360 Aovo-DRB*W9202 : GA--------------------------------------------------GAGAGAGAGAGAGAAAG : 360 Aona-DRB*W9101 : ----------------------------------------------------CAGAGAGACCGAGAGAG : 348 Aovo-DRB*W9101 : ----------------------------------------------------CAGAGAGACCGAGAGAG : 348 Aovo-DRB*W9102 : ----------------------------------------------------CAGAGAGACCGAGAGAG : 348 Aovo-DRB*W9001 : ----------------------------------------------------CAGAGACTGAGAGAGAC : 348 Aovo-DRB*W4501 : GA--------------------------------------------------GAGACTGAGAGACAGAC : 360 Aovo-DRB*W9301 : ----------------------------------------------------GAGACTGAGAGAGAGAG : 348 3 420 * 440 * 460 * 480 aga Aona-DRB1*0328GB : AGA-GACAGTGA--GAGA------------------------------CTGAGAGACTGAGACTGAGAG : 442 Aona-DRB1*0329GA 
: AGG-TACACTGA--GAGAGACAGAGTGAGACAGAGAGACAGAGTAAGACTGACAGAGAGAGACTGAGAG : 480 Aona-DRB1*031701GA : AGG-TACACTGA--GAGAGACAGAGTGAGACAGAGAGACAGAGTAAGACTGACAGAGAGAGACTGAGAG : 480 Aovo-DRB1*0304GA : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGAC------ : 430 Aovo-DRB1*0305GA : AGA-GACAGTGA--GAGA------------------------------CTGAGAGACTGAGACTGAGAG : 432 Aovo-DRB1*0306GA : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 436 Aovo-DRB1*0307GA : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 440 Aona-DRB3*0615 : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 446 Aona-DRB3*0627 : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 446 Aona-DRB3*062501 : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 446 Aona-DRB3*0626 : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 444 Aona-DRB3*062502 : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 446 Aona-DRB3*0628 : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 446 Aovo-DRB3*0601 : AGA-GACAGTGA--GAGACT--------------------------GAGAGACTGAGAGAGACTGAGAG : 445 Aona-DRB*W8901 : AGA------------------------------------------------------------------ : 363 Aona-DRB*W1808 : AGAC--AGAGACAGA--GAG------------------------------------------------- : 398 Aona-DRB*W1806 : AGAC--AGAGACAGA--GAG------------------------------------------------- : 398 Aovo-DRB*W1801 : AGAC--AGAGACAGA--GAG------------------------------------------------- : 370 Aovo-DRB*W1802 : AGAG--AGAGATAGACAGAG------------------------------------------------- : 396 Aovo-DRB*W1803 : AGAC--AGAGACAGA--GAG------------------------------------------------- : 398 Aovo-DRB*W8801 : AGAC--TGAGAGAGAGAGAG------------------------------------------------- : 382 Aovo-DRB*W2901 : AGA------------------------------------------------------------------ : 363 Aona-DRB*W2908 : 
AGA------------------------------------------------------------------ : 363 Aona-DRB*W2910 : AGA------------------------------------------------------------------ : 363 Aovo-DRB*W3001 : AGA------------------------------------------------------------------ : 363 Aona-DRB*W3002 : AGA------------------------------------------------------------------ : 363 Aovo-DRB*W9201 : AGA------------------------------------------------------------------ : 363 Aovo-DRB*W9202 : AGA------------------------------------------------------------------ : 363 Aona-DRB*W9101 : AGA------------------------------------------------------------------ : 351 Aovo-DRB*W9101 : A-------------------------------------------------------------------- : 349 Aovo-DRB*W9102 : CGA------------------------------------------------------------------ : 351 Aovo-DRB*W9001 : TGA------------------------------------------------------------------ : 351 Aovo-DRB*W4501 : TGA------------------------------------------------------------------ : 363 Aovo-DRB*W9301 : ACT------------------------------------------------------------------ : 351 * 500 * 520 * 540 * gagaga a Aona-DRB1*0328GB : AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAGACA--GAGAAAGGCTGACAGACAGAGA : 499 Aona-DRB1*0329GA : ATAC------ACTGAGAGAGACAGAGACTGAGAGACAGAGAGACACTGAGAGAGACAGAGAGACTGAAA : 543 Aona-DRB1*031701GA : ATAC------ACTGAGAGAGACAGAGACTGAGAGACAGCGAGACACTGAGAGAGACAGAGAGACTGAAA : 543 Aovo-DRB1*0304GA : ----------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 470 Aovo-DRB1*0305GA : AGAC------ACTGAGAGAGACAGAGG----GAGACAGAGAGACA--GAGAAAGACTGAGAGACAGAGA : 489 Aovo-DRB1*0306GA : AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 480 Aovo-DRB1*0307GA : AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 484 Aona-DRB3*0615 : AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 490 Aona-DRB3*0627 : AGAC------ACTGAGAGAGACAGAGG----GAGACAGAGAGACA--GAGAAAGACTGAGAGACAGAGA : 503 Aona-DRB3*062501 : 
AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 490 Aona-DRB3*0626 : AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 488 Aona-DRB3*062502 : AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 490 Aona-DRB3*0628 : AGAC------ACTGAGAGAGACAGAGG----GAGACAGAGAGACA--GAGAAAGACTGAGAGACAG--- : 500 Aovo-DRB3*0601 : AGAC------ACTGAGAGAGACAGAGT----GAGACAGAGAAACA--GAGAAAGAC------------- : 489 Aona-DRB*W8901 : -------------ATGAGAGAGACTGA------------------------------------------ : 377 Aona-DRB*W1808 : --ACTGAGAGACTGTGAGAGAGACTGA------------------------------------------ : 423 Aona-DRB*W1806 : --ACTGAGAGACTGTGAGAGAGACTG------------------------------------------- : 422 Aovo-DRB*W1801 : --ACTGAGAGACTGTGAGAGAGACTGA------------------------------------------ : 395 Aovo-DRB*W1802 : --ACTGAGAGACTGTGAGAGAGACTG------------------------------------------- : 420 Aovo-DRB*W1803 : --ACTGAGAGACTGTGAGAGAGACTG------------------------------------------- : 422 Aovo-DRB*W8801 : --ACTGAGAGAGAATGAGAGAGACTGA------------------------------------------ : 407 Aovo-DRB*W2901 : -------------ATGAAAGAGACTGA------------------------------------------ : 377 Aona-DRB*W2908 : -------------ATGAGAGAGACTGA------------------------------------------ : 377 Aona-DRB*W2910 : -------------ATGAGAGAGACTGA------------------------------------------ : 377 Aovo-DRB*W3001 : -------------GAGAGAGAGAG--------------------------------------------- : 374 Aona-DRB*W3002 : -------------GAGAGAGAGAG--------------------------------------------- : 374 Aovo-DRB*W9201 : -------------GAGAGAGAGAG--------------------------------------------- : 374 Aovo-DRB*W9202 : -------------GAGAGAGAGAA--------------------------------------------- : 374 Aona-DRB*W9101 : -------------CTGAGAGAGACTGC------------------------------------------ : 365 Aovo-DRB*W9101 : --------------------------------------------------------------------- : - Aovo-DRB*W9102 : 
-------------CTGAGAGAGACTGA------------------------------------------ : 365 Aovo-DRB*W9001 : -------------GAGA---------------------------------------------------- : 355 Aovo-DRB*W4501 : -------------GAGAGAGAGTG--------------------------------------------- : 374 Aovo-DRB*W9301 : -------------GAGAG--------------------------------------------------- : 356 4 560 * 580 * 600 * 620 Aona-DRB1*0328GB : GAGACTGAGAGATACACTGAGAG----AGACAGAGACTGAGAGAC----AGACACT----------GAG : 550 Aona-DRB1*0329GA : GA--CTGAGAGAGACAGAGAGAGACTGAGAGAGAGACTGAGAGACTGAGAGATACTTAGGGAGACTGAG : 610 Aona-DRB1*031701GA : GA--CTGAGAGAGACAGAGAGAGACTGAGAGAGAGACTGAGAGACTGAGAGATACTTAGGGAGACTGAG : 610 Aovo-DRB1*0304GA : -----TGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACACT----------GAG : 520 Aovo-DRB1*0305GA : GA--CTGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGGGAGACACT----------GAG : 542 Aovo-DRB1*0306GA : -----TGAGAAATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACACT----------GAG : 530 Aovo-DRB1*0307GA : -----TGAGAAATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACACT----------GAG : 534 Aona-DRB3*0615 : -----TGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACAC------------AG : 538 Aona-DRB3*0627 : GAGACTGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACACT----------GAG : 558 Aona-DRB3*062501 : -----TGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACAC------------AG : 538 Aona-DRB3*0626 : -----TGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACAC------------AG : 536 Aona-DRB3*062502 : -----TGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACAC------------AG : 538 Aona-DRB3*0628 : -AGACTGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACACT----------GAG : 554 Aovo-DRB3*0601 : -----TGAGAGATACACTGAGAG----AGACAGAGACTGAGAGACAGAGAGACAC------------AG : 537 Aona-DRB*W8901 : ---------------GA----------------GAGAGACTGAGAGAGACTGAGA-------------- : 401 Aona-DRB*W1808 : ---------------GAGAGACT------GTGCAAGACCGAGAGAGAGACTGAGA-------------- : 457 Aona-DRB*W1806 : ------------------------------TGCAAGACCGAGAGAGAGACTGAGA-------------- : 447 Aovo-DRB*W1801 : 
---------------GAGAGACT------GTGCAAGACCGAGAGAGAGATTGAGA-------------- : 429 Aovo-DRB*W1802 : ------------------------------TGCAAGACCGAGAGAGAGACTGAGA-------------- : 445 Aovo-DRB*W1803 : ------------------------------------------AGAGAGACTGTG--------------- : 434 Aovo-DRB*W8801 : ---------------GA----------------GAGAGACTGAGAGAGACTGAGA-------------- : 431 Aovo-DRB*W2901 : ---------------GA----------------GAGAGACTGAGAGAGACTGAGA-------------- : 401 Aona-DRB*W2908 : ---------------GA----------------GAGAGACTGAGAGAGACTGAGA-------------- : 401 Aona-DRB*W2910 : ---------------GA----------------GAGAGACTGAGAGAGACTGAGA-------------- : 401 Aovo-DRB*W3001 : ---------------------------------AAGACAGAGAGAGAGAGAGAGA-------------- : 396 Aona-DRB*W3002 : ----------------A----------------GAGAGAAAGAGAGAGAGAGAGAA------------- : 398 Aovo-DRB*W9201 : ---------------------------------AAGACAGAGAGAGAGAGAGAGA-------------- : 396 Aovo-DRB*W9202 : ---------------------------------GACAGAGAGAGAGAGAGAGAGA-------------- : 396 Aona-DRB*W9101 : ---------------------------------------GAGAGAGA---------------------- : 373 Aovo-DRB*W9101 : --------------------------------------------------------------------- : - Aovo-DRB*W9102 : ---------------------------------------GAGAGAGA---------------------- : 373 Aovo-DRB*W9001 : --------------------------------------------------------------------- : - Aovo-DRB*W4501 : --------------------------------------AGAGAGAGAGAGAGAGA-------------- : 391 Aovo-DRB*W9301 : --------------------------------------------------------------------- : - * 640 * 660 * 680 * agaga a Aona-DRB1*0328GB : AGAGACTGAGAGACTG--------TGAGAGAGAGACTGAGAGAGACAGA----GAGAGACTGAGAGAGA : 607 Aona-DRB1*0329GA : AGAGACAGAGAGACTGAGACAGACTGAGAGAAAGACTGAGAGAGAGAGACTGAGAGAGGCTGAGAGAGA : 679 Aona-DRB1*031701GA : AGAGACAGAGAGACTGAGACAGACTGAGAGAAAGACTGAGAGAGAGAGACTGAGAGAGGCTGAGAGAGA : 679 Aovo-DRB1*0304GA : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAAA : 569 Aovo-DRB1*0305GA : 
AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA----GAGAGACTGAGAGAGA : 593 Aovo-DRB1*0306GA : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAAA : 579 Aovo-DRB1*0307GA : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAAA : 583 Aona-DRB3*0615 : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAGA : 587 Aona-DRB3*0627 : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA----GAGAGACTGAGAGAGA : 609 Aona-DRB3*062501 : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAGA : 587 Aona-DRB3*0626 : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAGA : 585 Aona-DRB3*062502 : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAGA : 587 Aona-DRB3*0628 : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA----GAGAGACTGAGAGAGA : 605 Aovo-DRB3*0601 : AGAGACAGAGAGACTGA--------------AAGACTGAGAGAGACAGA------GAGACTGAGAGAGA : 586 Aona-DRB*W8901 : -----------------------------------GATAGAGACTGA---------------------- : 413 Aona-DRB*W1808 : -----------------------------------CTGAGAGATTGAGA------------GAGACAGA : 479 Aona-DRB*W1806 : -----------------------------------CTGAGAGATTGAGA------------GAGACAGA : 469 Aovo-DRB*W1801 : -----------------------------------CTGAGAGATTGAGA------------GAGACAGA : 451 Aovo-DRB*W1802 : -----------------------------------CTGAGAGATTGAGA------------GAGACAGA : 467 Aovo-DRB*W1803 : ------------------------------------CAAGACCGAGAGA------------GAGACTGA : 455 Aovo-DRB*W8801 : -----------------------------------GAGAGAGACTGAGA------------GAGAGAGA : 453 Aovo-DRB*W2901 : -----------------------------------GATAGAGACTGA---------------------- : 413 Aona-DRB*W2908 : -----------------------------------GATAGAGACTGA---------------------- : 413 Aona-DRB*W2910 : -----------------------------------GATAGAGACTGA---------------------- : 413 Aovo-DRB*W3001 : -----------------------------------GAGAGAGA--GA---------------------- : 406 Aona-DRB*W3002 : 
-----------------------------------GACAGAGAGAGA---------------------- : 410 Aovo-DRB*W9201 : -----------------------------------GAGAGAGA--GA---------------------- : 406 Aovo-DRB*W9202 : -----------------------------------GAGAGAGA--GA---------------------- : 406 Aona-DRB*W9101 : --------------------------------------------------------------------- : - Aovo-DRB*W9101 : --------------------------------------------------------------------- : - Aovo-DRB*W9102 : --------------------------------------------------------------------- : - Aovo-DRB*W9001 : --------------------------------------------------------------------- : - Aovo-DRB*W4501 : -----------------------------------GAGAGAGG-------------------------- : 399 Aovo-DRB*W9301 : --------------------------------------------------------------------- : - 5 700 * 720 * 740 * 76 Aona-DRB1*0328GB : CACTGAGAGAGACAGAGA--GACTGAAAGACTGAGAGATACTTAGGGAGACTGAGAGAAAC-------A : 667 Aona-DRB1*0329GA : GACTGAGGGAGACAGAGAATGAGAGAGAGACTGAGAGACAAG-ACAGAGACTGAGAGAGATGAGACAGA : 747 Aona-DRB1*031701GA : GACTGAGGGAGACCGAGAATGAGAGAGAGACTGAGAGACAAG-ACAGAGACTGAGAGAGATGAGACAGA : 747 Aovo-DRB1*0304GA : --------------------------------------------------------------------- : - Aovo-DRB1*0305GA : GACT----------------------------------------------------------------- : 597 Aovo-DRB1*0306GA : --------------------------------------------------------------------- : - Aovo-DRB1*0307GA : --------------------------------------------------------------------- : - Aona-DRB3*0615 : --------------------------------------------------------------------- : - Aona-DRB3*0627 : GGCT----------------------------------------------------------------- : 613 Aona-DRB3*062501 : --------------------------------------------------------------------- : - Aona-DRB3*0626 : --------------------------------------------------------------------- : - Aona-DRB3*062502 : --------------------------------------------------------------------- : - Aona-DRB3*0628 : 
GACT----------------------------------------------------------------- : 609 Aovo-DRB3*0601 : --------------------------------------------------------------------- : - Aona-DRB*W8901 : --------------------------------------------------------------------- : - Aona-DRB*W1808 : --------------------------------------------------------------------- : - Aona-DRB*W1806 : --------------------------------------------------------------------- : - Aovo-DRB*W1801 : --------------------------------------------------------------------- : - Aovo-DRB*W1802 : --------------------------------------------------------------------- : - Aovo-DRB*W1803 : --------------------------------------------------------------------- : - Aovo-DRB*W8801 : --------------------------------------------------------------------- : - Aovo-DRB*W2901 : --------------------------------------------------------------------- : - Aona-DRB*W2908 : --------------------------------------------------------------------- : - Aona-DRB*W2910 : --------------------------------------------------------------------- : - Aovo-DRB*W3001 : --------------------------------------------------------------------- : - Aona-DRB*W3002 : --------------------------------------------------------------------- : - Aovo-DRB*W9201 : --------------------------------------------------------------------- : - Aovo-DRB*W9202 : --------------------------------------------------------------------- : - Aona-DRB*W9101 : --------------------------------------------------------------------- : - Aovo-DRB*W9101 : --------------------------------------------------------------------- : - Aovo-DRB*W9102 : --------------------------------------------------------------------- : - Aovo-DRB*W9001 : --------------------------------------------------------------------- : - Aovo-DRB*W4501 : --------------------------------------------------------------------- : - Aovo-DRB*W9301 : --------------------------------------------------------------------- : - 0 * 780 * 
800 * 820
Aona-DRB1*0328GB : GAGAGACTGAGAGAGACTGAGAGAAAGACTGAGAGAGAGACTGAGAGAGGCTGAGAGACTGAGGGAGA- : 735
Aona-DRB1*0329GA : GAGAGACTGAGAGAGACTAAGAGA--GACTGAGGGAGA--CTGAGAGAGACTGAAAGACTGAGAGAGAG : 812
Aona-DRB1*031701GA : GAGAGACTGAGAGAGACTAAGAGA--GACTGAGGGAGA--CTGAGAGAGACTGAAAGACTGAGAGAGAG : 812
Aovo-DRB1*0304GA : ----------------------GA----------------CTGAGAGAGACTGAGAGACTGAGAGAAA- : 599
Aovo-DRB1*0305GA : ----------GAGAGACTGAGAGA--TACTTAGGGAGA--CTGAGAGAGACAGAGAGACTGAGAGAGA- : 651
Aovo-DRB1*0306GA : ----------------------GA----------------CTGAGAGAGACTGAGAGACTGAGAGAAA- : 609
Aovo-DRB1*0307GA : ----------------------GA----------------CTGAGAGAGACTGAGAGACTGAGAGAAA- : 613
Aona-DRB3*0615 : ----------------CTGAGAGA----------AAGA--CTGAGAGAGACTGAGAGACTGAGAGAAA- : 627
Aona-DRB3*0627 : ----------GAGAGACTGAGAGA--TACTTAGGGAGA--CTGAGAGAGACAGAGAGACTGAGAGAAAG : 668
Aona-DRB3*062501 : ----------------CTGAGAGA----------AAGA--CTGAGAGAGACTGAGAGACTGAGAGAAA- : 627
Aona-DRB3*0626 : ----------------CTGAGAGA----------AAGA--CTGAGAGAGACTGAGAGACTGAGAGAAA- : 625
Aona-DRB3*062502 : ----------------CTGAAAGA----------------CTGAGAGAGACTGAGAGACTGAGAGAAA- : 623
Aona-DRB3*0628 : ----------GAGAGACTGAGAGA--TACTTAGGGAGA--CTGAGAGAGACAGAGAGACTGAGAGAAAG : 664
Aovo-DRB3*0601 : ----------------CTGAGAGA----------AAGA--CTGAGAGAGACTGAGAGACTGAGAGAAA- : 626
Aona-DRB*W8901 : ---------------------------------------------------------------AAGAC- : 418
Aona-DRB*W1808 : -----------------------------------------------------------GACTGAGA-- : 487
Aona-DRB*W1806 : -----------------------------------------------------------GACTGAGA-- : 477
Aovo-DRB*W1801 : -----------------------------------------------------------GACTGAGA-- : 459
Aovo-DRB*W1802 : -----------------------------------------------------------GACTGAGA-- : 475
Aovo-DRB*W1803 : -----------------------------------------------------------GACTGAGA-- : 463
Aovo-DRB*W8801 : -----------------------------------------------------------CTGAAAGAC- : 462
Aovo-DRB*W2901 : ---------------------------------------------------------------AAGAC- : 418
Aona-DRB*W2908 : ---------------------------------------------------------------AAGAC- : 418
Aona-DRB*W2910 : ---------------------------------------------------------------AAGAC- : 418
Aovo-DRB*W3001 : ---------------------------------------------------------------GAGAG- : 411
Aona-DRB*W3002 : ---------------------------------------------------------------GAG--- : 413
Aovo-DRB*W9201 : ---------------------------------------------------------------GAG--- : 409
Aovo-DRB*W9202 : ---------------------------------------------------------------GAG--- : 409
Aona-DRB*W9101 : --------------------------------------------------------------------- : -
Aovo-DRB*W9101 : --------------------------------------------------------------------- : -
Aovo-DRB*W9102 : --------------------------------------------------------------------- : -
Aovo-DRB*W9001 : --------------------------------------------------------------------- : -
Aovo-DRB*W4501 : --------------------------------------------------------------------- : -
Aovo-DRB*W9301 : --------------------------------------------------------------------- : -

* 840 * 860 * 880 *
Aona-DRB1*0328GB : ----------------CAGAGAATGAGAGAGA---GACTGAGAGA---TAAGACAGAGACTGAGAGAGA : 782
Aona-DRB1*0329GA : ACTGAGAGAGACAAGACAGAGACTGAGAGAGATGAGACAGAGAGAGACTGAGAGAAAGACTGAGAGACA : 881
Aona-DRB1*031701GA : ACTGAGAGAGACAAGACAGAGACTGAGAGAGATGAGACAGAGAGAGACTGAGAGAAAGACTGAGAGACA : 881
Aovo-DRB1*0304GA : --------------------------------------------------------------------- : -
Aovo-DRB1*0305GA : -----------------------------------------------CTGAGAGAAAGACTGAGAGA-- : 671
Aovo-DRB1*0306GA : --------------------------------------------------------------------- : -
Aovo-DRB1*0307GA : --------------------------------------------------------------------- : -
Aona-DRB3*0615 : --------------------------------------------------------------------- : -
Aona-DRB3*0627 : ACT------------------------------------------------GAGAGAGACTGAGAGACT : 689
Aona-DRB3*062501 : --------------------------------------------------------------------- : -
Aona-DRB3*0626 : --------------------------------------------------------------------- : -
Aona-DRB3*062502 : --------------------------------------------------------------------- : -
Aona-DRB3*0628 : ACT------------------------------------------------GAGAGAGACTGAGAGACT : 685
Aovo-DRB3*0601 : --------------------------------------------------------------------- : -
Aona-DRB*W8901 : --------------------------------------------------------------------- : -
Aona-DRB*W1808 : --------------------------------------------------------------------- : -
Aona-DRB*W1806 : --------------------------------------------------------------------- : -
Aovo-DRB*W1801 : --------------------------------------------------------------------- : -
Aovo-DRB*W1802 : --------------------------------------------------------------------- : -
Aovo-DRB*W1803 : --------------------------------------------------------------------- : -
Aovo-DRB*W8801 : --------------------------------------------------------------------- : -
Aovo-DRB*W2901 : --------------------------------------------------------------------- : -
Aona-DRB*W2908 : --------------------------------------------------------------------- : -
Aona-DRB*W2910 : --------------------------------------------------------------------- : -
Aovo-DRB*W3001 : --------------------------------------------------------------------- : -
Aona-DRB*W3002 : --------------------------------------------------------------------- : -
Aovo-DRB*W9201 : --------------------------------------------------------------------- : -
Aovo-DRB*W9202 : --------------------------------------------------------------------- : -
Aona-DRB*W9101 : --------------------------------------------------------------------- : -
Aovo-DRB*W9101 : --------------------------------------------------------------------- : -
Aovo-DRB*W9102 : --------------------------------------------------------------------- : -
Aovo-DRB*W9001 : --------------------------------------------------------------------- : -
Aovo-DRB*W4501 : --------------------------------------------------------------------- : -
Aovo-DRB*W9301 : --------------------------------------------------------------------- : -

900 * 920 * 940 * 960
Aona-DRB1*0328GB : T-GAGACAGAGAGA----GACTGA--GAGAAAGACTGAGAGACAGAGA----CTGAGAGACTGAGAGA- : 839
Aona-DRB1*0329GA : --GAGACTGAGAGACTGAGACTGAGAGAAATAGACTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 948
Aona-DRB1*031701GA : --GAGACTGAGAGACTGAGACTGAGAGAAATAGACTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 948
Aovo-DRB1*0304GA : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 640
Aovo-DRB1*0305GA : ----GACTGAGAGA------------AACAGGGACTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 724
Aovo-DRB1*0306GA : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 650
Aovo-DRB1*0307GA : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 654
Aona-DRB3*0615 : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 668
Aona-DRB3*0627 : GAGAGACTGAGAGA------------AACAGAGACTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 746
Aona-DRB3*062501 : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 668
Aona-DRB3*0626 : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 666
Aona-DRB3*062502 : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 664
Aona-DRB3*0628 : GAGAGACTGAGAGA------------AACAGAGACTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 742
Aovo-DRB3*0601 : ------CAGAGA----------------------CTGAGAGACTGAGAGACACTGAGAGACTGAGAGAC : 667
Aona-DRB*W8901 : ----------------------------------GAGACAG------------AG--------AGAAAC : 433
Aona-DRB*W1808 : ----------------------------------GAGGCTG------------AGAGACTCGGAGAGAC : 510
Aona-DRB*W1806 : ----------------------------------GAGACTG------------AGAGACTCGGAGAGAC : 500
Aovo-DRB*W1801 : ----------------------------------GAGACTG------------AGAGACTCGGAGAGAC : 482
Aovo-DRB*W1802 : ----------------------------------GAGACTG------------AGAGACTCGGAGAGAC : 498
Aovo-DRB*W1803 : ----------------------------------GAGACTG------------AGAGACTCGGAGAGAC : 486
Aovo-DRB*W8801 : ----------------------------------GAGACAG------------AG--------AGAGAC : 477
Aovo-DRB*W2901 : ----------------------------------GAGACAG------------AG--------AGAGAC : 433
Aona-DRB*W2908 : ----------------------------------GAGACAG------------AG--------AGAAAC : 433
Aona-DRB*W2910 : ----------------------------------GAGACAG------------AG--------AGAAAC : 433
Aovo-DRB*W3001 : ----------------------------------GAGACAG------------AG--------AGAGAC : 426
Aona-DRB*W3002 : -----------------------------------AGAGAG------------AGT-------GGAGAC : 428
Aovo-DRB*W9201 : ----------------------------------GAGACAG------------AG--------AGAGAC : 424
Aovo-DRB*W9202 : ----------------------------------GAGACAG------------AG--------AGAGAC : 424
Aona-DRB*W9101 : ----------------------------------GACTGAG------------AGAGA----GAGACTG : 392
Aovo-DRB*W9101 : ----------------------------------GACTGAG------------AGAGA----GAGACTG : 368
Aovo-DRB*W9102 : ----------------------------------GACTGAG------------AGAGA----GAGACTG : 392
Aovo-DRB*W9001 : ----------------------------------GAATGAG------------AGAGA----CTGAGAG : 374
Aovo-DRB*W4501 : -----------------------------------AGACAG------------AG--------AGAGAC : 413
Aovo-DRB*W9301 : --------------------------------------------------------------------- : -

* 980 * 1000 * 1020 *
Aona-DRB1*0328GB : ---GACAGAGACTGAGAGACTGAGAGA----CTGAGAGACACAGATAGAGATTGAGAGACTGAGAGAGA : 901
Aona-DRB1*0329GA : ACAGATAGAGATTGAGAGACTGAGAGAGAGACTGAGAGACACTGATAGAGACTGAGAGAGAGACTGAGA : 1017
Aona-DRB1*031701GA : ACAGATAGAGATTGAGAGACTGAGAGAGAGACTGAGAGACACTGATAGAGACTGAGAGAGAGACTGAGA : 1017
Aovo-DRB1*0304GA : ACAGATAGAGATTGAGAGACTGAGAGA------------CACTGATAGAGACTGAGAGA--GACTGAGA : 695
Aovo-DRB1*0305GA : ACAGATAGAAATTGAGAGACTGAGAGA------------CACTGATAGAGACTGAGAGAGAGACTGAGA : 781
Aovo-DRB1*0306GA : ACAGATAGAGATTGAGAGACTGAGAGA------------CACTGATAGAGACTGAGAGA--GACTGAGA : 705
Aovo-DRB1*0307GA : ACAGATAGAGATTGAGAGACTGAGAGA------------CACTGATAGAGACTGAGAGA--GACTGAGA : 709
Aona-DRB3*0615 : ACAGATAGAGATTGAGAGACTGAGAGAGAGACTGAGAGACACTGATAGAGACTGAGAGAGAGACTGAGA : 737
Aona-DRB3*0627 : ACAGATAGAGATTGAGAGACTGAGAGA------------CACTGATAGAGACTGAGAGAGAGACTGAGA : 803
Aona-DRB3*062501 : ACAGATAGAGATTGAGAGACTGAGAGAGAGACTGAGAGACACTGATAGAGACTGAGAGAGAGACTGAGA : 737
Aona-DRB3*0626 : ACAGATAGAGATTGAGAGACTGAGAGAGAGACTGAGAGACACTGATAGAGACTGAGAGAGAGACTGAGA : 735
Aona-DRB3*062502 : ACAGATAGAGATTGAGAGACTGAGAGAGAGACTGAGAGACACTGATAGAGACTGAGAGAGAGA------ : 727
Aona-DRB3*0628 : ACAGATAGAGATTGAGAGACTGAGAGA------------CACTGATAGAGACTGAGAGAGAGACTGAGA : 799
Aovo-DRB3*0601 : ACAGATAGAGATTGAGAGACTGAGAGAGAGACTGAGAGACACTGATAGAGACTGAGAGAGAGACTGAGA : 736
Aona-DRB*W8901 : TGAG--AGAAACAT--------------------AGA--GACTGAGAGAGAG--ACACA---------- : 466
Aona-DRB*W1808 : TGAGCGAGAGACTG--------------------AGATAGACTGAGAGAAACTGAGAGA---------- : 549
Aona-DRB*W1806 : TGAGCGAGAGACTG--------------------AGATAGACTGAGAGAAACTGAGAGA---------- : 539
Aovo-DRB*W1801 : TGAGCGAGAGACTG--------------------AGATAGACTGAGAGAAACTGAGAGA---------- : 521
Aovo-DRB*W1802 : TGAGCGAGAGACTG--------------------AGATAGACTGAGAGAAAC--AGAGA---------- : 535
Aovo-DRB*W1803 : TGAGCGAGAGACTG--------------------AGATAGACTGAGAGAAACTGAGAGA---------- : 525
Aovo-DRB*W8801 : TGAG--AGAAACAG--------------------AGA--GACTGAGAGAGGG--ACACA---------- : 510
Aovo-DRB*W2901 : TGAG--AGAAACAT--------------------AGA--GACTGAGAGAGAG--ACACA---------- : 466
Aona-DRB*W2908 : TGAG--AGAAACAT--------------------AGA--GACTGAGAGAGAG--ACACA---------- : 466
Aona-DRB*W2910 : TGAG--AGAAACAT--------------------AGA--GACTGAGAGAGAG--ACACA---------- : 466
Aovo-DRB*W3001 : --AG--AGAGACTG--------------------AGA--GACTGAGAAAGAC--AGAGA---------- : 457
Aona-DRB*W3002 : AGAG--AGAGACAG--------------------AGA--GACTGAGAAAGAC--AGAGA---------- : 461
Aovo-DRB*W9201 : --AG--AGAGACTG--------------------AGA--GACTGAGAAAGAC--AGAGA---------- : 455
Aovo-DRB*W9202 : --AG--AGAGACTG--------------------AGA--GACTGAGAAAGAC--AGAGA---------- : 455
Aona-DRB*W9101 : AAAGACGAGACA-------------------------GAGAGAGACTGAGAGAGGGACA---------- : 426
Aovo-DRB*W9101 : AAAGACGAGACA-------------------------GAGAGAGACTGAGAGAGGGACA---------- : 402
Aovo-DRB*W9102 : AAAGACGAGACA-------------------------GAGAGAGACTGAGAGAGGGACA---------- : 426
Aovo-DRB*W9001 : AGACTGAGAGAT----------------------------AGAGACTGAGAGACAGACA---------- : 405
Aovo-DRB*W4501 : --AG--AGAGGCTG--------------------AGA--GACTGAGAAAGAC--AGAGA---------- : 444
Aovo-DRB*W9301 : ------------------------------------AGAGACTGAGAGAGAGAGAGAGA---------- : 379

1040 * 1060 * 1080 * 1100
Aona-DRB1*0328GB : GACTGGGGGGG-GAGAGAGAGAGAGAGAGAGAGA--------------------CTGAGAGAGCACGTG : 949
Aona-DRB1*0329GA : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA----CTGAGAGAGCGCGTG : 1082
Aona-DRB1*031701GA : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGACTGAGAGAGCGCGTG : 1086
Aovo-DRB1*0304GA : GACTGAGAGAGAGAGAGAGAGAGAGAAA--------------------------CTGAGAGAGCACTTG : 738
Aovo-DRB1*0305GA : GACTGAGAGACAGAGAGAGAGAGAGAGAGAAA----------------------CTGAGAGAGCACTTG : 828
Aovo-DRB1*0306GA : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAAA----------------------CTGAGAGAGCACTTG : 752
Aovo-DRB1*0307GA : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAAA----------------------CTGAGAGAGCACTTG : 756
Aona-DRB3*0615 : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA--------------------CTGAGAGAGCGCTTG : 786
Aona-DRB3*0627 : GACTGAGAGAGAGAGAGAGAGAGAAA----------------------------CTGAGAGAGCACTTG : 844
Aona-DRB3*062501 : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA--------------------CTGAGAGAGCGCTTG : 786
Aona-DRB3*0626 : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA--------------------CTGAGAGAGCGCTTG : 784
Aona-DRB3*062502 : ------GAGAGAGAGAGAGAGAGAGAGA--------------------------CTGAGAGAGCGCTTG : 764
Aona-DRB3*0628 : GACTGAGAGAGAGAGAGAGAGAGAGAAA--------------------------CTGAGAGAGCACTTG : 842
Aovo-DRB3*0601 : GACTGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA--------------------CTGAGAGAGCGCTTG : 785
Aona-DRB*W8901 : ----GAG--AGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 493
Aona-DRB*W1808 : ----TACTGAGAGAGAGAGAGCGC--------------------------------GCGACAGCG---- : 578
Aona-DRB*W1806 : ----TACTGAGAGAGAGAGAGCGC--------------------------------GCGACAGCG---- : 568
Aovo-DRB*W1801 : ----TACTGAGAGAGAGAGAGCGC--------------------------------GCGACAGCG---- : 550
Aovo-DRB*W1802 : ----TACTGAGAGAGAGAGAGCGC--------------------------------GCGACAGCG---- : 564
Aovo-DRB*W1803 : ----TACTGAGAGAGAGAGAGCGC--------------------------------GCGACAGCG---- : 554
Aovo-DRB*W8801 : ----GAG--AGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 537
Aovo-DRB*W2901 : ----GAG--AGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 493
Aona-DRB*W2908 : ----GAG--AGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 493
Aona-DRB*W2910 : ----GAG--AGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 493
Aovo-DRB*W3001 : ----GAGCCAGAGAGCCAGAGCCA--------------------------------GAGAGAGCA---- : 486
Aona-DRB*W3002 : ----GAGCCAGAGAGCCAGAGCCA--------------------------------GAGAGAGCA---- : 490
Aovo-DRB*W9201 : ----GAGCCAGAGAGCCAGAGCCA--------------------------------GAGAGAGCA---- : 484
Aovo-DRB*W9202 : ----GAGCCAGAGAGCCAGAGCCA--------------------------------GAGAGAGCA---- : 484
Aona-DRB*W9101 : ----CAGAGAGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 455
Aovo-DRB*W9101 : ----CAGAGAGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 431
Aovo-DRB*W9102 : ----CAGAGAGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 455
Aovo-DRB*W9001 : ----CAGAGAGAGAGACAGAGCGA--------------------------------GAGAGAGCG---- : 434
Aovo-DRB*W4501 : ----GAGCCAGAGAGCCAGAGCCA--------------------------------GAGAGAGCA---- : 473
Aovo-DRB*W9301 : ----GAGAGAGAGAGAGACTGAGA--------------------------------GAGCGCGTG---- : 408

* 1120 * 1140 * 1160 *
Aona-DRB1*0328GB : CCATCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 1016
Aona-DRB1*0329GA : CCATCTGTGAGCATTCAGAATCCTGTCCATCCTGAGCAGGGAGCTCTGGGGGCACAGGTGTGTGTAT-- : 1149
Aona-DRB1*031701GA : CCATCTGTGAGCATTCAGAATCCTGTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTTTGTAT-- : 1153
Aovo-DRB1*0304GA : CAATCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 805
Aovo-DRB1*0305GA : CAATCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 895
Aovo-DRB1*0306GA : CAATCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 819
Aovo-DRB1*0307GA : CAATCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 823
Aona-DRB3*0615 : CAGTCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGT---- : 851
Aona-DRB3*0627 : CCATCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 911
Aona-DRB3*062501 : CAGTCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGT---- : 851
Aona-DRB3*0626 : CAGTCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGT---- : 849
Aona-DRB3*062502 : CAGTCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGT---- : 829
Aona-DRB3*0628 : CCATCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 909
Aovo-DRB3*0601 : CAGTCTGTGAGCATTCAGAATCCTCTCCATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGT---- : 850
Aona-DRB*W8901 : CCATCTGTGAGAGTTCAGTATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 560
Aona-DRB*W1808 : CCATCTGTGAGCGTTTAGAATCCTCTCAATCCTGAGCGGGGAGCTCTGAGGGCACAAGTGTGTGTGT-- : 645
Aona-DRB*W1806 : CCATCTGTGAGCGTTTAGAATCCTCTCAATCCTGAGCAGGGAGCTCCGAGGGCACAAGTGTGTGTGT-- : 635
Aovo-DRB*W1801 : CCATCTGTGAGCGTTTAGAATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGTGT : 619
Aovo-DRB*W1802 : CCATCTGTGAGAGTTTAGAATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 631
Aovo-DRB*W1803 : CCATCTGTGAGCGTTTAGAATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 621
Aovo-DRB*W8801 : CCATCTGTGAGAGTTTAGAATCCTCTCAATCCTGAGCAAGGAGTTCTGAGGGCACAGATGTGTGTGT-- : 604
Aovo-DRB*W2901 : CCATCTGTGAGAGTTCAGTATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 560
Aona-DRB*W2908 : CCATCTGTGAGAGTTCAGTATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 560
Aona-DRB*W2910 : CCATCTGTGAGAGTTCAGTATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 560
Aovo-DRB*W3001 : CCATCTGTGAGAGTTTAGAATCCTCTAAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 553
Aona-DRB*W3002 : CCATCTGTGAGAGTTTAGAATCCTCTAAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 557
Aovo-DRB*W9201 : CCATCTGTGAGAGTTTAGAATCCTCTAAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 551
Aovo-DRB*W9202 : CCATCTGTGAGAGTTTAGAATCCTCTAAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 551
Aona-DRB*W9101 : CCATCTGTGAGAGTTTAGAATCCTCTCAATCCTGAGCAAGGAGTTCTGAGGGCACAGATGTGTGTGT-- : 522
Aovo-DRB*W9101 : CCATCTGTGAGAGTTTAGAATCCTCTCAATCCTGAGCAAGGAGTTCTGAGGGCACAGATGTGTGTGT-- : 498
Aovo-DRB*W9102 : CCATCTGTGAGAGTTTAGAATCCTCTCAATCCTGAGCAAGGAGTTCTGAGGGCACAGATGTGTGTGT-- : 522
Aovo-DRB*W9001 : CCATCTGTGAGAGTTCAGTATCCTCTCAATCCTGAGCAAGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 501
Aovo-DRB*W4501 : CCATCTGTGAGAGTTTAGAATCCTCTAAATCCTGAGCAGGGAGCTCTGAGGGCACAGGTGTGTGTGT-- : 540
Aovo-DRB*W9301 : CCATCTGTGAGCATTCAGTATCCTCTCAATCCTGAGCAGGGAGCTCTGAGGGCACAGATGTGTGTGT-- : 475

1180 * 1200 * 1220 * 1240
Aona-DRB1*0328GB : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTACCCTTGGA : 1085
Aona-DRB1*0329GA : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 1218
Aona-DRB1*031701GA : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 1222
Aovo-DRB1*0304GA : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 874
Aovo-DRB1*0305GA : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 964
Aovo-DRB1*0306GA : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 888
Aovo-DRB1*0307GA : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 892
Aona-DRB3*0615 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 920
Aona-DRB3*0627 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 980
Aona-DRB3*062501 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 920
Aona-DRB3*0626 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 918
Aona-DRB3*062502 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 898
Aona-DRB3*0628 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 978
Aovo-DRB3*0601 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 919
Aona-DRB*W8901 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGAG-AGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 628
Aona-DRB*W1808 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGGGAGGGGAAGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 714
Aona-DRB*W1806 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGGGAGGGGAAGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 704
Aovo-DRB*W1801 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGGGAGGGGAAGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 688
Aovo-DRB*W1802 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 700
Aovo-DRB*W1803 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGGGAGGGGAAGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 690
Aovo-DRB*W8801 : AGAGTGTGGATTTGTGTGCGTGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 673
Aovo-DRB*W2901 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGAGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 629
Aona-DRB*W2908 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGAGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 629
Aona-DRB*W2910 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGAGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 629
Aovo-DRB*W3001 : AGAGTGTGGATTTGTGTGAGAGGCTGTTGTGGGAGGAGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 622
Aona-DRB*W3002 : AGAGTGTGGATTTGTGTGAGAGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCCTATCCTTGGA : 626
Aovo-DRB*W9201 : AGAGTGTGGATTTGTGTGAGAGGCTGTTGTGGGAGGAGAGGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 620
Aovo-DRB*W9202 : AGAGTGTGGATTTGTGTGAGAGGCTGTTGTGGGAGGAGAGGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 620
Aona-DRB*W9101 : AGAGTGTGGATTTGTGTGCGTGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 591
Aovo-DRB*W9101 : AGAGTGTGGATTTGTGTGCGTGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 567
Aovo-DRB*W9102 : AGAGTGTGGATTTGTGTGCGTGGCTGTTGTGGGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 591
Aovo-DRB*W9001 : AGAGTGTGGATTTGTGTGTGTGGCTGTTGTGAGAGGGGAGGCAGGAGGGGGCTTCTTCTTACCCTTGGA : 570
Aovo-DRB*W4501 : AGAGTGTGGATTTGTGTGAGAGGCTGTTGTGGGAGGAGAGGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 609
Aovo-DRB*W9301 : AGAGTGTGGATTTGTGTGTGAGGCTGTTGTGAGAGGCGAGGCAGGAGGGGGCTTCTTCTTATCCTTGGA : 544

* 1260 * 1280 * 1300 *
Aona-DRB1*0328GB : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTGGATTGTCC : 1152
Aona-DRB1*0329GA : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 1285
Aona-DRB1*031701GA : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 1289
Aovo-DRB1*0304GA : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 941
Aovo-DRB1*0305GA : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 1031
Aovo-DRB1*0306GA : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 955
Aovo-DRB1*0307GA : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 959
Aona-DRB3*0615 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 987
Aona-DRB3*0627 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 1047
Aona-DRB3*062501 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 987
Aona-DRB3*0626 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 985
Aona-DRB3*062502 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 965
Aona-DRB3*0628 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 1045
Aovo-DRB3*0601 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGGGG-TGGAGAGGGAGGAGACCTCGATTGTCC : 986
Aona-DRB*W8901 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTGCAGGGG-TGGAGAGGGAGGAGACCTGGATTGTCC : 695
Aona-DRB*W1808 : GGCCTCTTGTGTTGAGGGGACATGTGAGGTGGCTGCAGGGGCTGGAGAGGGAGGAGACCTCGATTGTCC : 783
Aona-DRB*W1806 : GGCCTCTTGTGTTGAGGGGACATGTGAGGTGGCTGCAGGGGCTGGAGAGGGAGGAGACCTCGATTGTCC : 773
Aovo-DRB*W1801 : G-------------------------------------------------------------------- : 689
Aovo-DRB*W1802 : GGCCTCTTGTGTTGAGGGGACATGTGAGGTGGCTGCAGGGGCTGGAGAGGGAGGAGACCTCGATTGTCC : 769
Aovo-DRB*W1803 : GGCCTCTTGTGTTGAGGGGACATGTGAGGTGACTGCAGGGGCTGGAGAGGGAGGAGACCTCGATTGTCC : 759
Aovo-DRB*W8801 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTGCAGGGG-TGGAGACGGAGGAGACCTGGATTGTCC : 740
Aovo-DRB*W2901 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTGCAGGGG-TGGAGAGGGAGGAGACCTGGATTGTCC : 696
Aona-DRB*W2908 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTGCAGGGG-TGGAGAGGGAGGAGACCTGGATTGTCC : 696
Aona-DRB*W2910 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTGCAGGGG-TGGAGAGGGAGGAGACCTGGATTGTCC : 696
Aovo-DRB*W3001 : G-------------------------------------------------------------------- : 623
Aona-DRB*W3002 : GGCCTCT-GTGGGGAGGTGACACAGGAGGTGGGTGCAGAGG-TGGGGAGGGAGGAGACCTCGTTTGTCA : 693
Aovo-DRB*W9201 : G-------------------------------------------------------------------- : 621
Aovo-DRB*W9202 : G-------------------------------------------------------------------- : 621
Aona-DRB*W9101 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTACAGGGG-TGGAGACGGAGGAGACCTGGATTGTCC : 658
Aovo-DRB*W9101 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTACAGGGG-TGGAGACGGAGGAGACCTGGATTGTCC : 634
Aovo-DRB*W9102 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTACAGGGG-TGGAGACGGAGGAGACCTGGATTGTCC : 658
Aovo-DRB*W9001 : GGCCTCT-GTGAGGAGGTGACATGGGAGGCGGGTGCAGGGG-TGGAGAGGGAGGAGACCTGGATTGTCC : 637
Aovo-DRB*W4501 : G-------------------------------------------------------------------- : 610
Aovo-DRB*W9301 : G-------------------------------------------------------------------- : 545

1320 * 1340 * 1360
Aona-DRB1*0328GB : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1212
Aona-DRB1*0329GA : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1345
Aona-DRB1*031701GA : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1349
Aovo-DRB1*0304GA : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1001
Aovo-DRB1*0305GA : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1091
Aovo-DRB1*0306GA : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1015
Aovo-DRB1*0307GA : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1019
Aona-DRB3*0615 : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1047
Aona-DRB3*0627 : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1107
Aona-DRB3*062501 : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1047
Aona-DRB3*0626 : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1045
Aona-DRB3*062502 : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1025
Aona-DRB3*0628 : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1105
Aovo-DRB3*0601 : TTGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 1046
Aona-DRB*W8901 : TGGGTCCTTAGAGATGCAGGAAGGGAAATG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 755
Aona-DRB*W1808 : TGGGTCCTTAGAGATGCAGGAAGGGAAATG-TGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 843
Aona-DRB*W1806 : TGGGTCCTTAGAGATGCAGGAAGGGAAATG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 833
Aovo-DRB*W1801 : ------------------------------------------------------------- : -
Aovo-DRB*W1802 : TGGGTCCTTAGAGATGCAGGAAGGGAAATG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 829
Aovo-DRB*W1803 : TTGGTCCTTAGAGATGCAGGAAGGGAAATG-TGAAGTGTGTGTGGCTGGGGTGAGGGTTTA : 819
Aovo-DRB*W8801 : TGGGTCCTTAGAGATTCAGGAATGGAAATG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 800
Aovo-DRB*W2901 : TGGGTCCTTAGAGATGCAGGAAGGGAAATG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 756
Aona-DRB*W2908 : TGGGTCCTTAGAGATGCAGGAAGGGAAATG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 756
Aona-DRB*W2910 : TGGGTCCTTAGAGATGCAGGAAGGGACCTG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 756
Aovo-DRB*W3001 : ------------------------------------------------------------- : -
Aona-DRB*W3002 : TTGGTCCTTAGAGATGCAGGAATGGAAATG-TGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 753
Aovo-DRB*W9201 : ------------------------------------------------------------- : -
Aovo-DRB*W9202 : ------------------------------------------------------------- : -
Aona-DRB*W9101 : TGGGTCCTTAGAGATTCAGGAA-TGGAAATGTGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 718
Aovo-DRB*W9101 : TGGGTCCTTAGAGATTCAGGAA-TGGAAATGTGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 694
Aovo-DRB*W9102 : TGGGTCCTTAGAGATTCAGGAA-TGGAAATGTGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 718
Aovo-DRB*W9001 : TGGGTCCTTAGAGATGCAGGAA-GGGAAATGTGAGGTGTGTGTGGCTGGGGTGAGGGTTTA : 697
Aovo-DRB*W4501 : ------------------------------------------------------------- : -
Aovo-DRB*W9301 : ------------------------------------------------------------- : -

Chapter 3. Structural analysis of owl monkey MHC-DR shows that fully protective malaria vaccine components can be readily used in humans
Suárez CF, Pabón L, Barrera A, Aza-Conde J, Patarroyo MA, Patarroyo ME. Structural analysis of owl monkey MHC-DR shows that fully-protective malaria vaccine components can be readily used in humans. Biochem Biophys Res Commun. 2017;491(4):1062-1069.
The published version of the article can be consulted at: http://www.sciencedirect.com/science/article/pii/S0006291X17315486

Structural analysis of Owl monkey MHC-DR shows that fully-protective malaria vaccine components can be readily used in humans
Carlos F. Suárez (a, b, c), Laura Pabón (a), Ana Barrera (a), Jorge Aza-Conde (a), Manuel Alfonso Patarroyo (a, b), Manuel Elkin Patarroyo (a, d, *)
a Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá D.C., Colombia
b Universidad del Rosario, Bogotá D.C., Colombia
c Universidad de Ciencias Aplicadas y Ambientales (UDCA), Bogotá, Colombia
d Universidad Nacional de Colombia, Bogotá D.C., Colombia
* Corresponding author. E-mail: mepatarr@gmail.com
Abstract
More than 50 years ago, the owl monkey (genus Aotus) was found to be highly susceptible to human malaria, making it an excellent experimental model for this disease. During our malaria vaccine development, the tremendous genetic variability of microbes and parasites (especially malaria parasites) was overcome by using conserved peptides that have high host-cell binding activity (cHABPs). However, cHABPs are immunologically silent and must be specifically modified (mHABPs) so that they fit perfectly into major histocompatibility complex (MHC) molecules (HLA in humans). Since malarial immunity is mainly antibody-mediated and controlled by the HLA-DRB genetic region, ~1,000 Aotus have been molecularly characterised for MHC-DRB, revealing a striking similarity between the human and Aotus MHC-DRB repertoires. Such convergence suggested that a large group of immune protection-inducing protein structures (IMPIPS), which are highly immunogenic and induce protection against intravenous malarial challenge in Aotus, could readily be used in humans to induce full protection against malaria. We highlight the value of a logical and rational methodology for developing a vaccine in an appropriate animal model: Aotus monkeys.
Keywords: MHC-DR, animal model, IMPIPS, malarial vaccine, HLA-peptide binding
Introduction
While searching for an appropriate experimental model for human malaria research, Carl Johnson’s group [1] demonstrated that Plasmodium vivax malaria could be transmitted to Aotus monkeys through infected human blood and to humans via an infected mosquito’s bite, thereby replicating the malaria parasite’s biological cycle. Contacos & Collins [2] repeated the trial, infecting Aotus with P. falciparum-infected blood and humans by mosquito bite, and concluded that Aotus is an excellent experimental model for human malaria research. Many human P. falciparum, P. vivax, P. malariae and P. ovale strains have since been adapted to grow in Aotus. These primates are native to Panama and tropical South America, and we have been using them for the last 35 years in the search for a logical and rational vaccine development methodology. Using Aotus reduces dangerous, cumbersome and expensive human trials to a minimum; such trials involve thousands of people who have to be followed up for years. Our experimental guidelines for the animal model follow a stringent methodology, complemented by meticulous immunological analysis [3, 4]. A very robust, sensitive and specific methodology has emerged from working with modified high-activity binding peptides (mHABPs) derived from conserved high-activity binding peptides (cHABPs) of the most relevant proteins involved in binding to and invading host cells (red blood cells, hepatocytes and endothelial cells). This has led to the definition of specific principles and rules for vaccine development [3, 4].
MHC-mediated antigen presentation is the first step in inducing immune protection; HLA-DR molecules have two very deep pockets (P1, P9) in their peptide-binding region (PBR).
Along with two shallow ones (P4, P6), these pockets enable a perfect antigen fit, so that the peptide establishes H-bonds and becomes fixed for proper presentation to TCR molecules, thereby activating an appropriate immune response. We have thus characterised the main components of the Aotus immune system at the molecular level, such as MHC-I/II [5-11] and other molecules such as the TCR [12, 13], finding 80%-100% similarity with their human counterparts; this allows information obtained in Aotus to be extrapolated to vaccines for human use. Gaining an in-depth understanding of antigen presentation involved studying ~1,000 Aotus monkeys; 215 MHC-DRB sequences were obtained [8-11], analysed and grouped into lineages according to their sequences. Similarity with human HLA-DRB allele lineages was investigated by generating pocket profiles. Molecular modelling was used to generate Aotus MHC-DR pockets from HLA-DR molecules whose structure had already been determined by X-ray crystallography; Aotus/human variant residues were replaced to determine their impact on pocket volume and electrostatic characteristics, which were then related to the results obtained experimentally in Aotus when using such peptides as fully effective vaccine components.
Materials and methods
Pocket profiling
The main problem of dealing with the great diversity of MHC-DRB alleles was resolved by abstracting sequences into a “pocket dictionary”, which estimated the variety of unique pockets, each defined by the key contact residues involved in peptide binding. Pocket profiles were determined by the occurrence of a specific amino acid (aa) combination in MHC pockets defined from previous crystallographic studies [14].
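The "pocket dictionary" idea can be sketched as collapsing each allele to the residues found at a pocket's contact positions and counting how many alleles share each combination. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. The position sets in `POCKET_POSITIONS` are an assumed, commonly cited set of HLA-DR β-chain contact residues, not necessarily the exact set taken from [14]. The helper names are hypothetical, and sequences are assumed to be aligned to mature β-chain numbering.

```python
from collections import Counter
from typing import Dict, Tuple

# ASSUMPTION: illustrative 1-based positions in the mature DR beta chain that
# line each pocket. They follow commonly cited HLA-DR contact residues and are
# not necessarily the exact set the authors derived from crystal structures.
POCKET_POSITIONS: Dict[str, Tuple[int, ...]] = {
    "P1": (85, 86, 89, 90),
    "P4": (13, 26, 70, 71, 74, 78),
    "P6": (11, 13),
    "P9": (9, 37, 57, 60, 61),
}


def pocket_profile(beta_chain: str, pocket: str) -> str:
    """Concatenate the residues found at the pocket's contact positions."""
    return "".join(beta_chain[pos - 1] for pos in POCKET_POSITIONS[pocket])


def pocket_dictionary(alleles: Dict[str, str], pocket: str) -> Counter:
    """Map each distinct residue combination to the number of alleles carrying it.

    `alleles` maps a (hypothetical) allele name to its aligned beta-chain sequence.
    """
    return Counter(pocket_profile(seq, pocket) for seq in alleles.values())
```

Calling `pocket_dictionary(alleles, "P1").most_common()` lists the unique P1 variants from most to least frequent. Those most frequent combinations are what the text calls the prevailing pocket profiles of a lineage.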
The most frequently occurring profiles, each named after an allele prototype, were determined for each allele lineage (PPF in Figure 1). For humans, translated HLA-DRB sequences reported in the IMGT/HLA database [15] were used; for Aotus, the MHC-DRB sequences previously reported by our group [8-11] were used. Allele lineage frequencies were calculated with the Allele Frequency Net Database (http://www.allelefrequencies.net) for humans and from our previous surveys for Aotus (% in red, Figure 1) [8-11]. An IMPIPS's potential population coverage was calculated as the product of the MHC lineage probability and the probability of the pocket profile for which it was designed (% in blue, Figure 1). The PAM250 matrix was used to calculate average percentage identity and similarity between HLA-DRB and Aotus DRB lineages.

HLA-DR peptide-binding prediction

NetMHCIIpan-3.1 [16], among the best-performing peptide-binding prediction tools available at the time, was used for two purposes: to predict peptide vaccine candidates' binding affinity for HLA-DRB alleles and to evaluate residue affinity for each pocket. Epitopes were categorised as strong binders (IC50 ≤ 100 nM), binders (>100 to ≤ 500 nM) and non-binders (>500 nM). The pocket profile approach selected 65 HLA-DRB allele prototypes for predictions, covering at least 60% of the pocket profiles displayed in each HLA-DRB1 lineage (% in green, Figure 1): DRB1*0101/02/04/09/06, DRB1*0301/02/05/25/13, DRB1*0401/02/03/04/05/06/07/08/22, DRB1*0701/04/03/06/24, DRB1*0801/02/04/05/06/12/24/34, DRB1*0901/02, DRB1*1001/02, DRB1*1101/02/04/06/11/10, DRB1*1201/16, DRB1*1301/02/03/12/07/05, DRB1*1401/05/03/04/14/06/08/25/32, DRB1*1501/02/03, DRB1*1601/04/15.

MHC-DR modelling and analysis

The HLA-DRβ1*0101 (PDB-1DLH), HLA-DRβ1*1501 (PDB-1BX2), HLA-DRβ1*03 (PDB-1A6A) and HLA-DRβ1*04 (PDB-1J8H) crystal structures were used as templates to locate the residue differences between humans and Aotus in space.
Since no Aotus MHC-DR structure has yet been determined, molecular modelling (energy minimisation in Insight II) involved replacing β-chain residues to obtain energetically favoured structures [17]. The residues forming P1, P4, P6 and P9 were used for each complex; only β-chain residues are shown in Figures 1 and 2, since α-chain residues are conserved. Human and Aotus pocket electrostatic potential surfaces and volumes were determined with the UCSF Chimera package. APBS was used to evaluate each pocket's electrostatic surface potential; solvent-accessible surface potential values were set from −7 kT/e (negative charge, red) to +7 kT/e (positive charge, blue) [18].

Peptides used for immunisation, protection and 3D structure determination

The chemically synthesised peptides have been described thoroughly elsewhere [4], including their use for:
- 3D structure determination by 600 MHz ¹H-NMR spectroscopy;
- Aotus immunisation, challenge, protection and infection assays;
- immunofluorescence antibody test (IFA) and western blot (WB) reactivity with P. falciparum lysates.

Results and Discussion

Analysing Aotus class II MHC-DRB gene sequences revealed that 17 allele lineages converge strikingly with human HLA-DRβ lineages [8-11], with ~82% mean similarity in the MHC-DRβ1 domain (Figure 1). Remarkably, no allele differences were observed between humans and Aotus in the large hydrophobic P1. In all allele lineages, both species carry the dimorphic variant β86G (accepting the aromatic residues W, Y, F) or β86V (accepting the large aliphatic residues L, I, M, V) [19]. The main difference between humans and Aotus regarding P1 was therefore the ratio of these variants (β86G/β86V) in human HLA-DRβ1* and Aotus DR-like allele lineages. A detailed analysis of the alleles showing differences or similarities between humans and Aotus follows.

HLA-DRβ1*03 lineage

The human HLA-DRβ1*03 lineage covers 19.7% of the global population, with 5 allele prototypes accounting for 65.9% of HLA-DRβ1*03 pocket profiles. The Aotus MHC-DRβ1*03 lineage is found in 57.1% of the Aotus population.
Figure 1 shows that the Aona-DRβ1*0305/07/04/09 alleles are almost identical in aa sequence to human HLA-DRβ1*0302/05, and that Aona-DRβ1*0311 is identical to the HLA-DRβ1*0301, 0325 and 0313 alleles. In P1, the difference between the species was the frequency of the β86V dimorphic variant: predominant in humans (~80%) but ~20% in Aotus. P4 was almost identical electrostatically and volumetrically in both species, accommodating D and S. The HLA-DRβ1*03 structure revealed differences in P6 (adjoining the PBR groove). Fβ9E and Qβ10Y had similar volume (131.8 Å³ cf 139.7 Å³) and charge, lay far apart in the human P6 side wall and did not interact directly with the peptide. These differences therefore had no impact on antigen presentation, and the predicted binding preferences for R, K, P, S could be equivalent in both species. The Fβ9E and Yβ37N differences in P9 (Figure 1) made it slightly larger (202.7 Å³ cf 196.4 Å³) and more π-charged in Aotus (Figure 2, row 1, columns 5-6). These residues, plus the conserved residues Yβ30, Yβ60 and Wβ61, formed P9 in both species. As a result, Aotus P9 prefers the aromatic residue Y, whereas human P9 prefers R. Peptide-binding prediction gave R, Y, S as classical P9 binding motifs. However, the HLA-DRβ1*0338, 0319, 0313 and 0310 alleles also bound the apolar residues V, L, I. Five HLA-DRβ1*03-binding IMPIPS protected Aotus monkeys against intravenous challenge with fresh, living P. falciparum parasites. Since this allele lineage is almost identical in both species, these IMPIPS could readily be used to protect ~13.0% of the human population. Figure 3A gives an excellent example: AMA-1 cHABP 4313-derived IMPIPS 10022 fully protected Aotus and displays all the molecular binding characteristics of the HLA-DRβ1*0302 and 1312 alleles.
Theoretically, binding to HLA-DRβ1*0302 and HLA-DRβ1*1312 could protect ~1.0% and ~1.2% of the world's population, respectively, according to the frequency of these pocket profiles in humans (Figure 1). Four more IMPIPS binding to the other HLA-DRβ1*03 pocket profiles would thus be required to protect the rest of the human population covered by this lineage.

HLA-DRβ1*04 lineage

The Aona-DRβW4704/01/02/03/05/08/09 alleles (found in 21.5% of the Aotus population) are quite similar to the HLA-DRβ1*0401/05/03/04/02/06 alleles (found in 26.1% of humans). P1 dimorphic variant frequencies were quite similar in both species (26.3% in humans, 23.1% in Aotus). P4 differences (Dβ70Q and Qβ74A/E, Figure 1) made Aotus P4 slightly smaller (134.0 Å³ cf 149.5 Å³) and more apolar (Figure 2, row 2, columns 5-6). It mainly accepted L, M, I, the residues that predominated in the predicted HLA-DRβ1*04 binding motifs. Ile was always avoided when designing peptides because of its unfavourable propensity for forming PPIIL. However, one-third of HLA-DRβ1*04 alleles (i.e. HLA-DRβ1*0422/01/07/09/16/17/21/33/34/35) also accepted D in P4, as is common in Aotus. P6 was identical in both species and the allele lineage ratios were very similar. P9 was almost identical, accepting S, T, D in both species; however, the HLA-DRβ1*0422, 25, 44 and 55 alleles also accepted R, K. Nine HLA-DRβ1*04-binding IMPIPS are highly immunogenic and protection-inducing in Aotus and could readily be used in humans. Together they could protect ~15.6% of the human population (Figure 1). Figure 3B gives another clear example: EBA-175 cHABP 1758-derived IMPIPS 13790, equivalent to an HLA-DRβ1*0422-binding IMPIPS for human use, could protect ~0.26% of the world's population.

HLA-DRβ1*15 lineage

This lineage covers 24.6% of the human population; the four allele prototypes shown here represent 76.9% of the lineage's pocket profiles. As this lineage is present in only 2% of the Aotus population, identifying IMPIPS for these alleles has been difficult.
P1 differed minimally between Aona-DRβ*03GC and the HLA-DRβ1*15 lineage. Both groups preferred the large aliphatic residues L, M, V, I because the β86V dimorphic variant predominates in both (>85% in humans and almost 100% in Aotus) (Figure 1). The Eβ70Q and Tβ71A replacements slightly reduced P4 size in Aotus. Aotus P4 preferred large aromatic (F, Y) or large aliphatic (L, M, I) residues, matching the peptide-binding prediction (F, L, M, I, Y). The P6 replacements had little relevance, since P6 is little involved in peptide binding in this lineage. These were Fβ9W (far apart in the pocket wall), Tβ71A (not forming this pocket), Tβ11P, Tβ12K and Sβ13R. The critical, specific P7 was identical in humans and Aotus. In P9, the HLA-DRβ1*1501 3D structure showed Sβ13R far from the P9 floor (Figure 2, row 3, column 4). In contrast, Yβ37S and Eβ57D (on the pocket floor) made P9 negatively charged in Aotus, with a slightly greater volume than in humans (244.0 Å³ cf 223.6 Å³) (Figure 2, row 3, columns 5-6). Positively charged residues R, K were therefore preferred in Aotus, rather than L, V, which humans also accept. The Fβ61W replacement was a relevant difference. The indole nitrogen of aromatic β9W (absent in aromatic Aona β9F) forms one H-bond with the peptide backbone, stabilising this MHC-II-peptide complex. Four IMPIPS having Aotus-protecting characteristics could therefore readily be used in humans to protect ~18.9% of the human population against P. falciparum malaria. Unfortunately, 3D structures are not yet available for the few HLA-DRβ1*15-binding IMPIPS identified to date.

HLA-DRβ1*01 lineage

This HLA-DRβ1* lineage accounts for 17% of the global population; its 5 allele prototypes comprise 72.3% of HLA-DRβ1*01 pocket profiles. However, its counterpart, the Aona-DRβ*W43 lineage, is present in only 2.4% of the Aotus population. This has made it hard to find monkeys with genetic traits equivalent to HLA-DRβ1*01 and HLA-DRβ1*15, and hence to identify IMPIPS that fit into these alleles' PBR.
The β86G/V dimorphism frequency in P1 was similar in both species (75% in Aotus, 60% in humans). P4 was almost identical. The Dβ28E and Nβ70Q differences did not produce any substantial volumetric (150.5 Å³ cf 163.4 Å³) or electrostatic change, since the replacements were electrostatically similar (D/E, N/Q). Binding motifs were the same, preferentially accepting the apolar residues L, M, V, I in both species. In the highly specific and relevant HLA-DRβ1*01 P6, Aotus and humans differed at four positions. Kβ9W lies far apart in the pocket wall according to the 3D model (Figure 2, row 4, column 3). Vβ11L, Cβ13F and Hβ30C sit on top of P6. P6 had equivalent volume in Aotus and humans (147.9 Å³ cf 149.6 Å³, respectively) but was somewhat negatively charged in Aotus (Figure 2, row 4, columns 5-6). Consequently, positively charged (N), alcohol-derived (S, T) or apolar (P) residues were preferred for binding in Aotus P6. In humans, the apolar residues A, P, G were preferred according to peptide-binding prediction. The Cβ13F and Yβ37S replacements in the deep hydrophobic P9 lay far apart on the P9 floor (Figure 2, row 4, column 4). In contrast, Kβ9W and Hβ30C were located directly on the P9 floor. They modified its volume (247.0 Å³ cf 198.4 Å³) and electrostatic landscape, making it larger and negatively charged in Aotus (Figure 2, row 4, column 5), as in HLA-DRβ1*0104. This enabled Aotus to accept large residues such as R and Y, whereas humans preferred apolar or large aliphatic residues (L, I, V, M). The Tβ57D replacement seemed a critical modification (Figure 1) because it ruptures the canonical α76R=β57D salt bridge. P9 becomes wider than it is deep, and peptide-binding prediction then favours small residues A, S, T. IMPIPS modified according to these characteristics could thus be used to protect ~12.3% of the human population against P. falciparum malaria.
Figure 3C shows MSP1 cHABP 1585-derived IMPIPS 10014, characterised as binding the HLA-DRβ1*0101 allele prototype. Based on this allele's frequency (Figure 1), it could protect ~8.8% of the human population, or ~3.2% if bound to HLA-DRβ1*0901.

HLA-DRβ1*08 lineage

The Aona-DRβ1*03GB lineage (Aona-DRβ1*0302/01/26, found in 18.8% of the Aotus population) was compared with the human HLA-DRβ1*08 lineage (found in 8.1% of the human population). The pocket profiles shown here represent 69.5% of the HLA-DRβ1*08 lineage. The Aβ86G/V difference made Aotus P1 intermediate in size between the β86G and β86V dimorphic variants. This large hydrophobic pocket could bind F, Y, L, I, V, M, but not W. P4 was almost identical in both species. The Eβ70D difference had no impact on binding preference, and P4 fitted the residues L, M, V, S, A according to peptide-binding prediction. The Fβ9E and Qβ10Y differences (as in HLA-DRβ1*03) lay distant in the P6 side wall and did not interact with the peptide. Sβ13G slightly reduced P6 space in Aotus while maintaining polarity, and could bind R, K; peptide-binding prediction indicated that P, S, A were equally accepted. In P9, the Fβ9E (on the floor) and Sβ13G replacements were especially distant and did not interact with the peptide. They therefore did not change human HLA-DRβ1*08 residue preference relative to Aona-DRβ1*03GB. However, compared with HLA-DRβ1*03, the Yβ37N difference made P9 smaller. P9 then preferentially accepted the small residues S, G, A and, to a lesser extent, L, V, I, M, as in HLA-DRβ1*0803, HLA-DRβ1*0810, HLA-DRβ1*0815 and HLA-DRβ1*0830. The Dβ57S variation was another difference. It ruptures the aforementioned canonical α76R=β57D salt bridge (~40% in humans, 100% in Aotus), inducing a preference for S, G, A, or for L, I, V, M in alleles such as HLA-DRβ1*0803. Like Aotus, the mouse I-Ag7 MHC-II, HLA-DRβ1*08 and Aona-DRβ0306B have β57S in P9. These molecules preferentially accept G, S, A, D, E under optimal fitting conditions.
In I-Ag7, P9 is wider than it is deep, giving greater lateral freedom than in other class II molecules. It accepts L, V, I, M under non-optimal conditions [20], as could happen in humans and Aotus. Eight HLA-DRβ1*08-binding IMPIPS that induce protection in Aotus could thus readily be used to protect ~5.6% of the human population. Figure 3D provides another excellent example: MSP2 cHABP 4044-derived IMPIPS 24112. As an HLA-DRβ1*0802 binder, it could cover ~1.1% of the world's population (Figure 1). As an HLA-DRβ1*1312 binder, it would protect ~1.2%, although its affinity is greater for HLA-DRβ1*0802.

HLA-DRβ1*07 lineage

The human HLA-DRβ1*07 lineage covers 22.4% of the global population, whilst the convergent Aona-DRβ*W30 allele lineage is found in 20% of the Aotus population. Five HLA-DRβ1*07 pocket profiles represent 69.1% of the HLA-DRβ1*07 lineage (Figure 1). β86G is the predominant P1 dimorphic variant in humans (>80% worldwide) and is almost the only variant in Aotus-DRβ*W30. This pocket therefore preferentially receives the aromatic residues F, Y, W. Modelling on HLA-DRβ1*04 placed two replacements of the HLA-DRβ1*07 lineage (Eβ14K, Aβ73G) in the P4 wall and two (Qβ74S, Yβ78V) on its floor. These electrostatic and volumetric differences restricted Aotus P4 to small residues such as S or T. According to peptide-binding prediction, human P4 could also accept larger apolar residues V, I, A. The Eβ9W difference in P6 lay far apart within the pocket and did not take part in peptide interaction. Vβ11G made P6 smaller, so that it preferentially accepts the small residues S, A, G, as in humans according to peptide-binding prediction. Of the P9 replacements (Eβ9W, Sβ57V, Kβ60S and Lβ61W), Kβ60S lay above the pocket and far from it. Here the canonical α76R=β57D salt bridge is broken, with Sβ57 in its place, which would allow the large aliphatic residues L, I to fit.
In HLA-DRβ1*07, the Lβ61W replacement removes one of the 13 H-bonds with P9. Aotus has Leu at β61 where HLA has Trp, and Leu lacks the indole (pyrrole) nitrogen of β61W. IMPIPS with such weaker, sub-optimal characteristics can therefore fit this pocket. The same could be happening in HLA-DRβ1*15 and HLA-DRβ1*08. Five HLA-DRβ1*07-binding IMPIPS that induce protection in Aotus could thus readily be used to protect ~15.5% of the human population (Figure 1). Figure 3E shows HRPII cHABP 6800-derived IMPIPS 24230, characterised as an HLA-DRβ1*0701 binder. It could protect as much as 11.2% of the human population, although it unfortunately induces only short-lived protective memory [23].

Implications for a vaccine development methodology

The five IMPIPS shown here (Figure 3) are immunogenic and give full protection in Aotus against the most stringent P. falciparum challenge. They could therefore be used for human immunisation, covering 22.4% of the human population when only each IMPIPS's strongest-binding allele is counted. New IMPIPS could be derived from functionally relevant cHABPs from proteins involved in RBC invasion and designed to fit the lineage pocket profiles presented above. On that basis, the 36 IMPIPS listed next could protect ~80.9% of the world's population against P. falciparum malaria:
- 5 HLA-DRβ1*03-binding IMPIPS, which totally protect Aotus (~13%);
- 9 HLA-DRβ1*04-binding IMPIPS (~15.6%);
- 4 HLA-DRβ1*15-binding IMPIPS (~18.9%);
- 5 HLA-DRβ1*01-binding IMPIPS (~12.3%);
- 5 HLA-DRβ1*07-binding IMPIPS (~15.5%);
- 8 HLA-DRβ1*08-binding IMPIPS (~5.6%).

According to our calculations, 14 additional IMPIPS covering all allele lineages with the most frequently occurring pocket profiles (50 IMPIPS in total) could protect ~96.6% of the world's population [21]. At that coverage, 90% of the world's population would recognise a minimum of 1.19 IMPIPS.
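The lineage percentages above can be reproduced from the figures quoted in each subsection. The Methods define potential coverage as lineage frequency multiplied by the fraction of the lineage's pocket profiles represented. The sketch below applies that definition, with two assumptions:
- The DRB1*04 profile fraction is not stated in the text, so its reported ~15.6% is entered directly.
- The Figure 3 values are those quoted for each example's main allele: HLA-DRβ1*0302 for 3A, *0422 for 3B, *0101 for 3C, *0802 for 3D and *0701 for 3E.

The totals are a simple additive tally, which is what reproduces ~80.9% and 22.4%. Each person carries two DRB1 alleles, and the human lineage frequencies quoted above sum to more than 100%. The tally is therefore an approximation rather than a population-coverage model such as that of Bui et al. [21].

```python
# Lineage frequency and fraction of pocket profiles covered, as quoted in the
# text (Figure 1). DRB1*04's profile fraction is not given, so its reported
# coverage is used directly.
LINEAGES = {
    "DRB1*03": (0.197, 0.659),
    "DRB1*15": (0.246, 0.769),
    "DRB1*01": (0.170, 0.723),
    "DRB1*08": (0.081, 0.695),
    "DRB1*07": (0.224, 0.691),
}
REPORTED = {"DRB1*04": 0.156}

# Allele percentages quoted for the five Figure 3 IMPIPS (main allele only)
FIGURE3 = {"3A": 0.010, "3B": 0.0026, "3C": 0.088, "3D": 0.011, "3E": 0.112}


def potential_coverage(lineage_freq: float, profile_fraction: float) -> float:
    """Coverage = lineage probability x probability of the designed-for profiles."""
    return lineage_freq * profile_fraction


coverage = {name: potential_coverage(f, p) for name, (f, p) in LINEAGES.items()}
coverage.update(REPORTED)

for name, value in coverage.items():
    print(f"{name}: {value:.1%}")
print(f"36 IMPIPS, additive tally: {sum(coverage.values()):.1%}")       # 80.9%
print(f"Figure 3 IMPIPS, additive tally: {sum(FIGURE3.values()):.1%}")  # 22.4%
```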
Designing IMPIPS by allele lineage in this way could achieve the objective of a complete, totally effective vaccine against pathogens. This applies even to complex parasites such as P. falciparum, which uses multiple proteins and complex strategies during invasion to escape the immune response [4]. The volumetric and electrostatic findings above show that IMPIPS side chains fit perfectly into MHC-DR pockets according to allele lineage. Since these IMPIPS have completely protected Aotus, the findings suggest they could be used directly in humans. The great immunological similarity between humans and Aotus has allowed the development of a logical and rational methodology for developing universal vaccines for human use that are complete, fully protective, minimal subunit-based, multi-epitope, multi-stage and chemically synthesised. This methodology has had to be complemented with already-described steric and electronic rules [3, 4, 22] and topological rules:
- a 26.5 Å ± 1.5 Å distance between P1 and P9 residues [23];
- φ and ψ torsion angles inducing PPIIL conformation [24];
- correct side-chain orientation [25];
- peripheral flanking residue preferences [26].

These emerging rules, combined with a quantum chemistry approach to studying MHC-peptide binding [27], provide a strong framework for peptide-based vaccine design. These principles, together with the use of Aotus as an appropriate experimental model, have paved the way for effective vaccine development against malaria and other infectious diseases. The same applies to cancers induced by viruses, bacteria or parasites [28].

Conflict of interest

The authors declare that they have no financial/commercial conflicts of interest.

Acknowledgments

We would like to thank Mr Jason Garry for translating and revising the manuscript. This research was supported by Colciencias, contract 860-2015.

References

[1] M.D. Young, J.A. Porter, Jr., C.M. Johnson, Plasmodium vivax transmitted from man to monkey to man, Science, 153 (1966) 1006-1007.
[2] P.G. Contacos, W.E. Collins, Falciparum malaria transmissible from monkey to man by mosquito bite, Science, 161 (1968) 56.
[3] L.E. Rodriguez, H. Curtidor, M. Urquiza, G. Cifuentes, C. Reyes, M.E. Patarroyo, Intimate molecular interactions of P. falciparum merozoite proteins involved in invasion of red blood cells and their implications for vaccine design, Chem Rev, 108 (2008) 3656-3705.
[4] M.E. Patarroyo, A. Bermudez, M.A. Patarroyo, Structural and immunological principles leading to chemically synthesized, multiantigenic, multistage, minimal subunit-based vaccine development, Chem Rev, 111 (2011) 3459-3507.
[5] D. Diaz, M. Naegeli, R. Rodriguez, J.J. Nino-Vasquez, A. Moreno, M.E. Patarroyo, G. Pluschke, C.A. Daubenberger, Sequence and diversity of MHC DQA and DQB genes of the owl monkey Aotus nancymaae, Immunogenetics, 51 (2000) 528-537.
[6] C.F. Suarez, M.A. Patarroyo, M.E. Patarroyo, Characterisation and comparative analysis of MHC-DPA1 exon 2 in the owl monkey (Aotus nancymaae), Gene, 470 (2011) 37-45.
[7] P.P. Cardenas, C.F. Suarez, P. Martinez, M.E. Patarroyo, M.A. Patarroyo, MHC class I genes in the owl monkey: mosaic organisation, convergence and loci diversity, Immunogenetics, 56 (2005) 818-832.
[8] J.J. Nino-Vasquez, D. Vogel, R. Rodriguez, A. Moreno, M.E. Patarroyo, G. Pluschke, C.A. Daubenberger, Sequence and diversity of DRB genes of Aotus nancymaae, a primate model for human malaria parasites, Immunogenetics, 51 (2000) 219-230.
[9] J.E. Baquero, S. Miranda, O. Murillo, H. Mateus, E. Trujillo, C. Suarez, M.E. Patarroyo, C. Parra-Lopez, Reference strand conformational analysis (RSCA) is a valuable tool in identifying MHC-DRB sequences in three species of Aotus monkeys, Immunogenetics, 58 (2006) 590-597.
[10] C.F. Suarez, M.E. Patarroyo, E. Trujillo, M. Estupinan, J.E. Baquero, C. Parra, R. Rodriguez, Owl monkey MHC-DRB exon 2 reveals high similarity with several HLA-DRB lineages, Immunogenetics, 58 (2006) 542-558.
[11] C. Lopez, C.F. Suarez, L.F. Cadavid, M.E. Patarroyo, M.A. Patarroyo, Characterising a microsatellite for DRB typing in Aotus vociferans and Aotus nancymaae (Platyrrhini), PLoS One, 9 (2014) e96973.
[12] C.A. Moncada, E. Guerrero, P. Cardenas, C.F. Suarez, M.E. Patarroyo, M.A. Patarroyo, The T-cell receptor in primates: identifying and sequencing new owl monkey TRBV gene sub-groups, Immunogenetics, 57 (2005) 42-52.
[13] J.E. Guerrero, D.P. Pacheco, C.F. Suarez, P. Martinez, F. Aristizabal, C.A. Moncada, M.E. Patarroyo, M.A. Patarroyo, Characterizing T-cell receptor gamma-variable gene in Aotus nancymaae owl monkey peripheral blood, Tissue Antigens, 62 (2003) 472-482.
[14] L.J. Stern, J.H. Brown, T.S. Jardetzky, J.C. Gorga, R.G. Urban, J.L. Strominger, D.C. Wiley, Crystal structure of the human class II MHC protein HLA-DR1 complexed with an influenza virus peptide, Nature, 368 (1994) 215-221.
[15] J. Robinson, J.A. Halliwell, H. McWilliam, R. Lopez, P. Parham, S.G. Marsh, The IMGT/HLA database, Nucleic Acids Res, 41 (2013) D1222-1227.
[16] M. Andreatta, E. Karosiene, M. Rasmussen, A. Stryhn, S. Buus, M. Nielsen, Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification, Immunogenetics, 67 (2015) 641-650.
[17] M.S. Inc, Insight II User Guide, in: M.S. Inc (Ed.), Molecular Simulations Inc, San Diego, 1998.
[18] E.F. Pettersen, T.D. Goddard, C.C. Huang, G.S. Couch, D.M. Greenblatt, E.C. Meng, T.E. Ferrin, UCSF Chimera--a visualization system for exploratory research and analysis, J Comput Chem, 25 (2004) 1605-1612.
[19] C. Cardenas, J.L. Villaveces, H. Bohorquez, E. Llanos, C. Suarez, M. Obregon, M.E. Patarroyo, Quantum chemical analysis explains hemagglutinin peptide-MHC Class II molecule HLA-DRbeta1*0101 interactions, Biochem Biophys Res Commun, 323 (2004) 1265-1277.
[20] T. Stratmann, V. Apostolopoulos, V. Mallet-Designe, A.L. Corper, C.A. Scott, I.A. Wilson, A.S. Kang, L. Teyton, The I-Ag7 MHC class II molecule linked to murine diabetes is a promiscuous peptide binder, J Immunol, 165 (2000) 3214-3225.
[21] H.H. Bui, J. Sidney, K. Dinh, S. Southwood, M.J. Newman, A. Sette, Predicting population coverage of T-cell epitope-based diagnostics and vaccines, BMC Bioinformatics, 7 (2006) 153.
[22] A. Moreno-Vranich, M.E. Patarroyo, Steric-electronic effects in malarial peptides inducing sterile immunity, Biochem Biophys Res Commun, 423 (2012) 857-862.
[23] M.P. Alba, C.F. Suarez, Y. Varela, M.A. Patarroyo, A. Bermudez, M.E. Patarroyo, TCR-contacting residues orientation and HLA-DRbeta* binding preference determine long-lasting protective immunity against malaria, Biochem Biophys Res Commun, 477 (2016) 654-660.
[24] M.E. Patarroyo, A. Moreno-Vranich, A. Bermudez, Phi (Phi) and psi (Psi) angles involved in malarial peptide bonds determine sterile protective immunity, Biochem Biophys Res Commun, 429 (2012) 75-80.
[25] A. Bermudez, D. Calderon, A. Moreno-Vranich, H. Almonacid, M.A. Patarroyo, A. Poloche, M.E. Patarroyo, Gauche(+) side-chain orientation as a key factor in the search for an immunogenic peptide mixture leading to a complete fully protective vaccine, Vaccine, 32 (2014) 2117-2126.
[26] C. Reyes, R. Rojas-Luna, J. Aza-Conde, L. Tabares, M.A. Patarroyo, M.E. Patarroyo, Critical role of HLA-DRbeta* binding peptides' peripheral flanking residues in fully-protective malaria vaccine development, Biochem Biophys Res Commun, 489 (2017) 339-345.
[27] R. González, C.F. Suárez, H.J. Bohórquez, M.A. Patarroyo, M.E. Patarroyo, Semi-empirical quantum evaluation of peptide–MHC class II binding, Chemical Physics Letters, 668 (2017) 29-34.
[28] H. zur Hausen, The search for infectious causes of human cancers: where and why (Nobel lecture), Angew Chem Int Ed Engl, 48 (2009) 5798-5808.

Figure Legends

Figure 1. Human HLA-DRβ1* and Aona DRβ* convergent allele lineages, showing their identical aa sequences in P1 (fuchsia), P4 (blue), P6 (orange) and P9 (green). Amino acids that are similar by volumetric or electrostatic criteria are shown in lighter colours; dissimilar aa are not shaded. The figure displays:
- allele lineage percentage in the global population (% in red);
- number of HLA-DRB pocket profiles considered (n);
- the IMPIPS' potential global population coverage (% in blue);
- pocket profile frequency (PPF);
- final percentage covered by these profiles (% in green).

36 IMPIPS fitting into these allele prototypes would thus protect ~80.9% of the human population. Aotus nancymaae (Aona), A. vociferans (Aovo), A. nigriceps (Aoni).

Figure 2. The first column shows the HLA-DR molecule α-chain in magenta and β-chain in pale blue (both as ribbons). The aa forming Pocket 1 are shown as fuchsia balls, Pocket 4 in dark blue, Pocket 6 in orange and Pocket 9 in green. Residues differing between HLA-DRβ1* and Aotus MHC-DR are highlighted as red balls. Columns 2, 3 and 4 show the residues forming the surfaces of Pockets 4, 6 and 9 (differences highlighted in red). Columns 5 (Aotus) and 6 (human) give a top/side view of selected pockets, showing the determined volume (Å³).

Figure 3. Side, front and top views of cHABP-derived protein 3D structures (bold letters) (mHABPs, bold numbers). The corresponding aa sequences are highlighted in colour; residues fitting into HLA-DRβ1* are indicated and regions having PPIIL conformation are underlined. The yellow box contains:
- HLA-DRβ1* allele binding (≤ 100 nM; highest affinity in bold) with IC (<200);
- IFA antibody titre reciprocals (II20/III20: 20 days after the second and third doses, respectively);
- the number of monkeys protected after intravenous challenge (Prot, highlighted in red).

Below the side view, the distance (Å) between the residues fitting into P1 and P9 of the HLA-DRβ1* molecule PBR is shown.

Chapter 4.
Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

Bohórquez HJ, Suárez CF, Patarroyo ME. Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements. Scientific Reports. 2017;7(1):7717. The published version of the article can be consulted at: https://www.nature.com/articles/s41598-017-08041-7

www.nature.com/scientificreports OPEN

Received: 30 January 2017; Accepted: 28 June 2017; Published: xx xx xxxx

Hugo J. Bohórquez1, Carlos F. Suárez1,2,3 & Manuel E. Patarroyo1,4

Why is an amino acid replacement in a protein accepted during evolution? The answer given by bioinformatics relies on how often each amino acid changes into another and on the propensity of each to remain unchanged. We propose that these replacement rules can be recovered from the secondary structural trends of amino acids. A distance measure between high-resolution Ramachandran distributions reveals that structurally similar residues coincide with those found in substitution matrices such as BLOSUM: Asn ↔ Asp, Phe ↔ Tyr, Lys ↔ Arg, Gln ↔ Glu, Ile ↔ Val, Met → Leu. Ala, Cys, His, Gly, Ser, Pro and Thr are structurally idiosyncratic residues. We also found a high average correlation (R = 0.85) between thirty amino acid mutability scales and the mutational inertia (IX). IX measures the energetic cost weighted by the number of observations at the most probable amino acid conformation. These results indicate that amino acid substitutions follow two optimally efficient principles: (a) amino acid interchangeability favours secondary structural similarity, and (b) the mutability of an amino acid depends directly on its biosynthetic energy cost and inversely on its frequency. These two principles are the underlying rules governing the observed amino acid substitutions.
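The abstract describes IX only qualitatively. The columns of Table 1 (later in this chapter) show two relations that hold to rounding. First, the tabulated WX matches NX × PmaxX/100, the number of residues in the most probable (φ, ψ) bin. Second, IX matches MX/WX, with molecular mass serving as the biosynthetic-cost proxy. The sketch below reproduces both for four residues. These relations are inferred from the tabulated values, not quoted from the article's Methods, and the function names are illustrative.

```python
# Selected rows of Table 1: (M_X in Da, Pmax_X in %, N_X, W_X, I_X)
TABLE1 = {
    "Ala": (71.079, 0.437, 113609, 496.654, 0.143),
    "Leu": (113.159, 0.276, 116941, 322.560, 0.351),
    "Cys": (103.139, 0.173, 15823, 27.298, 3.778),
    "Trp": (186.213, 0.200, 21118, 42.340, 4.398),
}


def peak_count(n_obs: int, pmax_percent: float) -> float:
    """Residues expected in the most probable (phi, psi) bin: N_X * Pmax_X."""
    return n_obs * pmax_percent / 100.0


def mutational_inertia(mass_da: float, peak_obs: float) -> float:
    """Mass (biosynthetic-cost proxy) divided by the peak-bin observation count."""
    return mass_da / peak_obs


for aa, (mass, pmax, n_obs, w_tab, i_tab) in TABLE1.items():
    w_est = peak_count(n_obs, pmax)
    i_est = mutational_inertia(mass, w_tab)
    print(f"{aa}: W ~ {w_est:.1f} (table {w_tab}), I = {i_est:.3f} (table {i_tab})")
```

Read this way, a heavy residue that is rarely observed at its preferred conformation (Trp, Cys) has high inertia, and a light, frequent one (Ala, Leu) has low inertia. This matches principle (b) of the abstract.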
In molecular evolution, protein stability is a solid indicator of function preservation, thanks to a positive correlation between protein functionality and native stability1, 2. Natural protein sequences evolved to avoid aggregation and increase functional diversity3. Once a protein fold is established, selection pressure at most positions in the protein will preserve fold stability. Homologous protein families have related functions, and their structures remain similar even though their sequences have diverged4, even in regions with less than 30% sequence identity5, 6. Accordingly, mutation events over time may replace one residue by another while keeping the backbone dihedral angles at that position unchanged7. These facts indicate that the amino acid sequence alone is an incomplete measure of evolutionary relationships between proteins. Indeed, structural similarities reflect homology better than sequence similarities do8. Therefore, sequence variation around a conserved molecular architecture could be traced through the amino acid substitution patterns fixed during protein evolution. The intrinsic secondary structure propensities of amino acids are given by the statistics of Ramachandran distributions9–11. These statistics reveal the conformational bias of each amino acid towards specific secondary structures12, 13. For instance, long polypeptide chains with the same backbone conformation are found exclusively in α-helix, PPII and β-strand structures14. In general, examining how often particular amino acid residues occur in stable secondary structures has been useful for determining protein structure, folding and energetics15. We propose that, in addition, the statistics of protein secondary structure may reveal evolutionary information. To test this hypothesis, we explore a combination of extensive physical quantities with the statistics of Ramachandran distributions PX(φ, ψ).
In particular, we investigate molecular mass as a measure of the biosynthetic cost of amino acids. In addition, we use the protein geometry database (PGD 1.1)16 to obtain high-resolution Ramachandran distributions as 2D-binned probability histograms (Fig. 1).

1Bio-mathematics, Fundación Instituto de Inmunología de Colombia, FIDIC, Cra. 50 No. 26-00, Of. 102, Bogotá DC, 111321160 Cundinamarca, Colombia. 2Universidad de Ciencias Aplicadas y Ambientales, UDCA, Bogotá DC, Colombia. 3Universidad del Rosario, Bogotá DC, Colombia. 4Universidad Nacional de Colombia, Bogotá DC, Colombia. Hugo J. Bohórquez and Carlos F. Suárez contributed equally to this work. Correspondence and requests for materials should be addressed to H.J.B. (email: hugo.j.bohorquez@fidic.org.co)

Scientific Reports | 7: 7717 | DOI:10.1038/s41598-017-08041-7

Figure 1. High-resolution Ramachandran probability distributions PX(φ, ψ) (logarithmic scale) derived from the PGD 1.1 database at 1.895° × 1.895° bin size. Structurally similar open sets: yellow, SI = {{Arg, Lys}, {Glu, Gln}, Leu}; green, SII = {Trp, {Phe, Tyr}}; magenta, SIII = {Asn, Asp}; cyan, SIV = {Val, Ile}. Ala, Met and Ser have their first neighbor in SI; His, Thr and Cys are adjacent to SII. Larger images of each Ramachandran distribution are given in Supplementary Figs S1–S20.

Using binned histograms has some practical advantages, including the possibility of applying distance measures directly between the distributions. The secondary structure distance between amino acids (Fig. 2) is the central quantity in our research, because the close-distance pairs that emerge can be compared directly with pairwise mutations. The optimal bin area (ΔφΔψ) dividing the Ramachandran map is given by the method of Shimazaki & Shinomoto17.
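For reference, the Shimazaki & Shinomoto criterion chooses the bin width Δ that minimises C(Δ) = (2k − v)/Δ². Here k and v are the mean and the (biased) variance of the counts across all bins. The minimal 2D sketch below makes three assumptions:
- The cost generalises to 2D by replacing Δ with the bin area ΔφΔψ.
- Bins are square, on the periodic [−180°, 180°) grid.
- The data are synthetic.

With 190 bins per axis, each bin is ~1.895° wide, matching the bin size quoted in the Figure 1 caption. This illustrates the criterion; it is not the authors' code.

```python
import random
from collections import Counter


def ss_cost(counts: list, bin_area: float) -> float:
    """Shimazaki-Shinomoto cost (2k - v) / area**2 from per-bin counts."""
    n_bins = len(counts)
    k = sum(counts) / n_bins                          # mean count per bin
    v = sum((c - k) ** 2 for c in counts) / n_bins    # biased variance
    return (2 * k - v) / bin_area ** 2


def bin_counts(angles, bins_per_axis: int) -> list:
    """Count (phi, psi) pairs on a square, periodic grid over [-180, 180)."""
    width = 360.0 / bins_per_axis
    grid = Counter()
    for phi, psi in angles:
        i = int((phi + 180.0) // width) % bins_per_axis
        j = int((psi + 180.0) // width) % bins_per_axis
        grid[(i, j)] += 1
    # Empty bins are included: they contribute to the mean and variance.
    return [grid[(i, j)] for i in range(bins_per_axis) for j in range(bins_per_axis)]


def best_bins_per_axis(angles, candidates) -> int:
    """Return the candidate grid size with the lowest cost."""
    def cost(n):
        width = 360.0 / n
        return ss_cost(bin_counts(angles, n), width * width)
    return min(candidates, key=cost)


# Synthetic (phi, psi) sample clustered near the alpha-helical region
rng = random.Random(0)
sample = [(rng.gauss(-63.0, 8.0), rng.gauss(-43.0, 8.0)) for _ in range(5000)]
print(best_bins_per_axis(sample, candidates=[12, 24, 36, 72, 120, 190]))
```

With far more residues per amino acid, as in the 1,153,791-residue PGD 1.1 set, the criterion tolerates much finer bins than this small synthetic sample does.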
This choice is key in histogram binning, because a very small bin size amplifies noise, whereas a very large one smooths over important details of the distribution.

Figure 2. Distance matrix, with rows and columns ordered according to structurally similar amino acids (Arg, Lys, Glu, Gln, Leu, Met, Ala, Trp, Phe, Tyr, His, Thr, Cys, Ser, Asn, Asp, Val, Ile, Gly, Pro). The smallest distance is shown in yellow and the largest in blue, with intermediate values in green. Open subsets consistently appear in yellow. Gly and Pro appear as the most distant elements, followed by Asn, Val-Ile, Ala, and Thr.

We explore the twenty amino acid distributions through some of their distinctive features, such as the most probable conformation, given by the highest peak of each distribution. Additionally, we propose a plausible mutability parameter that combines structural information with the molecular mass of the amino acids. Our results indicate that evolutionary amino acid substitutions follow two optimal-efficiency principles: (a) interchangeability between amino acids preserves secondary-structural propensity, and (b) the mutability of an amino acid depends directly on its mass and inversely on its frequency. The methodology introduced here provides a basis for developing a new kind of scoring matrix involving physical quantities and secondary-structure statistics. We hope these future efforts will help improve peptide design strategies, which may in turn help close the gap between the primary sequence and the 3D structure of proteins.

Results and Discussion

High-resolution Ramachandran Probability Distributions. We distinguish two concepts regarding the backbone dihedral angles of proteins, as suggested by Dunbrack Jr. et al.11.
The first is a Ramachandran plot or Ramachandran map, which is simply a scatter plot of the (φ, ψ) values of the residues in a single protein structure or a set of protein structures; it provides a simple view of the conformation of a protein. The second is a Ramachandran probability distribution P(φ, ψ), a statistical representation of Ramachandran data, usually in the form of a probability density function. PX(φ, ψ) gives the probability of finding a conformation of amino acid X within a specific range of (φ, ψ) values. We obtained non-parametric density estimates of PX(φ, ψ) for each amino acid X from 1,153,791 residues retrieved from the high-resolution protein geometry database (PGD 1.1)16. In our frequentist approach, events have a specific probability whose determination depends on the number of observations. Therefore, each distribution PX(φ, ψ) is given by a joint histogram. Such an approach depends on finding an optimal grid size, which can be determined with the Shimazaki & Shinomoto method17. This strategy requires an exhaustive sampling of a cost function whose minimum corresponds to an optimal binning of the distribution (see Methods for details). Table 1 reports the optimal bin width ∆minX for each Ramachandran probability distribution. The NX-weighted average of these optimal bin widths (1.887°) was approximated by 360°/190 ≈ 1.895°, the bin size used in the present study. Thus, we obtained a grid of 190 × 190 = 36,100 bins, each covering an area of 1.895° × 1.895° of the dihedral space (Fig. 1), a significant improvement on the resolution of previously reported Ramachandran distributions. For comparison, the 3D representation of the Ramachandran distributions for the first version of PGD uses a grid of 20.0° × 20.0° (i.e. a total of 324 bins), from a dataset containing 72,376 residues10. The TALOS+ program, which predicts protein backbone torsion angles from NMR chemical shifts, uses the same bin size (20.0° × 20.0°)18, 19, and other studies on folding trends use a resolution of 10.0° × 10.0° (i.e. 1,296 bins)11. An early report on detailed Ramachandran distributions used bin widths of 4.0° × 4.0° (i.e. 90 × 90 bins), involving 237,384 amino acids from 1,042 proteins20. Our distributions have about 4.5 times as many bins as the latter (36,100 vs 8,100), which translates into higher accuracy in the distance computations between the distributions PX(φ, ψ).

Amino acid | MX (Da) | BX | ∆minX (°) | PmaxX (%) | NX | WX | IX
Ala | 71.079 | 4 | 1.176 | 0.437 | 113609 | 496.654 | 0.143
Arg | 156.188 | 10 | 1.593 | 0.265 | 45373 | 120.333 | 1.298
Asn | 114.104 | 2 | 2.535 | 0.156 | 46573 | 72.701 | 1.569
Asp | 115.089 | 1 | 2.169 | 0.192 | 56963 | 109.191 | 1.054
Cys | 103.139 | 5 | 2.951 | 0.173 | 15823 | 27.298 | 3.778
Gln | 128.131 | 2 | 2.118 | 0.307 | 35633 | 109.470 | 1.170
Glu | 129.116 | 1 | 1.748 | 0.321 | 48458 | 155.431 | 0.831
Gly | 57.052 | 5 | 2.118 | 0.124 | 98983 | 122.840 | 0.464
His | 137.141 | 13 | 2.609 | 0.173 | 27675 | 47.910 | 2.862
Ile | 113.159 | 7 | 1.488 | 0.285 | 74768 | 213.090 | 0.531
Leu | 113.159 | 7 | 1.463 | 0.276 | 116941 | 322.560 | 0.351
Lys | 128.174 | 10 | 1.856 | 0.276 | 40135 | 110.584 | 1.159
Met | 131.193 | 7 | 1.782 | 0.284 | 20968 | 59.610 | 2.201
Phe | 147.177 | 11 | 2.169 | 0.190 | 56511 | 107.242 | 1.372
Pro | 97.117 | 4 | 2.222 | 0.110 | 54555 | 60.167 | 1.614
Ser | 87.078 | 4 | 1.978 | 0.141 | 66612 | 93.593 | 0.930
Thr | 101.105 | 6 | 2.069 | 0.178 | 68557 | 121.726 | 0.831
Trp | 186.213 | 14 | 2.687 | 0.200 | 21118 | 42.340 | 4.398
Tyr | 163.176 | 11 | 2.400 | 0.184 | 48972 | 90.250 | 1.808
Val | 99.133 | 4 | 1.622 | 0.241 | 95564 | 230.082 | 0.431

Table 1. Properties of the amino acids used in the present study. MX is the average residue mass (without water). BX gives Davis' biosynthetic steps37. ∆minX (deg) is the optimal bin angle determined by the MISE method17. PmaxX corresponds to the peak of the Ramachandran distribution PX(φ, ψ). NX is the number of points used for determining PX(φ, ψ). WX = PmaxX × NX (with PmaxX expressed as a fraction) is an estimator of the maximum possible number of observations at the most frequent conformation. IX = MX/WX is the replacement inertia.
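The binning step can be illustrated with a short Python 3 sketch (Python being the language of the original analysis). It is not the authors' MISE.py: it assumes square bins (Δ = Δφ = Δψ), scans only bin counts k per axis so that Δ = 360°/k tiles the map exactly, and evaluates the two-variable Shimazaki & Shinomoto cost C(Δ) = (2n̄ − v)/Δ⁴ described in the Methods. The helper names `ss_cost` and `optimal_delta` and the synthetic helix-like (φ, ψ) sample are hypothetical. The last lines recompute the NX-weighted average of the ∆minX column of Table 1 and derive the 190-bin grid from it, assuming the integer part of 360°/Δ is kept.

```python
import math

import numpy as np


def ss_cost(counts: np.ndarray, delta: float) -> float:
    """Two-variable Shimazaki-Shinomoto cost C(delta) = (2*mean - var) / delta**4.

    `counts` holds the number of observations per bin; np.var uses the 1/N
    (biased) variance, matching the definition of v in the Methods.
    """
    return (2.0 * counts.mean() - counts.var()) / delta**4


def optimal_delta(phi: np.ndarray, psi: np.ndarray, bins_per_axis) -> float:
    """Return the square-bin width (degrees) that minimizes the cost."""
    best_delta, best_cost = math.nan, math.inf
    for k in bins_per_axis:
        delta = 360.0 / k  # assumption: only grids that tile [-180, 180] exactly
        counts, _, _ = np.histogram2d(
            phi, psi, bins=k, range=[[-180.0, 180.0], [-180.0, 180.0]]
        )
        cost = ss_cost(counts, delta)
        if cost < best_cost:
            best_delta, best_cost = delta, cost
    return best_delta


# Hypothetical synthetic sample clustered around the right-handed alpha-helix region.
rng = np.random.default_rng(0)
phi = rng.normal(-63.0, 10.0, 5000)
psi = rng.normal(-43.0, 10.0, 5000)
print(f"optimal delta on synthetic data: {optimal_delta(phi, psi, range(10, 181, 10)):.2f} deg")

# Delta_min_X (deg) and N_X, copied from Table 1.
TABLE1 = {
    "Ala": (1.176, 113609), "Arg": (1.593, 45373), "Asn": (2.535, 46573),
    "Asp": (2.169, 56963), "Cys": (2.951, 15823), "Gln": (2.118, 35633),
    "Glu": (1.748, 48458), "Gly": (2.118, 98983), "His": (2.609, 27675),
    "Ile": (1.488, 74768), "Leu": (1.463, 116941), "Lys": (1.856, 40135),
    "Met": (1.782, 20968), "Phe": (2.169, 56511), "Pro": (2.222, 54555),
    "Ser": (1.978, 66612), "Thr": (2.069, 68557), "Trp": (2.687, 21118),
    "Tyr": (2.400, 48972), "Val": (1.622, 95564),
}
weighted = sum(d * n for d, n in TABLE1.values()) / sum(n for _, n in TABLE1.values())
bins = int(360.0 // weighted)  # integer part of 360 / weighted average
delta_used = 360.0 / bins
print(f"{weighted:.3f} deg -> {bins} bins of {delta_used:.3f} deg")
# prints: 1.887 deg -> 190 bins of 1.895 deg
```

Because the cost is evaluated on the actual bin counts, the same scan applied to each residue's (φ, ψ) sample would give one ∆minX per amino acid, which is the role the first column of Table 1 plays.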
This high resolution was possible because at least 84% of the structures reported in the Protein Data Bank (PDB) were obtained during the last decade alone, most of them at atomic resolution. Figure 1 shows the 3D plots of the twenty Ramachandran distributions determined for the present study; the dihedral angles are given in degrees, and the percentage probability per bin is given on a logarithmic scale. All plots have the same height to facilitate comparison. Larger plots are included in Supplementary Figs S1–S20. Although most distributions look similar to one another, there are some key differences. The probability distribution of glycine is highly symmetrical and occupies all the allowed regions of the Ramachandran map. Glycine is the only residue with a maximum at the left-handed α-helix conformation, with a peak almost as high as the one in the α-helix region; these features are a consequence of its lack of a side chain21. On the other hand, proline (an imino acid) has two highly populated states, with a slightly higher probability at the PPII conformation than at the α-helix conformation. It belongs to the set of structurally restricted amino acids {Ile, Pro, Thr, Val}, which have an extremely low probability of occupying the right-hand side of the Ramachandran map. Indeed, the corresponding plots (Fig. 1) show few points within quadrants I and IV (φ > 0). The conformational restrictions of proline arise from its pyrrolidine ring, whose flexibility is coupled to the backbone22. Isoleucine, threonine, and valine are the only amino acids with C-β branching, which makes them bulkier near the protein backbone than the rest of the amino acids23. They also have a local maximum within the β-sheet region (shown as red shaded peaks in Fig. 1), a feature shared only with the three aromatic residues (Phe, Tyr, Trp) and Leu.
The remaining amino acids occupy the allowed regions in a generic fashion20, 24, and their distributions agree with the original explanation by Ramachandran and co-workers in terms of steric clashes25. All these observations concern qualitative aspects of the distributions. A systematic comparison of the twenty Ramachandran distributions, however, requires a quantitative evaluation of their similarities. In the following subsection, we present a distance matrix that accounts for the dissimilarities between the secondary-structural trends of amino acids.

Secondary-structural vs BLOSUM replacements. A quantitative assessment of the similarities between the twenty distributions PX(φ, ψ) requires a distance measure. We used the city-block distance (the sum of absolute bin-by-bin differences), which is suitable for assessing differences between discrete frequency distributions. It gives more weight to the most probable dihedral conformations of the Ramachandran distributions. Each amino acid X has a set of twenty distances, DX, including the distance to itself (in which case ‖PX − PX‖ = 0):

$$D_X = \{\lVert P_X - P_{\mathrm{Ala}}\rVert, \lVert P_X - P_{\mathrm{Arg}}\rVert, \ldots, \lVert P_X - P_{\mathrm{Tyr}}\rVert, \lVert P_X - P_{\mathrm{Val}}\rVert\} \qquad (1)$$

The most plausible secondary-structural replacement for X is the amino acid Y with the smallest positive distance to X, i.e. the minimum positive value in the set of distances, min+{DX}. The fact that min+{DX} = ‖PX − PY‖ does not necessarily imply that min+{DY} = ‖PY − PX‖. In other words, structural replacement is not always reciprocal: if Y is the replacement of X, we denote this by X → Y, and in the case of a reciprocal replacement we write X ↔ Y. The secondary-structural distance matrix between the amino acids is shown in Fig. 2, where the smallest distance is represented in yellow, the largest in blue, and intermediate values in green.
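A minimal Python sketch of this replacement rule, assuming each PX is stored as a flattened NumPy array of bin probabilities. The labels "A" to "D" and their four-bin distributions are hypothetical toy data, not values from PGD; the distance is computed directly as the sum of absolute bin differences, which is what SciPy's `scipy.spatial.distance.cityblock` returns.

```python
import numpy as np


def cityblock(p: np.ndarray, q: np.ndarray) -> float:
    """City-block (L1) distance between two binned distributions."""
    return float(np.abs(p - q).sum())


def nearest_replacement(dists: dict) -> dict:
    """Map each label X to the label Y attaining min+{D_X}.

    The zero self-distance is skipped by excluding X itself, which assumes
    that no two distinct labels share an identical distribution.
    """
    out = {}
    for x, px in dists.items():
        candidates = {y: cityblock(px, py) for y, py in dists.items() if y != x}
        out[x] = min(candidates, key=candidates.get)
    return out


def replacement_arrows(nn: dict) -> list:
    """Write 'X <-> Y' for reciprocal nearest neighbours and 'X -> Y' otherwise."""
    arrows = []
    for x, y in nn.items():
        if nn[y] == x:
            if x < y:  # report each reciprocal pair only once
                arrows.append(f"{x} <-> {y}")
        else:
            arrows.append(f"{x} -> {y}")
    return arrows


# Hypothetical four-bin distributions (each sums to 1).
toy = {
    "A": np.array([0.50, 0.30, 0.10, 0.10]),
    "B": np.array([0.45, 0.35, 0.10, 0.10]),
    "C": np.array([0.10, 0.10, 0.40, 0.40]),
    "D": np.array([0.20, 0.40, 0.20, 0.20]),
}
print(replacement_arrows(nearest_replacement(toy)))
# prints: ['A <-> B', 'C -> D', 'D -> B']
```

Here A and B form a reciprocal pair, like Asn ↔ Asp, whereas D points to B without B pointing back, which is the non-reciprocal case written as X → Y (as in Met → Leu).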
We found open subsets using a nearest-neighbor criterion: every element of an open subset has exactly the remaining elements of that subset as its nearest neighbors (the procedure is explained in the Methods section). For instance, the simplest open subset consists of two elements, each of which is the other's closest element, i.e. Dmin(PX, PY) = Dmin(PY, PX) or, equivalently, X ↔ Y. We found the following open sets (Fig. 3): a five-member set that includes two two-member subsets, SI = {{Arg, Lys}, {Glu, Gln}, Leu}, in yellow; a three-member set containing a two-member set, SII = {Trp, {Phe, Tyr}}, in green; and two two-member sets, SIII = {Val, Ile} and SIV = {Asn, Asp}, in cyan and magenta, respectively. Within this topology, Met appears as a boundary element of SI: Fig. 3 shows that Met's first five neighbors are exactly the elements of SI. In turn, every residue in SI except Glu has Met as its fifth neighbor; Glu has Ala closer, a proximity that may result from Ala and Glu being the strongest α-helix formers, as their respective PmaxX values indicate (Table 1). SI includes residues with saturated aliphatic side chains, while SII contains the aromatic residues. Adjacent to these two major sets, we found residues sharing their physicochemical characteristics, as shown by their short distances to the main groups in the distance matrix (Fig. 2). Specifically, four residues have their nearest neighbor within a major open set: Ala has its first neighbor in SI, whereas His, Thr, and Cys have their first neighbor in SII. Amino acids outside an open set or its boundary were considered structurally idiosyncratic: Ala, Cys, His, Gly, Ser, Pro, and Thr. Gly and Pro are the farthest from any other residue, as the last column of Fig. 3 shows; indeed, these amino acids populate the Ramachandran map in a unique way.
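The nearest-neighbor criterion can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: a candidate set is an element X together with its k closest elements, and the set is accepted only if every member's k closest elements are exactly the other members. The five labels and the symmetric distance table are hypothetical; with them, the sketch recovers a nested structure analogous to SII = {Trp, {Phe, Tyr}}.

```python
def ranked_neighbors(dist: dict, labels: list) -> dict:
    """For each label, the other labels sorted by increasing distance."""
    return {
        x: sorted((y for y in labels if y != x), key=lambda y: dist[x][y])
        for x in labels
    }


def open_subsets(dist: dict, labels: list) -> set:
    """All proper subsets whose members have each other as nearest neighbours."""
    nbrs = ranked_neighbors(dist, labels)
    found = set()
    for x in labels:
        for k in range(1, len(labels) - 1):  # k neighbours -> candidate of size k + 1
            cand = frozenset([x, *nbrs[x][:k]])
            if all(set(nbrs[y][:k]) == cand - {y} for y in cand):
                found.add(cand)
    return found


# Hypothetical symmetric distances between five labels.
PAIRS = {
    ("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 2.5, ("a", "d"): 9.0,
    ("b", "d"): 8.0, ("c", "d"): 7.0, ("d", "e"): 1.5, ("a", "e"): 10.0,
    ("b", "e"): 10.0, ("c", "e"): 9.0,
}
labels = ["a", "b", "c", "d", "e"]
dist = {x: {} for x in labels}
for (x, y), d in PAIRS.items():
    dist[x][y] = dist[y][x] = d

print(sorted(sorted(s) for s in open_subsets(dist, labels)))
# prints: [['a', 'b'], ['a', 'b', 'c'], ['d', 'e']]
```

The pair {a, b} nested inside {a, b, c} mirrors {Phe, Tyr} inside SII, while {d, e} plays the role of an isolated two-member set such as SIV.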
The Ramachandran distribution of glycine is spread over all the allowed regions, whereas Pro is the most structurally restricted residue. Alanine's peak probability at the α-helix conformation (PmaxAla = 0.437%, Table 1) is about twice the average peak of the remaining residues (average PmaxX over X ≠ Ala = 0.214%). The Ramachandran distribution of Thr has four peaks around the β and π regions, unlike any other residue, including the other C-β branched amino acids (Fig. 1). Although Thr is chemically similar to Ser26, they have different structural propensities. According to our distance matrix (Fig. 2), Thr is closer to Tyr and Phe, whereas Ser is closer to His and Arg. A recent study shows that phosphorylation of Ser increases its propensity to form PPII, whereas phosphorylation of Thr has the opposite effect27. This result indicates that Ser and Thr are far from ideal secondary-structural replacements for each other. In summary, our classification reflects the intrinsic structural trends of amino acids; in particular, the SI set and its adjacent elements Met and Ala are the same α-formers found by Fujiwara et al.28. On the same scale, the aromatic set SII, its adjacent elements (Cys, Thr), and SIII are β-formers. The remaining amino acids are turn/bend formers, including SIV, Gly, Ser, and Pro, most of which have the lowest PmaxX values in Table 1. More importantly, an unexpected pattern emerged: our structurally similar pairs of amino acids match most of the BLOSUM pair replacements29, shown as shaded boxes in Fig. 3 (more details about the substitution matrices are given in the Methods section). Our list of structural replacements is: Asn ↔ Asp, Phe ↔ Tyr, Lys ↔ Arg, Gln ↔ Glu, Ile ↔ Val, and Met → Leu. In BLOSUM matrices, Thr and Ser are also replacements, whereas Gly, Pro, Cys, His, and Ala are idiosyncratic residues in all BLOSUM matrices.
In general, our set of structurally similar amino acids coincides with most canonical residue substitutions given by scoring matrices such as BLOSUM62 and BLOSUM10029, and with consensus replacements30. This is a remarkable finding, considering the extremely low probability of randomly finding six out of seven replacement pairs: less than one in 681 million, as detailed in the Methods section. Our result therefore reveals an underlying correlation between mutation matrices and structural propensities, and the replacement rules implied by the secondary-structure distance (Fig. 2) may be used directly for exploring structural amino acid replacements in peptide design strategies. We conclude that, during evolution, mutational replacements occurred between structurally similar amino acids; that is, mutations followed a process that privileges structure and hence preserves function. BLOSUM and PAM substitution matrices, however, provide additional information about the mutational trends of amino acids. The diagonal of these matrices determines how easily an amino acid can be replaced: a large value means more resistance to change. Our distance matrix (Fig. 2), by contrast, has a diagonal of zeros. To study mutability, we therefore explored a parameter that combines the statistical information in PmaxX with a basic extensive property.

Figure 3. Rows ordered according to the city-block distance. Open sets are indicated by the same color code used in Fig. 1. The shaded boxes contain the BLOSUM100 pair replacements. The procedure for determining an open set consists of finding rows with the same set of first neighbors. For instance, the first neighbor of Arg (top row) is Lys; after placing the Lys row under the top row, we see that they share the same seven first neighbors (up to Trp). The third row corresponds to Arg's second neighbor, Glu, which also shares the same first neighbors with the previous rows up to Trp. The fourth row corresponds to Arg's third neighbor, Gln, whose fifth neighbor is Ala, unlike the previous rows. The fifth row corresponds to Arg's fourth neighbor, Leu, which has all the previous rows as its first neighbors. In this way, the yellow box includes those elements whose first four neighbors are completely contained within the set. Methionine is a frontier element of this set: its first five neighbors are exactly the elements of SI; however, Glu does not include Met within its first five neighbors, and for that reason Met is not contained in the set. The remaining open sets SII to SIV were obtained in the same way. Note that Pro and Gly are the farthest residues from any other, as a consequence of the uniqueness of their structural propensities.

Molecular mass and optimum evolutionary cost. Molecular mass is a fundamental extensive property that might have played a central role in shaping the current protein landscape. Previously, our group revealed a very high correlation (R = 0.98) between mass and the electronic energy of amino acids, excluding the two sulfur-containing side chains31. In the present study, we found a complex relationship between the amino acid mass MX and the structural trends, as captured by the probability at the most frequent conformational state, PmaxX; this quantity is given by the highest peak of each Ramachandran distribution, max(PX(φ, ψ)). PmaxX corresponds to the most frequent conformation and is therefore an indicator of structural persistence32. The α-helix conformation is the highest peak for all amino acids except proline, with alanine at the top as the strongest helix former.
While mass has an overall poor correlation with PmaxX (R = 0.05), we identified two main and opposite trends delimited by separate ranges of PmaxX: (a) PmaxX > 0.200% defines the set of strong helix formers {Ala, Glu, Gln, Ile, Met, Leu, Lys, Arg, Val} (in descending order), with a negative correlation of R = −0.61; and (b) PmaxX ≤ 0.200% defines the weak helix formers {Trp, Asp, Phe, Tyr, Thr, His, Cys, Asn, Ser, Gly, Pro}, with a positive correlation of R = 0.76. The small set of C-β branched amino acids ({Ile, Thr, Val}) plus proline shows a correlation of R = 0.78 between mass and PmaxX. After these four elements are excluded from the two main sets, the respective correlations rise to R = −0.87 for the strong helix formers and R = 0.87 for the weak helix formers. In strong helix formers, the negative correlation between PmaxX and molecular mass indicates that light side chains have a better chance of forming an α-helix than heavy ones. These three correlations reveal a direct involvement of molecular mass in the α-helical propensities of amino acids. A recent observation by Lehmann et al. reports a negative correlation of both the background frequency and the codon degeneracy of amino acids with mass33. Seligmann had already observed that the evolutionary rate of amino acid replacements correlates negatively with mass34. Accordingly, heavier amino acids are less frequent, which suggests that genomes preserve a fundamental distribution governed by simple energetics. Inverse correlations between the average amino acid biosynthetic cost and the levels of gene expression are consistent with natural selection to minimize costs35. Seligmann also shows a positive correlation (R = 0.80) between the molecular mass MX and the total energetic cost per amino acid (in ATPs)34, as reported by Akashi & Gojobori36. According to Lehmann et al., highly expressed proteins tend to use amino acids with relatively low synthetic costs33. Therefore, heavy amino acids are less frequent because they are biosynthetically more expensive. We found further confirmation of this statement: the molecular mass grows with the number of biosynthetic steps, as shown in Fig. 4. The values proposed by Davis37 are included in Table 1 as BX. The number of biosynthetic steps has been proposed as a natural way of determining the evolutionary history of amino acids38, and the same may hold for their molecular mass. We found a correlation of R = 0.64 between mass and biosynthetic steps, which rises to R = 0.88 after the outliers {Asn, Asp, Gln, Glu} are excluded (Fig. 4).

In summary, we found a high piecewise correlation between the molecular mass and the probability at the most frequent conformational state (PmaxX). We also found a high correlation between mass and the number of biosynthetic steps (BX). These correlations are consistent with the fact that evolution favors energetically optimal costs34, 39. Thus, in the search for a physical quantity that can explain amino acid mutability, mass is a natural choice as a fundamental measure of energetic cost.

Figure 4. Correlation between the molecular mass of the amino acids MX and their energetic cost, as accounted for by the number of biosynthetic steps BX proposed by Davis37. The outliers {Asn, Asp, Gln, Glu} are excluded from the Pearson correlation and from the linear fit.

Mass over the frequency at the most probable conformation correlates with mutability. The background frequency or natural abundance of amino acids, NX, may be indicative of their evolutionary age: greater abundance reflects an early adoption in molecular evolution40. The values of NX were obtained from the PGD 1.1 database (Table 1).
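The Pearson coefficients quoted in the previous subsection can be recomputed from Table 1 with a few lines of standard-library Python 3. This is a sketch, not the authors' script: it uses the rounded values printed in Table 1 and the 0.200% threshold from the text, so the last digit may differ slightly from the reported coefficients. The helper names `pearson` and `r_mass_vs` are hypothetical.

```python
import math

# Table 1 columns: residue -> (M_X in Da, B_X biosynthetic steps, Pmax_X in %)
TABLE1 = {
    "Ala": (71.079, 4, 0.437), "Arg": (156.188, 10, 0.265), "Asn": (114.104, 2, 0.156),
    "Asp": (115.089, 1, 0.192), "Cys": (103.139, 5, 0.173), "Gln": (128.131, 2, 0.307),
    "Glu": (129.116, 1, 0.321), "Gly": (57.052, 5, 0.124), "His": (137.141, 13, 0.173),
    "Ile": (113.159, 7, 0.285), "Leu": (113.159, 7, 0.276), "Lys": (128.174, 10, 0.276),
    "Met": (131.193, 7, 0.284), "Phe": (147.177, 11, 0.190), "Pro": (97.117, 4, 0.110),
    "Ser": (87.078, 4, 0.141), "Thr": (101.105, 6, 0.178), "Trp": (186.213, 14, 0.200),
    "Tyr": (163.176, 11, 0.184), "Val": (99.133, 4, 0.241),
}
MASS, STEPS, PMAX = 0, 1, 2  # column indices


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_mass_vs(column: int, residues) -> float:
    """Pearson R between M_X and another Table 1 column over a subset of residues."""
    return pearson([TABLE1[r][MASS] for r in residues], [TABLE1[r][column] for r in residues])


strong = [r for r, row in TABLE1.items() if row[PMAX] > 0.200]  # strong helix formers
weak = [r for r, row in TABLE1.items() if row[PMAX] <= 0.200]   # weak helix formers
OUTLIERS = {"Asn", "Asp", "Gln", "Glu"}                          # excluded in Fig. 4
without_outliers = [r for r in TABLE1 if r not in OUTLIERS]

print(f"M vs Pmax, strong helix formers: {r_mass_vs(PMAX, strong):+.2f}")
print(f"M vs Pmax, weak helix formers:   {r_mass_vs(PMAX, weak):+.2f}")
print(f"M vs B, all residues:            {r_mass_vs(STEPS, list(TABLE1)):+.2f}")
print(f"M vs B, without outliers:        {r_mass_vs(STEPS, without_outliers):+.2f}")
```

The printed coefficients should be close to the R = −0.61, 0.76, 0.64, and 0.88 quoted in the text; small differences come from the rounding of PmaxX in Table 1.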
The quantity WX = PmaxX × NX is an estimator of the maximum number of observations at the most frequent conformation; WX thus combines the probability at the most probable conformation with the background frequency. In the previous section we showed that an amino acid is less likely to be replaced if it is more energetically expensive, so mass directly measures the resistance to change. Additionally, less frequent amino acids are also less replaceable, so the resistance to change varies inversely with frequency. Under these considerations, we define a "replacement inertia" as the mass MX weighted by WX: IX = MX/WX. It summarizes the energetic cost per number of observations at the most probable conformation. We hypothesize that IX might reflect the mutability of amino acids, i.e. the diagonal of substitution matrices (see the Methods for more details).

Figure 5. Pearson correlation coefficients between the replacement inertia IX (Table 1) and the mutability of thirty replacement matrices. Alignment-derived matrices are shown in blue, force-field-derived matrices in purple, and the genetic-code-derived matrix in green. See Supplementary Table S1 for the abbreviations.

To test whether IX reflects the mutability of amino acids, we selected thirty replacement matrices reported in the AAindex41: twenty-seven built from sequence alignments (including a selection of six PAM and eight BLOSUM matrices), two crafted from force fields (THREADER and SAUSAGE)42, and one obtained from replacements at the genetic-code level43. Supplementary Table S1 lists the matrices used in our survey. We computed the Pearson correlation coefficient between IX and each mutability, as shown in Fig. 5, where correlations with alignment-derived matrices are colored in blue, those with force-field-derived matrices in purple, and the one with the genetic-code-based matrix in green.

We found a very strong average correlation of R30 = 0.85 between IX and the whole mutability set. This average can be explained by the strong correlation between IX and the mutability of the matrices derived from sequence alignments, which have values of R > 0.78, as Fig. 5 shows. For the BLOSUM family, R values ranged between 0.90 and 0.96, with an average correlation of RB = 0.92. For PAM matrices, the correlation was lower, with an average of RP = 0.82 for the six PAM matrices included in our survey. On the other hand, the correlation between IX and the mutability of the THREADER substitution matrix was the lowest we found, RTHREADER = 0.52. The second lowest correlation was with the matrix based on the genetic code (RBENNER = 0.64). The other force-field-derived matrix gave a correlation of RSAUSAGE = 0.68. These low correlations may have an interesting explanation: force-field-based substitution matrices do not include evolutionary information, whereas the BENNER matrix assumes that the genetic code is the only determinant of amino acid substitutions. As a consequence, the underlying factors controlling these matrices are poorly reflected in IX. We therefore conclude that the very high correlation between IX and the mutability of matrices derived from sequence alignments suggests that molecular mass, abundance, and the most probable secondary-structure conformation may have played a decisive role in shaping the molecular evolution of proteins. However, how significant is an average correlation of R = 0.85 between IX and the mutability set?
We evaluated the correlation coefficients between the mutabilities of all the substitution matrices, which yields a total of 430 correlations for the thirty matrices considered. The average of these correlations is R430 = 0.84. This value differs little from R30, which means that IX describes amino acid mutability about as well as the accepted mutation matrices describe one another. The correlation matrix with significance levels for IX and the mutability of the whole set of matrices is shown in Supplementary Fig. S1. An excerpt of this plot is shown in Fig. 6, which includes the following matrices: BLOSUM30, BLOSUM62, BLOSUM100, PAM40, PAM160, and PAM250. This plot reveals that the correlations between PAM and BLOSUM matrices fall between 0.70 and 0.83. As expected, correlations between matrices of the same family are higher, up to 0.96 for BLOSUM and up to 0.97 for PAM. Surprisingly, IX correlates better with both matrix families simultaneously than the two families correlate with each other. This observation holds for the eight BLOSUM and six PAM matrices included in our study, as shown in Supplementary Fig. S21.

Figure 6. Correlation matrix plot with significance levels between the replacement inertia (IX) and the mutability of a representative set of BLOSUM and PAM matrices. The lower triangle contains the bivariate scatter plots with a fitted smooth line. The upper triangle shows the Pearson correlation and its significance level (as stars), with p-values of 0.001 (***), 0.01 (**), and 0.05 (*). This plot was generated with the PerformanceAnalytics package in R57. The correlation matrix for the complete mutability set is plotted in Supplementary Fig. S1.
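To illustrate how IX is compared with a mutability scale, the Python 3 sketch below recomputes IX = MX/(PmaxX × NX) from Table 1 and correlates it with the diagonal of the standard NCBI BLOSUM62 matrix (its integer self-substitution scores, taken here as one example of a mutability scale). This is not the authors' pipeline, which used thirty AAindex matrices; the helper names are hypothetical, and with these rounded inputs the coefficient should come out at about 0.91, inside the 0.90–0.96 range reported for the BLOSUM family.

```python
import math

# Table 1 columns: residue -> (M_X in Da, Pmax_X in %, N_X)
TABLE1 = {
    "Ala": (71.079, 0.437, 113609), "Arg": (156.188, 0.265, 45373),
    "Asn": (114.104, 0.156, 46573), "Asp": (115.089, 0.192, 56963),
    "Cys": (103.139, 0.173, 15823), "Gln": (128.131, 0.307, 35633),
    "Glu": (129.116, 0.321, 48458), "Gly": (57.052, 0.124, 98983),
    "His": (137.141, 0.173, 27675), "Ile": (113.159, 0.285, 74768),
    "Leu": (113.159, 0.276, 116941), "Lys": (128.174, 0.276, 40135),
    "Met": (131.193, 0.284, 20968), "Phe": (147.177, 0.190, 56511),
    "Pro": (97.117, 0.110, 54555), "Ser": (87.078, 0.141, 66612),
    "Thr": (101.105, 0.178, 68557), "Trp": (186.213, 0.200, 21118),
    "Tyr": (163.176, 0.184, 48972), "Val": (99.133, 0.241, 95564),
}
# Diagonal (self-substitution scores) of the NCBI BLOSUM62 matrix.
BLOSUM62_DIAG = {
    "Ala": 4, "Arg": 5, "Asn": 6, "Asp": 6, "Cys": 9, "Gln": 5, "Glu": 5,
    "Gly": 6, "His": 8, "Ile": 4, "Leu": 4, "Lys": 5, "Met": 5, "Phe": 6,
    "Pro": 7, "Ser": 4, "Thr": 5, "Trp": 11, "Tyr": 7, "Val": 4,
}


def inertia(mass: float, pmax_percent: float, n_obs: int) -> float:
    """I_X = M_X / W_X, with W_X = Pmax_X * N_X and Pmax_X converted from % to a fraction."""
    return mass / (pmax_percent / 100.0 * n_obs)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


IX = {r: inertia(*row) for r, row in TABLE1.items()}
residues = sorted(IX)
r_blosum62 = pearson([IX[r] for r in residues], [BLOSUM62_DIAG[r] for r in residues])
print(f"I_Ala = {IX['Ala']:.3f}, I_Trp = {IX['Trp']:.3f}")
print(f"R(I_X, BLOSUM62 diagonal) = {r_blosum62:.2f}")
```

Swapping in the diagonal of any other 20 × 20 matrix gives the corresponding entry of Fig. 5, provided the residue order is kept aligned as done here through the shared dictionary keys.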
Our results indicate that amino acid mutability may be an evolutionary invariant that depends on the biosynthetic cost per amino acid and on the background frequency. These observations might have relevant consequences for future improvements of current scoring matrices, as well as for structure prediction and design.

Conclusions

Our study provides compelling evidence about the physicochemical nature of substitution matrices. Taylor's early work44 on evolutionary biochemistry45 proposes an integrative amino acid classification scheme based on Dayhoff's PAM matrix and properties such as volume and polarity. In a complementary way, our approach brings evolutionary concepts closer to physicochemical properties, which might help in treating proteins as integrated physical and historical wholes. The main findings of the present work agree with accepted ideas about the molecular evolution of proteins. First, we claim that secondary-structural similarities largely resemble the canonical replacements given by substitution matrices (Figs 2 and 3). We interpret this result as a manifestation of an underlying structural preservation principle, according to which the interchangeability of amino acids is largely determined by their secondary-structural similarity. This might be a consequence of the fact that structurally less important parts of a protein evolve faster than more important ones, so that conservative substitutions occur more frequently in evolution than more disruptive ones. Our result agrees with the view of Koonin & Wolf, according to which the primary causes of protein evolution could have more to do with fundamental principles of protein folding than with unique biological functions46. Second, we showed that amino acid mutability is correlated with the replacement inertia IX (Fig. 5).
Therefore, amino acid mutability depends on the biosynthetic cost, the most probable conformation, and the background frequency. Davis proposes that the timeline of genetically encoded amino acids correlates with the number of chemical reactions required to synthesize each amino acid37, 38, 47. Consequently, the correlation between mass and biosynthetic steps (Fig. 4) suggests that the mutability of amino acids might also reflect a timeline of protein evolution. The biosynthetic cost, structural preservation, and frequency distribution of amino acids all appear to have played a significant role in the molecular evolution of proteins. Indeed, two main selective factors determining the evolution of proteins are structural robustness against misfolding and energy-cost efficiency46, 48, 49. Protein synthesis is very error-prone in comparison to DNA replication, and hence many folding-recognition mechanisms seem to have evolved to minimize the costs of erroneous protein synthesis49. This energy-cost efficiency may explain why highly expressed proteins evolve slowly and at rates largely unrelated to their functions48.

We can summarize our two main findings in terms of the following optimal-efficiency principles: (a) amino acid interchangeability preserves secondary-structural propensity, and (b) the mutability of an amino acid depends directly on its biosynthetic energy cost and inversely on its frequency at the most probable conformation. We believe that these two principles are the underlying rules governing the observed amino acid substitutions. They provide a unified interpretation of mutation matrices beyond the purely statistical realm. Our results also indicate that amino acid mutability might be an invariant scale that differs little from one substitution matrix to another (Supplementary Fig. S21).
These results may offer a new understanding of the evolutionary processes determining the structure of proteins. Finally, the statistical similarities between secondary-structural propensities used here offer a viable methodology for systematically exploring structural amino acid replacements. For instance, one can determine a structural distance matrix restricted to the β-strand region, which may differ from that of the whole Ramachandran map. With this type of sectoral statistics, one can envision new rules for the design of polypeptide chains.

Methods

Data source. We calculated the Ramachandran distributions from the protein geometry database PGD 1.1, retrieved in June 201616. We selected crystallized protein geometries with a resolution of 2 Å or better, a maximum R-factor of 0.2, and a maximum R-free of 0.3. To avoid over-representation bias of some protein families, we used 7,398 proteins with a maximum sequence identity of 25%. A total of 1,153,791 residues were considered.

Data analysis. The statistical analysis of the present work was implemented in the Python 2.7 programming language50, 51. A Python routine extracts the observed (φ, ψ) values for each amino acid from the PGD database (PGDread.py). The 2D optimization was done with a routine that computes the cost function while changing the bin width equally for both dihedral variables, Δ = Δφ = Δψ (MISE.py). The Ramachandran distribution histograms were computed and plotted with the Matplotlib library (3DRamadistr.py)52. The city-block distance was taken from the SciPy package. A total of 600 lines of code were written for the complete analysis shown here. The Python code is available upon request.

Histogram optimization. Histograms are a type of non-parametric density estimate for which the number of parameters equals the number of data points53. A different approach uses analytic functions to obtain smooth distributions that minimize low-resolution and outlier effects54.
The discrete (histogram) representation of the joint probability distribution $P_X(\varphi_i, \psi_j)$ depends on the bin widths of the dihedral variables, i.e. Δφ and Δψ. A coarse bin size reduces data noise but may miss relevant details of the structural information. On the other hand, a very fine bin size may highlight underlying statistical noise. The mean integrated squared error (MISE) can be estimated from the data through a cost function C(Δ), and the histogram whose bin size minimizes the MISE is optimal17. This method guarantees that a substantial increase in the number of observations will further improve the accuracy of the histogram as a representation of the probability distribution. The main assumption underlying this method is that the distribution can be represented by a smooth continuous function. Previous work has shown that Ramachandran distributions satisfy this assumption11. We assumed a regular partitioning of the Ramachandran maps, i.e. the same bin size Δ for both dihedral variables: Δ = Δφ = Δψ. The cost function for two variables is therefore given by

$$C(\Delta) = \frac{2\bar{n} - v}{\Delta^{4}} \tag{2}$$

where the mean $\bar{n}$ and the variance $v$ of the number of occurrences over the $N$ bins are given, respectively, by $\bar{n} = \frac{1}{N}\sum_{i=1}^{N} n_i$ and $v = \frac{1}{N}\sum_{i=1}^{N} (n_i - \bar{n})^2$. The optimal bin width obtained for each amino acid is $\Delta_X$ (Table 1). We used the weighted average as the bin width for all the Ramachandran distributions: $\bar{\Delta} = \sum_{X=1}^{20} N_X \Delta_X \big/ \sum_{X=1}^{20} N_X$. From the obtained $\Delta_X$ values, $\bar{\Delta}$ = 1.887°, which was approximated by the integer fraction 360°/190 ≃ 1.895°; i.e. we used 190 bins in each angular coordinate, for a total of 190 × 190 = 36,100 bins.

Amino acid classification. We classified the amino acids according to the city-block (Manhattan) distance. Our grouping method takes advantage of the fact that a metric induces a topology on a set. Accordingly, we determined the topology induced by the city-block distance over the set of amino acids.
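The ranking step of this classification can be sketched with SciPy's `scipy.spatial.distance.cityblock`, the routine named in the Data analysis paragraph. This is a minimal sketch under stated assumptions: each amino acid's Ramachandran distribution is a normalized 2D array on a common grid, and groups are formed with a hypothetical mutual-first-neighbor rule, which is only one possible reading of the open-subset criterion; the authors' exact procedure (Figure 3) may differ. The function names and the toy maps are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cityblock

def ordered_lists(hists):
    """Map each amino acid to the others, sorted by increasing city-block distance.

    hists: dict of amino-acid name -> 2D array of bin probabilities (same shape).
    """
    names = sorted(hists)
    flat = {x: np.asarray(hists[x], dtype=float).ravel() for x in names}
    lists = {}
    for x in names:
        others = [y for y in names if y != x]
        # Python's sort is stable, so exact ties keep alphabetical order.
        lists[x] = sorted(others, key=lambda y: cityblock(flat[x], flat[y]))
    return lists

def mutual_first_neighbors(lists):
    """Hypothetical grouping rule: keep {X, Y} when each is the other's first neighbor."""
    pairs = set()
    for x, ranked in lists.items():
        y = ranked[0]
        if lists[y][0] == x:
            pairs.add(tuple(sorted((x, y))))
    return sorted(pairs)

if __name__ == "__main__":
    # Toy 2x2 "Ramachandran maps" (illustrative numbers, not PGD data).
    toy = {
        "A": [[0.5, 0.5], [0.0, 0.0]],
        "B": [[0.25, 0.75], [0.0, 0.0]],
        "C": [[0.0, 0.0], [0.5, 0.5]],
        "D": [[0.0, 0.0], [0.75, 0.25]],
    }
    lists = ordered_lists(toy)
    print(lists["A"])                     # ['B', 'C', 'D']
    print(mutual_first_neighbors(lists))  # [('A', 'B'), ('C', 'D')]
```

On real data the same functions would receive twenty 190 × 190 arrays, each divided by its total count so that distances compare shapes rather than sample sizes.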
Sorting the distances from a given element X to the remaining ones in increasing order determines an ordered list. Thus, in the present case we have twenty ordered lists, one for each amino acid. The intersection between the first neighbors of these lists gave us open subsets. An open subset consists of those elements such that, for every element within the subset, its neighbors belong to the same subset. Figure 3 reports the twenty ordered lists, with an example of how to obtain open sets.

Substitution matrices and mutability. The most common method of evaluating amino acid substitution patterns is through substitution matrices such as PAM55 or BLOSUM29. A typical substitution matrix has 20 × 20 elements, in which the off-diagonal pairwise scores (log-odds) represent the likelihood that one amino acid is substituted by another during protein evolution. The diagonal scores of the matrix are estimators of amino acid mutability: for each amino acid, a greater score implies a lower chance of being substituted, whereas a lower score implies a greater chance55, 56. We used a set of thirty substitution matrices reported in the AAindex41 and NCBI (ftp://ftp.ncbi.nih.gov/blast/matrices/) databases.

Probability of randomly finding six out of seven sets. Substitution matrices, such as BLOSUM62 and BLOSUM100, define seven replacement pairs of amino acids. Our structurally similar pairs coincide with six of them. We therefore need to assess the probability of correctly obtaining six out of seven pairs by chance. The probability of obtaining the first element of a pair is the number of elements of that pair (2) divided by the total number of elements (14). Then, the probability of finding its match is the number of pair elements still in the set (1) divided by the total remaining (13). Hence, the combined probability of randomly finding the first pair out of seven is $P_1 = \frac{2}{14} \times \frac{1}{13}$.
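The draw-by-draw arithmetic of this subsection, continued in the text for the remaining pairs, can be checked exactly with Python's `fractions.Fraction`. This sketch only reproduces the stated reasoning ($P_i = \frac{2}{m} \times \frac{1}{m-1}$, with $m$ elements still unassigned, starting from $m = 14$) and compares it with the equivalent closed-form product over $k = 2, \dots, 7$; the function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def sequential_pair_probability(n_pairs, n_found):
    """Probability that the first `n_found` draws all complete true pairs.

    Follows the text: with m elements still unassigned, picking an element of a
    given pair has probability 2/m and then its partner 1/(m - 1);
    m starts at 2 * n_pairs and drops by 2 after each completed pair.
    """
    p = Fraction(1)
    m = 2 * n_pairs
    for _ in range(n_found):
        p *= Fraction(2, m) * Fraction(1, m - 1)
        m -= 2
    return p

def closed_form_six_of_seven():
    """Equivalent product over k = 2..7 of 2 / (2k(2k - 1))."""
    return prod(Fraction(2, 2 * k * (2 * k - 1)) for k in range(2, 8))

if __name__ == "__main__":
    p = sequential_pair_probability(7, 6)
    print(p)                                # 1/681080400
    print(p == closed_form_six_of_seven())  # True
    print(f"{float(p):.3e}")                # 1.468e-09
```

Exact rational arithmetic avoids any rounding before the final conversion, so the denominator 681,080,400 is reproduced digit for digit.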
By a similar reasoning, the probability of obtaining a second pair is $P_2 = \frac{2}{12} \times \frac{1}{11}$, and so on. Therefore, the probability of simultaneously finding six out of seven pairs is $\prod_{i=1}^{6} P_i$ or, equivalently, $\prod_{k=2}^{7} \frac{2}{2k(2k-1)} = 1/681{,}080{,}400 \approx 1.468 \times 10^{-9}$. In other words, there is roughly a one-in-681-million chance of simultaneously obtaining six correct pairs from a set of seven pairs.

References
1. Sikosek, T. & Chan, H. S. Biophysics of protein evolution and evolutionary protein biophysics. Journal of The Royal Society Interface 11, 20140419 (2014).
2. Bloom, J. D., Labthavikul, S. T., Otey, C. R. & Arnold, F. H. Protein stability promotes evolvability. Proceedings of the National Academy of Sciences 103, 5869–5874 (2006).
3. Yu, J.-F. et al. Natural protein sequences are more intrinsically disordered than random sequences. Cellular and Molecular Life Sciences 15, 2949–2957 (2016).
4. Worth, C. L., Gong, S. & Blundell, T. L. Structural and functional constraints in the evolution of protein families. Nature Reviews Molecular Cell Biology 10, 709–720 (2009).
5. Levy, E. D., Erba, E. B., Robinson, C. V. & Teichmann, S. A. Assembly reflects evolution of protein complexes. Nature 453, 1262–1265 (2008).
6. Yang, Y. et al. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Briefings in Bioinformatics bbw129 (2016).
7. Orengo, C. A. & Thornton, J. M. Protein families and their evolution: a structural perspective. Annual Review of Biochemistry 74, 867–900 (2005).
8. Dokholyan, N. V. & Shakhnovich, E. I. Scale-free evolution. In Power Laws, Scale-Free Networks and Genome Biology, 86–105 (Springer, 2006).
9. Ramachandran, G. N. & Sasisekharan, V. Conformation of polypeptides and proteins. Advances in Protein Chemistry 23, 283–437 (1968).
10. Hollingsworth, S. A. & Karplus, P. A. A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins. Biomolecular Concepts 1, 271–283 (2010).
11. Ting, D. et al. Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical Dirichlet process model. PLoS Computational Biology 6, e1000763 (2010).
12. Levitt, M. Conformational preferences of amino acids in globular proteins. Biochemistry 17, 4277–4285 (1978).
13. Koehl, P. & Levitt, M. Structure-based conformational preferences of amino acids. Proceedings of the National Academy of Sciences 96, 12524–12529 (1999).
14. Hollingsworth, S. A., Berkholz, D. S. & Karplus, P. A. On the occurrence of linear groups in proteins. Protein Science 18, 1321–1325 (2009).
15. DeBartolo, J., Jha, A., Freed, K. F. & Sosnick, T. R. Local backbone preferences and nearest-neighbor effects in the unfolded and native states. In Protein and Peptide Folding, Misfolding, and Non-Folding, 79–98 (2012).
16. Berkholz, D. S., Krenesky, P. B., Davidson, J. R. & Karplus, P. A. Protein Geometry Database: a flexible engine to explore backbone conformations and their relationships to covalent geometry. Nucleic Acids Research 38, D320–D325 (2010).
17. Shimazaki, H. & Shinomoto, S. A method for selecting the bin size of a time histogram. Neural Computation 19, 1503–1527 (2007).
18. Shen, Y., Delaglio, F., Cornilescu, G. & Bax, A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. Journal of Biomolecular NMR 44, 213–223 (2009).
19. Shen, Y. & Bax, A. Protein structural information derived from NMR chemical shift with the neural network program TALOS-N. In Artificial Neural Networks, 17–32 (Springer, 2015).
20. Hovmöller, S., Zhou, T. & Ohlson, T. Conformations of amino acids in proteins. Acta Crystallographica Section D: Biological Crystallography 58, 768–776 (2002).
21. Ho, B. K. & Brasseur, R. The Ramachandran plots of glycine and pre-proline. BMC Structural Biology 5, 14 (2005).
22. Ho, B. K., Coutsias, E. A., Seok, C. & Dill, K. A. The flexibility in the proline ring couples to the protein backbone. Protein Science 14, 1011–1018 (2005).
23. Betts, M. J. & Russell, R. B. Amino acid properties and consequences of substitutions. Bioinformatics for Geneticists 317, 289 (2003).
24. Ho, B. K., Thomas, A. & Brasseur, R. Revisiting the Ramachandran plot: hard-sphere repulsion, electrostatics, and H-bonding in the α-helix. Protein Science 12, 2508–2522 (2003).
25. Ramachandran, G. N., Ramakrishnan, C. & Sasisekharan, V. Stereochemistry of polypeptide chain configurations. Journal of Molecular Biology 7, 95 (1963).
26. Bohórquez, H. J. et al. Electronic energy and multipolar moments characterize amino acid side chains into chemically related groups. The Journal of Physical Chemistry A 107, 10090–10097 (2003).
27. Kim, S.-Y., Jung, Y., Hwang, G.-S., Han, H. & Cho, M. Phosphorylation alters backbone conformational preferences of serine and threonine peptides. Proteins: Structure, Function, and Bioinformatics 79, 3155–3165 (2011).
28. Fujiwara, K., Toda, H. & Ikeguchi, M. Dependence of α-helical and β-sheet amino acid propensities on the overall protein fold type. BMC Structural Biology 12, 18 (2012).
29. Henikoff, S. & Henikoff, J. G. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences 89, 10915–10919 (1992).
30. Bordo, D. & Argos, P. Suggestions for “safe” residue substitutions in site-directed mutagenesis. Journal of Molecular Biology 217, 721–729 (1991).
31. Bohórquez, H. J., Cárdenas, C., Matta, C. F., Boyd, R. J. & Patarroyo, M. E. Methods in biocomputational chemistry: a lesson from the amino acids. In Quantum Biochemistry, 403–421.
32. Chatterjee, P. & Sengupta, N. Effect of the A30P mutation on the structural dynamics of micelle-bound α-synuclein released in water: a molecular dynamics study. European Biophysics Journal 41, 483–489 (2012).
33. Lehmann, J., Libchaber, A. & Greenbaum, B. D. Fundamental amino acid mass distributions and entropy costs in proteomes. Journal of Theoretical Biology 410, 119–124 (2016).
34. Seligmann, H. Cost-minimization of amino acid usage. Journal of Molecular Evolution 56, 151–161 (2003).
35. Raiford, D. W. et al. Do amino acid biosynthetic costs constrain protein evolution in Saccharomyces cerevisiae? Journal of Molecular Evolution 67, 621–630 (2008).
36. Akashi, H. & Gojobori, T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proceedings of the National Academy of Sciences 99, 3695–3700 (2002).
37. Davis, B. K. Evolution of the genetic code. Progress in Biophysics and Molecular Biology 72, 157–243 (1999).
38. Griffiths, G. Cell evolution and the problem of membrane topology. Nature Reviews Molecular Cell Biology 8, 1018–1024 (2007).
39. Guilloux, A. & Jestin, J.-L. The genetic code and its optimization for kinetic energy conservation in polypeptide chains. Biosystems 109, 141–144 (2012).
40. Brooks, D. J., Fresco, J. R., Lesk, A. M. & Singh, M. Evolution of amino acid frequencies in proteins over deep time: inferred order of introduction of amino acids into the genetic code. Molecular Biology and Evolution 19, 1645–1655 (2002).
41. Kawashima, S. & Kanehisa, M. AAindex: amino acid index database. Nucleic Acids Research 28, 374 (2000).
42. Dosztanyi, Z. & Torda, A. E. Amino acid similarity matrices based on force fields. Bioinformatics 17, 686–699 (2001).
43. Benner, S., Cohen, M. A. & Gonnet, G. H. Amino acid substitution during functionally constrained divergent evolution of protein sequences. Protein Engineering 7, 1323–1332 (1994).
44. Taylor, W. R. The classification of amino acid conservation. Journal of Theoretical Biology 119, 205–218 (1986).
45. Harms, M. J. & Thornton, J. W. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nature Reviews Genetics 14, 559–571 (2013).
46. Koonin, E. V. & Wolf, Y. I. Constraints, plasticity, and universal patterns in genome and phenome evolution. In Evolutionary Biology: Concepts, Molecular and Morphological Evolution, 19–47 (Springer, 2010).
47. Davis, B. K. Molecular evolution before the origin of species. Progress in Biophysics and Molecular Biology 79, 77–133 (2002).
48. Drummond, D. A., Bloom, J. D., Adami, C., Wilke, C. O. & Arnold, F. H. Why highly expressed proteins evolve slowly. Proceedings of the National Academy of Sciences of the United States of America 102, 14338–14343 (2005).
49. Drummond, D. A. & Wilke, C. O. The evolutionary consequences of erroneous protein synthesis. Nature Reviews Genetics 10, 715–724 (2009).
50. van Rossum, G. & de Boer, J. Linking a stub generator (AIL) to a prototyping language (Python). In Proceedings of the Spring 1991 EurOpen Conference, Tromsø, Norway, 229–247 (1991).
51. Python Software Foundation. Python language reference. URL http://www.python.org.
52. Hunter, J. D. Matplotlib: A 2D graphics environment. Computing in Science & Engineering 9, 90–95 (2007).
53. Shapovalov, M. V. & Dunbrack, R. L., Jr. Non-parametric statistical analysis of the Ramachandran map. In Biomolecular Forms and Functions: A Celebration of 50 Years of the Ramachandran Map, 76 (2013).
54. Lovell, S. C. et al. Structure validation by Cα geometry: φ, ψ and Cβ deviation. Proteins: Structure, Function, and Bioinformatics 50, 437–450 (2003).
55. Dayhoff, M. O. & Schwartz, R. M. A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure (Citeseer, 1978).
56. Valdar, W. S. Scoring residue conservation. Proteins: Structure, Function, and Bioinformatics 48, 227–241 (2002).
57. Peterson, B. G. et al. PerformanceAnalytics: Econometric tools for performance and risk analysis. R package version 1.4.3541 (2014).

Acknowledgements
We would like to thank Professor Mario Amzel for his insightful comments on the paper.

Author Contributions
C.F.S. and H.J.B. proposed the project and developed the methodology of the study. H.J.B. wrote the Python code. C.F.S. and H.J.B. carried out the computations and analyzed the data. M.E.P. supervised the project. H.J.B. wrote the manuscript, whose final version includes contributions by all authors.

Additional Information
Supplementary information accompanies this paper at doi:10.1038/s41598-017-08041-7.

Competing Interests: The authors declare that they have no competing interests.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2017

Supplementary material

Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements

Hugo J. Bohórquez^1,+,*, Carlos F. Suárez^1,2,3,+, and Manuel E. Patarroyo^1,4
^1 Fundación Instituto de Inmunología de Colombia, FIDIC, Biomathematics, Cra. 50 No. 26-00, Bogotá D. C., Colombia
^2 Universidad de Ciencias Aplicadas y Ambientales, UDCA, Bogotá D. C., Colombia
^3 Universidad del Rosario, Bogotá D. C., Colombia
^4 Universidad Nacional de Colombia, Bogotá D. C., Colombia
^+ Hugo J. Bohórquez and Carlos F. Suárez contributed equally to this work.

ABSTRACT
We use the Protein Geometry Database (PGD 1.1)1 to obtain the high-resolution Ramachandran distributions as 2D-binned probability histograms (Figures S1 to S20). The optimal bin area (1.895° × 1.895°) dividing the Ramachandran map was obtained with the method of Shimazaki & Shinomoto2. Figure S21 shows the correlation matrix plot, with significance levels, between the replacement inertia $I_X$ and the mutability of the full set of replacement matrices used in the present study (Table S1).

Figure S1. High-resolution Ramachandran distribution $P_{\mathrm{Ala}}(\varphi, \psi)$ of alanine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S2. High-resolution Ramachandran distribution $P_{\mathrm{Arg}}(\varphi, \psi)$ of arginine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S3. High-resolution Ramachandran distribution $P_{\mathrm{Asn}}(\varphi, \psi)$ of asparagine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S4. High-resolution Ramachandran distribution $P_{\mathrm{Asp}}(\varphi, \psi)$ of aspartic acid, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S5. High-resolution Ramachandran distribution $P_{\mathrm{Cys}}(\varphi, \psi)$ of cysteine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S6. High-resolution Ramachandran distribution $P_{\mathrm{Gln}}(\varphi, \psi)$ of glutamine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S7. High-resolution Ramachandran distribution $P_{\mathrm{Glu}}(\varphi, \psi)$ of glutamic acid, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S8. High-resolution Ramachandran distribution $P_{\mathrm{Gly}}(\varphi, \psi)$ of glycine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S9. High-resolution Ramachandran distribution $P_{\mathrm{His}}(\varphi, \psi)$ of histidine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S10. High-resolution Ramachandran distribution $P_{\mathrm{Ile}}(\varphi, \psi)$ of isoleucine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S11. High-resolution Ramachandran distribution $P_{\mathrm{Leu}}(\varphi, \psi)$ of leucine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S12. High-resolution Ramachandran distribution $P_{\mathrm{Lys}}(\varphi, \psi)$ of lysine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S13. High-resolution Ramachandran distribution $P_{\mathrm{Met}}(\varphi, \psi)$ of methionine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S14. High-resolution Ramachandran distribution $P_{\mathrm{Phe}}(\varphi, \psi)$ of phenylalanine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S15. High-resolution Ramachandran distribution $P_{\mathrm{Pro}}(\varphi, \psi)$ of proline, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S16. High-resolution Ramachandran distribution $P_{\mathrm{Ser}}(\varphi, \psi)$ of serine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S17. High-resolution Ramachandran distribution $P_{\mathrm{Thr}}(\varphi, \psi)$ of threonine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S18. High-resolution Ramachandran distribution $P_{\mathrm{Trp}}(\varphi, \psi)$ of tryptophan, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S19. High-resolution Ramachandran distribution $P_{\mathrm{Tyr}}(\varphi, \psi)$ of tyrosine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
Figure S20. High-resolution Ramachandran distribution $P_{\mathrm{Val}}(\varphi, \psi)$ of valine, derived from the PGD 1.1 database at a bin size of 1.895° × 1.895° (logarithmic scale).
[Figure S21: correlation matrix plot (image not reproduced). Variables, in order: the replacement inertia $I_X$ followed by the thirty matrices of Table S1, from BLOSUM30 to BENNER94. Pearson correlations between $I_X$ and each matrix's mutability, as legible in the extracted plot: BLOSUM30 0.90, BLOSUM40 0.92, BLOSUM50 0.91, BLOSUM62 0.91, BLOSUM70 0.92, BLOSUM80 0.90, BLOSUM90 0.92, BLOSUM100 0.96, PAM40 0.80, PAM80 0.83, PAM120 0.84, PAM160 0.83, PAM200 0.83, PAM250 0.80, VTML160 0.91, VTML200 0.86, OPTIMA 0.88, PET91 0.83, GONNET92 0.83, JOHNSON93 0.93, MIYA91 0.78, OVER92 0.91, VOGT95 0.83, PRLIC00 0.90, STROMA 0.81, CROOKS05 0.92, BLAKE01 0.88, SAUSAGE_P 0.68, THREAD_P 0.52, BENNER94 0.64.]

Figure S21. Correlation matrix plot with significance levels between the replacement inertia $I_X$ and the mutability of the full set of replacement matrices used in the present study (Table S1). The lower triangle contains the bivariate scatter plots with a fitted smooth line; the upper triangle shows the Pearson correlation coefficients with their significance levels (stars): p < 0.001 (***), p < 0.01 (**), p < 0.05 (*). This plot was generated with the PerformanceAnalytics package in R3. The abbreviations used in this plot are detailed in Table S1.
4 2 1 2 4 0 6 1 6 2 1 2 4 1 2 6 1 2 5 1 0 4 9 6 5 2 0 3 7 4 1 4 4 1 2 6 1 8 4 9 6 1 6 2 1 4 4 1 4 5 2 1 2 4 1 2 8 1 6 6 1 6 4 9 6 1 8 0 3 Table S1. Abbreviations used in the present study (left) and the corresponding description (center) of the set of substitution matrices with their respective source or AAindex code (right). Name Description AAindex Entry/Source BLOSUM30 The BLOSUM30 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ BLOSUM40 The BLOSUM40 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ BLOSUM50 The BLOSUM50 matrix HENS920104 BLOSUM62 The BLOSUM62 matrix HENS920102 BLOSUM70 The BLOSUM70 matrix HENS920103 BLOSUM80 The BLOSUM80 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ BLOSUM90 The BLOSUM90 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ BLOSUM100 The BLOSUM100 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ PAM40 The PAM40 matrix DAYM780302 PAM80 The PAM80 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ PAM120 The PAM120 matrix ALTS910101 PAM160 The PAM160 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ PAM200 The PAM200 matrix ftp://ftp.ncbi.nih.gov/blast/matrices/ PAM250 The PAM250 matrix DAYM780301 VTML160 The VTML160 matrix MUET020101 VTML200 The VTML250 matrix MUET020102 OPTIMA The OPTIMA matrix KANM000101 PET91 The 250 PAM PET91 matrix JOND920103 GONNET92 The mutation matrix for initially aligning GONG920101 JOHNSON93 Structure-based amino acid scoring table JOHM930101 MIYA91 Base-substitution-protein-stability matrix MIYS930101 OVER92 STR matrix from structure-based alignments OVEJ920101 VOGT95 Amino acid exchange matrix VOGG950101 PRLIC00 Homologous structure derived matrix PRLA000102 STROMA STROMA score matrix for the alignment of known distant homologs QUIB020101 CROOKS05 Substitution matrix computed from the Dirichlet Mixture Model CROG050101 BLAKE01 Matrix built from structural superposition data for identifying potential BLAJ010101 SAUSAGE P Amino acid similarity matrix based on the SAUSAGE force field DOSZ010101 THREAD P Amino acid similarity 
matrix based on the THREADER force field DOSZ010103 BENNER94 Genetic code matrix BENS940104 23/24 References 1. Berkholz, D. S., Krenesky, P. B., Davidson, J. R. & Karplus, P. A. Protein Geometry Database: a flexible engine to explore backbone conformations and their relationships to covalent geometry. Nucleic Acids Res. 38, D320–D325 (2010). 2. Shimazaki, H. & Shinomoto, S. A method for selecting the bin size of a time histogram. Neural Computation 19, 1503–1527 (2007). 3. Peterson, B. G. et al. Performanceanalytics: Econometric tools for performance and risk analysis. r package version 1.4. 3541 (2014). 24/24 Capítulo 5. Semi-empirical quantum evaluation of peptide – MHC class II binding González R, Suárez CF, Bohórquez HJ, Patarroyo MA, Patarroyo ME. Semi-empirical quantum evaluation of peptide–MHC class II binding. Chemical Physics Letters. 2017;668:29-34 La versión publicada del artículo puede ser consultada en: http://www.sciencedirect.com/science/article/pii/S0009261416309642 153 Semi-empirical quantum evaluation of peptide – MHC class II binding Ronald Gonzáleza,b,e, Carlos F. Suáreza,b,c,e, Hugo J. Bohórqueza,b,c, Manuel A. Patarroyoa,b, Manuel E. Patarroyoa,d,∗ aFundación Instituto de Inmunoloǵıa de Colombia (FIDIC), Bogotá D. C., Colombia bUniversidad del Rosario, Bogotá D. C., Colombia cUniversidad de Ciencias Aplicadas y Ambientales (UDCA), Bogotá D. C., Colombia dUniversidad Nacional de Colombia, Bogotá D. C., Colombia eBoth authors equally contributed as first author Abstract Peptide presentation by the major histocompatibility complex (MHC) is a key process for triggering a specific immune response. Studying peptide- MHC (pMHC) binding from a structural-based approach has potential for reducing the costs of investigation into vaccine development. This study involved using two semi-empirical quantum chemistry methods (PM7 and FMO-DFTB) for computing the binding energies of peptides bonded to HLA-DR1 and HLA-DR2. 
We found that key stabilising water molecules involved in the peptide binding mechanism were required to obtain a high correlation with experimental IC50 values. Our approach is computationally inexpensive and is a reliable alternative for studying pMHC binding interactions.
Keywords: FMO-DFTB, PM7, HLA-DR, Receptor-ligand interactions
∗Corresponding author
Preprint submitted to Chemical Physics Letters, November 15, 2016

1. Introduction
The major histocompatibility complex (MHC), called human leukocyte antigen (HLA) in humans, plays a key role in the adaptive immune response against pathogens and cancer by presenting self and non-self peptides to T cells. Research into peptide–MHC (pMHC) binding mechanisms should improve our understanding of diseases caused by pathogens, autoimmunity and cancer; it is therefore of paramount importance for designing drugs and vaccines [1].
MHC molecules involved in antigen presentation fall into two classes, I and II. MHC class I molecules mainly bind endogenous peptides and are present in all nucleated cells. MHC class II molecules are expressed in professional antigen-presenting cells (such as dendritic cells and B cells) and bind exogenous antigens. MHC class I and II peptide-binding regions (PBR) share a similar architecture: a groove that holds the antigenic peptide within a nine-residue binding frame (P1 to P9). However, MHC class I has a single binding frame, whereas the MHC class II PBR is an open groove. Calculating pMHC binding for MHC class II is therefore difficult, because peptide length variation and multiple binding frames increase the number of required calculations [2].
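To make the open-groove problem concrete, the Python sketch below (a minimal illustration, not part of the published method) enumerates every nine-residue register in which a longer peptide could sit in a class II groove. The function name `binding_frames` is hypothetical; it only slices the sequence and ignores any effect of flanking residues. Each returned frame would need its own pMHC calculation.

```python
from typing import List, Tuple


def binding_frames(peptide: str, core_length: int = 9) -> List[Tuple[int, str]]:
    """Return every contiguous core of `core_length` residues with its start offset.

    In an open MHC class II groove a longer peptide can be accommodated in
    several registers; each register is one candidate P1..P9 binding frame.
    """
    if core_length <= 0:
        raise ValueError("core_length must be positive")
    return [
        (start, peptide[start:start + core_length])
        for start in range(len(peptide) - core_length + 1)
    ]


if __name__ == "__main__":
    # HA 306-318, the HLA-DR1 ligand used in this study
    for offset, core in binding_frames("PKYVKQNTLKLAT"):
        print(offset, core)
```

For the 13-residue HA 306-318 peptide this gives five candidate frames; the one at offset 2, YVKQNTLKL, places Tyr308 at P1, consistent with the anchor assignment used in the Results.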
Studying peptide binding to MHC is extremely challenging. First, receptor isolation and the binding assays themselves require extensive and expensive testing [3, 4]; second, high MHC polymorphism increases the number of molecular systems to be studied [5]; and third, up to 1.2 × 10¹⁹ potential peptides might bind to each receptor. A promising line of attack is to use computational methods to evaluate whether a given pMHC binding event occurs, thereby reducing the number of experimental measurements required.
Computational methods for estimating pMHC binding can be divided into sequence-based methods, which use experimental binding data as training input for several kinds of algorithms (e.g. neural networks) [6], and structure-based methods, which mainly use the pMHC binding energy derived from structural information alone [7]. The structure-based approach is especially advantageous for studying pMHC interactions, because it does not depend on experimental binding data and because structures of non-crystallised complexes can be obtained by homology modelling [8].
The present work describes a structure-based approach that uses semi-empirical quantum mechanical methods to calculate pMHC-DR binding energies. Semi-empirical methods can be described as the simplest version of electronic structure theory; a large number of approximations and parameterisations make them computationally efficient [9]. The PM7 method and the density-functional tight-binding (DFTB) method are two of the most widely used and efficient semi-empirical methods for studying large biomolecular systems [10, 11]. Furthermore, the recent implementation of the fragment molecular orbital (FMO) method on DFTB [12] has reduced computation times by dividing large biomolecules into smaller pieces [13, 14].
We calculated the binding energy of 22 peptides bound to MHC class II molecules: HLA-DR1 (8 peptides) and HLA-DR2 (14 peptides).
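The 22 peptides are the two native ligands plus single-residue analogues: Asp substitutions at HA positions 308 to 314 and Ala substitutions at every position of the myelin peptide, as described in Section 2.1. A minimal Python sketch of generating such analogue sequences follows. `substitution_scan` is a hypothetical helper that only edits the sequence string; the study built the three-dimensional analogues with the UCSF Chimera swapaa function, which this sketch does not reproduce.

```python
from typing import Dict, Iterable, Optional


def substitution_scan(peptide: str, first_position: int, replacement: str,
                      positions: Optional[Iterable[int]] = None) -> Dict[str, str]:
    """Return single-residue analogues keyed by labels such as 'D308' or 'A92'.

    first_position: residue number of the first amino acid (306 for HA 306-318).
    positions: residue numbers to substitute; defaults to every position.
    """
    last_position = first_position + len(peptide) - 1
    if positions is None:
        positions = range(first_position, last_position + 1)
    analogues: Dict[str, str] = {}
    for pos in positions:
        if not first_position <= pos <= last_position:
            raise ValueError(f"position {pos} lies outside the peptide")
        index = pos - first_position
        if peptide[index] == replacement:
            continue  # replacing a residue with itself gives no analogue
        analogues[f"{replacement}{pos}"] = peptide[:index] + replacement + peptide[index + 1:]
    return analogues


if __name__ == "__main__":
    asp = substitution_scan("PKYVKQNTLKLAT", 306, "D", range(308, 315))  # HA 306-318
    ala = substitution_scan("NPVVHFFKNIVTP", 86, "A")                    # MY 86-98
    print(sorted(asp))
    print(len(asp), len(ala), 2 + len(asp) + len(ala))
```

The last line prints 7, 13 and 22: seven Asp analogues, thirteen Ala analogues and, with the two native peptides, the 22 systems studied.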
Ligand-receptor binding is highly sensitive to water molecules at the interface [15], and the role of water molecules in pMHC binding has already been described [16]. We therefore included crystallographic waters located near the pMHC interface, correlated the calculated binding energies with the corresponding experimental binding affinities (IC50), and estimated their capacity to discriminate binders from non-binders using receiver operating characteristic (ROC) analyses.

2. Methodology
2.1. Studied sets
Two sets of MHC class II molecules were studied: 1) a crystallised HLA-DR1 structure (HLA-DRA*01:01/HLA-DRB1*01:01, PDB code 1DLH) complexed with the haemagglutinin peptide (HA306−318) [17], using experimental IC50 values for native HA306−318 and 7 mono-substituted (Asp) analogues from the study by Geluk et al. [18] (Figure 2A); and 2) a crystallised HLA-DR2 structure (HLA-DRA*01:01/HLA-DRB1*15:01, PDB code 1BX2) complexed with the myelin peptide (MY86−98) [19], using experimental IC50 values for native MY86−98 and 13 mono-substituted (Ala) analogues from the study by Krogsgaard et al. [20] (Figure 3A).
Sequence variation between HLA-DR molecules is concentrated in the HLA-DRB gene (the most polymorphic MHC class II gene in humans), HLA-DRA being almost monomorphic [21]. Here, DRB1*01:01 and DRB1*15:01 differ by 5% in sequence in the β1 domain, yet show very different peptide-binding profiles [22]. The HLA-DR1 set had well-differentiated IC50 values (spanning four orders of magnitude, from 5 to >12,500 nM; Figure 2A), whereas the HLA-DR2 set covered a narrower and more challenging IC50 range (4 to 199 nM; Figure 3A), with some values repeated several times. The chosen HLA-DR sets thus allowed peptide mono-substitutions to be evaluated with two different amino acids: Asp for the DR1 set and Ala for the DR2 set.

2.2. Structure preparation and modelling
Amino acid substitutions were made in the peptides using the UCSF Chimera swapaa function with the Dunbrack backbone-dependent rotamer library [23, 24]. The first preparation step involved adding hydrogen atoms to the protein structures using MOPAC2016 [25]. Crystal structures must be optimised before any calculation (for example, of binding energies) is made, since minor errors in atomic coordinates can produce unrealistic energies. We explored several optimisation strategies and obtained the best results by optimising the hydrogen atoms with the PM7 method and the conductor-like screening model (COSMO) as an implicit solvent, keeping heavy atoms fixed at their crystallographic positions. All residues were neutralised. This strategy has been used previously in ligand-receptor studies [26]. Calculations included all crystallographic water molecules within 8.0 Å of the peptide. Minimising the roughly 3050 hydrogen atoms (about 50% of the atoms in each system) took 6 hours on 4 CPU cores.

2.3. Binding calculations using the PM7 method
Using the previously optimised models, binding enthalpies for the pMHC complexes were calculated as

$$\Delta H^{\mathrm{PM7}}_{\mathrm{bind}} = \Delta H_{\mathrm{complex}} - \Delta H_{\mathrm{receptor}} - \Delta H_{\mathrm{peptide}} \qquad (1)$$

where ΔH_complex is the calculated enthalpy of formation of the pMHC complex, ΔH_receptor is the calculated enthalpy of formation of the MHC protein without peptide and ΔH_peptide is the calculated enthalpy of formation of the peptide. PM7/COSMO binding energy calculations took about 15 minutes on 4 CPU cores.

2.4. Binding calculations using the FMO-DFTB method
We used the FMO-DFTB method (version 5.1) [27] as implemented in the General Atomic and Molecular Electronic Structure System (GAMESS) [28]. The first step in this method consisted of assigning every atom to a fragment.
The second step involved calculating the self-consistent field (SCF) of every fragment in the electrostatic field generated by the remaining fragments. The third step consisted of fragment-pair SCF calculations (i.e., the inter-fragment interaction energy) and the evaluation of total properties, such as the total energy, gradient and minimisation. These steps summarise the two-body FMO approach.
The total energy E in the two-body FMO expansion is:

$$E = \sum_{I}^{N} E_I + \sum_{I>J}^{N} \left(E_{IJ} - E_I - E_J\right) \qquad (2)$$

where E_I is the energy of monomer I immersed in the external electrostatic potential generated by the remaining monomers, and E_IJ is the energy of dimer IJ, which is also immersed in the external electrostatic potential of the other fragments.
Using the models optimised with the PM7/COSMO method, total energies for the pMHC complexes and their components were calculated with Equation 2, and binding energies were calculated with Equation 1 at the FMO2-DFTB level of theory. FMO-DFTB binding energy calculations took only about 5 minutes on 1 CPU core.

2.5. Statistical test of pMHC binding energies vs. experimental IC50
We used a linear model of ln(IC50) vs. ΔH_bind to calculate coefficients of determination (R²). Receiver operating characteristic (ROC) analyses were performed with the R package pROC [29] to estimate the area under the curve (AUC). The IC50 cutoffs used for binary classification were: very strong binders (≤ 5 nM), strong binders (≤ 50 nM) and weak binders (≤ 500 nM).

3. Results and discussion
We found strong correlations between ΔH_bind and IC50 only when the crystallographic water molecules were kept. These results agree with Petrone et al. [16], who studied class I pMHC complexes and found that bound water molecules at the interface have two main tasks: filling empty spaces and bridging hydrogen bonds between the MHC and the peptide.
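Equations (1) and (2) reduce to simple bookkeeping once fragment energies are available. The Python sketch below assumes that monomer and dimer energies have already been computed (for example, by an FMO2-DFTB run in GAMESS, whose input and output handling is not shown) and uses made-up toy values; the function names are hypothetical. The complex, the receptor and the peptide are each evaluated separately, because the embedding field seen by a fragment changes when its partners are removed.

```python
from itertools import combinations
from typing import Dict, Tuple


def fmo2_total_energy(monomer: Dict[str, float],
                      dimer: Dict[Tuple[str, str], float]) -> float:
    """Two-body FMO total energy, Eq. (2).

    monomer[I]    -> E_I, energy of fragment I in the field of the others
    dimer[(I, J)] -> E_IJ, energy of the fragment pair IJ (either key order)
    """
    total = sum(monomer.values())
    for i, j in combinations(sorted(monomer), 2):
        if (i, j) in dimer:
            e_ij = dimer[(i, j)]
        elif (j, i) in dimer:
            e_ij = dimer[(j, i)]
        else:
            raise KeyError(f"missing dimer energy for fragments {i!r} and {j!r}")
        # Pair correction: E_IJ - E_I - E_J
        total += e_ij - monomer[i] - monomer[j]
    return total


def binding_energy(e_complex: float, e_receptor: float, e_peptide: float) -> float:
    """Eq. (1): complex minus isolated receptor minus isolated peptide."""
    return e_complex - e_receptor - e_peptide


if __name__ == "__main__":
    # Toy fragment energies (arbitrary units), not values from the study.
    e_complex = fmo2_total_energy(
        {"pep": -10.0, "rec1": -20.0, "rec2": -30.0},
        {("pep", "rec1"): -31.5, ("pep", "rec2"): -40.5, ("rec1", "rec2"): -50.2},
    )
    e_receptor = fmo2_total_energy({"rec1": -19.9, "rec2": -29.9},
                                   {("rec1", "rec2"): -50.0})
    e_peptide = fmo2_total_energy({"pep": -9.8}, {})
    print(f"E_bind = {binding_energy(e_complex, e_receptor, e_peptide):.2f}")
```

Eq. (1) is written for PM7 enthalpies of formation, but, as the text states, the same difference is applied to FMO2-DFTB total energies.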
Interestingly, Li et al. [30] found that breaking the water-mediated hydrogen-bond network in class I pMHC complexes produced a binding energy loss of at least 8 kcal/mol. We only considered crystallographic waters located within 8.0 Å of the peptide; the correlations obtained this way were the same as those obtained when all water molecules were included. Hence, only water molecules close to the pMHC contact region were required for an accurate estimation of binding energy. The correlation plots between experimentally measured IC50 values and calculated binding energies are shown in Figure 2 for the HLA-DR1 set and in Figure 3 for the HLA-DR2 set.
Both semi-empirical methods gave the same high correlation (R² = 0.81) for the HLA-DR1 set; however, FMO-DFTB gave a higher AUC for discriminating strong binders (AUC_DFTB = 0.86) than PM7 (AUC_PM7 = 0.71). FMO-DFTB outperformed PM7 on the HLA-DR2 set, with R²_DFTB = 0.74 vs. R²_PM7 = 0.61. The same held for the discrimination of very strong binders (≤ 5 nM): AUC_DFTB = 0.94 vs. AUC_PM7 = 0.74. Overall, FMO-DFTB showed better predictive power than PM7. Moreover, compared with the best sequence-based method, NetMHCIIpan 3.1 [6] (R² = 0.75 for the HLA-DR1 set and R² = 0.66 for the HLA-DR2 set), our results showed better or equivalent correlation with experimental IC50 values.
Entropic contributions are important during binding, since peptide flexibility entails large conformational changes [31]. In addition, some solvent molecules must be displaced from the binding region when a ligand docks; hence, desolvation energy could play an important role in determining binding energies [32]. The strong correlation between the computed enthalpies ΔH_bind and experimental IC50 values therefore indicates that these contributions were small in the present cases.
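The statistics described in Section 2.5 can be reproduced from the Figure 2A values. The study used R and the pROC package; the pure-Python sketch below is a stand-in, not the authors' script. It computes R² from a simple linear fit of ln(IC50) against ΔH_bind and the ROC AUC through its Mann-Whitney formulation (the probability that a binder scores better than a non-binder, ties counted as one half), which equals the empirical ROC AUC. Because a more negative ΔH_bind means stronger predicted binding, −ΔH_bind is used as the score.

```python
import math
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line y ~ x (equal to Pearson r squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def roc_auc(scores: Sequence[float], is_binder: Sequence[bool]) -> float:
    """Empirical ROC AUC: P(binder score > non-binder score), ties count 0.5."""
    pos = [s for s, b in zip(scores, is_binder) if b]
    neg = [s for s, b in zip(scores, is_binder) if not b]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Figure 2A, HLA-DR1 set: IC50 (nM) and PM7/COSMO binding enthalpies (kcal/mol),
# in the order HA, D308, D309, D310, D311, D312, D313, D314.
IC50 = [40.0, 100000.0, 80.0, 72.0, 72.0, 52.0, 720.0, 6600.0]
DH_PM7 = [-182.35, -164.34, -182.85, -178.33, -177.71, -188.47, -177.93, -172.85]

if __name__ == "__main__":
    ln_ic50 = [math.log(v) for v in IC50]
    print(f"R2 = {r_squared(ln_ic50, DH_PM7):.2f}")
    score = [-h for h in DH_PM7]  # more negative enthalpy -> stronger binder
    for cutoff in (50.0, 500.0):
        auc = roc_auc(score, [v <= cutoff for v in IC50])
        print(f"AUC ({cutoff:g} nM) = {auc:.2f}")
```

With the PM7/COSMO column this gives R² ≈ 0.81 and AUCs of about 0.71 (50 nM) and 0.93 (500 nM), matching the values reported in Figure 2A.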
The receptor cavities that interact with the side chains of peptide positions P1, P4, P6, P7 and P9 are called pockets and are named after the peptide position they accommodate. Our binding energy calculations indicated the following pocket order for the HLA-DR1 set (see Figure 2): Pocket-1 ≫ Pocket-7 ≫ Pocket-6 > Pocket-4, in agreement with the experimentally measured IC50 values. Remarkably, substituting Tyr 308 at peptide position 1 (P1) changed IC50 by four orders of magnitude, making it one of the most important anchoring residues in the HLA-DR1 set studied here. Moreover, Pocket-1 is well known to have a strong preference for large hydrophobic side chains and is presumably the most determinant binding site [17]. Next, replacing Leu with Asp at peptide position 314 (P7) (Figure 2) changed peptide binding to HLA-DR1 by up to two orders of magnitude. Replacing Thr with Asp at position 313 (P6) produced a one-order-of-magnitude change in binding, large enough to turn a high binder into a non-binder. Replacing Val, Lys, Gln and Asn with Asp at positions 309 (P2), 310 (P3), 311 (P4) and 312 (P5), respectively, left the binding energies strong.
PM7 and FMO-DFTB binding energies agreed with the respective IC50 values for the HLA-DR2 set, giving the following pocket order: Pocket-4 ≫ Pocket-1 > Pocket-6 = Pocket-7 = Pocket-9. In this case, hydrophobic pocket 4 is the primary binding site in the PBR [19]. Replacing Val with Ala at position 89 (P1) produced a substantial change in binding energy; nevertheless, unlike in the HLA-DR1 set, pocket 1 played a secondary role according to the HLA-DR2 peptide binding energies. Furthermore, replacing Asn, Ile and Thr with Ala at peptide positions 94 (P6), 95 (P7) and 97 (P9), respectively, left HLA-DR2 binding energies unaltered.
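The pocket hierarchies above can be compared against the IC50 fold changes caused by each anchor substitution. The sketch below is an illustrative proxy rather than the binding-energy analysis itself: it ranks pockets by log10(IC50 of analogue / IC50 of native peptide) using the Figure 2A and 3A values. The pocket-to-analogue mapping (D308, D311, D313 and D314 for HLA-DR1; A89, A92, A94, A95 and A97 for HLA-DR2) follows the figure labels, and `rank_pockets` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple


def rank_pockets(native_ic50: float,
                 anchor_ic50: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort pockets by log10 fold change in IC50 after replacing their anchor."""
    shifts = {pocket: math.log10(ic50 / native_ic50)
              for pocket, ic50 in anchor_ic50.items()}
    # Largest loss of affinity first; ties keep their input order.
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)


# IC50 (nM) from Figure 2A (Asp analogues) and Figure 3A (Ala analogues).
DR1_NATIVE = 40.0  # HA 306-318
DR1_ANCHORS = {"P1": 100000.0, "P4": 72.0, "P6": 720.0, "P7": 6600.0}
DR2_NATIVE = 5.0   # MY 86-98
DR2_ANCHORS = {"P1": 50.0, "P4": 199.0, "P6": 4.0, "P7": 4.0, "P9": 4.0}

if __name__ == "__main__":
    for name, native, anchors in (("HLA-DR1", DR1_NATIVE, DR1_ANCHORS),
                                  ("HLA-DR2", DR2_NATIVE, DR2_ANCHORS)):
        ranking = ", ".join(f"{p} ({s:+.2f})" for p, s in rank_pockets(native, anchors))
        print(f"{name}: {ranking}")
```

For HLA-DR1 this orders the pockets P1, P7, P6, P4; for HLA-DR2 it places P4 first and P1 second, with P6, P7 and P9 tied, mirroring the hierarchies stated above.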
Our results for both sets revealed clear variability in the pocket binding hierarchy of the anchoring residues. This may well result from differences in each receptor's specific pocket architecture.
We explored the stabilising role of water molecules in the mechanism of peptide binding to a class II MHC (HLA-DR2 set) by replacing Asn-94 (P6) with Ala in the myelin86−98 peptide. According to the crystal structure, the myelin Asn-P6 side chain is buried within HLA polar pocket 6 (Figure 4A). This amino acid forms a stabilising network of five hydrogen bonds involving Glu α11, Arg β13 and Asn α62. The guanidinium group of Arg β13 makes two hydrogen bonds with the carbonyl oxygen of the Asn-P6 side chain. Simultaneously, the amide group of the Asn-P6 side chain establishes two hydrogen bonds: one with the backbone oxygen of Asn α62 and another with the unprotonated oxygen of the Glu α11 side-chain carboxylic acid. The backbone amide hydrogen of Asn-P6 makes a hydrogen bond with the side-chain carbonyl oxygen of Asn α62. This last hydrogen bond remained unchanged after replacing Asn-P6 with Ala-P6, as indicated by the arrow in Figure 4B. However, the missing hydrogen bonds destabilised anchoring by 16.5 and 7.6 kcal/mol with PM7 and FMO2-DFTB, respectively. These computations contradicted the experimental IC50 values for the myelin86−98 (IC50 = 5 nM) and MY A94 (IC50 = 4 nM) peptides, which indicate similar stabilising interactions. Accordingly, these results lowered the correlations between enthalpies and IC50 values for the whole set, at both levels of theory, to R²_PM7 = 0.15 and R²_DFTB = 0.47. Interestingly, adding a water molecule at the position of the former Asn-P6 amide group created three hydrogen bonds that locally stabilise the Ala-P6 side chain.
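Which crystallographic waters enter the calculation matters for results like these; Section 2.2 keeps those within 8.0 Å of the peptide. The NumPy sketch below shows one way to apply that cutoff. It assumes coordinates have already been parsed into arrays (for example, from PDB entry 1BX2; parsing is not shown) and, as a simplification not stated in the study, represents each water by its oxygen atom. The function name `waters_near_peptide` is hypothetical.

```python
import numpy as np


def waters_near_peptide(peptide_xyz, water_oxygen_xyz, cutoff: float = 8.0) -> np.ndarray:
    """Mask of waters whose oxygen lies within `cutoff` angstroms of any peptide atom.

    peptide_xyz: (N, 3) array of peptide atom coordinates.
    water_oxygen_xyz: (M, 3) array of water oxygen coordinates.
    """
    pep = np.asarray(peptide_xyz, dtype=float)
    wat = np.asarray(water_oxygen_xyz, dtype=float)
    # (M, N, 3) differences by broadcasting, then (M, N) Euclidean distances.
    distances = np.linalg.norm(wat[:, None, :] - pep[None, :, :], axis=-1)
    return distances.min(axis=1) <= cutoff


if __name__ == "__main__":
    # Toy coordinates in angstroms (hypothetical, not taken from 1BX2).
    peptide = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0]]
    waters = [[0.0, 7.9, 0.0], [3.8, 8.5, 0.0], [20.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
    print(waters_near_peptide(peptide, waters))
```

The mask can then index the water array so that only the retained molecules are written into the system passed to the quantum calculation.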
The hydrogen atoms of the added water molecule coordinate the backbone carbonyl oxygen of Asn α62 and the unprotonated oxygen of the Glu α11 side-chain carboxylic acid, i.e. similarly to the Asn-P6 side chain. The oxygen of the water molecule makes a hydrogen bond with a hydrogen from the Ala-P6 side chain. As can be seen in Figure 4B, the water molecule reconstructed a large part of the former hydrogen-bond network, which was consistently reflected in a stronger binding energy. This correction alone raised the correlation between binding energies and IC50 values for both semi-empirical methods: PM7 (R² = 0.61) and FMO-DFTB (R² = 0.74) (Figure 3). These results demonstrate the stabilising role of water molecules at the pMHC interface.

4. Conclusions
Studying two different pMHC systems gave strong correlations between calculated binding energies and experimental IC50 values. Our binding energy calculations discriminated weak from strong and even very strong binders with high accuracy, showing the advantages of the approach proposed here. This provides valuable evidence that semi-empirical quantum mechanical methods are reliable and cost-effective for studying highly complex systems such as the pMHC HLA-DR1 and HLA-DR2 systems. The two levels of theory used here (DFTB and PM7) are fast enough on conventional computational resources to study pMHC binding. We anticipate increasing use of these quantum methods in the near future for drug and synthetic vaccine design.

5. Acknowledgments
We would like to thank Jason Garry for revising the text. We also thank Dmitri Fedorov for his support in implementing the FMO-DFTB method.

6. References
[1] Manuel E. Patarroyo and Manuel A. Patarroyo. Emerging rules for subunit-based, multiantigenic, multistage chemically synthesized vaccines. Accounts of Chemical Research, 41(3):377–386, 2008.
[2] Linus Backert and Oliver Kohlbacher. Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Medicine, 7(1):1, 2015.
[3] Peng Wang, John Sidney, Courtney Dow, Bianca Mothe, Alessandro Sette, and Bjoern Peters. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Computational Biology, 4(4):e1000048, 2008.
[4] John Sidney, Scott Southwood, Carrie Moore, Carla Oseroff, Clemencia Pinilla, Howard M. Grey, and Alessandro Sette. Measurement of MHC/peptide interactions by gel filtration or monoclonal antibody capture. Current Protocols in Immunology, pages 18–3, 2013.
[5] John Trowsdale and Julian C. Knight. Major histocompatibility complex genomics and human disease. Annual Review of Genomics and Human Genetics, 14:301, 2013.
[6] Massimo Andreatta, Edita Karosiene, Michael Rasmussen, Anette Stryhn, Søren Buus, and Morten Nielsen. Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification. Immunogenetics, 67(11-12):641–650, 2015.
[7] Atanas Patronov and Irini Doytchinova. T-cell epitope vaccine design by immunoinformatics. Open Biology, 3(1):120139, 2013.
[8] Bernhard Knapp, Samuel Demharter, Reyhaneh Esmaielbeiki, and Charlotte M. Deane. Current status and future challenges in T-cell receptor/peptide/MHC molecular dynamics simulations. Briefings in Bioinformatics, page bbv005, 2015.
[9] Anders S. Christensen, Tomás Kubar, Qiang Cui, and Marcus Elstner. Semiempirical quantum mechanical methods for noncovalent interactions for chemical and biochemical applications. Chemical Reviews, 116(9):5301–5337, 2016.
[10] James J. P. Stewart. Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters. J Mol Model, 19:1–32, 2013.
[11] M. Elstner. The SCC-DFTB method and its application to biological systems. Theoretical Chemistry Accounts, 116(1-3):316–325, 2006.
[12] Yoshio Nishimoto, Dmitri G. Fedorov, and Stephan Irle. Density-functional tight-binding combined with the fragment molecular orbital method. J. Chem. Theory Comput., 10:4801–4812, 2014.
[13] Kazuo Kitaura, Eiji Ikeo, Toshio Asada, Tatsuya Nakano, and Masami Uebayasi. Fragment molecular orbital method: an approximate computational method for large molecules. Chemical Physics Letters, 313, 1999.
[14] D. G. Fedorov, T. Nagata, and K. Kitaura. Exploring chemistry with the fragment molecular orbital method. Phys. Chem. Chem. Phys., 14, 2012.
[15] Caterina Barillari, Justine Taylor, Russell Viner, and Jonathan W. Essex. Classification of water molecules in protein binding sites. Journal of the American Chemical Society, 129(9):2577–2587, 2007.
[16] Paula M. Petrone and Angel E. Garcia. MHC-peptide binding is assisted by bound water molecules. Journal of Molecular Biology, 338:0–435, 2004.
[17] Lawrence J. Stern, Jerry H. Brown, Theodore S. Jardetzky, Joan C. Gorga, Robert G. Urban, Jack L. Strominger, and Don C. Wiley. Crystal structure of the human class II MHC protein HLA-DR1 complexed with an influenza virus peptide. 1994.
[18] A. Geluk, K. E. Van Meijgaarden, and T. H. Ottenhoff. Flexibility in T-cell receptor ligand repertoires depends on MHC and T-cell receptor clonotype. Immunology, 90(3):370, 1997.
[19] Kathrine J. Smith, Jason Pyrdol, Laurent Gauthier, Don C. Wiley, and Kai W. Wucherpfennig. Crystal structure of HLA-DR2 (DRA*0101, DRB1*1501) complexed with a peptide from human myelin basic protein. The Journal of Experimental Medicine, 188(8):1511–1520, 1998.
[20] Michelle Krogsgaard, Kai W. Wucherpfennig, Barbara Canella, Bjarke E. Hansen, Arne Svejgaard, Jason Pyrdol, Henrik Ditzel, Cedric Raine, Jan Engberg, and Lars Fugger. Visualization of myelin basic protein (MBP) T cell epitopes in multiple sclerosis lesions using a monoclonal antibody specific for the human histocompatibility leukocyte antigen (HLA)-DR2–MBP 85–99 complex. The Journal of Experimental Medicine, 191(8):1395–1412, 2000.
[21] James Robinson, Jason A. Halliwell, James D. Hayhurst, Paul Flicek, Peter Parham, and Steven G. E. Marsh. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Research, page gku1161, 2014.
[22] Nicolas Rapin, Ilka Hoof, Ole Lund, and Morten Nielsen. MHC motif viewer. Immunogenetics, 60(12):759–765, 2008.
[23] Eric F. Pettersen, Thomas D. Goddard, Conrad C. Huang, Gregory S. Couch, Daniel M. Greenblatt, Elaine C. Meng, and Thomas E. Ferrin. UCSF Chimera: a visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25(13):1605–1612, 2004.
[24] Roland L. Dunbrack. Rotamer libraries in the 21st century. Current Opinion in Structural Biology, 12(4):431–440, 2002.
[25] James J. P. Stewart. MOPAC2016. Stewart Computational Chemistry, Version 7.263W, 2016.
[26] Alexander Heifetz, Giancarlo Trani, Matteo Aldeghi, Colin H. MacKinnon, Paul A. McEwan, Frederick A. Brookfield, Ewa I. Chudyk, Mike Bodkin, Zhonghua Pei, Jason D. Burch, et al. Fragment molecular orbital method applied to lead optimization of novel interleukin-2 inducible T-cell kinase (ITK) inhibitors. Journal of Medicinal Chemistry, 59(9):4352–4363, 2016.
[27] Yuri Alexeev, Michael P. Mazanetz, Osamu Ichihara, and Dmitri G. Fedorov. GAMESS as a free quantum-mechanical platform for drug research. Current Topics in Medicinal Chemistry, 12, 2012.
[28] Michael W. Schmidt, Kim K. Baldridge, Jerry A. Boatz, Steven T. Elbert, Mark S. Gordon, Jan H. Jensen, Shiro Koseki, Nikita Matsunaga, Kiet A. Nguyen, Shujun Su, Theresa L. Windus, Michel Dupuis, and John A. Montgomery Jr. General atomic and molecular electronic structure system. Journal of Computational Chemistry, 14, 1993.
[29] Xavier Robin, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez, and Markus Müller. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1):1, 2011.
[30] Yuanchao Li, Yadong Yang, Ping He, and Qingwu Yang. QM/MM study of epitope peptides binding to HLA-A*0201: the roles of anchor residues and water. Chemical Biology & Drug Design, 74:611–618, 2009.
[31] Andrea Ferrante and Jack Gorski. Enthalpy–entropy compensation and cooperativity as thermodynamic epiphenomena of structural flexibility in ligand–receptor interactions. Journal of Molecular Biology, 417, 2012.
[32] Dmitri G. Fedorov and Kazuo Kitaura. Subsystem analysis for the fragment molecular orbital method and its application to protein-ligand binding in solution. Journal of Physical Chemistry A, 120, 2016.

[Figure 1 image: panels A and B showing the α1 and β1 domains and pockets P1, P4, P6, P7 and P9; not reproduced.]
Figure 1: Top view of the PBR of A. HLA-DR1 (1DLH) and B. HLA-DR2 (1BX2), including water molecules within 8 Å of each peptide. Peptides and water molecules are depicted as ball-and-stick models coloured by atom (C: cyan, H: white, O: red, N: blue). The α1 (blue) and β1 (red) domains are shown as cartoons. Pockets, shown here as receptor contact atoms within 3.5 Å of peptide anchor residues, are represented as surfaces: P1 (magenta), P4 (blue), P6 (orange), P7 (grey), P9 (green).

Figure 2A. HLA-DR1 set (binding core YVKQNTLKL; P1, P4, P6, P7 and P9 = Tyr308, Gln311, Thr313, Leu314 and Leu316). ΔH_bind in kcal/mol, IC50 in nM.
Peptide | Sequence | PM7/COSMO | FMO-DFTB | IC50
HA 306-318 | PKYVKQNTLKLAT | -182.35 | -181.54 | 40.0
HA D308 | PKDVKQNTLKLAT | -164.34 | -159.02 | 100000.0
HA D309 | PKYDKQNTLKLAT | -182.85 | -181.23 | 80.0
HA D310 | PKYVDQNTLKLAT | -178.33 | -180.99 | 72.0
HA D311 | PKYVKDNTLKLAT | -177.71 | -174.42 | 72.0
HA D312 | PKYVKQDTLKLAT | -188.47 | -184.30 | 52.0
HA D313 | PKYVKQNDLKLAT | -177.93 | -174.62 | 720.0
HA D314 | PKYVKQNTDKLAT | -172.85 | -174.32 | 6600.0
R² | | 0.81 | 0.81 |
AUC (50/500 nM) | | 0.71/0.93 | 0.86/0.93 |
[Figure 2B and 2C images: ΔH_bind (kcal/mol) plotted against ln(IC50) for PM7 (B) and FMO-DFTB (C), R² = 0.81 in both panels; not reproduced.]
Figure 2: HLA-DR1/HA and mono-substituted analogue (Asp) set. A. Experimentally measured affinities (IC50, nM) together with binding energies (ΔH_bind, kcal/mol), coefficients of determination (R²) and ROC AUC (50 and 500 nM cutoffs) for each method evaluated. Binding cores are underlined. B. Correlation plot between ln(IC50) and binding energies calculated with the PM7 method. C. Correlation plot between ln(IC50) and binding energies calculated with the FMO-DFTB method. Substitutions in anchor residues are represented by colours: P1 (magenta), P4 (blue), P6 (orange), P7 (grey) and P9 (green).

Figure 3A. HLA-DR2 set (binding core VHFFKNIVT; P1, P4, P6, P7 and P9 = Val89, Phe92, Asn94, Ile95 and Thr97). ΔH_bind in kcal/mol, IC50 in nM.
Peptide | Sequence | PM7/COSMO | FMO-DFTB | IC50
MY 86-98 | NPVVHFFKNIVTP | -189.55 | -215.71 | 5.0
MY A86 | APVVHFFKNIVTP | -190.52 | -210.39 | 7.0
MY A87 | NAVVHFFKNIVTP | -188.54 | -211.70 | 10.0
MY A88 | NPAVHFFKNIVTP | -185.70 | -211.94 | 10.0
MY A89 | NPVAHFFKNIVTP | -181.84 | -207.52 | 50.0
MY A90 | NPVVAFFKNIVTP | -182.17 | -206.45 | 10.0
MY A91 | NPVVHAFKNIVTP | -181.23 | -207.44 | 10.0
MY A92 | NPVVHFAKNIVTP | -174.15 | -199.13 | 199.0
MY A93 | NPVVHFFANIVTP | -189.33 | -214.87 | 4.0
MY A94 | NPVVHFFKAIVTP | -182.37 | -209.99 | 4.0
MY A95 | NPVVHFFKNAVTP | -187.15 | -212.30 | 4.0
MY A96 | NPVVHFFKNIATP | -186.10 | -212.96 | 4.0
MY A97 | NPVVHFFKNIVAP | -188.33 | -213.87 | 4.0
MY A98 | NPVVHFFKNIVTA | -187.25 | -213.50 | 5.0
R² | | 0.61 | 0.75 |
AUC (5/50 nM) | | 0.74/1.0 | 0.94/1.0 |
[Figure 3B and 3C images: ΔH_bind (kcal/mol) plotted against ln(IC50) for PM7 (B, R² = 0.61) and FMO-DFTB (C, R² = 0.74); not reproduced.]
Figure 3: HLA-DR2/myelin and mono-substituted analogue (Ala) set. A. Experimentally measured affinities (IC50, nM) together with binding energies (ΔH_bind, kcal/mol), coefficients of determination (R²) and ROC AUC (5 and 50 nM cutoffs) for each method evaluated. Binding cores are underlined. B. Correlation plot between ln(IC50) and binding energies calculated with the PM7 method. C. Correlation plot between ln(IC50) and binding energies calculated with the FMO-DFTB method. Substitutions in anchor residues are represented by colours: P1 (magenta), P4 (blue), P6 (orange), P7 (grey) and P9 (green).

[Figure 4 images: panels A (Asn P6) and B (Ala P6) showing Asn α62, Arg β13 and Glu α11; not reproduced.]
Figure 4: Hydrogen-bonding network in the P6 binding site for A. the native myelin86−98 peptide (Asn 94) and B. the mono-substituted analogue (Ala 94). Interacting HLA-DR2 and P6 residues, including a water molecule that stabilises binding of the analogue peptide, are shown as ball-and-stick models coloured by atom (C: cyan, H: white, O: red, N: blue). The peptide (yellow), α1 (blue) and β1 (red) domains are shown as cartoons. Hydrogen-bond distances are between 1.8 and 2.2 Å. A red arrow indicates the only hydrogen bond the peptide forms in pocket 6 without the added water molecule.

General conclusions
Increasing our knowledge of non-human primate biology has a direct impact on improving human health through scientific research. Given the close evolutionary relationship and biological identity (genetic, anatomical and physiological) among all primates, including humans, they are essential references in comparative biology and biomedical research. Following this rationale, this work has contributed to characterising the major histocompatibility complex molecules of Aotus monkeys, estimating and analysing their polymorphism. Although these contributions aim to support vaccine development, they also address more basic aspects of MHC biology in primates and of the evolution of these proteins.
As a result, the Aotus MHC-DPA and MHC-DRA loci have been studied for the first time. The study of MHC-DRB was also extended by analysing the modes of evolution of these genes and proposing strategies for handling their polymorphism. Experimentally, an MHC-DRB microsatellite was analysed that could become a sensitive typing method. Computationally, strategies for handling MHC-DRB polymorphism in both humans and Aotus were designed and applied. These strategies aim to optimise the design of modified peptides as vaccine candidates and their evaluation in the animal model, and they provide a way to estimate the potential coverage of these peptides in human populations. In addition, computational protocols for modelling MHC-peptide binding were implemented using neural-network-based strategies. Protocols based on semi-empirical quantum methods were also developed, allowing more precise and detailed modelling of this process. In the search for a structural similarity scale for amino acids, a relationship was found between secondary-structure propensity and mass on the one hand and the substitution and mutability patterns of amino acids on the other, and this relationship correlates strongly with substitution matrices such as BLOSUM. This relationship had not been reported before. It shows that the historical processes governing protein evolution have a counterpart in the structural properties of amino acids. This research takes a multidisciplinary approach to its central problem, the binding of peptides to the MHC. The evolution of these sequences can be viewed as an experiment in which natural selection has tested multiple solutions and retained those that proved adequate (though with no guarantee that they are the best).
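The polymorphism-handling strategies described above rest on the idea, used in the pocket dictionary of Annex 1, that many MHC-DRB alleles share the same residues in a given binding pocket. The Python sketch below illustrates this grouping. It assumes that sequences are pre-aligned to mature β-chain numbering, with position p at string index p-1. The pocket 1 positions are the ones listed in Annex 1, Table 1. The helper `toy_sequence` and the other function names are hypothetical, and the sketch is not the pipeline used in this work.

```python
from collections import Counter
from typing import Dict, Sequence

# Pocket 1 positions of the MHC-DRB beta chain, as listed in Annex 1, Table 1.
POCKET1 = (81, 82, 83, 85, 86, 89, 90, 91)


def pocket_profile(seq: str, positions: Sequence[int] = POCKET1) -> str:
    """Concatenate the residues found at 1-based alignment positions of a sequence."""
    return "".join(seq[p - 1] for p in positions)


def profile_frequencies(alleles: Dict[str, str],
                        positions: Sequence[int] = POCKET1) -> Dict[str, float]:
    """Map each distinct pocket profile to the percentage of alleles carrying it,
    most frequent profile first (ties keep first-seen order)."""
    counts = Counter(pocket_profile(seq, positions) for seq in alleles.values())
    total = sum(counts.values())
    return {profile: 100.0 * n / total for profile, n in counts.most_common()}


def toy_sequence(profile: str, positions: Sequence[int] = POCKET1, length: int = 95) -> str:
    """Hypothetical helper: an all-'X' placeholder chain carrying `profile` at `positions`."""
    residues = ["X"] * length
    for pos, aa in zip(positions, profile):
        residues[pos - 1] = aa
    return "".join(residues)


if __name__ == "__main__":
    # Pocket 1 residues of four DRB1*01 alleles taken from Annex 1, Table 1;
    # everything outside pocket 1 is a placeholder, not real sequence.
    alleles = {
        "HLA-DRB1*010101": toy_sequence("HNYVGFTV"),
        "HLA-DRB1*010201": toy_sequence("HNYAVFTV"),
        "HLA-DRB1*0104": toy_sequence("HNYVVFTV"),
        "HLA-DRB1*0109": toy_sequence("HNYVGFTV"),
    }
    for profile, pct in profile_frequencies(alleles).items():
        print(f"{profile}\t{pct:.0f}% of alleles")
```

Here the four alleles collapse into three pocket 1 profiles, and HNYVGFTV is carried by two of them. Each allele counts once. Weighting profiles by allele frequencies in human populations would require frequency data as an extra input.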
Analysing the evolutionary patterns of MHC sequences to identify which physicochemical properties describe peptide binding offers a valuable perspective. Explanations that incorporate both evolutionary and physicochemical information are key to understanding this complex process.

Perspectives and recommendations
Developing methods to model protein-protein interactions, such as the MHC-peptide interaction, is a field of great interest for understanding protein function. Such methods are key to studying processes such as cellular metabolism, signal transduction and molecular recognition, among others. The proposed approaches are not limited to the specific study of the MHC in Aotus and humans; they could also be applied to similar problems in other systems. The methodologies developed will allow the MHC-peptide interaction to be characterised in great detail. Using FMO-PIEDA to study key residues in the peptide-binding region (key because of either their conservation or their variability) is especially promising. It should give insight into the physicochemical factors that determine selective processes and patterns of variability in the MHC. The proposed methods for modelling MHC-peptide binding will make it possible to evaluate the binding profiles of molecules of interest computationally, using homology-based structural models where needed. This is particularly valuable given how difficult it is to obtain wet-lab binding data. Using similar strategies, the proposed methodology could be extended to other MHC class I and class II loci of biomedical interest for other diseases.
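Any computational binding score of the kind envisaged here has to be benchmarked against wet-lab affinities. Figure 3 does this with two metrics: R² between ln(IC50) and ΔHbind, and ROC AUC at IC50 cutoffs of 5 and 50 nM. The Python sketch below computes both from the Figure 3A values. It makes several assumptions that the source does not state:

- R² is the squared Pearson correlation, which is equivalent to R² for a least-squares line.
- A peptide counts as a binder when its IC50 is at or below the cutoff.
- A more negative ΔHbind is read as a stronger predicted binder.
- The AUC uses the Mann-Whitney pair-counting form.

The function names are illustrative.

```python
import math

# Figure 3A: (peptide, PM7/COSMO ΔHbind, FMO-DFTB ΔHbind [kcal/mol], IC50 [nM]).
DATA = [
    ("MY 86-98", -189.55, -215.71, 5.0), ("MY A86", -190.52, -210.39, 7.0),
    ("MY A87", -188.54, -211.70, 10.0), ("MY A88", -185.70, -211.94, 10.0),
    ("MY A89", -181.84, -207.52, 50.0), ("MY A90", -182.17, -206.45, 10.0),
    ("MY A91", -181.23, -207.44, 10.0), ("MY A92", -174.15, -199.13, 199.0),
    ("MY A93", -189.33, -214.87, 4.0), ("MY A94", -182.37, -209.99, 4.0),
    ("MY A95", -187.15, -212.30, 4.0), ("MY A96", -186.10, -212.96, 4.0),
    ("MY A97", -188.33, -213.87, 4.0), ("MY A98", -187.25, -213.50, 5.0),
]


def r_squared(x, y):
    """Coefficient of determination of a least-squares line (squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def roc_auc(scores, labels):
    """Probability that a random binder outscores a random non-binder (ties count 1/2)."""
    pos = [s for s, is_binder in zip(scores, labels) if is_binder]
    neg = [s for s, is_binder in zip(scores, labels) if not is_binder]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate(energies, ic50s, cutoffs=(5.0, 50.0)):
    """R² of ΔHbind against ln(IC50), plus ROC AUC at each IC50 cutoff (nM)."""
    r2 = r_squared([math.log(v) for v in ic50s], energies)
    scores = [-e for e in energies]  # more negative ΔHbind = stronger predicted binder
    aucs = {c: roc_auc(scores, [v <= c for v in ic50s]) for c in cutoffs}
    return r2, aucs


if __name__ == "__main__":
    ic50 = [row[3] for row in DATA]
    for name, col in (("PM7/COSMO", 1), ("FMO-DFTB", 2)):
        r2, aucs = evaluate([row[col] for row in DATA], ic50)
        print(f"{name}: R2 = {r2:.2f}, AUC(5 nM) = {aucs[5.0]:.2f}, AUC(50 nM) = {aucs[50.0]:.2f}")
```

On these 14 peptides the sketch gives values close to those in the Figure 3A table. Small differences in the last rounded digit may reflect choices such as how peptides lying exactly at a cutoff are labelled.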
Building on data mining of crystallographic information, sequence patterns associated with stable secondary structures will be analysed: α-helix, extended β-strand and polyproline II (PPII) helix. The aim is to complete a framework for designing peptides on the basis of structural parameters.

References
1. Julian K. Professor Julian C Knight - Nuffield Department of Medicine. Nuffield Department of Medicine, University of Oxford; 2017. Available from: https://www.ndm.ox.ac.uk/principal-investigators/researcher/julian-knight (accessed 08/11/2017).
2. Neefjes J, Ovaa H. A peptide's perspective on antigen presentation to the immune system. Nat Chem Biol. 2013;9(12):769-75.
3. Hershkovitz P. Two new species of night monkeys, genus Aotus (Cebidae: Platyrrhini): A preliminary report on Aotus taxonomy. Am J Primatol. 1983;4:209-43.
4. Torres O, Enciso S, Ruiz F, Silva E, Yunis I. Chromosome diversity of the genus Aotus from Colombia. Am J Primatol. 1998;44:255-75.
5. Fernandez-Duque E. Primates in Perspective. New York: Oxford University Press; 2007. p. 139-54.
6. Defler T, Bueno M. Aotus diversity and the species problem. Primate Conserv. 2007;22:55-70.
7. Defler T. Historia Natural de los Primates Colombianos. Bogotá D.C.: Universidad Nacional de Colombia; 2010.
8. Setoguchi T, Rosenberger AL. A fossil owl monkey from La Venta, Colombia. Nature. 1987;326(6114):692-4.
9. Takai M, Nishimura T, Shigehara N, Setoguchi T. Meaning of the canine sexual dimorphism in fossil owl monkey, Aotus dindensis from the middle Miocene of La Venta, Colombia. Front Oral Biol. 2009;13:55-9.
10. Perelman P, Johnson WE, Roos C, Seuanez HN, Horvath JE, Moreira MA, et al. A molecular phylogeny of living primates. PLoS Genet. 2011;7(3):e1001342.
11. Finstermeier K, Zinner D, Brameier M, Meyer M, Kreuz E, Hofreiter M, et al. A mitogenomic phylogeny of living primates. PLoS One. 2013;8(7):e69504.
12. Menezes AN, Bonvicino CR, Seuanez HN. Identification, classification and evolution of owl monkeys (Aotus, Illiger 1811). BMC Evol Biol. 2010;10:248.
13. Aquino R, Encarnación F. Characteristics and use of sleeping site in Aotus (Cebidae: Primates) in the Amazonian lowland of Perú. Am J Primatol. 1986;11:319-31.
14. Aquino R, Encarnación F. Population densities and geographic distribution of night monkeys (Aotus nancymai and Aotus vociferans) (Cebidae: Primates) in Northeastern Perú. Am J Primatol. 1988;14:375-81.
15. Aquino R, Encarnación F. Aotus: The Owl Monkey. San Diego: Academic Press; 1994. p. 59-95.
16. Fernandez-Duque E, Rotundo M, Sloan C. Density and population structure of owl monkeys (Aotus azarai) in the Argentinean Chaco. Am J Primatol. 2001;53:99-108.
17. Chapman A, Chapman J. Implications of Small Scale Variation in Ecological Conditions for the Diet and Density of Red Colobus Monkeys. Primates. 1999;40:215-31.
18. Ankel-Simons F, Rasmussen DT. Diurnality, nocturnality, and the evolution of primate visual systems. Am J Phys Anthropol. 2008;Suppl 47:100-17.
19. Hernández A, Díaz A. Estado preliminar poblacional del mono nocturno (Aotus sp. Humboldt, 1812) en las comunidades Indígenas Siete de Agosto y San Juan de Atacuari-Puerto Nariño, Departamento de Amazonas, Colombia. Ibagué, Colombia: Universidad del Tolima; 2011.
20. Bontrop R. Non-human primates: essential partners in biomedical research. Immunol Rev. 2001;183:5-9.
21. Langhorne J, Buffet P, Galinski M, Good M, Harty J, Leroy D, et al. The relevance of non-human primate and rodent malaria models for humans. Malar J. 2011;10(1):23.
22. Ward JM, Vallender EJ. The resurgence and genetic implications of New World primates in biomedical research. Trends Genet. 2012;28(12):586-91.
23. Rodriguez LE, Curtidor H, Urquiza M, Cifuentes G, Reyes C, Patarroyo ME. Intimate molecular interactions of P. falciparum merozoite proteins involved in invasion of red blood cells and their implications for vaccine design. Chem Rev. 2008;108(9):3656-705.
24. Patarroyo ME, Bermudez A, Patarroyo MA. Structural and immunological principles leading to chemically synthesized, multiantigenic, multistage, minimal subunit-based vaccine development. Chem Rev. 2011;111(5):3459-507.
25. Young MD, Porter JA, Jr., Johnson CM. Plasmodium vivax transmitted from man to monkey to man. Science. 1966;153(3739):1006-7.
26. Contacos PG, Collins WE. Falciparum malaria transmissible from monkey to man by mosquito bite. Science. 1968;161(3836):56.
27. Gysin J. Malaria: parasite biology, pathogenesis and protection. Washington, DC: ASM; 1988. p. 419-39.
28. Lujan R, Dennis V, Chapman WJ, Hanson W. Blastogenic responses of peripheral blood leukocytes from owl monkeys experimentally infected with Leishmania braziliensis panamensis. Am J Trop Med Hyg. 1986;35(6):1103-9.
29. Pico de Coaña Y, Rodriguez J, Guerrero E, Barrero C, Rodriguez R, Mendoza M, et al. A highly infective Plasmodium vivax strain adapted to Aotus monkeys: quantitative haematological and molecular determinations useful for P. vivax malaria vaccine development. Vaccine. 2003;21:3930-7.
30. Polotsky Y, Vassell R, Binn L, Asher L. Immunohistochemical detection of cytokines in tissues of Aotus monkeys infected with hepatitis A virus. Ann N Y Acad Sci. 1994;730:318-21.
31. Noya O, Gonzalez-Rico S, Rodriguez R, Arrechedera H, Patarroyo M, Alarcon D. Schistosoma mansoni infection in owl monkeys (Aotus nancymai): evidence for the early elimination of adult worms. Acta Trop. 1998;70:257-67.
32. Bone J, Soave O. Experimental tuberculosis in owl monkeys (Aotus trivirgatus). Lab Anim Care. 1970;(5):946-8.
33. Jones F, Baqar S, Gozalo A, Nunez G, Espinoza N, Reyes S, et al. New World monkey Aotus nancymae as a model for Campylobacter jejuni infection and immunity. Infect Immun. 2006;74(1):790-3.
34. Ding Y, Casagrande V. The distribution and morphology of LGN K pathway axons within the layers and CO blobs of owl monkey V1. Vis Neurosci. 1997;14:691-704.
35. Cadavid LF, Lun CM. Lineage-specific diversification of killer cell Ig-like receptors in the owl monkey, a New World primate. Immunogenetics. 2009;61(1):27-41.
36. Castillo F, Guerrero C, Trujillo E, Delgado G, Martinez P, Salazar LM, et al. Identifying and structurally characterizing CD1b in Aotus nancymaae owl monkeys. Immunogenetics. 2004;56(7):480-9.
37. del Castillo H, Vernot JP. Characterizing the CD3 epsilon chain from the New World primate Aotus nancymaae. Biomedica. 2008;28(2):262-70.
38. Montoya GE, Vernot JP, Patarroyo ME. Partial characterization of the CD45 phosphatase cDNA in the owl monkey (Aotus vociferans). Am J Primatol. 2002;57(1):1-11.
39. Montoya GE, Vernot JP, Patarroyo ME. Comparative analysis of CD45 proteins in primate context: owl monkeys vs humans. Tissue Antigens. 2004;64(2):165-72.
40. Diaz OL, Daubenberger CA, Rodriguez R, Naegeli M, Moreno A, Patarroyo ME, et al. Immunoglobulin kappa light-chain V, J, and C gene sequences of the owl monkey Aotus nancymaae. Immunogenetics. 2000;51(3):212-8.
41. Hernandez EC, Suarez CF, Parra CA, Patarroyo MA, Patarroyo ME. Identification of five different IGHV gene families in owl monkeys (Aotus nancymaae). Tissue Antigens. 2005;66(6):640-9.
42. Favre N, Daubenberger C, Marfurt J, Moreno A, Patarroyo M, Pluschke G. Sequence and diversity of T-cell receptor alpha V, J, and C genes of the owl monkey Aotus nancymaae. Immunogenetics. 1998;48(4):253-9.
43. Guerrero JE, Pacheco DP, Suarez CF, Martinez P, Aristizabal F, Moncada CA, et al. Characterizing T-cell receptor gamma-variable gene in Aotus nancymaae owl monkey peripheral blood. Tissue Antigens. 2003;62(6):472-82.
44. Moncada CA, Guerrero E, Cardenas P, Suarez CF, Patarroyo ME, Patarroyo MA. The T-cell receptor in primates: identifying and sequencing new owl monkey TRBV gene sub-groups. Immunogenetics. 2005;57(1-2):42-52.
45. Hernandez EC, Suarez CF, Mendez JA, Echeverry SJ, Murillo LA, Patarroyo ME. Identification, cloning, and sequencing of different cytokine genes in four species of owl monkey. Immunogenetics. 2002;54(9):645-53.
46. Spirig R, Peduzzi E, Patarroyo ME, Pluschke G, Daubenberger CA. Structural and functional characterisation of the Toll like receptor 9 of Aotus nancymaae, a non-human primate model for malaria vaccine development. Immunogenetics. 2005;57(3-4):283-8.
47. Delgado G, Parra C, Patarroyo M. Phenotypical and functional characterization of non-human primate Aotus spp. dendritic cells and their use as a tool for characterizing immune response to protein antigens. Vaccine. 2005;23(26):3386-95.
48. Daubenberger CA, Salomon M, Vecino W, Hubner B, Troll H, Rodriques R, et al. Functional and structural similarity of V gamma 9V delta 2 T cells in humans and Aotus monkeys, a primate infection model for Plasmodium falciparum malaria. J Immunol. 2001;167(11):6421-30.
49. Pinzon-Charry A, Vernot JP, Rodriguez R, Patarroyo ME. Proliferative response of peripheral blood lymphocytes to mitogens in the owl monkey Aotus nancymae. J Med Primatol. 2003;32(1):31-8.
50. Daubenberger CA, Spirig R, Patarroyo ME, Pluschke G. Flow cytometric analysis on cross-reactivity of human-specific CD monoclonal antibodies with splenocytes of Aotus nancymaae, a non-human primate model for biomedical research. Vet Immunol Immunopathol. 2007;119(1-2):14-20.
51. Glass EJ. Genetic variation and responses to vaccines. Anim Health Res Rev. 2004;5(2):197-208.
52. Spurgin LG, Richardson DS. How pathogens drive genetic diversity: MHC, mechanisms and misunderstandings. Proc Biol Sci. 2010;277(1684):979-88.
53. Suarez CF, Cardenas PP, Llanos-Ballestas EJ, Martinez P, Obregon M, Patarroyo ME, et al. alpha(1) and alpha(2) domains of Aotus MHC Class I and Catarrhini MHC class Ia share similar characteristics. Tissue Antigens. 2003;61(5):362-73.
54. Cardenas PP, Suarez CF, Martinez P, Patarroyo ME, Patarroyo MA. MHC class I genes in the owl monkey: mosaic organisation, convergence and loci diversity. Immunogenetics. 2005;56(11):818-32.
55. Cadavid LF, Shufflebotham C, Ruiz FJ, Yeager M, Hughes AL, Watkins DI. Evolutionary instability of the major histocompatibility complex class I loci in New World primates. Proc Natl Acad Sci U S A. 1997;94(26):14536-41.
56. Nino-Vasquez JJ, Vogel D, Rodriguez R, Moreno A, Patarroyo ME, Pluschke G, et al. Sequence and diversity of DRB genes of Aotus nancymaae, a primate model for human malaria parasites. Immunogenetics. 2000;51(3):219-30.
57. Patarroyo ME, Cifuentes G, Baquero J. Comparative molecular and three-dimensional analysis of the peptide-MHC II binding region in both human and Aotus MHC-DRB molecules confirms their usefulness in antimalarial vaccine development. Immunogenetics. 2006;58(7):598-606.
58. Diaz D, Naegeli M, Rodriguez R, Nino-Vasquez JJ, Moreno A, Patarroyo ME, et al. Sequence and diversity of MHC DQA and DQB genes of the owl monkey Aotus nancymaae. Immunogenetics. 2000;51(7):528-37.
59. Diaz D, Daubenberger CA, Zalac T, Rodriguez R, Patarroyo ME. Sequence and expression of MHC-DPB1 molecules of the New World monkey Aotus nancymaae, a primate model for Plasmodium falciparum. Immunogenetics. 2002;54(4):251-9.
60. Suarez CF, Patarroyo ME, Trujillo E, Estupinan M, Baquero JE, Parra C, et al. Owl monkey MHC-DRB exon 2 reveals high similarity with several HLA-DRB lineages. Immunogenetics. 2006;58(7):542-58.
61. Suarez CF, Patarroyo MA, Patarroyo ME. Characterisation and comparative analysis of MHC-DPA1 exon 2 in the owl monkey (Aotus nancymaae). Gene. 2011;470(1-2):37-45.
62. Lopez C, Suarez CF, Cadavid LF, Patarroyo ME, Patarroyo MA. Characterising a microsatellite for DRB typing in Aotus vociferans and Aotus nancymaae (Platyrrhini). PLoS One. 2014;9(5):e96973.
63. Baquero JE, Miranda S, Murillo O, Mateus H, Trujillo E, Suarez C, et al. Reference strand conformational analysis (RSCA) is a valuable tool in identifying MHC-DRB sequences in three species of Aotus monkeys. Immunogenetics. 2006;58(7):590-7.
64. Suárez CF, Pabón L, Barrera A, Aza-Conde J, Patarroyo MA, Patarroyo ME. Structural analysis of owl monkey MHC-DR shows that fully-protective malaria vaccine components can be readily used in humans. Biochem Biophys Res Commun. 2017.
65. Stephens R, Horton R, Humphray S, Rowen L. Gene organisation, sequence variation and isochore structure at the centromeric boundary of the human MHC. J Mol Biol. 1999;291:789-99.
66. Watanabe A, Shiina T, Shimizu S, Hosomichi K, Yanagiya K, Kita Y, et al. A BAC-based contig map of the cynomolgus macaque (Macaca fascicularis) major histocompatibility complex genomic region. Genomics. 2007;89(3):402-12.
67. Tregenza T, Wedell N. Genetic compatibility, mate choice and patterns of parentage: invited review. Mol Ecol. 2000;9:1013-27.
68. Hughes A, Hughes M. Natural selection on the peptide-binding regions of major histocompatibility complex molecules. Immunogenetics. 1995;42:233-43.
69. Sommer S. The importance of immune gene variability (MHC) in evolutionary ecology and conservation. Front Zool. 2005;2:16.
70. Robinson J, Halliwell JA, McWilliam H, Lopez R, Parham P, Marsh SG. The IMGT/HLA database. Nucleic Acids Res. 2013;41(Database issue):D1222-7.
71. Sutton JT, Nakagawa S, Robertson BC, Jamieson IG. Disentangling the roles of natural selection and genetic drift in shaping variation at MHC immunity genes. Mol Ecol. 2011;20(21):4408-20.
72. Yeager M, Hughes AL. Evolution of the mammalian MHC: natural selection, recombination, and convergent evolution. Immunol Rev. 1999;167:45-58.
73. Hughes AL, Yeager M. Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet. 1998;32:415-35.
74. Hedrick PW. Pathogen resistance and genetic variation at MHC loci. Evolution. 2002;56(10):1902-8.
75. Potts WK, Wakeland EK. Evolution of MHC genetic diversity: a tale of incest, pestilence and sexual preference. Trends Genet. 1993;9(12):408-12.
76. Worley K, Collet J, Spurgin LG, Cornwallis C, Pizzari T, Richardson DS. MHC heterozygosity and survival in red junglefowl. Mol Ecol. 2010;19(15):3064-75.
77. Ejsmond MJ, Babik W, Radwan J. MHC allele frequency distributions under parasite-driven selection: A simulation model. BMC Evol Biol. 2010;10:332.
78. Apanius V, Penn D, Slev PR, Ruff LR, Potts WK. The nature of selection on the major histocompatibility complex. Crit Rev Immunol. 1997;17(2):179-224.
79. Potts WK, Slev PR. Pathogen-based models favoring MHC genetic diversity. Immunol Rev. 1995;143:181-97.
80. Borghans JA, Beltman JB, De Boer RJ. MHC polymorphism under host-pathogen coevolution. Immunogenetics. 2004;55(11):732-9.
81. Potts WK, Manning CJ, Wakeland EK. The role of infectious disease, inbreeding and mating preferences in maintaining MHC genetic diversity: an experimental test. Philos Trans R Soc Lond B Biol Sci. 1994;346(1317):369-78.
82. Jordan WC, Bruford MW. New perspectives on mate choice and the MHC. Heredity. 1998;81(Pt 2):127-33.
83. Huchard E, Raymond M, Benavides J, Marshall H, Knapp LA, Cowlishaw G. A female signal reflects MHC genotype in a social primate. BMC Evol Biol. 2010;10:96.
84. Huchard E, Knapp LA, Wang J, Raymond M, Cowlishaw G. MHC, mate choice and heterozygote advantage in a wild social primate. Mol Ecol. 2010;19(12):2545-61.
85. Setchell JM, Huchard E. The hidden benefits of sex: evidence for MHC-associated mate choice in primate societies. Bioessays. 2010;32(11):940-8.
86. Roberts SC, Little AC, Gosling LM, Jones BC, Perrett DI, Carter V, et al. MHC-assortative facial preferences in humans. Biol Lett. 2005;1(4):400-3.
87. Havlicek J, Roberts SC. MHC-correlated mate choice in humans: a review. Psychoneuroendocrinology. 2009;34(4):497-512.
88. Manning CJ, Wakeland EK, Potts WK. Communal nesting patterns in mice implicate MHC genes in kin recognition. Nature. 1992;360(6404):581-3.
89. Yamazaki K, Beauchamp GK. Genetic basis for MHC-dependent mate choice. Adv Genet. 2007;59:129-45.
90. Wedekind C, Chapuisat M, Macas E, Rulicke T. Non-random fertilization in mice correlates with the MHC and something else. Heredity. 1996;77(Pt 4):400-9.
91. Dorak MT, Lawson T, Machulla HK, Mills KI, Burnett AK. Increased heterozygosity for MHC class II lineages in newborn males. Genes Immun. 2002;3(5):263-9.
92. Klein J, Sato A, Nagl S, O'hUigín C. Molecular trans-species polymorphism. Annu Rev Ecol Syst. 1998;29:1-21.
93. Klein J, Sato A, Nikolaidis N. MHC, TSP, and the origin of species: from immunogenetics to evolutionary genetics. Annu Rev Genet. 2007;41:281-304.
94. Klein J, Satta Y, Takahata N, O'hUigin C. Trans-specific Mhc polymorphism and the origin of species in primates. J Med Primatol. 1993;22(1):57-64.
95. Trtkova K, Mayer WE, O'hUigin C, Klein J. Mhc-DRB genes and the origin of New World monkeys. Mol Phylogenet Evol. 1995;4(4):408-19.
96. O'hUigin C. Quantifying the degree of convergence in primate Mhc-DRB genes. Immunol Rev. 1995;143:123-40.
97. Doxiadis GG, de Groot N, de Groot NG, Doxiadis II, Bontrop RE. Reshuffling of ancient peptide binding motifs between HLA-DRB multigene family members: old wine served in new skins. Mol Immunol. 2008;45(10):2743-51.
98. Slierendregt BL, Otting N, Kenter M, Bontrop RE. Allelic diversity at the Mhc-DP locus in rhesus macaques (Macaca mulatta). Immunogenetics. 1995;41(1):29-37.
99. Bontrop RE, Otting N, de Groot NG, Doxiadis GG. Major histocompatibility complex class II polymorphisms in primates. Immunol Rev. 1999;167:339-50.
100. Robinson J, Waller MJ, Parham P, de Groot N, Bontrop R, Kennedy LJ, et al. IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex. Nucleic Acids Res. 2003;31(1):311-4.
101. Steiper M, Young N. Primate molecular divergence dates. Mol Phylogenet Evol. 2006;41:384-94.
102. Wang JH, Reinherz EL. Structural basis of T cell recognition of peptides bound to MHC molecules. Mol Immunol. 2002;38(14):1039-49.
103. Backert L, Kohlbacher O. Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med. 2015;7:119.
104. Lafuente EM, Reche PA. Prediction of MHC-peptide binding: a systematic and comprehensive overview. Curr Pharm Des. 2009;15(28):3209-20.
105. Lenz TL. Computational prediction of MHC II-antigen binding supports divergent allele advantage and explains trans-species polymorphism. Evolution. 2011;65(8):2380-90.
106. Doytchinova IA, Flower DR. In silico identification of supertypes for class II MHCs. J Immunol. 2005;174(11):7085-95.
107. Doytchinova IA, Guan P, Flower DR. Identifying human MHC supertypes using bioinformatic methods. J Immunol. 2004;172(7):4314-23.
108. Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, et al. Definition of supertypes for HLA molecules using clustering of specificity matrices. Immunogenetics. 2004;55(12):797-810.
109. Schwensow N, Fietz J, Dausmann K, Sommer S. Neutral versus adaptive genetic variation in parasite resistance: importance of major histocompatibility complex supertypes in a free-ranging primate. Heredity. 2007;99(3):265-77.
110. Sepil I, Lachish S, Hinks AE, Sheldon BC. Mhc supertypes confer both qualitative and quantitative resistance to avian malaria infections in a wild bird population. Proc Biol Sci. 2013;280(1759):20130134.
111. Hill AV. Common West African HLA antigens are associated with protection from severe malaria. Nature. 1991;352(6336):595-600.
112. Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters B. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol. 2008;4(4):e1000048.
113. Sidney J, Southwood S, Moore C, Oseroff C, Pinilla C, Grey HM, et al. Measurement of MHC/peptide interactions by gel filtration or monoclonal antibody capture. Curr Protoc Immunol. 2013;Chapter 18:Unit 18.3.
114. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U, et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol. 1999;17(6):555-61.
115. Zhang L, Chen Y, Wong HS, Zhou S, Mamitsuka H, Zhu S. TEPITOPEpan: extending TEPITOPE for peptide binding prediction covering over 700 HLA-DR molecules. PLoS One. 2012;7(2):e30483.
116. Rothbard JB, Taylor WR. A sequence pattern common to T cell epitopes. EMBO J. 1988;7(1):93-100.
117. Udaka K, Wiesmuller KH, Kienle S, Jung G, Tamamura H, Yamagishi H, et al. An automated prediction of MHC class I-binding peptides based on positional scanning with peptide libraries. Immunogenetics. 2000;51(10):816-28.
118. Peters B, Sette A. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics. 2005;6:132.
119. Sidney J, Assarsson E, Moore C, Ngo S, Pinilla C, Sette A, et al. Quantitative peptide binding motifs for 19 human and mouse MHC class I molecules derived using positional scanning combinatorial peptide libraries. Immunome Res. 2008;4:2.
120. Nielsen M, Lundegaard C, Lund O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics. 2007;8:238.
121. Zhang W, Liu J, Niu Y. Quantitative prediction of MHC-II binding affinity using particle swarm optimization. Artif Intell Med. 2010;50(2):127-32.
122. Andreatta M, Karosiene E, Rasmussen M, Stryhn A, Buus S, Nielsen M. Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification. Immunogenetics. 2015;67(11-12):641-50.
123. Lundegaard C, Lund O, Nielsen M. Prediction of epitopes using neural network based methods. J Immunol Methods. 2011;374(1-2):26-34.
124. Nielsen M, Lundegaard C, Worning P, Lauemoller SL, Lamberth K, Buus S, et al. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 2003;12(5):1007-17.
125. Roomp K, Antes I, Lengauer T. Predicting MHC class I epitopes in large datasets. BMC Bioinformatics. 2010;11:90.
126. Nielsen M, Justesen S, Lund O, Lundegaard C, Buus S. NetMHCIIpan-2.0 - Improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure. Immunome Res. 2010;6:9.
127. Noguchi H, Kato R, Hanai T, Matsubara Y, Honda H, Brusic V, et al. Hidden Markov model-based prediction of antigenic peptides that interact with MHC class II molecules. J Biosci Bioeng. 2002;94(3):264-70.
128. Nielsen M, Lund O, Buus S, Lundegaard C. MHC class II epitope predictive algorithms. Immunology. 2010;130(3):319-28.
129. Vider-Shalit T, Louzoun Y. MHC-I prediction using a combination of T cell epitopes and MHC-I binding peptides. J Immunol Methods. 2011;374(1-2):43-6.
130. Liu W, Meng X, Xu Q, Flower DR, Li T. Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models. BMC Bioinformatics. 2006;7:182.
131. Donnes P. Support vector machine-based prediction of MHC-binding peptides. Methods Mol Biol. 2007;409:273-82.
132. Agudelo W, Patarroyo M. Quantum chemical analysis of MHC-peptide interactions for vaccine design. Mini Rev Med Chem. 2010;10(8):746-58.
133. Wan S, Knapp B, Wright DW, Deane CM, Coveney PV. Rapid, Precise, and Reproducible Prediction of Peptide-MHC Binding Affinities from Molecular Dynamics That Correlate Well with Experiment. J Chem Theory Comput. 2015;11(7):3346-56.
134. Patronov A, Doytchinova I. T-cell epitope vaccine design by immunoinformatics. Open Biol. 2013;3(1):120139.
135. Bordner AJ, Abagyan R. Ab initio prediction of peptide-MHC binding geometry for diverse class I MHC allotypes. Proteins. 2006;63(3):512-26.
136. Zhang H, Wang P, Papangelopoulos N, Xu Y, Sette A, Bourne PE, et al. Limitations of Ab initio predictions of peptide binding to MHC class II molecules. PLoS One. 2010;5(2):e9272.
137. Bordner AJ. Towards universal structure-based prediction of class II MHC epitopes for diverse allotypes. PLoS One. 2010;5(12):e14383.
138. Yanover C, Bradley P. Large-scale characterization of peptide-MHC binding landscapes with structural simulations. Proc Natl Acad Sci U S A. 2011;108(17):6981-6.
139. Knapp B, Omasits U, Schreiner W. Side chain substitution benchmark for peptide/MHC interaction. Protein Sci. 2008;17(6):977-82.
140. Tong JC, Tan TW, Ranganathan S. Modeling the structure of bound peptide ligands to major histocompatibility complex. Protein Sci. 2004;13(9):2523-32.
141. Bui HH, Schiewe AJ, von Grafenstein H, Haworth IS. Structural prediction of peptides binding to MHC class I molecules. Proteins. 2006;63(1):43-52.
142. Cárdenas C, Ortiz M, Balbín A, Villaveces JL, Patarroyo ME. Allele effects in MHC-peptide interactions: A theoretical analysis of HLA-DRβ1*0101-HA and HLA-DRβ1*0401-HA complexes. Biochem Biophys Res Commun. 2005;330(4):1162-7.
143. Balbín A, Cárdenas C, Villaveces JL, Patarroyo ME. A theoretical analysis of HLA-DRβ1*0301-CLIP complex using the first three multipolar moments of the electrostatic field. Biochimie. 2006;88(9):1307-11.
144. Bohorquez HJ, Obregon M, Cárdenas C, Llanos E, Suárez C, Villaveces JL, et al. Electronic energy and multipolar moments characterize amino acid side chains into chemically related groups. J Phys Chem A. 2003;107(47):10090-7.
145. Cárdenas C, Villaveces JL, Bohórquez H, Llanos E, Suárez C, Obregón M, et al. Quantum chemical analysis explains hemagglutinin peptide-MHC Class II molecule HLA-DRβ1*0101 interactions. Biochem Biophys Res Commun. 2004;323(4):1265-77.
146. Cárdenas C, Villaveces JL, Suárez C, Obregón M, Ortiz M, Patarroyo ME. A comparative study of MHC Class-II HLA-DRβ1*0401-Col II and HLA-DRβ1*0101-HA complexes: a theoretical point of view. J Struct Biol. 2005;149(1):38-52.
147. Cárdenas C, Obregón M, Balbín A, Villaveces JL, Patarroyo ME. Wave function analysis of MHC-peptide interactions. J Mol Graph Model. 2007;25(5):605-15.
148. Agudelo WA, Galindo JF, Ortiz M, Villaveces JL, Daza EE, Patarroyo ME. Variations in the electrostatic landscape of class II human leukocyte antigen molecule induced by modifications in the myelin basic protein peptide: a theoretical approach. PLoS One. 2009;4(1):e4164.
149. Bohórquez HJ, Cárdenas C, Matta CF, Boyd RJ, Patarroyo ME. Methods in biocomputational chemistry: a lesson from the amino acids. Quantum Biochemistry. 2010:403-21.
150. Stone JE, Hardy DJ, Ufimtsev IS, Schulten K. GPU-accelerated molecular modeling coming of age. J Mol Graph Model. 2010;29(2):116-25.
151. Akimov AV, Prezhdo OV. Large-scale computations in chemistry: a bird's eye view of a vibrant field. Chem Rev. 2015;115(12):5797-890.
152. Stewart JJ. Optimization of parameters for semiempirical methods VI: more modifications to the NDDO approximations and re-optimization of parameters. J Mol Model. 2013;19(1):1-32.
153. Elstner M. The SCC-DFTB method and its application to biological systems. Theor Chem Acc. 2006;116(1):316-25.
154. Christensen AS, Kubar T, Cui Q, Elstner M. Semiempirical quantum mechanical methods for noncovalent interactions for chemical and biochemical applications. Chem Rev. 2016;116(9):5301-37.
155. Kitaura K, Ikeo E, Asada T, Nakano T, Uebayasi M. Fragment molecular orbital method: an approximate computational method for large molecules. Chem Phys Lett. 1999;313(3):701-6.
156. Fedorov DG, Nagata T, Kitaura K. Exploring chemistry with the fragment molecular orbital method. Phys Chem Chem Phys. 2012;14(21):7562-77.
157. Fedorov DG, Kitaura K. Pair interaction energy decomposition analysis. J Comput Chem. 2007;28(1):222-37.
158. González R, Suárez CF, Bohórquez HJ, Patarroyo MA, Patarroyo ME. Semi-empirical quantum evaluation of peptide-MHC class II binding. Chem Phys Lett. 2017;668:29-34.
159. Patiño LC, Beau I, Carlosama C, Buitrago JC, González R, Suárez CF, et al. New mutations in non-syndromic primary ovarian insufficiency patients identified via whole-exome sequencing. Hum Reprod. 2017:1-9.
160. Patarroyo ME, Arévalo-Pinzón G, Reyes C, Moreno-Vranich A, Patarroyo MA. Malaria parasite survival depends on conserved binding peptides' critical biological functions. Curr Issues Mol Biol. 2016;18:57-78.
161. Alba MP, Suarez CF, Varela Y, Patarroyo MA, Bermudez A, Patarroyo ME. TCR-contacting residues orientation and HLA-DRbeta* binding preference determine long-lasting protective immunity against malaria. Biochem Biophys Res Commun. 2016;477(4):654-60.
162. Bermudez A, Calderon D, Moreno-Vranich A, Almonacid H, Patarroyo MA, Poloche A, et al. Gauche(+) side-chain orientation as a key factor in the search for an immunogenic peptide mixture leading to a complete fully protective vaccine. Vaccine. 2014;32(18):2117-26.
163. Patarroyo ME, Moreno-Vranich A, Bermudez A. Phi (Φ) and psi (Ψ) angles involved in malarial peptide bonds determine sterile protective immunity. Biochem Biophys Res Commun. 2012;429(1-2):75-80.
164. Beck HP, Felger I, Barker M, Bugawan T, Genton B, Alexander N, et al. Evidence of HLA class II association with antibody response against the malaria vaccine SPF66 in a naturally exposed population. Am J Trop Med Hyg. 1995;53(3):284-8.
165. Patarroyo ME, Vinasco J, Amador R, Espejo F, Silva Y, Moreno A, et al. Genetic control of the immune response to a synthetic vaccine against Plasmodium falciparum. Parasite Immunol. 1991;13(5):509-16.
166. Patarroyo MA, Bermudez A, Lopez C, Yepes G, Patarroyo ME. 3D analysis of the TCR/pMHCII complex formation in monkeys vaccinated with the first peptide inducing sterilizing immunity against human malaria. PLoS One. 2010;5(3):e9771.
167. Cifuentes G, Patarroyo ME, Urquiza M, Ramirez LE, Reyes C, Rodriguez R. Distorting malaria peptide backbone structure to enable fitting into MHC class II molecules renders modified peptides immunogenic and protective. J Med Chem. 2003;46(11):2250-3.
168. Stern LJ, Wiley DC. Antigenic peptide binding by class I and class II histocompatibility proteins. Structure. 1994;2(4):245-51.
169. Madden DR. The three-dimensional structure of peptide-MHC complexes. Annu Rev Immunol. 1995;13:587-622.
170. Barber LD, Parham P. Peptide binding to major histocompatibility complex molecules. Annu Rev Cell Biol. 1993;9:163-206.
171. Adzhubei AA, Sternberg MJ, Makarov AA. Polyproline-II helix in proteins: structure and function. J Mol Biol. 2013;425(12):2100-32.
172. Bohórquez HJ, Suárez CF, Patarroyo ME. Mass & secondary structure propensity of amino acids explain their mutability and evolutionary replacements. Sci Rep. 2017;7(1):7717.
173. González-Galarza FF, Takeshita LY, Santos EJ, Kempson F, Maia MHT, Silva ALSd, et al. Allele frequency net 2015 update: new features for HLA epitopes, KIR and disease and HLA adverse drug reaction associations. Nucleic Acids Res. 2014;43(D1):D784-8.
174. Berkholz DS, Krenesky PB, Davidson JR, Karplus PA. Protein Geometry Database: a flexible engine to explore backbone conformations and their relationships to covalent geometry. Nucleic Acids Res. 2009;38(Suppl 1):D320-5.

Annex 1. MHC-DRB pocket dictionary

[Figure A: Human/Aotus MHC-DRB pocket 1 profiles. Pie charts show pocket 1 profile frequencies for HLA-DRB P1, Aotus MHC-DRB P1 and HLA + Aotus MHC-DRB P1. The profiles shown include HNYVGFTV, HNYVVFTV, HNYAVFTV, HNYVFFTV, YNYVVFTV and YNYVAFTV. Panel annotations read "20 profiles, 2100 alleles" and "11 profiles, 215 alleles".]

Table 1. Most frequent pocket profiles in HLA-DRB (>60%)
Residue positions per pocket:
- Pocket 1: 81, 82, 83, 85, 86, 89, 90, 91
- Pocket 4: 14, 15, 26, 28, 40, 70, 71, 72, 73, 74, 78, 79
- Pocket 6: 9, 10, 11, 12, 13, 28, 29, 30, 71
- Pocket 9: 9, 13, 30, 37, 38, 57, 60, 61

Allele prototype | PPF | Pocket 1 | Pocket 4 | Pocket 6 | Pocket 9
HLA-DRB1*010101 | 51.7 | HNYVGFTV | ECLEFQRRAAYC | WQLKFERCR | WFCSVDYW
HLA-DRB1*010201 | 14.9 | HNYAVFTV | ECLEFQRRAAYC | WQLKFERCR | WFCSVDYW
HLA-DRB1*0104 | 2.3 | HNYVVFTV | ECLEFQRRAAYC | WQLKFERCR | WFCSVDYW
HLA-DRB1*0109 | 2.3 | HNYVGFTV | ECLEFQARAAYC | WQLKFERCA | WFCSVDYW

HLA-DRB1*120101 | 59.6 | HNYAVFTV | ECLEFDRRAAYC | EYSTGERHR | EGHLLVSW
HLA-DRB1*121601 | 8.5 | HNYVGFTV | ECLEFDRRAAYC | EYSTGERHR | EGHLLVSW
HLA-DRB1*120302 | 6.4 | HNYVVFTV | ECLEFDRRAAYC | EYSTGERHR | EGHLLVSW
HLA-DRB1*1204 | 4.3 | HNYAVFTV | ECLEFDRRAAYC | EYSTGERHR | EGHLLDYW
HLA-DRB1*1205 | 4.3 | HNYAVFTV | ECLEFDRRAAYC | EYSTGERHR | EGHFLVSW

HLA-DRB1*03010101 51.8 H N Y V V F T V E C Y D
F Q K R G R Y C E Y S T S D R Y K E S Y N V D Y W HLA-DRB1*030201 5.3 H N Y V G F T V E C F E F Q K R G R Y C E Y S T S E R Y K E S Y N V D Y W HLA-DRB1*030501 4.4 H N Y V G F T V E C Y D F Q K R G R Y C E Y S T S D R Y K E S Y N V D Y W HLA-DRB1*0325 2.6 H N Y V V F T V E C Y D F Q K R G R Y C E Y S T S D R Y K E S Y Y V D Y W HLA-DRB1*0357 1.8 H N Y V A F T V E C Y D F Q K R G R Y C E Y S T S D R Y K E S Y N V D Y W HLA-DRB1*0340 1.8 H N Y V G F T V E C F D F Q K R G R Y C E Y S T S D R Y K E S Y Y V D Y W HLA-DRB1*0326 1.8 H N Y V V F T V E C Y D F Q K R G R Y C E Y S T S D R Y K E S Y N A D Y W HLA-DRB1*031301 1.8 H N Y V V F T V E C Y D F Q K R G R Y C E Y S T S D R Y K E S Y N V D S W HLA-DRB1*030401 1.8 H N Y V V F T V E C Y D F Q K R G R Y C E Y S T S D R Y K E S Y S V D Y W Pocket 1 Pocket 4 Pocket 6 Pocket 9 Allele prototype PPF 81 82 83 85 86 89 90 91 14 15 26 28 40 70 71 72 73 74 78 79 9 10 11 12 13 28 29 30 71 9 13 30 37 38 57 60 61 HLA-DRB1*040101 13.9 H N Y V G F T V E C F D F Q K R A A Y C E Q V K H D R Y K E H Y Y V D Y W HLA-DRB1*040501 13.0 H N Y V G F T V E C F D F Q R R A A Y C E Q V K H D R Y R E H Y Y V S Y W HLA-DRB1*040301 10.6 H N Y V V F T V E C F D F Q R R A E Y C E Q V K H D R Y R E H Y Y V D Y W HLA-DRB1*040401 7.7 H N Y V V F T V E C F D F Q R R A A Y C E Q V K H D R Y R E H Y Y V D Y W HLA-DRB1*040201 3.8 H N Y V V F T V E C F D F D E R A A Y C E Q V K H D R Y E E H Y Y V D Y W HLA-DRB1*040601 3.8 H N Y V V F T V E C F D F Q R R A E Y C E Q V K H D R Y R E H Y S V D Y W HLA-DRB1*040701 2.9 H N Y V G F T V E C F D F Q R R A E Y C E Q V K H D R Y R E H Y Y V D Y W HLA-DRB1*040801 2.4 H N Y V G F T V E C F D F Q R R A A Y C E Q V K H D R Y R E H Y Y V D Y W HLA-DRB1*0415 1.4 H N Y V V F T V E C F D F D R R A A Y C E Q V K H D R Y R E H Y Y V D Y W HLA-DRB1*0418 1.4 H N Y V V F T V E C F D F D R R A L Y C E Q V K H D R Y R E H Y Y V D Y W HLA-DRB1*041001 1.4 H N Y V V F T V E C F D F Q R R A A Y C E Q V K H D R Y R E H Y Y V S Y W 
HLA-DRB1*041101 1.4 HNYVVFTV ECFDFQRRAEYC EQVKHDRYR EHYYVSYW

HLA-DRB1*07
HLA-DRB1*07010101 50.0 HNYVGFTV KCFEFDRRGQVC WQGKYERLR WYLFVVSW
HLA-DRB1*0704 7.7 HNYVGFTV KCFEFDRRGQYC WQGKYERLR WYLFVVSW
HLA-DRB1*0703 3.8 HNYVGFTV KCFEFDRRGQVC WQGKYESLR WYLFVVSW
HLA-DRB1*0706 3.8 HNYVGFTV KCFEFDRRGQVC WQGKYERLR WYLFVAYW
HLA-DRB1*0708 3.8 HNYVGFTV KCFEFDRRGQVC WQGKYERLR WYLVVVSW
HLA-DRB1*0709 3.8 HNYVGFTV KCFEFDRRGQVC WQGKYERFR WYFFVVSW
HLA-DRB1*0712 3.8 HNYVGFTV KCFEFDRRGQVC WQGKYERLR WYLFVISW
HLA-DRB1*0717 3.8 HNYVGFTV KCFEFDRWGQVC WQGKYERLR WYLFVVSW
HLA-DRB1*0718 3.8 HNYVGFTV KCFEFDRRSQVC WQGKYERLR WYLFVVSW
HLA-DRB1*0720 3.8 HNYVDFTV KCFEFDRRGQVC WQGKYERLR WYLFVVSW
HLA-DRB1*0722 3.8 HNYVGFTV KCFEFDRRGQVC WQGKCERLR WCLFVVSW
HLA-DRB1*0723 3.8 HNYVGFTV KCFEFDRRGQVC WRGKYERLR WYLFVVSW
HLA-DRB1*0724 3.8 HNYVVFTV KCFEFDRRGQVC WQGKYERLR WYLFVVSW

HLA-DRB1*08
HLA-DRB1*080101 30.0 HNYVGFTV ECFDFDRRALYC EYSTGDRYR EGYYVSYW
HLA-DRB1*080201 14.3 HNYVGFTV ECFDFDRRALYC EYSTGDRYR EGYYVDYW
HLA-DRB1*080401 11.4 HNYVVFTV ECFDFDRRALYC EYSTGDRYR EGYYVDYW
HLA-DRB1*0805 2.9 HNYVGFTV ECFDFDRRAAYC EYSTGDRYR EGYYVSYW
HLA-DRB1*0806 2.9 HNYVVFTV ECFDFDRRALYC EYSTGDRYR EGYYVSYW
HLA-DRB1*0812 2.9 HNYAVFTV ECFDFDRRALYC EYSTGDRYR EGYYVSYW
HLA-DRB1*0824 2.9 HNYVGFTV ECFDFDRRAAYC EYSTGDRYR EGYYVDYW
HLA-DRB1*0834 2.9 HNYVGFTV ECFDFDRRALYC EYSTGDRYR EGYYVVSW

HLA-DRB1*09
HLA-DRB1*090102 57.1 HNYVGFTV ECYHFRRRAEVC KQDKFHRGR KFGNVVSW
HLA-DRB1*090201 7.1 HNYVGFTV ECYHFRRRAEVC KQDKFHRGR KFGNVDYW
HLA-DRB1*0903 3.6 HNYVGFTV ECYHFDRRAEVC KQDKFHRGR KFGNVVSW
HLA-DRB1*0905 3.6 HNYVGFTV ECYHFRRRAEYC KQDKFHRGR KFGNVVSW

HLA-DRB1*10
HLA-DRB1*100101 42.9 HNYVGFTV ECLEYRRRAAYC EEVKFERRR EFRYADYW
HLA-DRB1*1002 14.3 HNYVVFTV ECLEYRRRAAYC EEVKFERRR EFRYADYW
HLA-DRB1*1003 14.3 HNYVGFTV ECLEFRRRAAYC EEVKFERRR EFRYADYW

HLA-DRB1*11
HLA-DRB1*110101 27.3 HNYVGFTV ECFDFDRRAAYC EYSTSDRYR ESYYVDYW
HLA-DRB1*110401 14.9 HNYVVFTV ECFDFDRRAAYC EYSTSDRYR ESYYVDYW
HLA-DRB1*110201 6.2 HNYVVFTV ECFDFDERAAYC EYSTSDRYE ESYYVDYW
HLA-DRB1*110601 2.6 HNYAVFTV ECFDFDRRAAYC EYSTSDRYR ESYYVDYW
HLA-DRB1*111101 2.6 HNYVGFTV ECFDFDERAAYC EYSTSDRYE ESYYVDYW
HLA-DRB1*111001 2.1 HNYVGFTV ECFDFDRRAAYC EYSTSDRYR ESYFVDYW
HLA-DRB1*11103 2.1 HNYVGFTV ECFDFQKRGRYC EYSTSDRYK ESYYVDYW
HLA-DRB1*11113 2.1 HNYVVFTV ECFDFDRRAAYC EYSTSDRYR ESYNVDYW
HLA-DRB1*1109 1.5 HNYVGFTV ECFDFDRRAAYC EYSTSDRYR ESYNVDYW
HLA-DRB1*1123 1.5 HNYVGFTV ECFDFDRRALYC EYSTSDRYR ESYYVDYW

HLA-DRB1*13
HLA-DRB1*130101 18.5 HNYVVFTV ECFDFDERAAYC EYSTSDRYE ESYNVDYW
HLA-DRB1*130201 11.0 HNYVGFTV ECFDFDERAAYC EYSTSDRYE ESYNVDYW
HLA-DRB1*130301 7.5 HNYVGFTV ECFDFDKRAAYC EYSTSDRYK ESYYVSYW
HLA-DRB1*1312 5.0 HNYVGFTV ECFDFDRRAAYC EYSTSDRYR ESYYVSYW
HLA-DRB1*130701 4.0 HNYVGFTV ECFDFDRRAAYC EYSTSDRYR ESYYVDYW
HLA-DRB1*130501 3.0 HNYVGFTV ECFDFDRRAAYC EYSTSDRYR ESYNVDYW
HLA-DRB1*13149 2.5 HNYVVFTV ECFDFDERAAYC EYSTSDRYE ESYYVDYW
HLA-DRB1*132301 2.0 HNYVGFTV ECFDFDERAAYC EYSTSDRYE ESYYVDYW
HLA-DRB1*1304 1.5 HNYVVFTV ECFDFDERAAYC EYSTSDRYE ESYYVSYW
HLA-DRB1*1308 1.5 HNYVVFTV ECFDFDERAAYC EYSTSDRYE ESYFVDYW
HLA-DRB1*131101 1.5 HNYVVFTV ECFDFDRRAAYC EYSTSDRYR ESYYVDYW
HLA-DRB1*1313 1.5 HNYVGFTV ECFDFDRRALYC EYSTSDRYR ESYYVSYW
HLA-DRB1*1389 1.5 HNYVVFTV ECFDFDKRAAYC EYSTSDRYK ESYYVSYW

HLA-DRB1*14
HLA-DRB1*140101 15.4 HNYVVFTV ECFDFRRRAEYC EYSTSDRYR ESYFVAHW
HLA-DRB1*140501 8.3 HNYVVFTV ECFDFRRRAEYC EYSTSDRYR ESYFVDYW
HLA-DRB1*140301 4.8 HNYVGFTV ECFEFDRRALYC EYSTSERYR ESYNVDYW
HLA-DRB1*1404 4.2 HNYVVFTV ECFDFRRRAEYC EYSTGDRYR EGYFVAHW
HLA-DRB1*1414 4.2 HNYVGFTV ECFDFRRRAEYC EYSTSDRYR ESYFVDYW
HLA-DRB1*140601 2.6 HNYVVFTV ECFEFQRRAAYC EYSTSERYR ESYNVDYW
HLA-DRB1*1408 2.0 HNYVVFTV ECFDFRRRAEYC EYSTSDRYR ESYFVDHW
HLA-DRB1*1425 2.0 HNYVGFTV ECFDFDRRAAYC EYSTSDRYR ESYYVAHW
HLA-DRB1*143201 2.0 HNYVVFTV ECFDFRRRAAYC EYSTSDRYR ESYFVAHW
HLA-DRB1*1402 2.0 HNYVGFTV ECFEFQRRAAYC EYSTSERYR ESYNVDYW
HLA-DRB1*140701 1.3 HNYVGFTV ECFDFRRRAEYC EYSTSDRYR ESYFVAHW
HLA-DRB1*1409 1.3 HNYVGFTV ECFDFQRRAAYC EYSTSDRYR ESYNVDYW
HLA-DRB1*14100 1.3 HNYVVFTV ECFDFRRRAAYC EYSTSDRYR ESYFVDYW
HLA-DRB1*14105 1.3 HNYVVFTV ECFDFDRRAAYC EYSTSDRYR ESYFVAHW
HLA-DRB1*14107 1.3 HNYVVFTV ECFDFQKRGRYC EYSTGDRYK EGYFVAHW
HLA-DRB1*1411 1.3 HNYVVFTV ECFDFRRRAEYC EYSTGDRYR EGYFVDYW
HLA-DRB1*141201 1.3 HNYVVFTV ECFEFDRRALYC EYSTSERYR ESYNVDYW
HLA-DRB1*1417 1.3 HNYVVFTV ECFDFQRRAAYC EYSTSDRYR ESYNVDYW
HLA-DRB1*1463 1.3 HNYVGFTV ECFEFDRRALYC EYSTSERYR ESYNVSYW
HLA-DRB1*1468 1.3 HNYVGFTV ECFDFRRRAEYC EYSTGDRYR EGYFVAHW

HLA-DRB1*15
HLA-DRB1*15010101 55.5 HNYVVFTV ECFDFQARAAYC WQPKRDRYA WRYSVDYW
HLA-DRB1*150201 15.6 HNYVGFTV ECFDFQARAAYC WQPKRDRYA WRYSVDYW
HLA-DRB1*15030101 4.7 HNYVVFTV ECFDFQARAAYC WQPKRDRHA WRHSVDYW
HLA-DRB1*1538 1.6 HNYVGFTV ECFDFQARAAYC WQPKRDRYA WRYSVDSW
HLA-DRB1*1527 1.6 HNYVGFTV ECFDFQRRAAYC WQPKRDRYR WRYSVDYW

HLA-DRB1*16
HLA-DRB1*160101 64.0 HNYVGFTV ECFDFDRRAAYC WQPKRDRYR WRYSVDYW
HLA-DRB1*1604 8.0 HNYVGFTV ECFDFDRRALYC WQPKRDRYR WRYSVDYW
HLA-DRB1*1615 8.0 HNYVVFTV ECFDFDRRAAYC WQPKRDRYR WRYSVDYW

HLA-DRB3*01
HLA-DRB3*01010201 42.1 HNYVGFTV ECYDFQKRGRYC ELRKSDRYK ESYFLVSW
HLA-DRB3*0102 5.3 HNYVGFTV ECYDFQKRGRYC ELCKSDRYK ESYFLVSW
HLA-DRB3*0103 5.3 HNYVGFTV ECYEFQKRGRYC ELRKSERYK ESYFLVSW
HLA-DRB3*0105 5.3 HNYVGFTV ECYNFQKRGRYC ELRKSNRYK ESYFLVSW
HLA-DRB3*0106 5.3 HNYVGFTV ECYDFQKRGRYC ELRKSDRYK ESYFVVSW

HLA-DRB3*02
HLA-DRB3*02020101 39.4 HNYVGFTV ECFEFQKRGQYC ELLKSERHK ESHYADYW
HLA-DRB3*0201 6.1 HNYVVFTV ECFEFQKRGQYC ELLKSERHK ESHYADYW
HLA-DRB3*0209 6.1 HNYVGFTV ECFEFQKRGQYC ELLKSERHK ESHYAVSW
HLA-DRB3*0203 3.0 HNYVGFTV ECFEFQKRGQYC ELLKSERHK ESHSVDYW
HLA-DRB3*0204 3.0 HNYVVFTV ECFEFQKRGRYC ELLKSERHK ESHYADYW
HLA-DRB3*0205 3.0 HNYVGFTV ECFEFQKRGQYC ELLKSERYK ESYYADYW

HLA-DRB3*03
HLA-DRB3*030101 60.0 HNYVVFTV ECFEFQKRGQYC ELLKSERYK ESYFVVSW
HLA-DRB3*0303 20.0 HNYVGFTV ECFEFQKRGRYC ELLKSERYK ESYFVVSW
HLA-DRB3*0302 20.0 HNYVVFTV ECFEFQKRGQYC ELLKSERHK ESHFVVSW

HLA-DRB4*01
HLA-DRB4*01010101 91.7 YNYVVFTV ECNIYRRRAEYC EQAKCIRYR ECYYADYW
HLA-DRB4*0105 8.3 HNYVVFTV ECNIYRRRAEYC EQAKCIRYR ECYYADYW

HLA-DRB5*01
HLA-DRB5*010101 30.8 HNYVGFTV ECFHFDRRAAYC QQDKYHRDR QYDDLDYW
HLA-DRB5*0102 7.7 HNYVGFTV ECFHFDRRAAYC QQDKYHRGR QYGNVDYW
HLA-DRB5*0103 7.7 HNYVGFTV ECFHFDTRAAYC QQDKYHRGT QYGNVDYW
HLA-DRB5*0104 7.7 HNYVGFTV ECFHFDRRALYC QQDKYHRDR QYDDLDYW
HLA-DRB5*0105 7.7 HNYVGFTV ECFHFDRRAAYC QQDKYHRDR QYDDVDYW

HLA-DRB5*02
HLA-DRB5*0202 60.0 HNYAVFTV ECFHFQARAAYC QQDKYHRGA QYGNVDYW
HLA-DRB5*0205 20.0 HNYAVFTV ECFHFQRRAAYC QQDKYHRGR QYGNVDYW
HLA-DRB5*0203 20.0 HNYVGFTV ECFHFQARAAYC QQDKYHRGA QYGNVDYW

Table 2. Most frequent pocket profiles in Aotus MHC-DRB (>60%)
Columns: allele prototype; PPF; residues at the positions forming Pocket 1 (81, 82, 83, 85, 86, 89, 90, 91), Pocket 4 (14, 15, 26, 28, 40, 70, 71, 72, 73, 74, 78, 79), Pocket 6 (9, 10, 11, 12, 13, 28, 29, 30, 71) and Pocket 9 (9, 13, 30, 37, 38, 57, 60, 61).

Ao-DRB*W38
Aoaz-DRB*W3801 100.0 HNYVGFTV ECFEFDRRAQVC EQAKYERHR EYHYATYW
Aona-DRB*W3802 50.0 HNYVGFTV ECFEFDRRAQVC EQAKYERHR EYHYATYW
Aona-DRB*W3801 50.0 HNYVGFTV ECFEFVRRAQVC EQAKYERHR EYHYATYW
Aoni-DRB*W3801 100.0 HNYVVFTV ECFEFDRRAQVC EQAKYERHR EYHYATYW

Ao-DRB*W13
Aona-DRB*W1302 25.0 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYF
Aona-DRB*W1308 25.0 HNYVAFTV ECFDFETRAAYC EQFKPDRYT EPYYVEYF
Aona-DRB*W1301 16.7 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVEYF
Aona-DRB*W1303 8.3 HNYVVFTV ECFDFETRAAYC EQFKLDRYT ELYYVEYF
Aona-DRB*W1307 8.3 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYL
Aona-DRB*W1310 8.3 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYW
Aona-DRB*W1312 8.3 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYFVTYF
Aoni-DRB*W1301 33.3 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYF
Aoni-DRB*W1306 22.2 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYW
Aoni-DRB*W1302 11.1 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDHF
Aoni-DRB*W1305 11.1 HNYVAFTV ECFDFETRAAYC EQFKPDRYT EPYYVEYF
Aoni-DRB*W1307 11.1 HNYVGFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYF
Aoni-DRB*W1308 11.1 HDYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYF
Aovo-DRB*W130101 50.0 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYFVTYF
Aovo-DRB*W1302 25.0 HNYVVFTV ECFDFETRAAFC EQFKPDRYT EPYFVTYF
Aovo-DRB*W1304 25.0 HNYVVFTV ECFDFETRAAYC EQFKPDRYT EPYYVDYF

Ao-DRB*W18
Aona-DRB*W1802 42.9 HNYVFFTV ECFEFLKRGQYC ELVKSERYK ESYLVDYW
Aona-DRB*W1801 28.6 HNYVGFTV ECFEFLKRGQYC EQVKSERYK ESYFVDYW
Aona-DRB*W1803 14.3 HNYVVFTV ECFEFLKRGQYC EQVKSERYK ESYFVDYW
Aona-DRB*W1804 14.3 HNYVFFTV ECFEFLKRGQYC ELVKSERYK ESYLADYW
Aoni-DRB1*W1801 100.0 HNYVGFTV ECFEFLKRGQYC EQVKSERYK ESYFVDYW
Aotr-DRB*W1801 100.0 HNYVFFTV ECFEFLKRGQYC EQAKSERYK ESYYVDYW
Aovo-DRB*W1801 66.7 HNYVFFTV ECFEFLKRGQYC EQAKSERYK ESYYVDYW
Aovo-DRB*W1803 33.3 HNYVFFTV ECFEFLKRGQYC EQGKSERYK ESYYVDYW

Ao-DRB*W29
Aona-DRB*W2901 50.0 HNYVFFTV ECLQFYLRAAYC EQTKSQRYL ESYYADYW
Aona-DRB*W2906 25.0 HNYVVFTV ECLQFYLRAAYC EQTKSQRYL ESYYADYW
Aona-DRB*W2907 12.5 HNYVGFAV ECLQFYLRAAYC EQTKSQRYL ESYYADYW
Aona-DRB*W2908 12.5 HNYVGFTV ECLQFYLRAAYC EQTKSQRYL ESYYADYW
Aoni-DRB*W2902 80.0 HNYVVFTV ECLQFYLRAACC EQTKSQRYL ESYYVDYW
Aoni-DRB*W2901 20.0 HNYVGFTV ECLQFYLRAAYC EQTKSQRYL ESYYADYW
Aovo-DRB*W2901 100.0 HNYVVFTV ECLQFYLRAACC EQTKSQRYL ESYYVDYW

Ao-DRB*W30
Aona-DRB*W3002 50.0 HNYVGFTV ECYEFDRRAAYC EQVKYERYR EYYFVSKL
Aona-DRB*W3001 50.0 HNYVGFTV ECYEFDRRASYC EQVKYERLR EYLFVSKL
Aovo-DRB*W3001 100.0 HNYVGFTV ECYEFDRRASYC EQVKYERLR EYLFVVKL

Ao-DRB*W42
Aona-DRB*W4201 100.0 HNYVVFTV ECFEFYLRAAYC EQVKDERYL EDYYVDYW
Aoni-DRB*W4201 100.0 HNYVVFTV ECFEFYLRAAYC EQVKDERYL EDYYVDYW

Ao-DRB*W43
Aoni-DRB*W4301 25.0 HNYVGFTV ECLDFNRRAAYC KQVKCDRHR KCHYVTYW
Aoni-DRB*W4302 25.0 HNYVGFTV ECLDSNRRAAYC KQVKCDRHR KCHYVTYW
Aoni-DRB*W4303 25.0 HNYVGLTV ECLDFNRRAAYC KQVKCDRHR KCHYVTYW
Aoni-DRB*W4304 25.0 HNYVVFTV ECLDFNRRAAYR KQVKCDRHR KCHYVTYW
Aovo-DRB*W4301 100.0 HNYVGFTV ECLDFNRRAAYC KQVKCDRHR KCHYVTYW

Ao-DRB*W44
Aona-DRB*W4401 100.0 HNYVGFTV ECYDFDRRAAYC EQAKSDRYR ESYYVTYW
Aoni-DRB*W4401 100.0 HKYVGFTV ECYDFDRRAAYC EQAKSDRYR ESYYVTYW

Ao-DRB*W45
Aona-DRB*W4501 100.0 HNYVVFTV ECFDFDRRAQYC EQVKHDRYR EHYVVDYW
Aovo-DRB*W4501 100.0 HNYVVFTV ECFDFDKRASYC EQVKHDRYK EHYVVDYW

Ao-DRB*W47GA
Aona-DRB*W470401GA 53.8 HNYVGFTV ECFDFDRRAQYC EQVKHDRYR EHYYVDYW
Aona-DRB*W4701GA 7.7 HNYVVFTV ECFDFYRRAQYC EQVKHDRYR EHYYVDYW
Aona-DRB*W4702GA 7.7 HNYVVFTV ECFDFDRRPQYC EQVKHDRYR EHYYVDYW
Aona-DRB*W4703GA 7.7 HNYVVFTV ECFDFDRRAQYC EQVKHDRYR EHYYVDYW
Aona-DRB*W4705GA 7.7 HNYVGFTV ECFDFDRRAQYC EQVKHDRYR EHYYVDHW
Aona-DRB*W4708GA 7.7 HNYVGFTV ECFDFDRRAQYC KQVKHDRYR KHYYVDYW
Aona-DRB*W4709GA 7.7 HNYVGFTV ECFDFDRRAQYC EQVKDDRYR EDYYVDYW
Aovo-DRB*W4701GA 100.0 HNYVGFTV ECFDFDRRAQYC EQVKHDRYR EHYYVDYW

Ao-DRB*W47GB
Aoni-DRB*W4701GB 100.0 HNYVGFTV ECFDFDRRAQYC EQVKHDRYR EHYYADYW
Aovo-DRB*W4702GB 100.0 HNYVGFTE ECFDFDRRAQYC EQVKHDRYR EHYYADYW

Ao-DRB*W88
Aovo-DRB*W8801 100.0 HNYVAFTV ECLQFYLRAAYC EQVKDQRYL EDYYVDYW

Ao-DRB*W89
Aona-DRB*W8901 100.0 HNYVAFTV ECYDFQKRGRYC EQTKSDRYK ESYYVTYW

Ao-DRB*W90
Aovo-DRB*W9001 100.0 HNYVGFTV ECLQFYLRAAYC EQGKSQRYL ESYVLSKL

Ao-DRB*W91
Aona-DRB*W9101 100.0 HNYVGFTV ECFEFTRRAAFC EQAKCERYR ECYVLESW
Aovo-DRB*W9102 50.0 HNYVFFTV ECFEFTRRAAFC EQAKGERYR EGYVLSKY
Aovo-DRB*W9101 50.0 HNYVGFTV ECFEFTRRAAFC EQAKCERYR ECYVLEKY

Ao-DRB*W92
Aovo-DRB*W9202 50.0 HNYVGFTV ECYDFDRRASYC FQTTSDRYR FSYFVVKL
Aovo-DRB*W9201 50.0 HNYVGFTV ECYDFDRRASYC FQTTSDRYR FSYVVVKL

Ao-DRB*W93
Aovo-DRB*W9301 100.0 HNYVVFTV ECFEFDRRAAYC ELIKFERQR EFQYLDSW

Ao-DRB1*03GA
Aona-DRB1*0305GA 42.9 HNYVGFTV ECYDFQKRGRYC FQTTSDRYK FSYYVDYW
Aona-DRB1*0307GA 14.3 HNYVGFTV ECYDFRKRGRYC FQTTSDRYK FSYYVDYW
Aona-DRB1*0303GA 7.1 HNYVFFTV ECYDFQKRGQYC FQTTSDRYK FSYYVDYW
Aona-DRB1*0304GA 7.1 HNYVGFTV ECFDFQKRGRYC FQTTSDRYK FSYYVDYW
Aona-DRB1*0309GA 7.1 HNYVGFTV ECFDFRKRGQYC FQTTSDRYK FSYYVDYW
Aona-DRB1*0311GA 7.1 HNYVVFTV ECFDFQKRGRYC FQTTSDRYK FSYYVDYW
Aona-DRB1*0312GA 7.1 HNYVVFTV ECYDFQKRGRYC FQTTSDRYK FSYYVDYW
Aona-DRB1*0319GA 7.1 HNYVGFTV ECHDFQKRGRYC FQTTSDRYK FSYYVDYW
Aoni-DRB1*0303GA 33.3 HNYVGFTV ECYDFRKRGRYC FQTTSDRYK FSYYVDYW
Aoni-DRB1*0304GA 33.3 HNYVGFTV ECYDFQKRGRYC FQTTSDRYK FSYYVDYW
Aoni-DRB1*0301GA 16.7 HNYVGFTV ECYDFRKRGQYC FQTTSDRYK FSYYVDYW
Aoni-DRB1*0307GA 16.7 HNYVGFTV ECHDFQKRGRYC FQTTSDRYK FSYYVDYW
Aotr-DRB1*0303GA 33.3 HNYVGFTV ECYDFQKRARYC FQTTSDRYK FSYYVDYW
Aotr-DRB1*0301GA 33.3 HNYVGFTV ECYDFQKRGRYC FQTTSDRYK FSYYVDYW
Aotr-DRB1*0302GA 33.3 HNYVGFTV ECYDFRKRGQYC FQTTSDRYK FSYYVDYW
Aovo-DRB1*0302GA 28.6 HNYVGFTV ECYDFQKRGRYC FQTTSDRYK FSYYVDYW
Aovo-DRB1*0305GA 28.6 HNYVVFTV ECYDFQKRGRYC FQTTSDRYK FSYFVDYW
Aovo-DRB1*0301GA 14.3 HNYVGFTV ECYDFRKRGQYC FQTTSDRYK FSYFVDYW
Aovo-DRB1*0303GA 14.3 HNYVGFTV ECYHFQKRGRYC FQTTSHRYK FSYYVDYW
Aovo-DRB1*0306GA 14.3 HNYVGFTV ECYDFQKRGRYC FQTTSDRYK FSYFVDYW

Ao-DRB1*03GB
Aona-DRB1*0302GB 72.7 YNYVAFTV ECFDFERRALYC FQTTSDRYR FSYYVSYW
Aona-DRB1*0301GB 18.2 YNYVAFTV ECYDFERRALYC FQTTSDRYR FSYYVSYW
Aona-DRB1*0326GB 9.1 YNYVAFTV ECFDFERRALYC FQTTYDRYR FYYYVSYW

Ao-DRB1*03GC
Aona-DRB1*0313GC 50.0 HNYVVFTV ECFDFETRAAYC FQTTSDRYT FSYYVEYF
Aona-DRB1*0314GC 50.0 HNYVVFTV ECYDFETRAAYC FQTTSDRYT FSYYVEYF
Aoni-DRB1*0305GC 50.0 HNYVVFTV ECYDFETRAAYC FQTTSDRYT FSYYVDYF

Ao-DRB3*06
Aoaz-DRB3*0601 100.0 HNYVFFTV ECYDFQKRGQYC ELVKHDRYK EHYYVDYW
Aona-DRB3*0603 36.8 HNYVVFTV ECYDFQKRGRYC ELVKHDRYK EHYYVDYW
Aona-DRB3*0601 15.8 HNYVFFTV ECYDFQKRGQYC ELVKHDRYK EHYVVDYW
Aona-DRB3*0613 15.8 HNYVVFTV ECYDFQKRGRYC ELVKHDRYK EHYVVDYW
Aona-DRB3*0602 10.5 HNYVFFTV ECYDFQKRGQYC ELVKHDRYK EHYYVDYW
Aona-DRB3*0604 5.3 HNYVGFTV ECYDFQKRGRYC ELVKHDRYK EHYYVDYW
Aona-DRB3*0607 5.3 HNYVFFTV ECYDFQKRGQYC ELVKHDRYK EHYSVDYW
Aona-DRB3*0618 5.3 HNYVFFTV ECYYFQKRGQYC ELVKHYRYK EHYYVDYW
Aona-DRB3*0624 5.3 YNYVVFTV ECYDFQKRGRYC ELVKHDRYK EHYYVDYW
Aoni-DRB3*0601 100.0 HNYVVFTV ECYDFQKRGQYC ELVKHDRYK EHYYVDYW
Aotr-DRB3*06L 100.0 HNYVVFTV ECYDFQKRGRYC ELVKHDRYK EHYVVDYW
Aovo-DRB3*0601 100.0 HNYVVFTV ECYDFQKRGRYC ELVKHDRYK EHYYVDYW

Annex 2. TCR-contacting residues orientation and HLA-DRβ* binding preference determine long-lasting protective immunity against malaria

Alba MP, Suarez CF, Varela Y, Patarroyo MA, Bermudez A, Patarroyo ME. TCR-contacting residues orientation and HLA-DRβ* binding preference determine long-lasting protective immunity against malaria. Biochem Biophys Res Commun. 2016;477(4):654-60.

The published version of the article is available at: http://www.sciencedirect.com/science/article/pii/S0006291X16310336

TCR-contacting residues orientation and HLA-DRβ* binding preference determine long-lasting protective immunity against malaria

Martha P. Alba a,b,c, Carlos F. Suarez a,b,c, Yahson Varela a, Manuel A. Patarroyo a,b, Adriana Bermudez a,b, Manuel E. Patarroyo a,d,*
a Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá D.C., Colombia
b Universidad del Rosario, Bogotá D.C., Colombia
c Universidad de Ciencias Aplicadas y Ambientales (UDCA), Bogotá, Colombia
d Universidad Nacional de Colombia, Bogotá D.C., Colombia
* Corresponding author. E-mail: mepatarr@gmail.com

Abstract
Fully-protective, long-lasting immunological (FPLLI) memory against Plasmodium falciparum malaria, elicited by immune protection-inducing protein structures (IMPIPS) vaccinated into monkeys that were challenged and then re-challenged 60 days later with a lethal Aotus monkey-adapted P. falciparum strain, was found to be associated with preferentially high binding capacity to HLA-DRβ1* allelic molecules of the major histocompatibility complex class II (MHC-II), rather than to HLA-DRβ3*, 4* or 5* alleles.
A complete PPIIL 3D structure, a longer distance (26.5 Å ± 1.5 Å) between the residues fitting perfectly into HLA-DR1* PBR pockets 1 and 9, a gauche- rotamer orientation of the polar TCR-contacting residue at p8 and a larger volume of polar p2 residues were also found. Together with the previously described gauche+ orientation of the apolar p3 and p7 residues, which is needed to form a perfect MHC-II-peptide-TCR complex, these data define the stereo-electronic and topochemical characteristics associated with FPLLI immunological memory.

Keywords: Antimalarial vaccine, T-cell receptor, MHC-II, Immunological memory, Rotamer orientation.

Introduction
One of the main problems in vaccine development is the induction of FPLLI memory. Microbes (viruses, bacteria, parasites, etc.) have developed an enormous number of mechanisms for escaping immune pressure, such as antigenic diversity, where a single amino acid (aa) mutation or replacement can completely avert previously developed immunity, as occurs with the Plasmodium falciparum malaria proteins apical membrane antigen-1 (AMA-1) [1,2] and merozoite surface protein-1 (MSP-1) [3], to name a few. Microbes can also induce suppression, blocking [4], impeding [5,6] and many other escape mechanisms [7] that render new or previously acquired immunity useless. In continents such as Africa, inducing FPLLI poses a tantalizing and insurmountable problem, as one person can receive as many as eighteen infectious P. falciparum mosquito bites per day during the high-transmission season. The vaccine candidate RTS,S/AS01 provides a clear example [8]: the suggested protective immunity (protection being defined as fewer than 5000 parasites per microliter of blood) was short-lived (less than 6 months) [8] and was observed in only 27% of the vaccinated population after the fourth, booster immunization given 6 months later [9]. The WHO therefore did not recommend its use in infants [9].
For more than three decades, we have pursued the idea that fully-protective immunity (zero parasites in the blood, or spontaneous, rapid and permanent recovery after a very low parasitaemia of less than 0.1%) can be induced with chemically-synthesized vaccines. This idea rests on the concept that functionally relevant conserved high activity binding peptides (cHABP) must be identified in the corresponding protein [10] and then properly modified (mHABP) to render them highly immunogenic and protection-inducing [11]. Such minimal subunit-based mHABPs must fulfil a set of previously described physicochemical and topochemical rules to fit perfectly into the MHCII-peptide-TCR complex [12]. That goal was achieved when a large number of highly immunogenic protection-inducing peptide structures (IMPIPS) [13] fulfilled those requirements when used as individual epitopes in primary challenges. These merozoite-derived IMPIPS, which had induced clear FPLLI against experimental challenge with the highly infectious Aotus monkey-adapted P. falciparum FVO strain, were therefore used to address the immunological memory problem. Protected monkeys and some non-protected ones kept in captivity after challenge (all of them having received anti-malarial treatment to clear any residual parasites) were re-challenged 60 days later, after all traces of anti-malarial drugs had disappeared, to determine whether FPLLI had developed. Likewise, sera from Aotus monkeys immunized with Spz-derived IMPIPS and kept in captivity for up to 900 days (~2½ years) after the first immunization were analysed for very high long-lasting antibody (VHLLA) titres against P. falciparum Spz, by immunofluorescence assay (IFA), and against the corresponding recombinant proteins, by western blot (WB), to determine antibody titre duration [14].
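The outcome categories defined above (no parasites at all, or a very low parasitaemia below 0.1% that clears spontaneously and does not return) can be expressed as a small decision rule. The sketch below is only an illustration under stated assumptions: the function name, the label strings and the representation of a monkey's follow-up as a list of daily percentages of parasitized RBC are hypothetical, and the ~5-day duration mentioned later in the text is not enforced.

```python
from typing import Sequence

# Upper bound (in % parasitized RBC) for the "very low parasitaemia" quoted in the text.
VERY_LOW_PARASITAEMIA = 0.1


def classify_outcome(daily_parasitaemia: Sequence[float]) -> str:
    """Hypothetical helper: label a challenge outcome from daily % parasitized RBC.

    "sterile"       -> no parasites detected at any reading
    "protected"     -> peak below 0.1% and the final reading is parasite-free
    "not protected" -> anything else (higher peak, or parasites still present)
    """
    if not daily_parasitaemia:
        raise ValueError("at least one parasitaemia reading is required")
    if any(value < 0 for value in daily_parasitaemia):
        raise ValueError("parasitaemia cannot be negative")

    peak = max(daily_parasitaemia)
    if peak == 0:
        return "sterile"
    # "Permanent" recovery is approximated by requiring the series to end at zero.
    if peak < VERY_LOW_PARASITAEMIA and daily_parasitaemia[-1] == 0:
        return "protected"
    return "not protected"
```

For example, a series such as [0, 0.05, 0.02, 0, 0] is labelled "protected", whereas one that climbs to 6.5% is labelled "not protected".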
Materials and methods
IMPIPS
mHABPs were synthesized according to Merrifield's peptide synthesis methodology, as modified by Houghten and thoroughly described elsewhere [10]; a 600 MHz spectrometer was used to determine the 1H NMR 3D structure of a large panel of mHABPs [11].

Monkeys
Wild-caught Aotus monkeys from the Amazon jungle were used in trials authorized by the Colombian environmental authorities (CORPOAMAZONIA, permission numbers 0632 and 0042/2010). They were kept at our field station in Leticia (capital of the Amazonas department) and looked after by expert veterinarians and workers, who were supervised weekly by expert biologists and veterinarians from the local environmental authorities and the ethics committee. After the study was completed, the monkeys were treated with paediatric doses of quinine, kept in quarantine for 20 more days and released back into the jungle close to their capture site, accompanied by environmental authority officials. The monkeys participating in this trial were kept according to the methods described above.

Immunization
After arriving at our field station, monkeys were deparasitized, kept in quarantine for twenty days and fed a hypercaloric, hyperproteic diet before experiments commenced. Each monkey received 150 mg polymerized IMPIPS subcutaneously in complete Freund's adjuvant on day zero; a second dose of the same IMPIPS in incomplete Freund's adjuvant was administered 20 days later. The monkeys were challenged 20 days after that.

First challenge
This involved intravenously inoculating 100,000 erythrocytes infected with the highly virulent Aotus-adapted P. falciparum FVO strain, freshly obtained from another infected Aotus monkey [11]; intravenous challenge with a 100% infectious, virulent P. falciparum strain is the most stringent vaccine-testing methodology.

Assessing infection
Parasitaemia was determined by fluorescence microscopy using Acridine Orange staining; the percentage of parasitized RBC in the blood was counted, starting on day five.
Protected monkeys had no parasites in their blood during their first (primary) challenge, whereas non-protected ones started showing parasites by day five and reached ≥6% parasitaemia on days eight to ten; these were immediately treated with paediatric doses of chloroquine. All protected and non-protected monkeys were treated after the experiment ended (day 20 after challenge) and kept in quarantine.

Determining antibodies
IFA titres were determined as previously described [11], blood samples being taken one day before the first immunization (i.e. pre-immune, PI), ten days after the second dose (II10) and 20 days after the second dose (II20), the day before challenge. Total schizont lysate or recombinant proteins containing the aa sequence from which the IMPIPS were derived were used for WB.

Re-challenge
No further immunizations were given after the second dose at the beginning of the experiment. All protected and some non-protected monkeys were kept in quarantine for a further 60 days and then re-challenged with 100,000 iRBC freshly obtained from another previously infected monkey; parasitaemia was assessed as before. Two trials (A and B) were performed, each using a different group of IMPIPS for immunization before the first challenge.

HLA-DR* binding IMPIPS
The NetMHCIIpan-3.0 algorithm, a predictor of peptide binding to MHC-II molecules with >95% specificity and 90% sensitivity that correctly predicts (≥90%) HLA-DR peptide binding cores previously determined by X-ray crystallography, was used. This in silico method identifies peptides with very high theoretical binding to specific HLA-DR1* alleles and to alternative β-chain isotypes such as HLA-DR3*, 4* and 5* alleles, measured as the peptide's half-maximal inhibitory concentration (IC50 ≤ 100 nM), based on the Immune Epitope Database [15].
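The IC50 ≤ 100 nM cut-off described above amounts to a simple filter over predicted affinities. The sketch below assumes the predictions have already been parsed into records (it does not read NetMHCIIpan output files); the record type, the function names, the peptide identifiers and the affinity values in the usage example are hypothetical, and the "LOCUS_ALLELE" allele strings (e.g. "DRB1_0101") are an assumed naming style.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

HIGH_BINDER_IC50_NM = 100.0  # cut-off quoted in the text


@dataclass(frozen=True)
class Prediction:
    peptide: str    # peptide identifier (illustrative)
    allele: str     # assumed "LOCUS_ALLELE" naming, e.g. "DRB1_0101"
    ic50_nm: float  # predicted IC50 in nM


def high_binders(predictions: Iterable[Prediction],
                 cutoff_nm: float = HIGH_BINDER_IC50_NM) -> Dict[str, List[str]]:
    """Map each peptide to the alleles it is predicted to bind with IC50 <= cutoff."""
    binders: Dict[str, List[str]] = {}
    for p in predictions:
        if p.ic50_nm <= cutoff_nm:
            binders.setdefault(p.peptide, []).append(p.allele)
    return binders


def locus(allele: str) -> str:
    """Return the locus part of an allele name, e.g. "DRB5_0101" -> "DRB5"."""
    return allele.split("_", 1)[0]


def binds_only_drb1(alleles: List[str]) -> bool:
    """True if the peptide has predicted high binders and all of them are DRB1 alleles."""
    return bool(alleles) and all(locus(a) == "DRB1" for a in alleles)


if __name__ == "__main__":
    demo = [  # invented values, for illustration only
        Prediction("pepA", "DRB1_0101", 35.0),
        Prediction("pepA", "DRB3_0101", 850.0),
        Prediction("pepB", "DRB1_0401", 60.0),
        Prediction("pepB", "DRB5_0101", 20.0),
    ]
    for peptide, alleles in high_binders(demo).items():
        print(peptide, alleles, "DRB1 only" if binds_only_drb1(alleles) else "not DRB1-only")
```

Separating the threshold filter from the locus check keeps the 100 nM cut-off adjustable while making the DRβ1* versus DRβ3*/β4*/β5* comparison explicit.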
Determining 3D structure
1H NMR 3D structures of RP-HPLC-purified IMPIPS were determined on a 600 MHz spectrometer; their sequential connectivities and dihedral angles have already been described [13,14,16-19]. Only the χ1 angles (in degrees) of residues considered to be TCR-contacting (positions p2, p5 and p8) with respect to their binding in the HLA-DR peptide binding region (PBR) are described here, based on their predicted binding to HLA-DR molecules. The rotameric orientation and relevant immunological functions of the other very relevant TCR-contacting residues (p3, p7) have already been described [14].

Results and discussion
Reminder: all participating monkeys were immunized only twice with a single IMPIPS; immune protection was therefore elicited by just two doses of individual IMPIPS.

Antibodies
Remarkably, Group I (protected) and Group II (non-protected) antibody (Ab) patterns, titres and reactivities (assessed by IFA and WB) were extremely similar before the first challenge (Table 1), as can be seen when comparing the cHABP 4044-derived MSP-2 analogue mHABPs 24112 (protected) and 22774 (non-protected) (Table 1) by IFA and WB (Figure 1B, Aotus 16087 and 12877, respectively). Similarly, the Ab reactivity of SERA-5 6725-derived 22830 (protected) and 6746-derived 24216 (non-protected) was very similar by IFA and by WB analysis (not shown). It is thus extremely difficult to distinguish between permanent long-lasting protective epitopes and short-lived protective ones based on actual Ab reactivity; this thoroughly described phenomenon shows the exquisite reactivity of the immune response regarding FPLLI induction.
Furthermore, IFA, ELISA or WB serological analyses involving recombinant fragments before high malaria-transmission seasons have shown that the bulk of the immune response is directed against highly polymorphic, hypervariable regions of the molecule. The same occurs when immunizing humans or experimental animals with X-ray-attenuated whole Spz, recombinant proteins, DNA vector-based fragments, etc., showing that polymorphism is a very common mechanism used by microbes to escape immune pressure. This (immunological) approach to epitope selection has been exhaustively shown to be inappropriate in countless human vaccine trials [20] because it skews the immune response towards highly polymorphic hypervariable regions.

Immunogenetic analysis
Genetic restriction to a particular HLA-DR1* allele represents an alternative explanation for such a long-lasting protective response, but it is extremely difficult to ascertain owing to the tremendous polymorphism of this region. The NetMHCIIpan-3.0 algorithm revealed no preference for any HLA-DRβ1* allele, since the same alleles were present in both groups (I and II), but it showed a skewing towards binding to the alternative β-chain HLA-DRβ3*, β4* and β5* alleles in the non-protected group II (Table 2). This preference deserves further analysis.

Protection against re-challenge
Two trials (A and B), performed with different Mrz-derived IMPIPS to cover MHC-II genetic restriction and to address memory (FPLLI) phenomena, produced similar results (Figure 2) when the previously protected monkeys were re-challenged. Three IMPIPS induced FPLLI: 1585 MSP-1-derived 22770 (Aotus 12824), 6737 SERA-5-derived 22834 (Aotus 12984) and 4044 MSP-2-derived 24112 (Aotus 16006 and 16087); some of the monkeys immunized with them showed a complete absence of parasites in their blood throughout the trial. All these IMPIPS showed high binding capacity to HLA-DRβ1* alleles, but none bound to HLA-DRβ3*, β4* or β5* alleles.
Short-lived (~5 days), very low parasitaemia (<0.1%) that cleared spontaneously, with no parasites detected for the rest of the experiment, was seen in some previously protected monkeys participating in the re-challenge trial with other IMPIPS (cHABP 4313 AMA-1-derived 22780, cHABP 6725 SERA-5-derived 22830 and cHABP 1783 EBA-175-derived 22814). These were therefore considered protective IMPIPS, since parasitaemia was very low and rapidly cleared, a behaviour totally different from the well-known chronicity phenomenon of semi-immunity. The latter two IMPIPS (22830 and 22814) bound with high capacity to HLA-DRβ1* molecules and simultaneously to the HLA-DRβ5*0101/0102 and HLA-DRβ5*0202 alleles (Table 2).

Another striking finding that correlates with the previous observation is that all IMPIPS that failed to induce protection upon re-challenge (group II) display shorter structures (22.5 Å ± 1.5 Å) (Table 1), as determined by 1H NMR (Figure 1C; IMPIPS 22774.47 and 24216.48, for example), than all group I IMPIPS, which have distances of 26.5 Å ± 1.5 Å (Table 1) between the residues fitting into HLA-DRβ1* PBR pockets 1 and 9 (Figure 1C; IMPIPS 24112.39 and 25608.37, for example). All group I IMPIPS displayed complete left-handed polyproline type II (PPIIL) structures, whereas group II displayed a mixture of α-helical and PPIIL structures, making them ±3.0 Å shorter. Such a clear-cut difference had not been observed previously because re-challenge experiments had not been performed beforehand; our previously reported distances for IMPIPS were therefore 26.5 Å ± 3.5 Å, which included both groups (I and II). The IMPIPS used in most monkeys that were not protected in re-challenge trials displayed greater binding capacity to HLA-DRβ3*, β4* or β5* alleles (Table 2), suggesting a clear skewing of these IMPIPS towards binding these MHC-II alleles.
It might be speculated that such preferential HLA-DRβ3*, β4* or β5* binding could bias the immune response towards short-lived protective memory. Supporting this idea, we have previously shown that peptides inducing short-lived antibody responses against P. falciparum malaria have shorter structure registers between the aa fitting into the HLA-DRβ1* peptide binding region (PBR), as determined by 1H NMR spectrometry, and are read in a different MHC-II functional register [21]. X-ray crystallography has shown that HLA-DRβ3* molecules are 2.0 Å wider at Kβ71 than DRβ1*, that Wβ61 is rotated 90° and lies further from pocket 9, that 76R is notably displaced upwards, leaving pocket 9 highly hydrophobic, and that H-bonds between peptide backbone atoms and interacting DRβ3* residues are longer than 4 Å, making DRβ3*-IMPIPS interactions longer, unstable and too weak to stimulate an appropriate immune response. All of these stereochemical characteristics could probably be associated with short memory induction [22,23].

Since IMPIPS cannot be assessed by Spz challenge, owing to irreproducible results with the only Anopheles mosquito-derived P. falciparum strain (Santa Lucia) adapted to Aotus monkeys, the persistence of antibodies in the sera of monkeys immunized with Spz-derived IMPIPS was determined by IFA and by WB with recombinant fragments corresponding to the protein from which the aa sequence was derived. Monkeys kept at our field station in the Amazon jungle for 900 days after the first vaccination, and followed up for 840 days after the third dose (~2½ years), that had been immunized with CSP-1 4383-derived 25608, 4389-derived 32958 and STARP 20546-derived 24230 produced very high, long-lasting Ab titres (Figure 1B). Others, such as TRAP 3289-derived 24246 and SPECT-2 34938-derived 38890, produced high Ab titres that slowly declined over a 6-month period. These short-lived antibody-inducing mHABPs also had high binding to HLA-DRβ5*0202 allele molecules.
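The P1-P9 length discussed above (26.5 Å ± 1.5 Å in group I versus 22.5 Å ± 1.5 Å in group II) is the distance between the farthest atoms of the residues fitting into pockets 1 and 9. A minimal sketch of that measurement is shown below, assuming the atom coordinates (in Å) of the two residues have already been extracted from an NMR conformer; the function names and the way the two quoted ranges are turned into labels are hypothetical.

```python
import math
from itertools import product
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in ångström


def farthest_atom_distance(p1_atoms: Sequence[Coord], p9_atoms: Sequence[Coord]) -> float:
    """Largest Euclidean distance between any P1-residue atom and any P9-residue atom."""
    if not p1_atoms or not p9_atoms:
        raise ValueError("both residues need at least one atom")
    return max(math.dist(a, b) for a, b in product(p1_atoms, p9_atoms))


def length_class(distance: float) -> str:
    """Hypothetical labelling based on the two ranges quoted in the text."""
    if 25.0 <= distance <= 28.0:  # 26.5 ± 1.5 Å
        return "long (group I range)"
    if 21.0 <= distance <= 24.0:  # 22.5 ± 1.5 Å
        return "short (group II range)"
    return "outside both ranges"
```

With toy coordinates [(0, 0, 0), (1, 0, 0)] for P1 and [(25, 0, 0), (26, 0, 0)] for P9, the farthest pair is 26.0 Å apart, which falls in the group I range. Taking the maximum over all atom pairs mirrors the "farthest atoms" wording of the Table 1 and Figure 1C legends.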
p2 volume in long-lasting protective immunity
Besides the distance between the P1 and P9 residues and the φ and ψ angles of the PPIIL conformation, we have found volume and charge to be critical physicochemical characteristics for a proper fit into the HLA-DRβ1* PBR. Something similar occurs with upwardly oriented TCR-contacting residues, as previously shown for p3 and p7 [14]. Table 1 shows that most FPLLI- and VHLLAI-inducing IMPIPS in group I had a larger p2 residue volume than those in group II, and that positively charged residues having π electrons (H, R, K) predominated in group I. The smaller polar residues predominating in group II had alcohol groups (S, T) in their side chains acting as nucleophiles, or were acidic, negatively charged aa (E, D).

p8 residue orientation determines long-lasting protective immunity
Protein and peptide studies have thoroughly demonstrated that aa side-chain orientation has a trimodal distribution based on χ1 angle rotation relative to a protein's or peptide's frontal plane: gauche+ (trans to the carbonyl group), gauche- (trans to the H atom) or trans (trans to the amino group). Gly, Ala and Pro are exceptions, the latter being an imino acid whose χ1 angle rotation depends on the preceding residue's φ angle. According to the χ1 rotation angle, aa side chains have been divided into gauche+ (-120° to 0°), gauche- (0° to +120°) and trans (+120° to +240°). When the 600 MHz 3D structures of the IMPIPS used for immunization were determined, it was found, strikingly, that all monkeys protected upon re-challenge had been vaccinated with IMPIPS whose p8 residues had χ1 angles ranging from +8.1° to +89.9° (Table 2), i.e. a gauche- side-chain orientation at p8. By contrast, all monkeys not protected upon re-challenge had been immunized with IMPIPS having p8 rotation angles from -167.1° to -12.3°, i.e. a gauche+ orientation at p8 (Table 2).
The VHLLAI-inducing Spz-derived IMPIPS (25608, 32958 and 24320) and the FPLLI-inducing Mrz-derived 24112, which were included in mixtures [14] without blocking, interfering with or suppressing each other's activity, also had a gauche- side-chain orientation at p8. In the aa sequences of the IMPIPS used to immunise monkeys protected upon re-challenge, p8 was occupied by polar residues (S, T, E, D), as it was in the Spz-derived VHLLAI IMPIPS (N, N, P), except for AMA-1-derived 22780 and STARP-derived 24320, both having the imino acid Pro (which could be puckered up or down), and 22770, having Val at this position (Table 1, Group I). Strikingly, all monkeys not protected upon re-challenge had been immunised with IMPIPS having apolar residues at p8 (M, I, M, L, N, G, A, F), except for 38890 (E) (Table 1); this group includes the Spz-derived IMPIPS 24246 and 38890, which induced short-lived Ab titres.

Our previous data on the reported 3D structures of IMPIPS have shown the critical role of the χ1 angle at p3 and p7, a gauche+ orientation being associated with the ability to be mixed to induce FPLLI and VHLLAI without interfering with, blocking, suppressing, abolishing or poisoning each other's immunological activity, in the process of developing a complete multi-epitope, multi-stage, minimal subunit-based, chemically-synthesized anti-malarial vaccine. Conversely, the IMPIPS whose immune response was completely abolished when mixed with other IMPIPS, e.g. 24148 and 24246, are the same IMPIPS that could induce neither re-challenge protection nor VHLLAI memory; these IMPIPS also displayed a gauche+ orientation at p8 (Table 2, Group II), suggesting some stereochemical interference in memory induction and in mixture composition.
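The χ1 windows quoted above (gauche+ from -120° to 0°, gauche- from 0° to +120°, trans from +120° to +240°) can be applied mechanically once χ1 has been measured. The sketch below wraps any angle into the [-120°, +240°) window and assigns the label; the function names are hypothetical, and assigning the exact boundary values (0° and 120°) to the higher window is an assumption, since the text does not say how boundaries are handled.

```python
def wrap_chi1(angle_deg: float) -> float:
    """Map any angle in degrees onto the window [-120, 240)."""
    return (angle_deg + 120.0) % 360.0 - 120.0


def rotamer(chi1_deg: float) -> str:
    """Classify a side-chain chi1 angle using the windows quoted in the text."""
    a = wrap_chi1(chi1_deg)
    if a < 0.0:
        return "gauche+"  # -120 <= a < 0
    if a < 120.0:
        return "gauche-"  # 0 <= a < 120
    return "trans"        # 120 <= a < 240
```

With these windows, rotamer(89.9) and rotamer(8.1) return "gauche-" and rotamer(-12.3) returns "gauche+". Applied literally, the same windows place -167.1° (equivalent to +192.9°) in the trans window, so the boundaries actually used to label the p8 residues in Table 2 may differ from the ones coded here.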
In diseases caused by intracellular pathogens, the development of polyfunctional, rapidly proliferating T-cells with low apoptosis seems to be the key issue [24] for clearing infection and developing robust T-cell memory [25]. Many hypotheses have been put forward to explain the absence of memory induction, e.g. T-cell exhaustion after infection [26], leading to the loss of parasite-specific memory T-cells that protect against re-infection [27] (as in this manuscript); up-regulation of FOXP3-expressing CD4+ CD25+ regulatory T-cells, associated with more rapid parasite growth during infection [28]; or elevated numbers of highly suppressive regulatory T-cells in severe malaria [29]. Alternative explanations include the induction of programmed cell death-1 (PD-1) molecules on activated CD4+ or CD8+ T-cells that, in conjunction with LAG-3+ T-cells, modulate immunity against malaria [30]. It has recently been demonstrated that deleting the PD-1 gene in mice (PD-1KO) generates sterile protective immunity, unlike wild-type mice infected with Plasmodium chabaudi, which maintained ~1% parasitaemia [31], equivalent to human chronic subclinical malaria. There are many other hypotheses regarding the lack of protective immunological memory, but this 3D structural analysis of 20 IMPIPS clearly suggests that the χ1 angle rotation and orientation of the p8 residue is associated with, or determines, long-lasting protective memory. We therefore suggest that, in a complete, fully-protective, minimal subunit-based, chemically-synthesized vaccine able to induce very long-lasting protective immunological memory, besides the previously described physicochemical principles for a perfect fit into the HLA-DR1* PBR, the TCR-contacting residues p3 (apolar) and p7 (also apolar) should have a gauche+ rotamer orientation [14], p8 (polar) should have a gauche- orientation and p2 should have the polar characteristics shown here.
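The combined design rules suggested above (polar p2; apolar p3 and p7 with gauche+ orientation; polar p8 with gauche- orientation) can be checked for a candidate binding core. In the sketch below the 9-residue core is assumed to be aligned to the HLA-DR register (position 1 = p1 ... position 9 = p9), the rotamer labels are assumed to be supplied already (for example, derived from measured χ1 angles), and the polar/apolar split is a common textbook partition chosen as an assumption; it does not reproduce every residue label used in the text. The function name and the example core are hypothetical.

```python
from typing import Dict

# Assumed residue partition (not taken from the source text).
POLAR = set("STNQCYDEKRH")
APOLAR = set("GAVLIMFWP")


def check_design_rules(core: str, rotamers: Dict[int, str]) -> Dict[str, bool]:
    """Report which of the suggested p2/p3/p7/p8 rules a 9-residue core satisfies.

    core     -- binding core; character 1..9 corresponds to register position p1..p9
    rotamers -- register position -> "gauche+", "gauche-" or "trans"
    """
    core = core.upper()
    if len(core) != 9:
        raise ValueError("a 9-residue binding core is expected")
    residue = {pos: aa for pos, aa in enumerate(core, start=1)}
    return {
        "p2 polar": residue[2] in POLAR,
        "p3 apolar, gauche+": residue[3] in APOLAR and rotamers.get(3) == "gauche+",
        "p7 apolar, gauche+": residue[7] in APOLAR and rotamers.get(7) == "gauche+",
        "p8 polar, gauche-": residue[8] in POLAR and rotamers.get(8) == "gauche-",
    }
```

For the made-up core "YKLATLVSA" with {3: "gauche+", 7: "gauche+", 8: "gauche-"}, every rule is reported as satisfied; switching p8 to "gauche+" makes only the p8 rule fail.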
These findings allow us to propose that such stereochemical and topological rules mediate FPLLI memory. This is the first time that protective memory induction has been shown, at the 3D structural level, to be associated with the specific electronic characteristics and rotamer orientation of a particular TCR-contacting residue (p8), and to be negatively associated with binding capacity to HLA-DR3*, 4* or 5* allelic molecules, paving the way for a logical, rational methodology for inducing long-lasting protective immunity.

Conflict of interest
The authors declare that they have no financial or commercial conflicts of interest.

Acknowledgments
This research was supported by "The Colombian Science, Technology and Innovation Department (Colciencias)", Contract RC#0309-2013. We would like to thank Mr. Jason Garry for his collaboration in translating this manuscript.

References
[1] S. Dutta, S.Y. Lee, A.H. Batchelor, D.E. Lanar, Structural basis of antigenic escape of a malaria vaccine candidate, Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 12488-12493.
[2] D.P. Eisen, A. Saul, D.J. Fryauff, J.C. Reeder, R.L. Coppel, Alterations in Plasmodium falciparum genotypes during sequential infections suggest the presence of strain specific immunity, Am. J. Trop. Med. Hyg. 67 (2002) 8-16.
[3] W.D. Morgan, M.J. Lock, T.A. Frenkiel, M. Grainger, A.A. Holder, Malaria parasite-inhibitory antibody epitopes on Plasmodium falciparum merozoite surface protein-1(19) mapped by TROSY NMR, Mol. Biochem. Parasitol. 138 (2004) 29-36.
[4] W.D. Morgan, T.A. Frenkiel, M.J. Lock, M. Grainger, A.A. Holder, Precise epitope mapping of malaria parasite inhibitory antibodies by TROSY NMR cross-saturation, Biochemistry 44 (2005) 518-523.
[5] C.Q. Schmidt, A.T. Kennedy, W.H. Tham, More than just immune evasion: hijacking complement by Plasmodium falciparum, Mol. Immunol. 67 (2015) 71-84.
[6] J.Y.A. Doritchamou, VAR2CSA domain-specific analysis of naturally acquired functional antibodies to P. falciparum placental malaria, J. Infect. Dis.
(2016).
[7] F. Farooq, E.S. Bergmann-Leitner, Immune escape mechanisms are Plasmodium's secret weapons foiling the success of potent and persistently efficacious malaria vaccines, Clin. Immunol. 161 (2015) 136-143.
[8] Efficacy and safety of RTS,S/AS01 malaria vaccine with or without a booster dose in infants and children in Africa: final results of a phase 3, individually randomised, controlled trial, Lancet 386 (2015) 31-45.
[9] World Health Organization (WHO), Malaria vaccine, Wkly. Epidemiol. Rec. 91 (2016) 33-52.
[10] L.E. Rodriguez, H. Curtidor, M. Urquiza, G. Cifuentes, C. Reyes, M.E. Patarroyo, Intimate molecular interactions of P. falciparum merozoite proteins involved in invasion of red blood cells and their implications for vaccine design, Chem. Rev. 108 (2008) 3656-3705.
[11] M.E. Patarroyo, A. Bermudez, M.A. Patarroyo, Structural and immunological principles leading to chemically synthesized, multiantigenic, multistage, minimal subunit-based vaccine development, Chem. Rev. 111 (2011) 3459-3507.
[12] M.A. Patarroyo, A. Bermudez, C. Lopez, G. Yepes, M.E. Patarroyo, 3D analysis of the TCR/pMHCII complex formation in monkeys vaccinated with the first peptide inducing sterilizing immunity against human malaria, PLoS One 5 (2010) e9771.
[13] M.E. Patarroyo, A. Bermudez, M.P. Alba, M. Vanegas, A. Moreno-Vranich, L.A. Poloche, M.A. Patarroyo, IMPIPS: the immune protection-inducing protein structure concept in the search for steric-electron and topochemical principles for complete fully-protective chemically synthesised vaccine development, PLoS One 10 (2015) e0123249.
[14] A. Bermudez, D. Calderon, A. Moreno-Vranich, H. Almonacid, M.A. Patarroyo, A. Poloche, M.E. Patarroyo, Gauche(+) side-chain orientation as a key factor in the search for an immunogenic peptide mixture leading to a complete fully protective vaccine, Vaccine 32 (2014) 2117-2126.
[15] M. Andreatta, E. Karosiene, M. Rasmussen, A. Stryhn, S. Buus, M.
Nielsen, Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification, Immunogenetics 67 (2015) 641-650.
[16] M.E. Patarroyo, A. Moreno-Vranich, A. Bermudez, Phi (Φ) and psi (Ψ) angles involved in malarial peptide bonds determine sterile protective immunity, Biochem. Biophys. Res. Commun. 429 (2012) 75-80.
[17] M.E. Patarroyo, A. Bermudez, M.P. Alba, The high immunogenicity induced by modified sporozoites' malarial peptides depends on their phi (φ) and psi (ψ) angles, Biochem. Biophys. Res. Commun. 429 (2012) 81-86.
[18] M.E. Patarroyo, M.A. Patarroyo, L. Pabon, H. Curtidor, L.A. Poloche, Immune protection-inducing protein structures (IMPIPS) against malaria: the weapons needed for beating Odysseus, Vaccine 33 (2015) 7525-7537.
[19] M.E. Patarroyo, G. Arevalo-Pinzon, C. Reyes, A. Moreno-Vranich, M.A. Patarroyo, Malaria parasite survival depends on conserved binding peptides' critical biological functions, Curr. Issues Mol. Biol. 18 (2015) 57-78.
[20] S. Li, M. Plebanski, P. Smooker, E.J. Gowans, Editorial: why vaccines to HIV, HCV, and malaria have so far failed: challenges to developing vaccines against immunoregulating pathogens, Front. Microbiol. 6 (2015) 1318.
[21] M.E. Patarroyo, M.P. Alba, L.E. Vargas, Y. Silva, J. Rosas, R. Rodriguez, Peptides inducing short-lived antibody responses against Plasmodium falciparum malaria have shorter structures and are read in a different MHC II functional register, Biochemistry 44 (2005) 6745-6754.
[22] C.S. Parry, J. Gorski, L.J. Stern, Crystallographic structure of the human leukocyte antigen DRA, DRB3*0101: models of a directional alloimmune response and autoimmunity, J. Mol. Biol. 371 (2007) 435-446.
[23] S. Dai, F. Crawford, P. Marrack, J.W. Kappler, The structure of HLA-DR52c: comparison to other HLA-DRB3 alleles, Proc. Natl. Acad. Sci. U. S. A. 105 (2008) 11893-11897.
[24] J.R. Lukens, M.W. Cruise, M.G. Lassen, Y.S.
Hahn, Blockade of PD-1/B7-H1 interaction restores effector CD8+ T cell responses in a hepatitis C virus core murine model, J. Immunol. 180 (2008) 4875-4884.
[25] E.J. Wherry, T cell exhaustion, Nat. Immunol. 12 (2011) 492-499.
[26] M.N. Wykes, J.M. Horne-Debets, C.Y. Leow, D.S. Karunarathne, Malaria drives T cells to exhaustion, Front. Microbiol. 5 (2014) 249.
[27] R. Stephens, J. Langhorne, Effector memory Th1 CD4 T cells are maintained in a mouse model of chronic malaria, PLoS Pathog. 6 (2010) e1001208.
[28] M. Walther, J.E. Tongren, L. Andrews, D. Korbel, E. King, H. Fletcher, R.F. Andersen, P. Bejon, F. Thompson, S.J. Dunachie, F. Edele, J.B. de Souza, R.E. Sinden, S.C. Gilbert, E.M. Riley, A.V. Hill, Upregulation of TGF-beta, FOXP3, and CD4+CD25+ regulatory T cells correlates with more rapid parasite growth in human malaria infection, Immunity 23 (2005) 287-296.
[29] G. Minigo, T. Woodberry, K.A. Piera, E. Salwati, E. Tjitra, E. Kenangalem, R.N. Price, C.R. Engwerda, N.M. Anstey, M. Plebanski, Parasite-dependent expansion of TNF receptor II-positive regulatory T cells with enhanced suppressive activity in adults with severe malaria, PLoS Pathog. 5 (2009) e1000402.
[30] N.S. Butler, J. Moebius, L.L. Pewe, B. Traore, O.K. Doumbo, L.T. Tygrett, T.J. Waldschmidt, P.D. Crompton, J.T. Harty, Therapeutic blockade of PD-L1 and LAG-3 rapidly clears established blood-stage Plasmodium infection, Nat. Immunol. 13 (2012) 188-195.
[31] J.M. Horne-Debets, D.S. Karunarathne, R.J. Faleiro, C.M. Poh, L. Renia, M.N. Wykes, Mice lacking Programmed cell death-1 show a role for CD8(+) T cells in long-term immunity against blood-stage malaria, Sci. Rep. 6 (2016) 26210.

Table legends
Table 1.
IMPIPS molecule of origin and our laboratory's serial number (in bold); below it, the native cHABP number and aa sequence; the distance between the farthest atoms in pockets 1 and 9, measured in Å (NA = not applicable); antibody titres as assessed by IFA, prefixed by the number of monkeys displaying each titre (PI = pre-immune; II20 = 20 days after the second dose); and performance after the first challenge, including the number of fully protected monkeys and of those protected after re-challenge (+ or -). Colours indicate the residues fitting into HLA-DR1* PBR pockets: fuchsia, pocket 1; blue, pocket 4; orange, pocket 6; green, pocket 9. The TCR-contacting residues studied here (p2, p5, p8) are indicated.

Table 2. IMPIPS inducing merozoite FPLLI or sporozoite VHLLAI. Binding activity to HLA-DR1* or to HLA-DR3*, 4* and 5* alleles, with the corresponding IC50 given below in parentheses, based on the NetMHCIIpan-3.0 method. The side-chain χ1 angles of the TCR-contacting residues p2, p5 and p8 are given according to their PBR register.

Figure legends
Figure 1. A. Immunofluorescence patterns recognised by sera from Aotus monkeys immunised with specific IMPIPS. MSP-2 and MSP-1 are detected on the membrane surface; SERA-5, serine repeat antigen-5, is intracytoplasmic; AMA-1, apical membrane antigen-1, is present on the apical and Mrz membrane; HRP-II, histidine-rich protein II, is identified as small intra-erythrocyte dots; EBA-175, erythrocyte binding antigen-175, is present in micronemes; CSP-1 is the membrane circumsporozoite protein-1; SPECT-1, sporozoite microneme protein essential for cell traversal-1, is identified in the membrane and as small micronemal dots; STARP, sporozoite threonine and asparagine-rich protein, and TRAP, thrombospondin-related anonymous protein, are identified in rhoptries and micronemes. B. WB analysis of monkeys immunised with MSP-2 (4044) 24112 and protected upon re-challenge, compared with a monkey immunised with MSP-2 (4044) 22774 and not protected upon re-challenge. C.
IMPIPS lowest-energy conformer 3D structures determined by 600 MHz 1H NMR, identified by our serial number followed by a dot and the conformer number. Amino acids are coloured according to HLA-DR1* binding activities, binding motifs and binding registers as follows: pocket 1, fuchsia; p2, red; p3, turquoise; pocket 4, dark blue; p5, rose; pocket 6, light brown; p7, grey; p8, yellow; pocket 9, green. The distances between the farthest atoms of the residues fitting into pockets 1 and 9 are given in ångströms (Å).

Figure 2. Parasitaemia levels (percentage of infected RBC), displayed on a semi-logarithmic scale and assessed by AO staining, in monkeys participating in re-challenge trials A and B, group I (protected) and group II (non-protected), on the days after re-challenge.

Annex 3. Estimation of the frequency of MHC-DRB allelic lineages in human populations

Allelic lineage frequencies, HLA-DRB
Lineage | Aleut | Amerindian | Arab | Asian | Aust. Abor. | Austronesian | Bashkir | Berber | Black | Caucasoid | Gypsy | Jew | Kurd | Melanesian | Mestizo | Micronesian | Mulatto | Oriental | Persian | Polynesian | Siberian | Tatar | Global
DRB1*01 | 21.2 | 2.4 | 9.8 | 8.3 | 0.9 | 1.4 | 15.1 | 25.7 | 14.0 | 20.3 | 13.5 | 16.5 | 8.0 | 0.3 | 19.3 | 0.0 | 18.9 | 4.7 | 4.9 | 2.7 | 4.0 | 22.2 | 17.0
DRB1*03 | 4.7 | 1.8 | 24.4 | 17.7 | 2.7 | 8.0 | 10.3 | 16.9 | 25.7 | 22.5 | 37.1 | 13.1 | 18.4 | 0.1 | 17.0 | 2.3 | 15.8 | 7.5 | 17.4 | 4.7 | 4.8 | 12.6 | 19.7
DRB1*04 | 35.3 | 65.2 | 31.8 | 18.7 | 23.0 | 12.6 | 23.3 | 19.1 | 9.9 | 25.6 | 15.1 | 33.9 | 28.2 | 32.4 | 31.0 | 35.7 | 23.7 | 26.7 | 20.8 | 50.5 | 33.0 | 17.8 | 26.1
DRB1*07 | 32.9 | 2.8 | 23.4 | 23.9 | 2.4 | 17.7 | 43.2 | 31.6 | 18.3 | 23.7 | 19.7 | 28.4 | 20.7 | 0.2 | 23.0 | 0.0 | 20.6 | 16.9 | 17.7 | 5.3 | 11.4 | 39.3 | 22.4
DRB1*08 | 18.8 | 35.8 | 4.5 | 6.6 | 58.2 | 3.3 | 5.5 | 14.0 | 11.8 | 6.7 | 1.2 | 2.4 | 1.7 | 29.4 | 15.7 | 17.8 | 14.5 | 14.9 | 2.6 | 23.3 | 12.1 | 5.9 | 8.1
DRB1*09 | 2.4 | 7.5 | 0.7 | 8.8 | 0.0 | 6.7 | 9.6 | 5.1 | 5.1 | 1.6 | 0.0 | 0.6 | 0.0 | 3.4 | 3.5 | 0.0 | 4.4 | 28.3 | 1.4 | 22.7 | 20.7 | 5.9 | 5.6
DRB1*10 | 2.4 | 0.6 | 7.6 | 10.5 | 0.3 | 3.8 | 1.4 | 5.1 | 5.0 | 2.4 | 5.8 | 5.0 | 4.6 | 0.0 | 4.1 | 0.0 | 3.5 | 2.8 | 4.6 | 1.6 | 2.7 | 3.0 | 2.8
DRB1*11 | 9.4 | 4.1 | 37.7 | 16.1 | 0.0 | 6.4 | 12.3 | 28.7 | 28.3 | 29.1 | 16.6 | 43.4 | 48.3 | 23.3 | 18.8 | 8.5 | 18.4 | 11.8 | 35.6 | 27.4 | 10.9 | 20.7 | 26.8
DRB1*12 | 12.9 | 0.6 | 1.7 | 11.0 | 14.9 | 58.7 | 16.4 | 0.0 | 8.0 | 2.1 | 1.2 | 4.3 | 2.3 | 5.1 | 2.6 | 67.4 | 6.6 | 25.6 | 1.6 | 37.4 | 18.2 | 4.4 | 5.9
DRB1*13 | 8.2 | 2.8 | 28.0 | 17.6 | 1.2 | 5.9 | 30.8 | 29.4 | 37.7 | 26.2 | 10.4 | 26.3 | 23.6 | 0.1 | 28.7 | 1.6 | 27.2 | 11.5 | 19.4 | 3.8 | 11.4 | 28.9 | 23.8
DRB1*14 | 21.2 | 50.9 | 7.2 | 13.0 | 80.6 | 8.7 | 8.9 | 0.0 | 3.5 | 4.8 | 38.2 | 7.3 | 7.5 | 14.8 | 10.4 | 19.4 | 10.1 | 13.2 | 9.1 | 23.6 | 34.6 | 6.7 | 6.7
DRB1*15 | 30.6 | 2.5 | 18.4 | 34.9 | 13.7 | 58.4 | 18.5 | 17.6 | 30.3 | 24.6 | 23.6 | 14.0 | 27.6 | 69.4 | 17.9 | 47.3 | 19.7 | 27.5 | 21.6 | 15.8 | 14.1 | 22.2 | 24.6
DRB1*16 | 0.0 | 21.6 | 2.8 | 2.5 | 0.9 | 8.0 | 4.8 | 2.2 | 2.5 | 4.6 | 14.3 | 3.9 | 9.2 | 21.7 | 7.6 | 0.0 | 5.7 | 5.0 | 7.5 | 0.3 | 2.3 | 10.4 | 4.8
N | 85 | 4800 | 19145 | 6860 | 335 | 2760 | 146 | 136 | 8745 | 628401 | 259 | 23926 | 174 | 1560 | 15423 | 129 | 228 | 120744 | 15996 | 639 | 1409 | 135 | 853309

N: number of individuals typed. Global: based on the typing of 853,309 individuals.
Data mined from: Allele Frequency Net Database (AFND). Nucleic Acids Research 2011 39:D913-D919. http://www.allelefrequencies.net/

Annex 4. Use of the FMO-PIEDA methodology in analysing the effect of mutations in proteins
"New mutations in non-syndromic primary ovarian insufficiency patients identified via whole-exome sequencing"

Patiño LC, Beau I, Carlosama C, Buitrago JC, González R, Suárez CF, et al. New mutations in non-syndromic primary ovarian insufficiency patients identified via whole-exome sequencing. Human Reproduction. 2017:1-9.
The published version of the article can be consulted at: https://academic.oup.com/humrep/article-abstract/32/7/1512/3823627/New-mutations-in-non-syndromic-primary-ovarian?redirectedFrom=fulltext

New mutations in non-syndromic primary ovarian insufficiency patients identified via whole-exome sequencing
Liliana Catherine Patiño1, Isabelle Beau2, Carolina Carlosama1, July Constanza Buitrago1, Ronald González3, Carlos Fernando Suárez3,4, Manuel Alfonso Patarroyo3,5, Brigitte Delemer6, Jacques Young2,7, Nadine Binart2, Paul Laissue1,*
1Center For Research in Genetics and Genomics (CIGGUR), GENIUROS Research Group, School of Medicine and Health Sciences,
Universidad del Rosario. Bogotá, Colombia. 2Inserm 1185, Le Kremlin-Bicêtre, Université Paris-Saclay, Faculté de Médecine Paris Sud, Le Kremlin-Bicêtre, France; 3Fundación Instituto de Inmunología de Colombia (FIDIC), Bogotá D.C., Colombia; 4Universidad de Ciencias Aplicadas y Ambientales (UDCA), Bogotá D.C., Colombia; 5Basic Sciences Department, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá D.C., Colombia; 6Service d'Endocrinologie-Diabète-Nutrition, CHU de Reims-Hôpital Robert-Debré, Reims, France; 7APHP, Hôpital de Bicêtre, Service d'Endocrinologie et des Maladies de la Reproduction, Le Kremlin-Bicêtre, France. *Correspondence address: Paul Laissue MD, PhD, HDR, Center For Research in Genetics and Genomics (CIGGUR). GENIUROS Research Group. School of Medicine and Health Sciences. Universidad del Rosario. Bogotá, Colombia. Address: Carrera 24 N° 63C-69, CP 112111, Bogotá DC, Colombia. Tel: +5712970200; Fax: +5712970200; E-mail: paul.laissue@urosario.edu.co Running Title: Mutations in primary ovarian insufficiency Abstract STUDY QUESTION: Is it possible to identify new mutations potentially associated with non-syndromic primary ovarian insufficiency (POI) via whole-exome sequencing (WES)? SUMMARY ANSWER: WES is an efficient tool for studying the genetic causes of POI: we identified new mutations, some of which lead to protein destabilisation potentially contributing to the disease aetiology. WHAT IS KNOWN ALREADY: POI is a frequently occurring complex pathology leading to infertility. Mutations in only a few candidate genes, mainly identified by Sanger sequencing, have been definitively related to the pathogenesis of the disease. STUDY DESIGN, SIZE, DURATION: This is a retrospective cohort study performed on 69 women affected by POI. PARTICIPANTS/MATERIALS, SETTING, METHODS: WES and an innovative bioinformatics analysis were used on non-synonymous sequence variants in a subset of 420 selected POI candidate genes.
Mutations in BMPR1B and GREM1 were modelled by fragment molecular orbital analysis. MAIN RESULTS AND THE ROLE OF CHANCE: Fifty-five coding variants in 49 genes potentially related to POI were identified in 33 out of 69 patients (48%). These genes participate in key biological processes in the ovary, such as meiosis, follicular development, granulosa cell differentiation/proliferation and ovulation. The presence of at least two mutations in distinct genes in 45% of the patients argued in favor of a polygenic nature of POI. LARGE SCALE DATA: Exome data were uploaded to the Open Science Framework. LIMITATIONS, REASONS FOR CAUTION: Regulatory regions, which were not analysed in the present study, may carry further variants related to POI. WIDER IMPLICATIONS OF THE FINDINGS: WES and the in silico analyses presented here represent an efficient approach for mapping variants associated with POI etiology. Computational modelling of variants suggested a significant change in protein stability secondary to the BMPR1B-p.Arg254His, BMPR1B-p.Phe272Leu and GREM1-p.Arg169Thr mutations. Taken together, our findings add valuable information regarding the molecular origin of POI. The sequence variants presented here represent potential future genetic biomarkers. STUDY FUNDING/COMPETING INTERESTS: This study was supported by the Universidad del Rosario and Colciencias (Grants CS/CIGGUR-ABN062-2016 and 672-2014). Colciencias supported Liliana Catherine Patiño's work (Fellowship: 617, 2013). Key words: whole-exome sequencing; primary ovarian insufficiency; female infertility; molecular etiology Introduction Primary ovarian insufficiency (POI) is a frequently occurring complex pathology affecting 1% of women under 40 years of age (Conway, 2000). Clinically, it is characterized by amenorrhea, hypoestrogenism and high gonadotropin levels, reflecting precocious depletion of the ovarian follicular reserve (Nelson, 2009; De Vos et al., 2010).
POI has been proposed as a progressive condition describing ovarian dysfunction (e.g. impaired ovarian function and irregular ovulation) that leads to infertility (premature ovarian failure, POF) (Welt, 2008). Although most POI cases are considered idiopathic, genetic anomalies, such as chromosomal abnormalities and point mutations in the coding regions of POI genes (autosomal and X-linked), have been described in syndromic and non-syndromic forms of the disease (Laissue, 2015; Qin et al., 2015). Despite numerous attempts at identifying sequence variants via Sanger sequencing, mutations in only a few candidate genes have been definitively related to the pathogenesis of the disease (Laissue, 2015; Qin et al., 2015, and references therein). This might be because female reproduction requires numerous steps, from sex determination and gametogenesis to ovulation, to guarantee oocyte health for normal fecundation. Several transcription factors (e.g. NR5A1, NOBOX, FIGLA, FOXL2) have been shown to play key roles during female gonadal development, and their mutations lead to POI (Laissue, 2015). TGF-β molecules and their downstream pathways have also been shown to be essential for ovarian physiology in distinct mammalian species. BMP15 and GDF9 are especially interesting because they are major regulators of mammalian ovulation rate; furthermore, their mutations have been related to POI (Laissue, 2015). Meiotic genes such as MCM8, MCM9, STAG3, SYCE1, MSH3, MSH4 and MLH3 have been considered important for determining the oocyte pool. To date, more than 60 mouse models presenting a well-defined phenotype of ovarian failure have been described (Barnett, 2006; Roy and Matzuk, 2006; Edson et al., 2009; Jagarlamudi et al., 2010; Sullivan and Castrillon, 2011; Monget et al., 2012 and www.jax.org).
Such a scenario, in which hundreds of genes are involved in complex dynamic regulatory networks, has hampered the selection of relevant candidates for Sanger sequencing. This constraint, together with the rarity of families affected by the disease (which would theoretically facilitate classical genetic mapping), has made research into the genetic causes of POI particularly challenging. Very recently, several studies based on next-generation sequencing (NGS) have proposed new genes and mutations associated with POI etiology (Caburet et al., 2014; de Vries et al., 2014; Wood-Trageser et al., 2014; Fonseca et al., 2015; Bouilly et al., 2016; Bramble et al., 2016; Fauchereau et al., 2016). However, large genomic regions have not yet been sequenced in unrelated POI individuals. The present study involved whole-exome sequencing of 69 unrelated Caucasian women affected by POI. An innovative bioinformatics analysis was applied to non-synonymous sequence variants in a subset of 420 selected POI candidate genes. Fifty-five coding variants in 49 genes potentially related to the phenotype were identified in 33 out of 69 patients (48%). These genes participate in key biological processes in the ovary, such as meiosis, follicular development, granulosa cell differentiation/proliferation and ovulation. The presence of at least two mutations in distinct genes in 45% of the patients argued in favour of a polygenic nature of POI. Computational 3D modelling of three mutations (two in BMPR1B and one in GREM1) using the fragment molecular orbital (FMO) method argued strongly in favour of pathogenic effects. The novel genes and mutations described here represent potential future genetic biomarkers for POI. Materials and Methods Women affected by POI Sixty-nine women (Pt-1 through Pt-69) affected by idiopathic POI were included in the study.
These patients were Caucasians living in France who had been referred for evaluation to the Reproductive Endocrinology Department at Bicêtre Teaching Hospital or the Endocrinology Department at Robert Debré Hospital, both in France. All patients had at least 6 months of amenorrhea before age 40, FSH values >20 IU/L measured in two samples at least 1 month apart, and a normal 46,XX karyotype. Turner syndrome, X-chromosome karyotypic abnormalities and FMR1 premutations were excluded, and none of the patients had circulating ovarian antibodies. Women with a history of pelvic surgery, ovarian infections, chemotherapy and/or autoimmune disease were also excluded from the study. Twelve patients displayed primary amenorrhea and 57 secondary amenorrhea. NGS, Sanger sequencing and bioinformatics analysis Total DNA was extracted from the patients' blood leucocytes by a conventional salting-out procedure. Experimental details of the NGS experiments, Sanger sequencing and bioinformatics analysis are included as Supplemental Methods. Structure preparation, modelling and fragment molecular orbital (FMO) calculations Details of the in silico approaches used to model the BMPR1B-p.Arg254His, BMPR1B-p.Phe272Leu and GREM1-p.Arg169Thr mutations are included as Supplemental Methods. Ethical approval All clinical and experimental steps of this study were approved by the Institutional Review Board (reference PHRC No. A0R03 052) and by the Bicêtre Ethical committee (CPP # PP 16-024 Ile-de-France VII). The clinical investigation was performed according to the guidelines of the Declaration of Helsinki (1975, as revised in 1996). All women gave their informed consent to participate. Results The percentage of reads on target (coverage) ranged from 80% to 95%. Coverage was defined as the percentage of target bases sequenced a given number of times. More than 85% of the target was covered at 40X depth. Exome data were uploaded to the Open Science Framework (Patiño, L.
2016, December 16, http://doi.org/10.17605/OSF.IO/EY9ME). A total of 43,337 sequence variants were identified in the POI-420 subset (Figure 1). Of these, 2,544 variants with MAF <0.05 were present in the POI-420 group, while 137,996 were found throughout the exome (all exome data, All-ex). Among POI-420, 488 variants induced a protein change: 7 nonsense, 4 splice-site, 53 frameshift and 424 missense variants. Among these 460 missense variants, 120 had scores compatible with deleterious effects according to both PolyPhen-2 and SIFT. Fifty-five sequence variants were definitively confirmed by Sanger sequencing (Table 1, Figure 1). All variants were found in the heterozygous state. In this series of 69 POI patients, 33 carried one or more confirmed variants (Table 1). The frequency of each variant in the ExAC database is also indicated. Five genes displayed at least two mutations: NOTCH2 (n=3), ADAMTS16 (n=2), BMPR1A (n=2), BMPR1B (n=2) and C3ORF77 (n=2). The clinical characteristics of patients carrying candidate mutations are shown in Table 1. Four patients presented with primary amenorrhea and varying pubertal development; the other patients presented with normal puberty and secondary amenorrhea. Symptoms appeared between 15 and 39 years of age (median 32 ± 8 years). Hormonal characteristics included markedly elevated FSH (73.6 ± 6.2 IU/L) and LH (36.5 ± 3.8 IU/L) and low estradiol levels (14.7 ± 2.5 ng/L). In sum, among the 33 patients, 19, 9, 2 and 3 patients carried 1, 2, 3 and 4 mutations, respectively. Interestingly, 43% of these patients had at least two mutations in different genes, arguing in favor of a polygenic origin for POI. FMO analysis of the modelled BMPR1B mutations revealed changes in stabilising interactions (Supplemental Figure S2). The mutations produced a major change in total interaction energy, from -54.75 (WT) to -29.54 (MT) kcal/mol at position Arg254 and from -44.69 (WT) to -33.38 (MT) kcal/mol at position Phe272.
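A minimal sketch of the arithmetic behind these figures: taking ΔE = E(MT) − E(WT) for each reported total, a positive ΔE means the mutant has lost stabilising interaction energy, consistent with the convention in Supplemental Figure S2 that negative energies are stabilising. The snippet only re-derives the differences from the published totals (it performs no FMO calculation), and the dictionary and function names are illustrative assumptions, not part of the authors' pipeline.

```python
# Total FMO interaction energies (kcal/mol) reported in the Results for BMPR1B,
# stored as (wild type, mutant). Negative values are stabilising.
# The names below are illustrative, not part of the authors' pipeline.
REPORTED_ENERGIES = {
    "BMPR1B-p.Arg254His": (-54.75, -29.54),
    "BMPR1B-p.Phe272Leu": (-44.69, -33.38),
}


def interaction_energy_change(e_wt: float, e_mt: float) -> float:
    """Return dE = E_MT - E_WT in kcal/mol; positive means lost stabilisation."""
    return round(e_mt - e_wt, 2)


for mutation, (e_wt, e_mt) in REPORTED_ENERGIES.items():
    delta = interaction_energy_change(e_wt, e_mt)
    print(f"{mutation}: dE = {delta:+.2f} kcal/mol")
# BMPR1B-p.Arg254His: dE = +25.21 kcal/mol
# BMPR1B-p.Phe272Leu: dE = +11.31 kcal/mol
```

Both differences are positive, so in each case the mutant residue retains noticeably less stabilising interaction energy than the wild-type residue, which is the quantitative basis for the destabilisation argument that follows.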
Replacing a charged amino acid with a neutral one and the loss of a non-classical H-bond (CH-π interaction) contributed to BMPR1B-MT protein destabilisation. Similarly, the stabilising interactions of GREM1 changed by roughly one order of magnitude between the wild type (GREM1-WT, -239.86 kcal/mol) and the mutant (GREM1-MT, -27.86 kcal/mol) (Supplemental Figure S3). Detailed FMO results are included as Supplemental Results. Discussion The present work describes whole-exome sequencing in 69 patients affected by classical clinical signs of POI. The primary analysis focused on 420 POI candidate genes systematically selected from public databases. Stringent filters (e.g. low MAF, non-synonymous mutations, SIFT and PolyPhen-2 screening) were used to select rare mutations with (theoretically) moderate/strong pathogenic functional effects. These mutations affected genes involved in several key biological processes, such as meiosis, follicular development, granulosa cell differentiation/proliferation, ovulation, cell metabolism and extracellular matrix regulation (Table 1). Although all 55 filtered variants (and genes) may have contributed to the POI phenotype (some of them probably in an additive/epistatic fashion), several of them, belonging to distinct molecular cascades, are especially interesting because of their previously described roles in ovarian physiology. GDF9, BMPR1B and GREM1, which participate in the TGF-β (transforming growth factor β) signalling pathway, have been clearly linked to specific ovarian functions, such as granulosa cell proliferation, ovulation and/or the regulation of follicular development (Figure 2). GDF9 (like its close homologue BMP15) is a soluble oocyte-secreted factor that binds to specific type I and type II serine/threonine kinase receptors located on the granulosa cell surface (Weiss and Attisano, 2013; Laissue, 2015).
Several mutations in humans, most located in the protein’s pro-region, have been identified in POI patients and in mothers of dizygotic twins (Montgomery et al., 2004; Palmer et al., 2006; Laissue et al., 2008; Persani et al., 2014). Functional tests of mutant GDF9 have revealed deleterious effects, such as the synthesis of defective mature products, reduced expression/secretion of the mature protein and inhibition of granulosa cell proliferation (Inagaki and Shimasaki, 2010; Wang et al., 2013; Persani et al., 2014; Simpson et al., 2014). Some mutations, especially those located at the C-terminal end of the pro-domain, have been related to increased granulosa cell proliferation (Simpson et al., 2014). The GDF9-p.Ser83Cys mutation identified in Pt-34 was located in the protein’s pro-region, which is important for proper protein folding, dimerization, secretion and stability. Like other GDF9 pro-region mutations, GDF9-p.Ser83Cys might lead to mature peptide dysfunction and inhibition of granulosa cell proliferation. BMP15:GDF9 heterodimers (which have greater biological activity than either BMP15 or GDF9 homodimers) act in humans and mice via a receptor complex constituted by the BMPR2 receptor, the ALK4/5/7 type I receptor and the BMPR1B (ALK6) co-receptor (Peng et al., 2013). ALK6 has been shown to be essential for downstream intracellular signalling by triggering SMAD1/5/8 phosphorylation. Alk6 knockout females have been shown to be infertile because of impaired cumulus expansion, while the p.Gln249Arg mutation in sheep (located in the protein’s highly conserved intracellular kinase signalling domain) has been linked to hyperfertility due to an increased ovulation rate (Souza et al., 2001; Yi et al., 2001; Davis, 2004). Overexpression of BMPR1B has been described in women with reduced ovarian reserve (Regan et al., 2016).
Both BMPR1B mutations identified in the present study (p.Arg254His and p.Phe272Leu) were located in the functional intracellular kinase domain, suggesting that they might be associated with POI pathogenesis. In addition, FMO analysis suggested a significant change in protein stability secondary to these mutations, which might be related to an impairment of TGF-β signalling between oocytes and granulosa cells (Supplemental Figure S2). Regarding TGF-β signalling regulation, GREM1 (Gremlin1), a member of the DAN family of BMP inhibitors, binds to BMP proteins, preventing them from activating specific receptors (Kattamuri et al., 2012). Although the mechanism used by DAN proteins to inhibit BMP ligands is not well understood, GREM1 has been shown to regulate important factors with roles during folliculogenesis, such as BMP2, BMP4 and BMP15 (Hsu et al., 1998; Pangas et al., 2004; Nilsson et al., 2014; Church et al., 2015; Bayne et al., 2016) (Figure 2). Grem1 knockout mice display delayed meiotic progression, defective primordial follicle assembly and a reduced number of oocytes (Myers et al., 2011). In humans, GREM1 is expressed from early to late stages of follicular development and has been linked to granulosa cell development (Kristensen et al., 2014; Bayne et al., 2016). Furthermore, a significant decrease in its expression has been reported in women with reduced ovarian reserve (Jindal et al., 2012). The GREM1-p.Arg169Thr mutation found in Pt-24 strongly suggests a functional role, since it is located in a critical region (residues Pro145 to Gln174) of the DAN domain that directly interacts with BMP4 (Sun et al., 2006). Furthermore, the GREM1-Arg169 residue is conserved in other DAN-family members and among numerous vertebrate species (Sun et al., 2006; Veverka et al., 2009).
Indeed, abnormal folding of the β2/β3 (finger 2) sheet could modify the protein’s local chemical properties, which might then disturb its interaction with BMP4 (or other BMP factors). As for the BMPR1B mutations, FMO analysis showed that the GREM1-p.Arg169Thr mutation led to changes in protein stability that might contribute to the phenotype (Supplemental Figure S3). These findings strongly suggest a relevant role for TGF-β proteins, especially those involved in oocyte-to-granulosa cell signalling, in POI pathogenesis. Concerning molecules involved in meiosis, the present study identified 16 mutations potentially contributing to the phenotype. Functional protein association networks of some meiotic proteins are included as supplemental material (Supplemental Figure S1). STAG3 and MCM9 are especially interesting because of their well-established roles in female fertility and POI. To date, all mutations in meiotic genes linked to POI etiology have been found in the biallelic state (homozygous or compound heterozygous), underlining the key role of meiosis in reproduction and species maintenance (Caburet et al., 2014; de Vries et al., 2014; Wang et al., 2014; Wood-Trageser et al., 2014; AlAsiri et al., 2015; Fauchereau et al., 2016). In the present study, mutations in meiotic genes were present in the heterozygous state, which might reflect a background of POI predisposition; in such a hypothetical scenario, further variants would be necessary to produce the phenotype. Interestingly, 64% (7 out of 11) of patients with a heterozygous mutation in a meiotic gene carried at least one further variant in the same or a distinct gene. We also found three different mutations in NOTCH2, a gene encoding one of the four NOTCH family single-pass type I (SPTI) transmembrane receptors (Andersson et al., 2011).
The NOTCH2-p.Ser1804Leu, p.Gln1811His and p.Leu2408His mutations identified in the present study were located in the intracellular domain of the protein, which translocates to the nucleus where it mediates transactivation/repression (Kopan and Ilagan, 2009). These mutant forms might therefore disturb the expression of key target genes involved in oocyte development. We consider that additional mutations in genes participating in follicular development, granulosa cell differentiation and proliferation, ovulation and extracellular matrix regulation could also contribute to the phenotype, given the roles of these genes in ovarian development and physiology. Examples include ATG7-p.Phe403Leu, THBS1-p.Gln96Arg, PTCH1-p.Val1131Ala, PCSK6-p.Thr964Met, UMODL1-p.Ile1330Asn, ADAMTS16-p.Arg100Trp and p.Arg789Cys, and PTX3-p.Pro303Arg. Notably, in clinical practice, some patients affected by POI report similar phenotypes in female relatives, which suggests a genetic origin of the disease. In our case, although the candidate mutations did not cluster in particular familial cases, incomplete penetrance cannot be excluded. Segregation analysis of the candidate variants would therefore be of interest; unfortunately, although we proposed to most of our POI patients that their parents be contacted to participate in the study, they decided not to involve their families. The genetic approach presented here revealed that 33 out of 69 (48%) patients carried mutations potentially related to the phenotype. Interestingly, 42% of these patients had at least two mutations in different genes and 49 out of 55 variants were identified in distinct genes, thereby arguing in favor of a polygenic origin for POI.
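These burden percentages can be cross-checked against the carrier distribution reported in the Results (19, 9, 2 and 3 patients carrying 1, 2, 3 and 4 mutations, respectively). The Python sketch below assumes that this distribution covers all 33 carriers and counts mutations regardless of the gene they fall in, so it approximates rather than reproduces the "distinct genes" criterion; all names are illustrative.

```python
# Patients carrying k confirmed mutations (k -> number of patients), as reported
# in the Results. Names are illustrative; genes are not distinguished here.
CARRIERS_BY_MUTATION_COUNT = {1: 19, 2: 9, 3: 2, 4: 3}
COHORT_SIZE = 69


def burden_summary(distribution: dict[int, int], cohort_size: int) -> dict[str, float]:
    """Summarise carriers, total mutation count and multi-mutation carriers."""
    carriers = sum(distribution.values())
    total_mutations = sum(k * n for k, n in distribution.items())
    multi_carriers = sum(n for k, n in distribution.items() if k >= 2)
    return {
        "carriers": carriers,
        "total_mutations": total_mutations,
        "carrier_fraction": carriers / cohort_size,
        "multi_fraction": multi_carriers / carriers,
    }


summary = burden_summary(CARRIERS_BY_MUTATION_COUNT, COHORT_SIZE)
print(summary["carriers"], summary["total_mutations"])  # 33 55
print(f"{summary['carrier_fraction']:.0%} {summary['multi_fraction']:.0%}")  # 48% 42%
```

The distribution recovers the 33 carriers, the 55 confirmed variants, the 48% carrier rate and 14/33 carriers (about 42%) with two or more mutations, matching the figure quoted above.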
Furthermore, our findings highlight the importance of rare variants in complex disease pathogenesis and contribute information for resolving genomic questions such as the “missing heritability” (Manolio et al., 2009; Gibson, 2012; Lee et al., 2014; Laissue, 2015). Concerning our methodological approach, correct gene subset configuration clearly depends on multiple variables, such as the availability of accurate previous data relating specific genes to ovarian biology and the rigor (and method) used when investigating potential candidates. This approach may therefore miss further candidates contributing to the phenotype. However, we consider that it represents an interesting middle ground between a large amount of genomic data (e.g. All-ex variants) and the results obtained from other sequencing designs (custom array sequencing or single Sanger sequencing approaches). An advantage of the present design is that the availability of sequences from all coding regions enables future reanalysis of the data by including additional genes and/or by applying alternative methods (e.g. interactome approaches). We consider that whole-exome sequencing and the in silico analysis presented here represent an efficient approach for mapping variants (having potentially moderate/strong functional effects) associated with POI etiology. Further NGS studies in larger panels of women affected by POI would be valuable for identifying novel causative mutations. Taken together, our findings add valuable information regarding the molecular etiology of POI and should form the starting point for further functional in vitro and in vivo studies. Authors' Roles Clinical work was performed by BD, JY, NB and IB. The experiments were performed by LCP, CC and JCB. MAP, CFS and RG performed the FMO analysis. All authors contributed to the interpretation of findings. The study was designed and directed by PL. The manuscript was drafted by PL, with contributions to revision and the final version by all authors.
Funding This study was supported by the Universidad del Rosario, Grant CS/CIGGUR-ABN062-2016. Conflict of Interest The authors declare no conflict of interest. References AlAsiri S, Basit S, Wood-Trageser MA, Yatsenko SA, Jeffries EP, Surti U, Ketterer DM, Afzal S, Ramzan K, Faiyaz-Ul Haque M, et al. Exome sequencing reveals MCM8 mutation underlies ovarian failure and chromosomal instability. J Clin Invest 2015;125:258–262. Andersson ER, Sandberg R, Lendahl U. Notch signaling: simplicity in design, versatility in function. Development 2011;138:3593–3612. Barnett KR. Ovarian follicle development and transgenic mouse models. Hum Reprod Update 2006;12:537–555. Bayne RA, Donnachie DJ, Kinnell HL, Childs AJ, Anderson RA. BMP signalling in human fetal ovary somatic cells is modulated in a gene-specific fashion by GREM1 and GREM2. Mol Hum Reprod 2016;22:622–633. Bouilly J, Beau I, Barraud S, Bernard V, Azibi K, Fagart J, Fèvre A, Todeschini AL, Veitia RA, Beldjord C, et al. Identification of multiple gene mutations accounts for a new genetic architecture of primary ovarian insufficiency. J Clin Endocrinol Metab 2016;jc.2016-2152. Bramble MS, Goldstein EH, Lipson A, Ngun T, Eskin A, Gosschalk JE, Roach L, Vashist N, Barseghyan H, Lee E, et al. A novel follicle-stimulating hormone receptor mutation causing primary ovarian failure: a fertility application of whole exome sequencing. Hum Reprod 2016;31:905–914. Caburet S, Arboleda VA, Llano E, Overbeek PA, Barbero JL, Oka K, Harrison W, Vaiman D, Ben-Neriah Z, García-Tuñón I, et al. Mutant cohesin in premature ovarian failure. N Engl J Med 2014;370:943–949. Church RH, Krishnakumar A, Urbanek A, Geschwindner S, Meneely J, Bianchi A, Basta B, Monaghan S, Elliot C, Strömstedt M, et al. Gremlin1 preferentially binds to bone morphogenetic protein-2 (BMP-2) and BMP-4 over BMP-7. Biochem J 2015;466:55–68. Conway GS. Premature ovarian failure. Br Med Bull 2000;56:643–649. Davis GH. Fecundity genes in sheep.
Anim Reprod Sci 2004;82–83:247–253. Edson MA, Nagaraja AK, Matzuk MM. The mammalian ovary from genesis to revelation. Endocr Rev 2009;30:624–712. Fauchereau F, Shalev S, Chervinsky E, Beck-Fruchter R, Legois B, Fellous M, Caburet S, Veitia RA. A non-sense MCM9 mutation in a familial case of primary ovarian insufficiency. Clin Genet 2016;89:603–607. Fonseca DJ, Patiño LC, Suárez YC, Jesús Rodríguez A de, Mateus HE, Jiménez KM, Ortega-Recalde O, Díaz-Yamal I, Laissue P. Next generation sequencing in women affected by nonsyndromic premature ovarian failure displays new potential causative genes and mutations. Fertil Steril 2015;104:154–162.e2. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet 2012;13:135–145. Hsu DR, Economides AN, Wang X, Eimon PM, Harland RM. The Xenopus dorsalizing factor Gremlin identifies a novel family of secreted proteins that antagonize BMP activities. Mol Cell 1998;1:673–683. Inagaki K, Shimasaki S. Impaired production of BMP-15 and GDF-9 mature proteins derived from proproteins WITH mutations in the proregion. Mol Cell Endocrinol 2010;328:1–7. Jagarlamudi K, Reddy P, Adhikari D, Liu K. Genetically modified mouse models for premature ovarian failure (POF). Mol Cell Endocrinol 2010;315:1–10. Jindal S, Greenseid K, Berger D, Santoro N, Pal L. Impaired Gremlin 1 (GREM1) expression in cumulus cells in young women with diminished ovarian reserve (DOR). J Assist Reprod Genet 2012;29:159–162. Kattamuri C, Luedeke DM, Nolan K, Rankin SA, Greis KD, Zorn AM, Thompson TB. Members of the DAN Family Are BMP Antagonists That Form Highly Stable Noncovalent Dimers. J Mol Biol 2012;424:313–327. Kopan R, Ilagan MXG. The canonical Notch signaling pathway: unfolding the activation mechanism. Cell 2009;137:216–233. Kristensen SG, Andersen K, Clement CA, Franks S, Hardy K, Andersen CY. 
Expression of TGF-beta superfamily growth factors, their receptors, the associated SMADs and antagonists in five isolated size-matched populations of pre-antral follicles from normal human ovaries. Mol Hum Reprod 2014;20:293–308. Laissue P. Aetiological coding sequence variants in non-syndromic premature ovarian failure: From genetic linkage analysis to next generation sequencing. Mol Cell Endocrinol 2015;411:243–257. Laissue P, Vinci G, Veitia RA, Fellous M. Recent advances in the study of genes involved in non-syndromic premature ovarian failure. Mol Cell Endocrinol 2008;282:101–111. Lee S, Abecasis GR, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014;95:5–23. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747–753. Monget P, Bobe J, Gougeon A, Fabre S, Monniaux D, Dalbies-Tran R. The ovarian reserve in mammals: A functional and evolutionary perspective. Mol Cell Endocrinol 2012;356:2–12. Montgomery GW, Zhao ZZ, Marsh AJ, Mayne R, Treloar SA, James M, Martin NG, Boomsma DI, Duffy DL. A deletion mutation in GDF9 in sisters with spontaneous DZ twins. Twin Res 2004;7:548–555. Myers M, Tripurani SK, Middlebrook B, Economides AN, Canalis E, Pangas SA. Loss of Gremlin Delays Primordial Follicle Assembly but Does Not Affect Female Fertility in Mice. Biol Reprod 2011;85:1175–1182. Nelson LM. Primary Ovarian Insufficiency. N Engl J Med 2009;360:606–614. Nilsson EE, Larsen G, Skinner MK. Roles of Gremlin 1 and Gremlin 2 in regulating ovarian primordial to primary follicle transition. Reproduction 2014;147:865–874. Palmer JS, Zhao ZZ, Hoekstra C, Hayward NK, Webb PM, Whiteman DC, Martin NG, Boomsma DI, Duffy DL, Montgomery GW. Novel variants in growth differentiation factor 9 in mothers of dizygotic twins. J Clin Endocrinol Metab 2006;91:4713–4716. 
Pangas SA, Jorgez CJ, Matzuk MM. Growth differentiation factor 9 regulates expression of the bone morphogenetic protein antagonist gremlin. J Biol Chem 2004;279:32281–32286. Peng J, Li Q, Wigglesworth K, Rangarajan A, Kattamuri C, Peterson RT, Eppig JJ, Thompson TB, Matzuk MM. Growth differentiation factor 9:bone morphogenetic protein 15 heterodimers are potent regulators of ovarian functions. Proc Natl Acad Sci 2013;110:E776–E785. Persani L, Rossetti R, Pasquale E Di, Cacciatore C, Fabre S. The fundamental role of bone morphogenetic protein 15 in ovarian function and its involvement in female fertility disorders. Hum Reprod Update 2014;20:869–883. Qin Y, Jiao X, Simpson JL, Chen Z-J. Genetics of primary ovarian insufficiency: new developments and opportunities. Hum Reprod Update 2015;21:787–808. Regan SLP, Knight PG, Yovich JL, Stanger JD, Leung Y, Arfuso F, Dharmarajan A, Almahbobi G. Dysregulation of granulosal bone morphogenetic protein receptor 1B density is associated with reduced ovarian reserve and the age-related decline in human fertility. Mol Cell Endocrinol 2016;425:84–93. Roy A, Matzuk MM. Deconstructing mammalian reproduction: using knockouts to define fertility pathways. Reproduction 2006;131:207–219. Simpson CM, Robertson DM, Al-Musawi SL, Heath DA, McNatty KP, Ritter LJ, Mottershead DG, Gilchrist RB, Harrison CA, Stanton PG. Aberrant GDF9 expression and activation are associated with common human ovarian disorders. J Clin Endocrinol Metab 2014;99:E615-24. Souza CJ, MacDougall C, MacDougall C, Campbell BK, McNeilly AS, Baird DT. The Booroola (FecB) phenotype is associated with a mutation in the bone morphogenetic receptor type 1 B (BMPR1B) gene. J Endocrinol 2001;169:R1-6. Sullivan S, Castrillon D. Insights into Primary Ovarian Insufficiency through Genetically Engineered Mouse Models. Semin Reprod Med 2011;29:283–298. Sun J, Zhuang F-F, Mullersman JE, Chen H, Robertson EJ, Warburton D, Liu Y-H, Shi W.
BMP4 activation and secretion are negatively regulated by an intracellular gremlin- BMP4 interaction. J Biol Chem 2006;281:29349–29356. Veverka V, Henry AJ, Slocombe PM, Ventom A, Mulloy B, Muskett FW, Muzylak M, Greenslade K, Moore A, Zhang L, et al. Characterization of the structural features and interactions of sclerostin: molecular insight into a key regulator of Wnt-mediated bone formation. J Biol Chem 2009;284:10890–10900. Vos M De, Devroey P, Fauser BCJM. Primary ovarian insufficiency. Lancet (London, England) 2010;376:911–921. Vries L de, Behar DM, Smirin-Yosef P, Lagovsky I, Tzur S, Basel-Vanagaite L. Exome Sequencing Reveals SYCE1 Mutation Associated With Autosomal Recessive Primary Ovarian Insufficiency. J Clin Endocrinol Metab 2014;99:E2129–E2132. Wang J, Zhang W, Jiang H, Wu B-L, Primary Ovarian Insufficiency Collaboration. Mutations in HFM1 in recessive primary ovarian insufficiency. N Engl J Med 2014;370:972–974. Wang T-T, Ke Z-H, Song Y, Chen L-T, Chen X-J, Feng C, Zhang D, Zhang R-J, Wu Y-T, Zhang Y, et al. Identification of a mutation in GDF9 as a novel cause of diminished ovarian reserve in young women. Hum Reprod 2013;28:2473–2481. Weiss A, Attisano L. The TGFbeta Superfamily Signaling Pathway. Wiley Interdiscip Rev Dev Biol 2013;2:47–63. Welt CK. Primary ovarian insufficiency: a more accurate term for premature ovarian failure. Clin Endocrinol (Oxf) 2008;68:499–509. Wood-Trageser MA, Gurbuz F, Yatsenko SA, Jeffries EP, Kotan LD, Surti U, Ketterer DM, Matic J, Chipkin J, Jiang H, et al. MCM9 mutations are associated with ovarian failure, short stature, and chromosomal instability. Am J Hum Genet 2014;95:754– 762. Yi SE, LaPolt PS, Yoon BS, Chen JY, Lu JK, Lyons KM. The type I BMP receptor BmprIB is essential for female reproductive function. Proc Natl Acad Sci U S A 2001;98:7994–7999. Figure Legends Figure 1 POI gene subset included 420 candidate genes. Among 2 244 677 total variants, 55 were selected and confirmed by Sanger sequencing. 
All ex: variants found throughout the exome. MAF: minor allele frequency. Missense S&P+: missense mutations predicted to be deleterious by both SIFT and PolyPhen2.

Figure 2. Signaling pathways and proteins involved in follicular development: a) autophagy; b) PI3K/AKT pathway; c) SOHLH1 pathway; d) TGF-β pathway; e) KIT-L and c-Kit; f) leptin pathway; g) NOTCH pathway; h) connexins.

Supplemental Figure S1. Protein-protein interaction networks built with STRING for different meiosis proteins. The main proteins are shown in red circles: A) STAG3; B) MLH3; C) MEI1; D) PRDM1. Colored lines display known and predicted interactions. Light blue: curated databases; pink: experimentally determined; green: gene neighborhood; red: gene fusions; blue: gene co-occurrence; yellow: text mining; black: co-expression; purple: protein homology.

Supplemental Figure S2. FMO results for BMPR1B. A. PIEDA contributions of amino acids interacting with position 254 (Arg in WT, His in MT) and position 272 (Phe in WT, Leu in MT). Energies are expressed in kcal/mol. B. Overall view of the analysed system. BMPR1B (chain A) and FKBP12 (chain B) are shown in blue and red, respectively. The mutation zones for positions 254 and 272 are shown in green and purple boxes, respectively. C. Arg254 WT, D. His254 MT, E. Phe272 WT, F. Leu272 MT: bar plots of the PIEDA interaction energy terms: electrostatics (green), exchange-repulsion (red), charge transfer (blue), dispersion (yellow) and solvation (cyan). Positive values are considered destabilising and negative values stabilising. G. Detail of the amino acids interacting with Arg254 in BMPR1B WT. H. Detail of the amino acids interacting with His254 in BMPR1B MT. I. Detail of the amino acids interacting with Phe272 in BMPR1B WT. J. Detail of the amino acids interacting with Leu272 in BMPR1B MT. Hydrogen bonds are shown as dotted lines.
The backbone-backbone hydrogen bond with Glu256A could not be calculated due to the limitations of the fragmentation model.

Supplemental Figure S3. FMO results for GREM1. A. PIEDA contributions of amino acids interacting with position 169 in chain A (Arg in WT, Thr in MT) and with the same position in chain B. Energies are expressed in kcal/mol. B. Overall view of the analysed system: chain A (blue), chain B (red), chain C (gray) and chain D (orange). The mutation zones for position 169 in chain A (site 1) and in chain B (site 2) are shown in green and purple boxes, respectively. C. Site 1 WT, D. Site 1 MT, E. Site 2 WT: bar plots of the PIEDA interaction energy terms: electrostatics (green), exchange-repulsion (red), charge transfer (blue), dispersion (yellow) and solvation (cyan). Positive values are considered destabilising and negative values stabilising. F. Detail of the amino acids interacting with Arg169A (site 1) in GREM1 WT (side-chain view). G. Detail of the amino acids interacting with Arg169A (site 1) in GREM1 WT (backbone view). H. Detail of the amino acids interacting with Thr169A in GREM1 MT. I. Detail of the amino acids interacting with Arg169B in GREM1 WT. Hydrogen bonds are shown as dotted lines.

Supplemental Results

The FMO method was used to study the WT and MT structures of BMPR1B and GREM1 and to assess the effect of the amino acid substitutions BMPR1B-p.Arg254His, BMPR1B-p.Phe272Leu and GREM1-p.Arg169Thr. Supplementary Figures 2 and 3 show the values calculated for the BMPR1B and GREM1 models, respectively.
Regarding BMPR1B, FMO analysis showed that replacing Arg254A with His254A had a deleterious effect on stabilising interactions. Two major interactions were found in the WT protein (Supplementary Figure 2 A, C and G). A hydrogen bond (H-bond) was formed between the Glu204A backbone and the Arg254A side chain.
Another significant interaction was detected between charged Arg254A and charged Glu55B (belonging to FKBP12). Both interactions were dominated by the electrostatic term. For His254A in the MT, FMO identified four interactions (Supplementary Figure 2 A, D and H). The His254A side chain formed an H-bond with the Gln233A side chain. An important electrostatic interaction was detected between His254A and charged Glu55B (FKBP12). Two interactions dominated by the solvation component of PIEDA were found between His254A and Glu204A and Glu256A. For Phe272A in the WT, FMO identified four interactions (Supplementary Figure 2 A, E and I). Two H-bonds were formed between the Phe272A backbone and the backbones of Glu276A and Glu268A. A non-classical CH-π H-bond was detected between the Phe272A side chain and the Pro89B side chain of FKBP12; this interaction was dominated by the dispersion term. An additional H-bond was found between the Phe272A and Glu268A side chains. FMO identified only two interactions for Leu272A in the MT (Supplementary Figure 2 A, F and J). Notably, the CH-π interaction is lost when Phe272A is replaced by Leu272A. As in the WT, two H-bonds were formed between the Leu272A backbone and the backbones of Glu276A and Glu268A.
Analysis of the GREM1 model with the FMO method identified eight major interactions for residue Arg169A in the WT (site 1) (Supplementary Figure 3 A, C, F and G). A salt bridge was formed between deprotonated Glu135B and protonated Arg169A; this interaction combines two non-covalent contributions, hydrogen bonding and electrostatic attraction. FMO detected four additional H-bonds: three between the Arg169A side chain and the backbones of Asp184C, Asp182C and Leu183C, and one between the Arg169A backbone and the Met153A side chain.
Two other interactions, dominated by the electrostatic term, were found between the guanidinium group of Arg169A and the side chains of Glu134B and Gln139C. The weakest interaction, driven by the solvation component of PIEDA, occurred between the Arg169A and Thr151A side chains. FMO identified only two interactions for Thr169A in the MT (Supplementary Figure 3 A, D and H). Notably, the salt bridge between deprotonated Glu135B and protonated Arg169A was lost because the charged Arg169A was replaced by the uncharged Thr169A. Two H-bonds, both dominated by the electrostatic term, were formed between the Thr169A side chain and the backbones of Asp182C and Asp184C. The remaining interactions that stabilised the WT structure were lost in the Thr169A MT. For Arg169B in the WT (Supplementary Figure 3 A, E and I), FMO identified three interactions. The guanidinium group of Arg169B formed an H-bond with the Met153B side chain; this interaction was dominated by the dispersion term. Another important interaction, driven by the electrostatic term, was detected between Arg169B and Glu105B. As at site 1, the weakest interaction, driven by the solvation component of PIEDA, occurred between the Arg169B and Thr151B side chains. The FMO method did not detect significant interactions for Thr169B in the MT.

Supplemental methods

NGS, Sanger sequencing and bioinformatics analysis
Library preparation and Ion Proton sequencing were performed following certified protocols from Life Technologies. Briefly, for each of the 69 DNA samples, 100 ng of genomic DNA was used to amplify exonic target regions, which were enriched and amplified using the Ion AmpliSeq™ Exome RDY Library Preparation Kit (Thermo Scientific, A27192). Each sample was processed separately.
The amplicons were partially digested with FuPa reagent (proprietary to Thermo Scientific) and phosphorylated prior to ligation of Ion Xpress™ Barcode Adapters, followed by clean-up with the HighPrep PCR clean-up system (Magbio, AC 60050). The final libraries were quantified on a Qubit® Fluorometer using the Qubit® dsDNA HS Assay Kit (Thermo Scientific, Q32854) and on an Agilent® Bioanalyzer using the Agilent High Sensitivity DNA Kit (Agilent, 5067-4626). Two samples were pooled according to their Bioanalyzer concentrations and loaded on an Ion PI™ Chip for sequencing on the Ion Proton™ System. Sequencing data were analyzed with Torrent Suite v4.4.3. Raw reads were trimmed and filtered to retain only high-quality reads, and only reads passing these filters were used in downstream analysis. Reads were aligned to the hg19 reference genome with the TMAP algorithm. Variants detected with the Variant Caller plugin were annotated with Ion Reporter 4.2 to give location (intronic/exonic/UTR), gene name, protein change, function and dbSNP ID (dbSNP build 137), and with Variant Effect Predictor to obtain SIFT and PolyPhen predictions. Library preparation and sequencing were carried out at Genotypic Technology's Genomics facility (Bangalore, Karnataka, India).
The POI gene subset (POI-420) consisted of 420 genes (Supplementary Table 1) considered candidates because their expression or function had been reported in distinct reproductive processes (e.g. sex determination, meiosis, folliculogenesis and ovulation). Several resources were used to build this list, including Highwire, PubMed, MGI-Jackson Laboratory, GEO Profiles, GeneCards and Illumina NextBio.
These databases were exhaustively mined using numerous keyword combinations: premature ovarian failure, primary ovarian insufficiency, POI/POF genetics, hypergonadotropic hypogonadism, gametogenesis, molecular regulation of meiosis, folliculogenesis, ovulation genetics, sex determination, granulosa cell physiology and hypothalamic/pituitary/gonadal axis. R and Microsoft Excel functions were used to filter the exome data. Sequence variants (synonymous and non-synonymous) in the POI-420 subset with a reported minor allele frequency (MAF) < 0.05 were selected for subsequent analysis. Variants with a potential effect at the protein level (e.g. missense, nonsense, splice-site, frameshift) were then retained for downstream analysis. Among missense mutations, those predicted to be deleterious by both PolyPhen2 and SIFT (n = 119) were retained. PolyPhen2 uses variables such as interspecific protein alignments, mapping of residues onto three-dimensional protein structures and the physicochemical characteristics of the exchanged amino acids. The SIFT algorithm is based on the evolutionary conservation of amino acids. All filtered candidate variants were checked by PCR/Sanger sequencing. Technical conditions for the PCR/sequencing assays, including oligonucleotide sequences, are available upon request. Clustal W was used to align human protein sequences with those from orthologous species. STRING (string-db.org) was used to construct functional protein association networks for STAG3, MLH3, MEI1 and PRDM1.
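The filtering cascade described above (gene panel and MAF < 0.05, then a protein-altering consequence, then concordant SIFT and PolyPhen2 deleterious calls for missense variants) can be sketched in Python. This is a minimal, hypothetical illustration only: the authors used R and Excel, and the record fields (`gene`, `maf`, `consequence`, `sift`, `polyphen`), the consequence labels, the choice to keep variants with no reported MAF, the treatment of "possibly damaging" PolyPhen2 calls as deleterious, and the example genes are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical consequence labels; the annotation vocabulary used in the study may differ.
PROTEIN_ALTERING = {"missense", "nonsense", "splice_site", "frameshift"}
# Assumption: a PolyPhen2 call of "probably" or "possibly" damaging counts as deleterious.
POLYPHEN_DAMAGING = {"probably_damaging", "possibly_damaging"}
MAF_CUTOFF = 0.05


@dataclass
class Variant:
    gene: str
    maf: Optional[float]            # None = not reported (assumption: such variants are kept)
    consequence: str
    sift: Optional[str] = None      # "deleterious" or "tolerated"
    polyphen: Optional[str] = None  # "probably_damaging", "possibly_damaging" or "benign"


def passes_filters(v: Variant, panel: Set[str]) -> bool:
    """Apply the steps in order: gene panel and MAF, protein effect, in silico calls."""
    if v.gene not in panel:
        return False
    if v.maf is not None and v.maf >= MAF_CUTOFF:
        return False
    if v.consequence not in PROTEIN_ALTERING:
        return False
    if v.consequence == "missense":
        # Missense variants must be predicted deleterious by BOTH SIFT and PolyPhen2.
        return v.sift == "deleterious" and v.polyphen in POLYPHEN_DAMAGING
    # Nonsense, splice-site and frameshift variants pass without in silico scores.
    return True


def filter_variants(variants: List[Variant], panel: Set[str]) -> List[Variant]:
    return [v for v in variants if passes_filters(v, panel)]


# Usage with made-up genes and values (not data from this study).
panel = {"GENE_A", "GENE_B"}  # stand-in for the POI-420 list
candidates = [
    Variant("GENE_A", 0.001, "missense", "deleterious", "probably_damaging"),  # kept
    Variant("GENE_A", 0.001, "missense", "tolerated", "probably_damaging"),    # one tool only: dropped
    Variant("GENE_B", None, "frameshift"),                                      # kept
    Variant("GENE_B", 0.2, "nonsense"),                                         # common: dropped
    Variant("GENE_C", 0.001, "nonsense"),                                       # outside panel: dropped
    Variant("GENE_A", 0.01, "synonymous"),                                      # no protein change: dropped
]
kept = filter_variants(candidates, panel)
print([(v.gene, v.consequence) for v in kept])
# prints [('GENE_A', 'missense'), ('GENE_B', 'frameshift')]
```

Keeping the missense rule as a single conjunction mirrors the "both tools" requirement; relaxing it to either tool would be a one-line change to `passes_filters`.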
Structure preparation and modelling
The crystal structures of BMPR1B (PDB: 3MDY) and GREM1 (PDB: 5AEJ) (WT versions) and their respective mutants (MT) (BMPR1B-p.Arg254His, BMPR1B-p.Phe272Leu and GREM1-p.Arg169Thr) were analysed (Chaikuad et al., 2012; Kišonaitė et al., 2016). The UCSF Chimera swapaa function was used to introduce the amino acid substitutions into the crystal structures, using the Dunbrack backbone-dependent rotamer library (Dunbrack, 2002; Pettersen et al., 2004). Residue protonation states were calculated by the Poisson-Boltzmann method with the H++ web server at pH 7.4 for both proteins (Gordon et al., 2005). The structures were subjected to restrained minimization with the ff14SB classical force field implemented in AMBER14. Each structure was solvated in an octahedral box of TIP3P water molecules, with chloride as the counter-ion; the minimum distance between the protein surface and the box edge was set at 10 Å. After minimization, only the protein, without water molecules, was included in the fragment molecular orbital (FMO) calculations.
Fragment molecular orbital (FMO) calculations
The FMO method was used to study the effect of the amino acid substitutions (Fedorov et al., 2012), as it allows the types and energies of the interactions altered by a mutation to be evaluated. This ab initio quantum method evaluates large molecular systems accurately by partitioning them into fragments. The total interaction energy of each fragment pair can be decomposed into electrostatic, exchange-repulsion, charge-transfer, dispersion and solvation terms by pair interaction energy decomposition analysis (PIEDA) (Fedorov and Kitaura, 2007). The FMO method (version 5.2) implemented in GAMESS 2016 was used at the Hartree-Fock (HF) level with the 6-31G* basis set (Schmidt et al., 1993).
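To make the PIEDA bookkeeping concrete, the following Python sketch sums the five components of a fragment pair into its interaction energy, names the dominant term, and applies the selection rules given in these methods (absolute value ≥ 3 kcal/mol, partners within 6.5 Å of the studied residue) with the sign convention of the supplemental figures (negative = stabilising). It is a hypothetical post-processing step, not GAMESS output parsing: the `PairTerms` fields, the fragment labels and all energies are illustrative assumptions, not results from this study.

```python
from dataclasses import dataclass
from typing import List, Tuple

SIGNIFICANCE_KCAL = 3.0   # |E| threshold in kcal/mol (Heifetz et al., 2016)
DISTANCE_CUTOFF_A = 6.5   # only partners within this distance (Å) of the studied residue


@dataclass
class PairTerms:
    """PIEDA components (kcal/mol) for one fragment pair; field names are assumptions."""
    partner: str
    distance: float            # Å from the studied residue (assumed to be precomputed)
    electrostatic: float
    exchange_repulsion: float
    charge_transfer: float
    dispersion: float
    solvation: float

    @property
    def total(self) -> float:
        # PIEDA: the pair interaction energy is the sum of its components.
        return (self.electrostatic + self.exchange_repulsion + self.charge_transfer
                + self.dispersion + self.solvation)

    @property
    def dominant_term(self) -> str:
        # The term with the largest magnitude "dominates" the interaction.
        terms = {
            "electrostatic": self.electrostatic,
            "exchange-repulsion": self.exchange_repulsion,
            "charge transfer": self.charge_transfer,
            "dispersion": self.dispersion,
            "solvation": self.solvation,
        }
        return max(terms, key=lambda name: abs(terms[name]))


def significant_pairs(pairs: List[PairTerms]) -> List[Tuple[str, float, str, str]]:
    """Keep pairs inside the distance cutoff whose |total| reaches the threshold."""
    kept = []
    for p in pairs:
        if p.distance <= DISTANCE_CUTOFF_A and abs(p.total) >= SIGNIFICANCE_KCAL:
            label = "stabilising" if p.total < 0 else "destabilising"
            kept.append((p.partner, round(p.total, 2), label, p.dominant_term))
    return kept


# Illustrative numbers only (hypothetical fragments, not values from this study).
example = [
    PairTerms("Glu_X", 3.1, -20.0, 6.0, -3.0, -1.5, 12.0),  # total -6.5: kept
    PairTerms("Thr_Y", 4.0, -1.0, 0.5, -0.3, -0.4, -0.2),   # total -1.4: below threshold
    PairTerms("Asp_Z", 8.0, -9.0, 1.0, -1.0, -0.5, 2.0),    # outside 6.5 Å: excluded
]
print(significant_pairs(example))
# prints [('Glu_X', -6.5, 'stabilising', 'electrostatic')]
```

Note that a large electrostatic attraction can be partly offset by a positive solvation term, which is why the sketch filters on the summed energy rather than on any single component.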
Solvent effects were included with the polarizable continuum model (PCM). Grimme's D3 dispersion model was used to correct all HF energies (Grimme et al., 2011). All models were fragmented using Facio v. 19.2.1 (Suenaga, 2005). Fragment pair interactions with an absolute value ≥ 3 kcal/mol were considered significant (Heifetz et al., 2016). For each structure, only interactions within 6.5 Å of the studied amino acid were included.

References
Chaikuad A, Alfano I, Kerr G, Sanvitale CE, Boergermann JH, Triffitt JT, von Delft F, Knapp S, Knaus P, Bullock AN. Structure of the bone morphogenetic protein receptor ALK2 and implications for fibrodysplasia ossificans progressiva. J Biol Chem 2012;287:36990–36998.
Dunbrack RL. Rotamer libraries in the 21st century. Curr Opin Struct Biol 2002;12:431–440.
Fedorov DG, Kitaura K. Pair interaction energy decomposition analysis. J Comput Chem 2007;28:222–237.
Fedorov DG, Nagata T, Kitaura K. Exploring chemistry with the fragment molecular orbital method. Phys Chem Chem Phys 2012;14:7562.
Gordon JC, Myers JB, Folta T, Shoja V, Heath LS, Onufriev A. H++: a server for estimating pKas and adding missing hydrogens to macromolecules. Nucleic Acids Res 2005;33:W368–W371.
Grimme S, Ehrlich S, Goerigk L. Effect of the damping function in dispersion corrected density functional theory. J Comput Chem 2011;32:1456–1465.
Heifetz A, Chudyk EI, Gleave L, Aldeghi M, Cherezov V, Fedorov DG, Biggin PC, Bodkin MJ. The Fragment Molecular Orbital Method Reveals New Insight into the Chemical Nature of GPCR–Ligand Interactions. J Chem Inf Model 2016;56:159–172.
Kišonaitė M, Wang X, Hyvönen M. Structure of Gremlin-1 and analysis of its interaction with BMP-2. Biochem J 2016;473:1593–1604.
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. UCSF Chimera - a visualization system for exploratory research and analysis. J Comput Chem 2004;25:1605–1612.
Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JH, Koseki S, Matsunaga N, Nguyen KA, Su S, et al. General atomic and molecular electronic structure system. J Comput Chem 1993;14:1347–1363.
Suenaga M. Facio 3D-Graphic program for molecular modeling and visualization of quantum chemical calculations. J Comput Chem Japan 2005;4:25–32.

Table 1. Clinical and molecular findings of POI patients studied via whole-exome sequencing
Patient ID | Phenotype | Age at diagnosis | FSH (IU/L) | LH (IU/L) | E2 (ng/L) | Gene | Locus (position) | Accession number | Sequence variation | Protein position | ExAC allele frequency | Biological process
HK3 5;176318162 NM_002115.2 c.290G>A p.Gly97Glu 0.001034 Cell metabolism Pt-2 Primary 29 50 16 17 Granulosa cell NOTCH2 1;120458122 NM_024408.3 c.7223T>A p.Leu2408His 0.001788 differentiation and proliferation Granulosa cell GATA4 8;11615928 NM_002052.3 c.1273G>A p.Asp425Asn 0.002117 proliferation and differentiation INHBC 12;57843255 NM_005538.3 c.509T>A p.Leu170Gln 0.002969 Meiosis Pt-3 Secondary 17 91 34 9 MLH3 14;75515926 NM_001040108.1 c.433A>G p.Thr145Ala ND Meiosis PCSK5 9;78796345 NM_001190482.1 c.2035T>C p.Tyr679His 0.000008274 Ovulation Pt-6 Secondary 21 64 28 55 TSC1 9;135781014 NM_000368.4 c.1951A>G p.Arg651Gly ND Follicular development Pt-7 Secondary 37 83 17 1 ATG7 3;11389434 NM_006395.2 c.1209T>A p.Phe403Leu 0.000008238 Ovarian reserve Granulosa cell Pt-11 Secondary 35 44 23 6 UMODL1 21;43547856 NM_173568.3 c.3989T>A p.Ile1330Asn 0.0002650 differentiation and proliferation Granulosa cell HTRA3 4;8295883 NM_053044.3 c.1006C>T p.Arg336Cys 0.00004640 differentiation and Pt-14 Secondary 39 64 27 11 proliferation NBL1 1;19981530 NM_182744.3 c.112C>T p.Leu38Phe 0.0005640 Follicular development Pt-16 Secondary 39 141 58 7 UBR2 6;42571438 NM_015255.2 c.644C>T p.Pro215Leu 0.0001484 Meiosis PCSK1 5;95730629 NM_000439.4 c.1823C>T p.Thr608Met 0.00002471 Other Pt-17 Secondary 35 22 14 31 BMP6 6;7862681 NM_001718.4
c.1154G>A p.Arg385His 0.00004124 Follicular development Pt-22 Secondary 20 101 28 2 CXCR4 2;136873083 NM_003467.2 c.415G>A p.Val139Ile ND Ovulation Pt-23 Secondary 32 37 5 8 FGFR2 10;123353268 NM_022970.3 c.64C>T p.Arg22Trp 0.00009078 Follicular development Pt-24 Secondary 37 58 7 12 GREM1 15;33023397 NM_013372.6 c.506G>C p.Arg169Thr ND Follicular development MEI1 22;42095664 NM_152513.3 c.122C>A p.Pro41His 0.00006178 Meiosis GJA4 1;35260779 NM_002060.2 c.965G>A p.Arg322His 0.00009366 Meiosis Pt-25 Primary 29 54 25 9 IPO4 14;24649689 NM_024658.3 c.3205G>C p.Asp1069His 0.000008760 Meiosis Regulation of the ADAMTS16 5;5239880 NM_139056.2 c.2365C>T p.Arg789Cys 0.001292 extracellular matrix Granulosa cell GDF9 5;132199978 NM_005260.4 c.248C>G p.Ser83Cys ND differentiation and Pt-34 Secondary 16 18 19 21 proliferation PDE3A 12;20769270 NM_000921.4 c.1376G>A p.Arg459Gln 0.001566 Meiosis Granulosa cell Pt-35 Secondary 39 72 38 60 PTCH1 9;98215817 NM_000264.3 c.3392T>C p.Val1131Ala ND differentiation and proliferation BMPR1B 4;96051153 NM_001256793.1 c.816C>G p.Phe272Leu 0.000008247 Ovulation Pt-36 Secondary 27 102 64 5 TSC2 16;2138096 NM_000548.3 c.5116C>T p.Arg1706Cys 0.0002665 Follicular development Pt-37 Primary 17 56 26 8 BMPR1A 10;88681384 NM_004329.2 c.1274A>G p.Tyr425Cys ND Follicular development Regulation of the Pt-38 Secondary 34 105 69 20 LAMC1 1;183079729 NM_002293.3 c.961C>T p.Pro321Ser 0.0005848 extracellular matrix Regulation of the Pt-39 Secondary 28 86 84 7 ADAMTS16 5;5146365 NM_139056.2 c.298C>T p.Arg100Trp 0.0008860 extracellular matrix Pt-41 Primary 17 42 17 45 PTX3 3;157160530 NM_002852.3 c.908C>G p.Pro303Arg 0.0005518 Ovulation Pt-42 Secondary 32 96 82 20 FANCG 9;35078733 NM_004629.1 c.176G>A p.Gly59Glu 0.00003301 Meiosis Granulosa cell Pt-43 Secondary 35 77 52 9 NOTCH2 1;120462920 NM_024408.3 c.5411C>T p.Ser1804Leu 0.00002472 differentiation and proliferation MCM9 6;119234579 NM_017696.2 c.911A>G p.Asn304Ser 0.003325 Meiosis Pt-45 Secondary 34 136 
46 2 BMPR1B 4;96051098 NM_001256793.1 c.761G>A p.Arg254His 0.001081 Ovulation Pt-47 Secondary 35 12 24 7 SEBOX 17;26691490 NM_001080837.2 c.362_371delGCACCTCAGT p.Ser116Alafs*7 ND Meiosis FANCL 2;58386928 NM_004629.1 c.1114_1115insATTA p.Thr372Asnfs*11 ND Meiosis Pt-49 Secondary 24 136 37 1 ZP1 11;60637010 NM_207341.3 c.319G>A p.Asp107Asn 0.002254 Follicular development BMPER 7;34086005 NM_133468.4 c.664C>T p.Pro222Ser 0.0002637 Follicular development NOTCH2 1;120462898 NM_024408.3 c.5433G>C p.Gln1811His ND Granulosa cell differentiation and proliferation CYP26B1 2;72362437 NM_019885.3 c.541G>A p.Val181Met 0.00009900 Granulosa cell differentiation and Pt-51 Secondary 38 137 78 1 proliferation PRDM1 6;106554919 NM_001198.3 c.2036G>A p.Arg679His 0.00004120 Meiosis STAG3 7;99797247 NM_012447.3 c.1657G>A p.Gly553Ser ND Meiosis PADI6 1;17698849 NM_207421.3 c.109C>T p.Leu37Phe ND Follicular development Regulation of follicular Pt-54 Secondary 16 65 22 12 KIT 4;55524204 NM_000222.2 c.23G>C p.Trp8Ser ND development Regulation of follicular THBS1 15;39874613 NM_003246.2 c.287A>G p.Gln96Arg ND development Pt-55 Secondary 23 96 43 10 MTHFR 1;11850895 NM_005957.4 c.1813T>C p.Ser605Pro 0.000008245 Cell metabolism Pt-56 Secondary 31 75 68 10 BRD2 6;32942354 NM_001199456.1 c.4G>T; c.5C>G p.Ala2Cys ND Meiosis SOX15 17;7492861 NM_006942.1 c.134C>T p.Pro45Leu ND Other Pt-58 Secondary 15 60 29 15 BMPR1A 10;88681435 NM_004329.2 c.1325G>A p.Arg442His ND Follicular development Pt-59 Secondary 23 23 20 19 LEPR 1;66064368 NM_002303.5 c.875C>A p.Ser292Tyr 0.0001735 Ovulation Granulosa cell PCSK6 15;101845484 NM_002570.3 c.2891C>T p.Thr964Met 0.003727 differentiation and Pt-64 Secondary 37 114 33 10 proliferation SAPCD1 6;31731303 NM_001039651.1 c.226C>T p.Gln76Ter 0.0009166 Other Granulosa cell Pt-67 Secondary 35 38 15 8 BMP5 6;55739432 NM_021073.2 c.232C>T p.Pro78Ser ND differentiation and proliferation C3orf77 3;44284349 NM_001145030.1 c.351G>T p.Lys117Asn ND Meiosis Pt-68 Secondary 39
76 59 30 C3orf77 3;44284351 NM_001145030.1 c.353A>T p.Glu118Val ND Meiosis Figure 1 Figure 2 Supplemental table S1. POI gene subset (POI-420) analyzed via NGS Gene Gene name ACVR2A Activin a receptor, type iia ADAMTS1 A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 1 ADAMTS15 A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 15 ADAMTS16 A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 16 ADAMTS19 A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 19 ADAMTS4 A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 4 ADAMTS5 A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 5 ADAMTS6 A disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 6 ADIPOR1 Adiponectin receptor 1 ADIPOR2 Adiponectin receptor 2 AFP Alpha-fetoprotein AHR Aryl hydrocarbon receptor AKT Akt serine/threonine kinase 1 ALK4 Activin a receptor type 1b ALK6 Bone morphogenetic protein receptor type 1b ALK7 Activin a receptor type 1c AMBP Alpha-1 microglobulin/bikunin precursor AMH Anti-mullerian hormone AMHR2 Anti-mullerian hormone receptor type 2 AR Androgen receptor AREG Amphiregulin ARHGEF7 Rho guanine nucleotide exchange factor 7 ARNTL Aryl hydrocarbon receptor nuclear translocator like ATG10 Autophagy-Related Protein 10 ATG16L1 Autophagy related 16 like 1 ATG2A Autophagy-related protein 2 homolog A ATG4A Autophagy related 4A Cysteine Peptidase ATG4B Autophagy Related 4B Cysteine Peptidase ATG4C Autophagy related 4C Cysteine Peptidase ATG5 Autophagy related 5 ATG7 Autophagy related 7 ATG9A Autophagy-related protein 9A ATG9B Autophagy-related protein 9B ATM Ataxia-telangiectasia mutated gene AURKA Aurora kinase a AURKB Aurora kinase b AURKC Aurora kinase c BAX BCL2 associated X, apoptosis regulator BCL2 BCL2, apoptosis regulator BCL2L1 BCL2 like 1 BCL2L2 BCL2 like 2 BCL6 B-cell CLL/lymphoma 6 BDNF Brain derived neurotrophic factor BMAL1 
Aryl hydrocarbon receptor nuclear translocator-like BMP15 Bone morphogenetic protein 15 BMP2 Bone morphogenetic protein 2 BMP4 Bone morphogenetic protein 4 BMP5 Bone morphogenetic protein 5 BMP6 Bone morphogenetic protein 6 BMP7 Bone morphogenetic protein 7 BMP8B Bone morphogenetic protein 8b BMPER BMP binding endothelial regulator BMPR1A Bone morphogenetic protein receptor type 1A BMPR1B Bone morphogenetic protein receptor type 1B BMPR2 Bone morphogenetic protein receptor type 2 BOLL Boule-Like RNA Binding Protein BRCA1 Breast cancer 1 gene BRD2 Bromodomain containing 2 BRD3 Bromodomain containing 3 BRD4 Bromodomain containing 4 BRDT Bromodomain testis associated BRSK1 BR serine/threonine kinase 1 BRWD1 Bromodomain and WD repeat domain containing 1 BUB1B BUB1 mitotic checkpoint serine/threonine kinase B BVES Blood vessel epicardial substance C1GALT1 Core 1 synthase, glycoprotein-N-acetylgalactosamine 3-beta-galactosyltransferase 1 CASP2 Caspase 2 CBX2 Chromobox 2 CCNA1 Cyclin A1 CCNB1IP1 Cyclin B1 interacting protein 1 CCND2 Cyclin D2 CDC25B Cell division cycle 25B CDK2 Cyclin dependent kinase 2 CDK4 Cyclin dependent kinase 4 CDKN1B Cyclin dependent kinase inhibitor 1B CDKN1C Cyclin dependent kinase inhibitor 1C CEBPA CCAAT/enhancer binding protein alpha CEBPB CCAAT/enhancer binding protein beta CGA Glycoprotein hormones, alpha polypeptide CITED2 Cbp/p300 interacting transactivator with Glu/Asp rich carboxy-terminal domain 2 CKS2 CDC28 protein kinase regulatory subunit 2 CMYC V-myc avian myelocytomatosis viral oncogene homolog CPE Carboxypeptidase E CPEB1 Cytoplasmic polyadenylation element binding protein 1 CRTC1 CREB regulated transcription coactivator 1 CTGF Connective tissue growth factor CTNNB1 Catenin beta 1 CUGBP1 CUGBP, Elav-like family member 1 CXCL19 Chemokine (C-X-C motif) ligand 19 CXCR4 C-X-C motif chemokine receptor 4 CYP11B1 Cytochrome P450 family 11 subfamily B member 1 CYP17A1 Cytochrome P450 family 17 subfamily A member 1 CYP19A1 Cytochrome P450 
family 19 subfamily A member 1 CYP21A2 Cytochrome P450 family 21 subfamily A member 2 CYP26B1 Cytochrome P450 family 26 subfamily B member 1 CYP27B1 Cytochrome P450 family 27 subfamily B member 1 DAND5 DAN domain BMP antagonist family member 5 DAZL Deleted in azoospermia like DDR2 Discoidin domain receptor tyrosine kinase 2 DHCR24 24-dehydrocholesterol reductase DICER1 Dicer 1, ribonuclease III DLX5 Distal-less homeobox 5 DLX6 Distal-less homeobox 6 DMC1 DNA meiotic recombinase 1 DMRT1 Doublesex and mab-3 related transcription factor 1 DMRT3 Doublesex and mab-3 related transcription factor 3 DND1 DND microrna-mediated repression inhibitor 1 DPPA2 Developmental pluripotency associated 2 EDNRB Endothelin receptor type B EGR1 Early growth response 1 EIF4ENIF1 Eukaryotic translation initiation factor 4E nuclear import factor 1 EPAB Poly(A) binding protein cytoplasmic ERCC1 ERCC excision repair 1, endonuclease non-catalytic subunit ERCC2 ERCC excision repair 2, endonuclease non-catalytic subunit EREG Epiregulin ERK1 Mitogen-activated protein kinase 3 ERK2 Mitogen-activated protein kinase 1 ESCO2 Establishment of sister chromatid cohesion N-acetyltransferase 2 ESR1 Estrogen receptor 1 ESR2 Estrogen receptor 2 EVI1 MDS1 and EVI1 complex locus EXO1 Exonuclease 1 FABP6 Fatty acid binding protein 6 FANCA Fanconi anemia complementation group A FANCC Fanconi anemia complementation group C FANCG Fanconi anemia complementation group G FANCL Fanconi anemia complementation group L FGF2 Fibroblast growth factor 2 FGF9 Fibroblast growth factor 9 FGFR1 Fibroblast growth factor receptor 1 FGFR2 Fibroblast growth factor receptor 2 FHL2 Four and a half LIM domains 2 FIGLA Olliculogenesis specific bhlh transcription factor FKTN Fukutin FMN2 Formin 2 FMR1 Fragile X mental retardation 1 FOG2 Zinc finger protein, FOG family member 2 FOXC1 Forkhead box C1 FOXE1 Forkhead box E1 FOXG1B Forkhead box G1 FOXL2 Forkhead box L2 FOXL3 Forkhead box L3 FOXO3 Forkhead box O3 FOXO4 Forkhead box O4 FSD1L 
Fibronectin type III and SPRY domain containing 1 like FSHB Follicle stimulating hormone beta subunit FSHR Follicle stimulating hormone receptor FST Follistatin FSTL3 Follistatin like 3 FZD1 Frizzled class receptor 1 FZD4 Frizzled class receptor 4 FZR1 Fizzy/cell division cycle 20 related 1 GADD45G Growth arrest and DNA damage inducible gamma GATA4 GATA binding protein 4 GATA6 GATA binding protein 6 GCM2 Glial cells missing homolog 2 GCX1 TOX high mobility group box family member 2 GDF9 Growth differentiation factor 9 GGN1 Gametogenetin 1 GGT1 Gamma-glutamyltransferase 1 GGT5 Gamma-glutamyltransferase 5 GJA1 Gap junction protein alpha 1 GJA4 Gap junction protein alpha 4 GLI1 GLI family zinc finger 1 GLP1 Glucagon-like peptide 1, included GNRH1 Gonadotropin releasing hormone 1 GNRHR Gonadotropin releasing hormone receptor GOLT1A Golgi transport 1A GPR3 G protein-coupled receptor 3 GREM1 Gremlin 1, DAN family BMP antagonist GREM2 Gremlin 2, DAN family BMP antagonist GULP GULP, engulfment adaptor PTB domain containing 1 H2AFX H2A histone family member X HACE1 HECT domain and ankyrin repeat containing E3 ubiquitin protein ligase 1 HAS2 Hyaluronan synthase 2 HDAC1 Histone deacetylase 1 HDAC2 Histone deacetylase 2 HDX Highly divergent homeobox HES1 Hes family bhlh transcription factor 1 HEY2 Hes related family bhlh transcription factor with YRPW motif 2 HHIP Hedgehog interacting protein HK3 Hexokinase 3 HNRNPK Heterogeneous nuclear ribonucleoprotein K HORMAD1 HORMA domain containing 1 HOXA5 Homeobox A5 HPGD Hydroxyprostaglandin dehydrogenase 15-(NAD) HPRT1 Hypoxanthine phosphoribosyltransferase 1 HSD17B4 Hydroxysteroid 17-beta dehydrogenase 4 HSF2 Heat shock transcription factor 2 HSP10 Heat-shock 10-kd protein HSP27 Heat shock protein family B (small) member 1 HTRA1 Htra serine peptidase 1 HTRA3 Htra serine peptidase 3 IGF1 Insulin like growth factor 1 IGF2R Insulin like growth factor 2 receptor IL6ST Interleukin 6 signal transducer IMMP2L Inner mitochondrial membrane 
peptidase subunit 2 INHA Inhibin alpha subunit INHBA Inhibin beta A subunit INHBB Inhibin beta B subunit INHBC Inhibin beta C subunit INSL3 Insulin like 3 IRS2 Insulin receptor substrate 2 JAGGED1 Jagged 1 JAK2 Janus kinase 2 JMJD1A Lysine demethylase 3A KDR Kinase insert domain receptor KISS1 Kiss-1 metastasis-suppressor KISS1R KISS1 receptor KIT KIT proto-oncogene receptor tyrosine kinase KITLG KIT ligand LAMC1 Laminin subunit gamma 1 LARS2 Leucyl-trna Sybthetase 2 LATS1 Large Tumor Suppressor Kinase 1 LBX2 Ladybird homeobox 2 LEP Leptin LEPR Leptin receptor LFNG Lunatic fringe LGR4 Leucine rich repeat containing G protein-coupled receptor 4 LHB Luteinizing hormone beta polypeptide LHCGR Luteinizing hormone/choriogonadotropin receptor LHX8 Lim homeobox gene 8 LHX9 LIM homeobox 9 LIN28A Protein lin-28 homolog A LIN28B Protein lin-28 homolog B LOX1 Low density lipoprotein, oxidized, receptor 1 MAP3K4 Mitogen-activated protein kinase kinase kinase 4 MAPK14 Mitogen-activated protein kinase 14 MCL1 Myeloid cell leukemia sequence 1 MCM8 Minichromosome maintenance complex component 8 MCM9 Minichromosome maintenance complex component 9 MEI1 Meiosis inhibitor protein 1 MGARP Mitochondria localized glutamic acid rich protein MLH1 DNA mismatch repair protein Mlh1 MLH3 DNA mismatch repair protein Mlh3 MMP2 Matrix metallopeptidase 2 MOGAT1 Monoacylglycerol o-acyltransferase 1 MSH4 Muts homolog 4 MSH5 Muts homolog 5 MSX1 Msh homeobox 1 MSX2 Homeobox protein MSX-2 MTHFR 5,10-methylenetetrahydrofolate reductase MTOR Mammalian target of rapamycin MTRR Methionine synthase reductase NALP5 NLR family pyrin domain containing 5 NANOS2 Nanos homolog 2 NANOS3 Nanos C2HC-type zinc finger 3 NAT9 N-acetyltransferase 9 NBL1 Neuroblastoma candidate region, suppression of tumorigenicity 1 NBN Nibrin NHLH2 Nescient helix-loop-helix 2 NOBOX Homeobox protein NOBOX NOHLH Spermatogenesis and oogenesis specific basic helix-loop-helix 1 NOS1 Nitric oxide synthase 1 NOS3 Nitric oxide synthase 3 
NOTCH2 Neurogenic locus notch homolog protein 2
NR2C2 Nuclear receptor subfamily 2, group C, member 2
NR5A1 Nuclear receptor subfamily 5 group A member 1
NR5A2 Nuclear receptor subfamily 5 group A member 2
NRG1 Neuregulin 1
NRIP1 Nuclear receptor interacting protein 1
NTF4 Neurotrophin 4
NTRK2 Neurotrophic tyrosine kinase, receptor, type 2
NUR77 Nuclear receptor subfamily 4 group A member 1
OOSP1 Oocyte secreted protein 1, pseudogene
P2Y2 Purinergic receptor P2Y, G protein-coupled, 2
P2Y2R Purinergic receptor P2Y2
P2Y6 Pyrimidinergic receptor P2Y6
P2Y6R Pyrimidinergic receptor P2Y6
PADI6 Peptidylarginine deiminase, type VI
PCNA Proliferating cell nuclear antigen
PCSK1 Proprotein convertase, subtilisin/kexin-type, 1
PCSK5 Proprotein convertase, subtilisin/kexin-type, 5
PCSK6 Proprotein convertase, subtilisin/kexin-type, 6
PCYT1B Phosphate cytidylyltransferase 1, choline, beta
PDE3A Phosphodiesterase 3A, cGMP-inhibited
PDE4D Phosphodiesterase 4D, cAMP-specific
PDPK1 3-phosphoinositide dependent protein kinase 1
PER1 Period circadian clock 1
PGD2 Prostaglandin D2 synthase, brain
PGR Progesterone receptor
PGRMC1 Progesterone receptor membrane component 1
PHB Prohibitin
PIK3CA Phosphatidylinositol 3-kinase, catalytic, alpha
PIK3CG Phosphatidylinositol 3-kinase, catalytic, gamma
PMS2 PMS1 homolog 2, mismatch repair system component
POPDC3 Popeye domain containing 3
POR Cytochrome P450 oxidoreductase
POU1F1 POU domain, class 1, transcription factor 1
POU5F1 POU class 5 homeobox 1
PPM1A Protein phosphatase, Mg2+/Mn2+ dependent 1A
PPP2R1A Protein phosphatase 2, structural/regulatory subunit A, alpha
PRDM1 PR domain-containing protein 1
PRDX2 Peroxiredoxin 2
PRL Prolactin
PRLR Prolactin receptor
PROP1 PROP paired-like homeobox 1
PSMC3IP PSMC3-interacting protein
PTCH1 Protein patched homolog 1
PTEN Phosphatase and tensin homolog
PTGER2 Prostaglandin E receptor 2, EP2 subtype
PTGS2 Prostaglandin-endoperoxide synthase 2
PTX3 Pentraxin 3, long
RAD51C RAD51 paralog C
RBMS1 RNA-binding motif protein, single strand-interacting, 1
REC8 REC8 meiotic recombination protein
RHOX13 Reproductive homeobox 13
RHOX5 Rhox homeobox family, member 1
RHOX8 Rhox homeobox family, member 8
RHOXF2 Rhox homeobox family, member 2
RHOXF2B Rhox homeobox family, member 2B
RICTOR Rapamycin-insensitive companion of MTOR
RNF35 Tripartite motif-containing protein 40
RPS6KB1 Ribosomal protein S6 kinase, 70-kDa, 1
RSPO1 R-spondin family, member 1
RUNX2 Runt-related transcription factor 2
SAM68 KH domain-containing, RNA-binding, signal transduction-associated protein 1
SCARB1 Scavenger receptor class B, member 1
SDF1 Chemokine, CXC motif, ligand 12
SEBOX Skin-, embryo-, brain-, and oocyte-specific homeobox
SETDB2 SET domain protein, bifurcated, 2
SGOL2 Shugoshin-like 2
SH2B1 SH2B adaptor protein 1
SIGLEC11 Sialic acid-binding immunoglobulin-like lectin 11
SIRT1 Sirtuin 1
SIX1 Sine oculis homeobox homolog 1
SIX4 Sine oculis homeobox homolog 4
SKP2 S-phase kinase-associated protein 2
SLC44A1 Solute carrier family 44, member 1
SMAD1 SMAD family member 1
SMAD2 SMAD family member 2
SMAD3 SMAD family member 3
SMAD4 SMAD family member 4
SMAD5 SMAD family member 5
SMAD8 SMAD family member 8
SMAD9 SMAD family member 9
SMC1B Structural maintenance of chromosomes 1B
SMOM2 Smoothened
SOD1 Superoxide dismutase 1
SOHLH1 Spermatogenesis and oogenesis-specific basic helix-loop-helix protein 1
SOHLH2 Spermatogenesis and oogenesis-specific basic helix-loop-helix protein 2
SOX15 SRY-box 15
SOX3 SRY-box 3
SOX8 SRY-box 8
SOX9 SRY-box 9
SPO11 SPO11, initiator of meiotic double-stranded breaks
SRC v-src avian sarcoma (Schmidt-Ruppin A-2) viral oncogene
SSTR2 Somatostatin receptor 2
STAG3 Stromalin 3
STAR Steroidogenic acute regulatory protein
STAT3 Signal transducer and activator of transcription 3
STRA8 Stimulated by retinoic acid 8
SULT1E1 Sulfotransferase family 1E, estrogen-preferring, member 1
SUV420H2 Suppressor of variegation 4-20 homolog 2
SYCE1 Synaptonemal complex central element protein 1
SYCE2 Synaptonemal complex central element protein 2
SYCE3 Synaptonemal complex central element protein 3
SYCP1 Synaptonemal complex protein 1
SYCP2 Synaptonemal complex protein 2
SYCP2L Synaptonemal complex protein 2-like
SYCP3 Synaptonemal complex protein 3
TAF4B TAF4B RNA polymerase II, TATA box-binding protein-associated factor
TAL2 T-cell acute lymphocytic leukemia 2
TBB8 Tubulin beta 8 class VIII
TCF21 Transcription factor 21
TERT Telomerase reverse transcriptase
TGFB1 Transforming growth factor, beta-1
TGFBR3 Transforming growth factor-beta receptor, type III
THBS1 Thrombospondin 1
TIAL1 TIA1 cytotoxic granule-associated RNA-binding protein-like 1
TIMP3 Tissue inhibitor of metalloproteinase 3
TMEM38B Transmembrane protein 38B
TNFAIP6 Tumor necrosis factor-alpha-induced protein 6
TOP3B Topoisomerase, DNA, III, beta
TOPAZ1 Chromosome 3 open reading frame 77
TORC1 CREB-regulated transcription coactivator 1
TP53 Tumor protein p53
TP73 Tumor protein p73
TRIP13 Thyroid hormone receptor interactor 13
TRKB Neurotrophic tyrosine kinase, receptor, type 2
TRMT6 tRNA methyltransferase 6
TSC1 Tuberous sclerosis 1
TSC2 Tuberous sclerosis 2
TWSG1 Twisted gastrulation BMP signaling modulator 1
UBB Ubiquitin B
UBE3A Ubiquitin-protein ligase E3A
UBR2 Ubiquitin-protein ligase E3 component N-recognin 2
UIMC1 Ubiquitin interaction motif-containing protein 1
UMODL1 Uromodulin-like 1
UNC5A UNC-5 netrin receptor A
USP9X Ubiquitin-specific protease 9, X-linked
USP9Y Ubiquitin-specific protease 9, Y chromosome
VDR Vitamin D receptor
VRK1 Vaccinia-related kinase 1
VWC2 von Willebrand factor C domain-containing protein 2
WNT2 Wingless-type MMTV integration site family, member 2
WNT4 Wingless-type MMTV integration site family, member 4
WNT5A Wingless-type MMTV integration site family, member 5A
WNT7A Wingless-type MMTV integration site family, member 7A
WT1 Wilms tumor 1
YBX2 Y box-binding protein 2
YY1 Transcription factor YY1
ZFAND3 Zinc finger, AN1-type domain 3
ZFP36L2 Zinc finger protein 36-like 2
ZFX Zinc finger protein, X-linked
ZNF346 Zinc finger protein 346
ZNF462 Zinc finger protein 462
ZP1 Zona pellucida glycoprotein 1
ZP2 Zona pellucida glycoprotein 2
ZP3 Zona pellucida glycoprotein 3

Supplemental table S2. Available protein structures for modelling mutations identified via next-generation sequencing. Structures used for fragment molecular orbital analysis are indicated in bold.

Gene | Mutation (DNA) | Mutation (protein) | Patient ID | PDB ID | Fragment (aa)
BMPR1B | c.761G>A | p.Arg254His | Pt-45 | 3MDY | 168-502
BMPR1B | c.816C>G | p.Phe272Leu | Pt-36 | 3MDY | 168-502
CXCR4 | c.415G>A | p.Val139Ile | Pt-22 | 3ODU | 2-319
 | | | | 3OE0 | 2-319
 | | | | 3OE6 | 2-325
 | | | | 3OE8 | 2-319
 | | | | 3OE9 | 2-319
 | | | | 4RWS | 2-228 and 231-319
FANCL | c.1114_1115insATTA | p.Thr372Asnfs*11 | Pt-49 | 4CCG | 288-375
GREM1 | c.506G>C | p.Arg169Thr | Pt-24 | 5AEJ | 72-184
HTRA3 | c.1006C>T | p.Arg336Cys | Pt-14 | 4RI0 | 130-453
THBS1 | c.287A>G | p.Gln96Arg | Pt-54 | 1Z78 | 19-233
 | | | | 1ZA4 | 19-233
 | | | | 2ERF | 25-233
 | | | | 2ES3 | 25-233
 | | | | 2OUH | 19-257

Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3