Biomedical Data Science

Dr Catalina Vallejos – Reader

Research in a Nutshell

While biomedical data sometimes qualifies as “big data” (where the number of samples and/or variables is large), complexity is its most prominent feature. This complexity arises from a combination of sources of heterogeneity: heterogeneity across individuals in a population (e.g. response to treatment), heterogeneity in the type of data we collect (e.g. health records and genomics) and heterogeneity introduced by the data collection process (e.g. measurement error).

We focus on the development of novel statistical methodology to address and study these sources of heterogeneity. This is a highly multidisciplinary task: from the understanding of complex biomedical problems and technologies, to the development of new methodology and the implementation of open-source analysis tools.

Our current research focuses on two areas of application. The first is single-cell RNA sequencing, a cutting-edge experimental technique that allows genome-wide quantification of gene expression on a cell-by-cell basis. The second is electronic health records research, where we develop predictive models based on observational data that is routinely collected by health providers (e.g. the NHS). Developing computational tools that take full advantage of the rich information provided by these data sources should improve our understanding of health and disease and play an important role in precision medicine initiatives.

People

Dr Catalina Vallejos, Group Leader
Begoña Bolos, CRUK PhD student (co-supervised)
Veronica Finazzi, EpiCrossBorders PhD student (co-supervised; based in Munich)
Yipeng Cheng, Edinburgh Helsinki Program in Human Genomics PhD student (co-supervised)
Louis Chislett, HDRUK/Turing Wellcome programme in Health Data Science PhD student
Dr Nathan Constantine-Cooke, Postdoctoral Research Associate
Dr Karla Monterrubio-Gomez, Postdoctoral Research Associate
Linda Nguyen, MRC Precision Medicine PhD student (co-supervised)
Emma Yang, MRC HGU PhD student (co-supervised)

Contact

catalina.vallejos@ed.ac.uk

Publications

Liley, J., Emerson, S. R., Mateen, B. A., Vallejos, C. A., Aslett, L. J. M., & Vollmer, S. J. (2021). Model updating after interventions paradoxically introduces bias. Paper presented at 24th International Conference on Artificial Intelligence and Statistics.

Kapourani, A., Argelaguet, R., Sanguinetti, G., & Vallejos, C. A. (2021). scMET: Bayesian modelling of DNA methylation heterogeneity at single-cell resolution. Genome Biology. 10.1186/s13059-021-02329-8

Lähnemann, D., Köster, J., Szczurek, E., McCarthy, D. J., Hicks, S. C., Robinson, M. D., Vallejos, C. A., Campbell, K. R., Beerenwinkel, N., Mahfouz, A., Pinello, L., Skums, P., Stamatakis, A., Attolini, C. S-O., Aparicio, S., Baaijens, J., Balvert, M., Barbanson, B. D., Cappuccio, A., ... Schönhuth, A. (2020). Eleven grand challenges in single-cell data science. Genome Biology, 21(1), 31. 10.1186/s13059-020-1926-6

Richter, M. L., Deligiannis, I. K., Yin, K., Danese, A., Lleshi, E., Coupland, P., Vallejos, C. A., Matchett, K. P., Henderson, N. C., Colome-Tatche, M., & Martinez-Jimenez, C. P. (2021). Single-nucleus RNA-seq2 reveals a functional crosstalk between liver zonation and ploidy. Nature Communications. 10.1038/s41467-021-24543-5

Maniatis, C., Vallejos, C. A., & Sanguinetti, G. (2022). SCRaPL: A Bayesian hierarchical framework for detecting technical associates in single cell multiomics data. PLoS Computational Biology, 18(6), e1010163. 10.1371/journal.pcbi.1010163. PMID: 35727848; PMCID: PMC9249169.

The full publication list can be found on Research Explorer (Catalina Vallejos Meneses, University of Edinburgh Research Explorer).

Partners and Funders

The Alan Turing Institute
British Heart Foundation

Scientific Themes

Statistical genomics, single cell sequencing, risk prediction, electronic health records

This article was published on 2024-09-23