Ruben Branco

About Me

Hi👋! I am a researcher specializing in Deep Learning and Deep Generative Modeling, currently finishing my PhD at the Faculty of Sciences of the University of Lisbon (FCUL) and LASIGE. My research journey has taken me from Natural Language Processing and commonsense reasoning to generative modeling.

During my PhD, supervised by Professor Sara Madeira (FCUL) and Professor Piero Fariselli (University of Turin), I have developed advanced Deep Generative Modeling methods for mixed-type longitudinal clinical data. Unlocking generative models for complex clinical data can help create more targeted synthetic data for specific clinical scenarios and address the imbalanced event distributions that are common in healthcare settings. For instance, rare diseases or uncommon patient phenotypes are often underrepresented in real datasets, making it difficult to develop robust predictive models for these populations.

My earlier research at the NLX-Group focused on understanding and mitigating fundamental challenges in neural networks, particularly Shortcut Learning in commonsense reasoning tasks. This work, completed as part of my Master's in Data Science under Professor António Branco, laid the foundation for my current interest in understanding how LLMs learn and reason.

I am now transitioning toward Research Scientist roles, with particular interest in LLM development, code generation, and mechanistic interpretability.

For a more thorough and structured description of my work, you may check my CV.


Research Interests

I am interested in many topics beyond these and always welcome collaborations, time permitting. If you have a problem you would like to collaborate on, feel free to contact me.

Deep Learning
Deep Generative Modeling
Large Language Models (LLMs)
Code Generation
Mechanistic Interpretability

Teaching

2023/2024

OutSystems Programming, Faculty of Sciences of the University of Lisbon

Taught an OutSystems module on Practical Aspects and Applications in OutSystems, deepening the students’ knowledge of advanced topics within the tool. This course was offered by the Faculty under the UPskill IEFP program.

2021/2022

Intelligent Systems, Faculty of Sciences of the University of Lisbon

Taught Theoretical-Practical classes, offered to different bachelor degrees within the Faculty, in the 2nd Semester.

The class covers Artificial Intelligence fundamentals, starting from search algorithms and going all the way to Machine Learning fundamentals.

Advanced Machine Learning, Faculty of Sciences of the University of Lisbon

Taught Theoretical-Practical classes, offered to all Department of Informatics Masters’ Programs in the 2nd Semester.

Class focuses on more advanced topics such as Ensemble Learning, Markov Models, Deep Learning and Reinforcement Learning.

Machine Learning, Faculty of Sciences of the University of Lisbon

Taught Theoretical-Practical classes, offered to 8 Masters Programs and 1 PhD Program in the 1st Semester.

Class focuses on Supervised & Unsupervised Learning fundamentals.

Science Communication

Faculty of Medicine, University of Lisbon

Invited lecture “Introduction to Deep Learning”, given at the Faculty of Medicine of the University of Lisbon for the Information Systems and Applications in Health course of the Masters in Clinical Research.

LITHME Whole Action Conference @ Jyväskylä, Finland

This invited talk focused on my Master's dissertation and EMNLP 2021 paper on Shortcut Learning and Commonsense Reasoning. It also discussed potential future research paths toward solving this issue and creating more cognitively capable machines.

Deep Learning Sessions Portugal

This invited talk focused on my Master's dissertation and EMNLP 2021 paper: Shortcut Learning and Commonsense Reasoning. If you want to learn a bit more about Shortcut Learning, its presence in Commonsense Reasoning, and what it could mean for NLP and cognitive AI, it might be worth a watch.

Selected Publications

LGTM! Characteristics of Auto-Merged LLM-based Agentic Pull Requests

Ruben Branco^*, Paulo Canelas^*, Catarina Gamboa^*, Alcides Fonseca

23rd International Conference on Mining Software Repositories (Mining Challenge)

PatientFlow: Learning to Generate Mixed-Type Longitudinal Clinical Data with Flow Matching

Ruben Branco, Marta Gromicho, Mamede de Carvalho, Piero Fariselli, Sara C. Madeira

Under Review @ Artificial Intelligence in Medicine Journal. SCIMAGO Q1 (2026)

TimeHealthGAN: Adversarial Generation of Mixed-Type Longitudinal Clinical Data

Ruben Branco, Marta Gromicho, Mamede de Carvalho, Piero Fariselli, Sara C. Madeira

Pre-Submission (2026)

Guiding Patient Flows: Event-Conditioned Flow Matching for Longitudinal Clinical Data

Ruben Branco, Marta Gromicho, Mamede de Carvalho, Piero Fariselli, Sara C. Madeira

Pre-Submission (2026)

Shortcutted commonsense: Data spuriousness in deep learning of commonsense reasoning

Ruben Branco, António Branco, João Rodrigues, João Silva

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). CORE A*, Outstanding Paper Award (2021)

Other Publications

From greatest simplicity to full power: Research-Infrastructure-as-a-Service for language science and technology

Luís Gomes, António Branco, João Silva, Ruben Branco

Language Resources and Evaluation (Journal). SCIMAGO Q1 (2024)

Predicting the Functional Rating Scale and Self-Assessment Status of ALS Patients with Sensor Data

Andreia S. Martins, Daniela M. Amaral, Eduardo N. Castanho, Diogo F. Soares, Ruben Branco, Sara C. Madeira, Helena Aidos

CLEF (2024)

Artificial intelligence and statistical methods for stratification and prediction of progression in amyotrophic lateral sclerosis: A systematic review

Erica Tavazzi, Enrico Longato, …, [including Ruben Branco], Barbara Di Camillo

Artificial Intelligence in Medicine, 102588. SCIMAGO Q1 (2023)

Investigating the impact of environmental data on ALS prognosis with survival analysis

Ruben Branco, Diogo F. Soares, Andreia S. Martins, Joana B. Valente, Eduardo N. Castanho, Sara C. Madeira, Helena Aidos

CLEF (2023)

Survival analysis for multiple sclerosis: predicting risk of disease worsening

Ruben Branco, Joana B. Valente, Andreia S. Martins, Diogo F. Soares, Eduardo N. Castanho, Sara C. Madeira, Helena Aidos

CLEF (2023)

Transfer Learning of Lexical Semantic Families for Argumentative Discourse Units Identification

João Rodrigues, Ruben Branco, António Branco

arXiv preprint arXiv:2209.02495 (2022)

Open and inclusive language processing: Language processing services by PORTULAN to meet the widest needs of CLARIN users

Luís Gomes, Ruben Branco, João Silva, António Branco

CLARIN. The infrastructure for language resources. Berlin: De Gruyter (Book chapter) (2022)

Hierarchical modelling for ALS prognosis: predicting the progression towards critical events

Ruben Branco, Diogo F. Soares, Andreia S. Martins, Eleonora Auletta, Eduardo N. Castanho, Susana Nunes, Filipa Serrano, Rita T. Sousa, Cátia Pesquita, Sara C. Madeira, Helena Aidos

CLEF (2022)

Explaining artificial intelligence predictions of disease progression with semantic similarity

Susana Nunes, Rita Torres Sousa, Filipa Serrano, Ruben Branco, Diogo F. Soares, Andreia S. Martins, Eleonora Auletta, Eduardo N. Castanho, Sara C. Madeira, Helena Aidos, Cátia Pesquita

CLEF (2022)

Commonsense Reasoning: how do Neuro-Symbolic and Neuro-only approaches compare?

Ruben Branco, António Branco, João Silva, João Rodrigues

CIKM Workshops (2021)

Reproduction and revival of the argument reasoning comprehension task

João Rodrigues, Ruben Branco, João Silva, António Branco

Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC). CORE B (2020)

Comparative probing of lexical semantics theories for cognitive plausibility and technological usefulness

António Branco, João Rodrigues, Małgorzata Salawa, Ruben Branco, Chakaveh Saedi

arXiv preprint arXiv:2011.07997 (2020)

Learning reference alignments for ontology matching within and across domains

Beatriz Lima, Ruben Branco, João Castanheira, Gustavo Fonseca, Cátia Pesquita

OM@ISWC (2020)

The MWN.PT WordNet for Portuguese: Projection, Validation, Cross-lingual Alignment and Distribution

António Branco, Sara Grilo, Márcia Bolrinha, Chakaveh Saedi, Ruben Branco, João Silva, Andreia Querido, Rita de Carvalho, Rosa Gaudio, Mariana Avelãs, Clara Pinto

Proceedings of the 12th Language Resources and Evaluation Conference. CORE B (2020)

ELRI: A Decentralised Network of National Relay Stations to Collect, Prepare and Share Language Resources

Thierry Etchegoyhen, Borja Anza Porras, Andoni Azpeitia, …, [including Ruben Branco], Luís Gomes

Proceedings of the 1st International Workshop on Language Technology Platforms (2020)

Whom to learn from? Graph- vs. text-based word embeddings

Małgorzata Salawa, António Branco, Ruben Branco, João Rodrigues, Chakaveh Saedi

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) (2019)

Assessing wordnets with wordnet embeddings

Ruben Branco, João Rodrigues, Chakaveh Saedi, António Branco

Proceedings of the 10th Global Wordnet Conference. CORE C (2019)

Setting up the PORTULAN/CLARIN repository

Luís Gomes, Frederico Apolónia, Ruben Branco, João Silva, António Branco

CLARIN Annual Conference 2018 (2018)

Predicting brain activation with WordNet embeddings

João Rodrigues, Ruben Branco, João Silva, Chakaveh Saedi, António Branco

Proceedings of the Eight Workshop on Cognitive Aspects of Computational Language Learning and Processing (2018)

Browsing and Supporting Pluricentric Global Wordnet, or just your Wordnet of Interest

António Branco, Ruben Branco, Chakaveh Saedi, João Silva

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). CORE B (2018)

Contact

You can contact me at: rmbranco [at] ciencias.ulisboa.pt