A Multi-Task LLM Framework for Multimodal Speech-Based Mental Health Prediction

Abstract

Mental health disorders are often comorbid, highlighting the need for predictive models that can address multiple outcomes simultaneously. Multi-task learning (MTL) provides a principled approach to jointly model related conditions, enabling shared representations that improve robustness and reduce reliance on large disorder-specific datasets. In this work, we present a tri-modal speech-based framework that integrates text transcriptions, acoustic landmarks, and vocal biomarkers within a large language model (LLM)-driven architecture. Beyond static assessments, we introduce a longitudinal modeling strategy that captures temporal dynamics across repeated clinical interactions, offering deeper insights into symptom progression and relapse risk. Our MTL design simultaneously predicts depression relapse, suicidal ideation, and sleep disturbances, reflecting the comorbid nature of adolescent mental health. Evaluated on the Depression Early Warning (DEW) dataset, the proposed longitudinal trimodal MTL model achieves a balanced accuracy of 70.8%, outperforming unimodal, single-task, and non-longitudinal baselines. These results demonstrate the promise of combining MTL with longitudinal monitoring for scalable, noninvasive prediction of adolescent mental health outcomes.
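
Illustrative sketch (not the authors' implementation): the abstract describes a shared tri-modal representation feeding task-specific heads for depression relapse, suicidal ideation, and sleep disturbance. The Python sketch below shows one common way such a multi-task head could be wired, assuming pre-extracted embeddings for the text, acoustic-landmark, and vocal-biomarker streams; all class names, dimensions, fusion choices, and task labels are hypothetical.

# Illustrative sketch only: a generic multi-task head over fused multimodal
# embeddings. Dimensions, fusion strategy, and task names are assumptions,
# not the architecture reported in the paper.
import torch
import torch.nn as nn

class TriModalMTLHead(nn.Module):
    def __init__(self, text_dim=768, landmark_dim=128, biomarker_dim=64, hidden_dim=256):
        super().__init__()
        # Project each modality into a common space, then fuse by concatenation.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.landmark_proj = nn.Linear(landmark_dim, hidden_dim)
        self.biomarker_proj = nn.Linear(biomarker_dim, hidden_dim)
        self.shared = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # One binary head per task: relapse, suicidal ideation, sleep disturbance.
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, 1)
            for task in ("relapse", "ideation", "sleep")
        })

    def forward(self, text_emb, landmark_emb, biomarker_emb):
        fused = torch.cat([
            self.text_proj(text_emb),
            self.landmark_proj(landmark_emb),
            self.biomarker_proj(biomarker_emb),
        ], dim=-1)
        shared = self.shared(fused)
        # Return per-task logits; a joint loss would sum a BCE term per task.
        return {task: head(shared).squeeze(-1) for task, head in self.heads.items()}

if __name__ == "__main__":
    model = TriModalMTLHead()
    logits = model(torch.randn(2, 768), torch.randn(2, 128), torch.randn(2, 64))
    print({k: v.shape for k, v in logits.items()})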

Authors

Ali M; Lucasius C; Patel TP; Aitken M; Vorstman J; Szatmari P; Battaglia M; Kundur D

Volume

00

Pagination

pp. 1-4

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

November 5, 2025

DOI

10.1109/bsn66969.2025.11337730

Name of conference

2025 IEEE 21st International Conference on Body Sensor Networks (BSN)
