Engagement between client and therapist is a critical determinant of
therapeutic success. We propose a multi-dimensional natural language processing
(NLP) framework that objectively classifies engagement quality in counseling
sessions based on textual transcripts. Using 253 motivational interviewing
transcripts (150 high-quality, 103 low-quality), we extracted 42 features
across four domains: conversational dynamics, semantic similarity (as a
measure of topic alignment), sentiment classification, and question detection. Classifiers,
including Random Forest (RF), CatBoost, and Support Vector Machines (SVM),
were hyperparameter-tuned, trained with stratified 5-fold cross-validation,
and evaluated on a held-out test set. On balanced
(non-augmented) data, RF achieved the highest classification accuracy (76.7%),
and SVM achieved the highest AUC (85.4%). After SMOTE-Tomek augmentation,
performance improved substantially: RF achieved up to 88.9% accuracy, 90.0%
F1-score, and 94.6% AUC, while SVM reached 81.1% accuracy, 83.1% F1-score, and
AUC. The augmented-data results indicate the framework's potential for
larger-scale applications. Feature-contribution analysis revealed that conversational
dynamics and semantic similarity between clients and therapists were among the
top contributors, led by the number of words uttered by the client (mean and standard
deviation). The framework was robust across the original and augmented datasets
and demonstrated consistent improvements in F1 scores and recall. While
currently text-based, the framework supports future multimodal extensions
(e.g., vocal tone, facial affect) for more holistic assessments. This work
introduces a scalable, data-driven method for evaluating engagement quality in
therapy sessions, offering clinicians real-time feedback to enhance the
quality of both virtual and in-person therapeutic interactions.
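
To make the evaluation protocol concrete, the sketch below shows one plausible
implementation of the pipeline summarized above, assuming scikit-learn and
imbalanced-learn. The placeholder feature matrix, variable names, and
hyperparameters are illustrative assumptions only; extraction of the 42 NLP
features from transcripts is not shown, and the actual study configuration may
differ.

```python
# Minimal sketch of the classification/evaluation pipeline (assumed libraries:
# scikit-learn, imbalanced-learn). The 42 engineered NLP features are
# represented by a random placeholder matrix X.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from imblearn.combine import SMOTETomek

# Placeholder data: 253 sessions x 42 features, binary labels
# (1 = high-quality engagement, 0 = low-quality engagement).
rng = np.random.default_rng(0)
X = rng.normal(size=(253, 42))
y = np.array([1] * 150 + [0] * 103)

# Stratified holdout split preserves the class ratio in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# SMOTE-Tomek augmentation applied to the training data only.
X_res, y_res = SMOTETomek(random_state=0).fit_resample(X_train, y_train)

models = {
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    "SVM": SVC(kernel="rbf", probability=True, random_state=0),
}

# Stratified 5-fold cross-validation for model selection, then holdout metrics.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    cv_acc = cross_val_score(model, X_res, y_res, cv=cv, scoring="accuracy")
    model.fit(X_res, y_res)
    preds = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    print(f"{name}: CV acc {cv_acc.mean():.3f}, "
          f"test acc {accuracy_score(y_test, preds):.3f}, "
          f"F1 {f1_score(y_test, preds):.3f}, "
          f"AUC {roc_auc_score(y_test, proba):.3f}")
```

Restricting the SMOTE-Tomek resampling to the training split is a design choice
made here to avoid leaking synthetic samples into the evaluation set; reported
numbers from the placeholder data will not match the results above.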