Aditi Bhalla, Christian Hellert, Enkelejda Kasneci, Nastassja Becker
Continental Automotive Technologies GmbH, Technical University of Munich
Introduction
TRUCE-AV is a state-of-the-art multimodal benchmark dataset for trust and comfort estimation in autonomous vehicles. Understanding and estimating driver trust and comfort are essential for the safety and widespread acceptance of autonomous vehicles. Existing works analyze user trust and comfort separately, with limited real-time assessment and insufficient multimodal data.
Our dataset has the following key features:
- Real-time, event-specific trust votes and continuous comfort ratings from 31 participants during simulator-based, fully autonomous driving.
- Two driving sessions with diverse scenarios that reflect real-world driving events.
- Concurrent recordings of physiological signals, such as heart rate, gaze, and emotions, along with environmental data, such as vehicle speed, the positions and velocities of nearby vehicles, and weather conditions.
Our dataset enables the development of adaptive AV systems capable of dynamically responding to user trust and comfort levels non-invasively, ultimately enhancing safety, user experience, and human-centered vehicle design.
Dataset
The data collected from multiple sensors is synchronised and combined into one file per participant per drive. Various events reflecting real-world driving scenarios were administered throughout both drives to trigger physiological reactions.
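The page does not say how the sensor streams were synchronised. As a minimal sketch of one common approach, assuming nearest-timestamp matching against a reference clock, the snippet below aligns an irregularly sampled signal to a regularly sampled one. The function name `align_nearest`, the `max_gap` tolerance, and the sample values are hypothetical and not taken from the dataset.

```python
from bisect import bisect_left


def align_nearest(reference_times, sensor_samples, max_gap):
    """For each reference timestamp, pick the sensor value closest in time.

    sensor_samples is a list of (timestamp, value) pairs sorted by timestamp.
    The result holds None wherever no sample lies within max_gap seconds.
    """
    times = [t for t, _ in sensor_samples]
    aligned = []
    for ref in reference_times:
        i = bisect_left(times, ref)
        # Candidates: the sample just before and the sample at or after ref.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - ref), default=None)
        if best is None or abs(times[best] - ref) > max_gap:
            aligned.append(None)
        else:
            aligned.append(sensor_samples[best][1])
    return aligned


# Hypothetical streams: vehicle speed logged at 1 Hz serves as the reference
# clock, while heart rate arrives at irregular times.
speed_times = [0.0, 1.0, 2.0, 3.0]
heart_rate = [(0.1, 72), (0.9, 74), (2.4, 75)]
aligned_hr = align_nearest(speed_times, heart_rate, max_gap=0.5)
print(aligned_hr)
```

Rows that come back as `None` mark gaps in a stream; whether to drop or interpolate them depends on the downstream model.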
If you have any questions regarding the dataset, please contact: Aditi Bhalla
Drive-01
Consisted of 8 driving events (Event-1 to Event-8) in an 11-12 minute drive.
Drive-02
Consisted of 7 driving events (Event-1 to Event-7) in an 11-12 minute drive.
Results
To demonstrate the utility of our dataset, we evaluated various machine learning models for trust and comfort estimation using physiological data. Our analysis showed that tree-based models like Random Forest and XGBoost and non-linear models such as KNN and MLP regressor achieved the best performance for trust classification and comfort regression.
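The tables below report mean scores, which is consistent with averaging over several evaluation folds, although the page does not describe the protocol. The following sketch shows one plausible setup, assuming shuffled k-fold cross-validation with a small k-nearest-neighbours classifier written from scratch. The two features (a heart-rate-like value and a gaze-like value), the two trust labels, and all numbers are synthetic stand-ins, not dataset values.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs, ys, n_folds=5, k=3, seed=0):
    """Mean accuracy over n_folds shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


# Synthetic data: two well-separated clusters standing in for two trust levels.
rng = random.Random(42)
xs, ys = [], []
for label, centre in enumerate([(60.0, 0.2), (90.0, 0.8)]):
    for _ in range(25):
        xs.append((centre[0] + rng.gauss(0, 3), centre[1] + rng.gauss(0, 0.1)))
        ys.append(label)

mean_acc = cross_val_accuracy(xs, ys)
print(f"mean accuracy over folds: {mean_acc:.2%}")
```

This split shuffles individual samples. With data from 31 participants, grouping folds by participant may give a more conservative estimate, since samples from one person tend to be correlated.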
Trust Classification
| Category | Model | Accuracy (Mean) | F1-score (Mean) | Precision (Mean) | Recall (Mean) |
|---|---|---|---|---|---|
| Linear models | Logistic Regression | 26.06% | 10.24% | 26.82% | 12.67% |
| | LinearSVC | 25.98% | 9.29% | 15.98% | 12.34% |
| | Ridge classifier | 25.98% | 9.27% | 15.66% | 12.34% |
| | SGD classifier | 20.58% | 9.50% | 12.22% | 11.58% |
| Tree-based models | Random Forest | 94.42% | 93.73% | 96.18% | 91.61% |
| | HistGradient Boosting | 76.92% | 76.34% | 79.76% | 73.63% |
| | XGBoost | 82.64% | 83.49% | 86.72% | 80.83% |
| | LightGBM | 76.12% | 76.46% | 79.63% | 73.91% |
| Nonlinear/other models | KNN | 78.83% | 74.27% | 72.11% | 77.49% |
| | MLP classifier | 51.68% | 45.56% | 46.48% | 45.42% |
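For readers who want to reproduce the four columns, here is a small sketch of how accuracy, precision, recall, and F1-score can be computed for a multi-class problem. It assumes macro averaging (an unweighted mean over classes), which the page does not confirm. The trust labels `low`, `mid`, and `high` are hypothetical placeholders.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical trust votes and predictions.
y_true = ["low", "low", "high", "high", "mid", "mid"]
y_pred = ["low", "high", "high", "high", "mid", "low"]
trust_scores = macro_scores(y_true, y_pred)
print({name: round(value, 4) for name, value in trust_scores.items()})
```

Under macro averaging every class counts equally, so accuracy and recall can diverge noticeably when some classes are rare or never predicted.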
Comfort Regression
| Category | Model | R² (Mean) | MAE (Mean) | RMSE (Mean) |
|---|---|---|---|---|
| Linear models | Linear regression | 0.0106 | 0.0451 | 0.1072 |
| | Ridge regression | 0.0107 | 0.0451 | 0.1072 |
| Tree-based models | Random Forest | 0.1633 | 0.0404 | 0.0985 |
| | Gradient Boosting | 0.0781 | 0.0429 | 0.1034 |
| | XGBoost | 0.1480 | 0.0414 | 0.0994 |
| Nonlinear models | KNN | 0.1452 | 0.0716 | 0.0996 |
| | MLP Regressor | 0.1714 | 0.0536 | 0.0981 |
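The three regression columns follow standard definitions, sketched below for reference. R² compares the model's squared error with that of always predicting the mean rating, so a value near 0 means the model barely improves on that baseline. The comfort ratings in the example are hypothetical and assume a 0 to 1 scale, which the page does not state.

```python
import math


def regression_scores(y_true, y_pred):
    """R², mean absolute error, and root mean squared error."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # mean-baseline error
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical continuous comfort ratings and model predictions.
comfort_true = [0.2, 0.4, 0.6, 0.8]
comfort_pred = [0.3, 0.4, 0.5, 0.9]
comfort_scores = regression_scores(comfort_true, comfort_pred)
print({name: round(value, 4) for name, value in comfort_scores.items()})
```

RMSE penalises large errors more heavily than MAE, so a model can rank differently on the two metrics, as some rows in the table do.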
BibTeX
We welcome submissions. If you use the dataset, please cite the following publication.
@misc{bhalla2025truceavmultimodaldatasettrust,
title={TRUCE-AV: A Multimodal dataset for Trust and Comfort Estimation in Autonomous Vehicles},
author={Aditi Bhalla and Christian Hellert and Enkelejda Kasneci and Nastassja Becker},
year={2025},
eprint={2508.17880},
archivePrefix={arXiv},
primaryClass={cs.HC},
url={https://arxiv.org/abs/2508.17880},
}