Predicting Pain Levels from Multivariate Time-Series
This project was developed as part of the Artificial Neural Networks and Deep Learning (AN2DL) course at Politecnico di Milano. I recently participated in the course's First Challenge with my team, and I am thrilled to share that we achieved second place in the competition. Below is a breakdown of the problem we tackled and the deep learning methodology that carried us to the podium.
AN2DL Challenges Awards ceremony — Politecnico di Milano
The Problem: Pain Level Classification
The goal of this challenge was to predict an individual's pain level from temporal sequences that include joint readings, pain survey estimates, and subject metadata.
We worked with a highly imbalanced dataset comprising 105,760 timesteps grouped into 661 distinct sequences, each spanning 160 timesteps. The sequences were labeled into three categories: no pain, low pain, or high pain.
Our Approach
Our approach was structured around three main stages: comprehensive data exploration, training various model families, and automated hyperparameter optimization using Optuna [1].
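In the project itself we used Optuna's samplers for this search; as a self-contained stand-in, here is the same objective-function pattern driven by plain random search. The search ranges are illustrative, and the `objective` body is a synthetic placeholder for "train a model and return its validation F1":

```python
import math
import random

def objective(params):
    # Placeholder for "train a model, return validation F1".
    # This synthetic score peaks around lr=1e-3 and 128 hidden units.
    return -abs(math.log10(params["lr"]) + 3) - abs(params["units"] - 128) / 256

def random_search(n_trials=50, seed=0):
    """Sample hyperparameters, keep the best-scoring configuration."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-4, -2),         # log-uniform learning rate
            "units": rng.choice([32, 64, 128, 256]),  # hidden-layer width
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Optuna replaces the random sampling loop with smarter strategies (e.g. TPE) and pruning of unpromising trials, but the objective-function contract is the same.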
1. Feature Engineering & Signal Processing
We quickly realized that feeding raw temporal data into the network wouldn't be enough. Our preprocessing and exploration phase included:
- Removing uninformative or highly collinear features (e.g., zero-variance joints and constant metadata).
- Moving to the frequency domain to analyze the joint spectrum and power spectral density (PSD) [5].
- Standardizing all continuous measurements and handling class imbalance using a weighted categorical cross-entropy loss and SMOTE oversampling [3].
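The tabular parts of this pipeline can be sketched in a few lines of NumPy. This is a simplified illustration, not our exact code: feature names and thresholds are assumptions, and in practice SMOTE [3] and Welch's PSD [5] came from `imblearn` and `scipy.signal` respectively:

```python
import numpy as np

def preprocess(X, y):
    """Sketch: drop zero-variance features, standardize, derive class weights.
    X: (n_samples, n_features) float array; y: integer class labels."""
    # Remove uninformative (constant) feature columns.
    keep = X.std(axis=0) > 0
    X = X[:, keep]
    # Standardize each remaining feature to zero mean, unit variance.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Inverse-frequency class weights for the weighted cross-entropy loss.
    counts = np.bincount(y)
    weights = counts.sum() / (len(counts) * counts)
    return X, weights
```

The inverse-frequency weights make the loss on a rare class (e.g. high pain) count proportionally more, which complements the SMOTE oversampling on the input side.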
2. Architectural Exploration
We designed and evaluated several model families to find the best fit for our time-series data:
- MLPs: Treating each timestep as an independent sample and using majority voting for the sequence.
- CNNs: Using 1D convolutions to capture short-term temporal patterns across contiguous windowed timesteps.
- LSTMs: Employing Bidirectional LSTMs and experimenting with self-attention and PSD-based attention mechanisms [4] to weight window contributions, as well as using convolutions to extract local features and reduce dimensionality [2].
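To make the CNN idea above concrete, here is a minimal NumPy sketch of a single 1D convolutional layer sliding over a multivariate sequence. The channel count, filter count, and kernel width are illustrative, not our tuned values:

```python
import numpy as np

def conv1d(x, kernels, bias):
    """x: (timesteps, channels); kernels: (n_filters, width, channels).
    Returns a (timesteps - width + 1, n_filters) feature map (valid padding)."""
    n_filters, width, _ = kernels.shape
    t_out = x.shape[0] - width + 1
    out = np.empty((t_out, n_filters))
    for t in range(t_out):
        window = x[t:t + width]  # short-term temporal window
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU activation

# One 160-timestep sequence with 8 joint channels, 16 filters of width 5.
rng = np.random.default_rng(0)
x = rng.standard_normal((160, 8))
k = rng.standard_normal((16, 5, 8)) * 0.1
features = conv1d(x, k, bias=np.zeros(16))  # shape (156, 16)
```

Each output timestep only sees a 5-step window, which is exactly why 1D convolutions excel at short-term patterns while LSTMs are needed for longer-range dependencies.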
Results & Key Takeaways
Surprisingly, simpler models proved the most effective for this specific task.
While we initially introduced the Multilayer Perceptron (MLP) as a baseline, a 4-layer MLP (with 128 hidden units) ended up delivering the best overall performance, achieving a top leaderboard F1 score of 0.9620. Convolutional Neural Networks also performed exceptionally well, scoring 0.9520.
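Since the MLP classifies each timestep independently, a sequence-level label has to be recovered afterwards by majority vote. A minimal sketch of that aggregation step (the prediction array here is illustrative; only the shapes match the challenge data):

```python
import numpy as np

def majority_vote(timestep_preds, n_classes=3):
    """timestep_preds: (n_sequences, n_timesteps) integer labels per timestep.
    Returns one label per sequence: its most frequent per-timestep class."""
    counts = np.stack(
        [(timestep_preds == c).sum(axis=1) for c in range(n_classes)], axis=1
    )
    return counts.argmax(axis=1)

# E.g. 661 sequences x 160 timesteps of per-timestep MLP predictions.
preds = np.zeros((661, 160), dtype=int)
preds[0, :100] = 2  # first sequence: mostly "high pain" timesteps
labels = majority_vote(preds)
```

The vote smooths out isolated per-timestep misclassifications, which is part of why a timestep-level model can still be competitive at the sequence level.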
More complex recurrent models, including LSTMs augmented with convolutions or PSD-based attention, failed to surpass a plain bidirectional LSTM and remained clearly behind the MLP.
We attribute this trend to the limited dataset size (only 661 sequences). This constraint heavily favored compact models and made highly parameterized recurrent architectures much more susceptible to overfitting.
This project was a fantastic exercise in avoiding the trap of "complex is always better." Careful feature selection, regularization, and choosing the right model scale for the dataset at hand ultimately secured our spot on the podium.
References
[1] Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A Next-generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–2631.
[2] Karim, F., Majumdar, S., Darabi, H., & Chen, S. (2018). LSTM Fully Convolutional Networks for Time Series Classification. IEEE Access, 6, 1662–1669.
[3] Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357.
[4] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
[5] Welch, P. D. (1967). The Use of Fast Fourier Transform for the Estimation of Power Spectra. IEEE Transactions on Audio and Electroacoustics, 15(2), 70–73.