CASPER
Cross-Modal Alignment for Semantic Processing and Effective Representation
Overview
CASPER is an MSCA research project advancing multimodal AI by aligning heterogeneous modalities, including neural signals, with language-centric models. The project combines graph-structured fusion, scalable training, and rigorous evaluation to deliver more grounded multimodal understanding and practical neural-to-speech reconstruction pipelines.
In short, CASPER develops graph-aware multimodal AI that aligns language, vision, audio/speech, and neural signals (EEG/MEG/fMRI) to enable robust understanding and neural-to-speech reconstruction.
Cross-modal representation learning
Build unified representations across text, image, audio/speech, and video using structured fusion and graph-based learning.
Research goals
- Learn shared semantic spaces across text, image, audio/speech, and video with explicit cross-modal alignment objectives
- Improve grounding and robustness of multimodal generation and reasoning via graph-structured representations
- Enable neural decoding of speech/audio (and related percepts) from EEG/MEG/fMRI with strong generalization and uncertainty awareness
- Release reproducible artifacts (code, models, evaluation protocols) aligned with open science practices
Methodology
- Graph-based multimodal fusion: hierarchical aggregation and concept-level representations using GNNs
- Alignment and supervised fine-tuning: cross-modal contrastive objective design plus task-specific adaptation on curated multimodal datasets
- Neural foundation modeling: unified encoders for EEG/MEG/fMRI with cross-subject/session normalization and efficient decoding heads
- Evaluation & reliability: benchmarking for factual/semantic grounding, robustness across domains, and neural reconstruction fidelity
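The cross-modal alignment objective mentioned above can be sketched in a few lines. The following is a minimal NumPy illustration of a CLIP-style symmetric contrastive (InfoNCE) loss between two batches of paired embeddings, not CASPER's actual objective; the function name and temperature value are illustrative.

```python
import numpy as np

def symmetric_contrastive_loss(a, b, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss between two batches of paired
    embeddings (e.g. image vs. text), each of shape (n, d)."""
    # L2-normalize so that the dot product is cosine similarity
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature      # (n, n) similarity matrix
    labels = np.arange(len(a))          # matching pairs lie on the diagonal

    def xent(l):
        # cross-entropy against the diagonal, with max-shift for stability
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the a->b and b->a directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned pairs drive the loss toward zero, while mismatched pairs keep it high, which is what pulls the modality-specific encoders into a shared semantic space.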
Research
Objectives
Our research is structured around three specific objectives (SO) that drive innovation in multimodal AI, graph-structured fusion, and neural decoding.
These objectives work synergistically to advance multimodal AI systems, from unified representation learning to neural decoding pipelines that bridge brain activity and speech/audio reconstruction.
Multi-modal representation learning
SO1
Develop a human-like multimodal agent that learns unified, graph-structured representations across text, image, audio/speech, and video for robust semantic understanding and generation.
Key activities
- Build concept-level graphs from grid/patch and token features
- Learn aligned multimodal embeddings with contrastive and structured objectives
- Evaluate grounding, compositionality, and robustness across modalities
- Release reusable representation and evaluation components
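To make the first activity concrete, here is a minimal sketch of turning patch/token features into a graph and applying one mean-aggregation step, a simplified stand-in for the GNN layers used in graph-based fusion. Function names and the choice of a cosine kNN graph are illustrative assumptions, not CASPER's implementation.

```python
import numpy as np

def knn_graph(x, k=3):
    """Build a kNN adjacency over patch/token features x of shape (n, d),
    connecting each node to its k most cosine-similar neighbors (self excluded)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = xn @ xn.T
    np.fill_diagonal(sim, -np.inf)          # never pick yourself as a neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # top-k neighbors per node
    adj = np.zeros((len(x), len(x)), dtype=bool)
    rows = np.repeat(np.arange(len(x)), k)
    adj[rows, nbrs.ravel()] = True
    return adj

def aggregate(x, adj):
    """One mean-aggregation step: each node averages itself and its
    neighbors, the simplest form of GNN message passing."""
    a = adj | np.eye(len(x), dtype=bool)    # add self-loops
    return (a @ x) / a.sum(axis=1, keepdims=True)
```

Stacking such aggregation steps (with learned transformations between them) is what lets low-level patch features coalesce into concept-level representations.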
Supervised fine-tuning and optimization
SO2
Refine and optimize the multimodal agent through supervised fine-tuning on curated multimodal datasets, improving reliability, generalization, and controllability.
Key activities
- Curate training mixtures and task formulations across modalities
- Fine-tune with quality/robustness targets and safety-aware evaluation
- Improve efficiency via parameter-efficient adaptation and scalable training
- Establish benchmarking protocols for generalization and failure modes
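The parameter-efficient adaptation mentioned above can be illustrated with a LoRA-style forward pass: the pretrained weight stays frozen while only a low-rank update is trained. This is a generic sketch of the LoRA technique, with illustrative names and shapes, not a description of CASPER's training code.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA-style forward pass: the frozen weight W (d_out, d_in) is
    adapted by a low-rank update B @ A with rank r = A.shape[0],
    scaled by alpha / r. Only A and B receive gradient updates."""
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)   # (d_out, d_in), rank <= r
    return x @ (W + delta).T
```

Because B is conventionally initialized to zero, the adapted model starts out identical to the frozen base model, and only `r * (d_in + d_out)` parameters are trained instead of `d_in * d_out`.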
Neural representation learning and decoding (EEG/MEG/fMRI)
SO3
Develop unified neural encoders and decoding pipelines that map brain activity to aligned multimodal semantics, enabling neural-to-speech/audio reconstruction with strong cross-subject transfer.
Key activities
- Build modality-agnostic encoders for EEG/MEG/fMRI time series
- Align neural embeddings with multimodal semantic spaces (text/audio/image)
- Develop efficient decoders for speech/audio reconstruction and evaluation
- Validate generalization across subjects, sessions, and datasets
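A standard first step toward the cross-subject/session normalization mentioned above is to z-score neural features independently within each subject or session, so that a decoder sees comparable statistics across recordings. The sketch below assumes a simple trial-by-feature layout; the function name and interface are illustrative.

```python
import numpy as np

def normalize_per_subject(X, subject_ids, eps=1e-8):
    """Z-score features independently within each subject (or session),
    a common preprocessing step for cross-subject transfer in
    EEG/MEG/fMRI decoding.

    X:           (n_trials, n_features) feature matrix
    subject_ids: (n_trials,) subject/session label per trial
    """
    Xn = np.empty_like(X, dtype=float)
    for s in np.unique(subject_ids):
        m = subject_ids == s
        mu = X[m].mean(axis=0)
        sd = X[m].std(axis=0)
        Xn[m] = (X[m] - mu) / (sd + eps)   # eps guards constant features
    return Xn
```

After this step, each subject's features have roughly zero mean and unit variance, removing per-recording offset and gain differences before any learned alignment is applied.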
Impact
Timeline
24-month technical roadmap, from literature review to neural decoding validation
Project impacts
CASPER aims to generate significant scientific, societal, and economic impacts through groundbreaking research in cognitive neuroscience and artificial intelligence.
Scientific impact
- Advance understanding of neural mechanisms underlying cognitive flexibility
- Develop novel computational models bridging neuroscience and AI
- Create new methodologies for analyzing brain-behavior relationships
- Contribute to theoretical frameworks in cognitive neuroscience
- Generate high-impact publications in top-tier journals
Societal impact
- Improve diagnostic tools for neurological and psychiatric disorders
- Enhance educational strategies for cognitive skill development
- Inform policy decisions on mental health and cognitive wellness
- Promote public understanding of brain function and plasticity
- Support evidence-based interventions for cognitive enhancement
Economic impact
- Create intellectual property and potential licensing opportunities
- Foster innovation in neurotechnology and AI industries
- Reduce healthcare costs through improved diagnostic methods
- Generate employment opportunities in research and development
- Attract investment in European neuroscience research initiatives
Long-term vision
- Research excellence: Establish new standards for interdisciplinary research combining neuroscience, psychology, and artificial intelligence
- Knowledge translation: Bridge the gap between basic research findings and practical applications in healthcare and education
- Capacity building: Train the next generation of researchers in cutting-edge methodologies and interdisciplinary approaches
- Global collaboration: Foster international partnerships and knowledge exchange in cognitive neuroscience research
Team
The CASPER project brings together world-class researchers from leading academic institutions across Europe. This interdisciplinary collaboration combines expertise in computer science, neuroscience, and cognitive science to advance multimodal AI systems and neural decoding technologies for speech and audio reconstruction.
Researcher and expert supervisors
Ambuj Mehrish
MSCA Researcher (Principal Investigator)
Ca' Foscari University of Venice
ambuj.mehrish@unive.it
Expertise: Multimodal AI, Graph Neural Networks, Neural Decoding
Sebastiano Vascon
Principal Supervisor
Ca' Foscari University of Venice
sebastiano.vascon@unive.it
Expertise: Computer Vision, Graph Neural Networks, Pattern Recognition
Valentina Borghesani
Co-Supervisor
NoCE Labs, University of Geneva
valentina.borghesani@unige.ch
Expertise: Cognitive Neuroscience, Semantic Processing, fMRI Analysis
Dimitri Van De Ville
Co-Supervisor
Neuro-X Institute, EPFL
dimitri.vandeville@epfl.ch
Expertise: Neuroimaging, Signal Processing, Brain Connectivity
Partner institutions
- Ca' Foscari University of Venice: internationally renowned for pioneering work in graph theory, game theory, and computer vision applications.
- Neuro-X Institute, EPFL: dedicated to advanced neuroimaging and signal processing for brain functional connectivity research.
- NoCE Labs, University of Geneva: leading pioneering research at the nexus of neurobiology, semantics, and language processing.