Beyond Structured Data - Deep Learning Approaches for Multimodal Risk Assessment and Bias Detection

Abstract

This thesis investigates the integration of multimodal deep learning techniques for enhanced decision-making in financial and public sector domains, with a focus on fairness, transparency, and performance. Although traditional models rely predominantly on structured data, this research explores the synergistic potential of combining structured and unstructured sources, such as numerical records, text, and images, through advanced data fusion strategies. The first line of inquiry focuses on corporate credit rating prediction by evaluating various fusion levels and techniques using convolutional neural networks, recurrent neural networks, and transformer-based language models. The results show that hybrid fusion strategies outperform both simpler and more complex alternatives, and that textual data plays a more influential role than its numerical counterparts. The second strand addresses the detection of bias in multilingual customer service feedback within public tax administrations. A novel framework is proposed, integrating quantized large language models with human-in-the-loop validation to enhance bias detection and ensure equitable service across demographic groups. This approach demonstrates greater alignment with expert evaluations and adaptability to specific organizational contexts. The final study focuses on mortgage default prediction using multimodal inputs, such as news articles and spatial imagery. To fuse these heterogeneous inputs, we introduce a novel architecture, CapsFusion, which not only captures modality-specific features but also incorporates trainable weights that dynamically adjust the contribution of each modality. Together, these contributions demonstrate the viability and necessity of multimodal, interpretable AI systems for responsible decision-making in high-stakes environments. The findings underscore the importance of fusing diverse data types, embedding fairness principles, and improving accessibility for greater stakeholder participation.
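The abstract describes CapsFusion as combining modality-specific features through trainable weights that adjust each modality's contribution. The following is a minimal illustrative sketch of that general idea, not the thesis's actual CapsFusion implementation: the class name, projection scheme, and dimensions are hypothetical, and the "trainable" logits are shown as plain arrays rather than parameters updated by an optimizer.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

class WeightedFusion:
    """Hypothetical sketch of learnable-weight modality fusion.

    Each modality gets one logit; a softmax over the logits yields
    the contribution weight of that modality in the fused embedding.
    In a real deep-learning model both `logits` and `proj` would be
    trainable parameters.
    """

    def __init__(self, n_modalities, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.logits = np.zeros(n_modalities)           # per-modality weight logits
        self.proj = rng.normal(size=(n_modalities, dim, dim)) * 0.1  # per-modality projections

    def forward(self, features):
        """features: list of per-modality embeddings, each of shape (dim,)."""
        w = softmax(self.logits)                        # contribution weights, sum to 1
        projected = [self.proj[i] @ f for i, f in enumerate(features)]
        return sum(w[i] * projected[i] for i in range(len(features)))

# Usage: fuse hypothetical text, numeric, and image embeddings of size 8.
fusion = WeightedFusion(n_modalities=3, dim=8)
text_emb, num_emb, img_emb = np.ones(8), np.zeros(8), np.full(8, 2.0)
fused = fusion.forward([text_emb, num_emb, img_emb])
```

With all logits initialized to zero, the modalities start with equal weight; training would then shift the logits so that, for example, the textual modality (found more influential in the credit-rating study) receives a larger share of the fused representation.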

Summary for Lay Audience

Today, banks, governments, and regulators use a lot of data to decide who gets a loan, how to detect risk, or whether people are being served fairly. But this data is not all the same. Some of it is numbers (like income or debt), some of it is text (like customer feedback or company reports), and some of it can even be images (like pictures of a property). Most traditional models only use the numeric part and ignore the rest. This thesis shows that we can make better, fairer, and more transparent decisions if we let a model “listen” to all of these data sources at the same time. The thesis has three parts. The first part shows that credit ratings for companies become more accurate when we combine financial numbers with what companies actually write in their reports. The second part works with a tax agency and shows how comments from taxpayers in English and French can be analyzed with AI in a way that checks for fairness across groups (for example, by gender or language) and keeps a human in the loop. The third part shows that even for mortgage risk, we can get useful results using only publicly available information such as property descriptions, images, and online signals. This is helpful for smaller institutions that don’t have access to expensive private data. Overall, the thesis argues that financial decisions should not only be accurate but also explainable, fair, and practical for real organizations. By combining different types of data in smart ways, we can build AI systems that are closer to how people actually reason: using numbers, words, and context together.

Keywords

Fusion Strategies, Deep Learning, Multi-modality, Large Language Models, Trend Detection, Credit Rating
