Defended my M.Sc.! Thesis available here
Abstract:
Online misinformation is one of the most challenging modern issues, yielding severe consequences, including political polarization, attacks on democracy, and public health risks. Misinformation manifests on any platform with a large user base, including online social networks and messaging apps. It permeates all media and content forms, including images, text, audio, and video. Distinctly, video-based misinformation represents a multifaceted challenge for fact-checkers, given the ease with which individuals can record and upload videos on various video-sharing platforms. Previous research efforts have investigated detecting video-based misinformation, focusing on whether a video as a whole shares misinformation. While this approach is useful, it provides only a limited and hard-to-interpret view of the problem, since it offers no context about when misinformation occurs within a video or which content (i.e., claims) is responsible for the video’s misinformative nature.
In this work, we attempt to bridge this research gap by proposing a novel approach for misinformation detection in videos, focusing on identifying the spans of a video that are responsible for its misinformative claims, a task we frame as misinformation span detection. We present two new datasets for this task, both containing false claims and the video moments in which they appear. We transcribe each video’s audio to text and identify the video segments in which the misinformative claims appear, resulting in two datasets of more than 600 videos with more than 2,300 segments containing annotated fact-checked claims. We then employ classifiers built with state-of-the-art language models, and our results show that we can identify in which part of a video misinformation occurs with an F1 score of 0.68. Additionally, we point to new directions for misinformation span detection using in-context learning. We hope our work can assist fact-checkers and support the development of automated misinformation detection and robust moderation tools that align with the evolving needs of digital platforms.
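For readers curious what such a pipeline looks like in practice, here is a minimal, illustrative sketch (not the thesis code): transcribe a video's audio into timestamped segments, then score each segment with a text classifier. The model name and the "MISINFORMATION" label are placeholders, and the example assumes the openai-whisper and transformers packages are installed.

```python
# Illustrative sketch of a misinformation span detection pipeline:
# transcribe the audio track, then classify each transcript segment.
import whisper
from transformers import pipeline

def find_misinformation_spans(video_path: str,
                              classifier_name: str = "your-org/misinfo-segment-classifier"):
    # 1. Transcribe the audio; Whisper returns segments with start/end timestamps.
    asr = whisper.load_model("base")
    result = asr.transcribe(video_path)

    # 2. Score each segment with a (hypothetical) fine-tuned classifier.
    clf = pipeline("text-classification", model=classifier_name)
    spans = []
    for seg in result["segments"]:
        pred = clf(seg["text"])[0]
        if pred["label"] == "MISINFORMATION":  # placeholder label name
            spans.append((seg["start"], seg["end"], seg["text"]))
    return spans
```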