Text2Story

Extracting journalistic narratives from text and representing them in a narrative modelling language

Artificial Intelligence

Funding

FCT, 250k€

Duration

2019—2023

Description

Nowadays, journalistic content is distributed in multiple formats, mostly through the web and dedicated applications running on smartphones and tablets. Readers (or, more accurately, users or information consumers) rely heavily on images, videos, slideshows, charts and infographics, yet text is still the main representation for information. Any journalistic subject (e.g. Trump and Russia) is described in one or more texts produced by journalists and possibly commented on by readers. Many of those subjects are followed for days, weeks or months. To grasp a possibly vast and somewhat complex set of interconnected news articles, readers would greatly benefit from tools that summarize those articles by showing the main actors, their interplay and their trajectories in time and space, their motivations, the main events, the causal relations between events, and the outcomes. In the Text2Story project we use Artificial Intelligence, Computer Science and Linguistics to automatically extract those narrative elements using a well-defined semantic framework and re-represent them in formats that convey the essential story but are more efficiently consumed by users. The project was led by INESC TEC in collaboration with researchers from the Center of Linguistics of U. Porto, the Lusa News Agency and Jornal Público.

This research line poses many challenging problems in information extraction and automatic production of media content. In this project we wanted to obtain tools able to extract narratives/stories from news articles, or collections of news articles, about the same or related subjects (unstructured data). Those narratives were then represented in intermediate data structures (structured data) that could be used in subsequent media production processes (semi-automatic generation of slide shows, infographics and other visualisations, video sequences, games, etc.). In summary, our aim in the Text2Story project was to develop a conceptual framework and an operational pipeline for the extraction of narratives from textual sources. The project focused on the automatic processing of journalistic text in written Portuguese.

Scientific Advances

- Definition of a Semantic Annotation Framework for Narratives focusing on the Portuguese language. The framework is based on ISO annotation standards.
- Collection and dense annotation of a corpus of news articles in Portuguese. Annotation is coordinated and performed by linguists.
- Development of algorithmic methods for automatic annotation of text according to the defined framework.
- Definition and selection of formal representation languages for narratives, namely Discourse Representation Structures (DRS).
- Definition of visualisations for narratives.
- Development of tools for producing DRS from raw and annotated text.
- Development of tools for producing visualisations from DRS.
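
As a rough illustration of the kind of intermediate structure the list above refers to, the following Python sketch models a DRS as a set of discourse referents plus conditions over them. The class, method and predicate names (`DRS`, `Condition`, `new_referent`, `agent`, `location`) and the example sentence are hypothetical and are not taken from the Text2Story tools; this is a simplified sketch under those assumptions, not the project's actual representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Condition:
    """A predicate applied to one or more discourse referents."""
    predicate: str
    args: tuple

    def __str__(self) -> str:
        return f"{self.predicate}({', '.join(self.args)})"


@dataclass
class DRS:
    """Simplified Discourse Representation Structure (illustrative only)."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

    def new_referent(self, prefix: str) -> str:
        """Create a fresh referent name such as x1 (entity) or e1 (event)."""
        count = sum(1 for r in self.referents if r.startswith(prefix))
        name = f"{prefix}{count + 1}"
        self.referents.append(name)
        return name

    def add(self, predicate: str, *args: str) -> None:
        """Add a condition; every argument must be a declared referent."""
        unknown = [a for a in args if a not in self.referents]
        if unknown:
            raise ValueError(f"undeclared referents: {unknown}")
        self.conditions.append(Condition(predicate, tuple(args)))

    def render(self) -> str:
        """Linear text form: referents on the first line, then conditions."""
        header = "[" + ", ".join(self.referents) + "]"
        body = "\n".join(str(c) for c in self.conditions)
        return f"{header}\n{body}"


# Hypothetical encoding of the sentence "The minister visited Porto."
drs = DRS()
x1 = drs.new_referent("x")   # the minister
x2 = drs.new_referent("x")   # Porto
e1 = drs.new_referent("e")   # the visiting event
drs.add("minister", x1)
drs.add("Porto", x2)
drs.add("visit", e1)
drs.add("agent", e1, x1)
drs.add("location", e1, x2)
print(drs.render())
```

A structure of this shape keeps actors, events and their relations separate from the source text, which is what would allow a later step to turn them into a timeline or another visualisation.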
