Methodology

Pipeline

This diagram outlines our pipeline for analyzing political tone and media coverage in Spanish TV news programs. It covers data collection, audio extraction, transcription, and topic modeling, followed by tone classification using large language models (LLMs).

Data Collection and Processing Steps

- Data Collection: We collect news broadcasts from TV channels such as TVE, Antena 3, laSexta, and Telecinco and store them as MP4 video files.
- Audio Extraction: Using FFmpeg, the audio is extracted into WAV format.
- Speech-to-Text: The audio is transcribed to text using Google's speech-to-text engine.
- Topic Modeling: Using BERTopic, we identify major topics from the transcribed speech.
- Tone Classification: Using the latest version of ChatGPT4, we classify political tone (-1 for negative, 0 for neutral, and 1 for positive) for each political party in Spain.
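
The audio extraction step can be illustrated with a minimal Python sketch that wraps the FFmpeg command line. The function names, the 16 kHz sample rate, and the mono output are assumptions made for illustration; the source only states that FFmpeg converts the MP4 files to WAV.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Return the FFmpeg argument list that extracts PCM audio from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input MP4 file
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM, the usual WAV encoding
        "-ar", str(sample_rate),  # sample rate in Hz (assumed value)
        "-ac", "1",               # mono output (assumed)
        str(wav_path),
    ]


def extract_audio(video_path, out_dir):
    """Run FFmpeg on one video and return the path of the WAV file written."""
    wav_path = Path(out_dir) / (Path(video_path).stem + ".wav")
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)
    return wav_path
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to inspect or log the exact command without FFmpeg being installed.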

Entropy Calculation

Entropy has been used in previous work as a measure of news disagreement across outlets. First, we keep only the dates on which more than 50% of the outlets are present, to avoid missing values. Then, for each outlet, we compute the share of its minutes devoted to each story and calculate entropy from these shares:

\[ entropy_{d} = -\sum_{s=1}^{S}\sum_{c} \frac{time_{scd}}{total\_time_{cd}} \times \log\left(\frac{time_{scd}}{total\_time_{cd}}\right) \]

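Here \(time_{scd}\) is the number of minutes that outlet \(c\) devotes to story \(s\) on day \(d\), and \(total\_time_{cd}\) is the total number of minutes of outlet \(c\) on that day. Days with lower entropy indicate higher agreement across outlets on the relevance of the day's topics.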
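
As an illustration, the filtering step and the entropy formula can be sketched in plain Python. The input layout (tuples of day, outlet, story, minutes), the function name, and the use of the natural logarithm are assumptions; the source does not specify the log base. Stories with zero minutes are skipped, following the usual convention that 0 log 0 = 0.

```python
import math
from collections import defaultdict


def daily_entropy(records, all_outlets, min_share=0.5):
    """Compute entropy per day from (day, outlet, story, minutes) tuples.

    Only days on which more than `min_share` of `all_outlets` have
    coverage are kept. Returns a dict {day: entropy}.
    """
    # minutes[day][outlet][story] accumulates time_scd
    minutes = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for day, outlet, story, mins in records:
        if mins > 0:
            minutes[day][outlet][story] += mins

    result = {}
    for day, by_outlet in minutes.items():
        # Keep the day only if more than min_share of outlets are present.
        if len(by_outlet) / len(all_outlets) <= min_share:
            continue
        entropy = 0.0
        for stories in by_outlet.values():
            total = sum(stories.values())  # total_time_cd
            for t in stories.values():
                share = t / total
                entropy -= share * math.log(share)
        result[day] = entropy
    return result
```

For example, on a day when one outlet splits its time equally between two stories and a second outlet covers a single story, the first outlet contributes log 2 and the second contributes 0.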

Political Tone

Monthly Average Tone

The monthly average tone is the mean tone, with standard errors, for each channel-party combination. Tone is obtained by using LLMs to classify news content into three categories: positive, negative, and neutral. The monthly value is the mean of the daily tone for each channel-party combination.
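
The aggregation can be sketched as follows. The input layout (tuples of ISO date string, channel, party, daily tone), the function name, and the placeholder party label in the test data are hypothetical; the standard error is taken here as the sample standard deviation divided by the square root of the number of days, which is an assumption about how the standard errors are computed.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def monthly_tone(daily_tones):
    """Aggregate (date 'YYYY-MM-DD', channel, party, tone) tuples by month.

    Returns {(month, channel, party): (mean_tone, standard_error)}.
    """
    groups = defaultdict(list)
    for date_str, channel, party, tone in daily_tones:
        month = date_str[:7]  # 'YYYY-MM'
        groups[(month, channel, party)].append(tone)

    summary = {}
    for key, tones in groups.items():
        n = len(tones)
        # Standard error of the mean; undefined with a single observation.
        se = stdev(tones) / math.sqrt(n) if n > 1 else float("nan")
        summary[key] = (mean(tones), se)
    return summary
```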