Methodology
Pipeline
This diagram outlines our pipeline for analyzing political tone and media coverage in Spanish TV news programs. The pipeline covers data collection, audio extraction, transcription, and topic modeling, followed by tone classification with a large language model.
Data Collection and Processing Steps
- Data Collection: We gather data from TV channels such as TVE, Antena 3, laSexta, and Telecinco, stored as MP4 video files.
- Audio Extraction: Using FFmpeg, the audio is extracted into WAV format.
- Speech-to-Text: The audio is transcribed to text using the Google Speech-to-Text engine.
- Topic Modeling: Using BERTopic, we identify major topics from the transcribed speech.
- Tone Classification: Using the latest version of ChatGPT4, we classify the political tone for each political party in Spain (-1 for negative, 0 for neutral, and 1 for positive).
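The audio extraction step can be sketched as follows. This is a minimal illustration, not the exact command used in the study: the helper names `build_ffmpeg_command` and `extract_audio`, the 16 kHz mono output, and the 16-bit PCM codec are assumptions chosen because they are common defaults for speech recognition input. The FFmpeg flags themselves (`-i`, `-vn`, `-acodec`, `-ar`, `-ac`, `-y`) are standard.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build the FFmpeg argument list that converts a video file to WAV.

    The sample rate and mono down-mix are assumptions, not values
    reported in the text.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input MP4 file
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM, the usual WAV encoding
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # down-mix to a single (mono) channel
        str(wav_path),
    ]


def extract_audio(video_path, wav_path=None):
    """Run FFmpeg on one file; requires the ffmpeg binary on the PATH."""
    video_path = Path(video_path)
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)
    return wav_path
```

Calling `extract_audio("broadcast.mp4")` (a hypothetical file name) would write `broadcast.wav` next to the input file, ready to be passed to the transcription step.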
Entropy Calculation
Entropy has been used in previous work as a measure of news disagreement across outlets. First, we keep only dates on which more than 50% of the outlets are present, to avoid missing values. Then, we compute the share of minutes that each outlet devotes to each story and calculate the entropy for each day:
\[ \mathrm{entropy}_{d} = -\sum_{s=1}^{S}\sum_{c} \frac{\mathrm{time}_{scd}}{\mathrm{total\_time}_{cd}} \times \log\left(\frac{\mathrm{time}_{scd}}{\mathrm{total\_time}_{cd}}\right) \]
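Here, \(\mathrm{time}_{scd}\) is the number of minutes that outlet \(c\) devotes to story \(s\) on day \(d\), and \(\mathrm{total\_time}_{cd}\) is the total number of minutes broadcast by outlet \(c\) on day \(d\). Days with lower entropy represent higher agreement across outlets on the relevance of the topics of the day.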
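The calculation above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the study: the function name `daily_entropy`, the record layout `(day, outlet, story, minutes)`, and the use of the natural logarithm are hypothetical choices (the formula does not specify the base), and the coverage filter is read as "strictly more than 50% of outlets".

```python
import math
from collections import defaultdict


def daily_entropy(records, n_outlets, min_coverage=0.5):
    """Compute the entropy per day from (day, outlet, story, minutes) records.

    n_outlets is the total number of outlets in the study. Days on which
    the share of outlets present is not strictly above min_coverage are
    dropped (an assumed reading of the filtering rule).
    """
    # minutes[day][outlet][story] -> minutes devoted to that story
    minutes = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for day, outlet, story, mins in records:
        minutes[day][outlet][story] += mins

    result = {}
    for day, by_outlet in minutes.items():
        if len(by_outlet) / n_outlets <= min_coverage:
            continue  # too few outlets present on this day
        entropy = 0.0
        for by_story in by_outlet.values():
            total_time = sum(by_story.values())
            for story_time in by_story.values():
                if story_time > 0:  # terms with zero share contribute nothing
                    share = story_time / total_time
                    entropy -= share * math.log(share)  # natural log (assumed)
        result[day] = entropy
    return result


# Toy data: three of four outlets are present on the first day, one on the second.
records = [
    ("2023-01-01", "TVE", "story_a", 10),
    ("2023-01-01", "TVE", "story_b", 10),
    ("2023-01-01", "Antena 3", "story_a", 20),
    ("2023-01-01", "laSexta", "story_a", 5),
    ("2023-01-01", "laSexta", "story_b", 5),
    ("2023-01-02", "TVE", "story_a", 30),
]
entropy_by_day = daily_entropy(records, n_outlets=4)
```

In this toy example, the two outlets that split their time evenly between two stories each contribute log 2 to the first day's entropy, and the outlet covering a single story contributes 0. The second day is dropped because only one of the four outlets is present.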