Abstract(s)
The internet plays an important role in our society, particularly in the circulation of political
ideas [2]. Aware of this, political actors have been using the web's potential to invigorate their
campaigns [4]; Obama's 2008 presidential campaign is a well-known example [1].
In this study, which is part of a larger one about European elections, we intend to examine
the use of online media by the Portuguese in relation to political involvement in the 2019 European
Parliament election. This project, which is being developed in partnership with Netquest
(an opinion and market research company), uses a database of web navigation actions
(WNA) from its Internet user panel in Portugal. The data set includes navigation actions
on computer and mobile devices for a sample of 1,288 users. The data were collected
between April 26 and June 26, 2019 (a two-month window around the election, which was held
in Portugal on May 26) and comprise 20,137,355 WNA.
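As a minimal sketch of how records could be restricted to the collection window described above, the snippet below keeps WNA whose timestamps fall between April 26 and June 26, 2019 (inclusive) and computes each action's offset from election day. It assumes, hypothetically, that every WNA carries an ISO-8601 timestamp string; the actual Netquest record schema is not described in the abstract, so the field names `user_id`, `timestamp` and `url` are illustrative.

```python
from datetime import date, datetime

# Dates stated in the abstract: collection window and election day in Portugal.
WINDOW_START = date(2019, 4, 26)
WINDOW_END = date(2019, 6, 26)
ELECTION_DAY = date(2019, 5, 26)


def in_window(timestamp: str) -> bool:
    """True if an ISO-8601 timestamp falls inside the collection window (inclusive)."""
    day = datetime.fromisoformat(timestamp).date()
    return WINDOW_START <= day <= WINDOW_END


def days_from_election(timestamp: str) -> int:
    """Signed day offset from election day (negative = before, positive = after)."""
    return (datetime.fromisoformat(timestamp).date() - ELECTION_DAY).days


# Hypothetical records: the real WNA schema is an assumption, not taken from the source.
records = [
    {"user_id": 1, "timestamp": "2019-04-25T23:59:00", "url": "https://www.example.pt/"},
    {"user_id": 1, "timestamp": "2019-05-20T10:15:00", "url": "https://www.example.pt/a"},
    {"user_id": 2, "timestamp": "2019-06-26T08:00:00", "url": "https://www.example.pt/b"},
]
kept = [r for r in records if in_window(r["timestamp"])]
```

The window spans 61 days, with election day 30 days after the start, so the day offset makes it easy to compare navigation before and after the vote.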
To analyze this data set we applied the CRISP-DM (Cross-Industry Standard
Process for Data Mining) methodology [3]. Having completed business understanding and data
understanding, we are now in phase 3, data preparation, which is the most time-consuming task.
In this phase a binary variable was added to indicate whether or not a WNA refers to an online
media outlet. This classification was based on the list of media provided by the Entidade
Reguladora para a Comunicação Social (Portuguese Regulatory Authority for the Media).
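The binary media variable could be derived roughly as follows. This is a sketch under the assumption that the ERC list can be reduced to a set of web domains; the three domains below are an illustrative excerpt chosen by us, not the list the study used. A URL is flagged when its host is a listed domain or one of its subdomains.

```python
from urllib.parse import urlsplit

# Illustrative excerpt only: the real reference list is the ERC registry of media,
# which is not reproduced in the abstract.
MEDIA_DOMAINS = {"publico.pt", "expresso.pt", "rtp.pt"}


def is_online_media(url: str, media_domains=MEDIA_DOMAINS) -> int:
    """Binary flag: 1 if the URL's host is a listed media domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return int(any(host == d or host.endswith("." + d) for d in media_domains))


urls = [
    "https://www.publico.pt/2019/05/20/politica/noticia/x",
    "https://www.google.com/search?q=europeias",
    "https://notpublico.pt/",
]
flags = [is_online_media(u) for u in urls]
```

Matching on a dot boundary (`"." + d`) rather than a plain suffix avoids flagging unrelated hosts such as `notpublico.pt`, and it also works for multi-label suffixes like `.com.pt` without needing a public-suffix list.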
The next step is to identify which of these media WNA are about politics.
First, we select WNA whose URL subdomains or sections contain the words “politica” (politics),
“eleicao” (election) or “eleicoes” (elections). Other options are to apply text mining to the
news titles included in WNA URLs, or to use HTML scraping and text-mining algorithms to analyze
the content of online news.
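The keyword selection step might look like the sketch below, which compares whole subdomain labels and whole path segments against the three keywords. Lower-casing, percent-decoding and stripping diacritics (so that “política” and “eleições” also match) are our own assumptions about normalization, not details stated in the source. Words that only appear inside a news-title slug are deliberately not matched here, since those are left to the text-mining option.

```python
import unicodedata
from urllib.parse import unquote, urlsplit

POLITICAL_KEYWORDS = {"politica", "eleicao", "eleicoes"}


def fold(text: str) -> str:
    """Lower-case and strip diacritics so 'Política' and 'politica' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_political(url: str) -> int:
    """1 if any subdomain label or path segment equals one of the keywords, else 0."""
    parts = urlsplit(url)
    host_labels = (parts.hostname or "").split(".")
    path_segments = [unquote(s) for s in parts.path.split("/") if s]
    candidates = {fold(token) for token in host_labels + path_segments}
    return int(bool(candidates & POLITICAL_KEYWORDS))


examples = [
    "https://www.publico.pt/politica/noticia/x",       # section match
    "https://eleicoes.example.pt/",                    # subdomain match
    "https://www.example.pt/pol%C3%ADtica/x",          # encoded 'política'
    "https://www.publico.pt/desporto/x",               # no match
    "https://www.example.pt/eleicoes-europeias-2019",  # slug only, not matched
]
labels = [is_political(u) for u in examples]
```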
This study focuses on the challenges we faced during data preparation. To ensure that
the database was consistent and free of duplicate or redundant information, it
was first necessary to understand what each variable actually represented. We then recoded
the data into numerical values and grouped some variables, such as region,
level of education and area of study. We also converted date of birth into age and
standardized the time spent online. In addition, mobile and desktop WNA
were harmonized so that both are expressed in the same way.
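A few of these preparation steps can be sketched with the standard library alone: dropping exact duplicate records, recoding a grouped categorical variable to integers, converting date of birth into age, and standardizing time online. Several details are assumptions rather than the study's choices: the education categories and codes are hypothetical, age is computed on election day, and “standardized” is read here as a z-score (it could equally mean harmonizing time units).

```python
from datetime import date
from statistics import mean, pstdev

# Assumed reference date for computing age: election day (2019-05-26).
REFERENCE_DATE = date(2019, 5, 26)

# Hypothetical grouping and numeric recoding of education levels;
# the study's actual categories are not listed in the abstract.
EDUCATION_GROUPS = {
    "primary": 1, "lower secondary": 1,
    "upper secondary": 2,
    "bachelor": 3, "master": 3, "doctorate": 3,
}


def deduplicate(records):
    """Drop exact duplicate records while keeping the first occurrence."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def age_at(birth: date, reference: date = REFERENCE_DATE) -> int:
    """Completed years of age on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def zscores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical panel records (field names are illustrative).
panel = deduplicate([
    {"id": 1, "birth": date(1990, 5, 27), "education": "master", "minutes_online": 120.0},
    {"id": 1, "birth": date(1990, 5, 27), "education": "master", "minutes_online": 120.0},
    {"id": 2, "birth": date(1975, 1, 10), "education": "upper secondary", "minutes_online": 60.0},
    {"id": 3, "birth": date(2000, 12, 1), "education": "bachelor", "minutes_online": 180.0},
])
ages = [age_at(p["birth"]) for p in panel]
education = [EDUCATION_GROUPS[p["education"]] for p in panel]
time_z = zscores([p["minutes_online"] for p in panel])
```

Deduplicating before standardizing matters: a repeated record would otherwise pull the mean and standard deviation toward that respondent's values.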
Finally, we present preliminary results on the identification of WNA related to political
issues and describe the exploratory analysis of the data that will be carried out.
References
[1] E. Bomberg and B. Super. The 2008 US presidential election: Obama and the environment. Environmental Politics, 18:424–430, 2009.
[2] M. Bonchek. From broadcast to netcast: The Internet and the flow of political information. PhD thesis, Harvard University, 1997.
[3] F. Martínez-Plumed, L. Contreras-Ochando, C. Ferri, J. H. Orallo, M. Kull, N. Lachiche, M. J. Quintana, and P. Flach. CRISP-DM twenty years later: From data mining processes to data science trajectories. IEEE Transactions on Knowledge and Data Engineering, 33:3046–3061, 2019.
[4] L. Parisi and R. Rega. Disintermediation in political communication: chance or missed opportunity? Leadership and New Trends in Political Communication, pages 123–148, 2011.
Keywords
European elections; Online behavior; Political news; CRISP-DM; Data preparation
Citation
Silvestre, C., Pinheiro, R., & Montargil, F. (2021, December 9-11). How to analyze online behavior as a source for political information in the Portuguese 2019 European Parliament election? Paper presented at XVIII Meeting of the Portuguese Association for Classification and Data Analysis (JOCLAD 2021), Universidade da Beira Interior – Covilhã, Portugal
Publisher
CLAD - Associação Portuguesa de Classificação e Análise de Dados
UBI - Universidade da Beira Interior