Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary

Authors

Maria José Bocorny Finatto, Oto Araújo Vale, Éric Laporte

DOI:

https://doi.org/10.1590/1981-5794-1904-3

Keywords:

Popular newspapers, Lexicon, Vocabulary, Computational dictionary, Lexical coverage, Recognition of words, Brazilian Portuguese

Abstract

We report an experiment that tested how well two versions of a computational dictionary of Brazilian Portuguese, DELAF PB 2004 and DELAF PB 2015, recognize the vocabulary of popular written texts in Portuguese. This dictionary is freely available for linguistic analyses of Brazilian Portuguese and for other research, which justifies a critical evaluation of it. The words come from the PorPopular corpus, which is composed of texts from two popular newspapers: Diário Gaúcho (DG) and the Bahian newspaper Massa! (MA). From DG, we studied a set of texts totalling 984,465 words (tokens), published in 2008 in the spelling used before the Orthographic Agreement of the Portuguese Language adopted in 2009. From MA, we examined 215,776 words (tokens) from issues published in 2012, 2014 and 2015 in the new spelling. The verification involved: a) generating lists of the unique words (types) used in DG and MA; b) comparing these lists with the entry lists of the two versions of DELAF PB; c) assessing how much of this vocabulary the dictionary covers; d) proposing ways of including the items not covered. The results showed that, on average, 19% of the types in the DG corpus were unknown to DELAF PB 2004 and 2015. In the MA sample, this average was 13%. The dictionary version had only a slight effect on recognition performance.
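Steps (a) to (c) of the verification can be illustrated with a minimal Python sketch. It is not the authors' actual pipeline: the tokenization rule, the function names, the sample sentence and the toy dictionary lines are all assumptions made for illustration. The sketch also assumes that each dictionary line follows the DELAF-style layout `form,lemma.POS:features`, and it ignores details such as escaped commas and multiword entries.

```python
import re
from collections import Counter

# Assumed tokenization: runs of letters, optionally joined by hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def word_types(text):
    """Step (a): map each lowercased word form (type) to its token count."""
    return Counter(m.group(0).lower() for m in WORD_RE.finditer(text))


def delaf_forms(lines):
    """Collect inflected forms from lines assumed to look like
    'form,lemma.POS:features' (everything before the first comma)."""
    forms = set()
    for line in lines:
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        forms.add(line.split(",", 1)[0].lower())
    return forms


def coverage(types, known_forms):
    """Steps (b) and (c): list the types missing from the dictionary and
    return them with their share of all types."""
    unknown = sorted(t for t in types if t not in known_forms)
    share = len(unknown) / len(types) if types else 0.0
    return unknown, share


# Toy data, invented for this example (not taken from PorPopular or DELAF PB).
sample_text = "O guri foi no jogo. O jogo foi tri."
toy_dictionary = [
    "o,o.DET:ms",
    "foi,ir.V",
    "no,em.PREP",
    "jogo,jogo.N:ms",
]

types = word_types(sample_text)
unknown, share = coverage(types, delaf_forms(toy_dictionary))
print(f"{sum(types.values())} tokens, {len(types)} types")
print(f"unknown types: {unknown} ({share:.1%} of types)")
```

The share returned by `coverage` is computed over types, not tokens, which is the kind of figure the 19% (DG) and 13% (MA) averages refer to. Running the same comparison once per dictionary version would show how much the version affects recognition.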

Author Biographies

Maria José Bocorny Finatto, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre - RS – Brasil.

Faculty member of the PPG Letras graduate program at UFRGS. CNPq researcher.

Oto Araújo Vale, Universidade Federal de São Carlos (UFSCar), Centro de Educação e Ciências Humanas, São Carlos - SP - Brasil.

Faculty member of the PPG Linguagem graduate program at UFSCar.

Published

15/04/2019

How to Cite

FINATTO, M. J. B.; VALE, O. A.; LAPORTE, Éric. Recognition of the vocabulary of popular Brazilian newspapers with a freely available computational dictionary. ALFA: Revista de Linguística, São Paulo, v. 63, n. 1, 2019. DOI: 10.1590/1981-5794-1904-3. Available at: https://periodicos.fclar.unesp.br/alfa/article/view/11234. Accessed: 6 Jul. 2024.

Section

Papers