Moving towards an Open Science: A statistical approach based on the generation of manual and automatic extractive summaries
DOI:
https://doi.org/10.21814/h2d.4479
Keywords:
automation, open science, data science, corpus, data democratization, extractive summaries
Abstract
The spread of new media has accelerated the proliferation of scientific data, which new information processing techniques now make easier to disseminate. Building on the workflow that links open data and data science, this article analyses manually and automatically generated extractive summaries in statistical terms. In doing so, it evaluates new ways of making scientific knowledge more accessible as we move towards data democratization. Taking a corpus of abstracted texts as a starting point, we carry out a quantitative analysis on theoretical foundations that allow us to draw conclusions about the feasibility of automation as a route to open science.
License
Copyright (c) 2024 Ivan Arias Arias, Margarida Oliveira Ramos de Castro

This work is licensed under a Creative Commons Attribution 4.0 International License.