Moving towards an Open Science: A statistical approach based on the generation of manual and automatic extractive summaries

Authors

I. Arias, M. Castro

DOI:

https://doi.org/10.21814/h2d.4479

Keywords:

automation, open science, data science, corpus, data democratization, extractive summaries

Abstract

The spread of new media has facilitated the proliferation of scientific data, which new information-processing techniques now make it possible to disseminate. Drawing on the workflow established between open data and data science, this article analyses manually and automatically generated summaries in statistical terms. On this basis, we evaluate new ways of making scientific knowledge more accessible as we move towards data democratization. Taking a corpus of abstracted texts as a starting point, we carry out a quantitative analysis grounded in theoretical foundations that allow us to draw conclusions about the feasibility of using automation to achieve open science.

Published

2024-07-29

How to Cite

Arias, I., & Castro, M. (2024). Moving towards an Open Science: A statistical approach based on the generation of manual and automatic extractive summaries. H2D|Digital Humanities Journal, 5. https://doi.org/10.21814/h2d.4479

Section

Articles