Data and science

Evolution of science

The way science looks today differs greatly from the scientific practices of the past. The colossal amount of data and the tools for handling them have a dramatic effect on the way science is done.

Big data is changing science in two ways:

  • On the one hand, science can gather increasing amounts of data from society that may be used for analysis.
  • On the other hand, scientific activities themselves also produce larger amounts of data than ever before.

We live in a data-driven world. At any time we have access to a huge amount of digital information, which grows day by day. The increase in available data has opened the door to a new area of research based on big data: huge data sets that contribute to the creation of better operational tools in all sectors and to the advancement of scientific research.

Data-driven science: a new paradigm?

Science is the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence: observation, experiment, induction, repetition, critical analysis, verification and testing.

Since the beginnings of science, different scientific methodologies have emerged. Some have profoundly changed the way research is conducted, leading to paradigm shifts. The impact of data on science is also causing profound changes. We speak of data-driven science, an empirical research method which aims to make inferences from huge amounts of data.

The debate on the advent of a fourth paradigm remains open. For some, it is not so much a new paradigm as a method that complements traditional approaches and is needed because of the large volumes of data now available.

In any case, science is increasingly focused on data which, because of their openness and exponential growth, must now be taken into account in the scientific research process.

Let us now look at what this attention to data means for different disciplines.

Consequences according to disciplines

The term ‘data’ intuitively seems to be more prevalent in the natural and social sciences (e.g. survey data, experimental data). Today, however, owing to the widespread use of digital tools in academic workflows, humanities researchers seem more inclined to consider their sources and results as research data.

Disciplinary specificities: the digital humanities

Digital Humanities is an emerging field of science where scholars from across the humanities (historians, linguists, artists, media scholars, etc.) connect with librarians, computer and data scientists.

By Calvinius — Personal work: http://www.martingrandjean.ch/wp-content/uploads/2013/10/HumanitesNumeriques.jpg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=29275453

At the beginning, the digital humanities mainly curated and analysed data that were born analogue (texts, objects and images) but were subsequently archived in digital forms that could be searched and used for automated analysis and visualisation.

Today, digital humanities rely on sophisticated tools for curating and sharing data, which extend the scale of research to a much wider range and volume of sources. Rather than concentrating on a handful of sources to analyse, it becomes possible to manage thousands of cultural products (paintings, books, photos, articles, etc.). Counting, classifying, graphing and mapping these data may offer new insights and raise interest in the humanities as a field of science.

Some common practices in Digital Humanities are Text and Data Mining and Data visualisation.

Text and Data Mining

Text mining, or Text and Data Mining (TDM), is a field which, with the use of appropriate tools, deals with text analysis, exploration, preparation of summaries, clustering and categorisation of documents, finding groups of words with similar meaning or automatic recognition of complex expressions.

Text-mining methods make it possible to extract from a text data that are suitable for quantitative statistical analysis. This is a completely different approach to textual data: texts are no longer treated as purely qualitative data, but as a specific source of quantitative data, above all on the frequency of occurrence of individual words in the analysed text. Text mining allows relatively automated searches of very large portions of text for keywords, their density and so on. This makes it possible to apply new methods of data analysis and to obtain new types of information concerning, among other things, the nature of the analysed texts or the variation in the frequency of keywords over time.
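The idea of treating text as a source of quantitative data can be illustrated with a minimal Python sketch. It is only one possible illustration, not a reference implementation: the function names and the three-sentence CORPUS are hypothetical and invented for this example, and the tokenisation is deliberately naive (runs of letters, lowercased). It counts word frequencies, computes the density of a keyword in a text, and tracks how often a keyword occurs per year.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (runs of Unicode letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_frequencies(text):
    """Count how many times each word occurs in the text."""
    return Counter(tokenize(text))


def keyword_density(text, keyword):
    """Share of the tokens in the text that are equal to the keyword."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(keyword.lower()) / len(tokens)


def keyword_counts_by_year(documents, keyword):
    """Sum keyword occurrences per year for (year, text) pairs."""
    counts = Counter()
    for year, text in documents:
        counts[year] += tokenize(text).count(keyword.lower())
    return dict(sorted(counts.items()))


# Hypothetical mini-corpus, invented purely for illustration.
CORPUS = [
    (1990, "Science relies on observation and experiment."),
    (2005, "Data change science: data sets grow every year."),
    (2020, "Big data, open data and data mining shape science."),
]

if __name__ == "__main__":
    all_text = " ".join(text for _, text in CORPUS)
    print(word_frequencies(all_text).most_common(3))
    print(keyword_counts_by_year(CORPUS, "data"))
```

A real project would normally add steps this sketch leaves out, such as removing very common words, reducing words to their base form and normalising counts by document length before comparing periods.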

Gabriel Gallezot, Marty Emmanuel. Le temps des SIC. MIÈGE, Bernard, PELISSIER, Nicolas et DOMENGET. Temps et temporalités en information-communication: Des concepts aux méthodes., L’Harmattan, pp.27-44, 2017, 10.5281/zenodo.1000778. sic_01599944

Data visualisation

Data visualisation is a modern technology, and at the same time a methodology, that enters every sphere of human activity: from research and development to business, social activities and art. It is the practical knowledge of how to graphically ‘master’ huge sets of data that describe a given aspect of reality.

Example of a data visualisation from research on the ICOS Carbon Portal

The purpose of data visualisation is to present information in a way that allows it to be understood and analysed accurately and effectively. It works because people easily recognise and remember the visual features presented to them (shape, length, construction, etc.). Visualisation lets us combine large data sets and show all the information at the same time, which greatly facilitates analysis. Visual comparison also makes patterns and differences much easier to spot. Another advantage is the ability to analyse data at several levels of detail.

Here is an example of data visualisation from the ‘Republic of Letters’. Researchers map thousands of letters exchanged in the 18th century and learn at a glance what once took a lifetime of study to comprehend.

We encounter visualisation at every step of our lives. Graphic representation is used on television, in the press and in any other source of information (radio excepted) whenever there are numerical data. Visualisation is needed when we want to show, for example, the exchange rate of a currency over a given period (line chart), election results (bar chart) or the weather forecast. However, these are not the only uses of the graphic representation of data. It can serve not only to make certain properties easier to see, but even to discover them. This applies above all to large data sets compiled over many years for the benefit of subsequent research.