
The Canada Research Chair on the Transformations of Scholarly Communication (Vincent Larivière and Maxime Sainte-Marie, Université de Montréal) is developing a set of algorithms for pre-processing, text/frame/image segmentation, OCR, and linguistic post-processing, adapted to historical documents in French and English. This work will make the plain text of the journals and newspapers digitized by Bibliothèque et Archives nationales du Québec suitable for textual analysis.

The project uses cutting-edge AI techniques, such as neural networks for cleaning data during linguistic post-processing.