In 2014, VTeX developed SkyLaTeX, a new solution for authors based on online LaTeX compilation technology. The first implementation of SkyLaTeX is a convenient proofing service for mathematicians and other LaTeX-writing authors, covering both journal and book production.
However, the technology enables authors to use LaTeX across a wider part of the production cycle: authoring, submission, reviewing, production, proofing, and indexing. Building on this, we are developing new services for the publishing market of math-heavy content.
Language correctness is an important quality characteristic of a scientific publication. Publishers take different approaches to improving the quality of the English language.
Some publishers outsource language editing, while others, usually for economic reasons, edit the language of accepted scientific articles themselves. Often, the quality of the English is evaluated by non-native speakers, which makes it difficult to ensure the quality of scientific publications.
VTeX promotes the development of tools for the automatic evaluation of English language quality. We developed the open-source tex2txt tool for converting LaTeX files into linguistically coherent plain text. We compiled a dataset for the automated evaluation of scientific writing (AESW), which is a set of text extracts from 9,919 published journal articles (mainly within the domain of physics and mathematics) with data before and after language editing. We organize shared research tasks to remain one of the leaders in academic writing evaluation.
AKIS is an AI-based service that improves the quality of book indexing. During the development of AKIS, we prepared automated book indexing solutions and software services that generate a primary book index or annotation in LaTeX files. To make these features convenient to use, we also created a user interface for editing the book index in LaTeX files. Finally, the AKIS book indexing solution was put into use in our production.
The tex2txt tool converts LaTeX files to linguistically coherent plain text that can be used for many natural language processing and analysis tasks. The primary aim of the tool is to help the NLP community simplify the compilation of large databases of NLP-ready texts from LaTeX sources. The tool can also be used as a component of a larger system whose input is LaTeX text.
The open-source tool is available here.
More information about the tex2txt tool can be found here.
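To give a feel for what converting LaTeX to linguistically coherent plain text involves, here is a minimal Python sketch. It is not the tex2txt implementation: the function name `latex_to_text`, the regular expressions, and the placeholder tokens `_MATH_`, `_CITE_` and `_REF_` are assumptions chosen for illustration only. The idea is that formulas and citations are replaced by placeholders so that the surrounding sentence stays grammatical.

```python
import re


def latex_to_text(source: str) -> str:
    """Rough LaTeX-to-text conversion (illustrative only, not tex2txt)."""
    # Drop comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Replace display math, then inline math, with a placeholder token
    # so that the sentence around the formula remains readable.
    text = re.sub(r"\$\$.*?\$\$", "_MATH_", text, flags=re.DOTALL)
    text = re.sub(r"\$.*?\$", "_MATH_", text, flags=re.DOTALL)
    # Replace citations and cross-references with placeholder tokens.
    text = re.sub(r"\\cite\{[^}]*\}", "_CITE_", text)
    text = re.sub(r"\\ref\{[^}]*\}", "_REF_", text)
    # Unwrap simple formatting commands and keep their argument.
    text = re.sub(r"\\(?:emph|textbf|textit)\{([^{}]*)\}", r"\1", text)
    # Remove any remaining command names, then leftover braces.
    text = re.sub(r"\\[a-zA-Z]+\*?", "", text)
    text = text.replace("{", "").replace("}", "")
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = r"We \emph{prove} that $x^2 \geq 0$ holds, see \cite{knuth}. % note"
    print(latex_to_text(sample))
```

A handful of regular expressions cannot cope with macros, environments, or nested braces, which is why a dedicated tool is needed for real LaTeX sources.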
The AESW dataset is compiled from the text output of the tex2txt tool. The dataset is a collection of text extracts from 9,919 published journal articles (mainly within the domain of physics and mathematics) with data before and after language editing. The data are based on selected papers published between 2006 and 2013 by the Springer Publishing Company and edited at VTeX by professional language editors. Each extract is a paragraph that contains at least one edit made by the language editor. To anonymize the source texts, all paragraphs in the dataset were placed in random order.
The dataset is available here.
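The two selection rules described for the dataset (keep only paragraphs with at least one edit, then randomize their order) can be sketched in Python as follows. This is an illustration under assumptions, not the pipeline used to build AESW: the function `build_extracts`, the representation of a paragraph as a `(before, after)` pair of strings, and the fixed random seed are all hypothetical.

```python
import random


def build_extracts(paragraph_pairs, seed=0):
    """Keep paragraphs changed by editing and return them in random order.

    paragraph_pairs: iterable of (before, after) strings, one pair per
    paragraph (a hypothetical representation chosen for this sketch).
    """
    # A paragraph qualifies if the editor changed it at least once,
    # approximated here as the two versions being different strings.
    edited = [(before, after) for before, after in paragraph_pairs
              if before != after]
    # Shuffle so that the original document order cannot be recovered.
    rng = random.Random(seed)
    rng.shuffle(edited)
    return edited


if __name__ == "__main__":
    pairs = [
        ("The results is shown.", "The results are shown."),
        ("We define the operator.", "We define the operator."),
        ("In this paper we proof it.", "In this paper we prove it."),
    ]
    for before, after in build_extracts(pairs):
        print(before, "->", after)
```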
The goal of the Automated Evaluation of Scientific Writing (AESW) Shared Task was to analyse the linguistic characteristics of scientific writing to promote the development of automated writing evaluation tools that can assist authors in writing scientific papers. More specifically, the task was to predict whether a given sentence requires editing to ensure its “fit” within the scientific writing genre.
More information can be found in the AESW Shared Task report here.
The official website of the AESW Shared Task can be found here.
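Predicting whether a sentence requires editing is a binary classification problem, so a system can be scored by comparing its predictions with the editors' decisions. The Python sketch below computes precision, recall and F1 for the "needs editing" class. The choice of F1 as the metric, the function name `f1_score`, and the toy labels are assumptions made for illustration and are not taken from the task description above.

```python
def f1_score(gold, predicted):
    """F1 for the positive class (True means 'sentence needs editing')."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if not g and p)
    false_neg = sum(1 for g, p in pairs if g and not p)
    # Without any true positives, precision and recall are zero or undefined.
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels: True if the language editor changed the sentence.
    gold = [True, True, False, False]
    # A trivial baseline that flags every sentence as needing editing.
    baseline = [True] * len(gold)
    print(f"baseline F1: {f1_score(gold, baseline):.3f}")
```

A baseline that flags every sentence reaches perfect recall but low precision, which is one reason a combined measure is more informative than accuracy alone.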