Our IT department develops sophisticated applications that allow us to meet the ever-changing content requirements and specific needs of publishers and authors. VTeX has invested heavily in IT and publication-related technology research, and we continuously monitor and prepare for changes and new trends in publishing and in information technology overall.
In 2014, VTeX developed an innovative solution for authors, SkyLaTeX, which is based on online LaTeX compilation technology. The first implementation of SkyLaTeX is a convenient proofing service for mathematicians and other LaTeX-writing authors, for both journal and book production.
However, the technology enables authors to use LaTeX across a wider part of the production cycle: authoring, submission, reviewing, production, proofing, and indexing. Building on this, we are developing promising new services for the publishing market of math-heavy content.
Language correctness is an important quality characteristic of a scientific publication. Publishers take different approaches to improving the quality of the English language.
Some publishers outsource language editing, while others, usually for economic reasons, edit the language of accepted scientific articles in-house. Often, the quality of the English language is evaluated by non-native English speakers, which makes it difficult to ensure the quality of scientific publications.
VTeX promotes the development of tools for the automatic evaluation of English language quality. We developed the open-source tex2txt tool for converting LaTeX files into linguistically coherent plain text. We compiled a dataset for the automated evaluation of scientific writing (AESW), which is a set of text extracts from 9,919 published journal articles (mainly within the domain of physics and mathematics) with data before and after language editing. We organize shared research tasks to remain one of the leaders in academic writing evaluation.
The tex2txt tool converts LaTeX files to linguistically coherent plain text that can be used for many natural language processing and analysis tasks. The primary aim of the tool is to help the NLP community simplify the compilation of large databases of NLP-ready texts from LaTeX sources. The tool can also be used as a system component when the input is LaTeX text.
The open-source tool is available here.
More information about the tex2txt tool can be found here.
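To give a feel for what converting LaTeX to linguistically coherent plain text involves, here is a minimal Python sketch. It is not the tex2txt implementation: the function name `latex_to_text`, the regular expressions, and the placeholder tokens (`_MATH_`, `_MATHDISP_`, `_CITE_`, `_REF_`) are assumptions chosen for illustration. The idea shown is that formulas, citations, and references are replaced by placeholders so that the sentence around them stays readable.

```python
import re


def latex_to_text(source: str) -> str:
    """Rough LaTeX-to-text conversion (illustrative only, not tex2txt)."""
    # Drop comments: an unescaped % and everything after it on the line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Replace display math ($$...$$ or \[...\]) with a placeholder token.
    text = re.sub(r"\$\$.*?\$\$|\\\[.*?\\\]", " _MATHDISP_ ", text, flags=re.S)
    # Replace inline math ($...$) with a placeholder token.
    text = re.sub(r"(?<!\\)\$.*?(?<!\\)\$", "_MATH_", text, flags=re.S)
    # Replace citations and cross-references with placeholders; drop labels.
    text = re.sub(r"\\cite\*?(?:\[[^\]]*\])?\{[^}]*\}", "_CITE_", text)
    text = re.sub(r"\\(?:ref|eqref)\{[^}]*\}", "_REF_", text)
    text = re.sub(r"\\label\{[^}]*\}", "", text)
    # Unwrap simple one-argument commands such as \emph{word} -> word.
    text = re.sub(r"\\[a-zA-Z]+\*?(?:\[[^\]]*\])?\{([^{}]*)\}", r"\1", text)
    # Remove any remaining bare commands and grouping braces.
    text = re.sub(r"\\[a-zA-Z]+\*?", "", text)
    text = text.replace("{", "").replace("}", "").replace("~", " ")
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


sample = r"We show that $x^2 \geq 0$ holds \cite{knuth84}; see \emph{Section}~\ref{sec:a}. % note"
print(latex_to_text(sample))
```

A regex-based pass like this breaks down on nested braces, user-defined macros, and environments such as tables, which is part of why a dedicated tool is useful for building large corpora.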
The AESW dataset is compiled from the text output of the tex2txt tool. The dataset is a collection of text extracts from 9,919 published journal articles (mainly within the domain of physics and mathematics) with data before and after language editing. The data are based on selected papers published between 2006 and 2013 by the Springer Publishing Company and edited at VTeX by professional language editors. Each extract is a paragraph that contains at least one edit made by the language editor. For anonymization, the paragraphs in the dataset were shuffled into random order relative to the source texts.
The dataset is available here.
The goal of the Automated Evaluation of Scientific Writing (AESW) Shared Task was to analyse the linguistic characteristics of scientific writing to promote the development of automated writing evaluation tools that can assist authors in writing scientific papers. More specifically, the task was to predict whether a given sentence requires editing to ensure its “fit” within the scientific writing genre.
More information can be found in the AESW Shared Task report here.
The official website of the AESW Shared Task can be found here.
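The prediction target described above can be illustrated with a short Python sketch. It assumes a sentence is available as a pair of strings, one before and one after language editing; the function names `needs_editing` and `token_edits` and the example sentences are hypothetical and do not reflect the actual dataset format. The sketch derives the binary label (does the sentence need editing?) and lists the token-level changes using the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def needs_editing(before: str, after: str) -> bool:
    """Binary label: True if the language editor changed any token."""
    return before.split() != after.split()


def token_edits(before: str, after: str) -> list[tuple[str, str, str]]:
    """List (operation, removed tokens, inserted tokens) for each change."""
    a, b = before.split(), after.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sentence pair, written for illustration only.
before = "The results is shown in _REF_ ."
after = "The results are shown in _REF_ ."

print(needs_editing(before, after))
print(token_edits(before, after))
```

Labels derived this way could serve as training targets for a sentence-level classifier, while the token-level edits show what kind of correction was made.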