Academics on arXiv Relying on LaTeX Leak Confidential Data

A detailed analysis of thousands of arXiv submissions revealed that 88% of LaTeX source files leaked private data, including internal notes, comments, and credentials.

CSBadmin

Researchers publishing on the arXiv preprint server are inadvertently exposing sensitive information through their LaTeX source files. A recent investigation of thousands of submissions found that 88 percent of the analyzed LaTeX sources leaked some form of data, including internal notes, development comments, and even plaintext credentials that were meant to remain private.

Data Leakage Scope

The study covered papers submitted between 2020 and 2025 and examined both the final PDF and the LaTeX source code that authors upload alongside it. The findings show that many researchers treat the LaTeX source as a working document: they leave behind developer notes, file paths pointing to their local machines, and comments that should have been removed before submission. In more egregious cases, access tokens for version control systems or API keys appeared directly in the code. The problem is not limited to early-career researchers; cybersecurity experts were among those whose source files exposed hidden details.
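Comments are the easiest place for such notes to hide. When LaTeX compiles a document it ignores everything from an unescaped % to the end of the line, so that text never appears in the PDF, yet it ships verbatim in the source archive that arXiv makes available for download. The sketch below is a hypothetical helper, `extract_comments` (not a tool from the study), that lists every comment in a .tex file so an author can review them before uploading. It assumes one-line comments only and does not understand verbatim environments or URLs containing a literal %, so its output is a checklist rather than a guarantee.

```python
def extract_comments(tex: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for every LaTeX comment in `tex`.

    Hypothetical helper for illustration. A '%' begins a comment unless it
    is preceded by an odd number of backslashes (i.e. written as \\%).
    """
    comments = []
    for lineno, line in enumerate(tex.splitlines(), start=1):
        backslashes = 0  # length of the current run of backslashes
        for i, ch in enumerate(line):
            if ch == "\\":
                backslashes += 1
                continue
            if ch == "%" and backslashes % 2 == 0:
                # Unescaped '%': the rest of the line is a comment.
                comments.append((lineno, line[i + 1:].strip()))
                break
            backslashes = 0  # any other character ends the backslash run


    return comments


# Hypothetical sample: the escaped \% is text, the other two are comments.
sample = r"""\section{Results}
Accuracy improved by 12\% over the baseline.
% TODO: remove before submission; data in /home/user/drafts/raw.csv
Final text.  % reviewer 2 was unfair
"""

for lineno, text in extract_comments(sample):
    print(lineno, text)
```

Running it on the sample flags lines 3 and 4 while correctly skipping the escaped percent sign on line 2; neither comment would be visible in the compiled PDF.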

Types of Exposed Data

One of the more striking examples involved a paper on password security, whose LaTeX source included a comment listing a full table of credentials used for testing. In another case, an unredacted username and password for a private cloud server were left in the source text. Oversights like these give attackers a direct path into researchers’ infrastructure, and leaked institutional or grant information can also be used to craft targeted attacks.

Mitigation Recommendations

Researchers can take simple steps to address this. Before submission, removing the auxiliary files that LaTeX generates is a good starting point. Running a linter or a pre-commit hook that scans for common patterns such as "http://" or "password" can catch many of these mistakes. Authors should also review their source files for stray comments and metadata. A final check that compares the source with the compiled PDF can reveal elements that are present in the source but hidden from readers of the PDF. Together, these practices help close the gap between the final paper and the messy development work that produced it.
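The clean-up and scanning steps above can be combined into a small pre-submission check. The sketch below is hypothetical: the names `check_submission`, `scan_text`, `AUX_SUFFIXES`, and `SUSPICIOUS` are illustrative and not part of any arXiv tool. It lists leftover auxiliary files and searches every .tex file for the two patterns the article mentions ("http://" and "password") plus a few assumed extras, such as token keywords and home-directory paths. It only reports findings and never deletes anything, and regular expressions will miss secrets that do not fit them, so it complements rather than replaces a manual review.

```python
import re
from pathlib import Path

# Illustrative list of build artifacts written by LaTeX and helper tools.
# .bbl is deliberately absent: arXiv does not run BibTeX, so the .bbl
# file must be included for the bibliography to build.
AUX_SUFFIXES = (".aux", ".log", ".out", ".toc", ".blg", ".fls",
                ".fdb_latexmk", ".synctex.gz")

# Example patterns only (assumed, not from the study); extend as needed.
SUSPICIOUS = {
    "plain http URL": re.compile(r"http://\S+"),
    "password keyword": re.compile(r"passw(or)?d", re.IGNORECASE),
    "token or API key keyword": re.compile(
        r"\b(api[_-]?key|token|secret)\b", re.IGNORECASE),
    "local home path": re.compile(r"(/home/|/Users/|[A-Za-z]:\\Users\\)"),
}


def find_aux_files(root: Path) -> list[Path]:
    """List auxiliary build files under `root` without deleting anything."""
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.name.endswith(AUX_SUFFIXES))


def scan_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, matched_text) for each suspicious match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS.items():
            for m in pattern.finditer(line):
                hits.append((lineno, label, m.group(0)))
    return hits


def check_submission(root: Path) -> bool:
    """Print findings for a submission directory; return True if none found."""
    clean = True
    for path in find_aux_files(root):
        print(f"auxiliary file: {path}")
        clean = False
    for tex in sorted(root.rglob("*.tex")):
        text = tex.read_text(encoding="utf-8", errors="replace")
        for lineno, label, match in scan_text(text):
            print(f"{tex}:{lineno}: {label}: {match!r}")
            clean = False
    return clean
```

To use this as a Git pre-commit hook, a small wrapper script could call `check_submission(Path("."))` and exit with a non-zero status when it returns False, which makes Git abort the commit. Reporting instead of deleting is a deliberate choice: a false positive costs a moment of review, whereas an automatic deletion could remove a file the build actually needs.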


Source: Academics on arXiv Relying on LaTeX Leak Confidential Data


The latest in cybersecurity news and updates.
