Uncover Hidden Data in the PDF

Security researchers from many organizations frequently publish and share electronic documents such as PDF files. Yet many organizations are still unaware that these documents can leak sensitive data such as authors' names and details of their information systems and architecture.

The French National Institute for Research in Computer Science and Automation (INRIA) reported that it collected and analyzed 39,664 PDF files published on the websites of 75 security agencies in 47 countries.

According to the experts, threat actors can readily exploit this data to track and later attack an organization. The analysts found that only 7 security agencies had sanitized some of their PDF files before publishing them.

Hidden data in PDF files

The types of hidden data that can be exposed through these PDF files are listed below:

  • Metadata
  • Embedded Content and Attached Files
  • Scripts
  • Hidden Layers
  • Embedded Search Index
  • Stored Interactive Form Data
  • Reviewing and Commenting
  • Hidden Page, Image, and Update Data
  • Obscured Text and Images
  • PDF Comments (Non-Displayed)
  • Unreferenced Data
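As a rough illustration of how some of the categories above can be spotted, the sketch below scans the raw bytes of a PDF for name keys that the PDF format uses for metadata, attachments, scripts, layers, forms, and annotations. It is a triage heuristic only, not the method used by the researchers: the function name `find_hidden_data_markers` and the category labels are hypothetical, and keys stored inside compressed object streams will not be found this way.

```python
# Map a few hidden-data categories to PDF name keys that usually accompany them.
# The category labels are illustrative; the keys are standard PDF names.
MARKERS = {
    "metadata": [b"/Metadata", b"/Info"],
    "embedded files": [b"/EmbeddedFiles"],
    "scripts": [b"/JavaScript", b"/JS"],
    "hidden layers": [b"/OCProperties"],
    "form data": [b"/AcroForm"],
    "comments and annotations": [b"/Annots"],
}


def find_hidden_data_markers(pdf_bytes):
    """Return a sorted list of categories whose marker keys appear in the bytes.

    Assumption: the relevant dictionaries are stored uncompressed. A missing
    marker therefore does not prove that the file is clean.
    """
    found = []
    for category, keys in MARKERS.items():
        if any(key in pdf_bytes for key in keys):
            found.append(category)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical catalog fragment, not taken from any real document.
    sample = b"<< /Type /Catalog /AcroForm 5 0 R /Names << /JavaScript 6 0 R >> >>"
    print(find_hidden_data_markers(sample))
```

To check a file on disk, the bytes can be read with `open(path, "rb").read()` and passed to the function; any hit is a reason to inspect the file with a proper PDF tool.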

Data Involved

The cybersecurity analysts Supriya Adhatarao and Cedric Lauradoux of INRIA reported that they were able to recover sensitive data from 76% of the files they analyzed. The recovered data included:

  • Name of the author
  • Name of the PDF app
  • Author email
  • Operating system
  • Device details
  • File path information
  • Comments and annotations
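Several of the fields listed above, such as the author name and the name of the PDF application, are typically stored in the document's Info dictionary under keys like `/Author`, `/Creator`, and `/Producer`. The following is a minimal sketch of how such values could be read from raw PDF bytes using only the Python standard library. It assumes the values are plain literal strings in parentheses, uncompressed and unencrypted, with no escaped parentheses; the function name `extract_info_fields` is hypothetical, and XMP metadata is not covered.

```python
import re

# Info-dictionary keys that often reveal who produced a file and with what.
INFO_KEYS = ("Author", "Creator", "Producer", "Title", "CreationDate", "ModDate")


def extract_info_fields(pdf_bytes):
    """Return a dict of Info-dictionary values found as literal strings.

    Assumption: each value looks like `/Key (text)`. Hex strings, escaped
    parentheses, compressed object streams, and encryption are not handled.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^)]*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            # latin-1 maps every byte to a character, so decoding never fails.
            found[key] = match.group(1).decode("latin-1")
    return found


if __name__ == "__main__":
    # Hypothetical Info dictionary, not taken from any real document.
    sample = b"1 0 obj\n<< /Author (J. Doe) /Producer (ExamplePDF 1.0) >>\nendobj\n"
    print(extract_info_fields(sample))
```

Running the same check on a file after sanitization is a simple way to confirm that these particular fields no longer appear in plain form, although it cannot show that the data is gone from every part of the file.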

Sanitization tools and levels of sanitization

The sanitization tools covered in this study are listed below:

  • Adobe Acrobat
  • Ghostscript
  • ExifTool
  • Text processing software
  • Xtools

Levels of sanitization:

The analysts distinguished four levels of PDF file sanitization:

  • Level-0
  • Level-1
  • Level-2
  • Level-3

Apart from this, many of the PDF files published by the agencies contained hidden information that could be used to target their employees in further malicious activity.

During the investigation, the researchers were even able to identify an employee who had not updated his or her software for 5 years. This shows that footprinting an organization through its published PDF files can be quite effective.

Moreover, the security agencies and other government organizations that failed to remove this data from their public documents have exposed themselves to attack.

Guru Baran
Gurubaran is a co-founder of Cyber Security News and GBHackers On Security. He has 10+ years of experience as a Security Consultant, Editor, and Analyst in cybersecurity, technology, and communications.