Working with PDF files on Linux

A couple of tips for working with PDF files on Linux, including how to fix broken PDF files, how to digitally sign PDF files and how to visually compare different versions of a PDF.


Fixing PDF files with broken XREF tables

We had a situation recently in which the PDF files created by a particular clinical system couldn’t be processed by other applications. On further inspection, it turned out that the PDF files were actually corrupted. Although the files loaded correctly in Adobe Reader, any other program that tried to process them would fail. Presumably Adobe Reader was detecting the corruption and repairing the data stream on the fly.

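It turned out that the XREF tables within the PDF files were corrupted. The XREF (cross-reference) table records the byte offset of each object in the file, so when those offsets are wrong, programs that rely on the table cannot locate the objects they need. There is a good article titled "The trouble with the XREF table" which explains the problem in more detail.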

The supplier of this clinical system couldn’t easily resolve the issue because they were using a third-party tool to generate their PDF files, so I had to find a way to fix these PDF files as they came out of the system.

Luckily, the free version of the PDF Toolkit (pdftk) by PDF Labs could read the files and attempt to fix the broken XREF tables. This is essentially the command I use to fix the PDF files as they come out of the afflicted system:

pdftk broken.pdf output fixed.pdf

It’s worth noting that pdftk can also be used on Windows machines.
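Since the files arrive continuously from the afflicted system, the single command above can be wrapped in a loop that processes a whole directory. The following is a minimal POSIX shell sketch, assuming the clinical system writes its PDFs into one directory; fix_all_pdfs and its inbox/outbox arguments are hypothetical names, not part of pdftk.

```sh
#!/bin/sh
# Hypothetical helper: repair every PDF in an incoming directory.
# Usage: fix_all_pdfs INBOX OUTBOX   (both paths are assumptions)
fix_all_pdfs() (
    inbox=$1
    outbox=$2
    mkdir -p "$outbox" || exit 1
    for f in "$inbox"/*.pdf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=$(basename "$f")
        # pdftk reads the damaged file and writes a new copy,
        # which gets a freshly built XREF table
        if pdftk "$f" output "$outbox/$name"; then
            echo "fixed: $name"
        else
            echo "failed: $name" >&2
        fi
    done
)
```

For example, fix_all_pdfs /srv/clinical/out /srv/clinical/fixed could be run periodically from cron (both paths are invented for illustration).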

Digitally signing PDF files

PDF documents which have been digitally signed with a security certificate are legally enforceable in many countries. This is very important in the medical field, as it ensures that a patient’s medical documentation has not been altered since the doctor or consultant signed it.

In the clinic letters system I’ve developed (Open ALMA), I introduced a workflow allowing consultants to digitally sign clinic letters before they are sent back to the GP. The final letter is rendered as a PDF file using the LaTeX publishing system, and this publishing is done on a Linux server. I therefore needed a Linux command line utility which could sign each PDF file with a unique security certificate identifying the person who reviewed and signed the content.

I settled on using PortableSigner, a Java application distributed under the free European Union Public Licence. What I like about PortableSigner, besides the fact that it can be invoked from the command line, is that you can embed a comment, a reason and a location during the PDF signing process. The location field can be used to store the computer’s host name or IP address, thus documenting which terminal the signer was sitting at while signing the letter.

Below is a sample command line demonstrating how to sign a PDF called input.pdf with the PortableSigner tool.

java -jar PortableSigner.jar \
     -n \
     -b en \
     -t input.pdf \
     -o output.pdf \
     -s certificate.pfx \
     -p secret \
     -c "Signed after 4 alterations" \
     -r "Approved for publication" \
     -l "Department of Dermatology"

The -n switch instructs the program not to invoke the GUI so that it can be run from the command line. The -b switch appends a signature block to the end of the PDF file, which is very useful; its parameter, en, sets the language of this block to English. The -t and -o switches name the input and output files, the -s switch gives the file name of the digital certificate you wish to use and the -p switch passes in the password for that certificate. Finally, the -c, -r and -l switches set the comment, reason and location respectively.

When the resulting PDF file is opened in Adobe Reader, the signing toolbar appears showing you who signed the file. The signature also protects the PDF from undetected editing: any change made after signing invalidates the signature. This allows you to prove to a clinical audit or review that the PDF file has not been altered since it was signed.
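Since the location field can hold the host name, a small wrapper can fill it in automatically for each signature. This is only a sketch: sign_pdf and its argument order are hypothetical, and it assumes PortableSigner.jar sits in the current directory unless PORTABLESIGNER_JAR (also a hypothetical variable) points elsewhere. It uses only the switches shown in the command above.

```sh
#!/bin/sh
# Hypothetical wrapper around the PortableSigner command shown above.
# Usage: sign_pdf INPUT OUTPUT CERT.pfx PASSWORD REASON
# PORTABLESIGNER_JAR is an assumed variable; it defaults to ./PortableSigner.jar
sign_pdf() (
    input=$1
    output=$2
    cert=$3
    password=$4
    reason=$5
    jar=${PORTABLESIGNER_JAR:-PortableSigner.jar}
    host=$(uname -n)                     # this machine's host name
    java -jar "$jar" -n -b en \
        -t "$input" -o "$output" \
        -s "$cert" -p "$password" \
        -r "$reason" \
        -l "Signed on $host"
)
```

For example: sign_pdf input.pdf output.pdf 357.pfx "$password" "Approved for publication". Bear in mind that a password passed with -p is visible to other users of the machine (for example via ps) while the command runs, which matters on a shared server.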

Automating the creation of self-signed certificates

With several hundred doctors and health professionals who could potentially sign documents, I needed a way to automate the creation of a unique certificate for each member of staff. Each generated certificate is stored under the user’s unique employee number.

The openssl command can be used to generate these certificates. The example below demonstrates how to create a certificate which is valid for one year for a Dr Khan with an employee number of 357. It’s useful to embed the consultant’s name, job title and hospital name within the organisation field of the certificate.

openssl req -x509 \
       -nodes -days 365 \
       -newkey rsa:1024 \
       -keyout 357.key \
       -out 357.pem \
       -subj "/O=Dr Z Khan Consultant Dermatologist (Holby City Hospital)"
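To issue certificates for several hundred staff, the command can be parameterised by employee number. A minimal sketch follows; make_cert is a hypothetical helper name, and it uses rsa:2048 rather than rsa:1024, since 1024-bit RSA keys are now widely considered too weak. The check on "/" is there because the -subj syntax uses "/" to separate name fields.

```sh
#!/bin/sh
# Hypothetical helper: self-signed certificate for one member of staff,
# valid for one year and named after the employee number.
# Usage: make_cert EMPLOYEE_NUMBER "ORGANISATION TEXT"
make_cert() (
    id=$1
    org=$2
    # -subj uses "/" to separate name fields, so reject it in the value
    case $org in
        */*) echo "organisation must not contain '/'" >&2; exit 1 ;;
    esac
    openssl req -x509 -nodes -days 365 \
        -newkey rsa:2048 \
        -keyout "$id.key" -out "$id.pem" \
        -subj "/O=$org"
)
```

For example, make_cert 357 "Dr Z Khan Consultant Dermatologist (Holby City Hospital)" writes 357.key and 357.pem to the current directory.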

PortableSigner expects a .pfx (Personal Information Exchange) file as the certificate input. A .pfx file is a password-protected container holding both the certificate (with its public key) and the matching private key. Unlike the .pem and .key files created above, where the -nodes option leaves the private key unencrypted, this container is encrypted. OpenSSL can combine the certificate and private key into a .pfx file like this:

openssl pkcs12 -export -out 357.pfx -in 357.pem -inkey 357.key

When you run this conversion, openssl prompts you to enter an export password and then to verify it. This password is also required when signing the PDF with the .pfx file.

I needed the ALMA application to create these certificate files automatically when a new employee is added to the system. I also wanted the system to issue and maintain a unique certificate password for each employee, so I needed a way around the password prompts in order to fully automate the process.

My solution was to have ALMA write an expect script. Expect is a command line tool for automating interactive applications such as telnet and ftp, and it worked well for supplying the export password automatically. Expect watches the output for a known text prompt at which the password is required, then uses its send command to pass the password to the spawned command, in this case openssl.

Below is an example of the expect script that ALMA writes and invokes from the command line, deleting the script afterwards to remove all traces of the password.

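#!/usr/bin/expect -f
# Placeholder: ALMA writes the employee's real password here
set secret_strong_password "CHANGE_ME"
spawn openssl pkcs12 -export -out 357.pfx -in 357.pem -inkey 357.key
expect "Enter Export Password:"
send   "$secret_strong_password\r"
expect "Verifying - Enter Export Password:"
send   "$secret_strong_password\r"
# Wait for openssl to finish writing the .pfx before expect exits
expect eof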
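The write, run and delete steps described above could look something like this in shell. It is a sketch, not ALMA’s actual code: make_pfx is a hypothetical name, and it assumes a digits-only employee number and an alphanumeric password so that nothing needs escaping inside the generated Tcl script. mktemp creates the script readable by its owner only, and the script is removed whether or not expect succeeds.

```sh
#!/bin/sh
# Hypothetical helper: write a throwaway expect script, run it, delete it.
# Usage: make_pfx EMPLOYEE_NUMBER PASSWORD
# Assumes NNN.pem and NNN.key already exist in the current directory.
make_pfx() (
    id=$1
    pw=$2
    # Restrict the inputs so nothing needs escaping in the Tcl script
    case $id in ''|*[!0-9]*) echo "bad employee number" >&2; exit 1 ;; esac
    case $pw in ''|*[!A-Za-z0-9]*) echo "password must be alphanumeric" >&2; exit 1 ;; esac
    # mktemp creates the file with owner-only permissions
    script=$(mktemp "${TMPDIR:-/tmp}/pfx.XXXXXX") || exit 1
    cat > "$script" <<EOF
spawn openssl pkcs12 -export -out $id.pfx -in $id.pem -inkey $id.key
expect "Enter Export Password:"
send "$pw\r"
expect "Verifying - Enter Export Password:"
send "$pw\r"
expect eof
EOF
    status=0
    expect -f "$script" || status=$?
    rm -f "$script"                       # remove all traces of the password
    exit "$status"
)
```

For example, password=$(openssl rand -hex 16) followed by make_pfx 357 "$password" uses a random 32-character hex password, which the caller can then store against the employee. As an alternative to expect, openssl pkcs12 also accepts the -passout option (for example -passout env:PFX_PASSWORD, which reads the password from an environment variable), avoiding the interactive prompt altogether.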

Need help signing PDF files for your business?

If you need help with signing your PDF documents digitally, then please get in touch as I will be able to help you on a consultancy basis.

I will be able to advise on which certificate authorities are explicitly trusted by the Adobe Reader software so that you get the pale blue notification bar verifying your signature when opening a signed document. I can install the necessary software and certificates on your company website, or if needed provide you with a server dedicated for signing documents.

Compare PDF files with a visual diff tool

Many of the clinical systems I develop at work produce PDF reports when the patient is discharged. In some instances these reports can be amended after the initial report is generated. The clinical audit team needed a visual tool to compare any two revisions and see the differences highlighted in red.

To do this I used the excellent DiffPDF tool created by Qtrac Ltd.


The tool is available for Windows and can be installed on an Ubuntu-based Linux system with the following command:

sudo apt-get install diffpdf

The DiffPDF tool is easy to integrate into an existing software solution, because the PDF file names can be passed in as command line arguments. In my case, I produced a screen for the clinical audit team which listed patients whose reports had revisions. When an auditor picked a patient for review, I simply launched DiffPDF, passing across the appropriate file names to compare.
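Launching the comparison from an audit screen could look something like the sketch below. It assumes, hypothetically, that each patient’s report revisions are stored as separate PDF files in one directory and that the file names contain no newlines; compare_latest_revisions is an illustrative name.

```sh
#!/bin/sh
# Hypothetical helper: open the two most recent report revisions in DiffPDF.
# Assumes each patient's revisions are separate PDFs in one directory and
# that the file names contain no newlines.
# Usage: compare_latest_revisions PATIENT_DIR
compare_latest_revisions() (
    dir=$1
    # ls -t sorts by modification time, newest first
    newest=$(ls -t "$dir"/*.pdf 2>/dev/null | sed -n 1p)
    previous=$(ls -t "$dir"/*.pdf 2>/dev/null | sed -n 2p)
    if [ -z "$previous" ]; then
        echo "fewer than two revisions in $dir" >&2
        exit 1
    fi
    # pass the older revision first and the newer one second
    diffpdf "$previous" "$newest"
)
```

A GUI front end would normally start this in the background (compare_latest_revisions /path/to/patient &) so that the calling screen stays responsive.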