How to read Content from PDF and Word Document files using PHP?

I ran into this question while working on an interesting PHP project, so I decided to share my solution.

My task was to extract the content of a PDF or Word document file and store it in a MySQL database. Here I will show how to extract the content from both PDF and Word document files and print or display it.

I am using Ubuntu 14.04 with PHP 5 installed, and I will install a few extra packages over the course of this tutorial.

Let's get started:

1. Read Content from PDF file:

Our first step is to install the XPDF package, which will help us extract text from PDF files:

 XPDF Installation:

Once the XPDF package is installed, run pdftotext from the terminal to verify that the installation succeeded.

You should get the following output from pdftotext:

[Screenshot: pdftotext output in the terminal]
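The same check can also be made from PHP before any extraction is attempted. The snippet below is a minimal sketch, assuming a Unix-like shell and that shell_exec is not disabled in your PHP configuration; command_exists is a hypothetical helper name, not part of PHP. On Ubuntu the package is typically installed with sudo apt-get install xpdf, though that exact command is my assumption here.

```php
<?php
// Hypothetical helper: returns true when the given command is found on the PATH.
function command_exists($command)
{
    // `command -v` prints the resolved path when the command exists, nothing otherwise.
    $path = shell_exec('command -v ' . escapeshellarg($command) . ' 2>/dev/null');

    return is_string($path) && trim($path) !== '';
}

echo command_exists('pdftotext')
    ? "pdftotext is available\n"
    : "pdftotext was not found; install the XPDF package first\n";
```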

We are ready to write a PHP script that executes the shell command and extracts text from PDF files:

Create a new file and add the following code to extract text from your PDF files, making sure the PDF file path is correct. You can also use dynamically uploaded files: just replace the file path and you're good to go.

If you run this file, you should see the text from the PDF file.
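A minimal sketch of such a script might look like the following. It assumes pdftotext is on the PATH and that exec is enabled; the function name pdf_to_text and the file sample.pdf are placeholders of mine, so replace the path with your own file.

```php
<?php
// Hypothetical helper: returns the text of a PDF as a string, or false on failure.
function pdf_to_text($pdfPath)
{
    if (!is_readable($pdfPath)) {
        return false;
    }

    // Passing "-" as the output file tells pdftotext to write the text to stdout.
    // escapeshellarg() keeps spaces or quotes in the path from breaking the command.
    $command = 'pdftotext ' . escapeshellarg($pdfPath) . ' -';

    $outputLines = array();
    exec($command, $outputLines, $exitCode);

    return $exitCode === 0 ? implode("\n", $outputLines) : false;
}

// Example usage; sample.pdf is a placeholder path next to this script.
$text = pdf_to_text(__DIR__ . '/sample.pdf');
echo $text === false ? "Could not read the PDF\n" : $text . "\n";
```

For an uploaded file, the same function could be given the temporary path from $_FILES instead of a fixed path.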

2. Read Content from Word Document file:

As I said, we need to install an extra package for Word documents: to read text from doc files we will install a package called Antiword. Before moving on, please note that this package does not support docx files. If you still need to read docx files, you simply need to convert them from docx to doc format first; then it will work.
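One possible way to do that conversion is LibreOffice's headless mode. This is only a sketch under the assumption that LibreOffice is installed and its soffice binary is on the PATH, which is not part of the setup described in this tutorial; build_docx_to_doc_command is a hypothetical helper and the paths are placeholders.

```php
<?php
// Hypothetical helper: builds a LibreOffice command that converts a .docx file to .doc.
// Assumes the `soffice` binary is installed; this tool is not part of the tutorial's setup.
function build_docx_to_doc_command($docxPath, $outputDir)
{
    return 'soffice --headless --convert-to doc --outdir '
        . escapeshellarg($outputDir) . ' ' . escapeshellarg($docxPath);
}

// Print the command for inspection; pass it to exec() to actually run the conversion.
echo build_docx_to_doc_command('/tmp/report.docx', '/tmp'), "\n";
```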

Let’s get started with the first step, installing the antiword package:

We have our package installed and ready to use, so let’s move on to the next step: writing a PHP script that reads the content of a Word document file.

Create a new file and add the following script:

Try running the above script; you should see your document content on the screen. If you run into any issues, you can always comment in the comment section below. Thanks!
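A minimal sketch of such a script could look like this. It assumes the antiword binary is on the PATH (on Ubuntu it is typically installed with sudo apt-get install antiword, which is my assumption) and that exec is enabled; doc_to_text and sample.doc are placeholder names.

```php
<?php
// Hypothetical helper: returns the text of a legacy .doc file, or false on failure.
function doc_to_text($docPath)
{
    // antiword only reads the old .doc format, so reject .docx and other types early.
    $extension = strtolower(pathinfo($docPath, PATHINFO_EXTENSION));
    if ($extension !== 'doc' || !is_readable($docPath)) {
        return false;
    }

    // antiword writes the plain text of the document to stdout by default.
    $command = 'antiword ' . escapeshellarg($docPath);

    $outputLines = array();
    exec($command, $outputLines, $exitCode);

    return $exitCode === 0 ? implode("\n", $outputLines) : false;
}

// Example usage; sample.doc is a placeholder path next to this script.
$text = doc_to_text(__DIR__ . '/sample.doc');
echo $text === false ? "Could not read the document\n" : $text . "\n";
```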