How to read Excel files with PHP

Reading Excel files with PHP can be tricky, but fortunately there is a great library that makes this task a lot easier: PHPExcel. In the following article I will show you how to use it to convert the excel sheets into PHP arrays and use the data in PHP.

Installing PHPExcel

The first step in working with Excel documents in PHP will be to install the PHPExcel┬álibrary. I will do that by using composer. If you are not familiar, you should definitely have a look at it and use it for your project’s dependency management. The composer.json file is very simple:

{
 "require":
   {
      "phpoffice/phpexcel": "dev-develop"
   }
}

Then you need to install the dependencies with composer, which will take a few seconds to complete:

composer.phar install

You will notice that the library was downloaded and installed in the project_root/vendor/phpexcel directory alongside with the project_root/vendor/composer directory and a vendor/autoload.php file which was automatically generated to take care of autoloading the classes for the library.

Step 1 – Load the file in PHPExcel

There are 2 basic ways to load the file into PHPExcel. You can specify one of the supported file types manually or you can let the library determine the file type automatically based on the file that you supply. Here is the required code for both and a list of the supported file types that you can choose from if you decide to explicitly define the file type:

<?php
require_once "vendor/autoload.php";

$fileName = "sample.xlsx";

/** automatically detect the correct reader to load for this file type */
$excelReader = PHPExcel_IOFactory::createReaderForFile($fileName);

/** Create a reader by explicitly setting the file type.
// $inputFileType = 'Excel5';
// $inputFileType = 'Excel2007';
// $inputFileType = 'Excel2003XML';
// $inputFileType = 'OOCalc';
// $inputFileType = 'SYLK';
// $inputFileType = 'Gnumeric';
// $inputFileType = 'CSV';
$excelReader = PHPExcel_IOFactory::createReader($inputFileType);
*/

Step 2 – Define the reader options

I won’t go into much details about the different options you can use, because they are quite a few, but I will point out the most common and useful ones.

//if we dont need any formatting on the data
$excelReader->setReadDataOnly();

//load only certain sheets from the file
$loadSheets = array('Sheet1', 'Sheet2');
$excelReader->setLoadSheetsOnly($loadSheets);

//the default behavior is to load all sheets
$excelReader->setLoadAllSheets();

These are pretty straightforward. Make sure you only load the sheets that you need to use since the library can use a lot of memory especially for large files. Also the setReadDataOnly method helps speed things up a bit by only loading the data from the cells without any special excel formatting. You have to be aware though that any date formatting will be lost if you use this option and dates will be loaded as numbers with no formatting.

A more interesting feature is the ability to add filters to the reader object. This allows to only load certain columns and rows or load the excel data in chunks, which is especially useful if you are doing some sort of a database import. I will provide a basic example of both filters. The following read filter will instruct the reader to only load the data in rows 1 to 10 from columns A to C. You can of course put much more advanced logic in the filters, making them pull different rows and columns from different sheets.

<?php
class SampleReadFilter implements PHPExcel_Reader_IReadFilter {
    public function readCell($column, $row, $worksheetName = '') {
        // Read rows 1 to 10 and columns A to C only
        if ($row >= 1 && $row <= 7) {
           if (in_array($column,range('A','C'))) {
             return true;
           }
        }
        return false;
    }
}

After we have created the filter, we need to pass it to the reader object:

$sampleFilter = new SampleReadFilter();
$objReader->setReadFilter($chunkFilter);

Step 3 – Load and display the data

The final step is to load the excel data into PHP.

$excelObj = $excelReader->load($fileName);

This will produce a PHPExcel object, but in order to easily modify and transform the data to suit our needs its better to convert it to an array.

$excelObj->getActiveSheet()->toArray(null, true,true,true);

Keep in mind that PHPExcel will only display the information on the currently active sheet, which is the last loaded one. You can however manually switch between sheets and get their contents or you can automate the process and get all sheets as one array:

//get all sheet names from the file
$worksheetNames = $excelObj->getSheetNames($fileName);
$return = array();
foreach($worksheetNames as $key => $sheetName){
//set the current active worksheet by name
$excelObj->setActiveSheetIndexByName($sheetName);
//create an assoc array with the sheet name as key and the sheet contents array as value
$return[$sheetName] = $excelObj->getActiveSheet()->toArray(null, true,true,true);
}
//show the final array
var_dump($return);

Loading data in chunks

The approach outlined above is fine in most cases, but if you need to load some information for example in a database and you dont need the entire excel file loaded at once it will be much faster to split it in chunks and work with each individual chunk at a time. To do that we need a simple read filter:

<?php
class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
private $_startRow = 0;
private $_endRow = 0;

/** Set the list of rows that we want to read */
public function setRows($startRow, $chunkSize) {
$this->_startRow = $startRow;
$this->_endRow = $startRow + $chunkSize;
}

public function readCell($column, $row, $worksheetName = '') {
// Only read the heading row, and the configured rows
if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
return true;
}
return false;
}
}

/** Define how many rows we want to read for each "chunk" **/
$chunkSize = 2048;
/** Create a new Instance of our Read Filter **/
$chunkFilter = new chunkReadFilter();

/** Tell the Reader that we want to use the Read Filter **/
$excelReader->setReadFilter($chunkFilter);

/** Loop to read our worksheet in "chunk size" blocks **/
for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) {
/** Tell the Read Filter which rows we want this iteration **/
$chunkFilter->setRows($startRow,$chunkSize);
/** Load only the rows that match our filter **/
$excelObj = $excelReader->load($inputFileName);
$data = $excelObj->getActiveSheet()->toArray(null, true,true,true);
// Do some processing here - the $data variable will contain an array which is always limited to 2048 elements regardless of the size of the entire sheet

}
  • Igor Karpov

    Unfortunately this is good soluton for small xls files. But its doesnt work for huge xls files.

    If you need to parse huge xls files and read all xls formats, you should use XLS API parser http://noxls.net/documentation

    Also its good solution for not so power and limited servers.

  • bliksempie1975

    Is there a way to do chunking for exports? My application keeps running out of memory with 1GB process memory allocated. The database table has 35K rows in it, and the report crashes around 5K records. But when using CSV exports (and the exact data retrieval algorithm) I can dump all 35K rows successfully. Any ideas?

    • Igor Karpov

      I meet with same problem and php used all my 8GB RAM
      try my proposal XLS API parser http://noxls.net

  • Rahul Chourasiya

    Nice post Mr. Igor, One suggestion is, It might be more understandable if you also give the file name with the codes, so that we can easily separate out the codes (that which part of code needs to put into the particular file).

  • Sazzad

    Nice post! But can you please clear that why you had to use the file name two times? e.g: createReaderForFile($fileName); and load($fileName);

    • ferisoft

      You can refer to the docs here:
      https://github.com/PHPOffice/PHPExcel/blob/develop/Documentation/markdown/ReadingSpreadsheetFiles/05-Reader-Options.md

      But essentially the first one is used to determine the correct reader to create from the factory method based on the file (im not sure if it checks for extension only or other file metadata as well). After the correct reader is created you then call the load method on it with the file again to actually load the data into PHP. Hope that helps.

      • Sazzad Tushar Khan

        How can it help if you reply after 2 months! :D

        • sn1987a

          @Sazzad: Thank you very much for asking this question, I’m also curious about this. And because this is an open-source community, please do not expect people to answer you at once.

  • rame

    how to get row count at specific column?

  • Roman

    It a shame that official documentation and many posts lack description of such a basic operations such as getting data from specific row/column from file. All of them stop at getting $excelObj and $activeSheet. It would be more helpful to show up how we can get actual data from row, column, cell, search through data etc.

    • ferisoft

      “A more interesting feature is the ability to add filters to the reader object. This allows to only load certain columns and rows or load the excel data in chunks…”. Read on from that point. You can create a filter that will only load the fields that you need in an array and then you can do whatever you want with the array in PHP really.

      • Roman

        ferisoft, thank you for reply. At the time I was writing that comment I’ve spent a lot of time on research and still didn’t have a valuable info. I was need to dig deeper. I think filter is the key component when we want to read and parse data, which is barely mentioned in blog posts about PHPExcel.

  • Adrien

    Unfortunately, loading data in chunks won’t help if you have to read a really large spreadsheet. And you’ll end up with Out of Memory errors.

    So if you happen to need to read (or write) such files, I can recommend you to take a look at Spout: https://github.com/box/spout. It can read and write spreadsheets of any size and really quickly

    • http://harshamadushan.webuda.com Harsha Madushan

      Thanks in millions for mention this library :D cheers worked with 140K rows around 18 MB file.

  • http://www.facebook.com/sudhir600 Sudhir Gupta

    how to use composer.phar
    i am entirely new.
    i used window 8 and wamp.

    • Milimo

      Create a .bat file of composer.phar.
      Run the command;
      echo @php “%~dp0composer.phar” %*>composer.bat
      it should do the job.
      Refer to https://getcomposer.org/doc/00-intro.md to clearly understand how it all works. You have probably figured the answer out by now but in case somebody else in future needs an answer.

  • Eric

    Thank you for this valuable article. Saved my day.

  • Sarang Bondre

    $data = $excelObj->getActiveSheet()->toArray(null, true,true,true);
    This is giving and error Fatal error: Uncaught PHPExcel_Calculation_Exception:
    When the field has VLOOKUP value.

    can anyone please help me with this?

  • Amirul Haq Haizul

    //download_example.php

    <?php
    //include 'reader.php';
    //$excel = new Spreadsheet_Excel_Reader();
    //$output = "”;
    //Sheet 1 (Original):
    //
    //read(‘test.xlsx’);
    //$x=1;
    //while($xsheets[0]['numRows']) {
    //echo “tn”;
    //$output .= “tn”;
    //$y=1;
    //while($ysheets[0]['numCols']) {
    //$cell = isset($excel->sheets[0]['cells'][$x][$y]) ? $excel->sheets[0]['cells'][$x][$y] : ”;
    //$output .= “tt$celln”;
    //$y++;
    // }

    //echo “Amirul Haq1995PuchongMus1990Puchong”;
    //$output .= “tn”;
    //$x++;
    //}

    //$output .= “”;
    //Utk menerima data.
    if(isset($_POST['submit'])){
    if(isset($_POST['output'])){
    $output=$_POST['output'];

    header(“Content-Type: application/xls”);
    header(“Content-Disposition: attachement; filename=export_test.xls”);
    echo $output;

    }
    }

    ?>

  • Igenyar Saharam

    I would stay away from PHPExcel. I use it for a while until I need it to handle so-called “large file” (actually an xlsx only 3MB, nowadays who would take 3MB as big, seriously), still load function takes too long which finally gets killed by the system. When you search online, you would see they’ve been struggling with memory problem for years and still cannot address properly. You don’t want to use it unless you like to fix other people’s problem.