How to Read Excel files in Java using Apache POI

Excel files (spreadsheets) are widely used by people all over the world for various tasks related to organization, analysis, and storage of tabular data.

Since excel files are so common, we developers often encounter use-cases when we need to read data from an excel file or generate a report in excel format.

In this article, I’ll show you how to read excel files in Java using a very simple yet powerful open source library called Apache POI.

And in the next article, You’ll learn how to create and write to an excel file using Apache POI.

Let’s get started!

Dependencies

First of all, We need to add the required dependencies for including Apache POI in our project. If you use maven, you need to add the following dependencies to your pom.xml file -

Maven

<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.17</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.17</version>
</dependency>

Gradle

If you use gradle then you can add the following to your build.gradle file

compile "org.apache.poi:poi:3.17"
compile "org.apache.poi:poi-ooxml:3.17"

The first dependency poi is used to work with the old Microsoft’s binary file format for excel. These file formats have .xls extension.

The second dependency poi-ooxml is used to work with the newer XML based file format. These file formats have .xlsx extension.

Sample Excel file that We’ll read

Following is a sample excel file that we’ll read in our code. It is created using Google Sheets and has .xlsx extension.

Note that, Although the sample file is of the newer XML based file format (.xlsx). The code that we’ll write will work with both types of file formats - .xls and .xlsx

Apache Poi Sample Excel File

Apache POI terminologies

Apache POI excel library revolves around following four key interfaces -

  1. Workbook: A workbook is the high-level representation of a Spreadsheet.

  2. Sheet: A workbook may contain many sheets. The sample excel file that we looked at in the previous section has two sheets - Employee and Department

  3. Row: As the name suggests, It represents a row in the spreadsheet.

  4. Cell: A cell represents a column in the spreadsheet.

HSSF and XSSF implementations -

Apache POI library consists of two different implementations for all the above interfaces.

  1. HSSF (Horrible SpreadSheet Format): HSSF implementations of POI’s high-level interfaces like HSSFWorkbook, HSSFSheet, HSSFRow and HSSFCell are used to work with excel files of the older binary file format - .xls

  2. XSSF (XML SpreadSheet Format): XSSF implementations are used to work with the newer XML based file format - .xlsx.

Apache POI classes and interfaces - XSSF HSSF implementations

Program to Read an excel file using Apache POI

The following program shows you how to read an excel file using Apache POI. Since we’re not using any file format specific POI classes, the program will work for both types of file formats - .xls and .xlsx.

The program shows three different ways of iterating over sheets, rows, and columns in the excel file -

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.*;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;

public class ExcelReader {
    public static final String SAMPLE_XLSX_FILE_PATH = "./sample-xlsx-file.xlsx";

    public static void main(String[] args) throws IOException, InvalidFormatException {

        // Creating a Workbook from an Excel file (.xls or .xlsx)
        Workbook workbook = WorkbookFactory.create(new File(SAMPLE_XLSX_FILE_PATH));

        // Retrieving the number of sheets in the Workbook
        System.out.println("Workbook has " + workbook.getNumberOfSheets() + " Sheets : ");

        /*
           =============================================================
           Iterating over all the sheets in the workbook (Multiple ways)
           =============================================================
        */

        // 1. You can obtain a sheetIterator and iterate over it
        Iterator<Sheet> sheetIterator = workbook.sheetIterator();
        System.out.println("Retrieving Sheets using Iterator");
        while (sheetIterator.hasNext()) {
            Sheet sheet = sheetIterator.next();
            System.out.println("=> " + sheet.getSheetName());
        }

        // 2. Or you can use a for-each loop
        System.out.println("Retrieving Sheets using for-each loop");
        for(Sheet sheet: workbook) {
            System.out.println("=> " + sheet.getSheetName());
        }

        // 3. Or you can use a Java 8 forEach with lambda
        System.out.println("Retrieving Sheets using Java 8 forEach with lambda");
        workbook.forEach(sheet -> {
            System.out.println("=> " + sheet.getSheetName());
        });

        /*
           ==================================================================
           Iterating over all the rows and columns in a Sheet (Multiple ways)
           ==================================================================
        */

        // Getting the Sheet at index zero
        Sheet sheet = workbook.getSheetAt(0);

        // Create a DataFormatter to format and get each cell's value as String
        DataFormatter dataFormatter = new DataFormatter();

        // 1. You can obtain a rowIterator and columnIterator and iterate over them
        System.out.println("\n\nIterating over Rows and Columns using Iterator\n");
        Iterator<Row> rowIterator = sheet.rowIterator();
        while (rowIterator.hasNext()) {
            Row row = rowIterator.next();

            // Now let's iterate over the columns of the current row
            Iterator<Cell> cellIterator = row.cellIterator();

            while (cellIterator.hasNext()) {
                Cell cell = cellIterator.next();
                String cellValue = dataFormatter.formatCellValue(cell);
                System.out.print(cellValue + "\t");
            }
            System.out.println();
        }

        // 2. Or you can use a for-each loop to iterate over the rows and columns
        System.out.println("\n\nIterating over Rows and Columns using for-each loop\n");
        for (Row row: sheet) {
            for(Cell cell: row) {
                String cellValue = dataFormatter.formatCellValue(cell);
                System.out.print(cellValue + "\t");
            }
            System.out.println();
        }

        // 3. Or you can use Java 8 forEach loop with lambda
        System.out.println("\n\nIterating over Rows and Columns using Java 8 forEach with lambda\n");
        sheet.forEach(row -> {
            row.forEach(cell -> {
                String cellValue = dataFormatter.formatCellValue(cell);
                System.out.print(cellValue + "\t");
            });
            System.out.println();
        });

        // Closing the workbook
        workbook.close();
    }
}

Note that we’re not even using the concrete classes like HSSFWorkbook and XSSFWorkbook to create an instance of the Workbook. We’re creating the workbook using a WorkbookFactory instead. This makes our program format independent and it works for both types of files - .xls and .xlsx.

The program shows three different ways to iterate over sheets, rows, and columns. I prefer the Java 8 forEach loop with a lambda expression. You may use whichever method you like.

Note that, I’ve used a DataFormatter to format and get each cell’s value as String.

Retrieving Cell values by CellType

Instead of using a DataFormatter to format and get each cell’s value as String regardless of the Cell type, You may check each cell’s type and then retrieve its value using various type-specific methods like this -

private static void printCellValue(Cell cell) {
    switch (cell.getCellTypeEnum()) {
        case BOOLEAN:
            System.out.print(cell.getBooleanCellValue());
            break;
        case STRING:
            System.out.print(cell.getRichStringCellValue().getString());
            break;
        case NUMERIC:
            if (DateUtil.isCellDateFormatted(cell)) {
                System.out.print(cell.getDateCellValue());
            } else {
                System.out.print(cell.getNumericCellValue());
            }
            break;
        case FORMULA:
            System.out.print(cell.getCellFormula());
            break;
        case BLANK:
            System.out.print("");
            break;
        default:
            System.out.print("");
    }

    System.out.print("\t");
}

You may now call the above method in the main program to print each cell’s value -

sheet.forEach(row -> {
    row.forEach(cell -> {
        printCellValue(cell);
    });
    System.out.println();
});

Conclusion

That’s all folks! In this article, You learned how to read excel files in Java using Apache POI library. You can find the entire source code on the github repository.

Also, Don’t forget to check out the next article to learn how to create and write to an excel file using Apache POI

Thank you for reading. Until next time!