Java 读取Excel大文件

内容纲要

1、背景

业务方需求:随着我们商户不断增加,商户的数据需要一次性通过Excel导出到我们系统以初始化商户的数据。商户数据一般在几万左右使用一个Excel文件导入到我们业务系统,要求业务系统快速响应,数据尽快导入完成。

2、问题

  1. 需求方提出的Excel数据在几万,使用POI处理Excel比较消耗性能。

3、解决方案

  1. 使用excel-streaming-reader解决大Excel读取问题。
    excel-streaming-reader Github开源地址:https://github.com/monitorjbl/excel-streaming-reader
  2. 分页读取Excel分页处理
  3. excel-streaming-reader只支持xlsx格式Excel

3.1、excel-streaming-reader性能-利用临时文件快速访问

以下是作者对excel-streaming-reader性能高的解释:

This library will take a provided InputStream and output it to the file system. The stream is piped safely through a configurable-sized buffer to prevent large usage of memory. Once the file is created, it is then streamed into memory from the file system.The reason for needing the stream being outputted in this manner has to do with how ZIP files work. Because the XLSX file format is basically a ZIP file, it’s not possible to find all of the entries without reading the entire InputStream.
This is a problem that can’t really be gotten around for POI, as it needs a complete list of ZIP entries. The default implementation of reading from an InputStream in POI is to read the entire stream directly into memory. This library works by reading out the stream into a temporary file. As part of the auto-close action, the temporary file is deleted.

其实就是通过创建临时文件来提升性能,实现代码如下:

// 创建临时文件
File f = Files.createTempFile("tmp-", ".xlsx").toFile();
// 将原Excel文件复制到临时文件中
try(FileOutputStream fos = new FileOutputStream(f)) {
    int read;
    byte[] bytes = new byte[bufferSize];
    while((read = is.read(bytes)) != -1) {
        fos.write(bytes, 0, read);
    }
    return f;
} finally {
    is.close();
}

3.2、excel-streaming-reader性能-利用小缓冲区缓存数据减少内存使用

3.3 实施步骤

3.3.1、引入maven依赖

  <dependency>
    <groupId>com.monitorjbl</groupId>
    <artifactId>xlsx-streamer</artifactId>
    <version>2.1.0</version>
</dependency>

3.3.2、加载大Excel

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
return StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
        .open(is);            // InputStream or File for XLSX file (required)

3.3.3、分页读取Excel

private List<Row> nextPage(int pageSize, int sheetIndex) {
    Sheet sheet = this.wk.getSheetAt(sheetIndex);

    int limit = pageSize;
    List<Row> list = new ArrayList<>(limit);
    for (Row row : sheet) {
        list.add(row);

       limit--;
        if (limit < 1) {
            break;
        }
    }
    return list;
}

4、参考

  1. github:excel-streaming-reader
  2. java处理Excel数据,赶快进来收藏~
  3. 十万级别Excel数据的导入导出

发表评论

您的电子邮箱地址不会被公开。 必填项已用 * 标注

滚动至顶部