Class ParquetBackedPBEventFileStream

java.lang.Object
edu.stanford.slac.archiverappliance.plain.parquet.ParquetBackedPBEventFileStream
All Implemented Interfaces:
ETLParquetFilesStream, Closeable, AutoCloseable, Iterable<Event>, ETLBulkStream, EventStream, RemotableOverRaw

public class ParquetBackedPBEventFileStream extends Object implements ETLParquetFilesStream, RemotableOverRaw
An EventStream implementation that reads data from one or more Parquet files.

This class serves two primary purposes:

  1. Data Retrieval: It can stream events from a list of Parquet files, applying time-based filters using Parquet's predicate pushdown for efficient querying.
  2. Optimized ETL: It implements ETLParquetFilesStream, which lets it act as a logical concatenation of multiple source files. The ParquetETLInfoListProcessor uses this capability to combine smaller Parquet files (e.g., hourly) into larger ones (e.g., daily) without fully deserializing and re-serializing the data, which significantly improves ETL performance.
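
The two roles above can be pictured with a small, self-contained sketch. It is not this class's implementation: ConcatenatedStreamSketch, Sample, and read are hypothetical names, and plain in-memory lists stand in for Parquet files. The sketch only shows the idea of treating several files as one logical stream and keeping the events inside a time range. It filters after reading, whereas predicate pushdown lets the Parquet reader skip data before decoding it, and it does not model the ETL path that avoids deserialization.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from the archiver appliance code base.
final class ConcatenatedStreamSketch {

    /** Stand-in for an archived event: a timestamp and a value. */
    static final class Sample {
        final long epochSeconds;
        final double value;

        Sample(long epochSeconds, double value) {
            this.epochSeconds = epochSeconds;
            this.value = value;
        }
    }

    /**
     * Treats several per-file sample lists as one logical stream, in list order,
     * and keeps only samples whose timestamp lies in [startInclusive, endInclusive].
     */
    static List<Sample> read(List<List<Sample>> files, long startInclusive, long endInclusive) {
        return files.stream()
                .flatMap(List::stream) // logical concatenation of the "files"
                .filter(s -> s.epochSeconds >= startInclusive && s.epochSeconds <= endInclusive)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two "hourly files" with two samples each (assumed data for the example).
        List<Sample> hour0 = List.of(new Sample(0, 1.0), new Sample(1800, 1.5));
        List<Sample> hour1 = List.of(new Sample(3600, 2.0), new Sample(5400, 2.5));

        for (Sample s : read(List.of(hour0, hour1), 1800, 3600)) {
            System.out.println(s.epochSeconds + " " + s.value);
        }
    }
}
```

Callers of the sketch see a single ordered sequence regardless of how many source lists were supplied, which is the property the class description attributes to the logical concatenation of files.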
See Also: