Class ParquetAppendDataStateData

java.lang.Object
edu.stanford.slac.archiverappliance.plain.AppendDataStateData
edu.stanford.slac.archiverappliance.plain.parquet.ParquetAppendDataStateData

public class ParquetAppendDataStateData extends AppendDataStateData
A stateful class that manages the process of writing and appending EPICS event data to Parquet files.

This class extends AppendDataStateData to provide Parquet-specific logic. It handles file partitioning, writing new events, and merging data.

Key behaviors:

  • Incremental Appends: When appending new events to an existing file, this class writes the new data to a temporary file. When the stream is closed, this temporary file is efficiently merged with the original data file.
  • Bulk Appends (ETL): For ETL operations, it uses an optimized path that leverages ParquetRewriter to merge multiple source Parquet files directly into the destination file without full deserialization.
  • Constructor Details

    • ParquetAppendDataStateData

      public ParquetAppendDataStateData(PartitionGranularity partitionGranularity, String rootFolder, String desc, Instant lastKnownTimestamp, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName, PVNameToKeyMapping pv2key, org.apache.parquet.ParquetReadOptions readOptions, PathResolver pathResolver)
      Constructs a new state manager for appending data to Parquet files.
      Parameters:
      partitionGranularity - The granularity for partitioning data files (e.g., YEARLY).
      rootFolder - The root directory for storage.
      desc - A description for logging purposes.
      lastKnownTimestamp - The timestamp of the last known event in this storage partition. Used to prevent writing out-of-order data.
      compressionCodecName - The compression codec to use for writing new files.
      pv2key - The mapping from PV name to storage key.
      readOptions - Configuration for reading Parquet files.
      pathResolver - The resolver for determining file paths.
  • Method Details

    • updateStateBasedOnExistingFile

      public void updateStateBasedOnExistingFile(String pvName, Path currentPVFilePath) throws IOException
      Initializes the writer state by reading metadata from an existing Parquet file. This is called when appending data to a file that already contains events.
      Specified by:
      updateStateBasedOnExistingFile in class AppendDataStateData
      Parameters:
      pvName - The name of the PV.
      currentPVFilePath - The path to the existing Parquet file.
      Throws:
      IOException - if the PV name in the file does not match or an I/O error occurs.
    • createNewWriter

      protected EventFileWriter createNewWriter(String pvName, Path pvPath, EventStream stream) throws IOException
      Specified by:
      createNewWriter in class AppendDataStateData
      Throws:
      IOException
    • bulkAppend

      public boolean bulkAppend(String pvName, ETLContext context, ETLBulkStream bulkStream, String extension, String extensionToCopyFrom) throws IOException
      Performs a bulk append optimized for ETL processes.

      This method expects an ETLParquetFilesStream and uses combineFiles(List, Path, boolean) to merge the source files directly, bypassing event-by-event processing.

      Specified by:
      bulkAppend in class AppendDataStateData
      Parameters:
      pvName - The PV name
      context - The ETL context
      bulkStream - The ETL bulk stream
      extension -  
      extensionToCopyFrom -  
      Returns:
      true if the append was successful.
      Throws:
      IOException - if the stream is not of the expected type or an I/O error occurs.
    • closeStreams

      public void closeStreams() throws IOException
      Closes any open writers and merges temporary files.

      This method is critical for ensuring data is committed to its final destination. It closes the ParquetWriter (flushing its buffers) and then calls combineWithTempFiles() to merge the newly written data into the main partition file.

      Overrides:
      closeStreams in class AppendDataStateData
      Throws:
      IOException - if an I/O error occurs.
    • toString

      public String toString()
      Overrides:
      toString in class Object