edu.stanford.slac.archiverappliance.plain.PlainStoragePlugin

All Implemented Interfaces: DataAtTime, ETLDest, ETLSource, StorageMetrics, Reader, StoragePlugin, Writer

public class PlainStoragePlugin extends Object implements StoragePlugin, ETLSource, ETLDest, StorageMetrics, DataAtTime

The plain PB storage plugin stores data in a chunk per PV per partition in sequential form. No index is maintained; simple search algorithms are used to locate events. This plugin has these configuration parameters:

name: This serves to identify this plugin; mandatory
rootFolder: This serves as the rootFolder that is prepended to the path generated for a PV+chunk; mandatory. One can use environment variables here; for example, pb://localhost?name=STS&rootFolder=${ARCHAPPL_SHORT_TERM_FOLDER}&partitionGranularity=PARTITION_HOUR where the value for ${ARCHAPPL_SHORT_TERM_FOLDER} is picked up from the environment/system properties.
partitionGranularity: Defines the time partition granularity for this plugin. For example, if the granularity is PARTITION_HOUR, then a new chunk is created for each hour of data. The partitions are clean; that is, they contain data only for that partition. It is possible to predict which chunk contains data for a particular instant in time and which chunks contain data for a particular time period. This is a mandatory field.
compress: This is an optional field that defines the compression mode. The support for zip compression is experimental. If zip compression is used, the rootFolder is prepended with ZIP_PREFIX. If this is absent in the rootFolder, the initialization code automatically adds it.
hold & gather: hold and gather are optional fields that work together to implement high/low watermarks for data transfer. By default, both hold and gather are 0, which leads to data being transferred out of this plugin as soon as the partition boundary is reached. You can hold a certain number of partitions in this store (perhaps because this store is a high performing one). In this case, ETL does not start until the first event in this store is older than hold partitions. Once ETL begins, you can transfer gather partitions at a time. For example, hold=5&gather=3 lets you keep at least 5-3=2 partitions in this store. ETL kicks in once the oldest event is older than 5 partitions and data is moved 3 partitions at a time.
pp: An optional parameter; this contains a list of post processing operators that are computed and cached during ETL. During retrieval, if an exact match is found, then the data from the cached copy is used (greatly improving retrieval performance). Otherwise, the post processor is applied and the data is computed at runtime. To specify multiple post processors, use standard URL syntax, like so: pp=rms&pp=mean_3600
consolidateOnShutdown: This lets you control if ETL should push data to the subsequent store on appserver shutdown. This is useful if you are using a RAMDisk for the short term store.
reducedata: An optional parameter; use this parameter to reduce the data as you move it into this store. You can use any of the post processors that can be used with the pp argument. For example, if you define the LTS as pb://localhost?name=LTS&rootFolder=${ARCHAPPL_LONG_TERM_FOLDER}&partitionGranularity=PARTITION_YEAR&reducedata=firstSample_3600, then when moving data into this store, ETL will apply the firstSample_3600 operator on the raw data to reduce the data and store only the reduced data. The difference between this parameter and the pp parameter is that in the reducedata case, only the reduced data is stored. The raw data is thrown away. If you specify both the pp and the reducedata, you may get unpredictable results because the raw data is necessary to precompute the caches.
etlIntoStoreIf: An optional parameter; use this parameter to control if ETL should move data into this store. If the named flag specified by this parameter is false, this plugin will behave like the blackhole plugin (and you will lose data). Note that named flags are false by default, so the default behavior if you specify this flag and forget to set the named flag is to lose data. If you don't set this flag at all, then this plugin behaves normally and will accept all the ETL data coming in. For example, if you add etlIntoStoreIf=testFlag, then data will be moved into this store only if the value of the named flag testFlag is true.
etlOutofStoreIf: An optional parameter; use this parameter to control if ETL should move data out of this store. If the named flag specified by this parameter is false, this plugin will behave like a bag of holding and accumulate all the data it can. Note that named flags are false by default, so the default behavior if you specify this flag and forget to set the named flag is to collect data till you run out of space. If you don't set this flag at all, then this plugin behaves normally and will move data out as before. For example, if you add etlOutofStoreIf=testFlag, then data will be moved out of this store only if the value of the named flag testFlag is true.
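
The parameters above travel as the query string of the configuration URL. As a rough sketch only, the following shows how such a URL could be split into parameters with the JDK, keeping repeated keys such as pp as a list. ConfigUrlSketch and queryParams are hypothetical names invented for this illustration, and the example folder /arch/sts is an assumption; this is not the parsing code the plugin itself uses.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the archiver appliance API.
final class ConfigUrlSketch {

    // Collects every query parameter, keeping repeated keys such as pp in order.
    static Map<String, List<String>> queryParams(String configURL) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        String query = URI.create(configURL).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            String key = URLDecoder.decode(kv[0], StandardCharsets.UTF_8);
            String value = kv.length > 1 ? URLDecoder.decode(kv[1], StandardCharsets.UTF_8) : "";
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        // A literal rootFolder is used here because java.net.URI rejects the
        // braces in an unexpanded ${...} placeholder.
        String url = "pb://localhost?name=STS&rootFolder=/arch/sts"
                + "&partitionGranularity=PARTITION_HOUR&hold=5&gather=3&pp=rms&pp=mean_3600";
        queryParams(url).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Keeping a list per key is what lets pp=rms&pp=mean_3600 survive as two separate post processors instead of the second value overwriting the first.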

Author: mshankar

Constructor Summary

Constructors

Constructor

Description

PlainStoragePlugin(PlainFileHandler plainFileHandler)

PlainStoragePlugin(PlainStorageType plainStorageType)

Method Summary

Modifier and Type

Method

Description

int

appendData(BasicContext context, String pvName, EventStream stream)

boolean

appendToETLAppendData(String pvName, EventStream stream, ETLContext context)

This appends an EventStream to the ETL append data for a PV.

boolean

commitETLAppendData(String pvName, ETLContext context)

This concatenates the ETL append data for a PV with the PV's destination data.

boolean

consolidateOnShutdown()

Should ETL move data from this source to the destination on shutdown.

void

convert(BasicContext context, String pvName, ConversionFunction conversionFunction)

Sometimes, PVs change types, EGUs etc.

Event

dataAtTime(BasicContext context, String pvName, Instant atTime, Instant startAtTime, Period searchPeriod, BiDirectionalIterable.IterationDirection direction)

Generic method to iterate over the data for specified PV.

ETLInfoListProcessor

etlInfoListProcessor(ETLSource curETLSource)

FileInfo

fileInfo(Path path)

Path[]

getAllPathsForPV(BasicContext context, String pvName)

List<ETLInfo>

getAllStreams(String pvName, ETLContext context)

Given a pv and a time, this method returns all the streams.

List<Callable<EventStream>>

getDataForPV(BasicContext context, String pvName, Instant startTime, Instant endTime)

List<Callable<EventStream>>

getDataForPV(BasicContext context, String pvName, Instant startTime, Instant endTime, PostProcessor postProcessor)

String

getDesc()

String

getDescription()

Get a string description of this plugin; one that can potentially be used in log messages and provide context.

List<ETLInfo>

getETLStreams(String pvName, Instant currentTime, ETLContext context)

Given a pv and a time, this method returns all the streams that are ready for ETL.

String

getExtensionString()

Event

getFirstKnownEvent(BasicContext context, String pvName)

Get the first event for this PV.

Event

getLastKnownEvent(BasicContext context, String pvName)

Gets the last known event in this destination.

String

getName()

Multiple PVs will probably use the same storage area and we identify the area using the name.

PartitionGranularity

getPartitionGranularity()

PathResolver

getPathResolver()

PlainFileHandler

getPlainFileHandler()

String

getPluginIdentifier()

String

getRootFolder()

long

getTotalSpace(StorageMetricsContext storageMetricsContext)

Gets the total space left on this device.

String

getURLRepresentation()

Return a URL representation of this plugin suitable for parsing by StoragePluginURLParser

long

getUsableSpace(StorageMetricsContext storageMetricsContext)

Gets the space available to this VM on this device

void

initialize(String configURL, ConfigService configService)

Each storage plugin is registered to a URI scheme; for example, the PlainStoragePBPlugin uses pb:// as the scheme.

void

markForDeletion(ETLInfo info, ETLContext context)

Delete the ETLStream identified by info when you can, as it has already been consumed by the ETL destination.

String

pluginIdentifier()

Provide the prefix for storage plugin urls.

void

renamePV(BasicContext context, String oldName, String newName)

Change the name of a PV.

boolean

runPostProcessors(String pvName, ArchDBRTypes dbrtype, ETLContext context)

Run the post processors associated with this plugin if any for this pv.

void

setBackupFilesBeforeETL(boolean backupFilesBeforeETL)

void

setDesc(String newDesc)

void

setGatherETLInPartitions(int gatherETLInPartitions)

void

setHoldETLForPartitions(int holdETLForPartitions)

The hold and gather are used to implement a high/low watermark for ETL.

void

setName(String name)

void

setPartitionGranularity(PartitionGranularity partitionGranularity)

void

setRootFolder(String rootFolder)

long

spaceConsumedByPV(String pvName)

Gets an estimate of the space consumed by this PV on this device.

String

toString()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- PlainStoragePlugin
  
  public PlainStoragePlugin(PlainFileHandler plainFileHandler)
- PlainStoragePlugin
  
  public PlainStoragePlugin(PlainStorageType plainStorageType)
Method Details
- getDataForPV
  
  public List<Callable<EventStream>> getDataForPV(BasicContext context, String pvName, Instant startTime, Instant endTime) throws IOException
  
  Throws:
  
  IOException
- getAllPathsForPV
  
  public Path[] getAllPathsForPV(BasicContext context, String pvName) throws IOException
  
  Throws:
  
  IOException
- getDataForPV
  
  public List<Callable<EventStream>> getDataForPV(BasicContext context, String pvName, Instant startTime, Instant endTime, PostProcessor postProcessor) throws IOException
  
  Specified by:
  
  getDataForPV in interface Reader
  
  Throws:
  
  IOException
- dataAtTime
  
  public Event dataAtTime(BasicContext context, String pvName, Instant atTime, Instant startAtTime, Period searchPeriod, BiDirectionalIterable.IterationDirection direction) throws IOException
  
  Description copied from interface: DataAtTime
  
  Generic method to iterate over the data for specified PV. We provide a time to start the iteration at (inclusive), a direction, and a Predicate. The plugin then iterates (forwards or backwards) and calls the Predicate for each sample. The iteration stops when the Predicate returns false or when we run out of samples. Iteration also stops when we get an exception, so this mode of traversal is vulnerable to corruption in the data. We may relax this constraint later by providing an optional boolean to ignore exceptions.
  
  Specified by:
  
  dataAtTime in interface DataAtTime
  
  Parameters:
  
  context -
  
  pvName - The PV name
  
  atTime - Time looking for.
  
  startAtTime - Time to start looking from.
  
  searchPeriod - An estimate of the amount of time we want to iterate. This is used to determine the appropriate chunks containing data at a very high level and thus to limit the search. The predicate itself is the one that controls when the iteration terminates. So, for example, you could specify a searchPeriod of 1 year but stop the iteration after 1 minute.
  
  Throws:
  
  IOException -
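
  The traversal described for dataAtTime (an inclusive start time, a direction, and a predicate that ends the walk) can be sketched as follows. This is a minimal illustration over an in-memory sorted list, not the plugin's chunk-based implementation; PredicateIterationSketch and iterate are hypothetical names.

  ```java
  import java.time.Instant;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Predicate;

  // Hypothetical stand-in for the traversal described above; it walks an
  // in-memory list rather than PB chunks.
  final class PredicateIterationSketch {

      // Returns the samples handed to the predicate, in visiting order.
      static List<Instant> iterate(List<Instant> sortedSamples, Instant startAt,
                                   boolean forwards, Predicate<Instant> keepGoing) {
          List<Instant> visited = new ArrayList<>();
          int step = forwards ? 1 : -1;
          int i = forwards ? 0 : sortedSamples.size() - 1;
          // Skip samples on the wrong side of the (inclusive) start time.
          while (i >= 0 && i < sortedSamples.size()
                  && (forwards ? sortedSamples.get(i).isBefore(startAt)
                               : sortedSamples.get(i).isAfter(startAt))) {
              i += step;
          }
          for (; i >= 0 && i < sortedSamples.size(); i += step) {
              Instant sample = sortedSamples.get(i);
              visited.add(sample);
              if (!keepGoing.test(sample)) {
                  break; // the predicate decides when iteration ends
              }
          }
          return visited;
      }
  }
  ```

  In this sketch the sample that makes the predicate return false is still visited, since the predicate has to see it in order to reject it.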
- appendData
  
  public int appendData(BasicContext context, String pvName, EventStream stream) throws IOException
  
  Specified by:
  
  appendData in interface Writer
  
  Throws:
  
  IOException
- appendToETLAppendData
  
  public boolean appendToETLAppendData(String pvName, EventStream stream, ETLContext context) throws IOException
  
  Description copied from interface: ETLDest
  
  This appends an EventStream to the ETL append data for a PV.
  
  Specified by:
  
  appendToETLAppendData in interface ETLDest
  
  Parameters:
  
  pvName - The name of PV.
  
  stream - The EventStream to append to the append data for a PV.
  
  context - ETLContext
  
  Returns:
  
  boolean True or False
  
  Throws:
  
  IOException -
- getDescription
  
  public String getDescription()
  
  Description copied from interface: StoragePlugin
  
  Get a string description of this plugin; one that can potentially be used in log messages and provide context.
  
  Specified by:
  
  getDescription in interface ETLDest
  
  Specified by:
  
  getDescription in interface ETLSource
  
  Specified by:
  
  getDescription in interface StoragePlugin
  
  Returns:
  
  description
- etlInfoListProcessor
  
  public ETLInfoListProcessor etlInfoListProcessor(ETLSource curETLSource)
  
  Specified by:
  
  etlInfoListProcessor in interface ETLDest
- initialize
  
  public void initialize(String configURL, ConfigService configService) throws IOException
  
  Description copied from interface: StoragePlugin
  Each storage plugin is registered to a URI scheme; for example, the PlainStoragePBPlugin uses pb:// as the scheme. Configuration for a storage plugin typically comes in as a URL-like URI.
  
  The config service identifies the storage plugin using the scheme ("pb" maps to PlainStoragePBPlugin)
  
  Creates an instance using the default constructor.
  
  Calls initialize with the complete URL.
  
  The storage plugin is expected to use the parameters in the URL to initialize itself.
  Specified by:
  
  initialize in interface StoragePlugin
  
  Parameters:
  
  configURL - The complete URL
  
  configService -
  
  Throws:
  
  IOException -
  
  See Also:
  
  StoragePluginURLParser
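
  The sequence described for initialize (look up the plugin by scheme, create an instance, pass it the complete URL) can be sketched with a small registry. Everything named here (SchemeRegistrySketch, SketchPlugin, SketchPbPlugin, create) is hypothetical and exists only for this illustration; the real lookup is done by the config service and StoragePluginURLParser, whose internals are not shown in this page.

  ```java
  import java.net.URI;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.function.Supplier;

  // Hypothetical registry illustrating scheme-based plugin lookup.
  final class SchemeRegistrySketch {

      interface SketchPlugin {
          void initialize(String configURL);
          String name();
      }

      // Stand-in plugin that only reads the mandatory name parameter.
      static final class SketchPbPlugin implements SketchPlugin {
          private String name;

          @Override
          public void initialize(String configURL) {
              String query = URI.create(configURL).getRawQuery();
              if (query == null) {
                  return;
              }
              for (String pair : query.split("&")) {
                  String[] kv = pair.split("=", 2);
                  if (kv[0].equals("name") && kv.length > 1) {
                      name = kv[1];
                  }
              }
          }

          @Override
          public String name() {
              return name;
          }
      }

      private static final Map<String, Supplier<SketchPlugin>> REGISTRY = new HashMap<>();

      static {
          REGISTRY.put("pb", SketchPbPlugin::new); // "pb" maps to the PB plugin
      }

      // Steps from the description: resolve scheme, instantiate, initialize.
      static SketchPlugin create(String configURL) {
          String scheme = URI.create(configURL).getScheme();
          Supplier<SketchPlugin> factory = REGISTRY.get(scheme);
          if (factory == null) {
              throw new IllegalArgumentException("No plugin registered for scheme: " + scheme);
          }
          SketchPlugin plugin = factory.get();
          plugin.initialize(configURL);
          return plugin;
      }
  }
  ```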
- getURLRepresentation
  
  public String getURLRepresentation()
  
  Return a URL representation of this plugin suitable for parsing by StoragePluginURLParser
  
  Returns:
  
  ret A URL representation
- getRootFolder
  
  public String getRootFolder()
- setRootFolder
  
  public void setRootFolder(String rootFolder) throws IOException
  
  Throws:
  
  IOException
- getDesc
  
  public String getDesc()
- setDesc
  
  public void setDesc(String newDesc)
- getPartitionGranularity
  
  public PartitionGranularity getPartitionGranularity()
  
  Specified by:
  
  getPartitionGranularity in interface ETLDest
  
  Specified by:
  
  getPartitionGranularity in interface ETLSource
- setPartitionGranularity
  
  public void setPartitionGranularity(PartitionGranularity partitionGranularity)
- getETLStreams
  
  public List<ETLInfo> getETLStreams(String pvName, Instant currentTime, ETLContext context) throws IOException
  
  Description copied from interface: ETLSource
  
  Given a pv and a time, this method returns all the streams that are ready for ETL. For example, if the partition granularity of a source is an hour, then this method returns all the streams that are in this source for the previous hours. Ideally, these streams must be closed for writing and should not change. The ETL process consolidates these streams into the ETL destination, which is expected to be at a longer time granularity.
  
  Specified by:
  
  getETLStreams in interface ETLSource
  
  Parameters:
  
  pvName - The name of PV.
  
  currentTime - The time that is being used as the cutoff. If we pass in a timestamp way out into the future, we should return all the streams available.
  
  context - ETLContext
  
  Returns:
  
  List ETLinfo
  
  Throws:
  
  IOException -
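
  For an hourly source, the cutoff rule in this description amounts to selecting partitions that start before the hour containing currentTime. The sketch below illustrates that rule with java.time only; ReadyPartitionsSketch, partitionStart and readyForETL are hypothetical names, and representing a stream by its partition start time is a simplification made for the example.

  ```java
  import java.time.Instant;
  import java.time.temporal.ChronoUnit;
  import java.util.List;
  import java.util.stream.Collectors;

  // Hypothetical illustration of the cutoff rule for an hourly partition.
  final class ReadyPartitionsSketch {

      // Start of the hourly partition that contains the given instant.
      static Instant partitionStart(Instant t) {
          return t.truncatedTo(ChronoUnit.HOURS);
      }

      // Partitions that began before the current hour are treated as closed.
      static List<Instant> readyForETL(List<Instant> partitionStarts, Instant currentTime) {
          Instant currentPartition = partitionStart(currentTime);
          return partitionStarts.stream()
                  .filter(start -> start.isBefore(currentPartition))
                  .collect(Collectors.toList());
      }
  }
  ```

  Passing a currentTime far in the future makes every partition start fall before the cutoff, which matches the note on the currentTime parameter.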
- getAllStreams
  
  public List<ETLInfo> getAllStreams(String pvName, ETLContext context) throws IOException
  
  Description copied from interface: ETLSource
  
  Given a pv and a time, this method returns all the streams.
  
  Specified by:
  
  getAllStreams in interface ETLSource
  
  Parameters:
  
  pvName - The name of PV.
  
  context - ETLContext
  
  Returns:
  
  List ETLinfo
  
  Throws:
  
  IOException -
- markForDeletion
  
  public void markForDeletion(ETLInfo info, ETLContext context)
  
  Description copied from interface: ETLSource
  
  Delete the ETLStream identified by info when you can, as it has already been consumed by the ETL destination. You can delete it later or immediately.
  
  Specified by:
  
  markForDeletion in interface ETLSource
  
  Parameters:
  
  info - ETLInfo
  
  context - ETLContext
- getLastKnownEvent
  
  public Event getLastKnownEvent(BasicContext context, String pvName) throws IOException
  
  Description copied from interface: Writer
  
  Gets the last known event in this destination. Future events will be appended to this destination only if their timestamp is more recent than the timestamp of this event. If there is no last known event, then a null is returned.
  
  Specified by:
  
  getLastKnownEvent in interface Writer
  
  Parameters:
  
  context -
  
  pvName - The PV name
  
  Returns:
  
  Event The last known event of pvName
  
  Throws:
  
  IOException -
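
  The append rule in this description (only events newer than the last known event are accepted, and a null last known event means nothing has been stored yet) can be sketched on bare timestamps. LastKnownEventSketch and appendable are hypothetical names; the sketch also assumes that each accepted event becomes the new last known event, which the description implies but does not state.

  ```java
  import java.time.Instant;
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical illustration of the "newer than last known event" rule.
  final class LastKnownEventSketch {

      // lastKnown may be null, meaning the destination holds no events yet.
      static List<Instant> appendable(Instant lastKnown, List<Instant> incoming) {
          List<Instant> accepted = new ArrayList<>();
          Instant last = lastKnown;
          for (Instant timestamp : incoming) {
              if (last == null || timestamp.isAfter(last)) {
                  accepted.add(timestamp);
                  last = timestamp; // assumption: the accepted event is now the last known one
              }
          }
          return accepted;
      }
  }
  ```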
- getFirstKnownEvent
  
  public Event getFirstKnownEvent(BasicContext context, String pvName) throws IOException
  
  Description copied from interface: Reader
  
  Get the first event for this PV. This call is used to optimize away calls to other readers that have older data.
  
  Specified by:
  
  getFirstKnownEvent in interface Reader
  
  Parameters:
  
  context -
  
  pvName - The PV name
  
  Returns:
  
  Event The first event of pvName
  
  Throws:
  
  IOException -
- commitETLAppendData
  
  public boolean commitETLAppendData(String pvName, ETLContext context) throws IOException
  
  Description copied from interface: ETLDest
  
  This concatenates the ETL append data for a PV with the PV's destination data.
  
  Specified by:
  
  commitETLAppendData in interface ETLDest
  
  Parameters:
  
  pvName - The name of PV.
  
  context - ETLContext
  
  Returns:
  
  boolean True or False
  
  Throws:
  
  IOException -
- runPostProcessors
  
  public boolean runPostProcessors(String pvName, ArchDBRTypes dbrtype, ETLContext context) throws IOException
  
  Description copied from interface: ETLDest
  
  Run the post processors associated with this plugin if any for this pv. The post processing is done after the commit and outside of the ETL transaction. This process is expected to catch up on previously missed/incomplete computation of cached post processing files. I can think of at least two use cases for this: one where we decide to go back and add a post processor for a pv, and one where we change the algorithm for the post processor and want to recompute all the cached files again.
  
  Specified by:
  
  runPostProcessors in interface ETLDest
  
  Parameters:
  
  pvName - The name of PV.
  
  dbrtype - ArchDBRTypes
  
  context - ETLContext
  
  Returns:
  
  boolean True or False
  
  Throws:
  
  IOException -
- setBackupFilesBeforeETL
  
  public void setBackupFilesBeforeETL(boolean backupFilesBeforeETL)
- setHoldETLForPartitions
  
  public void setHoldETLForPartitions(int holdETLForPartitions) throws IOException
  
  The hold and gather are used to implement a high/low watermark for ETL. ETL is skipped until the first known event in the partitions available for ETL is earlier than hold partitions. Once this is true, we then include in the ETL list all partitions whose first event is earlier than hold - gather partitions. For example, in a PARTITION_DAY, if you want to run ETL once every 7 days, but when you run you want to move 5 days worth of data to the dest, set hold to 7 and gather to 5. Hold and gather default to a scenario where we aggressively push data to the destination as soon as it is available.
  
  Throws:
  
  IOException
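
  The hold/gather watermark described above can be read as two cutoffs measured back from the current time. The following is one possible reading of that description, not the plugin's actual code: HoldGatherSketch and selectForETL are hypothetical names, partitions are represented only by the time of their first event, and the strict "earlier than" comparison at the boundaries is an assumption.

  ```java
  import java.time.Duration;
  import java.time.Instant;
  import java.util.List;
  import java.util.stream.Collectors;

  // Hypothetical reading of the hold/gather description above.
  final class HoldGatherSketch {

      static List<Instant> selectForETL(List<Instant> firstEventTimes, Instant now,
                                        Duration partition, int hold, int gather) {
          // High watermark: nothing moves until some first event is older than hold partitions.
          Instant holdCutoff = now.minus(partition.multipliedBy(hold));
          boolean triggered = firstEventTimes.stream().anyMatch(t -> t.isBefore(holdCutoff));
          if (!triggered) {
              return List.of();
          }
          // Low watermark: take everything older than (hold - gather) partitions.
          Instant gatherCutoff = now.minus(partition.multipliedBy((long) hold - gather));
          return firstEventTimes.stream()
                  .filter(t -> t.isBefore(gatherCutoff))
                  .sorted()
                  .collect(Collectors.toList());
      }
  }
  ```

  With hold and gather both 0, the two cutoffs collapse to the current time, so every partition whose first event lies in the past is selected. That corresponds to the default of pushing data out as soon as it is available.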
- setGatherETLInPartitions
  
  public void setGatherETLInPartitions(int gatherETLInPartitions) throws IOException
  
  Throws:
  
  IOException
- pluginIdentifier
  
  public String pluginIdentifier()
  
  Description copied from interface: StoragePlugin
  
  Provide the prefix for storage plugin urls.
  
  Specified by:
  
  pluginIdentifier in interface StoragePlugin
  
  Returns:
  
  String of the Storage Plugin Identifier
- getName
  
  public String getName()
  
  Description copied from interface: StoragePlugin
  
  Multiple PVs will probably use the same storage area and we identify the area using the name. This is principally used in capacity planning/load balancing to identify the storage area for the PV. We should make sure that storages with similar lifetimes have the same name in all the appliances. The name is also used to identify the storage in the storage report. For example, the PlainStoragePlugin takes a name parameter and we should use something like STS as the identity for the short term store in all the appliances.
  
  Specified by:
  
  getName in interface ETLDest
  
  Specified by:
  
  getName in interface ETLSource
  
  Specified by:
  
  getName in interface StorageMetrics
  
  Specified by:
  
  getName in interface StoragePlugin
  
  Returns:
  
  name
- setName
  
  public void setName(String name)
- getTotalSpace
  
  public long getTotalSpace(StorageMetricsContext storageMetricsContext) throws IOException
  
  Description copied from interface: StorageMetrics
  
  Gets the total space left on this device.
  
  Specified by:
  
  getTotalSpace in interface StorageMetrics
  
  Parameters:
  
  storageMetricsContext - StorageMetricsContext
  
  Returns:
  
  getTotalSpac
  
  Throws:
  
  IOException -
- getUsableSpace
  
  public long getUsableSpace(StorageMetricsContext storageMetricsContext) throws IOException
  
  Description copied from interface: StorageMetrics
  
  Gets the space available to this VM on this device
  
  Specified by:
  
  getUsableSpace in interface StorageMetrics
  
  Parameters:
  
  storageMetricsContext - StorageMetricsContext
  
  Returns:
  
  getUsableSpace
  
  Throws:
  
  IOException -
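
  For reference, the JDK exposes comparable figures through java.nio.file.FileStore. The sketch below reads them for a folder; whether PlainStoragePlugin obtains its numbers this way is an assumption, and SpaceMetricsSketch is a hypothetical name.

  ```java
  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;

  // Hypothetical illustration using the JDK FileStore API.
  final class SpaceMetricsSketch {

      // Size in bytes of the file store that holds the folder.
      static long totalSpace(Path folder) throws IOException {
          return Files.getFileStore(folder).getTotalSpace();
      }

      // Bytes available to this Java virtual machine on that file store.
      static long usableSpace(Path folder) throws IOException {
          return Files.getFileStore(folder).getUsableSpace();
      }

      public static void main(String[] args) throws IOException {
          Path folder = Paths.get(System.getProperty("java.io.tmpdir"));
          System.out.println("total bytes:  " + totalSpace(folder));
          System.out.println("usable bytes: " + usableSpace(folder));
      }
  }
  ```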
- spaceConsumedByPV
  
  public long spaceConsumedByPV(String pvName) throws IOException
  
  Description copied from interface: StorageMetrics
  
  Gets an estimate of the space consumed by this PV on this device.
  
  Specified by:
  
  spaceConsumedByPV in interface StorageMetrics
  
  Parameters:
  
  pvName - The name of PV.
  
  Returns:
  
  spaceConsumedByPV
  
  Throws:
  
  IOException -
- consolidateOnShutdown
  
  public boolean consolidateOnShutdown()
  
  Description copied from interface: ETLSource
  
  Should ETL move data from this source to the destination on shutdown. For example, if you are using a ramdisk for the STS and you have a UPS, you can minimize any data loss by turning this bit on for data stores that are on the ramdisk. On shutdown, ETL will try to move the data out of this store into the next lifetime store.
  
  Specified by:
  
  consolidateOnShutdown in interface ETLSource
  
  Returns:
  
  boolean True or False
- renamePV
  
  public void renamePV(BasicContext context, String oldName, String newName) throws IOException
  
  Description copied from interface: StoragePlugin
  
  Change the name of a PV. This happens occasionally in the EPICS world when people change the names of PVs but want to retain the data. This method is used to change the name of the PV in any of the datasets for PV oldName. For example, in PB files, the name of the PV is encoded in the file names and is also stored in the header. In this case, we expect the plugin to move the data to new file names and change the PV name in the file header. To avoid getting into issues about data changing when renaming files, the PV can be assumed to be in a paused state.
  
  Specified by:
  
  renamePV in interface StoragePlugin
  
  Parameters:
  
  context -
  
  oldName - The old PV name
  
  newName - The new PV name
  
  Throws:
  
  IOException -
- convert
  
  public void convert(BasicContext context, String pvName, ConversionFunction conversionFunction) throws IOException
  
  Description copied from interface: StoragePlugin
  
  Sometimes, PVs change types, EGUs etc. In these cases, we are left with the problem of what to do with the already archived data. We can rename the PV to a new but related name - this keeps the existing data as is. Or, we can attempt to convert to the new type, EGU etc. This method can be used to convert the existing data using the supplied conversion function. Conversions should be all or nothing; that is, first convert all the streams into temporary chunks and then do a bulk rename once all the conversions have succeeded. Note that we'll also be using the same conversion mechanism for imports and other functions that change data. So, when/if implementing the conversion function, make sure we respect the typical expectations within the archiver - monotonically increasing timestamps and so on. To avoid getting into issues about data changing when converting, the PV can be assumed to be in a paused state.
  
  Specified by:
  
  convert in interface StoragePlugin
  
  Parameters:
  
  context -
  
  pvName - The PV name
  
  conversionFunction -
  
  Throws:
  
  IOException -
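
  The all-or-nothing pattern described for convert (write every converted stream to a temporary chunk first, then rename in bulk once all conversions have succeeded) can be sketched with java.nio.file. This is an illustration under stated assumptions: AllOrNothingConvertSketch and convertAll are hypothetical names, a UnaryOperator over raw bytes stands in for ConversionFunction, and the ".converting" suffix is invented for the example.

  ```java
  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.StandardCopyOption;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.function.UnaryOperator;

  // Hypothetical illustration of convert-to-temp followed by bulk rename.
  final class AllOrNothingConvertSketch {

      static void convertAll(List<Path> chunks, UnaryOperator<byte[]> conversion) throws IOException {
          Map<Path, Path> tempByOriginal = new LinkedHashMap<>();
          try {
              // Phase 1: convert every chunk into a sibling temporary file.
              for (Path chunk : chunks) {
                  Path temp = chunk.resolveSibling(chunk.getFileName() + ".converting");
                  tempByOriginal.put(chunk, temp);
                  Files.write(temp, conversion.apply(Files.readAllBytes(chunk)));
              }
          } catch (IOException | RuntimeException e) {
              // Any failure: discard the temporaries and leave the originals untouched.
              for (Path temp : tempByOriginal.values()) {
                  Files.deleteIfExists(temp);
              }
              throw e;
          }
          // Phase 2: all conversions succeeded, so swap the temporaries in.
          // Note: the renames happen one file at a time, not as a single atomic step.
          for (Map.Entry<Path, Path> entry : tempByOriginal.entrySet()) {
              Files.move(entry.getValue(), entry.getKey(), StandardCopyOption.REPLACE_EXISTING);
          }
      }
  }
  ```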
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- getExtensionString
  
  public String getExtensionString()
- getPluginIdentifier
  
  public String getPluginIdentifier()
- getPlainFileHandler
  
  public PlainFileHandler getPlainFileHandler()
- fileInfo
  
  public FileInfo fileInfo(Path path) throws IOException
  
  Throws:
  
  IOException
- getPathResolver
  
  public PathResolver getPathResolver()
