Purpose

Optimize the Enterprise Digital Asset Network (EDAN)/Image Delivery Service (IDS) pipeline for eMammal. This pipeline sends selected data and images from eMammal to EDAN/IDS: data goes to EDAN, and images go to IDS along with a manifest. Optimization will consist of parallelizing the data sent to EDAN and IDS, and coordinating with OCIO to determine and implement any recommendations for optimizing the data streams sent from Sidora to EDAN/IDS.

OCIO Collaboration

Based on a meeting with OCIO, one of the main takeaways for sending images and manifests to IDS is to structure the requests so that the images are sent sequentially, with the manifest XML sent at the very end of the thread. This allows multiple images to be sent to IDS in parallel without requiring associated XML to be sent with each image file; the associated metadata is instead sent as an array of JSON objects at the end of the batch of images. Action items from the meeting for further research and development are:

EDAN Follow-up Items:

  • Andrew will review the current state of HTTP bulk loading in EDAN.
  • Jonny will create documentation for HTTP bulk loading in EDAN.
  • Jonny will review the eMammal Image JSON schema and provide it to Jason for integration in Sidora.

IDS Follow-up Items:

  • Jason will investigate uploading images in parallel to the NFS share.
  • Sidora will create the loader XML after uploading all of the images.
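
The ordering in the IDS follow-up items (upload all images first, create the loader XML afterwards) can be sketched as follows. This is a minimal illustration in Python, not Sidora code: the function name `upload_images`, the worker count, and the use of a temporary directory in place of the NFS share are all assumptions made for the example.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def upload_images(image_paths, share_dir, max_workers=4):
    """Copy every image into share_dir in parallel.

    Returns the destination paths in the same order as image_paths.
    The function only returns once every copy has finished, so a caller
    can safely create the loader XML after it returns.
    """
    share_dir = Path(share_dir)

    def copy_one(src):
        # Hypothetical stand-in for writing one image to the NFS share.
        return Path(shutil.copy2(src, share_dir / Path(src).name))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the result iterator, so any copy error is raised here.
        return list(pool.map(copy_one, image_paths))


# Demonstration with placeholder files; a temporary directory stands in for the share.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp, "source")
    share = Path(tmp, "share")
    source_dir.mkdir()
    share.mkdir()

    images = []
    for i in range(3):
        path = source_dir / f"image_{i}.jpg"
        path.write_bytes(b"placeholder image bytes")
        images.append(path)

    uploaded = upload_images(images, share)
    uploaded_names = [p.name for p in uploaded]
    # Only at this point, with all uploads complete, would the loader XML be written.
```

Whether parallel writes actually improve throughput on the NFS share is the open question in Jason's action item; the sketch only shows how the "all images, then manifest" ordering can be kept while the copies run concurrently.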

Technical Approach

EDAN

  • Parallelize the data streams, <insert config file/update>
  • Utilize EDAN Bulk Loading:
    • Add a Boolean flag that sets whether the call to EDAN is a bulk load
    • The bulk load payload is built by iterating through all deployment objects and constructing a single JSON array of objects instead of individual JSON objects
    • This toggle depends on EDAN being able to receive bulk uploads; we will coordinate testing with Andrew and Jonny when the EDAN functionality is ready
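
The bulk load toggle described above might look roughly like the following. This is a sketch under assumptions: the function name `build_edan_payloads` and the deployment fields are hypothetical, and the actual EDAN bulk loading request format is not defined in this document.

```python
import json


def build_edan_payloads(deployments, bulk_load):
    """Return the JSON request bodies to send to EDAN.

    bulk_load=True  -> one body holding a JSON array of all deployment objects
    bulk_load=False -> one body per deployment object
    """
    if not deployments:
        return []
    if bulk_load:
        return [json.dumps(deployments)]
    return [json.dumps(deployment) for deployment in deployments]


# Hypothetical deployment objects; the field names are illustrative only.
deployments = [
    {"deployment_id": "d1", "image_count": 2},
    {"deployment_id": "d2", "image_count": 5},
]

bulk_payloads = build_edan_payloads(deployments, bulk_load=True)
single_payloads = build_edan_payloads(deployments, bulk_load=False)
```

Keeping both branches behind one flag lets the pipeline fall back to individual calls until EDAN can receive bulk uploads.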

IDS

  • Dev team will split this route - seda:idsAssetUpdate - into two: seda:idsAssetImageUpdate and seda:idsAssetXMLWriter
  • In the seda:idsAssetImageUpdate route, we will split on the header idsAssetList. Each idsAsset in the list will be put on the body of its own exchange, which will be processed by the Camel code. These exchanges will be processed in parallel. From there, we will follow the same process as the original seda:idsAssetUpdate route, changing where the values are pulled from to match the new routes.
  • In the seda:idsAssetImageUpdate route we will call the seda:idsAssetXMLWriter route, and modify the idsAsset class to have a string PID field. 
  • We will also modify the unit test for idsAssetUpdate to properly call and test this modified route. 

The gains of employing this approach will be: 

  1. Use the idsAsset class rather than setting headers with idsAsset data
  2. Handle a list of IDS assets rather than a single asset at a time 
  3. Parallelize processing of IDS assets
  4. Divide the original route such that one route handles image updates and one handles writing asset XML
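
The split described above (one step per asset, processed in parallel, followed by a single XML-writing step) can be modeled outside Camel as below. This is an illustrative Python sketch, not the Camel route definitions: `image_update` and `write_asset_xml` are hypothetical stand-ins for seda:idsAssetImageUpdate and seda:idsAssetXMLWriter, and the XML element names and the `image_name` field are assumptions, since the IDS asset XML format is not given here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass
class IdsAsset:
    pid: str          # the string PID field added to the idsAsset class
    image_name: str   # hypothetical field, for illustration only


def image_update(asset):
    """Stand-in for the per-asset work in seda:idsAssetImageUpdate."""
    # A real route would send the image to IDS here; the sketch just passes it on.
    return asset


def write_asset_xml(assets):
    """Stand-in for seda:idsAssetXMLWriter; element names are assumed."""
    root = ET.Element("Assets")
    for asset in assets:
        element = ET.SubElement(root, "Asset", {"pid": asset.pid})
        element.text = asset.image_name
    return ET.tostring(root, encoding="unicode")


def process_asset_list(ids_asset_list, max_workers=4):
    """Process each asset in parallel, then write the XML once for the whole list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the XML lists assets as received.
        processed = list(pool.map(image_update, ids_asset_list))
    return write_asset_xml(processed)


assets = [IdsAsset("pid:1", "a.jpg"), IdsAsset("pid:2", "b.jpg")]
asset_xml = process_asset_list(assets)
```

Passing whole `IdsAsset` objects between the two steps mirrors the first gain listed above: the data travels with the asset rather than in separate headers.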


Draft diagram of logic for processing the IDS assets and generating the XML to send to IDS:
