Default ingest actions

During ingest, all files in the request are analyzed, and different actions are performed based on that analysis. The following file types are detected:

  • Based on ffprobe output, if there are video streams in the file, it is categorized as VIDEO
  • Based on ffprobe output, if there are audio streams in the file, it is categorized as AUDIO
  • Based on ffprobe output, if there are subtitle streams in the file, it is categorized as SUBTITLE
  • If the mime type is text/plain, the file is parsed and categorized accordingly:
      • If the file starts with Scenarist_SCC V1.0, it is categorized as SUBTITLE, with subtitle format scc
      • If the first line of the file contains the words Lambda, V4 and DF, it is categorized as SUBTITLE, with subtitle format cap
      • If the file starts with WEBVTT, it is categorized as SUBTITLE, with subtitle format webvtt
      • If the file starts with #EXTM3U, it is categorized as VIDEO, with video format hls
  • Else if the mime type is application/stl, it is categorized as SUBTITLE, with subtitle format stl
  • If the mime type is xml, the file is parsed and categorized accordingly:
      • If the start tag is taskReport, it is categorized as BATON
      • If the start tag is MPD, it is categorized as VIDEO, with video format dash
      • If the start tag is tt, it is categorized as SUBTITLE, with subtitle format ttml
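
The content-based rules for text/plain and xml files can be sketched roughly as follows. This is only an illustration of the rules listed above, not the actual AV implementation: the function names are hypothetical, the "contains the words" check is assumed to be a plain substring test, and XML namespaces are assumed to be ignored when comparing the start tag.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def classify_text(content: str) -> Optional[Tuple[str, str]]:
    """Return (category, format) for a text/plain file, or None if unknown."""
    first_line = content.splitlines()[0] if content else ""
    if content.startswith("Scenarist_SCC V1.0"):
        return ("SUBTITLE", "scc")
    # Assumption: "contains the words" is treated as a substring check.
    if all(word in first_line for word in ("Lambda", "V4", "DF")):
        return ("SUBTITLE", "cap")
    if content.startswith("WEBVTT"):
        return ("SUBTITLE", "webvtt")
    if content.startswith("#EXTM3U"):
        return ("VIDEO", "hls")
    return None


def classify_xml(content: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (category, format) for an xml file based on its start tag."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return None
    # Assumption: drop a "{namespace}" prefix before comparing the tag name.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag == "taskReport":
        return ("BATON", None)
    if tag == "MPD":
        return ("VIDEO", "dash")
    if tag == "tt":
        return ("SUBTITLE", "ttml")
    return None
```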

Default actions for files categorized as VIDEO:

  • Extract sprite map for thumbnails
  • Extract a single representative poster
  • Waveform analysis
  • The video file parameters are checked to determine whether the file can be played accurately in a browser. If not, a transcode job is started to create a proxy that can be played in browsers. The parameters checked are:
      • Container format is mp4 or mov
      • Video stream format is h264
      • Audio format is aac
      • If any of the parameter checks above fail, a proxy will be created
  • If the file has more than one audio stream, audio extraction will be performed
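
As a rough illustration of the proxy and audio extraction decisions above, the checks could be expressed like this. The ProbeSummary type and the function names are hypothetical and not part of AV. The sketch also assumes that every audio stream must be aac, and that a file without audio streams passes the audio check. The text above does not state either point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeSummary:
    """Hypothetical summary of the ffprobe fields relevant to the checks."""
    container: str           # e.g. "mp4", "mov", "mkv"
    video_codec: str         # e.g. "h264"
    audio_codecs: List[str]  # one entry per audio stream


def needs_proxy(probe: ProbeSummary) -> bool:
    """True if any of the browser-playability checks fails."""
    container_ok = probe.container in {"mp4", "mov"}
    video_ok = probe.video_codec == "h264"
    # Assumption: all audio streams must be aac; no audio streams passes.
    audio_ok = all(codec == "aac" for codec in probe.audio_codecs)
    return not (container_ok and video_ok and audio_ok)


def needs_audio_extraction(probe: ProbeSummary) -> bool:
    """True if the file has more than one audio stream."""
    return len(probe.audio_codecs) > 1
```

For example, an mp4 file with h264 video and a single aac stream would need neither a proxy nor audio extraction under these assumptions, while a mov file with prores video would get a proxy.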

Default actions for files categorized as SUBTITLE:

  • Subtitle cues will be extracted as timespans on the asset. Supported formats are:
      • ttml
      • webvtt
      • scc
      • cap
      • srt
      • stl

Files categorized as BATON will be parsed, and events will be added as timespans on the asset.

Manifest ingest

For manual ingest, it is fine for the user to select all the files that should be ingested as a single asset. For automatic ingest, however, a mechanism is needed for defining which files belong together in an asset. In AV, this is done by creating manifest files that contain a definition of an asset, complete with metadata and files. The format of this manifest is defined by AssetInputDto. This is the same format the frontend uses for manual ingest, but in this scenario the data is provided as a JSON file on a storage.

Let's look at an example of a manifest:

{
  "metadata": [
    {
      "key": "title",
      "value": "This is my awesome asset"
    },
    {
      "key": "source",
      "value": "Netflix"
    },
    {
      "key": "production_date",
      "value": "2019-10-15"
    }
  ],
  "files": [
    {
      "fileName": "my_awesome_video.mp4"
    },
    {
      "fileName": "/subtitles/my_awesome_video/english.vtt",
      "metadata": [
        {
          "key": "language",
          "value": "en-US"
        }
      ]
    },
    {
      "fileName": "/subtitles/my_awesome_video/croatian.vtt",
      "metadata": [
        {
          "key": "language",
          "value": "hr-HR"
        }
      ]
    },
    {
      "fileName": "audio/my_awesome_video/croatian.wav",
      "metadata": [
        {
          "key": "language",
          "value": "hr-HR"
        }
      ]
    },
    {
      "fileName": "audio/my_awesome_video/french.wav",
      "metadata": [
        {
          "key": "language",
          "value": "fr-FR"
        }
      ]
    },
    {
      "type": "MARKER",
      "fileName": "marker-import.csv",
      "container": {
        "format": "csv"
      },
      "metadata": [
        {
          "key": "timespan_type",
          "value": "Manual"
        }
      ]
    },
    {
      "type": "BATON",
      "fileName": "my_awesome_baton.xml"
    }
  ]
}

Let's start with metadata. These entries are simply set as metadata fields on the resulting asset. The title field should always be set; otherwise, finding the asset might be difficult for users.

Next, the list of files to ingest into the asset is defined. In this ingest the following files are defined:

  • A video file my_awesome_video.mp4. As the filename does not start with /, this file is assumed to be located in the same folder as the manifest file.
  • Subtitles in VTT format (English and Croatian). Since these filenames start with a /, they are assumed to be referenced from the root of the storage where the manifest is located.
  • External audio files (Croatian and French). These filenames do not start with a /, but reference subdirectories, and as such are assumed to be in subfolders of the folder where the manifest is located.
  • Markers to import (marker-import.csv). This entry can be used to ingest markers into the asset. The format of the marker file must be supplied in the container.format field. This example also sets the timespan type in the manifest file instead of inside the marker file.
  • A baton file is also imported here.
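
The path rules described above can be sketched as a small helper. The function name is hypothetical, and the sketch assumes that storage paths use forward slashes and are expressed relative to the storage root. It only illustrates the stated rules and does not show how AV resolves paths internally.

```python
import posixpath


def resolve_manifest_file(manifest_path: str, file_name: str) -> str:
    """Resolve a manifest fileName to a path relative to the storage root.

    manifest_path is the manifest's own path, relative to the storage root.
    """
    if file_name.startswith("/"):
        # A leading slash means "from the root of the storage".
        return file_name.lstrip("/")
    # Otherwise the name is relative to the folder containing the manifest.
    manifest_dir = posixpath.dirname(manifest_path)
    return posixpath.normpath(posixpath.join(manifest_dir, file_name))
```

With a manifest stored at incoming/show/manifest.json, the fileName my_awesome_video.mp4 would resolve to incoming/show/my_awesome_video.mp4, while /subtitles/my_awesome_video/english.vtt would resolve to subtitles/my_awesome_video/english.vtt at the storage root.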

Customer specific manifest file formats

In addition to the supported base format, customer specific file formats can also be supported. To set this up, the following needs to be configured on the runner:

  • Configuration field av.runner.manifest_reader_location must be set to a directory
  • In the directory defined above, a script should be added that takes the incoming file and outputs a json file with the base format
  • In the ingest job, the metadata field manifest_format should be set either to the filename of the script, or to a format specifier X, in which case the runner configuration field av.runner.manifest_reader.X should be set to the filename of the script

If these conditions are met, the given script is run during ingest as script input_file output_file, and it must write a valid JSON file in the base format to the given output file. The input file is the manifest file being ingested, downloaded to local disk, so the script does not need to support fetching the file from a remote location.
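
A manifest reader script could look roughly like the sketch below. The customer input format here (one key=value pair per line, with file= lines listing the files) is entirely made up for illustration. Only the calling convention (input file and output file as arguments) and the base output format come from the description above.

```python
#!/usr/bin/env python3
import json
import sys


def convert(lines):
    """Convert hypothetical key=value lines into the base manifest format."""
    metadata = []
    files = []
    for raw in lines:
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        if key == "file":
            files.append({"fileName": value})
        else:
            metadata.append({"key": key, "value": value})
    return {"metadata": metadata, "files": files}


def main(argv):
    if len(argv) != 3:
        print("usage: script input_file output_file", file=sys.stderr)
        return 2
    input_file, output_file = argv[1], argv[2]
    # The input is already on local disk, so plain file access is enough.
    with open(input_file, encoding="utf-8") as src:
        manifest = convert(src)
    with open(output_file, "w", encoding="utf-8") as dst:
        json.dump(manifest, dst, indent=2)
    return 0


# Only act as a script when arguments were actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

A non-zero exit code is used here to signal a usage error. How the runner treats the exit code is not specified above, so that part is an assumption.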

Automatic ingest

Automatic ingest is configured on a per-storage basis. There are eight configuration fields that define the behavior of automatic ingest:

  • auto_ingest:enabled - set to true if automatic ingest should be performed
  • auto_ingest:template_id - the job template to use for ingest from this storage, defaults to ingest
  • auto_ingest:include - multivalued metadata field that is used to only ingest files that match the given glob
  • auto_ingest:exclude - multivalued metadata field that is used to exclude files from ingest that match the given glob
  • auto_ingest:manifest:enabled - if set to true, all files that are matched by the inclusion/exclusion rules are assumed to be manifest files
  • auto_ingest:manifest:format - the format of the manifest file, only used for customer specific manifest file formats
  • auto_ingest:manifest:include - multivalued metadata field that is used to only ingest files as manifests that match the given glob
  • auto_ingest:manifest:exclude - multivalued metadata field that is used to exclude files from manifest ingest that match the given glob

Inclusion/exclusion rules

If automatic ingest is enabled on a storage without specifying which files should be ingested, all files will be ingested, with one ingest job per file. To limit which files are ingested, inclusion and exclusion rules can be defined. Both are multivalued, so multiple filename patterns can be added to both the inclusion and the exclusion rules. Exclusion rules take precedence over inclusion rules: if a file matches an exclusion rule, it will not be ingested automatically.

These rules use the glob format, for example:

  • auto_ingest:include=**/manifests/*.json
  • auto_ingest:exclude=backup/**

This will only try to ingest files ending with .json that are located in manifests directories, except for files located below the root backup/ directory.
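
The way inclusion and exclusion rules combine can be sketched as follows. The glob dialect AV uses is not specified here, so this sketch assumes a common one: * matches within a single path segment, ** matches across directories, and a leading **/ also matches files directly at the root. The function names are hypothetical.

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex (assumed dialect, see lead-in)."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")        # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")      # a single non-slash character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def should_ingest(path: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Apply the rules: exclusion wins, empty inclusion means all files."""
    if any(glob_to_regex(p).match(path) for p in exclude):
        return False
    include = list(include)
    if not include:
        return True
    return any(glob_to_regex(p).match(path) for p in include)
```

With the example rules above, incoming/manifests/a.json would be ingested, while backup/manifests/a.json and incoming/manifests/a.xml would not.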

The manifest inclusion/exclusion rules work the same way. They are completely separate from the basic inclusion/exclusion rules and overrule them.