Generic File Driver (GFD) for IDM, v 1.1

By: scauwe

Introduction



Although I have never needed it myself, importing images seems to be a frequently requested feature in IDM. The Generic File Driver could not do this out of the box, although Joakim Ganse managed it by combining the GFD with some ECMAScript. The final trigger to implement the feature was a forum post requesting it, along with some encouragement from Lothar.

New Features



The ImageFileReader is not the only new feature. Version 1.1 of the Generic File Driver shim contains the following new features:


  • ImageFileReader: read an image file.

  • RawFileReader: read an arbitrary file (binary or text).

  • Mark attributes as sensitive on the publisher channel.

  • CSVWriter: added support for quoted CSV.



ImageFileReader



Java has built-in support for reading image files. The supported file types depend on the configuration (and version) of your JVM, but typically include png, jpg, bmp and gif. Image reading (and writing) can be extended via Java's SPI infrastructure. An example (not tested) is TwelveMonkeys ImageIO on GitHub.
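To check which formats a given JVM can actually handle, you can ask ImageIO directly. The following is a minimal sketch using only the standard javax.imageio API; the class and method names (SupportedImageFormats, readableFormats, writableFormats) are made up for this illustration and are not part of the GFD.

```java
import javax.imageio.ImageIO;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of the Generic File Driver.
final class SupportedImageFormats {

    /** Lower-cased, de-duplicated format names for which an ImageReader is installed. */
    static SortedSet<String> readableFormats() {
        return normalize(ImageIO.getReaderFormatNames());
    }

    /** Lower-cased, de-duplicated format names for which an ImageWriter is installed. */
    static SortedSet<String> writableFormats() {
        return normalize(ImageIO.getWriterFormatNames());
    }

    private static SortedSet<String> normalize(String[] names) {
        // ImageIO reports names in several spellings ("png", "PNG"), so fold them together.
        SortedSet<String> result = new TreeSet<>();
        for (String name : names) {
            result.add(name.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Readable: " + readableFormats());
        System.out.println("Writable: " + writableFormats());
    }
}
```

Running this on the JVM that hosts the shim (engine or remote loader) shows whether an SPI plugin you added was picked up.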

The ImageFileReader supports the following features:


  • Resizing: it is possible to resize the image to a fixed width and height (not keeping the aspect ratio) or to a maximum width and height (keeping the aspect ratio).

  • Transcoding: it is possible to transcode the image, as far as the installed ImageReaders and ImageWriters support this. Not all transcodings are supported, so use this with care. Transcoding depends not only on the source and destination format (e.g. jpeg or png), but also on the color model, transparency, compression, etc. used in the given image format.
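The two features above can be sketched with plain Java 2D and ImageIO. This is an illustration under stated assumptions, not the shim's actual code: the class ImageResizeSketch and its methods are hypothetical, the "maximum" mode is assumed never to upscale, and the output is forced to RGB (dropping transparency) to keep JPEG output simple.

```java
import javax.imageio.ImageIO;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of the Generic File Driver.
final class ImageResizeSketch {

    /** Largest {width, height} that fits in maxW x maxH while keeping the aspect ratio. */
    static int[] fitWithin(int srcW, int srcH, int maxW, int maxH) {
        // Assumption: images that are already small enough are left at their size.
        double scale = Math.min(1.0, Math.min((double) maxW / srcW, (double) maxH / srcH));
        int w = Math.max(1, (int) Math.round(srcW * scale));
        int h = Math.max(1, (int) Math.round(srcH * scale));
        return new int[] {w, h};
    }

    /** Scales to exactly width x height; the aspect ratio is only kept if the caller keeps it. */
    static BufferedImage scale(BufferedImage src, int width, int height) {
        // TYPE_INT_RGB drops any alpha channel (assumption made for this sketch).
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    /** Transcodes by writing the image in the requested format, e.g. "png" or "jpg". */
    static byte[] encode(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no installed writer accepts this image/format pair.
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No ImageWriter for format: " + format);
        }
        return out.toByteArray();
    }
}
```

Checking the return value of ImageIO.write matters here: an unsupported transcoding does not necessarily throw, it can simply write nothing.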



When an image is read, the shim will base64-encode the data (after resizing and transcoding) and submit it on the publisher channel. The publisher channel will receive the following predefined attributes:


  • Properties of the image submitted on the publisher channel:

    • imageBytes: base64 encoded image bytes

    • imageHeight: the height of the image after resizing

    • imageWidth: the width of the image after resizing

    • imageFormat: the format of the image (after transcoding)



  • Properties of the original image:

    • srcHeight: the height of the original image on file

    • srcWidth: the width of the original image on file

    • srcFormat: the format of the original image on file





On top of these, you can still configure the publisher to submit file name, path, etc as required. The name of the image file is typically used to somehow match this image to a user.
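As a rough illustration of how these attributes relate to each other, the sketch below builds the predefined attribute set as a simple map and derives a matching key from the file name. The attribute names come from the list above; everything else (the class ImageAttributesSketch, the map representation, and the rule that "jdoe.jpg" belongs to user "jdoe") is an assumption made for this example. In a real driver the matching would normally be done in policy.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Generic File Driver.
final class ImageAttributesSketch {

    /** Collects the predefined attributes for one image, in the order listed in the article. */
    static Map<String, String> toAttributes(byte[] encodedImage, String imageFormat,
                                            int imageWidth, int imageHeight,
                                            String srcFormat, int srcWidth, int srcHeight) {
        Map<String, String> attrs = new LinkedHashMap<>();
        // Values after resizing and transcoding.
        attrs.put("imageBytes", Base64.getEncoder().encodeToString(encodedImage));
        attrs.put("imageHeight", Integer.toString(imageHeight));
        attrs.put("imageWidth", Integer.toString(imageWidth));
        attrs.put("imageFormat", imageFormat);
        // Values of the original file.
        attrs.put("srcHeight", Integer.toString(srcHeight));
        attrs.put("srcWidth", Integer.toString(srcWidth));
        attrs.put("srcFormat", srcFormat);
        return attrs;
    }

    /** Assumed naming convention: the file name without extension identifies the user. */
    static String userKeyFromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```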

Memory might be an issue when reading big images: an image might be held in memory up to three times while resizing and transcoding. In a typical IDM scenario, where e.g. relatively small profile photos are uploaded, this should not pose any problems. If it does become a problem, run the shim in a remote loader and increase the available memory.

RawFileReader



The RawFileReader reads any file as one (big) attribute value. Files can be either binary or text. Binary files are base64 encoded before being submitted on the publisher channel. Text files are read as text (using a given encoding) and submitted as is.

Since the complete file will be held in memory (both in the shim and in the IDM policies), keep an eye on memory consumption. As a safeguard, a maximum file size can be defined.

The RawFileReader generates the following predefined fields:


  • rawData: base64 encoded (in case of binary) or text content of the file
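The behaviour described in this section (size guard first, then base64 for binary or decoded text otherwise) can be sketched as follows. The class RawFileSketch, the method readRawData and its parameters are hypothetical names for this illustration; the actual shim parameters may be named and behave differently.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the Generic File Driver.
final class RawFileSketch {

    /**
     * Returns the value for the rawData attribute: base64 for binary files,
     * decoded text (using the given charset) for text files.
     */
    static String readRawData(Path file, long maxBytes, boolean binary, Charset charset)
            throws IOException {
        // Check the size before reading so an oversized file never reaches memory.
        long size = Files.size(file);
        if (size > maxBytes) {
            throw new IOException("File " + file + " is " + size
                    + " bytes, which exceeds the limit of " + maxBytes);
        }
        byte[] bytes = Files.readAllBytes(file);
        return binary
                ? Base64.getEncoder().encodeToString(bytes)
                : new String(bytes, charset);
    }
}
```

Note that base64 inflates the data by roughly a third, so the value seen by the policies is larger than the file on disk.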



Sensitive attributes



The primary reason for adding sensitive attributes is not security (CSV files are probably not the best method to transport sensitive data), but to prevent the above readers (ImageFileReader and RawFileReader) from cluttering up the log files. Image files and arbitrary files might generate log files with very large XDS documents. Marking the rawData or imageBytes attribute as sensitive will replace the actual content with <!--content suppressed --> in the logs. This makes the log files readable again.
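The effect on the logs can be pictured with a small stand-alone sketch. This is not how the IDM engine implements suppression; it only mimics the visible result on a plain map of attribute values. The class SensitiveLogSketch and the method forLogging are invented for this example; the placeholder text is the one mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the engine's or the shim's actual code.
final class SensitiveLogSketch {

    static final String SUPPRESSED = "<!--content suppressed -->";

    /** Returns a copy of the attributes that is safe to log; the input map is not modified. */
    static Map<String, String> forLogging(Map<String, String> attrs, Set<String> sensitiveNames) {
        Map<String, String> loggable = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attrs.entrySet()) {
            boolean sensitive = sensitiveNames.contains(entry.getKey());
            loggable.put(entry.getKey(), sensitive ? SUPPRESSED : entry.getValue());
        }
        return loggable;
    }
}
```

Only the logged copy changes; the value that travels on the publisher channel stays intact.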

Quoted CSVs



Although the CSV standard defines quotes around fields as optional, some systems require them around every field. The CSV writer did not support this: up to and including version 1.0, it only added quotes when needed (e.g. when the value contains quotes or a newline). As of 1.1, you can opt to quote all fields. This feature is disabled by default.
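The difference between the two modes can be sketched as below. This follows the usual CSV convention of doubling embedded quotes; the class CsvQuoteSketch, its methods and the quoteAll flag are hypothetical names for this illustration and do not reflect the actual CSVWriter parameters.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Generic File Driver.
final class CsvQuoteSketch {

    /** Wraps a value in double quotes, doubling any quote characters inside it. */
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** True when the value would be ambiguous without quotes. */
    static boolean needsQuotes(String value, char separator) {
        return value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
    }

    /** Formats one record; with quoteAll = false only ambiguous fields are quoted. */
    static String formatRecord(List<String> fields, char separator, boolean quoteAll) {
        return fields.stream()
                .map(f -> (quoteAll || needsQuotes(f, separator)) ? quote(f) : f)
                .collect(Collectors.joining(String.valueOf(separator)));
    }
}
```

For the fields a, b,c and d"e with a comma separator, the default mode yields a,"b,c","d""e" while quoting all fields yields "a","b,c","d""e".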

 
