Commit af399982 authored by Mark Jordan
Initial commit.

# Islandora BagIt Light
## Introduction
[BagIt](https://wiki.ucop.edu/display/Curation/BagIt) is a specification for packaging content and metadata about that content into a format that can be shared between applications. This module provides a framework for generating Bags for Islandora objects. It is a fork of the [Islandora BagIt](https://github.com/Islandora/islandora_bagit) module. The main differences between it and this module are:
* this module does not provide a user-facing option to generate Bags; it only provides a Drush command.
* this module does not provide implementations of Islandora hooks to detect when an object or datastream has been ingested or modified.
* this module replaces Islandora BagIt's plugins with submodules.
Islandora sites should not enable both this module and Islandora BagIt. Only one should be enabled since they share permissions and hook definitions.
## Requirements
Islandora BagIt Light requires the following modules/libraries:
* [Islandora](https://github.com/islandora/islandora)
* [Libraries](https://drupal.org/project/libraries)
* [Scholars' Lab BagItPHP library](https://github.com/scholarslab/BagItPHP)
* [Archive_Tar](http://pear.php.net/package/Archive_Tar)
## Installation
To install the Islandora BagIt Light module:
1. Install [Archive_Tar](http://pear.php.net/package/Archive_Tar). This package is required by PEAR so if you have PEAR installed on your system, you won't need to install Archive_Tar separately.
2. Install the [Libraries API](https://drupal.org/project/libraries) contrib module.
3. Unzip this module into your site's modules directory as you would any other contrib module.
4. Install the BagItPHP library by entering your site's sites/all/libraries directory and issuing the following command:
```git clone git://github.com/scholarslab/BagItPHP.git```
5. Enable the Libraries and Islandora BagIt Light modules like you would any other contrib modules.
## Configuration
All configuration for this module is performed via Drush options.
### Extending and customizing the BagIt module
Islandora BagIt Light ...
### Modifying a Bag from your own modules
This module provides a drupal_alter() hook, which allows other modules to use hook_islandora_bagit_alter($bag, $islandora_object). Your module can modify the current Bag using any of the methods provided by the BagItPHP library. Each implementation of this hook must take $bag and $islandora_object as parameters; $islandora_object is provided so you can access properties of the object in your module easily. A typical implementation looks like this:
```
/**
* Implements hook_islandora_bagit_alter().
*
* @param object $bag
* A BagIt object instantiated in the BagIt module.
*
* @param object $islandora_object
* The current $islandora_object.
*/
function mymodule_islandora_bagit_alter($bag, $islandora_object) {
// Add some custom metadata to bag-info.txt.
$bag->setBagInfoData('Some-Arbitrary-Field', 'Foo bar baz');
// Add a file that is not managed by a plugin. Note: extra files
// should be added by plugins if possible, since files that are
// added in drupal_alter() hooks are not counted in Payload-Oxum
// values generated by the Islandora BagIt module.
$bag->addFile('/path/to/file.txt', 'myfile.txt');
// Update the Bag (this is required).
$bag->update();
}
```
Note that implementations of hook_islandora_bagit_alter() must call $bag->update() themselves, typically at the very end of the function.
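The comment in the example above warns that files added in an alter hook are not counted in Payload-Oxum. As a minimal sketch of what that value contains (total payload bytes, a period, then the file count), the helper below recomputes it for a list of file paths. The name `mymodule_payload_oxum()` is hypothetical and not part of this module, and how you collect the full list of payload paths inside your hook is left open.

```php
<?php
/**
 * Computes a Payload-Oxum value ("<total bytes>.<file count>").
 *
 * Hypothetical helper for illustration; not provided by this module.
 */
function mymodule_payload_oxum(array $paths) {
  $octets = 0;
  $streams = 0;
  foreach ($paths as $path) {
    // Skip anything that is not a regular file.
    if (!is_file($path)) {
      continue;
    }
    $octets += filesize($path);
    $streams++;
  }
  return $octets . '.' . $streams;
}

// Usage: two small temporary files stand in for payload files.
$dir = sys_get_temp_dir();
$a = tempnam($dir, 'oxum');
$b = tempnam($dir, 'oxum');
file_put_contents($a, 'hello');    // 5 bytes
file_put_contents($b, 'bagit!!');  // 7 bytes
echo mymodule_payload_oxum(array($a, $b)), "\n"; // prints "12.2"
unlink($a);
unlink($b);
```

An alter hook that adds files could pass the recomputed value to `$bag->setBagInfoData('Payload-Oxum', ...)` before calling `$bag->update()`, assuming it knows every payload file in the Bag.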
### Post-Bag-creation hook
Islandora BagIt provides an additional hook, islandora_bagit_post_create, that allows other modules to get notifications that a Bag has just been created. A basic implementation is:
```
/**
* Implements hook_islandora_bagit_post_create().
*
* @param string $pid
* The PID of the Islandora object that the Bag was just created for.
*
* @param string $bag_path
* The path to the Bag, relative to the Drupal installation directory.
*/
function mymodule_islandora_bagit_post_create($pid, $bag_path) {
// Do something interesting.
}
```
This hook can be used to send notification emails after a Bag has been created, to add the Bag to a queue for further processing, or to copy the Bag to a different server.
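As one possible sketch of the "queue for further processing" idea, the implementation below appends each new Bag to a plain tab-separated log file that a separate process could read later. The log location and the helper `mymodule_bag_log_path()` are assumptions made for illustration; a real module might use Drupal's queue API instead.

```php
<?php
/**
 * Returns the log file path used by this sketch (hypothetical location).
 */
function mymodule_bag_log_path() {
  return sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'mymodule_bags.log';
}

/**
 * Implements hook_islandora_bagit_post_create().
 */
function mymodule_islandora_bagit_post_create($pid, $bag_path) {
  // One tab-separated line per Bag: timestamp, PID, path to the Bag.
  $line = implode("\t", array(date('c'), $pid, $bag_path)) . PHP_EOL;
  // FILE_APPEND keeps earlier entries; LOCK_EX avoids interleaved writes.
  file_put_contents(mymodule_bag_log_path(), $line, FILE_APPEND | LOCK_EX);
}
```

Because the hook only records what was created, slow work such as copying the Bag to another server can happen outside the Drush run.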
### Drush integration
Bags can be created for individual Islandora objects or for all objects in a given collection using Drush:
```drush --user=UID create-islandora-bag [object|collection] PID```
where UID is the user ID or user name of the fedoraAdmin user (or equivalent), 'object' or 'collection' indicates whether you want to create a Bag for a single object or a Bag for every member of a collection, and PID is the PID of the Islandora object or collection.
### Permissions and security
This module is intended for users who have a fairly high level of permissions on a Drupal site. Because the goal is to package up all or some of the datastreams in an Islandora object, users who can create and download Bags should have access to those datastreams. However, the module does check the current user's access to a datastream before adding it to the Bag.
## Known Issues
Fedora 3.8.0 fails to generate FOXML files requested using the 'archive' context ([JIRA ticket](https://jira.duraspace.org/browse/FCREPO-1384)). Earlier versions may succeed on exporting 'archive' FOXML files if the resulting FOXML is smaller than approximately 200 MB, but fail on larger files. The Islandora BagIt module triggers this set of errors if 'archive' FOXML files are generated from within one of its plugins ([JIRA ticket](https://jira.duraspace.org/browse/ISLANDORA-1193)). Until this issue is resolved in Fedora, users of the Islandora BagIt module should not use plugins that generate 'archive' FOXML, including plugin_object_foxml.inc distributed with versions of Islandora BagIt prior to 7.x-1.5. The other FOXML export contexts, 'public' and 'migrate' ([documentation](https://wiki.duraspace.org/display/FEDORA37/REST+API#RESTAPI-export)), can be used safely.
Some Bags do not finish properly even with the PHP CLI's php.ini set to ```max_input_time = -1``` ([JIRA ticket](https://jira.duraspace.org/browse/ISLANDORA-1403)). For better performance, the CLI's php.ini should contain the lines ```max_execution_time = 0``` and ```max_input_time = -1```, and Apache2's apache2.conf should include ```Timeout 86400```.
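To confirm which values the PHP CLI is actually using, a short script like the following can be run with the same `php` binary that Drush uses. This is only a sketch: the function name `mymodule_check_cli_timeouts()` is hypothetical, and it does not check the Apache `Timeout` setting.

```php
<?php
/**
 * Lists timeout settings that differ from the values recommended above.
 *
 * Hypothetical helper for illustration; not provided by this module.
 */
function mymodule_check_cli_timeouts() {
  $recommended = array(
    'max_execution_time' => '0',
    'max_input_time' => '-1',
  );
  $mismatches = array();
  foreach ($recommended as $name => $want) {
    // ini_get() returns the current value as a string.
    $have = ini_get($name);
    if ((string) $have !== $want) {
      $mismatches[$name] = $have;
    }
  }
  return $mismatches;
}

// Print one line per setting that does not match the recommendation.
foreach (mymodule_check_cli_timeouts() as $name => $have) {
  echo $name . ' is ' . var_export($have, TRUE) . "\n";
}
```

No output means both settings already match the recommended values.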
## Documentation
Further documentation for this module is available at ...
## Troubleshooting/Issues
Having problems or solved a problem? Check out the Islandora Google Groups for a solution.
* [Islandora Group](https://groups.google.com/forum/?hl=en&fromgroups#!forum/islandora)
* [Islandora Dev Group](https://groups.google.com/forum/?hl=en&fromgroups#!forum/islandora-dev)
## Maintainers/Sponsors
Current maintainers:
* [Mark Jordan](https://github.com/mjordan)
## Development
If you would like to contribute to this module, please check out [CONTRIBUTING.md](CONTRIBUTING.md). In addition, we have helpful [Documentation for Developers](https://github.com/Islandora/islandora/wiki#wiki-documentation-for-developers) info, as well as our [Developers](http://islandora.ca/developers) section on the [Islandora.ca](http://islandora.ca) site.
## License
[GPLv3](http://www.gnu.org/licenses/gpl-3.0.txt)
<?php
/**
* @file
* Utilities file for the Islandora BagIt Light module.
*/
/**
* Creates the object-level Bag.
*
* Also acts as the 'controller' to pass program flow off to the appropriate
* batch functions.
*
* @param string $pid
* The PID of the Islandora object to create a Bag for.
*
* @return string|array
* Either an empty array, a blank string, or a string containing
* a link to 'Download the Bag.'
*/
function islandora_bagit_light_create_bag($pid) {
$islandora_object = islandora_object_load($pid);
// Sanitize the PID so it is usable in file paths.
$pid = str_replace(array(':', '-'), '_', $islandora_object->id);
// Save all the datastreams to a randomly named temporary directory so
// they can be added to the Bag. We delete these files after creating the Bag.
$random_string = substr(md5(rand()), 0, 7);
$tmp_ds_directory = variable_get('islandora_bagit_bag_tmp_dir', file_directory_temp()) .
'/islandora_bagit_tmp/' . $random_string;
if (!file_exists($tmp_ds_directory)) {
mkdir($tmp_ds_directory, 0777, TRUE);
}
// Load the BagItPHP library.
$bagit_library_dir = variable_get('islandora_bagit_library_dir', 'BagItPHP');
if ($bagit_library_path = libraries_get_path($bagit_library_dir)) {
require_once $bagit_library_path . '/lib/bagit.php';
}
$bag_file_name = variable_get('islandora_bagit_bag_name', 'Bag-') . $pid;
$bag_output_path = variable_get('islandora_bagit_bag_output_dir', '/tmp') .
DIRECTORY_SEPARATOR . $bag_file_name;
// Because the BagItPHP library does some things by default if the bag output
// directory already exists (like read the fetch.txt file), we always need to
// delete the directory if it exists.
if (file_exists($bag_output_path)) {
rrmdir($bag_output_path);
}
// A list of all the files added to the bag, to show the user and add to
// the watchdog entries.
$all_added_files = array();
// Get bag-info.txt metadata.
$bag_info = islandora_bagit_light_create_baginfo();
// Create a new bag.
$bag = new BagIt($bag_output_path, TRUE, TRUE, FALSE, $bag_info);
if ($files_to_add = islandora_bagit_light_ds_basic($islandora_object, $tmp_ds_directory)) {
// Generate octetstream sum.
if (variable_get('islandora_bagit_payload_octetstream_sum', 0)) {
$sum = islandora_bagit_light_get_octetstream_sum($files_to_add);
$bag->setBagInfoData('Payload-Oxum', $sum);
}
foreach ($files_to_add as $file) {
$bag->addFile($file['source'], $file['dest']);
$all_added_files[] = $file['dest'];
}
$bag->update();
}
else {
drupal_set_message(t('There are no files to add to the Bag.'), 'warning');
watchdog('bagit', 'BagIt Bag not created for !object: plugins found no files.',
array('!object' => $islandora_object->id));
return '';
}
// Allow other modules to modify the Bag using
// mymodule_islandora_bagit_alter($bag, $islandora_object).
drupal_alter('islandora_bagit', $bag, $islandora_object);
// Write out the serialized (i.e., compressed) Bag.
$serialized_bag_path = variable_get('islandora_bagit_bag_output_dir', '/tmp') .
DIRECTORY_SEPARATOR . $bag_file_name;
$compression_type = variable_get('islandora_bagit_compression_type', 'tgz');
$bag->package($serialized_bag_path, $compression_type);
if (variable_get('islandora_bagit_delete_unserialized_bag', 1)) {
rrmdir($bag_output_path);
}
// Delete the temp directory created by file create plugins, if it exists.
$bag_tmp_dir = variable_get('islandora_bagit_bag_tmp_dir', file_directory_temp()) .
DIRECTORY_SEPARATOR . $pid;
if (file_exists($bag_tmp_dir)) {
rrmdir($bag_tmp_dir);
}
// Clean up the temp directory where we downloaded the datastreams.
if (file_exists($tmp_ds_directory)) {
rrmdir($tmp_ds_directory);
}
$all_added_files = array_unique($all_added_files);
$serialized_all_added_files = implode(', ', $all_added_files);
if (variable_get('islandora_bagit_log_bag_creation', 1)) {
watchdog('islandora_bagit', 'Bag created for PID !pid (!files).',
array('!pid' => $islandora_object->id, '!files' => $serialized_all_added_files));
}
$serialized_bag_path .= '.' . $compression_type;
if (variable_get('islandora_bagit_show_messages', 1)) {
drupal_set_message(t("Bag created and saved at %path", array(
'%path' => $serialized_bag_path,
)));
drupal_set_message(t("Files added: %files",
array('%files' => $serialized_all_added_files)));
}
// Allow other modules to fire the post-Bag creation hook.
$post_create_data = module_invoke_all('islandora_bagit_post_create', $pid, $serialized_bag_path);
if (variable_get('islandora_bagit_provide_download_link', 1)) {
// file_build_uri() needs a relative path.
if (variable_get('file_default_scheme') == 'private') {
$drupal_files_path = variable_get('file_private_path');
}
else {
$drupal_files_path = variable_get('file_public_path', conf_path() . '/files');
}
$relative_bag_path = preg_replace("#$drupal_files_path#", '', $serialized_bag_path);
$download_path = file_create_url(file_build_uri($relative_bag_path));
return l(t('Download the Bag'), $download_path);
}
else {
return array();
}
}
/**
* Returns an array of source and destination file paths.
*
* @param object $islandora_object
* The Islandora object to create a Bag for.
*
* @param string $tmp_ds_directory
* The temporary directory where the datastream files have been downloaded.
*
* @return array|bool
* An array of source and destination file paths, or FALSE
* if no datastream files are present.
*/
function islandora_bagit_light_ds_basic($islandora_object, $tmp_ds_directory) {
$files_to_add = array();
$ds_files = islandora_bagit_light_retrieve_datastreams($islandora_object, $tmp_ds_directory);
// Add file source and dest paths for each datastream to the $files_to_add
// array. $files_to_add['dest'] must be relative to the Bag's data
// subdirectory.
foreach ($ds_files as $ds_filename) {
// Add each file in the directory to $files_to_add.
$source_file_to_add = $ds_filename;
if (file_exists($source_file_to_add) && is_file($source_file_to_add)) {
$files_to_add[] = array(
'source' => $source_file_to_add,
// We use basename here since there are no subdirectories in the Bag's
// 'data' directory.
'dest' => basename($ds_filename),
);
}
}
if (count($files_to_add)) {
return $files_to_add;
}
else {
return FALSE;
}
}
/**
* Placeholder line to meet Coder style checks.
*
* Iterates through the Islandora object, saves each specified datastream
* as a file in a temporary directory using the DSID as the filename,
* and returns a list of all the files saved.
*
* @param object $islandora_object
* The Islandora object to create a Bag for.
*
* @param string $tmp_ds_directory
* The temporary directory where the datastream files are saved.
*
* @param array $datastreams
* List of DSIDs to retrieve. If empty, retrieve all datastreams.
*
* @return array
* List of all the files saved.
*/
function islandora_bagit_light_retrieve_datastreams($islandora_object, $tmp_ds_directory, $datastreams = array()) {
$ds_files = array();
$mime_detect = new MimeDetect();
foreach ($islandora_object as $ds) {
if (islandora_datastream_access(ISLANDORA_VIEW_OBJECTS, $islandora_object[$ds->id])) {
// If $datastreams is empty, retrieve all datastreams.
if ((count($datastreams) == 0) || in_array($ds->id, $datastreams)) {
$extension = $mime_detect->getExtension($ds->mimetype);
$ds_content_file_path = $tmp_ds_directory . DIRECTORY_SEPARATOR . $ds->id . '.' .
$extension;
// Only get the datastream if its file doesn't already exist.
if (!file_exists($ds_content_file_path)) {
try {
$ds->getContent($ds_content_file_path);
if (!in_array($ds_content_file_path, $ds_files)) {
$ds_files[] = $ds_content_file_path;
}
}
catch (RepositoryException $e) {
drupal_set_message(t('Cannot save datastream file.'), 'warning');
watchdog('bagit', 'Could not save datastream file for datastream !dsid to !path.',
array('!dsid' => $ds->id, '!path' => $ds_content_file_path));
}
}
else {
// If the file already exists, add its path to the return array so it
// will be registered with the plugin.
if (strlen($ds_content_file_path)) {
if (!in_array($ds_content_file_path, $ds_files)) {
$ds_files[] = $ds_content_file_path;
}
}
}
}
}
}
return $ds_files;
}
/**
* Serialize and write to a file the Bag object.
*
* Saving the Bag to a $_SESSION variable resulted in unavoidable
* 'incomplete object' errors. Note: This has nothing to do with
* serializing (zipping) a Bag, it's a replacement for using Drupal's
* $_SESSION.
*
* @param object $bag
* The Bag object.
*
* @param string $pid
* The PID of the collection object that the Bag is being created for.
* This PID should already be sanitized so that it can be used in file paths.
*
* @param string $tmp_ds_directory
* The randomly-generated directory path to the temporary files used in
* creation of this Bag.
*/
function islandora_bagit_light_serialize_bag_object($bag, $pid, $tmp_ds_directory) {
// We can't pass any arguments to islandora_bagit_light_unserialize_bag_object()
// so save the path to the serialized data in the session.
$_SESSION['islandora_bagit_tmp_ds_directory'] = $tmp_ds_directory;
$serialized_bag_blob = serialize($bag);
file_put_contents($tmp_ds_directory . DIRECTORY_SEPARATOR .
'serialized_bag_object.dat', $serialized_bag_blob
);
}
/**
* Placeholder line to meet Coder style checks.
*
* Unserialize a Bag object serialized by
* islandora_bagit_light_serialize_bag_object(). Note: The BagItPHP library
* will need to be loaded in the scope that this function is called in,
* otherwise we get fatal 'incomplete object' errors.
*
* @return object
* The unserialized Bag object.
*/
function islandora_bagit_light_unserialize_bag_object() {
$tmp_ds_directory = $_SESSION['islandora_bagit_tmp_ds_directory'];
$path = $tmp_ds_directory . DIRECTORY_SEPARATOR . 'serialized_bag_object.dat';
$serialized_bag_string = file_get_contents($path);
$bag = unserialize($serialized_bag_string);
return $bag;
}
/**
* Placeholder line to meet Coder style checks.
*
* Generates the value for the Payload-Oxum metadata tag for
* an object-level Bag. We generate this value for collection-
* level Bags in islandora_bagit_collection_batch_finish_bag().
*
* @param array $files
* Associative array of file info (with 'source' and 'dest'
* keys) returned by a plugin.
*
* @return string
* The Payload-Oxum value.
*/
function islandora_bagit_light_get_octetstream_sum($files) {
$file_counter = 0;
$filesize_sum = 0;
foreach ($files as $file) {
$file_counter++;
$filesize_sum = filesize($file['source']) + $filesize_sum;
}
return $filesize_sum . '.' . $file_counter;
}
/**
* Adds metadata to the bag-info.txt metadata.
*
* Makes no attempt to wrap lines at 79 characters as recommended by the spec.
*
* @return array
* An array containing the tags' name => value pairs.
*/
function islandora_bagit_light_create_baginfo() {
$bag_info = array();
if (strlen(variable_get('islandora_bagit_transferring_organization', ''))) {
$bag_info['Source-Organization'] = variable_get('islandora_bagit_transferring_organization', '');
}
if (strlen(variable_get('islandora_bagit_transferring_organization_address', ''))) {
$bag_info['Organization-Address'] = variable_get('islandora_bagit_transferring_organization_address', '');
}
if (strlen(variable_get('islandora_bagit_contact_name', ''))) {
$bag_info['Contact-Name'] = variable_get('islandora_bagit_contact_name', '');
}
if (strlen(variable_get('islandora_bagit_contact_phone', ''))) {
$bag_info['Contact-Phone'] = variable_get('islandora_bagit_contact_phone', '');
}
if (strlen(variable_get('islandora_bagit_contact_email', ''))) {
$bag_info['Contact-Email'] = variable_get('islandora_bagit_contact_email', '');
}
if (strlen(variable_get('islandora_bagit_profile_uri', ''))) {
$bag_info['BagIt-Profile-Identifier'] = variable_get('islandora_bagit_profile_uri', '');
}
if (variable_get('islandora_bagit_bagging_date', 1)) {
$bag_info['Bagging-Date'] = date("Y-m-d");
}
return $bag_info;
}
<?php
/**
* @file
* Drush integration file for the Islandora BagIt Light module.
*/
/**
* Implements hook_drush_help().
*/
function islandora_bagit_light_drush_help($command) {
switch ($command) {
case 'drush:create-islandora-bag':
return dt('Create a Bag for an Islandora object');
}
}
/**
* Implements hook_drush_command().
*/
function islandora_bagit_light_drush_command() {
$items = array();
$items['create-islandora-bag'] = array(
'description' => dt('Creates a Bag for an Islandora object.'),
'options' => array(
'pid' => dt('The PID for the object or collection you want to create a Bag for.'),
),
'examples' => array(
),
'aliases' => array('cib'),
'bootstrap' => DRUSH_BOOTSTRAP_DRUPAL_LOGIN,
);
return $items;
}
/**
* Callback function for drush create-islandora-bag.
*
* @param string $type
* Either 'object' (for a single Islandora object) or 'collection'
* (for all objects in a collection).
*
* @param string $pid
* The PID of the Islandora object to create a Bag for.
*/
function drush_islandora_bagit_light_create_islandora_bag($type = 'object', $pid = NULL) {
if (!file_exists('sites/all/libraries/BagItPHP')) {
return drush_set_error(DRUSH_FRAMEWORK_ERROR, dt('BagIt library not found.'));
}
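// Validate the arguments.
// List of objects to create Bags for. Initialize it so the count() check
// below works when no --pid option is given.
$objects_to_bag = array();
if ($pid = drush_get_option('pid')) {
$objects_to_bag[] = $pid;
}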
if (count($objects_to_bag) === 0) {
drush_set_error('No objects to Bag', "Sorry, there are no objects to Bag. Please check your command-line arguments.");
return FALSE;
}
module_load_include('inc', 'islandora_bagit_light', 'includes/utilities');
foreach ($objects_to_bag as $object_to_bag_pid) {
try {
islandora_bagit_light_create_bag($object_to_bag_pid);
}
catch (Exception $e) {
drush_print("Sorry, Islandora cannot create the Bag: " . $e->getMessage());
}
}
}
/**
* Includes Tuque files.
*
* @return bool
* TRUE if the API was included, FALSE otherwise.
*/
function _islandora_bagit_light_drush_include_tuque() {
if (!file_exists('sites/all/libraries/tuque')) {
return drush_set_error(DRUSH_FRAMEWORK_ERROR, dt('Tuque API files not found.'));
}
@include_once 'sites/all/libraries/tuque/Datastream.php';
@include_once 'sites/all/libraries/tuque/FedoraApi.php';
@include_once 'sites/all/libraries/tuque/FedoraApiSerializer.php';
@include_once 'sites/all/libraries/tuque/Object.php';
@include_once 'sites/all/libraries/tuque/RepositoryConnection.php';
@include_once 'sites/all/libraries/tuque/Cache.php';
@include_once 'sites/all/libraries/tuque/RepositoryException.php';
@include_once 'sites/all/libraries/tuque/Repository.php';
@include_once 'sites/all/libraries/tuque/FedoraRelationships.php';
return TRUE;
}
name = Islandora BagIt Light
description = Creates BagIt Bags from Islandora objects.
version = 7.x-dev
core = 7.x
package = Islandora Tools
dependencies[] = libraries
dependencies[] = islandora
<?php
/**
* @file
* Module to create BagIt Bags from Islandora objects. Requires the library at
* https://github.com/scholarslab/BagItPHP. Consult the README.txt for
* instructions.
*/
/**
* Implements hook_permission().
*/
function islandora_bagit_light_permission() {
return array(
'create Islandora Bags' => array(
'title' => t('Create Bags'),
'description' => t('Create Bags using Islandora BagIt Light'),
),
);
}
/**
* Admin settings form builder.
*/
/*
function islandora_bagit_admin_settings() {
$form['islandora_bagit_library_dir'] = array(
'#title' => t('Location of the BagIt library'),
'#type' => 'textfield',
'#size' => 60,
'#default_value' => variable_get('islandora_bagit_library_dir', 'BagItPHP'),
'#description' => t("Directory where the Scholars' Lab BagIt for PHP library
is installed, relative to sites/all/libraries. Do not use a leading or
trailing slash."),
'#maxlength' => 255,
'#required' => TRUE,
);
$form['islandora_bagit_bag_tmp_dir'] = array(
'#title' => t('Temporary directory for unserialized Bags'),
'#type' => 'textfield',
'#size' => 60,
'#default_value' => variable_get('islandora_bagit_bag_tmp_dir', file_directory_temp()),
'#description' => t("Filesystem directory where the unserialized Bag
directories are written, named by PID. Needs to exist and to be
writable by the web server. Do not include the trailing slash."),
'#maxlength' => 255,