Duplicate records are being added to the MARC file
Because the filename of the thesis JSON file copied to the MARC input data directory is date-stamped (yyyy-mm-dd) regardless of whether the thesis was successfully ingested into Summit (see code here), a failed TRS daily job can leave duplicate JSON files that differ only in the date portion of the filename. Duplicate JSON files result in duplicate MARC records.
Copying the thesis JSON file into the MARC input data directory should only happen if the thesis was successfully ingested. The code in the TRS daily script that does this copying is:
/**
 * Copies the JSON data for each thesis to the input directory for the Theses MARC Generator script.
 *
 * @param string $etd_id
 *   The ETD ID of the thesis.
 * @param string $thesis_data
 *   The thesis JSON retrieved from the TRS.
 */
function copy_thesis_data_for_marc($etd_id, $thesis_data) {
  global $marc_data_directory;
  $filename = date("Y-m-d") . '_' . $etd_id . '.json';
  file_put_contents($marc_data_directory . '/' . $filename, $thesis_data);
}
This could be converted into a post-node-creation script run by Workbench. That script would simply copy the JSON file from the thesis temp directory to the MARC data directory, giving it the expected name.
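A minimal sketch of the copy step such a script might perform is below. The function name `copy_thesis_data_for_marc_once`, its parameters, and the check for an existing copy are all hypothetical and not part of the current TRS daily script. How Workbench invokes the script and reports a successful ingest is deliberately left out. The extra check for an earlier date-stamped file is an assumption added as a second guard against duplicates; it may be unnecessary if the script only ever runs after a node is created.

```php
<?php

/**
 * Copies a thesis JSON file into the MARC input directory at most once.
 *
 * Hypothetical helper for illustration only. It assumes filenames follow
 * the existing "yyyy-mm-dd_ETDID.json" pattern.
 *
 * @param string $etd_id
 *   The ETD ID of the thesis.
 * @param string $source_path
 *   Path to the thesis JSON file in the thesis temp directory (assumed).
 * @param string $marc_data_directory
 *   The MARC input data directory.
 *
 * @return bool
 *   TRUE if a file was copied, FALSE if the copy was skipped or failed.
 */
function copy_thesis_data_for_marc_once($etd_id, $source_path, $marc_data_directory) {
  // Nothing to copy if the source file or target directory is missing.
  if (!is_readable($source_path) || !is_dir($marc_data_directory)) {
    return FALSE;
  }

  // Treat any earlier date-stamped file for the same ETD ID as a duplicate.
  $pattern = '/^\d{4}-\d{2}-\d{2}_' . preg_quote($etd_id, '/') . '\.json$/';
  $entries = scandir($marc_data_directory);
  if ($entries === FALSE) {
    return FALSE;
  }
  foreach ($entries as $entry) {
    if (preg_match($pattern, $entry)) {
      return FALSE;
    }
  }

  // Use the same date-stamped naming scheme as the existing function.
  $filename = date('Y-m-d') . '_' . $etd_id . '.json';
  return copy($source_path, $marc_data_directory . '/' . $filename);
}
```

Called only after a successful ingest, this would write one JSON file per ETD ID even if the job is rerun on a later date.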