Analyse RIS files

Reference managers such as EndNote, RefWorks or Zotero can usually export your bibliographic citations as a RIS file, which you can then import into tools such as Talis Aspire Reading Lists.

The script below searches the current directory and its subdirectories for RIS files (anything named *.ris or *.txt) and analyses their contents. For each file it reports which reference types (TY) occur and what share of records carry an SN field (an ISBN or ISSN), an identifier that can be used to find better bibliographic data from another source.

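#!/bin/bash
# Summarise every RIS export (*.ris or *.txt) found under the current directory.

while IFS= read -r -d '' file
do
	echo -n "#=== "
	printf '%q\n' "$file"
	# Count how many records there are of each reference type (TY tag).
	grep -E "^TY" "$file" | sort | uniq -c
	# Every record has one TY line; SN lines hold an ISBN or ISSN.
	typeCount=$(grep -E "^TY" "$file" | wc -l)
	snCount=$(grep -E "^SN" "$file" | wc -l)
	if (( typeCount > 0 ))
	then
		echo $(($snCount*100/$typeCount))"% of records have an SN ("$snCount" of "$typeCount")"
	else
		# Avoid dividing by zero when a .txt file is not a RIS export.
		echo "No TY lines found"
	fi
	echo
done < <(find . \( -name "*.ris" -o -name "*.txt" \) -print0 )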

Sample output:

#=== ./PMUP00DNMod3.txt
  17 TY  - CHAP
   4 TY  - JOUR
80% of records have an SN ( 17 of  21)

#=== ./PMUP00DNMod4.txt
  11 TY  - CHAP
  10 TY  - JOUR
95% of records have an SN ( 20 of  21)
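
The script above counts SN lines rather than records, so a record with two SN lines is counted twice, and it does not look at DOIs. As a rough sketch of a stricter check, the function below counts each record at most once and also accepts the DO (DOI) tag as an identifier. The name count_identified is made up for this example, and the sketch assumes every record starts with a "TY  -" line and ends with an "ER  -" line, which may not hold for every export.

```shell
#!/bin/bash
# Hypothetical helper: report how many records in one RIS file have at least
# one identifier tag (SN = ISBN/ISSN, DO = DOI). Each record counts once.
count_identified() {
	awk '
		/^TY  -/ { total++; seen = 0; inRecord = 1; next }   # a record starts
		/^ER  -/ { inRecord = 0; next }                      # a record ends
		inRecord && !seen && /^(SN|DO)  -/ { withId++; seen = 1 }
		END { printf "%d of %d records have an SN or DO\n", withId + 0, total + 0 }
	' "$1"
}
```

It could be called once per file, for example count_identified ./PMUP00DNMod3.txt, or from inside the loop in the main script.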
