Thursday 22 March 2012

Using Google Analytics Statistics within DSpace


Thank you to Claire Knowles of Edinburgh University who provides this overview of how they have been able to display statistics from Google Analytics in DSpace.
----
In 2009 Edinburgh University Digital Library adopted Google Analytics (GA) to track usage statistics within the DSpace repositories it supports on behalf of the Scottish Digital Library Consortium (SDLC). The GA statistics have proven much more reliable than the statistics plugins previously available for DSpace, with which we experienced lost statistics and pageviews inflated by robots.

Unfortunately, the GA statistics for tracked sites are only viewable via the GA dashboard, for which users require a Google account and managed permissions. This limits the visibility of statistics to a few people at each institution. Prompted by the presentation given by Graham Triggs (then working for BioMed Central) at the Open Repositories Conference 2010, we decided to write some code to make the Google Analytics statistics visible to all users of the DSpace installations.

The work has been broken into phases:

1. Capture of downloads in DSpace by Google Analytics. 
The basic GA tracking code within DSpace is unable to capture the number of file downloads, as these are not links within pages. To address this we added code to the two download links on the item page so that these download actions can be measured. This captured all downloads made within DSpace but not those by users coming directly from search engines to the file. To capture these we decided to reroute all such users through the item page. This means that they now need two clicks instead of one to reach the download, but it enables us to capture these statistics and also raises the visibility of the repository to users. To reduce the inconvenience, we moved the file download links on the item page from the bottom to the top so that users do not have to scroll down to find them.
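
As a rough illustration of the rerouting rule described above (not the actual URLrewrite configuration), the decision can be sketched in Python. The host name, the bitstream URL layouts and the use of the Referer header to spot direct arrivals are all assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical host name; substitute the repository's own.
REPOSITORY_HOST = "repository.example.org"


def item_page_for(path):
    """Map a bitstream path to its item page path, or return None.

    Assumes two URL layouts: /bitstream/<prefix>/<suffix>/<seq>/<file>
    and /bitstream/handle/<prefix>/<suffix>/<file>.
    """
    parts = path.strip("/").split("/")
    if not parts or parts[0] != "bitstream":
        return None
    if len(parts) > 1 and parts[1] == "handle":
        parts = parts[:1] + parts[2:]  # drop the literal "handle" segment
    if len(parts) < 4:
        return None  # need prefix, suffix and at least a file name
    return "/handle/{}/{}".format(parts[1], parts[2])


def redirect_target(path, referer):
    """Return the item page to reroute to, or None to serve the request as is."""
    item_page = item_page_for(path)
    if item_page is None:
        return None  # not a file download
    if referer and urlparse(referer).hostname == REPOSITORY_HOST:
        return None  # user clicked a download link inside the repository
    return item_page  # direct arrival, e.g. from a search engine
```

A request for `/bitstream/1842/123/1/thesis.pdf` with a search engine referrer would be sent to `/handle/1842/123`, while the same request made from the item page itself would be served directly.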

2. Adding page views to each item page within DSpace
Secondly, we added the number of page views within the last year to the item page. This was a proof of concept which showed that we could connect to the Google Analytics API and pull statistics back into DSpace. We decided to include only the views for the past year to reduce disparities in the number of pageviews between older and newer items.
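
To make the one-year window concrete, here is a minimal sketch of the query parameters such a lookup might send. It assumes parameter names in the style of Google's Core Reporting API (`ids`, `start-date`, `end-date`, `metrics`, `filters`); the profile ID and item path are hypothetical, and no request is actually made.

```python
from datetime import date, timedelta


def item_pageviews_query(profile_id, item_path, today):
    """Build query parameters for one item's pageviews over the past 365 days."""
    start = today - timedelta(days=365)
    return {
        "ids": "ga:" + profile_id,  # hypothetical GA profile (view) ID
        "start-date": start.isoformat(),
        "end-date": today.isoformat(),
        "metrics": "ga:pageviews",
        # Exact match on the item page path, e.g. /handle/1842/123
        "filters": "ga:pagePath==" + item_path,
    }


params = item_pageviews_query("12345", "/handle/1842/123", date(2012, 3, 22))
print(params["start-date"], "to", params["end-date"])
```

Using the same fixed window for every item is what keeps a ten-year-old item from outranking a recent one simply because it has been online longer.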

3. Making statistics viewable within the DSpace web pages. 
We decided to make the GA statistics available at three levels (item, collection and repository), as this provides most of the statistics requested by users. Using the Query Explorer provided by Google, we were able to test and refine our queries before starting development. The pages were developed using the Google Analytics Java API, jQuery and the Google Chart Tools to draw graphs and maps.
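
For the repository level, a query of the kind one might try out in the Query Explorer is a ranked list of the most viewed pages. The sketch below only builds the parameters, again assuming Core Reporting API style names; the profile ID, the dates and the `/handle/` path prefix are assumptions for illustration.

```python
def top_pages_query(profile_id, start_date, end_date, limit=10):
    """Build query parameters for the most viewed handle pages in a date range."""
    return {
        "ids": "ga:" + profile_id,  # hypothetical GA profile (view) ID
        "start-date": start_date,
        "end-date": end_date,
        "metrics": "ga:pageviews",
        "dimensions": "ga:pagePath",  # one result row per page path
        "filters": "ga:pagePath=~^/handle/",  # regex match on handle pages only
        "sort": "-ga:pageviews",  # leading minus sorts descending
        "max-results": str(limit),
    }


params = top_pages_query("12345", "2011-03-23", "2012-03-22")
```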





As we complete the rollout of Google Analytics to all the SDLC partners, we are starting to look at what other statistics we would like to make available, both from Google Analytics and possibly by exposing statistical information about DSpace using Google's chart tools. One feature that would be of interest to researchers is collating and presenting download figures by author (rather than by item/collection/community).

We have encountered problems separating the item, collection and community statistics within DSpace, as all of their URLs are formatted in the same way. We therefore have to query DSpace data to do this and cannot distinguish them using the statistics data alone. If the requested item, file, collection or community is not available in DSpace, an error page is returned. These were being recorded in the same way as successful page views, which led to invalid items being listed in the statistics top ten tables. To prevent this, error pages are now recorded as an error event within Google Analytics.
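
The separation problem can be illustrated with a small sketch. Because `/handle/<prefix>/<suffix>` looks the same for every object type, the page paths returned by GA have to be joined against what DSpace knows about each handle. Here a plain dictionary stands in for that DSpace lookup, and the row format is an assumption; unknown handles are simply dropped, which is a different safeguard from the error event approach described above.

```python
from collections import defaultdict


def top_pages_by_type(rows, handle_types, limit=10):
    """Group (pagePath, pageviews) rows into per-type top lists.

    handle_types is a hypothetical stand-in for a DSpace query, mapping
    "prefix/suffix" to "item", "collection" or "community".
    """
    totals = defaultdict(lambda: defaultdict(int))
    for page_path, pageviews in rows:
        parts = page_path.split("?")[0].strip("/").split("/")
        if len(parts) != 3 or parts[0] != "handle":
            continue  # not a handle page
        handle = parts[1] + "/" + parts[2]
        object_type = handle_types.get(handle)
        if object_type is None:
            continue  # handle unknown to DSpace, e.g. a hit on an error page
        totals[object_type][handle] += int(pageviews)
    return {
        object_type: sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
        for object_type, counts in totals.items()
    }


rows = [
    ("/handle/1842/1", "10"),
    ("/handle/1842/1?show=full", "3"),
    ("/handle/1842/2", "5"),
    ("/handle/1842/999", "50"),  # no such object in the lookup below
]
types = {"1842/1": "item", "1842/2": "collection"}
print(top_pages_by_type(rows, types))
```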

These changes have given us a much greater understanding of how our repository is being used, showing that the majority of users come directly from Google. The URLrewrite change led to a doubling of our download statistics, as we now capture users who previously went straight to the download.

Thanks to: the Scottish Digital Library Consortium; Stuart Wood and Gareth Johnson of the University of Leicester for information on the URLrewrite; and Graham Triggs, formerly of BioMed Central and now at Symplectic.

The code to enable GA stats within DSpace is freely available from GitHub: https://github.com/seesmith/Dspace-googleanalytics

You can view our collection and item statistics changes at http://www.era.lib.ed.ac.uk

4 comments:

  1. An interesting recent development is Google's announcement that social media mentions and referrals to site pages are going to be incorporated into Google Analytics data. This means that, as well as seeing how many times a resource is viewed, it should be possible to see external activity such as someone bookmarking it in Delicious/Diigo or mentioning it on Google+.

    It's still to be seen if Google will make this data available via the API but potentially it opens up a number of possibilities in terms of how an individual resource page could be enhanced (one immediate idea is extracting user generated bookmarks used in Diigo/Delicious to enhance the resource metadata).

    [The big caveat in all of this is detailed activity data from Twitter and Facebook is more than likely going to be missing due to the current social network turf wars]

    A link to Google's announcement with some of my thoughts around this area is here http://mashe.hawksey.info/2012/03/google-analytics-rolling-out-social-network-activity-streams-paradata-heaven/

    Martin

    1. Thanks Martin, that sounds interesting, I'll have a look at the announcement. Claire

  2. Thanks for blogging about this. I've been trying to get GA optimised for my repository for a long time; my IT support team aren't able to get a handle on it and though I understand what needs to be done (from reading the GA blogs and webmaster tools etc.), I'm no coder and the technical work is beyond me. I am thinking of hiring an external consultant to work on this, and a bit of SEO for the repository as well. Did you do this work as part of everyday business, or was it a special project? Did you, or your in-house IT support, carry out the work?

    1. Hi, sorry for not replying sooner. We did this work ourselves as part of our development work.
