Thank you to Claire Knowles of Edinburgh University who provides this overview of how they have been able to display statistics from Google Analytics in DSpace.
----
In 2009 Edinburgh University
Digital Library adopted Google Analytics (GA) to track usage statistics within
the DSpace Repositories it supports on behalf of the Scottish Digital Library
Consortium (SDLC). The GA statistics
have proven much more reliable than the plugins previously available for
DSpace, with which we experienced lost statistics and pageviews inflated by
robots.
Unfortunately, the GA statistics
for tracked sites are only viewable via the GA dashboard, for which users
require a Google account and managed permissions. This limits the visibility of statistics to a
few people at each institution.
Prompted by the presentation given by Graham Triggs (then working for
BioMed Central) at the Open Repositories Conference 2010, we decided to write
some code to make the Google Analytics statistics visible to all users of the
DSpace installations.
The work has been broken into
phases:
1. Capture of downloads in
DSpace by Google Analytics.
The basic GA tracking code within
DSpace is unable to capture the number of file downloads as these are not links
within pages. To address this, we added
code to the two download links on the item page so that these download actions
can be measured. This captured all downloads made
from within DSpace, but not those by users arriving at the download file
directly from search engines. To capture these
statistics we decided to reroute all users back through the item page. This
means that they now need two clicks instead of one to reach the download, but it
enables us to capture these statistics and also raises the visibility of the
Repository to users. To reduce the
inconvenience to the users we moved the file downloads links on the item page
from the bottom to the top so that they do not have to scroll down to find the
download.
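The rerouting decision described above can be sketched as follows. This is a minimal illustration in Python rather than the code actually deployed: the host name, the `/bitstream/<prefix>/<suffix>/...` URL layout, the use of the HTTP referrer to recognise visitors arriving from outside, and both function names are assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host name, used only for this illustration.
REPOSITORY_HOST = "repository.example.org"


def item_page_for(bitstream_path: str) -> Optional[str]:
    """Map an assumed /bitstream/<prefix>/<suffix>/... path to its item page."""
    parts = bitstream_path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "bitstream":
        return "/handle/{}/{}".format(parts[1], parts[2])
    return None  # not a download URL we recognise


def redirect_target(bitstream_path: str, referrer: Optional[str]) -> Optional[str]:
    """Return the item page to send the visitor to, or None to serve the file."""
    referrer_host = urlparse(referrer).hostname if referrer else None
    if referrer_host == REPOSITORY_HOST:
        # The visitor clicked a tracked download link on one of our own pages.
        return None
    # Arrived from a search engine (or with no referrer): go via the item page.
    return item_page_for(bitstream_path)
```

For instance, `redirect_target("/bitstream/1842/123/1/thesis.pdf", "https://www.google.com/")` returns `"/handle/1842/123"`, while the same request with a referrer on the repository's own host returns `None` and the file is served.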
2. Adding page views to each
item page within DSpace
Secondly, we added the number of
page views within the last year to the item page. This was a proof of concept which showed that
we could connect to the Google Analytics API and pull back statistics into
DSpace. We decided to only include the
number of views for the past year to reduce disparities in the
number of pageviews between older and newer items.
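A query for one item's pageviews over the past year might be assembled like this. It is a sketch only: the parameter names (`ids`, `start-date`, `end-date`, `metrics`, `filters`) follow the Google Analytics Core Reporting API and should be checked against the API version in use, while the function name, the profile ID and the item path are hypothetical.

```python
from datetime import date, timedelta


def last_year_pageviews_query(profile_id: str, item_path: str, today: date) -> dict:
    """Build query parameters for one page's pageviews over the last 365 days."""
    start = today - timedelta(days=365)
    return {
        "ids": "ga:" + profile_id,                # GA profile (view) to query
        "start-date": start.isoformat(),          # YYYY-MM-DD
        "end-date": today.isoformat(),
        "metrics": "ga:pageviews",
        "filters": "ga:pagePath==" + item_path,   # exact match on the item page
    }


# Hypothetical profile ID and handle, for illustration only.
query = last_year_pageviews_query("12345", "/handle/1842/123", date(2011, 6, 30))
```

The resulting dictionary would then be passed to whichever API client is in use. Fixing the window at one year keeps the figure comparable between an item deposited last month and one deposited years ago.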
3. Making statistics viewable
within the DSpace web pages.
We decided to make the GA
statistics available at three levels: item, collection and repository as this
provides most of the statistics which are requested by users. Using the Query Explorer provided by Google
we were able to test and refine our queries before starting development. The pages were developed using the Google
Analytics Java API, jQuery and the Google Chart tools to draw graphs and maps.
As we complete the rollout of
Google Analytics to all the SDLC partners we are starting to look at what other
statistics we would like to make available, both from Google Analytics and also
possibly by exposing statistical information about DSpace using Google's chart
tools. One statistic that would be of interest to researchers is collating and
presenting download figures for authors (rather than by
item/collection/community).
We have encountered problems
separating the item, collection and community statistics within DSpace, as all
of their URLs are formatted in the same way. We therefore have to query DSpace
data to do this and cannot distinguish them using the statistics data
alone. If the requested item, file,
collection or community is not available in DSpace, an error page is returned.
These error pages were being recorded in the same way as successful pages, which led to
invalid items being listed in the statistics top ten tables. To prevent this, error pages are now recorded
as an error event within Google Analytics.
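The separation step can be illustrated with a short sketch. It assumes the statistics arrive as (page path, pageviews) rows and that a lookup from handle to object type has already been built from DSpace data; the function names, the `/handle/` path prefix and the sample handles are hypothetical.

```python
def split_by_type(rows, handle_types):
    """Group pageviews by DSpace object type, dropping handles DSpace does not know.

    rows: iterable of (page_path, pageviews) pairs from the statistics data.
    handle_types: dict mapping a handle such as "1842/123" to "item",
                  "collection" or "community", built from DSpace itself.
    """
    prefix = "/handle/"
    totals = {}
    for page_path, pageviews in rows:
        if not page_path.startswith(prefix):
            continue
        handle = page_path[len(prefix):].rstrip("/")
        object_type = handle_types.get(handle)
        if object_type is None:
            continue  # unknown handle: keep it out of the tables
        per_type = totals.setdefault(object_type, {})
        per_type[handle] = per_type.get(handle, 0) + pageviews
    return totals


def top_ten(per_type_totals):
    """Return up to ten (handle, pageviews) pairs, most viewed first."""
    ranked = sorted(per_type_totals.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:10]
```

Because the URL alone does not reveal whether a handle is an item, a collection or a community, the lookup against DSpace does the classification here, and it also filters out handles that do not exist.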
These changes have given us much
greater understanding of how our repository is being used with the majority of
users coming directly from Google. The
URLrewrite change led to a doubling of our download statistics, as we now capture
users who previously went straight to the download.
Thanks to: Scottish Digital
Library Consortium, Stuart Wood and Gareth Johnson of University of Leicester
for information on the URLrewrite, Graham Triggs formerly of BioMed Central and
now Symplectic.
The code to enable GA stats
within DSpace is freely available from GitHub:
https://github.com/seesmith/Dspace-googleanalytics
You can view our collection and item statistic changes at
http://www.era.lib.ed.ac.uk
Graham Triggs's slides from
OR2010: http://www.slideshare.net/OpenRepository/enhancing-statistics-google-analytics-and-visualization-apis
An interesting recent development is Google's announcement that social media mentions and referrals to site pages are going to be incorporated into Google Analytics data. This means that, as well as seeing how many times a resource is viewed, it should be possible to see external activity such as someone bookmarking it in Delicious/Diigo or mentioning it on Google+.
It's still to be seen if Google will make this data available via the API but potentially it opens up a number of possibilities in terms of how an individual resource page could be enhanced (one immediate idea is extracting user generated bookmarks used in Diigo/Delicious to enhance the resource metadata).
[The big caveat in all of this is detailed activity data from Twitter and Facebook is more than likely going to be missing due to the current social network turf wars]
A link to Google's announcement with some of my thoughts around this area is here http://mashe.hawksey.info/2012/03/google-analytics-rolling-out-social-network-activity-streams-paradata-heaven/
Martin
Thanks Martin, that sounds interesting, I'll have a look at the announcement. Claire
Thanks for blogging about this. I've been trying to get GA optimised for my repository for a long time; my IT support team aren't able to get a handle on it and though I understand what needs to be done (from reading the GA blogs and webmaster tools etc.), I'm no coder and the technical work is beyond me. I am thinking of hiring an external consultant to work on this, and a bit of SEO for the repository as well. Did you do this work as part of everyday business, or was it a special project? Did you, or your in-house IT support, carry out the work?
Hi, sorry for not replying sooner. We did this work ourselves as part of our development work.