Showing posts with label statistics. Show all posts

Thursday, 22 March 2012

Using Google Analytics Statistics within DSpace


Thank you to Claire Knowles of Edinburgh University who provides this overview of how they have been able to display statistics from Google Analytics in DSpace.
----
In 2009 Edinburgh University Digital Library adopted Google Analytics (GA) to track usage statistics within the DSpace repositories it supports on behalf of the Scottish Digital Library Consortium (SDLC). The GA statistics have proven much more reliable than the statistics plugins previously available for DSpace, with which we experienced lost statistics and pageviews inflated by robots.

Unfortunately the GA statistics for sites being tracked are only viewable via the GA dashboard for which users require a Google account and managed permissions.  This limits the visibility of statistics to a few people at each institution.   Prompted by the presentation given by Graham Triggs (then working for BioMed Central) at the Open Repositories Conference 2010, we decided to write some code to make the Google Analytics statistics visible to all users of the DSpace installations.

The work has been broken into phases:

1. Capture of downloads in DSpace by Google Analytics. 
The basic GA tracking code within DSpace is unable to capture the number of file downloads, as these are not links within pages. To address this we added code to the two download links on the item page so that these download actions can be measured. This captured all downloads made from within DSpace, but not those by users coming directly from search engines to the file. To capture these we decided to reroute all such users back through the item page. This means that they now need two clicks instead of one to reach the download, but it enables us to capture these statistics and also raises the visibility of the repository to users. To reduce the inconvenience we moved the file download links on the item page from the bottom to the top, so that users do not have to scroll down to find them.

2. Adding page views to each item page within DSpace
Secondly, we added the number of page views within the last year to the item page. This was a proof of concept which showed that we could connect to the Google Analytics API and pull statistics back into DSpace. We decided to include only the views for the past year to reduce the disparity in pageview counts between older and newer items.

3. Making statistics viewable within the DSpace web pages. 
We decided to make the GA statistics available at three levels (item, collection and repository), as this provides most of the statistics that users request. Using the Query Explorer provided by Google we were able to test and refine our queries before starting development. The pages were developed using the Google Analytics Java API, jQuery and the Google Chart Tools to draw graphs and maps.
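
As an illustration of the kind of query involved in phase 2, here is a minimal sketch in Python (the pages themselves were built with the Java API, and the project's actual queries are not shown in this post). The parameter names are those of the Google Analytics Core Reporting API as tested in the Query Explorer; the profile ID, the handle and the `/handle/...` page path are assumed placeholder values, and the function name is hypothetical.

```python
from datetime import date, timedelta


def item_pageviews_query(profile_id, handle, today):
    """Build query parameters for one item's page views over the past year.

    profile_id and handle are placeholders supplied by the caller.
    """
    # "Past year" is approximated here as the previous 365 days.
    start = today - timedelta(days=365)
    return {
        "ids": "ga:" + profile_id,
        "start-date": start.isoformat(),
        "end-date": today.isoformat(),
        "metrics": "ga:pageviews",
        # Exact match on the assumed item page path, e.g. /handle/1842/1234
        "filters": "ga:pagePath==/handle/" + handle,
    }


params = item_pageviews_query("12345678", "1842/1234", date(2012, 3, 22))
```

The same parameter set can be pasted into the Query Explorer to check the result before wiring it into a page.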





As we complete the rollout of Google Analytics to all the SDLC partners, we are starting to look at what other statistics we would like to make available, both from Google Analytics and possibly by exposing statistical information about DSpace itself using Google's chart tools. One statistic that would be of interest to researchers is download figures collated and presented by author (rather than by item, collection or community).

We have encountered problems separating the item, collection and community statistics within DSpace, as all of their URLs are formatted in the same way. We therefore have to query DSpace data to tell them apart and cannot distinguish them using the statistics data alone. If the requested item, file, collection or community is not available in DSpace, an error page is returned. These error pages were being recorded in the same way as successful page views, which led to invalid items being listed in the statistics top ten tables. To prevent this, error pages are now recorded as an error event within Google Analytics.
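
The separation step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's code: `handle_types` stands in for a lookup against DSpace data (hypothetical here), the `/handle/<prefix>/<id>` path shape is an assumption, and `top_items` is a made-up name.

```python
import re

# Assumed shape of item, collection and community page paths.
HANDLE_PATH = re.compile(r"^/handle/(\d+/\d+)/?$")


def top_items(ga_rows, handle_types, limit=10):
    """Return the most viewed item handles as (handle, pageviews) pairs.

    ga_rows: (pagePath, pageviews) pairs from a GA query.
    handle_types: handle -> "item" | "collection" | "community",
    looked up in DSpace (hypothetical mapping for this sketch).
    """
    totals = {}
    for path, views in ga_rows:
        match = HANDLE_PATH.match(path)
        if not match:
            continue  # not a handle page at all
        handle = match.group(1)
        # Handles unknown to DSpace and non-item handles are skipped.
        if handle_types.get(handle) != "item":
            continue
        totals[handle] = totals.get(handle, 0) + views
    # Highest view count first; ties broken by handle for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]
```

Filtering against DSpace data in this way also keeps handles that do not exist out of a top ten table, which is the symptom the error event was introduced to fix.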

These changes have given us a much greater understanding of how our repository is being used, with the majority of users coming directly from Google. The URLrewrite change led to a doubling of our download statistics, as we now capture users who previously went straight to the download.
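
For readers wondering what the rerouting rule amounts to, here is a minimal Python sketch of the decision only. The real change was made with URL rewriting, whose configuration is not shown in this post. The bitstream path shapes, the use of the Referer header to detect visitors from our own pages, and the function name are all assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed DSpace download paths: /bitstream/<prefix>/<id>/<seq>/<file>
# or /bitstream/handle/<prefix>/<id>/<file>.
BITSTREAM_PATH = re.compile(r"^/bitstream/(?:handle/)?(\d+/\d+)/")


def reroute_target(path, referer, site_host):
    """Return the item page path to redirect to, or None to serve the file."""
    match = BITSTREAM_PATH.match(path)
    if not match:
        return None  # not a file download
    if referer and urlparse(referer).hostname == site_host:
        return None  # the user clicked through from our own pages
    # Arrived from a search engine or directly: send via the item page.
    return "/handle/" + match.group(1)
```

A request with no referer is treated like one from a search engine, so it is rerouted too; whether that is the right choice depends on how much direct linking to files a repository wants to allow.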

Thanks to: the Scottish Digital Library Consortium; Stuart Wood and Gareth Johnson of the University of Leicester for information on the URLrewrite; and Graham Triggs, formerly of BioMed Central and now of Symplectic.

The code to enable GA stats within DSpace is freely available from GitHub: https://github.com/seesmith/Dspace-googleanalytics

You can view our collection and item statistic changes at http://www.era.lib.ed.ac.uk

Friday, 27 May 2011

Survey: Annual metrics collection for SCONUL & UK Repositories

As it happens my boss is chair of the SCONUL Performance Measures group, and we've been talking about the SCONUL annual stats return. As many of you will remember, last year this tried to include repository download numbers for full-text items. I know from conversations I had with various people in the community that what they asked for wasn't realistically collectible, or at least wasn't last year.

However, SCONUL remains keen to be able to demonstrate what the UK repository community is delivering in their findings for 2011. She wanted to know if it was possible to isolate full-text downloads/accesses as discrete from total accesses, as she thinks the former is a more valuable figure.


For my own part I'm still not certain that I can create these figures for my local repository, or at least not without a lot of tech time investigating (something that with all the CRIS work we've got going on isn't really going to be an option). But what about the rest of the community?


Yes, that's right - this is a plea for information!

If you could take a minute or two to complete the following survey it would be very much appreciated and may help shape SCONUL's requests for this year into a more realistic metric! Which I'm sure we could all agree would be a positive move for the community.


The survey is here: http://www.surveymonkey.com/s/PKQH6RC

If you're unable to access the Survey Monkey site, please get in touch with me and I'll email you a copy of the questions. I'll blog about the results this time next week, so get clicking, and thanks!