Many institutions and advocates are grappling with key questions around how to advance the open data agenda. While there has been activity in the development of mandates and best-practice recommendations, much remains to be done in training and incentivising researchers to engage with principles of data stewardship, building support structures to facilitate the work of intermediaries in the data publishing process, and establishing a more enabling, cohesive policy framework.
On 13 March 2018 the Organisation for Economic Co-operation and Development (OECD) Committee for Scientific and Technological Policy (CSTP) and Global Science Forum (GSF) hosted a workshop in Paris to advance the OECD policy-making agenda on open data.
Titled “Towards New Principles for Enhanced Access to Public Data for Science, Technology and Innovation”, the workshop brought together a group of experts from around the world to explore key areas and related principles and policy actions to promote enhanced access to data from publicly-funded research. The meeting was framed within the overall context of existing OECD principles and guidelines for access to research data from public funding and the challenges related to implementing this recommendation.
Based on my experience in working with academics in promoting the sharing of data and other scholarly outputs in various institutional environments and research settings, I was invited to make a live-stream video presentation to the workshop from a Global South perspective. As a panelist in the session on “Building human capital and institutional capabilities”, I shared some of the insights gained in the course of my work as the Curation and Publishing Manager of the Research on Open Educational Resources for Development (ROER4D) project.
Working in a large-scale network of researchers in universities and non-governmental organisations in 21 developing countries has led to fascinating insights around the complexities of pursuing an open research and open data publishing agenda in a context that is defined by diverse, multilingual approaches to the research process.
In the period leading up to the OECD CSTP-GSF workshop, the organisers asked me to submit notes on a few key points around open data practice that I would like to address. I am sharing those notes in the hope that they can help to advance a more nuanced understanding of the open research and open data agenda, particularly as it relates to the developing country context.
Key point 1: Researchers require training in and incentivisation to undertake a basic level of content curation and data stewardship
Two of the most commonly cited barriers to data sharing are researchers’ lack of time to undertake the sharing/publishing process and the fear that their data will be misappropriated or simply not understood by others. These are valid concerns, which are often underpinned and amplified by the fact that many researchers do not have basic curatorial knowledge and strategies in place for how they store and describe their data.
This “state of disarray”, which makes the public exposure of data daunting and extremely time consuming, is not solely a feature of the data publishing process, but is rooted in foundational research design and data collection processes. While attention is increasingly being paid to data management planning and the legal, ethical and privacy aspects of open data, more attention needs to be paid to clear, systematic protocols addressing file-naming conventions, version control, de-identification strategies and secure storage solutions. These workflows and principles (ideally established at the outset of a research process) not only facilitate downstream data publication, but also give researchers more control over their data analysis processes and make it easier to share with collaborators. These solutions ultimately promote greater rigour in the research process.
The conversation around good practices in data access and sharing needs to be expanded to include good practices in data collection and curation. The “open” aspect of the academic experience is often a confidence game. In this context, the old adage that “well begun is half done” is highly pertinent. Researchers who have stringent, ordered data housekeeping strategies are more likely to be confident about sharing their data and to describe it in ways that make it useful to others. The stressful scramble around data publication that often takes place post hoc, when academics’ focus has shifted to the next grant or teaching process, can be alleviated to a large extent through a sustained process of paying attention to “born open” principles of data curation.
This issue pertains to capacity building on the part of individual researchers as well as the provision of infrastructure and platforms that can facilitate this practice. Institutionally based secure data archives, virtual research environments and collaborative repositories not only serve as tools for better data management; their use also prompts researchers to consider foundational curatorial issues in the course of the research process. In instances where under-resourced institutions are not able to afford this infrastructure, consortium and regional initiatives are crucial to capacitate marginalised academics and institutions. An example of such an initiative is the Data Intensive Research Initiative of South Africa (DIRISA) pilot, which is rolling out a nationally subsidised network of Figshare instances at South African universities to address data management and the fulfilment of data-sharing mandates.
The South African case has made it evident that mandates around data sharing are necessary but not sufficient to stimulate better data management and sharing. Researchers need to be capacitated and incentivised to incorporate a more professional approach to data stewardship, an area of work which is very new to certain disciplines. Embedding a curatorial mindset in the academic process will go a long way towards stimulating activity in this regard, and in making the process less taxing for researchers.
Key point 2: Intermediaries are critical in the open data sharing process
The importance of intermediaries in the open data ecosystem has been widely acknowledged by advocates and practitioners. These intermediaries are typically professionals with expertise in data stewardship and publication who (a) alleviate the load of data curation and publication tasks faced by researchers; or (b) provide value-add services (such as analytics and visualisations) that enrich the third-party user experience.
A study on open data in the governance of South African higher education (Van Schalkwyk, Willmers & Czerniewicz, 2014) found that open data intermediaries increase the accessibility and utility of data; provide both supply-side and demand-side value; can assume the role of a ‘keystone species’ in a data ecosystem; and have the potential to democratise the impact and use of open data in that they play an important role in curtailing the ‘de-ameliorating’ effects of data-driven disciplinary surveillance.
In many developing country contexts there is an imperative to provide capacity building initiatives that address foundational research skills and introduce academics to the principles and processes of open data sharing. Many of these researchers face severe challenges around skills and infrastructure deficits as well as language barriers in accessing the support required to develop new scholarly communication skills. In these contexts, intermediaries such as data curators have a crucial role to play in undertaking the data preparation and de-identification work required in order to publish data. They also have a crucial role to play in working with researchers to build their capacity and confidence in the data-sharing process, building trust in the process and liberating them from misconceptions around publication scooping and exploitative, unethical third-party data use.
In a large-scale, networked initiative such as the Research on Open Educational Resources for Development (ROER4D) project, which has engaged over 100 researchers working in 21 countries across South America, Sub-Saharan Africa and South and Southeast Asia, there was a direct imperative to provide centralised data curation and publication services. Located in a centralised Network Hub at the University of Cape Town, a curation and dissemination team worked with researchers in the articulation of data management plans and undertook a wide range of data preparation tasks on their behalf, working with them in a collaborative fashion to allay fears and build their capacity to do work of this kind on their own in the future. These intermediaries also played a key role in assessing ethical and legal frameworks in collaboration with researchers, identifying problematic phrasing in consent forms that preclude data sharing and advising on the most appropriate de-identification processes.
The increasingly high levels of casualisation in many universities (both in South Africa and abroad) pose a challenge to the entrenchment of intermediaries in sustainable initiatives that work with academics over a period of time to build skills and momentum in data-sharing activities. Support staff such as the ROER4D curation and dissemination team members are often contracted, project-specific staff who are lost to the institution and research communities after contracts expire.
The imperative around the recognition and support of data-publication intermediaries is made more acute in South Africa and other developing countries where libraries are under enormous pressure to transform into entities that can provide research-oriented support services in addition to their traditional focus on undergraduate resource provision. Many libraries in African universities face budget and skills deficits as they grapple with the challenge of addressing burgeoning student populations with increasing demands for e-learning and remote access. This makes it challenging for African librarians to play the role of open access and open data service providers, as is the case in many developed countries.
A centralised mechanism (at state and institutional levels) that recognises the importance of support staff with a wide range of proficiencies in diverse institutional settings, and provides bridging finance when there is no associated grant to cover this work, would be highly valuable in terms of capacity development as well as the establishment of a dynamic national community of practice around open data sharing.
Key point 3: A lack of cohesion in policy frameworks hinders open sharing by researchers
The ability of researchers to share outputs arising from their work is dictated by institutional intellectual property (IP) policies, which are in turn largely influenced by national copyright acts. In the African context, many universities have nascent policy environments, meaning that they may not have an IP policy, or that the policy they have is out of date and inadequate to cover the intricacies of online content sharing, particularly as it relates to open data transfer and publication. This situation creates confusion among academics about what their actual rights are with regard to data sharing or, in some cases, may lead to flagrant disregard for policies and mandates.
A review of South African universities’ IP policies by ROER4D researcher Henry Trotter revealed that even though all 26 South African universities are public state-funded institutions, they each have their own IP policies, which provide different prescriptions regarding copyright ownership. The review focused on provisions for sharing teaching and learning materials, but it is reasonable to infer that similar discrepancies will occur in relation to data ownership and sharing. The University of Cape Town IP Policy, for instance, promotes open content sharing and the use of Creative Commons licensing, ceding copyright of research data to academics; the Stellenbosch University Policy on the Commercial Exploitation of Intellectual Property, by contrast, is strongly focused on commercialisation and states that the university owns copyright over all outputs produced by academics in the course of their work, including raw data created during research.
This uneven policy context is not only confusing for academics to navigate, but also introduces highly challenging legal constraints for open data sharing, particularly in collaborative, cross-institutional projects. Grant agreements do increasingly provide exceptions and caveats to restrictive IP policies, but these agreements are often not adequately scrutinised by researchers, and the lack of cohesion between institutional policies and the dictates of funding entities serves to amplify the distrust of open data practice.
National and regional initiatives to assess and revise institutional IP policies so that they are conducive to open data sharing (or to establish these policies where they do not exist) would be extremely valuable in promoting open data practice and ensuring a clear, cohesive approach to the legal and ethical aspects of the process, the uncertainty of which often inhibits researchers’ practice in this regard.