In this paper, written for Anne Gilliland’s Archives, Records, and Memory course (IS 431) in December 2015, I examine some of the controversies and limitations of financially-bound, restricted-access information repositories, such as JSTOR, and explore some of the alternative repositories in the open-access movement that have emerged. I highlight concerns about power imbalances in hosting information in JSTOR, notably the open-access protests and the example of the JSTOR-related halting of DISA in South Africa. I follow Keith Breckenridge’s example of contrasting Aaron Swartz’s militant opposition to and banditry from JSTOR with examples of other archives, such as the Internet Archive, whose managers explicitly make their organizations’ data open to the public. I then explore the role of torrent sharing, used by the Internet Archive, in disseminating information openly, and ultimately I tie the controversy over open access principles to notions of capitalism, reflecting on the transgressive properties of open-access initiatives.
In 2011, the U.S. Department of Justice (DOJ) indicted Aaron Swartz for computer fraud after he downloaded five million articles from the scholarly journal repository JSTOR. Swartz was an advocate for open access to, and free availability of, the kind of information that JSTOR holds and restricts to users affiliated with member institutions. He had previously argued that JSTOR’s limiting of access to the content it owned was unjust, and had urged fellow open-access advocates to circumvent its restrictions.1 In 2013, facing prosecution from the DOJ, Swartz committed suicide, effectively becoming a martyr for the cause of open access in the eyes of his fellow advocates.2 Swartz’s argument continues to reverberate, as JSTOR-dominated systems continue to leave information out of many users’ reach. This carries political implications around the globe, notably in post-colonial societies in whose records JSTOR’s funding organizations have invested. Alternative systems of digitizing, hosting, and sharing records have also arisen, presenting potential opportunities to make access to those records more free and equal. Some of the alternative means of data dissemination used by these public archives have transgressive elements, and questions remain as to whether that transgressiveness is a necessary component of building new paradigms of information access or whether it should be avoided. Systems of and advocacy for open and shared data and access lie at the intersection of technology, politics, security, social concerns, and finance, and it is necessary to navigate those intersecting issues in order to better understand the role the archive should play between capitalist systems and a world of digital information.
JSTOR is a paradoxical enterprise, containing much to admire as a resource and repository while at the same time having problematic elements. Keith Breckenridge describes the breadth and depth of the initiative’s content, its wide accessibility and its extremely comprehensive searchability. He describes JSTOR as “a contradictory case of open and closed access,” citing its offering free membership for certain institutions but requiring payment of fees for most others, in addition to JSTOR materials being essentially inaccessible for the general public unaffiliated with a member institution. While Breckenridge mainly focuses on JSTOR’s presence in Africa, notably its African Access Initiative, the restricted-access aspect of JSTOR has been the source of controversy in the United States as well, being the motivation for Swartz’s mass download.3
Swartz’s action against JSTOR protested the perceived inequality of access fostered by the repository. In his manifesto advocating for the Open Access Movement, he calls the restriction of educational content to a small group of paying members “private theft of public culture,” in contrast to the U.S. Department of Justice’s later classification of Swartz’s actions as “theft” from JSTOR. Swartz insists that sharing that content beyond the mandated restrictions of financial and intellectual-property ownership is “a moral imperative,” advocating open access regardless of whether or not the law permits it. In fact, the illegality of sharing the information in question seems to be a primary motivation in Swartz’s description of his mission: he sees the law as unjust, a wrong to be righted with direct action.4
However, Breckenridge complicates Swartz’s militant open-access stance with the realities of financially driven digitization initiatives. He states that “without some mechanism to protect the property rights of the collaborating academic publishers and to support the ongoing heavy costs of digital preservation, the JSTOR project was clearly doomed.”5 According to conventional wisdom in this case, digitization and the perpetuation of digital records require funding; the revenue JSTOR gains from paid access is necessary to fund digitization and preservation, however much the fee-paying membership policy restricts access for financially underprivileged users. Immediately making all of the data free and open could thus run the risk of hindering digitization and preservation initiatives.
Dependence on financial revenue for producing and maintaining digital records is still problematic, though, as Breckenridge illustrates with the case of the Digital Imaging Project of South Africa (DISA). DISA, like JSTOR, was funded by the U.S.-based Mellon Foundation, which caused concerns among South African archivists regarding possible political imbalances between themselves and their foreign benefactors. Several facets of imbalance of power were manifested here. “Technopolitical” issues included complications over licensing. Aluka, a Mellon-funded project dedicated to digitizing and providing online access to African content, was to own the rights to all the data provided to it by DISA, and the flow of data between South Africa and the “global North” was unequal as well. The data flow “from South Africa was to be immediate and unrestricted,” writes Breckenridge, “while the movement of records from Princeton to Durban was hedged about with careful licensing requirements designed to protect the value of the holdings in the US and Britain.” It is clear to see why such an arrangement sowed distrust among South African parties to the project: the way the initiative was structured appeared to favor the foreign powers closer to the initiative’s financial backers. This aroused fear of a neo-colonialist power structure to the digital archive, holding South Africa at a disadvantage while American and European entities wielding financial power would reap the lion’s share of the benefits. Part of that fear stems from the nature of digital archives and online access themselves; they can appear as an existential threat to the classic notion of the physical archive. Online access to digital records frees those records from within the archive’s tangible walls, and South African archivists worried about how their institutions would be affected by researchers’ ability to access their records without having to patronize the archives themselves. 
This anxiety is only compounded by the placement of the rights to the online holdings into foreign hands.6
Accusations of such offenses as neo-colonialism helped foster distrust between DISA and Aluka, and that distrust, compounded with difficult-to-realize expectations on DISA’s part, led to a lack of productivity in the digitization initiative, which prompted Mellon to cease funding the project, leading to the effective end of DISA. In 2008, the year of Swartz’s manifesto, the Aluka collection was transferred to JSTOR, and the records digitized as part of that project, according to Breckenridge, fare poorly in South Africa while access to them is vastly superior at JSTOR’s member institutions. While access is free through non-profit education institutions in Africa, most people are not members of such institutions and thus are at a disadvantage when it comes to accessing the records. Breckenridge writes, “This arrangement — despite its undeniably generous and humane intentions — has the unfortunate (and often predicted) result that most people in southern Africa are denied access to the digital archive of their own struggle history,” illustrating the point with a screen-capture of an online message denying access to Aluka’s Anti-Apartheid Movement Collection.7 This apparent injustice of a majority of South Africans being unable to access their own heritage is just the type of situation Swartz intends to rebel against in his manifesto. “Providing scientific articles to those at elite universities in the First World, but not to children in the Global South? It’s outrageous and unacceptable.”8
It is clear from the above example that the financially driven restricted-access model for information repositories has significant problems. Many persons and organizations are experimenting with means of facilitating open access and open data, which shows that broader access is a widely shared concern. Breckenridge’s main purpose in “The Politics of the Parallel Archive” is to dispel the notion that digitization in itself is detrimental to South African record-keeping; rather, he intends to put forward cases of new developments in public archives that demonstrate that the online economy for digital content is in fact robust. His goal is to promote the idea that South Africa can renew the digitization initiatives of its archival material by joining this public-archive model, and that it can continue digitizing and facilitating access to its content without having to fall victim to the power struggles and financial difficulties that affected its prior digitization project.9 Since the anxieties around consolidation of power in entities such as JSTOR are not limited to South Africa, it is worth investigating some of the organizations and technologies emerging in the movements for public archives and open access, both those that Breckenridge examines and others beyond them.
An important issue to keep in mind throughout such an exploration is the transgressiveness toward conservative archival norms that is associated with numerous facilitators of open access, particular examples of which are discussed below. While some organizations promote their “legitimacy” and dissociate themselves from transgressiveness, others embrace the transgressive potential of openness and shared information; still others have elements of both in their operations. In any look at the various purveyors of openness regarding records and information, one should continue to consider the transgressiveness inherent in bringing information outside the confines of the archive and within reach of anyone with an Internet connection.
The Internet Archive is a primary example of a public archive. Founded in 1996 by Brewster Kahle, the Internet Archive is free and accessible for users in the general public. It hosts digitized copies of many public-domain books and movies, as well as archived versions of Web sites. Swartz’s mass download from JSTOR included a significant amount of text that was in the public domain but access to which was limited to JSTOR users; after Swartz was indicted, a fellow activist posted 33 gigabytes of the public-domain data to a torrent site in protest. (More about torrent sharing below.) The Internet Archive helps to combat the perceived stranglehold of JSTOR on material that is in the public domain and, in the eyes of open-access advocates, should therefore be publicly available rather than privately held. The texts of the books hosted on the Internet Archive are fully searchable, Breckenridge notes, and thus the free Internet Archive holds capabilities similar to those which make the more limitedly available JSTOR appealing to users searching texts. Additionally, the Internet Archive’s affiliation with Zotero presents an alternative to scholars turning over the ownership of their writings to something like JSTOR. With the Zotero Commons, users also have access to comprehensive text searchability, and they can transfer materials for hosting at the Internet Archive. Breckenridge makes a point of contrasting the Internet Archive and Zotero with more openly transgressive access facilitators such as LibGen and WikiLeaks, which openly traffic commercially published books and classified state information, respectively; while those push and cross the boundaries of legality, the Internet Archive operates within the legal realm.10 The Internet Archive cooperates with content owners who do not wish for their content or archived Web pages to be hosted there. 
Two such entities are the New York Times and the Wall Street Journal, at least the former of which charges money for access to its archival content.11
Breckenridge mentions, but does not investigate in detail, the Internet Archive’s use of torrent files to disseminate its data.12 Torrent sharing is an intriguing example of a peer-to-peer file-sharing technology that can be harnessed for open data, breaking new ground in transmitting data as well as hosting it. The BitTorrent protocol, designed by Bram Cohen in the early 2000s and commercialized by the firm BitTorrent, does not rely on a central server to host files. Files transmitted via BitTorrent are broken up into small pieces, and any computer that has downloaded a piece of a file serves as a host of that piece, from which other users can download it for as long as that computer remains connected to the overall “swarm” of users. This mechanism of distribution allows numerous users to share the burden of hosting a file, rather than having a single server bear the digital weight of hosting an entire archive of files.
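The piece-and-swarm mechanism described above can be illustrated with a minimal sketch in Python. It is a toy model rather than the actual BitTorrent protocol: the piece size, the peer names, and every function name are hypothetical, and the per-piece hash check is only loosely modeled on the way BitTorrent clients verify the pieces they receive. It shows a file being split into pieces, the pieces being spread across two peers so that neither holds the whole file, and a downloader reassembling the file from the swarm.

```python
import hashlib

PIECE_SIZE = 4  # bytes; hypothetical value, real torrents use far larger pieces


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Break a file's bytes into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list[bytes]) -> list[str]:
    """Hash each piece so a downloader can verify what it receives."""
    return [hashlib.sha1(p).hexdigest() for p in pieces]


def download_from_swarm(swarm: dict[str, dict[int, bytes]], hashes: list[str]) -> bytes:
    """Reassemble a file, taking each piece from any peer holding a valid copy."""
    assembled = []
    for index, expected in enumerate(hashes):
        for peer_pieces in swarm.values():
            piece = peer_pieces.get(index)
            if piece is not None and hashlib.sha1(piece).hexdigest() == expected:
                assembled.append(piece)
                break
        else:
            # No peer offered a valid copy of this piece.
            raise LookupError(f"no peer in the swarm holds a valid copy of piece {index}")
    return b"".join(assembled)


original = b"public domain text"
pieces = split_into_pieces(original)
hashes = piece_hashes(pieces)

# Hypothetical swarm: each peer hosts only some pieces, none hosts the whole file.
swarm = {
    "peer_a": {i: p for i, p in enumerate(pieces) if i % 2 == 0},
    "peer_b": {i: p for i, p in enumerate(pieces) if i % 2 == 1},
}
restored = download_from_swarm(swarm, hashes)
```

In this sketch the hosting burden is shared: the file survives as long as the peers in the swarm collectively hold every piece, and the loss of a piece that no remaining peer holds makes the download fail.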
Because BitTorrent distribution is decentralized by design, it has become a prominent means of sharing pirated media: illegal copies of books, movies, and television programs still under copyright. As a result, BitTorrent has become practically synonymous with content piracy in many circles, and thus it has something of a dangerous reputation. In response, BitTorrent, the firm, has tried multiple methods of branding its technology as “legitimate” and downplaying its transgressive potential. In 2013, after Edward Snowden’s leaks revealed the extent to which the National Security Agency (NSA) was spying on users’ online data, BitTorrent used the controversy to market itself as a safe means by which users could host and transmit their data, since the lack of a central server made it more difficult for the NSA to search users’ data. BitTorrent thus attempted to brand itself as a means by which users could enhance their online privacy and security, rather than as a tool for pirates.13 Also, in recent years, BitTorrent has hosted content for sale in an official capacity via “BitTorrent bundles,” collections of files containing movies or music that users can purchase and then download and host through BitTorrent technology. However, even with this push toward a legitimate appearance, BitTorrent has not abandoned its transgressive potential entirely, even in an official capacity as a firm. The official BitTorrent Web site eulogized Swartz after his suicide (noting that the Internet Archive was holding a memorial for him), and it has also offered promotional BitTorrent bundles intended to popularize The Internet’s Own Boy, the documentary film about Swartz. By offering copies of the documentary for sale on its own site, BitTorrent has made a clear effort to publicize the message of Swartz’s activism.14
In the wake of the Internet Archive’s use of BitTorrent to share its hosted data, others have considered its potential as a tool for access facilitation, including some government agencies. In 2010, the U.K. Treasury began releasing its Combined Online Information System (COINS) data via BitTorrent on Data.gov.uk.15 Chris Markman and Constantine Zavras have written about the potential uses for BitTorrent in libraries. They acknowledge the controversy surrounding BitTorrent and frame its usefulness as something independent of the illegal activity for which others use it. “Separating the technology from the misinformation” is key, they state, and they advocate for its usefulness as a practical, sustainable, and legal tool for libraries: “Introducing BitTorrent into your library’s information ecosystem is not only a potential cost saver, but the first step toward building an online data community as well.” However, one of the most intriguing uses for BitTorrent in libraries that they suggest is transgressive: streaming material from a library’s video collection without negotiating streaming licenses with the rights holders. Markman and Zavras make clear that, despite the “libraries and archives exemption” from copyright infringement for certain duplication within their institutions under Section 108 of U.S. copyright law, a BitTorrent-based streaming endeavor, even if it were restricted by the library, would still be illegal.16 Even so, the fact that they discuss such a measure in their article indicates a willingness on their part to see institutions try to implement it. Doing so, they seem to imply, may push against the limitations of the copyright code and make progress toward forming a new paradigm for broadening access to libraries’ audiovisual content in the digital age.
Others who write about BitTorrent go further than the aforementioned entities and persons in advocating for it, not hedging their words but openly embracing the transgressive element of the distribution technology. Gavin Mueller, in an overview of file-sharing written for Jacobin, takes a political stance and comes close to lauding the piracy aspect as a necessary counter to capitalist hegemony, painting Kim Dotcom, the criminally convicted founder of MegaUpload and then Mega, as something of a folk-antihero, a digital-era equivalent of Blackbeard.17 Peter Sunde, a co-founder of the torrent site The Pirate Bay, whose name embraced the technology’s reputation for piracy, was jailed for abetting copyright infringement and has spoken out directly against capitalism as the primary hindrance to an open Internet. “I have given up the idea that we can win this fight for the Internet,” Sunde says. “We are trying to recreate this capitalistic society we have on top of the internet. So the internet has been mostly fuel on the capitalistic fire, by kind of pretending to be something which will connect the whole world, but actually having a capitalistic agenda.”18
In their placement of capitalist society as the villain and overtly transgressive dissemination of data as necessary pushback against the capitalist powers that be, Mueller and Sunde echo the militant stance of Swartz, whose manifesto advocated breaking the rules of JSTOR in order to shatter a perceived unjust concentration of power over information and spread the content it holds to the people who can benefit from having it but are not presently granted access to it. Capitalism was at the root of Swartz’s complaint, in that organizations like JSTOR effectively gave ownership of a wealth of material, some already in the public domain, to a select elite. Even BitTorrent’s official Web site, which attempts to present the organization and its technology as legitimate, holds up Swartz, advocate of transgressing standards of legality for the greater good, as a martyr for a more open internet. Even if one does not go as far as Sunde does when he despairingly claims that the entire system must come crashing down and a socialist revolution must take place in order for a truly open internet to be possible,19 it is clear that financially-based, restricted-access models of information sharing have significant flaws that should be addressed. While the “neo-imperialism” of JSTOR and the Mellon Foundation may not necessarily have been the cause of the downfall of DISA, the current system of access, in which JSTOR holds the clear advantage in terms of power and resources, is unfair to many South Africans whose ability to access their own nation’s history is severely limited. 
While public archives do not yet hold all the answers — and they are still to an extent dependent on capitalism, since, as evidenced by the Internet Archive’s budget and funding partners, digitization and system maintenance initiatives need financial backing in order to function20 — they show potential for reshaping the systems of storing and sharing data in a more democratic fashion that benefits all potential users equally. Whether change will occur in small increments or in the form of Sunde’s hoped-for full-scale Marxist revolution, and whether the future will be driven by BitTorrent or by something else entirely, remains to be seen. What is clear, however, is that it is possible to, if not form an entire system of open access, build a system in which access is more open for more people, and public archives and information-sharing communities are taking important steps toward that ultimate goal.
1. Aaron Swartz, “Guerilla Open Access Manifesto.” Internet Archive, July 2008. Online at http://archive.org/details/GuerillaOpenAccessManifesto.
2. Keith Breckenridge, “The Politics of the Parallel Archive: Digital Imperialism and the Future of Record-Keeping in the Age of Digital Production.” Journal of Southern African Studies 40:3 (May 23, 2014): 512-513.
3. Breckenridge, “Politics of the Parallel Archive,” 510-512.
4. Swartz, “Guerilla Open Access Manifesto.”
5. Breckenridge, “Politics of the Parallel Archive,” 512.
6. Breckenridge, “Politics of the Parallel Archive,” 507-509.
7. Breckenridge, “Politics of the Parallel Archive,” 510-511.
8. Swartz, “Guerilla Open Access Manifesto.”
9. Breckenridge, “Politics of the Parallel Archive,” 511-512.
10. Breckenridge, “Politics of the Parallel Archive,” 513-514.
11. David Womack, “Who Owns History?” Cabinet 10 (Spring 2003). Online at http://www.cabinetmagazine.org/issues/10/womack.php.
12. Breckenridge, “Politics of the Parallel Archive,” 514.
13. Omar El Akkad, “After NSA revelations, BitTorrent tries to capitalize on privacy fears.” The Globe and Mail, Oct. 14, 2013. Online at http://www.theglobeandmail.com/technology/after-nsa-revelations-bittorrent-tries-to-capitalize-on-privacy-fears/article14861176/.
14. “Aaron Swartz” tag, BitTorrent Blog. Online at http://blog.bittorrent.com/tag/aaron-swartz/.
15. Charles Arthur, “Coins: A flood of data is on its way … but we will need to make sense of it.” The Guardian, June 4, 2010. Online at http://www.theguardian.com/politics/2010/jun/04/coins-treasury-public-sector-data.
16. Chris Markman & Constantine Zavras, “BitTorrent and Libraries: Cooperative Data Publishing, Management and Discovery.” D-Lib, Vol. 20, Nos. 3/4 (March/April 2014). Online at http://www.dlib.org/dlib/march14/markman/03markman.html.
17. Gavin Mueller, “Gimme The Loot.” Jacobin, August 2012. Online at http://www.jacobinmag.com/2012/08/gimme-the-loot/.
18. Joost Mollen, “Pirate Bay Founder: ‘I Have Given Up.’” Motherboard, Dec. 11, 2015. Online at http://motherboard.vice.com/read/pirate-bay-founder-peter-sunde-i-have-given-up.
20. Womack, “Who Owns History?”