Copyright Help for Theses and Other Projects

Using Data

Since data can be considered "facts," they are not protected by copyright. Compilations of data, such as datasets or databases, however, are treated differently. §103 of the U.S. Copyright Act (17 U.S.C. §103) states that copyright can extend to a compilation, as long as there was some "creative" act of authorship in assembling it. In Feist v. Rural, the U.S. Supreme Court clarified the limits on when a compilation of facts can enjoy copyright protection:
  • The compilation must show some creative act in the selection (i.e. which facts to include or exclude) or the arrangement (i.e. order in which facts will be organized or presented) of the facts.
  • Simply "working hard" to compile a large number of facts is not enough of a creative act to confer copyright protection on a compilation. In so holding, the Court rejected the "sweat of the brow" doctrine, under which effort alone could earn protection.
  • Copyright protection extends only to those parts of the compilation (i.e. the selection and/or arrangement) that constitute the act of creative authorship, not to the underlying facts themselves.
For example, in Feist v. Rural, the Supreme Court decided that while it may take a lot of time, money, and effort to gather the names and phone numbers of all residents of a particular area and to arrange them in alphabetical order, this does not mean that the white pages (a type of phonebook) enjoy copyright protection. The selection of facts was not creative (all residents' names and phone numbers were collected), and the arrangement (alphabetical order) was also not creative. However, if you were to compile a listing of science fiction novels with strong female protagonists, and then arrange them by the protagonist's country (or planet) of origin, your compilation would be protected by copyright. Since the underlying facts would not be protected, however, anyone could reuse the facts in your compilation, provided they arranged them in a different way or used only a subset of them (a different selection).


License Restrictions


Though data are not protected by copyright, some databases that are accessed under contract have additional restrictions placed on their use by that contract. For example, a database owner may specify that data obtained from their compilation may only be used for non-commercial purposes, or that any derivative works created with their data may only be made available through restricted access (e.g. behind a sign-in wall). Some of the data collections the library provides access to may have these restrictions. If you have any questions or concerns, contact a librarian, who can check our exact license conditions.


Scraping data from the web

Sometimes the data you want is spread across a multitude of pages on a website. The dataset isn't easily downloadable; instead, the data is trapped in elements of these pages. If each page has the same structure, building a web scraper could help create the dataset you desire.
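To illustrate what "same structure" buys you, here is a minimal sketch using only Python's standard library. It assumes you have already confirmed that scraping the site is permitted. The page layout (table cells with class names such as "title" and "year"), the sample values, and the names CellExtractor and extract_record are all hypothetical, invented for this example; a real site will need its own selectors.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect the text of each <td> whose class attribute names a wanted field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)   # hypothetical field names, e.g. {"title", "year"}
        self.record = {}            # field name -> text found on this page
        self._current = None        # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "td" and cls in self.fields:
            self._current = cls
            self.record[cls] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.record[self._current] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None


def extract_record(html, fields):
    """Return one row of the dataset from one page's HTML."""
    parser = CellExtractor(fields)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.record.items()}


# Two made-up pages that share one structure; only the values differ.
PAGE_A = '<table><tr><td class="title">Sample Title A</td><td class="year"> 2001 </td></tr></table>'
PAGE_B = '<table><tr><td class="title">Sample Title B</td><td class="year">2014</td></tr></table>'

dataset = [extract_record(page, ["title", "year"]) for page in (PAGE_A, PAGE_B)]
```

Because every page follows the same layout, one extraction function can be applied to all of them, and the resulting list of records is the dataset.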

Before you scrape a site, take legality into account. The programmatic nature of scraping means that reusing data gathered in this way usually does not qualify for fair use. For subscription databases, check the licensing agreement, which sometimes excludes scraping as an accepted use (look for language about automated or programmatic downloads in the Terms of Use).

Beyond the copyright and licensing of the data you are after, scraping may be disallowed by the hosting site due to the burden placed on the receiving server. Some content providers and databases have APIs explicitly for making programmatic calls, and it is always worth looking for them before creating a scraper.
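One common way a site signals which pages automated tools may request, and how quickly, is a robots.txt file. The sketch below shows how a scraper might honor it using Python's standard urllib.robotparser module. The user agent string, the sample robots.txt text, the example.org URLs, and the helper names are assumptions made for illustration. Passing a robots.txt check does not replace reading the Terms of Use or the license.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "thesis-research-bot"  # hypothetical name for your scraper


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt lets this user agent request."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent=USER_AGENT, default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay if given,
    otherwise a default chosen here as an assumption."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def scrape(urls, parser, fetch, user_agent=USER_AGENT, sleep=time.sleep):
    """Call fetch(url) on each allowed URL, pausing between requests
    to limit the burden on the server."""
    delay = polite_delay(parser, user_agent)
    results = {}
    for index, url in enumerate(allowed_urls(parser, urls, user_agent)):
        if index > 0:
            sleep(delay)
        results[url] = fetch(url)
    return results


# Made-up robots.txt content for demonstration.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
```

Here fetch is whatever function you use to download a page; keeping it as a parameter lets you try the logic without touching the network. If the site offers an API, prefer that over any scraper.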

Questions? Reach out to the Data and Digital Scholarship Librarian, Jess Yao.