Data Licensing: Overview

Michelle Ma
December 6, 2024

Contracts: Data

In this new series, I discuss data licensing as it relates to general B2B SaaS situations, drawing on my years of drafting and negotiating data license agreements. In the next two posts, I’ll dive deeper into the common concerns and expectations of the licensor (the company licensing out the data) and the licensee (the company receiving the data). I’ll also discuss data licensing for AI-specific use cases, such as training foundation models, as well as data licensing as part of a larger commercial license or contract.

Why License: The Value of Data 

In today’s world of data-driven technology, companies increasingly recognize data as an asset with monetary value, one that can be commercialized through licensing. Data sets may also be subject to IP protection as copyrightable compilations; therefore, proper licensing language and clearly defined access and usage rights are necessary.

A Data License Agreement provides these protections by describing a company’s access, use, and (sometimes) distribution rights to specific data sets provided by another company. 

Where Data Licensing Happens

Data licensing comes up in two common commercial contexts:

  1. Data is the core asset being licensed and sold. Examples: database replication services, database access, and training data access. Applicable use cases include marketing agency services, academic research, and others where a data set is valuable for an entity’s research, product development or business operations. 
  2. Data exchange is part of a broader commercial relationship. Example: SaaS contracts often involve the use of usage data, customer-specific data, and aggregated customer data for service monitoring, improvements, and product development. The focus is on providing services for the customer, with access to customer data and usage data being an ancillary, but important, aspect of the core commercial relationship.

Common Concerns

In licensing data, product and legal teams commonly focus on:

  • the exact data sets being licensed
  • usage of the data
  • ownership of that data
  • ownership and creation of derived data 

In AI-specific contexts, the provenance of the data and whether and how it will be used to train AI models are also key issues that can drive the price of obtaining those data sets.

In my next two posts, I’ll discuss these concerns in more detail, from both the licensor and licensee sides. Stay tuned!