From 41af605958e79daea9d7e9b7512503a1f23b36b1 Mon Sep 17 00:00:00 2001
From: Tom Sercu
Date: Mon, 9 Dec 2024 11:17:13 -0500
Subject: [PATCH] Clarify License Terms in README.md and LICENSE.md

---
 LICENSE.md |   8 ++--
 README.md  | 116 +++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 85 insertions(+), 39 deletions(-)

diff --git a/LICENSE.md b/LICENSE.md
index fe0b992..9798d1d 100644
--- a/LICENSE.md
+++ b/LICENSE.md
@@ -2,12 +2,12 @@
 
 Here are the different licenses that govern access to the ESM codebase and the models inclusive of weights:
 
-| License | What |
+| License | Component |
 |------------------------------------------------------------|-------------------------------------------------------------------|
 | [Cambrian Open License Agreement](https://www.evolutionaryscale.ai/policies/cambrian-open-license-agreement) | Code on GitHub (excluding model weights) |
 | [Cambrian Open License Agreement](https://www.evolutionaryscale.ai/policies/cambrian-open-license-agreement) | ESM C 300M (incl weights) |
 | [Cambrian Non-Commercial License Agreement](https://www.evolutionaryscale.ai/policies/cambrian-non-commercial-license-agreement) | ESM-3 Open Model (incl weights) |
 | [Cambrian Non-Commercial License Agreement](https://www.evolutionaryscale.ai/policies/cambrian-non-commercial-license-agreement) | ESM C 600M (incl weights) |
-| Governed by API Agreements (See Below) | API-only models (ESM C 6B, ESM3 family) |
-| [Forge API Terms of Use](https://www.evolutionaryscale.ai/policies/terms-of-use) | Free non-commercial API access via Forge |
-| [Cambrian Inference Clickthrough License Agreement](https://www.evolutionaryscale.ai/policies/cambrian-inference-clickthrough-license-agreement) | Commercial Inference via SageMaker |
+| Governed by API Agreements (See Below) | API access to all models, including API-only models (ESM3 family, ESM C 6B) |
+| [Forge API Terms of Use](https://www.evolutionaryscale.ai/policies/terms-of-use) | Free non-commercial API 
access via Forge to all models including API-only models (ESM3 family, ESM C 6B) |
+| [Cambrian Inference Clickthrough License Agreement](https://www.evolutionaryscale.ai/policies/cambrian-inference-clickthrough-license-agreement) | Commercial Inference via SageMaker for all ESM C models |
diff --git a/README.md b/README.md
index 9fa73fb..ac15209 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,20 @@
-- [Installation ](#installation-)
-- [ESM C ](#esm-c-)
-  - [Using ESM C 300M and 600M via GitHub](#using-esm-c-300m-and-600m-via-github)
-  - [Using ESM C 6B via Forge API](#using-esm-c-6b-via-forge-api)
-  - [Using ESM C 6B via SageMaker](#using-esm-c-6b-via-sagemaker)
-- [ESM 3 ](#esm-3--)
-  - [Quickstart for ESM3-open](#quickstart-for-esm3-open)
-  - [Forge: Access to larger ESM3 models](#forge-access-to-larger-esm3-models)
-- [Responsible Development ](#responsible-development-)
-- [Licenses ](#licenses--)
+- [Installation ](#installation)
+- [ESM C](#esm-c)
+  - [ESM C 300M and 600M via GitHub](#esm-c-github)
+  - [ESM C via Forge API for Free Non-Commercial Use](#esm-c-forge)
+  - [ESM C via SageMaker for Commercial Use](#esm-c-sagemaker)
+  - [ESM C Example Usage](#esmc-example)
+- [ESM 3](#esm3)
+  - [Quickstart for ESM3-open](#esm3-quickstart)
+  - [Forge: Access to larger ESM3 models](#esm3-forge)
+  - [ESM 3 Example Usage](#esm3-example)
+- [Responsible Development ](#responsible-development)
+- [Licenses](#licenses)
 
 ## Installation
 
-To get started with ESM, install the library using pip:
+To get started with ESM, install the python library using pip:
 
 ```bash
 pip install esm
@@ -23,12 +25,14 @@ pip install esm
 
 ESM C comes with major performance benefits over ESM2. The 300M parameter ESM C delivers similar performance to ESM2 650M with dramatically reduced memory requirements and faster inference. The 600M parameter ESM C rivals the 3B parameter ESM2 and approaches the capabilities of the 15B model, delivering frontier performance with far greater efficiency. 
The 6B parameter ESM C sets a new benchmark, outperforming the best ESM2 models by a wide margin.
 
-ESM C models are available immediately for academic and commercial use under a new license structure designed to promote openness and enable scientists and builders. You can find the high level take-away of the license structure in the [Licenses](#licenses) section of this page, and the full license structure in the [LICENSE.md](LICENSE.md) file.
+ESM C models are available immediately for academic and commercial use under a new license structure designed to promote openness and enable scientists and builders. You can find the high level take-away of the license structure in the [Licenses](#licenses) section of this page, and complete license details in [LICENSE.md](LICENSE.md).
 
-You can use the following guides to start using ESM C models today, either [running the model locally](https://huggingface.co/EvolutionaryScale), [the Forge API](https://forge.evolutionaryscale.ai/) and [AWS SageMaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-iw2nbscescndm).
+You can use the following guides to start using ESM C models today, either running the model locally, [the Forge API](https://forge.evolutionaryscale.ai/) and [AWS SageMaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-iw2nbscescndm).
 
-### Using ESM C 300M and 600M via GitHub
-ESM C model weights are stored on the HuggingFace hub under https://huggingface.co/EvolutionaryScale/.
+### ESM C Local Models via GitHub
+The code and weights for the ESM C 300M model are available under the Cambrian Open [license agreement](#licenses). The weights for the ESM C 600M model are available under the Cambrian Non-Commercial [license agreement](#licenses).
+
+When running the code below, a pytorch model will be instantiated locally on your machine, with the weights downloaded from the [HuggingFace hub](https://huggingface.co/EvolutionaryScale).
 ```py
 from esm.models.esmc import ESMC
 from esm.sdk.api import ESMProtein, LogitsConfig
@@ -42,9 +46,13 @@ logits_output = client.logits(
 print(logits_output.logits, logits_output.embeddings)
 ```
 
-### Using ESM C 6B via Forge API
+### ESM C via Forge API for Free Non-Commercial Use
+
+The ESM C model family, including ESMC 6B, are accessible via EvolutionaryScale Forge for free [non-commercial use](#licenses).
+Apply for access and copy the API token from the console by first visiting https://forge.evolutionaryscale.ai.
+
+With the code below, a local python client talks to the model inference server hosted by EvolutionaryScale.
 
-ESM C models, including ESMC 6B, are accessible via EvolutionaryScale Forge. You can request access and utilize these models through forge.evolutionaryscale.ai, as demonstrated in the example below.
 ```py
 from esm.sdk.forge import ESM3ForgeInferenceClient
 from esm.sdk.api import ESMProtein, LogitsConfig
@@ -58,11 +66,14 @@ logits_output = forge_client.logits(
 print(logits_output.logits, logits_output.embeddings)
 ```
 
-### Using ESM C 6B via SageMaker
+Remember to replace `` with your actual Forge access token.
 
-ESM C models are also available on Amazon SageMaker. They function similarly to the ESM3 model family, and you can refer to the sample notebooks provided in this repository for examples.
+### ESM C via SageMaker for Commercial Use
 
-You'll need an admin AWS access to an AWS account to follow these instructions. To deploy, first we need to deploy the AWS package:
+ESM C models are also available on Amazon SageMaker under the Cambrian Inference Clickthrough License Agreement.
+Under this license agreement models are available for broad commercial use to commercial entities.
+
+You will need an admin AWS access to an AWS account to follow these instructions. To deploy, first we need to deploy the AWS package:
 
 1. Find the ESM C model version you want to subscribe to. 
All of our offerings are visible [here](https://aws.amazon.com/marketplace/seller-profile?id=seller-iw2nbscescndm).
 2. Click the name of the model version you are interested in, review pricing information and the end user license agreement (EULA), then click "Continue to Subscribe".
@@ -77,6 +88,8 @@ The Sagemaker deployment of the model now lives on a dedicated GPU instance insi
 Make sure to remember to shut down the instance after you stop using it. Find the CloudFormation stack you created [here](https://us-east-1.console.aws.amazon.com/cloudformation/home), select it, and then click "Delete" to clean up all resources.
 
 After creating the endpoint, you can create a sagemaker client and use it the same way as a forge client. They share the same API.
+The local python client talks to the Sagemaker endpoint you just deployed, which runs on an instance with a GPU to run model inference.
+
 Ensure that the code below runs in an environment that has AWS credentials available for the account which provisioned SageMaker resources.
 Learn more about general AWS credential options [here](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-authentication.html#cli-chap-authentication-precedence).
 
@@ -99,6 +112,12 @@ logits_output = sagemaker_client.logits(
 print(logits_output.logits, logits_output.embeddings)
 ```
 
+### ESM C Example Usage
+
+Look at [esmc_examples.py](./examples/esmc_examples.py) for the standard usage (extracting embeddings and model amino acid prediction).
+
+More coming soon.
+
 ## ESM 3
 
 [ESM3](https://www.evolutionaryscale.ai/papers/esm3-simulating-500-million-years-of-evolution-with-a-language-model) is a frontier generative model for biology, able to jointly reason across three fundamental biological properties of proteins: sequence, structure, and function. These three data modalities are represented as tracks of discrete tokens at the input and output of ESM3. 
You can present the model with a combination of partial inputs across the tracks, and ESM3 will provide output predictions for all the tracks.
@@ -115,7 +134,7 @@ Here we present `esm3-open-small`. With 1.4B parameters it is the smallest and f
 ESM3-open is available under the [Cambrian non-commercial license agreement](https://www.evolutionaryscale.ai/policies/cambrian-non-commercial-license-agreement), as outlined in `LICENSE.md` (note: updated with ESM C release).
 Visit our [Discussions page](https://github.com/evolutionaryscale/esm/discussions) to get in touch, provide feedback, ask questions or share your experience with ESM3!
 
-### Quickstart for ESM3-open
+### Quickstart for ESM3-open
 
 ```
 pip install esm
@@ -153,20 +172,9 @@ protein.to_pdb("./round_tripped.pdb")
 ```
 
 Congratulations, you just generated your first proteins with ESM3!
-Let's explore some more advanced prompting with the help of our [notebooks and scripts](examples/).
 
-`generate.ipynb` will walk through two prompting examples (scaffolding and secondary structure editing) using the open model:
-[](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/examples/generate.ipynb)
-
-`gfp_design.ipynb` will walk through the more complex generation procedure we used to design esmGFP:
-[](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/examples/gfp_design.ipynb)
-
-We also provide example scripts that show common workflows under `examples/`:
-
-- [local_generate.py](./examples/local_generate.py) shows how simple and elegant common tasks are: it shows folding, inverse folding and chain of thought generation, all by calling just `model.generate()` for iterative decoding.
-- [seqfun_struct.py](./examples/seqfun_struct.py) shows direct use of the model as a standard pytorch model with a simple model `forward` call.
-
-### Forge: Access to larger ESM3 models
+### EvolutionaryScale Forge: Access to larger ESM3 models
+
 You can apply for beta access to the full family of larger and higher capability ESM3 models at [EvolutionaryScale Forge](https://forge.evolutionaryscale.ai).
@@ -188,6 +196,21 @@ model: ESM3InferenceClient = esm.sdk.client("esm3-medium-2024-08", token="
+Let's explore some more advanced prompting with the help of our [notebooks and scripts](examples/).
+
+`generate.ipynb` will walk through two prompting examples (scaffolding and secondary structure editing) using the open model:
+[](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/examples/generate.ipynb)
+
+`gfp_design.ipynb` will walk through the more complex generation procedure we used to design esmGFP:
+[](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/examples/gfp_design.ipynb)
+
+We also provide example scripts that show common workflows under `examples/`:
+
+- [local_generate.py](./examples/local_generate.py) shows how simple and elegant common tasks are: it shows folding, inverse folding and chain of thought generation, all by calling just `model.generate()` for iterative decoding.
+- [seqfun_struct.py](./examples/seqfun_struct.py) shows direct use of the model as a standard pytorch model with a simple model `forward` call.
+
 ## Responsible Development
 
 EvolutionaryScale is a public benefit company. Our mission is to develop artificial intelligence to understand biology for the benefit of human health and society, through partnership with the scientific community, and open, safe, and responsible research. Inspired by the history of our field as well as [new principles and recommendations](https://responsiblebiodesign.ai/), we have created a Responsible Development Framework to guide our work towards our mission with transparency and clarity.
@@ -202,4 +225,27 @@ The core tenets of our framework are
 With this in mind, we have performed a variety of mitigations for `esm3-sm-open-v1`, detailed in our [paper](https://www.evolutionaryscale.ai/papers/esm3-simulating-500-million-years-of-evolution-with-a-language-model)
 
 ## Licenses
-The code and model weights of ESM3 and ESM C are available under a mixture of non-commercial and more permissive licenses, fully outlined in [LICENSE.md](LICENSE.md).
+The code and model weights of ESM3 and ESM C are available under a mixture of non-commercial and permissive commercial licenses.
+This summary provides a high-level overview. For complete license details, see [LICENSE.md](./LICENSE.md).
+
+### How can I access the models and which licenses apply?
+
+The models can be accessed in three different ways, each with its own licensing terms.
+
+1. **Code and weights** via GitHub and HuggingFace are available under either a [non-commercial](https://www.evolutionaryscale.ai/policies/cambrian-non-commercial-license-agreement) (ESM C 600M, ESM3-small-open) or an [open license](https://www.evolutionaryscale.ai/policies/cambrian-open-license-agreement) (codebase, ESM C 300M).
+    1. **Building with ESM encouraged**: You can use embeddings, model predictions, fine-tune the models and use components of both the models and code. We strongly encourage anyone to build on ESM C and ESM3! Just remember to maintain the same license terms and release under the ESM name.
+2. **Free non-commercial inference API** via Forge. All models are available this way, with free credits granted to students and researchers. We want to enable academics under [non-commercial Terms of Use](https://www.evolutionaryscale.ai/policies/terms-of-use), which mirrors the non-commercial license.
+3. **Paid commercial Inference API** for commercial use via SageMaker (Forge coming soon). 
All ESM C models are available this way to commercial entities for commercial use under a [clickthrough license agreement](https://www.evolutionaryscale.ai/policies/cambrian-inference-clickthrough-license-agreement) with few restrictions.
+    1. In broad strokes: standard commercial use like developing molecules and developing downstream ML models and methods with the model is allowed, while training competing models on the API outputs is not.
+    2. Note: For ESM3 commercial use, reach out to [bd@evolutionaryscale.ai](mailto:bd@evolutionaryscale.ai)
+
+### What changed with the release of ESM C?
+
+We introduced a [clickthrough license agreement](https://www.evolutionaryscale.ai/policies/cambrian-inference-clickthrough-license-agreement) to enable frictionless commercial use of ESM C.
+
+We introduced the new [Cambrian Open License](https://www.evolutionaryscale.ai/policies/cambrian-open-license-agreement) for ESM C 300M, and at the same time moved all code in the [`esm` repo](https://github.com/evolutionaryscale/esm) under that permissive license.
+
+The [Cambrian non-commercial license](https://www.evolutionaryscale.ai/policies/cambrian-non-commercial-license-agreement) is largely based on the original [ESM3 Community License Agreement](https://www.evolutionaryscale.ai/policies/community-license-agreement), but removed the clause that restricted drug development, added the naming requirement, and extended the meaning of “Derivative Work” to allow training on model outputs. Just remember to release models and methods built on ESM under the same license.
+These changes are meant to remove potential gray areas and points of friction for researchers building with ESM.
+
+Finally, The ESM3-open-small model has been moved under the Cambrian non-commercial license.