Remove tools notebooks support

Neil Thomas
2025-04-08 00:52:47 +00:00
parent dc8d437db1
commit 494e2e1d5c
10 changed files with 18 additions and 452 deletions

@@ -1,3 +1,7 @@
# ESM
[![Design a GFP](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/cookbook/tutorials/4_forge_generate.ipynb)
- [Installation ](#installation-)
- [ESM 3](#esm-3-)
- [Quickstart for ESM3 Open](#esm3-quickstart-)
@@ -39,10 +43,6 @@ ESM3-open, with 1.4B parameters, is the smallest and fastest model in the family
### Quickstart for ESM3-open <a name="esm3-quickstart"></a>
```
pip install esm
```
The weights are stored on HuggingFace Hub under [HuggingFace/EvolutionaryScale/esm3](https://huggingface.co/EvolutionaryScale/esm3).
```py
@@ -100,7 +100,11 @@ This enables a seamless transition from smaller and faster models, to our larges
### ESM3 Example Usage
<a name="esm3-example"></a>
Check out our [tutorials](./cookbook/tutorials) to learn how to use ESM3.
[Generating a novel GFP with chain of thought generation using ESM3](./cookbook/tutorials/3_gfp_design.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/cookbook/tutorials/3_gfp_design.ipynb)
[Advanced prompting with ESM3 input tracks](./cookbook/tutorials/4_forge_generate.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/cookbook/tutorials/4_forge_generate.ipynb)
## ESM C <a name="esm-c"></a>
[ESM Cambrian](https://www.evolutionaryscale.ai/blog/esm-cambrian) is a parallel model family to our flagship ESM3 generative models. While ESM3 focuses on controllable generation of proteins, ESM C focuses on creating representations of the underlying biology of proteins.
@@ -230,7 +234,7 @@ print(logits_output.logits, logits_output.embeddings)
### ESM C Example Usage
<a name="esm-c-example"></a>
Check out our [tutorials](./cookbook/tutorials) to learn how to use ESM C.
[Embedding a sequence using ESM C](./cookbook/tutorials/2_embed.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/evolutionaryscale/esm/blob/main/cookbook/tutorials/2_embed.ipynb)
## Responsible Development <a name="responsible-development"></a>

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Input tracks of `ESMProtein`\n",
"# [Tutorial 1](https://github.com/evolutionaryscale/esm/tree/main/cookbook/tutorials): Input tracks of `ESMProtein`\n",
"\n",
"ESM3 is a frontier generative model for biology, able to jointly reason across three fundamental biological properties of proteins: sequence, structure, and function. These three data modalities are represented as tracks of discrete tokens at the input and output of ESM3. You can present the model with a combination of partial inputs across the tracks, and ESM3 will provide output predictions for all the tracks."
]

@@ -4,6 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# [Tutorial 2](https://github.com/evolutionaryscale/esm/tree/main/cookbook/tutorials): Embedding with ESM C\n",
"\n",
"In this notebook we will see how to embed a batch of sequences using ESM C, as well as explore its different layers"
]
},

@@ -6,7 +6,7 @@
"id": "zWXOAcBB8h3z"
},
"source": [
"# Design a GFP Candidate with ESM3\n",
"# [Tutorial 3](https://github.com/evolutionaryscale/esm/tree/main/cookbook/tutorials): Design a GFP Candidate with ESM3\n",
"\n",
"This notebook walks through the computational methods used to design esmGFP in [Hayes et al., 2024](https://doi.org/10.1101/2024.07.01.600583). esmGFP has similar brightness and spectral properties as GFPs found in nature despite being a far distance (58% identity) from known fluorescent proteins, but we also found many other bright new GFPs with similar or higher sequence identity. One can likely design a lot more new GFPs with the approach sketched in this notebook!\n",
"\n",

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# ESM3\n",
"# [Tutorial 4](https://github.com/evolutionaryscale/esm/tree/main/cookbook/tutorials): Generating with ESM3\n",
"\n",
"ESM3 is a frontier generative model for biology, able to jointly reason across three fundamental biological properties of proteins: sequence, structure, and function. These three data modalities are represented as tracks of discrete tokens at the input and output of ESM3. You can present the model with a combination of partial inputs across the tracks, and ESM3 will provide output predictions for all the tracks.\n",
"\n",

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Guided Generation with ESM3\n",
"# [Tutorial 5](https://github.com/evolutionaryscale/esm/tree/main/cookbook/tutorials): Guided Generation with ESM3\n",
"\n",
"Guided generation is a powerful tool that allows you to sample outputs out of ESM3 that maximize any kind of score function.\n",
"\n",
@@ -19,9 +19,7 @@
"In this notebook we will walk through a few examples to illustrate how to use guided generation. \n",
"\n",
"1. Guide towards high pTM for improved generation quality\n",
"\n",
"2. Generate a protein with no cysteine (C) residues\n",
"\n",
"3. Maximize protein globularity by minimizing the radius of gyration\n",
"\n"
]
@@ -129,7 +127,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## pTM Guided Generation\n",
"## Guide towards high pTM for improved generation quality\n",
"\n",
"Once your scoring function is defined and you have initialized your model you can create an `ESM3GuidedDecoding` instance to sample from it"
]
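
Note on the three guided-generation examples listed in this notebook: the scoring side of examples 2 and 3 can be sketched without any model at all. The block below is a minimal illustration only. The function names (`no_cysteine_score`, `radius_of_gyration`, `globularity_score`) are hypothetical and do not come from the notebook, and how a score is wired into `ESM3GuidedDecoding` is not shown in this diff, so that part is left out. It assumes the sequence is a plain string and the coordinates are an `(N, 3)` array with one point per residue.

```python
import numpy as np


def no_cysteine_score(sequence: str) -> float:
    """Fraction of residues that are not cysteine (1.0 means no 'C' at all)."""
    if not sequence:
        return 0.0
    return 1.0 - sequence.count("C") / len(sequence)


def radius_of_gyration(coords) -> float:
    """Root-mean-square distance of the points from their centroid."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered**2).sum(axis=1).mean()))


def globularity_score(coords) -> float:
    """Higher is more compact: the negated radius of gyration."""
    # A maximizing sampler needs "smaller radius" expressed as "larger score".
    return -radius_of_gyration(coords)
```

Because guided generation maximizes the score, the radius of gyration is negated rather than returned directly. A real scoring function would also have to decide what to do with masked or missing positions, which this sketch ignores.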

tools/README.md Normal file
@@ -0,0 +1 @@
Our tools have migrated to Apps under [Forge](https://forge.evolutionaryscale.ai/). Go check them out!

@@ -1,108 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "HC86rsLf-_Zt"
},
"source": [
"# Generation UI\n",
"\n",
"This is the most flexible notebook for generating protein sequences using the ESM3 model.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "ICGSD1Jo7zAb"
},
"outputs": [],
"source": [
"# @title Input API keys, then hit `Runtime` -> `Run all`\n",
"# @markdown Our hosted service that provides access to the full suite of ESM3 models.\n",
"# @markdown To utilize the Forge API, users must first agree to the [Terms of Service](https://forge.evolutionaryscale.ai/termsofservice) and generate an access token via the [Forge console](https://forge.evolutionaryscale.ai/console).\n",
"# @markdown The console also provides a comprehensive list of models available to each user.\n",
"\n",
"import os\n",
"\n",
"# @markdown ### Authentication\n",
"# @markdown Paste your token from the [Forge console](https://forge.evolutionaryscale.ai/console)\n",
"forge_token = \"\" # @param {type:\"string\"}\n",
"os.environ[\"FORGE_TOKEN\"] = forge_token\n",
"\n",
"# @markdown ### Model Selection\n",
"# @markdown Enter the model name from the [Forge console page](https://forge.evolutionaryscale.ai/console) that you would like to use:\n",
"model_name = \"esm3-medium-2024-03\" # @param {type:\"string\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "03ARpZRE_N39"
},
"outputs": [],
"source": [
"# @title Install dependencies\n",
"import os\n",
"\n",
"os.system(\"pip install git+https://github.com/evolutionaryscale/esm\")\n",
"os.system(\"pip install pydssp pygtrie dna-features-viewer py3dmol\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "x1MUAuDWBAel"
},
"outputs": [],
"source": [
"# @title Create Generation UI\n",
"# @markdown If running on Google colab, it is recommended to use the light theme and select the \"View output fullscreen\" option in the cell toolbar for the best experience\n",
"\n",
"from esm.widgets.utils.clients import get_forge_client\n",
"from esm.widgets.utils.types import ClientInitContainer\n",
"from esm.widgets.views.generation import create_generation_ui\n",
"\n",
"client_container = ClientInitContainer()\n",
"create_generation_ui(get_forge_client(model_name))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "default",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

@@ -1,183 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "4l-TA3Od1JFs"
},
"source": [
"# ESM3 Inverse Folding Notebook\n",
"\n",
"This notebook is intended to be used as a tool for inverse folding using the ESM3 model.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "1TwEAW_LSNZZ"
},
"outputs": [],
"source": [
"# @title Input API keys, then hit `Runtime` -> `Run all`\n",
"# @markdown Our hosted service that provides access to the full suite of ESM3 models.\n",
"# @markdown To utilize the Forge API, users must first agree to the [Terms of Service](https://forge.evolutionaryscale.ai/termsofservice) and generate an access token via the [Forge console](https://forge.evolutionaryscale.ai/console).\n",
"# @markdown The console also provides a comprehensive list of models available to each user.\n",
"\n",
"import os\n",
"\n",
"# @markdown ### Authentication\n",
"# @markdown Paste your token from the [Forge console](https://forge.evolutionaryscale.ai/console)\n",
"forge_token = \"\" # @param {type:\"string\"}\n",
"os.environ[\"ESM_API_KEY\"] = forge_token\n",
"\n",
"# @markdown ### Model Selection\n",
"# @markdown Enter the model name from the [Forge console page](https://forge.evolutionaryscale.ai/console) that you would like to use:\n",
"model_name = \"esm3-medium-2024-08\" # @param {type:\"string\"}\n",
"\n",
"# @markdown ### Input Structure\n",
"pdb_code = \"\" # @param {type:\"string\"}\n",
"chain = \"detect\" # @param {type:\"string\"}\n",
"# @markdown Enter PDB code or leave blank to upload file\n",
"# @markdown Specify a chain if uploading a complex\n",
"\n",
"# @markdown ### Design Parameters\n",
"temperature = 0.1 # @param {type:\"slider\", min:0.0, max:1.0, step:0.01}\n",
"num_sequences = 8 # @param {type:\"integer\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "_942E63WS8-U"
},
"outputs": [],
"source": [
"# @title Install dependencies\n",
"import os\n",
"\n",
"os.system(\"pip install git+https://github.com/evolutionaryscale/esm\")\n",
"os.system(\n",
" \"pip install pydssp pygtrie dna-features-viewer py3dmol nest-asyncio ipywidgets\"\n",
")\n",
"\n",
"import nest_asyncio # noqa: E402\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "jXl61b-zTIsp"
},
"outputs": [],
"source": [
"# @title Run Inverse Folding\n",
"import numpy as np\n",
"from esm.sdk.api import ESMProtein, ESMProteinError, GenerationConfig\n",
"from esm.widgets.utils.clients import get_forge_client\n",
"from google.colab import files\n",
"from IPython.display import HTML\n",
"\n",
"\n",
"def get_pdb(pdb_code=\"\"):\n",
" if pdb_code is None or pdb_code == \"\":\n",
" upload_dict = files.upload()\n",
" pdb_string = upload_dict[list(upload_dict.keys())[0]]\n",
" with open(\"tmp.pdb\", \"wb\") as out:\n",
" out.write(pdb_string)\n",
" return \"tmp.pdb\"\n",
" else:\n",
" os.system(f\"wget -qnc https://files.rcsb.org/view/{pdb_code}.pdb\")\n",
" return f\"{pdb_code}.pdb\"\n",
"\n",
"\n",
"print(\"Loading structure...\")\n",
"pdb_path = get_pdb(pdb_code)\n",
"\n",
"# Create protein object\n",
"protein = ESMProtein.from_pdb(pdb_path, chain_id=chain)\n",
"protein.sequence = None\n",
"\n",
"print(\"Running inverse folding...\")\n",
"client = get_forge_client(model_name)\n",
"generations = client.batch_generate(\n",
" inputs=[protein] * num_sequences,\n",
" configs=[GenerationConfig(track=\"sequence\", temperature=temperature)]\n",
" * num_sequences,\n",
")\n",
"\n",
"if isinstance(protein, ESMProteinError):\n",
" raise RuntimeError(f\"Error: {str(protein)}\")\n",
"\n",
"errors: list[tuple[int, ESMProteinError]] = []\n",
"sequences: list[str] = []\n",
"for i, protein in enumerate(generations):\n",
" if isinstance(protein, ESMProteinError):\n",
" errors.append((i, protein))\n",
" else:\n",
" sequences.append(protein.sequence)\n",
"\n",
"\n",
"def calculate_conservation_scores(sequences: list[str]) -> np.ndarray:\n",
" array = np.array([list(seq) for seq in sequences], dtype=\"S1\")\n",
" array = array.view(np.uint8) - ord(\"A\")\n",
"\n",
" # Create a 2D array of counts\n",
" max_range = 26\n",
" counts = np.zeros((max_range + 1, array.shape[1]), dtype=int)\n",
" for col in range(array.shape[1]):\n",
" count = np.bincount(array[:, col], minlength=max_range + 1)\n",
" counts[:, col] = count\n",
" counts = counts.T\n",
"\n",
" # Calculate entropy (-sum(p log p))\n",
" probabilities = counts / counts.sum(axis=1, keepdims=True)\n",
" entropy = -np.sum(probabilities * np.log(probabilities + 1e-9), axis=1)\n",
"\n",
" # Convert to conservation score (1 - normalized entropy)\n",
" max_entropy = np.log(256)\n",
" # Magic constant to make displaying non-conserved residues more apparent\n",
" conservation_scores = np.maximum(0, 0.5 - (entropy / max_entropy)) / 0.5\n",
"\n",
" return conservation_scores\n",
"\n",
"\n",
"def display_sequences(sequences: list[str]):\n",
" conservation_scores = calculate_conservation_scores(sequences)\n",
" html_output = '<pre style=\"line-height:1.0;letter-spacing:3px;font-family:monospace;margin:0;padding:0\">'\n",
" for sequence in sequences:\n",
" for j, residue in enumerate(sequence):\n",
" # Add padding for alignment and color the background\n",
" html_output += f'<span style=\"background-color: rgba(9, 121, 105,{conservation_scores[j]})\">{residue}</span>'\n",
" html_output += \"<br>\"\n",
" html_output += \"</pre>\"\n",
" display(HTML(html_output))\n",
"\n",
"\n",
"display_sequences(sequences)\n",
"\n",
"for i, error in errors:\n",
" print(f\"Error code {error.error_code} at index {i}: {error.error_msg}\")"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
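
Note on `calculate_conservation_scores` in the notebook removed above: it scores each alignment column by the entropy of its residue distribution, then rescales so that fully conserved columns get 1.0. The block below is a standalone sketch of the same idea, not the removed code. The helper names `column_entropy` and `conservation_scores` are hypothetical, and it assumes ASCII sequences of equal length. It keeps the notebook's `log(256)` normalizer and `0.5` cutoff as defaults, and computes the entropy over nonzero counts only instead of adding `1e-9` inside the logarithm.

```python
import numpy as np


def column_entropy(sequences: list[str]) -> np.ndarray:
    """Shannon entropy (in nats) of the residue distribution in each column."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    # View the sequences as an (n_sequences, length) array of byte codes.
    codes = np.frombuffer("".join(sequences).encode("ascii"), dtype=np.uint8)
    codes = codes.reshape(len(sequences), length)

    entropy = np.zeros(length)
    for col in range(length):
        counts = np.bincount(codes[:, col], minlength=256)
        p = counts[counts > 0] / len(sequences)  # skip empty bins, so no log(0)
        entropy[col] = -np.sum(p * np.log(p))
    return entropy


def conservation_scores(
    sequences: list[str], max_entropy: float = float(np.log(256)), cutoff: float = 0.5
) -> np.ndarray:
    """Rescale entropy so 1.0 is fully conserved and 0.0 is at or past the cutoff."""
    normalized = column_entropy(sequences) / max_entropy
    return np.maximum(0.0, cutoff - normalized) / cutoff
```

For example, `conservation_scores(["AAA", "AAC"])` gives 1.0 for the first two columns and 0.75 for the last one, since an even two-way split has entropy `log(2)`, which is one eighth of `log(256)`.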

@@ -1,148 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "wO0XaARp1Ghc"
},
"source": [
"# ESM3 Prediction Notebook\n",
"\n",
"This notebook is intended to be used as a tool for quick and easy protein property prediction using the ESM3 model.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "0zITyTcwKK2o"
},
"outputs": [],
"source": [
"# @title Input API keys, then hit `Runtime` -> `Run all`\n",
"# @markdown Our hosted service that provides access to the full suite of ESM3 models.\n",
"# @markdown To utilize the Forge API, users must first agree to the [Terms of Service](https://forge.evolutionaryscale.ai/termsofservice) and generate an access token via the [Forge console](https://forge.evolutionaryscale.ai/console).\n",
"# @markdown The console also provides a comprehensive list of models available to each user.\n",
"\n",
"import os\n",
"\n",
"# @markdown ### Authentication\n",
"# @markdown Paste your token from the [Forge console](https://forge.evolutionaryscale.ai/console)\n",
"forge_token = \"\" # @param {type:\"string\"}\n",
"os.environ[\"ESM_API_KEY\"] = forge_token\n",
"\n",
"# @markdown ### Model Selection\n",
"# @markdown Enter the model name from the [Forge console page](https://forge.evolutionaryscale.ai/console) that you would like to use:\n",
"model_name = \"esm3-medium-2024-08\" # @param {type:\"string\"}\n",
"\n",
"# @markdown ### Sequence\n",
"# @markdown Please use '|' to delimit a multimer sequence.\n",
"sequence = \"MSHHWGYGKHNGPEHWHKDFPIAKGERQSPVDIDTHTAKYDPSLKPLSVSYDQATSLRILNNGHAFNVEFDDSQDKAVLKGGPLDGTYRLIQFHFHWGSLDGQGSEHTVDKKKYAAELHLVHWNTKYGDFGKAVQQPDGLAVLGIFLKVGSAKPGLQKVVDVLDSIKTKGKSADFTNFDPRGLLPESLDYWTYPGSLTTPPLLECVTWIVLKEPISVSSEQVLKFRKLNFNGEGEPEELMVDNWRPAQPLKNRQIKASFK\" # @param {type:\"string\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "CryS18DaKgjP"
},
"outputs": [],
"source": [
"# @title Install dependencies\n",
"import os\n",
"\n",
"os.system(\"pip install git+https://github.com/evolutionaryscale/esm\")\n",
"os.system(\"pip install pydssp pygtrie dna-features-viewer py3dmol ipywidgets\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "ej6cllESKj5S"
},
"outputs": [],
"source": [
"# @title Run Prediction and Display Results\n",
"from esm.sdk.api import ESMProtein, ESMProteinError, GenerationConfig\n",
"from esm.widgets.components.results_visualizer import create_results_visualizer\n",
"from esm.widgets.utils.clients import get_forge_client\n",
"from ipywidgets import widgets\n",
"\n",
"# Initialize client\n",
"client = get_forge_client(model_name)\n",
"\n",
"# Create protein object\n",
"protein = ESMProtein(sequence=sequence)\n",
"\n",
"# Predict all tracks\n",
"tracks = [\"structure\", \"secondary_structure\", \"sasa\", \"function\"]\n",
"\n",
"output = widgets.Output()\n",
"display(output)\n",
"with output:\n",
" print(\"Starting predictions...\")\n",
"\n",
" for track in tracks:\n",
" print(f\"Predicting {track}...\")\n",
" protein = client.generate(\n",
" protein, config=GenerationConfig(track=track, temperature=0.01)\n",
" )\n",
" if isinstance(protein, ESMProteinError):\n",
" raise RuntimeError(f\"Error: {str(protein)}\")\n",
"\n",
" # Create result visualizers\n",
" structure_results = create_results_visualizer(\n",
" modality=\"structure\", samples=[protein], items_per_page=1, include_title=False\n",
" )\n",
"\n",
" secondary_structure_results = create_results_visualizer(\n",
" modality=\"secondary_structure\",\n",
" samples=[protein],\n",
" items_per_page=1,\n",
" include_title=False,\n",
" )\n",
"\n",
" sasa_results = create_results_visualizer(\n",
" modality=\"sasa\", samples=[protein], items_per_page=1, include_title=False\n",
" )\n",
"\n",
" function_results = create_results_visualizer(\n",
" modality=\"function\", samples=[protein], items_per_page=1, include_title=False\n",
" )\n",
"\n",
" output.clear_output(wait=True)\n",
"\n",
" # Create tabbed interface\n",
" results_ui = widgets.Tab(\n",
" children=[\n",
" structure_results,\n",
" secondary_structure_results,\n",
" sasa_results,\n",
" function_results,\n",
" ]\n",
" )\n",
" results_ui.set_title(0, \"Structure\")\n",
" results_ui.set_title(1, \"Secondary Structure\")\n",
" results_ui.set_title(2, \"SASA\")\n",
" results_ui.set_title(3, \"Function\")\n",
" display(results_ui)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}