
Commit 62af4cc

Edited README and replaced ReactomeContentService4R package with direct API call in submodule 3

1 parent 56676a0

7 files changed

Lines changed: 444 additions & 189 deletions

Azure/README.md

Lines changed: 40 additions & 23 deletions
@@ -16,46 +16,68 @@ Follow the steps highlighted [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/

The Machine Learning Studio Jupyter Notebooks already have an R kernel available which we will use and install our packages to.

- Follow the steps highlighted in part two (2. Spin up Instance from a Container) of [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateAzureMLNotebooks.md) to create a new notebook instance in Machine Learning Studio. Follow steps 1-8, in step 5 select region us-east4 (Northern Virginia) and be especially careful to use custom container `us-east4-docker.pkg.dev/nih-cl-shared-resources/nigms-sandbox/nigms-vertex-r` in step 6 under the Docker container image prompt. In step 7 under the Machine type tab, select n1-standard-4 from the dropdown box. In step 8, be careful to **Enable Idle Shutdown**. After creating the notebook, you can click on **OPEN JUPYTERLAB**.
+ Follow the steps highlighted [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToCreateAzureMLNotebooks.md) to create an Azure Machine Learning workspace and a new notebook instance in that workspace. Follow steps 1-6 when creating your instance. Idle shutdown should automatically be enabled after 1 hour. You have two options for interacting with these tutorials:
+
+ Option 1 uses the 'Notebooks' tab in the left-hand menu to interact with notebooks through the Azure UI. If you are using option 1, after navigating to **Notebooks**, start your instance and then run your notebooks.
+
+ Option 2 lets you interact with the notebooks in a JupyterLab setting. If you are using option 2, after creating the instance, navigate to 'Compute' and click **Start** on your instance, then under 'Applications' click **JUPYTERLAB**.

### Downloading and Running Tutorial Files

- Now that you have successfully created your virtual machine, you will be directed to Jupyterlab screen. The next step is to import the notebooks and start the course.
- This can be done by selecting the __Git__ from the top menu in Jupyterlab, and choosing the __Clone a Repository__
+ Now that you have successfully created your virtual machine, you can use either the Azure Notebook UI or the JupyterLab interface to interact with the tutorials. The next step is to import the notebooks and start the course.
+
+ This can be done in one of two ways:
+
+ **Azure Notebook UI Setting**
+ Open the terminal, type in `git clone https://github.com/NIGMS/Consensus-Pathway-Analysis-in-the-Cloud.git`, then hit enter.
+
+ ![](./images/Intro/Azure_clone_repo.png)
+
+ **JupyterLab Setting**
+ Select __Git__ from the top menu in JupyterLab and choose the __Clone a Repository__
option. Next, you can copy and paste in the link of the repository: `https://github.com/NIGMS/Consensus-Pathway-Analysis-in-the-Cloud.git` and click __Clone__.

![](./images/Intro/clone.png)

- This will download the repository to your JupyterLab folder. All tutorial files for the five submodules are in Jupyter format with a .ipynb extension. Double-click each file to view the lab content and run the code. This will open the Jupyter file in a Jupyter notebook. From here, you can run each section, or "cell", of the code, one by one, by pressing the "Play" button in the menu above.
+ This will download the repository to your JupyterLab folder. All tutorial files for the five submodules are in Jupyter format with a .ipynb extension. Double-click each file to view its content, and select your kernel in the top left corner to run the notebook in a specific conda environment (e.g., R, Python, etc.). From here, you can run each section, or "cell", of the code one by one by pressing the "Play" button next to the cell in the Azure Notebook UI, or in the menu above in the JupyterLab setting.
+
+ **Azure Notebook UI Setting**
+
+ ![](./images/SettingGC/Azure_run_cell.png)
+
+ **JupyterLab Setting**

![](./images/SettingGC/Run_Cell.png)

- Some 'cells' of code take longer for the computer to process than others. You will know a cell is running when a cell has an asterisk next to it \[\*\]. When the cell finishes running, that asterisk will be replaced with a number which represents the order that cell was run in. You can now explore the tutorials by running the code in each, from top to bottom. Look at the 'workflows' section below for a short description of each tutorial.
+ Some 'cells' of code take longer for the computer to process than others. You will know a cell is running when it has an asterisk next to it \[\*\]. When the cell finishes running, that asterisk will be replaced with a number representing the order in which the cell was run. You can now explore the tutorials by running the code in each, from top to bottom. Look at the 'workflows' section below for a short description of each tutorial.
+
+ Jupyter is a powerful tool with many useful features. For more information on how to use Jupyter, we recommend searching for Jupyter tutorials and literature online.

- Jupyter is a powerful tool, with many useful features. For more information on how to use Jupyter, we recommend
- searching for Jupyter tutorials and literature online.
+ ### Stopping Your Instance/Virtual Machine

- ### Stopping Your Virtual Machine
+ When you are finished running code, you should turn off your notebook to prevent unnecessary billing or resource use. Stop the instance attached to your notebook by navigating to **Compute** in the side menu, selecting your instance, and clicking the __STOP__ button.

- When you are finished running code, you should turn off your notebook to prevent unnecessary billing or resource use by checking your notebook and pushing the __STOP__ button.
+ ![](./images/Intro/stop_instance.png)

## Creating Azure Blob Storage
In this section, we will describe the steps to create Azure Blob Storage to store data generated during analysis. The storage can be created via GUI or using the command line.

Azure Storage is comprised of three parts: accounts, containers, and blobs, as seen in the image below. Accounts contain containers, which act as folders. Containers hold files, aka blobs, and folders or subfolders.

- ![](./images/Module1/azure_blob_diagram.png)
+ <img src="./images/Module1/azure_blob_diagram.png" alt="Azure Blob Storage diagram" width="300" height="200">

When you create the Azure Machine Learning service, it will automatically create a storage account under the name of the workspace you created before. This allows you to either use the previously made storage account or create a new one. Instructions for both options are outlined below.

**Option 1: Using a Premade Storage Account**
1. On the webpage of your Azure account, find and select `Azure Storage Accounts`.
2. Find and click the storage account that is labeled the same as your Machine Learning service.
![](./images/Module1/Data_CloudStorageAccount.png)
3. Navigate to `Containers` on the left side menu.
4. The console will list all the containers and files created, as shown in the figure below. Some of these containers and files are automatically created when we create our Machine Learning Service:
![](./images/Module1/Data_CloudStorageContainer.png)
5. Click `+ Container` to create and name a new container.
- ![](./images/Module1/Data_CloudContainerName.png)
+ <img src="./images/Module1/Data_CloudContainerName.png" alt="Naming a new container" width="300" height="200">
+
6. Click on the new container, then click `Upload` to upload files.
![](./images/Module1/Data_CloudBlobUpload.png)

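The README notes that the storage can also be created from the command line. A minimal sketch using the Azure CLI, not taken from the commit itself: `<storage-account>` and the file names are placeholders for your own values, and `az login` must have been run first.

```sh
# Create a container in the storage account that was auto-created
# with your Machine Learning workspace (name is a placeholder).
az storage container create \
    --account-name <storage-account> \
    --name tutorial-data \
    --auth-mode login

# Upload a processed data file as a blob into that container.
az storage blob upload \
    --account-name <storage-account> \
    --container-name tutorial-data \
    --file processed_data.csv \
    --name processed_data.csv \
    --auth-mode login
```

`--auth-mode login` authenticates with your signed-in Azure credentials; alternatively, an account key can be supplied with `--account-key`.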
@@ -91,15 +113,10 @@ You can learn more about `az storage` commands by reading the article [here](htt
91113
## Azure Architecture
92114

93115
# ![](./images/Intro/architecture.png)
94-
The figure above shows the architecture of the learning module with Azure infrastructure. First, we will create
95-
an Machine Learning Studio Jupyter Notebook with R kernel. The code and instructions for each submodule are presented in a separate Jupyter Notebook.
96-
User can either upload the Notebooks to the Machine Learning Studio Notebooks or clone from the project repository. Then, users can execute
97-
the code directly in the Notebook. In our learning course, the submodule 01 will download data from the public repository (e.g., GEO database)
98-
for preprocessing and save the processed data to a local file in Machine Learning Studio and to the user's Azure Storage Container. The output
99-
of the submodule 01 will be used as inputs for all other submodules. The outputs of the submodules 02, 03, and 04 will be saved to
100-
local repository in Machine Learning Studio Notebooks and the code to copy them to the user's cloud storage is also included.
116+
The figure above shows the architecture of the learning module with Azure infrastructure. First, we will create a Machine Learning Studio Jupyter Notebook with the premade R kernel. The code and instructions for each submodule are presented in a separate Jupyter Notebook.
117+
The user can either upload the Notebooks to the Machine Learning Studio Notebooks or clone from the project repository. Then, users can execute
118+
the code directly in the Notebook. In our learning course, the submodule 01 will download data from the public repository (e.g., the GEO database)
119+
for preprocessing and then save the processed data to a local file in the notebook and to the user's Azure Storage Container. The output
120+
of the submodule 01 will be used as the input for all other submodules. The outputs of the submodules 02, 03, and 04 will be saved to the
121+
local repository in Machine Learning Studio Notebooks, and the code to copy them to the user's cloud storage is also included.
101122
<!-- #endregion -->
102-
103-
```python
104-
105-
```

Azure/Submodule01-ProcessingExpressionData.ipynb

Lines changed: 18 additions & 17 deletions
@@ -108,32 +108,40 @@
"To run this tutorial, let's first download all of the packages needed. For Jupyter Notebooks in Azure Machine Learning Studio, we must specify our library as `/home/azureuser` otherwise we can run into errors."
]
},
+ {
+ "cell_type": "markdown",
+ "id": "9d4c35b6-56d6-4b08-b612-59cf31244818",
+ "metadata": {},
+ "source": [
+ "Add our library to our path to easily load our packages."
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
- "id": "0aca1008-428a-4303-9c88-07e54af1dced",
+ "id": "7243bc28-43ca-439a-86bc-cd0bc4df3d4a",
"metadata": {},
"outputs": [],
"source": [
- "install.packages(\"BiocManager\", lib = \"/home/azureuser\")"
+ ".libPaths(\"/home/azureuser\")"
]
},
{
"cell_type": "markdown",
- "id": "9d4c35b6-56d6-4b08-b612-59cf31244818",
+ "id": "ebb8fa4c-7d3d-4018-98e4-f1085f3d6d07",
"metadata": {},
"source": [
- "Add our libaray to our path to easily load our packages."
+ "Install BiocManager."
]
},
{
"cell_type": "code",
"execution_count": null,
- "id": "7243bc28-43ca-439a-86bc-cd0bc4df3d4a",
+ "id": "0aca1008-428a-4303-9c88-07e54af1dced",
"metadata": {},
"outputs": [],
"source": [
- ".libPaths(\"/home/azureuser\")"
+ "install.packages(\"BiocManager\", lib = \"/home/azureuser\")"
]
},
{
@@ -161,7 +169,7 @@
"metadata": {},
"outputs": [],
"source": [
- "BiocManager::install(c('GEOquery', 'hgu133plus2.db', 'limma', 'edger', 'topgo', 'go.db', 'keggrest', 'reactomecontentservice4r', 'fgsea', 'deseq2', 'safe', 'annotationdbi', 'keggdzpathwaysgeo'))"
+ "BiocManager::install(c('GEOquery', 'hgu133plus2.db', 'org.Hs.eg.db'))"
]
},
{
@@ -474,7 +482,7 @@
"Alternately, users can also upload their data to Azure Storage. The data may be lost after users delete the Machine Learning Studio Instance, so storing it in Azure Storage allows users to use the data anytime they like. \n",
"Azure Storage is comprised of three parts: accounts, containers, and blobs, as seen in the image below. Accounts contain containers which act as folders. Containers hold files, aka blobs and folders or subfolders.\n",
"\n",
- "![](./images/Module1/azure_blob_diagram.png)\n",
+ "<img src=\"./images/Module1/azure_blob_diagram.png\" alt=\"Azure Blob Storage diagram\" width=\"300\" height=\"200\">\n",
"\n",
"When you create the Azure Machine Learning service, it will automatically create a storage account under the name of the workspace you created before. This allows you to either use the previously made storage account or create a new one. Instructions for both options are outlined below.\n",
"**Option 1: Premade Storage Accounts**\n",
@@ -485,7 +493,8 @@
"5. The console will list all the containers and files created as shown in the figure below. Some of these containers and files are automatically created when we create our Machine Learning Service:\n",
"![](./images/Module1/Data_CloudStorageContainer.png)\n",
"6. Click `+ Container` to create and name a new container.\n",
- "![](./images/Module1/Data_CloudContainerName.png)\n",
+ "<img src=\"./images/Module1/Data_CloudContainerName.png\" alt=\"Naming a new container\" width=\"300\" height=\"200\">\n",
+ "\n",
"7. Click on the new container, then click `Upload` to upload files.\n",
"![](./images/Module1/Data_CloudBlobUpload.png)\n",
"\n",
@@ -867,14 +876,6 @@
"metadata": {},
"outputs": [],
"source": [
- "# Install the genome wide annotation database for human\n",
- "suppressMessages({\n",
- "  suppressWarnings({\n",
- "    if (!require(\"BiocManager\", quietly = TRUE))\n",
- "      install.packages(\"BiocManager\")\n",
- "    BiocManager::install(\"org.Hs.eg.db\")\n",
- "  })\n",
- "})\n",
"# Import the annotation database\n",
"library(org.Hs.eg.db)\n",
"\n",

Azure/Submodule02-DifferentialAnalysis.ipynb

Lines changed: 0 additions & 8 deletions
@@ -1993,14 +1993,6 @@
"source": [
"sessionInfo()"
]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "f8f68f05-db10-417f-b513-8156737482ab",
- "metadata": {},
- "outputs": [],
- "source": []
}
],
"metadata": {
