---
title: "Machine Learning"
author: "CS542 Peter Chin"
date: "3/7/2019"
output: html_document
---
<style type="text/css">

body{ /* Normal  */
  font-size: 12px;
}
td {  /* Table  */
  font-size: 8px;
}
code.r{ /* Code block */
    font-size: 10px;
    background-color: #EAEAEA; 
}
pre { /* Code block - determines code spacing between lines */
    font-size: 12px;
     background-color:#EAEAEA;
}
</style>

<br><br>

**Software to connect to the remote server:**<br>

* *Windows:*  MobaXterm: ( [https://mobaxterm.mobatek.net/](https://mobaxterm.mobatek.net/) )<br>
* *Apple OS X:*  Terminal
* *Linux:*  Terminal


**Software to enable graphics (X-Forwarding):**<br>

* *Windows:*  MobaXterm: ( [https://mobaxterm.mobatek.net/](https://mobaxterm.mobatek.net/) )<br>
* *Apple OS X:*  xQuartz ( [https://www.xquartz.org/](https://www.xquartz.org/) )<br>
*Note:* After xQuartz is installed, please log out of your computer and log back in.
<br><br>

**SFTP Clients for File Transfer:**<br>

* *Windows:* MobaXterm, FileZilla 
* *Apple OS X:* Cyberduck, Fetch
<br><br>
More information about ssh and sftp clients for connecting to the SCC can be found on the [RCS Website](http://www.bu.edu/tech/support/research/system-usage/getting-started/get-started-file-transfer/).

## Connecting to the SCC

Use the `scc1` or `scc2` login node to connect to the *Shared Computing Cluster*: <br>`ssh username@scc2.bu.edu`


An example for user *ktrn*: 
```{bash eval=FALSE}
[local prompt] > ssh    ktrn@scc2.bu.edu  # Windows
[local prompt] > ssh -Y ktrn@scc2.bu.edu  # Mac
[local prompt] > ssh -X ktrn@scc2.bu.edu  # Linux
```

<br><br>
Check if graphics works (X-forwarding is enabled):
```{bash eval=FALSE}
[scc2 ~] > xclock &
```
You should see a pop-up window with a clock in it:<br><br>
```{r, out.width="250px", echo=FALSE}
knitr::include_graphics("images/xclock.png")
```

<br>
If you do not see the clock:<br>
*Windows:* make sure the Xserver icon in the top-left corner of MobaXterm is green.<br>
*Apple OS X:* make sure you logged out of your computer after installing xQuartz, that xQuartz is running (you should see the xQuartz icon at the bottom of your screen), and that you use `-Y` with your ssh command when you log in to the SCC.
<br><br>

## File Transfer

There are a number of ways you can transfer files to the SCC from a local computer and back. See directions on our website:<br>
[http://www.bu.edu/tech/support/research/system-usage/getting-started/get-started-file-transfer/](http://www.bu.edu/tech/support/research/system-usage/getting-started/get-started-file-transfer/)
<br><br>
An example of the scp command (run from a local terminal window) that transfers a file to your home directory on the SCC:
```{bash eval=FALSE}
[local prompt] date > date.txt
[local prompt] scp date.txt username@scc2.bu.edu:.
```
<br>


To download a file from a website, use the wget command, e.g.:
```{bash eval=FALSE}
[scc2 ~] wget http://rcs.bu.edu/classes/DeepLearning/images/scc.jpg
```
<br>
To view the downloaded image:
```{bash eval=FALSE}
[scc2 ~] display scc.jpg
```
<br>
```{r, out.width="250px", echo=FALSE}
knitr::include_graphics("images/scc.jpg")
```
<br><br>

## Home Directories
On the SCC, each user has a 10 GB home directory, which is backed up nightly and protected by Snapshots. Additional quota is not available for home directories. To check the home directory quota, use the `quota -s` command:
```{bash eval=FALSE}
[scc2 ~] quota -s
```
```{bash eval=TRUE, echo=FALSE, comment=""}
quota -s ktrn
```
Home directories are private to the user. The permissions are set so that only the owner can view, modify or execute the files in the home directory. Home directories should NOT be used for production work. Once the quota is reached, your jobs will fail and many programs will not run (or even start).
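
Since jobs start failing once the quota is reached, it can be useful to check how much space a directory takes up before writing large files into it. The sketch below is one possible way to do this from Python. The function name `directory_usage_bytes` and the constant `HOME_QUOTA_BYTES` are made up for this illustration, and counting 1 GB as 1024³ bytes is an assumption; the `quota -s` command remains the authoritative report.

```python
import os

# Assumed value: the 10 GB home directory quota, counting 1 GB as 1024**3 bytes.
HOME_QUOTA_BYTES = 10 * 1024**3


def directory_usage_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so that link targets are not counted.
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                # The file disappeared or is unreadable; ignore it.
                pass
    return total
```

For example, `directory_usage_bytes(os.path.expanduser("~")) / HOME_QUOTA_BYTES` gives a rough fraction of the home directory quota in use.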
<br><br>

## Project Directories

You can view which projects you belong to by executing the *groups* command. You might belong to one or more SCC projects. The first project on your list is your default project.


```{bash eval=FALSE}
[scc2 ~] groups
cs542sp
```

When working on the assignments for this class, please use the *cs542sp* project.

Each SCC project has its own *project space*. Use the *pquota* command to see the directories associated with the project and their sizes:
```{bash, comment=""}
pquota cs542sp
```

We will be using the *projectnb* partition for the class:

```{bash, comment="", eval=FALSE}
cd /projectnb/cs542sp    # change directory to the project directory
ls -l                  # check the content of the directory
```

Each member of the cs542 class needs to create their own subdirectory and work inside it:
```{bash, comment="", eval=FALSE}
mkdir koleinik     # create subdirectory koleinik
ls -l              # check what folders are there

cd koleinik        # change current directory to be the one you just created
pwd                # view the current directory path
```
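
The same step can also be done from inside a Python script, which is convenient when a program should create its own working directory before writing output. This is only a sketch: `make_workdir` is a hypothetical helper, not an SCC tool, and it assumes the project directory already exists.

```python
from pathlib import Path


def make_workdir(project_root, username):
    """Create <project_root>/<username> if needed and return its path."""
    workdir = Path(project_root) / username
    # exist_ok=True makes the call safe to repeat. Parent directories are
    # not created, so the project directory itself must already exist.
    workdir.mkdir(exist_ok=True)
    return workdir
```

On the SCC this would be called as `make_workdir("/projectnb/cs542sp", "koleinik")`, with your own user name in place of `koleinik`.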
<br><br>

## SCC Text editors
The SCC has all the standard Linux editors such as *emacs*, *vi* (*vim*, *gvim*) and *nano*. There is also a Notepad-like editor, *gedit*.
<br><br>

## Software on the SCC
The module package is available on the Shared Computing Cluster, allowing users to access non-standard tools or alternate versions of standard packages. This is also an alternative way to configure your environment as required by certain packages. You can read more about *module* command usage on our webpage:<br> 
[http://www.bu.edu/tech/support/research/software-and-programming/software-and-applications/modules/](http://www.bu.edu/tech/support/research/software-and-programming/software-and-applications/modules/)

To view all available modules:
```{bash eval=FALSE}
module avail
```

To list all available versions of a particular package:
```{bash comment="",  eval=FALSE}
module avail python
```
<br><br>
```{bash comment="",  eval=FALSE}
module avail anaconda
```
<br><br>

## Work with Python

To select a particular version of python:

<pre>
module load python/3.6.2
</pre>

To view which modules are loaded:
```{bash comment="",  eval=FALSE}
module list

```

To start Jupyter, please close any Firefox browser windows you have open locally and then type:
```{bash eval=FALSE}
jupyter notebook

```
To run Jupyter notebook from your local browser, please see the section on running Jupyter Notebook in a local browser near the bottom of this page.

The SCC has spyder installed as well, but it can be slow when run on the SCC.
Another alternative is to write your program in an editor (or a simple IDE like geany) and execute your code at the prompt:

<pre>
# File: hello.py
a = 2+3
print("Hello CS542!\n","a=",a)
</pre>

```{bash eval=TRUE}
module load python/3.6.2
python hello.py
```
<br><br>

## Installing Python packages
We have many popular packages installed on the system. If you need an additional package, you can ask us to install it system-wide or you can try to install it into your user environment. You can find information about installing Python packages on the SCC on our website:<br>
[http://www.bu.edu/tech/support/research/software-and-programming/common-languages/python/install-packages/](http://www.bu.edu/tech/support/research/software-and-programming/common-languages/python/install-packages/)
<br><br>

## SCC Batch System
<br>

### Running an interactive job
Login nodes are most suitable for code development and debugging. Non-interactive batch processing is meant for jobs that take more than a few minutes and require no interaction between the user and the application software. Some longer jobs, however, do require user interaction or need additional resources such as GPUs (SCC login nodes do not have GPUs).
For these, use the *qrsh* command with appropriate options to start an interactive job.

An example workflow for the interactive job is below. Here we request 1 CPU and 1 GPU with compute capability of at least 3.5 (which is suitable for tensorflow jobs):
```{bash eval=FALSE}
qrsh -P cs542sp -l gpus=1 -l gpu_c=3.5 -now n


cd /projectnb/cs542sp/koleinik    # change working directory
module load python/3.6.2          # load the required modules (python and possibly tensorflow)
module load tensorflow/r1.10
jupyter notebook                  # start jupyter in a firefox browser, or start another Python environment
```

Once you have finished your interactive job, execute *exit* or *logout* to free up the resources.
<br><br>

### Running a batch job
In any text editor, create a script file that describes your job, then submit it to the batch system with the `qsub` command followed by the script's file name. A simple example of a script that runs a python job may look like this:

<pre>
#!/bin/bash -l

#Specify project
#$ -P cs542sp

#Request appropriate time (default 12 hours; gpu jobs time limit - 2 days (48 hours), cpu jobs - 30 days (720 hours) )
#$ -l h_rt=12:00:00

#Send an email when the job is done or aborted (by default no email is sent)
#$ -m ea

# Give job a name
#$ -N hello

# Join output and error streams into one file
#$ -j y


#load appropriate environment
module load python/3.6.2


#execute the program
python hello.py
</pre>
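
When several jobs differ only in their name or command, the script text can be generated rather than edited by hand. The following is a minimal sketch, not an SCC utility: `make_batch_script` and its parameters are hypothetical names chosen for this example, and it only emits directives that already appear in the script above.

```python
def make_batch_script(project, job_name, command, modules=(), hours=12):
    """Return the text of a batch script using the directives shown above."""
    lines = [
        "#!/bin/bash -l",
        "",
        "# Specify project",
        "#$ -P {}".format(project),
        "",
        "# Request run time",
        "#$ -l h_rt={:d}:00:00".format(hours),
        "",
        "# Give job a name",
        "#$ -N {}".format(job_name),
        "",
        "# Join output and error streams into one file",
        "#$ -j y",
        "",
    ]
    # One "module load" line per requested module.
    lines += ["module load {}".format(m) for m in modules]
    # The command to run, followed by a trailing newline.
    lines += ["", command, ""]
    return "\n".join(lines)


print(make_batch_script("cs542sp", "hello", "python hello.py",
                        modules=["python/3.6.2"]))
```

The returned string can be written to a file and submitted in the same way as a hand-written script.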
<br><br>

## Tensorflow
Several versions of tensorflow are available on the SCC. All recent versions come in *cpu* and *gpu* variants. If you plan to run tensorflow on GPUs, load the gpu-enabled version; for jobs that do not require GPUs, load the cpu version. You can work with either the python2 or python3 versions.

```{bash comment=""}
module avail tensorflow
```

Depending on the version, tensorflow requires a number of dependencies to be loaded first. 
For example, if you plan to work with python3, your *module load* commands will look like:
<pre>
module load python/3.6.2
module load tensorflow/r1.10
</pre>
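
After loading the modules, a quick check from Python can confirm whether tensorflow actually sees a GPU on the node you are running on. This is a hedged sketch: `gpu_visible` is a hypothetical helper, and which of the two tensorflow calls is available depends on the installed release (older 1.x releases are assumed to offer only `tf.test.is_gpu_available`). On a login node, which has no GPUs, the expected answer is `False`.

```python
def gpu_visible():
    """Return True or False depending on whether tensorflow sees a GPU.

    Returns None when tensorflow cannot be imported at all.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Newer releases expose tf.config.list_physical_devices; fall back to
    # tf.test.is_gpu_available when that function is missing.
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:
        return len(list_devices("GPU")) > 0
    return bool(tf.test.is_gpu_available())


print("GPU visible to tensorflow:", gpu_visible())
```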


To submit a job that requires a node with a GPU, your submission script might look like:
<pre>
#!/bin/bash -l

#Specify project
#$ -P cs542sp

#Request appropriate time (default 12 hours; gpu jobs time limit - 2 days (48 hours), cpu jobs - 30 days (720 hours) )
#$ -l h_rt=12:00:00

#Request 1 GPU with compute capability of at least 3.5
#$ -l gpus=1
#$ -l gpu_c=3.5

#Send an email when the job is done or aborted (by default no email is sent)
#$ -m ea

# Give job a name
#$ -N hello

# Join output and error streams into one file
#$ -j y


#load appropriate environment
module load python/3.6.2
module load tensorflow/r1.10

#execute the program
python your_code.py
</pre>
<br><br>

## Track SCC job
```{bash comment=""}
qstat -u username
```

You can read more about checking the status of submitted jobs on our website:<br>
[http://www.bu.edu/tech/support/research/system-usage/running-jobs/tracking-jobs/](http://www.bu.edu/tech/support/research/system-usage/running-jobs/tracking-jobs/)
<br><br>


## Set up Jupyter Notebook to run in a local browser
#### Generate the Jupyter configuration file
```{bash eval=FALSE}
module load python/3.6.2
jupyter notebook --generate-config
```
<br><br>

#### Use Python to create an encrypted password hash and exit Python:
```{bash eval=FALSE}
scc1 % python

Python 3.6.2 (default, May 31 2018, 11:36:25)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from notebook.auth import passwd; passwd()
Enter password: 
Verify password: 
'sha1:5a40d332c1a5:b62c59587e5cd5a2ab8791492cfcbcb7811ca34a'
>>> exit()
```
<br><br>

#### Modify your Jupyter configuration file
Open the Jupyter configuration file ~/.jupyter/jupyter_notebook_config.py in your preferred text editor and edit lines 162 and 217. You will need to first uncomment the lines by removing the preceding “#” symbol and then modify them as follows:

```{bash eval=FALSE}
## Line 162:
## The IP address the notebook server will listen on.
c.NotebookApp.ip = '*'
 
## Line 217: 
#  The string should be of the form type:salt:hashed-password.
c.NotebookApp.password = 'sha1:5a40d332c1a5:b62c59587e5cd5a2ab8791492cfcbcb7811ca34a'
```
<br><br>
Save the configuration file and exit from the editor.
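
The password string has three colon-separated fields: the hash type, the salt and the hashed password. The sketch below splits such a string and checks a candidate password against it. It assumes the digest is computed over the password followed by the salt, which is how `sha1` hashes like the one above are understood to be built; treat that as an assumption and keep using `notebook.auth.passwd` to generate real hashes. Both function names are hypothetical.

```python
import hashlib


def split_password_hash(hashed):
    """Split 'type:salt:hashed-password' into its three fields."""
    algorithm, salt, digest = hashed.split(":", 2)
    return algorithm, salt, digest


def matches(hashed, password):
    """Check a password against a hash string (assumed scheme, see text)."""
    algorithm, salt, digest = split_password_hash(hashed)
    h = hashlib.new(algorithm)
    # Assumption: the digest covers the UTF-8 password followed by the salt.
    h.update(password.encode("utf-8") + salt.encode("ascii"))
    return h.hexdigest() == digest
```

A quick sanity check is to call `split_password_hash` on the value you pasted into the configuration file and confirm that it yields exactly three non-empty fields.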

#### Start interactive job
```{bash eval=FALSE}
qrsh -P cs542sp -l gpus=1 -l gpu_c=3.5
```
Please note the login node name from where you submitted your job and the compute node name that your job will be assigned to.


#### Load modules and start jupyter notebook
```{bash eval=FALSE}

#Navigate to your working directory
cd /projectnb/cs542sp/username

#Load modules
module load python/3.6.2
module load tensorflow/r1.10

#Launch Jupyter notebook
jupyter notebook --no-browser
```

Please note the port that was assigned to you:<br>
<pre>
[I 13:10:11.680 NotebookApp] The port 8888 is already in use, trying another port.
[I 13:10:11.681 NotebookApp] The port 8889 is already in use, trying another port.
[I 13:10:11.681 NotebookApp] The port 8890 is already in use, trying another port.
[I 13:10:11.681 NotebookApp] The port 8891 is already in use, trying another port.
[I 13:10:11.810 NotebookApp] Serving notebooks from local directory: /project/scv/classes/CS542sp
[I 13:10:11.811 NotebookApp] 0 active kernels
[I 13:10:11.811 NotebookApp] The Jupyter Notebook is running at:
[I 13:10:11.811 NotebookApp] http://[all ip addresses on your system]:<b>8892</b>/
[I 13:10:11.811 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
</pre>
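
If you capture the startup messages in a file, the assigned port can also be pulled out programmatically. The following sketch assumes the URL line has the same shape as in the log above; `find_notebook_port` is a hypothetical helper name, and the sample text is copied from that log.

```python
import re


def find_notebook_port(log_text):
    """Return the port from the server URL in Jupyter's startup log, or None."""
    # Match a URL ending in ":<port>/"; the host part may contain
    # brackets and spaces, as in the log above.
    match = re.search(r"https?://.*:(\d+)/", log_text)
    return int(match.group(1)) if match else None


sample_log = """\
[I 13:10:11.681 NotebookApp] The port 8891 is already in use, trying another port.
[I 13:10:11.811 NotebookApp] The Jupyter Notebook is running at:
[I 13:10:11.811 NotebookApp] http://[all ip addresses on your system]:8892/
"""
print(find_notebook_port(sample_log))  # prints 8892
```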

#### Set up tunneling

Open a new local terminal (where you have not logged in to the SCC) and execute
```{bash eval=FALSE}
my-PC % ssh koleinik@scc2.bu.edu -L 7777:scc-x06:8892
```
<br><br>
**Note:** There are 5 values that you need to adjust in the above example:<br>

* your user name
* login node name (if you used scc1.bu.edu to login to the scc, you need to use it instead of scc2.bu.edu)
* your compute node name (you can find which compute node you were assigned by looking at the prompt where you started your Jupyter notebook or by running the `qstat -u username` command)
* port number: instead of 8892 above, use the port number that was assigned to you when you started jupyter notebook
* local port: you can use 7777 (as in the example above) or some other port that is available on your local machine
<br>
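
To avoid mistakes when substituting these values, the tunneling command can be assembled from them. This is an illustrative sketch only; `tunnel_command` and its parameter names are made up for this example, and the values in the call are the ones from the example above.

```python
def tunnel_command(username, login_node, compute_node, remote_port, local_port=7777):
    """Build the ssh port-forwarding command from the values listed above."""
    return "ssh {user}@{login} -L {lport}:{node}:{rport}".format(
        user=username,
        login=login_node,
        lport=local_port,
        node=compute_node,
        rport=remote_port,
    )


print(tunnel_command("koleinik", "scc2.bu.edu", "scc-x06", 8892))
# prints: ssh koleinik@scc2.bu.edu -L 7777:scc-x06:8892
```

The printed command is meant to be pasted into a local terminal, not run on the SCC.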

#### Run Jupyter Notebook in your local browser

Open your local browser and in the address bar type:<br>
`localhost:7777`
where 7777 is the local port number you used in the tunneling command.


## Getting help
Please email us at help@scc.bu.edu. <br>
Please include the following information in your email: your class ID (CS542sp), your user name, your working directory and a detailed description of the problem (what script you are running, what error you are getting, etc.). There is no need to attach your script to your email.
<br><br>

## Additional Resources
Our website: [http://www.bu.edu/tech/support/research/](http://www.bu.edu/tech/support/research/)<br>
Getting started: [http://www.bu.edu/tech/support/research/system-usage/getting-started/](http://www.bu.edu/tech/support/research/system-usage/getting-started/)<br>
SCC Cheat Sheet (pdf): [http://scv.bu.edu/documents/SCC_CheatSheet.pdf](http://scv.bu.edu/documents/SCC_CheatSheet.pdf)<br>
Running jobs: [http://www.bu.edu/tech/support/research/system-usage/running-jobs/](http://www.bu.edu/tech/support/research/system-usage/running-jobs/)<br><br>