IS&T RCS Summer 2024 Trainings

May 21 – June 14, 2024

Registration is open for the RCS Summer 2024 Tutorials. Please also be aware that we have lots of recordings and slides available from past tutorials by RCS staff and vendors.

  • For hands-on sessions where you wish to use your own computer, please have the appropriate software installed on your computer before the session starts.
  • Tutorials are tagged based on experience required (Beginner, Intermediate, or Advanced), location (details below), and if they are new.
  • Tutorial sessions are held either in-person or over Zoom. Zoom sessions have special considerations:
    • Please register at least three days in advance in order to be emailed the Zoom link.
    • Zoom sessions will be recorded; keep your camera off if you do not want your image recorded. The recorded sessions may be made available to the BU community.

The IS&T Research Computing Services (RCS) group offers a tutorial series on programming, data analysis, high performance computing, and domain-specific topics three times each year. These tutorials are free and open to all members of the Boston University community.

The RCS tutorials cover concepts, techniques, and tools which researchers can use in their own computing environments. Many are designed to help you make effective use of the Boston University Shared Computing Cluster (SCC). The RCS staff can also deliver extra, or customized, tutorial sessions to your course, group, or lab. Please contact us at help@scc.bu.edu if you are interested.

Register

Trainings Schedule

You may register for as many tutorials as you like. Registration is required and is accessed with your BU Kerberos password.

If you don’t have a Kerberos password, or if you find that a tutorial is full, or have any other questions, please send email to rcs-tutorial@bu.edu.

Tutorial Locations

BSC Biological Science Center, 2 Cummington Mall, Room 107
Zoom Online over Zoom. Registered attendees will be emailed the Zoom link for each tutorial 2-3 days before it starts, at which point registration for that tutorial closes.


Tutorial Descriptions and Times

Boot Camp Topics

Intermediate GPT & Transformers for Natural Language Processing (Hands-on) NEW

Instructor: Josh Bevan (jbevan@bu.edu)

BSC Tuesday, May 21, 10:00am – 4:00pm

Human communication is rich and complex, and one of the main ways we encode it computationally is through Natural Language Processing (NLP). We’ll explore recent advances in NLP, building from the ground up over the course of three sections. First, we’ll look at generating random first names of people using a simple character-level “bigram” model. Second, we’ll dive into word embeddings, a technique for encoding words as vectors that captures their semantic meanings; here we’ll look at the popular word2vec method and explore how to perform linguistic operations using simple vector arithmetic. Finally, we’ll look at transformer models and see how we can use a pre-trained SentenceTransformer model to do a range of classification tasks on real-world data.
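As a rough illustration of the first section, a character-level bigram model can be sketched in a few lines of plain Python. This is not the tutorial's own material: the five-name training list and the function names `train_bigram_counts` and `sample_name` are assumptions made up for this example, and a real model would be trained on thousands of names.

```python
import random
from collections import defaultdict

# Tiny, hypothetical training set used only for illustration.
NAMES = ["anna", "hannah", "nathan", "ethan", "tanya"]


def train_bigram_counts(names):
    """Count how often each character follows another.

    The '.' character marks both the start and the end of a name.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        chars = ["."] + list(name) + ["."]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_name(counts, rng, max_len=12):
    """Generate one name by repeatedly sampling the next character."""
    out = []
    prev = "."
    while len(out) < max_len:
        followers = counts[prev]
        chars = list(followers)
        weights = [followers[c] for c in chars]
        # Pick the next character in proportion to its observed count.
        nxt = rng.choices(chars, weights=weights)[0]
        if nxt == ".":  # end-of-name marker
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


counts = train_bigram_counts(NAMES)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_name(counts, rng) for _ in range(5)]
print(samples)
```

Each generated name only ever contains character pairs that were seen in the training names, which is why the output tends to look name-like but repetitive with such a small list.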

Register

Research Computing Basics Tutorials

Beginner Introduction to Linux (Hands‐on)

Instructor: Augustine Abaris (augustin@bu.edu)

BSC Wednesday, May 29, 12:30pm – 2:30pm

This tutorial will give attendees a hands-on introduction to Linux. Topics covered will include a short history of Linux, logging in with ssh, the Bash shell and shell scripts, I/O redirection (pipes), file system navigation, and job control. Time permitting, attendees will edit, compile, and run a simple C program.

If you have not connected to the SCC from your laptop before, please read and follow these instructions prior to attending the tutorial.

Beginner Introduction to BU’s Shared Computing Cluster (Hands‐on)

Instructor: Aaron Fuegi (aarondf@bu.edu)

BSC Wednesday, May 29, 3:00pm – 5:00pm
Zoom Thursday, May 30, 12:30pm – 2:30pm

This tutorial will introduce Boston University’s Shared Computing Cluster (SCC) in Holyoke, MA. This Linux cluster has more than 23,000 processors and over 9 petabytes of storage available for Research Computing by students and faculty on the Charles River and BUMC campuses. A very large number of software packages for programming, mathematics, data analysis, plotting, statistics, visualization, and domain-specific disciplines are also available on the SCC. You will get a general overview of the SCC and the facility that houses it and then a hands-on introduction covering connecting to and using the SCC for new users. This tutorial will cover a few basic Linux commands but we strongly encourage people to also take our more extensive “Introduction to Linux” tutorial.

There will also be ample time for questions of all types about the SCC.

For those in the BU community interested in using a particular package on the SCC, after taking this tutorial we also recommend viewing one of our short videos on that package if one is available.

Please read and follow these instructions prior to attending the tutorial.

Intermediate Intermediate Usage of the SCC (Lecture)

Instructor: Katia Bulekova (ktrn@bu.edu)

Zoom Monday, June 3, 10:00am – 12:00pm
BSC Tuesday, June 4, 10:00am – 12:00pm

This tutorial will provide some more advanced techniques and common strategies used for interacting with the Shared Computing Cluster and its resources.

The topics discussed during the tutorial include:

  • Customizing your environment
  • Parallel computing on the SCC
  • Job monitoring (CPU and memory usage)
  • Profiling programs for performance optimization
  • General optimization strategies

Prerequisites: some prior experience with high performance computing or attendance of our “Introduction to BU’s Shared Computing Cluster” tutorial.

Intermediate Using and Building Containers on the SCC (Hands‐on)

Instructors: Augustine Abaris (augustin@bu.edu) and Brian Gregor (bgregor@bu.edu)

BSC Wednesday, June 5, 12:30pm – 2:30pm

Container technologies such as Docker and Singularity are becoming a common way of developing and sharing applications and workflows. In this tutorial we will cover high level concepts and options for adopting container technologies. This tutorial will provide hands-on examples for working with containers on the SCC. The first hour will cover running Singularity containers and converting Docker containers to Singularity. The second hour will cover building your own customized Singularity containers.

Register

Computer Programming Tutorials

Intermediate Using GenAI Tools in RStudio for Code Development and Graphics (Hands-on) NEW

Instructor: Katia Bulekova (ktrn@bu.edu)

Zoom Thursday, June 6, 10:00am – 12:00pm

This tutorial explores how to leverage GenAI within RStudio, a popular integrated development environment (IDE) for R programming.

The following topics will be covered:

  • Setting Up Your Environment (enabling GitHub Copilot within RStudio)
  • Review various ChatGPT-like packages
  • Explore GenAI’s suggestions for code completion
  • Go over some tips for effective prompting

Register

High Performance Computing Tutorials

Introduction to Parallel Programming Concepts (Hands‐on)

Instructor: Brian Gregor (bgregor@bu.edu)

BSC Tuesday, May 28, 10:00am – 12:00pm

This “Introduction to Parallel Programming Concepts” tutorial is recommended for anyone interested in learning more about the topic or who plans on taking our language-specific tutorials on parallel programming. This tutorial is not oriented towards any programming language in particular and is intended for anyone with programming experience. This tutorial covers basic topics such as the use of processes and threads, types of computer hardware for parallel computing, and the limits of parallelization as a strategy. Additionally, several common data and algorithm patterns in software will be discussed along with effective strategies on how to parallelize them.
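One common way to reason about the limits of parallelization is Amdahl's law, which bounds the speedup of a program when only part of it can run in parallel. The description above does not say which model the tutorial uses, so treat the following as an illustrative sketch; the function name `amdahl_speedup` and the 95% parallel fraction are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0 to 1).
    n_workers: number of cores or processes working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Assumed example: 95% of the runtime parallelizes perfectly.
for n in (1, 2, 8, 64, 1024):
    print(f"{n:5d} workers -> {amdahl_speedup(0.95, n):.2f}x speedup")
```

With any fixed serial fraction, the speedup approaches 1 divided by that fraction no matter how many workers are added, so a program that is 5% serial can never run more than 20 times faster.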

Intermediate Introduction to MPI (Hands‐on)

Instructor: Josh Bevan (jbevan@bu.edu)

Zoom Thursday, May 30, 10:00am – 12:00pm

Many programs can be sped up by using additional CPU cores. To do this, the execution needs to be parallelized and distributed across multiple cores. While “shared-memory” approaches like OpenMP allow you to use many cores on a single machine, if the program can still benefit from additional cores then a “distributed-memory” approach like MPI is needed to use multiple machines/nodes. MPI provides a way to communicate between machines and distribute work/data so that they can work cooperatively. This tutorial will take a hands-on approach to writing several simple MPI programs and along the way demonstrate basic MPI functionality.

Prior parallel programming experience for attendees is important. Programs will be written in Fortran so prior experience in Fortran is helpful, but the syntax is straightforward so C/C++ experience can be enough.

Advanced Python Parallelization (Hands‐on)

Instructor: Brian Gregor (bgregor@bu.edu)

BSC Thursday, June 6, 12:30pm – 2:30pm

This tutorial is an introduction to the variety of ways that parallel computations can be performed in Python. Ways of identifying code that can benefit from parallelization will be discussed. Several parallelization methods using the Python language and external libraries will be covered with examples. This tutorial assumes an intermediate understanding of the Python language and parallel computing concepts. It is strongly recommended that the “Introduction to Parallel Programming Concepts” tutorial be taken first for those new to parallel software development.

If you do not have Python installed on your home machine, please read and follow these instructions prior to attending the tutorial.
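To give a flavor of one such method, the sketch below uses a thread pool from the standard library's `concurrent.futures` module to overlap I/O-bound tasks. It is not taken from the tutorial: `fetch_record` is a hypothetical stand-in that sleeps instead of doing real I/O, and CPU-bound work would usually call for processes rather than threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_record(record_id):
    """Hypothetical I/O-bound task; sleeps to imitate waiting on a disk or network."""
    time.sleep(0.05)
    return record_id * record_id


def run_serial(ids):
    """Process the tasks one after another."""
    return [fetch_record(i) for i in ids]


def run_threaded(ids, max_workers=4):
    """Process the tasks with a pool of threads.

    pool.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_record, ids))


ids = list(range(8))

start = time.perf_counter()
serial_results = run_serial(ids)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded_results = run_threaded(ids)
threaded_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f} s")
print(f"threaded: {threaded_seconds:.2f} s")
```

Because the threads spend most of their time waiting, the threaded version typically finishes in a fraction of the serial time while returning identical results.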

Intermediate Introduction to OpenMP (Hands‐on)

Instructor: Josh Bevan (jbevan@bu.edu)

Zoom Monday, June 10, 10:00am – 12:00pm

Many programs can be sped up by using additional CPU cores. To do this the execution needs to be parallelized and distributed across multiple cores. OpenMP provides a relatively straightforward way to do this for single machines (desktop/laptop) or a single computational node on a cluster. By adding directives within the code to modify the behavior of the compiler, you can generate programs that will use multiple cores. This tutorial will take a hands-on look at several example serial (single-core) programs and show how to use OpenMP to modify them to run in parallel.

Experience in Fortran and parallel programming will be helpful, but not required. It is expected attendees have previous programming experience in at least one language, preferably a compiled one.

Advanced Python Optimization (Hands‐on)

Instructor: Brian Gregor (bgregor@bu.edu)

BSC Tuesday, June 11, 12:30pm – 2:30pm

This tutorial is for those with intermediate Python experience who are interested in optimizing their code to maximize performance. The topics covered are profiling and timing Python code, selecting data structures, avoiding common pitfalls, using external libraries, and tuning Python code.

If you do not have Python installed on your home machine, please read and follow these instructions prior to attending the tutorial.
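As a small example of timing code and selecting data structures, the sketch below uses the standard library's `timeit` module to compare membership tests on a list and on a set. The sizes and names (`N`, `in_list`, `in_set`) are assumptions for this illustration and are not drawn from the tutorial materials.

```python
import timeit

N = 20_000
items_list = list(range(N))
items_set = set(items_list)
target = N - 1  # last element: the slowest case for a linear list search


def in_list():
    """Membership test that scans the list element by element."""
    return target in items_list


def in_set():
    """Membership test that uses a hash lookup."""
    return target in items_set


# Run each test many times so the measurement is less noisy.
list_seconds = timeit.timeit(in_list, number=200)
set_seconds = timeit.timeit(in_set, number=200)

print(f"list membership: {list_seconds:.6f} s for 200 lookups")
print(f"set membership:  {set_seconds:.6f} s for 200 lookups")
```

Both functions return the same answer, so the choice of container affects only how long the lookup takes; measuring before and after a change like this is the basic habit behind most optimization work.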

Advanced Python with Dask (Hands‐on) NEW

Instructor: Brian Gregor (bgregor@bu.edu)

BSC Friday, June 14, 12:30pm – 2:30pm

Dask is an open source Python library for parallel computing. It helps scale Python code to large problems, including ones where the quantity of data is much greater than the amount of computer memory on hand. It provides a convenient way to adapt existing programs based around libraries such as pandas and NumPy to run in parallel. This tutorial will cover using Dask to scale up pandas DataFrames, NumPy array processing, parallelizing custom Python code, and scalable file processing.

Register

Data Analysis Tutorials

Advanced Advanced Topics in R: Code Optimization and Parallelization (Lecture)

Instructor: Katia Bulekova (ktrn@bu.edu)

Zoom Friday, June 7, 10:00am – 12:00pm

Join our R Optimization and Parallelization Tutorial to learn essential techniques to speed up your analysis with R. We will learn how to identify the bottlenecks in your code and review the main pitfalls that hinder R performance. We will also explore various packages and functions that can significantly shorten the time it takes to execute your code. The second part of the tutorial will be dedicated to R code parallelization. We will go over several R packages (such as parallel, foreach, snowfall) that can be used to run your R script using multiple CPU cores.

Register