IS&T RCS Summer 2024 Trainings
May 21 – June 14, 2024
Registration is open for the RCS Summer 2024 Tutorials. Please also be aware that we have lots of recordings and slides available from past tutorials by RCS staff and vendors.
- For hands-on sessions where you wish to use your own computer, please have the appropriate software installed on your computer before the session starts.
- Tutorials are tagged based on experience required (Beginner, Intermediate, or Advanced), location (details below), and if they are new.
- Tutorial sessions are held either in-person or over Zoom. Zoom sessions have special considerations:
  - Please register at least three days in advance in order to be emailed the Zoom link.
  - Zoom sessions will be recorded; keep your camera off if you do not want your image recorded. The recorded sessions may be made available to the BU community.
The IS&T Research Computing Services (RCS) group offers a tutorial series on programming, data analysis, high performance computing, and domain-specific topics three times each year. These tutorials are free and open to all members of the Boston University community.
The RCS tutorials cover concepts, techniques, and tools which researchers can use in their own computing environments. Many are designed to help you make effective use of the Boston University Shared Computing Cluster (SCC). The RCS staff can also deliver extra, or customized, tutorial sessions to your course, group, or lab. Please contact us at help@scc.bu.edu if you are interested.
Training Schedule
- Boot Camp Topics
Tue, May 21 10:00am ‐ 4:00pm
GPT & Transformers for Natural Language Processing (Hands-on)
- Research Computing Basics Tutorials
Wed, May 29 12:30pm ‐ 2:30pm
Introduction to Linux (Hands‐on)
Wed, May 29 3:00pm ‐ 5:00pm
Introduction to BU’s Shared Computing Cluster (Hands‐on)
Thu, May 30 12:30pm ‐ 2:30pm
Introduction to BU’s Shared Computing Cluster (Hands‐on)
Mon, June 3 10:00am ‐ 12:00pm
Intermediate Usage of the SCC (Lecture)
Tue, June 4 10:00am ‐ 12:00pm
Intermediate Usage of the SCC (Lecture)
Wed, June 5 12:30pm ‐ 2:30pm
Using and Building Containers on the SCC (Hands-on)
- Computer Programming Tutorials
Thu, June 6 10:00am ‐ 12:00pm
Using GenAI Tools in RStudio for Code Development and Graphics (Hands‐on)
- High Performance Computing Tutorials
Tue, May 28 10:00am ‐ 12:00pm
Introduction to Parallel Programming Concepts (Hands‐on)
Thu, May 30 10:00am ‐ 12:00pm
Introduction to MPI (Hands‐on)
Thu, June 6 12:30pm ‐ 2:30pm
Python Parallelization (Hands‐on)
Mon, June 10 10:00am ‐ 12:00pm
Introduction to OpenMP (Hands‐on)
Tue, June 11 12:30pm ‐ 2:30pm
Python Optimization (Hands‐on)
Fri, June 14 12:30pm ‐ 2:30pm
Python with Dask (Hands‐on)
- Data Analysis Tutorials
Fri, June 7 10:00am ‐ 12:00pm
Advanced Topics in R: Code Optimization and Parallelization (Lecture)
You may register for as many tutorials as you like. Registration is required and is accessed with your BU Kerberos password.
If you don’t have a Kerberos password, find that a tutorial is full, or have any other questions, please send email to rcs-tutorial@bu.edu.
Tutorial Locations
Biological Science Center, 2 Cummington Mall, Room 107
Online over Zoom: Registered attendees will be emailed the Zoom link for each tutorial 2-3 days before the tutorial starts, and registration for that tutorial will close at that point.
Tutorial Descriptions and Times
Boot Camp Topics
GPT & Transformers for Natural Language Processing (Hands-on)
Instructor: Josh Bevan (jbevan@bu.edu)
Tuesday, May 21, 10:00am – 4:00pm
Human communication is rich and complex, and one of the main ways we encode it computationally is through Natural Language Processing (NLP). We’ll explore recent advances in NLP, building from the ground up over the course of three sections. First, we’ll generate random first names using a simple character-level “bigram” model. Second, we’ll dive into word embeddings, a technique for encoding words as vectors that capture their semantic meaning, looking in particular at the popular word2vec method and how to perform linguistic operations using simple vector arithmetic. Finally, we’ll look at transformer models and see how to use a pre-trained SentenceTransformer model to perform a range of classification tasks on real-world data.
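For readers curious what a character-level bigram model involves, here is a minimal Python sketch (not the tutorial's own code). It counts which character follows which in a tiny hypothetical list of names, using `.` as an assumed start/end marker, and then samples new names one character at a time from those counts.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy training set; the tutorial's actual dataset is not specified here.
NAMES = ["emma", "olivia", "ava", "isabella", "sophia", "mia", "amelia", "harper"]

BOUNDARY = "."  # assumed marker for the start and end of a name


def build_bigram_counts(names):
    """Count how often each character follows another, including name boundaries."""
    counts = defaultdict(Counter)
    for name in names:
        chars = [BOUNDARY] + list(name) + [BOUNDARY]
        for current, following in zip(chars, chars[1:]):
            counts[current][following] += 1
    return counts


def sample_name(counts, rng, max_len=20):
    """Generate a name by repeatedly sampling the next character given the current one."""
    out = []
    current = BOUNDARY
    while len(out) < max_len:
        followers = counts[current]
        # Sample proportionally to how often each follower was observed.
        nxt = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
        if nxt == BOUNDARY:  # reaching the end marker finishes the name
            break
        out.append(nxt)
        current = nxt
    return "".join(out)


if __name__ == "__main__":
    model = build_bigram_counts(NAMES)
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(5):
        print(sample_name(model, rng))
```

Because the model only looks at the previous character, its output tends to be name-like but often nonsensical, which is the limitation that motivates the embedding and transformer sections.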
Research Computing Basics Tutorials
Introduction to Linux (Hands‐on)
Instructor: Augustine Abaris (augustin@bu.edu)
Wednesday, May 29, 12:30pm – 2:30pm
This tutorial will give attendees a hands-on introduction to Linux. Topics covered will include a short history of Linux, logging in with ssh, the Bash shell and shell scripts, I/O redirection (pipes), file system navigation, and job control. Time permitting, attendees will edit, compile, and run a simple C program.
If you have not connected to the SCC from your laptop before, please read and follow these instructions prior to attending the tutorial.
Introduction to BU’s Shared Computing Cluster (Hands‐on)
Instructor: Aaron Fuegi (aarondf@bu.edu)
Wednesday, May 29, 3:00pm – 5:00pm
Thursday, May 30, 12:30pm – 2:30pm
This tutorial will introduce Boston University’s Shared Computing Cluster (SCC), located in Holyoke, MA. This Linux cluster has more than 23,000 processors and over 9 petabytes of storage available for research computing to students and faculty on the Charles River and BUMC campuses. A very large number of software packages for programming, mathematics, data analysis, plotting, statistics, visualization, and domain-specific disciplines are also available on the SCC. You will get a general overview of the SCC and the facility that houses it, followed by a hands-on introduction to connecting to and using the SCC for new users. This tutorial will cover a few basic Linux commands, but we strongly encourage people to also take our more extensive “Introduction to Linux” tutorial.
There will also be ample time for questions of all types about the SCC.
For those in the BU community interested in using a particular package on the SCC, after taking this tutorial we also recommend viewing one of our short videos on that package if one is available.
Please read and follow these instructions prior to attending the tutorial.
Intermediate Usage of the SCC (Lecture)
Instructor: Katia Bulekova (ktrn@bu.edu)
Monday, June 3, 10:00am – 12:00pm
Tuesday, June 4, 10:00am – 12:00pm
This tutorial will present more advanced techniques and common strategies for interacting with the Shared Computing Cluster and its resources.
The topics discussed during the tutorial include:
- Customizing your environment
- Parallel computing on the SCC
- Job monitoring (CPU and memory usage)
- Profiling programs for performance optimization
- General optimization strategies
Prerequisites: some prior experience with high performance computing or attendance at our “Introduction to BU’s Shared Computing Cluster” tutorial.
Using and Building Containers on the SCC (Hands‐on)
Instructors: Augustine Abaris (augustin@bu.edu) and Brian Gregor (bgregor@bu.edu)
Wednesday, June 5, 12:30pm – 2:30pm
Container technologies such as Docker and Singularity are becoming a common way of developing and sharing applications and workflows. In this tutorial we will cover high-level concepts and options for adopting container technologies, with hands-on examples of working with containers on the SCC. The first hour will cover running Singularity containers and converting Docker containers to Singularity; the second hour will cover building your own customized Singularity containers.
Computer Programming Tutorials
Using GenAI Tools in RStudio for Code Development and Graphics (Hands-on)
Instructor: Katia Bulekova (ktrn@bu.edu)
Thursday, June 6, 10:00am – 12:00pm
This tutorial explores how to leverage GenAI within RStudio, a popular integrated development environment (IDE) for R programming.
The following topics will be covered:
- Setting up your environment (enabling GitHub Copilot within RStudio)
- Reviewing various ChatGPT-like packages
- Exploring GenAI’s suggestions for code completion
- Tips for effective prompting
High Performance Computing Tutorials
Introduction to Parallel Programming Concepts (Hands‐on)
Instructor: Brian Gregor (bgregor@bu.edu)
Tuesday, May 28, 10:00am – 12:00pm
This “Introduction to Parallel Programming Concepts” tutorial is recommended for anyone interested in learning more about the topic or who plans on taking our language-specific tutorials on parallel programming. It is not oriented towards any particular programming language and is intended for anyone with programming experience. The tutorial covers basic topics such as the use of processes and threads, types of computer hardware for parallel computing, and the limits of parallelization as a strategy. Additionally, several common data and algorithm patterns in software will be discussed, along with effective strategies for parallelizing them.
Introduction to MPI (Hands‐on)
Instructor: Josh Bevan (jbevan@bu.edu)
Thursday, May 30, 10:00am – 12:00pm
Many programs can be sped up by using additional CPU cores. To do this, the execution needs to be parallelized and distributed across multiple cores. While “shared-memory” approaches like OpenMP allow you to use many cores on a single machine, a program that can benefit from more cores than one machine provides needs a “distributed-memory” approach like MPI to use multiple machines (nodes). MPI provides a way to communicate between machines and distribute work and data so that they can work cooperatively. This tutorial will take a hands-on approach to writing several simple MPI programs, demonstrating basic MPI functionality along the way.
Attendees should have prior parallel programming experience. Programs will be written in Fortran, so prior experience in Fortran is helpful, but the syntax is straightforward enough that C/C++ experience should suffice.
Python Parallelization (Hands‐on)
Instructor: Brian Gregor (bgregor@bu.edu)
Thursday, June 6, 12:30pm – 2:30pm
This tutorial is an introduction to the variety of ways that parallel computations can be performed in Python. Ways of identifying code that can benefit from parallelization will be discussed, and several parallelization methods using the Python language and external libraries will be covered with examples. This tutorial assumes an intermediate understanding of the Python language and of parallel computing concepts. Those new to parallel software development are strongly encouraged to take the “Introduction to Parallel Programming Concepts” tutorial first.
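As a minimal sketch of one such method, assuming the standard-library `multiprocessing` module (the tutorial may emphasize other libraries), a CPU-bound function can be mapped over many inputs with a pool of worker processes. The prime-counting workload and the function names here are hypothetical stand-ins chosen only to give the workers something CPU-bound to do.

```python
import math
from multiprocessing import Pool


def is_prime(n):
    """CPU-bound stand-in task: trial division up to the integer square root."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def count_primes_serial(numbers):
    """Baseline: evaluate every input in the current process."""
    return sum(is_prime(n) for n in numbers)


def count_primes_parallel(numbers, processes=4):
    """Split the inputs into chunks and let separate worker processes evaluate them."""
    with Pool(processes=processes) as pool:
        # map() returns results in input order, so the sum matches the serial version.
        flags = pool.map(is_prime, numbers, chunksize=1000)
    return sum(flags)


if __name__ == "__main__":
    # The guard matters: some process start methods re-import this module in each worker.
    numbers = range(1, 100_000)
    serial = count_primes_serial(numbers)
    parallel = count_primes_parallel(numbers)
    assert serial == parallel  # same answer, computed by worker processes
    print(serial, parallel)
```

Processes rather than threads are used here because, in the standard CPython interpreter, the global interpreter lock generally keeps pure-Python threads from running CPU-bound code in parallel.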
If you do not have Python installed on your home machine, please read and follow these instructions prior to attending the tutorial.
Introduction to OpenMP (Hands‐on)
Instructor: Josh Bevan (jbevan@bu.edu)
Monday, June 10, 10:00am – 12:00pm
Many programs can be sped up by using additional CPU cores. To do this, the execution needs to be parallelized and distributed across multiple cores. OpenMP provides a relatively straightforward way to do this on a single machine (desktop or laptop) or a single computational node of a cluster. By adding directives to the code that modify the behavior of the compiler, you can generate programs that use multiple cores. This tutorial will take a hands-on look at several example serial (single-core) programs and show how to use OpenMP to modify them to run in parallel.
Experience in Fortran and parallel programming will be helpful but is not required. Attendees are expected to have previous programming experience in at least one language, preferably a compiled one.
Python Optimization (Hands‐on)
Instructor: Brian Gregor (bgregor@bu.edu)
Tuesday, June 11, 12:30pm – 2:30pm
This tutorial is for those with intermediate Python experience who are interested in optimizing their code to maximize performance. The topics covered are profiling and timing Python code, selecting data structures, avoiding common pitfalls, using external libraries, and tuning Python code.
If you do not have Python installed on your home machine, please read and follow these instructions prior to attending the tutorial.
Python with Dask (Hands‐on)
Instructor: Brian Gregor (bgregor@bu.edu)
Friday, June 14, 12:30pm – 2:30pm
Dask is an open-source Python library for parallel computing. It helps scale Python code to large problems, including ones where the quantity of data is much greater than the available memory. It also provides a convenient way to adapt existing programs built around libraries such as pandas and NumPy to run in parallel. This tutorial will cover using Dask to scale up pandas DataFrames and NumPy array processing, parallelizing custom Python code, and scalable file processing.
Data Analysis Tutorials
Advanced Topics in R: Code Optimization and Parallelization (Lecture)
Instructor: Katia Bulekova (ktrn@bu.edu)
Friday, June 7, 10:00am – 12:00pm
Join our R Optimization and Parallelization tutorial to learn essential techniques for speeding up your analysis in R. We will show how to identify bottlenecks in your code and review the main pitfalls that hinder R performance. We will also explore various packages and functions that can significantly reduce the execution time of your code. The second part of the tutorial is dedicated to R code parallelization: we will go over several R packages (such as parallel, foreach, and snowfall) that can be used to run your R scripts on multiple CPU cores.