Python with Jupyter Notebooks

Publishing Jupyter Notebooks

You can publish Jupyter notebooks to RStudio Connect. The Jupyter Notebook extension for RStudio Connect (rsconnect-jupyter) allows you to publish Jupyter notebooks with the press of a button. Once published on RStudio Connect, these notebooks can be scheduled for updates or refreshed on demand.

Examples

Jupyter end-to-end flow and best practices

Jupyter Notebooks were designed to be run in a single-user environment, where the user can also act in the role of administrator. As a result, notebooks do not have built-in support for:

managing virtual environments
managing package installation
easily accessing a terminal

JupyterLab is a more fully featured IDE and can therefore be easier to use (for example, it provides readier access to a terminal). However, RStudio does not currently offer a publishing extension for JupyterLab, so all publishing from JupyterLab must be done with the rsconnect-python CLI.

Below, we outline one possible path for working with Jupyter Notebooks that provides you, the developer, with a consistent, isolated and reproducible environment that works well on RStudio Workbench and simplifies publishing to RStudio Connect.

Creating a new project

We first create a directory to hold our project and its virtual environment.

To do this, we use the Jupyter terminal:

[Figure: Start a Jupyter terminal]

Remember to replace <PROJECT-NAME> with an appropriate name for your project. Avoid using spaces in the directory names, as this can cause problems with registering your new Jupyter kernel in later steps.

$ mkdir <PROJECT-NAME>
$ cd <PROJECT-NAME>

Next, create a virtual environment for your project. In this example, we're using venv as the name of our virtual environment. You can use any name you prefer.

$ /opt/python/3.9.7/bin/python -m venv venv
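If you would rather script these first steps, the standard-library `venv` module provides the same functionality as `python -m venv`. The following is a minimal sketch, not part of RStudio's tooling: the helper name `create_project_env` and the project name in the usage line are hypothetical, and the environment is based on whichever interpreter runs the script (for example `/opt/python/3.9.7/bin/python`).

```python
import venv
from pathlib import Path


def create_project_env(project_dir: str, env_name: str = "venv", with_pip: bool = True) -> Path:
    """Create <project_dir>/<env_name> as a virtual environment and return its path."""
    project = Path(project_dir)
    # Spaces in the directory name cause problems when registering the kernel later.
    if " " in project.name:
        raise ValueError(f"avoid spaces in the project directory name: {project.name!r}")
    # Equivalent of `mkdir <PROJECT-NAME>`.
    project.mkdir(parents=True, exist_ok=True)
    env_dir = project / env_name
    # Equivalent of `python -m venv venv`, run from inside the project directory.
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Usage (hypothetical project name):
# create_project_env("my-project")
```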

Then activate your virtual environment.

$ . ./venv/bin/activate
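To confirm the activation worked, you can ask Python which environment it is running in. With `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base installation, so the two differ only inside a virtual environment. A minimal check, saved under a hypothetical name such as `check_env.py` and run with `python check_env.py`:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory (e.g. .../venv);
    # sys.base_prefix is the installation it was created from (e.g. /opt/python/3.9.7).
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If this reports `False`, activate the environment before registering the kernel.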

Now register your virtual environment as a Jupyter kernel.

(venv)$ python -m pip install ipykernel
(venv)$ python -m ipykernel install --name "<PROJECT-NAME>" --user

You can now install additional packages with pip if you like; alternatively, you can begin working inside a notebook.

Using Jupyter Notebooks

In the Jupyter UI, navigate to your project folder and create a new notebook using your newly registered kernel. If your kernel does not show up, you may need to refresh the page in your browser.

[Figure: Create a new notebook with your custom kernel]

Ensure the Notebook is saved to the project directory you created earlier.

You may now use Jupyter as normal.

If you need to install additional packages, you have two options: return to the command line, navigate to the project directory, activate the virtual environment and install with pip; or install from within the notebook using the following commands:

import sys
!{sys.executable} -m pip install numpy

Warning

Take care never to use !pip install <package>, as this runs whichever pip is first on the shell's PATH (often the system pip) rather than the one associated with your virtual environment. This can result in packages being installed to your user environment instead of the virtual environment or, in some cases, failing to install altogether.

It is important to install packages this way so that they end up in the appropriate environment. Remember, however, that any such installation commands should be removed from your notebook before you publish it to RStudio Connect.
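The `!{sys.executable}` form is IPython shell syntax. A plain-Python sketch of the same idea, using the standard-library `subprocess` module, makes it explicit which interpreter pip runs under. The helper names `pip_install_command` and `install` are illustrative only, not part of any library:

```python
import subprocess
import sys
from typing import List


def pip_install_command(*packages: str) -> List[str]:
    """Build a pip command that targets the interpreter running this kernel."""
    # sys.executable is the kernel's own Python, i.e. the venv's python when the
    # kernel was registered from inside the activated environment.
    return [sys.executable, "-m", "pip", "install", *packages]


def install(*packages: str) -> None:
    """Install packages into the kernel's environment; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(*packages))


# Usage in a notebook cell (remove before publishing to RStudio Connect):
# install("numpy")
```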

Publishing to RStudio Connect

There are two options for publishing to Connect:

Use the push-button deployment in the Jupyter Notebook hosted on RStudio Workbench: press the "publish" button and follow the on-screen prompts.
Install the rsconnect-python package and use the rsconnect command line tool. In your virtual environment, you can run rsconnect --help for more info.

Checking your project into version control

Version control (for example, git) is an essential part of all good software development. In order to allow your collaborators to restore the virtual environment, you need to check in a requirements.txt file.

This file can be created in one of two ways:

Return to the terminal, navigate to the project directory, activate the virtual environment and then run pip freeze > requirements.txt
Within the Jupyter Notebook, create a cell that contains the following:

import sys
!{sys.executable} -m pip freeze > requirements.txt

Ensure the requirements.txt file exists before removing this cell.
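The same file can also be produced with plain Python instead of IPython's `!` syntax. A minimal sketch using the standard-library `subprocess` module; the helper names `freeze_command` and `write_requirements` are hypothetical. Running pip through `sys.executable` freezes the kernel's environment rather than whichever pip happens to be on the PATH:

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def freeze_command() -> List[str]:
    """`pip freeze`, run by the current interpreter so the right environment is captured."""
    return [sys.executable, "-m", "pip", "freeze"]


def write_requirements(path: str = "requirements.txt") -> Path:
    """Write the pinned package list for the current environment to `path`."""
    result = subprocess.run(freeze_command(), capture_output=True, text=True, check=True)
    out = Path(path)
    out.write_text(result.stdout)
    return out


# Usage from a notebook saved in the project directory:
# write_requirements()
```

Collaborators can then recreate the environment by creating and activating a fresh virtual environment and running `pip install -r requirements.txt`.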

Managing Jupyter kernels

Over time, you may build up lots of available Jupyter kernels.

These can be managed from the command line. For example, to list all the available kernels:

$ jupyter kernelspec list

Or to remove an old unused kernel:

$ jupyter kernelspec remove <KERNEL-NAME>
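`jupyter kernelspec list` shows each kernel's directory, but not which Python interpreter the kernel launches. The sketch below reads each user-level `kernel.json` and reports its interpreter. It assumes the Linux default location used in the examples later in this document (`~/.local/share/jupyter/kernels`), honoring `JUPYTER_DATA_DIR` if it is set; run `jupyter --paths` to see where your installation actually looks. The function names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict


def default_kernels_dir() -> Path:
    """User-level kernels directory (assumption: Linux default layout)."""
    # JUPYTER_DATA_DIR overrides the default data directory when it is set.
    data_dir = os.environ.get("JUPYTER_DATA_DIR")
    base = Path(data_dir) if data_dir else Path.home() / ".local" / "share" / "jupyter"
    return base / "kernels"


def list_kernels(kernels_dir: Path) -> Dict[str, str]:
    """Map each kernel directory name to the interpreter in its kernel.json argv."""
    kernels: Dict[str, str] = {}
    if not kernels_dir.is_dir():
        return kernels
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        argv = json.loads(spec_file.read_text()).get("argv", [])
        # argv[0] is the interpreter Jupyter launches for this kernel.
        kernels[spec_file.parent.name] = argv[0] if argv else ""
    return kernels


# Usage:
# for name, python in list_kernels(default_kernels_dir()).items():
#     print(f"{name:20} {python}")
```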

Notes and acknowledgements

Some of the content in this document was adapted from other sources:

Some of the information on working with different kernels came from the RStudio Workbench documentation.
Additional background on Jupyter Notebook environments and methods for installing packages inside of Jupyter Notebooks was adapted from a blog post by Jake VanderPlas.
Further information on using virtual environments as kernels in Jupyter Notebooks was obtained from a blog post by Nikolai Janakiev.
Results and advice were drawn from a study of 1.4 million Jupyter Notebooks.

Additional debugging info

The easiest mistake to make in the process above is registering the new kernel from outside the virtual environment.

The following examples show the kernel.json files of two kernels. The first kernel was registered from outside a virtual environment and the second from within one.

Notice how the first entry in argv, the path to the Python interpreter, differs between the two.
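Because the difference is confined to that first argv entry, the check can be scripted. A minimal sketch, with hypothetical function names, that flags kernels whose interpreter is not inside a venv-created environment. It relies on the fact that `python -m venv` writes a `pyvenv.cfg` file at the environment root, one level above `bin/`:

```python
import json
from pathlib import Path
from typing import List


def uses_virtualenv(python_path: str) -> bool:
    """True if python_path looks like <env>/bin/python for a venv-created environment."""
    path = Path(python_path)
    # A bare "python" is resolved via PATH at launch time, so it cannot be checked here.
    if not path.is_absolute():
        return False
    # pyvenv.cfg sits at the environment root, the parent of the bin/ directory.
    return (path.parent.parent / "pyvenv.cfg").is_file()


def kernels_outside_venv(kernels_dir: Path) -> List[str]:
    """Names of kernels whose first argv entry is not a virtual environment's interpreter."""
    flagged = []
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        argv = json.loads(spec_file.read_text()).get("argv", [])
        if argv and not uses_virtualenv(argv[0]):
            flagged.append(spec_file.parent.name)
    return flagged


# Usage, assuming the kernel location shown in the examples in this section:
# print(kernels_outside_venv(Path.home() / ".local/share/jupyter/kernels"))
```

A flagged kernel is not necessarily broken; it simply will not use your project's environment. You can remove it with jupyter kernelspec remove and re-register it from inside the activated environment.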

From outside the virtual environment, we have captured the main Python installation instead of the virtual environment's version:

$ cat /usr/home/mark.sellors/.local/share/jupyter/kernels/jpy-test2/kernel.json
{
 "argv": [
  "/opt/python/3.9.6/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "jpy-test2",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}

From inside the virtual environment, we correctly capture the path to the Python binary within the environment:

$ cat /usr/home/mark.sellors/.local/share/jupyter/kernels/jupyter-test/kernel.json
{
 "argv": [
  "/usr/home/mark.sellors/jupyter-test/venv/bin/python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ],
 "display_name": "jupyter-test",
 "language": "python",
 "metadata": {
  "debugger": true
 }