Python with Jupyter Notebooks#
Publishing Jupyter Notebooks#
You can publish Jupyter notebooks to RStudio Connect. The Jupyter Notebook
extension for RStudio Connect (rsconnect-jupyter) allows you to publish
Jupyter notebooks with the press of a button. Once published on RStudio Connect,
these notebooks can be scheduled for updates or refreshed on demand.
Examples#
- Loading and Visualizing Geospatial Data in Jupyter Notebooks
- Python Visualizations in Jupyter Notebooks
- Interactive Python Visualizations in Jupyter Notebooks
Jupyter end-to-end flow and best practices#
Jupyter Notebooks were designed to be run in a single-user environment, where the user can also act in the role of administrator. As a result, notebooks do not have built-in support for:
- managing virtual environments
- managing package installation
- easily accessing a terminal
JupyterLab is a more fully-featured IDE and as such can be easier to use
(for example, it provides readier access to a terminal).
However, RStudio currently offers no JupyterLab extensions,
so all publishing from JupyterLab must be done via the rsconnect-python CLI.
Below, we outline one possible path for working with Jupyter Notebooks that provides you, the developer, with a consistent, isolated and reproducible environment that works well on RStudio Workbench and simplifies publishing to RStudio Connect.
Creating a new project#
We first create a directory to house our project and virtual environment.
To do this, we use the Jupyter terminal:
Remember to replace <PROJECT-NAME> with an appropriate name for your project.
Avoid using spaces in the directory names, as this can cause problems
with registering your new Jupyter kernel in later steps.
$ mkdir <PROJECT-NAME>
$ cd <PROJECT-NAME>
Next, create a virtual environment for your project.
In this example, we're using venv as the name of our virtual environment.
You can use any name you prefer.
$ /opt/python/3.9.7/bin/python -m venv venv
Then activate your virtual environment.
$ . ./venv/bin/activate
Now register your virtual environment as a Jupyter kernel.
(venv)$ python -m pip install ipykernel
(venv)$ python -m ipykernel install --name "<PROJECT-NAME>" --user
You can now install additional packages using pip if you'd like.
Alternatively, you can begin working inside a notebook.
Using Jupyter Notebooks#
In the Jupyter UI, navigate to your project folder and create a new notebook using your newly registered kernel. If your kernel does not show up, you may need to refresh the page in your browser.
Ensure the Notebook is saved to the project directory you created earlier.
You may now use Jupyter as normal.
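Before going further, it can be worth confirming that the notebook really is running on the kernel you registered from inside the virtual environment. The cell below is a minimal sketch; the function name `running_in_virtualenv` is our own, and the check assumes the environment was created with `venv` as described above.

```python
import sys


def running_in_virtualenv() -> bool:
    """Return True when the current interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("In a virtual environment:", running_in_virtualenv())
```

If the second line reports `False`, the notebook is probably using a kernel that was registered from outside the virtual environment.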
If you need to install additional packages, you must either return to the
command line, navigate to the project directory, activate the virtual
environment and install using pip, or install within the notebook using
the following commands:
import sys
!{sys.executable} -m pip install numpy
Warning
Take care never to use !pip install package as this runs whichever pip is
found first on the PATH, which is typically the system pip and not the one
associated with your virtual environment. This can result in packages being
installed to your user environment instead of the virtual environment, or in
some cases, packages failing to install altogether.
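To see whether this mismatch affects your own session, you can compare the first pip on the PATH with the interpreter your kernel is running. This is only a rough sketch: the helper name `pip_on_path_matches_kernel` is hypothetical, and it assumes that pip and python live in the same directory of an environment, as they do in a standard venv layout.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def pip_on_path_matches_kernel() -> Optional[bool]:
    """Compare the directory of the first `pip` on PATH with the kernel's interpreter.

    Returns None when no `pip` executable is found on PATH.
    """
    pip_path = shutil.which("pip")
    if pip_path is None:
        return None
    # Compare directories without resolving symlinks: a venv's python is often
    # a symlink back to the base installation.
    return Path(pip_path).parent == Path(sys.executable).parent


print("Kernel interpreter:", sys.executable)
print("First pip on PATH: ", shutil.which("pip"))
print("Same directory:    ", pip_on_path_matches_kernel())
```

A result of `False` suggests that a bare pip command would install somewhere other than the kernel's environment.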
It is important to install with {sys.executable} -m pip, as shown above, to ensure that packages are installed to the appropriate environment. Please remember, however, that any such package installation commands should be removed from your Notebook prior to publication on RStudio Connect.
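One way to catch leftover installation commands before publishing is to scan the notebook file itself. The sketch below is illustrative, not part of any RStudio tooling: the function name `find_install_cells` is our own, and it assumes the usual notebook JSON layout with a top-level `cells` list.

```python
import json
from pathlib import Path
from typing import List


def find_install_cells(notebook_path: str) -> List[int]:
    """Return the indexes of code cells whose source mentions 'pip install'."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    matches = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # Notebook files store source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if "pip install" in source:
            matches.append(index)
    return matches
```

For example, `find_install_cells("analysis.ipynb")` (a hypothetical file name) returns an empty list once all such cells have been removed.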
Publishing to RStudio Connect#
There are two options for publishing to Connect:
- Use the push-button deployment in the RStudio Workbench hosted Jupyter Notebook. Push the "publish" button and follow the on-screen prompts.
- Install the rsconnect-python package and use the rsconnect command line tool. In your virtual environment, you can run rsconnect --help for more info.
Checking your project into version control#
Version control (for example, git) is an essential part of all good software
development. In order to allow your collaborators to restore the virtual
environment, you need to check in a requirements.txt file.
This file can be created in one of two ways:
- Return to the terminal, navigate to the project directory, activate the virtual environment and then run
pip freeze > requirements.txt
- Within the Jupyter Notebook, create a cell that contains the following:
import sys
!{sys.executable} -m pip freeze > requirements.txt
Ensure the requirements.txt file exists before removing this cell.
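If you would rather not shell out from a notebook cell, a similar list can be approximated in pure Python with the standard library. This is a sketch under assumptions: the function names are hypothetical, and the output is not identical to pip freeze (for instance, pip freeze leaves out pip and setuptools by default and treats editable installs specially), so review the file before checking it in.

```python
from importlib import metadata
from typing import List


def frozen_requirements() -> List[str]:
    """Return sorted 'name==version' lines for distributions visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path: str = "requirements.txt") -> None:
    """Write the pinned list to `path`, one requirement per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(frozen_requirements()) + "\n")
```

Calling `write_requirements()` from a notebook running on the project kernel writes the file to the current working directory.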
Managing Jupyter kernels#
Over time, you may build up lots of available Jupyter kernels.
These can be managed from the command line. For example, to list all the available kernels:
$ jupyter kernelspec list
Or to remove an old unused kernel:
$ jupyter kernelspec remove <KERNEL-NAME>
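When deciding which kernels are safe to remove, it helps to see which interpreter each one launches. The following is a minimal sketch that reads the kernel.json files directly; the function name `list_kernel_interpreters` is our own, and the per-user kernels directory shown is an assumption for Linux, so use the paths printed by jupyter kernelspec list if yours differ.

```python
import json
from pathlib import Path
from typing import Dict


def list_kernel_interpreters(kernels_dir: Path) -> Dict[str, str]:
    """Map each kernel name under `kernels_dir` to the interpreter it launches."""
    interpreters = {}
    for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text(encoding="utf-8"))
        argv = spec.get("argv", [])
        # The kernel name is the directory name; argv[0] is the program that gets launched.
        interpreters[spec_file.parent.name] = argv[0] if argv else ""
    return interpreters


# Assumed per-user location on Linux; adjust to match your installation.
user_kernels = Path.home() / ".local" / "share" / "jupyter" / "kernels"
if user_kernels.is_dir():
    for name, interpreter in list_kernel_interpreters(user_kernels).items():
        print(f"{name}: {interpreter}")
```

Kernels whose interpreter path points into a project directory that no longer exists are good candidates for removal.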
Notes and acknowledgements#
Some of the content in this document was adapted from other sources:
- Some of the info on working with different kernels came from the RStudio Workbench documentation.
- Additional background on Jupyter Notebook environments and methods for installing packages inside of Jupyter Notebooks was adapted from a blog post by Jake VanderPlas.
- Further information on using virtual environments as kernels in Jupyter Notebooks was obtained from a blog post by Nikolai Janakiev.
- Results and advice from a study of 1.4 million Jupyter Notebooks.
Additional debugging info#
The easiest mistake to make in the processes above is to add the new kernel from outside of the virtual environment.
The following example shows the kernel.json files for two kernels registered
from the same location. The first shows a kernel registered from outside a
virtual environment and the second from within.
Notice how the first entry in argv differs in each example.
From outside the virtual environment we have captured the main Python installation instead of the virtual environment version:
$ cat /usr/home/mark.sellors/.local/share/jupyter/kernels/jpy-test2/kernel.json
{
  "argv": [
    "/opt/python/3.9.6/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "jpy-test2",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}
And from inside the virtual environment we correctly capture the path to the Python binary within the environment:
$ cat /usr/home/mark.sellors/.local/share/jupyter/kernels/jupyter-test/kernel.json
{
  "argv": [
    "/usr/home/mark.sellors/jupyter-test/venv/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "jupyter-test",
  "language": "python",
  "metadata": {
    "debugger": true
  }
}
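The same comparison can be automated. The sketch below reads a kernel.json file and reports whether its first argv entry sits inside a venv-style environment; the function name `kernel_points_at_venv` is hypothetical, and the check relies on the pyvenv.cfg file that venv writes at the root of each environment.

```python
import json
from pathlib import Path


def kernel_points_at_venv(kernel_json: Path) -> bool:
    """Return True if the kernel's interpreter sits inside a venv-style environment."""
    spec = json.loads(kernel_json.read_text(encoding="utf-8"))
    interpreter = Path(spec["argv"][0])
    # venv writes a pyvenv.cfg file at the environment root, one level
    # above the bin/ directory that holds the python executable.
    return (interpreter.parent.parent / "pyvenv.cfg").is_file()
```

A kernel registered from outside the virtual environment, like the first example above, would return `False` here, which is a hint to remove it and register it again with the environment activated.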