I recently read Joe Feeney’s amazing guide on how to get Jupyter set up in the cloud. Having suffered through trying to optimize models on my laptop, I was really excited to do the same thing, but automated, of course.

I would recommend two small additions on top of that post:

  1. Use the Amazon Linux Machine Learning AMIs, so that most deep learning frameworks (Keras + TensorFlow, Theano, NumPy) and low-level libraries (like CUDA) are already installed, and there is no need to waste precious time installing Anaconda. I haven’t investigated this thoroughly, but the machine learning AMIs appear to come with 30 GB of free storage, much more than the 8 GB limit of the Ubuntu AMIs.

  2. Actually secure the server. Fortunately, this is really easy to do with Ansible Roles.

If you are new to Ansible and Terraform, this might not be the best post to start with, as I will only cover the broad strokes.

Provision the server

The relevant parts here are opening an incoming port so that the Jupyter notebook server can listen on it, in addition to the default SSH port that needs to be exposed for Ansible. I had already set up an AWS key pair and a security group enabling outbound access and opening the SSH port. As you can see here, I also use Cloudflare to provision an A record so that we can set up SSL.

Note that I also write out a local file that is configured as my Ansible hosts file. You can make an ansible.cfg file to point Ansible at it.
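For reference, a minimal ansible.cfg for that might look like this (the inventory path is an assumption, chosen to match the local_file resource below):

```ini
# ansible.cfg — tell Ansible to use the Terraform-generated inventory
[defaults]
inventory = ./ansible/hosts
```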

# config.tf
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region = "${var.region}"
}

provider "cloudflare" {
  email = "${var.cloudflare_email}"
  token = "${var.cloudflare_api_key}"
}

resource "aws_security_group" "notebook_access" {
  name        = "jupyter_access"
  description = "Allow access on Jupyter default port"

  ingress {
    from_port   = 8888
    to_port     = 8888
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # open to the world; the notebook itself is protected by SSL and its token
  }
  tags {
    Name = "allow_notebook_access"
  }
}

data "aws_security_group" "default_security_group" {
  id = "${var.aws_default_security_group_id}"
}

resource "aws_instance" "chestnut" {
  ami           = "${lookup(var.deep_learning_amis, var.region)}"
  instance_type = "p2.xlarge"
  key_name = "deployer-key" # already existing through other configuration
  security_groups = ["${data.aws_security_group.default_security_group.name}", "${aws_security_group.notebook_access.name}"]
  count = "${var.count}"
}

resource "cloudflare_record" "chestnut" {
  domain = "${var.cloudflare_domain}"
  name   = "chestnut"
  value  = "${aws_instance.chestnut.public_ip}"
  type   = "A"
}

resource "local_file" "ansible_hosts" {
  filename = "${path.module}/ansible/hosts"
  content = <<EOF
[web]
${cloudflare_record.chestnut.hostname}
EOF
}
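The `lookup(var.deep_learning_amis, var.region)` above assumes a map variable of per-region AMI IDs, something like the sketch below (the IDs are placeholders — look up the current machine learning AMI IDs for your regions):

```hcl
# variables.tf — per-region machine learning AMI IDs (placeholder values)
variable "deep_learning_amis" {
  type = "map"
  default = {
    us-east-1 = "ami-xxxxxxxx" # replace with the current AMI ID for us-east-1
    us-west-2 = "ami-yyyyyyyy" # replace with the current AMI ID for us-west-2
  }
}
```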

Configure the notebook

Using a playbook, we can handle SSL certificate signing and update the notebook config in one fell swoop.

---
- hosts: web
  gather_facts: no
  remote_user: ec2-user
  vars:
    domain: "mydomain.com"
    notebook_config_path: "~/.jupyter/jupyter_notebook_config.py"
    certbot_install_from_source: yes
    certbot_auto_renew: yes
    certbot_auto_renew_user: "{{ ansible_user }}"
    certbot_auto_renew_minute: 20
    certbot_auto_renew_hour: 5
    certbot_admin_email: "{{ email }}"
    certbot_create_if_missing: yes
    certbot_create_standalone_stop_services: []
    certbot_create_command: "{{ certbot_script }} certonly --standalone --noninteractive --agree-tos --email {{ cert_item.email | default(certbot_admin_email) }} -d {{ cert_item.domains | join(',') }} --debug"
    certbot_certs:
     - domains:
       - "{{ domain }}"
  roles:
    - role: geerlingguy.certbot
      become: yes
  tasks:
    - name: Enable daily security updates
      become: yes
      package:
        name: yum-cron-security.noarch
        state: present

    - name: Ensure that cert keys can be read
      become: yes
      file:
        path: /etc/letsencrypt/live
        mode: a+rx
        recurse: yes

    - name: Ensure that archive is readable too
      become: yes
      file:
        path: /etc/letsencrypt/archive
        mode: a+rx
        recurse: yes

    - name: Update certfile
      replace:
        path: "{{ notebook_config_path }}"
        regexp: '.*c.NotebookApp\.certfile.*'
        replace: "c.NotebookApp.certfile = '/etc/letsencrypt/live/{{ domain }}/fullchain.pem'"

    - name: Update keyfile
      replace:
        path: "{{ notebook_config_path }}"
        regexp: '.*c.NotebookApp\.keyfile.*'
        replace: "c.NotebookApp.keyfile = '/etc/letsencrypt/live/{{ domain }}/privkey.pem'"

    - name: Configure notebook to bind to all ips
      replace:
        path: "{{ notebook_config_path }}"
        regexp: '.*c.NotebookApp\.ip.*'
        replace: "c.NotebookApp.ip = '*'"

    - name: Don't open browser by default
      replace:
        path: "{{ notebook_config_path }}"
        regexp: '.*c.NotebookApp\.open_browser.*'
        replace: "c.NotebookApp.open_browser = False"
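Each `replace` task in the playbook is just a regex substitution over the config file. For illustration, the “Update certfile” task is roughly equivalent to this sed one-liner (the domain is an assumption; on the server the file would be ~/.jupyter/jupyter_notebook_config.py):

```shell
# Same substitution as the playbook's "Update certfile" task, shown on a
# stand-in config line piped through sed instead of the real file.
DOMAIN=mydomain.com
echo "# c.NotebookApp.certfile = ''" |
  sed "s|.*c\.NotebookApp\.certfile.*|c.NotebookApp.certfile = '/etc/letsencrypt/live/${DOMAIN}/fullchain.pem'|"
```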

Some interesting things to point out here:

  1. Let’s Encrypt support for Amazon Linux AMIs is still in development, so I had to essentially copy over certbot_create_command and add the ‘--debug’ flag.
  2. certbot_create_standalone_stop_services had to be set to [] in my case, since the role assumes nginx is running by default, and the script fails when it is not.
  3. You might need to install the geerlingguy.certbot role if you haven’t already:
    • ansible-galaxy install geerlingguy.certbot

The rest is straightforward, and can be extended to set more options in the config file!

With that done, all that is left is to SSH into the server, source the right environment, and run the notebook (with a command like jupyter notebook). I suppose this could be daemonized, but I like to stay SSHed in for confirmation that the notebook is still alive. I ran into an issue debugging this on a t2.nano instance, where the notebook would continually crash, and it was good to see some output.

I had to stop going down the rabbit hole, but it would be trivial to run fail2ban on the server for good measure. Right now we also still need to copy the token from stdout when the server starts, but the config file could be modified to avoid that.
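One way to skip the token dance is to set a hashed password in the config instead. Newer notebook versions ship a `jupyter notebook password` command for exactly this; the hash format it used at the time (a salted SHA-1, `algorithm:salt:hexdigest`) can also be produced by hand. A stdlib-only sketch mirroring what `notebook.auth.passwd` generated:

```python
import hashlib
import random


def jupyter_passwd(passphrase, algorithm="sha1", salt_len=12):
    """Hash a passphrase in the salted 'algorithm:salt:hexdigest' format
    that Jupyter's notebook.auth.passwd produced."""
    # Random hex salt of salt_len characters (4 bits of entropy per hex char).
    salt = ("%0" + str(salt_len) + "x") % random.getrandbits(4 * salt_len)
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))
```

The result (e.g. `sha1:…:…`) would then go into `c.NotebookApp.password` in jupyter_notebook_config.py, the same file the playbook already edits.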