schrama.io blogs

jupyter notebook on the web

October 21, 2019

I work for a major energy company and we have every platform for data science you can think of. But what if you’re a self-employed engineer and you want to run Jupyter Notebooks in the cloud so you can collaborate with a small team (2-3 people)? I will show how you can set up a Virtual Machine on AWS for free and host a Jupyter Notebook instance on your personal website that people can log into from wherever they are. In my journey to pick up data science to complement my engineering skills, I discovered it’s very useful to understand more about the data engineering side of data science (i.e. the plumbing).

In the beginning I spent most of my time on learning Python (predominantly Pandas), machine learning basics (e.g. scikit-learn) and querying (e.g. SQL). I mainly used Jupyter Notebooks for that work, hosted locally (through Anaconda). At work, we set up an Azure virtual machine that runs JupyterLab, which makes it easy to share work. This setup has proven to be very effective and I wanted to learn how to set something up myself, but then on AWS (which somehow works better for me than Azure).

I host my own website (this one, which taught me a lot about HTML and CSS) on AWS from an S3 bucket, and when setting that up I discovered how powerful AWS is for nearly everything digital. After setting up a static website, I went on to back up all my family pictures and movies on S3, as storing data in S3 Glacier Deep Archive is dead cheap ($0.00099 per GB-month!).

That work introduced me to the AWS Command Line Interface (CLI), as I had to upload a lot of files. When working with the AWS CLI I had to google a lot for all the different commands, and this exposed me to other parts of AWS like Lambda (I will write a blog on this later) and Elastic Compute Cloud (EC2). All this together inspired me to try to set up my own Virtual Machine with Jupyter Notebooks running on it.

First you need to set up an AWS account. This doesn’t cost anything. Most things on AWS are free or very cheap if you keep things small (check the billing section regularly to avoid surprises). To set up a Virtual Machine (VM for short), you follow these steps (Getting started with EC2). The t2.micro instance type is free for a whole year under the AWS Free Tier and is more than enough to get you going. To connect to my VMs, I tend to use SSH from the terminal on my Mac, which works best for me. On my Windows machine I use PuTTY.
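To make the SSH step a bit more concrete, here is a minimal sketch of the command you end up typing, built in Python so the parts are easy to see. The key file name, the IP address and the function name are hypothetical, and the default user `ec2-user` is an assumption that holds for Amazon Linux images (Ubuntu images use `ubuntu` instead).

```python
def build_ssh_command(key_path, host, user="ec2-user"):
    """Return the argument list for opening an SSH session to the instance.

    key_path: the .pem key pair file you downloaded when launching the VM
    host:     the public IP address or DNS name of the instance
    user:     login name, which depends on the machine image you picked
    """
    return ["ssh", "-i", key_path, f"{user}@{host}"]


# Hypothetical values, replace them with your own key file and address.
command = build_ssh_command("my-key.pem", "203.0.113.10")
print(" ".join(command))
# prints: ssh -i my-key.pem ec2-user@203.0.113.10
```

You can paste the printed line into a terminal, or hand the list to `subprocess.run(command)` if you prefer to stay in Python.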

As there are so many blogs already out there with detailed step-by-step guides, I won’t bore you with screenshots of each step to get into your EC2 instance.

Once you have SSH-ed into your machine, you need to install Jupyter Notebook. I used this step-by-step guide to do that (Installing Jupyter Notebook on your EC2).

In one of the steps you need to edit a configuration file with vi. Some useful commands (which you can google of course) are “i” to enter insert mode and ESC to leave insert mode and return to command mode. In command mode you can write and quit with “:wq”. The “screen” step is important if you want to keep the server running after you log out of the machine.
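The configuration file itself is just a small piece of Python. The guide I followed may use slightly different values, so treat the following as a sketch of the kind of lines that typically go in there for the classic Notebook server, not as the exact file. The port number and the password hash are placeholders (guides for the classic Notebook usually generate the hash with `notebook.auth.passwd()`), newer Jupyter versions use `c.ServerApp` instead of `c.NotebookApp`, and the helper name `write_config` is my own. If you would rather not fight with vi, the sketch writes the file from Python instead.

```python
from pathlib import Path

# Assumed settings for the classic Notebook server (NotebookApp).
# The password hash is a placeholder, not a real hash.
CONFIG_TEXT = """\
c = get_config()
c.NotebookApp.ip = '0.0.0.0'          # listen on all interfaces, not only localhost
c.NotebookApp.port = 8888             # must match the port you open in the security group
c.NotebookApp.open_browser = False    # the VM has no browser to open
c.NotebookApp.password = 'sha1:<your-hash-here>'
"""


def write_config(directory):
    """Write the config text into `directory` and return the path of the file."""
    target = Path(directory) / "jupyter_notebook_config.py"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(CONFIG_TEXT)
    return target
```

On the VM you would call something like `write_config(Path.home() / ".jupyter")`, which is the directory where Jupyter normally looks for this file.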

Every time you spin up your VM, a new IP address will be issued by AWS. This is not very helpful if you want to route traffic to your VM. In the AWS console for EC2 you can go to the Elastic IP menu (located under Network and Security) and allocate a new address. Follow the menu and link the Elastic IP to your EC2 instance. The last step is to route your personal domain to this Elastic IP. I host my domain with Google, and Google makes it very easy to route a subdomain to any URL through what it calls synthetic records. On AWS you can find instructions here: (Routing traffic to your EC2 with Jupyter Notebook). Please note that you start paying for an Elastic IP when it is NOT linked to a running EC2 instance.
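I did the Elastic IP steps by clicking through the console, but the same two actions (allocate an address, then link it to the instance) can also be scripted. Below is a minimal sketch, assuming you have boto3 installed and AWS credentials configured. The function name and the instance ID in the comment are hypothetical. The two client calls, `allocate_address` and `associate_address`, are the boto3 EC2 client methods for these actions.

```python
def attach_elastic_ip(ec2_client, instance_id):
    """Allocate a new Elastic IP and associate it with the given instance.

    `ec2_client` is expected to behave like boto3.client("ec2").
    Returns the public IP address that was allocated.
    """
    # Step 1: ask AWS for a new Elastic IP address.
    allocation = ec2_client.allocate_address(Domain="vpc")

    # Step 2: link that address to the instance.
    ec2_client.associate_address(
        InstanceId=instance_id,
        AllocationId=allocation["AllocationId"],
    )
    return allocation["PublicIp"]


# Example usage (hypothetical instance ID, needs boto3 and credentials):
#   import boto3
#   ip = attach_elastic_ip(boto3.client("ec2"), "i-0123456789abcdef0")
#   print("Point your subdomain at", ip)
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before you let it loose on your real account.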

Well, that wasn’t that difficult, was it? Now you can host your own Jupyter Notebook environment. If you want to add libraries, you SSH into your VM and pip install them just like on your own machine, except that everyone else then has them as well.
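Before you SSH in to install something, it can be handy to check from a notebook cell which libraries are still missing on the shared VM. This is a small sketch using only the standard library. The function name and the list of wanted libraries are just examples, and note that the import name can differ from the pip name (you import `sklearn` but pip install `scikit-learn`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example list, replace it with the libraries your team needs.
wanted = ["pandas", "sklearn", "matplotlib"]
print("Not installed yet:", missing_packages(wanted))
```

Whatever shows up in the list is what you still need to pip install on the VM.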