Free Virtual Machine in Google Cloud
Big data is big. Right? No doubt about it. If you are an aspiring Big Data Engineer or an Architect and are still staying away from the cloud platforms, you are staying away from the infrastructure and the platform needed to implement scalable Big Data solutions.
There are multiple players in the cloud space, but my first bet is on Google. Analytics and machine learning at internet scale have been in Google’s founding DNA. My interest in Google Cloud Platform (GCP) is an attempt to tap into the innovation that Google has used for its own applications and has made available to everyone. But let's not get into too many details at this stage.
This article has a simple and straightforward objective. To learn, play and experiment with modern Big Data tools, you need a high-end machine. That's a significant investment. GCP allows you to get the necessary infrastructure at no cost for a year. In this article, I will demonstrate some basics of creating and using a virtual machine in GCP. I will also give you some tips to keep the cost low and make the free credits last for an entire year.
This article explains the following things.
- Create and Manage a Linux VM in GCP
- SSH to the Linux VM
- Download/Upload files to the Linux VM
- Connect to the VM using third-party tools
I have a video tutorial as well for this topic.
Register for a free account
The first thing is to register for a free account. Follow the steps listed below.
- Go to cloud.google.com and hit the Try it free button.
- Use your Gmail account credentials to log in.
- Accept the terms and conditions and press next.
- On the next screen, you need to create a payment profile. Select individual as the account type.
- Select unregistered individual for the tax information. You can leave the tax details (PAN and TAN numbers) blank.
- Fill in your name and address details.
- Next, fill in your credit card number and other information.
- Finally, hit the start my free trial button. It will take a few minutes, and Google will create a free account and a GCP project for you.
Google does not charge you for this account. In fact, they give you USD 300 as credit in your Google Cloud account. You can learn for free for a year or until you exhaust your USD 300 credit. I will share some tips to keep the cost as low as possible and get the maximum benefit from the free credits. Even after completing a year or exhausting your free credit limit, they don't charge you anything until you manually upgrade your account to a paying customer. So there is nothing to worry about as far as your money is concerned.
Google Cloud Project
A GCP project is like a workspace for you. Google manages all the resources, credentials, permissions and billing information at the project level. A GCP project has a Project Name, an automatically generated Project Number and a globally unique Project ID. You can choose the Project Name and the Project ID, but you won't be able to change the Project ID later. So, I recommend that you create a Project ID that you can remember. Your free account allows you to create 5 or 6 projects. I don't think you will need more than that unless you are planning to deploy a live application. A couple of projects are enough for your learning and experiments.
Once you have successfully registered for a free GCP trial, you should land on your project dashboard. If you are not there, click the home button at the top left corner and you will reach it.
Create your first Virtual Machine
You can create a virtual machine in many ways, but I prefer to use the Compute Engine menu. You will find it in the menu on the left side. Scroll down and select the Compute Engine menu item. Now you can follow the steps listed below.
- Hit the create button on the compute engine page.
- Give a name to the instance.
- Select a zone. You will see the estimated monthly cost. Try different zones. I found that the cost varies by zone, so select the cheapest one.
- Select a machine type. Since we are doing a simple test, choose the smallest available size.
- Select the OS image. I prefer using CentOS 6 because most of my tutorials use the same version. But you have enough choices there.
- The default disk is a 10 GB HDD. You can get an SSD, but that's expensive. The 10 GB HDD is sufficient for the OS. If required, you can add more HDD or SSD space at a later stage.
- Hit the create button. Just a few seconds and your VM is up and running.
- If you want to stop your VM, select the VM and hit the stop button. If you want to delete it, hit the delete button.
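The console steps above can also be sketched with the gcloud command-line tool, which this article does not otherwise cover. Treat this as a minimal sketch: it assumes the Cloud SDK is installed and already logged in to your project. The instance name, zone and machine type are hypothetical placeholders, and the OS image flags (--image-family and --image-project) are left out, so the default image applies. By default, the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Hypothetical values: replace them with your own choices.
INSTANCE="lab-vm-1"
ZONE="us-central1-a"
MACHINE_TYPE="f1-micro"   # assumed small machine type; check the list in your console

# Print each command by default; set DRY_RUN=0 to execute it for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the VM with a 10 GB standard (HDD) boot disk.
create_vm() {
  run gcloud compute instances create "$INSTANCE" \
    --zone "$ZONE" \
    --machine-type "$MACHINE_TYPE" \
    --boot-disk-size 10GB \
    --boot-disk-type pd-standard
}

# Stop, start and delete map to the buttons on the Compute Engine page.
stop_vm()   { run gcloud compute instances stop "$INSTANCE" --zone "$ZONE"; }
start_vm()  { run gcloud compute instances start "$INSTANCE" --zone "$ZONE"; }
delete_vm() { run gcloud compute instances delete "$INSTANCE" --zone "$ZONE"; }

create_vm
stop_vm
```

Keeping the commands in small functions makes it easy to stop the VM at the end of every lab session, which is the habit that saves the most credit.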
Your CPU and RAM are the primary components of your cost. The amount that you see is the price that Google will deduct from your free credits if you keep the machine running for 730 hours in a month. I don't think you will need to keep it up for more than 100 hours in a month. You will start the VM, do your lab and stop it, then start it again when you come back for the next day's lab.
One critical price component is the disk price. Remember that the storage cost applies even if you stop the machine. That's reasonable because GCP has to reserve your HDD space. When you shut down your VM, you are not consuming CPU and RAM, but Google can't reuse your HDD space for any other purpose until you delete your storage. So choose the disk size carefully and review the cost. You can increase it later when you need more.
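The arithmetic behind these two paragraphs can be sketched as follows. All three numbers are hypothetical placeholders, not real GCP prices, and the sketch assumes you can separate the compute part of the estimate from the disk part. Compute is scaled by the hours the VM actually runs, while the disk is charged for the whole month.

```shell
# Hypothetical placeholder values, not real GCP prices.
MONTHLY_COMPUTE_USD=25   # compute estimate for a full month (730 hours)
HOURS_USED=100           # hours the VM actually runs in the month
DISK_MONTHLY_USD=0.40    # disk charge, billed even while the VM is stopped

# $1 = full-month compute estimate, $2 = hours used, $3 = monthly disk charge
estimate_cost() {
  awk -v m="$1" -v h="$2" -v d="$3" \
    'BEGIN { printf "%.2f\n", m * h / 730 + d }'
}

# 25 * 100 / 730 + 0.40, prints 3.82
estimate_cost "$MONTHLY_COMPUTE_USD" "$HOURS_USED" "$DISK_MONTHLY_USD"
```

With these made-up figures, running the VM for 100 hours costs a small fraction of the full-month estimate, and the disk charge is the part that does not shrink when you stop the machine.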
SSH to your VM
You can SSH to your VM using the browser-based SSH tool. That's the easiest method. Just click the SSH button next to your VM. You will be logged in as a default user with sudo privileges. If you want to install some additional tools, you can use the yum command. However, most things are preconfigured and ready to use.
How to use Putty and WinSCP with a GCP VM?
The SSH window is a browser-based terminal, but you can copy and paste things using Ctrl+Shift+C and Ctrl+Shift+V. You can upload and download files from the VM using the menu items in your SSH window. So frankly speaking, you don't need Putty and WinSCP to work with your GCP VMs. All those facilities are available inside the browser.
But let's assume that you don't want to use the browser-based SSH and prefer a particular third-party tool for some reason. Or suppose you want your friend to access the VM, but you don't want to make him a member of the project. You can do both. Let me list the necessary steps.
1. Generate a pair of SSH keys for an arbitrary user. You can use the command below.
ssh-keygen -t rsa -f ~/.ssh/tanya-gcp-ssh-key -C tanya
Here, ssh-keygen is the command name. The next part (-t rsa) is the type of the key. Then you specify the file name (-f ~/.ssh/tanya-gcp-ssh-key). You can choose whatever file name you want. The final part (-C tanya) is the key comment, which GCP uses as the username. You can choose whatever username you want. You don't have to create a user, set a password and provide credentials. GCP will take care of that. Just give a username of your choice.
2. Once you execute the ssh-keygen command, you will get two files. The first one contains the private key, and the second one (the .pub file) holds the public key. The public key is tagged with the username that you used in the ssh-keygen command. You need to attach the public key to your VM. Follow these steps to do that.
- Copy the content of the public key.
- Go to the compute engine screen.
- Click on your instance name.
- Click the Edit button at the top.
- Scroll down to the SSH Keys section.
- Paste your public key and save.
3. The next part is to download the private key file and share it with your friend or whoever you want to access your machine. They can use the private key to log in to your GCP VM.
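To make steps 1 and 2 concrete, here is a small sketch that generates a key pair and shows what ends up in the two files. It works in a throwaway directory instead of ~/.ssh, and it passes -N "" to set an empty passphrase so that the command does not prompt. Both choices are assumptions made for this demo and are not part of the original command.

```shell
# Work in a throwaway directory so that no real key under ~/.ssh is touched.
KEY_DIR=$(mktemp -d)
KEY_FILE="$KEY_DIR/tanya-gcp-ssh-key"

# Same command as in step 1, plus -q (quiet) and -N "" (empty passphrase,
# demo only; a passphrase is safer for a key that you plan to share).
ssh-keygen -q -t rsa -f "$KEY_FILE" -C tanya -N ""

# Two files: the private key and the .pub file with the public key.
ls "$KEY_DIR"

# The public key is one line: key type, key data, then the comment (username).
awk '{ print "type:", $1; print "username:", $3 }' "$KEY_FILE.pub"

# Keep the private key readable only by its owner.
chmod 600 "$KEY_FILE"
```

The whole line from the .pub file is what you paste into the SSH Keys section of the VM.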
Private Key file format
Unfortunately, the downloaded private key file doesn't work with Putty in its original format. We have to convert the file into Putty's own private key file format (.ppk). You can do it using the PuttyGen tool, which comes along with Putty. So, start PuttyGen and load your file. Then save it as a private key. That is it. You can use the new file with Putty and WinSCP to connect to your VM. Follow these steps for Putty.
- Start Putty.
- Enter username@IP address as the host name. Use the external IP of your VM.
- The next thing is to supply the private key.
- Hit the open button.
You should get the connection and be logged in as the username that you supplied at the time of creating the SSH key pair. Be careful about sharing the private key with others because the user comes with sudo privileges.
Continue reading to create a multi-node Hadoop and Spark cluster in Google Cloud Platform.