How to set up a Data Science environment on Windows using Visual Studio Code
It is no mystery that I am profoundly in love with VS Code, in many more ways than I’m comfortable admitting. Since my good friend Victor showed me the power of this application, I have not only never gone back to any other code editor, but I’ve even replaced other tools with it.
I’m currently doing a master’s degree, and I have found myself using it not only as my main code editor and development environment but also as my main tool for editing documents, opening PDFs, editing my site, and more.
Tutorial index
- Why VS Code?
- Step 1: Install Anaconda
- Step 2: Install VS Code
- Step 3: Install Git
- Step 4: Set up our environment in VS Code
- Must have extensions (IMO)
- Step 5: Syncing your extensions, UI and settings
- Step 6: Pushing code to your repos on github
- Creating and cloning private repositories
- Committing code and pushing it
- Done!
- Thanks for checking out my article!
Why VS Code?
A few simple reasons without explanation and in no specific order:
- It’s mostly open source (it’s Microsoft’s distribution of Code-OSS)
- It’s incredibly customizable
- It’s free
- It can edit scripts written in pretty much any programming language
Now that I’ve convinced you with my incredibly detailed explanation of why it’s good, we’re ready to install it and set up our development environment.
Step 1: Install Anaconda
Anaconda is an open source distribution that includes Python along with several other applications you can install.
It also includes several libraries that are useful for data science, machine learning, numerical operations and such.
1- Go to the Anaconda website
2- Click on download and select the download for your platform, in this case Windows.
3- Click on 64-Bit or 32-Bit depending on your system; most computers these days run 64-bit operating systems.
4- We wait for the download to finish.
5- We open the installer file and follow the setup. Click Next after opening it.
6- We agree to the license agreement.
7- Install for just your user or for all users; ideally, just yours.
8- Choose the install location, and be mindful that Anaconda requires about 2.7 GB of space.
9- Make sure to register Anaconda as your default Python 3 (this is the default option). Then click Install.
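10- Wait for the installation to finish and then open cmd (Windows button -> type cmd -> open Command Prompt). When the command prompt comes up, type conda init and press Enter. If cmd replies that “conda” is not recognized, that is because the installer does not add Anaconda to your PATH by default. In that case, open the Anaconda Prompt from the Start menu instead and run conda init there, which sets up cmd for you.
11- Close that window and open a new cmd window, because conda init only takes effect in new shells. Test whether the environment has been initialized by typing “conda activate base” and pressing Enter.
If you see (base) to the left of the path you’re in, then we’re done setting Anaconda up!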
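Once (base) shows up, you can also check from Python itself which interpreter is running. The script below is a minimal sketch and is not part of Anaconda. It assumes two heuristics. First, a conda environment can be recognized by the conda-meta folder inside the interpreter’s prefix. Second, conda activate sets the CONDA_DEFAULT_ENV variable. Save it under any name (check_conda.py is just a hypothetical example) and run python check_conda.py from the activated prompt.

```python
import os
import sys


def conda_environment_info():
    """Report whether this interpreter lives inside a conda environment (heuristic)."""
    # Every conda environment, including base, keeps a conda-meta folder in its prefix.
    in_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    # `conda activate` sets CONDA_DEFAULT_ENV to the environment name, e.g. "base".
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {"executable": sys.executable, "in_conda": in_conda, "active_env": active_env}


if __name__ == "__main__":
    info = conda_environment_info()
    print("Python executable:", info["executable"])
    print("Inside a conda environment:", info["in_conda"])
    print("Active conda env:", info["active_env"] or "(none activated)")
```

If the executable path does not point into your Anaconda folder, another Python installation is probably earlier on your PATH.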
Step 2: Install VS Code
In this next step we install VS Code. This only covers getting the application installed; we will set it up to work well with Python or R in Step 4.
*Note: RStudio is probably a better code editor for R; however, you can still open R files or R notebooks in VS Code and run them, as long as you have the R interpreter installed on your computer.*
1- Go to the VS Code website. It’ll show the version to download for your platform in the blue download button.
2- Click it and the download should start.
3- Once the download is completed, we open the installer.
4- We accept the license agreement and continue.
5- We select the installation folder; we can just leave the default there and continue.
6- We decide whether to include it in the Start Menu or not and continue.
7- We add Code to PATH and select any additional options we may want. I personally would add the “Open with Code” option to the Windows Explorer context menu and register VS Code as an editor for supported file types.
This is completely optional though.
8- We install the application.
9- We wait for the installer to do its thing.
10- And we’re done! We can launch VS Code now if we like, and then move on to Step 3.
Done!
Note: when opening VS Code for the first time you might get a firewall prompt like the following:
When this comes up, make sure to allow access!
VS Code is now installed, and we will set it up in Step 4 once Git is in place. It might seem daunting at first, but the application is actually quite intuitive once you start diving into it. I’ll cover as much as I can in this guide.
Step 3: Install Git
An important part of developing things is to be able to publish them somewhere. Or just have a backup of them!
I personally really like services like GitHub or GitLab for hosting my own projects, and there’s no way to push stuff to your own repositories if you don’t have Git. So we’re gonna go ahead and install that now.
1- Go to the Git downloads site and click on the download for your platform.
2- The installer download should start.
3- We open the installer and accept the administrative privilege request.
4- We click on Next and continue through the installer.
5- We select the installation location. We can just leave the default one. Git occupies ~258 MB of space.
6- We select the features we want. You could just leave these as default, or untick the Windows Explorer integration if you don’t want another option in your Windows Explorer context menu (the menu that comes up when you right-click inside a folder in Windows Explorer).
7- We select whether to create a Start Menu folder. I’d keep this as default.
8- Next comes the default editor used by Git. Select VS Code as Git’s default editor.
9- For the next steps, we can just keep the default options, but I will include the screenshots for the sake of completeness.
10- Keep this option as default (unchecked) and click Install!
11- We wait for the installation to finish, and we’re done installing Git!
12- You can choose to view the release notes if you’re interested; otherwise, uncheck the option and click Finish!
We’re done installing Git. Once VS Code is set up, which is the next step, we can commit content to platforms like GitHub or GitLab right from the editor.
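To confirm that Git can be found, here is a small sketch that uses only the standard library. It also checks the code command if you added it to PATH in Step 2. The helper name tool_version is made up for this example. It looks the tool up on PATH and runs it with --version.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the first line of `<command> --version`, or None if it is not on PATH."""
    path = shutil.which(command)  # on Windows this also tries PATHEXT (git.exe, code.cmd)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for tool in ("git", "code"):
        print(f"{tool}: {tool_version(tool) or 'not found on PATH'}")
```

If either tool shows up as not found, open a new terminal first: PATH changes made by an installer only apply to shells started afterwards.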
Step 4: Set up our environment in VS Code
This section of the tutorial will be a bit longer but I will try to explain everything in as much detail as I can.
Now we have VS Code open and we’re greeted with the following Welcome page.
Currently, VS Code is JUST a text editor and can’t run many kinds of files yet, but we have the wonderful gift of extensions. Extensions are what make the application such a powerful tool, along with all the other cool things it has.
Let’s get it set up!
1- Click on the Extensions icon (the little blocks) in the bar on the left.
2- We see a list of extensions sorted by number of downloads. All these extensions are awesome, and they make this application the full programming suite that it is.
There is a whole world of extensions out there. I’ll go through a few that I think are absolutely amazing and will serve greatly when working with data.
Must have extensions (IMO)
The Python and Jupyter extensions.
This adds support for anything related to Python, including notebooks and linting.
You should still install this extension even if you don’t work with Python, as it brings in support for Jupyter notebooks.
Installing the Python extension installs the Jupyter extension automatically.
Jupyter notebooks work once the extension is installed (it comes with the Python extension, after all). If you have R installed along with an R kernel such as IRkernel, R notebooks will also work. The same goes for pretty much any other supported kernel you select (Julia works well too).
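Sometimes you want to see which kernels Jupyter can actually find, for example whether IRkernel or IJulia has registered an R or Julia kernel. The command jupyter kernelspec list does this from a terminal. The sketch below does the same from Python using jupyter_client’s KernelSpecManager. It assumes jupyter_client is installed in the selected environment (Anaconda’s Jupyter includes it) and returns an empty dict otherwise. VS Code can also offer Python environments as kernels directly, so this list is not necessarily everything the kernel picker shows.

```python
def list_jupyter_kernels():
    """Return {kernel name: spec directory} for every kernel Jupyter can find."""
    try:
        # jupyter_client is installed together with Jupyter (Anaconda includes it).
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        # Jupyter is not installed for this interpreter, so no kernels are visible.
        return {}
    return KernelSpecManager().find_kernel_specs()


if __name__ == "__main__":
    for name, spec_dir in sorted(list_jupyter_kernels().items()):
        print(f"{name:20} {spec_dir}")
```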
Running Python scripts
Obviously, we also want to be able to run Python scripts (.py files). VS Code lets us run them either in our default shell or in the interactive window, which is an IPython console that closely resembles a notebook.
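The interactive window also understands “# %%” cell markers in ordinary .py files. The Python and Jupyter extensions show a Run Cell link above each marker and send that cell to the interactive window. The same file still runs top to bottom as a normal script in the terminal. Here is a toy example whose numbers are made up.

```python
# %%
# Everything from one "# %%" marker to the next is a cell; VS Code shows
# "Run Cell" above the marker and sends the cell to the interactive window.
import statistics

measurements = [2.5, 3.1, 2.9, 3.4, 3.0]  # made-up example values

# %%
# Cells share one session, so this cell can reuse `measurements` from above.
mean = statistics.mean(measurements)
spread = statistics.stdev(measurements)
print(f"mean={mean:.2f}, stdev={spread:.2f}")
```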
1- Let’s open a folder
And let’s select the folder we want to work in. This first folder is just an example folder I made on the desktop.
2- Let’s create a test script using the New File button (the first button from the left, pointed at by the green arrow).
3- Let’s write something simple to test whether things run smoothly
4- Before running it, let’s set our desired interpreter.
To change the interpreter, click where it says “Python 3.8.3…”. This pops up the interpreter selection menu. Let’s select the conda path, the only one that pops up here. This is an important step, as some other platforms come with Python preinstalled and the default path might be different.
5- Now that we have selected the right interpreter, we check whether this indeed works in our shell.
First, click on Run Python File in Terminal.
One thing to note here is that we could get this nasty error if we are using PowerShell as our default shell and it has never been initialized:
We can solve this by changing our default shell to cmd, as we already initialized it when installing Anaconda (by running conda init in cmd).
6- On the top right of the terminal panel, we can see a drop-down menu. Let’s click on Select Default Shell (called Select Default Profile in newer versions of VS Code).
Pick Command Prompt as your default shell.
Close the previous shell with the little trash can button.
7- Run the script again after having picked Command Prompt as the default shell.
Great! The Run Python File in Interactive Window option should work out of the box once the interpreter is selected.
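As a slightly more useful test script, the sketch below prints the selected interpreter (the one picked in step 4) and the versions of a few data science libraries. The package list is an assumption about what your Anaconda install contains, so edit it to match what you use. If a library shows up as MISSING, you are probably running a different Python than you think.

```python
import importlib
import importlib.util
import sys

# Import names of a few libraries that Anaconda bundles (an assumption about your
# install; edit the list to match what you actually use).
PACKAGES = ["numpy", "pandas", "matplotlib", "scipy", "sklearn"]


def check_packages(names):
    """Return {import name: version string, or None if it cannot be found}."""
    found = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            found[name] = None  # not installed for this interpreter
            continue
        module = importlib.import_module(name)
        # Most libraries expose __version__; fall back to "unknown" otherwise.
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    print("Interpreter:", sys.executable)  # should point inside your Anaconda folder
    for name, version in check_packages(PACKAGES).items():
        print(f"{name:12} {version or 'MISSING'}")
```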
After getting things up and running with python, we can continue with more awesome extensions.
The R extension.
This one is a personal choice. Sometimes I’m working with Python files and sometimes with R files, and I like being able to edit both, as well as run Jupyter notebooks with an R kernel.
The Julia extension.
You see where this is going? If you want support for a specific programming language, just search for that language’s extension in VS Code. It will usually come with the support you need, and you’ll be able to run, interpret or compile code in that language as long as you have the necessary tools installed on your computer.
The Julia extension adds Julia language support. To use Julia in notebooks, you need IJulia installed and the julia binary on your PATH. As far as I know, the julia binary is added to your PATH by default when you install Julia on your computer.
Remember, extensions add programming language support to VS Code, but they do NOT install the language’s compiler or interpreter on your computer.
Just like with Python, R or Julia, you can install basically any programming language support to your VS Code editor.
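Because the extensions rely on interpreters you install yourself, a quick way to see what is available is to look the interpreters up on PATH. This is a sketch with assumed command names: python, Rscript and julia are the usual ones. Depending on installer options, they may not be on your PATH. In that case you may have to point the relevant extension at the full path in its settings.

```python
import shutil

# Usual command-line names of each interpreter (assumed; depending on installer
# options they may not be on your PATH).
INTERPRETERS = {"Python": "python", "R": "Rscript", "Julia": "julia"}


def find_interpreters(commands):
    """Map each language to the full path of its interpreter, or None if not on PATH."""
    return {language: shutil.which(command) for language, command in commands.items()}


if __name__ == "__main__":
    for language, path in find_interpreters(INTERPRETERS).items():
        print(f"{language:7} {path or 'not found on PATH'}")
```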
Rainbow CSV.
This is amazing. Absolutely amazing. It’ll allow you to open CSV and TSV files (comma separated or tab separated) on VS Code with a little bit of a twist.
VS Code supports any text file by default; however, this extension color-codes each column, giving you an Excel-like view of any CSV/TSV file.
Here you can see an example without the extension
Here you can see the same file with the extension enabled
And you can even align a file’s columns or query the file
Querying the file using SQL-like queries
Executing a query returns a CSV containing only the entries matched by the query.
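Rainbow CSV’s queries use its own SQL-like language (RBQL). To do the same kind of aligning and filtering from a Python script, a rough equivalent is to pad the columns by hand and load the CSV into an in-memory SQLite table. This swapped-in technique uses the standard library’s csv and sqlite3 modules and is not how the extension works internally. The sample data is made up, and the sketch assumes every row has the same number of fields.

```python
import csv
import io
import sqlite3

# A tiny made-up CSV used only for this example.
SAMPLE = """fruit,color,price
apple,red,1.20
banana,yellow,0.50
cherry,red,3.75
"""


def align_columns(text):
    """Pad cells so the columns line up (similar in spirit to Rainbow CSV's Align)."""
    rows = list(csv.reader(io.StringIO(text)))
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return [",".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]


def query_csv(text, sql):
    """Load CSV text into an in-memory SQLite table called `data` and run `sql`."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        # Columns are declared without types, so every value is stored as text.
        columns = ", ".join(f'"{name}"' for name in header)
        conn.execute(f"CREATE TABLE data ({columns})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print("\n".join(align_columns(SAMPLE)))
    # CAST is needed because the prices were stored as text.
    print(query_csv(SAMPLE, "SELECT fruit, price FROM data "
                            "WHERE CAST(price AS REAL) > 1 ORDER BY fruit"))
```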
Themes!
You can get any theme you like for it, for example this one!
Before the theme:
After the theme:
And we can also customize the icons we see for files in the integrated file explorer:
Before applying the icon theme:
After applying the icon theme:
And there are thousands to choose from.
PDF Reader
I regularly work with PDFs: when doing programming exercises, reading papers, checking cheat sheets for regular expressions or LaTeX, or previewing a LaTeX file after compiling it!
You can use the vscode-pdf extension to view PDF files by simply opening them like any other file inside VS Code.
Editing/compiling LaTeX files
If you already work with LaTeX, you can use the LaTeX Workshop extension, which includes a PDF viewer.
Here I have the .tex file, the PDF preview and a cheat sheet open side by side:
When opening a PDF using this extension we might get the following prompt
Make sure to click on open it anyway and it should work fine as long as you have this extension or vscode-pdf installed.
Step 5: Syncing your extensions, UI and settings
There’s an extension called Settings Sync in the VS Code Marketplace that worked amazingly. However, settings sync was recently added to VS Code as a built-in feature, so I’ll quickly walk you through that instead.
On the bottom left you should see a typical user account icon.
1- Click on it and click Turn on Settings Sync.
2- You’ll be asked what you’d like to sync.
Select your desired options and click on Sign in & Turn on.
3- You’ll then be asked whether you want to use a Microsoft or Github account.
Either option works fine, but I recommend going with GitHub. Either way, you will be redirected to the provider’s login page, where you can just log in and go through with it.
I’ll go with the GitHub option, as for me it makes more sense to sync VS Code with GitHub.
After logging in you should see the following prompt
Click on continue and you should be redirected to a tab containing an authorization key. Make sure to click on Open link.
This should open VS Code and ask you for an authorization to open the URI
Then you should see a notification saying it’s activated
4- As a result, you’ll also be signed into Github (or Microsoft account)
Now your settings, extensions, themes and everything else will be synced automatically. Whenever you install VS Code on another machine, you’ll only have to sign in instead of manually re-downloading everything and reconfiguring the editor.
Step 6: Pushing code to your repos on github
We installed Git and VS Code so that we have a development environment to code cool stuff in. Great, check. Now we may want to push code to GitHub, GitLab or other similar platforms.
To do this, we first need to create an account on one of these platforms. In this case I will use GitHub, as this is what I use, but the process should be quite similar on other platforms.
1- Create a new repo
2- Name your repo and make a description for it.
Then specify whether you want this to be a public or a private repository. You can also decide to initialize the repo with a README file (a markdown file where you can describe the contents of your repo).
The .gitignore file is a file you can add to a repository to tell Git which files to ignore. Untracked files that match its patterns are never staged or committed, and therefore never pushed.
In my case I will include a README file and make it a public repo.
3- Done! We have created our repo, and now we should clone it to a folder of our choice. Open a new VS Code window and click on Clone Repository.
4- A small menu will pop up at the top, letting us enter a repository URL or clone directly from GitHub. Let’s click on Clone from GitHub.
5- You’ll get a prompt saying that the GitHub extension wants you to sign in to GitHub. Click on Allow.
6- You’ll be redirected to your browser, which asks for authorization to access GitHub from VS Code. Click on Continue.
7- You’ll get a prompt to open VS Code. Click on Open link.
8- You’ll then be asked whether you want to allow opening this URI. Click on Open.
9- After this, a small window similar to the previous one will pop up, listing the repositories in your GitHub account that you can clone.
There you should see the repository we created earlier.
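If you skipped the .gitignore option when creating the repo, you can add one later from the cloned folder. The sketch below appends a few patterns that are common in Python/R data science projects. The list is a starting point chosen for this example, not GitHub’s official template, and the helper name add_gitignore_patterns is hypothetical. Keep in mind that .gitignore only affects untracked files. Anything already committed stays tracked until you remove it from the index with git rm --cached.

```python
from pathlib import Path

# A few patterns commonly ignored in Python/R data science projects (a starting
# point chosen for this example, not GitHub's official template).
STARTER_PATTERNS = (
    "__pycache__/",         # Python bytecode caches
    "*.py[cod]",            # compiled Python files
    ".ipynb_checkpoints/",  # Jupyter autosave folders
    ".Rhistory",            # R console history
    ".env",                 # local secrets such as API keys
)


def add_gitignore_patterns(repo_dir, patterns=STARTER_PATTERNS):
    """Append any missing patterns to <repo_dir>/.gitignore; return the ones added."""
    path = Path(repo_dir) / ".gitignore"
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = set(text.splitlines())
    missing = [p for p in patterns if p not in existing]
    if missing:
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if not text or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Calling add_gitignore_patterns(".") from the repository root adds only the patterns that are not already there, so running it twice is harmless.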
You’ll only be able to clone Public repos using this feature. If you have created your previous repository as a private repo, you should do the following.
Creating and cloning private repositories
Not all code is public. Maybe you don’t want to share your code yet and you’ve created a private repository so that only you (and anyone you’ve set as collaborator) can see the contents of it.
When creating your repo you can make a repository private the following way:
Once the repository is created, we must clone it to add stuff to it and push it to github.
Click on the green button that says Code with a download icon. This should show us the link to clone our repository in a folder.
1- Click on the copy button to copy the HTTPS link of our repo.
2- Then, in a new VS Code window, we have the option to either open a folder or clone a repo. Click on Clone Repository.
3- We will be prompted with a little window up top that allows us to paste the previously copied link to clone the repo. Once pasted, click on Clone from URL.
4- There will be another prompt to select the location to clone the repo in. Select the location and click Select repository location.
5- In case you aren’t signed into github, you’ll be prompted to do so. Click on Sign in with your browser.
6- In your browser, authorize Git Credential Manager.
7- Your authentication will succeed and you can go back to VS Code.
8- You’ll see a cloning git repository notification.
When that one’s done, you’ll see another notification that asks if you would like to open the cloned repo. Click on Open.
Cool! Now you’re in the right folder, and any file you create or modify in the repository can be uploaded to GitHub with a few clicks.
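The same clone can be done from a terminal with git clone followed by the HTTPS link. The sketch below wraps that command in Python. The helper name clone_repo and its dry_run flag are made up for this example, and the URL is a placeholder for the link you copied. For private HTTPS repos, Git Credential Manager (bundled with Git for Windows) prompts you to sign in the first time, just like in the steps above.

```python
import shutil
import subprocess
from pathlib import Path


def clone_repo(url, parent_dir, dry_run=False):
    """Clone `url` into a new folder inside `parent_dir`; return the git command."""
    command = ["git", "clone", url]
    if dry_run:
        return command  # just show what would run
    if shutil.which("git") is None:
        raise RuntimeError("git was not found on PATH (see Step 3)")
    # Git names the new folder after the repository, e.g. your-repo.
    subprocess.run(command, cwd=Path(parent_dir), check=True)
    return command


if __name__ == "__main__":
    # Hypothetical URL: paste the HTTPS link copied from your repo's Code button.
    print(clone_repo("https://github.com/your-user/your-repo.git", ".", dry_run=True))
```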
Committing code and pushing it
Now that you have your folder open, the only thing left to do is to code some things and upload them to your repo.
VS Code makes this very easy. You should still learn Git, but VS Code takes most of the effort out of it. We’ll go through it step by step.
1- Code something or add files to the repo!
In this case I will create a folder inside the repo and drag a PDF file into it, then create a Python script in the root directory of the repo.
I’ve created a folder called pdf and dragged the file latexsheet.pdf to it.
I’ve also added a script called test.py to the root of the ‘testing’ directory.
However, to upload these files we must create a commit and push them.
We can see a 2 on the Source Control tab, since we’ve made changes to 2 files (in this case, created them).
2- Once we’ve made our modifications, we go to source control and we’ll see the changes.
We can now do several things. We could commit this to our ‘main’ branch, which is, well, the main channel through which we commit files to this repo.
Note: you can have multiple channels, called branches. Maybe multiple people are working on a repository and multiple changes are being made to it. Each update or change can be made in a different branch and then merged into the ‘main’ branch, which was historically (and is still often) called ‘master’.
In this case, we’re going to commit directly to the main branch, but maybe we should first examine the changes:
We can see the changes made to the file test.py, where we have added a line printing the string ‘testing’. Because the file was not in the repository before these changes, it shows “(untracked)” next to its name. We can also see each changed line highlighted in red on the left and green on the right, where red is the previous state of that line and green is its new state.
3- We can now commit the changes to the repository. Write a message for the commit and then press Commit.
4- Once we press the button, the changes are committed locally, and we can now “push” the commit to GitHub.
In the bottom left corner of VS Code we should see a small 1, meaning there is one commit to push to the repo. Click on it to push and pull commits. In this case nothing will be pulled, as no one else has changed the repo’s contents since we cloned it.
5- Push the changes!
6- See your changes live on github!
Done! We can see that the pdf folder was uploaded with the file in it, along with test.py, and the commit message appears next to each file or folder it touched.
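VS Code is running ordinary Git commands behind those buttons. As a way to start learning them, here is a sketch of the same stage, commit and push flow driven from Python. The helper names git and commit_and_push are made up for this example. Git may refuse to commit until your identity is configured, for example with git config --global user.name "Your Name" and git config --global user.email followed by your email address.

```python
import subprocess


def git(*args, cwd="."):
    """Run one git command in `cwd`, raising CalledProcessError if it fails."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout


def commit_and_push(message, cwd=".", push=True):
    """Stage all changes, commit them with `message` and optionally push."""
    git("add", "--all", cwd=cwd)           # like Stage All Changes (the "+" button)
    git("commit", "-m", message, cwd=cwd)  # like typing the message and pressing Commit
    if push:
        # The status-bar button pulls and then pushes; this sketch only pushes.
        git("push", cwd=cwd)
    # Return the subject line of the newest commit so callers can confirm it.
    return git("log", "-1", "--pretty=%s", cwd=cwd).strip()
```

For the example above, something like commit_and_push("Add pdf folder and test script", cwd="path/to/testing") would do the same as the clicks, with the path being a placeholder for wherever you cloned the repo.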
Done!
Hopefully this guide has been useful in getting you started with VS Code. If you have any questions, you can respond to this article. If you prefer to troubleshoot on your own, that works too: VS Code is a very widely used code editor, and you can find the solution to pretty much anything online!
Thanks for checking out my article!
You can read more in the VS Code documentation, which is quite complete as well!