SAS Tutorial: All You Need To Know About SAS
What is SAS? Why should I learn SAS? Are there any benefits of learning it? Are you looking for answers to the above questions? If yes, then this SAS tutorial will answer all your questions.
We human beings are a curious species, aren’t we? We always want to know more. This desire to know makes us ask more questions and, in turn, keeps us on the lookout for more answers.
Let us consider a simple problem. What if you want to go shopping, and you have two options to choose from:
- A $149 coupon
- 25% off coupon
You may wonder which option will help you save more. This is just one scenario. We face many such questions, and in many situations we cannot make a sound decision without looking at the numbers. Do these questions make you curious about finding answers? If yes, then you will like analytics and the tools that help you analyse data. SAS is one such tool.
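The coupon question comes down to simple arithmetic: a 25% coupon saves more than a $149 coupon only when the bill is above 149 / 0.25 = $596. The following is a minimal sketch in SAS, assuming three hypothetical bill amounts (300, 596 and 900) and illustrative variable names that are not part of the original example:

```sas
/* Compare a flat $149 coupon with a 25% coupon for a few bills. */
/* The bill amounts and variable names are hypothetical.         */
DATA coupon_compare;
    LENGTH better $ 8;
    flat_saving = 149;                 /* saving from the $149 coupon */
    DO bill = 300, 596, 900;
        pct_saving = 0.25 * bill;      /* saving from the 25% coupon  */
        IF pct_saving > flat_saving THEN better = '25% off';
        ELSE IF pct_saving < flat_saving THEN better = '$149 off';
        ELSE better = 'tie';
        OUTPUT;                        /* one row per bill amount     */
    END;
RUN;

PROC PRINT DATA=coupon_compare;
RUN;
```

For a bill of $300 the flat coupon wins (75 versus 149), at $596 the two are equal, and at $900 the percentage coupon wins (225 versus 149).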
By the end of the article you should be familiar with the following topics:
- What Is Data Analytics?
- Need For SAS
- What Is SAS?
- SAS Components
- SAS As A Programming Language
- Installation Of SAS Programming/ Development Environment
- Running A SAS Program
What Is Data Analytics?
The word “analytics” has been trending for a while now, yet there is no single way to define it. Let us try to understand analytics with a simple example. Imagine you want to buy a T-shirt. What questions might you ask yourself? Let me help you with a few common ones:
- When should I buy a T-shirt?
- How much should I spend for the T-shirt?
- Should I buy the T-shirt online or should I visit a store?
- If I decide to buy the T-shirt online, from which website should I buy it?
- If I decide to visit a store, which store should I visit?
The decision may depend on factors like time, money, preference and previous experiences. Let us continue with the above problem. Consider the following:
- You are free on Sunday
- Your budget is $300
- You prefer visiting a store as it lets you handpick a T-shirt
- You decide to visit a particular store because your previous visit at the store did not disappoint you
Based on the above points, suppose you decide to visit a store on Sunday and buy a T-shirt that costs around $265. You considered a few possibilities, picked the ones that suited your requirements and made a decision.
In simple words, you just did some analysis to help you buy a T-shirt. Let me simplify it further. Your brain did two simple things here:
- Collected information as per your requirement
- Understood the data and, based on that information, helped you decide which T-shirt to buy.
This is what you can do using analytics. You can gather information, analyse it and make better decisions. The above example was easy, so you could make a decision based on a few assumptions. What if the problem and the decision making weren’t this easy?
Consider this problem from a business point of view. Suppose an e-commerce company wants to study the buying patterns of its customers based on previous data. The company will have to consider thousands of records, won’t it? Now imagine the volume of that data, along with all the permutations and combinations of preferences that different customers may have.
Also, the company may not have all the data. For example, if a customer did not buy a T-shirt, what factors led the customer to that decision? This missing data may create problems. How do we deal with these problems? How do we handle such data? Well, these problems become easier when we use analytics. By using analytics you can eliminate unnecessary data and organize the relevant information to find patterns that help you make better decisions.
Need For SAS
The analytics market has grown immensely in the last few years. This has resulted in an increase in the number of tools used. All of these are beneficial in one way or the other. So let us move ahead with our SAS tutorial and take a look at a few of the most widely used tools in the market.
- SAS: It is one of the most widely used tools in the commercial analytics market. With a plethora of statistical functions and good GUIs (Enterprise Guide and Enterprise Miner), it is a leading choice.
- R: It is open-source software. R is easy to learn because it is well documented. It is cost effective and has strong statistical capabilities.
- Python: It is another widely used open-source scripting language, and its usage has grown over time. Today, it offers libraries such as NumPy, SciPy and Matplotlib. You can perform almost any statistical operation or build almost any model using these libraries.
SAS Vs. R Vs. Python
Let us compare these three tools on a few parameters.
As a vital tool for research and analytics, SAS has generated strong demand for SAS-trained professionals. SAS holds 70% of the market share, R holds 15%, and Python holds the least, at less than 10%.
2) Ease Of Understanding
SAS is one of the easiest tools to learn. Even people with limited knowledge of SQL can pick it up easily. Python is not as convenient as SAS for analytics, and R requires you to write lengthy, tedious code, which gives SAS an edge.
3) Fourth Generation Language
SAS is a fourth-generation programming language. A fourth-generation programming language is “a programming language designed with a specific purpose in mind such as the development of commercial business software.” It is designed to reduce programming effort and minimize the time and cost it takes to develop software. R and Python are not fourth-generation languages.
SAS keeps pace with market needs. Its ease of integration means it works well with other technologies, which makes it truly flexible and usable.
The above reasons support the claim that SAS holds a firm position at the top of the market. Now that we have compared these three analytical tools, let us move ahead in this SAS tutorial and understand SAS in a little more detail.
SAS Tutorial: What Is SAS?
Let us now try to understand what SAS is and what it does.
SAS stands for Statistical Analysis System. It is a software suite developed by SAS Institute.
The image below shows a few applications of SAS:
In simple words, SAS can process complex data and generate meaningful insights that would help organizations take better decisions or predict possible outcomes in the near future.
SAS lets you mine, alter, manage and retrieve data from different sources and analyse it. Its graphical point-and-click user interface helps non-technical users work with its graphical operations and advanced options.
Let us move ahead with our SAS tutorial and take a look at a few important SAS components:
- Base SAS: It is the most widely used component. It provides data management facilities and lets you do data analysis.
- SAS/GRAPH: With SAS/GRAPH you can represent data as graphs. This makes data visualization easy.
- SAS/STAT: It lets you perform statistical analysis, such as analysis of variance, regression, multivariate, survival and psychometric analysis.
- SAS/ETS: It is suited for Time Series Analysis.
Since this is an introductory article, we will focus on Base SAS, which should be easy for everyone to understand.
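To give a feel for what data analysis with Base SAS looks like, here is a minimal sketch that uses two Base SAS procedures on SASHELP.CLASS, a small sample table that ships with SAS. The choice of table and variables is an assumption made only for illustration:

```sas
/* Summary statistics for two numeric variables */
PROC MEANS DATA=sashelp.class N MEAN MIN MAX;
    VAR height weight;
RUN;

/* Frequency count for a character variable */
PROC FREQ DATA=sashelp.class;
    TABLES sex;
RUN;
```

PROC MEANS summarizes numeric columns, while PROC FREQ counts how often each value of a column occurs.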
SAS As A Programming Language
Most programming environments are either menu-driven (point-and-click) or command-driven (enter and execute commands). SAS is neither. Instead, it uses a series of instructions, or statements, known as a SAS program. The program describes what you want to do and is written in the SAS language.
Data is central to every data set. In SAS, data is available in tabular form where variables occupy the column space, and observations occupy the row space.
SAS treats numbers as numeric data and everything else falls under character data. Hence SAS has two data types, numeric and character. Easy, isn’t it?
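The tabular layout and the two data types can be sketched with a small data set. The data set name, variable names and values below are hypothetical and are used only for illustration:

```sas
/* Three observations (rows) and three variables (columns). */
/* The $ after name tells SAS that name is a character      */
/* variable; age and spend are numeric by default.          */
DATA shoppers;
    INPUT name $ age spend;
    DATALINES;
Asha 29 265
Ben 41 120
Chen 35 300
;
RUN;

/* List each variable together with its type and length */
PROC CONTENTS DATA=shoppers;
RUN;
```

The PROC CONTENTS report should list name as a character variable and age and spend as numeric variables.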
The DATA step and the PROC step form the basic building blocks of a SAS program. Let us discuss what these building blocks do.
Building Blocks Of SAS
We start a program with a DATA step to create a SAS data set and then pass the data onto a PROC step. The PROC step processes the data. In order to understand how DATA and PROC steps work, let us consider the below example.
Suppose I want to convert a measurement from inches to centimeters, store the result in a variable called ‘size’, and print it. The DATA step would convert the value from inches to centimeters, and the PROC step would print the result.
The image below shows a code snippet for the above mentioned problem:
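A minimal sketch of such a program, assuming a hypothetical input of 10 inches and a variable named inches that is introduced here only for illustration, might look like this:

```sas
/* DATA step: create a data set named size */
DATA size;
    inches = 10;             /* hypothetical input value */
    size = inches * 2.54;    /* 1 inch = 2.54 cm         */
RUN;

/* PROC step: print the data set */
PROC PRINT DATA=size;
RUN;
```

The printed listing has a single observation, with inches equal to 10 and size equal to 25.4.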
The statements constitute DATA and PROC steps. A step may contain anything from one statement to more than a hundred. Remember that DATA steps are used to read and modify data, whereas PROC steps are used to analyse data, perform utility functions, or print reports.
DATA steps begin with the keyword DATA, followed by a name that you choose for your SAS data set. The DATA step in the above example produces a data set named size. DATA steps can read data from external files, and they can include loops and conditional statements. They can also be used to merge, combine and concatenate data sets; sorting is handled by a PROC step (PROC SORT).
Similarly, procedures start with a PROC statement, in which the keyword PROC is followed by the name of the procedure (for example PRINT, SORT or MEANS). Most SAS procedures have only a handful of possible statements.
Each time SAS comes across a new step (marked by a DATA or PROC statement), it ends the previous step and starts a new one.
While a typical program starts with a DATA step to input or modify data, and then passes the data to a PROC step, it is certainly not the only pattern for mixing DATA and PROC steps. Just as you can stack building blocks in any order, you can arrange DATA and PROC steps in any order. A program could even contain only DATA steps or only PROC steps.
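As an illustration of mixing steps in a different order, the sketch below starts with a PROC step, continues with a DATA step and ends with another PROC step. It assumes the SASHELP.CLASS sample table that ships with SAS, whose height variable is recorded in inches; the output data set names are hypothetical:

```sas
/* PROC step first: sort a copy of the sample table by sex */
PROC SORT DATA=sashelp.class OUT=class_sorted;
    BY sex;
RUN;

/* DATA step second: add a derived variable */
DATA class_cm;
    SET class_sorted;
    height_cm = height * 2.54;   /* inches to centimeters */
RUN;

/* PROC step third: mean height per group */
PROC MEANS DATA=class_cm MEAN;
    BY sex;                      /* requires data sorted by sex */
    VAR height_cm;
RUN;
```

Each DATA or PROC statement marks the start of a new step, so SAS runs the three steps one after another.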
Nonetheless, you will find it much easier to write SAS programs if you understand these basic building blocks. These are a few basics every SAS beginner should know. Moving on to the next part of our SAS tutorial, let us understand how to install SAS University Edition.
Beginners can now learn and practice SAS for free, as SAS Institute Inc. has released SAS University Edition. It includes all the features needed to learn Base SAS, and learning Base SAS makes it easier to learn the other components.
SAS Tutorial: Installation
Installing SAS University Edition is easy. However, because it is distributed as a virtual machine, you need to run it in a virtual environment. Install virtualization software on your PC before you run the SAS software. The following steps will help you download and set up the SAS environment.
1) Download SAS University Edition
SAS University Edition can be downloaded from the SAS University Edition page. Please read the requirement details on that page before you start downloading.
2) Quick Start Guide To Installation
If you are completely new to the installation process, you can go through the guides and videos available on the download page from step 1. This step is optional, and you can skip it if you are already familiar with the process.
3) Setting Up A Virtualization Software
The links in step 2 let you download suitable virtualization software. You can skip this step if you already have virtualization software installed.
4) Download The Zip File
Choose the appropriate version of the SAS University Edition compatible with the virtualization environment you have. It will download as a zip file. The name would be similar to: ‘unvbasicvapp_9411005_vmx_en_sp0_1.zip’
5) Unzip The Zip File
Unzip the above zipped file and store it in an appropriate directory.
6) Loading The Virtual Machine
Start VMware Player, look for the file that ends with the .vmx extension, and open it. The following screen will be visible. Note down the basic settings, such as the memory and hard disk space allocated, for your reference.
7) Power On The Virtual Machine
Click “Power on this virtual machine”, next to the green arrow, to start the virtual machine. The following screen should appear.
While the virtual machine loads, the following screen appears. Once loading completes, you will get a prompt with the URL that opens the SAS environment.
8) Starting SAS Studio
Open a new tab in your browser and load the URL highlighted in the above image. The following screen appears, indicating that the SAS environment is ready. Your URL may differ, because it varies from PC to PC.
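Once SAS Studio is open, a quick way to check that the environment works is to run a tiny program. The sketch below is only a suggestion and assumes the SASHELP.CLASS sample table that ships with SAS:

```sas
/* Write the SAS version to the log */
%PUT NOTE: Running SAS version &SYSVLONG;

/* Print the first five rows of a built-in sample table */
PROC PRINT DATA=sashelp.class (OBS=5);
RUN;
```

If the log shows the version note and the Results tab shows five rows, the installation is ready to use.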
Running A SAS Program
Now that we have understood how to install SAS University Edition, let us take a look at a sample SAS program.
The code below shows how to print a Fibonacci sequence. In case you don’t know what a Fibonacci sequence is, let me define it for you.
The Fibonacci sequence is a sequence of numbers that starts with a zero or a one, followed by a one, and proceeds by the rule that each number (called a Fibonacci number) is equal to the sum of the preceding two numbers, that is, F(n) = F(n-1) + F(n-2). If the sequence is indexed from n = 0, the first two terms are defined as 0 and 1 by convention:
F(0), F(1), F(2), … = 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 …
In some contexts, it is customary to start at n = 1. In that case, the first two terms are defined as 1 and 1, and therefore:
F(1), F(2), F(3), … = 1, 1, 2, 3, 5, 8, 13, 21, 34 …
Let us take a look at this SAS code which generates a Fibonacci sequence that starts with one.
DATA Fibonacci;
    DO i = 1 TO 10;
        Fib = SUM(Fib, LAG(Fib));  /* current + previous term */
        IF i = 1 THEN Fib = 1;     /* seed the first term */
        OUTPUT;
    END;
RUN;

PROC PRINT DATA=Fibonacci;
RUN;
In the above code, Fib is a variable that holds the current Fibonacci number. On each iteration, Fib is set to the sum of its current value and the value before it. The LAG function returns the value Fib had the last time LAG was called, which is the previous Fibonacci number, and the IF statement sets the first term to 1.
The following image shows the output of the above code. We have used PROC PRINT to display the output in printed form. The listing contains ten rows, with Fib taking the values 1, 1, 2, 3, 5, 8, 13, 21, 34 and 55.
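The LAG function can be subtle for beginners, so here is an alternative sketch that tracks the previous term explicitly. This is not the program described above; the data set name Fibonacci2 and the helper variables prev and new_fib are hypothetical:

```sas
/* Fibonacci sequence starting with 1, without LAG */
DATA Fibonacci2;
    prev = 0;                      /* term before the first one */
    Fib  = 1;                      /* first term                */
    DO i = 1 TO 10;
        OUTPUT;                    /* write the current term    */
        new_fib = prev + Fib;      /* next term                 */
        prev    = Fib;
        Fib     = new_fib;
    END;
    KEEP i Fib;                    /* drop the helper variables */
RUN;

PROC PRINT DATA=Fibonacci2;
RUN;
```

This version produces the same ten values, 1, 1, 2, 3, 5, 8, 13, 21, 34 and 55, and makes the sum of the two preceding terms visible in the code.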
I hope you liked this SAS tutorial blog. This was the first blog of the SAS Tutorial blog series. My next blog will be on SAS programming; do read that as well to learn how to write programs in SAS.