sas

Using Joins in SAS

Introduction#

Each database is a collection of different tables and each table contains different data in an organized way. While working with data, most of the times information we need is scattered in more than one table. We need joins/merge to get the desired output.

In SAS we use joins while working with Proc SQL and use merge while working with Data step. We will now talk only about joins inside Proc SQL.

Parameters#

Type of join Output
Proc Sql SQL procedure inside SAS
Create Table Creates a SAS dataset
Select Selects required variables from respective datasets
Where Specifies particular condition
Quit End the procedure
## Remarks#
As mentioned in the introduction, we can also use Mergeinside a data step which will be discussed under a separate topic. Joins play a very important role to blend and unify data according to the requirement.
## Vertical Joining
Vertical join appends dataset B to dataset A providing both of them have similar variables. For example, we have sales for the month of Jan’17 in dataset A and sales for Feb’17 in dataset B. To create a dataset C that has sales of both Jan and Feb we use Vertical Join.
PROC SQL;
CREATE TABLE C AS
SELECT *
FROM A
UNION 
SELECT *
FROM B;
QUIT;

Now dataset C has observations from both A and B and is appended vertically.

Inner Join

Inner join creates a dataset that contains records that have matching values from both the tables. For example, we have a dataset A that contains customer information and a dataset B that contains credit card details. To get the credit card details of customers in dataset A, let us create dataset C

PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUM
FROM CUSTOMER A, CC_DETAILS B
WHERE A.CUSTOMERID=B.CUSTOMERID
QUIT;

Dataset C will have only matching observations from both the datasets.

Left Join

Left join returns all the observations in the left data set regardless of their key values but only observations with matching key values from the right data set. Considering the same example as above,

PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUMBER, B.START_DATE
FROM CUSTOMER A LEFT JOIN CC_DETAILS B
ON A.CUSTOMERID=B.CUSTOMERID
QUIT;

Dataset C contains all the values from the left table, plus matched values from the right table or missing values in the case of no match.

Right join

Like left join, right join selects all the observations from the right dataset and the matched records from the left table.

PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUMBER, B.START_DATE
FROM CUSTOMER A RIGHT JOIN CC_DETAILS B
ON A.CUSTOMERID=B.CUSTOMERID
QUIT;

Dataset C contains all the values from the right table, plus matched values from the left table or missing values in the case of no match.

Full Join

Full join selects all the observations from both data sets but there are missing values where the key value in each observation is found in one table only.

PROC SQL;
CREATE TABLE C AS
SELECT A.*, B.CC_NUMBER, B.START_DATE
FROM CUSTOMER A FULL JOIN CC_DETAILS B 
ON A.CUSTOMERID=B.CUSTOMERID
QUIT;

Dataset C will contain all records from both the tables and fill in . for missing matches on either side.


This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0 This website is not affiliated with Stack Overflow