This is a review of MATH2871, just to prepare for my final exam.
Chapter 2: Getting Started
- `Data` steps are typically used to create SAS data sets. `Proc` steps are typically used to process SAS data sets.
- A SAS step begins with either `Data` or `Proc`. SAS detects the end of a step by a `run` statement, a `quit` statement, or the beginning of another step.
- Three primary windows:
  - `Editor` contains the SAS program to submit.
  - `Log` contains information about the processing of the SAS program, including any warning and error messages.
  - `Output` contains reports generated by the SAS program.
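To make the step boundaries concrete, here is a minimal sketch. It assumes the `sashelp.class` sample data set that ships with SAS is available; the name `work.class_copy` is made up for illustration.

```sas
/* DATA step: creates a new data set and ends at the run statement */
data work.class_copy;
    set sashelp.class;
run;

/* PROC step: processes the data set and ends at the run statement */
proc print data=work.class_copy;
run;
```

After submitting, the Log window should report the observations read and written, and the Output window should show the listing.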
Chapter 3: Working with SAS Syntax
- SAS statements usually begin with an identifying keyword and always end with a semicolon.

```sas
data work.NewSalesEmps;
    length First_Name $ 12
           Last_Name $ 18 Job_Title $ 25;
    infile 'newemps.csv' dlm=',';
    input First_Name $ Last_Name $ Job_Title $ Salary;
run;
```

There are five statements in this data step, one for each of the 5 semicolons.
- SAS comments: `/* comment */` and `* comment ;`
- Syntax errors: misspelled keywords, unmatched quotation marks, missing semicolons, invalid options.
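A small sketch of the two comment styles in context. The data set name `work.comment_demo` and the variable `x` are placeholders, not part of the course material.

```sas
/* Block comment: can span several lines
   and is closed by the asterisk-slash pair. */
data work.comment_demo;
    * Statement comment: starts with an asterisk and ends with a semicolon;
    x = 1;
run;
```

Forgetting the semicolon at the end of the second style turns the following statement into part of the comment, which is one common way to end up with the "missing semicolon" kind of syntax error.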
Chapter 4: Getting Familiar with SAS Data Sets
- Components of SAS Data Sets: `Descriptor Portion` and `Data Portion`. The descriptor portion contains general information about the SAS data set and variable names. Browsing the descriptor portion:

```sas
proc contents data=work.NewSalesEmps;
run;
```

The data portion of a SAS data set is a rectangular table of character and/or numeric data values. There are two types of variables:
- character: contain any value: letters, numbers, blanks …
- numeric: stored as floating point numbers, in 8 bytes by default
- SAS uses the numeric data type to store date values.
- A SAS date value is stored as the number of days between January 1, 1960, and a specific date.
- 01JAN1959 –> -365
- 01JAN1960 –> 0
- 01JAN1961 –> 366
- Missing value: A character missing value is displayed as a blank. A numeric missing value is displayed as a period (.)
- Variable names:
- can be up to 32 characters long
- must start with a letter or an underscore; subsequent characters can also be digits
- can be uppercase, lowercase, or mixed case
- are not case sensitive
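The SAS date values listed above can be checked with a short sketch. It uses date constants of the form `'ddMONyyyy'd`, which are standard SAS syntax but are not covered in these notes; the variable names are made up.

```sas
data _null_;
    /* date constants are converted to days since 01JAN1960 */
    d1959 = '01JAN1959'd;
    d1960 = '01JAN1960'd;
    d1961 = '01JAN1961'd;
    /* put writes name=value pairs to the log */
    put d1959= d1960= d1961=;
run;
```

The log should show `d1959=-365 d1960=0 d1961=366`. The last value is 366 rather than 365 because 1960 was a leap year.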
`proc print` displays the data portion of a SAS data set.

```sas
proc print data=work.NewSalesEmps;
run;
```

Options and statements can be added to the print procedure. The `noobs` option suppresses the observation numbers on the left side of the report. The `var` statement selects the variables that appear in the report and determines their order.

```sas
proc print data=work.NewSalesEmps noobs;
    var Last_Name First_Name Salary;
run;
```

When a SAS session starts, SAS automatically creates one temporary and at least one permanent SAS data library that you can access.
- `work`: temporary library.
- `sasuser`: permanent library.
- Create your own permanent libraries with the `libname` statement. A `libref` must be 8 characters or fewer, must start with a letter or underscore, and the remaining characters must be letters, numbers, or underscores.

```sas
libname libref 'SAS-data-library' <options>;
```

- The `libname` statement (a temporary link to the directory) remains in effect until it is canceled or changed, or until your SAS session ends.
  - General form for clearing a `libref`: `libname libref clear;`
- The default `libref` is `work` if the `libref` is omitted.
- The `contents` procedure with the `_all_` keyword produces a list of all the SAS files in the data library. The `nods` option suppresses the descriptor portions of the data sets. `nods` can be used only in conjunction with the keyword `_all_`.

```sas
proc contents data=libref._all_ nods;
run;
```
Chapter 5: Reading SAS Data Sets
- The `set` statement reads observations from a SAS data set for further processing in the `data` step. By default, the `set` statement reads all observations and all variables from the input data set.
- Subset observations by using the `where` statement.
```sas
where Gender = 'M';
where Salary > 50000;
where Country in ('AU', 'US');
where Salary between 50000 and 100000;  /* inclusive */
where Employee_ID is null;
where Employee_ID is missing;           /* same as is null */
where Job_Title contains 'Rep';         /* case sensitive */
where Name like '%N';                   /* a percent sign (%) replaces any number of characters */
where Name like 'T_M%';                 /* an underscore (_) replaces one character */
```
- Subset variables by using the `drop` and `keep` statements.
  - The `drop` statement specifies the names of the variables to omit from the output data set.
  - The `keep` statement specifies the names of the variables to write to the output data set.

```sas
drop Employee_ID Gender Country Birth_Date;
keep First_Name Last_Name Salary Job_Title Hire_Date;
```
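Putting `set`, `where`, and `keep` together in one data step might look like the sketch below. It assumes the `sashelp.class` sample data set (variables `Name`, `Sex`, `Age`, `Height`, `Weight`) rather than the course's sales data, and `work.teen_girls` is a made-up name.

```sas
data work.teen_girls;
    set sashelp.class;               /* read every observation and variable */
    where Sex = 'F' and Age >= 13;   /* keep only the matching observations */
    keep Name Age Height;            /* write only these variables */
run;

proc print data=work.teen_girls noobs;
run;
```

Note that `Sex` can be used in the `where` statement even though it is not kept, because `where` is applied to the input data set while `keep` controls the output data set.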
- The `sum` statement in `proc print` produces column totals.
  - General form of the `sum` statement: `sum variable(s);`
- Adding permanent attributes: add labels to the descriptor portion of a SAS data set by using the `label` statement, and add formats by using the `format` statement.
  - To use labels in the `print` procedure, the `label` option needs to be added to the `proc print` statement: `proc print data=work.subset1 label; run;`
  - A `format` is an instruction that SAS uses to write data values. General form: `format variable(s) format;`
- Use the `label` and `format` statements in the
  - proc step to temporarily assign the attributes
  - data step to permanently assign the attributes
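The difference between permanent and temporary attributes, together with the `sum` statement, can be sketched as follows. This again assumes the `sashelp.class` sample data set; the data set name and label text are invented for illustration.

```sas
/* DATA step: this label and format are stored in the descriptor portion */
data work.class_labeled;
    set sashelp.class;
    label Height = 'Student Height';
    format Weight 6.1;
run;

/* PROC step: this label applies to this report only */
proc print data=work.class_labeled label noobs;
    var Name Height Weight;
    sum Weight;                      /* column total for Weight */
    label Name = 'Student Name';
run;
```

Running `proc contents data=work.class_labeled; run;` afterwards should list the label on `Height` and the format on `Weight`, but not the label on `Name`, since that one was assigned only in the proc step.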
Chapter 6: Reading Excel Worksheets
```sas
libname orionxls '~/desktop/dataset/sales.xls';
```
- SAS name literals: By default, special characters such as `$` are not allowed in data set names, but SAS name literals allow special characters to be included in data set names. A name literal is a string in quotation marks followed by the letter `n`.

```sas
libname orionxls '~/desktop/dataset/sales.xls';

proc print data=orionxls.'Australia$'n;
run;
```
Chapter 7: Reading Delimited Raw Data File
- The `infile` statement identifies the physical name of the raw data file to read with an `input` statement.
- The `input` statement describes the arrangement of values in the raw data file and assigns input values to the corresponding SAS variables.

The `dlm=` option can be added to the `infile` statement to specify an alternate delimiter. (By default, the delimiter is a space.)

```sas
data work.subset3;
    infile 'sales.csv' dlm=',';
    input Employee_ID First_Name $
          Last_Name $ Gender $ Salary
          Job_Title $ Country $;
run;
```

- `input variables <$>;`: variables must be specified in the order they appear in the raw data file, left to right. `$` indicates that a variable is stored as a character value. The default length for character and numeric variables is eight bytes.
- The `data` step is processed in two phases: compilation and execution.
- See Week 4’s note, page 21 for more details
- During the compilation phase, SAS
- checks the syntax of the DATA step statements
- creates an input buffer to hold the current raw data file record that is being processed
- creates a program data vector (PDV) to hold the current SAS observation
- creates the descriptor portion of the output data set
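One way to look at the program data vector that the compilation phase sets up is to dump it during execution. The sketch below is only an illustration: it reads two made-up records from in-stream `datalines` instead of a raw data file, and `work.pdv_demo` is a hypothetical name.

```sas
data work.pdv_demo;
    infile datalines dlm=',';
    input First_Name $ Salary;
    putlog _all_;    /* write every PDV variable, including _N_ and _ERROR_, to the log */
    datalines;
Ann,50000
Bob,62000
;
run;
```

The log should contain one line per iteration of the data step, showing the automatic variables `_N_` and `_ERROR_` alongside `First_Name` and `Salary`. The automatic variables live in the PDV but are not written to the output data set.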
The `length` statement defines the length of a variable explicitly, for example `length First_Name Last_Name $ 12 Gender $ 1;`.

```sas
data work.subset3;
    /* the length statement defines the variable order in the output data set, work.subset3 */
    length First_Name $ 12 Last_Name $ 18
           Gender $ 1 Job_Title $ 25 Country $ 2;
    infile 'sales.csv' dlm=',';
    /* the input statement follows the order of the fields in the source data, 'sales.csv' */
    input Employee_ID First_Name $
          Last_Name $ Gender $ Salary
          Job_Title $ Country $;
run;
```

`Nonstandard data` is any data that SAS cannot read without a special instruction. To read such data, use `input variable <$> variable <:informat>;`, where `informat` is an instruction that SAS uses to read data values into a variable.
- SAS uses date informats to read and convert dates to SAS date values.
- `:` modifier: the `:` modifier tells SAS to ignore the width associated with the informat and treat the file as delimited.
- `:mmddyy10.` can read all of the following values: `01/07/2008`, `1/7/2008`, `1/07/2008`, `01/07/08`, `01/7/2008`, `1/7/08`

```sas
input Employee_ID First_Name $ Last_Name $ Gender $ Salary Job_Title $ Country $ Birth_Date :date9. Hire_Date :mmddyy10.;
```
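A self-contained sketch of the colon modifier, using in-stream `datalines` in place of `sales.csv`. The records and the data set name `work.dates_demo` are invented for illustration; the `format` statement is added only so the stored date values print as readable dates.

```sas
data work.dates_demo;
    infile datalines dlm=',';
    /* :date9. and :mmddyy10. read each delimited field, ignoring the informat width */
    input First_Name $ Birth_Date :date9. Hire_Date :mmddyy10.;
    format Birth_Date Hire_Date date9.;
    datalines;
Ann,07JAN1980,1/7/2008
Bob,15MAR1975,01/07/2008
;
run;

proc print data=work.dates_demo noobs;
run;
```

Without the `format` statement the two date columns would print as plain numbers, that is, the number of days since January 1, 1960.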