This is a review of MATH2871, just to prepare for my final exam.
`data` steps are typically used to create SAS data sets.
`proc` steps are typically used to process SAS data sets.
- SAS steps begin with either a `data` statement or a `proc` statement. SAS detects the end of a step by a `run` statement, a `quit` statement, or the beginning of another step.
- Three primary windows:
`Editor` contains the SAS program to submit.
`Log` contains information about the processing of the SAS program, including any warning and error messages.
`Output` contains reports generated by the SAS program.
- SAS statements usually begin with an identifying keyword and always end with a semicolon.
data work.NewSalesEmps;
    length First_Name $ 12 Last_Name $ 18 Job_Title $ 25;
    infile 'newemps.csv' dlm=',';
    input First_Name $ Last_Name $ Job_Title $ Salary;
run;
There are five statements in this data step, one for each of the five semicolons.
- SAS comments: `/* comment */` and `* comment ;`
- Syntax errors: misspelled keywords, unmatched quotation marks, missing semicolons, invalid options.
- Components of SAS data sets:
The descriptor portion contains general information about the SAS data set and its variables. To browse the descriptor portion:
proc contents data=work.NewSalesEmps;
run;
The data portion of a SAS data set is a rectangular table of character and/or numeric data values. There are two types of variables:
- character: can contain any value (letters, numbers, blanks, …)
- numeric: stored as floating point numbers, in 8 bytes by default
- SAS uses the numeric data type to store date values.
- A SAS date value is stored as the number of days between January 1, 1960, and a specific date.
- 01JAN1959 –> -365
- 01JAN1960 –> 0
- 01JAN1961 –> 366
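The three date values above can be checked directly with SAS date constants. This is a minimal sketch; the variable names `d_1959`, `d_1960`, and `d_1961` are made up for illustration, and `data _null_` is used so that no data set is created.

```sas
data _null_;
    /* a date constant is a quoted date followed by the letter d */
    d_1959 = '01JAN1959'd;
    d_1960 = '01JAN1960'd;
    d_1961 = '01JAN1961'd;
    /* write the underlying numeric values to the log */
    put d_1959= d_1960= d_1961=;
    /* log shows: d_1959=-365 d_1960=0 d_1961=366 */
run;
```

The value for 1961 is 366 rather than 365 because 1960 is a leap year.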
- Missing value: A character missing value is displayed as a blank. A numeric missing value is displayed as a period (.)
- Variable names:
- can be up to 32 characters long
- must start with a letter or an underscore; subsequent characters can also be numerals
- can be uppercase, lowercase, or mixed case
- are not case sensitive
`proc print` displays the data portion of a SAS data set:
proc print data=work.NewSalesEmps;
run;
Options and statements can be added to the `print` procedure. The `noobs` option suppresses the observation numbers on the left side of the report. The `var` statement selects the variables that appear in the report and determines their order.
proc print data=work.NewSalesEmps noobs;
    var Last_Name First_Name Salary;
run;
When a SAS session starts, SAS automatically creates one temporary and at least one permanent SAS data library that you can access.
`work`: the temporary library.
- Create your own permanent libraries:
The `libref` must be 8 characters or fewer and must start with a letter or underscore. The remaining characters must be letters, numbers, or underscores.
```sas
libname libref 'SAS-data-library' <options>;
```
- The `libname` statement (a temporary link to the directory) remains in effect until it is canceled or changed, or until your SAS session ends.
- General form for clearing a `libref`: `libname libref clear;`
- The default `libref` is `work` if the `libref` is omitted.
- The `contents` procedure with the `_all_` keyword produces a list of all the SAS files in the data library. The `nods` option suppresses the descriptor portions of the data sets. `nods` can only be used in conjunction with the `_all_` keyword.
```sas
proc contents data=libref._all_ nods;
run;
```
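Putting the library statements together, a session might look like the sketch below. The libref `orion` and the directory `'~/desktop/dataset'` are assumptions for illustration only; substitute a folder that exists on your own machine.

```sas
/* assign a libref to a (hypothetical) directory */
libname orion '~/desktop/dataset';

/* list every SAS file in the library, without the descriptor portions */
proc contents data=orion._all_ nods;
run;

/* release the libref when it is no longer needed */
libname orion clear;
```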
The `set` statement reads observations from a SAS data set for further processing in the `data` step. By default, the `set` statement reads all observations and all variables from the input data set.
- Subset observations by using the `where` statement:
where Gender = 'M';
where Salary > 50000;
where Country in ('AU', 'US');
where salary between 50000 and 100000; /* inclusive */
where Employee_ID is null;
where Employee_ID is missing; /* same as is null */
where Job_Title contains 'Rep'; /* case sensitive */
where Name like '%N'; /* A percent sign (%) replaces any number of characters */
where Name like 'T_M%'; /* An underscore (_) replaces one character */
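A `where` statement only does something inside a step, so here is a minimal sketch of a complete `data` step that subsets observations. It assumes the `work.NewSalesEmps` data set created earlier in these notes; the output name `work.HighPaidReps` is made up for illustration.

```sas
data work.HighPaidReps;              /* hypothetical output data set */
    set work.NewSalesEmps;           /* read every observation and variable */
    /* keep only observations that satisfy both conditions */
    where Salary > 50000 and Job_Title contains 'Rep';
run;
```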
- Subset variables by using the `drop` and `keep` statements:
The `drop` statement specifies the names of the variables to omit from the output data set.
The `keep` statement specifies the names of the variables to write to the output data set.
drop Employee_ID Gender Country Birth_Date;
keep First_Name Last_Name Salary Job_Title Hire_Date;
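As a sketch of how the two statements relate, the two steps below should produce data sets with the same three variables, assuming `work.NewSalesEmps` holds only `First_Name`, `Last_Name`, `Job_Title`, and `Salary` as in the earlier example. The output names are hypothetical.

```sas
/* list the variables to write out */
data work.NamesAndPay;
    set work.NewSalesEmps;
    keep First_Name Last_Name Salary;
run;

/* or list the variables to leave out */
data work.NamesAndPay2;
    set work.NewSalesEmps;
    drop Job_Title;
run;
```

Which one to use is mostly a matter of which list is shorter.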
The `sum` statement in `proc print` produces column totals.
- General form of the `sum` statement: `sum variable(s);`
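For example, a report with a total for one column could be sketched like this, again assuming the `work.NewSalesEmps` data set from earlier in these notes:

```sas
proc print data=work.NewSalesEmps noobs;
    var Last_Name First_Name Salary;
    sum Salary;      /* adds a total for Salary at the bottom of the report */
run;
```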
- Adding permanent attributes: add labels to the descriptor portion of a SAS data set by using the `label` statement, and add formats to the descriptor portion by using the `format` statement.
- In order to use labels in the `print` procedure, add the `label` option:
proc print data=work.subset1 label;
run;
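A minimal sketch of the whole round trip follows. How `work.subset1` was really built is not shown in these notes, so the step below is an assumption that derives it from `work.NewSalesEmps`; the label texts are also made up.

```sas
data work.subset1;
    set work.NewSalesEmps;
    /* labels are stored in the descriptor portion of work.subset1 */
    label Job_Title = 'Sales Title'
          Salary    = 'Annual Salary';
run;

/* without the label option, proc print shows variable names instead */
proc print data=work.subset1 label;
run;
```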
A `format` is an instruction that SAS uses to write data values. General form:
format variable(s) format;
Use the `format` statement in the
- `proc` step to temporarily assign the attributes
- `data` step to permanently assign the attributes
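The difference between the two placements can be sketched as follows, assuming the `work.NewSalesEmps` data set from earlier. The `dollar12.` format is chosen here only as an example, and the output name `work.NewSalesEmpsFmt` is hypothetical.

```sas
/* temporary: the format applies to this report only */
proc print data=work.NewSalesEmps;
    format Salary dollar12.;
run;

/* permanent: the format is stored in the descriptor portion of the new data set */
data work.NewSalesEmpsFmt;
    set work.NewSalesEmps;
    format Salary dollar12.;
run;
```

Running `proc contents` on each data set is one way to confirm where the format was stored.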
libname orionxls '~/desktop/dataset/sales.xls';
- SAS name literals: By default, special characters such as `$` are not allowed in data set names, but SAS name literals allow special characters to be included in data set names. A name literal is a string in quotation marks followed by the letter `n`.
libname orionxls '~/desktop/dataset/sales.xls';
proc print data=orionxls.'Australia$'n;
run;
The `infile` statement identifies the physical name of the raw data file to read with an `input` statement.
The `input` statement describes the arrangement of values in the raw data file and assigns input values to the corresponding SAS variables.
The `dlm=` option can be added to the `infile` statement to specify an alternate delimiter. (By default, the delimiter is a space.)
data work.subset3;
    infile 'sales.csv' dlm=',';
    input Employee_ID First_Name $ Last_Name $ Gender $ Salary
          Job_Title $ Country $;
run;
`input variables <$>;`: variables must be specified in the order they appear in the raw data file, left to right.
`$` indicates that a variable is stored as a character value. The default length for character and numeric variables is eight bytes.
The `data` step is processed in two phases: compilation and execution.
- See Week 4’s note, page 21 for more details
- During the compilation phase, SAS
- checks the syntax of the DATA step statements
- creates an input buffer to hold the current raw data file record that is being processed
- creates a program data vector (PDV) to hold the current SAS observation
- creates the descriptor portion of the output data set
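One way to watch the PDV at work is to print it to the log on every iteration with `put _all_;`. The sketch below is self-contained and uses in-stream data, so the data set name `work.pdv_demo` and the two records are made up for illustration.

```sas
data work.pdv_demo;
    length Name $ 8;               /* read at compile time to set up the PDV */
    infile datalines dlm=',';
    input Name $ Salary;
    /* write the current PDV to the log, including the automatic
       variables _N_ and _ERROR_ */
    put _all_;
    datalines;
Ann,50000
Bob,62000
;
run;
```

Each line written to the log corresponds to one pass through the step during the execution phase.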
The `length` statement defines the length of a variable explicitly.
length First_Name Last_Name $ 12 Gender $ 1;
data work.subset3;
    /* the length statement determines the variable order in the output data set, work.subset3 */
    length First_Name $ 12 Last_Name $ 18
           Gender $ 1 Job_Title $ 25 Country $ 2;
    infile 'sales.csv' dlm=',';
    /* the input statement follows the order of the fields in the source data, 'sales.csv' */
    input Employee_ID First_Name $ Last_Name $ Gender $ Salary
          Job_Title $ Country $;
run;
Nonstandard data is any data that SAS cannot read without a special instruction. To read such data:
`input variable <$> variable < :informat >;`, where
an `informat` is an instruction that SAS uses to read data values into a variable.
- SAS uses date informats to read and convert dates to SAS date values.
The `:` modifier tells SAS to ignore the width associated with the informat and to read up to the next delimiter.
`:mmddyy10.` can read all of the following values:
input Employee_ID First_Name $ Last_Name $ Gender $ Salary Job_Title $ Country $ Birth_Date :date9. Hire_Date :mmddyy10.;
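To try the two date informats without a raw data file, the sketch below reads in-stream records. The data set name `work.dates_demo` and the sample values are made up; the second record uses a shorter date to show the `:` modifier reading up to the comma rather than a fixed 10 columns.

```sas
data work.dates_demo;
    infile datalines dlm=',';
    input Birth_Date :date9. Hire_Date :mmddyy10.;
    /* display the stored day counts as readable dates */
    format Birth_Date Hire_Date date9.;
    datalines;
01JAN1960,01/01/1961
15MAR1985,1/1/1961
;
run;

proc print data=work.dates_demo;
run;
```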