This is a review of MATH2871, just to prepare for my final exam.
Chapter 2: Getting Started
- `Data` steps are typically used to create SAS data sets. `Proc` steps are typically used to process SAS data sets.
- A SAS step begins with either `Data` or `Proc`. SAS detects the end of a step by: a `run` statement, a `quit` statement, or the beginning of another step.
- Three primary windows: `Editor` contains the SAS program to submit. `Log` contains information about the processing of the SAS program, including any warning and error messages. `Output` contains reports generated by the SAS program.
Chapter 3: Working with SAS Syntax
- SAS statements usually begin with an identifying keyword and always end with a semicolon.
```sas
data work.NewSalesEmps;
    length First_Name $ 12
           Last_Name $ 18 Job_Title $ 25;
    infile 'newemps.csv' dlm=',';
    input First_Name $ Last_Name $ Job_Title $ Salary;
run;
```

There are five statements in this data step, because there are five semicolons.
- SAS comments: `/* comment */` and `* comment ;`
- Syntax errors: misspelled keywords, unmatched quotation marks, missing semicolons, invalid options.
 
Chapter 4: Getting Familiar with SAS Data Sets
- Components of SAS data sets: `Descriptor Portion` and `Data Portion`. The descriptor portion contains general information about the SAS data set and its variables, such as variable names. Browsing the descriptor portion:
```sas
proc contents data=work.NewSalesEmps;
run;
```
The data portion of a SAS data set is a rectangular table of character and/or numeric data values. There are two types of variables:
- character: can contain any value: letters, numbers, blanks …
- numeric: stored as floating point numbers, in 8 bytes by default
- SAS uses the numeric data type to store date values
 
- A SAS date value is stored as the number of days between January 1, 1960, and a specific date.
 
- 01JAN1959 -> -365
- 01JAN1960 -> 0
- 01JAN1961 -> 366
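
The three date values above can be checked with date literals. This is a minimal sketch: the variable names `d1`, `d2` and `d3` are only for illustration, and `data _null_` is used so that no data set is created.

```sas
/* Write the numeric value behind three date literals to the log. */
data _null_;
    d1 = '01JAN1959'd;   /* 365 days before the reference date */
    d2 = '01JAN1960'd;   /* the reference date itself */
    d3 = '01JAN1961'd;   /* 1960 is a leap year, so 366 days after */
    put d1= d2= d3=;     /* log shows d1=-365 d2=0 d3=366 */
run;
```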
 
- Missing value: A character missing value is displayed as a blank. A numeric missing value is displayed as a period (`.`).
- Variable names:

- can be up to 32 characters long
- must start with a letter or an underscore; subsequent characters can also be numerals
- can be uppercase, lowercase, or mixed case
- are not case sensitive
 
`proc print` displays the data portion of a SAS data set:
```sas
proc print data=work.NewSalesEmps;
run;
```
Options and statements can be added to the print procedure. The `noobs` option suppresses the observation numbers on the left side of the report. The `var` statement selects the variables that appear in the report and determines their order.
```sas
proc print data=work.NewSalesemps noobs;
    var Last_Name First_Name Salary;
run;
```
When a SAS session starts, SAS automatically creates one temporary and at least one permanent SAS data library that you can access.
- `work`: temporary library.
- `sasuser`: permanent library.
- Create your own permanent libraries: the `libref` must be 8 characters or fewer, must start with a letter or underscore, and the remaining characters must be letters, numbers or underscores.
```sas
libname libref 'SAS-data-library' <options>;
```
- The `libname` statement (a temporary link to the directory) remains in effect until it is canceled or changed, or your SAS session ends. General form for clearing a `libref`: `libname libref clear;`
- The default `libref` is `work` if the `libref` is omitted.
- The `contents` procedure with the `_all_` keyword produces a list of all the SAS files in the data library. The `nods` option suppresses the descriptor portions of the data sets. `nods` can only be used in conjunction with the keyword `_all_`.
```sas
proc contents data=libref._all_ nods;
run;
```
Chapter 5: Reading SAS Data Sets
- The `set` statement reads observations from a SAS data set for further processing in the `data` step. By default, the `set` statement reads all observations and all variables from the input data set.
- Subset observations by using the `where` statement.
```sas
where Gender = 'M';
where Salary > 50000;
where Country in ('AU', 'US');
where Salary between 50000 and 100000;  /* inclusive */
where Employee_ID is null;
where Employee_ID is missing;           /* same as is null */
where Job_Title contains 'Rep';         /* case sensitive */
where Name like '%N';                   /* a percent sign (%) replaces any number of characters */
where Name like 'T_M%';                 /* an underscore (_) replaces one character */
```
- Subset variables by using the `drop` and `keep` statements. The `drop` statement specifies the names of the variables to omit from the output data set. The `keep` statement specifies the names of the variables to write to the output data set.
```sas
drop Employee_ID Gender Country Birth_Date;
keep First_Name Last_Name Salary Job_Title Hire_Date;
```
- The `sum` statement in `proc print` produces column totals.

- general form of the `sum` statement: `sum variable(s);`
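
The `set`, `where`, `keep` and `sum` statements above can be combined as follows. This is only a sketch: it assumes `work.NewSalesEmps` from Chapter 3 already exists, and the output name `work.highpaid` and the 50000 cutoff are made up for illustration.

```sas
/* Keep only the higher-paid employees and three of their variables. */
data work.highpaid;                       /* hypothetical output data set */
    set work.NewSalesEmps;                /* read every observation and variable */
    where Salary > 50000;                 /* subset observations */
    keep First_Name Last_Name Salary;     /* subset variables */
run;

/* Print the subset with a column total for Salary. */
proc print data=work.highpaid noobs;
    sum Salary;
run;
```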
- Adding permanent attributes: Add labels to the descriptor portion of a SAS data set by using the `label` statement. Add formats to the descriptor portion of a SAS data set by using the `format` statement.

- In order to use labels in the `print` procedure, the `label` option needs to be added to the `proc print` statement: `proc print data=work.subset1 label; run;`
- A `format` is an instruction that SAS uses to write data values. General form: `format variable(s) format;`

- Use the `label` and `format` statements in the

- proc step to temporarily assign the attributes
- data step to permanently assign the attributes
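
The difference between permanent and temporary attributes might look like this. It is a sketch under assumptions: `work.NewSalesEmps` is taken as the input, the label text is invented, and `dollar10.` and `comma8.` are just two standard formats picked for the example.

```sas
/* Data step: the label and format are stored in the descriptor portion. */
data work.subset1;
    set work.NewSalesEmps;
    label Job_Title = 'Sales Title';   /* hypothetical label text */
    format Salary dollar10.;           /* permanent format */
run;

/* Proc step: this format applies to this report only. */
proc print data=work.subset1 label;    /* label option displays the stored labels */
    format Salary comma8.;             /* temporary override of dollar10. */
run;
```

Running `proc contents data=work.subset1; run;` afterwards should still list `dollar10.` for `Salary`, since the proc step did not change the data set.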
 
Chapter 6: Reading Excel Worksheets
```sas
libname orionxls '~/desktop/dataset/sales.xls';
```
- SAS name literals: By default, special characters such as `$` are not allowed in data set names, but SAS name literals allow special characters to be included in data set names. (A name literal is a string in quotation marks, followed by the letter `n`.)
```sas
libname orionxls '~/desktop/dataset/sales.xls';
proc print data=orionxls.'Australia$'n;
run;
```
Chapter 7: Reading Delimited Raw Data File
- The `infile` statement identifies the physical name of the raw data file to read with an `input` statement.
- The `input` statement describes the arrangement of values in the raw data file and assigns input values to the corresponding SAS variables. The `dlm=` option can be added to the `infile` statement to specify an alternate delimiter. (By default, the delimiter is a space.)
```sas
data work.subset3;
    infile 'sales.csv' dlm=',';
    input Employee_ID First_Name $ Last_Name $ Gender $ Salary
          Job_Title $ Country $;
run;
```
`input variables <$>;`: variables must be specified in the order they appear in the raw data file, left to right. `$` indicates to store a variable as a character value. The default length for character and numeric variables is eight bytes.
- The `data` step is processed in two phases: compilation & execution
- See Week 4’s note, page 21 for more details
 
- During the compilation phase, SAS
 
- checks the syntax of the DATA step statements
 - creates an input buffer to hold the current raw data file record that is being processed
 - creates a program data vector (PDV) to hold the current SAS observation
 - creates the descriptor portion of the output data set
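
One way to watch the PDV during execution is to write it to the log on every iteration. This is a sketch only: it reuses `sales.csv` and the variable list from the example above, and adds a `put _all_;` statement that is not part of the original program.

```sas
data work.subset3;
    infile 'sales.csv' dlm=',';
    input Employee_ID First_Name $ Last_Name $ Gender $ Salary
          Job_Title $ Country $;
    put _all_;   /* writes every PDV variable, including _N_ and _ERROR_, to the log */
run;
```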
 
The `length` statement defines the length of a variable explicitly, for example `length First_Name Last_Name $ 12 Gender $ 1;`
```sas
data work.subset3;
    /* the length statement also sets the variable order in the output data set, work.subset3 */
    length First_Name $ 12 Last_Name $ 18
           Gender $ 1 Job_Title $ 25 Country $ 2;
    infile 'sales.csv' dlm=',';
    /* the input statement follows the order of the fields in the source data, 'sales.csv' */
    input Employee_ID First_Name $ Last_Name $ Gender $ Salary
          Job_Title $ Country $;
run;
```
`Nonstandard data` is any data that SAS cannot read without a special instruction. To read such data: `input variable <$> variable <:informat>;`, where `informat` is an instruction that SAS uses to read data values into a variable.
- SAS uses date informats to read and convert dates to SAS date values.
 
`:` modifier: The `:` modifier tells SAS to ignore the width associated with the informat and treat the field as delimited.
`:mmddyy10.` can read all of the following values: `01/07/2008`, `1/7/2008`, `1/07/2008`, `01/07/08`, `01/7/2008`, `1/7/08`
```sas
input Employee_ID First_Name $ Last_Name $ Gender $ Salary Job_Title $ Country $ Birth_Date :date9. Hire_Date :mmddyy10.;
```
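
To try the `:` modifier without an external file, the values can be supplied inline. This is a sketch: the data set name `work.dates`, the variable `Name` and the two sample rows are made up, and `datalines` stands in for a real raw data file.

```sas
/* Read two differently written dates with the same informat. */
data work.dates;                          /* hypothetical data set */
    infile datalines dlm=',';
    input Name $ Hire_Date :mmddyy10.;    /* the colon reads up to the next delimiter */
    format Hire_Date date9.;              /* display the stored number as a date */
    datalines;
Ann,01/07/2008
Bob,1/7/08
;
run;

proc print data=work.dates noobs;
run;
```

Under the default `yearcutoff` setting, both rows should be stored as the same SAS date value, since `08` is read as 2008.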