MATH2871 Review (Part 1)

This is a review of MATH2871, just to prepare for my final exam.

Chapter 2: Getting Started

  1. Data steps are typically used to create SAS data sets.
  2. Proc steps are typically used to process SAS data sets.
  3. A SAS step begins with either a data or a proc statement. SAS detects the end of a step when it reaches a run statement, a quit statement, or the beginning of another step.
  4. Three primary windows: Editor contains the SAS program to submit. Log contains information about the processing of the SAS program, including any warning and error messages. Output contains reports generated by the SAS program.
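
To see the step boundaries in practice, here is a minimal sketch with one data step and one proc step. The data set name work.demo and the variable x are made up for illustration and are not from the course material.

```sas
/* DATA step: begins with the data statement, creates work.demo (hypothetical name) */
data work.demo;
    x = 1;
run;            /* explicit step boundary */

/* PROC step: begins with the proc statement, reports on the data set */
proc print data=work.demo;
run;
```

If the first run statement were left out, the proc statement would still end the data step, since the beginning of another step is also a step boundary.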

Chapter 3: Working with SAS Syntax

  1. SAS statements usually begin with an identifying keyword and always end with a semicolon.
    data work.NewSalesEmps;
    length First_Name $ 12
    Last_Name $ 18 Job_Title $ 25;
    infile 'newemps.csv' dlm=',';
    input First_Name $ Last_Name $ Job_Title $ Salary;
    run;

There are five statements in this data step, because there are five semicolons.

  1. SAS comments: /* comment */ and * comment ;
  2. Syntax errors: misspelled keywords, unmatched quotation marks, missing semicolons, invalid options.

Chapter 4: Getting Familiar with SAS Data Sets

  1. Components of SAS Data Sets: Descriptor Portion and Data Portion.
  2. The descriptor portion contains general information about the SAS data set and its variable names. Browsing the descriptor portion:

    proc contents data=work.NewSalesEmps;
    run;
  3. The data portion of a SAS data set is a rectangular table of character and/or numeric data values. There are two types of variables:

  • character: contains any value: letters, numbers, blanks …
  • numeric: stored as floating point numbers, in 8 bytes by default
  • SAS uses the numeric data type to store date values
  1. A SAS date value is stored as the number of days between January 1, 1960, and a specific date.
  • 01JAN1959 –> -365
  • 01JAN1960 –> 0
  • 01JAN1961 –> 366
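
The three date values above can be checked with a small sketch that uses date literals ('ddMONyyyy'd) and writes the stored numbers to the log. The variable names before, origin and after are just illustrative.

```sas
/* data _null_ runs a data step without creating a data set */
data _null_;
    before = '01JAN1959'd;   /* 365 days before the origin   */
    origin = '01JAN1960'd;   /* day zero                      */
    after  = '01JAN1961'd;   /* 1960 is a leap year: 366 days */
    put before= origin= after=;
run;
/* log: before=-365 origin=0 after=366 */
```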
  1. Missing value: A character missing value is displayed as a blank. A numeric missing value is displayed as a period (.)
  2. Variable names:
  • can be up to 32 characters long
  • must start with a letter or an underscore; subsequent characters can also be digits
  • can be uppercase, lowercase or mixed case
  • are not case sensitive
  1. proc print displays the data portion of a SAS data set

    proc print data=work.NewSalesEmps;
    run;
  2. Options and statements can be added to the print procedure. The noobs option suppresses the observation numbers on the left side of the report. The var statement selects variables that appear in the report and determines their order.

    proc print data=work.NewSalesemps noobs;
    var Last_Name First_Name Salary;
    run;
  3. When a SAS session starts, SAS automatically creates one temporary and at least one permanent SAS data library that you can access. work: temporary library. sasuser: permanent library.

  4. Create your own permanent libraries with the libname statement. A libref must be 8 characters or less and must start with a letter or underscore; the remaining characters must be letters, numbers or underscores.
    libname libref 'SAS-data-library' <options>;
  5. The libname statement (a temporary link to the directory) remains in effect until it is canceled or changed, or until your SAS session ends.
  • general form for clearing a libref: libname libref clear;
  6. The default libref is work if the libref is omitted.
  7. The contents procedure with the _all_ keyword produces a list of all the SAS files in the data library. The nods option suppresses the descriptor portions of the data sets. The nods option is only used in conjunction with the _all_ keyword.

    proc contents data=libref._all_ nods;
    run;

Chapter 5: Reading SAS Data Sets

  1. The set statement reads observations from a SAS data set for further processing in the data step. By default, the set statement reads all observations and all variables from the input data set.
  2. Subset observations by using the where statement.
  • where Gender = 'M';
  • where Salary > 50000;
  • where Country in ('AU', 'US');
  • where salary between 50000 and 100000; /* inclusive */
  • where Employee_ID is null;
  • where Employee_ID is missing; /* same as is null */
  • where Job_Title contains 'Rep'; /* case sensitive */
  • where Name like '%N'; /* A percent sign (%) replaces any number of characters */
  • where Name like 'T_M%'; /* An underscore (_) replaces one character */
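
As a rough sketch of how these conditions combine in a step, the example below prints only the matching observations. It assumes the libref orion has already been assigned and that orion.sales contains the variables used here.

```sas
/* assumes: libname orion '...'; has already been submitted */
proc print data=orion.sales;
    /* conditions can be joined with and / or */
    where Country in ('AU', 'US')
          and Salary between 50000 and 100000;   /* between is inclusive */
    var First_Name Last_Name Country Salary;
run;
```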
  1. Subset variables by using the drop and keep statements.
  • drop statement specifies the names of the variables to omit from the output data set.
  • keep statement specifies the names of the variables to write to the output data set.
  • drop Employee_ID Gender Country Birth_Date;
  • keep First_Name Last_Name Salary Job_Title Hire_Date;
  1. In proc print, the sum statement produces column totals.
  • general form of the sum statement: sum variable(s);
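
A minimal sketch of the sum statement inside proc print, assuming the libref orion is assigned and orion.sales has a numeric Salary variable:

```sas
proc print data=orion.sales noobs;
    var Last_Name First_Name Salary;
    sum Salary;    /* adds a total for Salary at the bottom of the report */
run;
```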
  1. Adding permanent attributes: Add labels to the descriptor portion of a SAS data set by using the label statement. Add formats to the descriptor portion of a SAS data set by using the format statement.
  • In order to use labels in the print procedure, the label option needs to be added to the proc print statement: proc print data=work.subset1 label; run;
  • A format is an instruction that SAS uses to write data values. General form: format variable(s) format;
  • SAS formats
  • SAS Date Formats
  1. Use the label and format statements in a
  • proc step to temporarily assign the attributes
  • data step to permanently assign the attributes
libname orion '~/desktop/datasets';
data work.subset1;
set orion.sales;
where Country='AU' and Job_Title contains 'Rep';
keep First_Name Last_Name Salary Job_Title Hire_Date;
label Job_Title = 'Sales Title'
      Hire_Date = 'Date Hired';
format Salary commax8. Hire_Date ddmmyy10.;
run;

Chapter 6: Reading Excel Worksheets

  1. Assign a libref to an Excel workbook with the libname statement: libname orionxls '~/desktop/dataset/sales.xls';
  2. SAS name literals: By default, special characters such as $ are not allowed in data set names, but SAS name literals allow special characters to be included in data set names. A name literal is a string in quotation marks, followed by the letter n.
    libname orionxls '~/desktop/dataset/sales.xls';
    proc print data=orionxls.'Australia$'n;
    run;
libname orionxls 'P:\math2871\sales.xls';
data work.subset1;
set orionxls.'UnitedStates$'n;
run;
data work.subset2;
set orionxls.'Australia$'n;
run;
proc contents data=orionxls._all_;
run;
libname orionxls clear;

Chapter 7: Reading Delimited Raw Data Files

  1. The infile statement identifies the physical name of the raw data file to read with an input statement.
  2. The input statement describes the arrangement of values in the raw data file and assigns input values to the corresponding SAS variables.
  3. The dlm= option can be added to the infile statement to specify an alternate delimiter. (By default, the delimiter is a space.)

    data work.subset3;
    infile 'sales.csv' dlm=',';
    input Employee_ID First_Name $
    Last_Name $ Gender $ Salary
    Job_Title $ Country $;
    run;
  4. input variables <$>;: Variables must be specified in the order they appear in the raw data file, left to right. $ indicates that a variable is stored as a character value. The default length for character and numeric variables is eight bytes.

  5. The data step is processed in two phases: compilation & execution
  • See Week 4’s note, page 21 for more details
  1. During the compilation phase, SAS
  • checks the syntax of the DATA step statements
  • creates an input buffer to hold the current raw data file record that is being processed
  • creates a program data vector (PDV) to hold the current SAS observation
  • creates the descriptor portion of the output data set
  1. The length statement defines the length of a variable explicitly: length First_Name Last_Name $ 12 Gender $ 1;

    data work.subset3;
    /* the length statement determines the variable order in the output data set, work.subset3 */
    length First_Name $ 12 Last_Name $ 18
    Gender $ 1 Job_Title $ 25 Country $ 2;
    infile 'sales.csv' dlm=',';
    /* the input statement lists the variables in the order they appear in the source file, 'sales.csv' */
    input Employee_ID First_Name $
    Last_Name $ Gender $ Salary
    Job_Title $ Country $;
    run;
  2. Nonstandard data is any data that SAS cannot read without a special instruction. To read such data, use input variable <$> variable <:informat>;, where an informat is an instruction that SAS uses to read data values into a variable.

  • SAS uses date informats to read and convert dates to SAS date values.
  1. Colon (:) modifier: The colon modifier tells SAS to ignore the width associated with the informat and treat the field as delimited, reading each value up to the next delimiter.
  • :mmddyy10. can read all of the following values: 01/07/2008, 1/7/2008, 1/07/2008, 01/07/08, 01/7/2008, 1/7/08
    input Employee_ID First_Name $ Last_Name $ 
      Gender $ Salary Job_Title $ Country $ 
      Birth_Date :date9. 
      Hire_Date :mmddyy10.;
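
To try the colon modifier without a raw data file, here is a small sketch that reads inline records with datalines. The data set name work.dates and the two records are made up for illustration, and the two-digit year is read under the default yearcutoff= setting.

```sas
data work.dates;                  /* hypothetical data set name */
    infile datalines dlm=',';
    /* the colon modifier reads each date up to the next comma,
       so dates of different widths are handled */
    input Employee_ID Hire_Date :mmddyy10.;
    format Hire_Date date9.;      /* display the stored number as a date */
    datalines;
120102,01/07/2008
120103,1/7/08
;
run;

proc print data=work.dates noobs;
run;
```

Both records should show the same Hire_Date in the report, since the informat accepts either layout.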
    