MATH2871 Review (Part 2)

Chapter 8: Validating and Cleaning Data

  1. Data errors occur when data values are not appropriate for the SAS statements that are specified in a program. SAS detects data errors during program execution.
  2. The freq procedure can show whether any genders are not F or M and whether any countries are not AU or US.
  3. The means procedure can show, through the minimum and maximum, whether any salaries fall outside the range of 24000 to 500000.
  4. The univariate procedure can show the same thing through its extreme observations table, which lists the lowest and highest salaries together with their observation numbers.

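    /* Read the raw CSV file; the two dates need the date9. informat. */
    data work.nonsales;
        /* Salary gets its own numeric length; without the 8 it would
           share the $ 25 character length with Job_Title. */
        length Employee_ID 8 First $ 12
               Last $ 18 Gender $ 1
               Salary 8 Job_Title $ 25
               Country $ 2 Birth_Date
               Hire_Date 8;
        infile 'nonsales.csv' dlm=',';
        input Employee_ID First $ Last $
              Gender $ Salary Job_Title $
              Country $ Birth_Date :date9.
              Hire_Date :date9.;
        format Birth_Date Hire_Date ddmmyy10.;
    run;

    /* List rows with a missing job title or a birth date after the hire date. */
    proc print data=work.nonsales;
        var Employee_ID Job_Title Birth_Date Hire_Date;
        where Job_Title = ' ' or Birth_Date > Hire_Date;
    run;

    /* Frequency tables expose unexpected Gender and Country values. */
    proc freq data=work.nonsales;
        tables Gender Country;
    run;

    /* N, NMISS, MIN and MAX expose missing or out-of-range salaries. */
    proc means data=work.nonsales n nmiss min max;
        var Salary;
    run;

    /* The extreme observations table lists the lowest and highest salaries. */
    proc univariate data=work.nonsales;
        var Salary;
    run;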
  5. During the processing of every data step, SAS automatically creates the following temporary variables, which are not written to the output data set:

  • _N_, which counts the number of times the data step has begun to iterate.
  • _ERROR_, which signals that the data caused an error during execution. It is 0 when no error has occurred and 1 when one or more errors have occurred.
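A minimal sketch of how the two automatic variables behave, assuming a small inline data set. The values and the data set name work.demo are made up for illustration. The second record is deliberately non-numeric, so reading it into the numeric variable Amount triggers a data error.

```sas
/* Hypothetical inline data: the second record is not a valid number. */
data work.demo;
    input Amount;
    /* Write the automatic variables and Amount to the log on every iteration. */
    putlog _N_= _ERROR_= Amount=;
    datalines;
100
abc
300
;
run;
```

In the log, the second iteration should report _ERROR_=1 with a missing Amount, and the third iteration should show _ERROR_ back at 0, because SAS resets it at the start of each iteration.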
  6. Which statement best describes the invalid data? Answer: b.
  • a. The data in the raw data file is bad.
  • b. The programmer incorrectly read the data.
  7. To write a SAS date constant, enclose a date in quotation marks in the form ddMMMyyyy (day, three-letter month abbreviation, year) and immediately follow the closing quotation mark with the letter d. Example: January 1, 1974 is '01JAN1974'd.

    proc print data=orion.nonsales;
    var Employee_ID Birth_Date Hire_Date;
    where Hire_Date < '01JAN1974'd;
    run;
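A date constant is just a convenient way to write a number: SAS stores dates as the count of days since January 1, 1960. The short sketch below prints the same constant with and without a format to make that visible. The variable name Cutoff is an illustrative assumption.

```sas
data _null_;
    Cutoff = '01JAN1974'd;
    put Cutoff=;           /* stored value: number of days since 01JAN1960 */
    put Cutoff= date9.;    /* same value displayed with the date9. format */
run;
```

The first put statement should write Cutoff=5114 to the log and the second Cutoff=01JAN1974.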
  8. The freq procedure produces one-way to n-way frequency tables.

  • The tables statement specifies the frequency tables to produce. Without it, proc freq produces a one-way frequency table for every variable in the data set.
  • The nlevels option displays a table that provides the number of distinct values for each variable named in the tables statement.
    proc freq data=orion.nonsales nlevels;
    tables Gender Country Employee_ID;
    run;
  9. The means procedure produces summary reports displaying descriptive statistics.
  • The var statement specifies the analysis variables and their order in the results.
  • By default, the means procedure creates a report with N, mean, standard deviation, minimum, and maximum.
    proc means data=orion.nonsales n nmiss min max;
    var Salary;
    run;
  10. The univariate procedure produces summary reports displaying descriptive statistics.
  • The var statement specifies the analysis variables and their order in the results.
  • Without the var statement, SAS analyzes all numeric variables.
    proc univariate data=orion.nonsales;
    var Salary;
    run;
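Once the procedures above have shown that invalid values exist, a where statement can list the offending rows. This is a sketch that assumes the validation rules stated at the top of the chapter (Gender is F or M, Country is AU or US, Salary is between 24000 and 500000). The choice of var columns is only for illustration.

```sas
/* Print only the observations that break at least one validation rule. */
proc print data=orion.nonsales;
    var Employee_ID Gender Country Salary;
    where Gender not in ('F','M')
       or Country not in ('AU','US')
       or Salary not between 24000 and 500000;
run;
```

A missing Salary also satisfies the last condition, so rows with missing salaries are listed along with out-of-range ones. Character comparisons are case sensitive, so a lowercase country code such as au is reported as invalid.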
  11. Interactively cleaning data: the Viewtable window enables you to browse, edit, or create SAS data sets interactively.
  12. Programmatically cleaning data: the data step can be used to clean invalid data programmatically.
  • The assignment statement evaluates an expression and assigns the resulting value to a variable: variable = expression; Examples:
  • Salary = 26960;
  • Hire_Date = '21JAN1995'd;
  • Country = upcase(Country);
  13. The if-then-else statement executes a SAS statement only for observations that meet a specific condition; the optional else branch runs when the condition is false.

    data work.clean;
    set orion.nonsales;
    Country=upcase(Country);
    if Employee_ID=120106 then Salary=26960;
    else if Employee_ID=120115 then Salary=26500;
    else if Employee_ID=120191 then Salary=24015;
    else if Employee_ID=120107 then Hire_Date='21JAN1995'd;
    else if Employee_ID=120111 then Hire_Date='01NOV1978'd;
    else if Employee_ID=121011 then Hire_Date='01JAN1998'd;
    run;
  14. What are the two phases of DATA step processing? Compilation and execution.
  15. What is a program data vector (PDV)? A logical area in memory where SAS holds the current observation.
  16. What is an instruction that SAS uses to read data values into a variable? An informat.
  17. When would you use a : modifier? With nonstandard raw data that requires list input and an informat.
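The : modifier can be sketched with a few inline records. Everything here is hypothetical: the data set name work.hires, the records themselves, and the choice of the comma8. informat for a salary written with a dollar sign and a comma. The fields are separated by blanks, so list input is needed, while the date and salary are nonstandard and need informats.

```sas
data work.hires;
    /* The colon tells SAS to read each field up to the next delimiter
       and then apply the informat to what it read. */
    input Employee_ID Last :$18. Hire_Date :date9. Salary :comma8.;
    format Hire_Date date9. Salary dollar10.;
    datalines;
1001 Smith 15MAR2005 $52,500
1002 Nguyen 01JUL1998 $101,250
;
run;

proc print data=work.hires;
run;
```

Without the colon, formatted input would read a fixed number of columns regardless of where each field ends, which breaks as soon as the values vary in width.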

Chapter 9: Manipulating Data

  1. If an operand of an arithmetic operator is missing, the result is missing. Example: if var1 = . and var2 = 10, then num = var1 + var2 / 2 is . (missing).
  2. Useful functions:
  • sum(argument-1, argument-2, ...): returns the sum of the arguments, ignoring missing values.
  • year, qtr, month, day, weekday: extract a component from a SAS date value.
  • today(): returns the current date as a SAS date value.
  • mdy(month, day, year): returns a SAS date value built from numeric month, day, and year values.
  • AnnivBonus=mdy(month(Hire_Date),15,2008);
  • Given the following code, are the correct results produced when the drop statement is placed after the set statement?

    data work.comp;
    set orion.sales;
    drop Gender Salary Job_Title Country Birth_Date Hire_Date;
    Bonus=500;
    Compensation=sum(Salary,Bonus);
    BonusMonth=month(Hire_Date);
    run;
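  • Yes. The drop statement is a compile-time statement: it only flags the variables to omit from the output data set, so the dropped variables remain available for processing in the data step regardless of where the statement appears.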
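The Compensation assignment in the program above uses sum() rather than the + operator, which matters whenever one of the values is missing. A small sketch using the var1 and var2 values from the first item of this chapter (the variable total is added for illustration):

```sas
data _null_;
    var1 = .;
    var2 = 10;
    num   = var1 + var2 / 2;    /* a missing operand makes the result missing */
    total = sum(var1, var2);    /* sum() skips the missing argument */
    put num= total=;
run;
```

The log should show num=. total=10, along with a note that missing values were generated by an operation on missing values.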

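  3. The drop and keep statements select variables after they are brought into the program data vector.
  4. Alternatives to the drop and keep statements are the drop= and keep= data set options, placed in parentheses after a data set name: in the data statement they control which variables are written to the output data set, and in the set statement they control which variables are read from the input data set.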

    data work.comp(drop=Salary Hire_Date);
    set orion.sales(keep=Employee_ID First_Name Last_Name Salary Hire_Date);
    Bonus=500;
    Compensation=sum(Salary,Bonus);
    BonusMonth=month(Hire_Date);
    run;
  5. To execute multiple statements conditionally, group them in do ... end blocks after if-then and else.

    data work.bonus;
        set orion.sales;
        length Freq $ 12;
        if Country='US' then do;
            Bonus=500;
            Freq='Once a Year';
        end;
        else do;
            Bonus=300;
            Freq='Twice a Year';
        end;
    run;
  6. if-then delete: an alternative to the subsetting if statement is a delete statement in an if-then statement.

  • if BonusMonth ne 12 then delete; is equivalent to:
  • if BonusMonth = 12;
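One way to confirm the equivalence is to build the subset both ways and compare the results. This is a sketch that assumes orion.sales contains Hire_Date, as in the earlier examples. The output data set names are made up.

```sas
/* Version 1: delete every observation not hired in December. */
data work.dec_delete;
    set orion.sales;
    BonusMonth=month(Hire_Date);
    if BonusMonth ne 12 then delete;
run;

/* Version 2: the subsetting if keeps only December hires. */
data work.dec_subset;
    set orion.sales;
    BonusMonth=month(Hire_Date);
    if BonusMonth=12;
run;

/* proc compare reports any differences between the two data sets. */
proc compare base=work.dec_delete compare=work.dec_subset;
run;
```

Observations with a missing Hire_Date are excluded by both versions, because a missing BonusMonth is not equal to 12.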

Chapter 10: Combining SAS Data Sets

1.
