Chapter 8: Validating and Cleaning Data
- Data errors occur when data values are not appropriate for the SAS statements that are specified in a program. SAS detects data errors during program execution.
 - The `freq` procedure can show if any genders are not `F` or `M` and if any countries are not `AU` or `US`.
 - The `means` procedure can show if any salaries are not in the range of 24000 to 500000.
 - The `univariate` procedure can show if any salaries are not in the range of 24000 to 500000.

```sas
data work.nonsales;
   /* Salary gets an explicit numeric length of 8; without it,
      Salary would share the $ 25 character length of Job_Title */
   length Employee_ID 8 First $ 12
          Last $ 18 Gender $ 1
          Salary 8 Job_Title $ 25
          Country $ 2 Birth_Date
          Hire_Date 8;
   infile 'nonsales.csv' dlm=',';
   input Employee_ID First $ Last $
         Gender $ Salary Job_Title $
         Country $ Birth_Date :date9.
         Hire_Date :date9.;
   format Birth_Date Hire_Date ddmmyy10.;
run;

/* List observations with a missing job title or a birth date after the hire date */
proc print data=work.nonsales;
   var Employee_ID Job_Title Birth_Date Hire_Date;
   where Job_Title = ' ' or Birth_Date > Hire_Date;
run;

/* Frequency tables expose unexpected Gender and Country values */
proc freq data=work.nonsales;
   tables Gender Country;
run;

/* n, nmiss, min, and max expose missing and out-of-range salaries */
proc means data=work.nonsales n nmiss min max;
   var Salary;
run;

proc univariate data=work.nonsales;
   var Salary;
run;
```

- During the processing of every `data` step, SAS automatically creates the following temporary variables:
 - `_N_`, which counts the number of times the `data` step begins to iterate.
 - `_ERROR_`, which signals the occurrence of an error caused by the data during execution. A value of 0 indicates that no error exists; a value of 1 indicates that one or more errors occurred.
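To see the two automatic variables in action, here is a minimal sketch. The data set name `work.demo` and the three inline records are hypothetical, made up for illustration; the second record carries a non-numeric salary so that SAS raises a data error on that iteration.

```sas
/* Hypothetical inline data: the second record has an invalid Salary value */
data work.demo;
   input Employee_ID Salary;
   /* _ERROR_ is 1 when a data error occurred on the current iteration;
      _N_ holds the number of the current iteration */
   if _ERROR_ = 1 then putlog 'Data error on iteration ' _N_;
   datalines;
120101 26600
120102 abc
120103 30000
;
run;
```

The step still runs to completion: SAS sets `Salary` to missing for the bad record, writes a note to the log, and continues with the next record.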
- Which statement best describes the invalid data? Answer: b.
 - a. The data in the raw data file is bad.
 - b. The programmer incorrectly read the data.
 
- To write a SAS date constant, enclose a date in quotation marks in the form `ddMMMyyyy` and immediately follow the final quotation mark with the letter `d`. Example: January 1, 1974 is `'01JAN1974'd`.

```sas
proc print data=orion.nonsales;
   var Employee_ID Birth_Date Hire_Date;
   where Hire_Date < '01JAN1974'd;
run;
```

- The `freq` procedure produces one-way to n-way frequency tables.
 - The `tables` statement specifies the frequency tables to produce. Without it, `proc freq` produces a frequency table for each variable.
 - The `nlevels` option displays a table that provides the number of distinct values for each variable named in the `tables` statement.

```sas
proc freq data=orion.nonsales nlevels;
   tables Gender Country Employee_ID;
run;
```
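Once the frequency tables show that unexpected values exist, a `where` statement can list the offending observations. The following is a sketch that assumes the valid values are exactly `F`/`M` and `AU`/`US`, as stated earlier in these notes:

```sas
/* List observations whose Gender or Country is outside the expected values */
proc print data=orion.nonsales;
   var Employee_ID Gender Country;
   where Gender not in ('F','M') or
         Country not in ('AU','US');
run;
```

Character comparisons are case sensitive, so a lowercase value such as `us` would also be listed here.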
- The `means` procedure produces summary reports displaying descriptive statistics.
 - The `var` statement specifies the analysis variables and their order in the results.
 - By default, the `means` procedure creates a report with `N`, `mean`, `stddev`, `min`, and `max`.

```sas
proc means data=orion.nonsales n nmiss min max;
   var Salary;
run;
```

- The `univariate` procedure produces summary reports displaying descriptive statistics.
 - The `var` statement specifies the analysis variables and their order in the results.
 - Without the `var` statement, SAS analyzes all numeric variables.

```sas
proc univariate data=orion.nonsales;
   var Salary;
run;
```
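The summary statistics only reveal that out-of-range salaries exist. To see which observations are affected, one option is a `proc print` step with a range check; this sketch assumes the valid range of 24000 to 500000 mentioned earlier:

```sas
/* List observations whose Salary falls outside 24000 to 500000 */
proc print data=orion.nonsales;
   var Employee_ID Salary;
   where Salary < 24000 or Salary > 500000;
run;
```

A missing numeric value compares as lower than any number, so observations with a missing `Salary` are listed as well.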
- Interactively cleaning data: the `Viewtable` window enables you to browse, edit, or create SAS data sets interactively.
- Programmatically cleaning data: the `data` step can be used to programmatically clean the invalid data.
- The assignment statement evaluates an expression and assigns the resulting value to a variable: `variable = expression;`. Examples: `Salary = 26960;`, `Hire_Date = '21JAN1995'd;`, `Country = upcase(Country);`
- The `if-then-else` statement executes a SAS statement for observations that meet specific conditions.

```sas
data work.clean;
   set orion.nonsales;
   Country=upcase(Country);
   if Employee_ID=120106 then Salary=26960;
   else if Employee_ID=120115 then Salary=26500;
   else if Employee_ID=120191 then Salary=24015;
   else if Employee_ID=120107 then Hire_Date='21JAN1995'd;
   else if Employee_ID=120111 then Hire_Date='01NOV1978'd;
   else if Employee_ID=121011 then Hire_Date='01JAN1998'd;
run;
```

- What are the two phases of DATA step processing?: Compilation and Execution
- What is a program data vector (PDV)?: A logical area in memory where SAS holds the current observation
 - What is an instruction that SAS uses to read data values into a variable?: An informat
 - When would you use a : modifier?: You use a : modifier with nonstandard raw data that requires list input and an informat
 
Chapter 9: Manipulating Data
- If an operand is missing for an arithmetic operator, the result is missing. Example: if `var1 = .` and `var2 = 10`, then `num = var1 + var2 / 2` sets `num` to `.` (missing).
- `sum`: returns the sum of all arguments.
- `year`, `qtr`, `month`, `day`, `weekday`: extract pieces from a SAS date.
- `today()`: returns the current date as a SAS date value.
- `mdy(month, day, year)`: returns a SAS date value. Example: `AnnivBonus=mdy(month(Hire_Date),15,2008);`
- Given the following code, are the correct results produced when the `drop` statement is placed after the `set` statement?

```sas
data work.comp;
   set orion.sales;
   drop Gender Salary Job_Title Country Birth_Date Hire_Date;
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
```

Yes. The `drop` statement specifies the names of the variables to omit from the output data set, so the dropped variables are still available for processing within the step.
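The missing-value rule and the date functions from the start of this chapter can be tried in a small `data _null_` step. This is only a sketch: the variable names `num_plus` and `num_sum` and the sample hire date are hypothetical, chosen for illustration.

```sas
data _null_;
   var1 = .;
   var2 = 10;
   /* An arithmetic operator with a missing operand yields missing */
   num_plus = var1 + var2 / 2;
   /* The sum function ignores missing arguments */
   num_sum = sum(var1, var2 / 2);
   put num_plus= num_sum=;

   /* Build a date from the hire month, day 15, and year 2008 */
   Hire_Date = '21JAN1995'd;
   AnnivBonus = mdy(month(Hire_Date), 15, 2008);
   put AnnivBonus= date9.;
run;
```

The log should show `num_plus=.` next to `num_sum=5`, which is the reason the earlier example computes `Compensation` with `sum(Salary,Bonus)` rather than `Salary + Bonus`.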
- The `drop` and `keep` statements select variables after they are brought into the program data vector.
- Alternatives to the `drop` and `keep` statements are the `drop=` and `keep=` data set options, which can be placed on a data set name in the `data` statement or, as in the example here, in the `set` statement.

```sas
data work.comp(drop=Salary Hire_Date);
   set orion.sales(keep=Employee_ID First_Name Last_Name Salary Hire_Date);
   Bonus=500;
   Compensation=sum(Salary,Bonus);
   BonusMonth=month(Hire_Date);
run;
```

- Multiple executable statements are allowed in `if-then do / else do ... end` statements.

```sas
data work.bonus;
   set orion.sales;
   length Freq $ 12;
   if Country='US' then do;
      Bonus=500;
      Freq='Once a Year';
   end;
   else do;
      Bonus=300;
      Freq='Twice a Year';
   end;
run;
```

- `if-then delete`: an alternative to the subsetting `if` statement is the `delete` statement in an `if-then` statement. `if BonusMonth ne 12 then delete;` is equivalent to `if BonusMonth = 12;`
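One way to convince yourself that the two forms are equivalent is to run both against the same input and compare the results. The sketch below uses a hypothetical data set `work.months` with made-up values; `proc compare` should report no differences between the two outputs.

```sas
/* Hypothetical sample data for illustration */
data work.months;
   input Employee_ID BonusMonth;
   datalines;
120102 6
120103 12
120121 12
120122 3
;
run;

/* Version 1: delete every observation that is not in month 12 */
data work.dec_delete;
   set work.months;
   if BonusMonth ne 12 then delete;
run;

/* Version 2: subsetting if keeps only observations in month 12 */
data work.dec_subset;
   set work.months;
   if BonusMonth = 12;
run;

/* Compare the two results observation by observation */
proc compare base=work.dec_delete compare=work.dec_subset;
run;
```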
Chapter 10: Combining SAS Data Sets
1.