Saturday, 19 September 2015

Learn SAS by examples - Ron Cody - solutions of some even number problems of chapter 7 to chapter 15

In this blog I am trying to demonstrate the solutions of few problems from Ron Cody's book - "Learn SAS by example".
I will try to explain the code for the solutions.

My explanation will follow the following pattern
* chapter number- problem number
Problem in brief
Dataset used : - which data set is used to solve the problem
Condition applied on which column :- important column in that dataset to solve the problem
Important operators :- Important operators used to solve the problem
Solution Explation : Approach taken to solve the problem





Note : please refer the book for more details on the question.
Get all the datasets required here.

Let us start with chapter seven problem number 2.
First you have to run the libname statement to create a library that stores the codes permanently

libname a15010 "/folders/myfolders/Learning SAS by Example";
*7-2
The following program demonstrates printing of obervations with subject value 5,100,150,200 using proc print.
This functionality is carried out with two different operators "in" and "or".
This is demonstrated in the below program.

*7-2; title "Data for 4 subjects - 5,100,150,200"; proc print data=a15010.a10_hosp; /* using multiple or statements */ *where Subject = 5 or Subject = 100 or Subject = 150 or Subject = 200; /* using in statement */ where Subject in(5,100,150,200); run;

Dataset Used : - Hosp
Condition applied on column name - Subject
important Operators - "in","or"
solution explanation :in operator takes all the values specified iside the bracket.
Or operator checks whether the value of subject is 5 or value of subject is 10 and so on.
we use proc print to print the dataset






Output:



*7-6
Using Sales dataset given we have to list all observations of north region whose quantity is less than 60.
Moreover, in this list include any observations with customer name Pet's are Us.

Dataset Used : - Sales
Condition applied on column name - region,quantity and CustomerName
important Operators - "where","if-else"
Solution Explanation :- If else condition along with and are applied to check the above mentioned conditions.
and where is also used to add all the customers with the given customer name

*7-6; title "Selected Sales Observations"; proc print data=a15010.a10_sales noobs; where Region eq "North" and Quantity le 60 or Customer eq "Pet's are Our's"; run;


Output:




*8-4

Next problem demonstrates counting of missing values for A, B, and C in the Missing data
set.


Dataset Used : - Missing
Condition applied on column name - A,B,C
important Operators/functions - "Missing()"
solution explanation :using missing function we have stored the values in three respective variables for A,Band C.
Added the cumulative number of missing values to each observation by using three different variables.


* 8 -4; data a15010.a10_missing; input A $ B $ C $; if missing(A) then MissA + 1; if missing(B) then MissB + 1; if missing(C) then MissC + 1; datalines; X Y Z X Y Y Z Z Z X X . Y Z . X . . ; run; title "printing number of missing values"; proc print data= a15010.a10_missing; run;

Output:




*8-10
In the below problem we have been given 10 results for three speed-reading methods (A, B, and C)
Here are the results:

250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399

we have to list three lines and Score.

Dataset used - Created temporary data set

important Operators/functions - "do loop"

solution explanation :Created a temporary SAS data set to store the given observations.
Used single trailing @ in our INPUT statement to indicate the starting point for each input.
Then applied do loop to get read the values and calculate the score.


*8-10; data Speed; length Method $ 1; do Method = 'A','B','C'; do line= 1 to 10; input spd @; output; end; end; datalines; 250 255 256 300 244 268 301 322 256 333 267 275 256 320 250 340 345 290 280 300 350 350 340 290 377 401 380 310 299 399 ; title "Speed Reading"; proc print data=Speed; run;

Output :



*8-14

In the below program we have generated a table of integers whose squares squares start at 1 and end when the square
value is greater than 100. Use either a DO UNTIL or DO WHILE statement to
accomplish this.


Dataset used - Created temporary data set

important Operators/functions - "do while" "do until"

solution explanation :Created a temporary SAS data and calculated the value of squares of integer starting from 1 using do while loop and do until loop.
the while statement is used to state the condition. Our condition here is that when square value is greater than hundered then exit the do loop.


*8-14; data squares; do num = 1 to 20 until (sqr ge 100); sqr = num * num; output; end; run; title "listing of squares till 100"; proc print data = squares; run; /* Note : •do while: evaluation at top of loop •do until: evaluation at bottom of loop */


Output:




* 9-4.
Let us try out few date computation problems. In the problem given below we have to calculate age of the subjects in an hosiptal.
His age till 1 Jan 2006 and his age today.


Dataset used : Hosp
Importane attributes : DOB
important operators/functions/statements : "yrdiff()"

solution explanation :We computed the age of the person by taking the difference between the two dates i.e. 1st Jan 2006 and the DOB.
The second date was calculated by taking the difference between todays date and DOB.
We used nobs =10 to list the first 10 observaartions.
Format statement is used to display the variables in the format you want.


*9-4; data ages; set a15010.a10_hosp; AgeJan1 = yrdif(DOB,'01Jan2006'd,'Actual'); AgeToday = yrdif(DOB,today(),'Actual'); run; title "Listing of AGES"; proc print data=ages; var DOB AgeJan1 AgeToday; format AgeJan1 AgeToday 5.1; run;

Output:



*9-6

In this program we compute frequencies for the days of the week for the date
of the visit (VisitDate) in Medical data set, . Supply a format for the days of the week and months of the
year.

Dataset used : Medical
Importane attributes : VisitDate
important operators/functions/statements : format , "day()" "month()"

solution explanation :We calculated the day of the week and month of the year using day() and moonth function.
Format statement is used to display the variables in the format you want.


*9-6; data freqvisit; set a15010.medical (keep=VisitDate); Day = weekday(Visitdate); Month = month(VisitDate); Year = year(VisitDate); run; proc format; value days 1='Sunday' 2='Monday' 3='Tuesday' 4='Wednesday' 5='Thursday' 6='Friday' 7='Saturday'; value months 1='January' 2='Februaru' 3='March' 4='April' 5='May' 6='June' 7='July' 8='August' 9='September' 10='October' 11='November' 12='December'; run; title "Frequencies of Visit Date"; proc freq data=freqvisit; tables Day Month Year / nocum nopercent; format Day days. Month months.; run;

Output:



* 10 - 2.
Listing all the observations from hosp dataset where the admission date (AdmitDate) falls on
a Monday and the year is 2002 and compute the person’s age as of the admission date, rounded to the nearest year.

Dataset used : Hosp
Importane attributes : Admitdate
important operators/functions/statements : yrdiff, round(),day().

solution explanation :We calculated the day of the week and check it is monday and yeae is 2002 by year and day functions.
then using yrdif function we calculated the age of the function by passing admit date and dob as arguments.
Finall we rounded the age to the nearest integer by using round function.


*10-2; data Monday2002; set a15010.a10_hosp; where weekday(AdmitDate)=1 and year(AdmitDate)=2002 ; Age = round(yrdif(DOB,AdmitDate,'Actual')); run; title "listing of all patients on mondays of year 2002"; proc print data = Monday2002; run;

Output:



* 10-4
Following program lists all the observations from Bicycles where Country is USA and
Model is Mountain and also list Bicycles where
Country is France and Model is Road Bike


Dataset used : Bicycles
Importane attributes : country , model
important operators/functions/statements : set,merge

solution explanation :We use set statement to create tpmorary datasets. We apply if conditions to get th desired result.



*10-4; data Mountain_USA ; set a15010.a10_bicycles; where Country ='USA' and Model = 'Mountain Bike'; run; data Road_France ; set a15010.a10_bicycles; where Country = 'France' and Model = 'Road Bike'; run; title "Listing bicycles in usa with model mountain"; proc print data=Mountain_USA; run; title "Listing bicycles in France with model Road Bike"; proc print data=Road_France; run;

Output:




*11- 2

The below peoblem illustrates Counting the number of missing values for WBC, RBC, and Chol in the Blood data set.
Use the MISSING function to detect missing values.

Dataset used : Blood
Importane attributes : WBC,RBC,Chol
important operators/functions/statements : missing()

solution explanation :Used missing function to find if there are ny missing values in blood dataset w.r.t. WBC,RBC and Chol column.
If present then it is depcted by 1 else 0.




*11-2; data missing_values; set a15010.blood; if missing(WBC) then MissWBC + 1; if missing(RBC) then MissRBC + 1; if missing(Chol) then MissChol + 1; run; Title "Listing missing values"; proc print data=missing_values; run;

Output:




*11-4.

The SAS data set Psych contains an ID variable, 10 question responses (Ques1–
Ques10), and 5 scores (Score1–Score5). You want to create a new, temporary SAS
data set (Evaluate) containing the following:

a. A variable called QuesAve computed as the mean of Ques1–Ques10. Perform
this computation only if there are seven or more non-missing question values.

b. If there are no missing Score values, compute the minimum score (MinScore),
the maximum score (MaxScore), and the second highest score (SecondHighest).

Dataset used : Psych
Importane attributes : score1-socre5
important operators/functions/statements : min(),max(),largest()

solution explanation :used nmiss variable to calculate total missing values and then calculated the avg of non missiing values .
again checked for the missing values in score1-5 if not then calulated minimum maximum and 2nd maximum score using three functions listed above.

In the final result you will only observe avg score min score and max score column as keep statement iss used t keep only these columns to sgow the output

*11-4; data Evaluate; set a15010.a10_Psych; if nmiss(of ques1-ques10) le 3 then do; QuesAve = mean(of ques1-ques10); end; if nmiss(of score1-score5) eq 0 then do; Min_score = min(of score1-score5); Max_score2 = largest(2,of score1-score5); Max_score = max(of score1-score5); end; keep QuesAve Min_score Max_score2 Max_score; run; proc print data=Evaluate; run;

Output:



*12-2
Using the data set Mixed, create a temporary SAS data set (also called Mixed) with
the following new variables:
a. NameLow – Name in lowercase
b. NameProp – Name in proper case
c. (Bonus – difficult) NameHard – Name in proper case without using the
PROPCASE function

Dataset used : Mixed
Importane attributes : Name
important operators/functions/statements : lowcase(),propcase()

solution explanation :for part a and b propcase and lowcase character functions convert your supplied argument into lower case and proper case respectivelyy.
propcase means first letter of the string should be capital.
For part c. there is a logic you need to appl. The logic is Scan off each 'word' based upon a space, locate the position
of the first letter in the original string, UPCASE the first  letter and use SUBSTR to replace the byte





*12-2; data Mixed; set a15010.a10_mixed; NameLow = LOWCASE(Name); NameProp = PROPCASE(Name); /*NameHard – Name in proper case without using the PROPCASE function*/ NameHard =compbl(lowcase(Name)); length next $ 20; i=0; next=scan(NameHard ,1,' '); do while(next ne ' '); i+1; /* Scan off each 'word' based upon a space, locate the position */ /* of the first letter in the original string, UPCASE the first */ /* letter and use SUBSTR to replace the byte. */ pos=indexw(NameHard ,trim(next)); substr(NameHard ,pos,1)=upcase(substr(NameHard ,pos,1)); next=scan(NameHard ,i,' '); end; keep NameLow NameProp NameHard; run; title "listing of dataset mixed"; proc print data = mixed; run;

Output:



*12- 4
Data set Names_And_More contains a character variable called Height. As you can
see in the listing in Problem 3, the heights are in feet and inches. Assume that these
units can be in upper- or lowercase and there may or may not be a period following
the units. Create a temporary SAS data set (Height) that contains a numeric variable
(HtInches) that is the height in inches.


Dataset used : Names_And_More
Importane attributes : Height
important operators/functions/statements : compress(),scan()

solution explanation : In height variable we hav ht in two parts. one specifying ft and other inches.
we separate these two parts using scan function. Left of " " will have feet value and right will have inches.
then we compres these values to keep text and delete the character content.and then calculate height in inches by
multiplying ht in feet by 12 and adding the inches part.



*12-4; data Height; set a15010.a10_names_and_more(keep=Height); Feet = scan(Height,1,' '); Inches = scan(Height,2,' '); Feet = compress(Feet,,'kd'); Inches= compress(Inches,,'kd'); if missing(Inches) then Inches =0; HtInches = 12*Feet + Inches; run; title "Listing of Height in inches"; proc print data=Height noobs; run;

Output:



*13-2

Using the SAS data set Survey2, create a new, temporary SAS data set (Survey2)
where the values of the variables Ques1–Ques5 are reversed as follows: 1 - 5; 2
- 4; 3 - 3; 4 - 2; 5 - 1.
Note: Ques1–Ques5 are character variables.

Dataset used : Survey2
Importane attributes : Ques1-Ques5
important operators/functions/statements : array , translate()
solution explanation : We acomplish this task using array.using translate functio we reverse the values of the questions






*13-2; data survey2; set a15010.a10_survey2; array Ques{5} Q1-Q5; do i = 1 to 5; Ques{i} = translate(Ques{i},'54321','12345'); end; drop i; run; title "List of Survey 2"; proc print data=survey2; run;

Output:



*13-4

Data set Survey2 has five numeric variables (Q1–Q5), each with values of 1, 2, 3, 4,
or 5. You want to determine for each subject (observation) if they responded with a
5 on any of the five questions. This is easily done using the OR or the IN operators.
However, for this question, use an array to check each of the five questions. Set
variable (ANY5) equal to Yes if any of the five questions is a 5 and No otherwise.

Dataset used : Survey2
Importane attributes : Ques1-5
important operators/functions/statements : _NUMERIC_

solution explanation : we accomplish this task using numeric array.chec for ieach content in arra if it is equal to five
if yes then print yes else print no




*13-4; data a15010.a10_survey2; Any5='No'; array five{*} _NUMERIC_; do i = 1 to dim(five); if five{i} = 5 then ANY5 = 'yes'; end; drop i; run; "Printing value of any5 variable"; proc print data= a15010.a10_survey2 noobs; run;


Output :


*14- 2

Using sales dataset we have to create a report using east and west region sales. and add them up w.r.t region.

Dataset used : Sales
Importane attributes : region,quantity,totalsales
important operators/functions/statements : proc sort, where,sum,var

solution explanation : We sort the sales dataa using regionand then print the records of east and west region
using where statement in proc print.To achieve the region wise sum we use sum statement in proc print.




*14-2; proc sort data=a15010.a10_sales; by region; run; title "Sales figure from sales data set"; proc print data=a15010.a10_sales; by region; id region; where region in ("East","West"); var Quantity TotalSales; sum Quantity TotalSales; run;

Output:




*14- 3
Using proc print we have to list records in september and calculate number of entries at the bottom and the patients age.
Without  doing a data step.

Dataset used : hosp
Importane attributes : AdmitDate,Dischargedate,DOB
important operators/functions/statements : where , label

solution explanation : We use where statement to achieve he desired output and we have also labelled the column.




*14-3; title "Selected Patients from HOSP Data Set"; title2 "Admitted in September of 2004"; title3 "Older than 83 years of age"; title4 "--------------------------------------"; proc print data=a15010.a10_hosp n='Number of Patients = ' label double; where Year(AdmitDate) eq 2004 and Month(AdmitDate) eq 9 and yrdif(DOB,AdmitDate,'Actual') ge 83; id Subject; var DOB AdmitDate DischrDate; label AdmitDate = 'Admission Date' DischrDate = 'Discharge Date' DOB = 'Date of Birth'; run;

Output:




*15-3

Using the Hosp data set, create the report where Age should be computed using
the YRDIF function and rounded to the nearest integer.


Dataset used : hosp
Importane attributes : AdmitDate,DOB
important operators/functions/statements : proc report , coulmn ,define , rbreak

solution explanation : We accomplish this task using proc report to create a report
Proc report has some components.COLUMN statement serves a similar function to the VAR statement of
PROC PRINT—it allows you to select which variables you want in your report and the
order that they appear.
DEFINE statement to specifies the usage for each variable




*15-3; title "Demonstrating a Compute Block"; proc report data=a15010.a10_hosp(obs=5) nowd headline; column Subject AdmitDate DOB Age; define AdmitDate / display "Admission Date" width=10; define DOB / display; define Subject / display width=7; define Age / computed "Age at Admission" ; compute Age; Age = round(yrdif(DOB,AdmitDate,'Actual')); endcomp; run; quit;

Output:



*15- 4

Using the SAS data set BloodPressure, compute a new variable in your report. This
variable (Hypertensive) is defined as Yes for females (Gender=F) if the SBP is
greater than 138 or the DBP is greater than 88 and No otherwise. For males
(Gender=M), Hypertensive is defined as Yes if the SBP is over 140 or the DBP is over
90 and No otherwise.

Dataset used : BloodPressure
Importane attributes : DBP,SBP,Hypertensive
important operators/functions/statements : proc report, compute

solution explanation : We use if else statement inside report statement to match the given conditions.
To compute value for new variable called hypertensive we use compute statement inside proc report.
*15-4; proc report data=a15010.a10_bloodpressure nowd headline; column Gender SBP DBP Hypertensive; define Gender / group width=8; define SBP / display; define DBP / display; define Hypertensive / computed "Hypertensive?"; compute Hypertensive/character length=6; If gender eq "F" then if SBP > 138 or DBP > 88 then Hypertensive = "yes" ; else Hypertensive = "no"; else if SBP > 140 or DBP > 90 then Hypertensive = "yes"; else Hypertensive = "no"; endcomp; run; quit;


Output :


I hope that this blog helps you.

2 comments:

  1. "Beautify" the programs before posting. There a menu item in SAS studio, which does it for you. Also, you may like to post the programs, the way you have posted the result.

    ReplyDelete
    Replies
    1. Sir, Thank you for the reply. I will make the suggested changes.

      Delete