Stat 430 Data Analysis Term Project Guidelines.
  (Due Date: Thursday December 18, 4pm)
  (I). The data project should be based on a dataset which you select,
probably downloaded from some public web source, and which I suggest
ought to have at least n=100 observations, a continuous response
variable Y, and at least several other meaningful continuous or
categorical explanatory X-columns. Ideally, since you will be looking
for relationships between the X and Y columns, the source and subject
matter of the data should relate to a topic about which you have some
general knowledge to aid you in asking and answering meaningful
research questions relevant to the data.

  (II). The objective of your data project should be to discover and
present the best fitting regression-type statistical model you can in
SAS to explain the Y responses in your dataset in terms of the X
explanatory variables. So at the outset, you should try to pose
questions about the data relationships whose answers will be
interpretable and expressible in clear language as well as a formal
model. A successful project will relate the research questions to a
regression-type model, use techniques developed in the Stat 430 course
to build the best such model you can for the data and to examine the
adequacy or goodness of fit of the model, and finally (maybe very
briefly) explain what conclusions your model leads to for the data you
studied.

  (III). It is not required that your data analysis project be
"finished" in the sense of necessarily reaching firm conclusions about
a realistic problem, but you should make every effort to showcase
tools learned in the course (of all kinds: histograms, QQ plots,
transformations, data-subsetting as necessary, residual plots and
prediction intervals, standardized residuals and consideration of
outliers, ANOVA, and automatic model-selection techniques) and
demonstrate that you have uncovered all the regression-model structure
of the data that was possible with a reasonable amount of effort.

  (IV). While it is permissible to violate the guidelines in (I)-(II)
somewhat, I strongly urge you to discuss your project with me, before
investing too much effort into it, if you know you want to deviate
much from them. This is mostly in order that I can help you avoid
certain kinds of data (time series where successive observations are
definitely not independent, or survival data where many observations
are "censored" in the sense of not being observed until the health
outcome of main interest, or categorical response-data) where the main
assumptions of our regression models are not tenable.

  (V). The guideline for how much material to hand in is much like the
"Homework Guideline" below. Do not hand in data or any computations or
pictures you do not explicitly refer to in accompanying text. You must
explain the data problem and model-building and solution in words,
with reference to pictures and numerical exhibits. You should hand in
the SAS code as an Appendix, or email it to me as a text-file: but in
either case it should be edited down to the code that worked to do the
analyses and exhibits you are handing in. Hand in no more than 20
printed pages (including tables and pictures) in a reasonably sized
font and spacing.
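
As a rough illustration of the kinds of tools named in (III), here is a
minimal SAS sketch, not a required template. The dataset name PROJ and the
variable names Y, X1, X2, X3 are hypothetical placeholders for your own
data, and all predictors are assumed to be continuous (PROC REG has no
CLASS statement, so categorical predictors would need indicator columns or
a different procedure). The entry and stay levels of 0.05 are arbitrary
illustrative choices.

```sas
/* Hypothetical dataset WORK.PROJ: response Y, continuous predictors X1-X3 */

/* Histogram and normal QQ plot of the response */
proc univariate data=proj;
   var y;
   histogram y;
   qqplot y / normal(mu=est sigma=est);
run;

/* Automatic (stepwise) model selection; 0.05 thresholds are illustrative */
proc reg data=proj;
   model y = x1 x2 x3 / selection=stepwise slentry=0.05 slstay=0.05;
run;
quit;

/* Fit a chosen model, print prediction intervals (CLI), save diagnostics */
proc reg data=proj;
   model y = x1 x2 x3 / cli;
   output out=diag p=yhat r=resid student=stdres;
run;
quit;

/* Standardized residuals against fitted values, to look for outliers */
proc sgplot data=diag;
   scatter x=yhat y=stdres;
   refline 0 / axis=y;
run;
```

Code of this sort, edited down to the steps that produced the exhibits you
actually discuss, is what belongs in the Appendix.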
Here are links for a sample final project assignment, data, and
guidelines from a data analysis course at U of Michigan similar to this
one. If you want to do this assignment for your project, that would be
OK.
The University of Maryland, College Park has a nationally recognized
Code of Academic Integrity, administered by the Student Honor Council.
This Code sets standards for academic integrity at Maryland for all
undergraduate and graduate students. As a student you are responsible
for upholding these standards for this course. It is very important for
you to be aware of the consequences of cheating, fabrication,
facilitation, and plagiarism. For more information on the Code of
Academic Integrity or the Student Honor Council, please click
here.