Old Statistics Seminar Schedules
Contents
Seminar Talks 2011-2005
(Fall 2011, Seminar No. 1)
SPEAKER: Dr. Michael Sverchkov
Bureau of Labor Statistics
Washington, DC 20212-0001, USA
TITLE:
Model-Based and Semi-Parametric Estimation of Time Series Components
and Mean Square Error of Estimators
TIME AND PLACE:
September 8, 2011, 3:30pm
Room 1313, Math Bldg.
ABSTRACT:
This paper will focus on time-series analysis and more specifically,
on estimation of seasonally adjusted and trend components and the mean
square error (MSE) of the estimators. We shall compare the component
estimators obtained by application of the
X-11 ARIMA method with estimators obtained by fitting state-space
models that account more directly for correlated sampling errors.
The component estimators and
MSE estimators are obtained under a different definition of the target
components.
Under this definition, the unknown components are defined to be the X-11
estimates that would be obtained in the absence of sampling errors, and if the
time series under consideration were long enough for application of the
symmetric filters embedded in this procedure. We
propose new MSE estimators with respect to this definition.
The performance of the
estimators is assessed by using simulated series that approximate a real series
produced by the Bureau of Labor Statistics in the U.S.A.
Key words: Bias correction, Mean-Square error, Seasonal Adjustment, Trend.
(Fall 2011, Seminar No. 2)
SPEAKER: Prof. Abram Kagan
University of Maryland
College Park, MD 20742, USA
TITLE:
Subsufficient algebra related to the structure of UMVUEs
TIME AND PLACE:
September 22, 2011, 3:30pm
Room 1313, Math Bldg.
ABSTRACT:
Statistics T(X) (or, more generally, subalgebras) with the property
that any function of T(X) is a UMVUE are studied.
Though they are functions of the minimal sufficient statistic, the
construction in the case of categorical X is entirely different
from that of the minimal sufficient statistic.
(Fall 2011, Seminar No. 3)
SPEAKER: Dr. Tucker McElroy
Center for Statistical Research and Methodology, U. S. Census Bureau
TITLE:
Signal Extraction for Nonstationary Multivariate Time Series With
Applications to Trend Inflation
TIME AND PLACE:
October 6, 2011, 3:30pm
Room 1313, Math Bldg.
ABSTRACT:
We advance the theory of signal extraction by developing the optimal
treatment of nonstationary vector time series that may have common trends. We present new formulas for
exact
signal estimation for both theoretical bi-infinite and finite samples.
The formulas reveal the specific roles of inter-relationships among
variables for sets of optimal filters, which makes fast and direct
calculation feasible, and shows rigorously
how the optimal asymmetric filters are constructed near the end points for a
set of series. We develop a class of model-based low-pass filters
for trend estimation and illustrate the methodology by studying
statistical estimates of trend inflation.
(Fall 2011, Seminar No. 4)
SPEAKER: Professor Radu Balan
University of Maryland
College Park, MD 20742, USA
TITLE: A Regularized Estimator and the
Cramer-Rao Lower Bound for a Nonlinear Signal Processing
Problem
TIME AND PLACE:
October 13, 2011, 3:30pm
Room 1313, Math Bldg.
ABSTRACT:
In this talk we present an algorithm for signal reconstruction from the
absolute values of frame coefficients. We then compare its performance to
the Cramer-Rao Lower Bound (CRLB) at high signal-to-noise ratio. To fix
notation, assume {f_i; 1 <= i <= m} is a spanning set (hence a frame) in
R^n. Given noisy measurements d_i = |<x, f_i>|^2 + \nu_i,
1 <= i <= m, the problem is to recover x \in R^n up to a global sign. In
this talk the reconstruction algorithm solves a regularized least
squares criterion.
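Purely as an illustrative assumption (not necessarily the exact criterion analyzed in the talk), a regularized least squares objective for this problem might take the form

J(x) = \sum_{i=1}^{m} ( |<x, f_i>|^2 - d_i )^2 + \lambda ||x||^2,

to be minimized over x \in R^n, where \lambda > 0 is a regularization parameter.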
(Fall 2011, Seminar No. 5)
SPEAKER: Jiraphan Suntornchost
University of Maryland
College Park, MD 20742, USA
TITLE: Modeling U.S. Cause-Specific Mortality Rates Using an
Age-Segmented Lee Carter
TIME AND PLACE:
November 3, 2011, 3:30pm
 
Room 1313, Math Bldg.
ABSTRACT:
In many demographic and public-health applications, it is important to
summarize mortality curves and time trends from population-based
age-specific mortality data collected over successive years, and this is
often done through the well-known model of Lee and Carter (1992). In
this paper, we propose a modification of the Lee-Carter model which
combines an age-segmented Lee-Carter model with spline-smoothed
period-effects within each age segment. With different period-effects
across age-groups, the segmented Lee-Carter is fitted by using iterative
penalized least squares and Poisson Likelihood methods. The new methods
are applied to the 1971-2006 public-use mortality data sets released by
the National Center for Health Statistics (NCHS). Mortality rates for
three leading causes of death, heart diseases, cancer and accidents, are
studied in this research. The results from data analysis suggest that
the age-segmented method improves the performance of the Lee-Carter
method in capturing period-effects across ages.
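For reference, the classical Lee-Carter (1992) model that the age-segmented version builds on specifies, for the central death rate m_{x,t} at age x in year t,

log m_{x,t} = a_x + b_x k_t + e_{x,t},

where a_x is the baseline age pattern of log-mortality, k_t is a common period effect, b_x measures how strongly each age responds to k_t, and e_{x,t} is an error term. The age-segmented version described above allows separate, spline-smoothed period effects within each age segment.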
(Fall 2011, Seminar No. 6)
SPEAKER: Dr. Yuan Liao
Princeton University
USA
TITLE: Theory and Applications of High Dimensional Covariance Matrix
Estimation
TIME AND PLACE:
November 10, 2011, 3:30pm
 
Room 1313, Math Bldg.
ABSTRACT:
Due to the abundance of high dimensional data in modern scientific
research, the estimation of large covariance matrices becomes an important
question in many application areas of statistical analysis. In these
applications, the sample sizes can be very small relative to the
dimensions. We give an overview of these applications. Examples include
finance, genetic data, brain imaging, climate study, and many others. The
recent advances in random matrix theory showed that without regularization
the traditional sample covariance performs poorly when estimating a large
covariance matrix. In this talk, we demonstrate that both of the two
popular regularization methods in the literature, directly exploiting
sparsity and assuming a strict factor structure, are restrictive and
inappropriate in many applied problems.
We estimate the covariance matrix using an Approximate Factor Model, which
is one of the commonly used tools for dimension reduction. By assuming
sparse error covariance, we allow the presence of the cross-sectional
correlation among the noises even after common factors are taken out.
Therefore, this approach enables us to combine the merits of the methods
based on either sparsity or the factor model. We estimate the sparse noise
covariance using the adaptive thresholding technique as in Cai and Liu
(2011), taking into account the fact that in many cases direct observations
of the noise components and the common factors are both unavailable. The
convergence rate of the estimated covariance matrices under various norms
is derived. In particular, the rate is optimal when the number of factors
is bounded. It is shown that the effect of estimating the unknown factors
is negligible when the dimensionality is large enough, and thus we can
treat the common factors as if they were known. Finally, an empirical
example of financial portfolio allocation is presented.
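A minimal numerical sketch of the general strategy described above (principal-component factor estimates plus thresholding of the residual covariance) is given below. It is an illustration only, assuming a simple hard threshold rather than the adaptive thresholding of Cai and Liu (2011) or the exact estimator studied in the talk.

```python
# Sketch of a factor-model covariance estimator with thresholding of the
# residual covariance. Illustrative only: a fixed hard threshold is used,
# not the adaptive thresholding or tuning analyzed in the talk.
import numpy as np

def factor_threshold_cov(X, n_factors, tau):
    """X: T x p data matrix; n_factors: assumed number of common factors;
    tau: hard-threshold level for the residual covariances."""
    T, p = X.shape
    Xc = X - X.mean(axis=0)                     # center the data
    S = Xc.T @ Xc / T                           # p x p sample covariance
    vals, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_factors]    # leading principal components
    B = vecs[:, idx] * np.sqrt(vals[idx])       # estimated factor loadings
    low_rank = B @ B.T                          # common-factor part
    R = S - low_rank                            # residual ("noise") covariance
    R_thr = np.where(np.abs(R) >= tau, R, 0.0)  # threshold small entries
    np.fill_diagonal(R_thr, np.diag(R))         # keep the diagonal intact
    return low_rank + R_thr

rng = np.random.default_rng(0)
Sigma_hat = factor_threshold_cov(rng.standard_normal((200, 50)), n_factors=3, tau=0.1)
```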
(Fall 2011, Seminar No. 7)
SPEAKER: Professor Grace Yang
University of Maryland
College Park, MD USA
TITLE: Neyman, Markov Processes and Survival Analysis
TIME AND PLACE:
December 1, 2011, 3:30pm
 
Room 1313, Math Bldg.
ABSTRACT:
Neyman used stochastic processes extensively, particularly Markov processes, in his applied work. One
example is the comparison of different treatments of breast cancer, work that gives rise to the celebrated
Fix-Neyman competing risks model. We review the Fix-Neyman model and one of its extensions to a
non-parametric analysis made by Altshuler (1970), and connect the Fix-Neyman model with the current
development of survival analysis. We shall illustrate that the Markov model provides a
general approach to study many of the problems in survival analysis.
(Spring 2011, Seminar No. 1)
SPEAKER: Prof. Jian-Jian Ren
University of Central Florida
Orlando, FL 32816-8005, USA
TITLE:
Weighted Empirical Likelihood, Censored Data and Logistic Regression
TIME AND PLACE:
February 10, 2011, 9:30am (NOTE the TIME)
Colloquium Room 3206, Math Bldg
ABSTRACT:
In this talk, we will review the concepts of parametric likelihood
and the maximum likelihood estimator (MLE), and will review the concepts
of nonparametric likelihood, called empirical likelihood (Owen, 1988), and
the nonparametric MLE. We then introduce a new likelihood function, called
weighted empirical likelihood (Ren, 2001, 2008), which is formulated in a
unified form for various types of censored data. We show that the weighted
empirical likelihood method provides a useful tool for solving a broad class
of nonparametric and semiparametric inference problems involving
complicated types of censored data, such as doubly censored data, interval
censored data, partly interval censored data, etc. These problems are
mathematically challenging and practically important due to applications in
cancer research, AIDS research, etc. As an example, some related new
statistical methods and data examples on the logistic regression model
with censored data will be presented.
(Spring 2011, Seminar No. 2)
SPEAKER: Prof. Abram Kagan
University of Maryland
College Park, MD 20742, USA
TITLE:
Semiparametric Estimation for Kernel Families
TIME AND PLACE:
February 17, 2011, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
A non-negative function h(x;\theta), called a kernel, generates a
parametric family of probability distributions on the measurable space of
x-values with \theta as a parameter. Namely, any probability measure P
such that E_{P}[h(X;\theta)]<\infty (we call P a generator) generates a
family P_{\theta} with
dP_{\theta}(x)=C(\theta)h(x;\theta)dP(x)
where C(\theta) is the normalizing function.
The kernel families (of which the NEFs are a special case) are a natural
model for semiparametric estimation. The semiparametric ML and MM
estimators are developed and their behavior studied in large samples.
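As a concrete special case connecting with the NEF remark above, taking the kernel h(x; \theta) = exp(\theta x) gives

dP_\theta(x) = C(\theta) e^{\theta x} dP(x),

which is exactly the natural exponential family generated by P, with C(\theta) = 1 / \int e^{\theta x} dP(x).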
(Spring 2011, Seminar No. 3)
SPEAKER: Dr. Martin Ehler
Section of Medical Biophysics, NICHD
Bethesda, MD 20892, USA
TITLE:
Random tight frames in compressive sampling and directional
statistics
TIME AND PLACE:
February 24, 2011, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Distinguishing between uniform and non-uniform sample distributions is a
common problem in directional data analysis; however, for many tests there
exist non-uniform distributions for which uniformity is not rejected. By merging
directional statistics with frame theory and introducing probabilistic
frames, we find that probabilistic tight frames yield non-uniform
distributions that minimize directional potentials and for which the
Bingham test fails to reject uniformity.
Moreover, it is known that independent, uniformly distributed points on
the sphere approximately form a finite unit norm tight frame. In fact, we
verify that points chosen from any probabilistic tight frame approximately
form a finite tight frame; points do not have to be uniformly distributed,
nor have unit norm. We also observe that classes of random matrices used
in compressed sensing are induced by probabilistic tight frames.
Finally, we apply our results to model patterns found in granular rod
experiments.
(Spring 2011, Seminar No. 4)
SPEAKER: Prof. Hector Corrada Bravo
University of Maryland
College Park, MD 20742, USA
TITLE:
Genomic anti-profiles: modeling gene expression and DNA
methylation variability in cancer
populations for prediction and prognosis
TIME AND PLACE:
March 10, 2011, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Predictive models of disease based on genomic measurements,
e.g., gene expression or DNA methylation,
usually focus on finding
distinct representative profiles for healthy and diseased populations.
However, some diseases, such as cancer,
exhibit increased heterogeneity in the disease population. In this
talk, I will discuss recent results and methods
that use the idea of anti-profiles, based on the observation of
increased variation in cancer populations,
as predictive and prognostic models.
(Spring 2011, Seminar No. 5)
SPEAKER: Prof. Armand Makowski
University of Maryland
College Park, MD 20742, USA
TITLE:
Recent results for random key graphs: Connectivity, triangles, etc.
TIME AND PLACE:
March 17, 2011, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Random key graphs, also known as uniform random intersection graphs,
appear in application areas as diverse as clustering analysis,
collaborative filtering in recommender systems and key distribution
in wireless sensor networks (WSNs). In this last context
random key graphs are naturally associated with a random key
predistribution scheme proposed by Eschenauer and Gligor.
In this talk we present some recent results concerning the structure
of random key graphs. Similarities and differences with Erdos-Renyi
graphs are given. We also discuss performance implications for the
scheme of Eschenauer and Gligor. Highlights include:
(i) A zero-one law for graph connectivity (and its critical scaling)
as the number of nodes becomes unboundedly large; (ii) A zero-one law
(and its critical scaling) for the appearance of triangles; and
(iii) Clustering coefficients and the "small world" property
of random key graphs.
This is joint work with Ph.D. student Osman Yagan.
(Spring 2011, Seminar No. 6)
SPEAKER: Prof. Maria Cameron
University of Maryland
College Park, MD 20742, USA
TITLE:
Computing Transition Paths for Rare events
TIME AND PLACE:
April 14, 2011, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Methods will be discussed for computing the transition paths
between metastable states in the systems evolving according to
the Ito-type SDE's. These include
(a) the calculation of the
quasipotential and Freidlin's cycles, and
(b) the alternative so-called MaxFlux
functional approach.
These techniques will be demonstrated on examples coming from physics and
physical chemistry.
(Spring 2011, Seminar No. 7)
SPEAKER: Prof. Prakash Narayan
University of Maryland
College Park, MD 20742, USA
TITLE:
Data Compression, Secrecy Generation and Secure Computation
TIME AND PLACE:
April 21, 2011, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
This talk addresses connections between the information theoretic notion
of multiterminal data compression, secrecy generation
and secure function computation. It is based on joint works with Imre
Csiszar, Himanshu Tyagi and Chunxuan Ye.
Consider a situation in which multiple terminals observe separate but
correlated signals and seek to devise a secret key through
public communication that is observed by an eavesdropper, in such a way
that the key is concealed from the eavesdropper. We
show how this problem is connected to a multiterminal data compression
problem (without secrecy constraints), and illustrate the
connection with a simple key construction. Next, consider the situation
in which the same set of terminals seek to compute a given function
of their observed signals using public communication; it is required now
that the value of the function be kept secret from an eavesdropper
with access to the communication. We show that the feasibility of such
secure function computation is tied tightly to the previous secret
key generation problem.
(Spring 2011, Seminar No. 8)
SPEAKER: Prof. Joel Cohen
University of Maryland
College Park, MD 20742, USA
TITLE:
Potentials and Flux on Markov Recurrent Chains
TIME AND PLACE:
April 28, 2011, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Abstract
(Spring 2011, Seminar No. 9)
SPEAKER: Prof. Robert Kass
Carnegie-Mellon University
Pittsburgh, PA 15213, USA
TITLE:
Statistical Analysis of Neural Spike Train Data
TIME AND PLACE:
May 3, 2011, 3:30pm
Room 3206(tentative), Math Bldg
ABSTRACT:
One of the most important techniques in learning about the functioning
of the brain has involved examining neuronal activity in laboratory
animals under differing experimental conditions. Neural information
is represented and communicated through series of action potentials,
or spike trains, and the central scientific issue in many studies
concerns the physiological significance that should be attached to a
particular neuron firing pattern in a particular part of the brain.
Because repeated presentations of stimuli often produce quite variable
neural responses, statistical models have played an important role in
advancing neuroscientific knowledge. In my talk I will briefly
outline some of the progress made, by many people, over the past 10 years,
highlighting work my colleagues and I have contributed. I
will also comment on the perspective provided by statistical thinking.
See: http://www.stat.cmu.edu/~kass/11MD-NSA.zip under "talks".
(Fall 2010, Seminar No. 1)
SPEAKER: Prof. Abram Kagan
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
On estimating the multinomial parameter
TIME AND PLACE:
September 23, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Abstract
(Fall 2010, Seminar No. 2)
SPEAKER: Prof. Eric Slud
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
Symmetric "Rejective" Probability Proportional to Size Sampling
TIME AND PLACE:
September 30, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
This talk will first introduce the topic of Probability
Proportional to Size (PPS) sampling in surveys, and explain why
PPS sampling without replacement in a way which is at the same
time tractable to implement and has good theoretical properties, is
still a topic of research.
The PPS method and results from a classical survey sampling
paper of Hajek (1964) will then be described, and the rest of the talk
will show the solution of a general existence problem related to such
designs, and describe some computational developments which
make rejective PPS sampling genuinely usable in practice.
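For readers unfamiliar with the design, here is a minimal sketch of the basic rejective (conditional Poisson) sampling mechanism: units are included independently with size-proportional working probabilities, and a draw is accepted only when exactly n units are selected. This illustrates the mechanism only; it is not the calibration of working probabilities or the computational developments discussed in the talk.

```python
# Minimal sketch of rejective (conditional Poisson) PPS sampling:
# independent Bernoulli inclusions with size-proportional working
# probabilities, accepting a draw only when exactly n units are selected.
# Illustration of the mechanism only, not the methods discussed in the talk.
import numpy as np

def rejective_pps(sizes, n, seed=None):
    rng = np.random.default_rng(seed)
    sizes = np.asarray(sizes, dtype=float)
    p = np.clip(n * sizes / sizes.sum(), 0.0, 1.0)   # working inclusion probabilities
    while True:
        included = rng.random(p.size) < p
        if included.sum() == n:                      # accept samples of size n only
            return np.flatnonzero(included)

print(rejective_pps(sizes=[5, 1, 3, 2, 8, 4, 6, 2], n=3, seed=42))
```

Note that conditioning on the realized sample size changes the actual inclusion probabilities away from the working probabilities p, which is precisely why calibrating the working probabilities, as in Hajek (1964), is the nontrivial part of the design.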
(Fall 2010, Seminar No. 3)
SPEAKER: Prof. Chandra Gulati
University of Wollongong
Wollongong, NSW, Australia
TITLE:
Pair Trading Using Cointegration
TIME AND PLACE:
October 7, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Pairs trading is one of the arbitrage strategies employed in some
stock markets. Using the cointegration technique, it can be determined
whether a pair of stocks is cointegrated. The pairs trading strategy for
two such stocks consists of buying the undervalued stock and selling
the overvalued stock when the two stocks are temporarily out of
equilibrium. After the stock prices return to equilibrium, the position
is reversed, i.e., the previously bought (sold) share is sold (bought).
In this talk, we consider pairs for which the cointegration error follows
an AR(1) process, and evaluate the average trade duration, average number
of trades, and profit over a specified trading period. We also consider
the determination of pre-set boundaries to open and close a trade.
Results are applied to some trading pairs in the Australian Stock Market.
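As a rough sketch of the mechanics described above (illustrative only: the open/close thresholds and the hedge-ratio regression are simple placeholders, not the AR(1)-based boundary and duration calculations of the talk), one can test a pair for cointegration and derive trading signals from the standardized cointegration error:

```python
# Sketch of a cointegration-based pairs trading signal. Illustrative only:
# fixed z-score thresholds, not the AR(1)-based boundaries of the talk.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

def pair_signals(price_a, price_b, open_level=2.0, close_level=0.5):
    # price_a, price_b: numpy arrays of prices for the two stocks.
    t_stat, p_value, _ = coint(price_a, price_b)       # test for cointegration
    X = sm.add_constant(price_b)
    beta = sm.OLS(price_a, X).fit().params             # intercept and hedge ratio
    spread = price_a - (beta[0] + beta[1] * price_b)   # cointegration error
    z = (spread - spread.mean()) / spread.std()        # standardized spread
    open_trade = np.abs(z) > open_level                # pair out of equilibrium
    close_trade = np.abs(z) < close_level              # pair back in equilibrium
    return p_value, z, open_trade, close_trade
```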
(Fall 2010, Seminar No. 4)
SPEAKER: Prof. Gérard Letac
Université Paul Sabatier
Toulouse, France
TITLE:
Contingency Tables from the Algebraic Statistics Viewpoint
TIME AND PLACE:
October 14, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Abstract
(Fall 2010, Seminar No. 5)
SPEAKER: Prof. Abram Kagan
University of Maryland
College Park, MD 20742, USA
TITLE:
Lower bounds for the Fisher information and the least favorable
distributions
TIME AND PLACE:
October 28, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
On projecting the Fisher score into appropriately chosen
subspaces, useful lower bounds are obtained for the Fisher information on
a location parameter theta contained in an observation with pdf
f(x-theta) for (more or less) natural classes of f.
Also of interest are the least favorable distributions for which the
bounds are attained.
This is joint work with Nicholas Henderson.
(Fall 2010, Seminar No. 6)
SPEAKER: Dr. Paul S. Albert
National Institute of Child Health and Human Development
Rockville, MD 20852-3902, USA
TITLE:
A Linear Mixed Model for Predicting a Binary Event Under Random
Effects Misspecification
TIME AND PLACE:
November 4, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The use of longitudinal data for predicting a subsequent binary event is often the focus of diagnostic studies. This is particularly important in obstetrics, where ultrasound measurements taken during fetal development may be useful for predicting various poor pregnancy outcomes. The focus of this paper is on developing a class of joint models for the longitudinal measurements and binary events that can be used for prediction. A shared random parameter model is proposed for linking the two processes together. Under a Gaussian random effects assumption, the approach is simple to implement with standard statistical software. Using asymptotic and simulation results, we show that estimates of predictive accuracy under a Gaussian random effects distribution are robust to severe misspecification of this distribution. However, under some circumstances, estimates of individual risk may be sensitive to severe random effects misspecification. We illustrate the methodology with data from a longitudinal fetal growth study.
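Schematically, a shared random parameter model of the kind described above links a longitudinal marker Y_ij (e.g., an ultrasound measurement for subject i at visit j) and a binary outcome D_i through a common random effect b_i, for example

Y_ij = x_ij' \alpha + z_ij' b_i + e_ij, with b_i ~ N(0, \Sigma) and e_ij ~ N(0, \sigma^2),
logit P(D_i = 1 | b_i) = w_i' \gamma + \theta' b_i.

This display is a generic sketch of the model class, not the exact specification of the paper; prediction of the binary event then uses the subject-specific random effect inferred from the longitudinal measurements.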
(Fall 2010, Seminar No. 7)
SPEAKER: Prof. Abram Kagan
University of Maryland
College Park, MD 20742, USA
TITLE:
Semiparametric Estimation in Exponential Families
TIME AND PLACE:
November 11, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The talk deals with combining a large sample of size n from a
Natural Exponential Family (NEF) with an unknown generator F with a sample of size m
from the population F, in order to estimate the parameter of the NEF.
All the cases m = cn(1+o(1)) with c > 0, m = o(n), and n = o(m) are considered.
(Fall 2010, Seminar No. 8)
SPEAKER: Dr. Ao Yuan
National Human Genome Center, Howard University
Washington, DC 20060, USA
TITLE:
U-statistics with side information
TIME AND PLACE:
November 18, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We study U-statistics with side information incorporated using the method of
empirical likelihood. Some basic properties of the proposed statistics
are investigated. We find
that by implementing the side information properly, the new
U-statistics can have smaller
asymptotic variance than the existing versions. The proposed U-statistics can achieve
asymptotic efficiency and their weak limits admit a convolution result. We also find
that
the corresponding U-likelihood ratio procedure, as well as the U-empirical
likelihood based
confidence interval construction, do not benefit from incorporating side information, a
result consistent with that under the standard empirical likelihood ratio. The
impact of
incorrect side information in the proposed U-statistics is also explored. Simulation
studies
are conducted to assess the finite sample performance of the proposed method. The
numerical results show that with side information implemented, the reduction in
asymptotic variance can be substantial in some cases, and the coverage probability of
confidence intervals using the U-empirical likelihood ratio based method outperforms
that of the normal approximation based method, especially when the underlying
distribution is skewed.
(This is joint work with Wenqing He, Binhuan Wang, and Gengsheng Qin.)
(Fall 2010, Seminar No. 9)
SPEAKER: Prof. Mikhail Malioutov
Northeastern University
Boston, Massachusetts 02115, USA
TITLE:
Universal Compressor-based Statistical Inference
TIME AND PLACE:
December 2, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Abstract
(Fall 2010, Seminar No. 10)
SPEAKER: Dr. Yang Cheng
U.S. Census Bureau
Suitland, MD 20746, USA
TITLE:
Government Statistics Research Problems and Challenges
TIME AND PLACE:
December 9, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The Governments Division in the U.S. Census Bureau has conducted many
innovative research projects in the area of sample design, estimation,
variance estimation, and small area estimation. In this talk, we first
give the background of a particular government statistics challenge. Then,
we introduce our solution to the problem, a unique modified cut-off sample
design. This design is a new two-stage sampling method that was developed
by combining stratified sampling with cut-off sampling based on the size of
the unit. Next, we present our decision-based estimation methodology. This
adaptive decision-based estimation method was introduced as a stratum-wise
regression for strata defined first by cut-points for cut-off sampling and
then through stratum collapsing rules determined from the results of a
hypothesis test for equality of regression slopes. Also, we discuss the
small area estimation challenges we face when we estimate functional level
data, such as estimates for airports, public welfare, hospitals, and so on.
Finally, we explore variance estimators for the decision-based
estimation.
(Spring 2010, Seminar No. 1)
SPEAKER: Prof. Jun Shao
University of Wisconsin
Madison, WI 53706-1685, U.S.A.
TITLE:
Sparse Linear Discriminant Analysis With High Dimensional Data
TIME AND PLACE:
Thursday, January 28, 2010, 3:30pm
Room 3206, Math Bldg
ABSTRACT:
In many social, economical, biological, and medical studies,
the statistical analysis focuses on classifying a subject into
several classes based on a set of variables observed from
the subject and a training sample used to estimate the unknown
probability distributions of the variables.
The well-known linear discriminant analysis (LDA) works well
for the situation where the number of variables used for
classification is much smaller than the training sample size.
With very fast development of modern computer and other technologies,
we now face problems with the number of variables much larger than
the sample size, and the LDA may perform poorly in these problems.
We propose a sparse LDA and show it is asymptotically optimal
under some sparsity conditions on the unknown parameters.
An example of classifying human cancer into two classes of leukemias
based on a set of 1,714 genes and a training sample of size 72 is
discussed. Some simulation results are also presented.
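The following toy sketch illustrates the general idea of imposing sparsity on a linear discriminant rule (hard-thresholded mean differences with a diagonal variance estimate). It is a simplified illustration with made-up data, not the asymptotically optimal estimator proposed in the talk.

```python
# Toy sketch of a sparse linear discriminant rule: hard-threshold the
# estimated mean difference and use diagonal variance estimates.
# Simplified illustration only, not the estimator proposed in the talk.
import numpy as np

def sparse_lda_fit(X0, X1, threshold):
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    delta = mu1 - mu0
    delta[np.abs(delta) < threshold] = 0.0               # enforce sparsity
    var = np.concatenate([X0 - mu0, X1 - mu1]).var(axis=0) + 1e-8
    w = delta / var                                       # discriminant direction
    b = -0.5 * w @ (mu0 + mu1)                            # cutoff for equal priors
    return w, b

def sparse_lda_predict(X, w, b):
    return (X @ w + b > 0).astype(int)                    # 1 = class of X1

rng = np.random.default_rng(1)
X0 = rng.standard_normal((36, 1714))                      # class 0 training data
X1 = rng.standard_normal((36, 1714)) + 0.3                # class 1, shifted means
w, b = sparse_lda_fit(X0, X1, threshold=0.4)
print(sparse_lda_predict(rng.standard_normal((5, 1714)) + 0.3, w, b))
```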
(Spring 2010, Seminar No. 2)
SPEAKER: Prof. Xin He
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
Semiparametric Analysis of Multivariate Panel Count Data
TIME AND PLACE:
Thursday, February 18, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Multivariate panel count data frequently occur in periodic follow-up studies
that involve several different types of recurrent events of interest. In
many applications, these recurrent event processes can be correlated and it
may not be easy to accommodate the dependence structures. In this talk, I
will present a class of marginal mean models that leave the dependence
structures for related types of recurrent events completely unspecified.
Some estimating equations are developed for inference and the resulting
estimates of regression parameters are shown to be consistent and
asymptotically normal. Simulation studies are conducted for practical
situations and the methodology is applied to a motivating cohort study of
patients with psoriatic arthritis.
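A standard marginal mean model of this type specifies, for the cumulative count N_k(t) of recurrent events of type k up to time t and covariates Z,

E[ N_k(t) | Z ] = \mu_{0k}(t) exp( \beta' Z ),

where \mu_{0k}(t) is an unspecified baseline mean function and the dependence among the different event-type processes is left completely unspecified. This display is the usual form of such models, given here for orientation rather than as the exact specification used in the talk.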
(Spring 2010, Seminar No. 3)
SPEAKER: Prof. William Rand,
Director of Research, Center for Complexity in Business
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
Inferring Network Properties from Aggregate Diffusion Data
TIME AND PLACE:
Thursday, February 25, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Where do fads come from? Why are urban myths popular? Which of our
friends tells us about the next must-have gadget? Underlying all of
these questions is a process of diffusion, that is how do ideas,
concepts, and best practices spread through a population of
individuals? We examine these questions using a combination of agent-
based modeling, social network analysis, and machine learning.
Beginning with a replication of a traditional model using an agent-
based approach, we move on to explore diffusion processes on social
networks. After that we examine an application of these techniques to
a marketing application, specifically the role of customer
interactions in product success. Anecdotal evidence from business
practitioners and numerous academic studies have shown the importance
of word-of-mouth communication for product adoption. However, rarely
are interaction networks observable. Nevertheless, most firms do have
a significant amount of dynamic, aggregate marketing data, such as
customer purchases, attitudes, and information queries. We present a
new technique for inferring general network properties from this
aggregate-level data. We propose a Bayesian model selection approach
in combination with agent-based modeling to infer properties of the
unobserved consumer network, and show that it has the ability to
distinguish between various classes of networks on the basis of
aggregate data.
(Spring 2010, Seminar No. 4)
NSA Mathematics Colloquium
SPEAKER: Prof. David Gamarnik,
MIT Sloan School of Management
Massachusetts Institute of Technology
Cambridge , MA 02139, U.S.A.
TITLE:
A combinatorial approach to the interpolation method and scaling
limits in sparse random graphs.
TIME AND PLACE:
Tuesday, March 2, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We establish the existence of scaling limits for several
combinatorial optimization models on Erdos-Renyi and sparse random regular
graphs. For a variety of models, including maximum independent sets,
MAX-CUT, coloring and K-SAT, we prove that the optimal value appropriately
rescaled, converges to a limit with probability one, as the size of the
underlying graph diverges to infinity. For example, as a special case we
prove that the size of a largest independent set in these graphs, normalized
by the number of nodes converges to a limit with probability one, thus
resolving an open problem.
Our approach is based on developing a simple combinatorial approach to an
interpolation method developed recently in the statistical physics
literature. Among other things, the interpolation method was used to prove
the existence of the so-called free energy limits for several spin glass
models including Viana-Bray and random K-SAT models. Our simpler
combinatorial approach allows us to work with the zero temperature case
(optimization) directly and extend the approach to many other models.
Additionally, using our approach, we establish the large deviations
principle for the satisfiability property for constraint satisfaction
problems such as coloring, K-SAT and NAE(Not-All-Equal)-K-SAT. The talk will
be completely self-contained. No background on random graph
theory/statistical physics is necessary.
Joint work with Mohsen Bayati and Prasad Tetali.
(Spring 2010, Seminar No. 5)
SPEAKER: Mr. Simon Jidong Zhang
Senior Financial Analyst, Capital One
McLean, VA 22102, U.S.A.
TITLE:
Using copula to better understand dependence structure of
financial risks and to model the comovement between financial markets
TIME AND PLACE:
Thursday, March 11, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The problem of modeling the dependence structure among different financial
risks, financial assets, or financial markets is critical. The copula
approach has quickly gained popularity among financial practitioners over the
past 10 years. In the first part, I will briefly introduce copulas, compare
them to the traditional correlation-based approach, explain how to apply
copulas in capital management, and present a case study using Federal Reserve
Bank data. In the second part, I will model the comovement between the US
equity and US bond markets, using advanced techniques such as Markov chain
regime-switching models, GARCH models, and time-varying copulas. If time
allows, broader applications of copulas will also be discussed, such as
cross-country financial market comovement, diversification, and financial
asset pricing and allocation.
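For context, a copula C couples marginal distributions into a joint distribution via Sklar's theorem: for continuous marginals F and G,

H(x, y) = C( F(x), G(y) ),

so the dependence between, say, equity and bond returns is captured entirely by C and can be modeled separately from the marginals (for example with Gaussian, t, or time-varying copulas as mentioned above).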
(Spring 2010, Seminar No. 6)
SPEAKER: Prof. Wojciech Czaja
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
Dimension reduction, classification, and detection in high-dimensional data
TIME AND PLACE:
Thursday, March 25, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We shall present a set of techniques for dimension reduction,
classification, and detection in high-dimensional datasets. These
techniques are based on the analysis of the discrete Laplace operator
built on the data-dependent graphs. Some of the best known examples
include Laplacian Eigenmaps and Locally Linear Embedding algorithms. Our
point of view on these methods is purely deterministic. However, the goal
of this presentation is to provide an overview of these aspects of the
manifold learning theory, which have the potential to be improved upon by
means of probabilistic tools.
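Below is a minimal numerical sketch of the Laplacian Eigenmaps construction mentioned above: a k-nearest-neighbor graph with heat-kernel weights, the graph Laplacian, and an embedding from its low generalized eigenvectors. The parameter choices are illustrative only.

```python
# Minimal sketch of Laplacian Eigenmaps: k-nearest-neighbor graph with heat
# kernel weights, graph Laplacian, embedding from the low eigenvectors of the
# generalized eigenproblem L v = lambda D v. Parameters are illustrative only.
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_components=2, k=10, sigma=1.0):
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]                    # k nearest neighbors
        W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))   # heat kernel weights
    W = np.maximum(W, W.T)                                 # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                              # graph Laplacian
    vals, vecs = eigh(L, D)                                # generalized eigenproblem
    return vecs[:, 1:n_components + 1]                     # drop the constant eigenvector

rng = np.random.default_rng(0)
embedding = laplacian_eigenmaps(rng.standard_normal((200, 20)))
```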
(Spring 2010, Seminar No. 7)
SPEAKER: Prof. Kaushik Ghosh
University of Nevada
Las Vegas, NV 89154-4020, U.S.A.
TITLE:
A Unified Approach to Variations of Ranked Set Sampling
TIME AND PLACE:
Thursday, April 1, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
This talk will develop a general theory of inference using data
collected from
several variations of ranked set sampling. Such variations include
balanced and
unbalanced ranked set sampling, balanced and unbalanced k-tuple ranked set
sampling,
nomination sampling, simple random sampling, as well as a combination of
them.
Methods of estimating the underlying distribution function as well as its
functionals, and
asymptotic properties of the resulting estimators will be discussed. The
results so
obtained will be used to develop nonparametric procedures for one- and
two-sample
problems. The talk will conclude with a study of the small-sample
properties of the
estimators and an illustrative example.
(Spring 2010, Seminar No. 8)
SPEAKER: Prof. Galit Shmueli
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
To Explain or To Predict?
TIME AND PLACE:
Thursday, April 8, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Statistical modeling is at the core of many scientific disciplines.
Although a vast literature exists on the process of statistical modeling,
on good practices, and on abuses of statistical models, the literature
lacks the discussion of a key component: the distinction between modeling
for explanatory purposes and modeling for predictive purposes. This
omission exacts considerable cost in terms of advancing scientific
research in many fields and especially in the social sciences, where
statistical modeling is done almost entirely in the context of
explanation. In this talk, I describe how statistical modeling is used in
research for causal explanation, and the differences between explanatory
and predictive statistical modeling.
(Spring 2010, Seminar No. 10)
SPEAKER: Mr. Vasilis A. Sotiris
CALCE, University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
Failure inference from first hitting time models
TIME AND PLACE:
Thursday, April 22, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
There is a growing interest in models that relate failure times and stochastic time-varying covariates (markers). Such models are used for a variety of purposes: to improve prediction of failure time in operational settings, to give better estimates of marginal failure time distributions from censored data, and to compare item designs, by considering markers as covariates or, sometimes, as surrogates for failure. We are interested in the situation where failure is not related deterministically to an observable marker, but rather to a latent variable: degradation. Degradation is usually a complex mechanism that explains the wear of an item, and is described well by a stochastic process. In first hitting time models, the time-to-failure is modeled as the first hitting time (fht) of a barrier a by the degradation process. The model presented here is based on a bivariate Wiener process in which one component of the process represents the marker and the second, the latent degradation, determines the time to failure. This model yields reasonably simple expressions for estimation and prediction, and is easy to fit to commonly occurring data that involve the marker at the censoring time for surviving cases and the marker value and failure time for failing cases. Parametric and predictive inference is discussed, and an example is used to illustrate the model.
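For reference, in the simplest univariate case the first hitting time is inverse Gaussian: if the degradation follows a Wiener process with drift \mu > 0 and variance parameter \sigma^2, starting a distance a > 0 from the barrier, then the fht T has density

f(t) = a / ( \sigma sqrt( 2 \pi t^3 ) ) exp( - (a - \mu t)^2 / (2 \sigma^2 t) ), t > 0.

This standard fact is included for orientation; the bivariate marker-degradation model described above generalizes this setting, with the marker observed and the degradation latent.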
(Spring 2010, Seminar No. 11)
SPEAKER: Prof. Ping Ma
University of Illinois
Urbana-Champaign, IL 61802, U.S.A.
TITLE:
A Journey to the Center of the Earth
TIME AND PLACE:
April 29, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
At a depth of 2890 km, the core-mantle boundary (CMB) separates turbulent
flow of liquid metals in the outer core from slowly convecting, highly
viscous mantle silicates. The CMB marks the most dramatic change in dynamic
processes and material properties in our planet, and accurate images of the
structure at or near the
CMB--over large areas--are crucially important
for our understanding of present day geodynamical processes and the
thermo-chemical structure and history of the mantle and mantle-core system.
In addition to mapping the CMB we need to know if other structures exist
directly above or below it, what they look like, and what they mean (in
terms of physical and chemical material properties and geodynamical
processes). Detection, imaging,
(multi-scale) characterization, and understanding of structure (e.g.,
interfaces) in this remote region have
been--and are likely to remain--a
frontier in cross-disciplinary geophysics research. I will discuss the
statistical problems and challenges in imaging the CMB through generalized
Radon transform.
(Spring 2010, Seminar No. 12)
SPEAKER: Dr. Yaakov Malinovsky
NICHD
Rockville, MD 20852, U.S.A.
TITLE:
PREDICTION OF ORDERED RANDOM EFFECTS IN A SIMPLE SMALL AREA MODEL
TIME AND PLACE:
May 6, 2010, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Prediction of a vector of ordered parameters, or part of it, arises naturally
in the context of Small Area Estimation (SAE). For example, one may want to
estimate the parameters associated with the top ten areas, the best or worst
area, or a certain percentile. We use a simple SAE model to show that
estimation of ordered parameters by the corresponding ordered estimates of
each area separately does not yield good results with respect to Mean Square
Error. Shrinkage-type predictors, with an appropriate amount of shrinkage for
the particular problem of ordered parameters, are considerably better, and
their performance is close to that of the optimal predictors, which cannot in
general be computed explicitly.
(Fall 2009, Seminar No. 1)
SPEAKER: Prof. Nikolai Chernov
University of Alabama at Birmingham,
Birmingham, AL, 35294, U.S.A.
TITLE:
Errors-in-variables regression models: Parameter estimates often have no
moments
TIME AND PLACE:
Thursday, September 10, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In the studies of linear and nonlinear regression problems when
both variables are subject to errors, it often happens that maximum
likelihood estimates have infinite mean values and infinite variances, but
they work well in practice. I will discuss these facts and their
methodological
implications.
(Fall 2009, Seminar No. 2)
SPEAKER: Terrence Moore, PhD Candidate
US Army Research Laboratory
Adelphi, MD 20783, U.S.A.
TITLE:
Constrained Cramer-Rao Bound
TIME AND PLACE:
Thursday, September 17, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Hero and Gorman developed a simple expression for an information
inequality under differentiable equality constraints (a constrained
Cramer-Rao bound) in terms of a full rank Fisher information and the
Jacobian of the constraint function. Later,
Stoica and Ng presented a more general expression applicable to
singular
Fisher
information matrices. This bound is particularly useful in measuring
estimation
performance in communications models, in which the design or control
of constraints is entirely plausible. Here, I will present a very
simple proof of this constrained Cramer-Rao bound and detail an
example of its utility in communications research.
BIO:
Terrence Moore is a mathematician working for the Tactical
Communications Networks Branch at the Army Research Lab in Adelphi
on statistical signal processing issues
of interest to the Army. He received his B.S. and M.A. degrees
in mathematics from the American University in Washington, DC,
in 1998 and 2000, respectively, and he is
currently a Ph.D. candidate in mathematics at the University of Maryland.
(Fall 2009, Seminar No. 3)
SPEAKER: Dr. Lance Kaplan, Team Leader
US Army Research Laboratory
Adelphi, MD 20783, U.S.A.
TITLE:
Monotonic Analysis for the Evaluation of Image Quality Measures
TIME AND PLACE:
Thursday, September 24, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
A number of image quality measures have been proposed to quantify the
perceptual quality of images. When humans are given a task to
interpret a set of images, the "performance" of these humans over the
imagery should correspond to the image quality values in a monotonic
fashion. To this end, we consider two tests to determine whether or not
the monotonic relationship exists. First, the monotonic correlation
test assumes that human performance measurement errors are Gaussian, and
it is computed from the R2 value when using isotonic regression. The
second test, the diffuse prior monotonic likelihood ratio test, assumes
that the performance measurements follow a binomial distribution. This
talk will discuss properties of these two tests and apply these tests to
evaluate the effectiveness of image quality measures to "score" fused
images.
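A minimal sketch of the first test described above is given below: the R^2 value of an isotonic regression of human performance on the image quality values. It is an illustration with made-up numbers only, not the full monotonic correlation test or the diffuse prior monotonic likelihood ratio test.

```python
# Sketch of the monotonicity check: R^2 from an isotonic regression of
# human performance scores on image quality values. Illustrative data only.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def monotonic_r2(quality, performance):
    fitted = IsotonicRegression(increasing=True).fit_transform(quality, performance)
    ss_res = np.sum((performance - fitted) ** 2)              # residual sum of squares
    ss_tot = np.sum((performance - performance.mean()) ** 2)
    return 1.0 - ss_res / ss_tot                              # R^2 of the monotone fit

quality = np.array([0.1, 0.3, 0.4, 0.6, 0.8, 0.9])
performance = np.array([0.35, 0.42, 0.40, 0.61, 0.70, 0.78])
print(monotonic_r2(quality, performance))
```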
BIO:
Lance M. Kaplan received the B.S. degree with distinction from Duke
University, Durham, NC, in 1989 and the M.S. and Ph.D. degrees from the
University of Southern California, Los Angeles, in 1991 and 1994,
respectively, all in electrical engineering. From 1987 to 1990, he was a
Technical Assistant at the Georgia Tech. Research Institute. He held a
National Science Foundation Graduate Fellowship and a University of
Southern California (USC) Dean's Merit Fellowship from 1990 to 1993, and
was a Research Assistant in the Signal and Image Processing Institute at
USC from 1993 to 1994. Then, he worked on staff in the Reconnaissance
Systems Department of the Hughes Aircraft Company from 1994 to 1996.
From 1996 to 2004, he was a member of the faculty in the Department of
Engineering and a Senior Investigator in the Center of Theoretical
Studies of Physical Systems (CTSPS) at Clark Atlanta University (CAU),
Atlanta, GA. Currently, he is a Team Leader in the Networked Sensing and
Fusion branch of the U.S. Army Research Laboratory. Dr. Kaplan serves as
an Associate Editor-In-Chief and EO/IR Systems Editor for the IEEE
Transactions on Aerospace and Electronic Systems (AES). In addition, he
is the tutorials editor for the IEEE AES Magazine, and he also serves on
the Board of Governors of the IEEE AES Society. He is a three-time
recipient of the Clark Atlanta University Electrical Engineering
Instructional Excellence Award from 1999 to 2001. His current research
interests include signal and image processing, automatic target
recognition, data fusion, and resource management.
(Fall 2009, Seminar No. 4)
SPEAKER: Prof. Jian-Jian Ren
University of Central Florida,
Orlando, FL 32816, U.S.A.
TITLE:
Full Likelihood Inferences in the Cox Model
TIME AND PLACE:
Thursday, October 1, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We derive the full likelihood function for regression
parameter $\beta_0$ and baseline distribution function $F_0$ in the
continuous Cox model. Using the empirical likelihood parameterization,
we explicitly profile out nuisance parameter $F_0$ to obtain the
full-profile likelihood function and the maximum likelihood estimator
(MLE) for $\beta_0$. We show that the log full-likelihood ratio has an
asymptotic chi-squared distribution, while the simulation studies indicate
that for small or moderate sample sizes, the MLE performs favorably
over Cox's partial likelihood estimator. Moreover, we show that the
estimation bias of the MLE is asymptotically smaller than that of Cox's
partial likelihood estimator. In a real dataset example, our full likelihood
ratio test leads to statistically different conclusions from Cox's partial
likelihood ratio test. Part of this work is joint with Mai Zhou.
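For orientation, the continuous Cox model referred to above specifies the hazard

\lambda(t | Z) = \lambda_0(t) exp( \beta_0' Z ),

with F_0 the baseline distribution function corresponding to \lambda_0. With right-censored data (T_i, \delta_i, Z_i), the usual setting for the Cox model, the full likelihood has the standard form

L(\beta_0, F_0) = \prod_i f(T_i | Z_i)^{\delta_i} S(T_i | Z_i)^{1 - \delta_i},

where f and S are the conditional density and survival function. These displays are the standard formulation, included here for readers; the talk profiles this likelihood over the nuisance parameter F_0 via the empirical likelihood parameterization.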
(Fall 2009, Seminar No. 5)
SPEAKER: David Judkins, Senior Statistician
WESTAT Inc.
Rockville, MD 20850-3195, U.S.A.
TITLE:
Using Longitudinal Surveys to Evaluate Interventions
TIME AND PLACE:
Thursday, October 8, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Longitudinal surveys are often used in evaluation studies conducted to
assess the effects of a program or intervention. They are useful for
examining the temporal nature of any effects, to distinguish between
confounding variables and mediators, and to better control for
confounders in the evaluation. In particular, the
estimation of causal effects may be improved if baseline data are
collected before the intervention is put in place. This presentation
will provide an overview of types of interventions, types of effects,
some issues in the design and analysis of
evaluation studies, and the value of longitudinal data. These points
will be illustrated using three evaluation studies: the U.S. Youth
Media Campaign Longitudinal Survey (YMCLS), conducted to evaluate a
media campaign to encourage 9- to 13-year-old Americans to be
physically active; the National Survey of Parents and
Youth (NSPY), conducted to evaluate the U.S. National Youth Anti-Drug Media
Campaign; and the Gaining Early Awareness and Readiness for
Undergraduate Programs (GEAR UP) program, designed to increase the
rate of postsecondary education among
low-income and disadvantaged students in the United States.
Based on:
Piesse, A., Judkins, D., and Kalton, G. (2009). Using longitudinal
surveys to evaluate interventions. In P. Lynn (Ed.), Methodology of
Longitudinal Surveys (pp. 303-316). Chichester: Wiley.
(Fall 2009, Seminar No. 6)
SPEAKER: Dr. Ben Klemens
United States Census Bureau
Suitland, MD 20746, U.S.A.
TITLE:
Using Agent-Based Models as Statistical Models
TIME AND PLACE:
Thursday, October 15, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Agent-based models (ABMs) involve the simulation of hundreds to millions
of individual agents, each making simple decisions. The results of these
decisions are often striking and make a direct qualitative statement.
However, ABMs can also be used like traditional statistical models for
quantitative analysis. I give the example of an ABM that explains a common
anomaly in the distribution of equity prices.
Click here to see the paper.
(Fall 2009, Seminar No. 7)
SPEAKER: Prof. David Stoffer,
Program Director, Probability and Statistics Program
University of Pittsburgh and National Science Foundation
Arlington, VA 22230-0002, U.S.A.
TITLE:
Spectral Magic
TIME AND PLACE:
Thursday, October 22, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The problem of estimating the spectral matrix of a multivariate time
series that has slowly changing dynamics has become a recent interest of
mine. The problem is difficult and had to be broken into smaller pieces.
I will discuss the first two pieces; there are at least two more
pieces to the puzzle. In the first place, estimating the spectral
density matrix of vector-valued stationary time series is not easy
because different degrees of smoothness are typically needed for
different components; this problem must be balanced with the fact that
the matrix must be positive semi-definite. I will discuss our approach
and then move on to the harder task of estimating the slowly changing
spectral density of a univariate locally stationary time series.
(Fall 2009, Seminar No. 8)
SPEAKER: Prof. Mei-Ling Ting Lee
University of Maryland,
College Park, MD 20742, U.S.A. (MLTLEE@UMD.EDU)
TITLE:
Threshold Regression for Time-to-event Data: with Applications in
Proteomics, Cancer Research, and Environmental Health.
TIME AND PLACE:
Thursday, October 29, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Threshold regression (TR) methodology is based on the concept that
health degradation follows a stochastic process and the onset of
disease, or death, occurs when the latent health process first
reaches a failure state or threshold (a first hitting time). Instead
of calendar time, the analytical time is considered. The model is
intuitive and generally does not require the proportional hazards
assumption and thus provides an important alternative for analyzing
time-to-event data. Connections with proportional hazard models will
be discussed. Examples and extensions will be discussed.
(Fall 2009, Seminar No. 9)
SPEAKER: Prof. Abram Kagan
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
A Class of Multivariate Distributions Related to Distributions with a Gaussian Component
TIME AND PLACE:
Thursday, November 5, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Abstract
(Fall 2009, Seminar No. 10)
SPEAKER: Prof. Antai Wang
Georgetown University
Washington, DC 20057, U.S.A.
TITLE:
Archimedean Copula Tests
TIME AND PLACE:
Thursday, November 12, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In this talk, we propose two tests for parametric models belonging to
the Archimedean copula family, one for uncensored bivariate data and
the other one for right-censored bivariate data. Our test procedures
are based on the Fisher transform of the correlation coefficient of
a bivariate $(U, V)$, which is a one-to-one transform of the
original random pair $(T_{1}, T_{2})$ that can be modeled by an
Archimedean copula model. A multiple imputation technique is applied
to establish our test for censored data and its $p$ value is
computed by combining test statistics obtained from multiply imputed
data sets. Simulation studies suggest that both procedures perform
well when the sample size is large. The test for censored data is
carried out for a medical data example.
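For reference, the Fisher transform of a correlation coefficient r mentioned above is

z = (1/2) log( (1 + r) / (1 - r) ),

which is approximately normally distributed in large samples; this classical variance-stabilizing property is what makes the z scale convenient for building test statistics.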
(Fall 2009, Seminar No. 11)
SPEAKER: Prof. Jordan Stoyanov
Newcastle University
Newcastle upon Tyne, United Kingdom, NE1 7RU
TITLE:
Moment Analysis of Distributions: Recent Developments
TIME AND PLACE:
Thursday, November 19, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We start with classical conditions under which a distribution with finite
moments is M-determinate (unique), however our goal is to focus on recent
results providing conditions under which a distribution is
M-determinate or M-indeterminate (non-unique). Thus we will be
able to analyze Box-Cox functional transformations of random data,
before and/or after transforming, and characterize the moment
determinacy of their distributions. Popular distributions such as
Normal, Skew-Normal, Log-normal, Skew-Log-Normal, Exponential,
Gamma, Poisson, IG, etc. will be used as examples. Distributions
of Random walks and of Stochastic processes such as the Geometric
BM and the solutions of SDEs will also be considered.
We will illustrate the practical importance of these properties
in areas such as Financial modelling and Statistical inference problems.
Several facts will be reported. Some of them are not so well known;
they are a little surprising, even shocking.
However, they are all quite instructive.
The talk will be addressed to professionals in Statistics/Probability,
Stochastic modelling, and also to Doctoral and Master students in
these areas. If time permits, some open questions will be discussed.
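One classical sufficient condition of the kind referred to above is Carleman's condition: a distribution on the real line with finite moments m_k of all orders is M-determinate whenever

\sum_{k >= 1} m_{2k}^{-1/(2k)} = \infty.

The Log-normal distribution is the standard example of an M-indeterminate law, which illustrates how a Box-Cox (power or log) transformation of the data can change the moment determinacy of the underlying distribution.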
(Fall 2009, Seminar No. 12)
SPEAKER: Prof. Richard Levine
San Diego State University
San Diego, CA 92182, U.S.A.
TITLE:
Frailty Modeling via the Empirical Bayes Hastings Sampler
TIME AND PLACE:
Wednesday, December 16, 2009, 3pm
Colloquium Room 3206, Math Bldg
ABSTRACT:
Studies of ocular disease and analyses of time to disease onset are
complicated by the correlation expected between the two eyes from a single
patient. We overcome these statistical modeling challenges through a
nonparametric Bayesian frailty model. While this model suggests itself as
a natural one for such complex data structures, model fitting routines
become overwhelmingly complicated and computationally intensive given the
nonparametric form assumed for the frailty distribution and baseline
hazard function. We consider empirical Bayesian methods to alleviate
these difficulties through a routine that iterates between frequentist,
data-driven estimation of the cumulative baseline hazard and Markov chain
Monte Carlo estimation of the frailty and regression coefficients. We
show both in theory and through simulation that this approach yields
consistent estimators of the parameters of interest. We then apply the
method to the short-wave automated perimetry (SWAP) data set to study risk
factors of glaucomatous visual field deficits.
(Seminar No. 13)
SPEAKER: Dr. Zhe Lin
Institute for Advanced Computer Studies, University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
Recognizing Actions by Shape-Motion Prototypes
TIME AND PLACE:
Thursday, February 12, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In this talk, I will introduce our recent work on gesture or action
recognition based on shape-motion prototypes.
During training, a set of action prototypes are learned in a joint shape and
motion space via k-means clustering;
During testing, humans are tracked while a frame-to-prototype correspondence
is established by nearest neighbor
search, and then actions are recognized using dynamic prototype sequence
matching. Similarity matrices used for
sequence matching are efficiently obtained by look-up table indexing, which
is an order of magnitude faster than
brute-force computation of frame-to-frame distance. Our approach enables
robust action matching in very challenging
situations (such as moving cameras, dynamic backgrounds) and allows
automatic alignment of action sequences
by dynamic time warping. Experimental results demonstrate that our approach
achieves over 91% recognition rate
on a large gesture dataset containing 294 video clips of 14 different
gestures, and 100% on the Weizmann action dataset.
(Seminar No. 14)
SPEAKER: Prof. Refik Soyer
George Washington University
Washington, DC 20052, U.S.A.
TITLE:
Information Importance of Predictors
TIME AND PLACE:
Thursday, February 19, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The importance of predictors is characterized by the extent to which
their use reduces uncertainty about predicting the response variable,
namely their information importance.
Shannon entropy is used to operationalize the concept.
For nonstochastic predictors, maximum
entropy characterization of probability distributions
provides measures of information importance. For stochastic
predictors, the expected
entropy difference gives measures of information importance,
which are invariant under
one-to-one transformations of the variables. Applications to
various data types lead to
familiar statistical quantities for various models, yet with the unified
interpretation of
uncertainty reduction. Bayesian inference procedures for the
importance and relative
importance of predictors are developed.
Three examples show applications to normal
regression, contingency table, and logit analyses.
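As a point of orientation only (the speaker's general measures reduce to familiar quantities of this kind in special cases): for a bivariate normal pair $(X,Y)$ with correlation $\rho$, the expected entropy reduction in $Y$ from observing $X$ is the mutual information
\[
I(Y;X) \;=\; H(Y) - E\big[H(Y\mid X)\big] \;=\; -\tfrac{1}{2}\log\!\left(1-\rho^{2}\right),
\]
a monotone function of the familiar $R^2$ that is invariant under one-to-one transformations of $X$ and $Y$.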
(Seminar No. 15)
SPEAKER: Lior Noy
Harvard Medical School
Boston, MA 02115, U.S.A.
TITLE:
Studying Eye Movements in Movement Imitation
TIME AND PLACE:
Thursday, February 26, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
People, animals and robots can learn new actions from observation. In
order to do so, they need to transform the visual input to motor
output. What is the nature of this transformation? What are the visual
features that are extracted and used by the imitator? A possible route
for answering these questions is to analyze imitator eye movements
during imitation.
We monitored eye movements of human subjects while they were watching
simple, one-arm movements in two conditions. In the watch-only
condition the observers were instructed only to watch the
movements. In the imitate condition the observers were instructed to
watch and then to imitate each movement. Gaze trajectories were
compared between the two conditions. In addition, we compared the
human behavior to the predictions of the Itti-Koch saliency-map model
[1].
To determine the similarity among gaze trajectories of different
observers we developed a novel comparison method, based on
semi-parametric statistics. We compared this method to the more
standard usage of cross-correlation scores and show the advantages of
this method, in particular its ability to state that two gaze
trajectories are either different or similar in a statistically
significant way.
Our results indicate that:
(1) Subjects fixate at both the joints and the end-effectors of the observed moving arms, in contrast to previous reports [2].
(2) The Itti-Koch saliency-map model does not fully account for the human gaze trajectories.
(3) Eye movements in movement imitation are similar to each other in the
watch-only versus the imitate conditions.
Joint work with:
Benjamin Kedem & Ritaja Sur, University of Maryland, and
Tamar Flash, Weizmann Institute of Science.
References
[1] L. Itti and C. Koch. A saliency-based search mechanism for overt and covert
shifts of visual attention. Vision Research,
40:1489-1506, 2000.
[2] M. J. Mataric and M. Pomplun. Fixation behavior in observation and
imitation of human movement. Cognitive Brain Research,
7(2):191-202, 1998.
(Seminar No. 16)
SPEAKER: Prof. Yasmin H. Said
(Bio)
George Mason University
Fairfax, Virginia 22030, U.S.A.
TITLE:
Microsimulation of an Alcohol System
TIME AND PLACE:
Thursday, March 5, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Users of alcohol are incorporated into a societal system, which for
many purposes resembles an ecological system. An understanding of how
this ecological alcohol system works provides an opportunity to
evaluate effectiveness of interventions. I use a hybrid directed graph
social network model calibrated with conditional probabilities derived
from actual data with the idea of reproducing the experience of acute
outcomes reflecting undesirable individual and societal outcomes. In
the present model, I also approximate geospatial effects related to
transportation as well as temporal effects. Drinking behaviors among
underage users can be particularly harmful from both a societal and
individual perspective. Using the model based on data from experiences
in Fairfax County, Virginia, I am able to reproduce the multinomial
probability distribution of acute outcomes with high accuracy using a
microsimulation of all residents of Fairfax, approximately 1,000,000
agents simulated. By adjusting conditional probabilities corresponding
to interventions, I am able to simulate the effects of those
interventions. This methodology provides an effective tool for
investigating the impact of interventions and thus provides guidance
for public policy related to alcohol use.
(Seminar No. 17)
SPEAKER: Dr. Philip Rosenberg
Biostatistics Branch, Division of Cancer Epidemiology and Genetics,
National Cancer Institute, NIH
Rockville, MD 20852-4910, U.S.A.
TITLE:
Proportional Hazards Models and Age-Period-Cohort Analysis of Cancer
Rates
TIME AND PLACE:
Thursday, March 12, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Age-period-cohort (APC) analysis is widely used in cancer epidemiology
to model trends in cancer rates. We develop methods for comparative APC
analysis of two independent cause-specific hazard rates assuming that an
APC model holds for each one. We construct linear hypothesis tests to
determine whether the two hazards are absolutely proportional, or
proportional after stratification by cohort, period, or age. When a
given proportional hazards model appears adequate, we derive simple
expressions for the relative hazards using identifiable APC parameters.
We also construct a linear hypothesis test to assess whether the
logarithms of the fitted age-at-event curves are parallel after
adjusting for possibly heterogeneous period and cohort effects, a
relationship that can hold even when the expected hazard rates are not
proportional. To assess the utility of these new methods, we surveyed
cancer incidence rates in Blacks versus Whites for the leading cancers
in the United States, using data from the National Cancer Institute's
Surveillance, Epidemiology, and End Results Program. Our comparative
survey identified cancers with parallel and crossing age-at-onset
curves, cancers with rates that were proportional after stratification
by cohort, period, or age, and cancers with rates that were absolutely
proportional. Proportional hazards models provide a useful statistical
framework for comparative APC analysis.
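For readers unfamiliar with APC notation, a generic specification (the talk's exact parameterization may differ) writes the expected rate for age group $a$ and period $p$, with cohort $c = p - a$, as
\[
\log \lambda_{ap} \;=\; \mu + \alpha_a + \pi_p + \gamma_{p-a},
\]
where the linear components of the age, period, and cohort effects are not separately identifiable because $c = p - a$; two sets of rates are proportional exactly when the difference of their log-rate surfaces is constant over $(a,p)$.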
(Seminar No. 18)
SPEAKER: Dr. Hormuzd Katki
Biostatistics Branch, Division of Cancer Epidemiology and Genetics,
National Cancer Institute, NIH, DHHS
Rockville, MD 20852-4910, U.S.A.
TITLE:
Insights into p-values and Bayes Factors from False Positive and
False Negative Bayes Factors
TIME AND PLACE:
Thursday, March 26, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The Bayes Factor has stronger theoretical justification
than p-values for quantifying statistical evidence, but when the goal is
hypothesis testing, the Bayes Factor yields no insight about false
positive vs. false negative results. I introduce the False Positive
Bayes Factor (FPBF) and the False Negative Bayes Factor (FNBF) and show
that they are approximately the two components of the Bayes Factor. In
analogy to diagnostic testing, the FPBF and FNBF provide additional
insight not obvious from the Bayes Factor. FPBF & FNBF require only the
p-value and the power under an alternative hypothesis, forging a new
link of p-values to Bayes Factors. This link can be exploited to
understand differences in inferences drawn by Bayes Factors versus
p-values. In a genome-wide association study of prostate cancer, FPBF &
FNBF help reveal the two SNP mutations declared positive by p-values and
Bayes Factors that with future data turned out to be false positives.
(Seminar No. 20)
SPEAKER: Dr. Hiro Hikawa
Department of Statistics, George Washington University
Washington, DC 20052, U.S.A.
TITLE:
Robust Peters-Belson Type Estimators of
Measures of Disparity and their Applications in
Employment Discrimination Cases
TIME AND PLACE:
Thursday, April 16, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In discrimination cases concerning equal pay, the Peters-Belson (PB) regression method is used to estimate the pay disparities between minority and majority employees after accounting for major covariates (e.g., seniority, education). Unlike the standard approach, which uses a dummy variable to indicate protected group status, the PB method first fits a linear regression model for the majority group. The resulting regression equation is then used to predict the salary of each minority employee by using their individual covariates in the equation. The difference between the actual and the predicted salaries of each minority employee estimates the pay differential for that minority employee, which takes into account legitimate job-related factors. The average difference estimates a measure of pay disparity. In practice, however, a linear regression model may not be sufficient to capture the actual pay-setting practices of the employer. Therefore, we use a locally weighted regression model in the PB approach as a specific functional form of the relationship between pay and relevant covariates is no longer needed. The statistical properties of the new procedure are developed and compared to those of the standard methods. The method also extends to the case with a binary (1-0) response, e.g., hiring or promotion. Both simulation studies and re-analysis of actual data show that, in general, the locally weighted PB regression method reflects the true mean function more accurately than the linear model, especially when the true function is not a linear or logit (for a 1-0 response) model. Moreover, only a small loss of efficiency is incurred when the true relation follows a linear or logit model.
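A minimal sketch of the Peters-Belson idea on simulated data (variable names and the linear working model are illustrative; the talk replaces the linear fit with locally weighted regression).

```python
import numpy as np
import statsmodels.api as sm

# Simulated pay data: fit the pay equation on the majority group only, then
# predict what each minority employee "would" earn under that equation.
rng = np.random.default_rng(1)
n_maj, n_min = 300, 80
X_maj = rng.normal(size=(n_maj, 2))          # e.g., seniority, education
pay_maj = 50 + X_maj @ np.array([4.0, 3.0]) + rng.normal(scale=5, size=n_maj)
X_min = rng.normal(size=(n_min, 2))
pay_min = 48 + X_min @ np.array([4.0, 3.0]) + rng.normal(scale=5, size=n_min)

fit = sm.OLS(pay_maj, sm.add_constant(X_maj)).fit()   # majority-only model
predicted_min = fit.predict(sm.add_constant(X_min))   # predicted minority pay
disparity = np.mean(pay_min - predicted_min)          # average pay differential
print(f"estimated mean disparity: {disparity:.2f}")
```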
(Seminar No. 21)
SPEAKER: Dr. Tsong Yi
(Bio)
Division of Biometrics VI, OB/OTS/CDER, FDA
Silver Spring, MD 20993, U.S.A.
TITLE:
Multiple Testing Issues in Thorough QTc Clinical Trials
TIME AND PLACE:
Thursday, April 23, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Clinical trial endpoints are often measured repeatedly at multiple time
points, with the objective of showing either that the test treatment is
more effective than the control treatment at at least one time point, or
that it is more effective than the control treatment at all time
points. Either objective involves multiple comparisons and the issue
of type I error rate control. We illustrate the problem
with the example of thorough QT clinical trials. ICH E14 (2005)
defines drug-induced prolongation of the QT interval as evidenced by
an upper bound of the 95% confidence interval around the mean effect
on QTc of 10 ms. Furthermore, it defines a negative thorough
QT/QTc study as one in which the upper bound of the 95% one-sided
confidence interval for the largest time-matched mean effect of the
drug on the QTc interval excludes 10 ms. This leads to the requirement
of showing non-inferiority of the test treatment to placebo at
multiple time points. Conventionally, it is carried out by testing
multiple hypotheses at a 5% type I error rate each. The multiple
comparison concern with this analysis is conservativeness when the
number of tests is large. On the other hand, when the study result is
negative, ICH E14 recommends validating the negative result by
showing that the study population is sensitive enough to show at least
5 ms prolongation of the QTc interval for a carefully selected positive
control. The validation test is often carried out by demonstrating
that the mean difference between the positive control and placebo is
greater than 5 ms at at least one of the selected few time points. The
multiple comparison nature of the validation test leads to concerns
about type I error rate inflation. Both multiple comparison issues
can be represented as the bias of using the maximum of the
estimates of treatment differences as the estimate of the maximum of
the expected differences. We will discuss a few proposed approaches to
address the problem.
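A minimal sketch of the intersection-union logic described above, on simulated data (an illustration of the decision rule only, not the regulatory analysis).

```python
import numpy as np
from scipy import stats

# At each time point, form a one-sided 95% upper confidence bound for the
# mean drug-minus-placebo QTc difference; the study is "negative" only if
# every bound (hence the largest one) lies below 10 ms.
rng = np.random.default_rng(2)
n, n_timepoints = 40, 8
diffs = rng.normal(loc=3.0, scale=8.0, size=(n, n_timepoints))  # paired QTc differences (ms)

means = diffs.mean(axis=0)
ses = diffs.std(axis=0, ddof=1) / np.sqrt(n)
upper_bounds = means + stats.t.ppf(0.95, df=n - 1) * ses

negative_study = np.all(upper_bounds < 10.0)
print(np.round(upper_bounds, 2), "negative thorough QT study:", negative_study)
# Because each test is run at the 5% level, the overall procedure becomes
# conservative as the number of time points grows -- one of the multiplicity
# issues the talk addresses.
```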
(Seminar No. 22)
SPEAKER: Dr. Alan Dorfman
Bureau of Labor Statistics, U.S. Department of Labor
NE Washington, DC 20212-0001, U.S.A.
TITLE:
Nonparametric Regression and the Two Sample Problem
TIME AND PLACE:
Thursday, April 30, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The two sample problem: two distinct surveys gather information on a
variable y of interest from a single frame, differing perhaps in sample
design and sample size, but with common auxiliary information x. How
should we combine the data from the surveys to get a single estimate?
Nonparametric regression: Models are often used in survey sampling to
sharpen inference on y based on more complete knowledge of an auxiliary
variable x. Because of the tentativeness of models in most
circumstances, samplers typically buttress their model-based inference
by embedding it in a design-based framework ("model assisted"
estimation). An alternate approach is to use very weak models and
nonparametric regression.
A simple two sample problem is described, along with several approaches to
handling it. A simple, somewhat disguised version of
nonparametric regression provides a nice solution. Some problematic and
controversial aspects of nonparametric regression in survey sampling are
discussed.
(Seminar No. 23)
SPEAKER: Prof. Andrew J. Waters
Uniformed Services University of the Health Sciences
Bethesda, MD 20814, U.S.A.
TITLE:
Using ecological momentary assessment to study relapse in
addiction
TIME AND PLACE:
Thursday, May 7, 2009, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Rationale
There has been growing interest in the use of handheld computers (PDAs) to collect
behavioral data in a naturalistic or Ecological Momentary Assessment (EMA) setting.
In many EMA studies, participants carry around a PDA with them as they go about
their daily lives. They are beeped at random times on 4 or 5 occasions per day. When
beeped, they complete items assessing subjective and contextual variables. Because
each participant typically completes a fairly large number of assessments, EMA
studies can generate large and complex datasets. The talk will first provide an
overview of how EMA methods have been used to study addiction. I will also discuss a
number of studies in which implicit cognitive assessments (reaction time tasks) have
been administered on a PDA in an EMA setting. In an initial study, twenty-two
smokers and 22 non-smokers carried around a PDA for 1-week (Waters & Li, 2008). They
were beeped at random times on 4 occasions per day (RAs). At each assessment,
participants responded to items assessing subjective, pharmacological, and
contextual variables. They subsequently completed a Stroop task. In a second study,
30 participants completed an Implicit Association Test (IAT) at each assessment. In
a third study, 68 heroin abusers undergoing drug detoxification in a detoxification
clinic completed implicit/explicit cognitive assessments at each assessment. In a
fourth study, 81 participants wishing to quit smoking have carried around a PDA for
1-week after their quit date. The talk will address: 1) The feasibility of assessing
implicit/explicit cognitions on PDAs in an EMA setting; 2) The statistical methods
that have been employed to analyze the EMA data; and 3) The unique associations
between implicit/explicit cognitions and temptations/relapse that have
been revealed
in EMA data.
(Fall 2008, Seminar No. 1)
SPEAKER: Prof. Hosam M. Mahmoud
The George Washington University,
Washington, D.C. 20052, U.S.A.
TITLE: The Polya Process and Applications
TIME AND PLACE:
Thursday, September 18, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We investigate the Polya process, which underlies an urn of white and
blue balls growing in real time. A partial differential equation
governs the evolution of the process. Some special cases are amenable to
exact and asymptotic solution: they include the (forward or backward)
diagonal processes, and the Ehrenfest process.
Applications of standard (discrete) urns and their analogue when embedded
in real time include several classes of random trees that have
applications in computer science, epidemiology and philology. We shall
present some of these applications.
TIME AND PLACE:
Thursday, September 12, 2008, 3:30pm
NO Talk:
AMSC celebration.
(Seminar No. 2)
SPEAKER: Anastasia Voulgaraki, M.Sc.
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
Estimation of Death Rates in US States With Small Subpopulations
TIME AND PLACE:
Thursday, October 2, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The National Center for Health Statistics (NCHS) uses observed
mortality data to publish race-gender specific life tables for
individual states decennially. At ages over 85 years, the reliability
of death rates based on these data is compromised to some extent by
age misreporting. The eight-parameter Heligman-Pollard parametric
model is then used to smooth the data and obtain estimates/extrapolation of
mortality rates for advanced ages. In States with small
sub-populations the observed mortality rates are often zero,
particularly at young ages. The presence of zero
death rates makes the fitting of the Heligman-Pollard model difficult
and at times outright impossible. In addition, since death rates are
reported on a log scale, zero mortality rates are problematic.
To overcome observed zero death rates, appropriate probability models
are used. Using these models, observed zero mortality
rates are replaced by the corresponding expected values.
This enables using logarithmic transformations, and the fitting of
the Heligman-Pollard model to produce
mortality estimates for ages 0-130 years.
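For reference, the eight-parameter law in its standard textbook form, with a sketch of evaluating it over ages 1-130 (parameter values are purely illustrative, and the NCHS implementation may differ in its details).

```python
import numpy as np

def heligman_pollard_odds(x, A, B, C, D, E, F, G, H):
    """Standard eight-parameter Heligman-Pollard law:
        q_x / (1 - q_x) = A^((x+B)^C) + D*exp(-E*(ln x - ln F)^2) + G*H^x
    """
    x = np.asarray(x, dtype=float)
    child = A ** ((x + B) ** C)                                   # infant/child mortality
    hump = D * np.exp(-E * (np.log(x) - np.log(F)) ** 2)          # young-adult "accident hump"
    old = G * H ** x                                              # senescent (Gompertz-like) term
    return child + hump + old

def death_rate(x, params):
    odds = heligman_pollard_odds(x, *params)
    return odds / (1.0 + odds)

# Illustrative parameter values only, to show the shape of the curve.
params = (5e-4, 0.01, 0.10, 1e-3, 10.0, 20.0, 5e-5, 1.10)
ages = np.arange(1, 131)
rates = death_rate(ages, params)
# Observed zero rates would be replaced by small expected values from a
# probability model before taking logs and fitting, as described above.
print(rates[:5], rates[-1])
```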
(Seminar No. 3)
SPEAKER: Prof. Ali Arab
Georgetown University
Washington, D.C. 20057, U.S.A.
TITLE:
Efficient Parameterization of PDE-Based Dynamics for
Spatio-Temporal Processes
TIME AND PLACE:
Thursday, October 16, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Spatio-temporal dynamical processes in the physical and environmental
sciences are often described by partial differential equations (PDEs).
The inherent complexity of such processes due to high-dimensionality and
multiple scales of spatial and temporal variability is often intensified
by characteristics such as sparsity of data, complicated boundaries and
irregular geometrical spatial domains, among others. In addition,
uncertainties in the appropriateness of any given PDE for a real-world
process, as well as uncertainties in the parameters associated with the
PDEs are typically present. These issues necessitate the incorporation of
efficient parameterizations of spatio-temporal models that are capable of
addressing such characteristics. A hierarchical Bayesian model
characterized by the PDE-based dynamics for spatio-temporal processes
based on their Galerkin finite element method (FEM) representations is
developed and discussed. As an example, spatio-temporal models based on
advection-diffusion processes are considered. Finally, an application of
the hierarchical Bayesian modeling approach is presented which considers
the analysis of tracking data obtained from DST (data storage devices)
sensors to mimic the pre-spawning upstream migration process of the
declining shovelnose sturgeon.
(Seminar No. 4)
SPEAKER: Prof. Sandra Cerrai
University of Maryland
College Park, MD 20742, U.S.A.
TITLE:
A central limit theorem for some reaction-diffusion equations with
fast oscillating perturbation
TIME AND PLACE:
Thursday, October 23, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We study the normalized difference between the solution $u_\varepsilon$ of a
reaction-diffusion equation in a bounded interval $[0,L]$ perturbed by
a fast oscillating term, arising as the solution of a stochastic
reaction-diffusion equation with a strong mixing behavior, and the
solution $\bar{u}$ of the corresponding averaged equation. We assume
the smoothness of the reaction coefficient and we prove that a central
limit type theorem holds. Namely, we show that the normalized
difference $(u_\varepsilon-\bar{u})/\sqrt{\varepsilon}$ converges weakly in
$C([0,T];L^2(0,L))$ to the solution of the linearized equation where
an extra Gaussian term appears.
(Seminar No. 5)
SPEAKER: Prof. Edward J. Wegman
George Mason University
Fairfax, VA 22030, U.S.A.
TITLE:
Mixture Models for Document Clustering
TIME AND PLACE:
Thursday, October 30, 2008, 3:30pm
Colloquium Room 3206, Math Bldg
(not the usual room.)
Talk sponsored by Math Stat and the Stat Consortium. There will be
a reception following the talk in
the Math Lounge 3201.
ABSTRACT:
Automatic clustering and classification of documents within corpora is
a challenging task. Often this is done by comparing word usage within the
corpus, the so-called bag-of-words methodology. The lexicon for a corpus
can indeed be very large. For the example of 503 documents that we
consider, there are more than 7000 distinct terms and more than 91,000
bigrams. This means that a term vector characterizing a document will
be approximately 7000 dimensional. In this talk, we use an adaptation
of normal mixture models with 7000 dimensional data to locate centroids
of clusters. The algorithm works surprisingly well and is linear in all
the size metrics.
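A rough sketch of the bag-of-words plus normal-mixture idea on a tiny made-up corpus (the talk's adaptation works directly with roughly 7000-dimensional term vectors and is not the stock EM fit used here).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

# Toy corpus: two "space" documents and two "auto" documents.
docs = [
    "rocket launch orbit satellite payload",
    "satellite orbit telescope launch mission",
    "engine transmission brake tire sedan",
    "tire brake engine fuel sedan mileage",
]
X = TfidfVectorizer().fit_transform(docs).toarray()   # term vectors

gmm = GaussianMixture(n_components=2, covariance_type='diag', random_state=0)
labels = gmm.fit_predict(X)        # cluster assignments
centroids = gmm.means_             # mixture component centroids
print(labels, centroids.shape)
```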
(Seminar No. 6)
SPEAKER: Dr. Michail Sverchkov
BAE Systems IT and Bureau of Labor Statistics
Washington, DC 20212-0001, U.S.A.
TITLE:
On Estimation of Response Probabilities when Missing Data are
Not Missing at Random
TIME AND PLACE:
Thursday, November 6, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Most methods that deal with the estimation of response probabilities
assume either explicitly or implicitly that the missing data are
'missing at random' (MAR). However, in many practical situations
this assumption is not valid, since the probability to respond often
depends directly on the outcome value. The case where the missing
data are not MAR (NMAR) can be treated by postulating a parametric
model for the distribution of the outcomes before non-response and
a model for the response mechanism. The two models define a parametric
model for the joint distribution of the outcomes and response
indicators, and therefore the parameters of these models can be
estimated by maximization of the likelihood corresponding to this
distribution. Modeling the distribution of the outcomes before
non-response, however, can be problematic since no data are available
from this distribution.
In this talk we propose an alternative approach that allows one to
estimate the parameters of the response model without modelling the
distribution of the
outcomes before non-response. The approach utilizes relationships
between the population, the sample and the sample complement
distributions derived in Pfeffermann and Sverchkov (1999, 2003) and
Sverchkov and Pfeffermann (2004).
Key words: sample distribution, complement-sample distribution,
prediction under informative sampling or non-response, estimating
equations, missing information principle, non-parametric estimation
(Seminar No. 7)
SPEAKER: Prof. Malay Ghosh
University of Florida
Gainesville, FL 32611-8545 , U.S.A.
TITLE:
Bayesian Benchmarking in Small Area Estimation
TIME AND PLACE:
Thursday, November 13, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Abstract
(Seminar No. 8: Special Tuesday Seminar)
SPEAKER: Prof. Gauri S. Datta
University of Georgia
Athens, GA 30602, U.S.A.
TITLE:
Estimation of Small Area Means under Measurement Error Models
TIME AND PLACE:
Tuesday, November 18, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In recent years demand for reliable estimates for characteristics of small domains (small areas) has greatly increased worldwide due to growing use of such estimates in formulating policies and programs, allocating government funds, planning regional development, and marketing decisions at the local level. However, due to cost and operational considerations, it is seldom possible to procure a large enough overall sample size to support direct estimates of adequate precision for all domains of interest. It is often necessary to employ indirect estimates for small areas that can increase the effective domain sample size by borrowing strength from related areas through linking models, using census and administrative data and other auxiliary data associated with the small areas. To this end, the nested error regression model for unit-level data and the Fay-Herriot model for the area-level data have been widely used in small area estimation. These models usually assume that the explanatory variables are measured without error. However, explanatory variables are often subject to measurement error. Both functional and structural measurement error models have been recently proposed by researchers in small area estimation to deal with this issue. In this talk, we consider both functional and structural measurement error models in discussing empirical Bayes (equivalently, empirical BLUP) estimation of small area means.
(Seminar No. 9)
SPEAKER: Dr. Gang Zheng
Office of Biostatistics Research,
National Heart, Lung and Blood Institute
6701 Rockledge Drive, Bethesda, MD 20892-7913, U.S.A.
TITLE:
On Robust Tests for Case-Control Genetic Association Studies
TIME AND PLACE:
Thursday, November 20, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
When testing association between a single marker and the disease using
case-control samples, the data are presented in a 2x3 table. Pearson's
Chi-square test (2 df) and the trend test (1 df) are commonly used.
Usually one does not know which of them to choose. It depends on the
unknown genetic model underlying the data. So one could either choose
the maximum (MAX) of a family of trend tests over all possible genetic
models (Davies, 1977, 1987) or take the smaller (MIN2) of the p-values of
Pearson's test and the trend test
(Wellcome Trust Case-Control Consortium, 2007).
We show that Pearson's test, the trend test and MAX are all trend
tests with different types of scores: data-driven or prespecified and
restricted or not restricted. The results provide insight into the
properties that MAX is always more powerful than Pearson's test when
the genetic model is restricted and that Pearson's test is more
robust when the model is not restricted. For the MIN2 of WTCCC (2007),
we show that its null distribution can be derived, so the p-value of
MIN2 can be obtained. Simulation is used to compare the above four
tests. We apply MIN2 to the result obtained by
The SEARCH Collaborative Group (NEJM, August 21, 2008) who used MIN2
to detect a SNP in a genome-wide association study, but could not
report the p-value for that SNP when MIN2 was used.
References:
1. Joo J, Kwak M, Ahn K and Zheng G. A robust genome-wide scan
statistic of the Wellcome Trust Case-Control Consortium.
Biometrics (to appear).
2. Zheng G, Joo J and Yang Y. Pearson's test, trend test, and MAX are
all trend tests with different type of scores. Unpublished
manuscript. See Slides.
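For orientation, a small numerical sketch of the tests discussed above, using illustrative genotype counts; the recessive/additive/dominant score sets used for MAX follow the usual convention and may differ in detail from the talk.

```python
import numpy as np
from scipy import stats

# Rows: cases, controls; columns: genotypes aa, aA, AA (counts are made up).
table = np.array([[120, 240, 140],
                  [150, 250, 100]])

def trend_test(table, scores):
    """Cochran-Armitage trend test with prespecified genotype scores."""
    cases, controls = table
    y = np.repeat([1, 0], [cases.sum(), controls.sum()])
    x = np.concatenate([np.repeat(scores, cases), np.repeat(scores, controls)])
    r = np.corrcoef(x, y)[0, 1]
    chi2 = len(y) * r ** 2                     # asymptotically chi-square(1)
    return chi2, stats.chi2.sf(chi2, df=1)

res = stats.chi2_contingency(table)            # Pearson's 2 df test
p_pearson = res[1]
_, p_additive = trend_test(table, np.array([0.0, 0.5, 1.0]))

# MAX: maximize the trend statistic over recessive/additive/dominant scores.
max_stat = max(trend_test(table, np.array(s, dtype=float))[0]
               for s in ([0, 0, 1], [0, 0.5, 1], [0, 1, 1]))

# MIN2: the smaller of the Pearson and (additive) trend p-values; as the
# talk notes, its null distribution is not uniform and must be derived.
min2 = min(p_pearson, p_additive)
print(round(p_pearson, 4), round(p_additive, 4), round(max_stat, 2), round(min2, 4))
```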
(Seminar No. 10:
This Seminar is on a Tuesday)
SPEAKER: Dr. Yair Goldberg
Hebrew University of Jerusalem,
Mt. Scopus, Jerusalem, Israel
TITLE: Manifold learning: The price of normalization
TIME AND PLACE:
Tuesday, November 25, 2008, 3:30pm
Room 1313, Math Bldg (room number may change)
ABSTRACT:
The problem of finding a compact representation for high-dimensional
data is encountered in many areas of science and has motivated the
development of various dimension-reducing algorithms. The Laplacian
EigenMap dimension-reducing algorithm (Belkin & Niyogi, 2003), widely
used for its intuitive approach and computational simplicity, claims
to reveal the underlying non-linear structure of high-dimensional data.
We present a general class of examples in which the Laplacian EigenMap
fails to generate a reasonable reconstruction of the data given to it.
We both prove our results analytically and show them empirically. This
phenomenon is then explained with an analysis of the limit-case
behavior of the Laplacian EigenMap algorithm both using asymptotics
and the continuous Laplacian operator. We also discuss the relevance
of these findings to the algorithms Locally Linear Embedding (Roweis
and Saul, 2000), Local Tangent Space Alignment (Zhang and Zha, 2004),
Hessian Eigenmap (Donoho and Grimes, 2004), and Diffusion Maps
(Coifman and Lafon, 2006).
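A minimal sketch of running a Laplacian-Eigenmap-style embedding with off-the-shelf tools (scikit-learn's SpectralEmbedding implements this family of spectral methods; the data set and parameters are illustrative, and the normalization choices baked into such implementations are the kind of ingredient the talk examines).

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

# Embed a 3-D swiss roll into 2 dimensions via a graph-Laplacian eigenmap.
X, color = make_swiss_roll(n_samples=1500, random_state=0)
embedding = SpectralEmbedding(n_components=2, n_neighbors=12,
                              random_state=0).fit_transform(X)
print(embedding.shape)   # (1500, 2) low-dimensional coordinates
```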
(Seminar No. 11:
DISTINGUISHED STATISTICS CONSORTIUM LECTURE
This Seminar is on a Friday)
SPEAKER:
Mitchell H. Gail, M.D., Ph.D.
Senior Investigator
Biostatistics Branch, Div. Cancer Epidemiology &
Genetics, National Cancer Institute,
Rockville, MD, 20852, U.S.A.
TITLE:
Absolute Risk: Clinical Applications and Controversies
DATE/TIME:
Friday, December 5, 2008, 3:15--5:00pm
PLACE:
Engineering Building Lecture Hall EGR 1202
Immediately following the talk there will be a formal 25-minute
Discussion, with a Reception to follow that.
ABSTRACT:
Absolute risk is the probability that a disease will develop
in a defined age interval in a person with specific risk factors.
Sometimes absolute risk is called "crude" risk to distinguish it from
the cumulative "pure" risk that might arise in the absence of competing
causes of mortality. After defining absolute risk, I shall present a
model for absolute breast cancer risk and illustrate its clinical
applications. I will also describe the kinds of data and approaches that
are used to estimate models of absolute risk and two criteria,
calibration and discriminatory accuracy, that are used to evaluate
absolute risk models. In particular, I will address whether well
calibrated models with limited discriminatory accuracy can be useful.
Dr. Mitchell Gail received an M.D. from Harvard Medical School in 1968
and a Ph.D. in statistics from George Washington University in 1977. He
joined NCI in 1969, and served as chief of the Biostatistics Branch from
1994 to 2008. Dr. Gail is a Fellow and former President of the American
Statistical Association, a Fellow of the American Association for the
Advancement of Science, an elected member of the American Society for
Clinical Investigation, and an elected member of the Institute of
Medicine of the National Academy of Sciences. He has received the
Spiegelman Gold Medal for Health Statistics, the Snedecor Award for
applied statistical research, the Howard Temin Award for AIDS Research,
the NIH Director's Award, and the PHS Distinguished Service Medal.
Discussant:
Professor Bilal Ayyub
Department of Civil & Environmental Engineering, UMCP
College Park, MD, 20742, U.S.A.
Discussion, 4:15pm: Engineering perspectives on Risk
Professor Ayyub is a Professor of Civil and Environmental Engineering at
the University of Maryland College Park and Director of the Center for
Technology and Systems Management. He is a Fellow of the ASCE, ASME, and
SNAME.
(Seminar No. 12)
SPEAKER: Dr. Janice Lent
Energy Information Administration
Washington, DC 20585, U.S.A.
TITLE:
Some Properties of Price Index Formulas
TIME AND PLACE:
Thursday, December 11, 2008, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Price indexes are important statistics that move large amounts of money
in the U.S. economy. In order to adjust monetary figures for
inflation/deflation, we must develop methods of using sample data to
estimate changes in the value of a currency. A vast array of target
price index formulas are discussed in the economics literature. In this
seminar, we will present some of the formulas that are widely used by
government statistical agencies as targets for price index estimation.
We will examine and compare some of the properties of these formulas,
including underlying economic assumptions, ease of estimation, and
sensitivity to extreme values.
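For concreteness, three of the classical target formulas (with $p$ denoting prices, $q$ quantities, subscript $0$ the base period and $t$ the comparison period) are the Laspeyres, Paasche, and Fisher ideal indexes,
\[
P_{L} = \frac{\sum_i p_{it} q_{i0}}{\sum_i p_{i0} q_{i0}}, \qquad
P_{P} = \frac{\sum_i p_{it} q_{it}}{\sum_i p_{i0} q_{it}}, \qquad
P_{F} = \sqrt{P_{L}\,P_{P}},
\]
which differ in which period's quantities serve as weights and hence in their economic assumptions and sensitivity to extreme price relatives; the seminar's exact selection of formulas is not specified in the abstract.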
Seminar Talks 2006-2007
SPEAKER: Professor Leonid Koralov
Mathematics Department, UMCP
TITLE: Averaging of Hamiltonian Flows with an
Ergodic Component
TIME AND PLACE:
Thurs., Feb. 8, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: We consider a process which consists of the fast
motion along the stream lines of an incompressible periodic vector
field perturbed by the white noise. Together with D. Dolgopyat we
showed that for almost all rotation numbers of the unperturbed flow,
the perturbed flow converges to an effective, "averaged" Markov
process.
SPEAKER: Professor Donald Martin
Mathematics Department, Howard University & Census Bureau
Stat. Resch. Div.
TITLE: Distributions of patterns and statistics
in higher-order Markovian sequences
TIME AND PLACE:
Thurs., Feb. 15, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: In this talk we discuss a method for computing
distributions associated with general patterns and statistics in
higher-order Markovian sequences. An auxiliary Markov chain is
associated with the original sequence and probabilities are computed
through the auxiliary chain, simplifying computations that are
intractable using combinatorial or other approaches. Three distinct
examples of computations are given: (1) sooner or later waiting time
distributions for collections of compound patterns that must occur
pattern-specific numbers of times, using either overlapping counting
or two types of non-overlapping counting; (2) the joint distribution
of the total number of successes in success runs of length at least ,
and the distance between the beginning of the first such success run
and the end of the last one; (3) the distribution of patterns in
underlying variables of a hidden Markov model. Applications to
missing and noisy data and to bioinformatics are given to illustrate
the usefulness of the computations.
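A toy instance of the auxiliary-chain device for the simplest possible setting: the waiting time until the pattern "SS" (two successes in a row) first occurs in i.i.d. Bernoulli trials. The talk treats compound patterns and higher-order Markovian inputs; all numbers here are illustrative.

```python
import numpy as np

# States track progress toward the pattern:
#   0 = no progress, 1 = sequence currently ends in one S, 2 = pattern done.
p = 0.4
T = np.array([[1 - p,   p, 0.0],
              [1 - p, 0.0,   p],
              [0.0,   0.0, 1.0]])

def waiting_time_pmf(T, n_max):
    """P(pattern first completed at trial n), for n = 1..n_max."""
    dist = np.array([1.0, 0.0, 0.0])
    pmf, prev_absorbed = [], 0.0
    for _ in range(n_max):
        dist = dist @ T
        pmf.append(dist[2] - prev_absorbed)   # newly absorbed probability
        prev_absorbed = dist[2]
    return np.array(pmf)

pmf = waiting_time_pmf(T, 50)
print(pmf[:5].round(4), "truncated mean:", (np.arange(1, 51) * pmf).sum().round(2))
```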
SPEAKER:
Professor Alexander S. Cherny
Moscow State University
TITLE: Coherent Risk Measures
TIME AND PLACE:
Tues., Feb. 20, 2007, 3:30pm
Room 1313, Math Bldg
ABSTRACT: The notion of a coherent risk measure was
introduced by Artzner, Delbaen, Eber, and Heath in 1997 and by now
this theory has become a considerable and very rapidly evolving branch
of modern mathematical finance.
The talk will be aimed at describing basic results of this theory,
including the basic representation theorem of Artzner, Delbaen, Eber,
and Heath as well as the characterization of law invariant risk
measures obtained by Kusuoka.
It will also include some recent results obtained by the author,
related to the strict diversification property and to the
characterization of dilatation monotone coherent risks.
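As a quick companion example (not from the talk itself): Expected Shortfall is the best-known coherent risk measure, while plain Value-at-Risk fails subadditivity and is not coherent. A minimal empirical calculation with simulated heavy-tailed losses:

```python
import numpy as np

def var(losses, alpha=0.95):
    """Value-at-Risk: the alpha-quantile of the loss distribution."""
    return np.quantile(losses, alpha)

def expected_shortfall(losses, alpha=0.95):
    """Expected Shortfall: average loss beyond the VaR quantile."""
    q = var(losses, alpha)
    return losses[losses >= q].mean()

rng = np.random.default_rng(3)
losses = rng.standard_t(df=4, size=100_000)   # heavy-tailed simulated losses
print(f"VaR(95%) = {var(losses):.3f},  ES(95%) = {expected_shortfall(losses):.3f}")
```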
SPEAKER: Dr. Siamak Sorooshyari
Lucent Technologies -- Bell Laboratories
TITLE: A Multivariate Statistical Approach
to Performance Analysis of Wireless Communication Systems
TIME AND PLACE:
Thurs., Mar. 1, 2007, 3:30pm
Room 1313, Math Bldg
NOTE: this seminar is
presented jointly with the Norbert Wiener Center.
ABSTRACT: The explosive growth of wireless communication
technologies has placed paramount importance on accurate performance
analysis of the fidelity of a service offered by a system to a
user. Unlike the channels of wireline systems, a wireless medium
subjects a user to time-varying detriments such as multipath fading,
cochannel interference, and thermal receiver noise. As a
countermeasure, structured redundancy in the form of diversity has
been instrumental in ensuring reliable wireless communication
characterized by a low bit error probability (BEP). In the performance
analysis of diversity systems the common assumption of uncorrelated
fading among distinct branches of system diversity tends to exaggerate
diversity gain resulting in an overly optimistic view of
performance. A limited number of works take into account the problem
of statistical dependence. This is primarily due to the mathematical
complication brought on by relaxing the unrealistic assumption of
independent fading among degrees of system diversity.
We present a multivariate statistical approach to the performance
analysis of wireless communication systems employing diversity. We
show how such a framework allows for the statistical modeling of the
correlated fading among the diversity branches of the system
users. Analytical results are derived for the performance of
maximal-ratio combining (MRC) over correlated Gaussian vector
channels. Generality is maintained by assuming arbitrary power users
and no specific form for the covariance matrices of the received faded
signals. The analysis and results are applicable to binary signaling
over a multiuser single-input multiple-output (SIMO) channel. In the
second half of the presentation, attention is given to the performance
analysis of a frequency diversity system known as multicarrier
code-division multiple-access (MC-CDMA). With the promising prospects
of MC-CDMA as a predominant wireless technology, analytical results
are presented for the performance of MC-CDMA in the presence of
correlated Rayleigh fading. In general, the empirical results
presented in our work show the effects of correlated fading to be
non-negligible, and most pronounced for lightly-loaded communication
systems.
SPEAKER: Professor Harry Tamvakis
Mathematics Department, UMCP
TITLE: The Dominance Order
TIME AND PLACE:
Thurs., Mar. 8, 2007, 3:30pm
Room 1313, Math Bldg
Abstract: The dominance or majorization order has its
origins in the theory of inequalities, but actually appears
in many strikingly disparate areas of mathematics. We will
give a selection of results where this partial order appears,
going from inequalities to representations of the symmetric
group, families of vector bundles, orbits of nilpotent matrices,
and finally describe some recent links between them.
NOTE: The topic of this talk is related to the following
problem being studied in the RIT of Prof. Abram Kagan:
Consider a round robin tournament with n players (each plays each other
once; the winner of a game gets one point, the loser zero).
The outcome of the tournament is a set of n integers, a1 >=
a2 >= ... >= an where a1 is the
total score of the tournament winner(s), a2 the score of
the second-place finisher, etc. Not all such sets are possible outcomes
but all the possible outcomes can be described.
A number of interesting probability problems arise here. E.g., assume
that n players are equally strong, i. e., the probability that player i
beats player j is 1/2 for all i, j. The expected score of each player in
the tournament is (n-1)/2. But what is the expected score (or the
distribution of the score) of the winner(s)? At the moment the answer is
unknown even in the asymptotic formulation (i. e., for large n).
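A short Monte Carlo sketch of the open question above (simulation only, with no claim about the asymptotics; when several players tie for first, their common top score is recorded).

```python
import numpy as np

def winner_scores(n, n_sim=2000, seed=0):
    """Simulate round robins among n equally strong players; return top scores."""
    rng = np.random.default_rng(seed)
    tops = np.empty(n_sim)
    iu = np.triu_indices(n, k=1)
    for s in range(n_sim):
        wins = rng.random(len(iu[0])) < 0.5   # outcome of each game (i beats j)
        scores = np.zeros(n)
        np.add.at(scores, iu[0], wins)        # points for player i
        np.add.at(scores, iu[1], ~wins)       # points for player j
        tops[s] = scores.max()
    return tops

for n in (10, 50, 200):
    t = winner_scores(n)
    print(n, (n - 1) / 2, round(t.mean(), 2))   # average score vs. estimated winner's mean
```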
SPEAKER: Zhibiao Zhao
Statistics Department, University of Chicago
TITLE: Confidence Bands in Nonparametric
Time Series Regression
TIME AND PLACE:
Tues., March 27, 2007, 3:30pm
NOTE special seminar
time.
Room 1313, Math Bldg
Abstract: Nonparametric model validation under dependence
has been a difficult problem. Fan and Yao (Nonlinear Time Series:
Nonparametric and Parametric Methods, 2003, page 406) pointed out that
there has been virtually no theoretical development on nonparametric
model validation under dependence, despite the importance of
the latter problem, since dependence is an intrinsic characteristic of
time series. In this talk, we consider nonparametric estimation and
inference of mean regression and volatility functions in nonlinear
stochastic regression models. Simultaneous confidence bands are
constructed and the coverage probabilities are shown to be
asymptotically correct. The imposed dependence structure allows
applications in many nonlinear autoregressive processes and linear
processes, including both short-range dependent and long-range
dependent processes. The results are applied to the S&P 500 Index
data. Interestingly, the constructed simultaneous confidence bands
suggest that we can accept the two null hypotheses that the regression
function is linear and the volatility function is quadratic.
SPEAKER: Dr. Ram Tiwari
National Cancer Institute, NIH
TITLE: Two-sample problems in ranked set
sampling
TIME AND PLACE:
Thurs., March 29, 2007, 3:30pm
Room 1313, Math Bldg
Abstract: In many practical problems, the variable of
interest is difficult/expensive to measure but the sampling units can
be easily ranked based on another related variable. For example, in
studies of obesity, the variable of interest may be the amount of body
fat, which is measured by Dual Energy X-Ray Absorptiometry --- a
costly procedure. The surrogate variable of body mass index is much
easier to work with. Ranked set sampling is a procedure of improving
the efficiency of an experiment whereby one selects certain sampling
units (based on their surrogate values) that are then measured on the
variable of interest. In this talk, we will first discuss some results
on two-sample problems based on ranked set samples. Several
nonparametric tests will be developed based on the vertical and
horizontal shift functions. It will be shown that the new methods are
more powerful compared to procedures based on simple random samples of
the same size.
When the measurement of surrogate variable is moderately expensive, in
the presence of a fixed total cost of sampling, one may resort to a
generalized sampling procedure called k-tuple ranked set sampling,
whereby k(>1) measurements are made on each ranked set. In the second
part of this talk, we will show how one can use such data to estimate
the underlying distribution function or the population mean. The
special case of extreme ranked set sample, where data consists of
multiple copies of maxima and minima will be discussed in detail due
to its practical importance. Finally, we will briefly discuss the
effect of incorrect ranking and provide an illustration using data on
conifer trees.
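A minimal sketch of balanced ranked set sampling for the mean, assuming error-free ranking within sets and using an illustrative lognormal surrogate for the expensive measurement; the set size, number of cycles, and distribution are all hypothetical.

```python
import numpy as np

def rss_mean(pop_sampler, k=4, m=25, seed=0):
    """Balanced RSS: for each rank r = 1..k, draw a set of k units, measure
    only the r-th smallest, and repeat for m cycles (m*k measurements)."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(m):
        for r in range(k):
            set_r = pop_sampler(rng, k)
            kept.append(np.sort(set_r)[r])
    return np.mean(kept)

sampler = lambda rng, size: rng.lognormal(mean=0.0, sigma=0.8, size=size)
reps = 2000
rss = [rss_mean(sampler, seed=s) for s in range(reps)]
srs = [np.mean(sampler(np.random.default_rng(10_000 + s), 100)) for s in range(reps)]
# Both estimators use 100 measured units; RSS is typically more efficient.
print("var(RSS mean) / var(SRS mean) =", round(np.var(rss) / np.var(srs), 2))
```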
SPEAKER: Guanhua Lu
Statistics Program, UMCP
TITLE: Asymptotic Theory in Multiple-Sample
Semiparametric Density Ratio Models
TIME AND PLACE:
Thurs., April 5, 2007, 3:30pm
Room 1313, Math Bldg
Abstract:
A multiple-sample semiparametric density ratio model can be constructed
by multiplicative exponential distortions of the reference distribution.
Distortion functions are assumed to be nonnegative and of a known
finite-dimensional parametric form, and the reference distribution is left
nonparametric. The combined data from all the samples are used in the
semiparametric large sample problem of estimating each distortion and the
reference distribution. The large sample behavior for both the parameters
and the unknown reference distribution are studied. The estimated
reference distribution has been proved to converge weakly to a zero-mean
Gaussian process.
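In symbols (up to the speaker's notation), with reference density $g_0$ and samples $i = 1, \dots, m$, the model distorts the reference multiplicatively,
\[
g_i(x) \;=\; \exp\!\big(\alpha_i + \beta_i^{\top} h(x)\big)\, g_0(x),
\]
where $h$ is a known finite-dimensional function, the $(\alpha_i, \beta_i)$ are unknown parameters, and $g_0$ is left completely unspecified; the pooled data from all samples are used to estimate both the distortions and the reference distribution.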
SPEAKER: Dr. Gabor Szekely
NSF and Bowling Green State University
TITLE: Measuring and Testing Dependence by
Correlation of Distances
TIME AND PLACE:
Thurs., April 12, 2007, 3:30pm
Room 1313, Math Bldg
Abstract:
We introduce a simple new measure of dependence between random
vectors. Distance covariance (dCov) and distance correlation(dCor) are
analogous to product-moment covariance and correlation, but unlike the
classical definition of correlation, dCor = 0 characterizes independence
for the general case. The empirical dCov and dCor are based on certain
Euclidean distances between sample elements rather than sample moments,
yet have a compact representation analogous to the classical covariance
and correlation. Definitions can be extended to metric-space-valued
observations where the random vectors could even be in different metric
spaces. Asymptotic properties and applications in testing independence
will also be discussed. A new universally consistent test of
multivariate independence is developed. Distance correlation can also be
applied to prove CLT for strongly stationary sequences.
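A direct, unoptimized transcription of the empirical definitions (double-centered distance matrices; the notation follows the standard published formulas rather than anything specific to the talk).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_correlation(x, y):
    """Empirical distance correlation of two samples (rows = observations)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    A = squareform(pdist(x))                  # pairwise Euclidean distances
    B = squareform(pdist(y))
    A = A - A.mean(0) - A.mean(1)[:, None] + A.mean()   # double centering
    B = B - B.mean(0) - B.mean(1)[:, None] + B.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

# Uncorrelated but dependent example: Pearson correlation is near zero,
# distance correlation is clearly positive.
rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = x ** 2 + 0.1 * rng.normal(size=1000)
print(round(np.corrcoef(x, y)[0, 1], 3), round(distance_correlation(x, y), 3))
```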
Distinguished JPSM Lecture
co-Sponsored by Statistics Consortium
SPEAKER: Professor Roderick J. Little
Departments of Biostatistics and Statistics and Institute for
Social Research, University of Michigan
TITLE: Wait! Should We Use the Survey Weights
to Weight?
TIME AND PLACE:
Friday, April 13, 2007, 3:30pm
Room 2205, Lefrak Hall
Two discussants will speak following Professor Little's talk:
John Eltinge of Bureau of Labor Statistics and Richard Valliant from
JPSM.
SPEAKER: Dr. Song Yang
Office of Biostatistics Research, National Heart Lung and Blood
Institute, NIH
TITLE: Some versatile tests of treatment
effect using adaptively weighted log rank statistics
TIME AND PLACE:
Thurs., April 19, 2007, 3:30pm
Room 1313, Math Bldg
Abstract: For testing treatment effect with time to event
data, the log rank test is the most popular choice and is optimal for
proportional hazards alternatives. When a range of possibly
nonproportional alternatives are possible, combinations of several
tests are often used. Currently available methods inevitably
sacrifice power at proportional alternatives and may also be
computationally demanding. We introduce some versatile tests that use
adaptively weighted log rank statistics. Extensive numerical studies
show that these new tests almost uniformly improve the tests that they
modify, and are optimal or nearly so for proportional alternatives.
In particular, one of the new tests maintains optimality at the
proportional alternatives and also has very good power at a wide range
of nonproportional alternatives, thus is the test we recommend when
flexibility in the treatment effect is desired. The adaptive weights
are based on the model of Yang and Prentice (2005).
Statistics Consortium Lecture
co-Sponsored by JPSM and MPRC
SPEAKER: Professor Bruce Spencer
Statistics Department & Faculty Fellow, Institute for Policy
Research, Northwestern University
TITLE: Statistical Prediction of Demographic
Forecast Accuracy
TIME AND PLACE:
Friday, April 27, 2007, 3:15pm
Room 2205, Lefrak Hall
ABSTRACT: Anticipation of future population change affects
public policy deliberations on (i) investment for health care and pensions,
(ii) effects of immigration policy on the economy, (iii) future
competitiveness of the U.S. economy, to name just three. In this
talk, we review some statistical approaches used to predict the
accuracy of demographic forecasts and functional forecasts underlying
the policy discussions. A functional population forecast is
one that is a function of the population vector as well as other
components, for example a forecast of the future balance of a pension
fund. No background in demography will be assumed, and the necessary
demographic concepts will be introduced from the statistical point of
view. The talk is based on material in Statistical Demography and
Forecasting by J. M. Alho and B. D. Spencer (2005, Springer) and
reflects joint work by the authors.
Following Professor Spencer's talk, there will be a formal
Discussion, by Dr. Peter Johnson of the International Programs Center
of the Census Bureau and Dr. Jeffrey Passel of the Pew Hispanic
Center. Following the formal and floor discussion, there
will be a reception including refreshments.
SPEAKER: Professor Dennis Healy
Mathematics Department, UMCP
TITLE: TBA
TIME AND PLACE: Postponed
NOTE: this seminar will be
presented jointly with the Norbert Wiener Center.
SPEAKER: Dr. Mokshay Madiman
Statistics Department, Yale
TITLE: Statistical Data Compression with
Distortion
TIME AND PLACE:
Tues., January 31, 2006, 3:30pm Note unusual day !
Room 1313, Math Bldg
ABSTRACT: Motivated by the powerful and fruitful connection
between information- theoretic ideas and statistical model selection,
we consider the problem of "lossy" data compression ("lossy" meaning
that a certain amount of distortion is allowed in the decompressed
data) as a statistical problem. After recalling the classical
information-theoretic development of Rissanen's celebrated Minimum
Description Length (MDL) principle for model selection, we introduce
and develop a new theoretical framework for _code selection_ in data
compression. First we describe a precise correspondence between
compression algorithms (or codes) and probability distributions, and
use it to interpret arbitrary families of codes as statistical
models. We then introduce "lossy" versions of several familiar
statistical notions (such as maximum likelihood estimation and MDL
model selection criteria), and we propose new principles for building
good codes. In particular, we show that in certain cases, our
"lossy MDL estimator" has the following optimality property: not only
does it converge to the best available code (as the amount of data grows),
but it also identifies the right class of codes in finite time with
probability one.
[Joint work with Ioannis Kontoyiannis and Matthew Harrison.]
This talk is by Invitation of the
Hiring Committee.
SPEAKER: Lang Withers
MITRE Signal Processing Center
TITLE: The Bernoulli-trials
Distribution and Wavelet
This talk is jointly sponsored with the Harmonic Analysis Seminar
this week.
TIME AND PLACE:
Thurs., February 2, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk is about a probability distribution
function for Bernoulli ("coin-toss") sequences. We use the Haar
wavelet to analyze it, and find that this function just maps binary
numbers in [0,1] into general p-binary numbers in [0,1]. Next we see
that this function obeys a two-scale dilation equation and use it to
construct a family of wavelets. This family contains the Haar wavelet
and the piecewise-linear wavelet as special cases. What is striking
here is how naturally probability and wavelets interact: the Haar
wavelet sheds light on the meaning of a distribution; the distribution
happens to obey a two-scale dilation equation and lets us make it into
a wavelet.
We take up the more general case of the distribution function for
multi-valued Bernoulli trials. A special case of this for three-valued
trials is the Cantor function. Again we find that it just maps ternary
numbers into generalized ternary numbers. I hope to develop the Cantor
wavelet as well in time for the talk.
Audience: advanced undergrad and up; some familiarity with wavelets and
measure theory is helpful.
Click here to
see a current draft of the speaker's paper on the subject of the talk.
SPEAKER: Hyejin Shin
Department of Statistics, Texas A&M University
TITLE: An RKHS Formulation of Discrimination
and Classification for Stochastic Processes
TIME AND PLACE:
Thurs., February 9, 2006, 12:30-1:45pm
Room 3206, Math Bldg
Note unusual time and place for this
seminar !
ABSTRACT: Modern data collection methods are now
frequently returning observations that should be viewed as the result
of digitized recording or sampling from stochastic processes rather
than vectors of finite length. In spite of great demands, only a few
classification methodologies for such data have been suggested and
supporting theory is quite limited. Our focus is on discrimination and
classification in the infinite dimensional setting. The methodology
and theory we develop are based on the abstract canonical correlation
concept in Eubank and Hsing (2005) and motivated by the fact that
Fisher's discriminant analysis method is intimately tied to canonical
correlation analysis. Specifically, we have developed a theoretical
framework for discrimination and classification of sample paths from
stochastic processes through use of the Loève-Parzen isometric
mapping that connects a second order process to the reproducing kernel
Hilbert space generated by its covariance kernel. This approach
provides a seamless transition between finite and infinite dimensional
settings and lends itself well to computation via smoothing and
regularization.
This talk is by Invitation of the
Mathematics Department Hiring Committee.
SPEAKER: Professor Jae-Kwang Kim
Dept. of Applied Statistics, Yonsei University, Korea
TITLE: Regression fractional hot deck imputation
TIME AND PLACE:
Thurs., February 16, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Imputation using a regression model is
a method to preserve the correlation
among variables and to provide imputed point estimators.
We discuss the implementation of regression imputation using fractional
imputation. By a suitable choice of fractional weights, the fractional
regression imputation can take the form of hot deck fractional imputation,
thus no artificial values are constructed after the imputation. A variance
estimator, which extends the method of Kim and Fuller (2004, Biometrika),
is also proposed. By a suitable choice of imputation cells, the proposed
estimators can be made robust against the failure of the assumed regression
imputation model. Comparisons based on simulations are presented.
Professor Kim has made the slides for his talk available here .
SPEAKER: Professor Hannes Leeb
Yale University, Statistics Department
TITLE: Model selection and inference in regression
when the number of explanatory variables is of the same order as
sample size.
TIME AND PLACE:
Thurs., February 23, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Some of the most challenging problems in
modern econometrics and statistics feature a large number of possibly
important factors or variables, and a comparatively small sample
size. Examples include portfolio selection, detection of fraudulent
customers of credit card or telephone companies, micro-array analysis,
or proteomics.
I consider one problem of that kind: Regression with random design,
where the number of explanatory variables is of the same order as
sample size. The focus is on selecting a model with small predictive
risk.
Traditional model selection procedures, including AIC, BIC, FPE or MDL,
perform poorly in this setting. The models selected by these procedures
can be anything from mildly suboptimal to completely unreasonable,
depending on unknown parameters. In addition, inference procedures
based on the selected model, like tests or confidence sets, are invalid,
irrespective of whether a good model has been chosen or not.
I propose a new approach to the model selection problem in this setting
that explicitly acknowledges the fact that the number of explanatory
variables is of the same order as sample size. This approach has
several attractive features:
1) It will select the best predictive model asymptotically, irrespective of
unknown parameters (under minimal conditions).
2) It allows for inference procedures like tests or confidence sets
based on the selected model that are asymptotically valid.
3) Simulations suggest that the asymptotics in 1 and 2 above `kick in'
pretty soon, e.g., in a problem with 1000 parameters and 1600 observations.
These results are currently work in progress.
Professor Leeb will also give a second,
more general talk for the campus statistical community which is
jointly sponsored by the Stat Program in the Math Department along
with the campus Statistics Consortium. Details for the second talk are
as follows:
SPEAKER: Professor Hannes Leeb
Yale University, Statistics Department
TITLE: Model Selection and Inference: Facts
and Fiction
TIME AND PLACE:
Friday., February 24, 2006, 3:00pm
Lefrak Building Room 2205
ABSTRACT: Model selection has an important impact on
subsequent inference. Ignoring the model selection step leads to
invalid inference. We discuss some intricate aspects of data-driven
model selection that do not seem to have been widely appreciated in
the literature. We debunk some myths about model selection, in
particular the myth that consistent model selection has no effect on
subsequent inference asymptotically. We also discuss an
`impossibility' result regarding the estimation of the finite-sample
distribution of post-model-selection estimators.
A paper of Professor Leeb covering most of the issues in the second
talk can be found here.
This talk is jointly sponsored by the
Statistics Consortium and the Statistics Program in the Mathematics
Department. The talk will be followed by refreshments at 4:30pm.
SPEAKER: Guoxing (Greg) Soon, Ph.D.
Office of Biostatistics, CDER, Food & Drug Administration
TITLE: Statistical Applications in FDA
TIME AND PLACE:
Thurs., March 2, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk will be divided into three
parts. In the beginning I will briefly describe the kind of work
FDA statisticians do; then I will discuss two topics, the first on "From
Intermediate endpoint to final endpoint: a conditional power approach
for accelerated approval and interim analysis", and the second on "Computer
Intensive and Re-randomization Tests in Clinical Trials".
1. Statistical Issues in FDA
Statistics plays an important role in the FDA's decision making
process. Statistical inputs were critical for design, conduct,
analysis and interpretation of clinical trials. The statistical issues
we dealt with include, but are not limited to, the following:
appropriateness of randomization procedure, determination of analysis
population, blinding, potential design flaws that may lead to biases,
quality of endpoint assessment, interim analysis, information
handling, missing values, discontinuations, decision rule, analysis
methods, and interpretation. In this talk I will describe the type of
work we do with a few examples.
2. From Intermediate endpoint to final endpoint: a conditional power
approach for accelerated approval and interim analysis
For chronic and life threatening diseases, the clinical trials
required for final FDA approval may take a long time. It is therefore
sometimes necessary to approve the drug temporarily (accelerated
approval) based on early surrogate endpoints. Traditionally such
approvals were based on requirements on the surrogate endpoints
similar to those for the final endpoint, regardless of the quality of
the surrogacy. However, in this case the longer-term information on
some patients is ignored, and the risk of eventual failure at the
final approval is not considered.
In contrast, in typical group sequential trials, only information on
the final endpoint for a fraction of patients is used, and short-term
endpoints on the other patients are ignored. This reduces the
efficiency of inference and also fails to account for a potential
shift of the population over the course of the trial.
In this talk I will propose an approach that utilizes both the
short-term surrogate and the long-term final endpoint at interim or
intermediate analyses; the decision to terminate the trial early, or
to grant temporary approval, will be based on the likelihood of seeing
a successful trial were the trial to be completed. Issues of Type I
error control as well as the efficiency of the procedure will be
discussed.
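For readers unfamiliar with the conditional power calculation referenced
above, the following minimal sketch shows the standard single-endpoint
formula under a Brownian-motion approximation; it is not Dr. Soon's
surrogate-plus-final-endpoint procedure, and the information fraction,
interim z-value, and assumed drift are hypothetical inputs.

# Minimal sketch of a standard conditional-power calculation for a single
# endpoint under a Brownian-motion approximation; this is NOT the
# surrogate-plus-final-endpoint procedure described in the talk.
from scipy.stats import norm

def conditional_power(z_interim, info_fraction, drift, alpha=0.025):
    """Probability that the final z-statistic exceeds the critical value,
    given the interim z-statistic at the stated information fraction and
    an assumed drift (the expected final z under the alternative)."""
    z_crit = norm.ppf(1 - alpha)
    t = info_fraction
    # Final statistic: Z(1) = sqrt(t)*Z(t) + increment, where the increment
    # is N(drift*(1 - t), 1 - t) under the assumed drift.
    mean_remaining = drift * (1 - t)
    sd_remaining = (1 - t) ** 0.5
    return 1 - norm.cdf((z_crit - z_interim * t ** 0.5 - mean_remaining) / sd_remaining)

# Hypothetical interim look: half the information observed, z = 1.8,
# drift chosen to match the originally powered alternative.
print(conditional_power(z_interim=1.8, info_fraction=0.5, drift=2.8))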
3. Computer Intensive and Re-randomization Tests in Clinical
Trials
Quite often clinicians are concerned about balancing important
covariates at baseline. Allocation methods designed to achieve
deliberate balance on baseline covariates, commonly called dynamic
allocation or minimization, are used for this purpose. This
non-standard allocation poses challenges for common statistical
analyses. In this talk I will examine the robustness of the level and
power of common tests with deliberately balanced assignments when the
assumed distribution of responses is not correct.
There are two methods of testing with such allocations: computer
intensive and model based. I will review some common mistaken
attitudes about the goals of randomization, and I will discuss some
simulations that attempt to explore the operating characteristics of
re-randomization and model-based analyses when model assumptions are
violated.
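As a concrete, deliberately simplified illustration of the
re-randomization idea, the sketch below repeatedly re-runs a
minimization-style allocation on the same covariate sequence and
recomputes the test statistic with the outcomes held fixed. The
allocation rule, test statistic, and simulated data are hypothetical
stand-ins, not the procedures studied in the talk.

# Hedged sketch of a re-randomization test under a minimization-style
# covariate-adaptive allocation; illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def minimization_assign(covariate, rng, p_best=0.8):
    """Sequentially assign arms 0/1 to balance a binary covariate,
    choosing the balancing arm with probability p_best."""
    arms = np.zeros(len(covariate), dtype=int)
    counts = np.zeros((2, 2))  # counts[arm, covariate level]
    for i, x in enumerate(covariate):
        imbalance = counts[1, x] - counts[0, x]
        best = 0 if imbalance > 0 else 1 if imbalance < 0 else rng.integers(2)
        arm = best if rng.random() < p_best else 1 - best
        arms[i] = arm
        counts[arm, x] += 1
    return arms

# Hypothetical trial data: binary baseline covariate, continuous outcome.
n = 100
covariate = rng.integers(0, 2, size=n)
arms_obs = minimization_assign(covariate, rng)
outcome = 0.3 * covariate + 0.4 * arms_obs + rng.normal(size=n)

def mean_diff(arms, y):
    return y[arms == 1].mean() - y[arms == 0].mean()

t_obs = mean_diff(arms_obs, outcome)
# Reference distribution: re-run the allocation algorithm many times on
# the same covariate sequence, keeping the outcomes fixed.
t_ref = np.array([mean_diff(minimization_assign(covariate, rng), outcome)
                  for _ in range(2000)])
p_value = np.mean(np.abs(t_ref) >= abs(t_obs))
print(f"observed difference {t_obs:.3f}, re-randomization p-value {p_value:.3f}")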
Click here
to see the slides for Dr. Soon's talk.
SPEAKER: Professor Lee K. Jones
Department of Mathematical Sciences, University of
Massachusetts Lowell
TITLE: On local minimax estimation with some
consequences for ridge regression,
tree learning and reproducing kernel methods
This talk is jointly sponsored with the Harmonic Analysis Seminar
this week.
TIME AND PLACE:
Thurs., March 9, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Local learning is the process of determining the value of an unknown
function at only one fixed query point based on information about the
values of the function at other points. We propose an optimal
methodology (local minimax estimation) for local learning of
functions with band-limited ranges which differs from (and is
demonstrated in many interesting cases to be superior to) several
popular local and global learning methods. In this theory the
objective is to minimize the (maximum) prediction error at the query
point only - rather than minimize some average performance over the
entire domain of the function. Since different compute-intensive
procedures are required for each different query, local learning
algorithms have only recently become feasible due to the advances in
computer availability, capability and parallelizability of the last
two decades.
In this talk we first apply local minimax estimation to linear
functions. A rotationally invariant approach yields ridge regression,
the ridge parameter and optimal finite sample error bounds. A scale
invariant approach similarly yields best error bounds but is
fundamentally different from either ridge or lasso regression. The
error bounds are given in a general form which is valid for
approximately linear target functions.
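Since the rotationally invariant approach yields ridge regression, a
minimal numerical reminder of the ridge estimator itself may help; the
sketch below is just the textbook estimator on simulated data, with the
ridge parameter fixed by hand rather than derived from Professor Jones's
local minimax theory.

# Textbook ridge regression on simulated data; the ridge parameter is
# chosen by hand, not by the local minimax construction from the talk.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def ridge(X, y, lam):
    """Solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in (0.0, 1.0, 10.0):
    beta_hat = ridge(X, y, lam)
    print(f"lambda={lam:5.1f}  estimation error={np.linalg.norm(beta_hat - beta_true):.3f}")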
Using these bounds an optimal local aggregate estimator is derived
from the trees in a Breiman (random) forest or a deterministic
forest. Finding the estimator requires the solution to a challenging
large dimensional non-differentiable convex optimization problem.
Some approximate solutions to the forest optimization are given for
classification using micro-array data.
Finally the theory is applied to reproducing kernel Hilbert space
and an improved Tikhonov estimator for probability of correct
classification is presented along with a proposal for local
determination of optimal kernel shape without cross validation.
To see a copy of the paper on which the talk is based, click
here.
SPEAKER: Professor Reza Modarres
George Washington University, Department of Statistics
TITLE: Upper Level Set Scan Statistic for
Detection of Disease and Crime Hotspots
TIME AND PLACE:
Thurs., March 16, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
The upper level set (ULS) scan statistic, its theory, implementation,
and extension to bivariate data are discussed. The ULS-Hotspot
algorithm that obtains the response rates, maintains a list of
connected components at each level of the rate function and yields
the ULS tree is described. The tree is grown in the immediate
successor list, which provides a computationally efficient method for
likelihood evaluation, visualization and storage. An example shows
how the zones are formed and the likelihood function is developed for
each candidate zone. Bivariate hotspot detection is discussed,
including the bivariate binomial model, the multivariate exceedance
approach, and the bivariate Poisson distribution. The Intersection
method is recommended as it is simple to implement, using univariate
hotspot detection methods. Applications to mapping of crime hotspots
and disease clusters are presented.
Joint work with G.P. Patil.
SPEAKER: Professor Robert Mislevy
Department of Educational Measurement & Statistics (EDMS),
UMCP
TITLE: A Bayesian perspective on structured
mixtures of IRT models: Interplay among psychology, evidentiary
arguments, probability-based reasoning
TIME AND PLACE:
Thurs., March 30, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: (Joint paper with Roy Levy, Marc Kroopnick,
and Daisy Wise, all of EDMS.)
Structured mixtures of item response theory (IRT) models are used in
educational assessment for so-called cognitive diagnosis, that is,
supporting inferences about the knowledge, procedures, and
strategies students use to solve problems. These models arise from
developments in cognitive psychology, task design, and psychometric
models. We trace their evolution from the perspective of Bayesian
inference, highlighting the interplay among scientific modeling,
evidentiary argument, and probability-based reasoning about
uncertainty.
This work draws in part on the first author's contributions to the
National Research Council's (2002) monograph, available online:
Knowing what students know, J. Pellegrino, N. Chudowsky, &
R. Glaser (Eds.), Washington, D.C.: National Academy Press.
On Friday, April 7, 2006,
JPSM is sponsoring a Distinguished Lecture:
SPEAKER: Nora Cate Schaeffer
TITLE: Conversational Practices with a Purpose:
Interaction within the Standardized Interview
TIME AND PLACE:
Friday, April 7, 2006, 3:30pm
Room 2205 Lefrak Hall
There will be a reception immediately afterwards.
ABSTRACT: The lecture will discuss interactions in survey
interviews and standardization as it is actually practiced. An early
view of the survey interview characterized it as a "conversation with
a purpose," and this view was later echoed in the description of
survey interviews as "conversations at random." In contrast to these
informal characterizations of the survey interview, stand the formal
rules and constraints of standardization as they have developed over
several decades. Someplace in between a "conversation with a purpose"
and a perfectly implemented standardized interview are the actual
practices of interviewers and respondents as they go about their
tasks. Most examinations of interaction in the survey interview have
used standardization as a starting point and focused on how
successfully standardization has been implemented, for example by
examining whether interviewers read questions as worded. However, as
researchers have looked more closely at what interviewers and
respondents do, they have described how the participants import into
the survey interview conversational practices learned in other
contexts. As such observations have accumulated, they provide a
vehicle for considering how conversational practices might support or
undermine the goals of measurement within the survey interview. Our
examination of recorded interviews from the Wisconsin Longitudinal
Study provides a set of observations to use in discussing the
relationship among interactional practices, standardization, and
measurement.
SPEAKER: Prof. Jiuzhou Song
Department of Animal Sciences, UMCP
TITLE: The Systematic Analysis for Temporal
Gene Expression Analysis
TIME AND PLACE:
Thurs., April 13, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In temporal gene expression analysis, we propose a strategy that
exploits gene and treatment effect information to build a synthetic
genetic network. Assuming that variations in gene expression are caused
by different conditions, we classify all experimental conditions into
several subgroups via a clustering analysis that groups conditions
based on the similarity of temporal gene expression profiles. This
procedure is useful because it allows us to combine more diverse gene
expression data sets as they become available. Setting a reference
gene, as we describe, places the genetic regulatory networks on a
concrete biological foundation. We also visualize the gene activation
process via starting and ending points, and combine all of this
information to describe genetic regulatory relationships and obtain a
consensus gene activation order. The estimation of activation points
and the building of a synthetic genetic network may yield important new
insights in the ongoing endeavor to understand the complex network of
gene regulation.
On Thursday, April 20, 2006,
4:15-6:45pm, there will be a Statistics Consortium
Sponsored Statistics Day event, involving a Distinguished Lecture and
a Discussion at Physics Building Room 1410.
DISTINGUISHED SPEAKER: Professor Peter
Bickel Statistics Department,
University of California, Berkeley
TITLE: Using Comparative Genomics to
Assess the Function of Noncoding Sequences
TIME AND PLACE:
Thursday, April 20, 2006, 4:15-6:00 pm
Room 1410, Physics Building
ABSTRACT: We have studied 2094 NCS of length
150-200bp from Edward Rubin's
laboratory. These sequences are conserved at high homology between
human, mouse, and fugu. Given the degree of homology with fugu, it
seems plausible that all or part of most of these sequences is
functional and, in fact, there is already some experimental validation
of this conjecture. Our goal is to construct predictors of regulation
(or potential irrelevance) by the NCS of nearby genes and further using
binding sites and the transcription factors that bind to them to deduce
some pathway information. One approach is to collect covariates such as
features of nearest genes, physical clustering indices, etc., and use
statistical methods to identify covariates, select among these for
importance, relate these to each other and use them to create stochastic
descriptions of the NCS which can be used for NCS clustering and NCS and
gene function prediction singly and jointly. Of particular importance so
far has been GO term annotation and tissue expression of downstream
genes as well as the presence of blocks of binding sites known from
the TRANSFAC database in some of the NCS. Our results so far are
consistent with those of recent papers engaged in related explorations
such as Woolfe et al (2004), Bejerano et al (2005) and others but also
suggest new conclusions of biological interest.
DISCUSSANT: Dr. Steven Salzberg
Director, Center for Bioinformatics and Computational Biology, and
Professor, Department of Computer Science, University of Maryland
The Lecture and Discussion will
be followed by a reception (6:00-6:45pm)
in the Rotunda of the Mathematics Building.
SPEAKER: Dr. Neal Jeffries
National Institute of Neurological Diseases and Stroke
TITLE: Multiple Comparisons Distortions of
Parameter Estimates
TIME AND PLACE:
Thurs., April 27, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
In experiments involving many variables investigators typically use
multiple comparisons procedures to determine differences that are
unlikely to be the result of chance. However, investigators rarely
consider how the magnitude of the greatest observed effect sizes may
have been subject to bias resulting from multiple testing. These
questions of bias become important to the extent investigators focus
on the magnitude of the observed effects. As an example, such bias
can lead to problems in attempting to validate results if a biased
effect size is used to power a follow-up study. Further, such factors
may give rise to conflicting findings in comparing two independent
samples -- e.g. the variables with strongest effects in one study may
predictably appear much less so in a second study. An associated
important consequence is that confidence intervals constructed using
standard distributions may be badly biased. A bootstrap approach is
used to estimate and correct the bias in the effect sizes of those
variables showing strongest differences. This bias is not always
present; some principles showing what factors may lead to greater
bias are given and a proof of the convergence of the bootstrap
distribution is provided.
Key words: Effect size, bootstrap, multiple comparisons
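The bootstrap idea in the abstract can be illustrated with a simple
winner's-curse correction: in each bootstrap resample, re-select the
top variable and record how much its bootstrap effect exceeds its
effect in the original sample, then subtract the average excess. This
is a generic sketch on simulated data, not necessarily the specific
estimator or convergence result discussed by Dr. Jeffries.

# Generic bootstrap correction for the selection bias ("winner's curse")
# in the largest observed effect among many variables; illustrative only.
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_var = 40, 200
true_effects = np.zeros(n_var)
true_effects[0] = 0.5                      # one modest real effect
data = true_effects + rng.normal(scale=1.0, size=(n_obs, n_var))

effects = data.mean(axis=0)                # observed effect sizes (means)
winner = int(np.argmax(effects))           # variable with the largest effect

# Bootstrap: re-select the winner in each resample and compare its
# resampled effect to its effect in the original data.
B, excess = 2000, []
for _ in range(B):
    idx = rng.integers(0, n_obs, size=n_obs)
    boot_effects = data[idx].mean(axis=0)
    w_b = int(np.argmax(boot_effects))
    excess.append(boot_effects[w_b] - effects[w_b])
bias = float(np.mean(excess))

print(f"naive estimate for winner: {effects[winner]:.3f}")
print(f"bias-corrected estimate  : {effects[winner] - bias:.3f}")
print(f"true effect of winner    : {true_effects[winner]:.3f}")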
SPEAKER: Professor Bing Li
Department of Statistics, Penn State University
TITLE: A Method for Sufficient
Dimension Reduction in Large-p-Small-n Regressions
TIME AND PLACE:
Thurs., May 4, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Large-p-small-n data, in which the number of recorded
variables (p) exceeds the number of independent observational units
(n), are becoming the norm in a variety of scientific fields. Sufficient
dimension reduction provides a meaningful and theoretically motivated
way to handle large-p-small-n regressions, by restricting
attention to d < n linear combinations of the original
p predictors. However, standard sufficient dimension reduction
techniques are themselves designed to work for n > p, because
they rely on the inversion of the predictor sample covariance
matrix. In this article we propose an iterative method that
eliminates the need for such inversion, using instead powers
of the covariance matrix. We illustrate our method with a genomics
application: the discrimination of human regulatory elements
from a background of "non-functional" DNA, based on their alignment
patterns with the genomes of other mammalian species. We also
investigate the performance of the iterative method by simulation,
obtaining excellent results when n < p or n ≈ p. We
speculate that powers of the covariance matrix may allow us to
effectively exploit available information on the predictor
structure in identifying directions relevant to the regression.
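To give a flavor of how powers of the covariance matrix can stand in
for an explicit inverse when p > n, the sketch below uses plain
Richardson iteration, which only ever multiplies by the sample
covariance. This is a generic numerical device under hypothetical data,
not the sufficient dimension reduction algorithm proposed in the talk.

# Generic illustration of replacing an explicit covariance inverse by an
# iteration built from matrix-vector products with the sample covariance
# (i.e., its powers); not the SDR algorithm from the talk.
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 100                              # large p, small n
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()
S = Xc.T @ Xc / n                           # singular sample covariance (rank <= n)
b = Xc.T @ yc / n                           # lies in the range of S

# Richardson iteration: x <- x + a*(b - S x), with a < 2 / lambda_max(S);
# starting from zero it converges to the minimum-norm solution of S x = b.
a = 1.0 / np.linalg.eigvalsh(S).max()
x = np.zeros(p)
for _ in range(5000):
    x = x + a * (b - S @ x)

x_pinv = np.linalg.pinv(S) @ b              # minimum-norm reference solution
print("difference from pseudo-inverse solution:", np.linalg.norm(x - x_pinv))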
SPEAKER: Professor Biao Zhang
Mathematics Department, University of Toledo
TITLE: Semiparametric ROC Curve Analysis
under Density Ratio Models
TIME AND PLACE:
Thurs., May 11, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Receiver operating characteristic (ROC) curves are commonly used to
measure the accuracy of diagnostic tests in discriminating disease and
nondisease. In this talk, we discuss semiparametric statistical
inferences for ROC curves under a density ratio model for disease and
nondisease densities. This model has a natural connection to the
logistic regression model. We explore semiparametric inference
procedures for the area under the ROC curve (AUC), semiparametric
kernel estimation of the ROC curve and its AUC, and comparison of the
accuracy of two diagnostic tests. We demonstrate that statistical
inferences based on a semiparametric density ratio model are more
robust than a fully parametric approach and are more efficient than a
fully nonparametric approach.
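As a small illustration of the connection between the density ratio
model and logistic regression mentioned above, the sketch below fits a
logistic regression to pooled diseased and nondiseased samples (which
estimates the tilt parameter of the density ratio) and computes the
empirical AUC. It is a toy benchmark on simulated data, not the
semiparametric estimators developed in the talk.

# Toy illustration of the density-ratio / logistic-regression connection
# and an empirical AUC; not the semiparametric estimators from the talk.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
x0 = rng.normal(loc=0.0, scale=1.0, size=300)     # nondiseased marker values
x1 = rng.normal(loc=1.0, scale=1.0, size=200)     # diseased marker values
# Here g1(x)/g0(x) = exp(-0.5 + x), so the density ratio model holds with
# a linear tilt; logistic regression on the pooled data estimates the slope.
x = np.concatenate([x0, x1]).reshape(-1, 1)
d = np.concatenate([np.zeros_like(x0), np.ones_like(x1)])

fit = LogisticRegression(C=1e6).fit(x, d)         # large C: nearly unpenalized MLE
print("estimated tilt (slope) parameter:", fit.coef_[0][0])

# Empirical AUC = P(marker of a diseased subject exceeds that of a
# nondiseased subject), i.e. the Mann-Whitney statistic.
auc = np.mean(x1[:, None] > x0[None, :]) + 0.5 * np.mean(x1[:, None] == x0[None, :])
print("empirical AUC:", auc)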
SPEAKER: Prof. Eric Slud
Mathematics Department, UMCP
TITLE: "General position" results on uniqueness of
optimal nonrandomized Group-sequential decision
procedures in Clinical Trials
TIME AND PLACE:
Thurs., Oct. 26, 2006, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk will first give some background
on group- or batch-sequential hypothesis tests for
treatment effectiveness in two-group clinical trials.
Such tests are based on a test statistic like the
logrank, repeatedly calculated at a finite number of
"interim looks" at the developing clinical trial
survival data, where the timing of each look can in
principle depend on all previously available data. The
focus of this talk will be on a decision-theoretic
formulation of the problem of designing such trials,
when, as is true in large trials, the data can be
viewed as observations of a Brownian motion with
drift, and the drift parameter quantifies the
difference in survival distributions between the
treatment and control groups. The new results
presented in the talk concern existence and
uniqueness of nonrandomized optimal designs, subject
to constraints on type I and II error probability,
under fairly general loss functions when the cost
functions are slightly perturbed, randomly, as
functions of time. The proof techniques are related
to old results on level-crossings for continuous
time random processes.
This work is joint with Eric Leifer, a UMCP PhD of
several years ago now at the Heart, Lung and Blood
Institute at NIH.
To see a copy of the slides for the talk, click here.
SPEAKER: Prof. Ross Pinsky
Mathematics Department, Technion, Israel
TITLE: Law of Large Numbers for Increasing
Subsequences of Random Permutations
TIME AND PLACE:
Tues., August 23, 2005, 2pm
Room 1313, Math Bldg
ABSTRACT: click here.
SPEAKER: Prof. Paul Smith
Statistics Program, Mathematics Department, UMCP
TITLE: Statistical Analysis of Ultrasound
Images of Tongue Contours
during Speech
TIME AND PLACE:
Thurs., September 15, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: The shape and movement of the tongue are critical
in the formation of human speech. Modern imaging techniques allow
scientists to study tongue shape and movement without interfering with
speech. This presentation describes statistical issues arising from
ultrasound imaging of tongue contour data.
There are many sources of variability in tongue image data,
including speaker to speaker differences, intraspeaker differences, noise
in the images, and other measurement problems. To make matters worse, the
tongue is supported entirely by soft tissue, so no fixed co-ordinate
system is available. Statistical methods to deal with these problems are
presented.
The goal of the research is to associate tongue shapes and sound
production. Principal component analysis is used to reduce the dimensionality of the contours.
Combinations of two basic shapes accurately represent tongue contours.
The results are physiologically meaningful and correspond well to actual
speech activity. The methods are applied to a sample of 16 subjects, each
producing four vowel sounds. It was found that principal components clearly
distinguish vowels based on tongue contours.
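A minimal sketch of the principal component step described above: each
contour is stored as a vector of heights at fixed positions, and the
leading components are extracted with an SVD. The simulated contours
and the choice of two components are hypothetical, chosen only to
mirror the two-basic-shapes finding reported in the abstract.

# Minimal PCA of simulated "contours": rows are curves sampled at fixed
# positions; the two-component reconstruction mirrors the abstract's
# finding that two basic shapes capture most of the variation.
import numpy as np

rng = np.random.default_rng(5)
n_curves, n_points = 64, 40
t = np.linspace(0, np.pi, n_points)
# Each simulated contour is a random mix of two smooth basic shapes plus noise.
scores_true = rng.normal(size=(n_curves, 2))
basis = np.vstack([np.sin(t), np.sin(2 * t)])
contours = scores_true @ basis + 0.1 * rng.normal(size=(n_curves, n_points))

centered = contours - contours.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained by first two components:", explained[:2].sum())

# Reconstruct the contours from the first two principal components.
k = 2
recon = contours.mean(axis=0) + (U[:, :k] * s[:k]) @ Vt[:k]
print("mean squared reconstruction error:", np.mean((recon - contours) ** 2))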
We also investigate whether speakers fall into distinct groups on the
basis of their tongue contours. Cluster analysis is used to identify
possible groupings, but many variants of this technique are possible
and the results are sometimes conflicting. Methods to compare
multiple cluster analyses are suggested and applied to the tongue
contour data to assess the meaning of apparent speaker clusters.
SPEAKER: Prof. Benjamin Kedem
Statistics Program, Mathematics Department, UMCP
TITLE: A Semiparametric Approach to Time Series
Prediction
TIME AND PLACE:
Thurs., September 22, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Given m time series regression models, linear or
not, with additive noise components, it is shown how to estimate the
predictive probability distribution of all the time series conditional
on the observed and covariate data at the time of prediction. This is
done by a certain synergy argument, assuming that the distributions of
the noise components associated with the regression models are tilted
versions of a reference distribution. Point predictors are obtained
from the predictive distribution as a byproduct. An application to US
mortality rates prediction will be discussed.
A former student of our Statistics
Program, Dean Foster of the Statistics
Department at the Wharton School, University of Pennsylvania, will
be visiting the Business School on Friday
9/23/05 and giving a seminar entitled
"Learning Nash equilibria via public
calibration" from 3-4:15 pm in Van
Munching Hall Rm 1206.
You can see an abstract of the talk by clicking here.
SPEAKER: Professor Steven Martin
Department of Sociology, University of Maryland College Park
TITLE: Reassessing delayed and forgone marriage in
the United States
TIME AND PLACE:
Wed., September 28, 2005, 3:30pm
Room 1313, Math Bldg
NOTE
UNUSUAL TIME !
ABSTRACT: Do recent decreases in marriage rates mean
that more women are forgoing marriage, or that women are simply
marrying at later ages? Recently published demographic projections
from standard nuptiality models suggest that changes in marriage rates
have different implications for women of different social classes,
producing an "education crossover" in which four-year college graduate
women have become more likely to marry than other women in the US,
instead of less likely, as has been the case for at least a century.
To test these findings, I develop a new projection technique that
predicts the proportion of women marrying by age 45 under flexible
assumptions about trends in age-specific marriage rates and effects of
unmeasured heterogeneity. Results from the 1996 and 2001 Surveys of
Income and Program Participation suggest that the "crossover" in
marriage by educational attainment is either not happening or is
taking much longer than predicted. Also, recent trends are broadly
consistent with an ongoing slow decline in proportions of women ever
marrying, although that decline is less pronounced in the last decade
than in previous decades.
SPEAKER: Professor Rick Valliant
Joint Program in Survey Methodology, Univ. of Michigan &
UMCP
TITLE: Balanced Sampling with Applications to Accounting
Populations
TIME AND PLACE:
Thurs., October 6, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Weighted balanced sampling is a way of restricting the configuration of
sample units that can be selected from a finite population. This
method can be extremely efficient under certain types of structural
models that are reasonable in some accounting problems. We review
theoretical results that support weighted balancing, compare different
methods of selecting weighted balanced samples, and give some
practical examples. Where appropriate, balancing can meet precision
goals with small samples and can be robust to some types of model
misspecification. The variance that can be achieved is closely
related to the Godambe-Joshi lower bound from design-based theory.
One of the methods of selecting these samples is restricted
randomization in which "off-balance" samples are rejected if selected.
Another is deep stratification in which strata are formed based on a
function of a single auxiliary and one or two units are selected with
equal probability from each stratum. For both methods, inclusion
probabilities can be computed and design-based inference done if
desired.
Simulation results will be presented to compare results from balanced
samples with ones selected in more traditional ways.
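The restricted randomization device mentioned above, rejecting
"off-balance" samples, can be sketched in a few lines: draw simple
random samples and keep the first one whose sample mean of the
auxiliary variable is close enough to the population mean. The
population, tolerance, and balance criterion below are hypothetical,
and the sketch ignores the weighting used in weighted balanced sampling.

# Sketch of restricted randomization: reject simple random samples that
# are "off balance" on an auxiliary variable. Hypothetical population and
# tolerance; real weighted balanced sampling uses a weighted criterion.
import numpy as np

rng = np.random.default_rng(6)
N, n = 5000, 100
aux = rng.lognormal(mean=5, sigma=1, size=N)      # skewed auxiliary (e.g. book value)
target = aux.mean()
tol = 0.02 * target                               # accept within 2% of the population mean

draws = 0
while True:
    draws += 1
    sample = rng.choice(N, size=n, replace=False)
    if abs(aux[sample].mean() - target) < tol:
        break

print(f"accepted a balanced sample after {draws} draws")
print(f"sample mean {aux[sample].mean():.1f} vs population mean {target:.1f}")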
SPEAKER: Professor Wolfgang Jank
Department of Decision & Information Technologies
The Robert H. Smith School of Business, UMCP
TITLE: Stochastic Variants of EM:
Monte Carlo, Quasi-Monte Carlo, and More
TIME AND PLACE:
Thurs., October 20, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
We review recent advances in stochastic implementations of the EM
algorithm. We review the Ascent-based Monte Carlo EM algorithm, a new
automated version of Monte Carlo EM based on EM's likelihood ascent
property. We discuss more efficient implementations via quasi-Monte
Carlo sampling. We also revisit a new implementation of the older
stochastic approximation version of EM. We illustrate some of the
methods on a geostatistical model of online purchases.
The slides for Professor Jank's presentation are linked
here.
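For readers who have not seen a Monte Carlo EM step, the sketch below
applies it to a two-component normal mixture: the E-step expectation
over the latent labels is replaced by an average over simulated label
draws, and the M-step updates the parameters from those draws. It is a
bare-bones illustration on simulated data, not the Ascent-based or
quasi-Monte Carlo implementations discussed in the talk.

# Bare-bones Monte Carlo EM for a two-component normal mixture with known
# unit variances: the E-step is approximated by simulating latent labels.
import numpy as np

rng = np.random.default_rng(7)
n = 500
labels_true = rng.random(n) < 0.4
data = np.where(labels_true, rng.normal(2.0, 1.0, n), rng.normal(-1.0, 1.0, n))

pi, mu1, mu2 = 0.5, 1.0, -2.0                     # crude starting values
M = 200                                           # Monte Carlo draws per E-step
for _ in range(50):
    # E-step (Monte Carlo): posterior probability of component 1, then
    # simulate M label vectors and average their sufficient statistics.
    w1 = pi * np.exp(-0.5 * (data - mu1) ** 2)
    w2 = (1 - pi) * np.exp(-0.5 * (data - mu2) ** 2)
    post = w1 / (w1 + w2)
    z = rng.random((M, n)) < post                 # simulated component-1 indicators
    # M-step: maximize the Monte Carlo average of the complete-data log-likelihood.
    n1 = z.mean(axis=0)
    pi = n1.mean()
    mu1 = np.sum(n1 * data) / np.sum(n1)
    mu2 = np.sum((1 - n1) * data) / np.sum(1 - n1)

print(f"estimated pi={pi:.2f}, mu1={mu1:.2f}, mu2={mu2:.2f}")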
SPEAKER: Professor Ciprian Crainiceanu
Johns Hopkins Biostatistics Department, School of Public Health
TITLE: Structured Estimation under Adjustment
Uncertainty
TIME AND PLACE:
Thurs., October 27, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT:
Population health research is increasingly focused on identifying
small risks by use of large databases containing millions of
observations and hundreds or thousands of covariates. As a result,
there is an increasing need to develop statistical methods to
estimate these risks and properly account for all their sources of
uncertainty. An example is the estimation of the health effects
associated with short-term exposure to air pollution, where the
goal is to estimate the association between daily changes in
ambient levels of air pollution and daily changes in the number of
deaths or hospital admissions accounting for many confounders,
such as other pollutants, weather, seasonality, and influenza
epidemics.
Regression models are commonly used to estimate the effect of an
exposure on an outcome, while controlling for confounders. The
selection of confounders and of their functional form generally
affects the exposure effect estimate. In practice, there is often
substantial uncertainty about this selection, which we define here
as "adjustment uncertainty".
In this paper, we propose a general statistical framework to
account for adjustment uncertainty in risk estimation called
"Structured Estimation under Adjustment Uncertainty (STEADy)". We
consider the situation in which a rich set of potential
confounders is available and there exists a model such that every
model nesting it provides the correctly adjusted exposure effect
estimate. Our approach is based on a structured search of the
model space that sequentially identifies among all the potential
confounders the ones that are good predictors of the exposure and
of the outcome, respectively.
Through theoretical results and simulation studies, we compare
``adjustment uncertainty" implemented with STEADy versus ``model
uncertainty" implemented with Bayesian Model Averaging (BMA) for
exposure effect estimation. We found that BMA, by averaging
parameter estimates adjusted by different sets of confounders,
estimates a quantity that is not the scientific focus of the
investigation and can over- or underestimate statistical
variability. Another potential limitation of BMA in this context
is the strong dependence of posterior model probabilities on prior
distributions. We show that using the BIC approximation of
posterior model probabilities favors models more parsimonious than
the true model, and that BIC is not consistent under assumptions
relevant for moderate size signals.
Finally we apply our methods to time series data on air pollution
and health to estimate health risks accounting for adjustment
uncertainty. We also compare our results with a BMA analysis of
the same data set. The open source R package STEADy
implementing this methodology for Generalized Linear Models
(GLMs) will be available at the R
website.
You can see the paper on which this talk is based, here .
No Seminar Thursday 11/3. But NOTE
special seminar at unusual time on Monday 11/7, below.
SPEAKER: Professor Lise Getoor
Department of Computer Science, UMCP
TITLE: Learning Statistical Models from
Relational Data
TIME AND PLACE:
Mon., November 7, 2005, 4-5pm
Room 1313, Math Bldg
NOTE
UNUSUAL TIME !
ABSTRACT:
A large portion of real-world data is stored in commercial relational
database systems. In contrast, most statistical learning methods work
only with "flat" data representations. Thus, to apply these methods, we
are forced to convert the data into a flat form, thereby losing much of
the relational structure present in the data and potentially introducing
statistical skew. These drawbacks severely limit the ability of current
methods to mine relational databases.
In this talk I will review recent work on probabilistic models,
including Bayesian networks (BNs) and Markov Networks (MNs) and their
relational counterparts, Probabilistic Relational Models (PRMs) and
Relational Markov Networks (RMNs). I'll briefly describe the
development of techniques for automatically inducing PRMs directly
from structured data stored in a relational or object-oriented
database. These algorithms provide the necessary tools to discover
patterns in structured data, and provide new techniques for mining
relational data. As we go along, I'll present experimental results in
several domains, including a biological domain describing tuberculosis
epidemiology, a database of scientific paper author and citation
information, and Web data.
Power-point slides for an extended tutorial
related to Professor Getoor's talk can be found here
. Additional related research can be found at her home-page.
SPEAKER: Professor Victor de Oliveira
Department of Mathematical Sciences, University of Arkansas
TITLE: Bayesian Analysis of Spatial Data:
Some Theoretical Issues and Applications in the Earth
Sciences
TIME AND PLACE:
Thurs., November 10, 2005, 4:00pm
Room 3206, Math Bldg
NOTE change to unusual 4-5pm
time-slot and unusual location!!
ABSTRACT: Random fields are useful mathematical tools for
modeling spatially varying phenomena. This talk will focus on
Bayesian analysis of geostatistical data based on Gaussian random
fields (or models derived from these), which have been extensively
used for the modeling and analysis of spatial data in most earth
sciences, and are usually the default model (possibly after a
transformation of the data).
The Bayesian approach for the analysis of spatial data has seen
in recent years an upsurge in interest and popularity, mainly
due to the fact that it is particularly well suited for
inferential problems that involve prediction.
Yet, implementation of the Bayesian approach faces several
methodological and computational challenges, most notably:
(1) The likelihood behavior of covariance parameters is not
well understood, with the possibility of ill-behaved likelihoods.
In addition, there is a lack of automatic or default prior
distributions, such as Jeffreys and reference priors, for the
parameters of these models.
(2) There are substantial computational difficulties for the
implementation of Markov chain Monte Carlo methods required
for carrying out Bayesian inference and prediction based on
moderate or large spatial datasets.
This talk presents recent advances in the formulation of
default prior distributions as well as some properties,
Bayesian and frequentist, of inferences based on these priors.
We illustrate some of the issues and problems involved using
simulated data, and apply the methods for the solution of
several inferential problems based on two spatial datasets:
one dealing with pollution by nitrogen in the Chesapeake bay,
and the other dealing with depths of a geologic horizon based
on censored data.
If time permits, a new computational algorithm is described that
can substantially reduce the computational burden mentioned in (2).
Finally, we describe some challenges and open problems whose
solution would make the Bayesian approach more appealing.
NO STATISTICS SEMINAR Thursday, November
17, 2005.
BUT NOTE THAT ON FRIDAY, NOVEMBER 18, 2005, THERE IS A PAIR OF TALKS
in the Distinguished Lecture Series at the University of Maryland
co-sponsored by the Joint Program in Survey Methodology and the
University of Maryland Statistics Consortium.
The first talk is by Alastair Scott, titled "The
Design and Analysis of Retrospective Health Surveys." The second,
titled "The Interplay Between Sample Survey Theory and Practice: An
Appraisal," is by J. N. K. Rao. Click
here for additional details about the speakers and
talks.
Dr. Scott's talk will begin
at 1:00 pm and will be discussed by Barry Graubard from the
National Cancer Institute and Graham Kalton from Westat and
JPSM.
Dr. Rao's talk will begin at 3:00 pm and will be
discussed by Phil Kott from the National
Agricultural Statistics Service and Mike Brick from Westat and
JPSM.
Both talks will be held in 2205 LeFrak Hall.
There will be a reception immediately afterwards at 4:45.
SPEAKER: Professor Michael Cummings
Center for Bioinformatics and Computational Biology, UMCP
TITLE: Analysis of Genotype-Phenotype
Relationships: Machine Learning/Statistical Methods
TIME AND PLACE:
Thurs., December 8, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: Understanding the relationship of genotype to
phenotype is a fundamental problem in modern genetics research.
However, significant analytical challenges exist in the study of
genotype-phenotype relationships. These challenges include genotype
data in the form of unordered categorical values (e.g., nucleotides,
amino acids, SNPs), numerous levels of variables, mixture of variable
types (categorical and numerical), and potential for non-additive
interactions between variables (epistasis). These challenges can be
dealt with through use of machine learning/statistical approaches such
as tree-based statistical models and random forests. These methods
recursively partition a data set in two (binary split) based on values
of a single predictor variable to best achieve homogeneous subsets of
a categorical response variable (classification) or to best separate
low and high values of a continuous response variable (regression).
These methods are very well suited for the analysis of
genotype-phenotype relationships and have been shown to provide
outstanding results. Examples to be presented include identifying
amino acids important in spectral tuning in color vision and
nucleotide sequence changes important in some growth characteristics
in maize.
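As a pointer to how the tree-based methods described above are applied
in practice, the sketch below fits a random forest to simulated
genotype data with one-hot-encoded categorical predictors and reads off
variable importances. The simulated SNP data and interaction effect are
hypothetical, and this is scikit-learn's generic random forest rather
than the specific analyses presented in the talk.

# Generic random-forest analysis of simulated genotype (SNP) data with a
# non-additive (epistatic) effect; uses scikit-learn's RandomForestClassifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(8)
n, n_snps = 400, 30
genotypes = rng.integers(0, 3, size=(n, n_snps))          # 0/1/2 allele counts
# Phenotype depends on an interaction between SNP 0 and SNP 1 (epistasis).
risk = (genotypes[:, 0] > 0) & (genotypes[:, 1] > 0)
phenotype = (rng.random(n) < np.where(risk, 0.8, 0.2)).astype(int)

# One-hot encode the unordered categorical genotypes (3 levels per SNP).
encoder = OneHotEncoder(categories=[[0, 1, 2]] * n_snps)
X = encoder.fit_transform(genotypes).toarray()
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, phenotype)

# Aggregate one-hot importances back to the SNP level (3 columns per SNP).
importance = forest.feature_importances_.reshape(n_snps, 3).sum(axis=1)
print("top SNPs by importance:", np.argsort(importance)[::-1][:5])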
SPEAKER: Dr. Myron Katzoff
National Center for Health Statistics/ Centers for Disease Control
TITLE: Statistical Methods for Decontamination
Sampling
TIME AND PLACE:
Thurs., December 15, 2005, 3:30pm
Room 1313, Math Bldg
ABSTRACT: This talk will be about an adaptive sampling
procedure applicable to microparticle removal and a methodology for
validating a computational fluid dynamics (CFD) model which, it is
believed, will be useful in refining such a procedure. The adaptive
sampling procedure has many features in common with current field
practices; its importance is that it would enable valid statistical
inferences. The methodology for CFD model validation which is
described employs statistical techniques used in the frequency domain
analysis of spatio-temporal data. Seminar attendees will be encouraged
to contribute their thoughts on alternative proposals for analyses of
experimental data for CFD model validation.
Slides from the talk can be viewed here.