March: Comments and Questions

Page history last edited by bob pruzek 13 years, 1 month ago

Please post your comments and questions for March here.

 

Comments (72)

Samira said

at 12:49 pm on Apr 2, 2011

I'm having some trouble with the function corresp() in the MASS library. I have a contingency table which is of class "table" rather than "data.frame" or some other form that corresp() will accept. I am having no luck trying to convert the table to a matrix or data.frame and then running corresp(). I tried b=as.data.frame(x) and then corresp(b) and received Error in x%%1 : non-numeric argument to binary operator. (Along with a host of other trial-and-error methods.) I don't know which argument it considers non-numeric; maybe the row titles? It looks like a contingency table (just like the caith data)! Any suggestions?






bob pruzek said

at 3:49 pm on Apr 2, 2011

Samira, try this: first, convert the table to a matrix, e.g. xxm=matrix(xx,ncol=p) for xx the table; then xxdf=data.frame(xxm).
It works for me here. (I often find that data.frame works better than as.data.frame; I am not sure why.) Let me know how this works. b
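
In case it helps anyone else reading later, here is a minimal sketch of those two steps, using a "table"-class copy of caith as a stand-in for Samira's object (the names xx, xxm and xxdf follow bob's comment):

library(MASS)
xx   <- as.table(as.matrix(caith))                     # stand-in for a "table"-class object
p    <- ncol(xx)
xxm  <- matrix(xx, ncol = p, dimnames = dimnames(xx))  # table -> plain numeric matrix
xxdf <- data.frame(xxm)                                # bob's second step
corresp(xxm)                                           # the matrix form works...
corresp(xxdf)                                          # ...and so does the data frame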

Samira said

at 2:34 pm on Apr 3, 2011

This works. Thank you. Very happy now.

Andrey Avakov said

at 4:11 pm on Apr 3, 2011

Professor, to use the corresp() function does the data need to be in table format? Is it possible to analyze a 4-dimensional array with corresp()? Thank you.

bob pruzek said

at 8:59 pm on Apr 3, 2011

Andrey, Did you check the help file? It says, about the input, "x, formula The function is generic, accepting various forms of the principal argument for specifying a two-way frequency table. Currently accepted forms are matrices, data frames (coerced to frequency tables), objects of class "xtabs" and formulae of the form ~ F1 + F2, where F1 and F2 are factors." So it is clearly intended ONLY for 2-way arrays, usually called cross-tabs or contingency tables. Did you try it? b
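
For Andrey's 4-way case, a hedged illustration of the 2-way restriction: a higher-way array would first be collapsed to a 2-way margin (HairEyeColor below is just a built-in 3-way example, not course data):

library(MASS)
corresp(caith)                               # a 2-way table: accepted directly
hec2 <- margin.table(HairEyeColor, c(1, 2))  # collapse a 3-way array to Hair x Eye
corresp(as.matrix(hec2))                     # as.matrix, since a plain "table" object may not be accepted directly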

Richard said

at 3:25 pm on Apr 9, 2011

I am having some trouble understanding the 'absolute standardized covariate effect sizes w/ and w/out PS adjustment' graph. How is the red line being 'adjusted'?
What does an individual effect size for one variable mean? Is it like a regression coefficient, where an increase in that variable increases the outcome by that much?
Finally, how do similar effect sizes indicate balance?
Any help would be appreciated.

Matthew Swahn said

at 5:09 pm on Apr 9, 2011

The individual effect size is the effect size for that variable between the two groups: (sample mean x1 - sample mean x2)/(pooled standard deviation). In the context of binary covariates, I think the mean is just the proportion of people with the characteristic? A small effect size (in magnitude) means that the covariates are relatively

Samira said

at 9:46 am on Apr 10, 2011

Hi,
I was not able to be in class on Thursday. Is there an assignment that is due tomorrow?

bob pruzek said

at 11:12 pm on Apr 10, 2011

Richard, Matthew's response is OK, but it should have ended w/ 'relatively well balanced'. The denominator of the ES is the same for the unadjusted and adjusted ES's, which means that comparing the two amounts to comparing simple mean differences w/ averages (across strata) of mean differences -- in both cases scaled by the relevant covariate's (pooled) s.d.
The assignment involves starting to review for Thursday's test, and reading (most of) the long pdf on missing data that I am posting now. b
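
For anyone reviewing this later, a tiny numeric sketch of that comparison (all data and names here are made up; stratifying on x itself simply stands in for propensity-score strata):

set.seed(2)
x    <- rnorm(500)                           # one covariate
trt  <- rbinom(500, 1, plogis(x))            # treatment related to the covariate
strt <- cut(x, quantile(x, 0:5/5), include.lowest = TRUE)   # five "PS" strata
sd.pool <- sqrt((var(x[trt == 1]) + var(x[trt == 0])) / 2)  # pooled covariate s.d.
es.unadj <- (mean(x[trt == 1]) - mean(x[trt == 0])) / sd.pool
strata.diffs <- tapply(seq_along(x), strt, function(i)
                  mean(x[i][trt[i] == 1]) - mean(x[i][trt[i] == 0]))
es.adj <- mean(strata.diffs) / sd.pool       # average (across strata) of mean differences
c(unadjusted = es.unadj, adjusted = es.adj)  # the adjusted ES should be much closer to 0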

Manchun Chang said

at 9:05 am on Apr 13, 2011

Dr. Pruzek,

Just wondering if you received my e-mail about the syllabus question. I re-sent it yesterday afternoon but still haven't gotten your feedback.

bob pruzek said

at 9:12 am on Apr 13, 2011

No, Manchun, I did not get it. Please send it to rmpruzek@yahoo.com. Same for anyone else who did not get a response.... bob

Manchun Chang said

at 10:45 am on Apr 13, 2011

I just sent the document from another e-mail.... Hope it gets through this time.

Samira said

at 10:01 am on Apr 13, 2011

Will we be able to use our notes during the test on Thursday?

Yi Lu said

at 10:53 am on Apr 13, 2011

Dr. Pruzek,

I didn't get any feedback about the syllabus questions that I sent last Saturday. I have already tried three times. Please let me know if you get it. Thank you so much!

bob pruzek said

at 1:37 pm on Apr 13, 2011

For your information, I DID NOT GET your emails w/ your answers to questions (Manchun and Yi) before today. Now that I have, I have edited them and sent those edits back to you. Let me know if you do not receive them.
Samira, and everyone: yes, as I said several times in class, all my tests are open book and open notes. But do not bring a computer to class, as tests are not 'open computer'. bob

Jamie Kammer said

at 4:31 pm on Apr 13, 2011

Is missing data going to be on the exam?

Marisa Reuber said

at 5:47 pm on Apr 13, 2011

We are unclear about how to interpret the vector(s) of weights that are converted into quantitative composite variables.

Samira said

at 6:45 pm on Apr 13, 2011

I know that ATE is the Average Treatment Effect and ATT is the Average Effect of the Treatment on the Treated. How do you compute these? Is ATT the effect for those in the treatment group and ATE the effect for all individuals? I read the Elizabeth Stuart paper and she mentions weighting subclass estimates, but I still don't understand.

bob pruzek said

at 8:56 pm on Apr 13, 2011

Marisa,
Please provide a context for all questions. I will presume that you mean the vectors of weights obtained in a correspondence analysis (but tell me if that is wrong). Suppose there are four categories (as for the rows of the caith data); then think about the indicator matrix w/ four columns for this categorical variable, and form a composite of those columns using the four corresp-derived weights for these four categories. That is, if the weights are a, b, c and d (generally some negative, some positive), you would take the columns x1,...,x4 and compute a*x1 + b*x2 + c*x3 + d*x4 to get a composite, say x.comp. If the same were done with the indicator matrix for the COLUMN categories of the initial cross-tab, you might call the result y.comp. Having used these particular (canonical) weights, the correlation between x.comp and y.comp is now as large as it could possibly be for any one pair of composite variables. The categorical variables have in this case led to two quantitative composites. bob
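
To see the arithmetic, here is a hedged R sketch of that composite construction with the caith data (the names eye, hair, Ind.eye, etc. are illustrative, and the signs of corresp() scores are arbitrary):

library(MASS)
ca <- corresp(caith)                          # first canonical dimension
long <- as.data.frame(as.table(as.matrix(caith)))        # one row per cell, with Freq
long <- long[rep(seq_len(nrow(long)), long$Freq), 1:2]   # expand to one row per person
names(long) <- c("eye", "hair")
Ind.eye  <- model.matrix(~ eye - 1, data = long)   # indicator matrix, row categories
Ind.hair <- model.matrix(~ hair - 1, data = long)  # indicator matrix, column categories
x.comp <- Ind.eye  %*% ca$rscore              # a*x1 + b*x2 + c*x3 + d*x4
y.comp <- Ind.hair %*% ca$cscore              # same idea for the column categories
cor(x.comp, y.comp)                           # matches ca$cor (up to sign), the canonical correlation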

bob pruzek said

at 8:58 pm on Apr 13, 2011

Samira, let's forget about ATT for this test, or until it can be discussed in class. ATE is what you learned about when the loess.psa or circ.psa functions were run. I went over their computation in detail in the handouts for the birthwt example, so the computation is fully illustrated there. Ask further if you don't understand what you see there. bob
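
For anyone reviewing, a minimal sketch of the strata-weighted ATE computation in generic form (this is NOT the circ.psa code from the birthwt handout; the function and variable names are illustrative):

# y = outcome, trt = 0/1 treatment indicator, stratum = propensity-score strata
ate.by.strata <- function(y, trt, stratum) {
  diffs <- tapply(seq_along(y), stratum, function(i)
             mean(y[i][trt[i] == 1]) - mean(y[i][trt[i] == 0]))  # per-stratum mean differences
  wts <- table(stratum) / length(y)          # stratum proportions as weights
  sum(diffs * wts)                           # weighted average = ATE estimate
}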

bob pruzek said

at 9:19 pm on Apr 13, 2011

One more thing: Jamie, I don't know where I answered this question before, but just in case anyone is in doubt, missing data will not be on the exam tomorrow. bp

Richard said

at 3:14 pm on Apr 29, 2011

Professor,
Will this exam be cumulative?
If so, which topics should we focus on?

bob pruzek said

at 10:20 pm on Apr 29, 2011

The exam will be cumulative, but the emphasis will be on topics we've covered near the end. See the syllabus for the topics we've covered and for examples of questions you should be ready to address. If you want, write answers to some of those questions HERE and I will publicly edit them if you get them in by, say, 4 pm on Tuesday. In order not to be overwhelmed, however, I'll limit this to the first five or six questions that you, the class, post. This puts a premium on posting sooner rather than later. The rest we can of course cover in the review class on Tuesday. b

Richard said

at 7:16 pm on May 2, 2011

Q: Identify two key challenges that often arise in dealing with missing data. Elaborate.
Do you mean things such as: methods for dealing with missing data often assume MAR? Or things such as: many statistical methods were not developed to handle missing data? I don't understand exactly what this question is asking.

bob pruzek said

at 9:26 pm on May 2, 2011

I mean this: one challenge has to do w/ deleting cases where 'some' values are missing -- which cases to delete, and which not? The other concerns how the missing values (those that remain) are to be estimated: what different methods might be used, what assumptions do these different methods make, and what 'seems most reasonable' in particular contexts? You are not asked to answer such questions; as you see, you are asked to elaborate on what these challenges are, and how they might be expected to play out in data analysis practice. We'll speak about it more in class if you ask. bp

Matthew Swahn said

at 6:21 pm on May 5, 2011

A few of us are studying, and discussing how the standardized effect size relates to confidence intervals (a question you mentioned in class). Here's what we have so far: if we take the standardized effect size between two means and multiply it by sqrt(n) (n being the total sample size), we get a t statistic. From here, we can construct a confidence interval. As an example, P( -2 < (effect size) * sqrt(n) < 2 ) ~= 95%. Is this a satisfactory answer? We are curious about what relationship you are referring to between the standardized ES and confidence intervals.

bob pruzek said

at 10:38 pm on May 5, 2011

To the few of you, and those who look on,
The problem is not so much the algebra; it is the understanding of this relationship. To wit: what is the denominator of the t (for a two-independent-sample comparison)? Write the algebra. Once you have that, you have the standard error (estimate) of the sampling distribution of the difference between the two independent means. Then ask: what is the denominator of the ES in this case? How does the standard error relate algebraically to the latter denominator? The answer, with a bit of elaboration, gives the connection between the st. ES and the t. Discuss these two ideas. Use examples (in R?) to make the ideas explicit. I'd like students to be able to reason about such matters, and this means especially to interpret relationships of the kind we are talking about here. HTH, bp

Chuck Yang said

at 11:46 am on May 6, 2011

The algebra for the denominator of the t is sqrt(1/n1 + 1/n2) * S.D.(pooled); the standard error (SE) algebra is sqrt(s1^2/n1 + s2^2/n2). When n1 and n2 are large enough, the two expressions can be seen to be approximately the same. Using the standard error, we can calculate approximate confidence intervals for the mean. For instance, if we want the 95% C.I., we use the following to get the upper and lower bounds:

Upper 95% Limit = Xbar + 1.96*SE
Lower 95% Limit = Xbar - 1.96*SE

On the other hand, the denominator of the ES is just S.D.(pooled); therefore, by multiplying the st. ES by 1/sqrt(1/n1 + 1/n2), we get the t and can subsequently get the C.I.

Am I on the right track here?
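
A quick numerical check of that algebra (toy data, equal-variance two-sample t; object names are illustrative):

set.seed(1)
g1 <- rnorm(20, mean = 10, sd = 2)
g2 <- rnorm(25, mean = 12, sd = 2)
n1 <- length(g1); n2 <- length(g2)
sp <- sqrt(((n1 - 1)*var(g1) + (n2 - 1)*var(g2)) / (n1 + n2 - 2))   # pooled s.d.
d  <- (mean(g1) - mean(g2)) / sp                 # standardized effect size
d / sqrt(1/n1 + 1/n2)                            # ES divided by sqrt(1/n1 + 1/n2) ...
t.test(g1, g2, var.equal = TRUE)$statistic       # ... equals the two-sample t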

Chuck Yang said

at 11:13 am on May 6, 2011

I was reviewing the "MoreComparingGroupsBotht&Anova.pdf" document when I got puzzled about how the non-parametric transformation was applied to the original data. It is my understanding that in the NP approach we pool all the scores, rank them, keep track of their groups, and then use parametric methods of analysis on the ranked data. I tried to replicate this, but I can't get exactly the same dataset as shown in the middle of the second page of the document. How exactly did we get from the original data frame:
[1,] 26 21 22 26 19 22 26 25 24 21 23 23 18 29 22
[2,] 18 23 21 20 20 29 20 16 20 26 21 25 17 18 19

to the ranked data below?
[1,] 25.0 21.3 22.0 25.7 18.7 21.7 25.3 24.3 23.7 20.7 22.7 23.0 17.7 26.0 22.3
[2,] 18.0 23.3 21.0 20.0 19.3 26.3 19.0 16.7 19.7 24.7 20.3 24.0 17.0 17.3 18.3

All help is appreciated.

bob pruzek said

at 3:50 pm on May 6, 2011

Chuck, that any careful reader might be puzzled is understandable. First, I jittered the scores (since there are tied values); then I transformed the ranks to a scale w/ the same median and spread as the initial scale. The latter step puts the scores in the same metric as the original data while keeping them linearly related to the ranks themselves. If you look at the t or F statistic for comparing the two groups you should get approximately the same thing (approximately, because of the jittering) as I got w/ my version. (We did not have enough class time for me to discuss the details of my special function that does what I've described here, so your question helps me address that. Thanks.) BP ps. Let me know what your t and F comparisons are....
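
A rough sketch of the two steps described above, for anyone who wants to experiment before the real function is posted (this is not bob's actual function; in particular, using the MAD as the 'spread' measure is an assumption):

rank.rescale <- function(y) {
  r <- rank(jitter(y))                            # jitter to break ties, then rank
  (r - median(r)) / mad(r) * mad(y) + median(y)   # rescale to the median and spread of y
}
y   <- c(26,21,22,26,19,22,26,25,24,21,23,23,18,29,22,    # data from the handout
         18,23,21,20,20,29,20,16,20,26,21,25,17,18,19)
grp <- rep(1:2, each = 15)
yr  <- rank.rescale(y)
t.test(yr ~ factor(grp), var.equal = TRUE)        # compare with the t on the original scores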

Chuck Yang said

at 4:32 pm on May 7, 2011

I obtained the F statistic for comparing the two groups in the conventional way and, not surprisingly, got the same results as shown in the handout. As for the NP method, I now understand what you did for the transformation, but on the operational level I found it difficult to replicate the second step. I had no problem using the jitter function, but what does it mean to "transform the ranks to a scale w/ the same median and spread as the initial scale"?

thanks,
Chuck

Chuck Yang said

at 4:41 pm on May 7, 2011

Another question: while reviewing the "Basic.Matrix.Ops.correl.regrsn11.pdf" document, I am not sure I fully understand the role of the matrix "D.smc". According to the notes in the document, the diagonal of D.smc holds the squared multiple correlations for predicting each column of Z (or X) from all the other columns. Since we already have the product-moment correlation matrix calculated from t(Z) %*% Z, what is the benefit of getting D.smc?

Many thanks,
Chuck

bob pruzek said

at 5:22 pm on May 7, 2011

Chuck,
As for doing what I did in the rank-transform way (and this goes for anyone interested), I will email you, or put on the wiki, the function that does the job -- but only after the final exam.
Q2: Your question puzzles me. The correlation matrix shows how each pair of variables relates to one another. The squared multiple correlation does what you describe, which is a wholly different matter. What am I missing here? bp

Chuck Yang said

at 11:34 pm on May 16, 2011

Hi Bob,

I am not sure if you are still monitoring the wiki pages, but I am wondering if you could please share the function you mentioned that performs the rank transform. I am still very intrigued to find out how it is done in R.

Also, have you considered releasing an answer sheet for the final exam? Not only am I interested in how I did on the final, but I also hate being blind to what I may have done wrong.

Thank you, and thanks for the great semester.

Chuck

Chuck Yang said

at 7:54 pm on May 7, 2011

I think I confused the two concepts before; now I understand. t(Z) %*% Z or cor(X) gets the Pearson product-moment correlation coefficient, the "r", and D.smc gets the squared multiple correlations, the "R^2". I guess the reason I was confused is that the example only has two columns (variables), so D.smc doesn't really provide additional information beyond what we already know from cor(X), numerically speaking. But once we have more than 2 columns (variables), D.smc predicts each column from all the other columns.

For example, using the trees data (3 variables), we have:
tree5 <- as.matrix(trees[1:5, ])
Dn <- diag(t(tree5) %*% L(5) %*% tree5)  # L(5): the centering-matrix function from class (presumably I - (1/5)J)
Dn <- diag(1/sqrt(Dn))
Z <- L(5) %*% tree5 %*% Dn               # standardized columns, so that...
t(Z) %*% Z                               # ...this is the matrix of Pearson product-moment correlations
[,1] [,2] [,3]
[1,] 1.0000000 0.7757376 0.9758850
[2,] 0.7757376 1.0000000 0.8940005
[3,] 0.9758850 0.8940005 1.0000000

R <- t(Z) %*% Z
R.inverse <- solve(R)
S <- diag(R.inverse)                     # diagonal of R-inverse: each entry is 1/(1 - SMC)
S.sqrd <- diag(1/S)
D.smc <- diag(3) - S.sqrd                # squared multiple correlations on the diagonal
D.smc
[,1] [,2] [,3]
[1,] 0.9989322 0.000000 0.0000000
[2,] 0.0000000 0.995501 0.0000000
[3,] 0.0000000 0.000000 0.9994617

Hope this helps.

bob pruzek said

at 5:40 pm on May 8, 2011

Chuck, when you say "t(Z) %*% Z or cor(X) gets the Pearson product-moment correlation coefficient, the 'r'", your language needs editing.
Say instead: t(Z) %*% Z or cor(X) gets the MATRIX OF Pearson product-moment correlation coefficientS, the matrix of "r" values for ALL PAIRS OF VARIABLES. bp

Richard said

at 2:09 pm on May 10, 2011

We are still unsure how to answer the question: What is principal component analysis? Be able to describe it using matrix multiplications.
We've reached the understanding that PCA involves making linear combinations of the original variables that are uncorrelated with each other, in order to reduce the number of variables needed and to view the structure of the original matrix.
How are the new vectors created by PCA used to analyze the data? What do these new vectors say about the relationships between the original variables? We are still unsure how to use the principal components after we have found them.

bob pruzek said

at 5:49 pm on May 10, 2011

This question really is too basic to be asked the day before the final exam! To answer it, I will rely on the 'tutorial on PCA' that I have just uploaded to the Files page (so you must go to the last page of this wiki to get it). To cut the reading down to the minimum relevant to your question, I suggest you go to the section on the SVD and read that carefully. Ask questions about that here by, say, 11 pm this evening and I will answer them here. Best, BP

bob pruzek said

at 5:57 pm on May 10, 2011

Addendum: The section I'm referring to is VI, and you should substitute our Z (where t(Z) %*% Z = R) for the author's X. Also substitute our D.lambda, the matrix of singular values, for the author's Sigma. Then note that Z = U D V' (or U %*% D %*% t(V) in R) is consistent w/ the author's U SIGMA V' (where V' is the transpose of V, which the author writes as V^T). Finally, an approximation to Z can be based on some limited number of components, say Z.hat(m) = U(m) D(m) V(m)', where only the first m columns of each matrix are used on the right side. Hope this helps. bp
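
A small R sketch of that SVD step, using built-in data purely for illustration and following the convention above that t(Z) %*% Z = R:

X <- as.matrix(trees)                    # any numeric data will do
Z <- scale(X) / sqrt(nrow(X) - 1)        # scaled so that t(Z) %*% Z = cor(X) = R
s <- svd(Z)                              # Z = U D V'
U <- s$u; D.lambda <- diag(s$d); V <- s$v
max(abs(Z - U %*% D.lambda %*% t(V)))    # reconstruction check: essentially 0
m <- 2                                   # rank-m approximation from the first m components
Z.hat <- U[, 1:m] %*% D.lambda[1:m, 1:m] %*% t(V[, 1:m])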
