<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>StatisticsPedia</title>
	<atom:link href="http://www.statisticspedia.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.statisticspedia.com</link>
	<description></description>
	<lastBuildDate>Sat, 20 Aug 2011 19:44:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Regression without an intercept &#8211; Yay or Nay?</title>
		<link>http://www.statisticspedia.com/articles/basic-statistics/regression-without-an-intercept-yay-or-nay/</link>
		<comments>http://www.statisticspedia.com/articles/basic-statistics/regression-without-an-intercept-yay-or-nay/#comments</comments>
		<pubDate>Sat, 20 Aug 2011 19:44:42 +0000</pubDate>
		<dc:creator>Dason</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Basic Statistics]]></category>
		<category><![CDATA[Regression Analysis]]></category>
		<category><![CDATA[intercept]]></category>
		<category><![CDATA[linear model]]></category>
		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=388</guid>
		<description><![CDATA[There was a post on TalkStats quite a while ago that I ended up recalling for another thread. I thought I&#8217;d post it here for a better reference and because the idea is important. A poster asked when it is appropriate to run a regression without an intercept. There was some discussion but mainly we [...]]]></description>
			<content:encoded><![CDATA[<p>There was a post on TalkStats quite a while ago that I ended up recalling for another thread. I thought I&#8217;d post it here as a better reference and because the idea is important. A poster asked when it is appropriate to run a regression without an intercept. There was some discussion, but mainly we agreed that it&#8217;s usually not a good idea. Bryangoodrich posted this:</p>
<p>&#8220;If you purposely exclude it, then you&#8217;re basically forcing the regression to be (0, 0) when all the X terms are 0&#8221;</p>
<p>And I replied with:</p>
<p>And note that even though this sounds desirable in a lot of situations, it isn&#8217;t by itself a reason to drop the intercept. The reason is that you are adding a restriction to your model. You&#8217;re already assuming that the response is linear in the predictors. That might be a reasonable approximation locally, but if all of your predictor values are far from 0, forcing the line through the origin can badly distort the fit.</p>
<p>Consider the following fake dataset:</p>
<table>
<tr><th>x</th><th>y</th></tr>
<tr><td>10</td><td>100</td></tr>
<tr><td>11</td><td>99</td></tr>
<tr><td>12</td><td>96</td></tr>
</table>
<p>Let&#8217;s look at the predictions from the model with an intercept and from the model without one.</p>
<table>
<tr><th>x</th><th>y</th><th>Prediction with intercept</th><th>Prediction without intercept</th></tr>
<tr><td>10</td><td>100</td><td>100.3</td><td>88.79</td></tr>
<tr><td>11</td><td>99</td><td>98.3</td><td>97.67</td></tr>
<tr><td>12</td><td>96</td><td>96.3</td><td>106.55</td></tr>
</table>
<p>Now, it might sound silly to fit a model with no intercept to this data. But what if I told you that the data were generated from a model in which y is 0 when x is 0? The model is <img src='http://s.wordpress.com/latex.php?latex=%20y%20%3D%20-x%5E2%20%2B%2020x&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' y = -x^2 + 20x' title=' y = -x^2 + 20x' class='latex' />. This is a quadratic model, but the linear fit with an intercept gives a good local approximation over the range of data we have. If you knew that the response must be 0 when x is 0 and forced that constraint into your linear model, you&#8217;d only be hurting yourself in this situation.</p>
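<p>To check the numbers in the table, here is a small sketch in Python. Python is our choice; the original thread used no code. The sketch fits both lines to the three points using the closed-form least-squares formulas. For the usual model, slope = Sxy/Sxx and intercept = ȳ − slope·x̄. When the line is forced through the origin, slope = Σxy/Σx². It then prints both sets of predictions and the residual sum of squares of each fit.</p>

```python
# Compare least-squares lines with and without an intercept on the toy data above.
xs = [10, 11, 12]
ys = [100, 99, 96]  # values of y = -x**2 + 20*x at x = 10, 11, 12


def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b*x, using the closed-form formulas."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


def fit_through_origin(x, y):
    """Least squares for y = b*x, i.e. with the intercept forced to 0."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def sse(x, y, predict):
    """Residual sum of squares of a prediction function."""
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))


a, b = fit_with_intercept(xs, ys)
b0 = fit_through_origin(xs, ys)

for xi, yi in zip(xs, ys):
    print(f"x={xi}  y={yi}  with intercept={a + b * xi:.2f}  without={b0 * xi:.2f}")

sse_with = sse(xs, ys, lambda v: a + b * v)
sse_without = sse(xs, ys, lambda v: b0 * v)
print(f"SSE with intercept: {sse_with:.3f}   SSE through origin: {sse_without:.3f}")
```

<p>The intercept model stays within 0.7 of every observed value. The line through the origin misses both end points by more than 10, even though the true curve really does pass through (0, 0).</p>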
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/articles/basic-statistics/regression-without-an-intercept-yay-or-nay/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Testing Multivariate Normality</title>
		<link>http://www.statisticspedia.com/articles/basic-statistics/testing-multivariate-normality/</link>
		<comments>http://www.statisticspedia.com/articles/basic-statistics/testing-multivariate-normality/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 19:38:55 +0000</pubDate>
		<dc:creator>terzi</dc:creator>
				<category><![CDATA[Basic Statistics]]></category>
		<category><![CDATA[M]]></category>
		<category><![CDATA[N]]></category>
		<category><![CDATA[Statistical Research]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=344</guid>
		<description><![CDATA[Normality is, perhaps, the most common assumption in statistics. Many tests and models are somehow related to a normal distribution. Testing for normality is a common procedure and we all know (or should know) how to test for univariate normality: we have the Shapiro-Wilk test, the Shapiro-Francia test, Skewness and Kurtosis coefficients and my personal [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify">Normality is, perhaps, the most common assumption in statistics. Many tests and models are somehow related to a normal distribution. Testing for normality is a common procedure, and we all know (or should know) how to test for univariate normality. We have the Shapiro-Wilk test, the Shapiro-Francia test, skewness and kurtosis coefficients and, my personal favorites, graphical displays. So, no problem there. The issues start when we talk about multivariate normality.</p>
<p style="text-align: justify">Many researchers don&#8217;t test for multivariate distributions, maybe because some multivariate techniques are very robust against deviations from multivariate normality, like discriminant analysis, for example. Still, it is important to verify the assumption in order to understand the severity of  violations. Also, we should remember that some multivariate procedures are really, really, sensitive to non-normal distributions, such as Box&#8217;s M statistic, which is almost useless even with some minor deviations from multivariate normality. Ok, testing normality is good, but how can I test the multivariate hypothesis?</p>
<p style="text-align: justify">The most common way of testing multivariate normality is testing for univariate and bivariate normality. I really do not like this approach, but many authors recommend it. <strong>Normality of each of the variables separately is a necessary, but not sufficient, condition for multivariate normality to hold.</strong> That is, each of the individual variables must be normally distributed for the variables to follow a multivariate normal distribution. Another property of the multivariate normal distribution is that all pairs of variables must be bivariate normal. Bivariate normality, for correlated variables, implies that the scatterplot of each pair of variables will be elliptical; the higher the correlation, the thinner the ellipse. So, as a partial check on multivariate normality, one could verify univariate normality for every single variable. Then one could obtain the scatterplots for all pairs of variables and see whether they are approximately elliptical.</p>
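<p style="text-align: justify">The &#8220;necessary but not sufficient&#8221; part is easy to see by simulation. The sketch below is plain Python and is our own illustration, not something from the sources cited here. It draws x from N(0, 1) and sets y = s·x, where s is an independent random sign. Both variables are exactly standard normal on their own and are essentially uncorrelated, yet the pair is not bivariate normal. The scatterplot is an &#8220;X&#8221; made of the lines y = x and y = −x rather than an ellipse. Also, x + y is exactly 0 about half the time, which cannot happen for a bivariate normal pair.</p>

```python
import random
import statistics

random.seed(1)
n = 20000

# x is standard normal; y flips the sign of x at random, so y is also exactly N(0, 1).
x = [random.gauss(0.0, 1.0) for _ in range(n)]
signs = [random.choice((-1.0, 1.0)) for _ in range(n)]
y = [s * xi for s, xi in zip(signs, x)]

# Each margin looks perfectly normal: mean close to 0, standard deviation close to 1.
print("x: mean %.3f, sd %.3f" % (statistics.fmean(x), statistics.stdev(x)))
print("y: mean %.3f, sd %.3f" % (statistics.fmean(y), statistics.stdev(y)))

# The sample correlation is close to 0 ...
mx, my = statistics.fmean(x), statistics.fmean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
corr = cov / (statistics.stdev(x) * statistics.stdev(y))
print("corr(x, y) = %.3f" % corr)

# ... yet about half of the points lie exactly on the line y = -x, so x + y is 0
# with probability 1/2. For a bivariate normal pair, x + y would be normal (or constant).
frac_zero = sum(1 for a, b in zip(x, y) if a + b == 0.0) / n
print("share of points with x + y == 0: %.3f" % frac_zero)
```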
<p style="text-align: justify">As I wrote, many authors support this procedure. Gnanadesikan (1977) stated: &#8220;In practice, except for rare or pathological examples, the presence of joint (multivariate) normality is likely to be detected quite often by methods directed at studying the marginal (univariate) normality of the observations on each variable&#8221;. Johnson and Wichern (1992) agreed: &#8220;Moreover, for most practical work, one-dimensional and two-dimensional investigations are ordinarily sufficient. Fortunately, pathological data sets that are normal in lower dimensional representations but non normal in higher dimensions are not frequently encountered in practice&#8221;.</p>
<p style="text-align: justify">Personally, I don&#8217;t think a univariate analysis should be the standard procedure for analyzing multivariate normality, since there are many other good methods available for studying this distribution directly, not only through hypothesis testing but also with some graphical tools.</p>
<p style="text-align: justify">Indeed, there are tests developed specifically for multivariate normality. Mardia (1970) generalized both the skewness and the kurtosis measures to the multivariate case. From these, two statistics were developed: Mardia&#8217;s statistic of multivariate skewness and Mardia&#8217;s statistic of multivariate kurtosis. An omnibus test was later proposed. There&#8217;s also a multivariate Shapiro-Wilk test, proposed by Royston (1983). It combines the univariate tests into a statistic H, which has an approximate <img src='http://s.wordpress.com/latex.php?latex=%20%5Cchi%5E2%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' \chi^2 ' title=' \chi^2 ' class='latex' /> distribution with non-integer degrees of freedom e. We can calculate some of these statistics using the HATCO data. This is a common data set taken from an industrial enterprise and used in Hair&#8217;s multivariate statistics book:</p>
<p style="text-align: center"> <strong>Mardia&#8217;s Statistic of Multivariate Skewness= 25.3078</strong></p>
<p style="text-align: center"><img src='http://s.wordpress.com/latex.php?latex=%20%5Cchi%5E2%20%2884%29%20%3D437.7%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' \chi^2 (84) =437.7 ' title=' \chi^2 (84) =437.7 ' class='latex' />    <img src='http://s.wordpress.com/latex.php?latex=%20Prob%3E%5Cchi%5E2%3D0.000%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' Prob&gt;\chi^2=0.000 ' title=' Prob&gt;\chi^2=0.000 ' class='latex' /></p>
<p style="text-align: center"><strong>Mardia&#8217;s Statistic of Multivariate Kurtosis=74.6837</strong></p>
<p style="text-align: center"><img src='http://s.wordpress.com/latex.php?latex=%20%5Cchi%5E2%20%281%29%20%3D27.08%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' \chi^2 (1) =27.08 ' title=' \chi^2 (1) =27.08 ' class='latex' />      <img src='http://s.wordpress.com/latex.php?latex=%20Prob%3E%5Cchi%5E2%3D0.000%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' Prob&gt;\chi^2=0.000 ' title=' Prob&gt;\chi^2=0.000 ' class='latex' /></p>
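<p style="text-align: justify">For readers who want to compute Mardia&#8217;s statistics themselves, here is a minimal pure-Python sketch. It is our illustration; the HATCO data are not reproduced here, so the usage line runs on a made-up data set. Define g<sub>ik</sub> = (x<sub>i</sub> − x̄)ᵀS⁻¹(x<sub>k</sub> − x̄), where S is the covariance matrix with divisor n. Then:</p>
<ul>
<li>skewness is b<sub>1,p</sub> = Σ<sub>i</sub>Σ<sub>k</sub> g<sub>ik</sub>³ / n²;</li>
<li>kurtosis is b<sub>2,p</sub> = Σ<sub>i</sub> g<sub>ii</sub>² / n;</li>
<li>the skewness test compares n·b<sub>1,p</sub>/6 with a χ² distribution with p(p+1)(p+2)/6 degrees of freedom, which is 84 when p = 7, matching the output above;</li>
<li>the kurtosis test squares z = (b<sub>2,p</sub> − p(p+2))/√(8p(p+2)/n) to get a χ²(1) value.</li>
</ul>
<p style="text-align: justify">Some programs multiply the skewness statistic by a small-sample correction factor. The optional <em>small_sample</em> flag applies one common version of that correction. Treat exact agreement with any particular package as an assumption to check.</p>

```python
import math


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]


def mardia(X, small_sample=False):
    """Mardia's multivariate skewness and kurtosis with their usual chi-square statistics.

    X is a list of n observations, each a list of p numbers. The covariance matrix
    uses divisor n, as in Mardia's definition."""
    n, p = len(X), len(X[0])
    m = [sum(row[j] for row in X) / n for j in range(p)]
    D = [[row[j] - m[j] for j in range(p)] for row in X]
    S = [[sum(d[i] * d[j] for d in D) / n for j in range(p)] for i in range(p)]
    Sinv = invert(S)
    # W[i] = S^-1 (x_i - mean), so g[i][k] = (x_i - mean)' S^-1 (x_k - mean)
    W = [[sum(Sinv[a][b] * d[b] for b in range(p)) for a in range(p)] for d in D]
    g = [[sum(u * v for u, v in zip(W[i], D[k])) for k in range(n)] for i in range(n)]
    b1p = sum(g[i][k] ** 3 for i in range(n) for k in range(n)) / n ** 2
    b2p = sum(g[i][i] ** 2 for i in range(n)) / n
    factor = 1.0
    if small_sample:
        factor = (p + 1) * (n + 1) * (n + 3) / (n * ((n + 1) * (p + 1) - 6))
    kurt_z = (b2p - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    return {
        "b1p": b1p,
        "skew_chi2": n * factor * b1p / 6,
        "skew_df": p * (p + 1) * (p + 2) // 6,
        "b2p": b2p,
        "kurt_chi2": kurt_z ** 2,                         # chi-square with 1 df
        "kurt_p": math.erfc(abs(kurt_z) / math.sqrt(2)),  # P(chi2(1) > kurt_z**2)
    }


# Usage on a small made-up 2-variable data set (not the HATCO data).
toy = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 5.0], [9.0, 2.0]]
print(mardia(toy))
```

<p style="text-align: justify">If SciPy is available, <em>scipy.stats.chi2.sf(stat, df)</em> gives the p-value for the skewness statistic.</p>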
<p style="text-align: justify">So far, the most popular statistic for multivariate normality (at least to my humble knowledge) is the Doornik-Hansen test (2008). To obtain this statistic, the multivariate observations are first transformed (centered and decorrelated). Then the univariate skewness and kurtosis of each transformed variable are computed. These measures are finally combined into an approximate χ² statistic with 2p degrees of freedom for p variables:</p>
<p style="text-align: center"><strong>Doornik-Hansen Multivariate Normality Test</strong></p>
<p style="text-align: center"><img src='http://s.wordpress.com/latex.php?latex=%20%5Cchi%5E2%20%2814%29%20%3D78.36%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' \chi^2 (14) =78.36 ' title=' \chi^2 (14) =78.36 ' class='latex' />     <img src='http://s.wordpress.com/latex.php?latex=%20Prob%3E%5Cchi%5E2%20%3D0.000&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' Prob&gt;\chi^2 =0.000' title=' Prob&gt;\chi^2 =0.000' class='latex' /></p>
<p style="text-align: justify">As we can see, the results agree: our dataset does not seem to follow a multivariate normal distribution. Sadly, most of these statistics are not included in common statistical software. If you happen to be in that situation, don&#8217;t worry; there are other solutions. In the first two editions of his book “Applied Multivariate Statistics”, Stevens (1986) included a graphical procedure to assess multivariate normality. The result is a graph very similar to a normal probability plot.</p>
<p style="text-align: justify">Remember that the Mahalanobis distance of an observation from a mean vector <img src='http://s.wordpress.com/latex.php?latex=%20%5Cmu%20&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt=' \mu ' title=' \mu ' class='latex' />, given a covariance matrix <em>S</em>, is formally defined as:</p>
<p><a href="http://www.statisticspedia.com/wp-content/uploads/2011/07/7d12d753978a8d7714b13777f05927e0.png"><img class="aligncenter size-full wp-image-355" src="http://www.statisticspedia.com/wp-content/uploads/2011/07/7d12d753978a8d7714b13777f05927e0.png" alt="" width="273" height="31" /></a></p>
<p style="text-align: justify">Well, the trick is simple. First, calculate the squared Mahalanobis distance of each observation and sort these values. Then plot them against the corresponding quantiles of a chi-square distribution with p degrees of freedom, where p is the number of variables. I don&#8217;t recall any function in R that produces this graph directly, though we have the <em>mahalanobis</em> function (which returns squared distances) and the <em>qchisq</em> function. The resulting graph for the HATCO data looks something like this:</p>
<p><a href="http://www.statisticspedia.com/wp-content/uploads/2011/07/Graph.jpg"><img class="aligncenter size-full wp-image-357" src="http://www.statisticspedia.com/wp-content/uploads/2011/07/Graph.jpg" alt="" width="690" height="530" /></a></p>
<p style="text-align: justify">This graph is a powerful way to assess multivariate normality. As in any normality plot, we want the points to fall along a straight 45° line, which is what multivariate normality predicts. I like it because it not only tells you whether your data are multivariate normal, but it may also show what&#8217;s wrong. In our dataset, two observations deviate heavily from the line. They are possibly outliers and certainly cases that require further attention.</p>
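<p style="text-align: justify">The chi-square plot can also be sketched in Python instead of R. The sketch makes the following choices:</p>
<ul>
<li>It computes squared distances with the sample covariance matrix, using divisor n − 1 as R&#8217;s <em>cov</em> does.</li>
<li>It sorts the distances and pairs them with chi-square quantiles at plotting positions (i − 0.5)/n. This is one common choice and an assumption here.</li>
<li>It uses the Wilson-Hilferty approximation for the quantiles, because Python&#8217;s standard library has no chi-square quantile function. <em>scipy.stats.chi2.ppf(prob, df)</em> would give exact quantiles.</li>
</ul>
<p style="text-align: justify">The data in the usage lines are simulated, not the HATCO data.</p>

```python
import math
import random
from statistics import NormalDist


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]


def squared_mahalanobis(X):
    """(x_i - mean)' S^-1 (x_i - mean) for each row, S = sample covariance (divisor n - 1)."""
    n, p = len(X), len(X[0])
    m = [sum(row[j] for row in X) / n for j in range(p)]
    D = [[row[j] - m[j] for j in range(p)] for row in X]
    S = [[sum(d[i] * d[j] for d in D) / (n - 1) for j in range(p)] for i in range(p)]
    Sinv = invert(S)
    return [sum(d[a] * Sinv[a][b] * d[b] for a in range(p) for b in range(p)) for d in D]


def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to a chi-square quantile (approximate, not exact)."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def chi2_qq_points(X):
    """Pairs (chi-square quantile, ordered squared distance) for a chi-square Q-Q plot."""
    n, p = len(X), len(X[0])
    d2 = sorted(squared_mahalanobis(X))
    # 0-based i, so (i + 0.5)/n equals the usual (i - 0.5)/n for i = 1..n
    q = [chi2_quantile_wh((i + 0.5) / n, p) for i in range(n)]
    return list(zip(q, d2))


# Usage on simulated 3-variable normal data.
random.seed(42)
data = [[random.gauss(0, 1) for _ in range(3)] for _ in range(100)]
points = chi2_qq_points(data)
# Plot these pairs (e.g. with matplotlib) against the 45-degree line;
# the largest squared distances, the candidate outliers, sit at the end of the list.
print(points[-3:])
```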
<p style="text-align: justify">As you can see, there are several ways to test for multivariate normality, and some are easy to implement. Please don&#8217;t forget that distributional assumptions are important: if they are not met, your analysis may not be as powerful as you hoped.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/articles/basic-statistics/testing-multivariate-normality/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Principal Components Analysis – Step by Step (Part III)</title>
		<link>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step3/</link>
		<comments>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step3/#comments</comments>
		<pubDate>Wed, 25 May 2011 20:57:40 +0000</pubDate>
		<dc:creator>terzi</dc:creator>
				<category><![CDATA[P]]></category>
		<category><![CDATA[Statistical Research]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=283</guid>
		<description><![CDATA[You can find the first two sections of this tutorial here: Principal Components Analysis – Step by Step (Part I) Principal Components Analysis – Step by Step (Part II) Hi! This is the last part of this tutorial, a tiny introduction to Principal Components Analysis. So far we have seen the basics of the analysis [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify" lang="en-US">You can find the first two parts of this tutorial here:</p>
<p style="text-align: justify" lang="en-US"><a title="PCA step by step - Part I" href="http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step1/" target="_blank">Principal Components Analysis – Step by Step (Part I)</a></p>
<p style="text-align: justify" lang="en-US"><a title="PCA step by step - Part II" href="http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step2/" target="_blank">Principal Components Analysis – Step by Step (Part II) </a></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Hi! This is the last part of this tutorial, a tiny introduction to Principal Components Analysis. So far we have seen the basics of the analysis and some interpretations. Now we&#8217;ll look at some post-estimation statistics that are commonly used to assess the overall results of a PCA. Some of these look more like pre-estimation checks than post-estimation ones, yet they are usually reported at the end to confirm the results.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">I think the most common post-estimation measure for PCA is the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, so let&#8217;s start with that. <strong>KMO takes values between 0 and 1, with small values indicating that, overall, the variables have too little in common to warrant a PCA</strong>. It does this by comparing the sizes of the correlations among our variables with the sizes of their partial correlations. Historically (or at least according to Stata&#8217;s help file), the following labels are often given to values of KMO:</span></span></p>
<p style="text-align: center" lang="en-US"><span><span style="font-size: small">0.00  to 0.49     &gt;&gt;&gt; <strong>Unacceptable</strong></span></span></p>
<p style="text-align: center" lang="en-US"><span><span style="font-size: small">0.50  to 0.59     &gt;&gt;&gt; <strong>Miserable</strong></span></span></p>
<p style="text-align: center" lang="en-US"><span><span style="font-size: small">0.60  to 0.69     &gt;&gt;&gt; <strong>Mediocre</strong></span></span></p>
<p style="text-align: center" lang="en-US"><span><span style="font-size: small">0.70  to 0.79     &gt;&gt;&gt; <strong>Middling</strong></span></span></p>
<p style="text-align: center" lang="en-US"><span><span style="font-size: small">0.80  to 0.89     &gt;&gt;&gt; <strong>Meritorious</strong></span></span></p>
<p style="text-align: center" lang="en-US"><span><span style="font-size: small">0.90  to 1.00     &gt;&gt;&gt; <strong>Marvelous</strong></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">I wonder why almost all labels start with an “M”. Anyway, for our example, the KMO was <em>0.8153</em>, which is “Meritorious”. Wow, it sounds like we should get a medal for that, so let&#8217;s just call it OK. When KMO turns out to be low, that usually also shows up as a large unexplained variance. <strong>Analyses with low KMO values usually need more components to obtain a good representation</strong>. Remember that KMO is computed from the correlations alone, so it is not influenced by the number of components retained in a PCA.</span></span></p>
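<p style="text-align: justify" lang="en-US">The overall KMO compares the squared simple correlations r<sub>ij</sub> with the squared partial correlations a<sub>ij</sub>, where each pair is controlled for all the other variables. The formula is KMO = Σr<sub>ij</sub>² / (Σr<sub>ij</sub>² + Σa<sub>ij</sub>²), with both sums taken over i ≠ j. The partial correlations can be read off the inverse of the correlation matrix: a<sub>ij</sub> = −(R⁻¹)<sub>ij</sub> / √((R⁻¹)<sub>ii</sub>(R⁻¹)<sub>jj</sub>). Below is a minimal pure-Python sketch of that formula. It uses a small made-up correlation matrix rather than the data from this example.</p>

```python
import math


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix R."""
    p = len(R)
    Rinv = invert(R)
    r2 = 0.0  # sum of squared simple correlations, i != j
    a2 = 0.0  # sum of squared partial correlations, i != j
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -Rinv[i][j] / math.sqrt(Rinv[i][i] * Rinv[j][j])
                r2 += R[i][j] ** 2
                a2 += partial ** 2
    return r2 / (r2 + a2)


# Made-up example: three variables with every pairwise correlation equal to 0.5.
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
print(round(kmo(R), 4))  # 0.6923, "Mediocre" on the scale above
```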
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">There are other ways to check whether the analysis properly describes our data, such as a post-estimation look at the residuals. Wait, residuals? PCA can be seen as a model; in fact, Pearson first developed it as a kind of regression problem. The equation that produces the PCs from the data can be inverted to give a formula that computes the data from the principal components. However, the original data matrix will only be reproduced exactly when all the components are retained, and that almost never happens. Because of that, it is useful to reconstruct our data using only the retained components. We can then analyze the distance between these predicted values and the real ones. These residuals can be assessed through their sum of squares for each observation, which is called the <strong>Q-statistic or Rao statistic</strong>.</span></span></p>
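<p style="text-align: justify" lang="en-US">A minimal sketch of the Q-statistic, assuming the PCA is run on the correlation matrix, so the data are standardized first. It also assumes the retained eigenvectors come from any eigen routine. For each observation, the sketch computes the scores on the retained components, rebuilds the standardized row from those scores, and sums the squared differences. The two-variable usage data are made up. They use the fact that (1, 1)/√2 is the first eigenvector of any 2×2 correlation matrix with a positive correlation.</p>

```python
import math


def standardize(X):
    """Center each column and divide by its sample standard deviation (divisor n - 1)."""
    n, p = len(X), len(X[0])
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def q_statistics(Z, V):
    """Q-statistic (sum of squared residuals) for each standardized observation.

    Z: n x p standardized data; V: p x k matrix whose columns are the retained eigenvectors."""
    p, k = len(V), len(V[0])
    Q = []
    for z in Z:
        scores = [sum(z[r] * V[r][c] for r in range(p)) for c in range(k)]     # PC scores
        z_hat = [sum(V[r][c] * scores[c] for c in range(k)) for r in range(p)]  # rebuilt row
        Q.append(sum((a - b) ** 2 for a, b in zip(z, z_hat)))
    return Q


# Hypothetical two-variable example, keeping only PC1 = (1, 1)/sqrt(2).
X = [[1.0, 2.0], [2.0, 2.5], [3.0, 3.5], [4.0, 3.0], [5.0, 6.0]]
Z = standardize(X)
s = 1 / math.sqrt(2)
Q = q_statistics(Z, [[s], [s]])
print([round(q, 3) for q in Q])
```

<p style="text-align: justify" lang="en-US">With all components retained, every Q is zero, which matches the remark above that the data are only reproduced exactly in that case.</p>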
<p lang="en-US"><span><span style="font-size: small">The values of the Q-statistic in our problem are shown below for every observation:</span></span></p>
<p lang="en-US"><span><span style="font-size: small"><a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Q.jpg"><img class="aligncenter size-full wp-image-284" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Q.jpg" alt="" width="629" height="664" /></a></span></span></p>
<p lang="en-US"><span><span style="font-size: small">There&#8217;s a way to obtain a critical value, in order to test for problems or outliers in Q-statistics. In this case, it will be enough to notice that two states have remarkably higher values, that may indicate that these states are not properly represented by the first two components. If any of this states were absolutely important for our results, it may be necessary to retain more PC&#8217;s. Since that&#8217;s not our case, it should be enough to <strong>comment in our conclusions that the states of Tabasco and Yucatán were not properly represented by our analysis</strong>.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">In a very similar way, it is also possible to analyze the <strong>fitted correlation matrix</strong> to understand which variables and relationships were not fully captured by these results. We can calculate the residual correlation matrix, whose elements are the differences between the actual and the fitted correlations. For our example, we get the following matrix of residuals:</span></span></p>
<p lang="en-US"><span><span style="font-size: small"> <a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Res-Corr.jpg"><img class="aligncenter size-full wp-image-286" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Res-Corr.jpg" alt="" width="650" height="178" /></a></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Most relationships have been properly represented, with most residuals below 0.06. The misrepresentation is mostly due to the part of each variable&#8217;s variance that the retained components do not account for. Some variables, like sewer coverage, water coverage and life expectancy, have large residuals (in bold). Does this sound familiar? When we analyzed the unexplained variance of each variable, we got these exact same numbers. Now you know where they came from.</span></span></p>
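<p style="text-align: justify" lang="en-US">With k retained components, the fitted correlation matrix is Σ<sub>c</sub> λ<sub>c</sub>v<sub>c</sub>v<sub>c</sub>ᵀ, which is the loadings matrix times its transpose. The residual matrix is R minus that sum. Its diagonal, 1 − Σ<sub>c</sub> λ<sub>c</sub>v<sub>jc</sub>², is exactly the unexplained variance of variable j, which explains the match mentioned above. A minimal sketch follows. It assumes the eigenvalues and eigenvectors of the correlation matrix are already available, for example from <em>numpy.linalg.eigh</em> re-sorted into decreasing order. The 2×2 matrix in the usage lines is hypothetical.</p>

```python
import math


def residual_correlations(R, eigenvalues, eigenvectors, k):
    """Observed minus fitted correlations when only the first k components are kept.

    eigenvectors[j][c] is element j of eigenvector c; columns sorted by decreasing eigenvalue."""
    p = len(R)
    # Fitted correlation: sum over retained components of lambda_c * v_ic * v_jc.
    fitted = [[sum(eigenvalues[c] * eigenvectors[i][c] * eigenvectors[j][c] for c in range(k))
               for j in range(p)] for i in range(p)]
    return [[R[i][j] - fitted[i][j] for j in range(p)] for i in range(p)]


# Hypothetical 2x2 example: correlation 0.6, eigenvalues 1.6 and 0.4.
s = 1 / math.sqrt(2)
R = [[1.0, 0.6], [0.6, 1.0]]
vals = [1.6, 0.4]
vecs = [[s, s], [s, -s]]
for row in residual_correlations(R, vals, vecs, 1):
    print([round(v, 3) for v in row])
```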
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small"><strong>This additional post-estimation analysis must be incorporated in any Principal Components Analysis</strong>, or at least some of these results must. This information is important in order to understand the solution that was obtained. As you can imagine, most analyses drop several components, and these measures let the analyst understand the impact of the variance left unexplained.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Well, that is about everything I can contribute regarding how to perform a good Principal Components Analysis. There are many things we didn&#8217;t discuss, such as the analysis of characteristic roots, inferential procedures and rotations. Even so, I hope this humble practical guide will help you understand the concepts that lie beneath this beautiful statistical tool. Don&#8217;t forget to check out the references to gain a deeper knowledge of PCA. And remember that we have a whole forum and this great website to help those in statistical need.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Have a nice day!</span></span></p>
<p><span style="color: #ff0000"><span><span style="font-size: large">REFERENCES</span></span></span></p>
<ul>
<li><span><span style="font-size: small"><span style="color: #ff0000"><strong>Cahill, Miles B. et al.</strong></span> <strong>Using principal components to produce an economic and social development index: An application to Latin America and the U.S.</strong> <em>Atlantic Economic Journal, Volume 29, Number 3.</em></span></span></li>
<li><span><span style="font-size: small"><span style="color: #ff0000"><strong>Jackson, J. Edward</strong></span> <strong>A User&#8217;s Guide to Principal Components.</strong> <em>Wiley-Interscience, 1991.</em></span></span></li>
<li><span><span style="font-size: small"><span style="color: #ff0000"><strong>Lattin, James</strong></span> <strong>Analyzing Multivariate Data.</strong> <em>Duxbury Press, 2002.</em></span></span></li>
<li><span><span style="font-size: small"><span style="color: #ff0000"><strong>Rabe-Hesketh, Sophia</strong></span> <strong>A Handbook of Statistical Analyses Using Stata.</strong> <em>CRC Press, 2004.</em></span></span></li>
<li><span><span style="font-size: small"><span style="color: #ff0000"><strong>Venables, W.N. &amp; Ripley, B.D.</strong></span> <strong>Modern Applied Statistics with S.</strong> <em>Springer, 2002.</em></span></span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Principal Components Analysis – Step by Step (Part II)</title>
		<link>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step2/</link>
		<comments>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step2/#comments</comments>
		<pubDate>Thu, 19 May 2011 21:03:31 +0000</pubDate>
		<dc:creator>terzi</dc:creator>
				<category><![CDATA[P]]></category>
		<category><![CDATA[Statistical Research]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=260</guid>
		<description><![CDATA[Hi again! This time I&#8217;ll continue with some results and interpretations from a Principal Component Analysis, hoping this tiny tutorial can help you understand some of the basics. Although we won&#8217;t cover every aspect, some ideas will be useful when you happen to need PCA. The application that we&#8217;ve been working is a study aimed [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Hi again! This time I&#8217;ll continue with some results and interpretations from a Principal Component Analysis, hoping this tiny tutorial can help you understand some of the basics. Although we won&#8217;t cover every aspect, some ideas will be useful when you happen to need PCA. </span></span><span><span style="font-size: small">The application we&#8217;ve been working on is a study that compares development across the states of Mexico. Last time, we started our analysis and decided to retain two principal components. You can check that last post <a title="Part I" href="http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step1/" target="_blank">here</a>. It is time to continue with the interpretations. </span></span></p>
<p style="text-align: justify"><span><span style="font-size: small">Since we decided to keep two components, it is time to check the first two eigenvectors that resulted from our Singular Value Decomposition. First of all, it is important to <em>let the software know</em> that we are retaining only two principal components. By doing that, most programs will show us some important information regarding the unexplained variance. Recall that our first two components allow us to represent 87% of the variation, so there&#8217;s still 13% hanging around. After fitting the solution with two components, we can examine them to understand the variables involved: </span></span></p>
<p style="text-align: justify"><span><span style="font-size: small"><a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Vectors.jpg"><img class="aligncenter size-full wp-image-261" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Vectors.jpg" alt="Eigenvectors from PCA" width="647" height="197" /></a> </span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Let&#8217;s begin analyzing the unexplained variance. As you see, sewer access is the variable that was less explained by our two components. The rest have only a mere 15% of variance unexplained at most. Since there is no single variable with a high unexplained rate, we can assume that most variables will be correctly represented.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">One of the most important interpretations in PCA is that of the components, the eigenvectors. Each component requires an interpretation: based on the variables that are important, we can even “name” our components. As you can see, in PC1 almost all variables have similar values. Remember that one of our main reasons for analyzing the components is to understand what the numbers in the corresponding index mean. Those numbers are the scores, which we will discuss below. <strong>Larger numbers, in absolute value, mean that those particular variables are more important to that component.</strong> In PC1 all variables seem to contribute, and most have a positive sign. This means that a state with larger values in GDP, access to services and life expectancy will get larger numbers in that index. Notice that the variable illiteracy has a negative sign: a state with high illiteracy will have low values in this index. The overall sign of an eigenvector is arbitrary; what matters is that illiteracy points in the opposite direction to the other variables. It is important to realize how cool PCA is, since no one ever told the software that illiteracy was a bad sign for development. Anyway,<strong> our first PC is an index of “overall development” or wellbeing, where states with higher scores can be seen as those with a higher quality of life. </strong></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">One of the reasons why we kept two components is that the first one was quite easy to interpret, don&#8217;t you think? I&#8217;d bet most books have examples like it. The second PC resembles some real situations, and I find its interpretation more instructive for the purpose of this tutorial. <strong>Two variables have values close to zero, water and illiteracy, which means that their effect on the component is almost nonexistent</strong>, so we can ignore them. Sewer and electricity have positive values. GDP and life expectancy have negative values. <em>This component is a contrast</em>. The analysis detected some states with low economic performance but great infrastructure. Politics, certainly. The key is to understand what a score would mean for this index. <strong>Low numbers in this score would mean great GDP and life expectancy in a state with poor infrastructure for its citizens. High values would mean great infrastructure but not-so-good economic performance. </strong>The ideal for any state would be near-zero values, which would mean some balance&#8230; or maybe slightly negative ones; everyone wants to be rich, right? This odd situation arises because GDP is the least correlated variable in our group. Check the correlation matrix and you&#8217;ll see yet another proof that richer people aren&#8217;t always better off.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">From the principal components we can obtain one of the most important results in PCA: principal component scores. Those are our “indexes”. Scores represent the values that each unit, in this case, each state would get in each component. This is crucial, specially since we now understand what the values in the components would mean.  With the first scores, we can see the valuation each state got in our “overall development” index, our first PC:</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small"><a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/scores1.jpg"><img class="aligncenter size-full wp-image-262" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/scores1.jpg" alt="" width="358" height="398" /></a> </span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">It is easy to tell which are the five most developed states and the five least developed. I live in Veracruz, by the way, so I&#8217;m starting to regret analyzing this data. Anyway, remember that these first scores only account for 74% of the total variation. To get the 87% we are after, we should look at both indexes at once. That is usually done by plotting both sets of scores in a scatterplot, commonly called a principal component score plot (a biplot additionally overlays the variable loadings on the same axes). We can see it here:</span></span></p>
<p style="text-align: justify" lang="en-US"><a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Scores.jpg"><img class="aligncenter size-full wp-image-264" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Scores.jpg" alt="" width="685" height="499" /></a></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">There is a lot to learn from this graph. First of all, who&#8217;s enjoying the best life? Well, our first score is represented by the x-axis, so <strong>states on the left have the lowest values in our overall well-being index, our first PC.</strong> So, the best states to live in should be on the right side of the graph: too bad for Veracruz. Do you remember what we concluded regarding PC2? Low numbers would mean high GDP and life expectancy but poor infrastructure, and high values would mean great infrastructure but low economic performance. Since we argued that the ideal for any state would be near zero or slightly negative, the states above Jalisco may be considered less developed. One could say that Jalisco, Nayarit, Distrito Federal and the states in the lower right corner are the winners: the most developed areas. <strong>Another really interesting result is the set of clusters that appear in the graph</strong>. It is easy to notice three or maybe four different groups: PCA is a good way to start a cluster analysis. </span></span><span><span style="font-size: small">From the graph you can also find the situation of any individual state. For instance, Distrito Federal, which is the official name of the capital of Mexico, Mexico City, is certainly one of the most developed areas. Points that are close together in the graph represent states in similar situations. <strong>It is easy to compare states with each other, or even geographic areas. </strong></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Notice that two states are colored in gray in the graph. That is because a residual analysis showed that these two points were not properly represented: they are part of the 13% of the variation that the first two components could not capture. But wait, residual analysis in PCA? Yes, we&#8217;ll see that when we get to the final part of this tutorial: Assessing the fit in the solution.</span></span></p>
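<p>The details of the residual analysis are left for Part III. As a hedged preview, one common way to measure how badly a point is represented is its squared reconstruction error: rebuild the standardized observation from only the first k components and sum the squared differences. The sketch below assumes orthonormal loadings (as an eigendecomposition returns them) and is not necessarily the exact method used in Part III; the function name and the example numbers are hypothetical.</p>

```c
/* Squared reconstruction error of one standardized observation z
 * (length p) when only the first k components are kept.
 *   v  p x k array of orthonormal loadings, stored row by row, so that
 *      column c holds the eigenvector of component c
 *   t  scratch space for the k scores */
double reconstruction_residual(const double *z, const double *v,
                               int p, int k, double *t)
{
    /* scores: t_c = sum_j z_j * v_jc */
    for (int c = 0; c < k; c++) {
        t[c] = 0.0;
        for (int j = 0; j < p; j++)
            t[c] += z[j] * v[j * k + c];
    }

    /* reconstruction zhat_j = sum_c t_c * v_jc, then squared residuals */
    double ss = 0.0;
    for (int j = 0; j < p; j++) {
        double zhat = 0.0;
        for (int c = 0; c < k; c++)
            zhat += t[c] * v[j * k + c];
        double e = z[j] - zhat;
        ss += e * e;
    }
    return ss;
}
```

<p>Keeping all p components gives a residual of zero; with k = 2, as in this tutorial, observations with unusually large residuals would be the natural candidates for points that are not properly represented.</p>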
<p style="text-align: justify" lang="en-US"><a title="PCA step by step - Part III" href="http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step3/" target="_blank"><span><span style="font-size: small"> Principal Components Analysis – Step by Step (Part III)</span></span></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Principal Components Analysis &#8211; Step by Step (Part I)</title>
		<link>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step1/</link>
		<comments>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step1/#comments</comments>
		<pubDate>Sat, 14 May 2011 20:12:58 +0000</pubDate>
		<dc:creator>terzi</dc:creator>
				<category><![CDATA[P]]></category>
		<category><![CDATA[Statistical Research]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=241</guid>
		<description><![CDATA[Hi everyone. For my first post I&#8217;ll start a basic step by step tutorial regarding the use of principal component analysis, an old technique that can be dated back to 1901 with the first approaches by Pearson. It was Hotelling who derived a definitive solution about three decades later. Although the mathematical background in the [...]]]></description>
			<content:encoded><![CDATA[<p></p><p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Hi everyone. For my first post I&#8217;ll start a basic step-by-step tutorial on principal component analysis, an old technique that dates back to 1901 and the first approaches by Pearson. It was Hotelling who derived a definitive solution about three decades later. Although the mathematical background of the analysis isn&#8217;t too complex, I&#8217;ll try to avoid formulas and focus on a single application that should help you understand the basics of the analysis. I hope you find it helpful.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Mexico is divided into 32 states (actually 31 and a federal district, but let us consider 32 states). Some places have a better quality of life and a higher level of development than others, as happens in every country in the world. Therefore, one interesting subject of study would be a comparison of that development among the states. But what exactly is development? Well, it isn&#8217;t just a number, right? It would include many things: life expectancy in the state, Gross Domestic Product, percentage of people with access to fresh water, portion of the population that is illiterate. It seems we should consider all of these at once in order to make a fair comparison. And that&#8217;s what PCA is good for. The data we will use has the following form:</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small"> <a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/data4.png"><img class="aligncenter size-full wp-image-252" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/data4.png" alt="Dataset used for PCA" width="650" height="81" /></a></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">As you can see, there are 6 variables, all on a continuous scale. I just used those for convenience, but some analyses include twenty, forty or even hundreds. How would you compare all of them at once? One way would be to produce an index and then compare that index, right? But another question arises: how do we get that index? We could take the sum of all the variables, like some psychological tests do. We could take an average, or sum all the variables and then subtract the year I was born. Those are good indexes, but which one would be the best? </span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">In PCA we want to obtain the best indexes, that is, the linear combinations of the variables that reproduce as much of the variance in our data as possible: the first combination has the largest possible variance, the second has the largest variance among combinations uncorrelated with the first, and so on. Hotelling&#8217;s solution comes from an eigendecomposition of the covariance matrix; for a symmetric, positive semi-definite matrix like this one, that is the same as its Singular Value Decomposition (SVD), which is the term I&#8217;ll use below. <strong>When used as a descriptive tool, PCA has no distributional assumptions. </strong></span></span><span><span style="font-size: small">So, the first step will require an analysis of the covariance matrix. Here it is:</span></span></p>
<p lang="en-US"><span><span style="font-size: small"><a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Cov.jpg"><img class="aligncenter size-full wp-image-247" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Cov.jpg" alt="Covariance matrix calculated for our data" width="650" height="174" /></a></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Great! We should perform an SVD on this matrix and our results will be a PCA! Well, it&#8217;s not that simple. You can see the variances in bold. It is easy to notice that the variable measured in money, GDP, has a huge variance, simply because it is measured on a different scale. Is that a problem? <strong>Yeah, because PCA wants to represent the total variance, and here the total variance is almost entirely due to GDP. The rest of the variables will be largely ignored by the analysis; more precisely, they will receive much lower weights in the results.</strong> We can perform a PCA of this matrix, but our indexes will be almost entirely about GDP and nothing more. To obtain results that give the same importance to every variable in the study, we should standardize the variables (subtract each variable&#8217;s mean and divide by its standard deviation). That way no variable will have a larger influence on our results just because of its units. Here we have the standardized covariance matrix:</span></span></p>
<p lang="en-US"><span><span style="font-size: small"><a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Corr.jpg"><img class="aligncenter size-full wp-image-248" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Corr.jpg" alt="Correlation matrix calculated for our data" width="651" height="180" /></a> </span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">No, you&#8217;re not wrong. The standardized covariances are the correlations. That is why your software will ask whether to use the covariance or the correlation matrix: using the correlation matrix is the same as standardizing your data first. As you can see,<strong> the results will largely depend on the scales you are using</strong>. I&#8217;ll take the correlation matrix, since I don&#8217;t want GDP to overwhelm the rest. So, SVD on the correlation matrix R and, voilà! PCA! Your stats software will give you something like this:</span></span></p>
<p lang="en-US"><span><span style="font-size: small"><a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Eigen.jpg"><img class="aligncenter size-full wp-image-249" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Eigen.jpg" alt="Results from Eigenanalysis for PCA" width="650" height="196" /></a></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">Six variables generated six eigenvalues. These eigenvalues come from the SVD and are related to six principal components that together express the total amount of variation in our data. Every time you run a Principal Component Analysis, you will get as many components as variables. Of course, with PCA we want to reduce dimension, so why have six indexes? The key is that each of them combines information from several variables, and the first few carry most of it. As you can see, the first eigenvalue, attached to the first principal component, accounts for 74.06% of our total variance. One single index expresses about ¾ of the variation in our six measures of well-being! Since the principal components are orthogonal (uncorrelated), the amount of total variance expressed by the first two PCs is 87.37%, the sum of the proportions explained by each of them. Two indexes give us almost 90% of the information! So, analyzing fewer pieces of information will give us almost the same results as analyzing the whole set of variables.</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">In practice, you can work with as many components as you wish.<strong> A popular rule of thumb (Kaiser&#8217;s rule) suggests keeping only the components that have an eigenvalue larger than 1 (only if you based your PCA on the correlation matrix)</strong>. Others suggest looking for a break point in a scree plot of the eigenvalues, or simply retaining as many components as you need to reach an amount of explained variance that you find suitable. Our scree plot for this case is shown below:</span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small"> <a href="http://www.statisticspedia.com/wp-content/uploads/2011/05/Screeplot.jpg"><img class="aligncenter size-full wp-image-253" src="http://www.statisticspedia.com/wp-content/uploads/2011/05/Screeplot.jpg" alt="" width="685" height="495" /></a></span></span></p>
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small">We look for a break point, where the line starts to flatten. Here that happens after 2 eigenvalues. Some may say that it would be enough to work just with the first principal component, but for the purpose of this tutorial I&#8217;ll keep the first two in order to show some different interpretations. But this is getting way longer than a first post should be, so I&#8217;ll continue next time with the interpretation and understanding of principal components analysis.</span></span></p>
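<p>The two numeric retention rules mentioned above (eigenvalue larger than 1, and a target share of explained variance) can be sketched in a few lines of C. The function names are hypothetical, and the eigenvalues are assumed to be sorted in decreasing order.</p>

```c
/* Number of components with an eigenvalue larger than 1
 * (Kaiser's rule; meaningful only for a correlation-matrix PCA). */
int kaiser_count(const double *eig, int p)
{
    int k = 0;
    for (int i = 0; i < p; i++)
        if (eig[i] > 1.0)
            k++;
    return k;
}

/* Smallest number of leading components whose cumulative share of the
 * total variance reaches target (e.g. 0.80). */
int components_for_share(const double *eig, int p, double target)
{
    double total = 0.0;
    for (int i = 0; i < p; i++)
        total += eig[i];

    double cum = 0.0;
    for (int i = 0; i < p; i++) {
        cum += eig[i];
        if (cum / total >= target)
            return i + 1;
    }
    return p;
}
```

<p>Applied to this tutorial, and assuming the percentages come from the 6 x 6 correlation matrix (whose eigenvalues sum to 6), the second eigenvalue is about (0.8737 - 0.7406) × 6 ≈ 0.80, so the eigenvalue-larger-than-1 rule would keep only the first component, while an 85% threshold would keep two.</p>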
<p style="text-align: justify" lang="en-US"><span><span style="font-size: small"><a href="http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step2/">Principal Components Analysis &#8211; Step by Step (Part II)</a></span></span></p>
<p style="text-align: justify" lang="en-US"><a title="PCA step by step - Part III" href="http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step3/" target="_blank"><span><span style="font-size: small">Principal Components Analysis &#8211; Step by Step (Part III) </span></span></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/articles/statistical-research/pca-step-by-step1/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Calling C Code from R</title>
		<link>http://www.statisticspedia.com/articles/r-programming/calling-c-code-from-r/</link>
		<comments>http://www.statisticspedia.com/articles/r-programming/calling-c-code-from-r/#comments</comments>
		<pubDate>Wed, 11 May 2011 16:41:47 +0000</pubDate>
		<dc:creator>Dason</dc:creator>
				<category><![CDATA[R Programming]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=221</guid>
		<description><![CDATA[Sometimes it can be advantageous to write some code in C instead of R.  R is great but sometimes its speed is lacking when it comes to certain tasks.  If you find yourself in a situation like this it can be nice to drop down to a lower level language like C to do some [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Sometimes it can be advantageous to write some code in C instead of R. R is great, but sometimes its speed is lacking when it comes to certain tasks. If you find yourself in a situation like this, it can be nice to drop down to a lower-level language like C for some of the programming, because you can get a substantial speed boost in execution time. However, coding in C is more of a headache than coding in R: you have to get your hands dirty with memory allocation and all the fun stuff that goes along with it. That being said, if you have a task that takes forever to run, you might want to spend a little time optimizing your algorithms using C. We&#8217;ll start with a simple example that you wouldn&#8217;t actually need C for: computing the fourth power of a vector of numbers. I won&#8217;t go into the details of writing C code here, just how to compile it in such a way that you can call it from R. I&#8217;ll be assuming some sort of *nix environment. The first thing we need to do is write our function in C.</p>
<pre>
/* n: The length of the array x
 * x: The array of numbers we want to raise to the fourth power */
void forth(int *n, double *x){
    int i;
    double tmp; /* must be double: an int would truncate x[i] * x[i] */
    for(i = 0; i &lt; n[0]; i++){
        tmp = x[i] * x[i];
        x[i] = tmp * tmp;
    }
}
</pre>
<p>All arguments of a function called through .C must be pointers. Also note that we don&#8217;t actually return any values from our function. After we call the C function from R, we get back a list containing all the inputs in their modified form. In this case n will be unchanged, but x will have all of its elements raised to the fourth power.</p>
<p>Now assume we saved that in a file called forth.c.  To compile this code to be used in R we execute the following command from the command line:</p>
<pre>
 R CMD SHLIB forth.c
</pre>
<p>which will create a .so file in the same directory.  That .so file is what we&#8217;ll call when we use this in R.  </p>
<p>It is most convenient to write a wrapper function to call the C function to do some error checking for us.</p>
<pre>
dyn.load("forth.so")
forth &lt;- function(x){
  if(!is.numeric(x)){
    stop(&quot;Argument x must be numeric&quot;)
  }
  ans &lt;- .C(&quot;forth&quot;,
            n = as.integer(length(x)),
            x = as.double(x))
  return(ans$x)
}
</pre>
<p>Typically I would save this code in a file called forth.R and source it when I wanted to use this function. There are a few things to note here: 1) you need to call dyn.load on the .so file you created before its functions are available, and 2) you use .C to call the function itself. It is also possible to use .Call, an interface in which the C function receives and returns R objects (SEXP) directly; I&#8217;m not going to discuss the differences here. When you use .C, it is good practice to explicitly coerce your inputs to the types the C function requires, as the wrapper does with as.integer and as.double.</p>
<p>Note that one could do some other helpful things for the user. For instance, if you pass a numeric matrix to our function, the output will be a vector. There are a variety of ways to fix this, depending on what you expect the function to do.</p>
<p>It is also possible to call R functions from within your C code.  I&#8217;ll get to that in the next post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/articles/r-programming/calling-c-code-from-r/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>First Post</title>
		<link>http://www.statisticspedia.com/uncategorized/first-post/</link>
		<comments>http://www.statisticspedia.com/uncategorized/first-post/#comments</comments>
		<pubDate>Mon, 02 May 2011 12:16:40 +0000</pubDate>
		<dc:creator>vinuct</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[introuction]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=94</guid>
		<description><![CDATA[Hi All, This is Vinux (Richie). I hope all TS contributors have joined the blog. Please reply if you have joined the blog. Regards Richie Note: I will delete this post once quality articles start filling the blog]]></description>
			<content:encoded><![CDATA[<p></p><p>Hi All,<br />
This is Vinux (Richie). I hope all TS contributors have joined the blog. Please reply if you have joined the blog.</p>
<p>Regards<br />
Richie</p>
<p>Note:</p>
<p>I will delete this post once quality articles start filling the blog.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/uncategorized/first-post/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>What is StatisticsPedia?</title>
		<link>http://www.statisticspedia.com/faq/what-is-statisticspedia/</link>
		<comments>http://www.statisticspedia.com/faq/what-is-statisticspedia/#comments</comments>
		<pubDate>Tue, 03 May 2011 09:43:46 +0000</pubDate>
		<dc:creator>orbit</dc:creator>
				<category><![CDATA[FAQ]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=136</guid>
		<description><![CDATA[What is StatisticsPedia? StatisticsPedia is a free online service which has been written collaboratively by members of Talk Stats Forum. While the Talk Stats forum is a great place to interact with other users, focusing on statistics related questions, StatisticsPedia is viewed as a repository for statistics-related information provided by users who are willing and able [...]]]></description>
			<content:encoded><![CDATA[<p></p><p><strong>What is StatisticsPedia?</strong></p>
<p>StatisticsPedia is a free online service which has been written collaboratively by members of Talk Stats Forum.</p>
<p>While the Talk Stats forum is a great place to interact with other users about statistics-related questions, StatisticsPedia is intended as a repository for statistics-related information provided by users who are willing and able to share this information based on their experience.</p>
<p>The ‘pedia allows users to post and edit articles related to specific statistics-related topics and will provide a useful source of statistical definitions, tips, tricks, tutorials, and other free stuff for anyone interested in statistics.</p>
<p>So welcome to StatisticsPedia, and have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/faq/what-is-statisticspedia/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What kind of articles can I post?</title>
		<link>http://www.statisticspedia.com/faq/what-kind-of-articles-can-i-post-2/</link>
		<comments>http://www.statisticspedia.com/faq/what-kind-of-articles-can-i-post-2/#comments</comments>
		<pubDate>Tue, 03 May 2011 09:40:15 +0000</pubDate>
		<dc:creator>orbit</dc:creator>
				<category><![CDATA[FAQ]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=132</guid>
		<description><![CDATA[StatisticsPedia is a website devoted to statistics and sharing statistics related information. We are currently accepting articles in all areas of statistics. If you have experience in a particular field of statistics, have knowledge of a particular piece of software or have just discovered some new tips or tricks that you would like to share, [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>StatisticsPedia is a website devoted to statistics and sharing statistics related information.<br />
We are currently accepting articles in all areas of statistics. If you have experience in a particular field of statistics, have knowledge of a particular piece of software or have just discovered some new tips or tricks that you would like to share, we would love to hear from you!</p>
<p>StatisticsPedia will also accept contributions to our on-line dictionary/glossary, and written tutorials.</p>
<p><strong><em>Is there a word limit?</em></strong></p>
<p>Strictly speaking, no; however we would encourage articles to be kept to a few hundred words.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/faq/what-kind-of-articles-can-i-post-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How do I post an article?</title>
		<link>http://www.statisticspedia.com/faq/how-do-i-post/</link>
		<comments>http://www.statisticspedia.com/faq/how-do-i-post/#comments</comments>
		<pubDate>Tue, 03 May 2011 19:42:04 +0000</pubDate>
		<dc:creator>pedia</dc:creator>
				<category><![CDATA[FAQ]]></category>

		<guid isPermaLink="false">http://www.statisticspedia.com/?p=140</guid>
		<description><![CDATA[First you need to register and log in to the control dashboard. The “Add Post” link is located in the top left corner of the dashboard. When adding a post, enter the article title and content, and make sure to select the appropriate category. We suggest that you write and edit the article using a program [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>First you need to register and log in to the control dashboard. The “Add Post” link is located in the top left corner of the dashboard. When adding a post, enter the article title and content, and make sure to select the appropriate category. We suggest that you write and edit the article using a program on your computer, then copy and paste it into the Add Post page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.statisticspedia.com/faq/how-do-i-post/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

