
Statistical estimation of distribution parameters: point estimates, their properties, and examples

Distributions in mathematical statistics are characterized by a number of parameters. Estimating unknown distribution parameters from sample data makes it possible to reconstruct the distribution of a random variable.

To find a statistical estimate of an unknown distribution parameter means to find a function of the observed random variables that gives an approximate value of the estimated parameter.

Statistical estimates can be classified as unbiased, biased, efficient, and consistent.

Definition 1

An unbiased estimate is a statistical estimate $Q^*$ whose mathematical expectation, for any sample size, equals the estimated parameter, that is, $M(Q^*)=Q$.

Definition 2

A biased estimate is a statistical estimate $Q^*$ whose mathematical expectation, for any sample size, is not equal to the estimated parameter, that is, $M(Q^*)\ne Q$.

Definition 3

A consistent estimate is a statistical estimate that, as the sample size tends to infinity, converges in probability to the estimated parameter $Q$.

Definition 4

An unbiased estimate is consistent if, as the sample size tends to infinity, its variance tends to zero.

General and sample averages

Definition 6

General mean -- the arithmetic mean of the characteristic values of the general population.

Definition 7

Sample mean -- the arithmetic mean of the characteristic values of the sample population.

The general and sample means can be found using the following formulas:

  1. If the values $x_1,\ x_2,\dots ,x_k$ have, respectively, frequencies $n_1,\ n_2,\dots ,n_k$, then \[\overline{x}=\frac{\sum\limits^k_{i=1}{n_ix_i}}{n}.\]
  2. If the values $x_1,\ x_2,\dots ,x_k$ are all different, then \[\overline{x}=\frac{\sum\limits^k_{i=1}{x_i}}{n}.\]

Associated with this concept is the deviation from the mean. For each value it is given by the difference $x_i-\overline{x}$.

The average deviation has the following properties:

    $\sum\limits_i n_i\left(x_i-\overline{x}\right)=0$

    The average deviation is zero.
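This zero-sum property is easy to verify numerically; the value/frequency table below is purely hypothetical illustration data:

```python
# Numerical check of the property sum(n_i * (x_i - mean)) == 0.
# The values and frequencies are hypothetical illustration data.
values = [2, 4, 6, 8]   # variants x_i
freqs = [3, 5, 1, 1]    # frequencies n_i

n = sum(freqs)
mean = sum(x * m for x, m in zip(values, freqs)) / n
total_deviation = sum(m * (x - mean) for x, m in zip(values, freqs))
print(total_deviation)  # zero up to floating-point rounding
```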

General, sample and corrected variances

Other basic parameters are the general and sample variances:

General variance:

\[D_\text{g}=\frac{\sum\limits^k_{i=1}{N_i{\left(x_i-\overline{x}_\text{g}\right)}^2}}{N}.\]

Sample variance:

\[D_\text{s}=\frac{\sum\limits^k_{i=1}{n_i{\left(x_i-\overline{x}_\text{s}\right)}^2}}{n}.\]

The general and sample standard deviations are associated with these concepts:

\[\sigma_\text{g}=\sqrt{D_\text{g}},\qquad \sigma_\text{s}=\sqrt{D_\text{s}}.\]

To estimate the general variance from a sample, the corrected variance is introduced:

\[S^2=\frac{n}{n-1}D_\text{s}.\]

The corrected standard deviation is introduced likewise:

\[S=\sqrt{S^2}.\]

Example of problem solution

Example 1

The population is defined by the following distribution table:

Figure 1.

Let us find for it the general mean, the general variance, the general standard deviation, the corrected variance and the corrected standard deviation.

To solve this problem, we first make a calculation table:

Figure 2.

The sample mean $\overline{x}_\text{s}$ is found by the formula:

\[\overline{x}_\text{s}=\frac{\sum\limits^k_{i=1}{x_in_i}}{n}\]

\[\overline{x}_\text{s}=\frac{\sum\limits^k_{i=1}{x_in_i}}{n}=\frac{87}{30}=2.9\]

Let's find the variance using the formula:

\[D_\text{s}=\frac{\sum\limits^k_{i=1}{n_i{\left(x_i-\overline{x}_\text{s}\right)}^2}}{n}\approx 2.023\]

General standard deviation:

\[\sigma_\text{s}=\sqrt{D_\text{s}}\approx 1.42\]

Corrected variance:

\[S^2=\frac{n}{n-1}D_\text{s}=\frac{30}{29}\cdot 2.023\approx 2.09\]

Corrected standard deviation:

\[S=\sqrt{S^2}\approx 1.45\]
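The example's computation pattern can be sketched in code. Since the article's data table (Figure 1) is not reproduced in the text, the value/frequency pairs below are hypothetical:

```python
import math

# Hypothetical value/frequency table (the article's own table is not
# reproduced in the text, so these numbers are illustrative only).
values = [1, 2, 3, 4, 5]   # variants x_i
freqs = [4, 8, 10, 6, 2]   # frequencies n_i
n = sum(freqs)             # sample size

# Sample mean: sum(x_i * n_i) / n
mean = sum(x * m for x, m in zip(values, freqs)) / n
# Sample variance D_s: sum(n_i * (x_i - mean)^2) / n
var = sum(m * (x - mean) ** 2 for x, m in zip(values, freqs)) / n
# Corrected (unbiased) variance S^2 = n/(n-1) * D_s, and both deviations
corrected_var = n / (n - 1) * var
std = math.sqrt(var)
corrected_std = math.sqrt(corrected_var)
print(mean, var, corrected_var)
```

Expanding the frequency table into raw observations and comparing with the standard library's `statistics.pvariance` and `statistics.variance` reproduces `var` and `corrected_var` respectively.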

LECTURE 16

Statistical estimates of population parameters. Statistical hypotheses

Let it be necessary to study a quantitative characteristic of a general population. Let us assume that, from theoretical considerations, we have been able to establish exactly what distribution the feature has. This raises the problem of estimating the parameters that determine this distribution. For example, if it is known that the characteristic being studied is distributed in the general population according to a normal law, then it is necessary to estimate (approximately find) the mathematical expectation and standard deviation, since these two parameters completely determine the normal distribution. If there are grounds to believe that the characteristic has a Poisson distribution, then it is necessary to estimate the parameter by which this distribution is determined.

Typically the researcher has only sample data at his disposal, for example values $x_1, x_2, \dots, x_n$ of a quantitative characteristic obtained as a result of $n$ observations (hereinafter the observations are assumed independent). The estimated parameter is expressed through these data.

Considering $x_1, x_2, \dots, x_n$ as values of independent random variables $X_1, X_2, \dots, X_n$, we can say that finding a statistical estimate of an unknown parameter of a theoretical distribution means finding a function of the observed random variables that gives an approximate value of the estimated parameter. For example, as will be shown below, to estimate the mathematical expectation of a normal distribution one uses the arithmetic mean of the observed values of the characteristic:

\[\overline{x}=\frac{x_1+x_2+\dots+x_n}{n}.\]

So, a statistical estimate of an unknown parameter of a theoretical distribution is a function of the observed random variables. A statistical estimate of an unknown population parameter expressed by a single number is called a point estimate. We consider the following point estimates: biased and unbiased, efficient and consistent.

In order for statistical estimates to provide “good” approximations of the estimated parameters, they must satisfy certain requirements. Let us indicate these requirements.

Let $Q^*$ be a statistical estimate of an unknown parameter $Q$ of the theoretical distribution. Suppose that from a sample of size $n$ an estimate $Q^*_1$ is found. Let us repeat the experiment: extract another sample of the same size from the general population and use its data to find an estimate $Q^*_2$, and so on. Repeating the experiment many times, we get numbers $Q^*_1, Q^*_2, \dots, Q^*_k$ which, generally speaking, differ from each other. Thus the estimate $Q^*$ can be considered a random variable, and the numbers $Q^*_1, Q^*_2, \dots, Q^*_k$ as its possible values.

It is clear that if the estimate gives an approximate value with an excess, then each number $Q^*_i$ found from the sample data will be greater than the true value $Q$. Consequently, in this case the mathematical expectation (average value) of the random variable $Q^*$ will be greater than $Q$, that is, $M(Q^*)>Q$. Obviously, if the estimate gives an approximate value with a deficiency, then $M(Q^*)<Q$.


Therefore, using a statistical estimate whose mathematical expectation is not equal to the estimated parameter leads to systematic errors (errors of the same sign). For this reason it is natural to require that the mathematical expectation of the estimate equal the estimated parameter. Although meeting this requirement does not in general eliminate errors (some values are greater than $Q$ and others less), errors of different signs will occur equally often. Meeting the requirement does, however, rule out systematic errors.

An unbiased estimate is a statistical estimate whose mathematical expectation equals the estimated parameter for any sample size, that is, $M(Q^*)=Q$.

A biased estimate is a statistical estimate whose mathematical expectation is not equal to the estimated parameter for any sample size, that is, $M(Q^*)\ne Q$.

However, it would be a mistake to assume that an unbiased estimate always provides a good approximation of the estimated parameter. Indeed, the possible values of $Q^*$ may be widely scattered around their mean value; that is, the variance $D(Q^*)$ may be significant. In this case the estimate found from the data of one sample may turn out to be very far from the mean value $M(Q^*)$, and hence from the estimated parameter itself. Taking $Q^*$ as the approximate value of $Q$, we would then make a large error. If we require that the variance of $Q^*$ be small, the possibility of a large error is excluded. For this reason the requirement of efficiency is imposed on statistical estimates.

An efficient estimate is a statistical estimate that, for a given sample size $n$, has the smallest possible variance.

A consistent estimate is a statistical estimate that converges in probability to the estimated parameter, that is, the following equality holds:

\[\lim_{n\to\infty} P\{\,|Q^*-Q|<\varepsilon\,\}=1.\]

For example, if the variance of an unbiased estimate tends to zero as $n\to\infty$, then such an estimate is also consistent.
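Both requirements can be illustrated with a small simulation for the sample mean; the normal population and its parameters here are assumptions chosen for the demo:

```python
import random
import statistics

random.seed(42)
MU, SIGMA = 5.0, 2.0   # assumed population parameters (normal law)
TRIALS = 2000          # number of repeated samples per sample size

def mean_estimates(size):
    """Sample means of TRIALS independent samples of the given size."""
    return [statistics.mean(random.gauss(MU, SIGMA) for _ in range(size))
            for _ in range(TRIALS)]

small = mean_estimates(10)
large = mean_estimates(200)

# Unbiasedness: the estimates average out near MU for either sample size.
print(statistics.mean(small), statistics.mean(large))
# Consistency: the scatter of the estimates shrinks as the size grows.
print(statistics.stdev(small), statistics.stdev(large))
```

With size 200 the spread is roughly $\sqrt{20}$ times smaller than with size 10, matching the $\sigma/\sqrt{n}$ law for the standard error of the mean.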

Let's consider the question of which sample characteristics best estimate the general mean and variance in terms of unbiasedness, efficiency and consistency.

Let us study a discrete general population with respect to some quantitative characteristic.

The general mean is the arithmetic mean of the characteristic values of the general population. It is calculated by the formula:

§ $\overline{x}_\text{g}=\frac{\sum\limits^N_{i=1}{x_i}}{N}$ – if all values of the characteristic of the general population of size $N$ are different;

§ $\overline{x}_\text{g}=\frac{\sum\limits^k_{i=1}{N_ix_i}}{N}$ – if the values of the characteristic of the general population have frequencies $N_1, N_2, \dots, N_k$ respectively, with $\sum_i N_i=N$. That is, the general mean is a weighted average of the characteristic values with weights equal to the corresponding frequencies.

Comment: let the general population of size $N$ contain objects with different values $x_1, x_2, \dots, x_N$ of the characteristic. Imagine that one object is selected at random from this set. The probability that an object with characteristic value $x_1$, say, will be drawn is obviously $\frac{1}{N}$, and any other object can be drawn with the same probability. Thus the value of the characteristic can be considered a random variable whose possible values all have the same probability $\frac{1}{N}$. In this case it is not difficult to find the mathematical expectation: $M(X)=\sum_{i=1}^{N} x_i\cdot\frac{1}{N}$.

So, if we consider the examined characteristic of the general population as a random variable $X$, then the mathematical expectation of the characteristic equals the general mean of this characteristic: $M(X)=\overline{x}_\text{g}$. We obtained this conclusion assuming that all objects of the general population have different values of the characteristic. The same result holds if the general population contains several objects with equal values of the characteristic.

Generalizing this result to a general population with a continuous distribution of the characteristic, we define the general mean as the mathematical expectation of the characteristic: $\overline{x}_\text{g}=M(X)$.
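The comment above can be written compactly. Here $N_i$ denotes the number of population objects sharing the value $x_i$ (so $\sum_i N_i=N$), a grouping the text mentions but does not write out:

```latex
M(X)=\sum_{i=1}^{k} x_i\,P(X=x_i)
    =\sum_{i=1}^{k} x_i\,\frac{N_i}{N}
    =\frac{1}{N}\sum_{i=1}^{k} N_i x_i
    =\overline{x}_\text{g}.
```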

Let a sample of size $n$ be extracted to study the general population with respect to a quantitative characteristic.

The sample mean is the arithmetic mean of the characteristic values of the sample population. It is calculated by the formula:

§ $\overline{x}_\text{s}=\frac{\sum\limits^n_{i=1}{x_i}}{n}$ – if all values of the characteristic of the sample of size $n$ are different;

§ $\overline{x}_\text{s}=\frac{\sum\limits^k_{i=1}{n_ix_i}}{n}$ – if the values of the characteristic of the sample population have frequencies $n_1, n_2, \dots, n_k$ respectively, with $\sum_i n_i=n$. That is, the sample mean is a weighted average of the characteristic values with weights equal to the corresponding frequencies.

Comment: The sample mean found from the data of one sample is obviously a certain number. If you take other samples of the same size from the same population, then the sample mean will change from sample to sample. Thus, the sample mean can be considered as a random variable, and therefore, we can talk about the distributions (theoretical and empirical) of the sample mean and the numerical characteristics of this distribution, in particular, the mathematical expectation and variance of the sample distribution.

Further, if the general mean is unknown and must be estimated from sample data, then the sample mean, which is an unbiased and consistent estimate, is taken as the estimate of the general mean (we suggest proving this statement yourself). It follows from the above that if sample means are found for several samples of sufficiently large size from the same general population, they will be approximately equal to each other. This is the property of stability of sample means.
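For the unbiasedness half of that suggested exercise, a standard sketch (treating the sample values as independent random variables $X_1,\dots,X_n$, each with expectation equal to the general mean $a$) runs:

```latex
M(\overline{x})
  = M\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n}\sum_{i=1}^{n} M(X_i)
  = \frac{1}{n}\cdot n\,a
  = a.
```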

Note that if the variances of two populations are the same, then the proximity of the sample means to the general means does not depend on the ratio of the sample size to the size of the general population. It depends on the sample size: the larger the sample size, the less the sample average differs from the general average. For example, if 1% of objects are selected from one population, and 4% of objects are selected from another population, and the volume of the first sample turns out to be larger than the second, then the first sample mean will differ less from the corresponding general mean than the second.

Questions of statistical estimation tie together such aspects of mathematical statistics as scientific methodology, random variables, and statistical distributions. Any sample has inherent errors due to incomplete coverage of units, measurement errors, and similar causes. In real life such errors give every hypothesis (in particular those formulated on the basis of economic conclusions) a random, stochastic character. Regardless of the number of variables stipulated by theoretical hypotheses, it is assumed that the influence of errors of various kinds can be described accurately enough by a single component. This methodological approach makes it possible to restrict attention to a one-dimensional probability distribution while estimating several parameters simultaneously.

Statistical estimation is one of the two kinds of statistical judgment (the second is hypothesis testing). It is a method for judging the numerical values of the characteristics (parameters) of the distribution of a population from the data of a sample drawn from that population. That is, having the results of a sample observation, we try to estimate (with the greatest accuracy) the values of certain parameters on which the distribution of the characteristic of interest in the general population depends. Since the sample includes only a portion of the population (sometimes a very small one), there is a risk of error. Although this risk decreases as the number of observation units grows, it is still present in random observation. Hence, a decision made on the basis of sample results is probabilistic in nature. But it would be wrong to consider statistical judgments only in terms of probabilities; this approach is not always sufficient for constructing correct theoretical assumptions about the parameters of the population, and a number of additional judgments are often needed for deeper justification. For example, suppose it is necessary to estimate, as closely as possible, the average number of skilled workers at enterprises in a region. In this case the arithmetic mean of the variable $x$ in the population, which has a normal distribution, is estimated. Having obtained a sample of $n$ units for this characteristic, one must decide which value computed from the sample data should be taken as closest to the average of the general population. There are several such quantities whose mathematical expectation equals the desired parameter (or is close to it): a) the arithmetic mean; b) the mode; c) the median; d) the midrange (the average calculated from the range of variation); and so on.

From a probabilistic point of view, each of the above quantities can be considered to provide the best approximation to the desired population parameter (x), since the mathematical expectation of each of these functions (especially for large samples) is equal to the general average. This assumption is due to the fact that when repeating a sample from the same population many times, an “on average” correct result will be obtained.

The correctness “on average” is explained by the equality of repetitions of positive and negative deviations of the resulting errors in estimating the general average, that is, the average error of estimation will be equal to zero.

In practice, as a rule, a single sample is organized, so the researcher is interested in a more accurate estimate of the desired parameter from the results of that specific sample. To solve such a problem, in addition to the conclusions that follow directly from abstract probability calculations, additional rules are needed to motivate the best approximation of the estimate to the desired population parameter.

There are a sufficient number of ways to estimate constants from sample observations. Which of them are the best in solving specific research problems is the subject of statistical estimation theory. It examines the conditions to which this or that assessment must be subject, and focuses on assessments that are more preferable under given circumstances. Evaluation theory indicates the superiority of one evaluation over another.

As is known, information obtained from a sample is not categorical in its conclusions. If, for example, 99 of 100 animals examined turned out to be healthy, there remains a possibility that the one animal left unexamined carries the virus of the suspected disease. Since this is unlikely, it is concluded that the disease is absent. In most cases such a conclusion is completely justified.

Based on similar findings in practical activities, the experimenter (researcher) does not rely on the reliability of the information, but only on its probability.

The other side of sample observation, as already noted, solves the problem of determining the degree of reliability of the resulting sample estimates as objectively as possible. They try to provide the solution to this problem with the most accurate probabilistic expression possible, that is, we are talking about determining the degree of accuracy of the assessment. Here the researcher determines the limits of the possible discrepancy between the estimate obtained from the sample and the actual value of its value in the population.

The accuracy of the estimate is determined by the way it is calculated from the sample data and the method of selecting units in the sample population.

The method for obtaining estimates involves any computational procedure (method, rule, algebraic formula). This is a priority of the theory of statistical estimation. Selection methods lead to questions of sampling technique.

The above allows us to define the concept of “statistical assessment”.

A statistical estimate is an approximate value of the desired population parameter obtained from sample results; it provides the basis for informed decisions about the unknown parameters of the general population.

Let us assume that $Q^*$ is a statistical estimate of the unknown parameter $Q$ of the theoretical distribution. From repeated samples of the same size drawn from the general population, estimates $Q^*_1, Q^*_2, \dots, Q^*_n$ are found, having different values. Therefore the estimate $Q^*$ can be considered a random variable, and $Q^*_1, Q^*_2, \dots, Q^*_n$ as its possible values. As a random variable it is characterized by a certain probability density function; since this function is determined by the result of sample observation (the experiment), it is called the sampling distribution. Such a function describes the probability density for each of the estimates using a certain number of sample observations. If we assume that the statistical estimate $Q^*$ is an algebraic function of a data set obtained by sample observation, then in general the estimate takes the form

\[Q^*_n=f(x_1,x_2,x_3,\dots,x_n).\]

At the end of the sample survey this function is no longer an estimate of general form but takes a specific value, that is, it becomes a quantitative estimate (a number). In other words, from the expression of the function above it follows that any of the indicators characterizing the results of a sample observation can be considered an estimate. The sample mean is an estimate of the population mean; the variance calculated from the sample, or the standard deviation derived from it, are estimates of the corresponding characteristics of the general population, and so on.

As already noted, the calculation of statistical estimates does not guarantee the elimination of errors. The point is that the latter should not be systematic. Their presence must be random. Let us consider the methodological side of this position.

Suppose the estimate $Q^*$ gives an inaccurate value of the population parameter $Q$ with a deficiency. In this case each calculated value $Q^*_i$ ($i=1,2,3,\dots,n$) will be less than the actual value of $Q$. For this reason the mathematical expectation (average value) of the random variable $Q^*$ will be less than $Q$, that is, $M(Q^*)<Q$. Conversely, if the estimate is given with an excess, then the mathematical expectation of $Q^*$ will be greater than $Q$.

It follows that the use of a statistical estimate whose mathematical expectation is not equal to the estimated parameter leads to systematic errors, that is, to non-random errors that bias the measurement results in one direction.

A natural requirement arises: the mathematical expectation of the estimate $Q^*$ must be equal to the estimated parameter. Compliance with this requirement does not eliminate errors in general, since sample values of the estimate may be greater or less than the actual value of the population parameter. But deviations in one direction or the other from $Q$ will occur (according to probability theory) with equal frequency. Therefore compliance with this requirement, that the mathematical expectation of a sample estimate equal the estimated parameter, excludes the occurrence of systematic (non-random) errors, that is,

\[M(Q^*)=Q.\]

Choosing a statistical estimate that provides the best approximation of the estimated parameter is an important problem of estimation theory. If it is known that the distribution of the random variable under study in the population follows the normal law, then from sample data one must estimate the mathematical expectation and the standard deviation, since these two characteristics completely determine the normal distribution. If the random variable under study is distributed according to Poisson's law, the parameter $\lambda$ is estimated, since it determines this distribution.

Mathematical statistics distinguishes between the following methods for obtaining statistical estimates from sample data: the method of moments, the maximum likelihood method.

When obtaining estimates using the method of moments, moments of the general population are replaced by moments of the sample population (instead of probabilities, frequencies are used for weighting).
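As a sketch of the method of moments: for a Poisson distribution the first theoretical moment (the mean) equals the parameter, so equating it to the first sample moment gives the estimate directly. The true parameter value below is an assumption used only to generate demo data:

```python
import math
import random
import statistics

random.seed(0)
TRUE_LAMBDA = 3.0   # assumed; used only to generate the demo data

def poisson_sample(lam, size):
    """Draw Poisson variates via Knuth's multiplication method."""
    out = []
    threshold = math.exp(-lam)
    for _ in range(size):
        k, p = 0, random.random()
        while p > threshold:
            k += 1
            p *= random.random()
        out.append(k)
    return out

data = poisson_sample(TRUE_LAMBDA, 5000)
# Method of moments: first sample moment = first theoretical moment (lambda).
lambda_hat = statistics.mean(data)
print(lambda_hat)
```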

In order for a statistical estimate to give the “best approximation” to a general characteristic, it must have a number of properties. They will be discussed below.

The ability to choose the best estimate comes from knowing the basic properties of estimates and being able to classify estimates by these properties. In the mathematical literature the "properties of estimates" are sometimes called "requirements for estimates" or "criteria for estimates." The main properties of statistical estimates are unbiasedness, efficiency, consistency, and sufficiency.

If we assume that the sample mean $\overline{x}$ and the sample variance $D_\text{s}$ are estimates of the corresponding general characteristics, we take into account that with a large number of sample units these characteristics will be close to their mathematical expectations. If the number of sample units is small, they may differ significantly from the corresponding mathematical expectations.

If the mean of the sample characteristic chosen as an estimate matches the value of the general characteristic, the estimate is called unbiased. The proof that the mathematical expectation of the sample mean equals the general mean, $M(\overline{x})=\overline{x}_\text{g}$, shows that $\overline{x}$ is an unbiased estimate of the general mean. The situation is different with the sample variance: its mathematical expectation

\[M(D_\text{s})=\frac{n-1}{n}\,D_\text{g}\]

is not equal to the general variance. So $D_\text{s}$ is a biased estimate of $D_\text{g}$. To eliminate the bias and obtain an unbiased estimate, the sample variance is multiplied by the correction factor $\frac{n}{n-1}$ (this follows from the equation above):

\[S^2=\frac{n}{n-1}\,D_\text{s}.\]

Thus, for a small sample the corrected variance equals

\[S^2=\frac{n}{n-1}\cdot\frac{\sum{\left(x_i-\overline{x}\right)}^2}{n}=\frac{\sum{\left(x_i-\overline{x}\right)}^2}{n-1}.\]

The factor $\frac{n}{n-1}$ is called Bessel's correction. The mathematician Bessel was the first to establish that the sample variance is a biased estimate of the general variance, and he applied this correction to adjust the estimate. For small samples the correction differs noticeably from 1; as the number of observation units grows it quickly approaches 1. For $n>50$ the difference between the two estimates practically disappears, that is, $S^2\approx D_\text{s}$. From all of the above, the following definition of the unbiasedness requirement follows.
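The bias that Bessel's correction removes shows up clearly in a simulation: averaged over many small samples, the uncorrected variance settles below the true population variance, while the corrected one does not. The normal population and its variance here are assumptions for the demo:

```python
import random
import statistics

random.seed(1)
TRUE_VAR = 4.0        # assumed population variance (normal population)
N, TRIALS = 5, 20000  # small samples make the bias easy to see

biased, corrected = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, TRUE_VAR ** 0.5) for _ in range(N)]
    m = statistics.mean(sample)
    d = sum((x - m) ** 2 for x in sample) / N   # uncorrected variance
    biased.append(d)
    corrected.append(N / (N - 1) * d)           # Bessel-corrected variance

# The uncorrected average settles near TRUE_VAR * (N - 1) / N,
# the corrected average near TRUE_VAR itself.
print(statistics.mean(biased), statistics.mean(corrected))
```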

An unbiased estimate is a statistical estimate whose mathematical expectation, for any sample size, equals the value of the population parameter, that is, $M(Q^*)=Q$; $M(\overline{x})=\overline{x}_\text{g}$.

The category of "mathematical expectation" is studied in the probability theory course. It is a numerical characteristic of a random variable, approximately equal to its average value. The mathematical expectation of a discrete random variable is the sum of the products of all its possible values and their probabilities. Suppose $n$ trials have been carried out in which the random variable $X$ took the value $x_1$ $m_1$ times, the value $x_2$ $m_2$ times, ..., the value $x_k$ $m_k$ times, with $m_1+m_2+\dots+m_k=n$. Then the sum of all values taken by $X$ equals

\[x_1m_1+x_2m_2+x_3m_3+\dots+x_km_k.\]

The arithmetic mean of these values is

\[\overline{x}=\frac{x_1m_1+x_2m_2+x_3m_3+\dots+x_km_k}{n}=x_1\frac{m_1}{n}+x_2\frac{m_2}{n}+x_3\frac{m_3}{n}+\dots+x_k\frac{m_k}{n}.\]

Since $\frac{m_1}{n}=w_1$ is the relative frequency of the value $x_1$, $\frac{m_2}{n}=w_2$ the relative frequency of the value $x_2$, and so on, the equation takes the form

\[\overline{x}=x_1w_1+x_2w_2+x_3w_3+\dots+x_kw_k.\]

With a large number of sample observations the relative frequency is approximately equal to the probability of the corresponding event, that is, $w_1\approx p_1,\ w_2\approx p_2,\ \dots,\ w_k\approx p_k$, and therefore

\[\overline{x}\approx x_1p_1+x_2p_2+x_3p_3+\dots+x_kp_k=M(X).\]

The probabilistic meaning of this result is that the mathematical expectation is approximately equal (the more accurately, the larger the sample) to the arithmetic mean of the observed values of the random variable: $M(X)\approx\overline{x}$.

The unbiased criterion guarantees the absence of systematic errors in the estimation of population parameters.

Note that the sample estimate $Q^*$ is a random variable whose value can vary from one sample to another. The extent of its variation (dispersion) around the mathematical expectation of the population parameter $Q$ is characterized by the variance $\sigma^2(Q^*)$.

Let $Q^*_1$ and $Q^*_2$ be two unbiased estimates of the parameter $Q$, that is, $M(Q^*_1)=Q$ and $M(Q^*_2)=Q$, with variances $D(Q^*_1)$ and $D(Q^*_2)$. Of these two estimates, preference is given to the one with the smaller dispersion around the estimated parameter. If the variance of $Q^*_1$ is less than the variance of $Q^*_2$, then the first estimate, $Q^*_1$, is taken as the estimate of $Q$.

The unbiased estimate that has the smallest variance among all possible unbiased estimates of the parameter $Q$ calculated from samples of the same size is called the efficient estimate. This is the second property (requirement) of statistical estimates of population parameters. It must be remembered that an estimate efficient for a population subject to one distribution law need not be efficient for a population with a different distribution.

When considering large samples, statistical estimates must have the property of consistency. An estimate is consistent (also said to be "fit") if the larger the sample size, the greater the likelihood that the estimation error will not exceed an arbitrarily small positive number $\varepsilon$. An estimate $Q^*$ of the parameter $Q$ is called consistent if it obeys the law of large numbers, that is, the following equality holds:

\[\lim_{n\to\infty} P\{\,|Q^*-Q|<\varepsilon\,\}=1.\]

As we can see, a statistical estimate is called consistent if, as $n\to\infty$, it converges in probability to the parameter being estimated. In other words, this is a value obtained from the sample that, by the law of large numbers, approaches (coincides in probability with) its mathematical expectation as the sample size grows. For example, if the variance of an unbiased estimate tends to zero as $n\to\infty$, then such an estimate also turns out to be consistent.

Consistent estimates include:

1) the share of the attribute in the sample population, that is, frequency as an estimate of the share of the attribute in the general population;

2) sample average as an estimate of the general average;

3) sample variance as an estimate of the general variance;

4) sample coefficients of asymmetry and kurtosis as an estimate of general coefficients.

For some reason the literature on mathematical statistics does not always describe the fourth property of statistical estimates, sufficiency. A sufficient (or exhaustive) estimate is one that ensures complete coverage of all the sample information about an unknown parameter of the general population. Thus a sufficient estimate includes all the information contained in the sample about the statistical characteristic of the population under study; none of the three properties considered earlier can, on its own, provide the additional information that a sufficient statistical estimate does.

Thus the sample arithmetic mean $\overline{x}$ is an unbiased estimate of the population arithmetic mean $\overline{x}_\text{g}$. The unbiasedness of this estimate shows that if a large number of random samples were taken from the general population, their means would deviate from the general mean upward and downward equally; that is, the unbiasedness property of a good estimate also means that the average of an infinitely large number of sample means equals the general mean.

In a symmetric distribution series, the median is an unbiased estimate of the general mean. And provided that the size of the sample population approaches the size of the general population ($n\to N$), the median can also be a consistent estimate of the general mean in such series. As for the efficiency criterion, for the median as an estimate of the arithmetic mean of the general population it can be shown that in large samples the mean square error of the median, $\sigma_{Me}$, equals 1.2533 times the mean square error of the sample mean: $\sigma_{Me}\approx 1.2533\,\sigma_{\overline{x}}$. Therefore the median cannot be an efficient estimate of the population arithmetic mean, since its mean square error is greater than that of the sample arithmetic mean. Moreover, the arithmetic mean also satisfies the conditions of unbiasedness and consistency, and is therefore the best estimate.

The converse question is also possible: can the sample arithmetic mean be an unbiased estimate of the median in symmetric population distributions, where the mean and the median coincide? And will the sample mean be a consistent estimate of the population median? In both cases the answer is yes: for the population median (with a symmetric distribution), the sample arithmetic mean is an unbiased and consistent estimator.

Recalling that $\sigma_{Me}\approx 1.2533\,\sigma_{\overline{x}}$, we conclude that the sample arithmetic mean, rather than the median, is the more efficient estimate of the median of the population under study.
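The factor 1.2533 is $\sqrt{\pi/2}$, and it can be checked by simulation for a normal population (the setting in which the asymptotic result holds); the sample size and trial count below are arbitrary demo choices:

```python
import random
import statistics

random.seed(7)
N, TRIALS = 101, 4000   # odd size gives a unique middle element

means, medians = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Ratio of the spreads of the two estimators across repeated samples.
ratio = statistics.stdev(medians) / statistics.stdev(means)
print(ratio)   # close to sqrt(pi/2) ~ 1.2533 for a normal population
```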

Each sample characteristic is not necessarily the best estimate of the corresponding population characteristic. Knowledge of the properties of estimates makes it possible not only to choose estimates but also to improve them. As an example, consider the case when calculations show that the standard deviations of several samples from the same population are in all cases smaller than the population standard deviation, the size of the difference being determined by the sample size. Multiplying the sample standard deviation by a correction factor yields an improved estimate of the population standard deviation. For this correction factor, Bessel's correction is used:

√( n / (n − 1) ),

that is, to eliminate the bias, the estimate is multiplied by √(n/(n − 1)). This expression shows that the sample standard deviation used as an estimate gives an underestimated value of the population parameter.
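A small simulation (a sketch in Python, with a hypothetical population standard deviation and sample size) can illustrate the correction just described: the uncorrected sample standard deviation systematically underestimates the population value, and multiplying by √(n/(n − 1)) reduces the bias.

```python
import random
import statistics

random.seed(42)
SIGMA = 2.0   # population standard deviation (hypothetical)
n = 10        # small sample size, where the bias is most visible

# Draw many samples and average the two versions of the estimate.
biased = []
corrected = []
for _ in range(20000):
    sample = [random.gauss(0.0, SIGMA) for _ in range(n)]
    s_n = statistics.pstdev(sample)               # divides by n (biased)
    biased.append(s_n)
    corrected.append(s_n * (n / (n - 1)) ** 0.5)  # Bessel-type correction

print(round(statistics.mean(biased), 2))     # noticeably below SIGMA
print(round(statistics.mean(corrected), 2))  # closer to SIGMA
```

Note that even the corrected standard deviation remains slightly biased for small n; the correction removes the bias exactly only for the variance.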

As is known, the statistical characteristics of a sample population are approximate estimates of the unknown parameters of the general population. An estimate itself can be a single number, i.e., a specific point. An estimate determined by a single number is called a point estimate. Thus, the sample mean (x̄) is an unbiased and efficient point estimate of the population mean (x), while the sample variance is a biased point estimate of the population variance (σ²). If the mean error of the sample mean is denoted by m, then the point estimate of the population mean can be written as x̄ ± m. This means that x̄ is an estimate of the population mean x with an error equal to m. It is clear that point statistical estimates of x and σ should not have a systematic error toward overestimation or underestimation of the estimated parameters x and σ. As mentioned earlier, estimates satisfying such a condition are called unbiased. What is a parameter error? It is the average of many specific errors.

Point estimation of a population parameter consists in first selecting, from the various possible sample estimates, the one with optimal properties, and then calculating its value. The resulting calculated value is regarded as the best approximation to the unknown true value of the population parameter. Additional calculations related to determining the possible estimation error are not always mandatory (depending on the specific estimation tasks) but, as a rule, are almost always carried out.

Let's consider examples of determining a point estimate for the average of the characteristics under study and for their share in the population.

Example. The region's grain crops cover 20,000 hectares. With a 10% sample survey of fields, the following sample characteristics were obtained: average yield - 30 centners per hectare, yield dispersion - 4, area sown with high-yield crops - 1200 hectares.

What can be said about the average yield of grain crops in the region, and what is the numerical value of the share (specific weight) of high-yield crops in the total grain area of the region under study? That is, it is necessary to estimate the named parameters (x̄, P) in the general population. For the calculation of the estimates we have:

N = 20000; n = 20000 × 0.1 = 2000; x̄ = 30; σ = √4 = 2; w = 1200 / 2000 = 0.6.

As is known, the sample arithmetic mean is an efficient estimate of the population arithmetic mean. Thus, the best estimate of the population parameter (x̄) can be taken to be 30. To determine the accuracy of this estimate, its mean (standard) error must be found:

m = √( (σ²/n)·(1 − n/N) ) = √( (4/2000)·(1 − 2000/20000) ) ≈ 0.04

The resulting error value indicates high accuracy of the estimate. The value of m here means that if such samples were repeated many times, the error of the parameter estimate would average 0.04. That is, by the point estimate, the average yield on farms in the region is x̄ = 30 ± 0.04 centners per hectare.

To obtain a point estimate of the share of high-yield grain crops in the total grain area, the sample share w = 0.6 can be taken as the best estimate. Thus, based on the observation results, the best estimate of the desired structural indicator is the number 0.6. To refine the calculation, the mean error of this estimate should be computed:

m = √( (w(1 − w)/n)·(1 − n/N) ) = √( (0.6·(1 − 0.6)/2000)·(1 − 2000/20000) ) ≈ 0.01

As we can see, the average error in estimating the general characteristics is 0.01.

The obtained result means that if a sample of 2000 hectares of grain were repeated many times, the mean error of the accepted estimate of the share (specific weight) of high-yield crops in the grain area of the region's enterprises would be ±0.01. In this case, P = 0.6 ± 0.01. In percentage terms, the share of high-yield crops in the total grain area of the region averages 60 ± 1%.

Calculations show that for a specific case, the best estimate of the desired structure indicator will be the number 0.6, and the average error of estimation in one direction or another will be approximately equal to 0.01. As we can see, the estimate is quite accurate.
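The two error calculations in this example can be reproduced directly; the figures N = 20000, n = 2000, σ² = 4, and w = 0.6 are taken from the text above, and the finite population correction (1 − n/N) reflects sampling without replacement:

```python
from math import sqrt

N = 20000   # population size, hectares
n = 2000    # sample size (10% sample)
var = 4.0   # sample variance of the yield
w = 0.6     # sample share of high-yield crops (1200 / 2000)

# Mean error of the sample mean, sampling without replacement.
m_mean = sqrt(var / n * (1 - n / N))

# Mean error of the sample share.
m_share = sqrt(w * (1 - w) / n * (1 - n / N))

print(round(m_mean, 2))   # 0.04
print(round(m_share, 2))  # 0.01
```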

There are several known methods for point estimation of the standard deviation when the sample is drawn from a population with a normal distribution and the parameter σ is unknown. The simplest (easiest to calculate) estimate is the range of variation (R) of the sample multiplied by a correction factor taken from standard tables, which depends on the sample size (for small samples). The population standard deviation can also be estimated from the calculated sample variance, taking into account the number of degrees of freedom; the square root of this variance gives the value used as the estimate of the population standard deviation (σ).

Using the obtained value of σ, the mean error of the estimate of the population mean (x̄) is calculated in the manner discussed above.

As stated earlier, according to the consistency requirement, confidence in the accuracy of a point estimate increases with the sample size. It is somewhat difficult to demonstrate this theoretical proposition using a point estimate as an example; the effect of sample size on the accuracy of the estimate becomes obvious when calculating interval estimates. These are discussed below.

Table 39 shows the most commonly used point estimates of population parameters.

Table 39

Basic point estimates

Estimation values ​​calculated using different methods may not be the same in magnitude. In this regard, in practical calculations one should not engage in sequential calculation of possible options, but, based on the properties of various estimates, choose one of them.

With a small number of observation units, the point estimate is largely random and therefore not very reliable. Therefore, in small samples it can differ greatly from the estimated characteristic of the general population. This situation leads to gross errors in conclusions that extend to the general population based on the sampling results. For this reason, interval estimates are used for small samples.

Unlike a point estimate, an interval estimate gives a range of values within which the population parameter should lie. In addition, an interval estimate is accompanied by a probability and is therefore important in statistical analysis.

An interval estimate is an estimate characterized by two numbers, the boundaries of the interval that covers the parameter being estimated. Such an estimate represents an interval in which the desired parameter lies with a given probability. The center of the interval is the sample point estimate.

Thus, interval estimates are a further development of point estimation, when such an estimate is ineffective with a small sample size.

The problem of interval estimation in general can be formulated as follows: based on sample observation data, it is necessary to construct a numerical interval in relation to which, using a previously selected level of probability, it can be stated that the estimated parameter lies within this interval.

If a sufficiently large number of sample units is taken, then, using Lyapunov's theorem, one can assess the probability that the sampling error will not exceed a certain specified value Δ, that is,

P(|x̄ − x| ≤ Δ) or P(|w − p| ≤ Δ).

In particular, this theorem makes it possible to estimate the errors of the approximate equalities

p ≈ w (w being the relative frequency) and x ≈ x̄.

If x₁, x₂, ..., xₙ are independent random variables, then the probability that their mean (x̄) lies in the interval from a to b can be determined by the relations:

P(a < x̄ < b) = Φ(t₂) − Φ(t₁), where t₁ = (a − E(x̄)) / σ_x̄, t₂ = (b − E(x̄)) / σ_x̄.

The probability P is called the confidence probability.

Thus, the confidence probability (reliability) of estimating a population parameter from a sample estimate is the probability with which the following inequalities hold:

|x̄ − x| < Δ; |w − p| < Δ,

where Δ is the maximum estimation error for the mean and the share, respectively.

The limits within which the population characteristic lies with the given probability are called the confidence interval, and the boundaries of this interval are called the confidence limits.

Confidence (or tolerance) limits are limits beyond which a given characteristic, owing to random fluctuations, falls with only an insignificant probability (P₁ ≤ 0.05; P₂ < 0.01; P₃ < 0.001). The concept of a "confidence interval" was introduced by J. Neyman and K. Pearson (1950). It is an interval, established from sample data, that with a given probability (the confidence probability) covers the true but unknown value of the parameter. If the confidence level is taken to be 0.95, this probability means that with frequent applications of the method the confidence interval will cover the parameter in approximately 95% of cases. The confidence interval of the population mean and the population share is determined on the basis of the above inequalities, from which it follows that

x̄ − Δ ≤ x ≤ x̄ + Δ; w − Δ ≤ p ≤ w + Δ.

In mathematical statistics, the reliability of a particular parameter is assessed using the following three probability levels (sometimes called "probability thresholds"): P₁ = 0.95; P₂ = 0.99; P₃ = 0.999. The probabilities that one decides to neglect, that is, α₁ = 0.05; α₂ = 0.01; α₃ = 0.001, are called significance levels. Of the levels given, reliable conclusions are ensured by the probability P₃ = 0.999. Each level of confidence probability corresponds to a certain value of the normalized deviation (see Table 27). If standard tables of interval probabilities are not available, this probability can be calculated to a certain degree of approximation using the formula

P(t) = (2 / √(2π)) ∫₀ᵗ e^(−u²/2) du.

In Figure 11, the parts of the total area bounded by the normal curve and the x-axis that correspond to t = ±1, t = ±2, and t = ±3 are shaded; the corresponding probabilities are 0.6827, 0.9545, and 0.9973. With point estimation, as already known, the mean sampling error is calculated; with interval estimation, the maximum error.
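The three shaded areas can be checked numerically. For the standard normal curve, P(−t ≤ T ≤ t) = 2Φ(t), which is computable through the error function available in the Python standard library:

```python
from math import erf, sqrt

def normal_central_prob(t: float) -> float:
    """P(-t <= T <= t) for a standard normal variable T."""
    return erf(t / sqrt(2.0))

# Probabilities for the normalized deviations t = 1, 2, 3.
for t in (1, 2, 3):
    print(t, round(normal_central_prob(t), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```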

Depending on the principle of selection of units (with or without replacement), the formulas for calculating sampling errors differ by the finite population correction (1 − n/N).

Fig. 11. Normal probability distribution curve

Table 40 shows formulas for calculating errors in estimating the general parameter.

Let us consider the specific case of interval estimation of the parameters of the general population based on sample observation data.

Example. A sample survey of farms in the region found that the average daily milk yield of cows (x̄) is 10 kg. The share of purebred cattle in the total livestock is 80%. With confidence probability P = 0.954, the sampling error was 0.2 kg; for the share of purebred livestock, 1%.

Thus, the boundaries within which the population average productivity can lie are 9.8 < x̄ < 10.2; for the population share of purebred livestock, 79 < P < 81.

Conclusion: with probability 0.954 it can be stated that the difference between the sample average productivity of cows and the population average does not exceed 0.2 kg. The limits of the average daily milk yield are 9.8 and 10.2 kg. The share (specific weight) of purebred cattle in the enterprises of the region ranges from 79 to 81%, with an estimation error not exceeding 1%.

Table 40

Calculation of point and interval sampling errors

When organizing a sample, it is important to determine the required sample size (n). The latter depends on the variation of the units of the population being surveyed: the greater the diversity, the larger the sample size should be. There is an inverse relationship between the sample size and its maximum error: the desire to obtain a smaller error requires increasing the size of the sample population.

The required sample size is determined from the formulas for the maximum sampling error (Δ) at a given probability level (P). Through mathematical transformations, formulas for calculating the sample size are obtained (Table 41).
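As an illustration of such a transformation (a sketch under assumed figures, not values from Table 41): solving Δ = t·√(σ²/n) for repeated selection gives n = t²σ²/Δ², and for selection without replacement n = t²σ²N / (Δ²N + t²σ²). The variance σ² = 4 and N = 20000 echo the grain example above; t = 2 (P = 0.954) and Δ = 0.1 are hypothetical.

```python
from math import ceil

def sample_size_repeated(t, sigma2, delta):
    """Required n for repeated (with-replacement) selection."""
    return ceil(t * t * sigma2 / (delta * delta))

def sample_size_no_repeat(t, sigma2, delta, N):
    """Required n for selection without replacement."""
    return ceil(t * t * sigma2 * N / (delta * delta * N + t * t * sigma2))

print(sample_size_repeated(2, 4.0, 0.1))          # 1600
print(sample_size_no_repeat(2, 4.0, 0.1, 20000))  # 1482
```

The without-replacement formula always gives a smaller n than the repeated-selection one, which matches the role of the finite population correction.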

Table 41

Calculation of the required sample size _

It should be noted that everything stated in relation to statistical estimates rests on the assumption that the sample population whose parameters are used in the estimation is obtained by a selection method that ensures probability sampling.

At the same time, when choosing a confidence probability of an estimate, one should be guided by the principle that the choice of its level is not a mathematical problem, but is determined specifically by the problem being solved. To confirm this, let's look at an example.

Example. Suppose that at two enterprises the probability of producing finished (high-quality) output is P = 0.999, so that the probability of producing a defective item is α = 0.001. Is it possible, within the framework of purely mathematical considerations, without being interested in the nature of the product, to decide whether a defect probability of α = 0.001 is high? Say one enterprise produces seeders and the other produces airplanes for treating crops. If one seeder out of 1000 is defective, this can be tolerated, because melting down 0.1% of the seeders is cheaper than restructuring the technological process. If one airplane out of 1000 is defective, this will certainly lead to serious consequences in operation. So in the first case the defect probability α = 0.001 can be accepted, and in the second it cannot. For this reason, the choice of confidence probability in calculations in general, and in calculating estimates in particular, should be made based on the specific conditions of the problem.

Depending on the objectives of the study, it may be necessary to calculate one or two confidence limits. If the features of the problem being solved require setting only one of the limits, upper or lower, it can be verified that the probability with which this limit is set will be higher than when both limits are specified for the same value of the confidence coefficient t.

Let the confidence limits be set with probability P = 0.95; that is, in 95% of cases the population mean (x) will be no less than the lower confidence limit x_min = x̄ − tμ and no more than the upper confidence limit x_max = x̄ + tμ. In this case, only with probability α = 0.05 (or 5%) can the population mean go beyond the specified limits. Since the distribution of x̄ is symmetric, half of this probability level, i.e., 2.5%, falls on the case x < x_min and the other half on the case x > x_max. It follows that the probability that the population mean is less than the upper confidence limit x_max equals 0.975 (that is, 0.95 + 0.025). Consequently, with two confidence limits we neglect values of x both smaller than x_min and larger than x_max. Specifying only one confidence limit, for example x_max, we neglect only those values exceeding this limit. For the same value of the confidence coefficient t, the significance level α here turns out to be half as large.

If only characteristic values that exceed (or, conversely, do not exceed) the value of the desired parameter x are considered, the confidence interval is called one-sided. If the values under consideration are bounded on both sides, the confidence interval is called two-sided. It follows from the above that hypotheses and a number of criteria, in particular Student's t-test, should be considered as one-sided or two-sided. Therefore, with a two-sided hypothesis the significance level for the same value of t will be twice as large as with a one-sided one. If we want the significance level (and the confidence level) to remain the same for a one-sided hypothesis as for a two-sided one, then the value of t should be taken smaller. This feature was taken into account when compiling the standard tables of Student's t-test (Appendix 1).

It is known that from a practical point of view, it is often not so much the confidence interval of the possible value of the population mean that is of interest as the maximum and minimum values that the population mean cannot exceed or fall below with a given (confidence) probability. In mathematical statistics these are called the guaranteed maximum and the guaranteed minimum of the mean. Denoting these parameters by x_max and x_min respectively, we can write: x_max = x̄ + tμ; x_min = x̄ − tμ. When calculating the guaranteed maximum and minimum values of the population mean as the boundaries of one-sided confidence intervals in the above formulas, the value of t is taken as a one-sided criterion.

Example. On 20 sample plots, the average sugar beet yield was 300 centners per hectare. This sample mean characterizes the corresponding population parameter (x) with an error of 10 c/ha. Owing to the sampling nature of the estimate, the population average yield can be either greater or less than the sample mean x̄ = 300. With probability P = 0.95 it can be stated that the desired parameter will not exceed x_max = 300 + 1.73 × 10 = 317.3 c/ha.

The value t = 1.73 is taken for the number of degrees of freedom k = 20 − 1 = 19 with a one-sided critical region and significance level α = 0.05 (Appendix 1). So, with probability P = 0.95, the guaranteed maximum possible level of the population average yield is estimated at 317.3 c/ha; that is, even under favorable conditions the average sugar beet yield does not exceed this value.
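The guaranteed-maximum calculation from this example can be written out directly. The one-sided Student value t = 1.73 for 19 degrees of freedom at α = 0.05 is taken from the text (a table would normally supply it):

```python
x_bar = 300.0   # sample mean yield, c/ha
m = 10.0        # mean error of the sample mean, c/ha
t = 1.73        # one-sided Student value, k = 20 - 1 = 19 df, alpha = 0.05

x_max = x_bar + t * m   # guaranteed maximum of the population mean
x_min = x_bar - t * m   # guaranteed minimum

print(round(x_max, 1))  # 317.3
print(round(x_min, 1))  # 282.7
```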

In some branches of knowledge (for example, in the natural sciences), the theory of estimation is inferior to the theory of testing statistical hypotheses. In economic science, statistical evaluation methods play a very important role in checking the reliability of research results, as well as in various kinds of practical calculations. First of all, this concerns the use of a point estimate of the statistical populations under study. Choosing the best possible estimate is the main problem of point estimation. The possibility of such a choice is determined by knowledge of the basic properties (requirements) of statistical estimates.

Let it be necessary to study a quantitative characteristic of a general population. Suppose that theoretical considerations have made it possible to establish exactly what distribution the characteristic has. The problem arises of estimating the parameters that determine this distribution. For example, if it is known in advance that the characteristic being studied is normally distributed in the population, then it is necessary to estimate the mathematical expectation and the standard deviation, since these two parameters completely determine the normal distribution. If there is reason to believe that the characteristic has a Poisson distribution, then the parameter that determines this distribution must be estimated. Typically, only sample data obtained from observations are available: x₁, x₂, ..., xₙ. The estimated parameter is expressed through these data. Considering x₁, x₂, ..., xₙ as values of independent random variables X₁, X₂, ..., Xₙ, we can say that to find a statistical estimate of an unknown parameter of a theoretical distribution means to find a function of the observed random variables that gives an approximate value of the estimated parameter.

So, a statistical estimate of an unknown parameter of a theoretical distribution is a function of the observed random variables. A statistical estimate of an unknown population parameter given by a single number is called a point estimate. The following point estimates are distinguished: biased and unbiased, efficient and consistent.

In order for statistical estimates to give good approximations of the estimated parameters, they must satisfy certain requirements. Let us state these requirements. Let Q* be a statistical estimate of an unknown parameter Q of the theoretical distribution, and suppose an estimate Q*₁ has been found from a sample of size n. Let us repeat the experiment, i.e., extract another sample of the same size from the same general population and use its data to find an estimate Q*₂, and so on. We obtain numbers Q*₁, Q*₂, ..., Q*ₖ that differ from each other. Thus, the estimate Q* can be considered a random variable, and the numbers Q*₁, Q*₂, ..., Q*ₖ its possible values.

If the estimate gives an approximate value with an excess, then each number found from the sample data will tend to be greater than the true value Q. Consequently, the mathematical expectation (average value) of the random variable Q* will be greater than Q, i.e., M(Q*) > Q. If the estimate gives an approximate value with a deficiency, then M(Q*) < Q.

Thus, using a statistical estimate whose mathematical expectation is not equal to the parameter being estimated would lead to systematic errors. Therefore, it is necessary to require that the mathematical expectation of the estimate be equal to the estimated parameter. Compliance with the requirement eliminates systematic errors.

Unbiased is a statistical estimate whose mathematical expectation is equal to the estimated parameter, i.e., M(Q*) = Q.

Biased is a statistical estimate whose mathematical expectation is not equal to the estimated parameter, i.e., M(Q*) ≠ Q.

However, it is a mistake to assume that an unbiased estimate always gives a good approximation of the estimated parameter. Indeed, the possible values of Q* may be widely scattered around their mean value; that is, the variance of Q* may be significant. In this case the estimate found from the data of one sample may turn out to be far from its average value, and hence from the estimated parameter itself; if we took this value as an approximation to Q, we would make a large error. If we require that the variance of Q* be small, the possibility of a large error is eliminated. For this reason, statistical estimates are subject to the requirement of efficiency.

Efficient is a statistical estimate that (for a given sample size n) has the smallest possible variance. When considering large samples, statistical estimates are also required to be consistent.

Consistent is a statistical estimate that tends in probability to the estimated parameter as n → ∞. For example, if the variance of an unbiased estimate tends to zero as n → ∞, then such an estimate is also consistent.

Let us consider the question of which sample characteristics best estimate the general mean and variance in terms of unbiasedness, efficiency and consistency.

Let us study a discrete general population with respect to a quantitative characteristic. The population mean is the arithmetic mean of the values of the characteristic over the general population. It can be calculated as x̄_g = (Σxᵢ)/N if all N values are distinct, or as x̄_g = (Σxᵢnᵢ)/N if the values x₁, x₂, ..., xₖ have frequencies n₁, n₂, ..., nₖ with Σnᵢ = N.

Let a sample of size n with characteristic values x₁, x₂, ..., xₙ be extracted from the general population as a result of independent observations of the quantitative characteristic X. The sample mean is the arithmetic mean of the characteristic over the sample. It can be calculated as x̄_s = (Σxᵢ)/n if all values are distinct, or as x̄_s = (Σxᵢnᵢ)/n if the values have frequencies nᵢ with Σnᵢ = n.

If the population mean is unknown and must be estimated from sample data, then the sample mean, which is an unbiased and consistent estimate, is taken as the estimate of the population mean. It follows that if sample means are found from several sufficiently large samples from the same general population, they will be approximately equal to each other. This is the property of stability of sample means.

Note that if the variances of two populations are the same, then the proximity of the sample means to the general means does not depend on the ratio of the sample size to the size of the general population. It depends on the sample size: the larger the sample size, the less the sample mean differs from the general mean.

To characterize the dispersion of the values of a quantitative characteristic of the population around its mean value, a summary characteristic is introduced: the population variance. The population variance is the arithmetic mean of the squared deviations of the values of the characteristic from their mean value, calculated as D_g = Σ(xᵢ − x̄_g)²/N, or, with frequencies, D_g = Σ(xᵢ − x̄_g)²nᵢ/N.

To characterize the dispersion of the observed values of a quantitative characteristic of a sample around its mean value, a summary characteristic is introduced: the sample variance. The sample variance is the arithmetic mean of the squared deviations of the observed values of the characteristic from their mean value, calculated as D_s = Σ(xᵢ − x̄_s)²/n, or, with frequencies, D_s = Σ(xᵢ − x̄_s)²nᵢ/n.

In addition to the variance, the standard deviation is used to characterize the dispersion of the values of the characteristic of the general (sample) population around its mean value. The population standard deviation is the square root of the population variance: σ_g = √D_g. The sample standard deviation is the square root of the sample variance: σ_s = √D_s.

Let a sample of size n be extracted from the general population as a result of independent observations of a quantitative characteristic. It is required to estimate the unknown population variance from the sample data. If the sample variance is taken as an estimate of the population variance, this estimate leads to systematic errors, giving an underestimated value of the population variance. This is because the sample variance is a biased estimate; in other words, the mathematical expectation of the sample variance is not equal to the estimated population variance but equals ((n − 1)/n)·D_g.

It is easy to correct the sample variance so that its mathematical expectation equals the population variance. To do this, it suffices to multiply the sample variance by the fraction n/(n − 1). The result is the corrected variance, usually denoted s². The corrected variance is an unbiased estimate of the population variance: s² = (n/(n − 1))·D_s.
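The relationship between the two variances can be sketched with the standard library, where `statistics.pvariance` divides by n and `statistics.variance` by n − 1 (the data below are hypothetical):

```python
import statistics

data = [12, 15, 11, 14, 13, 16, 12, 15]  # hypothetical sample
n = len(data)

d_sample = statistics.pvariance(data)   # biased: divides by n
s2 = statistics.variance(data)          # corrected: divides by n - 1

# The corrected variance equals the biased one times n / (n - 1).
print(abs(s2 - d_sample * n / (n - 1)) < 1e-12)  # True
print(d_sample < s2)                              # True
```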

2. Interval estimates.

Along with point estimation, the statistical theory of parameter estimation deals with interval estimation. The problem of interval estimation can be formulated as follows: from sample data, construct a numerical interval about which it can be said, with a pre-selected probability, that the estimated parameter lies within it. Interval estimation is especially necessary with a small number of observations, when the point estimate is largely random and therefore not very reliable.

A confidence interval for a parameter Q is an interval (Q* − δ, Q* + δ) about which it can be asserted, with a pre-selected probability close to one, that it contains the unknown value of the parameter. The smaller the number δ for the chosen probability, the more accurate the estimate of the unknown parameter; conversely, if δ is large, the estimate made with this interval is of little practical use. Since the ends of the confidence interval depend on the elements of the sample, they can vary from sample to sample. The probability γ is usually called the confidence probability (reliability). Typically the reliability of the estimate is specified in advance, with a number close to one taken as γ. The choice of confidence probability is not a mathematical problem but is determined by the specific problem being solved. The reliability most often set equals 0.95, 0.99, or 0.999.

Let us present, without derivation, the confidence interval for the population mean a when the standard deviation σ is known, provided that the random variable (quantitative characteristic) X is normally distributed:

x̄ − t·σ/√n < a < x̄ + t·σ/√n,

where γ = 2Φ(t) is a predetermined number close to one, and the values of the Laplace function Φ(t) are given in Appendix 2. The meaning of this relation is as follows: with reliability γ it can be stated that the confidence interval (x̄ − t·σ/√n, x̄ + t·σ/√n) covers the unknown parameter a, and the accuracy of the estimate equals δ = t·σ/√n. The number t is determined from the equality 2Φ(t) = γ, or Φ(t) = γ/2. Using the table (Appendix 2), one finds the argument t to which the value γ/2 of the Laplace function corresponds.

Example 1. A random variable X has a normal distribution with a known standard deviation σ. Find the confidence interval for estimating the unknown population mean a from the sample mean x̄, given the sample size n and the reliability of the estimate γ.

Solution. Let us find t. From the relation 2Φ(t) = γ we obtain Φ(t) = γ/2, and using the table (Appendix 2) we find t. The accuracy of the estimate is δ = t·σ/√n, and the confidence interval is x̄ − δ < a < x̄ + δ. Substituting the sample mean then gives the numerical confidence limits; the values of the unknown parameter a consistent with the sample data satisfy this inequality.
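A sketch of Example 1 with hypothetical figures substituted for the values omitted above (σ = 3, n = 36, x̄ = 4.1, γ = 0.95 are assumptions, not data from the text). The number t is found from 2Φ(t) = γ, i.e., as the (1 + γ)/2 quantile of the standard normal distribution:

```python
from math import sqrt
from statistics import NormalDist

sigma = 3.0    # known population standard deviation (assumed)
n = 36         # sample size (assumed)
x_bar = 4.1    # sample mean (assumed)
gamma = 0.95   # required reliability

# t from 2*Phi(t) = gamma, i.e. the (1 + gamma)/2 quantile.
t = NormalDist().inv_cdf((1 + gamma) / 2)

delta = t * sigma / sqrt(n)     # accuracy of the estimate
low, high = x_bar - delta, x_bar + delta

print(round(t, 2))                     # 1.96
print(round(low, 2), round(high, 2))   # the confidence limits
```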

The confidence interval for the population mean of a normally distributed characteristic with an unknown standard deviation is given by the expression x̄ − t_γ·s/√n < a < x̄ + t_γ·s/√n, where s is the corrected sample standard deviation.

It follows that it can be stated with reliability γ that this confidence interval covers the unknown parameter a.

There are ready-made tables (Appendix 4) from which, given γ and n, one can find t_γ, and conversely, given t_γ and n, one can find γ.

Example 2. The quantitative characteristic X of the population is normally distributed. From a sample of size n, the sample mean x̄ and the corrected standard deviation s were found. Estimate the unknown population mean with a confidence interval at reliability γ.

Solution. Let us find t_γ. Using the table (Appendix 4) for the given γ and n, we find t_γ. The confidence limits are then x̄ − t_γ·s/√n and x̄ + t_γ·s/√n.

So, with reliability γ, the unknown parameter a is contained in the confidence interval.

3. The concept of statistical hypothesis. General formulation of the hypothesis testing problem.

Testing statistical hypotheses is closely related to the theory of parameter estimation. In natural science, technology, and economics, in order to clarify one or another random fact, one often resorts to hypotheses that can be tested statistically, that is, on the basis of observations in a random sample. Statistical hypotheses are hypotheses that relate either to the type or to individual parameters of the distribution of a random variable. For example, the hypothesis that the distribution of labor productivity of workers performing the same work under the same conditions follows the normal law is statistical. The hypothesis that the average sizes of parts produced on similar parallel-operating machines do not differ from one another is also statistical.

A statistical hypothesis is called simple if it uniquely determines the distribution of the random variable; otherwise the hypothesis is called complex. For example, a simple hypothesis is the assumption that a random variable is normally distributed with mathematical expectation zero and variance one. If it is assumed that the random variable has a normal distribution with variance one while the mathematical expectation is some number from a given interval, this is a complex hypothesis. Another example of a complex hypothesis is the assumption that a continuous random variable takes a value from a given interval with a given probability; in this case the distribution of the random variable can be any member of the class of continuous distributions.

Often the distribution of a quantity is known, and it is necessary to test assumptions about the value of the parameters of this distribution using a sample of observations. Such hypotheses are called parametric.

The hypothesis being tested is called the null hypothesis and is denoted H0. Along with the hypothesis H0, one of the alternative (competing) hypotheses H1 is considered. For example, if the hypothesis that a parameter θ is equal to some given value θ0 is tested, i.e. H0: θ = θ0, then one of the following can be considered as an alternative hypothesis: H1: θ > θ0; H1: θ < θ0; H1: θ ≠ θ0; H1: θ = θ1, where θ1 is a specified value, θ1 ≠ θ0. The choice of the alternative hypothesis is determined by the specific formulation of the problem.

The rule by which the decision is made to accept or reject a hypothesis is called a criterion (test). Since the decision is made on the basis of a sample of observations of a random variable, an appropriate statistic, in this case called the test statistic, must be chosen. When testing a simple parametric hypothesis H0: θ = θ0, the same statistic is chosen as the test statistic as for estimating the parameter θ.

Statistical hypothesis testing is based on the principle that low-probability events are considered practically impossible, and events with probability close to one are considered practically certain. This principle can be implemented as follows. Before the sample is analyzed, a certain small probability α, called the significance level, is fixed. Let S be the set of values of the statistic Z, and let V ⊂ S be a subset such that, provided the hypothesis H0 is true, the probability of the test statistic falling into V equals α, i.e. P(Z ∈ V | H0) = α.

Denote by z_v the sample value of the statistic Z calculated from the sample of observations. The criterion is formulated as follows: reject the hypothesis H0 if z_v ∈ V; accept H0 if z_v ∈ S \ V. A criterion based on a predetermined significance level is called a significance criterion. The set V of all values of the test statistic at which the decision to reject H0 is made is called the critical region; the region S \ V is called the acceptance region of the hypothesis.

The significance level α determines the size of the critical region. The position of the critical region on the set of values of the statistic depends on the formulation of the alternative hypothesis. For example, if the hypothesis H0: θ = θ0 is tested and the alternative hypothesis is formulated as H1: θ > θ0 (θ < θ0), then the critical region lies on the right (left) "tail" of the distribution of the statistic, i.e. it has the form of the inequality z_v > z_cr (z_v < z_cr), where z_cr is the value of the statistic that, under H0, falls into the corresponding tail with probability α. In this case the criterion is called one-sided: right-sided or left-sided, respectively. If the alternative hypothesis is formulated as H1: θ ≠ θ0, the critical region lies on both "tails" of the distribution, i.e. it is determined by a pair of inequalities z_v < z1 and z_v > z2, each tail having probability α/2 under H0; in this case the criterion is called two-sided.
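For a test statistic that is standard normal under H0, the critical boundaries for the three kinds of alternatives can be computed as quantiles. A minimal sketch using only the standard library:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution of the statistic under H0

# Right-sided critical region: reject H0 when z_v > z_right
z_right = z.inv_cdf(1 - alpha)
# Left-sided critical region: reject H0 when z_v < z_left
z_left = z.inv_cdf(alpha)
# Two-sided critical region: reject H0 when |z_v| > z_two (alpha/2 per tail)
z_two = z.inv_cdf(1 - alpha / 2)

print(round(z_right, 3), round(z_left, 3), round(z_two, 3))  # 1.645 -1.645 1.96
```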

Fig. 30 shows the location of the critical region for various alternative hypotheses. Here f(z | H0) is the distribution density of the test statistic under the hypothesis H0, S \ V is the acceptance region of the hypothesis, and P(Z ∈ S \ V | H0) = 1 − α.

Thus, testing a parametric statistical hypothesis using a significance test can be divided into the following stages:

1) formulate testable () and alternative () hypotheses;

2) assign a significance level α;

3) choose the test statistic Z;

4) determine the critical region V;

5) calculate the sample value z_v of the statistic from the observations;

6) if z_v ∈ V, reject the hypothesis H0 as inconsistent with the results of observations;

7) if z_v ∈ S \ V, accept the hypothesis H0, i.e. assume that the hypothesis does not contradict the observational results.

Usually, when performing steps 4 - 7, statistics are used whose quantiles are tabulated: a statistic with the normal distribution, Student's t statistic, Fisher's F statistic.

Example 3. According to the passport data of a car engine, fuel consumption per 100 km of mileage is 10 l. As a result of a change in the engine design, fuel consumption is expected to decrease. To verify this, 25 randomly selected cars with the upgraded engine are tested; the sample mean fuel consumption per 100 km according to the test results is 9.3 l. Assume that the fuel consumption sample is drawn from a normally distributed population with mean a and known variance. We test the hypothesis H0: a = 10 against the alternative H1: a < 10, choosing the critical region so that the probability of the test statistic falling into it, provided H0 is true, equals the significance level α. Under the alternative a = 9, the sample mean has a normal distribution with mathematical expectation 9 and the same variance; the probability of a type II error is found using formula (11.2).

Therefore, in accordance with the accepted criterion, 13.6% of cars with a fuel consumption of 9 l per 100 km are classified as vehicles with a consumption of 10 l.
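A hedged numerical sketch of this example: the population standard deviation and the significance level were lost from the source, so σ = 2 and α = 0.05 below are assumptions; the code only illustrates how type I and type II error probabilities are computed for a left-sided critical region.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

a0, a1 = 10.0, 9.0   # H0 mean and alternative mean (l per 100 km)
n = 25
sigma = 2.0          # assumed: the true value was lost from the source
alpha = 0.05         # assumed significance level

se = sigma / sqrt(n)
# Left-sided critical region for H1: a < a0 -- reject H0 when xbar < c
c = a0 + Z.inv_cdf(alpha) * se
# Type I error: reject H0 (xbar < c) when H0 is in fact true
type1 = Z.cdf((c - a0) / se)
# Type II error: accept H0 (xbar >= c) when in fact a = a1
beta = 1 - Z.cdf((c - a1) / se)
print(round(type1, 3), round(beta, 3))
```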

4. Theoretical and empirical frequencies. Goodness-of-fit criteria.

Empirical frequencies are the frequencies obtained as a result of experience (observation). Theoretical frequencies are calculated using formulas. For the normal distribution law they can be found as follows:

n_i' = (n·h/σ*)·φ(u_i), where u_i = (x_i − x̄)/σ*, φ(u) = (1/√(2π))·e^(−u²/2), (11.3)

where n is the sample size, h is the width of the partial interval, x_i are the interval midpoints, x̄ is the sample mean, and σ* is the sample standard deviation.
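A small sketch of formula (11.3); the grouped-sample values (n = 100, h = 2, x̄ = 10, σ* = 4) are hypothetical.

```python
from math import exp, pi, sqrt

def phi(u):
    """Standard normal density phi(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return exp(-u * u / 2) / sqrt(2 * pi)

def theoretical_freqs(midpoints, n, h, xbar, sigma):
    """Theoretical frequencies for the normal law, formula (11.3):
    n_i' = (n * h / sigma) * phi((x_i - xbar) / sigma)."""
    return [n * h / sigma * phi((x - xbar) / sigma) for x in midpoints]

# Hypothetical grouped sample: n = 100, h = 2, xbar = 10, sigma = 4
freqs = theoretical_freqs([4, 6, 8, 10, 12, 14, 16], n=100, h=2, xbar=10, sigma=4)
print([round(f, 1) for f in freqs])  # [6.5, 12.1, 17.6, 19.9, 17.6, 12.1, 6.5]
```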

Lecture outline:

    The concept of an estimate

    Properties of statistical estimates

    Methods for finding point estimates

    Interval parameter estimation

    Confidence interval for the mathematical expectation with a known variance of a normally distributed population.

    Chi-square distribution and Student's t-distribution.

    Confidence interval for the mathematical expectation of a random variable that has a normal distribution with unknown variance.

    Confidence interval for the standard deviation of a normal distribution.

Bibliography:

    Wentzel, E.S. Probability Theory [Text] / E.S. Wentzel. – M.: Higher School, 2006. – 575 p.

    Gmurman, V.E. Probability Theory and Mathematical Statistics [Text] / V.E. Gmurman. – M.: Higher School, 2007. – 480 p.

    Kremer, N.Sh. Probability Theory and Mathematical Statistics [Text] / N.Sh. Kremer. – M.: UNITY, 2002. – 543 p.

P.1. The concept of an estimate

Distributions such as the binomial, exponential, and normal are families of distributions that depend on one or several parameters. For example, the exponential distribution with probability density f(x) = λe^(−λx), x ≥ 0, depends on one parameter λ, and the normal distribution on two parameters, m and σ. From the conditions of the problem under study it is usually clear which family of distributions is involved. However, the specific values of the parameters of this distribution, which enter the expressions for the distribution characteristics of interest, remain unknown. Therefore it is necessary to know at least approximate values of these quantities.

Let the distribution law of the general population be determined up to the values of the parameters it contains, some of which may be known. One of the tasks of mathematical statistics is to find estimates of the unknown parameters from a sample of observations x1, x2, …, xn from the general population. Estimation of an unknown parameter consists of constructing a function θ* = θ*(x1, x2, …, xn) of the random sample such that the value of this function is approximately equal to the estimated unknown parameter θ. The function θ* is called a statistic of the parameter θ.

A statistical estimate (hereinafter simply an estimate) of a parameter θ of a theoretical distribution is its approximate value, depending on the sample data.

An estimate θ* is a random variable, since it is a function of the independent random variables x1, x2, …, xn; if another sample is taken, the function will, generally speaking, take a different value.

There are two types of estimates: point and interval.

A point estimate is an estimate determined by a single number. With a small number of observations, such estimates can lead to gross errors. To avoid them, interval estimates are used.

An interval estimate is an estimate determined by two numbers, the ends of an interval that contains the estimated parameter θ with a given probability.

P. 2 Properties of statistical estimates

The quantity |θ* − θ| is called the accuracy of the estimate. The smaller |θ* − θ|, the better, the more accurately the unknown parameter is determined.

An estimate of any parameter must satisfy a number of requirements in order to be "close" to the true value of the parameter, i.e. to be in some sense a "good" estimate. The quality of an estimate is determined by checking whether it possesses the properties of unbiasedness, efficiency, and consistency.

An estimate θ* of the parameter θ is called unbiased (without systematic errors) if the mathematical expectation of the estimate coincides with the true value of θ:

M(θ*) = θ. (1)

If equality (1) does not hold, then the estimate θ* is called biased (with systematic errors). The bias may be due to measurement errors, counting errors, or the non-random nature of the sample. Systematic errors lead to overestimation or underestimation.

For some problems in mathematical statistics there may be several unbiased estimates. Preference is usually given to the one with the least scatter (variance).

An estimate θ* is called efficient if it has the smallest variance among all possible unbiased estimates of the parameter θ.

Let D_min be the minimum variance, and D(θ*) the variance of any other unbiased estimate of the parameter θ. Then the efficiency of the estimate θ* is equal to

e(θ*) = D_min / D(θ*). (2)

Clearly, 0 < e(θ*) ≤ 1. The closer e(θ*) is to 1, the more efficient the estimate θ*. If e(θ*) → 1 as n → ∞, the estimate is called asymptotically efficient.

Remark: if an estimate θ* is biased, then the smallness of its variance does not indicate the smallness of its error. Taking, for example, some fixed number c as an estimate of the parameter θ, we obtain an estimate with zero variance; however, in this case the error |c − θ| can be arbitrarily large.

An estimate θ* is called consistent if, as the sample size increases (n → ∞), the estimate converges in probability to the exact value of the parameter θ, i.e. if for any ε > 0

lim P(|θ* − θ| < ε) = 1 as n → ∞. (3)

Consistency of the estimate of the parameter θ means that as the sample size n grows, the quality of the estimate improves.
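Consistency can also be illustrated by simulation: the average error |x̄ − a| of the sample mean shrinks as the sample size grows. The parameters below (a = 5, σ = 2) are arbitrary illustrative choices.

```python
import random
from statistics import mean

def avg_abs_error(n, trials=200, a=5.0, sigma=2.0):
    """Average |xbar - a| over many samples of size n from N(a, sigma)."""
    rng = random.Random(42)
    errs = []
    for _ in range(trials):
        sample = [rng.gauss(a, sigma) for _ in range(n)]
        errs.append(abs(mean(sample) - a))
    return mean(errs)

# As n grows, the sample mean concentrates around the true mean a:
errs = {n: avg_abs_error(n) for n in (10, 100, 1000)}
for n, e in errs.items():
    print(n, round(e, 3))
```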

Theorem 1. The sample mean is an unbiased and consistent estimate of the mathematical expectation.

Theorem 2. The corrected sample variance is an unbiased and consistent estimate of the variance.

Theorem 3. The empirical distribution function of a sample is an unbiased and consistent estimate of the distribution function of a random variable.
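Theorems 1 and 2 can be checked by simulation: averaged over many samples, the variance with divisor n underestimates the true variance (by the factor (n − 1)/n), while the corrected variance with divisor n − 1 is centered on it. A sketch with arbitrary illustrative parameters:

```python
import random
from statistics import mean, pvariance, variance

rng = random.Random(7)
true_var = 4.0        # population N(0, 2), so the variance is 4
n, trials = 5, 20_000

biased, corrected = [], []
for _ in range(trials):
    sample = [rng.gauss(0, 2) for _ in range(n)]
    biased.append(pvariance(sample))    # divisor n   -> biased estimate
    corrected.append(variance(sample))  # divisor n-1 -> corrected estimate

# The biased average is close to true_var * (n-1)/n, the corrected
# average is close to true_var itself
print(round(mean(biased), 2), round(mean(corrected), 2))
```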
