Optimal stopping rules for exponential data

A problem of sequential sampling from an exponential distribution is considered in this research. The problem is formulated in the stochastic dynamic programming framework, and the objective is to determine a control policy maximizing the total expected reward. It is shown that under standard assumptions the control limit policy is optimal. Two types of optimal stopping problems are considered: sampling without recall, in which the decision maker cannot return to an earlier observation at a later time, and sampling with recall, in which the decision maker can select any observation he has taken earlier.

Keywords: Stochastic dynamic programming | Optimal stopping problem | Exponential distribution

© 2011 Ibnu Sina Institute. All rights reserved.


INTRODUCTION
The problem treated in this research belongs to the category of optimal stopping problems. In each of these problems, the statistician takes observations sequentially, and at each stage he must decide whether to stop and suffer a specified stopping risk, or continue and take the next observation at a specified sampling cost. Two types of optimal stopping problems are considered. The first is sampling without recall, in which the decision maker cannot return to an earlier observation at a later time; the second is sampling with recall, in which the decision maker can select any observation he has taken earlier.
Optimal stopping problems have been studied by MacQueen and Miller [1]. Sakaguchi [2] considered problems involving sampling from a distribution with unknown mean. Kramer and Starr [3] characterized the optimal stopping rule and its distribution in a size-dependent search. The application of optimal stopping rules to size-dependent search was extended by Bellout [4]. In this research, the posterior state of the process after each sample is determined through Bayesian inference.
Bayesian process control focuses on determining the optimal control policy, based on the posterior probability, that minimizes the total expected cost over a finite horizon or the long-run expected average cost.
Makis [5] applied Bayesian control in a POMDP (partially observable Markov decision process) framework to a multivariate process mean control problem. He developed the optimal stopping rule through a control limit policy that minimizes the total expected cost. Early contributions to Bayesian process control include the models of Girshick and Rubin [6]; Bayesian control designs are considered by Tagaras and Nikolaidis [7] and by Makis and Jiang [8].
One application of this model is the stock-option model (Ross [9]). Suppose that you own an option to buy one share of a stock at a fixed price, say c, and you have N days in which to exercise the option. You need never exercise it, but if you do so at a time when the stock's price is t, then your profit is t − c. What strategy maximizes your expected profit? If V_n(t) denotes the maximal expected profit when the stock's price is t and the option has n additional days to run, then V_n satisfies the recursion

V_n(t) = max{t − c, E[V_{n−1}(t + X)]},

where X denotes the random daily change in the stock's price and V_0(t) = max{t − c, 0}. The problem is formulated in the stochastic dynamic programming framework, and the objective is to determine a control policy maximizing the total expected reward. Under standard assumptions, the control limit policy is optimal. The posterior distribution of the state of the system is also determined through a Bayesian approach.
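The recursion above can be evaluated numerically by backward induction. The sketch below is purely illustrative: it assumes integer prices and a daily change X of +1 or −1 with probabilities p and 1 − p, an assumption not made in Ross's model, which allows arbitrary i.i.d. daily changes; the function name is also ours.

```python
from functools import lru_cache

def stock_option_value(t, n, c, p=0.5):
    """V_n(t): maximal expected profit with n days left and current price t.
    Exercising now yields t - c; waiting yields E[V_{n-1}(t + X)].
    Illustrative assumption: X = +1 or -1 with probabilities p and 1 - p."""
    @lru_cache(maxsize=None)
    def V(price, days):
        exercise = max(price - c, 0)       # V_0(t) = max(t - c, 0)
        if days == 0:
            return exercise
        wait = p * V(price + 1, days - 1) + (1 - p) * V(price - 1, days - 1)
        return max(exercise, wait)
    return V(t, n)
```

Because the option to wait can never hurt, V_n(t) is nondecreasing in n, which the recursion reproduces.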
This paper is organized as follows: problems of sampling without recall are treated in the second section; properties of the optimal stopping rule for problems of sampling with recall are presented in the third section; conclusions are presented in the final section.

Sampling without Recall with finite stages
We assume that a sequential random sample T_1, T_2, …, T_n is taken, one observation at a time, from a known distribution. If the decision maker has not stopped earlier, he must stop after the nth observation and accept the final observed value t_n as his reward. This is a problem of sampling without recall: once an observation has been passed over, the decision maker cannot return to it at a later time.
Let N denote the random number of observations which have been taken under a stopping rule. The problem is to find an optimal stopping rule that maximizes the expected reward. At any stage of the sampling process, the state of the process is the value t of the most recent observation, and the expected reward under the optimal stopping rule when j observations remain is denoted V_j(t). Since at the final stage the decision maker must stop and accept the observed value t as his reward,

V_0(t) = t.  (1)

Since the decision maker is taking samples without recall from a known distribution, the expected reward v_j from taking one more observation and then continuing optimally depends only on the number j of observations remaining. After one more observation T has been gathered, the expected reward from the optimal continuation over the remaining j − 1 stages is V_{j−1}(T); hence v_j satisfies

v_j = E[V_{j−1}(T)] − c.  (2)

The expected reward V_j(t) from the optimal stopping rule is the maximum of the reward t from stopping and the expected reward v_j from continuing,

V_j(t) = max{t, v_j},

and from equations (1) and (2) the functions V_1, V_2, … and the numbers v_1, v_2, … are evaluated successively.
Let v_j, j = 0, 1, 2, …, n, denote the maximum expected reward when j observations remain. It can be proven (DeGroot [10]) that the optimal stopping rule is to continue sampling whenever the observed value at stage j satisfies t_j < v_{n−j} and to stop the process otherwise. The values of v_j can be determined from equations (1) and (2).

Sampling without Recall with infinite stages
In this sampling scheme, there is no upper bound on the number of observations which can be taken, but there is a fixed cost c per observation, so the reward from stopping after n observations is t_n − nc. It can be proven that if the variance of F(t) is finite, then the maximum expected reward v* among all stopping rules is finite and there is an optimal stopping rule whose expected reward is v* (DeGroot [10]).
After the first observation t has been taken, the decision maker can either stop the sampling process or continue to take more observations. If he stops, his reward is the value of t minus the sampling cost c. If he continues, then he is in the same position as at the start of the sampling process, except that he has already spent c on the first observation; thus the expected reward from the optimal continuation is again v* minus the cost c of the observation gathered.
Thus, after the first observation t has been observed, the optimal stopping rule is to continue the sampling process if t < v* and to terminate the sampling process if t ≥ v*. The expected reward from this optimal procedure is E[max{t, v*}] − c, and since the expected reward from the optimal stopping rule is v*, the following equation is concluded:

v* = E[max{T, v*}] − c.  (3)

Assuming f(t) is an exponential density with parameter λ, f(t) = λe^{−λt} for t ≥ 0, the expectation can be calculated as

E[max{T, v*}] = v* + ∫_{v*}^{∞} (t − v*) λe^{−λt} dt = v* + (1/λ) e^{−λv*},

so equation (3) reduces to (1/λ) e^{−λv*} = c, and the expected reward is v* = (1/λ) ln(1/(λc)).
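The closed form above can be checked by substituting it back into equation (3). The following sketch (function names ours) does exactly that:

```python
import math

def optimal_reward(lam, c):
    """Closed-form v* for Exponential(lam) observations at cost c,
    from (1/lam) * exp(-lam * v*) = c (valid when lam * c < 1)."""
    assert lam * c < 1, "otherwise sampling is never worthwhile"
    return math.log(1.0 / (lam * c)) / lam

def bellman_rhs(v, lam, c):
    """Right-hand side of equation (3): E[max(T, v)] - c for
    T ~ Exponential(lam) and v >= 0."""
    return v + math.exp(-lam * v) / lam - c
```

At v = v* the right-hand side of equation (3) equals v itself, confirming that v* is the fixed point of the optimality equation.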

Sampling without Recall from an Exponential Distribution with Unknown Mean
Assume that a random sample T_1, T_2, … is taken at a cost of c per observation from an exponential distribution with unknown mean (λ is not known). Let the prior distribution of λ be a Gamma distribution; the posterior distribution at each stage of the sampling process is then also a Gamma distribution. The decision maker must find a stopping rule that maximizes the expected reward. It can be proven that if the variance of the distribution of each observation is finite, then an optimal stopping rule exists and the expected reward of the optimal stopping rule is finite (DeGroot [10]). This assumption is satisfied in the problem of sampling from an exponential distribution considered here.
Assume that the prior distribution of λ is a Gamma distribution; to use a non-informative prior, let both parameters of the Gamma distribution converge to zero, i.e., take the prior distribution of λ to be Gamma(0, 0).

Then the posterior distribution of λ after the observations t_1, …, t_n is Gamma(n, s):

ξ(λ | t_1, …, t_n) = (s^n / Γ(n)) λ^{n−1} e^{−λs},  λ > 0,  where s = t_1 + … + t_n.
The marginal distribution of the next observation, t, can then be determined by integrating out λ:

f(t | n, s) = ∫_0^∞ λe^{−λt} ξ(λ | t_1, …, t_n) dλ = n s^n / (s + t)^{n+1},  t ≥ 0.  (10)

Hence the mean of the random variable T is E[T | n, s] = s/(n − 1) for n > 1. At each stage of the sampling process, the state of the process is characterized by the triple (n, s, t), where n is the number of observations taken, s is their sum and t is the most recent observed value. At this stage, the decision maker can either stop sampling and accept the reward t_n, or take another observation at cost c and then continue the sampling process in an optimal procedure. Thus the following functional equation for V is concluded:

V(n, s, t) = max{t, E[V(n + 1, s + T, T) | n, s] − c}.

Let α_n denote the control limit at stage n, so that the optimal stopping rule at the state (n, s, t) is to stop whenever t ≥ α_n and to continue otherwise. Lemma 1 shows that, for any n > 0, α_{n+1} can be determined from α_n using the predictive density f(t) derived in equation (10). Since all functions in the resulting equation (18) are continuous, limits can be taken on both sides; by the law of large numbers, the posterior mean s/n converges to the true mean of the observations, and from equations (18), (19) and (20) the limiting behaviour of α_n is concluded.
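The Bayesian updating step above is easy to state concretely. The following sketch (function names ours) computes the posterior parameters under the Gamma(0, 0) prior, the predictive density of equation (10), and the predictive mean s/(n − 1):

```python
def posterior_params(observations):
    """Posterior of lam under the improper Gamma(0, 0) prior is
    Gamma(n, s), with n = number of observations, s = their sum."""
    return len(observations), sum(observations)

def predictive_density(t, n, s):
    """Marginal density of the next observation,
    f(t | n, s) = n * s**n / (s + t)**(n + 1), a Pareto-type density."""
    return n * s**n / (s + t)**(n + 1)

def predictive_mean(n, s):
    """E[T | n, s] = s / (n - 1), finite only for n > 1."""
    assert n > 1
    return s / (n - 1)
```

Note that the predictive distribution is heavy-tailed: its mean exists only once n > 1, which is why the control limits α_n are studied for n > 0 rather than at the empty state.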

SAMPLING WITH RECALL FROM AN EXPONENTIAL DISTRIBUTION WITH UNKNOWN MEAN
Now, we consider the problem of sampling with recall. We assume that a random sample T_1, T_2, … is taken at a cost of c units per observation from a distribution F(t). The decision maker can select any observation which he has taken earlier and accept the value of that observation, minus the total sampling cost, as his reward; thus, if the decision maker stops after taking n samples t_1, …, t_n, his reward is max{t_1, …, t_n} − nc. In Lemma 3 and Lemma 4, we derive some properties of the optimal solution for this problem.
The optimal stopping rule is again a control limit rule: stop, accept the best observed value as the reward and terminate the sampling process whenever that value exceeds the control limit, and otherwise continue taking samples. It can be shown that the value of α_{n+1} can be determined from the value of α_n, where μ is the mean of the random variable T and n is the number of observations that have been taken. If r denotes the reward to the decision maker, without considering the amount spent on taking samples, and the optimal stopping rule is to continue from this state, then after the next observation t the state (n, s, r) passes to (n + 1, s + t, max{r, t}).
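A control limit policy for the recall problem is straightforward to simulate. The sketch below (function name ours) is for the known-λ case, where a single fixed threshold suffices; it is illustrative only and does not implement the paper's α_n recursion. With threshold β, the expected reward works out to β + 1/λ − c e^{λβ}, which is maximized at β = v* = (1/λ) ln(1/(λc)) with value v*, the same as without recall — recall does not help when the sampling distribution is known.

```python
import math
import random

def sample_with_recall(lam, c, limit, rng, max_steps=10_000):
    """Take Exponential(lam) observations at cost c each; with recall,
    the reward on stopping after n draws is max(t_1..t_n) - n*c.
    Stops as soon as the best observation so far reaches `limit`
    (a single-threshold policy, used here purely for illustration)."""
    best, n = 0.0, 0
    while n < max_steps:
        n += 1
        best = max(best, rng.expovariate(lam))
        if best >= limit:
            break
    return best - n * c
```

Averaging many simulated runs with limit = v* recovers an average reward close to v* itself, consistent with the closed-form analysis above.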