Problem
We wish to uniformly randomly sample binary strings of length that when interpreted as integers lie between 0 and some upper bound .
Equivalently we wish to uniformly randomly sample leaves of a binary tree of depth where the sampled leaves lie to the left of (or on) an upper bound leaf .
We seek to minimize the number of individual samples necessary to produce a uniformly distributed sample string such that . We denote the expected number of individual samples
Naive rejection has worst case exponential
The simplest procedure is to sample by concatenating independent samples and rejecting whenever .
We will write for this procedure as
where is the probability that we accept , is the probability that we reject , and is the expected number of samples used ("wasted") per reject that we reject.
This equation can be rewritten as
It is easy to see that if we fix then , , therefore . Note that under naive rejection sampling . This shows that has worst case (over choice of ) exponentially large magnitude using naive rejection sampling. Indeed, ).
Prefix matching and early rejection
We will modify naive rejection sampling in two ways and then show that under this modified procedure .
The modifcations are:
- Prefix matching. We will perform rejection sampling restricted to the smallest subtree that contains all of the in-range leaves (instead of sampling from the entire tree). We will refer to this smallest containing subtree as the
. Note that there are different s and that each is contained under one of them. - Early rejection. We abort a tree descent as soon as the last
sample places us in a subtree with no legal leaves rather than continuing for the full length .
We will call naive rejection sampling with these two modifications modified sampling.
Under modified sampling, achieves maximum at
Under prefix matching, the has the property that all of its left child's leaves are in-range and at least one of its right child's leaves is in-range (if none of the right child's leaves are in-range then we can decrease the size of the by descending to the left child).
This implies that Therefore, and therefore
Generally,
where is the depth of the and is the number of in-range leaves in that subtree depending on .
Under modified sampling, achieves maximum at
We now consider as a function of . Every specifies a unique path in the binary tree. At every depth in a path where the left child was chosen ( ) the right child represents a subtree with leaves that can be early-rejected using samples.
This gives, for an of height :
For smaller there are two necessary observations:
- For all
that correspond to the same each additional contributes an additional positive term in and thus increases . Therefore the that maximizes for a given is the with all possible elements . - Across
of different heights is maximized by the that is the full binary tree for . This is because the max for smaller simply leave out terms from above.
From these observations we can conclude that for a is maximized over all at and the value of is with all terms in the sum included.
We can then use the fact that
to conclude that and this is achieved at .
Worst case is therefore
We have shown that the that maximises both and is . Therefore the maximum of their product is also achieved at and in particular is at most:
which directly implies that .
Extension to -ary strings
Similar reasoning seems to show that .
No comments:
Post a Comment