##
Problem

We wish to uniformly randomly sample binary strings of length $D$ that when interpreted as integers lie between 0 and some upper bound $U,\space0 \leq U \leq 2^D-1$.

Equivalently we wish to uniformly randomly sample leaves of a binary tree of depth $D$where the sampled leaves lie to the left of (or on) an upper bound leaf $U$.

We seek to minimize the number of individual $0,1$ samples necessary to produce a uniformly distributed sample string $S$ such that $S \leq U$. We denote the expected number of individual $0,1$ samples $E_{cost}.$

$$ depends on $D$, $U$, and the particular sampling procedure we use. However it is clear, over all choices of $D$, $U$, and sampling procedure, that $E_{cost} \geq D$ and achieves this lower bound for $U=11...1$. (When $U=11...1$ any leaf is valid and it takes $D$ samples to select a leaf).

##
Naive rejection has worst case exponential $E_{cost}$

The simplest procedure is to sample $S$ by concatenating $D$ independent $0,1$ samples and rejecting $S$ whenever $S\gt U$.

We will write $E_{cost}$ for this procedure as

$E_{cost}=p_{acc}D + p_{rej}(E_{waste}+E_{cost})\tag{1}$

where $p_{acc}$ is the probability that we accept $S$, $p_{rej}(=1-p_{acc})$ is the probability that we reject $S$, and $E_{waste}$ is the expected number of $0,1$ samples used ("wasted") per reject $S$that we reject.

This equation can be rewritten as

$E_{cost}=D + \frac{p_{rej}}{p_{acc}}E_{waste}\tag{2}.$

It is easy to see that if we fix $U=000...0$ then $p_{rej}=\frac{2^D-1}{2^D}$, $p_{acc}=\frac{1}{2^D}$, therefore $\frac{p_{rej}}{p_{acc}}=2^D-1$. Note that under naive rejection sampling $E_{waste}=D$. This shows that $E_{cost}$has worst case (over choice of $U$) exponentially large magnitude using naive rejection sampling. Indeed, $0\leq E_{cost} \leq D(2^D-1$).

##
Prefix matching and early rejection

We will modify naive rejection sampling in two ways and then show that under this modified procedure $E_{cost} \leq D+2$.

The modifcations are:

*Prefix matching*. We will perform rejection sampling restricted to the smallest subtree that contains all of the in-range leaves (instead of sampling from the entire tree). We will refer to this smallest containing subtree as the $SCS$. Note that there are $D$different $SCS$s and that each $U$ is contained under one of them.
*Early rejection*. We abort a tree descent as soon as the last $0,1$ sample places us in a subtree with no legal leaves rather than continuing for the full length $D$.

We will call naive rejection sampling with these two modifications *modified sampling*.

##
Under modified sampling, $p_{rej}\over p_{acc}$ achieves maximum at $U=100...0$

Under prefix matching, the $SCS$ has the property that all of its left child's leaves are in-range and at least one of its right child's leaves is in-range (if none of the right child's leaves are in-range then we can decrease the size of the $SCS$ by descending to the left child).

This implies that $p_{acc} \geq {1\over2}.$ Therefore, $p_{acc}\gt p_{rej}$ and therefore ${p_{rej}\over p_{acc}}\leq1.$

Generally,

$\frac{p_{rej}}{p_{acc}} = \frac{2^{D-k}-j}{2^{D-k}+j}\tag{Eqn. 3}$

where $k$ is the depth of the $SCS$ and $1 \geq j>2^{D-k-1}$ is the number of in-range leaves in that subtree depending on $U$.

$(3)$ is maximized when $k=0$ and $j=1$. This occurs for $U=100...0$.

##
Under modified sampling, $E_{waste}$ achieves maximum at $U=100...0$

We now consider $E_{waste}$ as a function of $U$. Every $U$ specifies a unique path in the binary tree. At every depth $d$ in a path where the left child was chosen ($U_d=0$) the right child represents a subtree with $2^{D-(d+1)}$ leaves that can be early-rejected using $(d+1)$ samples.

This gives, for an $SCS$ of height $D$:

$E_{waste}(U)=\sum_{\{d|U_d=0\}}^{D} (d+1)(\frac{2^{D-(d+1)}}{2^D})=\sum_{\{d|U_d=0\}}^{D} \frac{(d+1)}{2^{(d+1)}}.\tag{4}$

For smaller $SCS$ there are two necessary observations:

- For all $U$ that correspond to the same $SCS$ each additional $U_d=0$ contributes an additional positive term in $(4)$ and thus increases $E_{waste}(U)$. Therefore the $U$ that maximizes $E_{waste}(U)$ for a given $SCS$ is the $U$ with all possible elements $0$.
- Across $SCS$ of different heights $E_{waste}(U)$ is maximized by the $SCS$ that is the full binary tree for $D$. This is because the max $U$ for smaller $SCS$ simply leave out terms from $(4)$ above.

From these observations we can conclude that for a $E_{waste}(U)$ is maximized over all $U$ at $U=100...0$ and the value of $E_{waste}(U)$ is $(4)$ with all terms in the sum included.

We can then use the fact that

$\lim_{D \to \infty} \sum_{d=0}^{D} \frac{(d+1)}{2^{(d+1)}}=2$

to conclude that $E_{waste} \leq 2$ and this is achieved at $U=100...0$.

##
Worst case is therefore $D+2$

We have shown that the $U$ that maximises both ${p_{rej}\over p_{acc}}$ and $E_{waste}$ is $U=100...0$. Therefore the maximum of their product is also achieved at $U=100...0$ and in particular is at most:

$(\frac{2^{D}-1}{2^{D}+1})(2)$

which directly implies that $E_{cost} \leq D+2$.

##
Extension to $n$-ary strings

Similar reasoning seems to show that $E_{cost} \leq D+n$.