In research papers, a segment tree refers to a tree data structure allowing retrieving a list of segments which contain the given point. In competitive programming, the name "segment tree" usually refers to a data structure maintaining an array. According to http://web.ntnu.edu.tw/~algo/Sequence2.html, the data structure originated from Baltic OI 2001: Mars Maps.
Let's say we have an array with n elements
a[0],a[1],...,a[n-1]
. We will refer to the length by n
below. There are range modifications and range queries. Our segment tree
stores the elements in its leaf nodes, supporting range modifications
and range queries in O(log(n)) time. We can make an in-order traversal
at any time to retrieve the updated array.
A range modification applies a unique function f
to
every element whose index is in the range [l,r)
(note:
f
can be parameterized on the index). The most frequently
used range modification operations are:
- add an integer (this is commutative, no need to use lazy propagation)
- set to an integer
A range query computes the monoid sum of the elements
a[l],a[l+1],...,a[r-1]
. Some of the most frequently used
binary operations for range queries:
- sum:
a[l]+a[l+1]+...+a[r-1]
- max (range maximum query):
max(a[l],a[l+1],...,a[r-1])
- min (range minimum query):
min(a[l],a[l+1],...,a[r-1])
There are many others. The requirement is that the binary operation is associative and there is an identity element. Strictly speaking, an identity element is not essential, but it makes implementation simple and is trivial to augment once we have a semigroup. For sum/max/min, the binary operations are also commutative. For non-commutative operations, we should watch for the computation order when implementaing a range query.
Each leaf node stores an element in the original array. Two adjacent leaf nodes form an inner node which represents an interval of length 2. Next, two adjacent nodes with length 2 form an inner node which represents an interval of length 4. Then, two adjacent nodes with length 4 form an inner node which represents an interval of length 8. This process is recursive. In the end we have a root node of length n. (This exposition assumes that n is a power of two. If not, some length 2 nodes may join with length 1 nodes.)
For a range modification or range query, the algorithm performs
divide and conquer and splits the original interval into several
disjoint (but consecutive) subintervals corresponding to nodes. If n is
a power of two greater than or equal to 4, any query [l,r)
where 0<=l<r<=n
can be split into 2*(log2(n)-1) or
fewer nodes. As a worse case example, when n=16, [1,15)
splits into: [1,2) [2,4) [4,8) [8,12) [12,14) [14,15)
.
Top-down representations
The root represents the whole array [0,n)
. Then we
divide its interval into halves and the two children of the root will
represent [0,n/2)
and [n/2,n)
. For each node
we check whether the length is greater than 1, and if so, recursively
divide the interval into halves.
The tree structure can be represented in multiple ways. Let's start with the verbose explicit representation.
Explicit tree structure
The pointers to the left and right children are explicitly stored in
a node. When the size is not a concern, the [l,r)
interval
is stored into the node as well.
This representation is useful when:
- there is a need to pack more than one elements into one node. This may be handy and is difficult to be replaced by an iterative bottom-up approach.
- implementating a persistent segment tree.
For other problems, this representation is not recommended.
1 | const int N = 1000000, P = 1000000; |
Implicit tree structure
Binary heap representation (BFS layout)
Let the root node's index (BFS index) be 1. For any node
i
, its left and right children are 2*i
and
2*i+1
respectively.
This representation wastes the index 0. I prefer zero based numbering
(Edsger W. Dijkstra: Why numbering should start at zero).
However, the segment tree is an exception. The reason is that if you
have the index of a leaf, you can subtract n
from it to get
the corresponding array index. While with one based numbering, you need
to subtract n-1
, which is slightly inconvenient. In
addition, getting the index of an ancestor node needs just a right shift
operation while with zero based numbering you need one extra plus and
one extra minus. The inconvenience will add to the complexity of lazy
propagation.
Typically n is a power of 2. The segment tree is a full binary tree.
If n is not a power of 2, we can pad the length by adding tail sentinel
elements. In the worst case, when n=2^k+1
, we need to pad
2^k-1
elements. The space consumption is roughly
4*n
.
1:[0,16) | |||||||||||||||
2:[0,8) | 3:[8,16) | ||||||||||||||
4:[0,4) | 5:[4,8) | 6:[8,12) | 7:[12,16) | ||||||||||||
8:[0,2) | 9:[2,4) | 10:[4,6) | 11:[6,8) | 12:[8,10) | 13:[10,12) | 14:[12,14) | 15:[14,16) | ||||||||
16:0 | 17:1 | 18:2 | 19:3 | 20:4 | 21:5 | 22:6 | 23:7 | 24:8 | 25:9 | 26:10 | 27:11 | 28:12 | 29:13 | 30:14 | 31:15 |
When n is not a power of two, this representation can still be used,
but note that some indexes will be unused beside index 0. If we set the
middle point to floor((l+r)/2)
, the approach uses no less
space than tail padding. For the visual representation below for n=7,
indexes 8 and 9 are unused.
1:[0,7) | |||||||
2:[0,3) | 3:[3,7) | ||||||
4:0 | 5:[1,3) | 6:[3,5) | 7:[5,7) | ||||
10:1 | 11:2 | 12:3 | 13:4 | 14:5 | 15:6 |
1 | void build(int i, int l, int r) { |
If we set the middle point to floor((l+r+1)/2)
, we can
use less space. The exact space is not easy to compute but the last
index increases very fast when n increases from 2^k+1
to
2^(k+1)
. So it is not worth the trouble writing
l+r+1 >> 1
.
1 | 1:[0,7) |
There is another variant when n is not a power of 2, which will be detailed below when introducing the bottom-up open interval implementation.
l+r (in-order layout)
This is an alternative representation for a full binary tree. We can compute the node index by adding its left and right boundaries (in-order traversal indexing).
8:[0,8) | |||||||
4:[0,4) | 12:[4,8) | ||||||
2:[0,2) | 6:[2,4) | 10:[4,6) | 14:[6,8) | ||||
1:0 | 3:1 | 5:2 | 7:3 | 9:4 | 11:5 | 13:6 | 15:7 |
1 | void build(int l, int r) { |
This simple rule works because no two node share the same
l+r
.
However, if n is not a power of two, we may have a non-leaf node
whose l+r
is odd, say, 0+5
. The leftmost node
of its right subtree has the same index:
(l+r-1)/2 + (l+r+1)/2
. Note that such a collision only
happens with a leaf node and an internal node.
A naive approach avoiding collision is to special case the index
computation for leaf nodes. The space consumption is 3*n
.
There are some unused indexes, though.
1 | int id(int l, int r) { |
A better encoding costs 2*n space.
1 | int id(int l, int r) { |
Alternatively, if we make the tree left leaning, we can use:
1 | int id(int l, int r) { |
Anycase, the l+r
based numbering schemes are not cache
friendly. For the binary heap representation, the indexes on one level
are contiguous. We can imagine that the top few levels are shared and
they need fewer blocks.
Bottom-up representaations
Bottom-up, open interval (recommended)
To the best of my knowledge, it was first discovered by 张昆玮 in 2010 (《统计的力量——线段树全接触》). 张昆玮's representation uses open intervals. I used to find it inconvenient in two places:
- the implementation needs two sentinel elements 0 and n-1 (n-2 >= original length).
- the implementation uses open intervals. I prefer half-open intervals.
However, the sentinel elements are not really needed and open intervals turn out to be an advantage. Read on.
The segment tree maintains the an array of length n where n is not necessarily a power of two. 0 and n-1 are not necessarily sentinel elements. (In 张昆玮's code, 0 and n-1 are sentinel elements. There may be other padding elements at the end.)
For a modification or query [l,r)
, we have
0<=l && r<=n
. (In 张昆玮's code, we have
0<l && r<n
because 0 and n-1 are sentinel
elements and thus cannot be included in modifications or queries.)
Let's read a full program. The program reads n integers, then either
computes a[l]+a[l+1]+...+a[r-1]
or adds an integer to
a[l],a[l+1],...,a[r-1]
.
1 | // The program can be easily adapted to other modifications and queries. |
( For completeness, here is the function for single-element updates.
1
2
3
4
5
6void add(int i, long v) {
i += n;
untag(i);
for (; i > 1; i >>= 1)
mconcat(i);
}
After l += n-1, r += n
, l and r have changed semantics
from array indexes to leaf node indexes. They now represent an open
interval of node indexes. The two initial node indexes are not on the
same level if:
- if n is a power of two,
l==0 || r==n-1
- if n is not a power of two, the first 2**ceil(log2(n))-n elements are one level above the rest elements.
(If we use 张昆玮's representation, the two initial nodes are guaranteed to be on the same level.)
In particular, when n is not a power of two, not all leaf nodes
(which store array elements) are on the same level. Say n=13, while
a[3]~a[12]
are stored in leaf nodes 16~25,
a[0]~a[2]
are stored in nodes 13~15 which are one level
above. Indexes 26~31 are unused and not considered nodes. The initial r
may be the invalid 26 but the program will not access it because 26 is
even.
1:[3,13),[0,3) | |||||||||||||||
2:[3,11) | 3:[11,13),[0,3) | ||||||||||||||
4:[3,7) | 5:[7,11) | 6:[11,13),0 | 7:[1,3) | ||||||||||||
8:[3,5) | 9:[5,7) | 10:[7,9) | 11:[9,11) | 12:[11,13) | 13:0 | 14:1 | 15:2 | ||||||||
16:3 | 17:4 | 18:5 | 19:6 | 20:7 | 21:8 | 22:9 | 23:10 | 24:11 | 25:12 |
Let's discuss the position of the two initial nodes and how they
affect the final value of l and r when the loop condition
l^r^1
evaluates to false:
- If initial l and r are in subtree 2, we have
l>=4
after the main loop. - If initial l and r are in subtree 3, we have
l>=6
after the main loop.. - If initial l is in subtree 2 and initial r is in subtree 3, we have
l==2 && r==3
after the main loop. - (Only if the two initial nodes are on different levels) If initial l
is in subtree 3 and initial r is in subtree 2, we have
l==0 && r==1
after the main loop. In a previous iteration, we havel==1 && 2<=r && r<=4
. (r=4 can only happen when n is a power of two and the initial interval is the full interval.)
From the last point, we can see that if initial nodes are not on the same level, there are some subtleties. Fortunately the program can correctly handle such cases without extra code.
Let's analyze some concrete examples when the initial nodes are not on the same level:
Say n=8 and the query is [0,8)
. We have l=7 and r=16.
The first iteration in the main loop does not update any node. l becomes
3 and r becomes 8. The second iteration does not update any node. l
becomes 1 and r becomes 4. The third iteration does not update any node.
l becomes 0 and r becomes 2. The fourth (last) iteration updates node 1.
l is still 0 and r becomes 1. The loop condition evaluates to false.
Let's look at another example [0,7)
. We have l=7 and
r=15. The first iteration updates node 14. l becomes 3 and r becomes 7.
The second iteration updates node 6. l becomes 1 and r becomes 3. The
third (last) iteration updates node 2. l becomes 0 and r becomes 1. The
loop condition evaluates to false.
When adding an integer to a subtree, we need to know the number of
elements in the subtree to compute its influence on the monoid sum. If n
is a power of two, we can derive the number of elements in a subtree
easily. In the program, we introduce the array num[]
. We
initialize nodes [n,2*n)
to have 1 element, and propagate
up the numbers.
For many other modifications, we don't need the number of elements in a subtree. When n is not a power of two, there is no extra bookkeeping.
What does the node at index one represent?
When n is a power of two, the node at index one (root) stores the
monoid sum of the whole array. Top-down traversals are straightforward.
For example, if the array has 0 and 1 elements and we try to find the
kth 1 (the select operation in an order statistic tree). We can start
from node 1 (it represents the whole array) and walk down from it. If
subtree 2 (which corresponds to elements 0 to n/2-1) has more than k
ones, we go to subtree 2, otherwise subtree 3. Keep this process until
we reach a leaf. 1
2
3
4
5
6
7
8// n is a power of two.
int i = 1;
while (i < n) {
i <<= 1;
if (k >= cnt[i])
k -= cnt[i];
}
return i-n; // array index
When n is not a power of two, the node at index one stores the monoid sum of a permutation of the whole array. If the query operation is commutative (e.g. sum/min/max), we can still query node 1; otherwise we need to visit multiple full binary subtrees with increasing array indexes, and compute their monoid sum. Subtree 2 no longer aggregates elements 0 to n/2-1 and subtree 3 no longer aggregate elements n/2 to n-1. The above top-down walk code will be incorrect.
As a rule of thumb, pad the original array if you need top-down walks or the monoid sum of the original array in the node at index 1.
Bottom-up, half-open interval
In 2011, Dmitry Urbanovich wrote https://codeforces.com/blog/entry/1256 and introduced an implementation dealing with closed intervals. There is a great write-up https://codeforces.com/blog/entry/18051 by Oleksandr Bacherikov using half-open intervals.
1 | // Different from the open interval implementation. |
For a while, I used the following style which is similar to
Oleksandr's. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17void modify(int l, int r, int v) {
untag(l-1); // untag accepts an array index
untag(r);
bool lf = false, rf = false;
for (l += NN, r += NN; l < r; ) {
if (l & 1) lf = true, apply(l++, v);
l >>= 1;
if (lf) mconcat(l-1);
if (r & 1) rf = true, apply(--r, v);
r >>= 1;
if (rf) mconcat(r);
}
for (l--; l >>= 1, r >>= 1; ) {
if (lf || l == r) mconcat(l);
if (rf && l != r) mconcat(r);
}
}
I hope you have noticed the pain point now.
mconcat(l-1, k)
in the main loop is a bit unnatural. The
bottom-up propagation (l--
) in the end is particularly
error-prone. Two mconcat
calls instead of one are needed
due to a case which cannot be handled elegantly by the loop condition
l < r
. (If you convert the open interval code to the
half-closed interval style, you shall note that the problem can be fixed
by using (l-1)^r^1
, but then we lose elegance.)
Say n=4, we need to update [1,3). We have l=5 and r=7. After applying
to node 5 and node 6, we get l=r=3. l--
is to ensure node 2
is properly updated.
If we use open intervals, we have l=4 and r=7 initially. After
applying to node 5 and node 6, we get l=2 and r=3. Even if
r-l == 1
, (l^r^1) != 0
! This means with open
intervals and the loop condition l^r^1
, the main loop gets
executed one more time which ensures node 2 and node 3 are updated. This
is a big difference. In addition, with open intervals, we can compute
the parent node index with l>>1
instead of
l = (l+1)>>1
.
The main observation of Oleksandr Bacherikov is that n does not need to be a power of two.
Lazy propagation
In both top-down and bottom-up implementations, we call
untag
. Now let's figure out why untag
is
needed.
Say n=16, the query range is [1,13)
. The program will
visit the green nodes.
1:[0,16) | |||||||||||||||
2:[0,8) | 3:[8,16) | ||||||||||||||
4:[0,4) | 5:[4,8) | 6:[8,12) | 7:[12,16) | ||||||||||||
8:[0,2) | 9:[2,4) | 10:[4,6) | 11:[6,8) | 12:[8,10) | 13:[10,12) | 14:[12,14) | 15:[14,16) | ||||||||
16:0 | 17:1 | 18:2 | 19:3 | 20:4 | 21:5 | 22:6 | 23:7 | 24:8 | 25:9 | 26:10 | 27:11 | 28:12 | 29:13 | 30:14 | 31:15 |
However, if there were previously modifications to the ancestors of these nodes. The to-be-visited nodes may not have up-to-date monoid sums. There is a generic lazy propagation approach and a simpler approach for commutative modifications. Let's consider the generic approach first.
We need to call untag
on the leftmost and rightmost
element in the interval to ensure the monoid sums are up-to-date.
1:[0,16) | |||||||||||||||
2:[0,8) | 3:[8,16) | ||||||||||||||
4:[0,4) | 5:[4,8) | 6:[8,12) | 7:[12,16) | ||||||||||||
8:[0,2) | 9:[2,4) | 10:[4,6) | 11:[6,8) | 12:[8,10) | 13:[10,12) | 14:[12,14) | 15:[14,16) | ||||||||
16:0 | 17:1 | 18:2 | 19:3 | 20:4 | 21:5 | 22:6 | 23:7 | 24:8 | 25:9 | 26:10 | 27:11 | 28:12 | 29:13 | 30:14 | 31:15 |
1:[0,16) | |||||||||||||||
2:[0,8) | 3:[8,16) | ||||||||||||||
4:[0,4) | 5:[4,8) | 6:[8,12) | 7:[12,16) | ||||||||||||
8:[0,2) | 9:[2,4) | 10:[4,6) | 11:[6,8) | 12:[8,10) | 13:[10,12) | 14:[12,14) | 15:[14,16) | ||||||||
16:0 | 17:1 | 18:2 | 19:3 | 20:4 | 21:5 | 22:6 | 23:7 | 24:8 | 25:9 | 26:10 | 27:11 | 28:12 | 29:13 | 30:14 | 31:15 |
Say we have applied f
on a[1]
, next
g
on a[0,2)
, then h
on
a[0,4)
, finally m
on a[0,8)
. The
final value of a[0]
is therefore
m(h(g(f(a[0]))))
. Note that the outer functions apply to
larger intervals.
1 | 1:[0,16) identity |
Say we want to apply another operation f'
to
a[0]
. If the operations commute, i.e.
f'(m(h(g(f(a[0]))))) = m(h(g(f'(f(a[0])))))
, we may ignore
m
, h
and g
on ancestor nodes and
apply f'
directly to a[0]
, essentially
composing f'
and f
.
1 | 1:[0,16) identity |
Then the next a[0]
retrieval query will return
m(h(g(f'(f(a[0])))))
. The classical example of this class
is the add operation. See the next section.
However, if modification operations are not commutative, we cannot
ignore m
, h
and g
. We have to
propagate operations downwards. This process is usually called lazy
propagation. When the operations stores in ancestor nodes are all
identity, we can compose f
and f'
, because an
identity commutes with any operation.
1 | 1:[0,16) identity |
Commutative modifications and query monoid
If both modifications and the query monoid are commutative, we can
apply modifications in a bottom-up manner. We can thus omit
untag
calls. The query function needs an update. Unlike the
untag
implementation, after l^r^1
evaluates to
false, we should keep bubbling up to add in modifications to ancestor
nodes.
Here is an example of range add and range sum. 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
using namespace std;
const int N = 200000;
long sum[N*2], tag[N*2];
int n;
void add(int l, int r, long v) {
l += n-1, r += n;
long lc = 0, rc = 0;
for (int k = 1; l^r^1; l >>= 1, r >>= 1, k <<= 1) {
if (~l&1)
sum[l^1] += v*k, tag[l^1] += v, lc += k;
if (r&1)
sum[r^1] += v*k, tag[r^1] += v, rc += k;
sum[l>>1] += v*lc, sum[r>>1] += v*rc;
}
while (l > 1) {
sum[l>>=1] += v*lc;
sum[r>>=1] += v*rc;
}
}
long get_sum(int l, int r) {
l += n-1, r += n;
long ans = 0, lc = 0, rc = 0;
for (int k = 1; l^r^1; l >>= 1, r >>= 1, k <<= 1) {
if (~l&1)
ans += sum[l^1], lc += k;
if (r&1)
ans += sum[r^1], rc += k;
ans += tag[l>>1]*lc + tag[r>>1]*rc;
}
while (l > 1)
ans += tag[l>>=1]*lc + tag[r>>=1]*rc;
return ans;
}
int main() {
int l, r, v;
char c;
scanf("%d", &n);
REP(i, n)
scanf("%ld", &sum[n+i]);
ROF(i, 1, n)
sum[i] = sum[2*i]+sum[2*i+1];
fill_n(tag, 2*n, 0);
while (scanf("%d%d%c", &l, &r, &c) == 3) {
// [l, r)
if (c == '\n')
printf("%ld\n", get_sum(l, r));
else {
scanf("%d", &v);
add(l, r, v);
}
}
}
Combining monoid sum and lazy tag
This further optimizes based on commutative modifications. 《统计的力量——线段树全接触》 names this 标记永久化.
For range add and range min/max, we can store one value instead of
two in an inner node. Without loss of generality, let's say the query
type is range min. The value is defined as
min_value(i)-min_value(parent(i))
.
1 | void mconcat(int l) { |
Relations with other data structures
Fenwick tree
If the binary operation used by range queries supports subtraction,
we can answer a [l,r)
query with the difference of two
prefix sums sum[0,r) - sum[0,l)
, and therefore switch to a
Fenwick tree (binary indexed tree). Unlike a segment tree
(2n-1
nodes), a Fenwick tree needs exactly n elements.
Binary search tree
A segment tree maintains a static array. If the array is dynamic (insertion/deletion), we can switch to a binary search tree which encodes the array in its in-order traversal. For range queries, a Splay tree and a Treap (with join-based tree algorithms) are common.
A binary search tree needs n nodes, but larger space consumption encoding the left and right children, and potential extra space ensuring self balancing.
Application
- Maintain an array with value updates
- Change the role of values to indexes, maintain n (persistent) segment trees with frequency as value.
- Euler-tour technique
- ...