In research papers, a segment tree refers to a tree data structure that allows retrieving the list of segments containing a given point. In competitive programming, the name "segment tree" usually refers to a data structure maintaining an array. According to http://web.ntnu.edu.tw/~algo/Sequence2.html, the data structure originated from Baltic OI 2001: Mars Maps.
Let's say we have an array with n elements a[0],a[1],...,a[n-1]. We will refer to the length by n below. There are range modifications and range queries. Our segment tree stores the elements in its leaf nodes, supporting range modifications and range queries in O(log(n)) time. We can perform an in-order traversal at any time to retrieve the updated array.
A range modification applies a unary function f to every element whose index is in the range [l,r) (note: f can be parameterized on the index). The most frequently used range modification operations are:
- add an integer (this is commutative, no need to use lazy propagation)
- set to an integer
A range query computes the monoid sum of the elements a[l],a[l+1],...,a[r-1]. Some of the most frequently used binary operations for range queries:

- sum: a[l]+a[l+1]+...+a[r-1]
- max (range maximum query): max(a[l],a[l+1],...,a[r-1])
- min (range minimum query): min(a[l],a[l+1],...,a[r-1])
There are many others. The requirement is that the binary operation is associative and there is an identity element. Strictly speaking, an identity element is not essential, but it makes the implementation simple, and it is trivial to augment a semigroup with one. For sum/max/min, the binary operations are also commutative. For non-commutative operations, we should watch the computation order when implementing a range query.
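As a concrete illustration of augmenting a semigroup with an identity (a sketch; the encoding via `optional` is my choice, and the name `mconcat` follows the convention used later in this article): max over all ints has no natural identity, but `nullopt` can serve as an adjoined one.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
using namespace std;

// nullopt acts as the adjoined identity element; any engaged value combines
// with it unchanged, turning the max semigroup into a monoid.
using T = optional<int>;

T mconcat(T x, T y) {
  if (!x) return y;
  if (!y) return x;
  return max(*x, *y);
}
```

In practice a sentinel such as INT_MIN often plays the same role with less overhead.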
Each leaf node stores an element in the original array. Two leaf nodes form an inner node which represents an interval of length 2. Next, two nodes with length 2 form an inner node which represents an interval of length 4. Then, two nodes with length 4 form an inner node which represents an interval of length 8. This process is recursive. In the end we have a root node of length n. (This exposition assumes that n is a power of two. If not, some length 2 nodes may join with length 1 nodes.)
For a range modification or range query, the algorithm performs divide and conquer and splits the original interval into several disjoint (but consecutive) subintervals corresponding to nodes. If n is a power of two greater than or equal to 4, any query [l,r) where 0<=l<r<=n can be split into 2*(log2(n)-1) or fewer nodes. As a worst-case example, when n=16, [1,15) splits into: [1,2) [2,4) [4,8) [8,12) [12,14) [14,15).
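The decomposition above can be reproduced with the canonical top-down split (a sketch; the function name and signature are mine):

```cpp
#include <cassert>
#include <utility>
#include <vector>
using namespace std;

// Recursively break query interval [l,r) into maximal node intervals.
// Node [nl,nr) splits at the midpoint floor((nl+nr)/2).
void split(int nl, int nr, int l, int r, vector<pair<int, int>>& out) {
  if (l <= nl && nr <= r) { out.push_back({nl, nr}); return; }
  int m = (nl + nr) / 2;
  if (l < m) split(nl, m, l, r, out);
  if (r > m) split(m, nr, l, r, out);
}
```

For n=16 and the query [1,15), this yields exactly the six nodes listed above.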
Representation
The root will represent the whole array [0,n). Then we divide the interval into halves, and the two children of the root will represent [0,n/2) and [n/2,n). For each node, we check whether its length is greater than 1 and divide its interval into halves. There are n leaves and n-1 internal nodes. The height of the segment tree is O(log(n)).
The tree structure can be represented in multiple ways. Let's start with the verbose explicit representation.
Explicit tree structure
The pointers to the left and right children are explicitly stored in each node. This is usually built in a top-down manner. When size is not a concern, the [l,r) interval is stored in the node as well.
This representation is useful when:
- there is a need to pack more than one element into one node. This may be handy and is difficult to replace with an iterative bottom-up approach.
- implementing a persistent segment tree.
For other problems, this representation is not recommended.
```cpp
const int N = 1000000, P = 1000000;
```
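A sketch of this representation (all names are mine): nodes live in a pool and store their interval plus child indexes, much as a persistent segment tree would. `build` returns the index of the created node.

```cpp
#include <cassert>
#include <vector>
using namespace std;

struct Node { int l, r, lch, rch; long long sum; };
vector<Node> pool;  // the node pool; child links are indexes into it

int build(const vector<long long>& a, int l, int r) {
  int i = pool.size();
  pool.push_back({l, r, -1, -1, 0});
  if (r - l == 1) {
    pool[i].sum = a[l];  // leaf: one array element
  } else {
    int m = (l + r) / 2;
    int lc = build(a, l, m);
    int rc = build(a, m, r);
    pool[i].lch = lc;
    pool[i].rch = rc;
    pool[i].sum = pool[lc].sum + pool[rc].sum;
  }
  return i;
}
```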
Implicit tree structure
Binary heap representation (BFS layout)
Let the root node's index (BFS index) be 1. For any node i, its left and right children are 2*i and 2*i+1 respectively.
This representation wastes index 0. I prefer zero-based numbering (Edsger W. Dijkstra: Why numbering should start at zero). However, the segment tree is an exception. The reason is that if you have the index of a leaf, you can subtract n from it to get the corresponding array index, while with zero-based numbering you need to subtract n-1, which is slightly inconvenient. In addition, getting the index of an ancestor node needs just a right shift, while with zero-based numbering you need one extra plus and one extra minus. The inconvenience would add to the complexity of lazy propagation.
Typically n is a power of 2, and the segment tree is a full binary tree. If n is not a power of 2, we can pad the length by adding tail sentinel elements. In the worst case, when n=2^k+1, we need to pad 2^k-1 elements. The space consumption is roughly 4*n.
1:[0,16)  
2:[0,8)  3:[8,16)  
4:[0,4)  5:[4,8)  6:[8,12)  7:[12,16)  
8:[0,2)  9:[2,4)  10:[4,6)  11:[6,8)  12:[8,10)  13:[10,12)  14:[12,14)  15:[14,16)  
16:0  17:1  18:2  19:3  20:4  21:5  22:6  23:7  24:8  25:9  26:10  27:11  28:12  29:13  30:14  31:15 
When n is not a power of two, this representation can still be used, but note that some indexes besides index 0 will be unused. If we set the middle point to floor((l+r)/2), this approach uses no less space than tail padding. In the visual representation below for n=7, indexes 8 and 9 are unused.
1:[0,7)  
2:[0,3)  3:[3,7)  
4:0  5:[1,3)  6:[3,5)  7:[5,7)  
10:1  11:2  12:3  13:4  14:5  15:6 
```cpp
void build(int i, int l, int r) {
```
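A possible completion of `build` for this layout (array names are mine): node i covers [l,r), and its children split at floor((l+r)/2).

```cpp
#include <cassert>

const int N = 16;
long long a[N], sum[4 * N];  // roughly 4n space, as noted above

void build(int i, int l, int r) {
  if (r - l == 1) { sum[i] = a[l]; return; }
  int m = (l + r) / 2;
  build(2 * i, l, m);      // left child covers [l,m)
  build(2 * i + 1, m, r);  // right child covers [m,r)
  sum[i] = sum[2 * i] + sum[2 * i + 1];
}
```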
If we set the middle point to floor((l+r+1)/2), we can use less space. The exact space is not easy to compute, but the last index grows very fast as n increases from 2^k+1 to 2^(k+1). So it is not worth the trouble writing l+r+1 >> 1.
1:[0,7)
There is another variant when n is not a power of 2, which will be detailed below when introducing the bottomup open interval implementation.
l+r (inorder layout)
This is an alternative representation for a full binary tree. We can compute the node index by adding its left and right boundaries (inorder traversal indexing).
8:[0,8)  
4:[0,4)  12:[4,8)  
2:[0,2)  6:[2,4)  10:[4,6)  14:[6,8)  
1:0  3:1  5:2  7:3  9:4  11:5  13:6  15:7 
```cpp
void build(int l, int r) {
```
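A sketch of `build` for the l+r numbering (array names are mine), assuming n is a power of two: the node covering [l,r) is stored at index l+r.

```cpp
#include <cassert>

const int N = 8;
long long a[N], val[2 * N];

void build(int l, int r) {
  if (r - l == 1) { val[l + r] = a[l]; return; }  // leaf [l,l+1) at index 2l+1
  int m = (l + r) / 2;
  build(l, m);
  build(m, r);
  val[l + r] = val[l + m] + val[m + r];  // combine children [l,m) and [m,r)
}
```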
This simple rule works because no two nodes share the same l+r.
However, if n is not a power of two, we may have a non-leaf node whose l+r is odd, say, 0+5. The leftmost leaf of its right subtree has the same index: (l+r-1)/2 + (l+r+1)/2. Note that such a collision only happens between a leaf node and an internal node.
A naive approach to avoiding the collision is to special-case the index computation for leaf nodes. The space consumption is 3*n, though some indexes are unused.
```cpp
int id(int l, int r) {
```
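A guess at this special-casing scheme (the original snippet is truncated, so this encoding is mine): internal nodes keep index l+r, while leaves move to the disjoint range [2n, 3n), avoiding the collision described above.

```cpp
#include <cassert>

const int N = 7;  // array length

// Internal nodes have l+r <= 2*N-2 < 2*N; leaves occupy [2*N, 3*N).
int id(int l, int r) { return r - l > 1 ? l + r : 2 * N + l; }
```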
A better encoding costs 2*n space.
```cpp
int id(int l, int r) {
```
Alternatively, if we make the tree left leaning, we can use:
```cpp
int id(int l, int r) {
```
In any case, the l+r based numbering schemes are not cache friendly. For the binary heap representation, the indexes on one level are contiguous; we can imagine that the top few levels are shared across queries and need fewer cache blocks.
Implementation
Top-down
Described when introducing the representations.
Bottom-up, open interval (recommended)
To the best of my knowledge, it was first discovered by 张昆玮 in 2010 (《统计的力量——线段树全接触》). 张昆玮's representation uses open intervals. I used to find it inconvenient in two places:

- the implementation needs two sentinel elements 0 and n-1 (n-2 >= original length).
- the implementation uses open intervals. I prefer half-open intervals.
However, the sentinel elements are not really needed and open intervals turn out to be an advantage. Read on.
The segment tree maintains an array of length n where n is not necessarily a power of two. 0 and n-1 are not necessarily sentinel elements. (In 张昆玮's code, 0 and n-1 are sentinel elements. There may be other padding elements at the end.)
For a modification or query [l,r), we have 0<=l && r<=n. (In 张昆玮's code, we have 0<l && r<n because 0 and n-1 are sentinel elements and thus cannot be included in modifications or queries.)
Let's read a full program. The program reads n integers, then either computes a[l]+a[l+1]+...+a[r-1] or adds an integer to a[l],a[l+1],...,a[r-1].
```cpp
// The program can be easily adapted to other modifications and queries.
```
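To isolate the index walk discussed below, here is a reduced sketch (names are mine) with point updates and range max queries, i.e. no lazy propagation. n need not be a power of two; the leaf for a[i] is node n+i even when leaves sit on two levels.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
using namespace std;

const int N = 5;
int n = N;
long long mx[2 * N];

void update(int i, long long v) {  // a[i] = v
  for (mx[i += n] = v, i >>= 1; i > 0; i >>= 1)
    mx[i] = max(mx[2 * i], mx[2 * i + 1]);
}

long long query_max(int l, int r) {  // max of a[l..r-1]; requires l < r
  long long res = LLONG_MIN;
  // After the two additions, (l, r) is an open interval of node indexes.
  for (l += n - 1, r += n; l ^ r ^ 1; l >>= 1, r >>= 1) {
    if (~l & 1) res = max(res, mx[l ^ 1]);  // l is a left child: take its sibling
    if (r & 1) res = max(res, mx[r ^ 1]);   // r is a right child: take its sibling
  }
  return res;
}
```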
After l += n-1, r += n, l and r have changed semantics from array indexes to leaf node indexes. They now represent an open interval of node indexes. The two initial node indexes are not on the same level if:

- n is a power of two and l==0 || r==n
- n is not a power of two: the first 2^ceil(log2(n))-n elements are one level above the rest.

(If we use 张昆玮's representation, the two initial nodes are guaranteed to be on the same level.)
In particular, when n is not a power of two, not all leaf nodes (which store array elements) are on the same level. Say n=13: while a[3]~a[12] are stored in leaf nodes 16~25, a[0]~a[2] are stored in nodes 13~15, which are one level above. Indexes 26~31 are unused and not considered nodes. The initial r may be the invalid 26, but the program will never access it because 26 is even.
1:[3,13),[0,3)  
2:[3,11)  3:[11,13),[0,3)  
4:[3,7)  5:[7,11)  6:[11,13),0  7:[1,3)  
8:[3,5)  9:[5,7)  10:[7,9)  11:[9,11)  12:[11,13)  13:0  14:1  15:2  
16:3  17:4  18:5  19:6  20:7  21:8  22:9  23:10  24:11  25:12 
Let's discuss the position of the two initial nodes and how they affect the final values of l and r when the loop condition l^r^1 evaluates to false:

- If initial l and r are in subtree 2, we have l>=4 after the main loop.
- If initial l and r are in subtree 3, we have l>=6 after the main loop.
- If initial l is in subtree 2 and initial r is in subtree 3, we have l==2 && r==3 after the main loop.
- (Only if the two initial nodes are on different levels) If initial l is in subtree 3 and initial r is in subtree 2, we have l==0 && r==1 after the main loop. In a previous iteration, we have l==1 && 2<=r && r<=4. (r==4 can only happen when n is a power of two and the initial interval is the full interval.)

From the last point, we can see that if the initial nodes are not on the same level, there are some subtleties. Fortunately the program correctly handles such cases without extra code.
Let's analyze some concrete examples when the initial nodes are not on the same level:
Say n=8 and the query is [0,8)
. We have l=7 and r=16. The first iteration in the main loop does not update any node. l becomes 3 and r becomes 8. The second iteration does not update any node. l becomes 1 and r becomes 4. The third iteration does not update any node. l becomes 0 and r becomes 2. The fourth (last) iteration updates node 1. l is still 0 and r becomes 1. The loop condition evaluates to false.
Let's look at another example [0,7)
. We have l=7 and r=15. The first iteration updates node 14. l becomes 3 and r becomes 7. The second iteration updates node 6. l becomes 1 and r becomes 3. The third (last) iteration updates node 2. l becomes 0 and r becomes 1. The loop condition evaluates to false.
When adding an integer to a subtree, we need to know the number of elements in the subtree to compute its influence on the monoid sum. If n is a power of two, we can derive the number of elements in a subtree easily. In the program, we introduce the array num[]: we initialize nodes [n,2*n) to have 1 element each, and propagate the numbers up.
For many other modifications, we don't need the number of elements in a subtree. When n is not a power of two, there is no extra bookkeeping.
What does the node at index one represent?
When n is a power of two, the node at index one (root) stores the monoid sum of the whole array. Top-down traversals are straightforward. For example, suppose the array holds 0/1 elements and we want to find the k-th 1 (the select operation in an order statistic tree). We can start from node 1 (it represents the whole array) and walk down. If subtree 2 (which corresponds to elements 0 to n/2-1) has more than k ones, we go to subtree 2, otherwise to subtree 3. Repeat this process until we reach a leaf.
```cpp
// n is a power of two.
```
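A sketch of this top-down walk (names are mine): cnt[i] counts the ones in subtree i, n is a power of two, and we search for the k-th 1 (0-indexed).

```cpp
#include <cassert>

const int N = 8;
int n = N, cnt[2 * N];

void build(const int* a) {
  for (int i = 0; i < n; i++) cnt[n + i] = a[i];
  for (int i = n - 1; i > 0; i--) cnt[i] = cnt[2 * i] + cnt[2 * i + 1];
}

int select1(int k) {  // requires k < cnt[1]
  int i = 1;
  while (i < n) {
    if (k < cnt[2 * i]) i = 2 * i;             // enough ones in the left subtree
    else { k -= cnt[2 * i]; i = 2 * i + 1; }   // skip the left subtree's ones
  }
  return i - n;  // leaf node index back to array index
}
```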
When n is not a power of two, the node at index one stores the monoid sum of a permutation of the whole array. If the query operation is commutative (e.g. sum/min/max), we can still query node 1; otherwise we need to visit multiple full binary subtrees with increasing array indexes and combine their monoid sums. Subtree 2 no longer aggregates elements 0 to n/2-1 and subtree 3 no longer aggregates elements n/2 to n-1, so the above top-down walk code would be incorrect.
As a rule of thumb, pad the original array if you need top-down walks or want the node at index 1 to hold the monoid sum of the original array.
Bottom-up, half-open interval
In 2011, Dmitry Urbanovich wrote https://codeforces.com/blog/entry/1256 and introduced an implementation dealing with closed intervals. There is a great writeup https://codeforces.com/blog/entry/18051 by Oleksandr Bacherikov using halfopen intervals.
```cpp
// Different from the open interval implementation.
```
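The half-open skeleton, in the shape popularized by the second write-up (a sketch with point updates and range sum; names are mine):

```cpp
#include <cassert>

const int N = 10;
int n = N;
long long t[2 * N];

void update(int i, long long v) {  // a[i] = v
  for (t[i += n] = v; i > 1; i >>= 1) t[i >> 1] = t[i] + t[i ^ 1];
}

long long query(int l, int r) {  // sum of a[l..r-1]
  long long res = 0;
  for (l += n, r += n; l < r; l >>= 1, r >>= 1) {
    if (l & 1) res += t[l++];  // l is a right child: it is inside the interval
    if (r & 1) res += t[--r];  // r is a right child: r-1 is inside the interval
  }
  return res;
}
```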
For a while, I used the following style which is similar to Oleksandr's.
```cpp
void modify(int l, int r, int v) {
```
I hope you have noticed the pain point now. mconcat(l-1, k) in the main loop is a bit unnatural. The bottom-up propagation at the end is particularly error-prone: two mconcat calls instead of one are needed due to a case which cannot be handled elegantly by the loop condition l < r. (If you convert the open interval code to the half-open interval style, you will note that the problem can be fixed by using (l-1)^r^1, but then we lose elegance.)
Say n=4 and we need to update [1,3). We have l=5 and r=7. After applying to node 5 and node 6, we get l=r=3. The extra propagation from l is to ensure node 2 is properly updated.
If we use open intervals, we have l=4 and r=7 initially. After applying to node 5 and node 6, we get l=2 and r=3. Even if r-l == 1, (l^r^1) != 0! This means with open intervals and the loop condition l^r^1, the main loop gets executed one more time, which ensures node 2 and node 3 are updated. This is a big difference. In addition, with open intervals, we can compute the parent node index with l>>1 instead of l = (l+1)>>1.
The main observation of Oleksandr Bacherikov is that n does not need to be a power of two.
Lazy propagation
In both top-down and bottom-up implementations, we call untag. Now let's figure out why untag is needed.
Say n=16, the query range is [1,13)
. The program will visit the green nodes.
1:[0,16)  
2:[0,8)  3:[8,16)  
4:[0,4)  5:[4,8)  6:[8,12)  7:[12,16)  
8:[0,2)  9:[2,4)  10:[4,6)  11:[6,8)  12:[8,10)  13:[10,12)  14:[12,14)  15:[14,16)  
16:0  17:1  18:2  19:3  20:4  21:5  22:6  23:7  24:8  25:9  26:10  27:11  28:12  29:13  30:14  31:15 
However, if there were previous modifications to the ancestors of these nodes, the to-be-visited nodes may not have up-to-date monoid sums. There is a generic lazy propagation approach and a simpler approach for commutative modifications. Let's consider the generic approach first.
We need to call untag on the leftmost and rightmost elements in the interval to ensure the monoid sums are up-to-date.
1:[0,16)  
2:[0,8)  3:[8,16)  
4:[0,4)  5:[4,8)  6:[8,12)  7:[12,16)  
8:[0,2)  9:[2,4)  10:[4,6)  11:[6,8)  12:[8,10)  13:[10,12)  14:[12,14)  15:[14,16)  
16:0  17:1  18:2  19:3  20:4  21:5  22:6  23:7  24:8  25:9  26:10  27:11  28:12  29:13  30:14  31:15 
1:[0,16)  
2:[0,8)  3:[8,16)  
4:[0,4)  5:[4,8)  6:[8,12)  7:[12,16)  
8:[0,2)  9:[2,4)  10:[4,6)  11:[6,8)  12:[8,10)  13:[10,12)  14:[12,14)  15:[14,16)  
16:0  17:1  18:2  19:3  20:4  21:5  22:6  23:7  24:8  25:9  26:10  27:11  28:12  29:13  30:14  31:15 
Say we have applied f on a[0], next g on a[0,2), then h on a[0,4), and finally m on a[0,8). The final value of a[0] is therefore m(h(g(f(a[0])))). Note that the outer functions apply to larger intervals.
1:[0,16) identity
Say we want to apply another operation f' to a[0]. If the operations commute, i.e. f'(m(h(g(f(a[0]))))) = m(h(g(f'(f(a[0]))))), we may ignore m, h and g on the ancestor nodes and apply f' directly to a[0], essentially composing f' and f.
1:[0,16) identity
Then the next a[0] retrieval query will return m(h(g(f'(f(a[0]))))). The classical example of this class is the add operation. See the next section.
However, if the modification operations are not commutative, we cannot ignore m, h and g. We have to propagate operations downwards. This process is usually called lazy propagation. When the operations stored in ancestor nodes are all the identity, we can compose f and f', because the identity commutes with any operation.
1:[0,16) identity
Commutative modifications and query monoid
If both the modifications and the query monoid are commutative, we can apply modifications in a bottom-up manner and thus omit untag calls. The query function needs an update: unlike the untag implementation, after l^r^1 evaluates to false, we should keep bubbling up to collect the modifications recorded in ancestor nodes.
Here is an example of range add and range sum.
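A self-contained sketch of this approach (all names are mine, and this is one workable scheme rather than the article's exact code): tag[i] is the pending add applied to every element under node i; sum[i] is the subtree sum including tags at i and below, but not tags above i. Modifications rebuild the ancestors of the two boundary leaves; queries keep bubbling up past the main loop to collect ancestor tags, weighted by num[] element counts.

```cpp
#include <cassert>
#include <vector>
using namespace std;

struct SegTree {
  int n;
  vector<long long> sum, tag, num;
  SegTree(const vector<long long>& a) : n(a.size()), sum(2*n+1), tag(2*n+1), num(2*n+1) {
    for (int i = 0; i < n; i++) { sum[n+i] = a[i]; num[n+i] = 1; }
    for (int i = n-1; i > 0; i--) {
      sum[i] = sum[2*i] + sum[2*i+1];
      num[i] = num[2*i] + num[2*i+1];
    }
  }
  void add(int l, int r, long long v) {  // a[i] += v for l <= i < r
    if (l >= r) return;
    int ll = l + n, rr = r - 1 + n;  // leaves of a[l] and a[r-1]
    for (l += n-1, r += n; l ^ r ^ 1; l >>= 1, r >>= 1) {
      if (~l & 1) { sum[l^1] += v * num[l^1]; tag[l^1] += v; }
      if (r & 1)  { sum[r^1] += v * num[r^1]; tag[r^1] += v; }
    }
    // Rebuild the stale ancestors along both boundary leaves.
    for (ll >>= 1; ll > 0; ll >>= 1) sum[ll] = sum[2*ll] + sum[2*ll+1] + tag[ll] * num[ll];
    for (rr >>= 1; rr > 0; rr >>= 1) sum[rr] = sum[2*rr] + sum[2*rr+1] + tag[rr] * num[rr];
  }
  long long query(int l, int r) {  // sum of a[l..r-1]
    long long res = 0, cl = 0, cr = 0;  // element counts collected on each side
    for (l += n-1, r += n; l ^ r ^ 1; l >>= 1, r >>= 1) {
      res += tag[l] * cl + tag[r] * cr;  // pending adds above collected nodes
      if (~l & 1) { res += sum[l^1]; cl += num[l^1]; }
      if (r & 1)  { res += sum[r^1]; cr += num[r^1]; }
    }
    res += tag[l] * cl + tag[r] * cr;
    for (cl += cr, l >>= 1; l > 0; l >>= 1) res += tag[l] * cl;  // keep bubbling up
    return res;
  }
};
```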
Combining monoid sum and lazy tag
This further optimizes the commutative-modification case. 《统计的力量——线段树全接触》 names this technique 标记永久化 ("permanent tags").
For range add and range min/max, we can store one value instead of two in an inner node. Without loss of generality, say the query type is range min. The stored value is defined as min_value(i) - min_value(parent(i)).
```cpp
void mconcat(int l) {
```
Relations with other data structures
Fenwick tree
If the binary operation used by range queries supports subtraction, we can answer a [l,r) query with the difference of two prefix sums, sum[0,r) - sum[0,l), and therefore switch to a Fenwick tree (binary indexed tree). Unlike a segment tree (2n-1 nodes), a Fenwick tree needs exactly n elements.
Binary search tree
A segment tree maintains a static array. If the array is dynamic (insertion/deletion), we can switch to a binary search tree which encodes the array in its in-order traversal. For range queries, a splay tree or a treap (with join-based tree algorithms) is common.
A binary search tree needs only n nodes, but it has larger space consumption because each node stores left and right child pointers, plus potential extra space for self-balancing.
Application
- Maintain an array with value updates
- Change the role of values to indexes; maintain n (persistent) segment trees with frequency as value
- Euler-tour technique
- ...