Data_Structure
Asymptotic Notation
Time & space complexities
顺序结构:直接相加
循环中:复杂度=一次循环的复杂度x循环次数
嵌套循环中:循环规模的乘积x一次循环的复杂度
if/else语句:选其中复杂度最高的    

O上界,$\Omega$ 下界, $\theta$ 相等
斐波那契数列:T N=T N-1 + T N-2
空间复杂度O(N), 时间复杂度:O(2^N)
注意这里有$(logN)^2$ Is O(N) is True. 因为这里它O是指上界,这里$O(N^2)$也是可以的
$logN=2log(\sqrt{N} )<2\sqrt{N}$
$(logN)^2<4N=O(N)$
最大子列和
| 1 | { | 
Linked List
Array
| 操作 | 时间复杂度 | 
|---|---|
| 查找第K个 | O(1) | 
| 插入 | O(N) | 
| 删除 | O(N) | 
Linked List
千万注意题目里的是否是List或者Array!
| 操作 | 时间复杂度 | 
|---|---|
| 插入 | O(1) | 
| 查找第K个 | O(N) | 
| 删除 | O(1) | 
Linked List Sort
| 1 | List BubbleSort(List L) | 
| 1 | typedef struct Node* Nodeptr; | 
Doubly Linked Circular Lists
| 1 | typedef struct node *node_ptr ; | 
Cursor Implementation
数组实现列表
一个数组元素起码包含2个数据,element 和 next
element是本身的数据,next是下一个结点的index
| 1 | struct ListNode{ | 
node[i].next是该元素下一个元素的索引值
node[i].element是该元素的数据
如果在索引是k的结点后面插入一个结点,新结点的索引是n
| 1 | p = node[k].next//记录下该结点后面一个结点的索引 | 
由于缺少内存管理,array很快就会满。
Stack
记得第一个可能是空head
FILO
直接数组实现!
Queue
FIFO
一开始rear指向front前一个元素,表示队列是空的。
插入第一个元素,rear和front指向第一个元素。
以后每插入一个元素,rear往后移动 rear++
注意进入的顺序,从rear(后面)进队列,从front(前方)出队列 front++
dequeue出队
enqueue入队
Circular Queue
注意判断full的条件 front == (rear + 1) % MAXSIZE 表示full了
rear=(++rear)%size
栈的形状是环形,首尾相邻
初始时,rear仍然指向front的前方
注意当入栈达到栈的最大数量,下一个将从0开始入栈
| 1 | 
 | 
Heap
 
heap is 完全二叉树+大顶堆或者小顶堆
N个元素放列表里找第k个 复杂度O(nlogn)
建堆 O(n) ,在原有的数组的基础上,再delete O(k(logn))
假如有N个节点,那么高度为H=log2(N) + 1,最后一层每个父节点最多只需要下调1次,倒数第二层最多只需要下调2次,顶点最多需要下调H次,而最后一层父节点共有2^(H-1)个,倒数第二层公有2^(H-2),顶点只有1(2^0)个,所以总共的时间复杂度为s = 1 * 2^(H-1) + 2 * 2^(H-2) + … + (H-1) * 2^1 + H * 2^0
将H代入后s= 2N - 2 - log2(N),近似的时间复杂度就是O(N)。
| 1 | 
 | 
千万注意,这里的是一次性给出数组,还是一个一个insert数字!看仔细!
| 1 | { | 
B树,B+树
tree
n nodes with n-1 edge
siblings :children of the same parent
length of path :number of edges on the path.
depth of ni length of the unique path from the root to ni.
Depth(root) = 0.
树的高度和深度是相同的。
Depth: 从根节点到当前结点的路径长度,根节点的深度为0.
Height:从叶子结点到当前结点的路径长度,叶子结点的高度为0.空节点的高度为-1
深度为K的二叉树最多拥有 $2^{k+1}-1$个节点
Expression Trees (syntax trees)
Threaded Binary Trees

Dynamic Equivalence Problem
查并集问题
| 1 | //int pre[1000]; (数组长度依题意而定)。这个数组记录了每个人的上级是谁。这些人从0或1开始编号(依题意而定)。比如说pre[16]=6就表示16号的上级是6号。如果一个人的上级就是他自己,那说明他就是教主了,查找到此结束。也有孤家寡人自成一派的,比如欧阳锋,那么他的上级就是他自己。 | 
| 1 | void join(int x,int y) //我想让虚竹和周芷若做朋友 | 
压缩路径
合并的时候直接加到根上,貌似find到时候也得加到根上去。
| 1 | int find(int x) //查找结点 x的根结点 | 
加权标记法:union-by-size
Union(N,N-1):N-1挂在N的下面,树高的作为底,合并树比较低的那个:**– Always change the smaller tree**
S [ Root ] = – size; /* initialized to be –1 */ 表示这棵树有多少结点
| 1 | void smartunion(int x,int y) | 
| 1 | const int N=1005 //指定并查集所能包含元素的个数(由题意决定) | 
Graph
Strongly connected directed graph G ::= for every pair of **vi **and **vj **in V( G ), there exist directed paths from **vi **to vj and from vj to vi. If the graph is connected without direction to the edges, then it is said to be weakly connected
计算点的度数:即链表的结点数。(第一个结点不算)
往往会这样实现:
| 1 | typedef struct AdjVNode *PtrToAdjVNode; | 
另一种方法是结点记录边:
结点的中记录边的两个点,以及该点的下一条边结点
| 1 | typedef struct AdjENode{ | 
| 1 | void Topsort( Graph G ) | 
topological order
拓扑排序!这里是in==0的时候enqueue
| 1 | void Topsort(Graph G) | 
If a graph has a topological sequence, then its adjacency matrix must be triangular.
错的,在无向图中不一定
If precedes in a topological sequence, then there must be a path from to .
错的,不一定有
If the adjacency matrix is triangular, then the corresponding directed graph must have a unique topological sequence.
错的,可以举出反例
Dijkstra算法
最大流量图
Dinic算法
| 1 | int bfs(int s, int t) | 
最小生成树
prim算法
先选定一个点,然后压入栈中,然后遍历所有相邻的点,选取下一个点(边最小的)。。。
Kruskal算法
结构体存两端顶点和边的value,然后排序,从小到大,利用查并集进行归类,注意,Kruskal算法中,uniqueness问题要注意,这里若两个边长相同,且两个顶点分别属于不同的集合,则不唯一。
强连通有向图:任意两个顶点之间存在有向的路径可以到达
图论中一些难以判断的结论(都是对的)
- If e is the only shortest edge in the weighted graph G, then e must be in the minimum spanning tree of G.
- If the BFS sequence of a graph is 1234…, and if there is an edge between vertices 1 and 4, then there must be an edge between the vertices 1 and 3.
- In a directed graph G with at least two vertices, if DFS from any vertex can visit every other vertices, then the topological order must NOT exist.
- Suppose that a graph is represented by an adjacency matrix. If there exist non-zero entries in the matrix, yet all the entries below the diagonal are zeros, then this graph must be a directed graph.
- Kruskal’s algorithm is to grow the minimum spanning tree by adding one edge, and thus an associated vertex, to the tree in each stage.
Biconnectivity
articulation point
v is an articulation point if G’ = DeleteVertex( G, v ) has at least 2 connected components.
就是删掉一个点,回产生两个联通图!connectivity
G is a biconnected graph if G is connected and has no articulation points.
The root is an articulation point iff it has at least 2 children
Any other vertex u is an articulation point iff u has at least 1 child and it is impossible to move down at least 1 step and then jump up to u’s ancestor

biconnected subgraph
A biconnected component is a maximal biconnected subgraph.
联通分量是最大的联通子图


注意这里的low就是自己的祖先的编号或者自己的编号的较小者,所以父亲节点不一定比儿子节点的low小!
Euler circuit
遍历图G中的每一条路径(每条路Edge仅通过一次,不是Vertex)(闭合回路!)
An Euler circuit is possible only if the graph is connected and each vertex has an even degree.
- 无向图存在欧拉回路,当且仅当该图的所有顶点度数都为偶数且连通 
- 有向图存在欧拉回路,当且仅当每一个点的出度等于入度且图要连通 
Euler tour
遍历图G中的每一条路径(每条路Edge仅通过一次,不是Vertex)(不是闭合的回路!)
An Euler tour is possible if there are exactly two vertices having odd degree. One must start at one of the odd-degree vertices.
Hamilton cycle
恰好通过图G的每个Vertex一次
T = O( |E| + |V| )
Sort

Stable: insert,bubble,merge
Insert
从尾巴开始往前插入
| 1 | void InsertionSort ( ElementType A[ ], int N ) | 
The worst case:Input A[ ] is in reverse order. T( N ) = O( N^2 ) 逆序排列
The best case:Input A[ ] is in sorted order. T( N ) = O( N) 正序排列
inversion
An inversion in an array of numbers is any ordered pair ( i, j ) having the property that i < j** but **A[i] > A[j].
Input list 34, 8, 64, 51, 32, 21 has 9 inversions.
(34, 8) (34, 32) (34, 21) (64, 51) (64, 32) (64, 21) (51, 32) (51, 21) (32, 21)
- The average number of inversions in an array of N distinct numbers is N ( N - 1 ) / 4 
- Any algorithm that sorts by exchanging adjacent elements requires $\Omega$ ( **N^2 **) time on average. 
Shell
| 1 | void Shellsort( ElementType A[ ], int N ) | 
The worst-case running time of Shellsort, using Shell’s increments, is $\theta$ ( N^2 ).
注意,本质上就是分块进行Insert排序,这里就是交换的排序,最差的时间复杂度都是O(N^2)
Heap
build heap;O(N)
delete min;O(logN)重复N 次
total: T ( N ) = O ( N log N )
The average number of comparisons used to heapsort a random permutation of N distinct items is 2N log N - O( N log log N ) .
Merge
Mergesort requires linear extra memory, and copying an array is slow. It is hardly ever used for internal sorting, but is quite useful for external sorting.
T ( N ) = 2T ( N / 2 ) + O( N ) = O( N + N log N ) = O(NlogN)
QuickSort
A[ ] is presorted – quicksort will take O( N^2 ) time to do nothing
Worst case:O(N^2)
Best and average case: O(NlogN)
Median-of-Three Partitioning:Pivot = median ( left, center, right )
| 1 | ElementType Median3( ElementType A[ ], int Left, int Right ) | 
千万看仔细!前面部分的是i,后面部分的是j,然后swap的是A[i]与A[Right-1]!也就是最终i停下的地方变成pivot!!!!!
Decision Tree
Any algorithm that sorts by comparisons only must have a worst case computing time of 下界( N log N ).
Bucket Sort
linear time
Radix Sort
基数排序
LSD(Least significant digital):排序方式由数值的最右边(低位)开始
MSD(Most significant digital):由数值的最左边(高位)开始。
LSD
Sort according to the Least Significant Digit first.
注意看这里,是按照后几位排序的,比如125–>27–>729,按照25 27 29排序!

MSD
( Most Significant Digit ) Sort
Hash

Without overflow, Tsearch = Tinsert = Tdelete = O( 1 )
A collision occurs when we hash two nonidentical identifiers into the same bucket, i.e.  f ( i1 ) = f ( i2 ) when i1 不等于 i2 .
An overflow occurs when we hash a new identifier into a full bucket.
Separate Chaining
开散列
Open Addressing
Generally $\lambda$ < 0.5.
Linear Probing
- 冲突就+1
Quadratic Probing
- Insertion will be seriously slowed down if there are too many deletions intermixed with insertions.
Double Hashing
冲突后就加上i*hash2( x )
hash2( x ) = R – ( x % R )
Rehashing
- As soon as the table is half full 
- When an insertion fails 
- When the table reaches a certain load factor 
Usually there should have been N/2 insertions before rehash, so O(N) rehash only adds a constant cost to each insertion. However, in an interactive system, the unfortunate user whose insertion caused a rehash could see a slowdown.
UNSURE

Apply DFS to a directed acyclic graph, and output the vertex before the end of each recursion. The output sequence will be:
reversely topologically sorted????output the vertex before the end of each recursion this means the postorder of DFS
During the sorting, processing every element which is not yet at its final position is called a “run”. To sort a list of integers using quick sort, it may reduce the total number of recursions by processing the small partion first in each run.
False ??
Which of the following statements about HASH is true?
the expected number of probes for insertions is greater than that for successful searches in linear probing method
if the table size is prime and the table is at least half empty, a new element can always be inserted with quadratic probing
in separate chaining method, if duplicate elements are allowed in the list, insertions are generally quicker than deletions
insertions需要查找两次!
all of the above
What is the major difference among lists, stacks, and queues?
Lists are linear structures while stacks and queues are not
Lists use pointers, and stacks and queues use arrays
- Stacks and queues are lists with insertion/deletion constraints(约束,限定) container 容器
Lists and queues can be implemented using circularly linked lists, but stacks cannot
PTA
HW1
The recurrent equations for the time complexities of programs P1 and P2 are:
- P1: T(1)=1,T(N)=T(N/3)+1
- P2: T(1)=1,T(N)=3T(N/3)+1
Then the correct conclusion about their time complexities is:
P1: O(logN)
P2: O(N)
The Fibonacci number sequence {FN} is defined as: F0=0, F1=1, The space complexity of the function which calculates FN recursively is:
Spae complexity is O(N)
Time complexity is O(FN)
HW2
consecutive连续的!!记住数组的内存是连续的!
For a sequentially stored linear list of length N, the time complexities for deleting the first element and inserting the last element are O(1) and O(N), respectively.
liner list 线性表=一种概念,即序列,可以用数组和链表实现,sequentiantlly表示顺序存储,即数组,删第一个是O(N)(所有元素向前移动一个),删最后一个是O(1),F
To merge two singly linked ascending lists, both with N nodes, into one singly linked ascending list, the minimum possible number of comparisons is:
显然是把一个接到另一个后面,即一张表的尾元素小于另一张表的首元素,最小比较一次。但是可能题目的意思是通常情况,从List1的第一个从List2的第一个比到最后一个,O(N)。
HW3
operators操作符
operands操作数
Represent a queue by a singly linked list. Given the current status of the linked list as 1->2->3 where x->y means y is linked after x. Now if 4 is enqueued and then a dequeue is done, the resulting status must be:
2->3->4
Queue FIFO
queue: {1,2,3} –>{1,2,3,4}–>{2,3,4}
Suppose that an array of size 6 is used to store a circular queue, and the values of front and rear are 0 and 4, respectively. Now after 2 dequeues and 2 enqueues, what will the values of front and rear be?
2 and 0
0 1 2 3 4 5
Front 0–>1–>2
rear 4–>5–>0
HW4
结点拥有的子树数称为结点的度Degree
If a general tree T is converted into a binary tree BT, then which of the following BT traversals gives the same sequence as that of the post-order traversal of T?
In-order traversal
Attention: you can make example for preorder, in_order,post_order

HW5
Among the following binary trees, which one can possibly be the decision tree (the external nodes are excluded) for binary search?
1-10共10个数,然后按照n+m/2取上整的顺序插入排序。决策树!
HW6
Insert {5, 2, 7, 3, 4, 1, 6} one by one into an initially empty min-heap. The preorder traversal sequence of the resulting tree is
1, 3, 5, 4, 2, 7, 6
attention Insert!!!we need to build a min-heap whenever
In a max-heap with n (>1) elements, the array index of the minimum key x may be __.
⌊n/2⌋+2
If a d-heap is stored as an array, for an entry located in position i, the parent, the first child and the last child are at
⌊(i+d−2)/d⌋, (i−1)d+2, and id+1
HW7
In Union/Find algorithm, if Unions are done by size, the depth of any node must be no more than N/2, but not O(logN).
false H<=log N+1
Let T be a tree created by union-by-size with N nodes, then the height of T can be .
at most log2(N)+1
这里的Union by size是指当一个高一个低时,低的并到高的树里,若两个树相同,则高度加一,所以树最高为log2(N)+1,准确来说应该是log2(N-1)+1
| 1 | SetType Find ( ElementType X, DisjSet S ) | 
A relation R is defined on a set S. If for every element e in S, “e R e“ is always true, then R is said to be __ over S
reflexive
HW8
If graph G is NOT connected and has 35 edges, then it must have at least ____ vertices.
因为C(9,2)=36,即9个顶点的连通图最多有36条边,35条边也能符合。C(8,2)=28,即8个顶点的联通图最多有28条边,显然是达不到35条边的,因此如果是不连通的,必须加上一个独立的顶点,9 + 1 = 10
A graph with 90 vertices and 20 edges must have at least __ connected component(s).
70??就是相当于69个独立的点,加上1个21个点的连通图,就是最少是69+1=70
20*2=40 50+20=70?
directed graph 就是有相图,有表示方向
Complete graph:完全图,拥有最大的边数
connected graph联通图:最少边的就是树,任意两个点都相互联通。
Strongly connected:强联通,对于有向图,任意两个顶点必须有路径使得v->w,w->v
Weekly connected:对于有向图,其无向图模式满足联通图
(Connected) Component of an undirected G the maximal connected subgraph.
HW9
In a weighted undirected graph, if the length of the shortest path from b to a is 12, and there exists an edge of weight 2 between c and b, then the length of the shortest path from c to a must be no less than 10.
true 若有小于10的话,则距离肯定小于12!
HW10
The minimum spanning tree of any weighted graph ____
may not exist 可以有很多component
HW11
Apply DFS to a directed acyclic graph, and output the vertex before the end of each recursion. The output sequence will be:
reversely topologically sorted????output the vertex before the end of each recursion this means the postorder of DFS
For a graph, if each vertex has an even degree or only two vertexes have odd degree, we can find a cycle that visits every edge exactly once
false Euler path 不是circle
After the first run of Insertion Sort, it is possible that no element is placed in its final position.
True! 区别Insert and select
| 1 | void insertion_sort(int arr[],int len){ | 
HW12
To sort { 8, 3, 9, 11, 2, 1, 4, 7, 5, 10, 6 } by Shell Sort, if we obtain ( 4, 2, 1, 8, 3, 5, 10, 6, 9, 11, 7 ) after the first run, and ( 1, 2, 3, 5, 4, 6, 7, 8, 9, 11, 10 ) after the second run, then the increments of these two runs must be __ , respectively.
3 and 2,距离相差两个
To sort N elements by heap sort, the extra space complexity is:
O(1)
HW13
During the sorting, processing every element which is not yet at its final position is called a “run”. To sort a list of integers using quick sort, it may reduce the total number of recursions by processing the small partion first in each run.
False ??
Among the following sorting methods, which ones will be slowed down if we store the elements in a linked structure instead of a sequential structure?
- Insertion sort; 2. Selection Sort; 3. Bubble sort; 4. Shell sort; 5. Heap sort
Internal sort
内部排序是排序的基础,在排序的过程中,把所有元素调到内存中进行排序,称之为内部排序。
External sort
在数据量大的时候,只能分块排序,但是块和块排序不能保证有序,外排序用读写次数来衡量其效率。
外部排序常用归并,多路归并用的是胜利树和失败树
During the sorting, processing every element which is not yet at its final position is called a “run”. Which of the following cannot be the result after the second run of quicksort?
5, 2, 16, 12, 28, 60, 32, 72
2, 16, 5, 28, 12, 60, 32, 72
2, 12, 16, 5, 28, 32, 72, 60
5, 2, 12, 28, 16, 32, 72, 60
Least Signification Digit (LSD) radix sort
→7→321→28→331→33→34→46→156→57→63
HW14
Suppose that the numbers {4371, 1323, 6173, 4199, 4344, 9679, 1989} are hashed into a table of size 10 with the hash function h(X)=X%10, and hence have indices {1, 3, 4, 9, 5, 0, 2}. What are their indices after rehashing using h(X)=X%TableSize with linear probing?
rehashing double the Tablesize10*2=20
and choose a prime bigger than 20 : 23, so tablesize is 23
1, 12, 9, 13, 20, 19, 11
HW15
The average search time of searching a hash table with N elements is:
search time :cannot be determined
Which of the following statements about HASH is true?
the expected number of probes for insertions is greater than that for successful searches in linear probing method
if the table size is prime and the table is at least half empty, a new element can always be inserted with quadratic probing
in separate chaining method, if duplicate elements are allowed in the list, insertions are generally quicker than deletions
all of the above
Final0
In hashing with open addressing to solve collisions, the operarion FIND will be seriously slowed down if there are too many deletions intermixed with insertions.
True
For a connected graph, if there are exactly two vertices having odd degree, we can find an Euler tour that visits every vertex exactly once by starting from one of its odd-degree vertices.
vertex wrong
edge yes
The average run time and the extra space of Quicksort for sorting n elements are O(nlogn) and O(1), respectively.
False
快排可以说需要的空间为O(1),因为是原地排序,不需要额外空间;也可以说需要的空间是O(n),因为在递归调用时有栈的开销,当然最坏情况是O(n),平均情况是O(logn)。
An inversion in an array A[ ] is any ordered pair (i,j) having the property that i<j but A[i]>A[j]. Given array A: {3,87,12,61,70,26,45},after the first partition of Quicksort with Median3 pivot selection, the number of inversions will be decreased by _____.
3 26 12 45 70 87 61!注意这里最后i是停在3点位置为61,j在2的位置,此时停止,而pivot是45,这里是交换i也就是45与61交换。
left-child right-sibling representation of tree - 左孩子右兄弟表示树
就是把兄弟变成自己的right son!
In hashig with open addressing method,rehashing is definitely necessary when __.
an insertion fails
the hash table is half full
primary clustering occurs
the hash function is not uniform
Final1
对一棵平衡二叉树,所有非叶结点的平衡因子都是0,当且仅当该树是完全二叉树。
False
区别满二树叉与完全二叉树
一棵深度为k且有$2^k-1$个结点的二叉树称为满二叉树
如果对满二叉树的结点进行编号, 约定编号从根结点起, 自上而下, 自左而右。则深度为k的, 有n个结点的二叉树, 当且仅当其每一个结点都与深度为k的满二叉树中编号从1至n的结点一一对应时, 称之为完全二叉树
对给定序列{ 110,119,7,911,114,120,122 }采用次位优先(LSD)的基数排序,则两趟收集后的结果为:
7, 110, 911, 114, 119, 120, 122
0 1 2 3 4 5 6 7 8 9
110 120 911 122 114 7 119
0 1 2 3 4 5 6 7 8 9
7 110 911 114 119 120 122
给定输入序列 {4371, 1323, 6173, 4199, 4344, 9679, 1989} 以及散列函数 h(X)=X%10。如果用大小为10的散列表,并且用分离链接法解决冲突,则输入各项经散列后在表中的下标为:(-1表示相应的插入无法成功)(2分)
1, 3, 3, 9, 4, 9, 9
设数字 {4371, 1323, 6173, 4199, 4344, 9679, 1989} 在大小为10的散列表中根据散列函数 h(X)=X%10得到的下标对应为 {1, 3, 4, 9, 5, 0, 2}。那么继续用散列函数 “h(X)=X%表长”实施再散列并用线性探测法解决冲突后,它们的下标变为:
1, 12, 9, 13, 20, 19, 11 首先表长翻倍=20 取最近的质数23,
DidTerm
If N numbers are stored in a singly linked list in increasing order, then the average time complexity for binary search is O(logN).
false , see the linked list! 是链表!所以都需要全部遍历一遍?
The Fibonacci number sequence {F**N} is defined as: F0=0, F1=1, FN=FN−1+FN−2, N=2, 3, …. The time complexity of the function which calculates F**N recursively is O(FN)
True ??
(logN)^2 is O(N).
True
The array representation of the disjoint sets is given by { 3, 1, -5, 2, 1, -3, -1, 6, 6 }. Keep in mind that the elements are numbered from 1 to 9. After invoking Union(Find(4), Find(8)) with union-by-size and path compression, how many elements will be changed in the resulting array?
union-by-size change 3 and 6, but path compression change 3!!! because the 7 will be change too!
注意!在Find(4)的时候,这里就要考path compression了,因为这一路径下的东西都要接到根节点上,4和2都直接挂到3下面

union-by-size
height<= 1+[logN] 
Always change the smaller tree
union-by-height
Always change the shallow tree 改变高度小的树
threaded tree
线索树,其实非常简单,首先写出inorder 或者postorder或者preorder的顺序,然后看左右subtree是否缺腿,如果缺腿则在上面遍历的表里连接左边或者右边的树!
What is the major difference among lists, stacks, and queues?
Lists are linear structures while stacks and queues are not
Lists use pointers, and stacks and queues use arrays
- Stacks and queues are lists with insertion/deletion constraints(分辨) container 容器
Lists and queues can be implemented using circularly linked lists, but stacks cannot
PAT
1064
1006
| 1 | int cmp(const void *a, const void *b) | 
