09-Lab 9:Time complexity. More practice on debugging and programming
1. Objectives
The purpose of this week's lab is to:
- Analyse time complexity of certain programs.
- Practice debugging code again.
- Practice more programming problems.
If you have any questions about any of the material in the previous labs, now is a good time to ask your tutor for help during the lab.
相关信息
If you do not have time to finish all exercises (in particular, the programming problems) during the lab time, you should continue working on them later.
2. Time complexity
As we have seen in the lecture last week, the big-O notation is used to denote the worst-case time complexity of an algorithm to solve a certain problem. If the problem size is nn and the algorithm needs numerical operations to solve it, where cc is a constant number (thus independent of ), we say that the algorithm has a linear time complexity . Here, we assume that each operation has a constant time complexity . (Otherwise, the algorithm won't have a linear complexity anymore.)
For example, the following function finds the index of an element in a sequence, (returning -1 if the element is not found):
def find_element(sequence, element):
for i in range(len(sequence)):
if sequence[i] == element:
return i
return -1It has time complexity of , where nn is the length of the sequence. This is because the main for loop needs nn comparisons in the worst-case, namely when the element we are looking for does not appear in the sequence.
As a rule of thumb, a for or while loop with nn iterations indicates that the code may have a complexity of . If there are two loops nested within each other, the code might have a quadratic time complexity of .
Let's look at an example: the following code computes the minimum absolute difference between any pairs of elements in a numerical sequence:
def smallest_difference(sequence):
min_diff = abs(sequence[0]-sequence[1])
for i in range(len(sequence)-1):
for j in range(i+1, len(sequence)):
diff = abs(sequence[i] - sequence[j])
if diff < min_diff:
min_diff = diff
return min_diffLet's analyse the time complexity of this code. Let nn be the length of the input sequence. The outer for loop has iterations using index . In the first iteration of the outer loop, i.e., when , the inner loop iterates in range(1,n), each doing four operations (subtraction, abs, comparison, and assignment), totalling operations. In the 2nd iteration of the outer loop (), the inner loop iterates over range(2,n), which means iterations. And so on. Summing all together, the total number of operations is:
Now, when is large enough, the term will dominate over the nn term. We can also ignore the constant multiplying this leading term. That means that the function smallest_difference has time complexity .
3. Exercise 1.0
Each of the following three functions takes as input an integer n. For each function, give its computational complexity in big-O notation in terms of n.
It is important you to format your answer exactly. If the algorithm has linear time complexity, you should answer:
If the algorithm has cubic time complexity, then either of the following are acceptable:
Other possible time complexities should be formatted similarly.
3.1 Question 1
def func_a(n):
total = 0
for i in range(n):
for j in range(n):
total = total + i * j
return total3.2 Question 2
def fun_b(n):
total = 0
for i in range(100 * n):
for j in range(10):
total = total + i * j
return total外层
100*n、内层常数10,常数忽略:
3.3 Question 3
def fun_c(n):
total = 0
for i in range(100):
for j in range(n):
total = total + i * j
for k in range(n):
total = total + i * k
for l in range(100):
total = total + i * l
return l外层常数
100;内有两个range(n)和一个常数range(100),总体仍线性于n:
4. Exercise 1.1: Efficient Implementation of smallest_difference
It turns out that there is another algorithm for smallest_difference with a time complexity of only . Can you figure it out? Write your solution in the scaffold file smallest_difference.py with that time complexity.
相关信息
After thinking about it and you still can't find it out, click the spoiler below for a hint!
Expand: What if the sequence is sorted? What is the time for sorting?
smallest_difference.py
def smallest_difference(sequence):
# TODO: Find the smallest difference
# between numbers in the sequence
# in worst-case O(n log(n)) time.
passdef smallest_difference(sequence):
"""
Return the smallest absolute difference between any two numbers in sequence.
Worst-case time: O(n log n) due to sorting; the scan is O(n).
Assumes len(sequence) >= 2. Raises ValueError otherwise.
"""
if len(sequence) < 2:
raise ValueError("sequence must contain at least two elements")
seq = sorted(sequence)
min_diff = abs(seq[1] - seq[0])
for i in range(2, len(seq)):
d = abs(seq[i] - seq[i - 1])
if d < min_diff:
min_diff = d
if min_diff == 0: # early exit: duplicate values
return 0
return min_diff5. Exercise 1.2: Largest Difference
Here is an implementation to compute the maximum absolute difference in a similar style to smallest_difference:
def largest_difference(sequence):
max_diff = abs(sequence[0]-sequence[1])
for i in range(1, len(sequence)):
for j in range(i):
diff = abs(sequence[i] - sequence[j])
if max_diff < diff:
max_diff = diff
return max_diffIt has a time complexity of . You can of course use the same idea from exercise 1.1 to reduce the complexity to .
However, it turns out there is another algorithm with linear time complexity: . Can you work it out? Provide a solution to the largest_difference function with that time complexity.
largest_difference.py
def largest_difference(sequence):
# TODO: Find the largest difference
# between numbers in the sequence
# in worst-case O(n log(n)) time
pass性质:最大绝对差值 = max(sequence) - min(sequence)(单次线性扫描即可)。
def largest_difference(sequence):
"""
Return the largest absolute difference between any two numbers in sequence.
Worst-case time: O(n).
Assumes len(sequence) >= 2. Raises ValueError otherwise.
"""
if len(sequence) < 2:
raise ValueError("sequence must contain at least two elements")
it = iter(sequence)
first = next(it)
min_val = max_val = first
for x in it:
if x < min_val:
min_val = x
elif x > max_val:
max_val = x
return max_val - min_val6. Exercise 1.3: Visualise Runtimes
We now want to plot the real running times of these algorithms on a computer. The provided program plots the running time for the smallest_difference solution.
Copy in your solution to exercise 1.1 and compare the graphs.
相关信息
To see the graphs on EdStem, enter the terminal and type in python plot_time.py to run the file.
Bonus (if you have extra time): compare the solution of the largest difference problem to the solution of the largest difference problem.
plot_time.py
import random
import time
import matplotlib.pyplot as plt
def plot_time(file_name, func, steps):
seq = [random.random() for i in range(steps)]
elapsed_times = []
for n in range(2,steps):
start_time = time.time()
# run 10 times to reduce fluctuation
for i in range(10):
func(seq[:n])
end_time = time.time()
elapsed_times.append((end_time - start_time)/10)
plt.figure()
plt.plot(range(2,steps),elapsed_times)
plt.xlabel("Size of Sequence")
plt.ylabel("Running Time (s)")
plt.savefig(file_name)
plt.close()
print("Total runtime: " + str(sum(elapsed_times)) + " seconds")
def smallest_difference_slow(sequence):
min_diff = abs(sequence[0]-sequence[1])
for i in range(len(sequence)-1):
for j in range(i+1, len(sequence)):
diff = abs(sequence[i] - sequence[j])
if diff < min_diff:
min_diff = diff
return min_diff
if __name__ == "__main__":
plot_time("smallest_difference_slow_300.png", smallest_difference_slow, 300)import random
import time
import matplotlib.pyplot as plt
def plot_time(file_name, func, steps):
seq = [random.random() for _ in range(steps)]
elapsed_times = []
for n in range(2, steps):
start_time = time.time()
# run 10 times to reduce fluctuation
for _ in range(10):
func(seq[:n])
end_time = time.time()
elapsed_times.append((end_time - start_time) / 10)
plt.figure()
plt.plot(range(2, steps), elapsed_times)
plt.xlabel("Size of Sequence")
plt.ylabel("Running Time (s)")
plt.savefig(file_name)
plt.close()
print("Total runtime:", sum(elapsed_times), "seconds")
def smallest_difference_slow(sequence):
# O(n^2)
if len(sequence) < 2:
raise ValueError("sequence must contain at least two elements")
min_diff = abs(sequence[0] - sequence[1])
for i in range(len(sequence) - 1):
for j in range(i + 1, len(sequence)):
diff = abs(sequence[i] - sequence[j])
if diff < min_diff:
min_diff = diff
return min_diff
def smallest_difference_fast(sequence):
# O(n log n): sort + scan neighbors
if len(sequence) < 2:
raise ValueError("sequence must contain at least two elements")
seq = sorted(sequence)
min_diff = abs(seq[1] - seq[0])
for i in range(2, len(seq)):
d = abs(seq[i] - seq[i - 1])
if d < min_diff:
min_diff = d
if min_diff == 0:
return 0
return min_diff
if __name__ == "__main__":
plot_time("smallest_difference_slow_300.png", smallest_difference_slow, 300)
plot_time("smallest_difference_fast_300.png", smallest_difference_fast, 300)7. Debugging problems
Learning to read, understand and debug code is a very important skill. There may be a debugging question in the final exam. So here are some debugging exercises to practice this skill.
7.1 Exercise 2.1
The following are attempts to define a function that takes three (numeric) arguments and checks if any one of them is equal to the sum of the other two. For example, any_one_is_sum(1, 3, 2) should return True (because 3 == 1 + 2), while any_one_is_sum(0, 1, 2) should return False.
However, the function below is incorrect.
- Find examples of arguments that cause it to return the wrong answer.
- Can you work out how they are intended to work? That is, what was the idea of the programmer who wrote them? What comments would be useful to add to explain the thinking? Is it possible to fix them by making only a small change to each function?
7.1.1 Question 1
def any_one_is_sum(a,b,c):
sum_c=a+b
sum_b=a+c
sum_a=b+c
if sum_c == a+b:
if sum_b == c+a:
if sum_a == b+c:
return True
else:
return False详情
现象:几乎对任意输入都返回
True,因为它把“和”的变量又和同一表达式比较(恒真),例如(1,2,10)本应False,却返回True。意图:应检查是否有一个数等于另两个数之和。
修正版(最小改动 + 明确逻辑):
def any_one_is_sum(a, b, c):
# True if ANY of: c == a + b, b == a + c, a == b + c
if c == a + b:
return True
if b == a + c:
return True
if a == b + c:
return True
return False7.1.2 Question 2
def any_one_is_sum(a,b,c):
if b + c == a:
print(True)
if c + b == a:
print(True)
else:
print(False)
return False详情
- 问题:使用
print而非return;第二个if的else只对应第二个判断;函数总是返回False。 - 例子:
(2,1,1)本应True,函数打印一些东西后仍返回False。
修正版(保持风格但正确返回):
def any_one_is_sum(a, b, c):
if a == b + c or b == a + c or c == a + b:
return True
else:
return False7.1.3 Question 3
def any_one_is_sum(a, b, c):
if a+b==c and a+c==b:
return True
else:
return False详情
- 问题:用了
and,要求同时满足两条等式,几乎不可能(除非都为 0)。 - 例子:
(1,3,2)本应True,却返回False。
修正版(把 and 改为 or):
def any_one_is_sum(a, b, c):
return (a + b == c) or (a + c == b) or (b + c == a)7.2 Exercise 2.2
Here is a function that is meant to return the position (index) of a given element in a sequence; if the element does not appear in the sequence, it returns the length of the sequence. For example, find_element([3,2,1,4], 1) should return 2, since that is the index where we find a 1.
def find_element(sequence, element):
i = 0
while sequence[i] != element:
if i < len(sequence):
i = i + 1
i = i + 1
return iHowever, the function is not correct. For some inputs it will cause a runtime error. Find an example of arguments that cause an error to occur. Can you correct the error without introducing another?
详情
- 错误示例:
find_element([1,2,3], 99)触发IndexError。原因:i每轮可能加两次,并且没有在访问前检查i < len(sequence)。 - 正确思路:循环条件里同时检查没越界且尚未命中。
修正版:
def find_element(sequence, element):
i = 0
while i < len(sequence) and sequence[i] != element:
i += 1
return i # 若未找到,i == len(sequence)可选:若你希望“未找到时返回 -1”,把最后一行改为:
return i if i < len(sequence) else -1。
7.3 Exercise 2.3
Here is a function that is meant to count the number of repetitions of a substring in a string. For example, if the string is 'abcdabcf' and the substring is 'ab', the count should be 2. (Of course, for any of the substrings 'a', 'b', 'c', 'ab', 'bc' or 'abc', the count should be 2.)
def count_repetitions(string, substring):
'''
Count the number of repetitions of substring in string. Both
arguments must be strings.
'''
n_rep = 0
# p is the next position in the string where the substring starts
p = string.find(substring)
# str.find returns -1 if the substring is not found
while p >= 0:
n_rep = n_rep + 1
# find next position where the substring starts
p = string[p + 1:len(string) - p].find(substring)
return n_repHowever, the function is not correct. For some inputs it will return the wrong answer, and for some inputs it will get stuck in an infinite loop. Find examples of arguments that cause these errors to happen.
Among the many string methods, there is one that does what this function is meant to do: 'abcdabcf'.count('abc') will return 2. Can you fix the function above without using the string count method, i.e., keeping the idea of the original function and only correcting the error?
详情
原始代码会:
- 用切片
string[p + 1:len(string) - p]继续找,索引不再对应原串,导致计数错误; - 对某些输入可能死循环;
- 对空子串
''也会异常。
错误示例:
count_repetitions('aaaa', 'aa')期望(若允许重叠)是3(位置 0、1、2),原实现会算错。count_repetitions('abc', '')可能异常或逻辑混乱。
修正版(保留“从上次起点+1 开始查找”= 允许重叠):
def count_repetitions(string, substring):
'''
Count the number of repetitions of substring in string (allow overlaps).
'''
if not isinstance(string, str) or not isinstance(substring, str):
raise TypeError("Both arguments must be strings")
if substring == "":
# 常见约定:空子串次数要么定义为 0,要么为 len(string)+1。
# 这里统一返回 0,避免歧义和潜在死循环。
return 0
n_rep = 0
p = string.find(substring)
while p >= 0:
n_rep += 1
p = string.find(substring, p + 1) # 允许重叠:下一次从 p+1 开始
return n_rep若你需要不允许重叠(与 str.count 一致),把 p + 1 改为 p + len(substring)。
7.4 Exercise 2.4
Here is another string function. This function is meant to remove all occurrences of a substring from a string, and return the result. For example, if the string is 'abcdabcf' and the substring is 'bc', we want the function to return 'adaf'.
def remove_substring_everywhere(string, substring):
'''
Remove all occurrences of substring from string, and return
the resulting string. Both arguments must be strings.
'''
p = string.find(substring)
lsub = len(substring) # length of the substring
while p >= 0:
string[p : len(string) - lsub] = string[p + lsub : len(string)]
p = string.find(substring)
return stringThis function is also not correct, and should cause a runtime error on almost any input. Like in the previous problem, try to make the function do what it is supposed to while keeping the idea of the original function Can you also find a string method that does (or can be made to do) what this function is intended to do?
详情
- 问题:对字符串做切片赋值(
string[...] = ...)——Python 字符串是不可变的,会 TypeError。 - 正确思路:每次找到位置
p,用切片构造新字符串并继续查找。
修正版(不允许重叠/允许都可以,下面示例不重叠):
def remove_substring_everywhere(string, substring):
'''
Remove all occurrences of substring from string and return the result.
'''
if not isinstance(string, str) or not isinstance(substring, str):
raise TypeError("Both arguments must be strings")
if substring == "":
return string # 约定:移除空子串无意义,直接返回原串
lsub = len(substring)
p = string.find(substring)
while p >= 0:
string = string[:p] + string[p + lsub:]
p = string.find(substring)
return string等价库方法:
string.replace(substring, "")一行即可完成同样的工作(默认不重叠地移除全部出现)。
小测试(可选)
assert smallest_difference([5, 2, 9, 4]) == 1 # 4 与 5
assert largest_difference([5, 2, 9, 4]) == 7 # 9 - 2
assert any_one_is_sum(1, 3, 2) is True
assert any_one_is_sum(0, 1, 2) is False
assert find_element([3,2,1,4], 1) == 2
assert find_element([3,2,1,4], 99) == 4
assert count_repetitions('abcdabcf', 'ab') == 2
assert count_repetitions('aaaa', 'aa') == 3 # 允许重叠的版本
assert remove_substring_everywhere('abcdabcf', 'bc') == 'adaf'公众号:AI悦创【二维码】

AI悦创·编程一对一
AI悦创·推出辅导班啦,包括「Python 语言辅导班、C++ 辅导班、java 辅导班、算法/数据结构辅导班、少儿编程、pygame 游戏开发、Web、Linux」,招收学员面向国内外,国外占 80%。全部都是一对一教学:一对一辅导 + 一对一答疑 + 布置作业 + 项目实践等。当然,还有线下线上摄影课程、Photoshop、Premiere 一对一教学、QQ、微信在线,随时响应!微信:Jiabcdefh
C++ 信息奥赛题解,长期更新!长期招收一对一中小学信息奥赛集训,莆田、厦门地区有机会线下上门,其他地区线上。微信:Jiabcdefh
方法一:QQ
方法二:微信:Jiabcdefh

更新日志
3fa69-于17783-于efec4-于3346e-于dcfcd-于