# Assignment 3

## Problem 1

Position yourself in /home/OMIS107/Lecture2. Print the lines of alice.txt that contain at least two words (not necessarily consecutive words) starting with “a”, with the first word being composed of at least 8 more letters after “a”.

```
grep -Ein "\ba[a-z]{8,}.*\ba" alice.txt
```
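As a quick check of the pattern, here is a minimal sketch that runs the same `grep` on three made-up lines instead of `alice.txt`. It assumes GNU grep (for `\b`); the sample lines and the variable name `matches` are illustrative and not part of the assignment.

```shell
export LC_ALL=C  # assumption: pin the locale so [a-z] is the plain ASCII range

# Hypothetical sample lines standing in for alice.txt.
# -E: extended regex, -i: ignore case, -n: prefix each hit with its line number.
matches=$(printf '%s\n' \
  'She was absolutely sure about it.' \
  'Alice asked again.' \
  'An adventurous cat.' |
  grep -Ein "\ba[a-z]{8,}.*\ba")

printf '%s\n' "$matches"
```

Only the first line is printed. The second has no word with eight or more letters after the initial "a", and in the third the long word "adventurous" is not followed by another word starting with "a".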

## Problem 2

Add “his or her majesty” before the occurrences of the word “the” + word starting with a capital letter and the word “a” + word starting with a capital letter, as in these examples:

• “the Hatter” → “his or her majesty the Hatter”
• “a Caterpillar” → “his or her majesty a Caterpillar”
• “Vanilla Ice cream” → “Vanilla Ice cream” (unchanged because “a” is not a word)
```
sed -r "s/\b(the|a)\b [A-Z]/his or her majesty &/g" alice.txt
sed -r "s/(\bthe\b|\ba\b) [A-Z]/his or her majesty &/g" alice.txt
```

```
sed -r "s/\b(the|a)\b [A-Z]/his or her majesty &/g" alice.txt
```

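`\b(the|a)\b [A-Z]` is the regular expression. `\b` is a word boundary: it matches the position where a word starts or ends, so `the` and `a` match only as whole words. `the|a` matches either "the" or "a", and `[A-Z]` matches any single uppercase letter. The expression as a whole therefore matches the word "the" or "a", followed by one space and an uppercase letter.

`his or her majesty &` is the replacement. `&` stands for the entire text matched by the regular expression, so the command inserts "his or her majesty " in front of every match and leaves the matched text itself unchanged.

`g` is a flag that replaces every match on a line, not just the first one.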

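The second command is equivalent: it writes the word boundaries inside each alternative instead of around the group.

```
sed -r "s/(\bthe\b|\ba\b) [A-Z]/his or her majesty &/g" alice.txt
```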
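To see both forms at work, the following sketch feeds the three examples from the problem statement, plus one extra made-up line, through each command. It assumes GNU sed (for `-r` and `\b`); the variable names `result` and `result_alt` and the fourth sample line are illustrative.

```shell
export LC_ALL=C  # assumption: pin the locale so [A-Z] is the plain ASCII range

# The first three lines come from the problem statement; the fourth is hypothetical.
input=$(printf '%s\n' \
  'the Hatter' \
  'a Caterpillar' \
  'Vanilla Ice cream' \
  'the cat saw a Duchess')

# Boundaries around the group.
result=$(printf '%s\n' "$input" |
  sed -r "s/\b(the|a)\b [A-Z]/his or her majesty &/g")

# Boundaries inside each alternative.
result_alt=$(printf '%s\n' "$input" |
  sed -r "s/(\bthe\b|\ba\b) [A-Z]/his or her majesty &/g")

printf '%s\n' "$result"
```

Both variables end up with the same text. "Vanilla Ice cream" is left alone because the "a" at the end of "Vanilla" is not preceded by a word boundary, and in the last line only "a Duchess" is prefixed because "the" is followed by a lowercase letter.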

## Problem 3

To emphasize a verb in past tense, one could add “did” before it and turn the verb into present tense. Assume that all words ending with “ed” are verbs in past tense (e.g., invented, asked). Find all words ending in “ed”, add “did” before them and turn them into present tense. For example:

• “You invented it” → “You did invent it”
• “They need.” → “They did ne.” (do not worry about whether removing “ed” is grammatically incorrect)
```
sed -r "s/([a-z]*)ed\b/did \1/g" alice.txt
sed -r "s/([a-z]+)ed\b/did \1/g" alice.txt
```

The two commands differ only in the quantifier inside the capture group: `([a-z]*)` in the first and `([a-z]+)` in the second.

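• `*` means the preceding element may repeat zero or more times, so `([a-z]*)ed` also matches a bare "ed" (the lowercase letters may occur zero times).

• `+` means the preceding element must repeat one or more times, so `([a-z]+)ed` matches only when at least one lowercase letter precedes "ed".

• If alice.txt contains the word "walked", both commands replace it with "did walk".
• If alice.txt contains the word "ed" on its own, the first command replaces it with "did " (nothing follows, because the captured group is empty), while the second command leaves it unchanged because no lowercase letter precedes "ed".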

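`\1` is a backreference to the capture group, `([a-z]*)` in the first command and `([a-z]+)` in the second. The group captures the stem of each word ending in "ed", that is, the word without its "ed". When the match is replaced with `did \1`, `\1` expands to that stem.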
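The difference between `*` and `+` can be checked on a few sample lines. The sketch below assumes GNU sed; the sample input and the names `star`, `plus`, and `anchored` are illustrative. The third command is a hypothetical variant, not part of the original answer: it adds a leading `\b` and accepts capitals, because with `[a-z]` alone a capitalized word such as "Finished" is matched only from its second letter.

```shell
export LC_ALL=C  # assumption: pin the locale so [a-z] means ASCII lowercase only

# Hypothetical sample lines, not taken from alice.txt.
input=$(printf '%s\n' 'You invented it' 'They need.' 'ed' 'Finished')

# Zero or more letters before "ed": a bare "ed" also matches.
star=$(printf '%s\n' "$input" | sed -r "s/([a-z]*)ed\b/did \1/g")

# One or more letters before "ed": a bare "ed" is left alone.
plus=$(printf '%s\n' "$input" | sed -r "s/([a-z]+)ed\b/did \1/g")

# Hypothetical variant: anchor the start of the word and allow capitals.
anchored=$(printf '%s\n' "$input" | sed -r "s/\b([A-Za-z]+)ed\b/did \1/g")

printf '%s\n' '--- star ---' "$star" '--- plus ---' "$plus" '--- anchored ---' "$anchored"
```

With this input, the first two commands agree on "You did invent it" and "They did ne." and differ only on the bare "ed". Both turn "Finished" into "Fdid inish", while the anchored variant produces "did Finish".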

## Problem 4

The goal of this exercise is to replace the occurrences in alice.txt that contain the word “said” + other words + punctuation mark with “said someone” + punctuation mark. By “punctuation mark”, we mean any of the following characters: , . : ;

Some examples:

• Line 3300: 'I won't!' said Alice. → 'I won't!' said someone.
• Line 3314: 'Wake up, Alice dear!' said her sister; 'Why, what a long sleep you've → 'Wake up, Alice dear!' said someone; 'Why, what a long sleep you've
• Line 396: but it said nothing. → but it said someone.

If the word “said” is followed by more than one punctuation mark in the same line, make sure to replace only up to the first punctuation mark. That is, the following replacement for example 2 is wrong:

'Wake up, Alice dear!' said someone, what a long sleep you've

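• `-r`: this option tells `sed` to use extended regular expression syntax.

• `"s/\bsaid\b[ a-zA-Z]+([,.:;])/said someone\1/g"`: this is the `sed` `s` (substitute) command, whose pattern is a regular expression.

• `\bsaid\b`: `\b` is the word-boundary metacharacter. It ensures that "said" is matched as a whole word and not as part of another word such as "unsaid".

• `[ a-zA-Z]+`: matches a run of one or more characters, each of which is a space or an English letter. Because the class contains no punctuation, the run stops before the first punctuation mark, which is how the command avoids replacing past it.

• `([,.:;])`: matches one punctuation mark (comma, period, colon, or semicolon). The parentheses form a group that can be referenced in the replacement.

• `said someone\1`: the replacement. `\1` stands for the first captured group, that is, the punctuation mark just matched.

• `g`: the global flag, which replaces every match on a line, not just the first one.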
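Putting the pieces together, the sketch below applies the substitution to the three example lines from the problem statement. The notes give only the `-r` option and the `s` expression, so the full invocation is an assumption made by analogy with the earlier problems: against the real file it would presumably be `sed -r "s/\bsaid\b[ a-zA-Z]+([,.:;])/said someone\1/g" alice.txt`. GNU sed is assumed, and the variable names `input` and `result` are illustrative.

```shell
export LC_ALL=C  # assumption: pin the locale so the letter ranges are plain ASCII

# The three example lines from the problem statement; the quoted heredoc
# keeps the apostrophes and exclamation marks literal.
input=$(cat <<'EOF'
'I won't!' said Alice.
'Wake up, Alice dear!' said her sister; 'Why, what a long sleep you've
but it said nothing.
EOF
)

# Replace "said" + words + first punctuation mark with "said someone" + that mark.
result=$(printf '%s\n' "$input" |
  sed -r "s/\bsaid\b[ a-zA-Z]+([,.:;])/said someone\1/g")

printf '%s\n' "$result"
```

In the second line the replacement stops at the semicolon after "sister", so the text from 'Why onward is preserved, as the problem requires.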

