# Using parallel to speed up Linux commands on multi-core CPUs

> Notes pending hands-on practice

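To make Linux commands use all CPU cores, we can use GNU Parallel. It lets all the CPU cores of a single machine perform a kind of map-reduce with the help of the rarely used `--pipe` option (also called `--spreadstdin`): parallel splits standard input into blocks and hands each block to a separate job, so the load is spread evenly across the CPUs.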

## BZIP2

```
cat bigfile.bin | bzip2 --best > compressedfile.bz2
```

can be sped up as:

```
cat bigfile.bin | parallel --pipe --recend '' -k bzip2 --best > compressedfile.bz2
```

## GREP

If you have a very large text file, you might previously have run:

```
grep pattern bigfile.txt
```

Now you can do this:

```
cat bigfile.txt | parallel --pipe grep 'pattern'
```

Or this:

```
cat bigfile.txt | parallel --block 10M --pipe grep 'pattern'
```

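The second form adds `--block 10M`, which makes each job receive a block of roughly 10 MB of input. The size is measured in bytes, not lines (blocks are still cut at line boundaries), so use this option to tune how much data each CPU core processes per job. Note that parallel prints each job's output as soon as that job finishes; add `-k` (`--keep-order`) if the matches must appear in file order.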

## AWK

Below is an example of using awk to sum the first column of a very large data file.

The usual way:

```
cat rands20M.txt | awk '{s+=$1} END {print s}'
```

With parallel:

```
cat rands20M.txt | parallel --pipe awk \'{s+=\$1} END {print s}\' | awk '{s+=$1} END {print s}'
```

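This one is a bit more involved: the `--pipe` option of parallel splits the output of cat into blocks and dispatches each block to its own awk invocation, producing many partial sums. These partial results pass through the second pipe into a single awk command, which adds them up to produce the final result. The first awk needs three backslashes (`\'`, `\$1`, `\'`) so that the quotes and `$1` survive the calling shell and reach the shell that GNU parallel uses to run awk.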

## WC

Want to count the lines of a file as fast as possible?

The traditional way:

```
wc -l bigfile.txt
```

Now you can do this instead:

```
cat bigfile.txt | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
```

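Quite clever: parallel first "maps" the input to many `wc -l` calls, one per block, each producing a partial count; the counts are then piped to awk, which sums them (the "reduce" step).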

## SED

Want to run lots of replacements with sed on a huge file?

The usual way:

```
sed s^old^new^g bigfile.txt
```

Now you can do:

```
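cat bigfile.txt | parallel --pipe -k sed s^old^new^g
```

Then redirect the output to whatever file you like. `-k` (`--keep-order`) keeps the blocks in their original order; without it, parallel writes each block as soon as its job finishes, so lines in the output can end up out of order.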

## References

* [How to use multiple CPU cores to speed up your Linux commands: awk, sed, bzip2, grep, wc, etc.](http://www.vaikan.com/use-multiple-cpu-cores-with-your-linux-commands/)
* [GNU Parallel Tutorial](https://www.gnu.org/software/parallel/parallel_tutorial.html)

