Rust-Ripgrep all: ripgrep-all — ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.


For more detail, see this introductory blogpost: https://phiresky.github.io/blog/2019/rga--ripgrep-for-zip-targz-docx-odt-epub-jpg/

rga will recursively descend into archives and match text in every file type it knows.

Here is an example directory with different file types:

demo/
├── greeting.mkv
├── hello.odt
├── hello.sqlite3
└── somearchive.zip
    ├── dir
    │   ├── greeting.docx
    │   └── inner.tar.gz
    │       └── greeting.pdf
    └── greeting.epub


Integration with fzf


You can use rga interactively via fzf. Add the following to your ~/.{bash,zsh}rc:

rga-fzf() {
	# Base command; fzf re-runs it with the current query on every change.
	# xdg-open below is the Linux opener; on macOS use `open` instead.
	local RG_PREFIX="rga --files-with-matches"
	local file
	file="$(
		FZF_DEFAULT_COMMAND="$RG_PREFIX '$1'" \
			fzf --sort --preview="[[ ! -z {} ]] && rga --pretty --context 5 {q} {}" \
				--phony -q "$1" \
				--bind "change:reload:$RG_PREFIX {q}" \
				--preview-window="70%:wrap"
	)" &&
	echo "opening $file" &&
	xdg-open "$file"
}

INSTALLATION

Linux x64, macOS and Windows binaries are available in GitHub Releases.

Linux

On Arch Linux, you can simply install from AUR: yay -S ripgrep-all.

On Debian-based distributions you can download the rga binary and get the dependencies like this:

apt install ripgrep pandoc poppler-utils ffmpeg

If ripgrep is not included in your package sources, get it from the ripgrep releases page on GitHub.

rga searches for all binaries it calls in $PATH and in the directory the rga binary itself is in.

Windows

Install ripgrep-all via Chocolatey:

choco install ripgrep-all

If you get an error like VCRUNTIME140.DLL could not be found, you need to install vc_redist.x64.exe.

Homebrew/Linuxbrew

rga can be installed with Homebrew:

brew install rga

To install the dependencies, each of which is optional but very useful:

brew install pandoc poppler tesseract ffmpeg

Compile from source

rga should compile with stable Rust (v1.36.0+, check with rustc --version). To build it, run the following (or the equivalent in your OS):

   ~$ apt install build-essential pandoc poppler-utils ffmpeg ripgrep cargo
   ~$ cargo install ripgrep_all
   ~$ rga --version    # this should work now
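
The minimum Rust version mentioned above can be checked before starting a build. The following is only a minimal sketch: rustc_version_of and version_at_least are hypothetical helper names (not part of rga or cargo), and the comparison assumes plain x.y.z version numbers.

```shell
# Hypothetical helpers for checking the Rust version requirement; not part of rga.

# Extract "x.y.z" from a line such as "rustc 1.52.0-beta.6 (f97769a2b 2021-04-27)".
rustc_version_of() {
    printf '%s\n' "$1" | sed -E 's/^rustc ([0-9]+\.[0-9]+\.[0-9]+).*/\1/'
}

# Succeed if version $1 is at least version $2 (both must look like x.y.z).
version_at_least() (
    IFS=.
    set -- $1 $2    # split both versions into six numeric fields
    [ "$1" -gt "$4" ] && exit 0
    [ "$1" -lt "$4" ] && exit 1
    [ "$2" -gt "$5" ] && exit 0
    [ "$2" -lt "$5" ] && exit 1
    [ "$3" -ge "$6" ]
)

# Only runs the check when rustc is actually installed.
if command -v rustc >/dev/null 2>&1; then
    v=$(rustc_version_of "$(rustc --version)")
    if version_at_least "$v" 1.36.0; then
        echo "rustc $v should be able to build rga"
    else
        echo "rustc $v is older than 1.36.0"
    fi
fi
```

The function bodies run in subshells, so the temporary IFS change does not leak into the calling shell.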

Available Adapters

rga --rga-list-adapters

Adapters:

  • ffmpeg Uses ffmpeg to extract video metadata/chapters and subtitles
    Extensions: .mkv, .mp4, .avi
  • pandoc Uses pandoc to convert binary/unreadable text documents to plain markdown-like text
    Extensions: .epub, .odt, .docx, .fb2, .ipynb
  • poppler Uses pdftotext (from poppler-utils) to extract plain text from PDF files
    Extensions: .pdf
    Mime Types: application/pdf

  • zip Reads a zip file as a stream and recurses down into its contents
    Extensions: .zip
    Mime Types: application/zip

  • decompress Reads compressed file as a stream and runs a different extractor on the contents.
    Extensions: .tgz, .tbz, .tbz2, .gz, .bz2, .xz, .zst
    Mime Types: application/gzip, application/x-bzip, application/x-xz, application/zstd

  • tar Reads a tar file as a stream and recurses down into its contents
    Extensions: .tar

  • sqlite Uses sqlite bindings to convert sqlite databases into a simple plain text format
    Extensions: .db, .db3, .sqlite, .sqlite3
    Mime Types: application/x-sqlite3

The following adapters are disabled by default, and can be enabled using '--rga-adapters=+pdfpages,tesseract':

  • pdfpages Converts a pdf to its individual pages as png files. Only useful in combination with tesseract
    Extensions: .pdf
    Mime Types: application/pdf

  • tesseract Uses tesseract to run OCR on images to make them searchable. May need -j1 to prevent overloading the system. Make sure you have tesseract installed.
    Extensions: .jpg, .png
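
How a value such as '+pdfpages,tesseract' selects adapters can be illustrated with a short shell sketch. It covers the three forms documented for --rga-adapters (a plain list, a '-' prefix, a '+' prefix). This is not rga's implementation: resolve_adapters is a hypothetical helper, the default list is copied from the listing above, and placing '+' additions after the defaults is an assumption about ordering.

```shell
# Hypothetical helper illustrating the --rga-adapters syntax; not rga's own code.
# Default adapters, in the order shown by `rga --rga-list-adapters` above.
DEFAULT_ADAPTERS="ffmpeg pandoc poppler zip decompress tar sqlite"

resolve_adapters() (
    spec=$1
    mode=only
    case $spec in
        -*) mode=exclude; spec=${spec#-} ;;
        +*) mode=add;     spec=${spec#+} ;;
    esac
    names=$(printf '%s' "$spec" | tr ',' ' ')   # "a,b" -> "a b"
    case $mode in
        only) out=$names ;;
        # Assumption: extra adapters are appended after the defaults.
        add)  out="$DEFAULT_ADAPTERS $names" ;;
        exclude)
            out=
            for name in $DEFAULT_ADAPTERS; do
                case " $names " in
                    *" $name "*) ;;                 # listed, so drop it
                    *) out="${out:+$out }$name" ;;  # keep everything else
                esac
            done ;;
    esac
    printf '%s\n' "$out"
)

resolve_adapters "poppler,zip"          # prints: poppler zip
resolve_adapters "-zip,tar"             # prints: ffmpeg pandoc poppler decompress sqlite
resolve_adapters "+pdfpages,tesseract"  # prints the defaults followed by: pdfpages tesseract
```

With the real tool, the OCR adapters from the list above would be enabled with something like rga --rga-adapters=+pdfpages,tesseract -j1 PATTERN PATH.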

USAGE:

rga [RGA OPTIONS] [RG OPTIONS] PATTERN [PATH ...]

FLAGS:

--rga-accurate

Use more accurate but slower matching by mime type

By default, rga will match files using file extensions. Some programs, such as sqlite3, don't care about the file extension at all, so users sometimes use any or no extension at all. With this flag, rga will try to detect the mime type of input files using the magic bytes (similar to the `file` utility), and use that to choose the adapter. Detection is only done on the first 8KiB of the file, since we can't always seek on the input (in archives).
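
The idea of choosing an adapter from magic bytes can be sketched in a few lines of shell. This is an illustration only, not how rga performs detection: sniff_mime is a hypothetical helper, and it checks just four well-known file signatures against the mime types listed for the adapters above.

```shell
# Hypothetical helper: guess a mime type from the leading bytes of a file,
# loosely following the description of --rga-accurate. Not rga's own code.
sniff_mime() (
    # Read at most the first 8 KiB and turn it into one long hex string.
    hex=$(head -c 8192 < "$1" | od -An -v -tx1 | tr -d ' \t\n')
    case $hex in
        255044462d*)                       echo application/pdf ;;       # "%PDF-"
        504b0304*)                         echo application/zip ;;       # "PK\003\004"
        1f8b*)                             echo application/gzip ;;      # gzip magic
        53514c69746520666f726d6174203300*) echo application/x-sqlite3 ;; # "SQLite format 3\0"
        *)                                 echo unknown ;;
    esac
)

# A PDF header stored under a name with no extension is still recognised.
tmp=$(mktemp)
printf '%%PDF-1.4\n' > "$tmp"
sniff_mime "$tmp"    # prints: application/pdf
rm -f "$tmp"
```

Because only the start of the file is inspected, the same check works on a stream that cannot be rewound, such as a file read from inside an archive.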

-h, --help

Prints help information

--rga-list-adapters

List all known adapters

--rga-no-cache

Disable caching of results

By default, rga caches the extracted text, if it is small enough, to a database in ~/.cache/rga on Linux, ~/Library/Caches/rga on macOS, or C:\Users\username\AppData\Local\rga on Windows. This way, repeated searches on the same set of files will be much faster. If you pass this flag, all caching will be disabled.

--rg-help

Show help for ripgrep itself

--rg-version

Show version of ripgrep itself

-V, --version

Prints version information

OPTIONS:

--rga-adapters=<adapters>...

Change which adapters to use and in which priority order (descending)

"foo,bar" means use only adapters foo and bar. "-bar,baz" means use all default adapters except for bar and baz. "+bar,baz" means use all default adapters and also bar and baz.

--rga-cache-compression-level=<cache-compression-level>

ZSTD compression level to apply to adapter outputs before storing in cache db

Ranges from 1 - 22 [default: 12]

--rga-cache-max-blob-len=<cache-max-blob-len>

Max compressed size to cache

Longest byte length (after compression) to store in cache. Longer adapter outputs will not be cached and will be recomputed every time. Allowed suffixes: k M G [default: 2000000]

--rga-max-archive-recursion=<max-archive-recursion>

Maximum nestedness of archives to recurse into [default: 4]

-h shows a concise overview, --help shows more detail and advanced options.

All other options not shown here are passed directly to rg, especially [PATTERN] and [PATH ...]

Development

To enable debug logging:

export RUST_LOG=debug
export RUST_BACKTRACE=1

Also remember to disable caching with --rga-no-cache or clear the cache (~/Library/Caches/rga on macOS, ~/.cache/rga on other Unixes, or C:\Users\username\AppData\Local\rga on Windows) to debug the adapters.
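
A debugging session along these lines might be set up as follows. This is a sketch under assumptions: rga_cache_dir is a hypothetical helper that only prints the Unix cache locations named above (Windows is not handled), and the rga commands are left as comments so that nothing is deleted by accident.

```shell
# Hypothetical helper: print the cache directory documented above for a given
# OS name (defaults to the output of `uname -s`). Windows is not handled.
rga_cache_dir() (
    case ${1:-$(uname -s)} in
        Darwin) printf '%s\n' "$HOME/Library/Caches/rga" ;;
        *)      printf '%s\n' "$HOME/.cache/rga" ;;
    esac
)

export RUST_LOG=debug
export RUST_BACKTRACE=1
echo "cache directory: $(rga_cache_dir)"

# Either bypass the cache for a single run:
#   rga --rga-no-cache "pattern" path/
# or clear it so that the adapters run again:
#   rm -rf "$(rga_cache_dir)"
```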

Comments

  • Error: Unknown adapter: "poppler"

    Jun 13, 2021

    Installed on current master branch

    cargo +nightly install --path .
    

    MacBook Air M1 Big Sur 11.4

    ❯ rustup show
    Default host: aarch64-apple-darwin
    rustup home:  /Users/yingzhu/.rustup
    
    installed toolchains
    --------------------
    
    beta-aarch64-apple-darwin (default)
    nightly-aarch64-apple-darwin
    
    active toolchain
    ----------------
    
    beta-aarch64-apple-darwin (default)
    rustc 1.52.0-beta.6 (f97769a2b 2021-04-27)
    

    It doesn't search PDFs or anything else - looks like it doesn't have the right adapters

    ❯ rga --rga-list-adapters
    Adapters:
    
     - **zip**
         Reads a zip file as a stream and recurses down into its contents
         Extensions: .zip
         Mime Types: application/zip
    
    The following adapters are disabled by default, and can be enabled using '--rga-adapters=+pdfpages,tesseract':
    
    
  • Getting error: "adapter: poppler Error: Broken pipe"

    Aug 19, 2021

    After running rga --files-with-matches "search-term", I get the following error for almost all the PDF files in my directory:

    name-of-file.pdf:
    adapter: poppler
    Error: Broken pipe (os error 32)
    

    For some files it also says:

    adapter: poppler
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/adapters/spawning.rs:93:53
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Any { .. }', src/adapters/spawning.rs:98:6
    
    

    I have no idea what causes this error.

  • Failure on searching epub with pandoc deprecation warning

    Aug 26, 2021

    Hello, I have encountered the following failure

    $ rga "<search>" <filename>.epub
    <filename>.epub: preprocessor command failed: '"/home/<user>/.local/bin/rga-preproc" "<filename>.epub"':
    -------------------------------------------------------------------------------
    adapter: pandoc
    [WARNING] Deprecated: --atx-headers. Use --markdown-headings=atx instead.
    parseSpine
    Error: subprocess failed: ExitStatus(ExitStatus(16384))
    -------------------------------------------------------------------------------
    

    I am using pandoc 2.11.2. My OS is Fedora 34.

  • Workaround or fix for unsupported zip archive: "Encrypted files are not supported"

    Sep 14, 2021

    I'm trying to look at files in a particular archive that has a number of other archives in it, and one of them is an encrypted zip file that causes the preprocessor to blow up. Is it possible to work around that or fix it? I don't care about searching the encrypted archive. Thanks for the great tool!

    $ rga 'search string' ctrllog.tgz
    rootvarlog.tgz: preprocessor command failed: '"~/bin/rga-preproc" "rootvarlog.tgz"': 
    -------------------------------------------------------------------------------
    adapter: decompress
    adapter: tar
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: decompress
    adapter: zip
    adapter: zip
    Error: Unsupported Zip archive: Encrypted files are not supported
    -------------------------------------------------------------------------------
    
  • text searching through gpg encrypted documents?

    Oct 15, 2021

    first of all thank you for this amazing tool!

    After utilizing it for a few days I was wondering if anyone found a solution to use rga to search through gpg encrypted files?
    I imagine this might be quite handy, especially when dealing with a lot of gpg encrypted documents (notes, logs etc) on a regular basis ...

    thanks in advance!

  • Fix help message for --rga-cache-path

    Oct 17, 2021

    Former message was a copy of --rga-cache-compression-level

  • Build fails with "unstable feature" error in rkv dependency

    Jun 17, 2019

    I tried doing cargo build (of master at commit ef2e4ebf28f) and got this error:

      $ cargo build 
          Updating crates.io index
       Downloading crates ...
        Downloaded chrono v0.4.6
        Downloaded encoding_rs v0.8.17
        [...]
         Compiling zip v0.5.2
         Compiling serde_json v1.0.39
         Compiling rkv v0.9.6
      error[E0658]: use of unstable library feature 'try_from' (see issue #33417)
         --> /home/kfogel/.cargo/registry/src/github.com-1ecc6299db9ec823/rkv-0.9.6/src/error.rs:166:11
          |
      166 | impl From<::std::num::TryFromIntError> for MigrateError {
          |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      error[E0658]: use of unstable library feature 'try_from' (see issue #33417)
        --> /home/kfogel/.cargo/registry/src/github.com-1ecc6299db9ec823/rkv-0.9.6/src/migrate.rs:78:5
         |
      78 |     convert::TryFrom,
         |     ^^^^^^^^^^^^^^^^
      
      [...many more similar error lines...]
      
      error: aborting due to 12 previous errors
      
      For more information about this error, try `rustc --explain E0658`.
      error: Could not compile `rkv`.
      warning: build failed, waiting for other jobs to finish...
      error: build failed
      $ 
    

    I don't know much Rust, but it looks like rkv is using an unstable feature (rust bug 33417 has more about it), and since rga depends on rkv, this affects the rga build too. I ran rustc --explain E0658 and got some information about how to solve the problem. Presumably those solutions would have to be implemented upstream in rkv if we wanted to solve this for everyone; otherwise I'd have to either build a modified rkv locally or get the nightly version of rustc to do the build I just tried to do.

    I'm not sure what ways might be available to solve this within rga. Ideas welcome; like I said, I don't know Rust that well.

    Anyway, this was all along the way to submitting a PR for README.md to add installation instructions. I'll submit that PR, and then in its commentary mention this issue.

  • preprocessor command failed: '"rga-preproc" "/Users/user/Desktop/test/test.pdf.zip"

    May 18, 2020

    I am getting this error while executing:

    rga "hello" ~/Desktop/test/

    where I have a zip file. I don't understand from the documentation whether ZIP files need an extra argument or not. Thanks in advance.

  • error running pdf search on windows 10 - 64bit

    Nov 8, 2019

    I tried running the pdf search with the adapter "poppler" on both version 0.9.2 and 0.9.3 and I get the following error message. What am I missing here?

    Reference.pdf: preprocessor command failed: '"rga-preproc" "Reference.pdf"':
    -------------------------------------------------------------------------------
    adapter: poppler
    pdftotext version 4.00
    Copyright 1996-2017 Glyph & Cog, LLC
    Usage: pdftotext [options] <PDF-file> [<text-file>]
      -f <int>             : first page to convert
      -l <int>             : last page to convert
      -layout              : maintain original physical layout
      -simple              : simple one-column page layout
      -table               : similar to -layout, but optimized for tables
      -lineprinter         : use strict fixed-pitch/height layout
      -raw                 : keep strings in content stream order
      -fixed <number>      : assume fixed-pitch (or tabular) text
      -linespacing <number>: fixed line spacing for LinePrinter mode
      -clip                : separate clipped text
      -nodiag              : discard diagonal text
      -enc <string>        : output text encoding name
      -eol <string>        : output end-of-line convention (unix, dos, or mac)
      -nopgbrk             : don't insert page breaks between pages
      -bom                 : insert a Unicode BOM at the start of the text file
      -opw <string>        : owner password (for encrypted files)
      -upw <string>        : user password (for encrypted files)
      -q                   : don't print any messages or errors
      -cfg <string>        : configuration file to use in place of .xpdfrc
      -v                   : print copyright and version info
      -h                   : print usage information
      -help                : print usage information
      --help               : print usage information
      -?                   : print usage information
    Error: The pipe has been ended. (os error 109)
    
  • Fix installation and CI

    Dec 28, 2020

    • Fixes installation with the stable toolchain. Essentially it's just cargo update
    • Fixes the push pipeline, now it fails on a test
    • Fixes the release pipeline
  • installed on mint 19.1 (ubuntu 18.04) error reading .odt files
    ./file.odt: preprocessor command failed: '"rga-preproc" "./file.odt"':
    -------------------------------------------------------------------------------
    adapter: pandoc
    pandoc: Cannot read archive from stdin
    CallStack (from HasCallStack):
      error, called at pandoc.hs:1386:22 in main:Main
    Error: Broken pipe (os error 32)
    -------------------------------------------------------------------------------

    Jul 9, 2019

                                                                                                                                                                                                           
  • OSX Homebrew formula

    Jun 17, 2019

    Hi,

    It would be great to have this in Homebrew. I have been using rg for a while now, and rga would be handy to have.
