# chkbit

chkbit is a lightweight tool to check the data integrity of your files. It allows you to verify *that the data has not changed* since you put it there and that it is still the same when you move it somewhere else.

Cross-platform support for [Linux, macOS and Windows](https://github.com/laktak/chkbit-py/releases)!

- [Use it](#use-it)
  - [On your Disk](#on-your-disk)
  - [On your Backup](#on-your-backup)
  - [For Data in the Cloud](#for-data-in-the-cloud)
- [Installation](#installation)
- [Usage](#usage)
- [Repair](#repair)
- [Ignore files](#ignore-files)
- [FAQ](#faq)
  - [Should I run `chkbit` on my whole drive?](#should-i-run-chkbit-on-my-whole-drive)
  - [Why is chkbit placing the index in `.chkbit` files (vs a database)?](#why-is-chkbit-placing-the-index-in-chkbit-files-vs-a-database)
  - [How does chkbit work?](#how-does-chkbit-work)
  - [I wish to use a different hash algorithm](#i-wish-to-use-a-different-hash-algorithm)
  - [How can I delete the index files?](#how-can-i-delete-the-index-files)
  - [Can I test if chkbit is working correctly?](#can-i-test-if-chkbit-is-working-correctly)
- [Development](#development)

## Use it

### On your Disk

chkbit starts with your primary disk. It creates checksums for each folder, and these checksums follow your data onto your backups.

Here it alerts you to:

- damage on the disk
- damage caused by filesystem errors
- damage caused by malware (for example, when it encrypts your files)

The built-in checksums of your filesystem only cover some of these cases.

### On your Backup

No matter what storage media or filesystem you use, chkbit stores its indexes in hidden files that are backed up together with your data.

When you run chkbit on your backup media, you can verify that every byte was transferred correctly.

If your backup media fails or experiences [bitrot/data degradation](https://en.wikipedia.org/wiki/Data_degradation), chkbit lets you discover which files were damaged and need to be restored from other backups. You should always keep multiple backups :)

### For Data in the Cloud

Some cloud providers re-encode your videos or compress your images to save space. chkbit will alert you to any such changes.

## Installation

- Download for [Linux, macOS or Windows](https://github.com/laktak/chkbit-py/releases).
- Linux packages for [Arch, Debian, Fedora, Suse and Ubuntu via the Open Build Service](https://software.opensuse.org//download.html?project=home%3Alaktak&package=chkbit).
- Get it with [pipx](https://pipx.pypa.io/latest/installation/): `pipx install chkbit`

## Usage

Run `chkbit -u PATH` to create/update the chkbit index.

chkbit will

- create a `.chkbit` index in every subdirectory of the path it was given.
- update the index with blake3 (see `--algo`) hashes for every file.
- report damage for files that failed the integrity check since the last run (check the exit status).

Run `chkbit PATH` to verify only.

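Because chkbit signals damage through its exit status, it is easy to call from a script or a scheduled job. The Python sketch below assumes that exit status 0 means no errors were found and that any non-zero status means chkbit reported a problem; the function `verify` and its `chkbit_cmd` parameter are hypothetical names used only for illustration.

```python
import subprocess


def verify(paths, chkbit_cmd=("chkbit",)):
    """Run chkbit in verify-only mode (no -u) and return True on a clean exit.

    Assumption: exit status 0 means no errors; anything else means chkbit
    reported a problem such as DMG or EIX. chkbit_cmd is a hypothetical
    parameter that lets you point at a different executable.
    """
    result = subprocess.run([*chkbit_cmd, *paths])
    return result.returncode == 0
```

For example, `verify(["/mnt/backup/photos"])` would return `False` if chkbit flagged a problem on that path.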
```
usage: chkbit [-h] [-u] [--show-ignored-only] [--algo ALGO] [-f] [-s] [-l FILE] [--log-verbose] [--index-name NAME] [--ignore-name NAME] [-w N] [--plain] [-q] [-v] [PATH ...]

Checks the data integrity of your files. See https://github.com/laktak/chkbit-py

positional arguments:
  PATH                  directories to check

options:
  -h, --help            show this help message and exit
  -u, --update          update indices (without this chkbit will verify files in readonly mode)
  --show-ignored-only   only show ignored files
  --algo ALGO           hash algorithm: md5, sha512, blake3 (default: blake3)
  -f, --force           force update of damaged items
  -s, --skip-symlinks   do not follow symlinks
  -l FILE, --log-file FILE
                        write to a logfile if specified
  --log-verbose         verbose logging
  --index-name NAME     filename where chkbit stores its hashes, needs to start with '.' (default: .chkbit)
  --ignore-name NAME    filename that chkbit reads its ignore list from, needs to start with '.' (default: .chkbitignore)
  -w N, --workers N     number of workers to use (default: 5)
  --plain               show plain status instead of being fancy
  -q, --quiet           quiet, don't show progress/information
  -v, --verbose         verbose output

.chkbitignore rules:
  each line should contain exactly one name
  you may use Unix shell-style wildcards (see README)
  lines starting with `#` are skipped
  lines starting with `/` are only applied to the current directory

Status codes:
  DMG: error, data damage detected
  EIX: error, index damaged
  old: warning, file replaced by an older version
  new: new file
  upd: file updated
  ok : check ok
  ign: ignored (see .chkbitignore)
  EXC: internal exception
```

chkbit uses only 5 workers by default so that it will not slow your system to a crawl. You can specify a higher number with `-w N` to make it a lot faster, provided the I/O throughput can keep up.

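To illustrate what the worker count controls, here is a minimal sketch of hashing a directory tree with a fixed-size thread pool from Python's `concurrent.futures`. It is not chkbit's implementation: `hash_tree` is a hypothetical helper, SHA-512 stands in for blake3 (which is not in the standard library), and each file is read into memory in one go for brevity.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def _sha512(path):
    # SHA-512 stands in for blake3 here; the whole file is read at once.
    return hashlib.sha512(path.read_bytes()).hexdigest()


def hash_tree(root, workers=5):
    """Hash every regular file below root using at most `workers` threads."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order, so zip pairs each file with its hash
        return dict(zip(files, pool.map(_sha512, files)))
```

Raising `workers` only helps while the storage can deliver data faster than the hashes are computed; beyond that point the threads just wait on I/O.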
## Repair

chkbit cannot repair damage; its job is simply to detect it.

You should

- back up regularly.
- run chkbit *before* each backup.
- check for damage on the backup media.
- in case of damage, *restore* from a checked backup.

## Ignore files

Add a `.chkbitignore` file containing the names of the files/directories you wish to ignore:

- each line should contain exactly one name
- you may use [Unix shell-style wildcards](https://docs.python.org/3/library/fnmatch.html)
  - `*` matches everything
  - `?` matches any single character
  - `[seq]` matches any character in seq
  - `[!seq]` matches any character not in seq
- lines starting with `#` are skipped
- lines starting with `/` are only applied to the current directory
- you can use `path/sub/name` to ignore a file/directory in a sub path
- hidden files (starting with a `.`) are ignored by default

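The rules above map closely onto Python's `fnmatch` module. The sketch below is a simplified, hypothetical matcher, not chkbit's own code: `parse_ignore` and `is_ignored` are illustrative names, the `in_current_dir` flag stands in for how a `/` rule is limited to the directory containing the `.chkbitignore` file, and sub-path rules such as `path/sub/name` are left out.

```python
import fnmatch


def parse_ignore(text):
    """Return the rules of a .chkbitignore file, skipping blanks and comments.

    Assumption: surrounding whitespace on each line is not significant.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def is_ignored(name, rules, in_current_dir=True):
    """Simplified check of a single file/directory name against the rules.

    Hidden names are always ignored. A rule starting with '/' only applies
    when in_current_dir is True (the name sits next to the .chkbitignore).
    """
    if name.startswith("."):
        return True
    for rule in rules:
        if rule.startswith("/"):
            if in_current_dir and fnmatch.fnmatch(name, rule[1:]):
                return True
        elif fnmatch.fnmatch(name, rule):
            return True
    return False
```

With the rules `*.tmp`, `/build` and `cache?`, the name `cache1` is ignored everywhere, while `build` is only ignored in the directory that holds the `.chkbitignore` file.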
## FAQ

### Should I run `chkbit` on my whole drive?

You would typically run it only on *content* that you keep for a long time (e.g. your pictures, music, videos).

### Why is chkbit placing the index in `.chkbit` files (vs a database)?

The advantage of the `.chkbit` files is that

- when you move a directory, the index moves with it
- when you make a backup, the index is also backed up

The disadvantage is obviously that you get hidden `.chkbit` files in your content folders.

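A minimal sketch of the per-directory idea: each directory gets its own index file stored next to the files it describes, so renaming or copying the directory carries the index along. The JSON layout and the `write_index`/`read_index` helpers are hypothetical illustrations, not chkbit's actual `.chkbit` format.

```python
import json
from pathlib import Path

INDEX_NAME = ".chkbit"  # default index file name (see --index-name)


def write_index(directory, entries):
    """Store a mapping of file name -> {mtime, hash} inside the directory.

    The JSON layout is a hypothetical illustration, not chkbit's format.
    """
    path = Path(directory) / INDEX_NAME
    path.write_text(json.dumps(entries, indent=2, sort_keys=True))


def read_index(directory):
    """Load the index stored next to the files, or {} if there is none."""
    path = Path(directory) / INDEX_NAME
    if not path.exists():
        return {}
    return json.loads(path.read_text())
```

Because the index lives inside the directory, moving `photos` to `moved` leaves `read_index("moved")` returning the same entries without any database update.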
### How does chkbit work?

chkbit operates on files.

When run for the first time, it records a hash of the file contents as well as the file modification time.

When you run it again, it first checks the modification time:

- if the time changed (because you made an edit), it records a new hash.
- otherwise, it compares the current hash to the recorded value and reports an error if they do not match.

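The decision above can be written as a small function. This is a sketch under stated assumptions, not chkbit's code: the record structure (`mtime` and `hash` keys) and `check_file` are hypothetical, SHA-512 replaces blake3 so that only the standard library is needed, and the `old` status (a file replaced by an older version) is not modelled. The returned strings mirror chkbit's `new`, `upd`, `DMG` and `ok` status codes.

```python
import hashlib
import os


def _digest(path):
    # SHA-512 stands in for blake3 so that only the standard library is needed
    with open(path, "rb") as f:
        return hashlib.sha512(f.read()).hexdigest()


def check_file(path, record):
    """Return (status, record_to_store) for one file.

    `record` is None the first time a file is seen, otherwise a dict with
    hypothetical 'mtime' (nanoseconds) and 'hash' keys from the last run.
    """
    mtime = os.stat(path).st_mtime_ns
    digest = _digest(path)
    if record is None:
        return "new", {"mtime": mtime, "hash": digest}
    if mtime != record["mtime"]:
        # the modification time changed, so treat it as an edit
        return "upd", {"mtime": mtime, "hash": digest}
    if digest != record["hash"]:
        # same modification time but different content: damage
        return "DMG", record
    return "ok", record
```

Rewriting the file and then restoring its previous modification time with `os.utime` makes `check_file` return `DMG`, which is the situation the FAQ test simulates with `touch -t`.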
### I wish to use a different hash algorithm

chkbit uses blake3 by default. You can also specify `--algo sha512` or `--algo md5`.

Note that existing index files keep using the hash algorithm they were created with. If you wish to update all hashes, you need to delete your existing indexes first. A conversion mode may be added later (PR welcome).

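Digests from different algorithms are not comparable, which is why an existing index stays with the algorithm it was created with. The sketch below hashes a file incrementally with a selectable algorithm; `hash_file` is a hypothetical helper. md5 and sha512 come from Python's `hashlib`, while blake3 is assumed to come from the third-party `blake3` package, since it is not part of the standard library.

```python
import hashlib


def hash_file(path, algo, chunk_size=1 << 20):
    """Hash a file in chunks with 'md5', 'sha512' or 'blake3'.

    blake3 is assumed to be provided by the third-party `blake3` package
    (pip install blake3); the other names are handled by hashlib.new.
    """
    if algo == "blake3":
        import blake3  # third-party, imported only when requested
        h = blake3.blake3()
    else:
        h = hashlib.new(algo)
    with open(path, "rb") as f:
        # read fixed-size chunks so large files never sit in memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

An md5 hex digest is 32 characters long and a sha512 hex digest is 128, so a hash recorded with one algorithm can never match a hash computed with another.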
### How can I delete the index files?

List them with

```
find . -name .chkbit
```

and add `-delete` to delete them.

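`find` is not available on Windows by default. As a cross-platform alternative, this Python sketch uses `pathlib` to list and optionally delete the index files. The helper names are hypothetical; pass `index_name` if you changed it with `--index-name`, and keep the default `dry_run=True` until the listing looks right.

```python
from pathlib import Path


def find_index_files(root, index_name=".chkbit"):
    """List every index file below root (like `find ROOT -name .chkbit`)."""
    return sorted(p for p in Path(root).rglob(index_name) if p.is_file())


def delete_index_files(root, index_name=".chkbit", dry_run=True):
    """Delete the index files; with dry_run=True only report what would go."""
    found = find_index_files(root, index_name)
    for p in found:
        if dry_run:
            print("would delete", p)
        else:
            p.unlink()
    return found
```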
### Can I test if chkbit is working correctly?

On Linux/macOS you can try the following.

Create a file named `test` and set its modification time:

```
$ echo foo1 > test; touch -t 201501010000 test
$ chkbit -u .
new ./test
Processed 1 file.
- 0:00:00 elapsed
- 192.31 files/second
- 0.00 MB/second
- 1 directory was updated
- 1 file hash was added
- 0 file hashes were updated
```

`new` indicates a new file was added.

Now update `test` with a new modification time:

```
$ echo foo2 > test; touch -t 201501010001 test # update content & modification time
$ chkbit -u .
upd ./test
Processed 1 file.
- 0:00:00 elapsed
- 191.61 files/second
- 0.00 MB/second
- 1 directory was updated
- 0 file hashes were added
- 1 file hash was updated
```

`upd` indicates the file was updated.

Now update `test` with the same modification time to simulate damage:

```
$ echo foo3 > test; touch -t 201501010001 test
$ chkbit -u .
DMG ./test
Processed 1 file.
- 0:00:00 elapsed
- 173.93 files/second
- 0.00 MB/second
chkbit detected damage in these files:
./test
error: detected 1 file with damage!
```

`DMG` indicates damage.

## Development

With pipenv (install with `pipx install pipenv`):

```
# setup
pipenv install

# run chkbit
pipenv run python3 run.py
```

To build a source distribution package from `pyproject.toml`:

```
pipx run build
```

You can then install your own package with

```
pipx install dist/chkbit-*.tar.gz
```

The binaries are created using PyInstaller via GitHub Actions.