# chkbit

chkbit is a tool that checks whether your files' *data integrity remains intact over time*, especially across transfers and backups. It helps detect issues like disk damage, filesystem errors, and malware interference.

![gif of chkbit](https://raw.githubusercontent.com/laktak/chkbit-py/readme/readme/chkbit-py.gif "chkbit")

- [How it works](#how-it-works)
- [Installation](#installation)
- [Usage](#usage)
- [Repair](#repair)
- [Ignore files](#ignore-files)
- [FAQ](#faq)
- [Development](#development)
## How it works
- **On your Disk**: chkbit starts by creating checksums for the files in each folder on your main disk. It alerts you to potential problems such as damage on the disk, filesystem errors, and malware attacks that could alter your files.

- **On your Backup**: Regardless of your storage media, chkbit stores indexes in hidden files alongside your data during backups. When you run chkbit on your backup, it verifies that every byte was accurately transferred. If issues like [bitrot/data degradation](https://en.wikipedia.org/wiki/Data_degradation) occur, chkbit helps identify damaged files so you can replace them from other backups.

- **For Data in the Cloud**: chkbit is useful for cloud-stored data, alerting you to any changes introduced by cloud providers, such as video re-encoding or image compression. It lets you confirm that your files remain unchanged in the cloud.

Remember to always maintain multiple backups for comprehensive data protection.
## Installation
- Download for [Linux, macOS or Windows](https://github.com/laktak/chkbit-py/releases).
- Linux packages for [Arch, Debian, Fedora, SUSE and Ubuntu via the Open Build Service](https://software.opensuse.org//download.html?project=home%3Alaktak&package=chkbit).
- Get it with [pipx](https://pipx.pypa.io/latest/installation/): `pipx install chkbit`

[Homebrew](https://brew.sh) is delayed until we reach their `>=75` stars rule.
## Usage
Run `chkbit -u PATH` to create/update the chkbit index.

chkbit will

- create a `.chkbit` index in every subdirectory of the path it was given.
- update the index with blake3 (see `--algo`) hashes for every file.
- report damage for files that failed the integrity check since the last run (check the exit status).

Run `chkbit PATH` to verify only.
```
usage: chkbit [-h] [-u] [--show-ignored-only] [--algo ALGO] [-f] [-s] [-l FILE] [--log-verbose] [--index-name NAME] [--ignore-name NAME] [-w N] [--plain] [-q] [-v] [PATH ...]

Checks the data integrity of your files. See https://github.com/laktak/chkbit-py

positional arguments:
  PATH                  directories to check

options:
  -h, --help            show this help message and exit
  -u, --update          update indices (without this chkbit will verify files in readonly mode)
  --show-ignored-only   only show ignored files
  --algo ALGO           hash algorithm: md5, sha512, blake3 (default: blake3)
  -f, --force           force update of damaged items
  -s, --skip-symlinks   do not follow symlinks
  -l FILE, --log-file FILE
                        write to a logfile if specified
  --log-verbose         verbose logging
  --index-name NAME     filename where chkbit stores its hashes, needs to start with '.' (default: .chkbit)
  --ignore-name NAME    filename that chkbit reads its ignore list from, needs to start with '.' (default: .chkbitignore)
  -w N, --workers N     number of workers to use (default: 5)
  --plain               show plain status instead of being fancy
  -q, --quiet           quiet, don't show progress/information
  -v, --verbose         verbose output

.chkbitignore rules:
  each line should contain exactly one name
  you may use Unix shell-style wildcards (see README)
  lines starting with `#` are skipped
  lines starting with `/` are only applied to the current directory

Status codes:
  DMG: error, data damage detected
  EIX: error, index damaged
  old: warning, file replaced by an older version
  new: new file
  upd: file updated
  ok : check ok
  ign: ignored (see .chkbitignore)
  EXC: internal exception
```
chkbit uses only 5 workers by default so it will not slow your system to a crawl. You can specify a higher number with `-w`/`--workers` to make it a lot faster, provided your IO throughput can keep up.
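
If you want to call chkbit from a script, the following is a minimal sketch in Python. It uses only the `-u` and `-w` flags listed above. The helper name `run_chkbit` and its `cmd` parameter are hypothetical, and it assumes that a non-zero exit status signals a problem (the text above only says to check the exit status).

```python
import subprocess


def run_chkbit(path, update=False, workers=5, cmd=("chkbit",)):
    """Run chkbit on `path` and return True if it exited with status 0.

    Assumption: any non-zero exit status means damage or another error.
    `cmd` is a hypothetical hook so the executable can be swapped out.
    """
    args = list(cmd)
    if update:
        args.append("-u")  # create/update the index instead of verify-only
    args += ["-w", str(workers), path]  # number of workers
    result = subprocess.run(args)
    return result.returncode == 0
```

For example, `run_chkbit("/mnt/backup", workers=10)` would verify a backup in readonly mode with more workers; the path is only a placeholder.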
## Repair
chkbit is designed to detect "damage". To repair your files you need to think ahead:

- back up regularly
- run chkbit *before* each backup
- run chkbit *after* a backup on the backup media (readonly)
- in case of any issues, *restore* from a checked backup medium.
## Ignore files
Add a `.chkbitignore` file containing the names of the files/directories you wish to ignore:

- each line should contain exactly one name
- you may use [Unix shell-style wildcards](https://docs.python.org/3/library/fnmatch.html)
  - `*` matches everything
  - `?` matches any single character
  - `[seq]` matches any character in seq
  - `[!seq]` matches any character not in seq
- lines starting with `#` are skipped
- lines starting with `/` are only applied to the current directory
- you can use `path/sub/name` to ignore a file/directory in a sub path
- hidden files (starting with a `.`) are ignored by default
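
To get a feel for the wildcard rules, here is a small sketch using Python's `fnmatch` module, which the link above documents. It is not chkbit's own matcher: `parse_rules` and `is_ignored` are hypothetical names, and the sketch only looks at names in a single directory, so a leading `/` is simply stripped and `path/sub/name` rules are not handled.

```python
from fnmatch import fnmatchcase


def parse_rules(text):
    """Return the non-empty, non-comment lines of a .chkbitignore-style text."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # lines starting with # are skipped
            rules.append(line)
    return rules


def is_ignored(name, rules):
    """True if `name` is hidden or matches one of the rules.

    Simplification: only one directory is considered, so a leading '/'
    ("current directory only") is stripped before matching.
    """
    if name.startswith("."):  # hidden files are ignored by default
        return True
    return any(fnmatchcase(name, rule.lstrip("/")) for rule in rules)


rules = parse_rules("# temp files\n*.tmp\n/cache\nimg[0-9].jpg\n")
for name in ["a.tmp", "cache", "img1.jpg", "imgA.jpg", "photo.png"]:
    print(name, is_ignored(name, rules))
```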
## FAQ
### Should I run `chkbit` on my whole drive?
You would typically run it only on *content* that you keep for a long time (e.g. your pictures, music, videos).
### Why is chkbit placing the index in `.chkbit` files (vs a database)?
The advantage of the `.chkbit` files is that

- when you move a directory the index moves with it
- when you make a backup the index is also backed up

The disadvantage is obviously that you get hidden `.chkbit` files in your content folders.
### How does chkbit work?
chkbit operates on files.

When run for the first time it records a hash of the file contents as well as the file modification time.

When you run it again it first checks the modification time,

- if the time changed (because you made an edit) it records a new hash.
- otherwise it will compare the current hash to the recorded value and report an error if they do not match.
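
The decision described above can be sketched in a few lines of Python. This is an illustration only, not chkbit's actual code: the `index` dictionary and the function names are hypothetical, it uses `sha512` from the standard library instead of blake3, and it leaves out the `old` status.

```python
import hashlib
import os


def file_hash(path):
    """Return the SHA-512 hex digest of a file, read in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_file(path, index):
    """Classify a file against `index` (hypothetical dict: name -> (mtime, hash))."""
    name = os.path.basename(path)
    mtime = int(os.stat(path).st_mtime)
    digest = file_hash(path)
    recorded = index.get(name)
    if recorded is None:
        status = "new"  # first run: record hash and modification time
    elif recorded[0] != mtime:
        status = "upd"  # time changed, so treat it as an edit and re-record
    elif recorded[1] == digest:
        return "ok"  # same time, same hash
    else:
        return "DMG"  # same time but different content: report damage
    index[name] = (mtime, digest)
    return status
```

Note that a damaged file is not written back to the index in this sketch, so the mismatch keeps being reported on later runs.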
### I wish to use a different hash algorithm
chkbit now uses blake3 by default. You can also specify `--algo sha512` or `--algo md5`.

Note that existing index files will use the hash that they were created with. If you wish to update all hashes you need to delete your existing indexes first. A conversion mode may be added later (PR welcome).
### How can I delete the index files?
List them with
```
find . -name .chkbit
```
and add `-delete` to delete.
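
On systems without `find` (for example Windows), a rough Python equivalent could look like the sketch below. The function names are hypothetical, and `index_name` defaults to `.chkbit`, the default index name mentioned under Usage.

```python
from pathlib import Path


def find_indexes(root, index_name=".chkbit"):
    """Return all index files below `root`, sorted by path."""
    return sorted(p for p in Path(root).rglob(index_name) if p.is_file())


def delete_indexes(root, index_name=".chkbit"):
    """Delete all index files below `root` and return the removed paths."""
    removed = find_indexes(root, index_name)
    for p in removed:
        p.unlink()
    return removed
```

Call `find_indexes(".")` first to review the list, then `delete_indexes(".")` once you are sure.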
### Can I test if chkbit is working correctly?
On Linux/macOS you can try:

Create `test` and set its modification time:
```
$ echo foo1 > test; touch -t 201501010000 test
$ chkbit -u .
new ./test
Processed 1 file.
- 0:00:00 elapsed
- 192.31 files/second
- 0.00 MB/second
- 1 directory was updated
- 1 file hash was added
- 0 file hashes were updated
```
`new` indicates a new file was added.

Now update `test` with a new modification time:
```
$ echo foo2 > test; touch -t 201501010001 test # update test & modified
$ chkbit -u .
upd ./test
Processed 1 file.
- 0:00:00 elapsed
- 191.61 files/second
- 0.00 MB/second
- 1 directory was updated
- 0 file hashes were added
- 1 file hash was updated
```
`upd` indicates the file was updated.

Now update `test` again but keep the same modification time to simulate damage:
```
$ echo foo3 > test; touch -t 201501010001 test
$ chkbit -u .
DMG ./test
Processed 1 file.
- 0:00:00 elapsed
- 173.93 files/second
- 0.00 MB/second
chkbit detected damage in these files:
./test
error: detected 1 file with damage!
```
`DMG` indicates damage.
## Development
With pipenv (install with `pipx install pipenv`):
```
# setup
pipenv install
# run chkbit
pipenv run python3 run.py
```
To build a source distribution package from `pyproject.toml`:
```
pipx run build
```
You can then install your own package with
```
pipx install dist/chkbit-*.tar.gz
```
The binaries are created using PyInstaller via GitHub Actions.