# chkbit

chkbit is a lightweight tool to check the data integrity of your files. It allows you to verify *that the data has not changed* since you put it there and that it is still the same when you move it somewhere else.

### On your Disk

chkbit starts with your primary disk. It creates checksums for each folder, and these checksums follow your data onto your backups.

Even if your filesystem has built-in checksums, it is usually not trivial to carry them over to other media.

### On your backup

No matter what storage media or filesystem you use, chkbit stores its indexes in hidden files that are backed up together with your data.

When you run chkbit on your backup media to verify it, you can make sure that every byte was transferred correctly.

If your backup media fails or experiences [bitrot/data degradation](https://en.wikipedia.org/wiki/Data_degradation), chkbit lets you discover which files were damaged and need to be restored from other backups.

### Data in the Cloud

Some cloud providers re-encode your videos or compress your images to save space. chkbit will alert you to any such changes.

## Installation
```
pip install --user chkbit
```
Or in its own environment:
```
pipx install chkbit
```
## Usage
Run `chkbit -u PATH` to create or update the chkbit index.

chkbit will:

- create a `.chkbit` index in the given path and in every subdirectory.
- update the index with md5/sha512/blake3 hashes for every file.
- report damage for files that failed the integrity check since the last run (check the exit status).

Run `chkbit PATH` to verify only, without updating the index.
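If you call chkbit from a script (for example, right before a backup job), the exit status is what to check. Here is a minimal Python sketch, assuming chkbit exits with a non-zero status when it detects damage or another error; `run_check` is a hypothetical helper, not part of chkbit.

```python
import subprocess
from typing import Sequence


def run_check(cmd: Sequence[str]) -> bool:
    """Run a checker command and report whether it exited with status 0.

    Assumption: chkbit exits non-zero when it reports damage (DMG)
    or an index error (EIX), so a False result means "do not trust this data".
    """
    completed = subprocess.run(list(cmd))
    return completed.returncode == 0
```

For example, `run_check(["chkbit", "/path/to/photos"])` returns `False` if verification fails, so a backup script can stop before copying damaged files.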
```
usage: chkbit [-h] [-u] [--algo ALGO] [-f] [-i] [-s] [-w N] [-q] [-v] [PATH ...]

Checks the data integrity of your files. See https://github.com/laktak/chkbit-py

positional arguments:
  PATH                 directories to check

options:
  -h, --help           show this help message and exit
  -u, --update         update indices (without this chkbit will only verify files)
  --algo ALGO          hash algorithm: md5, sha512, blake3
  -f, --force          force update of damaged items
  -i, --verify-index   verify files in the index only (will not report new files)
  -s, --skip-symlinks  do not follow symlinks
  -w N, --workers N    number of workers to use, default=5
  -q, --quiet          quiet, don't show progress/information
  -v, --verbose        verbose output

Status codes:
  DMG: error, data damage detected
  EIX: error, index damaged
  old: warning, file replaced by an older version
  new: new file
  upd: file updated
  ok : check ok
  skp: skipped (see .chkbitignore)
  EXC: internal exception
```
By default chkbit uses only 5 workers so that it does not slow your system to a crawl. You can raise the number with `-w N` to make it much faster; each worker requires about 128 kB of memory.
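The memory figure is consistent with each worker reading files in fixed-size blocks: every worker holds roughly one block at a time, so 5 workers need about 5 × 128 kB = 640 kB of buffers. The Python sketch below illustrates that model with a thread pool. The 128 KiB `BLOCK_SIZE` and the helper names are assumptions for illustration, not taken from chkbit's source.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable

BLOCK_SIZE = 128 * 1024  # assumption: each worker reads files in 128 KiB blocks


def hash_file(path: str, algo: str = "md5") -> str:
    """Hash one file block by block, so memory use stays near one buffer."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(BLOCK_SIZE), b""):
            h.update(block)
    return h.hexdigest()


def hash_files(paths: Iterable[str], workers: int = 5, algo: str = "md5") -> Dict[str, str]:
    """Hash many files concurrently with a fixed-size pool of workers."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(lambda p: hash_file(p, algo), path_list)
        return dict(zip(path_list, digests))
```

Raising `workers` lets more files be read in parallel at the cost of one more buffer per worker.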
## Repair

chkbit cannot repair damage; its job is simply to detect it.

You should:

- back up regularly.
- run chkbit *before* each backup.
- check for damage on the backup media.
- in case of damage, *restore* from a checked backup.

## Ignore files
Add a `.chkbitignore` file containing the names of the files/directories you wish to ignore:

- each line should contain exactly one name
- lines starting with `#` are skipped
- you may use [Unix shell-style wildcards](https://docs.python.org/3.8/library/fnmatch.html)
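To preview which names a `.chkbitignore` file would exclude, here is a minimal Python sketch that applies the rules above with the standard `fnmatch` module. It assumes each pattern is matched against a plain file or directory name (not a full path) and that surrounding whitespace is ignored; `parse_ignore` and `is_ignored` are hypothetical helpers, not chkbit functions.

```python
import fnmatch
from typing import List


def parse_ignore(text: str) -> List[str]:
    """Return the patterns in .chkbitignore text: one per line, '#' lines skipped."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()  # assumption: leading/trailing whitespace is not significant
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        patterns.append(line)
    return patterns


def is_ignored(name: str, patterns: List[str]) -> bool:
    """True if the name matches any Unix shell-style wildcard pattern."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)
```

For example, with a file containing `*.tmp` and `Thumbs.db`, `draft.tmp` and `Thumbs.db` are ignored while `photo.jpg` is still checked.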
## FAQ
### Should I run `chkbit` on my whole drive?
You would typically run it only on *content* that you keep for a long time (e.g. your pictures, music, videos).
### Why does chkbit place the index in `.chkbit` files (instead of a database)?

The advantage of `.chkbit` files is that:

- when you move a directory, the index moves with it
- when you make a backup, the index is also backed up

The disadvantage is obviously that you get hidden `.chkbit` files in your content folders.
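The trade-off is easy to see in a small Python sketch: because each directory carries its own index file, any ordinary copy of the directory copies the index too. The JSON layout and helper names below are hypothetical and do not reflect chkbit's actual `.chkbit` format.

```python
import json
import shutil
from pathlib import Path
from typing import Dict

INDEX_NAME = ".chkbit"  # index file name used by chkbit; its contents here are hypothetical


def write_index(directory: Path, hashes: Dict[str, str]) -> None:
    """Store a per-directory index next to the files it describes."""
    (directory / INDEX_NAME).write_text(json.dumps(hashes, sort_keys=True))


def read_index(directory: Path) -> Dict[str, str]:
    return json.loads((directory / INDEX_NAME).read_text())


def copy_with_index(src: Path, dst: Path) -> None:
    """A plain recursive copy: the hidden index is copied like any other file."""
    shutil.copytree(src, dst)
```

A central database would instead need its own path bookkeeping whenever a directory is moved or backed up.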
### How does chkbit work?

chkbit operates on files.

When run for the first time, it records a hash of the file contents as well as the file modification time.

When you run it again, it first checks the modification time:

- if the time changed (because you made an edit), it records a new hash.
- otherwise, it compares the current hash to the recorded value and reports an error if they do not match.
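The decision described above can be sketched in Python as follows. This is a simplified illustration, not chkbit's implementation: the `Record` type, the use of whole-second modification times and md5, and the `old` branch (inferred from the `old` status code in the usage text) are assumptions.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    mtime: int   # modification time from the last run (assumption: whole seconds)
    digest: str  # content hash from the last run


def file_digest(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def check_file(path: str, record: Optional[Record]) -> Tuple[str, Record]:
    """Return (status, record_to_store) using the mtime-then-hash logic."""
    mtime = int(os.stat(path).st_mtime)
    digest = file_digest(path)
    if record is None:
        # First run: remember both the hash and the modification time.
        return "new", Record(mtime, digest)
    if mtime != record.mtime:
        # Edited (or, if the mtime went backwards, replaced by an older
        # version): record the new hash instead of comparing.
        status = "old" if mtime < record.mtime else "upd"
        return status, Record(mtime, digest)
    if digest != record.digest:
        # Same mtime but different content: report damage, keep the good record.
        return "DMG", record
    return "ok", record
```

Keeping the previous record on `DMG` means a later run still compares against the last known-good hash.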
### I wish to use a stronger hash algorithm

chkbit supports sha512 and blake3 in addition to md5. Select one with `--algo sha512` or `--algo blake3`.

Note that existing index files keep using the hash algorithm they were created with. To switch all hashes to a new algorithm, you need to delete your existing indexes first.
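As a rough illustration of what `--algo` selects, the Python sketch below builds a hasher by name. md5 and sha512 come from the standard library's `hashlib`; BLAKE3 is not part of `hashlib`, so the sketch assumes the third-party `blake3` package is installed when that algorithm is requested. The helper names are hypothetical. The digests also differ in length (32, 128 and 64 hex characters), so a hash from one algorithm can never match one from another, which is why switching means hashing every file again.

```python
import hashlib

SUPPORTED = ("md5", "sha512", "blake3")


def make_hasher(algo: str):
    """Return a new hash object for one of the names accepted by --algo."""
    if algo in ("md5", "sha512"):
        return hashlib.new(algo)
    if algo == "blake3":
        # Assumption: the third-party `blake3` package (pip install blake3) is available.
        from blake3 import blake3
        return blake3()
    raise ValueError(f"unsupported algorithm: {algo!r}; expected one of {SUPPORTED}")


def digest_bytes(data: bytes, algo: str = "md5") -> str:
    h = make_hasher(algo)
    h.update(data)
    return h.hexdigest()
```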
### How can I delete the index files?
List them with:
```
find . -name .chkbit
```
Add `-delete` to the same command to delete them.
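On systems without `find` (for example Windows), a Python sketch can do the same job. It lists every file named `.chkbit` below a directory and only deletes them when `dry_run` is false; the function name and the `dry_run` parameter are illustrative, not part of chkbit.

```python
from pathlib import Path
from typing import List


def remove_index_files(root: str, dry_run: bool = True) -> List[Path]:
    """Find every `.chkbit` index file under root; delete them unless dry_run is set."""
    found = sorted(p for p in Path(root).rglob(".chkbit") if p.is_file())
    if not dry_run:
        for p in found:
            p.unlink()
    return found
```

Calling `remove_index_files(".")` only lists the files, mirroring the plain `find` command; `remove_index_files(".", dry_run=False)` deletes them.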
### Can I test if chkbit is working correctly?

On Linux/macOS you can try the following.

Create a file named `test` and set its modification time:
```
$ echo foo1 > test; touch -t 201501010000 test
$ chkbit -u .
add ./test
Processed 1 file(s).
Indices were updated.
```
`add` indicates the file was added.

Now update `test` with new content and a new modification time:
```
$ echo foo2 > test; touch -t 201501010001 test # update content & modification time
$ chkbit -u .
upd ./test
Processed 1 file(s).
Indices were updated.
```
`upd` indicates the file was updated.

Now change the content of `test` but keep the same modification time to simulate damage:
```
$ echo foo3 > test; touch -t 201501010001 test
$ chkbit -u .
DMG ./test
Processed 0 file(s).
chkbit detected damage in these files:
./test
error: detected 1 file(s) with damage!
```
`DMG` indicates damage: the content changed while the modification time stayed the same.
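The steps above rely on `touch -t`. On platforms without it, a small Python sketch can prepare the same test file before each `chkbit -u .` run; it sets the time with `os.utime` and interprets the timestamp in local time, as `touch -t` does. The `make_test_file` helper is hypothetical.

```python
import os
from datetime import datetime


def make_test_file(path: str, content: str, stamp: str) -> None:
    """Write content plus a newline (like echo), then set atime/mtime like `touch -t YYYYMMDDhhmm`."""
    with open(path, "w") as f:
        f.write(content + "\n")
    # A naive datetime is interpreted as local time by .timestamp().
    when = datetime.strptime(stamp, "%Y%m%d%H%M").timestamp()
    os.utime(path, (when, when))
```

For the damage step, `make_test_file("test", "foo3", "201501010001")` reproduces the last shell command.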