TL;DR: I configured a difftool, and git diff gives "intelligent" diffs, but git add creates "stupid" hunks. Why?
I configured the difftool to use nbdime with nbdime config-git --enable --global, which I think essentially just adds these lines to my .gitconfig:
[diff "jupyternotebook"]
    command = git-nbdiffdriver diff
[merge "jupyternotebook"]
    driver = git-nbmergedriver merge %O %A %B %L %P
    name = jupyter notebook merge driver
[difftool "nbdime"]
    cmd = git-nbdifftool diff \"$LOCAL\" \"$REMOTE\" \"$BASE\"
[difftool]
    prompt = false
[mergetool "nbdime"]
    cmd = git-nbmergetool merge \"$BASE\" \"$LOCAL\" \"$REMOTE\" \"$MERGED\"
[mergetool]
    prompt = false
Now git diff gives the good output I expect:
nbdiff /var/folders/6b/03yw1pts2nx_q8vftrh6fv140000gp/T//FILE.ipynb FOLDER/FILE.ipynb
--- /var/folders/6b/03yw1pts2nx_q8vftrh6fv140000gp/T//FILE.ipynb 2022-05-17 14:29:39.937318
+++ FOLDER/FILE.ipynb 2022-05-17 14:09:45.222229
## inserted before /cells/0:
+ code cell:
+ source:
+ ...
+ markdown cell:
+ source:
+ ...
## deleted /cells/0:
- markdown cell:
- source:
- ...
## inserted before /cells/2:
+ code cell:
+ source:
+ ...
But if I do git add -e FOLDER/FILE.ipynb, it gives me a "really bad" diff:
diff --git a/FOLDER/FILE.ipynb b/FOLDER/FILE.ipynb
index 3a1540c..17363f8 100644
--- a/FOLDER/FILE.ipynb
+++ b/FOLDER/FILE.ipynb
@@ -1,621 +1,716 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ...
- ]
- },
- ... almost every line in the file is removed
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "j1qKT6qtAYEj"
+ },
+ "outputs": [],
+ "source": [
+ ...
+ ]
+ },
+ ... almost every line in the file is added back
I may have a fundamental misunderstanding of what git add does, but why isn't git add using the nbdime diff tool? And is there a way I can add just the changes that I see in git diff?
Both git add -e and git add -p need to be able to understand an edited diff. They have only a limited comprehension of diffs in general, and require the "dumb" format from plain git diff. The nbdime tools take the original files apart, re-shuffle them into usable text, and diff that usable text,[1] but that's not what's actually in the files, and git add -e needs to work on what's in the files, not some cleaned-up presentation thereof.
[1] What's in the files is machine-readable JSON. The result of the nbdime tools appears to be YAML. If Git had a native JSON diff engine, git add -p and company would be able to deal with the result, but Git doesn't, so they can't. If Jupyter notebooks used YAML, Git's line-oriented tools would be able to deal with them, but Jupyter notebooks don't, so they can't.
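
To make the distinction concrete, here is a minimal Python sketch using only the standard library. It is not how nbdime or Git work internally. The helper names (notebook, markdown_cell, code_cell, raw_lines, line_diff, cell_diff) and the tiny notebook dicts are hypothetical, and json.dumps(indent=1) only approximates how a real .ipynb file is laid out on disk. It contrasts a line-oriented diff of the raw JSON, which is the kind of text git add -e has to edit and re-apply, with a cell-level comparison that is easier to read but is not a patch against the stored bytes.

```python
import difflib
import json


def notebook(cells):
    """Build a minimal notebook-like dict (hypothetical, far smaller than a real .ipynb)."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


def markdown_cell(text):
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}


def code_cell(text):
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": [text],
    }


def raw_lines(nb):
    """Serialize to JSON text, standing in for what is stored in the file."""
    return json.dumps(nb, indent=1).splitlines()


def line_diff(old, new):
    """Line-oriented unified diff of the raw JSON text."""
    return list(
        difflib.unified_diff(
            raw_lines(old), raw_lines(new), "a/FILE.ipynb", "b/FILE.ipynb", lineterm=""
        )
    )


def cell_diff(old, new):
    """Cell-level summary: compare whole cells instead of JSON lines."""
    old_cells = [json.dumps(c, sort_keys=True) for c in old["cells"]]
    new_cells = [json.dumps(c, sort_keys=True) for c in new["cells"]]
    matcher = difflib.SequenceMatcher(a=old_cells, b=new_cells, autojunk=False)
    # Keep only the operations that describe a change.
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


old = notebook([markdown_cell("# Title"), code_cell("x = 1")])
new = notebook([code_cell("import os"), markdown_cell("# Title"), code_cell("x = 1")])

# The raw view: hunks of JSON lines, which is what a patch editor must work with.
for line in line_diff(old, new):
    print(line)

# The structured view: a single "insert one cell at position 0" operation.
print(cell_diff(old, new))
```

The cell-level result reads much like the nbdime output in the question, but there is no line-for-line mapping from it back to the file's contents, so an edited version of it could not be applied to the index the way an edited unified diff can.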