Merge select files in git repo with local repo which don't have any commits


110 Views Asked by an4s911 At 10 September 2022 at 11:42

I have a remote git repo with more than 100 commits. I have a copy of those files locally, with some changes. I initialized a git repo locally and added the remote as origin, but now I am facing some problems. The local repo is only initialized: it has no commits yet and no files staged. I have some important changes locally which are not present in the remote. The problems:

1. I can't pull remote because that will overwrite the existing local files
2. I can't commit all local changes and push because they don't share the same commit history
3. I also can't force push, because that will delete the commit history in remote

What I am trying to achieve is to pull all commit history from remote, and then merge the local commit into the latest commit on remote.

Please note (in case this info will also help): This is a dotfiles git repo. In my previous machine, I set the local git repo using the command git --work-tree=/home/user --git-dir=.git/ init. In my new machine, before setting this repo again, and pulling the changes, I instead copied the config files and other dotfiles from my old machine and made a few changes. Now in the new machine I used the same command but the above mentioned problems were raised. Both remotely and locally I have the ~/.mozilla/firefox also backed up, so it contains binary (or similar) files which are neither readable nor writable, but have been changed while using firefox, including .db files. So for those files, I just wanna keep the local ones, and ignore the remote ones.

How can I approach this?


**torek** · Answer 1 · 2022-09-10T17:07:19.313000

> I can't pull remote because that will overwrite the existing local files

Right. So don't do that.

> I can't commit all local changes and push because they don't share the same commit history

You never commit changes in the first place. This statement starts with an incorrect assumption! That's where we'll fix things.

> I can't force push also, because that will delete the commit history in remote

This part is right, so we won't do that.

Now, before we go about fixing #2 above, let's get into this part:

> Both remotely and locally I have the ~/.mozilla/firefox also backed up, so it contains binary (or similar) files which are neither readable nor writable, but have been changed while using firefox, including .db files. So for those files, I just wanna keep the local ones, and ignore the remote ones.

This is ... difficult, and also illustrative: it shows the difference between a backup system (which should save and restore these databases) and a version control system (which should not). Git is a version control system, not a backup system; as such, you really don't want to store these databases in it. But you already did, so you're stuck with that. There is no good Git solution to this issue. Consider redoing everything so that these databases are excluded from your version control. (Whether you want to include or exclude browser bookmarks is a separate question, but note that these are typically imported and exported as XML or modified HTML or some such, and Git's merge algorithms perform poorly with these file formats.)

With that caveat—that these version-controlled databases are going to be a problem and there is very little you can do about it—let's go on to items 1 and 2 above. You have been led astray by learning the git pull command. It's not inherently bad, but git pull is composed of two more-basic commands:

git fetch, which you do need to use; followed by
a second Git command (usually git merge by default), which you must not use here.
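To see why the first half is safe on its own, here is a minimal sketch. It assumes a throwaway directory from mktemp, a hypothetical stand-in for the remote with a single commit on master, and a made-up .bashrc file; none of these come from the question. After the fetch, the local file is untouched and only the remote-tracking name origin/master has appeared:

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical stand-in for the remote: one commit on master.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "remote version" > "$work/upstream/.bashrc"
git -C "$work/upstream" add .bashrc
git -C "$work/upstream" commit -q -m "remote commit"

# Local directory: files present, repository initialized, no commits.
mkdir "$work/home"
echo "local version" > "$work/home/.bashrc"
git init -q "$work/home"
cd "$work/home"
git remote add origin "$work/upstream"

# Fetch copies commits and updates origin/* names; it does not touch files.
git fetch -q origin
cat .bashrc                             # still the local content
git rev-parse --verify origin/master    # the fetched branch tip
```

It is the second command, the merge or rebase that git pull would run next, that tries to write into the working tree.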

Knowing that git pull = git fetch + second Git command, and what each of these two commands does, would have gotten you a lot closer to your answer. All you need to do is:

1. Run git fetch to obtain all the commits from the remote named origin (i.e., run git fetch or git fetch origin).
2. Set things up so that you are "on" the desired branch, with the desired branch tip as the current commit.
3. Add and commit your files. You will get a full snapshot of every file you added, and only those files, in the same way that every commit holds a full snapshot of every file. The parent of the new commit you just made will be the commit you were "on" when you ran git commit, so the difference between the parent of the new commit, and the new commit itself, will be the differences between the files in each of those two commits.

(In other words, that's where "changes" come from. Git does not store changes. Git stores snapshots. But the snapshot-diff duality means we can work with changes whenever we like.)
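A small sketch of that snapshot/diff relationship, assuming a scratch repository with two made-up files, a.txt and b.txt (both hypothetical). The second commit still lists both files, even though only one was edited, and the "change" only appears when two snapshots are compared:

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "first snapshot"

echo "one, edited" > a.txt
git add a.txt
git commit -q -m "second snapshot"

# The newer commit still holds every file, including the unchanged b.txt.
snapshot=$(git ls-tree --name-only HEAD)
# The "changes" are computed on demand by comparing the two snapshots.
changed=$(git diff --name-only "HEAD^" HEAD)

echo "files in snapshot:"; echo "$snapshot"
echo "files that differ:"; echo "$changed"
```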

Step 2 is the hard part. The "normal" way to "get on a branch" is to use git switch (or, for Git versions predating 2.23, git checkout), but this asks Git to overwrite your working tree files. As these files are not (yet) in any commit(s), you definitely do not want to do this. You:

- are in a repository that has no commits until you run git fetch;
- are on an unborn or orphan branch;
- have no branches (see above bullet point);
- have an empty index unless you have already run git add;
- have a non-empty working tree with various dot-files.
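Each item in that list can be checked from the command line. The following is a rough sketch against a scratch directory (the .bashrc file and the variable names are placeholders), run before any fetch or add:

```shell
set -eu
work=$(mktemp -d)
mkdir "$work/home"
echo "local" > "$work/home/.bashrc"
git init -q "$work/home"
cd "$work/home"

# HEAD is a symbolic name for a branch that does not exist yet.
branch=$(git symbolic-ref --short HEAD)
# Resolving HEAD to a commit fails, because there are no commits.
if git rev-parse --verify -q HEAD >/dev/null; then has_commit=yes; else has_commit=no; fi
branches=$(git branch --list)         # empty: no branches exist
staged=$(git ls-files)                # empty: nothing is in the index
untracked=$(git ls-files --others)    # the files sitting in the working tree

echo "on unborn branch '$branch', commit present: $has_commit"
echo "branches: [$branches] index: [$staged] working tree: [$untracked]"
```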

I've reproduced this condition:

$ mkdir tt && cd tt && git init
Initialized empty Git repository in ...
$ git remote add origin ../t

(The ../t bare repository here is a repository full of dinky test files and other random stuff from old Stack Overflow answers.)

$ git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)
$ git branch
$

So, no branches, no commits; let's run git fetch and populate with commits:

$ git fetch
remote: Enumerating objects: 125, done.
...
 * [new branch]      branch     -> origin/branch
 * [new branch]      fix-signal -> origin/fix-signal
 * [new branch]      foobranch  -> origin/foobranch
 * [new branch]      master     -> origin/master
 * [new tag]         tag-foo    -> tag-foo
$ git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Note that while I'm still on my own master branch, my master branch does not exist. I can change the name of this unborn branch with git checkout -b or git switch -c or, if my Git is new enough, git branch -m (move, i.e., rename, branch). It's up to you what branch name you want to use here. For illustration I'll switch mine to main. Then I will create it based on the upstream master, which I now have in my tt repository as origin/master:

$ git switch -c main
Switched to a new branch 'main'
$ git branch main origin/master
Branch 'main' set up to track remote branch 'master' from 'origin'.
$ git status
On branch main
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        deleted:    .gitignore
        deleted:    a.py
        deleted:    ast_ex.py
        deleted:    bar
        deleted:    clob.c
        deleted:    closure.py
...
$ git rev-parse HEAD
11ae6ca18f6325c858f1e3ea2b7e6a045666336d

Note how it now looks as though I've deleted every file. This is because the index (aka staging area) remains empty. I have to git add files to populate it, or run commands such as git restore -S to copy files from the current or HEAD commit into Git's index, or both.

The current commit is now 11ae6ca18f6325c858f1e3ea2b7e6a045666336d. That's the commit I specified when I ran git branch main to create the name main. I did that by writing origin/master, but note:

$ git rev-parse origin/master
11ae6ca18f6325c858f1e3ea2b7e6a045666336d

There's that same hash ID: origin/master means commit 11ae6ca18f6325c858f1e3ea2b7e6a045666336d. Branch names like main and remote-tracking names like origin/master are just ways we have Git remember hash IDs for us.

If I want, I can now run git reset, which does a --mixed reset by default: it moves the current branch name to the commit I specify, then reads that commit's files into Git's index. I'll use the default commit, HEAD, which is the current commit as specified by the name main. That name now holds the same commit hash ID as origin/master, which is the commit I'd like to "append to" after all. (That's the commit I chose with git branch main origin/master!) So the reset "moves" the current branch main from 11ae6ca18f6325c858f1e3ea2b7e6a045666336d to 11ae6ca18f6325c858f1e3ea2b7e6a045666336d, i.e., it does not move it at all, and then fills Git's index from that commit. Now no files are staged for deletion. Instead, the index / staging-area matches the current commit, and git status reports on the difference between the index and my working tree (mine is empty, yours won't be):

$ git reset
Unstaged changes after reset:
D       .gitignore
D       a.py
...
$ git status
On branch main
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    .gitignore
        deleted:    a.py
...

(the git status list is the same as the git reset list here, and again consists of every file in Git's index, which is now every file in the current or HEAD commit).

If I didn't want to fill Git's index like this, I could run git rm -r --cached . or (as a bit of a special-case hack) git read-tree --empty now. But in general this is what you want to do:

- You want to populate the Git objects database with commits, using git fetch.
- You want to set up the correct orphan/unborn branch name if necessary.
- You want to create the branch name at the correct commit (as found by origin/whatever), so that your next commit will use this commit as its parent.
- Then you want to build a new commit as usual.
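Put together, the whole sequence might look like the sketch below. Everything in it is a stand-in: the upstream repository, the .bashrc file, and the demo identity are invented so the script can run in a scratch directory, and the branch names master and main simply mirror the transcript above.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical stand-in for the remote: one commit on master.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "remote version" > "$work/upstream/.bashrc"
git -C "$work/upstream" add .bashrc
git -C "$work/upstream" commit -q -m "remote commit"

# Local files that exist before any commit does.
mkdir "$work/home"
echo "local version" > "$work/home/.bashrc"
git init -q "$work/home"
cd "$work/home"
git remote add origin "$work/upstream"

git fetch -q origin                 # populate the object database
git checkout -q -b main             # name the unborn branch (git switch -c on 2.23+)
git branch -q main origin/master    # create main at the remote tip
git reset -q                        # --mixed: fill the index, leave files alone
git add .bashrc                     # stage only what should be committed
git commit -q -m "local changes on top of remote history"

parent=$(git rev-parse "HEAD^")
upstream_tip=$(git rev-parse origin/master)
echo "parent of new commit: $parent"
echo "origin/master:        $upstream_tip"
```

The new commit's parent is the fetched tip, so a plain git push would extend the remote history rather than replace it. In a real dotfiles setup the git add step is where the choice of files is made.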

You can, if you like, set up your new branch as a different branch—not main or master or whatever—and you can set it up without an upstream using git branch --no-track newbr origin/master for instance, or you can remove the upstream later with git branch --unset-upstream. You can git restore -S (but not -W) and git reset --mixed (but not --hard) if you like. These are all just fiddling around the edges: the fundamentals you want are those in the bullet points above this paragraph.
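As a sketch of the upstream options just mentioned, again against a made-up scratch remote: has_upstream is a hypothetical helper that only checks whether the branch.NAME.merge setting exists, and --track is spelled out so the example does not depend on local configuration.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical stand-in for the remote: one commit on master.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "remote version" > "$work/upstream/.bashrc"
git -C "$work/upstream" add .bashrc
git -C "$work/upstream" commit -q -m "remote commit"

git init -q "$work/home"
cd "$work/home"
git remote add origin "$work/upstream"
git fetch -q origin

# Hypothetical helper: report whether a branch has an upstream configured.
has_upstream() {
    if git config --get "branch.$1.merge" >/dev/null; then echo yes; else echo no; fi
}

git branch -q --no-track newbr origin/master    # created without an upstream
newbr_state=$(has_upstream newbr)

git branch -q --track tracked origin/master     # created with an upstream
before=$(has_upstream tracked)
git branch --unset-upstream tracked             # upstream removed afterwards
after=$(has_upstream tracked)

echo "newbr has upstream: $newbr_state"
echo "tracked has upstream: $before, then $after"
```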

On a completely different note: dotfiles repositories

I like the idea of storing (some / many / most of) my "dot-files" in a repository. What I don't like is having a .git repository in my home directory, where those dot-files live. So what I did was write an overly fancy program: I put my committed dot-files into a repository and then have the program install them into place, mostly with symlinks wherever that works. This lets me pick and choose which dot files actually get saved and hence work around problems like the Firefox binary databases.

Mine is messy and highly imperfect and I have not fussed with it for a few years at this point. It's probably not a great starting point for anyone else. But I think the general idea is sound enough: don't store the dot-files in a Git repository, store prototype dot-files that get copied or symlinked or whatever. Maintain the prototypes, not the actual files, so that you can accommodate quirks as needed. In other words, add a level of indirection.
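To make the indirection concrete, here is a deliberately tiny sketch. It is not the program described above; the directory layout, the file names, and the rule of prefixing a dot are all assumptions chosen for illustration. Prototypes live in a repository directory and are symlinked into a target directory, and anything that already exists as a real file is left alone:

```shell
set -eu
work=$(mktemp -d)
repo="$work/dotfiles"    # hypothetical: the Git repository holding prototypes
target="$work/home"      # hypothetical stand-in for $HOME
mkdir -p "$repo" "$target"

echo "set -o vi" > "$repo/bashrc"
echo "syntax on" > "$repo/vimrc"
echo "precious" > "$target/.vimrc"    # a real file that must not be clobbered

# Link each prototype into place as a dot-file.
for src in "$repo"/*; do
    name=$(basename "$src")
    dest="$target/.$name"
    if [ -e "$dest" ] && [ ! -L "$dest" ]; then
        echo "skipping $dest: a real file is already there" >&2
        continue
    fi
    ln -sf "$src" "$dest"
done

ls -la "$target"
```

Because only the prototypes are committed, files such as the Firefox databases never enter the repository unless they are copied in on purpose.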
