I have the following structure for my project (see the example at the bottom). I created a branch (branch one) and initialized DVC in the subdirectory via:
cd projectA && dvc init --subdir
For now, I'm working with projectA only. I created a branch and ran the following steps.
Step 1:
dvc remote add -d awsstorage s3://mlops-artifacts
dvc remote add s3cache s3://mlops-artifacts/cache
dvc config cache.s3 s3cache
git add projectA/.dvc/config
git commit -m "configure remote storage and cache"
git push
The dataset folder contains the CSV data for training, and the output folder contains the model.tar file produced by training.
Step 2:
dvc add --external awsstorage s3://mlops-artifacts/dataset
dvc add --external awsstorage s3://mlops-artifacts/output
git add projectA/dataset.dvc projectA/output.dvc
git commit -m "configure data and model output"
git push
I created another branch (branch-2), made some changes to the input data, uploaded it to the same S3 location, and ran the training. After training, the model output is written to the same S3 path. To capture these new changes, I ran steps 1 and 2 above in the new branch.
Now, when I do a git checkout of the first branch followed by dvc checkout, so that I can go back to the previous version of the data and model output, I get the following error:
error
Error: checkout failed for following targets:
s3://mlops-artifacts/dataset/training.csv
s3://mlops-artifacts/output/model.tar
Is your cache up to date?
<https://error.dvc.org/missing-files>
Project structure:
myproject
-- projectA
   -- .dvc
   ...
-- ProjectB
   -- .dvc
   ...
It seems that the DVC cache is not up to date: it does not contain the versions of training.csv and model.tar that the first branch's .dvc files point to. This can happen when the data versions referenced by the checked-out .dvc files were never saved to (or pushed from) the cache.
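One way to check this is to compare the hashes recorded in the .dvc files on each branch: if the first branch records different hashes than branch-2, it needs older data versions, and those objects must be present in the cache for dvc checkout to succeed. The following is a minimal Python sketch, not part of DVC. The helper names and the placeholder branch names are hypothetical, and it assumes the hashes appear under md5: or etag: keys, which varies with the DVC version and with whether an output is local or external (S3).

```python
import re
import subprocess

# Assumption: hashes in a .dvc file appear on lines like "- md5: ..." or "etag: ...".
# The exact keys depend on the DVC version and the output type.
HASH_LINE = re.compile(r"^\s*-?\s*(md5|etag):\s*(\S+)", re.MULTILINE)


def extract_hashes(dvc_file_text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (key, value) pairs for every md5/etag entry."""
    return HASH_LINE.findall(dvc_file_text)


def hashes_on_branch(branch: str, dvc_path: str) -> list[tuple[str, str]]:
    """Read a .dvc file as committed on `branch` (via `git show`) and extract its hashes."""
    text = subprocess.run(
        ["git", "show", f"{branch}:{dvc_path}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return extract_hashes(text)


# Example (run inside the repository; branch names are placeholders):
# for branch in ["branch-1", "branch-2"]:
#     print(branch, hashes_on_branch(branch, "projectA/dataset.dvc"))
```

If the hashes differ between branches, the objects for the first branch's hashes are the ones that must be present in the cache (or fetched into it).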
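To update the DVC cache, run the following command from the projectA directory (the DVC root, since DVC was initialized there with --subdir):
dvc pull
This downloads the missing files from remote storage into the cache and then checks them out into the workspace; it is equivalent to dvc fetch followed by dvc checkout.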
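Note that dvc checkout on its own only restores files from the cache; it does not download anything from remote storage. Once the cache has been populated, you can re-run it to restore the workspace:
dvc checkout
If dvc pull itself reports missing files, those versions are not in remote storage either (for example, because they were never pushed), and dvc checkout cannot recover them.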
If you continue to have issues, you can also run dvc doctor, which prints your DVC version and environment information to help diagnose installation and configuration problems.
If you switch between branches frequently, it's a good idea to run dvc push after adding new data versions and dvc pull after checking out a branch, so that remote storage and the cache stay up to date.
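The restore sequence described above can be scripted. This is a minimal sketch under stated assumptions: restore_commands and run_restore are hypothetical helpers, the branch name and path in the example are placeholders, and the commands run from the projectA directory because DVC was initialized there with --subdir. It uses dvc fetch followed by dvc checkout (what dvc pull does in one step) to make the "fill the cache, then restore the workspace" order explicit.

```python
import subprocess
from pathlib import Path


def restore_commands(branch: str) -> list[list[str]]:
    """Hypothetical helper: commands that switch branches and restore the
    DVC-tracked data referenced by that branch's .dvc files."""
    return [
        ["git", "checkout", branch],   # switch the .dvc files to this branch's versions
        ["dvc", "status", "--cloud"],  # compare the cache with remote storage
        ["dvc", "fetch"],              # download missing objects from remote storage into the cache
        ["dvc", "checkout"],           # restore workspace files from the cache
    ]


def run_restore(branch: str, dvc_root: Path) -> None:
    """Run the commands from the DVC root (projectA), stopping at the first failure."""
    for cmd in restore_commands(branch):
        print("+", " ".join(cmd))
        subprocess.run(cmd, cwd=dvc_root, check=True)


# Example (placeholder branch name and path):
# run_restore("branch-1", Path("myproject/projectA"))
```

Because check=True raises on the first non-zero exit code, a failing dvc fetch stops the script before dvc checkout runs against an incomplete cache.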