Statement : The purpose of this post is to show how to keep remote data stored in AWS S3, Azure Blob Storage, etc. in sync with the local file system using rclone.
Installation : Install rclone from the link based on your machine (Windows, Linux, macOS, etc.). I worked on a Mac, so I downloaded the corresponding file.
Steps : In my case, I have stored my files in both Azure Blob Storage and an AWS S3 bucket. Given below are the steps to keep that data in sync with a local directory.
- Go to the downloaded folder and run the following command to configure rclone:
tangupta-mbp:rclone-v1.39-osx-amd64 tangupta$ ./rclone config
- Initially no remotes will be found, so you need to create a new remote.
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
- Next, it will ask for the type of storage to configure, such as AWS, Azure, Box, Google Drive, etc. I chose Azure Blob Storage.
- It will then ask for the Azure Blob Storage details: the account name, the key, and the endpoint (keep it blank).
Storage Account Name
account> your_created_account_name_on azure
Storage Account Key
Endpoint for the service - leave blank normally.
y) Yes this is OK
e) Edit this remote
d) Delete this remote
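Once the remote is confirmed, rclone saves these answers to its configuration file. As a rough sketch, the saved entry might look like the following, assuming the remote was named remote (the name used in the commands in this post); the account and key values are placeholders:

```ini
# Assumed shape of the saved entry; values are placeholders, not real credentials.
[remote]
type = azureblob
account = your_created_account_name
key = your_storage_account_key
endpoint =
```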
- To list all the containers created on the Azure portal under this account name:
tangupta$./rclone lsd remote:
-1 2018-02-05 12:37:03 -1 test
- To list all the files uploaded or created under the container (test in my case) –
tangupta$./rclone ls remote:test
48128 Resume shashank.doc
- To copy files between the local machine and the container, in either direction (the command below copies a local directory up to the container):
tangupta$./rclone copy /Users/tanuj/airflow/dag remote:test
- Most importantly, now use the below command to sync the local file system to the remote container, deleting any excess files in the container.
tangupta$./rclone sync /Users/tanuj/airflow/dag remote:test
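Because sync deletes files on the destination, it can help to preview the changes first. Here is a minimal sketch using rclone's --dry-run flag; the preview_sync function and the RCLONE_BIN variable are hypothetical names introduced only for this illustration, not part of rclone:

```shell
# Hypothetical helper: preview_sync is not an rclone command, just a thin wrapper.
# RCLONE_BIN is an assumed override for the binary path (for example ./rclone).
preview_sync() {
    src="$1"   # source, e.g. a local directory
    dst="$2"   # destination, e.g. remote:test
    # --dry-run reports what would be copied or deleted without changing anything
    "${RCLONE_BIN:-rclone}" sync --dry-run "$src" "$dst"
}
```

For example, after setting RCLONE_BIN=./rclone, running preview_sync /Users/tanuj/airflow/dag remote:test should print the planned transfers and deletions; the real sync can be run once the output looks right.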
The good thing about rclone sync is that it transfers only the files that have changed. In the same way, you can sync files with AWS storage. Apart from these commands, rclone also provides copy, move, and delete commands for the respective jobs.
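For AWS, the remote is set up through the same ./rclone config dialog by choosing the S3 storage type. A hedged sketch of what the saved entry might contain follows; the remote name s3remote, the region, and the key values are assumptions for illustration, and recent rclone versions may record additional lines such as a provider:

```ini
# Assumed example entry; the name, region, and credentials are placeholders.
[s3remote]
type = s3
access_key_id = YOUR_ACCESS_KEY_ID
secret_access_key = YOUR_SECRET_ACCESS_KEY
region = us-east-1
```

With such a remote in place, the earlier commands would take the form ./rclone sync /Users/tanuj/airflow/dag s3remote:your_bucket_name, where your_bucket_name is again a placeholder.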
One can also use the rsync command to copy, sync, or back up contents between directories, locally as well as remotely. It is a widely used command that transfers only the differences between the source and destination files.
tangupta$ rsync -avc --delete /Users/tanuj/airflow/test /Users/tanuj/airflow/dags
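One detail worth knowing: with rsync, a trailing slash on the source decides whether the directory itself or only its contents are copied. Without the slash, the command above places a test directory inside dags. If the goal is to make the destination mirror the contents of the source, a small wrapper like the sketch below can normalize the slashes; mirror_dir and RSYNC_BIN are hypothetical names used only for this illustration:

```shell
# Hypothetical helper: mirror_dir is not part of rsync.
# RSYNC_BIN is an assumed override for the rsync binary path.
mirror_dir() {
    # Strip one trailing slash if present, then add exactly one back,
    # so rsync copies the contents of src rather than src itself.
    src="${1%/}/"
    dst="${2%/}/"
    # -a archive mode, -v verbose, -c compare by checksum;
    # --delete removes destination files that no longer exist in the source
    "${RSYNC_BIN:-rsync}" -avc --delete "$src" "$dst"
}
```

For example, mirror_dir /Users/tanuj/airflow/test /Users/tanuj/airflow/dags would make dags contain the same files as test. Since --delete removes files, adding -n (dry run) to the rsync options first is a sensible precaution.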
Hope this works for you. Enjoy 🙂