Ansible Copy versus Synchronize

By Christopher Burg

I run a lot of servers since I self-host almost all of the online services that I use. To simplify my life, I automate as much of the work as I can with Ansible.

When I started migrating from my WordPress site to this one, I built an Ansible playbook to automate building this site's server. One step in that playbook is copying the files output by Zola, which are located on my controller system, to the web server.

The first version of my playbook used Ansible's copy module.

- name: "Copying website files to the server."
  ansible.builtin.copy:
    src: "{{ website_repository_path }}"
    dest: "/var/www/"
    seuser: "system_u"
    serole: "object_r"
    setype: "httpd_sys_content_t"
    owner: "nginx"
    group: "nginx"
    mode: "u+rx-w,g+rx-w,o+rx-w"

website_repository_path is the path on the controller system to the website files. The server runs SELinux, hence the seuser, serole, and setype parameters. Since the site is static, I set the permissions so the files are read-only (the web server should never write to this directory, so there's no reason to give it permission to do so).
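As a rough sketch of where that variable could live, here is a variables file that defines it. The file location and the path are hypothetical, not taken from my actual setup; Zola writes its output to a directory named public by default, which is why the example path ends there.

```yaml
# group_vars/all.yml (hypothetical location for the variable)

# Hypothetical path to the Zola output on the controller system.
# A trailing slash tells the copy module to copy the directory's contents
# into dest; without it, the directory itself is copied into dest.
website_repository_path: "/home/me/website/public/"
```

The trailing-slash rule follows rsync's convention, so the same value behaves the same way when a task is later switched to an rsync-based module.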

I started the playbook and went to do other things. I returned almost three hours later expecting the new website to be up and running. Instead, the playbook was still copying the website files from my controller system to the web server. I logged onto the web server and verified that files were being copied, albeit very slowly. Taking three hours or more to copy a small website obviously isn't sustainable, so I started looking for another solution.

My search led me to the Ansible synchronize module. The documentation told me everything I needed to know:

synchronize is a wrapper around rsync to make common tasks in your playbooks quick and easy.

I use rsync to copy files between computers all the time. I know it's fast. So I made a naive change in my playbook.

- name: "Copying website files to the server."
  ansible.posix.synchronize:
    src: "{{ website_repository_path }}"
    dest: "/var/www/"
    seuser: "system_u"
    serole: "object_r"
    setype: "httpd_sys_content_t"
    owner: "nginx"
    group: "nginx"
    mode: "u+rx-w,g+rx-w,o+rx-w"

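That didn't work. The mode parameter means something different in each module: in copy it sets file permissions, while in synchronize it selects the transfer direction (push or pull). Likewise, owner and group are booleans in synchronize that control whether rsync preserves ownership; they don't accept a user or group name. Moreover, the synchronize module doesn't support the SELinux parameters. So I made another change.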

- name: "Copying website files to the server."
  ansible.posix.synchronize:
    src: "{{ website_repository_path }}"
    dest: "/var/www/"
    mode: "push"

- name: "Setting the permissions for the website files in /var/www."
  ansible.builtin.file:
    path: "/var/www/"
    state: "directory"
    recurse: "true"
    seuser: "system_u"
    serole: "object_r"
    setype: "httpd_sys_content_t"
    owner: "nginx"
    group: "nginx"
    mode: "u+rx-w,g+rx-w,o+rx-w"

The playbook ran, but when it got to the synchronize task, it stopped with an error. The error indicated that the playbook required the sudo password for the target system. I found this quite strange since I had set the entire playbook to run with elevated privileges on the target system, so the playbook already knew the sudo password. This is a good time to remind everyone to read the documentation completely before using something. Had I done so, I'd have seen this:

Currently, synchronize is limited to elevating permissions via passwordless sudo. This is because rsync itself is connecting to the remote machine and rsync doesn’t give us a way to pass sudo credentials in.

That's easy enough to work around: tell the synchronize task to run with regular user permissions by adding the become: "no" keyword to the task (assuming you set become: "yes" globally in the playbook like I did). However, I could no longer synchronize the files directly to /var/www/ because the Ansible user doesn't have permission to write to that directory. Therefore, I had to change the workflow slightly.

  1. Synchronize the files from my controller system to a directory on the web server to which the Ansible user has write permissions.

  2. Copy the files from that location to /var/www/.

  3. Clean up the originally copied files.

  4. Set the correct permissions on the files in /var/www/.

Ansible is helpful in that there is an ansible_user variable, so I could write the files to /home/{{ ansible_user }}/www/. The steps I described above ended up looking like this.

- name: "Copying website files to the server."
  become: "no"
  ansible.posix.synchronize:
    src: "{{ website_repository_path }}"
    dest: "/home/{{ ansible_user }}/www/"
    mode: "push"

- name: "Moving the website files to /var/www."
  ansible.builtin.copy:
    remote_src: "true"
    src: "/home/{{ ansible_user }}/www/"
    dest: "/var/www/"

- name: "Removing files at temporary copy location."
  ansible.builtin.file:
    path: "/home/{{ ansible_user }}/www/"
    state: "absent"

- name: "Setting the permissions for the website files in /var/www."
  ansible.builtin.file:
    path: "/var/www/"
    state: "directory"
    recurse: "true"
    seuser: "system_u"
    serole: "object_r"
    setype: "httpd_sys_content_t"
    owner: "nginx"
    group: "nginx"
    mode: "u+rx-w,g+rx-w,o+rx-w"

When I ran the playbook with these changes, it took only a few seconds to copy the files from the controller system to the web server. The longest operation was setting the file permissions once the files were moved to /var/www/, which took no more than 30 seconds. This is sustainable.
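For context, here is a minimal sketch of how the privilege override fits into a play. The file name and the webservers host group are hypothetical; only the become settings and the task bodies reflect what I described. A become keyword set on a task takes precedence over the one set on the play, which is why only the synchronize task runs as the regular Ansible user.

```yaml
# site.yml (hypothetical file name)
- name: "Deploying the website."
  hosts: "webservers"          # hypothetical inventory group
  become: "yes"                # play-level default: tasks run with elevated privileges
  tasks:
    - name: "Copying website files to the server."
      become: "no"             # task-level override: run as the Ansible user
      ansible.posix.synchronize:
        src: "{{ website_repository_path }}"
        dest: "/home/{{ ansible_user }}/www/"
        mode: "push"

    - name: "Moving the website files to /var/www."
      ansible.builtin.copy:    # no override, so this inherits become: "yes"
        remote_src: "true"
        src: "/home/{{ ansible_user }}/www/"
        dest: "/var/www/"
```

Note that ansible_user is only defined when it is set somewhere, such as in the inventory, so this sketch assumes that it is.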

If you're using Ansible to copy a lot of files from one system to another, consider the synchronize module. It's significantly faster.
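One prerequisite to keep in mind if you try this: the synchronize module is documented as part of the ansible.posix collection, which is bundled with the full ansible package but not with ansible-core alone. It also needs rsync installed on both the controller and the target system. A minimal sketch of a requirements file that pulls in the collection, assuming you keep one named requirements.yml next to your playbook:

```yaml
# requirements.yml
# Install with: ansible-galaxy collection install -r requirements.yml
collections:
  - name: ansible.posix
```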