Using libarchive inside smbclient

Introduction

The goal of this project is to replace smbclient's own limited tar implementation with a more complete one built on a separate library (libarchive).

libarchive is a BSD-licensed library which can read and write various archive and compression formats. It's fast, robust and portable.

Using libarchive would fix several bugs in the current tar implementation, provide multiple archive and compression formats, and might improve performance.

Current implementation

To better understand what needs to be done, let's see how tar is handled in Samba.

The Samba 4 client doesn't handle tar yet, so I will focus on the v3 smbclient. I'm using revision 251767cde9a146d8 from the git repo throughout this document.

The tar module lets the user back up a remote share as a tar file. It also provides a way to restore that backup, i.e. to extract the backup file onto a remote share.

Most of the code reading and writing tar files lives in source3/client/clitar.c which exports the following functions:

  • int cmd_block(void)
  • int cmd_tarmode(void)
  • int cmd_setmode(void)
  • int cmd_tar(void)
  • int process_tar(void)
  • int tar_parseargs(int argc, char *argv[], const char *Optarg, int Optind)

Argument handling/parsing

All the cmd_xxx functions are called when parsing command line arguments.

The tar module is triggered with tar and -T on the command line.

help flags etc          @ client.c:4667
-> cmd_tar()            @ clitar.c:1499
  -> tar_parseargs()    @ clitar.c:1792
  -> process_tar()      @ clitar.c:1527

Additional options regarding archive bits and what to do with them can be specified with tarmode.

tarmode                 @ client.c:4667
-> cmd_tarmode()        @ clitar.c:1337

blocksize (cmd_block) lets the user choose the size of a tar block, which is the atomic chunk of data for files handled by tar. In other words, a file stored in the tar archive will use at least one block, even if it's smaller.

blocksize               @ client.c:4605
-> cmd_block()          @ clitar.c:1311
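The block rounding described above can be sketched as follows. This is only an illustration: tar_padded_size and TAR_RECORD_SIZE are hypothetical names, not taken from clitar.c, and the sketch only covers the file data (the per-file header is not counted, so an empty file rounds to zero data bytes here).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the classic tar record size of 512 bytes. */
#define TAR_RECORD_SIZE 512u

/*
 * Round a file's data size up to the next multiple of block_size.
 * Hypothetical helper for illustration only.
 */
uint64_t tar_padded_size(uint64_t size, uint64_t block_size)
{
    assert(block_size > 0);
    return (size + block_size - 1) / block_size * block_size;
}
```

With a 512-byte block, a 1-byte file and a 512-byte file both occupy 512 bytes of data in the archive, while a 513-byte file occupies 1024.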

setmode lets the user change certain remote file attributes (read-only, hidden, system, archive). I don't know why it lives alongside the code that handles tar backups.

This could be moved elsewhere in the source code.

setmode                 @ client.c:4664
-> cmd_setmode()        @ clitar.c:1381
  -> do_setrattr()      @ clitar.c:590
    -> cli_getatr()
    -> cli_setatr()       fetch and set new attribute
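To make the fetch-then-set sequence in the trace above more concrete, here is a minimal sketch of how a mode string could be turned into attribute masks. The function names and the "+r-a" style syntax are assumptions made for this example and are not copied from clitar.c; the bit values are the usual DOS attribute bits.

```c
#include <stddef.h>

/* Usual DOS/SMB file attribute bits. */
#define ATTR_READONLY 0x01u
#define ATTR_HIDDEN   0x02u
#define ATTR_SYSTEM   0x04u
#define ATTR_ARCHIVE  0x20u

/*
 * Hypothetical helper: parse a string such as "+r-a" into the bits to
 * set and the bits to clear. Returns 0 on success, -1 on an unknown
 * character or on a letter that is not preceded by '+' or '-'.
 */
int parse_attr_string(const char *s, unsigned *set_mask, unsigned *clear_mask)
{
    unsigned *target = NULL;

    *set_mask = 0;
    *clear_mask = 0;
    for (; *s != '\0'; s++) {
        unsigned bit;

        switch (*s) {
        case '+': target = set_mask;   continue;
        case '-': target = clear_mask; continue;
        case 'r': bit = ATTR_READONLY; break;
        case 'h': bit = ATTR_HIDDEN;   break;
        case 's': bit = ATTR_SYSTEM;   break;
        case 'a': bit = ATTR_ARCHIVE;  break;
        default:  return -1;
        }
        if (target == NULL) {
            return -1;
        }
        *target |= bit;
    }
    return 0;
}

/* Combine the masks with the attributes previously fetched from the server. */
unsigned apply_attr_masks(unsigned attr, unsigned set_mask, unsigned clear_mask)
{
    return (attr | set_mask) & ~clear_mask;
}
```

In this sketch the current attributes would come from the fetch step and the result of apply_attr_masks would be handed to the set step.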

Tar options

The following options can be passed via -T on the command line. Some of them behave differently depending on the context (creation vs. extraction). To remain backward compatible, the new version must support all of them.

More detailed descriptions can be found in the smbclient(1) man page.

General options

q
quiet

Extract options (-x)

I
include path (default)
X
exclude path
F
read path from file
r
regex incl/excl

Create options (-c)

I
include path
X
exclude path
F
read paths from file
b
tar blocksize
N
only backup files newer than filename
a
set archive bit on archived files
g
incremental mode
r
regex incl/excl
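As an illustration of the option letters listed above, the following sketch collects a flag string such as "cgq" into a structure. struct tar_flags and parse_tar_flags are hypothetical names, not the existing tar_parseargs() code. The sketch assumes that exactly one of c or x must be given, and it ignores the extra arguments that options such as b, N and F consume.

```c
#include <stdbool.h>
#include <stddef.h>

enum tar_op { TAR_OP_NONE = 0, TAR_OP_CREATE, TAR_OP_EXTRACT };

/* Hypothetical container for the -T option letters. */
struct tar_flags {
    enum tar_op op;     /* c or x */
    bool quiet;         /* q */
    bool exclude;       /* X (I selects include, the default) */
    bool from_file;     /* F */
    bool regex;         /* r */
    bool blocksize;     /* b */
    bool newer;         /* N */
    bool set_archive;   /* a */
    bool incremental;   /* g */
};

/*
 * Parse the option letters. Returns 0 on success and fills *out,
 * or -1 on an unknown letter, on both c and x, or on neither.
 */
int parse_tar_flags(const char *flags, struct tar_flags *out)
{
    struct tar_flags f = { TAR_OP_NONE };

    for (; *flags != '\0'; flags++) {
        switch (*flags) {
        case 'c':
        case 'x': {
            enum tar_op op = (*flags == 'c') ? TAR_OP_CREATE : TAR_OP_EXTRACT;
            if (f.op != TAR_OP_NONE && f.op != op) {
                return -1;
            }
            f.op = op;
            break;
        }
        case 'q': f.quiet = true;        break;
        case 'I': f.exclude = false;     break;
        case 'X': f.exclude = true;      break;
        case 'F': f.from_file = true;    break;
        case 'r': f.regex = true;        break;
        case 'b': f.blocksize = true;    break;
        case 'N': f.newer = true;        break;
        case 'a': f.set_archive = true;  break;
        case 'g': f.incremental = true;  break;
        default:  return -1;
        }
    }
    if (f.op == TAR_OP_NONE) {
        return -1;
    }
    *out = f;
    return 0;
}
```

Keeping the parsed flags in one structure would make it easier to check each combination in regression tests, independently of the archive code.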

Backup script

There is also a simplified backup script called smbtar in the Samba distribution (source3/script/smbtar), which may need to be updated, although I don't know whether many people use it.

Tests

There is a tarmode test in source3/script/tests/test_smbclient_tarmode.sh which could be improved.

New implementation

I plan on first replacing tar reading and writing functions with libarchive ones without changing the command line interface so that users can use the new code without even knowing it.

Compression

A new option will be added to the tar flags to enable compression, probably z, following the tradition of reusing UNIX tar options. Compression will default to gzip, but an additional option could be introduced to choose the compression algorithm.

Tests

There are not many tests for the tar module. I plan to improve the existing tarmode test and add others to cover most of the tar module's functions as I rewrite them.

Planning

I will be able to work full-time on the project once my finals are over, around the 24th of June. My classes resume around the 7th of September, but I will give myself a week off before then.

I've split the work into weeks. The schedule is not meant to be precise; it just gives an idea of where I should be at any given time.

Week 1

  • Set up a real testing environment with a bleeding-edge version of the Samba server and client.
  • Get more familiar with the smbclient source. Try changing some things around and try out every possible combination of options for tar creation/extraction.

Week 2

  • Get familiar with libarchive, study the docs and the examples, write test programs.

Week 3

  • Learn how to use the waf build system to add libarchive as a dependency.
  • Choose a starting point to integrate libarchive e.g. simple tar creation without any fancy options.
  • Set up a first simple test for it.

Weeks 4, 5, 6

  • Convert each function to the new system.
  • Write regression tests along the way.

Week 7

  • Once the system is in place, adding compression should be easier.

Week 8

  • Polish everything up, write documentation where it's needed and discuss merging the code back with the other developers.

Author: Aurélien Aptel

Created: 2013-05-02 Thu 03:39
