Using libarchive inside smbclient
Introduction
The goal of this project is to replace smbclient's own limited tar implementation with a more complete one built on a separate library (libarchive).
libarchive is a BSD licensed library which can read and write various archived and/or compressed file formats. It's fast, robust and portable.
Using libarchive would fix several bugs in the current tar implementation, provide multiple archive and compression formats, and might improve performance.
Current implementation
To better understand what needs to be done, let's see how tar is handled in Samba.
The Samba 4 client doesn't handle tar yet, so I will focus on the v3 version of smbclient. I'm using revision 251767cde9a146d8 from the git repo throughout this document.
The tar module lets the user make a backup of a remote share in the form of a tar file. It also provides a way to restore that backup, i.e. to extract the backup file onto a remote share.
Most of the code reading and writing tar files lives in source3/client/clitar.c, which exports the following functions:
int cmd_block(void)
int cmd_tarmode(void)
int cmd_setmode(void)
int cmd_tar(void)
int process_tar(void)
int tar_parseargs(int argc, char *argv[], const char *Optarg, int Optind)
Argument handling/parsing
All the cmd_xxx functions are called when parsing command line arguments.
The tar module is triggered with tar and -T on the command line.
help flags etc @ client.c:4667 -> cmd_tar() @ clitar.c:1499 -> tar_parseargs() @ clitar.c:1792 -> process_tar() @ clitar.c:1527
Additional options regarding archive bits and what to do with them can be specified with tarmode.
tarmode @ client.c:4667 -> cmd_tarmode() @ clitar.c:1337
blocksize (cmd_block) lets the user choose the size of a tar block, which is the atomic chunk of data for files handled by tar. In other words, a file stored in the tar archive will use at least one block, even if it's smaller.
blocksize @ client.c:4605 -> cmd_block() @ clitar.c:1311
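To make the "at least one block" point concrete, here is a minimal sketch of the arithmetic. It assumes the standard 512-byte tar block and a plain header (no extended headers for long names); tar_entry_size() and TAR_BLOCK_SIZE are hypothetical names used for illustration and do not come from clitar.c.

```c
#include <stdint.h>

/* Assumption: the standard tar block of 512 bytes. */
#define TAR_BLOCK_SIZE 512u

/*
 * Bytes occupied in an uncompressed tar stream by one regular file:
 * one header block, plus the file data rounded up to a whole number
 * of blocks. Extended headers (long names etc.) are ignored here.
 */
uint64_t tar_entry_size(uint64_t file_size)
{
    uint64_t data_blocks = (file_size + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE;

    return (1 + data_blocks) * TAR_BLOCK_SIZE;
}
```

Under these assumptions a 1-byte file and a 512-byte file both take the same space in the archive, because the data is padded up to the next block boundary.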
setmode lets the user change certain remote file attributes (readonly, hidden, system, archive). I don't really know why it is placed alongside the code which handles tar backups. This could be moved elsewhere in the source code.
setmode @ client.c:4664 -> cmd_setmode() @ clitar.c:1381 -> do_setrattr() @ clitar.c:590 -> cli_getatr() -> cli_setatr() fetch and set new attribute
Tar options
The following options can be passed via -T on the command line. Some of them behave differently depending on the context (creation vs. extraction). Because we need to stay backward compatible, these must be supported in the new version.
More detailed descriptions can be found in the smbclient(1) man page.
General options
- q - quiet
Extract options (-x)
- I - include path (default)
- X - exclude path
- F - read path from file
- r - regex incl/excl
Create options (-c)
- I - include path
- X - exclude path
- F - read paths from file
- b - tar blocksize
- N - only backup files newer than filename
- a - set archive bit on archived files
- g - incremental mode
- r - regex incl/excl
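One way to keep these flags manageable in a rewrite is to collect them into a single structure before doing any work. The following is only a sketch under that assumption: struct tar_flags and parse_tar_flags() are hypothetical names, not what clitar.c does today, and the arguments that follow some flags (blocksize, file names) are not handled here.

```c
#include <stdbool.h>

/* Hypothetical container for the -T flag letters listed above. */
struct tar_flags {
    bool create, extract, quiet;
    bool include, exclude, from_file, regex;
    bool blocksize, newer, set_archive_bit, incremental;
};

/*
 * Parse a flag string such as "cXa". Returns 0 on success and -1 if
 * a letter is unknown, if not exactly one of c/x is given, or if a
 * create-only flag (b, N, a, g) is combined with x.
 */
int parse_tar_flags(const char *s, struct tar_flags *out)
{
    struct tar_flags f = {0};

    for (; *s != '\0'; s++) {
        switch (*s) {
        case 'c': f.create = true; break;
        case 'x': f.extract = true; break;
        case 'q': f.quiet = true; break;
        case 'I': f.include = true; break;
        case 'X': f.exclude = true; break;
        case 'F': f.from_file = true; break;
        case 'r': f.regex = true; break;
        case 'b': f.blocksize = true; break;
        case 'N': f.newer = true; break;
        case 'a': f.set_archive_bit = true; break;
        case 'g': f.incremental = true; break;
        default:  return -1;
        }
    }
    if (f.create == f.extract)
        return -1;
    if (f.extract && (f.blocksize || f.newer ||
                      f.set_archive_bit || f.incremental))
        return -1;

    *out = f;
    return 0;
}
```

Keeping the validation in one place like this would make it easier to check that the new code accepts exactly the same flag combinations as the old one.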
Backup script
There is also a simplified backup script called smbtar
in the Samba
distribution (source3/script/smbtar
) which may need to be updated
although I don't know if many people use it.
Tests
There is a tarmode test in source3/script/tests/test_smbclient_tarmode.sh which could be improved.
New implementation
I plan on first replacing tar reading and writing functions with libarchive ones without changing the command line interface so that users can use the new code without even knowing it.
Compression
A new option will be introduced in the tar flags to provide compression, probably via a z option, following the tradition of re-using UNIX tar options. The compression will default to gzip, but an additional option can be introduced to choose the compression algorithm.
Tests
There are not many tests for the tar module. I plan on improving the existing tarmode test and adding others to cover most of the tar module functions as I rewrite it.
Planning
I will be able to work full-time on the project once I finish all my finals, so around the 24th of June. My classes start again around the 7th of September, but I will give myself a week off before then.
I've split the work into weeks. It's not meant to be very accurate; it just gives an idea of where I should be in my work at any time.
Week 1
- Set up a real testing environment with a bleeding edge version of the samba server + client.
- Get more familiar with the smbclient source. Try to change some things around, and try out every possible combination of options for tar creation/extraction.
Week 2
- Get familiar with libarchive, study the docs and the examples, write test programs.
Week 3
- Learn how to use the waf build system to add libarchive as a dependency.
- Choose a starting point to integrate libarchive e.g. simple tar creation without any fancy options.
- Set up a first simple test for it.
Week 4, 5, 6
- Convert and implement each function to the new system.
- Write regression tests along the way.
Week 7
- Once the system is in place, adding compression should be easier.
Week 8
- Polish everything up, write documentation where it's needed and discuss with other developers towards merging the code back.