Concurrent writes to WebDAV lead to data loss (0 KB resulting file)
How to reproduce?
The idea is to have two users write to the same file name at the same time. Because the customer reported seeing the issue on WANs (slow networks) or with large files, my strategy is to take a medium-sized file (1.82 MB) and slow down the network artificially, as described below.
1) Install a plain vanilla Alfresco 2.2SP5 (MySQL + Alfresco authentication).
2) Log in to the web UI as admin and create two users:
user1 with home folder user1
user2 with home folder user2
3) Log in as user1 and invite user2 to user1's home folder with a role that allows writing (e.g. Collaborator).
4) From a first Windows client (XP or Windows Server 2003), open a "Network Place" to http://alfrescoserver:8080/alfresco/webdav as 'user1' (I used a VM talking to a Linux Alfresco server over interface vbox0).
5) From a second Windows client (XP or Windows Server 2003), open a "Network Place" to http://alfrescoserver:8080/alfresco/webdav as 'user2' (I used a VM talking to a Linux Alfresco server over interface vbox1).
6) Create an Excel file of about 2 MB, for instance by generating a CSV file with:
for i in `seq 1 30000`; do
  echo "$i, worldworlworld--$i" >> tot.csv
done
and converting it to XLS with AbiWord or OpenOffice.org (or use the attached 'tot.xls', 1.82 MB in size).
7) As 'user1', upload that file to the 'user1' home folder (using the web UI or WebDAV).
8) Copy this file to each client's desktop and check that each WebDAV client sees the file in WebDAV with the correct size (1.82 MB).
9) Now you are ready to perform a concurrent write:
a) Slow down the network interfaces. On the Alfresco server (the Linux host of the two Windows VMs) you can do this with the 'tc' (traffic control) command:
tc qdisc add dev vbox0 root tbf rate 10Kbit burst 10Kb lat 0.5s
tc qdisc add dev vbox1 root tbf rate 10Kbit burst 10Kb lat 0.5s
Because the guest-to-host connections now run at only 10 Kbit/s, you have time to carry out the next steps.
b) From the first Windows client, copy 'tot.xls' from the Windows desktop onto the file with the same name in WebDAV. When the WebDAV client asks whether to overwrite the existing file, answer yes.
c) While the first client is still uploading, do the same from the second Windows client: copy 'tot.xls' from the Windows desktop onto the file with the same name in WebDAV and answer yes when asked whether to overwrite the existing file.
d) Now that the concurrent write is under way, if you do not want to wait for it to finish, you can speed up the guest-to-host network connections:
tc qdisc replace dev vbox0 root tbf rate 10000Kbit burst 10000Kb lat 0.5s
tc qdisc replace dev vbox1 root tbf rate 10000Kbit burst 10000Kb lat 0.5s
e) The two copies finish one after the other.
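Steps 9a and 9d can be wrapped in a few small helpers. This is a minimal sketch assuming a Linux host with iproute2's tc and root privileges; the function names (shape, unshape, transfer_seconds) and the DRY_RUN switch are hypothetical additions, not part of the original procedure. It uses `tc qdisc replace`, which creates the qdisc if none exists, so one call covers both 9a and 9d.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 9a/9d. Assumes iproute2 'tc' and root.
# Set DRY_RUN=1 to print the tc commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Install (or replace) a token bucket filter as the root qdisc of a device.
shape() {
  local dev=$1 rate=$2 burst=$3
  run tc qdisc replace dev "$dev" root tbf rate "$rate" burst "$burst" latency 0.5s
}

# Remove the shaping once the test is over.
unshape() {
  run tc qdisc del dev "$1" root
}

# Lower bound on upload time in whole seconds: bytes * 8 / (bits per second).
# Ignores TCP/HTTP overhead, so real transfers take longer.
transfer_seconds() {
  echo $(( $1 * 8 / $2 ))
}
```

For example, `shape vbox0 10Kbit 10Kb` corresponds to step 9a, `shape vbox0 10000Kbit 10000Kb` to step 9d, and `unshape vbox0` restores normal networking afterwards. `transfer_seconds 1820000 10000` prints 1456: taking 1.82 MB as 1,820,000 bytes, the upload needs at least about 24 minutes at 10 Kbit/s, which leaves ample time to start the second copy.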
After a refresh, both WebDAV client windows show the file with a size of 0 KB.
Opening the file shows no data: the data has been lost.
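The 0 KB size can also be confirmed on the server side rather than from the Windows clients' folder view, which may be cached. A minimal sketch, assuming curl is available and that the server answers a HEAD request on the WebDAV URL with a Content-Length header; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical size check for the uploaded file, done over plain HTTP.

# Print the Content-Length value from HTTP response headers read on stdin.
content_length() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }' | tail -n 1
}

# HEAD the file and print its size in bytes (needs curl and valid credentials).
remote_size() {
  local url=$1 creds=$2
  curl -sS -I -u "$creds" "$url" | content_length
}
```

For instance, `remote_size "http://alfrescoserver:8080/alfresco/webdav/User%20Homes/user1/tot.xls" user1:password` (the folder path here is an assumption) should print the size of the last completed upload; 0 reproduces the data loss.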
Expected result: concurrent writes are handled correctly, for instance by refusing the second write request, and in no case is data lost.
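One way a server can meet this expectation is WebDAV locking (RFC 4918): a client takes an exclusive write LOCK before uploading, and a second client's LOCK on the same resource is answered with 423 Locked. The sketch below probes this by hand with curl; the URL, credentials and function names are placeholders, and whether the Windows clients in this report issue LOCK requests at all is an open assumption.

```shell
#!/usr/bin/env bash
# Hypothetical probe of RFC 4918 exclusive write locks with curl.

# Body of an exclusive write LOCK request.
lock_body() {
  cat <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<D:lockinfo xmlns:D="DAV:">
  <D:lockscope><D:exclusive/></D:lockscope>
  <D:locktype><D:write/></D:locktype>
  <D:owner>repro-script</D:owner>
</D:lockinfo>
EOF
}

# Send LOCK (requesting a 600 s timeout) and print only the HTTP status code.
lock_status() {
  local url=$1 creds=$2
  lock_body | curl -s -o /dev/null -w '%{http_code}' -u "$creds" \
    -X LOCK -H 'Content-Type: application/xml; charset="utf-8"' \
    -H 'Timeout: Second-600' --data-binary @- "$url"
}

# Map a LOCK status code to the outcome a correct server should produce.
interpret_lock() {
  case $1 in
    200|201) echo acquired ;;
    423) echo refused ;;
    *) echo "unexpected ($1)" ;;
  esac
}
```

Calling `lock_status` as user1 and then as user2, before user1's lock expires or is released, should yield 200 and then 423 on a server that enforces locks; `interpret_lock` turns those codes into "acquired" and "refused".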
The customer discovered this bug in 2.2.0, and another customer is affected on 2.2SP3. I reproduced it on 2.2SP5 but could not reproduce it on 3.1SP1.
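To re-run the scenario across versions without two Windows VMs, the two overwrites in steps 9b and 9c can be approximated with curl, throttling each upload with --limit-rate instead of tc. This is a hypothetical sketch: it sends plain PUT requests and does not replay the Windows clients' exact request sequence, so it may not trigger the bug; the URL, credentials, rate, delay and function names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical scripted variant of steps 9b/9c using two throttled PUTs.

# Upload a local file with PUT at a limited rate; print the HTTP status code.
slow_put() {
  local file=$1 url=$2 creds=$3 rate=$4
  curl -s -o /dev/null -w '%{http_code}\n' --limit-rate "$rate" \
    -u "$creds" -T "$file" "$url"
}

# Start the two uploads a few seconds apart and wait for both to finish.
concurrent_put() {
  local file=$1 url=$2 creds1=$3 creds2=$4 rate=$5 delay=${6:-5}
  local tmp
  tmp=$(mktemp -d)
  slow_put "$file" "$url" "$creds1" "$rate" > "$tmp/1" &
  sleep "$delay"   # let the first upload get going, as in step 9c
  slow_put "$file" "$url" "$creds2" "$rate" > "$tmp/2" &
  wait
  echo "user1: $(cat "$tmp/1") user2: $(cat "$tmp/2")"
  rm -r "$tmp"
}
```

For example, `concurrent_put tot.xls "$URL" user1:pw1 user2:pw2 20k` prints both status codes. Afterwards the file on the server should have the full size of the local file; a 0-byte file would reproduce the report.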
A sample Excel file (tot.xls) is attached.
A video of the process is attached to ticket 14091 (file scenario2_alex.ogv). I could not attach it to Jira because it is 16 MB.
The customer (partner) reports that Alfresco WebDAV fails to pass:
Running its "locks" test suite against Alfresco gives:
-> 28 tests were skipped.
<- summary for `locks': of 13 tests run: 8 passed, 5 failed.