Service Packs and Hot Fixes / MNT-9137

Better logging for BulkImporter - latin1 filenames causing problems

    Details

    • Type: Service Pack Request
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: 4.1.4
    • Fix Version/s: 4.1.6
    • Component/s: Repository
    • Labels: None
    • Environment: linux pg tomcat

      Description

      How to reproduce?
      =================
      1) Set up a plain vanilla Alfresco 4.1.4 (Linux, PostgreSQL, Tomcat).
      2) On your local computer, create files whose names are UTF-8 encoded and files whose names are latin1 (ISO-8859-1) encoded.
      You can use the attached Python script, which creates 4 files:

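      #!/usr/bin/python3
      # Create 4 files: each string is used once with an ISO-8859-1 encoded
      # file name and once with a UTF-8 encoded file name.
      strings = ["with áccent", "without accent"]

      for s in strings:
          name = s.replace(" ", "_")

          for code in ["ISO-8859-1", "UTF-8"]:
              fname = code + "_" + name + ".txt"
              # A bytes path is stored on disk exactly as encoded (on Linux).
              with open(fname.encode(code), "w") as f:
                  f.write(s)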
      

      or use the resulting files in the attached archive.

      3) In Alfresco Explorer, create a folder 'import' under 'Company Home'.
      4) Go to:

      http://localhost:8080/alfresco/service/bulkfsimport/status

      As "Import directory", enter the local path of the directory containing the files from step 2.

      As "Target space (NodeRef or Path)", enter:

      /Company Home/import

      5) Submit the form.

      Results:
      ========
      Only 3 of the 4 files are imported; the file whose latin1-encoded name contains an accent fails.

      If we set
      log4j.logger.org.alfresco.repo.bulkimport=trace
      then the log shows:

      2013-06-27 14:51:53,936 WARN [bulkimport.impl.DirectoryAnalyserImpl] [BulkFilesystemImport-BackgroundThread] Skipping unreadable file '/home/madon/act/69176_michelinusa/export2/ISO-8859-1_with_�ccent.txt'.
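
      A minimal sketch of why the accent shows up as '�' in the log line above, assuming the file name bytes are decoded as UTF-8 somewhere along the way. That is an assumption; the sketch does not show what DirectoryAnalyserImpl does internally. The helper decode_as_utf8 is hypothetical:

```python
# Hypothetical helper: decode raw file name bytes the way a UTF-8 reader would.
def decode_as_utf8(raw_name: bytes) -> str:
    """Decode file name bytes as UTF-8, replacing invalid bytes with U+FFFD."""
    return raw_name.decode("utf-8", errors="replace")


# The same name, encoded the two ways used by the reproduction script.
latin1_name = "ISO-8859-1_with_áccent.txt".encode("ISO-8859-1")
utf8_name = "UTF-8_with_áccent.txt".encode("UTF-8")

# In latin1 the accented letter is the single byte 0xE1, which is not a
# valid UTF-8 sequence here, so it is replaced by U+FFFD.
print(decode_as_utf8(latin1_name))

# In UTF-8 the same letter is the two bytes 0xC3 0xA1 and decodes cleanly.
print(decode_as_utf8(utf8_name))
```

      The first name comes back with the replacement character in place of the accent, the second unchanged, which matches the file that is skipped in the import.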

      Expected result:
      ================
      1) If the import is expected to fail for non-UTF-8 filenames, then a WARN message in the logs should state the reason.

      2) The documentation at
      http://docs.alfresco.com/4.1/topic/com.alfresco.enterprise.doc/concepts/bulk-import-prepare-filesystem.html
      does not mention filename encoding. It should state that the tool only works if filenames on the filesystem are UTF-8 encoded.

      3) Could the default log level for the bulk importer be changed?
      At the current default, WARN messages are not logged.

      4) Tools exist that can help convert filename encodings. One of them is convmv, for example:

      convmv --notest -r -f latin1 -t utf-8 *
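
      Where convmv is not available, the same idea can be sketched in Python. This is only an illustration, not a replacement for convmv: all function names are hypothetical, it assumes that every file name that is not valid UTF-8 is latin1 encoded, and it only handles file names, not directory names. Like convmv without --notest, it renames nothing unless asked to.

```python
import os
from typing import List, Tuple


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_name_to_utf8(name: bytes) -> bytes:
    """Re-encode a latin1 (ISO-8859-1) file name as UTF-8."""
    return name.decode("latin-1").encode("utf-8")


def plan_renames(root: bytes) -> List[Tuple[bytes, bytes]]:
    """List (old_path, new_path) pairs for files under root with non-UTF-8 names.

    Assumption: every name that is not valid UTF-8 is latin1 encoded.
    Only files are considered; directory names are left untouched.
    """
    plan = []
    # Walking a bytes path makes os.walk return raw bytes names.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not is_valid_utf8(name):
                old_path = os.path.join(dirpath, name)
                new_path = os.path.join(dirpath, latin1_name_to_utf8(name))
                plan.append((old_path, new_path))
    return plan


def apply_renames(plan: List[Tuple[bytes, bytes]], dry_run: bool = True) -> None:
    """Print each planned rename and perform it unless dry_run is True."""
    for old_path, new_path in plan:
        print(f"{old_path!r} -> {new_path!r}")
        if not dry_run:
            os.rename(old_path, new_path)
```

      A call such as apply_renames(plan_renames(b"/path/to/export")) only prints the planned renames (the path is a placeholder); passing dry_run=False performs them. The output of plan_renames can also serve as a pre-import check: an empty list means no file name under the import directory would be skipped for this reason.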
      

        Attachments

        1. generate_latin1.py
          0.3 kB
        2. latin1a.png
          134 kB
        3. samples.tgz
          0.3 kB

              People

              • Assignee: closedbugs Closed Bugs
              • Reporter: amadon Alex Madon [X] (Inactive)
              • Votes: 0
              • Watchers: 4

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0m
                  • Logged: 30m