[ALF-675] Alfresco Search fails for some document sections in which search term is contained (.doc/.xls formats) Created: 02-Oct-09  Updated: 23-Mar-10  Resolved: 05-Mar-10

Status: Closed
Project: Alfresco
Component/s: Repository
Affects Version/s: 3.2 Enterprise
Fix Version/s: 3.3 Enterprise

Type: Bug Priority: Major
Reporter: Alfresco QA Team (Inactive) Assignee: Closed Bugs (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Alfresco Enterprise 3.2.0 SP2 (beta1 116), MS Windows 2003 Enterprise SP2 32bit, Tomcat 6.0.18 (JDK 1.5.0_09-b03), MySQL 5.0.67-community-nt (standalone installation).

Alfresco Enterprise 3.2.0 SP2 (beta1 116), Stack 1: RHEL 5.1 x64, Tomcat 6.0.18, MySQL 5.1.30, JDK 6u16 x64, Alfresco+OpenLDAP (2-machine cluster).


Attachments: 111_search_00_ALL_anker_ankerfilename.doc, 222_search_00_ALL_anker_ankerfilename.xls, Results.xls, results-3.2.0.224.xls
Testcase ID:

Rep-402:JSF Client Search test for umlauts in doc/xls files
Rep-403:Web Site Search test for umlauts in submitted doc/xls files (WCM)
Rep-452:Share Client Search test for umlauts in doc/xls files in site's DocLib

Date of First Response:

 Description   

Placing the search term in certain sections of a .doc/.xls file uploaded to the Alfresco repository (by any upload route) causes Alfresco search to fail (empty search results) for every upload route and every search route tested: Alfresco JSF Client Simple Search ("All Items" option), Alfresco JSF Client Search Website and Share Client Search All Sites.

The most problematic document sections are:

  • footer;
  • header;
  • text fields;
  • comments;
  • macros.

Results do not depend on the upload route. For both .doc and .xls files uploaded
1. using JSF client into DM repo
2. using JSF client into WCM project
3. using Share client into any site's DocLib
4. using FTP
5. using CIFS
6. using WebDAV
7. using SharePoint protocol
the search results are the same for all three tested Alfresco search routes:
a. Alfresco JSF Client Simple Search ("All Items" option)
b. Alfresco JSF Client Search Website
c. Share Client Search All Sites
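The test matrix above (two file formats, seven upload routes, three search routes) can be sketched as a simple enumeration of 42 combinations, all of which gave the same result. This is a hypothetical illustration only: the labels are copied from the lists above, and the real re-test scripts are the JMeter scripts referenced below, not this code.

```python
from itertools import product

# Labels copied from the report; this is an illustrative enumeration,
# not the JMeter test plan actually used by QA.
FORMATS = [".doc", ".xls"]
UPLOAD_ROUTES = [
    "JSF client into DM repo",
    "JSF client into WCM project",
    "Share client into site DocLib",
    "FTP",
    "CIFS",
    "WebDAV",
    "SharePoint protocol",
]
SEARCH_ROUTES = [
    "JSF Client Simple Search (All Items)",
    "JSF Client Search Website",
    "Share Client Search All Sites",
]


def test_matrix():
    """Return every (format, upload route, search route) combination."""
    return list(product(FORMATS, UPLOAD_ROUTES, SEARCH_ROUTES))


if __name__ == "__main__":
    print(len(test_matrix()))  # 2 * 7 * 3 combinations
```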

See the attached "Results.xls" for detailed results.

Scripts for re-testing are available in SVN:
https://svn.alfresco.com/repos/qa/Jmeter/Search_Testing/Umlaut_Search_Tests/.



 Comments   
Comment by Andrew Hind [X] (Inactive) [ 06-Oct-09 ]

The searchable terms depend on which document convertor is used.
Please retest and ensure that OpenOffice is the only convertor available for these document types.
The other convertors are faster (and are therefore chosen in preference). OpenOffice is more accurate.
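The selection behaviour described in this comment (several convertors handle the same document type and the faster one is preferred) can be modelled with a small sketch. The `Convertor` class, the convertor names and the timings are hypothetical and do not reflect Alfresco's actual transformer registry or configuration; the sketch only shows why disabling the faster convertors is what forces OpenOffice to be used.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Convertor:
    # Hypothetical model of a text convertor; not an Alfresco API.
    name: str
    mimetypes: frozenset
    avg_time_ms: int  # lower means faster
    enabled: bool = True


def pick_convertor(convertors, mimetype):
    """Return the fastest enabled convertor for the mimetype, or None."""
    candidates = [c for c in convertors if c.enabled and mimetype in c.mimetypes]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.avg_time_ms)


DOC = "application/msword"
XLS = "application/vnd.ms-excel"

# Hypothetical registry: one fast extractor and OpenOffice, both enabled.
registry = [
    Convertor("fast-native-extractor", frozenset({DOC, XLS}), avg_time_ms=50),
    Convertor("openoffice", frozenset({DOC, XLS}), avg_time_ms=800),
]

# With both enabled, the faster (less accurate) convertor wins.
print(pick_convertor(registry, DOC).name)  # fast-native-extractor

# Disabling every other convertor leaves OpenOffice as the only choice.
openoffice_only = [
    c if c.name == "openoffice" else replace(c, enabled=False) for c in registry
]
print(pick_convertor(openoffice_only, DOC).name)  # openoffice
```

Ranking by speed alone is the assumption being illustrated here; a real system could weigh other factors.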

Comment by Andrew Hind [X] (Inactive) [ 06-Oct-09 ]

Please confirm whether this issue still occurs with the OpenOffice convertor, with all other convertors disabled.
Any remaining conversion issue at that point lies with OpenOffice, upon which we rely.

Comment by Alfresco QA Team (Inactive) [ 12-Nov-09 ]

Re-tested on Alfresco Enterprise 3.2.0 (beta2 224) with OpenOffice as the only convertor configured. Alfresco was installed from Alfresco-Enterprise-3.2.0beta2-OOo-Setup.exe (the installer for build 224) with OpenOffice enabled and no further configuration changes.

Search results are the same as previously submitted (see the most recently attached spreadsheet).

Alfresco Enterprise Network 3.2.0 Beta2 (build 224, standalone installation), MS Windows 2003 Enterprise SP2 32bit, Tomcat 6.0.18 (JDK 1.6.0_11-b03), MySQL 5.0.67.

Comment by Alfresco QA Team (Inactive) [ 10-Dec-09 ]

Re-tested on Alfresco Enterprise 3.2.0 (290) with OpenOffice 3.1 as the only converter configured.

Search results are the same as previously submitted.
Results are available here: https://svn.alfresco.com/repos/qa/Jmeter/Search_Testing/Umlaut_Search_Tests/results/.

Alfresco Enterprise Network 3.2.0 (build 290, standalone installation), Windows 2008 SP1 x64, JBoss 5.1.0 GA (JDK 6u16), Oracle 10g 10.2.0.3.
Alfresco Enterprise Network 3.2.0 (build 290, standalone installation), MS Windows 2003 Enterprise SP2 32bit, Tomcat 6.0.18 (JDK 1.6.0_11-b03), MySQL 5.1.41.

Comment by Alfresco QA Team (Inactive) [ 11-Dec-09 ]

Same results for 3.2.0.290 on Tomcat 6.0.18 + Oracle 10g with OpenOffice 3.1.

Alfresco Enterprise Network 3.2.0 (build 290, standalone installation), MS Windows 2003 Enterprise SP2 32bit, Tomcat 6.0.18 (JDK 1.6.0_11-b03), Oracle 10.2.0.1.0 x32.

Comment by Andrew Hind [X] (Inactive) [ 05-Mar-10 ]

There is nothing I can do here to fix the document conversion.

Comment by Steve Rigby [X] (Inactive) [ 05-Mar-10 ]

Can't fix until the text extraction plugins can extract the relevant text.

Comment by Steve Rigby [X] (Inactive) [ 05-Mar-10 ]

FYI

Comment by mkononovich [ 10-Mar-10 ]

Postponed per previous comments.

Comment by Alfresco QA Team (Inactive) [ 23-Mar-10 ]

Closing until the convertors are upgraded.

Generated at Sun Mar 07 18:40:47 GMT 2021 using Jira 7.13.15#713015-sha1:7c5ddd2c3e1709974ae9c48c17df8edd3919fe2c.