[MNT-2624] Correct mimetype not identified when uploading via Explorer or ftp Created: 20-Jun-12  Updated: 06-Mar-14  Resolved: 11-Nov-13

Status: Closed
Project: Service Packs and Hot Fixes
Component/s: Tika, POI, and Metadata Extraction
Affects Version/s: 4.0.1
Fix Version/s: 4.2.1

Type: Service Pack Request
Reporter: Marco Mancuso [X] (Inactive) Assignee: Closed Bugs (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 0 minutes
Time Spent: 7 hours
Original Estimate: Not Specified

Attachments: Marketing - (potentielle) Partner.mmap; Neueintritt Mitarbeiter.bpm; custom-mimetypes.xml; custom-tika-mimetypes.jar; mimetypes-extension-map.xml; testfile.xmind
Issue Links:
Related
relates to MNT-7719 Upgrade of Tika to 1.3-SNAPSHOT Closed
is related to by SHA-123 Mime-type detection is not consistent... Open
is related to by MNT-2336 Files loose mimetype when the filenam... Closed
is related to by MNT-6978 Alfresco Explorer assigns incorrect m... Closed
Bug Priority: Category 2
ACT Numbers: 45804

Build Location: http://releases.alfresco.com/Enterprise%204.2/4.2.1/build-00050/

 Description   

[Steps to reproduce]
1) Put mimetypes-extension-map.xml in extension/mimetype/.
This file extends Alfresco with 3 new mimetypes:
∙ application/x-xmind (extensions: xmind, xmap)
∙ application/vnd.mindjet.mindmanager (extensions: mmap, mmp, mmpt, mmat, mmmp, mmas)
∙ application/bizagi-modeler (extension: bpm)
2) Start Alfresco.
3) Send the attached files to the repository via FTP.
4) Check the mimetype of the uploaded files on the details (properties) page in Share.

[Actual result]
Documents are assigned the ZIP mimetype

[Expected result]
Files should have the correct mimetype

[Notes]
∙ If files are uploaded through Share the correct mimetype is visible
∙ When files are sent via FTP, org.alfresco.repo.content.MimetypeMap.guessMimetype(String, ContentReader) is invoked.
Possible workaround/solution: add a MediaType.APPLICATION_ZIP.equals(type) check:
// If Tika has supplied a very generic type, go with the filename one,
// as it's probably a custom Text or XML format known only to Alfresco
if (MediaType.TEXT_PLAIN.equals(type) || MediaType.APPLICATION_XML.equals(type) || MediaType.APPLICATION_ZIP.equals(type))
{
    return filenameGuess;
}
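
The proposed workaround can be illustrated in isolation. The following is a minimal sketch, not the actual MimetypeMap code: the class name MimetypeFallbackSketch and the method chooseMimetype are hypothetical, and mimetypes are modelled as plain strings instead of Tika MediaType objects so that the example runs without a Tika dependency. The null check on the filename guess is also an assumption that the original snippet does not show.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the "fall back to the filename guess" rule.
class MimetypeFallbackSketch {
    // Content-based types considered too generic to trust on their own.
    // "application/zip" is the addition proposed in this ticket.
    private static final Set<String> GENERIC_TYPES = new HashSet<>(
            Arrays.asList("text/plain", "application/xml", "application/zip"));

    // contentGuess: type detected from the file contents.
    // filenameGuess: type derived from the file extension (may be null).
    static String chooseMimetype(String contentGuess, String filenameGuess) {
        if (filenameGuess != null && GENERIC_TYPES.contains(contentGuess)) {
            // Generic container detected: prefer the more specific filename guess.
            return filenameGuess;
        }
        return contentGuess;
    }

    public static void main(String[] args) {
        System.out.println(chooseMimetype("application/zip", "application/x-xmind"));
        System.out.println(chooseMimetype("application/pdf", "application/x-xmind"));
    }
}
```

One consequence of this rule is that any ZIP-based file is labelled purely by its extension, so an ordinary ZIP archive renamed to .xmind would also be reported as application/x-xmind.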

 Comments   
Comment by Amin Zamani (Inactive) [ 27-Jun-12 ]

Hello,

Our customer is waiting for a solution to this problem. Can you tell me the current status?

Comment by Amin Zamani (Inactive) [ 27-Jun-12 ]

By the way: if you upload the files through Alfresco Explorer, the same problem appears.

Comment by Andrew Hunt [X] (Inactive) [ 27-Jun-12 ]

This does not look like a problem with registering the additional mimetypes.
Going to http://localhost:8080/alfresco/service/mimetypes?mimetype=* shows that the extra extensions have been imported, and they also show up in the Explorer content-type drop-down list.

Comment by Amin Zamani (Inactive) [ 28-Jun-12 ]

Hi,

Thank you very much for the answer, but that does not resolve the problem. You have received my test files, so where is the problem? I have done everything right. As I told you, Share does not have the problem. So how can we fix it? Why does it work in Share but not in Alfresco Explorer? I know that the extra extensions are imported and that they also show up in the Explorer content-type drop-down, so there must be a difference between Share and Alfresco Explorer: Share recognizes the correct mimetype, but Alfresco Explorer does not. I would be very grateful if you could solve this problem, because our customer is waiting. The problem also occurs with FTP upload and CIFS upload. It seems that only Share is working correctly.

Best regards
Amin

Comment by Andrew Hunt [X] (Inactive) [ 28-Jun-12 ]

Amin - You are right, this does not solve the problem - this issue has been assigned to our engineering team to look at.
If you wish to continue the discussion, please do so via the ACT ticket where our support team will help you understand what your next steps are.

Comment by Amin Zamani (Inactive) [ 28-Jun-12 ]

Hi Andrew,

Thank you very much! Of course I want to continue! We have to fix this problem.

Comment by Nick Burch (Inactive) [ 04-Jul-12 ]

The issue relates to the fact that Tika has a full mimetype hierarchy, while Alfresco (which predates Tika by a long way) currently only has a flat list. In Tika you can say "this is based on a zip" or "this is based on a FooBar, which in turn is based on XML". That hierarchy information isn't defined in Alfresco, so it can't be used to help direct detection. (There has been some talk of moving the Alfresco MimeType model to be based on the Tika one, to gain advantages like this and to benefit from the wider variety of mimetypes defined in Apache Tika, but so far this work has not been a high enough priority to be tackled.)
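
To make the difference between a flat list and a hierarchy concrete, here is a minimal sketch of how parent information could direct detection. It is an illustration only and does not reproduce Tika's or Alfresco's implementation: the class MimetypeHierarchySketch, its methods, and the parent table are all hypothetical, with the three types from this ticket assumed to be ZIP-based because that is how their contents were detected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of using a mimetype hierarchy to refine detection.
class MimetypeHierarchySketch {
    // child type -> parent type (illustrative data, assumed for this example)
    private static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("application/x-xmind", "application/zip");
        PARENT.put("application/vnd.mindjet.mindmanager", "application/zip");
        PARENT.put("application/bizagi-modeler", "application/zip");
    }

    // True if 'type' equals 'ancestor' or descends from it via the parent table.
    static boolean isBasedOn(String type, String ancestor) {
        String current = type;
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = PARENT.get(current);
        }
        return false;
    }

    // Accept the filename guess only when it is consistent with the contents,
    // i.e. when it is a more specific form of the content-based guess.
    static String refine(String contentGuess, String filenameGuess) {
        if (filenameGuess != null && isBasedOn(filenameGuess, contentGuess)) {
            return filenameGuess;
        }
        return contentGuess;
    }

    public static void main(String[] args) {
        System.out.println(refine("application/zip", "application/x-xmind"));
        System.out.println(refine("application/pdf", "application/x-xmind"));
    }
}
```

With a flat list there is no parent table to consult, so the only options are to trust the contents (everything ZIP-based becomes application/zip) or to trust the extension unconditionally.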

What I'd suggest you do is define the mimetype to both Alfresco and Apache Tika. By defining it in Alfresco, it'll be available in dropdowns, descriptions etc. By defining it to Apache Tika, Alfresco will be able to use the full details of it via Tika, and any other Tika-using applications you may have (e.g. other content systems, standalone SOLR installs etc.) will be able to correctly handle and detect your files.

To do this, there are two steps. The first is to create a custom Tika extension mimetypes file that knows about your files. The second is to contribute this to Apache Tika, so that it gets included upstream and is present as standard going forward.

Comment by Nick Burch (Inactive) [ 04-Jul-12 ]

If you drop the attached jar into your classpath, then detection should work correctly. Also attached is the custom-mimetypes.xml file contained in the jar.

To test it, grab the latest copy of the Tika-App jar, then run it with something like:

$ java -classpath tika-app-1.1.jar:custom-tika-mimetypes.jar org.apache.tika.cli.TikaCLI --detect Neueintritt\ Mitarbeiter.bpm
application/bizagi-modeler

This allows you to check that the detection is working correctly.

Comment by Nick Burch (Inactive) [ 04-Jul-12 ]

As these seem to be fairly standard file types, I've added them to the core of Apache Tika as part of TIKA-949.

Once the version of Apache Tika included in Alfresco is upgraded to Tika 1.2 or newer (likely shortly after Tika 1.2 is released), the fix will be present as standard.

Comment by Alfresco QA Team (Inactive) [ 11-Nov-13 ]

Successfully verified against Alfresco Enterprise v4.2.1
(r57815-b58) schema 6052, Redhat 6.4 x64, Tomcat, PostgreSQL, Java 6 (all installer deployed) Client: FF 25.0, Windows 7 SP1 x64

Generated at Fri Apr 23 12:53:21 BST 2021 using Jira 7.13.15#713015-sha1:7c5ddd2c3e1709974ae9c48c17df8edd3919fe2c.