Revision as of 10:04, 24 November 2005
XML Repair and Validation service
RFE 1119025 ("repair broken XML on load") and bug 1326574 ("bogus transform on a bitmap from 'create bitmap copy'") both give examples of situations in which Inkscape should ideally correct, or at least warn about, badly formatted SVG files. The aim of this page is to collect examples of such files and to discuss the progress made on repairing them inside Inkscape, and how to do it.
This page was written without knowing what Inkscape already does when it opens or saves a file, so please add whatever is missing.
What kind of "wrong" XML?
At least two types of wrong files could be defined:
- invalid XML files, which Inkscape cannot parse
- valid XML files, which Inkscape can open but which cause editing or viewing problems
In the first case, the current behavior is to refuse to open the file. In the second case, it is left to the user to find out what is wrong.
What to do about them?
RFE 1119025 is about trying to repair the file when it is loaded and recovering whatever is recoverable. The intended behavior would be to show a message of the form:
Document <name> contains errors. Do you want to attempt to repair it?
There could also be a button that shows a more detailed explanation: "Inkscape can create a new document from any readable objects contained in your file. Your original file will not be touched. Inkscape will open a new window called <name> [Recovered]. You might want to export the recovered objects into a new file before proceeding. Note: You may be able to obtain better results using an XML editor or text editor."
Choosing to recover the file would re-parse it with a more permissive XML parser and recover whatever can be recovered. Anything that cannot be recovered should be preserved as an XML comment.
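A rough sketch of this strict-then-permissive idea, written in Python with only the standard library. It is not Inkscape's code: the name `repair`, the regular expressions and the recovery rules (close elements left open, turn stray end tags and unparseable tags into comments, escape a bare `&` or `<` in text) are assumptions chosen for illustration. It assumes a single root element and does not handle every error (duplicate attributes, several root elements, `>` inside attribute values).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tokenizer: comments, processing instructions, anything that
# looks like a tag, runs of text, and lone "<" characters.
TOKEN = re.compile(r"<!--.*?-->|<\?.*?\?>|<[^<>]*>|[^<]+|<", re.S)
# Well-formed comments and processing instructions are kept as they are.
KEEP = re.compile(r"<!--(?:(?!--).)*-->|<\?[A-Za-z_][\w.:-]*(?:\s.*?)?\?>", re.S)
NAME = r"[A-Za-z_][\w.:-]*"
ATTR = r"""\s+[A-Za-z_][\w.:-]*\s*=\s*(?:"[^"]*"|'[^']*')"""
START = re.compile(rf"<({NAME})((?:{ATTR})*)\s*(/?)>")
END = re.compile(rf"</({NAME})\s*>")
BARE_AMP = re.compile(r"&(?!#\d+;|#x[0-9A-Fa-f]+;|\w+;)")


def as_comment(fragment: str) -> str:
    """Preserve an unrecoverable fragment as an XML comment ('--' is not allowed inside)."""
    body = re.sub(r"-(?=-)", "- ", fragment)
    if body.endswith("-"):
        body += " "
    return f"<!--{body}-->"


def repair(source: str) -> str:
    """Return `source` unchanged if it is well-formed, otherwise a repaired copy."""
    try:
        ET.fromstring(source)
        return source
    except ET.ParseError:
        pass

    out: list[str] = []
    stack: list[str] = []  # names of elements that are currently open
    for tok in TOKEN.findall(source):
        if KEEP.fullmatch(tok):
            out.append(tok)
        elif m := START.fullmatch(tok):
            out.append(BARE_AMP.sub("&amp;", tok))
            if not m.group(3):  # not self-closing, so it stays open
                stack.append(m.group(1))
        elif m := END.fullmatch(tok):
            if m.group(1) in stack:
                # Close any elements that were left open inside this one.
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == m.group(1):
                        break
            else:
                out.append(as_comment(tok))  # stray end tag
        elif tok == "<":
            out.append("&lt;")  # a lone "<" in text
        elif tok.startswith("<"):
            out.append(as_comment(tok))  # a tag we cannot make sense of
        else:
            out.append(BARE_AMP.sub("&amp;", tok))  # plain text
    out.extend(f"</{name}>" for name in reversed(stack))  # close what is still open
    return "".join(out)
```

For example, `repair('<svg><g><rect width="10" height="5"></g><text>AT&T</svg>')` closes the forgotten `rect` and `text` elements and escapes the ampersand, while an unquoted attribute such as `<rect width=10>` ends up inside a comment rather than being silently dropped.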
Valid files with problems were discussed in a bug report and in several threads on the devel mailing list: "zooming bug", "zooming bug bis", "sanity checking on file open", "what should zoom to content do" and "artificial limits". It all started with an Inkscape bug that produced an object with an insane bounding box, which in turn caused strange zooming behavior. The bug itself was fixed by bulia. The discussion that followed focused on the fact that such files can be encountered "in the wild" and that some Inkscape feature could help users deal with them.
The intended behavior in this case would be to issue visible warnings and to offer solutions (even if they basically redirect the user to the XML editor). This raises the question of when to warn, and the suggestion was to build a "Validate File" or "Repair File" feature that would validate the XML, detect strange cases and ask the user what to do about them (correct them, delete them, etc.). The discussion is about adding to this feature artificial, human-defined limits above which, for example, a bounding box is considered insane.
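One possible shape for such an artificial limit, sketched in Python with the standard library. Everything here is an assumption for illustration: `MAX_COORD` (1e6 user units) is a made-up threshold, only `<rect>` elements with plain numeric attributes are checked, and transforms are ignored, so the box is in the rect's own user space rather than its visual bounding box.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
# Hypothetical human-defined limit, in user units; not a value taken from Inkscape.
MAX_COORD = 1e6


def number(value):
    """Parse a plain numeric attribute; return None for missing values, units, percentages."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def insane_rects(svg_text, limit=MAX_COORD):
    """Return (id, (x0, y0, x1, y1)) for every <rect> whose box goes beyond `limit`."""
    report = []
    for rect in ET.fromstring(svg_text).iter(SVG + "rect"):
        x = number(rect.get("x", "0"))
        y = number(rect.get("y", "0"))
        w = number(rect.get("width"))
        h = number(rect.get("height"))
        if None in (x, y, w, h):
            continue  # cannot judge this rect from plain numbers alone
        box = (x, y, x + w, y + h)
        if any(abs(v) > limit for v in box):
            report.append((rect.get("id"), box))
    return report
```

A "Validate File" dialog could list the returned ids and let the user jump to them in the XML editor. Where to put the limit, and whether to measure every element after transforms and stroke width, is exactly the open question from the discussion.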
The two cases could obviously share some features.
This section is for linking and describing files that cause particular problems.
Invalid XML
Valid XML with problems
Inkscape file with a bug: this file, containing one object with a huge bounding box, was generated by a bug in Inkscape (since fixed), but other such files may still be encountered. It should look like [this].
Files converted from EPS/PS by pstoedit: Most scientific software outputs EPS or PS, and pstoedit can convert these files to SVG, but the results behave strangely in Inkscape. Here are links to a sequence of files from a workflow based on Scilab (free numerical computing software, similar to MATLAB and Octave): the original EPS, a PNG of what it looks like, the SVG converted by pstoedit, the SVG after being parsed by Inkscape, the diff, and the SVG copy-pasted into a new Inkscape document. The original file was generated by Scilab 3.1.1, converted by pstoedit version 3.40 / DLL interface 108 (build Oct 10 2005 - release build) and parsed by Inkscape CVS 20051123 on Mac OS X. Many transform matrices look like: transform="matrix(2.007003e-3,0.000000,0.000000,-2.007003e-3,-6.079619e-2,1.461081)". Probably related to this, every new object or text looks huge and dashes cannot be applied to lines. But some "magic" happens: when all objects are copied and pasted into a new document, everything is fine! The linked diff between the two files shows what changed.
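A sketch of a check that would flag transforms like the one quoted above: it reads a `matrix(a,b,c,d,e,f)` value and computes how much it scales the x and y axes (the quoted matrix scales both by about 0.002 and flips y). The function names and the thresholds 0.01 and 100 are arbitrary assumptions, only a lone `matrix()` transform is understood, and whether such a scale really explains the symptoms above remains a guess.

```python
import math
import re

# A lone matrix(...) transform; other transform types are not handled here.
MATRIX = re.compile(r"matrix\s*\(([^()]*)\)")


def parse_matrix(transform):
    """Return the six numbers (a, b, c, d, e, f) of a lone matrix() transform, else None."""
    m = MATRIX.fullmatch(transform.strip())
    if m is None:
        return None
    try:
        values = tuple(float(p) for p in re.split(r"[\s,]+", m.group(1).strip()))
    except ValueError:
        return None
    return values if len(values) == 6 else None


def scale_factors(matrix):
    """Lengths of the transformed unit x and y vectors: (hypot(a, b), hypot(c, d))."""
    a, b, c, d, _e, _f = matrix
    return math.hypot(a, b), math.hypot(c, d)


def is_suspicious(transform, low=1e-2, high=1e2):
    """Flag matrix() transforms whose scale falls outside a hypothetical [low, high] range."""
    matrix = parse_matrix(transform)
    if matrix is None:
        return False
    return any(not (low <= s <= high) for s in scale_factors(matrix))
```

With these thresholds the pstoedit matrix is flagged while `matrix(1,0,0,1,10,20)` (a plain translation) is not; a validator could then offer to flatten such transforms into the coordinates, which may be roughly what the copy-paste "magic" ends up doing.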
- 17 Oct 2005: The issue is raised ;-)