XML Repair

From Inkscape Wiki
Revision as of 15:15, 5 November 2005 by 194.167.139.18 (talk) (added some new files)

XML Repair and Validation service

RFE 1119025 (repair broken XML on load) and bug 1326574 (bogus transform on a bitmap from "create bitmap copy") both provide examples of situations where Inkscape's intended behaviour would be to correct, or warn about, badly formatted SVG files. The aim of this page is to collect examples of such files and to discuss how they could be repaired inside Inkscape and what progress has been made.

This page was written without knowing what Inkscape already does when it opens or saves a file, so please add whatever is necessary.


What kind of "wrong" XML?

At least two types of wrong files could be defined:

  1. invalid XML files (strictly speaking, files that are not well-formed), which Inkscape cannot parse
  2. valid XML files which Inkscape can open but which cause editing or viewing problems

In the first case, the current behavior is not to open the file. In the second case, the current behavior is to let the user find out what is wrong.
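The distinction between the two cases can be illustrated outside Inkscape. The following is a minimal sketch in Python using only the standard library; the function name classify_svg and the two sample documents are made up for illustration and say nothing about how Inkscape itself loads files.

```python
import xml.etree.ElementTree as ET

def classify_svg(text):
    """Return ("unparseable", message) for case 1, or ("parseable", root) otherwise.

    A parseable file may still belong to case 2: it opens, but its content
    can cause editing or viewing problems.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return "unparseable", str(err)
    return "parseable", root

# Hypothetical samples: the first one never closes its <rect> element.
broken = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"></svg>'
ok = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"/></svg>'

print(classify_svg(broken)[0])
print(classify_svg(ok)[0])
```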


What to do about them?

Case 1:

RFE 1119025 is about trying to repair the file at startup and recover what is recoverable. The intended behavior would be to have a message of the form:

"The document <name> could not be read in its entirety because it has errors, or some data are missing. Do you want to recover the readable data in this file?

Inkscape can create a new document from any readable objects contained in your file. Your original file will not be touched. Inkscape will open a new window called <name> [Recovered]. You might want to export the recovered objects into a new file before proceeding. Note: You may be able to obtain better results using an XML editor or text editor."

If the user chooses to recover the file, it is parsed with a more permissive XML parser and whatever can be recovered is recovered. What cannot be recovered should be preserved as an XML comment.
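One possible shape for such a recovery step is sketched below in Python with the standard library's expat bindings. This is only an illustration under stated assumptions, not the parser Inkscape would use: the function name recover, the sample document, and the strategy (keep everything up to the last point known to lie directly inside the root element, wrap the rest in a comment, then close the root) are all hypothetical. The strategy is deliberately conservative, so it may push a perfectly good element into the comment rather than risk losing text.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat

def recover(data: bytes):
    """Return a well-formed version of data as bytes, or None if nothing is recoverable."""
    parser = expat.ParserCreate()
    state = {"depth": 0, "root": None, "safe": None}

    def mark():
        # Remember where the current event starts if it sits directly inside
        # the root element: everything before that offset parsed cleanly.
        if state["depth"] == 1:
            state["safe"] = parser.CurrentByteIndex

    def on_start(name, attrs):
        if state["root"] is None:
            state["root"] = name
        mark()
        state["depth"] += 1

    def on_end(name):
        state["depth"] -= 1

    def on_chars(text):
        mark()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    try:
        parser.Parse(data, True)
        return data  # already well-formed, nothing to repair
    except expat.ExpatError:
        pass
    if state["safe"] is None:
        return None  # the error occurred before any usable content
    good, rest = data[:state["safe"]], data[state["safe"]:]
    # "--" is not allowed inside an XML comment, so break it up.
    comment = b"<!-- unrecovered: " + rest.replace(b"--", b"- -") + b" -->"
    closing = b"</" + state["root"].encode() + b">"
    return good + comment + b"\n" + closing

# Hypothetical sample: the <path> start tag is never closed.
broken = (b'<svg xmlns="http://www.w3.org/2000/svg">\n'
          b'  <rect id="a" width="10" height="10"/>\n'
          b'  <circle id="b" r="5"/>\n'
          b'  <path id="c" d="M0 0 L10 10"\n'
          b'</svg>\n')

fixed = recover(broken)
print([child.get("id") for child in ET.fromstring(fixed)])
```

In this sketch the original bytes are never modified; the caller gets a new document, which matches the idea of opening a separate "[Recovered]" window and leaving the original file untouched.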

Case 2:

Examples of valid files with problems were discussed in a bug report and in several threads on the devel mailing list: zooming bug, zooming bug bis, sanity checking on file open, what should zoom to content do, artificial limits. It all started from a bug in Inkscape that produced an object with an insane bounding box, which in turn caused strange zooming behavior. The bug itself was fixed by bulia. The discussion that followed focused on the fact that such files can be encountered in the "wild" and that some Inkscape feature could help deal with them.

The intended behavior in this case would be to issue visible warnings and to provide solutions (even if they basically redirect the user to the XML editor). This raises the question of when to warn, and the suggestion was to build a "Validate file" or "Fix file" feature which would validate the XML, detect strange cases and ask the user what to do about each of them (correct, suppress, and so on). The discussion is about adding artificial, human-defined limits to this feature, above which a bounding box, for example, would be considered insane.
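To make the idea of artificial limits concrete, here is a minimal sketch in Python. It does not compute real bounding boxes; it only flags plain numeric geometry attributes whose magnitude exceeds a limit. The limit MAX_COORD, the attribute list and the function name find_insane_geometry are assumptions chosen for illustration, not values taken from the mailing list discussion.

```python
import xml.etree.ElementTree as ET

MAX_COORD = 1e6  # hypothetical human-defined limit, in user units
GEOMETRY_ATTRS = ("x", "y", "width", "height", "cx", "cy", "r", "rx", "ry")

def find_insane_geometry(svg_text, limit=MAX_COORD):
    """Return a list of (element id, attribute, value) for values beyond the limit."""
    warnings = []
    for elem in ET.fromstring(svg_text).iter():
        for attr in GEOMETRY_ATTRS:
            raw = elem.get(attr)
            if raw is None:
                continue
            try:
                value = float(raw)
            except ValueError:
                continue  # values with units such as "10mm" are skipped in this sketch
            if abs(value) > limit:
                warnings.append((elem.get("id"), attr, value))
    return warnings

# Hypothetical sample with one sane and one oversized rectangle.
sample = ('<svg xmlns="http://www.w3.org/2000/svg">'
          '<rect id="ok" x="0" y="0" width="100" height="50"/>'
          '<rect id="huge" x="0" y="0" width="3e12" height="50"/>'
          '</svg>')

for warning in find_insane_geometry(sample):
    print(warning)
```

A real "Validate file" feature would then present each warning to the user and offer a choice, for instance jumping to the offending node in the XML editor.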


The two cases could obviously share some features.


Example files

In this section, files which cause particular problems can be linked and described.

Non valid XML

  1. Octave (free numerical computing software) can produce plots directly as SVG, and it seems that even a very simple one cannot be opened in Inkscape. It might well be Octave's fault, but the file is here for you to check.

Valid XML with problems

  1. A file with one object having a huge bounding box. This file was generated as the result of a bug in Inkscape, but others like it might be encountered.
  2. A file with many insane transform matrices. This file was generated as EPS by Scilab, a scientific plotting software, and converted to SVG by pstoedit version 3.40 / DLL interface 108 (build Oct 10 2005 - release build). Many transform matrices look like: transform="matrix(2.007003e-3,0.000000,0.000000,-2.007003e-3,-6.079619e-2,1.461081)". Probably related to this, every new object or text looks huge and dashes cannot be applied to lines.
  3. Another file with many insane transform matrices, converted from an EPS file. The EPS file was generated by R, a free, cross-platform statistics software which is well known and widely used in statistical studies. It was then converted to an SVG file by pstoedit. Finally, it was parsed by Inkscape and saved as Inkscape SVG. This file causes any new object to be insanely big, and dashes cannot be applied to any stroke (already present or newly created).
  4. And again a file with many insane transform matrices, converted from an eps file. The ps file was generated by Octave, converted to an SVG file by pstoedit, and parsed by Inkscape and saved as Inkscape SVG. As with the others, this file causes any new object to be insanely big, and dashes cannot be applied to any stroke (already present or newly created).
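The transform matrices in these files could be detected automatically. The sketch below, in Python, extracts the six numbers of a matrix(a,b,c,d,e,f) transform and computes the scale it applies along each axis. The thresholds low and high and the function names matrix_scales and is_suspicious are hypothetical, and whether such tiny scales really explain the huge new objects and the missing dashes is only the guess stated above, not something this code establishes.

```python
import math
import re

MATRIX_RE = re.compile(r"matrix\(([^)]*)\)")

def matrix_scales(transform):
    """Return (sx, sy) for the first matrix(...) in a transform attribute, or None."""
    match = MATRIX_RE.search(transform)
    if not match:
        return None
    numbers = [float(v) for v in re.split(r"[\s,]+", match.group(1).strip())]
    if len(numbers) != 6:
        return None
    a, b, c, d, e, f = numbers
    # Lengths of the transformed unit vectors along x and y.
    return math.hypot(a, b), math.hypot(c, d)

def is_suspicious(transform, low=1e-2, high=1e2):
    """Flag a matrix whose scale on either axis falls outside [low, high]."""
    scales = matrix_scales(transform)
    return scales is not None and any(not (low <= s <= high) for s in scales)

# The matrix quoted in item 2 of the list above.
example = "matrix(2.007003e-3,0.000000,0.000000,-2.007003e-3,-6.079619e-2,1.461081)"

print(matrix_scales(example))
print(is_suspicious(example))
```

For the quoted matrix both scales are about 0.002, so with the assumed thresholds it would be reported, while an identity matrix would not.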

Development status

  • 17 Oct 2005: The issue is raised ;-)