Difference between revisions of "XML Repair"

From Inkscape Wiki

Latest revision as of 20:54, 18 March 2012

XML Repair and Validation service

RFE 1119025 ("repair broken xml on load") and bug 1326574 ("bogus transform on a bitmap from 'create bitmap copy'") both provide examples of situations where Inkscape's intended behaviour would be to correct, or warn about, badly formatted SVG files. The aim of this page is to give examples of such files and to discuss how to repair them inside Inkscape and what progress has been made.

This page was written without knowing what Inkscape already does when it opens or saves a file, so please add whatever is necessary.


What kind of "wrong" XML?

At least two types of wrong files could be defined:

  1. non valid XML files which Inkscape cannot parse
  2. valid XML files which Inkscape can open but which cause editing or viewing problems

In the first case, the current behavior is not to open the file. In the second case, the current behavior is to let the user find out what is wrong.


What to do about them?

Case 1:

RFE 1119025 is about trying to repair the file at startup and recover what is recoverable. The intended behavior would be to have a message of the form:

Document <name> contains errors. Do you want to attempt to repair it?

[Cancel] [Repair]

There could also be a button to show a more detailed explanation: "Inkscape can create a new document from any readable objects contained in your file. Your original file will not be touched. Inkscape will open a new window called <name> [Recovered]. You might want to export the recovered objects into a new file before proceeding. Note: You may be able to obtain better results using an XML editor or text editor."

Deciding to recover the file will parse it with a more permissive XML parser and recover what can be recovered. What cannot be recovered should be preserved as an XML comment.
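The recovery step described above can be illustrated with a minimal sketch. It is not how Inkscape does it: the function name recover_svg is hypothetical, Python's standard xml.etree.ElementTree pull parser stands in for a "more permissive XML parser", and only top-level objects that parsed cleanly before the first error are kept. For brevity the comment records the parser's error message rather than the unreadable text itself.

```python
import xml.etree.ElementTree as ET


def recover_svg(text):
    """Return (root, error).

    root is a new element holding every top-level child that parsed
    cleanly before the first error; error is the parser message, or
    None if the document was well formed.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(text)  # a parse error is queued here, not raised

    root, depth, recovered, error = None, 0, [], None
    try:
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    # Copy the root tag and attributes into a fresh document.
                    root = ET.Element(elem.tag, elem.attrib)
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the root has just been closed.
                    recovered.append(elem)
    except ET.ParseError as exc:
        error = str(exc)

    if root is None:
        return None, error
    root.extend(recovered)
    if error:
        # Leave a trace of what went wrong inside the recovered document.
        note = " recovery stopped: " + error.replace("--", "- -") + " "
        root.append(ET.Comment(note))
    return root, error
```

Calling recover_svg on a file whose third object is broken would return a tree with the first two objects plus one comment, and the original file is never modified, which matches the dialog text above.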

Case 2:

The example of some valid files with problems was discussed on a bug report and on several threads of the devel mailing list: zooming bug, zooming bug bis, sanity checking on file open, what should zoom to content do, artificial limits. It all started from a bug in Inkscape that produced one object with an insane bounding box, which in turn induced strange zooming behavior. The bug itself was fixed by bulia. The discussion that followed focused on the fact that such files can be encountered in the "wild" and that some Inkscape feature could help deal with them.

The intended behavior in this case would be to issue visible warnings and to provide solutions (even if they basically redirect the user to the XML editor). The question of when to warn arises, and the suggestion was to build a "Validate file" or "Repair File" feature which would validate the XML, detect strange cases and ask the user what to do about them (correct, suppress, or ...). The discussion is about adding to this feature artificial, human-defined limits above which, for example, a bounding box is considered insane.
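As a rough illustration of such an artificial limit, the sketch below scans a document for geometry attributes whose magnitude exceeds a human-defined threshold. Everything in it is an assumption made for the example: the function name find_insane_geometry, the limit of 1e6 user units, and the short list of attributes checked. A real check would look at computed bounding boxes rather than raw attribute values.

```python
import re
import xml.etree.ElementTree as ET

MAX_COORD = 1e6  # hypothetical, human-defined limit in user units
GEOMETRY_ATTRS = ("x", "y", "width", "height", "cx", "cy", "r")
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def find_insane_geometry(svg_text, limit=MAX_COORD):
    """Return (id, attribute, value) for every value larger than limit."""
    warnings = []
    for elem in ET.fromstring(svg_text).iter():
        for name in GEOMETRY_ATTRS:
            value = elem.get(name)
            # Read the leading number and ignore any unit suffix such as "px".
            match = NUMBER.match(value) if value else None
            if match and abs(float(match.group())) > limit:
                warnings.append((elem.get("id"), name, value))
    return warnings
```

Each returned tuple could back one visible warning, with the object id giving the user something to look up in the XML editor.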

The two cases could obviously share some features.


Example files

In this section, files which cause particular problems can be linked and described.

Non valid XML

A bug in pre-0.43 versions sometimes wrote numbers with exponents in CSS, where this is not allowed, and 0.43 rejects such style specifications. To fix your file manually, replace all instances of "2.4424907e-14" with "0" in a text editor and it will load fine in 0.43. (Words by Bulia added by Alan, rephrase if you want.)
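The manual fix above could be generalized to any exponent-notation number inside a style attribute. The following is only a sketch under stated assumptions: the function name strip_css_exponents is hypothetical, style attributes are assumed to be double-quoted, and values are rounded to six decimals (so 2.4424907e-14 becomes 0). Hex colours such as #1e3a5f are left alone because a number is only matched when it is not preceded by "#", a letter, a digit or a dot.

```python
import re

# A number in exponent notation that does not start inside a colour or word.
EXPONENT_NUMBER = re.compile(r"(?<![#\w.])[-+]?(?:\d+\.?\d*|\.\d+)[eE][-+]?\d+")
STYLE_ATTR = re.compile(r'style="([^"]*)"')


def _plain(match):
    """Rewrite one exponent-notation number as a plain decimal."""
    text = f"{float(match.group()):.6f}".rstrip("0").rstrip(".")
    return "0" if text in ("", "-0") else text


def strip_css_exponents(svg_text):
    """Replace exponent-notation numbers inside style="..." attributes."""
    return STYLE_ATTR.sub(
        lambda m: 'style="' + EXPONENT_NUMBER.sub(_plain, m.group(1)) + '"',
        svg_text,
    )
```

Numbers outside style attributes, for example in transform, are not touched, since exponents are only a problem in CSS.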


Valid XML with problems

Inkscape file with a bug: This file, in which one object has a huge bounding box, was generated by a bug in Inkscape (already fixed), but other such files might be encountered. It should look like [this ]

Files converted from EPS/PS by pstoedit: Most scientific software outputs EPS or PS, and pstoedit can make SVG from these files, but the results behave strangely in Inkscape. Here are links to a sequence of files in a workflow based on Scilab (free numerical computing software, similar to MATLAB and Octave): original EPS, PNG of what it looks like, converted to SVG by pstoedit, SVG parsed by Inkscape, diff, SVG copy-pasted in a new Inkscape document. The original file was generated by Scilab 3.1.1, converted by pstoedit version 3.40 / DLL interface 108 (build Oct 10 2005 - release build) and parsed by Inkscape CVS 20051123 on Mac OS X. Many transform matrices look like: transform="matrix(2.007003e-3,0.000000,0.000000,-2.007003e-3,-6.079619e-2,1.461081)". Probably related to this, every new object or text looks huge and dashes cannot be applied to lines. But some "magic" happens: when all objects are copied and pasted into a new document, everything is fine! The diff between the two files shows what changed.
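To see why such a matrix might be called insane, it helps to extract its scale factors. The sketch below (the function name matrix_scale is hypothetical) reads a matrix(a,b,c,d,e,f) transform and returns the lengths of its two column vectors, which are the horizontal and vertical scale factors. For the matrix quoted above both come out at about 0.002, so the converted content is drawn roughly 500 times smaller than its raw coordinates, which fits the guess that new, unscaled objects look huge next to it.

```python
import math
import re

MATRIX = re.compile(r"matrix\(([^)]*)\)")


def matrix_scale(transform):
    """Return (scale_x, scale_y) of a matrix(...) transform, or None."""
    match = MATRIX.search(transform)
    if not match:
        return None
    # SVG allows commas and/or whitespace between the six values.
    a, b, c, d, e, f = (
        float(v) for v in re.split(r"[\s,]+", match.group(1).strip())
    )
    # The columns (a, b) and (c, d) are the images of the unit vectors.
    return math.hypot(a, b), math.hypot(c, d)
```

A validation pass could warn when either factor is far from 1. The threshold would be one more human-defined limit, and a matrix that does not have exactly six values raises ValueError in this sketch.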


Development status

  • 17 Oct 2005: The issue is raised ;-)