SpellCheckForTextNodes
Introduction
It is often desirable to spell check text nodes in an Inkscape document.
The current stable release of Inkscape (0.41) does not support this. You have to open the XML in a text editor and spell check the text nodes there.
The code in CVS (0.42pre) supports spell checking: if you compile with WITH_GTKSPELL, it marks misspelled words in the "Text" tab of the "Text and Font" dialog, using gtkspell to check the text nodes. gtkspell seems to pick its language according to LC_MESSAGES. Thus, if you have LC_MESSAGES=fr_FR.UTF-8 in your environment, it will presumably use "fr_FR" for spelling.
The desired solution is for Inkscape to use the xml:lang attribute of the text node, so that spell checking is done against the proper language.
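The locale-based selection described above can be sketched as follows. This is only an illustration of the conventional POSIX precedence (LC_ALL, then LC_MESSAGES, then LANG) and of reducing a locale name to a language code. It is not gtkspell's actual code, and the chat log below suggests gtkspell does not follow this precedence exactly. The helper names `messages_locale` and `language_from_locale` are made up for this example.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical helper: reduce a POSIX locale name such as "fr_FR.UTF-8" or
// "de_CH@euro" to the bare language code ("fr_FR", "de_CH").
std::string language_from_locale(const std::string &locale)
{
    // Cut at the first '.' (codeset) or '@' (modifier), whichever comes first.
    // If neither is present, find_first_of returns npos and the whole string is kept.
    std::string::size_type end = locale.find_first_of(".@");
    return locale.substr(0, end);
}

// Hypothetical helper: pick the messages locale using the conventional
// precedence LC_ALL > LC_MESSAGES > LANG. Null or empty values count as unset.
std::string messages_locale(const char *lc_all, const char *lc_messages, const char *lang)
{
    for (const char *candidate : {lc_all, lc_messages, lang}) {
        if (candidate && *candidate) {
            return candidate;
        }
    }
    return "C"; // nothing set: fall back to the "C" locale
}
```

In a real program the three arguments would come from std::getenv("LC_ALL"), std::getenv("LC_MESSAGES") and std::getenv("LANG"). With only LC_MESSAGES=fr_FR.UTF-8 set, the two helpers together yield "fr_FR".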
Implementation
There are two ways to implement this:
- allow one language per text element
- allow any number of languages per text element, letting the user select a region of the text and define a language for it
There are two parts to an implementation:
- a user interface for setting the xml:lang attribute of a text node
- make sure gtkspell makes use of this information for spell checking
User interface changes
One language per text element
If you allow only one language per text element, the UI is fairly straightforward. The Text and Font dialog would contain an element to specify the language of the current text element.
The global language setting (the default language, if you like) could be set in the Page tab of the document preferences dialog.
Multiple languages per text element
If you allow setting the language for any selection of text, you could use a combo box or a combo box entry in the text toolbar to specify the language for the current selection (or for the current text element if there is no selection). It would also show the current language setting of the selected text element or region.
The global setting would again be in the document preferences dialog.
Which languages in the combo box
The question is which languages the user should be able to choose from in the combo box. This affects the choice of widget: for a small, fixed set of languages a combo box makes most sense; for a large number of languages a combo box entry might make more sense. There are basically two arguments:
- The user should only be able to choose from the installed dictionaries. Anything else doesn't make sense, as she cannot spell check other languages anyway.
- The user should be able to choose any existing language, because this is a requirement for Inkscape to claim to be a conforming SVG authoring tool (see http://www.w3.org/TR/WAI-WEBCONTENT/#gl-abbreviated-and-foreign).
The combo box entry is not nice, as it might confuse novice users (too much choice, possibility of invalid entries). However, to be conforming we probably need one. Gtk+ 2.6 has an improved GtkComboBox which can handle trees, so the languages could be set up in a tree whose first level is the first character of the language. This should let us handle the many-languages case gracefully without resorting to manual entry of e.g. de_CH.
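The two-level tree idea could be prepared with something like the sketch below. It assumes the entries are language codes such as de_CH (the page does not say whether codes or display names would be grouped), and the function name `group_by_initial` is invented for illustration. Filling an actual GtkComboBox tree model from the result is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Group language codes by their first character. The map keys form the
// first level of the tree, the vectors the second level.
std::map<char, std::vector<std::string>> group_by_initial(const std::vector<std::string> &codes)
{
    std::map<char, std::vector<std::string>> tree;
    for (const std::string &code : codes) {
        if (code.empty()) {
            continue; // nothing to group an empty code under
        }
        tree[code[0]].push_back(code);
    }
    return tree;
}
```

For the input {"de", "de_CH", "en", "fr", "fr_FR"} this gives three first-level entries (d, e, f), with de and de_CH both listed under d.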
Inheritance
A trickier part would be to allow some kind of inheritance, where you specify a default language and can override it in specific text nodes. I would suggest leaving this feature out for now.
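For reference, the inheritance rule itself is small. The sketch below assumes a simplified, hypothetical `Element` type (Inkscape's real node classes look different) and treats an empty xml:lang as "unspecified, just inherit", which is a simplification of the XML rules.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an XML element; not an Inkscape class.
struct Element {
    std::string xml_lang;   // empty means "unspecified, just inherit"
    const Element *parent;  // nullptr for the root element
};

// Return the nearest xml:lang found on the element or one of its ancestors,
// or the document default when none of them specifies a language.
std::string computed_language(const Element *element, const std::string &document_default)
{
    for (const Element *e = element; e != nullptr; e = e->parent) {
        if (!e->xml_lang.empty()) {
            return e->xml_lang;
        }
    }
    return document_default;
}
```

A tspan without its own xml:lang inside a text element marked "de" would resolve to "de", while a text element with no marked ancestor would fall back to the document default.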
Code changes
The second part (making gtkspell use the xml:lang information) seems fairly straightforward. There are some comments and a todo on how to do this in src/dialogs/text-edit.cpp (search for WITH_GTKSPELL), where it says:
/* todo: Use computed xml:lang attribute of relevant element, if present, to specify the language (either as 2nd arg of gtkspell_new_attach, or with explicit gtkspell_set_language call in; see advanced.c example in gtkspell docs). sp_text_edit_dialog_read_selection looks like a suitable place. */
However, it might be more desirable to set the GtkTextTags in sp_text_edit_dialog_read_selection. According to pjrm this initially seemed to have the desired effect.
pjrm has basically hacked up a solution for the second part but has hit an apparent bug in gtkspell: it ignores the language info from language GtkTextTags and queries the locale instead (see the second irc log for all the details).
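Before language tags can be applied to a text buffer, the text has to be split into regions that share a language. The sketch below shows one way to compute such regions as character offsets, assuming the dialog text is assembled from pieces (for example one per tspan) that each carry a computed language. The types `Piece` and `LanguageRun` and both functions are hypothetical, this is not pjrm's patch, and the GTK calls that would create and apply the tags are left out. Offsets are counted in characters rather than bytes, which is what the gtk_text_buffer_get_iter_at_offset documentation quoted in the log describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical input: one piece of UTF-8 text with its computed language.
struct Piece {
    std::string utf8;
    std::string lang;
};

// A half-open range [start, end) of character offsets sharing one language.
struct LanguageRun {
    std::size_t start;
    std::size_t end;
    std::string lang;
};

// Count UTF-8 characters by skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t utf8_length(const std::string &s)
{
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Walk the pieces in order and merge neighbours that share a language.
std::vector<LanguageRun> language_runs(const std::vector<Piece> &pieces)
{
    std::vector<LanguageRun> runs;
    std::size_t offset = 0;
    for (const Piece &p : pieces) {
        std::size_t len = utf8_length(p.utf8);
        if (len == 0) {
            continue; // empty pieces contribute no region
        }
        if (!runs.empty() && runs.back().lang == p.lang) {
            runs.back().end = offset + len; // extend the previous run
        } else {
            runs.push_back({offset, offset + len, p.lang});
        }
        offset += len;
    }
    return runs;
}
```

With only one language per text element, as suggested above, the result is always a single run covering the whole buffer.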
irc log
Here is the log of the chat on irc:
(11:15:23) egli: but I was wondering if there was a way to spell check the text objects?
(11:17:06) egli: or in case it was not built in I was wondering how complicated it would be to add
(11:18:02) SkoZombie: If you go to "Text and Font" and click on "Text" then it'll hilite possibly mispelt words
(11:18:15) SkoZombie: but i'm runnong the latest CVS version, 0.41 (latest stable) might be different
(11:18:27) egli: hm, let me check
(11:21:26) egli: you are talking about the "Text" tab in the "Text and Font" Dialog, right? It doesn't hilight any words despite the fact that they are in german :-/. Yes this is 0.41
(11:23:16) egli: I presume the UI should have a possibility to specify the language of a text object.
(11:23:36) egli: I guess I'll need to check out CVS
(11:23:41) egli: thanks SkoZombie
(11:23:43) SkoZombie: it must be a 0.42pre feature
(11:23:59) SkoZombie: sorry i can't be of more help, but yes, a spell checker would be a nice addition :)
(11:24:32) egli: I'll see if I find time to look into it
(11:28:28) ^-: [pjrm] spell checking may depend on having gtkspell and possibly related libraries installed.
(11:28:38) ^-: [pjrm] and aspell
(11:28:57) ^-: [pjrm] Is there any message on standard error?
(11:30:30) ^-: [pjrm] gtkspell seems to pick its language according to LC_MESSAGES. Thus, I have LC_MESSAGES=fr_FR.UTF-8 in my environment, and I get “gtkspell error: aspell: No word lists can be found for the language "fr_FR".” on stderr when I open the Text & Font dialog box.
(11:31:31) ^-: [pjrm] (Of course inkscape ought to tell gtkspell what language to use according to xml:lang attributes, but that isn't implemented.)
(11:33:21) ^-: [pjrm] Doesn't seem to be in 0.41.
(11:33:41) ^-: [cornelius] does it requires to have gtkspell-devel installed while compilling inkscape?
(11:33:47) ^-: [cornelius] *require
(11:35:38) ^-: [pjrm] Looks like it: dialogs/text-edit.cpp has "#ifdef WITH_GTKSPELL"
(11:36:06) ^-: [cornelius] ah ok, so that's why I don't have got it enabled :)
(11:36:25) ^-: [pjrm] configure.ac tests if pkg-config --exists gtkspell-2.0
(11:37:30) ^-: [cornelius] /me will more look and less ask later
(11:38:11) egli: pjrm: hm, you're saying that the svg:text node has a xml:lang attribute?
(11:38:44) ^-: [pjrm] yes
(11:39:01) egli: hm I don't see it if I open the xml editor
(11:39:15) egli: do I have to add it manually for the moment?
(11:39:22) ^-: [pjrm] yes
(11:39:30) egli: ah :-)
(11:39:30) ^-: [pjrm] was just about to say: "has" meaning "allows"
(11:40:06) egli: so the spell check feature would require a gui which actually add the xml:lang attr to the text node?
(11:40:45) ^-: [pjrm] well, as i say, inkscape doesn't yet look at xml:lang attributes.
(11:41:51) egli: ok, I understand that, but in order to have spell check for different languages this would have to be implemented, correct?
(11:41:52) ^-: [pjrm] I'm just saying that xml:lang attributes (if present) would be a better choice of language for the spell checker than querying the environment for the current user's chosen locale
(11:42:05) egli: ok, yes, I agree
(11:42:24) ^-: [pjrm] re "required", you could always restart inkscape with a different locale :)
(11:43:33) egli: hehe, my birth announcement was bilingual
(11:44:06) Uraeus [~cschalle@core.fluendo.com] entered the room.
(11:44:08) ^-: [pjrm] the comment in src/dialogs/text-edit.cpp (search for WITH_GTKSPELL) gives more detail on how to implement
(11:45:36) ^-: [pjrm] it doesn't mention how to code the gui for controlling xml:lang attributes. That's independent. I don't think it's too hard, apart from the desirability of distinguishing between "unspecified, just inherit" and providing explicit value.
(11:49:37) egli: ah, pjrm, you're making me curious, I'm checking out the source from CVS now
(11:53:19) ^-: [pjrm] :)
(11:54:08) egli: inkscape is the module I want, right?
(11:55:31) ^-: [pjrm] yes
(11:55:37) egli: ok
(11:55:50) ^-: [pjrm] see also gtkspell docs
(11:56:49) ^-: [pjrm] hmm, i wonder if it would suffice to copy xml:lang attributes into pango markup in the text widget
(11:58:03) ^-: * basic has left: Replaced by new connection
(12:01:46) egli: pjrm: I found the relevant code snippet. I'll have to look at the gtkspell docs
(12:02:47) ^-: [pjrm] there's a gtk_text_buffer_insert_with_tags function we could probably use instead of gtk_text_buffer_set_text in sp_text_edit_dialog_read_selection
(12:06:58) egli: ok, I'm totally new at this. You're outlining two ways now: either pango markup in the text widget or the gtk_text_buffer_insert_with_tags function. Or do they amount to the same
(12:10:07) ^-: [pjrm] apparently one can't directly set pango markup
(12:10:47) ^-: [pjrm] the GtkTextWidget converts its GtkTextTag stuff to pango stuff
(12:11:07) ^-: [pjrm] (according to my brief glance at text_widget_internals.txt.gz just now)
(12:11:56) ringerc left the room (quit: "oops, someone let the magic smoke out.").
(12:13:16) egli: ok, I want to put this stuff into the wiki so I do not forget. Where do I put this best? just create a new page in the wiki and link to it from the newfeatureProposals?
(12:13:52) ^-: [pjrm] There's a feature request item in the tracker: see one of the buttons on the left of www.inkscape.org
(12:17:17) egli: oh, this tracker thingy. I'd rather put this discussion into the wiki. But I dunno how the inkscape devs work
(12:26:25) SkoZombie: egli: they can always remove it from the wiki if they think its inappropriate for whatever reason
(12:26:59) egli: ok, thanks SkoZombie
(12:55:07) ^-: [pjrm] Setting GtkTextTags seems to have the right effect.
Updates again from irc:
(14:18:43) egli: I don't quite understand what pjrm means when he says "Setting GtkTextTags seems to have the right effect". Does that mean he actually hacked up a solution that uses the xml:lang attribute for spell checking?
(15:14:42) ^-: [pjrm_home] egli: When I said that, all I'd implemented was applying a single tag over the entire buffer with a hard-coded language code "fr_FR". When I ran inkscape in a C locale and typed text with mixed english & french, the english words were underlined as "misspelled".
(15:16:16) ^-: [pjrm_home] I've since coded a little more. However, even if I could finish the feature tonight, I don't know whether it would be applied for 0.42 or not.
(15:17:34) ^-: [pjrm_home] It's a nice touch, but it has a reasonable chance of introducing bugs (e.g. mismanaging memory, possibly even causing crash bugs), so it might be considered that the risk outweighs the goodness of this feature.
(15:17:53) ^-: [pjrm_home] Of course it could go into cvs as soon as 0.42 were released, though.
(15:17:56) egli: pjrm_home: you mean you applied the tag over the text buffer
(15:18:11) ^-: [pjrm_home] yes, over the GtkTextBuffer.
(15:18:14) egli: ok
(15:18:43) egli: why would it introduce bugs, just because you add a tag to the text buffer
(15:20:26) ^-: [pjrm_home] the feature involves creating data structures to represent the different languages of different regions. Also creating GtkTextTag's and doing stuff with GtkTextIter objects and GtkTextTagTable and a few other things.
(15:20:44) ^-: [pjrm_home] re the value of the feature: we have no gui to control xml:lang tags
(15:21:35) ^-: [pjrm_home] thus it has fairly small value, so there doesn't have to be much risk for the risk to outweigh the benefit.
(15:22:51) ^-: [pjrm_home] bryce suggests that we try to release fairly soon, in time for a certain conference.
(15:23:21) ^-: [pjrm_home] bbyak: your opinion?
(15:23:33) ^-: [bbyak] on what?
(15:24:13) ^-: [pjrm_home] whether it's worthwhile having the spell checker choose language by xml:lang tags if present, rather than solely on user's locale as the current cvs does.
(15:24:27) ^-: [pjrm_home] i mean, whether worthwhile doing this for 0.42
(15:24:56) ^-: [bbyak] i think it may be safe enough for 0.42, as this feature is not much used anyway
(15:25:06) ^-: [bbyak] afaik
(15:29:40) egli: well I wasn't thinking of applying different xml:langs to different regions. I was more thinking of just having one xml:lang per node. This would go a long way towards usable spell checking. I do not need different langs for different regions (at least not now)
(15:30:31) egli: if we just do that would it still be unsafe (as you do not need GtkTextIter if you only have one language per node)
(15:32:17) ^-: [pjrm_home] by regions, i just meant "regions of the text", as marked by e.g. <tspan> elements (or whatever other element, the code wouldn't care what element had the xml:lang tags)
(15:32:32) ^-: [pjrm_home] what do you mean by node?
(15:33:08) ^-: [pjrm_home] anyway, i'll see what I can come up with in the next hour or so.
(15:35:44) egli: I might not have the correct terminology. I call the xml:text element in XML a "node". From what I understand this corresponds to the element that I get when I insert a text object in inkscape
(15:38:46) ^-: [pjrm_home] egli: correct terminology, but a <text> element [i take it you mean svg:text, btw] can contain children like <tspan>, and the text that gets put in the dialog box could include many different tspan elements' text.
(15:38:55) egli: so I would attach the xml:lang attribute to the svg:text element as opposed to the svg:tspan
(15:40:03) egli: [yes I mean svg:text]
(15:42:13) egli: so you would want to allow for each tspan to have a separate xml:lang attribute. This would of course complicate things. It would also make the UI to enter the xml:lang attribute much more complicated. I would suggest to ignore xml:lang attributes on the tspan elements and only worry about xml:lang attributes of the svg:text element
(15:43:46) egli: From a users point of view I can live with a few "misspelled" words if I have a few english words in my german document.
(15:45:00) egli: However it would take me a long way if I could say this text is in english and that text is in german and have them spell checked properly. Say I'd have a bilingual flyer with german text on the left and english text on the right
(15:45:05) ^-: [pjrm_home] re ui, it should be the same complication as setting bold attribute.
(15:46:52) egli: pjrm_home: from what I can tell you can only set the bold on the svg:text node (at least in 0.41)
(15:48:13) ^-: [bbyak] not so in 0.42
(15:48:26) ^-: [bbyak] there's full support for styling text fragments now
(15:53:48) ^-: [pjrm_home] the documentation for gtk_text_buffer_get_iter_at_offset claims that its offset is number of characters, but that seems unlikely to me: i'd have expected number of bytes. Anyone know?
(16:57:51) ^-: [pjrm_home] oh dear. My previous test indicating that language GtkTextTag's are respected, was faulty: apparently gtkspell is buggy in how it queries locale variables: LC_ALL=C doesn't override LC_MESSAGES for gtkspell.
(16:58:23) egli: uh, are you sure?
(16:58:40) ^-: [pjrm_home] So i incorrectly concluded that gtkspell got french from the GtkTextTag, whereas in fact it got it from my LC_MESSAGES environment variable (which I thought was being ignored given I had LC_ALL=C).
(17:01:56) ^-: [pjrm_home] I'm quite disappointed about that: having implemented xml:lang querying fairly quickly, only to find that gtkspell is ignoring my work :-( .
(17:02:07) ^-: [pjrm_home] I should submit a bug report against gtkspell.
(17:02:41) ^-: [pjrm_home] it seems reasonable for it to get language info from language GtkTextTag's in preference to querying locale.
(17:03:53) egli: hm, I was looking forward to spell checking :-/
(17:04:15) ^-: [pjrm_home] Especially when it seems that its locale querying is wrong anyway (or at least different from gettext's behaviour)
(17:07:26) ^-: [pjrm_home] Hey, I can trigger a crash by setting LC_MESSAGES to fr_FR.UTF-8 (or whatever other non-english UTF-8 locale) while having unset LC_CTYPE.
(17:08:04) ^-: [pjrm_home] seems to be throwing an exception from Glib::locale_from_utf8
(17:08:41) ^-: [pjrm_home] SkoZombie: I usually have LC_MESSAGES set to french to help me learn french.
(17:15:09) ^-: [pjrm_home] oh well, bed time now. Night all.
(17:15:34) sciboy: Good night =)
(17:16:02) ^-: * pjrm_home has left: Déconnecté
irc log of user interface for language selection discussion:
(09:04:56) egli: pjrm: I was wondering what happened to your spell check work. Did you get up the next day and have a brilliant idea what the problem with GtkText and GtkSpell could be?
(09:05:44) egli: as this code is probably not going into cvs right now it might make sense to post it somewhere for people (like me :-) to look at and play with it
(09:11:05) pjrm: egli: I can send you the diffs, but it's only to apply language GtkTextTag's to the text buffer. gktspell apparently doesn't honour those.
(09:11:52) egli: pjrm: yes, I'd love to have the diffs. That'll let me play with it and do some more experimenting
(09:12:00) pjrm: It might still be a good patch to apply anyway just to help pango and maybe screen readers (?), but the benefit for gtkspell will have to wait until gtkspell honours them.
(09:12:27) pjrm: i'll be home in half an hour.
(09:12:30) egli: I just cannot believe that gtkspell would have such a blatant bug
(09:12:31) pjrm left the room (Déconnecté).
(09:12:38) egli: it's used everywhere
(09:26:35) pjrm_home [pmoulder@bowman.csse.monash.edu.au/Gaim] entered the room.
(09:28:55) pjrm_home: presumably they just didn't expect to find language tags.
(09:29:23) pjrm_home: e.g. the buffer i'm typing in in gaim wouldn't have language tags.
(09:30:56) egli: "they didn't expect to find language tags"? Hm, the people behind GtkSpell are pretty smart, but then again nobody is perfect
(09:31:22) egli: have to look into this and do some debugging
(09:36:24) pjrm_home: Re putting something like the current patch into CVS: I hope to be able to extend it a bit such that e.g. bold sections get preserved when the user clicks Apply.
(09:37:05) egli: ah that would be good too. Then I could work from CVS
(09:38:12) pjrm_home: In current CVS, if you have some bits of the text with a different style (bold or whatever), then we don't keep track of that in the text edit box in Text & Font dialog, so clicking Apply will in effect discard markup.
(09:38:46) pjrm_home: (Not too much of a bug given that ppl rarely use that Text tab, but would still be nice to do right.)
(09:38:59) egli: uhh, I didn't notice that when playing around with it
(09:39:54) egli: then again if we accept this limitation we could also go for the limitation that language settings can only be set per GtkText and not individual regions
(09:39:57) pjrm_home: More controversially, we could attempt to translate inkscape style into GtkTextTag style. Not too controversial for just the odd bold/italic word or two, but more questionable for font size or font face.
(09:40:13) egli: from a user pov I think this is perfectly fine
(09:40:37) pjrm_home: yes, that would be a good start.
(09:41:07) egli: it would go a long way usability wise as I think it would cover the most common usecases
(09:41:14) pjrm_home: yep
(09:41:44) egli: I can accept that it would claim the odd misspelled word if I have one english word in a german text
(09:42:56) egli: also from the ui pov it would be much simpler: just specify one language in the "Text and Font" dialog as opposed to be able to specify a language for each region
(09:43:51) egli: how would the ui for changing the language of a selection in the text look like anyway?
(09:44:43) egli: I can only think of how it is done in OO.o 1.1.x. Pretty unintuitive somewhere under the character format settings
(09:50:09) pjrm_home: I'm not a user interface person. I was thinking of a text field and tickbox on the Text & Font dialog box, where the field greys out when the checkbox isn't ticked. I think there's a combo widget that allows selecting from a menu as a way of filling in the text box.
(09:50:35) pjrm_home: The checkbox is to distinguish between inherit or not.
(09:51:24) pjrm_home: The text field is because the set of allowable languages is infinite. (Also more convenient than selecting from a menu, once you know the format.)
(09:52:29) egli: I was thinking of just allowing for a combo widget which lets you specify a language or leave it at the default which would say "use document default"
(09:52:48) egli: would it make sense to be able to enter the language manually?
(09:53:09) pjrm_home: what do you mean manually?
(09:53:17) pjrm_home: do you mean type as distinct from select from menu?
(09:53:22) egli: I think the user should only be able to enter the languages for which she has spelling dictionaries installed
(09:53:37) pjrm_home: it isn't just for spelling on the local machine
(09:53:55) pjrm_home: e.g. it facilitates google searches
(09:54:03) egli: by manually I mean enter free form text in a text field as opposed to choose a selection from a combo widget
(09:54:24) pjrm_home: certainly it should be possible to enter free-form text.
(09:54:40) egli: hm I don't understand that
(09:54:56) egli: I just want to say that this text is in german
(09:54:57) pjrm_home: I thought combo widget meant something that allows either free-form text or selecting from a menu
(09:55:14) pjrm_home: then enter either de or de_DE or de_CH or select from a menu
(09:55:36) egli: ah, again my terminology, I mean a widget where you can only select from predefined vaules
(09:56:08) pjrm_home: let's call that a popup menu. (Can someone here who does more gtk stuff fill us in on the standard gtk terms?)
(09:56:23) egli: ok, sorry for the mixup
(09:56:49) egli: evolution let's you choose the language in the composer
(09:57:27) egli: and it uses user understandeable terms such as English (British) etc
(09:58:17) egli: but what was the story with google searches. Why would you want that in the Text and font dialog?
(09:58:54) pjrm_home: egli: It's good to have a menu available so that one doesn't need to know the right format, but it's also good to have text entry to allow less common languages, and because it's more convenient than selecting from a huge menu.
(10:00:26) egli: ok, I see, but presumably the user only wants to see the languages for which she has spelling dicts, doesn't she?
(10:01:31) egli: grandma tilly might be confused by a text box where she has to enter de_CH :-)
(10:01:34) pjrm_home: languages with spelling dicts available could be at the top of the list for convenience, though sometimes one wants to specify the language correctly even when one doesn't have the appropriate spelling dictionary installed at present on the current machine
(10:01:42) pjrm_home: grandma tilly can use the menu
(10:01:47) egli: hehe
(10:02:42) egli: I guess you do want to specify languages for which you don't have a dict, but that seems like a rare case
(10:03:01) egli: very rare
(10:03:56) pjrm_home: providing that possibility is i think one of the criteria for accessibility standards... I'll have a look.
(10:05:03) egli: maybe a question for the GNOME HIG people?
(10:06:51) pjrm_home: re google: google provides the ability to show pages/images for a specified language, e.g. searching for prix only in english pages to avoid the common french word meaning price.
(10:07:12) pjrm_home: e.g. searching for SVG images containing the word Grand Prix.
(10:10:10) egli: ah, now I get it. As you attach the xml:lang tag it can help with google searches. Yes. I thought you want to initiate google searches from the inkscape Text and Font dialog :-)
(10:10:43) pjrm_home: re "rare case", providing the ability to specify the correct natural language is actually a requirement for inkscape to claim to be a conforming svg authoring tool!
(10:10:48) pjrm_home: http://www.w3.org/TR/WAI-WEBCONTENT/#gl-abbreviated-and-foreign
(10:11:32) pjrm_home: i'll need to check that, but it's certainly a priority one item for creating accessible documents.
(10:12:42) pjrm_home: http://www.w3.org/TR/SVG11/access.html explains some of the relationship between svg conformance and accessibility
(10:14:55) egli: ok, you have a point there :-). It just seems against the "just works" principle. Too many options, too much confusion
(10:15:38) pjrm_home: I think guideline 1.1 and 1.2 at http://www.w3.org/TR/ATAG10/#gl-access-support indicate that this feature is a priority one guideline for authoring tools.
(10:15:53) egli: maybe a possibility would be to cover the grandma tilly case with a popup menu and let the specialists enter any xml:lang with the xml editor
(10:17:58) egli: I'm not saying we shouldn't let the user enter any language. I'm just arguing for an easy and intuitive way for the common case. Basically make easy things easy and hard things possible
(10:17:59) pjrm_home: grandma tilly might be a bit of an extreme case, for whom it would be beneficial to hide options. I for one would appreciate the convenience of having a text field as an alternative.
(10:18:21) pjrm_home: i think providing the menu is enough to make things easy & intuitive.
(10:18:34) egli: ah, yes, but you're not grandma tilly
(10:18:42) pjrm_home: granted
(10:18:52) egli: I guess I could live with the menu :-)
(10:19:26) egli: grandma tilly won't notice that she could also enter text there
(10:20:44) pjrm_home: for grandma tilly, i believe it would be an improvement to suppress the text field. But for Andy Artist I'm not so sure; i'm inclined to think Andy would like the text field to be present so long as it doesn't cost too much space on the dialog box.
(10:21:08) pjrm_home: Having an Other... entry in the menu is another approach
(10:21:22) pjrm_home: that approach doesn't provide the convenience benefit though
(10:22:02) egli: andy the artist could edit the in xml editor
(10:22:27) egli: let me see how gimp handles this
(10:23:16) pjrm_home: most of gimp's output formats don't represent text as text, so the ability to specify language isn't very relevant other than for spell checking
(10:24:13) egli: ok, fair enough
(10:24:50) egli: looks like you cannot even do spell checking in gimp 2.2.4
(10:26:32) egli: I'll try to talk to my A Illustrator friend to find out how it is done there.
(10:34:28) pjrm_home: also valuable is guessing the language automatically. http://odur.let.rug.nl/~vannoord/TextCat/competitors.html has a few tools for this. Checking in available spell checkers is an alternative.
(10:36:46) egli: hah, now we're getting fancy. I'd go for the easy stuff first, e.g. just use available spell checkers. language recognition can always be added :-)