Coding Style

From Inkscape Wiki
Revision as of 22:11, 20 June 2006 by GigaClon (talk | contribs) (Categorization)
Jump to navigation Jump to search

Inkscape Coding Style Discussion

The ["" official code style documentation] is on the main website.

This page is for discussing and working out changes and improvements to that document. When concensus is reached, the document can be updated in SVN in the inkscape_web module.

Editor support

Place the following at the end of source files to help emacsen & vim users follow some of our guidelines automatically:

  Local Variables:
  c-file-offsets:((innamespace . 0)(inline-open . 0)(case-label . +))
// vim: filetype=cpp:expandtab:shiftwidth=4:tabstop=8:softtabstop=4:encoding=utf-8:textwidth=99 :

Once you've added those lines, close the file and re-open it for it to take effect.

Then (for emacsen users), one can re-indent a region with C-M-\.

crazyCamelCase v. Underscores

  • The new coding standards require non-static member functions to be in pseudoCamelCase, and every other type of function to use underscores.
  • The current codebase has very few real C++ objects, half of these objects (or more) seem to use underscores for their member functions, the other half seem to use pseudoCamelCase.
    • Over-all, 95% of Inkscape's codebase uses underscores in function names (of coures, 90% of the code is still C).
      • It seems underscores are what Inkscape developers are already used to.
  • I prefer CamelCase for type names (classes), exceptions, namespaces, and the like. But pseudoCamelCase in member functions clutters the visual space; making types, exceptions, and namescapes stand out much less. And it makes the code much harder (for me) to read.
    • Why is there this inconsistency with regard to function names in the new coding standard?
      • (I.e., why a few functions pseudoCamelCase, and everything else underscores?)
    • Could all functions just use underscores? (Like the current codebase, like Gtkmm, like Linux, like the C & C++ standard libraries, and like almost every other project out there?)
    • Is static vs. non-static member functions really that big of an issue to warrant a change to pseudoCamelCase?
  • Hey, we're not programming in Java (or Qt) here! This is C++, where underscore reigns supreme.

Underscore as first character of identifiers

  • Please consider that the C++ standard reserves _* identifiers []: "Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace".

Tabs and Alignment

  • Something else to consider is developer culture -- most of our developers at this point will likely be accustomed to either GNU coding standards (2/4-space indent), or GNOME/Linux-kernel coding standards (8-space indention quantum). [What does KDE recommend?]
  • Objective studies have found that indentions between 2 and 6 columns are optimal.
Reference: [[[ProgramIndentation]] Miara, J. R., J. A. Musselman, J. A. Navarro, and B. Shneiderman. Program

Indentation and Comprehensibility. Comm. ACM 26, 11 (Nov. 1983), 861-867.]

  • No tabs in files.
    • By default many editors (and pagers as used for viewing diffs, and "diff -t") set tab stops to 8.
If files end up with mixed space & tab alignment (other than the "structural indentation" approach discussed below), then it breaks formatting to use anything but 8 as one's editor's tab stop size.
    • Suggesting that people change their tab stop size is only practical if all source code uses spaces & tabs the same way.
This may be possible with mode lines if all Inkscape developers stick to editors with appropriate modeline support (emacsen, vim/gvim; anything else?).
Do any popular Windows/MacOS editors (other than the above) support mode lines?
What about KDevelop/anjuta/other IDE's on Un*x?
It's fairly easy to define this behavior for Emacs and VIM. Having it explicit might be enough.
It's possible to add checkin scripts to CVS to warn/prevent checkins with tabs, etc.
    • A minor disadvantage of using spaces for indentation is that some text editors (not emacsen or vim) would require using the space key a lot. On the other hand, code written in such an editor probably requires reformatting anyway.
    • Observing a rule of using tabs for structural indentation, and spaces for cosmetic (e.g. continuation) indentation, works fine regardless of a particular user's tab stops.
Somewhat contrived example (as seen with an 8-character tab stop):
          if (gromzalon) {
                    imagine_that_this_is_very_long(some, arguments,
                                                   go, here);
The breakdown:
<=Tab====>if (gromzalon) {
<=Tab====><=Tab====>imagine_that_this_is_very_long(some, arguments,
<=Tab====><=Tab====><=Spaces======================>go, here);
What this looks like with a tab stop of 3:
   if (gromzalon) {
      imagine_that_this_is_very_long(some, arguments,
                                     go, here);
-- MenTaLguY
(By the way, anyone submitting code with identifiers like "gromzalon" or "sprunklefrink" will be summarily shot.)
However, such an indentation rule would only be a feasible recommendataion if it had automated support from at least one of {vim, emacsen, a standalone indenter like astyle}. We don't know of any software to support this. Indeed, most editors (including vim and emacsen) tend to "normalize" spacing when indenting or out-denting a block of code.
<=Spaces======================>go, here

will get mangled by most editors. Often like this:

<=Tab=Tab=Tab=Tab=Spaces=>go, here
    • Can anyone give a specification for common editors or astyle/indent to give this behaviour, or suggest a script to check where this hasn't been done?
Without such automation, this approach won't be used by many developers.

Brace Placement

Arguments against "disco"/compact braces:

  • Make it hard to match braces visually.
Partial counter-argument: emacsen & vim each have brace-locating commands. (Emacsen: C-M-u, C-M-n, C-M-p, C-M-d (up/next/prev/down); vim: `[ {', `] }'.)
  • The brace closes a block, not an if "statement".
However, the block is intimately related to the if/for/while in ways that a normal block is not.

Functions vs methods, i.e. member vs non-member functions

  • Information-hiding principle suggests that if a function doesn't need access to private variables then it's better not to make it a member function.
    • In more detail: when one wants to change some detain about a private variable, then one must look through everything that has access to that variable. This is easier if there's less code that has access to the variable, e.g. if one uses non-member functions in preference to member-functions.
  • Virtual functions must of course be member functions.
  • Trivial getter/setter functions seem most natural as member functions. (This is mostly a matter of style or tradition rather than technical reasons.)
    • "barType getBar() const { return _bar; }" / "void setBar(barType x) { _bar = x; }" are trivial. It's not clear how much else is to be considered trivial.
  • For things needing access to private variables, the requirement of the `friend' declaration for non-member functions is a cost: extra clutter in the class definition, requires updating for changes in the signature; and many of the arguments for non-member functions disappear.
  • Member functions must be declared in the class definition, which tends to require more recompilation because one can't use a separate header file.
    • Sometimes one can break the class into smaller, simpler classes if there are lots of member functions.
  • A related issue is that it's harder (impossible) to write private specializations (same name but subclass arguments): any specialization must be declared in the class definition too, so it isn't private other than through comments.
  • The shorter names typical of member functions are harder to search for.
    • (Counter-argument: this is a matter of custom, there's no technical reason why one can't use long names for member functions, or even short names for non-member functions.)
    • Someone has claimed that some tools do relatively well at finding the right version of a given name. Can someone give examples of such a tool (and some comment as to how widely used that tool is or can be by our developers) ?
  • Pointers to member functions aren't as easily used as pointers to non-member functions. ("unwieldy", "heavy-weight".)
  • Member functions can't be given static linkage.

static vs anonymous namespaces for file-local functions/objects

The current C++ standard marks this type of `static' as deprecated, meaning "Normative for the current edition of the Standard, but not guaranteed to be part of the Standard in future revisions".

(Reference: Final Draft of the C++ standard, Annex D "Compatibility features"; and § "Unnamed namespaces", ¶2.)

However, g++ up to 3.4 (and possibly later) don't give anonymous-namespace names the same advantages given to static names:

  • Warning if the name isn't referenced (and hence unused).
  • Optimizations when the function/object can't be used from other object files.
  • Static linkage (relevant to analysis using nm).

In addition, having a static tag in the declaration itself makes it easier for the programmer to see that the name isn't referenced outside of this translation unit.

Consequently, it is recommended that we use static for names not referenced from other translation units.

If and when we encounter a compiler that rejects `static' for objects in namespace scope, we can easily replace `static' with `namespace { ... }'.

What to put in what header file

Reasons for change:

  • Reduce compilation times (most important)
  • Fix the mess of NR::Point, NR::Matrix etc. header files. Less confusion as to what needs to be #included, or where to look for the definition of something (especially in the cases where tags-like programs don't help, e.g. if looking for something helpful for a known task rather than having a known name).

What won't change:

  • foo.h will still provide the same things it always has (unless you count things "accidentally" provided, which we hope to become less). It's just that some content may be physically moved to a file that foo.h #includes.

The issues are:

  • Ease for callers (what file/s to #include). This is largely taken care of by having files that #include other header files, and in particular having the foo.h file as at present.
  • "I need to change the blah declaration/definition; what file do I edit?".
  • Information we want ppl to be aware of.

Definitions are handled by tags/idutils/global etc (though all of these tools have limitations in their ability to find the _right_ definition). Declarations that are separate from the definition: [non-member] function declarations are the main case. member functions (obviously kept in same physical file as their class definition); non-member functions

Here are some types of things to be found in header files. Whatever rules you hypothesize, check that the rules cover each of these cases.

  • class foo definition
  • "class foo;" forward declarations
  • struct foo_class (i.e. GObject "class"/vtable type)
  • IS_FOO, FOO cast etc. macros
  • ancilliary types (e.g. struct StopOnTrue for marshalling)
  • enums
  • relevant instantiations (traits etc.)
  • foo_do_blah (declaration of function defined in .cpp file)
  • inline function definitions
  • should declaration & definition of inline functions go in different files (there are pros & cons)
  • other constants, including #defines
  • other macros. Where is the line between "other macro" and "inline function" and IS_FOO macros?

The physical placement of the IS_FOO etc. macros (as distinct from what header files "provide" them) isn't important: we never need to edit them, and they're relatively easy to find by either primitive tags-like programs or by grep.

  • This unimportance-of-physical-location is reflected in the inconsistency of current practice, sometimes placing them with foo.h and sometimes in blah-forward.h.
  • Exception to unimportance-of-physical-location: IS_FOO etc. should not be physically in the same header file as something that's "expensive" and unneeded by one or more translation units that need IS_FOO etc. ("Expensive" in same sense as `amount' defined below in `Reducing compile times'.)

The main thing for IS_FOO etc. is what header file(s) provide the macro. Thankfully, this is easy in the common case: usually any function that needs the definition of IS_FOO also needs lots of other things relevant to the foo type, so would simply #include foo.h.

Relevant properties of function declarations:

  • It is common to add or remove functions, or change the arguments.
  • Typically not found by tags-like programs. E.g. neither `tags' nor `TAGS' files store the location of declarations (other than the definition).
  • Programmers want to know what functions (including methods, macros, inline things) are available (or rather "is there something that does blah").
    • Probably doxygen or the like is the best task for this, though we'd need to make it convenient to use.
  • Needed in advance of function definitions (whether in .cpp file or inline function definitions). Not needed in advance by macro definitions per se (but needed in advance of function definitions where the macro is expanded).

Inline function definitions:

  • Relatively expensive in compile time for translation units that don't use that inline function definition.
  • Must come after (in the translation unit) all referenced declarations. Sometimes this requires that it be outside of the class definition or in a separate file from related declarations. (E.g. if foo.h and bar.h each provide an inline function definition that requires declarations provided by the other file.)

Operator overloading:

  • Can be unclear whether to associate with the type of the left operand or the type of the right operand.
  • We can simplify things for users of the function by having both left.h and right.h provide it (i.e. have both left.h and right.h #include it). This suggests that it should be physically neither in left.h nor right.h (unless it's OK for left.h to pull in all of right.h or vice versa).

Reducing compile times

  • For each translation unit U, reduce the amount of material that gets included by U and isn't needed by U.
    • "Amount" is measured by cost: how often that material changes (causing a recompile), and how expensive it is to compile (e.g. inline function definitions).
  • In what circumstances does a translation unit include things that it doesn't need?
    • Inline function definitions.
    • Needs a function declaration but doesn't need the type definition.
      • A special [non-] case of this is needing a member function declaration. Member function declarations can't be separated from their class definition. See also `methods vs functions' above.
    • Suppose that struct Foo has a `Bar _bar' member. The translation unit may need to create Foo objects (and thus need the type definition of Bar to know its size), but not need to access _bar (so not needing anything from bar.h other than the type definition of Bar).
      • This is very common. Common examples of Bar: SPShape, NR::Point/NR::Matrix.
    • Needs some function declarations but not others.
    • Needs some type forward declarations but not others.
    • Comments are never needed for compilation.
    • Definition of a marshaller is needed only by emitters. However, it's difficult to separate the marshaller from the signal, and difficult to separate the signal from the definition of the containing type (SPDesktop, SPItem).
    • Including the definition of a type when its declaration would suffice.
    • Copy&paste of #include lines.
  • Ways of reducing the need for type definitions:
    • Store just a pointer instead of the whole type.
    • Provide a wrapper function (not inlined) for `new Foo'.