Difference between revisions of "Coding Style"
(→crazyCamelCase v. Underscores: fixed pejorative title) |
|||
Line 27: | Line 27: | ||
Then (for emacsen users), one can re-indent a region with C-M-\. | Then (for emacsen users), one can re-indent a region with C-M-\. | ||
=== | === camelCase v. Underscores === | ||
* The new coding standards require non-static member functions to be in pseudoCamelCase, and every other type of function to use underscores. | * The new coding standards require non-static member functions to be in pseudoCamelCase, and every other type of function to use underscores. | ||
* The current codebase has very few real C++ objects, half of these objects (or more) seem to use underscores for their member functions, the other half seem to use pseudoCamelCase. | * The current codebase has very few real C++ objects, half of these objects (or more) seem to use underscores for their member functions, the other half seem to use pseudoCamelCase. | ||
** Over-all, 95% of Inkscape's codebase uses underscores in function names (of | ** Over-all, 95% of Inkscape's codebase uses underscores in function names (of course, 90% of the code is still C). | ||
*** It seems underscores are what Inkscape developers are already used to. | *** It seems underscores are what Inkscape developers are already used to. | ||
* I prefer CamelCase for type names (classes), exceptions, namespaces, and the like. But pseudoCamelCase in member functions clutters the visual space; making types, exceptions, and namescapes stand out much less. And it makes the code much harder (for me) to read. | * I prefer CamelCase for type names (classes), exceptions, namespaces, and the like. But pseudoCamelCase in member functions clutters the visual space; making types, exceptions, and namescapes stand out much less. And it makes the code much harder (for me) to read. | ||
** This is a personal preference. I find CamelCase to be more legible and more concise, as it takes less space and thus allows for longer identifiers. [[User:JonCruz|JonCruz]] | |||
** Why is there this inconsistency with regard to function names in the new coding standard? | ** Why is there this inconsistency with regard to function names in the new coding standard? | ||
*** (I.e., why a few functions pseudoCamelCase, and everything else underscores?) | *** (I.e., why a few functions pseudoCamelCase, and everything else underscores?) | ||
Line 40: | Line 43: | ||
* Hey, we're not programming in Java (or Qt) here! This is C++, where underscore reigns supreme. | * Hey, we're not programming in Java (or Qt) here! This is C++, where underscore reigns supreme. | ||
** A key point here is the misconception that camel case is not a C/C++ thing. Many platforms used camel case, including the Microsoft Windows API, Apple Macintosh API, etc. [[User:JonCruz|JonCruz]] | |||
** CamelCase is also prevalent in many other "non-pc" languages such as Smalltalk. | |||
=== Underscore as first character of identifiers === | === Underscore as first character of identifiers === |
Revision as of 22:08, 25 June 2011
Inkscape Coding Style Discussion
The official code style documentation is on the main website. "http://www.inkscape.org/doc/coding_style.php"
This page is for discussing and working out changes and improvements to that document. When concensus is reached, the document can be updated in SVN in the inkscape_web module.
Editor support
Place the following at the end of source files to help emacsen & vim users follow some of our guidelines automatically:
/* Local Variables: mode:c++ c-file-style:"stroustrup" c-file-offsets:((innamespace . 0)(inline-open . 0)(case-label . +)) indent-tabs-mode:nil fill-column:99 End: */ // vim: filetype=cpp:expandtab:shiftwidth=4:tabstop=8:softtabstop=4:encoding=utf-8:textwidth=99 :
Once you've added those lines, close the file and re-open it for it to take effect.
Then (for emacsen users), one can re-indent a region with C-M-\.
camelCase v. Underscores
- The new coding standards require non-static member functions to be in pseudoCamelCase, and every other type of function to use underscores.
- The current codebase has very few real C++ objects, half of these objects (or more) seem to use underscores for their member functions, the other half seem to use pseudoCamelCase.
- Over-all, 95% of Inkscape's codebase uses underscores in function names (of course, 90% of the code is still C).
- It seems underscores are what Inkscape developers are already used to.
- Over-all, 95% of Inkscape's codebase uses underscores in function names (of course, 90% of the code is still C).
- I prefer CamelCase for type names (classes), exceptions, namespaces, and the like. But pseudoCamelCase in member functions clutters the visual space; making types, exceptions, and namescapes stand out much less. And it makes the code much harder (for me) to read.
- This is a personal preference. I find CamelCase to be more legible and more concise, as it takes less space and thus allows for longer identifiers. JonCruz
- Why is there this inconsistency with regard to function names in the new coding standard?
- (I.e., why a few functions pseudoCamelCase, and everything else underscores?)
- Could all functions just use underscores? (Like the current codebase, like Gtkmm, like Linux, like the C & C++ standard libraries, and like almost every other project out there?)
- Is static vs. non-static member functions really that big of an issue to warrant a change to pseudoCamelCase?
- Why is there this inconsistency with regard to function names in the new coding standard?
- Hey, we're not programming in Java (or Qt) here! This is C++, where underscore reigns supreme.
- A key point here is the misconception that camel case is not a C/C++ thing. Many platforms used camel case, including the Microsoft Windows API, Apple Macintosh API, etc. JonCruz
- CamelCase is also prevalent in many other "non-pc" languages such as Smalltalk.
Underscore as first character of identifiers
- Please consider that the C++ standard reserves _* identifiers [17.4.3.1.2]: "Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace".
Tabs and Alignment
Inkscape code uses 4 spaces for indentation and no tabs anywhere.
That's clear (not to mention sensible). However, on the mail Coding Style page it says this:
We have decided to use 4 spaces as a tab for the project.
Calling it a "tab" (rather than something like "indentation amount") is a little confusing, I think. It made me think that source files should have tab characters, but that the editor should be set up to interpret them as being 4 columns (i.e., taking you to the next multiple of 4 columns).
Whitespace: Space between sigil and following token?
I.e. do we write ‘int *p
’ or ‘int * p
’; ‘int *f()
’ or ‘int * f()
’; ‘int *const p
’ or ‘int * const p
’ ?
Arguments in favour of no space:
- Promotes better understanding of C/C++'s parsing rules for cases like ‘
int *a, b
’: the sigil applies only toa
in that example, not tob
.- This argument is much less applicable to the case that the following identifier names a function: it is very rare for function prototypes to “syntactically share return types” that way; this rarity reduces the importance of knowing the meaning of ‘
int * a(), b()
’.
- This argument is much less applicable to the case that the following identifier names a function: it is very rare for function prototypes to “syntactically share return types” that way; this rarity reduces the importance of knowing the meaning of ‘
- Less horizontal whitespace usage (reducing the chance of overflowing the available width or requiring a continuation line).
- However, note that overflow is rare for variable declarations; for function headers, continuation doesn't hinder understanding of the header in question, so the argument rather becomes one of taking up more lines, reducing how many statements are visible in one screenful.
- For the case of declarations inside function bodies (and other cases as well, to a lesser extent), ‘
Foo * foo
’ is more easily mistaken for executable code (as distinct from a declaration) than ‘Foo *foo
’ is. - Less typing. (Minimal importance.)
Arguments in favour of space:
- For the cases where the following token is an identifier, improves the legibility of the identifier. For cases where the following token is a cv-qualifier (‘const’, ‘volatile’), the space improves the legibility of the sigil.
An additional issue is the relative prevalence of the approaches in existing code: this relates to either consistency or the cost of changing code (to the extent that changing is done manually).
Before doing any counting, the impression of one developer [writing this] is that no-space is much more common, but with-space is far from unheard-of. (It's valuable to get an informal impression as a defense against any deficiencies in simplistic automated counting: e.g. we ought to give more weight to code that's often looked at / changed than code that's less often looked at or changed, and should give more weight to code considered Inkscape code than imported code.)
The results of a simplistic automated count confirm that: there are 28037 lines with the no-space style to 7204 lines with the with-space style, a ratio of about 3.9:1; or 1899 vs 746 (2.5:1) in .h files. For the following-token-is-function-identifier case, the gap is narrower, at 1350 vs 792 (1.7:1).
The simplistic automated count in question is:
- find \( -name '*.h' -o -name '*.cpp' \) -type f -print0 |
- xargs -0 cat |tr '\t' ' ' |
- egrep -c '(void|int|unsigned|char|short|float|double|\<[A-Z][a-zA-Z0-9_]*)(( *const)? *[*&])+[a-zA-Z_]'
for no-space, and inserting ‘ +’ before the final ‘[a-zA-Z_]’ for with-space. (‘volatile’ is not used in Inkscape source code at the time of writing.) For the following-token-is-function-identifier case, the final ‘[a-zA-Z_]’ was extended to ‘[a-zA-Z_][a-zA-Z0-9_]* *\(’. Note that this means that both the no-space and with-space counts exclude Gnu-style function definition heads (where the return type is on a separate line).
Brace Placement
Arguments against "disco"/compact braces:
- Make it hard to match braces visually.
- Partial counter-argument: emacsen & vim each have brace-locating commands. (Emacsen: C-M-u, C-M-n, C-M-p, C-M-d (up/next/prev/down); vim: `[ {', `] }'.)
- The brace closes a block, not an if "statement".
- However, the block is intimately related to the if/for/while in ways that a normal block is not.
Functions vs methods, i.e. member vs non-member functions
- Information-hiding principle suggests that if a function doesn't need access to private variables then it's better not to make it a member function.
- In more detail: when one wants to change some detain about a private variable, then one must look through everything that has access to that variable. This is easier if there's less code that has access to the variable, e.g. if one uses non-member functions in preference to member-functions.
- Virtual functions must of course be member functions.
- Trivial getter/setter functions seem most natural as member functions. (This is mostly a matter of style or tradition rather than technical reasons.)
- "barType getBar() const { return _bar; }" / "void setBar(barType x) { _bar = x; }" are trivial. It's not clear how much else is to be considered trivial.
- For things needing access to private variables, the requirement of the `friend' declaration for non-member functions is a cost: extra clutter in the class definition, requires updating for changes in the signature; and many of the arguments for non-member functions disappear.
- Member functions must be declared in the class definition, which tends to require more recompilation because one can't use a separate header file.
- Sometimes one can break the class into smaller, simpler classes if there are lots of member functions.
- A related issue is that it's harder (impossible) to write private specializations (same name but subclass arguments): any specialization must be declared in the class definition too, so it isn't private other than through comments.
- The shorter names typical of member functions are harder to search for.
- (Counter-argument: this is a matter of custom, there's no technical reason why one can't use long names for member functions, or even short names for non-member functions.)
- Someone has claimed that some tools do relatively well at finding the right version of a given name. Can someone give examples of such a tool (and some comment as to how widely used that tool is or can be by our developers) ?
- Pointers to member functions aren't as easily used as pointers to non-member functions. ("unwieldy", "heavy-weight".)
- Member functions can't be given static linkage.
static vs anonymous namespaces for file-local functions/objects
The current C++ standard marks this type of `static' as deprecated, meaning "Normative for the current edition of the Standard, but not guaranteed to be part of the Standard in future revisions".
(Reference: Final Draft of the C++ standard, Annex D "Compatibility features"; and §7.3.1.1 "Unnamed namespaces", ¶2.)
However, g++ up to 3.4 (and possibly later) don't give anonymous-namespace names the same advantages given to static names:
- Warning if the name isn't referenced (and hence unused).
- Optimizations when the function/object can't be used from other object files.
- Static linkage (relevant to analysis using nm).
In addition, having a static tag in the declaration itself makes it easier for the programmer to see that the name isn't referenced outside of this translation unit.
Consequently, it is recommended that we use static for names not referenced from other translation units.
If and when we encounter a compiler that rejects `static' for objects in namespace scope, we can easily replace `static' with `namespace { ... }'.
What to put in what header file
Reasons for change:
- Reduce compilation times (most important)
- Fix the mess of NR::Point, NR::Matrix etc. header files. Less confusion as to what needs to be #included, or where to look for the definition of something (especially in the cases where tags-like programs don't help, e.g. if looking for something helpful for a known task rather than having a known name).
What won't change:
- foo.h will still provide the same things it always has (unless you count things "accidentally" provided, which we hope to become less). It's just that some content may be physically moved to a file that foo.h #includes.
The issues are:
- Ease for callers (what file/s to #include). This is largely taken care of by having files that #include other header files, and in particular having the foo.h file as at present.
- "I need to change the blah declaration/definition; what file do I edit?".
- Information we want ppl to be aware of.
Definitions are handled by tags/idutils/global etc (though all of these tools have limitations in their ability to find the _right_ definition). Declarations that are separate from the definition: [non-member] function declarations are the main case. member functions (obviously kept in same physical file as their class definition); non-member functions
Here are some types of things to be found in header files. Whatever rules you hypothesize, check that the rules cover each of these cases.
- class foo definition
- "class foo;" forward declarations
- struct foo_class (i.e. GObject "class"/vtable type)
- IS_FOO, FOO cast etc. macros
- ancilliary types (e.g. struct StopOnTrue for marshalling)
- enums
- relevant instantiations (traits etc.)
- foo_do_blah (declaration of function defined in .cpp file)
- inline function definitions
- should declaration & definition of inline functions go in different files (there are pros & cons)
- other constants, including #defines
- other macros. Where is the line between "other macro" and "inline function" and IS_FOO macros?
The physical placement of the IS_FOO etc. macros (as distinct from what header files "provide" them) isn't important: we never need to edit them, and they're relatively easy to find by either primitive tags-like programs or by grep.
- This unimportance-of-physical-location is reflected in the inconsistency of current practice, sometimes placing them with foo.h and sometimes in blah-forward.h.
- Exception to unimportance-of-physical-location: IS_FOO etc. should not be physically in the same header file as something that's "expensive" and unneeded by one or more translation units that need IS_FOO etc. ("Expensive" in same sense as `amount' defined below in `Reducing compile times'.)
The main thing for IS_FOO etc. is what header file(s) provide the macro. Thankfully, this is easy in the common case: usually any function that needs the definition of IS_FOO also needs lots of other things relevant to the foo type, so would simply #include foo.h.
Relevant properties of function declarations:
- It is common to add or remove functions, or change the arguments.
- Typically not found by tags-like programs. E.g. neither `tags' nor `TAGS' files store the location of declarations (other than the definition).
- Programmers want to know what functions (including methods, macros, inline things) are available (or rather "is there something that does blah").
- Probably doxygen or the like is the best task for this, though we'd need to make it convenient to use.
- Needed in advance of function definitions (whether in .cpp file or inline function definitions). Not needed in advance by macro definitions per se (but needed in advance of function definitions where the macro is expanded).
Inline function definitions:
- Relatively expensive in compile time for translation units that don't use that inline function definition.
- Must come after (in the translation unit) all referenced declarations. Sometimes this requires that it be outside of the class definition or in a separate file from related declarations. (E.g. if foo.h and bar.h each provide an inline function definition that requires declarations provided by the other file.)
Operator overloading:
- Can be unclear whether to associate with the type of the left operand or the type of the right operand.
- We can simplify things for users of the function by having both left.h and right.h provide it (i.e. have both left.h and right.h #include it). This suggests that it should be physically neither in left.h nor right.h (unless it's OK for left.h to pull in all of right.h or vice versa).
Reducing compile times
- For each translation unit U, reduce the amount of material that gets included by U and isn't needed by U.
- "Amount" is measured by cost: how often that material changes (causing a recompile), and how expensive it is to compile (e.g. inline function definitions).
- In what circumstances does a translation unit include things that it doesn't need?
- Inline function definitions.
- Needs a function declaration but doesn't need the type definition.
- A special [non-] case of this is needing a member function declaration. Member function declarations can't be separated from their class definition. See also `methods vs functions' above.
- Suppose that struct Foo has a `Bar _bar' member. The translation unit may need to create Foo objects (and thus need the type definition of Bar to know its size), but not need to access _bar (so not needing anything from bar.h other than the type definition of Bar).
- This is very common. Common examples of Bar: SPShape, NR::Point/NR::Matrix.
- Needs some function declarations but not others.
- Needs some type forward declarations but not others.
- Comments are never needed for compilation.
- Definition of a marshaller is needed only by emitters. However, it's difficult to separate the marshaller from the signal, and difficult to separate the signal from the definition of the containing type (SPDesktop, SPItem).
- Including the definition of a type when its declaration would suffice.
- Copy&paste of #include lines.
- Ways of reducing the need for type definitions:
- Store just a pointer instead of the whole type.
- Provide a wrapper function (not inlined) for `new Foo'.
Preprocessor MACROS
Capitalizing or not?
# define IS_FINITE(_a) (std::isfinite(_a))
Arguments pro capitalizing
- Macros don't have a namespace and can therefore easily clash with other names. This happened in the isFinite case. where we had across several files:
#define isFinite(x) ... ... class Point { bool isFinite(int x) }
This obviously nameclashed, where a capitalized IS_FINITE would have never clashes as (I guess) nobody uses capitalized function names. Note that an inline function instead of macro would also have worked.
- Some macros can expand strangely (don't have good example now), so it is good that the user knows it does so.
- Note that it is often also possible to define an inline function instead of a macro!!! (which is a better solution?)