Improving Ubuntu Unity User Interface

Modern computers are getting more and more complex, and so are the tasks we can execute with them. We, the users, on the other hand, should be able to concentrate on the core of our tasks and let the computer do most of the work. In the end, a computer should never lower our creativity and productivity. That is why an efficient and effective user interface is crucial nowadays. But what makes a user interface good? First of all, it should be simple to use, which requires a clear layout. In addition, a GUI should be intelligent in the sense that it respects the context of the running and active programs, so that only the minimal required information is displayed.

One good thing about Linux is that you can customize its UI any way you like, as long as you have the expertise and the time. In recent years, the user interface of Ubuntu Unity (sometimes called Unity 3D) has greatly improved. However, Unity is not perfect yet: some aspects should be improved as soon as possible. The changes are simple, but they result in huge improvements.

First, icons of open and closed programs are currently hard to distinguish. Second, the task bar always shows the program's title, which is rarely needed, and certainly far less often than the program's menu. As a simple rule, use the desktop space as well as you can, and only for the most important things. Consider the following scenario: when browsing the web with Firefox, do you really need a panel telling you that you are currently using Firefox (maybe along with the website title, which is rarely needed in practice)? Instead, the panel should show common commands and tasks (or at least the ordinary menu bar).

Getting and compiling source code

To start customizing Unity, you need the source code. I strongly recommend using the source code of the currently installed version of Unity; this will make your life much easier.[1] The article Building Unity explains the basic steps to get you started very well. If you are only interested in experimenting with alternative user interfaces and don't want to spend many hours reading posts and compiling libraries, don't use the command bzr branch lp:unity trunk. Instead, manually download the source code of your Unity version. Also, check the list of bug reports first; it can save you a lot of time.

After downloading, simply follow the remaining steps of that article: install the build dependencies, compile NUX and then compile Unity. You may need to install additional libraries.[2]

The Launcher

After compiling and successfully running your own Unity build, edit some of the icons/images in ~/staging/share/unity/X, where X is the major version of your Unity branch, e.g. 4, 5, 6, 7 etc. This directory contains the images for the launcher, the panel and other components.[3] Now it's time to get creative: study this directory and edit the images with your favourite image editor (Gimp, Krita, Photoshop or whatever you prefer).

File                                      Description
launcher_bfb.png                          Launcher icon
launcher_icon_back_54.png                 Background of an icon (launcher)
launcher_icon_edge_54.png                 Edge of an icon (launcher)
launcher_icon_glow_62.png                 Glow of an icon (launcher), e.g. when minimizing a program or when a program asks for user interaction
launcher_icon_shadow_62.png               Shadow of an icon (launcher)
launcher_icon_shine_54.png                Shine of an icon (launcher)
launcher_icon{_selected}_back_150.png     Background of an icon (switcher)
launcher_icon{_selected}_edge_150.png     Edge of an icon (switcher)
launcher_icon_glow_200.png                Glow of an icon (switcher), used when active
launcher_icon_shadow_200.png              Shadow of an icon (switcher)
launcher_icon_shine_150.png               Shine of an icon (switcher)

As an alternative, download the launcher icon pack and extract the files to this directory. Use these images to get you started.

Now we have to edit the source code to improve the padding and margin of icons – it’s just a few lines. Locate the file Launcher.cpp (it’s in ~/code/unity/{YOUR_VERSION}/plugins/unityshell/src/).

1. Change the value of the constant ICON_PADDING to 0 (somewhere around line 96). This makes your icons stick directly to the launcher edges.

const int ICON_PADDING = 0;

2. Locate the method Launcher::SetIconSize() and add the line tile_size += 25; at the very beginning of this method, before any other lines. I did this because there should be padding between the tile and the icon itself. Unfortunately, the author of the source code has not clearly distinguished between padding and margin.

void Launcher::SetIconSize(int tile_size, int icon_size) {
  tile_size += 25;  // added: extra padding between the tile and the icon
  _icon_size = tile_size;
  _icon_image_size = icon_size;
  _icon_image_size_delta = tile_size - icon_size;
  _icon_glow_size = icon_size + 14;

  icon_renderer->SetTargetSize(_icon_size, _icon_image_size, _space_between_icons);

  Resize();
}

The Panel

When working with programs, the panel should display the most important tasks instead of the title. By default, the menu bar is only revealed when the cursor moves over the panel. This behaviour should be changed so that the menu bar is always displayed while a program is active, but not when using the switcher or the dashboard.

To change this behaviour, study the header file "PanelMenuView.h" and look for the private boolean state variable _is_inside, which is true whenever the mouse is over the panel. There are two options: keep this variable always true, or ignore it when rendering the panel. Since the first approach is somewhat counterproductive (more work, worse logic), I would simply ignore _is_inside in the method bool PanelMenuView::DrawMenus() const (somewhere around line 336). That's it, for now at least. Rebuild Unity and run the new build (using unity-env and unity --replace &). With _is_inside removed from the condition, the method looks like this:

bool PanelMenuView::DrawMenus() const
{
  auto wm = WindowManager::Default();
  bool screen_grabbed = (wm->IsExpoActive() || wm->IsScaleActive());

  if (_we_control_active && !_overlay_showing && !screen_grabbed &&
      !_switcher_showing && !_launcher_keynav)
  {
    if (_last_active_view || _show_now_activated || _new_application)
    {
      return true;
    }
  }

  return false;
}

Making Changes Permanent

To use the above changes every time you log in to your Ubuntu account, edit or create the file ~/.bash_profile and add the following lines, so that your custom Unity environment is used directly when logging in to Ubuntu:

# ~/.bash_profile
# The personal initialization file, executed for login shells

# Unity-dev: use local development unity files and settings
export PATH=~/staging/bin:$PATH
export XDG_DATA_DIRS=~/.config/compiz-1/gsettings/schemas:~/staging/share:/usr/share:/usr/local/share
export LD_LIBRARY_PATH=~/staging/lib:${LD_LIBRARY_PATH}
export LD_RUN_PATH=~/staging/lib:${LD_RUN_PATH}
export PKG_CONFIG_PATH=~/staging/lib/pkgconfig:${PKG_CONFIG_PATH}
export PYTHONPATH=~/staging/lib/python2.7/site-packages:$PYTHONPATH

Now Get Creative

It's time to customize Unity the way you like it and need it for your daily work, and to improve your productivity. Edit the image files, or experiment with the source code. If you want to go further, use the latest source code. Be aware, though, that you will have to compile a lot of dependent libraries as well, and that the development version may contain many unknown or severe bugs.

Further recommended improvements:
1. Change the launcher icon size to 33 (either via “All Settings” > “Appearance”, or via ubuntu-tweak)
2. Set the panel opacity to 0.3 with ubuntu-tweak

The improvements I presented here are far from perfect; they show what you can do with Ubuntu Unity and what would make a better user interface. You can download the launcher icon pack and the two modified source files containing these changes.

Below is a gallery of possible changes which would greatly improve Unity. Maybe we will see some of these changes in future releases.








  1. Only download the latest version if you definitely plan to contribute to Unity. Be prepared to manually compile a lot of additional dependent libraries and to deal with possibly unstable code.
  2. To find out which additional libraries are needed, check the output of the command "make"; it reports missing dependencies.
  3. The system-wide installed Unity uses a similar organisation: /usr/share/unity/X. However, never edit these files manually, as this is asking for trouble!

Surface Remeshing – A Topological Characterisation

Remeshing is a key technique to improve the quality of a mesh. It is important for industrial applications such as numerical simulation and geometric modeling (e.g. shape editing, animation, morphing).

In surface remeshing, the goal is either to reduce the complexity (mesh simplification) or to improve the mesh quality (so that the mesh can be used as input for downstream applications). The input mesh is assumed to be a manifold (triangle) mesh; otherwise, mesh repair algorithms have to be applied first.

The term mesh quality refers to non-topological properties, e.g. sampling density, regularity, size, orientation, alignment, mesh element shape.

Local Structure

Element Type

The most common element types are triangles and quadrangles. Triangle meshes are easier to produce, and modern GPUs are optimised to process a large volume of triangles.

To convert a triangle mesh into a quadrangle mesh, one can either use barycentric subdivision or a one-to-three split while discarding the original mesh edges.
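The barycentric variant can be sketched as follows. This is a minimal sketch that assumes an indexed mesh, given as a list of vertex positions and a list of index triples; the function `triangles_to_quads` is hypothetical. Each triangle gets a new vertex at its centroid and one at each edge midpoint. Every original corner is then connected to its two adjacent midpoints and the centroid, which gives three quads per triangle. Midpoints are shared between neighbouring triangles so that the resulting quad mesh stays connected.

```python
import numpy as np

def triangles_to_quads(vertices, triangles):
    """Split every triangle into three quads using its centroid and edge midpoints."""
    verts = [np.asarray(v, dtype=float) for v in vertices]
    midpoint_index = {}  # edge (i, j) with i < j -> index of its midpoint vertex

    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:           # shared by both adjacent triangles
            midpoint_index[key] = len(verts)
            verts.append((verts[a] + verts[b]) / 2.0)
        return midpoint_index[key]

    quads = []
    for a, b, c in triangles:
        centroid = len(verts)
        verts.append((verts[a] + verts[b] + verts[c]) / 3.0)
        m_ab, m_bc, m_ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # one quad per original corner, keeping the triangle's orientation
        quads.append((a, m_ab, centroid, m_ca))
        quads.append((b, m_bc, centroid, m_ab))
        quads.append((c, m_ca, centroid, m_bc))
    return np.array(verts), quads
```

The one-to-three split works differently: it inserts only the centroid of each triangle. Removing an original interior edge then merges the two new triangles on either side of it into one quad.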

Principles of Barycentric Subdivision

One-to-three split and removal of the original triangle edges

Element Shape

The shape of elements can be either isotropic or anisotropic. An element is said to be isotropic if it is locally uniform in all directions; e.g. a triangle/quadrangle is isotropic if it is close to equilateral/square. For triangles, the ratio of the circumcircle radius to the length of the shortest edge is a good measure of their shape.
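The following sketch computes this measure for one triangle. It assumes NumPy and 3D vertex positions, and `radius_edge_ratio` is a hypothetical name. It uses the circumradius formula R = abc / (4A), where a, b, c are the edge lengths and A is the triangle area. The ratio reaches its minimum of 1/√3 ≈ 0.577 for an equilateral triangle and grows without bound for needle- or cap-shaped triangles.

```python
import numpy as np

def radius_edge_ratio(p0, p1, p2):
    """Circumradius divided by shortest edge length (points are 3D coordinates)."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    a = np.linalg.norm(p1 - p2)
    b = np.linalg.norm(p2 - p0)
    c = np.linalg.norm(p0 - p1)
    # |(p1 - p0) x (p2 - p0)| equals twice the triangle area
    twice_area = np.linalg.norm(np.cross(p1 - p0, p2 - p0))
    if twice_area == 0.0:
        return float("inf")          # degenerate triangle: no circumcircle
    circumradius = a * b * c / (2.0 * twice_area)   # abc / (4 * area)
    return circumradius / min(a, b, c)
```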

One advantage of isotropic elements: their locally uniform shape often leads to better conditioning in numerical applications, e.g. FEM or geometry processing.

However, when a shape has to be approximated with as few elements as possible, anisotropic elements are often preferred. To achieve the same approximation quality with fewer elements, they have to be carefully aligned, e.g. along the principal curvature directions of the surface. In addition, geometric primitives like cylinders or cones, which are common in technical models, can be represented better with anisotropic elements.

Examples of triangles and quadrangles with low/high isotropy

Element Density

In a uniform distribution, the elements are evenly spread across the entire model. If the element density varies, the mesh is characterised as nonuniform or adaptive. A classic case assigns smaller elements to areas with high curvature. This can significantly reduce the number of elements required to approximate a shape, compared to uniform meshes.
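Here is one possible sizing rule that makes the curvature-adaptive idea concrete; it is an assumption, not a rule taken from the text above. The edge length is chosen so that a chord across a circle of the local curvature deviates from that circle by at most a given tolerance. The function `target_edge_length` and its parameters are hypothetical. Higher curvature yields shorter edges, and flat regions fall back to a global maximum length.

```python
import math

def target_edge_length(curvature, tolerance, max_length):
    """Hypothetical sizing rule: longest edge whose chord stays within `tolerance`
    of a circle with the given curvature (1 / radius), capped at `max_length`."""
    if curvature <= 0.0:
        return max_length                      # flat region: only the global cap applies
    radius = 1.0 / curvature
    if tolerance >= radius:
        return min(max_length, 2.0 * radius)   # even the diameter stays within tolerance
    # Sagitta of a chord of length L on a circle of radius r: s = r - sqrt(r^2 - (L/2)^2).
    # Solving s = tolerance for L gives L = 2 * sqrt(2 * r * tolerance - tolerance^2).
    return min(max_length, 2.0 * math.sqrt(2.0 * radius * tolerance - tolerance ** 2))
```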

Element Alignment and Orientation

Elements should align to sharp features to properly represent tangent discontinuities. For faithful shape approximation, orientation of anisotropic elements plays a crucial role. This is especially important for resampling processes.

Global Structure

A regular vertex in a triangle mesh has valence 6 (interior vertices) or 4 (boundary vertices). In a quadrangle mesh, a vertex is regular if its valence is 4 (interior vertices) or 3 (boundary vertices). All other vertices are called irregular or extraordinary.
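This classification can be checked mechanically. The sketch below assumes faces given as tuples of vertex indices (triangles or quads). An edge used by exactly one face is treated as a boundary edge, which is only meaningful for manifold meshes. The function name `classify_vertices` is hypothetical. Its defaults encode the triangle-mesh rule (6/4), and `interior_regular=4, boundary_regular=3` gives the quadrangle-mesh rule.

```python
from collections import defaultdict

def classify_vertices(faces, interior_regular=6, boundary_regular=4):
    """Return {vertex: (valence, is_boundary, is_regular)} for a manifold mesh."""
    edge_count = defaultdict(int)
    for face in faces:
        n = len(face)
        for k in range(n):
            a, b = face[k], face[(k + 1) % n]
            edge_count[(min(a, b), max(a, b))] += 1

    neighbours = defaultdict(set)
    boundary = set()
    for (a, b), count in edge_count.items():
        neighbours[a].add(b)
        neighbours[b].add(a)
        if count == 1:               # edge used by only one face lies on the boundary
            boundary.update((a, b))

    result = {}
    for v, nbrs in neighbours.items():
        valence = len(nbrs)          # number of edges incident to v
        on_boundary = v in boundary
        target = boundary_regular if on_boundary else interior_regular
        result[v] = (valence, on_boundary, valence == target)
    return result
```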

Regular vertex: interior and boundary triangles

Regular vertex: interior and boundary quadrangles

Irregular Meshes

Irregular meshes do not exhibit any kind of regularity in their connectivity.

Semiregular Meshes

The number of extraordinary vertices in a semiregular mesh is small and constant under uniform refinement. They are produced by regular subdivision of a coarse input mesh. Semiregular meshes are suitable for multiresolution analysis and modeling since they define a natural parameterisation of a model over a coarse base mesh.

Highly Regular Meshes

Most vertices are regular, but highly regular meshes need not be the result of a subdivision process. They are well suited to numerical simulations. In addition, mesh compression algorithms can take advantage of the mostly uniform valence distribution to produce a very efficient connectivity encoding.

Regular Meshes

All vertices are regular. Such a mesh can be compactly represented as a 2D array and used for efficient rendering (see geometry image). Regular meshes can only be generated for input models that are topologically (part of) a torus; all other models have to be cut into multiple topological disks.
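The sketch below illustrates both statements and assumes NumPy. In a regular quad mesh stored as a rows × cols array, the connectivity is implicit in the grid indices. Gluing both directions periodically turns the grid into a torus on which every vertex has valence 4. The helper names and the torus radii `R` and `r` are assumptions made for this example.

```python
import numpy as np

def grid_quads(rows, cols, wrap=False):
    """Quad faces of a regular grid whose vertex (i, j) has index i * cols + j.

    With wrap=True both directions are periodic, i.e. the grid is glued into a torus.
    """
    faces = []
    max_i = rows if wrap else rows - 1
    max_j = cols if wrap else cols - 1
    for i in range(max_i):
        for j in range(max_j):
            i1, j1 = (i + 1) % rows, (j + 1) % cols
            faces.append((i * cols + j, i * cols + j1, i1 * cols + j1, i1 * cols + j))
    return faces

def torus_geometry_image(rows, cols, R=2.0, r=0.5):
    """Torus vertex positions stored as a rows x cols x 3 array (a 'geometry image')."""
    u = np.linspace(0.0, 2.0 * np.pi, rows, endpoint=False)[:, None]
    v = np.linspace(0.0, 2.0 * np.pi, cols, endpoint=False)[None, :]
    x = (R + r * np.cos(v)) * np.cos(u)
    y = (R + r * np.cos(v)) * np.sin(u)
    z = r * np.sin(v) * np.ones_like(u)
    return np.stack([x, y, z], axis=-1)
```

The array index (i, j) alone determines the neighbours of each vertex, so no explicit face list has to be stored for rendering.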

Correspondences

Most remeshing algorithms iteratively relocate mesh vertices to improve the quality of the mesh. A key issue is how to compute or maintain correspondences between points on the generated mesh and their counterparts on the input mesh.

Global Parameterisation

The input model is globally parameterised onto a 2D domain. Sample points can then easily be distributed and relocated in the 2D domain and later be lifted to three dimensions. This approach is usually expensive and may suffer from parametric distortion or discontinuities (when the mesh needs to be cut into a topological disk).

Local Parameterisation

The algorithm maintains a parameterisation of a local geodesic neighbourhood. As soon as a sample leaves this neighbourhood, a new neighbourhood has to be computed. It is stable and produces high-quality results, but it requires expensive bookkeeping to track, cache and re-parameterise the local neighbourhoods.

Projection

The sample point is projected onto the nearest element (vertex, edge or triangle) of the input model. Naive direct projection may produce local and global fold-overs if the points are too far away from the surface. A remedy is to constrain the movement of the sample points to their tangent planes.
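Here is a minimal sketch of both steps, assuming NumPy. `closest_point_on_triangle` follows the well-known Voronoi-region test from Ericson's Real-Time Collision Detection. It is swapped in here as one standard way to find the nearest point on a triangle, not necessarily what a given remeshing method uses. `move_in_tangent_plane` removes the normal component of a vertex displacement before it is applied. Both function names are hypothetical.

```python
import numpy as np

def closest_point_on_triangle(p, a, b, c):
    """Closest point to p on triangle abc (vertex, edge or interior region)."""
    p, a, b, c = (np.asarray(x, dtype=float) for x in (p, a, b, c))
    ab, ac, ap = b - a, c - a, p - a
    d1, d2 = ab @ ap, ac @ ap
    if d1 <= 0 and d2 <= 0:
        return a                                        # vertex region A
    bp = p - b
    d3, d4 = ab @ bp, ac @ bp
    if d3 >= 0 and d4 <= d3:
        return b                                        # vertex region B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return a + (d1 / (d1 - d3)) * ab                # edge region AB
    cp = p - c
    d5, d6 = ab @ cp, ac @ cp
    if d6 >= 0 and d5 <= d6:
        return c                                        # vertex region C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return a + (d2 / (d2 - d6)) * ac                # edge region AC
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        return b + ((d4 - d3) / ((d4 - d3) + (d5 - d6))) * (c - b)  # edge region BC
    denom = 1.0 / (va + vb + vc)
    return a + (vb * denom) * ab + (vc * denom) * ac    # interior: barycentric combination

def move_in_tangent_plane(point, displacement, normal):
    """Apply only the tangential part of a displacement (normal component removed)."""
    point, displacement, normal = (np.asarray(x, dtype=float) for x in (point, displacement, normal))
    n = normal / np.linalg.norm(normal)
    return point + displacement - (displacement @ n) * n
```

In a remeshing loop, one would typically move a vertex with `move_in_tangent_plane` (e.g. during smoothing) and then snap it back with `closest_point_on_triangle` against nearby input triangles. Finding those candidate triangles (e.g. with a spatial grid) is left out here.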

Histogram Equalization

In automated image processing, we often want to compare two images with respect to certain features, e.g. lines, contours, color matching or illumination distributions. Such algorithms require input images with specific characteristics, e.g. clearly visible contours. However, images from scanning devices (e.g. satellites) may lack contrast. Although histogram stretching could be used in this case, dark areas should be brightened more than light ones to enhance and preserve existing (visual) structures. Here, histogram equalization (HE) can be used to enhance details for subsequent processing steps. In addition, some digitizers or display systems introduce non-linear effects, which can also be corrected by histogram equalization.[1]

This post should give you the basic idea behind histogram equalization. For detailed derivations, see the further reading section below.

Global histogram equalization

Workflow of histogram equalization


In general, this technique works well if the distribution of pixel values is similar throughout the image; otherwise light areas become too bright and details are lost. Dark areas are only brightened if the overall image is underexposed. Similarly, an overexposed image is generally darkened.

In practice, the first step is to compute the global histogram of the input image: create a number of buckets, divide the range of input pixel values into that many disjoint intervals, and count the number of pixels belonging to each bucket.[2] For optimized implementations in C/C++, the input image pixels should first be converted to integer values, which gives a bucket width of 1.

histogram(image,buckets) ≡     
    h = int[buckets]
    foreach pixel in image
     h[ bucketof(pixel.value) ]++
    return h

Once the histogram has been obtained, an iteration cumulatively sums all elements of this array, so that the last element contains the number of pixels of the input image. Since a normalized cumulative distribution lookup table is required for the next steps, all elements of this array have to be divided by the value of the last element, i.e. the number of pixels of the image. Because the number of pixels is known in advance, summing and dividing can be done in a single iteration.

The cumulative distribution lookup table simply maps input pixel values to almost equalized pixel values in the [0,1] range.[3]

cumulativeDistr(histogram) ≡
    d = float[histogram.size]
    s = 0
    for i=0 to histogram.size-1
     s += histogram[i]
     d[i] = s
    return d / d[d.size-1]

This method is well suited for real-time implementations, e.g. using shaders, CUDA or simply the CPU. However, contrast might become very high and noise may become quite noticeable.
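The global variant is cheap: the histogram and lookup-table steps above collapse into a few vectorized NumPy calls. This is a sketch that assumes an integer image with values in [0, bins-1]. `equalize_histogram` is a hypothetical name and is separate from the loop-based Python implementation further below.

```python
import numpy as np

def equalize_histogram(gray, bins=256):
    """Global histogram equalization; gray holds integers in [0, bins-1]."""
    gray = np.asarray(gray, dtype=np.intp)
    hist = np.bincount(gray.ravel(), minlength=bins)   # one bucket per gray value
    cdf = np.cumsum(hist, dtype=np.float64)
    cdf /= cdf[-1]                                      # normalize: last entry becomes 1.0
    return cdf[gray]                                    # apply the lookup table to every pixel
```

For example, an image in which half the pixels are 0 and half are 255 maps to 0.5 and 1.0 respectively.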

Adaptive histogram equalization

Workflow of AHE-CLAHE

The problem of global histogram equalization is that light areas may get too bright, while small dark areas are not sufficiently brightened. To overcome this limitation, a local histogram and cumulative distribution lookup table can be computed for each pixel within a given radius, e.g. 8 pixels. The center pixel value is then converted using its local lookup table.

Clearly, this method is computationally expensive. Another drawback is extreme contrast in areas with (almost) no contrast or with high image noise. The local histogram then has one huge peak, which results in a sudden jump of the cumulative distribution function. If the center pixel value lies slightly left of this jump, the pixel becomes very dark; otherwise it becomes very bright.

This method improves the local contrast of an image and brings out more detail. However, it tends to overamplify noise in relatively homogeneous regions of an image. Contrast at smaller scales is enhanced, while contrast at larger scales is reduced. It can also brighten shadows without making light areas too bright. The quality of the result depends on the radius and the level of noise (a lot of noise combined with a too small radius can result in extreme contrast).

adaptiveHistEqual(image, radius) ≡
    outimage = new Image(image.width, image.height)
    foreach pixel in image
     h = localHistogram(image, pixel.position, radius)
     distr = cumulativeDistr(h)
     outimage[pixel.position] = distr[pixel.value]
    return outimage

Contrast limited adaptive histogram equalization

Principle of contrast limiting cumulative distribution lookup table

Since AHE may overamplify noise and contrast, a better approach limits the slope of the cumulative distribution, which corresponds to the actual contrast in the final image. One approach works directly on the distribution lookup table by limiting the maximum difference between subsequent table (array) values. If a value exceeds the preceding value by more than the maximum difference, this value and all following values are reduced by the excess, which is accumulated in a "carry". This method simply clamps intensities, thus possibly changing contrast. As a consequence, the resulting lookup table is no longer normalized. Normalizing it, however, changes contrast again, and additional passes are usually required because the maximum slope may be exceeded again.

Another, similar approach works on the histogram instead, by simply cutting off all counts above a given threshold. The cut-off counts can either be discarded (clamped) or redistributed across the rest of the histogram. Note that the latter approach may require several passes to ensure a given slope limit. With a uniform redistribution, pixels corresponding to local minima in the histogram may become lighter.

Principle of CLAHE by using its histogram

Furthermore, gray or color values that are absent from the original image may suddenly become visible in the final image, possibly resulting in strange artifacts. In contrast, a non-uniform redistribution increases pixel value intensities near the cut-off areas while preserving relative illumination, i.e. pixel values absent from the original image still have zero intensity in the final image.

Uniform distribution of clamped pixel values when using CLAHE

Non-uniform distribution of clamped pixel values when using CLAHE
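The histogram-based variant with uniform redistribution could be sketched like this, assuming NumPy; the function names and the fixed number of passes are assumptions. The clipped histogram is then turned into a lookup table exactly as before, by cumulative summing and normalizing.

```python
import numpy as np

def clip_histogram(hist, clip_limit, max_passes=10):
    """Cut all counts above clip_limit and redistribute the excess uniformly.

    Redistribution can push bins above the limit again, so a few passes are made.
    The limit can only be reached if clip_limit >= total count / number of bins.
    """
    h = np.asarray(hist, dtype=np.float64).copy()
    for _ in range(max_passes):
        excess = np.maximum(h - clip_limit, 0.0).sum()
        if excess <= 1e-12:
            break
        h = np.minimum(h, clip_limit)      # clamp every bin at the limit
        h += excess / h.size               # uniform redistribution keeps the total count
    return h

def clipped_lookup_table(hist, clip_limit):
    """Normalized cumulative distribution of the clipped histogram (values in [0, 1])."""
    cdf = np.cumsum(clip_histogram(hist, clip_limit))
    return cdf / cdf[-1]
```

Because no bin exceeds the limit (up to the remaining excess of the last pass), each step of the lookup table is at most about clip_limit divided by the total pixel count. That step size is exactly the slope that limits contrast.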

In general, contrast limited adaptive histogram equalization (CLAHE) slightly blurs the image, but produces much better results than plain AHE, which does not limit the slope.

Simple pseudocode for the first approach (clamping the lookup table):

cumulativeDistrClamped(histogram, max_slope) ≡
    d = cumulativeDistr(histogram)
    dh = 0
    for i=1 to d.size-1
     dh += max(d[i] - dh - max_slope - d[i-1], 0)
     d[i] -= dh
    return d

A possible implementation in Python, although not as efficient as it could be:

import numpy as np
import matplotlib.pyplot as plt

def imread_int(filename):
    """Read an image as an integer grayscale array with values in [0, 255]."""
    print("processing '" + filename + "' ...")
    img = plt.imread(filename)
    # JPEG files are read as uint8 in [0, 255], PNG files as floats in [0, 1]
    scale = 1 if img.dtype == np.uint8 else 255
    if img.ndim == 3:
        img = img[..., :3].mean(axis=2)  # assumption: plain average of R, G, B as gray value
    return np.array(img * scale, dtype=int)

# grayscale values must be in range [0,255]
def histogram(grayscale, bins=256):
    h = np.zeros(bins)
    for i in range(grayscale.shape[0]):
        for j in range(grayscale.shape[1]):
            h[grayscale[i][j]] += 1
    return h

def localHistogram(grayscale, pivot, radius, bins=256):
    # histogram of the (2*radius+1) x (2*radius+1) window around pivot, clipped at the border
    h = np.zeros(bins)
    for i in range(max(pivot[0]-radius, 0), min(pivot[0]+radius+1, grayscale.shape[0])):
        for j in range(max(pivot[1]-radius, 0), min(pivot[1]+radius+1, grayscale.shape[1])):
            h[grayscale[i][j]] += 1
    return h

def cumulativeDistr(histogram, slope_max=0):
    d = np.zeros(histogram.size)
    c = 0
    for i in range(histogram.size):
        c += histogram[i]
        d[i] = c
    d /= c  # normalize: the last entry becomes 1
    if slope_max > 0:
        # limit the step between neighbouring entries; the excess is carried in dh
        dh = 0
        for i in range(d.size-1):
            dh += max(d[i+1]-dh-slope_max-d[i], 0)
            d[i+1] -= dh
        #d += 1-d[d.size-1]  # optional: shift so that the last entry is 1 again (disabled)
    return d


def equalizeHistogram(grayscale):
    h = histogram(grayscale)
    d = cumulativeDistr(h)
    img = np.zeros(grayscale.shape)
    for i in range(grayscale.shape[0]):
        for j in range(grayscale.shape[1]):
            img[i][j] = d[grayscale[i][j]]
    return img

def adaptiveHistogramEqualize(grayscale, radius=8):
    img = np.zeros(grayscale.shape)
    # brute force method: one local histogram per pixel
    for i in range(grayscale.shape[0]):
        for j in range(grayscale.shape[1]):
            d = cumulativeDistr(localHistogram(grayscale, (i, j), radius))
            img[i][j] = d[grayscale[i][j]]
        if i % 10 == 0:
            print(str(i*100 // grayscale.shape[0]) + "%")
    return img

def contrastLimitedAdaptiveHistEqual(grayscale, slope_max, radius=8):
    img = np.zeros(grayscale.shape)
    # brute force method: one slope-limited local lookup table per pixel
    for i in range(grayscale.shape[0]):
        for j in range(grayscale.shape[1]):
            d = cumulativeDistr(localHistogram(grayscale, (i, j), radius), slope_max)
            img[i][j] = d[grayscale[i][j]]
        if i % 10 == 0:
            print(str(i*100 // grayscale.shape[0]) + "%")
    return img

  1. Because HE uses a non-linear (though monotonic) transfer function to map input to output pixels.
  2. In other words, calculate the cardinality of each bucket.
  3. A discrete cumulative distribution function does not create an exactly equalized histogram, because the bucket width dx is greater than 0. Perfect equalization is only guaranteed if this function is continuous.

Creating Zone Plates in Python

Zone Plate generated in Python

When subsampling images with rectangular or stripe-like patterns, moiré effects can occur. To simulate such effects, zone plates can be subsampled, i.e. reduced in size. Subsampling by a factor of 2 corresponds to choosing every second pixel in each column and row of an image. Doing so with zone plates results in moiré effects.

To generate a zone plate in Python, use this code:

import numpy as np
import matplotlib.pyplot as plt


def zone_f(coords, center, k=0.001):
    # zone plate intensity: cos(k * r^2), r = distance between coords and center
    deltaX = center[0] - coords[0]
    deltaY = center[1] - coords[1]
    return np.cos(k * (deltaX * deltaX + deltaY * deltaY))

N = 512
center = np.array([35, 60])
plate_img = np.zeros((N, N))

for i in range(plate_img.shape[0]):
    for j in range(plate_img.shape[1]):
        plate_img[i][j] = zone_f(np.array([i, j]), center)

Note that this is probably not the most efficient algorithm.[1]
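A vectorized variant avoids the Python-level double loop by evaluating the same cos(k·r²) formula on the whole pixel grid at once with NumPy. This is a sketch; `zone_plate` is a hypothetical name, and with the parameters above it should reproduce `plate_img`.

```python
import numpy as np

def zone_plate(n, center, k=0.001):
    """Zone plate cos(k * r^2) on an n x n grid, r measured from center (row, column)."""
    i, j = np.indices((n, n))                  # row and column index of every pixel
    r2 = (i - center[0]) ** 2 + (j - center[1]) ** 2
    return np.cos(k * r2)
```

For example, `plate_img = zone_plate(512, (35, 60))`.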

For the subsampling method, a brute force way would look like this:

def subsample(grayscale, factor):
    # keep every factor-th pixel in each row and column
    img = np.zeros((grayscale.shape[0] // factor, grayscale.shape[1] // factor))
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            img[i][j] = grayscale[i*factor][j*factor]
    return img
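With NumPy, the same subsampling can be expressed by slicing with a step size, which is much faster than the brute-force loop. In this sketch (the name `subsample_slicing` is hypothetical), the slice is cut to a multiple of `factor` so that the result has exactly the same shape as the loop version.

```python
import numpy as np

def subsample_slicing(grayscale, factor):
    """Every factor-th pixel per row and column; same shape as the brute-force loop."""
    h, w = grayscale.shape[0] // factor, grayscale.shape[1] // factor
    return grayscale[: h * factor : factor, : w * factor : factor]
```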

Using this and the above definitions with matplotlib, you get the results below for subsampling factors of 2/4/8/16. I suggest viewing them at 1:1 scale; otherwise you will see additional moiré effects depending on the actual minification.

Subsampling zone plates


  1. We used this code in a course on computer graphics 2 (image processing), and I thought it might be helpful for someone else.

Colorize Grayscale Images

When photographing cities, urban areas, mountains, countries or other regions from satellites, when observing distant galaxies, or when inspecting commercial products with laser scanners, the resulting grayscale images usually have to be colorized to make their content easier to understand. Atlas maps are a popular example: mountain peaks are gray or white, plains are usually green, and deep oceans appear dark blue. When NASA takes photographs of distant galaxies or planets, it colorizes these images for television and newspapers so that people can recognize the key content more easily. Another example: images from airport suitcase scanners are usually enhanced with colors. Note, however, that content recognition algorithms are used as well, along with UV and/or IR scanners.

Colormap to Colorize Grayscale Images

Colorizing grayscale images tends to be rather straightforward: define a projection function which takes a grayscale value as argument and returns a corresponding color. This function can either be based on numeric calculations, or it can use a pre-defined color map where the start corresponds to black (0.0) and the end to white (1.0)[1].

Using a gradient map with RGB channels, one can easily “extract” such a function. Suppose grayscale as well as RGB values lie in the range [0.0, 1.0], and the first/second/third channel define red/green/blue colors. Then a possible projection function is

ColorizeRedComponent(graycolor, colormap) ≡
    return colormap[(int)(graycolor*(colormap.size-1))][0] // [0] selects the red component of the colormap entry

Or, more generally, for colormaps with an arbitrary number of channels:

ColorizeComponent(graycolor, colormap, channel) ≡
    return colormap[(int)(graycolor*(colormap.size-1))][channel] // channel selects the component, e.g. 0 for red
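A vectorized version of this lookup could look like this, assuming NumPy and a colormap stored as an (n, channels) array. The helper `gradient_colormap` builds such a table by linear interpolation between a few color stops. Both the helper and the example stops are hypothetical; they are not the colormap used for the images here. Reversing a colormap, as in the Mars example, is simply `colormap[::-1]`.

```python
import numpy as np

def colorize(gray, colormap):
    """Map gray values in [0.0, 1.0] to colormap rows (row 0 = black, last row = white).

    Returns an array of shape gray.shape + (channels,).
    """
    colormap = np.asarray(colormap, dtype=float)
    n = colormap.shape[0]
    idx = np.clip((np.asarray(gray) * (n - 1)).astype(int), 0, n - 1)
    return colormap[idx]

def gradient_colormap(stops, n=256):
    """Hypothetical helper: n-entry colormap interpolated linearly between evenly spaced stops."""
    stops = np.asarray(stops, dtype=float)
    positions = np.linspace(0.0, 1.0, len(stops))
    t = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t, positions, stops[:, c]) for c in range(stops.shape[1])], axis=1)
```

For example, `colorize(gray_image, gradient_colormap([[0.0, 0.0, 0.5], [0.0, 0.6, 0.0], [1.0, 1.0, 1.0]]))` maps dark values to blue, mid-tones to green and bright values to white.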

Using the colormap from above for the following grayscale image

Grayscale Image to be colorized

colorization yields this image, in which mountains, peaks and valleys are now clearly visible:[2]

Colorized grayscale using the colormap above

Another nice example: when we reverse this colormap and apply it to a photograph of the north pole of Mars, we get this RGB image:

Mars north pole, photographed by NASA satellites

Colorized Mars north pole using the reverse colormap above

Conclusion: Although finding the right colormap can be a bit tricky sometimes, colorizing grayscale images is a simple and efficient task used in all serious image editors. It should be part of every graphics programmer's standard vocabulary: a little effort for a lot of outcome.


  1. Color values in images with one channel (e.g. grayscale) or multiple channels (e.g. RGB) are usually encoded within the interval [0.0,1.0], where 0.0/1.0 corresponds to black/white, or 0% red/100% red.
  2. This colorization was done using Python, numpy and matplotlib. Maybe not the most efficient implementation, but it is good enough for testing purposes and prototypes.

Introduction to Image Thresholding

In image processing, thresholding is used to split an image into smaller segments, or regions, using at least one color or grayscale value to define their boundary. A possible threshold might be 40% gray in a grayscale image: all pixels darker than 40% gray belong to one segment, and all others to the second segment. Thresholding is often the initial step in a sequence of image-processing operations.

One key point: these segments are not necessarily convex, since image content is arbitrary. Ideally, segments represent objects in the image, e.g. letters, cars, traffic signs or human faces in the foreground. Unfortunately, such objects vary in color, intensity, illumination, lens aberrations and noise, which makes choosing the right threshold difficult. Multiple thresholds might be needed, combined with more sophisticated algorithms.

If the gray levels of the foreground object differ clearly from those of the surrounding background, image thresholding is an effective tool for this separation, or segmentation.

So how do we determine the best possible threshold? For one, it can be chosen manually by trial and error. Automated methods are less time-consuming; they usually extract information from the histogram, e.g. by finding local maxima and minima.

Another important point is that thresholding values are usually calculated without considering spatial characteristics: only pixel values count, not their positions or neighbourhoods.

Thresholding in practice

When should I not use thresholding?

  • Strongly varying illumination across the image (some adaptive techniques may still be useful)
  • Objects are hardly distinguishable (pre-processing required, e.g. histogram equalization)
  • Objects have to be segmented by their characteristic spatial forms and colors

Global thresholding

The most straightforward approach is to use one threshold for the whole image. Although this is computationally the most efficient thresholding technique, varying illumination or colors and noise strongly influence the quality of the result. In addition, there is no guarantee of contiguous segments, and only intensity values are considered (so no relationships between pixels).

Popular thresholding functions

The thresholding function can be defined in any way that solves the task as well as possible, but there are some popular definitions:

Image Thresholding Functions


BinaryThreshold(T,M) ≡
    foreach Pixel in SourceImage
        if (Pixel > T)
            DestImage[Pixel.Position] = M
        else
            DestImage[Pixel.Position] = 0

InverseBinaryThreshold(T,M) ≡
    foreach Pixel in SourceImage
        if (Pixel > T)
            DestImage[Pixel.Position] = 0
        else
            DestImage[Pixel.Position] = M

TruncateThreshold(T) ≡
    foreach Pixel in SourceImage
        if (Pixel > T)
            DestImage[Pixel.Position] = T
        else
            DestImage[Pixel.Position] = Pixel

ToZeroThreshold(T) ≡
    foreach Pixel in SourceImage
        if (Pixel > T)
            DestImage[Pixel.Position] = Pixel
        else
            DestImage[Pixel.Position] = 0

InverseToZeroThreshold(T) ≡
    foreach Pixel in SourceImage
        if (Pixel > T)
            DestImage[Pixel.Position] = 0
        else
            DestImage[Pixel.Position] = Pixel
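With NumPy, each of these functions becomes a single `np.where` call on the whole image. Here is a minimal sketch, assuming the image is a NumPy array; the function names are hypothetical. The truncation variant clamps at the threshold itself, which matches OpenCV's `THRESH_TRUNC`.

```python
import numpy as np

# img: NumPy array of pixel values, t: threshold, m: value assigned to "on" pixels

def binary_threshold(img, t, m):
    return np.where(img > t, m, 0)

def inverse_binary_threshold(img, t, m):
    return np.where(img > t, 0, m)

def truncate_threshold(img, t):
    return np.where(img > t, t, img)      # values above t are clamped to t

def to_zero_threshold(img, t):
    return np.where(img > t, img, 0)

def inverse_to_zero_threshold(img, t):
    return np.where(img > t, 0, img)
```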

Adaptive, dynamic thresholding

Recall that varying illumination and noise are problematic for global thresholding. To improve the result, the threshold can vary across the image to counteract changing conditions.

Note that background levels are often not constant, and that object contrast may vary greatly within the image, which makes the following methods harder to apply.
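One simple form of adaptive thresholding compares each pixel against the mean of its local neighbourhood minus a small offset. It is swapped in here as an example and is not one of the methods below. The sketch assumes NumPy and computes the local mean with an integral image (summed-area table). `local_mean`, `adaptive_threshold`, `radius` and `offset` are hypothetical names.

```python
import numpy as np

def local_mean(gray, radius):
    """Mean over the (2*radius+1)^2 window around each pixel, clipped at the borders."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    integral = np.zeros((h + 1, w + 1))          # integral[i, j] = sum of g[:i, :j]
    integral[1:, 1:] = g.cumsum(axis=0).cumsum(axis=1)
    rows, cols = np.arange(h), np.arange(w)
    r0 = np.clip(rows - radius, 0, h)[:, None]
    r1 = np.clip(rows + radius + 1, 0, h)[:, None]
    c0 = np.clip(cols - radius, 0, w)[None, :]
    c1 = np.clip(cols + radius + 1, 0, w)[None, :]
    window_sum = integral[r1, c1] - integral[r0, c1] - integral[r1, c0] + integral[r0, c0]
    return window_sum / ((r1 - r0) * (c1 - c0))

def adaptive_threshold(gray, radius=7, offset=0.0, max_value=255):
    """Binary threshold with a per-pixel threshold T(x, y) = local mean - offset."""
    g = np.asarray(gray, dtype=np.float64)
    return np.where(g > local_mean(g, radius) - offset, max_value, 0)
```

The integral image makes the cost independent of the radius. The offset keeps perfectly uniform areas from being classified as background.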

Clustering

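Given a segmentation with a threshold T, calculate the mean of all pixels in each segment. Choose T such that each pixel is closer to the mean of its own segment than to the mean of the other segment. For two segments, this means that T lies midway between the two means. Since the means themselves depend on T, the procedure is repeated until T no longer changes.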

This works well if the spreads (variances) of the two distributions are approximately equal; it copes poorly with distributions of very different variances.
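The iteration described above can be sketched as follows, assuming NumPy; `iterative_threshold`, `tol` and `max_iter` are hypothetical names. It starts with the global mean, splits the pixels, sets T to the midpoint of the two segment means and repeats until T settles.

```python
import numpy as np

def iterative_threshold(gray, tol=0.5, max_iter=100):
    """Two-class clustering threshold: T = midpoint of the two segment means."""
    g = np.asarray(gray, dtype=np.float64).ravel()
    t = g.mean()                                   # initial guess: global mean
    for _ in range(max_iter):
        low, high = g[g <= t], g[g > t]
        if low.size == 0 or high.size == 0:
            break                                  # one segment is empty: stop
        new_t = 0.5 * (low.mean() + high.mean())
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return t
```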

Otsu Method

Try to make each cluster as tight as possible (hopefully minimizing their overlap). Moving the threshold in one direction increases the spread of one cluster and decreases the spread of the other. The goal is then to select the threshold that minimizes the combined spread. The optimal threshold is the one that maximizes the between-class variance or, equivalently, minimizes the within-class variance.

Like clustering, it separates the two classes at a threshold, but it chooses that threshold by optimizing a statistical measure.
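A compact way to implement the Otsu criterion is to sweep all intensity levels over the histogram and keep the level with the largest between-class variance. The sketch below assumes integer intensities in [0, levels); it drops the constant factor 1/total² because that does not change the maximizing threshold.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing the between-class variance; class 0 is p <= t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0    # pixel count of class 0 (values <= t)
    sum0 = 0  # intensity sum of class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t
```

The running sums make the sweep linear in the number of levels, so the cost is dominated by building the histogram.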

Mixture Modeling

This technique chooses the threshold that minimizes the classification error.

It assumes that each group (e.g. object and background) is Gaussian-distributed, so each group is described by a mean and a standard deviation. The histogram is modeled as a mixture of two such distributions whose parameters have to be found: the optimal threshold follows from the pair of Gaussians that best approximates the actual histogram. So finding the optimal threshold comes down to estimating these two distributions.
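A common way to estimate the two Gaussians is expectation maximization (EM). The sketch below fits a two-component 1D mixture with EM and then returns the first integer level at which the brighter component becomes more probable. The initialization (one component per half of the sorted data), the iteration count and the variance floor are illustrative assumptions, not part of the original method description.

```python
import math


def _gauss(x, mu, var):
    """Normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def _mean_var(xs):
    mu = sum(xs) / len(xs)
    return mu, max(sum((x - mu) ** 2 for x in xs) / len(xs), 1e-6)


def fit_two_gaussians(xs, iters=50):
    """EM fit of a two-component 1D Gaussian mixture: (weights, means, variances)."""
    xs = sorted(xs)
    n = len(xs)
    # Initialize each component from one half of the sorted data.
    (m0, v0), (m1, v1) = _mean_var(xs[: n // 2]), _mean_var(xs[n // 2:])
    w, mu, var = [0.5, 0.5], [m0, m1], [v0, v1]
    for _ in range(iters):
        # E-step: responsibility of component 0 for each sample.
        resp = []
        for x in xs:
            p0 = w[0] * _gauss(x, mu[0], var[0])
            p1 = w[1] * _gauss(x, mu[1], var[1])
            resp.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        n0 = sum(resp)
        n1 = n - n0
        if n0 <= 0 or n1 <= 0:
            break
        # M-step: re-estimate weights, means and variances.
        w = [n0 / n, n1 / n]
        mu = [sum(r * x for r, x in zip(resp, xs)) / n0,
              sum((1 - r) * x for r, x in zip(resp, xs)) / n1]
        var = [max(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0, 1e-6),
               max(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1, 1e-6)]
    return w, mu, var


def mixture_threshold(xs):
    """First integer level between the means where the brighter component dominates."""
    w, mu, var = fit_two_gaussians(xs)
    lo, hi = (0, 1) if mu[0] <= mu[1] else (1, 0)
    for t in range(math.floor(mu[lo]), math.ceil(mu[hi]) + 1):
        if w[hi] * _gauss(t, mu[hi], var[hi]) > w[lo] * _gauss(t, mu[lo], var[lo]):
            return t
    return math.ceil(mu[hi])
```

Unlike the clustering approach, the weighted densities let the boundary shift toward the narrower or less frequent class when the two groups have different variances or sizes.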

Multispectral thresholding

The idea is to first segment the image using an estimated optimal threshold in one channel. Each resulting region is then subdivided using properties of the second channel, then of the third, and so on, until the last channel has been used or each region has a distribution that indicates a coherent region.
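The recursive scheme can be sketched as follows. This is only an illustration under stated assumptions: each pixel is given as (pixel_id, channel_values), a region counts as coherent when its variance in every channel is at most max_var, and the per-channel threshold is simply the region's mean in that channel (any of the threshold estimators above could be swapped in).

```python
def _variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def multispectral_segment(pixels, channel=0, max_var=100.0):
    """Recursively split a non-empty list of pixels channel by channel.

    pixels: list of (pixel_id, (c0, c1, ...)) tuples.
    Returns a list of regions, each a list of pixel ids.
    """
    n_channels = len(pixels[0][1])
    coherent = all(
        _variance([v[c] for _, v in pixels]) <= max_var for c in range(n_channels)
    )
    if coherent or channel >= n_channels:
        return [[pid for pid, _ in pixels]]
    values = [v[channel] for _, v in pixels]
    if _variance(values) <= max_var:
        # Nothing to separate in this channel: move on to the next one.
        return multispectral_segment(pixels, channel + 1, max_var)
    t = sum(values) / len(values)  # hypothetical threshold: the channel mean
    low = [px for px in pixels if px[1][channel] <= t]
    high = [px for px in pixels if px[1][channel] > t]
    regions = []
    for part in (low, high):
        if part:
            regions.extend(multispectral_segment(part, channel + 1, max_var))
    return regions
```

For example, pixels 0 and 1 that agree in the first channel but differ in the second end up in separate regions, while pixels that agree in both channels stay together.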

Comparison of WebGL Framework APIs – Part 5

This article is part 5 of “Comparison of WebGL Framework APIs”.

TDL

Quite a low-level API for WebGL applications, written with (high) performance in mind. The official examples are really promising and ahead of those of other comparable APIs. Documentation and tutorials would help this API to evolve, but only a short getting-started document has been published. Given that this API is rather low-level, good documentation is even more important, if not crucial, for the API to survive. TDL Homepage.

The naming conventions look like pure C++ code from about ten years ago; such dated conventions should be avoided. Classes are organized in modules, as is done in most WebGL APIs.

A math module supports working with matrices (transpose, inverse, trace etc.) and vectors, but not with quaternions: these are in a separate module, which is quite bad engineering practice.

There is basic support for models, but neither conversion between representations nor even storage parameters can be found (as in most other APIs). A low-level API should offer these options. Textures, shaders and buffers are implemented too.

More sophisticated tools like spline curves, subdivision surfaces, octrees and light classes are not available, which makes this API rather unsuitable for everyday use. A framework that claims high rendering performance on the Web in particular should implement octrees, and a low-level API should include spline curves.

A particle system is integrated, though.

For such a powerful framework, advanced concepts like IK/FK solvers, surfaces, physics and animation should really be implemented.

Conclusion: A fast API that lacks many useful tools for larger and everyday projects. The naming conventions should be rethought. There are some signs of bad software engineering practice (a math module with matrices and vectors, and a quaternion module on the same level). Neither Euler angles nor conversions between different representations are implemented, although a quaternion can be converted to a rotation matrix.
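For reference, converting a quaternion to a rotation matrix takes only a few lines. The sketch below uses the standard formula for a unit quaternion (w, x, y, z), normalizing first, and returns a row-major 3×3 matrix for column vectors; the function name and conventions are illustrative, not TDL's API.

```python
import math


def quat_to_matrix(w, x, y, z):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix (row-major, column vectors)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # guard against drift from unit length
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
```

A 90° rotation about z, i.e. (cos 45°, 0, 0, sin 45°), maps to [[0, -1, 0], [1, 0, 0], [0, 0, 1]] up to rounding.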

Three.js

This API provides quite promising live examples on GitHub. Instead of complete online documentation, only a simple manual is published. The root namespace “THREE” looks more like a constant than a module and should be renamed. Three.js Homepage.

Perspective and orthographic cameras are ready to be used.

The API's design makes application source code clear and readable. A scene can be set up very easily with a few lines of code.

A canvas can only have one renderer (which makes sense as long as only one renderer is needed). For simple to normal projects, this is entirely sufficient. The renderer takes a scene and a camera as arguments, which is a good design decision (it separates rendering, scene data and camera). Supported renderers are canvas, DOM, SVG and WebGL. Point of criticism: the “WebGLRenderer” class consists of 6000 lines, whereas most other classes have just a few hundred lines, so some of its logic should be moved into existing or new classes. The same applies to “WebGLShaders”.

A scene can have multiple lights, meshes and cameras. However, additional external logic is required to choose the camera from the scene through which to render. More complex scenes would require automated camera selection for given constraints or cases, e.g. by using callback functions within certain types of schedulers.

For better performance, this API supports array buffers (static and dynamic), besides classic triangles and lines.

Materials support textures, fog, skinning, shadow maps, alpha tests and even morphing. Additional (specialized) materials are Lambert, Phong, normal, depth and face material; the face material should be renamed. There is also a shader material, as well as one for particles.

Lighting: ambient, directional, point and spot lights.

The framework includes math classes (found within the core module), but no advanced features like curves, keyframes or solvers. Quaternions (conversion to/from and slerp) and splines (curves) are only minimally implemented. The “Core” module mostly contains math-related classes, so some engineering rework is required.
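Slerp (spherical linear interpolation) is the core operation for animating orientations. The sketch below is the textbook formulation on (w, x, y, z) tuples, not Three.js code: it flips one input to take the shorter arc and falls back to normalized linear interpolation when the quaternions are nearly parallel, where dividing by sin θ would be numerically unstable. The 0.9995 cutoff is a common but arbitrary choice.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation: take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: normalized lerp avoids dividing by ~0
        r = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        n = math.sqrt(sum(c * c for c in r))
        return tuple(c / n for c in r)
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```

Halfway between the identity and a 90° rotation about z, slerp returns a 45° rotation about z, i.e. (cos 22.5°, 0, 0, sin 22.5°).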

Conclusion: Good to average support for WebGL, but without complete official documentation. Advanced features like animation, particle systems and physics are not implemented, but can be added with add-ons. Although the naming conventions are contemporary, the namespace “THREE” looks more like a constant. Some software engineering rework is required: most classes have a few hundred lines of code, whereas “WebGLRenderer” and “WebGLShaders” consist of about 6000 and 3000 lines respectively; such a huge difference is usually a sign of bad practice.

X3DOM

This is the only WebGL API that closely follows the X3D specification. Its homepage provides basic examples and average documentation; in fact, it mostly links to the specification used, without explanations specific to the library. In addition, this framework can be considered a research project, so the question remains how long active development will continue. X3DOM Homepage.

In general, standard WebGL functionality is implemented, up to a model hierarchy built with a transform hierarchy, as well as camera animation.

For animation, positions, orientations and colors can be interpolated to produce smooth transitions.
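X3D interpolator nodes describe such transitions with a list of keys (fractions in [0, 1]) and matching keyValues, and output a value for each incoming fraction. The sketch below shows the piecewise-linear scheme used for positions and colors, clamped at both ends; it illustrates the concept and is not X3DOM's implementation. Orientations would need spherical interpolation instead of component-wise blending.

```python
from bisect import bisect_right


def interpolate(keys, key_values, fraction):
    """Piecewise-linear key/keyValue interpolation, clamped at both ends."""
    if fraction <= keys[0]:
        return key_values[0]
    if fraction >= keys[-1]:
        return key_values[-1]
    i = bisect_right(keys, fraction) - 1  # last key <= fraction
    t = (fraction - keys[i]) / (keys[i + 1] - keys[i])
    a, b = key_values[i], key_values[i + 1]
    return tuple(x + t * (y - x) for x, y in zip(a, b))
```

For example, with keys [0, 0.5, 1] and positions [(0, 0, 0), (10, 0, 0), (10, 10, 0)], a fraction of 0.75 yields (10.0, 5.0, 0.0).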

Interestingly, a video texture is available too, but it did not work in the examples when using the Chrome browser.

Another good point: large meshes can be rendered by automatically splitting them into smaller chunks, so as not to exceed the maximum mesh size allowed by the current WebGL implementation.
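In WebGL 1.0, indexed drawing uses 16-bit indices unless the OES_element_index_uint extension is available, so a single draw call can address at most 65536 vertices. The sketch below shows one straightforward way to split a triangle mesh under that limit by re-indexing vertices per chunk; it is an illustration, not X3DOM's algorithm, and shared vertices are simply duplicated across chunk boundaries.

```python
def split_mesh(positions, triangles, max_vertices=65536):
    """Split a mesh into chunks that each reference at most max_vertices vertices.

    positions: list of vertex tuples; triangles: list of (i, j, k) index triples.
    Returns a list of (chunk_positions, chunk_triangles) with chunk-local indices.
    """
    assert max_vertices >= 3, "a chunk must be able to hold one triangle"
    chunks = []
    remap, chunk_pos, chunk_tris = {}, [], []
    for tri in triangles:
        new_vertices = {i for i in tri if i not in remap}
        if len(chunk_pos) + len(new_vertices) > max_vertices:
            # Current chunk is full: close it and start a new one.
            chunks.append((chunk_pos, chunk_tris))
            remap, chunk_pos, chunk_tris = {}, [], []
        local = []
        for i in tri:
            if i not in remap:
                remap[i] = len(chunk_pos)  # chunk-local index
                chunk_pos.append(positions[i])
            local.append(remap[i])
        chunk_tris.append(tuple(local))
    if chunk_tris:
        chunks.append((chunk_pos, chunk_tris))
    return chunks
```

Each chunk can then be uploaded as its own vertex and index buffer and drawn with a separate call.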

Conclusion: X3DOM is still very limited, but nonetheless a very promising project, since it follows the X3D specification for Web 3D APIs and content. It already supports environment effects, interpolations, lighting, rendering, shaders, sounds, volume rendering, textures, geospatial tools and networking. But all of these implementations are still quite limited (it is a research project), so a lot of work remains to be done before it is ready for real practical use.

Comparison of WebGL Framework APIs – Part 4

This article is part 4 of “Comparison of WebGL Framework APIs”.

PhiloGL

Its official homepage provides good examples and tutorials, and the API is in active development. Like CubicVR3D, it follows the jQuery coding and naming principle of passing mapped arrays (option objects) as method arguments to reduce application source code. PhiloGL Homepage.

A math module contains classes for 3D vectors, 4×4 matrices and quaternions. Each class contains the most commonly used methods, and conversions between them are possible too. It uses a convention where a `$` before a method name indicates that the method updates the calling object. Although this might reduce application source code, “Math.Vec3.add(v1, v2)” might be clearer than “v1.$add(v2)”; still, it is a good idea for performance, since fewer objects are created. “v1 = v1.add(v2)” would be logically equivalent and a bit clearer, but with more overhead.
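The trade-off between allocating and in-place operations can be shown with a tiny vector class. In this sketch, add returns a new object, while add_in_place plays the role that PhiloGL marks with the `$` prefix; the class and method names are hypothetical (Python identifiers cannot start with `$`).

```python
from dataclasses import dataclass


@dataclass
class Vec3:
    x: float
    y: float
    z: float

    def add(self, other: "Vec3") -> "Vec3":
        """Return a new vector; the caller is left unchanged (allocates)."""
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def add_in_place(self, other: "Vec3") -> "Vec3":
        """Update the caller and return it (no new object is created)."""
        self.x += other.x
        self.y += other.y
        self.z += other.z
        return self
```

Returning self from the in-place variant keeps chaining possible while avoiding a new allocation per call, which is what matters in tight per-frame loops.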

The “Application” class contains a canvas, an active shader program, a camera, a scene, textures and buffers. For complex scenes and applications, a more elegant solution would separate scene, cameras and renderers: lights and cameras are placed within the scene, and a renderer is given a scene with lighting, a camera through which to render, and a resource (an image, a canvas) onto which to render. But this approach is perfectly fine for small applications. Another point is the missing public main loop function in this class; if the rest is properly designed, this should not be a critical drawback. Callback functions are passed the same way as in jQuery.

As in some other APIs, the “Program” class contains compiled fragment and vertex shaders, which are applied in the OpenGL ES rendering pipeline.

The scene (stored in WebGL.Application) contains “Program” and “Camera” objects, so two logically distinct concepts (scene and renderer shaders) are combined here. Shaders should be bound either to materials or to renderers (for post effects). Recall: a Program instance is referenced by a scene instance stored in the Application object, which itself also contains a Program object. A classic case of redundant storage. A scene also contains a list of models (meshes), here represented by the O3D class. The internal storage format of mesh data is not specified in the documentation. “O3D” should be renamed to “Meshes”, “Models” or something similar.

The API also contains some utilities for event handling and IO, and “workers” that use the map-reduce paradigm. Some basic animation (or rather transitions) can be done using the Fx class.

Conclusion: With good documentation and good module names, this API allows minimal application code. It has no support for advanced features like animation and physics, and might be inconvenient for complex scenes and applications. All in all, this API is on a good path. The math module should be extended, and some engineering should be redone (scene, camera and renderer should be separated). The “$” prefix indicating that a method updates the calling object is an interesting approach, but unfortunately quite unusual and contrary to common method naming conventions.

SceneJS

This API has quite interesting examples and follows the jQuery coding and naming principle, but defining a hierarchy of object nodes this way can lead to deeply nested code, which is bad design too: errors become more likely and the code becomes less clear. SceneJS Homepage.

The implementation of this API needs better performance: according to the Chrome task manager (FPS column), the provided examples ran at roughly half the speed of most other APIs, or slower. Some examples trigger pointless debug-style alerts.

The application source code required for basic examples is already quite large compared to its actual rendering output, so large scenes require unclear, unnecessarily large code. Basic tasks like method mapping and scene setup should be moved into the API to further reduce application source code and to speed up setup. Another approach would be a precompiled scene file or an XML scene file, as in another API, but no such loader is included.

This framework supports custom shaders as well as COLLADA and OBJ importers. Note that this software has never left its beta stage, and there is no official documentation, neither on its website nor on GitHub.

Conclusion: A lot of work needs to be done here. Although it offers basic GL support, it should not be used for critical and complex scenes and projects, because only beta versions can be downloaded from its official homepage.

SpiderGL

The whole framework is a single JS file without any modules, which is possibly a reason why it is not actively developed: the last commit was around February 2011. In addition, no documentation is available. SpiderGL Homepage.

Some interesting examples have been created with this API (whether they are good examples is another question).

Conclusion: A classic case of “spaghetti” source code in one JS file without any modules. There is, however, basic support for meshes, matrices, vectors, shader programs, textures and transformation stacks. The design is rather dated and needs rework for complex scenes.

Continue reading part 5

Comparison of WebGL Framework APIs – Part 3

This article is the third part of “Comparison of WebGL Framework APIs”.

KUDA

Another API in active development. However, the name “kuda” sounds like Nvidia's proprietary “CUDA” for GPGPU. If this API manages to evolve into a popular framework, it will probably have to be renamed sooner or later, because Nvidia could legally force a rename in some jurisdictions (e.g. the European Union), since “kuda” closely resembles the trademark “CUDA” (at least it sounds the same). KUDA Homepage.

However, this API requires Node.js (nodejs.org) and is actually an editor for 3D content on the web, not a framework for creating 3D applications on the web, so it is not discussed further. I only included it here because some webpages incorrectly list this API alongside other WebGL APIs for creating 3D web applications (rather than editors).

O3D

Development is inactive (or has stopped); the last updates date from 2010. The naming conventions are quite old-fashioned: “o3djs.webgl.makeClients(initStep2);” and “o3djs.base.o3d = o3d;”. O3D Homepage.

Conclusion: It has no documentation, only a source code viewer. This API appears to support textures, render graphs, shaders, meshes, a transform graph, animation (curves, functions) and some math utilities.

OSG.JS

This API is in active development with good examples, but poor documentation. It follows the principles of OpenSceneGraph: models are stored and organized as a scene graph. osg.js Homepage.

It supports cameras, lighting, materials, shaders, textures, transformations and animations, but neither physics classes nor classes for advanced animations are included. This API uses three root namespaces (osg, osgAnimation and osgGA), which is quite bad practice: an API should only be available through one namespace. “osgAnimation.Animation” has to be restructured and renamed too.

Math classes are not organized in a math module, but live directly in the root namespace. There are classes for 2-, 3- and 4-dimensional vectors, matrices and quaternions.

Lighting: support for standard lights with the usual options (i.e. diffuse, direction, position, specular, spot cutoff). A light can write to a shader instance and can create uniform variables that are used throughout the shader.

Materials have the usual options (ambient, diffuse, emission, shininess, specular) and can write to shader instances too. There is no explanation of the contexts in which writing to shaders should be used.

In this API, meshes are called “osg.Geometry”. It is unclear what “dirty”, “dirtyBound”, “drawImplementation” and “drawImplementationDummy” are used for. Buffers are supported with the class “osg.BufferArray”.

The hierarchy of cameras, meshes and lights is built with the node class. A node can have a viewport as well as a node mask for rendering, and its color, depth and mask buffers can be cleared. The viewport should instead be stored in the renderer, not the scene (the same applies to the color/depth/mask buffers). Bounding volumes are computed with the node instance, and pre- and post-render stages can be set here too. However, this mixes scene description and rendering in one class (pretty bad).

This API seems to be in a crucial phase of active development; otherwise there seems to be no explanation for the practically empty classes “osg.Camera.Transform”, “osg.Camera.Transform.Node” and “osg.Transform.Node”.

2D textures (with magnification and minification filters, mipmaps and wrap options) and cube maps are included by default. One really good thing is that a texture can be created from a canvas, from a “local” image (relative path) or from a URL. Since the latest security updates in all major browsers supporting WebGL, reading texture images from URLs on foreign hosts has been or will be blocked due to the same-origin policy, so the additional option to load a texture from a given URL can be removed. Besides, a texture can be written to a shader too.

Shaders are loaded, compiled and linked using the shader class. A “Program” instance combines vertex and fragment shaders – a good design decision. But the API has no support for COLLADA.

Some basic animations can be done with this API, but some classes should offer more options, because they are almost empty, which is usually a sign of bad design. For more believable or complex animations, curves are needed, but they cannot be found in this API, so the animation capabilities are rather limited. The system uses keyframe animation with channels, but how channels are bound to actual scene parameters remains unclear from the documentation. It uses a stack of quaternions, rotation axes and a translation vector: very low-level and basic support, but orientations of meshes (here: nodes) can be animated this way. By the way, a good animation system uses solvers, like keyframe, FK and IK solvers as well as constraints; these should be implemented here.

Conclusion: Mostly the JS counterpart of OSG (OpenSceneGraph), with support for shaders, but with some fairly empty classes. Math classes should be organized within a math module. The documentation is really poor. Only one mesh representation (storage type) is available, which might be inefficient. There is no support for physics, and only classes for basic animations. Some methods need to be renamed. All in all, a good software design.

Continue reading Part 4

Comparison of WebGL Framework APIs – Part 2

This article is the second part of “Comparison of WebGL Framework APIs”.

Copperlicht

Although it is a commercial solution, this API is free to use. The homepage provides good documentation and interesting examples. Copperlicht Homepage.

The animation system uses “Milkshape”, “Microsoft DirectX” or “B3D” files, which is a rather limited selection. Besides support for paths and splines, collision detection uses a response system via picking and click detection.

Unfortunately, the name “CopperLicht” appears all over the documentation. The class “CL3D.CopperLicht” is rather unfortunately named; it should be called Main instead. CL3D is an abbreviation for CopperLicht 3D, so it shows very strong signs of a commercial application. This main class contains mouse event handlers and getters for the scene and renderer, and it can project points from “scene space” to world space and vice versa. Result: the author(s) mixed too many different tasks together. For example, projection methods should be moved into a separate module, and so should mouse event handlers (see the MVC pattern).

Here, a 3D vertex in the mesh buffer always contains a position, a color, a normal and two texture coordinates. This rigid architecture wastes precious memory on web and mobile devices, where dynamic arrays should be used instead (as is customary in OpenGL with VBOs). So this class could be replaced with an enumeration describing the array layout in use, for better memory locality.

The “Vect3D” class has comprehensive methods to work with (length, normalize etc.), but having a lot of vectors as individual objects is really bad for memory locality and thus decreases performance unpredictably. When working with a lot of vectors, a contiguous array should be the solution; methods like length and normalize are then defined as public helper functions of the math module. In addition, the methods “divideThisThroughVect” and “divideThroughVect” are badly designed because their names are not self-explanatory, which is crucial when working in a team of, probably international, colleagues. The same applies to “add” and “addToThis”. “getLengthSQ” should be written as “getLengthSquared”, as is common in modern math APIs. “setTo” should be named “copy”. The method “getNormalized” is redundant, as it can be split into “copy”/“clone” and “normalize”.
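The contiguous-array alternative can be sketched as one flat buffer of floats plus module-level helpers that take the buffer and a vector index. In JavaScript the buffer would typically be a Float32Array; this Python sketch uses array('d') for the same flat layout, and all names are illustrative.

```python
import math
from array import array


def make_vectors(n):
    """Flat storage for n 3D vectors: x0, y0, z0, x1, y1, z1, ..."""
    return array("d", [0.0] * (3 * n))


def length(buf, i):
    """Length of vector i, read directly from the flat buffer."""
    x, y, z = buf[3 * i], buf[3 * i + 1], buf[3 * i + 2]
    return math.sqrt(x * x + y * y + z * z)


def normalize(buf, i):
    """Normalize vector i in place; zero vectors are left unchanged."""
    n = length(buf, i)
    if n > 0.0:
        for k in range(3):
            buf[3 * i + k] /= n
```

Because all components sit next to each other, iterating over many vectors touches memory sequentially, and no per-vector objects have to be allocated or garbage-collected.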

One class, “Vect2D”, has almost no documentation and methods. Why?

The API is inconsistent in some cases: a Triangle3d object can be cloned, but a Vect3d cannot. When checking whether a point is inside a given triangle, an API should not offer separate methods for the same task, because how should the user choose between the fast and the slow method? This decision should be made by the API alone, possibly influenced by global parameters (fast vs. accurate). In any case, this test is straightforward using barycentric coordinates. Numerical stability is not taken into account, although a point may lie just slightly off the triangle's plane, and usually will.
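A single barycentric test with explicit tolerances covers both concerns. The sketch below first rejects points farther than plane_tol from the triangle's plane, then computes barycentric coordinates (u, v, w) of the point's projection and accepts it if none is below -eps. The tolerance values and names are assumptions for illustration, not CopperLicht's API.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def point_in_triangle(p, a, b, c, eps=1e-9, plane_tol=1e-6):
    """Barycentric point-in-triangle test with tolerances for points slightly off."""
    e0, e1, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    n = _cross(e0, e1)
    n_len = math.sqrt(_dot(n, n))
    if n_len < eps:
        return False  # degenerate triangle
    if abs(_dot(ap, n)) / n_len > plane_tol:
        return False  # too far from the triangle's plane
    d00, d01, d11 = _dot(e0, e0), _dot(e0, e1), _dot(e1, e1)
    d20, d21 = _dot(ap, e0), _dot(ap, e1)
    denom = d00 * d11 - d01 * d01  # equals |n|^2, nonzero here
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    u = 1.0 - v - w
    return u >= -eps and v >= -eps and w >= -eps
```

A point just 1e-12 above the plane is still accepted, while a point 0.1 above it is rejected; points exactly on an edge count as inside.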

A good point: this API supports conversion between Euler, matrix and quaternion representations, though numerical stability remains to be tested. Again, the class methods are unfortunately named.
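As an example of such a conversion, the sketch below turns Euler angles into a quaternion. It assumes the common ZYX (yaw, pitch, roll) convention, i.e. q = qz(yaw) · qy(pitch) · qx(roll), and returns (w, x, y, z); other APIs may use a different axis order, which changes the formula.

```python
import math


def euler_to_quat(roll, pitch, yaw):
    """ZYX convention: q = qz(yaw) * qy(pitch) * qx(roll), angles in radians."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        cr * cp * cy + sr * sp * sy,  # w
        sr * cp * cy - cr * sp * sy,  # x
        cr * sp * cy + sr * cp * sy,  # y
        cr * cp * sy - sr * sp * cy,  # z
    )
```

A pure 90° yaw gives (cos 45°, 0, 0, sin 45°); near a pitch of ±90° the roll and yaw axes coincide (gimbal lock), which is one of the cases where numerical stability should be tested.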

The scene supports many render modes, although the documentation is not informative enough. The background option has nothing to do with the scene and should therefore be moved into the renderer. Another unanswered question: what is the “scene type” used for? In addition, there is no support for quadtrees and octrees for increased performance, although something similar seems to be implemented as OctTreeTriangleSelector (its meaning is unclear). The renderer class has a method to get the currently active camera, but there is no such thing as an active camera, since several cameras might render simultaneously (perhaps in the future). So a camera should not be attached to a renderer, but be part of the scene. I suggest an external controller/task scheduler to assign cameras to renderers as needed.
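The suggested design can be sketched as follows: cameras live in the scene, a renderer is handed a scene and a camera per call, and an external scheduler decides via callbacks which camera each renderer uses. All classes here (Camera, Scene, Renderer, CameraScheduler) are hypothetical and only illustrate the separation; the renderer reports what it would draw instead of issuing real draw calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Camera:
    name: str


@dataclass
class Scene:
    # Cameras are part of the scene, next to lights and meshes.
    cameras: dict = field(default_factory=dict)


class Renderer:
    """Knows only how to draw; it is handed a scene and a camera per call."""

    def __init__(self, target: str):
        self.target = target  # e.g. the id of a canvas (hypothetical)

    def render(self, scene: Scene, camera: Camera) -> str:
        # A real renderer would issue draw calls; this sketch just reports them.
        return f"{self.target} <- {camera.name}"


class CameraScheduler:
    """External controller that picks a camera for each renderer via a callback."""

    def __init__(self):
        self._jobs = []  # list of (renderer, callback returning a camera name)

    def assign(self, renderer: Renderer, choose: Callable[[Scene], str]) -> None:
        self._jobs.append((renderer, choose))

    def frame(self, scene: Scene) -> list:
        return [r.render(scene, scene.cameras[choose(scene)]) for r, choose in self._jobs]
```

Since neither the scene nor the renderer stores an "active camera", several renderers can draw the same scene through different cameras in the same frame, and the selection logic can change without touching either class.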

A 3D plane is used for intersection tests, but some static constants/parameters are not meaningful enough for a good clean API: “CL3D.Plane3d.ISREL3D_BACK” stands for back plane relation, and the documentation has no explanation for its meaning. Since it’s the only plane in this API, a better name might be “CL3D.Plane.BACK”. And another almost ridiculous (sorry) parameter name is “planar plane relation” – a plane is always planar of course, otherwise it would be a surface (it always has a front and back side by the way).

The renderer class in this API draws meshes or their buffers, but it can also be used to create textures for materials. Instead of a distinct camera class, camera parameters (only model, view and projection matrices) are integrated into the renderer class. In addition, calls like “begin scene”, “end scene” and “draw mesh” have to be moved into separate classes, since the renderer should only know what to render and how (here: meshes, textures, shaders).

As for materials, they can have up to two textures, which is a bit limited. This API supports light map, reflection and solid materials, additive transparent and alpha-channel transparent materials, transparent reflection and standard materials. All in all interesting options, but too specialized. I suggest standard material types as found in OpenGL or OpenGL ES.

Animation is supported with a lot of animator classes. Some class names are not meaningful, e.g. “AnimatorAnimateTexture”, “AnimatorOnClick”, “AnimatorFlyCircle”.

There is no physics support for richer scenes and “nice” effects.

Conclusion: Provides standard support for WebGL, with COLLADA, some animation, VBOs for increased performance, utility and math modules, besides lighting options. But the naming conventions are not consistent. Some methods are unusually named or redundant. Class and method names are not always self-explanatory. Numerical stability as well as memory locality should be considered much more carefully.

GLGE

This API has good documentation and supports COLLADA, video textures, some physics, particle systems, animation, materials, and math utilities as global methods under the GLGE root namespace. It has no shader class. Although the homepage provides some examples, they could be more helpful, possibly combined with some tutorials. GLGE Homepage.

The only lighting options are diffuse and specular, which is enough for basic scenes (but not for practical projects).

Camera class methods can be used to create perspective projection matrices for the only, general camera class. There is no explicit “look at” camera, only an implicit one, created by assigning a projection matrix to a camera object. In addition, the camera object should be part of the scene's object hierarchy, so that it can easily be bound to other movable/animated scene models or curves.

Renderer: the normal, depth and RGBA options are rather classic. More sophisticated renderers would be helpful, or commonly used shader programs that can be created and assigned to renderer objects. Another interesting approach would be templates (e.g. from an XML file) to create commonly used renderers with links to shaders, so that shader programs can be shared and a shader library can be built.

The mesh class is used for raw storage of vertices, normals and faces. It is part of the “Object”, i.e. the scene model. Again, this class should be renamed to prevent confusion: a light instance is an object too. The model can have several materials and contains some rendering options (line width, draw type, whether to cast shadows, whether to do a depth test), some of which should definitely be moved into the renderer class. For performance reasons, whether to cast shadows can be set here. This class (“Object”) does not contain any transformation matrices, only the mesh, and thus cannot have child models to build a model hierarchy in which meshes could be reused with different transformations to save memory, especially on mobile devices.

Public methods in “GLGE” should be split into separate modules like math, math.vector, math.matrix, math.quaternion, and math.spatial for AABBs, octrees, intersection tests etc. (these names are just examples for illustration). The methods “GLGE.makePerspective” and “GLGE.makePerspectiveX” are badly named and, in addition, should be moved into the camera class. The root namespace (GLGE) contains many constants for key enumerations, camera types (ortho, perspective), rendering (lines, points, tris) and blending modes: they should be moved into their logical context/classes as well, i.e. into renderer/camera/input/math classes.

This API does not support custom shaders; instead it offers two filters, for AO (ambient occlusion) and glow, which is quite old-fashioned.

An interesting idea is that the renderer can have transition filters, but a more general concept would suit many more applications and users. For example, a post shader could take two rendered scenes and a transition curve for each option (i.e. a channel mixer), and the result should be used after shaders have been applied to the rendered scenes.

A renderer takes a scene and viewport options, which is a good design decision. However, it does not get a camera, which limits the scene to one active camera: some instance has to set the active camera through which the scene is rendered. When rendering the scene through multiple cameras, this results in redundant API calls and, more importantly, application source code grows and becomes unnecessarily complicated. A better API would require the camera to be bound to the renderer (informally, the glasses between brain and world).

A good point: it supports XML scenes, defined by the “Document” class, although the name is a bit misleading. But what is the “DocumentPreloader” for? This extra class could be avoided with some options in the Document class. By also renaming it (e.g. XMLScene, for loading into and storing from the internal format), the API would be clearer.

Animation curves are used for models that can be animated, i.e. Camera, Object and ParticleSystem. An animation curve uses triples of Bézier points; the name “BezTriple” is unfortunate. Whether the API uses spline curves is not clear from its documentation.
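To illustrate how such triples can drive a curve, the sketch below assumes that a triple is (left handle, point, right handle), as in Blender's BezTriple, so that two consecutive triples define one cubic Bézier segment: the first point, its right handle, the next point's left handle and the next point. Whether GLGE uses exactly this layout is an assumption; the segment is evaluated with de Casteljau's algorithm.

```python
def _lerp(a, b, t):
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def bezier_segment(k0, k1, t):
    """Evaluate the cubic segment between two (left_handle, point, right_handle) triples."""
    p0, p1 = k0[1], k0[2]  # start point and its outgoing handle
    p2, p3 = k1[0], k1[1]  # incoming handle of the end point and the end point
    # de Casteljau: repeated linear interpolation of the control polygon.
    a, b, c = _lerp(p0, p1, t), _lerp(p1, p2, t), _lerp(p2, p3, t)
    d, e = _lerp(a, b, t), _lerp(b, c, t)
    return _lerp(d, e, t)
```

For an animation curve, each point is typically (time, value), so evaluating the segment yields both the time and the animated value at parameter t.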

The event class manages event listeners and fires them when needed, which is quite a common and good design choice. Although key input is an event, it is implemented as a separate class.

The “Placeable” class adds many helpful methods to Object, Camera, Light and ParticleSystem. In most situations only a handful of them are used, resulting in an unnecessary burden. It includes methods for scaling, rotation and position, but what the position, scale and rotation displacements are used for is unexplained (a position in world space is a displacement in some sense too). This class also contains methods for quaternions. However, mixing matrix, vector and quaternion representations should be avoided, so these methods have to be moved into separate classes within a math module. This makes the API clearer, and faster too: the user usually does not know the internal storage format of orientation and position, so conversions between matrices, quaternions and Euler angles may happen implicitly, possibly quite often, which in the worst case results in more numerical instability.

An integrated particle system features methods for lifetime, position, color and acceleration, and intervals can be defined for each of these. However, the documentation does not explain how particle movements are actually computed. A better API would include a separate particle solver (e.g. a NewtonSolver), which could then be used for particle-particle and particle-model interaction. Additionally, there should be an option to bind models (e.g. billboards) and shaders (e.g. to generate spheres on the GPU on the fly). An API should not contain overly specialized classes like “PhysicsCar”, “PhysicsMesh”, “PhysicsPlane” and “PhysicsWheel”; quite a bad design after all.
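A separate particle solver can be quite small. The sketch below is a hypothetical NewtonSolver (the name is taken from the suggestion above; nothing here is GLGE's API) that applies a constant acceleration such as gravity, assuming unit mass, integrates with semi-implicit Euler (velocity first, then position) and removes particles whose lifetime has run out. Interaction forces between particles or with models could be added as further accelerations.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    pos: list   # [x, y, z]
    vel: list   # [vx, vy, vz]
    life: float  # remaining lifetime in seconds


class NewtonSolver:
    """Hypothetical solver: constant acceleration, semi-implicit Euler, unit mass."""

    def __init__(self, gravity=(0.0, -9.81, 0.0)):
        self.gravity = gravity

    def step(self, particles, dt):
        """Advance all particles by dt and return the ones still alive."""
        alive = []
        for p in particles:
            # Update velocity first, then position with the new velocity.
            p.vel = [v + g * dt for v, g in zip(p.vel, self.gravity)]
            p.pos = [x + v * dt for x, v in zip(p.pos, p.vel)]
            p.life -= dt
            if p.life > 0:
                alive.append(p)
        return alive
```

Semi-implicit Euler costs the same as explicit Euler but is noticeably more stable for oscillating systems such as springs, which is why it is a common default in simple particle systems.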

Conclusion: Good documentation and basic support of OpenGL ES features, from lights, materials, models and cameras to basic animation and a particle system. It has no explicit physics system. Explicit curve types and creation tools are still required for more complex animations. Shader classes are missing. The software engineering should be reworked, e.g. separate modules and fewer overly specialized classes. Some class or constant names should be changed to be less redundant or more meaningful.

Continue Reading Part 3