An Idea for a Media-Centreish Interface for a UNIX terminal/shell

18th December 2009

Back in July or August this year, when I was going through the notes on Unix shells for COMP2041, I came up with the idea of doing a shell/terminal interface that looked like an interface for a media centre, ie. rather than looking like this,

it would look "like" this (obviously not exactly the same, but a similar feel),

XBMC skin MediaStream by Team Razorfish. http://xbmc.org/wordpress/wp-content/gallery/mediastream/viewoptions.jpg

The key principles I had in mind were,

nice aesthetics
interface similar to a game or media centre
features easily discoverable for new users

My original motivation was that I was just learning all these coreutils commands (ls, cat, mkdir, cp, mv...) and I found that although the shell had tab completion and apropos, it didn't categorise commands or give a list of the common ones. Then I came up with more abstract ideas,

categorise common commands and give help on them. eg. File System: ls, cd, cd .., mkdir. Filters: cat, wc, grep...
parse commands and their argument lists based on common styles (eg. GNU style, short -las and long -l --all --size) and provide contextual information (eg. hovering over an --argument gives a one line message about what that argument does (perhaps parse the man page to get this info)). Also auto-layout the command line as per the argument style.
it could also parse the pipelines and display these much more visually so it's easier to see what's piping into what, and allow the user to easily change the order/flow of the pipeline.
process management. Don't force the user to remember Ctrl+C and Ctrl+Z and the bg and fg commands; show these as pause and stop icons.
redirection of output should be easily changed in the interface rather than just adding a < or > to the command line (and allow one to redirect STDOUT to a file AFTER the command has already run (because currently you would need to run the command again, or copy and paste and put up with the new lines that gnome-terminal puts in)).
bookmarking commands (including arguments) so that those common ones you use that you haven't remembered yet are quick and easy to use.
colour STDERR in red.

I haven't really thought about it on a technical level, but it may not be as portable as, say, gnome-terminal. I don't know the real differences among the different shells out there, so I don't know how dependent this is on bash or even if it ties bash and the terminal together, but from a beginner user's perspective I don't care about this.

The cloudy idea I have in my mind is basically a GUI/CLI hybrid but I think such a program would need to be careful not to go too far, because it could be made so that after doing an ls -la you could click on a file in the list and rename it, but then we are turning into a file manager in list mode (like Dolphin or Nautilus) which is unnecessary as those tools already exist.

I'm aiming to come up with a more detailed list of requirements and a set of activity and use case scenarios, along with some wire-frame prototypes for such an interface soon. But for now I just needed to get it all out of my head and onto paper (and also public (in case someone tries to patent a concept)).

Tags: computing, sh.

The Features of My Utopian Music Player

11th December 2009

Ideally I would like to write my own music player because I don't really like any that are currently available (Amarok 1.4, Amarok 2, Songbird, Rhythmbox, Banshee, Exaile). I like features from each but none seem to fit all my needs. All the time I keep rethinking what I should do and I still cannot decide. Anyway this is what my ideal music player would be like...

Backend Database
- The backend metadata would be stored in an external PostgreSQL database, with the option of using SQLite for people who don't want to set up and run PostgreSQL.
- The schema should be good and documented, so that a user can read and write into the database. If not at least give an interface to allow this.
- Full playback information. I want my music player to store the timestamps of every time a given song has been played. I want history too, for instance the times of when the song rating was changed.
Collection Manager
- I want the music player to be the library, not just the librarian. I want to give it a file (say an MP3), along with details such as song title, artist, etc. and I want it to take that file and store it on the hard disk in a nice file structure (like iTunes does). Amarok 1.4 attempts to do this but it's really hard, because initially it will just add the file to your playlist and not move it across to your collection, and even then if you change the details, say the artist, it will not correct this in the folder structure used to store that file.
- Tagging songs. Amarok does this well.
Web scraper
- Album art and lyric scraping (but who knows you might get sent to jail for writing a scraper for a specific site which you do not know if they have permission to distribute those certain copyrighted lyrics/album art which were available at the time of writing the scraper). Amarok seems to do this well.
Acoustic Analysis
- Surely there are algorithms to guess the BPM (beats per minute) of a song. I want that integrated into the music player.
- I need a moodbar so I can navigate a song, and to gather contextual information on how the style of the music varies over the song.
- I don't know much about acoustics, but there must be other algorithms which give meaningful measures of audio. These should be used to group songs and find similar ones.
- This must be done locally, I don't want to send things to web services (MusicBrainz, http://echonest.com/).
Navigation
- I want a concept of a "Library" rather than a Playlist. Amarok only has playlists, but 99.9% of the time I want a list of all my songs.
Statistics
- I want reports. Reports on my listening trends and song collection. eg. http://lastgraph3.aeracode.org/

Now for the solution. I could try everything from writing my own music player from scratch that implements it all (but I gave up on that after I could not decide what programming language to use (C, C++, Java, Perl, Python), what GUI widget toolkit to use (Qt, GTK+, wxWidgets), or what graphics API to use for nice graphs (Cairo, raw OpenGL, OpenGL behind Clutter, R's graph drawing, Processing, or some CPAN Perl module for drawing nice graphs). I can mix a few, but the core app needs one programming language and one core GUI toolkit. There is too much choice and I don't have enough experience to know beforehand what is best and what I will find easiest and simplest to use.)

I could try to capture playback statistics by pointing last.fm and audioscrobbler.com at localhost and capturing the data that Amarok sends. Or I could just write a script for Amarok which captures playback, but this only solves part of the problem and then I'm stuck using a certain application. Alternatively I could just take an existing program and fork it to suit my needs.

There should be more to come on this as I start experimenting.

Tags: computing, dev.

A Perl Script to Pause/Resume Amarok 1.4 Playback on Screensaver/Screenlock

11th December 2009

I've just uploaded to GitHub a script to pause Amarok 1.4 playback when the screensaver/screenlock starts and unpause it again when the screensaver is closed or the screen unlocked. It addresses the issues I was having with the script at http://nxsy.org/getting-amarok-to-pause-when-the-screen-locks-using-python-of-course where the script would start Amarok if it was not running, and would restart playback on screensaver end/unlock regardless of whether it was playing when the screensaver started.

You could start the script on start-up or plug it into Amarok's script engine to only be active when Amarok is active.

(Oh and in the future I'll try to avoid posts that just duplicate items from other RSS/Atom feeds and don't add much extra value.)

Tags: computing, dev, sh.

Saving the Wordpress.com Export File and The Linked Media Files (and wget's strictness)

7th December 2009

So I've been wanting a way to automatically back up my wordpress.com export file. I decided to go for a bash and wget mix to do this work. But I soon hit a problem: wget won't save cookies whose path doesn't match the file you are downloading. This is a problem because, well, here is what I basically do to get the export file.

Grab wp-login.php. This will issue a cookie that WP looks for as proof that I can indeed store cookies.

Next I post login credentials to wp-login.php. This will issue a bunch of authentication cookies. Specifically,

Set-Cookie: wordpress_test_cookie=WP+Cookie+check; path=/; domain=.wordpress.com
Set-Cookie: wordpress=some_string; path=/wp-content/plugins; domain=.wordpress.com; httponly
Set-Cookie: wordpress=some_string; path=/wp-admin; domain=.wordpress.com; httponly
Set-Cookie: wordpress_logged_in=some_string; path=/; domain=.wordpress.com; httponly
Set-Cookie: wordpress_sec=some_string; path=/wp-content/plugins; domain=.wordpress.com; secure; httponly
Set-Cookie: wordpress_sec=some_string; path=/wp-admin; domain=.wordpress.com; secure; httponly

The problem is wget will refuse to save numbers 2, 3, 5 and 6 (only saving wordpress_test_cookie and wordpress_logged_in). It refuses the rest because it requires the cookie path to be a prefix of the path of the file you are requesting (which is why the two path=/ cookies are accepted). Using --debug, wget says,

cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-content/plugins, /wp-login.php
cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-admin, /wp-login.php
cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-content/plugins, /wp-login.php
cdm: 1 2 3 4 5 6 7 8Attempt to fake the path: /wp-admin, /wp-login.php

Specifically to get the export file I need the wordpress_sec cookie for the path /wp-admin. I can't just request /wp-admin and try to get the cookie from there because only wp-login.php will let me post credentials.

Possible solutions are A) write a hacky solution that just grabs the cookie value using grep/sed and manually add this to the cookies file, B) recompile wget to accept some other argument that will accept these cookies, or C) don't use wget.

I took a look at the source for wget, and it was easy to identify the problem area, I could just simply remove this segment,

/* The cookie sets its own path; verify that it is legal. */
if (!check_path_match (cookie->path, path))
  {
    DEBUGP (("Attempt to fake the path: %s, %s\n",
             cookie->path, path));
    goto out;
  }

But then my download script wouldn't be as portable, and I'd have to make sure the patched wget is available wherever I run it.

I ended up using curl for some parts, but I probably could have done option A.

Anyhow, the script is here. It should grab the export xml file as well as any media files that it references and were uploaded to that wordpress.com blog.

Tags: sh.

Computer Graphics Notes

2nd December 2009

Not really complete...

Colour notes here, transformations notes here.

Parametric Curves and Surfaces

Parametric Representation

eg. $latex C(t) = (x(t), y(t))$

Continuity

Parametric Continuity

If the first derivative of a curve is continuous, we say it has C¹ continuity.

Geometric Continuity

If the magnitude of the first derivative of a curve changes but the direction doesn't, then we say it has G¹ continuity.
Curves need G² continuity in order for a car to drive along them (ie. so the steering wheel angle never has to change instantly at any point).

Control Points

Control points allow us to shape/define curves visually. A curve will either interpolate or approximate control points.

Natural Cubic Splines

Interpolate control points.
A cubic curve between each pair of control points
Four unknowns,
- interpolating the two control points gives two,
- requiring that derivatives match at the end points of these curves gives the other two.
Moving one control point changes the whole curve (ie. no local control over the shape of the curve)

Bezier Curve

The Bezier curve shown has two segments, where each segment is defined by 4 control points. The curve interpolates two points and approximates the other two. The curve is defined by a Bernstein polynomial. In the diagram, changing points 1 and 2 only affects that segment. Changing one of the corner points (0 or 3) only affects the two segments that it borders.

Some properties of Bezier Curves:

Tangent Property. Tangent at point 0 is line 0 to 1, similarly for point 3.
Convex Hull Property. The curve lies inside the convex hull of the control points. (The corollary of this is if the control points are colinear, the curve is a line.)
They have affine invariance.
Can't fluctuate more than their control polygon does.
Bezier curves are a special case of B-spline curves.

We can join two Bezier curves together with C¹ continuity (where the curves are B₁(P₀, P₁, P₂, P₃) and B₂(P₃, P₄, P₅, P₆)) if P₃ - P₂ = P₄ - P₃. That is, P₂, P₃, and P₄ are colinear and P₃ is the midpoint of P₂ and P₄. To get G¹ continuity we just need P₂, P₃, and P₄ to be colinear. If we have G¹ continuity but not C¹ continuity the curve still won't have any corners, but you will notice a "corner" if you're using the curve for something else, such as some cases in animation. [Also if the curve defined a road without G² continuity there would be points where you must change the steering wheel from one rotation to another instantly in order to stay on the path.]

De Casteljau Algorithm

The De Casteljau algorithm is a recursive method to evaluate points on a Bezier curve.

To calculate the point halfway on the curve, that is t = 0.5 using De Casteljau's algorithm we (as shown above) find the midpoints on each of the lines shown in green, then join the midpoints of the lines shown in red, then the midpoint of the resulting line is a point on the curve. To find the points for different values of t, just use that ratio to split the lines instead of using the midpoints. Also note that we have actually split the Bezier curve into two. The first defined by P₀, P₀₁, P₀₁₂, P₀₁₂₃ and the second by P₀₁₂₃, P₁₂₃, P₂₃, P₃.
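
The construction above can be sketched in a few lines of Python. This is only a minimal sketch: it uses a loop rather than recursion, the names `lerp` and `de_casteljau` are made up for illustration, and control points are assumed to be plain tuples of numbers.

```python
def lerp(p, q, t):
    """Point at ratio t along the line from p to q."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))


def de_casteljau(points, t):
    """Evaluate a Bezier curve at t.

    Returns (point, left, right), where left and right are the control
    points of the two Bezier curves the original is split into at t.
    """
    left, right = [points[0]], [points[-1]]
    while len(points) > 1:
        # Split every line between neighbouring points at ratio t.
        points = [lerp(p, q, t) for p, q in zip(points, points[1:])]
        left.append(points[0])
        right.append(points[-1])
    return points[0], left, right[::-1]
```

For a cubic with control points P₀..P₃, `left` comes out as P₀, P₀₁, P₀₁₂, P₀₁₂₃ and `right` as P₀₁₂₃, P₁₂₃, P₂₃, P₃, matching the split described above.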

Curvature

The curvature of a circle is $latex \frac{1}{r}$.

The curvature of a curve at any point is the curvature of the osculating circle at that point. The osculating circle for a point on a curve is the circle that "just touches" the curve at that point. The curvature of a curve corresponds to the position of the steering wheel of a car going around that curve.

Uniform B Splines

Join with C² continuity.

B-splines don't, in general, interpolate any of the control points; they just approximate them.

Non-Uniform B Splines

Only invariant under affine transformations, not projective transformations.

Rational B Splines

Rational means that they are invariant under projective and affine transformations.

NURBS

Non-Uniform Rational B Splines

Can be used to model any of the conic sections (circle, ellipse, hyperbola)

=====================

3D

When rotating about an axis in OpenGL you can use the right hand rule to determine the + direction (thumb points along the axis, fingers indicate the + rotation direction).

We can think of transformations as changing the coordinate system, where (u, v, n) is the new basis and O is the origin.

$latex \begin{pmatrix}u_x & v_x & n_x & O_x\\ u_y & v_y & n_y & O_y\\ u_z & v_z & n_z & O_z\\ 0 & 0 & 0 & 1 \end{pmatrix}$

This kind of transformation is known as a local to world transform. This is useful for defining objects which are made up of many smaller objects. It also means that to transform the object we just have to change the local to world transform instead of changing the coordinates of each individual vertex. A series of local to world transformations on objects builds up a scene graph, useful for drawing a scene with many distinct models.
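
As a rough illustration of the local to world matrix, here is a small Python sketch that builds it from the basis (u, v, n) and origin O and applies it to a point. The function names are made up for this example, and matrices are just nested lists rather than anything OpenGL provides.

```python
def local_to_world(u, v, n, origin):
    """Build the 4x4 matrix whose columns are u, v, n and the origin O."""
    return [
        [u[0], v[0], n[0], origin[0]],
        [u[1], v[1], n[1], origin[1]],
        [u[2], v[2], n[2], origin[2]],
        [0, 0, 0, 1],
    ]


def transform_point(m, p):
    """Apply matrix m to the 3D point p, treated as the column (x, y, z, 1)."""
    x, y, z = p
    column = (x, y, z, 1)
    return tuple(sum(m[r][c] * column[c] for c in range(4)) for r in range(3))
```

For example, with a local frame rotated 90 degrees about z and placed at (5, 0, 0), `transform_point(local_to_world((0, 1, 0), (-1, 0, 0), (0, 0, 1), (5, 0, 0)), (1, 0, 0))` maps the local point (1, 0, 0) to the world point (5, 1, 0). Moving the whole object only means changing `origin`, not every vertex.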

Matrix Stacks

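OpenGL has MODELVIEW, PROJECTION, and TEXTURE matrix modes, each with its own matrix stack. (The viewport is set separately with glViewport rather than through a matrix mode.)

glLoadIdentity() - replaces the matrix on the top of the stack with the identity matrix
glPushMatrix() - copies the top of the matrix stack and puts the copy on top
glPopMatrix() - removes the top of the matrix stack, restoring the one beneath it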

For MODELVIEW, operations include glTranslated, glScaled, glRotated... These are post-multiplied onto the top of the stack, so the last call is applied to a vertex first (ie. a glTranslated then a glScaled will scale then translate).

Any vertex passed to glVertex() is transformed by the matrix on the top of the MODELVIEW stack.
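
To see why post-multiplication means the last call is applied first, here is a toy matrix stack in Python. It only imitates the behaviour described above; the class and function names are invented for the sketch and it is not how OpenGL is implemented.

```python
def identity():
    return [[1 if r == c else 0 for c in range(4)] for r in range(4)]


def translation(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m


def scaling(sx, sy, sz):
    m = identity()
    m[0][0], m[1][1], m[2][2] = sx, sy, sz
    return m


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


class MatrixStack:
    def __init__(self):
        self.stack = [identity()]

    def push(self):
        # Copy the top matrix and put the copy on top.
        self.stack.append([row[:] for row in self.stack[-1]])

    def pop(self):
        self.stack.pop()

    def multiply(self, m):
        # Post-multiply: top = top * m.
        self.stack[-1] = matmul(self.stack[-1], m)

    def transform(self, point):
        x, y, z = point
        column = (x, y, z, 1)
        top = self.stack[-1]
        return tuple(sum(top[r][c] * column[c] for c in range(4))
                     for r in range(3))
```

Calling `multiply(translation(10, 0, 0))` and then `multiply(scaling(2, 2, 2))` leaves T·S on top of the stack, so the point (1, 0, 0) is first scaled to (2, 0, 0) and then translated to (12, 0, 0).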

Usually the hardware only supports projection and texture stacks of depth 2, whereas the modelview stack should have a depth of at least 32.

The View Volume

You can set the view volume (after setting the current matrix stack to the PROJECTION stack) using one of,

glOrtho(left, right, bottom, top, near, far)
glFrustum(left, right, bottom, top, near, far)
gluPerspective(fovy, aspect, zNear, zFar)

In OpenGL the projection method just determines how to squish the 3D space into the canonical view volume.

Then you can set the viewing direction using gluLookAt (after calling one of the above), where you give the eye location, a point to look at, and an up vector.

When using perspective the view volume will be a frustum, but this is more complicated to clip against than a cube. So we convert the view volume into the canonical view volume which is just a transformation to make the view volume a cube at 0,0,0 of width 2. Yes this introduces distortion but this will be compensated by the final window to viewport transformation.

Remember we can set the viewport with glViewport(left, bottom, width, height) where left and bottom are a location on the screen (I think this means window, but also this stuff is probably older than modern window management so I'm not worrying about the details here.)

Visible Surface Determination (Hidden Surface Removal)

First clip to the view volume then do back face culling.

Could just sort the polygons and draw the ones further away first (painter's algorithm/depth sorting). But this fails for cases such as three triangles that each overlap the next in a cycle, as no drawing order is correct.

Can fix by splitting the polygons.

BSP (Binary Space Partitioning)

For each polygon there is a region in front and a region behind the polygon. Keep subdividing the space for all the polygons.

Can then use this BSP tree to draw.

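void drawBSP(BSPTree m, Point myPos) {
   // An empty subtree has nothing to draw.
   if (m == null) return;
   if (m.poly.inFront(myPos)) {
      // Viewer is in front of this polygon: draw what is behind it first.
      drawBSP(m.behind, myPos);
      draw(m.poly);
      drawBSP(m.front, myPos);
   } else {
      // Viewer is behind this polygon: draw the front side first.
      drawBSP(m.front, myPos);
      draw(m.poly);
      drawBSP(m.behind, myPos);
   }
}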

If one polygon's plane cuts another polygon, need to split the polygon.

You get different tree structures depending on the order you select the polygons. This does not matter, but some orders will give a more efficient result.

Building the BSP tree is slow, but it does not need to be recalculated when the viewer moves around. We would need to recalculate the tree if the polygons move or new ones are added.

BSP trees are not so common anymore, instead the Z buffer is used.

Z Buffer

Before we fill in a pixel in the framebuffer, we check the z buffer and only fill that pixel if the z value (which can be a pseudo-depth) is less than the one in the z buffer (larger values are further away). If we fill then we must also update the z buffer value for that pixel.
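
A minimal Python sketch of that test is below. The `ZBuffer` class and its `plot` method are invented names for illustration; the depth buffer starts at infinity so that the first fragment at each pixel always passes.

```python
class ZBuffer:
    def __init__(self, width, height, background=(0, 0, 0)):
        # Every pixel starts "infinitely far away" with the background colour.
        self.depth = [[float("inf")] * width for _ in range(height)]
        self.colour = [[background] * width for _ in range(height)]

    def plot(self, x, y, z, colour):
        """Fill the pixel only if z is nearer than what is already stored."""
        if z < self.depth[y][x]:
            self.depth[y][x] = z
            self.colour[y][x] = colour
            return True
        return False
```

With this, polygons can be drawn in any order: a fragment at z = 9 plotted after one at z = 5 is simply rejected.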

Try to use the full range of values for each pixel element in the z buffer.

To use in OpenGL just do gl.glEnable(GL.GL_DEPTH_TEST) and to clear the z-buffer use gl.glClear(GL.GL_DEPTH_BUFFER_BIT).

Fractals

L-Systems

Lindenmayer systems, eg. the Koch curve.

Self-similarity

Exact (eg. Sierpinski triangle)
Stochastic (eg. Mandelbrot set)

IFS - Iterated Function System

================================================

Shading Models

There are two main types of rendering that we cover,

polygon rendering
ray tracing

Polygon rendering is used to apply illumination models to polygons, whereas ray tracing applies to arbitrary geometrical objects. Ray tracing is more accurate, whereas polygon rendering does a lot of fudging to get things to look real, but polygon rendering is much faster than ray tracing.

With polygon rendering we must approximate NURBS into polygons, with ray tracing we don't need to, hence we can get perfectly smooth surfaces.
Much of the light that illuminates a scene is indirect light (meaning it has not come directly from the light source). In polygon rendering we fudge this using ambient light. Global illumination models (such as ray tracing, radiosity) deal with this indirect light.
When rendering we assume that objects have material properties which we denote k_(property).
We are trying to determine I which is the colour to draw on the screen.

We start with a simple model and build up,

Let's assume each object has a defined colour. Hence our illumination model is $latex I = k_i$, which is very simple and looks unrealistic.

Now we add ambient light into the scene. Ambient Light is indirect light (ie. did not come straight from the light source) but rather it has reflected off other objects (from diffuse reflection). $latex I = I_a k_a$. We will just assume that all parts of our object have the same amount of ambient light illuminating them for this model.

Next we use the diffuse illumination model to add shading based on light sources. This works well for non-reflective surfaces (matte, not shiny) as we assume that light reflected off the object is equally reflected in every direction.
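
The diffuse term can be sketched using Lambert's cosine law, under which the reflected intensity is proportional to the cosine of the angle between N and L. In this Python sketch the symbols `i_l` (light intensity) and `k_d` (diffuse coefficient of the material) are assumed names following the $latex k_{property}$ convention, and the clamp to zero for surfaces facing away from the light is a common convention rather than something stated in these notes.

```python
import math


def normalise(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diffuse(i_l, k_d, normal, to_light):
    """Diffuse intensity: i_l * k_d * cos(angle between N and L)."""
    cos_theta = dot(normalise(normal), normalise(to_light))
    # A surface facing away from the light receives no diffuse light.
    return i_l * k_d * max(0.0, cos_theta)
```

A light directly above the surface gives the full `i_l * k_d`; at 60 degrees from the normal it gives half of that. The ambient term $latex I_a k_a$ would simply be added to the result.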

Lambert's Law

"intensity of light reflected from a surface is proportional to the cosine of the angle between L (vector to light source) and N(normal at the point)."

Gouraud Shading

Use normals at each vertex to calculate the colour of that vertex (if we don't have them, we can calculate them from the polygon normals for each face). Do for each vertex in the polygon and interpolate the colour to fill the polygon. The vertex normals address the common issue that our polygon surface is just an approximation of a curved surface.

To use Gouraud shading in OpenGL use glShadeModel(GL_SMOOTH). But we also need to define the vertex normals with glNormal3f() (the normal set will apply to any glVertex that you specify after calling glNormal).

Highlights don't look realistic as you are only sampling at every vertex.

Interpolated shading is the same, but we use the polygon normal as the normal for each vertex, rather than the vertex normal.

Phong Shading

Like Gouraud, but you interpolate the normals and then apply the illumination equation for each pixel.

This gives much nicer highlights without needing to increase the number of polygons, as you are sampling at every pixel.

Phong Illumination Model

Diffuse reflection and specular reflection.

Figure: Components of the Phong Model (Brad Smith, http://commons.wikimedia.org/wiki/File:Phong_components_version_4.png)

(Source: COMP3421, Lecture Slides.)

$latex I_s = I_l k_s \cos^n \left ( \alpha \right )$

n is the Phong exponent and determines how shiny the material is (the larger n, the smaller the highlight circle).
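
Here is a hedged Python sketch of the specular term $latex I_s = I_l k_s \cos^n(\alpha)$. It takes α to be the angle between the mirror reflection direction R and the direction to the viewer V, which is the usual definition but is an assumption here since the notes do not spell it out. The function and parameter names are made up for the example.

```python
import math


def normalise(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def specular(i_l, k_s, n, normal, to_light, to_viewer):
    """Phong specular intensity i_l * k_s * cos(alpha) ** n."""
    N, L, V = normalise(normal), normalise(to_light), normalise(to_viewer)
    # Mirror reflection of L about N: R = 2 (N . L) N - L.
    R = tuple(2 * dot(N, L) * nc - lc for nc, lc in zip(N, L))
    cos_alpha = max(0.0, dot(R, V))
    return i_l * k_s * cos_alpha ** n
```

Looking straight down the reflection direction gives the full `i_l * k_s`. Slightly off that direction the value drops, and it drops faster for larger `n`, which is why a larger exponent gives a smaller highlight.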

Flat shading. Can do smooth shading with some interpolation.

If you don't have vertex normals, you can interpolate them using the face normals of the surrounding faces.

Gouraud interpolates the colour, Phong interpolates the normals.

Attenuation

Inverse square attenuation is physically correct, but looks wrong because real lights are not the single points we usually use when describing a scene.

For now I assume that all polygons are triangles. We can store the normal per polygon. This is enough to render the polygon, but most of the time the polygon model is just an approximation of some smooth surface, so what we really want to do is use vertex normals and interpolate them across the polygon.

Ray Tracing

For each pixel on the screen shoot out a ray and bounce it around the scene. This gives the same result as shooting rays from the light sources, but only very few of those would make it into the camera, so that approach is not very efficient.

Each object in the scene must provide an intersection(Line2D) function and a normal (Point3D) function

Ray Tree

Nodes are intersections of a light ray with an object. Can branch intersections for reflected/refracted rays. The primary ray is the original ray and the others are secondary rays.

Shadows

Can do them using ray tracing, or can use shadow maps along with the Z buffer. The key to shadow maps is to render the scene from the light's perspective and save the depths in the Z buffer. Then can compare this Z value to the transformed Z value of a candidate pixel.

==============

Rasterisation

Line Drawing

DDA

You iterate over x or y, and calculate the other coordinate using the line equation (and rounding it).
If the magnitude of the gradient of the line is > 1 we must iterate over y, otherwise we iterate over x. If we picked the other axis we would have gaps in the line.
Also need to check if x1 is > or < x2 or equal and have different cases for these.
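
A small Python sketch of DDA follows. Rather than writing out separate cases, it steps along whichever axis has the larger extent and lets signed increments handle the direction; that is a common way of arranging DDA but is my choice here, not necessarily how the course presented it. The function name is made up.

```python
import math


def dda_line(x1, y1, x2, y2):
    """Return the pixels on the line from (x1, y1) to (x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    # Iterate over the axis with the larger extent so there are no gaps.
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        return [(x1, y1)]
    x_inc, y_inc = dx / steps, dy / steps
    pixels = []
    for i in range(steps + 1):
        # Round to the nearest pixel (halves round up).
        x = math.floor(x1 + i * x_inc + 0.5)
        y = math.floor(y1 + i * y_inc + 0.5)
        pixels.append((x, y))
    return pixels
```

For a shallow line such as (0, 0) to (4, 2) this steps once per x, and for a steep line such as (0, 0) to (1, 3) it steps once per y, so every row the line crosses gets a pixel.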

Bresenham

Only uses integer calculations and no multiplications, so it's much faster than DDA.
We define an algorithm for the 1st octant and deal with the other octants with cases.
We start with the first pixel being the lower left end point. From there, there are only two possible pixels that we would need to fill: the one to the right or the one to the top right. Bresenham's algorithm gives a rule for which pixel to go to. We only need to do this incrementally so we can just keep working out which pixel to go to next.
The idea is we accumulate an error, and when that exceeds a certain amount we go up right and reset the error; otherwise we add to the error and go right.
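
The first octant case can be sketched in Python as below. This is one common integer formulation (the error term is scaled by 2·dx so no fractions are needed); the function name is invented, and the other octants are not handled. The two doublings are computed once up front and could be done with shifts.

```python
def bresenham_first_octant(x0, y0, x1, y1):
    """Pixels on the line, assuming x0 <= x1 and 0 <= y1 - y0 <= x1 - x0."""
    dx, dy = x1 - x0, y1 - y0
    two_dx, two_dy = 2 * dx, 2 * dy
    error = two_dy - dx
    y = y0
    pixels = []
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        if error > 0:
            # Accumulated error is too large: step up as well as right.
            y += 1
            error -= two_dx
        error += two_dy
    return pixels
```

Inside the loop there are only integer additions, subtractions and a comparison.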

We use Bresenham's algorithm for drawing lines, but this is really just linear interpolation, so we can use Bresenham's algorithm for other tasks that need linear interpolation.

Polygon Filling

Scan line Algorithm

The Active Edge List (AEL) is initially empty and the Inactive Edge List (IEL) initially contains all the edges. As the scanline crosses an edge it is moved from the IEL to the AEL, then after the scanline no longer crosses that edge it is removed from the AEL.

To fill the scanline,

On the left edge, round up to the nearest integer, with round(n) = n if n is an integer.
On the right edge, round down to the nearest integer, but with round(n) = n-1 if n is an integer.
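
The two rounding rules can be written compactly, as in this Python sketch (the function name is made up). Taking the ceiling and subtracting one on the right edge gives "round down, but n-1 for an integer n".

```python
import math


def span_pixels(x_left, x_right):
    """Integer x positions to fill between two edge crossings on a scanline."""
    first = math.ceil(x_left)        # round up; an integer stays as it is
    last = math.ceil(x_right) - 1    # round down; an integer n becomes n - 1
    return list(range(first, last + 1))
```

The point of the asymmetry is that two polygons sharing an edge never both fill the same pixel: `span_pixels(2, 5)` fills 2, 3, 4 and a neighbouring span `span_pixels(5, 7)` fills 5, 6.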

It's really easy to fill a triangle, so an alternative is to split the polygon into triangles and just fill the triangles.

===============

Anti-Aliasing

Ideally a pixel's colour should be the average colour of each visible polygon in that pixel, weighted by the area of the pixel that the polygon covers (counting only the parts of each polygon that are not hidden behind other polygons in that pixel).

Aliasing Problems

Small objects that fall between the centres of two adjacent pixels are missed entirely. Anti-aliasing would fix this by shading the pixels grey, rather than the full black they would get if the polygon filled the whole pixel.
Edges look rough ("the jaggies").
Textures disintegrate in the distance
Other non-graphics problems.

Anti-Aliasing

In order to really understand this anti-aliasing stuff I think you need some basic understanding of how a standard scene is drawn. When using a polygon rendering method (such as is done with most real time 3D), you have a framebuffer which is just an area of memory that stores the RGB values of each pixel. Initially this framebuffer is filled with the background colour, then polygons are drawn on top. If your rendering engine uses some kind of hidden surface removal it will ensure that the things that should be on top are actually drawn on top.

Using the example shown (idea from http://cgi.cse.unsw.edu.au/~cs3421/wordpress/2009/09/24/week-10-tutorial/#more-60), and using the rule that if a sample falls exactly on the edge of two polygons, the pixel is only filled if it is a top edge of the polygon.

Figure: Anti-Aliasing Example Case. The pixel is the thick square, and the blue dots are samples.

No Anti-Aliasing

With no anti-aliasing we just draw the pixel as the colour of the polygon that takes up the most area in the pixel.

Pre-Filtering

We only know what colours came before this pixel, and we don't know if anything will be drawn on top.
We take a weighted average (weighted by the fraction of the pixel that each polygon covers) along the way. For example if the pixel was filled half with green, then the other half with red, the final anti-aliased colour of that pixel would be determined by averaging green (0, 1, 0) with red (1, 0, 0), which gives (0.5, 0.5, 0). If we had any more colours we would then average (0.5, 0.5, 0) with the next one, and so on.
Remember weighted averages, $latex \frac{Aa + Bb}{A + B}$ where you are averaging $latex a$ and $latex b$ with weights $latex A$ and $latex B$ respectively.
Pre-filtering is designed to work with polygon rendering because you need to know the coverage ratio, which by nature a ray tracer doesn't know (because it just takes samples), nor does it know which polygons fall in a given pixel (again because ray tracers just take samples).
Pre-filtering works very well for anti-aliasing lines, and other vector graphics.
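
The weighted average used for pre-filtering is simple enough to sketch directly. In this Python sketch colours are RGB tuples and the weights are the coverage fractions; the function name is made up.

```python
def weighted_average(colour_a, weight_a, colour_b, weight_b):
    """(A*a + B*b) / (A + B), applied to each colour channel."""
    total = weight_a + weight_b
    return tuple((weight_a * a + weight_b * b) / total
                 for a, b in zip(colour_a, colour_b))
```

Blending green and red with equal coverage, `weighted_average((0, 1, 0), 0.5, (1, 0, 0), 0.5)`, gives (0.5, 0.5, 0) as in the example above.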

Post-Filtering

Post-filtering uses supersampling.
We take some samples (can jitter (stochastic sampling) them, but this only really helps when you have vertical or horizontal lines moving vertically or horizontally across a pixel, eg. with vector graphics)
In the example, $latex \frac{6}{9}$ of the samples are green and $latex \frac{3}{9}$ are red. So we take an average to get the final pixel colour of $latex \begin{pmatrix}\frac{1}{3}, & \frac{2}{3}, & 0\end{pmatrix}$.
We can weight these samples (usually centre sample has more weight). The method we use for deciding the weights is called the filter. (equal weights is called the box filter)
Because we have to store all the colour values for the pixel we use more memory than with pre-filtering (but don't need to calculate the area ratio).
Works for either polygon rendering or ray tracing.
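
Combining the samples for one pixel might look like the Python sketch below. With no weights given it acts as the box filter (all samples equal); passing weights lets the centre sample count for more. The function name and the idea of passing weights as a flat list are assumptions made for illustration.

```python
def filter_samples(samples, weights=None):
    """Combine the RGB samples taken inside one pixel into a final colour."""
    if weights is None:
        # Box filter: every sample counts equally.
        weights = [1] * len(samples)
    total = sum(weights)
    return tuple(sum(w * s[c] for w, s in zip(weights, samples)) / total
                 for c in range(3))
```

With six green samples and three red ones, the box filter gives (1/3, 2/3, 0), the same colour as worked out above.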

Can use adaptive supersampling. If it looks like a region is just one colour, don't bother supersampling that region.

OpenGL

Often the graphics card will take over and do supersampling for you (full scene anti-aliasing).

To get OpenGL to anti-alias lines you first need to tell it to calculate alpha for each pixel (ie. the fraction of the pixel's area that the line covers) using glEnable(GL_LINE_SMOOTH), and then enable alpha blending to apply this when drawing using,

glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

You can do post-filtering using the accumulation buffer (which is like the framebuffer but will apply averages of the pixels), and jittering the camera a few times using accPerspective.

Anti-Aliasing Textures

A texel is a texture pixel whereas a pixel in this context refers to a pixel in the final rendered image.

When magnifying the image can use bilinear filtering (linear interpolation) to fill the gaps.
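
Bilinear filtering can be sketched as two linear interpolations along x followed by one along y. In this Python sketch the texture is a list of rows of single-channel values, and `u`, `v` are assumed to be non-negative texel coordinates inside the texture; the function name is made up.

```python
def bilinear(texture, u, v):
    """Sample texture (a list of rows) at fractional texel position (u, v)."""
    x0, y0 = int(u), int(v)
    # Clamp the neighbouring texel so we never read past the edge.
    x1 = min(x0 + 1, len(texture[0]) - 1)
    y1 = min(y0 + 1, len(texture) - 1)
    fx, fy = u - x0, v - y0
    top = (1 - fx) * texture[y0][x0] + fx * texture[y0][x1]
    bottom = (1 - fx) * texture[y1][x0] + fx * texture[y1][x1]
    return (1 - fy) * top + fy * bottom
```

Sampling exactly on a texel returns that texel's value, and sampling in the middle of four texels returns their average.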

Mip Mapping

Store scaled down copies of the texture, choose the closest level, and also interpolate between levels where needed. Interpolating between levels is called trilinear filtering.

Rip mapping helps with non-uniform scaling of textures. Anisotropic filtering is more general and deals with any non-linear transformation applied to the texture.

Double Buffering

We can animate graphics by simply changing the framebuffer. However, if we cannot finish changing the framebuffer before the screen next displays its contents, a frame gets displayed when we have only changed part of the framebuffer. To prevent this, we render the image to an off-screen buffer and when we finish we tell the hardware to switch buffers.

Can do on-demand rendering (only refill the framebuffer when we need to) or continuous rendering (the draw method is called at a fixed rate and the image is redrawn regardless of whether it needs to be updated).

LOD

Mip Mapping for models. Can have some low poly models that we use when far away, and use the high res ones when close up.

Animation

Define key-frames and tween (interpolate) between them to fill in the remaining frames.
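A minimal tweening sketch in Python, assuming the simplest case of linear interpolation between (time, value) key-frames (real animation systems often use easing curves or splines instead). The function name and data layout are assumptions.

```python
def tween(keyframes, t):
    """Linearly interpolate a value at time t.

    keyframes is a list of (time, value) pairs sorted by time.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t <= t1:
            alpha = (t - t0) / (t1 - t0)  # 0 at the first key, 1 at the second
            return v0 + (v1 - v0) * alpha
    # Past the last key-frame: hold the final value.
    return keyframes[-1][1]
```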

===============

Shaders

OpenGL 2.0 using GLSL will let us implement our own programs for parts of the graphics pipeline, particularly the vertex transformation stage and the fragment texturing and colouring stage.

Fragments are like pixels except they may not appear on the screen if they are discarded by the Z-buffer.

Vertex Shaders

position transformation and projection (set gl_Position), and
lighting calculation (set gl_FrontColor)

Fragment Shaders

interpolate vertex colours for each fragment
apply textures
etc.

set gl_FragColor.

Tags: comp3421, computing, graphics.

Human Computer Interaction Notes

2nd December 2009

These notes are based around my COMP3511 course.

Interaction Design (+Scenarios)

Interaction Design is about creating user experiences that enhance and augment the way people work, communicate, and interact.¹
Interaction Design has a much wider scope than Human Computer Interaction. ID is concerned with the theory and practice of designing user experiences for any technology or system, whereas HCI has traditionally been focused on/surrounding humans and computers specifically.
Interaction Design involves understanding the requirements.
Requirements can be functional (what should it do) or non-functional (what are the constraints). The usability principles (heuristics) fit into the non-functional requirements.
User Profiles are a set of personas. A persona is a short description of a fictional user.
Scenarios
- Activity Scenario (narrative based on user's requirements)
  - Used at the beginning of the design process.
  - Who, When, What, Why
  - Not technical/no specific details (re: does not presuppose the interface)
  - From the user's perspective
- Use Case Scenario (narrative of how the user uses the interface to fulfil their goal)
  - Include the user's goals but focus on the user/computer interaction (re: talk about the technology)
  - Basically is a description of the use case diagram.
  - Do these scenarios after you figure out the requirements
  - Different users would have different use cases. We can show this with a use case diagram which shows the actors and the various use cases that they encounter.
- Task Scenario
  - Used when running a usability test. ie. give the participant a scenario before giving them a task to give them context.
- When describing a scenario, give the user's goals, their context and situation. Use a narrative form.

Cooper et al. describe the process of interaction design as,

Identifying needs and establishing requirements for the user experience.
Developing alternative designs that meet those requirements.
Building interactive versions of the designs so that they can be communicated and assessed.
Evaluating what is being built throughout the process and the user experience it offers.

Scenarios are narratives about named people with an age. We need some background to understand where they are coming from (for instance their cultural background (eg. the US uses MM/DD/YYYY but Australia uses DD/MM/YYYY)). We try to remove incorrect assumptions about what we think a certain group of people are like. The scenario should explain their motivations and their goals.

Usability

Usability is all about producing things which are usable, where something is usable when it meets the usability goals below. You should work out which goals are most important for the problem and focus on those first.

Usability Goals

easy to learn
easy to remember how to use
effective to use
efficient to use
safe to use
have good utility (providing the right kind of functionality to allow the user to achieve their goal)

User Experience Goals

satisfying
fun
helpful
motivating
universal access (accessibility)

Heuristics (Usability Principles)

visibility of system status (eg. busy mouse icon)
match between system and real world (includes interface metaphors. eg. files and folders concept, "speak the user's language" (avoid jargon that users may not understand))
user control and freedom (includes letting the user cancel/exit. eg. can pause/cancel file transfers)
consistency and standards (eg. consistent terminology, consistent workflows, common look and feel, GUI toolkits. eg. GNOME/GTK+)
help and documentation
help users recognise, diagnose and recover from errors (tell users what the problem was, why it happened and some possible solutions, using plain English. eg. recover from trash)
error prevention (eg. move to trash first)
recognition rather than recall (GUI applications menu as opposed to CLI)
flexibility and efficiency of use (eg. keyboard shortcuts (helpful for experts, but hidden from novices))
aesthetic and minimalist (uncluttered) design (maybe put rarely used info into a help page/manual rather than in the interface)

Design Principles

visibility
feedback (can be audio, visual...)
affordances (clues on how to use. eg. ⋮∶ affords grabbing)
consistency (includes look and feel consistency)
mapping
- eg. which light switch controls which light
constraints
- logical (eg. grey out menu options that are not allowed)
- physical (eg. you can't plug a USB cable into a VGA port, c.f. you can put a DVD in a CD player)
- cultural

When designing a system we need to consider,

who are the users,
how will the product be used,
where will the product be used

Identifying Needs

Requirements

When testing an interface with users/test participants, give them a high level goal and observe how they go about doing it. Don't give them specific instructions.

Use Scenario 1: For each task identified (or major tasks, or particularly special tasks if many tasks are defined), write a description of how that user would accomplish the task independent of how they would complete it within the application.

Use Case 1: If a use scenario has been implemented, include a matching use case which describes how the task use scenario can be completed in the application. There may be branching or multiple ways to complete the task, and this is a good way to document it.

To test if something is a requirement just ask, "If I remove this, will the product still fulfil its purpose?"

Design Conceptualisation

A conceptual model is a high-level description of how a system is organised and operates. --Johnson and Henderson, 2002, p. 26

I like to think of it like this. The person coding the web browser understands that when the user types in a URL and presses enter, an HTTP GET request is sent, the response is received, and the HTML is processed and displayed. There are many technical interactions and details happening here. But the conceptual model of this is what the average non-technical user thinks is happening. They just have some kind of model in their head that they type the URL, hit enter and get the web site displayed. It's just an abstraction of what is going on.

Interface metaphors are used as they can help the user understand and determine how an interface works. We try to use them for this purpose but just using the metaphor directly can have some negative effects (eg. if you're designing a radio application for desktop PCs, it may not be a good idea to just show an image of a real radio as the UI). We don't want to use the metaphor to an extent that it breaks the design principles.

A classic example of a conceptual framework is that of the relation between the design of a conceptual model and the user's understanding of it. In this framework there are three components, (Sharp et al., 2006)

The designer's model - the model the designer has of how the system works (or how it should work)
The system image - how the system's actual workings are portrayed to the users through the interface, manuals, etc. (or how it is presented)
The user's model - how the user understands the system works (or how it is understood)

[caption id="attachment_772" align="aligncenter" width="361" caption="Conceptual Framework (from Norman, 1988)"][/caption]

The designer's job is to create the system image so that the users will invoke the same conceptual model as the designer's.

"A good conceptual model allows us to predict the effects of our actions." (Norman, 1988)

The interface could be made more transparent so the user can see the details of how the system works, but this is not always desirable as it may cause confusion. Also many users may not want to know all the gory details, nor should they have to know the actual implementation in order to use the system.

You can conceptualise how a user interacts with a system in terms of their goals and what they need to do to achieve those goals.

You can try to give the user a more correct mental model of the system by giving useful feedback based on their input and providing help and documentation.

Prototyping

Can do low fidelity or high fidelity prototypes.
Could use paper mockups of screens, storyboards, electronic mockup, electronic prototype...
Make sure you iterate.
"the best way to get a good idea is to get lots of ideas" --Linus Pauling

Using A Design Diary

Brainstorm ideas
Sketch interface mockups
Sketch storyboards/work flow diagrams

Wireframes

Here is an example wireframe.

[caption id="" align="aligncenter" width="445" caption="Example Wireframe from https://wiki.ubuntu.com/DesktopExperienceTeam/KarmicBootExperienceDesignSpec"][/caption]

Another paper prototype with a slightly higher fidelity.

[caption id="attachment_850" align="aligncenter" width="450" caption="An example paper prototype (from https://wiki.ubuntu.com/SoftwareStore)."][/caption]

Issues Table

In this course we list the issues vertically and the participants horizontally.
Prioritise and group the issues. (Maybe use affinity diagramming for the grouping)

Usability Testing

Can do interviews, questionnaires, usability tests (best to run a dry run of these before you start testing on many people), interaction logging...
The purpose of a usability test is to gather feedback from potential users about usability issues as well as ensuring that an interface can be used and works as expected.
Testing should be done throughout the whole process during prototyping, beta versions, and deployed applications.
According to Nielsen you only need to test an interface with 5 people to find 80% of the issues (see Nielsen, Usability Engineering, p173) (however to publish research 5 is statistically too small so you should use at least 10).
When planning a test you need to define scenarios and tasks as well as deciding what to test and how to collect the results. It's a good idea to have goals that try to measure the success of the usability principles.
Test the parts of your interface which would be used most, as well as any particularly difficult to design aspects first.
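The 5-user figure is usually justified with Nielsen and Landauer's problem-discovery model, where the proportion of issues found by n users is 1 - (1 - L)^n. A quick Python sketch, assuming their commonly quoted average of L = 0.31 for the chance that a single user hits a given issue (that number is not from these notes and will vary between projects):

```python
def proportion_found(n_users, discovery_rate=0.31):
    """Expected share of usability issues found after testing n_users people.

    discovery_rate is the assumed chance that one user hits a given issue.
    """
    return 1.0 - (1.0 - discovery_rate) ** n_users
```

With the default rate, five users come out at roughly 84%, which is where the "about 80%" rule of thumb comes from. Each extra user after that adds less and less.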

There are some legal and ethical issues to consider. The participant,

needs to sign a consent form for you to run a test with them.
is free to stop participating at any time.
must be made aware of how you are observing them and what will be done with data collected. eg. is the session being recorded on video, audio, observed by people, screen captured...? Will the data be anonymous, will the anonymous results be published...
should be made aware that you are testing the software, not them.

During a Usability Test,

Avoid leading questions. (eg. try to avoid "How much do you like this interface?")
When running a usability test be careful not to bias your results. For example, instead of asking the user "How would you do X?" when there is a button "Do X", give them a scenario which has a goal and ask them how they would go about achieving this with the interface.
You want the participant to "think aloud" so that you know what they are thinking when they are using the interface you are testing.
If users are struggling give them a hint, if that doesn't help explain the expected solution and move on, but note that they needed assistance when recording the test data.
If a task is difficult for the user, it's not the user's fault, it's the interface's!
We want to record things like time to complete the task, the number and nature of errors encountered and by whom... Things that address the usability principles. You should record both these quantitative measurements and any qualitative things that you observe or the participant mentions.

After the Testing,

Collate feedback and test data (Use an issues table to record the usability issues that the participants had.)
Group issues and prioritise them.
When analysing results consider,
- If a user is asked to compare two interfaces, the results may bias towards the second as they learn from their first experience.
  - Can try to solve this by getting some participants to do A then B, and others B then A.
- Observing how a user interacts with an interface may change how they interact with it. (ie. they may have done things differently if they were at home, without you scrutinising their every move).
  - We can try to avoid this by making the participants feel comfortable and reinforcing that we are not testing them, we are testing the interface. Assure them that there are no incorrect uses, and that they shouldn't avoid doing things just because they know we are taking notes.
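The "A then B, others B then A" idea (counterbalancing) is easy to set up ahead of time. A tiny Python sketch, with hypothetical participant labels and a function name of my own:

```python
def assign_orders(participants, conditions=("A", "B")):
    """Alternate the order in which participants see two interfaces."""
    forward = list(conditions)
    backward = forward[::-1]
    # Even-indexed participants get A then B, odd-indexed get B then A.
    return {p: (forward if i % 2 == 0 else backward)
            for i, p in enumerate(participants)}
```

With an even number of participants, each interface is seen first by half the group, so any learning effect should roughly cancel out when comparing the two.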

Usability Testing

When actually running a usability test you should follow a usability test plan. The test plan just details what the test facilitator should do during the test.

The usability testing procedures we used in this course are:

Explain procedures (think aloud, data collection methods in use...)
Make sure they agree and sign a consent form before proceeding (you keep one, they keep one)
Run a pre-test questionnaire (used to generate a participant profile) (this helps to give you an idea on their experience level, as well as any background they may have in using similar interfaces before, as these affect how the participant performs) (best to get the participant to do this a few days before the test so that you can focus on specific tasks.)
Introduce scenario
Run through tasks
Ask any post test questions
Do they have any extra comments/debriefing
Thank them for their time

Interviews

Can be open ended (participant gives longer responses which may include their reasoning) or closed (list of options participant chooses from).
When running an interview give,
- An introduction to the interview (what you are doing, purpose, what happens to the responses, how it is being recorded)
- Warm-up questions
- Main section. (use a logical sequence)
- Cool-off questions
- Closing remarks.

Questionnaires

Participant Sample (You probably want a sample representative of your users/target users).

User Centred Design Process

The UCD process is all about focusing on the users and tasks. It also means iterating on your designs often. The development is driven by users' needs rather than technical concerns.

More specifically Gould and Lewis (1985) give three principles,

Early focus on users and tasks.
Empirical measurement.
Iterative design.

Affinity Diagramming

This is where we collect ideas and then group them.
Don't use pre-defined groups, make them up as you start sifting through the ideas.
The idea is to organise a bunch of individual ideas into a hierarchy.

Card Sorting

Get a bunch of people to sort cards with some idea/concept/whatever into categories and then compare how they were sorted by the different participants.
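One common way to compare the sorts is to count how often each pair of cards ended up in the same group. A minimal Python sketch, where the data layout (one list of groups per participant) and the card names in the usage note are made up for illustration:

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Count how many participants put each pair of cards in the same group.

    sorts holds one entry per participant; each entry is a list of groups,
    and each group is a list of card names.
    """
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # Sort the names so ("Open", "Save") and ("Save", "Open") match.
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts
```

For example, if one participant sorts the cards into Open/Save and Copy/Paste and another into Open/Save/Copy and Paste, the pair (Open, Save) scores 2 while (Copy, Paste) scores 1. Pairs with high counts are good candidates for living in the same menu or category.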

Software Lifecycles

Waterfall
W-model
Spiral
Rapid Application Development
Star Model (evaluation at each step)
ISO 13407 (the HCI model goes around in a circle and only exits when satisfactory)

Cognitive Load Theory

Cognition is what goes on in our brains. It includes cognitive processes such as,

attention
- Some techniques are particularly good at grabbing attention (flashing, moving, colourful or large things). But these should be used sparingly.
perception and recognition
- Gestalt principles of perceptual organisation. ie. we group things by proximity, similarity, closure, symmetry and continuity. [caption id="attachment_852" align="aligncenter" width="389" caption="Gestalt principles of perceptual grouping. (a. you see two groups the left two and the right two; b. you see the objects in columns rather than rows; c. you group the () braces together; d. you see two curves and most people probably see the same two curves (as opposed to sharp edges that meet at the vertex))"][/caption]
memory
learning
reading, speaking and listening
problem-solving, planning, reasoning and decision-making.
automatic processes
- Stroop effect (trying to say the colour not the words eg. red green blue orange purple pink) is due to this.

Some Cognitive Load Theory

Huge long term memory (with the information stored in schemas) and a limited working memory.
- Schemas allow us to bypass our working memory limitations by chunking information.
- They allow us to ignore the huge amount of detail coming from our senses and instead integrate with our existing schemas.
- eg. it's much easier to memorise SYSTEM than YSMSTE.
- "Automated Schemas"
Worked Examples instead of Means-Ends Analysis
- It's better to give new users a quick (or even not so quick) 'worked example' of how the interface works/how to use it, than just let them work it out for themselves.
Split Attention Effect
- e.g. "See figure 16.", "Refer to page 26"... "Requires us to mentally integrate information that is physically split."²
- Solution: physically integrate the material.
- Don't force users to recall information from a previous screen.
The Redundancy Effect
- It is better to omit redundant information as it generally just confuses people.
Expertise Reversal Effect
- It's better to assume your audience is novice, if you are unsure.
- Novices need lots of worked out examples
Reduce Search (reduces cognitive load)
- By using a consistent layout
- By reducing visual clutter
Diagrams can reduce cognitive load
Modality Effect
- Separate working memories for audio and visual senses.
- Therefore presenting information both visually and aurally allows for a greater total working memory size than presenting it just visually or just aurally. (But we should consider users with disabilities, so taking advantage of this by presenting some information only visually and some only aurally is not a good idea.)

Some HCI Applications

The Training Wheels approach involves restricting the features of an interface for novices until they become more experienced, at which point advanced features are enabled.

Memory

(From a psychologist's perspective).

Memory depends on context. For example, give people the word list night, bed, tired, dark, dream... and then ask them to recall it: they will often recall "sleep" even though it was never mentioned. Giving people context beforehand helps them store the information and recall it better.

Miller's theory is that only 7 ± 2 chunks of information can be held in short-term memory at any one time. (But this doesn't mean, say, only put seven items in any menu or something like that. This is only for short-term memory and when the information comes and goes, not when it can be rescanned.)

Long Term Memory

[caption id="attachment_794" align="aligncenter" width="360" caption="A Taxonomy of Memory"][/caption]

Explicit and Implicit Memory

"Imagine that you learn a list of items and are then required to recall or recognise them. This memory test would be accompanied by conscious awareness that you were remembering. Imagine that a considerable time later, a test of recall or recognition revealed no memory for the items. However if you were given the original list to relearn there would probably be some savings in learning time (i.e. you would take less time to learn the list the second time, even though you were not aware of your memory of the items). This is the distinction between explicit memory, in which remembering is accompanied by either intentional or involuntary awareness of remembering, and implicit memory, in which remembering is not accompanied by awareness (Graf & Schacter 1985; Schacter 1987)." -- (Walker, "Chapter 9: Memory, Reasoning and Problem Solving". pg. 262 (sorry I don't have the title))

Not really related, but a good thing to hear a text book say,

"Finally, some long-enduring memories are for passages that our teachers have drilled into us... The interesting thing about these memories is that they are preserved as they were memorised, in a very literal form, in exact wordings (Rubin, 1982). The memory code is not transferred from literal to semantic. In fact, the words are often remembered mechanically, with almost no attention to their meaning." --(Walker, "Chapter 9: Memory, Reasoning and Problem Solving". pg. 267 (sorry I don't have the title))

The method of loci.

The context in which a memory is encoded affects its retrieval. For example you may not initially recognise your neighbour on the train, as you are not used to seeing them there.
People are much better at recognising things than recalling things.

Obstacles to Problem Solving

Unwarranted Assumptions
- Example given in class. "A man married 20 women, yet he broke no laws and never divorced. How? He was a priest."
Seeing new Relationships
Functional Fixedness
- Being an expert at a system, you may not see things that a novice would see.
- You avoid using things in an unconventional way.
- New users may find new or unintended uses of the system.
The Set Effect
- We may not notice that two similar problems actually need to be solved in different ways.

External Cognition

People use external representations to extend or support their ability to perform cognitive activities. For example, pens and paper, calculators, etc. We do this to,

reduce memory load
- eg. post-it notes, todo lists. But the placement of say post-it notes is also significant.
offload computation
- eg. pen and paper to solve a large arithmetic problem (the mechanical kind).
annotate
- modifying or manipulating the representation to reflect changes

Experts vs. Novices

What distinguishes an expert is their large knowledge base stored in schemas.
Declarative (facts)/procedural(how to do) knowledge.
Skill acquisition.
- Cognitive stage (learn facts, encode declarative knowledge),
- Associative stage (procedural knowledge),
- Autonomous stage.
Novices tend to use means-ends analysis (uses a lot of trial and error) to solve problems. Experts tend to use their knowledge stored in schemas to apply and solve the problem (ie. past experience).
In software can have novice (limited functionality) and expert modes. (Could be different applications Photoshop Elements vs. Photoshop Pro, or just hide advanced functionality for novices by default eg. >Advanced Options which is clicked to show more functions.)
IDEA: Could provide popup hints to intermediate users to explain expert functions (eg. what's going on under the hood), or more advanced options (eg. keyboard shortcuts).

Visual Design

Alignment of items in an interface makes it easier for users to scan the screen.
Grouping
Colour
Gestalt Principles
Menu design (see ISO 9241)

Three types of icons,
- similar (eg. a file for a file object)
- analogical (eg. scissors for cut)
- arbitrary (eg. X for delete or close)
Can add text near the icon to make it easier for newbies, while still allowing experts to ignore it and just glance at the icon.

Internationalisation

Differences around the world,

character set
keyboard layout
text direction
language
icons
date, time, currency
calendars

Internationalisation (i18n) refers to designing and developing a software product to function in multiple locales. Localisation (L10n) refers to modifying or adapting a software product to fit the requirements of a particular locale. This could include translating text, changing icons, modifying layout (eg. of dates).⁵

A locale is a set of conventions affected or determined by human language and customs, as defined within a particular geo-political region. These conventions include (but are not necessarily limited to) the written language, formats for dates, numbers and currency, sorting orders, etc.⁵
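As a small illustration of a locale-dependent convention, here is a Python sketch that formats the same date for Australia (DD/MM/YYYY) and the US (MM/DD/YYYY). The hard-coded format table is a simplification I've assumed for the example. Real software would normally pull these conventions from the platform's locale data rather than hard-coding them.

```python
from datetime import date

# Hypothetical lookup table of date layouts keyed by locale name.
DATE_FORMATS = {
    "en_AU": "%d/%m/%Y",  # day first
    "en_US": "%m/%d/%Y",  # month first
}


def format_date(d, locale_name):
    """Format a date using the layout conventional for the given locale."""
    return d.strftime(DATE_FORMATS[locale_name])
```

The same date, 2nd December 2009, comes out as 02/12/2009 for en_AU and 12/02/2009 for en_US, which is why a bare numeric date is ambiguous without knowing the locale.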

Accessibility

Some clauses relating to requirements for Australian web sites are in the Australian Disability Discrimination Act (1992).

Quantification

A way to test an interface different to usability testing.

GOMS

Goals (eg. send an email)
Operators (eg. double click)
Methods (recalling what to do/how to do)
Selection Rules (deciding which method to use to achieve the goal)

Keystroke Level Model

K (keying) - 0.2s - Press (and release) a keyboard key, or click the mouse. (Click and drag is only 1/2 K).
P (pointing) - 1.1s - Moving the mouse to a location on the screen.
H (homing) - 0.4s - Moving between the keyboard and mouse.
M (mentally preparing) - 1.35s - Preparing.
R (computer responding) - Waiting for the computer to respond.
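Using the operator times listed above, a task estimate is just the sum over the sequence of operators. A minimal Python sketch, where the string encoding of a task (eg. "HMPK") and the `response_time` parameter for R are my own assumptions, since R depends on the system being measured:

```python
# Operator times in seconds, as listed above.
KLM_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}


def klm_estimate(operators, response_time=0.0):
    """Estimate task time in seconds for a string of KLM operators.

    Each 'R' in the string costs response_time seconds (system dependent).
    """
    total = 0.0
    for op in operators:
        total += response_time if op == "R" else KLM_SECONDS[op]
    return total
```

For example, moving a hand to the mouse, thinking, pointing at a button and clicking it is "HMPK", which comes to 0.4 + 1.35 + 1.1 + 0.2 = 3.05 seconds.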

Fitts's Law

In the field of ergonomics research.
Used to predict the time to move the cursor to a target.
$latex T = A + B \times \log_2 \left ( \frac{D}{S} + 1 \right )$
A and B are experimentally determined constants. (Raskin used A = 50, B = 150).
D is distance between start and target
S is size of target (just dealing with 1 dimension here).
Lesson: The larger the target the faster one can move the mouse to that location.
Lesson: Targets at the edge of the screen have an infinite size, so they are fast to navigate to. (Problem: if you use edge bindings a lot your mouse will physically move further and further away, so the user may need to be constantly picking it up and moving it.)
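The formula above is easy to play with in code. A Python sketch using the A = 50, B = 150 constants mentioned above as defaults. I'm assuming these give T in milliseconds, and that D and S are in the same units as each other (eg. pixels).

```python
import math


def fitts_time(distance, size, a=50.0, b=150.0):
    """Predicted time to move the cursor to a target.

    distance: distance from the start position to the target.
    size: size of the target along the direction of movement.
    a, b: experimentally determined constants (defaults assumed to be in ms).
    """
    return a + b * math.log2(distance / size + 1.0)
```

For a target 70 units away and 10 units wide this gives 50 + 150 × log2(8) = 500. Doubling the target's size to 20 brings it down to about 375, which is the "larger targets are faster" lesson in numbers.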

References

[1] Sharp, Rogers, Preece. (2006) Interaction Design: Beyond human computer interaction. 2nd Ed.

[2] Marcus, Nadine. (2009) COMP3511 Cognitive Load Theory Lecture Slides.

Woo, Daniel. (2009) COMP3511 Lecture Slides.

Norman, Donald. (1988) The Design of Everyday Things.

[5] http://www.mozilla.org/docs/refList/i18n/

Tags: comp3511, computing.

Entries from December 2009.
