A small observation regarding usability and Gnome 3

17th September 2011

I've been running Nautilus 3.0.2 for the last few weeks, I noticed the delete key didn't actually delete files so I've been painstakingly deleting files by right click then "Move to Trash" or "Delete" from the menu. I always thought this was a bug and it would get fixed soon, but after getting sick of this workaround I finally did a Google search and found that this is by design. Now you need to use Ctrl+Del to send to trash and Shift+Del to skip the trash.

I think this is a poor choice, my reasoning is because now you have to use modifier+Del for both send to trash and skip trash. It makes sense to have Del as safe delete (send to trash) and a modified+Del for hard delete (no trash) that is logical and it is easy to remember the difference between the two. But now in this new situation a large majority of users (me included) won't be able to remember which modifier does what. Is it Ctrl or Shift and will just pick one at random, giving you a 0.5 chance of doing the actual type of delete you wanted.

Also I don't see any reason to add more safety to the straight delete key, after all it only goes to the trash so you can recover it if you accidentally hit the key. If you want add a sound upon sending something to the trash as an audio cue.

The number one thing I learned in my Human Computer Interaction course at uni was get feedback from the users (this isn't just asking for feedback, but also observing how they use the software). This is my feedback as a user, and if Gnome developers want to improve usability then they should take in the feedback, i.e. read this.

Tags: computing, usability.

I'm halfway towards an online home...

31st July 2011

Just recently I signed up with Linode for a virtual private server. I should have done this a long time ago but payment methods and too much choice lead me to put it off. So far I'm quite happy with the price and the quality of the product.

I think I would have preferred just buying a new dedicated machine myself, however unless you're in the CBD you can't get upload speeds much more than 1.5Mbps around here (if you are lucky), which is not really fast enough, hence I've gone with the 3rd party hosting option. Perhaps if this NBN the government keeps taking about is turned on and not too expensive I'll switch. In the mean time it seems silly that Australians need to host content that is mainly consumed by Australians overseas just because of the price difference.

For example if A wants to host content to Australians, but it is cheaper to do so from overseas, then A will mostly likely host their content overseas. This just means Australia has an even greater disturbance in the overseas traffic up/down ratio (i.e. we pull more than we push), which in turns means our local ISPs have to pay more compared to the overseas ISPs to lay the cables, this in turn leads to higher internet costs for Australians compared to overseas ISPs.

At any time our local ISPs could provide incentives for keeping traffic local inside the country by making overseas traffic more expensive than local traffic (for both people hosting and consuming), which would help Australians host locally and in turn even up the balance. It also means that the overseas pipes won't need to be as thick (because for sites which mostly get Australian visitors, hosting locally means the total amount of cross country traffic for that site is less, hence more efficient).

There is not much I can do about this, so I'm giving in and hosting overseas with the hope that one day things will return to sanity, but in the meantime I haven't been left off-line.

Anyway, back to Linode, I having a lot of fun at the moment setting it up and getting things working.

I'm running Debian 6.0, with unattended-upgrades, lighttpd, ufw, etckeeper (and soon either exim or postfix and possibly I'll migrate this wordpress blog across). I have run through http://www.debian.org/doc/manuals/securing-debian-howto/ it's a bit dated and I'm by no means following everything, but none the less it's still a nice read.

I've set up ssh with protocol 1 disabled (securing debian howto says it has some design flaws, so why enable it if I don't need it?), publickey authentication only, fail2ban, and only accepting traffic to the ssh port from IP ranges I know I'll be connecting from (I know if I've got security via publickey auth, I don't really need this, but it can't hurt... at least my logs don't fill up with as many break in attempts.).

However since one can always log into the machine via the console at linode.com, security here comes down to the weakest link of web based username/password and ssh publickey auth (ignoring all the other threats like compromised VM separation, compromised VM host, physical security, etc.. stuff I have no control over).

I have set up lighttpd as RAM is limited and I read that this (and nginx) are better than Apache httpd in this regard. I have also set up munin and munin-node (and adapted /etc/munin/apache.conf for lighttpd). I would prefer the munin stats not be appreciable to the world, so I just expose it to localhost only and access it via an SSH tunnel.

Additionally, while trying to debug lighttpd configuration I noticed that it wasn't responding to nc. Apparently lighttpd will only respond to my requests when lines end with \r\n but just typing away in the terminal as input to nc and using the Enter key I only end lines with \n. Apparently this is per the HTTP spec and it only works with Apache because it isn't as strict about this.

Next I have to see about getting a domain name. This is a huge problem in itself, but it since,

the IP can change at any time (and will change if I move away from Linode),
you can't have sub-domains to separate parts of the site, and
you can't really do email

I don't really have a choice. Unfortunately the current ICANN DNS is the only real option at the moment, so I'm just going to have to pay up, and try to avoid having some details which I don't want listed, listed in WHOIS. At the moment I'll probably go for a .id.au domain.

I'll probably move this blog across once I set up a domain name, so more news to come on this later.

Update: I've registered with gandi.net. My site is now available at http://tianjara.net/

Tags: computing, web.

Operating Systems Notes

31st July 2010

Here are are some rough notes I put together as part of revision for a uni course. They are heavily based on the course lecture notes by Kevin Elphinstone and Leonid Ryzhyk. All diagrams are sourced from those lecture notes, some of which are in turn from the text book, A. Tannenbaum, Modern Operating Systems.

OS Overview

The OS hides the details of the hardware and provides an abstraction for user code.
The OS is responsible for allocating resources (cpu, disk, memory...) to users and processes. It must ensure no starvation, progress, and that the allocation is efficient.
The kernel is the portion of the OS running in privileged mode (hardware must have some support for privileged mode for the OS to be able to enforce the user and kernel mode distinction).
When the hardware is in privileged mode all instructions and registers are available. When the hardware is in user mode only a limited sets of instructions, registers and memory locations are available. For instance when in user mode, it is not possible to disable interrupts because otherwise a user could ensure that the OS never gets control again..
A process is a instance of a program in execution.
A process in memory usually has at least three segments. The text segment for the program code (instructions), a data segment called the heap for global variables and dynamically allocated memory, and a stack segment used for function calls.
A process also has an execution context which includes registers in use, program counter, stack pointer, etc.
In a Monolithic OS different parts of the OS like processor allocation, memory management, devices and file systems are not strictly separated into separate layers but rather intertwined, although usually some degree of grouping and separation of components is achieved.

System Calls

The system call interface represents the abstract machine provided by the operating system.
man syscalls
Syscall numbers
hardware syscall instruction

Processes and Threads

Processes can contain one or more threads. These threads are units of execution, and always belong to a process.
A process can terminate by,
- normal exit (voluntary), usually returning EXIT_SUCCESS
- error exit (voluntary), usually returning EXIT_FAILURE
- fatal error (involuntary), eg. segfault, divide by zero error
- killed by another process (involuntary), eg. pkill (sending the SIGTERM signal)
Thread states
Running to ready could be from a voluntary yield, or end of time allocated by the CPU, so the scheduler moves the thread to the Ready queue to give another process a go.
Running to blocked could be from a calling a syscall and waiting for the response, waiting on a device, waiting for a timer, waiting for some resource to become available, etc.
The scheduler (also called the dispatcher) decides which thread to pick from the read queue to run.
The idea is that a process with multiple threads can still make process when a single thread blocks, as if one thread blocks, the others can still be working. In more recent times, it allows a process to have threads running on multiple CPU's on modern multi-core machines.
The other model is finite-state machine where syscall's don't block but instead allow the program to continue running and then send the process an interrupt when the syscall has been dealt with. This is not as easy to program though.

During the system initialisation background processes (called daemon's on linux, service's on windows) and foreground processes can be started.

Threads Implementation

Kernel-threads vs. User-level threads.
User-level threads
- Thread management is handled by the process (usually in a runtime support library)
- The advantages are,
  - switching threads at user-level is much cheaper than the OS doing a context switch
  - you can tune your user-level thread scheduler, and not just use the OS provided one
  - no OS support for threads needed
- The disadvantages are,
  - your threads must yield(), you can't really on an interrupt to give your scheduler control again (this is known as co-operative multithreading)
  - cannot take advantage of multiple CPU's
  - if one of your user-level threads gets blocks by the OS, your whole process gets blocked (because the OS only sees one thread running)
Kernel-level Threads
- Thread management is done by the kernel through system calls
- The downside is thread management (creation, destruction, blocking and unblocking threads) requires kernel entry and exit, which is expensive
- The upside is we now have preemptive multithreading (the OS just takes control when it wants to schedule another process, so no need to rely on the thread to yield), can take advantage of a multiprocessor, and individual threads can have isolated blocking.
A Thread switch can happen in between execution of any two instructions, at the same time it must be transparent to the threads involved in the switch. So the OS must save the state of the thread such as the program counter (instruction pointer), registers, etc. as the thread context. The switch between threads by the OS is called a context switch.

Concurrency

A race condition occurs when the computation depends on the relative speed of two or more processes.
Dealing with race conditions,
- Mutual exclusion
  - Identify the shared variables,
  - identify the code sections which access these variables (critical region)
  - and use some kind of mutual exclusion (such as a lock) to ensure that at most only one process can enter a critical region at a time.
- Lock-free data structures
  - Allow the concurrent access to shared variables, but design data structures to be designed for concurrent access
- Message-based communication
  - Eliminate shared variables and instead use communication and synchronisation between processes.
Mutual Exclusion - enter_region() and leave_region().
Hardware usually provides some mutual exclusion support by providing an atomic test and set instruction.
A uniprocessor system runs one thread at a time, so concurrency arises from preemptive scheduling. But with multiprocessor machines concurrency also arises from running code in parallel using shared memory.
The problem with the approach to mutual exclusion so far is that when process B is blocked, it just sits in a loop waiting for the critical region to become available. Can fix this by using sleep and wakeup. We use a mutex lock for this.
Blocking locks can only be implemented in the kernel, but can be accessed by user-level processes by system calls.
A mutex allows at most one process to use a resource, a semaphore allows at most N processes.
Producer-consumer problem
- Producer can sleep when buffer is full, and wake up when space available.
- Consumer can sleep when buffer is empty and wake up when some items available.
- Can do using semaphores or condition variables.
Monitors are set up to only allow one process/thread to operate inside it at once, with extra requests put in a queue. Implemented by the compiler.
Condition variables. cv_create, cv_destroy, cv_wait, cv_signal, cv_broadcast
Dining philosophers problem. Need to prevent deadlock.

Deadlock

A set of processes is deadlocked if each process in the set is waiting for an event that only another process in the set can cause.
Four conditions for deadlock:
- Mutual exclusion condition (device not shareable amongst threads)
- Hold and wait condition (a resource can be held, but then block awaiting more resources)
  - Can attack this by requiring all process ask for all resources at the start, and only start if they are granted.
- No preemption condition - previously granted resources cannot be forcibly taken away
- Circular wait condition
  - Always request resources in the same order
Dealing with deadlock:
- ignore the problem
- detect and recover
- dynamic avoidance (careful resource allocation) - we require some information in advance like which resources and how many will be needed before process starts.
  - Bankers algorithm
- prevention (negate one of the four conditions for deadlock)
Starvation is slightly different to deadlock as the system can be making progress, but there are some processes which never make progress.

File Systems

File systems provide an abstraction of the physical disk. They allow you to store files in a structured manner on a storage device.
The file system abstraction,
Architecture of the OS storage stack,
The application interacts with the system call interface which on linux provides creat, open, read, write, etc.
In the case of modern hard drives,
The disk controller (between device driver and physical disk) hides disk geometry and exposes a linear sequence of blocks.
The device driver hides the device specific protocol to communicate with the disk.
The disk scheduler takes in requests coming from different processes and schedules them to send one by one to the device driver.
The buffer cache keeps blocks in memory to improve read and write times for frequently accessed blocks.
The file system (FS) hides the block numbers and instead exposes the directory hierarchy, files, file permissions, etc. To do this it must know which blocks relate to which file, and which part of the file. It must also manage how to store this on the blocks of the linear address space of the disk, keeping track of which blocks are in use. See Allocation Strategies
The virtual file system (VFS) provides an interface so that different file systems which are suitable for VFS (ie. they have the concept of files, directories, etc..) can be accessed uniformly.
The open file table (OF table) and file descriptor table (FD table) keep track of files opened by user-level processes.

There are many popular file systems in use today. FAT16, FAT32 are good for embedded devices; Ext2, Ext3, Ext4 are designed for magnetic disks, ISO9660 is designed for CD-ROM's, JFFS2 is optimised for flash memory as you don't want to write to the same location too many times as it wears out. Others include NTFS, ReiserFS, XFS, HFS+, UFS2, ZFS, JFS, OCFS, Btrfs, ExFAT, UBIFS.

Performance issues with standard magnetic disks

To access a block on a disk the head must move to the required track (seek time), then we have to wait until the required block on the track reaches the head (rotational delay). We also have some mostly fixed overhead for the data to be passed between the device and the driver (transfer time).
Total average access time, $latex T_a = T_s + \frac{1}{2r} + \frac{b}{rN}$. Where r is the rotational speed, b is the number of bytes to transfer, N is the number of bytes per track and $latex T_s$ is the seek time.
Seek time is the most expensive, so we want to ensure that our disk arm scheduler gives good performance.
- First-in, First-out (no starvation)
- Shortest seek time first (good performance, but can lead to starvation)
- Elevator (SCAN) (head scans back and forth over the disk. no great starvation, sequential reads are poor in one of the directions)
- Circular SCAN (head scans over the disk in one direction only)

Block Allocation Strategies

Contiguous Allocation
- Gives good performance for sequential operations as all the blocks for a file are together in order
- though you need to know the maximum file size at creation
- as files are deleted, free space is fragmented (external fragmentation)
Dynamic Allocation
- Blocks allocated as needed, hence the blocks for a file could be all over the place on the disk
- need to keep track of where all the blocks are and the order for each file

External fragmentation - space wasted external to the allocated regions, this space becomes unusable as its not contiguous (so you have lots of small spaces but you need a larger space to fit a whole file in)

Internal fragmentation - space wasted internal to the allocated memory regions, eg. you get a 4K block allocated and only use 1K, but the OS can't give the leftover 3K to someone else.

Keeping track of file blocks

File allocation table (used for FAT16 and FAT32)
inode based FS; keep an index node for each file which stores all the blocks of the file.

Directories are stored as normal files but the FS gives these file special meaning.
A directory file stores a list of directory entries, where each entry containing the file name (because Ext2/3/4 store these with the directory file, not the inode for the file), attributes and the file i-node number.
Disk blocks (sectors) have a hardware set size, and the file system has a filesystem set block size which is sector size * 2^N, for some N. A larger N means large files need less space for the inode, but smaller blocks waste less space for lots of small files.

Ext2 inodes
The top bunch of data is file attributes. Block count is the number of disk blocks the file uses.
The direct blocks area stores index's for the first 12 blocks used by the file.
The single indirect is a block numbers to a block which contains more block numbers.w

System call interface

At the system call level a file descriptor is used to keep track of open files. The file descriptor is associated with the FS inode, a file pointer of where to read or write next and the mode the file was opened as like read-only.
Most Unix OS's maintain a per-process fd table with a global open file table.

VFS

Provides a single system call interface for many file systems.
- the application can write file access code that doesn't depend on the low level file system
- device can be hard disk, cd-rom, network drive, an interface to devices (/dev), an interface to kernel data structures (/proc)

Journaling file system keeps a journal (or log) of FS actions which allows for the FS to recover if it was interrupted when performing an action that is not atomic, but needs to be.

Memory Management and Virtual Memory

The OS needs to keep track of physical memory, what's in use and which process is it allocated to. It must also provide applications with a view of virtual memory that they are free to use.
Swapping (sometimes called paging) is where memory is transferred between RAM and disk to allow for more data in memory than we have space for in physical RAM. On base-limit MMU's swapping only allows who programs memory allocation to be swapped to disk, rather than just pages at a time as with virtual memory, hence you can't use swapping here to allow programs larger than memory to run.
If we are only running one program we can give it all of memory (and either run the in part of it, or in some other ROM). However many systems need more than one process to be running at the same time. Multiprogramming also means that we can be utilising the CPU even when a process is waiting on I/O (ie. give the CPU to the other process). But to support multiprogramming we need to divide memory up.
We could divide memory up into fixed size partitions and allocate these to processes, but this creates internal fragmentation (wasted space inside the partition allocated).
Using dynamic partitioning we give each process exactly how much it needs, though this assumes we know how much memory we need before the process is started. This approach also leads to external fragmentation where we have lots of little unallocated gaps, but these are too small to be used individually.
Approaches to dynamic partitioning include first-fit, next-fit, best-fit and worst-fit.
Binding addresses in programs
- Compile time
- Load time
- Run time
Protection - we only want each process to be able to access memory assigned to that process, not other process
- Base and Limit Registers - set by the kernel on a context switch, the hardware handles the rest
- Virtual Memory - two variants
  - Paging
    - Partition physical memory into small equal sized chunks called frames.
    - Divide each process's virtual (logical) address space into same size chunks called pages. So a virtual memory address consists of a page number and an offset within that page.
    - OS maintains a page table which stores the frame number for each allocated virtual page.
    - No external fragmentation, small internal fragmentation.
    - Allows sharing of memory
    - Provides a nice abstraction for the programmer
    - Implemented with the aid of the MMU (memory management unit).
  - Segmentation
    - A program's memory can be divided up into segments (stack, symbol table, main program...)
    - Segmentation provides a mapping for each of these segments using a base and limit.

Virtual Memory

If a program attempts to access a memory address which is not mapped (ie. is an invalid page) a page fault is triggered by the MMU which the OS handles.
Two kinds of page faults,
- Illegal Address (protection error) - signal or kill the process
- Page not resident - get an empty frame or load the page from disk, update the page table, and restart the faulting instruction.
Each entry in the page table not only has the corresponding frame number but also a collection of bits like protection bits, caching disabled bit, modified bit, etc.
Page tables are implemented as a data structure in main memory. Most processes don't use the full 4GB address space so we need a data structure that doesn't waste space.
The two level page table is commonly used,

An alternative is the inverted page table,
The IPT grows with size of RAM, not virtual address space like the two level page table.
Unlike two level page table which is required for each process, you only need one IPT for the whole machine. The downside is sharing pages is hard.

Accessing the page table creates an extra overhead to memory access. A cache for page table entries is used called a translation look-aside buffer (TLB) which contains frequently used page table entries.
TLB can be hardware or software loaded.
TLB entries are process specific so we can either flush the TLB on a context switch or give entries an address-space ID so that we can have multiple processes entries in the TLB at the same time.
Principle of Locality (90/10 rule)
- Temporal Locality is the principle that accesses close together in terms of time are likely to be to the same small set of resources (for instance memory locations).
- Spatial locality is the principle that subsequent memory accesses are going to be close together (in terms of their address) rather than random. array loop example
The pages or segments required by an application in a time window ($latex \Delta$) is called its memory working set.
Thrashing,
To recover from thrashing, just suspend some processes until it eases.

Fetch policy - when should a page be brought into memory? on demand, or pre-fetch?
Replacement policy - defines which page to evict from physical memory when its full and you start swapping to disk.
- Optimal
- FIFO - problem is that age of a page isn't necessarily related to its usage
- Least Recently Used - works well but need to timestamp each page when referenced.
- Clock (aka. Second Chance) - an approximation of LRU. Uses a usage or reference bit in the frame table
Resident Set Size - how many frames should each process have?
- Fixed allocation
- Variable allocation - give enough frames to result in an acceptable fault rate

Input Output

Programmed I/O (polling or busy waiting)
Interrupt Driven I/O
- Processor is interrupted when I/O module (controller) is ready to exchange data
- Consumes processor time because every read and write goes through the processor
Direct Memory Access (DMA)
- Processor sent interrupt at start and end, but is free to work on other things while the device is transferring data directly to memory.

Scheduling

The scheduler decides which task to run next. This is used in multiple contexts, multi-programmed systems (threads or processes on a ready queue), batch system (deciding which job to run next), multi-user system (which user gets privilege?).
Application behaviour
- CPU bound - completion time largely determined by received CPU time
- I/O bound - completion time largely determined by I/O request time
Preemptive (requires timer interrupt, ensures rouge processes can't monopolise the system) v non-preemptive scheduling.
Scheduling Algorithms can be categorised into three types of systems,
- Batch systems (no users waiting for the mouse to move)
- Interactive systems (users waiting for results)
  - Round Robin - each process is run and if it is still running after some timeslice t, the scheduler preempts it, putting it on the end of the ready queue, and scheduling the process on the head of the ready queue. If the timeslice is too large the system becomes sluggy and unresponsive, if it is too small too much time is wasted on doing the context switch.
  - The traditional UNIX scheduler uses a priority-based round robin scheduler to allow I/O bound jobs to be favoured over long-running CPU-bound jobs.
- Realtime systems (jobs have deadlines that must be meet)
  - Realtime systems are not necessarily fast, they are predictable.
  - To schedule realtime tasks we must know, its arrival time $latex a_i$, maximum execution time (service time), deadline ($latex d_i$).
  - tasks could be periodic or sporadic.
  - slack time is time when the CPU is not being used, we are waiting for the next task to become available.
  - two scheduling algorithms,
    - Rate Monotonic - priority driven where priorities are based on the period of each task. ie. the shorter the period the higher the priority
    - Earliest Deadline First - guaranteed to work if set of tasks are schedulable. The earlier the deadline the higher the priority.

Multiprocessor Systems

For this section we look at shared-memory multiprocessors.
There are uniform memory accesses multiprocessors and non-uniform memory access multiprocessors which access some parts of memory slower than other parts. We focus on the UMA MP's.
"Spinlocking on a uniprocessor is useless, as another thread on the same processor needs to release it, so blocking asap is desirable. On a multiprocessor, the thread holding the lock may be presently active on another processor, and it could release the lock at any time. On a multiprocessor, spin-locking can be worthwhile if the average time spent spinning is less than the average overheads of context switch away from, and back to, the lock requester."
Thread affinity refers to a thread having some preferred processor to run on an a multiprocessor machine. For instance if thread A started on cpu0 it may want to be scheduled again on cpu0 rather than cpu1 as parts of the cache may still be intact.
Multiprocessor systems have multiple ready queues, as just one ready queue would be a shared resource which introduces lock contention.

Security

Tags: computing.

Security Engineering Notes

31st July 2010

Here are are some rough notes I put together as part of revision for a uni course.

Security Engineering

CIA
- Confidentiality
- Integrity
- Availability
Some other attributes include,
- Authenticity
- Privacy
- Accountability,
- Non-reputability
- Receipt-freeness (for voting systems)
Methods of attack
- Brute Force
- Deception/Social Engineering
- Calculated attack on a vulnerability
- Insider (both intentional and unintentional/accidental)
  - Mitigation techniques: Keeping logs and conducting audits could. Also least privilege to prevent accidental insider attacks.
- Chance (right place at the right time)
- Accidental
Reliable systems (protect against random failures)
Secure systems (protect against targeted deliberate attacks)
Assets which need to be protected
- Tangible (eg. documents)
- Non-tangible (eg. reputation)

Security Engineering: Protecting assets against threats.
Kerckhoff's Principle - "the security of the cryptosystem must not depend on keeping secret the crypto-algorithm. It must depend only on keeping secret the key."
- Because, its often easier to change the key after a compromise or possible compromise than changing the algorithm,
- and it's usually also harder to keep the algorithm secret (can be reverse-engineered or bought) compared to the key

Crypto

Code (hidden meaning, ie. A spy using a phase that seems normal, but actually has some other hidden meaning)
Plaintext -> E(encryption key) Ciphertext -> D(decryption key)
Symmetric-key cryptography - Encryption key can be calculated from the decryption key and vice versa (usually they are the same though). The sender and receiver must agree upon the keys first. Two types,
- Stream Ciphers - operate one bit at a time (eg. RC4)
- Block Ciphers - operate on a group of bits at a time (eg. DES and AES/Rijndael)
  - Block chaining is used to make each block of cipher text depend on all the previous blocks of plaintext.
Public-key Cryptography (asymmetric-key) - public and private keys where you cannot (or is really hard to) calculate one from the other. (eg. RSA)
- One problem is that an attacker can use an adaptive chosen plaintext attack, where they encrypt a large number of messages using the target's public key. A fix is to pad messages with a long random string.
- Can be used to do digital signatures. (encrypt your message M (though in practice a one way hash is used instead) using your private key, verifiers just decrypt using your public key as only someone with the private key (ie. you) could have successfully encrypted M)
  - Birthday attack - find two messages that have the same hash but only differ by say spaces at the end of the line, and some other key part like dollar amount. Then when you get Alice to sign document A, the signature will be the same for document B (as Alice is signing the hashes and the hashes are the same)
- Public Key Infrastructure - how to ensure that Bob's public key really belongs to the Bob you want to communicate with (tries to solve the key distribution problem).
  - Public Key Certificates
  - Certificate distribution (X.509, GPG)
Most cryptosystems use public-key crypto just to exchange the symmetric key shared key, and then both parties this shared key to communicate with symmetric-key crypto. These systems are known as hybrid cryptosystems. This is done for mainly two reasons,
- public-key algorithms are usually around 1000 times slower than symmetric algorithms.
- public-key algorithms are vulnerable to chosen-plain text attacks.
Cryptanalysis is about determining the plaintext from the key.
Authentication

Extra notes from Schinder 2nd Ed.

Types of attacks,
- cipher-text only
- known plain text
- chosen plain text
- adaptive chosen plain text (you can modify your next choice based on the results of the previous choice)
- chosen cipher text
- known key relationship
- rubberhose (get the person who knows the key to reveal it)
Substitution ciphers
- Simple (monoalphabetic cipher)
  - A → 2 B → 3 C → 1 ...
- Homophonic (where multiple options available such as 5,2 one is chosen at random)
  - A → 5,2 B → 1,3 C → 4 ...
- Polygram
  - ABC → 819 ABB → 234 ABA → 567 ACA → 587 ...
- Polyalphabetic (simple substitution cipher, but the mapping also depends on the position of the input character in the plain text), examples include,
  - Caesar cipher - shift each letter forward or back by K. eg. if K = 1, A → B, B → C ...
  - rot13
  - Vigenere cipher - align the repeated key K over the plaintext
- These can be attacked by statistical cryptanalysis of the letter frequency
Transposition Ciphers - the order of the characters is changed (rotor machines such as the Enigma did this)
- eg. write across, but read down
```
HELLO
WORLD
THISI
SSOME
```
  encrypts to HWTSEOHSLRIOLLSMODIE
One time pads
One-way hash function (also known as message digest, fingerprint, cryptographic checksum)
- pre-image → f(x) → hash value
- A hash function is collision free if it is hard (ie. at best becomes brute force) to generate two pre-images with the same hash value.
- Also a single bit change in the pre-image should on average change half of the bits in the hash value.
Types of random sequences,
- Pseudo-random (looks random, good for applications like writing an random AI game agent)
- Cryptographically secure pseudo-random (unpredictable)
- True randomness (cannot be reproduced with the same inputs)
Zero-knowledge proofs - proving to someone that you know something without revealinfg them what that something is. Two types,
- Interactive
- Non-interactive

RSA

Choose two primes p and q and let n = pq. Choose e such that e and (p - 1)(q - 1) are relatively prime (ie. no common factor and both prime numbers). Let d be a solution of ed = 1 mod (p-1)(q-1). Public key $latex K = (e,n)$, private key $latex K^{-1} = (d,n)$.

$latex E_K(M) = M^e \mod n$

$latex D(M) = E_{K^-1}(M) = M^d \mod n$

Access Control

"An access control policy is a description of who may access what (and how)."
Principle of least privilege - "The powers that an agent in the system is given should be the minimum that they need to have to play their assigned role in the system."
AC can be done at many levels (eg. hardware, os, database, network, application...)
Managing Subjects, Objects and Operations. (can use Groups or Roles to classify subjects)
- Access Control Matrix (central database of all objects, subjects and operations permitted)
- Access Control List (store permissions with object)
- Capabilities (store permissions with subject)
Mandatory AC - central entity sets the AC.
- Multi-level (eg. top secret, secret, unclassified labels...)
  - Bell-LaPadula - no read up, no write down.
Discretionary - each object has an owner, and the owner sets the AC.
Commercial Policies
- Chinese Wall - eg set up rules like if a subject can access X they cannot access Y.
- Dual Control - need two people together to modify X, they cannot do it on their own.
Reference Monitor - some entity that sits between the subjects and objects to implement some policy (relies on trusted entity)

Elections

You want your elections to be,

verifiable - so the voter can check that their vote was indeed counted (and if possible check that everyone else's vote was counted correctly)
correct - votes are counted correctly and the results match calculated correctly
secret (ie. no one knows how you vote, so that you cannot be coerced)
coercion free (so you want to be recite free)

(additional notes, but don't really need to know for exam)

True Democracy (like the Greek's assembly)
Representative Democracy (where we vote in someone to represent us)
Participatory Democracy (where we represent ourself an all have equal say)

Australian Federal Elections are subject to the Mafia problem. Because we have preferential voting that is you can order your preferences that means that for n candidates there are n! ways of voting. So if there are enough candidates and few enough people you can coerce a person into using a specific voting pattern which you can then verify that they indeed did vote that way.
An alternative attack would be to obtain a blank voting paper, fill it in the way you want it to be give it to someone and tell them to submit it and return you a blank one. Although the attacker won't know if the victim has scribbled it out to make it invalid, but at least the attacker has prevented someone from voting.

Security Architecture

Security Design Principles:

Economy of Mechanism - keep things simple
Fail-safe defaults
Complete Mediation - every access request checked
Open design, ie. Kerckhoffs' Principle: Keep the key secret, don't rely on keeping the algorithm secret.
Separation of Privilege
Least Privilege
Least common mechanism
Psychological acceptability

Defence in depth - use many layers of security (or many security perimeters in layers)

Human Factors

People are trusting
People are lazy (to read the manual or change factory defaults)
People are greedy (will exploit the system if given the chance)
People are forgetful (forget to revoke access when terminating employees)
People are selfish
People are stick-beaks (eg. Medicare employees accessing client data for personal interest)

Some strategies for reducing the risk,

logging of insider's actions
honeypots for insiders (eg. fake web server running on the network, or fake data in the database)
security policy for staff to follow

Risk

Risk is not something I would have thought to be in a security course, but now that I think about it there are few if any bullet proof systems, so there is always some risk.

Whether it be secure communication (there is always some risk that an eavesdropper cracks your cryto and reads the message, so its important to weigh up those risks to decide if its worth sending the message in the first place), or be it running a web server (you cannot be sure that your web server doesn't have bugs, or even if you have verified the code to be matching the vulnerability free specification other things can happen like, has your CPU been verified to be bug free, are you sure that a cosmic ray won't flip a bit and subsequently create a vulnerability). So weighing up the risks involved is important to decide how much time and effort you devote to securing a system, or how the system is designed to work.

Business Risk Concepts

Exposure - what is the worst case scenario (or loss)
Volatility - how predictable is the loss
Probability - how likely is that loss

Degree of Risk

       <-- Probability -->
High Exposure/  |  High Exposure/         /\
Low Probability |  High Probability       ||
-----------------------------------    Exposure
Low Exposure/   |  Low Exposure/          ||
Low Probability |  High Probability       \/

Exposure - how widespread is the system? Probability - how easy is it to exploit the system, and how great is the incentive to do so (which relates to how valuable the assets you are protecting are)?

Risk management

Example. The risk of viruses; the mitigation strategy may be to use anti-virus detection, but the residual risk is zero-day attacks.
Carry the risk - just accept it.
Balancing the risk - eg. data backups at multiple locations, distributed servers at multiple locations (perhaps even running different systems, so for instance if you want to keep your web site up and running you may load balance between both a Linux box running Apache, and a windows box running IIS, so that an attack found in one is not likely to be present in the other so the attacker can't take both offline with the one attack.)
Mitigate it - reduce the risk
Transfer the risk - eg. in finance, get insurance

Tags: computing.

A Great Gnome Panel Applet + A UPnP Enabled Router

26th July 2010

I have just discovered that my ADSL router supports UPnP, and that this provides an interface to access 3 important bits of information from the router (external IP address, total bytes sent and total bytes received). I had previously been scraping the router's web interface to grab the external IP. As for the bytes sent and received, I didn't even know the router had a method of reporting these.

My first instinct was to look for a Perl library for UPnP, I found two. One in the Ubuntu repositories (and in CPAN) http://search.cpan.org/perldoc?Net::UPnP::GW::Gateway and another which appears to be in neither, http://perlupnp.sourceforge.net/.

I tried out the first one and after some fiddling get it working (though I haven't yet been able to eliminate the 2 seconds it spends blocked, ie. not executing on the CPU but still not complete).

Next I found a great program that allows you to place an arbitrary command's output in the Gnome Panel, http://code.google.com/p/compa/. Which resulted in,

The Perl script I use to provide the output to the Compa applet is at http://github.com/andrewharvey/perlmisc/blob/master/upnp_router_inoutbytes_to_compa.pl

Tags: computing, dev.

The URL shown in Firefox's status bar isn't always where you end up.

27th June 2010

Unless you are aware of the more technical details of web browsing its reasonable for the average web user to assume that if you hover your mouse over a link and Firefox tells you in the status bar that the link is to http://foobar.com/, then clicking on the link will actually take you to http://foorbar.com/. But sadly this is not the case for out of the box Firefox.

Take a look at a Google search results pages. Hovering your mouse over the links gives one URL in the status bar, yet clicking the link actually takes you somewhere else.

Here is a sample of the HTML for the link,

<a href="http://www.example.com/page1.html" onmousedown="return rwt(this,'','','res','1','$ID1','&amp;sig2=$ID2','$ID3')">Page Title</a>

Hovering over the link, you see in the status bar http://www.example.com/page1.html, but as soon as you mousedown javascript goes ahead and changes the href to something else (keep in mind that Firefox only goes to the new link on mouserelease), so that when you release the mouse your browser takes you to the replacement URL.

The problem I see with this is what if some unsuspecting user checks the link in the status bar, clicks the link thinking they are going one place then get taken somewhere else. This becomes even more of a problem if that site is susceptible to certain kinds of XSS attacks. So you can think your going to https://paypal.com/, and the URL bar after clicking the link goes to https://paypal.com/ but yet you've actually got some malicious js or html injected in the paypal.com/ page that you have loaded in your browser window.

I originally thought this was clickjacking, but the Wikipedia article describes that as when a transparent layer on top of the page provides the concealed URL.

Tags: computing, web.

Australian States Map/Graph API

19th February 2010

I've managed to do a couple things all in one here. I've made use of some Geoscience Australia Creative Commons licensed material, in a nice little program with a web API, and I've aggregated some data from the myschool scraper and parser. Putting them all together gives some nice images like this.

The program for generating these images basically takes an SVG template file with placeholder markers and then fills these values based on the CGI parameters. The API is fairly simple so one should be able to work out how to use it from the example in the README file. Here are the files I used to make the graphs (and the svg versions as Wordpress.com won't let me upload them to here).

ps. This gets cut off when viewing it from the default web interface of this blog, use print preview or even better look at the RSS feed to see the cut off parts. Also I tried to ensure the accuracy of the data, but I cannot be 100% sure that there are no bugs, in fact there are discrepancies with the averages I get from my scrape of myschool and the averages provided in the report on the NPLAN website. The numbers I get seem to be consistent (ie. the state rankings seem mostly the same), but nonetheless not exactly the same as those reported in the report. Although I would be very surprised if all the numbers I got were exactly the same as in the report. I mainly did this to use map/graph code I wrote, so if you really care about how certain state averages compare in these tests look at the reports on the NPLAN website.

The lighter the colour the higher the number.

Primary

	2008	2009
Literacy
Numeracy
All

Secondary

	2008	2009
Literacy
Numeracy
All

Tags: computing, education, graphics.

A Look into the myschool.edu.au Data

7th February 2010

After overcoming a few problems I managed to write a scraper for the myschool.edu.au data. Unfortunately they choose to put data in HTML, so the scraping process may have led my data to have some unknown errors. I publish (see bottom) the scraped data as I believe that per the IceTV v Nine Network [2009] HCA 14 case, any data that my scraper produces as output from the HTML input is not subject to the copyright of the original HTML content (this also means that I cannot publish the HTML pages) and the Telstra Corporation Limited v Phone Directories Company Pty Ltd [2010] FCA 44 case, that the raw data that is scraped is not subject to copyright.

I wish I could bzip2 up all those HTML pages and give them to you just to save your download, because the myschool.edu.au site doesn't compress their pages when I tell them I accept gzip over HTTP, so it took up almost 2GB of quota to download all the HTML pages, oh well.

Some preliminary statistics from the data.

There are a total of 9316 (or 9279 after I ran a newer scraper at a later data) schools. Of these,
- 1538 are Secondary (of which 30% are non-government and 70% are government)
- 1407 are Combined (of which 68% are non-government and 32% are government)
- 6054 are Primary (of which 23% are non-government and 77% are government)
- 317 are Special (of which 15% are non-government and 85% are government)
So,
- 6451 are Government (69%),
- 2865 are Non-government (31%)
These 9316 schools contain a total of 3 366 351 students of which,
- 1 745 224 are male (51%)
- 1 651 127 are female (49%)
The most schools in 1 postcode is 40, which are all in the postcode 2480.
The average student attendance rate is 92.007%
- 91.870% for Government, 92.335% for Non-government
- 89.205% for Secondary, 92.982% for Primary, 90.675% for Combined, 89.170% for Special.
There are a total of 265 960 teaching staff (full time equivalent of 241 408) and 124 117 non-teaching staff (full time equivalent of 86 511.9).

I could report a lot of stats like these above, all you need is a basic knowledge of SQL, but as much as I enjoy working out these stats I find graphs and graphics much more intuitive, so that is up next. Because of the vast dimensions to the data you can make all kinds of graphs so what would be best is a system to draw graphics dynamically which allows the user to decide what is graphed, but this takes more work so that is on the todo list.

I've also looked into doing some heatmaps using the geographical location of the schools, I could have used Google Maps, or I could use OpenStreetMap and libchamplain. Both have pros and cons... But for now I used Google Maps because their API is simple and I've always wanted to experiment with it, the downside is I'm not sure about the copyright of their maps and subsequently any derivative works. This image is just a test showing a dot for each school in the system, but its very easy to change the colour, size and opacity of the dots based on features of the school.

Another test (some markers will be missing or in the wrong place, like the ones in NZ!),

[caption id="attachment_1023" align="aligncenter" width="450" caption="Google Earth map showing markers for Australian schools (though not completely accurate). (Copyright notices in image)"][/caption]

Source code? http://github.com/andrewharvey/myschool

Don't want to scrape and parse but want the raw data in a usable form? http://github.com/andrewharvey/myschool/tree/master/data_exports/

Extra thought: Currently the code uses Google's API for geting the geolocation of the school, I could use OpenStreetMap for this also, however it would take more investiagtion to determine what tools exist. At the moment all I know is I have an .osm file of Australia, but schools aren't just one dot, they are a polygon so unless I find some other tools which probably exist, I would need to (probably) just use one of the points in the polygon.

Or I could used the Geographic Names Register for NSW, but that is just for NSW... http://www.gnb.nsw.gov.au/__gnb/gnr.zip

Tags: computing, education, politics.

anonymous FTP

22nd January 2010

So I've started reading a book about networks, and to complement this I've been taking a closer look at my network traffic in Wireshark (really great tool, by the way.).

So I pick an ftp site that I know, ftp://download.nvidia.com/ and see what happens in Wireshark when I visit it in Firefox. At the FTP application level this is what happens,

ftpsite to me: 220 spftp/1.0.0000 Server [69.31.121.43]\r\n
me to ftpsite: USER anonymous\r\n
ftpsite to me: 331 Password required for USER.\r\n
me to ftpsite: PASS mozilla@example.com\r\n
ftpsite to me: 230- \r\n
               230- ---------------------------------------------------------------------------\r\n
               230- WARNING:  This is a restricted access system.  If you do not have explicit\r\n
               230-           permission to access this system, please disconnect immediately!\r\n
               230 ----------------------------------------------------------------------------\r\n

But Firefox does not disconnect. So I did some more research and I found no references to "anonymous" users in either RFC 959 (FTP) or RFC 3659 (extensions to FTP). (Though there are references in latter RFCs, see RFC 2228).

The thing I find disconcerting is that I don't think I have "explicit permission" to access this system. I (or rather Firefox) just guessed a username and password and they happened to let me in (what if I guessed a different username and password that wasn't anonymous and it let me in?). If the RFC specified that a user of anonymous requires no password, or any password, then I would assume that the FTP server is granting me permission, but I assume rather people just started using anonymous as the user and it caught on...

The real problem here is that there are laws which govern such areas, and it doesn't help that that I don't understand what PART 6 - COMPUTER OFFENCES of the CRIMES ACT 1900 (NSW) is saying.

Tags: computing, law.

To fix broken audio, unplug faulty USB device.

19th January 2010

How weird is this, just recently when I started up my computer lots of stuff was broken, no audio (and /proc/asound/cards was empty, normally it has "0 [Intel ]: HDA-Intel - HDA Intel\nHDA Intel at 0xfa100000 irq 22"), libsensors weren't reporting any values (eg. no CPU temp reported), eth0 dissapeared from NetworkManager, and probably a host of other things that I didn't notice. Restarting didn't fix it.

Well long story short, it turns out that everything magically fixed when I unplugged a USB hard drive that was plugged in. I had seen a lot of concerning messages sent to /var/log/messages from the kernel about it,

Jan 19 09:45:00 host kernel: [ 564.100026] usb 1-3: reset high speed USB device using ehci_hcd and address 2 Jan 19 09:45:00 host kernel: [ 564.237716] sd 8:0:0:0: [sdd] Unhandled error code Jan 19 09:45:00 host kernel: [ 564.237719] sd 8:0:0:0: [sdd] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK

that repeated every so often, but I never thought that a dodgy USB device would break the kernel from doing some of its job.

Tags: computing.

An Idea for a Media-Centreish Interface for a UNIX terminal/shell

18th December 2009

Back in July or August this year when I was going through the notes on unix shells for COMP2041 I came up with idea of doing a shell/terminal interface that looked like an interface for a media centre ie. rather than looking like this,

it would look "like" this (obvious not exactly the same but similar feel),

[caption id="attachment_970" align="aligncenter" width="450" caption="XBMC skin MediaStream by Team Razorfish. http://xbmc.org/wordpress/wp-content/gallery/mediastream/viewoptions.jpg"][/caption]

The key principles I had in mind were,

nice aesthetics
interface similar to a game or media centre
features easily discoverable for new users

My original motives were that I was just learning all these core-utils commands (ls, cat, mkdir, cp, mv...) and I found that although the shell had tab completion and apropos, it didn't categorise these or give them in a list of common commands. Then I came up with more abstract ideas,

categorise common commands and give help on them. eg. File System: ls, cd, cd .., mkdir. Filters: cat, wc, grep...
parse commands and their argument list based on common styles (eg. GNU style, short -las and long -l --all --size) and provide contextual information (eg hovering over an --argument gives a one line message about what that argument does (perhaps parse the man file to get this info)) also auto-layout the command line as per the argument style.
it could also parse the pipe lines and display these much more visually so its easier to see what's piping into what and allow the user to easily change the order/flow of the pipeline.
process management. don't force the user to remember Ctrl+C and Ctrl+Z and bg and fg commands, show these as pause and stop icons.
redirection of output should be easily changed in the interface rather than just adding a < or > to the command line (and allow one to redirect STDOUT to a file AFTER the command has already run (because currently you would need to run the command again, or copy and paste and put up the with new lines that gnome-terminal puts in))
bookmarking commands (including argmunts) so that those common ones you use that you haven't remembered yet are quick and easy to use.
colour STDERR in red.

I haven't really thought about it on a technical level, but it may not be so portable as say gnome-terminal. I don't know the really differences among different shells out there so I don't know how dependent this is on bash or even if it ties bash and the terminal together, but from a beginner user perspective I don't care about this.

The cloudy idea I have in my mind is basically a GUI/CLI hybrid but I think such a program would need to be careful not to go too far, because it could be made so that after doing an ls -la you could click on a file in the list and rename it, but then we are turning into a file manager in list mode (like Dolphin or Nautilus) which is unnecessary as those tools already exist.

I'm aiming to do come up with a list and more detailed list of requirements and a set of activity and use case scenarios, along with some wire-frame prototypes for such an interface soon. But for now I just needed to get it all out of my head an onto paper (and also public (in case someone tries to patent a concept)).

Tags: computing, sh.

The Features of My Utopian Music Player

11th December 2009

Ideally I would like to write my own music player because I don't really like any that are currently available (Amarok 1.4, Amarok 2, Songbird, Rhythmbox, Banshee, Exaile). I like features from each but none seem to fit all my needs. All the time I keep rethinking what I should do and I still cannot decide. Anyway this is what my ideal music player would be like...

Backend Database
- The backend metadata would be stored in an external Postgresql database, with the option for using sqlite for people who don't want to set up and run postgresql.
- The schema should be good and documented, so that a user can read and write into the database. If not at least give an interface to allow this.
- Full playback information. I want my music player to store the timestamps of every time a given song has been played. I want history too, for instance the times of when the song rating was changed.
Collection Manger
- I want the music player to be the library not just the librarian. I want to give it a file (say an MP3), along with details such as song title, artist, etc. and I want it to take that file and store it on the hard disk in a nice file structure (like iTunes does). Amarok 1.4 attempts to do this but its really hard, because initially it will just add the file to your playlist and not move it across to your collection, and even then if you change the details say the artist it will not correct this in the folder structure used to store that file.
- Tagging songs. Amarok does this well.
Web scraper
- Album art and lyric scraping (but who knows you might get sent to jail for writing a scraper for a specific site which you do not know if they have permission to distribute those certain copyrighted lyrics/album art which were available at the time of writing the scraper). Amarok seems to do this well.
Acoustic Analysis
- Surely there are algorithms to guess the BPM (beats per minute) of a song. I want that integrated into the music player.
- I need a moodbar so I can navigate a song, and to gather contextual information on how the style of the music varies over the song.
- I don't know much about acoustics, but there must be other algorithm which give meaningful measures of audio. These should be used to group songs and find similar ones.
- This must be done locally, I don't want to send things to web services (MusicBrainz, http://echonest.com/).
Navigation
- I want a concept of a "Library" rather than a Playlist. Amarok only has playlists, but 99.9% of the time I want a list of all my songs.
Statistics
- I want reports. Reports on my listening trends and song collection. eg. http://lastgraph3.aeracode.org/

Now for the solution. I could try everything from writing my own music player from scratch that implements that all (but I gave up on that after I could not decide what programming language to use C, C++, Java, Perl, Python, what GUI widget toolkit to use Qt, GTK+, wxWidgets, graphics api for nice graphs Cairo, raw OpenGL, OpenGL behind Clutter, R's graph drawing, Processing, or some other CPAN Perl module for drawing nice graphs. I can mix a few but the core app needs one programming language and it needs a core GUI toolkit for the GUI. There is too much choice and I don't have enough experience to know before hand what is best and what I will find easiest and simplest to use.)

I could try to capture playback statistics by looping last.fm and audioscrobber.com to localhost and capturing the data that Amarok sends. Or I could just write a script for Amarok which captures playback, but this only solves part of the problem and then I'm stuck using a certain application. Alternatively I could just take an existing program and fork it to suit my needs.

There should be more to come on this as I start experimenting.

Tags: computing, dev.

A Perl Script to Pause/Resume Amarok 1.4 Playback on Screensaver/Screenlock

11th December 2009

I've just uploaded to GitHub a script to pause Amarok 1.4 playback when the screensaver/screenlock starts and up pause again when closed/unlocked. It addresses the issue I was having with the script at http://nxsy.org/getting-amarok-to-pause-when-the-screen-locks-using-python-of-course where the script would start Amarok if it was not running and it would restart playback on screensaver end/unlock regardless of whether it was playing when the screensaver started.

You could start the script on start-up or plug it into Amarok's script engine to only be active when Amarok is active.

(Oh and in the future I'll try to avoid posts that just duplicate item's from other RSS/Atom feeds that don't add much extra value.)

Tags: computing, dev, sh.

Computer Graphics Notes

2nd December 2009

Not really complete...

Colour notes here, transformations notes here.

Parametric Curves and Surfaces

Parametric Representation

eg. $latex C(t) = (x(t), y(t))$

Continuity

Parametric Continuity

If the first derivative of a curve is continuous, we say it has C¹ continuity.

Geometric Continuity

If the magnitude of the first derivative of a curve changes but the direction doesn't then, we say it has G¹ continuity.
Curves need G2 continuity in order for a car to drive along them. (ie. not instantly change steering wheel angle at any points).

Control Points

Control points allow us to shape/define curves visually. A curve will either interpolate or approximate control points.

Natural Cubic Splines

Interpolate control points.
A cubic curve between each pair of control points
Four unknowns,
- - interpolating the two control points gives two,
  - requiring that derivatives match at end of points of these curves gives the other two.
Moving one control point changes the whole curve (ie. no local control over the shape of the curve)

Bezier Curve

This Bezier curve shown has two segments, where each segment is defined by 4 control points. The curve interpolates two points and approximates the other two. The curve is defined by a Bernstein polynomial. In the diagram changing points 1 and 2 only affects that segment. Changing the corner points (0 and 3) each only affect the two segments that they boarder.

Some properties of Bezier Curves:

Tangent Property. Tangent at point 0 is line 0 to 1, similarly for point 3.
Convex Hull Property. The curve lies inside the convex hull of the control points. (The corollary of this is if the control points are colinear, the curve is a line.)
They have affine invariance.
Can't fluctuate more than their control polygon does.
Bezier's are a special case of B-spline curves.

We can join two Bezier curves together to have C¹ continuity (where B₁(P₀, P₁, P₂, P₃) and B₂(P₀, P₁, P₂, P₃)) if P₃ - P₂ = P₄ - P₃. That is P₂, P₃, and P₄ are colinear and P₃ is the midpoint of P₂ and P₄. To get G¹ continuity we just need P₂, P₃, and P₄ to be colinear. If we have G¹ continuity but not C¹ continuity the curve still won't have any corners but you will notice a "corner" if your using the curve for something else such as some cases in animation. [Also if the curve defined a road without G¹ continuity there would be points where you must change the steering wheel from one rotation to another instantly in order to stay on the path.]

De Casteljau Algorithm

De Casteljau Algorithm is a recursive method to evaluate points on a Bezier curve.

To calculate the point halfway on the curve, that is t = 0.5 using De Casteljau's algorithm we (as shown above) find the midpoints on each of the lines shown in green, then join the midpoints of the lines shown in red, then the midpoint of the resulting line is a point on the curve. To find the points for different values of t, just use that ratio to split the lines instead of using the midpoints. Also note that we have actually split the Bezier curve into two. The first defined by P₀, P₀₁, P₀₁₂, P₀₁₂₃ and the second by P₀₁₂₃, P₁₂₃, P₂₃, P₃.

Curvature

The curvature of a circle is $latex \frac{1}{r}$.

The curvature of a curve at any point is the curvature of the osculating circle at that point. The osculating circle for a point on a curve is the circle that "just touches" the curve at that point. The curvature of a curve corresponds to the position of the steering wheel of a car going around that curve.

Uniform B Splines

Join with C2 continuity.

Any of the B splines don't interpolate any points, just approximate the control points.

Non-Uniform B Splines

Only invariant under affine transformations, not projective transformations.

Rational B Splines

Rational means that they are invariant under projective and affine transformations.

NURBS

Non-Uniform Rational B Splines

Can be used to model any of the conic sections (circle, ellipse, hyperbola)

=====================

3D

When rotating about an axis in OpenGL you can use the right hand rule to determine the + direction (thumb points in axis, finger indicate + rotation direction).

We can think of transformations as changing the coordinate system, where (u, v, n) is the new basis and O is the origin.

$latex \begin{pmatrix}u_x & v_x & n_x & O_x\ u_y & v_y & n_y & O_y\ u_z & v_z & n_z & O_z\ 0 & 0 & 0 & 1 \end{pmatrix}$

This kind of transformation in is known as a local to world transform. This is useful for defining objects which are made up of many smaller objects. It also means to transform the object we just have to change the local to world transform instead of changing the coordinates of each individual vertex. A series of local to world transformations on objects builds up a scene graph, useful for drawing a scene with many distinct models.

Matrix Stacks

OpenGL has MODELVIEW, PROJECTION, VIEWPORT, and TEXTURE matrix modes.

glLoadIdentity() - puts the Identity matrix on the top of the stack
glPushMatrix() - copies the top of the matrix stack and puts it on top
glPopMatrix()

For MODELVIEW operations include glTranslate, glScaled, glRotated... These are post multiplied to the top of the stack, so the last call is done first (ie. a glTranslate then glScaled will scale then translate.).

Any glVertex() called have the value transformed by matrix on the top of the MODELVIEW stack.

Usually the hardware only supports projection and viewport stacks of size 2, whereas the modelview stack should have at least a size of 32.

The View Volume

Can set the view volume using,(after setting the the current matrix stack to the PROJECTION stack

glOrtho(left, right, bottom, top, near, far) (Source: Unknown)
glFrustum(left, right, bottom, top, near, far) (Source: Unknown)
gluPerspective(fovy, aspect, zNear, zFar) (Source: Unknown)

In OpenGL the projection method just determines how to squish the 3D space into the canonical view volume.

Then you can set the direction using gluLookAt (after calling one of the above) where you set the eye location, a forward look at vector and an up vector.

When using perspective the view volume will be a frustum, but this is more complicated to clip against than a cube. So we convert the view volume into the canonical view volume which is just a transformation to make the view volume a cube at 0,0,0 of width 2. Yes this introduces distortion but this will be compensated by the final window to viewport transformation.

Remember we can set the viewport with glViewport(left, bottom, width, height) where x and y are a location in the screen (I think this means window, but also this stuff is probably older that modern window management so I'm not worrying about the details here.)

Visible Surface Determination (Hidden Surface Removal)

First clip to the view volume then do back face culling.

Could just sort the polygons and draw the ones further away first (painter's algorithm/depth sorting). But this fails for those three overlapping triangles.

Can fix by splitting the polygons.

BSP (Binary Space Partitioning)

For each polygon there is a region in front and a region behind the polygon. Keep subdividing the space for all the polygons.

Can then use this BSP tree to draw.

void drawBSP(BSPTree m, Point myPos{
   if (m.poly.inFront(myPos)) {
      drawBSP(m.behind, myPos);
      draw(m.poly);
      drawBSP(m.front, myPos);
   }else{
      drawBSP(m.front, myPos);
      draw(m.poly);
      drawBSP(m.behind, myPos);
   }
}

If one polygon's plane cuts another polygon, need to split the polygon.

You get different tree structures depending on the order you select the polygons. This does not matter, but some orders will give a more efficient result.

Building the BSP tree is slow, but it does not need to be recalculated when the viewer moves around. We would need to recalculate the tree if the polygons move or new ones are added.

BSP trees are not so common anymore, instead the Z buffer is used.

Z Buffer

Before we fill in a pixel into the framebuffer, we check the z buffer and only fill that pixel is the z value (can be a pseudo-depth) is less (large values for further away) than the one in the z buffer. If we fill then we must also update the z buffer value for that pixel.

Try to use the full range of values for each pixel element in the z buffer.

To use in OpenGL just do gl.glEnable(GL.GL_DEPTH_TEST) and to clear the z-buffer use gl.glClear(GL.GL_DEPTH_BUFFER_BIT).

Fractals

L-Systems

Line systems. eg. koch curve

Self-similarity

Exact (eg. sierpinski trangle)
Stochastic (eg. mandelbrot set)

IFS - Iterated Function System

================================================

Shading Models

There are two main types of rendering that we cover,

polygon rendering
ray tracing

Polygon rendering is used to apply illumination models to polygons, whereas ray tracing applies to arbitrary geometrical objects. Ray tracing is more accurate, whereas polygon rendering does a lot of fudging to get things to look real, but polygon rendering is much faster than ray tracing.

With polygon rendering we must approximate NURBS into polygons, with ray tracing we don't need to, hence we can get perfectly smooth surfaces.
Much of the light that illuminates a scene is indirect light (meaning it has not come directly from the light source). In polygon rendering we fudge this using ambient light. Global illumination models (such as ray tracing, radiosity) deal with this indirect light.
When rendering we assume that objects have material properties which we denote k_(property).
We are trying to determine I which is the colour to draw on the screen.

We start with a simple model and build up,

Lets assume each object has a defined colour. Hence our illumination model is $latex I = k_i$, very simple, looks unrealistic.

Now we add ambient light into the scene. Ambient Light is indirect light (ie. did not come straight from the light source) but rather it has reflected off other objects (from diffuse reflection). $latex I = I_a k_a$. We will just assume that all parts of our object have the same amount of ambient light illuminating them for this model.

Next we use the diffuse illumination model to add shading based on light sources. This works well for non-reflective surfaces (matte, not shiny) as we assume that light reflected off the object is equally reflected in every direction.

Lambert's Law

"intensity of light reflected from a surface is proportional to the cosine of the angle between L (vector to light source) and N(normal at the point)."

Gouraud Shading

Use normals at each vertex to calculate the colour of that vertex (if we don't have them, we can calculate them from the polygon normals for each face). Do for each vertex in the polygon and interpolate the colour to fill the polygon. The vertex normals address the common issue that our polygon surface is just an approximation of a curved surface.

To use gouraud shading in OpenGL use glShadeModel(GL_SMOOTH). But we also need to define the vertex normals with glNormal3f() (which will be set to any glVertex that you specify after calling glNormal).

Highlights don't look realistic as you are only sampling at every vertex.

Interpolated shading is the same, but we use the polygon normal as the normal for each vertex, rather than the vertex normal.

Phong Shading

Like gouraud, but you interpolate the normals and then apply the illumination equation for each pixel.

This gives much nicer highlights without needing to increase the number of polygons, as you are sampling at every pixel.

Phong Illumination Model

Diffuse reflection and specular reflection.

: Components of the Phong Model (Brad Smith, http://commons.wikimedia.org/wiki/File:Phong_components_version_4.png)

(Source: COMP3421, Lecture Slides.)

$latex I_s = I_l k_s \cos^n \left ( \alpha \right )$

n is the Phong exponent and determines how shiny the material (the larger n the smaller the highlight circle).

Flat shading. Can do smooth shading with some interpolation.

If you don't have vertex normals, you can interpolate it using the face normals of the surrounding faces.

Gouraud interpolates the colour, phong interpolates the normals.

Attenuation

inverse square is physically correct, but looks wrong because real lights are not single points as we usually use in describing a scene, and

For now I assume that all polygons are triangles. We can store the normal per polygon. This will reneder this polygon, but most of the time the polygon model is just an approximation of some smooth surface, so what we really want to do is use vertex normals and interpolate them for the polygon.

Ray Tracing

For each pixel on the screen shoot out a ray and bounce it around the scene. The same as shooting rays from the light sources, but only very few would make it into the camera so its not very efficient.

Each object in the scene must provide an intersection(Line2D) function and a normal (Point3D) function

Ray Tree

Nodes are intersections of a light ray with an object. Can branch intersections for reflected/refracted rays. The primary ray is the original ray and the others are secondary rays.

Shadows

Can do them using ray tracing, or can use shadow maps along with the Z buffer. The key to shadow maps is to render the scene from the light's perspective and save the depths in the Z buffer. Then can compare this Z value to the transformed Z value of a candidate pixel.

==============

Rasterisation

Line Drawing

DDA

You iterate over x or y, and calculate the other coordinate using the line equation (and rounding it).
If the gradient of the line is > 1 we must iterate over y otherwise iterate over x. Otherwise we would have gaps in the line.
Also need to check if x1 is > or < x2 or equal and have different cases for these.

Bresenham

Only uses integer calcs and no multiplications so its much faster than DDA.
We define an algorithm for the 1st octant and deal with the other octant's with cases.
We start with the first pixel being the lower left end point. From there there are only two possible pixels that we would need to fill. The one to the right or the one to the top right. Bresenham's algorithm gives a rule for which pixel to go to. We only need to do this incrementally so we can just keep working out which pixel to go to next.
The idea is we accumulate an error and when that exceeds a certain amount we go up right, then clear the error, other wise we add to the error and go right.

We use Bresenham's algorithm for drawing lines this is just doing linear interpolation, so we can use Bresenham's algorithm for other tasks that need linear interpolation.

Polygon Filling

Scan line Algorithm

The Active Edge List (AEL) is initially empty and the Inactive Edge List (IEL) initially contains all the edges. As the scanline crosses an edge it is moved from the IEL to the AEL, then after the scanline no longer crosses that edge it is removed from the AEL.

To fill the scanline,

On the left edge, round up to the nearest integer, with round(n) = n if n is an integer.
On the right edge, round down to the nearest integer, but with round(n) = n-1 if n is an integer.

Its really easy to fill a triangle, so an alternative is to split the polygon into triangles and just fill the triangles.

===============

Anti-Aliasing

Ideally a pixel's colour should be the area of the polygon that falls inside that pixel (and is on top of other polygons on that pixel) times the average colour of the polygon in that pixel region then multiply with any other resulting pixel colours that you get from other polygons in that pixel that's not on top of any other polygon on that pixel.

Aliasing Problems

Small objects that fall between the centre of two adjacent pixels are missed by aliasing. Anti-aliasing would fix this by shading the pixels a gray rather than full black if the polygon filled the whole pixel.
Edges look rough ("the jaggies").
Textures disintegrate in the distance
Other non-graphics problems.

Anti-Aliasing

In order to really understand this anti-aliasing stuff I think you need some basic understanding of how a standard scene is drawn. When using a polygon rendering method (such as is done with most real time 3D), you have a framebuffer which is just an area of memory that stores the RGB values of each pixel. Initially this framebuffer is filled with the background colour, then polygons are drawn on top. If your rending engine uses some kind of hidden surface removal it will ensure that the things that should be on top are actually drawn on top.

Using the example shown (idea from http://cgi.cse.unsw.edu.au/~cs3421/wordpress/2009/09/24/week-10-tutorial/#more-60), and using the rule that if a sample falls exactly on the edge of two polygons, we take the pixel is only filled if it is a top edge of the polygon.

: Anti-Aliasing Example Case. The pixel is the thick square, and the blue dots are samples.

No Anti-Aliasing

With no anti-aliasing we just draw the pixel as the colour of the polygon that takes up the most area in the pixel.

Pre-Filtering

We only know what colours came before this pixel, and we don't know if anything will be drawn on top.
We take a weighted (based on the ratio of how much of the pixel the polygon covers) averages along the way. For example if the pixel was filled with half green, then another half red, the final anti-aliased colour of that pixel would determined by, Green (0, 1, 0) averaged with red (1, 0, 0) which is (0.5, 0.5, 0). If we had any more colours we would then average (0.5, 0.5, 0) with the next one, and so on.
Remember weighted averages, $latex \frac{Aa + Bb}{A + B}$ where you are averaging $latex a$ and $latex b$ with weights $latex A$ and $latex B$ respectively.
Pre-filtering is designed to work with polygon rendering because you need to know the ratio which by nature a tracer doesn't know (because it just takes samples), nor does it know which polygons fall in a given pixel (again because ray tracers just take samples).
Pre-filtering works very well for anti-aliasing lines, and other vector graphics.

Post-Filtering

Post-filtering uses supersampling.
We take some samples (can jitter (stochastic sampling) them, but this only really helps when you have vertical or horizontal lines moving vertically or horizontally across a pixel, eg. with vector graphics)
$latex \left ( \frac{6}{9} \right )$ of the samples are Green, and $latex \left ( \frac{3}{9} \right )$ are red. So we use this to take an average to get the final pixel colour of $latex \begin{pmatrix}\frac{1}{3}, & \frac{2}{3}, & 0\end{pmatrix}$
We can weight these samples (usually centre sample has more weight). The method we use for deciding the weights is called the filter. (equal weights is called the box filter)
Because we have to store all the colour values for the pixel we use more memory than with pre-filtering (but don't need to calculate the area ratio).
Works for either polygon rendering or ray tracing.

Can use adaptive supersampling. If it looks like a region is just one colour, don't bother supersampling that region.

OpenGL

Often the graphics card will take over and do supersamling for you (full scene anti aliasing).

To get OpenGL to anti-alias lines you need to first tell it to calculate alpha for each pixel (ie. the ratio of non-filled to filled area of the pixel) using, glEnable(GL_LINE_SMOOTH) and then enable alpha blending to apply this when drawing using,

glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

You can do post-filtering using the accumulation buffer (which is like the framebuffer but will apply averages of the pixels), and jittering the camera for a few times using accPerspective.

Anti-Aliasing Textures

A texel is a texture pixel whereas a pixel in this context refers to a pixel in the final rendered image.

When magnifying the image can use bilinear filtering (linear interpolation) to fill the gaps.

Mip Mapping

Storing scaled down images and choose closes and also interpolate between levels where needed. Called trilinear filtering.

Rip Mapping helps with non uniform scaling of textures. Anisotropic filtering is more general and deals with any non-linear transformation applied to the texture

Double Buffering

We can animate graphics by simply changing the framebuffer, however if we start changing the framebuffer and we cannot change it faster than the rate the screen will display the contents of the frame buffer, it gets drawn when we have only changed part of the framebuffer. To prevent this, we render the image to an off screen buffer and when we finish we tell the hardware to switch buffers.

Can do on-demand rendering (only refill framebuffer when need to) or continuois rendeing (draw method is called at a fixed rate and the image is redrawn regardless of whether the image needs to be updated.)

LOD

Mip Mapping for models. Can have some low poly models that we use when far away, and use the high res ones when close up.

Animation

Key-frames and tween between them to fill up the frames.

===============

Shaders

OpenGL 2.0 using GLSL will let us implement out own programs for parts of the graphics pipeline particularly the vertex transformation stage and fragment texturing and colouring stage.

Fragments are like pixels except they may not appear on the screen if they are discarded by the Z-buffer.

Vertex Shaders

position tranformation and projection (set gl_Position), and
lighting calculation (set gl_FrontColor)

Fragment Shaders

interpolate vertex colours for each fragment
apply textures
etc.

set gl_FragColor.

Tags: comp3421, computing, graphics.

Human Computer Interaction Notes

2nd December 2009

These notes are based around my COMP3511 course.

Interaction Design (+Scenarios)

Interaction Design is about creating user experiences that enhance and augment the way people work, communicate, and interact.1
Interaction Design has a much wider scope than Human Computer Interaction. ID is concerned with the theory and practice of designing user experiences for any technology or system, whereas HCI has traditionally been focused on/surrounding humans and computers specifically.
Interaction Design involves understanding the requirements.
Requirements can be functional (what should it do) or non-functional (what are the constraints). The usability principles (heuristics) fit into the non-functional requirements.
User Profiles are a set of persona's. A persona is a short description of a fictional user.
Scenarios
- Activity Scenario (narrative based on user's requirements)
  - Used at the beginning of the design process.
  - Who, When, What, Why
  - Not technical/no specific details (re: does not presuppose the interface)
  - From the users perspective
- Use Case Scenario (narrative of how the user uses the interface to fulfil their goal)
  - Include the users goals but focus on the user/computer interaction (re: talk about the technology)
  - Basically is a description of the use case diagram.
  - Do these scenarios after you figure out the requirements
  - Different users would have different use cases, we can show this with a use case diagram which shows the actors and the various use case's that they encounter.
- Task Scenario
  - Used when running a usability test. ie. give the participant a scenario before giving them a task to give them context.
- When describing a scenario, give the users goals, their context and situation. Use a narrative form.

Cooper et al. describe the process of interaction design as,

Identifying needs and establishing requirements for the user experience.
Developing alternative designs that meet those requirements.
Building interactive versions of the designs so that they can be communicated and assessed.
Evaluating what is being built throughout the process and the user experience it offers.

Scenarios are narratives about named people with an age. We need some background to understand where they are coming from (for instance their cultural background (eg. the US uses MM/DD/YYYY but Australia uses DD/MM/YYYY)). We try to remove incorrect assumptions about what we think a certain group of people are like. The scenario should explain their motivations and their goals.

Usability

Usability is all about producing things which are usable. Where something is usable when it meets these usability goals, however you should work out which goals are most important for the problem and focus on those first.

Usability Goals

easy to learn
easy to remember how to use
effective to use
efficient to use
safe to use
have good utility (providing the right kind of functionality to allow the user to achieve their goal)

User Experience Goals

satisfying
fun
helpful
motivating
universal access (accessibility)

Heuristics (Usability Principles)

visibility of system status (eg. busy mouse icon)
match between system and real world (includes interface metaphors. eg. files and folders concept, "speak the user's language" (avoid gargon that users may not understand))
user control and freedom (includes letting the user cancel/exit. eg. can pause/cancel file transfers)
consistency and standards (eg. consistent terminology, consistent workflows, common look and feel, GUI toolkits. eg. GNOME/GTK+)
help and documentation
help users recognise, diagnose and recover from errors (tell users what the problem was, why it happened and some possible solutions, using plain English. eg. recover from trash)
error prevention (eg. move to trash first)
recognition rather than recall (GUI applications menu as opposed to CLI)
flexibility and efficiency of use (eg. keyboard shortcuts (helpful for experts, but hidden from novices))
aesthetic and minimalist (uncluttered) design (maybe put rarely used info into a help page/manual rather than in the interface)

Design Principles

visibility
feedback (can be audio, visual...)
affordances (clues on how to use. eg. ⋮∶affords grabable)
consistency (includes look and feel consistency)
mapping
- eg. which light switch controls which light
constraints
- logical (eg. grey out menu options that are not allowed)
- physical (eg. you can't plug a USB cable into a VGA port, c.f. you can put a DVD in a CD player)
- cultural

When designing a system we need to consider,

who are the users,
how will the product be used,
where will the product be used

Identifying Needs

Requirements

When testing an interface with users/test participants, give them a high level goal and observe how they go about doing it. Don't give them specific instructions.

Use Scenario 1: For each task identified (or major tasks, or particularly special tasks if many tasks are defined), write a description of how that user would accomplish the task independent of how they would complete it within the application.

Use Case 1: If a use scenario has been implemented, include a matching use case which describes how the task use scenario can be completed in the application. There may be branching or multiple ways to complete the task, and this is a good way to document it.

To test if something is a requirement just ask, "If I remove this, will the product still fulfil its purpose?"

Design Conceptualisation

A conceptual model is a high-level description of how a system is organised and operates. --Johnson and Henderson, 2002, p. 26

I like to think of it as this. The person coding the web browsers understands that when the users types in a URL and presses enter an HTTP GET request is sent and the response is received and the HTML is processed and displayed. There are many technical interactions and details that are happening here. But the conceptual model of this is what the average non-technical uses thinks is happening. They just have some kind of model in their head that they type the URL hit enter and get the web site displayed. Its just an abstraction of what is going on.

Interface metaphors are used as they can help the user understand and determine how an interface works. We try to use them for this purpose but just using the metaphor directly can have some negative affects (eg. if your designing a radio application for desktop PC's, it may not be a good idea to just show an image of a real radio as the UI). We don't want to use the metaphor to an extent that it breaks the design principles.

A classic example of a conceptual framework is that of the relation between the design of a conceptual model and the user's understanding of it. In this framework there are three components, (Sharp et al., 2006)

The designer's model - the model the designer has how how the system works (or how it should work)
The system image - how the systems actual workings are portrayed to the users through the interface, manuals, etc. (or how it is presented)
The user's model - how the user understands the system works (or how it is understood)

[caption id="attachment_772" align="aligncenter" width="361" caption="Conceptual Framework (from Norman, 1988)"][/caption]

The designers job is to create the system image so that the users will invoke the same conceptual model as the designer's.

"A good conceptual model allows us to predict the effects of our actions." (Norman, 1988)

The interface could be made more transparent so the user can see the details of how the system works, but this is not always desirable as it may cause confusion. Also many users may not want to know all the gory details, nor should they have to know the actual implementation in order to use the system.

You can conceptualise how a user interacts with a system in terms of their goals and what they need to do to achieve those goals.

You can try to give the user a more correct mental model of the system by giving useful feedback based on their input and providing help and documentation.

Prototyping

Can do low fidelity or high fidelity prototypes.
Could use paper mockups of screens, storyboards, electronic mockup, electronic prototype...
Make sure you iterate.
"the best way to get a good idea is to get lots of ideas" --Linus Pauling

Using A Design Diary

Brainstorm ideas
Sketch interface mockups
Sketch storyboards/work flow diagrams

Wireframes

Here is an example wireframe.

[caption id="" align="aligncenter" width="445" caption="Example Wireframe from https://wiki.ubuntu.com/DesktopExperienceTeam/KarmicBootExperienceDesignSpec"][/caption]

Another paper prototype with a slightly higher fidelity.

[caption id="attachment_850" align="aligncenter" width="450" caption="An example paper prototype (from https://wiki.ubuntu.com/SoftwareStore)."][/caption]

Issues Table

In this course we list the issues vertically and the participants horizontally.
Prioritise and group the issues. (Maybe use affinity diagramming for the grouping)

Usability Testing

Can do interviews, questionnaires, usability tests (best to run a dry run of these before you start testing on many people), interaction logging...
The purpose of a usability test is to gather feedback from potential users about usability issues as well as ensuring that an interface can be used and works as expected.
Testing should be done throughout the whole process during prototyping, beta versions, and deployed applications.
According to Nielson you only need test an interface with 5 people to find 80% of the issues (see Nielsen, Usability Engineering, p173) (however to publish research 5 is statistically too small so you should use at least 10).
When planning a test you need to define scenarios and tasks as well as deciding what to test and how to collect the results. Its a good idea to have goals that try to measure the success of the usability principles.
Test the parts of your interface which would be used most, as well as any particularly difficult to design aspects first.

There are some legal and ethical issues to consider. The participant,

needs to sign a consent form for you to run a test with them.
is free to stop participating at any time.
must be made aware of how you are observing them and what will be done with data collected. eg. is the session been recorded on video, audio, observed by people, screen captured...? Will the data be antonymous, will the anonymous results be published...
should be made aware that you are testing the software, not them.

During a Usability Test,

Avoid leading questions. (eg. try to avoid "How much do you like this interface?")
When running a usability test be careful not to bias your results. For example instead of asking the user "How would do X? when there is a button "Do X", give them a scenario which has a goal and ask them how they would go about achieving this with the interface.
You want the participant to "think aloud" so that you know what they are thinking when they are using the interface you are testing.
If users are struggling give them a hint, if that doesn't help explain the expected solution and move on, but note that they needed assistance when recording the test data.
If a task is difficult for the user, its not the users fault, its the interface's!
We want to record things like time to complete the task, amount and nature of errors encountered and by who... Things that address the usability principles. You should record both these quantitative measurements and any qualitative things that you observer or the participant mentions.

After the Testing,

Collate feedback and test data (Use an issues table to record the usability issues that the participants had.)
Group issues and prioritise them.
When analysing results consider,
- If a user is asked to compare two interfaces, the results may bias towards the second as they learn from their first experience.
  - Can try to solve this by getting some participants to do A then B, and others B then A.
- Observing how a user interacts with an interface may change how they interact with it. (ie. they may have done things differently if they were at home, without you scrutinising their every move).
  - We can try to avoid this by making the participants feel comfortable and reinforcing that we are not testing them we are testing the interface. Assure them that there are no incorrect users and don't avoid doing things just because you know we are taking notes.

Usability Testing

When actually running a usability test you should follow a usability test plan. The test plan just details what the test facilitator should do during the test.

The usability testing procedures we used in this course are:

Explain procedures (think aloud, data collection methods in use...)
Make sure they agree and sign a consent for before proceeding (you keep one, they keep one)
Run a pre-test questionnaire (used to generate a participant profile) (this helps to give you an idea on their experience level, as well as any background they may have in using similar interfaces before, as these affect how the participant performs) (best to get the participant to do this a few days before the test so that you can focus on specific tasks.)
Introduce scenario
Run through tasks
Ask any post test questions
Do they have any extra comments/debriefing
Thank them for their time

Interviews

Can be open ended (participant gives longer responses which may include their reasoning) or closed (list of options participant chooses from).
When running an interview give,
- An introduction to the interview (what you are doing, purpose, what happens to the responses, how it it being recorded)
- Warm-up questions
- Main section. (use a logical sequence)
- Cool-off questions
- Closing remarks.

Questionnaires

Participant Sample (You probably want a sample representative of your users/target users).

User Centred Design Process

The UCD process is all about focusing on the users and tasks. It also means iterate your designs often. The development is driven by users needs rather than technical concerns.

More specifically Gould and Lewis (1985) give three principles,

Early focus on users and tasks.
Empirical measurement.
Iterative design.

Affinity Diagramming

This is where we collect ideas and then group them.
Don't use pre-defined groups, make them up as you start sifting through the ideas.
The idea is to organise a bunch of individual ideas into a hierarchy.

Card Sorting

Get a bunch of people to sort cards with some idea/concept/whatever into categories and then compare how they were sorted by the different participants.

Software Lifecycles

Waterfall
W-model
Spiral
Rapid Application Development
Star Model (evaluation at each step)
ISO 13407 (the HCI model goes around in a circle and only exits when satisfactory)

Cognitive Load Theory

Cognition is what goes on in our brains. It includes cognitive processes such as,

attention
- Some techniques are particularly good at grabbing attention (flashing, moving, colourful or large things). But these should be used sparingly.
perception and recognition
- Gestalt principles of perceptual organisation. ie. we group things by proximity, similarity, closure, symmetry and continuity. [caption id="attachment_852" align="aligncenter" width="389" caption="Gestalt principles of perceptual grouping. (a. you see two groups the left two and the right two; b. you see the objects in columns rather than rows; c. you group the () braces together; d. you see two curves and most people probably see the same two curves (as opposed to sharp edges that meet at the vertex))"][/caption]
memory
learning
reading, speaking and listening
problem-solving, planning, reasoning and decision-making.
automatic processes
- Stroop effect (trying to say the colour not the words eg. red green blue orange purple pink) is due to this.

Some Cognitive Load Theory

Huge long term memory (with the information stored in schemas) and a limited working memory.
- Schemas allow us to bypass our working memory limitations by chunking information.
- They allow us to ignore the huge amount of detail coming from our senses and instead integrate with our existing schemas.
- eg. its much easier to memories SYSTEM than YSMSTE.
- "Automated Schemas"
Worked Examples instead of Means-Ends Analysis
- Its better to give new users a quick (or even not so quick) 'worked example' of how the interface works/how to use it, than just let them work it out for themselves.
Split Attention Effect
- e.g. "See figure 16.", "Refer to page 26"... "Requires us to mentally integrate information that is physically split."²
- Solution physically integrate the material.
- Don't force users to recall information from a previous screen.
The Redundancy Effect
- It is better to emit redundant information as it generally just confuses people.
Expertise Reversal Effect
- Its better to assume your audience is novice, if you are unsure.
- Novices need lots of worked out examples
Reduce Search (reduces cognitive load)
- By using a consistent layout
- By reducing visual clutter
Diagrams can reduce cognitive load
Modality Effect
- Separate working memories for audio and visual senses.
- Therefore presenting information visually and auditory allows for a greater total working memory size than just presenting it visually or auditory. (But we should consider users with disabilities, so taking advantage of this by presenting some information visually and some auditory is not a good idea)

Some HCI Applications

The Training Wheels approach involves restricting the features of an interface for novices until they become more experienced when advanced features are enabled.

Memory

(From a psychologists perspective).

Memory is based on your context (eg. night, bed, tired, dark, dream... ask them to recall they will often recall sleep even though it was never mentioned). Give context before this will help them store the information and recall it better.

Miller's theory is that only 7± 2 chunks of information can be held in short-term memory at any one time. (But this doesn't mean say, only put seven items in any menu or something like that. This is only for short-term memory and when the information comes and goes, not when it can be rescanned.)

Long Term Memory

[caption id="attachment_794" align="aligncenter" width="360" caption="A Taxonomy of Memory"][/caption]

Explicit and Implicit Memory

"Imagine that you learn a list of items and are then required to recall or recognise them. This memory test would be accompanied by conscious awareness that you were remembering. Imagine that a considerable time later, a test of recall or recognition revealed no memory for the items. However if you were given the original list to relearn there would probably be some savings in learning time (i.e. you would take less time to learn the list the second time, oven though you were not aware of your memory of the items). This is the distinction between explicit memory, in which remembering is accompanied by either intentional or involuntary awareness of remembering, and implicit memory, in which remembering is not accompanied by awareness (Graf & Schacter 1985; Schacter 1987)." -- (Walker, "Chapter 9: Memory, Reasoning and Problem Solving". pg. 262 (sorry I don't have the title))

Not really related, but a good thing to hear a text book say,

"Finally, some long-enduring memories are for passages that our teachers have drulled into us... The Interesting thing about these memories is that they are preserved as they were memorised, in a very literal form, in exact wordings (Rubin, 1982). The memory code is not transferred from literal to semantic. In fact, the words are often remembered mechanically, with almost no attention to their meaning." --(Walker, "Chapter 9: Memory, Reasoning and Problem Solving". pg. 267 (sorry I don't have the title))

The method of loci.

The context that a memory is encoded, affects its retrieval. For example you may not initially recognise your neighbour on the train, as you are not used to seeing them there.
People are much better at recognising things than recalling things.

Obstacles to Problem Solving

Unwarranted Assumptions
- Example given in class. "A man married 20 women. yet he broke not laws and never divorced. How? He was a priest."
Seeing new Relationships
Functional Fixedness
- Being an expert at a system, you may not see things that novice would see.
- You avoid using things in an unconventional way.
- New users may find new or unintended uses of the system.
The Set Effect
- We may not notice that two similar problems actually need to be solved in different ways.

External Cognition

People use external representations to extend or support ones ability to perform cognitive activities. For example, pens and paper, calculators, etc. We do this to,

reduce memory load
- eg. post-it notes, todo lists. But the placement of say post-it notes is also significant.
offload computation
- eg. pen and paper to solve a large arithmetic problem (the mechanical kind).
annotate
- modifying or manipulating the representation to reflect changes

Experts vs. Novices

What distinguishes an expert is their large knowledge based stored in schemas.
Declarative (facts)/procedural(how to do) knowledge.
Skill acquisition.
- Cognitive stage (learn facts, encode declarative knowledge),
- Associative stage (procedural knowledge),
- Autonomous stage.
Novices tend to use ends-means analysis (uses a lot of trial and error) to solve problems. Experts tend to use their knowledge stored in schemas to apply and solve the problem (ie. past experience).
In software can have novice (limited functionality) and expert modes. (Could be different applications Photoshop Elements vs. Photoshop Pro, or just hide advanced functionality for novices by default eg. >Advanced Options which is clicked to show more functions.)
IDEA: Could provide popup hints to intermediate users to explain expert functions (eg. what's going on under the hood), or more advanced options (eg. keyboard shortcuts).

Visual Design

Alignment of items in an interface makes it easier for users to scan the screen.
Grouping
Colour
Gestalt Principles
Menu design (see ISO 9241)

Three types of icons,
- similar (eg. a file for a file object)
- analogical (eg. scissors for cut)
- arbitrary (eg. X for delete or close)
Can add text near the icon to make it easier for newbie's, but allows expert to ignore and just glance at the icon.

Internationalisation

Differences around the world,

character set
keyboard layout
text direction
language
icons
date, time, currency
calendars

Internationalisation (i18n) refers to designing and developing a software product to function in multiple locales. Localisation (L10n) refers to modifying or adapting a software product to fit the requirements of a particular locale. This could include translating text, changing icons, modifying layout (eg. of dates).⁵

A locale is a set of conventions affected or determined by human language and customs, as defined within a particular geo-political region. These conventions include (but are not necessarily limited to) the written language, formats for dates, numbers and currency, sorting orders, etc.⁵

Accessibility

Some clauses relating to requirements for Australian web sites in Australian Disability Discrimination Act (1992).

Quantification

A way to test an interface different to usability testing.

GOMS

Goals (eg. send an email)
Operators (eg. double click)
Methods (recalling what to do/how to do)
Selection Rules (deciding which method to use to achive the goal)

Keystroke Level Model

K (keying) - 0.2s - Press (and release) a keyboard key, or click the mouse. (Click and drag is only 1/2 K).
P (pointing) - 1.1s - Moving the mouse to a location on the screen.
H (homing) - 0.4s - Moving between the keyboard and mouse.
M (mentally preparing) - 1.35s - Preparing.
R (computer responding) - Waiting for the computer to respond.

Fitt's Law

In the field of ergonomics research.
Used to predict the time to move the cursor to a target.
$latex T = A + B \times log_2 \left ( \frac{D}{S} + 1 \right )$
A and B are experimentally determined constants. (Raskin used A = 50, B = 150).
D is distance between start and target
S is size of target (just dealing with 1 dimension here).
Lesson: The larger the target the faster one can move the mouse to that location.
Lesson: Targets at the edge of the screen have an infinite size, so they are fast to navigate to. (Problem, if you use edge bindings a lot your mouse will physically move further and further away, so the user may need to be constantly picking it up moving it)

References

[1] Sharp, Rodgers, Preece. (2006) Interaction Design: Beyond human computer interaction. 2nd Ed.

[2] Marcus, Nadine. (2009) COMP3511 Cognitive Load Theory Lecture Slides.

Woo, Daniel. (2009) COMP3511 Lecture Slides.

Norman, Donald. (1988) The Design of Everyday Things.

[5] http://www.mozilla.org/docs/refList/i18n/

Tags: comp3511, computing.

A Review of Gnome Shell from my Perspective (and a Comparison with Compiz)

26th September 2009

I tried out Gnome Shell today. (And it didn't break everything! I followed their instructions it build and ran fine, and when I killed it, my normal environment with the normal Gnome Panel and Compiz... went back to normal.) Its shaping together nicely, there are many good things and I think its a great effort by everyone behind it. (But just a warning I don't know all the technical things behind everything here, so please excuse me if I miss something or don't use the correct terminology. This review is just from my perspective/my view. It is not a proper usability evaluation, nor have I looked and which is better engineered or anything too technical.)

[caption id="attachment_812" align="aligncenter" width="450" caption="My Desktop Running Gnome Shell"][/caption]

[caption id="attachment_814" align="aligncenter" width="450" caption="My Desktop Running Gnome"][/caption]

The obvious difference is there is no bottom panel in Gnome Shell and the top panel is different (but its still in development of course so in a later version they may make more use of it).

My Current Work-flow

Window Management

Normally in Gnome I use Compiz a lot to help me manage my open windows. Compiz/Compiz Fusion has a lot of plugins, but over time I've found a few which I really like and I use all the time.

If I have a bunch of windows in one workspace and I want to switch to another I usually use Scale (shortcut of Super + Tab), although I still sometimes use the bottom taskbar, and I always use that taskbar when the window is minimised (Because Compiz can't access minimised windows pixmaps so they don't appear in Scale unfortunately. This is a real killer.). I can also right click on a window in this view to close it. This makes it really easy and fast to kill a heap of windows that I have finished with. This makes my search space when changing windows much less and hence much easier.

[caption id="attachment_818" align="aligncenter" width="450" caption="The Compiz Fusion Scale Plugin"][/caption]

To change workspaces I use Expo (shortcut of Super + E). But I don't actually use more than I workspace all that often, even though I think I should be. The other great thing is I can drag windows from one workspace to another while in Expo.

[caption id="attachment_819" align="aligncenter" width="450" caption="Compiz Expo Plugin"][/caption]

Some other shortcuts I use for window management very frequently are,

Alt + Left Mouse to move a window (with the great wobbly windows effect)
Alt + Middle Mouse to resize a window.
Alt + Right Mouse to close a window.
Super + Scroll to zoom in.
Ctrl + Alt + (1-9) on the keypad to place a window in a grid. This is great for getting say a terminal to run your program next to the editor with the code. This gives me the benefits of a tiling window manager such as xmonad (although changing the window focus between two side by side windows is not as easy as it would be in xmonad)/an arrangement similar to what you can get using Terminator.
Super + Shift + (Up Arrow/Right Arrow) to extend a window to the maximum extents in a vertical/horizontal direction.

I keep making refinements to this, but it works very well for me as it is.

Application Starting

In Gnome I use the Panels Run Application dialogue (see pic) (with the shortcut Alt + ) and the terminal (with the shortcut Ctrl +) to start new applications. Those shortcuts really make things easier and faster.

[caption id="attachment_815" align="aligncenter" width="450" caption="The Panel's Run Application Dialogue"][/caption]

The run dialogue is good. I can run programs like firefox, gedit just as you would in the terminal but it means I don't have to have a terminal open or open one first (its all amount maximising efficiency, so I can get to where I want to be as fast as possible). Also I can enter locations such as /etc/whatever and Nautilus will be opened to that location. That text box has tab completion (and it actually shows the suggestions) which makes things easier and faster.

In Gnome Shell

[caption id="attachment_821" align="aligncenter" width="450" caption="Gnome Shell Activity Mode (sorry, I'm not sure what its actually called)"][/caption]

Window Management

In Gnome Shell (it uses Metacity not Compiz) you can do all your window management and application starting through the Activities mode. Which can be started either by the Super key, clicking Activities, or dragging the mouse to the top left edge (although it seems I must go to the exact 0,0 pixel not 0,1 or 1,0 which is a bit annoying). This is good it gives the user some choice they may happen to have their hand near Super so they use that, or they may only using the mouse so they can use that (actually I will set up Compiz Scale to work with both Super Tab and a top left mouse move).

On the down side, Gnome Shell did not seem to be as fast and responsive as similar Compiz tools. What I mean is that on my system where the Scale tool is fast, as in the windows move smoothly and quickly, when I go into the Activities mode its has a small delay (less than a second, but its still annoying) and its seems a bit jumpy and jerky when everything is moving. But of course its still in development so I'm not going to criticise this. Apart from this, it seems just like Compiz Expo + Scale together. This activity mode window management is good, but there are some small things like I can't seem to close windows from this activity mode (like I can in Compiz's Scale), but I can move windows from one workspace to another in Gnome Shell just like in Compiz's Expo. Also it can also be annoying to have Scale and Expo mixed together (of course I can just just Alt + Tab or move windows around so I can focus on another, but I don't really like that idea).

Unlike Compiz/Gnome's multiple workspaces, in Gnome shell you can add these dynamically. Which I think is a better idea than the static type that normal Gnome/Compiz uses.

[caption id="attachment_822" align="aligncenter" width="450" caption="Gnome Shell allows you to dynamically add/remove workspaces"][/caption]

Things seems to be shifting towards emphasising multiple workspaces. What I need to try to remember to do is USE these multiple workspaces, grouping windows together where they group nicely, instead of just putting everything in one workspace. Window managers could help me with this, like they could remember that I often have Firefox on workspace 2, so when I run it automatically put it there and switch to workspace 2. I haven't tried this, so I don't know if it would help me, or just frustrate me by doing what I don't want every time. I'm not even sure if Compiz can do this anyway.

I'm not sure where dock's like Avant or Cairo fit in, but I never really found them to make things easier.

Application Starting

The other noticeable thing in Gnome Shell is that bar on the left. In normal Gnome you have your menu bar which has Applications, Places and System (which I wish I could easily shorten to Apps, Places and Sys to save space). Given I have this new user thing on the right where I can shutdown/logout/suspend/hibernate... from the only real thing I use System for is the Preferences and Administration. Yet I can never remember if what I want is in admin or pref. I recently discovered this system preferences thing which just puts it all in one window categorised into appropriate groups. I'm sure some find the two lists easier and some find the single window easier. When I scan with my eyes in a list I just go up/down, but when I scan a grid my eyes wander all over the place with no apparent system. As such its probably a more random search than a well defined one. There is heaps of things you could test out (we looked at some in my HCI course) to try to make the grid layout faster but nonetheless I think I like the grid better.

I use the Places bar often, and I think the Gnome Shell implementation makes things easier as they are listed in two columns, unlike traditionally where the number of bookmarks is limited and I need to navigate to a sub-menu to show them all. It seems I can't change the size proportions of those three sections on the side, but again its still in development. You could look at this a number of ways but because the panels are gone, if you are using a full screen application you can focus on that, with nothing cluttering the edge or distracting you from your task at hand. Traditionally everything in layered down, you have panels, then window decorations then menu bars, status bars, tabs (in Firefox), removing all that so that you just have the task at hand in your vision can be a great thing (yes I know there is a full screen feature in Firefox, and you can set Gnome panels to hide). When you are working in a browser its up to the web site (unless you have the time to do some Greasemonkey scripts) to allow you to again remove outside clutter, yet many application-like web sites allow you to do this (Alt + Shift + G when editing in Wordpress, u in Google Reader (to some degree)). Anyway that is moving away a bit from the topic of this post.

At the bottom of the left bar, you have recent documents. I use recent documents very very rarely (as in the shortcuts to them, not the documents themselves). Although I still think that a well designed system for access to recent documents integrated with some kind of search capability would be very useful for me, and I would use it often. However I am yet to find such a system that I like. The concept in my mind is something like the Lifestream design that Wei Zhou blogged about. An interface where time is on the horizontal axis, where you could change the scale and location of this view easily, view related things such as the weather for that particular time, your location if you have a GPS enabled laptop, etc. Also it should be integrated with a good filter feature (anything such as file type, file size, location, tags...) that lets you narrow down your search space. Something like that is what I have in my mind as a great use of a "recent documents" feature. GNOME Zeitgeist looks like it may address some of this.

Lastly the top section is the application launcher.

[caption id="attachment_825" align="aligncenter" width="450" caption="Gnome Shell Menu"][/caption]

The actual menu in some ways is much better than the normal Gnome menu. Larger icons and a short description of the application are good. When I open the Gnome menu bar, I never need to see what's on my screen in order to make my selection from the menu bar (and if I forgot what I wanted to start I can always close it then open it again). You have the whole screen so you may as well use it, and Gnome Shell seems better in this respect. The bad thing is I don't like the use of pages. If not everything will fit on one column, you have to change the pages at the bottom. Instead you should be able to scroll through the options with the mouse wheel, or the ones that don't fit go in another column to the right (like Windows XP can do, and yes I used to use Windows XP).

The search box above this doesn't behave like the traditional Gnome Panel's Run Application dialogue. For example I can't type a file path, and tying gedit then enter won't take me where I want to go (gedit). Instead it takes me to some other entry I have defined in the menu bar. Now I can see some reasons why this could be better. Really I want to launch any executable files in my $PATH, but a user who doesn't use the terminal probably doesn't want this. An option so that the user can choose how they want it to behave would be better, I think.

[caption id="attachment_826" align="aligncenter" width="450" caption="Gnome Shell's search box doesn't behave as I expected."][/caption]

Having all my icon application starters in the top Gnome Panel was nice but there is no reason those can't be added to Gnome Shell, but again it's still in development. Although now that I've been using the interface for an hour or so, I think that they may create more clutter. Actually I think I would prefer that that top panel bar in Gnome shell would only appear in the Activity mode (but still recognise the top left mouse gesture). Although this may be scary for newbie's (hey I got intimidated the first time I used Blackbox, I couldn't work out that right clicking on the desktop gave me a menu) so an option would be much better.

Anything thing I wanted to mention was, I use Firefox a lot, and a lot of the concepts and issues with window management can be applied to tabs in a browser. The folks over at Mozilla are working on this so I'm eager to see what they come up with, but as more and more things are done through HTML web pages, it just means I'm going to have more and more tabs open that I need to manage, and navigate. Like starting a new application in a desktop environment you often start a new task (web page/tab) in a web browser. I've been using Ubiquity for a while now at I find it really good. Although they are up to release 0.5, I'm still using 0.1.9rc6. Although I can think of many improvements, its still really efficient at starting new tasks.

Oh an in case you were wondering from my Screenshots there, I'm using the orange-theme (orange-theme - 1.3.0.jaunty.ppa2+nmu1) from https://launchpad.net/~bisigi/+archive/ppa/+packages.

Tags: computing, ubuntu.

SBS Playlist to RSS Feed Perl Script v2

10th July 2009

I have made some changes to my original script. This new perl script will scrape info from sbs.com.au and give an RSS feed of the items in the specified playlist. I only know of two playlists (94 = Latest Full Episodes, 95 = Preview Clips). Only one line needs to be changed to use the script to give the RSS feed of a different playlist. The major improvement is the items that are only available over RTMP now have the correct URL which was previously incorrect (but now the script runs slower as it has to grab more pages from the web). I use the url, http://player.sbs.com.au/video/smil/index/standalone/$item_code/ to find out the url details.

FLVStreamer appears to do a good job of downloading media over the RTMP protocol. Just use ./flvstreamer -r rtmp://file.flv > file.flv. Mozilla has an article on how to add protocol's to firefox here. But I didn't bother with that as the command is simple as it is, and building an app with a save as dialogue is beyond me for now, but I hope to learn that soon.

[Update: It seems that you also need to have the --swfUrl argument set ('http://player.sbs.com.au/web/flash/standalone_video_player_application.swf' works.). Also the perl script doesn't get the file name correctly (it uses the thumbnail image url, rather it should be using the url's given at the /video/smil/index pages).]

For local use the current format will probably be what you want, but in a production environment you probably want to have the script save the RSS file to disk and have people hit that RSS file with the requests. Just set the perl script to run every now and then. Unfortunately I can't seem to upload .pl files to Wordpress (I've put a link, but that will expire eventually)... I really need to get my own site.. There is so much customisation I would like to do and many experiments to try out on a live server, but the $$$'s are too much...

On another note I tried out EPIC (Eclipse Perl Integration), which was fairly simple to install. It seems much nicer than using a plain text editor and command line, especially the debugging abilities that it adds.

SBSPlaylistToRSSv0.2.1.pl

Tags: computing, rss.

Deque Implementation Design

29th June 2009

For Assignment 1 of COMP2911 we got the task of implementing a deque, using arrays and linked lists (in Java). Here is the design I used for the implementation.

ArrayDeque

This was quite a challenge.

One approach was to have an array of size capacity, and also store some left and right index pointers to know where the deque ranges in the array.

My first idea (actually inspired by my tutor) was basically, to add to the right of the deque you start at the left and push right, to add to the left of the deque you start at the right and push left. When the two ends collide, you make a new array larger keeping the left and right parts of the deque at the leftmost and rightmost parts of the array.

This turned out to have some problems. All worked well, except for a couple of things. Such as show here,

, because now you are left in the middle and your not just pushing straight from one side to another.

This next situation below was also troublesome because you have to be careful where you store your left/right index pointers. You need to carefully think of what if they overlap? And if they are equal or overlap, is your arrayDeque full or empty?

So what I did is look at the different ways to store the left/right index pointers. In all these diagrams orange means the location of the left index pointer, and the violet is the location of the right index pointer.

a) In this diagram I point to the next available space.

b) In this diagram I point to the location of the most recently added item. At the very beginning I loop them over.

All this just lead to much confusion for me and many bugs and problems. So I had a look at an entirely new approach. It turned out much better, and I learnt that if things aren't working out and you just keep getting lots of bugs then sometimes trying another approach can really help out. I also learnt that its really hard to know the best way to do something right from the start you really have to try something and if its not working to well than just try something else and see how it goes. So here is what I finally did.

Initially I just ignored the fact that I had a limited size array, and that things are actually mod array size. I just had two index pointers (shown in orange and blue) which pointed to the next free space at the left and right ends of the deque. I kept these as ints and they I don't think of them as wrapping round. If you forget about the array based implementation and go back to the original problem of a deque you see that really all you need is a big long line with a left end and right end pointer. Now all you have to do is remember that whenever you are dealing with implementation level things such as accessing the array, to use index mod capacity, rather than the raw index value (which may be out of the array range). That and you need to have a check to know when to resize your array.

Under this scheme the number of items in your deque is rightIndex - leftIndex - 1, therefore your array is full if and only if (rightIndex - leftIndex - 1) >= capacity. (Where capacity is the size of the array). If this is true then you need to resize your array.

The method I choose was to simply shift the deque along (either direction) so that the leftIndex is at -1.

LinkedDeque

This was much simpler. Basically I made a link object that stored an item and a left and a right pointer (to other link objects). I would store the leftmost and rightmost link items for the deque and that is all. I guess I could have stored size as part of the deque object and updated it whenever new link items were added or removed to the deque, but as we were given no requirements to do it one way or another I made it so that the size method would go though the whole deque and do a summation every time it was called.

The only thing I really had to watch was to ensure that I kept the left and right pointers for each link item up to date with changes, and this was my primary source of bugs.

Tags: comp2911, computing.

Very Useful Ubuntu/Unix Commands

29th June 2009

Just some Ubuntu/Unix commands that I seem to find very useful but can be hard to remember at times.

PDF Concatenation

pdftk in1.pdf in2.pdf cat output out1.pdf

Its annoying that nautilus doesn't integrate this by default, so I could select some PDF's then right click and choose merge. Luckily I can do this with the fairly simple Nautilus Actions Configuration app. But this takes time and the average user doesn't have the time to research these things on how to do it themselves. Of course this could be extended further to match what you can do from the command line, such as choose some method of ordering the files, or choose to do weird things such as rotate every second page. Sure you want to keep things simple, so as a start it would be great to see something basic. There are probably even nautilus scripts in the repositories out there that do this exact thing, but you spend half your time finding them and installing them. I think this should be enabled by default, or put in some options page somewhere.

Truncate an mp3 file

mpgsplit input.mp3 [00:00:20-00:00:58] -o output.mp3

It appears that this doesn't re-encode so it seems to run very fast.

Mount an ISO

sudo mount -o loop ~/disk.iso /media/cdrom0

Strip audio from a video file

ffmpeg -ab 320k -i in.mov outaudio.mp3

Unfortunately when I left out the bit rate, it defaulted to 14K or something much lower than what the source file was using.

Search all files in the current directory for some string

find . -exec grep "searchthisstring" '{}' \; -print
find . -exec grep -q "searchthisstring" '{}' \; -print

Trim and nup pdfs

pdfnup --nup 2x2 --pages 26-140 --trim "1cm 1cm 1cm 1cm" infile.pdf

Wake from Suspend

sudo rtcwake -u -s 12600 -m on

... then you need to manually put the computer in suspend and 12600 seconds later the computer will resume.

Susped

pm-suspend

Shutdown

sudo shutdown -h now sudo shutdown -r now

... the -h means halt (shutdown), -r means reboot.

Tar and Gz all the files in a directory

tar -cvfz file.tar.gz *

Get file info from an HTTP server without downloading

wget -S --spider http://www.site.com/file

Tags: computing, linux.

COMP2121 Quick Summary on Particular Aspects

23rd June 2009

Von-Newman vs. Harvard Architecture.

Von-Newman has a single memory space, share for data and program instructions. Harvard Architecture has separate memory spaces for data and instructions (so you cannot execute from the data memory).

2's compliment

It is important to know that the hardware does all arithmetic in 2's compliment. It is up to the programmer to interpret the number as signed or unsigned.

To convert a number from 2's compliment, for example -45 in 2's compliment is 11010011, we can do something like this,

$latex 1 \times (-2^7) + 1 \times 2^6 + 1 \times 2^6 + 0 \times 2^5 +1 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = -45$

To go the other way from say -1 to the 2's compliment form 11111111 we use that $latex 2^p - X$ formula. I'm not exactly sure how its supposed to work so I've hacked it to make it work.

If the number you wish to convert is negative, let $latex X = -n$, so that X is positive then take $latex 2^p$ where p is the number of bits you are using (say 8), then subtract X. If the number to convert is less than $latex 2^p$ (where p is the number of bits, say 8 ) then leave it as is and that in your 2's compliment.

Now that was complicated. But its the only way I can get that advertised $latex 2^p - X$ formula to work with the given set of sample data (as in that table above).

Sign Extension

Why do we need sign extension? We need it in order to do operations on numbers than have different bit lengths (the number of bits used to represent the number).

Decimal to Binary

From a human kind of approach to convert 221 to binary, we see that $latex 2^7 = 128$, that is 7 is the largest power of 2 less than 221, so we have $latex 1 \times 2^7$. That gives us 128, so we still have 93 (221-128) to go. We try $latex 2^6$, this is less than 93. So far we have $latex 1 \times 2^7 + 1 \times 2^6$. 29 left now, but $latex 2^5$ is greater than 29, so we put a zero in that digit, ie. $latex 1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5$. If we go on we get $latex 1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$. Taking the coefficients of the $latex \times 2^x$ terms we get the number 221 in binary, 11011101.

We can convert hexadecimal to binary by going from hex to decimal then decimal to binary. For hex to decimal,

$latex \mbox{F23AC} = 15 \times 16^4 + 2 \times 16^3 + 3 \times 16^2 + 10 \times 16^1 + 12 \times 16^0 = 992172$ (where F23AC is in hex and 992172 is in decimal)

Operations on Signed and Unsigned Multi byte Numbers

add al, bl
adc ah, bh
does a + b, result in is a.

There are 3 multiplication operations, MUL (Multiply Unsigned), MULS (Multiply Signed) and MULSU (Multiply Signed with Unsigned). They each do this. Notice the result is stored in r1:r0.

Thus to do n*m = r where n is 2 bytes unsigned and m is 1 byte signed,
mulsu nl, m ;nl * (signed)m
movw rh:rl, r1:r0
mulsu nh, m ;(signed)nh * m
add rh, r0

We can also do 16bit * 16bit,

;* From AVR Instruction Set Guide, pg 99-100.
;* Signed multiply of two 16-bit numbers with 32-bit result.
;* r19:r18:r17:r16 = r23:r22 * r21:r20
muls16x16_32:
clr r2
muls r23, r21 ;(signed)ah * (signed)bh
movw r19:r18, r1:r0
mul r22, r20 ;al * bl
movw r17:r16, r1:r0
mulsu r23, r20 ;(signed)ah * bl
sbc r19, r2
add r17, r0
adc r18, r1
adc r19, r2
mulsu r21, r22 ;(signed)bh * al
sbc r19, r2
add r17, r0
adc r18, r1
adc r19, r2
ret

brge and brsh

brge is Branch if Greater or Equal, Signed. if ($latex N \oplus V = 0$) then branch. When you do cp Rd, Rr then brge, the branch will be taken if Rd $latex \ge$ Rr, where Rd and Rr are taken to be signed numbers.
brsh is Branch if Same or Higher. if ($latex C = 0$) then branch. When you do cp Rd, Rr then brsh, the branch will be taken if Rd $latex \ge$ Rr, where Rd and Rr are taken to be unsigned numbers.

Calculating Total Stack Space Needed

Draw a call tree, find the path with the most total weight, that total weight is the total stack size needed. Here is the sample question,

A C program consists of five functions. Their calling relations are shown as follows (the arguments and irrelevant C statements are omitted).
int main(void) {
  …
  func1(…);
  func2(…);
  …
}
int func1(…) {
  …
  func1(…);
  …
}

int func2(…) {
  …
  func3(…);
  func4(…);
  …
}
func1() is a recursive function and calls itself 15 times for the actual parameters given in main(). Both func3() and func4() do not call any function. The sizes of all stack frames are shown as follows.

main(): 200 bytes. func1(): 100 bytes. func2(): 400 bytes. func3(): 1,400 bytes. func4(): 300 bytes. How much stack space is needed to execute this program correctly? (3 marks)

There are three paths,

main() func1() func1() x 15 200+100+15x100 =1800

main() func2() func3() 200+400+1400 =2000

main() func2() func4() 200+400+300 =900

The path with the most total weight is main() > func2() > func3(), so this is the total stack space needed.

Nested Interrupts

(Source: Hui Wu's Lecture Notes)

Keypads with 'abc' 'def' ... buttons

These keypads where to enter b you need to press the abc button twice in succession, but wait to long at it will chose a. Here is a psudo algorithm that seemed to fit this,

.def reg = rN
.def reg = rM
.def count = rX

//passvalue means that we register the given value ie. abc abc wait > b

setup:
clr reg (to some value that is != to a key value) ;set to default
clr count
rjmp keyloop

keyloop:
  check pins for a key
  if no key pressed rjmp keyloop, else continue

  //key was pressed, and value is stored in key
  reset someTimeCounter
  if (key == reg) {
     inc count
     if (count == 3)
        passvalue(reg,count)
  }else{
     if (reg != default) ;so we don't initially passvalue
        passvalue(reg,count) ;send the last value
     reg = key ;store the new one
     count = 1
  }

rjmp keyloop

if someTimeCounter expires and count != 0 //(count up, so expires after time to wait for anymore keypresses) (check count != 0, because if its 0 then we never had any key pressed that we need to send)
   passvalue(reg,count)
   reg = default

Switch Bounce Software Solution

When a switch makes contact, its mechanical springiness will cause the contact to bounce, or make and break, for a few millisecond (typically 5 to 10 ms). Two software solutions are wait and see and counter-based.

If we detect it as closed, wait for a little bit and check again.
Poll the switch constantly. For each poll if the switch is closed increment the counter. If we reach a certain value in a certain time then the switch was closed (or button pressed).

Serial Communication (Start and Stop bit)

"[The] start bit is used to indicate the start of a frame. Without the start bit, the receiver cannot distinguish between the idle line and the 1 bit because both are logical one. A stop bit is used to allow the receiver to transfer the data from the receive buffer to the memory." (Wu, Homework 6 Solutions)

UART

(Source: Hui Wu, Lecture Notes)

Sample Q3a

(This code probably won't work and probably has errors (and maybe not just simple ones, but serious ones that mean that the logic is wrong))

.dseg
A: .byte 20 ;array of size 10, element size 2 bytes

.cseg
ldi XL, low(A)
ldi XH, high(A)

;add the contents of the array.
store 0
store 1
store 2
store 3
store 4
store 5
store 6
store 7
store 8
store 9

;find the largest value
ldi XL, low(A)
ldi XH, high(A)

;start with the 1st element of the array
ld r25, X+
ld r26, X+

ldi r20, 10 ;size of array
loop:
 cpi r20, 0
 breq endloop

 ld r21, X+
 ld r22, X+

 cp r25, r21
 cpc r26, c22
 brlo lowerthan
 ;we have a new max
 mov r25, r21
 mov r26, r22

 lowerthan:

 inc r20
 rjmp loop
endloop: rjmp endloop

.macro store
ldi r16, low(@0)
ldi r17, high(@0)

st X+, r16
st X+, r17
.endmacro

For some reason in my lecture notes I have "eg. fine 2nd or 3rd smallest or largest" so here is a modification to do something like that.

.dseg
A: .byte 20 ;array of size 10, element size 2 bytes

.cseg
ldi XL, low(A)
ldi XH, high(A)

;add the contents of the array.
store 0
store 1
store 2
store 3
store 4
store 5
store 6
store 7
store 8
store 9

;sort into accending
loopthough for the length of array, (by then we can be sure its sorted)
ldi r23, 10
largeloop:
 cpi r23, 0
 breq endlargeloop

 ;point X to the start of A
 ldi XL, low(A)
 ldi XH, high(A)

 ;start with the 1st element of the array
 ld r25, X+
 ld r26, X+

 ldi r20, 10 ;size of array
 loop:
 cpi r20, 0
 breq endloop

 ;the next value
 ld r21, X+
 ld r22, X+

 cp r25, r21
 cpc r26, c22
 brge gethan
 ;r22:r21 < r26:r25
 ;swap the order
 st -X, r26
 st -X, r25
 st -X, r22
 st -X, r21

 ld r24, X+ ;to change the X pointer
 ld r24, X+

 ld r25, X+
 ld r26, X+

 gethan:

 inc r20
 rjmp loop
 endloop:
endlargeloop:

inf: rjmp inf

.macro store
ldi r16, low(@0)
ldi r17, high(@0)

st X+, r16
st X+, r17
.endmacro

Tags: comp2121, computing.

COMP2121 - Wk05 - Interrupts

20th June 2009

To explain interrupts, Wu used an example of a network card that is downloading a file. The network card has a buffer, and only once this buffer is full (or data stream is complete) should the CPU then copy the contents from the buffer to the RAM. So how does the CPU know when the network card's buffer is full and when to execute the copy? He described two ways here interrupt and polling.

Polling involves the CPU periodically asking the network card, are you full? Two problems with this method are a) there may be a delay as you have to wait for the poll request to be made and b) it wastes a lot of CPU time. Polling is implemented in software, not hardware.

An alternative to polling is using interrupts whereby the network card will send an interrupt signal to the CPU to get its attention. This needs special hardware to implement, however it is very efficient compared with polling.

An interrupt system must (among other things),

Wait for the current instruction to finish before taking care of the interrupt.
Return to the interrupted program at the point where it was interrupted.
Signal the interrupting device with an acknowledge signal when the interrupt has been recognised.

IRQ is an interrupt request signal.

A daisy chain arrangement (as seen below) allows multiple devices to send and IRQ. However the CPU cannot determine from the IRQ line which device sent the interrupt. So in a daisy chain system when the CPU receives an IRQ, it will send a signal to IO1 asking "did you send the IRQ?" if IO1 sent the request it will reply "yes", if not it will pass the question on to the next IO device and so on. The response is passed back in the same way.

Reset is an interrupt in AVR, and in the AVR Mega64 there are five different sources of reset. There is a flag in the MCU Control Register for each of these and can be used to determine the source of a reset interrupt. The watchdog timer is one source of reset.

Watchdog Timer

The watchdog timer is used to try to reset the system if an error such hang occurs. The watchdog timer in AVR can be enabled or disabled.

If the Watchdog timer is enabled, it needs to be periodically reset using the wdr instruction. When (if) the Watchdog times out, it will generate a short reset pulse.

Tags: comp2121, computing.

PostgreSQL Very Basic Cheatsheet

5th May 2009

(Wrote this a few weeks ago when I knew nothing. Indented into my brain now. Should have published earlier or just trashed the post as it seems too simple now. So instead I'll update it when I find out some new neat tricks.)

List of databases: $ psql -l

To open one of them, $ psql MyDatabase

To see what is in the database (list of relations), mydb=# \d

To examine a specific table, mydb=# \d TableName

Can execute SQL, mydb=# select * from Table;

Can edit SQL in an editor from within PSQL, mydb=# \e

To quit, mydb=# \q

To load a schema from a file $ psql mydb -f /home/foo/bar

Also from the shell, $ pg_dump -O myDB > file (-O means no ownership information is outputed)

On my server configuration (default for ubuntu) you can restart the PostgreSQL service using, $ sudo /etc/init.d/postgresql-8.3 restart

Tags: comp3311, computing.

COMP3311 - Week 7 Notes

5th May 2009

Its pointless repeating what John put in the lecture slides, so this is just my additions that he mentioned but are not in the slides.

Aggregates

Initial condition is defaulted to NULL. So sometimes you will need to define,

  initcond = '';

This is different to NULL, because,

  null || 'abc' --> null

(where || is append) but

  '' || 'abc' --> 'abc'

PHP

$x = 2;
myFunc() {
  global $x;
}

If we omit the global $x, then any references to $x in myFunc will refer to a new local x, not the first x that is set to 2. To avoid this and force any references to x inside myFunc to refer to the first x that is equal to 2, we need this global $x line.

strpos

1. $i = strpos('abc', 'a') --> 0
2. $i = strpos('abc', 'b') --> 1
3. $i = strpos('abc', 'z') --> false

if($i) will only be true in case 2 (false in case 1 and 3). So if we want to test if the second string was at all in the first string we must use,

if($i !== false)

This one will be true in case 1 and 2, but not 3.

Tags: comp3311, computing.

COMP2121 - Wk03/04 - Data Transfer/Functions

17th April 2009

[Updated: Clarified what I wrote about the way stacks in AVR work, in the Push/Pop section.]

Data Transfer Instructions

(Credit: Hui Wu/COMP2121 Lecture Notes)

Load Direct (ld):

ld Rd, v
Rd $latex \in$ {r0, r1, ..., r31} and v $latex \in$ {x, x+, -x, y, y+, -y, z, z+, -z}

(remember the X, Y, Z pointers)

Load Program Memory (lpm):

Can take on three forms,

Syntax	Operation
lpm	R0⟵(Z)
lpm Rd, Z	R0⟵(Z)
lpm Rd, Z+	Rd⟵(Z) Z⟵Z+1

Z contains the byte address while the program flash memory uses word addressing. Therefore the word address must be converted into a byte address before having access to the data on the program flash memory. Example usage, (Table_1<<1 converts the word address into a byte address)

   ldi zh, high(Table_1<<1) ; Initialise Z pointer
   ldi zl, low(Table_1<<1)
   lpm r16, z+ ; r16=0x76
   lpm r17, z ; r17=0x58
   ...
Table_1: .dw 0x5876
   ..

IN/OUT:

AVR has 64 IO registers. Example,

in Rd, A

out A, Rd

where 0 ≤ A ≤ 63.

Push/Pop:

The stack pointer (implemented as two 8-bit registers in the I/O space) points to the next free space in RAM above/below (depends how you look at it) the stack. The stack in AVR is part of the SRAM space, and the stack (in AVR) grows from higher memory locations to lower memory locations. I'm talking about the actual value of the address, so 0xFFFF is a higher address than 0xDDDD. This got me a little confused at one stage because if you draw a picture of memory with 0x0000 at the top of the diagram, and 0x0001 below it and so on then in reference to the diagram a stack that is getting larger with PUSH operations is growing upwards (you usually associate higher to lower with down, this is why I got confused).

So the first thing you must do is initialise the stack pointer (RAMEND is a good starting location).

So a push operation,

push Rr

will push Rr onto the stack (ie. put the contents of Rr into the location that the SP points to), and then decrement the SP by 1. Pop has a similar opposite effect.

Shift Instructions

Logical shift left

lsl Rd

Rotate Left Through Carry

rol Rd

Both operation change some status register flags.

Functions

We dabbed into this in first year but just to revise and extend a little I'll try to reiterate this here.

The heap is used for dynamic memory applications (eg. malloc()).

The stack is used to store return addresses, parameters, conflict registers and local variables and other things.

In passing parameters in WINAVR (C compiler for AVR) for say a function call they are passed by value for scalar variables (eg. char, int, float) and passed by reference for non-scalar variables (eg. array, struct).

Rules are needed between the caller and the callee to resolve issues such as,

how to pass values and references to a function?
where to get the return value?
how to handle register conflicts? (if a function wants to use a register that was previously in use)
how to allocate and deallocate stack memory to and from local variables?

If a register is used in both caller and callee and the caller needs its old value after the return from the callee, then a register conflict occurs. Either the compiler or the assembly programmer needs to check for this. The work around is to save the conflict registers on the stack.

The return value of a function needs to be stored in designated registers. WINAVR uses r25:r24 for this.

A stack consists of stack frames. A stack frame is created whenever a function is called, and it is freed whenever the function returns.

(Credit: Hui Wu/COMP2121 Lecture Notes)

Macros

The AVR assembler offers macros. A macro is just a segment of code that you define and can then use by just calling the macro. Basically the macro name is just a place holder for the macro code. When the program is assembled the macro name will be replaced by the code that macro defines. This defines a macro named mymacro,

.macro mymacro
 lds r2, @0
 lds r3, @1
 sts @1, r2
 sts @0, r3
.endmacro

We can then invoke this macro with,

mymacro p, q

The p, q are used like arguments. So @0 will be replaced with p and @1 will be replaced by q. In AVR you can used @0 up to @9 in the macro body.

Assembly Process

The AVR assembler uses a two pass process.

Pass One:

Lexical and syntax analysis: checking for syntax errors.
Record all symbols (labels...) in a symbol table.
Expand macro calls.

Pass Two:

Use symbol table to substitute the values for the symbols and evaluate functions (eg. low(-13167)).
Assemble each instruction (ie. generate machine code). For example add Rd, Rr is encoded as machine code as 0000 11rd dddd rrrr.

References

Wu, Hui. COMP2121 09s1 Lecture Notes.

Tags: avr, comp2121, computing.

COMP2121 - ADD and ADC

14th April 2009

Just some notes on using ADD and ADC (and also a not on compare and branch instructions at the end). (Thanks also to my lab partner who helped me with some of this stuff.)

The registers for AVR are 8 bits long, as noted if I want to store a 16 bit number I need to use two registers, one will store the low byte and the other will store the high byte. eg the 16 bit binary number a = 1111111100000000 (as big endian) would be stored as a_high = 11111111, a_low = 00000000.

Another thing that I've learnt is how addition of multi byte numbers in AVR works. AVR has the instructions ADD (add without carry) and ADC (add with carry). Lets look at a simple case first.

ldi r16, 0b10101010 //the 0b indicates binary ($ or 0x for hex, nothing for decimal, 0 for octal). loads the binary number 10101010 into register 16.
ldi r17, 0b10101010
add r16, r17

What happens here is,

The carry bit is set, and hence and overflow has occurred. The actual answer is 101010100 but this has 9 bits and cannot fix in the 8 bits of space we have for the result to be stored into r16. Here we used the ADD instruction this means (at least to the best of my understanding, please correct me if I'm wrong) that we ignore what is originally in the carry flag (part of the status register). If we used ADC and the carry flag was set to 1 then when we add the right most bit as above 0+0 = 0 but we have a carry so we would get a 1 in that last bit. We can use this to do multi byte arithmetic.

Say we have,

//a is a 16 bit number 1010101010101010
ldi al, 0b10101010
ldi ah, 0b10101010
//b is a 16 bit number 1010101010101010
ldi bl, 0b10101010
ldi bh, 0b10101010

//to do a+b (and store the result in a) we do
add al, bl
adc ah, bh

We can then extend this to as many bytes as we want. Say we have a 3 byte number spread over the registers of a_0, a_1, a_2 and another 3 byte number in the registers b_0, b_1, b_2. We could add them like this,

add a_0, b_0
adc a_1, b_1
adc a_2, b_2

The result would be spread over a_0, a_1 and a_2 (remember though that we can still overflow this).

It was also enlightening (though also a nice reminder) to see how this low level addition works. (when doing 1bit addition (without carry), a + b, the sum = a xor b and the carry = a and b.) (see picture below)

(Credit: Annie Guo, Basics of Digital Systems, COMP2121 Notes)

This concept can similarly be applied to other instructions. For example,

cp a, b
breq label

will compare a and be and branch to the label label if a is equal to b (and if not continue on executing the subsequent instructions). However what if you wanted to compare if a two byte number was equal to a two byte number? Sure you could just repeat the code one time for the low byte and another for the high byte but you can also do this,

cp al, bl
cpc ah, bh
breq label

These cp and cpc instructions use some trickery (well not really but I didn't initially see how this worked, though its quite obvious now) where they modify some flags based on the result of the comparison and then the branch instructions (breq, brne, brlo...) look at the respective flag to see if they should branch or not.

Tags: avr, comp2121, computing.

COMP2121 - Wk04

13th April 2009

Some general notes from an AVR Tutorial that I have found useful and need to reiterate.

"Only the registers from R16 to R31 load a constant immediately with the LDI command, R0 to R15 don't do that.
The first register is always the target register where the result is written to!
Assembler directives always start with a dot in column 1 of the text. Instructions do NEVER start in column 1, they are always preceded by a Tab- or blank character! [There are no restrictions with respect to column placement of labels, directives, comments or instructions.]
A very special extra role is defined for the register pairs R26:R27, R28:R29 and R30:R31. The role is so important that these pairs have extra names in AVR assembler: X, Y and Z. These pairs are 16-bit pointer registers, able to point to addresses with max. 16-bit into SRAM [(part of the data memory)] locations (X, Y or Z) or into locations in program memory (Z). The lower byte of the 16-bit-address is located in the lower register, the higher byte in the upper register. Both parts have their own names, e.g. the higher byte of Z is named ZH (=R31), the lower Byte is ZL (=R30). These names are defined in the standard header file for the chips. Dividing these 16-bit-pointer names into two different bytes is done like follows:
```
.EQU Address = RAMEND ; RAMEND is the highest 16-bit address in SRAM
LDI YH,HIGH(Address) ; Set the MSB
LDI YL,LOW(Address) ; Set the LSB"¹
```

Define names for registers with the .DEF directive, never use them with their direct name Rx.
If you need pointer access reserve R26 to R31 for that purpose.
16-bit-counter are best located R25:R24.
If you need to read from the program memory, e.g. fixed tables, reserve Z (R31:R30) and R0 for that purpose.
If you plan to have access to single bits within certain registers (e.g. for testing flags), use R16 to R23 for that purpose.¹

Sections marked with ¹ are ©2002-2009 by http://www.avr-asm-tutorial.net, from Beginners Introduction to the Assembly Language of ATMEL AVR Microprocessors by Gerhard Schmidt 2003. Used under the supplied license, "You may use, copy and distribute these pages as long as you keep the copyright information with it."

Tags: comp2121, computing.

COMP3311 Wk02

26th March 2009

Various things about mapping ER Designs to Relational Schemas.

Mapping Strong Entities

The relational model supports only atomic attributes. To map composite attributes you can try,

Concatenate the attributes eg. Struct name {"John", "Smith"} --> "John Smith"
Map atomic components of the composite attribute to a set of atomic components. eg.
??

Mapping N:M Relations

Mapping 1:N Relations

Mapping 1:1 Relations

Notes from the Text Book (The Lecture Notes are a Little Different)

Domain Types & User Types

In the sample code for the first assignment to define "custom types" create domain is used. eg.

create domain PersonGender as char(1) check (value in ('M','F'));

However the text also shows create type. eg.

create type Dollars as numeric(12,2) final

It goes on to explain the difference.

Domains can have constraints (such as not null) specified on them, as well as default values defined on the domain type. You can't do this with user defined types.
Domains are not strongly typed. Hence you can assign values of one domain type to values of another domain type so long as the underlying types are compatible.

Pattern Matching

Patterns in SQL can be desribed using % and _.

Percent (%): The % character matches any substring.
Underscore (_): The _ character matches any character.

eg.

select foo from bar where lar like '_to%'

This will match to any of these strings, "Lto" "Ato" "ltoo" "rtoto" ... (any character at the start, then the "to" string, then any (even null) trailing string)

You can define the escape character for a like comparison as follows,

like 'he\%%'  escape '\'' --matches all strings begining with 'he%'

You can also use not like.

SQL:1999 allows for similar too which is similar to Unix regular expressions.

Drop vs. Delete

drop table r will remove all the tuples from r, and removes the schema of r, whereas

delete from r will just remove all the tuples from r, but leaving the schema so you can still add values to the table.

References

Shepherd, John. COMP3311 09s1 Lecture Slides. http://www.cse.unsw.edu.au/~cs3311/09s1/lectures/. (Diagrams have also been sourced from here).

Silberschatz. Database System Concepts. 5th Ed.

Tags: comp3311, computing.

COMP3311 - Wk01

23rd March 2009

Just a couple random notes, to reiterate some things I need to become acquainted with. Definitely not comprehensive.

The ER Diagram

Cardinalities

Each manager manages exactly one branch, and each branch is managed by exactly one manager.
Each branch holds zero or more accounts, but each account is held be at most one branch.
Each customer owns zero or more accounts and each account is owned by zero or more customers.

Participation

Not all customers must take out a loan (or it is not the case that every customer takes out a loan), but every loan is taken out by at least one customer. i.e. Every loan is associated with at least one person, but every person is not necessarily associated with at least one loan.

Attributes Linked to Relationships

In (a) you know how much time a particular person spends on a project. In (b) you only know how much time has been spend on a particular project. You don't know the distribution of that time among the researches that have worked on it.

References

Shepherd, John. COMP3311 09s1 Lecture Slides. http://www.cse.unsw.edu.au/~cs3311/09s1/lectures/. (Diagrams have also been sourced from here).

Tags: comp3311, computing.

COMP2121 - Wk02

20th March 2009

This week we learnt some of the things that separate assembly language from machine code in the context of AVR (or should I say AVR Studio).

Important Note 1: Assembly Language

In AVR assembly an input line takes the form of one of these, ([foo] = Optional)

[label:] directive [operands] [Comment]
[label:] instruction [operands] [Comment]
Comment
Empty Line

where a comment has form,

; [Text]

Important Note 2: Pseudo Instructions

AVR Assembly (using AVR studio) has some additional commands that are not part of the AVR instruction set, but the assembler (that is part of AVR studio) interprets into machine code.

Pseudo instructions are recognised by their preceding '.' (dot character). eg,

.equ CONST = 31

, will upon assembly go through the code and replace CONST with 31.

Here are the AVR assembly Pseudo Instructions.

Directive	Description
BYTE	Reserve byte to a variable
CSEG	Code Segment
CSEGSIZE	Program memory size
DB	Define constant byte(s)
DEF	Define a symbolic name on a register
DEVICE	Define which device to assemble for
DSEG	Data Segment
DW	Define Constant word(s)
ENDM, ENDMACRO	End macro
EQU	Set a symbol equal to an expression
ESEG	EEPROM Segment
EXIT	Exit from file
INCLUDE	Read source from another file
LIST	Turn listfile generation on
LISTMAC	Turn Macro expansion in list file on
MACRO	Begin macro
NOLIST	Turn listfile generation off
ORG	Set program origin
SET	Set a symbol to an expression

.byte - Reserve some space (only allowed in dseg). eg.

.dseg
  var1: .byte 4 ;reserve 4 bytes and store address in var1

.CSEG
  ldi r30, low(var1) ; Load address into Z low register
  ldi r31, high(var1) ; Load address into Z high register
  ld r1, Z ; Load VAR1 into register 1

...and some more...

.def FOO=r30 ;give register 30 the symbolic name FOO
.equ var = 2 ;like C's #define statement
.set var = 2 ;like a global variable in C

Important Note 3: Segments

There AVR three segments of an AVR assembly program. Also when writing an assembly program you need to be able to specify which segment you are referring to, so you need to declare this using an AVR assembler directive shown in brackets below.

Code Segment (.cseg)
Data Segment (.dseg)
EEPROM Segment (.eseg)

"The CSEG directive defines the start of a Code Segment. An Assembler file can consist of several Code Segments, which are concatenated into one Code Segment when assembled. The BYTE directive can not be used within a Code Segment. The default segment type is Code. The Code Segments have their own location counter which is a word counter. The ORG directive can be used to place code and constants at specific locations in the Program memory. The directive does not take any parameters." [1]

Notes from the Lab

Assembly Code and Machine Code

In the lab we looked at this AVR assembly program,

.include "m64def.inc"
.def a =r16 ; define a to be register r16
.def b =r17
.def c =r18
.def d =r19
.def e =r20
main: ; main is a label
ldi a, 10 ; load 10 into r16
ldi b, -20
mov c, a ; copy the value of r16 into r18
add c, b ; add r18 and r17 and store the result in r18
mov d, a
sub d, b ; subtract r17 from r19 and store the result in r19
lsl c ; refer to AVR Instruction Set for the semantics of
asr d ; lsl and asr
mov e, c
add e, d
loop:
inc r21 ; increase the value of r21 by 1
rjmp loop ; jump to loop

The machine code executable produced by AVR studio was 24 bytes long, the question was why. It's actually quite simple, all of the here instructions are 1 word (2 bytes = 16 bits), there are 12 instruction so total 24 bytes. One thing that initially confused me was the weird encoding. Back in 1917 I got the impression that if you have something like mov r16 r17 that this would be represented in machine code as some number for the mov operation followed by two more numbers of the same bit size for the registers. However this can vary depending on the architecture.

For example the mov operation, MOV Rd, Rr has encoding 0010 11rd dddd rrrr. The instruction takes up 6 bits, the source register takes up 5 bits and the destination takes up 5 bits (total 16 bits). The reason that the source and destination bits are intertwined is that it makes things easier on the hardware implementation to do it this way.

The program above has machine code (in hexadecimal),

E00A EE1C 2F20 0F21 2F30 1B31 0F22 9535 2F42 0F43 5993 CFFE

and annotated,

+00000000:   E00A        LDI     R16,0x0A         Load immediate
+00000001:   EE1C        LDI     R17,0xEC         Load immediate
+00000002:   2F20        MOV     R18,R16          Copy register
+00000003:   0F21        ADD     R18,R17          Add without carry
+00000004:   2F30        MOV     R19,R16          Copy register
+00000005:   1B31        SUB     R19,R17          Subtract without carry
+00000006:   0F22        LSL     R18              Logical Shift Left
+00000007:   9535        ASR     R19              Arithmetic shift right
+00000008:   2F42        MOV     R20,R18          Copy register
+00000009:   0F43        ADD     R20,R19          Add without carry
+0000000A:   9553        INC     R21              Increment
+0000000B:   CFFE        RJMP    PC-0x0001        Relative jump

Status Register

Stepping through this program also shows a few of the status registers in use. The Status register, like all the other registers has 8 bits, namely,

SREG	Status Register
C	Carry Flag
Z	Zero Flag
N	Negative Flag
V	Two’s complement overflow indicator
S	N ⊕ V, For signed tests
H	Half Carry Flag
T	Transfer bit used by BLD and BST instructions
I	Global Interrupt Enable/Disable Flag

Last week we saw how signed and unsigned numbers are stored, so if you look at the program above you will see that the last part just increments a register infinitely over and over. Stepping through this shows us what the status register does as we do this. Remember that AVR does all arithmetic in two's compliment.

0	H Z
1-127	H
128	H V N
129-255	H S N

As you can see the values that are negative under the two's complement 128-255 have the N (negative) flag. Also from 127 then to 128 under two's compliment we have gone from 127 to -128, so the V (Two’s complement overflow indicator) flag is indicated. 0 has the zero flag.

The Rest

The lecture notes go over some of the AVR instructions but I think the docs provided by Atmel are fine. What I do think needs explaining (and me learning) is the carry flag and the difference between operations like add without carry (ADD) and add with carry (ADC).

Here is how I understand the carry bit. Say we have 4 bit registers and try to add (in binary) 1000 and 1000, the answer is 10000 however this is 5 bits and we only have 4 bits available to store the result. An overflow has occurred. To introduce some terminology, the most significant bit (or byte) (msb) is the left most bit (or byte) in big-endian (right most in little-endian), and vice-versa for least significant bit (or byte) (lsb). The carry bit (C flag) is the carry from the msb. This can indicate overflow in some cases but not always, it those cases the V flag (Two’s complement overflow indicator) is set.

So getting back to ADD and ADC, ADD will just add the two numbers ignoring the carry bit and ignoring overflow whereas ADC will actually add the carry bit to the result. At least this is what I've observed, though I'm not 100% sure on this.

References

[1] AVR Assembler User Guide. www.atmel.com/atmel/acrobat/doc1022.pdf

Bibliography

AVR Assembler User Guide. www.atmel.com/atmel/acrobat/doc1022.pdf

Tags: comp2121, computing.

An old memory of Linux

13th March 2009

I only started using Linux when I started Uni (2008). I knew nothing about it prior to this. I had heard of it but that is all. I now wish I'd become aware of it sooner but then I realised I had the chance. A friend of mine introduced me to it in year 10 (2005) and I even got an Ubuntu 4, 5 or 6 CD (I can't recall which one). But I never used it. Why not? Well, it occurred to me that Windows was the sole reason for me not at least testing Linux out. I had had some bad experiences with some Windows operating systems and installing windows operating systems. So I had the thought "I don't want to risk breaking my computer/loosing things again with another OS install" just to test this Linux thing out. Huge mistake! Though at least I've learnt from this experience and I'm trying to try things out and give them ago before I just assume them to be poorer quality or choosing not to use them.

One of the best pieces of advice that I would give is to try new things out, if you don't try it you won't know what your missing. For if I was always too afraid to try new things out I would still be using Windows 95.

Tags: computing, linux.

COMP2121 - Wk01

13th March 2009

Just some reflections on the first week of my 09s1 COMP2121 Lectures and other materials. I will base a lot of this material on the course lecture notes by Hui Wu. Actually if you read this and the lecture notes you might see that they are pretty much identical.

The 4917 Microprocessor

This first lecture clarified a lot of the things I'd learnt about low level computing in COMP1917, which was good. Back in COMP1917 we were introduced to the 4917 microprocessor (a hypothetical designed just for this course). It was 4bit, has 16 memory locations and 4 registers (instruction pointer, instruction store, general register 0 and general register 1). Each memory location could store a number between 0 and 15, and there were 16 instructions.

1-byte Instructions
- 0 = Halt
- 1 = Add (R0 = R0 + R1)
- 2 = Subtract (R0 = R0 - R1)
- 3 = Increment R0 (R0 = R0 + 1)
- 4 = Increment R1 (R1 = R1 + 1)
- 5 = Decrement R0 (R0 = R0 - 1)
- 6 = Decrement R1 (R1 = R1 - 1)
- 7 = Ring Bell
2-byte Instructions, value of the second byte is called <data>
- 8 = Print <data> (The numerical value of <data> is printed)
- 9 = Load value at address <data> into R0
- 10 = Load value at address <data> into R1
- 11 = Store R0 into address <data>
- 12 = Store R1 into address <data>
- 13 = Jump to address <data>
- 14 = Jump to address <data> if R0 == 0
- 15 = Jump to address <data> if R0 != 0

It had a simple start up (registers set to 0, all memory locations set to 0, fetch-execute cycle begins), and a fetch-execute cyle,

The instruction at the address given by the instruction pointer is loaded into the instruction store.
The instruction pointer is incremented by 1 or 2 depending on whether the instruction store is a 1 or 2 byte instruction.
The instruction in the instruction store is executed.
This is repeated until the instruction store equals 0 (HALT)

So for example the following 4917 machine code program would print 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15.

8 0 3 11 1 15 0

This is understandable but was difficult for me to tie into my desktop computer level, for instance the CPU and the RAM. This became a bit clearer today.

The Von Newman and The Harvard Architectures

Two main computer architectures include the Von Newman Architecture and the Harvard architecture. The hypothetical 4917 from COMP1917 was a Von Newman Architecture because both the program and the data were stored in the same memory. (It seems that processors like the Intel Pentium 4, Intel Core 2, etc. use this architecture)

Another key thing is the Arithmetic and Logic Unit (ALU) and the Control Unit are collectively called the CPU (Central Processing Unit). A microprocessor is just a CPU on a single chip. A microcontroller is basically a computer on a chip, so the CPU, memory, etc. are all on a single chip.

I learnt a lot from experimenting with this architecture and the 4917 (and its successors). Though when I began to write more complex programs I found myself constantly putting a GOTO instruction number n command at the very begining, and using that skiped space as memory for data. I also came to see that this architecture allows for the buffer overflow vulnerability as program data can be executed as instructions if there are vulnerability. These two observations tend to lead to the Harvard architecture (which I am new to).

This is the architecture used for the Atmel AVR microprocessor, which is what we will focus on in this course (I think). I'll come back to this when I talk about address space.

Computer Memory

This is the computer memory hierarchy diagram from the lecture notes.

This helps put a lot of my misunderstandings of how the 4917 from COMP1917 relates to modern processors, as I didn't quite see if the instructions and data were stored in RAM or on the CPU. But really this is just an implementation issue so it didn't really matter to the 4917.

In the CPU there are registers which are really fast, but there are few (apparently in x86 (eg. Pentium, Core 2) there are only 8 integer and 8 floating point registers). Then there is the cache (on chip memory), this is what that 4MB cache that I see advertised that my CPU has. This cache can be separated for program code and data. This is faster that reading from RAM (the off chip memory), so currently active program code is moved to here from the RAM when necessary to speed things up (this is apparently implemented on the hardware level (which is always nice for a software developer ;) ). Then there is off-chip memory and auxiliary storage. This fits in nicely with the picture I get when I open up my computer.

The material on the types of RAM and ROM in the lecture notes needs no further commentary, so I'll skip that part.

ISA (Instruction Set Architecture)

"ISA is the interface between hardware and software

For (machine language) programmers (and compiler writers)
- Don’t need to know (much) about microarchitecture
- Just write or generate instructions that match the ISA
For hardware (microarchitecture) designers
- Don’t need to know about the high level software
- Just build a microarchitecture that implements the ISA" --Wu

This sums up the situation nicely, and makes perfect sense.

There are 4 things that make up memory instruction set architectures,

Memory Models (how does the memory look to the CPU)
- Addressable cell size. Most commonly is 8 bits (= 1 byte) (though AVR uses 16 bits for program memory)
- Alignment
- Address Space (AVR address space shown below)
- Endianness (Least or most significant byte first)
Registers
- General Purpose (temp results...)
- Special Purpose
  - Program Counter (PC)
  - Stack Pointer (SP)
  - Input/Output Registers
  - Status Registers
Data Types
Instructions

RISC - Reduced Instruction Set Computer CISC - Complex Instruction Set Computer

Atmel AVR is RISC.

Number Systems

Problem: How do we represent negative numbers in a computer? 4 main methods (from Wikipedia).

Sign-and-magnitude - Allocate one bit as the sign (problems: two zeros, circuitry more complicated)
Ones' complement - (problems: two zeros)

Two's complement

Excess-N (not in lecture notes)

It is important to know that the hardware does all arithmetic in 2's compliment. It is up to the programmer to interpret the number as signed or unsigned.

To convert a number from 2's compliment, for example -45 in 2's compliment is 11010011, we can do something like this,

$latex 1 \times (-2^7) + 1 \times 2^6 + 1 \times 2^6 + 0 \times 2^5 +1 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = -45$

To go the other way from say -1 to the 2's compliment form 11111111 we use that $latex 2^p - X$ formula. I'm not exactly sure how its supposed to work so I've hacked it to make it work.

If the number you wish to convert is negative, let $latex X = -n$, so that X is positive then take $latex 2^p$ where p is the number of bits you are using (say 8), then subtract X. If the number to convert is less than $latex 2^p$ (where p is the number of bits, say 8 ) then leave it as is and that in your 2's compliment.

Now that was complicated. But its the only way I can get that advertised $latex 2^p - X$ formula to work with the given set of sample data (as in that table above).

This raises an issue for comparison operations. eg. Is 1111 1100_two > 0000 0010_two? If those two numbers are unsigned then YES, if they are signed thin NO. As such the hardware uses the S flag in AVR to indicate the result of the signed comparison, and the C flag to indicate the result of unsigned comparison.

Representing Strings. We saw one method back in COMP1917 though its nice to see that the other methods that come to mind were used also.

1st position of string reserved for length (Pascal)
an accompanying variable has the length of the string (as in a structure)
last position of string marked with a special character (NULL in C)

Sign Extension

How do we extend a binary number of m bits to an equivalent binary number of m + n bits? If the number is positive add n 0's to the most significant side (usually left) of the binary number. If the number is negative add n 1's to the most significant side of the binary number. This is called sign extension. To add twe binary numbers of different lengths just sign extend the shorter one so that both numbers have the same bit length then add as usual.

References

Wu, Hui. COMP2121 09s1 Lecture Notes.

(Updated 20th June 2009 with explanation of 2's compliment conversions and sign extension)

Tags: comp2121, computing.

Some thoughts on Feed Readers

3rd March 2009

I currently use Google Reader to subscribe to and read RSS and Atom feeds. It has pros and cons.

One of the great things about Google Reader is that its entirely in the cloud. I can access it wherever I am, whichever operating system I'm using. Also as its run on Google's servers, they pay the bandwidth bill not me. I subscribe to lots of feeds (181 as of now) but I don't read them all all the time. If I don't want to read some high volume feeds I don't open them and they don't get downloaded to my machine. This also means that its Google that checks the feed for updates all the time, not my machine. And that's the beauty of using a cloud based feed reader. Google only has to download the feed 1 time for all the subscribers of that feed in Reader, not once for each user.

This is the sole reason I use a cloud based reader, and this pro is too great to get me to switch. Unfortunately there are some things that I don't like about reader and some things I wish I could add. That's the bad thing about using a cloud service like this, I cannot edit the code for my needs. Open source desktop applications can, but with Google's cloud services you cannot.

Its good that in Reader you can download a list of your subscribed feeds, but I wish I could also download the feed items themselves in bulk. There is a copyright issue here that being, from what I've experienced when you subscribe to a new feed in Reader you can view the past postings going back probably to when it was first indexed, despite these items no longer being in the feed. I want to be able to download those too. I guess since you are already downloading them when you view them this is the same, so its not a new issue.

This is turning into a debate about cloud services vs. desktop software, so I'll stop here.

Tags: computing, google.

Entries tagged "computing".

OS Overview

System Calls

Processes and Threads

Threads Implementation

Concurrency

Deadlock

File Systems

Performance issues with standard magnetic disks

Block Allocation Strategies

Keeping track of file blocks

System call interface

VFS

Memory Management and Virtual Memory

Virtual Memory

Input Output

Scheduling

Multiprocessor Systems

Security

Security Engineering

Crypto

RSA

Access Control

Elections

Security Architecture

Human Factors

Risk

Business Risk Concepts

Degree of Risk

Risk management

Primary

Secondary

Parametric Curves and Surfaces

Parametric Representation

Continuity

Parametric Continuity

Geometric Continuity

Control Points

Natural Cubic Splines

Bezier Curve

De Casteljau Algorithm

Curvature

Uniform B Splines

Non-Uniform B Splines

Rational B Splines

NURBS

3D

Matrix Stacks

The View Volume

Visible Surface Determination (Hidden Surface Removal)

BSP (Binary Space Partitioning)

Z Buffer

Fractals

L-Systems

Self-similarity

IFS - Iterated Function System

Shading Models

Lambert's Law

Gouraud Shading

Phong Shading

Phong Illumination Model

Attenuation

Ray Tracing

Shadows

Rasterisation

Line Drawing

DDA

Bresenham

Polygon Filling

Scan line Algorithm

Anti-Aliasing

Aliasing Problems

Anti-Aliasing

No Anti-Aliasing

Pre-Filtering

Post-Filtering

OpenGL

Anti-Aliasing Textures

Mip Mapping

Double Buffering