# Mesh Data Structures

In most 3D graphics applications, meshes are refined, smoothed, simplified by decimation, merged, partly extracted or otherwise modified. Subdivision surfaces are principally based on mesh refinement, iterating over the edges, faces and/or vertices of one-ring neighbourhoods. Extremely large meshes generated by 3D scanners need to be simplified accurately, even with limited memory and computational power. And above all, mesh descriptions should not be ambiguous: some models need to be 2-manifolds, others are merely rendered from various perspectives. So what do all these cases have in common? Which data structures are best suited for which operations?

When comparing different data structures, bytes per vertex is the measure used here to investigate space complexity. Space complexity in turn influences time complexity for a given operation, because not every algorithm can be used with every data structure.[1]

Required knowledge: basic 3D (mesh) geometry, basic computer architecture

# Overview

Topological requirements: mesh type (2-manifolds, complex edges, singular vertices, triangular/quad/dynamic meshes). Regular, semi-regular, irregular meshes. Flat or hierarchy of meshes. Static or dynamic topology (i.e. change over time).

Algorithmic requirements: read type (static/dynamic), write type (static/dynamic). Associate additional data with vertices/edges/faces? Upper limit for memory consumption? Access to local neighbourhood of vertices, edges, faces?

## Performance comparison

| Data Structure | Space / Vertex | Mesh Topology | Rendering | One-Ring Traversal | Boundary Traversal |
|---|---|---|---|---|---|
| Face-set | 72 bytes | Static, fixed (3,4) | Fast | Slow | Slow |
| Indexed face set | 36 bytes | Static, fixed (3,4) | Fast | Slow | Slow |
| Indexed face set + neighbor faces | 64 bytes | Usually static | Fast | Fast if static topology | Slow |
| Winged-edge | 120 bytes | Arbitrary (2-manifolds) | Medium | Slow (case distinctions) | Slow |
| Quad-edge | 144 bytes | Arbitrary (2-manifolds) | Medium | Fast | Medium |
| Half-edge | 144 / 96 bytes | Arbitrary (2-manifolds) | Medium/slow | Fast | Fast |
| Directed-edge | 64 bytes | Regular triangular/quad meshes (2-manifolds) | Medium/slow | Medium | Medium |

## Strengths and weaknesses

| Data Structure | Strengths | Weaknesses |
|---|---|---|
| Face-set | Static meshes; rendering | No explicit connectivity information; replicated vertices and associated data |
| Indexed face set | Simple and efficient storage; static meshes; rendering | No explicit connectivity information; not efficient for most algorithms |
| Indexed face set + neighbor faces | Access to individual vertices/edges/faces; oriented traversal; access to incident faces of an edge; access to an edge's two endpoint vertices; one-ring traversal possible | No explicit edge storage (no data attachments); massive case distinctions for one-ring traversal; complex & less efficient for general polygonal faces |
| Winged-edge | Arbitrary polygonal meshes | Massive case distinctions for one-ring traversal |
| Quad-edge | One-ring traversal | Slow rendering |
| Half-edge | One-ring traversal; explicit representation of edges | Slow rendering |
| Directed-edge | Memory efficiency; one-ring traversal for triangular meshes | Only for pure triangle/quad meshes; no explicit representation of edges |

## Common applications

| Data Structure | Common Applications |
|---|---|
| Face-set | Stereolithography (STL) |
| Indexed face set | Rendering (OpenGL vertex arrays, Direct3D); OFF, OBJ, VRML |
| Indexed face set + neighbor faces | 2D triangulation data structures of CGAL |
| Winged-edge | Rarely used today |
| Quad-edge | Rarely used today |
| Half-edge | Mostly used for mesh refinement, decimation, smoothing |
| Directed-edge | Mostly used for mesh refinement, decimation, smoothing of pure triangular meshes |

# Face-Based Data Structures

The simplest format stores a mesh as a set of individual polygonal faces, each represented by its vertex positions: the face set. For each face, the vertices are given as position vectors, usually with three components. Since most faces share vertices with neighbouring faces, vertex positions are stored redundantly, further increasing memory consumption. As a result, a triangle mesh requires on average $3\cdot3\cdot4 = 36$ bytes/face = 72 bytes/vertex when using 32-bit single-precision numbers.[2] Face sets do not represent mesh connectivity, which is why they are sometimes referred to as triangle soup.

Since face sets store vertex positions redundantly, a widely used variation stores an array of vertices and encodes polygons as sets of indices into this array: the indexed face set (or shared-vertex representation). With 32 bits for vertex coordinates and array indices, which is sufficient for most applications, storage requirements are about 12 bytes/vertex + 12 bytes/face = 36 bytes/vertex. However, recovering the local adjacency information of a vertex requires expensive searches, so this representation is not efficient for most algorithms.

〈IndexedFaceSet〉 ≡

```
Vertex {
    Point       position
}
Face {
    VertexRef   vertex[3]
}
```

Another variation additionally stores references to the neighbouring faces of each face and a reference to one parent face for each vertex, which requires an extra 28 bytes/vertex, about 64 bytes/vertex in total.

〈IndexedNeighboredFaceSet〉 ≡

```
Vertex {
    Point       position
    FaceRef     face
}
Face {
    VertexRef   vertex[3]
    FaceRef     neighbor[3]
}
```

# Edge-Based Data Structures

Why edge-based? Connectivity primarily relates to mesh edges, so general polygon meshes are logically edge-based. In the winged-edge structure, each edge carries most of the connectivity information: references to its endpoint vertices, to its two incident faces, and to the next and previous edge within the left and right face. Vertices and faces store a reference to one of their incident edges. Memory requirements are about 16 bytes/vertex + 4 bytes/face + 32 bytes/edge = 120 bytes/vertex. This structure represents arbitrary polygonal meshes, but one-ring traversal still requires expensive case distinctions.

〈WingedEdge〉 ≡

```
Vertex {
    Point       position
    EdgeRef     edge
}
Face {
    EdgeRef     edge
}
Edge {
    VertexRef   vertex[2]
    FaceRef     face[2]
    EdgeRef     next[2]
    EdgeRef     prev[2]
}
```

Another, quite similar variant is the quad-edge structure.

# Halfedge-Based Data Structures

In principle, each unoriented edge is split into two oriented halfedges, which are consistently oriented counterclockwise around each face and along each boundary. This avoids case distinctions during one-ring traversal. The structure can represent arbitrary polygonal meshes that are subsets of orientable 2-manifolds, i.e. meshes without complex edges and vertices. A halfedge stores references to the vertex it points to, its adjacent face, the next halfedge of the face or boundary, the previous halfedge in the face, and its opposite (inverse) halfedge. Each face stores a reference to one of its halfedges, and each vertex a reference to an outgoing halfedge. This leads to 16 bytes/vertex + 4 bytes/face + 20 bytes/halfedge = 144 bytes/vertex, since the number of halfedges H is about six times the number of vertices V. The implementation can use pointers or indices, but indices (and thus arrays) should be preferred whenever possible: memory management is simpler and more compact, and all attributes of a vertex can be identified by the same index.

〈Halfedge〉 ≡

```
Vertex {
    Point       position
    HalfedgeRef halfedge
}
Face {
    HalfedgeRef halfedge
}
Halfedge {
    VertexRef   vertex
    FaceRef     face
    HalfedgeRef next
    HalfedgeRef prev
    HalfedgeRef opposite
}
```

For implementation details on halfedges, see the CGAL library.

# Directed-Edge Data Structures

Halfedges are meant for arbitrary 2-manifold mesh topologies. For pure triangle meshes, the memory-optimised directed edge structure should be used instead. It is based on indices: indexing follows rules that implicitly encode part of the connectivity of a triangular mesh. Instead of pairing opposite halfedges, this data structure groups the three halfedges belonging to a common triangle.
Indexing is as follows: Let $f$ be the index of a given face. Then the indices of its three halfedges are given as $$\text{halfedge}(f,i) = 3f + i,\ \ i=0,1,2.$$ The index of a halfedge's adjacent face is $$\text{face}(h) = \lfloor h/3\rfloor,$$ where $h$ is the halfedge's index. The index within that face is $$\text{face\_index}(h) = h \bmod 3.$$ The index of $h$'s next halfedge is $3\lfloor h/3\rfloor + (h+1) \bmod 3$.
In addition, each vertex stores its position and the index of an outgoing halfedge. Each halfedge stores the index of its opposite halfedge and the index of its vertex. On average, directed edges require about 16 bytes/vertex + 8 bytes/halfedge, resulting in 64 bytes/vertex.
Directed edges handle triangle meshes much like the general halfedge data structure, but boundaries are handled by special (usually negative) indices indicating that the opposite halfedge is invalid. It is not possible to mix triangles and quads or to represent general polygonal meshes: the structure only supports regular triangle/quad 2-manifold meshes, and edges are not explicitly represented. One-ring traversal along boundaries is slightly slower because finding the next boundary edge is not an atomic operation.

# Summary

Suitable data structures are the key to efficient geometry processing algorithms based on polygonal meshes. When working with meshes, halfedge or directed-edge data structures are used for most problems, while rendering algorithms often require indexed face sets (e.g. OpenGL vertex arrays and OpenGL vertex buffer objects, which are stored on the GPU for very fast rendering). In general, it is actually much harder to achieve a good balance between versatility, memory consumption and computational efficiency. Mature libraries like CGAL, OpenMesh and MeshLab should be used whenever possible.

1. This article does not investigate algorithms based on mesh data, hence no time complexities are given either.
2. A mesh with $V$ vertices and $F$ faces consumes $36 \cdot F$ bytes $\approx 36 \cdot 2V$ bytes $= 72 \cdot V$ bytes. Note that for triangular meshes, the number of faces is about twice the number of vertices (a consequence of Euler's formula).

# A Tiny Wavefront Object Loader – Part I

Abstract: 3D geometry can be represented in many ways in primary and secondary memory, e.g. in files. One particularly popular geometry file format is the Wavefront OBJ file format, also known simply as the OBJ file format. A major advantage: it is free and easy to implement, although supporting all of its functionality can make an implementation rather complex. A Google search brings up plenty of examples and implementations. However, it turns out that most examples published on the internet are prime examples of how a geometry file loader should NOT be implemented. This post discusses requirements for good file loaders for ASCII formats, lists some bad practices found on the internet, and finally discusses a possible implementation and its limitations.

# Recall: OBJ File Format

According to the Wavefront OBJ file format specification, the following definition holds for vertices (v), normals (vn) and texture coordinates (vt):
```
v  vertex_x vertex_y vertex_z    # v 0.0033 1.3943 593.33
vn normal_x normal_y normal_z    # vn 0.1003 1.4933 2.4933
vt u_coord v_coord               # vt 0.1144 0.8999
```
where vertex positions, normal vectors and uv-coordinates are stored as single-precision floating point values, with $u,v \in [0,1]$. Each face consists of an arbitrary number of vertices, each optionally having a normal and a texture coordinate. Because vertices are used several times within a mesh, they are encoded in a separate (unique) list; a face references indices into this list. Note that indices begin with 1, not with 0! The face definition is
```
f v1 v2 v3 ...                          # f 13 45 26 ...
f v1/vt1 v2/vt2 v3/vt3 ...
f v1/vt1/vn1 v2/vt2/vn2 v3/vt3/vn3 ...
f v1//vn1 v2//vn2 v3//vn3 ...
```

# What makes a good loader

When implementing a (new) loader, the task heavily depends on the situation and environment in which the loader will operate. One major factor: which libraries are already in use by other classes or modules in the same system? If string or regular expression parsers are available, they should be used, e.g. a Boost parser or a custom regex library. Another question: character-based or binary? Binary file formats have the big advantage of faster IO: reading and writing binary files is much faster than writing and parsing ASCII data. However, OBJ files are stored as ASCII characters because they can easily be used by any programmer.[1]
Another factor: speed. There is absolutely no use for a loader if it takes minutes to load and parse data of average size – the user might not wait so long! Moreover, as any other program, it has to be reliable and robust: the loader should not break in case of corrupt files or “unknown” geometry.

## Speed Considerations

Three major tasks in a loader influence speed: parsing, data allocation and data access. When parsing an ASCII file, a couple of functions in C or C++ can be used to convert character arrays to numeric values. Very low-level functions are atoi() and atof() from C (ASCII to integer/float). Unless C is required, these two functions should no longer be used: they force the programmer to work with C character arrays and pointers. Although there is nothing inherently bad about this, errors become more likely, especially under tight time constraints. Another option is sscanf(): it is said to be much safer, and it is fast, though not as fast as atoi(). However, Windows compilers (like VC) like to complain about sscanf:
```
warning C4996: 'sscanf': This function or variable may be unsafe. Consider using sscanf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS.
```
But why use a different function name for a safer implementation if everyone is then forced to rename all call sites to use the updated version? It is just ridiculous! Both sscanf and sscanf_s do essentially the same thing, so why a different name? And what about cross-operating-system compatibility?
Fortunately, C++ provides a fast and easy way to parse strings: stringstreams. Stringstreams are not as fast as sscanf, let alone atoi. But if you can spare a few extra seconds while loading objects, parsing and converting strings only requires a few lines of code with stringstreams. This is the purpose of C++: to make tasks easier and to support faster programming! The C++ STL has been thoroughly tested and optimised for performance; when people complain about its poor performance, the cause is most likely the overall software design that uses the STL.

The second major factor for speed: data allocation. There are two kinds of memory in C++: the stack and the heap. Stack allocation is much faster and is intended for temporary variables within scopes. Heap allocations persist, but they are much more expensive and, if used extensively, can lead to internal/external memory fragmentation over time, which in turn slows down future program execution! Thus, use new only when there is no other way. The best way to store a lot of similar data is plain arrays, or STL vectors in C++.[2] The element type should be a struct if it only holds data, and a class only if the vector holds a list of real-world objects. Creating an STL vector has some time overhead, so never use one for very small lists! In the case of OBJ files, a vector of vectors of vertices per face for a geometry of about 40k faces will likely cost an additional overhead of 10 to 20 seconds; use C arrays in this case (see the implementation).

And the third factor: data access. The fastest way to access a lot of data is when it is stored in one contiguous block of memory, because of cache locality! When the first position of an array is accessed for the first time, a cache miss is likely, and the memory is loaded from main memory into the CPU's cache, where data access is very fast. Modern operating systems and memory controllers load some nearby addresses into the cache as well[3], which increases access performance. Although a loader usually does not perform excessive read accesses, this can improve speed for large data.

To sum up: use the C++ STL whenever possible. A vector of vectors is usually a sign of bad system design.

## Robustness and Reliability

By definition, a robust program can cope with errors during execution and continues to operate despite abnormalities in input and calculations. For a loader, this means it should not break no matter what files are loaded, and errors should be reported. In general, STL stringstreams are safer than sscanf, which in turn is safer than atoi: another reason for using stringstreams. While atoi and atof imply more lines of source code and an increased probability of bugs, sscanf is considered unsafe/deprecated by some compilers (VC most notably), which leaves stringstreams again.

In the case of OBJ files, a loader has to work with any mesh, even a malformed one: the loader has to load data, not check model topology or even recalculate normals! Load the data the way it is stored; if something is wrong with how the data is stored, signal appropriate errors. In addition, the data structure used should be unambiguous: variables and methods for reading the loaded data should be self-explanatory and not too complex. This increases robustness in the long term.

## Library Independence

It is astonishing how many implementations are overly dependent on other libraries in cases where the task could have been implemented much more easily with the STL and plain C/C++. Suppose a loader uses 2 other libraries: how complex, cluttered and hard to maintain would the final program be if it supported 10 to 20 file loaders and writers, each in turn using 2 additional libraries? Some libraries might be shared among loaders, but in the worst case you end up using 10 to 15 additional libraries just for loading files; consider compile time (library updates?), program installation and loading.

Although libraries are sometimes absolutely needed, there are enough examples that use libraries which should not be used (bad software design). Just one example: I have seen an OBJ loader implementation which compiles an OpenGL list after reading all data, so the class depends on the OpenGL headers. What if Direct3D should be supported too, or this class is to be reused in another program using Direct3D? Problems will arise, implying yet another implementation of an OBJ loader. This takes time, too much time. It is quite simple: the OBJ file format is supposed to be independent of any graphics library or operating system, and so should be the loader! See Common Bad Practices for a list of examples of how a loader should NOT be implemented.

In the following, I am trying to give a list of examples of how a loader should not be implemented. I am glad to hear about more examples.
Requiring OpenGL: A loader's job is to load data from files into main memory, nothing more! Study another example from leolol.com: although written in Java, the main principle should be clear; see the method drawModel(). Putting distinct tasks into one class inevitably leads to "spaghetti code". Another example: keithlantz.com (it also practises quite bad naming conventions, if there is any convention at all; the overall design is good, but the parsing of face lines is not). The same applies to a gamedev.net post as well.
Custom List Data Structures: why implement a custom data structure for dynamic arrays when we have STL containers, which are most likely faster and error-free? A perfect example is kixor's object loader.
Redundant Data Storage: In large triangular meshes, each vertex is reused by approximately 6 faces according to the Euler-Poincaré formula. Unique vertices, describing unique points in 3D or 4D, should therefore not be stored redundantly, i.e. several times. A perfect example is the code posted by limegarden.net: in the second part of the loading process, the previously loaded geometry data is copied into a mesh structure such that vertex, normal and texture coordinates are stored independently for each face. For a large triangular mesh of about 250k faces, this results in a memory overhead of about $(5 + 5) \cdot (250\,000 \cdot 0.5) \cdot 3 \cdot \text{sizeof}(\text{float}) \approx 14.3$ MByte for coordinates in three-dimensional space. Not to mention the additional assignments and memory allocations, which are done twice. This is really bad, though it might not matter on systems with large primary memory. The real problem arises as soon as the caller wants to perform operations based on the returned mesh structure: if the loaded mesh has to be decimated, just figuring out which vertices are identical needlessly complicates the task, not to mention updating vertices![4] [5]
Boost library: Use a Boost library only if performance is much more important than easy reuse.

# STL Vector Container Performance

| Vector Operation | Properties |
|---|---|
| Memory allocation | Rarely, only to grow |
| Traversal performance | Fastest (like C array) |
| Sequential memory access | Yes |
| Memory overhead | 12-16 bytes total |
| Initial allocation overhead | Some ms |
| Cache friendly | Strong spatial cache locality |

# A Possible Implementation – Geometry

To put the issues discussed above into practice, let's see what an implementation of an OBJ loader might look like. When discussing the implementation, I use literate programming to make the introduction and discussion of concepts easier.[6]

〈Wavefront Object Loader〉 ≡

```
〈ObjLoader structures〉
〈ObjLoader class〉
```
With the OBJ loader class as follows:
〈ObjLoader class〉 ≡

```
class ObjLoader {
public:
    〈ObjLoader constructor〉
    〈ObjLoader public data〉
};
```
Before discussing the actual loading technique, we have to define the public data and appropriate data structures for vertices, normals, texture coordinates and faces. The point is that faces can contain an arbitrary number of vertices (and thus edges), and faces in OBJ files do too. Consequently the loader has to support them as well; otherwise it would not be robust. So we have two repeating structures: faces and face nodes/edges. One further question remains: how to handle the optional normal and texture coordinates for each face node? This is a trade-off between performance and memory efficiency. Faces in most OBJ files usually stick to one definition pattern per file or per mesh (v, v/vt, v/vt/vn or v//vn), so the data storage can too. Of course, we could use three separate arrays of different lengths for v, vt and vn, but this requires two additional short values per face node. For a triangular mesh with 250k faces, these two short values amount to $250 \cdot 1024 \cdot 3 \cdot 2\ \text{bytes} = 1.46$ MByte.[7] This overhead can be avoided: for each node in a given face, the array sizes for v, vt and vn are usually constant.[8] So we need one unsigned short per face. Such a structure looks like this:
```
struct FaceIndices {
    int*           vertices;
    int*           normals;
    int*           texcoords;
    unsigned short count;
};
```
It has the advantage that the number of vertices, normals and texture coordinates may vary among faces, and that not every face needs an array of normals; in that case the pointer is NULL. The three pointers must point to dynamically allocated arrays: pointers to data on the stack become invalid as soon as execution leaves the scope, resulting in dangling pointers, and variables directly holding arrays on the stack would have to be reallocated and copied whenever the array of faces grows or shrinks, resulting in heavy allocation overhead (time!). Another important drawback of struct FaceIndices is the use of three separate dynamic arrays: overhead for memory allocation[9] and memory fragmentation. For the example mesh above, this struct results in $250\,000 \cdot 3 = 750\,000$ allocations. Therefore these three arrays should be combined into one. Since most meshes contain normals and texture coordinates, the penalty for a per-node structure containing indices for vertex, normal and texture coordinate is kept low. The advantage: one dynamic array of this structure per face has (most of) the advantages of FaceIndices, plus one contiguous block of memory for all v, vt and vn of each face. Thus only 250k allocations are needed. Such a structure obviously seems to be the best trade-off, so let's define it:
〈ObjLoader structures〉 ≡

```
struct FaceVertexIndices {
    FaceVertexIndex* indices;
    unsigned short   nodeCount;
};
```
I have chosen nodeCount instead of size or count because it is less ambiguous in calling code. Now let's define the data. Note that the members are public, which normally is not good design, but the loader is rather temporary and only holds the loaded data; the data should be converted or transferred into the application's core geometry data structures. In addition, simple getter methods would only complicate calling code due to additional method invocations and returning pointers to vectors.
〈ObjLoader public data〉 ≡

```
std::vector<float>             vertices;
std::vector<float>             normals;
std::vector<float>             texcoords;
std::vector<FaceVertexIndices> faces;
unsigned short                 faceValence;
```
Before actually implementing the parsing method, let's take a look at memory requirements and scaling characteristics for different (common) meshes. Recall that the time complexity of memory allocation is non-deterministic.[10] $$Memory_\text{data} = (v \cdot 3 + n \cdot 3 + t \cdot 2) \cdot \text{sizeof}(\text{float}) + (1 + vn + vt) \cdot 3 \cdot f \cdot \text{sizeof}(\text{int}),$$ where $vn,vt$ is 1 if normals or texture coordinates exist, 0 otherwise. $$MemoryOverhead_\text{struct FaceIndices} = [((1-vn)+(1-vt)) \cdot \text{sizeof}(\text{int}) + \text{sizeof}(\text{short})] \cdot f$$ $$NumberAllocations_\text{struct FaceIndices} = 3 \cdot f$$ $$MemoryOverhead_\text{struct FaceVertexIndices} = [((1-vn)+(1-vt)) \cdot 3 \cdot \text{sizeof}(\text{int}) + \text{sizeof}(\text{short}) + \text{sizeof}(\text{int})] \cdot f$$ $$NumberAllocations_\text{struct FaceVertexIndices} = 1 \cdot f$$ Using the Euler-Poincaré formula[11] to calculate an approximate number of vertices for large triangular meshes, the number of normals and texture coordinates can be estimated as well. Results of these somewhat theoretical calculations can be found in the table below:

| Triangular mesh | v | n | t | Memory data [bytes] | `FaceIndices` mem overhead | `FaceIndices` #mallocs | `FaceVertexIndices` mem overhead | `FaceVertexIndices` #mallocs |
|---|---|---|---|---|---|---|---|---|
| 45k faces | 22k | 0 | 0 | 804 008 | 450 000 | 135 000 | 1 350 000 | 45 000 |
| 45k faces, vn, vt | 22k | 22k | 135k | 2 688 008 | 90 000 | 135 000 | 270 000 | 45 000 |
| 125k faces | 62k | 0 | 0 | 2 244 008 | 1 250 000 | 375 000 | 3 750 000 | 125 000 |
| 125k faces, vn, vt | 62k | 62k | 375k | 7 488 008 | 250 000 | 375 000 | 750 000 | 125 000 |
| 500k faces | 250k | 0 | 0 | 9 000 008 | 5 000 000 | 1 500 000 | 15 000 000 | 500 000 |
| 500k faces, vn, vt | 250k | 250k | 1.5m | 30 000 008 | 1 000 000 | 1 500 000 | 3 000 000 | 500 000 |
| 1m faces | 500k | 0 | 0 | 18 000 008 | 10 000 000 | 3 000 000 | 30 000 000 | 1 000 000 |
| 1m faces, vn, vt | 500k | 500k | 3m | 60 000 008 | 2 000 000 | 3 000 000 | 6 000 000 | 1 000 000 |

Now that we have the main data structures, let's move on to the actual implementation of the geometry parsing.

〈ObjLoader constructor〉 ≡

```
ObjLoader::ObjLoader(string filename) {
    faceValence = 0;
    〈Temporary variables〉
    ifstream ifs(filename.c_str(), ifstream::in);
    while (ifs.good() && !ifs.eof() && std::getline(ifs, line)) {
        〈Read line key〉
        if (key == "v") {
            〈Parse vertex〉
        } else if (key == "vn") {
            〈Parse normal〉
        } else if (key == "vt") {
            〈Parse texture coordinate〉
        } else if (key == "f") {
            〈Parse face〉
        }
    }
};
```
The fragment 〈Temporary variables〉 is not included here because it only contains default initialisations of some variables (integers, floats, strings etc.). Since we are using stringstreams, reading the key of a given line is quite straightforward:
〈Read line key〉 ≡

```
key = "";
stringstream str(line);
str >> key >> ws;
```
This extracts characters from the stream into key up to the first whitespace (and then skips trailing whitespace). Next, we parse the vertex. Recall that the key v marks a line where a vertex is defined.
〈Parse vertex〉 ≡

```
str >> x >> ws >> y >> ws >> z;
vertices.push_back(x);
vertices.push_back(y);
vertices.push_back(z);
```
Here, all characters from the current position in the stream until the next whitespace are put into x. Note that x is of type float: here you see the strength of C++; the conversion is done automatically and the task is implemented in one line of code, which makes it easy to maintain and understand. Also note that this implementation implicitly assumes vertices with 3 coordinates. The same applies to the fragment 〈Parse normal〉, so it is not shown here. Parsing texture coordinates is quite similar too, except for a minor difference: a texture coordinate only has 2 float values:
〈Parse texture coordinate〉 ≡

```
str >> x >> ws >> y;
texcoords.push_back(x);
texcoords.push_back(y);
```
There are quite a few ways to implement the parsing of face lines, each with its own drawbacks. Recall that FaceVertexIndices contains the pointer FaceVertexIndex* as a dynamic array of indices. This array can be allocated and filled in two different ways. Method #1: use an STL vector and optionally convert it to a C array.
```
vector<FaceVertexIndex>* fvi = new vector<FaceVertexIndex>(3);
```
However, in practice this approach proved to be extremely slow compared to classic C arrays. That is why the implementation reuses one instance of this vector during parsing: while reading the components of a given face string, the vector is overwritten (and an additional index i is updated). As soon as the end of the line is reached, a dynamic C array is allocated on the heap, and all parsed data is copied from the vector into this new C array. Its address is stored in FaceVertexIndices:
〈Parse face〉 ≡

```
int i = 0;
while (!str.eof()) {
    〈Parse face components〉
}
FaceVertexIndices fi;
fi.nodeCount = i;
fi.indices = new FaceVertexIndex[i];
for (int j = 0; j < i; j++)
    fi.indices[j] = indices[j];
faces.push_back(fi);
faceValence = (faceValence == i || faceValence == 0) ? i : -1;
```
One way to parse the face components would be to first split the string at whitespace, resulting in a number of distinct strings for each face node, and then parse each substring. However, practical measurements showed this to be slow because of an additional string allocation and double parsing. For a test triangular mesh of about 15k faces, 45k allocations of a stringstream instance on the stack took about 2.3 s on a quad-core 2.4 GHz computer:
```
string s1;
str >> s1 >> ws;     // ~ 500 ms
stringstream s(s1);  // ~ 2.3 s
```
In the actual implementation, each character in the string is parsed only once. The following variables a, b and c hold the vertex index, texture coordinate index and normal index, respectively. Moreover, a, b and c need to be decremented when parsed, because vector indices start at 0 but vertices in OBJ files are numbered starting at 1 (the decrement for a is folded into the final assignment).
〈Parse face components〉 ≡

```
a = b = c = 0;
str >> a;
if (str.get() == '/') {
    if (str.peek() != '/') {
        str >> b;
        b--;
    }
    if (str.get() == '/') {
        str >> c;
        c--;
    }
}
if (i >= indices.size())
    indices.resize(i+2);
indices[i].vertex   = (a-1)*3;
indices[i].texcoord = b*2;
indices[i].normal   = c*3;
i++;
```
And this is pretty much all code required to load geometry in a given OBJ file format.

# Performance

Finally, let's take some measurements. The test models used originate mostly from The Stanford 3D Scanning Repository. You can download the decimated meshes from my server to reproduce the results without having to simplify them yourself. Mesh simplification was done using MeshLab and Quadric Edge Collapse Decimation.

| Triangular mesh | v | n | t | f | Time /ms | Memory /Bytes | Memory /MBytes |
|---|---|---|---|---|---|---|---|
| xyzrgb_dragon 45k | 45 077 | 45 077 | 0 | 89 999 | 13 774.392 | 4 501 810 | 4.29 |
| xyzrgb_dragon 90k | 90 077 | 90 077 | 0 | 179 999 | 26 371.997 | 9 001 810 | 8.58 |
| xyzrgb_statuette 31k | 31 246 | 31 246 | 0 | 62 500 | 9 608.579 | 3 124 904 | 2.98 |
| xyzrgb_statuette 62k | 62 496 | 62 496 | 0 | 125 000 | 26 371.997 | 6 249 904 | 5.96 |
| xyzrgb_statuette 125k | 124 996 | 124 996 | 0 | 250 000 | 36 485.477 | 12 499 904 | 11.92 |
| dragon 31k | 31 142 | 31 142 | 0 | 62 500 | 9 510.466 | 3 122 408 | 2.98 |
| dragon 62k | 62 392 | 62 392 | 0 | 125 000 | 18 683.915 | 6 247 408 | 5.96 |
| dragon 125k | 124 892 | 124 892 | 0 | 250 000 | 35 929.317 | 12 497 408 | 11.92 |

Profiling the process of loading the OBJ file "dragon - 125k.obj" gives quite interesting results: the bottleneck of the loader seems to be std::basic_string::_Eos(unsigned int), due to calling std::getline. An excerpt from the calltree:

| Level | Function Name | Elapsed Incl. Time | Elapsed Excl. Time | Elapsed Incl. Time % | Elapsed Excl. Time % | Number of Calls |
|---|---|---|---|---|---|---|
| 5 | `ObjLoader::ObjLoader(std::basic_string<char, std::char_traits<char>, std::allocator<char>>)` | 59,460.32 | 4.1 | 85.58 | 0.01 | 1 |
| 6 | `std::getline<char, std::char_traits<char>, std::allocator<char>>(std::basic_istream<char, std::char_traits<char>>&, std::basic_string<char, std::char_traits<char>, std::allocator<char>>&)` | 58,444.27 | 59.88 | 84.12 | 0.09 | 6,599 |
| 7 | `std::getline<char, std::char_traits<char>, std::allocator<char>>(std::basic_istream<char, std::char_traits<char>>&&, std::basic_string<char, std::char_traits<char>, std::allocator<char>>&, char)` | 58,381.38 | 606.3 | 84.03 | 0.87 | 6,599 |
| 8 | `std::basic_string<char, std::char_traits<char>, std::allocator<char>>::operator+=(char)` | 56,737.98 | 22.46 | 81.66 | 0.03 | 198,810 |
| 9 | `std::basic_string<char, std::char_traits<char>, std::allocator<char>>::append(unsigned int, char)` | 56,714.31 | 328.19 | 81.63 | 0.47 | 198,810 |
| 10 | `std::basic_string<char, std::char_traits<char>, std::allocator<char>>::_Eos(unsigned int)` | 55,807.72 | 54,584.78 | 80.32 | 78.56 | 198,810 |

The code of the bottleneck method is defined in xstring:
```cpp
void _Eos(size_type _Newsize)
{   // set new length and null terminator
    _Traits::assign(_Myptr()[this->_Mysize = _Newsize], _Elem());
}
```
The function body makes up 78.6% of elapsed inclusive time. _Myptr() requires about 0.9%, and the implicit assign operation about 0.8% (std::char_traits::assign(char&, char const &)). So the question remains: what's going on, and what can be done to speed up the process? Thanks to the great posts by Tino Didriksen, the reason turns out to be the far from optimal implementation of stringstreams in MSVC10! He suggests using Boost.Spirit as the fastest correct implementation.

# Current Limitations

The proposed loader assumes one object/mesh per OBJ file. Additionally, no groups or lines are handled, nor any materials. Besides, a higher-performance implementation will be discussed and published soon.[12]

In the second part of this post, I am going to discuss some of these issues. In general, this approach proved to be a good compromise between speed, memory efficiency, and ease of use in rather small applications.

1. Reading binary files requires that the programmer know the file format structure. Working from the rather theoretical format specification is required because the format cannot easily be inspected by eye.^
2. An STL vector is a dynamic C array that manages all the bookkeeping. Its traversal performance is the fastest among the STL containers (on par with a C array); sequential memory access is faster due to cache locality.^
3. In reality, the MMU (or some other hardware near it) loads several cache lines at once. Because C arrays are stored in one contiguous block of memory, several subsequent elements are loaded into the cache too.^
4. Although the loader has to be independent of any graphics library, taking a look at how professionals designed data structures in OpenGL or Direct3D can be very helpful. For more information, see the required data structures for the parameters of glVertexPointer and glBufferData (as taught by Professor Held in a computer graphics course I attended).^
5. This section has been updated due to one incorrect statement: “It not only consumes a lot more memory data and allocations, but the caller has no unique/safe way to determine identical vertices due to floating point inaccuracies.” (thanks to Wouter Lindenhof). See the post condition for floating point assignments. Besides, floating point calculations should be done carefully, even with NVIDIA CUDA at double precision (e.g. division).^
6. Literate programming uses nested fragments ‹fragment› to build source code. See Overview and Wikipedia.^
7. An unsigned short value usually requires 2 bytes of memory and can hold integers up to 65535. See Wikipedia’s article on integer.^
8. In most OBJ files, there is no mixed face pattern like `f v v/vt v//vn ...`; all nodes of a face use the same format.^
9. Dynamic memory allocation requires approximately 10x more time because of much more complex operating system calls: the OS has to find the best-fitting data blocks in primary memory to keep fragmentation as low as possible.^
10. The time complexity for a heap allocator can be different on different systems, depending on what they might be optimizing for.^
11. The Euler-Poincaré formula states that, for a polyhedral mesh topologically equivalent to a sphere, V – E + F = 2, where V, E, and F are the numbers of vertices, edges, and faces, respectively. For a large closed triangular mesh, each face has three edges and each edge is shared by two faces, so E = 3F/2; substituting into V – E + F ≈ 0 yields V ≈ F/2, hence the ratio V:F:E is approximately 1:2:3. See proof.^
12. A high-performance ASCII file reader will most likely use the Boost Spirit library for string-to-double conversion.^