Computer science review

From Wiki Essays

Jump to: navigation, search

In case I'm quizzed on these topics, or need a quick reminder, here is a review.

Contents

Detailed Reviews

Language-Specifics

Data Structures

  • Linked list - basis of many dynamic structures
  • Stack - LIFO
  • Queue - FIFO
  • B-tree - balanced binary tree. Used in databases, especially spatially-enabled ones.
  • Hash
  • Heap
  • Trie - replacement for many data structures

Language-specific Data Structures

  • Perl:
    • scalar
    • array
    • associative array (hash)
    • reference
  • Python:
  • Java:

C/C++:

SQL and NoSQL

SQL

  • Postgres/PostGIS - geospatially aware Postgres
  • Greenplum - massively distributed Postgres

NoSQL (web-scale)

NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.

  • MongoDB - API: BSON, Protocol: lots of langs, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding.

Sharding

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than splitting by columns (as for normalization). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

There are numerous advantages to this partitioning approach. The total number of rows in each table is reduced. This reduces index size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This enables a distribution of the database over a large number of machines, which means that the database performance can be spread out over multiple machines, greatly improving performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g. European customers vs. American customers) then it may be possible to infer the appropriate shard membership easily and automatically, and query only the relevant shard.

Data Structure Sizes (structs)

  • 32-bit vs 64-bit?
  • longs vs ints, floats vs doubles, signed/unsigned

Character Encodings

  • UTF-8 (and 16) are standard Unicode representations. Always use UTF-8.
  • ISO-8859-1 is not out of date -- too ASCII-based.

Signature

The signature of a function is a way of describing the parameters and parameter types with which a legal call to the function can be made. It contains the name of the function, its parameters and their type, and the return value. Signatures may also be referred to as "function signatures", "method signatures", or "prototype definitions"[citation needed].

The notion of a function signature is an important concept for all computer science undergraduates

Modern Object Orientation techniques make much use of Interfaces which are essentially a template made from function signatures

C++ teaching must cover the important concept of function overloading which involves first introducing function signatures.

The practice of multiple inheritance requires careful consideration of the function signatures to avoid unpredictable results.

Computer science theory and the teaching of polymorphism in particular makes much use of the concept of function signature.

Roughly equivalent to its prototype definition in the C programming language.

CS, like many sciences, reuses terminology frequently. If you encounter the term 'signature' in an area of CS not concerned with programming then be sure to check your terminology with the published texts to avoid confusion.

Other areas of Computer Science where you might encounter the term signature:

The study of databases might also involve the use of the term 'signature', however, before confusing the two things be sure of the definition at the top of this article. In the ML family of programming languages, signature is a keyword referring to a construct of the module system that plays the role of an interface.

Sorting

Recursion

  • refresh on fibonacci sequence, factorial, etc.

Web Service Scalability

Web applications, by nature, span two drastically different programming domains: The high-level web design and development domain and the low-level, high-performance, and distributed domain. Caching and load balancing play a role.

HTTP + JSON. The data being sent between the two languages is serialized, sometimes with JSON, and then passed along via HTTP. Using an independent intermediate representation to serialize the data (JSON), any component of the application stack may be swapped out for something completely new.

Django web framework comes pre-packaged with SimpleJSON to serialize the body of the HTTP request:

def send_to_erlang(data):
   url = "http://erlang.nodes.tld:8000/"
   body = json.dumps(data)
   headers = {'Content-Type': 'application/jsonrequest',
              'User-Agent'  : 'Python/Project/0.1'}
   urlopen(Request(url, body, headers))

Profiling

Threading

Memory management in C/C++

Definitions

Data structure alignment

Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do this in word sized chunks (e.g. 4 byte chunks on a 32-bit system). Data alignment means putting the data at a memory offset equal to some multiple of the word size, which increases the system's performance due to the way the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of the last data structure and the start of the next, which is data structure padding.

Low coupling / high cohesion

degree of reliance between modules.

Serialization

In computer science, in the context of data storage and transmission, serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and "resurrected" later in the same or another computer environment.[1] When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object oriented objects does not include any of their associated methods with which they were previously inextricably linked.

This process of serializing an object is also called deflating or marshalling an object.[2] The opposite operation, extracting a data structure from a series of bytes, is deserialization (which is also called inflating or unmarshalling).

Python Pickle

Python implements serialization through the standard library module pickle, and to a lesser extent, the older marshal. marshal does offer the ability to serialize Python code objects, unlike pickle. In addition, Python offers the cPickle module, which (as the name suggests) is a C implementation of the pickle module. It can be up to 1000 times faster than the pure Python pickle module, but has a few limitations. The shelve module is based on the pickle module and can be regarded as a serialized Python dictionary . As of version 2.6, Python's standard library also includes support for JSON and for XML-encoded property lists. (See json and plistlib, respectively.) However, these modules only handle basic Python types like strings, integers, and collections of basic types, whereas pickle is intended for arbitrary objects.

Expected questions

  • final vs finally vs finalize?
    • final – constant declaration.
    • finally – The finally block always executes when the try block exits, except System.exit(0) call. This ensures that the finally block is executed even if an unexpected exception occurs. But finally is useful for more than just exception handling — it allows the programmer to avoid having cleanup code accidentally bypassed by a return, continue, or break. Putting cleanup code in a finally block is always a good practice, even when no exceptions are anticipated.
    • finalize() – method helps in garbage collection. A method that is invoked before an object is discarded by the garbage collector, allowing it to clean up its state. Should not be used to release non-memory resources like file handles, sockets, database connections etc because Java has only a finite number of these resources and you do not know when the garbage collection is going to kick in to release these non-memory resources through the finalize() method.
  • What is VLSI?

very large scale integration

  • What is endianness? What part of a microprocessor determines endianness?


  • Which is faster -- stack or heap?

Stack is faster (only requires a pointer to be moved), but is limited. Heap is...

  • How do you determine if a linked list has circular references?

Reverse the list

O(n) time complexity

If you reverse the list, and remember the inital node, you will know that there is a cycle if you get back to the first node. While efficient, this solution changes the list. Reversing the list twice would put the list back in its initial state, however this solution is not appropriate for multi-threaded applications. In some cases there may not be a way to modify nodes. Since changing the nodes is not needed to get the answer, this solution is not recommended.

// Solution modifies the list
function boolean hasLoop(Node startNode){
 Node previousNode = null;
 Node currentNode = startNode;
 Node nextNode;
 if (!currentNode.next()) return false;
 while(currentNode){
   nextNode = currentNode.next();
   currentNode.setNext(previousNode);
   previousNode = currentNode;
   currentNode = nextNode;
 }
 return (previousNode == startNode);
}

A better algortihm -- Tortise and the Hare - Catch Loops in Two Passes

O(n) time complexity

Simultaneously go through the list by ones (slow iterator) and by twos (fast iterator). If there is a loop the fast iterator will go around that loop twice as fast as the slow iterator. The fast iterator will lap the slow iterator within a single pass through the cycle. Detecting a loop is then just detecting that the slow iterator has been lapped by the fast iterator.

// Best solution
function boolean hasLoop(Node startNode){
 Node slowNode = Node fastNode1 = Node fastNode2 = startNode;
 while (slowNode && fastNode1 = fastNode2.next() && fastNode2 = fastNode1.next()){
   if (slowNode == fastNode1 || slowNode == fastNode2) return true;
   slowNode = slowNode.next();
 }
 return false;
}
  • How does garbage collection work in Java?
    1. objects are created on heap in Java irrespective of there scope e.g. local or member variable. while its worth noting that class variables or static members are created in method area of Java memory space and both heap and method area is shared between different thread.
    2. Garbage collection is a mechanism provided by Java Virtual Machine to reclaim heap space from objects which are eligible for Garbage collection.
    3. Garbage collection relieves java programmer from memory management which is essential part of C++ programming and gives more time to focus on business logic.
    4. Garbage Collection in Java is carried by a daemon thread called Garbage Collector.
    5. Before removing an object from memory Garbage collection thread invokes finalize () method of that object and gives an opportunity to perform any sort of cleanup required.
    6. You as Java programmer can not force Garbage collection in Java; it will only trigger if JVM thinks it needs a garbage collection based on Java heap size.
    7. There are methods like System.gc () and Runtime.gc () which is used to send request of Garbage collection to JVM but it’s not guaranteed that garbage collection will happen.
    8. If there is no memory space for creating new object in Heap Java Virtual Machine throws OutOfMemoryError or java.lang.OutOfMemoryError heap space
    9. J2SE 5(Java 2 Standard Edition) adds a new feature called Ergonomics goal of ergonomics is to provide good performance from the JVM with minimum of command line tuning.
  • Have you ever written multithreaded code? Tell me about it..
  • What is a hash table? How do they work? What is the time complexity for a lookup? What happens if two keys are the same?
  • What's your favorite feature about your favorite language? Tell me about it...
  • Describe a depth-first search and a breadth-first search... What's the time complexity to iterate all nodes in a binary search tree?
  • How do you know which direction a memory stack grows in?

In C, print out the address of variables. You can tell if the direction by comparing 2 addresses in the same stack call.

Personal tools