# HashMap Implementation - Technical Documentation
## Overview
This is a production-ready HashMap implementation in TypeScript that follows the SOLID object-oriented design principles and common best practices. It uses separate chaining for collision resolution and resizes automatically when the load factor threshold is reached.
## SOLID Principles Implementation
### 1. Single Responsibility Principle (SRP)
Each class has one clearly defined responsibility:
#### `HashMap` (`src/core/HashMap.ts`)
- **Responsibility**: Managing the hash table and coordinating operations
- **Single Purpose**: Provide efficient key-value storage and retrieval
#### `HashNode` (`src/models/HashNode.ts`)
- **Responsibility**: Storing a single key-value pair and linking to the next node
- **Single Purpose**: Data container for collision chains
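`HashNode.ts` is not reproduced in this document. The sketch below shows one plausible shape for a chain node, assuming fields named `key`, `value` and `next`. The real class may differ.

```typescript
// Hypothetical sketch of a collision-chain node; field names are assumed.
class HashNode<K, V> {
  readonly key: K;                // the key never changes once stored
  value: V;                       // the value is overwritten on update
  next: HashNode<K, V> | null;    // next node in the same bucket, or null at the end

  constructor(key: K, value: V, next: HashNode<K, V> | null = null) {
    this.key = key;
    this.value = value;
    this.next = next;
  }
}

// Build a two-node chain: ("b", 2) -> ("a", 1) -> null
const tail = new HashNode("a", 1);
const head = new HashNode("b", 2, tail);

// Walk the chain from the head, collecting keys in order.
const keysInChain: string[] = [];
for (let node: HashNode<string, number> | null = head; node !== null; node = node.next) {
  keysInChain.push(node.key);
}
```

Keeping `key` read-only and `value` mutable lets an update to an existing key change the node in place without relinking the chain.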
#### `DefaultHashFunction` (`src/hash-functions/DefaultHashFunction.ts`)
- **Responsibility**: Computing hash values for keys
- **Single Purpose**: Convert keys to bucket indices
#### `NumericHashFunction` (`src/hash-functions/NumericHashFunction.ts`)
- **Responsibility**: Optimized hashing for numeric keys
- **Single Purpose**: Provide better distribution for numeric data
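This document does not show which algorithm `NumericHashFunction` uses. A common technique for integer keys is multiplicative ("Fibonacci") hashing. It multiplies the key by ⌊2^32 / φ⌋ modulo 2^32 and keeps the high-order bits, so sequential keys do not fill buckets in order. The class name below is hypothetical, and the sketch only suits integers in 32-bit range, because `Math.imul` truncates its arguments to 32-bit integers first.

```typescript
// Interface shape copied from the IHashFunction definition in this document.
interface IHashFunction<K> {
  hash(key: K, capacity: number): number;
}

// Hypothetical multiplicative hash for integer keys.
// Not necessarily what NumericHashFunction.ts does.
class MultiplicativeHashFunction implements IHashFunction<number> {
  // floor(2^32 / golden ratio), a common odd multiplier for this method
  private static readonly MULTIPLIER = 0x9e3779b9;

  hash(key: number, capacity: number): number {
    // Multiply modulo 2^32; Math.imul truncates key to a 32-bit integer first.
    const product = Math.imul(key, MultiplicativeHashFunction.MULTIPLIER) >>> 0;
    // Use the high-order bits: scale the fraction product / 2^32 into [0, capacity).
    return Math.floor((product / 2 ** 32) * capacity);
  }
}

const numericHash = new MultiplicativeHashFunction();
// Sequential keys are scattered across the 8 buckets rather than mapped in order.
const indices = [0, 1, 2, 3, 4, 5, 6, 7].map((k) => numericHash.hash(k, 8));
```

Scaling the product by `capacity` draws on the high bits of the product, where the mixing happens. Taking `product % capacity` with a power-of-two capacity would use only the low bits, and those depend only on the low bits of the key.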
### 2. Open/Closed Principle (OCP)
**Open for Extension, Closed for Modification**
The implementation is extensible without modifying core code:
```typescript
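// Assumes HashMap and IHashFunction are exported from src/index.ts
import { HashMap, type IHashFunction } from "./src";

// Extend functionality by providing a custom hash function
class CustomHashFunction implements IHashFunction<string> {
  hash(key: string, capacity: number): number {
    // Example logic: djb2-style string hash, kept in 32-bit range with | 0
    let h = 5381;
    for (let i = 0; i < key.length; i++) {
      h = ((h << 5) + h + key.charCodeAt(i)) | 0;
    }
    // Reduce to a valid bucket index in [0, capacity)
    return Math.abs(h) % capacity;
  }
}

// Use the custom function without modifying HashMap
const map = new HashMap<string, number>(16, 0.75, new CustomHashFunction());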
```
**Key Design Decisions:**
- Hash function is injected via constructor (dependency injection)
- New hash strategies can be added without changing HashMap
- Generic types allow any key/value types without modification
### 3. Liskov Substitution Principle (LSP)
**Subtypes must be substitutable for their base types**
All implementations properly implement their interfaces:
```typescript
// Any IHashFunction<K> can be passed wherever an IHashFunction<K> is expected
function createMap<K, V>(hashFn: IHashFunction<K>): IHashMap<K, V> {
  return new HashMap<K, V>(16, 0.75, hashFn);
}

// Each map satisfies the same IHashMap contract; only the bucket distribution differs
const map1 = createMap<string, number>(new DefaultHashFunction<string>());
const map2 = createMap<number, string>(new NumericHashFunction());
const map3 = createMap<string, number>(new CustomHashFunction());
```
**Guarantees:**
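- Every `IHashFunction` implementation returns a deterministic integer in `[0, capacity)`, so equal keys always land in the same bucket
- `HashMap` implements the full `IHashMap` interface
- Swapping one hash function for another changes how keys are distributed across buckets, not the observable behavior of the map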
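One way to make these guarantees checkable is a small contract test that every `IHashFunction` implementation must pass. The sketch below assumes the contract is "a deterministic integer index in `[0, capacity)`". `checkHashContract` and the two sample implementations are hypothetical.

```typescript
interface IHashFunction<K> {
  hash(key: K, capacity: number): number;
}

// Hypothetical contract check: each sample key must map to the same
// integer bucket index in [0, capacity) on every call.
function checkHashContract<K>(fn: IHashFunction<K>, samples: K[], capacity: number): boolean {
  return samples.every((key) => {
    const first = fn.hash(key, capacity);
    const second = fn.hash(key, capacity);
    return Number.isInteger(first) && first >= 0 && first < capacity && first === second;
  });
}

// Well-behaved (if weak) implementation: string length modulo capacity.
const lengthHash: IHashFunction<string> = {
  hash: (key, capacity) => key.length % capacity,
};

// Broken implementation: JavaScript's % keeps the sign, so negative keys give negative indices.
const signedModHash: IHashFunction<number> = {
  hash: (key, capacity) => key % capacity,
};

const goodResult = checkHashContract(lengthHash, ["", "a", "hello"], 16); // true
const badResult = checkHashContract(signedModHash, [3, -5], 16);          // false (-5 % 16 === -5)
```

An implementation that fails this check cannot be substituted for the others, because `HashMap` would index outside its bucket array.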
### 4. Interface Segregation Principle (ISP)
**Clients shouldn't depend on interfaces they don't use**
The codebase provides focused, minimal interfaces:
#### `IHashFunction<K>`
```typescript
interface IHashFunction<K> {
  hash(key: K, capacity: number): number;
}
```
- Single method interface
- Only requires hash computation
- No unnecessary methods
#### `IHashMap<K, V>`
```typescript
interface IHashMap<K, V> {
  set(key: K, value: V): void;
  get(key: K): V | undefined;
  has(key: K): boolean;
  delete(key: K): boolean;
  clear(): void;
  // ... iterator methods
}
```
- Focused on map operations
- No coupling to hashing details
- Clean separation of concerns
### 5. Dependency Inversion Principle (DIP)
**Depend on abstractions, not concretions**
High-level modules depend on abstractions:
```typescript
export class HashMap<K, V> implements IHashMap<K, V> {
  private readonly hashFunction: IHashFunction<K>; // Depends on the abstraction

  constructor(
    initialCapacity: number = 16,
    loadFactorThreshold: number = 0.75,
    hashFunction?: IHashFunction<K> // Injected dependency (optional)
  ) {
    // Argument validation and bucket allocation omitted here for brevity
    this.hashFunction = hashFunction ?? new DefaultHashFunction<K>();
  }
}
```
**Benefits:**
- HashMap doesn't depend on concrete hash implementations
- Easy to test with mock hash functions
- The hash strategy is chosen per instance at construction time (the field is `readonly`), without changing `HashMap`
- Follows Dependency Injection pattern
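Because the hash function is injected, a test can pass in a mock. The sketch below shows two hypothetical mocks. One sends every key to bucket 0, which forces all entries into one chain and exercises collision handling. The other is a spy that records each call so a test can assert how the map used its hash function. Neither class is part of this codebase.

```typescript
interface IHashFunction<K> {
  hash(key: K, capacity: number): number;
}

// Mock: every key collides in bucket 0, so set/get/delete must walk one chain.
class CollidingHashFunction<K> implements IHashFunction<K> {
  hash(_key: K, _capacity: number): number {
    return 0;
  }
}

// Spy: records each call, then delegates to a wrapped hash function.
class RecordingHashFunction<K> implements IHashFunction<K> {
  readonly calls: Array<{ key: K; capacity: number }> = [];
  private readonly inner: IHashFunction<K>;

  constructor(inner: IHashFunction<K>) {
    this.inner = inner;
  }

  hash(key: K, capacity: number): number {
    this.calls.push({ key, capacity });
    return this.inner.hash(key, capacity);
  }
}

const spy = new RecordingHashFunction<string>(new CollidingHashFunction<string>());
spy.hash("a", 16);
spy.hash("b", 32);
// In a real test the spy would be the third constructor argument:
// new HashMap<string, number>(16, 0.75, spy)
```

A spy like this can, for example, confirm that lookups after a resize are hashed against the new capacity.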
## Architecture
### Directory Structure
```
src/
├── core/                      # Core implementations
│   └── HashMap.ts             # Main HashMap class
├── interfaces/                # Contracts and abstractions
│   ├── IHashFunction.ts       # Hash function interface
│   └── IHashMap.ts            # HashMap interface
├── models/                    # Data structures
│   └── HashNode.ts            # Collision chain node
├── hash-functions/            # Hashing strategies
│   ├── DefaultHashFunction.ts # General-purpose hashing
│   └── NumericHashFunction.ts # Numeric optimization
├── examples/                  # Usage demonstrations
│   ├── basic-usage.ts
│   └── custom-hash-function.ts
└── index.ts                   # Public API exports
```
### Design Patterns Used
#### 1. Strategy Pattern
- **Where**: Hash function selection
- **Why**: Allows different hashing algorithms to be plugged in
- **Implementation**: `IHashFunction` interface with multiple implementations
#### 2. Iterator Pattern
- **Where**: `keys()`, `values()`, `entries()` methods
- **Why**: Provides consistent way to traverse the collection
- **Implementation**: Generator functions with `IterableIterator<T>`
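The sketch below shows how generator-based traversal over separately chained buckets can work. It visits each bucket in order, then each node in that bucket's chain. The bucket representation (an array of linked nodes or `null`) and the function names are assumptions, not the actual `HashMap` methods.

```typescript
// Assumed node shape; the real HashNode may differ.
interface ChainNode<K, V> {
  key: K;
  value: V;
  next: ChainNode<K, V> | null;
}

// Walk every bucket in order, then every node in that bucket's chain.
function* entriesOf<K, V>(buckets: Array<ChainNode<K, V> | null>): IterableIterator<[K, V]> {
  for (const head of buckets) {
    for (let node = head; node !== null; node = node.next) {
      yield [node.key, node.value];
    }
  }
}

// keys and values are derived from the single entries traversal.
function* keysOf<K, V>(buckets: Array<ChainNode<K, V> | null>): IterableIterator<K> {
  for (const [key] of entriesOf(buckets)) yield key;
}

function* valuesOf<K, V>(buckets: Array<ChainNode<K, V> | null>): IterableIterator<V> {
  for (const [, value] of entriesOf(buckets)) yield value;
}

// Three buckets: [0] -> k1 -> k2, [1] empty, [2] -> k3
const buckets: Array<ChainNode<string, number> | null> = [
  { key: "k1", value: 1, next: { key: "k2", value: 2, next: null } },
  null,
  { key: "k3", value: 3, next: null },
];

const allKeys = [...keysOf(buckets)]; // ["k1", "k2", "k3"]
```

Generators are lazy, so a caller that stops early (for example with `break` in a `for...of` loop) never walks the remaining buckets.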
#### 3. Dependency Injection
- **Where**: Constructor accepts `IHashFunction`
- **Why**: Decouples HashMap from specific hash implementations
- **Implementation**: Constructor parameter with default
### Data Structure Design
#### Collision Resolution: Separate Chaining
```
Buckets Array:
[0] -> Node(k1, v1) -> Node(k2, v2) -> null
[1] -> null
[2] -> Node(k3, v3) -> null
[3] -> Node(k4, v4) -> Node(k5, v5) -> Node(k6, v6) -> null
...
```
**Advantages:**
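- Simple to implement
- No primary clustering (a problem for open-addressing schemes such as linear probing)
- Degrades gradually as the load factor rises: chains just get longer
- A new entry never needs a free slot elsewhere in the array; it is linked into its own bucket's chain

**Trade-offs:**
- Extra memory for a node object and a `next` reference per entry
- Poorer cache locality than open addressing, since nodes are scattered in memory
- O(n) worst case when many keys share one chain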
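The sketch below is a minimal separate-chaining table that illustrates the chain operations in the diagram: scan a chain to update, prepend on insert, unlink on delete. It is deliberately simplified. Its capacity is fixed, it never resizes, and it compares keys with `===`, which is an assumption because the real `HashMap`'s key comparison is not shown here. `ChainedTable` is a hypothetical name.

```typescript
interface IHashFunction<K> {
  hash(key: K, capacity: number): number;
}

interface ChainNode<K, V> {
  key: K;
  value: V;
  next: ChainNode<K, V> | null;
}

// Minimal separate-chaining table: fixed capacity, no resizing, === key comparison.
class ChainedTable<K, V> {
  private readonly buckets: Array<ChainNode<K, V> | null>;
  private readonly hashFn: IHashFunction<K>;
  size = 0;

  constructor(capacity: number, hashFn: IHashFunction<K>) {
    this.buckets = new Array<ChainNode<K, V> | null>(capacity).fill(null);
    this.hashFn = hashFn;
  }

  private indexOf(key: K): number {
    return this.hashFn.hash(key, this.buckets.length);
  }

  set(key: K, value: V): void {
    const index = this.indexOf(key);
    // Scan the chain: update in place if the key already exists.
    for (let node = this.buckets[index] ?? null; node !== null; node = node.next) {
      if (node.key === key) {
        node.value = value;
        return;
      }
    }
    // Key not found: prepend a new node at the head of the chain.
    this.buckets[index] = { key, value, next: this.buckets[index] ?? null };
    this.size++;
  }

  get(key: K): V | undefined {
    for (let node = this.buckets[this.indexOf(key)] ?? null; node !== null; node = node.next) {
      if (node.key === key) return node.value;
    }
    return undefined;
  }

  delete(key: K): boolean {
    const index = this.indexOf(key);
    let prev: ChainNode<K, V> | null = null;
    let node = this.buckets[index] ?? null;
    while (node !== null) {
      if (node.key === key) {
        // Unlink: move the bucket head, or make the predecessor skip this node.
        if (prev === null) this.buckets[index] = node.next;
        else prev.next = node.next;
        this.size--;
        return true;
      }
      prev = node;
      node = node.next;
    }
    return false;
  }
}

// Usage: a hash that sends every key to bucket 0, so all entries share one chain.
const table = new ChainedTable<string, number>(4, { hash: () => 0 });
table.set("k1", 1);
table.set("k2", 2);
table.set("k1", 10); // updates the existing node; size stays 2
table.delete("k2");  // unlinks the head of the chain
```

Forcing every key into one bucket, as in the usage lines, is exactly the worst case listed under the trade-offs: every operation degrades to a linear scan of a single chain.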
#### Load Factor and Resizing
**Default Configuration:**
- Initial Capacity: 16 buckets
- Load Factor Threshold: 0.75
**Resizing Strategy:**
```typescript
if (size / capacity >= loadFactorThreshold) {
  resize(capacity * 2); // Double the capacity
}
```
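A resize cannot copy the bucket array as-is, because a key's index depends on `capacity`. Every node has to be rehashed into the larger array. The standalone function below is a sketch of that step. `rehash` and the node shape are assumptions rather than the project's actual `resize` method.

```typescript
interface IHashFunction<K> {
  hash(key: K, capacity: number): number;
}

interface ChainNode<K, V> {
  key: K;
  value: V;
  next: ChainNode<K, V> | null;
}

// Move every node into a new bucket array of size newCapacity.
// Indices are recomputed because hash(key, capacity) depends on capacity.
function rehash<K, V>(
  oldBuckets: Array<ChainNode<K, V> | null>,
  newCapacity: number,
  hashFn: IHashFunction<K>,
): Array<ChainNode<K, V> | null> {
  const newBuckets = new Array<ChainNode<K, V> | null>(newCapacity).fill(null);
  for (const head of oldBuckets) {
    let node = head;
    while (node !== null) {
      const next = node.next; // save before relinking this node
      const index = hashFn.hash(node.key, newCapacity);
      node.next = newBuckets[index] ?? null; // prepend into the new chain
      newBuckets[index] = node;
      node = next;
    }
  }
  return newBuckets;
}

// Usage: keys hashed by key % capacity, grown from 2 to 4 buckets.
const modHash: IHashFunction<number> = { hash: (key, capacity) => key % capacity };
const before: Array<ChainNode<number, string> | null> = [
  { key: 2, value: "two", next: { key: 0, value: "zero", next: null } },  // 2 % 2 = 0, 0 % 2 = 0
  { key: 3, value: "three", next: { key: 1, value: "one", next: null } }, // 3 % 2 = 1, 1 % 2 = 1
];
const after = rehash(before, 4, modHash); // each key now sits alone in bucket key % 4
```

The existing node objects are relinked instead of copied, so a resize allocates only the new bucket array. Its cost is O(n + capacity).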
**Why 0.75?**
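- Good balance between memory use and lookup speed
- Keeps chains short: the average chain length equals the load factor (`size / capacity`), so it stays at or below about 0.75 entries per bucket
- Matches the default load factor of Java's `HashMap`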
## Performance Characteristics
### Time Complexity
| Operation | Average Case | Worst Case | Notes |
|-----------|--------------|------------|-------|
| `set(k, v)` | O(1) amortized | O(n) | Worst case when all keys hash to the same bucket; an insert that triggers a resize also costs O(n + capacity) |
| `get(k)` | O(1) | O(n) | Traverses one collision chain |
| `has(k)` | O(1) | O(n) | Same as `get` |
| `delete(k)` | O(1) | O(n) | Finds and unlinks the node |
| `clear()` | O(capacity) | O(capacity) | Must null all bucket references |
| `keys()` | O(n + capacity) | O(n + capacity) | Visits every bucket and every entry |
| `values()` | O(n + capacity) | O(n + capacity) | Visits every bucket and every entry |
| `entries()` | O(n + capacity) | O(n + capacity) | Visits every bucket and every entry |
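The "O(1) amortized" bound for `set` rests on doubling. Each resize rehashes every entry, but resizes grow geometrically further apart, so the total rehash work over n insertions stays below 2n. The small simulation below counts node moves under this document's rule (double when `size / capacity >= 0.75`, starting from 16 buckets). It assumes the check runs after every insertion, since the exact timing is not shown above. `countRehashMoves` is a hypothetical helper.

```typescript
// Count how many node moves resizing causes over n insertions.
function countRehashMoves(n: number, initialCapacity = 16, threshold = 0.75) {
  let capacity = initialCapacity;
  let moves = 0;
  let resizes = 0;
  for (let size = 1; size <= n; size++) {
    // Assumed: the load-factor check runs right after each insertion.
    if (size / capacity >= threshold) {
      moves += size; // every existing entry is rehashed into the new array
      capacity *= 2;
      resizes++;
    }
  }
  return { moves, resizes, capacity };
}

const stats = countRehashMoves(1000);
// 7 resizes at sizes 12, 24, 48, 96, 192, 384, 768: 1524 moves in total, below 2 * 1000
```

Spread over 1000 insertions, that comes to about 1.5 extra node moves per `set`, which is a constant.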
### Space Complexity
- **Storage**: O(n) where n is number of entries
- **Overhead**: O(capacity) for buckets array
- **Per Entry**: Constant overhead for HashNode
### Load Factor Impact
```
Load Factor = size / capacity
Low Load Factor (< 0.5):
✓ Fewer collisions
✓ Faster operations
✗ Wastes memory
High Load Factor (> 0.9):
✓ Better memory usage
✗ More collisions
✗ Slower operations
Default (0.75):
✓ Good balance
✓ Reasonable memory usage
✓ Good performance
```
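The trade-off above can be quantified with the standard textbook analysis of separate chaining. It assumes simple uniform hashing: each key is equally likely to land in any of the m buckets, independently of the others. For successful searches it also assumes new nodes are inserted at the head of a chain. These are general results, not measurements of this implementation. With n entries, the expected number of nodes examined is:

$$
\alpha = \frac{n}{m}, \qquad
\mathbb{E}[\text{nodes examined, unsuccessful search}] = \alpha, \qquad
\mathbb{E}[\text{nodes examined, successful search}] = 1 + \frac{\alpha}{2} - \frac{\alpha}{2n}
$$

At the default α = 0.75, a miss examines about 0.75 nodes and a hit about 1.375 nodes once n is large. At α = 0.5 a hit examines about 1.25 nodes, and at α = 1.0 about 1.5.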
## Best Practices Demonstrated
### 1. Type Safety
```typescript
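interface User {
  name: string;
}
const user: User = { name: "Alice" };

// Full generic support
const map = new HashMap<string, User>(); // Type-safe
map.set("id", user); // ✓ Correct
// @ts-expect-error: number is not assignable to parameter of type string
map.set(123, user); // ✗ Type error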
```
### 2. Immutability Where Appropriate
```typescript
// Read-only properties
private readonly hashFunction: IHashFunction<K>;
private readonly loadFactorThreshold: number;
private readonly initialCapacity: number;
```
### 3. Defensive Programming
```typescript
// Validate constructor arguments (the second check also rejects NaN)
if (!Number.isInteger(initialCapacity) || initialCapacity <= 0) {
  throw new Error("Initial capacity must be a positive integer");
}
if (!(loadFactorThreshold > 0 && loadFactorThreshold <= 1)) {
  throw new Error("Load factor must be in the range (0, 1]");
}
```
### 4. Clear Documentation
- Every public method documented with JSDoc
- Time complexity noted in comments
- Usage examples provided
### 5. Comprehensive Testing
- 32 test cases covering all functionality
- Edge cases (null, undefined, empty strings)
- Performance tests (1000 entries)
- Custom hash function tests
### 6. Iterator Support
```typescript
// Makes HashMap usable in for...of loops
[Symbol.iterator](): IterableIterator<[K, V]> {
  return this.entries();
}

// Usage
for (const [key, value] of map) {
  console.log(key, value);
}
```
### 7. Separation of Concerns
- Hashing logic separated from storage logic
- Node structure separated from HashMap
- Interfaces defined separately from implementations
## Advanced Features
### 1. Custom Hash Functions
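Create domain-specific hash functions. Note that an `IHashFunction` only decides which bucket a key goes to. Whether two keys count as the same entry is decided by the map's own key comparison, which is not shown here. If that comparison is strict equality (`===`), a case-insensitive hash puts `"Alice"` and `"alice"` in the same bucket but still stores them as two separate entries, and two distinct `Person` objects with identical fields remain different keys.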
```typescript
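interface Person {
  firstName: string;
  lastName: string;
  age: number;
}

// Hypothetical helper: djb2-style string hash reduced to [0, capacity)
function computeHash(str: string, capacity: number): number {
  let h = 5381;
  for (let i = 0; i < str.length; i++) {
    h = ((h << 5) + h + str.charCodeAt(i)) | 0;
  }
  return Math.abs(h) % capacity;
}

// Case-insensitive bucket placement for string keys
class CaseInsensitiveHash implements IHashFunction<string> {
  hash(key: string, capacity: number): number {
    return computeHash(key.toLowerCase(), capacity);
  }
}

// Composite object keys: hash on the fields that identify a person
class PersonHashFunction implements IHashFunction<Person> {
  hash(person: Person, capacity: number): number {
    const str = `${person.firstName}:${person.lastName}:${person.age}`;
    return computeHash(str, capacity);
  }
}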
```
### 2. Performance Monitoring
```typescript
const map = new HashMap<string, number>();
// Monitor internal state
console.log(`Capacity: ${map.capacity}`);
console.log(`Size: ${map.size}`);
console.log(`Load Factor: ${map.loadFactor}`);
```
### 3. Bulk Operations
```typescript
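// Bulk insertion is a loop over set(); each call is amortized O(1).
// For large batches, a larger initialCapacity avoids intermediate resizes.
const map = new HashMap<string, number>();
const entries: [string, number][] = [
  ["a", 1], ["b", 2], ["c", 3]
];
for (const [key, value] of entries) {
  map.set(key, value);
}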
```
## Testing Strategy
### Test Coverage
```bash
bun test
```
**Coverage Breakdown:**
- Core HashMap: 100% function/line coverage
- Hash Functions: 66-87% (edge cases for special values)
- Overall: 92% line coverage
### Test Categories
1. **Constructor Tests**
   - Default initialization
   - Custom parameters
   - Invalid input validation
2. **Basic Operations**
   - Set/Get/Has/Delete
   - Update existing values
   - Non-existent keys
3. **Iteration Tests**
   - Keys iterator
   - Values iterator
   - Entries iterator
   - forEach callback
   - for...of loops
4. **Resizing Tests**
   - Automatic growth
   - Data preservation
   - Load factor triggers
5. **Edge Cases**
   - Null values
   - Undefined values
   - Empty string keys
   - Large datasets (1000 entries)
6. **Custom Hash Functions**
   - NumericHashFunction
   - Custom implementations
## Usage Examples
### Basic Usage
```typescript
const scores = new HashMap<string, number>();
scores.set("Alice", 95);
scores.set("Bob", 87);
console.log(scores.get("Alice")); // 95
```
### With TypeScript Interfaces
```typescript
interface Product {
  id: number;
  name: string;
  price: number;
}

const products = new HashMap<number, Product>();
products.set(1, { id: 1, name: "Widget", price: 9.99 });
```
### Custom Configuration
```typescript
const map = new HashMap<string, number>(
  32,          // Initial capacity
  0.8,         // Load factor threshold
  customHashFn // Any IHashFunction<string>, e.g. a CaseInsensitiveHash instance
);
```
## Comparison with Native Map
### Advantages of This Implementation
1. **Educational Value**: Shows internal workings
2. **Customizable**: Inject custom hash functions
3. **Observable**: Can monitor capacity and load factor
4. **Extensible**: Easy to add new features
### Native Map Advantages
1. **Performance**: Highly optimized in V8/JSC
2. **Battle-tested**: Used in production worldwide
3. **Standard API**: Consistent across platforms
### When to Use Each
**Use HashMap (this implementation):**
- Learning data structures
- Need custom hash functions
- Want to understand internals
- Need control over sizing (initial capacity, load factor threshold)
**Use Native Map:**
- Production applications
- Performance critical paths
- Standard use cases
- Browser compatibility needs
## Future Enhancements
Possible improvements while maintaining SOLID principles:
1. **Additional Hash Functions**
   - CryptoHashFunction (secure hashing)
   - IdentityHashFunction (reference equality)
2. **Performance Optimizations**
   - Red-black tree for long chains (like Java 8+)
   - Dynamic shrinking on deletions
3. **Additional Features**
   - Weak key references
   - Computed values (getOrCompute)
   - Batch operations
4. **Observability**
   - Event listeners for changes
   - Statistics tracking
   - Performance metrics
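Some of these features could be layered on top of the existing interface without modifying `HashMap`, in keeping with the Open/Closed Principle. As one example, "computed values" might look like the sketch below. `getOrCompute` and `MapLike` are hypothetical names. `MapLike` is the minimal slice of `IHashMap` the helper needs.

```typescript
// Minimal slice of the IHashMap interface this helper depends on.
interface MapLike<K, V> {
  get(key: K): V | undefined;
  has(key: K): boolean;
  set(key: K, value: V): void;
}

// Hypothetical helper: return the stored value, or compute, store, and return it.
// Uses has() rather than an undefined check, because undefined is a legal stored value.
function getOrCompute<K, V>(map: MapLike<K, V>, key: K, compute: (key: K) => V): V {
  if (map.has(key)) {
    return map.get(key) as V;
  }
  const value = compute(key);
  map.set(key, value);
  return value;
}

// Works with any object exposing get/has/set, including the native Map.
const cache = new Map<string, number>();
let computeCalls = 0;
const lengthOf = (s: string): number => {
  computeCalls++;
  return s.length;
};

getOrCompute(cache, "hello", lengthOf); // computes and stores 5
getOrCompute(cache, "hello", lengthOf); // returns the stored 5 without recomputing
```

Depending only on `MapLike` keeps the helper decoupled from the concrete `HashMap`, in the same way `HashMap` depends only on `IHashFunction`.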
## Conclusion
This HashMap implementation demonstrates how to build a production-quality data structure while adhering to SOLID principles. The clean architecture makes it maintainable, testable, and extensible. It serves as both a practical tool and an educational resource for understanding hash tables and object-oriented design.