Smart Glasses Powered by femtoAI

More AI. Less Power. Lightning Quick.

Introduction

Portable intelligent devices are already ubiquitous, and they are rapidly becoming more capable. AI workloads that once required the cloud can now execute directly where data is generated, within ultra-low-power budgets, at fast response times, and in increasingly constrained form factors. These expanding capabilities are driving rapidly growing demand.

Smart glasses exemplify this shift, with sales increasing 110% year over year in the first half of 2025. A smart glasses system is inherently distributed, ingesting multimodal inputs at the face (e.g., microphones, IMUs, touch sensors) and producing real-time outputs such as audio (voice assistants, calls), visual overlays, and haptic feedback. Today, much of this processing is offloaded to a companion phone, which offers more compute and battery capacity. However, this architecture fundamentally limits responsiveness, power consumption, privacy, and system robustness.

The Problem

Nearly all smart eyewear solutions on the market today rely on a companion phone or the cloud to perform their “intelligent” processing. While convenient for quickly adding capabilities, this off-device architecture introduces fundamental limitations that directly impact user experience and product viability:

  • High latency and degraded responsiveness, particularly for real-time audio interactions
  • Increased power consumption and reduced battery life, due to constant data movement
  • Privacy and security concerns associated with streaming raw sensor data off-device
  • Dependence on closed, locked-down mobile SoC ecosystems, limiting flexibility and differentiation

These challenges are compounded by the need to run these intelligent workloads concurrently, continuously, and within an extremely constrained thermal and power budget: typically ~200 mAh of battery capacity is available for all on-device computation in entry-level designs.

Shifting core intelligence onto the edge fundamentally changes this tradeoff. By executing latency- and privacy-sensitive workloads directly on the device, edge deployment can eliminate bottlenecks introduced by connectivity, reduce power consumption, and improve system robustness.

To illustrate this, consider the most valuable audio-driven smart-glasses experiences, each of which strongly benefits from on-device edge AI:

  1. Always-on voice interfaces
  2. Real-time noise reduction and speech enhancement
  3. Premium, context-aware audio

Always-on Voice Interfaces: The human voice is a natural interface for controlling smart glasses. Wake-word detection (e.g., “Alexa”) and voice control (e.g., “increase volume”, “turn up the sound”) must operate continuously. Since smart glasses usually run on a single Li-ion battery with a typical battery life of 4-6 hours, any power savings on this feature translates directly into extended battery life: if always-on listening accounts for roughly a tenth of the power budget, a 10× reduction in its power draw extends battery life by about 10%. Along with extreme power constraints, these applications simultaneously demand low latency and strong privacy guarantees. Constantly streaming all microphone input off-device is neither efficient nor acceptable for privacy-sensitive use cases. These workloads fundamentally belong at the very edge.
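
The battery-life arithmetic is worth making concrete. The numbers below are illustrative assumptions (a feature consuming ~10% of total draw, made 10× cheaper), not measured figures:

```python
# Hedged sketch: battery-life impact of reducing always-on listening power.
# `feature_share` and `power_reduction` are illustrative assumptions.

def battery_life_gain(feature_share, power_reduction):
    """Relative battery-life gain when a feature consuming `feature_share`
    of total power is made `power_reduction`x cheaper."""
    new_total = (1 - feature_share) + feature_share / power_reduction
    return 1 / new_total - 1

# If always-on wake-word detection is ~10% of the power budget and is
# made 10x cheaper, total draw falls to 91% of the original:
gain = battery_life_gain(feature_share=0.10, power_reduction=10)
print(f"Battery life extension: {gain:.1%}")  # ~9.9%
```

The gain saturates at the feature's share of the budget, which is why eliminating off-device streaming (a large share of total draw) matters so much.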

Real-time Noise Reduction: The ability to hear and be heard clearly in dynamic, noisy environments is fundamental to a usable smart-glasses experience. AI-based noise reduction dramatically outperforms classical DSP approaches by replacing classical filters with learned contextual ones. For example, hearing-enhancement smart glasses that leverage classical noise reduction algorithms fail to adapt to transient sounds such as clinking in a café or speech-like background noises. AI models trained on contextual noise can suppress transient and speech-like noise intelligently. This class of applications introduces two hard requirements:

  1. Ultra-low latency: end-to-end delay must be <10 ms to preserve the naturalness of the user’s own voice
  2. Algorithm chain compatibility: feedback cancellation, beamforming, and speech enhancement must operate together harmoniously, not as independent blocks

Achieving both under tight constraints and across multiple toolchains is extremely challenging with conventional approaches.
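
To see why the <10 ms budget is hard, consider a frame-based pipeline. Assuming a 16 kHz sample rate and modeling algorithmic latency as one frame plus one hop of lookahead (a simplification; windowing and overlap-add details vary):

```python
# Sketch: algorithmic latency of a frame-based enhancement pipeline.
# Latency is modeled as frame length plus one hop; the sample rate and
# frame sizes below are illustrative assumptions.

SAMPLE_RATE_HZ = 16_000

def algorithmic_latency_ms(frame_samples, hop_samples):
    return (frame_samples + hop_samples) / SAMPLE_RATE_HZ * 1_000

# A 64-sample frame with a 32-sample hop stays inside the 10 ms budget:
print(algorithmic_latency_ms(64, 32))    # 6.0 ms
# A conventional 32 ms FFT frame (512 samples) alone blows the budget:
print(algorithmic_latency_ms(512, 256))  # 48.0 ms
```

Small frames mean the model must run every couple of milliseconds, which is why compute efficiency and latency of the accelerator dominate the design.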

Premium Audio Experiences: High-quality audio is now table stakes; users expect excellent call quality and music playback regardless of product tier. AI noise reduction enables better voice isolation with less distortion, using fewer microphones and less power than classical methods. Additionally, a premium audio platform should perform consistently in all environments, from construction zones to live concerts, enabling not only clear voice but also high-accuracy voice commands and environment-optimized playback.

The femtoAI Solution

With sufficiently capable AI acceleration, edge AI solves the problems of latency, power, and privacy while freeing developers from locked-down ecosystems. Most accelerators fall short on memory efficiency, power efficiency, developer experience, or all of the above. femtoAI’s accelerators deliver on all three dimensions necessary to realize the potential of edge AI. The key to femtoAI’s differentiation is making sparsity a first-class design principle. Sparsity removes unnecessary computation by eliminating weights and activations that contribute minimally to a neural network’s output. The result is dramatically lower power, smaller memory footprints, and reduced silicon area, without sacrificing model accuracy. Crucially, exploiting sparsity requires end-to-end support: model pruning and training, sparsity-aware compilation, and hardware designed to execute sparse workloads efficiently.
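
The mechanism is easy to sketch: in a sparse layer, zero weights and zero activations contribute nothing to a dot product, so a sparsity-aware engine can skip them. The toy below only illustrates the principle; it is not femtoAI's actual execution model, which uses compressed weight formats and parallel skip logic in hardware:

```python
# Toy illustration of zero-skipping in a dot product.

def dense_dot(weights, activations):
    acc, ops = 0.0, 0
    for w, a in zip(weights, activations):
        acc += w * a          # every pair costs a multiply-accumulate
        ops += 1
    return acc, ops

def sparse_dot(weights, activations):
    acc, ops = 0.0, 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:  # contributes nothing: skip the MAC entirely
            continue
        acc += w * a
        ops += 1
    return acc, ops

w = [0, 0, 0.5, 0, 0, 0, -1.0, 0, 0, 0]     # 80% weight sparsity
a = [0, 2.0, 4.0, 0, 0, 1.0, 0, 0, 0, 3.0]  # 60% activation sparsity
assert dense_dot(w, a)[0] == sparse_dot(w, a)[0]  # identical result
print(dense_dot(w, a)[1], sparse_dot(w, a)[1])    # 10 MACs vs 1
```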

Unlike quantization (i.e., using 4-bit precision instead of 32-bit), sparsity is not a well-served optimization technique for developers. Most competitors do not enable sparsity acceleration at all. Among those that do, the acceleration is shoehorned into an existing architecture, leading to overhead that often outweighs the benefits. In contrast, femtoAI’s products are architected from the ground up to unlock the full power of sparsity.

Note also that sparsity allows larger, higher-performing models to be deployed on-device, often obviating the need for additional cloud-based verification of on-device inferences. Privacy and security are inherent advantages of this architecture:

  • No raw data leaves the device
  • No dependency on cloud or companion devices
  • Personal identifiers (voice, face, biometrics) remain local at all times

Native Sparse Hardware: SPU-001

femtoAI’s Sparse Processing Unit (SPU-001) natively supports both weight¹ and activation² sparsity:

  • Up to 90% weight compression stored directly in on-chip SRAM
  • Up to 90% activation sparsity for dynamic zero skipping

When both are present, dynamic energy is reduced by up to 100×, storage by up to 10×, and latency by up to 10×.³
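
These figures compose multiplicatively: if ~90% of weights and ~90% of activations are zero, only ~1% of multiply-accumulates survive, which is where the up-to-100× dynamic-energy figure comes from. A back-of-envelope check (assuming zeros are independent and uniformly distributed, which real networks only approximate):

```python
# Back-of-envelope for combined weight + activation sparsity.
# Treat the results as upper-bound estimates, not guarantees.

weight_sparsity = 0.90
activation_sparsity = 0.90

surviving_macs = (1 - weight_sparsity) * (1 - activation_sparsity)
print(f"MACs executed: {surviving_macs:.0%} of dense")              # 1%
print(f"Dynamic-energy reduction: up to {1 / surviving_macs:.0f}x") # 100x
print(f"Weight storage: ~{1 / (1 - weight_sparsity):.0f}x compression")  # 10x
```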

Open Developer-First Ecosystem

femtoAI provides a fully open development stack designed for production deployment:

  • Model optimization tools compatible with common frameworks such as PyTorch
  • Support for customer-owned models and datasets
  • Full pipeline: define → train → prune → quantize → compile → simulate → deploy
  • Reference models qualified for immediate production deployment
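
The prune step of this pipeline can be sketched framework-agnostically. Iterative magnitude pruning (zero the smallest-magnitude weights, then fine-tune) is the standard technique; the toy below shows only the selection step and is not femtoAI's actual tooling API:

```python
# Toy magnitude pruning: zero the smallest-magnitude weights until a
# target sparsity is reached. Production flows interleave this with
# retraining to recover accuracy.

def magnitude_prune(weights, target_sparsity):
    n_prune = int(len(weights) * target_sparsity)
    # Indices of the n_prune smallest-magnitude weights
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.09, 0.07, 0.5]
pruned = magnitude_prune(w, target_sparsity=0.5)
print(pruned)
print(f"sparsity: {pruned.count(0.0) / len(pruned):.0%}")  # 50%
```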

In addition, femtoAI supports the porting of classical DSP programs through a domain-specific language (DSL) and a library of pre-compiled DSP blocks. This allows existing signal-processing pipelines—such as filtering, beamforming, and feedback cancellation—to be executed on the SPU. Customers can port their proven DSP algorithms to benefit from femtoAI’s low-power execution model.
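
For a sense of the kind of block such a DSL would express, here is a plain-Python biquad (second-order IIR) filter section, a standard building block of filtering and feedback-cancellation chains. This is illustrative only; femtoAI's DSL syntax and block library are not shown here:

```python
# A biquad (second-order IIR) section in direct form II transposed,
# the workhorse of classical audio DSP. Coefficients b0..b2, a1, a2
# come from a filter-design step; processing is per-sample.

def biquad(samples, b0, b1, b2, a1, a2):
    z1 = z2 = 0.0  # delay-line state
    out = []
    for x in samples:
        y = b0 * x + z1
        z1 = b1 * x - a1 * y + z2
        z2 = b2 * x - a2 * y
        out.append(y)
    return out

# A pass-through filter (b0=1, all else 0) leaves the signal unchanged:
x = [1.0, 0.5, -0.25, 0.0]
assert biquad(x, 1.0, 0.0, 0.0, 0.0, 0.0) == x
```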

Beyond porting DSP pipelines, femtoAI enables developers to combine their own custom code with femtoAI’s proprietary, pre-compiled AI algorithms. This makes it possible to stitch together customer-specific processing blocks and femto-provided AI components into a fully customized, end-to-end audio pipeline. Developers retain control over system behavior and differentiation, while avoiding the constraints and opacity of closed, black-box solutions.

System Flexibility and Integration

SPU-001 integrates with many common Bluetooth or Wi-Fi SoCs, allowing OEMs to decouple their AI roadmap from host processor selection. Many mobile and connectivity SoCs restrict AI execution to closed toolchains and opaque runtime environments, limiting differentiation and long-term flexibility. By offloading significantly more intelligence, SPU-001 allows customers to:

  • Use simpler host processors
  • Swap host SoCs across generations without re-architecting AI software

Reference integrations are available for most major Bluetooth and Wi-Fi chipsets.

Putting It All Together

Consider a real-time noise reduction and speech enhancement feature typical of smart glasses. Delivering an exceptional user experience requires ultra-low latency, minimal power consumption, small form factor, and seamless noise suppression. The implementation in Figure 1 is production-proven, running with only 8 ms algorithmic latency while consuming less than 900 µW.

At the core of this system is femtoAI’s first product, SPU-001, a compact 1.6 mm × 2.2 mm chip with an effective⁴ on-chip memory capacity of 10 MB, a maximum throughput of 0.96 TOPS, and a compute efficiency of 25 TOPS/W. Note how sparsity provides a 10× to 100× uplift from raw⁵ to effective metrics. A summary of the chip’s capabilities is given in Table 1.

Table 1: Key specifications of femtoAI’s Sparse Processing Unit, SPU-001

  Metric                      SPU-001: Actual on SPU   SPU-001: Effective
  Memory                      1 MB                     10 MB
  Max Throughput              9.6 GOPS                 0.96 TOPS
  Compute Efficiency          250 GOPS/W               25 TOPS/W
  Compute Density             9.6 GOPS/MB              0.96 TOPS/MB
  Retention Leakage (TT25)    80 + 3 µW/64kB           80 + 0.3 µW/64kB

At the algorithmic level, SPU-001 combines AI algorithms (e.g., ClaraF2F, part of our Clara suite of production-ready speech-enhancement models) with classical DSP algorithms (e.g., acoustic feedback cancellation, AFC, and a beamformer) to deliver complete audio pipelines. The ClaraF2F algorithm delivers strong speech enhancement across industry-standard metrics, including PESQ⁶ (1.57), STOI⁷ (0.799), and SI-SDR⁸ (7.74). Comprehensive metrics for all AI algorithms, covering quality, power consumption, and latency, are available on femtoAI’s Developer Portal. An example classical DSP algorithm running on SPU-001 is an adaptive beamformer, Minimum Variance Distortionless Response (MVDR), which shows an average improvement of 2% in character error rate across varying noise conditions such as music, microwave, and range hood.


Figure 1: Block diagram illustrating a face-to-face hearing-boost implementation with two microphones that combines AI and classical DSP algorithms running on the SPU. Optionally (not shown), an additional dynamic-range-compression block can be added after ClaraF2F to fine-tune the signal before the output. The host processor boots the SPU, loads models onto it, and sends data to SPU-001 over SPI for processing. The host can be a Wi-Fi/Bluetooth SoC, a simple MCU, or any chip with host-processor capability.

Next Steps

Edge AI solutions from femtoAI transform smart glasses from thin clients into fully autonomous processing nodes—delivering more intelligence per watt, greater robustness, and consistent user experiences even without connectivity or phone availability.

Delight customers by maximizing on-device intelligence with femtoAI: developers can deploy up to 10× more AI and DSP workloads on the SPU, unlocking a new class of independent, always-available smart glasses. Sign up for our developer portal at developer.femto.ai or reach out to us at developer@femto.ai to get started.


1. Weight sparsity refers to zero-valued weights within a neural network’s connections. Zero-valued weights/connections do not affect the network output and can be removed to reduce memory and compute with femtoAI hardware.

2. Activation sparsity refers to zero-valued outputs of a neural network’s neurons. Zero-valued neurons/activations do not affect the network output and are skipped to reduce energy consumption and latency with femtoAI hardware.

3. While sparse models deliver the best performance-per-watt and performance-per-MB, SPU-001 fully supports dense neural networks, ensuring flexibility and forward compatibility.

4. Effective: capacity and capability with sparsity optimizations using the femtoAI SDK.

5. Raw: capability and capacity without any sparsity optimizations.

6. PESQ, a measure of the speech quality of the processed audio. PESQ is measured on a scale of 1 to 5, 5 being the best.

7. STOI, a measure of speech intelligibility. STOI is measured on a scale from 0 to 1.

8. SI-SDR, a measure of the amount of noise removed by the algorithm.

Roja de Cande
Director of Product Management
Jon Russo
Senior Solutions Architect
March 6, 2026
