Hello world ❤️‍🔥

In November 2023, we founded Argmax to empower developers and enterprises who are eager to deploy commercial-scale inference workloads on user devices. To turbocharge our ambitious roadmap, we raised capital from General Catalyst and industry leaders. After the launch of our first open-source project, with several more in the pipeline, we are now hiring and starting to work with early customers. Read on for details or drop us a note!

Why On-device?

Our market research and survey of industry leaders convinced us that, in many production settings, on-device deployment is highly desirable compared to server-side deployment, for reasons such as:

  • 0% marginal cost for inference

  • Data privacy and compliance by design

  • Consistent latency, with no downtime, network lag, or connectivity requirement

To be clear, we do not claim that all inference workloads will be on-device; rather, we intend to transform the market so that most are. Today, the priority placed on on-device inference varies widely across market segments, and we are working first with customers for whom it is business-critical. In the meantime, we are building tools (open-source) and conducting research (open science) to transform the remaining market segments for broader adoption within the next two years.

Our founding team spent the last six years at Apple building a track record of industry-leading on-device inference algorithms and software. Notable recent projects include Transformers for the Apple Neural Engine, Fastest Stable Diffusion on iPhone, and Mixed-bit Model Compression. We are also core contributors to the private inference engine behind Core ML.

We have identified the “showstoppers” that are holding back on-device deployment from becoming the industry standard for most inference workloads and we are tackling them one by one:

  • System stability during high RAM utilization

  • High download and storage size requirements

  • Unacceptable quality degradation from traditional model compression

  • Battery life for mobile devices

If you think we missed an important one, we are eager to hear from you!

A common cause underlying these showstoppers is model size. Applying the compression techniques of the past to the large foundation models of today created the perception that user devices are simply not yet capable of executing them in a production setting. Argmax conducts compression research (applied and fundamental) to invent and deliver the next generation of model compression techniques and break this perception.
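
To make the cause concrete, here is a back-of-the-envelope sketch in Swift. The 1.5B parameter count and the bit widths below are illustrative assumptions, not measurements of any particular model:

    import Foundation

    // Back-of-the-envelope storage footprint of a hypothetical
    // 1.5B-parameter model at different weight precisions.
    let parameterCount = 1_500_000_000.0
    let bytesPerGB = 1_073_741_824.0

    for (label, bitsPerWeight) in [("float16", 16.0), ("8-bit", 8.0), ("4-bit", 4.0)] {
        let sizeGB = parameterCount * bitsPerWeight / 8.0 / bytesPerGB
        print("\(label): \(String(format: "%.2f", sizeGB)) GB")
    }
    // Prints roughly: float16: 2.79 GB, 8-bit: 1.40 GB, 4-bit: 0.70 GB.
    // At float16, the weights alone can exceed the RAM budget of many
    // iPhone apps; at 4 bits they fit, which is why compression quality
    // is the crux of on-device deployment.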

Open-Source

We have benefited immensely from open-source projects such as PyTorch, coremltools, transformers, and diffusers. In turn, we have bolstered projects such as Mochi Diffusion and whisper.cpp.

We are eager to sustain this virtuous cycle among open-source projects in pursuit of running state-of-the-art foundation models on user devices.

Towards that end, we are committed to open-sourcing most of our core R&D output while building our business on licensing bleeding-edge inference performance products, customer-requested features, and customer-level quality-of-inference SLAs.

WhisperKit

Two months after founding Argmax, we announced WhisperKit as our first project. It is a collection of tools and libraries optimized for real-time performance and extensibility, built to deploy billion-parameter-scale Transformers compatible with the Whisper speech recognition model on Apple devices as small as the Apple Watch and as old as the iPhone 12!

WhisperKit transcribing a YouTube video in real-time on an Apple Watch Ultra 2
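
To give a feel for the developer experience, here is a minimal transcription sketch in Swift. It follows the usage shown in the WhisperKit README at the time of writing; treat the default-model initializer and the transcribe(audioPath:) signature as assumptions that may shift between versions:

    import WhisperKit

    // Minimal sketch: load a default Whisper model and transcribe a file.
    // Exact initializer options and return types may differ across
    // WhisperKit versions; see the repository README for current usage.
    Task {
        do {
            // Downloads (if needed) and loads a default Whisper model.
            let pipe = try await WhisperKit()
            // Transcribe a local audio file and print the recognized text.
            let result = try await pipe.transcribe(audioPath: "path/to/audio.wav")
            print(result?.text ?? "")
        } catch {
            print("Transcription failed: \(error)")
        }
    }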

This is the first in a series of projects with vertically integrated software to deliver popular foundation models with industry-leading performance while handling the entire model lifecycle: from over-the-air model delivery to prediction post-processing. Each framework we build addresses a canonical inference workload for a market segment that is poised to leverage on-device inference as a competitive advantage.

WhisperKit is a case study in Argmax’s approach:

  • We formalize the behavior change (if any) in compressed models via quality-of-inference (QoI) reports with example-level granularity.

  • Software evolves, dependencies break, and performance numbers regress. We offer Service-Level Agreements (SLAs) as a countermeasure.

  • The project roadmap is on GitHub, and we are happy to collaborate with the open-source community.

Join Argmax

We are hiring! See the Careers page for details, then email us at hireme@takeargmax.com with the role you are interested in and (optionally) a link to a related project you are proud of.


Argmax is an open-source inference optimization company building the next generation of compression techniques and on-device inference software for developers and enterprises. For press inquiries: press@takeargmax.com
