LukeW | Digital Product Design + Strategy

Expert articles about user experience, mobile, Web applications, usability, interaction design and visual design.

Chat Interfaces & Declaring Intent


There's lots of debate within UI design circles about the explosion of chat interfaces driven by large-scale AI models. While there are certainly pros and cons to open text fields, one thing they're great at is capturing user intent, which today's AI-driven systems can increasingly fulfill.

At their inception, computers required humans to adapt to how they worked. Developers had to learn languages with sometimes punishing syntax (don't leave the semicolon out!). People needed to learn CLI commands like cd, ls, pwd, and more. Even with graphical user interfaces (GUIs), we couldn't simply tell a computer what to do; we had to understand what computers could do and modify our behavior accordingly by clicking on windows, icons, menus, and more.

Google changed this paradigm with a simple yet powerful way for people to declare their intent: an empty search box. Just type whatever you want into Google and it will find you relevant information in response. This open-ended interface not only became hugely popular (close to 9 billion searches per day), it also created an enormous business for Google, because matching people's expressed needs with businesses that can fulfill them monetizes extremely well.

But Google's empty text box was limited to information retrieval.

The emergence of large language models (LLMs) expanded what an open-ended declaration of intent could do. Instead of information retrieval, LLMs enabled information manipulation through an empty text box often referred to as a "chat interface". People could now tell systems in natural language (misspellings and all) to summarize content, transform text into poetry, and generate or restructure information in countless ways. And once again this open-ended interface became hugely popular (ChatGPT has reached 300 million weekly active users since launching in 2022).

The next logical step was combining these capabilities: merging information retrieval with manipulation, as seen in retrieval augmented generation (RAG) applications (like Ask Luke!), Perplexity, and ChatGPT with search integration.
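
As a rough illustration of that retrieval-plus-manipulation pattern, here's a minimal RAG sketch. It is not how Ask Luke or any of these products actually work; embed, retrieve, and generate_answer are hypothetical stand-ins for an embedding model, a vector search, and an LLM call.

```python
# Minimal RAG sketch: retrieve the passages most similar to the query,
# then ground a generated answer in them. All functions are stand-ins.
from math import sqrt

def embed(text: str) -> list[float]:
    # Hypothetical embedding: a real system would call an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank the corpus by similarity to the query and keep the top k passages.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def generate_answer(query: str, passages: list[str]) -> str:
    # Hypothetical LLM call: the retrieved passages ground the generation.
    context = "\n".join(passages)
    return f"Answer to {query!r} based on:\n{context}"

corpus = ["Mobile form design guidelines.", "Notes on search box behavior."]
print(generate_answer("How should search boxes behave?", retrieve("search box", corpus)))
```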

But finding and manipulating information is just a subset of the things computers allow us to do. An enormous set of computer applications exists to enable actions of all shapes and sizes, from editing images to managing sales teams. Finding the right action amongst these capabilities requires remembering which app provides it and how to access and use the feature.

Increasingly, though, AI models can not only find the right action for a task but even create an action if it doesn't exist. Through tool use and tool synthesis, LLMs are continuously getting better at action retrieval and manipulation. So today's AI models can combine information retrieval and manipulation with action retrieval and manipulation.
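
To make "action retrieval" concrete, here's a minimal sketch of the tool-use idea: a model maps a natural-language intent onto one of a set of registered actions and fills in its arguments. The tool names and the pick_tool stub are hypothetical; in a real system the LLM itself would make this selection.

```python
# Minimal tool-use sketch: a natural-language intent is routed to a
# registered action. All names here are hypothetical examples.
def resize_image(path: str, width: int) -> str:
    return f"resized {path} to {width}px wide"

def summarize_pipeline(team: str) -> str:
    return f"summary of open deals for the {team} team"

TOOLS = {
    "resize_image": resize_image,
    "summarize_pipeline": summarize_pipeline,
}

def pick_tool(intent: str) -> tuple[str, dict]:
    # A real system would ask the LLM to choose a tool and fill its
    # arguments; this stub keys off simple phrases instead.
    if "image" in intent:
        return "resize_image", {"path": "photo.png", "width": 800}
    return "summarize_pipeline", {"team": "sales"}

intent = "make this image smaller"
name, args = pick_tool(intent)
print(TOOLS[name](**args))
```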

If that sounds like a mouthful, it is. But the user interface for these systems is still primarily an open text field that allows people to declare their intent. What's changed dramatically is that today's technology can do so much more to fulfill that intent. With such vast and emergent capabilities, why do we want to constrain them with UI?

We've moved from humans learning to speak computer to computers learning to understand humans, and I, for one, don't want to go backwards. That's why I'm increasingly hesitant to add more UI to communicate the possibilities of AI-driven systems (despite 30 years of designing GUIs). Let's make the computers figure out what we want, not the other way around.

Do All AI Models Need To Be Assistants?

Sat, 02/01/2025 - 2:00pm

While most AI models default to a "helpful assistant" mode, different dialogue frameworks could enable new kinds of AI interactions or capabilities. Here's how alternative dialogue patterns could change how we interact with AI.

Arguably, the best current large language model for coding and language tasks is Anthropic's Claude. Claude was fine-tuned through an approach Anthropic calls Constitutional AI, which frames Claude as a "helpful, honest, and harmless" assistant. This framing is embedded in their constitutional principles, which guide Claude to:

  • Stay honest without claiming emotions or opinions
  • Remain harmless while maintaining clear professional boundaries
  • Focus on task completion over engagement

But do all useful AI models need to be framed as helpful assistants? Could alternative frameworks create new possibilities for AI interaction? Education researcher Nicholas Burbules identified four main forms of dialogue back in the early nineties that could provide alternatives: inquiry, conversation, instruction, and debate.

  • Inquiry emphasizes joint problem-solving, with both participants contributing insights and methods to find solutions collaboratively. Neither party claims complete knowledge, making it well-suited for research and complex problem exploration.
  • Conversation, unlike task-oriented interactions, doesn't require a defined endpoint or solution, allowing ideas and perspectives to develop naturally through the exchange.
  • Instruction follows a guided learning approach where questioning leads to understanding. The focus stays on developing the learner's capabilities rather than simply providing answers.
  • Debate engages in critical examination of ideas through productive opposition. By testing positions against each other and exploring multiple viewpoints, this pattern helps strengthen arguments and clarify thinking.

Applying one of these forms of dialogue to the overall framing of an AI model might lead to personalities that feel more like "rigorous challenger" or "thoughtful colleague" instead of "helpful assistant". While there's certainly a role for assistants in our lives, we work with and learn from lots of different kinds of people. Framing AI models using those differences might ultimately make them helpful in more ways than one.

Improving AI Models Through Inference Scaling

Thu, 01/30/2025 - 2:00pm

In her Inference Scaling: A New Frontier for AI Capabilities presentation at Sutter Hill Ventures, Azalia Mirhoseini shared her team's research showing that giving AI models multiple attempts at tasks and carefully selecting the best results can improve performance. Here are my notes from her talk:

Improving Model Performance
  • Pre-training and fine-tuning have been key focus areas for scaling language models.
  • Traditional fine-tuning starts with next-token prediction on high-quality specialized data.
  • Reinforcement Learning from Human Feedback (RLHF) introduced human preferences into the process, where people rate/rank outputs to steer model behavior.
  • Constitutional AI moves beyond collecting thousands of human labels to using ~10 human-written principles in a two-stage approach: models generate and critique outputs based on these principles, then RLAIF (Reinforcement Learning from AI Feedback) adds model-generated labels.
  • This improves harmlessness and helpfulness and reduces dependency on human data collection.
Inference Time Scaling
  • The "Large Language Monkeys" project showed that repeated sampling (trying multiple times) during inference can significantly improve performance on complex tasks like math and coding
  • Even smaller models showed major gains from increased sampling
  • Performance improvements follow an exponential power law relationship
  • Some correct solutions only appeared in <10 out of 10,000 attempts
  • Key inference time techniques that can be combined: repeated sampling (generating multiple attempts), fusion (synthesizing multiple responses), criticism and ranking of responses, verification of outputs.
  • Verification falls into two categories of problems: automated (coding, formal math proofs) and manual(needs human judgment).
  • Basic approaches like majority voting don't work well, we need better verifiers.
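
Here's a hedged, toy illustration of that repeated-sampling-plus-verification idea (not the Large Language Monkeys code). sample_solution and verify are hypothetical stand-ins for an LLM call and an automated checker such as a test suite or proof checker.

```python
# Toy repeated-sampling sketch: draw many candidates, return the first one
# that passes an automated verifier. Both functions are stand-ins.
import random

def sample_solution(problem: str) -> int:
    # Stand-in for sampling one candidate answer from a model at non-zero temperature.
    return random.randint(0, 10)

def verify(problem: str, candidate: int) -> bool:
    # Stand-in for an automated verifier (e.g. running unit tests on generated
    # code); here the toy problem's correct answer is hard-coded as 7.
    return candidate == 7

def solve_with_repeated_sampling(problem: str, attempts: int = 100) -> int | None:
    # Keep drawing candidates; return the first verified one, or None if
    # no attempt passes verification.
    for _ in range(attempts):
        candidate = sample_solution(problem)
        if verify(problem, candidate):
            return candidate
    return None

print(solve_with_repeated_sampling("toy problem"))
```
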
Future Directions
  • Need deeper investigation into whether parallel or serial inference approaches are more effective.
  • As inference becomes a larger part of both training and deployment, high-throughput model serving infrastructure becomes increasingly critical.
  • The line between inference and training is blurring, with inference results being fed back into training processes to improve model capabilities.
  • Future models will need seamless self-improvement cycles that continuously enhance their capabilities.
  • This is more similar to how humans learn, through constant interaction and feedback rather than through discrete training periods.

Publishing in the Generative AI Age

Mon, 01/06/2025 - 2:00pm

Hey Luke, why aren't you publishing new content? I am... but it's different in the age of generative AI. You don't see most of what I'm publishing these days and here's why.

The Ask Luke feature on this site uses the writings, videos, audio, and presentations I've published over the past 28 years to answer people's questions about digital product design. But since there's an endless number of questions people could ask on this topic, I might not always have an answer. When this happens, the Ask Luke system basically tells people: "Sorry, I haven't written about this, but here are some things I have written about." That's far from an ideal experience.

But just because I haven't taken the time to write an article or create a talk about a topic doesn't mean I don't have experiences or insights on it. Enter "saved questions". For any question Ask Luke wasn't able to answer, I can add information to answer it in the future in the form of a saved question. This admin feature allows the corpus of information Ask Luke uses to expand, but it's invisible to people. Think of it as behind-the-scenes publishing.

Since launching the Ask Luke feature in April 2023, I've added close to 500 saved questions to my content corpus. That's a lot of publishing that doesn't show up as blog posts or articles but can be used to generate answers when needed.

Each of these new bits of content can also be weighted more or less heavily. With more weight, answers to similar questions will lean more on that specific saved answer.

Without the extra weighting, saved questions are just another piece of content that can be used (or not) to answer similar questions. You can see the difference weighting makes by comparing these two replies to the same question. The first is weighted more heavily toward the saved question I added.
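
For a sense of what such weighting might mean mechanically, here's a hedged sketch in which a per-item weight boosts a retrieved item's score. The scoring rule and field names are assumptions for illustration, not the actual Ask Luke implementation.

```python
# Assumed scoring rule: a weight above 1.0 makes an item more likely to be
# chosen as context when generating an answer. Field names are illustrative.
def score(similarity: float, weight: float = 1.0) -> float:
    return similarity * weight

corpus = [
    {"text": "Saved question on form label placement", "similarity": 0.62, "weight": 2.0},
    {"text": "2014 article on form design", "similarity": 0.71, "weight": 1.0},
]

best = max(corpus, key=lambda item: score(item["similarity"], item["weight"]))
print(best["text"])  # the weighted saved question wins despite lower raw similarity
```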

Using this process triggered a bunch of thoughts. Should I publish these saved questions as new articles on my blog or keep them behind the scenes? What level of polish do these types of content additions need? On one hand, I can simply talk fluidly, record it, and let the AI figure out what to use. Even if it's messy, the machines will use what they deem relevant, so why bother? On the other hand, I can write, edit, and polish the answers so the quality of the overall content corpus is consistently high. Currently I lean more toward the latter. But should I?

Zooming up a level, any content someone publishes is out of date the moment it goes live. But generated content, like Ask Luke answers, is only produced when a specific person has a specific question. So the overall content corpus is more like a fully malleable singular entity vs. a bunch of discrete articles or files. Different parts of this corpus can be used when needed, or not at all. That's a different way of thinking about publishing (overall corpus vs. individual artifacts) with more implications than I touched on here.
