“We think we’re on the cusp of the next evolution, where AI happens not just in that chatbot and gets naturally integrated into the hundreds of millions of experiences that people use every day,” says Yusuf Mehdi, executive vice president and consumer chief marketing officer at Microsoft, in a briefing with The Verge. “The vision that we have is: let’s rewrite the entire operating system around AI, and build essentially what becomes truly the AI PC.”

…yikes

  • BarneyPiccolo@lemmy.today
    English
    arrow-up
    23
    ·
    8 days ago

    I hate any voice-activated programs. Sometimes I’ll ask my phone to call someone, and most of the time it does. But every now and then, it seems to completely forget my voice, the English language, how to access my contacts, how to spell anything, etc. I end up spending five minutes trying to force it to dial by my voice, screaming and cursing at it like a psychopath, when it would have taken me literally 3 seconds to just make the call manually.

    If you try to do some sort of voice-to-text thing, it ALWAYS screws it up so badly that you end up spending more time editing than if you’d just typed it yourself in the first place.

    Fuck voice-activated anything. It NEVER works reliably.

    • sugar_in_your_tea@sh.itjust.works
      English
      arrow-up
      9
      ·
      8 days ago

      It isn’t even unique to AI; human operators get things wrong all the time. Any time you put something involving natural language between the user/customer and completing a task, there’s a significant risk of it going wrong.

      The only time I want hands-free anything is when driving, and I’d rather pull over than deal with voice activation unless it’s an emergency and I can’t stop driving.

      I don’t get this fascination with voice activation. If you asked me to describe my dream home if money was no object and tech was perfect, voice activation would not be on the list. When I watch Iron Man or Batman talking to a computer, I don’t see some pinnacle of efficiency, I see inefficiency. I can type almost as fast as I can speak, and I can make scripts or macros to do things far faster than I can describe them to a computer. Shortcuts are far more efficient than describing the operation.

      If a product turns to voice activation, that tells me they’ve given up on the UX.

      • IsoKiero@sopuli.xyz
        English
        arrow-up
        1
        ·
        7 days ago

        “When I watch Iron Man or Batman talking to a computer, I don’t see some pinnacle of efficiency, I see inefficiency.”

        Things like Jarvis from Iron Man go far beyond just translating speech into computer commands. In the first Iron Man, Jarvis pretty much manages the whole process of manufacturing the suit, and it can autonomously manage a fleet of them. I could see the benefit if some kind of AI could just listen in on an engineers’ discussion and update CAD models based on it, making sure the assemblies work as they should, keeping everything in spec, and managing all the documents accordingly. But that’s pretty much human-level AI at that point, and the current LLM hype especially is fundamentally very different from it.

      • Flic@mstdn.social
        arrow-up
        3
        ·
        8 days ago

        @sugar_in_your_tea @BarneyPiccolo especially in a language as widely used as English, with regional nuance that an NLP system could never distinguish. When I say “quite”, is it an American “quite” or a British “quite”? Same for “rather”? What does it mean if we’re tabling this thing in the agenda? If something is happening “momentarily”, when is it happening, or for how long? Neither the speaker nor the program will have a clue how these things are being interpreted, and likely will not even realise there are differences.

        • sugar_in_your_tea@sh.itjust.works
          English
          arrow-up
          3
          ·
          8 days ago

          Even if they solve the regional dialect problem, there’s still the problem of people being really imprecise with natural language.

          For example, I may ask, “what is the weather like?” I could mean:

          • today’s weather in my current location (most likely)
          • if traveling, today’s or tomorrow’s weather at my destination
          • weather projection for the next week or so (local or destination)
          • current weather outside (i.e. I’m about to head outside)

          An internet search would be “weather <location> <time>”. That’s it. Typing that takes a few seconds, whereas voice control requires processing the message (a couple seconds usually) and probably an iteration or two to get what you want. Even if you get it right the first time, it still takes as long as or longer than just typing a query.
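
          A rough sketch of that difference in Python, not any real assistant’s or search engine’s API: `WeatherQuery` and `spoken_readings` are hypothetical names, and the place and time values are made-up placeholders. The typed form fills both slots explicitly, while the spoken question fans out into several candidate queries that the system has to pick between.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WeatherQuery:
    """Hypothetical structured form of the typed search: weather <location> <time>."""
    location: str
    time: str

    def to_search(self) -> str:
        # The typed query states both slots explicitly, so nothing is left to guess.
        return f"weather {self.location} {self.time}"


def spoken_readings(here: str, destination: Optional[str] = None) -> List[WeatherQuery]:
    """Enumerate plausible readings of the spoken "what is the weather like?"."""
    places = [here] if destination is None else [here, destination]
    readings = [WeatherQuery(here, "now")]  # about to head outside
    for place in places:
        readings.append(WeatherQuery(place, "today"))
        readings.append(WeatherQuery(place, "next 7 days"))
    if destination is not None:
        readings.append(WeatherQuery(destination, "tomorrow"))  # traveling
    return readings


if __name__ == "__main__":
    # One explicit typed query (placeholder values).
    print(WeatherQuery("Oslo", "tomorrow").to_search())
    # Every candidate the spoken question could have meant.
    for reading in spoken_readings("home", destination="Oslo"):
        print(reading.to_search())
```

          Each extra candidate is either a guess or a follow-up question, which is where the “iteration or two” comes from.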

          Even if voice activation is perfect, I’d still prefer a text interface.

          • setVeryLoud(true);@lemmy.ca
            English
            arrow-up
            2
            ·
            8 days ago

            My autistic brain really struggles with natural language and its context-based nuances. Human language just isn’t built for precision, it’s built for conciseness and efficacy. I don’t see how a machine can do better than my brain.

            • sugar_in_your_tea@sh.itjust.works
              English
              arrow-up
              1
              ·
              8 days ago

              Agreed. A lot of communication is non-verbal. Me saying something loudly could be due to other sounds in the environment, frustration/anger, or urgency. Distinguishing between those could involve facial expressions, gestures with my hands/arms, or any number of other non-verbal cues. Many autistic people have difficulty picking up on those cues, and machines are at best similar to the most extreme end of autism, so they tend to make rules like “elevated volume means frustration/anger” when that could very much not be the case.

              Verbal communication is designed for human interactions, whether in long-form (conversations) or short-form (issuing commands), and it relies on a lot from the human experience. Human-to-computer interactions should play to the computer’s strengths, not try to imitate human interaction, because the imitation will always fail at some point. If I get driving instructions from my phone, I want them to be terse (turn right on Hudson Boulevard), whereas if my SO is giving me directions, I’m happy with something more long-form (at that light, turn right), because my SO knows how to communicate unambiguously to me whereas my phone does not.

              So yeah, I’ll probably always hate voice-activation, because it’s just not how I prefer to communicate w/ a computer.