• 1 Post
  • 52 Comments
Joined 1 year ago
Cake day: June 15th, 2023




  • A lot of what you said is true.

    Since the TPU is a matrix processor instead of a general purpose processor, it removes the memory access problem that slows down GPUs and CPUs and requires them to use more processing power.

    Just no. Flat out no. Just so much wrong. How does the TPU process data? How does the data get there? It needs to be shuttled back and forth over the bus. Doing this for a 1080p image’s worth of data several times a second is fine. An uncompressed 1080p image is about 8MB. Entirely manageable.

    Edit: it’s not even 1080p, because the image would get resized to the model’s input size. So again, 300x300x3 for the last model I could find.

    /Edit
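
    To put rough numbers on that (uncompressed, one byte per colour channel; the ~8MB figure corresponds to a 4-channel buffer):

    # Back-of-the-envelope only: uncompressed frame sizes, 1 byte per channel
    full_hd_rgb = 1920 * 1080 * 3   # ~6.2 MB (~8.3 MB with a 4th channel)
    coral_input = 300 * 300 * 3     # ~0.27 MB once resized to the model input
    print(full_hd_rgb / 1e6, coral_input / 1e6)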

    Look at this repo. You need to convert the models using the TFLite framework (TensorFlow Lite), which is designed for resource-constrained edge devices. The max input size is 224x224x3. I would imagine it can’t handle anything larger.

    https://github.com/jveitchmichaelis/edgetpu-yolo/tree/main/data
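
    If you’re wondering what that conversion step looks like, here’s a minimal sketch using the standard TensorFlow APIs (the model path is made up, and a real Coral deployment also needs full-integer quantization plus a pass through the edgetpu_compiler tool):

    # Minimal sketch: convert a TensorFlow SavedModel to TFLite.
    # The path below is hypothetical; Edge TPU deployment additionally requires
    # full-integer quantization and compiling the result with edgetpu_compiler.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("my_detector_savedmodel")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
    tflite_model = converter.convert()

    with open("my_detector.tflite", "wb") as f:
        f.write(tflite_model)

    # Then, on the command line: edgetpu_compiler my_detector.tflite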

    Now look at the official model zoo on the Google Coral website.

    https://coral.ai/models/

    Not a single model is larger than 40MB, whereas LLMs start at well over a gig even for smaller (and inaccurate) models. The good ones start at about 4GB, and I frequently run models at about 20GB. The size in parameters really makes a huge difference.
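
    For a rough sense of why parameter count dominates (generic assumed numbers, not measurements of any specific model):

    # Rough file-size estimate: bytes ≈ parameters × bytes per parameter
    params = 7e9                              # a typical "small" LLM
    print(params * 2 / 1e9, "GB at fp16")     # ~14 GB
    print(params * 0.5 / 1e9, "GB at 4-bit")  # ~3.5 GB, roughly that 4GB ballpark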

    You likely/technically could run an LLM on a Coral, but you’re going to wait on the order of double-digit minutes for a basic response, if not way longer.

    It’s just not going to happen.


  • when comparing apples to apples.

    But this isn’t really easy to do, and it’s impossible in some cases.

    Historically, Nvidia has done better than AMD in gaming performance because there are so many game-specific optimizations in the Nvidia drivers, whereas AMD’s drivers didn’t have them.

    On the other hand, AMD historically had better raw performance in scientific computing tasks (pre-deep-learning trend).

    Nvidia has had a stranglehold on the AI market entirely because of their CUDA dominance. But hopefully AMD has finally bucked that trend with their new ROCm release that is a drop-in replacement for CUDA (meaning you can just run CUDA-compiled applications on AMD with no changes).

    Also, AMD’s new MI300X AI processor is (supposedly) wiping the floor with Nvidia’s H100 cards. I say “supposedly” because I don’t have $50k USD to buy both cards and compare myself.




  • And you can add as many TPUs as you want to push it to whatever level you want

    No you can’t. You’re going to be limited by the number of PCIe lanes. But putting that aside, those Coral TPUs don’t have any memory, which means for each operation you need to shuffle the relevant data over the bus to the device for processing, and then back again. You’re going to be doing this thousands of times per second (likely much more), and I can tell you from personal experience that running AI like that is painfully slow (if you can even get it to work that way in the first place).
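
    Some very rough numbers to illustrate (the link speed is an assumption about a single PCIe 3.0 lane, not a measured Coral figure):

    # Back-of-the-envelope: time just to stream weights over the bus, no compute
    model_bytes = 4e9           # a ~4 GB set of LLM weights
    lane_bytes_per_s = 0.985e9  # ~1 GB/s usable on one PCIe 3.0 lane (assumed)
    print(model_bytes / lane_bytes_per_s, "seconds per full pass over the weights")
    # ~4 s per pass, and generating each token needs a pass over the weights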

    You’re talking about the equivalent of buying hundreds of dollars of groceries and then getting everything home, 10km away, on foot with whatever you can fit in your pockets, making multiple trips.

    What you’re suggesting can’t work.


  • ATI cards (while pretty good) are always a step behind Nvidia.

    Ok, you mean AMD. They bought ATI like 20 years ago now and that branding is long dead.

    And AMD cards are hardly “a step behind” Nvidia. This is only true if you buy the 24GB top card of the series. Otherwise you’ll get comparable performance from AMD at a better value.

    Plus, most distros have them working out of the box.

    Unless you’re running a kernel older than 6.x, every distro will support AMD cards. And even then, you could always install the proprietary blobs from AMD and get full support on any distro. The kernel version only matters if you want to use the FOSS kernel drivers for the cards.


  • Two* GPUs? Is that a thing? How does that work on a desktop?

    I’ve been using two GPUs in a desktop for 15 years, one AMD and one Nvidia (although not lately).

    It really works just the same as a single GPU. The system doesn’t really care how many you have plugged in.

    The only difference you have to care about is specifying which GPU you want a program to use.

    For example, if you had multiple Nvidia GPUs you could specify which one to use from the command line with:

    CUDA_VISIBLE_DEVICES=0

    or the first two with:

    CUDA_VISIBLE_DEVICES=0,1

    Anyways, you get the idea. It’s a thing that people do and it’s fairly simple.
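
    And if it helps, here’s the same idea from inside a script, a minimal sketch assuming a CUDA-based framework like PyTorch:

    # Minimal sketch: set the mask before the CUDA runtime initializes,
    # i.e. before importing any GPU framework.
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose only the first two GPUs

    import torch  # example framework; anything CUDA-based respects the mask
    print(torch.cuda.device_count())  # reports 2, assuming the machine has 2+ GPUs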









  • CeeBee@lemmy.world (OP) to Selfhosted@lemmy.world · OpenSubtitles Hostility · 9 months ago

    I know how debugging works. I’ve been a developer for a couple decades.

    I know for a fact that the lines I removed are normal verbose messages and entirely unrelated to my issue. I know not only because I’m a developer and understand the messages, but also because those lines show up every second of every minute of every day. They are some of the most verbose lines in the logs. The scheduled task for the subtitles only runs once a day and finishes within a few minutes.

    Also, they weren’t indicative of any code path because of how frequent they were. At such a high frequency it becomes impossible to determine which line came first in multi-threaded or asynchronous tasks.


  • I literally have a pinned tab for a Whisper implementation on GitHub! It’s definitely on my radar to check out. My only concerns are how well it handles multiple speakers, and whether it can generate SDH subtitles. That’s the type with those extra bits like “Suspenseful music”, “[groans]”, “[screams]”, etc. All the stuff someone hard of hearing would benefit from.