I am working on captioning some YouTube videos and some of them only have instrumental music. For example, if I have a 2-minute video of gentle piano music (with no speech), then I would start it off with a caption of:

[gentle piano music]

My question is how long I should leave that on the screen… does anyone know of any guidelines or recommendations?

I wondered if it should show the caption for the entire 2 minutes - but then I wondered if that might be annoying. What would you prefer?
  1. [gentle piano music] for the entire 2 minutes?
  2. [gentle piano music...] for the first 15 seconds?
  3. Something else?


A few moments is enough.

A very long time ago we had subtitles in cartoons with songs. All we needed to do was follow the ball, which set the beat, melody and so on. If they could find a way to reintroduce that, built to follow whatever tune is playing as it travels across the screen, it would provide a sense of what is being played.


Thank you very much @Mart and @x1heavy! That is super helpful.

My follow-up question: if I show the caption for a few moments at the beginning, do I also need to do something to let the viewer know that the music continues with no speech after the caption disappears? I guess my concern is how the viewer knows there is no background speech happening once the "[gentle piano music]" caption goes off screen.

A couple of ideas might be:
  1. [gentle piano music...]
    (adding an ellipsis (…) to indicate it continues)
  2. [gentle piano music]
    music continues without speech throughout video
Or maybe it is enough to just have the caption, and the viewer can assume that no caption means there is no speech?


I recently saw just a musical note icon (I don’t recall if it was one or two). It made clear that music without words was playing. It was displayed on screen for a few moments periodically until talking started again.
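
If it helps, here is a rough sketch of how a "show it for a few moments, periodically" caption track could be generated rather than typed by hand. It writes cues in WebVTT format, which I believe YouTube accepts for caption uploads, but please check that for your case. The function names are made up for illustration, and the 5-second display time and 30-second repeat interval are arbitrary assumptions, not a guideline from any standard.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def periodic_music_cues(duration, label="[gentle piano music]",
                        show_for=5.0, repeat_every=30.0):
    """Return (start, end, text) cues that repeat the label periodically.

    show_for and repeat_every are illustrative values (assumptions).
    """
    cues = []
    start = 0.0
    while start < duration:
        # Never let a cue run past the end of the video.
        end = min(start + show_for, duration)
        cues.append((start, end, label))
        start += repeat_every
    return cues


def to_webvtt(cues):
    """Render cues as the text of a WebVTT file."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


if __name__ == "__main__":
    # A 2-minute video, as in the original question.
    print(to_webvtt(periodic_music_cues(120)))
```

For the 2-minute example this gives four cues, starting at 0:00, 0:30, 1:00 and 1:30. Swapping the label for a musical note such as ♪ would give the icon-style version instead.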