HTML Drag and Drop—What a Drag


An article about HTML and JavaScript—what's with that? I implement user interfaces for educational tools. Java on the client isn't really an option any more. The students use the tools through browsers, sometimes on devices without a Java runtime (such as an iPad).

For what I need to do, modern HTML and JavaScript are OK. I toyed with using ScalaJS or JSweet or Elm or whatever, but in the end, keeping it simple has many advantages. It was my good fortune that Internet Explorer ceased to matter when I started. I very rarely ran into situations where I needed to worry about browser incompatibilities. Except for drag and drop, the subject of this article.

Drag and Drop Basics

I had to implement a “Parsons puzzle” tool. A Parsons puzzle is a programming activity where students rearrange code tiles into a code snippet. These are good activities for beginners because students don't have to worry so much about syntax. The tool can be set up with any sequence of code tiles. When I design a particular problem, I like to put in a few distractors so students have to think carefully and don't just get 100% by reproducing the basic shape of the program.


Obviously, dragging the tiles is a very natural user interface for this activity. So that's what I had to implement. (Here is a sample.)

How difficult can drag and drop be? You use it every day. You drag something and you drop it.

Actually, it isn't all that simple. First, you somehow identify the thing that you want to drag. How? You don't click on it with the mouse. That would be a click. Instead, you move the mouse over it, depress the primary mouse button, then, while leaving the mouse button depressed, move it a little bit. What about touch interfaces? It is similar, except there is a bewildering set of tap and pinch and pan and long-press gestures, and it seems hard to figure out which of them you don't want to interfere with.

That is why you don't want to use low-level events to implement your own drag and drop logic. Instead, the HTML standard provides an API that leaves the detection of drag and drop events to the browser (and ultimately, the host desktop).

The HTML 5 Drag and Drop API

The Drag and Drop API deals with four issues:

  1. Which elements are draggable?
  2. What happens visually during the drag process?
  3. Which elements are drop targets, and what should happen when something is dropped on them?
  4. How is data transmitted from the drag source to the drop target?

What Can Be Dragged?

You can always drag images, text selections, and links. Usually you'll want to drag them outside the browser to somewhere else. Similarly, you can drag things from elsewhere into the browser (such as URLs or files from a file explorer). I am not interested in this form of drag and drop right now.

To make an arbitrary HTML element draggable, you must do three things:

  1. Set the draggable attribute of the element to true.
  2. Add a listener for the dragstart event
  3. In that listener, set some data to be transferred—see below

You really must do all of those things, whether you want to or not. If you do not set the transfer data, Firefox will not consider the element draggable. It is tempting to set the transfer data to a string like "Take that, Firefox!". But that might be embarrassing when the element is dragged into another app. Just provide some innocuous text.
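The three steps can be sketched as a small helper. The name makeDraggable and its label parameter are made up for illustration, and the element is passed in so that the sketch makes no assumptions about your page.

```javascript
// Hypothetical helper: makes any element draggable, following the three steps above.
function makeDraggable(element, label) {
  // Step 1: mark the element as draggable
  element.setAttribute('draggable', 'true')
  // Step 2: listen for dragstart
  element.addEventListener('dragstart', function(e) {
    // Step 3: set some innocuous transfer data, or Firefox refuses to start the drag
    e.dataTransfer.setData('text/plain', label)
  })
}
```

You would call it as makeDraggable(tile, 'code tile'), where tile is whatever element you want to drag.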

Ponder for a minute what must go on inside Firefox. The drag gesture is initiated. The browser looks for the dragstart listener. If it doesn't exist, it concludes that the programmer was just kidding when making the element draggable. Then the browser calls the dragstart listener. When the listener returns, the browser checks whether the transfer data is really set. If not, it considers it a joke listener and refuses to start the drag. Who came up with this???

The dragstart and dragend Listeners

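The dragstart event gives you a bunch of information about the drag start. The most useful are the cursor position (e.clientX and e.clientY) and the e.dataTransfer object, through which you set the transfer data.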

There is also a dragend event where you can undo anything that you set up in dragstart. This happens after a successful drop, or when the drag was canceled or abandoned.

Back to dragstart. The cursor position is particularly valuable when you drag and drop a geometric shape. The user probably didn't touch it in its perfect center, and you will want to take the offset into account when the object is dropped. (See this blog for the geometry.)

Except (insert evil laugh) on the iPad. It took Safari on iOS ages to support HTML drag and drop at all, so you'd think they'd have had plenty of time to get it right. But it's not so. In iOS 12, the clientX/screenX and clientY/screenY coordinates are simply wrong. I have no idea what they report. They don't even depend on where you touch.

Weirdly enough, the Apple documentation states:

The x and y parameters of setDragImage specify the point of the image that should be placed directly under the mouse. This value is typically the location of the mouse click that initiated the drag, with respect to the upper-left corner of the element being manipulated.

Unfortunately, obtaining this information in a cross-browser fashion is easier said than done. There is no standard way to determine the position of the mouse relative to the document because different browsers implement the standard event values in subtly incompatible ways.

In Firefox, Chrome, Edge, and Safari on the Mac, I can use clientX/clientY relative to element.getBoundingClientRect(). Only iOS fails. Not in a subtle way.
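For the browsers that report sane coordinates, the offset computation is a one-liner. Here is a minimal sketch; the name dragOffset is my own invention, and it assumes that you call it from the dragstart listener with the event and the dragged element.

```javascript
// Hypothetical helper: offset of the cursor from the upper-left corner of the element.
// Both clientX/clientY and getBoundingClientRect() are relative to the viewport,
// so the difference is the position of the cursor inside the element.
function dragOffset(e, element) {
  const bounds = element.getBoundingClientRect()
  return { x: e.clientX - bounds.left, y: e.clientY - bounds.top }
}
```

When the element is dropped, subtract this offset from the cursor position of the drop event to find where the upper-left corner should land.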

Visual Appearance of the Drag

As the drag proceeds, a copy of the dragged element is displayed. In my use case, that's the code line that is being dragged. That's good. The student drags the code line to the left. I display indentation levels. The student aligns the tile and drops it.

Except (insert evil laugh) on the iPad. Unlike every other browser known to man, Safari on iOS takes it upon itself to resize the dragged element to make it smaller and cuter and more adorable. Of course, with that adorable visual enhancement, now you can't align the tile any more.

It is possible to set an image instead. Do this in the dragstart listener:

let img = document.createElement('img')
img.src = ...
e.dataTransfer.setDragImage(img, 32, 32)

If the image is small enough, then Safari on iOS leaves it alone, and my users can accurately position the drop location—particularly if they have transparent fingers.

Safari on iOS does not have a monopoly on idiotic behavior. With Chrome and Firefox, the drag image must be part of the DOM. Because the browser implementors couldn't be bothered to deal with this, you, the programmer, must insert it into the DOM and, because it would look stupid to see the image twice, hide it somehow. But the obvious hiding mechanisms (display: none, left: -1000px, etc.) cause the drag image not to show up. Try something like

position: absolute; left: 0px; top: 0px; z-index: -1;

Or preload another copy of the image.
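Here is a sketch of the insert-and-hide dance with the styles shown above. The helper name createHiddenDragImage and the image URL in the usage comment are hypothetical. The document object is passed as a parameter only to keep the function self-contained.

```javascript
// Hypothetical helper: creates a drag image that is in the DOM,
// but layered underneath the page content instead of being hidden outright.
function createHiddenDragImage(doc, src) {
  const img = doc.createElement('img')
  img.src = src
  img.style.position = 'absolute'
  img.style.left = '0px'
  img.style.top = '0px'
  img.style.zIndex = '-1'
  doc.body.appendChild(img)
  return img
}

// Usage in a page (the URL is made up):
//   const dragImage = createHiddenDragImage(document, 'tile.png')
//   dragSource.addEventListener('dragstart', function(e) {
//     e.dataTransfer.setDragImage(dragImage, 32, 32)
//   })
```

Creating the image once, ahead of the first drag, also gives it time to load before dragstart fires.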

Some browsers allow you to use a Canvas or SVG element for the drag image. Safari sensibly lets you use any element, except SVG didn't work on (you guessed it) the iPad.

Drop Targets and Drag Events

To be a drop target, an element must provide both drop and dragover listeners.

It is not enough to merely provide a drop listener, even if you have nothing useful to do in dragover. The dragover listener is fired “every 350ms (±200ms) milliseconds”, and the browser doesn't take it lightly if you don't provide that listener.

And there is always one thing to do. You must call e.preventDefault() in the listener. Because if you don't, then the drag is deemed to be canceled. Apparently you need to reassure the browser every few hundred milliseconds that you want to continue the operation.

There are also optional dragenter and dragleave events that you can use to light up drop targets, typically by adding and removing a style. There is no equivalent of a pseudoclass like :hover for drag and drop. (The W3C spec says that you must provide a dragenter listener, but that doesn't seem to be enforced.)

In dragenter, call e.preventDefault() when this target is willing to accept the drop. Otherwise, return without calling preventDefault. In dragleave, you don't have to call preventDefault.

The Mozilla docs say cheerily: “The dragleave event will always fire, even if the drag is cancelled, so you can always ensure that any insertion point cleanup can be done during this event.” It ain't so. If the element is dropped, then dragleave is not called (at least with Chrome). You need to duplicate the dragleave cleanup in the drop listener.
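Putting the drop target rules together, a sketch might look like this. The helper makeDropTarget and its two callbacks are assumptions of mine: accepts() says whether the target wants the current drag, and onDrop(e) carries out the drop. The class name dragover is just a style hook that you define in your own stylesheet.

```javascript
// Hypothetical helper: wires up a drop target that lights up during a drag.
function makeDropTarget(element, accepts, onDrop) {
  element.addEventListener('dragenter', function(e) {
    if (!accepts()) return // Returning without preventDefault rejects the drag
    e.preventDefault()
    element.classList.add('dragover')
  })
  element.addEventListener('dragover', function(e) {
    if (accepts()) e.preventDefault() // Must be repeated on every dragover event
  })
  element.addEventListener('dragleave', function() {
    element.classList.remove('dragover')
  })
  element.addEventListener('drop', function(e) {
    element.classList.remove('dragover') // dragleave may not fire after a drop
    if (accepts()) onDrop(e)
  })
}
```

The cleanup appears twice on purpose, once in dragleave and once in drop.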

Firefox also has a dragexit event that is fired just before dragleave, but other browsers don't have it. Just ignore it.

Finally, there is a drag event that keeps occurring throughout the drag operation. You don't have to listen to it, and I don't know why you would want to. But if you do, be sure to call preventDefault, or the operation is canceled.

The “Drop Effect”

During the drag process, the browser may provide some visual feedback that indicates the nature of the operation, typically with different mouse cursors. The effect choices are copy, move, link, and none.

The default is move, but users can select copy or link effects by hitting keyboard modifiers.

In the dragenter, dragover, and drop handlers, you get the provided effect, as a string, in e.dataTransfer.dropEffect.

If you like, you can override that effect, by assigning a different string. For example, if a particular drop target always wants the move effect, no matter what modifier keys your users press, add this call to the dragover listener:

e.dataTransfer.dropEffect = 'move'

But it's not that simple. The drop target isn't in sole control of that process. Perhaps the drag source (which might come from a different web app) isn't happy with a move operation.

The drag source must set the allowed effects to one of the following nine strings: none, copy, copyLink, copyMove, link, linkMove, move, all, or uninitialized.

If you implement drag and drop within one web app, just choose all in the dragstart listener and then deal with the details in dragover:

e.dataTransfer.effectAllowed = 'all'

There has been a fair amount of confusion about this at StackOverflow, but I think that's because people don't always think about cross-app drag and drop. Once you consider that, the setup makes sense.
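To make the negotiation concrete, here is a sketch of how a drop target could check whether the source permits the effect that it wants. The table follows the effectAllowed values of the HTML standard, with uninitialized (the default when the source sets nothing) treated like all. The names PERMITTED_EFFECTS and sourcePermits are hypothetical.

```javascript
// The drop effects that each effectAllowed string permits.
const PERMITTED_EFFECTS = {
  none: [],
  copy: ['copy'],
  link: ['link'],
  move: ['move'],
  copyLink: ['copy', 'link'],
  copyMove: ['copy', 'move'],
  linkMove: ['link', 'move'],
  all: ['copy', 'link', 'move'],
  uninitialized: ['copy', 'link', 'move'] // Nothing set by the source
}

// Hypothetical helper: does the drag source permit the effect that the target wants?
function sourcePermits(effectAllowed, dropEffect) {
  const known = Object.prototype.hasOwnProperty.call(PERMITTED_EFFECTS, effectAllowed)
  return known && PERMITTED_EFFECTS[effectAllowed].includes(dropEffect)
}
```

In a dragover listener, a target that only ever moves things could test sourcePermits(e.dataTransfer.effectAllowed, 'move') before setting the drop effect and calling preventDefault.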

Dropping and Data Transfer

In the drop listener, first of all, call stopPropagation for a successful drop, or return without calling it if you reject the drop.

You can, if you care, find the drop effect that the user selected, and act accordingly.

If you want to handle a cross-app drop, you then retrieve the transferred data. This is its own can of worms, because the browser support is inconsistent. Reliably sending a JSON string from one app to another seems quite an adventure. I am not going to dwell on it here.

For drag and drop in the same app, don't bother with data transfer. Just set up an object in dragstart and retrieve it in drop.

A Template

Here is a template with all the operations in one place. Keep in mind that this is for drag and drop within a single app. Otherwise, you need to delve into the data transfer.

Do this for the drag sources:

let dragData = undefined
let dragSource = ...
dragSource.setAttribute('draggable', true)
dragSource.addEventListener('dragstart', function(e) {
  e.dataTransfer.effectAllowed = 'all'
  e.dataTransfer.setData('text/plain', dragSource.textContent) // Firefox needs this
  let bounds = dragSource.getBoundingClientRect()
  dragData = {
    x: e.clientX - bounds.left,
    y: e.clientY - bounds.top,
    // Any other data...
  }
  // Any other setup...
})
dragSource.addEventListener('dragend', function(e) {
  // Any other teardown...
  dragData = undefined
})

Do this for the drop targets:

dropTarget.addEventListener('dragenter', function(e) {
  if (dragData === undefined) return // Some sort of foreign drop
  e.preventDefault() // Willing to accept the drop
  dropTarget.classList.add('dragover')
})
dropTarget.addEventListener('dragleave', function(e) {
  dropTarget.classList.remove('dragover')
})
dropTarget.addEventListener('dragover', function(e) {
  e.preventDefault() // Otherwise the drag is canceled
  e.dataTransfer.dropEffect = 'move'
})
dropTarget.addEventListener('drop', function(e) {
  dropTarget.classList.remove('dragover') // Duplicate from dragleave
  if (dragData === undefined) return // Some sort of foreign drop
  e.stopPropagation()
  // Drop action...
})

Other Haters

I am not the only one who thinks HTML Drag and Drop is a drag. Check out these blogs for more exasperation and other tips.

Why can't this spec get some love? A great deal of the busywork could be removed. Imagine that a drag source would just have to listen to dragstart, and a drop target only to drop. Imagine there was a pseudoclass for accepted drop targets.

Why do browser implementors have this passive-aggressive attitude towards drag and drop? The pain points could be fixed with a minimum amount of trouble. Surely far less than the trouble that they went through in the first place to implement the spec. I guess it's that iron law of software engineering: “If it was hard to code, it should be hard to use.”
