Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ds content is documented as undefined in ROM sections #350

Closed
mid-kid opened this issue Jun 1, 2019 · 38 comments
Closed

ds content is documented as undefined in ROM sections #350

mid-kid opened this issue Jun 1, 2019 · 38 comments
Labels
docs This affects the documentation (web-specific issues go to rgbds-www) enhancement Typically new features; lesser priority than bugs

Comments

@mid-kid
Copy link
Contributor

mid-kid commented Jun 1, 2019

The content fillled by the ds directive in ROM sections is documented as undefined yet is is very well defined.

Can a programmer rely on the ds areas being filled with the value to -p or not?

Also, it isn't very clear from the command-line usage that this value is used for the ds directive. (padding an image is only really relevant to rgblink and rgbfix)

@aaaaaa123456789
Copy link
Member

Note that any attempt at making this truly undefined will break backwards compatibility.

Case 1: rgbfix relies on the header section ($0104 - $014F) being all zeroed out; if some bytes aren't zero, it complains. Many ROMs that use a padding value of zero define this area as ds 76 (or any equivalent, such as ds $150 - $104). If this ds fill value is ever changed, rgbfix will not find the zeros it expects.

Case 2: many tools assume that the padding byte is used for unused space (free space calculators, for instance). Many build setups rely on these tools. Changing the value output by ds will break these tools.

TL;DR: this behavior is well-known by RGBDS users, and they take advantage of it; avoid another #136 and document it instead of making it truly undefined.

@nitro2k01
Copy link
Member

Warning: rant incoming!

Daily reminder that undefined doesn't mean random. It just means that you're not allowed to rely on the value, even if it always happens to be initialized to a certain value in a given implementation. And likewise, it means that when you define memory using DS you forfeit that the right to rely on the value being defined.

My claim, then, would be that it is strictly speaking a user error to use DS for defining constant data, however convenient and elegant it looks. The better way would be something like

REPT 76
    DB 0
ENDR

This makes sense because traditionally in assemblers, DS (define storage) has usually only been used for static allocation in RAM, which by its very nature is undefined. Especially on GB where RAM is uninitialized on startup. This is also reflected in the title of the document, "Declaring variables in a RAM section" which has gone unedited since the Carsten Sorensen days, apart from replacing BSS with RAM.

However, the documentation doesn't lie, or at least it didn't when it was written. Here's a blast form the past. The original versions of RGBDS, as released by Carsten himself, didn't clear the memory buffers before writing data to them. This was mostly not a problem since in practice they would mostly be zeroed. With the exception of the first 8 or so byte of the ROM, which would contain some junk data, probably because that buffer was realloc'd from a previous use. So I suspect that what the fill flag would actually do in previous versions of the assembler is to pre-clear the buffer, rather than use that as a defined value in the skip function. This issue showed up for me in the linker, but also applied to the assembler. I wasn't happy with it at the time, but I just used added -zff to both tools (which was the fill flag for those versions) and went on with my life. Also, at the time, RGBFIX wouldn't complain about non-zero values so this wouldn't have been an issue to begin with.

Allowing DS to be used for static ROM allocation is a bit of a kludge, and as far as I know, allowing the DS directive for ROM data is the only reason the assembler has a fill flag to begin with. The assembler shouldn't really need one. (Only the linker should, to fill in unallocated space.) If I redesigned RGBDS from the ground up I would simply remove this feature. But now it's a well established part of RGBDS so it needs to remain for backward compatibility.

I would strongly oppose any suggestions to randomize the fill value for DS, just to conform with a documentation originally written to describe how DS works when allocating RAM. That would be sheer lunacy. An assembler, given the same input twice, should generally produce the same result, unless you do something that very explicitly differs between runs. (Like a current date function or an explicit randomness function.)

I would instead propose the following:

  1. Document how the -p flag affects the DS directive.
  2. As far as the implementation goes, either change nothing, or warn/error if DS is used without specifying the -p flag.

Also, as the final part of the rant, I want to ask @aaaaaa123456789 exactly how that free space calculator would work. How could a tool rely on any continuous area of a certain value that's not at the end of a file/bank being empty? How would the tool know that this data was truly empty and not part of a data structure that just happens to be filled with a bunch of zeroes? It seems like such a tool would give wildly inaccurate results for many files. Furthermore, why use a binary analysis tool for output from RGBDS, when the linker can output a perfectly usable map file using the flag -m, which tells you in no uncertain terms which and how many bytes are allocated or not?

@aaaaaa123456789
Copy link
Member

@nitro2k01 If you want to calculate free space in a ROM, generate two ROMs with two different values of -p and count the bytes where they differ. (Make sure to ignore anything in the header.) It's far easier than parsing map files.

@nitro2k01
Copy link
Member

@aaaaaa123456789 Ok, that's one way to do it. Either way, the -p flag shouldn't really matter for that, since all it affects is the value used for ds. Do you usually use ds for defining unallocated space? I don't. Really, only the -p flag for rgblink and rgbfix should be applicable to that use case.

@aaaaaa123456789
Copy link
Member

Occasionally I use ds to mark unused space that is not at the end of a bank. (For instance, in the small area before $0100.)

@nitro2k01
Copy link
Member

nitro2k01 commented Jun 2, 2019

@aaaaaa123456789 So then if you are asking the linker explicitly to not use that space for anything useful, why would you want to include it in your free space calculation? Why not just let the linker use it if you consider it to be free? Even without marking it, that space would show up in your tool, since those bytes (if they are actually unused) are filled by the linker and would differ.

It's a very niche use case, and even so you could do without -p if you wanted. Use -DFILLVAL="$AA" to define a constant, then include it as such:

SECTION "FILLER",ROM0[$68]
REPT $98
    DB FILLVAL
ENDR

This is more academic than anything since I do support keeping -p as long as DS can be used for ROM, but just showing that there are ways without the -p flag.

@pinobatch
Copy link
Member

Let me propose a compromise: ds 76, $00

ca65, an assembler targeting the 6502 architecture (and any other architecture for which someone has written a macro pack), has the .res command that corresponds to RGBASM ds.

  • .res 76 reserves 76 bytes with an unspecified value. (If in a ROM section, they're technically specified by the linker configuration file.)
  • .res 76, $00 reserves 76 bytes with a specified value.

The latter form is equivalent to the following but does not occupy three source lines of code:

  .repeat 76
  .byte $00
  .endrepeat

I propose adding a 2-argument alternate form of ds equivalent to rept around db. For example, ds $150-$104, $00 would skip a header.

My own free space calculator (unused.py) avoids most false positives by requiring a run to be at least 32 bytes long. If a program triggers that, then the program needs a code review anyway in order to improve its data compression so that more can be packed into a 32K flash cart. As for @nitro2k01's suggestion to "output a perfectly usable map file using the flag -m", this works only on the output of RGBLINK. It won't work, say, for a ROM produced by concatenating an engine linked with RGBLINK to a separate asset archive.

@aaaaaa123456789
Copy link
Member

aaaaaa123456789 commented Jun 2, 2019

Two-argument ds would be useful, but it does cause the problem of having to specify the fill value in every location, which in turn means you have to change it in every location.

The convenience of -p is unbeatable. It could be replaced by a variable, but that would definitely be backwards-incompatible.

EDIT: I'm not arguing against two-argument ds; just saying it doesn't replace the current form.

@nitro2k01
Copy link
Member

nitro2k01 commented Jun 2, 2019

@aaaaaa123456789 Just to be clear, what I mentioned in the previous comment was a suggestion on how to reimplement your use case using existing features. It was not a new feature suggestion. The -D flag for defining a symbol already exists since however many versions back. And the rest is just a standard syntax for repeating a constant byte x times. Which again you could using existing functionality and not many more bytes of code.

My opposition to using DS this way is also somewhat philosophical. Reusing a directive normally used for data in RAM where it by its very nature is undefined, is a kludge, and passing the fill value as a switch is an afterthought that just happened to be useful for your use case. It doesn't really fit in with how the assembler works overall. I'd much rather that you'd solve this using more generic building blocks, like my suggestion above.

Another "philosophically consistent" way of implementing it would be to let the assembler leave bytes defined by DS undefined and let the linker fill them in with its fill value. But this would potentially break existing code since a common use is apparently to use this to fill in the header with zeroes, whereas you'd often want the linker to fill with $FF both for preserving the life of the flash chips in flash cartridges, and for catching potential out of bounds execution since $FF is the opcode for rst $38 which in turn can be used as a crash handler.

@pinobatch I do like the idea, but I'd implement it rather differently because of my aforementioned philosophical opposition to this use of the DS which I'd rather see deprecated in the long term, in particular when used without a data argument. Namely, I'd have two directives called something like DBREP and DWREP. These would take one length argument, and one or more data arguments of size byte or word. For example...

    DBREP 4,0,1,2,3

...would output the same data as...

    DB 0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3

So it would be a shorthand for a DB/DW in a REPT. But in its simplest use, you'd give it one length argument and one data argument.

@AntonioND
Copy link
Member

I agree with @nitro2k01, this looks like a way to abuse the DS directive, which is not the first abuse I see: #190

However, I have no strong feelings on what to do about it.

@mid-kid
Copy link
Contributor Author

mid-kid commented Jun 2, 2019

Imo, just document its current behavior properly. It's not exactly surprising or unintended behavior like #190 was, and people kind of expect it to work this way.

Heck, if anything, I'm just going to point to existing behavior of binutils, with its .skip and .org directives that do essentially what ds already does, and what pino is describing, and .fill that works more like what nitro describes.

@AntonioND
Copy link
Member

Then yeah, a documentation update would probably be the best thing to do.

@ISSOtm ISSOtm added docs This affects the documentation (web-specific issues go to rgbds-www) enhancement Typically new features; lesser priority than bugs labels Aug 29, 2019
@ISSOtm
Copy link
Member

ISSOtm commented Dec 6, 2019

I'm currently overhauling RGBDS' man pages, and here is the first draft of the section related to this issue:

   Declaring variables in a RAM section
     DS allocates a number of bytes.  The content is undefined.  This is the
     preferred method of allocationg space in a RAM section.  You can,
     however, use DB, DW and DL without any arguments instead.

           DS str_SIZEOF ;allocate str_SIZEOF bytes

     Note that in ROM sections, the bytes in the empty space left by DS will
     be filled with the value passed to the -p command-line option.

@aaaaaa123456789
Copy link
Member

The word "undefined" should probably be gone from that document. That content has never been undefined.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 6, 2019

It is undefined—in RAM sections.

@pinobatch
Copy link
Member

In the C standard, "undefined" means "don't effing do this because it can summon time-traveling nasal demons." In the case of ds as an attempt to replace rept around db, I think "unspecified" is a better match.

@aaaaaa123456789
Copy link
Member

It is undefined—in RAM sections.

RAM sections have no content — they are comparable to ELF NOBITS sections. It's not that their contents are undefined or unspecified; it's that they are symbol-only sections that don't have contents.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 6, 2019

A section like BSS has defined contents: all zeros.

@aaaaaa123456789
Copy link
Member

That's only reasonable when there is a program loader of some sort — typically the moral equivalent of what the C standard calls a hosted environment.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 6, 2019

But then, what's wrong with notifying the user that there is no such loader in the hardware? (The boot ROM could clear a portion of WRAM, for example.)

@aaaaaa123456789
Copy link
Member

Users expect tools to document their own behavior above everything else. That wording makes it seem like RGBDS may make users believe that RGBDS is making that area undefined.

Documenting the GB itself is nice, but it shouldn't get in the way of documenting the tool itself, and listing an environmental constraint amongst tool-specific behavior leads to confusion.

@pinobatch
Copy link
Member

Anything related to headers is also "an environmental constraint," yet the competing toolchain WLA DX has syntactic sugar for declaring headers on many platforms it supports. Even the 16 KiB size of banks is "an environmental constraint" in a way.

@aaaaaa123456789
Copy link
Member

aaaaaa123456789 commented Dec 6, 2019

Even the 16 KiB size of banks is "an environmental constraint" in a way.

Which is why you don't say "the ds keyword supports arguments between 1 and 16,384", because that's not a property of the keyword.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 6, 2019

It actually is a property, in order to avoid creating too large a section.

@nitro2k01
Copy link
Member

nitro2k01 commented Dec 7, 2019

@aaaaaa123456789

The word "undefined" should probably be gone from that document. That content has never been undefined.

Again, I believe this was true in the very early days of RGBDS as I explained in the previous post. RGBDS sometimes wouldn't clear memory before using it as buffers and could leak internal RGBDS state as a result. And while what I observed back in the day was for the first few bytes of the ROM, and I'd have to go back and try to reproduce it for DS I wouldn't be surprised if it applied to DS as well and that was Carsten's way of warning users about it.

@ISSOtm

It actually is a property, in order to avoid creating too large a section.

@aaaaaa123456789 's previous comment reminded me of something:

Which is why you don't say "the ds keyword supports arguments between 1 and 16,384"

That's actually incorrect. It supports arguments between 1 and 32768, because the tiny ROM mode exists, which allocates a flat 32 kiB area from $0-$7FFF. Now imagine if someone considered it a property of the DS keyword that it doesn't output >16384 bytes (because they forgot about tiny mode) and coded that as a hard limit, and someone working in tiny ROM mode wanted to allocate 16385 filler bytes. The point is that it's not DS's responsibility to check the size, because the infrastructure to check how much data fits within a section already exists elsewhere.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 7, 2019

Well, we do have some sanity checking, which in effect does limit the argument to ds. That said, you're correct in that the maximum argument is indeed 32,768.

I really insist on leaving "undefined" in this text because as mentioned above, using ds to fill memory with a certain value is not a good idea, and I would like the text to convey that the user's code should not rely on the contents or ds. I'm still open about adding a fill keyword that would serve just that purpose, though.

@aaaaaa123456789
Copy link
Member

But the sanity check isn't actually part of the behavior of ds, but of sections in general — any oversized section will trigger that check. Should it be mentioned on every instruction? Any of them could be the one that pushes the section past its maximum size, though.

using ds to fill memory with a certain value is not a good idea

And here we come to the true issue, the one that has plagued this toolchain for years: the maintainers actively fighting forms of programming they don't like.

I would like the text to convey that the user's code should not rely on the contents or[sic] ds.

This here is complete nonsense. The fill value of ds is completely well defined. Moreover, since build processes can and do rely on it, it cannot be changed without breaking compatibility — nor there is any reason to. There's no good reason why users shouldn't rely on the value, as the value is guaranteed to be what it is.

Describing it as undefined when it isn't is the complete opposite of good documentation. Documentation for a tool should primarily describe what the tool does, and using the word "undefined" is confusing because it might mislead users to believe the complete opposite of what actually happens.

Except that it gets worse. You say that the fill value is undefined because RAM is undefined, which might lead to people mistakenly believing it is undefined in ROM — which makes the word 'undefined' confusing. But then you go ahead and use that argument yourself as a justification for not relying on the fill value in ROM — meaning that the confusion is completely deliberate! Writing confusing documentation in order to prevent usage of a feature isn't just a mistake; it's downright malicious.

Documentation should strive to be truthful and accurate, not confusing and misleading. Doing the latter intentionally is nothing but a disservice to all users of the tool. It's actively harmful. Please stop this nonsense.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 7, 2019

It's not a problem that it's something I like or not, it's that it was never meant for that and it shows. It can only be used for a single value per program (the -p one), and this is why making a built-in replicating the following macro has been proposed time and time again:

fill: MACRO
    REPT \1
        db \2
    ENDR
ENDM

@aaaaaa123456789
Copy link
Member

The macro/directive you mention isn't a substitute for ds, as I said above. They are complementary.
Whether it was meant to be undefined or not a few centuries ago doesn't change the current behavior. Moreover, the current behavior has come to be expected by users, and since there's no good reason to change it, we can also expect that it will remain that way.

And the current behavior is not undefined by any means. It uses a specific value and it uses that value consistently.

Once again, documentation should describe actual behavior, and the behavior isn't undefined. Considerations about how someone believes you should use this feature are immaterial — first and foremost, the document should describe how it works, and the word 'undefined' doesn't.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 7, 2019

Well, that's exactly what the proposed snippet above says.

     Note that in ROM sections, the bytes in the empty space left by DS will
     be filled with the value passed to the -p command-line option.

@aaaaaa123456789
Copy link
Member

Then you have contradictory documentation. It begins by saying it's undefined, but then makes a small note about ROM values. Contradictory and confusing — deliberately so, as you proved a few comments above.

If you insist in noting that RAM values are undefined (which they always are due to hardware constraints, making that bit of documentation redundant), then that should be the sidenote, not the part of it that describes actual behavior (i.e., the ROM fill value).

Nobody who understands the platform expects RAM to be initialized; it's well understood that a fill value can only apply to ROM. (And those who don't will come across your sidenote — which should clearly mention that the values are uninitialized, which is not quite the same as undefined.) The true tool-specific and directive-specific documentation is that ds declares space, and empty space gets filled with the fill value in ROM. And that's what the documentation should focus on.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 7, 2019

The fill value is meaningless when using overlays, though, so ds does not do that. The note should mention "in ROM sections when not overlaying", though.

The point is that the contents should not be relied on, for example if a new feature is added that changes how empty space is handled. -p only matters for actual empty space.

@aaaaaa123456789
Copy link
Member

Then mention that overlaid ROMs are filled with the underlying ROM's data, and regular ROMs with the fill value.

All of this is worth being mentioned. It is the true behavior of the ds directive. Actual behavior should be the focus, not the fact that the console's RAM is uninitialized on power-on. This latter fact is documented in many other more correct places. If you think it's relevant to the ds directive, then make a sidenote mentioning it, but don't make it the main opening statement of its documentation.

@aaaaaa123456789
Copy link
Member

Also, if you change how ds works for regular ROMs, you'll break enough builds to cause a fork of the toolchain. That's insane.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 7, 2019

Changing the way ds works does not necessarily break existing code. Case in point, overlays.

@aaaaaa123456789
Copy link
Member

Because overlays are an opt-in feature by definition.

If you make non-breaking changes to ds, then users can rely on the fill value remaining unchanged. That's the definition of a non-breaking change.

@ISSOtm
Copy link
Member

ISSOtm commented Dec 7, 2019

Here is a second draft:

   Declaring variables in a RAM section
     DS allocates a number of empty bytes.  This is the preferred method of
     allocating space in a RAM section.  You can also use DB, DW and DL
     without any arguments instead (see Defining constant data below).

           DS 42 ; Allocates 42 bytes

     Empty space in RAM sections will not be initialized. In ROM sections, it
     will be filled with the value passed to the -p command-line option,
     except when using overlays with -O.

ISSOtm added a commit to ISSOtm/rgbds that referenced this issue Dec 7, 2019
ISSOtm added a commit to ISSOtm/rgbds that referenced this issue Dec 8, 2019
ISSOtm added a commit to ISSOtm/rgbds that referenced this issue Dec 8, 2019
@ISSOtm
Copy link
Member

ISSOtm commented Jan 7, 2020

Fixed by #455.

@ISSOtm ISSOtm closed this as completed Jan 7, 2020
ISSOtm added a commit that referenced this issue Feb 23, 2020
As suggested by #350 (comment)
The order `count` then `byte` was decided after some discussion:
- First argument consistent with single-arg syntax
- Intuitive at least to some people other than myself
- Consistent with other assemblers, at least ca65
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This affects the documentation (web-specific issues go to rgbds-www) enhancement Typically new features; lesser priority than bugs
Projects
None yet
Development

No branches or pull requests

6 participants