Python's regex engine performs pathologically on regexes like
'<!--.*?-->' when fed a large string of repeating '<!--' clauses.
Thanks to JaimeSlome / security@huntr.dev for the report; closes#1380.
Solved by switching to the Rust implementation, which does not suffer
from this issue.
entsToText(), minimizeHTML(), and the old regex constants have been
removed; they do not appear to be used by any add-ons.
Matches should arrive in alphabetical order. Currently results are not
capped (JS should be able to handle ~1k tags without too much hassle),
and no reordering based on match location is done. Matches are substring
based, and multiple can be provided, eg "foo::bar" will match
"foof::baz::abbar".
This is not hooked up properly on the frontend at the moment -
updateSuggestions() seems to be missing the most recently typed character,
and is not updating the list of completions half the time.
Interday learning cards are now counted in the learning count again,
and are no longer subject to the daily review limit.
The thinking behind the original change was that interday learning cards
are scheduled more like reviews, and counting them in the review count
would allow the learning count to focus on intraday learning - the red
number reflecting the fact that they are the most fragile memories. And
counting them together made it practical to apply the review limit
to both at once.
Since the release, there have been a number of users expecting to see
interday learning cards included in the learning count (the latest being
https://forums.ankiweb.net/t/feedback-and-a-feature-adjustment-request-for-2-1-45/12308),
and a good argument can be made for that too - they are, after all, listed
in the learning steps, and do tend to be harder than reviews. Short of
introducing another count to keep track of interday and intraday learning
separately, moving back to the old behaviour seems like the best move.
This also means it is not really practical to apply the review limit to
interday learning cards anymore, as the limit would be split between two
different numbers, and how much each number is capped would depend on
the order cards are introduced. The scheduler could figure this out, but
the deck list code does not know card order, and would need significant
changes to be able to produce numbers that matched the scheduler. And
even if we ignore implementation complexities, I think it would be more
difficult for users to reason about - the influence of the review limit
on new cards is confusing enough as it is.
This was flawed - while non-Latin text is now acceptable
in an IRI, we still need to be concerned with reserved characters
such as spaces, and Anki unfortunately has been storing the filenames
in unencoded form in the DB, meaning we must encode them at display
time. We won't be able to move away from this until existing notes
are rewritten, and it will probably require breaking compatibility with
older clients.
https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier
This reverts commit 14110add55.
Will allow importing the Protobuf without pulling in the rest of
the library. This is not a full PEP420 namespace, and the wheel still
bundles everything - it just makes things easier in a Bazel workspace.
I originally tried with PEP420, but it required more invasive changes,
and I ran into issues with mypy.
In order to split backend.proto into a more manageable size, the protobuf
handling needed to be updated. This took more time than I would have
liked, as each language handles protobuf differently:
- The Python Protobuf code ignores "package" directives, and relies
solely on how the files are laid out on disk. While it would have been
nice to keep the generated files in a private subpackage, Protobuf gets
confused if the files are located in a location that does not match
their original .proto layout, so the old approach of storing them in
_backend/ will not work. They now clutter up pylib/anki instead. I'm
rather annoyed by that, but alternatives seem to be having to add an extra
level to the Protobuf path, making the other languages suffer, or trying
to hack around the issue by munging sys.modules.
- Protobufjs fails to expose packages if they don't start with a capital
letter, despite the fact that lowercase packages are the norm in most
languages :-( This required a patch to fix.
- Rust was the easiest, as Prost is relatively straightforward compared
to Google's tools.
The Protobuf files are now stored in /proto/anki, with a separate package
for each file. I've split backend.proto into a few files as a test, but
the majority of that work is still to come.
The Python Protobuf building is a bit of a hack at the moment, hard-coding
"proto" as the top level folder, but it seems to get the job done for now.
Also changed the workspace name, as there seems to be a number of Bazel
repos moving away from the more awkward reverse DNS naming style.
Back in the WebKit days, images with Unicode filenames would fail to
appear if they weren't percent-escaped. This no longer seems to be the
case - with this patch, images appear correctly on the Mac and Windows
platforms I tested with.
Fixes https://forums.ankiweb.net/t/anki-2-1-45-beta/10664/96Fixes#1219
An example of how we can start migrating the codebase to PEP8:
- enable invalid-name at the top
- use bazel run pylib:pylint to identify names that need renaming
- use PyCharm or similar to rename the functions/variables
- in the cases where the conversion is not just snake_case, use
.register_deprecated_aliases()
+ removed the __repr__() definition, it dumps all the note content
and obscures the error message
mypy's move to external types-* packages is a PITA, as it requires them
to be installed in site-packages, and provides no way to specify a custom
site-packages folder, necessitating extra scripts to mock the
site-packages path, and copy+rename the stub packages into a separate
folder.
- changes can now be undone
- the same field can now be mapped to multiple target fields, allowing
fields to be cloned
- the old Qt dialog has been removed
- the old col.models.change() API calls the new code, to avoid
breaking existing consumers. It requires the field map to always
be passed in, but that appears to have been the common case.
- closes#1175
The hard limit from sqlite may be larger, but things slow down as more
tags are selected.
https://forums.ankiweb.net/t/unable-to-create-custom-test/10467
There are a number of things that could be improved here:
- we should show a live count so users are aware of the limit
- we should be filling in the parent tags when they're not explicitly
listed on a card
- we should reconsider disabling the 'tags to include' by default
It may make sense to defer these changes until we can move this screen
into Svelte/handle the processing in the backend.
Like the previous change, avoid exposing the protobuf as a public API
for now. It requires more thought, and is probably better done with
either extra helper accessors like decks.name(), or via a native class.
This could potentially help us avoid having to refetch the notetype
during study in the future, though updates to Note initialization and
the LaTeX handling would be required first.
Also:
- fix issues where the Undo action in the Browse screen was not
consistent with the main window. The existing hook signature has been
changed; from a snapshot of the add-on code from a few months ago, it
was not a hook that was being used by anyone.
- change the undo shortcut in the Browse window to match the main
window. It was different because undoing a change in the editing area
could accidentally trigger an undo of an operation, but the damage is
limited now that (most) operations can be redone. If it still proves to
be a problem, perhaps we should just always swallow ctrl+z when an
editing field is focused.
- Daily limits are no longer inherited - each deck limits its own
cards, and the selected deck enforces a maximum limit.
- Fetching of review cards now uses a single query, and sorts in advance.
In collections with a large number of overdue cards and decks, this is
faster than iterating over each deck in turn.
- Include interday learning count in review count & review limit, and
allow them to be buried.
- Warn when parent review limit is lower than child deck in deck options.
- Cap the new card limit to the review limit.
- Add option to control whether new card fetching short-circuits.
Avoids duplicate work, and is a step towards allowing the next
states to be modified by third-party code.
Also:
- fixed incorrect underlined count, due to reviews being labeled as
learning cards
- fixed reviewer not refreshing when undoing a test review, by splitting
up backend queue rebuilding from frontend reviewer refresh
- moved answering into a CollectionOp
The explicit flush was clearing undo history, and the hook will need
re-working to support propagating OpChanges correctly. It will likely
come back as a GUI hook, instead of one in pylib.
Allows add-on authors to define their own label for a group of undoable
operations. For example:
def mark_and_bury(
*,
parent: QWidget,
card_id: CardId,
) -> CollectionOp[OpChanges]:
def op(col: Collection) -> OpChanges:
target = col.add_custom_undo_entry("Mark and Bury")
col.sched.bury_cards([card_id])
card = col.get_card(card_id)
col.tags.bulk_add(note_ids=[card.nid], tags="marked")
return col.merge_undo_entries(target)
return CollectionOp(parent, op)
The .add_custom_undo_entry() is for adding your own custom actions.
When extending a standard Anki action, instead store `target =
col.undo_status().last_step` after executing the standard operation.
This started out as a bigger refactor that required a separate
.commit_undoable() call to be run after each operation, instead of
having each operation return changes directly. But that proved to be
somewhat cumbersome in unit tests, and ran the risk of unexpected
behaviour if the caller invoked an operation without remembering to
finalize it.
- backend now updates current notetype as part of addition
- frontend no longer implicitly adds, so we can assign a new name and
add in a single operation
Instead, fetch the config order on the frontend and pass a builtin
variant into the backend.
That makes the following unnecessary:
* Resolving the config sort in search/mod.rs
* Deserializing the Column enum
* Config accessors for the sort columns
* Remove duplicate backend columns
* Remove duplicate column routines
* Move columns on frontend from state to model
* Generate available columns from Colum enum
* Add second column label for notes mode
- make sure we set flag in changes when config var changed
- move current deck get/set into backend
- set_config() now returns a bool indicating whether a change was
made, so other operations can be gated off it
- active decks generation is deferred until sched.reset()
remove_note() now returns the count of removed cards, allowing us
to unify the tooltip between browser and review screen
I've left the old translation in - we'll need to write a script at
one point that gathers all references to translations in the code,
and shows ones that are unused.
- pass the handler directly
- reviewer special-cases for flags and notes are now applied at
call site
- drop the kind attribute on OpChanges which is not needed
Updating a deck via protobuf is now exposed on the backend, but not
currently on the frontend - I suspect we'll be better off writing
separate routines for the actions we need instead, and we get a better
undo description for free.
This is currently causing an ugly redraw in the browse screen, which
will need fixing.
- use strum to generate an iterator for the protobuf enum so we don't
forget to add new labels if extending in the future
- no add-ons appear to be using dynOrderLabels(), so it has been removed
@RumovZ perhaps a similar approach might work for listing the available
browser columns as well?
I18n is not set up at init time, so the strings can't be generated
at import.
@kelciour you have a few importing add-ons, so wanted to give you a
heads-up. The importing code is likely to change more in
future months, but for now this should be the only change
Instead of generating a fluent.proto file with a giant enum, create
a .json file representing the translations that downstream consumers
can use for code generation.
This enables the generation of a separate method for each translation,
with a docstring that shows the actual text, and any required arguments
listed in the function signature.
The codebase is still using the old enum for now; updating it will need
to come in future commits, and the old enum will need to be kept
around, as add-ons are referencing it.
Other changes:
- move translation code into a separate crate
- store the translations on a per-file/module basis, which will allow
us to avoid sending 1000+ strings on each JS page load in the future
- drop the undocumented support for external .ftl files, that we weren't
using
- duplicate strings in translation files are now checked for at build
time
- fix i18n test failing when run outside Bazel
- drop slog dependency in i18n module
- Filtered deck creation now happens as an atomic operation, and is
undoable.
- The logic for initial search text, normalizing searches and so on
has been pushed into the backend.
- Use protobuf to pass the filtered deck to the updated dialog, so
we don't need to deal with untyped JSON.
- Change the "revise your search?" prompt to be a simple info box -
user has access to cancel and build buttons, and doesn't need a separate
prompt. Tweak the wording so the 'show excluded' button should be more
obvious.
- Filtered decks have a time appended to them instead of a number,
primarily because it's easier to implement. No objections going back to
the old behaviour if someone wants to contribute a clean patch.
The standard de-duplication will happen if two decks are created in the
same minute with the same name.
- Tweak the default sort order, and start with two searches. The UI
will still hide the second search by default, but by starting with two,
the frontend doesn't need logic for creating the starting text.
- Search errors now have their own error type, instead of using
InvalidInput, as that was intended mainly for bad API calls. The markdown
conversion is done when the error is converted from the backend, allowing
errors to printed as a string without any special handling by the calling
code.
TODO: when building a new filtered deck, update_active() is clobbering
the undo log when the overview is refreshed
- QueueConfig is only used by the scheduler
- DeckConfig was being used in places that Config should have been used
- Add "Dict" to the name so that the bare name is free for use with a
stronger type.
Now behaves the same way as standard find&replace:
- Will match substrings
- Regexs can be used to match multiple items; we no longer split
input on spaces.
- The find&replace dialog has been updated to add tags to the field
list.
We were (ab)using the bulk update routine to do deletions, but that
code was really intended to be used for finding&replacing, where an
exact match is not a requirement.
- clear_unused_tags() is now undoable, and returns the number of removed
notes
- add a new mw.query_op() helper for immutable queries
- decouple "freeze/unfreeze ui state" hooks from the "interface update
required" hook, so that the former is fired even on error, and can be
made re-entrant
- use a 'block_updates' flag in Python, instead of setUpdatesEnabled(),
as the latter has the side-effect of preventing child windows like
tooltips from appearing, and forces a full redrawn when updates are
enabled again. The new behaviour leads to the card list blanking out
when a long-running op is running, but in the future if we cache the
cell values we can just display them from the cache instead.
- we were indiscriminately saving the note with saveNow(), due to the
call to saveTags(). Changed so that it only saves when the tags field
is focused.
- drain the "on_done" queue on main before launching a new background
task, to lower the chances of something in on_done making a small query
to the DB and hanging until a long op finishes
- the duplicate check in the editor was executed after the webview loads,
leading to it hanging until the sidebar finishes loading. Run it at
set_note() time instead, so that the editor loads first.
- don't throw an error when a long-running op started with with_progress()
finishes after the window it was launched from has closed
- don't throw an error when the browser is closed before the sidebar
has finished loading
- Introduced a new transact() method that wraps the return value
in a separate struct that describes the changes that were made.
- Changes are now gathered from the undo log, so we don't need to
guess at what was changed - eg if update_note() is called with identical
note contents, no changes are returned. Card changes will only be set
if cards were actually generated by the update_note() call, and tag
will only be set if a new tag was added.
- mw.perform_op() has been updated to expect the op to return the changes,
or a structure with the changes in it, and it will use them to fire the
change hook, instead of fetching the changes from undo_status(), so there
is no risk of race conditions.
- the various calls to mw.perform_op() have been split into separate
files like card_ops.py. Aside from making the code cleaner, this works
around a rather annoying issue with mypy. Because we run it with
no_strict_optional, mypy is happy to accept an operation that returns None,
despite the type signature saying it requires changes to be returned.
Turning no_strict_optional on for the whole codebase is not practical
at the moment, but we can enable it for individual files.
Still todo:
- The cursor keeps moving back to the start of a field when typing -
we need to ignore the refresh hook when we are the initiator.
- The busy cursor icon should probably be delayed a few hundreds ms.
- Still need to think about a nicer way of handling saveNow()
- op_made_changes(), op_affects_study_queue() might be better embedded
as properties in the object instead
'card modified' covers the common case where we need to rebuild the
study queue, but is also set when changing the card flags. We want to
avoid a queue rebuild in that case, as it causes UI flicker, and may
result in a different card being shown. Note marking doesn't trigger
a queue build, but still causes flicker, and may return the user back
to the front side when they were looking at the answer.
I still think entity-based change tracking is the simplest in the
common case, but to solve the above, I've introduced an enum describing
the last operation that was taken. This currently is not trying to list
out all possible operations, and just describes the ones we want to
special-case.
Other changes:
- Fire the old 'state_did_reset' hook after an operation is performed,
so legacy code can refresh itself after an operation is performed.
- Fire the new `operation_did_execute` hook when mw.reset() is called,
so that as the UI is updated to the use the new hook, it will still
be able to refresh after legacy code calls mw.reset()
- Update the deck browser, overview and review screens to listen to
the new hook, instead of relying on the main window to call moveToState()
- Add a 'set flag' backend action, so we can distinguish it from a
normal card update.
- Drop the separate added/modified entries in the change list in
favour of a single entry per entity.
- Add typing to mw.state
- Tweak perform_op()
- Convert a few more actions to use perform_op()
Basic proof of concept, where the 'delete note' operation in the
reviewer has been updated to use mw.perform_op(). Instead of manually
calling .reset() afterwards, a summary of the changes is returned as
part of the undo status query, and various parts of the GUI can listen
to gui_hooks.operation_did_execute and decide whether they want to
redraw based on the scope of the changes. This should allow the sidebar
to selectively redraw just the tags area in the future for example.
Currently we're just listing out all possible areas that might be changed;
in the future we could theoretically inspect the specific changes in the
undo log to provide a more accurate report (avoiding refreshing the tags
list when no tags were added for example).
You can test it out by opening the browse screen while studying, and
then deleting the current card - the browser should update to show (deleted)
on the cards due the earlier change.
If going ahead with this, aside from updating all the screens that currently
listen for resets, some thought will be required on how we can integrate
it with legacy code that expects to called when resets are made, and expects
to call .reset() when it makes changes.
Thoughts?
Fixes the following issue:
- some code directly modifies the database, causing modified_in_python
to be set to true
- an undoable operation is run, which calls autosave() at the end
- autosave() notices there's an undoable operation, and commits immediately
- because modified_in_python was true, col.mtime was bumped in Python
- that invalidated the undo queue, preventing the operation from being
undone
Rust requires all methods of impl Trait to be in a single file, which
means we had a giant backend/mod.rs covering all exposed methods. By
using separate service definitions for the separate areas, and updating
the code generation, we can split it into more manageable chunks -
this commit starts with the scheduling code.
In the long run, we'll probably want to split up the protobuf file into
multiple files as well.
Also dropped want_release_gil() from rsbridge, and the associated method
enum. While it allows us to skip the thread save/restore and mutex unlock/
lock, it looks to only be buying about 2.5% extra performance in the
best case (tested with timeit+format_timespan), and the majority of
the backend methods deal with I/O, and thus were already releasing the
GIL.
The existing code was really difficult to reason about:
- The default notetype depended on the selected deck, and vice versa,
and this logic was buried in the deck and notetype choosing screens,
and models.py.
- Changes to the notetype were not passed back directly, but were fired
via a hook, which changed any screen in the app that had a notetype
selector.
It also wasn't great for performance, as the most recent deck and tags
were embedded in the notetype, which can be expensive to save and sync
for large notetypes.
To address these points:
- The current deck for a notetype, and notetype for a deck, are now
stored in separate config variables, instead of directly in the deck
or notetype. These are cheap to read and write, and we'll be able to
sync them individually in the future once config syncing is updated in
the future. I seem to recall some users not wanting the tag saving
behaviour, so I've dropped that for now, but if people end up missing
it, it would be simple to add as an extra auxiliary config variable.
- The logic for getting the starting deck and notetype has been moved
into the backend. It should be the same as the older Python code, with
one exception: when "change deck depending on notetype" is enabled in
the preferences, it will start with the current notetype ("curModel"),
instead of first trying to get a deck-specific notetype.
- ModelChooser has been duplicated into notetypechooser.py, and it
has been updated to solely be concerned with keeping track of a selected
notetype - it no longer alters global state.
This splits update_card() into separate undoable/non-undoable ops
like the change to notes in b4396b94abdeba3347d30025c5c0240d991006c9
It means that actions get a blanket 'Update Card' description - in the
future we'll probably want to either add specific actions to the backend,
or allow an enum or string to be passed in to describe the op.
Other changes:
- card.flush() can no longer be used to add new cards. Card creation
is only supposed to be done in response to changes in a note's fields,
and this functionality was only exposed because the card generation
hadn't been migrated to the backend at that point. As far as I'm aware,
only Arthur's "copy notes" add-on used this functionality, and that should
be an easy fix - when the new note is added, the associated cards will
be generated, and they can then be retrieved with note.cards()
- tidy ups/PEP8
- note.flush() behaves like before, as otherwise actions or add-ons
that perform bulk flushing would end up creating an undo entry for
each note
- added col.update_note() to opt in to the new behaviour
- tidy up the names of some related routines
- transact() now automatically clears card queues unless an op
opts-out (and currently only AnswerCard does). This means there's no
risk of forgetting to clear the queues in an operation, or when undoing/
redoing
- CollectionOp->UndoableOp
- clear queues when redoing "answer card", instead of clearing redo
when clearing queues
Reviews and operations on the backend that support undoing can now be
committed immediately, so they will not be lost in the event of a crash.
This required tweaks to a few places:
- don't set collection mtime on save() unless changes were made in
Python, as otherwise we end up accidentally clearing the backend undo
queue
- autosave() is now run on every reset()
- garbage collection now runs in a timer, instead of relying on
autosave() to be run periodically
- use dataclasses for the review/checkpoint undo cases, instead of the
nasty ad-hoc list structure
- expose backend review undo to Python, and hook it into GUI
- redo is not currently exposed on the GUI, and the backend can only
cope with reviews done by the new scheduler at the moment
- the initial undo prototype code was bumping mtime/usn on undo, but
that was not ideal, as it was breaking the queue handling which expected
the mtime to match. The original rationale for bumping mtime/usn was
to avoid problems with syncing, but various operations like removing
a revlog can't be synced anyway - so we just need to ensure we clear the
undo queue prior to syncing
- fetch sfld and csum when fetching notes, to make it cheaper
to write them back out unmodified
- make `fields` private, and access it via accessors, so we can
still catch when fields have been mutated without calling
prepare_for_update()
- fix python importing code passing a string in as the checksum
https://forums.ankiweb.net/t/anki-2-1-41-beta/7305/59
When originally implemented in 21023ed3e5,
a given deck's limit was bound by its parents. This lead to a deck list
that seemed more logical in the parent limit < child limit case, as
child counts couldn't exceed a parent's, but it obscured the fact that
child decks could still be clicked on to show cards. And in the parent
limit > child limit case, the count shown for the child on the deck list
did not reflect how many cards were actually available and would be
delivered.
This change updates the reviewer to ignore parent limits when getting
review counts for the deck, which makes the behaviour consistent with
the deck list, which was recently changed to ignore parent limits.
Neither solution is ideal - this was a tradeoff v2 made in order to keep
fetching of review cards from multiple decks reasonably performant. The
experimental scheduling work moves back to respecting limits on
individual children, so this should hopefully improve in the future.
Also removed _revForDeck(), which was unused.
- Currently we just use 1.5x and 2x the normal preview delay; we could
change this in the future.
- Don't try to capture the current state; just use a flag to denote
exit status.
- Show (end) when exiting
Notes:
- The fuzz seed is now derived from the card id and # of reps, so
if a card is undone and done again, the same fuzz will be used.
- The intervals shown on the answer buttons now include the fuzz, instead
of hiding it from the user. This will prevent questions about due dates
being different to what was shown on the buttons, but will create
questions about due dates being different for cards with the same
interval, and some people may find it distracting for learning cards.
The new approach is easier to reason about, but time will tell
whether it's a net gain or not.
- The env var we were using to shift the clock away from rollover for
unit tests has been repurposed to also disable fuzzing, which simplifies
the tests.
- Cards in filtered decks without scheduling now have the preview delay
fuzzed.
- Sub-day learning cards are mostly fuzzed like before, but will apply
the up-to-5-minutes of fuzz regardless of the time of day.
- The answer buttons now round minute values, as the fuzz on short
intervals is distracting.
This is not the way the code is intended to be used, but making it
conform to the existing API allows us to exercise the existing unit
tests and provides partial backwards compatibility.
- Leech handling is currently broken
- Fix answered_at in wrong units, and not being used